Archive for November, 2010

Find out how we build in document security for open source search

My colleague Tom Mortimer will be talking at the London Intranet Show & Tell on 2nd December about how to implement document-level security for search. His presentation is titled “Implementing ACLs in an open source search solution”.

There are still a few tickets left for this small event, which will be of value to those working on intranet search.


Posted in events

November 12th, 2010


Legal search is broken – can it be fixed with open source taxonomies?

I spent yesterday afternoon at the International Society for Knowledge Organisation’s Legal KnowHow event, a series of talks on legal knowledge and how it is managed. The audience was a mixture of lawyers, legal information managers, vendors and academics, and the talks came from those who are planning legal knowledge systems or implementing them. I particularly enjoyed hearing from Adam Wyner of Liverpool University, who is modelling legal arguments in software using open source text analysis. You can see some of the key points I picked up on our Twitter feed.

What became clear to me during the afternoon is that search technology is not currently serving the needs of lawyers or law firms. The users want a simple Google-like interface (or think they do), the software is having trouble presenting results in context, and the source data is large, complex and unwieldy. The software used for search comes from some of the biggest commercial search vendors (legal firms seem to ‘follow the pack’ in terms of which vendor they select – unfortunately few of the large law firms seem to have even considered the credible open source alternatives such as Lucene/Solr or Xapian).

In many cases taxonomies were presented as the solution – make sure every document fits tidily into a hierarchy and all the search problems go away, as lawyers can simply navigate to what they need. All very simple in theory – however, each big law firm and each big legal information publisher has its own idea of what this taxonomy should be.

After the final presentation I argued that this seemed to be a classic case where an open source model could help. If a firm or publisher were prepared to create an open source legal taxonomy (and to be fair, we’re only talking about 5000 entries or so – this wouldn’t be a very big structure) and let it be developed and improved collaboratively, they would themselves benefit from others’ experience, the transfer of legal data between repositories would be easier, and even the search vendors might learn a little about how lawyers actually want to search. The original creators would be seen as thought-leaders and could even license the taxonomy so it could not be rebadged and passed off as original by another firm or publisher.

However, my plea fell on stony ground: law firms seem to think that their own taxonomies have inherent value (and thus should never be let outside the company), and they regard the open source model with suspicion. Perhaps legal search will remain broken for the time being.


Posted in events

November 11th, 2010


More about LucidWorks Enterprise

If you’re considering a Lucene/Solr powered search solution, you may be interested in LucidWorks Enterprise, produced by our partners Lucid Imagination. They’ve taken Lucene/Solr and added a powerful admin GUI, REST API, web spiders, file crawlers, database connectors, alerts, a clickthrough framework and more. All this comes with a range of excellent support options backed by the experts at Lucid.

If you’d like to know more, read this downloadable PDF or contact us for more information and a demo.


Posted in Reference

November 5th, 2010


Questions to ask your search vendor

#1 – How does it work?
You’ll probably get as many different answers to this as there are vendors – but you may not get the whole truth. Bear in mind that a lot of search engines are built on the same theoretical ideas. An engine might use a vector-space or probabilistic model for ordering results, for example. Most will create an inverted index, which maps each term to the documents that contain it.
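
To make these two ideas concrete, here is a minimal sketch in Python of an inverted index with a toy tf-idf score for ordering results. It is not how any particular engine works: the function names, the sample documents and the exact weighting formula are all assumptions chosen for illustration, and real engines add far more (stemming, positional data, compression, better ranking).

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: number of times the term occurs}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, num_docs, query):
    """Return doc ids ordered by a simple tf-idf sum (illustrative only)."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms found in fewer documents carry more weight.
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by doc id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample documents.
docs = {
    1: "open source search engines",
    2: "legal search and legal taxonomies",
    3: "intranet document security",
}
index = build_inverted_index(docs)
results = search(index, len(docs), "legal search")
print(results)
```

Document 2 ranks first here because it contains both query terms, and “legal” is the rarer of the two. A vendor’s answer to “how does it work?” should be explainable at roughly this level, even if the real scoring is more sophisticated.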

#2 – How fast is it?
Every search engine will take a finite amount of time to index a document or produce search results. Some of these processes will be limited by how fast data can be written to or read from disk, some by how fast the processor can do calculations. The key point is whether this time is going to work for you – will your users care if some complicated queries take ten seconds rather than a fraction of a second? Is there a time in the middle of the night when the system can spend a couple of hours building a new index? Watch out for silly answers such as “it’s instantaneous”.
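
Rather than relying on a vendor’s figures, you could time a set of your own representative queries. The following is a rough sketch in Python: `measure_latency` and `fake_engine` are hypothetical names, and `fake_engine` is only a stand-in for whatever call your chosen engine’s client library actually provides.

```python
import statistics
import time


def measure_latency(run_query, queries, repeats=3):
    """Time each query several times and summarise the results in seconds."""
    timings = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(query)
            timings.append(time.perf_counter() - start)
    timings.sort()
    # Index of the (approximate) 95th percentile, clamped to the last entry.
    p95_index = min(len(timings) - 1, int(0.95 * len(timings)))
    return {
        "median": statistics.median(timings),
        "p95": timings[p95_index],
        "worst": timings[-1],
    }


def fake_engine(query):
    """Stand-in for a real search call (assumption: replace with your own)."""
    return query.split()


report = measure_latency(fake_engine, ["legal search", "open source taxonomy"])
print(report)
```

The median tells you what a typical user will see, while the slowest timings show whether those complicated ten-second queries really exist in your workload.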

#3 – How does it scale?
Whatever data you have today, you’ll have more tomorrow! How many servers will you need today, and how easy is it to add more in the future as necessary? Will this affect the speed of indexing or searching? Cloud-based solutions can help, especially when the volume of data or queries varies.

#4 – How much does it cost?
This is a question with several potential answers: the cost of a software license (of course, with open source code this can be zero), the cost of integration and customisation so the engine fits your requirements, and the cost of ongoing support. Beware of a solution that promises much, but only after months of customisation. You should also ask how the cost scales with any growth in the number of source documents or users.

#5 – What happens if the vendor is taken over or disappears?
If the vendor is acquired by another company, or goes out of business, what happens to the software? The new owners may force you to move to their preferred solution, or in the worst case you can be left with no support for an obsolescent product. Ask if the vendor offers escrow. Open source licensing may also be a solution.

The above is not meant to be a complete list – feel free to suggest further questions!


Posted in Uncategorized

November 2nd, 2010
