Posts Tagged ‘lucene’

Legal search is broken – can it be fixed with open source taxonomies?

I spent yesterday afternoon at the International Society for Knowledge Organisation’s Legal KnowHow event, a series of talks on legal knowledge and how it is managed. The audience was a mixture of lawyers, legal information managers, vendors and academics, and the talks came from those who are planning legal knowledge systems or implementing them. I also particularly enjoyed hearing from Adam Wyner from Liverpool University who is modelling legal arguments in software, using open source text analysis. You can see some of the key points I picked up on our Twitter feed.

What became clear to me during the afternoon is that search technology is not currently serving the needs of lawyers or law firms. The users want a simple Google-like interface (or think they do), the software has trouble presenting results in context, and the source data is large, complex and unwieldy. The software in use comes from some of the biggest commercial search vendors: law firms seem to ‘follow the pack’ when selecting a vendor, and unfortunately few of the large firms seem to have even considered credible open source alternatives such as Lucene/Solr or Xapian.

In many cases taxonomies were presented as the solution – make sure every document fits tidily into a hierarchy and all the search problems go away, as lawyers can simply navigate to what they need. All very simple in theory – however, each big law firm and each big legal information publisher has its own idea of what this taxonomy should be.

After the final presentation I argued that this seemed to be a classic case where an open source model could help. If a firm or publisher were prepared to create an open source legal taxonomy (and to be fair, we’re only talking about 5000 entries or so – this wouldn’t be a very big structure) and let it be developed and improved collaboratively, three things would follow: they would themselves benefit from others’ experience, the transfer of legal data between repositories would be easier, and even the search vendors might learn a little about how lawyers actually want to search. The original creators would be seen as thought-leaders and could even license the taxonomy so it could not be rebadged and passed off as original by another firm or publisher.

However, my plea fell on stony ground: law firms seem to think that their own taxonomies have inherent value (and thus should never be let outside the company) and they regard the open source model with suspicion. Perhaps legal search will remain broken for the time being.

Posted in events

November 11th, 2010

More about LucidWorks Enterprise

If you’re considering a Lucene/Solr powered search solution, you may be interested in LucidWorks Enterprise, produced by our partners Lucid Imagination. They’ve taken Lucene/Solr and added a powerful admin GUI, REST API, web spiders, file crawlers, database connectors, alerts, a clickthrough framework and more. All this comes with a range of excellent support options backed by the experts at Lucid.

If you’d like to know more, read this downloadable PDF or contact us for more information and a demo.

Posted in Reference

November 5th, 2010

Further revolutions

Back for the second day of Lucene Revolution, with some great talks on migrating to Solr from FAST ESP and on the new flexible indexing features coming to Lucene ‘real soon now’, finishing off with a panel discussion. I felt privileged to sit as part of this panel between Eric Gries, CEO of Lucid Imagination, and Paul Doscher of Exalead – the discussion was lively and (I hope!) interesting to the audience.

I’m looking forward to returning to the UK with all I’ve learnt from this event, and to following up on some of the ideas generated – for example, it would be great to be able to demonstrate LucidWorks Enterprise to interested parties in London.

Thanks to Stephen Arnold’s team and all at Lucid Imagination for organising such a great conference. It won’t be the last I’m sure!

Posted in events

October 8th, 2010

A revolution indeed

I’m at the Lucene Revolution conference in Boston, USA for the next few days – and it’s aptly named. If there’s anyone out there who still doubts that open source search is a serious alternative to a commercial engine, the numbers and other information coming out of this event will be convincing. Twitter are now using Lucene to handle a billion queries a day; LinkedIn and Salesforce.com are already veterans with similarly huge installations. The conversations I’m having and overhearing are about billions of documents and tens of thousands of users, all easily handled by open source search.

The other big news here is that Lucid Imagination have released software to fill in most if not all of the gaps between Lucene/Solr and the closed-source competition – it’s called LucidWorks Enterprise and adds a detailed administration UI, a REST API, crawlers, scaling functionality and much more. I’m looking forward to getting my hands on a demo and showing it off when back in the UK.

There’s an optimistic, buzzing energy at this event – a real feeling that we’re here at the beginning of something big. More revolutionary news to come!

Posted in News, events

October 7th, 2010

Flax partners with Lucid Imagination

We’re very happy to announce that we’ve been selected as an Authorized Partner by Lucid Imagination, the commercial company for Lucene and Solr. You can read the press release as a PDF here.

Apache Lucene and Solr, available as open source software from the Apache Software Foundation, are powerful, scalable, reliable and fully-featured search technologies. Solr is the Lucene Search Server, making it easy to build search applications for the enterprise.

With our long experience of customising, installing and supporting open source search engines, this partnership is a natural fit for us, and we’re excited by the opportunities it presents. In addition to our current offerings, Flax will now offer installation, integration and commercial support packages for Lucene and Solr, backed by Lucid Imagination.

Posted in Business, News

October 4th, 2010

Autumn events

Autumn seems to be conference season: first is the Lucene Revolution event in Boston, USA from October 7th-8th, where I’ll be on the closing panel, titled “Data Crossroads – At The Intersection Of Search And Open Source”.

Next is the British Computer Society’s Search Solutions 2010 in London on October 21st, where I’m giving a presentation titled “What’s the story with open source? – Searching and monitoring news media with open-source technology”.

Both events feature a wide range of other speakers from organisations such as Cisco, LinkedIn, Twitter, Google and Microsoft.

Posted in events

September 10th, 2010

Open source search engines and programming languages

So you’re writing a search-related application in your favourite language, and you’ve decided to choose an open source search engine to power it. So far, so good – but how are the two going to communicate?

Let’s look at two engines, Xapian and Lucene, and compare how this might be done. Lucene is written in Java, Xapian in C++ – so if you’re using those languages respectively, everything should be relatively simple – just download the source code and get on with it. However, if this isn’t the case, you’re going to have to work out how to interface to the engine.

The Lucene project has been ported to several other languages: for C there’s Lucy (which includes Perl and Ruby bindings), and there’s even a .NET version called, not surprisingly, Lucene.NET. For Python there’s PyLucene, which wraps the Java library rather than reimplementing it. Some of these ‘ports’ of Lucene are ‘looser’ than others (i.e. they may not share the same API or feature set), and they may not be updated as often as Lucene itself. There are also versions in Perl, Ruby, Delphi and even Lisp (scary!) – there’s a full list available. Not all are currently active projects.

Xapian takes a different approach, with only one core project but a sheaf of bindings to other languages. Currently these bindings cover C#, Java, Perl, PHP, Python, Ruby and Tcl – but interestingly these are auto-generated using the Simplified Wrapper and Interface Generator, or SWIG. This means that every time Xapian’s API changes, the bindings can easily be updated to reflect this (it’s actually not quite that simple, but SWIG copes with the vast majority of code that would otherwise have to be manually edited). SWIG supports other languages as well (according to the SWIG website, “Common Lisp (CLISP, Allegro CL, CFFI, UFFI), Lua, Modula-3, OCAML, Octave and R. Also several interpreted and compiled Scheme implementations (Guile, MzScheme, Chicken)”), so in theory bindings to these could also be built relatively easily.

There’s also another way to communicate with both engines: using a search server. Solr is the search server for Lucene, whereas for Xapian there is Flax Search Service. In this case, any language that supports Web Services (you’d be hard pressed to find a modern language that doesn’t) can communicate with the engine, simply passing data over HTTP.
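To make the search server route concrete, here is a minimal sketch in Python of querying Solr over HTTP using only the standard library. The base URL, the field name in the example query and the helper names are assumptions for illustration; it relies on Solr’s standard select handler with `wt=json` and the usual `response`/`docs` layout of the JSON reply.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, query, rows=10):
    """Build a URL for Solr's select handler, asking for a JSON reply."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/select?{params}"


def parse_docs(body):
    """Pull the list of matching documents out of a Solr JSON reply."""
    return json.loads(body)["response"]["docs"]


def search(base_url, query, rows=10):
    """Send the query over HTTP and return the matching documents.

    Needs a running Solr instance at base_url (hypothetical here).
    """
    with urlopen(build_select_url(base_url, query, rows)) as resp:
        return parse_docs(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical local server and field name; no request is sent here.
    print(build_select_url("http://localhost:8983/solr", "title:lucene", rows=5))
```

Nothing here is specific to Python: the same request could be made from any language with an HTTP client and a JSON parser, which is the attraction of the server approach.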

Posted in Technical

September 3rd, 2010

Revolutions and interviews

This October I’ve been invited to speak at Lucene Revolution, a conference on open source search to be held in Boston, USA. I’ll be part of the closing panel on October 8th, together with speakers from Lucid Imagination and Exalead. It looks like a very interesting event, with speakers from IBM, Cisco, LinkedIn and the Smithsonian.

As part of the run-up to the conference Stephen Arnold has interviewed me – we discussed the wider picture of open source search, why a strong community is important and why flexibility can be the key to successful integration.

Posted in events

September 1st, 2010

Open Source Search Event

We sponsored Open Source Search Cambridge last week, which went very well, with attendees from as far away as Tokyo and New Zealand, a great variety of talks, presentations and networking, and some excellent food!

Shane Evans from mydeco gave a detailed talk on Creating a product search engine, with some interesting details on how query-independent weights are calculated. He was followed by Olly Betts on How Gmane is implemented using Xapian – 72 million messages indexed on a single server! We also had talks from those involved with the Cheshire3 XML search engine and with PuppyIR, a project to develop search frameworks for children, and found out more about how Glasses Direct have implemented their search using Solr.

The afternoon consisted of a number of well-attended seminars on search topics, such as comparisons of the various open source search engines available. The day ended with informal networking in a nearby pub.

Based on the feedback we got, there’s definitely interest in a similar event next year – watch this space.

Update: sounds like Search Solutions 2009 was also a good day.

Posted in events

October 6th, 2009

Open Source Search event in Cambridge on 29th September

We’re sponsoring a one-day event on open source search – details here, with more to be announced soon. Hope some of you can make it!

Posted in News

July 27th, 2009
