Four of the Flax team are in Dublin this week for Lucene Revolution, almost certainly the largest event centred on open source search and specifically Lucene. There are probably a couple of hundred Lucene enthusiasts here and the event is being held at the Aviva Stadium on Landsdowne Road: look out the windows and you can see the pitch! Here are some personal reflections: a number of the talks I attended today have a connection to our own work in media monitoring which we’re talking about tomorrow.
Doug Turnbull’s Test Driven Relevancy was interesting, discussing OSC’s Quepid tool that allows content owners and search experts to work together to tweak and tune Solr’s options to present the right results for a query. I wondered whether this tool might eventually be used to develop a Learning to Rank option for Solr, as Lucene 4 now supports a pluggable scoring model.
I enjoyed Real-Time Inverted Search in the Cloud Using Lucene and Storm during which Joshua Conlin told us about running hundreds of thousands of stored queries in a distrubuted architecture. Storm in particular sounds worth investigating further. There is currently no attempt to reduce or ‘prune’ the set of queries before applying them: Joshua quoted speeds of 4000 queries/sec across their cluster of 8 instances: impressive numbers, but our own monitoring applications are working at 20 times that speed by working out which queries not to apply.
I broke out at this point to catch up with some contacts, including the redoubtable Iain Fletcher of Search Technologies – always a pleasure. After a sandwich lunch I went along to hear Andrzej Bialecki of Lucidworks talk about Sidecar Indexes, a method for allowing rapid updates to Lucene fields. This reminded me of our own experiments in this area using Lucene’s pluggable codecs.
Next was more from the Opensource Connections team, as John Berryman talked about their work to update a patent search application that uses a very old search syntax, BRS. This sounds very much the work we’ve done to translate one search engine syntax into another for various media monitoring companies – so far we can handle dtSearch and we’re currently finishing off support for HP/Autonomy Verity’s VQL (PDF).
This latter issue has got me thinking that perhaps it might be possible to collaboratively develop an open source search engine query language – various parsers could be developed to turn other search syntaxes into this language, and search engines like Lucene (or anything else) could then be extended to implement support for it. This would potentially allow much easier migration between search engine technologies. I’m discussing the concept with various folks at the event this week so do please get in touch if you are interested!
Back tomorrow with a further update on this exciting conference – tonight we’re all off to the Temple Bar area of Dublin for food and drink, generously provided by Lucidworks who should also be thanked for organising the Revolution.