Posts Tagged ‘lucene’

Revolutions and interviews

This October I’ve been invited to speak at Lucene Revolution, a conference on open source search to be held in Boston, USA. I’ll be part of the closing panel on October 8th, together with speakers from Lucid Imagination and Exalead. It looks like a very interesting event, with speakers from IBM, Cisco, LinkedIn and the Smithsonian.

As part of the run-up to the conference Stephen Arnold has interviewed me – we discussed the wider picture of open source search, why a strong community is important and why flexibility can be the key to successful integration.

Posted in events

September 1st, 2010

Open Source Search Event

We sponsored Open Source Search Cambridge last week, which went very well, with attendees from as far away as Tokyo and New Zealand, a great variety of talks, presentations and networking, and some excellent food!

Shane Evans from mydeco gave a detailed talk on Creating a product search engine, with some interesting details on how query-independent weights are calculated. He was followed by Olly Betts on How Gmane is implemented using Xapian – 72 million messages indexed on a single server! We also had talks from those involved with the Cheshire3 XML search engine and with PuppyIR, a project to develop search frameworks for children, and found out more about how Glasses Direct have implemented their search using Solr.

The afternoon consisted of a number of well-attended seminars on search topics, such as comparisons of the various open source search engines available. The day ended with informal networking in a nearby pub.

Based on the feedback we got, there’s definitely interest in a similar event next year – watch this space.

Update: sounds like Search Solutions 2009 was also a good day.

Posted in events

October 6th, 2009

Open Source Search event in Cambridge on 29th September

We’re sponsoring a one-day event on open source search – details here, with more to be announced soon. Hope some of you can make it!

Posted in News

July 27th, 2009

Whitepaper on enterprise search

Our technical partners Cognidox have released a whitepaper detailing their view of the enterprise search market, titled “Why you can’t just ‘Google’ for Enterprise Knowledge” – it’s well worth a read. You can download the PDF from their archive.

Posted in News

July 13th, 2009

Xapian compared

Vik Singh has been comparing various open source solutions for search. He only spent a weekend performing the comparison, which is probably not enough time to get any search software performing at its best, and his results reflect this. Xapian was marked down for being slow at indexing (he says 5x slower than SQLite – but then again, SQLite isn’t a search engine, it’s an RDBMS, and really isn’t suitable for search applications) and for producing large index files, much bigger than Lucene’s.

The reason for this is that Xapian stores different information to Lucene. For example, the full term list (un-inverted index) is retained, which makes it possible to do relevance feedback. Also, Lucene handles deletes by maintaining a separate list of deleted documents, which is merged at the next optimise step – which means that the internal statistics are wrong until this point, and that updates can be more complicated, as an updated document needs a new ID.
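
To make the point about stale statistics concrete, here is a toy Python model of delete-by-marking. It is not Lucene’s actual code, and the class and method names are invented for illustration. Deleted documents are recorded in a separate set and the postings are only rewritten at an explicit merge step, so the document frequency reported for a term stays too high until then, and an update hands the document a new ID.

```python
class ToyIndex:
    """Toy inverted index with tombstone-style deletes (illustrative only)."""

    def __init__(self):
        self.postings = {}      # term -> set of doc ids containing it
        self.deleted = set()    # doc ids marked deleted but not yet purged
        self.next_id = 0

    def add(self, text):
        doc_id = self.next_id
        self.next_id += 1
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)
        return doc_id

    def delete(self, doc_id):
        # Cheap: just remember the id; the postings are left untouched.
        self.deleted.add(doc_id)

    def update(self, doc_id, text):
        # An update is a delete plus an add, so the document gets a new id.
        self.delete(doc_id)
        return self.add(text)

    def doc_freq(self, term):
        # Read straight from the postings, so this still counts
        # deleted documents until merge() has run.
        return len(self.postings.get(term, ()))

    def search(self, term):
        # Results filter out the marked ids at query time.
        return sorted(self.postings.get(term, set()) - self.deleted)

    def merge(self):
        # Purge deleted ids from every postings list and drop empty terms.
        for term in list(self.postings):
            self.postings[term] -= self.deleted
            if not self.postings[term]:
                del self.postings[term]
        self.deleted.clear()


index = ToyIndex()
first = index.add("open source search")
index.add("search engine comparison")
second = index.update(first, "open source software")

stale_df = index.doc_freq("search")   # still includes the deleted document
hits = index.search("search")         # the deleted document is filtered out
index.merge()
fresh_df = index.doc_freq("search")   # correct again after the merge
```

In this sketch the search results are right all along, but any ranking that relies on `doc_freq` would be working from skewed numbers between the delete and the merge.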

Neither approach is wrong and both have advantages – Lucene certainly has smaller index files. Some judicious use of the XAPIAN_FLUSH_THRESHOLD parameter, as suggested in some of the comments on the article, would certainly have sped up Xapian indexing. We can also look forward to the release of the new Xapian ‘Chert’ backend, which will produce indexes at least 50% smaller than the current ‘Flint’ backend. It’s also hard to say how important index sizes are in these days of cheap storage.
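
As a rough sketch of the tuning suggested in those comments: XAPIAN_FLUSH_THRESHOLD is an environment variable that controls how many document changes Xapian buffers before flushing them to disk, so it needs to be set in the environment of the indexing process before indexing starts. The helper below only builds such an environment. The function name and the value 100000 are placeholders for illustration, not a recommendation; the right figure depends on your documents and available memory.

```python
import os


def indexing_env(flush_threshold, base_env=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set."""
    if flush_threshold <= 0:
        raise ValueError("flush threshold must be a positive number of documents")
    env = dict(os.environ if base_env is None else base_env)
    # Environment variables are strings, so convert the count.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(flush_threshold)
    return env


# Example with an explicit base environment (placeholder values).
env = indexing_env(100000, base_env={"PATH": "/usr/bin"})
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching whatever indexing script you use.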

On the search side, Xapian performed comparably to Lucene in terms of relevance and search speed (both were ahead of all the other solutions on these metrics, especially SQLite). There are some other metrics he quoted, such as a ‘support’ figure, given as a score out of 5, which he admits is entirely subjective – you’d have to ask our customers about that one! There’s also no comparison of features, ease of integration and scalability to very large collections.

We’ve talked before about performance metrics. Vik should be applauded for his article and for releasing his test framework as open source; hopefully this can be a foundation for some more in-depth studies.

More on performance metrics

Anurag Goel recently carried out a comparative test of Xapian/Flax and Lucene/Solr. Some interesting results here: it seems Lucene is faster at building indexes, but Xapian is faster and possibly more accurate at searching. We can expect some further speed improvements over the next few months as a new, more compact backend to Xapian is released.

By the way, the article mentions Xappy: this is a Python interface to Xapian that is a major part of our Flax enterprise search platform. You can get Xappy here.

Posted in Technical

March 13th, 2009
