Questions to ask your search vendor

#1 - How does it work? You'll probably get as many different answers to this as there are vendors - but you may not get the whole truth. Bear in mind that a lot of search engines share the theoretical ideas they apply. An engine might use a vector-space or probabilistic model for ordering results, for example. Most will create an Continue reading
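To make the 'vector-space model' mentioned above a little more concrete, here is a minimal Python sketch of one common textbook form of it: TF-IDF weights compared by cosine similarity. It is not how any particular vendor's engine works, and the function names (tf_idf_vectors, cosine, rank) and the whitespace tokenisation are our own assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokeniser (assumption)
    n = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency, so common terms still count a little.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the collection get zero weight.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scores = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    docs = ["open source search engine", "cooking with garlic", "search search search"]
    for score, index in rank("search engine", docs):
        print(round(score, 3), docs[index])
```

A probabilistic model would replace the weighting and scoring functions, but the overall shape (score every candidate document against the query, then sort) stays the same.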

Xapian 1.2.0 arrives

Xapian 1.2.0, the first of a new 'stable' release series, was announced a few weeks ago and we've just uploaded pre-built binaries for Windows and associated build files. You can find them on our Xapian downloads page. This version features a new, faster, more compact database format and enhanced backwards compatibility with existing databases; a built-in replication system (so in a distributed system you only need to propagate the changes to a Xap...Continue reading

Online Information 2009, day 3

Back at Online 2009 on Thursday, to take part in the closing panel: "Cloud Computing, Open Source and Semantics: Content and Search Predictions", moderated by Stephen Arnold. We only touched on four of the ten controversial themes Stephen had prepared: we talked a lot about how 'Google pressure' will affect the market, how XML isn't necessarily the universal panacea for representing data, on the growth of rich m...Continue reading

Xapian compared

Vik Singh has been comparing various open source solutions for search. He only spent a weekend performing the comparison, which is probably not enough time to get any search software performing at its best, and his results reflect this. Xapian was marked down for being slow at indexing (he says 5x slower than SQLite - but then again, SQLite isn't a search engine, it's an RDBMS, and...Continue reading

Distributed search and partition functions

For most applications, Xapian/Flax's search performance will be excellent to acceptable on a single machine of reasonable spec (see here for a discussion of CPU and RAM requirements). However, if the document corpus is unusually large - more than about 20 million items - then one server may not be enough for acceptable speed. Xapian provides a mechanism called remote backends which lets the load be shared ov...Continue reading
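As a rough illustration of what a partition function does when a corpus is split across several servers, here is a minimal Python sketch. It does not use Xapian's remote backend API, and it is not necessarily the scheme the full post describes: the names partition and assign_to_shards, the string document IDs and the choice of CRC32 are all assumptions made for the example.

```python
import zlib


def partition(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number in the range 0..num_shards-1.

    CRC32 is used because it is stable across processes and machines,
    unlike Python's built-in hash() for strings.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def assign_to_shards(doc_ids, num_shards):
    """Group document IDs by the shard that would index them."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[partition(doc_id, num_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    ids = [f"doc-{n}" for n in range(10)]  # hypothetical document IDs
    for shard_number, members in enumerate(assign_to_shards(ids, 3)):
        print(shard_number, members)
```

Because the function depends only on the document ID, the indexer and anything that later updates or deletes a document will agree on which server holds it. The trade-off with a simple modulo scheme is that changing the number of shards moves most documents to a different server.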

More on performance metrics

Anurag Goel recently carried out a comparative test of Xapian/Flax and Lucene/Solr. Some interesting results here: it seems Lucene is faster at building indexes, but Xapian is faster and possibly more accurate at searching. We can expect some further speed improvements over the next few months as a new, more compact backend to Xapian is released. By the way, the article mentions Xappy: this is a Python interface to Xapian that is a major part of our Flax enterprise search platform. You can ge...Continue reading

Performance metrics

Stephen Arnold recently posted some rather impressive performance figures for Autonomy's IDOL search engine. This kind of data is all very well, but without independent testing and more detail it's hard to know how these figures apply to the real world. So here's an idea. Why not create an openly available collection of test data, a set of searches and a set of conditions, then compare the performance of the various av...Continue reading