Search requirements and asking the right questions

When potential clients contact us, we have to gather as much information as possible about how and why they need search technology. This takes the form of either a face-to-face or telephone meeting with much scribbling in notebooks, or a long exchange of emails. In every case there are some important questions that must be answered, and I thought it might be useful to list the most common ones here.

How many items do you need to search? The number of items to search varies w...

More on performance metrics

Anurag Goel recently carried out a comparative test of Xapian/Flax and Lucene/Solr. The results are interesting: Lucene appears to be faster at building indexes, while Xapian is faster, and possibly more accurate, at searching. We can expect further speed improvements over the next few months as a new, more compact Xapian backend is released. By the way, the article mentions Xappy: this is a Python interface to Xapian that is a ma...
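The excerpt doesn't say how accuracy was judged. One common way to put a number on it, sketched here purely as an illustration, is precision at k: the fraction of an engine's top k results that human assessors marked as relevant for a query. The function name, document IDs and relevance judgments below are hypothetical, not taken from the comparison.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked results that were judged relevant.

    Divides by k even if fewer than k results were returned, so an engine
    is not rewarded for returning very short result lists.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical relevance judgments and ranked results from two engines
# for the same query.
relevant = {"d1", "d4", "d7"}
engine_a = ["d1", "d2", "d4", "d9", "d7"]
engine_b = ["d3", "d1", "d5", "d6", "d8"]

print(precision_at_k(engine_a, relevant, k=5))  # 0.6
print(precision_at_k(engine_b, relevant, k=5))  # 0.2
```

Averaging this over many queries gives a single accuracy figure per engine, which can sit alongside indexing and search times in a comparison.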

Image searching

Searching images is a difficult problem, and it's not a feature offered by many commercial search engines. Some cheat slightly by indexing the image's title or filename, or the text surrounding an image embedded in a page, and calling this 'image search'. This method doesn't work very well, especially for a standalone image called 'IMG0000064.jpg' that is actually a picture of an apple. We've seen some good demos of actual image search - I...
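To see why the filename-and-surrounding-text shortcut falls down, here is a minimal sketch of it: each image is indexed only by the words in its filename and nearby text, and a query matches images whose metadata contains every query term. The function names and sample images are hypothetical; no real engine's API is implied.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_metadata_index(images):
    """Map each token in an image's filename and surrounding text to filenames."""
    index = defaultdict(set)
    for filename, surrounding_text in images:
        for token in tokenize(filename) + tokenize(surrounding_text):
            index[token].add(filename)
    return index


def search(index, query):
    """Return the images whose metadata contains every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the intersections don't modify the index.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


images = [
    ("red_apple.jpg", "A red apple on a table"),
    # Actually a picture of an apple, but nothing in its metadata says so.
    ("IMG0000064.jpg", ""),
]
index = build_metadata_index(images)
print(sorted(search(index, "apple")))  # ['red_apple.jpg']
```

The apple photo named 'IMG0000064.jpg' is invisible to an "apple" query because the index never looks at the pixels; only techniques that analyse the image content itself can find it.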

Performance metrics

Stephen Arnold recently posted some rather impressive performance figures for Autonomy's IDOL search engine. This kind of data is all very well, but without independent testing and more detail it's hard to know how these figures apply to the real world. So here's an idea: why not create an openly available collection of test data, a set of searches and a set of conditions, then compare the performance of the various av...
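A shared benchmark along these lines could be driven by a small harness that runs every engine against the same documents and queries. The sketch below assumes each engine is wrapped in an adapter exposing hypothetical `index(docs)` and `search(query)` methods; `ToyEngine` is an in-memory stand-in, not the API of Xapian, Lucene, IDOL or any other real engine.

```python
import statistics
import time


class ToyEngine:
    """Hypothetical stand-in; a real comparison would wrap each engine in an adapter."""

    def __init__(self):
        self.postings = {}

    def index(self, docs):
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.postings.setdefault(term, set()).add(doc_id)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def benchmark(engine, docs, queries, repeats=5):
    """Time index building once, then each query `repeats` times (seconds)."""
    if not queries or repeats < 1:
        raise ValueError("need at least one query and one repeat")

    start = time.perf_counter()
    engine.index(docs)
    index_seconds = time.perf_counter() - start

    latencies = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            engine.search(query)
            latencies.append(time.perf_counter() - start)

    return {
        "index_seconds": index_seconds,
        "median_query_seconds": statistics.median(latencies),
        "max_query_seconds": max(latencies),
    }


docs = {i: f"document {i} about search engine number {i % 7}" for i in range(10_000)}
queries = ["search engine", "document", "number 3"]
print(benchmark(ToyEngine(), docs, queries))
```

Reporting the median alongside the maximum keeps one slow outlier from hiding typical behaviour (or vice versa). Publishing the documents, queries and harness together is what would let anyone rerun the numbers on their own hardware.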