If you’re planning an enterprise search project and have no background in the technologies or principles involved, here are some tips to get you started. This isn’t going to be a definitive list so if you know more, please do comment.
There haven’t been a lot of books written in this area over the years, but more are appearing now (especially on open source options). Managing Gigabytes is a good, if slightly elderly, starting point on basic principles. For thoughts on search user interfaces try Peter Morville’s Search Patterns, and for an application focus there’s the recent Search Based Applications. For those developing in the Lucene/Solr world there’s the classic (and recently updated) Lucene in Action, plus the related Solr 1.4 Enterprise Search Server and Building Search Applications: Lucene, LingPipe, and Gate.
Most people will (of course) start their research on the web, although sometimes it’s hard to find nuggets of real information amongst all the marketing. Wikipedia has a list of vendors, including open source solutions, and Avi Rappaport maintains the useful (although not completely up to date) Search Tools website. Some vendors and some open source projects provide FAQs and tutorials (for example the Lucene FAQ, Xapian and Sphinx documentation), which may also contain general information about search principles.
You might also consider joining discussion groups such as the popular LinkedIn Enterprise Search Engine Professionals or a local Meetup group. Training is another option – offered by some vendors and open source companies such as ourselves.
Last night I went to another excellent Enterprise Search London Meetup, at Skinkers near London Bridge. I’d been at the Online show all day, which was rather tiring, so it was great to sit down with beer and nibbles and hear some excellent speakers.
Max Wilson kicked off with a talk on exploratory search and ‘searching for leisure’. His Search Interface Inspector looks like a fascinating resource, and we heard about how he and his team have been constructing a taxonomy for the different kinds of search people do, using Twitter as a data source.
Martina Schell was next with details of Travel Match, a holiday search engine that’s trying to do for holidays what our customer Mydeco is doing for interior design: scrape/feed/gather as much holiday data as you can, put it all into a powerful search engine and build innovative interfaces on top. They’ve tried various interfaces including a ‘visual search’, but after much user testing have reined back their ambitions somewhat – however they’re still unique in allowing some very complex queries of their data. Interestingly, one challenge they identified is how to inform users that one choice (say, airport to fly from) may affect the available range of other choices (say, destinations) – apparently users often click repeatedly on ‘greyed-out’ options, unsure as to why they’re not working…
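To make the ‘greyed-out options’ problem a little more concrete, here is a minimal Python sketch of how a faceted interface might work out which values are still available given the user’s other choices, and say why a particular value is disabled. This is not how Travel Match does it (the talk didn’t go into that level of detail); the sample data and the function names available_values and explain_disabled are invented purely for illustration.

```python
from typing import Dict, List, Set

Record = Dict[str, str]

# Hypothetical sample data: each holiday is a flat record of facet -> value.
HOLIDAYS: List[Record] = [
    {"airport": "Gatwick", "destination": "Malaga"},
    {"airport": "Gatwick", "destination": "Faro"},
    {"airport": "Manchester", "destination": "Malaga"},
    {"airport": "Manchester", "destination": "Corfu"},
]


def matches(record: Record, selections: Record) -> bool:
    """True if the record agrees with every selected facet value."""
    return all(record.get(facet) == value for facet, value in selections.items())


def available_values(records: List[Record], selections: Record, facet: str) -> Set[str]:
    """Values of `facet` that still return results, given the OTHER selections.

    The facet's own current selection is ignored, so the user can always
    switch between values within the same facet.
    """
    others = {f: v for f, v in selections.items() if f != facet}
    return {r[facet] for r in records if facet in r and matches(r, others)}


def explain_disabled(records: List[Record], selections: Record, facet: str, value: str) -> str:
    """Return a short reason why `value` is greyed out, or '' if it is available."""
    if value in available_values(records, selections, facet):
        return ""
    # A selection is a 'blocker' if it never occurs together with `value`.
    blockers = [
        f"{f} = {v}"
        for f, v in selections.items()
        if f != facet
        and not any(r.get(facet) == value and r.get(f) == v for r in records)
    ]
    if not blockers:
        # No single choice rules it out; only the combination does.
        return f"{value} is unavailable with your current combination of choices"
    return f"{value} is unavailable because of: " + ", ".join(blockers)


if __name__ == "__main__":
    chosen = {"airport": "Gatwick"}
    print(sorted(available_values(HOLIDAYS, chosen, "destination")))
    print(explain_disabled(HOLIDAYS, chosen, "destination", "Corfu"))
```

The point of the second function is simply that the interface already knows which earlier choice ruled an option out, so it can tell the user (in a tooltip, say) rather than leaving them clicking on a dead control. A real search engine would compute these counts from its index rather than by scanning a list.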
The inimitable Stephen Arnold concluded the evening with a realistic treatment of the current fashion for ‘real-time’ search. His point was that unless you’re Google, with their fibre-connected, hardware-accelerated gigascale architecture, you’re not going to be able to do real-time web search or anything close to it. On a smaller scale, for financial trading, military and other serious applications, you again need to rely on the hardware. So for proper real-time (that means very close to zero latency), it is your engineering capability, not your software capability, that counts. I’m inclined to agree – I trained as an electronic engineer and worked on digital audio, back when this was also only possible with clever hardware design. Of course, eventually the commodity hardware gets fast enough to move away from specialised devices, and at this point even the laziest coder can create responsive systems, but we’re far away from that point. Perhaps the marketing departments of some search companies should take note – if you say you can do real-time indexing, we’re not going to believe you.
Thanks again to Tyler Tate and all at TwigKit for continuing to organise and support this excellent event.
Peter Morville has created a Flickr collection of ‘search patterns’, showing the different kinds of search interfaces available. I can highly recommend you take a look if you’d like some good examples of clustering, faceted navigation, auto-suggest and interfaces for certain sectors such as e-commerce. We often find these concepts difficult to explain to customers without some real-world examples.