London Lucene/Solr Usergroup – website search and indexing the cloud

This week’s London Lucene/Solr Meetup was hosted by asset management company BlackRock, who also provided our first speakers. BlackRock manages an astonishing $4.7 trillion in assets (that’s more than the GDP of Germany) and operates 90 different websites with around 250,000 content items, so a good and accurate website search engine is essential. Although BlackRock use HP Autonomy’s content management system and IDOL search engine, the latter is hard to tune (‘not deterministic, and why it ranks the way it does can be mysterious’), so Ife Nkechukwu and Erica Sundberg have been investigating Apache Solr as an alternative: being open source and having powerful debugging features, Solr allows a complete understanding of why a particular result is scored and ranked as it is.
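For readers who have not tried Solr’s debugging features, here is a minimal sketch of how a score explanation might be requested. It only builds the request URL; `debugQuery` is a standard Solr query parameter, but the host, the collection name `websites` and the field name `title` are assumptions made for illustration, not details from the talk.

```python
from urllib.parse import urlencode


def build_debug_query_url(base_url, collection, query, rows=10):
    """Build a Solr select URL that asks for a scoring explanation.

    base_url and collection are placeholders (assumptions); debugQuery=true
    asks Solr to include debug output, including an 'explain' entry per hit.
    """
    params = {
        "q": query,
        "rows": rows,
        "fl": "id,score",      # return the document id and its score
        "debugQuery": "true",  # include the score explanation in the response
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical local Solr instance and collection name.
url = build_debug_query_url("http://localhost:8983/solr", "websites", "title:funds")
print(url)
```

Fetching that URL from a running Solr instance should return the usual results plus a debug section, which is where the per-document breakdown of the score lives.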

Starting with this great video (it’s from Google not BlackRock, but amusing and worth a look), Ife and Erica gave an engaging and clear presentation of their journey with Solr: how they explored the various options for crawling (Nutch and Heritrix were mentioned), how Analyzers are used to condition content for indexing and how Solr scoring and ranking are actually calculated. This was one of the best ‘how to get started with Solr’ presentations I have seen and I was also very pleased to hear Ife say ‘you can’t just build search and forget it – you have to tune search like an instrument’ – entirely consistent with our own experience.
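To give a flavour of those two ideas, here is a rough sketch in plain Python rather than Solr itself. It assumes a toy analysis chain (lowercase, split on non-alphanumerics, drop a few stopwords) and a simplified TF-IDF-style score; Solr’s real scoring also applies length normalisation, boosts and other factors, and none of the names or sample documents below come from the presentation.

```python
import math
import re

# Illustrative stopword list only; real analyzers are configured per field.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}


def analyze(text):
    """Toy analysis chain: lowercase, tokenize on non-alphanumerics, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tf_idf_score(query, doc_id, docs):
    """Simplified TF-IDF-style score of one document for a query.

    tf  = sqrt(term frequency in the document)
    idf = 1 + ln(num_docs / (doc_freq + 1))
    This is a sketch: it omits length normalisation, boosts and query norms.
    """
    analyzed = {d: analyze(text) for d, text in docs.items()}
    num_docs = len(analyzed)
    score = 0.0
    for term in analyze(query):
        freq = analyzed[doc_id].count(term)
        if freq == 0:
            continue  # term does not occur in this document
        doc_freq = sum(1 for tokens in analyzed.values() if term in tokens)
        tf = math.sqrt(freq)
        idf = 1.0 + math.log(num_docs / (doc_freq + 1))
        score += tf * idf
    return score


# Hypothetical content items, invented for the example.
docs = {
    "fund-page": "The Global Equity Fund invests in global equities",
    "contact": "Contact the fund team",
    "about": "About the company",
}

for doc_id in docs:
    print(doc_id, round(tf_idf_score("global fund", doc_id, docs), 3))
```

The same analysis is applied to both the documents and the query, which is why ‘Global’ in the page text matches ‘global’ in the query; rarer terms contribute more to the score than common ones.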

After a quick pizza break, Jim Liddle of Storage Made Easy was next up. Jim’s company provides appliances that connect to a myriad of cloud storage systems and provide a number of services (collaboration, sharing, governance, search) accessible via any computing or mobile device. Jim told us how they’d integrated Solr into their system to provide deep content search and filtering. Interestingly, Storage Made Easy chose Solr over Elasticsearch because they are ‘not quite sure where Elastic will end up in terms of commercials’ – even though Jim worked with Shay Banon (creator of Elasticsearch) at Gigaspaces. You can see Jim’s slides here where he explains how the hardest task was indexing permissions data. I was particularly interested in the ‘visual query builder’ they had developed for clients with very complex search requirements – this chimed with our own experience of working with complex media monitoring queries.

We finished with a Solr Q&A (Upayavira was kind enough to provide many of the answers) – BlackRock had kindly provided a prize for the best question (a mini quadcopter) – our winner was very happy! Thanks again to our hosts and presenters and I look forward to seeing you all again soon.
