Back to the very impressive Bloomberg lecture theatre for this month’s Lucene/Solr Meetup, with a good turnout (I’m guessing 60-70 people). Our first talk came from Diego Ceccarelli of Bloomberg on how his team have created a Solr implementation of Learning to Rank, an improved way to rank search results using machine learning. Diego first took us through the basics of Lucene’s ranking methods, based on the venerable TF/IDF algorithm (although note that BM25 will be the default very soon). Bloomberg’s implementation first retrieves 1000 search results using standard TF/IDF (which is fast) and then extracts ‘features’ from each result (a simple example might be ‘does the title match the search query?’), which are then fed to a machine learning model. This model is used to re-rank the 1000 initial results, and the top 10 are supplied to the user. Interestingly, they have chosen to implement the features as Lucene queries, allowing for easy re-use. Initial tests have shown some metrics such as ‘clicks on the first result’ up by 10%, which is encouraging. There is now a Solr patch (SOLR-8542) which they hope to commit to Solr soon, and you can find slides and a video of a previous presentation on this topic online. I first heard about Learning to Rank from Microsoft Research some years ago and it’s great to see an open source implementation.
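To make the two-stage idea concrete, here is a minimal Python sketch of 'retrieve cheaply, then re-rank with a model'. It is not Bloomberg's code and does not use the Solr patch's API: the `Doc` class, the term-counting `first_pass_score` (a crude stand-in for TF/IDF), the two features and the hand-picked `WEIGHTS` (a stand-in for a trained model) are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    title: str
    body: str


def first_pass_score(query, doc):
    """Cheap stand-in for TF/IDF: count occurrences of query terms in the body."""
    terms = query.lower().split()
    words = doc.body.lower().split()
    return float(sum(words.count(term) for term in terms))


def extract_features(query, doc):
    """Build a feature vector: [first-pass score, does the title match the query?]."""
    title_match = 1.0 if query.lower() in doc.title.lower() else 0.0
    return [first_pass_score(query, doc), title_match]


# Hypothetical weights; a real system would learn these from click or judgement data.
WEIGHTS = [0.2, 3.0]


def model_score(features):
    """Stand-in for the machine-learned model: a simple weighted sum."""
    return sum(w * f for w, f in zip(WEIGHTS, features))


def search(query, docs, candidates=1000, top_k=10):
    # Stage 1: fast retrieval of the best `candidates` documents.
    first_pass = sorted(
        docs, key=lambda d: first_pass_score(query, d), reverse=True
    )[:candidates]
    # Stage 2: re-rank only those candidates with the more expensive model.
    reranked = sorted(
        first_pass,
        key=lambda d: model_score(extract_features(query, d)),
        reverse=True,
    )
    return reranked[:top_k]


DOCS = [
    Doc("a", "Quarterly report", "solr ranking solr ranking solr ranking notes"),
    Doc("b", "Solr ranking", "an introduction to solr ranking"),
    Doc("c", "Lunch menu", "sandwiches and soup"),
]

if __name__ == "__main__":
    for doc in search("solr ranking", DOCS, top_k=2):
        print(doc.doc_id, doc.title)
```

In this toy data, document `a` wins the first pass purely on term counts, but the title-match feature lifts `b` to the top after re-ranking. The point of the split is that the expensive model only ever sees the shortlisted candidates, not the whole index.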
Next Sanne Grinovero of Red Hat talked about Hibernate Search, an implementation of full-text search for users of the Hibernate Java ORM. He gave us some great examples of how relational databases can be bad at full-text search and thus the need for a full-text engine like Lucene. His implementation hides some of the finer details of Lucene but allows use of advanced Lucene API calls where necessary, and automatically keeps the Lucene index in sync with a relational database. A simple query DSL is available, which he demonstrated in use for indexing and querying Twitter data. He then told us about Infinispan, a highly scalable key-value store which can also be used for storing Lucene indexes, and mentioned ongoing work to add Elasticsearch and Solr integration.
We finished with a brief informal Q&A session outside; thanks to both presenters and to my co-hosts at Bloomberg for helping to organise the event. We hope to run another Meetup in a couple of months – as ever, offers of talks, a venue and sponsorship of snacks & drinks are very welcome!