Helping Bloomberg build a real-time news search engine with Luwak

Bloomberg is one of the world's leading providers of financial news via the Bloomberg Terminal, an almost ubiquitous presence on the desks of finance professionals. As you might expect, their systems depend heavily on effective search, and over the last few years they have become increasingly involved in the open source community, sponsoring events such as Lucene Revolution and also he...Continue reading

Working with Hadoop, Kafka, Samza and the wider Big Data ecosystem

We've been working on a number of projects recently involving open source software often described as 'Big Data' solutions - here's a quick overview of them. The grandfather of them all, of course, is Apache Hadoop, now not so much a single project as an ecosystem including storage and processing for potentially huge amounts of data, spread across clusters of machines. Interestingly, Hadoop was originally created by D...Continue reading

London Lucene/Solr Meetup – Learning to Rank and Hibernate Search

Back to the very impressive Bloomberg lecture theatre for this month's Lucene/Solr Meetup, with a good turnout (I'm guessing 60-70 people). Our first talk came from Diego Ceccarelli of Bloomberg on how his team have created a Solr implementation of Continue reading

Better search for life sciences at the BioSolr Workshop, day 2 – Elasticsearch & others

Over the last 18 months we've been working closely with the European Bioinformatics Institute on a project to improve their use of open source search engines, funded by the BBSRC. The project was originally named BioSolr but has since grown to encompass Continue reading

Better search for life sciences at the BioSolr Workshop, day 1 – Apache Lucene/Solr

Over the last 18 months we've been working closely with the European Bioinformatics Institute on a project to improve their use of open source search engines, funded by the BBSRC. The project was originally named BioSolr but has since grown to encompass Continue reading

Time to replace your Google Search Appliance with open source search

As many others have noted, Google have recently announced their Google Search Appliance (GSA) will not be available for sale from 2017. Search gurus Miles Kehoe and Martin White have written an insightful analysis of the move with some recommendations as to what to do - because your GSA will simply stop working once the 2-year license expires. I don't agree with Lauren...Continue reading

The fun and frustration of writing a plugin for Elasticsearch for ontology indexing

As part of our work on the BioSolr project, I have been continuing to work on the various Elasticsearch ontology annotation plugins (note that even though the project started with a focus on Solr - thus the name - we have also been developing some features for Ela...Continue reading

XJoin for Solr, part 1: filtering using price discount data

In this blog post I want to introduce you to a new Apache Solr plugin component called XJoin. I'll show how we can use this to solve a common problem in e-commerce - how to use price discount data, provided by an external web API, to either filter the results of a product search or boost scores. A further post will show another example, using click-through data to influence the score of subsequent searches.

What is XJoin?

...Continue reading

Elasticsearch vs. Solr: performance improvements

I had been planning not to continue with these posts, but after Matt Weber pointed out the GitHub pull requests (which to my embarrassment I'd not even noticed) he'd made to address some methodological flaws, another attempt was the least I could do. For Solr there was a slight reduction in mean search time, from 39ms (for my original, suboptimal query structure) to 34ms, and in median search time, from 27ms to 25ms - see figure 1. Elasticsearch, on the ...Continue reading