Better search for life sciences at the BioSolr Workshop, day 1 – Apache Lucene/Solr

BioSolr_logo

          Over the last 18 months we've been working closely with the European Bioinformatics Institute on a project to improve their use of open source search engines, funded by the BBSRC. The project was originally named BioSolr but has since grown to encompass Continue reading

Time to replace your Google Search Appliance with open source search

As many others have noted, Google have recently announced their Google Search Appliance (GSA) will not be available for sale from 2017. Search gurus Miles Kehoe and Martin White have written an insightful analysis of the move with some recommendations as to what to do - because your GSA will simply stop working once the 2-year license expires. I don't agree with Lauren...Continue reading

The fun and frustration of writing a plugin for Elasticsearch for ontology indexing

As part of our work on the BioSolr project, I have been continuing to work on the various Elasticsearch ontology annotation plugins (note that even though the project started with a focus on Solr - thus the name - we have also been developing some features for Ela...Continue reading

XJoin for Solr, part 1: filtering using price discount data

In this blog post I want to introduce you to a new Apache Solr plugin component called XJoin. I'll show how we can use this to solve a common problem in e-commerce - how to use price discount data, provided by an external web API, to either filter the results of a product search or boost scores. A further post will show another example, using click-through data to influence the score of subsequent searches.

What is XJoin?

...Continue reading

Elasticsearch vs. Solr: performance improvements

I had been planning not to continue with these posts, but after Matt Weber pointed out the github pull requests (which to my embarrassment I'd not even noticed) he'd made to address some methodological flaws, another attempt was the least I could do. For Solr there was a slight reduction in mean search time, from 39ms (for my original, suboptimal query structure) to 34ms and median search time from 27ms to 25ms - see figure 1. Elasticsearch, on the ...Continue reading

London Text Analytics Meetup – Making sense of text with Lumi, Signal & Bloomberg

This month's London Text Analytics Meetup, hosted by Bloomberg in their spectacular Finsbury Square offices, was only the second such event this year, but crammed in three great talks and attracted a wide range of people from both academia and business. We started with Gabriella Kazai o...Continue reading

Out and about in search & monitoring – Autumn 2015

It's been a very busy few months for events - so busy that it's quite a relief to be back in the office! Back in late November I travelled to Vienna to speak at the FIBEP World Media Intelligence Congress with our client Infomedia about how we've helped them to migrate their media monitoring platform from the elderly, unsupported and hard to scale Continue reading

Elasticsearch vs. Solr performance: round 2.1

Last week's post on my performance comparison tests stimulated quite a lot of discussion on the blog and Twitter, not least about the large disparity in index sizes (and many thanks to everyone who contributed to this!) The Elasticsearch index was apparently nearly twice the size of the Solr index (the performance was also roughly double). In the end, it seems that the most likely reason for the appare...Continue reading

Elasticsearch vs. Solr performance: round 2

About a year ago we carried out some performance comparison tests of Solr (version 4.10) and Elasticsearch (version 1.2) and presented our results at search meetups. Our conclusion was that there was not a great deal of difference. Both search engines had more than adequate performance for the vast majority of applications, although Solr performed rather better with complex filter quer...Continue reading