XJoin for Solr, part 2: a click-through example

Posted on January 29, 2016 by Tom Winch

In my last blog post, I demonstrated how to set up and configure Solr to use the new XJoin search components we've developed for the BioSolr project, using an example from an e-commerce setting. This time, I'll show...

The fun and frustration of writing a plugin for Elasticsearch for ontology indexing

Posted on January 27, 2016 by Matt Pearce

As part of our work on the BioSolr project, I have been continuing to work on the various Elasticsearch ontology annotation plugins (note that even though the project started with a focus on Solr, hence the name, we have also been developing some features for Ela...

XJoin for Solr, part 1: filtering using price discount data

Posted on January 25, 2016 by Tom Winch

In this blog post I want to introduce you to a new Apache Solr plugin component called XJoin. I'll show how we can use this to solve a common problem in e-commerce - how to use price discount data, provided by an external web API, to either filter the results of a product search or boost scores. A further post will show another example, using click-through data to influence the score of subsequent searches.

What is XJoin?

Elasticsearch vs. Solr: performance improvements

Posted on December 18, 2015 by Tom

I had been planning not to continue with these posts, but after Matt Weber pointed out the GitHub pull requests (which to my embarrassment I'd not even noticed) he'd made to address some methodological flaws, another attempt was the least I could do. For Solr there was a slight reduction in mean search time, from 39ms (for my original, suboptimal query structure) to 34ms, and in median search time, from 27ms to 25ms - see figure 1. Elasticsearch, on the ...

Elasticsearch vs. Solr performance: round 2

Posted on December 2, 2015 by Tom

About a year ago we carried out some performance comparison tests of Solr (version 4.10) and Elasticsearch (version 1.2) and presented our results at search meetups. Our conclusion was that there was not a great deal of difference. Both search engines had more than adequate performance for the vast majority of applications, although Solr performed rather better with complex filter quer...

Luwak 1.3.0 released

Posted on November 17, 2015 by Alan Woodward

The latest version of Luwak, our open-source streaming query engine, has been released on the Sonatype Nexus repository and will be making its way to Maven Central in the next few hours. Here's a summary of the new features and improvements we've made: Batch processing I...

Faster bulk indexing in Elasticsearch

Posted on September 28, 2015 by Alan Woodward

We recently did some work for Arachnys, who provide data on a wide range of emerging markets. Their data, gathered by a complex process of web crawling, is stored in HBase and served out of a 10-node Elasticsearch cluster. Periodically, better ways of extracting data from the raw crawl will be implemented...

Real-time full-text search with Luwak and Samza

Posted on August 26, 2015 by Charlie Hull

This is an edited transcript of a talk given by Alan Woodward of Flax and Martin Kleppmann at FOSDEM 2015. It was originally published on the Confluent blog. ...

Elasticsearch Percolator & Luwak: a performance comparison of streamed search implementations

Posted on July 27, 2015 by Tom

Most search applications work by indexing a relatively stable collection of documents and then allowing users to perform ad-hoc searches to retrieve relevant documents. However, in some cases it is useful to turn this model on its head, and match individual documents against a collection of saved queries. I shall refer to this model as "streamed search". One example of streamed search is in media monitoring. The monitoring agency's ...

Free file filters, search & taxonomy tools from our old Googlecode repository

Posted on March 19, 2015 by Charlie Hull

Google's GoogleCode service is closing down, in case you hadn't heard, and I've just started the process of moving everything over to our GitHub account. This prompted me to take a look at what's there, and there's a surprising amount of open source code I'd forgotten about. So, here's a quick rundown of the useful tools, examples...