A search-based suggester for Elasticsearch with security filters

Posted on November 16, 2017 by Tom

Both Solr and Elasticsearch include suggester components, which can be used to provide search engine users with suggested completions of queries as they type. Query autocomplete has become an expected part of the search experience. Its benefits to the user include les...Continue reading

Better performance with the Logstash DNS filter

Posted on August 17, 2017 by Tom

We've been working on a project for a customer which uses Logstash to read messages from Kafka and write them to Elasticsearch. It also parses the messages into fields and, depending on the content type, does DNS lookups (both forward and reverse). While performance testing, I noticed that adding caching to the Logstash DNS filter actually reduced performance, contrary to expectations. With four filter worker threads, and the following configuration:

dns { 
  resolve => [ ...Continue reading

Elasticsearch, Kibana and duplicate keys in JSON

Posted on August 3, 2017 by Tom

JSON has been the lingua franca of data exchange for many years. It's human-readable, lightweight and widely supported. However, the JSON spec does not define what parsers should do when they encounter a duplicate key in an object, e.g.:

{
  "foo": "spam",
  "foo": "eggs",
  ...
}

Implementations are free to interpret this how they like. When different systems have different interpretations this can cause problems. We recently encounter...Continue reading
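As a small illustration of the duplicate-key problem, here is a hedged sketch using Python's standard json module, which is one implementation among many and not necessarily the one involved in the post. By default it silently keeps the last value for a repeated key. The reject_duplicates hook and DuplicateKeyError class are hypothetical names introduced here to show how a stricter parse could surface the problem instead.

```python
import json


class DuplicateKeyError(ValueError):
    """Raised when a JSON object repeats a key (hypothetical helper)."""


def reject_duplicates(pairs):
    """object_pairs_hook that fails on a repeated key instead of dropping a value."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


doc = '{"foo": "spam", "foo": "eggs"}'

# Default behaviour of Python's json module: the last value wins.
lenient = json.loads(doc)

# Strict parse: the hook sees every key/value pair, so duplicates are detectable.
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    strict_error = None
except DuplicateKeyError as exc:
    strict_error = str(exc)

print(lenient)
print(strict_error)
```

Other parsers may keep the first value, keep both, or reject the document, which is exactly why two systems reading the same payload can disagree.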

Release 1.0 of Marple, a Lucene index detective

Posted on February 24, 2017 by Tom

Back in October at our London Lucene Hackday, Flax's Alan Woodward started to write Marple, a new open source tool for inspecting Lucene indexes. Since then we have made nearly 240 commits to the Marple GitHub repository, and are now happy to announce its first release. Continue reading

Simple Solr connector for React.js

Posted on June 29, 2016 by Tom

We've just published a simple (60 lines of code) React.js component to npm which makes it easy to perform searches on a Solr 6 instance and get the data into the app to display. Unlike Twigkit or Searchkit this is not a UI library - it is just a connector. If you use it you will have to implement all the UI components yourself. ...Continue reading

Running out of disk space with Elasticsearch and Solr: a solution

Posted on April 21, 2016 by Tom

We recently did a proof-of-concept project for a customer which ingested log events from various sources into a Kafka - Logstash - Elasticsearch - Kibana stack. This was configured with Ansible and hosted on about a dozen VMs inside the customer's main...Continue reading

Elasticsearch vs. Solr: performance improvements

Posted on December 18, 2015 by Tom

I had been planning not to continue with these posts, but after Matt Weber pointed out the GitHub pull requests (which to my embarrassment I'd not even noticed) he'd made to address some methodological flaws, another attempt was the least I could do. For Solr there was a slight reduction in mean search time, from 39ms (for my original, suboptimal query structure) to 34ms, and in median search time from 27ms to 25ms - see figure 1. Elasticsearch, on the ...Continue reading

Elasticsearch vs. Solr performance: round 2.1

Posted on December 11, 2015 by Tom

Last week's post on my performance comparison tests stimulated quite a lot of discussion on the blog and Twitter, not least about the large disparity in index sizes (many thanks to everyone who contributed to this!). The Elasticsearch index was apparently nearly twice the size of the Solr index (the performance was also roughly double). In the end, it seems that the most likely reason for the appare...Continue reading

Elasticsearch vs. Solr performance: round 2

Posted on December 2, 2015 by Tom

About a year ago we carried out some performance comparison tests of Solr (version 4.10) and Elasticsearch (version 1.2) and presented our results at search meetups. Our conclusion was that there was not a great deal of difference. Both search engines had more than adequate performance for the vast majority of applications, although Solr performed rather better with complex filter quer...Continue reading

Elasticsearch Percolator & Luwak: a performance comparison of streamed search implementations

Posted on July 27, 2015 by Tom

Most search applications work by indexing a relatively stable collection of documents and then allowing users to perform ad-hoc searches to retrieve relevant documents. However, in some cases it is useful to turn this model on its head, and match individual documents against a collection of saved queries. I shall refer to this model as "streamed search". One example of streamed search is in media monitoring. The monitoring agency's ...Continue reading
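The inverted model described above can be sketched in a few lines of Python. This is a toy illustration only: it does not use the Elasticsearch Percolator or Luwak APIs, and SavedQuery, tokenize and match_document are hypothetical names. It assumes each saved query is simply a set of terms that must all appear in the incoming document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedQuery:
    """A stored query: every term in required_terms must occur in the document."""
    query_id: str
    required_terms: frozenset


def tokenize(text):
    """Very naive analysis: lowercase and split on whitespace."""
    return set(text.lower().split())


def match_document(text, queries):
    """Return the ids of all saved queries that the single document satisfies."""
    terms = tokenize(text)
    return [q.query_id for q in queries if q.required_terms <= terms]


# The query collection is the stable part; documents arrive one at a time.
queries = [
    SavedQuery("q-solr", frozenset({"solr"})),
    SavedQuery("q-es-perf", frozenset({"elasticsearch", "performance"})),
    SavedQuery("q-kafka", frozenset({"kafka"})),
]

matches = match_document("Elasticsearch performance compared with Solr", queries)
print(matches)
```

A linear scan over every saved query, as here, becomes slow once the query collection grows large, so real streamed-search systems need some way to avoid evaluating every query against every document.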