Both Solr and Elasticsearch include suggester components, which can be used to provide search engine users with suggested completions of queries as they type: Query autocomplete has become an expected part of the search experience. Its benefits to the user include les...
Author Archives: Tom
Better performance with the Logstash DNS filter
We've been working on a project for a customer which uses Logstash to read messages from Kafka and write them to Elasticsearch. It also parses the messages into fields and, depending on the content type, does DNS lookups (both forward and reverse). While performance testing, I noticed that adding caching to the Logstash DNS filter actually reduced performance, contrary to expectations. With four filter worker threads, and the following configuration:
dns { resolve => [ ...
Elasticsearch, Kibana and duplicate keys in JSON
JSON has been the lingua franca of data exchange for many years. It's human-readable, lightweight and widely supported. However, the JSON spec does not define what parsers should do when they encounter a duplicate key in an object, e.g.:
{ "foo": "spam", "foo": "eggs", ... }
Implementations are free to interpret this how they like. When different systems have different interpretations this can cause problems. We recently encounter...
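The ambiguity described above is easy to reproduce. The following is a minimal sketch using Python's standard `json` module, chosen here only for illustration (it is not the stack discussed in the post). By default that parser silently keeps the last value for a repeated key; the `reject_duplicate_keys` helper is a hypothetical name introduced for this example, showing one way to surface duplicates instead.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key.

    Hypothetical helper for illustration; not part of any library.
    """
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


raw = '{"foo": "spam", "foo": "eggs"}'

# Default behaviour of Python's json module: the last value silently wins.
lenient = json.loads(raw)

# Strict parsing: report the duplicate rather than hiding it.
try:
    json.loads(raw, object_pairs_hook=reject_duplicate_keys)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(lenient)
print(strict_error)
```

Other parsers may keep the first value, keep both, or reject the document outright, which is exactly why two systems reading the same payload can disagree.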
Release 1.0 of Marple, a Lucene index detective
Back in October, at our London Lucene Hackday, Flax's Alan Woodward started to write Marple, a new open source tool for inspecting Lucene indexes. Since then we have made nearly 240 commits to the Marple GitHub repository, and are now happy to announce its first release.
Simple Solr connector for React.js
We've just published a simple (60 lines of code) React.js component to npm which makes it easy to perform searches on a Solr 6 instance and get the data into the app to display. Unlike Twigkit or Searchkit this is not a UI library - it is just a connector. If you use it you will have to implement all the UI components yourself. ...
Running out of disk space with Elasticsearch and Solr: a solution
We recently did a proof-of-concept project for a customer which ingested log events from various sources into a Kafka - Logstash - Elasticsearch - Kibana stack. This was configured with Ansible and hosted on about a dozen VMs inside the customer's main...
Elasticsearch vs. Solr: performance improvements
I had been planning not to continue with these posts, but after Matt Weber pointed out the GitHub pull requests (which to my embarrassment I'd not even noticed) he'd made to address some methodological flaws, another attempt was the least I could do. For Solr there was a slight reduction in mean search time, from 39ms (for my original, suboptimal query structure) to 34ms, and in median search time from 27ms to 25ms - see figure 1. Elasticsearch, on the ...
Elasticsearch vs. Solr performance: round 2.1
Last week's post on my performance comparison tests stimulated quite a lot of discussion on the blog and Twitter, not least about the large disparity in index sizes (and many thanks to everyone who contributed to this!). The Elasticsearch index was apparently nearly twice the size of the Solr index (the performance was also roughly double). In the end, it seems that the most likely reason for the appare...
Elasticsearch vs. Solr performance: round 2
About a year ago we carried out some performance comparison tests of Solr (version 4.10) and Elasticsearch (version 1.2) and presented our results at search meetups. Our conclusion was that there was not a great deal of difference. Both search engines had more than adequate performance for the vast majority of applications, although Solr performed rather better with complex filter quer...
Elasticsearch Percolator & Luwak: a performance comparison of streamed search implementations
Most search applications work by indexing a relatively stable collection of documents and then allowing users to perform ad-hoc searches to retrieve relevant documents. However, in some cases it is useful to turn this model on its head, and match individual documents against a collection of saved queries. I shall refer to this model as "streamed search". One example of streamed search is in media monitoring. The monitoring agency's ...
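To make the inverted model concrete, here is a toy sketch of streamed search: queries are stored up front and each incoming document is tested against all of them. It is an assumption-laden illustration only. The `SavedQueries` class, its methods and the "all terms must appear" matching rule are hypothetical and are not how the Elasticsearch Percolator or Luwak are implemented.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class SavedQueries:
    """Hypothetical registry of saved queries for illustration.

    Each query is a set of terms; it matches a document only if
    every one of its terms occurs in that document.
    """

    def __init__(self):
        self._queries = {}

    def register(self, query_id, terms):
        self._queries[query_id] = {term.lower() for term in terms}

    def match(self, document):
        """Return the sorted ids of all saved queries the document satisfies."""
        tokens = tokenize(document)
        return sorted(
            query_id
            for query_id, terms in self._queries.items()
            if terms <= tokens  # subset test: all query terms present
        )


monitor = SavedQueries()
monitor.register("solr-news", ["solr", "release"])
monitor.register("kafka-news", ["kafka"])

matches = monitor.match("Apache Solr announces a new release")
print(matches)
```

This naive version checks every saved query against every document, so its cost grows linearly with the number of queries; that scaling behaviour is what makes the performance of real streamed search implementations worth comparing.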