About a year ago we carried out some performance comparison tests of Solr (version 4.10) and Elasticsearch (version 1.2) and presented our results at search meetups. Our conclusion was that there was not a great deal of difference. Both search engines had more than adequate performance for the vast majority of applications, although Solr performed rather better with complex filter queries and supported a significantly higher QPS.
Since then, both Solr and Elasticsearch have released new major versions, so it seemed like a good time to do another comparison. This time, under our present test conditions, the results were reversed. Elasticsearch 2.0.0 was substantially faster than Solr 5.3.1 for searching, and could maintain more than twice the QPS under indexing load. Solr, on the other hand, supported much faster indexing and used less disk space for its indexes.
For various reasons, it was not practical to duplicate the original tests exactly. We used a single Amazon EC2 r3.4xlarge instance backed by 400GB of EBS storage. The Elasticsearch or Solr nodes and the test scripts all ran on this same machine. Both search engines were configured as follows:
- 4 nodes
- 4 index shards
- no replication
- 16GB of memory per node
Since the EC2 instance had 122GB of memory, this left 58GB for the disk cache (minus whatever the OS and the test scripts used). SolrCloud was managed by its embedded ZooKeeper instance.
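The 58GB figure is simple arithmetic: four nodes at 16GB each claim 64GB of the instance's 122GB. A minimal sketch of that budget follows; the constant and function names are illustrative, not taken from our test scripts.

```python
# Memory budget for the setup above: one r3.4xlarge (122GB RAM) hosting
# four search-engine nodes of 16GB each. Names are illustrative only.
INSTANCE_MEMORY_GB = 122
NODES = 4
MEMORY_PER_NODE_GB = 16


def disk_cache_budget_gb(instance_gb: int, nodes: int, per_node_gb: int) -> int:
    """Memory left for the OS disk cache (before subtracting what the OS
    and the test scripts themselves use)."""
    used = nodes * per_node_gb
    if used > instance_gb:
        raise ValueError("the nodes need more memory than the instance has")
    return instance_gb - used


if __name__ == "__main__":
    print(disk_cache_budget_gb(INSTANCE_MEMORY_GB, NODES, MEMORY_PER_NODE_GB))  # 58
```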
We created 20 million random documents, using (as before) a Markov chain trained on a document on the philosophy of Stoicism downloaded from gutenberg.org. Random integers were also generated for use as filters and facets. In both search engines, the text was indexed but not stored. Other fields were both indexed and stored (in Elasticsearch, _source and _all were disabled).
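For readers unfamiliar with the technique, a word-level Markov chain records which words follow each short run of words in a training text, then generates new text by a random walk over those transitions. The sketch below shows the general idea; the chain order, document length range, field names (`int_a`, `int_b`, `int_c`) and integer ranges are assumptions for illustration, not the parameters we actually used.

```python
import random
from collections import defaultdict


def train_markov_chain(text: str, order: int = 2) -> dict:
    """Map each tuple of `order` consecutive words to the words that followed it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate_text(chain: dict, length: int, rng: random.Random) -> str:
    """Random walk over the chain, restarting from a random state at dead ends."""
    keys = list(chain)
    state = rng.choice(keys)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:  # state never seen mid-text: jump somewhere else
            state = rng.choice(keys)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = (*state[1:], nxt)
    return " ".join(out[:length])


def make_document(doc_id: int, chain: dict, rng: random.Random) -> dict:
    """One synthetic document: Markov text plus random integers for
    filters and facets. Field names and ranges are hypothetical."""
    return {
        "id": doc_id,
        "text": generate_text(chain, rng.randint(50, 300), rng),
        "int_a": rng.randint(0, 9),
        "int_b": rng.randint(0, 99),
        "int_c": rng.randint(0, 999),
    }
```

In a real run the chain would be trained once on the full Gutenberg text and `make_document` called 20 million times with a seeded `random.Random`, so that the same corpus could be regenerated for each engine.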
We indexed the documents into Elasticsearch or Solr using two concurrent instances of a Python script, with a batch size of 1000 documents. Solr was configured to do a soft commit every 1s, to be consistent with the default Elasticsearch behaviour. The elapsed indexing time for Solr was 66m 52s, while Elasticsearch took more than twice as long, at 142m 2s. The total index sizes were 38GB and 79GB respectively (update: see comments, this may be a mistake).
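The indexing setup (two concurrent script instances, 1000 documents per batch) can be sketched as below. This is an assumed structure, not our actual script: each instance takes a contiguous share of the document ids and sends it in batches, and `send_batch` is a hypothetical callback standing in for whichever bulk/update request each engine needs.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

BATCH_SIZE = 1000  # batch size used in the study


def batched(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def worker_doc_ids(worker: int, n_workers: int, total_docs: int) -> range:
    """Contiguous slice of document ids handled by one indexing process."""
    per_worker = -(-total_docs // n_workers)  # ceiling division
    start = worker * per_worker
    return range(start, min(start + per_worker, total_docs))


def run_indexer(worker: int, n_workers: int, total_docs: int,
                make_doc: Callable[[int], dict],
                send_batch: Callable[[List[dict]], None]) -> int:
    """Build this worker's documents and send them in batches.
    `send_batch` is hypothetical; returns the number of documents sent."""
    sent = 0
    docs = (make_doc(i) for i in worker_doc_ids(worker, n_workers, total_docs))
    for batch in batched(docs, BATCH_SIZE):
        send_batch(batch)
        sent += len(batch)
    return sent
```

Each of the two processes would call `run_indexer(worker, 2, 20_000_000, ...)` with `worker` set to 0 or 1. For scale, the reported elapsed times work out to roughly 5,000 documents/s for Solr (4,012s) and roughly 2,350 documents/s for Elasticsearch (8,522s).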
After indexing, we carried out our search-time tests both under indexing load (two indexing processes, as before) and with no load. We also ran QPS tests in the loaded condition. In all cases, queries consisted of three OR terms and three filters, with facets generated from the numeric fields. Before all tests, 5000 warm-up searches were run to fill the caches.
With no concurrent indexing load, Elasticsearch had a mean search time of 0.10s, with 99% of searches under 0.22s. For Solr, this was 0.12s and 0.54s (see figure 1). With an indexing load, Elasticsearch had a mean search time of 0.14s, with 99% below 0.34s. The same figures for Solr were 0.24s and 0.68s (figure 2). QPS tests were carried out with 1, 2, 4, 8, 16 and 32 concurrent search clients. Elasticsearch approached a maximum of 30 QPS, while Solr approached 15 (figure 3).
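The figures above are a mean, a 99th percentile and a maximum sustained QPS. A minimal sketch of how such numbers can be computed is shown below, assuming the nearest-rank percentile definition and a thread pool of concurrent clients; `search` is a hypothetical zero-argument callable that issues one query, and our own scripts may have measured these differently.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple


def latency_summary(latencies: Sequence[float]) -> Tuple[float, float]:
    """Mean and 99th percentile (nearest-rank) of per-search times in seconds."""
    ordered = sorted(latencies)
    rank = math.ceil(0.99 * len(ordered))  # 1-based nearest rank
    return statistics.mean(ordered), ordered[rank - 1]


def measure_qps(search: Callable[[], None], clients: int,
                queries_per_client: int) -> float:
    """Run `clients` concurrent query loops; return completed queries per
    second of wall-clock time. Threads suffice because each search mostly
    waits on network I/O."""
    def client_loop(_: int) -> int:
        for _ in range(queries_per_client):
            search()
        return queries_per_client

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        completed = sum(pool.map(client_loop, range(clients)))
    return completed / (time.perf_counter() - start)
```

Calling `measure_qps` with `clients` set to 1, 2, 4, 8, 16 and 32 reproduces the shape of the QPS test: throughput rises with concurrency until the engine saturates and then levels off.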
Thus, under indexing load, Elasticsearch appeared to have approximately twice the search performance of Solr. It wasn't clear why this might be the case. One possibility was that the query execution changes announced for Elasticsearch 2 were responsible. To test this, we also compared Elasticsearch 1.7.3 against the current version. The older version was somewhat slower (99% of searches under 0.50s as opposed to 0.34s, figure 4), but the gap was smaller than the one between Elasticsearch 2.0.0 and Solr. The QPS test was inconclusive (figure 5).
These results must be interpreted with caution. The first caveat is that we only tested a narrow range of either engine's functionality, in one specific configuration. Other functional areas may perform very differently. Second, the runtime environment was fairly unrealistic: everything ran on a single machine, so network latency and bandwidth, which are likely to affect performance in practice, played no part. Third, both engines were used more or less "out of the box", with minimal effort put into tuning the performance of either.
There are also many factors other than raw performance to be taken into consideration when choosing a search engine. We are not saying that either choice is “better” than the other in all circumstances. However, if search performance is a critical factor in your system design, then it would pay to try both Solr and Elasticsearch, to see which would work better within your parameters.
The difference in performance in this study is interesting, but the reasons remain unclear, and we'd welcome any suggestions for refining our methodology, or pointers to similar studies.