Comments on: Elasticsearch vs. Solr performance: round 2
http://www.flax.co.uk/blog/2015/12/02/elasticsearch-vs-solr-performance-round-2/

By: Zachary Tong (Fri, 11 Dec 2015 13:16:19 +0000)
Oops, sorry, this was supposed to be a reply to the other thread 🙂

By: Zachary Tong (Fri, 11 Dec 2015 13:15:29 +0000)
Another potential (perf-related) issue with the setup: ES automatically assumes it has full run of the server. E.g. it sets various threadpools based on multiples of the CPU core count. So unless these nodes were isolated in VMs, the server was probably burning a lot of cycles on context switching (because there were 4x the appropriate number of threads). Ditto for the Lucene merge scheduler threads, merge throttling, etc.

I don’t know enough about Solr, but it could very well have run into the same problems. I imagine it scales threadpools/IO similarly (and the Lucene components definitely would).

The solution is to A) run each node on its own machine, B) isolate the nodes in VMs and scale down core counts appropriately, or C) use the `processors` setting to tell ES how many cores are actually available (https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html#processors).

🙂
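
A quick way to check what each node actually detected, and whether `processors` took effect, is to ask the node-info API. A minimal sketch, assuming a node on localhost:9200 and the response shape of that era (exact field names vary by version):

```python
import json
import requests

# Ask every node for its OS info and thread pool configuration.
resp = requests.get("http://localhost:9200/_nodes/os,thread_pool")
resp.raise_for_status()

for node_id, node in resp.json()["nodes"].items():
    # If four nodes share one box, each reports the full core count
    # unless `processors` has been scaled down in its config.
    print(node["name"], "sees", node["os"]["available_processors"], "processors")
    # Fixed pools such as `search` are sized from that count.
    print(json.dumps(node["thread_pool"].get("search", {}), indent=2))
```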

By: Tom (Fri, 11 Dec 2015 12:12:53 +0000)
Thanks for the clarification, Zachary, that’s interesting. So if this is done by host ID rather than node ID, I guess it explains why I didn’t have any replicas in the original 4-node case, since they were all running on the same host. (Not very realistic, I accept, but the same for both ES and Solr).

By: Andreas (Fri, 11 Dec 2015 12:10:35 +0000)
@Zachary: perfect description!

By: Zachary Tong (Thu, 10 Dec 2015 19:33:25 +0000)
Ah, I see. Nothing stupid on your end, just some nuances to how ES handles replicas… which become more apparent on single-node clusters.

The culprit is a setting called Write Consistency (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency), which comes into play when trying to index documents. Write consistency is a safety mechanism that helps to prevent writing to the wrong side of a split-brain (helps, but doesn’t prevent).

When trying to index a document, ES will first evaluate the cluster state to see if a quorum of shards are available. If a quorum is not available, it will simply refuse to index the document (after a sufficient timeout period to see if more nodes join).

So in the single-node scenario where you specify two replicas, you need two available shard copies (1 primary + 2 replicas == 3 copies; a quorum of three is two).
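
In code, the arithmetic works out roughly like this (a sketch of my reading of the linked docs, including the documented special case that a single replica does not raise the requirement above the primary alone):

```python
def required_active_copies(number_of_replicas):
    """Copies that must be active for the default `quorum` write consistency."""
    copies = 1 + number_of_replicas   # 1 primary + N replicas
    if number_of_replicas <= 1:
        # Documented special case: with 0 or 1 replicas, the primary
        # alone is enough, so single-node indexing still works.
        return 1
    return copies // 2 + 1            # quorum: int(copies / 2) + 1

print(required_active_copies(0))  # 1
print(required_active_copies(1))  # 1 -> one node, one replica: writes succeed
print(required_active_copies(2))  # 2 -> one node has only the primary: stalls
```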

The gotcha here is that Elasticsearch refuses to allocate replicas to the same machine as a primary, since that defeats the point of the replica (e.g. if the machine dies, you lose both copies of data). So in this single-node cluster, only the primaries are allocated regardless of replica count.

So the end result is that each index will have a single primary shard, and in the 2-replica case, that single primary does not satisfy the quorum needed to index, so the operation stalls and aborts.

If you were to spin up a second node, one replica would be allocated there and you’d be able to index because the quorum will now be satisfied.
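
If you want to watch it fail, a minimal reproduction against a single local node might look like this (pre-5.x REST API; the index and type names are just examples):

```python
import requests

base = "http://localhost:9200"

# One primary, two replicas: on a single node only the primary allocates.
requests.put(base + "/s1r2", json={
    "settings": {"number_of_shards": 1, "number_of_replicas": 2}
})

# The quorum of two active copies is never met, so ES waits out the
# timeout and then refuses the write with a 503 (UnavailableShardsException).
r = requests.put(base + "/s1r2/doc/1?timeout=5s", json={"field": "value"})
print(r.status_code, r.json())
```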

By: Tom (Thu, 10 Dec 2015 17:02:59 +0000)
My colleague @romseygeek has suggested that I might have measured the index size in the middle of a merge, which is possible and could certainly explain the size disparity and the difficulty in reproducing it.

By: Tom (Thu, 10 Dec 2015 15:44:55 +0000)
OK, I’m confused now. I’ve just tried running one ES node, and configuring indexes with number_of_shards=1 and number_of_replicas=0, 1 and 2 (indexes s1r0, s1r1 and s1r2). When I add a document to each, I end up with the same index structure for the first two cases.

For the index where number_of_replicas=2, the PUT blocks (as you’d expect if there is no node available to handle the replica).

So it looks to me like the 0 and 1 number_of_replicas cases are handled identically, with no replica shards (and yes, I can search both). If I’m doing something stupid, please explain what!

By: Zachary Tong (Wed, 09 Dec 2015 13:32:34 +0000)
Sorry, that’s not how Elasticsearch shards work.

`number_of_shards` controls the number of primary shards that make up a single index. For example, your test uses four primary shards, which means the index can be split across four machines total.

`number_of_replicas` controls the number of replica sets that are maintained alongside the primary shards. So in your test, you have one replica set enabled, which means there will be four primary shards + one complete set of replicas (another four shards), totaling eight shards and twice the data.
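
The shard math is easy to see with the cat API; a quick sketch against a local node (the index name is just an example):

```python
import requests

base = "http://localhost:9200"

# Four primaries with one full replica set: 4 * (1 + 1) = 8 shards.
requests.put(base + "/perftest", json={
    "settings": {"number_of_shards": 4, "number_of_replicas": 1}
})

# Expect eight rows: four marked "p" (primary) and four marked "r" (replica).
print(requests.get(base + "/_cat/shards/perftest?v").text)
```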

I’m not sure why you weren’t able to search, but it wasn’t due to shards. ES will happily search a non-replicated index 🙂

By: Shawn Heisey (Tue, 08 Dec 2015 15:38:43 +0000)
The defaults on HttpClient limit the number of simultaneous connections per host to 2, and the maximum number of total connections (across all hosts contacted by that HttpClient instance) to 20.

When Solr creates the internal HttpClient for shard requests, it sets the max connections per host value to 20, and shares that client between all of the SolrClient objects that it uses. Because the index was sharded, depending on exactly how the query requests are made, 12 of them might end up waiting for one or more of the initial 20 requests to finish. There are configuration options on HttpShardHandlerFactory for changing the number of connections allowed.
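
For reference, that tuning lives in solrconfig.xml. A hedged sketch (exact placement and available options vary by Solr version; this form goes inside the relevant requestHandler):

```xml
<shardHandlerFactory class="HttpShardHandlerFactory">
  <!-- Raise the per-host limit so concurrent distributed queries
       don't queue behind the default of 20 shard connections. -->
  <int name="maxConnectionsPerHost">100</int>
</shardHandlerFactory>
```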

By: Tom (Tue, 08 Dec 2015 14:31:58 +0000)
I think that “number_of_replicas” is the same as Solr’s replicationFactor, i.e. 1 means that there is 1 index for each shard (and no copies). Certainly when I tried setting it to 0, there was nothing searchable (although ES did not complain).

I’ve not managed to reproduce the index size discrepancy on a single-node configuration, at up to 10M docs. Therefore I think it’s quite likely I made some stupid mistake in measuring the ES index size in the 4-node configuration, though as I no longer have that data available I unfortunately can’t check. The next step is to do the performance tests on single nodes.
