Comments on: Distributed search and partition functions
http://www.flax.co.uk/blog/2009/04/25/distributed-search-and-partition-functions/
The Open Source Search Specialists
Tue, 12 Feb 2019 14:44:32 +0000

By: Tom
http://www.flax.co.uk/blog/2009/04/25/distributed-search-and-partition-functions/#comment-18558
Fri, 03 Dec 2010 14:14:02 +0000

“the search will run on a database of 1/3rd the size of the original, so it will be faster.”

That’s mostly incorrect. The search time is proportional not to the size of the database, but to the number of blocks read. In this case, the number of block reads will be the same. So the search time will be just as long. Think of it this way: reading the first 1000 bytes of a 10GB flat file is much faster than reading the whole file.

Regarding the relevance issue: in Xapian, full statistics are exchanged by the remote protocol, so the ranking will be exactly the same as for a single database. This is not the case in Solr, so you have to be careful that each shard is “balanced”, meaning the shards have similar statistics; otherwise the final ranking will be skewed.

– Tom
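
To illustrate the statistics point in the comment above, here is a minimal sketch of how a term weight computed from one shard's counts can differ from one computed from combined counts. The shard sizes, document frequencies, and the smoothed IDF formula are all assumptions chosen for illustration; this is not the weighting scheme that Xapian or Solr actually uses.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """A simple smoothed inverse document frequency (one common textbook form)."""
    return math.log((num_docs + 1) / (doc_freq + 1)) + 1.0


# Hypothetical statistics for one query term on three shards:
# (documents in the shard, documents in the shard containing the term)
shard_stats = [(1000, 10), (1000, 400), (1000, 10)]

# Local weighting: each shard only sees its own counts.
local_idfs = [idf(n, df) for n, df in shard_stats]

# Global weighting: counts are summed across shards before weighting,
# which is the effect of exchanging full statistics.
total_docs = sum(n for n, _ in shard_stats)
total_df = sum(df for _, df in shard_stats)
global_idf = idf(total_docs, total_df)

for i, value in enumerate(local_idfs):
    print(f"shard {i}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

In this made-up case the term is common on the second shard and rare on the other two, so the same term gets a very different weight depending on which shard scores it. With global statistics every shard would use the same weight.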

By: James
http://www.flax.co.uk/blog/2009/04/25/distributed-search-and-partition-functions/#comment-18557
Fri, 03 Dec 2010 14:02:43 +0000

I would disagree with “not provide any performance gain at all” for the Figure 2 model: the search will run on a database of 1/3rd the size of the original, so it will be faster.

What I do not understand in both models (Figure 2 and Figure 3) is this: when I want, say, the first 10 results and I get 10 results from each server, how can I compare the relevance of the results returned by each server and sort them by relevance?
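
One common way to do the merge asked about here is a k-way merge of the per-server lists. The sketch below assumes that each server returns its hits already sorted by descending score, and that scores from different servers are directly comparable, which depends on the statistics issue discussed in this thread. The server names, scores, and the `merge_top_k` helper are hypothetical.

```python
import heapq
from itertools import islice


def merge_top_k(per_server_results, k):
    """Merge per-server hit lists (each sorted by descending score) into a global top k.

    Each hit is a (score, doc_id) tuple. Scores are assumed to be comparable
    across servers.
    """
    merged = heapq.merge(*per_server_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical results from three servers, each already sorted by score.
server_a = [(0.92, "a1"), (0.40, "a2")]
server_b = [(0.87, "b1"), (0.85, "b2")]
server_c = [(0.95, "c1"), (0.10, "c2")]

top3 = merge_top_k([server_a, server_b, server_c], 3)
print(top3)
```

To get a correct global top 10, each server has to return its own top 10, because in the worst case all ten best hits live on one server.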
