The latest version of Luwak, our open-source streaming query engine, has been released on the Sonatype Nexus repository and will be making its way to Maven Central in the next few hours. Here’s a summary of the new features and improvements we’ve made:
Batch processing
Inspired by a question raised during our talk at FOSDEM last February, you can now stream documents through the Luwak Monitor in batches, as well as one at a time. This will generally improve your throughput, at the cost of higher latency. For example, local benchmarking against a set of 10,000 queries showed an improvement from 10 documents/second to 30 documents/second when the batch size was increased from 1 document to 30 documents; however, processing latency went from ~100ms for the single document to 10 seconds for the larger batch. You’ll need to experiment with batch sizes to find the right balance for your own use.
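To make the batching idea concrete, here is a minimal sketch of splitting an incoming document list into fixed-size batches. It deliberately does not use Luwak's API: the class name BatchSketch and the partition method are hypothetical, and the point where a batch would be handed to the Monitor is left as a placeholder comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Luwak: groups documents into batches.
final class BatchSketch {

    // Split docs into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add("doc-" + i);
        }
        for (List<String> batch : partition(docs, 30)) {
            // Placeholder: this is where a whole batch would be matched in one call.
            System.out.println("batch of " + batch.size());
        }
    }
}
```

Making the batch size a parameter is what lets you run the throughput-versus-latency experiment described above without touching the rest of the pipeline.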
Presearcher performance improvements
Luwak speeds up document matching by filtering out queries that we can detect won’t match a given document or batch, a process we call presearching. Profiling revealed that creating the presearcher query was a serious performance bottleneck, particularly for presearchers using the WildcardNGramPresearcherComponent, so this has been largely rewritten in 1.3.0. We’ve seen improvements of up to 400% in query build times after this rewrite.
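As a rough illustration of the presearching idea, the sketch below keeps a set of required terms for each query and rules out any query whose required terms are not all present in the document. This is a simplified stand-in, not Luwak's implementation: the class PresearchSketch and its methods are hypothetical, and the real presearchers (including the n-gram handling for wildcards) are considerably more involved.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified presearcher: filters queries by required terms.
final class PresearchSketch {

    // queryId -> terms that must ALL appear for the query to have any chance of matching
    private final Map<String, Set<String>> requiredTerms = new LinkedHashMap<>();

    void register(String queryId, String... terms) {
        requiredTerms.put(queryId, new HashSet<>(Arrays.asList(terms)));
    }

    // Returns the ids of queries that cannot be ruled out for this document.
    Set<String> candidates(String document) {
        // Naive tokenisation: lowercase, then split on runs of non-word characters.
        Set<String> docTerms = new HashSet<>(
                Arrays.asList(document.toLowerCase(Locale.ROOT).split("\\W+")));
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> entry : requiredTerms.entrySet()) {
            if (docTerms.containsAll(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PresearchSketch presearcher = new PresearchSketch();
        presearcher.register("q1", "fox");
        presearcher.register("q2", "fox", "dog");
        presearcher.register("q3", "cat");
        // Only the surviving queries would then be run in full against the document.
        System.out.println(presearcher.candidates("The quick brown fox"));
    }
}
```

The filter only needs to be conservative: it may let through queries that end up not matching, but it must never discard one that would.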
Concurrent query loading
Luwak now ships with a ConcurrentQueryLoader helper class to speed up Monitor startup. The loader uses multiple threads to add queries to the index, allowing you to make use of all your CPUs when parsing and analyzing queries. Note that this requires your MonitorQueryParser implementations to be thread-safe!
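The general pattern behind multi-threaded loading can be sketched with a plain thread pool. The code below is not the ConcurrentQueryLoader itself: ConcurrentLoadSketch, loadAll and the parser argument are hypothetical names, and a ConcurrentHashMap stands in for the query index. It does show why the parser has to be thread-safe, since the same parser instance is called from every worker thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of parsing queries on several threads at once.
final class ConcurrentLoadSketch {

    // Parses every raw query on a fixed-size pool and collects the results.
    // The parser is shared by all workers, so it must be thread-safe and must not return null.
    static <Q> Map<String, Q> loadAll(Map<String, String> rawQueries,
                                      Function<String, Q> parser,
                                      int threads) {
        Map<String, Q> index = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Map.Entry<String, String> entry : rawQueries.entrySet()) {
                pending.add(pool.submit(() -> {
                    index.put(entry.getKey(), parser.apply(entry.getValue()));
                }));
            }
            // Wait for every task so that parse failures surface to the caller.
            for (Future<?> task : pending) {
                task.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading queries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("query failed to parse", e.getCause());
        } finally {
            pool.shutdown();
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("q1", "  fox AND dog ");
        raw.put("q2", " cat ");
        int cpus = Runtime.getRuntime().availableProcessors();
        // String::trim is stateless, so it is safe to share across threads.
        Map<String, String> parsed = loadAll(raw, String::trim, cpus);
        System.out.println(parsed.size() + " queries loaded");
    }
}
```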
Easier configuration and state monitoring
In 1.2.0 and earlier, clients had to extend the Monitor itself in order to configure the internal query caches or get state update information. Configuration has now been extracted into a QueryIndexConfiguration class, passed to the Monitor at construction, and you can get notified about updates to the query index by registering QueryIndexUpdateListeners.
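The shift from subclassing to configuration plus listeners can be illustrated with a small generic sketch. None of the names below are Luwak's: Config, UpdateListener and Index are hypothetical stand-ins, chosen only to show the shape of the pattern (settings passed in at construction, callbacks registered afterwards).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "configure at construction, observe via listeners".
final class ConfigListenerSketch {

    // Hypothetical settings object, standing in for a configuration class.
    static final class Config {
        final int cacheSize;

        Config(int cacheSize) {
            this.cacheSize = cacheSize;
        }
    }

    // Hypothetical callback, invoked after each change to the index.
    interface UpdateListener {
        void afterUpdate(int queryCount);
    }

    static final class Index {
        private final Config config;
        private final List<UpdateListener> listeners = new ArrayList<>();
        private int queryCount = 0;

        Index(Config config) {
            this.config = config;
        }

        void addListener(UpdateListener listener) {
            listeners.add(listener);
        }

        // Records n new queries, then notifies every registered listener.
        void addQueries(int n) {
            queryCount += n;
            for (UpdateListener listener : listeners) {
                listener.afterUpdate(queryCount);
            }
        }

        int cacheSize() {
            return config.cacheSize;
        }
    }

    public static void main(String[] args) {
        Index index = new Index(new Config(1024));
        index.addListener(count -> System.out.println("index now holds " + count + " queries"));
        index.addQueries(3);
    }
}
```

With this shape, client code never has to subclass the component just to change a setting or watch its state.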
For more information, see the CHANGES for 1.3.0. We’ll also be re-running the comparison with Elasticsearch Percolator soon, as this has also been improved as part of Elasticsearch’s recent 2.0 release.