Better performance with the Logstash DNS filter

We’ve been working on a project for a customer which uses Logstash to read messages from Kafka and write them to Elasticsearch. It also parses the messages into fields, and depending on the content type does DNS lookups (both forward and reverse.)

While performance testing I noticed that adding caching to the Logstash DNS filter actually reduced performance, contrary to expectations. With four filter worker threads, and the following configuration:

dns { 
  resolve => [ "Source_IP" ] 
  action => "replace" 
  hit_cache_size => 8000 
  hit_cache_ttl => 300 
  failed_cache_size => 1000 
  failed_cache_ttl => 10

the maximum throughput was only 600 messages/s, as opposed to 1000 messages/s with no caching (4000/s with no DNS lookup at all).

This was very odd, so I looked at the source code. Here is the DNS lookup when a cache is configured:

address = @hitcache.getset(raw) { retriable_getaddress(raw) }

This executes retriable_getaddress(raw) inside the getset() cache method, which is synchronised. Therefore, concurrent DNS lookups are impossible when a cache is used.

To see if this was the problem, I created a fork of the dns filter which does not synchronise the retriable_getaddress() call.

 address = @hit_cache[raw]
 if address.nil?
   address = retriable_getaddress(raw)
   unless address.nil?
     @hit_cache[raw] = address

Tests on the same data revealed a throughput of nearly 2000 messages/s with four worker threads (and 2600 with eight threads), which is a significant improvement.

This filter has the disadvantage that it might redundantly look up the same address multiple times, if the same domain name/IP address turns up in several worker threads simultaneously (but the risk of this is probably pretty low, depending on the input data, and in any case it’s harmless.)

I have released a gem of the plugin if you want to try it. Comments appreciated.

2 thoughts on “Better performance with the Logstash DNS filter

Leave a Reply

Your email address will not be published. Required fields are marked *