We’ve known about Apache Kafka for several years now – we first encountered it when we developed a prototype streaming Boolean search engine for media monitoring with our own library, Luwak. Kafka is a distributed streaming platform built on a few simple but powerful concepts: everything it deals with is a stream of data (as in a messaging system), streams can be combined for processing, and they are stored reliably in a highly fault-tolerant way. It’s also massively scalable, since each stream (a topic) can be split into partitions spread across a cluster of servers.
For search applications, Kafka is a great choice for the ‘wiring’ between source data (databases, crawlers, flat files, feeds) and the search index, as well as other parts of the system. We’ve used other message-passing systems (such as RabbitMQ) in earlier projects, but none have matched the simplicity and power of Kafka. Combine the search index with analysis and visualisation tools such as Kibana and you can build scalable, real-time systems for ingesting, storing, searching and analysing huge volumes of data. We’ve already done this, for example, for clients in the financial sector who wanted to monitor log data using open-source technology rather than commercial tools such as Splunk.
The development of Kafka has been masterminded by our partners Confluent, and it’s a testament to their careful management that the milestone 1.0 version has only just appeared. This doesn’t mean that previous versions weren’t production-ready – far from it – but it’s a sign that Kafka has now matured into a truly enterprise-scale project. Congratulations to the whole Kafka team on this great achievement.
We look forward to working more with this great software – and if you need help with your Kafka project do get in touch!