The rise of Apache Kafka as a streaming data solution is something we’ve been watching for a while – as part of a collection of Big Data tools, it provides a ‘TiVo for data‘ feature. We’ve begun to use it in client projects covering both search and log analysis and we’ve recently partnered with Confluent, founded by the creators of Kafka.
Last night we spoke at the Apache Kafka London Meetup – hosted by British Gas Connected Homes, it was well supplied with drinks, pizza and snacks and also very well attended – there was a great buzz of conversation before the talks had even started! Alan Woodward of Flax started with an updated talk about our proof-of-concept integration of Kafka, Apache Samza and our own Luwak streaming search library (slides are available here). This allows full-text search within a Kafka stream, with the search queries supplied as another stream, for a truly real-time solution – as opposed to the more usual (and much higher latency) approach of indexing the endpoint of a stream. Alan has also tried the very new Kafka Streams feature which can be used as an alternative to Apache Samza – there is some very early code available, although note that this still needs some work! (We’ll update this blog when it’s finished).
The second talk was by one of our hosts, Josep Casals, on how British Gas have used Kafka, Spark Streaming and Apache Cassandra to build a platform for analyzing data from smart meters, boilers and thermostats. Over 2 million smart meters are installed across the UK and there are also over 300,000 connected thermostats, plus many other data sources, and these devices can report every 30 minutes and 2 minutes respectively, so their system has to cope with around 30,000 messages/second. One interesting feature for me was how machine learning is used to disaggregrate power consumption data, so the consumption for say, a fridge can be split out from the overall figure. Apache Samza is also used in this system to provide estimates of consumption and interpolate between readings, allowing data to be fed back to an app on the customer’s mobile device. Further use cases include spotting outlier events, which might indicate failing heating devices or even unusual patterns in an elderly person’s home to alert relatives or carers.
Both talks were live streamed and you can watch them here.
We concluded with some informal discussion and a chance to meet some of Confluent’s UK-based team. Thanks to the organisers and hosts and we look forward to returning! If you have a Kafka project and you’d like any help or advice, do let us know.