cassandra – Flax http://www.flax.co.uk The Open Source Search Specialists Thu, 10 Oct 2019 09:03:26 +0000 en-GB hourly 1 https://wordpress.org/?v=4.9.8 Apache Kafka London Meetup – Real time search and insights http://www.flax.co.uk/blog/2016/04/14/apache-kafka-london-meetup-real-time-search-insights/ http://www.flax.co.uk/blog/2016/04/14/apache-kafka-london-meetup-real-time-search-insights/#respond Thu, 14 Apr 2016 09:50:05 +0000 http://www.flax.co.uk/?p=3202 The rise of Apache Kafka as a streaming data solution is something we’ve been watching for a while – as part of a collection of Big Data tools, it provides a ‘TiVo for data‘ feature. We’ve begun to use it … More

The post Apache Kafka London Meetup – Real time search and insights appeared first on Flax.

]]>
The rise of Apache Kafka as a streaming data solution is something we’ve been watching for a while – as part of a collection of Big Data tools, it provides a ‘TiVo for data‘ feature. We’ve begun to use it in client projects covering both search and log analysis and we’ve recently partnered with Confluent, founded by the creators of Kafka.

Last night we spoke at the Apache Kafka London Meetup – hosted by British Gas Connected Homes, it was well supplied with drinks, pizza and snacks and also very well attended – there was a great buzz of conversation before the talks had even started! Alan Woodward of Flax started with an updated talk about our proof-of-concept integration of Kafka, Apache Samza and our own Luwak streaming search library (slides are available here). This allows full-text search within a Kafka stream, with the search queries supplied as another stream, for a truly real-time solution – as opposed to the more usual (and much higher latency) approach of indexing the endpoint of a stream. Alan has also tried the very new Kafka Streams feature which can be used as an alternative to Apache Samza – there is some very early code available, although note that this still needs some work! (We’ll update this blog when it’s finished).

The second talk was by one of our hosts, Josep Casals, on how British Gas have used Kafka, Spark Streaming and Apache Cassandra to build a platform for analyzing data from smart meters, boilers and thermostats. Over 2 million smart meters are installed across the UK and there are also over 300,000 connected thermostats, plus many other data sources, and these devices can report every 30 minutes and 2 minutes respectively, so their system has to cope with around 30,000 messages/second. One interesting feature for me was how machine learning is used to disaggregrate power consumption data, so the consumption for say, a fridge can be split out from the overall figure. Apache Samza is also used in this system to provide estimates of consumption and interpolate between readings, allowing data to be fed back to an app on the customer’s mobile device. Further use cases include spotting outlier events, which might indicate failing heating devices or even unusual patterns in an elderly person’s home to alert relatives or carers.

Both talks were live streamed and you can watch them here.

We concluded with some informal discussion and a chance to meet some of Confluent’s UK-based team. Thanks to the organisers and hosts and we look forward to returning! If you have a Kafka project and you’d like any help or advice, do let us know.

The post Apache Kafka London Meetup – Real time search and insights appeared first on Flax.

]]>
http://www.flax.co.uk/blog/2016/04/14/apache-kafka-london-meetup-real-time-search-insights/feed/ 0
Elastic London User Group Meetup – scaling with Kafka and Cassandra http://www.flax.co.uk/blog/2015/03/26/elastic-london-user-group-meetup-scaling-with-kafka-and-cassandra/ http://www.flax.co.uk/blog/2015/03/26/elastic-london-user-group-meetup-scaling-with-kafka-and-cassandra/#respond Thu, 26 Mar 2015 10:41:02 +0000 http://www.flax.co.uk/blog/?p=1421 The Elastic London User Group Meetup this week was slightly unusual in that the talks focussed not so much on Elasticsearch but rather on how to scale the systems around it using other technologies. First up was Paul Stack with … More

The post Elastic London User Group Meetup – scaling with Kafka and Cassandra appeared first on Flax.

]]>
The Elastic London User Group Meetup this week was slightly unusual in that the talks focussed not so much on Elasticsearch but rather on how to scale the systems around it using other technologies. First up was Paul Stack with an amusing description of how he had worked on scaling the logging infrastructure for a major restaurant booking website, to cope with hundreds of millions of messages a day across up to 6 datacentres. Moving from an original architecture based on SQL and ASP.NET, they started by using Redis as a queue and Logstash to feed the logs to Elasticsearch. Further instances of Logstash were added to glue other parts of the system together but Redis proved unable to handle this volume of data reliably and a new architecture was developed based on Apache Kafka, a highly scalable message passing platform originally built at LinkedIn. Kafka proved very good at retaining data even under fault conditions. He continued with a description of how the Kafka architecture was further modified (not entirely successfully) and how monitoring systems based on Nagios and Graphite were developed for both the Kafka and Elasticsearch nodes (with the infamous split brain problem being one condition to be watched for). Although the project had its problems, the system did manage to cope with 840 million messages one Valentine’s day, which is impressive. Paul concluded that although scaling to this level is undeniably hard, Kafka was a good technology choice. Some of his software is available as open source.

Next, Jamie Turner of PostcodeAnywhere described in general terms how they had used Apache Cassandra and Apache Spark to build a scalable architecture for logging interactions with their service, so they could learn about and improve customer experiences. They explored many different options for their database, including MySQL and MongoDB (regarding Mongo, Jamie raised a laugh with ‘bless them, they do try’) before settling on Cassandra which does seem to be a popular choice for a rock-solid distributed database. As PostcodeAnywhere are a Windows house, the availability and performance of .Net compatible clients was key and luckily they have had a good experience with the NEST client for Elasticsearch. Although light on technical detail, Jamie did mention how they use Markov chains to model customer experiences.

After a short break for snacks and beer we returned for a Q&A with Elastic team members: one interesting announcement was that there will be a Elastic(on) in Europe some time this year (if anyone from the Elastic team is reading this please try and avoid a clash with Enterprise Search Europe on October 20th/21st!). Thanks as ever to Yann Cluchey for organising the event and to open source recruiters eSynergySolutions for sponsoring the venue and refreshments.

The post Elastic London User Group Meetup – scaling with Kafka and Cassandra appeared first on Flax.

]]>
http://www.flax.co.uk/blog/2015/03/26/elastic-london-user-group-meetup-scaling-with-kafka-and-cassandra/feed/ 0
Cambridge Search Meetup – Cassandra & Solr http://www.flax.co.uk/blog/2014/05/15/cambridge-search-meetup-cassandra-solr/ http://www.flax.co.uk/blog/2014/05/15/cambridge-search-meetup-cassandra-solr/#respond Thu, 15 May 2014 07:32:23 +0000 http://www.flax.co.uk/blog/?p=1202 A sunny evening last night for the latest Cambridge Search Meetup, which featured a couple of talks from Datastax on the highly scalable NoSQL database Apache Cassandra and how it is integrated with Apache Lucene/Solr. Jeremy Hanna started us off … More

The post Cambridge Search Meetup – Cassandra & Solr appeared first on Flax.

]]>
A sunny evening last night for the latest Cambridge Search Meetup, which featured a couple of talks from Datastax on the highly scalable NoSQL database Apache Cassandra and how it is integrated with Apache Lucene/Solr. Jeremy Hanna started us off with a brief history of the Facebook-incubated Cassandra, which is a fully distributed, highly reliable system used by many including Netflix and Spotify with some customers running thousands of nodes in multiple data centres. Cassandra has its own SQL-like language, CQL3 and some basic collections such as Lists and Maps, but due to its fully distributed nature does lack some traditional features such as JOINs. Datastax themselves are now responsible for most of the ongoing work on Cassandra and offer the usual array of training, support, management services and tools. One common application mentioned was high speed and reliable recording of sensor data, increasingly important now with the rise of the Internet of Things.

After a short break for drinks and snacks (which this time were kindly sponsored by Datastax) Sergio Bossa told us how Solr is integrated with Cassandra, also running in a distributed fashion. Interestingly, this integration doesn’t use the same Zookeeper system as SolrCloud (the standard way to run clusters of Solr servers) but relies instead on Cassandra’s own internal scaling systems, passing data about using ‘gossip‘ between nodes. Zookeeper is not always the easiest thing to get running so an alternative is very interesting! Data can be added to the system over HTTP or the aforementioned CQL3 and after being entered into Cassandra’s tables is subsequently indexed by Solr. Queries can then be made over HTTP as usual. Some work is still necessary to prevent duplication of effort (at present one needs to create data structures in Cassandra and subsequently in Solr).

It was pleasing so see that so much care has been taken with this integration process and also that Datastax offer their Datastax Enterprise Search stack not only free for non-production use, but free to startups. Thanks to Jeremy, Sergio and all who came along and we’ll be back with another Search Meetup soon.

The post Cambridge Search Meetup – Cassandra & Solr appeared first on Flax.

]]>
http://www.flax.co.uk/blog/2014/05/15/cambridge-search-meetup-cassandra-solr/feed/ 0