After a long day at a Real Time Analytics event (of which more later) I dropped into the Elasticsearch London User Group, hosted by Red Badger and provided with a ridiculously huge amount of pizza (I have a theory that you’ll be able to spot an Elasticsearch developer in a few years by the size of their pizza-filled belly).
First up was Reuben Sutton of Artirix, describing how his team had moved away from the Elasticsearch Ruby libraries (which can be very slow, mainly due to the time taken to decode/encode data as JSON) towards the relatively new Mustache templating framework. This has allowed them to remove anything complex to do with search from their UI code, although they have had some trouble with Mustache’s support for partial templates. They found documentation was somewhat lacking, but they have contributed some improvements to this.
Next was David Laing of CityIndex describing Logsearch, a powerful way to spin up clusters of ELK (Elasticsearch+Logstash+Kibana) servers for log analysis. Based on the BOSH toolchain and open sourced, this allows CityIndex to create clusters in minutes for handling large amounts of data (they are currently processing 50GB of logs every day). David showed how the system is resilient to server failure and will automatically ‘resurrect’ failed nodes, and interestingly how this enables them to use Amazon spot pricing at around a tenth of the cost of the more stable AWS offerings. I asked how this powerful system might be used in the general case of Elasticsearch cluster management but David said it is targetted at log processing – but of course according to some everything will soon be a log anyway!
The last talk was by Alex Mitchell and Francois Bouet of Growth Intelligence who provide lead generation services. They explained how they have used Elasticsearch at several points in their data flow – as a data store for the web pages they crawl (storing these in both raw and processed form using multi-fields), for feature generation using the term vector API and to encode simple business rules for particular clients – as well as to power the search features of their website, of course.
A short Q&A with some of the Elasticsearch team followed: we heard that the new Shield security plugin has had some third-party testing (the details of which I suggested are published if possible) and a preview of what might appear in the 2.0 release – further improvements to the aggregrations features including derivatives and anomaly detection sound very useful. A swift drink and natter about the world of search with Mark Harwood and it was time to get the train home. Thanks to all the speakers and of course Yann for organising as ever – see you next time!