London Elasticsearch User Group – September Meetup

Last night I joined a good-sized crowd at a venue on Hoxton Square for some talks on Elasticsearch. This Meetup group is very popular and always attracts a good proportion of people new to the world of search, as well as some familiar faces. I started with a quick announcement of our own Elasticsearch hackday in a few weeks' time.

First of the speakers was Richard Pijnenburg with a surprisingly brief talk on Puppet and Elasticsearch – brief, because integrating the two is apparently very simple, requiring only a few lines of Puppet code. Some questions from the floor sparked a discussion of combining Puppet and Vagrant for setting up Elasticsearch instances: apparently very soon we’ll see a complete demo instance of Elasticsearch built using these technologies and including some example data, which will be very useful for those wanting to get started with the engine (here’s some more on this combination).

Next was Amit Talhan, ably assisted by Geza Kerekes, both from AlignAlytics, who have been using Elasticsearch as a data store, as a reporting store and more recently for analysing data from a survey of all the retail outlets in Nigeria. Generating a wealth of data across up to 1000 fields, including geolocation data harvested every five seconds, this survey could have been difficult if not impossible to handle using a traditional SQL database, but many of their colleagues were very used to SQL syntax and methods for analysing data. Amit and Geza explained how they have used Elasticsearch, and in particular aggregations, to provide functionality such as checking for bad reporting by surveyors and spotting unexpectedly high-density areas (such as markets, where there may be 200 retail outlets in a few square metres). One challenge seems to have been how to explain to colleagues from the data analysis community that Elasticsearch can provide some, but not all, of the functionality of a traditional database, and that alternative ways of indexing and querying data can be used to solve the same problems. Interestingly, performance testing by AlignAlytics showed that BigStep, a provider of ‘bare metal’ cloud hosting, could provide much better performance than their own dedicated servers.
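As a rough illustration of how an aggregation might surface high-density areas, here is a minimal Python sketch that builds a `geohash_grid` aggregation request and picks out crowded cells from the response. This is not AlignAlytics' actual implementation: the field name `location`, the aggregation name `cells`, the precision and the sample response are all assumptions made for the example, with the threshold of 200 borrowed from the market example above.

```python
def density_request(field="location", precision=8):
    """Build a search body that buckets documents into geohash cells.

    `field` is a hypothetical geo_point field; precision 8 gives cells
    a few tens of metres across.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "cells": {
                "geohash_grid": {"field": field, "precision": precision}
            }
        },
    }


def dense_cells(response, threshold=200):
    """Return (geohash, doc_count) pairs for cells at or above the threshold."""
    buckets = response["aggregations"]["cells"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets
            if b["doc_count"] >= threshold]


# A hand-written response in the shape Elasticsearch returns for this
# aggregation; the geohashes and counts are invented for illustration.
sample_response = {
    "aggregations": {
        "cells": {
            "buckets": [
                {"key": "s14mh5k2", "doc_count": 214},
                {"key": "s14mh5k3", "doc_count": 12},
                {"key": "s14mh5k6", "doc_count": 3},
            ]
        }
    }
}

request_body = density_request()
crowded = dense_cells(sample_response)
```

The body returned by `density_request()` would be sent to the `_search` endpoint of the relevant index; the same bucket-and-threshold pattern could in principle be turned around to flag suspiciously empty or duplicated reports.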

Next was Mark Harwood with another of his fascinating investigations into how Elasticsearch can be used for analysis of user behaviour. After a bad personal experience buying a new battery that turned out to be second-hand, he showed how he had identified Amazon.com vendors with suspiciously positive reviews. He also discussed how behaviour-based term suggesters might be built using Elasticsearch’s significant_terms aggregation. His demonstration did remind me slightly of Xapian’s relevance feedback feature. I heard several people later say that they wished they had time for some of the fun projects Mark seems to work on!
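To give a flavour of the suggester idea, the following Python sketch builds a `significant_terms` request that asks which terms are unusually common among documents matching a seed term, then ranks the returned buckets. It is a guess at the general shape rather than Mark's actual code: the field name `search_terms`, the aggregation name `suggestions` and the sample response are hypothetical.

```python
def suggestion_request(seed_term, field="search_terms", size=5):
    """Build a search body for terms that co-occur unusually often with seed_term.

    `field` is a hypothetical field holding the terms each user searched for.
    """
    return {
        "size": 0,  # only the aggregation is of interest
        "query": {"match": {field: seed_term}},
        "aggs": {
            "suggestions": {
                "significant_terms": {"field": field, "size": size}
            }
        },
    }


def suggested_terms(response, seed_term):
    """Return bucket keys ordered by significance score, excluding the seed."""
    buckets = response["aggregations"]["suggestions"]["buckets"]
    ranked = sorted(buckets, key=lambda b: b["score"], reverse=True)
    return [b["key"] for b in ranked if b["key"] != seed_term]


# Invented response in the shape significant_terms returns.
sample_response = {
    "aggregations": {
        "suggestions": {
            "buckets": [
                {"key": "battery", "doc_count": 120, "score": 0.9, "bg_count": 150},
                {"key": "charger", "doc_count": 40, "score": 0.6, "bg_count": 90},
                {"key": "laptop", "doc_count": 35, "score": 0.2, "bg_count": 900},
            ]
        }
    }
}

body = suggestion_request("battery")
suggestions = suggested_terms(sample_response, "battery")
```

The seed term itself is filtered out because it will normally be the most significant term in its own result set.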

The event finished with some lively discussion and some free pizza courtesy of Elasticsearch (the company). Thanks to Yann Cluchey as ever for organising the event. I look forward to seeing a few of the attendees in Cambridge soon: we’re only an hour or so by train from London plus a ten-minute walk to the venue, so it should be an easy trip!

2 thoughts on “London Elasticsearch User Group – September Meetup”

  1. I have a problem understanding Elasticsearch.

    Here’s my example:

    for query:

    curl -XPOST 'http://localhost:9200/_search' -d '
    {
      "query": {
        "term": {
          "main_place": "katowice"
        }
      }
    }'

    I get 0 results, but I do have documents with main_place = katowice.
    I have a clue:

    curl -XPOST 'http://localhost:9200/_search' -d '
    {
      "query": {
        "term": {
          "main_place": "katowic"
        }
      }
    }'

    This returns documents with main_place = katowice and one document with main_place = katowic.
    The problem seems to be that katowic is a prefix of katowice.

    There is no problem with this query:

    curl -XPOST 'http://localhost:9200/_search' -d '
    {
      "query": {
        "term": {
          "main_place": "warszawa"
        }
      }
    }'

    I get good results, but there is no “warszaw” in main_places.

    I’m waiting for solutions. I know that “text” instead of “term” works, but not with “custom_filter_score” filters.

  2. I’m not sure I understand the problem correctly, but are you indexing with any stemming for text fields? (e.g. have you set up any custom mappings?) “term” queries bypass analysis, so may not match stemmed fields the way you expect.
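To illustrate the point made in the reply above, here is a toy Python model of an analysed field. It is only a sketch: the "stemmer" simply strips a trailing "e", which is an assumption chosen to reproduce the symptoms described in the first comment and is not how any real Elasticsearch analyzer works. The class and function names are hypothetical.

```python
from collections import defaultdict


def toy_analyze(text):
    """Lowercase, split on whitespace, strip a trailing 'e' (toy stemmer)."""
    return [t[:-1] if t.endswith("e") else t for t in text.lower().split()]


class ToyIndex:
    """A tiny inverted index whose tokens are produced by toy_analyze."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in toy_analyze(text):
            self.postings[token].add(doc_id)

    def term_query(self, term):
        # No analysis: the term must equal an indexed token exactly.
        return sorted(self.postings.get(term, set()))

    def match_query(self, text):
        # Analysed: the query text goes through the same analyzer first.
        hits = set()
        for token in toy_analyze(text):
            hits |= self.postings.get(token, set())
        return sorted(hits)


index = ToyIndex()
index.add(1, "Katowice")   # indexed as the token "katowic"
index.add(2, "katowic")    # indexed unchanged
index.add(3, "Warszawa")   # indexed as "warszawa"

term_katowice = index.term_query("katowice")    # exact token is not in the index
term_katowic = index.term_query("katowic")      # matches documents 1 and 2
term_warszawa = index.term_query("warszawa")    # token is unchanged, so it matches
match_katowice = index.match_query("katowice")  # analysed first, so it matches
```

If something like this is happening, the usual options are to query with an analysed query type, or to index the field (or a copy of it) without analysis so that exact term lookups behave as expected.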
