Worth the wait – Apache Kafka hits 1.0 release

We've known about Apache Kafka for several years now: we first encountered it when we developed a prototype streaming Boolean search engine for media monitoring with our own library, Luwak. Kafka is a distributed streaming platform with some simple but powerful concepts: everything it deals with is a stream ...

Elastic London Meetup: Rightmove & Signal Media and a new free security plugin for Elasticsearch

I finally made it to a London Elastic Meetup again after missing a few of the recent events: this time Rightmove were the hosts and the first speakers. They described how they had used Elasticsearch Percolator to run 3.5 million stored searches on new property listings as part of an overall migration from the Exalead search engine and Oracle database to a new stack bas...
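Percolator inverts the usual search: the queries are stored, and each new document is run against them. The toy Python sketch below illustrates that idea under simplifying assumptions. Each stored search is treated as a plain list of required words (a Boolean AND), and the names `StoredSearches` and `tokenize` are hypothetical. This is not Percolator's implementation or API. To avoid checking every stored search against every listing, each query is indexed under one of its terms, so a new listing is only verified against queries that could possibly match.

```python
import re
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-cased word tokens; a deliberately naive analyser (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


class StoredSearches:
    """Hypothetical toy reverse search: match new documents against stored queries.

    Not Elasticsearch Percolator's code. Each stored search is an AND of terms.
    """

    def __init__(self) -> None:
        self._queries: Dict[str, FrozenSet[str]] = {}
        # Maps a single term of each query to the IDs of queries indexed under it.
        self._by_term: Dict[str, Set[str]] = defaultdict(set)

    def add(self, query_id: str, terms: List[str]) -> None:
        required = frozenset(t.lower() for t in terms)
        if not required:
            raise ValueError("a stored search needs at least one term")
        self._queries[query_id] = required
        # A matching document must contain ALL required terms, so indexing the
        # query under any one of them never misses a match.
        self._by_term[min(required)].add(query_id)

    def match(self, document: str) -> List[str]:
        doc_terms = tokenize(document)
        # Candidate selection: only queries indexed under a term in this document.
        candidates: Set[str] = set()
        for term in doc_terms:
            candidates |= self._by_term.get(term, set())
        # Verification: check each candidate's full set of required terms.
        return sorted(q for q in candidates if self._queries[q] <= doc_terms)


searches = StoredSearches()
searches.add("s1", ["garden", "london"])
searches.add("s2", ["garage", "bristol"])
print(searches.match("Three bedroom house with garden in London"))  # ['s1']
```

Indexing each query under its alphabetically first term is an arbitrary choice made for brevity. Picking a rare term instead would produce fewer candidates to verify, which matters when there are millions of stored searches.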

How to build a search relevance team

We've spent a lot of time working with clients who recognise that their search engine isn't delivering relevant results to users. Often this is seen as solely a technical problem, which can be resolved simply by changing query parameters or the search engine configuration. However, technical teams need clear direction on why a result should or should not appear at a certain position, not just requests for general relevance improvements. It's thus important to consider relevance as...

Better performance with the Logstash DNS filter

We've been working on a project for a customer which uses Logstash to read messages from Kafka and write them to Elasticsearch. It also parses the messages into fields and, depending on the content type, does DNS lookups (both forward and reverse). While performance testing, I noticed that adding caching to the Logstash DNS filter actually reduced performance, contrary to expectations. With four filter worker threads and the following configuration:

dns {
  resolve => [ ...
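As a generic illustration of what caching DNS lookups involves, here is a minimal Python sketch of a bounded, thread-safe LRU cache placed in front of a lookup function. It is not the Logstash DNS filter's own code and does not explain the slowdown described above. `LookupCache` and `fake_reverse_lookup` are hypothetical names; in practice you would pass in a real forward or reverse resolver. One design choice is worth noting. The lock only guards the dictionary, and the lookup itself runs outside it, because holding a lock across a slow network call would make every worker thread wait behind that one call.

```python
import threading
from collections import OrderedDict
from typing import Callable, List, Optional


class LookupCache:
    """Hypothetical bounded LRU cache in front of a slow lookup (e.g. DNS)."""

    def __init__(self, lookup: Callable[[str], Optional[str]], max_size: int = 1024) -> None:
        self._lookup = lookup
        self._max_size = max_size
        self._entries = OrderedDict()  # key -> result, least recently used first
        self._lock = threading.Lock()  # guards _entries only, never the lookup

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            if key in self._entries:
                self._entries.move_to_end(key)  # mark as most recently used
                return self._entries[key]
        # Cache miss: call the slow resolver WITHOUT holding the lock, so other
        # threads can keep reading cached entries. Two threads may occasionally
        # resolve the same key at once; the later result simply overwrites.
        value = self._lookup(key)
        with self._lock:
            self._entries[key] = value  # failed lookups (None) are cached too
            self._entries.move_to_end(key)
            if len(self._entries) > self._max_size:
                self._entries.popitem(last=False)  # evict least recently used
        return value


# Usage with a fake reverse resolver (hypothetical; no network access needed).
calls: List[str] = []


def fake_reverse_lookup(ip: str) -> Optional[str]:
    calls.append(ip)
    return {"192.0.2.1": "host-a.example"}.get(ip)


cache = LookupCache(fake_reverse_lookup, max_size=2)
print(cache.get("192.0.2.1"))  # host-a.example
print(cache.get("192.0.2.1"))  # host-a.example (served from the cache)
print(len(calls))              # 1
```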

Elasticsearch, Kibana and duplicate keys in JSON

JSON has been the lingua franca of data exchange for many years. It's human-readable, lightweight and widely supported. However, the JSON spec does not define what parsers should do when they encounter a duplicate key in an object, e.g.:

{
  "foo": "spam",
  "foo": "eggs",
  ...
}
Implementations are free to interpret this however they like, and when different systems interpret it differently, problems can arise. We recently encounter...
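Python's standard `json` module is one example of a parser that picks an interpretation: it silently keeps the last value for a repeated key. The sketch below shows that behaviour, plus one way to detect duplicates instead, using the module's `object_pairs_hook` parameter. It says nothing about how Elasticsearch or Kibana treat the same input. `reject_duplicate_keys` and `strict_loads` are hypothetical helper names.

```python
import json
from typing import Any, List, Tuple


def reject_duplicate_keys(pairs: List[Tuple[str, Any]]) -> dict:
    """object_pairs_hook that refuses any JSON object with a repeated key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        seen[key] = value
    return seen


def strict_loads(text: str) -> Any:
    # The hook is called for every object, including nested ones.
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


doc = '{"foo": "spam", "foo": "eggs"}'

# The default parser keeps the last value it sees.
print(json.loads(doc))  # {'foo': 'eggs'}

try:
    strict_loads(doc)
except ValueError as err:
    print(err)  # duplicate key in JSON object: 'foo'
```

Rejecting duplicates at ingest time is one way to make the interpretation explicit, rather than letting each downstream system choose its own.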

Announcing our new book, Searching the Enterprise

For the last year or so I've been working with Professor Udo Kruschwitz of the University of Essex on a long-form journal article on enterprise search, although at 156 pages it is more of a book than a journal article. Released as part of the Foundations and Trends® in Information Retrieval series by Now Publishing, the b...

ECIR 2017 Industry Day, our book & a demo of live TV factchecking

I visited Aberdeen before Easter to speak at Industry Day, part of the European Conference on Information Retrieval. Following a reception at Aberdeen's Town House (a wonderful building) hosted by the Lord Provost, I spent an evening with various information retrieval luminaries including Professor Udo Kruschwitz of the University of Essex. We had a chance to discuss the book we're co-authoring (draft title 'Searching the Enterprise', designed as a review of t...

London Lucene/Solr Meetup – Introducing Marple & Solr Classification

A small crowd for this month's London Lucene/Solr Meetup, kindly hosted by Barclays in their sumptuous Canary Wharf offices. I introduced the Meetup and spoke briefly on how Flax is currently looking for team members (want to work on a variety of cutting-edge open source search projects in the UK and abroad? Get in touch!) before introducing Flax's Alan Woodwar...

Recipe for a strategic search review

We're sometimes asked by clients to examine not just their technical implementation of search, but also the wider picture: how search functionality is exposed to users, and how it compares with competitors' websites and with best practice. This usually takes us ten days to two weeks and results in a highly detailed report with clear recommendations for improvement. The process differs slightly each time, but usually includes the following steps, shown with some examples of questions we mig...