It’s not just about technology – training for search managers is vital

A few weeks ago I sat in on a workshop in London at the Taxonomy Boot Camp conference, run by Jeff Fried of BA Insight. I've known Jeff for many years from various events and we share some views on how search systems should be built and managed - using best-of-breed technology and effective management processes. He was kind enough to ask me to join a recent podcast. During the podcast, we had a great conversation about open source search, enterprise...Continue reading

Worth the wait – Apache Kafka hits 1.0 release

We've known about Apache Kafka for several years now - we first encountered it when we developed a prototype streaming Boolean search engine for media monitoring with our own library Luwak. Kafka is a distributed streaming platform with some simple but powerful concepts - everything it deals with is a stream ...Continue reading

Elastic London Meetup: Rightmove & Signal Media and a new free security plugin for Elasticsearch

I finally made it to a London Elastic Meetup again after missing a few of the recent events: this time Rightmove were the hosts and the first speakers. They described how they had used Elasticsearch Percolator to run 3.5 million stored searches on new property listings as part of an overall migration from the Exalead search engine and Oracle database to a new stack bas...Continue reading

How to build a search relevance team

We've spent a lot of time working with clients who recognise that their search engine isn't delivering relevant results to users. Often this is seen as solely a technical problem, which can be resolved simply by changing query parameters or the search engine configuration - but technical teams need clear direction on why a result should or should not appear at a certain position, not just requests for general relevance improvements. It's thus important to consider relevance as...Continue reading

Better performance with the Logstash DNS filter

We've been working on a project for a customer which uses Logstash to read messages from Kafka and write them to Elasticsearch. It also parses the messages into fields and, depending on the content type, does DNS lookups (both forward and reverse). While performance testing, I noticed that adding caching to the Logstash DNS filter actually reduced performance, contrary to expectations. With four filter worker threads, and the following configuration:

dns { 
  resolve => [ ...Continue reading

Elasticsearch, Kibana and duplicate keys in JSON

JSON has been the lingua franca of data exchange for many years. It's human-readable, lightweight and widely supported. However, the JSON spec does not define what parsers should do when they encounter a duplicate key in an object, e.g.:

{
  "foo": "spam",
  "foo": "eggs",
  ...
}
Implementations are free to interpret this how they like. When different systems have different interpretations this can cause problems. We recently encounter...Continue reading
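
As a small illustration of the point about differing interpretations, here is a sketch of how one parser, Python's standard `json` module, treats the duplicate-key object above: by default it silently keeps the last value. The helper `reject_duplicates` is a hypothetical name introduced only for this example; it is not taken from the post and says nothing about how Elasticsearch or Kibana behave.

```python
import json

# The duplicate-key object from the example above, without the elided fields.
RAW = '{"foo": "spam", "foo": "eggs"}'


def reject_duplicates(pairs):
    """Hypothetical object_pairs_hook: raise if a key repeats within one object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Default behaviour of Python's json module: the last value wins, with no warning.
last_wins = json.loads(RAW)

# Stricter behaviour: surface the duplicate instead of hiding it.
try:
    json.loads(RAW, object_pairs_hook=reject_duplicates)
    duplicate_found = False
except ValueError:
    duplicate_found = True
```

A check like this at an ingestion boundary is one possible way to catch such documents before two systems get the chance to disagree about them.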

Announcing our new book, Searching the Enterprise

For the last year or so I've been working with Professor Udo Kruschwitz of the University of Essex on a long-form journal article on enterprise search - although at 156 pages this is more of a book than a journal. Released as part of the Foundations and Trends® in Information Retrieval series by Now Publishing, the b...Continue reading

ECIR 2017 Industry Day, our book & a demo of live TV factchecking

I visited Aberdeen before Easter to speak at Industry Day, a part of the European Conference on Information Retrieval. Following a reception at Aberdeen's Town House (a wonderful building) hosted by the Lord Provost, I spent an evening with various information retrieval luminaries including Professor Udo Kruschwitz of the University of Essex. We had a chance to discuss the book we're co-authoring (draft title 'Searching the Enterprise', designed as a review of t...Continue reading