Elasticsearch Percolator & Luwak: a performance comparison of streamed search implementations

Most search applications work by indexing a relatively stable collection of documents and then allowing users to perform ad-hoc searches to retrieve relevant documents. However, in some cases it is useful to turn this model on its head and match individual documents against a collection of saved queries. I shall refer to this model as "streamed search". One example of streamed search is media monitoring. The monitoring agency's ...Continue reading

Free file filters, search & taxonomy tools from our old Googlecode repository

Google's Google Code service is closing down, in case you hadn't heard, and I've just started the process of moving everything over to our GitHub account. This prompted me to take a look at what's there, and there's a surprising amount of open source code I'd forgotten about. So, here's a quick rundown of the useful tools, examples...Continue reading

Comparing Solr and Elasticsearch – here's the code we used

A couple of weeks ago we presented the initial results of a performance study comparing Apache Solr and Elasticsearch, carried out by my colleague Tom Mortimer. Over the last few years we've tested both engines for client projects and noticed some significant performance differences, which we thought deserved fuller investigation. ...Continue reading

Solr geolocation searches using WKT – latitude or longitude first?

Matt Pearce writes: We have been working with a client who needs to search for documents based on location, using either a single point or (sometimes very) complex polygons. They supplied the location data in WKT format, which we assumed we could feed directly into our search engine (in this case Solr) without any modification. Then we started testing the location se...Continue reading

Why GCloud search is badly broken & how to fix it

The GCloud initiative and the associated CloudStore are a great idea: they aim to level the playing field of UK government IT supply, take advantage of flexible and agile delivery of software and services, and help SMEs like ourselves compete against the large System Integrators (SIs) that dominate this market. GCloud sales have now reached £154m, although this is still a fraction of what the UK government spends on IT. We're on GCloud 5 ...Continue reading

Searching for IP addresses in text with Elasticsearch

We recently implemented a search solution for a customer using Elasticsearch. Most of their requirements were fairly standard; however, they also wanted to be able to search for IP addresses embedded in the document text, using a flexible and precise search syntax, e.g. given the following document fragment:

    ... the API can be accessed at on port 8700 ...

the following searches should all find the document:

...Continue reading

How we built a search engine for UK MP tweets with Solr, Python & StanfordNLP

Matt Pearce writes: We recently released UKMP, a search application built on work done on last year's Enterprise Search hack day. This presents the tweets of UK Members of Parliament with search options including filtering by party, retweet and favourite count, and entities (people, locations and organisations) ex...Continue reading