Comparing Solr and Elasticsearch – here's the code we used

A couple of weeks ago we presented the initial results of a performance study between Apache Solr and Elasticsearch, carried out by my colleague Tom Mortimer. Over the last few years we've tested both engines for client projects and noticed some significant performance differences, which we thought deserved fuller investigation. ...

Solr geolocation searches using WKT – latitude or longitude first?

Matt Pearce writes: We have been working with a client who needs to search for documents based on location, using either a single point or (sometimes very) complex polygons. They supplied the location data in WKT format, which we assumed we could feed directly into our search engine (in this case Solr) without modification. Then we started testing the location se...
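For reference, WKT writes each coordinate as X then Y, which for geographic data means longitude before latitude. The minimal Python sketch below is plain string handling, not a Solr API, and its helper names (`wkt_pairs`, `plausible_lon_lat`, `swap_xy`) are hypothetical. It pulls coordinate pairs out of simple POINT/POLYGON strings, runs a crude range check, and swaps the order if data turns out to have been supplied latitude-first.

```python
import re

# WKT writes each coordinate as "X Y". For geographic data X is longitude
# and Y is latitude, so London is "POINT (-0.1276 51.5072)".
_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def wkt_pairs(wkt: str) -> list[tuple[float, float]]:
    """Return the (x, y) pairs in a simple POINT or POLYGON string.

    A regex sketch, not a full WKT parser (no exponents, no Z/M values).
    """
    return [(float(x), float(y)) for x, y in _PAIR.findall(wkt)]


def plausible_lon_lat(pairs: list[tuple[float, float]]) -> bool:
    """True if every pair fits longitude [-180, 180] and latitude [-90, 90].

    Only catches swaps where a longitude exceeds +/-90; lat-first data for
    most of the UK would still pass, so this is a hint, not proof.
    """
    return all(-180 <= x <= 180 and -90 <= y <= 90 for x, y in pairs)


def swap_xy(wkt: str) -> str:
    """Rewrite every "A B" pair as "B A", e.g. for data supplied lat-first."""
    return _PAIR.sub(lambda m: f"{m.group(2)} {m.group(1)}", wkt)


if __name__ == "__main__":
    print(wkt_pairs("POINT (-0.1276 51.5072)"))   # [(-0.1276, 51.5072)]
    print(plausible_lon_lat([(-33.87, 151.21)]))  # False: 151.21 is not a latitude
    print(swap_xy("POINT (51.5072 -0.1276)"))     # POINT (-0.1276 51.5072)
```

The range check is deliberately weak: a lat/long mix-up often produces coordinates that are still valid, just in the wrong place, so testing against a few known locations is more reliable.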

Why GCloud search is badly broken & how to fix it

The GCloud initiative and the associated CloudStore are a great idea, intended to level the playing field of UK government IT supply, take advantage of flexible and agile delivery of software and services, and help SMEs like ourselves compete against the large System Integrators (SIs) that dominate this market. GCloud sales have now reached £154m, although this is still a fraction of what the UK government spends on IT. We're on GCloud 5...

Searching for IP addresses in text with Elasticsearch

We recently implemented a search solution for a customer using Elasticsearch. Most of their requirements were fairly standard; however, they also wanted to be able to search for IP addresses embedded in the document text, using a flexible and precise search syntax. For example, given the following document fragment:

    ... the API can be accessed at 167.87.3.201 on port 8700 ...

the following searches should all find the document:

...
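The post's actual search syntax is cut off in this excerpt, so what follows is only one possible approach, not necessarily the one it describes. The idea is to index every IPv4 address together with its octet prefixes, similar in spirit to what Elasticsearch's `path_hierarchy` tokenizer emits with `.` as the delimiter, so that a prefix search becomes an exact term lookup. This is a plain-Python sketch; the function names are hypothetical and no Elasticsearch client is involved.

```python
import ipaddress
import re

# Dotted-quad candidates; ipaddress then rejects impossible ones like 999.1.1.1.
_IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def extract_ipv4(text: str) -> list[str]:
    found = []
    for candidate in _IPV4.findall(text):
        try:
            found.append(str(ipaddress.IPv4Address(candidate)))
        except ipaddress.AddressValueError:
            pass
    return found


def prefix_tokens(ip: str) -> list[str]:
    """'167.87.3.201' -> ['167', '167.87', '167.87.3', '167.87.3.201']."""
    octets = ip.split(".")
    return [".".join(octets[: i + 1]) for i in range(len(octets))]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every address prefix token to the ids of documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for ip in extract_ipv4(text):
            for token in prefix_tokens(ip):
                index.setdefault(token, set()).add(doc_id)
    return index


def search_ip(index: dict[str, set[str]], query: str) -> set[str]:
    """Exact address, or an octet prefix written as '167.87' or '167.87.*'."""
    return index.get(query.removesuffix(".*"), set())


if __name__ == "__main__":
    docs = {"doc1": "... the API can be accessed at 167.87.3.201 on port 8700 ..."}
    index = build_index(docs)
    print(search_ip(index, "167.87.3.201"))  # {'doc1'}
    print(search_ip(index, "167.87.*"))      # {'doc1'}
    print(search_ip(index, "167.8"))         # set(): prefixes stop at octet boundaries
```

Tokenizing on octet boundaries is what keeps the search precise: `167.8` does not accidentally match `167.87`, which a naive string-prefix match would allow.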

How we built a search engine for UK MP tweets with Solr, Python & StanfordNLP

Matt Pearce writes: We recently released UKMP, a search application built on work done on last year's Enterprise Search hack day. This presents the tweets of UK Members of Parliament with search options including filtering by party, retweet and favourite count, and entities (people, locations a...
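As an illustration of how filters like these are commonly expressed in Solr, the sketch below builds `/select` request parameters with one `fq` (filter query) per filter and range syntax such as `[10 TO *]` for counts. It assumes a hypothetical schema: the field names `party`, `retweet_count`, `favourite_count` and `entity_person` are not taken from UKMP itself.

```python
from urllib.parse import parse_qs, urlencode


def build_tweet_query(text="*:*", party=None, min_retweets=None,
                      min_favourites=None, person=None):
    """Return a Solr /select query string (hypothetical field names)."""
    params = [("q", text)]
    # One fq per filter, so Solr can cache each filter independently.
    if party:
        params.append(("fq", f'party:"{party}"'))
    if min_retweets is not None:
        params.append(("fq", f"retweet_count:[{min_retweets} TO *]"))
    if min_favourites is not None:
        params.append(("fq", f"favourite_count:[{min_favourites} TO *]"))
    if person:
        params.append(("fq", f'entity_person:"{person}"'))
    # Facet counts could populate the filter options shown beside results.
    params += [("facet", "true"), ("facet.field", "party"),
               ("facet.field", "entity_person")]
    return urlencode(params)


if __name__ == "__main__":
    qs = build_tweet_query("nhs", party="Labour", min_retweets=10)
    print(parse_qs(qs)["fq"])  # ['party:"Labour"', 'retweet_count:[10 TO *]']
```

Keeping the free-text `q` separate from the `fq` filters means the filters do not affect relevance scoring, only which tweets are returned.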

Principles of Solr application design – part 2 of 2

We’ve been working internally on a document encapsulating how we build (and recommend others should build) search applications based on Apache Solr, probably the most popular open source search engine library. As an early Christmas present we’re releasing these as a two-part series; if you have any feedback we’d welcome comments! Here’s the second part; you can also read the ...

Principles of Solr application design – part 1 of 2

We've been working internally on a document encapsulating how we build (and recommend others should build) search applications based on Apache Solr, probably the most popular open source search engine library. As an early Christmas present we're releasing these as a two-part series; if you have any feedback we'd welcome comments! So, without further ado, here's the first part: 1. Use the latest release of Solr. Unless there are compelling rea...

Introducing Luwak, a library for high-performance stored queries

A few weeks ago we spoke in Dublin at Lucene Revolution 2013 on our work in the media monitoring sector for various clients including Gorkana and Australian Associated Press. These organisations handle a huge number (sometimes hundreds of thousands) of news articles every day...
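Media monitoring inverts the usual search problem: the queries are stored, and each incoming article must be matched against all of them. A common way to avoid running every stored query against every article is to index the queries by term, use the article's terms to pick a small set of candidate queries, and only then check each candidate in full. Luwak itself is a Java library built on Lucene; the toy Python sketch below is not its API, and the class name, the AND-only query model and the "longest term" heuristic are all assumptions made for illustration.

```python
import re


def terms(text: str) -> set[str]:
    """Lower-cased word tokens; a stand-in for a real analyzer."""
    return set(re.findall(r"\w+", text.lower()))


class StoredQueryMonitor:
    """Toy reverse search over AND-only queries (every word must appear)."""

    def __init__(self) -> None:
        self._required: dict[str, set[str]] = {}  # query id -> required terms
        self._by_term: dict[str, set[str]] = {}   # presearch index: term -> query ids

    def register(self, query_id: str, query_text: str) -> None:
        required = terms(query_text)
        if not required:
            raise ValueError("query has no terms")
        self._required[query_id] = required
        # For an AND query one required term is enough to find it as a candidate;
        # the longest term is used here as a crude stand-in for "rarest".
        key = max(required, key=lambda t: (len(t), t))
        self._by_term.setdefault(key, set()).add(query_id)

    def match(self, document: str) -> set[str]:
        doc_terms = terms(document)
        # Presearch: collect only queries whose indexed term occurs in the document.
        candidates: set[str] = set()
        for t in doc_terms:
            candidates |= self._by_term.get(t, set())
        # Verify: a candidate matches only if all of its required terms are present.
        return {q for q in candidates if self._required[q] <= doc_terms}


if __name__ == "__main__":
    monitor = StoredQueryMonitor()
    monitor.register("q1", "Solr Elasticsearch")
    monitor.register("q2", "Gorkana")
    print(monitor.match("A benchmark of Solr versus Elasticsearch"))  # {'q1'}
```

With hundreds of thousands of articles a day, the payoff comes from the presearch step: most stored queries are never evaluated against a given article at all.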

G-Cloud and open file formats, a cautionary tale

We're lucky enough to have our services available on the G-Cloud, a new initiative by the UK Government's Cabinet Office with the aim of breaking the sometimes monopolistic practices of 'big IT' when supplying government clients. We've recently had a couple of contracts procured via the G-Cloud iii framework and one of the requirements is to report whenever a client is invoiced. This is done via a website called Management Information Systems Onli...