Search backwards – media monitoring with open source search

We’re working with a number of clients on media monitoring solutions, which are a special case of search application (we’ve worked on this previously for Durrants). In standard search, you apply a single query to a large number of documents and expect a ranked list of matching documents as a result. In media monitoring, however, you need to search each incoming document (for example, a news article or blog post) with many queries representing what the end user wants to monitor – and you need to do this quickly, as you may have tens or hundreds of thousands of articles to monitor in close to real time (Durrants have over 60,000 client queries to apply to half a million articles a day). This ‘backwards’ search isn’t really what search engines were designed to do, so performance could be very poor.
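To make the ‘backwards’ shape concrete, here is a minimal sketch in plain Python. It is not how any particular search engine does it: the query format (a set of terms that must all appear), the client names and the `matching_queries` helper are all assumptions made up for illustration.

```python
import re

def tokenize(text):
    """Lower-case the text and return the set of word tokens it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matching_queries(article, queries):
    """Return the ids of all stored queries whose terms all occur in the article."""
    tokens = tokenize(article)
    return [qid for qid, terms in queries.items() if terms <= tokens]

# Hypothetical stored client queries: every term must be present for a match.
queries = {
    "client-a": {"electric", "car"},
    "client-b": {"fashion", "week"},
    "client-c": {"car", "review"},
}

# One incoming article is checked against every stored query.
article = "Review: the new electric car is quiet, quick and practical."
print(matching_queries(article, queries))  # ['client-a', 'client-c']
```

Note the loop runs over queries rather than documents, so the cost per article grows with the number of stored queries.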

There are several ways around this problem: for example, in most cases you don’t need to monitor every article for every client, as they will have told you they’re only interested in certain sources (a car manufacturer might want to keep an eye on car magazines and the reviews in the back page of the Guardian Saturday magazine, but doesn’t care about the rest of the paper or fashion magazines). However, pre-filtering queries in this way can be complex, especially when there are so many potential sources of data.
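Pre-filtering by source can be sketched as a simple lookup from source to interested queries. This is only an illustration under stated assumptions: the subscription data, the source names and both helper functions are hypothetical, and a real system would have far more sources to keep in step.

```python
from collections import defaultdict

def build_source_index(subscriptions):
    """Map each source name to the set of query ids that monitor it."""
    by_source = defaultdict(set)
    for query_id, sources in subscriptions.items():
        for source in sources:
            by_source[source].add(query_id)
    return by_source

def candidate_queries(source, by_source):
    """Return the query ids worth running for an article from this source."""
    return sorted(by_source.get(source, set()))

# Hypothetical subscriptions: which sources each client query cares about.
subscriptions = {
    "car-maker": ["car-magazine", "guardian-saturday-magazine"],
    "fashion-house": ["fashion-magazine"],
}

by_source = build_source_index(subscriptions)
print(candidate_queries("car-magazine", by_source))  # ['car-maker']
print(candidate_queries("local-paper", by_source))   # []
```

Only the candidate queries then need to be run against the article, at the price of maintaining the source mapping.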

We’ve recently managed to develop a method for searching incoming articles using a brute-force approach based on Apache Lucene which in early tests is performing very well – around 70,000 queries applied to a single article in around a second on a standard MacBook. On suitable server hardware this would be even faster – and of course you have all the other features of Lucene potentially available, such as phrase queries, wildcards and highlighting. We’re looking forward to being able to develop some powerful – and economically scalable – media monitoring solutions based on this core.
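As a rough picture of the brute-force idea, the sketch below indexes a single article in memory and then runs every stored query against that tiny index. It is a pure Python stand-in, not Lucene and not our actual implementation: the positional index, the phrase-only query format and all the function names are assumptions for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and return its word tokens in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def index_article(text):
    """Build a positional index for one article: term -> list of positions."""
    positions = defaultdict(list)
    for pos, token in enumerate(tokenize(text)):
        positions[token].append(pos)
    return positions

def phrase_matches(index, phrase):
    """True if the words of the phrase occur consecutively in the article."""
    words = tokenize(phrase)
    if not words:
        return False
    return any(
        all(start + offset in index.get(word, ()) for offset, word in enumerate(words))
        for start in index.get(words[0], ())
    )

def run_all(index, queries):
    """Brute force: test every stored phrase query against the one-article index."""
    return [qid for qid, phrase in queries.items() if phrase_matches(index, phrase)]

# Hypothetical stored phrase queries.
queries = {
    "q1": "open source search",
    "q2": "search open",
    "q3": "media monitoring",
}

index = index_article("Media monitoring built on open source search.")
print(run_all(index, queries))  # ['q1', 'q3']
```

Because the index holds only one article, each query touches very little data, which is one plausible reason a brute-force pass over tens of thousands of queries can still be quick.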

Just the job for a recruitment client

We’re pleased to announce our work with Reed Specialist Recruitment, one of the UK’s largest recruitment companies, where we helped them implement an Apache Solr powered application to allow their 3000+ staff to search for and match candidates to jobs. We built an innovative indexing framework, a configuration tool and performance monitoring system for Reed and the system launched on time and under budget, a great testament to the flexibility and power of this open source software. The new system responds in under a second – a massive improvement on the previous response time of several minutes. You can read the press release here.

If you’d like to hear more I’ll be giving a presentation on the project at Lucene Eurocon in Barcelona tomorrow – Wednesday 19th October at 1.30 p.m. – slides and a video will be online after the event.

If you can’t make it to Barcelona I’ll also be talking in London, on the business benefits of open source search, at around 10am on Tuesday 25th October with our client Stephen Wicks, CTO of Gorkana Group as part of Enterprise Search Europe – there are still tickets available and you can even get a 20% discount if you join the Cambridge or London Enterprise Search Meetups, who are hosting a joint event on the Monday evening of the conference.

Posted in News, events

October 18th, 2011

A busy Autumn – forthcoming events

The diary is filling up quickly already after the summer break (which turned out not to be much of a break at all, what with the HP/Autonomy news and everything). Here’s where you can hear us speak over the next few months:

Hope to meet some of you at these exciting events (do get in touch if you’d like to arrange something more formal). There’s certainly a lot to talk about!

Posted in events

September 6th, 2011

Whitepaper – Why you should be considering open source search

I’ve uploaded a whitepaper I wrote a short while ago:

“In these rapidly changing times we don’t know what we will need to search tomorrow – so it’s important to be adaptable, flexible and able to cope with data volumes that may not scale linearly. Maintaining control over the future of your search software is also key. Open source search has come of age and every modern business should be aware of its advantages.”

It’s available in our downloads area, together with several case studies on open source search projects we’ve carried out for clients.

The year open source search got serious

It’s been an interesting and busy twelve months here at Flax – we’ve worked on some fantastic customer projects, spoken at conferences at home and abroad and made some great alliances and partnerships. We are talking to more people than ever before about the advantages of open source search and we’ve even started a local Meetup group.

This has been the year when open source search moved out of the shadows and became a force to be reckoned with – whether handling billions of queries or millions of customers, powering innovative new APIs for open content from forward-looking media companies or simply making it easier to develop search applications. Commercial support is now available to rival anything offered by the closed source world and there are now fully packaged solutions built on open source. In some sectors open source may even become the default choice (see what IDC said about the embedded/OEM market).

There’s still significant change to come in the search sector – I expect a few vendors will be in trouble by this time next year as they realise their business models (often built on per-document charges) are out-of-date, and we might also see further acquisitions by the usual behemoths. All this leads to reduced choice and increased costs for customers, and this is where open source can help – you can build your search solution in-house, or engage companies like ours to help, but you’re no longer locked in to a vendor’s roadmap and shackled to their business plan (or the consequences of its failure!).

I’ll leave the final word to Matt Asay of Canonical, who says: “Open source is how we do business 10 years into this new millennium.”

Next-generation media monitoring with open source search

Media monitoring is not a traditional search application: for a start, instead of searching a large number of documents with a single query, a media monitoring application must search every incoming news story with potentially thousands of queries, searching for words and terms relevant to client requirements. This can be difficult to scale, especially when accuracy must be maintained – a client won’t be happy if their media monitors miss relevant stories or send them news that isn’t relevant.

We’ve been working with Durrants Ltd. of London for a while now on replacing their existing (closed source) search engine with a system built on open source. This project, which you can read more about in a detailed case study (PDF), has reduced the hardware requirements significantly and led to huge accuracy improvements (in some cases where 95% of the results passed through to human operators used to be irrelevant ‘false positives’, 95% of the new system’s results are now correct).

The new system is built on Xapian and Python and supports all the features of the previous engine, to ease migration – it even copes with errors introduced during automated scanning of printed news. The new system scales easily and cost effectively.

As far as we know this is one of the first large-scale media monitoring systems built on open source, and a great example of search as a platform, which we’ve discussed before.

Posted in News

December 13th, 2010
