We’re happy to announce we’ve just finished a successful project for a division of the Australian Associated Press to replace a closed source search engine with a considerably more powerful open source solution. You can read the press release here.
As our client had a large investment in stored searches (which represent a client’s interests) which were defined in the query language of their previous search engine, we first had to build a modified version of Apache Lucene that replicated exactly this syntax. I’ve previously blogged about how we did this. However this wasn’t the only challenge: search engines are designed to be good at applying a few queries to a very large document collection, not necessarily at applying tens of thousands of stored queries to every single new document. For media monitoring applications this kind of performance is essential as there may be hundreds of thousands of news articles to monitor every day. The system we’ve built is capable of applying tens of thousands of stored queries every second.
With the rapid increase in the volume of content that media monitoring companies have to check for their clients – today’s news isn’t just in print, but online, in social media and indeed multimedia – it may be that open source software is the only way to build monitoring systems that are economically scalable, while remaining accurate and flexible enough to deliver the right results to clients.