When the Financial Times decided to bring their digital press cuttings in-house in summer 2010, they asked us to build a powerful ‘search server’ that they could easily integrate into their existing product offerings.
We built an indexer for the XML source data and a RESTful web service API. The API offers Boolean operators, phrase searches, area specifiers (search the whole article, body, headline, byline or any combination), date range restrictions, similarity search (“articles like this one”) and faceted search. Spelling correction and synonyms are also available, as is detailed logging of indexing and of all searches.
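To give a feel for how a client might combine these features in one request, here is a minimal sketch in Python. The endpoint and every parameter name (`q`, `area`, `from`, `to`, `facet`) are assumptions made up for illustration; they are not the actual API of the system described here.

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://search.example.com/articles"
VALID_AREAS = {"all", "body", "headline", "byline"}


def build_search_url(query, areas=("all",), date_from=None, date_to=None, facets=()):
    """Build a GET URL for an imagined search endpoint.

    query     -- free text; may contain Boolean operators and quoted phrases
    areas     -- parts of the article to search (any combination)
    date_from -- optional start of the date range (datetime.date)
    date_to   -- optional end of the date range (datetime.date)
    facets    -- fields for which facet counts are requested
    """
    unknown = set(areas) - VALID_AREAS
    if unknown:
        raise ValueError(f"unknown search areas: {sorted(unknown)}")

    # A list of pairs lets a key such as "area" appear more than once.
    params = [("q", query)]
    params += [("area", area) for area in areas]
    if date_from is not None:
        params.append(("from", date_from.isoformat()))
    if date_to is not None:
        params.append(("to", date_to.isoformat()))
    params += [("facet", facet) for facet in facets]

    # urlencode percent-encodes quotes and turns spaces into "+".
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(
        '"interest rates" AND bank',
        areas=("headline", "body"),
        date_from=date(2010, 6, 1),
        date_to=date(2010, 8, 31),
        facets=("byline",),
    ))
```

Keeping the whole search in the query string means each search is a plain GET request, which is one common way for a RESTful service to stay easy to integrate and to log.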
This might sound like a complex task, but using open source technology we built the system in less than a fortnight. Although initially designed as a small-scale prototype, it scaled easily to indexing hundreds of thousands of pages. You can use the service at