Last night was a fascinating event, as the Guardian team told us more about how they've used open source search technology to build their new Open Platform. The presentations were brief and to the point. They covered how the team have created a detailed, rich API to their news content, built entirely on the open source engine Apache Solr, opening up Guardian Media Group content to the world for mashups, repurposing and innovative new business models.
The Guardian have an existing Oracle database with J2EE web applications to serve content, but discovered that certain operations, such as returning content with multiple tags or dynamically generating 'related' content, were very database-intensive and difficult to scale. Serving these queries from Solr effectively flattens their cost, so a complex query is little more expensive than a simple one. It also lets the team scale capacity on demand simply by spinning up more Solr instances on the Amazon EC2 cloud. Interestingly, site search on the Guardian website doesn't yet use Solr, although they hope to move it across soon.
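To make the two expensive operations a little more concrete, here is a minimal Python sketch of what such requests might look like against a stock Solr install. It is not the Guardian's code: the base URL, the field names `tag`, `id`, `headline` and `body`, and the `/mlt` handler path are all assumptions for illustration. The sketch only builds request URLs, using Solr's standard `fq` (filter query) parameter for the multi-tag case and the MoreLikeThis parameters `mlt.fl` and `mlt.mintf` for related content.

```python
from urllib.parse import urlencode

# Assumed Solr location; a real deployment would differ.
SOLR_BASE = "http://localhost:8983/solr"


def build_tag_query(base_url: str, tags: list[str], rows: int = 10) -> str:
    """Return a /select URL for documents carrying ALL of the given tags.

    Assumes a hypothetical multi-valued string field called 'tag'.
    Tag values are assumed not to contain double quotes (no escaping here).
    """
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json")]
    # One fq per tag: Solr intersects filter queries, so every tag must match,
    # and each fq is cached on its own in the filterCache for reuse.
    for tag in tags:
        params.append(("fq", f'tag:"{tag}"'))
    return f"{base_url}/select?{urlencode(params)}"


def build_related_query(base_url: str, doc_id: str, rows: int = 5) -> str:
    """Return a MoreLikeThis URL for content similar to doc_id.

    Assumes a MoreLikeThis handler registered at /mlt and hypothetical
    'id', 'headline' and 'body' fields.
    """
    params = [
        ("q", f'id:"{doc_id}"'),      # the seed document
        ("mlt.fl", "headline,body"),  # fields mined for "interesting" terms
        ("mlt.mintf", "1"),           # keep terms appearing at least once in the seed
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return f"{base_url}/mlt?{urlencode(params)}"


if __name__ == "__main__":
    print(build_tag_query(SOLR_BASE, ["politics", "uk"]))
    print(build_related_query(SOLR_BASE, "example-article-id"))
```

Splitting each tag into its own `fq` clause is a deliberate choice in this sketch: Solr can cache and reuse each tag filter independently, so common tag combinations stay cheap without any per-combination work in a relational database. Because each request is self-contained, the same URLs could be sent to any of several identical Solr instances when capacity is added.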
What we're seeing here is a change in how search technology is used, especially by forward-looking organisations: rather than being a bolt-on to an existing website or application, search is becoming the platform for new developments. I'll be talking about other ways open source search has been used for news content at the British Computer Society this coming Thursday, 21st October. I believe there are still a few places available.