We've been working on a number of projects recently involving open source software often described as 'Big Data' solutions, so here's a quick overview of them. The grandfather of them all is, of course, Apache Hadoop, now less a single project than an ecosystem providing storage and processing for potentially huge amounts of data spread across clusters of machines. Interestingly, Hadoop was originally created by Doug Cutting, who also wrot...
In this blog post I want to introduce you to a new Apache Solr plugin component called XJoin. I'll show how we can use this to solve a common problem in e-commerce - how to use price discount data, provided by an external web API, to either filter the results of a product search or boost scores. A further post will show another example, using click-through data to influence the score of subsequent searches.
What is XJoin?...
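The XJoin post describes joining external discount data with product search results to either filter or boost them. The sketch below is plain Python outside Solr and is not XJoin's API or implementation; it only illustrates the two behaviours under assumed data shapes. The names `filter_by_discount` and `boost_by_discount`, the `(product_id, score)` result tuples, the discount-as-fraction mapping and the `(1 + weight * discount)` boost formula are all hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical shapes, not XJoin's data model:
# a search result is (product_id, relevance_score);
# the external discount feed maps product_id -> discount as a fraction (0.2 == 20% off).
Result = Tuple[str, float]


def filter_by_discount(results: List[Result], discounts: Dict[str, float]) -> List[Result]:
    """Keep only results whose product appears in the external discount data, preserving order."""
    return [(pid, score) for pid, score in results if pid in discounts]


def boost_by_discount(results: List[Result], discounts: Dict[str, float],
                      weight: float = 1.0) -> List[Result]:
    """Scale each score by (1 + weight * discount), treating missing products as undiscounted, then re-rank."""
    boosted = [(pid, score * (1.0 + weight * discounts.get(pid, 0.0))) for pid, score in results]
    return sorted(boosted, key=lambda r: r[1], reverse=True)


results = [("p1", 2.0), ("p2", 1.5), ("p3", 1.0)]
discounts = {"p2": 0.5, "p3": 0.2}  # illustrative values only
print(filter_by_discount(results, discounts))
print(boost_by_discount(results, discounts))
```

Filtering drops `p1` because the external source has no discount for it, while boosting keeps every product and lets the heavily discounted `p2` overtake `p1` in the ranking. The `weight` parameter is one simple way to control how strongly the external data overrides the original relevance score.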
I had been planning not to continue with these posts, but after Matt Weber pointed out that he'd made GitHub pull requests to address some methodological flaws (I'm embarrassed to say I hadn't even noticed them), another attempt was the least I could do. For Solr there was a slight reduction in mean search time, from 39ms (with my original, suboptimal query structure) to 34ms, and in median search time, from 27ms to 25ms (see figure 1). Elasticsearch, on the ...
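The benchmark reports both mean and median search times. As a rough sketch (not the benchmark code from these posts), the following shows how the two summaries and the relative changes in the reported Solr numbers can be computed with Python's standard `statistics` module. The helper names `summarize` and `relative_reduction` are hypothetical, and the sample latencies are made up purely to show how the two statistics differ.

```python
import statistics


def summarize(latencies_ms):
    """Return (mean, median) for a list of per-query search times in milliseconds."""
    return statistics.mean(latencies_ms), statistics.median(latencies_ms)


def relative_reduction(before, after):
    """Fractional drop from before to after; 0.1 means a 10% reduction."""
    return (before - after) / before


# Made-up samples, only to show why the two statistics differ:
# one slow query pulls the mean up far more than the median.
print(summarize([20, 25, 30, 100]))

# Solr figures reported in the post: mean 39ms -> 34ms, median 27ms -> 25ms.
print(f"mean: {relative_reduction(39, 34):.1%}, median: {relative_reduction(27, 25):.1%}")
```

The reported Solr figures work out to roughly a 12.8% drop in mean and a 7.4% drop in median search time. Reporting both is useful because the median is much less sensitive than the mean to a small number of slow queries.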
After a long day at a Real Time Analytics event (of which more later), I dropped into the Elasticsearch London User Group, hosted by Red Badger and supplied with a ridiculously huge amount of pizza (I have a theory that in a few years you'll be able to spot an Elasticsearch developer by the size of their pizza-filled belly). ...
Matt Pearce writes: We have been working with a client who needs to search for documents by location, using either a single point or (sometimes very) complex polygons. They supplied the location data in WKT format, which we assumed we could feed directly into our search engine (in this case Solr) without modification. Then we started testing the location se...
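The excerpt stops before explaining what went wrong, so the following is not the fix from that post. It is a minimal Python sketch of what WKT for a single point and a simple polygon looks like, using hypothetical helpers `wkt_point` and `wkt_polygon`. It assumes coordinates arrive as (longitude, latitude) pairs, because WKT lists x before y and Solr reads geographic WKT as longitude then latitude, and it closes each polygon ring explicitly, because WKT requires a ring's first and last points to be identical.

```python
def wkt_point(lon, lat):
    """Build a WKT POINT; WKT puts x first, which for geographic data means longitude."""
    return f"POINT({lon} {lat})"


def wkt_polygon(ring):
    """Build a single-ring WKT POLYGON from (lon, lat) pairs, closing the ring if needed."""
    pts = list(ring)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # work with the open ring, then close it explicitly below
    if len(pts) < 3:
        raise ValueError("a polygon ring needs at least three vertices")
    pts.append(pts[0])  # WKT requires the first and last points to be identical
    coords = ", ".join(f"{lon} {lat}" for lon, lat in pts)
    return f"POLYGON(({coords}))"


print(wkt_point(-0.1276, 51.5072))
print(wkt_polygon([(0, 0), (10, 0), (10, 10), (0, 10)]))
```

Swapping the axis order is an easy mistake when source data is stored as latitude, longitude: the string is still valid WKT, but it describes a different place.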
Staying over in London because of the aforementioned tube strike proved to be a good idea, and a large fried breakfast an even better one, so I arrived for the second day of the conference right on time and ready for the keynote by Jeff Fried of BA Insight and Professor Elaine Toms of Sheffield University, who hadn't met before the event but spoke in turn on the Future ...