During a recent client visit we encountered a common problem in search - over-application of 'boosts', which can be used to weight the influence of matches in one particular field. For example, you might sensibly use this to make results that match a query on their title field come higher in search results. However in this case we saw huge boost values used (numbers in the hundreds) which were probably swamping everything else - and it wasn't at all clear where the values had come from, be it ex...Continue reading
Over the years we've dealt with quite a few migration projects where the query syntax of the client's existing search engine must be preserved. This might be because other systems (or users) depend on it, or a large number of stored expressions exist and it is difficult or uneconomic to translate them all by hand. Our usual approach is to write a query parser, which understands the current syntax but creates a Continue reading
As any regular reader of this blog will be aware, we use almost exclusively open source software on customer projects. To meet their requirements, we often have to extend the functionality of the software (e.g. XJOIN in Solr). As far as possible, with the agreement of the customer, we like to then contribute these changes b...Continue reading
Last night I dropped in on the Unified Log Meetup at JustEat's offices (of course, they provided lots of pizza for us all!). I've written about this Meetup before - as a rule the events cover logging and analytics at massive scale, with search being only part of the picture. Joseph Francis from Continue reading
In this blog post I want to introduce you to a new Apache Solr plugin component called XJoin. I'll show how we can use this to solve a common problem in e-commerce - how to use price discount data, provided by an external web API, to either filter the results of a product search or boost scores. A further post will show another example, using click-through data to influence the score of subsequent searches.
What is XJoin?...Continue reading
I had been planning not to continue with these posts, but after Matt Weber pointed out the github pull requests (which to my embarrassment I'd not even noticed) he'd made to address some methodological flaws, another attempt was the least I could do. For Solr there was a slight reduction in mean search time, from 39ms (for my original, suboptimal query structure) to 34ms and median search time from 27ms to 25ms - see figure 1. Elasticsearch, on the ...Continue reading