quepid – Flax – http://www.flax.co.uk – The Open Source Search Specialists

Defining relevance engineering part 4: tools
http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/ – Thu, 15 Nov 2018

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

In my previous instalment of this guide I promised to write next about how to deliver the results of a relevance assessment, but I’ve since decided that this blog should instead cover the tools a relevance engineer can use to measure and tune search performance. Of course, some of these might be used to show results to a client as well, so it’s not an entirely different direction!

It’s also important to note that this is a rapidly evolving field and therefore cannot be a definitive list – and I welcome comments with further suggestions.

1. Gathering judgements

There are various ways to measure relevance, and one is to gather judgement data – either explicit (literally asking users to manually rate how relevant a result is) or implicit (using click data as a proxy, assuming that clicking on a result means it is relevant – which isn’t always true, unfortunately). One can build a user interface that lets users rate results (e.g. from Agnes Van Belle’s talk at Haystack Europe, see page 7) which may be available to everyone or just a select group, or one can use a specialised tool like Quepid that provides an alternative UI on top of your search engine. Even Excel or another spreadsheet can be used to record judgements (although this can become unwieldy at scale). For implicit ratings, there are JavaScript libraries such as SearchHub’s search-collector or more complete analytics platforms such as Snowplow which will let you record the events happening on your search pages.
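To make the explicit route a little more concrete, here is a minimal sketch (in Python, with entirely hypothetical file and field names) of how judgements could be captured as simple (query, document, rating) rows rather than in a spreadsheet – a format that downstream evaluation tools can easily consume:

```python
import csv
from pathlib import Path

JUDGEMENTS_FILE = Path("judgements.csv")  # hypothetical location for the judgement list

def record_judgement(query: str, doc_id: str, rating: int) -> None:
    """Append one explicit judgement (e.g. 0 = irrelevant .. 3 = perfect) to the CSV."""
    write_header = not JUDGEMENTS_FILE.exists()
    with JUDGEMENTS_FILE.open("a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["query", "doc_id", "rating"])
        writer.writerow([query, doc_id, rating])

# Example: a rater judges two results returned for the query 'iphone'
record_judgement("iphone", "PROD-123", 3)  # an actual iPhone - highly relevant
record_judgement("iphone", "PROD-456", 1)  # an iPhone case - only marginally relevant
```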

2. Understanding the query landscape

To find out what users are actually searching for and how successful their search journeys are, you will need to look at the log files of the search engine and the hosting platform it runs within. Open source engines such as Solr can provide detailed logs of every query, which will need to be processed into an overall picture. Google Analytics will tell you which Google queries brought users to your site. Some sophisticated analytics & query dashboards are also available – Luigi’s Box is a particularly powerful example for site search. Even a spreadsheet can be useful to graph the distribution of queries by volume, so you can see both the popular queries and those rare queries in the ‘long tail’. On Elasticsearch it’s even possible to submit this log data back into a search index and to display it using a Kibana visualisation.
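As a rough illustration of turning raw logs into that overall picture, the sketch below counts query frequencies from Solr-style request logs. The regular expression assumes the query arrives in a single q= parameter, and the exact log format will vary with your Solr version and configuration, so treat this as a starting point only:

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Matches the q= parameter inside a Solr request log line, e.g.
#   ...params={...&q="iphone"&...} hits=5147 status=0 QTime=46
QUERY_RE = re.compile(r'[&{]q=([^&}]+)')

def query_distribution(log_path: str) -> Counter:
    """Count how often each query string appears in the log."""
    counts = Counter()
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            match = QUERY_RE.search(line)
            if match:
                query = unquote_plus(match.group(1)).strip('"').lower()
                counts[query] += 1
    return counts

if __name__ == "__main__":
    # Print the head of the distribution; the remainder is your 'long tail'
    for query, freq in query_distribution("solr.log").most_common(20):
        print(f"{freq:6d}  {query}")
```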

3. Measurement and metrics

Once you have your data it’s usually necessary to calculate some metrics – overall measurements of how ‘good’ or ‘bad’ relevance is. There’s a long list of metrics commonly used by the Information Retrieval community, such as NDCG (Normalised Discounted Cumulative Gain), which show the usefulness, or gain, of a search result based on its position in a list. Tools such as Rated Ranking Evaluator (RRE) can calculate these metrics from supplied judgement lists (RRE can also run a whole test environment, spinning up Solr or Elasticsearch, performing a list of queries and recording and displaying the results).
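To show what such a metric boils down to, here is a toy sketch of DCG/NDCG for a single query’s result list. Real tools such as RRE handle rank cut-offs, ties and averaging across many queries, so this is only an illustration of the idea:

```python
import math
from typing import Sequence

def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: each gain is discounted by log2 of its position."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains: Sequence[float]) -> float:
    """DCG normalised by the DCG of the ideal (best possible) ordering of the same judgements."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Judged relevance (0-3) of the top five results returned for a single query
print(ndcg([3, 2, 3, 0, 1]))  # ~0.97 - this ordering is close to the ideal one
```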

4. Tuning the engine

Next you’ll need a way to adjust the configuration of the engine and/or figure out just why particular results are appearing (or not). These tools are usually specific to the search engine being used: Quepid, for example, works with Solr and Elasticsearch and allows you to change query parameters and observe the effect on relevance scores; with RRE you can control the whole configuration of the Solr or Elasticsearch engine that it can then spin up for you. Commercial search engines will have their own tools for adjusting configuration, or you may have to work within an overall content management (e.g. Drupal) or e-commerce system (e.g. Hybris). Some of these latter systems may only give you limited control of the search engine, but could also let you adjust how content is processed and ingested or how synonyms are generated.

For Solr, tools such as the Google Chrome extension Solr Query Debugger can be used, and the Solr Admin UI itself allows full control of Solr’s configuration. Solr’s debug query shows hugely detailed information as to why a query returned a result, but tools such as Splainer and Solr Explain are useful to make sense of this.
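For those who prefer to capture those explanations programmatically rather than reading them in the Admin UI, here is a minimal sketch of requesting Solr’s debug output over HTTP – the endpoint, collection name and field boosts are placeholders for illustration, not a recommendation:

```python
import requests

# Placeholder endpoint - substitute your own Solr host and collection
SOLR_SELECT = "http://localhost:8983/solr/myproducts/select"

params = {
    "q": "iphone",
    "defType": "edismax",
    "qf": "title^5 description",  # example field boosts - adjust to your own schema
    "debugQuery": "true",         # ask Solr to explain how each document was scored
    "wt": "json",
    "rows": 5,
}

response = requests.get(SOLR_SELECT, params=params).json()

# The 'explain' section maps each document id to a textual scoring breakdown -
# this is the raw material that tools like Splainer present in a readable form.
for doc_id, explanation in response["debug"]["explain"].items():
    print(doc_id)
    print(explanation)
    print("-" * 60)
```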

For Elasticsearch, the Kopf plugin was a useful tool, but has now been replaced by Cerebro. Elastic, the commercial company behind Elasticsearch, offers its own tool Marvel on a 30-day free trial, after which you’ll need an Elastic subscription to use it. Marvel is built on the open source Kibana, which also includes various developer tools.

If you need to dig (much) deeper into the Lucene indexes underneath Solr and Elasticsearch, the Lucene Index Toolbox (Luke) is available, or Flax’s own Marple index inspector.

 

As I said at the beginning this is by no means a definitive list – what are your favourite relevance tuning tools? Let me know in the comments!

In the next post I’ll cover how a relevance engineer can develop more powerful and ‘intelligent’ ways to tune search. In the meantime you can read the free Search Insights 2018 report by the Search Network. Of course, feel free to contact us if you need help with relevance engineering.

How to build a search relevance team
http://www.flax.co.uk/blog/2017/09/11/build-search-relevance-team/ – Mon, 11 Sep 2017

We’ve spent a lot of time working with clients who recognise that their search engine isn’t delivering relevant results to users. Often this is seen as solely a technical problem, which can be resolved simply by changing query parameters or the search engine configuration – but technical teams need clear direction on why a result should or should not appear at a certain position, not just requests for general relevance improvements.

It’s thus important to consider relevance as a business-wide issue, with multiple stakeholders providing input to the tuning process. We recommend the creation of a search relevance team – in a perfect world this should consist of dedicated staff, but even in the largest organisations this can be difficult to resource. It is possible, however, to create a team whose members share the responsibility of improving relevance, contributing as and when they can.

The team should be drawn from the following business areas. Note that in some organisations some of these roles will be shared.

  • Content – the content team create and manage the source data for the search engine, and are responsible for keeping this data clean and consistent, with reliable metadata. They may process external data into a database or other repository as well as creating it from scratch. The best search engine in the world can’t give good results if the underlying data is unreliable, inconsistent or badly formatted.
  • Vendor – if the search engine is a commercial product, the vendor must provide sufficient documentation, training and support to the client to allow the engine to be tuned. If the engine is an open source project this information should be openly available and backed up by specialist consultancies who can provide training and technical support (such as Flax).
  • Development – the development team are responsible for integrating the search engine into the client’s systems, indexing the source data, maintaining the configuration, writing the search queries and adding new features. They will make any changes that will improve relevance.
  • Testing – the test team should create a process for test-driven relevance tuning, using tools such as Quepid to gather relevance judgements from the business. The test cases themselves can be built up from a combination of query logs, known important query terms (e.g. new products, common industry terms, SEO terms) and those queries deemed most valuable to the business.
  • Operations – this team is responsible for keeping the search engine running at best performance with appropriate server provision and monitoring, plus providing a failover capacity as required.
  • Sales & marketing, product owners – these teams should know why a particular result is more relevant than another to a customer or other user, by gathering online feedback, talking to users and knowing the current business goals. They can thus help create the test cases discussed above.
  • Management – management support of the relevance tuning process is essential, to commit whatever resources are required to the technical implementation and test process and to lead the search relevance team.

The search relevance team should meet on a regular basis to discuss how to build test cases for important search queries, examine the current position in terms of search relevance and set out objectives for improving relevance. The metrics chosen to measure progress should be available to all of the team.

Search relevance tuning should be seen as a shared responsibility, rather than simply a technical issue or something that can be easily resolved by building or buying a new search engine (a new, un-tuned search engine is unlikely to be as good as the current one). A well structured and resourced search relevance team can make huge strides towards improving search across the business – reducing the time users take to find information and improving responsiveness. For businesses that trade online, relevant search results are simply essential for retaining customers and maintaining a high rate of conversion.

Flax regularly visit clients to discuss how to build an effective search team – do get in touch if we can help your business in this way.

London Lucene/Solr Meetup: Query Pre-processing & SQL with Solr
http://www.flax.co.uk/blog/2017/06/02/london-lucenesolr-meetup-query-pre-processing-sql-solr/ – Fri, 02 Jun 2017

Bloomberg kindly hosted the London Lucene/Solr Meetup last night and we were lucky enough to have two excellent speakers for the thirty or so attendees. René Kriegler kicked off with a talk about the Querqy library he has developed to provide a pre-processing layer for Solr (and soon, Elasticsearch) queries. This library was originally developed during a project for Germany’s largest department store Galeria Kaufhof and allows users to add a series of simple rules in a text file to raise or lower results containing certain words, filter out certain results, add synonyms and decompound words (particularly important for German!). We’ve seen similar rules-based systems in use at many of our e-commerce clients, but few of these work well with Solr (Hybris in particular has a poor integration with Solr and can produce some very strange Solr queries). In contrast, Querqy is open source and designed by someone with expert Solr knowledge. With the addition of a simple UI or an integration with a relevancy-testing framework such as Quepid, this could be a fantastic tool for day-to-day tuning of search relevance – without the need for Solr expertise. You can find Querqy on Github.

Michael Suzuki of Alfresco talked next about the importance of being bilingual (actually he speaks 4 languages!) and how new features in Solr version 6 allow one to use either Solr syntax, SQL expressions or a combination of both. This helps hide Solr’s complexity and also allows easy integration with database administration and reporting tools, while allowing use of Solr by the huge number of developers and database administrators familiar with SQL syntax. Using a test set from the IMDB movie archive he demonstrated how SQL expressions can be used directly on a Solr index to answer questions such as ‘what are the highest grossing film actors’. He then used visualisation tool Apache Zeppelin to produce various graphs based on these queries and also showed dbVisualizer, a commonly used database administration tool, connecting directly to Solr via JDBC and showing the index contents as if they were just another set of SQL tables. He finished by talking briefly about the new statistical programming features in Solr 6.6 – a powerful new development with features similar to the R language.

We continued with a brief Q&A session. Thanks to both our speakers – we’ll be back again soon!

Setting up your first Quepid test case
http://www.flax.co.uk/blog/2016/07/08/setting-first-quepid-test-case/ – Fri, 08 Jul 2016

Quepid is an innovative tool from our partners Open Source Connections, which allows you to bridge the gap between content owners (who really know what’s in your search index and how people might search for it) and search developers (who can tweak the search engine to improve relevance, given some examples of ‘good’ and ‘bad’ results for a query). We’re increasingly using it in client projects – but how do you get started with creating test cases in Quepid? Viewing the various Quepid videos at http://quepid.com/support/ is the best place to get a sense of how Quepid works – so this is probably a good first step.

Now, let’s assume you have Quepid running in your browser – there’s a 30 day free trial which lets you create a single test case, which is a great way to try it out. A Case is used to illustrate a particular problem with search relevancy – say, how searching for ‘iPhone’ shows iPhone cases higher up the list than actual iPhones. Each Case contains a number of Queries. Note in this example we’re using Solr, but Quepid also works with Elasticsearch.

1. Hooking Quepid up to your search engine

You’re going to need the help of your search developer for this one! He’ll need to tell you the URL of your Solr or Elasticsearch engine – and this will need to be accessible from the PC you’re running Quepid on. Since Quepid runs in the browser (although it stores its data in the Cloud) you shouldn’t have any trouble setting up secure access to your search engine – after all, your own PC is probably already within your corporate network. In Quepid, click ‘Relevancy cases’ and ‘Create a case’. Give the case a name, like ‘iPhone_problem_English’ or ‘Two_word_queries’.

Enter the URL provided by your developer: for Solr, it will probably look a bit like:
http://<your domain>/solr/<name of a Solr index>/select
e.g.
http://www.mycompany.com/solr/myproducts/select

Quepid will then check it can see the Solr index – if it can’t, check that the URL is correct.
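If Quepid reports that it can’t reach the index, a quick sanity check from the same machine can confirm whether the URL itself responds. Here is a rough sketch – the URL is a placeholder, and your own Solr may sit behind authentication or a proxy:

```python
import requests

# The same URL you entered in Quepid (placeholder shown here)
solr_url = "http://www.mycompany.com/solr/myproducts/select"

response = requests.get(solr_url, params={"q": "*:*", "rows": 0, "wt": "json"})
response.raise_for_status()  # any HTTP error here suggests Quepid won't reach it either

num_found = response.json()["response"]["numFound"]
print(f"Solr responded OK - the index contains {num_found} documents")
```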

2. Setting up the right query fields

Now you need to tell Quepid an ID field (which must be unique) and a title field for each result. If you start typing, Quepid will show some suggestions – check with your developer which ones to use, as these will be defined in the schema configuration for your search engine. You can select any other fields to be displayed for each result: let Quepid suggest some by clicking in the Additional Display Fields box. All of the above can be changed in the Settings pane of the Tune Relevance panel later, so don’t worry if you don’t add everything now.

3. Adding some queries

You can now add some queries to test – ‘iPhone’, ‘iPhone case’, ‘iphone’ or whatever fits the test you’re creating. Add a few for now; you can add more later. Once you’re done click Continue, then Finish, and Quepid will try these queries out. Don’t worry if you don’t get many results for now.

4. Using the right query parameters

By default, Quepid only sends a very simple query to Solr or Elasticsearch (click on Tune Relevance and check the Tune panel – you should see just ‘#$query##’, a token that represents the various test queries you added above), and your search application almost certainly sends something a lot more complicated! So that you can be sure you’re testing the same configuration as your search application uses, you need to tell Quepid what query pattern is being used.

One way to start is to use Solr’s log files to see what actual queries are being run by your search application. Your search developer should be able to find a section that looks like this:

INFO - 2016-06-03 09:12:37.964; [ mydomain.com] org.apache.solr.core.SolrCore; [mydomain.com] webapp=/solr path=/select params={hl.fragsize=70&sort=+score+desc,+date_text+desc&hl.mergeContiguous=true&qf=tm_body:summary^1.0&qf=tm_body:value^1.0&qf=tm_field_product^5.0&hl.simple.pre=[HIGHLIGHT]&json.nl=map&hl.fl=spell&wt=json&hl=true&rows=8&fl=*,score&hl.snippets=3&start=0&q="iphone"&hl.simple.post=[/HIGHLIGHT]&fq=bs_status:"true"&fq=index_id:"node_index"} hits=5147 status=0 QTime=46

Stripping away the log wrapper, leaving just the query parameters, gives us:

hl.fragsize=70&sort=+score+desc,+date_text+desc&hl.mergeContiguous=true&qf=tm_body:summary^1.0&qf=tm_body:value^1.0&qf=tm_field_product^5.0&hl.simple.pre=[HIGHLIGHT]&json.nl=map&hl.fl=spell&wt=json&hl=true&rows=8&fl=*,score&hl.snippets=3&start=0&q="iphone"&hl.simple.post=[/HIGHLIGHT]&fq=bs_status:"true"&fq=index_id:"node_index"

We need to replace the query itself (the q= parameter – here we’re searching for ‘iphone’) with a special token so Quepid can use this string to send all its test queries:

hl.fragsize=70&sort=+score+desc,+date_text+desc&hl.mergeContiguous=true&qf=tm_body:summary^1.0&qf=tm_body:value^1.0&qf=tm_field_product^5.0&hl.simple.pre=[HIGHLIGHT]&json.nl=map&hl.fl=spell&wt=json&hl=true&rows=8&fl=*,score&hl.snippets=3&start=0&q=#$query##&hl.simple.post=[/HIGHLIGHT]&fq=bs_status:"true"&fq=index_id:"node_index"

If you paste this string into Quepid’s Tune panel (click Tune Relevance to toggle this) then you know Quepid is sending the same type of queries as your search application. Click ‘Rerun my Searches’ and the results you see should be in a similar, if not identical, order to your search application.
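If you have many such log lines to turn into Quepid query templates, the substitution can be scripted. A rough sketch follows – the regular expression assumes the query is passed in a single q= parameter, as in the example above, and the parameter string has been shortened here for readability:

```python
import re

# The raw parameter string copied from the Solr log (shortened here for readability)
raw_params = (
    'hl.fragsize=70&sort=+score+desc,+date_text+desc&qf=tm_body:summary^1.0'
    '&qf=tm_field_product^5.0&wt=json&rows=8&fl=*,score&start=0'
    '&q="iphone"&fq=bs_status:"true"&fq=index_id:"node_index"'
)

# Swap the value of the q= parameter for Quepid's query token
template = re.sub(r'(^|&)q=[^&]*', r'\1q=#$query##', raw_params)
print(template)  # ...&q=#$query##&fq=bs_status:"true"&fq=index_id:"node_index"
```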

5. Starting the tuning process

You should now have Quepid connected to your actual Solr index and running queries the same way that your search application does – you can now start the process of rating the results. Once you have some scores, you can ask your search developer to try changing the query in the Tune panel to see if he can improve the relevance scores. Your journey towards better relevance has begun!

Do get in touch if you’d like more information about Quepid or how Flax can help you develop a process of test-based relevancy tuning.

Measuring search relevance scores
http://www.flax.co.uk/blog/2016/04/19/measuring-search-relevance-scores/ – Tue, 19 Apr 2016

A series of blogs by Karen Renshaw on improving site search:

  1. How to get started on improving Site Search Relevancy
  2. A suggested approach to running a Site Search Tuning Workshop
  3. Auditing your site search performance
  4. Developing ongoing search tuning processes
  5. Measuring search relevance scores


In my last blog I talked about creating a framework for measuring search relevancy scores. In this blog I’ll show how this measurement can be done with a new tool, Quepid.

As I discussed, it’s necessary to record scores assigned to each search result based on how well that result answers the original query. Having this framework in place is necessary to ensure that you avoid the ‘see-saw’ effect of fixing one query but breaking many others further down the chain.

The challenge with this is the time taken to re-score queries once configuration changes have been made – especially given you could be testing thousands of queries.

That’s why it’s great to see a tool like Quepid now available. Quepid sits on top of the open source search engines Apache Solr and Elasticsearch (it can also incorporate scores from other engines, which is useful for comparison purposes if you are migrating) and automatically recalculates scores when configuration changes are made, thus reducing the time taken to understand the impact of your changes.

Business and technical teams benefit

Quepid is easy to get going with. Once you have set up and scored an initial set of search queries (grouped into cases), developers can tweak configurations within the Quepid Sandbox (without pushing to live) and relevancy scores are automatically recalculated, enabling business users to see changes in scores immediately.

This score, combined with the feedback from search testers, provides the insight into how effective the change has been – removing uncertainty about whether you should publish the changes to your live site.

Improved stakeholder communication

Having figures that show how search relevancy is improving is also a powerful tool for communicating search performance to stakeholders (and helps to overcome those HIPPO and LIPPO challenges I’ve mentioned before too). Whilst a relevancy score itself doesn’t translate to a conversion figure, understanding how your queries are performing could support business cases and customer metric scores.

Test and Learn

As the need to manually re-score queries is removed, automated search testing becomes possible; combined with greater collaboration and understanding across the entire search team, this means the test-and-learn process is improved.

Highly Customisable

Every organisation has a different objective when it comes to improving search, but Quepid is designed so that it can support your organisation and requirements:

  • Choose from a range of available scorers or create your own
  • Set up multiple cases so that you can quickly understand how different types of queries perform
  • Share cases amongst users for review and auditing
  • Download and export cases and scores
  • Assist with a ‘deep dive’ into low scoring queries
  • Identify if there are particular trends or patterns you need to focus on as part of your testing
  • Create a dashboard to share with category managers and other stakeholders

Flax are the UK resellers for Quepid, built by our partners OpenSource Connections – contact us for a demo and free 30-day trial.


Karen Renshaw is an independent On Site Search consultant and an associate of Flax. Karen was previously Head of On Site Search at RS Components, the world’s largest electronic component distributor.

Flax can offer a range of consulting, training and support, provide tools for test-driven relevancy tuning and we also run Search Workshops. If you need advice or help please get in touch.

Auditing your site search performance
http://www.flax.co.uk/blog/2016/04/08/auditing-site-search-performance/ – Fri, 08 Apr 2016

A series of blogs by Karen Renshaw on improving site search:

  1. How to get started on improving Site Search Relevancy
  2. A suggested approach to running a Site Search Tuning Workshop
  3. Auditing your site search performance
  4. Developing ongoing search tuning processes
  5. Measuring search relevance scores


In my last blog I wrote in depth about how to run a search workshop. In this blog I cover how to create an audit of your current site search performance.

When starting on your journey to improve on-site search relevancy, investing time in understanding current performance is essential. Whilst the level of detail available to you will vary by industry and business, there are multiple sources of information you will have access to that provide insight. Combining these will ensure you have a holistic view of your customers’ experience.

The main sources of information to consider are:

  • Web Analytics
  • Current Relevancy Scores
  • Customer Feedback
  • Known Areas of Improvement
  • Competitors

Web Analytics

The metrics you use will be dependent upon your business, the role that search plays on your site and what you currently measure. What is key is to develop a view of how core search queries are performing. Classifying and creating an aggregated view of performance for different search queries allows you to identify any differences by search type, which you might want to focus on as part of your testing.

This approach also helps to prevent reacting to the opinions of HiPPOs and LiPPOs (the Highest Paid Person’s Opinion and the Loudest Paid Person’s Opinion) when constructing test matrices.

Another measure to consider is zero results – what percentage of your search queries lead customers to see the dreaded ‘no results found’ message. Don’t react to the overall percentage as a figure per se (a low percentage could mean too many irrelevant results are being returned, a high percentage that you don’t have the product / information your customers are looking for). Again, what you’re trying to get to is an understanding of the root cause so you can build changes into your overall test plan. It’s a manual process, but even a review of the top 200 zero-results queries will throw up meaningful insights.
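As a rough sketch of how that review might start – assuming Solr-style request logs where each line records the query in a q= parameter and a hits= count; adjust the parsing to whatever your engine actually logs:

```python
import re
from collections import Counter

# Matches e.g. ...&q="imaginary product"&...} hits=0 status=0 QTime=12
LINE_RE = re.compile(r'[&{]q=([^&}]+).*?hits=(\d+)')

def zero_result_queries(log_path: str, top_n: int = 200):
    """Return the most frequent zero-result queries and print the overall zero-result rate."""
    zero_hits = Counter()
    total = 0
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            match = LINE_RE.search(line)
            if not match:
                continue
            total += 1
            query = match.group(1).strip('"').lower()
            if int(match.group(2)) == 0:
                zero_hits[query] += 1
    rate = sum(zero_hits.values()) / total if total else 0.0
    print(f"{rate:.1%} of {total} searches returned no results")
    return zero_hits.most_common(top_n)
```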

Current Relevancy Scores

Very closely linked to Web Analytics is a view of current search relevancy scores. It’s good practice to develop a benchmark of how search queries are performing by creating a search relevancy framework. Simply put, this is a score assigned to each search result based on how well that result answers the original query.

Use queries from your search logs so you know you are scoring the queries important to your customers (and not just those important to the HiPPOs and LiPPOs). And whilst scoring will always be subjective, providing guidelines to your testers helps mitigate this.

Tools like Quepid, which can sit on top of open source search engines Apache Solr and Elasticsearch (and also incorporate scores from other engines) and automatically recalculate scores when configuration changes are made, can support ongoing search tuning processes.

Customer Feedback

With site search critical to a successful customer experience, your customers will undoubtedly be providing you with a wealth of feedback, whether in structured or unstructured form.

Take the time to read through as much of it as possible. Even better, walk through some of the journeys yourself to understand the experience from the eyes of your customers. Whilst the feedback might be vague you’ll quickly find you can classify and pull out key themes.

Internal customer service departments can also provide you with customer logs and real life scenarios. Involving them up front to identify problem areas can help in the long term as they can be an invaluable resource when testing different search set ups.

Known Areas of Improvement

You’ve probably already got a list of search configurations on your backlog you want to review and test. Your developers will too, as will multiple teams across the organisation. Pulling together all these different views can provide a useful perspective on how to tackle problem areas.

Whilst you need to develop your search strategy based on customers’ needs (not just what other people like), it’s always useful to have sight of what search functionality has helped them to find the right product, so capture these ideas as you go along.

Competitor Review

A very important question for e-commerce sites is how your competitors are answering common queries. As you have for your own site, scoring common search queries across multiple sites provides a view of how you fare compared to your competitors.

Getting Started!

Now you have all this insight you can start to build out your search test plans with your developers. In my next blog I’ll cover how to start developing search tuning processes.

Karen Renshaw is an independent On Site Search consultant and an associate of Flax. Karen was previously Head of On Site Search at RS Components, the world’s largest electronic component distributor.

Flax can offer a range of consulting, training and support, provide tools for test-driven relevancy tuning and we also run Search Workshops. If you need advice or help please get in touch.

A suggested approach to running a Site Search Tuning Workshop
http://www.flax.co.uk/blog/2016/03/24/suggested-approach-running-site-search-tuning-workshop/ – Thu, 24 Mar 2016

A series of blogs by Karen Renshaw on improving site search:

  1. How to get started on improving Site Search Relevancy
  2. A suggested approach to running a Site Search Tuning Workshop
  3. Auditing your site search performance
  4. Developing ongoing search tuning processes
  5. Measuring search relevance scores


In my last blog I talked about getting started on improving site search relevancy, including the idea of running a two-day initial workshop. In this blog I cover more detail around what the workshop looks like in terms of structure.

Your reason for improving on-site search could be driven by migration to a new platform or a need to improve ‘business as usual’ performance. As such, the exact structure should be tailored to you. It’s also worth remembering that whilst the workshop is the starting point, to get the most from it you will need to spend time in advance gathering all the relevant information.

Workshop Overview

Objectives: Spend 30 minutes at the start of the day to ensure that the objectives (for the workshop and the overall project) are communicated and agreed across the entire project team.

Review the current search set up

It might seem wasteful to spend time reviewing your current set up – especially if you are moving to a new search platform – but ensuring everyone understands what set up you have today, and why, is essential when designing the future state.

It’s useful to break this session further into a Technical Set Up and Business Process. This helps to uncover if there are:

  • Particular search cases that you have developed workarounds for and which protect revenue – your intent will be to remove these workarounds, but you do need to be aware they exist
  • Changes to your content model or content systems that you need to take into consideration
  • Technical constraints that you had in the past that are now gone

Ensuring a common level of understanding helps as the project moves forward.

Review current performance

Ensuring that the team knows how search queries are currently performing again increases buy-in and engagement, and provides a benchmark against which changes can be measured.

Your metrics will be dependent upon your business and what you currently measure (if you aren’t measuring anything – this would also be a good time to plan out what you should).

Classifying the types of search queries your customers are using is also important: do customers search predominantly for single keywords, lengthy descriptors or part numbers? Whilst getting to this level of detail involves manual processes, it not only provides a real insight into how your customers formulate queries but also helps to avoid the ‘see-saw’ impact of focusing on fixes for some whilst unknowingly breaking others further down the tail.

Develop a search testing methodology

With the information to hand around current search set up and performance, now comes the fun part – figuring out the configuration set ups and tests you want to include as part of that new set up.

If you are migrating to a new platform, new approaches are possible, but if you’re working with existing technology there are opportunities to review and test current assumptions.

Search tuning is an iterative process: impacts of configuration changes are only understood once you start testing and determine if the results are as you expected, so build this into the plan from the start.

Dependent upon timescales and objectives you might choose to make wholesale changes immediately, or you might decide to make a series of small changes so you can test and measure each of them independently. Whichever option is best for you, measuring and tracking changes to your search relevancy scores is critical, and tools such as Quepid make this possible (it’s also a great tool for building those collaborative working practices which are so important).

Whilst the focus is around improving search relevancy, excellent search experiences are achieved as a result of the holistic user experience, so remember to consider your UX strategy alongside your search relevancy strategy.

Making plans

Alongside clearly defined objectives you should aim to end the workshop with clearly defined action plans. The level of detail you capture and maintain again depends on your needs but as a minimum you should have mapped out:

  • Initial Configuration Tests
  • Test Search Queries
  • Test Team
  • Ongoing project management (Stand Ups / Project Reviews)

In my next blog I’ll write in more detail about how to audit your current and future search performance.

Karen Renshaw is an independent On Site Search consultant and an associate of Flax. Karen was previously Head of On Site Search at RS Components, the world’s largest electronic component distributor.

Flax can offer a range of consulting, training and support, provide tools for test-driven relevancy tuning and we also run Search Workshops. If you need advice or help please get in touch.

 

Search Solutions 2015: Towards a new model of search relevance testing
http://www.flax.co.uk/blog/2015/11/27/search-solutions-2015-towards-new-model-search-relevance-testing/ – Fri, 27 Nov 2015

Find out more about Quepid here.

[Embedded slides: Search Solutions 2015: Towards a new model of search relevance testing, by Charlie Hull]

Flax Newsletter November 2015
http://www.flax.co.uk/blog/2015/11/10/flax-newsletter-november-2015/ – Tue, 10 Nov 2015

In this month’s Flax Newsletter:

  • Building an open source search team is hard – let us help with training & mentoring on Solr and Elasticsearch
  • RS Components: Flax & Quepid help us to make “crucial” data driven decisions for tuning search
  • 40x faster indexing with Elasticsearch for Hadoop – over a gigabyte per second!

Quepid & Flax – if you’re not testing your search, you’re doing it wrong!
http://www.flax.co.uk/blog/2015/11/09/quepid-flax-if-youre-not-testing-your-search-youre-doing-it-wrong/ – Mon, 09 Nov 2015

Earlier this year an e-commerce company asked us to look into how they could improve the way they tested their website search queries. A relatively simple task you might think – but the company concerned has a turnover of over a billion pounds, with at least half of this via digital channels, so measuring how well search works is essential to preserve revenue. Like (I suspect) many others, they were recording the results of thousands of test searches, carried out manually by their staff in several different languages, in spreadsheets – which worked, but made it very slow to improve search results. It was also often the case that a configuration change made to address one problem would negatively affect another set of results. This is an issue we’ve seen many times before.

I’ve known the guys at OpenSource Connections (OSC) for several years now – working out of Charlottesville, Virginia, like Flax they provide expertise in search and related technologies. Last year they’d shown me an early version of Quepid, a browser-based tool for recording relevance judgements. This tool seemed like the perfect fit and we began to work with OSC to add various enterprise features for the aforementioned client. Along the way, Quepid gained compatibility with both Elasticsearch and Solr and many user interface improvements and is now in daily use at the client’s site. As it can be used by both business users (to rate searches) and developers (to adjust search configuration and to instantly see the effect on those ratings, across the board) it helps to develop a fast feedback loop for improving relevance.

I’m very glad to announce a full partnership with OSC: we will be offering Quepid to all our clients (let me know if you want a demo!). We’ll also be talking about test-driven relevance tuning over the next few months – I’m particularly looking forward to the publication of this book co-written by Quepid developer Doug Turnbull.

If you’re not measuring how well your search is performing, you simply have no idea whether your search engine is correctly configured. Too often, changes to search are driven by the HiPPO, by reacting to customer feedback without considering the effects across the whole system, or simply by dropping in a new technology and assuming this will fix everything. We can change this, by introducing test-driven relevance tuning.
