Catching MICES – a focus on e-commerce search

The second event I attended in Berlin last week was the Mix Camp on e-commerce search (MICES), a small and focused event now in its second year and kindly hosted by Mytoys at their offices. Slides for the talks are available here and I hope videos will appear soon.

The first talk was given by Karen Renshaw of Grainger, who Flax worked with at RS Components (she also wrote a great series of blog posts for us on improving relevancy). Karen’s talk drew on her long experience of managing search teams from a business standpoint – this wasn’t about technology but about combining processes, targets and objectives to improve search quality. She showed how to get started by examining customer feedback, known issues, competitors and benchmarks; how to understand and categorise query types; how to create a test plan within a cross-functional team; and how to plan for incremental change. She also covered testing, including how to score search quality and how to examine the impact of search changes, with the message that “all aspects of search should work together to help customers through their journey”. She concluded with the clear point that there are no silver bullets, and that expectations must be managed during an ongoing, iterative process of improvement. This talk set the scene for the day and contained lessons for every search manager (and a good few search technologists, who often ignore the business factors!).

Next up were Christine Bellstedt & Jens Kürsten from Otto, Germany’s second biggest online retailer with over 850,000 search queries a day. Their talk focused on bringing together the user and business perspectives to create a search quality testing cycle. They quoted Peter Freis’ graphic from his excellent talk at Haystack to illustrate how they created an offline system for experimentation with new ranking methods, based on linear combinations of relevance scores from Solr, business performance indicators and product availability. They described how hard it can be to select ranking features, create test query sets with suitable coverage and choose appropriate metrics to measure. They also talked about how the experimentation cycle can be used to select ‘challengers’ to the current ‘champion’ ranking method, which can then be A/B tested online.
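
As a rough illustration of this kind of ranking function, here is a minimal Python sketch (not Otto’s actual system): documents are re-ranked by a weighted sum of a normalised Solr relevance score, a business performance signal and an availability flag. All field names and weights are hypothetical.

```python
# A minimal sketch of re-ranking by a linear combination of a text
# relevance score and business signals. Field names are illustrative.

def combined_score(doc, weights=(0.6, 0.3, 0.1)):
    """Blend normalised signals into a single ranking score."""
    w_rel, w_perf, w_avail = weights
    return (w_rel * doc["solr_score_norm"]      # text relevance from Solr, scaled to [0, 1]
            + w_perf * doc["performance_norm"]  # e.g. click-through or margin, scaled to [0, 1]
            + w_avail * doc["availability"])    # 1.0 if in stock, else 0.0

docs = [
    {"id": "a", "solr_score_norm": 0.9, "performance_norm": 0.2, "availability": 1.0},
    {"id": "b", "solr_score_norm": 0.7, "performance_norm": 0.9, "availability": 1.0},
]
reranked = sorted(docs, key=combined_score, reverse=True)
```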

Pavel Penchev of SearchHub was next and presented their new search event collector library – a JavaScript SDK which can be used to collect all kinds of metrics around user behaviour and submit them directly to a storage or analytics system (which could even be a search engine itself – e.g. Elasticsearch/Kibana). This is a very welcome development – only a couple of months ago at Haystack I heard several people bemoaning the lack of open source tools for collecting search analytics. We’ll certainly be trying out this open source library.
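
To show the general pattern (this is not the SearchHub SDK itself, which is JavaScript), here is a hedged Python sketch of capturing a click event and storing it in Elasticsearch for later analysis in Kibana; the index name and event fields are invented for illustration.

```python
# A minimal sketch of search analytics collection: record a click event
# and store it in Elasticsearch. Index name and fields are illustrative.
import datetime
import json
import requests

event = {
    "type": "click",
    "query": "red shoes",
    "doc_id": "sku-123",   # the result the user clicked
    "position": 3,         # its rank in the result list
    "timestamp": datetime.datetime.utcnow().isoformat(),
}
requests.post("http://localhost:9200/search-events/_doc",
              data=json.dumps(event),
              headers={"Content-Type": "application/json"})
```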

Andreas Brückner of e-commerce search vendor Fredhopper talked about the best way to optimise search quality in a business context. His ten headings included “build a dedicated search team” (although 14% of Fredhopper’s own customers have no dedicated search staff), “build a measurement framework” (how else can you see how revenue might be improved?) and “start with user needs, not features”. There was much to agree with in this talk from someone with long experience of the sector from a vendor viewpoint.

Johannes Peter of MediaMarktSaturn described an implementation of a ‘semantic’ search platform which attempts to understand queries such as ‘MyMobile 7 without contract’, recognising this is a combination of a product name, a Boolean operator and an attribute. He described how an ontology (perhaps showing a family of available products and their variants) can be used in combination with various rules to create a more focused query, e.g. title:("MyMobile 7") AND NOT (flag:contract). He also mentioned machine learning and term co-occurrence as useful methods but stressed that these experimental techniques should be treated with caution and one should ‘fail early’ if they are not producing useful results.
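
A toy Python sketch of such a rule-based rewrite might look like the following; the product list, attribute flags and regular expression are all assumptions for illustration, and a real system would drive this from the ontology rather than hard-coded tables.

```python
# A toy sketch of a rule-based 'semantic' query rewrite: recognise a
# product name and a negated attribute, emit a focused Lucene query.
# PRODUCTS and ATTRIBUTE_FLAGS are hypothetical stand-ins for an ontology.
import re

PRODUCTS = {"mymobile 7", "mymobile 8"}
ATTRIBUTE_FLAGS = {"contract": "flag:contract"}

def rewrite(query):
    m = re.match(r"(?P<product>.+?)\s+without\s+(?P<attr>\w+)$", query.lower())
    if m and m.group("product") in PRODUCTS and m.group("attr") in ATTRIBUTE_FLAGS:
        return 'title:"{}" AND NOT ({})'.format(
            m.group("product"), ATTRIBUTE_FLAGS[m.group("attr")])
    return query  # fall back to the raw query if no rule matches

print(rewrite("MyMobile 7 without contract"))
# title:"mymobile 7" AND NOT (flag:contract)
```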

Ashraf Aaref & Felipe Besson described their journey using Learning to Rank to improve search at GetYourGuide, a marketplace for activities (e.g. tours and holidays). Using Elasticsearch and the LtR plugin recently released by our partners OpenSourceConnections, they tried to improve the results for their ‘location pages’ (e.g. for Paris), but their first iteration actually gave worse results than the current system and was thus rejected by their QA process. They hope to repeat the process using what they have learned about how difficult it is to create good judgement data. This isn’t the first talk I’ve seen that honestly admits that ML approaches to improving search aren’t a silver bullet, and that the work itself is difficult and requires significant investment.

Duncan Blythe of Zalando gave what was the most forward-looking talk of the event, showing a pure Deep Learning approach to matching search queries to results – no query parsing, language analysis, ranking or anything, just a system that tries to learn what queries match which results for a product search. This reminded me of Doug & Tommaso’s talk at Buzzwords a couple of days before, using neural networks to learn the journey between query and document. Duncan did admit that this technique is computationally expensive and in no way ready for production, but it was exciting to hear about such cutting-edge (and well funded) research.

Doug Turnbull was the last speaker with a call to arms for more open source tooling, datasets and relevance judgements to be made available so we can all build better search technology. He gave a similar talk to keynote the Haystack event two months ago and you won’t be surprised to hear that I completely agree with his viewpoint – we all benefit from sharing information.

Unfortunately I had to leave MICES at this point and missed the more informal ‘bar camp’ event that followed, but I would like to thank all the hosts and organisers, especially René Kriegler, for such an interesting day. There seems to be a great community forming around e-commerce search, which is highly encouraging – after all, this is one of the few sectors where one can draw a clear line between improving relevance and delivering more revenue.

Finding the Bad Actor: Custom scoring & forensic name matching with Elasticsearch

[Slides: ‘Finding the Bad Actor: Custom scoring & forensic name matching with Elasticsearch’ from Charlie Hull]

Worth the wait – Apache Kafka hits 1.0 release

We’ve known about Apache Kafka for several years now – we first encountered it when we developed a prototype streaming Boolean search engine for media monitoring with our own library Luwak. Kafka is a distributed streaming platform with some simple but powerful concepts – everything it deals with is a stream of data (like a messaging system), streams can be combined for processing and stored reliably in a highly fault-tolerant way. It’s also massively scalable.

For search applications, Kafka is a great choice for the ‘wiring’ between source data (databases, crawlers, flat files, feeds) and the search index and other parts of the system. We’ve used other message passing systems (like RabbitMQ) in projects before, but none have the simplicity and power of Kafka. Combine the search index with analysis and visualisation tools such as Kibana and you can build scalable, real-time systems for ingesting, storing, searching and analysing huge volumes of data – for example, we’ve already done this for clients in the financial sector wanting to monitor log data using open-source technology, rather than commercial tools such as Splunk.
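
To make the ‘wiring’ idea concrete, here is a minimal sketch (assuming the kafka-python and elasticsearch Python client libraries, with invented topic and index names): one process publishes documents to a Kafka topic, and an indexer consumes them and writes them to Elasticsearch.

```python
# A minimal sketch of Kafka as the wiring between a data source and a
# search index. Topic, index and host names are illustrative.
import json
from kafka import KafkaConsumer, KafkaProducer
from elasticsearch import Elasticsearch

# Producer side: a crawler or database exporter publishes documents.
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda d: json.dumps(d).encode("utf-8"))
producer.send("documents", {"id": "1", "title": "Kafka hits 1.0"})
producer.flush()

# Consumer side: an indexer reads the stream and writes to Elasticsearch.
consumer = KafkaConsumer("documents",
                         bootstrap_servers="localhost:9092",
                         value_deserializer=lambda b: json.loads(b.decode("utf-8")))
es = Elasticsearch("http://localhost:9200")
for message in consumer:
    doc = message.value
    es.index(index="documents", id=doc["id"], body=doc)
```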

The development of Kafka has been masterminded by our partners Confluent, and it’s a testament to this careful management that the milestone 1.0 version has only just appeared. This doesn’t mean that previous versions weren’t production ready – far from it – but it’s a sign that Kafka has now matured to be a truly enterprise-scale project. Congratulations to all the Kafka team for this great achievement.

We look forward to working more with this great software – and if you need help with your Kafka project do get in touch!

Elastic London Meetup: Rightmove & Signal Media and a new free security plugin for Elasticsearch

I finally made it to a London Elastic Meetup again after missing a few of the recent events: this time Rightmove were the hosts and the first speakers. They described how they had used Elasticsearch Percolator to run 3.5 million stored searches on new property listings as part of an overall migration from the Exalead search engine and Oracle database to a new stack based on Elasticsearch, Apache Kafka and CouchDB. After creating a proof-of-concept system on Amazon’s cloud they discovered that simply running all 3.5m Percolator queries every time a new property appeared would be too slow and thus implemented a series of filters to cut down the number of queries applied, including filtering out rental properties and those in the wrong location. They are now running around 40m saved searches per day and also plan to upgrade from their current Elasticsearch 2.4 system to the newer version 5, as well as carry out further performance improvements. After the talk I chatted to the presenter George Theofanous about our work for Bloomberg using our own library Luwak, which could be a way for Rightmove to run stored searches much more efficiently.
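
As a sketch of the pattern described (not Rightmove’s actual code), stored searches can be kept in a percolator index alongside metadata fields, so a bool query can pre-filter which saved searches are even considered before percolation runs. This uses the Elasticsearch 5+ percolate query syntax (the talk concerned a 2.4 system with the older .percolator API), and all field names are illustrative.

```python
# A sketch of pre-filtered percolation: only saved searches matching the
# new listing's type and location are run, instead of all 3.5m queries.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

new_listing = {"type": "sale", "location": "Cambridge", "bedrooms": 3}

body = {
    "query": {
        "bool": {
            "must": {"percolate": {"field": "query", "document": new_listing}},
            # Pre-filter on metadata stored next to each saved search.
            "filter": [
                {"term": {"search_type": "sale"}},
                {"term": {"search_location": "Cambridge"}},
            ],
        }
    }
}
matches = es.search(index="saved-searches", body=body)
```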

Next up was Signal Media, describing how they built an automated system for upgrading Elasticsearch after their cluster grew to over 60 nodes (they ingest a million articles a day and up to May 2016 were running on Elasticsearch 1.5, which had a number of issues with stability and performance). To avoid having to completely shut down and upgrade their cluster, Joachim Draeger described how they carried out major version upgrades by creating a new, parallel cluster (he named this the ‘blue/green’ method), with their indexing pipeline supplying both clusters and their UI code being gradually switched over to the new cluster once stability and performance were verified. This process has cut their cluster to only 23 nodes with a 50% cost saving and many performance and stability benefits. For ongoing minor version changes they have built an automated rolling upgrade system using two Amazon EBS volumes for each node (one is for the system, and is simply switched off as a node is disabled; the other is for data and is re-attached to a new node once it is created with the upgraded Elasticsearch machine image). With careful monitoring of cluster stability and (of course) testing, this system enables them to upgrade their entire production cluster in a safe and reliable way without affecting their customers.
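
The core of the ‘blue/green’ approach can be sketched very simply (endpoints and index names are illustrative, not Signal Media’s setup): the indexing pipeline dual-writes to both clusters while read traffic is moved over only once the new cluster is verified.

```python
# A minimal sketch of blue/green dual-writing during a cluster upgrade.
from elasticsearch import Elasticsearch

blue = Elasticsearch("http://blue-cluster:9200")    # current production
green = Elasticsearch("http://green-cluster:9200")  # new version under test

def index_article(doc):
    # Write to both clusters so the new one catches up while the old
    # one continues to serve all read traffic.
    for cluster in (blue, green):
        cluster.index(index="articles", id=doc["id"], body=doc)
```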

After the talks I announced the Search Industry Awards I’ll be helping to judge in November (please apply if you have a suitable search project or innovation!) and then spoke to Simone Scarduzio about his free Elasticsearch and Kibana security plugin, a great alternative to the Elastic X-Pack (only available to Elastic subscription customers). We’ll certainly be taking a deeper look at this plugin for our own clients.

Thanks again to Yann Cluchey for organising the event and all the speakers and hosts.

A fabulous FactHack for Full Fact

Last week we ran a hackday for Full Fact, hosted by Facebook in their London office. We had planned to gather a room full of search experts from our London Lucene/Solr Meetup, and around twenty people attended from a range of companies including Bloomberg, Alfresco and the European Bioinformatics Institute, among them a number of Lucene/Solr committers.

Mevan Babakar of Full Fact has already written a detailed review of the day, but to summarise we worked on three areas:

  • Building a web service around our Luwak stored query engine, to give it an easy-to-use API. We now have an early version of this which allows Full Fact to check claims they have previously fact checked against a stream of incoming data (e.g. subtitles or transcripts of political events).
  • Creating a way to extract numbers from text and turn them into a consistent form (e.g. ‘eleven percent’, ‘11%’, ‘0.11’) so that we can use range queries more easily – Derek Jones’ team researched existing solutions and he has blogged about what they achieved (a toy sketch of the idea follows this list).
  • Investigating how to use natural language processing to identify parts of speech and tag them in a Lucene index using synonyms and token stacking, to allow for queries such as ‘<noun> is rising’ to match text like ‘crime is rising’ – the team forked Lucene/Solr to experiment with this.
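
Here is a toy Python sketch of that normalisation step, mapping the variants quoted above onto a single canonical number; a production version would need far broader coverage (full number-word parsing, locales, ranges and so on), and this is in no way the solution the hackday team settled on.

```python
# A toy sketch of number normalisation: map 'eleven percent', '11%'
# and '0.11' to the same canonical float. Deliberately minimal.
import re

WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
         "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11}

def normalise_number(text):
    """Return a canonical float, or None if the text isn't recognised."""
    m = re.match(r"(\w+|[\d.]+)\s*(percent|%)?$", text.strip().lower())
    if not m:
        return None
    value, pct = m.groups()
    number = WORDS.get(value)
    if number is None:
        try:
            number = float(value)
        except ValueError:
            return None
    return number / 100 if pct else number

assert normalise_number("eleven percent") == 0.11
assert normalise_number("11%") == 0.11
assert normalise_number("0.11") == 0.11
```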

We’re hoping to build on these achievements to continue to support Full Fact as they develop open source automated fact checking tools for both their own operations and for other fact checking organisations across the world (there were fact checkers from Argentina and Africa attending to give us an international perspective). Our thanks to all of those who contributed.

I’ve also introduced Full Fact to many others within the search and text analytics community and we would welcome further contributions from anyone who can lend their expertise and time – get in touch if you can help. This is only the beginning!

Just the facts with Solr & Luwak

It won’t have escaped your notice that factchecking is very much in the news recently due to last year’s political upheavals in both the US and UK and the suspected influence of fake news on voters. Both traditional and social media organisations are making efforts in this area; examples include Channel 4 and Facebook.

At our recent London Lucene/Solr Meetup, UK charity Full Fact spoke eloquently on the need for automated factchecking tools to help identify and correct stories that are demonstrably false. They’ve also published a great report on The State of Automated Factchecking which mentions both Apache Solr and our powerful stored query library Luwak as components of their platform. We’ve been helping Full Fact with their prototype factchecking tools for a while now, but during the Meetup I suggested we might run a hackday to develop these further.

Thus I’m very pleased to announce that Facebook have offered us a venue in London for the hackday on January 20th (register here). Many Solr developers, including several committers and PMC members, are signed up to attend already. We’ll use Full Fact’s report and their experiences of factchecking newspapers, TV’s Question Time and Hansard to design and build practical, useful tools and identify a future roadmap. We’ll aim to publish what we build as open source software which should also benefit factchecking organisations across the world.

If you’re concerned about the impact of fake news on the political process and want to help, join the Meetup and/or donate to Full Fact.

Boosts Considered Harmful – adventures with badly configured search

During a recent client visit we encountered a common problem in search – over-application of ‘boosts’, which can be used to weight the influence of matches in one particular field. For example, you might sensibly use this to make results that match a query on their title field come higher in search results. However, in this case we saw huge boost values (numbers in the hundreds) which were probably swamping everything else – and it wasn’t at all clear where the values had come from, be it experimentation or simply wild guesses. As you might expect, the search engine wasn’t performing well.

A problem with Solr, Elasticsearch and other search engines is that so many factors can affect the ordering of results – the underlying relevance algorithms, how source data is processed before it is indexed, how queries are parsed, boosts, sorting, fuzzy search, wildcards… It’s very easy to end up with a confusing picture and configuration files full of conflicting settings. Often these settings are left over from example files or previous configurations or experiments, without any real idea of why they were used. There are so many dials to adjust and switches to flick, many of which are unnecessary. The problem is compounded by embedding the search engine within another system (e.g. a content management platform or e-commerce engine), so it can be hard to see which control panel or file controls the configuration. Generally, this embedding has not been done by those with deep experience of search engines, so the defaults chosen are often wrong.

The balance of relevance versus recency is another setting which is often difficult to get right. At a news site we were asked to bias the order of results heavily in favour of recency (as the saying goes, yesterday’s newspaper is today’s chip wrapper) – the result being, as we had warned, that whatever the query today’s news would appear highest – even if it wasn’t relevant! Luckily by working with the client we managed to achieve a sensible balance before the site was launched.

Our approach is to strip back the configuration to a very basic one and to build on this, but only with good reason. Take out all the boosts and clever features and see how good the results are with the underlying algorithms (which have been developed based on decades of academic research – so don’t just break them with over-boosting). Create a process of test-based relevancy tuning where you can clearly relate a configuration setting to improving the result of a defined test. Be clear about which part of your system influences a setting and whose responsibility it is to change it, and record the changes in source control.
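
One lightweight way to start such a test-based process is a simple judgement list that is re-run after every configuration change; the sketch below (with an invented Solr endpoint, field names and judgements) checks that known-good documents still appear in the top results.

```python
# A minimal sketch of test-based relevance tuning: queries paired with
# documents that a good configuration must return near the top.
import requests

JUDGEMENTS = {
    "red shoes": ["sku-123", "sku-456"],
    "waterproof jacket": ["sku-789"],
}

def top_ids(query, rows=10):
    """Return the ids of the top results from Solr for a query."""
    resp = requests.get("http://localhost:8983/solr/products/select",
                        params={"q": query, "rows": rows, "fl": "id", "wt": "json"})
    return [d["id"] for d in resp.json()["response"]["docs"]]

def run_tests():
    for query, expected in JUDGEMENTS.items():
        found = top_ids(query)
        missing = [e for e in expected if e not in found]
        print(query, "PASS" if not missing else "FAIL (missing: %s)" % missing)

run_tests()
```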

Boosts are a powerful tool – when used correctly – but you should start by turning them off, as they may well be doing more harm than good. Let us know if you’d like us to help tune your search!

Can we fix your Solr or Elasticsearch system in a single day?

Here at Flax, we’re often called in to take a look at existing Apache Solr or Elasticsearch search applications, to suggest improvements, tune-ups or enhancements. It’s impossible for us to know ahead of time what we might find – out-of-date versions of the software, slow performance on either (or both) the indexing or search side of the application and untidy or incorrect configuration files are all common. We also have to learn something about your particular business or sector – the search needs of an e-commerce company are very different to those of a legal firm, startup or government organisation, for example.

Often we’re asked ‘how long will this take’ before we have any detail of the business, the application or how it has been set up. Our clients are obviously keen to know as soon as possible the potential costs of any work that might be necessary and what impact it might have. Some search specialists will only engage with a client for a minimum period (say, a week) which can be quite a commitment, especially for smaller enterprises, both in terms of budget and staff time. However, we’re quite happy to admit we don’t know how long it will take – yet.

Our approach is very simple. We’ll spend a first day with you, on-site if possible, examining the following things:

  • What are the business requirements for search?
  • How is the search engine software hosted & deployed?
  • What does the data to be searched look like? How is it indexed by the search engine?
  • What search features have been used and has this been done correctly?
  • How fast is search? What factors are affecting this?
  • How is search relevance and performance tested?

This is by no means an exhaustive list, but we’ll do what we can during this first day. At the end of the day we will write a brief report (probably no longer than two pages) detailing what we’ve found and some recommendations. If we find anything simple to fix that will make an immediate improvement, we’ll tell you (and if possible help you do so on-site). We charge a flat rate for this kind of engagement.

Even after this single day, you should now have enough information to make some decisions about improving your search – you could decide to let us help run a search workshop, or ask us to come up with a more detailed and costed improvement plan.

Hopefully you’ll also have realised that with over 15 years’ experience of building search applications with open source software, we are the right team to help you improve your search. If you need help, get in touch today.

Out with the old – and in with the new Lucene query parser?

Over the years we’ve dealt with quite a few migration projects where the query syntax of the client’s existing search engine must be preserved. This might be because other systems (or users) depend on it, or a large number of stored expressions exist and it is difficult or uneconomic to translate them all by hand. Our usual approach is to write a query parser, which understands the current syntax but creates a query suitable for a modern open source search engine based on Apache Lucene. We’ve done this for legacy engines including dtSearch and Verity and also for in-house query languages developed by clients themselves. This allows you to keep the existing syntax but improve performance, scalability and accuracy of your search engine.
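
As a flavour of what such a query parser does, here is a toy Python sketch translating a single construct of a dtSearch-style syntax (the ‘w/N’ proximity operator) into Lucene query syntax; a real parser handles the full grammar with a proper tokeniser and syntax tree rather than one regular expression.

```python
# A toy sketch of legacy-to-Lucene query translation: dtSearch-style
# 'w/N' proximity becomes a Lucene phrase query with slop.
import re

def translate_proximity(legacy_query):
    """'apple w/5 pie' -> '"apple pie"~5' (proximity search in Lucene)."""
    m = re.match(r"(\w+)\s+w/(\d+)\s+(\w+)$", legacy_query.strip(), re.IGNORECASE)
    if not m:
        return legacy_query  # pass anything else through untranslated
    left, distance, right = m.groups()
    return '"{} {}"~{}'.format(left, right, distance)

print(translate_proximity("apple w/5 pie"))  # "apple pie"~5
```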

There are a few points to note during this process:

  • What appears to be a simple query in your current language may not translate to a simple Lucene query, which may lead to performance issues if you are not careful. Wildcards for example can be very expensive to process.
  • You cannot guarantee that the new search system will return exactly the same results, in the same order, as the old one, no matter how carefully the query parser is designed. After all, the underlying search engine algorithms are different.
  • Some element of manual translation may be necessary for particularly large, complex or unusual queries, especially if the original intention of the person who wrote the query is unclear.
  • You may want to create a vendor-neutral query language as an intermediate step – so you can migrate more easily next time. We’ve done this for Danish media monitors Infomedia.
  • If you have particularly large and/or complex queries that may have been added to incrementally over time, they may contain errors or logical inconsistencies – which your current engine may not be telling you about! If you find these you have two choices: fix the query expression (which may then give you slightly different results) or make the new system give the same (incorrect) results as before.

To mitigate these issues it is important to decide on a test set of queries and expected results, and what level of ‘correctness’ is required – bearing in mind 100% is going to be difficult if not impossible. If you are dealing with languages outside the experience of the team you should also make sure you have access to a native speaker – so you can be sure that results really are relevant!

Do let us know if you’re planning this kind of migration and how we can help – building Lucene query parsers is not a simple task and some past experience can be invaluable.

Measuring search relevance scores

A series of blogs by Karen Renshaw on improving site search:

  1. How to get started on improving Site Search Relevancy
  2. A suggested approach to running a Site Search Tuning Workshop
  3. Auditing your site search performance
  4. Developing ongoing search tuning processes
  5. Measuring search relevance scores


In my last blog I talked about creating a framework for measuring search relevancy scores. In this blog I’ll show how this measurement can be done with a new tool, Quepid.

As I discussed, it’s necessary to record scores assigned to each search result based on how well that result answers the original query. Having this framework in place is necessary to ensure that you avoid the ‘see-saw’ effect of fixing one query but breaking many others further down the chain.
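
As an example of the kind of score such a framework records, here is a minimal Python sketch of precision@k for a single query, computed from human judgements. This is a generic illustration rather than Quepid’s own scorer (Quepid ships with several scorers and lets you write your own).

```python
# A minimal sketch of one relevance metric: the fraction of the top-k
# results that a human judged relevant for the query.

def precision_at_k(results, judged_relevant, k=10):
    """Precision@k for one query's ranked result list."""
    top_k = results[:k]
    return sum(1 for doc_id in top_k if doc_id in judged_relevant) / len(top_k)

results = ["d3", "d7", "d1", "d9"]   # ranked result ids from the engine
judged_relevant = {"d1", "d3"}       # ids a human scored as relevant
print(precision_at_k(results, judged_relevant, k=4))  # 0.5
```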

The challenge with this is the time taken to re-score queries once configuration changes have been made – especially given you could be testing thousands of queries.

That’s why it’s great to see a tool like Quepid now available. Quepid sits on top of the open source search engines Apache Solr and Elasticsearch (it can also incorporate scores from other engines, which is useful for comparison purposes if you are migrating) and it automatically recalculates scores when configuration changes are made, thus reducing the time taken to understand the impact of your changes.

Business and technical teams benefit

Quepid is easy to get going with. Once you have set up and scored an initial set of search queries (known as cases), developers can tweak configurations within the Quepid Sandbox (without pushing to live) and relevancy scores are automatically recalculated enabling business users to see changes in scores immediately.

This score, combined with the feedback from search testers, provides the insight into how effective the change has been – removing uncertainty about whether you should publish the changes to your live site.

Improved stakeholder communication

Having figures that show how search relevancy is improving is also a powerful tool for communicating search performance to stakeholders (and helps to overcome those HIPPO and LIPPO challenges I’ve mentioned before too). Whilst a relevancy score itself doesn’t translate to a conversion figure, understanding how your queries are performing can support business cases and customer metric scores.

Test and Learn

As the need to manually re-score queries is removed, automated search testing becomes possible; combined with greater collaboration and understanding across the entire search team, this improves the whole test and learn process.

Highly Customisable

Every organisation has a different objective when it comes to improving search, but Quepid is designed so that it can support your organisation and requirements:

  • Choose from a range of available scorers or create your own
  • Set up multiple cases so that you can quickly understand how different types of queries perform
  • Share cases amongst users for review and auditing
  • Download and export cases and scores
  • Assist with a ‘deep dive’ into low scoring queries
  • Identify if there are particular trends or patterns you need to focus on as part of your testing
  • Create a dashboard to share with category managers and other stakeholders

Flax are the UK resellers for Quepid, built by our partners OpenSource Connections – contact us for a demo and free 30-day trial.


Karen Renshaw is an independent On Site Search consultant and an associate of Flax. Karen was previously Head of On Site Search at RS Components, the world’s largest electronic component distributor.

Flax can offer a range of consulting, training and support, provide tools for test-driven relevancy tuning and we also run Search Workshops. If you need advice or help please get in touch.
