relevance – Flax
http://www.flax.co.uk – The Open Source Search Specialists

More needles, more Haystacks, more relevance!
http://www.flax.co.uk/blog/2018/12/05/more-needles-more-haystacks-more-relevance/ – Wed, 05 Dec 2018

Those of us who have been working in the search sector for a while know that search tuning isn’t just a matter of installing the default configuration, pointing the engine at some content and starting it up – in fact, if you do just that you’ll probably end up with a search user experience that’s even worse than whatever you’re replacing, and certainly a lot worse than your competitors’ solution. It’s also no longer about just knowing how one engine behaves and the magic tweaks to improve it – you need to understand the fundamentals of search and how a range of different products and projects implement them. You also need to understand user requirements and users’ often entirely subjective views of what is a ‘good’ and ‘bad’ search result, plus how different types of businesses can use search technology for site search, enterprise search, media monitoring, process improvement and a myriad of other uses.

Over the last year or so we’ve seen the emergence of a new profession dedicated to improving how search systems present information to users – Relevance Engineering. Importantly, this covers not just the technical aspects of search but the business aspects – understanding the why as much as the how. Relevance engineers understand that search tuning is a multifaceted problem and there are no magic bullets (or magic AI robots) that will do all the work for you. I’ve started to write about relevance engineering recently to try to define what it means.

One of my favourite events last year was the first Haystack conference run by our partners Open Source Connections, which brought together both experienced relevance engineers and those new to the profession. It was friendly, informal, focused and informative. In fact, I enjoyed it so much that by the second day I was already thinking about how to bring the event to Europe – which we did successfully in October.

I’m very happy to say that Haystack is back in April 2019 and the Call for Papers is open until January 9th. If you’ve got an exciting relevance project or idea to talk about please do submit it. See you there!

Three weeks of search events this October from Flax
http://www.flax.co.uk/blog/2018/09/04/three-weeks-of-search-events-this-october-from-flax/ – Tue, 04 Sep 2018

Flax has always been very active at conferences and events – we enjoy meeting people to talk about search! With much of our consultancy work being carried out remotely these days, attending events is a great way to catch up in person with our clients, colleagues and peers and to learn from others about what works (and what doesn’t) when building cutting-edge search solutions. I’m thus very glad to announce that we’re running three search events this coming October.

Earlier in the year I attended Haystack in Charlottesville, one of my favourite search conferences ever – and almost immediately began to think about whether we could run a similar event here in Europe. Although we’ve only had a few months I’m very happy to say we’ve managed to pull together a high-quality programme of talks for our first Haystack Europe event, to be held in London on October 2nd. The event is focused on search relevance from both a business and a technical perspective and we have speakers from global retailers as well as specialist consultants and authors. Tickets are already selling well and we have limited space, so I would encourage you to register as soon as you can (Haystack USA sold out even after the capacity was increased). We’re running the event in partnership with Open Source Connections.

The next week we’re running a Lucene Hackday on October 9th as part of our London Lucene/Solr Meetup programme. Building on previous successful events, this is a day of hacking on the Apache Lucene search engine and associated software such as Apache Solr and Elasticsearch. You can read up on what we achieved at our last event a couple of years ago – again, space is limited, so sign up soon to this free event (huge thanks to Mimecast for providing the venue and to Elastic for sponsoring drinks and food for an evening get-together afterwards). Bring a laptop and your ideas (and do comment on the event page if you have any suggestions for what we should work on).

We’ll be flying to Montreal soon afterwards to attend the Activate conference (run by our partners Lucidworks) and while we’re there we’ll host another free Lucene Hackday on October 15th. Again, this would not be possible without sponsorship and so thanks must go to Netgovern, SearchStax and One More Cloud. Remember to tell us your ideas in the comments.

So that’s three weeks of excellent search events – see you there!

Defining relevance engineering part 3: technical assessment
http://www.flax.co.uk/blog/2018/07/11/defining-relevance-engineering-part-3-technical-assessment/ – Wed, 11 Jul 2018

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do?

In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

When Flax is working with clients on relevance tuning engagements we aim to gain an overview of the various technologies the client uses and how they are obtained, deployed, managed and maintained. This will include not just the search engine but the various systems that supply data to it, host it, monitor it and interface to it to pass results to users. In addition we must understand who is responsible for the various areas, be it in-house staff, consultants, outsourcing or third party suppliers.

We try to answer the following questions in detail, including who supplies, modifies, maintains and supports the various systems concerned, what versions are used and where and how they are hosted and configured. We hope for full access to inspect the systems but this is not always possible – at the least, we need copies of configuration files and settings.

  • What systems supply the source data for search?
  • What is the current search technology?
  • Is the search engine part of another system (such as a content management system or product information system)?
  • What interface is there between the systems that supply source data and the search engine?
  • What systems monitor and manage the search engine?
  • What systems are used to submit queries to the search engine?
  • What query logging is performed and at what level?
  • How are development, test, staging and production systems arranged and what access is available to these?
  • What are the processes used to deploy new software and configuration?
  • What testing is performed?

It’s common to find flaws in the overall technical landscape – as an example, we’ll often find that there is no effective source control of search engine configuration files, with these having been originally derived from an example setup not intended for production use and since modified ad-hoc as issues arose. In this case it’s quite common that no-one knows why a particular setting has been used!

Without a good overall idea of the technology landscape it will be hard, if not impossible, to improve relevance. External processes (such as how hard it is to obtain a recent and complete log file from a production system) will also affect how effective these improvements can be.

Finally, as search is often owned by the IT department (and by the time we arrive, search is usually viewed as ‘broken’) we sometimes find a ‘bunker mentality’ – those responsible for the implementation are hunkered down and used to being harried and complained at by others who are unhappy with how search is (not) working. It’s important to communicate that only by being open and honest about the current situation can we all work together to improve things and build better search.

In the next post I’ll cover the tools a relevance engineer can use. In the meantime you can read the free Search Insights 2018 report by the Search Network. Of course, feel free to contact us if you need help with relevance engineering.

Lucene Solr London: Search Quality Testing and Search Procurement
http://www.flax.co.uk/blog/2018/06/29/lucene-solr-london-search-quality-testing-and-search-procurement/ – Fri, 29 Jun 2018

Mimecast were our kind hosts for the latest London Lucene/Solr Meetup (and even provided goodie bags). It’s worth repeating that we couldn’t run these events without the help of sponsors and hosts and we’re always very grateful (and keep those offers coming!).

First up was Andrea Gazzarini presenting a brand new framework for search quality testing. Designed for offline measurement, Rated Ranking Evaluator is an open source Java library (although it can be used from other languages). It uses a hierarchical model to arrange queries into query groups (all queries in a query group should be producing the same results). Each test can run across a number of search engine configuration versions and outputs results in JSON format – but these can also be translated into Excel spreadsheets, PDFs or sent to a server that provides a live console showing how search quality is affected by a search engine configuration change. Although aimed at Elasticsearch and Solr, the platform is extensible to any underlying search engine. This is a very useful tool for search developers and joins Quepid and SearchHub’s recently released search analytics acquisition library in the ‘toolbox’ for relevance engineers. You can see Andrea’s slides here.
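To make the ‘query group’ idea concrete, here’s a toy sketch of my own in Python – this is an illustration of the concept, not RRE’s actual API or ratings format: all queries in a group (say, spelling variants) should return the same top results, and comparing their top-k lists flags configurations where they drift apart.

```python
# Illustration only: every query in a group is expected to return the
# same top-k documents; a mismatch signals an inconsistency to investigate.

def group_is_consistent(results_by_query, k=3):
    """True if every query in the group returns the same top-k documents, in order."""
    top_lists = [tuple(docs[:k]) for docs in results_by_query.values()]
    return len(set(top_lists)) <= 1

# Two spelling variants that should behave identically:
group = {
    "colour tv": ["doc7", "doc2", "doc9"],
    "color tv":  ["doc7", "doc2", "doc9"],
}
assert group_is_consistent(group)

group["color tv"] = ["doc2", "doc7", "doc9"]  # order drifted between variants
assert not group_is_consistent(group)
```

Running a check like this across several engine configurations is essentially what a quality console visualises over time.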

Martin White spoke next on how open source search solutions fare in corporate procurements for enterprise search. This was an engaging talk from Martin, showing the scale of the opportunities for open source platforms, with budgets of several million pounds being common for enterprise search projects. However, as he mentioned, it can be very difficult for procurement departments to get information from vendors and ‘the last thing you’ll know about a piece of enterprise software is how much it will cost’. He detailed how open source solutions often compare badly against closed source commercial offerings because it is hard to see the ‘edges’ – e.g. what custom development will be necessary to fulfil enterprise requirements. Although the opportunities are clear, it seems open source based solutions still have a way to go to compete. You can read more from Martin on this subject in the recent free Search Insights report.

Thanks to Mimecast and both speakers – we’ll be back after the summer with another Meetup!

Defining relevance engineering part 2: learning the business
http://www.flax.co.uk/blog/2018/06/26/defining-relevance-engineering-part-2-learning-the-business/ – Tue, 26 Jun 2018

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do?

In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

Before a relevance engineer can install or configure a search engine they need to understand the business concerned. I’ve called this ‘learning the business’ and it’s something that Flax has to do on a weekly basis. One week we may be talking to a recruitment business that thinks and operates in terms of jobs, skills, candidates and roles; the next week it could be a company that sells specialised products and is more concerned with features, prices, availability, stock levels and pack sizes. Even within a single sector, each business will work in a slightly different way, although there will be some common factors.

Example data is key to learning how a business works, but is next to useless without someone to explain it in context. In some cases the business has lost some of the internal knowledge about how their own systems work: “Jeff built that database, but he left two years ago.” What seems obvious to them may not be obvious to anyone else. Generic terms, e.g. “products”, “location”, “keywords”, can mean completely different things in each business context. If they exist, corporate glossaries, dictionaries or taxonomies are very useful, but again they may need annotating to explain what each entry means. If a glossary doesn’t exist, it’s a good first step to start one.

Finding the right people to talk to is also vital. Although relevance engineers are usually engaged or recruited by the IT department, this may not be the best place to learn about the business. The marketing department may have the best view of how the business interacts with its clients; the CEO or Managing Director will know the overall direction and objectives but may not have time for the detail; the content creators (which could be librarians, web editors or product information managers) will know about the items the search engine will need to find.

In many companies there are hierarchies and structures that sometimes actively prevent the sharing of information: it’s common to discover who blames who for past bad decisions and to be used as a sounding board by those with axes to grind. At Flax we try to make sure we talk to people at all levels in the client organisation: sometimes the most junior employees – and especially those who are customer-facing – have the most useful information as they have to deal with problems on a day-to-day basis. As external consultants one of our most useful skills is being able to listen without making sudden judgements or assumptions.

The end result of these many conversations is an understanding of where source data is created, gathered and stored; what a ‘search result’ is in the context of a particular business (a product on sale? A contract? A CV or resumé?) and how it might be constructed from this data; what a ‘relevant’ result is in this context (a more valuable product to sell? The most recent contract version? The best candidate for a job?) and how good/bad/nonexistent the current search solution is. This is vital information to be gathered before one even begins thinking about how to install, develop and/or configure and test a search solution.

In the next post I’ll cover how a relevance engineer might assess the technical capability of a business with respect to search. In the meantime you can read the free Search Insights 2018 report by the Search Network. Of course, feel free to contact us if you need help with relevance engineering.

Defining relevance engineering, part 1: the background
http://www.flax.co.uk/blog/2018/06/25/defining-relevance-engineering-part-1-the-background/ – Mon, 25 Jun 2018

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

Let’s start by turning the clock back a few years. Ten or fifteen years ago search engines were usually closed source, mysterious black boxes, costing five or six-figure sums for even relatively modest installations (let’s say a couple of million documents – small by today’s standards). Huge amounts of custom code were necessary to integrate them with other systems and projects would take many months to demonstrate even basic search functionality. The trick was to get search working at all, even if the eventual results weren’t very relevant. Sadly even this was sometimes difficult to achieve.

Nowadays, search technology has become highly commoditized and many developers can build a functioning index of several million documents in a couple of days with off-the-shelf, open source, freely available software. Even the commercial search firms are using open source cores – after all, what’s the point of developing them from scratch? Relevance is often ‘good enough’ out of the box for non business-critical applications.

A relevance engineer is required when things get a little more complicated and/or when good search is absolutely critical to your business. If you’re trading online, search can be a major driver of revenue and getting it wrong could cost you millions. If you’re worried about complying with the GDPR, MiFID or other regulations then ‘good enough’ simply isn’t good enough if you want to avoid legal issues. If you’re serious about saving the time and money your employees waste looking for information, or improving your business’ ability to thrive in a changing world, then you need to do search right.

So what search engine should you choose before you find a relevance engineer to help with it? I’m going to go out on a limb here and say it doesn’t actually matter that much. At Flax we’re proponents of open source engines such as Apache Lucene/Solr and Elasticsearch (which have much to recommend them) but the plain fact is that most search engines are the same under the hood. They all use the same basic principles of information retrieval; they all build indexes of some kind; they all have to analyze the source data and user queries in much the same way (ignore ‘cognitive search’ and other ‘AI’ buzzwords for now, most of this is marketing rather than actual substance). If you’re using Microsoft Sharepoint across your business we’re not going to waste your time trying to convince you to move wholesale to a Linux-based open source alternative.

Any modern search engine should allow you the flexibility to adjust how data is ingested, how it is indexed, how queries are processed and how ranking is done. These are the technical tools that the relevance engineer can use to improve search quality. However, relevance engineering is never simply a technical task – in fact, without a business justification, adjusting these levers may make things worse rather than better.
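As a rough illustration of one of those levers – this is a toy example of my own, not any particular engine’s API – query-time field weighting lets a relevance engineer change the ranking signal without reindexing anything. Real engines expose the same idea through parameters such as Solr’s `qf` boosts or Elasticsearch’s per-field boosts.

```python
# Toy scorer: sum of per-field term matches, scaled by each field's weight.
# Adjusting the weights is a query-time tuning lever; no reindexing needed.

def score(doc, query_terms, weights):
    total = 0.0
    for field, weight in weights.items():
        tokens = doc.get(field, "").lower().split()
        total += weight * sum(tokens.count(t) for t in query_terms)
    return total

doc = {"title": "red running shoes",
       "description": "red shoes for running on roads"}
terms = ["red", "shoes"]

# Boosting the title field changes the ranking signal for the same document.
assert score(doc, terms, {"title": 2.0, "description": 1.0}) > \
       score(doc, terms, {"title": 1.0, "description": 1.0})
```

Whether a title match *should* outweigh a description match is exactly the kind of question that needs a business justification, as the next paragraph argues.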

In the next post I’ll cover how a relevance engineer can engage with a business to discover the why of relevance tuning. In the meantime you can read Doug Turnbull’s chapter in the free Search Insights 2018 report by the Search Network (the rest of the report is also very useful) and you might also be interested in the ‘Think like a relevance engineer’ training he is running soon in the USA. Of course, feel free to contact us for details of similar UK or EU-based training or if you need help with relevance engineering.

Catching MICES – a focus on e-commerce search
http://www.flax.co.uk/blog/2018/06/19/catching-mices-a-focus-on-e-commerce-search/ – Tue, 19 Jun 2018

The second event I attended in Berlin last week was the Mix Camp on e-commerce search (MICES), a small and focused event now in its second year and kindly hosted by Mytoys at their offices. Slides for the talks are available here and I hope videos will appear soon.

The first talk was given by Karen Renshaw of Grainger, who Flax worked with at RS Components (she also wrote a great series of blog posts for us on improving relevancy). Karen’s talk drew on her long experience of managing search teams from a business standpoint – this wasn’t about technology but about combining processes, targets and objectives to improve search quality. She showed how to get started by examining customer feedback, known issues, competitors and benchmarks; how to understand and categorise query types; create a test plan within a cross-functional team and to plan for incremental change. Testing was covered including how to score search quality and how to examine the impact of search changes, with the message that “all aspects of search should work together to help customers through their journey”. She concluded with the clear point that there are no silver bullets, and that expectations must be managed during an ongoing, iterative process of improvement. This was a talk to set the scene for the day and containing lessons for every search manager (and a good few search technologists who often ignore the business factors!).

Next up were Christine Bellstedt & Jens Kürsten from Otto, Germany’s second biggest online retailer with over 850,000 search queries a day. Their talk focused on bringing together the users and business perspective to create a search quality testing cycle. They quoted Peter Freis’ graphic from his excellent talk at Haystack to illustrate how they created an offline system for experimentation with new ranking methods based on linear combinations of relevance scores from Solr, business performance indicators and product availability. They described how they learnt how hard it can be to select ranking features, create test query sets with suitable coverage and select appropriate metrics to measure. They also talked about how the experimentation cycle can be used to select ‘challengers’ to the current ‘champion’ ranking method, which can then be A/B tested online.
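The linear-combination idea is simple to sketch. The feature names and weights below are invented for illustration – they are not Otto’s actual model – but they show how business signals can reorder results that pure relevance scoring would rank differently.

```python
# Toy final score: a weighted sum of the engine's relevance score,
# a business KPI (margin) and product availability.

def combined_score(features, weights):
    return sum(weights[name] * features[name] for name in weights)

weights = {"relevance": 0.6, "margin": 0.3, "availability": 0.1}

in_stock = {"relevance": 0.8, "margin": 0.5, "availability": 1.0}
sold_out = {"relevance": 0.9, "margin": 0.5, "availability": 0.0}

# A slightly less relevant but available product can outrank a sold-out one.
assert combined_score(in_stock, weights) > combined_score(sold_out, weights)
```

Offline, one can re-rank a judged query set with many candidate weight vectors and promote the best ‘challenger’ to an online A/B test, as the Otto team described.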

Pavel Penchev of SearchHub was next and presented their new search event collector library – a Javascript SDK which can be used to collect all kinds of metrics around user behaviour and submit them directly to a storage or analytics system (which could even be a search engine itself – e.g. Elasticsearch/Kibana). This is a very welcome development – only a couple of months ago at Haystack I heard several people bemoaning the lack of open source tools for collecting search analytics. We’ll certainly be trying out this open source library.
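To give a flavour of what such a collector might capture, here is a sketch of a plausible search event payload. The field names are my own guesses at a reasonable shape, not the SDK’s actual schema (and the SDK itself is JavaScript – this is a Python rendering of the idea).

```python
import json
import time

# Hypothetical click-event payload; field names are assumptions for illustration.
def make_search_event(query, clicked_doc, position, session_id):
    return {
        "type": "click",
        "query": query,
        "doc_id": clicked_doc,
        "position": position,          # rank of the clicked result, 1-based
        "session": session_id,
        "timestamp": int(time.time()),
    }

event = make_search_event("wireless headphones", "sku-123", 2, "sess-42")

# Serialised events like this can be shipped to any analytics backend,
# e.g. indexed into Elasticsearch and explored in Kibana.
payload = json.dumps(event)
assert json.loads(payload)["position"] == 2
```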

Andreas Brückner of e-commerce search vendor Fredhopper talked about the best way to optimise search quality in a business context. His ten headings included “build a dedicated search team” – although 14% of Fredhoppers own customers have no dedicated search staff – “build a measurement framework” – how else can you see how revenue might be improved? and “start with user needs, not features”. Much to agree with in this talk from someone with long experience of the sector from a vendor viewpoint.

Johannes Peter of MediaMarktSaturn described an implementation of a ‘semantic’ search platform which attempts to understand queries such as ‘MyMobile 7 without contract’, recognising this is a combination of a product name, a Boolean operator and an attribute. He described how an ontology (perhaps showing a family of available products and their variants) can be used in combination with various rules to create a more focused query e.g. title:("MyMobile7") AND NOT (flag:contract). He also mentioned machine learning and term co-occurrence as useful methods but stressed that these experimental techniques should be treated with caution and one should ‘fail early’ if they are not producing useful results.
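A toy version of this kind of rule-based rewriting might look like the following – the product list and the single ‘without’ rule are invented for illustration, whereas a real system would consult the ontology Johannes described and handle many more patterns.

```python
import re

# Stand-in for an ontology lookup: maps normalised names to canonical titles.
KNOWN_PRODUCTS = {"mymobile 7": "MyMobile7"}

def rewrite(query):
    """Rewrite a natural query into a more focused Boolean query string."""
    q = query.lower()
    negated = None
    m = re.search(r"\bwithout (\w+)", q)
    if m:                               # 'without X' becomes a NOT clause
        negated = m.group(1)
        q = q[:m.start()].strip()
    if q in KNOWN_PRODUCTS:             # recognised product name -> title match
        rewritten = f'title:("{KNOWN_PRODUCTS[q]}")'
    else:
        rewritten = q
    if negated:
        rewritten += f" AND NOT (flag:{negated})"
    return rewritten

assert rewrite("MyMobile 7 without contract") == \
    'title:("MyMobile7") AND NOT (flag:contract)'
```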

Ashraf Aaref & Felipe Besson described their journey using Learning to Rank to improve search at GetYourGuide, a marketplace for activities (e.g. tours and holidays). Using Elasticsearch and the LtR plugin recently released by our partners Open Source Connections, they tried to improve the results for their ‘location pages’ (e.g. for Paris) but their first iteration actually gave worse results than the current system and was thus rejected by their QA process. They hope to repeat the process using what they have learned about how difficult it is to create good judgement data. This isn’t the first talk I’ve seen that honestly admits that ML approaches to improving search aren’t a magic silver bullet and the work itself is difficult and requires significant investment.

Duncan Blythe of Zalando gave what was the most forward-looking talk of the event, showing a pure Deep Learning approach to matching search queries to results – no query parsing, language analysis, ranking or anything, just a system that tries to learn what queries match which results for a product search. This reminded me of Doug & Tommaso’s talk at Buzzwords a couple of days before, using neural networks to learn the journey between query and document. Duncan did admit that this technique is computationally expensive and in no way ready for production, but it was exciting to hear about such cutting-edge (and well funded) research.

Doug Turnbull was the last speaker with a call to arms for more open source tooling, datasets and relevance judgements to be made available so we can all build better search technology. He gave a similar talk to keynote the Haystack event two months ago and you won’t be surprised to hear that I completely agree with his viewpoint – we all benefit from sharing information.

Unfortunately I had to leave MICES at this point and missed the more informal ‘bar camp’ event to follow, but I would like to thank all the hosts and organisers especially René Kriegler for such an interesting day. There seems to be a great community forming around e-commerce search which is highly encouraging – after all, this is one of the few sectors where one can draw a clear line between improving relevance and delivering more revenue.

London Lucene/Solr Meetup – Relevance tuning for Elsevier’s Datasearch & harvesting data from PDFs
http://www.flax.co.uk/blog/2018/05/03/london-lucene-solr-meetup-elseviers-datasearch-harvesting-data-from-pdfs/ – Thu, 03 May 2018

Elsevier were our kind hosts for the latest London Lucene/Solr Meetup and also provided the first speaker, Peter Cotroneo. Peter spoke about their DataSearch project, a search engine for scientific data. After describing how most other data search engines only index and rank results using metadata, Peter showed how Elsevier’s product indexes the data itself and also provides detailed previews. DataSearch uses Apache NiFi to connect to the source repositories, Amazon S3 for asset storage, Apache Spark to pre-process the data and Apache Solr for search. This is a huge project with many millions of items indexed.

Relevance is a major concern for this kind of system and Elsevier have developed many strategies for relevance tuning. Features such as highlighting and auto-suggest are used, along with lemmatisation rather than stemming (with scientific data, stemming can cause issues such as turning ‘Age’ into ‘Ag’ – the chemical symbol for silver) and a custom rescoring algorithm that can promote up to 3 data results to the top of the list if they are deemed particularly relevant. Elsevier use both search logs and test queries generated by subject matter experts to feed into a custom-built judgement tool – which they are hoping to open source at some point (this would be a great complement to Quepid for test-based relevance tuning).
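The ‘Age’/‘Ag’ collision is easy to reproduce with any crude suffix-stripping stemmer. Here’s a self-contained toy example of my own (not the analyser Elsevier actually uses) showing why lemmatisation is safer for scientific text:

```python
# Naive suffix-stripping stemmer: strips common English suffixes without any
# dictionary knowledge, so 'age' collapses onto 'ag' - which then collides
# with 'Ag', the chemical symbol for silver.

def naive_stem(token):
    token = token.lower()
    for suffix in ("ing", "es", "e", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token

assert naive_stem("Age") == "ag"   # now indistinguishable from silver...
assert naive_stem("Ag") == "ag"    # ...so queries for one match the other
```

A lemmatiser keyed on dictionary forms would leave ‘age’ intact, which is exactly the behaviour DataSearch needs.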

Peter also described a strategy for automatic optimisation of the many query parameters available in Solr using machine learning, based on ideas first proposed by Simon Hughes of dice.com. Elsevier have also developed a Phrase Service API, which improves phrase-based search over the standard unordered ‘bag of words’ model by recognising acronyms, chemical formulae, species, geolocations and more, expanding the original phrase with these terms and then boosting them using Solr’s query parameters. He also mentioned a ‘push API’ that lets data providers push data directly into DataSearch. This was a necessarily brief dive into what is obviously a highly complex and powerful search engine built by Elsevier using many cutting-edge ideas.
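The core idea behind a phrase service of this kind can be sketched very simply (the phrase list, boost weight and function names here are invented for illustration – Elsevier's actual service recognises acronyms, formulae, species names and more): detect known multi-word terms in the query and emit them as boosted exact phrases for Solr's boost-query parameter.

```python
# Invented gazetteer of known phrases, for illustration only.
KNOWN_PHRASES = {"silver nitrate", "climate change", "machine learning"}

def build_boost_query(user_query, weight=5):
    """Return a Solr-style bq (boost query) string for recognised phrases."""
    q = user_query.lower()
    boosts = [f'"{p}"^{weight}' for p in sorted(KNOWN_PHRASES) if p in q]
    return " ".join(boosts)

print(build_boost_query("effects of silver nitrate on plants"))
# "silver nitrate"^5
```

The output would be passed alongside the original query (e.g. as `bq` with the edismax parser) so that documents matching the exact phrase rank above bag-of-words matches.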

Our next speaker, Michael Hardwick of Elite Software, talked about how textual data is stored in PDF files and the implications for extracting it for search applications. In an engaging (and at times slightly horrifying) talk he showed how PDFs effectively contain instructions for ‘painting’ characters onto the page, and how certain essential text items such as spaces may not be stored at all. He demonstrated how fonts are embedded within the PDF itself, how character encodings may be deliberately incorrect to prevent copy-and-paste operations, and in general how very little (if any) semantic information is available. Using newspaper content as an example, he showed how reading order is often difficult to recover: the PDF layout combines the text from the original author with how it has been laid out on the page by an editor, so the headline may have been added after the article text, which itself may have been split up into sections.
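To make the ‘painting’ point concrete, here is a heavily simplified, uncompressed content stream of the kind found inside a PDF (real streams are usually compressed, and positioning is more involved than shown): strings are painted with `Tj` operators, and the gap between words comes from moving the text cursor, not from a stored space character – so naive extraction loses it.

```python
import re

# Simplified PDF content stream for illustration: two strings painted at
# different x positions. No space character is ever stored.
content_stream = b"""
BT
/F1 12 Tf
72 700 Td (Hello) Tj
105 700 Td (world) Tj
ET
"""

# Naively pull out every (string) Tj - the space between words is lost.
strings = re.findall(rb"\((.*?)\)\s*Tj", content_stream)
text = b"".join(strings).decode("ascii")
print(text)  # "Helloworld"
```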

Tables in PDFs are a particular issue when attempting to extract numerical data for re-use – the data may not be stored in the same order as it appears on the page, for example when only part of a table is updated for each issue of a weekly publication. With PDF files sometimes compressed and encrypted, the task of data extraction can become even more difficult. Michael laid out the choices available to those wanting to extract data: optical character recognition, a potentially very expensive Adobe API (which only gives the same quality of output as copy-and-paste), custom code such as that developed by his company, and finally manual retyping – the latter being surprisingly common.

Thanks to both our speakers and our hosts Elsevier – we’re planning another Meetup soon, hopefully in mid to late June.

Haystack, the search relevance conference – day 2
http://www.flax.co.uk/blog/2018/04/23/haystack-the-search-relevance-conference-day-2/
Mon, 23 Apr 2018 15:23:56 +0000

Two weeks ago I attended the Haystack relevance conference – I’ve already written about my overall impressions and about the first day’s talks, but the following are some more notes on the conference sessions. Note that some of the presentations I attended have already been covered in detail by Sujit Pal’s excellent blog. Some of the presentations I haven’t linked to directly have now appeared on the conference website.

The second day of the event started for me with the enjoyable job of hosting a ‘fishbowl’ style panel session titled “No, You Don’t Want to Do It Like That! Stories from the search trenches”. The idea was that a rotating panel of speakers would tell us tales of their worst and hopefully most instructive search tuning experiences, and we heard some great stories – this was by its nature an informal session and I don’t think anyone kept any notes (probably a good idea, given the commercial sensitivities involved!).

The next talk was my favourite of the conference, given by René Kriegler on relevance scoring using product data and image recognition. René is an expert on e-commerce search (he also runs the MICES event in Berlin, which I’m looking forward to) and described how this domain is unlike many others, with the interests of the consumer (e.g. price or availability) becoming part of the relevance criteria. One of the interesting questions for e-commerce applications is how ranking can affect profit. Standard TF/IDF models don’t always work well for e-commerce data with short fields, leading to a score that can be almost binary: as he said, ‘a laptop can’t be more laptop-ish than another’. Image recognition is a potentially useful technique and he demonstrated a way to take the output of Google’s Inception machine learning model and use it to enrich documents within a search index. However, this model can output vectors with over 1,000 dimensions, and he described how a technique called random projection trees can be used to partition the vector space and thus produce simpler data for adding to the index (I think this is basically like slicing up a fruitcake and recording whether a currant was one side of the knife or the other, but that may not be quite how it works!). René has built a Solr plugin to implement this technique.
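The ‘which side of the knife’ idea can be sketched in a few lines (this is my own toy illustration of random hyperplane projections, not René's plugin, and real random projection trees are more sophisticated): project each high-dimensional vector onto a few random hyperplanes and keep only the sign bits, giving a compact code to index instead of thousands of floats.

```python
import random

random.seed(42)
DIM, N_PLANES = 8, 4  # tiny sizes for illustration; Inception outputs far more dims

# Each hyperplane is a random normal vector through the origin.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PLANES)]

def signature(vector):
    """One bit per hyperplane: which side of the plane does the vector fall?"""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        for plane in planes
    )

vec = [0.5, -1.2, 0.3, 0.9, -0.4, 0.1, -0.7, 0.2]
print(signature(vec))  # a 4-bit code that can be stored as keywords in the index
```

Vectors falling on the same side of every plane get the same code, so similar images tend to share index terms – which is what makes the trick useful for search.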

Next I went to Matt Overstreet’s talk on Vespa, a recently open sourced search and Big Data engine from Oath (the Verizon division that includes Yahoo). Matt described how Vespa could be used to build highly scalable personalised recommendation, search or realtime data display applications, and took us through how Vespa is configured through a series of APIs and XML files. Interestingly (and perhaps unsurprisingly) Vespa has very little support for languages other than English at present. Queries are carried out through its own SQL-like language, YQL, and grouping and data aggregation functions are available. He also described how Vespa can use multidimensional arrays of values – tensors, for example from a neural network. Matt recommended we all try out Vespa – but on a cloud service, not a low-powered laptop!
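To give a flavour of the SQL-like YQL language, a query of the following shape searches and aggregates in one request (sketched from Vespa's documented syntax, so treat the details as indicative rather than authoritative):

```sql
select * from sources * where default contains "relevance" |
    all(group(category) each(output(count())))
```

The part after the pipe is Vespa's grouping language: here it buckets the matching documents by a `category` field (an assumed field name) and returns a count per bucket.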

Ryan Pedala was up next to talk about named entity recognition (NER) and how it can be used to annotate or label data. He showed his experiments with tools including Prodigy and a custom GUI he had built, compared various NER libraries such as Stanford NLP and OpenNLP, and referenced an interesting paper on NER for travel-related queries. I didn’t learn a whole lot of new information from this talk but it may have been useful to those who haven’t considered using NER before.

Scott Stultz talked next on how to integrate business rules into a search application. He started with examples of key performance indicators (KPIs) that can be used for search – e.g. conversion ratios or average purchase values – and how these should be tied to search metrics. They can then be measured both before and after changes are made to the search application: automated unit tests and more complex integration tests should also be used to check that search performance is actually improving. Interestingly for me, he included under the umbrella of integration tests techniques such as replaying recent queries extracted from the logs against the search engine. He made some good practical points, such as ‘think twice before adding complexity’, and noted that good autocomplete will often ‘cannibalize’ existing search as users simply choose the suggested completion rather than finishing typing the entire query. There were some great tips here for practical business-focused search improvements.
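The log-replay style of integration test might look like this minimal sketch (the stub engine, the queries and the KPI threshold are all invented for illustration, not taken from the talk): replay real user queries and fail the build if a business metric regresses.

```python
def search_engine(query):
    """Stub standing in for a real call to the search engine."""
    index = {"red shoes": ["sku-101", "sku-205"], "blue coat": ["sku-330"]}
    return index.get(query, [])

# In practice these would be extracted from yesterday's query logs.
recent_queries = ["red shoes", "blue coat"]

def zero_results_rate(queries):
    """KPI: fraction of real user queries that return nothing."""
    misses = sum(1 for q in queries if not search_engine(q))
    return misses / len(queries)

# Fail the build if the rate regresses past an agreed threshold.
assert zero_results_rate(recent_queries) == 0.0
```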

I then went to hear John Kane’s talk about interleaving for relevance tuning, which covered a method for updating a machine learning model in real time using feedback on the ranking it currently produces – simply by interleaving the results from two versions of the model. This isn’t a particularly new technique and the talk was something of a product pitch for 904Labs, but the technique does apparently work, with some customers seeing a 30% increase in conversion rate.
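As an illustration of the general idea, here is a sketch of team-draft interleaving, one common variant of the technique (my choice for the example, not necessarily 904Labs’ method): the two rankings take turns picking their best remaining result, and a click on the merged list is credited to whichever model contributed that document.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random.Random(0)):
    """Merge two rankings; team[doc] records which model contributed doc.

    A fixed seed is used so the example is reproducible.
    """
    merged, team, seen = [], {}, set()
    total = len(set(ranking_a) | set(ranking_b))
    while len(seen) < total:
        # Randomise which team picks first each round, then let each team
        # take its highest-ranked result not already in the merged list.
        for name, ranking in rng.sample([("A", ranking_a), ("B", ranking_b)], 2):
            doc = next((d for d in ranking if d not in seen), None)
            if doc is not None:
                merged.append(doc)
                team[doc] = name
                seen.add(doc)
    return merged, team

merged, team = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
# Credit a click on merged[i] to team[merged[i]]; compare A's and B's click totals.
```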

The last talk of the day came from Tim Allison on an evaluation platform for Apache Tika, a well-known library for text extraction from a variety of file formats. Interspersed with tales of ‘amusing’ and sometimes catastrophic ways for text extraction to fail, Tim described how tika-eval can be used to test how good Tika is at extracting data and to output a set of metrics, e.g. how many different MIME file types were found. The tool is now used to run regular regression tests for Tika on a dataset of 3 million files from the CommonCrawl project. We’re regular users of Tika at Flax and it was great to hear how the project is moving forward.

Doug Turnbull finished the conference with a brief summing up and thanks. There was a general feeling in the room that this conference was the start of something big, and people were already asking when the next event would be! One of my takeaways was that even though many of the talks used open source tools (perhaps unsurprisingly, as it is so much easier to talk about these publicly), the relevance tuning techniques and methods described can be applied to any search engine. The attendees were from a huge variety of companies, large and small, open and closed source based. This was an event about relevance engineering, not technology choices.

Thanks to all at OSC who made the event possible and for inviting us all to your home town – I think most if not all of us would happily visit again.

Haystack, the search relevance conference – day 1
http://www.flax.co.uk/blog/2018/04/18/haystack-the-search-relevance-conference-day-1/
Wed, 18 Apr 2018 12:53:41 +0000

Last week I attended the Haystack relevance conference – I’ve already written about my overall impressions but the following are some more notes on the conference sessions. Note that some of the presentations I attended have already been covered in detail by Sujit Pal’s excellent blog. Those presentations I haven’t linked to directly should appear soon on the conference website.

Doug Turnbull of Open Source Connections gave the keynote presentation, which led with the idea that we need more open source tools and methods for tuning relevance, including tools to gather search analytics. He noted how the Learning to Rank plugins recently developed for both Solr and Elasticsearch have commoditised capabilities previously only described in academia, and how we also need to build a cohesive community around search relevance. As it turned out, this conference did in my view signal the birth of that community.

Next up was Peter Fries, who talked about a business-friendly approach to search quality, a subject close to my heart as I regularly have to discuss relevance tuning with non-technical staff. Peter described how search quality is often presented to business teams as mysterious and ‘not for them’ – without convincing these people of the value of search tuning we will fail to take account of business-related factors (and we’re also unlikely to get full buy-in for a relevance tuning project). He went on to say that it is important to include the marketing and management mindsets in this process, and described a method for search tuning involving feedback loops and an ‘iron triangle’ of measurement, data and optimisation. This was a very useful talk.

I then went to hear Chao Han of Lucidworks demonstrate how their product Fusion App Studio allows one to capture various signals and use these for ‘head and tail analysis’ – looking not just at the ‘head’ of popular, often-clicked results but those in the ‘tail’ that attract few clicks, possibly due to problems such as mis-spellings. Interestingly this approach allows automatic tail query rewriting – an example might be spotting a colour word such as ‘red’ in the query and rewriting this into a field query of colour:red. This was a popular talk although the presenter was a little mysterious about the exact methodology used, perhaps unsurprisingly as Fusion is a commercial product.
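Without knowing Fusion's internals, the general shape of head/tail analysis with a colour-rewrite rule can be sketched as follows (the query log, frequency threshold and colour list are all invented for illustration): split queries by frequency, then rewrite recognised colour words in tail queries into field filters.

```python
from collections import Counter

# Invented query log: two popular 'head' queries plus some rare 'tail' ones.
query_log = ["red dress"] * 50 + ["shoes"] * 40 + ["crimsn jacket", "red skort"]

counts = Counter(query_log)
HEAD_THRESHOLD = 10
head = {q for q, c in counts.items() if c >= HEAD_THRESHOLD}
tail = {q for q, c in counts.items() if c < HEAD_THRESHOLD}

COLOURS = {"red", "blue", "green"}

def rewrite_tail_query(query):
    """Pull a recognised colour word out into a field filter."""
    terms = query.split()
    colours = [t for t in terms if t in COLOURS]
    rest = [t for t in terms if t not in COLOURS]
    if not colours:
        return query
    filters = "".join(f" AND colour:{c}" for c in colours)
    return " ".join(rest) + filters

print(sorted(tail))                     # the rarely-seen queries worth inspecting
print(rewrite_tail_query("red skort"))  # skort AND colour:red
```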

After a tasty Mexican-themed lunch I took a short break for some meetings, so missed the next set of talks. I then went to Elizabeth Haubert’s talk on click analytics. She began with a description of the venerable TREC conference (now in its 27th year!) and its approach to relevance judgements, and how these methods might be applied to real-world situations. For example, the TREC evaluations have shown that how relevance tests are assessed is as important as the tests themselves – the assessors are effectively also users of the system under test. She recommended calibrating both the rankings to a tester and the tester to the rankings, and creating a story around each test to put it in context and to help with disambiguation.
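One concrete way to check how well calibrated assessors are (my own illustration, not a method the speaker prescribed) is to measure agreement between two assessors judging the same results; Cohen's kappa is a standard statistic that corrects raw agreement for chance.

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two assessors' label lists."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Expected agreement if both assessors labelled at random with the
    # same per-category frequencies they actually used.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Two assessors judging the same ten results as relevant (1) or not (0):
a = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(round(cohens_kappa(a, b), 2))  # 0.58 - moderate agreement, worth a calibration session
```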

We finished the day with some lightning talks; sadly I didn’t take notes on these, but check out Sujit’s aforementioned blog for more information. I do remember Tom Burgmans’ visualisation tool for Solr’s Explain debug feature, which I’m very much looking forward to seeing as open source. The evening continued with a conference dinner nearby and some excellent local craft beer.

I’ll be covering the second day next.