relevance engineering – Flax http://www.flax.co.uk The Open Source Search Specialists Thu, 10 Oct 2019 09:03:26 +0000 en-GB hourly 1 https://wordpress.org/?v=4.9.8 Little Mermaids, Haystacks and moving on http://www.flax.co.uk/blog/2019/02/15/little-mermaids-haystacks-and-moving-on/ http://www.flax.co.uk/blog/2019/02/15/little-mermaids-haystacks-and-moving-on/#respond Fri, 15 Feb 2019 09:47:25 +0000 http://www.flax.co.uk/?p=4033 As I announced recently Flax is joining OpenSource Connections, and I recently spent a very pleasant week in Virginia with my new colleagues discussing our plans for the year to come. Without giving too much away I can say that … More

The post Little Mermaids, Haystacks and moving on appeared first on Flax.

As I announced recently Flax is joining OpenSource Connections, and I recently spent a very pleasant week in Virginia with my new colleagues discussing our plans for the year to come. Without giving too much away I can say that this is a very exciting time to be joining OSC: one thing I will be doing soon is starting to write more about OSC’s proven process for supporting our clients as they move up the search relevance curve.

However, before then I’ll be speaking at a few events. At the end of this month I’ll be in Copenhagen to speak on Keeping Search Relevant in a Digital Workplace at the Intrateam conference. This is a fantastic conference on intranets and I’m looking forward to speaking for the second time and joining a very august gathering of speakers. I’m also glad to be returning to both City University and the University of Essex during February and March to talk to students about working in search and information retrieval.

In April I’ll be returning to the US for OSC’s Haystack search relevance conference, which was my favourite event of last year – I liked it so much I brought it to London that October. This year we have a fantastic lineup of talks from speakers representing organisations including LexisNexis, Wikimedia Foundation, Eventbrite and Yelp, a new and more capacious venue in downtown Charlottesville, three training options before the main conference (Think Like A Relevance Engineer for Elasticsearch and Solr, and Learning to Rank) and of course the chance to meet, chat with and get to know some of the best search people in the business. Earlybird tickets are available until the end of February and are already selling well, so make your plans to join us soon!

It’s already shaping up to be a busy year – so do keep an eye on this blog and my new home at www.opensourceconnections.com/blog for further news, and if you’d like to know how OSC can help you empower your search team, get in touch.

http://www.flax.co.uk/blog/2019/02/15/little-mermaids-haystacks-and-moving-on/feed/ 0
Flax joins OpenSource Connections http://www.flax.co.uk/blog/2018/12/21/flax-joins-opensource-connections/ http://www.flax.co.uk/blog/2018/12/21/flax-joins-opensource-connections/#respond Fri, 21 Dec 2018 12:09:24 +0000 http://www.flax.co.uk/?p=4017 We have some news! From February 1st 2019 Flax’s Managing Director Charlie Hull will be joining OpenSource Connections (OSC), Flax’s long-standing US partner, as a senior Managing Consultant. Charlie will manage a new UK division of OSC who will also … More

The post Flax joins OpenSource Connections appeared first on Flax.

We have some news!

From February 1st 2019 Flax’s Managing Director Charlie Hull will be joining OpenSource Connections (OSC), Flax’s long-standing US partner, as a senior Managing Consultant. Charlie will manage a new UK division of OSC, which will also acquire some of Flax’s assets and brands. OSC are a highly regarded organisation in the world of search and relevance: they wrote the seminal book Relevant Search and run the popular Haystack relevance conference. Their clients include the US Patent Office, the Wikimedia Foundation and Under Armour, and their services include comprehensive training, Discovery engagements, Trusted Advisor consulting and expert implementation.

Lemur Consulting Ltd., which as most of you will know trades as Flax, will continue to operate and to complete current projects but will not be taking on any new business after January 2019. We will be forwarding all future Flax enquiries about new business to OSC, where Charlie will, as ever, be very happy to discuss requirements and how OSC’s expert team (which may include some familiar faces!) might help.

We are all very excited about this new development as it will create a larger team of independent search & relevance experts with a global reach. We fully expect to build on Flax’s 17-year history of providing high quality search solutions as part of OSC. We intend to continue managing the London Lucene/Solr Meetup and running, attending and speaking at other events on search-related topics.

If you have any questions about the above please do contact us. Merry Christmas and best wishes for the New Year!

http://www.flax.co.uk/blog/2018/12/21/flax-joins-opensource-connections/feed/ 0
More needles, more Haystacks, more relevance! http://www.flax.co.uk/blog/2018/12/05/more-needles-more-haystacks-more-relevance/ http://www.flax.co.uk/blog/2018/12/05/more-needles-more-haystacks-more-relevance/#respond Wed, 05 Dec 2018 11:28:31 +0000 http://www.flax.co.uk/?p=4009 Those of us who have been working in the search sector for a while know that search tuning isn’t just a matter of installing the default configuration, pointing the engine at some content and starting it up – in fact, … More

The post More needles, more Haystacks, more relevance! appeared first on Flax.

Those of us who have been working in the search sector for a while know that search tuning isn’t just a matter of installing the default configuration, pointing the engine at some content and starting it up – in fact, if you do just that you’ll probably end up with a search user experience that’s even worse than whatever you’re replacing and certainly a lot worse than your competitors’ solution. It’s also no longer about just knowing how one engine behaves and the magic tweaks to improve it – you need to understand the fundamentals of search and how a range of different products and projects implement them. You also need to understand user requirements and their often entirely subjective views of what is a ‘good’ and ‘bad’ search result, plus how different types of businesses can use search technology for site search, enterprise search, media monitoring, process improvement and a myriad of other uses.

Over the last year or so we’ve seen the emergence of a new profession dedicated to improving how search systems present information to users – Relevance Engineering. Importantly this covers not just the technical aspects of search, but the business aspects – understanding the why as much as the how. Relevance engineers understand that search tuning is a multifaceted problem and there are no magic bullets (or magic AI robots) that will do all the work for you. I’ve started to write about relevance engineering recently to try and define what it means.

One of my favourite events last year was the first Haystack conference run by our partners Open Source Connections, which brought together both experienced relevance engineers and those new to the profession. It was friendly, informal, focused and informative. In fact, I enjoyed it so much that by the second day I was already thinking about how to bring the event to Europe – which we did successfully in October.

I’m very happy to say that Haystack is back in April 2019 and the Call for Papers is open until January 9th. If you’ve got an exciting relevance project or idea to talk about please do submit it. See you there!

http://www.flax.co.uk/blog/2018/12/05/more-needles-more-haystacks-more-relevance/feed/ 0
Defining relevance engineering part 4: tools http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/ http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/#comments Thu, 15 Nov 2018 14:30:51 +0000 http://www.flax.co.uk/?p=4000 Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this … More

The post Defining relevance engineering part 4: tools appeared first on Flax.

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

In my previous installment of this guide I promised to write next about how to deliver the results of a relevance assessment, but I’ve since decided that this blog should instead cover the tools a relevance engineer can use to measure and tune search performance. Of course, some of these might be used to show results to a client as well, so it’s not an entirely different direction!

It’s also important to note that this is a rapidly evolving field and therefore cannot be a definitive list – and I welcome comments with further suggestions.

1. Gathering judgements

There are various ways to measure relevance, and one is to gather judgement data – either explicit (literally asking users to manually rate how relevant a result is) or implicit (using click data as a proxy, assuming that clicking on a result means it is relevant – which isn’t always true, unfortunately). One can build a user interface that lets users rate results (e.g. from Agnes Van Belle’s talk at Haystack Europe, see page 7) which may be available to everyone or just a select group, or one can use a specialised tool like Quepid that provides an alternative UI on top of your search engine. Even Excel or another spreadsheet can be used to record judgements (although this can become unwieldy at scale). For implicit ratings, there are JavaScript libraries such as SearchHub’s search-collector or more complete analytics platforms such as Snowplow which will let you record the events happening on your search pages.
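As a rough illustration of the two kinds of judgement data described above, here is a minimal Python sketch. The rating scale (0 to 3), the tuple layouts and the function names are all assumptions made for this example; they are not the formats used by Quepid, search-collector or Snowplow.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical explicit ratings: (query, document id, rating on an assumed 0-3 scale)
explicit_ratings = [
    ("red shoes", "doc1", 3),
    ("red shoes", "doc1", 2),
    ("red shoes", "doc2", 0),
]

# Hypothetical search events: (query, document id, was the result clicked?)
search_events = [
    ("red shoes", "doc1", True),
    ("red shoes", "doc1", True),
    ("red shoes", "doc2", False),
    ("red shoes", "doc2", True),
]


def average_explicit(ratings):
    """Average the ratings different judges gave to each (query, doc) pair."""
    grouped = defaultdict(list)
    for query, doc_id, rating in ratings:
        grouped[(query, doc_id)].append(rating)
    return {pair: mean(values) for pair, values in grouped.items()}


def click_through_rate(events):
    """Fraction of impressions that were clicked, per (query, doc) pair."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, doc_id, was_clicked in events:
        shown[(query, doc_id)] += 1
        if was_clicked:
            clicked[(query, doc_id)] += 1
    return {pair: clicked[pair] / shown[pair] for pair in shown}


explicit = average_explicit(explicit_ratings)
implicit = click_through_rate(search_events)
```

Click-through rate is only a crude proxy: it ignores position bias (results near the top get more clicks regardless of relevance), which is one reason a click doesn’t always mean a result was relevant.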

2. Understanding the query landscape

To find out what users are actually searching for and how successful their search journeys are, you will need to look at the log files of the search engine and the hosting platform it runs within. Open source engines such as Solr can provide detailed logs of every query, which will need to be processed into an overall picture. Google Analytics will tell you which Google queries brought users to your site. Some sophisticated analytics & query dashboards are also available – Luigi’s Box is a particularly powerful example for site search. Even a spreadsheet can be useful to graph the distribution of queries by volume, so you can see both the popular queries and those rare queries in the ‘long tail’. On Elasticsearch it’s even possible to submit this log data back into a search index and to display it using a Kibana visualisation.
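The query distribution mentioned above can also be computed with a few lines of code once the query strings have been extracted from the logs. The sketch below assumes that extraction has already happened and that you simply have a list of raw query strings; the function names and the 50% traffic threshold for the ‘head’ are arbitrary choices for illustration.

```python
from collections import Counter


def query_distribution(queries):
    """Count normalised (trimmed, lower-cased) queries, most frequent first."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return counts.most_common()


def split_head_and_tail(distribution, head_share=0.5):
    """Split queries into a 'head' covering roughly head_share of all traffic
    and a 'long tail' containing everything else."""
    total = sum(count for _, count in distribution)
    head, tail, running = [], [], 0
    for query, count in distribution:
        if running < head_share * total:
            head.append(query)
            running += count
        else:
            tail.append(query)
    return head, tail


# Hypothetical queries, as they might be pulled from a log file
logged = ["shoes", "Shoes", "shoes ", "boots", "hat", "scarf"]
distribution = query_distribution(logged)
head, tail = split_head_and_tail(distribution)
```

In a real log the tail is usually far longer than the head, so it’s worth sampling from both when choosing queries to judge.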

3. Measurement and metrics

Once you have your data it’s usually necessary to calculate some metrics – overall measurements of how ‘good’ or ‘bad’ relevance is. There’s a long list of metrics commonly used by the Information Retrieval community such as NDCG, which show the usefulness, or gain, of a search result based on its position in a list. Tools such as Rated Ranking Evaluator (RRE) can calculate these metrics from supplied judgement lists (RRE can also run a whole test environment, spinning up Solr or Elasticsearch, performing a list of queries and recording and displaying the results).
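To make the idea of position-based gain concrete, here is a small Python sketch of NDCG (normalised discounted cumulative gain). It uses the simple ‘linear gain’ formulation, where each judgement is divided by the log of its position; other variants (for example using 2^rating − 1 as the gain) are also common, and this is not a description of how RRE or any other tool calculates it. The ideal ordering here is built only from the judged results in the list supplied.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at 1-based position p
    is divided by log2(p + 1), so lower positions count for less."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(gains, k=None):
    """DCG of the list as ranked, divided by the DCG of the same
    judgements sorted into the ideal (best-first) order."""
    ranked = gains[:k] if k else gains
    ideal = sorted(gains, reverse=True)
    ideal = ideal[:k] if k else ideal
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Judgements (assumed 0-3 scale) for the results of one query, in ranked order
perfect = ndcg([3, 2, 1, 0])   # already in the ideal order
swapped = ndcg([0, 3])         # the only relevant result is in second place
```

A list already in the ideal order scores 1.0, and pushing a relevant result down the list lowers the score, which is exactly the ‘usefulness based on position’ idea.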

4. Tuning the engine

Next you’ll need a way to adjust the configuration of the engine and/or figure out just why particular results are appearing (or not). These tools are usually specific to the search engine being used: Quepid, for example, works with Solr and Elasticsearch and allows you to change query parameters and observe the effect on relevance scores; with RRE you can control the whole configuration of the Solr or Elasticsearch engine that it can then spin up for you. Commercial search engines will have their own tools for adjusting configuration or you may have to work within an overall content management (e.g. Drupal) or e-commerce system (e.g. Hybris). Some of these latter systems may only give you limited control of the search engine, but could also let you adjust how content is processed and ingested or how synonyms are generated.

For Solr, tools such as the Google Chrome extension Solr Query Debugger can be used and the Solr Admin UI itself allows full control of Solr’s configuration. Solr’s debug query shows hugely detailed information as to why a query returned a result, but tools such as Splainer and Solr Explain are useful to make sense of this.
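For reference, Solr’s debug output is switched on with the `debugQuery=true` request parameter. The sketch below only builds such a request URL; the host, port and the `products` collection name are placeholders I’ve assumed for illustration, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def solr_debug_url(base_url, collection, query, rows=10):
    """Build a Solr /select URL that asks for scoring explanations.

    base_url and collection are placeholders for your own installation.
    """
    params = {
        "q": query,
        "rows": rows,
        "fl": "id,score",      # return just the id and relevance score
        "wt": "json",
        "debugQuery": "true",  # ask Solr to include its debug section
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Assumed local Solr instance with a hypothetical 'products' collection
url = solr_debug_url("http://localhost:8983/solr", "products", "red shoes")
```

Fetching this URL from a running Solr should return a response with a debug section containing a per-document scoring explanation, which is the raw material that tools such as Splainer present in a more readable form.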

For Elasticsearch, the Kopf plugin was a useful tool, but has now been replaced by Cerebro. Elastic, the commercial company behind Elasticsearch, offer their own tool Marvel on a 30-day free trial, after which you’ll need an Elastic subscription to use it. Marvel is built on the open source Kibana, which also includes various developer tools.

If you need to dig (much) deeper into the Lucene indexes underneath Solr and Elasticsearch, the Lucene Index Toolbox (Luke) is available, or Flax’s own Marple index inspector.

 

As I said at the beginning this is by no means a definitive list – what are your favourite relevance tuning tools? Let me know in the comments!

In the next post I’ll cover how a relevance engineer can develop more powerful and ‘intelligent’ ways to tune search. In the meantime you can read the free Search Insights 2018 report by the Search Network. Of course, feel free to contact us if you need help with relevance engineering.

http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/feed/ 2
Haystack Europe 2018, a brief retrospective http://www.flax.co.uk/blog/2018/10/15/haystack-europe-2018-a-brief-retrospective/ http://www.flax.co.uk/blog/2018/10/15/haystack-europe-2018-a-brief-retrospective/#comments Mon, 15 Oct 2018 15:15:49 +0000 http://www.flax.co.uk/?p=3914 It’s been a couple of weeks now since the first Haystack search relevance conference in Europe, which we ran with our partners Open Source Connections (OSC). Just under a hundred people came to the Friends’ House in Euston for a … More

The post Haystack Europe 2018, a brief retrospective appeared first on Flax.

It’s been a couple of weeks now since the first Haystack search relevance conference in Europe, which we ran with our partners Open Source Connections (OSC). Just under a hundred people came to the Friends’ House in Euston for a day of talks covering both the business and technical aspects of relevance engineering. Doug Turnbull of OSC started the day by introducing what would be a major theme of the conference, Learning to Rank, and how Bloomberg had used and benefited from open sourcing their LTR plugin for Solr. Karen Renshaw of Zoro (a division of Grainger Global Online) talked about how to tune relevance from a business perspective. Sebastian Russ of Tudock showed how even something as simple as an Excel spreadsheet can be a useful visualisation tool for relevance, while Alessandro Benedetti and Andrea Gazzarini of Sease demonstrated Rated Ranking Evaluator, a complete platform for relevance measurement. After lunch, Torsten Köster & Fabian Klenk of Shopping 24 and consultant René Kriegler described their journey with LTR for an ecommerce site and Agnes Van Belle of Textkernel showed how similar techniques can be applied to recruitment search. Tony Russell-Rose was our last speaker on strategies and tools for managing complex Boolean queries.

My only regret was how little time I had personally to catch up with the attendees, many of whom were from Flax clients past and present – I must have had 20 or 30 very brief chats during the day! Luckily a few of us went on for a drink afterwards and eventually a curry nearby. It was a very long day but, from the feedback we’ve received so far, a very successful one. We hope to make this a regular event on the calendar.

Thanks to all who made the event possible, our speakers and everyone who came – the slides are now available on the event website.

http://www.flax.co.uk/blog/2018/10/15/haystack-europe-2018-a-brief-retrospective/feed/ 2
Defining relevance engineering part 3: technical assessment http://www.flax.co.uk/blog/2018/07/11/defining-relevance-engineering-part-3-technical-assessment/ http://www.flax.co.uk/blog/2018/07/11/defining-relevance-engineering-part-3-technical-assessment/#respond Wed, 11 Jul 2018 09:49:11 +0000 http://www.flax.co.uk/?p=3873 Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this … More

The post Defining relevance engineering part 3: technical assessment appeared first on Flax.

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do?

In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

When Flax is working with clients on relevance tuning engagements we aim to gain an overview of the various technologies the client uses and how they are obtained, deployed, managed and maintained. This will include not just the search engine but the various systems that supply data to it, host it, monitor it and interface to it to pass results to users. In addition we must understand who is responsible for the various areas, be it in-house staff, consultants, outsourcing or third party suppliers.

We try to answer the following questions in detail, including who supplies, modifies, maintains and supports the various systems concerned, what versions are used and where and how they are hosted and configured. We hope for full access to inspect the systems but this is not always possible – at the least, we need copies of configuration files and settings.

  • What systems supply the source data for search?
  • What is the current search technology?
  • Is the search engine part of another system (such as a content management system or product information system)?
  • What interface is there between the systems that supply source data and the search engine?
  • What systems monitor and manage the search engine?
  • What systems are used to submit queries to the search engine?
  • What query logging is performed and at what level?
  • How are development, test, staging and production systems arranged and what access is available to these?
  • What are the processes used to deploy new software and configuration?
  • What testing is performed?

It’s common to find flaws in the overall technical landscape – as an example, we’ll often find that there is no effective source control of search engine configuration files, with these having been originally derived from an example setup not intended for production use and since modified ad-hoc as issues arose. In this case it’s quite common that no-one knows why a particular setting has been used!

Without a good overall idea of the technology landscape it will be hard if not impossible to improve relevance. External processes (such as how hard it is to obtain a recent and complete log file from a production system) will also impact how effective these improvements will be.

Finally, as search is often owned by the IT department (and by the time we arrive, search is usually viewed as ‘broken’) we sometimes find a ‘bunker mentality’ – those responsible for the implementation are hunkered down and used to being harried and complained at by others who are unhappy with how search is (not) working. It’s important to communicate that only by being open and honest about the current situation can we all work together to improve things and build better search.

In the next post I’ll cover the tools a relevance engineer can use. In the meantime you can read the free Search Insights 2018 report by the Search Network. Of course, feel free to contact us if you need help with relevance engineering.

http://www.flax.co.uk/blog/2018/07/11/defining-relevance-engineering-part-3-technical-assessment/feed/ 0
Defining relevance engineering part 2: learning the business http://www.flax.co.uk/blog/2018/06/26/defining-relevance-engineering-part-2-learning-the-business/ http://www.flax.co.uk/blog/2018/06/26/defining-relevance-engineering-part-2-learning-the-business/#respond Tue, 26 Jun 2018 11:16:57 +0000 http://www.flax.co.uk/?p=3845 Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this … More

The post Defining relevance engineering part 2: learning the business appeared first on Flax.

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do?

In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

Before a relevance engineer can install or configure a search engine they need to understand the business concerned. I’ve called this ‘learning the business’ and it’s something that Flax has to do on a weekly basis. One week we may be talking to a recruitment business that thinks and operates in terms of jobs, skills, candidates and roles; the next week it could be a company that sells specialised products and is more concerned with features, prices, availability, stock levels and pack sizes. Even within a single sector, each business will work in a slightly different way, although there will be some common factors.

Example data is key to learning how a business works, but is next to useless without someone to explain it in context. In some cases the business has lost some of the internal knowledge about how their own systems work: “Jeff built that database, but he left two years ago.” What seems obvious to them may not be obvious to anyone else. Generic terms, e.g. “products”, “location”, “keywords”, can mean completely different things in each business context. If they exist, corporate glossaries, dictionaries or taxonomies are very useful, but again they may need annotating to explain what each entry means. If a glossary doesn’t exist, it’s a good first step to start one.

Finding the right people to talk to is also vital. Although relevance engineers are usually engaged or recruited by the IT department, this may not be the best place to learn about the business. The marketing department may have the best view of how the business interacts with its clients; the CEO or Managing Director will know the overall direction and objectives but may not have time for the detail; the content creators (which could be librarians, web editors or product information managers) will know about the items the search engine will need to find.

In many companies there are hierarchies and structures that sometimes actively prevent the sharing of information: it’s common to discover who blames who for past bad decisions and to be used as a sounding board by those with axes to grind. At Flax we try to make sure we talk to people at all levels in the client organisation: sometimes the most junior employees – and especially those who are customer-facing – have the most useful information as they have to deal with problems on a day-to-day basis. As external consultants one of our most useful skills is being able to listen without making sudden judgements or assumptions.

The end result of these many conversations is an understanding of where source data is created, gathered and stored; what a ‘search result’ is in the context of a particular business (a product on sale? A contract? A CV or resumé?) and how it might be constructed from this data; what a ‘relevant’ result is in this context (a more valuable product to sell? The most recent contract version? The best candidate for a job?) and how good/bad/nonexistent the current search solution is. This is vital information to be gathered before one even begins thinking about how to install, develop and/or configure and test a search solution.

In the next post I’ll cover how a relevance engineer might assess the technical capability of a business with respect to search. In the meantime you can read the free Search Insights 2018 report by the Search Network. Of course, feel free to contact us if you need help with relevance engineering.

http://www.flax.co.uk/blog/2018/06/26/defining-relevance-engineering-part-2-learning-the-business/feed/ 0
Defining relevance engineering, part 1: the background http://www.flax.co.uk/blog/2018/06/25/defining-relevance-engineering-part-1-the-background/ http://www.flax.co.uk/blog/2018/06/25/defining-relevance-engineering-part-1-the-background/#respond Mon, 25 Jun 2018 10:40:12 +0000 http://www.flax.co.uk/?p=3838 Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this … More

The post Defining relevance engineering, part 1: the background appeared first on Flax.

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

Let’s start by turning the clock back a few years. Ten or fifteen years ago search engines were usually closed source, mysterious black boxes, costing five or six-figure sums for even relatively modest installations (let’s say a couple of million documents – small by today’s standards). Huge amounts of custom code were necessary to integrate them with other systems and projects would take many months to demonstrate even basic search functionality. The trick was to get search working at all, even if the eventual results weren’t very relevant. Sadly even this was sometimes difficult to achieve.

Nowadays, search technology has become highly commoditized and many developers can build a functioning index of several million documents in a couple of days with off-the-shelf, open source, freely available software. Even the commercial search firms are using open source cores – after all, what’s the point of developing them from scratch? Relevance is often ‘good enough’ out of the box for non business-critical applications.

A relevance engineer is required when things get a little more complicated and/or when good search is absolutely critical to your business. If you’re trading online, search can be a major driver of revenue and getting it wrong could cost you millions. If you’re worried about complying with the GDPR, MiFID or other regulations then ‘good enough’ simply isn’t if you want to prevent legal issues. If you’re serious about saving the time and money your employees waste looking for information or improving your business’ ability to thrive in a changing world then you need to do search right.

So what search engine should you choose before you find a relevance engineer to help with it? I’m going to go out on a limb here and say it doesn’t actually matter that much. At Flax we’re proponents of open source engines such as Apache Lucene/Solr and Elasticsearch (which have much to recommend them) but the plain fact is that most search engines are the same under the hood. They all use the same basic principles of information retrieval; they all build indexes of some kind; they all have to analyze the source data and user queries in much the same way (ignore ‘cognitive search’ and other ‘AI’ buzzwords for now, most of this is marketing rather than actual substance). If you’re using Microsoft Sharepoint across your business we’re not going to waste your time trying to convince you to move wholesale to a Linux-based open source alternative.

Any modern search engine should allow you the flexibility to adjust how data is ingested, how it is indexed, how queries are processed and how ranking is done. These are the technical tools that the relevance engineer can use to improve search quality. However, relevance engineering is never simply a technical task – in fact, without a business justification, adjusting these levers may make things worse rather than better.

In the next post I’ll cover how a relevance engineer can engage with a business to discover the why of relevance tuning. In the meantime you can read Doug Turnbull’s chapter in the free Search Insights 2018 report by the Search Network (the rest of the report is also very useful) and you might also be interested in the ‘Think like a relevance engineer’ training he is running soon in the USA. Of course, feel free to contact us for details of similar UK or EU-based training or if you need help with relevance engineering.

http://www.flax.co.uk/blog/2018/06/25/defining-relevance-engineering-part-1-the-background/feed/ 0