Search Results for “elasticsearch” – Flax, The Open Source Search Specialists (http://www.flax.co.uk)

Little Mermaids, Haystacks and moving on
http://www.flax.co.uk/blog/2019/02/15/little-mermaids-haystacks-and-moving-on/ – Fri, 15 Feb 2019

As I announced recently, Flax is joining OpenSource Connections, and I’ve just spent a very pleasant week in Virginia with my new colleagues discussing our plans for the year to come. Without giving too much away I can say that this is a very exciting time to be joining OSC: one thing I will be doing soon is starting to write more about OSC’s proven process for supporting our clients as they move up the search relevance curve.

However, before then I’ll be speaking at a few events. At the end of this month I’ll be in Copenhagen to speak on Keeping Search Relevant in a Digital Workplace at the Intrateam conference. This is a fantastic conference on intranets and I’m looking forward to speaking for the second time and joining a very august gathering of speakers. I’m also glad to be returning to both City University and the University of Essex during February and March to talk to students about working in search and information retrieval.

In April I’ll be returning to the US for OSC’s Haystack search relevance conference, which was my favourite event of last year – I liked it so much I brought it to London that October. This year we have a fantastic lineup of talks from speakers representing organisations including LexisNexis, Wikimedia Foundation, Eventbrite and Yelp, a new and more capacious venue in downtown Charlottesville, three training options before the main conference (Think Like A Relevance Engineer for Elasticsearch and Solr, and Learning to Rank) and of course the chance to meet, chat with and get to know some of the best search people in the business. Earlybird tickets are available until the end of February and are already selling well, so make your plans to join us soon!

It’s already shaping up to be a busy year – so do keep an eye on this blog and my new home at www.opensourceconnections.com/blog for further news, and if you’d like to know how OSC can help you empower your search team get in touch.

Defining relevance engineering part 4: tools
http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/ – Thu, 15 Nov 2018

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

In my previous instalment of this guide I promised to write next about how to deliver the results of a relevance assessment, but I’ve since decided that this post should instead cover the tools a relevance engineer can use to measure and tune search performance. Of course, some of these might be used to show results to a client as well, so it’s not an entirely different direction!

It’s also important to note that this is a rapidly evolving field and therefore cannot be a definitive list – and I welcome comments with further suggestions.

1. Gathering judgements

There are various ways to measure relevance, and one is to gather judgement data – either explicit (literally asking users to manually rate how relevant a result is) or implicit (using click data as a proxy, assuming that clicking on a result means it is relevant – which isn’t always true, unfortunately). One can build a user interface that lets users rate results (e.g. from Agnes Van Belle’s talk at Haystack Europe, see page 7) which may be available to everyone or just a select group, or one can use a specialised tool like Quepid that provides an alternative UI on top of your search engine. Even Excel or another spreadsheet can be used to record judgements (although this can become unwieldy at scale). For implicit ratings, there are JavaScript libraries such as SearchHub’s search-collector or more complete analytics platforms such as Snowplow which will let you record the events happening on your search pages.
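
As a concrete illustration, here is a minimal Python sketch of turning raw click events into graded implicit judgements. The event shape, the click-through-rate grading and the 0–3 scale are all illustrative assumptions, not the API of any particular library – and it deliberately embeds the simplifying assumption mentioned above, that a click is a relevance vote:

```python
from collections import defaultdict

def implicit_judgements(events, scale=3):
    """Turn raw search-result events into graded judgements.

    events: iterable of (query, doc_id, clicked) tuples, one per
    impression. The grade is the click-through rate for each
    query/document pair, rounded onto a 0..scale integer scale.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, doc_id, was_clicked in events:
        shown[(query, doc_id)] += 1
        if was_clicked:
            clicked[(query, doc_id)] += 1
    return {
        pair: round(scale * clicked[pair] / shown[pair])
        for pair in shown
    }

events = [
    ("ipad case", "doc1", True),
    ("ipad case", "doc1", True),
    ("ipad case", "doc2", False),
    ("ipad case", "doc2", True),
]
print(implicit_judgements(events))
# doc1 was always clicked (grade 3), doc2 half the time (grade 2)
```

In practice you would weight clicks by result position and dwell time before trusting grades like these.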

2. Understanding the query landscape

To find out what users are actually searching for and how successful their search journeys are, you will need to look at the log files of the search engine and the hosting platform it runs within. Open source engines such as Solr can provide detailed logs of every query, which will need to be processed into an overall picture. Google Analytics will tell you which Google queries brought users to your site. Some sophisticated analytics & query dashboards are also available – Luigi’s Box is a particularly powerful example for site search. Even a spreadsheet can be useful to graph the distribution of queries by volume, so you can see both the popular queries and those rare queries in the ‘long tail’. On Elasticsearch it’s even possible to submit this log data back into a search index and to display it using a Kibana visualisation.
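
The first processing step is usually just tallying queries so the head and the long tail become visible. A minimal Python sketch, assuming you have already extracted the raw parameter strings (e.g. `q=iphone&rows=10`) from each Solr request log line – real logs need more careful parsing than this:

```python
from collections import Counter
from urllib.parse import parse_qs

def query_distribution(param_strings):
    """Tally the 'q' parameter across logged request parameter
    strings, most frequent first."""
    counts = Counter()
    for params in param_strings:
        # parse_qs URL-decodes values, so 'usb+c+hub' becomes 'usb c hub'
        for q in parse_qs(params).get("q", []):
            counts[q] += 1
    return counts.most_common()

logs = ["q=iphone&rows=10", "q=iphone&rows=10", "q=usb+c+hub&rows=10"]
print(query_distribution(logs))
# [('iphone', 2), ('usb c hub', 1)]
```

Plotting these counts by rank gives the classic head/tail curve described above.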

3. Measurement and metrics

Once you have your data it’s usually necessary to calculate some metrics – overall measurements of how ‘good’ or ‘bad’ relevance is. There’s a long list of metrics commonly used by the Information Retrieval community, such as NDCG (Normalised Discounted Cumulative Gain), which show the usefulness, or gain, of a search result based on its position in a list. Tools such as Rated Ranking Evaluator (RRE) can calculate these metrics from supplied judgement lists (RRE can also run a whole test environment, spinning up Solr or Elasticsearch, performing a list of queries and recording and displaying the results).
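
NDCG itself fits in a few lines. This sketch uses the simple ‘linear gain’ formulation (graded judgement divided by a log-position discount); tools like RRE implement this and other variants, so treat it as an illustration rather than a reference implementation:

```python
import math

def dcg(gains):
    """Discounted cumulative gain: each result's graded judgement is
    discounted by the log of its (1-based) position."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains, k=10):
    """NDCG@k: DCG of the ranking as returned, divided by the DCG of
    the ideal (best possible) ordering, so 1.0 means perfect."""
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

# gains are the judgements of the results in the order returned
print(ndcg([3, 2, 0]))        # 1.0 – already in ideal order
print(ndcg([0, 2, 3]) < 1.0)  # True – best result buried at the bottom
```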

4. Tuning the engine

Next you’ll need a way to adjust the configuration of the engine and/or figure out just why particular results are appearing (or not). These tools are usually specific to the search engine being used: Quepid, for example, works with Solr and Elasticsearch and allows you to change query parameters and observe the effect on relevance scores; with RRE you can control the whole configuration of the Solr or Elasticsearch engine that it can then spin up for you. Commercial search engines will have their own tools for adjusting configuration, or you may have to work within an overall content management system (e.g. Drupal) or e-commerce system (e.g. Hybris). Some of these latter systems may only give you limited control of the search engine, but could also let you adjust how content is processed and ingested or how synonyms are generated.

For Solr, tools such as the Google Chrome extension Solr Query Debugger can be used and the Solr Admin UI itself allows full control of Solr’s configuration. Solr’s debug query shows hugely detailed information as to why a query returned a result, but tools such as Splainer and Solr Explain are useful to make sense of this.

For Elasticsearch, the Kopf plugin was a useful tool, but has now been replaced by Cerebro. Elastic, the commercial company behind Elasticsearch, offers its own tool Marvel on a 30-day free trial, after which you’ll need an Elastic subscription to use it. Marvel is built on the open source Kibana, which also includes various developer tools.

If you need to dig (much) deeper into the Lucene indexes underneath Solr and Elasticsearch, the Lucene Index Toolbox (Luke) is available, or Flax’s own Marple index inspector.


As I said at the beginning this is by no means a definitive list – what are your favourite relevance tuning tools? Let me know in the comments!

In the next post I’ll cover how a relevance engineer can develop more powerful and ‘intelligent’ ways to tune search. In the meantime you can read the free Search Insights 2018 report by the Search Network. Of course, feel free to contact us if you need help with relevance engineering.

Search Meetups
http://www.flax.co.uk/searchmeetups/ – Tue, 06 Nov 2018

A list of known Search Meetups, informal gatherings of search engine developers, enthusiasts, experts and beginners (nothing to do with SEO, sorry!). If you’d like your event to be listed please email us with a hyperlink and geographical location. We’ve put some in italics as we know these groups haven’t met for a while, and added a bold V for those we think are fully or partly organised by vendors. Usually we blog about the Meetups we attend. If there’s no Meetup in your region – why not start one? We’d be happy to advise you!

Jump to your region:   North America   South America   Europe   Asia/Australia   Africa

North America

USA

Charlottesville Apache Lucene/Solr Meetup, VA
New England Search Technologies (NEST) Group (Burlington, MA)
Boston Search Group, MA
Search and Discovery Chicago, IL
Elastic New York City User Group, NY V
Los Angeles Search, Data and Analytics Meetup, CA
Enterprise Search and Analytics Meetup, Cupertino, CA
Downtown SF Apache Lucene/Solr Meetup, CA V
Elastic Indianapolis User Group, IN V
Elastic AK User Group, Anchorage, AK V
Elastic Philadelphia User Group, PA V
Chicago Apache Lucene Solr User Group, IL V
Atlanta Apache Lucene/Solr Meetup, GA V
Elastic Atlanta User Group, GA V
Elastic Dallas User Group, TX V
Elastic Houston User Group, TX V
Seattle Solr/Lucene Meetup, WA
Salt Lake City Elastic Meetup, UT V
Portland Elastic Meetup Group, OR V
Elastic Washington, DC User Group V
Elastic Madison & Milwaukee User Group, Madison, WI V
Elastic Detroit User Group, MI V
Elastic South Florida User Group, Miami, FL V
Elastic Silicon Valley User Group, Mountain View, CA V
Los Angeles/OC Apache Lucene/Solr User Group, Santa Monica, CA V

Canada

Coveo User Group (Montréal) V
Waterloo/Kitchener Elasticsearch Meetup V
Ottawa Elastic User Group V
Montreal Solr/ML group
Elasticsearch Toronto
Victoria Elasticsearch and Machine Learning, BC

South America

Mexico

Elastic Oaxaca User Group V

Chile

Elasticsearch Chile, Santiago

Europe

UK

London Lucene/Solr Meetup (run by Flax)
London Elasticsearch User Group Meetup
Cambridge Elasticsearch Meetup V
Enterprise Search London (last met in 2014)
Enterprise Search Cambridge (last met in 2016 – run by Flax)
Elastic – Scotland V

Netherlands

Search Engines Amsterdam
Elastic User Group NL, Amsterdam V

Germany

Search Usergroup Berlin
Search Technology Meetup, Berlin
Search Technology Meetup Hamburg
Search Meetup NRW (Düsseldorf)
München Enterprise Productivity & Search
Search Meetup Karlsruhe
Elasticsearch Berlin
Search Meetup Munich V

Austria

Elasticsearch Usergroup Vienna

Belgium

Belgian Enterprise Search Meetup

France

Enterprise Search Paris
Elastic FR (Paris) V

Switzerland

Elastic Switzerland, Zürich V

Spain

Elastic – Barcelona V
Madrid ElasticSearch Meetup

Italy

Elastic – Italy, Milano V

Luxembourg

Elastic Luxembourg User Group V

Croatia

Elastic Zagreb

Greece

Elastic Greece, Athens V

Turkey

Elastic – Turkey, Istanbul

Israel

Elastic, Tel Aviv-Yafo V

Denmark

Elastic – Copenhagen V

Sweden

Elastic – Göteborg V
Elastic – Stockholm V

Norway

Elastic Oslo User Group V
Oslo Solr Community

Finland

Elastic – Helsinki

Asia & Australia

India

Bangalore Big Data Search Group
Bangalore Apache Solr/Lucene Group V
Delhi Elasticsearch Meetup
Elastic Gujarat User Group, Ahmedabad V
Hyderabad Apache Solr/Lucene Group
Elastic Kolkata User Group V
Elasticsearch Explorers, Bangalore V
Mumbai Elastic User Group V
Elastic Kochi User Group V

Japan

Elastic Tokyo User Group V

Pakistan

Elasticsearch User Group, Lahore V

Australia

Melbourne Search and Recommendation Group
Elastic Melbourne User Group V
Elastic – Sydney V
Elastic Brisbane User Group V
Elastic Perth User Group V

Singapore

Apache Lucene/Solr Singapore

China

Elastic China Users, Beijing

Africa

Ivory Coast

Elastic User Group, Abidjan V

South Africa

Elasticsearch, Johannesburg V

Three weeks of search events this October from Flax
http://www.flax.co.uk/blog/2018/09/04/three-weeks-of-search-events-this-october-from-flax/ – Tue, 04 Sep 2018

Flax has always been very active at conferences and events – we enjoy meeting people to talk about search! With much of our consultancy work being carried out remotely these days, attending events is a great way to catch up in person with our clients, colleagues and peers and to learn from others about what works (and what doesn’t) when building cutting-edge search solutions. I’m thus very glad to announce that we’re running three search events this coming October.

Earlier in the year I attended Haystack in Charlottesville, one of my favourite search conferences ever – and almost immediately began to think about whether we could run a similar event here in Europe. Although we’ve only had a few months I’m very happy to say we’ve managed to pull together a high-quality programme of talks for our first Haystack Europe event, to be held in London on October 2nd. The event is focused on search relevance from both a business and a technical perspective and we have speakers from global retailers as well as specialist consultants and authors. Tickets are already selling well and we have limited space, so I would encourage you to register as soon as you can (Haystack USA sold out even after the capacity was increased). We’re running the event in partnership with Open Source Connections.

The next week we’re running a Lucene Hackday on October 9th as part of our London Lucene/Solr Meetup programme. Building on previous successful events, this is a day of hacking on the Apache Lucene search engine and associated software such as Apache Solr and Elasticsearch. You can read up on what we achieved at our last event a couple of years ago – again, space is limited, so sign up soon to this free event (huge thanks to Mimecast for providing the venue and to Elastic for sponsoring drinks and food for an evening get-together afterwards). Bring a laptop and your ideas (and do comment on the event page if you have any suggestions for what we should work on).

We’ll be flying to Montreal soon afterwards to attend the Activate conference (run by our partners Lucidworks) and while we’re there we’ll host another free Lucene Hackday on October 15th. Again, this would not be possible without sponsorship and so thanks must go to Netgovern, SearchStax and One More Cloud. Remember to tell us your ideas in the comments.

So that’s three weeks of excellent search events – see you there!

Lucene Solr London: Search Quality Testing and Search Procurement
http://www.flax.co.uk/blog/2018/06/29/lucene-solr-london-search-quality-testing-and-search-procurement/ – Fri, 29 Jun 2018

Mimecast were our kind hosts for the latest London Lucene/Solr Meetup (and even provided goodie bags). It’s worth repeating that we couldn’t run these events without the help of sponsors and hosts and we’re always very grateful (and keep those offers coming!).

First up was Andrea Gazzarini presenting a brand new framework for search quality testing. Designed for offline measurement, Rated Ranking Evaluator is an open source Java library (although it can be used from other languages). It uses a hierarchical model to arrange queries into query groups (all queries in a query group should be producing the same results). Each test can run across a number of search engine configuration versions and outputs results in JSON format – but these can also be translated into Excel spreadsheets, PDFs or sent to a server that provides a live console showing how search quality is affected by a search engine configuration change. Although aimed at Elasticsearch and Solr, the platform is extensible to any underlying search engine. This is a very useful tool for search developers and joins Quepid and SearchHub’s recently released search analytics acquisition library in the ‘toolbox’ for relevance engineers. You can see Andrea’s slides here.
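
That query-group idea – several phrasings that should all return the same results – can be sketched as a toy consistency check in Python. The group name, queries and stubbed search function below are invented for illustration; RRE itself is a Java library that runs these checks against a freshly spun-up Solr or Elasticsearch instance:

```python
def consistent_groups(groups, run_query):
    """Return the names of query groups whose member queries do not
    all produce the same ordered result list.

    groups: group name -> list of query strings.
    run_query: maps a query string to an ordered list of doc ids.
    """
    bad = []
    for name, queries in groups.items():
        result_lists = [run_query(q) for q in queries]
        if any(r != result_lists[0] for r in result_lists[1:]):
            bad.append(name)
    return bad

# A stub 'search engine' standing in for a real Solr/Elasticsearch call.
fake_index = {
    "laptop bag": ["d1", "d2"],
    "bag for laptop": ["d1", "d2"],
    "notebook sleeve": ["d3"],
}
groups = {"laptop-bags": ["laptop bag", "bag for laptop", "notebook sleeve"]}
print(consistent_groups(groups, fake_index.get))
# ['laptop-bags'] – 'notebook sleeve' disagrees with the other two queries
```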

Martin White spoke next on how open source search solutions fare in corporate procurements for enterprise search. This was an engaging talk from Martin, showing the scale of the opportunities for open source platforms, with budgets of several million pounds being common for enterprise search projects. However, as he mentioned, it can be very difficult for procurement departments to get information from vendors and ‘the last thing you’ll know about a piece of enterprise software is how much it will cost’. He detailed how open source solutions often compare badly against closed source commercial offerings because it is hard to see their ‘edges’ – e.g. what custom development will be necessary to fulfil enterprise requirements. Although the opportunities are clear, it seems open source based solutions still have a way to go to compete. You can read more from Martin on this subject in the recent free Search Insights report.

Thanks to Mimecast and both speakers – we’ll be back after the summer with another Meetup!

Defining relevance engineering, part 1: the background
http://www.flax.co.uk/blog/2018/06/25/defining-relevance-engineering-part-1-the-background/ – Mon, 25 Jun 2018

Relevance Engineering is a relatively new concept but companies such as Flax and our partners Open Source Connections have been carrying out relevance engineering for many years. So what is a relevance engineer and what do they do? In this series of blog posts I’ll try to explain what I see as a new, emerging and important profession.

Let’s start by turning the clock back a few years. Ten or fifteen years ago search engines were usually closed source, mysterious black boxes, costing five or six-figure sums for even relatively modest installations (let’s say a couple of million documents – small by today’s standards). Huge amounts of custom code were necessary to integrate them with other systems and projects would take many months to demonstrate even basic search functionality. The trick was to get search working at all, even if the eventual results weren’t very relevant. Sadly even this was sometimes difficult to achieve.

Nowadays, search technology has become highly commoditized and many developers can build a functioning index of several million documents in a couple of days with off-the-shelf, open source, freely available software. Even the commercial search firms are using open source cores – after all, what’s the point of developing them from scratch? Relevance is often ‘good enough’ out of the box for non business-critical applications.

A relevance engineer is required when things get a little more complicated and/or when good search is absolutely critical to your business. If you’re trading online, search can be a major driver of revenue and getting it wrong could cost you millions. If you’re worried about complying with the GDPR, MiFID or other regulations then ‘good enough’ simply isn’t if you want to prevent legal issues. If you’re serious about saving the time and money your employees waste looking for information or improving your business’ ability to thrive in a changing world then you need to do search right.

So what search engine should you choose before you find a relevance engineer to help with it? I’m going to go out on a limb here and say it doesn’t actually matter that much. At Flax we’re proponents of open source engines such as Apache Lucene/Solr and Elasticsearch (which have much to recommend them) but the plain fact is that most search engines are the same under the hood. They all use the same basic principles of information retrieval; they all build indexes of some kind; they all have to analyze the source data and user queries in much the same way (ignore ‘cognitive search’ and other ‘AI’ buzzwords for now, most of this is marketing rather than actual substance). If you’re using Microsoft Sharepoint across your business we’re not going to waste your time trying to convince you to move wholesale to a Linux-based open source alternative.

Any modern search engine should allow you the flexibility to adjust how data is ingested, how it is indexed, how queries are processed and how ranking is done. These are the technical tools that the relevance engineer can use to improve search quality. However, relevance engineering is never simply a technical task – in fact, without a business justification, adjusting these levers may make things worse rather than better.

In the next post I’ll cover how a relevance engineer can engage with a business to discover the why of relevance tuning. In the meantime you can read Doug Turnbull’s chapter in the free Search Insights 2018 report by the Search Network (the rest of the report is also very useful) and you might also be interested in the ‘Think like a relevance engineer’ training he is running soon in the USA. Of course, feel free to contact us for details of similar UK or EU-based training or if you need help with relevance engineering.

Catching MICES – a focus on e-commerce search
http://www.flax.co.uk/blog/2018/06/19/catching-mices-a-focus-on-e-commerce-search/ – Tue, 19 Jun 2018

The second event I attended in Berlin last week was the Mix Camp on e-commerce search (MICES), a small and focused event now in its second year and kindly hosted by Mytoys at their offices. Slides for the talks are available here and I hope videos will appear soon.

The first talk was given by Karen Renshaw of Grainger, who Flax worked with at RS Components (she also wrote a great series of blog posts for us on improving relevancy). Karen’s talk drew on her long experience of managing search teams from a business standpoint – this wasn’t about technology but about combining processes, targets and objectives to improve search quality. She showed how to get started by examining customer feedback, known issues, competitors and benchmarks; how to understand and categorise query types; create a test plan within a cross-functional team and to plan for incremental change. Testing was covered including how to score search quality and how to examine the impact of search changes, with the message that “all aspects of search should work together to help customers through their journey”. She concluded with the clear point that there are no silver bullets, and that expectations must be managed during an ongoing, iterative process of improvement. This was a talk to set the scene for the day and containing lessons for every search manager (and a good few search technologists who often ignore the business factors!).

Next up were Christine Bellstedt & Jens Kürsten from Otto, Germany’s second biggest online retailer with over 850,000 search queries a day. Their talk focused on bringing together the user and business perspectives to create a search quality testing cycle. They quoted Peter Fries’ graphic from his excellent talk at Haystack to illustrate how they created an offline system for experimentation with new ranking methods based on linear combinations of relevance scores from Solr, business performance indicators and product availability. They described how they learnt how hard it can be to select ranking features, create test query sets with suitable coverage and select appropriate metrics to measure. They also talked about how the experimentation cycle can be used to select ‘challengers’ to the current ‘champion’ ranking method, which can then be A/B tested online.
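
The core of that kind of offline experiment is just a weighted linear blend of per-document signals. A toy Python version might look like this – the signal names and weights are invented for illustration and bear no relation to Otto’s actual features:

```python
def blended_score(doc, weights):
    """Combine normalised per-document signals (each in [0, 1]) into
    one ranking score via a weighted linear combination."""
    return sum(weights[name] * doc.get(name, 0.0) for name in weights)

# Hypothetical signals: a rescaled relevance score, a sales indicator
# and stock availability.
weights = {"relevance": 0.7, "sales_rank": 0.2, "in_stock": 0.1}
docs = [
    {"id": "a", "relevance": 0.9, "sales_rank": 0.1, "in_stock": 1.0},
    {"id": "b", "relevance": 0.8, "sales_rank": 0.9, "in_stock": 1.0},
]
ranked = sorted(docs, key=lambda d: blended_score(d, weights), reverse=True)
print([d["id"] for d in ranked])
# ['b', 'a'] – the business signal lifts b above the more 'relevant' a
```

Each weight vector is a candidate ‘challenger’ ranking; the offline cycle scores candidates against judgement data before any A/B test goes live.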

Pavel Penchev of SearchHub was next and presented their new search event collector library – a Javascript SDK which can be used to collect all kinds of metrics around user behaviour and submit them directly to a storage or analytics system (which could even be a search engine itself – e.g. Elasticsearch/Kibana). This is a very welcome development – only a couple of months ago at Haystack I heard several people bemoaning the lack of open source tools for collecting search analytics. We’ll certainly be trying out this open source library.

Andreas Brückner of e-commerce search vendor Fredhopper talked about the best way to optimise search quality in a business context. His ten headings included “build a dedicated search team” (although 14% of Fredhopper’s own customers have no dedicated search staff), “build a measurement framework” (how else can you see how revenue might be improved?) and “start with user needs, not features”. There was much to agree with in this talk from someone with long experience of the sector from a vendor viewpoint.

Johannes Peter of MediaMarktSaturn described an implementation of a ‘semantic’ search platform which attempts to understand queries such as ‘MyMobile 7 without contract’, recognising this is a combination of a product name, a Boolean operator and an attribute. He described how an ontology (perhaps showing a family of available products and their variants) can be used in combination with various rules to create a more focused query e.g. “title:(“MyMobile7″) AND NOT (flag:contract)”. He also mentioned machine learning and term co-occurrence as useful methods but stressed that these experimental techniques should be treated with caution and one should ‘fail early’ if they are not producing useful results.
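
The rule-plus-vocabulary approach Johannes described can be sketched in a few lines of Python. The product list, flag mapping and regex below are invented for illustration; a real implementation would consult the ontology rather than a hard-coded set:

```python
import re

# Illustrative vocabulary; in practice this comes from an ontology of
# products and their attributes.
PRODUCTS = {"mymobile 7", "mymobile 8"}
ATTRIBUTE_FLAGS = {"contract": "flag:contract"}

def rewrite(query):
    """Rewrite a free-text query into a fielded Boolean query when a
    known product and a negated attribute are recognised; otherwise
    return the query unchanged."""
    m = re.match(r"(?i)(.+?)\s+without\s+(\w+)$", query.strip())
    if m:
        product, attr = m.group(1).lower(), m.group(2).lower()
        if product in PRODUCTS and attr in ATTRIBUTE_FLAGS:
            return f'title:("{m.group(1)}") AND NOT ({ATTRIBUTE_FLAGS[attr]})'
    return query

print(rewrite("MyMobile 7 without contract"))
# title:("MyMobile 7") AND NOT (flag:contract)
print(rewrite("plain query"))  # unchanged
```

Falling back to the unmodified query when no rule fires is one way to ‘fail early’, as the talk recommended.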

Ashraf Aaref & Felipe Besson described their journey using Learning to Rank to improve search at GetYourGuide, a marketplace for activities (e.g. tours and holidays). Using Elasticsearch and the LtR plugin recently released by our partners OpenSourceConnections they tried to improve the results for their ‘location pages’ (e.g. for Paris) but their first iteration actually gave worse results than the current system and was thus rejected by their QA process. They hope to repeat the process using what they have learned about how difficult it is to create good judgement data. This isn’t the first talk I’ve seen that honestly admits that ML approaches to improving search aren’t a magic silver bullet and the work itself is difficult and requires significant investment.

Duncan Blythe of Zalando gave what was the most forward-looking talk of the event, showing a pure Deep Learning approach to matching search queries to results – no query parsing, language analysis, ranking or anything, just a system that tries to learn what queries match which results for a product search. This reminded me of Doug & Tommaso’s talk at Buzzwords a couple of days before, using neural networks to learn the journey between query and document. Duncan did admit that this technique is computationally expensive and in no way ready for production, but it was exciting to hear about such cutting-edge (and well funded) research.

Doug Turnbull was the last speaker with a call to arms for more open source tooling, datasets and relevance judgements to be made available so we can all build better search technology. He gave a similar talk to keynote the Haystack event two months ago and you won’t be surprised to hear that I completely agree with his viewpoint – we all benefit from sharing information.

Unfortunately I had to leave MICES at this point and missed the more informal ‘bar camp’ event to follow, but I would like to thank all the hosts and organisers especially René Kriegler for such an interesting day. There seems to be a great community forming around e-commerce search which is highly encouraging – after all, this is one of the few sectors where one can draw a clear line between improving relevance and delivering more revenue.

Haystack, the search relevance conference – day 1
http://www.flax.co.uk/blog/2018/04/18/haystack-the-search-relevance-conference-day-1/ – Wed, 18 Apr 2018

The post Haystack, the search relevance conference – day 1 appeared first on Flax.

Last week I attended the Haystack relevance conference – I’ve already written about my overall impressions but the following are some more notes on the conference sessions. Note that some of the presentations I attended have already been covered in detail by Sujit Pal’s excellent blog. Those presentations I haven’t linked to directly should appear soon on the conference website.

Doug Turnbull of Open Source Connections gave the keynote presentation, which led with the idea that we need more open source tools and methods for tuning relevance, including tools to gather search analytics. He noted how the Learning to Rank plugins recently developed for both Solr and Elasticsearch have commoditized capabilities previously only described in academia, and how we also need to build a cohesive community around search relevance. As it turned out, this conference did in my view signal the birth of that community.

Next up was Peter Fries, who talked about a business-friendly approach to search quality – a subject close to my heart, as I regularly have to discuss relevance tuning with non-technical staff. Peter described how search quality is often presented to business teams as mysterious and ‘not for them’ – without convincing these people of the value of search tuning we will fail to take account of business-related factors (and we’re also unlikely to get full buy-in for a relevance tuning project). He went on to stress the importance of including marketing and management mindsets in this process, and described a method for search tuning involving feedback loops and an ‘iron triangle’ of measurement, data and optimisation. This was a very useful talk.

I then went to hear Chao Han of Lucidworks demonstrate how their product Fusion App Studio allows one to capture various signals and use these for ‘head and tail analysis’ – looking not just at the ‘head’ of popular, often-clicked results but also at those in the ‘tail’ that attract few clicks, possibly due to problems such as mis-spellings. Interestingly, this approach allows automatic tail query rewriting – an example might be spotting a colour word such as ‘red’ in the query and rewriting this into a field query of colour:red. This was a popular talk, although the presenter was a little mysterious about the exact methodology used, perhaps unsurprisingly as Fusion is a commercial product.
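The colour example can be sketched in a few lines. This is a hypothetical illustration of the rewriting idea only – the colour list and the `colour:` field name are invented here, and this is not Fusion’s actual implementation:

```python
# Hypothetical sketch of automatic tail-query rewriting: spot a
# known colour word in a free-text query and move it into a
# structured field filter (e.g. colour:red). The vocabulary and
# field name are illustrative, not any product's real logic.

COLOURS = {"red", "blue", "green", "black"}

def rewrite_tail_query(query):
    terms = query.lower().split()
    colours = [t for t in terms if t in COLOURS]        # recognised colour words
    remaining = [t for t in terms if t not in COLOURS]  # the rest of the query
    filters = [f"colour:{c}" for c in colours]          # structured field filters
    return " ".join(remaining), filters

# A misspelt tail query still gets its colour lifted into a filter:
print(rewrite_tail_query("red sandles"))
# prints ('sandles', ['colour:red'])
```

A real system would of course combine this with spelling correction and a much richer attribute vocabulary, but the principle – turning free text in rarely-seen queries into structured filters – is the same.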

After a tasty Mexican-themed lunch I took a short break for some meetings, so missed the next set of talks. I then went to Elizabeth Haubert’s talk on Click Analytics. She began with a description of the venerable TREC conference (now in its 27th year!), which has long used human relevance judgements to evaluate retrieval systems, and discussed how these methods might be applied to real-world situations. For example, the TREC evaluations have shown that how relevance tests are assessed is as important as the tests themselves – the assessors are effectively also users of the system under test. She recommended calibrating both the rankings to a tester and the tester to the rankings, and creating a story around each test to put it in context and help with disambiguation.
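The point about assessors mattering as much as the tests can be shown with a minimal, TREC-style scoring sketch. The judgements below are invented data purely for illustration – real evaluations pool judgements from many assessors over large document collections:

```python
# Minimal sketch of scoring a ranked result list against human
# relevance judgements, TREC-style. Judgements map document id ->
# relevance (1 = relevant, 0 = not); the data here is invented.

def precision_at_k(ranked_ids, judgements, k):
    """Fraction of the top-k results judged relevant."""
    top = ranked_ids[:k]
    return sum(judgements.get(doc_id, 0) for doc_id in top) / k

judgements_a = {"d1": 1, "d2": 0, "d3": 1}  # assessor A
judgements_b = {"d1": 1, "d2": 1, "d3": 0}  # assessor B disagrees on d2 and d3

ranking = ["d1", "d2", "d3"]
print(precision_at_k(ranking, judgements_a, 2))  # prints 0.5
print(precision_at_k(ranking, judgements_b, 2))  # prints 1.0
```

The same ranking scores very differently depending on which assessor judged it – hence the need to calibrate testers and rankings to each other.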

We finished the day with some lightning talks; sadly I didn’t take notes on these, but check out Sujit’s aforementioned blog for more information. I do remember Tom Burgmans’ visualisation tool for Solr’s Explain debug feature, which I’m very much looking forward to seeing released as open source. The evening continued with a conference dinner nearby and some excellent local craft beer.

I’ll be covering the second day next.
When even the commercial vendors are using it, has open source search won? http://www.flax.co.uk/blog/2018/03/15/even-commercial-vendors-using-open-source-search-won/ http://www.flax.co.uk/blog/2018/03/15/even-commercial-vendors-using-open-source-search-won/#respond Thu, 15 Mar 2018 12:03:32 +0000 http://www.flax.co.uk/?p=3718 There have been some interesting announcements recently which may point to an increasing realisation amongst commercial search firms that an open source model is an essential advantage in today’s search market. Coveo have announced that their enterprise search engine can … More

The post When even the commercial vendors are using it, has open source search won? appeared first on Flax.

There have been some interesting announcements recently which may point to an increasing realisation amongst commercial search firms that an open source model is an essential advantage in today’s search market. Coveo have announced that their enterprise search engine can run on an Elasticsearch core – an interesting move for a previously decidedly closed-source company. BA Insight, who have previously provided extensions and enhancements for Microsoft’s equally closed-source Sharepoint search facility, have been offering Elasticsearch as a core search engine for quite a while. It is also an open secret that some other commercial search firms (such as Attivio) use Apache Lucene as a core technology.

The commercial search firms will have noticed that Lucidworks (who employ a large proportion of Lucene/Solr committers) have announced Lucidworks Fusion 4, which can be used for site and enterprise search. Elastic, the company behind Elasticsearch, recently acquired Swiftype and have repositioned it as a packaged site search engine (with an enterprise search version in beta and rumoured to appear later this year). Both Lucidworks and Elastic are thus attempting to capture a larger segment of the search market, using their dominance and expertise in the open source world. Note however that all these products are ‘open core’ rather than ‘open source’ (despite Elastic’s attempts to pretend otherwise) – which is not very different from Coveo or BA Insight’s approach – so the distance between the traditionally separate ‘open source’ and ‘closed source’ search vendors is now closing.

The question for any search vendor should be whether there is any point developing and maintaining a closed source search engine core, when Lucene derivatives such as Solr and Elasticsearch are so well established. The race between closed and open source is perhaps over.

Here at Flax we’ve been building open source search engines since 2001 and we’re independent of any vendor – so if you need help with your search project, do let us know.

Note: Enterprise Search is usually defined as a search engine working behind a corporate firewall, indexing different content sources such as flat files, databases and intranets. Site Search is usually visible to non-employees and only indexes websites. However, when site search includes an intranet the boundary becomes a little fuzzy – is this lightweight enterprise search? In most cases this doesn’t hugely matter – the underlying search engine core will be the same, it’s simply a difference in where source data comes from and how it is presented to users. However, these two options are often presented as different products by vendors.

UPDATE: A few days after I posted this blog, commercial vendor Attivio released SUIT, an open source user interface library that can run on their own engine, Elasticsearch or Solr. It seems the trend continues.

No, Elastic X-Pack is not going to be open source – according to Elastic themselves http://www.flax.co.uk/blog/2018/03/02/no-elastic-x-pack-not-going-open-source-according-elastic/ http://www.flax.co.uk/blog/2018/03/02/no-elastic-x-pack-not-going-open-source-according-elastic/#comments Fri, 02 Mar 2018 14:47:49 +0000 http://www.flax.co.uk/?p=3709 Elastic are the company founded by the creator of Elasticsearch, Shay Banon. At this time of year they have their annual Elasticon conference in San Francisco and as you might expect a lot of announcements are made during the week of the conference. The major ones to appear this time are that Swiftype, which Elastic acquired last year, has reappeared as Elastic Site Search and that Elastic are opening the code for their commercial X-Pack features. More

The post No, Elastic X-Pack is not going to be open source – according to Elastic themselves appeared first on Flax.

Elastic are the company founded by the creator of Elasticsearch, Shay Banon. At this time of year they have their annual Elasticon conference in San Francisco and as you might expect a lot of announcements are made during the week of the conference. The major ones to appear this time are that Swiftype, which Elastic acquired last year, has reappeared as Elastic Site Search and that Elastic are opening the code for their commercial X-Pack features.

Shay Banon is always keen to relate how Elasticsearch started as open source and will remain true to that heritage, which is always encouraging to hear. However it’s unfortunate to note that the announcement has been reported by many as ‘X-Pack is now open source’ – and the truth is a little more complicated than that.

Firstly, let’s look at the Elasticsearch core code itself. Yes, this is open source under the Apache 2 license, so you can download it, modify it, fork it, even incorporate it into your own products if you like. However most people would like to keep up with the latest and greatest developments so they’ll want to stick with the ‘official’ stream of updates, and what goes into this is entirely up to Elastic employees as they are the only ones allowed to commit to the codebase. Some measure of control of an open source project is essential of course, but this is certainly not ‘open development’ even though it is ‘open source’. Compare this to Apache Lucene/Solr, where those that are allowed to commit code to the official releases are from a wide variety of organisations (and elected as committers by merit, by a group of other longstanding committers). This distinction is important but makes little difference to most adopters.

Elastic have also for some years produced commercial, closed-source software in addition to Elasticsearch, which they call the X-Pack. To use this code you have to license it, although for some of the features the license is free. The announcement this week is that the source code for the X-Pack will be open and available to read under an Elastic license (which hasn’t yet been made available). As Doug Turnbull of our partner company Open Source Connections writes, “Be careful: The ‘open source’ Elastic XPack is very different than what most think of as ‘open source’”. To use some of these features in production, even though you can now read their source code, you will still need to pay Elastic for a license. If you spot a problem in the source code and submit a patch, you may still end up paying Elastic for the privilege of running it. This is an ‘open core’ model, where the further you move away from the core, the less open and free things become – and as Shay writes, this is a key part of their business model.

The final word on this comes from Elastic’s own FAQ on the X-Pack: “Open Source licensing maintains a strict definition from the Open Source Initiative (OSI). As of 6.3, the X-Pack code will be opened under an Elastic EULA. However, it will not be ‘Open Source’ as it will not be covered by an OSI approved license.” It’s a shame that this hasn’t been accurately reported.

If you are considering open source search software for your project, contact us for independent and honest advice. We’ve been building open source search applications since 2001.
