guardian – Flax
http://www.flax.co.uk
The Open Source Search Specialists
Thu, 10 Oct 2019 09:03:26 +0000

Elasticon London 2015 – more products, more scale, more users!
http://www.flax.co.uk/blog/2015/11/09/elasticon-london-2015-more-products-more-scale-more-users/
Mon, 09 Nov 2015 11:49:58 +0000

The post Elasticon London 2015 – more products, more scale, more users! appeared first on Flax.

Last week Elastic, the company behind Elasticsearch, landed in London for one of their current series of one-day events. The £50 entrance fee has been put to good use, raising £16,750 for AbilityNet, who work on accessible IT – a very generous offer by Elastic.

Shay Banon, creator of Elasticsearch, kicked off with a brief history of the project, which started when he built the Compass search engine, pretty much as a hobby project while his wife was training as a chef in London. Things have moved on somewhat: today there is a 35,000-strong community with over 35 million downloads of the Elasticsearch software and a number of high-profile users including NASA, Wikimedia and Verizon (who apparently have an impressive 500 billion items indexed).

Clinton Gormley led the next session, talking about new features in the recent 2.0 release. Resiliency, performance and analytics were major themes, with the latter leveraging Lucene’s DocValues as an off-heap column store to build various prediction and detection capabilities. Also mentioned was a new scriptable Ingest Node incorporating parts of the Logstash project. Steve Mayzak then told us about the new version 4 of the Kibana visualisation package, which has now grown into a general UI framework incorporating D3.js for charting and providing an extension API. Shay returned to tell us more about Logstash, which provides over 200 plugins for ingesting data into Elasticsearch. Next up was Uri Boness telling us about the various closed-source parts of the Elasticsearch ecosystem (including the Marvel performance monitor and Shield security module) and we then heard from Morten Ingebrigtsen of Found (a hosted Elasticsearch solution, which Elastic acquired a while ago). For me the most interesting item here was news of an on-premise version of Found Premium – yes, like Lucidworks Fusion, you can now buy a packaged open source search engine from Elastic as a product. This isn’t something we generally recommend as it does remove one of the key advantages of open source, which is the lack of vendor lock-in, but it’s interesting to see Elastic plough such a familiar furrow.

The afternoon consisted of case studies including The Guardian (which I’ve written about previously), a good talk from Jay Chin on using Elasticsearch for Grid Computing for the financial services sector and a couple of use cases from Goldman Sachs. We also heard about the elasticsearch-hadoop connector – note that for high-performance indexing this may not be the best option. I missed a couple of the other talks due to a phone call but returned to hear Shay again, with a controversial statement that ‘the top 8 Lucene committers now work for Elastic’ – how exactly are you measuring that and have you told the other committers? He did however conclude reassuringly with ‘we’re not trying to force anyone to use commercial versions [of Elasticsearch]’ – good to hear!

By the way, if you want to hear how we helped a billion-pound UK IT supplier use Elasticsearch for their e-commerce website, we’ll be presenting with them at the Elasticsearch London Meetup later this month.

Elasticsearch London user group – The Guardian & Orchestrate test the limits
http://www.flax.co.uk/blog/2014/12/16/elasticsearch-london-user-group-the-guardian-orchestrate-test-the-limits/
Tue, 16 Dec 2014 14:22:30 +0000

The post Elasticsearch London user group – The Guardian & Orchestrate test the limits appeared first on Flax.

Last week I popped into the Elasticsearch London meetup, hosted this time by The Guardian newspaper. Interestingly, the overall theme of this event was not just what the (very capable and flexible) Elasticsearch software can do, but also how things can go wrong and what to do about it.

Jenny Sivapalan and Mariot Chauvin from the Guardian’s technical team described how Elasticsearch powers the Content API, used not just for the newspaper’s own website but internally and by third party applications. Originally this was built on Apache Solr (I heard about this the last time I attended a search meetup at the Guardian) but this system was proving difficult to scale elastically, taking a few minutes before new content was available and around an hour to add a new server. Instead of upgrading to SolrCloud (which probably would have solved some of these issues) the team decided to move to Elasticsearch, with targets of less than 5 seconds for new content to become live and generally a quicker response to traffic peaks. The team were honest about what had gone wrong during this process: oversharding led to problems caused by Java garbage collection, and some of the characteristics of the Amazon cloud hosting used (in particular, unexpected server shutdowns for maintenance) required significant tweaking of the Elasticsearch startup process. They were also keen to stress that scripting must be disabled unless you want your search servers to be an easy target for hackers. Although Elasticsearch promises that version upgrades can usually be done on a live cluster, the Guardian team found this unreliable in a majority of cases. Their eventual solution for version upgrades, and even for simpler configuration changes, was to spin up an entirely new cluster of servers, switch over by changing DNS settings and then turn off the old cluster. They have achieved their performance targets though, with around 375 requests/second supported and less than 15 minutes for a failed node to recover.
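
The talk didn't go into the tooling behind this switch-over, so the following is only a rough sketch of one check you might run before changing DNS: it looks at the parsed JSON from Elasticsearch's `GET /_cluster/health` endpoint for the new cluster and decides whether it looks ready for traffic. The function name `ready_for_cutover`, the `expected_nodes` parameter and the thresholds are my own assumptions, not the Guardian's method.

```python
from typing import Any, Dict


def ready_for_cutover(health: Dict[str, Any], expected_nodes: int) -> bool:
    """Decide whether a freshly built cluster looks safe to receive traffic.

    `health` is assumed to be the parsed JSON body returned by
    GET /_cluster/health on the new cluster. The rules below are
    illustrative assumptions, not a recommended production policy.
    """
    return (
        # "green" means all primary and replica shards are allocated.
        health.get("status") == "green"
        # Every node we started has joined the cluster.
        and health.get("number_of_nodes", 0) >= expected_nodes
        # No shards are still waiting for a home.
        and health.get("unassigned_shards", 0) == 0
    )


if __name__ == "__main__":
    sample = {"status": "green", "number_of_nodes": 3, "unassigned_shards": 0}
    print(ready_for_cutover(sample, expected_nodes=3))
```

Only once a check like this passes would you change the DNS entry, and the old cluster would stay up until traffic has drained from it.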

After a brief presentation from Colin Goodheart-Smithe of Elasticsearch (the company) on scripted aggregations – a clever way to gather statistics, but possibly rather fiddly to debug – we moved on to Ian Plosker of Orchestrate.io, who provide a ‘database as a service’ backed by HBase, Elasticsearch and other technologies, and his presentation on Schemalessness Gone Wrong. Elasticsearch allows you to submit data for indexing without pre-defining a schema – but Ian demonstrated how this feature isn’t very reliable in practice and how his team had worked around it by creating a ‘tuplewise transform’, restructuring data into pairs of ‘field name, field value’ before indexing with Elasticsearch. Ian was questioned on how this might affect term statistics and thus relevance metrics (which it will) but replied that this probably won’t matter – it won’t for most situations I expect, but it’s something to be aware of. There’s much more on this at Orchestrate’s own blog.
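
To make the idea a little more concrete, here is a minimal sketch of what a tuplewise transform might look like. It is not Orchestrate's implementation (see their blog for that); the function name `tuplewise`, the output keys `name` and `value`, and the dotted naming of nested fields are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def tuplewise(doc: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten a nested document into a list of {"name", "value"} pairs.

    Hypothetical sketch: nested keys are joined with dots, and each
    item of a list produces its own pair under the same name.
    """
    pairs: List[Dict[str, Any]] = []
    for key, value in doc.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted name.
            pairs.extend(tuplewise(value, name))
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    pairs.extend(tuplewise(item, name))
                else:
                    pairs.append({"name": name, "value": item})
        else:
            pairs.append({"name": name, "value": value})
    return pairs


if __name__ == "__main__":
    doc = {"title": "Hello", "tags": ["a", "b"], "author": {"name": "Ian"}}
    for pair in tuplewise(doc):
        print(pair)
```

The index then only ever sees the same two fields, whatever shape the incoming documents take. Because every value lands in one shared field, term statistics are pooled across what used to be separate fields, which is the relevance concern raised in the questions. A real system would probably also need separate value fields per data type, which this sketch ignores.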

We finished up with the usual Q&A which this time featured some hard questions for the Elasticsearch team to answer – for example why they have rolled their own distributed configuration system rather than using the proven ZooKeeper. I asked what’s going to happen to the easily embeddable Kibana 3 now Kibana 4 has its own web application (the answer being that it will probably not be developed further) and also about the licensing and availability of their upcoming Shield security plugin for Elasticsearch. Interestingly this won’t be something you can buy as a product; rather, it will only be available to support customers on the Gold and Platinum support subscriptions. It’s clear that although Elasticsearch the search engine should remain open source, we’re increasingly going to see parts of its ecosystem that aren’t – users should be aware of this, and that the future of the platform will very much depend on the business direction of Elasticsearch the company, who also centrally control the content of the open source releases (in contrast to Solr which is managed by the Apache Foundation).

Elasticsearch meetups will be more frequent next year – thanks to Yann Cluchey for organising and to all the speakers and the Elasticsearch team. See you again soon I hope.

The Fall and rise of search in a world of Big Data – part 2
http://www.flax.co.uk/blog/2011/10/31/the-fall-and-rise-of-search-in-a-world-of-big-data-part-2/
Mon, 31 Oct 2011 13:02:47 +0000

The post The Fall and rise of search in a world of Big Data – part 2 appeared first on Flax.

The theme of Big Data continued at the next conference I attended, the first Enterprise Search Europe held in London. There was a good mix of presentations ranging from the academic to the practical, my favourite probably being Martin Belam and colleague’s talk about using Solr to dynamically generate content for the new Guardian Books site. I was lucky enough to be able to talk about the real business benefits of open source search along with one of our customers, Stephen Wicks, CTO of Gorkana Group, which drew some interesting questions. We also ran a joint Meetup on the Monday evening, bringing together Enterprise Search Cambridge and Enterprise Search London.

There did seem to be a rather negative spin on search from many presenters – saying that search technology is misunderstood, more costly than expected, rarely works and hasn’t seen much recent innovation. Some of this is true – but I see this as an opportunity rather than a problem. There is more focus on the world of search now than before due to some high-profile acquisitions; people are questioning the value and capability of search technology. Those of us working at the cutting edge, delivering real working solutions, should perhaps take this opportunity to say that yes, it can be done, at a sensible cost, and it can deliver real business benefit. Perhaps as we move further into the world of Big Data we’ll realise the true value of effective search.

Whitepaper – Why you should be considering open source search
http://www.flax.co.uk/blog/2011/06/22/whitepaper-why-you-should-be-considering-open-source-search/
Wed, 22 Jun 2011 09:49:50 +0000

The post Whitepaper – Why you should be considering open source search appeared first on Flax.

I’ve uploaded a whitepaper I wrote a short while ago:

“In these rapidly changing times we don’t know what we will need to search tomorrow – so it’s important to be adaptable, flexible and able to cope with data volumes that may not scale linearly. Maintaining control over the future of your search software is also key. Open source search has come of age and every modern business should be aware of its advantages.”

Open source in the UK
http://www.flax.co.uk/blog/2011/06/03/open-source-in-the-uk/
Fri, 03 Jun 2011 10:59:22 +0000

The post Open source in the UK appeared first on Flax.

We’ve recently been forging links with the UK’s larger open source software community and have joined the Open Source Consortium. Another interesting organisation is Guildfoss, who have asked us to speak at an event on 9th June at the British Computer Society’s offices in London, discussing the skills necessary for building content management systems (search being an important part of this).

Guildfoss are also organising the ‘open government’ stand at the SmartGov Live show on June 14th-15th (part of the Guardian’s Public Procurement Show), where we’ll be talking about and demonstrating a range of solutions based on open source search, including LucidWorks Enterprise. Do let us know if you’re attending the show and would like to meet up.

We’re also helping with a new search event to be held in London in October – Enterprise Search Europe. One of the major themes of this event will be open source enterprise search and there are some fascinating presentations and workshops lined up.

Online Information 2010 – it’s quiet, too quiet
http://www.flax.co.uk/blog/2010/12/03/online-information-2010-its-quiet-too-quiet/
Fri, 03 Dec 2010 11:50:20 +0000

The post Online Information 2010 – it’s quiet, too quiet appeared first on Flax.

We dropped in to the Online 2010 event at Olympia this week, and were immediately struck by how quiet the event was: yes, there’s been some terrible weather recently in the UK but there were fewer stalls than last year, a smaller exhibition space and very few exhibitors in the enterprise search space – no Autonomy, Google, Vivisimo or Endeca for example. Unlike previous years there was no dedicated ‘search’ area on the exhibition floor, and we did see a few unmanned stands from mid afternoon. Is this a sign of difficult times or of an event that needs a rethink about its focus?

We didn’t attend the conference that runs next to the exhibition hall this year. This report on the closing panel shows that one question to the panel was about the rise of open source search – not surprisingly, the panel members (all being from closed source companies) weren’t very enthusiastic about this. According to Autonomy, open source is only for the commodity end of the market, which is the smallest part. I’m not sure Twitter (1 billion queries a day), LinkedIn (30 million users), The Guardian (innovative open platform) or the Financial Times would agree…

When search isn’t just search at The Guardian
http://www.flax.co.uk/blog/2010/10/19/when-search-isnt-just-search-at-the-guardian/
Tue, 19 Oct 2010 09:06:43 +0000

The post When search isn’t just search at The Guardian appeared first on Flax.

A fascinating event last night as the Guardian team told us more about how they’ve used open source search technology to build their new open platform. The presentations were brief and to-the-point, and covered how the team have created a detailed, rich API to their news content, all built on the open source engine Apache Solr – opening up Guardian Media Group content to the world for mashups, repurposing and innovative new business models.

The Guardian have an existing Oracle database with J2EE web applications to serve content, but discovered that certain operations such as returning content with multiple tags, or dynamically generated ‘related’ content, were very database-intensive and difficult to scale. The use of Solr effectively flattens the cost of these complex queries, and also allows them to scale up capacity on demand by simply spinning up more Solr instances on the Amazon EC2 cloud. Interestingly, site search for the Guardian website doesn’t yet use Solr, although they hope to move this across soon.
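
To give a flavour of the kind of request involved, here is a minimal sketch that builds a Solr select URL asking for content carrying several tags at once. The host, the core name `content` and the field name `tag` are hypothetical and not taken from the Guardian's setup; only Solr's standard `q`, `fq`, `rows` and `wt` request parameters are assumed.

```python
from urllib.parse import urlencode


def multi_tag_query_url(base_url, tags, rows=10):
    """Build a Solr select URL that requires every tag in `tags`.

    Each tag becomes its own filter query (fq); Solr intersects
    multiple fq parameters. The field name "tag" is an assumption.
    """
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json")]
    for tag in tags:
        # Quote the value so a tag with spaces or slashes is one value.
        params.append(("fq", f'tag:"{tag}"'))
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr core, for illustration only.
    print(multi_tag_query_url("http://localhost:8983/solr/content",
                              ["books", "fiction"]))
```

In a relational schema the same question typically means joining a tag table once per tag, whereas here it is one request against a single index, and extra Solr instances can serve the same request in parallel.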

What we’re seeing here is a change in how search technology is used especially by forward-looking organisations – from being a bolt-on to an existing website or application, search is now the platform for new developments. I’ll be talking about other ways open source search has been used for news content at the British Computer Society this coming Thursday 21st October – I believe there are still a few places available.
