Posts Tagged ‘market’

Elasticsearch London user group – The Guardian & Orchestrate test the limits

Last week I popped into the Elasticsearch London meetup, hosted this time by The Guardian newspaper. Interestingly, the overall theme of this event was not just what the (very capable and flexible) Elasticsearch software can do, but also how things can go wrong and what to do about it.

Jenny Sivapalan and Mariot Chauvin from the Guardian’s technical team described how Elasticsearch powers the Content API, used not just for the newspaper’s own website but internally and by third party applications. Originally this was built on Apache Solr (I heard about this the last time I attended a search meetup at the Guardian) but this system was proving difficult to scale elastically, taking a few minutes before new content was available and around an hour to add a new server. Instead of upgrading to SolrCloud (which probably would have solved some of these issues) the team decided to move to Elasticsearch with targets of less than 5 seconds for new content to become live and generally a quicker response to traffic peaks. The team were honest about what had gone wrong during this process: oversharding led to problems caused by Java garbage collection, and some of the characteristics of the Amazon cloud hosting used (in particular, unexpected server shutdowns for maintenance) required significant tweaking of the Elasticsearch startup process. They were also keen to stress that scripting must be disabled unless you want your search servers to be an easy target for hackers. Although Elasticsearch promises that version upgrades can usually be done on a live cluster, the Guardian team found this unreliable in a majority of cases. Their eventual solution for version upgrades, and even for simpler configuration changes, was to spin up an entirely new cluster of servers, switch over by changing DNS settings and then turn off the old cluster. They have achieved their performance targets though, with around 375 requests/second supported and less than 15 minutes for a failed node to recover.

After a brief presentation from Colin Goodheart-Smithe of Elasticsearch (the company) on scripted aggregations – a clever way to gather statistics, but possibly rather fiddly to debug – we moved on to Ian Plosker of Orchestrate.io, who provide a ‘database as a service’ backed by HBase, Elasticsearch and other technologies, and his presentation on Schemalessness Gone Wrong. Elasticsearch allows you to submit data for indexing without pre-defining a schema – but Ian demonstrated how this feature isn’t very reliable in practice and how his team had worked around it by creating a ‘tuplewise transform’, restructuring data into pairs of ‘field name, field value’ before indexing with Elasticsearch. Ian was questioned on how this might affect term statistics and thus relevance metrics (which it will) but replied that this probably won’t matter – it won’t for most situations I expect, but it’s something to be aware of. There’s much more on this at Orchestrate’s own blog.
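To make the ‘tuplewise transform’ idea a little more concrete, here is a minimal Python sketch of how such a restructuring might look. This is my own illustration and not Orchestrate’s code: the function names and the keys `fields`, `field_name`, `number_value`, `bool_value` and `string_value` are all assumptions. The point is only that every document ends up with the same fixed shape, so a field called `age` can hold a number in one document and a string in the next without the two disagreeing about its type.

```python
def _leaves(value, path):
    """Yield (dotted_path, scalar) for every non-null scalar under value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from _leaves(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for child in value:
            yield from _leaves(child, path)
    elif value is not None:
        yield path, value


def tuplewise_transform(doc):
    """Restructure an arbitrary dict into a list of name/value pairs.

    The output keys are hypothetical; they are chosen so that each
    value lands in a slot matching its own type.
    """
    pairs = []
    for name, value in _leaves(doc, ""):
        # bool is tested first because bool is a subclass of int in Python
        if isinstance(value, bool):
            pairs.append({"field_name": name, "bool_value": value})
        elif isinstance(value, (int, float)):
            pairs.append({"field_name": name, "number_value": value})
        else:
            pairs.append({"field_name": name, "string_value": str(value)})
    return {"fields": pairs}


# Two documents that disagree about the type of "age"
doc_a = {"age": 3, "name": {"first": "Ada"}}
doc_b = {"age": "three"}
print(tuplewise_transform(doc_a))
print(tuplewise_transform(doc_b))
```

Because every field name now lives as a value inside the same few fields, terms from unrelated source fields share statistics, which is exactly the relevance concern raised in the questions.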

We finished up with the usual Q&A which this time featured some hard questions for the Elasticsearch team to answer – for example why they have rolled their own distributed configuration system rather than using the proven ZooKeeper. I asked what’s going to happen to the easily embeddable Kibana 3 now Kibana 4 has its own web application (the answer being that it will probably not be developed further) and also about the licensing and availability of their upcoming Shield security plugin for Elasticsearch. Interestingly this won’t be something you can buy as a product, rather it will only be available to support customers on the Gold and Platinum support subscriptions. It’s clear that although Elasticsearch the search engine should remain open source, we’re increasingly going to see parts of its ecosystem that aren’t – users should be aware of this, and that the future of the platform will very much depend on the business direction of Elasticsearch the company, who also centrally control the content of the open source releases (in contrast to Solr which is managed by the Apache Foundation).

Elasticsearch meetups will be more frequent next year – thanks Yann Cluchey for organising and to all the speakers and the Elasticsearch team, see you again soon I hope.

More than an API – the real third wave of search technology

I recently read a blog post by Karl Hampson of Realise Okana (who offer HP Autonomy and SRCH2 as closed source search options) on his view of the ‘third wave’ of search. The second wave he identifies (correctly) as open source, admitting somewhat grudgingly that “We’d heard about Lucene for years but no customers seemed to take it seriously until all of a sudden they did”. However, he also suggests that there is a third wave on its way – and this is led by HP with its IDOL OnDemand offering.

I’m afraid to say I think that IDOL OnDemand is in fact neither innovative nor market leading – it’s simply an API to a cloud hosted search engine and some associated services. Amazon CloudSearch (originally backed by Amazon’s own A9 search engine, but more recently based on Apache Solr) offers a very similar thing, as do many other companies including Found.no and Qbox with an Elasticsearch backend. For those with relatively simple search requirements and no issues with hosting their data with a third party, these services can be great value. It is however interesting to see the transition of Autonomy’s offering from a hugely expensive license fee (plus support) model to an on-demand cloud service: the HP acquisition and the subsequent legal troubles have certainly shaken things up! At a recent conference I heard an HP representative even suggest that IDOL OnDemand is ‘free software’ which sounds like a slightly desperate attempt to jump on the open source bandwagon and attract some hacker interest without actually giving anything away.

So if a third wave of search technology does exist, what might it actually be? One might suggest that companies such as Attivio or our partners Lucidworks, with their integrated solutions built on proven and scalable open source cores and folding in Hadoop and other Big Data stacks, are surfing pretty high at present. Others such as Elasticsearch (the company) are offering advanced analytical capabilities and easy scalability. We hear about indexes of billions of items, thousands of separate indexes: the scale of some of these systems is incredible and only economically possible where license fees aren’t a factor. Across our own clients we’re seeing searches across huge collections of complex biological data and monitoring systems handling a million new stories a day. Perhaps the third wave of search hasn’t yet arrived – we’re just seeing the second wave continue to flood in.

One interesting potential third wave is the use of search technology to handle even higher volumes of data (which we’re going to receive from the Internet of Things apparently) – classifying, categorising and tagging streams of machine-generated data. Companies such as Twitter and LinkedIn are already moving towards these new models – Unified Log Processing is a commonly used term. Take a look at a recent experiment in connecting our own Luwak stored query library to Apache Samza, developed at LinkedIn for stream processing applications.
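For anyone unfamiliar with the stored query idea, here is a deliberately tiny Python sketch of the principle: instead of running a query against an index of documents, you keep a set of queries and run each incoming item past them to decide how to tag it. This is a toy under my own assumptions and not Luwak’s API (Luwak is a Java library); the class name, the method names and the ‘all terms must be present’ matching rule are purely illustrative.

```python
import re


class StoredQueries:
    """Toy stored-query matcher: tag a text if it contains all of a query's terms."""

    def __init__(self):
        self._queries = {}  # tag -> set of lower-cased required terms

    def register(self, tag, *terms):
        """Store a query under a tag; every term must appear for a match."""
        self._queries[tag] = {term.lower() for term in terms}

    def match(self, text):
        """Return the sorted tags of all stored queries that match the text."""
        tokens = set(re.findall(r"\w+", text.lower()))
        return sorted(
            tag for tag, terms in self._queries.items() if terms <= tokens
        )


queries = StoredQueries()
queries.register("elasticsearch", "elasticsearch")
queries.register("solr-scaling", "solr", "scale")

# In a streaming setup each incoming message would be passed through match()
for message in ["Scale Solr elastically", "Elasticsearch and Solr", "nothing here"]:
    print(message, "->", queries.match(message))
```

A real system has to do this efficiently for many thousands of stored queries, which is where a dedicated library earns its keep.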

Analysts getting a bad press – how can they do better?

It seems to be a bad summer for analyst companies in several sectors: here’s Forrester getting a kicking from Digital Clarity Group about their Wave report on Digital Experience Delivery Platforms (my first challenge was understanding what on earth those are, but I think it’s a new shiny name for web content management), Nuix putting the boot into Gartner about their eDiscovery Magic Quadrant, and Stephen Few jumping up and down in hobnail boots on both analyst firms about Business Intelligence (insert your own joke here), complete with a not particularly enlightening reply from Forrester themselves.

Miles Kehoe has already taken a look at Gartner’s Magic Quadrant report on our own Enterprise Search sector. I’ve written before on how I don’t think open source solutions are particularly well treated by the large analyst firms, as they often focus on vendors only. The world has somewhat changed though and five of the seventeen vendors mentioned are using a base of open source technology, so at least some of this major part of the market is covered.

However the problem remains that the MQ ignores a great deal of the enterprise search sector: it doesn’t cover Sharepoint with its FAST-derived search facility, Oracle’s Endeca (which apparently is now no longer available as a standalone product, a surprise to me), Funnelback (which is again incorrectly labelled as open source – it’s the Squiz CMS software that’s open source, not the search engine they bought) or the rising star of Elasticsearch. If you were new to the sector you might conclude that none of these options are available to you. Gartner itself says “This Magic Quadrant introduces search managers and information architects in end-user organizations to the range of enterprise search vendors they can choose from” – but this range is severely and artificially restricted.

Let’s hope that the analyst firms take note of some of this bad press – perhaps it’s time to change approach, be more open about biases and methodologies, and stop producing hugely oversimplified diagrams to characterise complex and deep business sectors.

Posted in Business

July 30th, 2014

How not to predict the future of search

I’ve just seen an article titled Enterprise Search: 14 Industry Experts Predict the Future of Search which presents a list of somewhat contradictory opinions. I’m afraid I have some serious issues with the experts chosen and the undeniably blinkered views some of them have presented.

Firstly, if you’re going to ask a set of experts to write about Enterprise Search, don’t choose an expert in SEO as part of your list. SEO is not Enterprise Search, in fact a lot of the time it isn’t anything at all (except snake oil) – it’s a way of attempting to game the algorithms of web search engines. Secondly, at least make some attempt to prevent your experts from just listing the capabilities of their own companies in their answers: in fact one ‘expert’ was actually a set of PR-friendly answers from a company rather than a person, including listing articles about their own software. The expert from Microsoft rather predictably failed to notice the impact of open source on the search market, before going on to put a positive spin on the raft of acquisitions of search companies over the last few years (and it’s certainly not all good, as a recent writedown has proved). Apparently the acquisition of specialist search companies by corporate behemoths will drive innovation – that is, unless that specialist knowledge vanishes into the behemoth’s Big Data strategy, never to be seen again. Woe betide the past customers that have to get used to a brand new pricing, availability and support plan as well.

Luckily it wasn’t all bad – there were some sensible viewpoints on the need for better interaction with the user, the rise of semantic analysis and how the rise of open source is driving out inefficiency in the market – but the article is absolutely peppered with buzzwords (Big Data being the most prevalent, of course) and contains some odd cliches: “I think a generation of people believes the computer should respond like HAL 9000”… didn’t HAL 9000 kill most of the crew and attempt to lock the survivor outside the airlock?

I’m pretty sure this isn’t a feature we want to replicate in an Enterprise Search system.

Posted in News

May 15th, 2014

The closed-source topping on the open-source Elasticsearch

Today Elasticsearch (the company, not the software) announced their first commercial, closed-source product, a monitoring plugin for Elasticsearch (the software, not the company – yes I know this is confusing, one might suspect deliberately so). Amongst the raft of press releases there are a few small liberties with the truth, for example describing Elasticsearch (the company) as ‘founded in 2012 by the people behind the Elasticsearch and Apache Lucene open source projects’ – surely the latter project was started by Doug Cutting, who isn’t part of the aforementioned company.

Adding some closed-source dusting to a popular open-source distribution is nothing new of course – many companies do it, especially those that are venture funded – it’s a way of building intellectual property while also taking full advantage of the open-source model in terms of user adoption. Other strategies include curated distributions such as that offered by Heliosearch, founded by Solr creator Yonik Seeley, and our partner LucidWorks’ complete packaged search applications. It can help lock potential clients into your version of the software and your vision of the future, although of course they are still free to download the core and go it alone (or engage people like us to help do so), which helps them retain some control.

It’s going to be interesting to see how this strategy develops for Elasticsearch (for the last time, the company). At Flax we’ve also built various additional software components for search applications – but as we have no external investors to please, these are freely available as open-source software, including Luwak, our fast stored query engine, Clade, a taxonomy/classification prototype, and even some file format extractors.

Time for the crystal ball again…

It’s always fun to make predictions about the future, especially as one can be pretty sure to be proved wrong in interesting ways. At the start of 2014 we at Flax are looking forward to another year of building open source search and we already have some great client projects in progress that we’ll shortly be able to talk about, but what else might be happening this year? Here are some points to note:

  • The Elasticsearch project continues to add features at a prodigious rate during the arms race between it and Apache Solr – this battle can only be good news for end users in our view. We can expect a 1.0 release of Elasticsearch this year and several further major 4.x releases of Solr.
  • The Solr world has become slightly more complex as original author Yonik Seeley has left Lucidworks to start his own company, Heliosearch – with its own packaged distribution of Solr. How will Heliosearch contribute to the Solr ecosystem?
  • HP Autonomy is a sponsor of the Enterprise Search Europe conference this year, although there’s still some fallout from HP’s acquisition of Autonomy, and little news from the various official investigations into this process. Perhaps this year HP’s overall strategy will become a little clearer.
  • The Big Data bandwagon rolls on and more or less every search company now stresses its capabilities in this area for marketing purposes: but how big is Big? It’s not enough just to re-quote IDC’s latest study on how many exabytes everyone is producing these days; the value is in the detail, not the sheer volume: good (and deep) analytics is the key.
  • We think there might be some interesting things happening around open source search and bioinformatics soon – watch this space!

Posted in News

January 7th, 2014

Solr and the changing landscape of search

This morning I was told about the launch of a new US-based search company, Heliosearch, founded by the creator of Apache Solr, Yonik Seeley. It seems the landscape of open source search and in particular Solr is changing again – Heliosearch are planning their own ‘certified’ distribution of Solr plus a raft of support, consulting and services. In the meantime, the company Yonik co-founded (and our partners) LucidWorks have recently launched an ‘App Store’ for search, the Solr Marketplace, offering add-ons to the core engine from both themselves and others.

What we’re seeing here is the further growth of an ecosystem based around what has almost become the default choice for new and migrating search applications. Some clients will want a packaged distribution of Solr, some will be happy to download the source from Apache, some will need help getting started and some will just need help when things get complicated, or support for a running application. We’ve seen all of these requirements and more in the last year.

Next week the largest conference on open source search, Lucene Revolution, is being held in Dublin, and four of the Flax team are attending. Do let us know if you’d like to meet up – I don’t think there’s going to be a lack of things to talk about!

Posted in News, events

October 29th, 2013

Rescue attempts continue for those abandoned by closed source search

I notice this morning that Autonomy have created a rescue program for those unhappy with Microsoft’s decision to offer FAST search only as part of Sharepoint – slightly late to the party, considering this had been long predicted. Last year it was Autonomy’s rivals who offered similar trade-in deals after the bad press from HP’s acquisition of Autonomy. I now have the theme tune to Thunderbirds running through my head…

We’ve talked to a number of clients over the last month or so who are determined to move away from Autonomy IDOL itself, citing reasons such as a lack of ownership of code (so even tiny changes to a user interface need to be carried out by expensive consultants), scaling being difficult and expensive, and indifferent support even after the HP acquisition. As I wrote at the time, moving from one closed-source technology to another doesn’t really reduce the risk that your supplier will change their roadmap, prices or corporate strategy to your disadvantage.

Perhaps it’s time to cut the strings and take control of your search.

Business Leaders, Open Source and free Pi

I spent last night at a networking event organised by the Business Leaders Network on the subject of Open Source Business Models – this isn’t the usual sort of event I attend, being held in a very posh law firm’s offices overlooking the Thames and with some fellow attendees from venture capital firms and investment banks. Although the panel included speakers from Canonical, Rackspace and the Raspberry Pi foundation (the gently amusing Jack Lang, a Cambridge luminary who I could have happily listened to for the full hour) the theme was generally non-technical.

Questions from the floor (and via Twitter) showed that many outside the technical sector (and probably a few within it) are still bemused at how one can build a thriving business on open source, when the panel admitted that it can involve making your intellectual property available to your competitors, giving your product away for nothing and investing heavily in community building. One of the most interesting responses from the panel indicated that an open source entrant to an existing market can shrink that market by 40-50% – a venture capitalist I spoke to afterwards couldn’t understand why this can be a positive thing. However, if a market is dominated by big players selling overpriced solutions, some disruptive deflation can re-shape the market considerably. This is certainly what we’ve seen in the search sector recently, and investment in the right place and time can still reap considerable rewards (consider Elasticsearch’s recent funding).

The panel also made the point that a key part of open source success is investment in people – both within a business and in the wider community. Another question about what an open source business is actually selling prompted a range of answers: a brand, peace of mind, happiness, experience and a platform. It was clear that the discussion could have continued for a lot longer as the audience were keen to hear more, and the BLN may thus be running further open source themed events – the appetite for knowledge about open source business models outside the technical community is large.

Thanks to Mark Littlewood for organising such an interesting evening and particular thanks for the free Raspberry Pi – we have a cunning plan about what to do with it so watch this space!

Posted in Business, events

February 7th, 2013

New Year predictions: further search storms ahead!

2012 has been a fascinating and stormy year for those of us in the search business. We’ve seen a raft of further acquisitions of commercial closed source search companies by bigger players, some convinced that what used to be called Enterprise Search is now a solution to Big Data (like Stephen Arnold we wonder what will succeed Big Data as the next marketing term – I love his phrase “In a quest for revenue, the vendors will wrap basic ideas in a cloud of unknowing”). One acquisition hasn’t gone so smoothly: Autonomy, bought by HP for a price that no-one in the search business thought was remotely sensible, has been accused of being oversold vapourware: this is a story that will continue to develop in 2013. If you want a great overview of the current market read Martin White’s latest research note.

Here in the slightly calmer waters of open source search, we’ve seen a huge rise in enquiries from often blue-chip companies, no longer needing persuasion that open source is a serious contender for even the largest search and content projects. Often these companies have considered large commercial solutions but are put off by both the price and high-pressure marketing tactics – in a world of reduced budgets you simply can’t sell magic beans for a pile of gold. We’ve also seen increased interest in related technologies such as machine learning and automatic categorisation – search really isn’t just about search any more.

At Flax we’re busier than we have ever been and we’re expecting the trend to continue. We’re looking forward to running more Cambridge Search Meetups, visiting and helping organise conferences such as Enterprise Search Europe and Lucene Revolution, building our network of carefully chosen partners and of course working on exciting and cutting-edge development projects.

As the storms in our sector continue to rage overhead we’ll simply be getting on with what we do best, building effective search.

Posted in Business, News

January 3rd, 2013
