Posts Tagged ‘autonomy’
2012 has been a fascinating and stormy year for those of us in the search business. We’ve seen a raft of further acquisitions of commercial closed source search companies by bigger players, some convinced that what used to be called Enterprise Search is now a solution to Big Data (like Stephen Arnold we wonder what will succeed Big Data as the next marketing term – I love his phrase “In a quest for revenue, the vendors will wrap basic ideas in a cloud of unknowing”). One acquisition hasn’t gone so smoothly: Autonomy, bought by HP for a price that no-one in the search business thought was remotely sensible, has been accused of being oversold vapourware: this is a story that will continue to develop in 2013. If you want a great overview of the current market read Martin White’s latest research note.
Here in the slightly calmer waters of open source search, we’ve seen a huge rise in enquiries from often blue-chip companies, no longer needing persuasion that open source is a serious contender for even the largest search and content projects. Often these companies have considered large commercial solutions but are put off by both the price and high-pressure marketing tactics – in a world of reduced budgets you simply can’t sell magic beans for a pile of gold. We’ve also seen increased interest in related technologies such as machine learning and automatic categorisation – search really isn’t just about search any more.
At Flax we’re busier than we have ever been and we’re expecting the trend to continue. We’re looking forward to running more Cambridge Search Meetups, visiting and helping organise conferences such as Enterprise Search Europe and Lucene Revolution, building our network of carefully chosen partners and of course working on exciting and cutting-edge development projects.
As the storms in our sector continue to rage overhead we’ll simply be getting on with what we do best, building effective search.
It hasn’t taken long for some of Autonomy’s rivals to attempt to capitalise on the recent bad PR around HP’s acquisition – OpenText has offered a ‘software trade-in’, Recommind has offered a ‘trade-up’ and Swiss company RSD has offered a free license for their governance software to Autonomy customers. No word yet from Exalead, Oracle (Endeca), Microsoft (FAST) or any of the other big commercial search companies but I’m sure their salespeople are making the most of the situation.
Migrating a search engine from one technology to another is rarely trouble-free: data must be re-indexed, query architectures rewritten, integration with external systems re-done, relevancy checked… However, with sufficient forethought it can be done successfully. We’ve just helped one client migrate from a commercial engine to Apache Solr in a matter of weeks. Although at first glance Solr didn’t seem to support all of the features the commercial engine provided, it proved possible to simulate them using multiple queries and, with careful design for scalability, query performance is comparable.
Choosing one closed source engine to replace another doesn’t remove the risk that future corporate mergers & acquisitions will cause exactly the same lack of confidence that is no doubt affecting Autonomy customers – or huge increases in license fees, a drop in the quality of available support or the end of the product line altogether – and we’ve heard of all of these effects over the last few years. Moving to an open source search engine gives you freedom and control of the future of the technology your business is reliant upon, with a wealth of options for migration assistance, development and support.
So here’s our offer – we’d be happy to talk, for free (by phone or face-to-face for customers within reach of our Cambridge offices), to any Autonomy customers considering migration and to help them consider the open source options (some of these even have the Bayesian, probabilistic search features Autonomy IDOL provides) – and together with our partners we can also provide a level of ongoing support comparable to any closed source vendor. We don’t have salespeople, we don’t have a product to sell you and you’ll be talking directly to experts with decades of experience implementing search – and there’s no obligation to take things any further. We’d simply like to offer an alternative (and we believe, safer) route to effective search.
Last Thursday I spent the day at the British Computer Society’s Search Solutions event, run by their Information Retrieval Specialist Group. Unlike some events I could mention, this isn’t a forum for sales pitches, over-inflated claims or business speak – just some great presentations on all aspects of search and some lively networking or discussion. It’s one of my favourite events of the year.
Milad Shokouhi of Microsoft Research started us off showing us how he’s worked on query trend analysis for Bing: he showed us how some queries are regular, some spike and go and some spike and remain – and how these trends can be modelled in various ways. Alex Jaimes of Yahoo! Barcelona talked about a human centred approach to search – I agree with his assertion that “we’re great at adapting to bad technology” – still sadly true for many search interfaces! Some of the demographic approaches have led to projects such as Yahoo! Clues which is worth a look.
Martin White of Intranet Focus was up next with some analysis of recent surveys and research, leading to some rather doom-laden conclusions about just how few companies are investing sufficiently in search. Again some great quotes: “Information Architects think they’ve failed if users still need a search engine” and a plea for search vendors (and open source exponents) to come clean about what search can and can’t do. Emma Bayne of the National Archives was next with a description of their new Discovery catalogue, a similar presentation to the one she gave earlier in the year at Enterprise Search Europe. Kristian Norling of Findwise finished with a laconic and amusing treatment of the results from Findwise’s survey on enterprise search – indicating that those who produce systems that users are “very satisfied” with usually do the same things, such as regular user testing and employing a specialist internal search team.
Stella Dextre Clarke talked next about a new ISO standard for thesauri, taxonomies and their interoperability with other vocabularies – some great points on the need for thesauri to break down language barriers, help retrieval in enterprise situations where techniques such as PageRank aren’t so useful and to access data from decades past. Leo Sauermann was next with what was my personal favourite presentation of the day, about a project to develop a truly semantic search engine both for KDE Linux and currently the Cloud. This system, if more widely adopted, promises a true revolution in search, as relationships between data objects are stored directly by the underlying operating system. I spoke next about our Clade taxonomy/classification system and our Flax Media Monitor, which I hope was interesting.
Nicholas Kemp of DSTL was up next exploring how they research new technologies and approaches which might be of interest to the defence sector, followed by Richard Morgan of Funnelback on how to empower intranet searchers with ways to improve relevance. He showed how Funnelback’s own intranet allows users to adjust multiple factors that affect relevance – of course it’s debatable how these may be best applied to customer situations.
The day ended with a ‘fishbowl’ discussion during which a major topic was of course the Autonomy/HP debacle – there seemed to be a collective sense of relief that perhaps now marketing and hype wouldn’t dominate the search market as much as it had previously…but perhaps also that’s just my wishful thinking! All in all this was as ever an interesting and fun day and my thanks to the IRSG organisers for inviting me to speak. Most of the presentations should be available online soon.
I’m not going to comment on the various financial aspects of the recent news about HP’s write-down of the value of its Autonomy acquisition – others are able to do this far better than me – but I would urge anyone interested to re-read the documents Oracle released earlier this year. However, I am going to write about the IDOL technology itself (I’d also recommend Tony Byrne’s excellent post).
Autonomy’s ability to market its technology has never been in doubt: aggressive and fearless, it painted IDOL as unique and magical, able to understand the meaning of data in multiple forms. However, this has never been true; computers simply don’t understand ‘meaning’ like we do. IDOL’s foundation was just a search engine using Bayesian probabilistic ranking; although most other search technologies use the vector space model there are a few other examples of this approach: Muscat, a company founded a few years before and literally across the hall from Autonomy in a Cambridge incubator, grew to a £30m business with customers including Fujitsu and the Daily Telegraph newspaper. Sadly Muscat was a casualty of the dot-com years but it is where the founders of Flax first met and worked together on a project to build a half-billion-page web search engine.
Another even less well-known example is OmniQ, eventually acquired and subsequently shelved by Sybase. Digging in the archives reveals some familiar-sounding phrases such as “automatically capture and retrieve information based on concepts”.
Originally developed at Muscat, the open source library Xapian also uses Bayesian ranking and we’ve used this successfully to build systems for the Financial Times, Newspaper Licensing Agency and Tait Electronics. Recently, Apache Lucene/Solr version 4.0 has introduced the idea of ‘pluggable’ ranking models, with one option being the Bayesian BM25. It’s important to remember though that Bayesian ranking is only one way to approach a search problem and in many cases, simply unnecessary.
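To make the point concrete, here is a minimal sketch of BM25 scoring in Python. It follows the commonly published textbook formula and is not the code used by Lucene, Xapian or IDOL; the function name `bm25_scores`, the toy documents and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against `query_terms`.

    Illustrative textbook BM25 only; k1 and b are assumed defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this document
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent: contributes nothing
            # Rarer terms get a higher weight; the "1 +" keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalises for document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["open", "source", "search"],
    ["closed", "source", "engine"],
    ["magic", "beans"],
]
print(bm25_scores(["open", "search"], docs))
```

Only the first toy document contains the query terms, so it is the only one with a non-zero score. The whole model is term counts, document lengths and a logarithm.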
It certainly isn’t magic.
This morning the largest open source search project, Apache Lucene/Solr, released a new version with a raft of new features. We’ve been advising clients to consider version 4.0 for several months now, as the alpha and beta versions have become available, and we know of several already running this version on live sites. Here are a few highlights:
- Solr Cloud – a collection of new features for scalability and high availability (either on your own servers or on the Cloud), integrating Apache Zookeeper for distributed configuration management.
- More NoSQL features in case you’re planning to use Solr as a primary data store, including a transaction log
- A new web administration interface (including Solr Cloud features)
- New spatial search features including polygon support
- General performance improvements across the board (for example, fuzzy queries are 100-200 times faster!)
- Lucene now has pluggable codecs for storing index data on disk – a potentially powerful technique for performance optimisation; we’ve already been experimenting with storing updatable fields in a NoSQL database
- Lucene now has pluggable ranking models, so you can for example use BM25 Bayesian ranking, previously only available in search engines such as HP Autonomy and the open source Xapian.
The new release has been several years in the making and is a considerable improvement on the previous 3.x version – related projects such as elasticsearch will also benefit. There’s also a new book, Solr in Action, just out to coincide with this release. Exciting times ahead!
I visited Enterprise Search Europe for the first day only last week, and caught a number of the presentations as well as giving one of my own (which I won’t discuss here but you’ll hear more about over the next few weeks). First up was Paul Doscher of Lucid Imagination with a lively presentation discussing whether search is either dead or now a commodity, or whether search on Hadoop is the new killer app for the emerging world of Big Data. We then had Kristian Norling from Findwise with some initial results from their survey on enterprise search – some interesting numbers here such as ‘18.5% of users are mostly/very satisfied with search’ and only ‘6% have a search strategy although 46% are planning one’ – we hear that Kristian is hoping to make the survey an annual one, which will be a great resource for anyone in the industry.
Matt Mullen, fuelled by diet cola, gave an introduction to search with a key point – that enterprise search usually performs a role within a workflow or task – a fact often ignored. Runar Buvik of Searchdaimon talked about a great resource he has developed comparing search engines, which can give some often amusing contrasts between different technologies, with some insisting there are no results for a particular query while others find thousands. I also enjoyed Emma Bayne and Donald Phillips’ polished presentation on the search facilities at the National Archives – interestingly, although Autonomy is currently powering their search, they are considering open source alternatives.
The day concluded with a presentation from Matt Eichner of Google, who turned up with their own film crew. You can read much of what he said at Computer World. I’m afraid I didn’t enjoy this presentation very much – it talked down to the audience and contained a lot of FUD around open source (surprising when Google uses and supports so much of it) – complete with sympathy-garnering pictures of babies in incubators and silly analogies about how one should prefer to fly in the airplane that cost the most. I hadn’t realised until his talk that the Google Search Appliance appears to be made of cheese!
It was great to network and catch up, and I hope next year to be able to attend the whole event. Thanks to all the organisers especially Martin White of Intranet Focus.
I spent yesterday morning at Ovum’s briefing on Enterprise Search, and they kindly invited me to sit on a discussion panel. One of the more controversial topics raised by analyst Mike Davis was ‘Is Enterprise Search dead?’ which provoked some lively discussion. We also heard from Tyler Tate of Twigkit on Search UX, Exalead on Search Based Applications and Search Technologies on data conditioning and why metadata is so important.
One can’t deny that the search market is going through some huge changes at the moment. Larger vendors are being acquired which can lead to some major (and not always welcome) changes in the product, pricing and service. Smaller vendors are finding it increasingly hard to compete with the plethora of powerful open source solutions (we’ve heard rumours of prices of closed source solutions being dropped radically to attempt to secure new business). There are also some interesting moves towards more comprehensive Business Intelligence and Unified Access solutions, such as Attivio.
I don’t think enterprise search is dying as a market or an offering, simply changing – and hopefully for the better, into an era of more realistic pricing, solutions that actually work (rather than promising ‘magic’) and more openness in terms of the technology and capability.
The blogotweetosphere has been positively buzzing since last night’s announcement that Hewlett Packard will be buying Autonomy for £7.1bn, while divesting itself of its PC business. Many commentators have put a positive spin on this, pointing to Autonomy’s meteoric rise from a small office in Cambridge to the behemoth it is today. It’s undoubtedly good news for Autonomy’s shareholders. Dave Kellogg correctly identifies Autonomy as a “finance company dressed in (meaning-based) technology company clothing” with a “happy ending”.
However the reaction isn’t all positive – the FT implies this deal is at the “lunatic end of the valuation spectrum”. Law Technology News says “Autonomy’s e-discovery revenue stream is high-end but unsustainable” and quotes users of the system with problems: “We had a lot of issues with the applications crashing, the documents tending not to get checked in” … “[Autonomy sales staff] were pricey, arrogant, and they couldn’t care less about us. … It cannot get any worse.”
HP will have to work hard to integrate Autonomy into both its corporate culture and software frameworks – a problem currently faced by Microsoft since its acquisition of FAST a short while ago. Stephen Arnold thinks this process will be “risky”. What it means for the rest of the search sector is harder to guess, although Martin White of Intranet Focus says this deal indicates HP can see a “future in search applications” and, interestingly, “A number of privately-held search vendors are probably working out what their valuation would be”.
My view is that this is just the latest of huge shifts in the enterprise search market, partly spurred on by the rise of open source options and the gradual realisation that the huge license fees charged by some vendors may be unsustainable. I don’t think Autonomy will be the last company looking for a safe haven in the years to come.
This week I was passed a link to a European Commission report on the Enterprise Search market, which I’ve just finished ploughing through (it’s 123 pages and not exactly light reading). It provides an overview of the history of the market and some current trends, but sadly misses out almost completely the rapidly growing open source sector. The authors say “…open source solutions have been disregarded because they do not seem yet to be a real alternative for company use…” – a point of view both I and our satisfied clients would disagree with. The report does at least acknowledge that “open source components are frequently used and integrated in some commercial solutions”.
However there are some very interesting numbers in the latter part of the report. For example, we hear that an Exalead customer, the automotive logistics specialist Gefco, paid 700,000 Euros for the solution built for them to track around 100,000 events a day regarding 1 million vehicles. Appendix 2 has a list of various search vendors and associated costs: for example “The average selling price for the [Autonomy] IDOL tool is $375,000” and “The price for the Oracle Secure Enterprise Search is $34,500 per processor and $70 per referenced user (with a minimum of 100 users).”
I would question whether these prices are sustainable given that alternative solutions based on proven, scalable open source software are now available at a fraction of the cost. Perhaps the authors of the report should have considered more deeply how this might impact the enterprise search market.
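To get a feel for the Oracle figures quoted from the report, here is a rough worked example in Python. It assumes the per-processor and per-user charges simply add together and that the 100-user minimum means at least 100 users are always billed; the function name and the deployment sizes are hypothetical and not taken from the report.

```python
def oracle_ses_list_price(processors, users,
                          per_processor=34_500, per_user=70, min_users=100):
    """Estimate a list price in US dollars from the figures quoted in the report.

    Assumption: processor and user charges are additive, and the user
    minimum is applied as a floor on the number of billable users.
    """
    billable_users = max(users, min_users)
    return processors * per_processor + billable_users * per_user


# Hypothetical small deployment: 1 processor, 10 users (billed as 100).
print(oracle_ses_list_price(1, 10))    # 41500
# Hypothetical larger deployment: 4 processors, 500 users.
print(oracle_ses_list_price(4, 500))   # 173000
```

Under these assumptions even the smallest possible installation costs $41,500 in licenses before any hardware, integration or support work is counted.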
Cambridge, U.K. has a long history of hosting search experts and businesses. Back in the 1980s two firms arose – Cambridge CD Publishing, founded by Martin Porter and John Snyder, grew into Muscat, and Cambridge Neurodynamics became Autonomy. We believe Smartlogic still have a small office here. Stephen Robertson, co-author of the probabilistic theory of information retrieval (which Xapian uses for ranking), is based here at Microsoft Research.
Today, the city is still home to innovative search companies, including True Knowledge, Grapeshot and of course ourselves. We know of many more companies ‘under the radar’ developing search technologies, both to complement existing systems and as completely new approaches to information retrieval, including visual search.
To encourage networking and to help keep the city at the forefront of search developments, we’ve created the Enterprise Search Cambridge Meetup group and our first meeting is on February 16th – all are welcome, whether currently working with search and related technologies or simply interested in the possibilities. Hope to meet you there!