Archive for the ‘News’ Category

BioSolr – building better search for bioinformatics

The entire Flax technical team spent the day at the European Bioinformatics Institute yesterday discussing an exciting new project we’ll begin this coming September, BioSolr. Funded by the BBSRC this collaboration between Flax and the EBI aims “to significantly advance the state of the art with regard to indexing and querying biomedical data with freely available open source software”. Here we are with Dr. Sameer Valenkar and Gautier Koscielny of the EBI.

The EBI, located on the Wellcome Trust Genome Campus near Cambridge, maintains the world’s most comprehensive range of freely available and up-to-date molecular databases and is already using Apache Lucene/Solr extensively, for example in the Protein Databank in Europe which indexes over 100,000 items derived from experimental research – but this is just one of the many complex collections they provide. The BioSolr project will run for a full year, during which members of the Flax team will work directly with the EBI team to run workshops, demonstrate and document best practises in search application design, create, improve and extend open source software and learn a lot about the specialist search requirements of bioinformatics. This is a fantastic opportunity for us to push the boundaries of what is possible with Solr and associated software, to work with some incredibly rich data and to do all of this in the open to encourage collaboration from the wider software and biology communities.

We’ll be creating various open resources (software repositories, Wikis, blogs) to support the project later this year – do let us know if you would like to be involved and we will keep you informed.

How not to predict the future of search

I’ve just seen an article titled Enterprise Search: 14 Industry Experts Predict the Future of Search which presents a list of somewhat contradictory opinions. I’m afraid I have some serious issues with the experts chosen and the undeniably blinkered views some of them have presented.

Firstly, if you’re going to ask a set of experts to write about Enterprise Search, don’t choose an expert in SEO as part of your list. SEO is not Enterprise Search, in fact a lot of the time it isn’t anything at all (except snake oil) – it’s a way of attempting to game the algorithms of web search engines. Secondly, at least make some attempt to prevent your experts from just listing the capabilities of their own companies in their answers: in fact one ‘expert’ was actually a set of PR-friendly answers from a company rather than a person, including listing articles about their own software. The expert from Microsoft rather predictably failed to notice the impact of open source on the search market, before going on to put a positive spin on the raft of acquisitions of search companies over the last few years (and it’s certainly not all good, as a recent writedown has proved). Apparently the acquisition of specialist search companies by corporate behemoths will drive innovation – that is, unless that specialist knowledge vanishes into the behemoth’s Big Data strategy, never to be seen again. Woe betide the past customers that have to get used to a brand new pricing, availability and support plan as well.

Luckily it wasn’t all bad – there were some sensible viewpoints on the need for better interaction with the user, the rise of semantic analysis and how the rise of open source is driving out inefficiency in the market – but the article is absolutely peppered with buzzwords (Big Data being the most prevalent, of course) and contains some odd cliches: “I think a generation of people believes the computer should respond like HAL 9000″…didn’t HAL 9000 kill most of the crew and attempt to lock the survivor outside the airlock?

I’m pretty sure this isn’t a feature we want to replicate in an Enterprise Search system.

Tags: , , , ,

Posted in News

May 15th, 2014

1 Comment »

As Hadoop gains, does Lucene benefit?

The last few weeks have seen a rush of investment in companies that offer Hadoop-powered Big Data platforms – the most recent being Intel’s investment in Cloudera, but Hortonworks has also snorted up $100m.

Gartner correctly explains that Hadoop isn’t just one project, but an ecosystem comprising an increasing number of open source projects (and some closed source distributions and add-ons). Once you’ve got your Big Data in a HDFS-shaped pile, there are many ways to make sense of it – and one of those is a search engine, so there’s been a lot of work recently trying to add Lucene-powered search engines such as Apache Solr and Elasticsearch into the mix. There’s also been some interesting partnerships.

I’m thus wondering whether this could signal a significant boost to the development of these search projects: there are already Lucene/Solr committers working at Hadoop-flavoured companies who have been working on distributed search and other improvements to scalability. Let’s hope some of the investment cash goes to search!

The closed-source topping on the open-source Elasticsearch

Today Elasticsearch (the company, not the software) announced their first commercial, closed-source product, a monitoring plugin for Elasticsearch (the software, not the company – yes I know this is confusing, one might suspect deliberately so). Amongst the raft of press releases there are a few small liberties with the truth, for example describing Elasticsearch (the company) as ‘founded in 2012 by the people behind the Elasticsearch and Apache Lucene open source projects’ – surely the latter project was started by Doug Cutting, who isn’t part of the aforementioned company.

Adding some closed-source dusting to a popular open-source distribution is nothing new of course – many companies do it, especially those that are venture funded – it’s a way of building intellectual property while also taking full advantage of the open-source model in terms of user adoption. Other strategies include curated distributions such as that offered by Heliosearch, founded by Solr creator Yonik Seeley and our partner LucidWorks‘ complete packaged search applications. It can help lock potential clients into your version of the software and your vision of the future, although of course they are still free to download the core and go it alone (or engage people like us to help do so), which helps them retain some control.

It’s going to be interesting to see how this strategy develops for Elasticsearch (for the last time, the company). At Flax we’ve also built various additional software components for search applications – but as we have no external investors to please these are freely available as open-source software, including Luwak our fast stored query engine, Clade a taxonomy/classification prototype and even some file format extractors.

Time for the crystal ball again…

It’s always fun to make predictions about the future, especially as one can be pretty sure to be proved wrong in interesting ways. At the start of 2014 we at Flax are looking forward to another year of building open source search and we already have some great client projects in progress that we’ll shortly be able to talk about, but what else might be happening this year? Here’s some points to note:

  • The Elasticsearch project continues to add features at a prodigious rate during the arms race between it and Apache Solr – this battle can only be good news for end users in our view. We can expect a 1.0 release of Elasticsearch this year and several further major 4.x releases of Solr.
  • The Solr world has become slightly more complex as original author Yonik Seeley has left Lucidworks to start his own company, Heliosearch – with its own packaged distribution of Solr. How will Heliosearch contribute to the Solr ecosystem?
  • HP Autonomy is a sponsor of the Enterprise Search Europe conference this year, although there’s still some fallout from HP’s acquisition of Autonomy, and little news from the various official investigations into this process. Perhaps this year HP’s overall strategy will become a little clearer.
  • The Big Data bandwagon rolls on and more or less every search company now stresses its capabilities in this area for marketing purposes: but how big is Big? It’s not enough just to re-quote IDC’s latest study on how many exobytes everyone is producing these days, the value is in the detail, not the sheer volume: good (and deep) analytics is the key.
  • We think there might be some interesting things happening around open source search and bioinformatics soon – watch this space!

Tags: , , , , , ,

Posted in News

January 7th, 2014

No Comments »

Solr and the changing landscape of search

This morning I was told about the launch of a new US-based search company, Heliosearch, founded by the creator of Apache Solr, Yonik Seeley. It seems the landscape of open source search and in particular Solr is changing again – Heliosearch are planning their own ‘certified’ distribution of Solr plus a raft of support, consulting and services. In the meantime, the company Yonik co-founded (and our partners) LucidWorks are recently launched an ‘App Store’ for search, the Solr Marketplace, offering add-ons to the core engine from both themselves and others.

What we’re seeing here is the further growth of an ecosystem based around what has almost become the default choice for new and migrating search applications. Some clients will want a packaged distribution of Solr, some will be happy to download the source from Apache, some will need help getting started and some will just need help when things get complicated, or support for a running application. We’ve seen all of these requirements and more in the last year.

Next week the largest conference on open source search, Lucene Revolution is held in Dublin, and four of the Flax team are attending. Do let us know if you’d like to meet up – I don’t think there’s going to be a lack of things to talk about!

Tags: , , , ,

Posted in News, events

October 29th, 2013

No Comments »

Three reasons why your search may be prehistoric

ArnoldIT wondered today why we were bothering to announce an upgrade to the venerable dtSearch engine, when they “weren’t aware of too many people still using that software”. Perhaps it’s time for a quick reality check here – we regularly see clients with search engines that many would consider prehistoric still in active use. Here’s some reasons why that might be so:

  • Search isn’t seen as essential. If your accounting software goes down, nobody gets paid: but if the search engine has gradually degraded in accuracy, doesn’t always contain the most recent documents and is generally too hard to use then most of your users will try and find a way around it – they’ll Google for content on the corporate website, dig slowly through the filestores or call up a colleague to ask. Of course, all of this will take time and there’s the risk they won’t find anything useful (or worse, find something inaccurate or out-of-date), but time is only money, surely?
  • The magic has gone. The sharp suited salesman who told you all the magical things your search engine could do – it could understand concepts, human language and the meaning of life – is a distant memory. Somehow those magical features were never implemented, perhaps the unexpected extra cost put you off (surely the magic came as standard? No?). You’ve also probably turned off a lot of the clever features of your engine as either no-one could understand how to use them, or they affected performance so much that search results took minutes to appear.
  • Upgrading search is hard and expensive. Small changes to the existing engine can cost huge consultancy fees but if you change supplier, you’ll have a whole new team of salesmen to meet, lots more buzzwords to learn, there’s expensive new license fees to pay, you’ll also have to overhaul your content management system, your metadata, your front ends…better to leave everything alone, surely?

There are search engines out there, chugging away quietly behind a corporate firewall, whose antiquity would astonish. Any chance of a support contract has long gone as the supplier would prefer it if you upgraded to their latest-and-greatest version – that’s if the supplier still exists at all. However there is always a way to upgrade that reduces the risk and cost – an incremental, agile and open-source based approach will prevent future lock-in to a single supplier and give you more control of the code your search engine depends on. Recently we’ve used this approach to help clients successfully upgrade search applications based on dtSearch, FAST ESP and Oracle and in the near future we’ll be doing the same for clients with several other well-known engines – and a few lost in the mists of time!

Tags: , , , , ,

Posted in News

August 5th, 2013

No Comments »

New Year predictions: further search storms ahead!

2012 has been a fascinating and stormy year for those of us in the search business. We’ve seen a raft of further acquisitions of commercial closed source search companies by bigger players, some convinced that what used to be called Enterprise Search is now a solution to Big Data (like Stephen Arnold we wonder what will succeed Big Data as the next marketing term – I love his phrase “In a quest for revenue, the vendors will wrap basic ideas in a cloud of unknowing”). One acquisition hasn’t gone so smoothly: Autonomy, bought by HP for a price that no-one in the search business thought was remotely sensible, has been accused of being oversold vapourware: this is a story that will continue to develop in 2013. If you want a great overview of the current market read Martin White’s latest research note.

Here in the slightly calmer waters of open source search, we’ve seen a huge rise in enquiries from often blue-chip companies, no longer needing persuasion that open source is a serious contender for even the largest search and content projects. Often these companies have considered large commercial solutions but are put off by both the price and high-pressure marketing tactics – in a world of reduced budgets you simply can’t sell magic beans for a pile of gold. We’ve also seen increased interest in related technologies such as machine learning and automatic categorisation – search really isn’t just about search any more.

At Flax we’re busier than we have ever been and we’re expected the trend to continue. We’re looking forward to running more Cambridge Search Meetups, visiting and helping organise conferences such as Enterprise Search Europe and Lucene Revolution, building our network of carefully chosen partners and of course working on exciting and cutting-edge development projects.

As the storms in our sector continue to rage overhead we’ll simply be getting on with what we do best, building effective search.

Tags: , , , , ,

Posted in Business, News

January 3rd, 2013

No Comments »

Trading-up to open source – a safer route to effective search

It hasn’t taken long for some of Autonomy’s rivals to attempt to capitalise on the recent bad PR around HP’s acquisition – OpenText has offered a ’software trade-in’, Recommind has offered a ‘trade-up’ and Swiss company RSD has offered a free license for their governance software to Autonomy customers. No word yet from Exalead, Oracle (Endeca), Microsoft (FAST) or any of the other big commercial search companies but I’m sure their salespeople are making the most of the situation.

Migrating a search engine from one technology to another is rarely trouble-free: data must be re-indexed, query architectures rewritten, integration with external systems re-done, relevancy checked…however with sufficient forethought it can be done successfully. We’ve just helped one client migrate from a commercial engine to Apache Solr in a matter of weeks: although at first glance Solr didn’t seem to support all of the features the commercial engine provided, it proved possible to simulate them using multiple queries and with careful design for scalability, query performance is comparable.

Choosing one closed source engine to replace another doesn’t remove the risk that future corporate mergers & acquisitions will cause exactly the same lack of confidence that is no doubt affecting Autonomy customers – or huge increases in license fees, a drop in the quality of available support or the end of the product line altogether – and we’ve heard of all of these effects over the last few years. Moving to an open source search engine gives you freedom and control of the future of the technology your business is reliant upon, with a wealth of options for migration assistance, development and support.

So here’s our offer – we’d be happy to talk, for free (by phone or face-to-face for customers within reach of our Cambridge offices), to any Autonomy customers considering migration and to help them consider the open source options (some of these even have the Bayesian, probabilistic search features Autonomy IDOL provides) – and together with our partners we can also provide a level of ongoing support comparable to any closed source vendor. We don’t have salespeople, we don’t have a product to sell you and you’ll be talking directly to experts with decades of experience implementing search – and there’s no obligation to take things any further. We’d simply like to offer an alternative (and we believe, safer) route to effective search.

Tags: , , , , , , ,

Posted in News

December 5th, 2012

No Comments »

Autonomy & HP – a technology viewpoint

I’m not going to comment on the various financial aspects of the recent news about HP’s write-down of the value of its Autonomy acquisition – others are able to do this far better than me – but I would urge anyone interested to re-read the documents Oracle released earlier this year. However, I am going to write about the IDOL technology itself (I’d also recommend Tony Byrne’s excellent post).

Autonomy’s ability to market its technology has never been in doubt: aggressive and fearless, it painted IDOL as unique and magical, able to understand the meaning of data in multiple forms. However, this has never been true; computers simply don’t understand ‘meaning’ like we do. IDOL’s foundation was just a search engine using Bayesian probabilistic ranking; although most other search technologies use the vector space model there are a few other examples of this approach: Muscat, a company founded a few years before and literally across the hall from Autonomy in a Cambridge incubator, grew to a £30m business with customers including Fujitsu and the Daily Telegraph newspaper. Sadly Muscat was a casualty of the dot-com years but it is where the founders of Flax first met and worked together on a project to build a half-billion-page web search engine.

Another even less well-known example is OmniQ, eventually acquired and subsequently shelved by Sybase. Digging in the archives reveals some familiar-sounding phrases such as “automatically capture and retrieve information based on concepts”.

Originally developed at Muscat, the open source library Xapian also uses Bayesian ranking and we’ve used this successfully to build systems for the Financial Times, Newspaper Licensing Agency and Tait Electronics. Recently, Apache Lucene/Solr version 4.0 has introduced the idea of ‘pluggable’ ranking models, with one option being the Bayesian BM25. It’s important to remember though that Bayesian ranking is only one way to approach a search problem and in many cases, simply unnecessary.

It certainly isn’t magic.