Posts Tagged ‘migration’

Enterprise Search Europe 2014 day 1 – Decisions, research and a Meetup quiz

This year’s Enterprise Search Europe was held near Victoria train station in London and unfortunately coincided with a two day strike on the London Underground – worrying for the organisers, but apart from a few notable absences it didn’t seem to affect the attendance too much. We started with a keynote from Dale Roberts, whose book on Decision Sourcing inspired a talk about a ‘rational decision making model’. When examining traditional relational database applications Dale said ‘if you peer at it long enough you can see the rows and columns’ and his point was that modern consumer social networking applications don’t exhibit this old pattern – so this is where search application designers should look for inspiration. His co-presenter Rooven Pakkiri said that Enterprise Search should attempt to ‘release the information from inside our heads’, which of course social networking might help with, connecting you with colleagues. I’m not sure that one can easily take lessons learnt from consumer applications and apply them to business use, and some later speakers agreed with me, but this was a high-energy and thought-provoking start.

Next I chaired the Open Source track, where we started with Cedric Ulmer of France Labs, who talked about a search application they built for a consultancy business with around 40 employees. Using Apache Solr, Apache ManifoldCF and their own Datafari open source framework they turned this project around very quickly – interestingly, the end clients needed no training to use the new system, which implies a very well designed UI. Our second talk from Ronald Hobbs of Reed Business International described a project on a much larger scale: 100 million documents, 72 business units and up to 190 queries per second – this was originally served by the FAST ESP engine but they moved to an Apache Solr system, replacing the FAST processing pipeline with Search Technologies’ Aspire project. From our own experience of such projects, including one from FAST ESP to Solr, I can only agree with his five steps for an effective migration (Prepare, Get the right tools, Get the right team, Migrate in chunks, Clean up). I was amused by his description of the Apache Zookeeper project as ‘a bipolar manic depressive’, although it seemed this was eventually overcome with a successful deployment on Amazon EC2. Next was Galina Hinova of Intrafind on an aftersales search application for MAN Truck and Bus – again at serious scale (MAN have around 1 billion vehicles in existence with 100-150 documents related to each). Interestingly the Euro6 regulations for emissions and standardized EU terms for automobile parts were direct drivers of the project, with Apache Lucene as the base technology. No longer is open source search just for small-scale projects it seems!

After a short break, during which I chatted to John Newton, founder of Documentum and Alfresco, and his team, we returned to hear Dan Jackson describe how UCL had improved their website search – with a chaotic mix of low quality content and an ‘awful’ content management system, the challenges were myriad but with the help of experts such as our associate Tony Russell-Rose they have made significant improvements. Next was what was to prove a very popular talk from Nick Brown of AstraZeneca on a huge, well-funded project to build applications to support research and development – again, this was at large scale with 75 million documents (including ‘all the patents and all the research papers’). The key here was their creation of many well-targeted ‘apps’ to enable particular uses of the Sinequa search engine they chose for the back end, including mobile apps to help find others in the company (or external to it) who are also working on a particular drug or disease. This presentation showed just what can be achieved if companies really understand the potential of search technology – knowledge sharing and discovery of previously unknown information.

After a short drinks reception we retired to a nearby pub for the combined Cambridge and London Search Meetup – I’d prepared a short quiz (feel free to have a go!) which was won by Tony Russell-Rose’s team. Networking and chatting continued long into the evening, with some people from the wider UK search community also attending.

To be continued! You can see most of the slides here.

Three reasons why your search may be prehistoric

ArnoldIT wondered today why we were bothering to announce an upgrade to the venerable dtSearch engine, when they “weren’t aware of too many people still using that software”. Perhaps it’s time for a quick reality check here – we regularly see clients with search engines that many would consider prehistoric still in active use. Here are some reasons why that might be so:

  • Search isn’t seen as essential. If your accounting software goes down, nobody gets paid: but if the search engine has gradually degraded in accuracy, doesn’t always contain the most recent documents and is generally too hard to use then most of your users will try and find a way around it – they’ll Google for content on the corporate website, dig slowly through the filestores or call up a colleague to ask. Of course, all of this will take time and there’s the risk they won’t find anything useful (or worse, find something inaccurate or out-of-date), but time is only money, surely?
  • The magic has gone. The sharp suited salesman who told you all the magical things your search engine could do – it could understand concepts, human language and the meaning of life – is a distant memory. Somehow those magical features were never implemented, perhaps the unexpected extra cost put you off (surely the magic came as standard? No?). You’ve also probably turned off a lot of the clever features of your engine as either no-one could understand how to use them, or they affected performance so much that search results took minutes to appear.
  • Upgrading search is hard and expensive. Small changes to the existing engine can cost huge consultancy fees, but if you change supplier you’ll have a whole new team of salesmen to meet, lots more buzzwords to learn and expensive new license fees to pay. You’ll also have to overhaul your content management system, your metadata, your front ends…better to leave everything alone, surely?

There are search engines out there, chugging away quietly behind a corporate firewall, whose antiquity would astonish. Any chance of a support contract has long gone as the supplier would prefer it if you upgraded to their latest-and-greatest version – that’s if the supplier still exists at all. However there is always a way to upgrade that reduces the risk and cost – an incremental, agile and open-source based approach will prevent future lock-in to a single supplier and give you more control of the code your search engine depends on. Recently we’ve used this approach to help clients successfully upgrade search applications based on dtSearch, FAST ESP and Oracle and in the near future we’ll be doing the same for clients with several other well-known engines – and a few lost in the mists of time!


Posted in News

August 5th, 2013


Trading-up to open source – a safer route to effective search

It hasn’t taken long for some of Autonomy’s rivals to attempt to capitalise on the recent bad PR around HP’s acquisition – OpenText has offered a ’software trade-in’, Recommind has offered a ‘trade-up’ and Swiss company RSD has offered a free license for their governance software to Autonomy customers. No word yet from Exalead, Oracle (Endeca), Microsoft (FAST) or any of the other big commercial search companies but I’m sure their salespeople are making the most of the situation.

Migrating a search engine from one technology to another is rarely trouble-free: data must be re-indexed, query architectures rewritten, integration with external systems re-done, relevancy checked…however, with sufficient forethought it can be done successfully. We’ve just helped one client migrate from a commercial engine to Apache Solr in a matter of weeks: although at first glance Solr didn’t seem to support all of the features the commercial engine provided, it proved possible to simulate them using multiple queries, and with careful design for scalability, query performance is comparable.

Choosing one closed source engine to replace another doesn’t remove the risk that future corporate mergers & acquisitions will cause exactly the same lack of confidence that is no doubt affecting Autonomy customers – or huge increases in license fees, a drop in the quality of available support or the end of the product line altogether – and we’ve heard of all of these effects over the last few years. Moving to an open source search engine gives you freedom and control of the future of the technology your business is reliant upon, with a wealth of options for migration assistance, development and support.

So here’s our offer – we’d be happy to talk, for free (by phone or face-to-face for customers within reach of our Cambridge offices), to any Autonomy customers considering migration and to help them consider the open source options (some of these even have the Bayesian, probabilistic search features Autonomy IDOL provides) – and together with our partners we can also provide a level of ongoing support comparable to any closed source vendor. We don’t have salespeople, we don’t have a product to sell you and you’ll be talking directly to experts with decades of experience implementing search – and there’s no obligation to take things any further. We’d simply like to offer an alternative (and we believe, safer) route to effective search.


Posted in News

December 5th, 2012


Google Search Appliance version 7 – too little too late?

Google have launched a new version of their search appliance this week – this is the GSA of course, not the Google Mini which was canned in summer 2012 (someone hasn’t told Google UK it seems – try buying one though).

Although there’s a raft of new features, most of them have been introduced by the GSA’s competitors over the last few years or are available as open source (entity recognition or document preview for example). The GSA is also not a particularly cheap option, as commentators including Stephen Arnold have noticed: we’ve had clients tell us of six-figure license fees for reasonably sized collections of a few million documents – and that’s for two years, after which time you have to buy it again. Not surprisingly some people have migrated to other solutions.

However there’s another question that seems to have been missed by Google’s strategists: how a physical appliance can compete with cloud-based search. I can’t think of a single prospective client over the last year or so who hasn’t considered this latter option on both cost and scalability grounds (and we’ll shortly be able to talk about a very large client who have chosen this route). Although there may well be a hard core of GSA customers who want a real box in reassuring Google yellow, one wonders why Google haven’t considered a ‘virtual’ GSA to compete with Amazon’s CloudSearch amongst others.

It will be interesting to see if this version of the GSA is the last…


Posted in News

October 10th, 2012


An open source replacement for the dtSearch closed source search engine

We’ve been working on a client project where we needed to replace the dtSearch closed source search engine, which doesn’t perform that well at scale in this case. As the client has significant investment in stored queries (it’s for a monitoring application) they were keen that the new engine spoke exactly the same query language as the old – so we’ve built a version of Apache Lucene to replace dtSearch. There are a few other modifications we had to do as well, to return such things as positional information from deep within the Lucene code (this is particularly important in monitoring as you want to show clients where the keywords they were interested in appeared in an article – they may be checking their media coverage in detail, and position on the page is important).
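To make the idea of positional information concrete, here is a minimal sketch in plain Python. It is not the Lucene code described above: the function name `keyword_positions` and the simple `\w+` tokenisation are assumptions made purely for illustration. It reports, for each keyword of interest, the token index and character offset of every occurrence in an article.

```python
import re


def keyword_positions(text, keywords):
    """Return {keyword: [(token_index, char_offset), ...]} for each keyword.

    Illustrative only: tokens are runs of word characters, matched
    case-insensitively. A real engine applies its own analysis chain.
    """
    wanted = {k.lower() for k in keywords}
    hits = {k: [] for k in wanted}
    for index, match in enumerate(re.finditer(r"\w+", text)):
        token = match.group().lower()
        if token in wanted:
            # Record both the word position and where it starts in the text,
            # so a front end could highlight it or judge its prominence.
            hits[token].append((index, match.start()))
    return hits


article = "Acme launches new truck. Analysts say Acme leads."
positions = keyword_positions(article, ["acme", "truck"])
```

In a monitoring application the same kind of data would let you show a client that their brand name appears in the first sentence rather than the last paragraph.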

First, we developed a new Lucene Analyzer that speaks the same syntax as dtSearch, allowing us to index text input. On the search side we have a Lucene QueryParser that shares this syntax. To make it easier to use we’ve wrapped the whole lot in a modified Solr server. As we needed some features of very recent Lucene code, our modifications are based on a patch to Lucene trunk (and so the source code isn’t for the faint hearted – if you need it let us know, but we’re not currently providing it for download).
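As a rough illustration of what supporting another engine’s query syntax involves, the sketch below rewrites a dtSearch-style proximity expression (`apple w/5 pear`) into the proximity form of Lucene’s classic query syntax (`"apple pear"~5`). It is a toy string rewrite in Python, not the Java QueryParser described above; the function name `to_lucene` is hypothetical, only single-word operands are handled, and Lucene’s slop semantics are not guaranteed to match dtSearch’s word-distance rules exactly.

```python
import re

# Matches a single-word proximity expression such as: apple w/5 pear
_PROXIMITY = re.compile(r"(\w+)\s+w/(\d+)\s+(\w+)", re.IGNORECASE)


def to_lucene(query):
    """Rewrite each 'a w/N b' as the Lucene proximity query '"a b"~N'.

    Anything that is not a proximity expression is passed through unchanged.
    """
    return _PROXIMITY.sub(
        lambda m: f'"{m.group(1)} {m.group(3)}"~{m.group(2)}',
        query,
    )


translated = to_lucene("apple w/5 pear AND fruit w/2 salad")
```

Rewriting strings like this quickly breaks down for nested or phrase operands, which is one reason a dedicated parser that builds query objects directly is the more robust design for a client with a large body of stored queries.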

We’re not sure if there’s anyone else out there who needs an open source alternative to dtSearch – but in case there is we’ve provided a downloadable WAR file with the latest Solr executables in our downloads area, including a brief README file.

More generally, what this project demonstrates is that even if you have significant investment in your existing search infrastructure it is entirely possible to move to an open source alternative, which may be faster and will almost certainly be more economically scalable. Does anyone else have a search engine they’d like to replace?

How not to make the same mistake twice

We’ve been aware that some FAST customers will be considering migration for a while now – but Autonomy have finally caught up.

However, if you migrate from one closed source solution to another, how can you guarantee that the same sort of events that have led to the current situation won’t happen again? With open source, there’s no vendor lock-in, a wide choice of companies to assist you with development and integration, a wealth of different support options and of course no license fees to pay. Migrating from FAST is a common topic at conferences at the moment – read Jan Høydahl’s presentation, or see Michael McIntosh’s video. There are even open source document processing pipeline frameworks to replace the popular FAST one, and we’ve been evaluating some alternative language processing frameworks. Scaling isn’t an issue and in some cases you could significantly reduce your hardware budget.


Posted in Uncategorized

December 6th, 2010
