Archive for the ‘News’ Category
We’ve just published a case study on our work for C Spencer Ltd., a UK-based civil engineering company who take a pro-active approach to document management – instead of taking the default SharePoint route or buying another product off the shelf, they decided to create their own in-house system based on open source components, hosted on the Amazon Web Services (AWS) cloud. We’ve helped them integrate Apache Solr to provide full text search across the millions of items held in the document management system, with a sub-second response. Their staff can now find letters, contracts, emails and designs quickly via a web interface.
C Spencer are known for their innovative and modern approach – they’re even building their own green power station on a brownfield site in Hull. It’s thus not surprising that they chose cutting-edge open source technology for search: tracking and managing documents correctly is extremely important to their business.
We’re pleased to announce our work with Reed Specialist Recruitment, one of the UK’s largest recruitment companies, where we helped them implement an Apache Solr powered application to allow their 3000+ staff to search for and match candidates to jobs. We built an innovative indexing framework, a configuration tool and performance monitoring system for Reed and the system launched on time and under budget, a great testament to the flexibility and power of this open source software. The new system responds in under a second – a massive improvement on the previous response time of several minutes. You can read the press release here.
If you’d like to hear more I’ll be giving a presentation on the project at Lucene Eurocon in Barcelona tomorrow – Wednesday 19th October at 1.30 p.m. – slides and a video will be online after the event.
If you can’t make it to Barcelona I’ll also be talking in London, on the business benefits of open source search, at around 10am on Tuesday 25th October with our client Stephen Wicks, CTO of Gorkana Group as part of Enterprise Search Europe – there are still tickets available and you can even get a 20% discount if you join the Cambridge or London Enterprise Search Meetups, who are hosting a joint event on the Monday evening of the conference.
Our customer Cambridge Intellectual Property yesterday announced their new API for a collection of 55 million patents – 48 million more than Google Patents. It’s great to see a Cambridge company innovating in this space, especially as the service is powered by Apache Solr (we’ve given them some small assistance with configuring and tuning this software over the last few months).
The API, available on the Boliven website, offers a REST based service and returns patent data in JSON or XML – so users can easily integrate patent data with their own applications. It can also return PDFs or summaries of the selected patents. In addition, the API will allow users to search and query Boliven’s database of 45+ million science literature documents including journal publications and medical device trials. That’s around 100 million items in total.
Like the Guardian’s Open Platform which I wrote about previously, this is a great example of open source search technology as a platform for new delivery methods – showing how effective (and economical) it can be at this large scale.
It didn’t take me long to find my own small contribution to the patent landscape.
The blogotweetosphere has been positively buzzing since last night’s announcement that Hewlett Packard will be buying Autonomy for £7.1bn, while divesting itself of its PC business. Many commentators have put a positive spin on this, pointing to Autonomy’s meteoric rise from a small office in Cambridge to the behemoth it is today. It’s undoubtedly good news for Autonomy’s shareholders. Dave Kellogg correctly identifies Autonomy as a “finance company dressed in (meaning-based) technology company clothing” with a “happy ending”.
However, the reaction isn’t all positive – the FT implies this deal is at the “lunatic end of the valuation spectrum”. Law Technology News says “Autonomy’s e-discovery revenue stream is high-end but unsustainable” and quotes users who had problems with the system: “We had a lot of issues with the applications crashing, the documents tending not to get checked in” … “[Autonomy sales staff] were pricey, arrogant, and they couldn’t care less about us. … It cannot get any worse.”
HP will have to work hard to integrate Autonomy into both its corporate culture and software frameworks – a problem currently faced by Microsoft since its acquisition of FAST a short while ago. Stephen Arnold thinks this process will be “risky”. What it means for the rest of the search sector is harder to guess, although Martin White of Intranet Focus says this deal indicates HP can see a “future in search applications” and, interestingly, “A number of privately-held search vendors are probably working out what their valuation would be”.
My view is that this is just the latest of huge shifts in the enterprise search market, partly spurred on by the rise of open source options and the gradual realisation that the huge license fees charged by some vendors may be unsustainable. I don’t think Autonomy will be the last company looking for a safe haven in the years to come.
There’s a lot of buzz currently around the UK government and its approach to IT projects (which has been historically rather poor in terms of delivery, schedules and cost). We’ve written before about an Action Plan that recommends open source and open standards, but it seems that actually implementing these is more of a problem, especially when you consider (flexible and more agile) smaller suppliers such as ourselves who may not even get a chance to compete for the business.
There’s an inquiry running currently that promises to look at this, and they have invited various people to put their views across. Unfortunately with one laudable exception these people were from (or mainly represent) very large IT companies who already supply the government and whose interest lies in maintaining the status quo.
As Mark Taylor of Sirius has already pointed out, this situation isn’t going to change until government procurement itself becomes an open process, so that we can all see how much could be wasted on outdated project management methods and overpriced closed source software.
I’ve been reading the revised Open Source, Open Standards and ReUse: Government Action Plan – it’s surprising (and heartening) to see this has existed in one form or another since as far back as 2004.
The key changes for this version are:
suppliers have to show evidence they’ve considered open source options – hopefully this will be more than a quick trawl through SourceForge
‘shadow license costs’ have to be shown in calculations to take account of previous purchases of ‘perpetual’ licenses – apparently in some cases this could make software license fees for a project appear as zero!
all purchases have to be on the basis of re-use across the government sector – so no need to pay again if a system moves to the cloud in the future
This all sounds great for the open source community; let’s also hope that increased openness in government means that we’ll be able to check the Action Plan is actually being followed!
By the way, a great example of open source in action on government data is They Work For You, which cleans up Hansard and makes it more accessible – search is powered by Xapian.
Last year my colleague Tom Mortimer talked about indexing security information within an open source enterprise search application, and we’re happy to announce more details of the project. Our client is an international radio supplier, who had considered both closed source products and search appliances, but chose open source for greater flexibility and the much lower cost of scaling to indexes of millions of documents.
Using the Flax platform, we built a high-performance multi-threaded filesystem crawler to gather documents, translated them to plain text using our own open source Flax Filters and captured Unix file permissions and access control lists (ACLs). User logins are authenticated against an LDAP server and we use this to show only the results a particular user is allowed to see. We also added the ability to tag documents directly within the search results page (for example, to mark ‘current’ versions, or even personal favourites) – the tags can then be used to filter future results. Faceted search is also available.
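To illustrate the general idea of permission-aware search, here is a minimal sketch in Python. It is not the Flax implementation: it assumes each indexed document carries a set of users and groups allowed to read it (standing in for the captured Unix permissions and ACLs), and that the logged-in user's group membership has already been looked up (standing in for the LDAP step). The names `IndexedDoc`, `User`, `can_read` and `search` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    """A document plus the security information captured at index time."""
    path: str
    text: str
    allowed_users: frozenset = frozenset()   # assumption: derived from file owner/ACLs
    allowed_groups: frozenset = frozenset()  # assumption: derived from group/ACLs


@dataclass(frozen=True)
class User:
    """An authenticated user; groups would come from a directory lookup."""
    name: str
    groups: frozenset = frozenset()


def can_read(user: User, doc: IndexedDoc) -> bool:
    """True if the user is named directly or shares a group with the document."""
    return user.name in doc.allowed_users or bool(user.groups & doc.allowed_groups)


def search(docs, term: str, user: User):
    """Return paths of documents containing the term that this user may see."""
    term = term.lower()
    return [d.path for d in docs if term in d.text.lower() and can_read(user, d)]
```

A real engine would store the permission sets as filterable fields in the index and apply them as part of the query, rather than scanning documents in Python, but the principle is the same: the security check happens before any result reaches the user.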
You can read more about the project in a case study (PDF) and Tom’s presentation slides (PDF) explain more about the method we used to index the security information.
Analysts Ovum have released a report on enterprise search – it’s not clear where to obtain it yet, although Report Linker may have it available. According to one report it may also be called “Enterprise Search and Retrieval: Exploiting all of the Organisation’s Information Assets”.
Interestingly, most of the press coverage around the release is focussing on statements about open source solutions by the author, Mike Davis – in particular “…in fact, companies should only go to the big proprietary players if open source can’t deliver what they need.” He also states that “there are mere nuances between those ranked” – and this includes the open source option Solr 1.4.
This is the clearest statement yet from an analyst that enterprise search engines are all pretty much the same thing, if you strip away the marketing – but more importantly, that open source should be the first option to consider.
It’s been an interesting and busy twelve months here at Flax – we’ve worked on some fantastic customer projects, spoken at conferences at home and abroad and made some great alliances and partnerships. We are talking to more people than ever before about the advantages of open source search and we’ve even started a local Meetup group.
This has been the year when open source search moved out of the shadows and became a force to reckon with – whether handling billions of queries or millions of customers, powering innovative new APIs for open content from forward-looking media companies or simply making it easier for search applications to be developed. Commercial support is now available to rival anything offered by the closed source world and there are now fully packaged solutions built on open source. In some sectors open source may even become the default choice (see what IDC said about the embedded/OEM market).
There’s still significant change to come in the search sector – I expect a few vendors will be in trouble by this time next year as they realise their business models (often built on per-document charges) are out-of-date, and we might also see further acquisitions by the usual behemoths. All this leads to reduced choice and increased costs for customers, and this is where open source can help – you can build your search solution in-house, or engage companies like ours to help, but you’re no longer locked in to a vendor’s roadmap and shackled to their business plan (or the consequences of its failure!).
I’ll leave the final word to Matt Asay of Canonical, who says: “Open source is how we do business 10 years into this new millennium.”
Media monitoring is not a traditional search application: for a start, instead of searching a large number of documents with a single query, a media monitoring application must search every incoming news story with potentially thousands of queries, searching for words and terms relevant to client requirements. This can be difficult to scale, especially when accuracy must be maintained – a client won’t be happy if their media monitors miss relevant stories or send them news that isn’t relevant.
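The 'inverted' shape of this problem can be sketched in a few lines of Python. This is a toy illustration only, not the production system: it assumes each client query is a simple set of required and excluded words, and the names `StoredQuery`, `tokenize` and `route_story` are hypothetical. Real client queries are far richer (phrases, proximity, wildcards), and a real system would need to be much smarter about which queries it tries against each story.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class StoredQuery:
    """A client query held in memory and tested against every incoming story."""

    def __init__(self, client, all_of, none_of=()):
        self.client = client
        self.all_of = frozenset(w.lower() for w in all_of)    # every word must appear
        self.none_of = frozenset(w.lower() for w in none_of)  # none of these may appear

    def matches(self, tokens):
        return self.all_of <= tokens and not (self.none_of & tokens)


def route_story(story, queries):
    """Return the sorted list of clients whose stored queries match this story."""
    tokens = tokenize(story)
    return sorted({q.client for q in queries if q.matches(tokens)})
```

Note the direction of the loop: the story is tokenised once and then every stored query is checked against it, which is why the number of queries, rather than the number of documents, dominates the cost.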
We’ve been working with Durrants Ltd. of London for a while now on replacing their existing (closed source) search engine with a system built on open source. This project, which you can read more about in a detailed case study (PDF), has reduced the hardware requirements significantly and led to huge accuracy improvements (in some cases where 95% of the results passed through to human operators were irrelevant ‘false positives’, the new system is now 95% correct).
The new system is built on Xapian and Python and supports all the features of the previous engine, to ease migration – it even copes with errors introduced during automated scanning of printed news. The new system scales easily and cost effectively.
As far as we know this is one of the first large-scale media monitoring systems built on open source, and a great example of search as a platform, which we’ve discussed before.