The trouble with tabbing: editing rich text on the Web

Matt Pearce, who joined the Flax team earlier this year, writes: A recent client wished to convert documents to and from Microsoft Office formats, using a web form as an intermediate step for editing the content. The documents were read in, imported to a Solr search engine, and could then be searched over, cloned, edited and transformed in batches, before being exported to Office once more. The cont...Continue reading

An open approach to tuning search for gov.uk

Roo Reynolds from the GDS team has written a great blog post about the ongoing process of tuning the search for gov.uk which I can highly recommend. We regularly see situations where a search project has been set up as 'fire and forget' - which is never a good idea: not only does content grow, but use...Continue reading

Building high-end search features at low cost with Apache Solr

One of the best things about the increased use of open source search technology is that features that were previously unattainable for clients with small budgets are now within reach. Our client Bride and Groom Direct, a UK-based business selling wedding gifts and stationery, asked us if we could help improve the search features on their website and in particular the auto-suggest - and they asked us to take a look at the website of US mega-reta...Continue reading

Apache Lucene & Solr version 4.0 released, a giant leap forward for open source search

This morning the largest open source search project, Apache Lucene/Solr, released a new version with a raft of new features. We've been advising clients to consider version 4.0 for several months now, as the alpha and beta versions have become available, and we know of several already running this version on live sites. Here are a few highlights:

Tuning and improving elasticsearch for the Government Digital Service

The exciting GOV.UK project is getting close to its first release date of October 17th, and the team asked us to help with some search tuning as they migrate from Apache Solr to elasticsearch. Although elasticsearch has some great features, there are still some areas where it lags behind Solr, such as the lack of spelling suggestion and proximity...Continue reading

Updating individual fields in Lucene with a Redis-backed codec

A customer of ours has a potential search application which requires (largely for reasons of performance) the ability to update specific individual fields of Apache Lucene documents. This is not the first time that someone has asked for this functionality. However, until now, it has been impossible to change field values in a Lucene document without re-indexing the...Continue reading

Clade – a freely available, open source taxonomy and autoclassification tool

One way to manage digital information is to classify it into a series of categories or a hierarchical taxonomy, and traditionally this was done manually by analysts, who would examine each new document and decide where it should fit. Building and maintaining taxonomies can also be labour intensive, as these will change over time (for a simple example, just consider how political parties change and divide, with factions appearing and disappearin...Continue reading

Better search for e-petitions – handling misspelled content with a Solr phonetic filter

We recently overhauled the search functionality for the UK government's e-petitions site, run by the Government Digital Service, a new team within the Cabinet Office. Search has an important function on the site; users are forced to search for existing petitions which cover their area of concern before creating a new one. This cuts down on the number of near-duplicate petitions, and makes petitions ...Continue reading

An open source replacement for the dtSearch closed source search engine

We've been working on a client project where we needed to replace the dtSearch closed source search engine, which doesn't perform that well at scale in this case. As the client has significant investment in stored queries (it's for a monitoring application) they were keen that the new engine spoke exactly the same query language as the old - so we've built a version of Apache Lucene to replace dtSearch. There are a ...Continue reading

Search backwards – media monitoring with open source search

We're working with a number of clients on media monitoring solutions, which are a special case of search application (we've worked on this previously for Durrants). In standard search, you apply a single query to a large number of documents, expecting to get a ranked list of documents that match your query as a result. However, in media monitoring you need to ...Continue reading