Government – Flax
http://www.flax.co.uk
The Open Source Search Specialists

Tuning and improving elasticsearch for the Government Digital Service
http://www.flax.co.uk/blog/2012/10/01/tuning-and-improving-elasticsearch-for-the-government-digital-service/
Mon, 01 Oct 2012 15:45:03 +0000

The post Tuning and improving elasticsearch for the Government Digital Service appeared first on Flax.

The exciting GOV.UK project is getting close to its first release date of October 17th, and the team asked us to help with some search tuning as they migrate from Apache Solr to elasticsearch. Although elasticsearch has some great features, there are still some areas where it lags behind Solr, such as the lack of spelling suggestion and proximity boost features. Alan from Flax spent a couple of days working with the GDS team and has blogged about how proximity boosting in particular can be implemented, at least for terms that are relatively close to each other rather than separated by a page or so.

If you’re interested in more details of how we fixed this and a few other elasticsearch issues, you may want to take a look at the code we worked on – one of the best things about working with the GOV.UK team is that it was already up as open source software within a day (yes, you read that right – code paid for by the taxpayer is open source, as it should be!). We’re looking forward to launch day!

Update: changed ‘proximity search’ to ‘proximity boost’ – thanks Alan!

Better search for e-petitions – handling misspelled content with a Solr phonetic filter
http://www.flax.co.uk/blog/2012/05/24/better-search-for-e-petitions-handling-misspelled-content-with-a-solr-phonetic-filter/
Thu, 24 May 2012 09:40:41 +0000

The post Better search for e-petitions – handling misspelled content with a Solr phonetic filter appeared first on Flax.

We recently overhauled the search functionality for the UK government’s e-petitions site, run by the Government Digital Service, a new team within the Cabinet Office. Search has an important function on the site; users are forced to search for existing petitions which cover their area of concern before creating a new one. This cuts down on the number of near-duplicate petitions, and makes petitions more effective.

The website is implemented in Ruby on Rails, using the Sunspot Solr client library. There are currently only 22,000 petitions, of no more than a few kilobytes each – easily enough to fit into the cache of a standard server. Despite this, the previous configuration was performing badly, and maxing out 8 CPU cores on a virtual machine under a load of a few hundred queries per second. Retrieval was also poor, with no results at all found for queries like “EU”.

The first thing we did was to install Solr 3.6 (the previous version was the rather elderly 1.4) running in Jetty on Ubuntu. Then we looked at the schema and search implementation. The former used the standard Sunspot field mappings, which are fine for many applications but in this case did not allow flexible weighting. Searches used the standard query parser to parse a hand-constructed query string with different field weightings and frequent use of the fuzzy match operator (e.g. “leasehold~0.8”). This fuzzy matching seemed to be the most likely cause of poor performance under load.
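
To make the old approach concrete, here is a minimal Python sketch of how a hand-constructed query string of this kind might be assembled. It is an illustration only, not the site's actual client code (which was Ruby): the function name `build_fuzzy_query`, the field names `title` and `description`, and the boost values are all assumptions. Only the Lucene-style `term~0.8` fuzzy operator and `^` boost syntax come from the standard query parser.

```python
def build_fuzzy_query(user_query, field_boosts, similarity=0.8):
    """Build a Lucene-syntax query with one fuzzy clause per term and field.

    Hypothetical helper: field names and boosts are illustrative assumptions.
    No escaping of query-syntax characters is attempted in this sketch.
    """
    clauses = []
    for term in user_query.lower().split():
        for field, boost in field_boosts.items():
            # e.g. title:leasehold~0.8^2 (fuzzy match, then field boost)
            clauses.append(f"{field}:{term}~{similarity}^{boost}")
    return " OR ".join(clauses)


query = build_fuzzy_query("leasehold reform", {"title": 2, "description": 1})
print(query)
```

The number of fuzzy clauses is the number of query terms multiplied by the number of fields, so even a short query asks the engine for several fuzzy matches at once.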

Fuzzy matching had been used because of the frequent misspellings in petition text entered by users (e.g. “marraige” instead of “marriage”). Solr spelling correction on the query is not appropriate here, as correctly-spelled queries may not find misspelled content. But since fuzzy matching was performing badly on a relatively small index, we needed a new approach.

What we came up with was two levels of fields: the first being normalised with lowercasing and KStem but otherwise matching exactly, the second using a PhoneticFilterFactory to perform a Double Metaphone encoding on terms. We hoped that the misspellings in the corpus would transform to the same terms under this filter (e.g. “marriage” and “marraige” both yielding “MJ”). The exact fields should provide precision; the phonetic fields, retrieval. Fields were populated using the copyField directive, without changing the client indexing code. We configured an eDisMax query handler to provide a simple interface and removed the custom query string construction from the client code.
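
The two-level idea can be sketched outside Solr in a few lines of Python. This is a toy model under stated assumptions, not the production configuration: it uses a simple Soundex key as a stand-in for Double Metaphone, omits KStem stemming, sums the two field scores rather than reproducing eDisMax scoring, and the weights, function names and sample petitions are all invented for illustration.

```python
from collections import defaultdict

# Soundex digit for each consonant (stand-in for Double Metaphone).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a four-character Soundex key, or '' if the word has no letters."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    key = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # vowels reset prev, so a repeated code can be emitted again
    return (key + "000")[:4]


def normalise(text):
    """Lowercase and strip surrounding punctuation (no stemming in this sketch)."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


# Hypothetical sample data, including a misspelled petition.
petitions = {
    1: "Legalise same-sex marriage",
    2: "Protect marraige in law",
    3: "Reform leasehold law",
}

exact_index = defaultdict(set)     # plays the role of the exact fields
phonetic_index = defaultdict(set)  # plays the role of the phonetic copy
for doc_id, text in petitions.items():
    for term in normalise(text):
        exact_index[term].add(doc_id)
        key = soundex(term)
        if key:
            phonetic_index[key].add(doc_id)


def search(query, exact_weight=3.0, phonetic_weight=1.0):
    """Score documents: exact matches count for more than phonetic ones."""
    scores = defaultdict(float)
    for term in normalise(query):
        for doc_id in exact_index.get(term, ()):
            scores[doc_id] += exact_weight
        for doc_id in phonetic_index.get(soundex(term), ()):
            scores[doc_id] += phonetic_weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(search("marriage"))
```

A correctly spelled query for “marriage” ranks the correctly spelled petition first through the exact index, while the misspelled petition is still retrieved through the shared phonetic key.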

In practice, this worked very well – the new server can handle search loads at least 5 times greater than the previous one, and the CPUs are never maxed out (despite the server having only 4 cores compared with the previous 8). Ranking and retrieval are also greatly improved, and searches for “EU” return relevant petitions!

Phonetic algorithms are never going to catch all misspellings, and had Solr 4.0 been released at this time (with its very fast fuzzy engine) then it would have been the obvious approach to try. However, for now the search is much better, for less than 2 days of effort.

Encouraging the use of open source software in government
http://www.flax.co.uk/blog/2011/06/10/encouraging-the-use-of-open-source-software-in-government/
Fri, 10 Jun 2011 09:43:00 +0000

The post Encouraging the use of open source software in government appeared first on Flax.

I spent yesterday evening at the British Computer Society on the panel of an event organised by the Open Source Specialist Group, nominally discussing the skills required to build Content Management Systems (CMS) using open source software (OSS). We heard a lot about the features and advantages of CMS such as Joomla, Drupal and Plone and the document management system Alfresco, and I contributed some details of Apache Lucene/Solr and Xapian, which can be used in concert with all of these systems (and are usually available as plug-in modules).

We also considered how best to encourage the further use of OSS within the UK government, and I’ve tried to list some of the suggestions that were made – this is in no way a complete list, but it’s a start.

  • Look at what has been done with OSS in other countries in the government sector – e.g. the PloneGov initiative. A lot of this knowledge and expertise should be transferable.
  • Publicise current use within government – we all know that OSS is already being used on government websites and intranets, but if this can be more widely known it will encourage further use of OSS within the sector. We hear that there are already ‘skunkworks’ teams in government using open source and open standards – make sure we hear more about what they build.
  • Support the open source projects themselves – this could be by contributing code developed within government back to OSS projects, or by supporting the open source community in other ways – for example, funding the creation of better documentation, or making it easier to run open source conferences (perhaps with the help of local government).
  • Improve the procurement process to better understand open source as a viable alternative and to ease its adoption (for example, many open source companies are smaller than closed source vendors and thus less able to engage in lengthy and expensive procurement rounds).
  • Understand that comparing OSS to a closed source product is often like comparing apples to oranges – OSS provides a highly flexible toolkit where the user chooses what features they want, as opposed to a closed source product where feature sets are fixed by the vendor. During procurement, simple ‘check box’ lists of required features are thus not always applicable.
  • Listen more to OSS experts and bring them into government to help educate and inform.
