We recently implemented a search solution for a customer using Elasticsearch. Most of their requirements were fairly standard; however, they also wanted to be able to search for IP addresses embedded in the document text, using a flexible and precise search syntax. For example, given the following document fragment:
... the API can be accessed at 18.104.22.168 on port 8700 ...
the following searches should all find the document:
A customer of ours has a potential search application which requires (largely for reasons of performance) the ability to update specific individual fields of Apache Lucene documents. This is not the first time that someone has asked for this functionality. However, until now, it has been impossible to change field values in a Lucene document without re-indexing the...
We recently overhauled the search functionality for the UK government's e-petitions site, run by the Government Digital Service, a new team within the Cabinet Office. Search has an important function on the site; users are forced to search for existing petitions which cover their area of concern before creating a new one. This cuts down on the number of near-duplicate petitions, and makes petitions ...
Charlie wrote previously that we try to work with flexible, lightweight frameworks: flax.core is a Python library for conveniently adding functionality to Xapian projects. The current (and first!) version is 0.1, which can be checked out from the flaxcode repository. This version supports named fields for inde...
The Flax team are pleased to announce the alpha release of Flax Search Service (FSS). FSS combines powerful, high-level indexing and search features with a well-designed Web Services interface. FSS is Open Source software (under the MIT licence) and is available as a free download from Google Code.
Web Services and Service Oriented Architectures (SOA) have become increasingly popular in recent years due to their many advantages. FSS provides...
We finally decided to move entirely to flax.co.uk. The one page remaining is the news archive....
For most applications, Xapian/Flax's search performance will be excellent to acceptable on a single machine of reasonable spec (see here for a discussion of CPU and RAM requirements). However, if the document corpus is unusually large - more than about 20 million items - then one server may not be enough for acceptable speed. Xapian provides a mechanism called remote backends which lets the load be shared ov...
This is not strictly a Flax post, but is intended to clarify the Xapian search architecture for people using Xapian directly. It's not intended for experienced Xapian hackers, nor is it a general introduction to using Xapian (see here instead).
The Xapian API is fairly complex, and there is often confusion about the role of the QueryParser, terms, document values, document data etc. in indexing and searching. It is probably worth pointi...