One of the things we often notice about existing systems based on relational databases (RDB) is that as they scale to millions of items, simple lookup tasks become slow and inefficient. These tasks don’t usually require complicated database operations, so in most cases it is possible to relocate the data from the RDB into a search engine like Flax.
Consider a system where a search engine has already been implemented to search textual product information, but numerical data on each product, such as price, is still being stored in an RDB. Users will often need filters on search results such as ‘show me items under £10’, so an RDB operation similar to ‘SELECT productID FROM products WHERE price < 10’ will be needed in addition to the search engine query. Modern search engines like Flax implement range search functions, so that numerical information can be added to documents, and it is thus possible to carry out this operation in the search engine as part of the full-text search for the product information.
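To make the idea concrete, here is a minimal sketch in Python of a text search and a price filter answered by a single index. It is a toy written for illustration only, not Flax's or Xapian's API: the `TinyIndex` class, its methods and the sample products are all our own assumptions.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory index (hypothetical, for illustration only).

    Stores full-text terms and one numeric value (price) per document,
    so a text query and a range filter are answered in one place.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.prices = {}                  # document id -> price

    def add(self, doc_id, text, price):
        # Index each lower-cased word and remember the numeric value.
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.prices[doc_id] = price

    def search(self, query, max_price=None):
        terms = query.lower().split()
        if not terms:
            return []
        # A document must contain every query term (boolean AND).
        matches = set.intersection(
            *(self.postings.get(term, set()) for term in terms)
        )
        # Apply the range filter within the same query, which replaces
        # the separate 'WHERE price < 10' lookup in the RDB.
        if max_price is not None:
            matches = {d for d in matches if self.prices[d] < max_price}
        return sorted(matches)


index = TinyIndex()
index.add("p1", "Red apple slicer", 7.50)
index.add("p2", "Apple corer deluxe", 12.00)
index.add("p3", "Banana stand", 4.00)

print(index.search("apple", max_price=10))
```

A real engine stores the numeric value in a form that supports efficient range lookups rather than scanning the matches, but the shape of the query is the same: one request, one system.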
We’ve noticed with several clients that it is now possible to move all their data from the original RDB into the search engine. This can obviously lead to cost savings, as only one system must be hosted, maintained and backed up, and scaling out can be far simpler.
Another way to look at this is to consider a search engine as an example of a document-oriented database.
Our technical partners Cognidox have released a whitepaper detailing their view of the enterprise search market, titled “Why you can’t just ‘Google’ for Enterprise Knowledge” – it’s well worth a read. You can download the PDF from their archive.
We recently helped a small marine consultancy, running a Windows network, implement a completely free enterprise search solution. Even SMEs are now finding it hard to keep on top of the information they produce, and there are few low-cost options for searching their documents. Read the case study here (PDF).
Flax Search Server now has a Perl client, thanks to the guys at Cognidox, who have blogged about why they needed to improve the search facility for their powerful document management system.
My colleague Richard Boulton will be presenting at Europython in Birmingham, U.K. next week, specifically at 15.30 on Tuesday 30th June – an abstract is available. He’ll be talking about Xapian, Xappy and Flax, and showing examples of these in action including one using a Django integration layer.
Update: you can now download the slides for Richard’s talk in OpenOffice format.
The Flax team are pleased to announce the alpha release of Flax Search Service (FSS). FSS combines powerful, high-level indexing and search features with a well-designed Web Services interface. FSS is Open Source software (under the MIT licence) and is available as a free download from Google Code.
Web Services and Service Oriented Architectures (SOA) have become increasingly popular in recent years due to their many advantages. FSS provides a RESTful interface in which databases, documents, and searches are represented as resources identified by URLs. For example, to add a document to a database, the document data is POSTed to the database resource. To search for a word or phrase, the client sends the query as a GET request to the database, which responds with a list of matching documents. Indexing transactions may be handled automatically or explicitly by the client.
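As a rough sketch of what such requests might look like from Python, the snippet below builds (but does not send) one POST and one GET using only the standard library. The server address, the `/dbs/<name>` URL layout, the `q` parameter and the JSON body are assumptions made for illustration; check the FSS documentation for the actual resource names and formats.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical server address and URL layout, not taken from FSS itself.
BASE_URL = "http://localhost:8080"


def database_url(db_name):
    """URL of the (assumed) database resource."""
    return f"{BASE_URL}/dbs/{quote(db_name)}"


def add_document_request(db_name, document):
    """Build a POST that sends the document data to the database resource."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        database_url(db_name),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_request(db_name, query):
    """Build a GET that sends the query to the database resource."""
    url = f"{database_url(db_name)}?{urlencode({'q': query})}"
    return Request(url, method="GET")


add_req = add_document_request(
    "products", {"title": "Red apple slicer", "price": 7.5}
)
find_req = search_request("products", "apple slicer")

print(add_req.get_method(), add_req.full_url)
print(find_req.get_method(), find_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running server. Because the interface is plain HTTP, the same two calls are just as easy to write in PHP, Java or JavaScript.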
For convenience, client libraries are being developed in several languages, including PHP, Python, Java and JavaScript. It would be a simple matter to interface to FSS in any language with support for Web protocols. The FSS distribution also includes example code to get you started, and basic documentation.
FSS alpha supports enough indexing and search functionality to implement basic but useful information retrieval systems. Over the next few months we will be adding support for advanced features like facets and tags, geolocation and image search. FSS will run on any system with support for Xapian and Python (Windows, Linux and Mac amongst others).
We’ve updated the Flax website with a page showing the Flax software stack – hopefully this will go some way towards explaining how Xapian, Xappy and parts of Flax all fit together. There’s still lots in development so expect some more news later this month.
As part of this, we’ve created a new page bringing together all the Win32 files for Xapian that we maintain – including some pre-built binaries for those of you who don’t want to compile Xapian yourself. We’re working on creating one-click installable packages for bindings for the various languages – however at present we’ve only finished this for Python. Hopefully some users of the other languages will let us know how best to present the other bindings.
Anurag Goel recently carried out a comparative test of Xapian/Flax and Lucene/Solr. Some interesting results here: it seems Lucene is faster at building indexes, but Xapian is faster and possibly more accurate at searching. We can expect some further speed improvements over the next few months as a new, more compact backend to Xapian is released.
By the way, the article mentions Xappy: this is a Python interface to Xapian that is a major part of our Flax enterprise search platform. You can get Xappy here.
Searching images is a difficult problem, and it’s not a feature offered by many commercial search engines. Some will cheat slightly, by indexing the title or filename of the image, or the text surrounding an image embedded on a page, and call this ‘image search’ – but this method doesn’t work very well, especially when you have a standalone image called ‘IMG0000064.jpg’ which is actually a picture of an apple. We’ve seen some good demos of actual image search – Imense is particularly impressive – but none that promise a generic solution that will work with all images.
In the meantime we’ve been developing some image related search technology for one of our clients, and we can now offer image similarity matching as part of Flax – you can read more about this exciting development on the Searching with Xapian blog, written by my colleague Richard Boulton.
Based on some feedback, we’ve made some more technical details about Flax available on our Features page. You can download the PDF here.