NOT WITHIN queries in Lucene

A guest post from Alan Woodward, who recently joined the Flax team: I’ve been working on migrating a client from a legacy dtSearch platform to a new system based on Lucene, part of which involves writing a query parser to translate their existing dtSearch queries into Lucene Query objects. dtSearch allows you to perform proximity searches – find doc...
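The excerpt stops before the details, so what follows is only a minimal sketch of the idea in plain Python. It is not Alan's parser and not Lucene's API. It assumes that a NOT WITHIN query such as `apple not w/5 pear` matches a document when at least one occurrence of the first term has no occurrence of the second term within N word positions on either side. The function names `positions` and `not_within` are hypothetical. In Lucene itself, this kind of constraint can be expressed with span queries such as `SpanNotQuery`, which work on the same per-position idea.

```python
from bisect import bisect_left


def positions(tokens, term):
    """Return the ascending word positions at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def not_within(pos_a, pos_b, n):
    """True if some position in pos_a has no position in pos_b within n words.

    Assumes both lists are sorted ascending and that "within n" means
    abs(a - b) <= n (hypothetical semantics, for illustration only).
    """
    for a in pos_a:
        # Index of the first B occurrence at or after the window start a - n.
        i = bisect_left(pos_b, a - n)
        # If that occurrence also falls at or before a + n, B is near this A.
        if i < len(pos_b) and pos_b[i] <= a + n:
            continue
        # This occurrence of A has no B inside its window, so the query matches.
        return True
    return False


tokens = "apple pie and pear tart and later an apple".split()
apples = positions(tokens, "apple")  # [0, 8]
pears = positions(tokens, "pear")    # [3]
print(not_within(apples, pears, 5))  # False: both apples are within 5 words of "pear"
print(not_within(apples, pears, 4))  # True: the apple at position 8 is 5 words away
```

Using `bisect` means each occurrence of the first term is checked against the second term's positions with a binary search rather than a pairwise comparison.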

Outside the search box – when you need more than just a search engine

Core search features are increasingly a commodity: you can knock up some indexing scripts in whatever scripting language you like in a short time, build a searchable inverted index with freely available open source software, and hook up your search UI quickly via HTTP. All of this used to be a lot harder than it is now (unfortunately, some vendors would have you believe it still is, which is reflected in their hefty price tags). However, we're increasingly asked to develop features ...

How to remove a stored field in Lucene

While working on a customer project recently we found a very large field that was stored unnecessarily in the Lucene index, taking up a lot of space. As it would have taken a very long time to re-index (there are tens of millions of complex documents in this case) we looked for a way to remove the stored field in-place. There's an interesting set of slides from last year's Apache Lucene Eurocon which discuss this kind of Lucene index pos...

Open source search engines and programming languages

So you're writing a search-related application in your favourite language, and you've decided to choose an open source search engine to power it. So far, so good, but how are the two going to communicate? Let's look at two engines, Xapian and Lucene, and compare how this might be done. Lucene is written in Java and Xapian in C++, so if you're using those languages respectively, everything should be relatively simple - j...

Packaged solutions and customisability, the Python way

With any large scale software installation, there is going to be some customisation and tweaking necessary, and enterprise search systems are no exception. Whatever features are packaged with a system, some of those you need will be missing and some won't be used at all. It's rare to see a situation where the search engine can just be installed straight out of the box. Our Flax system is based on the Xapian core, which has a set of bindings to various differe...

Xapian 1.2.0 arrives

Xapian 1.2.0, the first of a new 'stable' release series, was announced a few weeks ago, and we've just uploaded pre-built binaries for Windows and associated build files. You can find them on our Xapian downloads page. This version features a new, faster, more compact database format and enhanced backwards compatibility with existing databases; a built-in replication system (so in a distributed system you only need to propagate the changes to a Xap...

Some new open source file filters & previewers

We've just released an early version of Flax Filters, which allow basic conversion of various proprietary formats to plain text ready for indexing. Currently the filters support Microsoft Word, Excel and PowerPoint, the equivalent OpenOffice formats, Adobe PDF, plain text and HTML, but we'll be adding more in the future (of course, we'd welcome contributions from third parties). We're already using these filte...

Replacing relational databases with search engines for simple lookups

One of the things we often notice about existing systems based on relational databases (RDB) is that as they scale to millions of items, simple lookup tasks become slow and inefficient. These tasks don't usually require complicated database operations, so in most cases it is possible to relocate the data from the RDB into a search engine like Flax. Consider a system where a search engine has already been implemented to search textual product information, but numerical data on each product, such...