Free file filters, search & taxonomy tools from our old Googlecode repository

Google’s GoogleCode service is closing down, in case you hadn’t heard, and I’ve just started the process of moving everything over to our Github account. This prompted me to take a look at what’s there and there’s a surprising amount of open source code I’d forgotten about. So, here’s a quick rundown of the useful tools, examples and crazy ideas we’ve built over the years – perhaps you’ll find some of it useful – please do bear in mind however that we’re not officially supporting most of it!

  • Flax Basic is a simple enterprise search application built using Python and the Xapian search library. You can install this on your own Unix or Windows system to index Microsoft Office, PDF, RTF and HTML files and it provides a simple web application to search the contents of the files. Although the UI is very basic, it proved surprisingly popular among small companies who don’t have the budget for a ‘grown up’ search system.
  • Clade is a proof-of-concept classification system with a built-in taxonomy editor. Each node in the taxonomy is defined by a collection of words: as a document is ingested, if it contains these words then it is attached to the node. We’ve written about Clade previously. Again this is a basic tool but has proved popular and we hope one day to extend and improve it.
  • Flax Filters are a set of Python programs for extracting plain text from a number of common file formats – which is useful for indexing these files for search. The filters use a number of external programs (such as Open Office in ‘headless’ mode) to extract the data.
  • The Lucene Redis Codec is a (slightly crazy) experiment in how the Lucene search engine could store indexed data not on disk, but in a database – our intention was to see if frequently-updated data could be changed without Lucene noticing. Here’s what we wrote at the time.
  • There’s also a tool for removing fields from a Lucene index, a prototype web service interface with a JSON API for the Xapian search engine and an early version of a searchable database for historians, but to be honest these are all pre-alpha and didn’t get much further.

If you like any of these tools feel free to explore them further – but remember your hard hat and archaeology tools!

Leave a Reply

Your email address will not be published. Required fields are marked *