Open source search engines and programming languages

So you’re writing a search-related application in your favourite language, and you’ve decided to choose an open source search engine to power it. So far, so good – but how are the two going to communicate?

Let’s look at two engines, Xapian and Lucene, and compare how this might be done. Lucene is written in Java, Xapian in C/C++ – so if you’re using those languages respectively, everything should be relatively simple – just download the source code and get on with it. However if this isn’t the case, you’re going to have to work out how to interface to the engine.

The Lucene project has been rewritten in several other languages: for C/C++ there’s Lucy (which includes Perl and Ruby bindings), for Python there’s PyLucene, and there’s even a .Net version called, not surprisingly, Lucene.NET. Some of these ‘ports’ of Lucene are ‘looser’ than others (i.e. they may not share the same API or feature set), and they may not be updated as often as Lucene itself. There are also versions in Perl, Ruby, Delphi or even Lisp (scary!) – there’s a full list available. Not all are currently active projects.

Xapian takes a different approach, with only one core project, but a sheaf of bindings to other languages. Currently these bindings cover C#, Java, Perl, PHP, Python, Ruby and Tcl – but interestingly these are auto-generated using the Simplified Wrapper and Interface Generator or SWIG. This means that every time Xapian’s API changes, the bindings can easily be updated to reflect this (it’s actually not quite that simple, but SWIG copes with the vast majority of code that would otherwise have to be manually edited). SWIG actually supports other languages as well (according to the SWIG website, “Common Lisp (CLISP, Allegro CL, CFFI, UFFI), Lua, Modula-3, OCAML, Octave and R. Also several interpreted and compiled Scheme implementations (Guile, MzScheme, Chicken)”) so in theory bindings to these could also be built relatively easily.

There’s also another way to communicate with both engines, using a search server. SOLR is the search server for Lucene, whereas for Xapian there is Flax Search Service. In this case, any language that supports Web Services (you’d be hard pressed to find a modern language that doesn’t) can communicate with the engine, simply passing data over the HTTP protocol.

One thought on “Open source search engines and programming languages

  1. What’s more, at least in the Solr world, there are client libraries for all kinds of languages that one can use to communicate to Solr. Those libraries tend to hide all the HTTP stuff, making it even easier to integrate Solr.

Leave a Reply

Your email address will not be published. Required fields are marked *