Posts Tagged ‘Add new tag’

Hiring

We’re finding more and more clients interested in the advantages of a powerful open source enterprise search engine. Thus, we’re looking at expanding the team – can you help?

Posted in Business

August 4th, 2009

Enterprise search – for free

We recently helped a small marine consultancy, running a Windows network, implement a completely free enterprise search solution. Even SMEs are now finding it hard to keep on top of the information they produce, and there are few low-cost options for searching their documents. Read the case study here (PDF).

Posted in Business

July 10th, 2009

Xapian compared

Vik Singh has been comparing various open source solutions for search. He only spent a weekend performing the comparison, which is probably not enough time to get any search software performing at its best, and his results reflect this. Xapian was marked down for being slow at indexing (he says 5x slower than SQLite – but then again, SQLite isn’t a search engine, it’s an RDBMS, and really isn’t suitable for search applications) and for producing large index files, much bigger than Lucene’s.

The reason for this is that Xapian stores different information to Lucene. For example, the full term list (un-inverted index) is retained, which makes it possible to do relevance feedback. Also, Lucene handles deletes by maintaining a separate list of deleted documents, which is merged at the next optimise step – which means that the internal statistics are wrong until this point, and that updates can be more complicated, as an updated document needs a new ID.
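To make the trade-off concrete, here is a toy sketch in Python. It is not how Xapian or Lucene actually store anything: the ToyIndex class, its method names and the simple counting used to pick expansion terms are all invented for illustration. It keeps a per-document term list alongside the inverted index, which costs extra space but allows both relevance feedback and immediate, statistics-preserving deletes.

```python
from collections import Counter, defaultdict


class ToyIndex:
    """Hypothetical inverted index that also keeps a per-document term list."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.termlists = {}               # doc id -> Counter of that document's terms

    def add(self, doc_id, text):
        terms = Counter(text.lower().split())
        self.termlists[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        # The term list says exactly which postings to update, so document
        # frequencies are correct as soon as the delete returns.
        for term in self.termlists.pop(doc_id):
            self.postings[term].discard(doc_id)

    def doc_freq(self, term):
        return len(self.postings.get(term, ()))

    def expand(self, relevant_ids, query_terms, limit=3):
        """Suggest extra query terms taken from documents judged relevant."""
        counts = Counter()
        for doc_id in relevant_ids:
            counts.update(self.termlists[doc_id].keys())
        for term in query_terms:
            counts.pop(term, None)
        # Most frequent first, ties broken alphabetically for a stable order.
        return sorted(counts, key=lambda t: (-counts[t], t))[:limit]


index = ToyIndex()
index.add(1, "xapian search engine library")
index.add(2, "lucene search engine library")
index.add(3, "sqlite relational database")

suggestions = index.expand([1, 2], ["search"])
freq_before = index.doc_freq("search")
index.delete(2)
freq_after = index.doc_freq("search")
```

Without the termlists dictionary the index would be smaller, but neither expand nor an in-place delete could be written this simply.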

Neither approach is wrong and both have advantages – Lucene certainly has smaller index files. Some judicious use of the XAPIAN_FLUSH_THRESHOLD environment variable, as suggested in some of the comments on the article, would certainly have sped up Xapian indexing. We can also look forward to the release of the new Xapian ‘Chert’ backend, which will produce indexes at least 50% smaller than the current ‘Flint’ backend. It’s also hard to say how important index sizes are in these days of cheap storage.
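As a rough sketch of how the flush threshold might be applied, XAPIAN_FLUSH_THRESHOLD is read from the environment, so it can be set for the indexing process without changing the indexer itself. The helper name indexer_env and the value 100000 are only illustrative assumptions (a sensible value depends on available memory), and the child process below is a stand-in that merely echoes the variable where a real indexer command would go.

```python
import os
import subprocess
import sys


def indexer_env(flush_threshold):
    """Return a copy of the current environment with the flush threshold set."""
    env = dict(os.environ)
    env["XAPIAN_FLUSH_THRESHOLD"] = str(flush_threshold)
    return env


# Stand-in for a real indexer command: a child process that just reports
# the value it was started with.
child = [sys.executable, "-c",
         "import os; print(os.environ['XAPIAN_FLUSH_THRESHOLD'])"]
completed = subprocess.run(child, env=indexer_env(100000),
                           capture_output=True, text=True, check=True)
seen = completed.stdout.strip()
```

Setting the variable per process like this keeps the change local to one indexing run, which makes it easy to try a few values and compare timings.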

On the search side, Xapian performed comparably to Lucene in terms of relevance and search speed (both were ahead of all the other solutions on these metrics, especially SQLite). There are some other metrics he quoted, such as a ‘support’ figure, given as a score out of 5, which he admits is entirely subjective – you’d have to ask our customers about that one! There’s also no comparison of features, ease of integration or scalability to very large collections.

We’ve talked before about performance metrics. Vik should be applauded for his article and for releasing his test framework as open source; hopefully this can be a foundation for some more in-depth studies.

Perl client for Flax Search Server

Flax Search Server now has a Perl client, thanks to the guys at Cognidox, who have blogged about why they needed to improve the search facility for their powerful document management system.

Posted in Uncategorized

July 1st, 2009

Python and Flax presentation

My colleague Richard Boulton will be presenting at Europython in Birmingham, U.K. next week, specifically at 15.30 on Tuesday 30th June – an abstract is available. He’ll be talking about Xapian, Xappy and Flax, and showing examples of these in action, including one using a Django integration layer.

Update: you can now download the slides for Richard’s talk in OpenOffice format.

Posted in Uncategorized

June 25th, 2009

Please don’t compete!

Microsoft have been asking open source companies not to compete on cost, but rather on value, according to ZDNet. Unfortunately the response to this hasn’t exactly been positive, as CNET reports. I doubt many open source vendors will be taking much notice of what Microsoft would like them to do, and suspect they will happily continue to make the point that if customers are looking at buying software & services, taking the cost of software completely out of the equation is almost certain to save them money.

Posted in Business, News

April 21st, 2009

More on performance metrics

Anurag Goel recently carried out a comparative test of Xapian/Flax and Lucene/Solr. Some interesting results here: it seems Lucene is faster at building indexes, but Xapian is faster and possibly more accurate at searching. We can expect some further speed improvements over the next few months as a new, more compact backend to Xapian is released.

By the way, the article mentions Xappy: this is a Python interface to Xapian that is a major part of our Flax enterprise search platform. You can get Xappy here.

Posted in Technical

March 13th, 2009

2 Comments »

Performance metrics

Stephen Arnold recently posted some rather impressive performance figures for Autonomy’s IDOL search engine. This kind of data is all very well, but without independent testing and more detail it’s hard to know how these figures apply to the real world.

So here’s an idea. Why not create an openly available collection of test data, a set of searches and a set of conditions, then compare the performance of the various available engines for indexing and searching? Recording the software and hardware used as well, of course. Making the data and conditions public would allow for independent verification.
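To give a flavour of what such a shared test might look like, here is a minimal sketch in Python. Everything in it is hypothetical: the ToyEngine stand-in, the run_benchmark function and the three-document collection are invented for illustration, and a real comparison would plug in actual engines and a far larger public data set. The point is only that the same documents, the same queries and a record of the conditions go into one publishable report.

```python
import json
import platform
import time


class ToyEngine:
    """Stand-in engine: a dictionary-based inverted index, purely for illustration."""
    name = "toy"

    def __init__(self):
        self.postings = {}

    def index(self, documents):
        for doc_id, text in documents.items():
            for term in set(text.lower().split()):
                self.postings.setdefault(term, set()).add(doc_id)

    def search(self, query):
        # Return ids of documents containing every query term.
        sets = [self.postings.get(t, set()) for t in query.lower().split()]
        return sorted(set.intersection(*sets)) if sets else []


def run_benchmark(engine, documents, queries):
    """Time indexing and searching, and record the conditions of the run."""
    start = time.perf_counter()
    engine.index(documents)
    index_seconds = time.perf_counter() - start

    start = time.perf_counter()
    results = {q: engine.search(q) for q in queries}
    search_seconds = time.perf_counter() - start

    return {
        "engine": engine.name,
        "documents": len(documents),
        "queries": len(queries),
        "index_seconds": index_seconds,
        "search_seconds": search_seconds,
        "results": results,
        "python": platform.python_version(),
        "system": platform.system(),
        "machine": platform.machine(),
    }


documents = {
    1: "open source enterprise search",
    2: "commercial enterprise search",
    3: "open data",
}
queries = ["enterprise search", "open"]
report = run_benchmark(ToyEngine(), documents, queries)
print(json.dumps(report, indent=2))
```

Publishing the report alongside the documents and queries would let anyone rerun the test on their own hardware and check both the timings and the returned results.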

I’m not sure commercial search vendors would ever agree to this, but it’s a nice idea.

Posted in Technical

March 4th, 2009

1 Comment »