Stephen Arnold recently posted some rather impressive performance figures for Autonomy’s IDOL search engine. This kind of data is all very well, but without independent testing and more detail it’s hard to know how these figures apply to the real world.
So here’s an idea. Why not create an openly available collection of test data, a set of searches and a set of conditions, then compare how the various available engines perform at indexing and searching? The software and hardware used would be recorded as well, of course. Making the data and conditions public would allow for independent verification.
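To make that a little more concrete, here’s a rough sketch of what such a harness might look like in Python. Everything in it is an assumption for illustration: the `SearchEngine` interface, the `ToyEngine` stand-in and the `run_benchmark` function are hypothetical names, not any vendor’s API, and a real comparison would need an adapter per engine plus a far larger public corpus and query set.

```python
import json
import platform
import time
from typing import Dict, List, Protocol, Set


class SearchEngine(Protocol):
    """Hypothetical adapter interface; each real engine would need its own wrapper."""

    name: str

    def index(self, docs: Dict[str, str]) -> None: ...

    def search(self, query: str) -> List[str]: ...


class ToyEngine:
    """Stand-in engine (a naive inverted index) so the harness runs end to end."""

    name = "toy-inverted-index"

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = {}

    def index(self, docs: Dict[str, str]) -> None:
        # Map every lowercased term to the set of documents containing it.
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.postings.setdefault(term, set()).add(doc_id)

    def search(self, query: str) -> List[str]:
        # Return documents containing all query terms (boolean AND), sorted by id.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


def run_benchmark(engine: SearchEngine, docs: Dict[str, str], queries: List[str]) -> dict:
    """Time indexing and searching, and record the environment alongside the results."""
    start = time.perf_counter()
    engine.index(docs)
    index_seconds = time.perf_counter() - start

    results = {}
    start = time.perf_counter()
    for query in queries:
        results[query] = engine.search(query)
    search_seconds = time.perf_counter() - start

    return {
        "engine": engine.name,
        "documents": len(docs),
        "queries": len(queries),
        "index_seconds": index_seconds,
        "search_seconds": search_seconds,
        "results": results,
        # Software and hardware details, so others can reproduce the run.
        "environment": {
            "python": platform.python_version(),
            "system": platform.platform(),
            "machine": platform.machine(),
            "processor": platform.processor(),
        },
    }


if __name__ == "__main__":
    # Tiny made-up corpus and query set; a real test would publish both.
    docs = {
        "d1": "open source search engine",
        "d2": "commercial search vendor",
        "d3": "open test data",
    }
    queries = ["search", "open search", "missing"]
    print(json.dumps(run_benchmark(ToyEngine(), docs, queries), indent=2))
```

The point of returning the results and the environment together with the timings is that anyone could rerun the same data and searches on their own hardware and check both the speed and the answers.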
I’m not sure commercial search vendors would ever agree to this, but it’s a nice idea.