Archive for February, 2009

Lists of search tools and components

Some more resources for those looking for open source search components: Search Components Online has some great lists, including an exhaustive list of filters based on an article by New Idea Engineering called Where Have All The Filters Gone? Filters in this case are defined as components that extract plain text and metadata from file formats such as Adobe PDF or Microsoft Word.

Another, more general list of search engines and technologies can be found at SearchTools.com, although parts of it are a little out of date.

Posted in Technical

February 24th, 2009

No Comments »

Open source data integration and file format translation

One of the challenges we often come up against is indexing data held in other proprietary or open source systems, such as databases or content management systems. Talend is an open source data integration platform that lets you connect to a huge variety of these systems, from Salesforce to Oracle to SugarCRM. Talend is an offshoot of the Eclipse open source community. We’ll be following the development of Talend with interest.

There’s also the related problem of translating file formats before indexing them. Luckily there are lots of open source converters (as used by Omega, part of Xapian), or if you run on a Microsoft platform there are IFilters. The latter aren’t open source, but you can easily connect to them from another program using COM. In our experience, the IFilters are better at extracting content from Microsoft-specific formats.
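As a rough illustration of the converter approach, here is a minimal Python sketch that picks an external command based on file extension and captures the plain text it prints. The `CONVERTERS` table and the `extract_text` function are hypothetical names made up for this example, not part of Omega or Flax, and it assumes the `pdftotext` command-line tool is installed and on the path.

```python
import subprocess
from pathlib import Path

# Hypothetical table mapping a file extension to an external converter command.
# "{path}" is replaced with the file being converted. pdftotext writes the
# extracted text to standard output when the output file is given as "-".
CONVERTERS = {
    ".pdf": ["pdftotext", "{path}", "-"],
}


def extract_text(path, converters=CONVERTERS):
    """Return the plain text of a file, ready to hand to an indexer."""
    path = Path(path)
    suffix = path.suffix.lower()

    # Plain text needs no conversion at all.
    if suffix == ".txt":
        return path.read_text(encoding="utf-8", errors="replace")

    command = converters.get(suffix)
    if command is None:
        raise ValueError(f"no converter configured for {suffix!r}")

    # Build the argument list and run the converter, capturing its output.
    argv = [arg.replace("{path}", str(path)) for arg in command]
    result = subprocess.run(argv, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

Supporting another format is then a matter of adding an entry to the table, which is why this kind of setup is easy to extend as new converters turn up.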

UPDATE: I’ve also recently discovered the Tika project, under the Apache umbrella. Not a lot of formats supported so far, but it’s a start.

Posted in Technical

February 18th, 2009

1 Comment »

Not so FAST…

Microsoft have announced a roadmap for their enterprise search products: none of this is very surprising. How successful they’ll be at integrating the FAST technology (which comes from a Linux background) with SharePoint, .NET etc. remains to be seen. More coverage here.

They’ve also released an ‘Express’ (i.e., free but feature-limited) version of Microsoft Search Server. We’re going to take a deeper look at this soon.

Posted in Uncategorized

February 12th, 2009

No Comments »

More technical details now available

Based on some feedback, we’ve made some more technical details about Flax available on our Features page. You can download the PDF here.

Posted in Technical

February 5th, 2009

No Comments »

Finding search engine people

I’ve spent some time recently trying to find where people gather and discuss different search engine technologies and approaches. There is a Yahoo group which seems friendly and full of useful content, and a group on LinkedIn, a business networking site. Stephen Arnold’s blog is also a mine of information, with profiles of vendors and some very interesting comments on particular technologies. I’ve also found some more blogs which I’ve added to the blogroll on the right.

As we continue to develop Flax, it’s very interesting to hear about customers’ and developers’ experiences with other engines. If you know of any other places to look, please let me know!

Posted in Uncategorized

February 2nd, 2009

1 Comment »