Open source data integration and file format translation

One of the challenges we often come up against is indexing data held in other proprietary or open source systems, such as databases or content management systems. Talend is an open source data integration platform that lets you connect to a huge variety of these systems, from Salesforce to Oracle to SugarCRM. Talend is an offshoot of the Eclipse open source community. We’ll be following the development of Talend with interest.

There’s also the related problem of translating file formats before indexing them. Luckily there are lots of open source converters (as used by Omega, part of Xapian), or, if you run on a Microsoft platform, there are IFilters. The latter aren’t open source, but you can easily connect to them from another program using COM. In our experience, IFilters are better at extracting content from Microsoft-specific formats.
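To make the converter approach concrete, here is a minimal sketch in Python of how an indexer might hand each file to an external command-line converter and collect plain text. It is an illustration under stated assumptions, not how Omega itself is implemented: the `CONVERTERS` table and the `build_command` and `extract_text` helpers are hypothetical names, and it assumes the `pdftotext` and `antiword` tools are installed and on the path.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to an external converter command.
# Assumes pdftotext and antiword are installed; the trailing "-" asks
# pdftotext to write the extracted text to standard output.
CONVERTERS = {
    ".pdf": lambda p: ["pdftotext", str(p), "-"],
    ".doc": lambda p: ["antiword", str(p)],
}


def build_command(path):
    """Return the converter command for a file, or None if none is known."""
    make = CONVERTERS.get(Path(path).suffix.lower())
    return make(path) if make else None


def extract_text(path):
    """Return plain text ready for indexing, or None if the format is unsupported."""
    path = Path(path)
    if path.suffix.lower() == ".txt":
        # Plain text needs no conversion.
        return path.read_text(encoding="utf-8", errors="replace")
    command = build_command(path)
    if command is None:
        return None
    # Run the converter and capture whatever it prints to standard output.
    result = subprocess.run(command, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

Keeping the extension-to-command table separate from the code that runs it means supporting another format is a one-line change, and unsupported files are skipped rather than causing the indexing run to fail.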

UPDATE: I’ve also recently discovered the Tika project, under the Apache umbrella. Not a lot of formats supported so far, but it’s a start.


2 thoughts on “Open source data integration and file format translation”

  1. I notice it’s been several years since this post was last updated. I’ve found open source systems that can be customised for business CRM use, of the sort provided by crmprogrammer.com, to be especially effective.
