Search Anything – indexing obscure formats
Data used today can be held in any of a myriad of places: for example, relational databases such as Oracle or SQL Server, file servers, intranets and third-party applications. To make all this data searchable, it must first be extracted, then formatted and finally indexed.
Using Flax's flexible open source model, we can swiftly develop connectors to all kinds of datastores. These modules can sit close to the data (for example, on the file server itself) and send extracted data to the indexing processes, or they can crawl external sources such as websites, taking account of security protocols, maximum retry intervals and exclusions. We use standards such as SQL and ODBC for databases, HTTP for crawling and XML for data transfer.
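As a rough illustration of the extract-and-transfer step, the sketch below reads rows from a database with SQL and packages them as XML for an indexing process. It is a minimal example under stated assumptions, not Flax's actual connector code: the `rows_to_xml` function, the `documents`/`document`/`field` element names and the `articles` table are all hypothetical, and an in-memory SQLite database stands in for a production datastore.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query):
    """Run a SQL query and return the result rows as an XML string."""
    cursor = conn.execute(query)
    # Column names come from the query result, so no schema is hard-coded.
    columns = [col[0] for col in cursor.description]
    root = ET.Element("documents")  # hypothetical element names
    for row in cursor:
        doc = ET.SubElement(root, "document")
        for name, value in zip(columns, row):
            field = ET.SubElement(doc, "field", name=name)
            field.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Stand-in datastore: an in-memory SQLite database with one sample row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, title TEXT)")
conn.execute("INSERT INTO articles VALUES (1, 'Annual report')")

xml_payload = rows_to_xml(conn, "SELECT id, title FROM articles")
print(xml_payload)
```

Because the field names are taken from the query result, the same function could in principle be pointed at a different table without changes; a real connector would also need to deal with character encodings, large result sets and access permissions.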
Once the data is extracted, it must be indexed correctly: for example, a date must be stored in a consistent, sortable form so that end users can restrict their search to a date range. Special features of the data (for example, geographical location) may also have to be captured.
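To show why date handling matters, here is a small sketch of one common approach: dates arriving in different source formats are normalised to a `YYYYMMDD` string, so that a date range becomes a simple string comparison. The list of accepted formats, the function names and the tiny dictionary used as an index are assumptions made for illustration only.

```python
from datetime import datetime

# Assumed source formats; a real connector would list whatever its data uses.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def to_sortable(raw):
    """Normalise a date string to YYYYMMDD so that string order is date order."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


def in_range(index, start, end):
    """Return the ids of documents dated between start and end, inclusive."""
    lo, hi = to_sortable(start), to_sortable(end)
    return sorted(doc for doc, key in index.items() if lo <= key <= hi)


# Toy index: document id -> normalised date.
index = {
    "doc1": to_sortable("2009-03-14"),
    "doc2": to_sortable("25/12/2008"),
    "doc3": to_sortable("20100101"),
}

matches = in_range(index, "2009-01-01", "2009-12-31")
print(matches)
```

Had the dates been indexed as raw text, "25/12/2008" would sort after "2009-03-14" and the range search would return the wrong documents.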
A key point is the 'freshness' of the data: if any of the source data changes, indexing must be repeated to take account of the change. Frequent re-indexing is resource-intensive and can affect search performance, so a balance must be found between the two.
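One way to keep the cost of re-indexing down is to re-index only what has changed. The sketch below is an assumption about how this could be done rather than a description of Flax's own method: it keeps a SHA-256 fingerprint of each document from the previous run and reports only the documents whose content now differs. The function names and the in-memory dictionaries are hypothetical.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest of a document's bytes."""
    return hashlib.sha256(content).hexdigest()


def changed_documents(source, seen):
    """Return ids of new or modified documents and record their fingerprints.

    `source` maps document id -> current content (bytes).
    `seen` maps document id -> fingerprint from the previous run; it is
    updated in place so the next run compares against this one.
    """
    changed = []
    for doc_id, content in source.items():
        digest = fingerprint(content)
        if seen.get(doc_id) != digest:
            changed.append(doc_id)
            seen[doc_id] = digest
    return sorted(changed)


seen = {}
source = {"a.txt": b"first draft", "b.txt": b"price list"}

first_run = changed_documents(source, seen)   # everything is new
source["a.txt"] = b"second draft"             # one document is edited
second_run = changed_documents(source, seen)  # only the edit is reported
print(first_run, second_run)
```

This sketch does not detect deleted documents, and it still has to read every document to compute its fingerprint; checking modification times first would be a cheaper, if less reliable, filter.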
Our Flax Filters modules allow us to read common formats using open standards, and we are constantly adding new formats to those available. If you have an old or obscure format, contact us and we'll find a way to translate it.