Flax Solutions: built on the best

About the software we use

Flax brings together the best Open Source search-related software, including cutting-edge developments from academic research. Open Source gives us the advantages of flexibility, power and scalability with no licensing costs. We pass on these advantages to the customer while managing integration, providing easy-to-use interfaces and controls, and the assurance of commercial support.

We make no bones about being fans of Open Source. However, we are also experienced with a wide range of proprietary software and are happy to include this in solutions where appropriate. Proprietary and Open Source software can work well together, as we have proved to our customers' satisfaction many times.

Some of the software we use for building solutions is described below. Of course, each situation has different requirements, and will utilise a different mix of technologies.

Lucene/Solr

Apache Lucene™ is a high-performance search engine library written in Java. Apache Solr™ extends Lucene, adding a web API and enterprise features including data schemas, facets, sharding, replication and text extraction. Solr is generally the first choice for the core of a search solution, but there are cases where implementing directly on top of Lucene is more appropriate. We also offer a comprehensive support package.

Apache Lucene and Solr are trademarks of The Apache Software Foundation

elasticsearch

elasticsearch is a distributed, RESTful search engine also based on Apache Lucene. Its features include 'schemaless' operation, a data model based on JSON, easy scalability and multitenancy. We have worked on migration projects from other search engines to elasticsearch, and we also offer a comprehensive support package.

Apache Lucene is a trademark of The Apache Software Foundation

Xapian

Xapian is a fast, adaptable search engine library written in C++, with bindings for a number of languages including Python, Perl, PHP, Ruby and Java. Unlike Lucene, Xapian is based on the probabilistic Bayesian ranking model. Xapian does not require a Java Virtual Machine and can require less memory than Lucene. We're committers to the Xapian project.

Stanford Named Entity Recogniser (NER)

Stanford NER is an open source toolkit from the Stanford Natural Language Processing (NLP) Group. It finds and labels names of people, places, companies etc. in a block of input text. The toolkit is implemented in Java and is fully trainable for different classes and sets of entities. Entity recognition has a wide variety of uses, particularly in the processing of news data.

Scrapy

Originally developed for our client Mydeco by a third party, Scrapy is a powerful and extendable tool for scraping content from websites into predefined fields. Written in Python, Scrapy is easy to extend to new website patterns and can be used for repurposing content, data mining, monitoring and testing.

Stanford Classifier

The Stanford Classifier is another product of the Stanford NLP Group. It uses a probabilistic model to implement classification of text documents, e.g. into subject area. It can also be used for sentiment analysis on text fragments (i.e. is a statement positive or negative).

Luwak for fast stored queries

Luwak is a library from Flax, based on Apache Lucene, that allows for very high-performance application of stored queries. We use it to build media monitoring applications which can apply tens of thousands of stored queries to hundreds of thousands of news stories, every day.
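
To illustrate the idea of stored queries, here is a minimal Python sketch in which the queries are registered first and each incoming document is then matched against all of them. It is not Luwak's API and says nothing about how Luwak is implemented: the names StoredQuery, QueryMonitor, register and match are hypothetical, a query is assumed to be a simple "all of these terms" match, and the term-based pre-filter is just one plausible way to avoid running every query against every document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """Hypothetical stored query: matches when every term appears in the text."""
    query_id: str
    terms: frozenset


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


class QueryMonitor:
    """Holds stored queries and reports which ones match an incoming document."""

    def __init__(self):
        self._queries = {}
        # Maps each term to the ids of the stored queries that contain it.
        self._by_term = {}

    def register(self, query_id, *terms):
        """Store a query that requires all of the given terms."""
        query = StoredQuery(query_id, frozenset(t.lower() for t in terms))
        self._queries[query_id] = query
        for term in query.terms:
            self._by_term.setdefault(term, set()).add(query_id)

    def match(self, text):
        """Return the sorted ids of all stored queries that match the text."""
        tokens = tokenize(text)
        # Pre-filter: only queries sharing at least one term with the
        # document can possibly match, so the rest are never evaluated.
        candidates = set()
        for token in tokens:
            candidates |= self._by_term.get(token, set())
        return sorted(
            qid for qid in candidates if self._queries[qid].terms <= tokens
        )


# Illustrative usage with made-up queries and a made-up news headline.
monitor = QueryMonitor()
monitor.register("flood-alerts", "flood", "warning")
monitor.register("bank-news", "bank", "rates")
print(monitor.match("Flood warning issued for the river bank"))
```

The point of the sketch is the inversion of the usual search flow: the queries are the long-lived data, and each news story is a short-lived input checked against them.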

Clade Taxonomy

Clade is a proof-of-concept taxonomy and classification system from Flax. Built on open source technology, it allows users to import, create and maintain hierarchical structures and automatically classify documents based on keywords.
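
As a rough illustration of keyword-based classification against a hierarchy, the Python sketch below walks a small taxonomy and returns the path of every category whose keywords occur in a document. It is not Clade's code or data model: the Category class, the classify function and the sample taxonomy are all hypothetical, and keywords are assumed to be single lowercase words matched against whitespace-separated tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical taxonomy node with its own keywords and child categories."""
    name: str
    keywords: set = field(default_factory=set)
    children: list = field(default_factory=list)


def classify(text, root):
    """Return the path of every category with at least one keyword in the text."""
    # Naive tokenisation: lowercase and split on whitespace (no punctuation handling).
    tokens = set(text.lower().split())
    matches = []

    def walk(node, path):
        current = path + [node.name]
        if node.keywords & tokens:
            matches.append("/".join(current))
        for child in node.children:
            walk(child, current)

    walk(root, [])
    return matches


# Illustrative taxonomy and document; the categories and keywords are made up.
taxonomy = Category("Science", children=[
    Category("Physics", {"quantum", "particle"}),
    Category("Biology", {"cell", "genome"}, [
        Category("Genetics", {"genome", "dna"}),
    ]),
])
print(classify("New genome sequencing method announced", taxonomy))
```

Returning full paths rather than bare category names keeps the hierarchy visible in the result, so a document can be filed under both a broad category and one of its narrower descendants.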

Flax Access Control

Many intranets and other data repositories require access to documents to be controlled depending on the user's ID and their membership of various groups. These rules are often complex and dynamic, and may be implemented by file system Access Control Lists (ACLs). The Flax Access Control module provides fast, flexible filtering of search results by ACLs and other metadata.
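
The filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Flax Access Control module: the SearchHit class, its allowed_users and allowed_groups fields and the filter_by_acl function are hypothetical, and the sketch filters results after the search has run, whereas a production system may well apply the restriction inside the search engine itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    """Hypothetical search result carrying the ACL entries of its document."""
    doc_id: str
    allowed_users: frozenset = frozenset()
    allowed_groups: frozenset = frozenset()


def filter_by_acl(hits, user_id, user_groups):
    """Keep only hits the user may see, by user ID or by group membership."""
    groups = frozenset(user_groups)
    return [
        hit for hit in hits
        if user_id in hit.allowed_users or groups & hit.allowed_groups
    ]


# Illustrative usage; the document names, users and groups are made up.
hits = [
    SearchHit("budget.xls", allowed_groups=frozenset({"finance"})),
    SearchHit("handbook.pdf", allowed_groups=frozenset({"staff", "finance"})),
    SearchHit("review.doc", allowed_users=frozenset({"alice"})),
]
print([hit.doc_id for hit in filter_by_acl(hits, "bob", {"staff"})])
```

A hit with no ACL entries is visible to nobody in this sketch; denying by default is a deliberate choice for the example, since exposing a document by accident is usually worse than hiding one.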