We were lucky enough to have two speakers from Cambridge text mining company Linguamatics at last night’s Meetup. Robin Newton kicked us off with an amusing and idiosyncratic view of the uses and mis-uses of search – starting with the problem that when you have text search software, every problem can look like one that search might solve. He gave an example of his recent search for a new job: although matching his skills on paper with a potential employer’s needs is one thing, he also wants to be sure the employer ‘isn’t a crook’! With reference to Tyler Tate’s talks on Information Wayfinding, which in turn quote urban planner Kevin Lynch, Robin told us how he felt that search ‘journeys’ weren’t always the most efficient way to discover an answer: his assertion was that finding a person who could tell you was more useful. Since not all information is held in documents, even in the most efficient and well-run organisation, one might agree that finding an ‘expert’ is the best way to get the answers one needs. He finished with a welcome message that informal networking in pubs and cafes (much like our Meetup) helps share a lot of very useful information – and this is how he eventually decided that Linguamatics was going to be a great place to work.
Next was CTO and co-founder of Linguamatics, Dr David Milward, who described his company’s capability in text mining, Natural Language Processing (NLP) and search. He described the challenges of extracting ‘concepts’ from text – how words and acronyms with multiple potential meanings are difficult to parse automatically without contextual knowledge. Linguamatics’ approach has been described as ‘Agile NLP’ and allows the quick development of new patterns for concept extraction. A powerful example he gave was how, by specifying a relationship between two entities (in this case one company acquiring another), structured data can be extracted from unstructured text. Other examples focused on the medical and bioscience fields (a particular interest of ours at present due to the upcoming BioSolr project) and showed how their software can cluster facts and find connections between disparate pieces of data (‘which X relates to Y via Z’). This process can also be used to generate new facets for searching from free text, including facets for numeric ranges, and these can even be tailored for different user groups. It’s clear that Linguamatics are experts in this area and David’s talk was of great interest to many in the room, including several from the European Bioinformatics Institute.
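For readers who would like to see the acquisition example in concrete terms, here is a minimal Python sketch of pattern-based relation extraction. To be clear, this is not Linguamatics’ software or method, which was not shown in code at the Meetup: the regular expression, the `extract_acquisitions` function and the sample company names are all made up for illustration, and a real system would rely on proper entity recognition and linguistic analysis rather than a single regex.

```python
import re

# Assumption for illustration only: a "company" is a run of capitalised words.
# Real entity recognition is far more involved than this.
COMPANY = r"[A-Z][A-Za-z0-9&]*(?:\s+[A-Z][A-Za-z0-9&]*)*"

# Hypothetical pattern for the relationship "X acquires Y", covering a few
# past-tense verb forms with an optional auxiliary ("has bought").
ACQUISITION = re.compile(
    rf"(?P<acquirer>{COMPANY})\s+(?:has\s+|have\s+)?"
    rf"(?:acquired|bought|purchased)\s+(?P<target>{COMPANY})"
)


def extract_acquisitions(text):
    """Return one structured record per acquisition statement found in text."""
    return [
        {"acquirer": m.group("acquirer"), "target": m.group("target")}
        for m in ACQUISITION.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Acme Corp acquired Widget Works last year. Globex has bought Initech."
    for record in extract_acquisitions(sample):
        print(record)
```

Even a toy like this shows the basic idea: once the relationship is specified as a pattern, each matching sentence becomes a row of structured data (acquirer, target) that could be stored, counted or offered as a search facet.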
We finished with the usual chat, networking and drinks. Thanks to both our speakers – and do let me know if you have a suggestion for a presentation at a future event!