Search Solutions is one of my favourite search events of the year – small, focused and varied, with presentations from both the largest and smallest players in the world of search, drawn from both industry and academia.
This year’s event started with Edgar Meij of Bloomberg, whom Flax have helped in the past with their large-scale search and alerting systems. I’d seen most of the details in this talk before, so I won’t dwell on them, but I will thank Bloomberg again for their commitment and contributions to the open source community, particularly to Solr and our Luwak stored search library.
Mark Fea of LexisNexis was up next with a talk about taxonomies and how they have built a semi-automated classification system combining supervised machine learning with Boolean rules. This is a pragmatic way to get the strengths of both: machine learning isn’t always as clever as one might want, and Boolean rules can be hard to build and maintain. Like Bloomberg, they are working at large scale: Mark mentioned taxonomies of 21,000 terms and 9 levels, applied to over 1 billion documents.
Mark Harwood of Elastic followed with one of his always-fascinating talks on discovering unknown patterns in data with Elasticsearch. He showed how he had explored ‘toxic’ content (far-right music and those who like it) and fake reviews on Amazon, with some great visual demonstrations. An interesting conclusion was that ‘bad actors’ make strange, recognisable shapes in visualised data. (Mark later won the Best Presentation award, which was richly deserved!)
Anna Kolliakou of King’s College London spoke next on ‘veracity intelligence’ tools that help monitor terms connected to mental health across news media and social networks. One interesting example was ‘mephedrone’ around the time this recreational drug was reclassified. Next up was independent consultant Phil Bradley with a detailed, well-researched and passionate talk on fake news and why no web search engine can be trusted to present the full picture. Phil is clearly extremely concerned about this issue, and his talk spurred discussion amongst the audience about how user education is essential to counter the common viewpoint of ‘it’s on Google, it must be true’.
The last scheduled talk was from Mark Stanger of Search Technologies (recently acquired by Accenture), about their work on Elsevier’s DataSearch platform. He described how they developed a Phrase Service that identifies phrases in the user’s query using various methods, including acronym detection, dictionary lookup and natural language processing, and then expands these phrases as necessary to enhance search. Once identified, these key terms can be boosted appropriately in the search query (DataSearch itself is based on Solr).
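To make the general idea concrete, here is a minimal Python sketch. It is not Search Technologies’ implementation, and it covers only two of the methods mentioned (dictionary lookup and acronym expansion, with no NLP). It assumes a tiny hypothetical phrase dictionary (`PHRASES`) and acronym table (`ACRONYMS`). Detected phrases are quoted and boosted using standard Lucene/Solr query syntax (`"..."^boost`), and acronyms are OR-ed with their expansions. The function names and boost value are illustrative assumptions.

```python
import re

# Hypothetical vocabularies for illustration only: a real phrase service would
# load curated dictionaries and could also apply NLP-based phrase detection.
PHRASES = {
    ("climate", "change"),
    ("gene", "expression"),
    ("sea", "surface", "temperature"),
}
ACRONYMS = {"sst": ("sea", "surface", "temperature")}
MAX_PHRASE_LEN = max(len(p) for p in PHRASES)


def tokenize(query: str) -> list[str]:
    """Lowercase the query and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def detect_phrases(tokens: list[str]) -> list[tuple[str, tuple[str, ...]]]:
    """Segment tokens into (surface text, expansion) units using greedy
    longest-match dictionary lookup plus acronym expansion."""
    units = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so longer phrases win.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in PHRASES:
                match = candidate
                break
        if match:
            units.append((" ".join(match), match))
            i += len(match)
        elif tokens[i] in ACRONYMS:
            # Keep the acronym itself and record its expanded phrase.
            units.append((tokens[i], ACRONYMS[tokens[i]]))
            i += 1
        else:
            units.append((tokens[i], (tokens[i],)))
            i += 1
    return units


def build_query(query: str, phrase_boost: float = 2.0) -> str:
    """Rewrite a user query into Lucene/Solr query syntax, quoting and
    boosting detected phrases and OR-ing acronyms with their expansions."""
    clauses = []
    for surface, expansion in detect_phrases(tokenize(query)):
        if len(expansion) == 1:
            clauses.append(surface)
            continue
        phrase = f'"{" ".join(expansion)}"^{phrase_boost:g}'
        if surface == " ".join(expansion):
            clauses.append(phrase)
        else:
            clauses.append(f"({surface} OR {phrase})")
    return " ".join(clauses)
```

For example, `build_query("Climate change and SST data")` returns `"climate change"^2 and (sst OR "sea surface temperature"^2) data`. Greedy longest-match is used here only because it is simple. A production service would probably need ambiguity resolution and a way to tune boosts against relevance judgements.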
The DataSearch project is impressive, and it later won the Best Search Project award (I am proud to say I served on the judging panel for these awards this year). The award for Most Promising Search Startup went to Search|hub by CXP Commerce Experts GmbH.
We finished with some lightning talks and a brief Fishbowl session, dominated this time by discussions of fake news and how it affects the world of search technology. Thanks again to the BCS IRSG for a fascinating and enlightening day.