ECIR 2017 Industry Day, our book & a demo of live TV factchecking

Charlie Hull — Mon, 24 Apr 2017 13:45:44 +0000

I visited Aberdeen before Easter to speak at Industry Day, part of the European Conference on Information Retrieval. Following a reception at Aberdeen’s Town House (a wonderful building) hosted by the Lord Provost, I spent an evening with various information retrieval luminaries including Professor Udo Kruschwitz of the University of Essex. We had a chance to discuss the book we’re co-authoring (draft title ‘Searching the Enterprise’), which is designed as a review of the subject for those considering a PhD or those in business wanting to know the current state of the art; it should be out later this year. I also caught up with our associate Tony Russell-Rose of UXLabs.

Industry Day started with a talk by Peter Mika of Norwegian media group Schibsted on modelling user behaviour for delivering personalised news. It was interesting to hear his views on Facebook and the recent controversy about their removal of a photo posted by a Schibsted group newspaper, and how this might be a reason Schibsted carry out their own internal developments rather than relying on the algorithms used by much larger companies. Edgar Meij was up next talking about search at Bloomberg (which we’ve been involved in) and it was interesting to hear that they might be contributing some of their alerting infrastructure back to Apache Lucene/Solr. James McMinn of startup Scoop Analytics followed, talking about real-time news monitoring. They have built a prototype system based on PostgreSQL rather than a search engine, indexing around half a billion tweets, that allows one to spot breaking news much earlier than the main news outlets might report it.

The next session started with Michaela Regneri of OTTO on Newsleak.io, a project in collaboration with Der Spiegel “producing a piece of software that allows to quickly and intuitively explore large amounts of textual data”. She stressed how important it is to have a common view of what is ‘good’ performance in collaborative projects like this. Richard Boulton (who worked at Flax many years ago) was next in his role as Head of Software Engineering at the Government Digital Service, talking about the ambitious project to create a taxonomy for all UK government content. So far, his team have managed to create an alpha version of this for educational content – note that they don’t have the time or resources in-house to tag content, so must work with the relevant departments to do so. They have created various software tools to help, including an automatic topic tagger using Latent Dirichlet Allocation – which, given this is the GDS, is of course open source and available.

Unfortunately I missed a session after this due to a phone call, but managed to catch some of Elizabeth Daly of IBM talking about automatic claim detection using the Watson framework. Using Wikipedia as a source, this can identify statements that support a particular claim for an argument and tag them as ‘pro’ or ‘con’. This topic led neatly on to Will Moy of Full Fact, who we have been working with recently, in a ‘sandwich’ session with me. Will talked about how Full Fact has been working for many years to develop neutral, unbiased factchecking tools and services. I then spoke about the hackday we ran recently for Full Fact, and particularly about our Luwak library and how it can be used to spot known claims by politicians in streaming news. Will then surprised me and impressed the audience by showing a prototype service that watches several UK television channels in real time, extracts the subtitles and checks them against a list of previously factchecked claims – using the Luwak backend we built at the hackday. Yes, that’s live factchecking of television news, very exciting!
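For readers new to the idea, here is a minimal sketch in Python of what checking a stream against stored claims means. It is not Luwak's API (Luwak is a Java library built on Lucene) and the class names, claim ids and example sentences are all hypothetical. The point is only that the queries are stored up front and each incoming subtitle line is treated as a tiny document to be matched against them.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredClaim:
    """A previously fact-checked claim, reduced to the terms that must all appear."""
    claim_id: str
    required_terms: frozenset


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ClaimMonitor:
    """Hypothetical stand-in for a stored query engine (not Luwak's real API)."""

    def __init__(self):
        self._claims = []

    def register(self, claim_id, phrase):
        """Store a claim as the set of terms in its phrase."""
        self._claims.append(StoredClaim(claim_id, frozenset(tokenize(phrase))))

    def match(self, subtitle_line):
        """Return the ids of every stored claim whose terms all occur in the line."""
        tokens = tokenize(subtitle_line)
        return [c.claim_id for c in self._claims if c.required_terms <= tokens]


monitor = ClaimMonitor()
monitor.register("crime-rising", "crime is rising")
monitor.register("nhs-funding", "NHS funding cut")

# Made-up subtitle lines standing in for a live feed.
subtitles = [
    "The minister said that crime is rising across the country",
    "And now the weather for the weekend",
]
for line in subtitles:
    print(line, "->", monitor.match(line))
```

A real stored query engine would use proper query parsing, phrase and proximity matching, and some way of avoiding running every stored query against every document; the loop above checks them all, which is fine for a sketch but would not scale.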

Thanks to Professor Kruschwitz and Tony Russell-Rose for putting together the agenda and inviting both me and Will to speak – it was great to be able to talk about the exciting work we’re doing with Full Fact and to hear about the other projects.

The post ECIR 2017 Industry Day, our book & a demo of live TV factchecking appeared first on Flax.

A fabulous FactHack for Full Fact

Charlie Hull — Fri, 27 Jan 2017 10:49:20 +0000

Last week we ran a hackday for Full Fact, hosted by Facebook in their London office. We had planned to gather a room full of search experts from our London Lucene/Solr Meetup, and around twenty people attended from a range of companies including Bloomberg, Alfresco and the European Bioinformatics Institute, among them a number of Lucene/Solr committers.

Mevan Babakar of Full Fact has already written a detailed review of the day, but to summarise we worked on three areas:

Building a web service around our Luwak stored query engine, to give it an easy-to-use API. We now have an early version of this which allows Full Fact to check claims they have previously fact checked against a stream of incoming data (e.g. subtitles or transcripts of political events).
Creating a way to extract numbers from text and turn them into a consistent form (e.g. ‘eleven percent’, ‘11%’, ‘0.11’) so that we can use range queries more easily – Derek Jones’ team researched existing solutions and he has blogged about what they achieved.
Investigating how to use natural language processing to identify parts of speech and tag them in a Lucene index using synonyms and token stacking, to allow for queries such as ‘<noun> is rising’ to match text like ‘crime is rising’ – the team forked Lucene/Solr to experiment with this.
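To illustrate the second of these areas, here is a small Python sketch of turning the three example forms into one consistent value. It is not the code Derek's team wrote; the function name, the tiny number-word vocabulary and the choice of a fraction between 0 and 1 as the common form are all assumptions made for illustration.

```python
import re
from typing import Optional

# Deliberately tiny, illustrative vocabulary; real text needs far wider coverage.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "twenty": 20, "fifty": 50,
}

# Optional trailing percent marker: "%", "percent" or "per cent".
PERCENT_SUFFIX = re.compile(r"(.+?)\s*(%|percent|per cent)")


def to_fraction(expression: str) -> Optional[float]:
    """Normalise a single numeric expression to a plain fraction.

    Percentages are divided by 100; bare numbers are returned unchanged.
    Returns None when the expression cannot be interpreted.
    """
    text = expression.strip().lower()
    match = PERCENT_SUFFIX.fullmatch(text)
    raw = match.group(1) if match else text
    if raw in NUMBER_WORDS:
        value = float(NUMBER_WORDS[raw])
    else:
        try:
            value = float(raw)
        except ValueError:
            return None  # e.g. "several percent"
    return value / 100 if match else value


for form in ["eleven percent", "11%", "0.11"]:
    print(form, "->", to_fraction(form))
```

Once every form maps to the same number, a range query becomes a simple comparison, for example checking that `0.10 <= to_fraction("11%") <= 0.12`.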

We’re hoping to build on these achievements to continue to support Full Fact as they develop open source automated fact checking tools for both their own operations and for other fact checking organisations across the world (there were fact checkers from Argentina and Africa attending to give us an international perspective). Our thanks to all of those who contributed.

I’ve also introduced Full Fact to many others within the search and text analytics community and we would welcome further contributions from anyone who can lend their expertise and time – get in touch if you can help. This is only the beginning!

The post A fabulous FactHack for Full Fact appeared first on Flax.

full fact – Flax
