Last year we published The State of Automated Factchecking, a roadmap that sets out that automated factchecking is within our sights. Not the far flung Skynet future, but rather automating parts of the factchecking process using technology that’s available now. We are developing tools to help us scale, target, and evaluate our factchecking work.
So when Flax, the open source search experts who also run the Solr/Lucene meetup, suggested we run a hackathon together, we jumped at the chance. We picked a date, got a venue (thanks Facebook!) and got to work.
The big problem we were trying to solve was: how can we spot claims that we’ve already checked, but in new places? And can we do it in real-time?
We might factcheck a claim that one of our factcheckers spotted in a newspaper. But that claim might have appeared elsewhere in the media and political sphere: how can we then monitor and spot all those other instances? It's good to take one inaccurate instance of a claim out of circulation - but better if all instances are put to bed.
We also want to improve our live factchecking systems, so that instead of having to rely on cumbersome spreadsheets detailing all recent factchecks, we can build a system that will find matches of claims we've checked (and the accompanying verdicts) straight away.
There are many parts to this problem. It’s by no means simple, but for the hackathon we decided to break it down into three key areas where we needed help:
- Real-time search
- Pre-processing of numbers
- Stacked tokens
Luwak is a stored query engine developed by Flax: it lets you match a rapid stream of documents against stored queries to find whatever you might be looking for. In our case: factual claims.
The hackathon team made it possible for us to deploy Luwak as a component of our automated factchecking tools. This means we can process information in real-time for live factchecking.
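To illustrate the stored-query idea behind Luwak - register the queries (our checked claims) once up front, then stream documents past them - here is a minimal Python sketch. It is not the Luwak API (Luwak is a Java library built on Lucene); the `ClaimMonitor` class, its method names, and the simple substring matching are all illustrative assumptions.

```python
import re

class ClaimMonitor:
    """Toy sketch of a stored-query engine: queries are registered
    once, then each incoming document is matched against all of them.
    (Illustrative only - not the real Luwak/Lucene API.)"""

    def __init__(self):
        self.queries = {}  # query id -> compiled pattern

    def register(self, query_id, phrase):
        # Store each checked claim as a case-insensitive phrase query.
        self.queries[query_id] = re.compile(re.escape(phrase), re.IGNORECASE)

    def match(self, document):
        # Return the ids of every stored query found in this document.
        return [qid for qid, pattern in self.queries.items()
                if pattern.search(document)]

monitor = ClaimMonitor()
monitor.register("claim-42", "crime is rising")

# A line of live subtitles streaming past the stored claims:
subtitles = "The Minister said crime is rising across the country."
print(monitor.match(subtitles))  # -> ['claim-42']
```

A real engine inverts this loop for speed - it indexes the queries so each document only touches the queries that could possibly match - but the external behaviour is the same: documents stream in, matching claim ids come out.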
For us this was a complete game changer. It means that we can start to build Full Fact Live. Full Fact Live will be a tool that takes a live stream, like the subtitles on Prime Minister’s Questions, and highlights claims that we’ve factchecked before. This means we can react faster when it matters most.
Huge thanks to the Luwak team at the hackday which included: Alan, Michael S., Jean-Francois, Michael K., Oliver, Emmanuel, Tom & Gerry. (Yes, really! We made sure they sat next to each other.)
We need the ability to tell our factchecking tools that “eleven million” is the same as “11 million” is the same as “11,000,000”. Converting words to numbers sounds like an easy problem, but as the pre-processing team found out, it isn’t. The best code they could find didn’t support fractions, or ordinal words like “first” or “second”.
Why is pre-processing important? It means that we can match much more accurately with much less human effort than was possible before.
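A hypothetical sketch of that normalisation step, in Python: map spelled-out numbers to digits so “eleven million”, “11 million” and “11,000,000” all compare equal. The `normalise_number` helper and its tiny vocabulary are assumptions for illustration; a production version would also need fractions and ordinals, which is exactly where the team found it gets hard.

```python
# Illustrative word-to-number normalisation. The vocabulary is
# deliberately tiny; fractions and ordinals ("first", "second")
# are unsupported, mirroring the gaps the team found.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12}
MULTIPLIERS = {"thousand": 1_000, "million": 1_000_000,
               "billion": 1_000_000_000}

def normalise_number(text):
    """Convert a simple number expression to an int, or None if
    it contains something we can't handle."""
    tokens = text.lower().replace(",", "").split()
    value = 0
    current = 0
    for tok in tokens:
        if tok in UNITS:
            current += UNITS[tok]
        elif tok in MULTIPLIERS:
            current = (current or 1) * MULTIPLIERS[tok]
            value += current
            current = 0
        elif tok.isdigit():
            current += int(tok)
        else:
            return None  # fractions, ordinals, etc. fall through here
    return value + current

print(normalise_number("eleven million"))  # 11000000
print(normalise_number("11 million"))      # 11000000
print(normalise_number("11,000,000"))      # 11000000
```

Once every variant collapses to the same canonical value, claim matching can compare numbers directly instead of relying on exact wording.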
By the end of the day the team had made some important decisions, identified some awkward edge cases, and started to think about how we could adapt the system for other countries. It helped to have Spanish-speaking Pablo Fernandez from Chequeado, the Argentinian factchecking organisation, with us that week!
One of our roadmap principles is ‘Think global’, and from the start we want to think through how to make automated factchecking work in different languages and contexts.
We’d like to thank the brilliant pre-processing team which included Derek, Jenna, Stanislav, Pablo, Phoebe and Andy. For more detail you can read Derek’s blogpost.
This was the most experimental of the projects. We wanted to automatically detect phrases like “something is rising”, where “something” is a noun phrase or similar - for example, “crime is rising”.
This is phase two of our plans, not just being able to identify claims we’ve factchecked already, but being able to spot claims that we can automatically check too.
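The idea behind stacked tokens can be sketched as follows, under some illustrative assumptions: a word-class token (here `NOUN`) is emitted at the same position as the surface word, so a stored pattern like `["NOUN", "is", "rising"]` matches “crime is rising”. The hard-coded `NOUNS` set stands in for a real part-of-speech tagger, and the brute-force matcher stands in for what a search engine like Solr would do at index time.

```python
# Sketch of "stacked tokens": each position in the token stream can
# hold several tokens at once - the surface word plus its word class.
NOUNS = {"crime", "unemployment", "immigration"}  # stand-in for a POS tagger

def stack_tokens(text):
    """For each word position, return the set of tokens stacked there."""
    stacked = []
    for word in text.lower().split():
        tokens = {word}
        if word in NOUNS:
            tokens.add("NOUN")  # class token stacked at the same position
        stacked.append(tokens)
    return stacked

def matches(pattern, text):
    """Slide the pattern over the stacked stream; each pattern token
    must appear somewhere in the stack at its position."""
    stream = stack_tokens(text)
    n = len(pattern)
    return any(all(pattern[j] in stream[i + j] for j in range(n))
               for i in range(len(stream) - n + 1))

print(matches(["NOUN", "is", "rising"],
              "They claim crime is rising fast"))  # True
```

Because the class token sits at the same position as the word, one stored pattern covers “crime is rising”, “unemployment is rising” and so on, without storing a separate query per noun.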
The team of Solr experts helped us grapple with one of our big future challenges, and gave us a way to work out the next steps towards ever more nuanced kinds of search.
Thanks to Christine, Alessandro, Andy and Periklis for taking this on!
We are so grateful to everyone who attended and made our problems their own. We really didn’t think we could accomplish so much in one day. Huge thanks to the Apache Solr user group and beyond, Facebook who hosted us (and kept us well watered and fed), and special thanks to Flax for organising the day and for their continued support in helping us push forward the boundaries of automated factchecking.