Last year we published The State of Automated Factchecking, a roadmap that sets out that automated factchecking is within our sights. Not the far flung Skynet future, but rather automating parts of the factchecking process using technology that’s available now. We are developing tools to help us scale, target, and evaluate our factchecking work.
So when Flax, the open source search experts who also run the Solr/Lucene meetup, suggested we run a hackathon together, we jumped at the chance. We picked a date, got a venue (thanks Facebook!) and got to work.
The big problem we were trying to solve was: how can we spot claims that we’ve already checked, but in new places? And can we do it in real-time?
We might factcheck a claim that one of our factcheckers spotted in a newspaper. But that claim might have appeared elsewhere in the media and political sphere: how can we then monitor and spot all those other instances? It's good to take one inaccurate instance of a claim out of circulation - but better if all instances are put to bed.
We also want to improve our live factchecking systems, so that instead of having to rely on cumbersome spreadsheets detailing all recent factchecks, we can build a system that will find matches of claims we've checked (and the accompanying verdicts) straight away.
There are many parts to this problem. It’s by no means simple, but for the hackathon we decided to break it down into three key areas where we needed help:
- Real-time search
- Pre-processing of numbers
- Stacked tokens
Luwak is a stored query engine developed by Flax: it lets you match a rapid stream of documents against stored queries to find whatever you might be looking for. In our case: factual claims.
The hackathon team made it possible for us to deploy Luwak as a component of our automated factchecking tools. This means we can process information in real-time for live factchecking.
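To illustrate the stored-query idea behind Luwak - register the queries (our checked claims) once up front, then stream documents past them - here is a minimal Python sketch. It is not the Luwak API (Luwak is a Java library built on Lucene); the `ClaimMonitor` class, its method names, and the simple substring matching are all illustrative assumptions.

```python
import re

class ClaimMonitor:
    """Toy sketch of a stored-query engine: queries are registered
    once, then each incoming document is matched against all of them.
    (Illustrative only - not the real Luwak/Lucene API.)"""

    def __init__(self):
        self.queries = {}  # query id -> compiled pattern

    def register(self, query_id, phrase):
        # Store each checked claim as a case-insensitive phrase query.
        self.queries[query_id] = re.compile(re.escape(phrase), re.IGNORECASE)

    def match(self, document):
        # Return the ids of every stored query found in this document.
        return [qid for qid, pattern in self.queries.items()
                if pattern.search(document)]

monitor = ClaimMonitor()
monitor.register("claim-42", "crime is rising")

# A line of live subtitles streaming past the stored claims:
subtitles = "The Minister said crime is rising across the country."
print(monitor.match(subtitles))  # -> ['claim-42']
```

A real engine inverts this loop for speed - it indexes the queries so each document only touches the queries that could possibly match - but the external behaviour is the same: documents stream in, matching claim ids come out.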
For us this was a complete game changer. It means that we can start to build Full Fact Live. Full Fact Live will be a tool that takes a live stream, like the subtitles on Prime Minister’s Questions, and highlights claims that we’ve factchecked before. This means we can react faster when it matters most.
Huge thanks to the Luwak team at the hackday which included: Alan, Michael S., Jean-Francois, Michael K., Oliver, Emmanuel, Tom & Gerry. (Yes, really! We made sure they sat next to each other.)
We need the ability to tell our factchecking tools that “eleven million” is the same as “11 million” is the same as “11,000,000”. Converting words to numbers sounds like an easy problem, but as the pre-processing team found out, it isn’t. The best code they could find didn’t support fractions, or ordinal words like “first” or “second”.
Why is pre-processing important? It means that we can match much more accurately with much less human effort than was possible before.
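A hypothetical sketch of that normalisation step, in Python: map spelled-out numbers to digits so “eleven million”, “11 million” and “11,000,000” all compare equal. The `normalise_number` helper and its tiny vocabulary are assumptions for illustration; a production version would also need fractions and ordinals, which is exactly where the team found it gets hard.

```python
# Illustrative word-to-number normalisation. The vocabulary is
# deliberately tiny; fractions and ordinals ("first", "second")
# are unsupported, mirroring the gaps the team found.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12}
MULTIPLIERS = {"thousand": 1_000, "million": 1_000_000,
               "billion": 1_000_000_000}

def normalise_number(text):
    """Convert a simple number expression to an int, or None if
    it contains something we can't handle."""
    tokens = text.lower().replace(",", "").split()
    value = 0
    current = 0
    for tok in tokens:
        if tok in UNITS:
            current += UNITS[tok]
        elif tok in MULTIPLIERS:
            current = (current or 1) * MULTIPLIERS[tok]
            value += current
            current = 0
        elif tok.isdigit():
            current += int(tok)
        else:
            return None  # fractions, ordinals, etc. fall through here
    return value + current

print(normalise_number("eleven million"))  # 11000000
print(normalise_number("11 million"))      # 11000000
print(normalise_number("11,000,000"))      # 11000000
```

Once every variant collapses to the same canonical value, claim matching can compare numbers directly instead of relying on exact wording.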
By the end of the day the team had made some important decisions, identified some awkward edge cases, and started to think about how we could adapt the system for other countries. It helped to have Spanish-speaking Pablo Fernandez from Chequeado, the Argentinian factchecking organisation, with us that week!
One of our roadmap principles is ‘Think global’, and from the start we want to think through how to make automated factchecking work in different languages and contexts.
We’d like to thank the brilliant pre-processing team which included Derek, Jenna, Stanislav, Pablo, Phoebe and Andy. For more detail you can read Derek’s blogpost.
This was the most experimental of the projects. We wanted to automatically detect phrases like “something is rising”, where “something” is a noun phrase or similar - for example, “crime is rising”.
This is phase two of our plans, not just being able to identify claims we’ve factchecked already, but being able to spot claims that we can automatically check too.
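The idea behind stacked tokens can be sketched as follows, under some illustrative assumptions: a word-class token (here `NOUN`) is emitted at the same position as the surface word, so a stored pattern like `["NOUN", "is", "rising"]` matches “crime is rising”. The hard-coded `NOUNS` set stands in for a real part-of-speech tagger, and the brute-force matcher stands in for what a search engine like Solr would do at index time.

```python
# Sketch of "stacked tokens": each position in the token stream can
# hold several tokens at once - the surface word plus its word class.
NOUNS = {"crime", "unemployment", "immigration"}  # stand-in for a POS tagger

def stack_tokens(text):
    """For each word position, return the set of tokens stacked there."""
    stacked = []
    for word in text.lower().split():
        tokens = {word}
        if word in NOUNS:
            tokens.add("NOUN")  # class token stacked at the same position
        stacked.append(tokens)
    return stacked

def matches(pattern, text):
    """Slide the pattern over the stacked stream; each pattern token
    must appear somewhere in the stack at its position."""
    stream = stack_tokens(text)
    n = len(pattern)
    return any(all(pattern[j] in stream[i + j] for j in range(n))
               for i in range(len(stream) - n + 1))

print(matches(["NOUN", "is", "rising"],
              "They claim crime is rising fast"))  # True
```

Because the class token sits at the same position as the word, one stored pattern covers “crime is rising”, “unemployment is rising” and so on, without storing a separate query per noun.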
The team of Solr experts helped us grapple with one of our big future challenges, and gave us a way to work out the next steps towards ever more nuanced kinds of search.
Thanks to Christine, Alessandro, Andy and Periklis for taking this on!
We are so grateful to everyone who attended and made our problems their own. We really didn’t think we could accomplish so much in one day. Huge thanks to the Apache Solr user group and beyond, Facebook who hosted us (and kept us well watered and fed), and special thanks to Flax for organising the day and for their continued support in helping us push forward the boundaries of automated factchecking.