Why good statistics are so important for fact checkers
Fact checkers rely on data from national statistics institutes to do their work in fighting misinformation. The Open Data Institute (ODI) and Full Fact interviewed fact checkers around the world about the statistics data they use. In this blog post, Andrew Dudfield, Head of Product at Full Fact, presents the high-level insights from these interviews.
One of the keys to fact checking bad information is easy access to high-quality information. Statistics – especially official statistics – are one of the most important sources of this information for fact checkers everywhere.
Statistics are at the heart of many of the claims that we fact check at Full Fact. They form the foundation for so much discourse in society and are a vital component of how we hold people to account.
Never has that been clearer than during the current pandemic. We’ve all been bombarded by new sets of official data and metrics – the R rate, number of infections per 100,000 people – and in our fact checking we’ve also found areas where the lack of high-quality data creates a space for bad information to thrive.
We can make global fact checking dramatically faster by improving the way statistics are published.
In 2019 Full Fact, Chequeado, Africa Check and the Open Data Institute (ODI) were awarded a grant by Google.org as part of the global AI impact challenge programme. The grant has allowed us to accelerate our work in helping to understand where technology, and especially AI, can support the work of fact checkers.
As part of this work, we have been approaching three topics:
- Helping identify the right things to fact check each day
- Fact checking things as quickly as possible
- Finding repeats of things we have already fact checked
Between October and December 2020, the ODI and Full Fact conducted 15 interviews with fact checkers, statisticians and technologists to understand how statistics can best be published to help fact checkers.
The participants were from eight different countries across five different continents, and consisted of:
- nine fact checkers (or people who manage fact checkers)
- three technologists who work in fact checking organisations
- three people who are experts on publishing of national statistical data (but are not fact checkers)
What we found from our research
The importance of context
Understanding where the data came from, how it was made, and if there were things to watch out for is often extremely important in the work of fact checkers. Fact checkers often also include the context in the fact check they create.
Extra context is sometimes needed to show readers whether they should trust the numbers published by national statistics institutes. It also helps to show the shortcomings of the data, where they exist. This in turn helps build readers' trust in the fact checkers.
Knowing if ‘experimental’ methods are used – such as modelling or sampling – is helpful, and especially relevant when more data science or machine learning approaches are used by the national statistics institute.
Data reliability for fact checking
A crucial emerging theme was whether a fact checker can depend on the data.
The timeliness of data – how recent the information is – is considered a high priority. However, it is often lacking in countries or departments with fewer resources.
Many fact checkers said it was very important to link data across time and make comparisons with previous years. A long historical series where the methodology hasn’t changed gives a sense of stability and makes the data feel more reliable.
Comparability was also mentioned quite often – fact checkers want to be able to compare data on a similar topic from two different places or two different times.
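One common way to make figures from two places comparable is to convert raw counts into a rate, such as the "per 100,000 people" measure mentioned earlier. The sketch below is illustrative only: the place names, case counts and populations are invented, and `rate_per_100k` is a hypothetical helper rather than anything used by the interviewees.

```python
def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Invented figures for two hypothetical places (not real data).
places = {
    "Place A": {"cases": 500, "population": 250_000},
    "Place B": {"cases": 900, "population": 1_200_000},
}

# Place B has more cases in absolute terms, but a lower rate once
# population size is taken into account.
rates = {
    name: rate_per_100k(figures["cases"], figures["population"])
    for name, figures in places.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 100,000 people")
```

A comparison like this only holds if both places count the same thing in the same way, which is why the context around the numbers matters as much as the numbers themselves.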
Whether it helps to have multiple organisations publishing data on a similar topic depends on the country. While some found it good to have a single source of truth, others found it useful to have multiple organisations to back up or add more trust to the numbers.
Data from other organisations can complement data published by national statistics institutes, or can provide a safety net if the government service goes down. However, when data and information are spread across multiple organisations, it creates a challenge when it comes to big structural issues like Covid-19, climate change, or the environment. This is less of an issue for specific, focused issues or topics.
Publishing and formatting matters
CSV files were by far the most requested format for publishing data – but are not always in ready supply for fact checkers.
Fact checkers often have to ‘unbundle’ data from the medium it was published in, for example flat images or PDFs. They often then create their own spreadsheets or datasets with the data to do their own analysis.
National statistics institutes should publish related spreadsheets alongside the report in which they include data.
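To illustrate why a spreadsheet or CSV published alongside a report saves time, here is a minimal sketch of the kind of analysis that becomes a few lines of code once the data is machine readable. The column names (`year`, `value`, `unit`) and all figures are assumptions made up for this example, not a real published dataset.

```python
import csv
import io

# Hypothetical CSV content: the columns and figures are invented.
SAMPLE_CSV = """year,value,unit
2018,1200,count
2019,1260,count
2020,1323,count
"""


def read_series(csv_text):
    """Parse a tidy CSV (one row per year) into a {year: value} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["year"]): float(row["value"]) for row in reader}


def year_on_year_change(series):
    """Return the percentage change from each year to the next year present."""
    years = sorted(series)
    changes = {}
    for previous, current in zip(years, years[1:]):
        changes[current] = (series[current] - series[previous]) / series[previous] * 100
    return changes


series = read_series(SAMPLE_CSV)
changes = year_on_year_change(series)

for year, change in changes.items():
    print(f"{year}: {change:+.1f}% on the previous year")
```

The same comparison from a PDF or an image would first require retyping or extracting the table by hand, which is the "unbundling" step described above.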
Not surprisingly, APIs were very popular with technologists, although other tech-savvy fact checkers also called for them.
Conclusion and next steps
The themes from this research were not unexpected, but do neatly show the challenge we face.
Easy access to high-quality statistics makes fact checking easier. Consistent access to high-quality statistics in a way that machines can process means we may be able to use technology to reduce the time it takes to fact check. Having the statistics available in a way that lets machines not just process the numbers, but also understand their context, caveats and complexity, would be game changing.
Based on this, our project collective is now going to produce examples of the kind of changes we think can be made by those publishing statistics to specifically help fact checkers. Our goal is to be practical with this. We want to produce code ranging from showing simple changes in the way a CSV is published to aid our use case, right the way to more sophisticated ideas for how context and caveats can be added to APIs. We will share more of this work, openly, over the rest of this year.
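To give a flavour of what context and caveats in an API could look like, here is a minimal sketch. It is purely illustrative and is not the project's code: the response structure, every field name (`method`, `experimental`, `caveats` and so on) and all values are hypothetical.

```python
import json

# Hypothetical API response: every field name and value is invented.
PAYLOAD = json.loads("""
{
  "indicator": "example_rate",
  "value": 4.2,
  "unit": "percent",
  "period": "2020-Q3",
  "source": "Example National Statistics Institute",
  "method": {"type": "survey_sample", "experimental": true},
  "caveats": [
    "Methodology changed in 2019; earlier figures are not directly comparable."
  ]
}
""")


def describe(stat):
    """Build a one-line summary that carries the caveats along with the number."""
    parts = [
        f"{stat['indicator']}: {stat['value']} {stat['unit']} "
        f"({stat['period']}, {stat['source']})"
    ]
    # Flag experimental methods, such as modelling or sampling, explicitly.
    if stat.get("method", {}).get("experimental"):
        parts.append("experimental method: " + stat["method"]["type"])
    parts.extend(stat.get("caveats", []))
    return " | ".join(parts)


summary = describe(PAYLOAD)
print(summary)
```

The design choice in this sketch is that the caveats travel in the same response as the figure, so a tool reading the number cannot easily lose the context a fact checker would need.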