Mis(sing) information: The impact of disappearing archives and data sources

23 July 2025

By Dr Claire Wardle

Dr Claire Wardle is one of the world’s leading authorities on misinformation and disinformation. She co-founded the non-profit First Draft, was Research Director at the Tow Center for Digital Journalism at Columbia University and has recently joined Brown University to work on a new initiative around global health misinformation. She is a trustee of Full Fact.

If I were to start talking about archives, you would probably conjure up an image of a big white building, probably with some columns framing a heavy wooden door. Inside would be a large number of dusty shelves holding up thousands of old books, and a helpful person waiting behind a desk to help you find what you are looking for.

If we were to modernise this vision, you would likely focus on large and complex online portals, promising an easy route to the document you seek (even though the reality is often a lengthy and frustrating process of finding a document via a Boolean search query). Archives, whether physical buildings holding stacks of books or the virtual storage of digital documents, are valuable, even magical.

The problem, however, is that we take them for granted, particularly digital archives. It could be because physical archives ‘look’ so permanent, whether it’s the British Library or the US Library of Congress. And then when it comes to digital documents, we have a sense that someone somewhere will be backing them up, as that’s so much easier than storing a physical book on a shelf. We can now just store photos, videos, databases, and documents ‘on the cloud’.

But the truth is that in an age of digital and big data, archives are extremely precarious, and that is causing two interlocking issues that I want to explore here. First, those of us who care about information integrity are not good enough at arguing why we need an accurate record on issues that are contested.

Fact checkers are trying to do that, but with ongoing attacks and resulting funding challenges, this work is under threat.

Second, information sources are disappearing. This is preventing fact checkers, journalists, researchers and policymakers from doing their jobs. While there has been some discussion about this trend, the outcry has been somewhat muted. Many people believe that it’s not really gone because it’s digital.

The current and historical need for a correct record

Fact checking matters for so many reasons, but too often I hear people say, “Well I just don’t know if fact checks change people’s minds.” Yes, fact checks don’t always change people’s minds. Doing that is hard. The truth is that fact checks work for some people in some contexts on some topics. But irrespective of whether people’s minds are changed in the moment, fact checks play two critical roles:

  1. Fact checks mean we have an accurate record of what was said, claimed or shared. They help other fact checkers, journalists, researchers and policymakers do their work, and in the future, these same professionals will be able to analyse claims and data alongside historians.

  2. Fact checks, in their most impactful form, can succeed in pressuring the source that spread the misinformation to correct the record on the platform they first spread it on. In this era of hyper-polluted information systems, this is what we should aim for. Every time.

For fact checkers to do this work, they need access to information, both contemporary and historical. In the era of relatively accessible digital information, this has been quite straightforward. That era, however, is now ending.

Information sources are being removed or are no longer supported

Today, critical information is dispersed all over the web, and we don’t even know how much has disappeared. There are many reasons for this. One is simply the fact that websites die. Organisations or groups no longer exist, and the domain registration fee no longer gets paid.

Another reason is that digital technology evolves, and older versions can no longer be supported - for example, old news stories that were built on the programming language Java.

Another explanation is that platforms die. Content on Twitpic, Vine, and Friendster can no longer be accessed. And finally, and perhaps most concerning, is the pattern of websites or data on websites being removed. We’re seeing in the US right now that the Trump Administration is making significant changes to federal websites, shutting down webpages and removing data sources.

The precarious nature of the storage and accessibility of digital information is having serious consequences for fact checkers, journalists and policymakers today, but also for how historians will make sense of this period.

Whether it’s the disappearance of population health data helping us understand vaccine uptake, the removal of social media posts shared by politicians that they deem potentially harmful, or economic data related to marginalised communities, it’s harder than ever to make sense of what is currently happening and to compare it to previous eras. Too many benchmarks have disappeared.

We need to be talking much more urgently about building ambitious systems for preserving accurate information and making it accessible, whether it’s data from university or government websites, a news website, a think-tank or a fact checking site.

We also need to make people understand why documenting and preserving information is so important. Digital isn’t forever, and so how can people and institutions be trained to ensure that data is being stored, tagged, and if safe, made accessible for transparency?

Organisations should have ongoing training about information storage and accessibility, particularly over time. What is getting saved, and what is getting lost? What disappears when staff leave the job? What disappears during a website redesign? What data goes missing when the operating system is no longer supported?

Fact checkers, journalists and policymakers need access to all of this if they are to correctly analyse trends, patterns and false or misleading claims. And historians need all of this. I think we believe that historians will be drowning in evidence and will be able to piece together today’s events. Yet in an era where more and more governing is happening via disappearing WhatsApp and Signal messages, where data sources that threaten the dominant narratives are silently removed from official websites, and where researchers have their funding cut for studying questions that are out of political favour, the ‘reality’ of 2025 and beyond will look very different for people looking back at this time.

The Internet Archive

At this point, I would like to give a huge shout-out to the Internet Archive. Established in 1996 by Brewster Kahle, its mission is to provide “universal access to all knowledge”. According to their ‘about us’ page, they have archived:

One of the most amazing aspects of the Internet Archive is ‘The Wayback Machine’, which allows you to look at a website as it appeared on a particular date and time. I went to look at how it had archived Full Fact and saw that since 2009, it has ‘captured’ the Full Fact website over 70,000 times. It allows anyone to go and look at the website ‘back in time’. Here’s a random capture from February 2012. You can see how this platform allows a fact checker to go back to a government website to see old data sources that might have been removed.

A screen capture of Full Fact's website in 2012 from The Wayback Machine.
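
For readers who want to try this themselves, here is a minimal sketch of how a capture for a given date can be requested. It assumes the commonly used Wayback Machine address pattern (a 14-digit timestamp followed by the page address) and the Internet Archive's public availability endpoint. The function names are hypothetical helpers written for illustration, and the code only builds the addresses; it does not contact the archive.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed public address patterns for the Wayback Machine; check the
# Internet Archive's own documentation before relying on them.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_timestamp(when: datetime) -> str:
    """Format a date and time as the 14-digit YYYYMMDDhhmmss string."""
    return when.strftime("%Y%m%d%H%M%S")


def snapshot_url(page_url: str, when: datetime) -> str:
    """Build the address of a capture of page_url requested for `when`."""
    return f"{WAYBACK_PREFIX}/{wayback_timestamp(when)}/{page_url}"


def availability_query(page_url: str, when: datetime) -> str:
    """Build a query asking which capture is closest to `when`."""
    params = urlencode({"url": page_url, "timestamp": wayback_timestamp(when)})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


# Example: a capture of the Full Fact site requested for 1 February 2012.
print(snapshot_url("fullfact.org", datetime(2012, 2, 1)))
# → https://web.archive.org/web/20120201000000/fullfact.org
```

Opening the printed address in a browser should show the capture nearest to the requested date, if one exists. The important input is the date itself, which has to come from the person doing the checking.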

The Internet Archive has had to fight legal attacks (under the guise of copyright infringement) and late last year faced a serious cyberattack. But so far they’re still going, and their mission couldn’t be more important. With government websites being purged and critical data sources disappearing overnight, their technology has been a lifeline for fact checkers and journalists trying to make sure they have a complete picture.

We need to ensure that information that already exists is protected, and as a global society, we can’t just rely on a non-profit with a small staff operating out of San Francisco. Protecting the information produced until this point is critical so fact checkers and policymakers can undertake longitudinal analyses, and researchers can assess whether policy changes have made a difference.

The Internet Archive is a godsend, but if we don’t know what was on a website in the first place, we don’t know to look in the archive, and we don’t know what date we need to go back to.

Facts matter. And in order to keep having access to accurate facts, we need access to both old and new data. Without it, evidence can be questioned, and we will have as much mis(sing) information as misinformation.
