Paper Mills: a new threat to scientific publishing
Guest author
Dorothy V M Bishop, Emeritus Professor of Developmental Neuropsychology, University of Oxford
For many years, the principal problem for journalists covering medical stories was hype. Researchers, or their press offices, would announce a cure for cancer which turned out to be a promising finding in mice, or even an effect on cells in a dish. Another issue has been selective reporting: a study looking at associations between pollutants and health might examine 100 substances but report only on the one that showed an association, a finding very likely to be a false positive. Kristin Sainani and Regina Nuzzo discuss this phenomenon in their Normal Curves podcast, where they analyse a much-hyped study claiming to find differences between US Asian and White populations in how health is affected by drink temperature. So many comparisons were conducted in that study that the few statistically significant results were likely to have arisen by chance.
These days, there's a new problem: scientific fraud. There have always been individuals who invent or prune data to make results more exciting, and such cases tend to hit the press when discovered, particularly if the fraudster is a high-profile scientist. Notable examples in the field of dementia research were covered in Charles Piller’s recent book ‘Doctored’. But we now also have to reckon with paper mills: organisations that sell authorship of fraudulent or low-value articles and guarantee to place them in journals. The price the "author" pays typically depends on the prestige of the journal and the author’s position on the paper.
Paper mills exist purely to make money, and successful ones can make millions. But how does this business model work? Surely, academic journals have editors and peer reviewers who exert quality control on what gets published.
There are various ways in which paper mills bypass the usual barriers to publication. The first is when the fraudulent article is a convincing fake and so passes normal peer review. Jennifer Byrne was working as a cancer biologist when she noticed a flurry of articles relating to a rare gene that she was studying. They were all similar, and the nucleotide sequences they specified did not match the gene they purported to study. These articles looked like sound research, but seemed to have been generated from a template. There are so many genes that have never been studied that it is easy to substitute the gene details from a genuine study with another gene, producing what looks like a plausible new article, often with the same figures and tables. Subsequently, Byrne linked up with Cyril Labbé to perform an automated search of 12,000 human gene papers and found that 700 of them had wrong nucleotide sequences, suggesting they had been generated from a template. Template-based articles are now being seen in other subject areas, and they will get harder to detect now that sophisticated AI can be used to generate them.
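To make the idea of automated screening concrete, here is a minimal sketch in Python of the kind of check involved: compare the nucleotide sequence a paper reports against a reference sequence for the gene it names, and flag mismatches for human review. This is not Byrne and Labbé's actual software; the gene symbols, sequences and paper records below are invented for illustration, and real screening tools work by sequence alignment against public databases rather than exact string matching.

```python
# Illustrative sketch only: flag papers whose reported nucleotide sequence
# does not match the reference sequence of the gene they claim to study.
# All gene symbols, sequences and paper records here are hypothetical.

# Hypothetical reference database: gene symbol -> canonical nucleotide sequence
REFERENCE_SEQUENCES = {
    "GENE_A": "ATGGCGTACGTTAGC",
    "GENE_B": "ATGCCCTTAGGACGA",
}

# Hypothetical records extracted from papers: (paper id, gene claimed, sequence reported)
PAPERS = [
    ("paper_001", "GENE_A", "ATGGCGTACGTTAGC"),  # consistent with the claimed gene
    ("paper_002", "GENE_A", "ATGCCCTTAGGACGA"),  # sequence actually belongs to GENE_B
    ("paper_003", "GENE_C", "ATGTTTTTTTTTGGG"),  # no reference available to check against
]

def check_paper(paper_id, claimed_gene, reported_seq):
    """Return a short verdict on whether the reported sequence fits the claimed gene."""
    reference = REFERENCE_SEQUENCES.get(claimed_gene)
    if reference is None:
        return f"{paper_id}: no reference sequence for {claimed_gene}; needs manual review"
    if reported_seq == reference:
        return f"{paper_id}: sequence consistent with {claimed_gene}"
    return f"{paper_id}: MISMATCH - reported sequence does not match {claimed_gene}"

if __name__ == "__main__":
    for pid, gene, seq in PAPERS:
        print(check_paper(pid, gene, seq))
```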
Another paper mill technique is to work with editors who can be induced to place articles in a journal in return for money. An example was highlighted by Csaba Szabo, who described on my blog how he was approached by an organisation based in China that offered him payment for his assistance in getting articles published. He decided to string them along to find out more about how they operated, and it was clear that he could have made thousands of dollars had he been willing to comply.
The easiest option is for a paper mill to place its own person as a journal editor. There has been massive growth in the number of scientific journals, with some less principled publishers recognising that there is an insatiable demand from authors for publication outlets. This boom in journals creates a demand for editors and peer reviewers to process articles, and it is clear that in many cases there has been little quality control over who does this valuable gatekeeping work.
The big publishers were initially slow to act on paper mills, but these operations clearly pose a huge threat to publishers' credibility. Publishers have now banded together to work with data sleuths and other experts in a group called United2Act, which aims to develop better screening methods to catch paper mill articles early in the publication process.
I have likened paper mills to a virus: their influence can spread rapidly, and they adapt and mutate in response to attempts to check them. As well as publishers screening articles, I think there are other steps that could be effective. Paper mills work because there is a demand for their products, both from authors desperate for publications and from publishers who want to grow their market. There are moves afoot to change the incentive structure in academia so as to reward careful, scholarly work, but in many countries researchers are still judged on quantitative metrics (numbers of publications and citations), and in some countries having publications can be an important route to career progression or to accessing a visa. There is also a move towards more open, transparent research. While this alone would not stop paper mills, it would make it harder to fake data. Traditionally, peer reviews have been confidential, but even if the identity of the reviewer is kept private, it makes sense to make the content of peer reviews openly available as a further check on the integrity of the process.
People sometimes ask whether it matters if paper-milled articles get published. One problem is that, in these days of big data, the scientific literature is automatically scanned to create databases that are used for tasks such as drug development or diagnostics; if these contain fabricated data, the databases will be inaccurate. Similarly, in medicine, many systematic reviews are conducted to integrate all the studies on a topic, and those conducting these reviews report that their efficiency and accuracy are being undermined by large volumes of fraudulent articles. Finally, people are increasingly relying on AI to summarise research literature, but a summary is only as good as the information it synthesises. It’s for all these reasons that I believe there’s an urgent need to tackle the flood of unreliable information so that it does not pollute science.
The views expressed in this article are those of the guest authors.