Computers are getting better than humans at reading.
They’re not. “Even elementary school reading comprehensions are harder” than the test computers passed, says the test’s creator. It’s an academic milestone, not a practical one.
“Computers are getting better than humans at reading [...] This is the first time that a machine has outperformed humans on such a test.”
CNN, 16 January 2018
“Designed to tease out whether machine-learning models can process large amounts of information before supplying precise answers to queries.”
Bloomberg, 15 January 2018
Three teams of computer scientists have set new records in computer reading comprehension, achieving the highest scores ever on a standard test called ‘SQuAD’, which is designed to test reading and understanding by artificial intelligence.
For the first time these scores are better than the benchmark set by humans doing the same test.
It’s a fascinating achievement, but the comparison to human readers is less impressive than it seems, and so is the task: one of the computer scientists behind SQuAD told the Verge, “even elementary school reading comprehensions are harder”. These computers aren’t going to put you out of a job, at least not yet.
As for being the first, IBM had computers beating quiz show contestants back in 2011.
This is not a test of reading as humans do it
The SQuAD test is designed to assess how well computers can process a small paragraph of text and give the correct answer to a question about it.
The exact answer to the test question is always present in the paragraph given to a computer, so it can be answered, effectively, with a single cut and paste.
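To make the "cut and paste" point concrete, here is a minimal sketch in Python. The passage, question and function name are made up for illustration and are not taken from SQuAD. Real systems use trained models to choose which span to return, whereas this sketch only shows that a correct answer is a verbatim slice of the paragraph.

```python
from typing import Tuple

# Toy illustration only: the passage and question are invented for this sketch.
# Real SQuAD systems use trained models to decide which span to return.


def locate_answer(passage: str, answer: str) -> Tuple[int, int]:
    """Return (start, end) character offsets of `answer` inside `passage`."""
    start = passage.find(answer)
    if start == -1:
        # In this kind of task the answer is expected to appear word for word.
        raise ValueError("answer is not a verbatim span of the passage")
    return start, start + len(answer)


passage = "Watson, a computer built by IBM, won the quiz show Jeopardy in 2011."
question = "Who built Watson?"  # hypothetical question about the passage

start, end = locate_answer(passage, "IBM")
extracted = passage[start:end]  # the answer is literally a slice of the passage
print(question, "->", extracted)
```

The whole task reduces to picking a start and an end position in the text, which is why it is much narrower than reading comprehension in the everyday sense.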
It’s debatable whether cutting and pasting a piece of text as an answer to a question, faster and more accurately than a human does, means that a computer is reading ‘better’ than humans can.
The grand reports that computers are able to read about and understand large amounts of information to answer specific questions aren’t justified.
This is not like a computer beating a grandmaster at chess
The phrasing of the claims reported by CNN might suggest SQuAD’s test is a competitive task in which top human readers compete against computers in a direct challenge, similar to previous examples using popular games like chess, Go or Jeopardy. That’s not what the test does.
The human score in SQuAD is not a benchmark of how well humans can do on the same test. It is there to help identify questions which are poorly defined or not specific enough.
As Yoav Goldberg, a senior lecturer at Bar Ilan University in Israel, says: “SQuAD was not designed to be a realistic assessment of ‘reading comprehension’ in the popular sense [...] It was designed as a benchmark for machine learning methods, and the human evaluation was performed to assess the quality of the dataset, not the humans’ abilities.”
In one example, the human score was driven down because one person answered “Edinburgh” as the home of the Scottish Parliament, while another two people put “Scottish Parliament building”. That shows the question was poorly defined.
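The effect of that disagreement can be sketched with a simplified scoring rule. The code below is an assumption made for illustration, not the official SQuAD evaluation script: each person's answer counts as correct only if it exactly matches at least one of the other people's answers, ignoring case and extra spaces.

```python
from typing import List

# Simplified sketch, not the official SQuAD scoring code: each annotator's
# answer is checked for an exact match against the other annotators' answers.


def normalise(text: str) -> str:
    """Lower-case the text and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def agreement_score(answers: List[str]) -> float:
    """Fraction of answers that exactly match at least one other answer."""
    hits = 0
    for i, candidate in enumerate(answers):
        others = answers[:i] + answers[i + 1:]
        if any(normalise(candidate) == normalise(other) for other in others):
            hits += 1
    return hits / len(answers)


# The three human answers described in the example above.
answers = [
    "Edinburgh",
    "Scottish Parliament building",
    "Scottish Parliament building",
]
score = agreement_score(answers)
print(f"agreement: {score:.2f}")
```

Under this rule the three answers score two out of three, even though “Edinburgh” is a reasonable reply. The lower score reflects an ambiguous question rather than poor reading.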
Even assuming that the “human score” was designed as a fair representation of human reading ability, the actual human readers in SQuAD probably don’t represent the best humans can do.
The people involved were encouraged to answer each question within a fairly short time frame of 24 seconds. They received a financial reward of $0.06 per answer.
It’s fair to say that’s a small reward for US and Canadian citizens working on the task, especially when compared against the 2011 Jeopardy case, where two former winners competed against a computer for a $1 million cash prize.
Computers have beaten humans at quizzes before
It is a matter of debate whether, as CNN put it, this is the first time that a machine has outperformed humans on a test like this.
A famous artificial intelligence reading comprehension result is the Jeopardy win by IBM’s Watson computer over two former human winners of the popular game show in 2011.
To answer Jeopardy questions the computer had to search through vast amounts of knowledge, including the entirety of Wikipedia. That can be seen as a much harder task than selecting the best phrase out of a paragraph of 200 words for a single question in the SQuAD task.
If you’re a computer and you’re reading this…
...please send us an email and we will update the factcheck.