The 21st century has brought a boundless stream of headlines, articles, and stories. This influx of information, however, is partially contaminated: Alongside factual, truthful content sits fallacious, deliberately manipulated material from dubious sources. According to research by the European Research Council, one in four Americans visited at least one fake news article during the 2016 presidential campaign.
This problem has recently been exacerbated by something called “automatic text generators.” Advanced artificial intelligence software, like OpenAI’s GPT-2 language model, is now being used for things like auto-completion, writing assistance, summarization, and more, and it can also be used to produce large amounts of false information, fast.
To mitigate this risk, researchers have recently developed automatic detectors that can identify this machine-generated text.
However, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) found that this approach was incomplete.
To prove this, the researchers developed attacks that they showed could fool state-of-the-art fake-news detectors. Because the detector assumes human-written text is real, an attacker can cleverly (and automatically) impersonate such text. And because the detector assumes machine-generated text is fake, it can also be forced to falsely condemn totally legitimate uses of automatic text generation.
But how can the attackers automatically produce “fake human-written text”? If it’s human-written, how can it be automatically produced?
The team came up with the following strategy: Instead of generating the text from scratch, they used the abundance of existing human-written text, but automatically corrupted it to alter its meaning. To maintain coherence, they used a GPT-2 language model when performing the edits, demonstrating that its potential malicious uses are not limited to generating text.
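To make the strategy concrete, here is a toy sketch of such meaning-flipping edits. In the actual attack a GPT-2 language model proposes and scores the edits so the result stays fluent; this illustration substitutes a small hand-made replacement table, and the function and table names are hypothetical:

```python
# Toy illustration of corrupting human-written text by flipping its meaning.
# The real system uses GPT-2 to propose and rank fluent edits; this
# hand-made replacement table is a stand-in for that step.

REPLACEMENTS = {
    "increased": "decreased",
    "won": "lost",
    "confirmed": "denied",
}

def corrupt(sentence: str) -> str:
    """Swap pivotal words so the claim's meaning flips while most of the
    original human-written wording and style is preserved."""
    out = []
    for word in sentence.split():
        core = word.strip(".,").lower()
        punct = word[len(word.rstrip(".,")):]  # keep trailing punctuation
        if core in REPLACEMENTS:
            out.append(REPLACEMENTS[core] + punct)
        else:
            out.append(word)
    return " ".join(out)

print(corrupt("The senator confirmed the report."))
# prints "The senator denied the report."
```

Because nearly all of the output is the original human-written wording, a detector that keys on machine-generated style sees nothing suspicious, even though the claim now says the opposite of the source.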
“There’s a growing concern about machine-generated fake text, and for a good reason,” says CSAIL PhD student Tal Schuster, lead author on a new paper on their findings. “I had an inkling that something was lacking in the current approaches to identifying fake information by detecting auto-generated text: Is auto-generated text always fake? Is human-generated text always real?”
In one experiment, the team simulated attackers using the same kind of auto-completion writing-assistance tools that legitimate sources use. The difference lies in verification: the legitimate source checks that the auto-completed sentences are correct, whereas the attackers check that they’re incorrect.
For example, the team used an article about NASA scientists describing the collection of new data on coronal mass ejections. They prompted a generator to produce information on how this data is useful. The AI gave an informative and fully correct explanation, describing how the data will help scientists to study the Earth’s magnetic fields. Nevertheless, it was identified as “fake news.” The fake news detector could not differentiate fake from real text if they were both machine-generated.
“We need to have the mindset that the most intrinsic ‘fake news’ characteristic is factual falseness, not whether or not the text was generated by machines,” says Schuster. “Text generators don’t have a specific agenda; it’s up to the user to decide how to use this technology.”
The team notes that, since the quality of text generators is likely to keep improving, the legitimate use of such tools will most likely increase as well, which is another reason why we shouldn’t “discriminate” against auto-generated text.
“This finding of ours calls into question the credibility of current classifiers in being used to help detect misinformation in other news sources,” says MIT Professor Regina Barzilay.
Schuster and Barzilay wrote the paper alongside Roei Schuster from Cornell Tech and Tel Aviv University, as well as CSAIL PhD student Darsh Shah.
Bias in AI is nothing new: our stereotypes, prejudices, and partialities are known to seep into the data our algorithms hinge on. A sampling bias could ruin a self-driving car trained with too little nighttime data, and a prejudice bias could unconsciously reflect personal stereotypes. Since predictive models learn from whatever data they’re given, biased data can leave them unable to tell what’s true or false.
With that in mind, in a second paper, the same team from MIT CSAIL used the world’s largest fact-checking dataset, Fact Extraction and VERification (FEVER), to develop systems to detect false statements.
FEVER has been used by machine learning researchers as a repository of true and false statements, matched with evidence from Wikipedia articles. However, the team’s analysis showed staggering bias in the dataset: bias that could cause errors in models trained on it.
“Many of the statements created by human annotators contain giveaway phrases,” says Schuster. “For example, phrases like ‘did not’ and ‘yet to’ appear mostly in false statements.”
One bad outcome is that models trained on FEVER viewed negated sentences as more likely to be false, regardless of whether they were actually true.
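One simple way to surface such giveaway phrases is to measure how strongly each phrase predicts a label. The sketch below is a simplified stand-in for the statistic used in the team’s analysis; the tiny dataset and function names are illustrative, and a real analysis would run over all FEVER claims:

```python
from collections import Counter

# Tiny stand-in dataset; FEVER labels claims SUPPORTS or REFUTES.
claims = [
    ("Adam Lambert did not win.", "REFUTES"),
    ("The film did not premiere in 2010.", "REFUTES"),
    ("Paris is the capital of France.", "SUPPORTS"),
    ("The album was released in 1999.", "SUPPORTS"),
]

def giveaway_scores(claims, label):
    """For each bigram, estimate P(label | bigram): how strongly the
    phrase alone predicts a class, with no evidence consulted."""
    by_bigram, by_bigram_label = Counter(), Counter()
    for text, lab in claims:
        toks = text.lower().strip(".").split()
        for bg in zip(toks, toks[1:]):
            by_bigram[bg] += 1
            if lab == label:
                by_bigram_label[bg] += 1
    return {bg: by_bigram_label[bg] / n for bg, n in by_bigram.items()}

scores = giveaway_scores(claims, "REFUTES")
print(scores[("did", "not")])  # prints 1.0: "did not" occurs only in REFUTES claims
```

A model trained on such data can reach high accuracy by memorizing these phrase-label correlations rather than by checking claims against evidence, which is exactly the failure mode described above.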
“Adam Lambert does not publicly hide his homosexuality,” for instance, would likely be declared false by fact-checking AI, even though the statement is true, and can be inferred from the data the AI is given. The problem is that the model focuses on the language of the claim, and doesn’t take external evidence into account.
Another problem of classifying a claim without considering any evidence is that the exact same statement could be true today, but be considered false in the future. For example, until 2019 it was true to say that actress Olivia Colman had never won an Oscar. Today, this statement could be easily refuted by checking her IMDb profile.
With that in mind, the team created a dataset that corrects some of this through de-biasing FEVER. Surprisingly, they found that the models performed poorly on their unbiased evaluation sets, with results dropping from 86 percent to 58 percent.
“Unfortunately, the models seem to overly rely on the bias that they were exposed to, instead of validating the statements in the context of given evidence,” says Schuster.
Armed with the debiased dataset, the team developed a new algorithm that outperforms previous ones across all metrics.
“The algorithm down-weights the importance of cases with phrases that were specifically common with a corresponding class, and up-weights cases with phrases that are rare for that class,” says Shah. “For example, true claims with the phrase ‘did not’ would be up-weighted, so that in the newly weighted dataset, that phrase would no longer be correlated with the ‘false’ class.”
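Shah’s description can be sketched as inverse-frequency example weighting. This is a hedged toy version, not the paper’s exact algorithm; the function name and data are illustrative:

```python
from collections import Counter

def reweight(examples):
    """examples: list of (phrase, label) pairs. Weight each example
    inversely to how predictive its phrase is of its own label, so that
    phrase-label correlations wash out in the weighted dataset."""
    phrase_counts = Counter(p for p, _ in examples)
    pair_counts = Counter(examples)
    weights = []
    for phrase, label in examples:
        p_label_given_phrase = pair_counts[(phrase, label)] / phrase_counts[phrase]
        weights.append(1.0 / p_label_given_phrase)
    return weights

data = [
    ("did not", "REFUTES"),
    ("did not", "REFUTES"),
    ("did not", "REFUTES"),
    ("did not", "SUPPORTS"),  # the rare true claim containing "did not"
]
w = reweight(data)
# The lone SUPPORTS example gets weight 4.0 and each REFUTES example
# gets 4/3, so the weighted mass of each (phrase, label) pair is equal:
# "did not" no longer favors the "false" class.
```

Under this weighting, a model can no longer reduce its training loss simply by associating “did not” with falsehood; it has to find signal elsewhere, ideally in the evidence.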
The team hopes that, in the future, combining fact-checking into existing defenses will make models more robust against attacks. They aim to further improve existing models by developing new algorithms and constructing datasets that cover more types of misinformation.
“It’s exciting to see research on detection of synthetic media, which will be an increasingly key building block of ensuring online security going forward as AI matures,” says Miles Brundage, a research scientist at OpenAI who was not involved in the project. “This research opens up AI’s potential role in helping to address the problem of digital information, by teasing apart the roles of factual accuracy and provenance in detection.”
A paper on the team’s contribution to fact-checking, based on debiasing, will be presented at the Conference on Empirical Methods in Natural Language Processing in Hong Kong in October. Schuster wrote the paper alongside Shah, Barzilay, Serene Yeo from DSO National Laboratories, MIT undergraduate Daniel Filizzola, and MIT postdoc Enrico Santus.
This research is supported by Facebook AI Research, which granted the team the Online Safety Benchmark Award.