Is There a Language of Disinformation?

It would be great if computers could detect fake news and disinformation automatically and neatly classify information into true and false. But is this a realistic task?

In their recent book The Language of Fake News (Cambridge University Press, series “Elements in Forensic Linguistics”), Jack Grieve and Helena Woodfield argue that it is possible to identify a distinctive linguistic “signature” of fake news by looking at the grammar of a text: the proportions of nouns, different verb forms and other categories.

This sounds very promising. But what exactly is fake news? In many recent studies, which usually apply machine-learning algorithms, fake news is simply false news, or misinformation. Under that definition, even someone who spreads errors unintentionally is producing fake news. For Grieve and Woodfield, by contrast, fake news is first and foremost deliberate disinformation.

The authors take an ingenious approach. Instead of trying to identify linguistic properties across different people (which creates many problems), they focus on the linguistic properties of fake and real news written by one and the same author. They investigate articles written by Jayson Blair of The New York Times. A promising young journalist, Blair had a spectacular career in the early 2000s. However, a range of factual errors and similarities to colleagues’ texts raised suspicions about his articles. After the rise came a bitter fall. Blair had to resign. Several investigations showed that he had concocted scenes, fabricated comments and creatively “borrowed” material from colleagues. Based on these reports, Grieve and Woodfield were able to gather thirty-six fake articles and twenty-eight real articles authored by Blair. The availability of true and fake news written by the same person is very fortunate because it allows the authors to take the journalist’s individual style out of the equation and focus on the stylistic differences between fake and real news.

And the differences are numerous. They can be interpreted along two major dimensions. First, Blair’s real articles tend to have higher information density, packing more detailed information into a limited space than the fake ones. In particular, his real articles have, on average, longer words, more nouns (including words formed with the suffixes -tion, -ment, -ity) and more gerunds (ing-forms serving as nouns). Blair’s real articles also have more to-infinitives and participles following the nouns they modify (e.g., a plot to extort money; orders filed at courts), as well as time adverbials specifying when the events being reported took place (now, today, afterwards).

In contrast, Blair’s fake articles are less informationally dense and more evaluative and informal. For example, he uses more pronouns (I, you, it, this), present-tense verbs, adjectives, emphatics (very, extremely) and downtoners (nearly, slightly) in the fake articles than in the real ones.

The second dimension has to do with stance – how Blair expresses his attitude towards the information he is communicating. According to the authors, Blair’s fake articles are written with less conviction and precision than the real ones. For example, they contain more agentless passives, which obscure who actually did something (e.g., he had been denied his rights to counsel – by whom?), and downtoners (words like almost, nearly and only), which express uncertainty.
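To make the idea concrete, here is a toy sketch of how one might count surface markers like these in a text. The word lists and feature set below are my own illustrative assumptions for this post – Grieve and Woodfield’s actual analysis relies on full part-of-speech tagging of grammatical categories, not simple word lists.

```python
import re

# Illustrative word lists standing in for grammatical categories
# (assumed for this sketch; not the authors' operationalisation).
PRONOUNS = {"i", "you", "it", "this"}
EMPHATICS = {"very", "extremely", "really"}
DOWNTONERS = {"almost", "nearly", "only", "slightly"}

def surface_features(text: str) -> dict:
    """Return crude per-token proportions of a few stylistic markers."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens) or 1  # guard against division by zero on empty input
    return {
        "avg_word_length": sum(len(t) for t in tokens) / n,
        "pronoun_rate": sum(t in PRONOUNS for t in tokens) / n,
        "emphatic_rate": sum(t in EMPHATICS for t in tokens) / n,
        "downtoner_rate": sum(t in DOWNTONERS for t in tokens) / n,
    }
```

With such a function, one could compare the feature profiles of two sets of texts by the same author – which is exactly the logic, if not the method, of the book’s design.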

This sounds really exciting. But how generalisable are these results? Can we really believe that low informativeness and low precision are universal distinguishing features of disinformation? When it comes to political propaganda, I am not so sure. Political lies can be presented in a pretty dense, nouny style. Think of “denazification” and “demilitarisation”, the goals Putin announced for the “special military operation”. Nor does Putin have any trouble displaying strong conviction when he wants to.

Even if we assume that different liars have different linguistic “signatures”, I have doubts that Grieve and Woodfield’s variationist approach is feasible when it comes to political propaganda. First of all, in most situations we do not know whether a politician himself believes what he says. Does Putin know that he is lying, or does he sincerely believe what he says? We may never know.

Does Putin believe in what he says?

Moreover, clever politicians and journalists seldom lie in the strict sense of the word. Their preferred method is framing – choosing “a few elements of perceived reality and assembling a narrative that highlights connections among them to promote a particular interpretation” (Entman 2007). For example, Russian propagandists represent Ukraine not as an agent, but as a mere sphere of influence, belonging either to Russia or to the USA. Obviously, this perspective is wrong, but it is rarely expressed directly. Instead, very subtle means like grammatical case are used for this purpose, as described in this post. This means that universal linguistic indicators of political disinformation, if they exist, are extremely hard to find.
