In this paper, we are interested in understanding the interrelationships between mainstream and social media in forming public opinion during mass crises, specifically with regard to how events are framed in mainstream news and on social networks, and to how the language used in those frames may allow one to infer political slant and partisanship. We study the linguistic choices used for political agenda setting in mainstream and social media by analyzing a dataset of more than 40M tweets and more than 4M news articles covering the mass protests in Ukraine during 2013-2014, known as "Euromaidan", and the post-Euromaidan conflict between Russian, pro-Russian and Ukrainian forces in eastern Ukraine and Crimea. We design a natural language processing algorithm to analyze, at scale, the linguistic markers that point to a particular political leaning in online media, and show that political slant in news articles and Twitter posts can be inferred with a high level of accuracy. These findings allow us to better understand the dynamics of partisan opinion formation during mass crises and the interplay between mainstream and social media in such circumstances.
Identifying Partisan Slant in News Articles and Twitter during Political Crises
Dmytro Karamshuk (1,2), Tetyana Lokot (3), Oleksandr Pryymak (4), Nishanth Sastry (2)
(1) Skyscanner, (2) King's College London, (3) Dublin City University, (4) Facebook
“A Shared Space & A Space for Sharing”, ESRC project, http://www.space4sharingstudy.org/
2. Identifying partisan slant during political crises
• What are the interrelationships between mainstream media and social networks in shaping public opinion during mass protests and armed conflicts?
• How do propaganda and manipulation work in the information sphere?
• Can we identify and characterize media bias in traditional and social media during conflict?
3. Use case and datasets
Ukrainian Crisis in 2013-2014
• Revolution of Dignity (Euromaidan), Nov '13 – Feb '14
• Annexation of Crimea and armed conflict in Eastern Ukraine, Feb '14 – today
4. A headline from a Russian news agency, 2015
“The country is a madhouse, and people in it are patients”
RIA News, 2015
In contrast to the brainwashed masses, the leaders of the junta understand that the 150,000-strong army of "Ukrainian patriots" is resisting not Russian troops but 20,000 local "separatists", complemented by a couple of thousand (or even fewer) volunteers from nearly a dozen countries around the world, including Russia. The leaders of the junta understand the situation at the front and, in particular, the lack of effective command and control, the arbitrariness in the ranks of the National Guard and Right Sector, the population's hatred of the law enforcement bodies, but, most importantly, the horrible morale of the personnel of the Armed Forces of Ukraine, which is expressed in mass desertion, drunkenness, looting and robbery, none of which reflect the goals of the Ukrainian "revolution."
https://ria.ru/analytics/20150410/1057804681.html
5. Theory of propaganda
The International Encyclopedia of Propaganda (Cole, R. (Ed.), 1998) identifies over 40 kinds of propaganda techniques, corroborated by other media-literacy and media-studies sources. Among them:
• Ad nauseam (insistent ideas)
• Repetition
• Demonizing the enemy
• “Kind words,” Slogans, Euphoria
• Cult of personality
• Linguistic propaganda (verbal control)
• Assigning labels to events/personas
6. Identifying markers of partisan slant
Extract and compare the semantics of words across different sources using the Word2Vec approach
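The slide names Word2Vec, which would normally be trained with a library such as gensim. As a dependency-free stand-in, the sketch below swaps in a simpler count-based representation: each word is described by the words that co-occur with it within a small window, and the same word's nearest neighbours are then compared across two source corpora. Because vectors learned separately per source are not aligned, comparing neighbour sets rather than raw vectors is one way to contrast usage. The window size, `k`, the Jaccard overlap score and the toy sentences are all illustrative assumptions, not the authors' settings.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def nearest_neighbours(vectors, word, k=3):
    """The k words whose context vectors are most similar to that of `word`."""
    target = vectors.get(word, Counter())
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # by similarity, ties alphabetical
    return [other for _, other in scored[:k]]


def neighbour_overlap(corpus_a, corpus_b, word, k=3):
    """Jaccard overlap of a word's neighbourhoods in two corpora: 1.0 = same usage, 0.0 = disjoint."""
    a = set(nearest_neighbours(cooccurrence_vectors(corpus_a), word, k))
    b = set(nearest_neighbours(cooccurrence_vectors(corpus_b), word, k))
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical tokenized sentences standing in for two partisan sources.
source_a = [["junta", "shelled", "donbass"], ["militia", "defend", "donbass"]]
source_b = [["terrorists", "shelled", "donbas"], ["army", "defend", "donbas"]]
print(nearest_neighbours(cooccurrence_vectors(source_a), "shelled"))
print(neighbour_overlap(source_a, source_b, "shelled"))
```

A low overlap for a shared term suggests the sources embed it in different contexts, which is the kind of difference in word choice the slide is after; with real Word2Vec embeddings the same neighbour comparison applies unchanged.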
7. Predicting slant – Machine learning approach
Select top media sources in Ukraine and Russia:
• Top-30 Russian online news sources
• Top-5 Ukrainian online news sources
Manually classify them as:
• Russian independent
• Russian pro-government
• Ukrainian
Train a supervised learning model:
• use text from news articles as features
• predict which of the three parties an article's news source belongs to
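The slides do not name the model or the exact text features, so the following is only a sketch of the pipeline's shape under stated assumptions: every article inherits the manually assigned class of its source, and a multinomial Naive Bayes classifier over bag-of-words (a swapped-in, deliberately simple choice) predicts that class from the article text. The class names and toy articles are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)          # class priors (article counts)
        self.word_counts = defaultdict(Counter)    # per-class word frequencies
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def log_posterior(self, tokens, label):
        score = log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = self.totals[label] + len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are ignored
                score += log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, tokens):
        return max(self.doc_counts, key=lambda c: self.log_posterior(tokens, c))


# Hypothetical articles, each labelled with the class of the source it came from.
train = [
    (["junta", "kiev", "punitive", "operation"], "ru_pro_government"),
    (["militia", "junta", "donbass"], "ru_pro_government"),
    (["ato", "terrorists", "donbas"], "ukrainian"),
    (["ato", "occupied", "crimea"], "ukrainian"),
    (["protest", "opposition", "crimea", "annexation"], "ru_independent"),
    (["annexation", "sanctions", "opposition"], "ru_independent"),
]
model = NaiveBayes().fit([tokens for tokens, _ in train], [label for _, label in train])
print(model.predict(["junta", "militia"]))
```

Labelling articles by their source is what makes the approach "coarse-grained": individual articles are never annotated, only the outlets.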
8. Problem with this approach
We need to make sure that the language patterns we learn are partisan, not source-specific
• Examples of markers specific to individual news sources
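One simple way to surface such source-specific markers, offered here as an illustrative diagnostic rather than the authors' procedure, is to measure how concentrated each word is in a single outlet: a word that almost only one source uses (a byline, a house phrase) lets a classifier recognise the outlet instead of its slant. The threshold, minimum count and outlet names below are hypothetical.

```python
from collections import Counter, defaultdict


def source_concentration(articles):
    """For each word, (top source, share of the word's occurrences coming from that source).

    `articles` is a list of (source, tokens) pairs.
    """
    per_word = defaultdict(Counter)
    for source, tokens in articles:
        for w in tokens:
            per_word[w][source] += 1
    result = {}
    for w, by_source in per_word.items():
        top_source, top_count = by_source.most_common(1)[0]
        result[w] = (top_source, top_count / sum(by_source.values()))
    return result


def source_specific_markers(articles, threshold=0.9, min_count=2):
    """Frequent words used (almost) exclusively by one source, sorted alphabetically."""
    counts = Counter(w for _, tokens in articles for w in tokens)
    concentration = source_concentration(articles)
    return sorted(
        w for w, (_, share) in concentration.items()
        if share >= threshold and counts[w] >= min_count
    )


# Hypothetical outlets: "junta" is shared by both, "byline-a" only appears in outlet-a.
articles = [
    ("outlet-a", ["junta", "byline-a"]),
    ("outlet-a", ["militia", "byline-a"]),
    ("outlet-b", ["junta", "militia"]),
]
print(source_specific_markers(articles))
```

Words shared across outlets of the same party survive this filter, while outlet fingerprints are flagged.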
9. Identifying markers of partisan slant
• Discourage learning patterns specific to individual news agencies by modifying the objective function
• Use take-one-source-out cross-validation, where we test on a news source that was not seen during training
(Figure panels: machine learning results; control for source-specific bias)
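The modified objective function is not described on the slide, so only the take-one-source-out evaluation is sketched here. Each fold holds out every article from one source, trains on the rest, and scores accuracy on the unseen source; a model that merely memorised outlet-specific vocabulary should do poorly on such folds. The keyword-vote classifier is a hypothetical placeholder for any model with `fit`/`predict` callables.

```python
from collections import Counter


def leave_one_source_out(articles):
    """Yield (held_out_source, train, test) splits; `articles` are (source, label, tokens)."""
    for held_out in sorted({source for source, _, _ in articles}):
        train = [a for a in articles if a[0] != held_out]
        test = [a for a in articles if a[0] == held_out]
        yield held_out, train, test


def evaluate(articles, fit, predict):
    """Accuracy per held-out source. `fit(train)` returns a model; `predict(model, tokens)` a label."""
    scores = {}
    for held_out, train, test in leave_one_source_out(articles):
        model = fit(train)
        correct = sum(predict(model, tokens) == label for _, label, tokens in test)
        scores[held_out] = correct / len(test)
    return scores


# Hypothetical placeholder model: per-class word counts, predict the class with most matches.
def fit_keyword_votes(train):
    votes = {}
    for _, label, tokens in train:
        votes.setdefault(label, Counter()).update(tokens)
    return votes


def predict_keyword_votes(votes, tokens):
    return max(sorted(votes), key=lambda label: sum(votes[label][w] for w in tokens))


articles = [
    ("outlet-a", "pro_government", ["junta", "byline-a"]),
    ("outlet-b", "pro_government", ["junta", "militia"]),
    ("outlet-c", "ukrainian", ["ato", "byline-c"]),
    ("outlet-d", "ukrainian", ["ato", "occupied"]),
]
print(evaluate(articles, fit_keyword_votes, predict_keyword_votes))
```

This is the same idea as grouped cross-validation (e.g. scikit-learn's `LeaveOneGroupOut` with the source as the group); it is written out by hand here only to keep the sketch dependency-free.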
10. How about social media?
We can reason about bias on Twitter by looking at news reposts, but only a small share of users repost news articles.
An average Twitter user is exposed to a variety of news sources, BUT with a clear partisan focus.
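Reasoning about a user's exposure from news reposts can be sketched as follows, assuming each repost's URL has already been expanded (i.e. it is not a t.co short link) and that news domains map to the manually assigned classes. The domains, class names and the 0.5 cut-off for a "clear partisan focus" are hypothetical placeholders.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from news domains to their manually assigned partisan class.
SOURCE_CLASS = {
    "outlet-a.example": "ru_pro_government",
    "outlet-b.example": "ru_independent",
    "outlet-c.example": "ukrainian",
}


def partisan_exposure(reposted_urls, source_class=SOURCE_CLASS):
    """Share of a user's news reposts coming from each partisan class.

    URLs from unknown domains are ignored, so a user who never reposts a known
    news source gets an empty result.
    """
    classes = Counter()
    for url in reposted_urls:
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if domain in source_class:
            classes[source_class[domain]] += 1
    total = sum(classes.values())
    return {label: count / total for label, count in classes.items()} if total else {}


def dominant_leaning(reposted_urls, min_share=0.5):
    """The class with the largest share if that share exceeds `min_share`, else None."""
    shares = partisan_exposure(reposted_urls)
    if not shares:
        return None
    label, share = max(shares.items(), key=lambda item: item[1])
    return label if share > min_share else None


urls = [
    "https://www.outlet-a.example/news/1",
    "https://outlet-a.example/news/2",
    "https://outlet-c.example/news/3",
    "https://unrelated.example/post",
]
print(partisan_exposure(urls), dominant_leaning(urls))
```

Users with no known reposts yield `None`, which is exactly the coverage gap that motivates predicting leaning from the rest of a user's tweets.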
11. Predicting political leaning in social media
Supervised learning model to identify the political leaning of a Twitter user based on the content of his/her tweets
• use the content of all tweets from a user profile (except news reposts) as features
• use the leaning implied by the news the user reposts as the prediction target
There is an identifiable difference between the posts of Twitter profiles exposed to different partisan media
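Turning one user's timeline into a training example for this setup might look like the sketch below. It assumes tweets are plain strings with expanded URLs, that a tweet linking to a known news domain counts as a news repost, and that the label is the majority class among those reposts; the domain map and tweets are hypothetical. `\w+` matches Cyrillic letters as well, so the same tokenisation works for Russian and Ukrainian text.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"\w+")


def domain_of(url):
    """Lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def user_example(tweets, source_class):
    """Return (features, label) for one user, or None if the user has no news reposts.

    label: majority partisan class of the news sources the user reposts.
    features: bag of words from the user's other tweets (news reposts excluded).
    """
    label_votes, features = Counter(), Counter()
    for text in tweets:
        domains = [domain_of(url) for url in URL_RE.findall(text)]
        classes = [source_class[d] for d in domains if d in source_class]
        if classes:
            label_votes.update(classes)  # news repost: used only for the label
        else:
            features.update(w.lower() for w in WORD_RE.findall(URL_RE.sub(" ", text)))
    if not label_votes:
        return None
    return features, label_votes.most_common(1)[0][0]


# Hypothetical domain map and timeline.
SOURCE_CLASS = {"outlet-a.example": "ru_pro_government", "outlet-c.example": "ukrainian"}
tweets = [
    "Glory to the heroes",
    "https://outlet-c.example/news/1 must read",
    "No to war https://t.co/abc",
]
print(user_example(tweets, SOURCE_CLASS))
```

The resulting `(features, label)` pairs can be fed to any text classifier; keeping reposts out of the features prevents the model from simply reading the label off the shared links.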
12. Conclusions
• We have shown how to measure differences in word choice between partisan media during conflicts
• We trained a supervised machine learning model to recognize media bias in both traditional and social media
• Even such a "coarse-grained" approach of labelling news agencies can perform reasonably well in identifying political leaning
Dmytro Karamshuk
follow me on Twitter: @karamshuk