Automated Identification of Framing
by Word Choice and Labeling
to Reveal Media Bias in News Articles
Anastasia Zhukova
Doctoral supervisor: Felix Hamborg
1st examiner: Prof. Dr. Bela Gipp
2nd examiner: Prof. Dr. Karsten Donnay
Date: 2019-03-07
Agenda
1. Introduction
2. Project motivation and research objectives
3. Related work and research gap
4. Word choice and labeling (WCL) analysis system
5. Usability prototype
6. Multi-step merging approach
7. Evaluation results
8. Future work
9. Conclusion
07-Feb-23 Anastasia Zhukova - Automated Identification of Framing by Word Choice and Labeling 2
Introduction
https://tgram.ru/channels/otsuka_bld
• Biased perception of the Russian president depends on how he was framed
Introduction
invasion forces
vs.
coalition forces
heart-wrenching tales of hardship
vs.
information on the lifestyles
http://umich.edu/~newsbias/wordchoice.html
Word Choice (WC)
Labeling (L)
WCL depends on… [1-5]
• actor or perspective selection
• author position
• goal of the message
http://www.anmbadiary.com/2015/04/framing-effect-and-marketing.html
Project motivation
*equal with some degree of approximation
When not identified, WCL influences… [2, 4-6]
• emotion evaluation
• decision making process
• false information propagation
Existing solutions… (cf. [15-17])
• involve manual annotation by social scientists
• automated approaches yield simplistic results
• results are not scalable and not interactive
Project research objectives
RQ: How can we automatically identify instances of bias by WCL referring to the
semantic concepts in a set of English news articles reporting on the same event by
using natural language processing (NLP)?
Research tasks:
1. Design and develop a modular WCL analysis system;
2. Develop a first usability prototype with interactive visualization to explore the results of
WCL analysis;
3. Research, propose, and implement an approach based on the NLP methods to identify
semantic concepts that can be a target of bias by WCL;
4. Conduct an evaluation of the proposed semantic concept identification approach.
Related work and research gap
1. Social science methodology
a. Content analysis [2, 7, 9]
b. Framing analysis [1, 4, 6, 10, 11]
→ effective but manual and time-consuming
2. Automated WCL identification
a. from topic perspective [12, 14-17]
b. from actor perspective [13, 18]
→ require interpretation of the word choice difference
→ no concept-to-concept automatic comparison
3. Natural language processing
a. Named Entity Recognition (NER) (cf. [21])
b. Coreference resolution (cf. [12, 20, 24])
c. Cross-document coreference resolution (cf. [22, 23])
→ do not resolve broad sense anaphora
→ do not analyze difference of word choice
Roadmap
RT1: WCL analysis methodology and system
RT2: Usability prototype
RT3: Candidate alignment task: methodology of multi-step merging approach
RT4: Evaluation of the multi-step merging approach
WCL analysis pipeline methodology
[Figure: WCL analysis pipeline. Stages: data preprocessing → semantic concept identification → framing analysis of semantic concepts → framing similarity across news articles. Example mentions shown for the semantic concept identification stage: “Putin”, “president”, “savior”, “tyrant”, “humble man”, “thief”. Image source: https://tgram.ru/channels/otsuka_bld]
WCL analysis system
[Figure: architecture of the WCL analysis system]
• Input: related articles
• Preprocessing: sentence splitting, tokenization, POS tagging, parsing, dependency parsing, NE recognition, coreference resolution
• Concept identification
  o Candidate extraction: corefs, NPs
  o Candidate alignment: multi-step merging (core meaning, core meaning modifiers, frequent word patterns)
• Usability prototype
  o Emotion frames: LIWC emotion dimensions, emotion clustering
  o Visualization: matrix view, bar chart view, article view
• Inductive analysis, i.e., no prior knowledge given
• The implementation is focused on the candidate alignment task
Usability prototype
Matrix view Bar chart view Article view
WCL diversity
Usability prototype
[Screenshots: selection mode of the Matrix view; Candidate view; selection mode of the Article view]
Candidate alignment task
[Table: tasks compared across NER, coreference resolution, and candidate alignment; the per-cell marks are not recoverable]
• Categorization/grouping
• Cross-document coreferences
• Linking of mentions
  a. Common knowledge anaphora
  b. Broad sense anaphora
• Candidate alignment task aims at resolving anaphora both of common
knowledge and broad sense.
Multi-step merging approach (MSMA): overview
• Initial entities: coreferences and NPs
• Extract entity attributes to highlight certain properties
• Specify entity comparability to other entities
• Iterate multiple times over all entities
→ merge entities based on the similarity of their attributes
• Merging step = level in a hierarchy
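The iteration outlined in this overview can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the `Entity` class, `merge_step`, `run_msma`, and the toy `same_head` criterion (which treats the last token of an entity's first phrase as its head) are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Entity:
    """A candidate entity: the phrases currently assumed to refer to one concept."""
    phrases: Tuple[str, ...]

    @property
    def size(self) -> int:
        return len(self.phrases)


Criterion = Callable[[Entity, Entity], bool]


def merge_step(entities: List[Entity], similar: Criterion) -> List[Entity]:
    """One merging step: each entity, largest first, absorbs the entities similar to it."""
    queue = sorted(entities, key=lambda e: e.size, reverse=True)
    processed: List[Entity] = []
    while queue:
        current = queue.pop(0)
        remaining: List[Entity] = []
        for other in queue:
            if similar(current, other):
                # The considered entity merges the similar entity.
                current = Entity(current.phrases + other.phrases)
            else:
                remaining.append(other)
        queue = remaining
        processed.append(current)  # the updated entity is placed at the end
    # Sort by size again before the next step starts.
    return sorted(processed, key=lambda e: e.size, reverse=True)


def run_msma(entities: List[Entity], criteria: List[Criterion]) -> List[Entity]:
    """Apply the merging steps one after another, one similarity criterion per step."""
    for similar in criteria:
        entities = merge_step(entities, similar)
    return entities


def same_head(a: Entity, b: Entity) -> bool:
    """Toy criterion (assumption): the last token of the first phrase acts as the head."""
    return a.phrases[0].split()[-1].lower() == b.phrases[0].split()[-1].lower()


entities = [
    Entity(("Donald Trump", "Mr. Trump")),
    Entity(("President Trump",)),
    Entity(("DACA recipients",)),
]
merged = run_msma(entities, [same_head])
```

In this toy run the two “Trump” entities end up in one entity and “DACA recipients” stays separate. A real criterion would rely on the entity attributes and comparability rules mentioned in the overview.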
[Figure: multi-step merging procedure]
• Init.: all entities sorted by their size (similar color = similarity in one criterion)
• Step 1 … Step N:
  o compare the first entity to the other entities
  o the considered entity merges similar entities
  o place the updated entity at the end and continue
  o sort entities by their size
Multi-step merging approach: steps
Step 1: Representative phrases’ heads
Matching phrases’ heads, e.g., “President Trump” and “Donald Trump”
Step 2: Head sets
Semantically similar head sets, e.g., {“Trump”, “president”} and {“billionaire”}
Step 3: Representative labeling phrases
Semantically similar labeling phrases, e.g., “undocumented immigrants” and “illegal aliens”
Step 4: Compounds
Semantically similar compounds, e.g., “DACA illegals” and “DACA recipients”
Step 5: Representative frequent wordsets
Semantically similar frequent wordsets, e.g., “United States” and “U.S.”
Step 6: Representative frequent phrases
String-similar frequent phrases, e.g., “Deferred Action for Childhood Arrivals” and “Childhood Arrivals”
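As an illustration of what “semantically similar head sets” in Step 2 could mean in practice, the sketch below averages the word vectors of each head set and compares the averages by cosine similarity. The slides do not state the exact comparison, so the averaging, the threshold of 0.7, the function names, and the three-dimensional toy vectors are all assumptions made up for this example.

```python
import math
from typing import Dict, List, Set

# Toy 3-dimensional word vectors, invented for illustration only.
TOY_VECTORS: Dict[str, List[float]] = {
    "illegals": [0.9, 0.1, 0.0],
    "immigrants": [0.8, 0.2, 0.1],
    "program": [0.0, 0.1, 0.9],
}


def mean_vector(words: Set[str]) -> List[float]:
    """Average the vectors of all head words in a head set."""
    vectors = [TOY_VECTORS[w] for w in sorted(words)]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def headsets_similar(a: Set[str], b: Set[str], threshold: float = 0.7) -> bool:
    """Hypothetical Step 2 check: head sets are similar if their mean vectors are close."""
    return cosine(mean_vector(a), mean_vector(b)) >= threshold


close = headsets_similar({"illegals"}, {"immigrants"})
far = headsets_similar({"illegals"}, {"program"})
```

With a trained word vector model the lookup table would be replaced by the model's vectors; the merging decision itself stays a simple threshold test.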
Multi-step merging approach: summary
Type | Step | Goal | Problems
Core meaning | Representative phrases’ heads | Compare on the output of coreference resolution | Applicable only for named entity (NE) entity types
Core meaning | Head sets | Find synonyms of head words among entities | Word collocations contain more meaning than head words
Core meaning modifiers | Representative labeling phrases | Identify most prominent adjective + noun patterns | Adjective is not the only core meaning modifier
Core meaning modifiers | Compounds | Compare noun-to-noun similar compounds | More than two-word phrases are required to represent entities
Frequent word patterns | Representative frequent wordsets | Identify frequently repeated wording | Wordsets disregard word order important for pattern identification
Frequent word patterns | Representative frequent phrases | Identify frequently repeated phrases | Requires extensive repetitive wording
• Dataset: extended NewsWCL50 corpus
• Ten topics of 5 articles each: NewsWCL50 [25]
• One topic of 25 articles collected according to the NewsWCL50 methodology
• Simplified content analysis (CA) annotation
• Used annotation codes referring to the entities
• Avoided complex semantic concepts, e.g., a reaction to something
• Annotated extracted NPs and coreferential chains
• Metrics
• Weighted precision, recall, F1-score (evaluation of the best matching entities (BMEs)) [27]
• Homogeneity, completeness, V-measure (general clustering evaluation) [26]
• WCL complexity metric (phrasing diversity)
• Baselines
• Random baseline (B1)
• CoreNLP coreference resolution: employ only coreferential chains (B2) [24]
• Candidate clustering in the word vector space (B3)
• Concept type categorization
• Actor, e.g., Donald Trump
• Group, i.e., group of people acting as one entity
• Country, i.e., country names, their anaphora, and organizations related to the country
• Misc, i.e., events, objects, abstract entities
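The general clustering metrics named in this setup (homogeneity, completeness, V-measure [26]) can be computed from two label sequences. The sketch below is a plain-Python rendering of the standard conditional-entropy definitions with equal weighting of the two components; the function names and the tiny label lists are assumptions for illustration and are not taken from the evaluation code.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def _entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target: Sequence[Hashable], given: Sequence[Hashable]) -> float:
    """Conditional entropy H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(c / n * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(gold: Sequence[Hashable],
              predicted: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (homogeneity, completeness, V-measure) for a predicted clustering."""
    h_gold, h_pred = _entropy(gold), _entropy(predicted)
    homogeneity = 1.0 if h_gold == 0 else 1.0 - _conditional_entropy(gold, predicted) / h_gold
    completeness = 1.0 if h_pred == 0 else 1.0 - _conditional_entropy(predicted, gold) / h_pred
    total = homogeneity + completeness
    v = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v


gold = ["Trump", "Trump", "DACA", "DACA"]      # annotated concepts (toy labels)
perfect = v_measure(gold, [0, 0, 1, 1])        # every cluster matches one concept
one_cluster = v_measure(gold, [0, 0, 0, 0])    # everything merged into one entity
```

Merging everything into a single entity keeps completeness at its maximum but drives homogeneity to zero, which is why the two values are reported together.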
Experiment setup
Evaluation results
[Figure: bar charts of precision, recall, and F1-score for the baselines B1 to B3 and the multi-step merging approach M]
Method | Precision | Recall | F1-score
B1: Random guessing | 0.12 | 0.15 | 0.12
B2: CoreNLP coreference resolution | 0.97 | 0.17 | 0.27
B3: Candidate clustering | 0.87 | 0.32 | 0.42
M: Multi-step merging approach | 0.91 | 0.82 | 0.84
WCL complexity evaluation
Concept Type WCL F1
Actor 2.10 0.97
Country 4.49 0.74
Misc 5.67 0.82
Group 9.20 0.78
[Figures: F1 plotted against the WCL metric, one chart for the concept type split and one for the topic split; the data points are listed in the two tables]
Topic WCL F1
8 2.84 0.91
7 2.89 0.89
5 3.31 0.83
4 3.54 0.87
1 3.63 0.85
3 3.95 0.87
0 3.99 0.81
9 4.63 0.88
2 5.44 0.78
6 8.37 0.82
10 12.71 0.81
• Concept type split
• Topic split
• Logarithmic trend:
Concepts with high WCL diversity are harder to
identify.
• The most phrase-diverse topics 6 and 10 perform
comparably to the average performance (F1 = 0.84)
➢ WCL complexity is a metric representing the phrasing diversity of the anaphora that
refer to a concept. High complexity = high phrasing variation
Merging steps evaluation: concept types
Steps Actor Country Misc Group
B1 0.123 0.124 0.107 0.112
B2 0.407 0.297 0.198 0.137
B3 0.450 0.428 0.468 0.289
Init. 0.408 0.298 0.204 0.140
Step 1 0.872 0.634 0.298 0.222
Step 2 0.927 0.685 0.779 0.502
Step 3 0.927 0.685 0.803 0.744
Step 4 0.970 0.700 0.803 0.744
Step 5 0.970 0.736 0.808 0.783
Step 6 0.970 0.736 0.817 0.783
Merging step types Actor Country Misc Group
Core meaning (Steps 1 & 2) 0.519 0.388 0.575 0.362
Core modifiers (Steps 3 & 4) 0.043 0.014 0.024 0.242
Word patterns (Steps 5 & 6) 0.000 0.037 0.014 0.039
Overall 0.562 0.439 0.613 0.643
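The second table appears to report, per merging step type, the F1 gained between the end of the previous step type and the end of the current one. The sketch below recomputes these differences from the per-step F1 values in the first table. It is only a consistency check under that reading; the dictionary layout and the `gain` helper are assumptions, and because the inputs are already rounded to three decimals, a recomputed difference can deviate from the reported one by 0.001 (for example for “Country”).

```python
from typing import Dict

# F1 per concept type after selected steps, copied from the first table.
F1_BY_STEP: Dict[str, Dict[str, float]] = {
    "Init.": {"Actor": 0.408, "Country": 0.298, "Misc": 0.204, "Group": 0.140},
    "Step 2": {"Actor": 0.927, "Country": 0.685, "Misc": 0.779, "Group": 0.502},
    "Step 4": {"Actor": 0.970, "Country": 0.700, "Misc": 0.803, "Group": 0.744},
    "Step 6": {"Actor": 0.970, "Country": 0.736, "Misc": 0.817, "Group": 0.783},
}


def gain(start: str, end: str) -> Dict[str, float]:
    """F1 difference per concept type between two rows of the table."""
    return {
        concept: round(F1_BY_STEP[end][concept] - F1_BY_STEP[start][concept], 3)
        for concept in F1_BY_STEP[start]
    }


core_meaning = gain("Init.", "Step 2")     # Steps 1 & 2
core_modifiers = gain("Step 2", "Step 4")  # Steps 3 & 4
word_patterns = gain("Step 4", "Step 6")   # Steps 5 & 6
overall = gain("Init.", "Step 6")
```

Under this reading the largest overall gain belongs to the “Group” type and the largest single contribution comes from the core meaning steps.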
• Development of F1-score at each step
• Difference of F1-score
o Gradual increase at all merging steps
o Init. step: extracted from CoreNLP
coreferential chains and NPs
o Step 1 outperforms B3 on NE-based types
o Step 2 outperforms B3 on non-NE-based
types
o Highest F1: F1 (Actor) = 0.97
o Lowest F1: “Country” and “Group” types
o Lowest F1 boost:
“Country” type
→ lack of semantic similarity
o Highest F1 boost:
“Group” type
→ many semantic patterns captured
➢ Better approach performance: on small or big topics?
• Big topic: 25 articles per topic
• Small topic: three subsets of topics of 5 articles each
• We report average performance
• big: F1 = 0.81 small: F1 = 0.72
• Big topic outperforms on “Misc” and “Group” types
• Reasons: semantically similar repetitive word choice occurs often enough in a big topic
Big vs. small topic comparison
[Figures: bar charts for the DACA topic comparing the average over the 5-article subsets (R5_avg) with all 25 articles (All25), for F1 and for the WCL metric]
Concept type | R5_avg_F1 | All25_F1 | R5_avg_WCL | All25_WCL
Actor | 0.96 | 0.96 | 1.59 | 1.79
Misc | 0.67 | 0.88 | 6.68 | 11.47
Group | 0.63 | 0.75 | 9.37 | 19.34
Country | 0.66 | 0.59 | 12.65 | 23.89
• MSMA: F1 = 0.84 baseline B3: F1 = 0.42
• Best performance on “Actor” type: F1 = 0.97
• Largest phrasing diversity: “Group” type
• Largest performance boost on “Group” type
∆ = 0.643
• Better performance on the larger topics:
big: F1 = 0.81 small: F1 = 0.72
• Worst performance on “Group” and “Country”
types:
“Group” type:
o Requires additional merging step(s)
o Concept sense disambiguation
“Country” type:
o Low word semantic representation by the
chosen word vector model
o Broadly defined CA concepts: mix of country
names and organizations
Discussion summary
[Figure: F1-score of B1, B2, B3, and M: 0.12, 0.27, 0.42, 0.84]
[Figure: F1-score per concept type, Init. step vs. all six steps: Actor 0.41 vs. 0.97, Misc 0.2 vs. 0.82, Group 0.1 vs. 0.78, Country 0.3 vs. 0.74]
• Additional merging step using local context
• e.g., “Kim Jong Un” = “Little Rocket Man”
• Concept sense disambiguation
• e.g., “American people” ≠ “foreign people”
• Different word vector models
• find better semantic representation of phrases
• More complex concepts
• Identify concepts such as an action or a reaction to something
• Next step: Deductive analysis
• collect large corpus of “silver”-quality annotated topics
• train a sequential neural network (SNN) model
• identify framing by WCL in any news topic
Future work
Contributions:
1. Proposed methodology of WCL analysis pipeline
2. Implemented WCL analysis system
3. Proposed, implemented and evaluated multi-step merging approach
MSMA: F1 = 0.84 baseline B3: F1 = 0.42
Approach benefits:
• resolves anaphora of broad sense
• uses only candidate phrases without their context
• no additional long model training required
• tested on a specific dataset for WCL analysis
4. Implemented the first usability prototype
Future work:
• Concept sense disambiguation
• SNN model for WCL deductive analysis
Conclusion
1. Kahneman, D., Tversky, A., 1984. Choices, values, and frames. Am. Psychol. 39, 341–350.
2. F. Hamborg, K. Donnay, and B. Gipp, “Automated identification of media bias in news articles: an interdisciplinary literature review,”
International Journal on Digital Libraries, 2018.
3. M. Linstrom and W. Marais, “Qualitative News Frame Analysis: A Methodology,” Communitas, vol. 17, no.
17, pp. 21–38, 2012.
4. D. Chong and J. N. Druckman, “Framing Theory,” Annual Review of Political Science, vol. 10, no. 1, pp. 103–126, 2007.
5. A. Duzett, “Media Bias in Strategic Word Choice,” http://www.aim.org/on-targetblog/media-bias-in-strategic-word-choice/, 2011.
6. J. N. Druckman, “Political Preference Formation: Competition and the (Ir)relevance of Framing Effects,” The American Political
Science Review, vol. 98, no. 4, pp. 671–686, 2004.
7. M. Linstrom and W. Marais, “Qualitative News Frame Analysis: A Methodology,” Communitas, vol. 17, pp. 21–38, 2012.
8. F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by
Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019.
9. M. Schreier, Qualitative content analysis in practice. Sage publications, 2012.
10. R. M. Entman, “Framing: Toward Clarification of a Fractured Paradigm,” Journal of Communication, vol. 43, no. 4, pp. 51–58, 1993.
11. R. M. Entman, “Framing bias: Media in the distribution of power,” Journal of Communication, vol. 57, no. 1, pp. 163–173, 2007.
12. Tian, Yan, and Concetta M. Stewart. "Framing the SARS crisis: A computer-assisted text analysis of CNN and BBC online news reports of
SARS." Asian Journal of Communication 15.3 (2005): 289-301.
13. Sendén, Marie Gustafsson, Sverker Sikström, and Torun Lindholm. "“She” and “He” in news media messages: pronoun use reflects
gender biases in semantic contexts." Sex Roles 72.1-2 (2015): 40-49.
14. Fortuna, Blaz, Carolina Galleguillos, and Nello Cristianini. "Detection of bias in media outlets with statistical learning methods." Text
Mining. Chapman and Hall/CRC, 2009. 57-80.
15. Recasens, Marta, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. "Linguistic models for analyzing and detecting biased language."
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2013.
References
16. Z. Papacharissi and M. de Fatima Oliveira, “News frames terrorism: A comparative analysis of frames employed in terrorism coverage in
U.S. and U.K. newspapers,” International Journal of Press/Politics, vol. 13, no. 1, pp. 52–74, 2008.
17. D. M. Garyantes and P. J. Murphy, “Success or chaos?: Framing and ideology in news coverage of the Iraqi national elections,”
International Communication Gazette, vol. 72, no. 2, pp. 151–170, 2010.
18. D. Card, J. H. Gross, A. E. Boydstun, and N. A. Smith, “Analyzing Framing through the Casts of Characters in the News,” Proceedings of
the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), pp. 1410–1420, 2016.
19. K. Clark and C. D. Manning, “Deep Reinforcement Learning for Mention-Ranking Coreference Models,” Proceedings of the 2016
Conference on Empirical Methods in Natural Language Processing, pp. 2256–2262, 2016.
20. H. Lee, “A Scaffolding Approach to Coreference Resolution Integrating Statistical and Rule-based Models,” Natural Language
Engineering, vol. 23, no. 5, pp. 733–762, 2017
21. J. R. Finkel, T. Grenager, and C. Manning, “Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling,”
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363–370, 2005.
22. S. Dutta and G. Weikum, “Cross-Document Co-Reference Resolution using Sample-Based Clustering with Knowledge Enrichment,”
Transactions of the Association for Computational Linguistics, vol. 3, pp. 15–28, 2015.
23. S. Singh, A. Subramanya, F. Pereira, and A. McCallum, “Large-Scale Cross-Document Coreference Using Distributed Inference and
Hierarchical Models,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language
Technologies, vol. 1, pp. 793–803, 2011.
24. K. Clark and C. D. Manning, “Improving Coreference Resolution by Learning Entity-Level Distributed Representations,” in Proceedings of
the 54th Annual Meeting of the Association for Computational Linguistics, pp. 643–653, 2016.
25. F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,”
Manuscript submitted for publication, pp. 1–10,
26. Rosenberg, Andrew, and Julia Hirschberg. "V-measure: A conditional entropy-based external cluster evaluation measure." Proceedings
of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning
(EMNLP-CoNLL). 2007.
27. N. Chinchor and B. Sundheim, “MUC-5 Evaluation Metrics,” pp. 69–78, 1992.
Thank you for your attention!
Questions?
Back-up slides
Entity Type | Entity Subtype | Source | Example | CA Concept Type
person | nn (noun single) | WordNet + POS | immigrant | Actor
person | nns (noun plural) | WordNet + POS | politicians | Group
person | ne (named entity) | NER | Trump | Actor
person | nes (named entity plural) | NER + POS | Democrats | Group
group | -- | WordNet | university | Group
group | ne | NER | Congress | Country/Group
country | -- | WordNet | Homeland | Country
country | ne | NER | Germany | Country
other | -- | -- | vote | Misc
Idea:
• Words can be similar in the vector space but the results will be irrelevant to CA concepts
• Identify entity types for the effective results
• Entity types resemble concept type from manual CA
Entity types
Step 1: Representative phrases’ heads
Entity 1: Donald Trump; Trump; Mr. Trump; forceful Mr. Trump; President Trump
• Representative phrase: Trump (head: Trump)
Entity 2: Donald Trump; the president; The president of the US
• Representative phrase: Donald Trump (head: Trump)
The heads of the representative phrases are identical by string comparison → the entities are merged.
Merged entity: Donald Trump; Trump; Mr. Trump; forceful Mr. Trump; President Trump; Donald Trump; the president; The president of the US
Step 2: Headsets
Entity 1 (headset {illegals}): young illegals; the illegals; illegals who arrived as children; DACA illegals
Entity 2 (headset {immigrants}): roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants
Entity 3 (headset {aliens}): illegal aliens who were brought as children; nearly 800,000 illegal aliens; illegal aliens; young illegal aliens
• The headsets {illegals} and {immigrants} are similar in the vector space → Entities 1 and 2 are merged.
• {aliens} is not merged at this step: the word alone is related to UFOs; the entity will be merged later as “illegal alien” at the third step.
Merged entity: young illegals; the illegals; illegals who arrived as children; DACA illegals; roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants
Step 3: Representative labeling phrases
Entity 1: young illegals; the illegals; illegals who arrived as children; DACA illegals; roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants; endangered immigrants; additional illegals
• Labeling phrases: young immigrants, undocumented immigrants, illegal immigrants, young illegals, endangered immigrants, additional illegals
• Representative labeling phrases: A1: young immigrants, A2: illegal immigrants, A3: young illegals
Entity 2: this group of young people; nearly 800,000 people; a people; people who are American in every way except through birth; foreign people; bad people; people affected by the move; the estimated 800,000 people; these people; young people
• Labeling phrases: young people, foreign people, bad people, estimated people
• Representative labeling phrases: B1: young people, B2: foreign people
Sim. matrix (A1 to A3 against B1 and B2): three cells equal 1, three cells equal 0
3 / (2×3) = 0.5 ≥ 0.3 → similar in the vector space → the entities are merged.
Merged entity: all phrases of Entity 1 and Entity 2
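The merge decision in this step reduces to the share of similar pairs in the similarity matrix compared against a threshold. A minimal sketch of that arithmetic follows. The function names are hypothetical, and because the slide does not preserve which three cells are marked as similar, the `SIMILAR_PAIRS` set is a placeholder chosen only so that the share matches the slide's 3 / (2×3).

```python
from typing import Callable, Sequence, Set, Tuple


def similar_share(rows: Sequence[str], cols: Sequence[str],
                  pair_similar: Callable[[str, str], bool]) -> float:
    """Fraction of (row, column) pairs that count as similar."""
    hits = sum(1 for a in rows for b in cols if pair_similar(a, b))
    return hits / (len(rows) * len(cols))


def should_merge(rows: Sequence[str], cols: Sequence[str],
                 pair_similar: Callable[[str, str], bool],
                 threshold: float = 0.3) -> bool:
    """Merge two entities if enough of their representative phrases are similar."""
    return similar_share(rows, cols, pair_similar) >= threshold


entity_a = ["young immigrants", "illegal immigrants", "young illegals"]  # A1..A3
entity_b = ["young people", "foreign people"]                            # B1, B2

# Placeholder cells (assumption): stand-in for the vector-space similarity test.
SIMILAR_PAIRS: Set[Tuple[str, str]] = {
    ("young immigrants", "young people"),
    ("illegal immigrants", "young people"),
    ("young illegals", "young people"),
}

share = similar_share(entity_a, entity_b, lambda a, b: (a, b) in SIMILAR_PAIRS)
merge = should_merge(entity_a, entity_b, lambda a, b: (a, b) in SIMILAR_PAIRS)
```

Steps 4b and 5 use the same kind of ratio on compounds and frequent wordsets, so only the inputs and the pairwise test would change.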
Step 4a: Headword-compound match
Entity 1: PM Theresa May; Mrs. May; UK Prime Minister Theresa May
• Compounds (dependent + governor): {Minister May, PM May, Mrs. May, Theresa May}
Entity 2: Prime Minister; The British prime minister
• Heads of phrases: {minister, Minister}
The compound dependent “Minister” and the phrase head “Minister” are identical by string comparison → the entities are merged.
Merged entity: PM Theresa May; Mrs. May; UK Prime Minister Theresa May; Prime Minister; The British prime minister
Step 4b: Common compounds
Entity 1: DACA recipients; the program’s beneficiaries; DACA beneficiaries; 800,000 recipients
• Compounds: DACA recipients, program’s beneficiaries, DACA beneficiaries
Entity 2: DACA participants; 800,000 participants; more than a quarter of DACA registrants; program participants
• Compounds: DACA participants, DACA registrants, program participants
Compounds with overlapping words (overlapping NE: {DACA}):
• A1: DACA recipients, A2: DACA beneficiaries
• B1: DACA participants, B2: DACA registrants
Sim. matrix (A1, A2 against B1, B2): two cells equal 1, two cells equal 0
2 / (2×2) = 0.5 ≥ 0.3 → similar in the vector space → the entities are merged.
Merged entity: all phrases of Entity 1 and Entity 2
Step 5: Representative frequent wordsets
Entity 1: illegals whose DACA protection is pending; DACA illegals; young illegals; illegal alien applicants; DACA applicants
• Frequent wordsets: A1: {DACA, illegals}, A2: {illegals}, A3: {applicants}, A4: {DACA}
Entity 2: more than 2,000 DACA recipients; DACA beneficiaries; DACA recipients whose status expires on March 5; former DACA participants; the participants
• Frequent wordsets: B1: {DACA, recipients}, B2: {DACA}, B3: {participants}
Sim. matrix (A1 to A4 against B1 to B3): five cells equal 1, seven cells equal 0
5 / (4×3) ≈ 0.42 ≥ 0.3 → similar in the vector space → the entities are merged.
Merged entity: all phrases of Entity 1 and Entity 2
Step 6: Representative frequent phrases
Phrases of Entity 1 and Entity 2 (with frequencies): DACA program (x10); DACA (x10); Deferred Action Childhood Arrivals program (x5); Obama-era program (x5); Childhood Arrivals DACA (x4); Deferred Action for Childhood Arrivals program (x3); Deferred Action (x5); Deferred Action for Childhood Arrivals (x2)
Frequent phrases (A): A1: DACA, A2: program, A3: DACA program, A4: Childhood Arrivals program, A5: Obama-era program
Frequent phrases (B): B1: Deferred Action Childhood Arrivals, B2: Deferred Action, B3: Childhood Arrivals, B4: Deferred Action Childhood Arrivals program, B5: Childhood Arrivals DACA
Sim. matrix (A1 to A5 against B1 to B5): four cells equal 1, the remaining cells equal 0
sim_val = sim_hor = 4/5 ≥ 0.5 → similar in the vector space → the entities are merged.
Merged entity: all phrases of Entity 1 and Entity 2
WCL complexity metric
WCL = \sum_{h \in H} \frac{|S_h|}{|L_h|}
• H is the set of phrases’ heads in a code,
• S_h is the set of unique phrases with a phrase’s head h,
• L_h is the list of non-unique phrases with a phrase’s head h.
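A small sketch of how this metric could be computed for one annotation code is given below. It reads the fraction as the number of unique phrases divided by the number of all phrase occurrences per head, and it takes the head of each phrase as given input. Both points are assumptions of this example, as are the function name and the toy mentions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def wcl_complexity(mentions: Iterable[Tuple[str, str]]) -> float:
    """Sum over heads of (#unique phrases with that head) / (#all phrases with that head).

    `mentions` holds (phrase, head) pairs for one code; repeated phrases are kept.
    """
    unique: Dict[str, Set[str]] = defaultdict(set)
    total: Dict[str, int] = defaultdict(int)
    for phrase, head in mentions:
        unique[head].add(phrase)
        total[head] += 1
    return sum(len(unique[head]) / total[head] for head in total)


mentions = [
    ("Donald Trump", "Trump"),
    ("Donald Trump", "Trump"),
    ("President Trump", "Trump"),
    ("the president", "president"),
]
score = wcl_complexity(mentions)  # 2/3 for head "Trump" plus 1/1 for head "president"
```

A code whose mentions use many different heads, each with varied phrasing, accumulates a high score, which matches the reading of high complexity as high phrasing variation.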

More Related Content

Similar to Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News Articles

Evolving Swings (topics) from Social Streams using Probability Model
Evolving Swings (topics) from Social Streams using Probability ModelEvolving Swings (topics) from Social Streams using Probability Model
Evolving Swings (topics) from Social Streams using Probability ModelIJERA Editor
 
Tags as tools for social classification
Tags as tools for social classificationTags as tools for social classification
Tags as tools for social classificationIsabella Peters
 
Human Being Character Analysis from Their Social Networking Profiles
Human Being Character Analysis from Their Social Networking ProfilesHuman Being Character Analysis from Their Social Networking Profiles
Human Being Character Analysis from Their Social Networking ProfilesBiswaranjan Samal
 
Ontology Search: An Empirical Evaluation
Ontology Search: An Empirical EvaluationOntology Search: An Empirical Evaluation
Ontology Search: An Empirical EvaluationArmin Haller
 
Aspects&opinions identification_opinion mining complete ppt
Aspects&opinions identification_opinion mining complete pptAspects&opinions identification_opinion mining complete ppt
Aspects&opinions identification_opinion mining complete ppttanvikadam76
 
Domain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised ApproachDomain Specific Named Entity Recognition Using Supervised Approach
Domain Specific Named Entity Recognition Using Supervised ApproachWaqas Tariq
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News Articles

  • 1. Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News Articles
    Anastasia Zhukova
    Doctoral supervisor: Felix Hamborg
    1st examiner: Prof. Dr. Bela Gipp
    2nd examiner: Prof. Dr. Karsten Donnay
    Date: 2019-03-07
  • 2. Agenda
    1. Introduction
    2. Project motivation and research objectives
    3. Related work and research gap
    4. Word choice and labeling (WCL) analysis system
    5. Usability prototype
    6. Multi-step merging approach
    7. Evaluation results
    8. Future work
    9. Conclusion
    07-Feb-23 Anastasia Zhukova - Automated Identification of Framing by Word Choice and Labeling
  • 3. Introduction
    • Biased perception of the Russian president depends on how he was framed
    https://tgram.ru/channels/otsuka_bld
  • 4. Introduction
    Word Choice (WC) and Labeling (L):
    • “invasion forces” vs. “coalition forces”
    • “heart-wrenching tales of hardship” vs. “information on the lifestyles”
    http://umich.edu/~newsbias/wordchoice.html
  • 5. Project motivation
    WCL depends on… [1-5]
    • actor or perspective selection
    • author position
    • goal of the message
    When not identified, WCL influences… [2, 4-6]
    • emotion evaluation
    • decision-making process
    • false information propagation
    Existing solutions… (cf. [15-17])
    • involve manual annotation by social scientists
    • automated approaches yield simplistic results
    • results are not scalable and not interactive
    *equal with some degree of approximation
    http://www.anmbadiary.com/2015/04/framing-effect-and-marketing.html
  • 6. Project research objectives
    RQ: How can we automatically identify instances of bias by WCL referring to the semantic concepts in a set of English news articles reporting on the same event by using natural language processing (NLP)?
    Research tasks:
    1. Design and develop a modular WCL analysis system;
    2. Develop a first usability prototype with interactive visualization to explore the results of WCL analysis;
    3. Research, propose, and implement an approach based on NLP methods to identify semantic concepts that can be a target of bias by WCL;
    4. Conduct an evaluation of the proposed semantic concept identification approach.
  • 7. Related work and research gap
    1. Social science methodology
       a. Content analysis [2, 7, 9]
       b. Framing analysis [1, 4, 6, 10, 11]
       → effective but manual and time-consuming
    2. Automated WCL identification
       a. from topic perspective [12, 14-17]
       b. from actor perspective [13, 18]
       → require interpretation of the word choice difference
       → no concept-to-concept automatic comparison
    3. Natural language processing
       a. Named Entity Recognition (NER) (cf. [21])
       b. Coreference resolution (cf. [12, 20, 24])
       c. Cross-document coreference resolution (cf. [22, 23])
       → do not resolve broad sense anaphora
       → do not analyze difference of word choice
  • 8. Roadmap
    RT1: WCL analysis methodology and system
    RT2: Usability prototype
    RT3: Candidate alignment task: methodology of multi-step merging approach
    RT4: Evaluation of the multi-step merging approach
  • 9. WCL analysis pipeline methodology
    Pipeline stages: Data preprocessing → Semantic concept identification → Framing analysis of semantic concepts → Framing similarity across news articles
    Figure (semantic concept identification), example mentions: Putin, president, savior, tyrant, humble man, thief
    https://tgram.ru/channels/otsuka_bld
  • 10. WCL analysis system
    Input: Related articles
    Preprocessing: Coreference resolution, Tokenization, POS tagging, Dependency parsing, NE recognition, Sentence splitting, Parsing
    Concept identification:
    • Candidate extraction: Corefs, NPs
    • Candidate alignment (Multi-step merging): Core meaning, Core meaning modifiers, Frequent word patterns
    Usability prototype:
    • Emotion frames: LIWC emotion dimensions, Emotion clustering
    • Visualization: Matrix view, Bar chart view, Article view
    • Inductive analysis, i.e., no prior knowledge given
    • The implementation is focused on the candidate alignment task
  • 11. Roadmap
    RT1: WCL analysis methodology and system
    RT2: Usability prototype
    RT3: Candidate alignment task: methodology of multi-step merging approach
    RT4: Evaluation of the multi-step merging approach
  • 12. Usability prototype
    Figure panels: Matrix view, Bar chart view, Article view, WCL diversity
  • 13. Usability prototype
    Figure panels: Selection mode of the Matrix view, Candidate view, Selection mode of the Article view
  • 14. Roadmap
    RT1: WCL analysis methodology and system
    RT2: Usability prototype
    RT3: Candidate alignment task: methodology of multi-step merging approach
    RT4: Evaluation of the multi-step merging approach
  • 15. Candidate alignment task
    Comparison table:
    • Columns (tasks): NER, Coref. resolution, Cand. alignment
    • Rows: Categorization/grouping; Cross-document coreferences; Linking of mentions (a. Common knowledge anaphora, b. Broad sense anaphora)
    • Candidate alignment task aims at resolving anaphora both of common knowledge and broad sense.
  • 16. Multi-step merging approach (MSMA): overview
    • Initial entities: coreferences and NPs
    • Extract entity attributes to highlight certain properties
    • Specify entity comparability to other entities
    • Iterate multiple times over all entities → merge entities based on the similarity of their attributes
    • Merging step = level in a hierarchy
    Figure (Init., Step 1 … Step N):
    • Init.: all entities sorted by their size (similar color = similarity in one criterion)
    • Compare the first entity to the other entities
    • The considered entity merges similar entities
    • Place the updated entity at the end and continue
    • Sort entities by their size before the next step
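The merging loop described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis implementation: an entity is modelled as a plain set of phrases, the head of a phrase is approximated by its last token, and `SYNONYMS`, `same_head`, and `synonym_heads` are hypothetical stand-ins for the real similarity criteria.

```python
from collections import deque
from typing import Callable, List, Set

Entity = Set[str]  # assumption: an entity is just a set of mention phrases


def merge_step(entities: List[Entity],
               similar: Callable[[Entity, Entity], bool]) -> List[Entity]:
    """One merging step: each considered entity absorbs all similar entities."""
    # Start from the entities sorted by size, largest first.
    queue = deque(sorted(entities, key=len, reverse=True))
    processed: List[Entity] = []
    while queue:
        current = set(queue.popleft())      # the considered entity
        rest = deque()
        for other in queue:
            if similar(current, other):
                current |= other            # merge the similar entity
            else:
                rest.append(other)
        queue = rest
        processed.append(current)           # place the updated entity at the end
    return processed


def multi_step_merge(entities: List[Entity],
                     criteria: List[Callable[[Entity, Entity], bool]]) -> List[Entity]:
    """Apply one similarity criterion per step; entities are re-sorted each step."""
    for similar in criteria:
        entities = merge_step(entities, similar)
    return sorted(entities, key=len, reverse=True)


def head(phrase: str) -> str:
    # Crude assumption: the last token is the head of the phrase.
    return phrase.split()[-1].lower()


def same_head(a: Entity, b: Entity) -> bool:
    return bool({head(p) for p in a} & {head(p) for p in b})


# Hypothetical synonym table standing in for word-vector similarity.
SYNONYMS = {frozenset({"immigrants", "aliens"})}


def synonym_heads(a: Entity, b: Entity) -> bool:
    return any(frozenset({head(p), head(q)}) in SYNONYMS for p in a for q in b)


merged = multi_step_merge(
    [{"President Trump", "Donald Trump"}, {"Trump"},
     {"undocumented immigrants"}, {"illegal aliens"}],
    [same_head, synonym_heads],
)
```

Each criterion plays the role of one merging step, so adding a step means appending one more predicate to the list.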
  • 17. Multi-step merging approach: steps
    Step 1: Representative phrases’ heads
      Matching phrases’ heads, e.g., “President Trump” and “Donald Trump”
    Step 2: Head sets
      Semantically similar head sets, e.g., {“Trump”, “president”} and {“billionaire”}
    Step 3: Representative labeling phrases
      Semantically similar labeling phrases, e.g., “undocumented immigrants” and “illegal aliens”
    Step 4: Compounds
      Semantically similar compounds, e.g., “DACA illegals” and “DACA recipients”
    Step 5: Representative frequent wordsets
      Semantically similar frequent wordsets, e.g., “United States” and “U.S.”
    Step 6: Representative frequent phrases
      String-similar frequent phrases, e.g., “Deferred Action of Childhood Arrivals” and “Childhood Arrivals”
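Step 6 compares frequent phrases by string similarity, but the slides do not name the measure. As a minimal sketch, assume a token-level overlap coefficient; the function names and the 0.8 threshold are hypothetical choices for illustration only.

```python
def token_overlap(a: str, b: str) -> float:
    """Overlap coefficient on lowercased tokens: |A ∩ B| / min(|A|, |B|)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / min(len(tokens_a), len(tokens_b))


def string_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    # The threshold is an assumed value, not taken from the slides.
    return token_overlap(a, b) >= threshold


full = "Deferred Action of Childhood Arrivals"
short = "Childhood Arrivals"
score = token_overlap(full, short)
```

The overlap coefficient suits this example because the shorter phrase is fully contained in the longer one, which a symmetric measure such as Jaccard similarity would penalize.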
  • 18. Multi-step merging approach: summary
    Core meaning
    • Representative phrases’ heads. Goal: compare on the output of coreference resolution. Problem: applicable only to named entity (NE) types
    • Head sets. Goal: find synonyms of head words among entities. Problem: word collocations contain more meaning than head words
    Core meaning modifiers
    • Representative labeling phrases. Goal: identify most prominent adjective + noun patterns. Problem: adjective is not the only core meaning modifier
    • Compounds. Goal: compare noun-to-noun similar compounds. Problem: more than two-word phrases are required to represent entities
    Frequent word patterns
    • Representative frequent wordsets. Goal: identify frequently repeated wording. Problem: wordsets disregard word order important for pattern identification
    • Representative frequent phrases. Goal: identify frequently repeated phrases. Problem: requires extensive repetitive wording
  • 19. Roadmap
    RT1: WCL analysis methodology and system
    RT2: Usability prototype
    RT3: Candidate alignment task: methodology of multi-step merging approach
    RT4: Evaluation of the multi-step merging approach
  • 20. Experiment setup
    • Dataset: extended NewsWCL50 corpus
      • Ten topics of 5 articles each: NewsWCL50 [25]
      • One topic of 25 articles collected according to the NewsWCL50 methodology
    • Simplified content analysis (CA) annotation
      • Used annotation codes referring to the entities
      • Avoided complex semantic concepts, e.g., a reaction to something
      • Annotated extracted NPs and coreferential chains
    • Metrics
      • Weighted precision, recall, F1-score (evaluation of the best matching entities (BMEs)) [27]
      • Homogeneity, completeness, V-measure (general clustering evaluation) [26]
      • WCL complexity metric (phrasing diversity)
    • Baselines
      • Random baseline (B1)
      • CoreNLP coreference resolution: employ only coreferential chains (B2) [24]
      • Candidate clustering in the word vector space (B3)
    • Concept type categorization
      • Actor, e.g., Donald Trump
      • Group, i.e., group of people acting as one entity
      • Country, i.e., country names, anaphora, organizations related to it
      • Misc, i.e., events, objects, abstract entities
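The general clustering metrics listed above have a compact standard definition in terms of conditional entropy. The following is a minimal pure-Python sketch of homogeneity, completeness, and V-measure, not the evaluation code used in the thesis; the toy labels are invented for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def _entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target: Sequence[Hashable],
                         given: Sequence[Hashable]) -> float:
    """H(target | given), estimated from co-occurrence counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())


def v_measure(gold: Sequence[Hashable],
              predicted: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (homogeneity, completeness, V-measure) for two labelings."""
    h_gold, h_pred = _entropy(gold), _entropy(predicted)
    # Homogeneity: every predicted cluster contains a single gold concept.
    homogeneity = 1.0 if h_gold == 0 else 1.0 - _conditional_entropy(gold, predicted) / h_gold
    # Completeness: every gold concept ends up in a single predicted cluster.
    completeness = 1.0 if h_pred == 0 else 1.0 - _conditional_entropy(predicted, gold) / h_pred
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Toy example: four mentions, two gold concepts.
gold = ["trump", "trump", "daca", "daca"]
perfect = v_measure(gold, [0, 0, 1, 1])      # clusters match the concepts
lumped = v_measure(gold, [0, 0, 0, 0])       # everything in one cluster
```

Lumping all mentions into one cluster is trivially complete but not homogeneous, which is why the two scores are combined through their harmonic mean.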
  • 21. Evaluation results
    Figure: precision, recall, and F1-score per approach
      Approach                                Precision   Recall   F1-score
      B1: Random guessing                     0.12        0.15     0.12
      B2: CoreNLP coreference resolution      0.97        0.17     0.27
      B3: Candidate clustering                0.87        0.32     0.42
      M: Multi-step merging approach          0.91        0.82     0.84
  • 22. WCL complexity evaluation
    ➢ WCL complexity is a metric representing the phrasing diversity of anaphora that refer to a concept. High complexity = high phrasing variation
    • Concept type split
      Concept type   WCL     F1
      Actor          2.10    0.97
      Country        4.49    0.74
      Misc           5.67    0.82
      Group          9.20    0.78
    • Topic split
      Topic   WCL     F1
      8       2.84    0.91
      7       2.89    0.89
      5       3.31    0.83
      4       3.54    0.87
      1       3.63    0.85
      3       3.95    0.87
      0       3.99    0.81
      9       4.63    0.88
      2       5.44    0.78
      6       8.37    0.82
      10      12.71   0.81
    • Logarithmic trend: concepts with high WCL diversity are harder to identify.
    • The most phrase-diverse topics, 6 and 10, perform comparably to the average performance (F1 = 0.84)
    Figures: F1 plotted against the WCL metric for each split
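The logarithmic trend can be checked against the topic table from this slide. The sketch below fits F1 = a + b · ln(WCL) by ordinary least squares; the choice of fitting method is an assumption, since the slides do not say how the trend line was obtained.

```python
import math
from typing import List, Tuple

# (WCL metric, F1) per topic, copied from the topic split table.
TOPICS: List[Tuple[float, float]] = [
    (2.84, 0.91), (2.89, 0.89), (3.31, 0.83), (3.54, 0.87),
    (3.63, 0.85), (3.95, 0.87), (3.99, 0.81), (4.63, 0.88),
    (5.44, 0.78), (8.37, 0.82), (12.71, 0.81),
]


def fit_log_trend(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of F1 = intercept + slope * ln(WCL)."""
    xs = [math.log(wcl) for wcl, _ in points]
    ys = [f1 for _, f1 in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_log_trend(TOPICS)
print(f"F1 ≈ {intercept:.3f} + ({slope:.3f}) · ln(WCL)")
```

A negative slope is consistent with the slide's reading that higher phrasing diversity makes concepts harder to identify.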
  • 23. Merging steps evaluation: concept types
    • Development of F1-score at each step
      Steps    Actor   Country   Misc    Group
      B1       0.123   0.124     0.107   0.112
      B2       0.407   0.297     0.198   0.137
      B3       0.450   0.428     0.468   0.289
      Init.    0.408   0.298     0.204   0.140
      Step 1   0.872   0.634     0.298   0.222
      Step 2   0.927   0.685     0.779   0.502
      Step 3   0.927   0.685     0.803   0.744
      Step 4   0.970   0.700     0.803   0.744
      Step 5   0.970   0.736     0.808   0.783
      Step 6   0.970   0.736     0.817   0.783
    • Difference of F1-score
      Merging step types             Actor   Country   Misc    Group
      Core meaning (Steps 1 & 2)     0.519   0.388     0.575   0.362
      Core modifiers (Steps 3 & 4)   0.043   0.014     0.024   0.242
      Word patterns (Steps 5 & 6)    0.000   0.037     0.014   0.039
      Overall                        0.562   0.439     0.613   0.643
    o Gradual increase at all merging steps
    o Init. step: extracted from CoreNLP coreferential chains and NPs
    o Step 1 outperforms B3 on NE-based types
    o Step 2 outperforms B3 on non-NE-based types
    o Highest F1: F1(Actor) = 0.97
    o Lowest F1: “Country” and “Group” types
    o Lowest F1 boost: “Country” type → lack of semantic similarity
    o Highest F1 boost: “Group” type → many semantic patterns captured
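The "Difference of F1-score" rows follow from the per-step table by subtraction: each step type's gain is the F1 after its last step minus the F1 before its first step. The short check below recomputes them from the rounded values on the slide; the recomputed "Country" gains differ from the reported ones by 0.001, presumably because the slide's differences were taken before rounding.

```python
from typing import Dict, Tuple

CONCEPT_TYPES = ("Actor", "Country", "Misc", "Group")

# F1 after selected steps, copied from the per-step table.
F1_BY_STEP: Dict[str, Tuple[float, ...]] = {
    "Init.":  (0.408, 0.298, 0.204, 0.140),
    "Step 2": (0.927, 0.685, 0.779, 0.502),
    "Step 4": (0.970, 0.700, 0.803, 0.744),
    "Step 6": (0.970, 0.736, 0.817, 0.783),
}

# Differences as reported on the slide.
REPORTED: Dict[str, Tuple[float, ...]] = {
    "Core meaning (Steps 1 & 2)":   (0.519, 0.388, 0.575, 0.362),
    "Core modifiers (Steps 3 & 4)": (0.043, 0.014, 0.024, 0.242),
    "Word patterns (Steps 5 & 6)":  (0.000, 0.037, 0.014, 0.039),
    "Overall":                      (0.562, 0.439, 0.613, 0.643),
}


def gain(before: str, after: str) -> Tuple[float, ...]:
    """F1 gain per concept type between two points of the merging process."""
    return tuple(round(a - b, 3)
                 for b, a in zip(F1_BY_STEP[before], F1_BY_STEP[after]))


GAINS: Dict[str, Tuple[float, ...]] = {
    "Core meaning (Steps 1 & 2)":   gain("Init.", "Step 2"),
    "Core modifiers (Steps 3 & 4)": gain("Step 2", "Step 4"),
    "Word patterns (Steps 5 & 6)":  gain("Step 4", "Step 6"),
    "Overall":                      gain("Init.", "Step 6"),
}

for name, values in GAINS.items():
    print(name, dict(zip(CONCEPT_TYPES, values)))
```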
  • 24. Big vs. small topic comparison
      ➢ Does the approach perform better on small or big topics?
      • Big topic: 25 articles per topic
      • Small topic: three subsets of a topic, 5 articles each
      • We report average performance
      • big: F1 = 0.81, small: F1 = 0.72
      • The big topic outperforms the small ones on the “Misc” and “Group” types
      • Reason: semantically similar, repetitive word choice occurs often enough in a big topic
      DACA topic, F1 per concept type:
          Series | Actor | Misc | Group | Country
          R5_avg_F1 | 0.96 | 0.67 | 0.63 | 0.66
          All25_F1 | 0.96 | 0.88 | 0.75 | 0.59
      DACA topic, WCL metric per concept type:
          Series | Actor | Misc | Group | Country
          R5_avg_WCL | 1.59 | 6.68 | 9.37 | 12.65
          All25_WCL | 1.79 | 11.47 | 19.34 | 23.89
  • 25. Discussion summary
      • MSMA: F1 = 0.84, baseline B3: F1 = 0.42
      • Best performance on the “Actor” type: F1 = 0.97
      • Largest phrasing diversity: “Group” type
      • Largest performance boost on the “Group” type: ∆ = 0.643
      • Better performance on the larger topics: big: F1 = 0.81, small: F1 = 0.72
      • Worst performance on the “Group” and “Country” types:
          “Group” type:
          o Requires additional merging step(s)
          o Requires concept sense disambiguation
          “Country” type:
          o Weak semantic representation of the words by the chosen word vector model
          o Broadly defined CA concepts: mix of country names and organizations
      F1-score, baselines vs. MSMA: B1 = 0.12, B2 = 0.27, B3 = 0.42, MSMA = 0.84
      F1-score per concept type:
          Series | Actor | Misc | Group | Country
          Init. step | 0.41 | 0.2 | 0.1 | 0.3
          All six steps | 0.97 | 0.82 | 0.78 | 0.74
  • 26. Future work
      • Additional merging step using local context
          • e.g., “Kim Jong Un” = “Little Rocket Man”
      • Concept sense disambiguation
          • e.g., “American people” ≠ “foreign people”
      • Different word vector models
          • find a better semantic representation of phrases
      • More complex concepts
          • identify concepts such as an action or a reaction to something
      • Next step: deductive analysis
          • collect a large corpus of “silver”-quality annotated topics
          • train a sequential neural network (SNN) model
          • identify framing by WCL in any news topic
  • 27. Conclusion
      Contributions:
      1. Proposed a methodology for the WCL analysis pipeline
      2. Implemented a WCL analysis system
      3. Proposed, implemented, and evaluated the multi-step merging approach
         MSMA: F1 = 0.84, baseline B3: F1 = 0.42
         Approach benefits:
          • resolves anaphora in a broad sense
          • uses only candidate phrases without their context
          • requires no additional long model training
          • tested on a dataset created specifically for WCL analysis
      4. Implemented the first usability prototype
      Future work:
      • Concept sense disambiguation
      • SNN model for WCL deductive analysis
  • 28. References
      1. Kahneman, D., Tversky, A., 1984. Choices, values, and frames. Am. Psychol. 39, 341–350.
      2. F. Hamborg, K. Donnay, and B. Gipp, “Automated identification of media bias in news articles: an interdisciplinary literature review,” International Journal on Digital Libraries, 2018.
      3. M. Linstrom and W. Marais, “Qualitative News Frame Analysis: A Methodology,” Communitas, vol. 17, no. 17, pp. 21–38, 2012.
      4. D. Chong and J. N. Druckman, “Framing Theory,” Annual Review of Political Science, vol. 10, no. 1, pp. 103–126, 2007.
      5. A. Duzett, “Media Bias in Strategic Word Choice,” http://www.aim.org/on-targetblog/media-bias-in-strategic-word-choice/, 2011.
      6. J. N. Druckman, “Political Preference Formation: Competition and the (Ir)relevance of Framing Effects,” The American Political Science Review, vol. 98, no. 4, pp. 671–686, 2004.
      7. M. Linstrom and W. Marais, “Qualitative News Frame Analysis: A Methodology,” Communitas, vol. 17, pp. 21–38, 2012.
      8. F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019.
      9. M. Schreier, Qualitative content analysis in practice. Sage publications, 2012.
      10. R. M. Entman, “Framing: Toward Clarification of a Fractured Paradigm,” Journal of Communication, vol. 43, no. 4, pp. 51–58, 1993.
      11. R. M. Entman, “Framing bias: Media in the distribution of power,” Journal of Communication, vol. 57, no. 1, pp. 163–173, 2007.
      12. Tian, Yan, and Concetta M. Stewart. “Framing the SARS crisis: A computer-assisted text analysis of CNN and BBC online news reports of SARS.” Asian Journal of Communication 15.3 (2005): 289-301.
      13. Sendén, Marie Gustafsson, Sverker Sikström, and Torun Lindholm. “‘She’ and ‘He’ in news media messages: pronoun use reflects gender biases in semantic contexts.” Sex Roles 72.1-2 (2015): 40-49.
      14. Fortuna, Blaz, Carolina Galleguillos, and Nello Cristianini. “Detection of bias in media outlets with statistical learning methods.” Text Mining. Chapman and Hall/CRC, 2009. 57-80.
      15. Recasens, Marta, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. “Linguistic models for analyzing and detecting biased language.” Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2013.
  • 29. References
      16. Z. Papacharissi and M. de Fatima Oliveira, “News frames terrorism: A comparative analysis of frames employed in terrorism coverage in U.S. and U.K. newspapers,” International Journal of Press/Politics, vol. 13, no. 1, pp. 52–74, 2008.
      17. D. M. Garyantes and P. J. Murphy, “Success or chaos?: Framing and ideology in news coverage of the Iraqi national elections,” International Communication Gazette, vol. 72, no. 2, pp. 151–170, 2010.
      18. D. Card, J. H. Gross, A. E. Boydstun, and N. A. Smith, “Analyzing Framing through the Casts of Characters in the News,” Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), pp. 1410–1420, 2016.
      19. K. Clark and C. D. Manning, “Deep Reinforcement Learning for Mention-Ranking Coreference Models,” Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2256–2262, 2016.
      20. H. Lee, “A Scaffolding Approach to Coreference Resolution Integrating Statistical and Rule-based Models,” Natural Language Engineering, vol. 23, no. 5, pp. 733–762, 2017.
      21. J. R. Finkel, T. Grenager, and C. Manning, “Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling,” Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363–370, 2005.
      22. S. Dutta and G. Weikum, “Cross-Document Co-Reference Resolution using Sample-Based Clustering with Knowledge Enrichment,” Transactions of the Association for Computational Linguistics, vol. 3, pp. 15–28, 2015.
      23. S. Singh, A. Subramanya, F. Pereira, and A. McCallum, “Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 793–803, 2011.
      24. K. Clark and C. D. Manning, “Improving Coreference Resolution by Learning Entity-Level Distributed Representations,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 643–653, 2016.
      25. F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” Manuscript submitted for publication, pp. 1–10,
      26. Rosenberg, Andrew, and Julia Hirschberg. “V-measure: A conditional entropy-based external cluster evaluation measure.” Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). 2007.
      27. N. Chinchor and P. D, “MUC-5 Evaluation Metrics,” pp. 69–78, 1992.
  • 30. Thank you for your attention! Questions?
  • 31. Back-up slides
  • 32. Entity types
      Idea:
      • Words can be similar in the vector space, but the results will be irrelevant to CA concepts
      • Identify entity types to obtain effective results
      • Entity types resemble the concept types from manual CA
          Entity Type | Entity Subtype | Source | Example | CA Concept Type
          person | nn (noun single) | WordNet + POS | immigrant | Actor
          person | nns (noun plural) | WordNet + POS | politicians | Group
          person | ne (named entity) | NER | Trump | Actor
          person | nes (named entity plural) | NER + POS | Democrats | Group
          group | -- | WordNet | university | Group
          group | ne | NER | Congress | Country/Group
          country | -- | WordNet | Homeland | Country
          country | ne | NER | Germany | Country
          other | -- | -- | vote | Misc
  • 33. Step 1: Representative phrases’ heads
      Entity 1: Donald Trump; Trump; Mr. Trump; forceful Mr. Trump; President Trump
      Entity 2: Donald Trump; the president; The president of the US
      Representative phrases: Donald Trump, Trump
      Heads of the representative phrases: Trump
      The heads are identical by string comparison → the entities are merged
      Merged entity: Donald Trump; Trump; Mr. Trump; forceful Mr. Trump; President Trump; the president; The president of the US
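The Step 1 rule can be sketched as follows: two entities are merged when the heads of their representative phrases are identical by string comparison. This is a minimal illustration under several assumptions. The heads are taken as already extracted (the slides do not show how), the dictionary layout and the name `merge_by_heads` are hypothetical, and the greedy single pass is a simplification rather than the author's implementation.

```python
def merge_by_heads(entities):
    """Greedily merge entities that share a representative-phrase head.

    Each entity is a dict with 'phrases' (list of str) and 'heads'
    (set of str). Heads are compared by exact string equality.
    Hypothetical data layout, not taken from the slides.
    """
    merged = []
    for entity in entities:
        # Find the first already-merged entity with at least one common head.
        target = next((m for m in merged if m["heads"] & entity["heads"]), None)
        if target is None:
            merged.append({"phrases": list(entity["phrases"]),
                           "heads": set(entity["heads"])})
        else:
            # Add only phrases that the target does not contain yet.
            target["phrases"].extend(
                p for p in entity["phrases"] if p not in target["phrases"])
            target["heads"] |= entity["heads"]
    return merged


entities = [
    {"phrases": ["Donald Trump", "Trump", "Mr. Trump",
                 "forceful Mr. Trump", "President Trump"],
     "heads": {"Trump"}},
    {"phrases": ["Donald Trump", "the president", "The president of the US"],
     "heads": {"Trump"}},
    {"phrases": ["DACA program"], "heads": {"program"}},  # unrelated entity
]

merged = merge_by_heads(entities)
print(len(merged), merged[0]["phrases"])
```

The third entity is added only to show that entities without a common head stay separate.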
  • 34. Step 2: Headsets
      Entity 1: young illegals; the illegals; illegals who arrived as children; DACA illegals → headset {illegals}
      Entity 2: roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants → headset {immigrants}
      Entity 3: illegal aliens who were brought as children; nearly 800,000 illegal aliens; illegal aliens; young illegal aliens → headset {aliens}
      {illegals} and {immigrants} are similar in the vector space → Entity 1 and Entity 2 are merged
      {aliens}: the word alone is related to UFOs; the entity will be merged later as “illegal alien” at the third step
  • 35. Step 3: Representative labeling phrases
      Entity 1: young illegals; the illegals; illegals who arrived as children; DACA illegals; roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants; endangered immigrants; additional illegals
      Entity 2: this group of young people; nearly 800,000 people; a people; people who are American in every way except through birth; foreign people; bad people; people affected by the move; the estimated 800,000 people; these people; young people
      Labeling phrases:
          Entity 1: young immigrants, undocumented immigrants, illegal immigrants, young illegals, endangered immigrants, additional illegals
          Entity 2: young people, foreign people, bad people, estimated people
      Representative labeling phrases:
          A1: young immigrants, A2: illegal immigrants, A3: young illegals
          B1: young people, B2: foreign people
      Sim. matrix (A1–A3 vs. B1–B2): 3 of the 6 phrase pairs are similar
      3 / (2 × 3) = 0.5 ≥ 0.3 → similar in the vector space → the entities are merged
  • 36. Step 4a: Headword-compound match
      Entity 1: PM Theresa May; Mrs. May; UK Prime Minister Theresa May
      Entity 2: Prime Minister; The British prime minister
      Compounds (dependent): {Minister May, PM May, Mrs. May, Theresa May}
      Heads of phrases (governor): {minister, Minister}
      “Minister” and “Minister” are identical by string comparison → the entities are merged
  • 37. Step 4b: Common compounds
      Entity 1: DACA recipients; the program’s beneficiaries; DACA beneficiaries; 800,000 recipients
      Entity 2: DACA participants; 800,000 participants; more than a quarter of DACA registrants; program participants
      Compounds:
          Entity 1: DACA recipients, program’s beneficiaries, DACA beneficiaries
          Entity 2: DACA participants, DACA registrants, program participants
      Overlapping NE compounds: {DACA}
      Compounds with overlapping words:
          A1: DACA recipients, A2: DACA beneficiaries
          B1: DACA participants, B2: DACA registrants
      Sim. matrix (A1–A2 vs. B1–B2): 2 of the 4 compound pairs are similar
      2 / (2 × 2) = 0.5 ≥ 0.3 → similar in the vector space → the entities are merged
  • 38. Step 5: Representative frequent wordsets
      Entity 1: illegals whose DACA protection is pending; DACA illegals; young illegals; illegal alien applicants; DACA applicants
      Entity 2: more than 2,000 DACA recipients; DACA beneficiaries; DACA recipients whose status expires on March 5; former DACA participants; the participants
      Frequent wordsets:
          A1: {DACA, illegals}, A2: {illegals}, A3: {applicants}, A4: {DACA}
          B1: {DACA, recipients}, B2: {DACA}, B3: {participants}
      Sim. matrix (A1–A4 vs. B1–B3): 5 of the 12 wordset pairs are similar
      5 / (4 × 3) ≈ 0.42 ≥ 0.3 → similar in the vector space → the entities are merged
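Steps 3, 4b, and 5 share the same decision rule: build a binary similarity matrix between the representative items of two entities and merge when the share of similar pairs reaches 0.3. The sketch below reproduces the three ratios from the slides. The function names are hypothetical, and the positions of the ones inside each matrix are assumptions, since the slides only allow the counts to be recovered. How a pair is judged similar in the vector space is outside the scope of this sketch.

```python
def similar_share(sim_matrix):
    """Fraction of cells equal to 1 in a binary similarity matrix."""
    rows = len(sim_matrix)
    cols = len(sim_matrix[0])
    return sum(sum(row) for row in sim_matrix) / (rows * cols)


def should_merge(sim_matrix, threshold=0.3):
    """Merge two entities when enough item pairs are similar."""
    return similar_share(sim_matrix) >= threshold


# Placement of the ones is assumed; only the counts come from the slides.
step3 = [[1, 1, 1],       # 2 x 3 matrix, 3 similar pairs
         [0, 0, 0]]
step4b = [[1, 1],         # 2 x 2 matrix, 2 similar pairs
          [0, 0]]
step5 = [[1, 1, 1],       # 4 x 3 matrix, 5 similar pairs
         [1, 1, 0],
         [0, 0, 0],
         [0, 0, 0]]

for name, matrix in [("Step 3", step3), ("Step 4b", step4b), ("Step 5", step5)]:
    print(name, round(similar_share(matrix), 2), should_merge(matrix))
```

Using a share rather than a raw count keeps the threshold comparable between entities with different numbers of representative items.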
  • 39. Step 6: Representative frequent phrases
      Phrases of Entity 1 and Entity 2 (with frequencies): DACA program (x10); DACA (x10); Deferred Action Childhood Arrivals program (x5); Obama-era program (x5); Childhood Arrivals DACA (x4); Deferred Action for Childhood Arrivals program (x3); Deferred Action (x5); Deferred Action for Childhood Arrivals (x2)
      Frequent phrases:
          A1: DACA, A2: program, A3: DACA program, A4: Childhood Arrivals program, A5: Obama-era program
          B1: Deferred Action Childhood Arrivals, B2: Deferred Action, B3: Childhood Arrivals, B4: Deferred Action Childhood Arrivals program, B5: Childhood Arrivals DACA
      Sim. matrix (A1–A5 vs. B1–B5)
      sim_val = sim_hor = 4/5 = 0.8 ≥ 0.5 → similar in the vector space → the entities are merged
  • 40. WCL complexity metric
      $WCL = \sum_{h \in H} \frac{|S_h|}{|L_h|}$
      • $H$ is the set of phrases’ heads in a code,
      • $S_h$ is the set of unique phrases with the phrase’s head $h$,
      • $L_h$ is the list of non-unique phrases with the phrase’s head $h$.
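The metric can be computed directly from the definitions above. The sketch below reads the formula as the sum, over all heads h in H, of the number of unique phrases with head h divided by the total number of phrase occurrences with head h. That reading of the fraction is an assumption based on the definitions of S_h and L_h. The input format (a list of phrase and head pairs for one code) and the name `wcl_complexity` are hypothetical, and the example mentions are made up for illustration.

```python
from collections import defaultdict


def wcl_complexity(mentions):
    """Sum over heads h of |S_h| / |L_h| for one code (concept).

    mentions: list of (phrase, head) pairs, one per occurrence, so the
    same phrase may appear several times. Hypothetical input format.
    """
    phrases_by_head = defaultdict(list)
    for phrase, head in mentions:
        phrases_by_head[head].append(phrase)   # L_h: all occurrences
    return sum(len(set(occurrences)) / len(occurrences)  # |S_h| / |L_h|
               for occurrences in phrases_by_head.values())


# Illustrative mentions only, not data from the evaluation.
mentions = [
    ("Donald Trump", "Trump"),
    ("Donald Trump", "Trump"),
    ("Mr. Trump", "Trump"),
    ("the president", "president"),
]

print(wcl_complexity(mentions))
```

Here the head "Trump" contributes 2/3 (two unique phrases in three occurrences) and "president" contributes 1/1, so a code with more heads and less repetition per head scores higher.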