The document describes research on developing an automated system to identify framing and media bias in news articles through analysis of word choice and labeling (WCL). It presents an approach that uses natural language processing to identify semantic concepts that may be targets of bias and then compares how those concepts are framed across multiple news articles reporting on the same event. The approach involves preprocessing text, identifying semantic concepts, analyzing framing of concepts, and measuring framing similarity. It also details a multi-step merging methodology to align candidate concepts across articles and evaluates the approach on an annotated corpus, finding it outperforms baselines at identifying concepts with various levels of complexity in word choice.
Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News Articles
1. Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News Articles
Anastasia Zhukova
Doctoral supervisor: Felix Hamborg
1st examiner: Prof. Dr. Bela Gipp
2nd examiner: Prof. Dr. Karsten Donnay
Date: 2019-03-07
2. Agenda
1. Introduction
2. Project motivation and research objectives
3. Related work and research gap
4. Word choice and labeling (WCL) analysis system
5. Usability prototype
6. Multi-step merging approach
7. Evaluation results
8. Future work
9. Conclusion
07-Feb-23 Anastasia Zhukova - Automated Identification of Framing by Word Choice and Labeling 2
3. Introduction
https://tgram.ru/channels/otsuka_bld
• Biased perception of the Russian president depends on how he was framed
4. Introduction
invasion forces vs. coalition forces
heart-wrenching tales of hardship vs. information on the lifestyles
http://umich.edu/~newsbias/wordchoice.html
Word Choice (WC)
Labeling (L)
5. Project motivation
WCL depends on… [1-5]
• actor or perspective selection
• author position
• goal of the message
http://www.anmbadiary.com/2015/04/framing-effect-and-marketing.html
*equal with some degree of approximation
When not identified, WCL influences… [2, 4-6]
• emotion evaluation
• decision making process
• false information propagation
Existing solutions… (cf. [15-17])
• involve manual annotation by social scientists
• automated approaches yield simplistic results
• results are not scalable and not interactive
6. Project research objectives
RQ: How can we automatically identify instances of bias by WCL referring to the semantic concepts in a set of English news articles reporting on the same event by using natural language processing (NLP)?
Research tasks:
1. Design and develop a modular WCL analysis system;
2. Develop a first usability prototype with interactive visualization to explore the results of WCL analysis;
3. Research, propose, and implement an approach based on NLP methods to identify semantic concepts that can be a target of bias by WCL;
4. Conduct an evaluation of the proposed semantic concept identification approach.
7. Related work and research gap
1. Social science methodology
a. Content analysis [2, 7, 9]
b. Framing analysis [1, 4, 6, 10, 11]
→ effective but manual and time-consuming
2. Automated WCL identification
a. from topic perspective [12, 14-17]
b. from actor perspective [13, 18]
→ require interpretation of the word choice difference
→ no concept-to-concept automatic comparison
3. Natural language processing
a. Named Entity Recognition (NER) (cf. [21])
b. Coreference resolution (cf. [12, 20, 24])
c. Cross-document coreference resolution (cf. [22, 23])
→ do not resolve broad sense anaphora
→ do not analyze difference of word choice
8. Roadmap
RT1: WCL analysis methodology and system
RT2: Usability prototype
RT3: Candidate alignment task: methodology of multi-step merging approach
RT4: Evaluation of the multi-step merging approach
9. WCL analysis pipeline methodology
Pipeline: Data preprocessing → Semantic concept identification → Framing analysis of semantic concepts → Framing similarity across news articles
Example (figure): phrases referring to one semantic concept, e.g., Putin, president, savior, tyrant, humble man, thief
https://tgram.ru/channels/otsuka_bld
10. WCL analysis system
Input: Related articles
Preprocessing: Sentence splitting, Tokenization, POS tagging, Parsing, Dependency parsing, NE Recognition, Coreference resolution
Concept identification
• Candidate extraction: Corefs, NPs
• Candidate alignment: Multi-step merging (Core meaning, Core meaning modifiers, Frequent word patterns)
Usability prototype
• Emotion frames: LIWC emotion dimensions, Emotion clustering
• Visualization: Matrix view, Bar chart view, Article view
• Inductive analysis, i.e., no prior knowledge given
• The implementation is focused on the candidate alignment task
11. Roadmap
RT1: WCL analysis methodology and system
RT2: Usability prototype
RT3: Candidate alignment task: methodology of multi-step merging approach
RT4: Evaluation of the multi-step merging approach
12. Usability prototype
Views (figure): Matrix view, Bar chart view, Article view
WCL diversity
13. Usability prototype
Views (figure): Selection mode of the Matrix view, Candidate view, Selection mode of the Article view
14. Roadmap
RT1: WCL analysis methodology and system
RT2: Usability prototype
RT3: Candidate alignment task: methodology of multi-step merging approach
RT4: Evaluation of the multi-step merging approach
15. Candidate alignment task
Table columns: Task | NER | Coref. resolution | Cand. alignment
Table rows (tasks):
• Categorization/grouping
• Cross-document coreferences
• Linking of mentions
a. Common knowledge anaphora
b. Broad sense anaphora
• The candidate alignment task aims at resolving both common-knowledge and broad-sense anaphora.
16. Multi-step merging approach (MSMA): overview
• Initial entities: coreferences and NPs
• Extract entity attributes to highlight certain properties
• Specify entity comparability to other entities
• Iterate multiple times over all entities
→ merge entities based on the similarity of their attributes
• Merging step = level in a hierarchy
Figure (merging procedure):
Init.: all entities sorted by their size; similar color = similarity in one criterion
Step 1: compare the first entity to the other entities → the considered entity merges similar entities → place the updated entity to the end and continue
…
Step N: the considered entity merges similar entities → place the updated entity to the end → sort entities by their size
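The merging procedure in the figure can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the system's implementation: entities are modeled as plain sets of phrases, the names `merge_step`, `run_steps`, and `same_last_word` are hypothetical, and re-queuing a merged entity until nothing more merges is only one possible reading of "place the updated entity to the end and continue".

```python
from typing import Callable, List, Set

Entity = Set[str]                       # assumption: an entity is a set of phrases
Similar = Callable[[Entity, Entity], bool]


def merge_step(entities: List[Entity], similar: Similar) -> List[Entity]:
    """One merging step: the considered entity absorbs all similar entities."""
    queue = sorted(entities, key=len, reverse=True)    # sort entities by size
    finished: List[Entity] = []
    while queue:
        current = queue.pop(0)                         # consider the first entity
        absorbed = [e for e in queue if similar(current, e)]
        if absorbed:
            queue = [e for e in queue if not any(e is a for a in absorbed)]
            queue.append(set(current).union(*absorbed))  # updated entity goes to the end
        else:
            finished.append(current)                   # nothing left to merge with
    return sorted(finished, key=len, reverse=True)


def run_steps(entities: List[Entity], criteria: List[Similar]) -> List[Entity]:
    """Apply the merging steps in order; each step uses one similarity criterion."""
    for similar in criteria:
        entities = merge_step(entities, similar)
    return entities


def _last_words(entity: Entity) -> Set[str]:
    return {phrase.split()[-1].lower() for phrase in entity}


def same_last_word(a: Entity, b: Entity) -> bool:
    """Toy criterion (hypothetical): two entities share a phrase-final word."""
    return bool(_last_words(a) & _last_words(b))


entities = [{"Donald Trump", "Mr. Trump"}, {"President Trump"}, {"illegal aliens"}]
merged = run_steps(entities, [same_last_word])
```

Each additional criterion passed to `run_steps` corresponds to one level of the merging hierarchy; the toy criterion here only stands in for the attribute comparisons of the actual steps.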
17. Multi-step merging approach: steps
Step 1: Representative phrases’ heads
Matching phrases’ heads, e.g., “President Trump” and “Donald Trump”
Step 2: Head sets
Semantically similar head sets, e.g., {“Trump”, “president”} and {“billionaire”}
Step 3: Representative labeling phrases
Semantically similar labeling phrases, e.g., “undocumented immigrants” and “illegal aliens”
Step 4: Compounds
Semantically similar compounds, e.g., “DACA illegals” and “DACA recipients”
Step 5: Representative frequent wordsets
Semantically similar frequent wordsets, e.g., “United States” and “U.S.”
Step 6: Representative frequent phrases
String-similar frequent phrases, e.g., “Deferred Action for Childhood Arrivals” and “Childhood Arrivals”
18. Multi-step merging approach: summary
Type | Step | Goal | Problems
Core meaning | Representative phrases’ heads | Compare on the output of coreference resolution | Applicable only for named entity (NE) entity types
Core meaning | Head sets | Find synonyms of head words among entities | Word collocations contain more meaning than head words
Core meaning modifiers | Representative labeling phrases | Identify most prominent adjective + noun patterns | Adjective is not the only core meaning modifier
Core meaning modifiers | Compounds | Compare noun-to-noun similar compounds | More than two-word phrases are required to represent entities
Frequent word patterns | Representative frequent wordsets | Identify frequently repeated wording | Wordsets disregard word order important for pattern identification
Frequent word patterns | Representative frequent phrases | Identify frequently repeated phrases | Requires extensive repetitive wording
19. Roadmap
RT1: WCL analysis methodology and system
RT2: Usability prototype
RT3: Candidate alignment task: methodology of multi-step merging approach
RT4: Evaluation of the multi-step merging approach
20. Experiment setup
• Dataset: extended NewsWCL50 corpus
o Ten topics of 5 articles each: NewsWCL50 [25]
o One topic of 25 articles collected according to the NewsWCL50 methodology
• Simplified content analysis (CA) annotation
o Used annotation codes referring to the entities
o Avoided complex semantic concepts, e.g., a reaction to something
o Annotated extracted NPs and coreferential chains
• Metrics
o Weighted precision, recall, F1-score (evaluation of the best matching entities (BMEs)) [27]
o Homogeneity, completeness, V-measure (general clustering evaluation) [26]
o WCL complexity metric (phrasing diversity)
• Baselines
o Random baseline (B1)
o CoreNLP coreference resolution: employ only coreferential chains (B2) [24]
o Candidate clustering in the word vector space (B3)
• Concept type categorization
o Actor, e.g., Donald Trump
o Group, i.e., group of people acting as one entity
o Country, i.e., country names, anaphora, and organizations related to it
o Misc, i.e., events, objects, abstract entities
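For the general clustering evaluation, homogeneity, completeness, and V-measure [26] can be computed from conditional entropies. The sketch below is a plain-Python illustration of the standard definitions with the two scores weighted equally; the function names are hypothetical, and it is not claimed to be the evaluation code used in the experiments.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(labels) using the natural logarithm."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target: Sequence[Hashable], given: Sequence[Hashable]) -> float:
    """Conditional entropy H(target | given)."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes: Sequence[Hashable], clusters: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (homogeneity, completeness, V-measure) for a clustering."""
    h_classes, h_clusters = entropy(classes), entropy(clusters)
    homogeneity = 1.0 if h_classes == 0 else 1.0 - conditional_entropy(classes, clusters) / h_classes
    completeness = 1.0 if h_clusters == 0 else 1.0 - conditional_entropy(clusters, classes) / h_clusters
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Toy data: annotated concepts (classes) vs. entities produced by merging (clusters).
perfect = v_measure(["trump", "trump", "daca", "daca"], [1, 1, 0, 0])
lumped = v_measure(["trump", "trump", "daca", "daca"], [0, 0, 0, 0])
```

A clustering that puts every mention into one entity is complete but not homogeneous, which is why both scores are reported alongside the V-measure.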
22. WCL complexity evaluation
Concept Type WCL F1
Actor 2.10 0.97
Country 4.49 0.74
Misc 5.67 0.82
Group 9.20 0.78
Figures: F1 (y-axis) vs. WCL metric (x-axis), one plot per split (concept type, topic)
Topic WCL F1
8 2.84 0.91
7 2.89 0.89
5 3.31 0.83
4 3.54 0.87
1 3.63 0.85
3 3.95 0.87
0 3.99 0.81
9 4.63 0.88
2 5.44 0.78
6 8.37 0.82
10 12.71 0.81
• Concept type split
• Topic split
• Logarithmic trend: concepts with high WCL diversity are harder to identify.
• The most phrase-diverse topics, 6 and 10, perform comparably to the average performance (F1 = 0.84)
➢ WCL complexity is a metric representing the phrasing diversity of the anaphora that refer to a concept. High complexity = high phrasing variation
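The logarithmic trend can be illustrated on the topic split. The slide does not state how the trend line was fitted, so the sketch below assumes an ordinary least-squares fit of F1 = a + b·ln(WCL); `fit_log_trend` is a hypothetical helper, and the data points are the (WCL, F1) pairs from the topic table.

```python
import math
from typing import List, Tuple

# (WCL, F1) per topic, copied from the topic table
topics: List[Tuple[float, float]] = [
    (2.84, 0.91), (2.89, 0.89), (3.31, 0.83), (3.54, 0.87),
    (3.63, 0.85), (3.95, 0.87), (3.99, 0.81), (4.63, 0.88),
    (5.44, 0.78), (8.37, 0.82), (12.71, 0.81),
]


def fit_log_trend(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of F1 = a + b * ln(WCL); returns (a, b)."""
    xs = [math.log(wcl) for wcl, _ in points]
    ys = [f1 for _, f1 in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_log_trend(topics)
mean_f1 = sum(f1 for _, f1 in topics) / len(topics)
```

A negative `slope` corresponds to the statement that concepts with higher phrasing diversity are harder to identify.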
23. Merging steps evaluation: concept types
Steps Actor Country Misc Group
B1 0.123 0.124 0.107 0.112
B2 0.407 0.297 0.198 0.137
B3 0.450 0.428 0.468 0.289
Init. 0.408 0.298 0.204 0.140
Step 1 0.872 0.634 0.298 0.222
Step 2 0.927 0.685 0.779 0.502
Step 3 0.927 0.685 0.803 0.744
Step 4 0.970 0.700 0.803 0.744
Step 5 0.970 0.736 0.808 0.783
Step 6 0.970 0.736 0.817 0.783
Merging step types Actor Country Misc Group
Core meaning (Steps 1 & 2) 0.519 0.388 0.575 0.362
Core modifiers (Steps 3 & 4) 0.043 0.014 0.024 0.242
Word patterns (Steps 5 & 6) 0.000 0.037 0.014 0.039
Overall 0.562 0.439 0.613 0.643
• Development of F1-score at each step
• Difference of F1-score
o Gradual increase at all merging steps
o Init. step: extracted from CoreNLP coreferential chains and NPs
o Step 1 outperforms B3 on NE-based types
o Step 2 outperforms B3 on non-NE-based types
o Highest F1: $F1_{\text{Actor}} = 0.97$
o Lowest F1: “Country” and “Group” types
o Lowest F1 boost: “Country” type → lack of semantic similarity
o Highest F1 boost: “Group” type → many semantic patterns captured
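The F1 differences per merging step type follow from the per-step scores in the first table. The short sketch below recomputes them as a worked check; `gain` is a hypothetical helper, and recomputed values can deviate from the reported ones by 0.001 because the table entries are already rounded.

```python
TYPES = ["Actor", "Country", "Misc", "Group"]

# F1-scores per concept type, copied from the per-step table
f1 = {
    "Init.":  [0.408, 0.298, 0.204, 0.140],
    "Step 2": [0.927, 0.685, 0.779, 0.502],
    "Step 4": [0.970, 0.700, 0.803, 0.744],
    "Step 6": [0.970, 0.736, 0.817, 0.783],
}


def gain(before: str, after: str) -> list:
    """F1 difference between two rows of the table, per concept type."""
    return [round(a - b, 3) for b, a in zip(f1[before], f1[after])]


core_meaning = gain("Init.", "Step 2")     # Steps 1 & 2
core_modifiers = gain("Step 2", "Step 4")  # Steps 3 & 4
word_patterns = gain("Step 4", "Step 6")   # Steps 5 & 6
overall = gain("Init.", "Step 6")

highest_boost = TYPES[overall.index(max(overall))]
lowest_boost = TYPES[overall.index(min(overall))]
```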
24. Big vs. small topic comparison
➢ Better approach performance: on small or big topics?
• Big topic: 25 articles per topic
• Small topic: three subsets of topics of 5 articles each
• We report average performance
• big: F1 = 0.81; small: F1 = 0.72
• Big topic outperforms on “Misc” and “Group” types
• Reason: semantically similar repetitive word choice occurs often enough in a big topic
DACA: F1 and WCL metric (figure values)
Concept Type R5_avg_F1 All25_F1 R5_avg_WCL All25_WCL
Actor 0.96 0.96 1.59 1.79
Misc 0.67 0.88 6.68 11.47
Group 0.63 0.75 9.37 19.34
Country 0.66 0.59 12.65 23.89
25. Discussion summary
• MSMA: F1 = 0.84; baseline B3: F1 = 0.42
• Best performance on “Actor” type: F1 = 0.97
• Largest phrasing diversity: “Group” type
• Largest performance boost on “Group” type: ∆ = 0.643
• Better performance on the larger topics: big: F1 = 0.81; small: F1 = 0.72
• Worst performance on “Group” and “Country” types:
“Group” type:
o Requires additional merging step(s)
o Concept sense disambiguation
“Country” type:
o Low word semantic representation by the chosen word vector model
o Broadly defined CA concepts: mix of country names and organizations
F1-score (figure values)
B1_F1 B2_F1 B3_F1 M_F1
0.12 0.27 0.42 0.84
F1-score: Concept types (figure values)
Concept Type Init step All six steps
Actor 0.41 0.97
Misc 0.2 0.82
Group 0.1 0.78
Country 0.3 0.74
26. Future work
• Additional merging step using local context
o e.g., “Kim Jong Un” = “Little Rocket Man”
• Concept sense disambiguation
o e.g., “American people” ≠ “foreign people”
• Different word vector models
o find better semantic representation of phrases
• More complex concepts
o Identify concepts such as an action or a reaction to something
• Next step: Deductive analysis
o collect large corpus of “silver”-quality annotated topics
o train a sequential neural network (SNN) model
o identify framing by WCL in any news topic
27. Conclusion
Contributions:
1. Proposed methodology of WCL analysis pipeline
2. Implemented WCL analysis system
3. Proposed, implemented, and evaluated multi-step merging approach
MSMA: F1 = 0.84; baseline B3: F1 = 0.42
Approach benefits:
• resolves anaphora of broad sense
• uses only candidate phrases without their context
• no additional long model training required
• tested on a specific dataset for WCL analysis
4. Implemented the first usability prototype
Future work:
• Concept sense disambiguation
• SNN model for WCL deductive analysis
28. References
1. D. Kahneman and A. Tversky, “Choices, values, and frames,” Am. Psychol., vol. 39, pp. 341–350, 1984.
2. F. Hamborg, K. Donnay, and B. Gipp, “Automated identification of media bias in news articles: an interdisciplinary literature review,” International Journal on Digital Libraries, 2018.
3. M. Linstrom and W. Marais, “Qualitative News Frame Analysis: A Methodology,” Communitas, vol. 17, no. 17, pp. 21–38, 2012.
4. D. Chong and J. N. Druckman, “Framing Theory,” Annual Review of Political Science, vol. 10, no. 1, pp. 103–126, 2007.
5. A. Duzett, “Media Bias in Strategic Word Choice,” http://www.aim.org/on-targetblog/media-bias-in-strategic-word-choice/, 2011.
6. J. N. Druckman, “Political Preference Formation: Competition and the (Ir)relevance of Framing Effects,” The American Political Science Review, vol. 98, no. 4, pp. 671–686, 2004.
7. M. Linstrom and W. Marais, “Qualitative News Frame Analysis: A Methodology,” Communitas, vol. 17, pp. 21–38, 2012.
8. F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019.
9. M. Schreier, Qualitative Content Analysis in Practice. Sage Publications, 2012.
10. R. M. Entman, “Framing: Toward Clarification of a Fractured Paradigm,” Journal of Communication, vol. 43, no. 4, pp. 51–58, 1993.
11. R. M. Entman, “Framing bias: Media in the distribution of power,” Journal of Communication, vol. 57, no. 1, pp. 163–173, 2007.
12. Y. Tian and C. M. Stewart, “Framing the SARS crisis: A computer-assisted text analysis of CNN and BBC online news reports of SARS,” Asian Journal of Communication, vol. 15, no. 3, pp. 289–301, 2005.
13. M. G. Sendén, S. Sikström, and T. Lindholm, “‘She’ and ‘He’ in news media messages: pronoun use reflects gender biases in semantic contexts,” Sex Roles, vol. 72, no. 1-2, pp. 40–49, 2015.
14. B. Fortuna, C. Galleguillos, and N. Cristianini, “Detection of bias in media outlets with statistical learning methods,” in Text Mining, Chapman and Hall/CRC, pp. 57–80, 2009.
15. M. Recasens, C. Danescu-Niculescu-Mizil, and D. Jurafsky, “Linguistic models for analyzing and detecting biased language,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, 2013.
29. References
16. Z. Papacharissi and M. de Fatima Oliveira, “News frames terrorism: A comparative analysis of frames employed in terrorism coverage in U.S. and U.K. newspapers,” International Journal of Press/Politics, vol. 13, no. 1, pp. 52–74, 2008.
17. D. M. Garyantes and P. J. Murphy, “Success or chaos?: Framing and ideology in news coverage of the Iraqi national elections,” International Communication Gazette, vol. 72, no. 2, pp. 151–170, 2010.
18. D. Card, J. H. Gross, A. E. Boydstun, and N. A. Smith, “Analyzing Framing through the Casts of Characters in the News,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), pp. 1410–1420, 2016.
19. K. Clark and C. D. Manning, “Deep Reinforcement Learning for Mention-Ranking Coreference Models,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2256–2262, 2016.
20. H. Lee, “A Scaffolding Approach to Coreference Resolution Integrating Statistical and Rule-based Models,” Natural Language Engineering, vol. 23, no. 5, pp. 733–762, 2017.
21. J. R. Finkel, T. Grenager, and C. Manning, “Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling,” in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363–370, 2005.
22. S. Dutta and G. Weikum, “Cross-Document Co-Reference Resolution using Sample-Based Clustering with Knowledge Enrichment,” Transactions of the Association for Computational Linguistics, vol. 3, pp. 15–28, 2015.
23. S. Singh, A. Subramanya, F. Pereira, and A. McCallum, “Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 793–803, 2011.
24. K. Clark and C. D. Manning, “Improving Coreference Resolution by Learning Entity-Level Distributed Representations,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 643–653, 2016.
25. F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” manuscript submitted for publication, pp. 1–10.
26. A. Rosenberg and J. Hirschberg, “V-measure: A conditional entropy-based external cluster evaluation measure,” in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.
27. N. Chinchor and P. D, “MUC-5 Evaluation Metrics,” pp. 69–78, 1992.
30. Thank you for your attention!
Questions?
31. Back-up slides
32. Entity types
Entity Type | Entity Subtype | Source | Example | CA Concept Type
person | nn (noun single) | WordNet + POS | immigrant | Actor
person | nns (noun plural) | WordNet + POS | politicians | Group
person | ne (named entity) | NER | Trump | Actor
person | nes (named entity plural) | NER + POS | Democrats | Group
group | -- | WordNet | university | Group
group | ne | NER | Congress | Country/Group
country | -- | WordNet | Homeland | Country
country | ne | NER | Germany | Country
other | -- | -- | vote | Misc
Idea:
• Words can be similar in the vector space, but the results will be irrelevant to CA concepts
• Identify entity types to obtain effective results
• Entity types resemble the concept types from manual CA
33. Step 1: Representative phrases’ heads
Entity 1: Donald Trump; Trump; Mr. Trump; forceful Mr. Trump; President Trump
Entity 2: Donald Trump; the president; The president of the US
Representative phrases: Trump (Entity 1), Donald Trump (Entity 2)
Heads of phrases: Trump (Entity 1), Trump (Entity 2) → identical by string comparison
Merged entities: Donald Trump; Trump; Mr. Trump; forceful Mr. Trump; President Trump; Donald Trump; the president; The president of the US
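Step 1 can be sketched as a string comparison of the heads of the representative phrases. In this minimal sketch the head is approximated by the last token, which is an assumption made only for illustration (a dependency parser would be needed for phrases such as “The president of the US”); `head` and `step1_match` are hypothetical names.

```python
from typing import Set


def head(phrase: str) -> str:
    """Naive stand-in for a parser: treat the last token as the phrase head."""
    return phrase.split()[-1]


def step1_match(representative_a: str, representative_b: str) -> bool:
    """Two entities match if the heads of their representative phrases are identical."""
    return head(representative_a) == head(representative_b)


entity_1: Set[str] = {"Donald Trump", "Trump", "Mr. Trump",
                      "forceful Mr. Trump", "President Trump"}
entity_2: Set[str] = {"Donald Trump", "the president", "The president of the US"}

# Representative phrases as shown on the slide: "Trump" and "Donald Trump".
if step1_match("Trump", "Donald Trump"):
    merged = entity_1 | entity_2
else:
    merged = set()
```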
34. Step 2: Headsets
Entity 1: young illegals; the illegals; illegals who arrived as children; DACA illegals
Entity 2: roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants
Entity 3: illegal aliens who were brought as children; nearly 800,000 illegal aliens; illegal aliens; young illegal aliens
Headsets: {illegals} (Entity 1), {immigrants} (Entity 2), {aliens} (Entity 3)
{illegals} and {immigrants}: similar in the vector space → merge entities
{aliens}: the word alone is related to UFOs; it will be merged later as “illegal alien” at the third step
Merged entities: young illegals; the illegals; illegals who arrived as children; DACA illegals; roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants
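The headset comparison can be sketched with word vectors. Everything specific in this sketch is an assumption for illustration: the three-dimensional toy vectors, the use of cosine similarity between mean headset vectors, and the 0.5 threshold are hypothetical, since the slide only states that the headsets are "similar in the vector space".

```python
import math
from typing import Dict, List, Set

# Toy word vectors (hypothetical values, chosen only to mimic the slide's outcome)
toy_vectors: Dict[str, List[float]] = {
    "illegals":   [0.9, 0.1, 0.0],
    "immigrants": [0.8, 0.3, 0.0],
    "aliens":     [0.1, 0.2, 0.9],
}


def mean_vector(words: Set[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all head words in a headset."""
    rows = [vectors[word] for word in words]
    return [sum(column) / len(rows) for column in zip(*rows)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def headsets_similar(a: Set[str], b: Set[str],
                     vectors: Dict[str, List[float]],
                     threshold: float = 0.5) -> bool:
    """Assumed criterion: cosine similarity of mean headset vectors >= threshold."""
    return cosine(mean_vector(a, vectors), mean_vector(b, vectors)) >= threshold


merge_1_2 = headsets_similar({"illegals"}, {"immigrants"}, toy_vectors)
merge_1_3 = headsets_similar({"illegals"}, {"aliens"}, toy_vectors)
```

With these toy values, Entity 1 and Entity 2 merge while Entity 3 stays separate, mirroring the example on the slide.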
35. Step 3: Representative labeling phrases
Entity 1: young illegals; the illegals; illegals who arrived as children; DACA illegals; roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants; endangered immigrants; additional illegals
Entity 2: this group of young people; nearly 800,000 people; a people; people who are American in every way except through birth; foreign people; bad people; people affected by the move; the estimated 800,000 people; these people; young people
Labeling phrases (Entity 1): young immigrants, undocumented immigrants, illegal immigrants, young illegals, endangered immigrants, additional illegals
Labeling phrases (Entity 2): young people, foreign people, bad people, estimated people
Representative labeling phrases (Entity 1): A1: young immigrants, A2: illegal immigrants, A3: young illegals
Representative labeling phrases (Entity 2): B1: young people, B2: foreign people
Sim. matrix (A1–A3 × B1–B2): three cells equal 1, three cells equal 0
$\frac{3}{2 \times 3} = 0.5 \geq 0.3$ → similar in the vector space
Merged entities: Entity 1 and Entity 2 (all phrases listed above)
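The decision rule on this slide, the number of similar pairs divided by the matrix size and compared with 0.3, can be sketched as below. The same ratio appears on the slides for Steps 4b and 5. Which three cells of the matrix equal 1 is not recoverable from the slide, so the pairs marked as similar here are an assumption, and `similar_share` is a hypothetical name.

```python
from typing import Callable, Sequence


def similar_share(reps_a: Sequence[str], reps_b: Sequence[str],
                  pair_similar: Callable[[str, str], bool]) -> float:
    """Share of cells equal to 1 in the |A| x |B| similarity matrix."""
    hits = sum(1 for a in reps_a for b in reps_b if pair_similar(a, b))
    return hits / (len(reps_a) * len(reps_b))


reps_a = ["young immigrants", "illegal immigrants", "young illegals"]  # A1-A3
reps_b = ["young people", "foreign people"]                            # B1-B2

# Assumption: the three similar cells are those in the "young people" column.
assumed_similar = {(a, "young people") for a in reps_a}

share = similar_share(reps_a, reps_b, lambda a, b: (a, b) in assumed_similar)
should_merge = share >= 0.3   # threshold taken from the slide
```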
36. Step 4a: Headword-compound match
Entity 1: PM Theresa May; Mrs. May; UK Prime Minister Theresa May
Entity 2: Prime Minister; The British prime minister
Compounds (Entity 1): {Minister May, PM May, Mrs. May, Theresa May}
Heads of phrases (Entity 2): {minister, Minister}
Match (dependent vs. governor): Minister = Minister → identical by string comparison
Merged entities: PM Theresa May; Mrs. May; UK Prime Minister Theresa May; Prime Minister; The British prime minister
37. Step 4b: Common compounds
Entity 1: DACA recipients; the program’s beneficiaries; DACA beneficiaries; 800,000 recipients
Entity 2: DACA participants; 800,000 participants; more than a quarter of DACA registrants; program participants
Compounds (Entity 1): DACA recipients, program’s beneficiaries, DACA beneficiaries
Compounds (Entity 2): DACA participants, DACA registrants, program participants
Overlapping NE compounds: {DACA}
Compounds with overlapping words (Entity 1): A1: DACA recipients, A2: DACA beneficiaries
Compounds with overlapping words (Entity 2): B1: DACA participants, B2: DACA registrants
Sim. matrix (A1–A2 × B1–B2): two cells equal 1, two cells equal 0
$\frac{2}{2 \times 2} = 0.5 \geq 0.3$ → similar in the vector space
Merged entities: Entity 1 and Entity 2 (all phrases listed above)
38. Step 5: Representative frequent wordsets
Entity 1: illegals whose DACA protection is pending; DACA illegals; young illegals; illegal alien applicants; DACA applicants
Entity 2: more than 2,000 DACA recipients; DACA beneficiaries; DACA recipients whose status expires on March 5; former DACA participants; the participants
Frequent wordsets (Entity 1): A1: {DACA, illegals}, A2: {illegals}, A3: {applicants}, A4: {DACA}
Frequent wordsets (Entity 2): B1: {DACA, recipients}, B2: {DACA}, B3: {participants}
Sim. matrix (A1–A4 × B1–B3): five cells equal 1, seven cells equal 0
$\frac{5}{4 \times 3} \approx 0.42 \geq 0.3$ → similar in the vector space
Merged entities: Entity 1 and Entity 2 (all phrases listed above)
39. Step 6: Representative frequent phrases
Phrases of Entity 1 and Entity 2 (with frequencies): DACA program (x10); DACA (x10); Deferred Action Childhood Arrivals program (x5); Obama-era program (x5); Childhood Arrivals DACA (x4); Deferred Action for Childhood Arrivals program (x3); Deferred Action (x5); Deferred Action for Childhood Arrivals (x2)
Frequent phrases (A): A1: DACA, A2: program, A3: DACA program, A4: Childhood Arrivals program, A5: Obama-era program
Frequent phrases (B): B1: Deferred Action Childhood Arrivals, B2: Deferred Action, B3: Childhood Arrivals, B4: Deferred Action Childhood Arrivals program, B5: Childhood Arrivals DACA
Sim. matrix (A1–A5 × B1–B5): four cells equal 1, all other cells equal 0
$sim_{val} = sim_{hor} = \frac{4}{5} \geq 0.5$ → similar in the vector space
Merged entities: Entity 1 and Entity 2 (all phrases listed above)
40. WCL complexity metric
$WCL = \sum_{h \in H} \frac{|S_h|}{|L_h|}$
• $H$ is a set of phrases’ heads in a code,
• $S_h$ is a set of unique phrases with a phrase’s head $h$,
• $L_h$ is a list of non-unique phrases with a phrase’s head $h$.
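The metric can be computed directly from the phrases annotated with one code. The sketch below reads the fraction as a ratio of sizes, i.e., the number of unique phrases divided by the number of all phrases per head, summed over heads; this reading, the last-token head heuristic, and the names `wcl_complexity` and `last_token_head` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def last_token_head(phrase: str) -> str:
    """Naive stand-in for a parser: treat the lowercased last token as the head."""
    return phrase.split()[-1].lower()


def wcl_complexity(phrases: Sequence[str],
                   head_of: Callable[[str], str] = last_token_head) -> float:
    """Sum over heads h of |S_h| / |L_h| for the phrases of one code."""
    by_head: Dict[str, List[str]] = defaultdict(list)
    for phrase in phrases:
        by_head[head_of(phrase)].append(phrase)   # L_h: all phrases with head h
    return sum(len(set(mentions)) / len(mentions)  # |S_h| / |L_h|
               for mentions in by_head.values())


# Toy code with two heads: "trump" (2 unique of 3) and "president" (1 unique of 2)
code_phrases = ["Trump", "Trump", "Donald Trump", "the president", "the president"]
wcl = wcl_complexity(code_phrases)
```

Under this reading, a code whose mentions use many different heads and rarely repeat a phrase receives a high value, which matches the interpretation of high complexity as high phrasing variation.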