2106 JWLLP

Hate Speech as Toxic and Biased Words:
Construction and Analysis of
Korean Hate Speech Corpus
Won Ik Cho (SNU ECE)
2021. 6. 4 @JWLLP

Contents
• Introduction
• Source Corpus
• Guideline and Annotation
• Analysis
• Conclusion
Caution! This presentation contains content that may be offensive
1

Introduction
• Hate speech
 What are the aspects of hate speech?
• Hate speech and hatred
• Bad words and insulting
• Discrimination and bias
 Various projects are under way in the name of ...
• Abusive language, Toxic words, etc.
 Social agreement that prevalent hate speech `matters’ a lot
 However, some points are still debated:
• What really is `hate speech’?
• Can certain expressions be called `hate speech’?
• Is hate speech really hateful?
2

Introduction
• Hate speech
 Hate speech detection in practice
• Finding and hiding malicious expressions in game or broadcast chat
• Hiding posts/comments on YouTube, Facebook, or Twitter based on a detection system
 Do current practical studies consider theoretical/social discussions?
• Current practical studies in Korean hate speech detection
– Detect swear words and profanity terms, usually with a dictionary
– Define the sentences that contain such terms as `hate speech’
– Or sometimes define expressions from certain communities as hate speech
– Few studies have humans annotate the utterances
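A minimal sketch of the dictionary-based approach described above, to show why term matching alone is a weak definition of hate speech. The lexicon entries and example sentences are hypothetical placeholders, and whitespace/punctuation tokenization is a simplification that would not be adequate for real Korean text.

```python
import re

# Hypothetical placeholder lexicon; a real system would load a curated profanity list.
LEXICON = {"badword", "slur"}


def contains_lexicon_term(sentence, lexicon=LEXICON):
    """Return True if any word-like token of the sentence is in the lexicon."""
    tokens = re.findall(r"\w+", sentence.lower())
    return any(token in lexicon for token in tokens)


examples = [
    "you are a badword",                             # flagged: contains a listed term
    "people like that never change",                 # not flagged: biased, but no listed term
    "quoting the term badword to criticise its use", # flagged, although not hateful
]

for sentence in examples:
    print(contains_lexicon_term(sentence), sentence)
```

The second and third examples illustrate the two failure modes of term matching: implicit bias is missed, and mere mention of a term is over-flagged.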
4

Introduction
• Hate speech
 In literature (and in other languages)
• Waseem and Hovy (2016)
– Tags English Twitter posts, with around 10 or more characteristics that imply hate speech
• Davidson et al. (2017)
– Mentions the discrepancy between the theoretical definition and real-world expressions of hate speech
– Places `offensive’ expressions between `hate’ and `non-hate’ to cover expressions in the grey area
• Sanguinetti et al. (2018)
– Investigates hate speech in Italian posts about immigrants
» Beyond hate speech, annotates whether the post is offensive or aggressive, how intense it is, and whether it contains irony/sarcasm or stereotypes
» `Stereotype’ as a factor that can be a clue to discrimination
6

Introduction
• Hate speech
 Research Questions
• RQ1
– How is hate speech displayed in Korean online comments?
» What is bias, and which categories does it include?
» How can we represent the amount of toxicity of expressions?
• RQ2
– What characteristics does the Korean hate speech corpus incorporate?
» Does bias accompany the toxicity of expression?
» Is toxicity related to the type of bias shown?
8

Source Corpus
• Comments from the most popular Korean entertainment news
platform
 Jan. 2018 ~ Feb. 2020
 10,403,368 comments from 23,700 articles
 Sampling and Filtering
 Top 20 comments by Wilson score on downvotes for each of the
1,580 articles obtained by stratified sampling
• Remove duplicates and keep comments with more than a single
token and fewer than 100 characters
• 10K comments were selected
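The sampling and filtering steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the slides do not say which form of the Wilson score was used, so the lower bound of the 95% Wilson interval on the downvote ratio is assumed here, and the field names `text`, `up`, and `down` are hypothetical.

```python
import math


def wilson_lower_bound(downvotes, upvotes, z=1.96):
    """Lower bound of the Wilson score interval for the downvote ratio (assumed form)."""
    n = downvotes + upvotes
    if n == 0:
        return 0.0
    p = downvotes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def keep_comment(text):
    """More than a single whitespace token and fewer than 100 characters."""
    return len(text.split()) > 1 and len(text) < 100


def select_comments(comments, top_k=20):
    """Take the top_k comments of one article by Wilson score, then drop duplicates
    and comments that fail the length filter.

    `comments` is a list of dicts with hypothetical keys 'text', 'up', 'down'.
    """
    ranked = sorted(
        comments,
        key=lambda c: wilson_lower_bound(c["down"], c["up"]),
        reverse=True,
    )
    seen = set()
    kept = []
    for comment in ranked[:top_k]:
        if comment["text"] in seen or not keep_comment(comment["text"]):
            continue
        seen.add(comment["text"])
        kept.append(comment)
    return kept
```

Ranking by the interval's lower bound rather than by the raw ratio favours comments whose downvote ratio is supported by many votes.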
9

Guideline and Annotation
• Formulation
 Hate speech
• Discussion based on 1,000 comments out of the total 10,000
• Which factors make the comment `hate speech’?
– Bias
» `People with a specific characteristic may behave in some way’
» May differ from the judgment
– Hate
» Hostility towards a specific group or individual
» Can be represented by some profanity terms, but such terms alone do not imply hate
– Insult
» Expressions that can harm the prestige of individuals or groups
» Various profanity terms are included
– Offensive expressions
» Do not count as hate or insult, but may offend readers
» Include sarcasm, irony, bad guessing, and unethical expressions
10

• Formulation
 Social bias + Toxicity
• Detection of bias (ternary)
– Gender-related bias (Why?)
– Other biases
– None
» Close to the problem of `detection’
» Why concentrate on gender issues?
• Measuring toxicity (ternary)
– Severe hate or insult
– Not hateful but offensive or sarcastic
– None
» Close to the problem of `amount’
» Why formulate it as a problem of intensity?
12

• Guideline
 On bias
• Gender-related bias (left)
and other biases (right)
14

• Guideline
 On toxicity
• Hate (left two) and offensive (right)
15

• Guideline
 Multi-label tagging
• 3 classes for bias
• 3 classes for toxicity
 Given a comment (without context), the annotator tags each
attribute
 Every comment is given to three random annotators
• 32 participants in total (across the pilot and main tagging phases)
• Female : male = 6 : 4 / 20s : 30s : 40s = 3 : 2 : 1
1. What kind of bias does the comment contain?
- Gender bias, Other biases, or None
2. Which is the adequate category for the comment in terms of toxicity?
- Hate, Offensive, or None
16
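One way to represent the multi-label tagging scheme above as a data structure. This is a sketch: the class names, label strings, and the assignment helper are hypothetical, and drawing from a fixed pool of 32 annotators is an assumption (the slides only give 32 as the total across both phases).

```python
import random
from dataclasses import dataclass
from enum import Enum


class Bias(Enum):
    GENDER = "gender"
    OTHERS = "others"
    NONE = "none"


class Toxicity(Enum):
    HATE = "hate"
    OFFENSIVE = "offensive"
    NONE = "none"


@dataclass(frozen=True)
class Annotation:
    """One annotator's answer to both questions for one comment."""
    comment_id: int
    annotator_id: int
    bias: Bias
    toxicity: Toxicity


def assign_annotators(comment_ids, n_annotators=32, per_comment=3, seed=0):
    """Give every comment to `per_comment` distinct, randomly chosen annotators."""
    rng = random.Random(seed)
    pool = range(n_annotators)
    return {cid: rng.sample(pool, per_comment) for cid in comment_ids}
```

Keeping bias and toxicity as two separate fields reflects that each comment receives one label per attribute rather than a single combined class.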

• Pilot tagging – Which workers would fit?
 Human checked
• Ethical standard not too far from the guideline?
• Is feedback effective for the rejected samples?
 Automatically checked
• Have enough annotations been done?
• Is the annotation skipped too frequently?
17

• Crowd-sourcing – With selected workers
 No per-annotator feedback is given in the crowd-sourcing phase
18

Analysis
• Data Post-processing
 After the whole annotation (8,000 instances)
• Commonly checked for social bias and toxicity
– If all three annotators differ
» Task managers decide the final label after adjudication
• For toxicity
– Since the problem concerns `intensity’, only the (o) and (x) cases need to be reorganized
» Final decision after adjudication
• Failure to decide (no majority vote possible): discarded
 Annotator agreement (Krippendorff’s alpha): overall moderate
• Bias (binary) – 0.767 (Existence of gender-related bias is relatively explicit)
• Bias (ternary) – 0.492
• Hate (ternary) – 0.496
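A sketch of the two computations behind this slide: majority voting over three annotators and Krippendorff's alpha for nominal labels. It is a generic implementation of the standard coincidence-matrix formula, not the authors' script; the label strings and the gender-versus-rest binary collapse are assumptions.

```python
from collections import Counter
from itertools import permutations


def majority_label(labels):
    """Return the label given by more than half of the annotators, else None.

    None marks a case with no majority (e.g. all three annotators differ),
    which would go to adjudication.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else None


def to_binary_gender(label):
    """Assumed binary view of bias: gender-related bias versus everything else."""
    return label == "gender"


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of label lists, one inner list per comment.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label value was ever used
    return 1.0 - observed / expected
```

Applying `to_binary_gender` to every label before calling `krippendorff_alpha_nominal` gives a binary agreement figure, which can be higher than the ternary one when most disagreement lies between the other two classes.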
19

Analysis
• Final data
 Data split
• Discarded 659 out of 10,000
• Split train/valid/test with the rest
 Data composition
• Test: 974
– Data tagged while constructing the guideline (most closely aligned with the intention of the guideline)
• Valid: 471
– Data that went through tagging, review, and rejection or acceptance in the pilot phase, done with a large number of annotators (roughly aligned with the guideline)
• Train: 7,896
– Data crowd-sourced with the selected annotators; not fully reviewed, but adjudicated in some special cases
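A quick consistency check of the split sizes, using only the numbers on this slide; the variable names are illustrative.

```python
total_sampled = 10_000
discarded = 659
splits = {"train": 7_896, "valid": 471, "test": 974}

remaining = total_sampled - discarded
# The three splits should account for every comment that was not discarded.
assert sum(splits.values()) == remaining

shares = {name: size / remaining for name, size in splits.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```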
21

Analysis
• Final data
 Characteristics
• Toxic comments take up a slightly
larger portion than None
• For bias, the same does not hold
 Remarks
• ‘Lots of toxic expressions in celebrity news domain’?
– Since we sampled in order of downvotes, the overall portion does not necessarily reflect the toxicity of random comments
• ‘Higher portion of toxic comments compared to bias’?
– The results say so, but biases are usually implicit and may not have been visible to users
» So they may not have been accurately reflected in up/downvotes
22

Analysis
• Final data
 Bias and toxicity
• Toxicity is observed in most texts
with gender-related or other biases
– Gender-related bias?
» 93.76% toxic
– Other biases?
» 90.42% toxic
• In contrast, toxic comments do not necessarily contain biases
 The category of bias and amount of toxicity
• In `hate’, gender-related bias is about 1.4 times as frequent as other biases
– In `offensive’, the portion of gender-related bias is about half that of other biases
• This may be largely influenced by our guideline, but it still suggests that the amount of toxicity in the celebrity news domain is closely tied to gender-related content
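The "share of toxic comments per bias category" figures above are conditional proportions. A sketch of how such a figure could be computed from (bias, toxicity) label pairs is given below; the label strings and the toy rows are placeholders, not corpus data.

```python
from collections import defaultdict


def toxic_share_by_bias(rows):
    """Fraction of comments that are toxic (hate or offensive) within each bias category.

    `rows` is an iterable of (bias, toxicity) label pairs; anything other than
    'none' in the toxicity slot is counted as toxic.
    """
    totals = defaultdict(int)
    toxic = defaultdict(int)
    for bias, toxicity in rows:
        totals[bias] += 1
        if toxicity != "none":
            toxic[bias] += 1
    return {bias: toxic[bias] / totals[bias] for bias in totals}


# Toy placeholder rows, only to show the call.
toy_rows = [
    ("gender", "hate"),
    ("gender", "offensive"),
    ("others", "hate"),
    ("others", "none"),
    ("none", "none"),
    ("none", "offensive"),
]
print(toxic_share_by_bias(toy_rows))
```

Swapping the roles of the two labels gives the reverse question from the slide, namely how often toxic comments contain bias.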
24

Analysis
• Research questions
 RQ1
• How is hate speech displayed
in Korean online comments?
– Social bias and Toxicity
 RQ2
• What characteristics does the
Korean hate speech corpus
incorporate?
– Bias usually accompanies toxicity
– Gender-related bias seems to
accompany more toxic expressions
26

Conclusion
• Discussions on hate speech involve diverse viewpoints, from
academia to society and industry
• Constructing a Korean hate speech corpus connects these
discussions and makes them useful for real-world hate speech detection
• We observed bias and toxicity in Korean hate speech, which is
weighted toward gender-related factors in celebrity news comments
• Our future work includes building hate speech corpora for
various text domains, from formal to colloquial, to handle the
cases not yet covered
27

Conclusion
• Model and data release
 Annotation guideline
• https://www.notion.so/c1ecb7cc52d446cc93d928d172ef8442
 Kaggle competition
• https://www.kaggle.com/c/korean-gender-bias-detection
• https://www.kaggle.com/c/korean-bias-detection/
• https://www.kaggle.com/c/korean-hate-speech-detection/
 Github repository
• https://github.com/kocohub/korean-hate-speech
• For easier data importing
 Koco package
• https://github.com/inmoonlight/koco
– Library to easily access kocohub datasets
– Kocohub contains KOrean COrpus for natural language processing
» https://github.com/kocohub
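A sketch of reading the labeled data as tab-separated text with the standard library. The column names (`comments`, `contain_gender_bias`, `bias`, `hate`) are an assumption about the repository's file layout that the slides do not confirm, and the sample rows are made-up placeholders rather than corpus entries.

```python
import csv
import io

# Made-up placeholder rows in the assumed column layout.
SAMPLE_TSV = (
    "comments\tcontain_gender_bias\tbias\thate\n"
    "placeholder comment A\tFalse\tnone\tnone\n"
    "placeholder comment B\tTrue\tgender\thate\n"
)


def read_labeled_tsv(handle):
    """Parse a tab-separated file object into a list of dicts with typed fields."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        rows.append(
            {
                "comments": row["comments"],
                "contain_gender_bias": row["contain_gender_bias"] == "True",
                "bias": row["bias"],
                "hate": row["hate"],
            }
        )
    return rows


rows = read_labeled_tsv(io.StringIO(SAMPLE_TSV))
print(len(rows), rows[1]["bias"], rows[1]["hate"])
```

For a real file, the same function can be called on `open(path, encoding="utf-8", newline="")`, where `path` is wherever the data was downloaded.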
28
