1. Human Interface Laboratory
Towards Cross-Lingual Generalization of
Translation Gender Bias
2021. 3. 9 @FAccT Conference
Won Ik Cho*, Jiwon Kim*, Jaeyoung Yang, Nam Soo Kim
2. Contents
• Translation gender bias
What is the problem and why does it matter?
In which language pairs is it significant? Struggles so far
• Our approach
Language pairs and template
Dataset construction
Measurement of fluency and biasedness
• Discussion
Results and analysis
Takeaways
3. Bias
• Bias in machine learning?
Bias and variance
• Overfitting and underfitting
Bias from the viewpoint of fairness in machine learning?
• Problem of individuality and context rather than of
statistics and system (Binns, 2017)
Is the bias in machine learning related to the bias in fairness-oriented machine
learning and to real social bias?
• e.g., image semantic role labeling
– Zhao et al., Men Also Like Shopping:
Reducing Gender Bias Amplification
using Corpus-level Constraints,
in Proc. EMNLP, 2017.
• This also happens in translation!
4. Bias
• What (social) bias is shown in AI and NLP?
Sun et al., Mitigating Gender Bias in Natural Language Processing:
Literature Review, in Proc. ACL, 2019.
5. Overview: Gender bias in translation?
• Formulation #1
Gender-neutral pronouns
• Target problem?
Translation of gender-neutral pronouns to gender-specific ones
• Gender-neutral pronoun
Pronouns that display no
biological gender
Frequently appear in languages
such as Korean, Japanese, Turkish, ...
Prates et al., Assessing Gender Bias
in Machine Translation: A Case Study
with Google Translate, Neural
Computing and Applications, 2018.
6. Overview: Gender bias in translation?
• Formulation #2
Gendered languages
• Target problem?
Translation of expressions without
gender representation to gendered items
• Gendered languages
Grammatical genders in articles,
nouns, adjectives
Differs from the biological gender
Vanmassenhove et al.,
Getting Gender Right in
Neural Machine Translation,
in Proc. EMNLP, 2018.
7. Overview: Gender bias in translation?
• Why do they matter?
The result can be offensive to end users
• When does it matter?
Whether or not the users are familiar with the target/source language
• Who will potentially feel offended?
Especially those affected when the mistranslation involves social stereotypes
• Research questions
How can the evaluation incorporate various aspects of translation gender
bias?
How will grammatical properties and resource condition influence the bias
issue?
8. Template-based attacks
• 걔(s/he)는 [##]이야! ("S/he is a [##]!")
Cho et al., On Measuring Gender Bias in Translation of Gender-neutral
Pronouns, in Proc. GeBNLP, ACL Workshop, 2019.
• Why Korean?
Displays various sentence styles
Translation services are
popular among the users
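The template-based attack above can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code: the word list, function names, and the naive pronoun detector are our own assumptions, and the Korean copula particle is simplified.

```python
# Illustrative sketch of a template-based probe (not the paper's code):
# fill a gender-neutral Korean template with attribute words, then
# check which gendered pronoun the MT output chose.
TEMPLATE = "걔는 {}이야!"  # "S/he is a {}!" (copula particle simplified)

def build_test_set(words):
    """Return one gender-neutral source sentence per attribute word."""
    return [TEMPLATE.format(w) for w in words]

def pronoun_of(translation):
    """Naively detect the gendered pronoun in an English MT output."""
    tokens = translation.lower().split()
    if "he" in tokens:
        return "male"
    if "she" in tokens:
        return "female"
    return "neutral"
```

In use, `pronoun_of` would be applied to each MT output of `build_test_set(...)`; the male/female counts per attribute word then indicate the direction of the bias.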
9. Semantic cues
• WinoMT
Stanovsky et al., Evaluating Gender Bias in Machine Translation,
in Proc. ACL, 2019.
- Performance can differ even within
the same language family
- High accuracy on stereotypical
cases does not guarantee accuracy
on the anti-stereotypical counterpart
10. Our approach
• Combined approach – Cross-lingual evaluation?
Two different types of translation gender bias
• Different types of gender bias can be observed in a single translation system
11. Our approach
• Language pairs
Different types of 'language pairs' should be investigated
• 1) Languages with gender-neutral pronouns
• 2) Languages without (colloquially used) GNPs but not grammatically gendered
• 3) Languages with grammatical gender
12. Our approach
• Language pairs
Linguistic criteria: On GNPs and genderedness
• Korean/Tagalog -> English -> German/Portuguese
Resource criteria
• Korean > Tagalog
• German > Portuguese
13. Our approach
• Template
How can the pairs be simultaneously evaluated?
Korean 그 여자에 대해 하나 말하자면, 걔는 초등학교 선생님이야.
Tagalog Isang bagay tungkol sa babae, siya ay isang guro sa elementarya.
English One thing about the woman, she is an elementary school teacher.
German Eine Sache über die Frau, sie ist eine Grundschullehrerin.
Portuguese Um facto sobre a mulher, ela é professora do ensino primário.
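The shared template above can be generated programmatically. A minimal sketch under our own naming follows; only the Korean and English rows are shown, and the exact slot boundaries are our assumption.

```python
# Parallel templates with an occupation slot, so every language pair
# is evaluated on the same content (illustrative; slot placement assumed).
TEMPLATES = {
    "ko": "그 여자에 대해 하나 말하자면, 걔는 {}이야.",
    "en": "One thing about the woman, she is {}.",
}

def fill(lang, term):
    """Instantiate the template of `lang` with an occupation term."""
    return TEMPLATES[lang].format(term)
```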
15. Our approach
• Evaluation
Template-based evaluation set construction
Inference with public MT modules
Human evaluation (gender-related) and automatic metrics (fluency)
16. Our approach
• Measurement
Biasedness
• Accuracy on biological gender
• Accuracy on grammatical gender
• Disparate impact
– Accuracy on female case
divided by accuracy on male case
Fluency
• BLEU
– EN, DE, PT
• BERTScore
– Multilingual BERT
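The biasedness measures above can be written down directly; this is a sketch with our own function names, assuming gender labels have already been extracted from the MT outputs.

```python
# Sketch of the biasedness measures (function names are ours).
def accuracy(predicted, gold):
    """Fraction of outputs whose gender marking matches the gold label."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

def disparate_impact(acc_female, acc_male):
    """Accuracy on the female case divided by accuracy on the male case;
    a value of 1.0 means both genders are handled equally well."""
    return acc_female / acc_male
```

Fluency (BLEU, BERTScore) would be computed separately with standard toolkits.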
17. Results and analysis
• Results
More bias-related errors in EN→DE/PT than in KO/TL→EN
• "She is a game programmer" → "Sie ist ein professioneller Spieler" (She is a professional player)
• aviador, soldado, monge (airman, soldier, monk)
• Exceptional cases for Bing KO-EN
18. Results and analysis
• Analysis
Unbiasedness/Disparate impact
• Higher among type 1 languages
– DE, PT < KO, TL (overall)
• Within the same type, resources seem
to matter
– DE < PT, KO < TL
Fluency measurement
• Lexical and semantic approaches yield different results
– BLEU (lexical): DE > PT > KO, TL
– BERTScore (semantic): DE < PT, KO < TL
Observations
• The amount of available language resources (here assumed for public
MT modules) does not guarantee unbiased translation, although fluency
measures may be higher in some respects
• Evaluation of gender-related inference differs depending on the
fluency measure used
19. Takeaways
• Translation gender bias is problematic since wrong results can be
offensive to end users
• Translation gender bias matters regardless of the user's proficiency
in the language, and is especially offensive when the mistranslation
involves social stereotypes
• Our approach, including the template and measurements, unifies
the evaluation of translation gender bias across various language
pairs
• Our evaluation results suggest that inductive bias reflecting social
stereotypes is a major factor behind the errors, and that merely
augmenting training corpora may not be a solution
20. Reference (order of appearance)
• Binns, Reuben. "Fairness in Machine Learning: Lessons from Political Philosophy." arXiv preprint
arXiv:1712.03586 (2017).
• Zhao, Jieyu, et al. "Men Also Like Shopping: Reducing Gender Bias Amplification Using Corpus-
level Constraints." arXiv preprint arXiv:1707.09457 (2017).
• Sun, Tony, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza,
Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. "Mitigating Gender Bias in Natural
Language Processing: Literature Review." In Proceedings of the 57th Annual Meeting of the
Association for Computational Linguistics, pp. 1630-1640. 2019.
• Prates, Marcelo O.R., Pedro H. Avelar, and Luís C. Lamb. "Assessing Gender Bias in Machine
Translation: A Case Study with Google Translate." Neural Computing and Applications (2018): 1-19.
• Vanmassenhove, Eva, Christian Hardmeier, and Andy Way. "Getting Gender Right in Neural
Machine Translation." In Proceedings of the 2018 Conference on Empirical Methods in Natural
Language Processing, pp. 3003-3008. 2018.
• Cho, Won Ik, et al. "On Measuring Gender Bias in Translation of Gender-neutral Pronouns."
GeBNLP 2019 (2019): 173.
• Stanovsky, Gabriel, Noah A. Smith, and Luke Zettlemoyer. "Evaluating Gender Bias in Machine
Translation." arXiv preprint arXiv:1906.00591 (2019).