Typ-NLP Workshop
1 August 2019
What can typological
knowledge bases and
language representations
tell us about linguistic
properties?
Isabelle Augenstein*
augenstein@di.ku.dk
@IAugenstein
http://isabelleaugenstein.github.io/
*Credit for many of the slides: Johannes Bjerva
Linguistic Typology
2
● ‘The systematic study and comparison of language
structures’ (Velupillai, 2012)
● Long history (Herder, 1772; von der Gabelentz, 1891; …)
● Computational approaches (Dunn et al., 2011; Wälchli,
2014; Östling, 2015, ...)
Why Computational Typology?
3
● Answer linguistic research questions on large scale
● About relationships between languages
● About relationships between structural features of languages
● Facilitate multilingual learning
○ Cross-lingual transfer
○ Few-shot or zero-shot learning
How to Obtain Typological Knowledge?
4
● Discrete representation of language features in typological
knowledge bases
● World Atlas of Language Structures (WALS)
● Continuous representation of language features via
language embeddings
● Learned via language modelling
Why Computational Typology?
5
● Answer linguistic research questions on large scale
● Multilingual learning
○ Language representations
○ Cross-lingual transfer
○ Few-shot or zero-shot learning
● This talk:
○ Features in the World Atlas of Language Structures (WALS)
○ Computational Typology via unsupervised modelling of languages
in neural networks
6
World Atlas of Language Structures (WALS)
7
World Atlas of Language Structures (WALS)
Can language representations be learned from data?
Resources that exist for many languages:
● Universal Dependencies (>60 languages)
● UniMorph (>50 languages)
● New Testament translations (>1,000 languages)
● Automated Similarity Judgment Program (>4,500
languages)
8
Multilingual NLP and Language Representations
● No explicit representation
○ Multilingual Word Embeddings
● Google’s “Enabling zero-shot
learning” NMT trick
○ Language given explicitly in
input
● One-hot encodings
○ Languages represented as a
sparse vector
● Language Embeddings
○ Languages represented as a
distributed vector
9
(Östling and Tiedemann, 2017)
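The difference between the last two representation options can be sketched as follows (a minimal numpy illustration with toy sizes, not the actual model code):

```python
import numpy as np

n_langs, dim = 4, 3

# One-hot encoding: each language is a sparse indicator vector.
one_hot = np.eye(n_langs)[1]          # language with id 1

# Language embedding: each language is a dense vector, typically
# learned jointly with the rest of the model.
rng = np.random.default_rng(7)
emb_table = rng.normal(size=(n_langs, dim))
embedding = emb_table[1]

assert one_hot.sum() == 1.0           # sparse: a single non-zero entry
assert embedding.shape == (dim,)      # distributed: all entries informative
```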
Experimental Setup
Data
● Pre-trained language embeddings (Östling and Tiedemann, 2017)
○ Trained via Language Modelling on New Testament data
● PoS annotation from Universal Dependencies for
○ Finnish
○ Estonian
○ North Sami
○ Hungarian
Task
● Fine-tune language embeddings on PoS tagging
● Investigate how typological properties are encoded in these for four
Uralic languages
10
Distributed Language Representations
11
• Language Embeddings
• Analogous to Word
Embeddings
• Can be learned in a
neural network without
supervision
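In Östling and Tiedemann's (2017) setup, the language embedding is an extra input vector concatenated to each character embedding; a minimal sketch of that input construction (toy dimensions, numpy only, no actual LSTM):

```python
import numpy as np

rng = np.random.default_rng(0)
n_langs, lang_dim, char_dim = 5, 64, 128

# One trainable vector per language, updated by backpropagation
# together with the rest of the language-model parameters.
lang_embeddings = rng.normal(size=(n_langs, lang_dim))
char_embedding = rng.normal(size=char_dim)   # embedding of one character

# Each time-step input is the character embedding concatenated with
# the embedding of the language the text is written in.
lang_id = 2
lstm_input = np.concatenate([char_embedding, lang_embeddings[lang_id]])
assert lstm_input.shape == (char_dim + lang_dim,)
```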
Language Embeddings in Deep Neural Networks
12
Talk Overview
Part 1: Language Embeddings
- Do they aid multilingual parameter sharing?
- Do they encode typological properties?
- What types of similarities between languages do they encode?
Part 2: Typological Knowledge Bases
- Can they be populated automatically?
- Can they be used to discover typological implications?
13
Part 1: Language
Embeddings
14
Parameter sharing between dependency
parsers for related languages
Miryam de Lhoneux, Johannes Bjerva,
Isabelle Augenstein, Anders Søgaard
EMNLP 2018
15
Cross-lingual sharing with language embeddings
● Do language embeddings help to learn soft sharing
strategies?
● Use case: transition-based dependency parsing
● Types of parameters:
● Character embeddings
● Word embeddings
● Transition parameters (MLP)
● Ablation with language embedding concatenated with
char, word or transition vector
16
Cross-lingual sharing with language embeddings
17
Lang Tokens Family Word order
ar 208,932 Semitic VSO
he 161,685 Semitic SVO
et 60,393 Finnic SVO
fi 67,258 Finnic SVO
hr 109,965 Slavic SVO
ru 90,170 Slavic SVO
it 113,825 Romance SVO
es 154,844 Romance SVO
nl 75,796 Germanic No dom. order
no 76,622 Germanic SVO
Table 1: Dataset characteristics
[Cropped excerpt from the paper: "sharing classifier parameters always helps, whereas the usefulness of sharing LSTM parameters depends […]". The remainder describes the transition-based parser (Nivre, 2008): each input word is represented by the concatenation of a word embedding and the final state of a character-based LSTM, passed through a word-level bi-directional LSTM feature extractor; both the embeddings and the BiLSTM are trained together with the model.]
Cross-lingual sharing with language embeddings
19
[Bar chart: average attachment score (range 78–80) for the Mono, Lang-Best, Best, All, and Soft sharing strategies]
Mono Lang-Best Best All Soft
• Mono: single-task baseline
• Lang-best: best sharing strategy for each language
• Best: best sharing strategy across languages (char not shared,
word shared, transition shared with language embedding)
• All: all parameters shared
• Soft: sharing learned using language embeddings
Related vs. Unrelated Languages
22
[Bar chart: average attachment score (range 78–80) per sharing strategy, comparing related and unrelated language pairs]
Mono Lang-Best Best All Soft
• Mono: single-task baseline
• Lang-best: best sharing strategy for each language
• Best: best sharing strategy across languages (char not shared,
word shared, transition shared with language embedding)
• All: all parameters shared
• Soft: sharing learned using language embeddings
Tracking Typological Traits of Uralic
Languages in Distributed Language
Representations
Johannes Bjerva, Isabelle Augenstein
IWCLUL 2018
24
Language Embeddings in Deep Neural Networks
25
1. Do language
embeddings aid
multilingual modelling?
2. Do language
embeddings contain
typological
information?
Model performance (Monolingual PoS tagging)
26
• Compared to most
frequent class
baseline (black line)
• Model transfer
between Finnic
languages relatively
successful
• Little effect from
language
embeddings (to be
expected)
Model performance (Multilingual PoS tagging)
27
• Compared to
monolingual baseline
(black line)
• Model transfer
between Finnic
languages
outperforms
monolingual baseline
• Language
embeddings improve
multilingual modelling
Tracking Typological Traits (full language sample)
28
• Baseline: Most frequent
typological class in sample
• Language embeddings saved
at each training epoch
• Separate Logistic Regression
classifier trained for each
feature and epoch
• Input: Language
embedding
• Output: Typological class
• Typological features encoded
in language embeddings
change during training
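The probing classifier can be sketched as follows (toy synthetic data in place of the real embeddings and WALS labels; a plain-numpy logistic regression rather than the exact implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: 40 language embeddings (10-dim) and one binary
# typological feature that happens to be linearly encoded in them.
X = rng.normal(size=(40, 10))
true_w = rng.normal(size=10)
y = (X @ true_w > 0).astype(float)

# One such probe is trained per typological feature and per epoch.
w = np.zeros(10)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid
    w -= 0.5 * X.T @ (p - y) / len(y)    # gradient step on log-loss

acc = ((1.0 / (1.0 + np.exp(-(X @ w))) > 0.5) == y).mean()
```

High probe accuracy indicates the feature is linearly decodable from the embeddings; tracking `acc` across epochs shows how the encoding changes during training.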
Tracking Typological Traits (Uralic languages held out)
29
• Some typological
features can be
predicted with high
accuracy for the
unseen Uralic
languages.
Cross-lingual sharing with language embeddings: Summary
● Conclusions
● Sharing high-level features more useful than low-level features
● If languages are unrelated, sharing low-level features hurts
performance
● Language embeddings help
● Only tested for (selected) language pairs
● Sharing for more languages
● Only tested for selected tasks (parsing, PoS tagging)
● Language embeddings pre-trained or trained end-to-end
● Soft or hard sharing based on typological KBs?
30
From Phonology to Syntax: Unsupervised
Linguistic Typology at Different Levels with
Language Embeddings
Johannes Bjerva, Isabelle Augenstein
NAACL HLT 2018
31
Language Embeddings in Deep Neural Networks
32
Do language
embeddings contain
typological information?
- Predict typological
features
- Study unsupervised vs.
fine-tuned embeddings
Research Questions
● RQ 1: Which typological properties are encoded in task-
specific distributed language representations, and can we
predict phonological, morphological and syntactic
properties of languages using such representations?
● RQ 2: To what extent do the encoded properties change as
the representations are fine-tuned for tasks at different
linguistic levels?
● RQ 3: How are language similarities encoded in fine-tuned
language embeddings?
33
Phonological Features
34
● 20 features
● E.g. descriptions of the
consonant and vowel
inventories, presence of tone
and stress markers
Morphological Features
35
● 41 features
● Features from morphological
and nominal chapter
● E.g. number of genders, usage
of definite and indefinite articles
and reduplication
Word Order Features
36
● 56 features
● Encode ordering of subjects,
objects and verbs
Experimental Setup
37
Data
● Pre-trained language embeddings (Östling and Tiedemann, 2017)
● Task-specific datasets: grapheme-to-phoneme (G2P), phonological
reconstruction (ASJP), morphological inflection (SIGMORPHON),
part-of-speech tagging (UD)
Dataset       Class        |L_task|   |L_task ∩ L_pre|
G2P           Phonology    311        102
ASJP          Phonology    4,664      824
SIGMORPHON    Morphology   52         29
UD            Syntax       50         27
Experimental Setup
38
Method
● Fine-tune language embeddings on grapheme-to-phoneme (G2P),
phonological reconstruction (ASJP), morphological inflection
(SIGMORPHON), part-of-speech tagging (UD)
○ Train supervised seq2seq models on the G2P, ASJP, and SIGMORPHON tasks
○ Train a sequence labelling model on the UD task
● Predict typological properties with kNN model
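A k-NN prediction step of this kind can be sketched with numpy (toy embeddings and labels; the query is deliberately placed near a training language):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: embeddings for 30 languages with known binary feature
# values, plus one held-out language to classify.
train_emb = rng.normal(size=(30, 16))
train_feat = rng.integers(0, 2, size=30)
query = train_emb[4] + 0.01 * rng.normal(size=16)   # close to language 4

# Predict the majority feature value among the k nearest embeddings.
k = 3
dists = np.linalg.norm(train_emb - query, axis=1)
neighbours = np.argsort(dists)[:k]
predicted = np.bincount(train_feat[neighbours]).argmax()
```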
Experimental Setup
39
Seq2Seq Model
Part-of-Speech Tagging (UD)
47
- Improvements for all experimental settings
-> Pre-trained and fine-tuned language embeddings encode
features relevant to word order
System/Features        Random lang/feat pairs     Random lang/feat pairs
                       (word order features)      (all features)
Most frequent class    67.81%                     82.93%
k-NN (pre-trained)     76.66%                     82.69%
k-NN (fine-tuned)      *80.81%                    83.55%
Conclusions
50
- Language embeddings can encode typological features
- Works for morphological inflection and PoS tagging
- Does not work for phonological tasks
- We can predict typological features for unseen language families
with high accuracies
- G2P task: phonological differences between otherwise similar
languages (e.g. Norwegian Bokmål and Danish) are accurately
encoded
What do Language Representations Really
Represent?
Johannes Bjerva, Robert Östling, Maria Han
Veiga, Jörg Tiedemann, Isabelle Augenstein
Computational Linguistics 2019
51
Language Representations encode Language Similarities
52
• Similar languages – similar representations
• ...similar how?
• Can reconstruct language family trees (Rabinovich et al. 2017)!
• So... Language family (genetic) similarity?
What Do Language Representations Really Represent?
53
[Figure 1: language representations (en, fr, es, pt, de, nl) in a two-dimensional space. Structural distance? Family distance? Geographical distance?]
Language embeddings in a two-dimensional space.
What do their similarities represent?
Language Representations from Monolingual Texts
54
• Input: official translations from EU languages to English (EuroParl)
• Train multilingual LM on various levels of abstraction
• Evaluate resulting language representations
[Figure 2: Czech and Swedish source texts with official English translations, modelled at three levels of abstraction:]
CS (English translation): For example , in my country , the Czech Republic
CS (POS):     ADP NOUN PUNCT ADP ADJ NOUN PUNCT DET PROPN PROPN
CS (DepRel):  prep pobj punct prep poss pobj punct det compound nsubj
SE (English translation): In Stockholm , we must make comparisons and learn
SE (POS):     ADP PROPN PUNCT PRON VERB VERB NOUN CCONJ VERB
SE (DepRel):  prep pobj punct nsubj aux ROOT dobj cc conj
[Excerpt from the paper:]
[…] most closely related to Rabinovich et al. (2017), who investigate representation learning on monolingual English sentences, which are translations from various source languages to English from the Europarl corpus (Koehn, 2005). They employ a feature-engineering approach to predict source languages and learn an Indo-European (IE) family tree using their language representations. Crucially, they posit that the relationships found between their representations encode the genetic relationships between languages. They use features based on sequences of POS tags, function words and cohesive markers. We significantly expand on this work by comparing three language similarity measures (§4). By doing this, we offer a stronger explanation of what language representations really represent.

3 Method
Figure 2 illustrates the data and problem we consider in this paper. We are given a set of English gold-standard translations from the official languages of the European Union, based on speeches from the European Parliament. We wish to learn language representations based on this data, and investigate the linguistic relationships which hold between the resulting representations (RQ2). For this to make sense, it is important to abstract away from the surface forms of the translations as, e.g., speakers from certain regions will tend to talk about the same issues. We therefore introduce several levels of abstraction: i) training on […]; the input sequences themselves are, e.g., sequences of POS tags. Our model is similar to Östling and Tiedemann (2017), who train a character-based multilingual language model using a 2-layer LSTM, with the modification that each time-step includes a representation of the language at hand. That is to say, each input to their LSTM is represented both by a character representation, c, and a language representation, l ∈ L. Since the set of language representations L is updated during training, the resulting representations encode linguistic properties of the languages. Whereas Östling and Tiedemann (2017) model hundreds of languages, we model only English; however, we redefine L to be the set of source languages from which our translations originate (yielding L_raw, L_POS, L_DepRel, cf. Figure 2).

4 Comparing Languages
We compare the resulting language embeddings to three different types of language distance measures: genetic distance estimated by methods from historical linguistics, geographical distance of speaker communities, and a novel measure for the structural distances between languages. As previously stated, our goal with this is to investigate whether it really is the genetic distances between languages which are captured by language representations, or if other distance measures provide more explanation (RQ2).
Figure 2
Problem illustration. Given official translations from EU languages to English, we train
multilingual language models on various levels of abstractions, encoding the source languages.
The resulting source language representations (Lraw etc.) are evaluated.
[Excerpt: […] languages, having an incorrect view of the structure of the language representation space can be dangerous. For instance, the standard assumption of genetic similarity would imply that the representation of the Gagauz language (Turkic, spoken mainly in Moldova) should be interpolated from the genetically very close Turkish, but this would likely lead to poor performance in syntactic tasks, since the two languages have […]]
Tree Distance Evaluation
55
• Hierarchical clustering of language embeddings
• Compare resulting trees with gold phylogenetic trees
• Hierarchical clustering of cosine distances (Rabinovich et al. 2017)
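The distance matrix underlying that clustering can be sketched as follows (toy embeddings; a hierarchical clustering routine such as scipy's would then be run on this matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
emb = rng.normal(size=(6, 32))   # one toy embedding per language

# Pairwise cosine distances between language embeddings; hierarchical
# clustering of this matrix yields a candidate phylogenetic tree.
norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
dist = 1.0 - norm @ norm.T
assert abs(dist[0, 0]) < 1e-8    # each language is distance 0 to itself
```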
[…] and Petroni (2008), using the distance metric from Rabinovich, Ordan, and Wintner (2017). Our generated trees yield comparable results to previous work.
Condition                            Mean    St.d.
Raw text (LM-Raw)                    0.527   -
Function words and POS (LM-Func)     0.556   -
Only POS (LM-POS)                    0.517   -
Phrase structure (LM-Phrase)         0.361   -
Dependency relations (LM-Deprel)     0.321   -
POS trigrams (ROW17)                 0.353   0.06
Random (ROW17)                       0.724   0.07
Table 1: Tree distance evaluation (lower is better, cf. §5.1).
[Excerpt: […] Given the lack of explicit syntactic information, it is unsurprising that the raw-text results (LM-Raw in Table 1) only marginally outperform the random baseline. To abstract away from the content and negate the geographical effect, we train on only function words and POS. This performs almost on par with raw text (LM-Func in Table 1), indicating that the level of abstraction reached is not sufficient to capture similarities between languages. We next investigate whether we can sufficiently abstract away from the content by removing function words, and only […]]
56
Distance Measures
• Family distance (following Rabinovich et al. 2017)
• Geographic distance
• Using Glottolog geocoordinates (Hammarström et al. 2017)
• Structural distance
vs
• Language embedding similarities most strongly correlate with
structural similarities
• Less strong correlation with genetic similarities, even though
phylogenetic trees can be faithfully reconstructed (Rabinovich et al.
2017)
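The comparison itself reduces to correlating pairwise distance matrices; a minimal sketch with synthetic matrices (the paper's actual measures and significance test differ in detail):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10

# Toy symmetric "structural" distances between n languages, and
# embedding distances constructed to track them with some noise.
struct = rng.random((n, n))
struct = (struct + struct.T) / 2.0
emb_dist = struct + 0.1 * rng.random((n, n))

# Correlate the upper triangles of the two distance matrices.
iu = np.triu_indices(n, k=1)
r = np.corrcoef(struct[iu], emb_dist[iu])[0, 1]
```

A high correlation between the matrices indicates that the embedding similarities track that particular notion of language distance.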
57
Figure 4: Correlations between similarities (Genetic, Geographical, and Structural) and language representations (Raw, Func, POS, Phrase, Deprel). Significance at p < 0.001 is indicated by *.
Analysis of Similarities
Part 2: Typological
Knowledge Bases
58
A Probabilistic Generative Model of Linguistic
Typology
Johannes Bjerva, Yova Kementchedjhieva,
Ryan Cotterell, Isabelle Augenstein
NAACL 2019
59
Languages are different
61
eat an apple
Languages are different
62
eat an apple
ringo-wo taberu
Differences between languages are often correlated
63
eat an apple ringo-wo taberu
go to NAACL
Differences are often correlated
64
eat an apple ringo-wo taberu
go to NAACL
Differences are often correlated
65
eat an apple
go to NAACL
NAACL -ni iku
NAACL
ringo-wo taberu
Linguistic Typology and Greenberg’s Universals
66
VO languages have prepositions
OV languages have postpositions
Correlations between Word Order and Adpositions
67
Contributions
• Greenberg’s universals are binary – However, correlations are
rarely 100%, so we implement a probabilisation of typology
• Framed as typological collaborative filtering,
exploiting correlations between languages and features
• We exploit raw linguistic data by adding a semi-supervised
extension
68
World Atlas of Language Structures (WALS)
70
● 2,500 languages
● 192 features
Feature 81A – Order of Subject, Object and Verb
WALS is Sparse and Skewed
71
● Sparse: most languages are covered by only a handful of features
● Skewed: a few features have much wider coverage than others
World Atlas of Language Structures (WALS)
72
● 2,500 languages → 800 languages
● 192 features → 160 features
Feature 81A – Order of Subject, Object and Verb
Probabilistic Typology as Matrix Factorisation
73
Users
Movies
Probabilistic Typology as Matrix Factorisation
74
Users
Users
Movies
Ratings
Probabilistic Typology as Matrix Factorisation
75
[Figure: a partially observed users × movies ratings matrix]
Probabilistic Typology as Matrix Factorisation
76
[Figure: the ratings matrix factorised into a low-dimensional vector per user and per movie]
Matrix Completion: Collaborative Filtering
77
[1, -4, 3] · [-5, 2, 1] = -10 (dot product)
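The completion step can be sketched as a tiny gradient-descent matrix factorisation (toy 3×3 ratings matrix with missing entries; the paper's actual model adds a sigmoid link, cf. the later slides):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy ratings with missing entries (np.nan); factorise into rank-2
# user and movie vectors by gradient descent on the observed cells.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 5.0]])
mask = ~np.isnan(R)
U = 0.1 * rng.normal(size=(3, 2))   # one vector per user
V = 0.1 * rng.normal(size=(3, 2))   # one vector per movie

for _ in range(2000):
    err = np.where(mask, U @ V.T - R, 0.0)   # error on observed cells only
    g_U, g_V = err @ V, err.T @ U
    U -= 0.01 * g_U
    V -= 0.01 * g_V

pred = U @ V.T   # dot products fill in the missing ratings
max_err = float(np.abs(pred[mask] - R[mask]).max())
```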
Probabilistic Typology as Matrix Factorisation
78
[Figure: a missing rating predicted as the dot product of the corresponding user and movie vectors]
Probabilistic Typology as Matrix Factorisation
79
[Figure: the completed ratings matrix, with missing entries filled in by dot products of the learned vectors]
Probabilistic Typology as Matrix Factorisation
80
Languages
Typological Features
Probabilistic Typology as Matrix Factorisation
84
Typological Features
Probabilistic Typology as Matrix Factorisation
86
Probabilistic Typology as Matrix Factorisation
87
Typological Feature Prediction
88
Probabilistic Typology as Matrix Factorisation
89
[Figure: a language–feature dot product (here 4) is passed through a sigmoid to yield a probability (here 0.9)]
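In the probabilistic model, a feature value's probability comes from squashing the language–feature dot product through a sigmoid; sketched with toy vectors (illustrative values only):

```python
import numpy as np

# Toy language and feature vectors (illustrative values only).
lang_vec = np.array([1.0, -0.5, 2.0])
feat_vec = np.array([0.5, 1.0, 1.5])

score = float(lang_vec @ feat_vec)     # 0.5 - 0.5 + 3.0 = 3.0
prob = 1.0 / (1.0 + np.exp(-score))    # sigmoid -> probability
```

With these toy values the model assigns the feature value a probability of about 0.95.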
Typological Feature Prediction
90
Representing Languages – Semi-supervised
Extension
95
[Figure: a character-level language model predicting "k a t" from "<s> k a"]
Representing Languages – Semi-supervised
Extension
96
[Figure: a character-level language model predicting "k a t" from "<s> k a"]
Semi-supervised Extension - Interpretability
97
Typological Feature
Prediction
Multilingual Language
Modelling
Compressing linguistic information
Evaluation
• Controlling for
Genetic Relationships
• Train on all out-of-family
data
• [0, 1, 5, 10, 20]% in-family
data
• Observed features in
matrix
• With/without pre-
trained language
embeddings
98
Feature Prediction across Language Families
99
Effect of Pre-training (Semi-supervised Extension)
100
Uncovering Probabilistic Implications in
Typological Knowledge Bases
Johannes Bjerva, Yova Kementchedjhieva,
Ryan Cotterell, Isabelle Augenstein
ACL 2019
108
Linguistic Typology and Greenberg’s Universals
109
VO languages have prepositions
OV languages have postpositions
From Correlations to Probabilistic Implications
110
Visualisation of a section of the induced graphical model.
Observing the features in the left-most nodes (SV, OV, and
Noun-Adjective), can we correctly infer the value of the
right-most node (SVO)?
Accuracies for feature prediction in a typologically diverse
test set, across number of implicants used
111
N implicants          2     3     4     5     6
Phonology             0.75  0.82  0.84  0.86  0.89
Morphology            0.77  0.85  0.87  0.70  0.82
Nominal Categories    0.72  0.83  0.80  0.84  0.81
Nominal Syntax        0.77  0.89  0.85  0.89  0.81
Verbal Categories     0.80  0.84  0.80  0.86  0.90
Word Order            0.74  0.86  0.86  0.86  0.93
Clause                0.75  0.81  0.84  0.85  0.84
Complex               0.82  0.83  0.87  0.93  0.84
Lexical               0.83  0.76  0.75  0.85  0.79
Mean                  0.77  0.83  0.83  0.85  0.85

Baselines: Most freq. 0.30, Pairwise 0.77, PRA 0.81, Language embeddings 0.85

Table 1: Accuracies for feature prediction in a typologically diverse test set, across number of implicants used. Note that the numbers are not comparable across columns nor to the baseline, since each makes a different number of predictions.
Feature Prediction Accuracies
Hand-picked implications. In cases where the same is
covered by Daumé III and Campbell (2007), we borrow
their analysis (marked with *)
112
#    Implicant(s)                                            Implicand
1*   Postpositions                                           Genitive-Noun (Greenberg #2a)
2*   Postpositions                                           OV (Greenberg #4)
3    OV                                                      SV
4*   Postpositions                                           SV
5*   Prepositions                                            VO (Greenberg #4)
6*   Prepositions                                            Initial subord. word (Lehmann)
7*   Adjective-Noun, Postpositions                           Demonstrative-Noun
8*   Genitive-Noun, Adjective-Noun                           OV
9    SV, OV, Noun-Adjective                                  SOV
10   Degree word-Adjective, VO, Noun-Relative Clause, SVO    Numeral-Noun
11   SOV, OV, Relative Clause-Noun, Adjective-Degree word    Noun-Numeral
Probabilistic Implications Found
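The strength of such a probabilistic implication can be sketched as a conditional probability estimated from a language-by-feature matrix (synthetic data constructed so that "postpositions → OV" mostly holds, not WALS itself):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200   # toy sample of languages

# Synthetic binary features: postpositions (implicant) and OV order
# (implicand), generated so the implication holds about 90% of the time.
postpositions = rng.random(n) < 0.5
ov = np.where(postpositions,
              rng.random(n) < 0.9,    # OV likely given postpositions
              rng.random(n) < 0.2)    # OV rare otherwise

# P(OV | postpositions): the probabilistic version of Greenberg #4.
p_implication = ov[postpositions].mean()
```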
Wrap-Up
113
Conclusions: This Talk
114
Part 1: Language Representations
- Improve performance for multilingual
sharing (de Lhoneux et al. 2018)
- Encode typological properties
- task-specific fine-tuned ones even
more so than ones only obtained using
language modelling (Bjerva &
Augenstein 2018a,b)
- Can be used to reconstruct phylogenetic
trees (Rabinovich et al. 2017, Östling &
Tiedemann 2017)
- … but actually mostly represent
structural similarities between
languages (Bjerva et al., 2019a)
Conclusions: This Talk
115
Part 2: Typological
Knowledge Bases
- Can be populated automatically
with high accuracy using KBP
methods (Bjerva et al. 2019b)
- Language embeddings further
improve performance
- Can be used to discover
probabilistic implications
- With one or multiple implicants
- Including Greenberg universals
Future work
116
• Improve
multilingual
modelling
• E.g., share
morphologically
relevant
parameters for
morphologically
similar languages
• Use typological
knowledge bases
for multilingual
modelling
Presented Papers
Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard.
Parameter sharing between dependency parsers for related languages. EMNLP
2018.
Johannes Bjerva, Isabelle Augenstein. Tracking Typological Traits of Uralic
Languages in Distributed Language Representations. Fourth International
Workshop on Computational Linguistics for Uralic Languages (IWCLUL 2018).
Johannes Bjerva, Isabelle Augenstein. Unsupervised Linguistic Typology at
Different Levels with Language Embeddings. NAACL HLT 2018.
Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle
Augenstein. What do Language Representations Really Represent?
Computational Linguistics, Vol. 45, No. 2, June 2019.
Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein.
A Probabilistic Generative Model of Linguistic Typology. NAACL 2019.
Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein.
Uncovering Probabilistic Implications in Typological Knowledge Bases. ACL 2019.
117
Thanks to my collaborators and advisees!
Johannes Bjerva, Ryan Cotterell, Yova Kementchedjhieva, Miryam de
Lhoneux, Robert Östling, Maria Han Veiga, Jörg Tiedemann
118
Thank you!
augenstein@di.ku.dk
@IAugenstein
119
○ Language given explicitly in input
● One-hot encodings
○ Languages represented as a sparse vector
● Language Embeddings
○ Languages represented as a distributed vector
(Östling and Tiedemann, 2017)
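The contrast between the last two options can be made concrete in a few lines: a one-hot language vector is sparse and, by construction, says nothing about language similarity, whereas a dense language embedding has a geometry whose distances can be compared. A minimal sketch (the three-dimensional vectors below are invented for illustration, not trained embeddings):

```python
import math

languages = ["fin", "est", "hun", "eng"]

# One-hot: each language is a sparse indicator vector; every pair of
# distinct languages is equally dissimilar by construction.
one_hot = {lang: [1.0 if i == j else 0.0 for j in range(len(languages))]
           for i, lang in enumerate(languages)}

# Distributed: dense vectors (values invented purely for illustration)
# whose geometry can encode similarity between languages.
embedding = {
    "fin": [0.9, 0.1, 0.3],
    "est": [0.8, 0.2, 0.3],
    "hun": [0.5, 0.4, 0.2],
    "eng": [-0.7, 0.9, 0.1],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# One-hot vectors carry no similarity information...
assert cosine(one_hot["fin"], one_hot["est"]) == 0.0
# ...whereas dense embeddings can place Finnish nearer Estonian than English.
assert cosine(embedding["fin"], embedding["est"]) > cosine(embedding["fin"], embedding["eng"])
```

Learned language embeddings (next slides) are exactly such dense vectors, except that their values are updated during training rather than chosen by hand.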
Experimental Setup
10
Data
● Pre-trained language embeddings (Östling and Tiedemann, 2017)
○ Trained via Language Modelling on New Testament data
● PoS annotation from Universal Dependencies for
○ Finnish
○ Estonian
○ North Sami
○ Hungarian
Task
● Fine-tune language embeddings on PoS tagging
● Investigate how typological properties are encoded in these embeddings for four Uralic languages
Distributed Language Representations
11
● Language Embeddings
● Analogous to Word Embeddings
● Can be learned in a neural network without supervision
Language Embeddings in Deep Neural Networks
12
Talk Overview
13
Part 1: Language Embeddings
- Do they aid multilingual parameter sharing?
- Do they encode typological properties?
- What types of similarities between languages do they encode?
Part 2: Typological Knowledge Bases
- Can they be populated automatically?
- Can they be used to discover typological implications?
Parameter sharing between dependency parsers for related languages
Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard
EMNLP 2018
15
Cross-lingual sharing with language embeddings
16
● Do language embeddings help to learn soft sharing strategies?
● Use case: transition-based dependency parsing
● Types of parameters:
○ Character embeddings
○ Word embeddings
○ Transition parameters (MLP)
● Ablation with language embedding concatenated with char, word or transition vector
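The ablation amounts to optionally concatenating the language embedding onto the input of a chosen parameter group, so that component can condition on which language it is processing. A minimal sketch of that input construction, with invented toy vectors (the actual parser feeds these representations through BiLSTMs and an MLP):

```python
# Hypothetical feature vectors for one token in one language;
# all values and dimensions are invented for illustration.
char_vec = [0.2, -0.1]          # character-level representation
word_vec = [0.5, 0.3, -0.4]     # word embedding
lang_emb = [0.7, 0.1]           # language embedding

def with_language(vec, lang, share):
    """Concatenate the language embedding onto `vec` iff this
    parameter group is selected for soft sharing."""
    return vec + lang if share else vec

# One ablation setting: share at the word level, not the character level.
char_input = with_language(char_vec, lang_emb, share=False)
word_input = with_language(word_vec, lang_emb, share=True)

assert char_input == [0.2, -0.1]
assert word_input == [0.5, 0.3, -0.4, 0.7, 0.1]
```

Varying the `share` flag per parameter group (char, word, transition) reproduces the grid of sharing strategies compared on the next slides.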
Cross-lingual sharing with language embeddings
17
Table 1: Dataset characteristics

Lang  Tokens   Family    Word order
ar    208,932  Semitic   VSO
he    161,685  Semitic   SVO
et    60,393   Finnic    SVO
fi    67,258   Finnic    SVO
hr    109,965  Slavic    SVO
ru    90,170   Slavic    SVO
it    113,825  Romance   SVO
es    154,844  Romance   SVO
nl    75,796   Germanic  No dom. order
no    76,622   Germanic  SVO
Cross-lingual sharing with language embeddings
19
(Figure: bar chart of average (AVG) scores, y-axis 78-80, for the five settings below)
● Mono: single-task baseline
● Lang-best: best sharing strategy for each language
● Best: best sharing strategy across languages (char not shared, word shared, transition shared with language embedding)
● All: all parameters shared
● Soft: sharing learned using language embeddings
Related vs. Unrelated Languages
22
(Figure: bar chart of average (AVG) scores, y-axis 78-80, for the five settings below)
● Mono: single-task baseline
● Lang-best: best sharing strategy for each language
● Best: best sharing strategy across languages (char not shared, word shared, transition shared with language embedding)
● All: all parameters shared
● Soft: sharing learned using language embeddings
Tracking Typological Traits of Uralic Languages in Distributed Language Representations
Johannes Bjerva, Isabelle Augenstein
IWCLUL 2018
24
Language Embeddings in Deep Neural Networks
25
1. Do language embeddings aid multilingual modelling?
2. Do language embeddings contain typological information?
Model performance (Monolingual PoS tagging)
26
● Compared to most frequent class baseline (black line)
● Model transfer between Finnic languages relatively successful
● Little effect from language embeddings (to be expected)
Model performance (Multilingual PoS tagging)
27
● Compared to monolingual baseline (black line)
● Model transfer between Finnic languages outperforms monolingual baseline
● Language embeddings improve multilingual modelling
Tracking Typological Traits (full language sample)
28
● Baseline: most frequent typological class in sample
● Language embeddings saved at each training epoch
● Separate Logistic Regression classifier trained for each feature and epoch
○ Input: language embedding
○ Output: typological class
● Typological features encoded in language embeddings change during training
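The probing classifier can be sketched as a small logistic regression trained by gradient descent, mapping a language embedding to a typological class. Everything below is invented toy data standing in for one binary WALS-style feature; the real setup trains one such probe per feature on the embeddings saved at each epoch:

```python
import math

# Toy "language embeddings" (invented) with a binary typological label,
# e.g. 0 vs 1 standing in for two values of one WALS feature.
data = [
    ([0.9, 0.1], 0), ([0.8, 0.2], 0), ([0.7, 0.0], 0),
    ([0.1, 0.9], 1), ([0.2, 0.8], 1), ([0.0, 0.7], 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Train a logistic-regression probe with plain stochastic gradient descent.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(200):
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        grad = p - y
        w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
        b -= lr * grad

def predict(x):
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)

# The probe recovers the feature value from new embedding-like inputs.
assert predict([0.85, 0.15]) == 0
assert predict([0.05, 0.95]) == 1
```

Probe accuracy over epochs is then a proxy for how strongly (and how stably) the feature is encoded in the embeddings.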
Tracking Typological Traits (Uralic languages held out)
29
● Some typological features can be predicted with high accuracy for the unseen Uralic languages
Cross-lingual sharing with language embeddings: Summary
30
● Conclusions
○ Sharing high-level features more useful than low-level features
○ If languages are unrelated, sharing low-level features hurts performance
○ Language embeddings help
● Only tested for (selected) language pairs
○ Sharing for more languages
● Only tested for selected tasks (parsing, PoS tagging)
○ Language embeddings pre-trained or trained end-to-end
○ Soft or hard sharing based on typological KBs?
From Phonology to Syntax: Unsupervised Linguistic Typology at Different Levels with Language Embeddings
Johannes Bjerva, Isabelle Augenstein
NAACL HLT 2018
31
Language Embeddings in Deep Neural Networks
32
Do language embeddings contain typological information?
- Predict typological features
- Study unsupervised vs. fine-tuned embeddings
Research Questions
33
● RQ 1: Which typological properties are encoded in task-specific distributed language representations, and can we predict phonological, morphological and syntactic properties of languages using such representations?
● RQ 2: To what extent do the encoded properties change as the representations are fine-tuned for tasks at different linguistic levels?
● RQ 3: How are language similarities encoded in fine-tuned language embeddings?
Phonological Features
34
● 20 features
● E.g. descriptions of the consonant and vowel inventories, presence of tone and stress markers
Morphological Features
35
● 41 features
● Features from the morphological and nominal chapters
● E.g. number of genders, usage of definite and indefinite articles, and reduplication
Word Order Features
36
● 56 features
● Encode ordering of subjects, objects and verbs
Experimental Setup
37
Data
● Pre-trained language embeddings (Östling and Tiedemann, 2017)
● Task-specific datasets: grapheme-to-phoneme (G2P), phonological reconstruction (ASJP), morphological inflection (SIGMORPHON), part-of-speech tagging (UD)

Dataset     Class       |Ltask|  |Ltask ∩ Lpre|
G2P         Phonology   311      102
ASJP        Phonology   4664     824
SIGMORPHON  Morphology  52       29
UD          Syntax      50       27
Experimental Setup
38
Method
● Fine-tune language embeddings on grapheme-to-phoneme (G2P), phonological reconstruction (ASJP), morphological inflection (SIGMORPHON), part-of-speech tagging (UD)
○ Train supervised seq2seq models on the G2P, ASJP and SIGMORPHON tasks
○ Train a sequence labelling model on the UD task
● Predict typological properties with kNN model
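The kNN prediction step can be sketched directly: a query language's typological class is the majority label among its k nearest neighbours in embedding space. The embeddings and labels below are invented for illustration:

```python
from collections import Counter

# Invented language embeddings paired with a WALS-style label;
# both the vectors and the labels are toy values for the sketch.
train = {
    "l1": ([0.9, 0.1], "SVO"),
    "l2": ([0.8, 0.2], "SVO"),
    "l3": ([0.4, 0.6], "SOV"),
    "l4": ([0.3, 0.7], "SOV"),
}

def knn_predict(query, train, k=3):
    """Predict a typological class as the majority label among the k
    nearest training languages (squared Euclidean distance)."""
    dists = sorted(
        (sum((q - t) ** 2 for q, t in zip(query, vec)), label)
        for vec, label in train.values()
    )
    top = [label for _, label in dists[:k]]
    return Counter(top).most_common(1)[0][0]

# A query embedding near the first pair of languages gets their label.
assert knn_predict([0.85, 0.15], train, k=3) == "SVO"
```

Held-out evaluation (e.g. leaving out whole language families, as on the next slides) then tests whether the embedding space generalises rather than memorises.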
Part-of-Speech Tagging (UD)
47
- Improvements for all experimental settings
-> Pre-trained and fine-tuned language embeddings encode features relevant to word order

System/Features      Random lang/feat pairs      Random lang/feat pairs
                     from word order features    from all features
Most frequent class  67.81%                      82.93%
k-NN (pre-trained)   76.66%                      82.69%
k-NN (fine-tuned)    *80.81%                     83.55%
Conclusions
50
- Language embeddings can encode typological features
  - Works for morphological inflection and PoS tagging
  - Does not work for phonological tasks
- We can predict typological features for unseen language families with high accuracies
- G2P task: phonological differences between otherwise similar languages (e.g. Norwegian Bokmål and Danish) are accurately encoded
What do Language Representations Really Represent?
Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein
Computational Linguistics 2019
51
Language Representations encode Language Similarities
52
● Similar languages – similar representations
● ...similar how?
● Can reconstruct language family trees (Rabinovich et al. 2017)!
● So... language family (genetic) similarity?
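The tree-reconstruction result rests on a simple idea: agglomerative clustering over pairwise distances between language vectors recovers family-like groupings. A minimal single-linkage sketch over invented two-dimensional vectors (Rabinovich et al. (2017) do this with learned representations of many Indo-European languages):

```python
import math

# Invented 2-d "language vectors" for illustration: two close pairs.
vectors = {
    "es": [0.9, 0.1], "pt": [0.88, 0.12],   # Romance-like pair
    "de": [0.1, 0.9], "nl": [0.12, 0.88],   # Germanic-like pair
}

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerate(vectors):
    """Greedy single-linkage clustering; returns the merge order."""
    clusters = {frozenset([n]): [v] for n, v in vectors.items()}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        # Pick the pair of clusters with the smallest member-to-member distance.
        a, b = min(
            ((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
            key=lambda ab: min(dist(u, v)
                               for u in clusters[ab[0]] for v in clusters[ab[1]]),
        )
        merges.append(set(a | b))
        clusters[a | b] = clusters.pop(a) + clusters.pop(b)
    return merges

merges = agglomerate(vectors)
# The related pairs are grouped before the top-level merge.
assert {"es", "pt"} in merges and {"de", "nl"} in merges
assert merges[-1] == {"es", "pt", "de", "nl"}
```

The open question on this slide is what such recovered groupings actually reflect: genetic, geographical, or structural similarity.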
What Do Language Representations Really Represent?
53
● Structural distance?
● Family distance?
● Geographical distance?
(Figure 1: language embeddings, e.g. en, fr, es, pt, de, nl, in a two-dimensional space. What do their similarities represent?)
  • 41. Language Representations from Monolingual Texts 54 • Input: official translations from EU languages to English (EuroParl) • Train multilingual LM on various levels of abstraction • Evaluate resulting language representations Bjerva et al. What do Language Representations Really Represent? Czech source Swedish source Official translation … … Multilingual language model Multilingual language model Multilingual language model CS For example , in my country , the Czech Republic English translation CS ADP NOUN PUNCT ADP ADJ NOUN PUNCT DET PROPN PROPN POS CS prep pobj punct prep poss pobj punct det compound nsubj DepRel SE In Stockholm , we must make comparisons and learn English translation SE ADP PROPN PUNCT PRON VERB VERB NOUN CCONJ VERB POS SE prep pobj punct nsubj aux ROOT dobj cc conj DepRel 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 (2017) who investigate representation learning on monolingual English sentences, which are translations from various source languages to English from the Europarl corpus (Koehn, 2005). They employ a feature-engineering approach to predict source languages and learn an Indo-European (IE) family tree using their language representations. Crucially, they posit that the relationships found between their representations encode the genetic relationships between languages. They use features based on sequences of POS tags, function words and cohesive markers. We significantly expand on this work by comparing three language similarity measures (§4). By doing this, we offer a stronger explanation of what language representations really represent. 3 Method Figure 2 illustrates the data and problem we consider in this paper. 
We are given a set of English gold-standard translations from the official languages of the European Union, based on speeches from the European Parliament. We wish to learn language representations based on this data, and investigate the linguistic relationships which hold between the resulting representations (RQ2). For this to make sense, it is important to abstract away from the surface forms of the translations as, e.g., speakers from certain regions will tend to talk about the same issues. We therefore introduce several levels of abstraction, in which the input sequences are, e.g., sequences of POS tags.

Our model is similar to Östling and Tiedemann (2017), who train a character-based multilingual language model using a 2-layer LSTM, with the modification that each time step includes a representation of the language at hand. That is to say, each input to their LSTM is represented both by a character representation, c, and a language representation, l ∈ L. Since the set of language representations L is updated during training, the resulting representations encode linguistic properties of the languages. Whereas Östling and Tiedemann (2017) model hundreds of languages, we model only English; however, we redefine L to be the set of source languages from which our translations originate.

This work is most closely related to Rabinovich et al. (2017), who investigate representation learning on English sentences which are translations from source languages to English, drawn from the Europarl corpus (Koehn, 2005). They employ a feature-engineering approach to predict the source language and learn an Indo-European (IE) family tree from their language representations. Crucially, they claim that the relationships found between their representations encode the genetic relationships between languages. They use features based on sequences of POS tags, function words and cohesive markers. We significantly expand on this work by comparing three language similarity measures (§4); thus, we offer a stronger explanation of what language representations really represent.

4 Comparing Languages. We compare the resulting language embeddings to three different types of language distance measures: genetic distance estimated by methods from historical linguistics, geographical distance of speaker communities, and a novel measure for the structural distances between languages. As previously stated, our goal with this is to investigate whether it really is the genetic distances between languages which are captured by language representations, or if other distance measures provide more explanation (RQ2).

Figure 2: Problem illustration. Given official translations from EU languages to English, we train multilingual language models on various levels of abstraction, encoding the source languages. The resulting source language representations (Lraw, LPOS, LDepRel) are evaluated.

Having an incorrect view of the structure of the language representation space can be dangerous. For instance, the standard assumption of genetic similarity would imply that the representation of the Gagauz language (Turkic, spoken mainly in Moldova) should be interpolated from the genetically very close Turkish, but this would likely lead to poor performance in syntactic tasks, since the two languages have diverged considerably in their syntax.
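The language-embedding trick described above (each time step sees a character representation c alongside a language representation l ∈ L) can be sketched in a few lines. Below is a minimal numpy sketch of the input construction only; the LSTM itself is omitted, and the vocabularies, dimensions, and the `lstm_input` helper are illustrative rather than the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative vocabularies: characters (or POS tags) and source languages.
chars = ["a", "b", "c", "<s>"]
languages = ["de", "fr", "fi"]
CHAR_DIM, LANG_DIM = 8, 4

# Embedding tables. The language table is updated during training, so it
# comes to encode structural properties of each source language.
char_emb = rng.normal(size=(len(chars), CHAR_DIM))
lang_emb = rng.normal(size=(len(languages), LANG_DIM))

def lstm_input(char, language):
    """One LSTM time step sees the character representation c
    concatenated with the language representation l."""
    c = char_emb[chars.index(char)]
    l = lang_emb[languages.index(language)]
    return np.concatenate([c, l])

x = lstm_input("a", "fi")
assert x.shape == (CHAR_DIM + LANG_DIM,)
```

Because the language vector is part of every input, gradients from the language-modelling loss flow into `lang_emb`, which is what makes the learned rows interpretable as language representations.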
  • 42. Tree Distance Evaluation 55
• Hierarchical clustering of language embeddings
• Compare resulting trees with gold phylogenetic trees
• Hierarchical clustering of cosine distances (Rabinovich et al. 2017)

Table 1: Tree distance evaluation (lower is better, cf. §5.1); tree distance uses the metric from Rabinovich, Ordan, and Wintner (2017). Our generated trees yield comparable results to previous work.

Condition                          Mean   St.d.
Raw text (LM-Raw)                  0.527  -
Function words and POS (LM-Func)   0.556  -
Only POS (LM-POS)                  0.517  -
Phrase-structure (LM-Phrase)       0.361  -
Dependency Relations (LM-Deprel)   0.321  -
POS trigrams (ROW17)               0.353  0.06
Random (ROW17)                     0.724  0.07
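The first step of the tree distance evaluation, hierarchical clustering of the language embeddings under cosine distance, can be sketched as below. This is a generic single-linkage implementation in numpy with toy embeddings, not the specific clustering or distance metric used in the paper.

```python
import numpy as np

def cosine_dist(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def single_linkage(embeddings, names):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest pairwise cosine distance into a nested-tuple tree."""
    clusters = {n: [i] for i, n in enumerate(names)}  # label -> member indices
    trees = {n: n for n in names}                     # label -> (sub)tree
    while len(clusters) > 1:
        best = None
        for a in clusters:
            for b in clusters:
                if a >= b:                            # each unordered pair once
                    continue
                d = min(cosine_dist(embeddings[i], embeddings[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = (trees[a], trees[b])
        label = str(merged)
        clusters[label] = clusters.pop(a) + clusters.pop(b)
        trees[label] = merged
        del trees[a], trees[b]
    return next(iter(trees.values()))

# Toy "language embeddings": da and sv nearly parallel, fi far away.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
tree = single_linkage(embeddings, ["da", "sv", "fi"])
assert tree == (("da", "sv"), "fi")
```

The resulting nested tuples can then be compared against a gold phylogenetic tree with a tree distance metric.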
  • 43. 56 Distance Measures • Family distance (following Rabinovich et al. 2017) • Geographic distance • Using Glottolog geocoordinates (Hammarström et al. 2017) • Structural distance
  • 44. Analysis of Similarities 57
• Language embedding similarities most strongly correlate with structural similarities
• Less strong correlation with genetic similarities, even though phylogenetic trees can be faithfully reconstructed (Rabinovich et al. 2017)

Figure 4: Correlations between similarities (Genetic, Geo and Struct.) and language representations (Raw, Func, POS, Phrase, Deprel). Significance at p < 0.001 is indicated by *.
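One way to quantify "most strongly correlate" is a rank correlation between the upper triangles of two pairwise language-distance matrices. The sketch below implements Spearman's rho from scratch in numpy on invented matrices; it is a simplification of the paper's analysis, and note that proper significance testing over distance matrices usually requires a permutation scheme such as a Mantel test rather than a plain correlation p-value.

```python
import numpy as np

def ranks(x):
    """Rank positions (ties broken by sort order; enough for a sketch)."""
    order = np.argsort(x)
    r = np.empty_like(order)
    r[order] = np.arange(len(x))
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = ranks(x).astype(float), ranks(y).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def upper_triangle(m):
    """Flatten the strict upper triangle of a symmetric distance matrix."""
    i, j = np.triu_indices(len(m), k=1)
    return m[i, j]

# Toy pairwise distance matrices over the same 4 languages.
emb_dist = np.array([[0, 1, 4, 3],
                     [1, 0, 5, 2],
                     [4, 5, 0, 6],
                     [3, 2, 6, 0]], dtype=float)
struct_dist = emb_dist * 2.0          # perfectly monotone with emb_dist
rho = spearman(upper_triangle(emb_dist), upper_triangle(struct_dist))
assert abs(rho - 1.0) < 1e-9
```

The same function applied to embedding distances vs. genetic, geographic, and structural distance matrices gives one correlation per distance type, which is what the figure above summarises.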
  • 46. A Probabilistic Generative Model of Linguistic Typology Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein NAACL 2019 59
  • 48. Languages are different 62 eat an apple ringo-wo taberu
  • 49. Differences between languages are often correlated 63 eat an apple ringo-wo taberu go to NAACL
  • 50. Differences are often correlated 64 eat an apple ringo-wo taberu go to NAACL
  • 51. Differences are often correlated 65 eat an apple go to NAACL NAACL -ni iku NAACL ringo-wo taberu
  • 52. Linguistic Typology and Greenberg’s Universals 66 VO languages have prepositions OV languages have postpositions
  • 53. Correlations between Word Order and Adpositions 67
  • 54. Contributions • Greenberg’s universals are binary – However, correlations are rarely 100%, so we implement a probabilisation of typology • Framed as typological collaborative filtering, exploiting correlations between languages and features • We exploit raw linguistic data by adding a semi-supervised extension 68
  • 55. World Atlas of Language Structures (WALS) 70 • 2,500 languages • 192 features Feature 81A – Order of Subject, Object and Verb
  • 56. WALS is Sparse and Skewed 71 • Sparse: Most languages are covered by only a handful of features • Skewed: A few features have much wider coverage than others
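The sparsity and skew are easy to see from raw (language, feature, value) triples. Below is a toy sketch: the feature IDs follow WALS conventions, but the records themselves are invented for illustration.

```python
from collections import Counter

# Toy WALS-style records: (language, feature ID, value). Invented data.
records = [
    ("eng", "81A", "SVO"), ("eng", "85A", "Prepositions"),
    ("jpn", "81A", "SOV"), ("jpn", "85A", "Postpositions"),
    ("tur", "81A", "SOV"),
    ("deu", "85A", "Prepositions"),
]

feats_per_lang = Counter(lang for lang, _, _ in records)
langs_per_feat = Counter(feat for _, feat, _ in records)

# Sparse: most languages carry only a handful of feature values.
assert feats_per_lang["tur"] == 1
# Skewed: some features cover far more languages than others.
assert langs_per_feat["85A"] == 3
```

On the real WALS data, the same two counters show most languages with only a few of the 192 features filled in, and coverage concentrated on a small set of popular features.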
  • 57. World Atlas of Language Structures (WALS) 72 • 2,500 languages → 800 languages • 192 features → 160 features Feature 81A – Order of Subject, Object and Verb
  • 58. Probabilistic Typology as Matrix Factorisation 73 [figure: an empty Users × Movies matrix]
  • 59. Probabilistic Typology as Matrix Factorisation 74 [figure: the Users × Movies matrix with observed ratings filled in]
  • 60. Probabilistic Typology as Matrix Factorisation 75 [figure: the ratings matrix with many missing cells]
  • 61. Probabilistic Typology as Matrix Factorisation 76 [figure: the ratings matrix factorised into latent user vectors and movie vectors]
  • 62. Matrix Completion: Collaborative Filtering 77 [1,-4,3] · [-5,2,1] = -10 (dot product)
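The slide's example can be checked directly: in collaborative filtering, a missing cell is predicted as the dot product of its row's and column's latent vectors.

```python
import numpy as np

row_vec = np.array([1, -4, 3])   # latent vector for a row (a user, later a language)
col_vec = np.array([-5, 2, 1])   # latent vector for a column (a movie, later a feature)

# The missing rating is filled in with the dot product:
# 1*(-5) + (-4)*2 + 3*1 = -10
score = int(row_vec @ col_vec)
assert score == -10
```

The latent vectors themselves are learned so that dot products reproduce the observed cells, which is what makes the predicted values for unobserved cells meaningful.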
  • 63. Probabilistic Typology as Matrix Factorisation 78 [figure: latent user and movie vectors alongside the observed ratings]
  • 64. Probabilistic Typology as Matrix Factorisation 79 [figure: the missing ratings filled in from dot products of the latent vectors]
  • 65. Probabilistic Typology as Matrix Factorisation 80 [figure: the same setup with languages as rows and typological features as columns]
  • 66. Probabilistic Typology as Matrix Factorisation 84 [figure: the typological feature matrix to be completed]
  • 67. Probabilistic Typology as Matrix Factorisation 86
  • 68. Probabilistic Typology as Matrix Factorisation 87
  • 70. Probabilistic Typology as Matrix Factorisation 89 [figure: the dot product (4) is passed through a sigmoid to give a feature probability (0.9)]
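Putting the pieces together: a logistic matrix factorisation over a binary language-by-feature matrix, where the probability of each cell is the sigmoid of a dot product of latent vectors. This is a minimal numpy sketch with invented data, dimensions, and hyperparameters; the paper's probabilistic generative model is richer than this.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary language x feature matrix: 1/0 = feature value, nan = unknown.
X = np.array([[1, 1, np.nan],
              [0, 0, 1],
              [1, np.nan, 0],
              [0, 1, 1]], dtype=float)
observed = ~np.isnan(X)

K = 2                                          # latent dimensionality (illustrative)
Lang = 0.1 * rng.normal(size=(X.shape[0], K))  # language vectors
Feat = 0.1 * rng.normal(size=(X.shape[1], K))  # feature vectors

lr = 0.2
for _ in range(3000):                 # full-batch gradient descent on the
    P = sigmoid(Lang @ Feat.T)        # logistic loss over observed cells only
    G = np.where(observed, P - np.nan_to_num(X), 0.0)
    Lang, Feat = Lang - lr * (G @ Feat), Feat - lr * (G.T @ Lang)

P = sigmoid(Lang @ Feat.T)
# Observed cells are recovered; unknown cells receive probabilities in (0, 1).
assert np.all((P[observed] > 0.5) == (X[observed] == 1))
```

The `nan` cells in `P` are the point of the exercise: they are the model's probabilistic predictions for typological feature values WALS does not record.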
  • 72. Representing Languages – Semi-supervised Extension 95 [figure: character-level language model (inputs <s> k a; predictions k a t)]
  • 73. Representing Languages – Semi-supervised Extension 96 [figure: character-level language model (inputs <s> k a; predictions k a t)]
  • 74. Semi-supervised Extension - Interpretability 97 Typological Feature Prediction Multilingual Language Modelling Compressing linguistic information
  • 75. Evaluation • Controlling for Genetic Relationships • Train on all out-of-family data • [0, 1, 5, 10, 20]% in-family data • Observed features in matrix • With/without pre-trained language embeddings 98
  • 76. Feature Prediction across Language Families 99
  • 77. Effect of Pre-training (Semi-supervised Extension) 100
  • 78. Uncovering Probabilistic Implications in Typological Knowledge Bases Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein ACL 2019 108
  • 79. Linguistic Typology and Greenberg’s Universals 109 VO languages have prepositions OV languages have postpositions
  • 80. From Correlations to Probabilistic Implications 110
Figure 1: Visualisation of a section of the induced graphical model. Observing the features in the left-most nodes (SV, OV, and Noun-Adjective), can we correctly infer the value of the right-most node (SVO)?
  • 81. Feature Prediction Accuracies 111

Table 1: Accuracies for feature prediction in a typologically diverse test set, across number of implicants used. Note that the numbers are not comparable across columns nor to the baseline, since each makes a different number of predictions.

N implicants          2     3     4     5     6
Phonology             0.75  0.82  0.84  0.86  0.89
Morphology            0.77  0.85  0.87  0.70  0.82
Nominal Categories    0.72  0.83  0.80  0.84  0.81
Nominal Syntax        0.77  0.89  0.85  0.89  0.81
Verbal Categories     0.80  0.84  0.80  0.86  0.90
Word Order            0.74  0.86  0.86  0.86  0.93
Clause                0.75  0.81  0.84  0.85  0.84
Complex               0.82  0.83  0.87  0.93  0.84
Lexical               0.83  0.76  0.75  0.85  0.79
Mean                  0.77  0.83  0.83  0.85  0.85

Baselines: Most freq. 0.30 · Pairwise 0.77 · PRA 0.81 · Language embeddings 0.85
  • 82. Probabilistic Implications Found 112

Table 2: Hand-picked implications. In cases where the same implication is covered by Daumé III and Campbell (2007), we borrow their analysis (marked with *).

#    Implicant                                       Implicand
1*   Postpositions                                   Genitive-Noun (Greenberg #2a)
2*   Postpositions                                   OV (Greenberg #4)
3    OV                                              SV
4*   Postpositions                                   SV
5*   Prepositions                                    VO (Greenberg #4)
6*   Prepositions                                    Initial subord. word (Lehmann)
7*   Adjective-Noun, Postpositions                   Demonstrative-Noun
8*   Genitive-Noun, Adjective-Noun                   OV
9    SV, OV, Noun-Adjective                          SOV
10   Degree word-Adjective, Noun–Relative Clause,    VO and SVO
     Numeral-Noun
11   Relative Clause–Noun, Adjective-Degree word,    OV and SOV
     Noun-Numeral
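At its simplest, a probabilistic implication is a conditional relative frequency estimated over a feature table: how often the implicand value holds among languages that show the implicant value(s). The sketch below uses invented data and a plain counting estimate; the paper induces implications from a learned graphical model rather than raw counts.

```python
# Toy WALS-like rows: one dict of feature values per language. Invented data.
languages = [
    {"Order": "OV", "Adposition": "Postpositions"},
    {"Order": "OV", "Adposition": "Postpositions"},
    {"Order": "OV", "Adposition": "Prepositions"},
    {"Order": "VO", "Adposition": "Prepositions"},
    {"Order": "VO", "Adposition": "Prepositions"},
]

def implication_strength(langs, implicant, implicand):
    """P(implicand | implicant): the fraction of languages matching the
    implicant value(s) in which the implicand value(s) also hold."""
    have = [l for l in langs
            if all(l.get(f) == v for f, v in implicant.items())]
    hold = [l for l in have
            if all(l.get(f) == v for f, v in implicand.items())]
    return len(hold) / len(have) if have else 0.0

# Greenberg #4 as a probabilistic implication rather than a binary rule:
p = implication_strength(languages, {"Order": "OV"},
                         {"Adposition": "Postpositions"})
assert abs(p - 2 / 3) < 1e-9   # holds in 2 of the 3 OV languages
```

Multi-implicant rules like rows 9 to 11 correspond to passing several feature-value pairs in the `implicant` dict at once.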
  • 84. Conclusions: This Talk 114 Part 1: Language Representations - Improve performance for multilingual sharing (de Lhoneux et al. 2018) - Encode typological properties - task-specific fine-tuned ones even more so than ones only obtained using language modelling (Bjerva & Augenstein 2018a,b) - Can be used to reconstruct phylogenetic trees (Rabinovich et al. 2017, Östling & Tiedemann 2017) - … but actually mostly represent structural similarities between languages (Bjerva et al., 2019a)
  • 85. Conclusions: This Talk 115 Part 2: Typological Knowledge Bases - Can be populated automatically with high accuracy using KBP methods (Bjerva et al. 2019b) - Language embeddings further improve performance - Can be used to discover probabilistic implications - With one or multiple implicants - Including Greenberg universals
  • 86. Future work 116 • Improve multilingual modelling • E.g., share morphologically relevant parameters for morphologically similar languages • Use typological knowledge bases for multilingual modelling
  • 87. Presented Papers Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard. Parameter sharing between dependency parsers for related languages. EMNLP 2018. Johannes Bjerva, Isabelle Augenstein. Tracking Typological Traits of Uralic Languages in Distributed Language Representations. Fourth International Workshop on Computational Linguistics for Uralic Languages (IWCLUL 2018). Johannes Bjerva, Isabelle Augenstein. Unsupervised Linguistic Typology at Different Levels with Language Embeddings. NAACL HLT 2018. Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein. What do Language Representations Really Represent? Computational Linguistics, Vol. 45, No. 2, June 2019. Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein. A Probabilistic Generative Model of Linguistic Typology. NAACL 2019. Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein. Uncovering Probabilistic Implications in Typological Knowledge Bases. ACL 2019. 117
  • 88. Thanks to my collaborators and advisees! Johannes Bjerva, Ryan Cotterell, Yova Kementchedjhieva, Miryam de Lhoneux, Robert Östling, Maria Han Veiga, Jörg Tiedemann 118