One of the core challenges in typology is to record the properties of languages in a structured way. Manual efforts have produced typological knowledge bases, which contain information about languages' phonological, morphological and syntactic properties, as well as about language families. Ideally, such typological knowledge bases would provide useful information to multilingual NLP models, helping them learn how to selectively share parameters.
A related area of research suggests a different way of encoding properties of languages, namely to learn language representation vectors directly from text documents.
In this talk, I will analyse and contrast these two ways of encoding linguistic properties, as well as present research on how the two can benefit one another.
What can typological knowledge bases and language representations tell us about linguistic properties?
1. Typ-NLP Workshop
1 August 2019
What can typological
knowledge bases and
language representations
tell us about linguistic
properties?
Isabelle Augenstein*
augenstein@di.ku.dk
@IAugenstein
http://isabelleaugenstein.github.io/
*Credit for many of the slides: Johannes Bjerva
2. Linguistic Typology
2
● ‘The systematic study and comparison of language
structures’ (Velupillai, 2012)
● Long history (Herder, 1772; von der Gabelentz, 1891; …)
● Computational approaches (Dunn et al., 2011; Wälchli,
2014; Östling, 2015, ...)
3. Why Computational Typology?
3
● Answer linguistic research questions on a large scale
● About relationships between languages
● About relationships between structural features of languages
● Facilitate multilingual learning
○ Cross-lingual transfer
○ Few-shot or zero-shot learning
4. How to Obtain Typological Knowledge?
4
● Discrete representation of language features in typological knowledge bases
○ World Atlas of Language Structures (WALS)
● Continuous representation of language features via language embeddings
○ Learned via language modelling
5. Why Computational Typology?
5
● Answer linguistic research questions on a large scale
● Multilingual learning
○ Language representations
○ Cross-lingual transfer
○ Few-shot or zero-shot learning
● This talk:
○ Features in the World Atlas of Language Structures (WALS)
○ Computational Typology via unsupervised modelling of languages
in neural networks
8. Can language representations be learned from data?
Resources that exist for many languages:
● Universal Dependencies (>60 languages)
● UniMorph (>50 languages)
● New Testament translations (>1,000 languages)
● Automated Similarity Judgment Program (>4,500
languages)
8
9. Multilingual NLP and Language Representations
● No explicit representation
○ Multilingual Word Embeddings
● Google’s “Enabling zero-shot
learning” NMT trick
○ Language given explicitly in
input
● One-hot encodings
○ Languages represented as a
sparse vector
● Language Embeddings
○ Languages represented as a
distributed vector
9
(Östling and Tiedemann, 2017)
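To make the last option concrete, here is a minimal sketch of the Östling and Tiedemann (2017) idea: a character-level language model in which every time step sees a trainable language embedding alongside the character embedding, so the embedding table ends up serving as the set of language vectors. All dimensions and names are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MultilingualCharLM(nn.Module):
    """Character-level LM over many languages; each time step sees the character
    embedding concatenated with a trainable language embedding. The language
    embedding table is the by-product used as 'language vectors'."""

    def __init__(self, n_chars, n_languages, char_dim=64, lang_dim=64, hidden_dim=512):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.lang_emb = nn.Embedding(n_languages, lang_dim)
        self.lstm = nn.LSTM(char_dim + lang_dim, hidden_dim, num_layers=2,
                            batch_first=True)
        self.out = nn.Linear(hidden_dim, n_chars)

    def forward(self, char_ids, lang_id):
        # char_ids: (batch, seq_len); lang_id: (batch,)
        chars = self.char_emb(char_ids)
        langs = self.lang_emb(lang_id).unsqueeze(1).expand(-1, chars.size(1), -1)
        hidden, _ = self.lstm(torch.cat([chars, langs], dim=-1))
        return self.out(hidden)   # next-character logits at every position
```

After language-model training (e.g. on Bible translations), `lang_emb.weight` is the matrix of language representations referred to in the following slides.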
10. Experimental Setup
Data
● Pre-trained language embeddings (Östling and Tiedemann, 2017)
○ Trained via Language Modelling on New Testament data
● PoS annotation from Universal Dependencies for
○ Finnish
○ Estonian
○ North Sami
○ Hungarian
Task
● Fine-tune language embeddings on PoS tagging
● Investigate how typological properties are encoded in these for four
Uralic languages
10
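A hedged sketch of the fine-tuning step just described, assuming a tagger whose language-embedding table lives in `model.lang_emb` (initialised from the pre-trained vectors); the helper name and the batch format are illustrative. A copy of the table is kept after every epoch so that the later probing experiments can track how it changes.

```python
def finetune_and_snapshot(model, optimizer, loss_fn, batches, n_epochs=10):
    """Fine-tune a PoS tagger (PyTorch) whose language embeddings live in
    `model.lang_emb` (an nn.Embedding initialised from pre-trained language
    vectors), keeping a copy of the table after every epoch for later probing."""
    snapshots = []
    for epoch in range(n_epochs):
        for token_ids, lang_ids, tag_ids in batches:
            optimizer.zero_grad()
            logits = model(token_ids, lang_ids)              # (batch, seq, n_tags)
            loss = loss_fn(logits.flatten(0, 1), tag_ids.flatten())
            loss.backward()
            optimizer.step()
        snapshots.append(model.lang_emb.weight.detach().cpu().clone())
    return snapshots
```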
13. Talk Overview
Part 1: Language Embeddings
- Do they aid multilingual parameter sharing?
- Do they encode typological properties?
- What types of similarities between languages do they encode?
Part 2: Typological Knowledge Bases
- Can they be populated automatically?
- Can they be used to discover typological implications?
13
15. Parameter sharing between dependency
parsers for related languages
Miryam de Lhoneux, Johannes Bjerva,
Isabelle Augenstein, Anders Søgaard
EMNLP 2018
15
16. Cross-lingual sharing with language embeddings
● Do language embeddings help to learn soft sharing
strategies?
● Use case: transition-based dependency parsing
● Types of parameters:
○ Character embeddings
○ Word embeddings
○ Transition parameters (MLP)
● Ablation with language embedding concatenated with char, word or transition vector
16
17. Cross-lingual sharing with language embeddings
17
Lang   Tokens    Family     Word order
ar     208,932   Semitic    VSO
he     161,685   Semitic    SVO
et      60,393   Finnic     SVO
fi      67,258   Finnic     SVO
hr     109,965   Slavic     SVO
ru      90,170   Slavic     SVO
it     113,825   Romance    SVO
es     154,844   Romance    SVO
nl      75,796   Germanic   No dom. order
no      76,622   Germanic   SVO
Table 1: Dataset characteristics
18. Cross-lingual sharing with language embeddings
19
(Bar chart: average scores per sharing strategy, y-axis spanning 78 to 80.)
Mono Lang-Best Best All Soft
• Mono: single-task baseline
• Lang-best: best sharing strategy for each language
• Best: best sharing strategy across languages (char not shared,
word shared, transition shared with language embedding)
• All: all parameters shared
• Soft: sharing learned using language embeddings
19. Related vs. Unrelated Languages
22
(Bar chart: average scores per sharing strategy, y-axis spanning 78 to 80.)
Mono Lang-Best Best All Soft
• Mono: single-task baseline
• Lang-best: best sharing strategy for each language
• Best: best sharing strategy across languages (char not shared,
word shared, transition shared with language embedding)
• All: all parameters shared
• Soft: sharing learned using language embeddings
20. Tracking Typological Traits of Uralic
Languages in Distributed Language
Representations
Johannes Bjerva, Isabelle Augenstein
IWCLUL 2018
24
21. Language Embeddings in Deep Neural Networks
25
1. Do language
embeddings aid
multilingual modelling?
2. Do language
embeddings contain
typological
information?
22. Model performance (Monolingual PoS tagging)
26
• Compared to most
frequent class
baseline (black line)
• Model transfer
between Finnic
languages relatively
successful
• Little effect from
language
embeddings (to be
expected)
23. Model performance (Multilingual PoS tagging)
27
• Compared to
monolingual baseline
(black line)
• Model transfer
between Finnic
languages
outperforms
monolingual baseline
• Language
embeddings improve
multilingual modelling
24. Tracking Typological Traits (full language sample)
28
• Baseline: Most frequent
typological class in sample
• Language embeddings saved
at each training epoch
• Separate Logistic Regression
classifier trained for each
feature and epoch
• Input: Language
embedding
• Output: Typological class
• Typological features encoded
in language embeddings
change during training
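A minimal sketch of this probing setup, assuming the per-epoch embedding snapshots and the WALS labels are already available as arrays (`embeddings_per_epoch` and `wals_labels` are hypothetical names); the cross-validation folds assume enough labelled languages per class.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_typology(embeddings_per_epoch, wals_labels):
    """For every training epoch and every WALS feature, fit a logistic regression
    that predicts the feature value from the language embedding.

    embeddings_per_epoch: dict  epoch -> (n_languages, dim) array
    wals_labels:          dict  feature_name -> (n_languages,) array of class ids
    Returns dict (epoch, feature) -> mean cross-validated accuracy."""
    scores = {}
    for epoch, emb in embeddings_per_epoch.items():
        for feature, y in wals_labels.items():
            clf = LogisticRegression(max_iter=1000)
            scores[(epoch, feature)] = cross_val_score(
                clf, emb, y, cv=5, scoring="accuracy").mean()
    return scores
```

Comparing these accuracies against a most-frequent-class baseline, per feature and per epoch, gives the curves shown on the slide.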
25. Tracking Typological Traits (Uralic languages held out)
29
• Some typological
features can be
predicted with high
accuracy for the
unseen Uralic
languages.
26. Cross-lingual sharing with language embeddings: Summary
● Conclusions
○ Sharing high-level features more useful than low-level features
○ If languages are unrelated, sharing low-level features hurts performance
○ Language embeddings help
● Only tested for (selected) language pairs
○ Sharing for more languages
● Only tested for selected tasks (parsing, PoS tagging)
● Language embeddings pre-trained or trained end-to-end
● Soft or hard sharing based on typological KBs?
30
27. From Phonology to Syntax: Unsupervised
Linguistic Typology at Different Levels with
Language Embeddings
Johannes Bjerva, Isabelle Augenstein
NAACL HLT 2018
31
28. Language Embeddings in Deep Neural Networks
32
Do language
embeddings contain
typological information?
- Predict typological
features
- Study unsupervised vs.
fine-tuned embeddings
29. Research Questions
● RQ 1: Which typological properties are encoded in task-
specific distributed language representations, and can we
predict phonological, morphological and syntactic
properties of languages using such representations?
● RQ 2: To what extent do the encoded properties change as
the representations are fine-tuned for tasks at different
linguistic levels?
● RQ 3: How are language similarities encoded in fine-tuned
language embeddings?
33
30. Phonological Features
34
● 20 features
● E.g. descriptions of the
consonant and vowel
inventories, presence of tone
and stress markers
31. Morphological Features
35
● 41 features
● Features from the morphological and nominal chapters
● E.g. number of genders, usage
of definite and indefinite articles
and reduplication
36. Part-of-Speech Tagging (UD)
47
- Improvements for all experimental settings
-> Pre-trained and fine-tuned language embeddings encode
features relevant to word order
System / Features       Random lang/feat pairs,     Random lang/feat pairs,
                        word order features         all features
Most frequent class     67.81%                      82.93%
k-NN (pre-trained)      76.66%                      82.69%
k-NN (fine-tuned)       *80.81%                     83.55%
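The k-NN numbers above can be obtained with a setup along these lines; `lang_embeddings` and `feature_values` are hypothetical containers for the (pre-trained or fine-tuned) language vectors and one WALS feature, and k is a free choice rather than the paper's exact value.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_feature_accuracy(lang_embeddings, feature_values, test_langs, k=5):
    """Predict one WALS feature for held-out languages with k-NN over
    language-embedding space.

    lang_embeddings: dict  iso_code -> embedding vector
    feature_values:  dict  iso_code -> feature value (e.g. 'SOV')
    test_langs:      languages whose feature value is hidden at training time."""
    train_langs = [l for l in feature_values if l not in set(test_langs)]
    X_train = np.stack([lang_embeddings[l] for l in train_langs])
    y_train = [feature_values[l] for l in train_langs]
    X_test = np.stack([lang_embeddings[l] for l in test_langs])
    y_test = [feature_values[l] for l in test_langs]

    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    return clf.score(X_test, y_test)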
37. Conclusions
50
- Language embeddings can encode typological features
- Works for morphological inflection and PoS tagging
- Does not work for phonological tasks
- We can predict typological features for unseen language families
with high accuracies
- G2P task: phonological differences between otherwise similar
languages (e.g. Norwegian Bokmål and Danish) are accurately
encoded
38. What do Language Representations Really
Represent?
Johannes Bjerva, Robert Östling, Maria Han
Veiga, Jörg Tiedemann, Isabelle Augenstein
Computational Linguistics 2019
51
39. Language Representations encode Language Similarities
52
• Similar languages – similar representations
• ...similar how?
• Can reconstruct language family trees (Rabinovich et al. 2017)!
• So... Language family (genetic) similarity?
40. What Do Language Representations Really Represent?
53
Structural distance? Family distance? Geographical distance?
(Figure: language representations for {en, fr, es, pt, de, nl} in a two-dimensional space. What do their similarities represent?)
41. Language Representations from Monolingual Texts
54
• Input: official translations from EU languages to English (EuroParl)
• Train multilingual LM on various levels of abstraction
• Evaluate resulting language representations
(Figure: Czech and Swedish source speeches and their official English translations are fed to multilingual language models at several levels of abstraction, e.g.:
CS | For example , in my country , the Czech Republic | English translation
CS | ADP NOUN PUNCT ADP ADJ NOUN PUNCT DET PROPN PROPN | POS
CS | prep pobj punct prep poss pobj punct det compound nsubj | DepRel
SE | In Stockholm , we must make comparisons and learn | English translation
SE | ADP PROPN PUNCT PRON VERB VERB NOUN CCONJ VERB | POS
SE | prep pobj punct nsubj aux ROOT dobj cc conj | DepRel
Figure 2: Problem illustration. Given official translations from EU languages to English, we train multilingual language models on various levels of abstraction, encoding the source languages. The resulting source language representations (Lraw etc.) are evaluated.)

Paper excerpt (Bjerva et al., 2019): This work is most closely related to Rabinovich et al. (2017), who investigate representation learning on monolingual English sentences that are translations from various source languages in the Europarl corpus (Koehn, 2005). They use a feature-engineering approach to predict source languages, learn an Indo-European family tree from their language representations, and posit that the relationships found between those representations encode the genetic relationships between languages. We expand on this work by comparing three language similarity measures. Our model is similar to Östling and Tiedemann (2017), who train a character-based multilingual language model using a 2-layer LSTM in which each time step receives both a character representation c and a language representation l ∈ L; since L is updated during training, the resulting representations encode linguistic properties of the languages. Whereas Östling and Tiedemann (2017) model hundreds of languages, we model only English and redefine L as the set of source languages from which the translations originate (Lraw, LPOS, LDepRel). We compare the resulting language embeddings to three distance measures: genetic distance estimated with methods from historical linguistics, geographical distance between speaker communities, and a novel measure of structural distance between languages, in order to investigate whether it really is genetic distance that language representations capture, or whether other distance measures provide a better explanation. Having an incorrect view of the structure of the language representation space can be dangerous: the standard assumption of genetic similarity would imply that the representation of Gagauz (Turkic, spoken mainly in Moldova) should be interpolated from the genetically very close Turkish, but this would likely lead to poor performance in syntactic tasks, since the two languages differ considerably in structure.
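As a rough illustration of the "levels of abstraction", a sentence can be mapped to the symbols the language model is trained on as follows (CoNLL-U-style field names; the function-word flag and the omission of the phrase-structure level are simplifications of this sketch).

```python
def abstract_sentence(tokens, level):
    """Map one UD-annotated English translation to the symbols the LM is trained
    on at a given abstraction level (phrase structure omitted here).

    tokens: list of dicts with 'form', 'upos', 'deprel' and a boolean
            'is_function_word' flag (the flag is an assumption of this sketch)."""
    if level == "raw":        # plain text
        return [t["form"] for t in tokens]
    if level == "func":       # keep function words, replace content words by POS
        return [t["form"] if t["is_function_word"] else t["upos"] for t in tokens]
    if level == "pos":        # POS tags only
        return [t["upos"] for t in tokens]
    if level == "deprel":     # dependency relations only
        return [t["deprel"] for t in tokens]
    raise ValueError(f"unknown abstraction level: {level}")
```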
42. Tree Distance Evaluation
55
• Hierarchical clustering of language embeddings
• Compare resulting trees with gold phylogenetic trees
• Hierarchical clustering of cosine distances (Rabinovich et al. 2017)
• Gold trees from Serva and Petroni (2008), using the distance metric from Rabinovich, Ordan, and Wintner (2017)
• Our generated trees yield comparable results to previous work
Condition                            Mean    St.d.
Raw text (LM-Raw)                    0.527   -
Function words and POS (LM-Func)     0.556   -
Only POS (LM-POS)                    0.517   -
Phrase-structure (LM-Phrase)         0.361   -
Dependency relations (LM-Deprel)     0.321   -
POS trigrams (ROW17)                 0.353   0.06
Random (ROW17)                       0.724   0.07
Table: Tree distance evaluation (lower is better).
(Paper excerpt: with no explicit syntactic information, the raw-text condition (LM-Raw) only marginally outperforms the random baseline, likely because speakers from different countries discuss country-specific issues; training on function words and POS only (LM-Func) performs roughly on par, indicating that this level of abstraction still does not capture structural similarities between languages.)
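The clustering step can be sketched as follows, assuming a dict of language embeddings; the comparison against gold phylogenetic trees uses the tree distance metric of Rabinovich et al. (2017) and is not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import pdist

def embed_tree(lang_embeddings):
    """Agglomerative clustering of language embeddings by cosine distance,
    returning a binary tree over the languages (to be compared against a gold
    phylogenetic tree)."""
    langs = sorted(lang_embeddings)
    X = np.stack([lang_embeddings[l] for l in langs])
    dists = pdist(X, metric="cosine")       # condensed pairwise distance matrix
    Z = linkage(dists, method="average")    # UPGMA-style hierarchical clustering
    return to_tree(Z), langs
```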
43. Distance Measures
56
• Family distance (following Rabinovich et al. 2017)
• Geographic distance
● Using Glottolog geocoordinates (Hammarström et al. 2017); see the sketch below the list
• Structural distance
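For the geographical measure, a simple great-circle (haversine) distance between Glottolog coordinates is enough for a sketch; the coordinates in the example are rough illustrative values, not Glottolog's exact entries.

```python
from math import radians, sin, cos, asin, sqrt

def great_circle_km(coord_a, coord_b):
    """Haversine distance (km) between two (latitude, longitude) points,
    e.g. Glottolog coordinates for two speaker communities."""
    lat1, lon1, lat2, lon2 = map(radians, (*coord_a, *coord_b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))   # Earth radius ~6371 km

# Illustrative only: approximate coordinates for Dutch and Italian
print(great_circle_km((52.2, 5.3), (43.0, 12.7)))
```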
44. • Language embedding similarities most strongly correlate with
structural similarities
• Less strong correlation with genetic similarities, even though
phylogenetic trees can be faithfully reconstructed (Rabinovich et al.
2017)
57
Figure 4: Correlations between similarity measures (Genetic, Geographical, Structural) and language representations (Raw, Func, POS, Phrase, Deprel). Significance at p < 0.001 is indicated by *.
Analysis of Similarities
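The correlation analysis can be sketched as a rank correlation between the upper triangles of two pairwise distance matrices; Spearman's rho is used here for illustration and may differ from the paper's exact statistic.

```python
import numpy as np
from scipy.stats import spearmanr

def correlate_distances(embedding_dists, reference_dists):
    """Spearman correlation between two square pairwise-distance matrices
    (e.g. cosine distances between language embeddings vs. structural, genetic
    or geographical distances), computed over the upper triangle only."""
    iu = np.triu_indices_from(embedding_dists, k=1)
    rho, p = spearmanr(embedding_dists[iu], reference_dists[iu])
    return rho, p
```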
54. Contributions
• Greenberg's universals are binary, but the correlations are rarely 100%, so we implement a probabilisation of typology
• Framed as typological collaborative filtering,
exploiting correlations between languages and features
• We exploit raw linguistic data by adding a semi-supervised
extension
68
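A minimal sketch of the collaborative-filtering framing, under strong simplifying assumptions: WALS is binarised into a language × feature-value matrix and completed by low-rank factorisation with gradient steps on the observed cells. The paper's actual model is richer (probabilistic, with a semi-supervised extension); this only illustrates the matrix-completion idea.

```python
import numpy as np

def factorise_wals(M, observed, rank=16, lr=0.05, epochs=200, seed=0):
    """Fill a sparse language x feature-value matrix by low-rank factorisation.

    M:        (n_langs, n_featvals) array; 1.0 where a language is recorded as
              having that feature value, 0.0 otherwise (binarised WALS).
    observed: boolean mask of the same shape, True for cells actually in WALS.
    Returns a dense score matrix; unobserved cells are the predictions."""
    rng = np.random.default_rng(seed)
    n_l, n_f = M.shape
    L = rng.normal(scale=0.1, size=(n_l, rank))   # language factors
    F = rng.normal(scale=0.1, size=(n_f, rank))   # feature-value factors
    for _ in range(epochs):
        pred = L @ F.T
        err = np.where(observed, pred - M, 0.0)   # only observed cells drive updates
        L -= lr * (err @ F) / max(observed.sum(), 1)
        F -= lr * (err.T @ L) / max(observed.sum(), 1)
    return L @ F.T
```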
55. World Atlas of Language Structures (WALS)
70
Ø 2,500 languages
Ø 192 features
Feature 81A – Order of Subject, Object and Verb
56. WALS is Sparse and Skewed
71
Ø Sparse:
Most languages are
covered by only a
handful of features
Ø Skewed:
A few features have
much wider
coverage than
others
57. World Atlas of Language Structures (WALS)
72
Ø 2,500 languages → 800 languages
Ø 192 features → 160 features
Feature 81A – Order of Subject, Object and Verb
74. Semi-supervised Extension - Interpretability
97
Typological Feature
Prediction
Multilingual Language
Modelling
Compressing linguistic information
75. Evaluation
• Controlling for
Genetic Relationships
• Train on all out-of-family
data
• [0, 1, 5, 10, 20]% in-family
data
• Observed features in
matrix
• With/without pre-
trained language
embeddings
98
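The controlled evaluation can be sketched as a split that withholds a language family and then adds back a fixed percentage of in-family languages; the function name and the `families` mapping are illustrative.

```python
import random

def family_controlled_split(languages, families, target_family, in_family_pct, seed=0):
    """Return (train, held_out): everything outside `target_family` plus
    `in_family_pct` percent of the languages inside it (e.g. 0, 1, 5, 10, 20).

    languages: list of language codes; families: dict code -> family name."""
    rng = random.Random(seed)
    in_family = [l for l in languages if families[l] == target_family]
    out_family = [l for l in languages if families[l] != target_family]
    rng.shuffle(in_family)
    k = round(len(in_family) * in_family_pct / 100)
    return out_family + in_family[:k], in_family[k:]
```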
78. Uncovering Probabilistic Implications in
Typological Knowledge Bases
Johannes Bjerva, Yova Kementchedjhieva,
Ryan Cotterell, Isabelle Augenstein
ACL 2019
108
79. Linguistic Typology and Greenberg’s Universals
109
VO languages have prepositions
OV languages have postpositions
80. From Correlations to Probabilistic Implications
110
Visualisation of a section of the induced graphical model.
Observing the features in the left-most nodes (SV, OV, and
Noun-Adjective), can we correctly infer the value of the
right-most node (SVO)?
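The intuition behind reading an implication off the data can be sketched as a conditional frequency over a WALS-style table (the column names below are real WALS feature names, but the DataFrame layout is an assumption of this sketch); the paper instead induces a graphical model over all features.

```python
import pandas as pd

def implication_strength(wals: pd.DataFrame, implicants: dict, implicand: str):
    """Estimate P(implicand value | implicant values) from a WALS-style table
    with one row per language and one column per feature.

    implicants: e.g. {"82A Order of Subject and Verb": "SV",
                      "83A Order of Object and Verb": "OV"}
    implicand:  column whose conditional value distribution we want."""
    mask = pd.Series(True, index=wals.index)
    for feature, value in implicants.items():
        mask &= wals[feature] == value
    subset = wals.loc[mask, implicand].dropna()
    return subset.value_counts(normalize=True)   # conditional distribution
```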
81. Accuracies for feature prediction in a typologically diverse
test set, across number of implicants used
111
N implicants          2      3      4      5      6
Phonology             0.75   0.82   0.84   0.86   0.89
Morphology            0.77   0.85   0.87   0.70   0.82
Nominal Categories    0.72   0.83   0.80   0.84   0.81
Nominal Syntax        0.77   0.89   0.85   0.89   0.81
Verbal Categories     0.80   0.84   0.80   0.86   0.90
Word Order            0.74   0.86   0.86   0.86   0.93
Clause                0.75   0.81   0.84   0.85   0.84
Complex               0.82   0.83   0.87   0.93   0.84
Lexical               0.83   0.76   0.75   0.85   0.79
Mean                  0.77   0.83   0.83   0.85   0.85

Baselines: Most freq. 0.30 | Pairwise 0.77 | PRA 0.81 | Language embeddings 0.85

Table: Accuracies for feature prediction in a typologically diverse test set, across the number of implicants used. Note that the numbers are not comparable across columns nor to the baselines, since each makes a different number of predictions.
Feature Prediction Accuracies
82. Hand-picked implications. In cases where the same implication is covered by Daumé III and Campbell (2007), we borrow their analysis (marked with *)
112
#    Implicant                                                  Implicand
1*   Postpositions                                              Genitive-Noun (Greenberg #2a)
2*   Postpositions                                              OV (Greenberg #4)
3    OV                                                         SV
4*   Postpositions                                              SV
5*   Prepositions                                               VO (Greenberg #4)
6*   Prepositions                                               Initial subord. word (Lehmann)
7*   Adjective-Noun, Postpositions                               Demonstrative-Noun
8*   Genitive-Noun, Adjective-Noun                               OV
9    SV, OV, Noun-Adjective                                      SOV
10   Degree word-Adjective, VO and Noun-Relative Clause, SVO     Numeral-Noun
11   SOV, OV and Relative Clause-Noun, Adjective-Degree word     Noun-Numeral

Table: Hand-picked implications. In cases where the same implication is covered by Daumé III and Campbell (2007), we borrow their analysis (marked with *).
Probabilistic Implications Found
84. Conclusions: This Talk
114
Part 1: Language Representations
- Improve performance for multilingual
sharing (de Lhoneux et al. 2018)
- Encode typological properties
- task-specific fine-tuned ones even
more so than ones only obtained using
language modelling (Bjerva &
Augenstein 2018a,b)
- Can be used to reconstruct phylogenetic
trees (Rabinovich et al. 2017, Östling &
Tiedemann 2017)
- … but actually mostly represent
structural similarities between
languages (Bjerva et al., 2019a)
85. Conclusions: This Talk
115
Part 2: Typological
Knowledge Bases
- Can be populated automatically
with high accuracy using KBP
methods (Bjerva et al. 2019b)
- Language embeddings further
improve performance
- Can be used to discover
probabilistic implications
- With one or multiple implicants
- Including Greenberg universals
87. Presented Papers
Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard.
Parameter sharing between dependency parsers for related languages. EMNLP
2018.
Johannes Bjerva, Isabelle Augenstein. Tracking Typological Traits of Uralic
Languages in Distributed Language Representations. Fourth International
Workshop on Computational Linguistics for Uralic Languages (IWCLUL 2018).
Johannes Bjerva, Isabelle Augenstein. From Phonology to Syntax: Unsupervised
Linguistic Typology at Different Levels with Language Embeddings. NAACL HLT 2018.
Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle
Augenstein. What do Language Representations Really Represent?
Computational Linguistics, Vol. 45, No. 2, June 2019.
Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein.
A Probabilistic Generative Model of Linguistic Typology. NAACL 2019.
Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein.
Uncovering Probabilistic Implications in Typological Knowledge Bases. ACL 2019.
117
88. Thanks to my collaborators and advisees!
Johannes Bjerva, Ryan Cotterell, Yova Kementchedjhieva, Miryam de
Lhoneux, Robert Östling, Maria Han Veiga, Jörg Tiedemann
118