Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their
meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high
level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.
Towards a Universal Wordnet by Learning from Combined Evidence
1. Introduction
Existing Lexical Knowledge Bases
Building a Multilingual Wordnet
Results and Experiments
Summary and Future Work
Towards a Universal Wordnet
by Learning from Combined Evidence
Gerard de Melo and Gerhard Weikum
Max Planck Institute for Informatics
Saarbrücken, Germany
2009-11-03
Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 1/29
Introduction
Lexical Knowledge
What meanings does a word have?
How do those meanings relate to the meanings of other words?
“speaker”: person who gives a talk; device that produces sounds
“board”: flat piece of wood; committee; panel for writing with chalk; to enter a transportation vehicle
“student”, “pupil”: someone who studies
[Figure: relations between meanings; labels shown: faculty, professor, member, part]
[Figure: class hierarchy: entity, institution, educational institution, university, ...]
Many Applications
examples:
NLP, AI
question answering
query expansion
human consultation
Multilinguality
the world is multilingual
the Internet is also increasingly multilingual
[Figure: Top 10 languages by approximate number of speakers. Source: Ethnologue 2005]
[Figure: Internet users by region. Source: Internet World Stats]
Vision
universal index of word meanings
large-scale semantic network with class hierarchy
look up any word in any language, get a list of its meanings
Example meaning: person who gives a talk
eng: “speaker”
jpn: “話者”
rus: “докладчик”
ces: “řečník”
...
meanings should be connected via semantic relations
[Figure: class hierarchy with terms from several languages attached]
entity (por: “entidade”, heb: “ישות”)
institution (cmn: “制度”)
educational institution (deu: “Bildungseinrichtung”)
university (cym: “prifysgol”)
...
Outline
1 Existing Lexical Knowledge Bases
2 Building a Multilingual Wordnet
3 Results and Experiments
4 Summary and Future Work
Existing Lexical Knowledge Bases
WordNet
lexical database created at Princeton
enumerates meanings of English words
meaning-to-meaning links
Miller, Fellbaum et al. (1990): among the most-cited papers in computer science (source: CiteSeerX)
meaning-to-meaning links include: hypernym hierarchy, meronymy (part of), etc.
Non-English Wordnets
EuroWordNet, BalkaNet, Global WordNet Association
problem: many are small, incomplete
problem: different identifiers, formats, etc.
problem: only ∼10 languages with freely available wordnets
not a single, coherent resource
Other Resources
PANGLOSS Ontology: Knight & Luk (1994)
PANGLOSS: 2 languages, around 70,000 entities
TransGraph system: Etzioni et al. (2007)
DBpedia, YAGO, OpenCyc
TransGraph: large translation graph, but limited structure (e.g. no semantic hierarchy)
DBpedia, YAGO, OpenCyc: class hierarchy not multilingual
Building a Multilingual Wordnet
Strategy
use existing wordnets as backbone
add new terms, link to meaning nodes
[Figure: Existing Wordnets]
terms: spa: “trayectoria”, eng: “course”, eng: “class”
meanings: academic course, part of a meal, route of travel, series of events
[Figure: Desired Output]
terms: deu: “Reihe”, spa: “trayectoria”, ita: “piatto”, fra: “suite”, eng: “course”, deu: “Kurs”, eng: “class”
meanings: academic course, part of a meal, route of travel, series of events
Input Graph
use existing wordnets as backbone (mainly English, Spanish, Catalan)
add translations to graph
[Figure: Input Graph G0, initially containing only the terms of the existing wordnets]
translations added to the graph come from:
dictionaries (e.g. Wiktionary)
thesauri and ontologies
parallel corpora (word alignment)
also: predict new translations
[Figure: Input Graph G0 with the translated terms deu: “Reihe”, ita: “piatto”, fra: “suite”, deu: “Kurs” added next to spa: “trayectoria”, eng: “course”, eng: “class”]
Approach: Link new words to meanings of their translations
Huge Challenge: Disambiguation!
[Figure: ita: “piatto” is a translation of eng: “course”, which has the meanings academic course, part of a meal, route of travel, series of events]
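The input graph and the linking approach described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class `InputGraph`, its method names, and the choice to treat translation edges as undirected are all assumptions made for the example.

```python
from collections import defaultdict


class InputGraph:
    """Toy stand-in for the input graph G0 (hypothetical structure)."""

    def __init__(self):
        # term -> set of terms it is translated to (terms are (language, word) pairs)
        self.translations = defaultdict(set)
        # term -> {meaning: weight} for pre-existing term-meaning links
        self.meanings = defaultdict(dict)

    def add_translation(self, a, b):
        # Assumption: translation edges are treated as undirected here.
        self.translations[a].add(b)
        self.translations[b].add(a)

    def add_meaning_link(self, term, meaning, weight=1.0):
        self.meanings[term][meaning] = weight

    def candidate_meanings(self, term):
        """Collect the meanings of all translations of `term`."""
        candidates = set()
        for neighbour in self.translations.get(term, ()):
            candidates.update(self.meanings.get(neighbour, {}))
        return candidates


g0 = InputGraph()
for meaning in ("academic course", "part of a meal",
                "route of travel", "series of events"):
    g0.add_meaning_link(("eng", "course"), meaning)
g0.add_translation(("ita", "piatto"), ("eng", "course"))

# Every meaning of "course" is a candidate for "piatto"; picking the
# right one is the disambiguation problem.
print(sorted(g0.candidate_meanings(("ita", "piatto"))))
```

All four meanings of “course” come back as candidates for “piatto”, which is why a scoring step is needed before any link is accepted.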
Approach
variety of features that analyse the previous graph G_{i−1} and incorporate neighbourhood information into an edge’s feature vector
supervised learning: new edge weights determined using an RBF-kernel SVM with posterior probability estimation
Example Feature:
Given term t (e.g. fra: “suite”) and meaning m (e.g. academic course)
Question: Should they be linked?
Look at neighbours t′ ∈ Γ(t), i.e. terms connected to t (e.g. spa: “trayectoria”, eng: “course” for fra: “suite”), and at their meanings m′ ∈ Γ(t′) (e.g. part of a meal, academic course, route of travel, series of events)
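Feature score for the pair (t, m):
$$\sum_{t' \in \Gamma(t)} \phi_1(t, t') \, \frac{\mathrm{sim}^*(t', m)}{\mathrm{sim}^*(t', m) + \mathrm{dissim}(t', m)}$$
$$\mathrm{sim}^*(t', m) = \max_{m' \in \Gamma(t')} \phi_2(t', m') \, \mathrm{sim}(m', m)$$
$$\mathrm{dissim}(t', m) = \sum_{m' \in \Gamma(t')} \phi_2(t', m') \, (1 - \mathrm{sim}(m', m))$$
weighting (φ1, φ2) based on:
part-of-speech
corpus frequency
...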
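The example feature can be traced on a toy graph. The sketch below follows the three formulas term by term, but everything around them is assumed for illustration: the function name `neighbour_feature`, the dictionary layout, the uniform weights φ1 = φ2 = 1, the identity similarity (1 for identical meanings, 0 otherwise), and the toy link from “trayectoria” to a single meaning.

```python
def neighbour_feature(t, m, neighbours, meanings, sim, phi1=None, phi2=None):
    """Sum over neighbours t' of phi1 * sim* / (sim* + dissim)."""
    phi1 = phi1 or (lambda t, t2: 1.0)    # assumption: uniform weights
    phi2 = phi2 or (lambda t2, m2: 1.0)   # assumption: uniform weights
    total = 0.0
    for t2 in neighbours.get(t, ()):
        scored = [(phi2(t2, m2), sim(m2, m)) for m2 in meanings.get(t2, ())]
        if not scored:
            continue  # neighbour without known meanings contributes nothing
        sim_star = max(w * s for w, s in scored)
        dissim = sum(w * (1.0 - s) for w, s in scored)
        denom = sim_star + dissim
        if denom > 0:
            total += phi1(t, t2) * sim_star / denom
    return total


def identity_sim(m1, m2):
    # Assumption: crude stand-in for a real meaning-to-meaning similarity.
    return 1.0 if m1 == m2 else 0.0


# Toy data for illustration only.
toy_neighbours = {"fra:suite": ["eng:course", "spa:trayectoria"]}
toy_meanings = {
    "eng:course": ["academic course", "part of a meal",
                   "route of travel", "series of events"],
    "spa:trayectoria": ["route of travel"],
}

for candidate in ("series of events", "route of travel"):
    score = neighbour_feature("fra:suite", candidate,
                              toy_neighbours, toy_meanings, identity_sim)
    print(candidate, score)
```

With these toy inputs a highly polysemous neighbour such as “course” contributes only 1/4 per candidate, while a neighbour with a single matching meaning contributes 1. The dissim term in the denominator is what penalises ambiguous evidence.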
Other Features
cosine similarity of translations with gloss
scores assessing polysemy by looking at back-translations
many more (see paper for details)
Approach
use scores as features for an RBF-kernel SVM
multiple iterations: each graph G_i is based on the previous graph G_{i−1}
stop when the F1 score reaches a plateau on a validation set
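The iteration scheme above can be outlined as a loop with a plateau check. This is a sketch under stated assumptions: a graph is reduced to a dictionary from (term, meaning) edges to weights, `rescore` is a hypothetical callable standing in for feature extraction plus the SVM, and the acceptance threshold and minimum gain are illustrative values rather than settings from the talk.

```python
def f1_score(predicted, gold):
    """F1 of a predicted edge set against a gold edge set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def iterate_until_plateau(g0, rescore, validation_gold,
                          threshold=0.5, min_gain=0.001, max_iters=10):
    """Build G_i from G_{i-1} until validation F1 stops improving."""
    graph, best_f1, history = g0, -1.0, []
    for _ in range(max_iters):
        candidate = rescore(graph)  # G_i computed from G_{i-1}
        predicted = {e for e, w in candidate.items() if w >= threshold}
        f1 = f1_score(predicted, validation_gold)
        history.append(f1)
        if f1 - best_f1 < min_gain:
            break  # plateau: keep the last graph that still improved
        graph, best_f1 = candidate, f1
    return graph, history


# Toy rescoring step (assumption): weights drift halfway towards fixed targets.
toy_target = {("a", "m1"): 1.0, ("a", "m2"): 0.0, ("b", "m3"): 1.0}


def toy_rescore(graph):
    return {e: w + 0.5 * (toy_target[e] - w) for e, w in graph.items()}


toy_g0 = {edge: 0.4 for edge in toy_target}
toy_gold = {("a", "m1"), ("b", "m3")}
final_graph, f1_history = iterate_until_plateau(toy_g0, toy_rescore, toy_gold)
print(f1_history)
```

Returning the last graph that still improved the score, rather than the one that triggered the stop, is a design choice of this sketch.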
Results
Setup
input graph G0:
448,069 pre-existing term-meaning links
10,805,400 translation edges
1.3 million term nodes with candidates
7.7 candidate meanings per new term
2,445 term-meaning links for training (French/German)
2,901 term-meaning links as validation set
[Figure: excerpt from the final UWN graph G3 after 3 iterations, retaining only edges with sufficiently high weights (0.5 / 0.6)]
meaning nodes: school (group of fish), school (institution), school (building)
term nodes: deu: “Schulgebäude”, deu: “Schulhaus”, deu: “Fischschwarm”, ces: “hejno”, fra: “banc”, ind: “sekolah”, jpn: “学校”, kor: “학교”, lao: “ໂຮງຮຽນ”, kat: “სკოლა”
Evaluation
Relation                                Precision (1)
Term-Meaning Links (French)             89.2% ± 3.4%
Term-Meaning Links (German)             85.9% ± 3.8%
Term-Meaning Links (Mandarin Chinese)   90.5% ± 3.3%
Generalization (Hypernymy)              87.1% ± 4.8%
Instance                                89.3% ± 4.4%
Similarity                              92.0% ± 3.8%
Category                                93.3% ± 4.5%
Part (Meronymy)                         94.4% ± 4.1%
Member (Meronymy)                       92.7% ± 4.0%
Substance (Meronymy)                    95.6% ± 3.5%
Opposite                                94.3% ± 3.9%
(1) Wilson score intervals for random samples
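For readers unfamiliar with Wilson score intervals, the following sketch shows how a "centre ± half-width" figure of this kind can be computed from a random sample. The sample sizes behind the table are not given on the slide, so the numbers used below are purely illustrative, and the 95% level (z = 1.96) is an assumption.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return (centre, half_width) of the Wilson score interval.

    z = 1.96 corresponds to a 95% confidence level (assumed here).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre, half_width


# Illustrative sample: 90 of 100 randomly drawn links judged correct.
centre, half_width = wilson_interval(90, 100)
print("%.1f%% ± %.1f%%" % (100 * centre, 100 * half_width))
```

Unlike the plain sample proportion, the Wilson centre is pulled slightly towards 50%, so the reported value is not simply the fraction of correct samples.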
Coverage
Language     Term-Meaning Links   Distinct Terms
Overall      1,595,763            822,212
German       132,523              67,087
French       75,544               33,423
Esperanto    71,247               33,664
Dutch        68,792               30,154
Spanish      68,445               32,143
Turkish      67,641               31,553
Czech        59,268               33,067
Russian      57,929               26,293
Portuguese   55,569               23,499
Italian      52,008               24,974
Hungarian    46,492               28,324
Thai         44,523               30,815
Application: Semantic Relatedness
Experimental Setup
Example: “curriculum” considered closely related to “school”, but not to “water”
compute term relatedness using UWN:
$$\mathrm{sim}(t_1, t_2) = \max_{s_1 \in \sigma(t_1)} \max_{s_2 \in \sigma(t_2)} \mathrm{sim}(s_1, s_2)$$
sim(s_1, s_2): combined graph-/gloss-based method
compare with assessments of relatedness made by human judges
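The term-level relatedness above is a maximum over all pairs of senses. A minimal sketch follows, with invented toy data: the sense inventory, the sense-level scores, and the names `term_relatedness` and `toy_sense_sim` are assumptions, and the toy similarity merely stands in for the combined graph-/gloss-based method.

```python
def term_relatedness(t1, t2, senses, sense_sim):
    """Maximum sense-level similarity over all sense pairs of two terms."""
    best = None
    for s1 in senses.get(t1, ()):
        for s2 in senses.get(t2, ()):
            score = sense_sim(s1, s2)
            if best is None or score > best:
                best = score
    return best  # None when a term has no known sense (not covered)


# Toy sense inventory and scores, invented for illustration only.
toy_senses = {
    "Schule": ["school (institution)", "school (building)"],
    "Lehrplan": ["curriculum"],
    "Wasser": ["water"],
}
toy_scores = {
    frozenset(["curriculum", "school (institution)"]): 0.8,
    frozenset(["curriculum", "school (building)"]): 0.4,
}


def toy_sense_sim(s1, s2):
    if s1 == s2:
        return 1.0
    return toy_scores.get(frozenset([s1, s2]), 0.0)


print(term_relatedness("Lehrplan", "Schule", toy_senses, toy_sense_sim))
print(term_relatedness("Lehrplan", "Wasser", toy_senses, toy_sense_sim))
```

Returning `None` for uncovered terms keeps them separate from pairs that are covered but unrelated, which matters when coverage is reported alongside correlation.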
Results for 3 German Datasets
Dataset                   GUR65           GUR350          ZG222
                          r      Cov.     r      Cov.     r      Cov.
Inter-Annot. Agreement    0.81   (65)     0.69   (350)    0.49   (222)
Wikipedia (ESA*)          0.56   65       0.52   333      0.32   205
GermaNet (Lin*)           0.73   60       0.50   208      0.08   88
UWN                       0.80   60       0.68   242      0.51   106
r: Pearson product-moment correlation coefficient
Cov.: absolute coverage
*: scores by Gurevych et al. (2007)
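As a reading aid for the two columns, the sketch below computes a Pearson correlation over the covered pairs only and reports how many pairs were covered. The function names and the toy scores are assumptions; restricting the correlation to covered pairs is how the table is interpreted here, not something the slide states explicitly.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant samples")
    return cov / math.sqrt(var_x * var_y)


def evaluate(human, system):
    """Return (r, coverage) over pairs for which the system has a score."""
    covered = [pair for pair in human if system.get(pair) is not None]
    r = pearson_r([human[p] for p in covered], [system[p] for p in covered])
    return r, len(covered)


# Toy judgements, invented for illustration only.
toy_human = {("a", "b"): 4.0, ("c", "d"): 1.0, ("e", "f"): 2.5, ("g", "h"): 3.0}
toy_system = {("a", "b"): 0.8, ("c", "d"): 0.2, ("e", "f"): 0.5, ("g", "h"): None}
r, coverage = evaluate(toy_human, toy_system)
print(round(r, 3), coverage)
```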
Application: Cross-Lingual Text Classification
cross-lingual TC: train using documents in one language, classify documents in another language
used bag-of-words/meanings TF-IDF vectors
Dataset: Reuters corpora (RCV1/2)
for each language pair: 105 binary classification tasks, each using 200 training documents, 600 test documents
SVMlight
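Why meaning features help across languages can be seen in a small sketch. It is not the experimental pipeline: the feature naming scheme, the way link weights are added to the counts, the smoothed IDF variant, and the toy term-meaning links are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def featurize(tokens, lang, term_meanings):
    """Bag of words plus bag of meanings for one tokenised document."""
    feats = Counter()
    for tok in tokens:
        feats["term:" + lang + ":" + tok] += 1.0
        # Assumption: each linked meaning is counted with its link weight.
        for meaning, weight in term_meanings.get((lang, tok), {}).items():
            feats["meaning:" + meaning] += weight
    return feats


def tfidf(docs):
    """TF-IDF with a smoothed IDF (one common variant, assumed here)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    return [{f: tf * (math.log((1 + n) / (1 + df[f])) + 1.0)
             for f, tf in doc.items()} for doc in docs]


def cosine(a, b):
    dot = sum(v * b.get(f, 0.0) for f, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy term-meaning links, invented for illustration only.
toy_links = {
    ("eng", "school"): {"school (institution)": 0.9},
    ("ita", "scuola"): {"school (institution)": 0.8},
}
en = featurize(["school", "teacher"], "eng", toy_links)
it = featurize(["scuola", "insegnante"], "ita", toy_links)
en_vec, it_vec = tfidf([en, it])
print(sorted(set(en) & set(it)), cosine(en_vec, it_vec) > 0)
```

With terms alone the English and Italian documents share no features. The shared meaning feature is the only thing that lets a classifier trained on one language carry over to the other.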
Language Pair     Terms only   Terms + Meanings
English-Italian   68.3%        76.3%
English-Russian   51.7%        71.2%
Italian-English   74.4%        78.1%
Italian-Russian   58.4%        73.2%
Russian-English   67.3%        76.8%
Russian-Italian   62.2%        71.8%
(all values are F1 scores)
Summary
large-scale multilingual wordnet: 85% accuracy, 800,000 terms, over 1.5 million links from terms to meanings
built by learning edge weights using graph-based evidence
useful for monolingual and cross-lingual tasks
Future Work
ongoing work: user interface incl. user contributions
techniques to automatically discover new word meanings
word sense disambiguation, query expansion using UWN
Thanks!
expression of gratitude
eng: “thank you”
yue: “唔該”
cmn: “谢谢”
jpn: “ありがとう”
spa: “gracias”
ara: “شكراً”