Programming language is not an island: Word Sense 
Alignment of Lexical-Semantic Resources 
Iryna Gurevych 
Joint work with: Judith Eckle-Kohler, Kostadin Cholakov, Silvana 
Hartmann, Michael Matuschek, Christian M. Meyer 
http://www.ukp.tu-darmstadt.de/data/uby 
12.09.2014 | Technische Universität Darmstadt | Iryna Gurevych 1 
Outline
1. Motivation
2. Similarity-based Word Sense Alignment
3. Graph-based Word Sense Alignment
4. Joint Modeling of Features
5. Putting the Pieces Together: UBY
6. Applications of Linked Lexical Resources
Text Analysis Needs Lexical-Semantic Knowledge 
[Figure: an NLP application drawing on a lexical resource]
Which lexical resource to choose?
Resources are Largely Different
- Different coverage of words/word senses
- Different types of information
  - Encyclopedic vs. linguistic knowledge
  - Syntactic vs. semantic knowledge
- …
Resource integration can significantly influence the performance of your system! Instead of choosing only one resource (the best performing): why not combine multiple resources and benefit from all their knowledge?
Overlap of Lexical Entries
[Venn diagram: Roget's Thesaurus (62,797 entries), WordNet (149,502 entries), Wiktionary (364,663 entries); region sizes shown: 25,541; 28,650; 163,027; 67,868; 56,240]
The common vocabulary is rather small (28,650 entries). Each resource contains a lot of "unique" words.
Overlap of Lexical Entries
[Figure: surprisingly small overlap; labels shown: slang, dialect, neologisms, named entities, biological taxonomy, natural sciences, computer science, social sciences, humanities, math]
Word Sense Alignment
[Figure: sense entries for "to sing" / "singen" taken from several resources]

1. To sing: To produce musical or harmonious sounds with one's voice.
2. To sing: To express audibly by means of a harmonious vocalization.
3. To sing: To confess under interrogation.

1. singen: Mit der Stimme harmonische Töne erzeugen. (To produce harmonious tones with the voice.)

1. To sing: Produce tones with the voice
2. To sing: divulge confidential information or secrets

1. To sing: To produce harmonious sounds with one's voice.
Prior Work on Linked Lexical Resources (LLR)
- Meaning Multilingual Central Repository, Atserias et al. (2004)
- Yago, Suchanek et al. (2007)
- SemLink, Palmer (2009)
- Universal Wordnet (UWN), de Melo and Weikum (2009)
- eXtended WordFrameNet, Laparra and Rigau (2010)
- BabelNet, Navigli and Ponzetto (2010)
- NULEX, McFate and Forbus (2011)
- UBY, Gurevych et al. (2012)
- … many more, e.g. on the Semantic Web
Potential of Linked Lexical Resources
Increased coverage and enriched sense representations
- Linking FrameNet, VerbNet, and WordNet for semantic parsing (Shi and Mihalcea, 2005)
- Linking VerbNet, FrameNet, and PropBank for semantic role labeling (Palmer, 2009)
- Linking WordNet and Wikipedia for word sense disambiguation (Navigli and Ponzetto, 2010)
- Linking WordNet and Wiktionary for measuring verb similarity (Meyer and Gurevych, 2012)
- Linking OmegaWiki and Wiktionary for mining translations (McCrae and Cimiano, 2013)
The Challenge: Heterogeneity of Resources
- Different coverage: missing entities in one of the resources
- Different granularity: entities are defined at different levels
- Different perspectives: entities are defined for a different purpose
(Euzenat/Shvaiko, 2007)
Lemma Alignment
[Figure: Wiktionary and WordNet]
Content integration at the lemma level is easy, but…
Word Sense Alignment
[Figure: Wiktionary and WordNet]
…integration at the sense level is hard!
Word Sense Alignment
plant in Wiktionary
- (botany) An organism of the kingdom Plantae […]
- (proscribed as biologically inaccurate) Any creature that grows on soil or similar surfaces, including plants and fungi.
- A factory or other industrial or institutional building or facility.
- (snooker) A play in which the cue ball knocks one (usually red) ball onto another […]
plant in WordNet
- buildings for carrying on industrial labor
- (botany) a living organism lacking the power of locomotion
- an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience
[Figure: question marks between the two sense lists]
The Alignment Process
[Figure: a matching step takes resource 1 (r), resource 2 (r'), an initial alignment A (possibly empty), parameters p, and knowledge k, and produces the output alignment A']
A' = f(r, r', A, p, k)
- Can be generalized to multiple resources ("multi-alignment"):
A' = f(r1, …, rn, A, p, k)
(Euzenat/Shvaiko, 2007)
Construction of aligned lexical resources
What Psycholinguists Know About Chemistry: Aligning Wiktionary and WordNet for Increased Domain Coverage. Christian M. Meyer and Iryna Gurevych. In: Proceedings of IJCNLP, pp. 883-892, November 2011.
[Figure: publications on sense alignment, color-coded by topic]
Publications: Niemann & Gurevych, IWCS 2011; Meyer & Gurevych, IJCNLP 2011; Matuschek & Gurevych, TACL 2013; Matuschek & Gurevych, COLING 2014; Miller & Gurevych, LREC 2014; Hartmann & Gurevych, ACL 2013
Topics: graph-based alignment; resource-independent alignment; text similarity-based alignment; exploitation of existing LR alignments to produce new ones
Similarity-based Word Sense Alignment 
Increased coverage 
Enriched sense representations
Aligning Wiktionary and WordNet
A two-step approach:
1. Candidate extraction
2. Candidate disambiguation
[Figure: WordNet synsets, e.g. {plant, works, industrial plant}, next to Wiktionary senses, e.g. plant (factory), plant (organism), plant (person), works (factory), works (machine), bird (animal), to fly (move), reddish (color); Wikipedia articles are also shown]
Bag of Words Representation
- Synsets are represented by synonyms, gloss, and examples; variants: synset only, plus hypernyms, plus hyponyms, plus hyper- & hyponyms. Result: a bag-of-words
- Senses are represented by lemma, sense definition, usage examples, and synonyms. Result: a bag-of-words
Candidate Disambiguation
- A semantic relatedness measure compares the two bag-of-words representations and yields a score s
  - COS: Cosine similarity
  - PPR: Personalized PageRank
- s ≥ threshold: Align this pair of WordNet synset and Wiktionary sense!
- s < threshold: No alignment!
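The COS variant of candidate disambiguation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the tokenizer, the sense identifiers, and the threshold value of 0.1 are assumptions made for the example, and no stop-word removal or lemmatization is applied.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def disambiguate(synset_text, candidate_senses, threshold):
    """Keep the candidate sense ids whose score s reaches the threshold."""
    synset_bow = bag_of_words(synset_text)
    return [
        sense_id
        for sense_id, text in candidate_senses.items()
        if cosine(synset_bow, bag_of_words(text)) >= threshold
    ]


# Synonyms plus gloss of the WordNet synset {plant, works, industrial plant}.
wordnet_plant = "plant works industrial plant buildings for carrying on industrial labor"
# Hypothetical ids for two Wiktionary senses of "plant".
wiktionary_plant = {
    "plant#factory": "A factory or other industrial or institutional building or facility",
    "plant#organism": "An organism of the kingdom Plantae",
}
aligned = disambiguate(wordnet_plant, wiktionary_plant, threshold=0.1)
print(aligned)
```

In practice the threshold would be tuned on annotated data rather than fixed by hand.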
Evaluation Dataset
Dataset creation:
- No previous alignments = no other evaluation datasets
- We created a new dataset with 2,423 sense pairs
- 10 human raters (students/researchers from CS, math, linguistics)
- Each pair is annotated as "same meaning" or "different meaning"
Dataset reliability:
- Inter-rater agreement: AO = .93, κ = .70
- After removing two biased raters: AO = .94, κ = .74
Gold standard:
- Majority vote of the remaining 8 raters, with an additional tie breaker
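Deriving a gold label by majority vote with a tie breaker might look as follows. The label strings and the way the tie-breaking judgment is passed in are assumptions for illustration; the slides do not say how the tie breaker was obtained.

```python
from collections import Counter


def gold_label(votes, tie_breaker):
    """Return the majority label; fall back to the tie breaker on an even split."""
    counts = Counter(votes)
    same, different = counts["same"], counts["different"]
    if same == different:
        return tie_breaker
    return "same" if same > different else "different"


# Eight raters, as in the gold standard described above.
clear_case = gold_label(["same"] * 5 + ["different"] * 3, tie_breaker="different")
tied_case = gold_label(["same"] * 4 + ["different"] * 4, tie_breaker="different")
print(clear_case, tied_case)
```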
Evaluation Results
- RAND: Random baseline
- MFS: Baseline always aligning the first sense (≈ most frequent sense)

Method    A     P     R     F1
RAND      .662  .212  .594  .313
MFS       .802  .329  .508  .399
COS only  .901  .598  .703  .646
PPR only  .915  .684  .636  .659
COS&PPR   .914  .674  .649  .661

- Our approach significantly outperforms the baselines (at the 1% level)
- COS: highest recall; PPR: highest precision; COS&PPR: highest F1
- Significant difference of PPR and COS&PPR over COS (at the 1% level)
- No significant difference between PPR and COS&PPR
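The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the rounded P and R values reported above; small deviations in the third decimal are expected because the published inputs are already rounded.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) as given in the results table.
results = {
    "RAND": (0.212, 0.594, 0.313),
    "MFS": (0.329, 0.508, 0.399),
    "COS only": (0.598, 0.703, 0.646),
    "PPR only": (0.684, 0.636, 0.659),
    "COS&PPR": (0.674, 0.649, 0.661),
}

for method, (p, r, reported) in results.items():
    print(f"{method}: recomputed F1 = {f1(p, r):.3f}, reported F1 = {reported:.3f}")
```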
Error Analysis
110 false negatives: "same meaning, but was not aligned"
- Very different wording
  - "good discernment" vs. "ability to notice what others might miss"
- Similar senses but slightly below threshold
  - "plants of the genus Centaurea" vs. "common weeds of the genus Centaurea"
- Pointing to another entry rather than a content-based gloss
  - pacification: "the process of pacifying"
Error Analysis
98 false positives: "different meaning, but have been aligned"
- Similar wording, but refer to different concepts
  - "a computer that provides client stations with access to files and printers as shared resources to a computer network" vs. "any computer attached to a network"
- High relatedness, but generic vs. domain-specific vocabulary
  - "any computer attached to a network" vs. "any organization that provides resources and facilities for a function or event"
Increased Coverage: Parts of Speech
- Our alignment: 56,970 sense pairs
- Final resource contains 488,988 word senses
- Substantial increase in the coverage of senses
- Wiktionary is not restricted to nouns/verbs/adjectives: proverbs, idioms, collocations, particles, determiners, inflected forms, etc.

                 Wiktionary AND WordNet  Additionally in Wiktionary  Additionally in WordNet
Nouns            34,464                  158,085                     47,651
Verbs            8,252                   29,119                      5,515
Adj./Adv.        14,236                  60,977                      7,541
Other POS        0                       16,778                      0
Inflected Forms  0                       106,328                     0
Increased Coverage: Domains

                 Wiktionary AND WordNet  Additionally in Wiktionary  Additionally in WordNet
Biology          4,465                   4,067                       12,869
Chemistry        2,561                   8,260                       2,268
Engineering      1,108                   940                         1,080
Geology          2,287                   2,898                       2,479
Humanities       4,949                   2,700                       5,060
IT               439                     3,032                       557
Linguistics      1,249                   1,011                       1,576
Math             615                     2,747                       483
Medicine         3,613                   3,728                       3,058
Military         574                     426                         585
Physics          1,246                   2,835                       1,252
Religion         733                     1,154                       781
Social Sciences  3,745                   2,907                       4,458
Sport            905                     2,821                       807
Enriched Sense Representation
- Synonyms, gloss, example sentence, subsumption hierarchy, synset organization, …
- Pronunciation, etymology, syntactic knowledge, quotations, related terms, translations, …
Selected Conclusions
- The aligned Wiktionary–WordNet resource is characterized by:
  (1) Increased coverage
    - Different parts of speech, not only nouns
    - e.g. humanities and social sciences from WordNet
    - e.g. technical domains and leisure from Wiktionary
  (2) Enriched sense representation
    - Pronunciation, etymology, related terms, translations, etc.
- Novel evaluation dataset annotated by 10 human raters
- Later work achieved better results with resource-structure-based and hybrid techniques (Matuschek & Gurevych, TACL '13)
Construction of aligned lexical resources 
Michael Matuschek and Iryna Gurevych: Dijkstra-WSA: A Graph-Based 
Approach to Word Sense Alignment, in: Transactions of the Association 
for Computational Linguistics (TACL), vol. 1, p. 151-164, May 2013 
Similarity-Based Approaches Suffer From…
- Different vocabulary employed by definitions
- Example: English noun eye/discernment, e.g., "she has an eye for fresh talent", "he has an artist's eye"
  - Definition 1: "good discernment (either visually or as if visually)"
  - Definition 2: "ability to notice what others might miss"
  - Result: low semantic relatedness score…
Solution: Use the Graph Topology
[Figure: the word senses of Java: Java1, Java2, Java3]
Intuition of Graph Topology
[Figure: the word senses of Java (Java1, Java2, Java3), the word senses of Ruby (Ruby1), and the monosemous lexeme "programming language" with its sense programming language1]
Related senses are in the same region of the graph
Dijkstra-WSA
Graph-based word sense alignment approach
Key ideas:
- Represent lexical resources as graphs
- Rely on trivial alignments as "reference nodes" and "bridges"
- Use Dijkstra's shortest path algorithm to find alignments
Steps:
1. Graph construction
2. Computing sense alignments
(Matuschek/Gurevych, 2013)
Step 1: Graph Construction
Represent each lexical resource as an undirected graph L = (V, E) with
- the set of nodes V representing senses or synsets
- the set of edges E ⊆ V × V representing some kind of (semantic) similarity between a pair of nodes
An edge connects sense S1 and sense S2 if, for example…
- There exists a semantic relation between S1 and S2
- A lexeme W2 occurs in the sense definition of S1, and W2 is monosemous (S2 being its only sense)
- S1 and S2 share the same syntactic behavior
- …
(Matuschek/Gurevych, 2013)
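A rough sketch of the graph construction, combining semantic-relation edges with monosemous linking. The data layout (`senses` as a dict of id to lemma and definition), the substring-based phrase matching, and the `max_fraction` parameter are assumptions for this example; `max_fraction` stands in for the rarity filter (1/N of the definitions) mentioned under Parameter Choices and is set far higher than 1/200 here only because the toy resource has five senses.

```python
from collections import defaultdict


def build_graph(senses, relations, max_fraction):
    """Build an undirected sense graph as {sense_id: set of neighbouring sense_ids}.

    senses: {sense_id: (lemma, definition)}
    relations: iterable of (sense_id, sense_id) semantic relations
    max_fraction: a monosemous lexeme is linked only if it occurs in at most
        this fraction of all definitions
    """
    graph = {sense_id: set() for sense_id in senses}

    # Edges from semantic relations.
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)

    # Monosemous linking: find lexemes with exactly one sense.
    by_lemma = defaultdict(list)
    for sense_id, (lemma, _) in senses.items():
        by_lemma[lemma.lower()].append(sense_id)

    # Normalize whitespace and pad so that whole-phrase matching is possible.
    padded = {
        sense_id: " " + " ".join(definition.lower().split()) + " "
        for sense_id, (_, definition) in senses.items()
    }

    for lemma, sense_ids in by_lemma.items():
        if len(sense_ids) != 1:
            continue  # polysemous lexemes are not linked
        target = sense_ids[0]
        mentions = [
            sense_id
            for sense_id, text in padded.items()
            if sense_id != target and f" {lemma} " in text
        ]
        if len(mentions) > max_fraction * len(senses):
            continue  # too frequent: would cause an "explosion" of edges
        for sense_id in mentions:
            graph[sense_id].add(target)
            graph[target].add(sense_id)
    return graph


# Toy resource with hypothetical sense ids and definitions.
senses = {
    "Java#1": ("Java", "an island of Indonesia"),
    "Java#2": ("Java", "coffee, for example an espresso"),
    "Java#3": ("Java", "an object-oriented programming language"),
    "programming language#1": ("programming language", "a formal language for instructing computers"),
    "espresso#1": ("espresso", "strong coffee brewed under pressure"),
}
graph = build_graph(senses, relations=[], max_fraction=0.5)
print(graph["Java#3"], graph["Java#2"], graph["Java#1"])
```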
Step 1: Graph Construction
[Figure: graph of resource 1 and graph of resource 2 with nodes such as Java1, Java2, Java3, programming language1, and espresso1; edges represent some kind of (semantic) similarity between nodes]
Step 2: Computing Sense Alignments
a) Create trivial alignments between the resources:
- Trivial = lexeme is unique/monosemous in both resources
- Example: programming language
- Precision: >0.95
b) Identify alignment candidates
- For example: nodes representing the same lemma
c) For all nodes still unaligned, find shortest paths to the candidate nodes in the other graph
- Trivial alignments serve as "bridges" between the graphs
- Align the node pair with the shortest path
(Matuschek/Gurevych, 2013)
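Steps a) to c) can be sketched as follows. This is a simplified illustration rather than the published Dijkstra-WSA code: all edges, including the "bridge" edges created by trivial alignments, are assumed to have weight 1, the function and parameter names are made up for the example, and nothing prevents two source senses from being aligned to the same target.

```python
import heapq


def shortest_distances(graph, source, max_length):
    """Dijkstra's algorithm with unit edge weights, cut off at max_length."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node] or d == max_length:
            continue  # stale queue entry, or search depth exhausted
        for neighbour in graph.get(node, ()):
            if d + 1 < dist.get(neighbour, float("inf")):
                dist[neighbour] = d + 1
                heapq.heappush(queue, (d + 1, neighbour))
    return dist


def align(graph1, graph2, trivial, candidates, max_length=6):
    """Align each unaligned node of graph1 with its nearest candidate in graph2.

    trivial: list of (node_in_graph1, node_in_graph2) trivial alignments
    candidates: {node_in_graph1: [candidate nodes in graph2]}
    """
    # Joint graph; nodes are tagged with their resource to avoid id clashes.
    joint = {}
    for tag, graph in ((1, graph1), (2, graph2)):
        for node, neighbours in graph.items():
            joint[(tag, node)] = {(tag, n) for n in neighbours}
    # Trivial alignments act as bridges between the two graphs.
    for a, b in trivial:
        joint.setdefault((1, a), set()).add((2, b))
        joint.setdefault((2, b), set()).add((1, a))

    alignments = {}
    for source, cands in candidates.items():
        dist = shortest_distances(joint, (1, source), max_length)
        reachable = [(dist[(2, c)], c) for c in cands if (2, c) in dist]
        if reachable:
            alignments[source] = min(reachable)[1]
    return alignments


# Toy graphs; sense ids are hypothetical.
graph1 = {
    "Java#1": set(),
    "Java#2": {"espresso#1"},
    "Java#3": {"programming language#1"},
    "espresso#1": {"Java#2"},
    "programming language#1": {"Java#3"},
}
graph2 = {
    "Java#1": {"programming language#1"},
    "programming language#1": {"Java#1"},
    "espresso#1": set(),
}
trivial = [
    ("programming language#1", "programming language#1"),
    ("espresso#1", "espresso#1"),
]
candidates = {"Java#1": ["Java#1"], "Java#2": ["Java#1"], "Java#3": ["Java#1"]}
alignments = align(graph1, graph2, trivial, candidates)
print(alignments)
```

In this toy setup the programming-language sense of Java in resource 1 reaches its counterpart in resource 2 over a path of length 3 via the trivial alignment of "programming language"; the other two senses find no path and stay unaligned.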
Step 2: Computing Sense Alignments
[Figure: graph of resource 1 and graph of resource 2 with the nodes Java1, Java2, Java3, programming language1, and espresso1]
Step 2a: Create Trivial Alignments
[Figure: the same two graphs with the trivial alignments added]
Step 2b: Identify Alignment Candidates
[Figure: the alignment candidates, each marked with "?"]
Step 2c: Shortest Paths to the Candidates
[Figure: path lengths to the candidates: 3, 5, ∞]
Step 2c: Align the Nodes
[Figure: the aligned node pair, marked with "!"]
Parameter Choices
Restricting the number of alignments
- Stop when the first candidate is found (1:1 alignment)
- Keep going and align everything you can reach (1:n alignment)
  - Possibly with a restricted search depth
Graph construction
- Use semantic relations, monosemous linking, or both
- Get rid of relations to highly frequent monosemous lexemes (e.g., "there is")
  - Limiting to rare lexemes avoids an "explosion" of edges
  - Rare = appearing in only 1/N of the definitions (e.g., N = 200)
Computing sense alignments
- Path length L: an unbounded L yields unmanageable runtime!
- Best F1 score for L between 5 and 8, depending on the resource pair
Hybrid Approach
Main issue of Dijkstra-WSA
- Low recall due to missing edges / sparse graph
Hybrid approach
- Try to align using the graph first
  - Parameterized for high precision
- Align those with no match using a similarity-based approach
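The back-off logic of the hybrid approach is simple enough to show directly. In this sketch both aligners are assumed to be callables that return a target sense or `None`; the function name and the toy lookup tables are hypothetical.

```python
def hybrid_align(sources, graph_align, similarity_align):
    """Use the graph-based aligner first; back off to similarity if it finds nothing."""
    alignments = {}
    for source in sources:
        target = graph_align(source)
        if target is None:
            target = similarity_align(source)
        if target is not None:
            alignments[source] = target
    return alignments


# Stand-ins for the two aligners: precomputed results looked up with dict.get.
graph_result = {"Java#3": "Java#1"}
similarity_result = {"Java#2": "Java#2", "Java#3": "Java#9"}

hybrid = hybrid_align(
    ["Java#1", "Java#2", "Java#3"], graph_result.get, similarity_result.get
)
print(hybrid)
```

The graph-based decision for Java#3 takes precedence over the similarity-based one, which mirrors the idea of running the graph step in a high-precision configuration.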
Evaluation Datasets
Sampled datasets:
- WordNet – Wikipedia (1,815 sense pairs)
- WordNet – Wiktionary (2,423 sense pairs)
- FrameNet – Wiktionary (2,789 sense pairs)
- WordNet – OmegaWiki (683 sense pairs)
- Wiktionary – OmegaWiki (586 sense pairs)
- Wiktionary – Wikipedia English (367 sense pairs)
Full datasets:
- GermaNet – Wiktionary (45,636 sense pairs)
- Wiktionary – Wikipedia German (31,808 sense pairs)
Datasets Display Different Properties
- WordNet, OmegaWiki, Wikipedia: sense definitions and semantic relations
- Wiktionary: no disambiguated semantic relations => sparse graphs
- GermaNet: very few sense definitions
Evaluation
[Figure: results chart comparing Random baseline, 1:1 1st, Similarity-based (SB), Semantic Relations (SR), Linking Monosemes (LM), SR + LM, SR + SB, LM + SB, SR + LM + SB, Hybrid, and Human performance (Matuschek/Gurevych, 2013)]
Significant improvement in recall… and F-measure… also on all other datasets!
Selected Conclusions
- Dijkstra-WSA ≥ gloss similarity for densely linked LSRs
  - Generic alignment approach is valid
- But: low recall for sparse LSRs (English Wiktionary, OmegaWiki)
- Dijkstra-WSA + similarity-based backoff outperforms previous work on all datasets
  - The two notions of similarity are complementary
  - Could they be combined in a smarter way?
Construction of Aligned Lexical Resources 
Michael Matuschek and Iryna Gurevych: High Performance Word Sense Alignment by 
Joint Modeling of Sense Distance and Gloss Similarity, in: Proceedings of the 25th 
International Conference on Computational Linguistics (COLING 2014). Dublin, Ireland. 
Joint Usage of Features
- Similarity- and graph-based approaches both have weaknesses
  - Different formulation of glosses
  - Sparse / disconnected graphs
- Two-step hybrid approach already helped improve recall
  - But: No real combination of both notions
- Idea: Combine them using Machine Learning
  - Exploit the complementary strengths more effectively
Setup - Features
Features:
- Gloss similarity (COS, PPR)
- Dijkstra-WSA distances
  - Infinite distance if no target can be found
Other possible features:
- Part of speech, sense index, translation overlap, example sentence patterns
- No significant improvement by using them!
  - Glosses and structure are sufficient
Setup - Classifiers
Classifiers used:
- Naive Bayes
- Bayesian Networks
- Perceptrons
- Support Vector Machines (SVMs)
- Decision Trees
Evaluation using 10-fold cross validation
- Same datasets as before
Evaluation
[Figure: results chart comparing Random, 1:1 1st, SB, DWSA, Hybrid, SVM, Naive Bayes, Bayesian Network, Perceptron, Decision Tree, and Human performance]
General improvement in precision… but in F-measure only for some of the datasets!
Selected Conclusions
- Better overall results on 4 out of 8 datasets
  - Machine learning helps most for sparse and incomplete LSRs like OmegaWiki and Wiktionary
  - For "complete" LSRs like WordNet, we cannot gain much
- Better precision on 7 out of 8
- Most robust: Bayesian Networks
  - Complex classifiers (e.g. SVMs) challenged by skewed values
Main source of improvements:
- Better classification of "borderline" examples
  - High gloss similarity & high distance, or vice versa
Borderline Example
Genome:
1. "The non-redundant genetic information stored in DNA sequences that defines an individual organism"
2. "In the context of a genetic algorithm, the information that defines an individual entity"
- Very similar description
- But: Far apart in the graph
=> No alignment!
Linked Lexical Resources
Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer and Christian Wirth: UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF, in: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), April 2012.
[Figure: publications on linked lexical resources (LLRs), color-coded by topic]
Publications: Gurevych et al., EACL 2012; Eckle-Kohler et al., LREC 2012; Eckle-Kohler & Gurevych, EACL 2012; Eckle-Kohler et al., SWJ 2014; Eckle-Kohler et al., LMF 2013
Topics: large-scale unified LR based on LMF; standardizing heterogeneous LRs; standardized format for subcat frames; language independence of lexicon models
UBY: Linking Lexical Resources
Two main characteristics:
- Word Sense Alignments
- Standardized Representation
[Figure: the resources linked in UBY; labels shown: Web 2.0, IMSLex-Subcat]
Heterogeneity of Lexical Resources 
Complementary information types 
Different terminology 
Incompatible data formats
Unified Lexical Resource UBY
Unified lexicon model
Preserves variety of lexical information
Extensible
Structure Integration 
Standardized representation frameworks
- Lexical Markup Framework (LMF): http://www.lexicalmarkupframework.org
- Text Encoding Initiative (TEI): http://www.tei-c.org
<entry> 
<form> 
<orth>disproof</orth> 
<pron>dIs"pru:f</pron> 
</form> 
<gramGrp> 
<pos>n</pos> 
</gramGrp> 
<sense n="1"> 
<def>facts that disprove something.</def> 
</sense> 
<sense n="2"> 
<def>the act of disproving.</def> [..] 
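To give a feel for how such a standardized entry is consumed, the sketch below reads a TEI-style dictionary entry with Python's standard library. The slide's snippet is truncated, so the closing `</sense>` and `</entry>` tags in this copy are assumptions added only to make the example well-formed; real TEI documents also carry namespaces and a surrounding document structure that are omitted here.

```python
import xml.etree.ElementTree as ET

# Copy of the entry shown above; the last two closing tags are assumed.
TEI_ENTRY = """
<entry>
  <form>
    <orth>disproof</orth>
    <pron>dIs"pru:f</pron>
  </form>
  <gramGrp>
    <pos>n</pos>
  </gramGrp>
  <sense n="1">
    <def>facts that disprove something.</def>
  </sense>
  <sense n="2">
    <def>the act of disproving.</def>
  </sense>
</entry>
"""

entry = ET.fromstring(TEI_ENTRY)
lemma = entry.findtext("form/orth")
pos = entry.findtext("gramGrp/pos")
# Map each sense number to its definition text.
definitions = {sense.get("n"): sense.findtext("def") for sense in entry.findall("sense")}
print(lemma, pos, definitions)
```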
Structure Integration in UBY 
(Eckle-Kohler et al. 2012)
Sense Alignments Enable Semantic Interoperability
- Senses linked by SenseAxis class (over 1,000,000 instances)
  - English alignments, e.g. WordNet-Wikipedia
  - German alignments, e.g. GermaNet-Wiktionary
  - Cross-lingual alignments, e.g. WordNet-OmegaWiki DE
Available Alignments 
Wikipedia English—WordNet 83,192 
Wiktionary English—WordNet 138,282 
GermaNet—Wiktionary German 32,850 
FrameNet—Wiktionary English 12,340 
Wiktionary English—OmegaWiki English 34,509 
WordNet—OmegaWiki German 27,529 
Wiktionary German—Wikipedia German 21,872 
Wiktionary English—Wikipedia English 66,050 
WordNet—VerbNet 40,716 
FrameNet—VerbNet 17,529 
Wikipedia English—OmegaWiki English 3,960 
Wikipedia German—OmegaWiki German 1,097 
Wikipedia English—Wikipedia German 463,311 
OmegaWiki English—OmegaWiki German 58,785 
Resource Integration Workflow in UBY
[Figure: human users and machines access each resource through its own API: JWNL, FN API, JWPL, JWKTL]
Step 1. Structure Integration
[Figure: human users and machines access every resource in UBY through the UBY API]
Step 2. Content Integration
[Figure: human users and machines access UBY through a single UBY-API]
UBY Web UI – Textual View
Textual view: lets you list senses across all resources, display sense details, and perform sense comparisons.
UBY Web UI – Visual View
Visual view: lets you explore the sense alignments.
UBY Java API 
The UBY API is open source at Google Code: http://code.google.com/p/uby/ 
Getting Started: 
1. Download a UBY database dump 
2. Import the dump into a MySQL database 
3. Start using the UBY API 
The UBY API is work in progress! 
Many API methods need to be added – consider contributing! 
UBY – Data and Tools
- Web Interface: https://uby.ukp.informatik.tu-darmstadt.de/webui/
- Database Dumps: http://uby.ukp.informatik.tu-darmstadt.de/uby/
- Open Source API (Java): http://code.google.com/p/uby/
Utilizing Linked Lexical Resources
Kostadin Cholakov and Judith Eckle-Kohler and Iryna Gurevych: Automated Verb Sense Labelling Based on Linked Lexical Resources, in: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), pp. 68-77, April 2014
Michael Matuschek and Christian M. Meyer and Iryna Gurevych: Multilingual Knowledge in Aligned Wiktionary and OmegaWiki for Translation Applications, in: Translation: Corpora, Computation, Cognition (TC3), vol. 3, no. 1, p. 87-118, July 2013
Michael Matuschek and Tristan Miller and Iryna Gurevych: A Language-independent Sense Clustering Approach for Enhanced WSD, in: Proceedings of the 12th Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2014), to appear
[Figure: publications utilizing LLRs, color-coded by topic]
Publications: Cholakov et al., EACL 2014; Matuschek et al., KONVENS 2014; Matuschek et al., TC3 2013; Hartmann et al., 2014 (in preparation); Hartmann & Gurevych, ACL 2013
Topics: sense annotation/disambiguation; machine/computer-assisted translation; semantic role labelling; cross-language transfer of lexical-semantic resources
Automatic Verb Sense Labelling of Corpora
Motivation
• Automatically create verb sense-annotated corpora as training data for supervised approaches
Approach
1. Create sense patterns from UBY (combining WordNet, FrameNet, VerbNet, Wiktionary)
2. Compare these to patterns derived from corpus instances
3. Assign word sense in corpus if similarity is above a threshold
4. Use this data to train supervised systems (distant supervision)
Results
• Significant improvement over MFS baseline for verb sense disambiguation on MASC and Senseval-3
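Steps 2 and 3 of the approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the slide does not say how patterns are represented or which similarity measure is used, so the bag-of-tokens patterns, the cosine measure, the `0.5` threshold, and the names `SENSE_PATTERNS` and `label_instance` are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical sense patterns; in the talk these are derived from UBY.
SENSE_PATTERNS = {
    "run#move": ["person", "run", "to", "location"],
    "run#operate": ["person", "run", "organization"],
}


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def label_instance(instance_pattern, sense_patterns, threshold=0.5):
    """Return the best-matching sense id, or None if no sense reaches the threshold."""
    instance = Counter(instance_pattern)
    best_sense, best_score = None, 0.0
    for sense_id, pattern in sense_patterns.items():
        score = cosine(instance, Counter(pattern))
        if score > best_score:
            best_sense, best_score = sense_id, score
    # Assumption: instances below the threshold stay unlabelled.
    return best_sense if best_score >= threshold else None


labelled = label_instance(["person", "run", "organization"], SENSE_PATTERNS)
unlabelled = label_instance(["animal", "sleep"], SENSE_PATTERNS)
```

Instances that receive a label would then serve as training data for a supervised system (step 4); instances below the threshold are simply left out.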
April 28, 2014 | Computer Science Department | UKP Lab Prof. Iryna Gurevych | Dr. Judith Eckle-Kohler 110
Using Alignments for Word Sense Clustering
Motivation
• Cluster fine-grained word senses in expert-built resources to improve WSD performance
Approach
1. Create alignments between resources using Dijkstra-WSA, allowing 1:n alignments
  • Source: GermaNet, WordNet
  • Target: Wiktionary, Wikipedia, OmegaWiki
2. If two or more senses are aligned to the same sense in the other resource, merge them into one coarse sense
3. Rescore state-of-the-art WSD algorithms on clustered sense inventory
Results
• Significant improvement over random clusters of same granularity on WebCAGe (GermaNet) and Senseval-3 (WordNet)
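The merging rule in step 2 can be illustrated with a small union-find sketch. The sense identifiers are made up, and merging transitively (two source senses end up together if they are connected through a chain of shared target senses) is an assumption; the slide only states that senses aligned to the same target sense are merged.

```python
from collections import defaultdict


def cluster_senses(alignments):
    """Group source senses that share an aligned target sense.

    alignments: iterable of (source_sense, target_sense) pairs; 1:n is allowed.
    Returns a sorted list of clusters, each a sorted list of source senses.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    by_target = defaultdict(list)
    for source, target in alignments:
        find(source)  # register every source sense, even unmerged ones
        by_target[target].append(source)

    # All source senses aligned to the same target sense form one coarse sense.
    for sources in by_target.values():
        for other in sources[1:]:
            union(sources[0], other)

    clusters = defaultdict(set)
    for source in parent:
        clusters[find(source)].add(source)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical alignments between WordNet-style and Wiktionary-style sense ids.
toy_alignments = [
    ("wn:bank_1", "wkt:bank_a"),
    ("wn:bank_2", "wkt:bank_a"),
    ("wn:bank_2", "wkt:bank_b"),
    ("wn:bank_3", "wkt:bank_b"),
    ("wn:bank_4", "wkt:bank_c"),
]
coarse_senses = cluster_senses(toy_alignments)
```

In this toy input the first three senses collapse into one coarse sense and the fourth stays on its own; WSD output would then be rescored against the coarse inventory.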
Using Aligned Resources for Computer-aided Translation
Motivation
• SMT systems help, but are not smart enough to replace manual translation
Approach
1. Create sense alignments between multilingual resources
2. Display information from all resources for a particular meaning
Results
• Substantially more available translations and other information types
• Example: “bass” in Wiktionary and OmegaWiki
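A minimal sketch of step 2, pooling translations for one meaning across two aligned resources. The dictionaries below are toy stand-ins, not actual Wiktionary or OmegaWiki content, and the identifiers and the function `merged_translations` are hypothetical.

```python
from collections import defaultdict

# Toy data (illustrative only): sense id -> language code -> translations.
WIKTIONARY = {"wkt:bass#fish": {"de": {"Barsch"}, "fr": {"bar"}}}
OMEGAWIKI = {"ow:bass#fish": {"de": {"Barsch"}, "es": {"lubina"}}}

# Assumed output of a sense alignment step: Wiktionary sense -> OmegaWiki sense.
ALIGNMENT = {"wkt:bass#fish": "ow:bass#fish"}


def merged_translations(wkt_sense):
    """Union the translations of a Wiktionary sense and its aligned OmegaWiki sense."""
    merged = defaultdict(set)
    for lang, words in WIKTIONARY.get(wkt_sense, {}).items():
        merged[lang] |= words
    ow_sense = ALIGNMENT.get(wkt_sense)  # None if the sense is not aligned
    for lang, words in OMEGAWIKI.get(ow_sense, {}).items():
        merged[lang] |= words
    return {lang: sorted(words) for lang, words in sorted(merged.items())}


bass_view = merged_translations("wkt:bass#fish")
```

The merged view covers more languages than either toy resource alone, which is the effect the Results bullets describe.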
Programming language is not an island!
• Word Sense Alignment is vital for increasing coverage and richness of sense representations
But: It is a hard problem!
• Various approaches
  • Similarity-based, graph-based, combined
• Performance depends on resources
  • Sparsity, availability of glosses, …
• Machine learning shows the most robust results
• Aligned resources help improve performance for various applications
  • VSD, coarse-grained WSD, computer-aided translation
Future Work
1. Linked lexical resources (LLRs)
  • Integrating and aligning further resources in UBY
  • Special focus: cross-lingual alignment
2. Construction of aligned lexical resources
  • Investigating more elaborate similarity measures for glosses
  • Using different graph algorithms to better express similarity
  • Aligning several resources at once (n-way alignment)
3. Utilizing LLRs for language processing
  • Unified deep learning framework utilizing linked resources
  • Distant supervision applied to semantic role labeling
  • Word sense disambiguation and lexical substitution for German
Thank you. Questions? 
Sense Alignment of Lexical Resources
(References)
• Elisabeth Niemann and Iryna Gurevych. The People’s Web Meets Linguistic Knowledge: Automatic Sense Alignment of Wikipedia and WordNet. In: Proceedings of the 9th International Conference on Computational Semantics (IWCS), p. 205-214, January 2011.
• Christian M. Meyer and Iryna Gurevych. What Psycholinguists Know About Chemistry: Aligning Wiktionary and WordNet for Increased Domain Coverage. In: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), p. 883-892, November 2011.
• Michael Matuschek and Iryna Gurevych. Dijkstra-WSA: A Graph-Based Approach to Word Sense Alignment. Transactions of the Association for Computational Linguistics (TACL), vol. 1, p. 151-164, May 2013.
• Silvana Hartmann and Iryna Gurevych. FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), vol. 1, p. 1363-1373, August 2013.
• Tristan Miller and Iryna Gurevych. WordNet-Wikipedia-Wiktionary: Construction of a Three-way Alignment. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), May 2014. (to appear)
Linked Lexical Resources @ UKP
(References)
• Judith Eckle-Kohler and Iryna Gurevych. Subcat-LMF – Fleshing out a Standardized Format for Subcategorization Frame Interoperability. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), p. 550-560, April 2012.
• Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek and Christian M. Meyer. UBY-LMF - A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), p. 275-282, May 2012.
• Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek and Christian M. Meyer. UBY-LMF - Exploring the Boundaries of Language-Independent Lexicon Models. In: LMF Lexical Markup Framework, chap. 10, p. 145-156, ISTE - HERMES - Wiley, 2013. ISBN 978 184 821 4309.
• Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer and Christian Wirth. UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), p. 580-590, April 2012.
• Judith Eckle-Kohler, John Philip McCrae, and Christian Chiarcos. lemonUby - A Large, Interlinked, Syntactically-rich Lexical Resource for Ontologies. Semantic Web Journal, March 2014.
Utilizing Linked Lexical Resources
(References)
• Kostadin Cholakov, Judith Eckle-Kohler, and Iryna Gurevych. Automated Verb Sense Labelling Based on Linked Lexical Resources. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), p. 68-77, April 2014.
• Silvana Hartmann and Iryna Gurevych. FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), vol. 1, p. 1363-1373, August 2013.
• Michael Matuschek, Tristan Miller, and Iryna Gurevych. A Language-independent Sense Clustering Approach for Enhanced WSD. In: Proceedings of the 19th Conference on Empirical Methods in Natural Language Processing, October 2014. (in submission)
• Michael Matuschek, Christian M. Meyer, and Iryna Gurevych. Multilingual Knowledge in Aligned Wiktionary and OmegaWiki for Translation Applications. Translation: Corpora, Computation, Cognition (TC3), vol. 3, no. 1, p. 87-118, July 2013.

Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 

Iryna Gurevych, "Programming language is not an island: Word Sense Alignment of Lexical-Semantic Resources"

  • 1. Programming language is not an island: Word Sense Alignment of Lexical-Semantic Resources
      Iryna Gurevych
      Joint work with: Judith Eckle-Kohler, Kostadin Cholakov, Silvana Hartmann, Michael Matuschek, Christian M. Meyer
      http://www.ukp.tu-darmstadt.de/data/uby
      12.09.2014 | Technische Universität Darmstadt | Iryna Gurevych
  • 2. Outline
      - Motivation
      - Similarity-based Word Sense Alignment
      - Graph-based Word Sense Alignment
      - Joint Modeling of Features
      - Putting the Pieces Together: UBY
      - Applications of Linked Lexical Resources
  • 3. Text Analysis Needs Lexical-Semantic Knowledge
      [Diagram: NLP application, lexical resource]
      Which lexical resource to choose?
  • 4. Resources are Largely Different
      - Different coverage of words/word senses
      - Different types of information
          - Encyclopedic vs. linguistic knowledge
          - Syntactic vs. semantic knowledge
      - …
      Resource integration can significantly influence the performance of your system! Instead of choosing only one (best performing): why not combine multiple resources and benefit from all their knowledge?
  • 5. Overlap of Lexical Entries
      [Venn diagram: Roget's Thesaurus (62,797), Wiktionary (364,663), WordNet (149,502); region counts as extracted: 25,541; 28,650; 163,027; 67,868; 56,240]
      Common vocabulary is rather small (28,650). Each resource contains a lot of "unique" words.
  • 6. Overlap of Lexical Entries
      [Diagram labels: slang, dialect, neologisms, named entities, natural sciences, computer science, math, social sciences, humanities, biological taxonomy; caption: surprisingly small overlap]
  • 7. Word Sense Alignment
      [Figure: sense lists for "to sing" from several resources, linked by alignments]
      1. To sing: To produce musical or harmonious sounds with one's voice.
      2. To sing: To express audibly by means of a harmonious vocalization.
      3. To sing: To confess under interrogation.
      1. singen: Mit der Stimme harmonische Töne erzeugen.
      1. To sing: Produce tones with the voice
      2. To sing: divulge confidential information or secrets
      1. To sing: To produce harmonious sounds with one's voice.
  • 8. Prior Work on Linked Lexical Resources (LLR)
      - Meaning Multilingual Central Repository, Atserias et al. (2004)
      - Yago, Suchanek et al. (2007)
      - SemLink (Palmer, 2009)
      - Universal Wordnet (UWN), Gerard de Melo and Gerhard Weikum (2009)
      - eXtended WordFrameNet, Laparra and Rigau (2010)
      - BabelNet, Navigli and Ponzetto (2010)
      - NULEX, McFate and Forbus (2011)
      - UBY, Gurevych et al. (2012)
      - … many more, e.g. on the Semantic Web
  • 9. Potential of Linked Lexical Resources
      Increased coverage and enriched sense representation:
      - Linking FrameNet, VerbNet, and WordNet for semantic parsing (Shi and Mihalcea, 2005)
      - Linking VerbNet, FrameNet and PropBank for semantic role labeling (Palmer, 2009)
      - Linking WordNet and Wikipedia for word sense disambiguation (Navigli and Ponzetto, 2010)
      - Linking WordNet and Wiktionary for measuring verb similarity (Meyer and Gurevych, 2012)
      - Linking OmegaWiki and Wiktionary for mining translations (McCrae and Cimiano, 2013)
  • 10. The Challenge: Heterogeneity of Resources
      - Different coverage: missing entities in one of the resources
      - Different granularity: entities are defined at different levels
      - Different perspectives: entities are defined for a different purpose
      (Euzenat/Shvaiko, 2007)
  • 11. Lemma Alignment
      [Figure: Wiktionary, WordNet]
      Content integration at the lemma level is easy, but…
  • 12. Word Sense Alignment
      [Figure: Wiktionary, WordNet]
      Content integration at the lemma level is easy, but… integration at the sense level is hard!
  • 13. Word Sense Alignment
      plant in Wiktionary:
      - (botany) An organism of the kingdom Plantae […]
      - (proscribed as biologically inaccurate) Any creature that grows on soil or similar surfaces, including plants and fungi.
      - A factory or other industrial or institutional building or facility.
      - (snooker) A play in which the cue ball knocks one (usually red) ball onto another […]
      plant in WordNet:
      - buildings for carrying on industrial labor
      - (botany) a living organism lacking the power of locomotion
      - an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience
      [Figure: candidate links between the two sense lists, marked "?"]
  • 14. The Alignment Process
      [Diagram: the matching step takes resource 1 (r), resource 2 (r'), an initial alignment A (possibly empty), parameters p and knowledge k, and produces the output alignment A']
      A' = f(r, r', A, p, k)
      Can be generalized for multiple resources ("multi-alignment"):
      A' = f(r1,…,rn, A, p, k)
      (Euzenat/Shvaiko, 2007)
  • 15. Outline (repeated): Motivation; Similarity-based Word Sense Alignment; Graph-based Word Sense Alignment; Joint Modeling of Features; Putting the Pieces Together: UBY; Applications of Linked Lexical Resources
  • 16. Construction of aligned lexical resources
      What Psycholinguists Know About Chemistry: Aligning Wiktionary and WordNet for Increased Domain Coverage. Christian M. Meyer and Iryna Gurevych. In: Proceedings of IJCNLP, pp. 883-892, November 2011.
      Sense alignment publications:
      - Niemann & Gurevych, IWCS 2011
      - Meyer & Gurevych, IJCNLP 2011
      - Matuschek & Gurevych, TACL, 2013
      - Matuschek & Gurevych, COLING, 2014
      - Miller & Gurevych, LREC 2014
      - Hartmann & Gurevych, ACL 2013
      Categories in the legend:
      - Graph-based alignment
      - Resource-independent alignment
      - Text similarity-based alignment
      - Exploitation of existing LR alignments to produce new ones
      14.05.2014 | Technische Universität Darmstadt | Iryna Gurevych
  • 17. Similarity-based Word Sense Alignment
      - Increased coverage
      - Enriched sense representations
  • 18. Aligning Wiktionary and WordNet
      A two-step approach:
      1. Candidate extraction
      2. Candidate disambiguation
      [Figure: WordNet synsets ({plant, works, industrial plant}, bird (animal), to fly (move), reddish (color), …) and Wiktionary senses (plant (factory), plant (organism), plant (person), works (factory), works (machine), …); Wikipedia article …]
  • 19. Aligning Wiktionary and WordNet
      [Figure repeated from slide 18]
  • 20. Aligning Wiktionary and WordNet
      [Figure repeated from slide 18, with three candidate links marked X]
  • 21. Bag of Words Representation
      [Diagram: synset, hypernyms, hyponyms, hyper- & hyponyms → bag-of-words; lemma, sense definition, usage examples, synonyms → bag-of-words]
      Synsets are represented by synonyms, gloss, examples
  • 22. Candidate Disambiguation
      [Diagram: two bag-of-words representations are fed into a semantic relatedness measure that yields a score s]
      - COS: Cosine similarity
      - PPR: Personalized PageRank
      - s < threshold: No alignment!
      - s ≥ threshold: Align this pair of WordNet synset and Wiktionary sense!
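The thresholding step on slide 22 can be sketched for the COS measure as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the function names, and the threshold value 0.3 are assumptions made for the example, and PPR is not covered.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase whitespace tokenization into a term-frequency bag (simplified)."""
    return Counter(text.lower().split())


def cosine(bag_a, bag_b):
    """Cosine similarity between two term-frequency bags."""
    dot = sum(bag_a[t] * bag_b[t] for t in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_align(synset_text, sense_text, threshold=0.3):
    """Align the pair if and only if s >= threshold (0.3 is a hypothetical value)."""
    score = cosine(bag_of_words(synset_text), bag_of_words(sense_text))
    return score >= threshold


# Glosses taken from the "to sing" example earlier in the deck.
print(should_align("to produce harmonious sounds with one's voice",
                   "produce tones with the voice"))
print(should_align("to produce harmonious sounds with one's voice",
                   "divulge confidential information or secrets"))
```

In practice the bags would also contain synonyms and usage examples, as slide 21 describes, and the threshold would be tuned on annotated data.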
  • 23. Evaluation Dataset
      Dataset creation:
      - No previous alignments = no other evaluation datasets
      - We created a new dataset with 2,423 sense pairs
      - 10 human raters (students/researchers from CS, math, linguistics)
      - Annotate each pair as "same meaning" or "different meaning"
      Dataset reliability:
      - Inter-rater agreement: AO = .93, κ = .70
      - Removing two biased raters: AO = .94, κ = .74
      Gold standard:
      - Majority vote of the 8 raters, additional tie breaker
  • 24. Evaluation Results
      - RAND: Random baseline
      - MFS: Baseline aligning always the first sense (≈ most frequent sense)
      Method      A      P      R      F1
      RAND       .662   .212   .594   .313
      MFS        .802   .329   .508   .399
      COS only   .901   .598   .703   .646
      PPR only   .915   .684   .636   .659
      COS&PPR    .914   .674   .649   .661
      - Our approach significantly outperforms the baseline (at 1% level)
      - COS highest recall; PPR highest precision; COS&PPR highest F1
      - Significant difference of PPR, COS&PPR over COS (at 1% level)
      - No significant difference between PPR and COS&PPR
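As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded P and R values on the slide; the function name is chosen for illustration, and because the inputs are already rounded, a recomputed value can differ from a reported one in the last digit.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) as given on slide 24.
reported = {
    "COS only": (0.598, 0.703, 0.646),
    "PPR only": (0.684, 0.636, 0.659),
    "COS&PPR": (0.674, 0.649, 0.661),
}

for method, (p, r, f1_reported) in reported.items():
    print(method, round(f1(p, r), 3), f1_reported)
```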
  • 25. Error Analysis
      110 false negatives: "same meaning, but was not aligned"
      - Very different wording
          - "good discernment" vs. "ability to notice what others might miss"
      - Similar senses but slightly below threshold
          - "plants of the genus Centaurea" vs. "common weeds of the genus Centaurea"
      - Pointing to another entry rather than a content-based gloss
          - pacification: "the process of pacifying"
  • 26. Error Analysis
      98 false positives: "different meaning, but have been aligned"
      - Similar wording, but refer to different concepts
          - "a computer that provides client stations with access to files and printers as shared resources to a computer network" vs. "any computer attached to a network"
      - High relatedness, but generic versus domain-specific vocabulary
          - "any computer attached to a network" vs. "any organization that provides resources and facilities for a function or event"
  • 27. Increased Coverage: Parts of Speech
      - Our alignment: 56,970 sense pairs
      - Final resource contains 488,988 word senses
      - Substantial increase in the coverage of senses
      - Wiktionary is not restricted to nouns/verbs/adjectives: proverbs, idioms, collocations, particles, determiners, inflected forms, etc.
                        Wiktionary AND WordNet   Additionally in Wiktionary   Additionally in WordNet
      Nouns                    34,464                    158,085                     47,651
      Verbs                     8,252                     29,119                      5,515
      Adj./Adv.                14,236                     60,977                      7,541
      Other POS                     0                     16,778                          0
      Inflected Forms               0                    106,328                          0
  • 28. Increased Coverage: Domains
                        Wiktionary AND WordNet   Additionally in Wiktionary   Additionally in WordNet
      Biology                   4,465                      4,067                     12,869
      Chemistry                 2,561                      8,260                      2,268
      Engineering               1,108                        940                      1,080
      Geology                   2,287                      2,898                      2,479
      Humanities                4,949                      2,700                      5,060
      IT                          439                      3,032                        557
      Linguistics               1,249                      1,011                      1,576
      Math                        615                      2,747                        483
      Medicine                  3,613                      3,728                      3,058
      Military                    574                        426                        585
      Physics                   1,246                      2,835                      1,252
      Religion                    733                      1,154                        781
      Social Sciences           3,745                      2,907                      4,458
      Sport                       905                      2,821                        807
  • 29. Enriched Sense Representation
      - Synonyms, Gloss, Example sentence, Subsumption hierarchy, Synset organization, …
      - Pronunciation, Etymology, Syntactic knowledge, Quotations, Related terms, Translations, …
  • 30. Selected Conclusions
      - Aligned Wiktionary – WordNet is characterized by:
          (1) Increased coverage
              - Different parts of speech, not only nouns
              - e.g. humanities and social sciences from WordNet
              - e.g. technical domains and leisure from Wiktionary
          (2) Enriched sense representation
              - Pronunciation, etymology, related terms, translations, etc.
      - Novel evaluation dataset annotated by 10 human raters
      - Better results based on the resource-structure based and hybrid techniques in later work (Matuschek & Gurevych, TACL '13)
  • 31. Outline (repeated): Motivation; Similarity-based Word Sense Alignment; Graph-based Word Sense Alignment; Joint Modeling of Features; Putting the Pieces Together: UBY; Applications of Linked Lexical Resources
  • 32. Construction of aligned lexical resources
      Michael Matuschek and Iryna Gurevych: Dijkstra-WSA: A Graph-Based Approach to Word Sense Alignment, in: Transactions of the Association for Computational Linguistics (TACL), vol. 1, p. 151-164, May 2013
      Sense alignment publications:
      - Niemann & Gurevych, IWCS 2011
      - Meyer & Gurevych, IJCNLP 2011
      - Matuschek & Gurevych, TACL, 2013
      - Matuschek & Gurevych, COLING, 2014
      - Miller & Gurevych, LREC 2014
      - Hartmann & Gurevych, ACL 2013
      Categories in the legend:
      - Graph-based alignment
      - Resource-independent alignment
      - Text similarity-based alignment
      - Exploitation of existing LR alignments to produce new ones
  • 33. Similarity-Based Approaches Suffer From…
      - Different vocabulary employed by definitions
      - Example: English noun eye/discernment, e.g., "she has an eye for fresh talent", "he has an artist's eye"
          - Definition 1: good discernment (either visually or as if visually)
          - Definition 2: ability to notice what others might miss
          - Result: low semantic relatedness score…
  • 34. Solution: Use the Graph Topology
      [Figure: word senses of Java: Java1, Java2, Java3]
  • 35. Intuition of Graph Topology
      [Figure: word senses of Java (Java1, Java2, Java3); monosemous lexeme "programming language" (programming language1)]
  • 36. Intuition of Graph Topology
      [Figure: word senses of Java (Java1, Java2, Java3); monosemous lexeme "programming language" (programming language1); word senses of Ruby (Ruby1)]
  • 37. Intuition of Graph Topology
      [Figure: word senses of Java (Java1, Java2, Java3); monosemous lexeme "programming language" (programming language1); word senses of Ruby (Ruby1)]
      Related senses are in the same region of the graph
  • 38. Dijkstra-WSA
      Graph-based word sense alignment approach
      Key ideas:
      - Represent lexical resources as graphs
      - Rely on trivial alignments as "reference nodes" and "bridges"
      - Use Dijkstra's shortest path algorithm to find alignments
      Steps:
      1. Graph construction
      2. Computing sense alignments
      (Matuschek/Gurevych, 2013)
  • 39. Step 1: Graph Construction
      Represent each lexical resource as an undirected graph L = (V, E) with
      - the set of nodes V representing senses or synsets
      - the set of edges E ⊆ V × V representing some kind of (semantic) similarity between a pair of nodes
      An edge connects sense S1 and sense S2 if, for example…
      - There exists a semantic relation between S1 and S2
      - A lexeme W2 occurs in the sense definition of S1, and W2 is monosemous
      - S1 and S2 share the same syntactic behavior
      - …
      (Matuschek/Gurevych, 2013)
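The first two edge criteria on slide 39 can be sketched as below. This is an illustrative reading, not the published implementation: the data layout (a dict of sense definitions, a list of relation pairs, a dict mapping single-token monosemous lexemes to their only sense), the function name, and the toy glosses are all assumptions made for the example.

```python
def build_graph(senses, relations, monosemous):
    """Build an undirected sense graph as an adjacency dict.

    senses:      {sense_id: definition text}
    relations:   iterable of (sense_id, sense_id) semantic relations
    monosemous:  {lexeme: sense_id} for lexemes that have exactly one sense
    """
    graph = {sense_id: set() for sense_id in senses}

    def add_edge(s1, s2):
        graph.setdefault(s1, set()).add(s2)
        graph.setdefault(s2, set()).add(s1)

    # Criterion 1: a semantic relation exists between S1 and S2.
    for s1, s2 in relations:
        add_edge(s1, s2)

    # Criterion 2: a monosemous lexeme W2 occurs in the definition of S1.
    for s1, definition in senses.items():
        for token in definition.lower().split():
            s2 = monosemous.get(token)
            if s2 is not None and s2 != s1:
                add_edge(s1, s2)
    return graph


# Toy data with made-up glosses, for illustration only.
senses = {
    "Java#1": "an island of Indonesia",
    "Java#2": "coffee similar to espresso",
    "espresso#1": "strong coffee brewed under pressure",
    "island#1": "land surrounded by water",
}
graph = build_graph(
    senses,
    relations=[("Java#1", "island#1")],
    monosemous={"espresso": "espresso#1"},
)
print(graph["Java#2"])
```

Multiword lexemes such as "programming language" would need phrase matching rather than the token lookup used here.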
  • 40. Step 1: Graph Construction
      [Figure: graph of resource 1 and graph of resource 2; node labels as extracted: Java1, Java2, Java3; Java1; programming language1 (in both graphs); espresso1 (in both graphs)]
      Edges represent some kind of (semantic) similarity between nodes
  • 41. Step 2: Computing Sense Alignments
      a) Create trivial alignments between the resources:
          - Trivial = lexeme is unique/monosemous in both resources
          - Example: programming language
          - Precision: >0.95
      b) Identify alignment candidates
          - For example: nodes representing the same lemma
      c) For all nodes still unaligned, find shortest paths to the candidate nodes in the other graph
          - Trivial alignments serve as "bridges" between the graphs
          - Align the node pair with the shortest path
      (Matuschek/Gurevych, 2013)
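Steps a) to c) can be sketched on a toy example as follows. This is a simplified illustration under stated assumptions, not the Dijkstra-WSA code from the paper: edges have unit weight, a bridge between trivially aligned nodes counts as one step, only the 1:1 variant is shown, and the function names, the "A"/"B" tags, and the toy graphs are invented for the example.

```python
import heapq


def shortest_distances(graph, source, max_length):
    """Dijkstra with unit edge weights; nodes at max_length are not expanded."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node] or d == max_length:
            continue
        for neighbour in graph.get(node, ()):
            nd = d + 1
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return dist


def align(graph_a, graph_b, trivial, candidates, max_length=6):
    """Align each unaligned node of graph A to its nearest candidate in graph B."""
    # Merge both graphs, tagging nodes by resource so identical ids stay distinct.
    merged = {}
    for tag, graph in (("A", graph_a), ("B", graph_b)):
        for node, neighbours in graph.items():
            merged.setdefault((tag, node), set()).update((tag, n) for n in neighbours)
    # Step a: trivial alignments act as bridges between the two graphs.
    for a, b in trivial:
        merged.setdefault(("A", a), set()).add(("B", b))
        merged.setdefault(("B", b), set()).add(("A", a))
    # Steps b and c: choose the candidate with the shortest path, if any is reachable.
    alignments = {}
    for a, targets in candidates.items():
        dist = shortest_distances(merged, ("A", a), max_length)
        reachable = [(dist[("B", t)], t) for t in targets if ("B", t) in dist]
        if reachable:
            alignments[a] = min(reachable)[1]
    return alignments


graph_a = {"Java#1": {"programming language#1"},
           "programming language#1": {"Java#1"}}
graph_b = {"Java#1": {"island#1"}, "island#1": {"Java#1"},
           "Java#2": {"programming language#1"},
           "programming language#1": {"Java#2"},
           "Java#3": {"espresso#1"}, "espresso#1": {"Java#3"}}
trivial = [("programming language#1", "programming language#1")]
candidates = {"Java#1": ["Java#1", "Java#2", "Java#3"]}

print(align(graph_a, graph_b, trivial, candidates))
```

Here the programming-language sense in resource A reaches "Java#2" in resource B through the bridge, while the other two candidates are unreachable and therefore have infinite distance.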
  • 42. Step 2: Computing Sense Alignments
      [Figure: graph of resource 1 and graph of resource 2; node labels as extracted: Java1, Java2, Java3; Java1; programming language1 (in both graphs); espresso1 (in both graphs)]
  • 43. Step 2a: Create Trivial Alignments
      [Figure: same two graphs as on slide 42]
  • 44. Step 2b: Identify Alignment Candidates
      [Figure: same two graphs as on slide 42, with three candidate links marked "?"]
  • 45. Step 2c: Shortest Paths to the Candidates
      [Figure: same two graphs as on slide 42, with path lengths 3, 5 and ∞ to the three candidates]
  • 46. Step 2c: Align the Nodes
      [Figure: same two graphs as on slide 42, with the selected alignment marked "!"]
  • 47. Parameter Choices
      Restricting the number of alignments
      - Stop when the first candidate is found (1:1 alignment)
      - Keep going and align everything you can reach (1:n alignment)
      - Possibly with a restricted search depth
      Graph construction
      - Use semantic relations, monosemous linking, or both
      - Get rid of relations to highly frequent monosemous lexemes (e.g., there is)
      - Limiting to rare lexemes avoids "explosion" of edges
      - Rare = only appearing in 1 / N of the definitions (e.g., N = 200)
      Computing sense alignments
      - Path length L: unbounded L yields unmanageable runtime!
      - Best F1 score between 5 and 8, depending on the resource pair
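The rarity filter for monosemous linking might look like the sketch below. It reads "only appearing in 1 / N of the definitions" as "appearing in at most a fraction 1/N of all definitions", which is an interpretation rather than something the slide spells out; the function name and toy data are likewise hypothetical.

```python
def rare_monosemous_lexemes(definitions, monosemous, n=200):
    """Return the monosemous lexemes that occur in at most 1/n of the definitions."""
    limit = len(definitions) / n
    counts = {lexeme: 0 for lexeme in monosemous}
    for definition in definitions:
        tokens = set(definition.lower().split())
        for lexeme in monosemous:
            if lexeme in tokens:
                counts[lexeme] += 1
    # Lexemes that never occur cannot create edges, so they are dropped as well.
    return {lexeme for lexeme, count in counts.items() if 0 < count <= limit}


# Toy data; n=2 is used only because the sample is tiny.
definitions = [
    "there is a plant",
    "there is a factory",
    "there is coffee similar to espresso",
    "an island of Indonesia",
]
print(rare_monosemous_lexemes(definitions, {"espresso", "there"}, n=2))
```

With four definitions and n=2 the limit is two definitions, so the frequent lexeme "there" (three occurrences) is filtered out and only "espresso" is kept for linking.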
  • 48. Hybrid Approach
      Main issue of Dijkstra-WSA
      - Low recall due to missing edges / sparse graph
      Hybrid approach
      - Try to align using the graph first
          - Parameterized for high precision
      - Align those with no match using a similarity-based approach
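The two-stage backoff can be sketched as follows, assuming the graph-based step has already produced a dictionary of alignments and the similarity-based step is available as a callable that returns a target sense or None. Both interfaces and all names are assumptions made for illustration.

```python
def hybrid_align(senses, graph_alignments, similarity_backoff):
    """Use the graph-based alignment where available, otherwise fall back to similarity.

    senses:              iterable of source sense ids
    graph_alignments:    {source_sense: target_sense} from the graph-based step
    similarity_backoff:  callable(source_sense) -> target_sense or None
    """
    result = {}
    for sense in senses:
        if sense in graph_alignments:
            result[sense] = (graph_alignments[sense], "graph")
        else:
            target = similarity_backoff(sense)
            if target is not None:
                result[sense] = (target, "similarity")
    return result


# Toy inputs: one sense aligned via the graph, one via backoff, one left unaligned.
graph_alignments = {"Java#1": "Java#2"}
backoff_table = {"eye#3": "eye#5"}
aligned = hybrid_align(
    ["Java#1", "eye#3", "plant#4"],
    graph_alignments,
    backoff_table.get,
)
print(aligned)
```

Recording which stage produced each alignment makes it easy to inspect how much of the recall comes from the backoff.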
  • 49. Evaluation Datasets
      Sampled datasets:
      - WordNet – Wikipedia (1,815 sense pairs)
      - WordNet – Wiktionary (2,423 sense pairs)
      - FrameNet – Wiktionary (2,789 sense pairs)
      - WordNet – OmegaWiki (683 sense pairs)
      - Wiktionary – OmegaWiki (586 sense pairs)
      - Wiktionary – Wikipedia English (367 sense pairs)
      Full datasets:
      - GermaNet – Wiktionary (45,636 sense pairs)
      - Wiktionary – Wikipedia German (31,808 sense pairs)
  • 50. Datasets Display Different Properties
      - WordNet, OmegaWiki, Wikipedia: sense definitions and semantic relations
      - Wiktionary: no disambiguated semantic relations => sparse graphs
      - GermaNet: very few sense definitions
  • 51. Evaluation
      [Chart; series labels: Random baseline, 1:1 1st, Similarity-based (SB), Semantic Relations (SR), Linking Monosemes (LM), SR + LM, SR + SB, LM + SB, SR + LM + SB, Hybrid, Human performance]
      (Matuschek/Gurevych, 2013)
  • 52. Evaluation
      [Chart repeated from slide 51]
      Significant improvement in recall…
      (Matuschek/Gurevych, 2013)
  • 53. Evaluation
      [Chart repeated from slide 51]
      … and F-measure…
      (Matuschek/Gurevych, 2013)
  • 54. Evaluation
      … also on all other datasets!
  • 55. Selected Conclusions
      - Dijkstra-WSA ≥ gloss similarity for densely linked LSRs
          - Generic alignment approach is valid
          - But: low recall for sparse LSRs (English Wiktionary, OmegaWiki)
      - Dijkstra-WSA + similarity-based backoff outperforms previous work on all datasets
          - The two notions of similarity are complementary
          - Could they be combined in a smarter way?
  • 56. Outline (repeated): Motivation; Similarity-based Word Sense Alignment; Graph-based Word Sense Alignment; Joint Modeling of Features; Putting the Pieces Together: UBY; Applications of Linked Lexical Resources
  • 57. Construction of Aligned Lexical Resources
      Michael Matuschek and Iryna Gurevych: High Performance Word Sense Alignment by Joint Modeling of Sense Distance and Gloss Similarity, in: Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014). Dublin, Ireland.
      Sense alignment publications:
      - Niemann & Gurevych, IWCS 2011
      - Meyer & Gurevych, IJCNLP 2011
      - Matuschek & Gurevych, TACL, 2013
      - Matuschek & Gurevych, COLING, 2014
      - Miller & Gurevych, LREC 2014
      - Hartmann & Gurevych, ACL 2013
      Categories in the legend:
      - Graph-based alignment
      - Resource-independent alignment
      - Text similarity-based alignment
      - Exploitation of existing LR alignments to produce new ones
  • 58. Joint Usage of Features
      - Similarity- and graph-based approaches both have weaknesses
          - Different formulation of glosses
          - Sparse / disconnected graphs
      - Two-step hybrid approach already helped improve recall
          - But: No real combination of both notions
      - Idea: Combine them using Machine Learning
          - Exploit the complementary strengths more effectively
  • 59. Setup - Features
      Features:
      - Gloss similarity (COS, PPR)
      - Dijkstra-WSA distances
          - Infinite distance if no target can be found
      Other possible features:
      - Part of speech, sense index, translation overlap, example sentence patterns
      - No significant improvement by using them!
          - Glosses and structure are sufficient
  • 60. Setup - Classifiers
      Classifiers used:
      - Naive Bayes
      - Bayesian Networks
      - Perceptrons
      - Support Vector Machines (SVMs)
      - Decision Trees
      Evaluation using 10-fold cross validation
      - Same datasets as before
  • 61. Evaluation
      [Chart; series labels: Random, 1:1 1st, SB, DWSA, Hybrid, SVM, Naive Bayes, Bayesian Network, Perceptron, Decision Tree, Human performance]
  • 62. Evaluation
      [Chart repeated from slide 61]
      General improvement in precision…
  • 63. Evaluation
      [Chart repeated from slide 61]
      …but in F-measure only for some of the datasets!
  • 64. Selected Conclusions
      - Better overall results on 4 out of 8 datasets
          - Machine learning helps most for sparse and incomplete LSRs like OmegaWiki and Wiktionary
          - For "complete" LSRs like WordNet, we cannot gain much
      - Better precision on 7 out of 8
      - Most robust: Bayesian Networks
          - Complex classifiers (e.g. SVMs) challenged by skewed values
      - Main source of improvements:
          - Better classification of "borderline" examples
          - High gloss similarity & distance or vice versa
  • 65. Borderline Example
      Genome:
      1. "The non-redundant genetic information stored in DNA sequences that defines an individual organism"
      2. "In the context of a genetic algorithm, the information that defines an individual entity"
      - Very similar description
      - But: Far apart in the graph => No alignment!
  • 66. Joint Modeling of Features Applications of Linked Lexical Resources 90 Motivation Similarity-based Word Sense Alignment Graph-based Word Sense Alignment Outline Putting the Pieces Together: UBY 12.09.2014 | Technische Universität Darmstadt | Iryna Gurevych
  • 67. Linked Lexical Resources Gurevych et al., EACL 2012 █ █ LLRs Eckle-Kohler et al., LREC 2012 █ █ Eckle-Kohler & Gurevych, EACL 2012 Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer and Christian Wirth: UBY - A Large-Scale Unified Lexical-Semantic █ Resource Based on LMF, in: Proceedings of the 13th Conference of the European chapter of the Association for Computational Linguistics (EACL), April 2012. Eckle-Kohler et al., SWJ, 2014 █ Eckle-Kohler et al., LMF, 2013 █ █ █ █ Large-scale unified LR based on LMF █ Standardizing heterogeneous LRs █ Standardized format for subcat frames █ Language independence of lexicon models 12.0.2014 | Technische Universität Darmstadt | Iryna Gurevych 91
  • 68. UBY: Linking Lexical Resource Two main charUaBYcteristics: - Word Sense Alignments - Standardized Representation 12.09.2014 | Technische Universität Darmstadt | Iryna Gurevych 92 Web 2.0 IMSLex-Subcat
  • 69. Heterogeneity of Lexical Resources Complementary information types Different terminology Incompatible Data formats 12.09.2014 | Technische Universität Darmstadt | Iryna Gurevych 93
  • 70. Unified Lexical Resource UBY Unified lexicon model Preserves variety of lexical information 12.09.2014 | Technische Universität Darmstadt | Iryna Gurevych 94 Extensible
  • 71. Structure Integration Standardized representation frameworks  Lexical Markup Framework (LMF) http://www.lexicalmarkupframework.org  Text Encoding Initiative (TEI) http://www.tei-c.org <entry> <form> <orth>disproof</orth> <pron>dIs"pru:f</pron> </form> <gramGrp> <pos>n</pos> </gramGrp> <sense n="1"> <def>facts that disprove something.</def> </sense> <sense n="2"> <def>the act of disproving.</def> [..] 12.09.2014 | Technische Universität Darmstadt | Iryna Gurevych 95
  • 72. Structure Integration in UBY 12.09.2014 | Technische Universität Darmstadt | Iryna Gurevych 96 (Eckle-Kohler et al. 2012)
  • 73. Sense Alignments Enable Semantic Interoperability. Senses linked by SenseAxis class (over 1,000,000 instances); English alignments, e.g. WordNet-Wikipedia; German alignments, e.g. GermaNet-Wiktionary; cross-lingual alignments, e.g. WordNet-OmegaWiki DE. Example sense lists from different resources: [1. To sing: To produce musical or harmonious sounds with one’s voice. 2. To sing: To express audibly by means of a harmonious vocalization. 3. To sing: To confess under interrogation.] [1. singen: Mit der Stimme harmonische Töne erzeugen. (German: to produce harmonious tones with the voice)] [1. To sing: Produce tones with the voice. 2. To sing: Divulge confidential information or secrets.] [1. To sing: To produce harmonious sounds with one's voice.]
  • 74. Available Alignments (resource pair: number of alignments). Wikipedia English–WordNet: 83,192; Wiktionary English–WordNet: 138,282; GermaNet–Wiktionary German: 32,850; FrameNet–Wiktionary English: 12,340; Wiktionary English–OmegaWiki English: 34,509; WordNet–OmegaWiki German: 27,529; Wiktionary German–Wikipedia German: 21,872; Wiktionary English–Wikipedia English: 66,050; WordNet–VerbNet: 40,716; FrameNet–VerbNet: 17,529; Wikipedia English–OmegaWiki English: 3,960; Wikipedia German–OmegaWiki German: 1,097; Wikipedia English–Wikipedia German: 463,311; OmegaWiki English–OmegaWiki German: 58,785
  • 75. Resource Integration Workflow in UBY (diagram labels: JWNL, FN API, JWPL, JWKTL; human users; machines)
  • 76. Step 1: Structure Integration (diagram labels: UBY API, four instances; UBY; human users; machines)
  • 77. Step 2: Content Integration (diagram labels: UBY-API; UBY; human users; machines)
  • 78. UBY Web UI – Textual View: lists senses across all resources, displays sense details, and supports sense comparisons.
  • 79. UBY Web UI – Visual View: lets users explore the sense alignments.
  • 80. UBY Java API. The UBY API is open source at Google Code: http://code.google.com/p/uby/ Getting started: (1) download a UBY database dump; (2) import the dump into a MySQL database; (3) start using the UBY API. The UBY API is a work in progress! Many API methods still need to be added; consider contributing!
  • 81. UBY – Data and Tools. Web interface: https://uby.ukp.informatik.tu-darmstadt.de/webui/; database dumps: http://uby.ukp.informatik.tu-darmstadt.de/uby/; open-source API (Java): http://code.google.com/p/uby/
  • 82. Outline: Motivation; Similarity-based Word Sense Alignment; Graph-based Word Sense Alignment; Joint Approaches to Word Sense Alignment; Putting the Pieces Together: UBY; Applications of Linked Lexical Resources
  • 83. Utilizing Linked Lexical Resources (LLRs). Publications: Cholakov et al., EACL 2014; Matuschek et al., KONVENS 2014; Matuschek et al., TC3, 2013; Hartmann & Gurevych, ACL 2013; Hartmann et al., 2014 (in preparation). Topics covered: sense annotation/disambiguation; machine/computer-assisted translation; semantic role labelling; cross-language transfer of lexical-semantic resources. Highlighted references: (1) Kostadin Cholakov, Judith Eckle-Kohler and Iryna Gurevych: Automated Verb Sense Labelling Based on Linked Lexical Resources, in: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), pp. 68-77, April 2014. (2) Michael Matuschek, Christian M. Meyer and Iryna Gurevych: Multilingual Knowledge in Aligned Wiktionary and OmegaWiki for Translation Applications, in: Translation: Corpora, Computation, Cognition (TC3), vol. 3, no. 1, pp. 87-118, July 2013. (3) Michael Matuschek, Tristan Miller and Iryna Gurevych: A Language-independent Sense Clustering Approach for Enhanced WSD, in: Proceedings of the 12th Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2014), to appear.
  • 84. Automatic Verb Sense Labelling of Corpora. Motivation: automatically create verb sense-annotated corpora as training data for supervised approaches. Approach: (1) create sense patterns from UBY (combining WordNet, FrameNet, VerbNet, Wiktionary); (2) compare these to patterns derived from corpus instances; (3) assign a word sense in the corpus if the similarity is above a threshold; (4) use this data to train supervised systems (distant supervision). Results: significant improvement over the MFS (most frequent sense) baseline for verb sense disambiguation on MASC and Senseval-3. April 28, 2014 | Computer Science Department | UKP Lab Prof. Iryna Gurevych | Dr. Judith Eckle-Kohler
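Steps 2 and 3 of the approach on this slide (compare patterns, label when similarity exceeds a threshold) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slide does not name the similarity measure, so Jaccard overlap of feature sets is swapped in, and the feature strings, sense identifiers, and the 0.5 threshold are invented toy values.

```python
def jaccard(a, b):
    """Jaccard overlap of two feature collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def label_instance(instance_pattern, sense_patterns, threshold=0.5):
    """Return the best-matching sense id, or None if nothing reaches the threshold."""
    best_sense, best_score = None, 0.0
    for sense_id, pattern in sense_patterns.items():
        score = jaccard(instance_pattern, pattern)
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense if best_score >= threshold else None


# Hypothetical sense patterns; real ones would be derived from UBY.
sense_patterns = {
    "sing#perform": ["subj:person", "obj:song", "prep:with"],
    "sing#confess": ["subj:person", "prep:under", "obj:none"],
}

labelled = label_instance(["subj:person", "obj:song"], sense_patterns)
unlabelled = label_instance(["subj:bird"], sense_patterns)
print(labelled, unlabelled)
```

Instances that stay unlabelled are simply left out of the training data, which is the usual trade-off in distant supervision: a higher threshold gives fewer but cleaner training examples.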
  • 85. Using Alignments for Word Sense Clustering. Motivation: cluster fine-grained word senses in expert-built resources to improve WSD performance. Approach: (1) create alignments between resources using Dijkstra-WSA, allowing 1:n alignments (source: GermaNet, WordNet; target: Wiktionary, Wikipedia, OmegaWiki); (2) if two or more senses are aligned to the same sense in the other resource, merge them into one coarse sense; (3) rescore state-of-the-art WSD algorithms on the clustered sense inventory. Results: significant improvement over random clusters of the same granularity on WebCAGe (GermaNet) and Senseval-3 (WordNet).
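Step 2 of the approach on this slide (merge source senses that share an aligned target sense) can be sketched with a small union-find structure. The sketch assumes merging is transitive, so that chains of shared targets collapse into one cluster; the slide does not say whether the original system does this. The sense identifiers are invented, and the alignment pairs stand in for the output of an aligner such as Dijkstra-WSA.

```python
from collections import defaultdict


def cluster_senses(alignments):
    """Group source senses that are aligned to a common target sense.

    alignments: iterable of (source_sense, target_sense) pairs.
    Returns a sorted list of clusters, each a sorted list of source senses.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index source senses by the target sense they are aligned to.
    by_target = defaultdict(list)
    for source, target in alignments:
        find(source)  # register the sense even if it ends up unmerged
        by_target[target].append(source)

    # All sources sharing a target belong to one coarse sense.
    for sources in by_target.values():
        for other in sources[1:]:
            union(sources[0], other)

    clusters = defaultdict(set)
    for sense in parent:
        clusters[find(sense)].add(sense)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical alignment pairs (source resource sense, target resource sense).
alignments = [
    ("wn:sing_1", "wkt:sing_a"),
    ("wn:sing_2", "wkt:sing_a"),
    ("wn:sing_3", "wkt:sing_b"),
]
coarse = cluster_senses(alignments)
print(coarse)
```

A WSD system's output can then be rescored by mapping both predicted and gold senses to their cluster before comparing them.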
  • 86. Using Aligned Resources for Computer-aided Translation. Motivation: SMT systems help, but are not smart enough to replace manual translation. Approach: (1) create sense alignments between multilingual resources; (2) display information from all resources for a particular meaning. Results: substantially more available translations and other information types. Example: “bass” in Wiktionary and OmegaWiki.
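Step 2 of this approach (show information from all resources for one meaning) amounts to pooling the entries of every sense aligned to the one being viewed. The sketch below illustrates that pooling for translations only. The function name `merged_view`, the sense identifiers, and the tiny translation tables are invented for illustration and are not taken from Wiktionary or OmegaWiki.

```python
def merged_view(sense_id, alignments, resources):
    """Collect translations for a sense and every sense directly aligned to it."""
    linked = {sense_id}
    for left, right in alignments:
        if left == sense_id:
            linked.add(right)
        elif right == sense_id:
            linked.add(left)

    merged = {}
    for linked_id in sorted(linked):
        for language, words in resources.get(linked_id, {}).items():
            merged.setdefault(language, set()).update(words)
    # Sort for a stable display order.
    return {language: sorted(words) for language, words in merged.items()}


# Toy data: sense id -> {language code: translations}. Hypothetical content.
resources = {
    "wkt:bass_music": {"de": ["Bass"]},
    "ow:bass_music": {"de": ["Bass"], "fr": ["basse"]},
    "wkt:bass_fish": {"de": ["Barsch"]},
}
alignments = [("wkt:bass_music", "ow:bass_music")]

view = merged_view("wkt:bass_music", alignments, resources)
print(view)
```

Because the fish sense is not aligned to the music sense, its translations stay out of the merged view, which is the point of aligning at the sense level rather than the word level.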
  • 87. Programming language is not an island! Word Sense Alignment is vital for increasing coverage and richness of sense representations. But: it is a hard problem! Various approaches: similarity-based, graph-based, combined. Performance depends on resources: sparsity, availability of glosses, … Machine learning shows the most robust results. Aligned resources help improve performance for various applications: VSD, coarse-grained WSD, computer-aided translation.
  • 88. Future Work. 1. Linked lexical resources (LLRs): integrating and aligning further resources in UBY; special focus: cross-lingual alignment. 2. Construction of aligned lexical resources: investigating more elaborate similarity measures for glosses; using different graph algorithms to better express similarity; aligning several resources at once (n-way alignment). 3. Utilizing LLRs for language processing: unified deep learning framework utilizing linked resources; distant supervision applied to semantic role labeling; word sense disambiguation and lexical substitution for German.
  • 89. Thank you. Questions?
  • 90. Sense Alignment of Lexical Resources (References). (1) Elisabeth Niemann and Iryna Gurevych. The People’s Web Meets Linguistic Knowledge: Automatic Sense Alignment of Wikipedia and WordNet. In: Proceedings of the 9th International Conference on Computational Semantics (IWCS), p. 205-214, January 2011. (2) Christian M. Meyer and Iryna Gurevych. What Psycholinguists Know About Chemistry: Aligning Wiktionary and WordNet for Increased Domain Coverage. In: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), p. 883-892, November 2011. (3) Michael Matuschek and Iryna Gurevych. Dijkstra-WSA: A Graph-Based Approach to Word Sense Alignment. Transactions of the Association for Computational Linguistics (TACL), vol. 1, p. 151-164, May 2013. (4) Silvana Hartmann and Iryna Gurevych. FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), vol. 1, p. 1363-1373, August 2013. (5) Tristan Miller and Iryna Gurevych. WordNet-Wikipedia-Wiktionary: Construction of a Three-way Alignment. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), May 2014. (to appear)
  • 91. Linked Lexical Resources @ UKP (References). (1) Judith Eckle-Kohler and Iryna Gurevych. Subcat-LMF – Fleshing out a Standardized Format for Subcategorization Frame Interoperability. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), p. 550-560, April 2012. (2) Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek and Christian M. Meyer. UBY-LMF - A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), p. 275-282, May 2012. (3) Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek and Christian M. Meyer. UBY-LMF - Exploring the Boundaries of Language-Independent Lexicon Models. In: LMF Lexical Markup Framework, chap. 10, p. 145-156, ISTE - HERMES - Wiley, 2013. ISBN 978 184 821 4309. (4) Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer and Christian Wirth. UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), p. 580-590, April 2012. (5) Judith Eckle-Kohler, John Philip McCrae, and Christian Chiarcos. lemonUby - A Large, Interlinked, Syntactically-rich Lexical Resource for Ontologies. Semantic Web Journal, March 2014.
  • 92. Utilizing Linked Lexical Resources (References). (1) Kostadin Cholakov, Judith Eckle-Kohler, and Iryna Gurevych. Automated Verb Sense Labelling Based on Linked Lexical Resources. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), p. 68-77, April 2014. (2) Silvana Hartmann and Iryna Gurevych. FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), vol. 1, p. 1363-1373, August 2013. (3) Michael Matuschek, Tristan Miller, and Iryna Gurevych. A Language-independent Sense Clustering Approach for Enhanced WSD. In: Proceedings of the 19th Conference on Empirical Methods in Natural Language Processing, October 2014. (in submission) (4) Michael Matuschek, Christian M. Meyer, and Iryna Gurevych. Multilingual Knowledge in Aligned Wiktionary and OmegaWiki for Translation Applications. Translation: Corpora, Computation, Cognition (TC3), vol. 3, no. 1, p. 87-118, July 2013.