ULM-1
Understanding Language
by Machines
The Borders of Ambiguity
Ruben Izquierdo
ruben.izquierdobevia@vu.nl
http://rubenizquierdobevia.com
Structure
 Part I
 The ULM-1 project
 Part II
 Error analysis on WSD
 Part III
 Using Background Information to Perform WSD
 Part IV
 What is next?
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 2
Who am I?
 Ruben Izquierdo Bevia
 Computer Science, Alicante, Spain 2004
 2004-2011 researcher at the University of Alicante
 September 2010, Alicante
 PhD Thesis: An approach to Word Sense Disambiguation based on Supervised Machine Learning and Semantic Classes
 Sept 2011 → Sept 2012
 DutchSemCor project (Tilburg and VU universities, NL)
 Sept 2012 → Sept 2014
 OpeNER project (VU University, NL)
 Sept 2014 → present
 ULM-1 Spinoza project
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity”
3
Part I
Understanding Language by
Machines
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 4
Understanding Language by Machines
 NWO (Netherlands Organization for Scientific Research)
 Spinoza Prize
 The highest Dutch award in science, for top researchers with an international reputation
 Piek Vossen was one of the three winners in 2013
 Some money for research → 4 ULM projects
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 5
Understanding Language by Machines
 Develop computer models that assign deeper meaning to language and approximate human understanding
 Use the models to automatically read and understand texts
 Words and texts are highly ambiguous
 Get a better understanding of the scope and complexity of this ambiguity
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 6
Understanding Language by Machines
 ULM-1: The borders of ambiguity
 Word relations and ambiguity
 Define the problem and find an optimal solution
 ULM-2: Word, Concept, Perception and Brain
 Relate words and meanings to perceptual data and brain activation patterns
 ULM-3: From timelines to storylines
 Interpretation of words and our way of interacting with the changing world
 Structure these changes as stories along explanatory motivations
 ULM-4: A quantum model of text understanding
 Technical model
 Move from pipeline approaches that take early decisions to a model where the final interpretation is carried out by high-order semantic and contextual models
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 7
Understanding Language by Machines
 ULM-1: The borders of ambiguity
 Word relations and ambiguity
 Define the problem and find an optimal solution
 ULM-2: Word, Concept, Perception and Brain
 Relate words and meanings to perceptual data and brain activation patterns
 ULM-3: From timelines to storylines
 Interpretation of words and our way of interacting with the changing world
 Structure these changes as stories along explanatory motivations
 ULM-4: A quantum model of text understanding
 Technical model
 Move from pipeline approaches that take early decisions to a model where the final interpretation is carried out by high-order semantic and contextual models
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 8
ULM-1: The Borders of
Ambiguity
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 9
Piek Vossen Marten Postma Ruben Izquierdo
Word Sense Disambiguation
WSD → “The problem of computationally determining which ‘sense’ of a word is activated by the use of that word in a particular context” (Agirre & Edmonds, 2006)
Our(1) project(14) looks(14) into(1) breaking(60) the(1) borders(10) of(1) ambiguity(1), for(1) which(1) the(1) queen(12) piece(18) is(13) an(1) example(1)
1,981,324,800 interpretations !!!
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 10
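The small numbers attached to each word on the slide are presumably per-word sense counts, and the figure above is their product. A minimal sketch of that arithmetic, assuming NLTK's WordNet interface (not necessarily the resource used for the slide):

```python
# Minimal sketch: multiply per-word WordNet sense counts to estimate how many
# candidate interpretations a sentence has. Assumes NLTK with WordNet data.
from math import prod
from nltk.corpus import wordnet as wn

sentence = ("Our project looks into breaking the borders of ambiguity "
            "for which the queen piece is an example").split()

sense_counts = []
for token in sentence:
    senses = wn.synsets(token)                 # all senses, any part of speech
    sense_counts.append(max(1, len(senses)))   # monosemous/unknown words count as 1

print(list(zip(sentence, sense_counts)))
print("candidate interpretations:", prod(sense_counts))
```

The printed product will not match the slide's figure exactly, since it depends on the WordNet version and on how multiwords and function words are counted.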
Classical Approaches
 Supervised approaches
 Require annotated data
 Problems with domain adaptation
 Knowledge based
 Dependent on the resources
 Unsupervised approaches
 Low performance
 Require large amounts of data
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 11
Still Unsolved
WSD is still considered to be “unsolved”
Competition | Year | Type | Baseline | Best F1
SensEval2 | 2001 | All-words | 57.0 | 69.0 (Sup)
SensEval3 | 2004 | All-words | 60.9 | 65.1 (Sup)
SemEval1 | 2007 | All-words (task 17) | 51.4 | 59.1 (Sup)
SemEval2 | 2010 | All-words on specific domain | 50.5 | 56.2 (Kb)
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 12
General Trends
 Look at WSD purely as a classification problem
 Focus more on the low-level algorithm than on the WSD problem itself
 Poor representation of the context
 Following the idea: “the more features, the better the performance”
 Usually bag-of-words features
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 13
… but … what about the
discourse and background
information?
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 14
Discourse and Background
Knowledge
The winner will walk away with $1.5 million
source: http://www.southafrica.info/news/sport/golf-nedbank-210613.htm#.VEAWkYusVW8
Creation time: 21 June 2013
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 15
Discourse and Background
Knowledge
The winner will walk away with $1.5 million
source: http://www.southafrica.info/news/sport/golf-nedbank-210613.htm#.VEAWkYusVW8
Creation time: 21 June 2013
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 16
Winner → the contestant who wins the contest (WordNet synset ENG30-10782940-n)
Discourse and Background
Knowledge
The winner will walk away with $1.5 million
source: http://www.southafrica.info/news/sport/golf-nedbank-210613.htm#.VEAWkYusVW8
Creation time: 21 June 2013
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 17
The winner won the Nedbank Golf Challenge
Discourse and Background
Knowledge
The winner will walk away with $1.5 million
source: http://www.southafrica.info/news/sport/golf-nedbank-210613.htm#.VEAWkYusVW8
Creation time: 21 June 2013
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 18
The winner was → Thomas Bjørn
Borders of Ambiguity
Lexical WSD: WordNet sense of winner
Discourse information: “winner” is the winner of the
Nedbank Golf Challenge
Referential WSD: the “winner” is Thomas Bjørn
WordNet
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 19
The Role of Background
knowledge
“One of the best moves by Gary Kasparov which includes a queen sacrifice…”
Source: http://www.chess.com/forum/view/chess-players/kasparov-queen-sacrifice
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 20
The Role of Background
knowledge
“One of the best moves by Gary Kasparov which includes a queen sacrifice…”
Source: http://www.chess.com/forum/view/chess-players/kasparov-queen-sacrifice
STATE-OF-THE-ART SYSTEM
It Makes Sense (IMS) WSD system (Zhong and Ng, 2010)
• 36% queen.n.1: the only fertile female in a colony of social insects such
as bees, ants or termites.
• 34% queen.n.2: a female sovereign ruler
• 30% queen.n.3: the wife or widow of a king
• …..
• 0% queen.n.6: the most powerful chess piece
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 21
The Role of Background
knowledge
 A very naïve approach
 Find “Gary Kasparov” as an entity and link it to Wikipedia
 Compare the textual overlap of:
 Wikipage Queen_chess vs. Wikipage Gary_Kasparov → 170 overlapping types
 Wikipage Queen_regnant vs. Wikipage Gary_Kasparov → 88 overlapping types
Examples of matching words Queen_chess – G. Kasparov
board opening matches game press championship rules
chess player king queen
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 22
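A minimal sketch of the naive overlap test described above, assuming the pages are fetched through the public MediaWiki API (the slide does not say how the pages were obtained) and using illustrative page titles:

```python
# Sketch of the "very naive" overlap test: count the word types two Wikipedia
# pages share. Fetching via the public MediaWiki API and the page titles below
# are assumptions made for illustration.
import re
import requests

def wiki_plaintext(title):
    """Fetch the plain-text extract of an English Wikipedia page."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "prop": "extracts", "explaintext": 1,
                "format": "json", "titles": title},
        timeout=30,
    )
    pages = resp.json()["query"]["pages"]
    return next(iter(pages.values())).get("extract", "")

def type_overlap(text_a, text_b):
    """Word types (lowercased alphabetic tokens) occurring in both texts."""
    types_a = set(re.findall(r"[a-z]+", text_a.lower()))
    types_b = set(re.findall(r"[a-z]+", text_b.lower()))
    return types_a & types_b

kasparov = wiki_plaintext("Garry Kasparov")
chess_queen = wiki_plaintext("Queen (chess)")
queen_regnant = wiki_plaintext("Queen regnant")

print(len(type_overlap(chess_queen, kasparov)))    # cf. 170 on the slide
print(len(type_overlap(queen_regnant, kasparov)))  # cf. 88 on the slide
```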
Our ideal system
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 23
Part II
Error Analysis of WSD
systems
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 24
Piek Vossen Marten Postma Ruben Izquierdo
Motivation
Word Sense Disambiguation is still an unsolved problem
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 25
Hypothesis
 Little attention has been paid to the problem itself
 WSD is treated as just one problem
 The context is not being exploited properly
 Systems rely too much on the Most Frequent Sense (MFS)
 It is indeed the baseline, and very hard to overcome
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 26
Goal of the Analysis
 Perform an error analysis of the participating systems in previous WSD evaluations to test our hypotheses
 Senseval-2: all-words task
 Senseval-3: all-words task
 Semeval2007: all-words task (#17)
 Semeval2010: all-words on specific domain (#17)
 Semeval2013: multilingual all-words WSD and entity
linking (#12)
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 27
Analysis
 Calculate the performance of the systems according to
different criteria of the gold data
 Monosemous / polysemous
 Part-of-speech
 Most Frequent Sense vs. Non MFS
 Polysemy class
 Frequency class
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 28
Monosemous errors
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 29
Monosemous Errors
Competition | Monosemous | Wrong | Examples
Senseval2 | 499 (20.9%) | 37.5% | gene.n (suppressor_gene.n), chance.a (chance.n), next.r (next.a)
Senseval3 | 334 (16.6%) | 44.1% | datum.n (data.n), making.n (make.v), out_of_sight (sight)
Semeval2007 | 25 (5.5%) | 11.1% | get_stuck.v, lack.v, write_about.v
Semeval2010 | 31 (2.2%) | 97.9% | tidal_zone.n, pine_marten.n, roe_deer.n, cordgrass.n
Semeval2013 (lemmas) | 348 (21.1%) | 1.9% | private_enterprise, developing_country, narrow_margin
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 30
Most Frequent Sense
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 31
Most Frequent Sense
 When the correct sense is NOT the most frequent sense
 Systems still mostly assign the MFS
 Senseval2
 799 tokens are not the MFS
 In 84% of them, systems still assign the MFS
 Most “failed” words are due to MFS bias
 Senseval2, Senseval3
 say.v find.v take.v have.v cell.n church.n
 Semeval2010
 area.n nature.n connection.n water.n population.n
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 32
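A hypothetical sketch of the measurement behind these numbers: given gold senses, a system's answers and an MFS lookup, count how often the system still outputs the MFS on instances whose gold sense is not the MFS. The identifiers and toy data are illustrative, not the evaluation scripts used in the analysis.

```python
# Hypothetical sketch of the MFS-bias measurement. `gold` and `system` map
# instance ids to sense keys, `mfs` maps lemmas to the first (most frequent)
# sense; formats and sense keys are assumptions made for illustration.
def mfs_bias(gold, system, mfs, lemma_of):
    """Share of non-MFS gold instances that a system still tags with the MFS."""
    non_mfs = [i for i, sense in gold.items() if sense != mfs[lemma_of[i]]]
    if not non_mfs:
        return 0.0
    hits = sum(1 for i in non_mfs if system.get(i) == mfs[lemma_of[i]])
    return hits / len(non_mfs)

# toy example: 2 of the 3 non-MFS instances are (wrongly) given the MFS
gold     = {"d1.t1": "cell%2", "d1.t2": "say%3", "d1.t3": "find%4"}
system   = {"d1.t1": "cell%1", "d1.t2": "say%1", "d1.t3": "find%4"}
mfs      = {"cell": "cell%1", "say": "say%1", "find": "find%1"}
lemma_of = {"d1.t1": "cell", "d1.t2": "say", "d1.t3": "find"}
print(mfs_bias(gold, system, mfs, lemma_of))  # -> 0.666...
```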
Analysis per PoS-tag
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 33
Polysemy Profile
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 34
Frequency Class
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 35
Expected vs. Observed
difficulty
 Calculate per sentence
 The “expected” difficulty
 Average polysemy, sentence length, average word length
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 36
Expected vs. Observed
difficulty
 Calculate per sentence
 The “expected” difficulty
 Average polysemy, sentence length, average word length
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 37
Expected vs. Observed
difficulty
 Calculate per sentence
 The “expected” difficulty
 Average polysemy, sentence length, average word length
 The “observed” difficulty
 From the real participant outputs, the average error rate
 We could expect:
harder sentences → higher error rate
easier sentences → lower error rate
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 38
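A small sketch of how the expected and observed difficulty could be computed and compared per sentence; the data layout and the use of Pearson correlation are assumptions made for illustration, not the exact procedure of the analysis.

```python
# Sketch: expected difficulty (avg. polysemy, length, avg. word length) vs.
# observed difficulty (avg. system error rate) per sentence, plus their
# correlation. The (token, n_senses, n_systems_wrong, n_systems) layout and
# the toy data are assumptions.
from statistics import mean
from math import sqrt

def expected_difficulty(sentence):
    """Average polysemy, sentence length, average word length."""
    return (mean(n_senses for _, n_senses, _, _ in sentence),
            len(sentence),
            mean(len(tok) for tok, _, _, _ in sentence))

def observed_difficulty(sentence):
    """Average error rate of the participating systems on this sentence."""
    return mean(wrong / total for _, _, wrong, total in sentence)

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

sentences = [
    [("the", 1, 0, 10), ("cell", 7, 3, 10), ("divides", 5, 2, 10)],
    [("area", 6, 7, 10), ("connection", 9, 6, 10), ("nature", 7, 5, 10)],
    [("water", 6, 4, 10), ("population", 4, 3, 10), ("grows", 5, 2, 10)],
]
avg_polysemy = [expected_difficulty(s)[0] for s in sentences]
error_rate = [observed_difficulty(s) for s in sentences]
print("correlation avg. polysemy vs. error rate:", pearson(avg_polysemy, error_rate))
```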
Expected vs. Observed
difficulty
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 39
Expected vs. Observed
difficulty
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 40
Expected vs. Observed
difficulty
• The context is probably not exploited properly
• Expected “easy” sentences SHOULD show low error rates
• Occurrences of the same word in different contexts show similar error rates
• The difficulty of a word depends more on its polysemy than on the context where it appears
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 41
WSD Corpora
http://github.com/rubenIzquierdo/wsd_corpora
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 42
WSD Corpora
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 43
System Outputs
https://github.com/rubenIzquierdo/sval_systems
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 44
System Outputs
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 45
Part III
When to Use Background
Information to Perform WSD
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 46
Piek Vossen Marten Postma Ruben Izquierdo
SemEval-2015 Task #13
 Multilingual All-Words Sense Disambiguation and Entity
Linking
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 47
SemEval-2015 Task #13
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 48
Motivation
 From the previous error analysis
 MFS bias is a big problem
 For both supervised and unsupervised approaches
 Especially when there is a domain shift
 Our approach
1. Determine the predominant sense for every lemma in the specific domain (unsupervised)
2. Apply a state-of-the-art WSD system
3. Define a heuristic to determine when to apply 1) or 2)
4. We focused on WSD in English only
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 49
Architecture
 IMS route: favors the MFS in the general domain and local features
 Background route: favors the predominant sense in the domain
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 50
ROUTE 1
ROUTE 2
Architecture
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 51
Architecture
 Two different approaches
 Online approach
 The SemEval test documents (4 documents)
 Offline approach
 Precompiled documents for the target domain
 Documents from biomedical domain
 Converted to NAF
 Tokens, Lemmas and PoS tags
Seed documents SD
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 52
Architecture
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 53
Architecture
 DBpedia Spotlight is applied to the seed documents
 Entities and links to DBpedia are extracted
 Wikipedia pages are retrieved from the DBpedia links
 Filter:
 Consider only DBpedia links whose ontological type is a leaf of the ontology
 Better results without the filter
 All the Wikipedia pages make up the EAC corpus
Entity Article Corpus EAC
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 54
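A sketch of this step, assuming the public DBpedia Spotlight web service; the original pipeline may have used a local Spotlight installation with different parameters, and the seed text below is illustrative.

```python
# Sketch of the EAC construction step: run DBpedia Spotlight over a seed
# document and collect the DBpedia resources it links to. The endpoint,
# parameters and response fields follow the public web service (assumed here).
import requests

SPOTLIGHT = "https://api.dbpedia-spotlight.org/en/annotate"

def spotlight_entities(text, confidence=0.5):
    """Return the set of DBpedia resource URIs found in `text`."""
    resp = requests.get(
        SPOTLIGHT,
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=60,
    )
    resp.raise_for_status()
    resources = resp.json().get("Resources", [])
    return {r["@URI"] for r in resources}

seed_text = "CCDC11 is a protein that in humans is encoded by the CCDC11 gene."
for uri in sorted(spotlight_entities(seed_text)):
    # each DBpedia URI maps to a Wikipedia article that goes into the EAC
    print(uri, "->", uri.replace("dbpedia.org/resource", "en.wikipedia.org/wiki"))
```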
Architecture
Entity Article Corpus EAC
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 55
Architecture
Entity Article Corpus EAC
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 56
Architecture
Entity Article Corpus EAC
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 57
Architecture
Entity Article Corpus EAC
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 58
Architecture
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 59
Architecture
 Targets high recall and low precision/quality
 Entity Article Corpus EAC → LDA → Domain Model DM
 For every document DEAC in EAC
 Obtain the DBpedia type T
 Obtain the set of DBpedia entities S from DBpedia which belong to T
 For every document DS in S:
 Compute the similarity of DS against the model DM
 If similarity >= THRESHOLD → select the document for the Entity Expanded corpus
LDA Expansion
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 60
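A rough gensim-based sketch of the LDA expansion described above (the editor's notes mention gensim); the corpus contents, number of topics and threshold are illustrative.

```python
# Sketch of the LDA expansion: build a Domain Model over the EAC with LDA,
# then keep only candidate DBpedia documents similar enough to it.
from gensim import corpora, models, similarities
from gensim.utils import simple_preprocess

THRESHOLD = 0.5  # illustrative value; the real cut-off is not given on the slide

eac_docs = [  # plain text of the EAC documents (toy examples here)
    "CCDC11 is a protein that in humans is encoded by the CCDC11 gene.",
    "Phosphorylation attaches a phosphate group to a protein or small molecule.",
]
tokenised = [simple_preprocess(doc) for doc in eac_docs]
dictionary = corpora.Dictionary(tokenised)
bow_corpus = [dictionary.doc2bow(toks) for toks in tokenised]

# Domain Model DM: an LDA topic model over the whole EAC
lda = models.LdaModel(bow_corpus, id2word=dictionary, num_topics=10)
index = similarities.MatrixSimilarity(lda[bow_corpus], num_features=lda.num_topics)

def similar_to_domain(candidate_text):
    """True if a candidate DBpedia/Wikipedia document is close to the EAC topics."""
    bow = dictionary.doc2bow(simple_preprocess(candidate_text))
    sims = index[lda[bow]]            # similarity against every EAC document
    return float(sims.max()) >= THRESHOLD

# candidates: every DBpedia entry sharing the ontological type of an EAC document
# (e.g. all entries of type HumanGene); each is kept only if similar_to_domain(...)
```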
Architecture
LDA Expansion
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 61
Architecture
LDA Expansion
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 62
Architecture
LDA Expansion
http://dbpedia.org/ontology/HumanGene
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 63
Architecture
LDA Expansion
[Diagram: Entity Article Corpus EAC → LDA → Domain Model → similarity scoring]
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 64
Architecture
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 65
Architecture
Entity Overlapping Expansion
 Targets high quality and medium recall
 Entity Article Corpus EAC
 Extract the set of all entities: SE
 For every entity E in SE:
 Obtain all the wikilinks in E: W
 For every Ew in W:
 Obtain all the wikilinks Wew in Ew → SW
 Compute the overlap between SE and SW
 Filter by threshold
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 66
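A minimal sketch of the selection logic of the Entity Overlapping expansion; the wiki-link lookups are faked with toy data, since in the real pipeline they come from Wikipedia/DBpedia.

```python
# Sketch of the Entity Overlapping expansion: a page linked from the EAC is
# kept if its own wiki-links overlap enough with the domain reference
# entities SE. The LINKS dict and threshold are toy assumptions.
THRESHOLD = 2  # minimum number of shared entities; illustrative value

LINKS = {  # page -> set of wiki-links on that page (toy data)
    "CCDC11": {"Phosphorylation", "Protein", "Gene"},
    "Phosphorylation": {"Phosphate", "Enzyme", "Protein", "Gene", "CCDC11"},
    "Phosphate": {"Chemistry", "Salt"},
}

def wikilinks(page):
    return LINKS.get(page, set())

def expand(reference_entities):
    """Pages linked from SE whose own links overlap enough with SE."""
    selected = set()
    for entity in reference_entities:            # E in SE
        for linked_page in wikilinks(entity):    # Ew in W
            sw = wikilinks(linked_page)          # SW
            if len(reference_entities & sw) >= THRESHOLD:
                selected.add(linked_page)
    return selected

SE = {"CCDC11", "Protein", "Gene"}               # domain reference entities
print(expand(SE))                                # -> {'Phosphorylation'}
```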
Architecture
Entity Overlapping Expansion
[Diagram: from http://dbpedia.org/resource/CCDC11 in SE, get the wikilinks of the CCDC11 WikiPage (e.g. Phosphorylation); then get the wikilinks of the Phosphorylation WikiPage (Phosphate, Enzymes, Biochemistry, Prokaryotic, CCDC11, …)]
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 67
Architecture
Entity Overlapping Expansion
[Diagram: the wikilinks of Phosphorylation (Phosphate, Enzymes, Biochemistry, Prokaryotic, …) are compared against SE; if the overlap is > THRESHOLD the page is selected, otherwise rejected]
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 68
Architecture
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 69
Architecture
Predominant Sense Algorithm
 Background corpus BC: EAC + EE
 For every lemma L in BC:
 Extract all sentences containing L
 If there are more than 100 sentences
 Word sense induction with Hierarchical Dirichlet Processes
(Lau et al., 2012)
 Induce senses using Topic Modeling
 Output: list of senses with confidences per lemma
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 70
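A rough sketch of the predominant-sense induction step, using gensim's HdpModel as a stand-in for the HDP-based word sense induction of Lau et al. (2012); this is not their implementation, only an illustration of the idea.

```python
# Sketch: induce "senses" for a lemma by topic-modelling the sentences that
# contain it with a Hierarchical Dirichlet Process, then rank the induced
# topics by how many sentences they dominate.
from collections import Counter
from gensim import corpora, models
from gensim.utils import simple_preprocess

MIN_SENTENCES = 100   # as on the slide: only induce senses above this count

def induce_senses(sentences):
    """Return (topic_id, relative_frequency) pairs, most frequent topic first."""
    tokenised = [simple_preprocess(s) for s in sentences]
    dictionary = corpora.Dictionary(tokenised)
    corpus = [dictionary.doc2bow(toks) for toks in tokenised]
    hdp = models.HdpModel(corpus, id2word=dictionary)   # non-parametric: no fixed k
    # assign every sentence (one usage of the lemma) to its dominant topic
    assignments = Counter()
    for bow in corpus:
        topics = hdp[bow]
        if topics:
            assignments[max(topics, key=lambda t: t[1])[0]] += 1
    total = sum(assignments.values()) or 1
    return [(topic, count / total) for topic, count in assignments.most_common()]

# for every lemma L in the background corpus BC (EAC + EE):
#   sentences = all sentences of BC containing L
#   if len(sentences) > MIN_SENTENCES:
#       sense_ranking[L] = induce_senses(sentences)
```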
Architecture
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 71
Architecture
Voting
 For a new instance of a given lemma
 Obtain the sense ranking of the Predominant Sense (PS) module
 Only if the first 2 senses accumulate 85% of the confidence (to avoid skewness)
 Mix both sense rankings
 PS and ItMakesSense
 Select the sense with the highest confidence
 If there is no Predominant Sense information
 Use the ItMakesSense best sense
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 72
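A minimal sketch of the voting heuristic, with illustrative sense keys and confidence values; data shapes are assumptions.

```python
# Sketch of the voting step: the PS ranking is only trusted when its two top
# senses jointly account for 85% of the confidence mass; otherwise (or when
# there is no PS information) ItMakesSense decides.
SKEW_THRESHOLD = 0.85

def vote(ps_ranking, ims_ranking):
    """ps_ranking / ims_ranking: lists of (sense, confidence), best first."""
    if ps_ranking:
        top2 = sum(conf for _, conf in ps_ranking[:2])
        if top2 >= SKEW_THRESHOLD:
            # mix both rankings and take the overall most confident sense
            merged = ps_ranking + ims_ranking
            return max(merged, key=lambda pair: pair[1])[0]
    # fall back to the ItMakesSense best sense
    return ims_ranking[0][0]

ps  = [("queen%1:06:02::", 0.70), ("queen%1:18:01::", 0.20)]   # illustrative keys
ims = [("queen%1:05:01::", 0.40), ("queen%1:18:01::", 0.35)]
print(vote(ps, ims))   # PS is trusted here: 0.70 + 0.20 >= 0.85
print(vote([], ims))   # no PS information -> ItMakesSense best sense
```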
Results
All domains
Measure | All | N | V
Precision | 67.5 (2) | 64.7 | 56.6
Recall | 51.4 (5) | 42.9 | 53.9
F1 | 58.4 (4) | 51.6 | 55.2

Social Issues domain
Measure | All | N | V
F1 | 61.2 (2) | 54.8 (7) | 70.6 (1)

Math Computer domain
Measure | All | N | V
F1 | 47.7 (5) | 30.5 (13) | 49.7 (7)

Biomedical domain
Measure | All | N | V
F1 | 66.4 (4) | 62.7 (9) | 53.8 (2)
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 73
Discussion
 The domain was not just biomedical, but mixed
 We couldn't use the offline approach
 Online approach: small size of the seed documents
 We used WN 1.7.1 while the gold standard was WN 3.0
 Some test instances were not annotated
 Using only the predominant sense output:
 Precision on nouns improved 64.7% → 69.1%
 Precision on verbs improved 56.6% → 64.6%
 … but …
 Recall on nouns dropped 42.9% → 20.1%
 Recall on verbs dropped 53.9% → 17.7%
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 74
GitHub Code
https://github.com/cltl/vua-wsd-sem2015
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 75
Part IV
What is next?
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 76
Current and Future
 Most Frequent Sense Classifier
 Decide when to apply the MFS or not
 Based on the output of 2 WSD systems
 UKB
 IMS
 Random Forest algorithm
 Features
 Confidence of the MFS by systems
 Sense ranking entropy
 WordNet Domains / SuperSense for the MFS
 …
 Voting for selecting the MFS
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 77
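A sketch of what such an MFS classifier could look like with scikit-learn; the feature vector and training data are invented for illustration and are not the features actually extracted from UKB and IMS.

```python
# Sketch of the planned MFS classifier: a Random Forest deciding, per
# instance, whether the MFS should be applied, from features derived from
# the UKB and IMS outputs. Feature layout and data are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# one row per instance:
# [ukb_conf_of_mfs, ims_conf_of_mfs, sense_ranking_entropy, domain_matches_mfs]
X_train = np.array([
    [0.90, 0.85, 0.40, 1],
    [0.35, 0.30, 1.90, 0],
    [0.70, 0.75, 0.90, 1],
    [0.20, 0.25, 2.10, 0],
])
y_train = np.array([1, 0, 1, 0])   # 1 = apply the MFS, 0 = do not

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

x_new = np.array([[0.80, 0.82, 0.55, 1]])
print("apply MFS?", bool(clf.predict(x_new)[0]))
```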
Current and Future
 Unsupervised learning for MFS / LFS
 Distributional semantics and word2vec for detecting the
MFS
 Vectors for representing MFS cases
 Vectors for representing LFS cases
 Operate with vectors
 V(‘Paris’) – V(‘France’) + V(‘Italy’) => V(‘Rome’)
 V(‘king’) – V(‘man’) + V(‘woman’) => V(‘queen’)
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 78
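A sketch of the word2vec arithmetic shown above, using a pretrained model from gensim's downloader as a stand-in for vectors trained on the project's own corpora (an assumption; any word2vec model could be plugged in).

```python
# Sketch of the vector arithmetic behind the MFS/LFS idea, using a pretrained
# word2vec model loaded through gensim's downloader (assumed here).
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")   # large download on first use

print(vectors.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1))
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# the same arithmetic could contrast vectors built from MFS contexts with
# vectors built from LFS contexts, to decide which sense a new context favours
```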
ULM-1
Understanding Language
by Machines
The Borders of Ambiguity
THANKS
Ruben Izquierdo
ruben.izquierdobevia@vu.nl
http://rubenizquierdobevia.com
SemEval2013 datasets
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 80
SemEval2013 results
Ruben Izquierdo, Nov 2015 “The Borders of Ambiguity” 81


Editor's notes

  1. More purely a WSD task; SemEval2013 was multilingual and used BabelNet, Wikipedia and WordNet
  2. Subsets of the test data
  3. Shorter words → more polysemous
  4. Di ter min
  5. The LDA technique first obtains a topic model using Latent Dirichlet Allocation on the whole background corpus (we have used the Python library gensim for this purpose, http://radimrehurek.com/gensim/). Then, for every background document in our initial set, the DBpedia ontology class of this document is obtained (for instance HumanGene) and, using our dbpediaEnquirerPy module, all the DBpedia entries that belong to that specific class are retrieved (in our example we would download all the possible entries in DBpedia for human genes). This process can be quite time consuming (there are a total of 15 entries in DBpedia for HumanGene, but there are 1.65 million entries for Person). Every document is then compared against the background LDA model, and only those reaching a certain similarity are selected to be included in the expanded corpus. The whole process is highly time consuming and the result in terms of quality is not as good as expected, probably because the number of documents retrieved is very large and the domains are very diverse and in many cases different from our reference domain. The EO expansion follows a different approach. We collect all the DBpedia links from the first background corpus, which makes up our list of domain reference entities. Then, for each of the background documents, we obtain all the wiki-links contained in the Wikipedia text. For each of these wiki-links, we retrieve the Wikipedia page, and again all the entities (wiki-links) contained in this Wikipedia page. We then compute the overlap of these lists of entities with our original list of domain reference entities. The higher the overlap, the more similar and domain-related the new Wikipedia page and our original background corpus are. Only Wikipedia pages reaching a minimum overlap are selected to be part of the expanded corpus. For instance, starting from the document for http://en.wikipedia.org/wiki/CCDC11 (a protein), we could extract this wiki-link: http://en.wikipedia.org/wiki/Phosphorylation. Then we would extract the list of wiki-links (entities) found in the Wikipedia page of Phosphorylation (Wikipedia pages for phosphate, protein, post-translational modification, …). The last step would be to obtain the overlap between this set of entities and the original domain entities of the background corpus. This process is much faster in terms of computation than the LDA approach, and leads to a smaller corpus but with higher quality and more coherence with the original domain.
  6. Lesk algorithm to map from topics to WordNet senses (a gloss-overlap sketch also follows these notes)
  7. Winner: P = 68.7, R = 63.1, F1 = 65.8
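To make the EO expansion in note 5 concrete, here is a minimal sketch under assumptions: get_dbpedia_links and get_wiki_links are hypothetical helpers standing in for the DBpedia and Wikipedia lookups (e.g. via dbpediaEnquirerPy and a Wikipedia dump), and the overlap threshold is illustrative, not the value used in the project.

```python
# Minimal sketch (not the project's code) of the entity-overlap expansion:
# keep a candidate Wikipedia page only if enough of its wiki-links are already
# domain reference entities. Both helper functions are hypothetical.
def expand_corpus_by_entity_overlap(background_docs, get_dbpedia_links,
                                    get_wiki_links, min_overlap=0.3):
    # 1) Domain reference entities: DBpedia links collected from the background corpus.
    domain_entities = set()
    for doc in background_docs:
        domain_entities |= set(get_dbpedia_links(doc))

    # 2) Candidate pages: every wiki-link found in the background documents.
    candidates = set()
    for doc in background_docs:
        candidates |= set(get_wiki_links(doc))

    # 3) Keep candidates whose own wiki-links overlap enough with the domain entities.
    expanded = []
    for page in candidates:
        page_entities = set(get_wiki_links(page))
        if not page_entities:
            continue
        overlap = len(page_entities & domain_entities) / len(page_entities)
        if overlap >= min_overlap:
            expanded.append(page)
    return expanded

# Toy usage with hard-coded link maps (the real pipeline queries DBpedia/Wikipedia):
links = {
    "CCDC11": {"Phosphorylation", "Protein", "Cilium"},
    "Phosphorylation": {"Phosphate", "Protein", "Post-translational_modification"},
}
print(expand_corpus_by_entity_overlap(
    ["CCDC11"],
    get_dbpedia_links=lambda d: links.get(d, set()),
    get_wiki_links=lambda p: links.get(p, set()),
))  # -> ['Phosphorylation']: one of its links is already a domain entity; the others have no links here
```

For the Lesk mapping mentioned in note 6, a simplified gloss-overlap sketch, assuming NLTK with the WordNet data installed (the real mapping may score overlaps differently):

```python
# Simplified Lesk-style mapping from an LDA topic to a WordNet sense:
# choose the synset whose gloss shares the most words with the topic's top words.
from nltk.corpus import wordnet as wn

def topic_to_synset(word, topic_words):
    topic = {w.lower() for w in topic_words}
    best, best_overlap = None, -1
    for synset in wn.synsets(word):
        gloss = set(synset.definition().lower().split())
        overlap = len(gloss & topic)
        if overlap > best_overlap:
            best, best_overlap = synset, overlap
    return best

# A topic about monarchy should pull "queen" towards one of its royal senses.
print(topic_to_synset("queen", ["king", "royal", "monarch", "throne", "crown"]))
```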