In this paper we present an approach to Word Sense Disambiguation (WSD) based on Topic Modeling (Latent Dirichlet Allocation, LDA). Our approach consists of two steps: first, a binary classifier decides whether the most frequent sense (MFS) applies or not; then, a second classifier deals with the non-MFS cases. We perform an exhaustive evaluation on the Spanish corpus Ancora to analyze the performance of our two-step system and the impact of the context and the other parameters of the system. Our best experiment reaches an accuracy of 74.53, about 6 points above the highest baseline (68.63). All the software developed for these experiments has been made freely available to enable reproducibility and allow reuse.
3. Starting point
“Understanding languages by machines” project
Starts from the results of DutchSemCor (WSD)
Analyse the real problems of WSD
Understand the WSD task
Word
Meaning
Context
3Ruben Izquierdo. LDA & WSD. SEPLN2015, Alicante.
5. Still WSD?
Word Sense Disambiguation is still unsolved
Used in high-level applications
Recent unsupervised approaches and SemEval tasks
BabelNet, Babelfy…
Several reasons and problems
6. WSD problems I
Context is not considered properly
Most are/were supervised approaches
Moving to unsupervised, graph-based…
WSD as a black box
The larger the number of features, the better the performance?
The best and newest machine learning algorithm
WSD is seen as only one problem
All words and cases treated in the same way
7. WSD problems II
Error analysis of Senseval/SemEval systems [Postma et al., 2014]
Propagation errors (monosemous)
Most Frequent Sense bias
Supervised systems are skewed towards MFS
Error analysis on WSD and Senseval/SemEval
Performance on MFS cases is good
Very poor performance on non-MFS cases
Systems assign the MFS in almost every case
Senseval-2
799 cases where the correct sense is not the MFS
84% of the systems still assign the MFS
11. Main idea
WSD considered as two different problems
When the MFS applies
More general usages
Larger contexts ??
Rest of the senses
More concrete usages
Shorter contexts ??
Specialized classifiers for each case
Different features, parameters, contexts…
Evaluation for Spanish
Sense-annotated corpus Ancora
12. Our approach
TRAINING. Use Topic Modeling (LDA) to induce word-expert classifiers
For the Most Frequent Sense
Topics for the MFS case
Topics for non-MFS cases
For the rest of the senses (non-MFS)
Topics for every sense
CLASSIFICATION. Apply the 2 classifiers in cascade to decide the sense in every case
BINARY
MULTICLASS
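The cascade above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each classifier stores one prototype topic distribution per label (the labels `MFS`/`NON_MFS` and the sense IDs are hypothetical) and picks the label whose prototype is closest, in Hellinger distance, to the topic distribution of the instance's context. How the original system scores LDA topics is not described on the slide.

```python
import math
from typing import Dict, Sequence

# A topic distribution: one probability per LDA topic (sums to 1).
TopicDist = Sequence[float]


def hellinger(p: TopicDist, q: TopicDist) -> float:
    """Hellinger distance between two topic distributions (0.0 means identical)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def nearest_label(context: TopicDist, prototypes: Dict[str, TopicDist]) -> str:
    """Label whose prototype topic distribution is closest to the context's."""
    return min(prototypes, key=lambda label: hellinger(context, prototypes[label]))


def cascade(
    mfs_context: TopicDist,
    sense_context: TopicDist,
    mfs_prototypes: Dict[str, TopicDist],
    sense_prototypes: Dict[str, TopicDist],
    mfs_sense: str,
) -> str:
    """Two-step decision for one word instance (hypothetical prototype-based sketch).

    The two context vectors may come from different LDA models, since each
    step can use its own context size and number of topics.
    """
    # Step 1 (BINARY): does the most frequent sense apply here?
    if nearest_label(mfs_context, mfs_prototypes) == "MFS":
        return mfs_sense
    # Step 2 (MULTICLASS): choose the closest sense prototype.
    return nearest_label(sense_context, sense_prototypes)


if __name__ == "__main__":
    mfs_protos = {"MFS": [0.8, 0.2], "NON_MFS": [0.2, 0.8]}
    sense_protos = {"s1": [0.7, 0.2, 0.1], "s2": [0.1, 0.8, 0.1], "s3": [0.1, 0.1, 0.8]}
    print(cascade([0.9, 0.1], [0.1, 0.1, 0.8], mfs_protos, sense_protos, "s1"))  # s1
    print(cascade([0.1, 0.9], [0.1, 0.1, 0.8], mfs_protos, sense_protos, "s1"))  # s3
```

Two separate context vectors are passed in on purpose: keeping the steps independent lets each one be tuned with its own context size and number of topics.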
16. Evaluation framework
Ancora corpus
News articles, Spanish part, 500K words, sense-annotated (nouns)
Converted to NAF format
3-fold cross-validation
Keeping the sense distribution in each fold
7119 unique lemmas annotated with nominal senses
4907 are monosemous (69%)
2212 are polysemous (31%)
589 with at least 3 annotated instances per sense
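The lemma statistics above can be reproduced with a few lines of plain Python. The sketch assumes (hypothetically) that the annotations have already been counted into a `{lemma: {sense: count}}` dictionary and that polysemy is measured on the annotated senses; the function names are illustrative.

```python
from typing import Dict, List


def select_lemmas(sense_counts: Dict[str, Dict[str, int]], min_per_sense: int = 3) -> List[str]:
    """Polysemous lemmas whose every annotated sense has at least min_per_sense instances.

    Polysemy is measured here on the annotated senses (an assumption).
    """
    return sorted(
        lemma
        for lemma, senses in sense_counts.items()
        if len(senses) >= 2 and all(n >= min_per_sense for n in senses.values())
    )


def share(part: int, total: int) -> float:
    """Percentage of total, rounded to one decimal."""
    return round(100 * part / total, 1)


if __name__ == "__main__":
    # Monosemous and polysemous shares of the 7119 annotated lemmas.
    print(share(4907, 7119), share(2212, 7119))  # 68.9 31.1
```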
[Figure: Number of lemmas vs. polysemy. x-axis: polysemy (2 to 12 senses); y-axis: number of lemmas]
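A 3-fold split that keeps the sense distribution can be sketched as a stratified round-robin assignment. This is one simple way to do it, assuming (our assumption) that folds are built per lemma from the list of gold senses of its instances; the original split procedure may differ, for example by shuffling the instances first.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_folds(senses: Sequence[Hashable], k: int = 3) -> List[List[int]]:
    """Split instance indices into k folds so each fold keeps the sense distribution.

    Instances of each sense are dealt round-robin over the folds; the running
    offset carries over between senses so fold sizes also stay balanced.
    """
    by_sense: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, sense in enumerate(senses):
        by_sense[sense].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for sense in sorted(by_sense, key=str):  # deterministic order
        for j, idx in enumerate(by_sense[sense]):
            folds[(offset + j) % k].append(idx)
        offset += len(by_sense[sense])
    return [sorted(fold) for fold in folds]
```

With six instances of one sense and three of another, each fold receives two of the first and one of the second, so every fold mirrors the 2:1 distribution.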
19. Baseline Results
For the 589 selected lemmas
Baseline Accuracy
Random 40.10
MFS overall 67.68
MFS folded 68.63
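The two kinds of baseline can be sketched as below for a single train/test split. The slide does not spell out how "MFS overall" and "MFS folded" differ, nor whether the random baseline was sampled or computed as an expectation; this sketch assumes the random baseline is the expected accuracy of a uniform guess among each lemma's senses, and that the MFS is learned from the training folds only. The `(lemma, sense)` instance format is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Instance = Tuple[str, str]  # (lemma, gold sense)


def random_baseline(test: Sequence[Instance], inventory: Dict[str, List[str]]) -> float:
    """Expected accuracy (%) of picking one of the lemma's senses uniformly at random."""
    return 100 * sum(1 / len(inventory[lemma]) for lemma, _ in test) / len(test)


def mfs_baseline(train: Sequence[Instance], test: Sequence[Instance]) -> float:
    """Accuracy (%) of always predicting each lemma's most frequent training sense."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for lemma, sense in train:
        counts[lemma][sense] += 1
    mfs = {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}
    # Lemmas unseen in training get no prediction and count as errors.
    correct = sum(1 for lemma, sense in test if mfs.get(lemma) == sense)
    return 100 * correct / len(test)
```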
21. Experimentation
Configuration of our cascade classifiers
Only one step with the senseLDA classifier
2 steps, mfsLDA with perfect performance + senseLDA
2 steps, mfsLDA and senseLDA both induced automatically
LDA parameters (Python gensim library)
Context size (number of sentences)
Number of topics for LDA
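The context-size parameter can be illustrated by a window over the document's sentences. This is a sketch under an assumption: the slides do not say whether a context of 3 sentences means 3 on each side or 3 in total; here `n` sentences are taken on each side of the target sentence, so `n = 0` keeps only the local sentence. Each window would then be converted to a bag of words (for instance with gensim's `Dictionary.doc2bow`) and used to train `gensim.models.LdaModel` with `num_topics` taken from the grid; that step is left out to keep the sketch dependency-free.

```python
from itertools import product
from typing import List, Sequence, Tuple


def context_window(sentences: Sequence[List[str]], target: int, n: int) -> List[str]:
    """Tokens of the target sentence plus up to n sentences on each side.

    n = 0 keeps only the sentence containing the target word.
    """
    lo = max(0, target - n)
    hi = min(len(sentences), target + n + 1)
    return [token for sentence in sentences[lo:hi] for token in sentence]


# Grid explored in the result tables: context size (sentences) x LDA topics.
CONTEXT_SIZES = (0, 3, 50)
NUM_TOPICS = (3, 10, 100)
GRID: List[Tuple[int, int]] = list(product(CONTEXT_SIZES, NUM_TOPICS))
```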
22. Results I
[Diagram: one-step classification. Word instance (example) → senseLDA (all senses) → sense]
Sentences  Topics  Accuracy
MFS baseline       68.63
0          3       67.54
0          10      65.56
0          100     58.34
3          3       66.30
3          10      64.62
3          100     60.07
50         3       66.04
50         10      63.42
50         100     59.06
• MFS baseline not reached
• Most informative clues are in small contexts
• More topics, lower performance
23. Results II
[Diagram: two steps, MFS classifier with 100% performance. Word instance (example) → MFS classifier (100% accuracy) → senseLDA (all senses) → sense]
Sentences  Topics  Accuracy
MFS baseline       68.63
0          3       92.48
0          10      92.12
0          100     90.50
3          3       92.45
3          10      92.11
3          100     91.60
50         3       92.41
50         10      92.12
50         100     91.43
• Extremely high figures (above 90 in every configuration)
• Good performance of the senseLDA classifier on non-MFS cases
• Similar behaviour with respect to the number of sentences and topics
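Because the MFS classifier in this configuration is perfect, the figures can be read back into a rough accuracy of senseLDA on the non-MFS instances alone. This back-of-the-envelope sketch rests on assumptions not stated on the slide: the perfect MFS classifier is right on every instance whose gold sense is the MFS, the share of such instances equals the MFS folded baseline (68.63), and senseLDA alone decides all remaining instances. It uses the best row (0 sentences, 3 topics).

```latex
% Assumptions: p_MFS = share of MFS instances (taken from the 68.63 baseline);
% the perfect MFS step is always right on them; senseLDA decides the rest.
\mathrm{Acc}_{\text{2-step}} = p_{\text{MFS}} \cdot 1 + (1 - p_{\text{MFS}})\,\mathrm{Acc}_{\text{non-MFS}}
\;\Rightarrow\;
\mathrm{Acc}_{\text{non-MFS}} = \frac{\mathrm{Acc}_{\text{2-step}} - p_{\text{MFS}}}{1 - p_{\text{MFS}}}
\approx \frac{0.9248 - 0.6863}{1 - 0.6863} = \frac{0.2385}{0.3137} \approx 0.76
```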
24. Results III
[Diagram: two steps, MFS classifier #S=5. Word instance (example) → MFS classifier (s5) → senseLDA (all senses) → sense]
MFS baseline: 68.63
Sents  Topics  Acc. (MFS T100)  Acc. (MFS T1000)
0      3       74.53            66.73
0      10      74.00            66.41
0      100     72.61            64.91
3      3       74.30            66.61
3      10      73.87            66.36
3      100     73.39            65.76
50     3       74.26            66.48
50     10      73.90            66.24
50     100     73.53            65.75
• Best MFS classifier: 5 sentences, 100 topics (s5, T100)
• Smaller contexts for non-MFS cases (3, 50 included by 0)
• 3 topics is the best setting
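The choice of the best configuration and its margin over the MFS baseline can be checked directly from the T100 column of the table. The dictionary below simply copies those figures; `best_config` is a hypothetical helper.

```python
from typing import Dict, Tuple

MFS_BASELINE = 68.63

# Results III, column "Acc. (MFS T100)", keyed by (sentences, topics) of the senseLDA step.
RESULTS_III_T100: Dict[Tuple[int, int], float] = {
    (0, 3): 74.53, (0, 10): 74.00, (0, 100): 72.61,
    (3, 3): 74.30, (3, 10): 73.87, (3, 100): 73.39,
    (50, 3): 74.26, (50, 10): 73.90, (50, 100): 73.53,
}


def best_config(results: Dict[Tuple[int, int], float]) -> Tuple[Tuple[int, int], float, float]:
    """Best (sentences, topics) setting, its accuracy, and its gain over the MFS baseline."""
    config = max(results, key=lambda c: results[c])
    return config, results[config], round(results[config] - MFS_BASELINE, 2)


if __name__ == "__main__":
    print(best_config(RESULTS_III_T100))  # ((0, 3), 74.53, 5.9)
```

The gain of 5.9 points is the "6 points" quoted in the abstract and conclusions, and for every context size the 3-topic row is the highest.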
25. Results IV
[Diagram: two steps, MFS classifier #S=50. Word instance (example) → MFS classifier (s50) → senseLDA (all senses) → sense]
MFS baseline: 68.63
Sents  Topics  Acc. (MFS T100)  Acc. (MFS T1000)
0      3       73.34            67.15
0      10      72.92            66.76
0      100     71.43            65.13
3      3       73.21            67.02
3      10      72.88            66.60
3      100     72.40            66.24
50     3       73.21            66.95
50     10      72.83            66.58
50     100     72.15            66.20
• Similar behaviour to MFS s5
• Slightly lower results (best: 73.34 vs. 74.53)
26. Lemma comparison
Lemma MFS (68.63) LDA (74.53) Variation Annotations
año 89.15 91.19 2.04 1275
país 72.29 83.55 11.26 695
presidente 70.31 73.94 3.63 690
partido 55.87 64.48 8.61 641
equipo 98.32 98.88 0.56 539
mes 54.29 80 25.71 315
hora 61.39 56.11 -5.28 305
caso 61.05 91.58 30.53 286
mundo 47.31 40.14 -7.17 279
semana 85.06 92.34 7.28 263
Most frequent lemmas
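The Variation column is the difference between the LDA and MFS accuracies of each lemma. The sketch below recomputes it from the figures in the table and lists the lemmas that improve; the helper names are illustrative.

```python
from typing import Dict, List, Tuple

# lemma -> (MFS accuracy, LDA accuracy, number of annotations), copied from the table.
LEMMAS: Dict[str, Tuple[float, float, int]] = {
    "año": (89.15, 91.19, 1275),
    "país": (72.29, 83.55, 695),
    "presidente": (70.31, 73.94, 690),
    "partido": (55.87, 64.48, 641),
    "equipo": (98.32, 98.88, 539),
    "mes": (54.29, 80.00, 315),
    "hora": (61.39, 56.11, 305),
    "caso": (61.05, 91.58, 286),
    "mundo": (47.31, 40.14, 279),
    "semana": (85.06, 92.34, 263),
}


def variation(mfs_acc: float, lda_acc: float) -> float:
    """Accuracy points gained (positive) or lost (negative) by LDA over MFS."""
    return round(lda_acc - mfs_acc, 2)


def improved(lemmas: Dict[str, Tuple[float, float, int]]) -> List[str]:
    """Lemmas for which the LDA system beats the MFS baseline."""
    return [lemma for lemma, (mfs, lda, _) in lemmas.items() if variation(mfs, lda) > 0]
```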
28. Conclusions
Simple approach based on LDA for WSD in Spanish
Two-step classification approach for WSD improves the results for Spanish (about 6 points over the MFS baseline)
The two cases are different in nature
MFS: contexts of 5 sentences, 100 topics
Non-MFS: the local sentence as context, 3 topics
All code and data publicly available on GitHub (group policy)
http://github.com/rubenIzquierdo/lda_wsd