Biomedical Word Sense Disambiguation with
Neural Word and Concept Embedding
Department of Computer Science
University of Kentucky
Oct 7, 2016
AKM Sabbir
Advisor: Dr. Ramakanth Kavuluru
10/27/2016 1
Outline
 Introduction
• Application of Word Sense Disambiguation (WSD)
• Motivation
• Related Methods to Solve WSD
• Our Method
• Word Vectors
• Tools Used
• Our Method in Detail
• Experiment and Analysis
• Conclusion
Introduction
• WSD is the task of detecting the correct sense of an ambiguous word in context, or assigning the proper sense
– The air in the center of the vortex of a cyclone is generally very cold
– I could not come to the office last week because I had a cold
• Retrieving information from text by machine is not an easy task
• A number of Natural Language Processing (NLP) tasks require WSD
Outline
• Introduction
 Application of Word Sense Disambiguation (WSD)
• Motivation
• Related Methods to Solve WSD
• Our Method
• Word Vectors
• Tools Used
• Our Method in Detail
• Experiment and Analysis
• Conclusion
Application
• Text-to-Speech Conversion
– Bass can be pronounced like base (the musical sense) or to rhyme with mass (the fish)
• Machine Translation
– The French word grille can be translated as gate or bar
• Information Retrieval
• Named Entity Recognition
• Document Summary Generation
Outline
• Introduction
• Application of Word Sense Disambiguation (WSD)
 Motivation
• Related Methods to Solve WSD
• Our Method
• Word Vectors
• Tools Used
• Our Method in Detail
• Experiment and Analysis
• Conclusion
Motivation
• Generalized WSD is a difficult problem
• A practical strategy is to solve it for each domain separately
• The biomedical domain contains a large number of ambiguous words
• Medical report summary generation
• Drug side effect prediction
Outline
• Introduction
• Application of Word Sense Disambiguation (WSD)
• Motivation
 Related Methods to Solve WSD
• Our Method
• Word Vectors
• Tools Used
• Our Method in Detail
• Experiment and Analysis
• Conclusion
Related Methods
• Supervised Methods
– Support Vector Machines, Convolutional Neural Nets
• Unsupervised Methods
– Clustering, generative models
– Example: if the vocabulary has four words w1, w2, w3, w4, co-occurrence statistics can be tabulated as below
• Knowledge-Based Methods
– WordNet, UMLS (Unified Medical Language System)
      w1    w2    …    w4
w1    1/5   2/5   …    0
w2    …
…
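The row-normalized co-occurrence table above can be sketched in a few lines of Python; the toy corpus and the window size here are illustrative assumptions, not part of any particular method:

```python
from collections import defaultdict

def cooccurrence_rows(sentences, window=2):
    """Build row-normalized word co-occurrence statistics: entry
    rows[wi][wj] is the fraction of wi's co-occurrences involving wj,
    so each row sums to 1 (as in the w1..w4 table above)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        for i, wi in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[wi][sent[j]] += 1
    rows = {}
    for wi, neigh in counts.items():
        total = sum(neigh.values())
        rows[wi] = {wj: c / total for wj, c in neigh.items()}
    return rows

# hypothetical toy corpus
rows = cooccurrence_rows([["w1", "w2", "w3"], ["w1", "w2", "w4"]])
```

Unsupervised methods then cluster or model these statistics to separate the contexts in which different senses occur.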
Outline
• Introduction
• Application of Word Sense Disambiguation (WSD)
• Motivation
• Related Methods to Solve WSD
 Our Method
• Word Vectors
• Tools Used
• Our Method in Detail
• Experiment and Analysis
• Conclusion
Our Method
• We build a semi-supervised model
• The model uses concept/sense/CUI vectors just as word vectors are commonly used (more later)
• MetaMap is a knowledge-based NER tool; we use its decisions to generate concept vectors
• The model also uses P(w|c), where c is a concept or sense, estimated using other knowledge-based approaches
• Word vectors are generated from an unstructured data source
Outline
• Introduction
• Application of Word Sense Disambiguation (WSD)
• Motivation
• Related Methods to Solve WSD
• Our Method
 Word Vectors
• Tools Used
• Our Method in Detail
• Experiment and Analysis
• Conclusion
What is a Word Vector?
• A distributed representation of words
• The representation of a word is spread across all dimensions of the vector
• This differs from one-hot representations whose length equals the vocabulary size; here we choose a small dimension, say d = 200, and generate dense vectors
• Each element of the vector contributes to the representation of many different words

King:  [0.07, 0.05, 0.8, 0.002, 0.1, 0.3]
Queen: [0.7, 0.05, 0.67, 0.002, 0.2, 0.3]
What is a Word Vector?
• It is a numerical representation of a word
• Each dimension captures some semantic and syntactic information related to that word
• Using the same idea, we can generate concept/sense/CUI vectors
Why Do Word Vectors Work?
• Learned word vectors capture the syntactic and semantic information present in text
– vector(“king”) – vector(“man”) + vector(“woman”) ≈ vector(“queen”)
Fig 5: resultant queen vector and other vectors [5]
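The analogy arithmetic above can be sketched with plain Python; the tiny 2-d vectors are hand-made, hypothetical embeddings whose dimensions loosely encode "royalty" and "gender", not learned ones:

```python
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# hypothetical toy embeddings
vecs = {
    "king":   [0.9, -0.8],
    "queen":  [0.9,  0.8],
    "man":    [0.1, -0.9],
    "woman":  [0.1,  0.9],
    "throne": [0.8,  0.0],
    "apple":  [-0.9, 0.0],
}

# king - man + woman, then nearest neighbor by cosine (query words excluded)
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max((w for w in vecs if w not in ("king", "man", "woman")),
           key=lambda w: cos(target, vecs[w]))
```

With real embeddings (e.g. from Gensim's word2vec) the same nearest-neighbor lookup recovers "queen".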
Outline
• Introduction
• Application of Word Sense Disambiguation (WSD)
• Motivation
• Related Methods to Solve WSD
• Our Method
• Word Vectors
 Tools Used
• Our Method in Detail
• Experiment and Analysis
• Conclusion
Required Tools
• Language model (word2vec, via Gensim) for learning word and concept vectors
Required Tools Contd.
• MetaMap
Step 1 Parsing: text is parsed into noun phrases using the Xerox POS tagger to perform syntactic analysis [4].
Step 2 Variant Generation: variants for each input phrase are generated using the knowledge of the SPECIALIST lexicon and a supplementary database of synonyms.
Step 3 Candidate Retrieval: candidate sets retrieved from the UMLS Metathesaurus contain at least one of the variants generated in Step 2.
Step 4 Candidate Evaluation: retrieved candidates are scored against the input phrase.
Fig 2: Variants for the word ocular
Fig 3: Evaluated candidates for ocular complication
Outline
• Introduction
• Application of Word Sense Disambiguation (WSD)
• Motivation
• Related Methods to Solve WSD
• Our Method
• Word Vectors
• Tools Used
 Our Method in Detail
• Experiment and Analysis
• Conclusion
Our Method in Detail
• Text preprocessing
– Remove English stop words
– NLTK word tokenization
– Keep only words with frequency greater than five
– Lowercase everything
• Word context window is ten words
• Generated word vectors are 300-dimensional
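The preprocessing steps above can be sketched as follows; this is a minimal stdlib stand-in (the actual pipeline uses NLTK's tokenizer and stop word list, and a frequency threshold of five rather than the small demo values here):

```python
import re
from collections import Counter

# tiny stand-in stop word list; the real pipeline uses NLTK's English list
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is"}

def preprocess(docs, min_freq=2):
    """Lowercase, tokenize, drop stop words, and keep only tokens whose
    corpus frequency meets the threshold."""
    tokenized = [[t for t in re.findall(r"[a-z0-9]+", d.lower())
                  if t not in STOP_WORDS] for d in docs]
    freq = Counter(t for doc in tokenized for t in doc)
    return [[t for t in doc if freq[t] >= min_freq] for doc in tokenized]
```

The surviving token streams are what get fed to the word2vec model with a ten-word context window and 300 dimensions.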
Generating Word and Concept Vectors
• 20 million citations from PubMed used for training word vectors
• 5 million citations chosen at random
• Retrieved 7.1 million sentences containing the target ambiguous words
• Each sentence is 16–17 words long
• The combined sentences are used to generate bigrams
• Each bigram is fed into MetaMap with the WSD option turned on
• Each bigram is replaced with its corresponding concept
• The resulting data is then fed to the language model to generate concept vectors
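The bigram-to-concept substitution step can be sketched as below; the mapping dict stands in for MetaMap's output, and the bigrams and CUI values shown are hypothetical:

```python
def replace_bigrams_with_cuis(tokens, bigram_to_cui):
    """Greedily scan the token stream and replace word bigrams with
    their assigned concept identifiers (CUIs); unmatched tokens pass
    through unchanged."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in bigram_to_cui:
            out.append(bigram_to_cui[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# hypothetical MetaMap decisions
mapping = {("common", "cold"): "C0009443", ("cold", "virus"): "C0009443"}
result = replace_bigrams_with_cuis(["the", "common", "cold", "virus", "spreads"], mapping)
```

Running word2vec over the resulting mixed word/CUI streams is what yields concept vectors in the same space as the word vectors.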
Estimate P(D|c) [Jimeno-Yepes et al.]
• Jimeno-Yepes and Berlanga [3] use a Markov chain model to calculate P(D|c)
• To get P(D|c), we need to calculate P(w|c):

P(w_j | c_i) = count(w_j, c_i) / Σ_{w_k ∈ lex(c_i)} count(w_k, c_i)
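The count ratio above is a maximum-likelihood estimate and can be sketched directly; the (word, concept) pairs here are illustrative:

```python
from collections import defaultdict

def estimate_p_w_given_c(pairs):
    """Estimate P(w|c) as count(w, c) divided by the total count of all
    words associated with c, mirroring the formula above."""
    counts = defaultdict(lambda: defaultdict(int))
    for w, c in pairs:
        counts[c][w] += 1
    return {c: {w: n / sum(ws.values()) for w, n in ws.items()}
            for c, ws in counts.items()}

# hypothetical (word, concept) observations
p = estimate_p_w_given_c([("cold", "C1"), ("cold", "C1"), ("air", "C1"), ("cold", "C2")])
```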
Biomedical MSH WSD
• A dataset with 203 ambiguous words
• 424 unique concept identifiers (senses)
• 38,495 test context instances, with an average of 200 test instances per ambiguous word
• Goal: correctly identify the sense for each test instance
Model I: Cosine Similarity

f_c(T, w, C(w)) = argmax_{c ∈ C(w)} cos(T_avg, c)

• w is the ambiguous word
• T is the test instance context containing the ambiguous word w; T_avg is the average of the word vectors in T
• C(w) is the set of concepts (senses) that w can assume
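Model I can be sketched in a few lines; the tiny 2-d context and concept vectors are illustrative:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def average(vectors):
    """Element-wise average of the context word vectors (T_avg)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def model_i_disambiguate(context_vectors, concept_vectors):
    """Model I: pick the candidate concept whose vector has the highest
    cosine similarity with the averaged context vector T_avg."""
    t_avg = average(context_vectors)
    return max(concept_vectors, key=lambda c: cosine(t_avg, concept_vectors[c]))
```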
Model II: Projection Magnitude

f_p(T, w, C(w)) = argmax_{c ∈ C(w)} ρ(cos(T_avg, c)) · ‖Pr(T_avg, c)‖ / ‖c‖

• Take the projection Pr(T_avg, c) of T_avg along the concept vector c and consider its Euclidean norm, normalized by ‖c‖; ρ(·) carries the sign of the cosine
Model III

f_{c,p}(T, w, C(w)) = argmax_{c ∈ C(w)} cos(T_avg, c) · ‖Pr(T_avg, c)‖ / ‖c‖

• Combines both the angular and the magnitude components
Model IV

f(T, w, C(w)) = argmax_{c ∈ C(w)} cos(T_avg, c) · ‖Pr(T_avg, c)‖ / ‖c‖ + P(T|c)

• Adds the knowledge-based term P(T|c) to the combined score of Model III
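A minimal sketch of the combined angular-and-magnitude scoring of Models II–IV; the exact form of the projection term and the P(T|c) values used here are illustrative assumptions:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def model_iv_score(t_avg, c, p_t_given_c=0.0):
    """Combined score: cos(T_avg, c) times the length of T_avg's
    projection on c (normalized by ||c||), plus an optional
    knowledge-based term P(T|c) as in Model IV."""
    cos = dot(t_avg, c) / (norm(t_avg) * norm(c))
    proj_len = abs(dot(t_avg, c)) / norm(c)   # length of projection of T_avg on c
    return cos * proj_len / norm(c) + p_t_given_c

def model_iv_disambiguate(t_avg, concept_vectors, p_t=None):
    p_t = p_t or {}
    return max(concept_vectors,
               key=lambda k: model_iv_score(t_avg, concept_vectors[k], p_t.get(k, 0.0)))
```

When the P(T|c) term is large enough it can override the embedding-based score, which is the point of combining the two signals.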
Model V: k-NN
• We now have multiple ways to resolve the sense of an ambiguous term
• Built a distantly supervised dataset by collecting data from biomedical citations
• For each ambiguous word there are, on average, 40,000 sentences
• Resolved the sense of each sentence using Model IV
KNN in Pseudo Code
KNN contd.
f_{k-NN}(T, w, C(w)) = argmax_{c ∈ C(w)} Σ_{(D, w, c) ∈ R_k(D_w)} cos(T_avg, D_avg)

Here R_k(D_w) is the set of k training contexts nearest to the test context; each neighbor votes for its assigned concept with weight equal to its cosine similarity:

Training instance 1 (c_1)   →   (c_1, 0.70)
Training instance 2 (c_1)   →   (c_1, 0.90)
Training instance 3 (c_2)   →   (c_2, 0.10)
Training instance 4 (c_1)   →   (c_1, 0.03)
Training instance 5 (c_2)   →   (c_2, 0.02)
…
Training instance n (c_2)   →   (c_2, 0.12)

Test Instance 1 (sense to be determined)
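The similarity-weighted vote above can be sketched as follows; the labeled contexts are illustrative stand-ins for the distantly supervised data:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_disambiguate(t_avg, labeled, k=3):
    """Model V: rank all distantly labeled training contexts by cosine
    similarity to the test context, keep the k nearest, and let each
    neighbor add its similarity to its concept's tally."""
    nearest = sorted(labeled, key=lambda dc: cosine(t_avg, dc[0]), reverse=True)[:k]
    tally = {}
    for d_avg, concept in nearest:
        tally[concept] = tally.get(concept, 0.0) + cosine(t_avg, d_avg)
    return max(tally, key=tally.get)
```

The deck reports k = 3500 working best on the roughly 40,000 labeled sentences per word; the demo value here is small only for illustration.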
KNN Accuracy graph
Distant Supervision with CNN
• Used the refined assignment of CUIs to sentences as the training set
• Used the MSH WSD data as the test set
• Trained 203 Convolutional Neural Nets, one per ambiguous word
• Each with one convolutional layer and one hidden layer
• Used 900 filters of 3 different sizes
• Evaluated on the MSH WSD test instances
Distant Supervision Using CNN
Ensembling of CNNs
• Five CNNs trained and tested for each ambiguous word
• Average the outputs and take the highest-scoring sense
• Tends to improve the result at the cost of computation
Outline
• Introduction
• Application of Word Sense Disambiguation (WSD)
• Motivation
• Related Methods to Solve WSD
• Our Method
• Word Vectors
• Tools Used
• Our Method in Detail
 Experiment and Analysis
• Conclusion
Results and Analysis

Method                                          Accuracy
Jimeno-Yepes and Berlanga [3]                   89.10%
Cosine similarity (Model I, f_c)                85.54%
Projection length proportion (Model II, f_p)    88.68%
Combining Models I and II (f_{c,p})             89.26%
Combining Models I, II and [3]                  92.24%
Convolutional Neural Net                        86.17%
Ensembling CNNs                                 87.78%
k-NN with k = 3500 (f_{k-NN})                   94.34%
Outline
• Introduction
• Application of Word Sense Disambiguation (WSD)
• Motivation
• Related Methods to Solve WSD
• Our Method
• Word Vectors
• Tools Used
• Our Method in Detail
• Experiment and Analysis
 Conclusion
Conclusion
• The developed model is highly accurate, beating the previous best
• It is distantly supervised, requiring no hand-labeled information
• It is scalable, though the accuracy at larger scales is uncertain
– By increasing the number of training sentences and the sentence context, more information may be extractable
• Graph-based algorithms need to be explored
• Tools: HPC, Theano, NLTK, Gensim Word2Vec
Questions
References
1. Eneko Agirre and Philip Edmonds. Word Sense Disambiguation: Algorithms and Applications, volume 33. Springer Science & Business Media, 2007.
2. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155, 2003.
3. Antonio Jimeno-Yepes and Rafael Berlanga. Knowledge based word-concept model estimation and refinement for biomedical text mining. Journal of Biomedical Informatics, 53:300–307, 2015.
4. Alan R. Aronson. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2001.
5. https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
References
6. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

More Related Content

Viewers also liked

Word Sense Disambiguation and Induction
Word Sense Disambiguation and InductionWord Sense Disambiguation and Induction
Word Sense Disambiguation and InductionLeon Derczynski
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationMarina Santini
 
Word Sense Disambiguation and Intelligent Information Access
Word Sense Disambiguation and Intelligent Information AccessWord Sense Disambiguation and Intelligent Information Access
Word Sense Disambiguation and Intelligent Information AccessPierpaolo Basile
 
COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...
COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...
COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...Pierpaolo Basile
 
Graph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationGraph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationElena-Oana Tabaranu
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Innovation Quotient Pvt Ltd
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationvini89
 
Word sense disambiguation a survey
Word sense disambiguation a surveyWord sense disambiguation a survey
Word sense disambiguation a surveyunyil96
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationvini89
 
Error analysis of Word Sense Disambiguation
Error analysis of Word Sense DisambiguationError analysis of Word Sense Disambiguation
Error analysis of Word Sense DisambiguationRubén Izquierdo Beviá
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksLeonardo Di Donato
 
Ontology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureOntology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureeXascale Infolab
 
Babelfy: Entity Linking meets Word Sense Disambiguation.
Babelfy: Entity Linking meets Word Sense Disambiguation.Babelfy: Entity Linking meets Word Sense Disambiguation.
Babelfy: Entity Linking meets Word Sense Disambiguation.Grupo HULAT
 
Sifting Social Data: Word Sense Disambiguation Using Machine Learning
Sifting Social Data: Word Sense Disambiguation Using Machine LearningSifting Social Data: Word Sense Disambiguation Using Machine Learning
Sifting Social Data: Word Sense Disambiguation Using Machine LearningStuart Shulman
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisFabio Benedetti
 

Viewers also liked (18)

Meghana_Resume_FullTime_May_2016
Meghana_Resume_FullTime_May_2016Meghana_Resume_FullTime_May_2016
Meghana_Resume_FullTime_May_2016
 
Word Sense Disambiguation and Induction
Word Sense Disambiguation and InductionWord Sense Disambiguation and Induction
Word Sense Disambiguation and Induction
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Sunil_Resume
Sunil_ResumeSunil_Resume
Sunil_Resume
 
Word Sense Disambiguation and Intelligent Information Access
Word Sense Disambiguation and Intelligent Information AccessWord Sense Disambiguation and Intelligent Information Access
Word Sense Disambiguation and Intelligent Information Access
 
COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...
COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...
COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...
 
Graph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationGraph-based Word Sense Disambiguation
Graph-based Word Sense Disambiguation
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguation
 
Word sense disambiguation a survey
Word sense disambiguation a surveyWord sense disambiguation a survey
Word sense disambiguation a survey
 
Word-sense disambiguation
Word-sense disambiguationWord-sense disambiguation
Word-sense disambiguation
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguation
 
Error analysis of Word Sense Disambiguation
Error analysis of Word Sense DisambiguationError analysis of Word Sense Disambiguation
Error analysis of Word Sense Disambiguation
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
 
Ontology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureOntology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific Literature
 
Babelfy: Entity Linking meets Word Sense Disambiguation.
Babelfy: Entity Linking meets Word Sense Disambiguation.Babelfy: Entity Linking meets Word Sense Disambiguation.
Babelfy: Entity Linking meets Word Sense Disambiguation.
 
Sifting Social Data: Word Sense Disambiguation Using Machine Learning
Sifting Social Data: Word Sense Disambiguation Using Machine LearningSifting Social Data: Word Sense Disambiguation Using Machine Learning
Sifting Social Data: Word Sense Disambiguation Using Machine Learning
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment Analysis
 

Similar to Biomedical Word Sense Disambiguation presentation [Autosaved]

Approach to leverage Websites to APIs through Semantics
Approach to leverage Websites to APIs through SemanticsApproach to leverage Websites to APIs through Semantics
Approach to leverage Websites to APIs through SemanticsIoannis Stavrakantonakis
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categoriesWarNik Chow
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationSeonghyun Kim
 
Linked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms DiscoveryLinked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms DiscoveryIoannis Stavrakantonakis
 
Unit_4- Principles of AI explaining the importants of AI
Unit_4- Principles of AI explaining the importants of AIUnit_4- Principles of AI explaining the importants of AI
Unit_4- Principles of AI explaining the importants of AIVijayAECE1
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language ProcessingSebastian Ruder
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Sergey Nikolenko and Anton Alekseev User Profiling in Text-Based Recommende...
Sergey Nikolenko and  Anton Alekseev  User Profiling in Text-Based Recommende...Sergey Nikolenko and  Anton Alekseev  User Profiling in Text-Based Recommende...
Sergey Nikolenko and Anton Alekseev User Profiling in Text-Based Recommende...AIST
 
A NEW APPROACH BASED ON THE DETECTION OF OPINION BY SENTIWORDNET FOR AUTOMATI...
A NEW APPROACH BASED ON THE DETECTION OF OPINION BY SENTIWORDNET FOR AUTOMATI...A NEW APPROACH BASED ON THE DETECTION OF OPINION BY SENTIWORDNET FOR AUTOMATI...
A NEW APPROACH BASED ON THE DETECTION OF OPINION BY SENTIWORDNET FOR AUTOMATI...cscpconf
 
A new approach based on the detection of opinion by sentiwordnet for automati...
A new approach based on the detection of opinion by sentiwordnet for automati...A new approach based on the detection of opinion by sentiwordnet for automati...
A new approach based on the detection of opinion by sentiwordnet for automati...csandit
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPMachine Learning Prague
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFJayavardhan Reddy Peddamail
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017Shuai Zhang
 

Similar to Biomedical Word Sense Disambiguation presentation [Autosaved] (20)

Word 2 vector
Word 2 vectorWord 2 vector
Word 2 vector
 
Approach to leverage Websites to APIs through Semantics
Approach to leverage Websites to APIs through SemanticsApproach to leverage Websites to APIs through Semantics
Approach to leverage Websites to APIs through Semantics
 
Word2vector
Word2vectorWord2vector
Word2vector
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword Information
 
Linked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms DiscoveryLinked Open Vocabulary Ranking and Terms Discovery
Linked Open Vocabulary Ranking and Terms Discovery
 
Unit_4- Principles of AI explaining the importants of AI
Unit_4- Principles of AI explaining the importants of AIUnit_4- Principles of AI explaining the importants of AI
Unit_4- Principles of AI explaining the importants of AI
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
MUMS Opening Workshop - The Isaac Newton Institute Uncertainty Quantification...
MUMS Opening Workshop - The Isaac Newton Institute Uncertainty Quantification...MUMS Opening Workshop - The Isaac Newton Institute Uncertainty Quantification...
MUMS Opening Workshop - The Isaac Newton Institute Uncertainty Quantification...
 
Sergey Nikolenko and Anton Alekseev User Profiling in Text-Based Recommende...
Sergey Nikolenko and  Anton Alekseev  User Profiling in Text-Based Recommende...Sergey Nikolenko and  Anton Alekseev  User Profiling in Text-Based Recommende...
Sergey Nikolenko and Anton Alekseev User Profiling in Text-Based Recommende...
 
Searching for the Best Machine Translation Combination
Searching for the Best Machine Translation CombinationSearching for the Best Machine Translation Combination
Searching for the Best Machine Translation Combination
 
A NEW APPROACH BASED ON THE DETECTION OF OPINION BY SENTIWORDNET FOR AUTOMATI...
A NEW APPROACH BASED ON THE DETECTION OF OPINION BY SENTIWORDNET FOR AUTOMATI...A NEW APPROACH BASED ON THE DETECTION OF OPINION BY SENTIWORDNET FOR AUTOMATI...
A NEW APPROACH BASED ON THE DETECTION OF OPINION BY SENTIWORDNET FOR AUTOMATI...
 
A new approach based on the detection of opinion by sentiwordnet for automati...
A new approach based on the detection of opinion by sentiwordnet for automati...A new approach based on the detection of opinion by sentiwordnet for automati...
A new approach based on the detection of opinion by sentiwordnet for automati...
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
GKumarAICS
GKumarAICSGKumarAICS
GKumarAICS
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017
 

Biomedical Word Sense Disambiguation presentation [Autosaved]

  • 1. Biomedical Word Sense Disambiguation with Neural Word and Concept Embedding Department of Computer Science University of Kentucky Oct 7, 2016 AKM Sabbir Advisor, Dr. Ramakanth Kavuluru 10/27/2016 1
  • 2. Outline  Introduction • Application of Word Sense Disambiguation(WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion 10/27/2016 2
  • 3. Introduction • WSD is the task of detecting correct sense or assigning proper sense – the air in the center of the vortex of a cyclone is generally very cold – I Could not come to office last week because I had a cold • Retrieving information from Machine is not easy task • Number of Natural Language Processing(NLP) tasks require WSD 10/27/2016 3
  • 4. Outline • Introduction  Application of Word Sense Disambiguation(WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion 10/27/2016 4
  • 5. Application • Text to Speech Conversion – Bass can be pronounced either base or baes • Machine Translation – French Word Grille can be translated into gate or bar • Information Retrieval • Named Entity Recognition • Document Summary Generation 10/27/2016 5
  • 6. Outline • Introduction • Application of Word Sense Disambiguation(WSD)  Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Approach • Experiment and Analysis • Conclusion 10/27/2016 6
  • 7. Motivation • Generalized WSD is a difficult problem • Solve it for each domain • Biomedical domain contains a large number of ambiguous words • Medical report summary generation • Drug side effect prediction 10/27/2016 7
  • 8. Outline • Introduction • Application of Word Sense Disambiguation(WSD) • Motivation  Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion 10/27/2016 8
  • 9. Related Method • Supervised Methods – Support Vector Machines, Convolutional Neural Net • Unsupervised Methods – Clustering, generative model – If vocabulary has four words w1, w2, w3, w4 • Knowledge Based Methods – WordNet, UMLS(Unified Medical Language System) 10/27/2016 9 w1 … w4 w1 1/5 2/5 0 w2 …
  • 10. Outline • Introduction • Application of Word Sense Disambiguation(WSD) • Motivation • Related Methods to Solve WSD  Our Method • Word Vectors • Tools Used • Our Approach • Experiment and Analysis • Conclusion 10/27/2016 10
  • 11. Our Method • We build a semi supervised model • Model involves usage of concept/sense/CUI vectors just like how people use word vectors (more later) • Metamap is an knowledge based NER tool. We use its decisions is used to generate concept vectors • Model also involves the usage of P(w|c) where c is a concept or sense generated using other knowledge based approaches • Generated word vectors using unstructured data source 10/27/2016 11
  • 12. Outline • Introduction • Application of Word Sense Disambiguation(WSD) • Motivation • Related Methods to Solve WSD • Our Approach  Word Vectors • Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion 10/27/2016 12
  • 13. What is Word Vector • Distributed representation of words • Representation of word spread across all dimension of vector • The idea is different from other representation where the length is equal to the vocabulary size. Here we choose a small dimension say d=200 and generate dense vectors • Each element of the vector contributes to the definition of many different words 10/27/2016 13 0.07 0.05 0.8 0.002 0.1 0.3King 0.7 0.05 0.67 0.002 0.2 0.3Queen
  • 14. What is Word Vector • It is a numerical way of word representation • Where each dimension captures some semantic and syntactic information related to that word • Using the similar idea we can generate concept/sense/CUI vectors. 10/27/2016 14
  • 15. Why Word Vectors Work ? • Learned word vectors capture the syntactic and semantic information exist in text – vector(“king”) – vector(“man”) + vector(“woman”) ≈ vector(“queen”) 10/27/2016 15 Fig 5: resultant queen vector and other vectors [5]
  • 16. Outline • Introduction • Application of Word Sense Disambiguation(WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors  Tools Used • Our Method in Detail • Experiment and Analysis • Conclusion 10/27/2016 16
  • 17. Required Tools • language model 10/27/2016 17
  • 18. Required Tools Contd. 10/27/2016 18 Step1 Parsing: text parsed in noun phrases using xerox POS tagger to perform syntactic analysis [4]. Step2 Variant Generation: Varaint for each input phrase are generated using the knowledge of specialist lexicons and supplementary database of synonyms Step3 Candidate Retrieval: the candidate sets retrieved from the UMLS metathesaurus contains at least One of the variants generated from step three Step 4 Candidate Evaluation Fig2 : Variants for word ocular Fig3 : evaluated candidates for Ocular complication • Metamap
  • 19. Outline • Introduction • Application of Word Sense Disambiguation(WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used  Our Method in Detail • Experiment and Analysis • Conclusion 10/27/2016 19
  • 20. Our Method in Detail • Text preprocessing – English stop words – Nltk word tokenization – Frequency greater than five – Lower case everything • Word context is ten words long • Generated word vectors are 300 dimension 10/27/2016 20
  • 21. Generating word and concept vectors • Generate word and concept vectors • 20 million citations from pubmed for training word vectors • Randomly chosen 5 million citations • Retrieved 7.1 million sentences containing target ambiguous words • Each sentence is 16-17 words long • Combined sentence are used to generate the bigrams • Each bigrams fed into metamap with WSD option turned on • Replace each bigram with corresponding concepts • Then fed the data to language model to generate concept vector10/27/2016 21
  • 22. Estimate P(D|c) [Yepes et al.] • Using Jimeno-Yepes and Berlanga[3] model used Markov Chain to calculate P(D|c) • In order to get P(D|c), need to calculate P(w|c) – 𝑃 𝑤𝑗 𝑐𝑖 = 𝑐𝑜𝑢𝑛𝑡(𝑤 𝑗,𝑐𝑖) 𝑤 𝑗∈𝑙𝑒𝑥(𝑐 𝑖) 𝑐𝑜𝑢𝑛𝑡(𝑤 𝑗, 𝑐 𝑖) 10/27/2016 22
  • 23. Biomedical MSH WSD • A dataset with 203 ambiguous words • 424 unique concept identifiers (senses) • 38,495 test context instances, with an average of 200 test instances per ambiguous word • Goal: correctly identify the sense of each test instance
  • 24. Model I: Cosine Similarity f_c(T, w, C(w)) = argmax_{c ∈ C(w)} cos(T_avg, c) • w is the ambiguous word • T is the test instance context containing the ambiguous word w • C(w) is the set of concepts that w can assume • T_avg is the average of the word vectors in T
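Model I can be sketched as: pick the candidate concept whose vector has the highest cosine similarity with the averaged context vector T_avg. The vectors below are tiny toy examples, not real 300-dimensional embeddings.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def disambiguate(context_vectors, candidates):
    """context_vectors: word vectors for the test context T.
    candidates: dict mapping concept id -> concept vector C(w)."""
    t_avg = np.mean(context_vectors, axis=0)  # T_avg
    return max(candidates, key=lambda c: cosine(t_avg, candidates[c]))
```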
  • 25. Model II: Projection Magnitude f_p(T, w, C(w)) = argmax_{c ∈ C(w)} ‖proj_c(T_avg)‖ • Take the projection of T_avg along the concept vector c and consider its Euclidean norm
  • 26. Model III f_{c,p}(T, w, C(w)) = argmax_{c ∈ C(w)} cos(T_avg, c) · ‖proj_c(T_avg)‖ • Combines both the angular and the magnitude components
  • 27. Model IV f(T, w, C(w)) = argmax_{c ∈ C(w)} cos(T_avg, c) · ‖proj_c(T_avg)‖ + P(T|c)
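The projection-based scores in Models II and III above can be sketched as follows; vectors are toy 2-d examples for illustration.

```python
import numpy as np

def proj_norm(t_avg, c):
    # Euclidean norm of the projection of t_avg onto the direction of c
    # (the magnitude component used in Model II).
    return abs(float(np.dot(t_avg, c))) / float(np.linalg.norm(c))

def combined_score(t_avg, c):
    # Model III: angular component (cosine) times magnitude component
    # (projection norm).
    cos = float(np.dot(t_avg, c)) / (np.linalg.norm(t_avg) * np.linalg.norm(c))
    return cos * proj_norm(t_avg, c)
```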
  • 28. Model V: k-NN • We now have multiple ways to resolve the sense of an ambiguous term • Built a distantly supervised dataset by collecting data from biomedical citations • For each ambiguous word there are on average 40,000 sentences • Resolved the sense of each sentence using Model IV
  • 29. KNN in Pseudo Code
  • 30. KNN contd. f_{k-NN}(T, w, C(w)) = argmax_{c ∈ C(w)} Σ_{(D, w, c) ∈ R_k(D_w)} cos(T_avg, D_avg) Fig: a test instance is compared against every training instance by cosine similarity; the k nearest training instances vote for their concept labels (e.g., training instance 2 with label c_1 scores 0.9, training instance 3 with label c_2 scores 0.1).
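The k-NN rule above can be sketched as: score each candidate concept by summing the cosine similarities between the test context and its distantly labeled training contexts among the k nearest, then pick the highest-scoring concept. Toy vectors stand in for real averaged context vectors.

```python
import numpy as np
from collections import defaultdict

def knn_disambiguate(t_avg, training, k):
    """training: list of (d_avg_vector, concept_label) pairs."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Rank all training contexts by similarity to the test context.
    sims = sorted(((cosine(t_avg, d), c) for d, c in training), reverse=True)
    scores = defaultdict(float)
    for sim, c in sims[:k]:  # only the k nearest neighbors vote
        scores[c] += sim
    return max(scores, key=scores.get)
```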
  • 32. Distant Supervision with CNN • Used the refined assignment of CUIs to sentences as the training set • Used the MSH WSD data as the test set • Trained 203 convolutional neural networks, one per ambiguous word • Each with one convolutional layer and one hidden layer • Used 900 filters of 3 different sizes • Evaluated on the test instances
  • 33. Distant Supervision Using CNN
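A toy forward pass for the text CNN described above: one convolutional layer with filters of three widths, max-over-time pooling, one hidden layer, and a softmax over the candidate senses. Dimensions are shrunk for illustration (the slides use 900 filters of 3 sizes over 300-d word vectors), and weights are random, i.e., untrained.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, N_FILT, WIDTHS, HIDDEN, SENSES = 10, 4, (2, 3, 4), 8, 2

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(sent_vectors, params):
    pooled = []
    for w, filt in zip(WIDTHS, params["filters"]):
        # Convolution: one dot product per window position per filter.
        windows = [sent_vectors[i:i + w].ravel()
                   for i in range(len(sent_vectors) - w + 1)]
        feats = np.array(windows) @ filt      # (positions, N_FILT)
        pooled.append(feats.max(axis=0))      # max-over-time pooling
    h = np.tanh(np.concatenate(pooled) @ params["W_h"])  # hidden layer
    return softmax(h @ params["W_out"])       # distribution over senses

params = {
    "filters": [rng.normal(size=(w * EMB, N_FILT)) for w in WIDTHS],
    "W_h": rng.normal(size=(N_FILT * len(WIDTHS), HIDDEN)),
    "W_out": rng.normal(size=(HIDDEN, SENSES)),
}
sentence = rng.normal(size=(7, EMB))  # 7 toy word vectors
probs = forward(sentence, params)
```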
  • 34. Ensembling of CNNs • Train and test five CNNs for each ambiguous word • Average their outputs and take the highest-scoring sense • Tends to improve the result at the cost of computation
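The ensembling step amounts to averaging the per-sense probability vectors of the independently trained CNNs and taking the argmax; the probabilities below are made up for illustration.

```python
import numpy as np

def ensemble_predict(prob_vectors):
    # Average the sense distributions from several CNNs, then pick
    # the sense with the highest mean probability.
    avg = np.mean(prob_vectors, axis=0)
    return int(np.argmax(avg))
```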
  • 35. Outline • Introduction • Application of Word Sense Disambiguation (WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Approach  Experiment and Analysis • Conclusion
  • 36. Results and Analysis (method: accuracy) • Jimeno-Yepes and Berlanga [3]: 89.10% • Cosine similarity (Model I, f_c): 85.54% • Projection length proportion (Model II, f_p): 88.68% • Combining Models I and II (f_{c,p}): 89.26% • Combining Models I, II and [3]: 92.24% • Convolutional neural net: 86.17% • Ensembling CNNs: 87.78% • k-NN with k = 3500 (f_{k-NN}): 94.34%
  • 37. Outline • Introduction • Application of Word Sense Disambiguation (WSD) • Motivation • Related Methods to Solve WSD • Our Method • Word Vectors • Tools Used • Our Approach • Experiment and Analysis  Conclusion
  • 38. Conclusion • The developed model is highly accurate, beating the previous best • It is unsupervised: no hand-labeled data is required • It is scalable, though the resulting accuracy is uncertain – by increasing the number of training sentences and the sentence context, more information may be extractable • Graph-based algorithms need to be explored • Tools: HPC, Theano, NLTK, Gensim Word2Vec
  • 40. References 1. Eneko Agirre and Philip Edmonds. Word Sense Disambiguation: Algorithms and Applications, volume 33. Springer Science & Business Media, 2007. 2. Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137-1155, 2003. 3. Antonio Jimeno Yepes and Rafael Berlanga. Knowledge based word-concept model estimation and refinement for biomedical text mining. Journal of Biomedical Informatics, 53:300-307, 2015. 4. Alan R. Aronson. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2001. 5. https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
  • 41. References 6. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.