Usage of Word Sense Disambiguation in Concept Identification in Ontology Construction
Guest Talk at University of Moratuwa, Department of Computer Science and Engineering
5th November, 2016
Presented by: Kiruparan Balachandran
Background Information - Ontology
An ontology provides a way to describe domain knowledge.
Figure: example ontology linking the concepts algorithm, sorting algorithm, problem, and complexity through the relations "is a", "solve", and "has".
Background Information - Ontology learning layer-cake approach
• Terms: {randomized algorithm, sorting algorithm, system software, application software}
• Synonyms: {randomized algorithm, sorting algorithm}, {system software, application software}
• Concepts: Algorithm (I, E, L), i.e. its intension, extension, and linguistic realizations
• Concept Hierarchy: isA(sorting algorithm, algorithm), known as a taxonomic relationship
• Relations: solve(algorithm, problem), known as a non-taxonomic relationship
• Rules: isA(sorting algorithm, algorithm) -> solve(sorting algorithm, problem)
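The rules layer can be illustrated with a minimal Python sketch that applies the slide's example rule: if isA(x, y) and solve(y, z) hold, infer solve(x, z). The triple representation and the function name `apply_inheritance_rule` are hypothetical, chosen only for illustration.

```python
from typing import Set, Tuple

# (relation, subject, object), e.g. ("isA", "sorting algorithm", "algorithm")
Triple = Tuple[str, str, str]


def apply_inheritance_rule(facts: Set[Triple], relation: str) -> Set[Triple]:
    """Apply isA(x, y) and relation(y, z) -> relation(x, z) until no new facts appear."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        is_a = [(s, o) for (r, s, o) in inferred if r == "isA"]
        rel = [(s, o) for (r, s, o) in inferred if r == relation]
        for child, parent in is_a:
            for subj, obj in rel:
                new_fact = (relation, child, obj)
                if subj == parent and new_fact not in inferred:
                    inferred.add(new_fact)
                    changed = True
    return inferred


facts = {
    ("isA", "sorting algorithm", "algorithm"),  # concept hierarchy layer
    ("solve", "algorithm", "problem"),          # relations layer
}
result = apply_inheritance_rule(facts, "solve")
print(("solve", "sorting algorithm", "problem") in result)  # True
```

Because the loop repeats until nothing new is inferred, deeper hierarchies (a subclass of sorting algorithm, say) also inherit the relation; the fixed-point loop favours simplicity over efficiency.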
The implemented approach follows the criteria of Buitelaar et al. for forming concepts from terms:
• An intensional definition of the concept
  • Formal definition: a term can be considered a concept if it is linked to another term by a valid relation.
  • Informal definition: a term should have a textual description.
• A set of concept instances, i.e. its extension: a term can be considered a concept if it has instances.
• A set of linguistic realizations.
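One way to read these criteria as code is a small checker that reports which criteria a candidate term satisfies. The `Term` fields, the criterion labels, and the toy data are hypothetical; the talk does not say how the criteria are combined, so the sketch only reports them rather than making a final decision.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Term:
    text: str
    related_terms: Set[str] = field(default_factory=set)  # terms linked by a valid relation
    description: str = ""                                  # textual (informal) definition
    instances: List[str] = field(default_factory=list)     # extension
    realizations: Set[str] = field(default_factory=set)    # linguistic realizations (surface forms)


def satisfied_criteria(term: Term) -> List[str]:
    """Return the names of the concept-formation criteria the term satisfies."""
    met = []
    if term.related_terms:
        met.append("formal definition")
    if term.description.strip():
        met.append("informal definition")
    if term.instances:
        met.append("extension")
    if term.realizations:
        met.append("linguistic realizations")
    return met


# Illustrative toy term, not taken from the talk's data.
algorithm = Term(
    text="algorithm",
    related_terms={"problem"},
    description="a precise rule (or set of rules) specifying how to solve some problem",
    instances=["quicksort", "merge sort"],
    realizations={"algorithm", "algorithms"},
)
print(satisfied_criteria(algorithm))
```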
Need for WSD in forming concepts from terms
• Iterate over each sentence (ts) in the corpus.
• Identify the subject phrase and the object phrase in each sentence.
• Check whether the full subject phrase (ts) and object phrase (to), or part of them, exist in the list of domain-specific terms.
• Feed ts and to separately (each referred to as t), together with the sentence ts.
• Retrieve the list of senses that exist in WordNet for t.
• Identify the sense tsense related to the domain from that list (disambiguating the sense).
• If tsense exists for both, the tsense of ts and of to are candidates for domain-specific concepts.
For example, ts = "we propose a hardware design, call the virtual line scheme, that allows the utilization of large virtual cache line when fetch datum from memory for better exploitation of spatial locality"
WordNet lists three noun senses for cache: cache#n#1, cache#n#2, and cache#n#3.
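The steps above can be sketched in Python as follows. Phrase extraction, the domain term list, the sense inventory, and the disambiguator are hypothetical stand-ins passed in as plain data and callables (a real system would use a parser and WordNet); taking t to be the matched part of each phrase is also an assumption. Only the control flow mirrors the slide.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


def matched_term(phrase: str, domain_terms: Set[str]) -> Optional[str]:
    """Longest contiguous part of the phrase (possibly all of it) found in the domain term list."""
    words = phrase.lower().split()
    best: Optional[str] = None
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            part = " ".join(words[i:j])
            if part in domain_terms and (best is None or len(part) > len(best)):
                best = part
    return best


def candidate_concepts(
    triples: Iterable[Tuple[str, str, str]],
    domain_terms: Set[str],
    senses_of: Callable[[str], List[str]],
    disambiguate: Callable[[str, str, List[str]], Optional[str]],
) -> List[Tuple[str, str]]:
    """Each triple is (sentence ts, subject phrase, object phrase).

    Returns (subject tsense, object tsense) pairs that are candidate domain-specific concepts.
    """
    candidates = []
    for sentence, subj_phrase, obj_phrase in triples:
        subj_t = matched_term(subj_phrase, domain_terms)
        obj_t = matched_term(obj_phrase, domain_terms)
        if subj_t is None or obj_t is None:
            continue  # both phrases must (at least partly) match a domain-specific term
        # Feed each term t separately, together with the sentence, to the disambiguator.
        subj_sense = disambiguate(subj_t, sentence, senses_of(subj_t))
        obj_sense = disambiguate(obj_t, sentence, senses_of(obj_t))
        if subj_sense is not None and obj_sense is not None:  # tsense exists for both
            candidates.append((subj_sense, obj_sense))
    return candidates


# Toy data (hypothetical). The memory sense numbers are illustrative only.
SENTENCE = ("we propose a hardware design, call the virtual line scheme, that allows the "
            "utilization of large virtual cache line when fetch datum from memory for "
            "better exploitation of spatial locality")
DOMAIN_TERMS = {"cache", "memory", "hardware design"}
SENSES: Dict[str, List[str]] = {
    "cache": ["cache#n#1", "cache#n#2", "cache#n#3"],
    "memory": ["memory#n#1", "memory#n#2"],
}
DOMAIN_SENSES = {"cache#n#3", "memory#n#2"}


def toy_disambiguate(t: str, sentence: str, senses: List[str]) -> Optional[str]:
    """Stand-in for the real WSD step: first sense marked as domain-related, if any."""
    return next((s for s in senses if s in DOMAIN_SENSES), None)


triples = [
    (SENTENCE, "large virtual cache line", "memory"),
    (SENTENCE, "we", "a hardware design"),  # subject matches no domain term: skipped
]
result = candidate_concepts(triples, DOMAIN_TERMS, lambda t: SENSES.get(t, []), toy_disambiguate)
print(result)  # [('cache#n#3', 'memory#n#2')]
```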
Which algorithm is best suited?
• LESK
  • Original LESK
    • Uses the definition (gloss) of a word sense as the only source of contextual information for that sense
    • Suffers from combinatorial explosion
    • Simulated annealing has been used to address this
  • Simplified LESK
    • Solves the combinatorial explosion
    • Runs a separate disambiguation process for each ambiguous word in the input text
  • Adapted LESK
    • Enlarges the context: considers hypernyms, hyponyms, holonyms, meronyms, troponyms, attribute relations, and their associated definitions
Drawback: lower accuracy
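Simplified LESK can be sketched as follows: for one ambiguous word, pick the sense whose gloss shares the most words with the surrounding context. The stop-word list and the regex tokenizer are simplifying assumptions; the glosses are the three WordNet glosses of cache used in this talk.

```python
import re
from typing import Dict, Optional, Set

# Small stop-word list (assumption, not part of the algorithm as described on the slide).
STOP_WORDS = {"a", "an", "the", "of", "or", "for", "from", "to", "that", "is",
              "as", "and", "which", "with", "when", "we", "between", "used"}


def content_words(text: str) -> Set[str]:
    """Lower-cased alphabetic tokens, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def simplified_lesk(context: str, glosses: Dict[str, str]) -> Optional[str]:
    """Pick the sense whose gloss shares the most words with the context (first sense wins ties)."""
    ctx = content_words(context)
    best_sense: Optional[str] = None
    best_overlap = -1
    for sense, gloss in glosses.items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


CACHE_GLOSSES = {
    "cache#n#1": "a hidden storage space for money or provisions or weapons",
    "cache#n#2": "a secret store of valuables or money",
    "cache#n#3": ("RAM memory that is set aside as a specialized buffer storage, which is "
                  "continually updated; used to optimize data transfers between system "
                  "elements with different characteristics"),
}
context = ("we propose a hardware design, call the virtual line scheme, that allows the "
           "utilization of large virtual cache line when fetch datum from memory for "
           "better exploitation of spatial locality")
print(simplified_lesk(context, CACHE_GLOSSES))  # cache#n#3 (shared word: "memory")
```

Each ambiguous word is handled on its own against the context words. Original LESK instead searches for the combination of senses of all context words whose glosses overlap most, which is where the combinatorial explosion comes from.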
Which algorithm is best suited?
• Other well-known algorithms with good performance use:
  • Path
  • Depth of the least common subsumer (LCS), referred to as WUP
  • Path length and path direction, referred to as HSO
  • Link strength of a parent-child link using corpus statistical information
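WUP:
$$\mathrm{ConSim}(C_1, C_2) = \frac{2 N_3}{N_1 + N_2 + 2 N_3}$$
where C3 is the least common subsumer of C1 and C2, N1 and N2 are the path lengths from C1 and C2 up to C3, and N3 is the path length from C3 up to the root.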
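The ConSim formula can be checked on a small tree. The toy taxonomy below is hypothetical, and path lengths are counted in edges (an assumption; some formulations count nodes instead). The least common subsumer is the first ancestor of C1 that is also an ancestor of C2.

```python
from typing import Dict, List

# Hypothetical toy taxonomy: child -> parent (single inheritance).
PARENT: Dict[str, str] = {
    "algorithm": "root",
    "sorting algorithm": "algorithm",
    "randomized algorithm": "algorithm",
    "software": "root",
    "system software": "software",
    "application software": "software",
}


def path_to_root(concept: str) -> List[str]:
    """The concept, its parent, and so on, up to and including 'root'."""
    path = [concept]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path


def con_sim(c1: str, c2: str) -> float:
    """Wu-Palmer ConSim = 2*N3 / (N1 + N2 + 2*N3), path lengths counted in edges."""
    p1, p2 = path_to_root(c1), path_to_root(c2)
    c3 = next(c for c in p1 if c in p2)  # least common subsumer
    n1 = p1.index(c3)                    # edges from C1 up to C3
    n2 = p2.index(c3)                    # edges from C2 up to C3
    n3 = len(path_to_root(c3)) - 1       # edges from C3 up to the root
    denom = n1 + n2 + 2 * n3
    return 2 * n3 / denom if denom else 1.0  # both concepts are the root


print(con_sim("sorting algorithm", "randomized algorithm"))  # 0.5
print(con_sim("sorting algorithm", "system software"))       # 0.0 (LCS is the root)
```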
HSO:
$$\text{weight} = C - \text{path length} - k \times (\text{number of changes of direction})$$
where C and k are constants.
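A minimal sketch of the HSO weight, assuming the path is given as the direction of each link ("up", "down", "horizontal"). The defaults C = 8 and k = 1 are an assumption (they are the values commonly cited for the WordNet::Similarity implementation), and the rules Hirst and St-Onge use to decide which paths are allowed are not checked; this covers only the formula above.

```python
from typing import Sequence


def hso_weight(directions: Sequence[str], c: int = 8, k: int = 1) -> int:
    """weight = C - path length - k * number of changes of direction.

    `directions` lists the direction of each link on the path, e.g. ["up", "up", "down"].
    Path validity is not checked here.
    """
    path_length = len(directions)
    changes = sum(1 for a, b in zip(directions, directions[1:]) if a != b)
    return c - path_length - k * changes


print(hso_weight(["up", "up", "down"]))  # 4: 8 - 3 links - 1 change
```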
• Link strength of a parent-child link using corpus statistical information: combines information content with distance
• Information content: obtained by estimating the probability of occurrence of a class in a large text corpus
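Information content is usually defined as IC(c) = -log p(c), where p(c) is the probability of encountering an instance of class c (counting its descendants) in a corpus. Jiang and Conrath combine it with distance as dist(c1, c2) = IC(c1) + IC(c2) - 2 IC(lcs(c1, c2)). The sketch below uses a hypothetical taxonomy and made-up counts.

```python
import math
from typing import Dict, List

# Hypothetical taxonomy (child -> parent) and raw corpus counts per concept.
PARENT: Dict[str, str] = {
    "algorithm": "root",
    "sorting algorithm": "algorithm",
    "randomized algorithm": "algorithm",
    "software": "root",
}
COUNTS: Dict[str, int] = {
    "root": 0,
    "algorithm": 50,
    "sorting algorithm": 30,
    "randomized algorithm": 20,
    "software": 100,
}


def ancestors(c: str) -> List[str]:
    """The concept itself followed by its ancestors up to the root."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def class_frequencies() -> Dict[str, int]:
    """Frequency of a class = its own count plus the counts of all its descendants."""
    freq = {c: 0 for c in COUNTS}
    for concept, n in COUNTS.items():
        for a in ancestors(concept):
            freq[a] += n
    return freq


FREQ = class_frequencies()
TOTAL = FREQ["root"]


def ic(c: str) -> float:
    """Information content: -log p(c), with p(c) estimated from corpus frequencies."""
    return -math.log(FREQ[c] / TOTAL)


def jcn_distance(c1: str, c2: str) -> float:
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2))."""
    a2 = ancestors(c2)
    lcs = next(a for a in ancestors(c1) if a in a2)
    return ic(c1) + ic(c2) - 2 * ic(lcs)


print(round(jcn_distance("sorting algorithm", "randomized algorithm"), 3))  # 2.813
```

A distance of 0 means identical concepts; a similarity score is often taken as the reciprocal of this distance.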
Disambiguating Concepts (LESK?)
For each sense WNsn of t (for cache: cache#n#1, cache#n#2, and cache#n#3):
• Extract the informal definition (gloss) of the sense from WordNet.
• Calculate the similarity between ts and WNsn by computing a similarity matrix between ts and WNsn using a LESK algorithm, and normalize the value by the number of entries in the matrix.
Return the synset with the highest similarity value.
For example, the WordNet definitions of the three senses of cache are:
• WNs1: "a hidden storage space for money or provisions or weapons"
• WNs2: "a secret store of valuables or money"
• WNs3: "RAM memory that is set aside as a specialized buffer storage, which is continually updated; used to optimize data transfers between system elements with different characteristics"
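The sense-selection loop can be sketched as below. The talk fills the similarity matrix using a LESK algorithm; here the word-to-word score is a pluggable function that defaults to exact token match (1.0 or 0.0) as a stand-in, and the stop-word list is likewise an assumption. The matrix sum is divided by the number of entries, and the highest-scoring synset is returned. The function names are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

# Small stop-word list (assumption).
STOP_WORDS = {"a", "an", "the", "of", "or", "for", "from", "to", "that", "is",
              "as", "and", "which", "with", "when", "we", "between", "used"}


def tokens(text: str) -> List[str]:
    """Distinct lower-case content words, in first-seen order."""
    seen: List[str] = []
    for w in re.findall(r"[a-z]+", text.lower()):
        if w not in STOP_WORDS and w not in seen:
            seen.append(w)
    return seen


def exact_match(a: str, b: str) -> float:
    """Placeholder word-to-word score (assumption): 1.0 for identical tokens, else 0.0."""
    return 1.0 if a == b else 0.0


def normalized_similarity(sentence: str, gloss: str,
                          word_sim: Callable[[str, str], float] = exact_match) -> float:
    """Sum of the |sentence| x |gloss| similarity matrix divided by its number of entries."""
    s_toks, g_toks = tokens(sentence), tokens(gloss)
    if not s_toks or not g_toks:
        return 0.0
    matrix = [[word_sim(s, g) for g in g_toks] for s in s_toks]
    return sum(map(sum, matrix)) / (len(s_toks) * len(g_toks))


def disambiguate(sentence: str, glosses: Dict[str, str]) -> Tuple[str, float]:
    """Return the synset with the highest normalized similarity, and its score."""
    scores = {sense: normalized_similarity(sentence, g) for sense, g in glosses.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


CACHE_GLOSSES = {
    "cache#n#1": "a hidden storage space for money or provisions or weapons",
    "cache#n#2": "a secret store of valuables or money",
    "cache#n#3": ("RAM memory that is set aside as a specialized buffer storage, which is "
                  "continually updated; used to optimize data transfers between system "
                  "elements with different characteristics"),
}
SENTENCE = ("we propose a hardware design, call the virtual line scheme, that allows the "
            "utilization of large virtual cache line when fetch datum from memory for "
            "better exploitation of spatial locality")
best, score = disambiguate(SENTENCE, CACHE_GLOSSES)
print(best)  # cache#n#3
```

Normalizing by the number of matrix entries keeps long glosses from winning merely by being long; swapping `exact_match` for a gloss-overlap or WordNet-based word score changes only `word_sim`.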
Evaluation - domain-specific concept extraction
Computer science domain, precision for concepts:
• Annotator 1: 75%
• Annotator 2: 56%
• Annotator 3: 78%
Biomedical domain (GENIA corpus), recall:
• Our approach: 58.70%
• MaxMatcher (Zhou et al.): 57.73%
• BioAnnotator (Subramaniam et al.): 20.27%
Notes:
• Identified 253 computer science domain-specific concepts, validated by three domain experts.
• Measured inter-annotator agreement using Fleiss' kappa: 0.36712, a fair agreement (3 annotators, 253 concepts, 2 categories).
• Identified 47 domain-specific concepts in the GENIA corpus.
• Compared the results with two approaches, by Zhou et al. and by Subramaniam et al.
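Fleiss' kappa, used above for inter-annotator agreement, can be computed from a table in which each row is an item (a concept) and each column counts how many annotators assigned that category. This is a generic sketch of the standard formula; the 4-item table below is invented for illustration and is not the study's 253-concept data.

```python
from typing import Sequence


def fleiss_kappa(table: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items x k categories; table[i][j] = raters assigning item i to category j.

    Every row must sum to the same number of raters n (n >= 2).
    """
    n_items = len(table)
    n_raters = sum(table[0])
    if any(sum(row) != n_raters for row in table):
        raise ValueError("each item must be rated by the same number of raters")
    n_categories = len(table[0])

    # Observed agreement per item, averaged over items.
    p_i = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1)) for row in table]
    p_bar = sum(p_i) / n_items

    # Expected (chance) agreement from the overall category proportions.
    p_j = [sum(row[j] for row in table) / (n_items * n_raters) for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_bar - p_e) / (1 - p_e)


# 3 annotators, 2 categories (domain-specific / not), 4 made-up concepts.
ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(ratings), 4))  # 0.3333
```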
Why LESK?
Conclusion
Choose the best WSD algorithm based on:
• The nature of your problem
• The available factors
• Performance with respect to accuracy and time
References
K. Balachandran and S. Ranathunga, "Domain-Specific Term Extraction for Concept Identification in Ontology Construction," in IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, Nebraska, USA, 2016, pp. 34-41.
P. Buitelaar, P. Cimiano, and B. Magnini, Ontology Learning from Text: Methods, Evaluation and Applications, vol. 123. IOS Press, 2005.
X. Zhou, X. Zhang, and X. Hu, "MaxMatcher: Biological concept extraction using approximate dictionary lookup," in PRICAI 2006: Trends in Artificial Intelligence. Springer, 2006, pp. 1145-1149.
L. V. Subramaniam, S. Mukherjea, P. Kankar, B. Srivastava, V. S. Batra, P. V. Kamesam, et al., "Information extraction from biomedical literature: methodology, evaluation and an application," in Proceedings of the Twelfth International Conference on Information and Knowledge Management, 2003, pp. 410-417.
G. Hirst and D. St-Onge, "Lexical chains as representations of context for the detection and correction of malapropisms," in WordNet: An Electronic Lexical Database. MIT Press, 1998, pp. 305-332.
S. Banerjee and T. Pedersen, "An adapted Lesk algorithm for word sense disambiguation using WordNet," in Computational Linguistics and Intelligent Text Processing. Springer, 2002, pp. 136-145.
Z. Wu and M. Palmer, "Verb semantics and lexical selection," in Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994, pp. 133-138.
M. Lesk, "Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone," in Proceedings of the 5th Annual International Conference on Systems Documentation, 1986, pp. 24-26.
C. Leacock and M. Chodorow, "Combining local context and WordNet similarity for word sense disambiguation," in WordNet: An Electronic Lexical Database. MIT Press, 1998, pp. 265-283.
J. J. Jiang and D. W. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy," in Proc. Int. Conf. Research in Computational Linguistics, 1998, pp. 19-33.
Questions?
Thank you.

Usage of word sense disambiguation in concept identification in ontology construction

  • 1. Usage of Word Sense Disambiguation in Concept Identification in Ontology Construction 1 Guest Talk at University of Moratuwa, Department of Computer Science and Engineering 5th November, 2016 Discussed by: Kiruparan Balachandran
  • 2. Background Information - Ontology Ontology provides a potential method to describe domain knowledge 2 algorithm sorting algorithm problem solve complexity has is a
  • 3. Background Information - Ontology learning layer-cake approach Terms Relations Concept Hierarchy Concepts Synonyms {Randomized algorithm, sorting algorithm, system software, application software} {Randomized algorithm, sorting algorithm}, {system software, application software} Algorithm (I, E, L) isA(sorting algorithm, algorithm) - known as Taxonomy relationship solve (algorithm, problem) - known as Non- Taxonomy relationship RulesisA(sorting algorithm, algorithm) -> solve (sorting algorithm, problem) 3
  • 4. Implemented approach follows Buitelaar et al. criteria in forming concepts from terms • An intentional definition of the concept • Formal definition: A term can be considered as a concept if the term is linked with a valid relation to another term. • Informal definition: A term should have a textual description. • A set of concept instances, i.e. its extensions: a term can be considered a concept if it has instances. • A set of linguistic realizations. 4
  • 5. Feed (ts and to separately) referred as t and sentence ts Subject Phrase and Object Phrase identified in each sentence Iterate each sentence (ts) from the corpus Identify sense tsense related to domain from the list of sense (disambiguating sense) List of sense exist in WordNet for t Full or part of subject phrases (ts) and object phrases (to) exist in the list of domain-specific 5 Need of WSD in forming concepts from terms If tsense is exist for both tsense of ts and to are candidate for domain-specific concepts For example ts = “we propose a hardware design, call the virtual line scheme, that allows the utilization of large virtual cache line when fetch datum from memory for better exploitation of spatial locality”
  • 6. cache#n#1, cache#n#2, and cache#n#3 Feed (ts and to separately) referred as t and sentence ts Subject Phrase and Object Phrase identified in each sentence Iterate each sentence (ts) from the corpus Identify sense tsense related to domain from the list of sense (disambiguating sense) List of sense exist in WordNet for t Full or part of subject phrases (ts) and object phrases (to) exist in the list of domain-specific 6 Need of WSD in forming concepts from terms If tsense is exist for both tsense of ts and to are candidate for domain-specific concepts
  • 7. Which algorithm best suited ? • LESK • Original LESK • definition of a word meaning as a only source of contextual information for a given sense • combinatorial explosion • Use of Simulated annealing 7
  • 8. Which algorithm is best suited? • Lesk • Original Lesk • uses the dictionary definition of a word sense as the only source of contextual information for that sense • suffers from combinatorial explosion, since every combination of senses of the words in the context has to be compared • simulated annealing has been used to search that space • Simplified Lesk • avoids the combinatorial explosion • runs a separate disambiguation process for each ambiguous word in the input text • Adapted Lesk • enlarges the context: considers hypernyms, hyponyms, holonyms, meronyms, troponyms, attribute relations, and their associated definitions • Lower accuracy
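As an illustration of Simplified Lesk, the sketch below scores each sense of “cache” by the number of content words its gloss shares with the example sentence from slide 5, using the three glosses quoted on slide 13. The stop-word list and tokenization are simplifying assumptions.

```python
import re

# Glosses quoted on slide 13 for the three WordNet noun senses of "cache".
CACHE_SENSES = {
    "cache#n#1": "a hidden storage space for money or provisions or weapons",
    "cache#n#2": "a secret store of valuables or money",
    "cache#n#3": ("RAM memory that is set aside as a specialized buffer storage, "
                  "which is continually updated; used to optimize data transfers "
                  "between system elements with different characteristics"),
}

# Small illustrative stop-word list (an assumption; real systems use a fuller one).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "for", "to", "that", "is",
             "as", "with", "from", "when", "we", "which", "used"}


def content_words(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(target, context, senses):
    """Pick the sense whose gloss shares the most content words with the context.

    Only the target word's glosses are compared with the surrounding words,
    so there is no search over sense combinations as in the original Lesk.
    """
    context_words = content_words(context) - {target}
    scores = {sense: len(content_words(gloss) & context_words)
              for sense, gloss in senses.items()}
    best = max(scores, key=scores.get)
    return best, scores


sentence = ("we propose a hardware design, call the virtual line scheme, that allows "
            "the utilization of large virtual cache line when fetch datum from memory "
            "for better exploitation of spatial locality")
best, scores = simplified_lesk("cache", sentence, CACHE_SENSES)
print(best, scores)  # -> cache#n#3 {'cache#n#1': 0, 'cache#n#2': 0, 'cache#n#3': 1}
```

Here a single shared word (“memory”) decides in favour of cache#n#3. With so little overlap, small changes in wording or stop-word handling can flip the result, which is one reason plain gloss overlap tends to be less accurate.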
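  • 9. Which algorithm is best suited? • Other well-known algorithms with good performance use: • Path • Depth of the least common subsumer (LCS), referred to as WUP • Path length and path direction, referred to as HSO • Link strength of a parent-child link using corpus statistical information. WUP similarity, where C3 is the least common subsumer of C1 and C2, N1 and N2 are the lengths of the paths from C1 and C2 to C3, and N3 is the length of the path from C3 to the root: $\mathrm{ConSim}(C_1, C_2) = \frac{2 N_3}{N_1 + N_2 + 2 N_3}$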
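A minimal sketch of the WUP formula on a toy, single-inheritance taxonomy (the `PARENT` map is hypothetical). It assumes the convention that depth counts nodes with the root at depth 1, so N3 = depth(C3), N1 = depth(C1) - depth(C3), and N2 = depth(C2) - depth(C3); other counting conventions shift the values slightly.

```python
# Hypothetical toy taxonomy: child -> parent ("entity" is the root).
PARENT = {
    "algorithm": "entity",
    "software": "entity",
    "sorting algorithm": "algorithm",
    "randomized algorithm": "algorithm",
    "quicksort": "sorting algorithm",
    "system software": "software",
}


def path_to_root(concept):
    """Concepts from `concept` up to the root, inclusive."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Number of nodes from the root down to `concept` (root has depth 1)."""
    return len(path_to_root(concept))


def lcs(c1, c2):
    """Least common subsumer: the deepest concept that is an ancestor of both."""
    ancestors_of_c2 = set(path_to_root(c2))
    for node in path_to_root(c1):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_c2:
            return node
    raise ValueError("concepts share no ancestor")


def con_sim(c1, c2):
    """Wu-Palmer similarity: 2*N3 / (N1 + N2 + 2*N3)."""
    c3 = lcs(c1, c2)
    n3 = depth(c3)
    n1 = depth(c1) - n3
    n2 = depth(c2) - n3
    return 2 * n3 / (n1 + n2 + 2 * n3)


print(con_sim("quicksort", "randomized algorithm"))
print(con_sim("quicksort", "system software"))
```

Under this convention the denominator equals depth(C1) + depth(C2), so deeper shared ancestors push the score towards 1.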
  • 10. Which algorithm is best suited? • Path length and path direction (HSO): $\mathrm{weight} = C - \text{path length} - k \times d$, where C and k are constants and d is the number of changes of direction along the path.
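The HSO weight can be computed directly from a path once the direction of each link is known. In this sketch the path is a hypothetical list of direction labels, C = 8 and k = 1 are assumed default constants, and HSO's rules about which paths are allowable (including its limit on path length) are left out.

```python
from typing import Sequence


def hso_weight(directions: Sequence[str], c: int = 8, k: int = 1) -> int:
    """Hirst-St-Onge style weight of a path between two senses.

    `directions` lists the direction of each WordNet link on the path,
    e.g. ["up", "up", "down"]. The path length is the number of links, and a
    change of direction is counted whenever two consecutive links differ.
    """
    path_length = len(directions)
    changes = sum(1 for a, b in zip(directions, directions[1:]) if a != b)
    return c - path_length - k * changes


print(hso_weight(["up", "up", "down"]))  # 8 - 3 - 1*1 = 4
print(hso_weight(["up", "down", "up"]))  # 8 - 3 - 1*2 = 3
```

Both example paths have the same length, but the second turns twice, so it receives a lower weight.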
  • 11. Which algorithm is best suited? • Link strength of a parent-child link using corpus statistical information: combines information content and distance. Information content is obtained by estimating the probability of occurrence of a class (concept) in a large text corpus.
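The sketch below shows one standard way to turn this into numbers, the Jiang-Conrath distance IC(c1) + IC(c2) - 2 IC(lcs), where IC(c) = -log P(c) and the frequency behind P(c) counts occurrences of c and of every concept it subsumes. The taxonomy and corpus counts are made up; a lower distance means the two concepts are more closely related.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and made-up corpus counts.
PARENT = {
    "algorithm": "entity",
    "sorting algorithm": "algorithm",
    "randomized algorithm": "algorithm",
}
COUNTS = {"entity": 10, "algorithm": 40,
          "sorting algorithm": 30, "randomized algorithm": 20}


def ancestors(concept):
    """The concept followed by its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def concept_frequencies(counts):
    """freq(c) = count of c plus the counts of every concept below it."""
    freq = dict.fromkeys(counts, 0)
    for concept, n in counts.items():
        for node in ancestors(concept):
            freq[node] += n
    return freq


FREQ = concept_frequencies(COUNTS)
TOTAL = FREQ["entity"]  # the root subsumes every occurrence


def information_content(concept):
    """IC(c) = -log P(c), with P(c) estimated as freq(c) / TOTAL."""
    return -math.log(FREQ[concept] / TOTAL)


def lcs(c1, c2):
    """Deepest shared ancestor in this single-inheritance toy taxonomy."""
    shared = set(ancestors(c2))
    return next(node for node in ancestors(c1) if node in shared)


def jc_distance(c1, c2):
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2))."""
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(lcs(c1, c2)))


print(jc_distance("sorting algorithm", "randomized algorithm"))
```

Because frequencies propagate upwards, the root has probability 1 and information content 0, while rarer, more specific concepts carry more information.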
  • 12. Disambiguating Concepts (Lesk?) For each sense WNsn of the term (e.g. cache#n#1, cache#n#2, and cache#n#3): • Extract the informal definition (gloss) of the sense from WordNet. • Calculate the similarity between ts and WNsn by computing a similarity matrix between ts and WNsn with the Lesk algorithm. • Normalize the value by the number of entries in the matrix. • Return the synset with the highest similarity value.
  • 13. Disambiguating Concepts (Lesk?) For example, the glosses of the three senses of “cache”: • WNs1: “a hidden storage space for money or provisions or weapons” • WNs2: “a secret store of valuables or money” • WNs3: “RAM memory that is set aside as a specialized buffer storage, which is continually updated; used to optimize data transfers between system elements with different characteristics”
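The slide 12 procedure can be sketched as follows. The slide does not specify the word-to-word score used inside the matrix, so `exact_match` below is a hypothetical placeholder where a Lesk-based word relatedness would be plugged in; the stop-word list and tokenization are also assumptions. Each sense's score is the sum of its matrix divided by the number of entries, and the sense with the highest score wins.

```python
import re
from typing import Callable, Dict, List, Tuple

# Illustrative stop-word list (an assumption).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "for", "to", "that", "is",
             "as", "with", "from", "when", "we", "which", "used"}


def content_words(text: str) -> List[str]:
    """Unique lower-case alphabetic tokens, stop words removed, in order."""
    words: List[str] = []
    for w in re.findall(r"[a-z]+", text.lower()):
        if w not in STOPWORDS and w not in words:
            words.append(w)
    return words


def exact_match(w1: str, w2: str) -> float:
    """Placeholder word relatedness (hypothetical): 1.0 for identical words."""
    return 1.0 if w1 == w2 else 0.0


def disambiguate(target: str, sentence: str, glosses: Dict[str, str],
                 relatedness: Callable[[str, str], float] = exact_match
                 ) -> Tuple[str, Dict[str, float]]:
    """Return the best sense and the normalized score of every sense."""
    sentence_words = [w for w in content_words(sentence) if w != target]
    scores: Dict[str, float] = {}
    for sense, gloss in glosses.items():
        gloss_words = content_words(gloss)
        # Similarity matrix: one row per sentence word, one column per gloss word.
        matrix = [[relatedness(s, g) for g in gloss_words] for s in sentence_words]
        entries = len(sentence_words) * len(gloss_words)
        scores[sense] = sum(map(sum, matrix)) / entries if entries else 0.0
    best = max(scores, key=scores.get)
    return best, scores


GLOSSES = {
    "cache#n#1": "a hidden storage space for money or provisions or weapons",
    "cache#n#2": "a secret store of valuables or money",
    "cache#n#3": ("RAM memory that is set aside as a specialized buffer storage, "
                  "which is continually updated; used to optimize data transfers "
                  "between system elements with different characteristics"),
}
ts = ("we propose a hardware design, call the virtual line scheme, that allows "
      "the utilization of large virtual cache line when fetch datum from memory "
      "for better exploitation of spatial locality")
best, scores = disambiguate("cache", ts, GLOSSES)
print(best, scores)
```

With the placeholder score only “memory” matches, giving cache#n#3 a score of 1/289 (17 sentence words by 17 gloss words) and the other two senses 0. A graded relatedness function would let near matches such as “datum” and “data” contribute as well, and normalizing by the matrix size keeps long glosses from winning simply because they have more words.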
  • 16. Evaluation: domain-specific concept extraction • Computer science (precision for concepts): Annotator 1: 75%, Annotator 2: 56%, Annotator 3: 78% • Biomedical (recall): our approach 58.70%, MaxMatcher (Zhou et al.) 57.73%, BioAnnotator (Subramaniam et al.) 20.27% • Identified 253 computer science domain-specific concepts, validated by three domain experts • Measured the inter-annotator agreement using Fleiss' kappa: 0.36712, a fair agreement (3 annotators, 253 concepts, 2 categories) • Identified 47 domain-specific concepts in the GENIA corpus • Compared with two approaches, by Zhou et al. and by Subramaniam et al.
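For reference, Fleiss' kappa for N items, n raters per item, and k categories can be computed as below, together with the Landis and Koch bands under which a value such as 0.36712 counts as fair agreement. The ratings in the usage example are made up; they are not the study's annotations.

```python
from typing import List


def fleiss_kappa(table: List[List[int]]) -> float:
    """Fleiss' kappa; table[i][j] = number of raters who put item i in category j."""
    n_items = len(table)
    n_raters = sum(table[0])
    if any(sum(row) != n_raters for row in table):
        raise ValueError("every item must be rated by the same number of raters")
    # Observed agreement for each item, averaged over all items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_items) / n_items
    # Agreement expected by chance, from the overall category proportions.
    n_categories = len(table[0])
    p_cats = [
        sum(row[j] for row in table) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)


def landis_koch_label(kappa: float) -> str:
    """Landis and Koch interpretation bands for kappa values."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Made-up example: 4 terms, 3 annotators, categories [domain-specific, not].
ratings = [[3, 0], [2, 1], [1, 2], [3, 0]]
kappa = fleiss_kappa(ratings)
print(round(kappa, 4), landis_koch_label(kappa))
print(landis_koch_label(0.36712))
```

The formula is undefined when every rating falls into a single category (chance agreement is then 1), so a real evaluation would guard against that case.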
  • 17. Why Lesk? Conclusion: choose the best WSD algorithm based on • the nature of your problem • the available factors • performance with respect to accuracy and time
  • 18. References
K. Balachandran and S. Ranathunga, "Domain-Specific Term Extraction for Concept Identification in Ontology Construction," in IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, Nebraska, USA, 2016, pp. 34-41.
P. Buitelaar, P. Cimiano, and B. Magnini, Ontology learning from text: methods, evaluation and applications, vol. 123. IOS Press, 2005.
X. Zhou, X. Zhang, and X. Hu, "MaxMatcher: Biological concept extraction using approximate dictionary lookup," in PRICAI 2006: Trends in Artificial Intelligence, ed: Springer, 2006, pp. 1145-1149.
L. V. Subramaniam, S. Mukherjea, P. Kankar, B. Srivastava, V. S. Batra, P. V. Kamesam, et al., "Information extraction from biomedical literature: methodology, evaluation and an application," in Proceedings of the twelfth international conference on Information and knowledge management, 2003, pp. 410-417.
G. Hirst and D. St-Onge, "Lexical chains as representations of context for the detection and correction of malapropisms," WordNet: An electronic lexical database, vol. 305, pp. 305-332, 1998.
S. Banerjee and T. Pedersen, "An adapted Lesk algorithm for word sense disambiguation using WordNet," in Computational linguistics and intelligent text processing, ed: Springer, 2002, pp. 136-145.
Z. Wu and M. Palmer, "Verbs semantics and lexical selection," in Proceedings of the 32nd annual meeting on Association for Computational Linguistics, 1994, pp. 133-138.
M. Lesk, "Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone," in Proceedings of the 5th annual international conference on Systems documentation, 1986, pp. 24-26.
C. Leacock and M. Chodorow, "Combining Local Context and Wordnet Similarity for Word Sense Disambiguation," WordNet: An Electronic Lexical Database, vol. 49, pp. 265-283, MIT Press, 1998.
J. J. Jiang and D. W. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy," in Proc. Int. Conf. Research in Computational Linguistics, 1998, pp. 19-33.