SlideShare a Scribd company logo
1 of 26
Can we use Linked Data
Semantic Annotators for
the Extraction of Domain-
Relevant Expressions?
Prof. Michel Gagnon, Ecole Polytechnique de Montréal, Canada
Prof. Amal Zouaq, Royal Military College of Canada, Canada
Dr. Ludovic Jean-louis, Ecole Polytechnique de
Montréal, Canada
Introduction
What is Semantic Annotation on the LOD?
Candidate Selection
Disambiguation
Traditionally focused on named entity annotation
Semantic Annotators
LOD-based Semantic Annotators
Most used annotators
REST API / Web Services
Default configuration
Alchemy
(entities +
keywords)
YahooZemanta
Wikimeta
DBPedia
Spotlight
OpenCalais
Lupedia
Annotation Evaluation
Few comparative studies have been published
on the performance of linked data Semantic
Annotators
The available studies generally focus on
“traditional” named entity annotation evaluation
(ORG, Person, etc.)
NEs versus domain topics
Semantic Annotators start to go beyond these
traditional NEs
Predefined classes such as PERSON, ORG
Expansion of possible classes (NERD Taxonomy :
sport events, operating systems, political events, …)
 RDF (NE? Domain topic ?)
 Semantic Web (NE? Domain topic ?)
Domain Topics: important “concepts” for the
domain of interest
NE versus Topics Annotation
Research Question 1
Previous studies did not draw any conclusion about
the relevance of the annotated entities
But relevance might prove to be a crucial point for
tasks such as information seeking or query
answering
RQ1: Can we use linked data-based semantic
annotators to accurately identify Domain-relevant
expressions?
Research Methodology
We relied on a corpus of 8 texts taken from
Wikipedia, all related to the artificial intelligence
domain. Together, these texts represent 10570
words.
We obtained 2023 expression occurrences among
which 1151 were distinct.
Building the Gold Standard
We asked a human evaluator to analyze each
expression and make the following decisions:
Does the annotated expression represent an
understandable named entity or topic?
Is the expression a relevant keyword according
to the document?
Is the expression a relevant named entity
according to the document?
Distribution of Expressions
Note that only 639 out of 1151 detected expressions
are understandable (56%). Thus a substantial
number of detected expressions represents noise.
Frequency of detected Expressions
Most of the expressions were detected by only
one (76%) or two (16%) annotators.
Semantic annotators seemed complementary.
(326 relevant)
(117 relevant)
(40 relevant)
(15 relevant)
(4 relevant)
(4 relevant)
(1 relevant)
Evaluation
Standard precision, recall and F-score
“Partial” recall, since our Gold Standard
considers only the expressions detected by
at least one annotator, instead of considering
all possible expressions in the corpus.
All Expressions
mend
Named Entities
Domain Topics
Few Highlights
F-score is unsatisfactory for almost all annotators
Best F-score around 62-63 (All expressions, topics)
Alchemy seems to stand out
It obtains a high value for recall when all expressions
or only topics are considered
But the precision of Alchemy (59%) is not very
high, compared to the results of Zemanta
(81%), OpenCalais (75%) and Yahoo (74%)
None of the annotators was good at detecting relevant
named entities.
DBpedia Spotlight annotated many expressions that
are not considered understandable
Research Question 2
Given the complementarity of Semantic
Annotators, we decided to experiment with
this second research question:
Can we improve the overall performance
of semantic annotators using voting
methods and machine learning?
Voting and machine learning
methods
We experimented with the following
methods:
Simple votes
Weighted votes where the weight is the
precision of individual annotators on a
training corpus
K-nearest-neighbours classifier
Naïve Bayes classifier
Decision tree (C4.5)
Rule induction (PART algorithm)
Experiment Setup
We divided the set of annotated
expressions into 5 balanced partitions
One partition for testing
Remaining partitions for training
We ran 5 experiments for each of the
following situations: all entities, only
named entities and only topics
Baseline: Simple Votes
All Expressions:
Best prec : 77%
(Zemanta)
Best rec/F : 69% -
62% (Alchemy)
Best P: 81% (R/F
12%-21%)
(Zemanta)
Best R/F: 69%-63%
(Alchemy) P=59%
Threshold: Number of annotators
Weighted Votes
Best P: 81%
(Zemanta) (R/F
12%-21%)
Best R/F: 69%-63%
(Alchemy) P=59%
Machine Learning
Best P: 81% (Zemanta)
(R/F 12%-21%)
Best R/F: 69%-63%
(Alchemy) P=59%
Highlights
On the average, weighted vote displays the
best precision (0.66) among combination
methods if only topics are considered
Alchemy (0.59)
Alchemy finds many relevant expressions
that are not discovered by other annotators
Machine learning methods achieve better
recall and F-score on the average, especially
if decision tree or rule induction (PART) is
used
Conclusion
Can we use Linked Data Semantic
Annotators for the Extraction of Domain-
Relevant Expressions?
Best overall performance (Recall/F-Score:
Alchemy) (59%- 69%-63%)
Best precision: Zemanta (81%)
Need of filtering!
Combination/machine learning methods
Increased recall with weighted voting
scheme
Future Work
Limitations
The size of the gold standard
A limited set of semantic annotators
Increase the size of the GS
Explore various combinations for annotators/
features
Compare Semantic Annotators with keyword
extractors
QUESTIONS?
Thank you for your attention!
amal.zouaq@rmc.ca
http://azouaq.athabascau.ca

More Related Content

What's hot

Ontology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyondOntology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyondPeter Geil
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...European Data Forum
 
Can Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis ProblemCan Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis ProblemMark Cieliebak
 
Text Analytics for Semantic Computing
Text Analytics for Semantic ComputingText Analytics for Semantic Computing
Text Analytics for Semantic ComputingMeena Nagarajan
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Lucidworks
 
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Khirulnizam Abd Rahman
 
OOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria PovedaOOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria Povedasemanticsconference
 
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionOntology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionAldo Gangemi
 
Best Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingBest Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingOntotext
 
Using and learning phrases
Using and learning phrasesUsing and learning phrases
Using and learning phrasesCassandra Jacobs
 
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Andres Garcia-Silva
 
Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010AAT Taiwan
 
Valentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introductionValentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introductionsssw2012
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingSimon Hughes
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineTrey Grainger
 

What's hot (19)

Ontology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyondOntology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyond
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITIONNAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
 
Can Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis ProblemCan Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis Problem
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
 
Text Analytics for Semantic Computing
Text Analytics for Semantic ComputingText Analytics for Semantic Computing
Text Analytics for Semantic Computing
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
 
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
 
OOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria PovedaOOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria Poveda
 
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionOntology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
 
Ontology engineering
Ontology engineering Ontology engineering
Ontology engineering
 
Best Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingBest Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining Processing
 
Using and learning phrases
Using and learning phrasesUsing and learning phrases
Using and learning phrases
 
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
 
Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010
 
Valentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introductionValentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introduction
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
 

Similar to Zouaq wole2013

SemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisSemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisAditya Joshi
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Chunyang Chen
 
VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needsIvan Berlocher
 
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRubén Izquierdo Beviá
 
Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Textbutest
 
RANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRubén Izquierdo Beviá
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveJames Hendler
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Quinsulon Israel
 
Day2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppDay2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppratnapatil14
 
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...Enrico Santus Aversano
 
Themes identification techniques in qualitative research
Themes identification techniques in qualitative researchThemes identification techniques in qualitative research
Themes identification techniques in qualitative researchGhulam Qambar
 
Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Marcia Zeng
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...CITE
 
Deep Machine Reading for Customer Analytics
Deep Machine Reading for Customer AnalyticsDeep Machine Reading for Customer Analytics
Deep Machine Reading for Customer AnalyticsNaveen Ashish
 
Search explained T3DD15
Search explained T3DD15Search explained T3DD15
Search explained T3DD15Hans Höchtl
 
Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.Sunil Kumar Kopparapu
 

Similar to Zouaq wole2013 (20)

SemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisSemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment Analysis
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
 
VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needs
 
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
 
Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Text
 
RANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpus
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
 
Day2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppDay2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt pppppppppppppppppppppppppp
 
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
 
Arabic question answering ‫‬
Arabic question answering ‫‬Arabic question answering ‫‬
Arabic question answering ‫‬
 
Themes identification techniques in qualitative research
Themes identification techniques in qualitative researchThemes identification techniques in qualitative research
Themes identification techniques in qualitative research
 
Ontology
OntologyOntology
Ontology
 
Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
 
Deep Machine Reading for Customer Analytics
Deep Machine Reading for Customer AnalyticsDeep Machine Reading for Customer Analytics
Deep Machine Reading for Customer Analytics
 
Knowledge Extraction
Knowledge ExtractionKnowledge Extraction
Knowledge Extraction
 
Search explained T3DD15
Search explained T3DD15Search explained T3DD15
Search explained T3DD15
 
Eacl 2006 Pedersen
Eacl 2006 PedersenEacl 2006 Pedersen
Eacl 2006 Pedersen
 
Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.
 

Recently uploaded

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 

Recently uploaded (20)

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 

Zouaq wole2013

  • 1. Can we use Linked Data Semantic Annotators for the Extraction of Domain- Relevant Expressions? Prof. Michel Gagnon, Ecole Polytechnique de Montréal, Canada Prof. Amal Zouaq, Royal Military College of Canada, Canada Dr. Ludovic Jean-louis, Ecole Polytechnique de Montréal, Canada
  • 2. Introduction What is Semantic Annotation on the LOD? Candidate Selection Disambiguation Traditionally focused on named entity annotation
  • 3. Semantic Annotators LOD-based Semantic Annotators Most used annotators REST API / Web Services Default configuration Alchemy (entities + keywords) YahooZemanta Wikimeta DBPedia Spotlight OpenCalais Lupedia
  • 4. Annotation Evaluation Few comparative studies have been published on the performance of linked data Semantic Annotators The available studies generally focus on “traditional” named entity annotation evaluation (ORG, Person, etc.)
  • 5. NEs versus domain topics Semantic Annotators start to go beyond these traditional NEs Predefined classes such as PERSON, ORG Expansion of possible classes (NERD Taxonomy : sport events, operating systems, political events, …)  RDF (NE? Domain topic ?)  Semantic Web (NE? Domain topic ?) Domain Topics: important “concepts” for the domain of interest
  • 6. NE versus Topics Annotation
  • 7. Research Question 1 Previous studies did not draw any conclusion about the relevance of the annotated entities But relevance might prove to be a crucial point for tasks such as information seeking or query answering RQ1: Can we use linked data-based semantic annotators to accurately identify Domain-relevant expressions?
  • 8. Research Methodology We relied on a corpus of 8 texts taken from Wikipedia, all related to the artificial intelligence domain. Together, these texts represent 10570 words. We obtained 2023 expression occurrences among which 1151 were distinct.
  • 9. Building the Gold Standard We asked a human evaluator to analyze each expression and make the following decisions: Does the annotated expression represent an understandable named entity or topic? Is the expression a relevant keyword according to the document? Is the expression a relevant named entity according to the document?
  • 10. Distribution of Expressions Note that only 639 out of 1151 detected expressions are understandable (56%). Thus a substantial number of detected expressions represents noise.
  • 11. Frequency of detected Expressions Most of the expressions were detected by only one (76%) or two (16%) annotators. Semantic annotators seemed complementary. (326 relevant) (117 relevant) (40 relevant) (15 relevant) (4 relevant) (4 relevant) (1 relevant)
  • 12. Evaluation Standard precision, recall and F-score “Partial” recall, since our Gold Standard considers only the expressions detected by at least one annotator, instead of considering all possible expressions in the corpus.
  • 16. Few Highlights F-score is unsatisfactory for almost all annotators Best F-score around 62-63 (All expressions, topics) Alchemy seems to stand out It obtains a high value for recall when all expressions or only topics are considered But the precision of Alchemy (59%) is not very high, compared to the results of Zemanta (81%), OpenCalais (75%) and Yahoo (74%) None of the annotators was good at detecting relevant named entities. DBpedia Spotlight annotated many expressions that are not considered understandable
  • 17. Research Question 2 Given the complementarity of Semantic Annotators, we decided to experiment with this second research question: Can we improve the overall performance of semantic annotators using voting methods and machine learning?
  • 18. Voting and machine learning methods We experimented with the following methods: Simple votes Weighted votes where the weight is the precision of individual annotators on a training corpus K-nearest-neighbours classifier Naïve Bayes classifier Decision tree (C4.5) Rule induction (PART algorithm)
  • 19. Experiment Setup We divided the set of annotated expressions into 5 balanced partitions One partition for testing Remaining partitions for training We ran 5 experiments for each of the following situations: all entities, only named entities and only topics
  • 20. Baseline: Simple Votes All Expressions: Best prec : 77% (Zemanta) Best rec/F : 69% - 62% (Alchemy) Best P: 81% (R/F 12%-21%) (Zemanta) Best R/F: 69%-63% (Alchemy) P=59% Threshold: Number of annotators
  • 21. Weighted Votes Best P: 81% (Zemanta) (R/F 12%-21%) Best R/F: 69%-63% (Alchemy) P=59%
  • 22. Machine Learning Best P: 81% (Zemanta) (R/F 12%-21%) Best R/F: 69%-63% (Alchemy) P=59%
  • 23. Highlights On the average, weighted vote displays the best precision (0.66) among combination methods if only topics are considered Alchemy (0.59) Alchemy finds many relevant expressions that are not discovered by other annotators Machine learning methods achieve better recall and F-score on the average, especially if decision tree or rule induction (PART) is used
  • 24. Conclusion Can we use Linked Data Semantic Annotators for the Extraction of Domain- Relevant Expressions? Best overall performance (Recall/F-Score: Alchemy) (59%- 69%-63%) Best precision: Zemanta (81%) Need of filtering! Combination/machine learning methods Increased recall with weighted voting scheme
  • 25. Future Work Limitations The size of the gold standard A limited set of semantic annotators Increase the size of the GS Explore various combinations for annotators/ features Compare Semantic Annotators with keyword extractors
  • 26. QUESTIONS? Thank you for your attention! amal.zouaq@rmc.ca http://azouaq.athabascau.ca