Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge Series
Giuseppe Rizzo

Bianca Pereira

Andrea Varga

Marieke van Erp

Amparo Elizabeth Cano Basave
By Piet Mondrian - Gemeentemuseum Den Haag, Public Domain, https://commons.wikimedia.org/w/index.php?curid=37614350
NEEL Challenge Overview
• Microposts are challenging because of:
  • brevity (140 characters)
  • (domain-specific) abbreviations and typos
  • ‘grammar free’ text
• The NEEL challenge aims to foster research into novel, more accurate entity recognition and linking approaches tailored to Microposts
• NEEL ran from 2013 to 2016
NEEL Evolution
• 2013: Information Extraction
  • named entity recognition (4 types)
• 2014: Named Entity Extraction and Linking (NEEL)
  • named entity linking to DBpedia 3.9
• 2015: Named Entity rEcognition and Linking (NEEL)
  • named entity recognition (7 types) and linking to DBpedia 2014
• 2016: Named Entity rEcognition and Linking (NEEL)
  • named entity recognition (7 types) and linking to DBpedia 2015-04, NIL clustering
Image source: https://c1.staticflickr.com/8/7020/6405801675_efd6d09977_b.jpg
Cross-domain task
• Named Entity and Event Linking is a shared task in NLP and Semantic Web
• Machine Learning approaches need data
• Data curation is expensive and hard
• Knowledge bases can reduce some of the data bottleneck
• Resulting in hybrid approaches
Typical Entity Linking Workflow
Evaluating Entity Linking
• end-to-end: evaluates a system on the aggregated output of all steps
  • error propagation harms results
• step-by-step: robust benchmark that evaluates each step of the process individually
  • time consuming to set up
  • penalises systems that do not follow standard workflow
• partial end-to-end: evaluates particular steps in the process individually, e.g. NER, NIL & Linking
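The difference between end-to-end and partial evaluation can be made concrete with a small sketch. It assumes a hypothetical annotation format, tuples of (tweet id, start, end, link), and scores the same system output twice: once on mention spans only (the recognition step) and once on spans plus links (the aggregated output). The function names and the toy data are illustrative and are not taken from the NEEL scorer.

```python
def prf(gold, system):
    """Precision, recall and F1 over two sets of annotations (exact match)."""
    tp = len(gold & system)
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def spans(annotations):
    """Drop the link so that only the recognition step is scored."""
    return {(tweet, start, end) for tweet, start, end, _ in annotations}


# Toy data (hypothetical): (tweet id, start offset, end offset, link)
gold = {
    ("t1", 0, 5, "dbpedia:Barack_Obama"),
    ("t1", 10, 15, "dbpedia:Paris"),
    ("t2", 3, 9, "NIL"),
}
system = {
    ("t1", 0, 5, "dbpedia:Barack_Obama"),
    ("t1", 10, 15, "dbpedia:Paris_Hilton"),  # correct span, wrong link
    ("t2", 20, 25, "NIL"),                   # spurious mention
}

recognition = prf(spans(gold), spans(system))  # partial: recognition step only
end_to_end = prf(gold, system)                 # span and link must both match
```

In this toy case the wrong link leaves the recognition score untouched but lowers the end-to-end score, which is the error-propagation effect noted above.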
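Named Entity Recognition and Linking challenges since 2013
• Challenges and editions
  • TAC-KBP: 2014, 2015, 2016
  • ERD: 2014
  • SemEval: 2015
  • W-NUT: 2015, 2016, 2017
  • NEEL: 2013, 2014, 2015, 2016
• Text
  • TAC-KBP: newswire, web sites, discussion forum posts
  • ERD: web sites, search queries
  • SemEval: technical manuals, reports, formal discussion
  • W-NUT: tweets; in later editions tweets, Reddit, YouTube, StackExchange
  • NEEL: tweets
• Knowledge Base
  • TAC-KBP: Wikipedia, then Freebase
  • ERD: Freebase
  • SemEval: BabelNet
  • W-NUT: none
  • NEEL: none (2013), then DBpedia
• Entity
  • TAC-KBP: given by Type
  • ERD: given by KB
  • SemEval: given by KB
  • W-NUT: given by Type
  • NEEL: given by Type
• Evaluation
  • TAC-KBP: file; partial and end-to-end
  • ERD: API; end-to-end
  • SemEval: file; end-to-end
  • W-NUT: file; end-to-end
  • NEEL: file, API, file (varies by edition); end-to-end, then partial and end-to-end
• Target conference
  • TAC-KBP: TAC
  • ERD: SIGIR
  • SemEval: NAACL-HLT
  • W-NUT: ACL-IJCNLP, COLING, EMNLP
  • NEEL: WWW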
NEEL Datasets
Image source: https://www.maxpixel.net/Word-Data-Data-Deluge-Binary-System-Binary-Dataset-2728117
• 2013: 4,265 tweets from the end of 2010 and the start of 2011. No explicit hashtag search. 66% train, 33% test.
• 2014: 3,505 tweets, 15 July 2011 to 15 August 2011. First Story Detection algorithm used to identify tweet clusters representing events. 70% train, 30% test.
• 2015: 6,025 tweets, an extension of the 2014 dataset including tweets from 2013 and November 2014. Train: 2014 dataset, 8% development, 34% test.
• 2016: 9,289 tweets, an extension of the 2014 & 2015 datasets via selection of hashtags. 65% train (2015 dataset), 1% development and 34% test.
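The cumulative construction of the datasets can be checked with a little arithmetic: if the whole 2014 dataset is reused as the 2015 training set, and the whole 2015 dataset as the 2016 training set, the training shares follow from the tweet counts listed above. A minimal sketch using only those counts (the helper name is illustrative):

```python
# Tweet counts per edition, as listed on the slide.
tweets = {2013: 4265, 2014: 3505, 2015: 6025, 2016: 9289}


def train_share(train_year, total_year):
    """Share (in %) of one edition's dataset taken up by a reused earlier dataset."""
    return round(100 * tweets[train_year] / tweets[total_year])


share_2015 = train_share(2014, 2015)  # 2014 dataset reused as 2015 training set
share_2016 = train_share(2015, 2016)  # 2015 dataset reused as 2016 training set
```

Under that assumption the 2015 training share comes out at about 58%, which together with the 8% development and 34% test shares adds up to 100%, and the 2016 training share comes out at about 65%, matching the figure on the slide.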
NEEL Datasets (ctd)
• Entity types are not distributed equally
• Difficult to balance entity types over different dataset slices
• Confusability: a measure of the number of surface forms an entity can have (i.e. how many different ‘terms’ can refer to the same entity)
• Dominance: a measure of the number of resources that can be associated with a single surface form (i.e. how many entities share the same ‘name’)
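Following the two definitions above, both measures can be sketched as simple counts over (surface form, entity) pairs taken from an annotated dataset. The function name, the lower-casing of surface forms and the toy annotations are assumptions made for illustration; the slides do not specify how surface forms were normalised or how the counts were aggregated.

```python
from collections import defaultdict


def confusability_and_dominance(annotations):
    """Count surface forms per entity and entities per surface form.

    `annotations` is an iterable of (surface_form, entity) pairs.
    Surface forms are lower-cased before counting (an assumption).
    """
    forms_per_entity = defaultdict(set)
    entities_per_form = defaultdict(set)
    for surface_form, entity in annotations:
        form = surface_form.lower()
        forms_per_entity[entity].add(form)
        entities_per_form[form].add(entity)
    # Confusability, as defined on the slide: surface forms per entity.
    confusability = {e: len(forms) for e, forms in forms_per_entity.items()}
    # Dominance, as defined on the slide: resources per surface form.
    dominance = {f: len(ents) for f, ents in entities_per_form.items()}
    return confusability, dominance


# Toy annotations (hypothetical).
annotations = [
    ("Obama", "dbpedia:Barack_Obama"),
    ("Barack Obama", "dbpedia:Barack_Obama"),
    ("POTUS", "dbpedia:Barack_Obama"),
    ("Paris", "dbpedia:Paris"),
    ("Paris", "dbpedia:Paris_Hilton"),
]
confusability, dominance = confusability_and_dominance(annotations)
```

Here the entity dbpedia:Barack_Obama is referred to by three different surface forms, while the surface form "paris" is shared by two entities.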
[Figure: Confusability and Dominance, 2013 and 2016]
Results
• NEEL Challenge more difficult every year (from 4 entity types to 7 + linking + NIL clustering)
• Systems more complex every year
• 2016 task more difficult, probably due to domain specificity of test dataset (US Primary Elections and Star Wars)
Year  Precision  Recall  F1
2013  0.764      0.604   0.67
2014  0.771      0.642   0.701

Year  Tagging  Clustering  Linking  Overall
2015  0.807    0.84        0.762    0.8067
2016  0.473    0.641       0.501    0.5486
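The figures above can be cross-checked with a short sketch. F1 for 2013 and 2014 is the usual harmonic mean of precision and recall. For 2015 and 2016, the Overall column is consistent with a weighted sum of the three partial scores; the weights used below (0.4 for clustering, 0.3 each for tagging and linking) are not stated on the slide and are an assumption, chosen because they reproduce the reported values.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def overall(tagging, clustering, linking, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of the partial scores.

    The default weights (tagging, clustering, linking) are an assumption,
    not taken from the slide.
    """
    w_tagging, w_clustering, w_linking = weights
    return w_tagging * tagging + w_clustering * clustering + w_linking * linking


f1_2013 = f1(0.764, 0.604)
f1_2014 = f1(0.771, 0.642)
overall_2015 = overall(0.807, 0.84, 0.762)
overall_2016 = overall(0.473, 0.641, 0.501)
```

Rounded to the precision used in the table, these give 0.67 and 0.701 for F1, and 0.8067 and 0.5486 for the overall score.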
Emerging Trends
• Tweet normalisation is common
• Use of KBs for mention detection and typing
• End-to-end systems and pruning for candidate selection
• Hierarchical clustering for aggregating mentions of the same entity/event
• Decrease in the use of off-the-shelf systems (which were popular in the first editions)
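As an illustration of what tweet normalisation can involve, here is a minimal sketch that removes URLs, strips the @ and # markers and splits camel-cased hashtags. The rules and the function name are assumptions for illustration only; the slides do not describe which normalisation steps the participating systems applied.

```python
import re

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
# Zero-width boundary between a lower-case and an upper-case letter.
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def normalise(tweet):
    """Apply a few simple, illustrative normalisation rules to a tweet."""
    tweet = URL.sub("", tweet)         # drop URLs
    tweet = MENTION.sub(r"\1", tweet)  # "@user" -> "user"
    # "#StarWars" -> "Star Wars"
    tweet = HASHTAG.sub(lambda m: CAMEL_BOUNDARY.sub(" ", m.group(1)), tweet)
    return " ".join(tweet.split())     # collapse whitespace


example = normalise("Watching #StarWars with @johndoe http://t.co/abc")
```

Note that rewriting the text changes character offsets, so a system that normalises tweets has to map mention positions back to the original text before reporting them.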
Lessons Learnt
• Creating balanced challenge datasets is hard!
• You are invited to expand and improve our datasets!
• The datasets are available for evaluation of new systems: http://microposts2016.seas.upenn.edu/challenge.html
• NEEL provides an opportunity to compare results against other systems
• Multilingual or other language challenges? (2016 also had an Italian variant)
• New popular micropost platforms require different analyses
Acknowledgments:
Image source: https://upload.wikimedia.org/wikipedia/commons/d/de/The_Canadian_field-naturalist_%281983%29_%2819897979884%29.jpg
Are you a Master’s or PhD student?
Do you want to learn how to do this type of research yourself?
Join us in Italy next summer!
http://semanticwebsummerschool.org

The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge Series

  • 1. Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge Series
      • Authors: Giuseppe Rizzo, Bianca Pereira, Andrea Varga, Marieke van Erp, Amparo Elizabeth Cano Basave
      • Image: By Piet Mondrian - Gemeentemuseum Den Haag, Public Domain, https://commons.wikimedia.org/w/index.php?curid=37614350
  • 2. NEEL Challenge Overview
      • Microposts are challenging because of their:
          • brevity (140 characters)
          • (domain-specific) abbreviations and typos
          • ‘grammar-free’ style
      • The NEEL challenge aims to foster research into novel, more accurate entity recognition and linking approaches tailored to microposts
      • NEEL ran from 2013 to 2016
  • 3. NEEL Evolution
      • 2013: Information Extraction
          • named entity recognition (4 types)
      • 2014: Named Entity Extraction and Linking (NEEL)
          • named entity linking to DBpedia 3.9
      • 2015: Named Entity rEcognition and Linking (NEEL)
          • named entity recognition (7 types) and linking to DBpedia 2014
      • 2016: Named Entity rEcognition and Linking (NEEL)
          • named entity recognition (7 types) and linking to DBpedia 2015-04, NIL clustering
      • Image source: https://c1.staticflickr.com/8/7020/6405801675_efd6d09977_b.jpg
  • 4. Cross-domain task
      • Named Entity and Event Linking is a shared task in NLP and Semantic Web
      • Machine Learning approaches need data
      • Data curation is expensive and hard
      • Knowledge bases can relieve part of the data bottleneck
      • The result is hybrid approaches
  • 6. Evaluating Entity Linking
      • end-to-end: evaluates a system on the aggregated output of all steps
          • error propagation harms results
      • step-by-step: robust benchmark that evaluates each step of the process individually
          • time consuming to set up
          • penalises systems that do not follow the standard workflow
      • partial end-to-end: evaluates particular steps in the process individually, e.g. NER, NIL & Linking
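  • 7. Named Entity Recognition and Linking challenges since 2013
      • Editions compared: TAC-KBP (2014, 2015, 2016); ERD (2014); SemEval (2015); W-NUT (2015, 2016, 2017); NEEL (2013, 2014, 2015, 2016)
      • Text: TAC-KBP: newswire, web sites, discussion forum posts; ERD: web sites, search queries; SemEval: technical manuals, reports, formal discussion; W-NUT: tweets; tweets, Reddit, YouTube, StackExchange; NEEL: tweets
      • Knowledge Base: TAC-KBP: Wikipedia, Freebase; ERD: Freebase; SemEval: Babelnet; W-NUT: none; NEEL: none, DBpedia
      • Entity: TAC-KBP: given by Type; ERD: given by KB; SemEval: given by KB; W-NUT: given by Type; NEEL: given by Type
      • Evaluation (submission): TAC-KBP: file; ERD: API; SemEval: file; W-NUT: file; NEEL: file, API, file
      • Evaluation (scope): TAC-KBP: partial end-to-end; ERD: end-to-end; SemEval: end-to-end; W-NUT: end-to-end; NEEL: end-to-end, partial end-to-end
      • Target conference: TAC-KBP: TAC; ERD: SIGIR; SemEval: NAACL-HLT; W-NUT: ACL-IJCNLP, COLING, EMNLP; NEEL: WWW
      • (Where a series lists several values, they appear in the order given on the slide)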
  • 11. NEEL Datasets
      • 2013: 4,265 tweets, end of 2010, start of 2011. No explicit hashtag search, 66% train, 33% test.
      • 2014: 3,505 tweets, 15 July 2011 - 15 August 2011. First Story Detection algorithm to identify tweet clusters representing events, 70% train, 30% test.
      • 2015: 6,025 tweets, extension of 2014 dataset including tweets from 2013 and November 2014. Train: 2014 dataset, 8% development, 34% test.
      • 2016: 9,289 tweets, extension of 2014 & 2015 datasets via selection of hashtags. 65% train (2015 dataset), 1% development and 34% test.
      • Image source: https://www.maxpixel.net/Word-Data-Data-Deluge-Binary-System-Binary-Dataset-2728117
  • 12. NEEL Datasets (ctd)
      • Entity types are not distributed equally
      • It is difficult to balance entity types across different dataset slices
      • Confusability: a measure of the number of surface forms an entity can have (i.e. how many different ‘terms’ can refer to the same entity)
      • Dominance: a measure of the number of resources that can be associated with a single surface form (i.e. how many entities share the same ‘name’)
      • Charts: Confusability and Dominance, 2013 and 2016
  • 13. Results
      • The NEEL Challenge became more difficult every year (from 4 entity types to 7, plus linking and NIL clustering)
      • Systems became more complex every year
      • The 2016 task was more difficult, probably because of the domain specificity of the test dataset (US Primary Elections and Star Wars)
      • Scores, 2013 and 2014:
          Year   Precision   Recall   F1
          2013   0.764       0.604    0.67
          2014   0.771       0.642    0.701
      • Scores, 2015 and 2016:
          Year   Tagging   Clustering   Linking   Overall
          2015   0.807     0.84         0.762     0.8067
          2016   0.473     0.641        0.501     0.5486
  • 14. Emerging Trends
      • Tweet normalisation is common
      • Use of KBs for mention detection and typing
      • End-to-end systems and pruning for candidate selection
      • Hierarchical clustering for aggregating mentions of the same entity/event
      • Decrease in the use of off-the-shelf systems (which were popular in the first editions)
  • 15. Lessons Learnt
      • Creating balanced challenge datasets is hard!
      • You are invited to expand and improve our datasets!
      • The datasets are available for evaluation of new systems: http://microposts2016.seas.upenn.edu/challenge.html
      • NEEL provides an opportunity to compare results against other systems
      • Multilingual or other language challenges? (2016 also had an Italian variant)
      • New popular micropost platforms require different analyses
  • 17. Are you a Master’s or PhD student? Do you want to learn how to do this type of research yourself? Join us in Italy next summer! http://semanticwebsummerschool.org
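
The figures on slide 13 and the definitions on slide 12 can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the challenge's official scorer. The 0.3 / 0.4 / 0.3 weights are not given on the slides; they are assumed here because they reproduce the "Overall" column for 2015 and 2016. The two counting functions follow the wording of slide 12 literally, and the annotation pairs and `dbr:` identifiers are hypothetical toy data.

```python
from collections import defaultdict


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ASSUMPTION: these weights do not appear on the slides. They are used
# because they reproduce the reported "Overall" values for 2015 and 2016.
WEIGHTS = {"tagging": 0.3, "clustering": 0.4, "linking": 0.3}


def overall(tagging, clustering, linking):
    """Weighted sum of the three per-step scores (weights assumed above)."""
    return (WEIGHTS["tagging"] * tagging
            + WEIGHTS["clustering"] * clustering
            + WEIGHTS["linking"] * linking)


def surface_forms_per_entity(annotations):
    """Slide 12 'confusability': distinct surface forms seen per entity."""
    forms = defaultdict(set)
    for surface, entity in annotations:
        forms[entity].add(surface)
    return {entity: len(found) for entity, found in forms.items()}


def entities_per_surface_form(annotations):
    """Slide 12 'dominance': distinct entities seen per surface form."""
    entities = defaultdict(set)
    for surface, entity in annotations:
        entities[surface].add(entity)
    return {surface: len(found) for surface, found in entities.items()}


# Hypothetical (surface form, entity) pairs, for illustration only.
TOY_ANNOTATIONS = [
    ("Obama", "dbr:Barack_Obama"),
    ("Barack Obama", "dbr:Barack_Obama"),
    ("Paris", "dbr:Paris"),
    ("Paris", "dbr:Paris_Hilton"),
]

if __name__ == "__main__":
    # Precision and recall taken from slide 13.
    print("F1 2013:", round(f1(0.764, 0.604), 2))
    print("F1 2014:", round(f1(0.771, 0.642), 3))
    # Tagging, clustering and linking scores taken from slide 13.
    print("Overall 2015:", round(overall(0.807, 0.84, 0.762), 4))
    print("Overall 2016:", round(overall(0.473, 0.641, 0.501), 4))
    print(surface_forms_per_entity(TOY_ANNOTATIONS))
    print(entities_per_surface_form(TOY_ANNOTATIONS))
```

Under these assumptions the recomputed F1 and overall values agree with the rounded figures reported on slide 13. On a real dataset the two counting functions would be run over the gold-standard annotations of each edition.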