ELIS – Multimedia Lab
Fréderic Godin, Pedro Debevere, Erik Mannens,
Wesley De Neve and Rik Van de Walle
MSM2013 IE Challenge:
Leveraging Existing Tools for
Named Entity Recognition in Microposts
Multimedia Lab, Ghent University – iMinds, Belgium
Image and Video Systems Lab, KAIST, South Korea
Making Sense of Microposts Workshop @ World Wide Web Conference 2013
Introduction: The challenge
Existing tools for NER are developed for news corpora
Develop NER tools for microposts
4 entity types: Person
Location
Organisation
Miscellaneous (film/movie, entertainment award event,
political event, programming language,
sporting event and TV show)
How do current NER tools perform? (1)
Rizzo et al. evaluated the performance of:
AlchemyAPI, DBpedia Spotlight, Evri, Extractiv,
OpenCalais and Zemanta
On:
5 TED talks, 1000 news articles, and 217 conference
abstracts.
Could we do the same evaluation for microposts?
How do current NER tools perform? (2)
Preprocessing: convert bracket tokens to brackets
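A minimal sketch of this preprocessing step, assuming the bracket tokens follow the Penn Treebank spelling (`-LRB-`, `-RRB-`, and so on). The slides do not show the actual token spellings, so the table below and the helper name `restore_brackets` are illustrative only.

```python
# Assumed token spellings (Penn Treebank style); the challenge data may differ.
BRACKET_TOKENS = {
    "-LRB-": "(", "-RRB-": ")",
    "-LSB-": "[", "-RSB-": "]",
    "-LCB-": "{", "-RCB-": "}",
}


def restore_brackets(text: str) -> str:
    """Replace every escaped bracket token with the literal bracket."""
    for token, bracket in BRACKET_TOKENS.items():
        text = text.replace(token, bracket)
    return text
```

Restoring literal brackets before calling the services avoids handing them tokens that look like unknown words.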
Note: values can differ based on ontology mapping used!
F1 values:
                  PER      LOC      ORG      MISC
AlchemyAPI        78.20%   74.60%   54.40%   10.20%
Spotlight (0.2)   57.60%   46.40%   24.40%    5.00%
Spotlight (0.5)   32.90%    3.70%    6.50%    7.30%
OpenCalais        69.30%   73.10%   55.80%   31.40%
Zemanta           70.40%   64.30%   48.10%   29.30%
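For reference, a per-type F1 score of this kind can be computed as sketched below. The matching rule (exact match on document, mention and type) and the tuple layout are assumptions; the slides do not state how matches were counted.

```python
from collections import Counter

ENTITY_TYPES = ("PER", "LOC", "ORG", "MISC")


def f1_per_type(gold, predicted):
    """Return F1 per entity type.

    gold, predicted: sets of (doc_id, mention, entity_type) tuples.
    A prediction counts as correct only if the whole tuple matches.
    """
    true_pos = Counter(t for (_, _, t) in gold & predicted)
    n_gold = Counter(t for (_, _, t) in gold)
    n_pred = Counter(t for (_, _, t) in predicted)
    scores = {}
    for t in ENTITY_TYPES:
        precision = true_pos[t] / n_pred[t] if n_pred[t] else 0.0
        recall = true_pos[t] / n_gold[t] if n_gold[t] else 0.0
        denom = precision + recall
        scores[t] = 2 * precision * recall / denom if denom else 0.0
    return scores
```

As the note above warns, the scores depend on how each service's own types are mapped onto the four challenge types before comparison.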
How do current NER tools perform? (3)
AlchemyAPI: performs poorly at recognizing exotic names,
small villages, buildings and organizations
Zemanta: same as AlchemyAPI + relies on capitalisation
OpenCalais: poor at recognizing small villages, buildings and
organizations. Does recognize big events!
DBpedia Spotlight: returns multiple ‘possible’ entities
What if we combine the power of all 4 services?
Combining existing services (1)
Apply machine learning on a feature vector of the output
of the different services
Services: AlchemyAPI, DBpedia Spotlight, OpenCalais, Zemanta
Per-service output: confidence level; entity type (PER, LOC, ORG, MISC); service-specific entity
Combined input: 16 features
Classifier: Random Forest
Output: PER, LOC, ORG, MISC
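The feature-vector idea can be sketched as follows. The slides do not give the exact layout of the 16 features, so this sketch uses three values per service (12 in total); `TYPE_MAP`, the type names in it and the `"NONE"` placeholder are hypothetical.

```python
SERVICES = ("AlchemyAPI", "DBpedia Spotlight", "OpenCalais", "Zemanta")

# Hypothetical mapping from service-specific types to the four challenge types.
TYPE_MAP = {
    "Person": "PER", "City": "LOC", "Country": "LOC",
    "Company": "ORG", "Organization": "ORG", "Movie": "MISC",
}


def feature_vector(annotations):
    """Build one flat feature vector for a candidate mention.

    annotations: dict mapping a service name to a
    (service_specific_type, confidence) pair; services that returned
    nothing for the mention are simply absent.
    """
    features = []
    for service in SERVICES:
        service_type, confidence = annotations.get(service, ("NONE", 0.0))
        mapped_type = TYPE_MAP.get(service_type, "NONE")
        features.extend([mapped_type, service_type, confidence])
    return features
```

The categorical values would still need a numeric encoding before being passed to a Random Forest implementation; the classifier then predicts one of PER, LOC, ORG or MISC per mention.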
Combining existing services (2)
Evaluation on entity type
                  PER      LOC      ORG      MISC
Spotlight (0.2)   82.20%   75.70%   60.40%   47.40%
Spotlight (0.5)   81.60%   74.30%   59.40%   40.50%
Noisy input data gives better results
(final results on test set are not included and are part of the challenge)
Conclusions
Current NER tools do perform well in most cases
Shortcomings: Incorrect use of capital letters
Abbreviations of organisations
Small villages, counties and buildings
Combining the output of several services yields good results
#Questions @frederic_godin #MMLab
