SlideShare una empresa de Scribd logo
1 de 12
Descargar para leer sin conexión
University of Economics                                                            Czech Technical University
             Prague                                                                             in Prague



           Recognizing, Classifying and Linking
           Entities with Wikipedia and DBpedia

                                                  Milan Dojchinovski1, Tomas Kliegr2
1 Faculty of Information Technology                                                 2Faculty
                                                                                           of Informatics and Statistics
Czech Technical University in Prague                                                 University of Economics, Prague


                                                                Milan Dojchinovski
                              milan.dojchinovski@fit.cvut.cz - @m1ci - http://dojchinovski.mk



                                            The 7th Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2012)
                                                                                        November 22-23, 2012, Smolenice, SK

 Except where otherwise noted, the content of this presentation is licensed under
 Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
Overview

 ‣   Introduction

 ‣   Entity Recognition, Classification and Publication

 ‣   Experiments

 ‣   Conclusion and Future Work




Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   2
Introduction

 ‣    Unsupervised and fully-automated:
  -    entity recognition - rule based lexico-syntactic patterns
  -    entity classification by extraction of hypernyms - targeted hypernym extraction
  -    entity linking to DBpedia concepts

 ‣    Publication as Linked Data
  -    results in NLP Interchange Format (NIF)




Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   3
Overview

 ‣   Introduction

 ‣   Entity Recognition, Classification and Publication

 ‣   Experiments

 ‣   Conclusion and Future Work




Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   4
Tool Architecture

 ‣   Available as Web 2.0 application at: http://ner.vse.cz/thd

 ‣   Web API available at: http://ner.vse.cz/thd/docs




                                                          Fig 1. Architecture overview




Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   5
Entity Recognition and Classification

 ‣    Entity Recognition
  -    2 JAPE grammars: 1) NNP+ 2) JJ* NN+
  -    input: free text
  -    output: Named (e.g., Diego Maradona ) or Common Entities (e.g., hockey player )

 ‣    Entity Classification
  -    supported by the Targeted Hypernym Discovery algorithm
  -    lexico-syntactic patterns, e.g. _x_ is a _y_




Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   6
Entity Linking and Publication

 ‣    Entity Linking
  -    linking with concepts from DBpedia
  -    used Wikipedia Search API
  -    mapping Wikipedia article URL to its DBpedia representation

 ‣    Publication in NIF
  -    NLP Interchange Format (RDF-based representation)
  -    each processed document (context) has unique identifier
  -    each entity and hypernym as offset-based string




Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   7
Overview

 ‣   Introduction

 ‣   Entity Recognition, Classification and Publication

 ‣   Experiments

 ‣   Conclusion and Future Work




Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   8
Experiments

 ‣   Question addressed
     -   How well our tool recognizes, classifies and links Named and Common Entities?
 ‣   Experiment setup
     -   manually created dataset, Czech Traveler Dataset
     -   101 Named Entities, 85 Common Entities
     -   comparison with 3 other systems: DBpedia Spotlight, Open Calais, Alchemy API
 ‣   Results
     -   Named Entities,
         •   f-score: recognition 0.66, classification 0.66, linking 0.58

     -   Common Entities
         •   f-score: recognition 0.60, classification 0.51, linking 0.61

     -   better results in all tasks
         •   overtaken only by DBpedia Spotlight - linking of common entities with f-score 0.69


Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   9
Overview

 ‣   Introduction

 ‣   Entity Recognition, Classification and Publication

 ‣   Experiments

 ‣   Conclusion and Future Work




Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   10
Conclusion and Future Work

 ‣   Tool for Entity Recognition, Classification and Publication

 ‣   Future directions
     -   multilingual support - Dutch, German and Czech language
     -   grammar improvements
     -   evaluation on a standard benchmark




Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk   11
Feedback




                                                               Thank you!
                                             Questions, comments, ideas?


                                          demo at: http://ner.vse.cz/thd

                            Milan Dojchinovski                                       @m1ci
                            milan.dojchinovski@fit.cvut.cz                            http://dojchinovski.mk

  Except where otherwise noted, the content of this presentation is licensed under
  Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported                                          12

Más contenido relacionado

Similar a Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Structured Data Presentation
Structured Data PresentationStructured Data Presentation
Structured Data PresentationShawn Day
 
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...Anthony Fisher Camilleri
 
Constructing Knowledge Graph for Social Networks in a Deep and Holistic Way
Constructing Knowledge Graph for Social Networks in a Deep and Holistic WayConstructing Knowledge Graph for Social Networks in a Deep and Holistic Way
Constructing Knowledge Graph for Social Networks in a Deep and Holistic WayBaoxu Shi
 
DLT analytics and AI workshop 17 October 2019 WELCOME
DLT analytics and AI workshop 17 October 2019 WELCOME DLT analytics and AI workshop 17 October 2019 WELCOME
DLT analytics and AI workshop 17 October 2019 WELCOME Stavros Zervoudakis
 
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Stefan Dietze
 
Personalised Access to Linked Data
Personalised Access to Linked DataPersonalised Access to Linked Data
Personalised Access to Linked DataMilan Dojchinovski
 
Dariah vcc3 2505-2013_displaying
Dariah vcc3 2505-2013_displayingDariah vcc3 2505-2013_displaying
Dariah vcc3 2505-2013_displayingMinel Jean-Luc
 
20120622 web sci12-won-marc smith-semantic and social network analysis of …
20120622 web sci12-won-marc smith-semantic and social network analysis of …20120622 web sci12-won-marc smith-semantic and social network analysis of …
20120622 web sci12-won-marc smith-semantic and social network analysis of …Marc Smith
 
Blockchain in Learning & Career Development: The Case of the Open Source Univ...
Blockchain in Learning & Career Development: The Case of the Open Source Univ...Blockchain in Learning & Career Development: The Case of the Open Source Univ...
Blockchain in Learning & Career Development: The Case of the Open Source Univ...Hristian Daskalov
 
Microtask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataMicrotask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataEUCLID project
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data VisualizationLaura Po
 
Digital communication (v. 2021 ITA)
Digital communication (v. 2021 ITA)Digital communication (v. 2021 ITA)
Digital communication (v. 2021 ITA)Frieda Brioschi
 
Semantic Tagging for the XWiki Platform with Zemanta and DBpedia
Semantic Tagging for the XWiki Platform with Zemanta and DBpediaSemantic Tagging for the XWiki Platform with Zemanta and DBpedia
Semantic Tagging for the XWiki Platform with Zemanta and DBpediaElena-Oana Tabaranu
 
Extending DCAM for Metadata Provenance
Extending DCAM for Metadata ProvenanceExtending DCAM for Metadata Provenance
Extending DCAM for Metadata ProvenanceKai Eckert
 
EUDAT Webinar "Organise, retrieve and aggregate data using annotations with B...
EUDAT Webinar "Organise, retrieve and aggregate data using annotations with B...EUDAT Webinar "Organise, retrieve and aggregate data using annotations with B...
EUDAT Webinar "Organise, retrieve and aggregate data using annotations with B...EUDAT
 
Computer Vision in Academia and Industry (Dmytro Mishkin Technology Stream)
Computer Vision in Academia and Industry (Dmytro Mishkin Technology Stream)Computer Vision in Academia and Industry (Dmytro Mishkin Technology Stream)
Computer Vision in Academia and Industry (Dmytro Mishkin Technology Stream)IT Arena
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things PayamBarnaghi
 

Similar a Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia (20)

LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORELOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
 
Structured Data Presentation
Structured Data PresentationStructured Data Presentation
Structured Data Presentation
 
Lod2
Lod2Lod2
Lod2
 
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
 
Constructing Knowledge Graph for Social Networks in a Deep and Holistic Way
Constructing Knowledge Graph for Social Networks in a Deep and Holistic WayConstructing Knowledge Graph for Social Networks in a Deep and Holistic Way
Constructing Knowledge Graph for Social Networks in a Deep and Holistic Way
 
DLT analytics and AI workshop 17 October 2019 WELCOME
DLT analytics and AI workshop 17 October 2019 WELCOME DLT analytics and AI workshop 17 October 2019 WELCOME
DLT analytics and AI workshop 17 October 2019 WELCOME
 
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
 
Personalised Access to Linked Data
Personalised Access to Linked DataPersonalised Access to Linked Data
Personalised Access to Linked Data
 
Dariah vcc3 2505-2013_displaying
Dariah vcc3 2505-2013_displayingDariah vcc3 2505-2013_displaying
Dariah vcc3 2505-2013_displaying
 
Building arguments on Open Data
Building arguments on Open DataBuilding arguments on Open Data
Building arguments on Open Data
 
20120622 web sci12-won-marc smith-semantic and social network analysis of …
20120622 web sci12-won-marc smith-semantic and social network analysis of …20120622 web sci12-won-marc smith-semantic and social network analysis of …
20120622 web sci12-won-marc smith-semantic and social network analysis of …
 
Blockchain in Learning & Career Development: The Case of the Open Source Univ...
Blockchain in Learning & Career Development: The Case of the Open Source Univ...Blockchain in Learning & Career Development: The Case of the Open Source Univ...
Blockchain in Learning & Career Development: The Case of the Open Source Univ...
 
Microtask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataMicrotask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked Data
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
 
Digital communication (v. 2021 ITA)
Digital communication (v. 2021 ITA)Digital communication (v. 2021 ITA)
Digital communication (v. 2021 ITA)
 
Semantic Tagging for the XWiki Platform with Zemanta and DBpedia
Semantic Tagging for the XWiki Platform with Zemanta and DBpediaSemantic Tagging for the XWiki Platform with Zemanta and DBpedia
Semantic Tagging for the XWiki Platform with Zemanta and DBpedia
 
Extending DCAM for Metadata Provenance
Extending DCAM for Metadata ProvenanceExtending DCAM for Metadata Provenance
Extending DCAM for Metadata Provenance
 
EUDAT Webinar "Organise, retrieve and aggregate data using annotations with B...
EUDAT Webinar "Organise, retrieve and aggregate data using annotations with B...EUDAT Webinar "Organise, retrieve and aggregate data using annotations with B...
EUDAT Webinar "Organise, retrieve and aggregate data using annotations with B...
 
Computer Vision in Academia and Industry (Dmytro Mishkin Technology Stream)
Computer Vision in Academia and Industry (Dmytro Mishkin Technology Stream)Computer Vision in Academia and Industry (Dmytro Mishkin Technology Stream)
Computer Vision in Academia and Industry (Dmytro Mishkin Technology Stream)
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things
 

Último

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Último (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

  • 1. University of Economics Czech Technical University Prague in Prague Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia Milan Dojchinovski1, Tomas Kliegr2 1 Faculty of Information Technology 2Faculty of Informatics and Statistics Czech Technical University in Prague University of Economics, Prague Milan Dojchinovski milan.dojchinovski@fit.cvut.cz - @m1ci - http://dojchinovski.mk The 7th Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2012) November 22-23, 2012, Smolenice, SK Except where otherwise noted, the content of this presentation is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
  • 2. Overview ‣ Introduction ‣ Entity Recognition, Classification and Publication ‣ Experiments ‣ Conclusion and Future Work Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 2
  • 3. Introduction ‣ Unsupervised and fully-automated: - entity recognition - rule based lexico-syntactic patterns - entity classification by extraction of hypernyms - targeted hypernym extraction - entity linking to DBpedia concepts ‣ Publication as Linked Data - results in NLP Interchange Format (NIF) Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 3
  • 4. Overview ‣ Introduction ‣ Entity Recognition, Classification and Publication ‣ Experiments ‣ Conclusion and Future Work Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 4
  • 5. Tool Architecture ‣ Available as Web 2.0 application at: http://ner.vse.cz/thd ‣ Web API available at: http://ner.vse.cz/thd/docs Fig 1. Architecture overview Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 5
  • 6. Entity Recognition and Classification ‣ Entity Recognition - 2 JAPE grammars: 1) NNP+ 2) JJ* NN+ - input: free text - output: Named (e.g., Diego Maradona ) or Common Entities (e.g., hockey player ) ‣ Entity Classification - supported by the Targeted Hypernym Discovery algorithm - lexico-syntactic patterns, e.g. _x_ is a _y_ Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 6
  • 7. Entity Linking and Publication ‣ Entity Linking - linking with concepts from DBpedia - used Wikipedia Search API - mapping Wikipedia article URL to its DBpedia representation ‣ Publication in NIF - NLP Interchange Format (RDF-based representation) - each processed document (context) has unique identifier - each entity and hypernym as offset-based string Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 7
  • 8. Overview ‣ Introduction ‣ Entity Recognition, Classification and Publication ‣ Experiments ‣ Conclusion and Future Work Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 8
  • 9. Experiments ‣ Question addressed - How well our tool recognizes, classifies and links Named and Common Entities? ‣ Experiment setup - manually created dataset, Czech Traveler Dataset - 101 Named Entities, 85 Common Entities - comparison with 3 other systems: DBpedia Spotlight, Open Calais, Alchemy API ‣ Results - Named Entities, • f-score: recognition 0.66, classification 0.66, linking 0.58 - Common Entities • f-score: recognition 0.60, classification 0.51, linking 0.61 - better results in all tasks • overtaken only by DBpedia Spotlight - linking of common entities with f-score 0.69 Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 9
  • 10. Overview ‣ Introduction ‣ Entity Recognition, Classification and Publication ‣ Experiments ‣ Conclusion and Future Work Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 10
  • 11. Conclusion and Future Work ‣ Tool for Entity Recognition, Classification and Publication ‣ Future directions - multilingual support - Dutch, German and Czech language - grammar improvements - evaluation on a standard benchmark Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 11
  • 12. Feedback Thank you! Questions, comments, ideas? demo at: http://ner.vse.cz/thd Milan Dojchinovski @m1ci milan.dojchinovski@fit.cvut.cz http://dojchinovski.mk Except where otherwise noted, the content of this presentation is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported 12