Combining Textual and Graph-Based Features
for Named Entity Disambiguation using
Undirected Probabilistic Graphical Models
Sherzod Hakimov, Hendrik ter Horst, Soufian Jebbara, Matthias Hartung
& Philipp Cimiano
Semantic Computing Group
CITEC, Bielefeld University

Problem Definition - Named Entity Disambiguation
Istanbul is the capital of Turkey and the largest city in the E.U.
Mentions to disambiguate: Istanbul, Turkey, E.U.

Candidate Retrieval
● Index built from DBpedia & Wikipedia data with frequency values
○ DBpedia label properties (rdfs:label, dbo:firstName, etc.)
○ Wikipedia anchors
Link : dbr:Barack_Obama Term : “Barack Obama” Frequency : 1020
Link : dbr:Presidency_of_Obama Term : “Barack Obama” Frequency : 10
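
A minimal sketch of a frequency-weighted candidate index like the one described on the Candidate Retrieval slide. The `CandidateIndex` class, its method names, the lower-casing of terms and the `top_k` cut-off are assumptions made for illustration and are not taken from the authors' system; only the two "Barack Obama" entries come from the slides.

```python
from collections import defaultdict


class CandidateIndex:
    """Maps a surface form (term) to candidate URIs with frequency counts."""

    def __init__(self):
        # term -> {uri: frequency}
        self._index = defaultdict(dict)

    def add(self, term, uri, frequency):
        # Assumption: terms are matched case-insensitively.
        key = term.lower()
        self._index[key][uri] = self._index[key].get(uri, 0) + frequency

    def candidates(self, term, top_k=10):
        # Most frequent candidates first; ties broken by URI for determinism.
        entries = self._index.get(term.lower(), {})
        ranked = sorted(entries.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:top_k]


index = CandidateIndex()
index.add("Barack Obama", "dbr:Barack_Obama", 1020)
index.add("Barack Obama", "dbr:Presidency_of_Obama", 10)
print(index.candidates("Barack Obama"))
```

The stored frequencies can later be reused as the Term Frequency feature, since they record how often a surface form points to a given URI.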

NERFGUN
● Undirected Factor Graphs
● Collective disambiguation
● Textual & graph-based features (applicable to any language)
● Results comparable with state-of-the-art systems

Inference
● Markov Chain Monte Carlo
● Generates new states from a given state
● State - partial or full assignment of annotations to mentions
Istanbul is the capital of Turkey and the largest city in the E.U.
State s (full assignment): dbr:Istanbul dbr:Turkey dbr:European_Union
State s (partial assignment): dbr:Istanbul

Objective Score
Istanbul is the capital of Turkey and the largest city in the E.U.
dbr:Istanbul dbr:Turkey dbr:European_Union

Inference - Initial State
Istanbul is the capital of Turkey and the largest city in the E.U.
dbr:Istanbul_Atatürk_Airport dbr:Turkey dbr:European_Commission
Randomly initialized

Inference - Atomic change
Istanbul is the capital of Turkey and the largest city in the E.U.
Input : State s_i
dbr:Istanbul_Atatürk_Airport dbr:Turkey dbr:European_Commission
Atomic change : 1 annotation changes; new states are generated from all possible candidates
dbr:Istanbul dbr:Turkey dbr:European_Commission
dbr:Istanbul_University dbr:Turkey dbr:European_Commission
dbr:Istanbul_Atatürk_Airport dbr:Turkey dbr:European_Union
...
Output : List of new states
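
The atomic-change step can be sketched as follows: every new state differs from the input state in exactly one annotation. The function name `atomic_changes`, the dictionary representation of a state, and the candidate lists are assumptions for illustration; the slides show these URIs but not the complete candidate sets.

```python
def atomic_changes(state, candidates):
    """Return all states that differ from `state` in exactly one annotation.

    state: dict mapping mention -> currently assigned URI
    candidates: dict mapping mention -> list of candidate URIs
    """
    neighbours = []
    for mention, current in state.items():
        for uri in candidates[mention]:
            if uri != current:
                new_state = dict(state)  # copy, so the input state is untouched
                new_state[mention] = uri
                neighbours.append(new_state)
    return neighbours


# Illustrative candidate lists (assumed, not the authors' full index output).
candidates = {
    "Istanbul": ["dbr:Istanbul", "dbr:Istanbul_Atatürk_Airport", "dbr:Istanbul_University"],
    "Turkey": ["dbr:Turkey"],
    "E.U.": ["dbr:European_Union", "dbr:European_Commission"],
}
s_i = {
    "Istanbul": "dbr:Istanbul_Atatürk_Airport",
    "Turkey": "dbr:Turkey",
    "E.U.": "dbr:European_Commission",
}
new_states = atomic_changes(s_i, candidates)
for s in new_states:
    print(s)
```

A sampler would then choose the next state among `new_states` according to the model score, which is not shown here.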

Features
● PageRank - computed for all DBpedia resources using random walks
● Term Frequency - how often a surface form refers to a URI (frequency values from the candidate index)
● Edit Distance - Levenshtein distance between the URI and the surface form
● Document Similarity - text similarity between the given document and the DBpedia abstract of each annotation
● Topic Specific PageRank - computed for all DBpedia resources using random walks, while noting the source and target nodes of each walk
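
Two of the features listed above can be illustrated with a short sketch. `levenshtein` is the standard dynamic-programming edit distance; `pagerank` uses power iteration, which computes the stationary distribution of the random walk rather than simulating walks, so it is a substitute technique and not necessarily how the authors computed their scores. The helper `uri_label`, the damping factor and the toy graph are assumptions.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def uri_label(uri):
    """Hypothetical helper: 'dbr:Istanbul_University' -> 'Istanbul University'."""
    return uri.split(":", 1)[-1].replace("_", " ")


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: node -> list of successor nodes."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


print(levenshtein("Istanbul", uri_label("dbr:Istanbul_University")))
toy_graph = {"A": ["C"], "B": ["C"], "C": ["A"]}  # toy graph, not DBpedia
ranks = pagerank(toy_graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```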

Factor Graphs - Features
Istanbul is the capital of Turkey and the largest city in the E.U.
dbr:Istanbul_University dbr:Turkey dbr:European_Union
Factors shown in the figure: Edit distance, Term Frequency (e.g. inverted index), PageRank, Topic Specific PageRank
dbr:Turkey dbo:abstract “Turkey (/ˈtɜːrki/; Turkish: Türkiye [ˈtyɾcije]), officially the Republic of Turkey ... “
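
One way to read the factor graph is as a log-linear model in which each factor contributes a weighted feature value to the score of a state. The sketch below assumes that form; the function names, the split into per-annotation and pairwise feature functions, and all numeric values are invented for illustration and are not taken from the slides.

```python
import math
from itertools import combinations


def state_score(state, weights, unary_features, pair_features):
    """Unnormalised score exp(sum of weight * feature value) over all factors."""
    total = 0.0
    # Factors between a mention and its assigned URI.
    for mention, uri in state.items():
        for name, value in unary_features(mention, uri).items():
            total += weights.get(name, 0.0) * value
    # Factors between pairs of assigned URIs (collective disambiguation).
    for uri_a, uri_b in combinations(sorted(state.values()), 2):
        for name, value in pair_features(uri_a, uri_b).items():
            total += weights.get(name, 0.0) * value
    return math.exp(total)


# Toy feature values (assumed).
TOY_TERM_FREQ = {
    ("Istanbul", "dbr:Istanbul"): 0.9,
    ("Istanbul", "dbr:Istanbul_University"): 0.1,
}
TOY_RELATEDNESS = {frozenset(["dbr:Istanbul", "dbr:Turkey"]): 0.8}


def unary_features(mention, uri):
    return {"term_frequency": TOY_TERM_FREQ.get((mention, uri), 0.5)}


def pair_features(uri_a, uri_b):
    return {"topic_specific_pagerank": TOY_RELATEDNESS.get(frozenset([uri_a, uri_b]), 0.0)}


weights = {"term_frequency": 1.0, "topic_specific_pagerank": 1.0}
good = {"Istanbul": "dbr:Istanbul", "Turkey": "dbr:Turkey"}
bad = {"Istanbul": "dbr:Istanbul_University", "Turkey": "dbr:Turkey"}
print(state_score(good, weights, unary_features, pair_features) >
      state_score(bad, weights, unary_features, pair_features))
```

The pairwise term is what makes the disambiguation collective: the score of one annotation depends on the other annotations in the same state.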
Model Training
● SampleRank - learning weights for features
● Datasets : AIDA/CoNLL Training & MicroPost 2014 Training
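
A simplified, perceptron-style sketch of a SampleRank update on two neighbouring states: when the objective prefers one state but the model does not rank it higher, the weights move toward the features of the preferred state. The function name, the feature dictionaries, the learning rate and the omission of a margin are assumptions; this is not the authors' training code.

```python
def samplerank_update(weights, feats_a, feats_b, objective_a, objective_b,
                      learning_rate=0.1):
    """Return updated weights after comparing two neighbouring states."""

    def model_score(feats):
        return sum(weights.get(name, 0.0) * value for name, value in feats.items())

    if objective_a == objective_b:
        return weights  # the objective expresses no preference
    if objective_a > objective_b:
        better, worse = feats_a, feats_b
    else:
        better, worse = feats_b, feats_a
    if model_score(better) > model_score(worse):
        return weights  # the model already agrees with the objective
    updated = dict(weights)
    for name in set(better) | set(worse):
        delta = better.get(name, 0.0) - worse.get(name, 0.0)
        updated[name] = updated.get(name, 0.0) + learning_rate * delta
    return updated


# Toy example: state A is correct (objective 1.0), state B is not (objective 0.0).
w0 = {}
w1 = samplerank_update(w0, {"term_frequency": 0.9}, {"term_frequency": 0.1}, 1.0, 0.0)
w2 = samplerank_update(w1, {"term_frequency": 0.9}, {"term_frequency": 0.1}, 1.0, 0.0)
print(w1, w2)
```

In this toy run the first call changes the weight and the second leaves it alone, because the model then ranks the correct state higher.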

Model Training - Local Evaluation
● PageRank + Term Frequency + Edit Distance : 0.70 (short text)
● PageRank + Topic Specific PR + Term Frequency + Edit Distance : 0.78 (long text)

Comparison
● GERBIL - framework for benchmarking named entity recognition and disambiguation, and question answering
● State-of-the-art systems : AGDISTIS, AIDA, DBpedia Spotlight, TagMe, Babelfy, etc.

Conclusion
● Collective disambiguation of named entities
● Model based on factor graphs to capture dependencies between annotations
● Impact of combining different features
● Achieves better results on unseen datasets
● Comparable results to state-of-the-art
Thank you!
