This document describes a named entity disambiguation system called NERFGUN that uses undirected probabilistic graphical models. NERFGUN combines textual and graph-based features for collective disambiguation of named entities. It indexes candidate entities from DBpedia and Wikipedia with frequency values. NERFGUN uses undirected factor graphs and Markov chain Monte Carlo inference to jointly model dependencies between annotations. Experimental results show NERFGUN achieves comparable or better performance than state-of-the-art systems on benchmark datasets.
Combining Textual and Graph Features for Named Entity Disambiguation
1. Combining Textual and Graph-Based Features for Named Entity Disambiguation using Undirected Probabilistic Graphical Models
Sherzod Hakimov, Hendrik ter Horst, Soufian Jebbara, Matthias Hartung & Philipp Cimiano
Semantic Computing Group
CITEC, Bielefeld University
2. Problem Definition - Named Entity Disambiguation
Istanbul is the capital of Turkey and the largest city in the E.U.
Mentions to disambiguate: Istanbul, Turkey, E.U.
6. Candidate Retrieval
● Index built from DBpedia & Wikipedia data, with frequency values
○ DBpedia label properties (rdfs:label, dbo:firstName, etc.)
○ Wikipedia anchors
Example index entries:
Link : dbr:Barack_Obama Term : “Barack Obama” Frequency : 1020
Link : dbr:Presidency_of_Obama Term : “Barack Obama” Frequency : 10
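A minimal sketch of how such a frequency index might be queried, assuming an in-memory dictionary keyed by lower-cased surface form. The `CandidateIndex` class, its method names and the case normalisation are illustrative assumptions, not NERFGUN's actual implementation; only the two example entries come from the slide.

```python
from collections import defaultdict


class CandidateIndex:
    """Toy in-memory index: surface form -> {entity URI: frequency}."""

    def __init__(self):
        self._entries = defaultdict(dict)

    def add(self, term, link, frequency):
        # Hypothetical normalisation: match terms case-insensitively.
        key = term.lower()
        self._entries[key][link] = self._entries[key].get(link, 0) + frequency

    def candidates(self, term, top_k=10):
        # Return (link, frequency) pairs, most frequent first.
        found = self._entries.get(term.lower(), {})
        ranked = sorted(found.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:top_k]


index = CandidateIndex()
index.add("Barack Obama", "dbr:Barack_Obama", 1020)
index.add("Barack Obama", "dbr:Presidency_of_Obama", 10)
top = index.candidates("barack obama")
```

Ranking by frequency means the most commonly intended entity for a surface form is proposed first, while rarer readings remain available to the collective disambiguation step.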
10. NERFGUN
● Undirected Factor Graphs
● Collective disambiguation
● Textual & Graph-based Features (applicable to any language)
● Comparable with state-of-the-art systems
11. Inference
● Markov Chain Monte Carlo
● Generates new states from a given state
● State - partial or full assignment
Istanbul is the capital of Turkey and the largest city in the E.U.
State s (full assignment): Istanbul → dbr:Istanbul, Turkey → dbr:Turkey, E.U. → dbr:European_Union
State s (partial assignment): Istanbul → dbr:Istanbul
13. Objective Score
Istanbul is the capital of Turkey and the largest city in the E.U.
Istanbul → dbr:Istanbul, Turkey → dbr:Turkey, E.U. → dbr:European_Union
14. Inference - Initial State
Istanbul is the capital of Turkey and the largest city in the E.U.
Randomly initialized: Istanbul → dbr:Istanbul_Atatürk_Airport, Turkey → dbr:Turkey, E.U. → dbr:European_Commission
16. Inference - Atomic change
Input : State si
Istanbul is the capital of Turkey and the largest city in the E.U.
Istanbul → dbr:Istanbul_Atatürk_Airport, Turkey → dbr:Turkey, E.U. → dbr:European_Commission
Atomic change: 1 annotation changes; a new state is generated from all possible candidates
Istanbul → dbr:Istanbul, Turkey → dbr:Turkey, E.U. → dbr:European_Commission
Istanbul → dbr:Istanbul_University, Turkey → dbr:Turkey, E.U. → dbr:European_Commission
Istanbul → dbr:Istanbul_Atatürk_Airport, Turkey → dbr:Turkey, E.U. → dbr:European_Union
...
Output : List of new states
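The atomic-change step can be sketched as follows. This is an illustration under stated assumptions: the candidate lists are limited to the URIs shown on the slides (the real lists are longer, as the "..." indicates), and `sample_next`, which draws a successor with probability proportional to exp(score), is one common MCMC-style choice rather than a confirmed detail of NERFGUN. The `score` argument is a hypothetical callable standing in for the model.

```python
import math
import random


def atomic_changes(state, candidates):
    """Return every state that differs from `state` in exactly one annotation."""
    new_states = []
    for mention, current in state.items():
        for uri in candidates[mention]:
            if uri != current:
                changed = dict(state)
                changed[mention] = uri
                new_states.append(changed)
    return new_states


def sample_next(state, candidates, score, rng):
    """Draw the next state with probability proportional to exp(score)."""
    proposals = [state] + atomic_changes(state, candidates)
    scores = [score(s) for s in proposals]
    best = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(x - best) for x in scores]
    return rng.choices(proposals, weights=weights, k=1)[0]


# Candidate lists restricted to the URIs visible on the slides (assumption).
candidates = {
    "Istanbul": ["dbr:Istanbul_Atatürk_Airport", "dbr:Istanbul",
                 "dbr:Istanbul_University"],
    "Turkey": ["dbr:Turkey"],
    "E.U.": ["dbr:European_Commission", "dbr:European_Union"],
}
s_i = {
    "Istanbul": "dbr:Istanbul_Atatürk_Airport",
    "Turkey": "dbr:Turkey",
    "E.U.": "dbr:European_Commission",
}
new_states = atomic_changes(s_i, candidates)
```

With these candidate lists, `new_states` holds exactly the three successor states listed on the slide. Repeating `sample_next` from the randomly initialized state walks through the space of assignments one annotation at a time.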
21. Features
● PageRank - computed for all DBpedia resources using random walk
● Term Frequency - frequency values between surface form and URI
● Edit distance - Levenshtein distance between URI and surface form
● Document Similarity - text similarity of the given document and DBpedia abstracts of each annotation
● Topic Specific PageRank - computed for all DBpedia resources (while noting the source and target nodes for each walk) using random walk
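Two of the features listed above can be sketched in a few lines. The edit distance is the standard Levenshtein dynamic programme; how the URI is turned into a comparable string (`uri_label`) is an assumption. The `pagerank` function is a plain power iteration, and passing a `restart` distribution concentrated on one source node gives a personalised ranking, used here only as a stand-in for the topic-specific variant, whose exact computation the slides do not spell out. The toy link graph is hypothetical and not DBpedia data.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def uri_label(uri):
    # Assumed normalisation: drop the "dbr:" prefix, underscores become spaces.
    return uri.split(":", 1)[-1].replace("_", " ")


def edit_distance_feature(surface_form, uri):
    return levenshtein(surface_form.lower(), uri_label(uri).lower())


def pagerank(graph, damping=0.85, iterations=50, restart=None):
    """Power-iteration PageRank over {node: [outgoing neighbours]}.

    Every node must appear as a key. `restart` is the teleport distribution
    (uniform by default); it should sum to 1.
    """
    nodes = list(graph)
    if restart is None:
        restart = {n: 1.0 / len(nodes) for n in nodes}
    rank = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart.get(n, 0.0) for n in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Dangling node: hand its mass back via the restart vector.
                for n in nodes:
                    nxt[n] += damping * rank[node] * restart.get(n, 0.0)
        rank = nxt
    return rank


# Hypothetical toy link graph, for illustration only.
graph = {
    "dbr:Istanbul": ["dbr:Turkey"],
    "dbr:Istanbul_University": ["dbr:Istanbul", "dbr:Turkey"],
    "dbr:Turkey": ["dbr:Istanbul"],
}
global_rank = pagerank(graph)
from_turkey = pagerank(graph, restart={"dbr:Turkey": 1.0})
```

In this sketch `from_turkey[target]` plays the role of a score for a (source, target) pair, which is one way to read "noting the source and target nodes for each walk".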
25. Factor Graphs - Features
Istanbul is the capital of Turkey and the largest city in the E.U.
Istanbul → dbr:Istanbul_University, Turkey → dbr:Turkey, E.U. → dbr:European_Union
[Figure: factor graph over the annotations above, with factors labelled Edit distance, Term Frequency (e.g. inverted index), PageRank and Topic Specific PageRank]
dbr:Turkey dbo:abstract “Turkey (/ˈtɜːrki/; Turkish: Türkiye [ˈtyɾcije]), officially the Republic of Turkey ... ”
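A factor graph of this kind assigns a state a score by summing weighted feature values over its factors. The sketch below assumes that some features are attached to a single annotation and others to a pair of annotations; which feature belongs to which kind of factor is an assumption here, and the feature functions, values and weights are made up purely for illustration.

```python
from itertools import combinations


def state_score(state, weights, unary_features, pairwise_features):
    """Unnormalised log-score of a state: weighted sum over all factors."""
    total = 0.0
    # One factor per annotation (mention, URI).
    for mention, uri in state.items():
        for name, feature in unary_features.items():
            total += weights[name] * feature(mention, uri)
    # One factor per pair of annotations, capturing their dependency.
    for uri_a, uri_b in combinations(state.values(), 2):
        for name, feature in pairwise_features.items():
            total += weights[name] * feature(uri_a, uri_b)
    return total


# Hypothetical feature functions with made-up values.
def term_frequency(mention, uri):
    return {"dbr:Istanbul": 0.9, "dbr:Istanbul_University": 0.1}.get(uri, 0.5)


def relatedness(uri_a, uri_b):
    related = {frozenset(["dbr:Istanbul", "dbr:Turkey"])}
    return 1.0 if frozenset([uri_a, uri_b]) in related else 0.0


weights = {"term_frequency": 1.0, "topic_pagerank": 2.0}
unary = {"term_frequency": term_frequency}
pairwise = {"topic_pagerank": relatedness}

good = {"Istanbul": "dbr:Istanbul", "Turkey": "dbr:Turkey"}
bad = {"Istanbul": "dbr:Istanbul_University", "Turkey": "dbr:Turkey"}
good_score = state_score(good, weights, unary, pairwise)
bad_score = state_score(bad, weights, unary, pairwise)
```

The pairwise term is what makes the disambiguation collective: changing one annotation alters the score contribution of every pair it takes part in.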
29. Model Training
● SampleRank - learning weights for features
● Datasets : AIDA/CoNLL Training & MicroPost 2014 Training
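The idea behind SampleRank can be illustrated with a simplified, perceptron-style update on a pair of neighbouring states: when the objective prefers one state but the model does not score it higher, the weights move towards the features of the preferred state. This is a sketch rather than the training code behind the slides; the learning rate, margin, feature values and the objective scores (for example, agreement with gold annotations) are all hypothetical.

```python
def samplerank_update(weights, feats_a, feats_b, objective_a, objective_b,
                      learning_rate=0.1, margin=0.0):
    """One simplified SampleRank-style update for two neighbouring states."""
    if objective_a == objective_b:
        return weights  # the objective expresses no preference
    if objective_a > objective_b:
        better, worse = feats_a, feats_b
    else:
        better, worse = feats_b, feats_a
    names = set(better) | set(worse)
    diff = {n: better.get(n, 0.0) - worse.get(n, 0.0) for n in names}
    model_gap = sum(weights.get(n, 0.0) * d for n, d in diff.items())
    if model_gap > margin:
        return weights  # the model already ranks the pair correctly
    updated = dict(weights)
    for n, d in diff.items():
        updated[n] = updated.get(n, 0.0) + learning_rate * d
    return updated


# Hypothetical feature vectors for two states differing in one annotation.
feats_university = {"edit_distance": 11.0, "term_frequency": 0.1}
feats_city = {"edit_distance": 0.0, "term_frequency": 0.9}

initial = {"edit_distance": 0.0, "term_frequency": 0.0}
learned = samplerank_update(initial, feats_university, feats_city,
                            objective_a=0.0, objective_b=1.0)
```

After this single update the edit-distance weight becomes negative and the term-frequency weight positive, so the model now prefers the state that the objective prefers.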
31. Model Training - Local Evaluation
PageRank + Term Frequency + Edit Distance : 0.70 (short text)
PageRank + Topic Specific PR + Term Frequency + Edit Distance : 0.78 (long text)
32. Comparison
● GERBIL - framework for benchmarking named entity recognition, named entity disambiguation and question answering systems
● State-of-the-art systems : AGDISTIS, AIDA, DBpedia Spotlight, TagMe, Babelfy, etc.
35. Conclusion
● Collective disambiguation of named entities
● Model based on factor graphs to capture dependencies between annotations
● Impact of combining different features
● Achieves better results on unseen datasets
● Comparable results to state-of-the-art systems
Thank you!