SlideShare una empresa de Scribd logo
1 de 18
Motivation
Data on the Web
29/05/13ESWC 2013, Montpellier, France
Some eyecatching opener illustrating growth and or diversity of web data
Combining a co-occurrence-based and a
semantic measure for entity linking
ESWC 2013: Extended Semantic Web Conference
28 May 2013, Montpellier, France
Bernardo Pereira Nunes, Stefan Dietze, Marco Antonio Casanova,
Ricardo Kawase, Besnik Fetahu, Wolfgang Nejdl
(PUC-Rio, BR) (L3S Research Center, DE)
Outline
– Introduction
– Motivation Example
– A combined approach towards entity linking
• Semantic Connectivity Score – Katz Index
• Co-occurrence-based measures
• Combined entity linking approach
– Evaluation
– Results
– Conclusions
29/05/13 ESWC 2013 – Montpellier, France
Introduction
• Linked Data and Web resources
• Sparsely interlinked resources
• Knowledge bases, with structured knowledge about entities
• NER & NED for extraction of entities
• Few semantics relationships between entities (skos:related, so:related)
• Entity linking, meaningful only at first (direct) degree of connectivity
• Exhaustive process considering large amounts of resources
29/05/13 ESWC 2013 – Montpellier, France
Motivation Example
• Semantic relatedness of concepts (entities)
• Exploit existing knowledge base structures
• Resource semantic similarity (entities)
• Latent relationships via semantic relations
29/05/13 ESWC 2013 – Montpellier, France
• The Charlotte Bobcats could go from the NBA’s
worst team to its best bargain.
•The New York Knicks got the big-game performances
they desperately needed from Carmelo Anthony and
Amar’e Stoudemire to beat the Miami Heat.
A combined entity-linking approach
Novel approach on entity-linking for resources of same and disparate datasets.
1. Semantic Connectivity Score (SCS)– knowledge graph based on Social
Network Theory – Katz Index.
2. Co-occurrence based Measure (CBM) – utilise entity co-occurrence in the
Web.
29/05/13 ESWC 2013 – Montpellier, France
Semantic Connectivity Score - SCS
• Measure relatedness of entity pairs computing Katz’s Index
• Use transversal properties to compute relatedness
• Exclude hierarchical properties:
– rdfs:subClassOf
– dcterms:subject
– skos:broader
• Quantify semantic connectivity of entity pairs (e1, e2):
29/05/13 ESWC 2013 – Montpellier, France
∑=
><
⋅=
τ
β
1
),(21 ||),( 21
l
l
ee
l
pathseeSCS
transversal paths of length l
between entity pairsdamping factor, exponentially
penalize longer paths.
Semantic Connectivity Score – SCS (1)
29/05/13 ESWC 2013 – Montpellier, France
• Remove edge directions from graphs
• Inverse properties considered equivalent:
i.e. isFathorOf ↔ isSonOf
• Empirically determine path length
Adoptions to knowledge graphs towards applying Katz index measure
Inverse property equivalence
Semantic Connectivity Score – SCS (2)
• Optimization factors for Katz:
– Exponentially many paths, measuring entity pair relatedness
– Small world assumptions
– Tradeoff of path length and connectivity contribution (τ=4)
29/05/13 ESWC 2013 – Montpellier, France
#Paths with increasing length Computation time for increasing path length
Co-occurrence-based Measure (CBM)
• Approximate number of Web resources mentioning entity pairs
• Similar to Pointwise Mutual Information and Normalised Google Distance
• Query search engines: e.g. “Carmelo Anthony” + “Charlotte Bobcats”
• Extract occurrences of each entity, and as well the entity pairs
29/05/13 ESWC 2013 – Montpellier, France







⋅
===
=∨=
=
otherwise,
))(log(
)),(log(
))(log(
)),(log(
1),()()(if,1
0)(0)(if,0
),(
2
21
1
21
2121
21
21
ecount
eecount
ecount
eecount
eecountecountecount
ecountecount
eeCBM
A combined entity-linking approach
• SCS as an exhaustive entity-linking procedure
• CBM –search engines to measure relatedness based on entity co-occurrence
• Complementary entity-linking results
• A combined measure, scalable and with broader coverage:
29/05/13 ESWC 2013 – Montpellier, France


 >
=+
otherwise),,(
0),(if),,(
),(
ji
jiji
jiSCSCBM
eeSCS
eeCBMeeCBM
eeα
Evaluation Setup
• Dataset: USAToday news
– 40, 000 document and 80, 000 entity pairs
• Gold standard generated using human evaluators
– 600 document and 1000 entity assessed pairs
• Quantify connectivity with 5-point Likert scale:
– correctness: strongly disagree to strongly agree
– expectedness: extremely unexpected to extremely expected
• Compare CBM, SCS, ESA entity-linking approaches
• Standard performance metrics: precision/recall/F1 measure
29/05/13 ESWC 2013 – Montpellier, France
Entity-Linking Results
• 5-point Likert scale, entity connectivity based on gold standard:
29/05/13 ESWC 2013 – Montpellier, France
Strongly Agree Agree Undecided Disagree Strongly Disagree
63 178 127 227 217
Precision/Recall/F1 compared against the gold standard for the competing approaches.
Entity-linking Results (1)
• Analysis of uncovered entity connections from competing approaches
• Expectedness of uncovered entity connections:
– SCS – 25% unexpected novel entity links
– CBM – 16% unexpected novel entity links
29/05/13 ESWC 2013 – Montpellier, France
CBM
(not in SCS)
CBM
(not in ESA)
SCS
(not in CBM)
SCS
(not in ESA)
ESA
(not in CBM)
ESA
(not in SCS)
Strongly Agree 9.5% 76% 3.1% 71% 7.9% 9.5%
Agree 12.3% 63.4% 11.2% 60.1% 8.9% 6.7%
Undecided 9.4% 60.6% 6.3% 59.8% 5.5% 7.9%
Disagree 15.0% 63.0% 7.1% 53.3% 7.1% 5.3%
Strongly Disagree 18.4% 63.1% 51.6% 4.6% 4.6% 6.9%
Entity-Linking Result Analysis
• Connectivity agreement: SCS vs. CBM
• Measured agreement based on Kendall’s correlation coefficient:
29/05/13 ESWC 2013 – Montpellier, France
Entity connectivity agreement, from connections induced by CBM and supported by SCS
τ k@2 k@5 k@10
USAToday 0.40 0.47 0.52
Entity-Linking Results Analysis (1)
• Complementary entity connections between SCS and CBM
• Many entity connections labelled as “undecided”, correct
• Examples: “Baracak Obama” and “Olympia Snowe”
– Human evaluators, marked as not connected
– SCS uncovered a connection of length 2 and more
29/05/13 ESWC 2013 – Montpellier, France
CBM SCS ESA CBM+SCS
Precision 0.32 0.34 0.16 0.34
Recall (GS) 0.81 0.78 0.23 0.90
Recall 0.52 0.51 0.15 0.58
F1 (GS) 0.46 0.47 0.19 0.50
F1 0.40 0.41 0.15 0.43
Conclusions
• An entity-linking approach across disparate datasets
• Knowledge graphs, adapted and utilized to uncover entity connections via
SCS and CBM
• Balanced tradeoffs between information gain and processing time for SCS
• Entity-linking gold standard measuring correctness of connectivity and
expectedness
• Combination of SCS and CBM as scalable entity-linking approach
• Increased precision and recall based on SCS+CBM
• Correctly uncovered connections marked as irrelevant by human evaluators
29/05/13 ESWC 2013 – Montpellier, France
Future Work
• Exploit semantics of edges connecting entities
• Detailed distinction of edges based on entity types
• Gold standard improvement, by showing the trace of intermediary entities
helping uncover a connection between an entity pair
• Filtering of nodes from a knowledge graph to improve scalability
29/05/13 ESWC 2013 – Montpellier, France
Thank you!
Questions?
29/05/13 ESWC 2013 – Montpellier, France

Más contenido relacionado

La actualidad más candente

Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Rinke Hoekstra
 
Introduction Presentation for LinkedUp kickoff meeting
Introduction Presentation for LinkedUp kickoff meetingIntroduction Presentation for LinkedUp kickoff meeting
Introduction Presentation for LinkedUp kickoff meeting
Hendrik Drachsler
 

La actualidad más candente (20)

Semantic Web, Linked Data and Education: A Perfect Fit?
Semantic Web, Linked Data and Education: A Perfect Fit?Semantic Web, Linked Data and Education: A Perfect Fit?
Semantic Web, Linked Data and Education: A Perfect Fit?
 
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
Provenance and Reuse of Open Data (PILOD 2.0 June 2014)
 
Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...
 
Doing Clever Things with the Semantic Web
Doing Clever Things with the Semantic WebDoing Clever Things with the Semantic Web
Doing Clever Things with the Semantic Web
 
Prov-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance VisualizationProv-O-Viz: Interactive Provenance Visualization
Prov-O-Viz: Interactive Provenance Visualization
 
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisExtracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
 
Linking Universities - A broader look at the application of linked data and s...
Linking Universities - A broader look at the application of linked data and s...Linking Universities - A broader look at the application of linked data and s...
Linking Universities - A broader look at the application of linked data and s...
 
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the WebBeyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
 
An Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataAn Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities Data
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
 
Introduction Presentation for LinkedUp kickoff meeting
Introduction Presentation for LinkedUp kickoff meetingIntroduction Presentation for LinkedUp kickoff meeting
Introduction Presentation for LinkedUp kickoff meeting
 
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
 
WDAqua ITN – Answering Questions using Web Data
WDAqua ITN – Answering Questions using Web DataWDAqua ITN – Answering Questions using Web Data
WDAqua ITN – Answering Questions using Web Data
 
Analysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebAnalysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the Web
 
Camp 4-data workshop presentation
Camp 4-data workshop presentationCamp 4-data workshop presentation
Camp 4-data workshop presentation
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Linking Big Data to Rich Process Descriptions
Linking Big Data to Rich Process DescriptionsLinking Big Data to Rich Process Descriptions
Linking Big Data to Rich Process Descriptions
 
Semantics-aware Graph-based Recommender Systems exploiting Linked Open Data
Semantics-aware Graph-based Recommender Systems exploiting Linked Open DataSemantics-aware Graph-based Recommender Systems exploiting Linked Open Data
Semantics-aware Graph-based Recommender Systems exploiting Linked Open Data
 
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challengeScott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of Scientists
 

Destacado

euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
Besnik Fetahu
 
Improving Entity Retrieval on Structured Data
Improving Entity Retrieval on Structured DataImproving Entity Retrieval on Structured Data
Improving Entity Retrieval on Structured Data
Besnik Fetahu
 

Destacado (8)

euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
 
Towards Integration of Web Data into a coherent Educational Data Graph
Towards Integration of Web Data into a coherent Educational Data GraphTowards Integration of Web Data into a coherent Educational Data Graph
Towards Integration of Web Data into a coherent Educational Data Graph
 
Improving Entity Retrieval on Structured Data
Improving Entity Retrieval on Structured DataImproving Entity Retrieval on Structured Data
Improving Entity Retrieval on Structured Data
 
Complex Matching of RDF Datatype Properties
Complex Matching of RDF Datatype PropertiesComplex Matching of RDF Datatype Properties
Complex Matching of RDF Datatype Properties
 
Summaries on the fly: Query-based Extraction of Structured Knowledge from Web...
Summaries on the fly: Query-based Extraction of Structured Knowledge from Web...Summaries on the fly: Query-based Extraction of Structured Knowledge from Web...
Summaries on the fly: Query-based Extraction of Structured Knowledge from Web...
 
How much is Wikipedia lagging behind News?
How much is Wikipedia lagging behind News?How much is Wikipedia lagging behind News?
How much is Wikipedia lagging behind News?
 
Automated News Suggestions for Populating Wikipedia Entity Pages
Automated News Suggestions for Populating Wikipedia Entity PagesAutomated News Suggestions for Populating Wikipedia Entity Pages
Automated News Suggestions for Populating Wikipedia Entity Pages
 
Finding News Citations For Wikipedia
Finding News Citations For WikipediaFinding News Citations For Wikipedia
Finding News Citations For Wikipedia
 

Similar a Combining a co-occurrence-based and a semantic measure for entity linking

Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...
Institute of Contemporary Sciences
 
GRIFFOR_OxfordU CPS 20Mar2017.pptx
GRIFFOR_OxfordU CPS 20Mar2017.pptxGRIFFOR_OxfordU CPS 20Mar2017.pptx
GRIFFOR_OxfordU CPS 20Mar2017.pptx
DAYARNABBAIDYA3
 
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
Luigi Vanfretti
 

Similar a Combining a co-occurrence-based and a semantic measure for entity linking (20)

Early Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data CubesEarly Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data Cubes
 
IEEE Big data 2016 Title and Abstract
IEEE Big data  2016 Title and AbstractIEEE Big data  2016 Title and Abstract
IEEE Big data 2016 Title and Abstract
 
Smart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisSmart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend Analysis
 
IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd::Data mining
IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd::Data miningIEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd::Data mining
IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd::Data mining
 
Performance Analysis and Parallelization of CosineSimilarity of Documents
Performance Analysis and Parallelization of CosineSimilarity of DocumentsPerformance Analysis and Parallelization of CosineSimilarity of Documents
Performance Analysis and Parallelization of CosineSimilarity of Documents
 
Intelligent Career Guidance System.pptx
Intelligent Career Guidance System.pptxIntelligent Career Guidance System.pptx
Intelligent Career Guidance System.pptx
 
Outlier analysis for Temporal Datasets
Outlier analysis for Temporal DatasetsOutlier analysis for Temporal Datasets
Outlier analysis for Temporal Datasets
 
IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstract
 
IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd::Knowledge...
IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd::Knowledge...IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd::Knowledge...
IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd::Knowledge...
 
Towards Generating Policy-compliant Datasets (poster)
Towards GeneratingPolicy-compliant Datasets (poster)Towards GeneratingPolicy-compliant Datasets (poster)
Towards Generating Policy-compliant Datasets (poster)
 
Data fusion for city live event detection
Data fusion for city live event detectionData fusion for city live event detection
Data fusion for city live event detection
 
ICMCSI 2023 PPT 1074.pptx
ICMCSI 2023 PPT 1074.pptxICMCSI 2023 PPT 1074.pptx
ICMCSI 2023 PPT 1074.pptx
 
DATI, AI E ROBOTICA @POLITO
DATI, AI E ROBOTICA @POLITODATI, AI E ROBOTICA @POLITO
DATI, AI E ROBOTICA @POLITO
 
Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...Conceptual framework for entity integration from multiple data sources - Draz...
Conceptual framework for entity integration from multiple data sources - Draz...
 
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
 
Recuriter Recommendation System
Recuriter Recommendation SystemRecuriter Recommendation System
Recuriter Recommendation System
 
Building Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous DataBuilding Identity Graphs over Heterogeneous Data
Building Identity Graphs over Heterogeneous Data
 
GRIFFOR_OxfordU CPS 20Mar2017.pptx
GRIFFOR_OxfordU CPS 20Mar2017.pptxGRIFFOR_OxfordU CPS 20Mar2017.pptx
GRIFFOR_OxfordU CPS 20Mar2017.pptx
 
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...
 
Data mining projects topics for java and dot net
Data mining projects topics for java and dot netData mining projects topics for java and dot net
Data mining projects topics for java and dot net
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Combining a co-occurrence-based and a semantic measure for entity linking

  • 1. Motivation Data on the Web 29/05/13ESWC 2013, Montpellier, France Some eyecatching opener illustrating growth and or diversity of web data Combining a co-occurrence-based and a semantic measure for entity linking ESWC 2013: Extended Semantic Web Conference 28 May 2013, Montpellier, France Bernardo Pereira Nunes, Stefan Dietze, Marco Antonio Casanova, Ricardo Kawase, Besnik Fetahu, Wolfgang Nejdl (PUC-Rio, BR) (L3S Research Center, DE)
  • 2. Outline – Introduction – Motivation Example – A combined approach towards entity linking • Semantic Connectivity Score – Katz Index • Co-occurrence-based measures • Combined entity linking approach – Evaluation – Results – Conclusions 29/05/13 ESWC 2013 – Montpellier, France
  • 3. Introduction • Linked Data and Web resources • Sparsely interlinked resources • Knowledge bases, with structured knowledge about entities • NER & NED for extraction of entities • Few semantics relationships between entities (skos:related, so:related) • Entity linking, meaningful only at first (direct) degree of connectivity • Exhaustive process considering large amounts of resources 29/05/13 ESWC 2013 – Montpellier, France
  • 4. Motivation Example • Semantic relatedness of concepts (entities) • Exploit existing knowledge base structures • Resource semantic similarity (entities) • Latent relationships via semantic relations 29/05/13 ESWC 2013 – Montpellier, France • The Charlotte Bobcats could go from the NBA’s worst team to its best bargain. •The New York Knicks got the big-game performances they desperately needed from Carmelo Anthony and Amar’e Stoudemire to beat the Miami Heat.
  • 5. A combined entity-linking approach Novel approach on entity-linking for resources of same and disparate datasets. 1. Semantic Connectivity Score (SCS)– knowledge graph based on Social Network Theory – Katz Index. 2. Co-occurrence based Measure (CBM) – utilise entity co-occurrence in the Web. 29/05/13 ESWC 2013 – Montpellier, France
  • 6. Semantic Connectivity Score - SCS • Measure relatedness of entity pairs computing Katz’s Index • Use transversal properties to compute relatedness • Exclude hierarchical properties: – rdfs:subClassOf – dcterms:subject – skos:broader • Quantify semantic connectivity of entity pairs (e1, e2): 29/05/13 ESWC 2013 – Montpellier, France ∑= >< ⋅= τ β 1 ),(21 ||),( 21 l l ee l pathseeSCS transversal paths of length l between entity pairsdamping factor, exponentially penalize longer paths.
  • 7. Semantic Connectivity Score – SCS (1) 29/05/13 ESWC 2013 – Montpellier, France • Remove edge directions from graphs • Inverse properties considered equivalent: i.e. isFathorOf ↔ isSonOf • Empirically determine path length Adoptions to knowledge graphs towards applying Katz index measure Inverse property equivalence
  • 8. Semantic Connectivity Score – SCS (2) • Optimization factors for Katz: – Exponentially many paths, measuring entity pair relatedness – Small world assumptions – Tradeoff of path length and connectivity contribution (τ=4) 29/05/13 ESWC 2013 – Montpellier, France #Paths with increasing length Computation time for increasing path length
  • 9. Co-occurrence-based Measure (CBM) • Approximate number of Web resources mentioning entity pairs • Similar to Pointwise Mutual Information and Normalised Google Distance • Query search engines: e.g. “Carmelo Anthony” + “Charlotte Bobcats” • Extract occurrences of each entity, and as well the entity pairs 29/05/13 ESWC 2013 – Montpellier, France        ⋅ === =∨= = otherwise, ))(log( )),(log( ))(log( )),(log( 1),()()(if,1 0)(0)(if,0 ),( 2 21 1 21 2121 21 21 ecount eecount ecount eecount eecountecountecount ecountecount eeCBM
  • 10. A combined entity-linking approach • SCS as an exhaustive entity-linking procedure • CBM –search engines to measure relatedness based on entity co-occurrence • Complementary entity-linking results • A combined measure, scalable and with broader coverage: 29/05/13 ESWC 2013 – Montpellier, France    > =+ otherwise),,( 0),(if),,( ),( ji jiji jiSCSCBM eeSCS eeCBMeeCBM eeα
  • 11. Evaluation Setup • Dataset: USAToday news – 40, 000 document and 80, 000 entity pairs • Gold standard generated using human evaluators – 600 document and 1000 entity assessed pairs • Quantify connectivity with 5-point Likert scale: – correctness: strongly disagree to strongly agree – expectedness: extremely unexpected to extremely expected • Compare CBM, SCS, ESA entity-linking approaches • Standard performance metrics: precision/recall/F1 measure 29/05/13 ESWC 2013 – Montpellier, France
  • 12. Entity-Linking Results • 5-point Likert scale, entity connectivity based on gold standard: 29/05/13 ESWC 2013 – Montpellier, France Strongly Agree Agree Undecided Disagree Strongly Disagree 63 178 127 227 217 Precision/Recall/F1 compared against the gold standard for the competing approaches.
  • 13. Entity-linking Results (1) • Analysis of uncovered entity connections from competing approaches • Expectedness of uncovered entity connections: – SCS – 25% unexpected novel entity links – CBM – 16% unexpected novel entity links 29/05/13 ESWC 2013 – Montpellier, France CBM (not in SCS) CBM (not in ESA) SCS (not in CBM) SCS (not in ESA) ESA (not in CBM) ESA (not in SCS) Strongly Agree 9.5% 76% 3.1% 71% 7.9% 9.5% Agree 12.3% 63.4% 11.2% 60.1% 8.9% 6.7% Undecided 9.4% 60.6% 6.3% 59.8% 5.5% 7.9% Disagree 15.0% 63.0% 7.1% 53.3% 7.1% 5.3% Strongly Disagree 18.4% 63.1% 51.6% 4.6% 4.6% 6.9%
  • 14. Entity-Linking Result Analysis • Connectivity agreement: SCS vs. CBM • Measured agreement based on Kendall’s correlation coefficient: 29/05/13 ESWC 2013 – Montpellier, France Entity connectivity agreement, from connections induced by CBM and supported by SCS τ k@2 k@5 k@10 USAToday 0.40 0.47 0.52
  • 15. Entity-Linking Results Analysis (1) • Complementary entity connections between SCS and CBM • Many entity connections labelled as “undecided”, correct • Examples: “Baracak Obama” and “Olympia Snowe” – Human evaluators, marked as not connected – SCS uncovered a connection of length 2 and more 29/05/13 ESWC 2013 – Montpellier, France CBM SCS ESA CBM+SCS Precision 0.32 0.34 0.16 0.34 Recall (GS) 0.81 0.78 0.23 0.90 Recall 0.52 0.51 0.15 0.58 F1 (GS) 0.46 0.47 0.19 0.50 F1 0.40 0.41 0.15 0.43
  • 16. Conclusions • An entity-linking approach across disparate datasets • Knowledge graphs, adapted and utilized to uncover entity connections via SCS and CBM • Balanced tradeoffs between information gain and processing time for SCS • Entity-linking gold standard measuring correctness of connectivity and expectedness • Combination of SCS and CBM as scalable entity-linking approach • Increased precision and recall based on SCS+CBM • Correctly uncovered connections marked as irrelevant by human evaluators 29/05/13 ESWC 2013 – Montpellier, France
  • 17. Future Work • Exploit semantics of edges connecting entities • Detailed distinction of edges based on entity types • Gold standard improvement, by showing the trace of intermediary entities helping uncover a connection between an entity pair • Filtering of nodes from a knowledge graph to improve scalability 29/05/13 ESWC 2013 – Montpellier, France
  • 18. Thank you! Questions? 29/05/13 ESWC 2013 – Montpellier, France