Motivation
Data on the Web
29/05/13 ESWC 2013, Montpellier, France
Combining a co-occurrence-based and a
semantic measure for entity linking
ESWC 2013: Extended Semantic Web Conference
28 May 2013, Montpellier, France
Bernardo Pereira Nunes, Stefan Dietze, Marco Antonio Casanova,
Ricardo Kawase, Besnik Fetahu, Wolfgang Nejdl
(PUC-Rio, BR) (L3S Research Center, DE)
Outline
– Introduction
– Motivation Example
– A combined approach towards entity linking
• Semantic Connectivity Score – Katz Index
• Co-occurrence-based measures
• Combined entity linking approach
– Evaluation
– Results
– Conclusions
Introduction
• Linked Data and Web resources
• Sparsely interlinked resources
• Knowledge bases, with structured knowledge about entities
• NER & NED for extraction of entities
• Few semantic relationships between entities (skos:related, so:related)
• Entity linking is meaningful only at the first (direct) degree of connectivity
• Exhaustive process considering large amounts of resources
Motivation Example
• Semantic relatedness of concepts (entities)
• Exploit existing knowledge base structures
• Resource semantic similarity (entities)
• Latent relationships via semantic relations
• The Charlotte Bobcats could go from the NBA’s
worst team to its best bargain.
• The New York Knicks got the big-game performances
they desperately needed from Carmelo Anthony and
Amar’e Stoudemire to beat the Miami Heat.
A combined entity-linking approach
A novel approach to entity linking for resources from the same or from disparate datasets.
1. Semantic Connectivity Score (SCS) – knowledge-graph-based measure drawing on Social
Network Theory (Katz index).
2. Co-occurrence-based Measure (CBM) – utilises entity co-occurrence on the
Web.
Semantic Connectivity Score - SCS
• Measure relatedness of entity pairs by computing the Katz index
• Use transversal properties to compute relatedness
• Exclude hierarchical properties:
– rdfs:subClassOf
– dcterms:subject
– skos:broader
• Quantify semantic connectivity of entity pairs (e1, e2):
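$$SCS(e_1, e_2) = \sum_{l=1}^{\tau} \beta^{l} \cdot \left| paths^{\langle l \rangle}_{(e_1, e_2)} \right|$$
– $\left| paths^{\langle l \rangle}_{(e_1, e_2)} \right|$: number of transversal paths of length $l$ between the entity pair
– $\beta$: damping factor, exponentially penalizes longer paths
– $\tau$: maximum path length considered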
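The score above can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated on the slides: "paths" are taken to be simple paths (no repeated nodes), the graph is already undirected, and `beta=0.5` is only an illustrative default. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def build_undirected_graph(edges):
    """Adjacency map for an undirected graph given (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def count_paths_by_length(graph, source, target, max_len):
    """Return counts[l] = number of simple paths of length l (1..max_len)."""
    counts = [0] * (max_len + 1)

    def dfs(node, visited, depth):
        if depth == max_len:
            return
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                counts[depth + 1] += 1
            elif neighbour not in visited:
                visited.add(neighbour)
                dfs(neighbour, visited, depth + 1)
                visited.remove(neighbour)

    dfs(source, {source}, 0)
    return counts


def scs(graph, e1, e2, beta=0.5, tau=4):
    """Sum of beta^l * (number of paths of length l) for l = 1..tau."""
    counts = count_paths_by_length(graph, e1, e2, tau)
    return sum(beta ** l * counts[l] for l in range(1, tau + 1))


# Toy graph with made-up node names, for illustration only.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
toy_graph = build_undirected_graph(toy_edges)
print(scs(toy_graph, "A", "C"))  # 0.75
```

In the toy graph, A and C are joined by one path of length 1 and one of length 2, so the score is 0.5 + 0.25 = 0.75. Exhaustive path enumeration grows quickly with the maximum length, which is why a small bound is needed in practice.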
Semantic Connectivity Score – SCS (1)
• Remove edge directions from graphs
• Inverse properties considered equivalent:
e.g. isFatherOf ↔ isSonOf
• Empirically determine path length
Adaptations of knowledge graphs for applying the Katz index measure
Inverse property equivalence
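A rough sketch of these graph adaptations, assuming the knowledge graph is available as (subject, property, object) triples with prefixed names. The function name and the example triples are hypothetical; the excluded properties are the hierarchical ones listed earlier in the deck.

```python
# Hierarchical properties excluded from the relatedness computation.
HIERARCHICAL = {"rdfs:subClassOf", "dcterms:subject", "skos:broader"}


def to_undirected_edges(triples, excluded=HIERARCHICAL):
    """Drop excluded properties and edge directions.

    Each remaining triple becomes an unordered node pair, so a property
    and its inverse between the same two nodes collapse into one edge.
    """
    edges = set()
    for subject, prop, obj in triples:
        if prop in excluded or subject == obj:
            continue
        edges.add(frozenset((subject, obj)))
    return edges


# Made-up triples for illustration only.
triples = [
    ("ex:a", "ex:isFatherOf", "ex:b"),
    ("ex:b", "ex:isSonOf", "ex:a"),      # inverse of the triple above
    ("ex:a", "rdfs:subClassOf", "ex:c"),  # hierarchical, dropped
]
print(to_undirected_edges(triples) == {frozenset(("ex:a", "ex:b"))})  # True
```

Representing edges as unordered pairs is one simple way to treat inverse properties as equivalent without needing an explicit list of inverse-property mappings.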
Semantic Connectivity Score – SCS (2)
• Optimization factors for Katz:
– Exponentially many paths when measuring entity pair relatedness
– Small world assumptions
– Tradeoff of path length and connectivity contribution (τ=4)
[Figures: number of paths with increasing path length; computation time for increasing path length]
Co-occurrence-based Measure (CBM)
• Approximate number of Web resources mentioning entity pairs
• Similar to Pointwise Mutual Information and Normalised Google Distance
• Query search engines: e.g. “Carmelo Anthony” + “Charlotte Bobcats”
• Extract the number of occurrences of each entity, as well as of the entity pair
$$CBM(e_1, e_2) = \begin{cases} 0, & \text{if } count(e_1) = 0 \lor count(e_2) = 0 \\ 1, & \text{if } count(e_1) = count(e_2) = count(e_1, e_2) = 1 \\ \dfrac{\log(count(e_1, e_2))}{\log(count(e_1))} \cdot \dfrac{\log(count(e_1, e_2))}{\log(count(e_2))}, & \text{otherwise} \end{cases}$$
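The piecewise definition above translates fairly directly into code. The sketch below assumes the three hit counts have already been obtained from a search engine. It also adds one guard that is not on the slide: inputs for which a logarithm would be undefined or zero in a denominator return 0.0, which is purely an assumption made to keep the function total.

```python
import math


def cbm(count_e1, count_e2, count_pair):
    """Co-occurrence-based measure from three hit counts."""
    if count_e1 == 0 or count_e2 == 0:
        return 0.0
    if count_e1 == count_e2 == count_pair == 1:
        return 1.0
    # Assumption (not on the slide): log(0) is undefined and log(1) = 0
    # would give a zero denominator, so these cases are scored as 0.
    if count_pair == 0 or count_e1 == 1 or count_e2 == 1:
        return 0.0
    log_pair = math.log(count_pair)
    return (log_pair / math.log(count_e1)) * (log_pair / math.log(count_e2))


# Made-up counts: both entities appear on 100 pages, together on 10.
print(round(cbm(100, 100, 10), 4))  # 0.25
```

With these counts each factor is log(10)/log(100) = 0.5, so the product is 0.25.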
A combined entity-linking approach
• SCS as an exhaustive entity-linking procedure
• CBM – search engines to measure relatedness based on entity co-occurrence
• Complementary entity-linking results
• A combined measure, scalable and with broader coverage:
$$\alpha_{CBM+SCS}(e_i, e_j) = \begin{cases} CBM(e_i, e_j), & \text{if } CBM(e_i, e_j) > 0 \\ SCS(e_i, e_j), & \text{otherwise} \end{cases}$$
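A minimal sketch of the combined measure. It assumes the two measures are passed in as functions (the hypothetical parameters `cbm_fn` and `scs_fn`), so that the more expensive SCS computation only runs when CBM finds no co-occurrence. That lazy evaluation is an implementation choice made here, not something the slides prescribe.

```python
def combined_score(e_i, e_j, cbm_fn, scs_fn):
    """Return CBM(e_i, e_j) if it is positive, otherwise SCS(e_i, e_j)."""
    cbm_value = cbm_fn(e_i, e_j)
    if cbm_value > 0:
        return cbm_value
    # Fall back to the graph-based score only when CBM yields nothing.
    return scs_fn(e_i, e_j)


# Stub measures with made-up values, for illustration only.
print(combined_score("e1", "e2", lambda a, b: 0.7, lambda a, b: 0.3))  # 0.7
print(combined_score("e1", "e2", lambda a, b: 0.0, lambda a, b: 0.3))  # 0.3
```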
Evaluation Setup
• Dataset: USAToday news
– 40,000 documents and 80,000 entity pairs
• Gold standard generated using human evaluators
– 600 documents and 1,000 assessed entity pairs
• Quantify connectivity with 5-point Likert scale:
– correctness: strongly disagree to strongly agree
– expectedness: extremely unexpected to extremely expected
• Compare CBM, SCS, ESA entity-linking approaches
• Standard performance metrics: precision/recall/F1 measure
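For reference, the standard metrics can be sketched as follows, assuming the links found by an approach and the gold-standard links are both available as sets of unordered entity pairs. How the Likert ratings are turned into gold-standard links is not specified on the slides and is not modelled here; all names and example pairs are hypothetical.

```python
def pair(a, b):
    """Unordered entity pair, so (a, b) and (b, a) compare equal."""
    return frozenset((a, b))


def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 of predicted links against gold links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up links for illustration only.
predicted = {pair("a", "b"), pair("a", "c"), pair("b", "d"), pair("c", "d")}
gold = {pair("b", "a"), pair("c", "d"), pair("e", "f")}
p, r, f1 = precision_recall_f1(predicted, gold)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.5 0.67 0.57
```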
Entity-Linking Results
• 5-point Likert scale, entity connectivity based on gold standard:
Strongly Agree   Agree   Undecided   Disagree   Strongly Disagree
63               178     127         227        217
Precision/Recall/F1 compared against the gold standard for the competing approaches.
Entity-linking Results (1)
• Analysis of uncovered entity connections from competing approaches
• Expectedness of uncovered entity connections:
– SCS – 25% unexpected novel entity links
– CBM – 16% unexpected novel entity links
                   CBM (not in SCS)  CBM (not in ESA)  SCS (not in CBM)  SCS (not in ESA)  ESA (not in CBM)  ESA (not in SCS)
Strongly Agree     9.5%              76%               3.1%              71%               7.9%              9.5%
Agree              12.3%             63.4%             11.2%             60.1%             8.9%              6.7%
Undecided          9.4%              60.6%             6.3%              59.8%             5.5%              7.9%
Disagree           15.0%             63.0%             7.1%              53.3%             7.1%              5.3%
Strongly Disagree  18.4%             63.1%             51.6%             4.6%              4.6%              6.9%
Entity-Linking Result Analysis
• Connectivity agreement: SCS vs. CBM
• Measured agreement based on Kendall’s correlation coefficient:
Entity connectivity agreement, from connections induced by CBM and supported by SCS
τ           k@2     k@5     k@10
USAToday    0.40    0.47    0.52
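As a reminder of what the coefficient measures, here is a small sketch of Kendall's τ between two score lists over the same entity pairs. The slides do not say which variant was used or how the top-k cut-off was applied, so the tau-a variant (no tie correction) is assumed and the example scores are made up.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a: (concordant - discordant) / number of item pairs."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both score lists must cover the same items")
    n = len(scores_a)
    if n < 2:
        raise ValueError("at least two items are needed")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Two rankings of four items that differ by one swapped pair.
print(round(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]), 2))  # 0.67
```

Of the six item pairs, five are ordered the same way in both lists and one is reversed, giving (5 - 1) / 6.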
Entity-Linking Results Analysis (1)
• Complementary entity connections between SCS and CBM
• Many entity connections labelled as “undecided” are in fact correct
• Example: “Barack Obama” and “Olympia Snowe”
– Human evaluators marked the pair as not connected
– SCS uncovered connections of length 2 and more
              CBM    SCS    ESA    CBM+SCS
Precision     0.32   0.34   0.16   0.34
Recall (GS)   0.81   0.78   0.23   0.90
Recall        0.52   0.51   0.15   0.58
F1 (GS)       0.46   0.47   0.19   0.50
F1            0.40   0.41   0.15   0.43
Conclusions
• An entity-linking approach across disparate datasets
• Knowledge graphs, adapted and utilized to uncover entity connections via
SCS and CBM
• Balanced tradeoffs between information gain and processing time for SCS
• Entity-linking gold standard measuring correctness of connectivity and
expectedness
• Combination of SCS and CBM as scalable entity-linking approach
• Increased precision and recall based on SCS+CBM
• Correctly uncovered connections marked as irrelevant by human evaluators
Future Work
• Exploit semantics of edges connecting entities
• Detailed distinction of edges based on entity types
• Gold standard improvement, by showing the trace of intermediary entities
helping uncover a connection between an entity pair
• Filtering of nodes from a knowledge graph to improve scalability
Thank you!
Questions?
 

Último (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Combining a co-occurrence-based and a semantic measure for entity linking

  • 1. Combining a co-occurrence-based and a semantic measure for entity linking
      ESWC 2013: Extended Semantic Web Conference, 28 May 2013, Montpellier, France
      Bernardo Pereira Nunes, Stefan Dietze, Marco Antonio Casanova, Ricardo Kawase, Besnik Fetahu, Wolfgang Nejdl (PUC-Rio, BR) (L3S Research Center, DE)
      29/05/13 ESWC 2013, Montpellier, France
  • 2. Outline
      – Introduction
      – Motivation Example
      – A combined approach towards entity linking
          • Semantic Connectivity Score – Katz Index
          • Co-occurrence-based measures
          • Combined entity linking approach
      – Evaluation
      – Results
      – Conclusions
  • 3. Introduction
      • Linked Data and Web resources
      • Sparsely interlinked resources
      • Knowledge bases, with structured knowledge about entities
      • NER & NED for extraction of entities
      • Few semantic relationships between entities (skos:related, so:related)
      • Entity linking is meaningful only at the first (direct) degree of connectivity
      • Exhaustive process when considering large amounts of resources
  • 4. Motivation Example
      • Semantic relatedness of concepts (entities)
      • Exploit existing knowledge base structures
      • Resource semantic similarity (entities)
      • Latent relationships via semantic relations
      Example sentences:
      • The Charlotte Bobcats could go from the NBA’s worst team to its best bargain.
      • The New York Knicks got the big-game performances they desperately needed from Carmelo Anthony and Amar’e Stoudemire to beat the Miami Heat.
  • 5. A combined entity-linking approach
      A novel approach to entity linking for resources from the same or from disparate datasets.
      1. Semantic Connectivity Score (SCS) – knowledge-graph measure based on Social Network Theory (Katz Index).
      2. Co-occurrence-based Measure (CBM) – utilises entity co-occurrence on the Web.
  • 6. Semantic Connectivity Score - SCS
      • Measure the relatedness of entity pairs by computing Katz’s Index
      • Use transversal properties to compute relatedness
      • Exclude hierarchical properties:
          – rdfs:subClassOf
          – dcterms:subject
          – skos:broader
      • Quantify the semantic connectivity of an entity pair (e1, e2):
        $$SCS(e_1, e_2) = \sum_{l=1}^{\tau} \beta^{l} \cdot \left| paths^{\langle l \rangle}_{(e_1, e_2)} \right|$$
        where $\left| paths^{\langle l \rangle}_{(e_1, e_2)} \right|$ is the number of transversal paths of length $l$ between the entity pair, $\tau$ is the maximum path length, and $\beta$ is a damping factor that exponentially penalizes longer paths.
  • 7. Semantic Connectivity Score – SCS (1)
      Adaptations of knowledge graphs for applying the Katz index measure:
      • Remove edge directions from graphs
      • Inverse properties are considered equivalent, e.g. isFatherOf ↔ isSonOf
      • Empirically determine the path length
      [Figure: inverse property equivalence]
  • 8. Semantic Connectivity Score – SCS (2)
      • Optimization factors for Katz:
          – Exponentially many paths, measuring entity pair relatedness
          – Small world assumptions
          – Tradeoff of path length and connectivity contribution (τ=4)
      [Figures: number of paths with increasing path length; computation time for increasing path length]
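The SCS computation described on slides 6 to 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy triples, the property names `team`, `league`, and `hasPlayer`, the helper names (`build_graph`, `count_paths`, `scs`), the choice to count simple paths by depth-first search, and the damping value `beta=0.5` are all hypothetical. Only τ = 4, the excluded hierarchical properties, and the undirected treatment of edges come from the slides.

```python
from collections import defaultdict

# Hierarchical properties that the slides say to exclude.
EXCLUDED = {"rdfs:subClassOf", "dcterms:subject", "skos:broader"}

# Hypothetical toy triples (subject, property, object), for illustration only.
TRIPLES = [
    ("Carmelo_Anthony", "team", "New_York_Knicks"),
    ("New_York_Knicks", "hasPlayer", "Carmelo_Anthony"),  # inverse of "team"
    ("New_York_Knicks", "league", "NBA"),
    ("Charlotte_Bobcats", "league", "NBA"),
    ("Carmelo_Anthony", "dcterms:subject", "Category:Basketball"),
    ("Charlotte_Bobcats", "dcterms:subject", "Category:Basketball"),
]

def build_graph(triples):
    """Build undirected adjacency sets, skipping excluded properties."""
    graph = defaultdict(set)
    for s, p, o in triples:
        if p in EXCLUDED:
            continue
        # Dropping the direction also merges inverse properties into one edge.
        graph[s].add(o)
        graph[o].add(s)
    return graph

def count_paths(graph, source, target, max_len):
    """Return {length: number of simple paths source..target} up to max_len."""
    counts = defaultdict(int)

    def dfs(node, visited, length):
        if length > 0 and node == target:
            counts[length] += 1
            return
        if length == max_len:
            return
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, length + 1)

    dfs(source, {source}, 0)
    return counts

def scs(graph, e1, e2, tau=4, beta=0.5):
    """Sum of beta**l * (number of paths of length l) for l = 1..tau."""
    counts = count_paths(graph, e1, e2, tau)
    return sum(beta ** l * counts[l] for l in range(1, tau + 1))

graph = build_graph(TRIPLES)
print(scs(graph, "Carmelo_Anthony", "Charlotte_Bobcats"))  # 0.125
```

In this toy graph the only remaining connection is the length-3 path through the Knicks and the NBA, so the score is 0.5³. The length-2 path through the shared category disappears because `dcterms:subject` is excluded.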
  • 9. Co-occurrence-based Measure (CBM)
      • Approximate the number of Web resources mentioning entity pairs
      • Similar to Pointwise Mutual Information and Normalised Google Distance
      • Query search engines, e.g. “Carmelo Anthony” + “Charlotte Bobcats”
      • Extract the occurrence counts of each entity as well as of the entity pair
        $$CBM(e_1, e_2) = \begin{cases} 0, & \text{if } count(e_1) = 0 \lor count(e_2) = 0 \\ 1, & \text{if } count(e_1) = count(e_2) = count(e_1, e_2) = 1 \\ \dfrac{\log(count(e_1, e_2))}{\log(count(e_1))} \cdot \dfrac{\log(count(e_1, e_2))}{\log(count(e_2))}, & \text{otherwise} \end{cases}$$
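The piecewise CBM definition on slide 9 can be written as a small Python function. This is a sketch under assumptions: the function name `cbm` and its parameters are hypothetical, the counts are taken as plain integers already obtained from a search engine, and the guards for a pair count of 0 or 1 and for inconsistent counts are additions that the slide does not specify.

```python
import math

def cbm(count_e1, count_e2, count_pair):
    """Co-occurrence-based measure from hit counts of e1, e2, and the pair."""
    # Case 1 of the slide: one of the entities is never mentioned.
    if count_e1 == 0 or count_e2 == 0:
        return 0.0
    # Case 2 of the slide: all three counts are exactly 1.
    if count_e1 == count_e2 == count_pair == 1:
        return 1.0
    # Assumption (not on the slide): a pair cannot occur more often than
    # either entity alone, so such input is rejected.
    if count_pair > min(count_e1, count_e2):
        raise ValueError("pair count exceeds an individual entity count")
    # Assumption (not on the slide): log(0) is undefined and log(1) is 0,
    # so a pair count of 0 or 1 is treated as no measurable co-occurrence.
    if count_pair <= 1:
        return 0.0
    # Case 3 of the slide: product of the two log ratios.
    log_pair = math.log(count_pair)
    return (log_pair / math.log(count_e1)) * (log_pair / math.log(count_e2))

# Illustrative counts only, not real search engine results.
print(cbm(1000, 100, 10))
```

With the illustrative counts 1000, 100, and 10, the two ratios are 1/3 and 1/2, so the score is about 1/6.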
  • 10. A combined entity-linking approach
      • SCS as an exhaustive entity-linking procedure
      • CBM uses search engines to measure relatedness based on entity co-occurrence
      • Complementary entity-linking results
      • A combined measure, scalable and with broader coverage:
        $$\alpha_{CBM+SCS}(e_i, e_j) = \begin{cases} CBM(e_i, e_j), & \text{if } CBM(e_i, e_j) > 0 \\ SCS(e_i, e_j), & \text{otherwise} \end{cases}$$
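The fallback rule on slide 10 can be sketched as follows. The function `combined_score`, the toy lookup tables, and the entity labels are hypothetical and only illustrate the control flow; real CBM and SCS implementations would be passed in as the two callables.

```python
def combined_score(e_i, e_j, cbm, scs):
    """Use CBM when it is positive, otherwise fall back to SCS."""
    cbm_value = cbm(e_i, e_j)
    if cbm_value > 0:
        return cbm_value
    # SCS is only evaluated when CBM finds no co-occurrence, which avoids
    # the exhaustive graph search for pairs that CBM already covers.
    return scs(e_i, e_j)

# Hypothetical precomputed scores, for illustration only.
CBM_TOY = {("A", "B"): 0.4}
SCS_TOY = {("A", "B"): 0.9, ("A", "C"): 0.125}

def toy_cbm(a, b):
    return CBM_TOY.get((a, b), 0.0)

def toy_scs(a, b):
    return SCS_TOY.get((a, b), 0.0)

print(combined_score("A", "B", toy_cbm, toy_scs))  # 0.4
print(combined_score("A", "C", toy_cbm, toy_scs))  # 0.125
```

Passing the measures as callables keeps the SCS call lazy, which is one plausible reading of why the slides describe the combination as scalable.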
  • 11. Evaluation Setup
      • Dataset: USAToday news
          – 40,000 documents and 80,000 entity pairs
      • Gold standard generated using human evaluators
          – 600 documents and 1,000 assessed entity pairs
      • Quantify connectivity with a 5-point Likert scale:
          – correctness: strongly disagree to strongly agree
          – expectedness: extremely unexpected to extremely expected
      • Compare the CBM, SCS, and ESA entity-linking approaches
      • Standard performance metrics: precision/recall/F1 measure
  • 12. Entity-Linking Results
      • 5-point Likert scale, entity connectivity based on the gold standard:

        Strongly Agree   Agree   Undecided   Disagree   Strongly Disagree
        63               178     127         227        217

      [Figure: Precision/Recall/F1 compared against the gold standard for the competing approaches]
  • 13. Entity-linking Results (1)
      • Analysis of uncovered entity connections from competing approaches
      • Expectedness of uncovered entity connections:
          – SCS – 25% unexpected novel entity links
          – CBM – 16% unexpected novel entity links

                            CBM (not in SCS)  CBM (not in ESA)  SCS (not in CBM)  SCS (not in ESA)  ESA (not in CBM)  ESA (not in SCS)
        Strongly Agree      9.5%              76%               3.1%              71%               7.9%              9.5%
        Agree               12.3%             63.4%             11.2%             60.1%             8.9%              6.7%
        Undecided           9.4%              60.6%             6.3%              59.8%             5.5%              7.9%
        Disagree            15.0%             63.0%             7.1%              53.3%             7.1%              5.3%
        Strongly Disagree   18.4%             63.1%             51.6%             4.6%              4.6%              6.9%
  • 14. Entity-Linking Result Analysis
      • Connectivity agreement: SCS vs. CBM
      • Measured agreement based on Kendall’s correlation coefficient (τ):

        τ          k@2    k@5    k@10
        USAToday   0.40   0.47   0.52

      Entity connectivity agreement, from connections induced by CBM and supported by SCS
  • 15. Entity-Linking Results Analysis (1)
      • Complementary entity connections between SCS and CBM
      • Many entity connections labelled as “undecided” are correct
      • Example: “Barack Obama” and “Olympia Snowe”
          – Human evaluators marked the pair as not connected
          – SCS uncovered connections of length 2 and more

                      CBM    SCS    ESA    CBM+SCS
        Precision     0.32   0.34   0.16   0.34
        Recall (GS)   0.81   0.78   0.23   0.90
        Recall        0.52   0.51   0.15   0.58
        F1 (GS)       0.46   0.47   0.19   0.50
        F1            0.40   0.41   0.15   0.43
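The F1 rows in the table on slide 15 follow from the precision and recall rows through the standard harmonic mean, F1 = 2PR / (P + R). The short check below recomputes a few of them from the rounded values shown on the slide; the function name `f1_score` is hypothetical. Because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit (for CBM+SCS against the gold standard, the recomputation gives about 0.49 where the slide reports 0.50).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall against the gold standard) as printed on the slide.
reported = {
    "CBM": (0.32, 0.81),
    "SCS": (0.34, 0.78),
    "ESA": (0.16, 0.23),
}

for name, (p, r) in reported.items():
    print(name, round(f1_score(p, r), 2))
```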
  • 16. Conclusions
      • An entity-linking approach across disparate datasets
      • Knowledge graphs, adapted and utilized to uncover entity connections via SCS and CBM
      • Balanced tradeoffs between information gain and processing time for SCS
      • Entity-linking gold standard measuring correctness of connectivity and expectedness
      • Combination of SCS and CBM as scalable entity-linking approach
      • Increased precision and recall based on SCS+CBM
      • Correctly uncovered connections marked as irrelevant by human evaluators
  • 17. Future Work
      • Exploit semantics of edges connecting entities
      • Detailed distinction of edges based on entity types
      • Gold standard improvement, by showing the trace of intermediary entities helping uncover a connection between an entity pair
      • Filtering of nodes from a knowledge graph to improve scalability
  • 18. Thank you! Questions?