Property Matching and Query Expansion on
Linked Data Using Kullback-Leibler Divergence
Sean Golliher, Nathan Fortier, Logan Perreault

December 12, 2013

1 / 25
Property Matching Problem

Databases with different properties:

2 / 25
def: Query Expansion

Query expansion (QE) is the process of reformulating a seed
query to improve retrieval performance in information retrieval
operations.

3 / 25
Societal Cloud

4 / 25
Cloud Diagram (TRIZ Problem Solving)

5 / 25
Cloud Diagram Broken

6 / 25
Property Matching Problem

How do we find all actors in both databases?
We do not want to inspect every database manually
Can we use the SPARQL query language to infer matches across all datasets?
SELECT ?p
WHERE { ?s ?p ?o }
Such a query can only match predicates by the total sizes of the returned triple sets
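The limitation in the last bullet can be illustrated with a minimal Python sketch. The triples, the predicate names and the `match_by_size` helper are all hypothetical; the point is only that comparing per-predicate triple counts can pair `starring` with the wrong DB2 predicate.

```python
from collections import Counter

# Hypothetical toy datasets of (subject, predicate, object) triples.
# Both describe actors, but under different predicate names.
DB1 = [
    ("film1", "starring", "Tom Hanks"),
    ("film1", "starring", "Meg Ryan"),
    ("film2", "directedBy", "Nora Ephron"),
]
DB2 = [
    ("movieA", "actor", "Tom Hanks"),
    ("movieA", "director", "Nora Ephron"),
    ("movieB", "director", "Rob Reiner"),
]


def predicate_sizes(triples):
    """Count triples per predicate, roughly what grouping the results of
    SELECT ?p WHERE { ?s ?p ?o } by ?p would give."""
    return Counter(p for _, p, _ in triples)


def match_by_size(p1, db1, db2):
    """Naive matcher: DB2 predicates whose triple count equals that of p1."""
    target = predicate_sizes(db1)[p1]
    return sorted(p for p, n in predicate_sizes(db2).items() if n == target)


print(match_by_size("starring", DB1, DB2))  # ['director'], not 'actor'
```

Sizes say nothing about what the objects are, which is why the rest of the deck compares the *labels* of linked objects instead.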

7 / 25
Original Bayesian Approach

Problems with Bayesian Approach
Had to create and track a large vocabulary for training
Smoothing issues with very sparse text
Underflow issues: confidence values become extremely small
Likelihood complexity kept growing with the n features in feature set X, the c classes, and the tunable parameters
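The underflow problem is easy to reproduce. This is a minimal Python sketch assuming a Naive Bayes style likelihood that multiplies one probability per feature; the 400 features and the value 0.001 are arbitrary choices for illustration. The direct product underflows to zero in double precision, while the common remedy of summing log-probabilities stays finite.

```python
import math

# Hypothetical per-feature likelihoods: 400 features, each P(x_i | c) = 0.001.
likelihoods = [1e-3] * 400

product = 1.0
for p in likelihoods:
    product *= p  # direct product of many small probabilities

# Log-space alternative: a sum of logs instead of a product.
log_score = sum(math.log(p) for p in likelihoods)

print(product)    # 0.0: the product has underflowed
print(log_score)  # a large negative but finite number
```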

8 / 25
KL-Divergence

Original paper from 1951 entitled “On Information and Sufficiency”
Also referred to as “relative entropy”
A system gains entropy when it moves to a state with more possible arrangements, for example from a liquid to a gas.
Used for text categorization in a 2003 paper: “Using KL-Distance for Text Categorization”
Elegant and efficient method for plagiarism detection

9 / 25
KL-Divergence

Measure of divergence of information between two distributions:
$$D(P \,\|\, Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}$$

Not symmetric: in general $D(P \,\|\, Q) \neq D(Q \,\|\, P)$
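A minimal Python sketch of the definition above, assuming discrete distributions stored as dicts and the natural logarithm (the slide does not fix a base). The helper name `kl_divergence` is illustrative; the two calls at the end show the asymmetry.

```python
import math


def kl_divergence(p, q):
    """D(P || Q) = sum_x P(x) log(P(x) / Q(x)) for discrete distributions
    given as dicts mapping outcome -> probability (natural log)."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # by convention 0 * log(0 / q) contributes 0
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # P puts mass where Q has none
        total += px * math.log(px / qx)
    return total


P = {"a": 0.5, "b": 0.5}
Q = {"a": 0.9, "b": 0.1}
print(kl_divergence(P, Q))  # about 0.511
print(kl_divergence(Q, P))  # about 0.368, so D(P||Q) != D(Q||P)
```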

10 / 25
KL-Divergence Example

11 / 25
KL-Divergence Example

Table: Generic Vocabularies Generated by Fixing on Predicates

d1

d2

d3

subject1
object1
object2
subject2
object3
object3

subject3
object4

subject1
object1
object2
subject4
object3

subject2
object3

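ex: $D(d_1 \,\|\, d_2) = \frac{1}{5}\log\frac{1/5}{0} + \frac{1}{5}\log\frac{1/5}{0} + \cdots + \frac{2}{5}\log\frac{2/5}{1/4}$
tf(subject1) is 1/5 in d1 but 0 in d2, which leaves its log term undefined; a substitute value is used for now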
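The individual summands of this example can be checked numerically. This sketch assumes the natural log and deliberately does not reproduce the substitute value mentioned on the slide, since that value is not given; it only shows that the 2/5 vs 1/4 term is finite while the subject1 term is not. `kl_term` is a hypothetical helper name.

```python
import math


def kl_term(p, q):
    """One summand p * log(p / q) of D(d1 || d2), natural log assumed."""
    if q == 0.0:
        return math.inf  # infinite unless a substitute value replaces q
    return p * math.log(p / q)


# Last summand of the example: frequency 2/5 in d1 against 1/4 in d2.
print(kl_term(2 / 5, 1 / 4))  # about 0.188
# subject1: 1/5 in d1 but 0 in d2.
print(kl_term(1 / 5, 0.0))    # inf
```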

12 / 25
Algorithm Description

13 / 25
Formal Problem Statement

Given:
Two databases DB1 and DB2
A predicate p1 ∈ DB1
An object type S1 such that some triple “s p1 o” exists in DB1 with s ∈ S1

Find a predicate p2 in DB2 that is equivalent to p1

14 / 25
High Level Description

Create a document d1 containing the labels of all objects linked by p1
Find an object type S2 ∈ DB2 that is equivalent to S1
For each predicate p2 used by S2, create a document d2 containing the labels of all objects linked by p2
Remove stop words and language tags from d1 and each d2
For each document d2, compute the normalized KL-Divergence KLD∗(d1, d2)
Return the predicate corresponding to the document with the lowest KL-Divergence
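The "remove stop words and language tags" step could look roughly like the Python sketch below. It assumes labels arrive as N-Triples-style literal strings such as `"Tom Hanks"@en`, that documents are bags of lowercase word tokens, and uses a tiny hypothetical stop-word list; none of these details come from the slides.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "in"}

# RDF-style literal with an optional language tag, e.g. "Tom Hanks"@en or "..."@en-GB
LITERAL = re.compile(r'^"(.*)"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?$')


def build_document(labels):
    """Turn object labels into a list of lowercase terms, dropping
    surrounding quotes, language tags and stop words."""
    terms = []
    for label in labels:
        m = LITERAL.match(label)
        text = m.group(1) if m else label  # plain strings pass through unchanged
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOP_WORDS:
                terms.append(token)
    return terms


print(build_document(['"Tom Hanks"@en', '"The Lord of the Rings"@en-GB', "Meg Ryan"]))
# ['tom', 'hanks', 'lord', 'rings', 'meg', 'ryan']
```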

15 / 25
Algorithm 1 FindPredicate(DB1, DB2, p1, S1)
Create document d1 containing labels of all objects linked by p1
Find an object type S2 ∈ DB2 where S2 is equivalent to S1
for each predicate pi used by S2 do
    Create document di containing labels of all objects linked by pi
end for
Remove stop words and language tags from d1 and every di
min ← 1
for each predicate pi used by S2 do
    k ← KLD∗(d1, di)
    if k < min then
        min ← k
        pmap ← pi
    end if
end for
return pmap
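The selection loop of Algorithm 1 can be sketched in Python as below, assuming the documents have already been built and preprocessed (database access and finding S2 are left out). To keep the block self-contained it uses a deliberately simple stand-in divergence, `missing_fraction`, which is *not* the slides' KLD∗; any function with the same shape can be passed in. Returning `None` when no candidate scores below the initial `min ← 1` is our interpretation, since the pseudocode leaves `pmap` unset in that case.

```python
def find_predicate(d1, candidate_docs, divergence, initial_min=1.0):
    """Selection loop of Algorithm 1.

    d1             -- preprocessed term list for predicate p1 in DB1
    candidate_docs -- {p_i: preprocessed term list d_i} for predicates used by S2
    divergence     -- function (d1, d_i) -> score, lower is closer (e.g. KLD*)
    initial_min    -- starting value of min (1 on the slide)
    """
    best, p_map = initial_min, None
    for p_i, d_i in candidate_docs.items():
        k = divergence(d1, d_i)
        if k < best:
            best, p_map = k, p_i
    return p_map


def missing_fraction(d1, d_i):
    """Stand-in divergence (NOT KLD*): share of d1 terms absent from d_i."""
    if not d1:
        return 1.0
    vocab = set(d_i)
    return sum(t not in vocab for t in d1) / len(d1)


d1 = ["tom", "hanks", "meg", "ryan"]
candidates = {  # hypothetical predicates used by S2 and their documents
    "actor": ["tom", "hanks", "meg", "ryan", "billy", "crystal"],
    "director": ["nora", "ephron", "rob", "reiner"],
}
print(find_predicate(d1, candidates, missing_fraction))  # actor
```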

16 / 25
Computing KL-Divergence
KL-Divergence is computed as

$$KLD(d_i, d_j) = \sum_{k \in V} \big(P(t_k, d_i) - P(t_k, d_j)\big) \times \log \frac{P(t_k, d_i)}{P(t_k, d_j)} \qquad (1)$$

where

$$P(t_k, d_i) = \frac{tf(t_k, d_i)}{\sum_{x \in d_i} tf(t_x, d_i)} \qquad (2)$$

If $t_k$ does not occur in $d_i$ then $P(t_k, d_i) \leftarrow$
KL-Divergence is then normalized as follows:

$$KLD^*(d_i, d_j) = \frac{KLD(d_i, d_j)}{KLD(d_i, 0)} \qquad (3)$$
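Equations (1) to (3) can be sketched in Python as follows. Several details are assumptions, not taken from the slides: `EPSILON` is a hypothetical back-off probability for absent terms (the slide's value is not shown), term frequencies are exact-match counts rather than Algorithm 2's similarity-based count, V is the union of both documents' terms, and the "0" in $KLD(d_i, 0)$ is read as an empty document.

```python
import math
from collections import Counter

EPSILON = 1e-6  # hypothetical back-off probability for a term absent from a document


def term_probs(doc):
    """Eq. (2) with exact-match counts: P(t, d) = tf(t, d) / sum of tf over d."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def kld(doc_i, doc_j):
    """Eq. (1): sum over the joint vocabulary V of
    (P(t, d_i) - P(t, d_j)) * log(P(t, d_i) / P(t, d_j))."""
    p_i, p_j = term_probs(doc_i), term_probs(doc_j)
    total = 0.0
    for t in set(p_i) | set(p_j):
        a = p_i.get(t, EPSILON)  # back-off when t is missing from d_i
        b = p_j.get(t, EPSILON)  # back-off when t is missing from d_j
        total += (a - b) * math.log(a / b)
    return total


def kld_star(doc_i, doc_j):
    """Eq. (3): KLD(d_i, d_j) / KLD(d_i, 0), reading '0' as an empty document."""
    if not doc_i:
        raise ValueError("d_i must contain at least one term")
    return kld(doc_i, doc_j) / kld(doc_i, [])


d1 = ["tom", "hanks", "meg", "ryan"]
print(kld_star(d1, d1))                                # 0.0: identical documents
print(kld_star(d1, ["tom", "hanks", "meg", "billy"]))  # about 0.5: one term differs
print(kld_star(d1, ["nora", "ephron"]))                # above 1: no shared terms
```

Under these assumptions each summand of Eq. (1) is non-negative and the measure is symmetric, and KLD∗ can exceed 1 for documents with no shared terms, so the `min ← 1` start in Algorithm 1 also acts as a cutoff.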

17 / 25
Algorithm 2 tf(tk, di)
tf ← 0
for each term tx in di do
    if sim(tk, tx) > τ then
        tf ← tf + 1
    end if
end for
return tf
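Algorithm 2 translates almost line for line into Python. The slides do not define `sim` or the threshold τ, so this sketch assumes `difflib.SequenceMatcher(...).ratio()` as the similarity and τ = 0.8; both are placeholders. With an exact-equality `sim`, the function reduces to a plain term count.

```python
from difflib import SequenceMatcher

TAU = 0.8  # hypothetical similarity threshold tau


def sim(a, b):
    """Hypothetical string similarity in [0, 1]; the slides leave sim unspecified."""
    return SequenceMatcher(None, a, b).ratio()


def tf(t_k, d_i, tau=TAU):
    """Algorithm 2: count the terms t_x in d_i with sim(t_k, t_x) > tau."""
    count = 0
    for t_x in d_i:
        if sim(t_k, t_x) > tau:
            count += 1
    return count


doc = ["actor", "actors", "actress", "director"]
print(tf("actor", doc))  # 2: "actor" and "actors" pass the threshold
```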

18 / 25
Experimental Results

19 / 25
Experimental Results

20 / 25
Experimental Results

21 / 25
Experimental Results

22 / 25
Experimental Results

23 / 25
Experimental Results

24 / 25
Questions?

25 / 25
