Bottom-up Discovery of Context-aware Quality Constraints
for Heterogeneous Knowledge Graphs
Xander Wilcke¹, Maurice de Kleijn², Victor de Boer¹, Henk Scholten², Frank van Harmelen¹
1. Dept. of Computer Science
2. Dept. of Spatial Economics
Vrije Universiteit Amsterdam, The Netherlands
KDIR 2020
Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 2 / 29
Overview
1. Quality control - why context matters
2. Defining context-aware constraints
3. Discovering context-aware constraints
4. A two-fold evaluation, from
– an algorithmic perspective, and
– a user perspective
Knowledge as a graph
● Knowledge graphs are being increasingly adopted
– Institutes, museums, tech giants, businesses, …
– Knowledge quality is no longer optional
How to maintain the quality of the knowledge across its entire life cycle?
Quality Control
● A key component is the quality constraint
– Helps guard consistency, accuracy, precision, etc.
– Constraint languages for knowledge graphs
  ● SHACL
  ● ShEx
Figure from “Jose E. Labra Gayo et al. (2018) Validating RDF Data, Synthesis Lectures on the
Semantic Web: Theory and Technology, Vol. 7, No. 1, 1-328”
Quality Control for Knowledge Graphs
● Existing constraint languages work on the schema level, e.g.
– All nodes of a certain type
– All source / destination nodes of a certain relation
[Figure: graph fragment with the labels Bridge, type, material, and an unknown value (??)]
Knowledge graphs allow for context-level constraints
Contextual Clusters
[Figure: two panels, "Context Unaware" and "Context Aware"]
Example
[Figure: example knowledge graph with the labels Bridge, material, crosses, salinity, type, River, "0.05", Steel, WMA, max_load, function, Road, "21.5", Highway]
Contributions
1. We introduce context-aware constraints, which
● offer more fine-grained control over the domains on which to impose restrictions
● apply to domains defined by graph motifs (contextual patterns)
● allow for multimodal pattern fragments (numbers, dates, texts, ...)
2. We also introduce an (embarrassingly parallel) bottom-up anytime algorithm to discover context-aware constraints in heterogeneous knowledge graphs
3. We evaluate 1 and 2 in a user study with experts in a real-world knowledge validation use case
Knowledge Graphs
● Graph-shaped knowledge bases
● Assertions are encoded as edges between nodes
● Nodes can be
– Entities: things, concepts, etc.
– Literals: strings, numbers, dates, etc.
● Context gives entities their meaning
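The bullets above can be sketched in a few lines of Python. This is only an illustration: the class names, the `context` and `literals` helpers, and the concrete edges (for example that `bridge_1` crosses `river_1`) are assumptions, since the slide's example figure survives only as a list of labels.

```python
from typing import NamedTuple, Set, Tuple, Union


class Entity(NamedTuple):
    """A thing or concept, identified by name."""
    name: str


class Literal(NamedTuple):
    """A raw value (string, number, date, ...) with a datatype tag."""
    value: str
    datatype: str


Node = Union[Entity, Literal]
Triple = Tuple[Entity, str, Node]  # one assertion: (source, relation, target)

# Hypothetical graph built from the labels on the example slide.
GRAPH: Set[Triple] = {
    (Entity("bridge_1"), "type", Entity("Bridge")),
    (Entity("bridge_1"), "material", Entity("Steel")),
    (Entity("bridge_1"), "crosses", Entity("river_1")),
    (Entity("river_1"), "type", Entity("River")),
    (Entity("river_1"), "salinity", Literal("0.05", "xsd:float")),
    (Entity("road_1"), "type", Entity("Road")),
    (Entity("road_1"), "max_load", Literal("21.5", "xsd:float")),
}


def context(graph: Set[Triple], entity: Entity) -> Set[Triple]:
    """Return every assertion that directly involves `entity` (its 1-hop context)."""
    return {t for t in graph if t[0] == entity or t[2] == entity}


def literals(graph: Set[Triple]) -> Set[Literal]:
    """Return all literal nodes that occur as the target of an assertion."""
    return {o for (_, _, o) in graph if isinstance(o, Literal)}


if __name__ == "__main__":
    for triple in sorted(context(GRAPH, Entity("river_1")), key=str):
        print(triple)
```

In this toy graph the context of `river_1` includes the bridge that crosses it, which is the kind of neighbourhood information a context-aware constraint can condition on.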
Defining Context-aware Constraints
A context-aware constraint states that every entity which satisfies the antecedent must also satisfy the consequent.
Here, the antecedent and the consequent are assertion patterns.
An assertion pattern states that there exists a relation between any sets of nodes that match its pattern variables.
Pattern variables can express
● Any specific node (entity or literal)
● All entities of a type t (object-type)
● All literals of a datatype dt (data-type)
● All literals which match a regular expression s (value-type)
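The four kinds of pattern variable listed above, and the check "every entity that satisfies the antecedent must also satisfy the consequent", can be sketched as follows. This is a simplified illustration, not the authors' implementation: entities are plain strings, literals are `(value, datatype)` tuples, each assertion pattern is reduced to a `(relation, pattern variable)` pair on an entity's outgoing edges, and the graph and the steel-bridge constraint are hypothetical.

```python
import re

# Hypothetical graph: entities are strings, literals are (value, datatype) tuples.
GRAPH = [
    ("bridge_1", "type", "Bridge"),
    ("bridge_1", "crosses", "river_1"),
    ("bridge_1", "material", "Steel"),
    ("bridge_2", "type", "Bridge"),
    ("bridge_2", "crosses", "river_1"),
    ("river_1", "type", "River"),
    ("river_1", "salinity", ("0.05", "xsd:float")),
]


def is_literal(node):
    return isinstance(node, tuple)


def specific(target):
    """Pattern variable matching one specific node (entity or literal)."""
    return lambda node: node == target


def object_type(graph, t):
    """Pattern variable matching all entities of type t."""
    members = {s for (s, p, o) in graph if p == "type" and o == t}
    return lambda node: node in members


def data_type(dt):
    """Pattern variable matching all literals of datatype dt."""
    return lambda node: is_literal(node) and node[1] == dt


def value_type(regex):
    """Pattern variable matching all literals whose value matches regex."""
    compiled = re.compile(regex)
    return lambda node: is_literal(node) and compiled.fullmatch(node[0]) is not None


def satisfies(graph, entity, pattern):
    """True if `entity` has an outgoing `relation` edge to a node matched by the variable."""
    relation, matches = pattern
    return any(s == entity and p == relation and matches(o) for (s, p, o) in graph)


def violations(graph, antecedent, consequent):
    """Entities that satisfy every antecedent pattern but not the consequent."""
    entities = {s for (s, _, _) in graph}
    return sorted(
        e for e in entities
        if all(satisfies(graph, e, a) for a in antecedent)
        and not satisfies(graph, e, consequent)
    )


# Hypothetical constraint: a Bridge that crosses a River must be made of Steel.
antecedent = [("type", specific("Bridge")), ("crosses", object_type(GRAPH, "River"))]
consequent = ("material", specific("Steel"))

if __name__ == "__main__":
    print(violations(GRAPH, antecedent, consequent))
```

The antecedent restricts the domain to bridges in a particular context (those crossing a river), which a purely type-level constraint on `Bridge` could not express.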
Discovering Context-aware Constraints
● Algorithm properties
– Bottom-up: learns constraints directly from any knowledge graph
– Anytime: longer runs yield constraints with more fine-grained domains
– Embarrassingly parallel: each newly discovered constraint forms a new branch whose children can be computed independently
● Algorithm assumptions
1) The large majority of the knowledge is valid and accurate
2) These two qualities can be captured using frequent pattern mining
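To make the two assumptions concrete, the sketch below scores a candidate constraint with support and confidence. It uses the common association-rule convention (support is the number of entities matching the antecedent, confidence is the fraction of those that also match the consequent); whether the paper uses exactly these definitions is an assumption, and the feature strings and data are hypothetical.

```python
def support_and_confidence(features_by_entity, antecedent, consequent):
    """Score the candidate constraint `antecedent -> consequent`.

    features_by_entity maps an entity name to the set of features it has;
    antecedent and consequent are sets of features.
    """
    domain = [e for e, feats in features_by_entity.items() if antecedent <= feats]
    support = len(domain)
    if support == 0:
        return 0, 0.0
    satisfied = sum(1 for e in domain if consequent <= features_by_entity[e])
    return support, satisfied / support


# Hypothetical data: four bridges that cross a river, three of them made of steel.
BRIDGES = {
    "bridge_1": {"type=Bridge", "crosses=River", "material=Steel"},
    "bridge_2": {"type=Bridge", "crosses=River", "material=Steel"},
    "bridge_3": {"type=Bridge", "crosses=River", "material=Steel"},
    "bridge_4": {"type=Bridge", "crosses=River", "material=Wood"},
}

if __name__ == "__main__":
    print(support_and_confidence(
        BRIDGES, {"type=Bridge", "crosses=River"}, {"material=Steel"}))
```

Under assumption 1, a frequent pattern with high confidence is treated as a plausible constraint, and the minority that breaks it (here `bridge_4`) becomes a candidate for review rather than proof of an error.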
Main components
● The generation forest stores constraints in generation trees, and keeps track of progress per depth
● The explore-extend loop explores and tests increasingly complex constraints
1. Generate all constraints of depth 0 that exceed minimal support and confidence
[The slide lists seven constraint forms, numbered 1) to 7), as formulas]
2. For all constraints of depth d:
Test all diagonal combinations of candidate endpoints and extensions, and add them to depth d + 1 if they meet the minimal requirements
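The two steps (seed depth 0, then repeatedly extend the constraints of one depth to populate the next) can be sketched as a level-wise loop. This is a heavily simplified stand-in, not the authors' algorithm: antecedents are flat feature sets instead of graph patterns, the consequent is fixed, the "diagonal combinations" of endpoints and extensions are replaced by adding one feature at a time, and the names `discover` and `forest` and all data are hypothetical.

```python
def discover(features_by_entity, consequent, min_support, min_confidence, max_depth):
    """Return {depth: [antecedent, ...]} for constraints `antecedent -> consequent`."""
    all_features = sorted(
        {f for feats in features_by_entity.values() for f in feats} - {consequent}
    )

    def accept(antecedent):
        # Entities whose features include the whole antecedent form the domain.
        domain = [feats for feats in features_by_entity.values() if antecedent <= feats]
        if len(domain) < min_support:
            return False
        confidence = sum(consequent in feats for feats in domain) / len(domain)
        return confidence >= min_confidence

    # Step 1: all depth-0 constraints that meet minimal support and confidence.
    forest = {0: [frozenset([f]) for f in all_features if accept(frozenset([f]))]}

    # Step 2: extend every constraint of the current depth by one feature and
    # keep the children that still meet the minimal requirements.
    for depth in range(max_depth):
        children = set()
        for parent in forest[depth]:
            # Each parent's children are independent of all other parents,
            # so this inner loop could be distributed across workers.
            for f in all_features:
                if f not in parent:
                    child = parent | {f}
                    if accept(child):
                        children.add(child)
        if not children:
            break
        forest[depth + 1] = sorted(children, key=sorted)
    return forest


# Hypothetical data.
BRIDGES = {
    "bridge_1": {"type=Bridge", "crosses=River", "material=Steel"},
    "bridge_2": {"type=Bridge", "crosses=River", "material=Steel"},
    "bridge_3": {"type=Bridge", "crosses=Road", "material=Concrete"},
    "bridge_4": {"type=Bridge", "crosses=River", "material=Steel"},
}

if __name__ == "__main__":
    result = discover(BRIDGES, "material=Steel", min_support=2,
                      min_confidence=0.7, max_depth=2)
    for depth, antecedents in result.items():
        print(depth, [sorted(a) for a in antecedents])
```

Because results are stored per depth, the loop can be stopped after any depth and still return valid (if coarser) constraints, which is the anytime behaviour the slides describe.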
[Figure: the discovery process from a graph perspective]
Experiments & Evaluation
● Algorithmic perspective
– Goal: to determine the trade-off between the chosen support and confidence, and the constraints they yield
– Form: grid search on 3 distinctly different datasets with support and confidence as parameters
● User perspective
– Goal: to assess the effectiveness of our method to discover constraints that are useful for quality control
– Form: a structured user evaluation with knowledge-management experts in a real-world knowledge validation use case
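The grid search in the algorithmic evaluation can be pictured as the loop below, which tabulates how many constraints a miner returns for each (support, confidence) pair. The miner here is a toy stand-in that only counts single-feature constraints; `count_constraints`, `grid_search`, the parameter grids, and the data are all assumptions made for illustration.

```python
from itertools import product


def count_constraints(features_by_entity, min_support, min_confidence):
    """Toy miner: count `a -> c` constraints between single features."""
    features = sorted({f for feats in features_by_entity.values() for f in feats})
    found = 0
    for a, c in product(features, repeat=2):
        if a == c:
            continue
        domain = [feats for feats in features_by_entity.values() if a in feats]
        if len(domain) < min_support:
            continue
        confidence = sum(c in feats for feats in domain) / len(domain)
        if confidence >= min_confidence:
            found += 1
    return found


def grid_search(features_by_entity, supports, confidences):
    """Map every (support, confidence) setting to the number of constraints found."""
    return {
        (s, c): count_constraints(features_by_entity, s, c)
        for s, c in product(supports, confidences)
    }


# Hypothetical data.
BRIDGES = {
    "bridge_1": {"type=Bridge", "crosses=River", "material=Steel"},
    "bridge_2": {"type=Bridge", "crosses=River", "material=Steel"},
    "bridge_3": {"type=Bridge", "crosses=Road", "material=Concrete"},
    "bridge_4": {"type=Bridge", "crosses=River", "material=Steel"},
}

if __name__ == "__main__":
    for setting, n in grid_search(BRIDGES, [1, 3], [0.5, 1.0]).items():
        print(setting, n)
```

Running the same grid on several datasets gives one table of counts per dataset, which can then be compared across settings.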
Evaluation – Algorithmic Perspective
● Strong positive correlation between the number of discovered constraints and the chosen support and confidence values (Table 3)
● The number of relations (cf. dataset size) is likely the main contributor to the number of discovered constraints
● Positive correlation between the number of pruned and discovered constraints suggests that the pruning strategy is, to an extent, effective (Table 3)
Evaluation – User Perspective
● Structured user evaluation
– Workshop hosted at Rijkswaterstaat, the Netherlands
– Domain of asset management and civil engineering
– 21 participants, all experts on knowledge maintenance and validation
– Participants were asked to assess constraints on usefulness and graininess. Constraints were divided into 3x4 groups of increasing complexity (unbeknownst to participants)
[Image: Rijkswaterstaat, the Netherlands]
● More than half of the participants thought the complexity of the discovered constraints was well balanced (Table 8)
● There is little difference in scores between the three complexity groups, suggesting no interaction or too little difference between groups (Table 9)
● Overall fair to moderate agreement on usefulness between participants, but significant differences in agreement between complexity groups (Table 9)
● Neutral to agreeable stance with respect to the overall usefulness of our method, but a considerable portion was unsure, likely due to lack of familiarity with the domain (Table 10)
Conclusion
● Context-aware constraints are, to an extent, useful for knowledge validation tasks and, for the most part, well balanced with respect to complexity
● There is no direct relationship between the dimensions of a graph and the number of discovered constraints. This makes it difficult to apply a rule of thumb to the support and confidence values
● Scalability remains a practical challenge, but is partly alleviated by our pruning and optimization strategies, and by parallelizing the task
● Analysis of our algorithm's time complexity fell outside the current scope, and should be investigated in future work
Thank You
● Slides available at tinyurl.com/yyzr5876
● Code available at gitlab.com/wxwilcke/cckg
● Data available at gitlab.com/wxwilcke/mmkg

Más contenido relacionado

Similar a Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs

Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskQuantUniversity
 
Knowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureKnowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureSteffen Staab
 
Metadata Quality Assurance
Metadata Quality AssuranceMetadata Quality Assurance
Metadata Quality AssurancePéter Király
 
Why ∆Q is the ideal network metric
Why ∆Q is the ideal network metricWhy ∆Q is the ideal network metric
Why ∆Q is the ideal network metricMartin Geddes
 
Modelsward 2018 Industrial Track - Alessandra Bagnato
Modelsward 2018 Industrial Track - Alessandra BagnatoModelsward 2018 Industrial Track - Alessandra Bagnato
Modelsward 2018 Industrial Track - Alessandra BagnatoAlessandra Bagnato
 
Data Quality - Standards and Application to Open Data
Data Quality - Standards and Application to Open DataData Quality - Standards and Application to Open Data
Data Quality - Standards and Application to Open DataMarco Torchiano
 
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Pramit Choudhary
 
Metadata quality Assurance Framework at QQML2016 - short
Metadata quality Assurance Framework at QQML2016 - shortMetadata quality Assurance Framework at QQML2016 - short
Metadata quality Assurance Framework at QQML2016 - shortPéter Király
 
Procurement strategy in major infrastructure: The AS-IS and STEPS - D. Makovš...
Procurement strategy in major infrastructure: The AS-IS and STEPS - D. Makovš...Procurement strategy in major infrastructure: The AS-IS and STEPS - D. Makovš...
Procurement strategy in major infrastructure: The AS-IS and STEPS - D. Makovš...OECD Governance
 
Portsmouth University Presentation
Portsmouth University PresentationPortsmouth University Presentation
Portsmouth University PresentationStavros Thomas
 
Re-Engineering Graphical User Interfaces from their Resource Files with UsiRe...
Re-Engineering Graphical User Interfaces from their Resource Files with UsiRe...Re-Engineering Graphical User Interfaces from their Resource Files with UsiRe...
Re-Engineering Graphical User Interfaces from their Resource Files with UsiRe...Jean Vanderdonckt
 
DSD-INT 2023 Dynamic Adaptive Policy Pathways (DAPP) - Theory & Showcase - Wa...
DSD-INT 2023 Dynamic Adaptive Policy Pathways (DAPP) - Theory & Showcase - Wa...DSD-INT 2023 Dynamic Adaptive Policy Pathways (DAPP) - Theory & Showcase - Wa...
DSD-INT 2023 Dynamic Adaptive Policy Pathways (DAPP) - Theory & Showcase - Wa...Deltares
 
Only Time Will Tell: Modelling Information Diffusion in Code Review with Time...
Only Time Will Tell: Modelling Information Diffusion in Code Review with Time...Only Time Will Tell: Modelling Information Diffusion in Code Review with Time...
Only Time Will Tell: Modelling Information Diffusion in Code Review with Time...Michael Dorner
 
factorization methods
factorization methodsfactorization methods
factorization methodsShaina Raza
 
Metadata Quality Assurance Part II. The implementation begins
Metadata Quality Assurance Part II. The implementation beginsMetadata Quality Assurance Part II. The implementation begins
Metadata Quality Assurance Part II. The implementation beginsPéter Király
 
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...Natalia Díaz Rodríguez
 

Similar a Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs (20)

Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit Risk
 
Introduction to Metrology
Introduction to Metrology Introduction to Metrology
Introduction to Metrology
 
Knowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureKnowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sure
 
Metadata Quality Assurance
Metadata Quality AssuranceMetadata Quality Assurance
Metadata Quality Assurance
 
Why ∆Q is the ideal network metric
Why ∆Q is the ideal network metricWhy ∆Q is the ideal network metric
Why ∆Q is the ideal network metric
 
Modelsward 2018 Industrial Track - Alessandra Bagnato
Modelsward 2018 Industrial Track - Alessandra BagnatoModelsward 2018 Industrial Track - Alessandra Bagnato
Modelsward 2018 Industrial Track - Alessandra Bagnato
 
Wcre12b.ppt
Wcre12b.pptWcre12b.ppt
Wcre12b.ppt
 
Data Quality - Standards and Application to Open Data
Data Quality - Standards and Application to Open DataData Quality - Standards and Application to Open Data
Data Quality - Standards and Application to Open Data
 
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
 
Metadata quality Assurance Framework at QQML2016 - short
Metadata quality Assurance Framework at QQML2016 - shortMetadata quality Assurance Framework at QQML2016 - short
Metadata quality Assurance Framework at QQML2016 - short
 
Procurement strategy in major infrastructure: The AS-IS and STEPS - D. Makovš...
Procurement strategy in major infrastructure: The AS-IS and STEPS - D. Makovš...Procurement strategy in major infrastructure: The AS-IS and STEPS - D. Makovš...
Procurement strategy in major infrastructure: The AS-IS and STEPS - D. Makovš...
 
Portsmouth University Presentation
Portsmouth University PresentationPortsmouth University Presentation
Portsmouth University Presentation
 
Re-Engineering Graphical User Interfaces from their Resource Files with UsiRe...
Re-Engineering Graphical User Interfaces from their Resource Files with UsiRe...Re-Engineering Graphical User Interfaces from their Resource Files with UsiRe...
Re-Engineering Graphical User Interfaces from their Resource Files with UsiRe...
 
DSD-INT 2023 Dynamic Adaptive Policy Pathways (DAPP) - Theory & Showcase - Wa...
DSD-INT 2023 Dynamic Adaptive Policy Pathways (DAPP) - Theory & Showcase - Wa...DSD-INT 2023 Dynamic Adaptive Policy Pathways (DAPP) - Theory & Showcase - Wa...
DSD-INT 2023 Dynamic Adaptive Policy Pathways (DAPP) - Theory & Showcase - Wa...
 
Only Time Will Tell: Modelling Information Diffusion in Code Review with Time...
Only Time Will Tell: Modelling Information Diffusion in Code Review with Time...Only Time Will Tell: Modelling Information Diffusion in Code Review with Time...
Only Time Will Tell: Modelling Information Diffusion in Code Review with Time...
 
factorization methods
factorization methodsfactorization methods
factorization methods
 
4.2_Microgrid Design Toolkit_Eddy_EPRI/SNL Microgrid
4.2_Microgrid Design Toolkit_Eddy_EPRI/SNL Microgrid4.2_Microgrid Design Toolkit_Eddy_EPRI/SNL Microgrid
4.2_Microgrid Design Toolkit_Eddy_EPRI/SNL Microgrid
 
Metadata Quality Assurance Part II. The implementation begins
Metadata Quality Assurance Part II. The implementation beginsMetadata Quality Assurance Part II. The implementation begins
Metadata Quality Assurance Part II. The implementation begins
 
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
 
PODS 2013 - Montali - Verification of Relational Data-Centric Dynamic Systems...
PODS 2013 - Montali - Verification of Relational Data-Centric Dynamic Systems...PODS 2013 - Montali - Verification of Relational Data-Centric Dynamic Systems...
PODS 2013 - Montali - Verification of Relational Data-Centric Dynamic Systems...
 

Último

MIP Award presentation at the IEEE International Conference on Software Analy...
MIP Award presentation at the IEEE International Conference on Software Analy...MIP Award presentation at the IEEE International Conference on Software Analy...
MIP Award presentation at the IEEE International Conference on Software Analy...Annibale Panichella
 
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...yogeshlabana357357
 
Tuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notesTuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notesjyothisaisri
 
SCHISTOSOMA HEAMATOBIUM life cycle .pdf
SCHISTOSOMA HEAMATOBIUM life cycle  .pdfSCHISTOSOMA HEAMATOBIUM life cycle  .pdf
SCHISTOSOMA HEAMATOBIUM life cycle .pdfDebdattaGhosh6
 
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...Sérgio Sacani
 
Factor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandFactor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandRcvets
 
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Sérgio Sacani
 
Virulence Analysis of Citrus canker caused by Xanthomonas axonopodis pv. citr...
Virulence Analysis of Citrus canker caused by Xanthomonas axonopodis pv. citr...Virulence Analysis of Citrus canker caused by Xanthomonas axonopodis pv. citr...
Virulence Analysis of Citrus canker caused by Xanthomonas axonopodis pv. citr...TALAPATI ARUNA CHENNA VYDYANAD
 
Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!University of Hertfordshire
 
Erythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C KalyanErythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C Kalyanmuralinath2
 
NuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfNuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfpablovgd
 
NuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent UniversityNuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent Universitypablovgd
 
Continuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsContinuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsSérgio Sacani
 
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...Sahil Suleman
 
GBSN - Microbiology Lab (Microbiology Lab Safety Procedures)
GBSN -  Microbiology Lab (Microbiology Lab Safety Procedures)GBSN -  Microbiology Lab (Microbiology Lab Safety Procedures)
GBSN - Microbiology Lab (Microbiology Lab Safety Procedures)Areesha Ahmad
 
Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...Sérgio Sacani
 
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptxBiochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptxjayabahari688
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfmarcuskenyatta275
 
Detectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureDetectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureSérgio Sacani
 
Lubrication System in forced feed system
Lubrication System in forced feed systemLubrication System in forced feed system
Lubrication System in forced feed systemADB online India
 

Último (20)

MIP Award presentation at the IEEE International Conference on Software Analy...
MIP Award presentation at the IEEE International Conference on Software Analy...MIP Award presentation at the IEEE International Conference on Software Analy...
MIP Award presentation at the IEEE International Conference on Software Analy...
 
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
 
Tuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notesTuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notes
 
SCHISTOSOMA HEAMATOBIUM life cycle .pdf
SCHISTOSOMA HEAMATOBIUM life cycle  .pdfSCHISTOSOMA HEAMATOBIUM life cycle  .pdf
SCHISTOSOMA HEAMATOBIUM life cycle .pdf
 
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...
Extensive Pollution of Uranus and Neptune’s Atmospheres by Upsweep of Icy Mat...
 
Factor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandFactor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary Gland
 
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
 
Virulence Analysis of Citrus canker caused by Xanthomonas axonopodis pv. citr...
Virulence Analysis of Citrus canker caused by Xanthomonas axonopodis pv. citr...Virulence Analysis of Citrus canker caused by Xanthomonas axonopodis pv. citr...
Virulence Analysis of Citrus canker caused by Xanthomonas axonopodis pv. citr...
 
Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!
 
Erythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C KalyanErythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C Kalyan
 
NuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfNuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdf
 
NuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent UniversityNuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent University
 
Continuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsContinuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discs
 
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
Alternative method of dissolution in-vitro in-vivo correlation and dissolutio...
 
GBSN - Microbiology Lab (Microbiology Lab Safety Procedures)
GBSN -  Microbiology Lab (Microbiology Lab Safety Procedures)GBSN -  Microbiology Lab (Microbiology Lab Safety Procedures)
GBSN - Microbiology Lab (Microbiology Lab Safety Procedures)
 
Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...
 
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptxBiochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdf
 
Detectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureDetectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a Technosignature
 
Lubrication System in forced feed system
Lubrication System in forced feed systemLubrication System in forced feed system
Lubrication System in forced feed system
 

Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs

  • 1. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs Xander Wilcke1 , Maurice de Kleijn2 , Victor de Boer1 , Henk Scholten2 , Frank van Harmelen1 1. Dept. of Computer Science 2. Dept. of Spatial Economics Vrije Universiteit Amsterdam, The Netherlands KDIR 2020
  • 2. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 2 / 29 Overview 1. Quality control - why context matters 2. Defining context-aware constraints 3. Discovering context-aware constraints 4. A two-fold evaluation, from  an algorithmic perspective, and  a user perspective
  • 3. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 3 / 29 Knowledge as a graph ● Knowledge Graphs are getting increasingly adopted – Institutes, museums, tech giants, businesses, … – Knowledge quality is no longer optional How to maintain the quality of the knowledge across its entire life cycle?
  • 4. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 4 / 29 Quality Control ● A key component is the quality constraint – Helps guard the consistency, accuracy, precision, etc. – Constraint languages for knowledge graphs ● SHACL ● ShEx Figure from “Jose E. Labra Gayo et al. (2018) Validating RDF Data, Synthesis Lectures on the Semantic Web: Theory and Technology, Vol. 7, No. 1, 1-328”
  • 5. Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs | KDIR 2020 5 / 29 Quality Control for Knowledge Graphs ● Existing constraint languages work on the schema level e.g. – All nodes of a certain type: – All source / destination nodes of a certain relation: Bridge type material ??
  • 6. Quality Control for Knowledge Graphs
      ● Existing constraint languages work on the schema level
      ● Knowledge graphs allow for context-level constraints
  • 7-8. Contextual Clusters [Figure: two panels, "Context Unaware" and "Context Aware"]
  • 9-10. Example [Figure: knowledge graph fragment with the labels Bridge, River, Steel, WMA, Road, Highway, material, crosses, salinity, type, max_load, function, "0.05", "21.5"]
  • 11. Contributions
      1. We introduce context-aware constraints, which
          ● offer more fine-grained control over the domains on which restrictions are imposed
          ● apply to domains defined by graph motifs (contextual patterns)
          ● allow for multimodal pattern fragments (numbers, dates, texts, ...)
      2. We also introduce an (embarrassingly parallel) bottom-up anytime algorithm to discover context-aware constraints in heterogeneous knowledge graphs
      3. We evaluate 1 and 2 in a user study with experts in a real-world knowledge validation use case
  • 12. Knowledge Graphs
      ● Graph-shaped knowledge bases
      ● Assertions are encoded as edges between nodes
      ● Nodes can be
          – Entities: things, concepts, etc.
          – Literals: strings, numbers, dates, etc.
      ● Context gives entities their meaning
      [Figure: the example knowledge graph fragment around Bridge, River, and Road]
  • 13. Defining Context-aware Constraints
      A context-aware constraint states that every entity that satisfies the antecedent must also satisfy the consequent; here, the antecedent and the consequent are made up of assertion patterns.
      An assertion pattern states that there exists a relation between any sets of nodes that match its pattern variables.
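As a rough illustration of this definition, the sketch below checks one constraint against a toy graph. It assumes the graph is a plain Python set of (subject, predicate, object) triples and simplifies each assertion pattern to a (predicate, object) pair that an entity must have. The function names `satisfies` and `violations` and the example data are hypothetical and not taken from the authors' code.

```python
# Toy knowledge graph: a set of (subject, predicate, object) triples.
# The data loosely follows the bridge example and is illustrative only.
GRAPH = {
    ("bridge1", "type", "Bridge"),
    ("bridge1", "crosses", "river1"),
    ("bridge1", "material", "Steel"),
    ("bridge2", "type", "Bridge"),
    ("bridge2", "crosses", "river1"),
    ("bridge2", "material", "Wood"),
    ("road1", "type", "Road"),
}


def satisfies(graph, entity, patterns):
    """True if the entity has every (predicate, object) pair in patterns."""
    return all((entity, p, o) in graph for p, o in patterns)


def violations(graph, antecedent, consequent):
    """Entities that satisfy the antecedent but not the consequent."""
    entities = {s for s, _, _ in graph}
    return sorted(
        e
        for e in entities
        if satisfies(graph, e, antecedent) and not satisfies(graph, e, consequent)
    )


# Hypothetical constraint: bridges that cross river1 must be made of steel.
antecedent = [("type", "Bridge"), ("crosses", "river1")]
consequent = [("material", "Steel")]
result = violations(GRAPH, antecedent, consequent)
print(result)
```

Entities outside the antecedent's domain, such as `road1`, are never reported: the constraint only restricts the entities that its context selects.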
  • 14. Defining Context-aware Constraints
      Pattern variables can express
      ● any specific node (entity or literal)
      ● all entities of a type t (object-type)
      ● all literals of a datatype dt (data-type)
      ● all literals that match a regular expression s (value-type)
      An assertion pattern states that there exists a relation between any sets of nodes that match its pattern variables.
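The four kinds of pattern variable can be sketched as predicates over nodes. This is a minimal illustration under assumptions: the `Node` class, its `kind` field (entity type or literal datatype), and the four factory functions are hypothetical names chosen here, and the datatype strings are example values.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A graph node; `kind` holds the entity type or the literal datatype."""
    value: str
    is_literal: bool
    kind: str


def specific(value):
    """Matches one specific node (entity or literal)."""
    return lambda n: n.value == value


def object_type(t):
    """Matches all entities of type t."""
    return lambda n: not n.is_literal and n.kind == t


def data_type(dt):
    """Matches all literals of datatype dt."""
    return lambda n: n.is_literal and n.kind == dt


def value_type(regex):
    """Matches all literals whose value fully matches the regular expression."""
    compiled = re.compile(regex)
    return lambda n: n.is_literal and compiled.fullmatch(n.value) is not None


steel = Node("Steel", is_literal=False, kind="Material")
salinity = Node("0.05", is_literal=True, kind="xsd:decimal")

print(object_type("Material")(steel))    # entity of the requested type
print(value_type(r"0\.\d+")(salinity))   # literal matching the expression
```

Representing each variable as a predicate keeps the four cases interchangeable, so an assertion pattern can pair any two of them with a relation.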
  • 15. Examples
  • 16. Discovering Context-aware Constraints
      ● Algorithm properties
          – Bottom-up: learns constraints directly from any knowledge graph
          – Anytime: longer runs yield constraints with more fine-grained domains
          – Embarrassingly parallel: each newly discovered constraint starts a new branch whose children can be computed independently
      ● Algorithm assumptions
          1) the large majority of the knowledge is valid and accurate, and
          2) these two qualities can be captured using frequent pattern mining
  • 17. Discovering Context-aware Constraints
      Main components
      ● The generation forest stores constraints in generation trees, and keeps track of progress per depth
      ● The explore-extend loop explores and tests increasingly complex constraints
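One possible shape for the generation forest is sketched below. The slides do not specify the data structure, so everything here is an assumption: the class names, the choice to key each tree by a root label, and the use of plain strings as stand-ins for constraints.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GenerationTree:
    """Constraints that share one root, grouped by depth (hypothetical layout)."""
    root: str
    by_depth: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, depth, constraint):
        self.by_depth[depth].append(constraint)

    def max_depth(self):
        # -1 signals an empty tree.
        return max(self.by_depth, default=-1)


class GenerationForest:
    """A collection of generation trees, one per root label."""

    def __init__(self):
        self.trees = {}

    def tree(self, root):
        if root not in self.trees:
            self.trees[root] = GenerationTree(root)
        return self.trees[root]

    def at_depth(self, depth):
        """All constraints of the given depth, across every tree."""
        return [c for t in self.trees.values() for c in t.by_depth.get(depth, [])]


forest = GenerationForest()
forest.tree("Bridge").add(0, "c0")
forest.tree("Bridge").add(1, "c1")
forest.tree("Road").add(0, "r0")
print(forest.at_depth(0))
```

Grouping by depth makes it cheap to fetch the constraints that the explore-extend loop should work on next, and separate trees can be handed to separate workers.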
  • 18. Discovering Context-aware Constraints
      1. Generate all constraints of depth 0 that exceed minimal support and confidence [seven numbered constraint forms; the formulas are not preserved in this transcript]
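To make the thresholds concrete, the sketch below generates depth-0 candidates on a toy graph. It rests on several assumptions that the slides do not confirm: support is taken as the number of entities matching the antecedent, confidence as the fraction of those that also match the consequent (the usual association-rule reading), and only one candidate form, "type implies property", is generated instead of the seven forms on the slide. All names are hypothetical.

```python
# Toy graph: four bridges (three steel, one wood) and one road.
GRAPH = {
    ("b1", "type", "Bridge"), ("b1", "material", "Steel"),
    ("b2", "type", "Bridge"), ("b2", "material", "Steel"),
    ("b3", "type", "Bridge"), ("b3", "material", "Steel"),
    ("b4", "type", "Bridge"), ("b4", "material", "Wood"),
    ("r1", "type", "Road"),
}


def support_and_confidence(graph, antecedent, consequent):
    """Assumed definitions: support = domain size, confidence = share that complies."""
    entities = {s for s, _, _ in graph}
    domain = {e for e in entities if all((e, p, o) in graph for p, o in antecedent)}
    holds = {e for e in domain if all((e, p, o) in graph for p, o in consequent)}
    support = len(domain)
    confidence = len(holds) / support if support else 0.0
    return support, confidence


def depth_zero(graph, min_support, min_confidence):
    """Keep every 'type t implies (p, o)' candidate that meets both thresholds."""
    types = sorted({o for _, p, o in graph if p == "type"})
    pairs = sorted({(p, o) for _, p, o in graph if p != "type"})
    kept = []
    for t in types:
        for pair in pairs:
            s, c = support_and_confidence(graph, [("type", t)], [pair])
            if s >= min_support and c >= min_confidence:
                kept.append((t, pair, s, c))
    return kept


constraints = depth_zero(GRAPH, min_support=3, min_confidence=0.7)
print(constraints)
```

Raising either threshold shrinks the result: with `min_confidence=0.8` the steel constraint (three out of four bridges) would be dropped as well.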
  • 19-22. Discovering Context-aware Constraints
      2. For all constraints of the current depth: test all diagonal combinations of candidate endpoints and extensions, and add them to the next depth if they meet the minimal support and confidence requirements.
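The extension step can be illustrated with a simplified, level-wise refinement. Note that this is a swapped-in technique, not the authors' procedure: instead of combining candidate endpoints and extensions along graph paths, it adds one extra (predicate, object) pattern to the antecedent of each parent constraint and keeps the children that still meet the thresholds. Support and confidence use the same assumed definitions as a plain association rule, and all names are hypothetical.

```python
# Toy graph: bridges over rivers are steel, bridges over roads are concrete.
GRAPH = {
    ("b1", "type", "Bridge"), ("b1", "crosses", "River"), ("b1", "material", "Steel"),
    ("b2", "type", "Bridge"), ("b2", "crosses", "River"), ("b2", "material", "Steel"),
    ("b3", "type", "Bridge"), ("b3", "crosses", "Road"), ("b3", "material", "Concrete"),
    ("b4", "type", "Bridge"), ("b4", "crosses", "Road"), ("b4", "material", "Concrete"),
}


def matches(graph, entity, patterns):
    """True if the entity has every (predicate, object) pair in patterns."""
    return all((entity, p, o) in graph for p, o in patterns)


def score(graph, antecedent, consequent):
    """Return (support, confidence) under the assumed definitions."""
    entities = {s for s, _, _ in graph}
    domain = [e for e in entities if matches(graph, e, antecedent)]
    holds = [e for e in domain if matches(graph, e, consequent)]
    return len(domain), (len(holds) / len(domain) if domain else 0.0)


def extend(graph, parents, min_support, min_confidence):
    """Derive next-depth constraints by adding one pattern to each antecedent."""
    pairs = sorted({(p, o) for _, p, o in graph})
    children = []
    for antecedent, consequent in parents:
        for extra in pairs:
            if extra in antecedent or extra == consequent:
                continue
            candidate = tuple(sorted(antecedent + (extra,)))
            s, c = score(graph, candidate, [consequent])
            child = (candidate, consequent)
            if s >= min_support and c >= min_confidence and child not in children:
                children.append(child)
    return children


# Parent: "bridges are made of steel" holds for only half of the bridges.
parents = [((("type", "Bridge"),), ("material", "Steel"))]
children = extend(GRAPH, parents, min_support=2, min_confidence=1.0)
print(children)
```

Narrowing the domain from all bridges to bridges that cross a river raises the confidence from 0.5 to 1.0, which is the kind of gain that context-level constraints are meant to capture. Each parent is processed independently of the others, so the outer loop could be distributed over workers.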
  • 23. Discovering Context-aware Constraints: Graph Perspective
  • 24. Experiments & Evaluation
      ● Algorithmic perspective
          – Goal: to determine the trade-off between the chosen support and confidence values and the constraints they yield
          – Form: grid search on 3 distinctly different datasets, with support and confidence as parameters
      ● User perspective
          – Goal: to assess the effectiveness of our method in discovering constraints that are useful for quality control
          – Form: a structured user evaluation with knowledge-management experts in a real-world knowledge validation use case
  • 25. Evaluation – Algorithmic Perspective
      ● Strong positive correlation between the number of discovered constraints and the chosen support and confidence values (Table 3)
      ● The number of relations (cf. dataset size) is likely the main contributor to the number of discovered constraints
      ● The positive correlation between the numbers of pruned and discovered constraints suggests that the pruning strategy is, to an extent, effective (Table 3)
  • 26. Evaluation – User Perspective
      ● Structured user evaluation
          – Workshop hosted at Rijkswaterstaat, The Netherlands
          – Domain of asset management and civil engineering
          – 21 participants, all experts on knowledge maintenance and validation
          – Participants were asked to assess constraints on usefulness and graininess. Constraints were divided into 3x4 groups of increasing complexity (unbeknownst to participants)
      [Figure: Rijkswaterstaat, the Netherlands]
  • 27. Evaluation – User Perspective
      ● More than half of the participants thought the complexity of the discovered constraints was well balanced (Table 8)
      ● There is little difference in scores between the three complexity groups, suggesting no interaction or too little difference between groups (Table 9)
      ● Overall fair to moderate agreement on usefulness between participants, but significant differences in agreement between complexity groups (Table 9)
      ● Neutral to agreeable stance with respect to the overall usefulness of our method, but a considerable portion was unsure, likely due to lack of familiarity with the domain (Table 10)
  • 28. Conclusion
      ● Context-aware constraints are, to an extent, useful for knowledge validation tasks and, for the most part, well balanced with respect to complexity
      ● There is no direct relationship between the dimensions of a graph and the number of discovered constraints. This makes it difficult to apply a rule of thumb to the support and confidence values
      ● Scalability remains a practical challenge, but is partly alleviated by our pruning and optimization strategies, and by parallelizing the task
      ● Analysis of our algorithm's time complexity fell outside the current scope, and should be investigated in future work
  • 29. Thank You
      ● Slides available at tinyurl.com/yyzr5876
      ● Code available at gitlab.com/wxwilcke/cckg
      ● Data available at gitlab.com/wxwilcke/mmkg