As knowledge graphs are increasingly adopted, the question of how to maintain the validity and accuracy of our knowledge becomes ever more relevant. We introduce context-aware constraints as a means to help preserve knowledge integrity. Context-aware constraints offer more fine-grained control over the domain onto which we impose restrictions. We also introduce a bottom-up anytime algorithm to discover context-aware constraints directly from heterogeneous knowledge graphs, that is, graphs made up of entities and literals of various (data) types which are linked using various relations. Our method is embarrassingly parallel and can exploit prior knowledge in the form of schemas to reduce computation time. We demonstrate our method on three different datasets and evaluate its effectiveness by letting experts on knowledge validation and management assess candidate constraints in a real-world knowledge validation use case. Our results show that, overall, context-aware constraints are to an extent useful for knowledge validation tasks, and that the majority of the generated constraints are well balanced with respect to complexity. These slides were presented at KDIR 2020.
Bottom-up Discovery of Context-aware Quality Constraints for Heterogeneous Knowledge Graphs
Xander Wilcke (1), Maurice de Kleijn (2), Victor de Boer (1), Henk Scholten (2), Frank van Harmelen (1)
(1) Dept. of Computer Science
(2) Dept. of Spatial Economics
Vrije Universiteit Amsterdam, The Netherlands
KDIR 2020
Overview
1. Quality control - why context matters
2. Defining context-aware constraints
3. Discovering context-aware constraints
4. A two-fold evaluation, from an algorithmic perspective and a user perspective
Knowledge as a graph
● Knowledge graphs are increasingly being adopted
– Institutes, museums, tech giants, businesses, …
– Knowledge quality is no longer optional
How to maintain the quality of the knowledge across its entire life cycle?
Quality Control
● A key component is the quality constraint
– Helps guard the consistency, accuracy, precision, etc.
– Constraint languages for knowledge graphs
  ● SHACL
  ● ShEx
Figure from “Jose E. Labra Gayo et al. (2018) Validating RDF Data, Synthesis Lectures on the Semantic Web: Theory and Technology, Vol. 7, No. 1, 1-328”
Quality Control for Knowledge Graphs
● Existing constraint languages work on the schema level, e.g.
– All nodes of a certain type:
– All source / destination nodes of a certain relation:
[Figure: graph fragment with the labels Bridge, type, material, and ??]
Knowledge graphs allow for context-level constraints
Example
[Figure: example knowledge graph with nodes Bridge, River, Steel, WMA, Road, and Highway, literals "0.05" and "21.5", and edges labelled type, material, crosses, salinity, max_load, and function]
Contributions
1. We introduce context-aware constraints, which
● offer more fine-grained control of the domains onto which to impose restrictions
● apply to domains defined by graph motifs (contextual patterns)
● allow for multimodal pattern fragments (numbers, dates, texts, ...)
2. We also introduce an (embarrassingly parallel) bottom-up anytime algorithm to discover context-aware constraints in heterogeneous knowledge graphs
3. We evaluate 1 and 2 in a user study with experts in a real-world knowledge validation use case
Knowledge Graphs
● Graph-shaped knowledge bases
● Assertions are encoded as edges between nodes
● Nodes can be
– Entities: things, concepts, etc.
– Literals: strings, numbers, dates, etc.
● Context gives entities their meaning
[Figure: example knowledge graph with nodes Bridge, River, Steel, WMA, Road, and Highway, literals "0.05" and "21.5", and edges labelled type, material, crosses, salinity, max_load, and function]
Defining Context-aware Constraints
A context-aware constraint states that every entity which satisfies the antecedent must also satisfy the consequent. Here, the antecedent and the consequent are assertion patterns.
An assertion pattern states that there exists a relation between any sets of nodes that match its pattern variables.
Defining Context-aware Constraints
Pattern variables can express
● Any specific node (entity or literal)
● All entities of a type t (object-type)
● All literals of a datatype dt (data-type)
● All literals which match a regular expression s (value-type)
Examples
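As a minimal sketch of the definitions above, the Python below models the four kinds of pattern variable and checks one constraint against a toy graph. The triples, the tuple encoding of literals, and every helper name (`specific`, `object_type`, `data_type`, `value_type`, `AssertionPattern`, `violations`) are assumptions made for illustration; none of them is taken from the authors' code.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical toy graph: (subject, relation, object) triples.
# Entities are plain strings; literals are (value, datatype) tuples.
TRIPLES = [
    ("bridge1", "type", "Bridge"),
    ("bridge1", "material", "Steel"),
    ("bridge1", "max_load", ("21.5", "decimal")),
    ("bridge2", "type", "Bridge"),
    ("bridge2", "material", "Steel"),
    ("bridge2", "max_load", ("heavy", "string")),
    ("road1", "type", "Road"),
    ("road1", "material", "WMA"),
]


def specific(node):
    """Pattern variable matching one specific node (entity or literal)."""
    return lambda n: n == node


def object_type(t, triples):
    """Pattern variable matching all entities of type t."""
    members = {s for s, p, o in triples if p == "type" and o == t}
    return lambda n: n in members


def data_type(dt):
    """Pattern variable matching all literals of datatype dt."""
    return lambda n: isinstance(n, tuple) and n[1] == dt


def value_type(regex):
    """Pattern variable matching all literals whose value fits a regex."""
    compiled = re.compile(regex)
    return lambda n: isinstance(n, tuple) and compiled.fullmatch(n[0]) is not None


@dataclass(frozen=True)
class AssertionPattern:
    """A `relation` edge links a node matching `subject` to one matching `obj`."""
    subject: Callable[[object], bool]
    relation: str
    obj: Callable[[object], bool]

    def holds_for(self, entity, triples):
        return self.subject(entity) and any(
            s == entity and p == self.relation and self.obj(o)
            for s, p, o in triples
        )


def violations(antecedent, consequent, triples):
    """Entities that satisfy the antecedent but not the consequent."""
    entities = sorted({s for s, _, _ in triples})
    return [e for e in entities
            if antecedent.holds_for(e, triples)
            and not consequent.holds_for(e, triples)]


is_bridge = object_type("Bridge", TRIPLES)
# "Every bridge made of steel must have a numeric max_load."
antecedent = AssertionPattern(is_bridge, "material", specific("Steel"))
consequent = AssertionPattern(is_bridge, "max_load", value_type(r"\d+(\.\d+)?"))
print(violations(antecedent, consequent, TRIPLES))
```

In this toy data only `bridge2` is reported, because its `max_load` literal does not fit the numeric pattern; `road1` falls outside the constraint's domain and is never tested.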
Discovering Context-aware Constraints
● Algorithm properties
– Bottom-up: learns constraints directly from any knowledge graph
– Anytime property: longer runs yield constraints with more fine-grained domains
– Embarrassingly parallel: newly discovered constraints form a new branch of which the children can be computed independently
● Algorithm assumptions
1) The large majority of the knowledge is valid and accurate
2) These two qualities can be captured using frequent pattern mining
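The embarrassingly parallel property listed above can be illustrated with a small sketch: each branch is expanded by a function that reads only its own state, so branches can be handed to a worker pool without coordination. The `EXTENSIONS` list and the functions `expand_branch` and `expand_forest` are hypothetical stand-ins, and the tuple encoding of a branch is an assumption for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical candidate extensions; a real run would derive these from the graph.
EXTENSIONS = ["crosses", "function", "max_load"]


def expand_branch(root):
    """Compute the children of one branch using only that branch's own state."""
    return [root + (ext,) for ext in EXTENSIONS if ext not in root]


def expand_forest(roots, workers=4):
    """Expand every branch concurrently; no branch reads another's results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with roots.
        return dict(zip(roots, pool.map(expand_branch, roots)))


roots = [("material",), ("crosses",)]
children = expand_forest(roots)
for root, kids in children.items():
    print(root, "->", kids)
```

Threads are used here only to keep the sketch short. For CPU-bound work in CPython, `ProcessPoolExecutor` offers the same `map` interface and would be the more natural choice.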
Discovering Context-aware Constraints
Main components
● The generation forest stores constraints in generation trees, and keeps track of progress per depth
● The explore-extend loop explores and tests increasingly complex constraints
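One way to picture the generation forest is as a mapping from a root key to a tree that groups its constraints by depth. The sketch below is an assumption about a possible layout, not the authors' data structure: the class names, the choice of an entity type as root key, and the string encoding of constraints are all hypothetical.

```python
from collections import defaultdict


class GenerationTree:
    """Stores the constraints of one tree, grouped by depth."""

    def __init__(self):
        self._by_depth = defaultdict(list)

    def add(self, constraint, depth):
        self._by_depth[depth].append(constraint)

    def at_depth(self, depth):
        # .get avoids creating an empty level as a side effect.
        return list(self._by_depth.get(depth, []))

    def height(self):
        return max(self._by_depth) + 1 if self._by_depth else 0


class GenerationForest:
    """One tree per root key (here, hypothetically, an entity type)."""

    def __init__(self):
        self._trees = defaultdict(GenerationTree)

    def add(self, root, constraint, depth):
        self._trees[root].add(constraint, depth)

    def tree(self, root):
        return self._trees[root]

    def progress(self):
        """Number of stored constraints per depth, summed over all trees."""
        counts = defaultdict(int)
        for tree in self._trees.values():
            for depth in range(tree.height()):
                counts[depth] += len(tree.at_depth(depth))
        return dict(counts)


forest = GenerationForest()
forest.add("Bridge", "Bridge -> has material", 0)
forest.add("Bridge", "Bridge crossing River -> material Steel", 1)
forest.add("Road", "Road -> has function", 0)
print(forest.progress())
```

Grouping by depth makes it cheap to ask which level the next extension round should read from.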
Discovering Context-aware Constraints
1. Generate all constraints of depth 0 that exceed the minimal support and confidence
[Slide lists seven depth-0 constraint forms, 1) to 7); the formulas were not preserved]
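The thresholding in this step can be sketched as follows. The slides do not define support and confidence, so the code assumes the usual association-rule reading: support is the number of entities in the constraint's domain, and confidence is the fraction of those that also satisfy the consequent. The single constraint form used here ("every entity of type T has relation r"), the toy triples, and the function names are likewise hypothetical.

```python
# Hypothetical toy graph as (subject, relation, object) triples.
TRIPLES = [
    ("b1", "type", "Bridge"), ("b1", "material", "Steel"), ("b1", "max_load", "21.5"),
    ("b2", "type", "Bridge"), ("b2", "material", "Steel"), ("b2", "max_load", "18.0"),
    ("b3", "type", "Bridge"), ("b3", "material", "Steel"),
    ("r1", "type", "Road"), ("r1", "function", "Highway"),
]


def support_and_confidence(triples, entity_type, relation):
    """Assumed definitions: support = domain size, confidence = satisfied fraction."""
    typed = {s for s, p, o in triples if p == "type" and o == entity_type}
    with_relation = {s for s, p, _ in triples if p == relation and s in typed}
    support = len(typed)
    confidence = len(with_relation) / support if support else 0.0
    return support, confidence


def depth_zero(triples, min_support, min_confidence):
    """Keep the candidates 'type T -> has relation r' that meet both thresholds."""
    types = sorted({o for _, p, o in triples if p == "type"})
    relations = sorted({p for _, p, _ in triples if p != "type"})
    kept = []
    for t in types:
        for r in relations:
            support, confidence = support_and_confidence(triples, t, r)
            if support >= min_support and confidence >= min_confidence:
                kept.append((t, r, support, confidence))
    return kept


print(depth_zero(TRIPLES, min_support=2, min_confidence=0.9))
```

With these thresholds only "Bridge has material" survives: "Bridge has max_load" holds for two of three bridges, and the Road candidates fail on support.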
Discovering Context-aware Constraints
2. For all constraints of depth d: test all diagonal combinations of candidate endpoints and extensions, and add them to depth d+1 if they meet the minimal support and confidence requirements
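A heavily simplified stand-in for this step is sketched below. It does not reproduce the diagonal combination of endpoints and extensions; instead, it extends each kept antecedent by one extra (relation, value) pair and keeps the result if it meets the thresholds. Support and confidence use assumed association-rule definitions, the data is a hypothetical single-valued toy set, and `evaluate` and `explore_extend` are illustrative names. Writing the loop as a generator gives the anytime behaviour: a caller can stop consuming results at any point.

```python
from itertools import islice

# Hypothetical single-valued toy data: entity -> {relation: value}.
ENTITIES = {
    "b1": {"type": "Bridge", "crosses": "River", "material": "Steel"},
    "b2": {"type": "Bridge", "crosses": "River", "material": "Steel"},
    "b3": {"type": "Bridge", "crosses": "Road", "material": "WMA"},
    "b4": {"type": "Bridge", "crosses": "Road", "material": "Steel"},
}


def evaluate(entities, antecedent, consequent):
    """Return (support, confidence) under the assumed definitions."""
    domain = [a for a in entities.values()
              if all(a.get(r) == v for r, v in antecedent)]
    hits = sum(1 for a in domain if a.get(consequent[0]) == consequent[1])
    return len(domain), (hits / len(domain) if domain else 0.0)


def explore_extend(entities, consequent, min_support, min_confidence, max_depth):
    """Yield (depth, antecedent, support, confidence), shallowest first."""
    pairs = sorted({(r, v) for a in entities.values() for r, v in a.items()
                    if r != consequent[0]})
    frontier = [frozenset([p]) for p in pairs]
    seen = set()
    for depth in range(max_depth + 1):
        next_frontier = []
        for antecedent in frontier:
            if antecedent in seen:
                continue
            seen.add(antecedent)
            support, confidence = evaluate(entities, antecedent, consequent)
            if support >= min_support and confidence >= min_confidence:
                yield depth, antecedent, support, confidence
                # Extend only kept constraints, one unused relation at a time.
                used = {r for r, _ in antecedent}
                next_frontier.extend(
                    antecedent | {p} for p in pairs if p[0] not in used)
        frontier = next_frontier


target = ("material", "Steel")
found = list(explore_extend(ENTITIES, target, 2, 0.9, max_depth=2))
# Anytime use: take only the first result and stop.
first_only = list(islice(explore_extend(ENTITIES, target, 2, 0.9, 2), 1))
for depth, antecedent, support, confidence in found:
    print(depth, sorted(antecedent), support, confidence)
```

In the toy data, "type Bridge" alone gives a confidence of 0.75 for steel, while the narrower context "crosses River" reaches 1.0, which is the kind of domain refinement the constraints are meant to capture.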
Experiments & Evaluation
● Algorithmic perspective
– Goal: to determine the trade-off between the chosen support and confidence, and the constraints they yield
– Form: grid search on 3 distinctly different datasets with support and confidence as parameters
● User perspective
– Goal: to assess the effectiveness of our method in discovering constraints that are useful for quality control
– Form: a structured user evaluation with knowledge-management experts in a real-world knowledge validation use case
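The grid search over support and confidence can be sketched as a loop over every pair of threshold values, counting how many candidate constraints survive each pair. The candidate statistics below are made-up numbers for illustration, not results from the study, and `grid_search` is a hypothetical helper. In a real run, each grid point would rerun the discovery algorithm rather than filter a fixed list.

```python
from itertools import product

# Made-up (support, confidence) statistics for five candidate constraints.
CANDIDATES = [(3, 1.0), (3, 0.67), (10, 0.9), (25, 0.5), (40, 0.95)]


def grid_search(candidates, supports, confidences):
    """Map each (min_support, min_confidence) pair to the number of survivors."""
    return {
        (s, c): sum(1 for cs, cc in candidates if cs >= s and cc >= c)
        for s, c in product(supports, confidences)
    }


grid = grid_search(CANDIDATES, supports=[1, 10], confidences=[0.5, 0.9])
for (min_support, min_confidence), count in sorted(grid.items()):
    print(min_support, min_confidence, count)
```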
Evaluation – Algorithmic Perspective
● Strong positive correlation between the number of discovered constraints and the chosen support and confidence values (Table 3)
● Number of relations (cf. dataset size) is likely the main contributor to the number of discovered constraints
● Positive correlation between the number of pruned and discovered constraints suggests that the pruning strategy is, to an extent, effective (Table 3)
Evaluation – User Perspective
● Structured User Evaluation
– Workshop hosted at Rijkswaterstaat, The Netherlands
– Domain of asset management and civil engineering
– 21 participants, all experts on knowledge maintenance and validation
– Asked to assess constraints on usefulness and graininess. Constraints are divided into 3x4 groups of increasing complexity (unbeknownst to participants)
[Photo: Rijkswaterstaat, the Netherlands]
Evaluation – User Perspective
● More than half of the participants thought the complexity of the discovered constraints was well balanced (Table 8)
● There is little difference in scores between the three complexity groups, suggesting no interaction or too little difference between groups (Table 9)
● Overall fair to moderate agreement on usefulness between participants, but significant differences in agreement between complexity groups (Table 9)
● Neutral to agreeable stance with respect to the overall usefulness of our method, but a considerable portion was unsure, likely due to lack of familiarity with the domain (Table 10)
Conclusion
● Context-aware constraints are, to an extent, useful for knowledge validation tasks, and, for the most part, well balanced with respect to complexity
● No direct relationship between the dimensions of a graph and the number of discovered constraints. This makes it difficult to apply a rule of thumb to the support and confidence values
● Scalability remains a practical challenge, but is partly alleviated by our pruning and optimization strategies, and by parallelizing the task
● Analysis of our algorithm’s time complexity fell outside the scope of this work, and should be investigated in future work
Thank You
● Slides available at tinyurl.com/yyzr5876
● Code available at gitlab.com/wxwilcke/cckg
● Data available at gitlab.com/wxwilcke/mmkg