www.moving-project.eu
TraininG towards a society of data-saVvy inforMation prOfessionals to enable open leadership INnovation
Till Blume and Ansgar Scherp
ZBW – Leibniz Information Centre for Economics
Christian-Albrechts-Universität zu Kiel
Towards Flexible Indices for
Distributed Graph Data:
The Formal Schema-level Index Model FLuID
May 23rd, 2018, 30th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken),
22.05.2018 - 25.05.2018, Wuppertal, Germany.
Why use a Schema-level Index?
[Figure: a user asks "I want more metadata! Where to get it from?"; an index with the schema elements bibo:Book, dct:creator, dct:subject, and foaf:Agent; a data source with the instance data "Towards a clean air policy" and "Great Britain. Central Electricity" (bibo:Book, foaf:Agent, dct:subject; URI-1, URI-2, URI-3).]
Problem:
• We are looking for a specific kind of metadata, e.g., about books.
• We do not know in which databases we can find such metadata.
• We need an index that can be queried to find matching databases.
Solution:
• A schema-level index (SLI) summarizes data by storing information about how the data is modelled in a specific database.
• We formulate a structural query to find matching databases.
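The idea of querying an index to find matching databases can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary used here (types and properties per subject), the function names `build_index` and `query`, the source names `db-a` and `db-b`, and `URI-7` are all assumptions made for the example.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

def build_index(sources):
    """Map each schema element to the data sources that contain it.

    `sources` maps a data source name to a list of (s, p, o) triples.
    A schema element here is the pair (types, properties) of one subject;
    this is only one possible summary, chosen for illustration.
    """
    index = defaultdict(set)
    for source, triples in sources.items():
        types, props = defaultdict(set), defaultdict(set)
        for s, p, o in triples:
            if p == RDF_TYPE:
                types[s].add(o)
            else:
                props[s].add(p)
        for s in set(types) | set(props):
            index[(frozenset(types[s]), frozenset(props[s]))].add(source)
    return index

def query(index, types, props):
    """Return the sources with a schema element covering the requested types and properties."""
    result = set()
    for (t, p), srcs in index.items():
        if set(types) <= t and set(props) <= p:
            result |= srcs
    return result

# Hypothetical data sources; the triples loosely follow the slide's example.
sources = {
    "db-a": [("URI-1", RDF_TYPE, "bibo:Book"),
             ("URI-1", "dct:creator", "URI-2"),
             ("URI-1", "dct:subject", "URI-3")],
    "db-b": [("URI-7", RDF_TYPE, "foaf:Agent")],
}
index = build_index(sources)
matches = query(index, {"bibo:Book"}, {"dct:creator"})
```

The index stores only schema information and source names, never the instance data itself, so a structural query is answered without touching the databases.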
Real World Application Scenario
[Figure: the index (bibo:Book, dct:creator, dct:subject, foaf:Agent), a data source with the instance data "Towards a clean air policy" and "Great Britain. Central Electricity" (bibo:Book, foaf:Agent, dct:subject; URI-1, URI-2, URI-3), and the MOVING platform.]
MOVING search scenario:
• The MOVING platform (http://platform.moving-project.eu) provides a search for bibliographic resources.
• We harvest bibliographic metadata using different SLIs.
• Such metadata is of great value since
  • we can obtain good search results solely relying on the title [3],
  • we can complement existing metadata, and
  • we can train machine learning models to further improve the search [4].
Real World Application Scenario
MOVING search scenario:
• Which SLIs are best suited to find bibliographic metadata in the Web of Data?
• Can we find semantically similar databases as well?
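[Figure: in addition to the index, the first data source, and the MOVING platform, a second data source with the instance data "Proceedings of the …" and "Benjamin Elizalde" (bibo:Proceedings, foaf:Agent, dct:subject; URI-6, URI-8, URI-9).]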
Motivation for FLuID
• All schema-level indices (SLIs) summarize data differently, for different purposes, and lack a common formalization [1,2,5,7-11], for example:
  • Consider incoming and outgoing properties (edges)
  • Consider properties (edge labels) and objects (target nodes)
  • Consider types
  • Consider types and properties
  • …
• Without a common ground, it is difficult to develop new indices and compare them to existing ones.
• Even for a single application scenario, a single SLI may not be sufficient since how the data is modelled can vary a lot [6].
The FLuID Model
Approach
• Abstract from the related work (bottom-up): Find generic, simple patterns in existing SLIs and use them as basic building blocks to define all (complex) schema structures that exist in previous SLIs.
• MOVING search scenario (top-down): Flexibly define indices that can reflect semantic information and can be efficiently computed.
Solution
1. We formalized our building blocks using equivalence relations over directed, edge-labeled multigraphs (RDF graphs).
2. We demonstrated how to model existing works and beyond.
3. We showed the scalability by conducting a complexity analysis.
The FLuID Model
• FLuID provides 7 schema elements:
  • 3 simple elements: Object Cluster (OC), Property Cluster (PC), and Property-Object Cluster (POC)
  • 3 undirected elements: u-OC, u-PC, and u-POC
  • 1 Complex Schema Element (CSE)
• FLuID provides 4 parameterizations:
  • Label parameterization
  • Chaining parameterization
  • Ontology parameterization
  • Instance parameterization
• In total, FLuID provides 11 building blocks sufficient to model all existing approaches and beyond.
FLuID: Equivalence Relation Approach
• Instances: edges <s,p,o> with the same subject node s, i.e.,
((i₁, p₁, o₁), (i₂, p₂, o₂)) ∈ I ⇔ i₁ = i₂.
• Each edge belongs to exactly one instance; a node does not necessarily.
• Since instances partition the data graph, any partition of the instances also partitions the data graph.
[Figure: example data graph with the nodes i1 to i10 and edges labeled p1, p2, and p3.]
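The instance relation above amounts to grouping edges by their subject node. A minimal sketch, assuming triples are plain `(s, p, o)` tuples; the edge list is hypothetical, since the edges of the example graph are not recoverable from the slide text.

```python
from collections import defaultdict

def instances(triples):
    """Group edges (s, p, o) by subject: one instance per subject node."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        grouped[s].append((s, p, o))
    return dict(grouped)

# Hypothetical edges using the node and label names of the example graph.
triples = [
    ("i1", "p1", "i2"),
    ("i1", "p2", "i3"),
    ("i4", "p1", "i2"),
    ("i5", "p3", "i6"),
]
inst = instances(triples)
```

Every edge lands in exactly one group, while the node `i2` appears as an object in the instances of both `i1` and `i4`.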
The FLuID Model
• Object Cluster: summarizes instances that share the same set of connected objects, i.e.,
([i₁]_I, [i₂]_I) ∈ OC ⇔ ∀(i₁, p₁, o₁) ∃(i₂, p₂, o₂) : o₁ = o₂ ∧ ∀(i₂, p₂, o₂) ∃(i₁, p₁, o₁) : o₁ = o₂
[Figure: example data graph with the nodes i1 to i10 and edges labeled p1, p2, and p3.]
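Read as code, the two quantified conditions say that two instances are equivalent exactly when their object sets coincide. A minimal sketch under that reading; the function name `object_clusters` and the edge list are assumptions for illustration.

```python
from collections import defaultdict

def object_clusters(triples):
    """Partition subjects by their set of connected objects (property labels are ignored)."""
    objects = defaultdict(set)
    for s, _p, o in triples:
        objects[s].add(o)
    clusters = defaultdict(set)
    for s, objs in objects.items():
        clusters[frozenset(objs)].add(s)
    return dict(clusters)

# Hypothetical edges: i1 and i4 reach the same objects via different properties.
triples = [
    ("i1", "p1", "i2"),
    ("i1", "p2", "i3"),
    ("i4", "p3", "i2"),
    ("i4", "p1", "i3"),
    ("i5", "p1", "i6"),
]
clusters = object_clusters(triples)
```

Using the frozen object set as a dictionary key makes each equivalence class a single hash lookup per instance.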
The FLuID Model
• Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is p1
[Figure: example data graph with the nodes i1 to i10 and edges labeled p1, p2, and p3.]
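A label parameterization can be sketched as a filter on the edges before the object sets are compared. The function name and edge list are hypothetical, and grouping instances without any matching edge under the empty set is an assumption of this sketch, not something the slides state.

```python
from collections import defaultdict

def label_parameterized_oc(triples, labels):
    """Partition subjects by their connected objects, using only edges whose label is in `labels`."""
    objects = defaultdict(set)
    for s, p, o in triples:
        objs = objects[s]  # register every subject, even without a matching edge
        if p in labels:
            objs.add(o)
    clusters = defaultdict(set)
    for s, objs in objects.items():
        clusters[frozenset(objs)].add(s)
    return dict(clusters)

# Hypothetical edges: i1 and i4 differ on p2 but agree on p1.
triples = [
    ("i1", "p1", "i2"),
    ("i1", "p2", "i3"),
    ("i4", "p1", "i2"),
    ("i4", "p2", "i5"),
    ("i6", "p2", "i3"),
]
p1_clusters = label_parameterized_oc(triples, {"p1"})
```

Passing `{"rdf:type"}` as the label set would group instances by their types in the same way.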
The FLuID Model
• Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is rdf:type
[Figure: example data graph with the nodes i1 to i10, edges labeled p2, p3, and rdf:type, and the types bibo:Book, foaf:Agent, and bibo:Proceedings.]
The FLuID Model
• Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is rdf:type
• Ontology parameterization: RDFS Schema Graph
[Figure: example data graph with the nodes i1 to i10, edges labeled p2, p3, and rdf:type, and the types bibo:Book, foaf:Agent, and bibo:Proceedings.]
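One way to picture the ontology parameterization is to add the types entailed by the schema graph before clustering. The sketch below handles only rdfs:subClassOf; the function names, `URI-6` as a typed node, and the subclass axiom between bibo:Proceedings and bibo:Book are assumptions for illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def superclasses(schema_triples):
    """Map each class to all of its (transitive) superclasses."""
    parents = defaultdict(set)
    for s, p, o in schema_triples:
        if p == SUBCLASS_OF:
            parents[s].add(o)

    def closure(cls, seen):
        for parent in parents.get(cls, ()):
            if parent not in seen:  # guards against cycles in the hierarchy
                seen.add(parent)
                closure(parent, seen)
        return seen

    return {cls: closure(cls, set()) for cls in list(parents)}

def infer_types(triples, schema_triples):
    """Add the rdf:type edges entailed by rdfs:subClassOf; all other edges pass through."""
    supers = superclasses(schema_triples)
    result = set(triples)
    for s, p, o in triples:
        if p == RDF_TYPE:
            for sup in supers.get(o, ()):
                result.add((s, RDF_TYPE, sup))
    return result

# Assumed schema axiom and data triple, for illustration only.
schema = [("bibo:Proceedings", SUBCLASS_OF, "bibo:Book")]
data = [("URI-6", RDF_TYPE, "bibo:Proceedings")]
inferred = infer_types(data, schema)
```

After this step, a type-based cluster for bibo:Book would also cover instances that are only typed as bibo:Proceedings.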
The FLuID Model
• Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is rdf:type
• Ontology parameterization: RDFS Schema Graph
• Instance parameterization: owl:sameAs
[Figure: example data graph with the nodes i1 to i10, edges labeled dct:creator, rdf:type, and owl:sameAs, and the types bibo:Book, foaf:Agent, and bibo:Proceedings.]
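The instance parameterization can be pictured as merging instances that are linked by owl:sameAs before any schema element is computed. The following is a minimal union-find sketch, not the authors' implementation; it rewrites subjects only, and the rdfs:label triple is a hypothetical addition to the example data.

```python
SAME_AS = "owl:sameAs"

def merge_same_as(triples):
    """Merge instances connected by owl:sameAs by rewriting subjects to one representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # The lexicographically smaller identifier becomes the representative.
            parent[max(ra, rb)] = min(ra, rb)

    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)
    return {(find(s), p, o) for s, p, o in triples if p != SAME_AS}

# Hypothetical triples loosely following the slide's example.
triples = [
    ("URI-5", "rdf:type", "foaf:Agent"),
    ("URI-4", SAME_AS, "URI-5"),
    ("URI-4", "rdfs:label", "Pierre Prader"),
]
merged = merge_same_as(triples)
```

After merging, the edges of `URI-4` and `URI-5` belong to a single instance, so the type information of one is available for the other.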
A Semantic Schema-level Index
[Figure: the index (bibo:Book, dct:creator, dct:subject, foaf:Agent) and three data sources: "Proceedings of the …" with "Benjamin Elizalde" (bibo:Proceedings, foaf:Agent, dct:subject; URI-6, URI-8, URI-9); "Towards a clean air policy" with "Great Britain. Central Electricity" (bibo:Book, foaf:Agent, dct:subject; URI-1, URI-2, URI-3); and "Family planning programmes in Africa" with "Pierre Prader" (bibo:Book, foaf:Agent, dct:creator, dct:subject, owl:sameAs; URI-0, URI-3, URI-4, URI-5).]
Evaluation
• Complexity Analysis
  • We show that every SLI modeled with FLuID can be computed in O(n), where n is the number of triples.
  • Threat: the on-the-fly inferencing. If the number of RDFS triples depended linearly on the dataset size, we would have quadratic complexity.
• Empirical evaluation to estimate the impact of inferencing
  • We analyzed two real-world datasets from the Web of Data:
    • TimBL-11M: 11 million triples (edges) crawled from one seed URI.
    • DyLDO-127M: 127 million triples (edges) crawled from 95,000 seed URIs.
  • Practical impact of the on-the-fly inferencing: g < 1.001.
  • Thus, we did not find a linear dependency but rather a constant factor.
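The threat described above can be written down as a rough cost model. This is a sketch under a stated assumption, not the analysis from the paper: it assumes that each of the n data triples is checked against at most r RDFS triples during on-the-fly inferencing, and the symbols T and r are introduced here only for illustration.

```latex
% n: number of triples in the dataset
% r: number of RDFS triples consulted during inferencing (assumed symbol)
T(n) \in O(n \cdot r), \qquad
r \in O(1) \;\Rightarrow\; T(n) \in O(n), \qquad
r \in \Theta(n) \;\Rightarrow\; T(n) \in O(n^2)
```

Under this model, the overall computation stays linear as long as the schema information grows by no more than a constant factor with the dataset.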
Conclusion & Outlook
• Conclusion
  • We have presented the novel, parameterized schema-level index model FLuID, which is sufficient to express the functionalities of existing SLIs and beyond.
  • We showed that the build-time and space complexity of any SLI developed with FLuID scales linearly with respect to the number of triples indexed.
• Outlook
  • Implementing FLuID in a single computation and query framework
    • https://github.com/t-blume/fluid-framework
    • http://lodatio.informatik.uni-kiel.de/
  • Qualitatively comparing existing and new approaches.
Thank you for your attention!
Any questions?
Project consortium and funding agency
MOVING is funded by the EU Horizon 2020 Programme under the project number INSO-4-2015: 693092
References
1. F. Benedetti, S. Bergamaschi, and L. Po. Exposing the underlying schema of LOD sources. In Joint IEEE/WIC/ACM WI and IAT, 2015.
2. M. Ciglan, K. Nørvåg, and L. Hluchý. The SemSets model for ad-hoc semantic list search. In WWW, 2012.
3. L. Galke, F. Mai, A. Schelten, D. Brunsch, and A. Scherp. Using titles vs. full-text as source for automated semantic document annotation. In K-CAP, 2017.
4. L. Galke, A. Saleh, and A. Scherp. Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical Information Retrieval. In INFORMATIK, 2017.
5. R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In VLDB, 1997.
6. J. Jett, T. Nurmikko-Fuller, T.W. Cole, K.R. Page, and J.S. Downie. Enhancing scholarly use of digital libraries: A comparative survey and review of bibliographic metadata ontologies. In JCDL, 2016.
7. M. Konrath, T. Gottron, S. Staab, and A. Scherp. SchemEX - efficient construction of a data catalogue by stream-based indexing of Linked Data. J. Web Sem., 16:52–58, 2012.
8. J. McHugh, S. Abiteboul, R. Goldman, D. Quass, and J. Widom. Lore: a database management system for semistructured data. SIGMOD Record, 26(3):54–66, 1997.
9. T. Neumann and G. Moerkotte. Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins. In ICDE, 2011.
10. J. Schaible, T. Gottron, and A. Scherp. TermPicker: Enabling the reuse of vocabulary terms by exploiting data from the Linked Open Data cloud. In ESWC, 2016.
11. B. Spahiu, R. Porrini, M. Palmonari, A. Rula, and A. Maurino. ABSTAT: ontology-driven Linked Data summaries with pattern minimalization. In ESWC Satellite Events, Revised Selected Papers, 2016.
Search Engine Prototype: LODatio+
http://lodatio.informatik.uni-kiel.de
Real World Application Scenario
http://platform.moving-project.eu

call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software Engineering
 
Anne Frank A Beacon of Hope amidst darkness ppt.pptx
Anne Frank A Beacon of Hope amidst darkness ppt.pptxAnne Frank A Beacon of Hope amidst darkness ppt.pptx
Anne Frank A Beacon of Hope amidst darkness ppt.pptx
 
James Joyce, Dubliners and Ulysses.ppt !
James Joyce, Dubliners and Ulysses.ppt !James Joyce, Dubliners and Ulysses.ppt !
James Joyce, Dubliners and Ulysses.ppt !
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
 
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptx
 
Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸
 

Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level Index Model FLuID

  • 1. www.moving-project.eu. TraininG towards a society of data-saVvy inforMation prOfessionals to enable open leadership INnovation. Till Blume and Ansgar Scherp, ZBW – Leibniz Information Centre for Economics, Christian-Albrechts-Universität zu Kiel. Towards Flexible Indices for Distributed Graph Data: The Formal Schema-level Index Model FLuID. May 23rd, 2018, 30th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 22.05.2018 - 25.05.2018, Wuppertal, Germany.
  • 2. Why use a Schema-level Index?
      [Figure: a user who wants more metadata and does not know where to get it poses a query over foaf:Agent, dct:subject, bibo:Book, and dct:creator to an index (1), which leads to matching data (2) with the labels "Towards a clean air policy", "Great Britain. Central Electricity", foaf:Agent, bibo:Book, dct:subject, and the nodes URI-1, URI-2, URI-3.]
      Problem:
      • We are looking for a specific kind of metadata, e.g., about books.
      • We do not know in which databases we can find such metadata.
      • We need an index that can be queried to find matching databases.
      Solution:
      • A schema-level index (SLI) summarizes data by storing information about how the data is modelled in a specific database.
      • We formulate a structural query to find matching databases.
  • 3. Real World Application Scenario
      [Figure: the MOVING platform queries the index (1) over foaf:Agent, dct:subject, bibo:Book, and dct:creator to reach matching data (2), as on the previous slide.]
      MOVING search scenario:
      • The MOVING platform (http://platform.moving-project.eu) provides a search for bibliographic resources.
      • We harvest bibliographic metadata using different SLIs.
      • Such metadata is of great value since:
        • We can obtain good search results solely relying on the title [3].
        • We can complement existing metadata.
        • We can train machine learning models to further improve the search [4].
  • 4. Real World Application Scenario (continued)
      [Figure: as on the previous slide, extended by a third data source (3) with the labels "Proceedings of the …", "Benjamin Elizalde", foaf:Agent, bibo:Proceedings, dct:subject, and the nodes URI-6, URI-8, URI-9.]
      MOVING search scenario:
      • Which SLIs are best suited to find bibliographic metadata in the Web of Data?
      • Can we find semantically similar databases as well?
  • 5. Motivation for FLuID
      • All schema-level indices (SLIs) summarize data differently, for different purposes, and lack a common formalization [1,2,5,7-11]. For example, they may:
        • Consider incoming and outgoing properties (edges)
        • Consider properties (edge labels) and objects (target nodes)
        • Consider types
        • Consider types and properties
        • …
      • Without a common ground, it is difficult to develop new indices and compare them to existing ones.
      • Even for a single application scenario, a single SLI may not be sufficient, since how the data is modelled can vary a lot [6].
  • 6. The FLuID Model
      Approach:
      • Abstract from the related work (bottom-up): Find generic, simple patterns in existing SLIs and use them as basic building blocks to define all (complex) schema structures that exist in previous SLIs.
      • MOVING search scenario (top-down): Flexibly define indices that can reflect semantic information and can be computed efficiently.
      Solution:
      1. We formalized our building blocks using equivalence relations over directed, edge-labeled multigraphs (RDF graphs).
      2. We demonstrated how to model existing works and beyond.
      3. We showed the scalability by conducting a complexity analysis.
  • 7. The FLuID Model
      • FLuID provides 7 schema elements:
        • 3 simple elements: Object Cluster (OC), Property Cluster (PC), and Property-Object Cluster (POC)
        • 3 undirected elements: u-OC, u-PC, and u-POC
        • 1 Complex Schema Element (CSE)
      • FLuID provides 4 parameterizations:
        • Label parameterization
        • Chaining parameterization
        • Ontology parameterization
        • Instance parameterization
      • In total, FLuID provides 11 building blocks, sufficient to model all existing approaches and beyond.
  • 8. FLuID: Equivalence Relation Approach
      • Instances: edges <s,p,o> with the same subject node s, i.e., $((i_1, p_1, o_1), (i_2, p_2, o_2)) \in I \Leftrightarrow i_1 = i_2$.
      • Edges belong to exactly one instance; nodes do not necessarily.
      • Since instances partition the data graph, a set of instances also partitions the data graph.
      [Figure: example data graph with nodes i1 to i10 and edges labeled p1, p2, and p3.]
  • 9. The FLuID Model
      • Object Cluster: summarizes instances that share a set of connected objects, i.e., $([i_1]_I, [i_2]_I) \in OC \Leftrightarrow \forall (i_1, p_1, o_1)\, \exists (i_2, p_2, o_2) : o_1 = o_2 \;\land\; \forall (i_2, p_2, o_2)\, \exists (i_1, p_1, o_1) : o_1 = o_2$.
      [Figure: example data graph with nodes i1 to i10 and edges labeled p1, p2, and p3.]
  • 10. The FLuID Model
      • Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is p1.
      [Figure: example data graph with nodes i1 to i10 and edges labeled p1, p2, and p3.]
  • 11. The FLuID Model
      • Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is rdf:type.
      [Figure: example data graph with nodes i1 to i10, edges labeled p2, p3, and rdf:type, and the type nodes bibo:Book, foaf:Agent, and bibo:Proceedings.]
  • 12. The FLuID Model
      • Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is rdf:type.
      • Ontology parameterization: RDFS Schema Graph
      [Figure: example data graph with nodes i1 to i10, edges labeled p2, p3, and rdf:type, and the type nodes bibo:Book, foaf:Agent, and bibo:Proceedings.]
  • 13. The FLuID Model
      • Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is rdf:type.
      • Ontology parameterization: RDFS Schema Graph
      • Instance parameterization: owl:sameAs
      [Figure: example data graph with nodes i1 to i10, edges labeled dct:creator, rdf:type, and owl:sameAs, and the type nodes bibo:Book, foaf:Agent, and bibo:Proceedings.]
  • 14. A Semantic Schema-level Index
      [Figure: the index (1) over foaf:Agent, dct:subject, bibo:Book, and dct:creator leads to data (2) from several sources. Node labels include "Proceedings of the …", "Benjamin Elizalde", "Towards a clean air policy", "Great Britain. Central Electricity", "Family planning programmes in Africa", and "Pierre Prader", the types foaf:Agent, bibo:Book, and bibo:Proceedings, and the nodes URI-0 to URI-6, URI-8, and URI-9. Edge labels include dct:creator, dct:subject, and owl:sameAs.]
  • 15. Evaluation
      • Complexity analysis
        • We show that every SLI modeled with FLuID can be computed in O(n).
        • Threat: the on-the-fly inferencing. If the number of RDFS triples grew linearly with the dataset size, the complexity would be quadratic.
      • Empirical evaluation to estimate the impact of inferencing
        • We analyzed two real-world datasets from the Web of Data.
        • TimBL-11M: 11 million triples (edges) crawled from one seed URI.
        • DyLDO-127M: 127 million triples (edges) crawled from 95,000 seed URIs.
        • Practical impact of the on-the-fly inferencing: g < 1.001.
        • Thus, we did not find a linear dependency but rather a constant factor.
  • 16. Conclusion & Outlook
      • Conclusion
        • We have presented the novel, parameterized schema-level index model FLuID, which is sufficient to express the functionalities of existing SLIs and beyond.
        • We showed that the build time and space complexity of any SLI developed with FLuID scale linearly with the number of triples indexed.
      • Outlook
        • Implementing FLuID in a single computation and query framework
          • https://github.com/t-blume/fluid-framework
          • http://lodatio.informatik.uni-kiel.de/
        • Qualitatively comparing existing and new approaches.
  • 17. Thank you for your attention! Any questions?
      Project consortium and funding agency: MOVING is funded by the EU Horizon 2020 Programme under the project number INSO-4-2015: 693092.
  • 18. References
      [1] F. Benedetti, S. Bergamaschi, and L. Po. Exposing the underlying schema of LOD sources. In Joint IEEE/WIC/ACM WI and IAT, 2015.
      [2] M. Ciglan, K. Nørvåg, and L. Hluchý. The SemSets model for ad-hoc semantic list search. In WWW, 2012.
      [3] L. Galke, F. Mai, A. Schelten, D. Brunsch, and A. Scherp. Using titles vs. full-text as source for automated semantic document annotation. In K-CAP, 2017.
      [4] L. Galke, A. Saleh, and A. Scherp. Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical Information Retrieval. In INFORMATIK, 2017.
      [5] R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In VLDB, 1997.
      [6] J. Jett, T. Nurmikko-Fuller, T.W. Cole, K.R. Page, and J.S. Downie. Enhancing scholarly use of digital libraries: A comparative survey and review of bibliographic metadata ontologies. In JCDL, 2016.
      [7] M. Konrath, T. Gottron, S. Staab, and A. Scherp. SchemEX - efficient construction of a data catalogue by stream-based indexing of Linked Data. J. Web Sem., 16:52–58, 2012.
      [8] J. McHugh, S. Abiteboul, R. Goldman, D. Quass, and J. Widom. Lore: a database management system for semistructured data. SIGMOD Record, 26(3):54–66, 1997.
      [9] T. Neumann and G. Moerkotte. Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins. In ICDE, 2011.
      [10] J. Schaible, T. Gottron, and A. Scherp. TermPicker: Enabling the reuse of vocabulary terms by exploiting data from the Linked Open Data cloud. In ESWC, 2016.
      [11] B. Spahiu, R. Porrini, M. Palmonari, A. Rula, and A. Maurino. ABSTAT: ontology-driven Linked Data summaries with pattern minimalization. In ESWC Satellite Events, Revised Selected Papers, 2016.
  • 19. Search Engine Prototype: LODatio+ (http://lodatio.informatik.uni-kiel.de)
  • 20. Real World Application Scenario (http://platform.moving-project.eu)
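
The instance and Object Cluster definitions on slides 8 to 11 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy triples, the function names `instances` and `object_clusters`, and the `labels` argument are assumptions made for illustration. The `labels` argument is a simplified reading of the label parameterization (only edges with the given properties are considered).

```python
from collections import defaultdict

# Hypothetical toy triples (subject, property, object); not the graph from the slides.
TRIPLES = [
    ("book1", "rdf:type", "bibo:Book"),
    ("book1", "dct:creator", "agent1"),
    ("book2", "rdf:type", "bibo:Book"),
    ("book2", "dct:creator", "agent2"),
    ("agent1", "rdf:type", "foaf:Agent"),
    ("agent2", "rdf:type", "foaf:Agent"),
]


def instances(triples):
    """Group edges by subject node, so every edge lands in exactly one instance."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[s].append((s, p, o))
    return dict(groups)


def object_clusters(triples, labels=None):
    """Group instances whose sets of connected objects are equal.

    If `labels` is given, only edges whose property is in `labels` count
    (an assumed, simplified form of the label parameterization).
    Instances without any matching edge share the empty object set.
    """
    clusters = defaultdict(set)
    for subject, edges in instances(triples).items():
        objects = frozenset(o for _, p, o in edges if labels is None or p in labels)
        clusters[objects].add(subject)
    return dict(clusters)


if __name__ == "__main__":
    print(object_clusters(TRIPLES))
    print(object_clusters(TRIPLES, labels={"rdf:type"}))
```

Without a label restriction, the two books fall into different clusters because their creators differ. Restricting to rdf:type groups them by their type sets, which is the kind of summary a type-based SLI stores.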

Editor's Notes

  1. Structural query = a query without instance information such as the title or the author. The index is computed from the data; for example, we crawl open databases on the web.
  2. Structural query = a query without instance information such as the title or the author. The index is computed from the data; for example, we crawl open databases on the web.
  3. Structural query = a query without instance information such as the title or the author. The index is computed from the data; for example, we crawl open databases on the web.
  4. Colors indicate partitions on the data graph
  5. P1 = rdf:type, P2 = dct:creator, P3 = owl:sameAs, P4 = rdfs:subClassOf
  6. P1 = rdf:type, P2 = dct:creator, P3 = owl:sameAs, P4 = rdfs:subClassOf
  7. P1 = rdf:type, P2 = dct:creator, P3 = owl:sameAs, P4 = rdfs:subClassOf
  8. P1 = rdf:type, P2 = dct:creator, P3 = owl:sameAs, P4 = rdfs:subClassOf
  9. P1 = rdf:type, P2 = dct:creator, P3 = owl:sameAs, P4 = rdfs:subClassOf
  10. Build time is important for the computation of the index. Index size influences the query time.
  11. Structural query = a query without instance information such as the title or the author. The index is computed from the data; for example, we crawl open databases on the web.
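
To make the notion of a structural query concrete, the following Python sketch builds a small index that maps schema elements to data sources and answers a query that names only types and properties, with no instance information. It is a sketch under stated assumptions, not the FLuID framework or LODatio+: the quads, the source names `db-a` and `db-b`, the `ex:` identifiers, and the functions `build_index` and `structural_query` are all hypothetical. The schema element used here (a set of types plus a set of properties per subject) is one of the summarization styles listed on slide 5.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical quads (subject, property, object, source database).
QUADS = [
    ("ex:book1", RDF_TYPE, "bibo:Book", "db-a"),
    ("ex:book1", "dct:creator", "ex:agent1", "db-a"),
    ("ex:book1", "dct:subject", "ex:topic1", "db-a"),
    ("ex:proc1", RDF_TYPE, "bibo:Proceedings", "db-b"),
    ("ex:proc1", "dct:subject", "ex:topic2", "db-b"),
]


def build_index(quads):
    """Map each (types, properties) schema element to the sources that use it."""
    types, props, sources = defaultdict(set), defaultdict(set), defaultdict(set)
    for s, p, o, src in quads:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            props[s].add(p)
        sources[s].add(src)
    index = defaultdict(set)
    for s in sources:
        index[(frozenset(types[s]), frozenset(props[s]))] |= sources[s]
    return dict(index)


def structural_query(index, types=(), properties=()):
    """Return the sources that have a schema element covering the requested types and properties."""
    wanted_types, wanted_props = set(types), set(properties)
    hits = set()
    for (elem_types, elem_props), srcs in index.items():
        if wanted_types <= elem_types and wanted_props <= elem_props:
            hits |= srcs
    return hits


if __name__ == "__main__":
    index = build_index(QUADS)
    print(structural_query(index, types=["bibo:Book"], properties=["dct:creator"]))
```

The index stores only schema elements and source identifiers, so its size depends on how varied the modelling is rather than on the number of instances. This matches the remark above that index size influences query time.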