A Workload-Aware Middleware for Storing
Massive RDF Graphs into NoSQL Databases
PhD Qualifying Exam
Luiz Henrique Zambom Santana
Advisor: Prof. Dr. Ronaldo dos Santos Mello
UFSC/CTC/INE/PPGCC
Agenda
● Introduction: Motivation, objectives, and contributions
● Background
○ RDF
○ NoSQL
● State of the Art
○ Open Issues
● Rendezvous
○ Storing: Fragmentation, Indexing, Partitioning, and Mapping
○ Querying: Query decomposition and Caching
● Evaluation
● Schedule
Introduction: Motivation
● RDF is currently widespread:
○ Best Buy:
■ http://www.nytimes.com/external/readwriteweb/2010/07/01/01readwriteweb-how-best-buy-is-using-the-semantic-web-23031.html
○ Globo.com:
■ https://www.slideshare.net/icaromedeiros/apresantacao-ufrj-icaro2013
○ US data.gov:
■ https://www.data.gov/developers/semantic-web
Introduction: Motivation (LOD stats)
Introduction: Objectives
This PhD thesis proposal presents Rendezvous, a
middleware for storing massive RDF graphs. The
middleware includes a novel data partitioning approach, a
fragmentation strategy that maps pieces of the RDF graph
into NoSQL databases with different data models, and a
caching structure that accelerates query response.
Introduction: Contributions
● (i) a mapping of RDF data to the columnar, document, and key/value NoSQL
data models (SADALAGE; FOWLER, 2012)
● (ii) a workload-aware partitioner based on the current graph structure and,
mainly, on the typical application workload
● (iii) a caching schema based on key/value databases for speeding up the
query response time
● (iv) an experimental evaluation that compares the current version of our
approach against two baselines (Rainbow (GU; HU; HUANG, 2015) and
ScalaRDF (HU et al., )) by considering Redis, Apache Cassandra and
MongoDB, the most popular key/value, columnar and document NoSQL
databases, respectively
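Contribution (i) can be illustrated with a minimal sketch. The layouts below are assumptions inferred from the query examples later in the deck (documents with one field per predicate, tables per predicate with S and O columns); the `subject|predicate` key layout and the sample triples are hypothetical, and none of this is claimed to be the exact mapping used by Rendezvous.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, predicate, object).
TRIPLES = [
    ("A", "p1", "B"),
    ("A", "p2", "C"),
    ("B", "p2", "D"),
]

def to_documents(triples):
    """Document model: one document per subject, one field per predicate."""
    docs = {}
    for s, p, o in triples:
        doc = docs.setdefault(s, {"subject": s})
        doc.setdefault(p, []).append(o)
    return list(docs.values())

def to_columnar(triples):
    """Columnar model: one table per predicate holding (S, O) rows."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append({"S": s, "O": o})
    return dict(tables)

def to_key_value(triples):
    """Key/value model: 'subject|predicate' -> objects (assumed key layout)."""
    kv = defaultdict(list)
    for s, p, o in triples:
        kv[f"{s}|{p}"].append(o)
    return dict(kv)

documents = to_documents(TRIPLES)
columnar = to_columnar(TRIPLES)
key_value = to_key_value(TRIPLES)
```

The same three triples end up as two documents, two predicate tables, and three key/value entries, which shows how each model favors a different access path.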
Background: RDF and SPARQL
Background: NoSQL
State of the Art - Non-NoSQL Triplestores
WARP (h-hop replication), YARS, Hexastore
(multiple indexes), 4store, SPIDER, RDF-3X,
SHARD, SW-Store (vertical partition), SOLID,
SPOVC (horizontal partition), and S2X
State of the Art - NoSQL Triplestores
RDFJoin, RDFKB, Jena+HBase, Hive+HBase, CumulusRDF,
Rya, Stratustore, MAPSIN, H2RDF, AMADA, Trinity.RDF,
H2RDF+, MonetDBRDF, xR2RML, W3C RDF/JSON,
Rainbow, Sempala, PrestoRDF, RDFChain, Tomaszuk,
Bouhali and Laurent, Papailiou et al., and ScalaRDF.
State of the Art - Categories
● RDF/NoSQL Converters
● Polystores/Multimodel
● In-memory
Rainbow (GU; HU; HUANG, 2015)
AMADA (ARANDA-ANDÚJAR, 2012)
State of the Art
● BUGIOTTI, F. et al. Invisible glue: scalable self-tuning multi-stores. In:
Conference on Innovative Data Systems Research (CIDR). [S.l.: s.n.], 2015.
State of the Art - Open Issues
● To avoid indexing all the triple component permutations
● To consider workload and the usage of statistics for data partitioning
● To exploit in-memory possibilities
● To combine RDF storage with multiple NoSQL models
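The first open issue refers to stores that keep one index per ordering of the subject, predicate, and object components. The sketch below is only an illustration of that cost: it builds all six orderings for two hypothetical triples and counts the resulting index entries. The function name and data are assumptions made for the example.

```python
from itertools import permutations

TRIPLES = [("A", "p1", "B"), ("B", "p2", "C")]  # hypothetical sample data

def build_permutation_indexes(triples):
    """Build one sorted index per ordering of the (s, p, o) components."""
    positions = {"s": 0, "p": 1, "o": 2}
    indexes = {}
    for order in permutations("spo"):
        name = "".join(order)
        indexes[name] = sorted(
            tuple(t[positions[c]] for c in order) for t in triples
        )
    return indexes

indexes = build_permutation_indexes(TRIPLES)
# Every triple is stored once per ordering, so storage grows sixfold.
total_entries = sum(len(rows) for rows in indexes.values())
```

Each triple appears in all six indexes, which is the overhead a workload-aware approach tries to avoid by indexing only what typical queries need.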
Rendezvous: Architecture
Rendezvous: Storing
● Fragmentation
● Indexing
● Partitioning
● Mapping
Storing: Fragmentation
Storing: Indexing
Storing: Indexing - Simple queries and fragmentation
Storing: Partitioning
● Partitioning is used when the dataset exceeds the capacity of a single server
Rendezvous: Querying
● Query decomposition
● Caching
Querying: Decomposition
Q1: SELECT ?x WHERE { ?x p5 ?y . ?y p2 ?z . }
Q2: SELECT ?x WHERE { ?x p9 ?y . M p10 ?y . }
D1: db.partition1.find({p5: {$exists: true}, p2: {$exists: true}})
D2: db.partition1.find({p9: {$exists: true}, subject: "M"})
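To make the behavior of D1 and D2 concrete, here is a small sketch that evaluates the same filters against in-memory documents. The `matches` and `find` helpers are hypothetical stand-ins that support only equality and `$exists`; they are not MongoDB code. The sample documents and the one-document-per-subject layout are assumptions inferred from the two filters.

```python
# Hypothetical documents of one partition: one document per subject,
# one field per predicate (layout inferred from D1 and D2).
PARTITION1 = [
    {"subject": "A", "p5": ["B"], "p2": ["C"]},
    {"subject": "M", "p9": ["D"]},
    {"subject": "E", "p9": ["F"]},
]

D1 = {"p5": {"$exists": True}, "p2": {"$exists": True}}
D2 = {"p9": {"$exists": True}, "subject": "M"}

def matches(doc, query):
    """Evaluate a filter that uses only equality and $exists conditions."""
    for field, cond in query.items():
        if isinstance(cond, dict) and "$exists" in cond:
            if (field in doc) != cond["$exists"]:
                return False
        elif doc.get(field) != cond:
            return False
    return True

def find(collection, query):
    """Return the documents that satisfy every condition in the filter."""
    return [doc for doc in collection if matches(doc, query)]

d1_subjects = [d["subject"] for d in find(PARTITION1, D1)]
d2_subjects = [d["subject"] for d in find(PARTITION1, D2)]
```

With the PyMongo driver, the same `D1` and `D2` dictionaries could be passed to `Collection.find` against a real partition collection.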
Querying: Decomposition
Q3: SELECT ?x WHERE { ?x p1 ?y . ?y p2 ?z . ?z p3 ?w . }
C1: SELECT S1,O1 FROM p1
    SELECT S2,O2 FROM p2 WHERE O=S1
    SELECT S3,O3 FROM p3 WHERE O=S2 AND S=D
Q4: SELECT ?x WHERE { ?x p2 ?y . ?y p3 ?z . ?x p5 ?w . ?w p9 ?k . L p11 ?k . }
SQ5: SELECT ?x WHERE { ?x p2 ?y . ?y p3 ?z . ?x p5 ?w . }
SQ6: SELECT ?x WHERE { ?x p5 ?w . ?w p9 ?k . L p11 ?k . }
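SQ5 and SQ6 overlap on the pattern over p5, so their results share the variables x and w. The slides do not say how Rendezvous recombines sub-query results; the sketch below assumes the standard natural join of variable bindings, and the result rows are hypothetical.

```python
def join_bindings(left, right):
    """Natural join of two lists of variable bindings (dicts)."""
    joined = []
    for lrow in left:
        for rrow in right:
            shared = lrow.keys() & rrow.keys()
            # Rows are compatible when they agree on every shared variable.
            if all(lrow[v] == rrow[v] for v in shared):
                joined.append({**lrow, **rrow})
    return joined

# Hypothetical results of the two sub-queries.
sq5_rows = [
    {"x": "A", "y": "B", "z": "C", "w": "D"},
    {"x": "E", "y": "F", "z": "G", "w": "H"},
]
sq6_rows = [{"x": "A", "w": "D", "k": "K1"}]

q4_rows = join_bindings(sq5_rows, sq6_rows)
q4_answers = [row["x"] for row in q4_rows]
```

Only the binding with x = A survives, because the second SQ5 row has no compatible row in the SQ6 result.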
Querying: Decomposition
Q5: SELECT ?x WHERE { ?x p1 ?y . ?y p2 ?z . ?z p3 ?w . ?z p5 ?k . ?k p6 G . ?k p7 I . ?k p8 H }
P1: { ?k p6 G . ?k p7 I . ?k p8 H }
P2: { ?x p1 ?y . ?y p2 ?z . ?z p3 ?w }
P3: { ?z p5 ?k }
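The split of Q5 into P1, P2, and P3 can be reproduced with a simple heuristic. The rule used here is an assumption, since the slides do not state the actual one: a star is a subject with several patterns whose objects are all constants, a linking pattern points at a star center, and everything else forms the chain.

```python
from collections import defaultdict

def is_var(term):
    """SPARQL variables start with '?'."""
    return term.startswith("?")

Q5 = [
    ("?x", "p1", "?y"), ("?y", "p2", "?z"), ("?z", "p3", "?w"),
    ("?z", "p5", "?k"),
    ("?k", "p6", "G"), ("?k", "p7", "I"), ("?k", "p8", "H"),
]

def decompose(patterns):
    """Split a basic graph pattern into star, chain, and linking parts."""
    by_subject = defaultdict(list)
    for tp in patterns:
        by_subject[tp[0]].append(tp)
    # Assumed rule: a star center has several patterns, all with constant objects.
    centers = {
        s for s, tps in by_subject.items()
        if len(tps) > 1 and all(not is_var(o) for _, _, o in tps)
    }
    star = [tp for tp in patterns if tp[0] in centers]
    link = [tp for tp in patterns if tp[0] not in centers and tp[2] in centers]
    chain = [tp for tp in patterns if tp not in star and tp not in link]
    return star, chain, link

P1, P2, P3 = decompose(Q5)
```

On Q5 this yields the star around k, the chain from x to w, and the single pattern over p5 that connects them.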
Querying: Caching
Querying: Caching (chain queries)
Evaluation
● LUBM: ontology for the university domain, synthetic RDF data scalable to any
size, and 14 extensional queries representing a variety of properties
● Generated dataset with 4,000 universities (around 100 GB, around
500 million triples)
● 12 queries with joins: all have at least one subject-subject join, and
six also have at least one subject-object join
● Apache Jena 3.2.0 with Java 1.8, Redis 3.2, MongoDB 3.4.3, and
Apache Cassandra 3.10
● Amazon m3.xlarge spot instance with 7.5 GB of memory and 1 x 32 GB SSD storage
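The join types named in the workload description can be detected mechanically from a query's triple patterns. The sketch below is an illustration only: it reuses Q1 and Q4 from the decomposition slides rather than the actual LUBM queries, and the function name is hypothetical.

```python
from itertools import combinations

def join_types(patterns):
    """Return the join types found between pairs of triple patterns."""
    found = set()
    for (s1, _, o1), (s2, _, o2) in combinations(patterns, 2):
        # Same variable as subject of both patterns.
        if s1.startswith("?") and s1 == s2:
            found.add("subject-subject")
        # Variable that is subject of one pattern and object of the other.
        if (s1.startswith("?") and s1 == o2) or (s2.startswith("?") and s2 == o1):
            found.add("subject-object")
        # Same variable as object of both patterns.
        if o1.startswith("?") and o1 == o2:
            found.add("object-object")
    return found

Q1 = [("?x", "p5", "?y"), ("?y", "p2", "?z")]
Q4 = [
    ("?x", "p2", "?y"), ("?y", "p3", "?z"), ("?x", "p5", "?w"),
    ("?w", "p9", "?k"), ("L", "p11", "?k"),
]

q1_joins = join_types(Q1)
q4_joins = join_types(Q4)
```

Q1 contains only a subject-object join, while Q4 combines subject-subject, subject-object, and object-object joins.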
Evaluation: Rendezvous vs. Rainbow
Evaluation: Rendezvous vs. ScalaRDF
Evaluation: Conclusions
● Fragments are scalable
● Larger boundaries do not necessarily imply larger
storage size
● Graph-aware partitions are better than NoSQL partitions
● Near cache is fast, but it makes data consistency harder
to maintain
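The near-cache trade-off can be shown with a short sketch. The slides give no implementation details, so the `NearCache` class is hypothetical and a plain dictionary stands in for the remote key/value store.

```python
class NearCache:
    """In-process cache in front of a (simulated) remote key/value store."""

    def __init__(self, remote):
        self.remote = remote   # dict standing in for the remote store
        self.local = {}        # near cache held in application memory

    def get(self, key):
        if key not in self.local:          # miss: fetch from the remote store
            self.local[key] = self.remote.get(key)
        return self.local[key]

    def invalidate(self, key):
        """Drop the local copy so the next read goes to the remote store."""
        self.local.pop(key, None)

remote_store = {"Q1": ["A"]}           # hypothetical cached query result
cache = NearCache(remote_store)

first_read = cache.get("Q1")           # fetched remotely, then kept locally
remote_store["Q1"] = ["A", "B"]        # another writer updates the store
stale_read = cache.get("Q1")           # local copy is now out of date
cache.invalidate("Q1")
fresh_read = cache.get("Q1")           # re-fetched after invalidation
```

The second read is fast because it never leaves the process, but it returns the old result until the entry is explicitly invalidated.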
Evaluation: Future Work
● Compression of triples during storage
● Update and delete operations
● Other NoSQL types (e.g., graph)
● Better datasets
Schedule
● Middleware development (continuously until 2018)
○ Compression
○ Graph database
○ More complex and abstract workload awareness
● Submission of papers (continuously until 2018)
○ Special Interest Group On Management of Data (SIGMOD)
○ Very Large Databases (VLDB)
○ IEEE Transactions on Knowledge and Data Engineering (TKDE)
● Defense of the PhD thesis (2019)
LUBM model

Más contenido relacionado

La actualidad más candente

Wherecamp Navigation Conference 2015 - SPOI SDI4pps: Points of Interest
Wherecamp Navigation Conference 2015 - SPOI SDI4pps: Points of InterestWherecamp Navigation Conference 2015 - SPOI SDI4pps: Points of Interest
Wherecamp Navigation Conference 2015 - SPOI SDI4pps: Points of InterestWhereCampBerlin
 
LDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataLDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataOlaf Hartig
 
Another RDF Encoding Form
Another RDF Encoding FormAnother RDF Encoding Form
Another RDF Encoding FormJakob .
 
Semantic Web Technologies in Health Care Analytics
Semantic Web Technologies in Health Care AnalyticsSemantic Web Technologies in Health Care Analytics
Semantic Web Technologies in Health Care AnalyticsRobert Piro
 
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
I Mapreduced a Neo store: Creating large Neo4j Databases with HadoopI Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
I Mapreduced a Neo store: Creating large Neo4j Databases with HadoopGoDataDriven
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...Olaf Hartig
 
DrupalCamp NJ 2014 Solr and Schema.org
DrupalCamp NJ 2014 Solr and Schema.orgDrupalCamp NJ 2014 Solr and Schema.org
DrupalCamp NJ 2014 Solr and Schema.orgscorlosquet
 
Using NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 LiteratureUsing NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 LiteratureDatabricks
 
musiclopedia presentation
musiclopedia presentationmusiclopedia presentation
musiclopedia presentationPablo Recabal
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataMuhammad Saleem
 
Two graph data models : RDF and Property Graphs
Two graph data models : RDF and Property GraphsTwo graph data models : RDF and Property Graphs
Two graph data models : RDF and Property Graphsandyseaborne
 
Programming the Semantic Web
Programming the Semantic WebProgramming the Semantic Web
Programming the Semantic WebSteffen Staab
 
Semantic Web introduction
Semantic Web introductionSemantic Web introduction
Semantic Web introductionGraphity
 
Rdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationRdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationINRIA-OAK
 

La actualidad más candente (18)

Getting triples from records: the role of ISBD
Getting triples from records: the role of ISBDGetting triples from records: the role of ISBD
Getting triples from records: the role of ISBD
 
Wherecamp Navigation Conference 2015 - SPOI SDI4pps: Points of Interest
Wherecamp Navigation Conference 2015 - SPOI SDI4pps: Points of InterestWherecamp Navigation Conference 2015 - SPOI SDI4pps: Points of Interest
Wherecamp Navigation Conference 2015 - SPOI SDI4pps: Points of Interest
 
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data StreamsEfficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data Streams
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
LDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataLDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked Data
 
Another RDF Encoding Form
Another RDF Encoding FormAnother RDF Encoding Form
Another RDF Encoding Form
 
Semantic Web Technologies in Health Care Analytics
Semantic Web Technologies in Health Care AnalyticsSemantic Web Technologies in Health Care Analytics
Semantic Web Technologies in Health Care Analytics
 
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
I Mapreduced a Neo store: Creating large Neo4j Databases with HadoopI Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
 
Triple Stores
Triple StoresTriple Stores
Triple Stores
 
DrupalCamp NJ 2014 Solr and Schema.org
DrupalCamp NJ 2014 Solr and Schema.orgDrupalCamp NJ 2014 Solr and Schema.org
DrupalCamp NJ 2014 Solr and Schema.org
 
Using NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 LiteratureUsing NLP to Explore Entity Relationships in COVID-19 Literature
Using NLP to Explore Entity Relationships in COVID-19 Literature
 
musiclopedia presentation
musiclopedia presentationmusiclopedia presentation
musiclopedia presentation
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of Data
 
Two graph data models : RDF and Property Graphs
Two graph data models : RDF and Property GraphsTwo graph data models : RDF and Property Graphs
Two graph data models : RDF and Property Graphs
 
Programming the Semantic Web
Programming the Semantic WebProgramming the Semantic Web
Programming the Semantic Web
 
Semantic Web introduction
Semantic Web introductionSemantic Web introduction
Semantic Web introduction
 
Rdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationRdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimation
 

Similar a A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases

Graph basedrdf storeforapachecassandra
Graph basedrdf storeforapachecassandraGraph basedrdf storeforapachecassandra
Graph basedrdf storeforapachecassandraRavindra Ranwala
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015Robbie Strickland
 
Graph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDFGraph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDFDimitris Kontokostas
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeNational Institute of Informatics
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRathachai Chawuthai
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseJimmy Angelakos
 
Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine
 
Geospatial Options in Apache Spark
Geospatial Options in Apache SparkGeospatial Options in Apache Spark
Geospatial Options in Apache SparkDatabricks
 
introduction to Neo4j (Tabriz Software Open Talks)
introduction to Neo4j (Tabriz Software Open Talks)introduction to Neo4j (Tabriz Software Open Talks)
introduction to Neo4j (Tabriz Software Open Talks)Farzin Bagheri
 
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...spinningmatt
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...Databricks
 
Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and OntarioBigData_Europe
 
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...BigData_Europe
 
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Hajira Jabeen
 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014cdmaxime
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastHolden Karau
 
Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012scorlosquet
 

Similar a A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases (20)

Graph basedrdf storeforapachecassandra
Graph basedrdf storeforapachecassandraGraph basedrdf storeforapachecassandra
Graph basedrdf storeforapachecassandra
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Graph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDFGraph databases & data integration - the case of RDF
Graph databases & data integration - the case of RDF
 
Ceph Research at UCSC
Ceph Research at UCSCCeph Research at UCSC
Ceph Research at UCSC
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
 
Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4j
 
Spark Worshop
Spark WorshopSpark Worshop
Spark Worshop
 
Geospatial Options in Apache Spark
Geospatial Options in Apache SparkGeospatial Options in Apache Spark
Geospatial Options in Apache Spark
 
introduction to Neo4j (Tabriz Software Open Talks)
introduction to Neo4j (Tabriz Software Open Talks)introduction to Neo4j (Tabriz Software Open Talks)
introduction to Neo4j (Tabriz Software Open Talks)
 
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and Ontario
 
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
 
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012Slides semantic web and Drupal 7 NYCCamp 2012
Slides semantic web and Drupal 7 NYCCamp 2012
 

Más de Luiz Henrique Zambom Santana

Perspectives on the use of data in Agriculture - Luiz Santana - Leaf Agricult...
Perspectives on the use of data in Agriculture - Luiz Santana - Leaf Agricult...Perspectives on the use of data in Agriculture - Luiz Santana - Leaf Agricult...
Perspectives on the use of data in Agriculture - Luiz Santana - Leaf Agricult...Luiz Henrique Zambom Santana
 
Apache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with SparkApache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with SparkLuiz Henrique Zambom Santana
 
De Arquiteto para Gerente: como debugar uma equipe
De Arquiteto para Gerente: como debugar uma equipeDe Arquiteto para Gerente: como debugar uma equipe
De Arquiteto para Gerente: como debugar uma equipeLuiz Henrique Zambom Santana
 
VoltDB: as vantagens e os desafios dos banco de dados NewSQL
VoltDB: as vantagens e os desafios dos banco de dados NewSQLVoltDB: as vantagens e os desafios dos banco de dados NewSQL
VoltDB: as vantagens e os desafios dos banco de dados NewSQLLuiz Henrique Zambom Santana
 
Uma visão sobre Fast-Data: Spark, VoltDB e Elasticsearch
Uma visão sobre Fast-Data: Spark, VoltDB e ElasticsearchUma visão sobre Fast-Data: Spark, VoltDB e Elasticsearch
Uma visão sobre Fast-Data: Spark, VoltDB e ElasticsearchLuiz Henrique Zambom Santana
 
Como modelar, integrar e desenvolver aplicações com múltiplos bancos de dados...
Como modelar, integrar e desenvolver aplicações com múltiplos bancos de dados...Como modelar, integrar e desenvolver aplicações com múltiplos bancos de dados...
Como modelar, integrar e desenvolver aplicações com múltiplos bancos de dados...Luiz Henrique Zambom Santana
 
Novidades do elasticsearch 2.0 e como usá-lo com PHP
Novidades do elasticsearch 2.0 e como usá-lo com PHPNovidades do elasticsearch 2.0 e como usá-lo com PHP
Novidades do elasticsearch 2.0 e como usá-lo com PHPLuiz Henrique Zambom Santana
 
Design of Experiments on Federator Polystore Architecture
Design of Experiments on Federator Polystore ArchitectureDesign of Experiments on Federator Polystore Architecture
Design of Experiments on Federator Polystore ArchitectureLuiz Henrique Zambom Santana
 
An Approach for RDF-based Semantic Access to NoSQL Repositories
An Approach for RDF-based Semantic Access to NoSQL RepositoriesAn Approach for RDF-based Semantic Access to NoSQL Repositories
An Approach for RDF-based Semantic Access to NoSQL RepositoriesLuiz Henrique Zambom Santana
 

Más de Luiz Henrique Zambom Santana (20)

Perspectives on the use of data in Agriculture - Luiz Santana - Leaf Agricult...
Perspectives on the use of data in Agriculture - Luiz Santana - Leaf Agricult...Perspectives on the use of data in Agriculture - Luiz Santana - Leaf Agricult...
Perspectives on the use of data in Agriculture - Luiz Santana - Leaf Agricult...
 
Apache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with SparkApache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with Spark
 
De Arquiteto para Gerente: como debugar uma equipe
De Arquiteto para Gerente: como debugar uma equipeDe Arquiteto para Gerente: como debugar uma equipe
De Arquiteto para Gerente: como debugar uma equipe
 
VoltDB: as vantagens e os desafios dos banco de dados NewSQL
VoltDB: as vantagens e os desafios dos banco de dados NewSQLVoltDB: as vantagens e os desafios dos banco de dados NewSQL
VoltDB: as vantagens e os desafios dos banco de dados NewSQL
 
IBM Watson, Apache Spark ou TensorFlow?
IBM Watson, Apache Spark ou TensorFlow?IBM Watson, Apache Spark ou TensorFlow?
IBM Watson, Apache Spark ou TensorFlow?
 
Uma visão sobre Fast-Data: Spark, VoltDB e Elasticsearch
Uma visão sobre Fast-Data: Spark, VoltDB e ElasticsearchUma visão sobre Fast-Data: Spark, VoltDB e Elasticsearch
Uma visão sobre Fast-Data: Spark, VoltDB e Elasticsearch
 
Banco de dados nas nuvens - aula 3
Banco de dados nas nuvens - aula 3Banco de dados nas nuvens - aula 3
Banco de dados nas nuvens - aula 3
 
Banco de dados nas nuvens - aula 2
Banco de dados nas nuvens - aula 2Banco de dados nas nuvens - aula 2
Banco de dados nas nuvens - aula 2
 
Banco de dados nas nuvens - aula 1
Banco de dados nas nuvens - aula 1Banco de dados nas nuvens - aula 1
Banco de dados nas nuvens - aula 1
 
Normalização
NormalizaçãoNormalização
Normalização
 
SQL Joins
SQL JoinsSQL Joins
SQL Joins
 
Consultas básicas em SQL
Consultas básicas em SQLConsultas básicas em SQL
Consultas básicas em SQL
 
Processamento em Big Data
Processamento em Big DataProcessamento em Big Data
Processamento em Big Data
 
Seminário de Andamento de Doutorado
Seminário de Andamento de DoutoradoSeminário de Andamento de Doutorado
Seminário de Andamento de Doutorado
 
Como modelar, integrar e desenvolver aplicações com múltiplos bancos de dados...
Como modelar, integrar e desenvolver aplicações com múltiplos bancos de dados...Como modelar, integrar e desenvolver aplicações com múltiplos bancos de dados...
Como modelar, integrar e desenvolver aplicações com múltiplos bancos de dados...
 
Workshop de ELK - EmergiNet
Workshop de ELK - EmergiNetWorkshop de ELK - EmergiNet
Workshop de ELK - EmergiNet
 
Novidades do elasticsearch 2.0 e como usá-lo com PHP
Novidades do elasticsearch 2.0 e como usá-lo com PHPNovidades do elasticsearch 2.0 e como usá-lo com PHP
Novidades do elasticsearch 2.0 e como usá-lo com PHP
 
Design of Experiments on Federator Polystore Architecture
Design of Experiments on Federator Polystore ArchitectureDesign of Experiments on Federator Polystore Architecture
Design of Experiments on Federator Polystore Architecture
 
An Approach for RDF-based Semantic Access to NoSQL Repositories
An Approach for RDF-based Semantic Access to NoSQL RepositoriesAn Approach for RDF-based Semantic Access to NoSQL Repositories
An Approach for RDF-based Semantic Access to NoSQL Repositories
 
Survey on NoSQL integration
Survey on NoSQL integrationSurvey on NoSQL integration
Survey on NoSQL integration
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases

  • 1. A Workload-Aware Middleware for Storing Massive RDF Graphs into NoSQL Databases PhD Qualifying Exam Luiz Henrique Zambom Santana Prof. Dr. Ronaldo dos Santos Mello (advisor) UFSC/CTC/INE/PPGCC
  • 2. Agenda ● Introduction: Motivation, objectives, and contributions ● Background ○ RDF ○ NoSQL ● State of the Art ○ Open Issues ● Rendezvous ○ Storing: Fragmentation, Indexing, Partitioning, and Mapping ○ Querying: Query decomposition and Caching ● Evaluation ● Schedule
  • 3. Introduction: Motivation ● RDF is currently widespread: ○ Best Buy: ■ http://www.nytimes.com/external/readwriteweb/2010/07/01/01readwriteweb-how-best-buy-is-using-the-semantic-web-23031.html ○ Globo.com: ■ https://www.slideshare.net/icaromedeiros/apresantacao-ufrj-icaro2013 ○ US data.gov: ■ https://www.data.gov/developers/semantic-web
  • 5. Introduction: Objectives This PhD thesis proposal presents Rendezvous, a middleware for storing massive RDF graphs. This middleware includes a novel data partitioning approach, a fragmentation strategy that maps pieces of the RDF graph into NoSQL databases with different data models, and a caching structure that accelerates query response.
  • 6. Introduction: Contributions ● (i) a mapping of RDF data to the columnar, document, and key/value NoSQL data models (SADALAGE; FOWLER, 2012) ● (ii) a workload-aware partitioner based on the current graph structure and, mainly, on the typical application workload ● (iii) a caching schema based on key/value databases for speeding up the query response time ● (iv) an experimental evaluation that compares the current version of our approach against two baselines (Rainbow (GU; HU; HUANG, 2015) and ScalaRDF (HU et al., )) by considering Redis, Apache Cassandra and MongoDB, the most popular key/value, columnar and document NoSQL databases, respectively
  • 11. State of the Art - Non-NoSQL Triplestores WARP (h-hop replication), YARS, Hexastore (multiple indexes), 4store, SPIDER, RDF-3X, SHARD, SW-Store (vertical partition), SOLID, SPOVC (horizontal partition), and S2X
  • 12. State of the Art - NoSQL Triplestores
  • 13. State of the Art - NoSQL Triplestores RDFJoin, RDFKB, Jena+HBase, Hive+HBase, CumulusRDF, Rya, Stratustore, MAPSIN, H2RDF, AMADA, Trinity.RDF, H2RDF+, MonetDBRDF, xR2RML, W3C RDF/JSON, Rainbow, Sempala, PrestoRDF, RDFChain, Tomaszuk, Bouhali and Laurent, Papailiou et al., and ScalaRDF.
  • 14. State of the Art - Categories ● RDF/NoSQL Converters ● Polystores/Multimodel ● In-memory Rainbow (GU; HU; HUANG, 2015) Amada (Aranda-Andújar, 2012)
  • 15. State of the Art ● BUGIOTTI, F. et al. Invisible glue: scalable self-tuning multi-stores. In: Conference on Innovative Data Systems Research (CIDR). [S.l.: s.n.], 2015.
  • 16. State of the Art - Open Issues ● To avoid indexing all the triple component permutations ● To consider workload and the usage of statistics for data partitioning ● To exploit in-memory possibilities ● To combine RDF storage with multiple NoSQL models
  • 19. Rendezvous: Storing ● Fragmentation ● Indexing ● Partitioning ● Mapping
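The mapping step can be pictured with a minimal sketch, assuming layouts inferred from the query examples in this deck: documents keyed by subject with one field per predicate (as the `find` filters suggest) and one columnar table per predicate (as the `SELECT ... FROM p1` statements suggest). The key/value layout and all function names are illustrative assumptions, not the middleware's actual schema.

```python
from collections import defaultdict

# Hypothetical helpers: each takes a list of (subject, predicate, object) triples.

def to_key_value(triples):
    """Key/value layout (assumed): 'subject|predicate' -> list of objects."""
    store = defaultdict(list)
    for s, p, o in triples:
        store[f"{s}|{p}"].append(o)
    return dict(store)

def to_documents(triples):
    """Document layout: one document per subject, one field per predicate."""
    docs = {}
    for s, p, o in triples:
        doc = docs.setdefault(s, {"subject": s})
        doc.setdefault(p, []).append(o)
    return list(docs.values())

def to_columnar(triples):
    """Columnar layout: one table per predicate holding (subject, object) rows."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)

sample = [("A", "p1", "B"), ("A", "p2", "C"), ("B", "p1", "C")]
print(to_key_value(sample))
print(to_documents(sample))
print(to_columnar(sample))
```

The same three triples end up grouped by subject in the document form and by predicate in the columnar form, which is why the two forms favour different query shapes.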
  • 26. Storing: Indexing - Simple queries and fragmentation
  • 27. Storing: Partitioning ● Partitioning is used when the dataset exceeds the capacity of a single server
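As a rough illustration of when partitioning becomes necessary, the sketch below computes a lower bound on the number of partitions from the dataset size and the per-server capacity. The function name is hypothetical, and the calculation assumes that a server's whole capacity is usable for data, ignoring indexes, replication, and the workload-aware placement that Rendezvous proposes.

```python
import math

def min_partitions(dataset_gb: float, usable_gb_per_server: float) -> int:
    """Lower bound on partitions needed so that each one fits on a single server."""
    if usable_gb_per_server <= 0:
        raise ValueError("server capacity must be positive")
    return max(1, math.ceil(dataset_gb / usable_gb_per_server))

# Figures taken from the evaluation slide, assuming "1 x 32 SSD" means 32 GB.
print(min_partitions(100, 32))
```

A dataset that already fits on one server yields a single partition, in which case no partitioning is required.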
  • 30. Rendezvous: Querying ● Query decomposition ● Caching
  • 32. Querying: Decomposition Q1: SELECT ?x WHERE { ?x p5 ?y . ?y p2 ?z . } Q2: SELECT ?x WHERE { ?x p9 ?y . M p10 ?y . } D1: db.partition1.find({p5: {$exists: true}, p2: {$exists: true}}) D2: db.partition1.find({p9: {$exists: true}, subject: "M"})
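The translation from Q1 to D1 can be approximated with a minimal sketch that builds a MongoDB-style filter as a plain dictionary. The function name and the rule it applies (one `$exists` condition per predicate, plus a `subject` condition for a constant subject) are assumptions modelled on the shape of D1; join variables are not checked, and for Q2 this rule would also emit a `p10` condition that D2 on the slide does not include.

```python
def patterns_to_mongo_filter(patterns):
    """Build a filter dict from (subject, predicate, object) triple patterns.

    Terms starting with '?' are variables; anything else is a constant.
    """
    query = {}
    for subject, predicate, _obj in patterns:
        # Every predicate in the query must exist as a field of the document.
        query[predicate] = {"$exists": True}
        # A constant subject pins the document's subject field.
        if not subject.startswith("?"):
            query["subject"] = subject
    return query

q1 = [("?x", "p5", "?y"), ("?y", "p2", "?z")]
print(patterns_to_mongo_filter(q1))
```

The resulting dictionary could be passed to a document store's `find` call; no database connection is needed to inspect it.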
  • 33. Querying: Decomposition Q3: SELECT ?x WHERE { ?x p1 ?y . ?y p2 ?z . ?z p3 ?w . } C1: SELECT S1,O1 FROM p1; SELECT S2,O2 FROM p2 WHERE O=S1; SELECT S3,O3 FROM p3 WHERE O=S2 AND S=D (C1) Q4: SELECT ?x WHERE { ?x p2 ?y . ?y p3 ?z . ?x p5 ?w . ?w p9 ?k . L p11 ?k . } SQ5: SELECT ?x WHERE { ?x p2 ?y . ?y p3 ?z . ?x p5 ?w . } SQ6: SELECT ?x WHERE { ?x p5 ?w . ?w p9 ?k . L p11 ?k . }
  • 34. Querying: Decomposition Q5: SELECT ?x WHERE { ?x p1 ?y . ?y p2 ?z . ?z p3 ?w . ?z p5 ?k . ?k p6 G . ?k p7 I . ?k p8 H . } P1: { ?k p6 G . ?k p7 I . ?k p8 H } P2: { ?x p1 ?y . ?y p2 ?z . ?z p3 ?w } P3: { ?z p5 ?k }
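One simple way to find star-shaped parts of a query such as Q5 is to group its triple patterns by subject. The sketch below does only that, with hypothetical function names; it recovers the `?k` star that the slide labels P1, but it is not the decomposition algorithm of the middleware and does not reproduce P2 and P3 (it also reports `?z` as a star candidate, whereas the slide keeps `?z p3 ?w` in the chain).

```python
from collections import defaultdict

def group_by_subject(patterns):
    """Group (subject, predicate, object) patterns by their subject term."""
    groups = defaultdict(list)
    for pattern in patterns:
        groups[pattern[0]].append(pattern)
    return dict(groups)

def star_candidates(patterns):
    """Subjects with two or more patterns are candidate star sub-queries."""
    return {s: ps for s, ps in group_by_subject(patterns).items() if len(ps) > 1}

Q5 = [
    ("?x", "p1", "?y"), ("?y", "p2", "?z"), ("?z", "p3", "?w"), ("?z", "p5", "?k"),
    ("?k", "p6", "G"), ("?k", "p7", "I"), ("?k", "p8", "H"),
]
stars = star_candidates(Q5)
print(stars["?k"])
```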
  • 38. Evaluation ● LUBM: an ontology for the university domain, synthetic RDF data scalable to any size, and 14 extensional queries representing a variety of properties ● Generated dataset with 4000 universities (around 100 GB, containing around 500 million triples) ● 12 queries with joins; all of them have at least one subject-subject join, and six of them also have at least one subject-object join ● Apache Jena 3.2.0 with Java 1.8, Redis 3.2, MongoDB 3.4.3, and Apache Cassandra 3.10 ● Amazon m3.xlarge spot with 7.5 GB of memory and 1 x 32 GB SSD capacity
  • 41. Evaluation: Conclusions ● Fragments are scalable ● Bigger boundaries are not necessarily related to bigger storage size ● Graph-aware partitions are better than NoSQL partitions ● A near cache is fast, but it makes data consistency harder to maintain
  • 42. Evaluation: Future Work ● Compression of triples during the storage ● Update and delete operations ● Other NoSQL types (e.g., graph) ● Better datasets
  • 44. Schedule ● Middleware development (continuously until 2018) ○ Compression ○ Graph database ○ More complex and abstract workload awareness ● Submission of papers (continuously until 2018) ○ Special Interest Group On Management of Data (SIGMOD) ○ Very Large Databases (VLDB) ○ IEEE Transactions on Knowledge and Data Engineering (TKDE) ● Defense of the PhD thesis (2019)