SlideShare una empresa de Scribd logo
1 de 36
Jarrar © 2013 1
Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
Lecture Notes on RDF Stores
Birzeit University, Palestine
2013
RDF Stores
Challenges and Solutions
Jarrar © 2013 2
Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2014/01/web-data-management.html
Jarrar © 2013 3
Lecture Outline
Part I: Querying RDF(S P O) tables using SQL
Part 2: Practical Session (RDF graphs )
Part 3: SQL-based RDF Stores
Keywords: RDF, RDF Stores, querying SPO Table, Graph Databases, Querying Graph, self joins,
RDF3X, Oracle Semantic Technology, C-Store, vertical patronizing, MashQL, Graph
Signature, Semantic Web, Data Web,
Jarrar © 2013 4
Part 1
Querying RDF(S P O) tables using SQL
Jarrar © 2013 5
RDF as a Data Model
RDF (Resource Description Framework) is the Data Model
behind the Semantic/Data Web.
RDF is a graph-based data model.
In RDF, Data is represented as triples: <Subject, Predicate,
Object>, or in short: <S,P,O>.
P
S O
Jarrar © 2013 6
RDF as a Data Model
RDF’s way of representing data as triples is more
elementary than Databases or XML.
– This enables easy data integration and interoperability
between systems (as will be demonstrated later).
• EXAMPLE: the fact that the movie Sicko (2007) is directed
by Michael Moore from the USA can be represented by the
following triples which form a directed labeled graph:
M1
2007
Michael
Moore
Sicko
name
D1
< M1, name, Sicko >
< M1, year, 2007 >
< M1, directedBy, D1 >
< D1, name, Michael Moore >
< D1, country, C1 >
< C1, name, USA>
Based on [1]
Jarrar © 2013 7
RDF graphs as an SPO table
Typically, an RDF Graph is stored
as one <S,P,O> Table in a
Relational Database Management
System.
S P O
M1 year 2007
M1 Name Sicko
M1 directedBy D1
M2 directedBy D1
M2 Year 2009
M2 Name Capitalism
M3 Year 1995
M3 directedBy D2
M3 Name Brave Heart
M4 Year 2007
M4 directedBy D3
M4 Name Caramel
D1 Name Michael Moore
D1 hasWonPrizeIn P1
D1 Country C1
D2 Counrty C1
D2 hasWonPrizeIn P2
D2 Name Mel Gibson
D2 actedIn M3
D3 Name Nadine Labaki
D3 Country C2
D3 hasWonPrizeIn P3
D3 actedIn M4
… … …
M1
M2
M3
2007
directedBy
D1
Michael Moore
P1
Emmy Awards
2009
C1 USA
Washington DC
directedBy
D2
actedIn
1995
Brave Heart
P2 Oscars
1996
M4
directedBy
D3
actedIn
2007
Caramel
Mel Gibson
C2 Lebanon
Beirut
Nadine Labaki
P3 Stockholm Festival
2007
name
C3
Sweden
Stockholm
location
1995
Capitalism
Sicko
Jarrar © 2013 8
RDF graphs as an SPO table
How do we query graph-shaped
data stored in one relational table?
How do we traverse a graph stored
in one relational table?
 Self Joins!
S P O
M1 year 2007
M1 Name Sicko
M1 directedBy D1
M2 directedBy D1
M2 Year 2009
M2 Name Capitalism
M3 Year 1995
M3 directedBy D2
M3 Name Brave Heart
M4 Year 2007
M4 directedBy D3
M4 Name Caramel
D1 Name Michael Moore
D1 hasWonPrizeIn P1
D1 Country C1
D2 Counrty C1
D2 hasWonPrizeIn P2
D2 Name Mel Gibson
D2 actedIn M3
D3 Name Nadine Labaki
D3 Country C2
D3 hasWonPrizeIn P3
D3 actedIn M4
… … …
Jarrar © 2013 9
How to Query RDF data stored in SPO table?
Exercises:
(1) What is the name of director D3?
(2) What is the name of the director of the movie M1?
(3) List all the movies who have directors from the USA and their directors.
(4) List all the names of the directors from Lebanon who have won prizes
and the prizes they have won.
S P O
M1 year 2007
M1 Name Sicko
M1 directedBy D1
M2 directedBy D1
M2 Year 2009
M2 Name Capitalism
M3 Year 1995
M3 directedBy D2
M3 Name Brave Heart
M4 Year 2007
M4 directedBy D3
M4 Name Caramel
D1 Name Michael Moore
D1 hasWonPrizeIn P1
… … …
M1
M2
M3
2007
directedBy
D1
Michael Moore
P1
Emmy Awards
2009
C1 USA
Washington DC
directedBy
D2
actedIn
1995
Brave Heart
P2 Oscars
1996
M4
directedBy
D3
actedIn
2007
Caramel
Mel Gibson
C2 Lebanon
Beirut
Nadine Labaki
P3 Stockholm Festival
2007
name
C3
Sweden
Stockholm
location
1995
Capitalism
Sicko
Jarrar © 2013 10
How to Query RDF data stored in SPO table?
Exercise 1: A simple query.
– What is the name of director D3?
– Consider the table MoviesTable(s,p,o).
– SQL:
Select o
From MoviesTable T
Where T.s = ‘D3’ AND T.p = ‘Name’;
S P O
M1 year 2007
M1 Name Sicko
M1 directedBy D1
M2 directedBy D1
M2 Year 2009
M2 Name Capitalism
M3 Year 1995
M3 directedBy D2
M3 Name Brave Heart
M4 Year 2007
M4 directedBy D3
M4 Name Caramel
D1 Name Michael Moore
D1 hasWonPrizeIn P1
D1 Country C1
D2 Counrty C1
D2 hasWonPrizeIn P2
D2 Name Mel Gibson
D2 actedIn M3
D3 Name Nadine Labaki
D3 Country C2
D3 hasWonPrizeIn P3
D3 actedIn M4
… … …
Answer: Nadine Labaki
Jarrar © 2013 11
How to Query RDF data stored in SPO table?
Exercise 2: A path query with 1 join.
– What is the name of the director of the
movie M1.
– Consider the table MoviesTable (s,p,o).
– SQL:
Select T2.o
From MoviesTable T1, MoviesTable T2
Where {T1.s = ‘M1’ AND
T1.p = ‘directedBy’ AND
T1.o = T2.s AND
T2.p = ‘name’};
S P O
M1 year 2007
M1 Name Sicko
M1 directedBy D1
M2 directedBy D1
M2 Year 2009
M2 Name Capitalism
M3 Year 1995
M3 directedBy D2
M3 Name Brave Heart
M4 Year 2007
M4 directedBy D3
M4 Name Caramel
D1 Name Michael Moore
D1 hasWonPrizeIn P1
D1 Country C1
D2 Counrty C1
D2 hasWonPrizeIn P2
D2 Name Mel Gibson
D2 actedIn M3
D3 Name Nadine Labaki
D3 Country C2
D3 hasWonPrizeIn P3
D3 actedIn M4
… … …
Answer: Michael Moore
Jarrar © 2013 12
How to Query RDF data stored in SPO table?
Exercise 3: A path query with 2 joins.
– List all the movies who have directors from
the USA and their directors.
– Consider the table MoviesTable(s,p,o).
– SQL:
Select T1.s, T1.o
From MoviesTable T1, MoviesTable T2,
MoviesTable T3
Where {T1.o = T2.s AND
T2.o = T3.s AND
T1.p = ‘directedBy’ AND
T2.p = ‘country’ AND
T3.p = ‘name’ AND
T3.o = ‘USA’};
S P O
M1 year 2007
M1 Name Sicko
M1 directedBy D1
M2 directedBy D1
M2 Year 2009
M2 Name Capitalism
M3 Year 1995
M3 directedBy D2
M3 Name Brave Heart
… … …
D1 Name Michael Moore
D1 hasWonPrizeIn P1
D1 Country C1
D2 Counrty C1
D2 hasWonPrizeIn P2
D2 Name Mel Gibson
D2 actedIn M3
… … …
C1 Name USA
C1 Capital Washington DC
C2 Name Lebanon
C2 Capital Beirut
… … …
Answer: M1 D1; M2 D1; M3 D2
Jarrar © 2013 13
How to Query RDF data stored in SPO table?
Exercise 4: Star query.
– List all the names of the directors from Lebanon who
have won prizes and the prizes they have won.
– Consider the table MoviesTable (S,P,O).
Select T1.o, T4.o
From MoviesTable T1, MoviesTable T2,
MoviesTable T3, MoviesTable T4
Where {T1.p = ‘name’ AND
T2.s = T1.s AND
T2.p = ‘country’ AND
T2.o = T3.s AND
T3.p = ‘name’ AND
T3.o = ‘Lebanon’ AND
T4.s = T1.s AND
T4.p = ‘hasWonPrizeIn’};
S P O
D1 Name Michael Moore
D1 hasWonPrizeIn P1
D1 Country C1
D2 Counrty C1
D2 hasWonPrizeIn P2
D2 Name Mel Gibson
D2 actedIn M3
D3 Name Nadine Labaki
D3 Country C2
D3 hasWonPrizeIn P3
D3 actedIn M4
… … …
C1 Name USA
C1 Capital Washington DC
C2 Name Lebanon
C2 Capital Beirut
… … …
Answer: ‘Nadine Labaki’ , P3
Jarrar © 2013 14
Part 2:
Querying SPO Tables using SQL
Jarrar © 2013 15
Practical Session
Given the RDF graph in the next slide, do the following:
(1) Insert the data graph as <S,P,O> triples in a 3-column table in any
DBMS.
Schema of the table: Books (Subject, Predicate, Object)
(2) Write the following queries in SQL and execute them over the
newly created table:
• List all the authors born in a country which has the name
Palestine.
• List the names of all authors with the name of their affiliation
who are born in a country whose capital’s population is 14M.
Note that the author must have an affiliation.
• List the names of all books whose authors are born in Lebanon
along with the name of the author.
Jarrar © 2013 16
BK1
BK2
BK3
BK4
AU1
AU2
AU3
AU4
CN1
CN2
CA1
CA2
Viswanathan
CN3 CA3
Said
Wamadat
TheProphet
Naima
Gibran
ColombiaUniversity
Palestine
India
Lebanon
Jerusalem
New Delhi
Beirut
7.6K
2.0M
14.0M
CU
Author
Author
Author
Name
Capital
Capital
Name
Name
This data graph is about books. It talks about four books (BK1-BK4).
Information recorded about a book includes data such as; its author,
affiliation, country of birth including its capital and the population of its
capital.
Jarrar © 2013 17
Practical Session
• Each student should work alone.
• The student is encouraged to first answer each query mentally from the
graph before writing and executing the SQL query, and to compare the
results of the query execution with his/her expected results.
• The student is strongly recommended to write two additional queries,
execute them on the data graph, and hand them along with the required
queries.
• The student must highlight problems and challenges of executing queries
on graph-shaped data and propose solutions to the problem.
• Each student must expect to present his/her queries at class and, more
importantly, he/she must be ready to discuss the problems of querying
graph-shaped data and his/her proposed solutions.
• The final delivery should include (i) a snapshot of the built table, (ii) the
SQL queries, (iii) The results of the queries, and (iv) a one-page discussion
of the problems of querying graph-shaped data and the proposed possible
solutions. These must be handed in a report form in PDF Format.
Jarrar © 2013 18
Part 3
SQL-based RDF Stores
Querying Graph-Shaped Data using SQL
Jarrar © 2013 19
Complexity of querying graph-shaped data
Where does the complexity of querying graph-shaped data
stored in a relational table lie?
The complexity of querying a data graph lies in the many self-
joins that are to be performed on the table.
Accessing multiple properties for a resource requires subject-
subject joins (Star-shaped queries).
Path expressions require subject-object joins.
In general, a query with n edges on an SPO table requires
n-1 self-joins of that table.
Jarrar © 2013 20
Solution-1: Subject-Property Matrixes
Clustered Property Table:
• A clustered property table, contains clusters of
properties tend to be defined together.
• Multiple property tables with different clusters of
properties may be created to optimize queries.
• A key requirement for this type of property table is
that a particular property may only appear in at most
one property table.
• Advantage: Some queries can be answered directly
from the property table with no joins.
- Ex: retrieve the title of the book authored in 2001.Oracle Solution [3]
implemented in 11g
Jarrar © 2013 21
Solution-2: Vertically Partitioning
• Triples table is rewritten into n two column tables
where n is the number of unique properties.
• In each of these tables, the first column contains the
subjects that define that property and the second
column contains the object values for those subjects.
• An experimental implementation of this approach
was performed using an open source column-
oriented database system (C-Store).
Six Vertically Partitioned Tables:
C-Store Solution [4]
Jarrar © 2013 22
Solution-3: Indexes and Dictionaries
Build Indexes.
– On each column: S, P, O
– On Permutations of two columns: SP, SO, PS, PO, …
– On Permutation of three columns: SPO, SOP, PSO, POS,
OSP, OPS, …
Dictionary Encoding String Data?
– Replacing all literals by IDs.
– Triple stores are compressed.
– Allows for faster comparison and matching operations.
 RDF3X Solution [5]
Jarrar © 2013 23
Discussion
Advantages and Disadvantages of storing RDF in
other schemas?
What do you prefer?
Any other suggestions to optimize queries over
graph-shaped data?
Jarrar © 2013 24
MashQL (Query Formulation & indexing Graphs)
What about summarizing the data graph, and instead of querying the
original data, we query the summary and get the same results?
What about summarizing 15M SPO triples into less then 500,000?
This is called Graph Signature Indexing. It is currently in research at
Birzeit University.
Initial comparison
between GS performance
and Oracle’s Semantic
Technologies
(Research started at the University of Cyprus, then at Birzeit University)
Jarrar © 2013 25
MashQL (Query Formulation & indexing Graphs)
A graphical query formulation
language for the Data Web
A general structured-data
retrieval language (not merely
an interface)
Addresses all of the following challenges:
 The user does not know the schema.
 There is no offline or inline schemaontology.
 The query may involve multiple sources.
 The query language is expressive (not a single-purpose interface)
RDF Input
From:
Query
Everything
http://www.site1.com/rdf
http://www.site2.com/rdf
Artist “^Dion”
YearProdYear > 2003
Title AlbumTitle
Based on [2]
Jarrar © 2013 26
MashQL Example
RDF Input
From:
Query
Everything
http://www.site1.com/rdf
http://www.site2.com/rdf
Artist “^Dion”
YearProdYear > 2003
Title AlbumTitle
 Interactive Query Formulation.
 MashQL queries are translated into and executed as SPARQL.
Celine Dion’s
Albums after
2003?
www.site1.com/rdf
:A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
:A1 :ProdYear 2007
:A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
:A2 :Artist “Ziad Rahbani”
www.site2.com/rdf
:B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
:B1 :Artist “Celine Dion”
:B1 :Artist “Josh Groban”
:B1 :Year 2008
:B2 rdf:Type bibo:Album
:B2 :Title “Miracle”
:B2 :Author “Celine Dion”
:B2 :Year 2004
Jarrar © 2013 27
MashQL Example
www.site1.com/rdf
:A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
:A1 :ProdYear 2007
:A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
:A2 :Artist “Ziad Rahbani”
www.site2.com/rdf
:B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
:B1 :Artist “Celine Dion”
:B1 :Artist “Josh Groban”
:B1 :Year 2008
:B2 rdf:Type bibo:Album
:B2 :Title “Miracle”
:B2 :Author “Celine Dion”
:B2 :Year 2004
RDF Input
From:
http://www.site1.com/rdf
http://www.site2.com/rdf
Query
Everything
Celine Dion’s
Albums after
2003?
Jarrar © 2013 28
MashQL Example
RDF Input
From:
Query
Everything
http://www.site1.com/rdf
http://www.site2.com/rdf
Types
Album
A1
A2
B1
Everything
Individuals
www.site1.com/rdf
:A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
:A1 :ProdYear 2007
:A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
:A2 :Artist “Ziad Rahbani”
www.site2.com/rdf
:B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
:B1 :Artist “Celine Dion”
:B1 :Artist “Josh Groban”
:B1 :Year 2008
:B2 rdf:Type bibo:Album
:B2 :Title “Miracle”
:B2 :Author “Celine Dion”
:B2 :Year 2004
SELECT X WHERE {?S rdf:Type ?X}
SELECT X WHERE {?X ?P ?O}
Union
SELECT X WHERE {?S ?P ?X}
Background queries
Celine Dion’s
Albums after
2003?
Jarrar © 2013 29
MashQL Example
RDF Input
From:
Query
Everything
http://www.site1.com/rdf
http://www.site2.com/rdf
Author
Title
ProdYear
Type
Year
Artist
Artist
Equals
Contains
OneOf
Not
Between
LessThan
MoreThan
Equals
Contains
DionContains
www.site1.com/rdf
:A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
:A1 :ProdYear 2007
:A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
:A2 :Artist “Ziad Rahbani”
www.site2.com/rdf
:B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
:B1 :Artist “Celine Dion”
:B1 :Artist “Josh Groban”
:B1 :Year 2008
:B2 rdf:Type bibo:Album
:B2 :Title “Miracle”
:B2 :Author “Celine Dion”
:B2 :Year 2004
SELECT P WHERE {?Everything ?P ?O}
Background query
Celine Dion’s
Albums after
2003?
Jarrar © 2013 30
MashQL Example
www.site1.com/rdf
:A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
:A1 :ProdYear 2007
:A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
:A2 :Artist “Ziad Rahbani”
www.site2.com/rdf
:B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
:B1 :Artist “Celine Dion”
:B1 :Artist “Josh Groban”
:B1 :Year 2008
:B2 rdf:Type bibo:Album
:B2 :Title “Miracle”
:B2 :Author “Celine Dion”
:B2 :Year 2004
RDF Input
From:
Query
Everything
http://www.site1.com/rdf
http://www.site2.com/rdf
Author
Title
ProdYear
Type
Year
Artist
Artist “^Dion
Year
ProdYear
YearProdYear
Equals
Contains
OneOf
Not
Between
LessThan
MoreThanMoreThan
MoreTh 2003
Equals
SELECT P WHERE {?Everything ?P ?O}
Background query
Celine Dion’s
Albums after
2003?
Jarrar © 2013 31
MashQL Example
www.site2.com/rdf
:B1 rdf:Type bibo:Album
:B1 :Title “The prayer”
:B1 :Artist “Celine Dion”
:B1 :Artist “Josh Groban”
:B1 :Year 2008
:B2 rdf:Type bibo:Album
:B2 :Title “Miracle”
:B2 :Author “Celine Dion”
:B2 :Year 2004
RDF Input
From:
Query
Everything
http://www.site1.com/rdf
http://www.site2.com/rdf
Artist “^Dion
YearProdYear > 2003
Author
Title
ProdYear
Type
Year
Artist
Title
Title AlbumTitle
www.site1.com/rdf
:A1 :Title “Taking Chances”
:A1 :Artist “Dion C.”
:A1 :ProdYear 2007
:A1 :Producer “Peer Astrom”
:A2 :Title “Monodose”
:A2 :Artist “Ziad Rahbani”
SELECT P WHERE {?Everything ?P ?O}
Background query
Celine Dion’s
Albums after
2003?
Jarrar © 2013 32
MashQL Example
RDF Input
From:
Query
Everything
http://www.site1.com/rdf
http://www.site2.com/rdf
Artist “^Dion”
YearProdYear > 2003
Title AlbumTitle PREFIX S1: <http://site1.com/rdf>
PREFIX S2: <http://site1.com/rdf>
SELECT ?AlbumTitle
FROM <http://site1.com/rdf>
FROM <http://site2.com/rdf>
WHERE {
{?X S1:Artist ?X1} UNION {?X S2:Artist ?X1}
{?X S1:ProdYear ?X2} UNION {?X S2:Year ?X2}
{{?X S1:Title ?AlbumTitle} UNION {?X
S2:Title ?AlbumTitle}}
FILTER regex(?X1, “^Dion”)
FILTER (?X2 > 2003)}
Celine Dion’s
Albums after
2003?
Jarrar © 2013 33
MashQL Query Optimization
• The major challenge in MashQL is interactivity!
• User interaction must be within 100ms.
• However, querying RDF is typically slow.
• How about dealing with huge datasets!?
• Implemented Solutions (i.e., Oracle) proved to perform
weakly specially in MashQL background queries.
A query optimization solution was developed.,
called Graph Signature Index.
Jarrar © 2013 34
MashQL Query Optimization: Graph Signature
Summarizing the RDF dataset
(Algorithms were developed)
Then, executing the queries on the summary
instead of the original dataset
(A new execution plan was developed)
‘Graph Signature’ optimizes RDF queries by :
Jarrar © 2013 35
Experiments
• Experiments performed on 15M SPO triples (rows) of the Yago Dataset.
• Graph Signature Index of less than 500K was built on the data.
• Two sets of queries were performed, and the results were as follows:
Query Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9
GS 0.441 0.578 0.587 0.556 0.290 0.291 0.587 0.785 2.469
Oracle 25.640 29.875 29.593 29.641 19.922 21.922 62.734 64.813 32.703
Query A1 A2 A3 A4 A5
GS 0.344 1.953 5.250 10.234 24.672
Oracle 110.390 302.672 525.844 702.969 >20min
Extreme-case linear queries of depth 1-5:
Queries that cover all MashQL Background queries some queries that
might occur as the final queries generated in MashQL:
Jarrar © 2013 36
References
1. Anton Deik, Bilal Faraj, Ala Hawash, Mustafa Jarrar: Towards Query
Optimization for the Data Web - Two Disk-Based algorithms: Trace
Equivalence and Bisimilarity.
2. Mustafa Jarrar and Marios D. Dikaiakos: A Query Formulation
Language for the Data Web. IEEE Transactions on Knowledge and
Data Engineering.IEEE Computer Society.
3. Chong E, Das S, Eadon G, Srinivasan J: An efficient SQL-based RDF
querying scheme. VLDB’05.
4. Abadi et al, Scalable Semantic Web Data Management Using Vertical
Partitioning. In VLDB 2007.
5. Neumann T, Weikum G: RDF3X: RISC style engine for RDF.
VLDB’08.

Más contenido relacionado

La actualidad más candente

2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
Josef Petrák
 
SWT Lecture Session 9 - RDB2RDF direct mapping
SWT Lecture Session 9 - RDB2RDF direct mappingSWT Lecture Session 9 - RDB2RDF direct mapping
SWT Lecture Session 9 - RDB2RDF direct mapping
Mariano Rodriguez-Muro
 

La actualidad más candente (20)

Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)
 
RDF and OWL
RDF and OWLRDF and OWL
RDF and OWL
 
Rdf Overview Presentation
Rdf Overview PresentationRdf Overview Presentation
Rdf Overview Presentation
 
Jarrar: OWL -Web Ontology Language
Jarrar: OWL -Web Ontology LanguageJarrar: OWL -Web Ontology Language
Jarrar: OWL -Web Ontology Language
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
SWT Lecture Session 3 - SPARQL
SWT Lecture Session 3 - SPARQLSWT Lecture Session 3 - SPARQL
SWT Lecture Session 3 - SPARQL
 
What’s in a structured value?
What’s in a structured value?What’s in a structured value?
What’s in a structured value?
 
SPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queriesSPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queries
 
SWT Lecture Session 2 - RDF
SWT Lecture Session 2 - RDFSWT Lecture Session 2 - RDF
SWT Lecture Session 2 - RDF
 
5 rdfs
5 rdfs5 rdfs
5 rdfs
 
Everything you wanted to know about Dublin Core metadata
Everything you wanted to know about Dublin Core metadataEverything you wanted to know about Dublin Core metadata
Everything you wanted to know about Dublin Core metadata
 
Two graph data models : RDF and Property Graphs
Two graph data models : RDF and Property GraphsTwo graph data models : RDF and Property Graphs
Two graph data models : RDF and Property Graphs
 
"RDFa - what, why and how?" by Mike Hewett and Shamod Lacoul
"RDFa - what, why and how?" by Mike Hewett and Shamod Lacoul"RDFa - what, why and how?" by Mike Hewett and Shamod Lacoul
"RDFa - what, why and how?" by Mike Hewett and Shamod Lacoul
 
RDFa Tutorial
RDFa TutorialRDFa Tutorial
RDFa Tutorial
 
RDFa: introduction, comparison with microdata and microformats and how to use it
RDFa: introduction, comparison with microdata and microformats and how to use itRDFa: introduction, comparison with microdata and microformats and how to use it
RDFa: introduction, comparison with microdata and microformats and how to use it
 
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
 
SWT Lecture Session 9 - RDB2RDF direct mapping
SWT Lecture Session 9 - RDB2RDF direct mappingSWT Lecture Session 9 - RDB2RDF direct mapping
SWT Lecture Session 9 - RDB2RDF direct mapping
 
SWT Lecture Session 11 - R2RML part 2
SWT Lecture Session 11 - R2RML part 2SWT Lecture Session 11 - R2RML part 2
SWT Lecture Session 11 - R2RML part 2
 
SWT Lecture Session 8 - Rules
SWT Lecture Session 8 - RulesSWT Lecture Session 8 - Rules
SWT Lecture Session 8 - Rules
 

Destacado

Pal gov.tutorial2.session9.lab rdf-stores
Pal gov.tutorial2.session9.lab rdf-storesPal gov.tutorial2.session9.lab rdf-stores
Pal gov.tutorial2.session9.lab rdf-stores
Mustafa Jarrar
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql Project
Mustafa Jarrar
 
Jarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFJarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDF
Mustafa Jarrar
 
Jarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsJarrar: Web 2 Data Mashups
Jarrar: Web 2 Data Mashups
Mustafa Jarrar
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web VesionJarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Mustafa Jarrar
 

Destacado (20)

A Data Mashup Language for the Data Web
A Data Mashup Language for the Data WebA Data Mashup Language for the Data Web
A Data Mashup Language for the Data Web
 
Jarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageJarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query Language
 
Jarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF SchemaJarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF Schema
 
Pal gov.tutorial2.session9.lab rdf-stores
Pal gov.tutorial2.session9.lab rdf-storesPal gov.tutorial2.session9.lab rdf-stores
Pal gov.tutorial2.session9.lab rdf-stores
 
Jarrar: Introduction to Linked Data
Jarrar: Introduction to Linked DataJarrar: Introduction to Linked Data
Jarrar: Introduction to Linked Data
 
Jarrar: Zinnar
Jarrar: ZinnarJarrar: Zinnar
Jarrar: Zinnar
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql Project
 
Jarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFJarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDF
 
Jarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and ConstraintsJarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and Constraints
 
Jarrar: Linked Data
Jarrar: Linked DataJarrar: Linked Data
Jarrar: Linked Data
 
Jarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data IntegrationJarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data Integration
 
Jarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course OutlineJarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course Outline
 
Jarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsJarrar: Web 2 Data Mashups
Jarrar: Web 2 Data Mashups
 
Jarrar: Introduction to Data Integration
Jarrar: Introduction to Data IntegrationJarrar: Introduction to Data Integration
Jarrar: Introduction to Data Integration
 
Jarrar: Data Fusion using RDF
Jarrar: Data Fusion using RDFJarrar: Data Fusion using RDF
Jarrar: Data Fusion using RDF
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web VesionJarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
 
Jarrar: RDFa
Jarrar: RDFaJarrar: RDFa
Jarrar: RDFa
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic WebJarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
 
Jarrar: OWL (Web Ontology Language)
Jarrar: OWL (Web Ontology Language)Jarrar: OWL (Web Ontology Language)
Jarrar: OWL (Web Ontology Language)
 
Jarrar: Conceptual Schema Design Steps
Jarrar: Conceptual Schema Design Steps Jarrar: Conceptual Schema Design Steps
Jarrar: Conceptual Schema Design Steps
 

Más de Mustafa Jarrar

Habash: Arabic Natural Language Processing
Habash: Arabic Natural Language ProcessingHabash: Arabic Natural Language Processing
Habash: Arabic Natural Language Processing
Mustafa Jarrar
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
Mustafa Jarrar
 

Más de Mustafa Jarrar (20)

Clustering Arabic Tweets for Sentiment Analysis
Clustering Arabic Tweets for Sentiment AnalysisClustering Arabic Tweets for Sentiment Analysis
Clustering Arabic Tweets for Sentiment Analysis
 
Classifying Processes and Basic Formal Ontology
Classifying Processes  and Basic Formal OntologyClassifying Processes  and Basic Formal Ontology
Classifying Processes and Basic Formal Ontology
 
Discrete Mathematics Course Outline
Discrete Mathematics Course OutlineDiscrete Mathematics Course Outline
Discrete Mathematics Course Outline
 
Business Process Implementation
Business Process ImplementationBusiness Process Implementation
Business Process Implementation
 
Business Process Design and Re-engineering
Business Process Design and Re-engineeringBusiness Process Design and Re-engineering
Business Process Design and Re-engineering
 
BPMN 2.0 Analytical Constructs
BPMN 2.0 Analytical ConstructsBPMN 2.0 Analytical Constructs
BPMN 2.0 Analytical Constructs
 
BPMN 2.0 Descriptive Constructs
BPMN 2.0 Descriptive Constructs  BPMN 2.0 Descriptive Constructs
BPMN 2.0 Descriptive Constructs
 
Introduction to Business Process Management
Introduction to Business Process ManagementIntroduction to Business Process Management
Introduction to Business Process Management
 
Customer Complaint Ontology
Customer Complaint Ontology Customer Complaint Ontology
Customer Complaint Ontology
 
Subset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion RulesSubset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion Rules
 
Schema Modularization in ORM
Schema Modularization in ORMSchema Modularization in ORM
Schema Modularization in ORM
 
On Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in PalestineOn Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in Palestine
 
Lessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online CoursesLessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online Courses
 
Presentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-finalPresentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-final
 
Jarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 CallsJarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 Calls
 
Habash: Arabic Natural Language Processing
Habash: Arabic Natural Language ProcessingHabash: Arabic Natural Language Processing
Habash: Arabic Natural Language Processing
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
Riestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 ProposalsRiestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 Proposals
 
Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020
 
Jarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology EngineeringJarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology Engineering
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Jarrar: RDF Stores -Challenges and Solutions

  • 1. Jarrar © 2013 1 Dr. Mustafa Jarrar University of Birzeit mjarrar@birzeit.edu www.jarrar.info Lecture Notes on RDF Stores Birzeit University, Palestine 2013 RDF Stores Challenges and Solutions
  • 2. Jarrar © 2013 2 Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2014/01/web-data-management.html
  • 3. Jarrar © 2013 3 Lecture Outline Part I: Querying RDF(S P O) tables using SQL Part 2: Practical Session (RDF graphs ) Part 3: SQL-based RDF Stores Keywords: RDF, RDF Stores, querying SPO Table, Graph Databases, Querying Graph, self joins, RDF3X, Oracle Semantic Technology, C-Store, vertical patronizing, MashQL, Graph Signature, Semantic Web, Data Web,
  • 4. Jarrar © 2013 4 Part 1 Querying RDF(S P O) tables using SQL
  • 5. Jarrar © 2013 5 RDF as a Data Model RDF (Resource Description Framework) is the Data Model behind the Semantic/Data Web. RDF is a graph-based data model. In RDF, Data is represented as triples: <Subject, Predicate, Object>, or in short: <S,P,O>. P S O
  • 6. Jarrar © 2013 6 RDF as a Data Model RDF’s way of representing data as triples is more elementary than Databases or XML. – This enables easy data integration and interoperability between systems (as will be demonstrated later). • EXAMPLE: the fact that the movie Sicko (2007) is directed by Michael Moore from the USA can be represented by the following triples which form a directed labeled graph: M1 2007 Michael Moore Sicko name D1 < M1, name, Sicko > < M1, year, 2007 > < M1, directedBy, D1 > < D1, name, Michael Moore > < D1, country, C1 > < C1, name, USA> Based on [1]
  • 7. Jarrar © 2013 7 RDF graphs as an SPO table Typically, an RDF Graph is stored as one <S,P,O> Table in a Relational Database Management System. S P O M1 year 2007 M1 Name Sicko M1 directedBy D1 M2 directedBy D1 M2 Year 2009 M2 Name Capitalism M3 Year 1995 M3 directedBy D2 M3 Name Brave Heart M4 Year 2007 M4 directedBy D3 M4 Name Caramel D1 Name Michael Moore D1 hasWonPrizeIn P1 D1 Country C1 D2 Counrty C1 D2 hasWonPrizeIn P2 D2 Name Mel Gibson D2 actedIn M3 D3 Name Nadine Labaki D3 Country C2 D3 hasWonPrizeIn P3 D3 actedIn M4 … … … M1 M2 M3 2007 directedBy D1 Michael Moore P1 Emmy Awards 2009 C1 USA Washington DC directedBy D2 actedIn 1995 Brave Heart P2 Oscars 1996 M4 directedBy D3 actedIn 2007 Caramel Mel Gibson C2 Lebanon Beirut Nadine Labaki P3 Stockholm Festival 2007 name C3 Sweden Stockholm location 1995 Capitalism Sicko
  • 8. Jarrar © 2013 8 RDF graphs as an SPO table How do we query graph-shaped data stored in one relational table? How do we traverse a graph stored in one relational table?  Self Joins! S P O M1 year 2007 M1 Name Sicko M1 directedBy D1 M2 directedBy D1 M2 Year 2009 M2 Name Capitalism M3 Year 1995 M3 directedBy D2 M3 Name Brave Heart M4 Year 2007 M4 directedBy D3 M4 Name Caramel D1 Name Michael Moore D1 hasWonPrizeIn P1 D1 Country C1 D2 Counrty C1 D2 hasWonPrizeIn P2 D2 Name Mel Gibson D2 actedIn M3 D3 Name Nadine Labaki D3 Country C2 D3 hasWonPrizeIn P3 D3 actedIn M4 … … …
  • 9. Jarrar © 2013 9 How to Query RDF data stored in SPO table? Exercises: (1) What is the name of director D3? (2) What is the name of the director of the movie M1? (3) List all the movies who have directors from the USA and their directors. (4) List all the names of the directors from Lebanon who have won prizes and the prizes they have won. S P O M1 year 2007 M1 Name Sicko M1 directedBy D1 M2 directedBy D1 M2 Year 2009 M2 Name Capitalism M3 Year 1995 M3 directedBy D2 M3 Name Brave Heart M4 Year 2007 M4 directedBy D3 M4 Name Caramel D1 Name Michael Moore D1 hasWonPrizeIn P1 … … … M1 M2 M3 2007 directedBy D1 Michael Moore P1 Emmy Awards 2009 C1 USA Washington DC directedBy D2 actedIn 1995 Brave Heart P2 Oscars 1996 M4 directedBy D3 actedIn 2007 Caramel Mel Gibson C2 Lebanon Beirut Nadine Labaki P3 Stockholm Festival 2007 name C3 Sweden Stockholm location 1995 Capitalism Sicko
  • 10. Jarrar © 2013 10 How to Query RDF data stored in SPO table? Exercise 1: A simple query. – What is the name of director D3? – Consider the table MoviesTable(s,p,o). – SQL: Select o From MoviesTable T Where T.s = ‘D3’ AND T.p = ‘Name’; S P O M1 year 2007 M1 Name Sicko M1 directedBy D1 M2 directedBy D1 M2 Year 2009 M2 Name Capitalism M3 Year 1995 M3 directedBy D2 M3 Name Brave Heart M4 Year 2007 M4 directedBy D3 M4 Name Caramel D1 Name Michael Moore D1 hasWonPrizeIn P1 D1 Country C1 D2 Counrty C1 D2 hasWonPrizeIn P2 D2 Name Mel Gibson D2 actedIn M3 D3 Name Nadine Labaki D3 Country C2 D3 hasWonPrizeIn P3 D3 actedIn M4 … … … Answer: Nadine Labaki
  • 11. Jarrar © 2013 11 How to Query RDF data stored in SPO table? Exercise 2: A path query with 1 join. – What is the name of the director of the movie M1. – Consider the table MoviesTable (s,p,o). – SQL: Select T2.o From MoviesTable T1, MoviesTable T2 Where {T1.s = ‘M1’ AND T1.p = ‘directedBy’ AND T1.o = T2.s AND T2.p = ‘name’}; S P O M1 year 2007 M1 Name Sicko M1 directedBy D1 M2 directedBy D1 M2 Year 2009 M2 Name Capitalism M3 Year 1995 M3 directedBy D2 M3 Name Brave Heart M4 Year 2007 M4 directedBy D3 M4 Name Caramel D1 Name Michael Moore D1 hasWonPrizeIn P1 D1 Country C1 D2 Counrty C1 D2 hasWonPrizeIn P2 D2 Name Mel Gibson D2 actedIn M3 D3 Name Nadine Labaki D3 Country C2 D3 hasWonPrizeIn P3 D3 actedIn M4 … … … Answer: Michael Moore
  • 12. Jarrar © 2013 12 How to Query RDF data stored in SPO table? Exercise 3: A path query with 2 joins. – List all the movies who have directors from the USA and their directors. – Consider the table MoviesTable(s,p,o). – SQL: Select T1.s, T1.o From MoviesTable T1, MoviesTable T2, MoviesTable T3 Where {T1.o = T2.s AND T2.o = T3.s AND T1.p = ‘directedBy’ AND T2.p = ‘country’ AND T3.p = ‘name’ AND T3.o = ‘USA’}; S P O M1 year 2007 M1 Name Sicko M1 directedBy D1 M2 directedBy D1 M2 Year 2009 M2 Name Capitalism M3 Year 1995 M3 directedBy D2 M3 Name Brave Heart … … … D1 Name Michael Moore D1 hasWonPrizeIn P1 D1 Country C1 D2 Counrty C1 D2 hasWonPrizeIn P2 D2 Name Mel Gibson D2 actedIn M3 … … … C1 Name USA C1 Capital Washington DC C2 Name Lebanon C2 Capital Beirut … … … Answer: M1 D1; M2 D1; M3 D2
  • 13. Jarrar © 2013 13 How to Query RDF data stored in SPO table? Exercise 4: Star query. – List all the names of the directors from Lebanon who have won prizes and the prizes they have won. – Consider the table MoviesTable (S,P,O). Select T1.o, T4.o From MoviesTable T1, MoviesTable T2, MoviesTable T3, MoviesTable T4 Where {T1.p = ‘name’ AND T2.s = T1.s AND T2.p = ‘country’ AND T2.o = T3.s AND T3.p = ‘name’ AND T3.o = ‘Lebanon’ AND T4.s = T1.s AND T4.p = ‘hasWonPrizeIn’}; S P O D1 Name Michael Moore D1 hasWonPrizeIn P1 D1 Country C1 D2 Counrty C1 D2 hasWonPrizeIn P2 D2 Name Mel Gibson D2 actedIn M3 D3 Name Nadine Labaki D3 Country C2 D3 hasWonPrizeIn P3 D3 actedIn M4 … … … C1 Name USA C1 Capital Washington DC C2 Name Lebanon C2 Capital Beirut … … … Answer: ‘Nadine Labaki’ , P3
  • 14. Jarrar © 2013 14 Part 2: Querying SPO Tables using SQL
  • 15. Jarrar © 2013 15 Practical Session Given the RDF graph in the next slide, do the following: (1) Insert the data graph as <S,P,O> triples in a 3-column table in any DBMS. Schema of the table: Books (Subject, Predicate, Object) (2) Write the following queries in SQL and execute them over the newly created table: • List all the authors born in a country which has the name Palestine. • List the names of all authors with the name of their affiliation who are born in a country whose capital’s population is 14M. Note that the author must have an affiliation. • List the names of all books whose authors are born in Lebanon along with the name of the author.
  • 16. Jarrar © 2013 16 BK1 BK2 BK3 BK4 AU1 AU2 AU3 AU4 CN1 CN2 CA1 CA2 Viswanathan CN3 CA3 Said Wamadat TheProphet Naima Gibran ColombiaUniversity Palestine India Lebanon Jerusalem New Delhi Beirut 7.6K 2.0M 14.0M CU Author Author Author Name Capital Capital Name Name This data graph is about books. It talks about four books (BK1-BK4). Information recorded about a book includes data such as; its author, affiliation, country of birth including its capital and the population of its capital.
  • 17. Jarrar © 2013 17 Practical Session • Each student should work alone. • The student is encouraged to first answer each query mentally from the graph before writing and executing the SQL query, and to compare the results of the query execution with his/her expected results. • The student is strongly recommended to write two additional queries, execute them on the data graph, and hand them along with the required queries. • The student must highlight problems and challenges of executing queries on graph-shaped data and propose solutions to the problem. • Each student must expect to present his/her queries at class and, more importantly, he/she must be ready to discuss the problems of querying graph-shaped data and his/her proposed solutions. • The final delivery should include (i) a snapshot of the built table, (ii) the SQL queries, (iii) The results of the queries, and (iv) a one-page discussion of the problems of querying graph-shaped data and the proposed possible solutions. These must be handed in a report form in PDF Format.
  • 18. Jarrar © 2013 18 Part 3 SQL-based RDF Stores Querying Graph-Shaped Data using SQL
  • 19. Jarrar © 2013 19 Complexity of querying graph-shaped data Where does the complexity of querying graph-shaped data stored in a relational table lie? The complexity of querying a data graph lies in the many self- joins that are to be performed on the table. Accessing multiple properties for a resource requires subject- subject joins (Star-shaped queries). Path expressions require subject-object joins. In general, a query with n edges on an SPO table requires n-1 self-joins of that table.
  • 20. Jarrar © 2013 20 Solution-1: Subject-Property Matrixes Clustered Property Table: • A clustered property table, contains clusters of properties tend to be defined together. • Multiple property tables with different clusters of properties may be created to optimize queries. • A key requirement for this type of property table is that a particular property may only appear in at most one property table. • Advantage: Some queries can be answered directly from the property table with no joins. - Ex: retrieve the title of the book authored in 2001.Oracle Solution [3] implemented in 11g
  • 21. Jarrar © 2013 21 Solution-2: Vertically Partitioning • Triples table is rewritten into n two column tables where n is the number of unique properties. • In each of these tables, the first column contains the subjects that define that property and the second column contains the object values for those subjects. • An experimental implementation of this approach was performed using an open source column- oriented database system (C-Store). Six Vertically Partitioned Tables: C-Store Solution [4]
  • 22. Jarrar © 2013 22 Solution-3: Indexes and Dictionaries Build Indexes. – On each column: S, P, O – On Permutations of two columns: SP, SO, PS, PO, … – On Permutation of three columns: SPO, SOP, PSO, POS, OSP, OPS, … Dictionary Encoding String Data? – Replacing all literals by IDs. – Triple stores are compressed. – Allows for faster comparison and matching operations.  RDF3X Solution [5]
  • 23. Jarrar © 2013 23 Discussion Advantages and Disadvantages of storing RDF in other schemas? What do you prefer? Any other suggestions to optimize queries over graph-shaped data?
  • 24. Jarrar © 2013 24 MashQL (Query Formulation & indexing Graphs) What about summarizing the data graph, and instead of querying the original data, we query the summary and get the same results? What about summarizing 15M SPO triples into less then 500,000? This is called Graph Signature Indexing. It is currently in research at Birzeit University. Initial comparison between GS performance and Oracle’s Semantic Technologies (Research started at the University of Cyprus, then at Birzeit University)
  • 25. Jarrar © 2013 25 MashQL (Query Formulation & indexing Graphs) A graphical query formulation language for the Data Web A general structured-data retrieval language (not merely an interface) Addresses all of the following challenges:  The user does not know the schema.  There is no offline or inline schemaontology.  The query may involve multiple sources.  The query language is expressive (not a single-purpose interface) RDF Input From: Query Everything http://www.site1.com/rdf http://www.site2.com/rdf Artist “^Dion” YearProdYear > 2003 Title AlbumTitle Based on [2]
  • 26. Jarrar © 2013 26 MashQL Example RDF Input From: Query Everything http://www.site1.com/rdf http://www.site2.com/rdf Artist “^Dion” YearProdYear > 2003 Title AlbumTitle  Interactive Query Formulation.  MashQL queries are translated into and executed as SPARQL. Celine Dion’s Albums after 2003? www.site1.com/rdf :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” :A1 :ProdYear 2007 :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” :A2 :Artist “Ziad Rahbani” www.site2.com/rdf :B1 rdf:Type bibo:Album :B1 :Title “The prayer” :B1 :Artist “Celine Dion” :B1 :Artist “Josh Groban” :B1 :Year 2008 :B2 rdf:Type bibo:Album :B2 :Title “Miracle” :B2 :Author “Celine Dion” :B2 :Year 2004
  • 27. Jarrar © 2013 27 MashQL Example www.site1.com/rdf :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” :A1 :ProdYear 2007 :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” :A2 :Artist “Ziad Rahbani” www.site2.com/rdf :B1 rdf:Type bibo:Album :B1 :Title “The prayer” :B1 :Artist “Celine Dion” :B1 :Artist “Josh Groban” :B1 :Year 2008 :B2 rdf:Type bibo:Album :B2 :Title “Miracle” :B2 :Author “Celine Dion” :B2 :Year 2004 RDF Input From: http://www.site1.com/rdf http://www.site2.com/rdf Query Everything Celine Dion’s Albums after 2003?
  • 28. Jarrar © 2013 28 MashQL Example RDF Input From: Query Everything http://www.site1.com/rdf http://www.site2.com/rdf Types Album A1 A2 B1 Everything Individuals www.site1.com/rdf :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” :A1 :ProdYear 2007 :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” :A2 :Artist “Ziad Rahbani” www.site2.com/rdf :B1 rdf:Type bibo:Album :B1 :Title “The prayer” :B1 :Artist “Celine Dion” :B1 :Artist “Josh Groban” :B1 :Year 2008 :B2 rdf:Type bibo:Album :B2 :Title “Miracle” :B2 :Author “Celine Dion” :B2 :Year 2004 SELECT X WHERE {?S rdf:Type ?X} SELECT X WHERE {?X ?P ?O} Union SELECT X WHERE {?S ?P ?X} Background queries Celine Dion’s Albums after 2003?
  • 29. Jarrar © 2013 29 MashQL Example RDF Input From: Query Everything http://www.site1.com/rdf http://www.site2.com/rdf Author Title ProdYear Type Year Artist Artist Equals Contains OneOf Not Between LessThan MoreThan Equals Contains DionContains www.site1.com/rdf :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” :A1 :ProdYear 2007 :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” :A2 :Artist “Ziad Rahbani” www.site2.com/rdf :B1 rdf:Type bibo:Album :B1 :Title “The prayer” :B1 :Artist “Celine Dion” :B1 :Artist “Josh Groban” :B1 :Year 2008 :B2 rdf:Type bibo:Album :B2 :Title “Miracle” :B2 :Author “Celine Dion” :B2 :Year 2004 SELECT P WHERE {?Everything ?P ?O} Background query Celine Dion’s Albums after 2003?
  • 30. Jarrar © 2013 30 MashQL Example www.site1.com/rdf :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” :A1 :ProdYear 2007 :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” :A2 :Artist “Ziad Rahbani” www.site2.com/rdf :B1 rdf:Type bibo:Album :B1 :Title “The prayer” :B1 :Artist “Celine Dion” :B1 :Artist “Josh Groban” :B1 :Year 2008 :B2 rdf:Type bibo:Album :B2 :Title “Miracle” :B2 :Author “Celine Dion” :B2 :Year 2004 RDF Input From: Query Everything http://www.site1.com/rdf http://www.site2.com/rdf Author Title ProdYear Type Year Artist Artist “^Dion Year ProdYear YearProdYear Equals Contains OneOf Not Between LessThan MoreThanMoreThan MoreTh 2003 Equals SELECT P WHERE {?Everything ?P ?O} Background query Celine Dion’s Albums after 2003?
  • 31. Jarrar © 2013 31 MashQL Example www.site2.com/rdf :B1 rdf:Type bibo:Album :B1 :Title “The prayer” :B1 :Artist “Celine Dion” :B1 :Artist “Josh Groban” :B1 :Year 2008 :B2 rdf:Type bibo:Album :B2 :Title “Miracle” :B2 :Author “Celine Dion” :B2 :Year 2004 RDF Input From: Query Everything http://www.site1.com/rdf http://www.site2.com/rdf Artist “^Dion YearProdYear > 2003 Author Title ProdYear Type Year Artist Title Title AlbumTitle www.site1.com/rdf :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” :A1 :ProdYear 2007 :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” :A2 :Artist “Ziad Rahbani” SELECT P WHERE {?Everything ?P ?O} Background query Celine Dion’s Albums after 2003?
  • 32. Jarrar © 2013 32 MashQL Example RDF Input From: Query Everything http://www.site1.com/rdf http://www.site2.com/rdf Artist “^Dion” YearProdYear > 2003 Title AlbumTitle PREFIX S1: <http://site1.com/rdf> PREFIX S2: <http://site1.com/rdf> SELECT ?AlbumTitle FROM <http://site1.com/rdf> FROM <http://site2.com/rdf> WHERE { {?X S1:Artist ?X1} UNION {?X S2:Artist ?X1} {?X S1:ProdYear ?X2} UNION {?X S2:Year ?X2} {{?X S1:Title ?AlbumTitle} UNION {?X S2:Title ?AlbumTitle}} FILTER regex(?X1, “^Dion”) FILTER (?X2 > 2003)} Celine Dion’s Albums after 2003?
  • 33. Jarrar © 2013 33 MashQL Query Optimization • The major challenge in MashQL is interactivity! • User interaction must be within 100ms. • However, querying RDF is typically slow. • How about dealing with huge datasets!? • Implemented Solutions (i.e., Oracle) proved to perform weakly specially in MashQL background queries. A query optimization solution was developed., called Graph Signature Index.
  • 34. Jarrar © 2013 34 MashQL Query Optimization: Graph Signature Summarizing the RDF dataset (Algorithms were developed) Then, executing the queries on the summary instead of the original dataset (A new execution plan was developed) ‘Graph Signature’ optimizes RDF queries by :
  • 35. Jarrar © 2013 35 Experiments • Experiments performed on 15M SPO triples (rows) of the Yago Dataset. • Graph Signature Index of less than 500K was built on the data. • Two sets of queries were performed, and the results were as follows: Query Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 GS 0.441 0.578 0.587 0.556 0.290 0.291 0.587 0.785 2.469 Oracle 25.640 29.875 29.593 29.641 19.922 21.922 62.734 64.813 32.703 Query A1 A2 A3 A4 A5 GS 0.344 1.953 5.250 10.234 24.672 Oracle 110.390 302.672 525.844 702.969 >20min Extreme-case linear queries of depth 1-5: Queries that cover all MashQL Background queries some queries that might occur as the final queries generated in MashQL:
  • 36. Jarrar © 2013 36 References 1. Anton Deik, Bilal Faraj, Ala Hawash, Mustafa Jarrar: Towards Query Optimization for the Data Web - Two Disk-Based algorithms: Trace Equivalence and Bisimilarity. 2. Mustafa Jarrar and Marios D. Dikaiakos: A Query Formulation Language for the Data Web. IEEE Transactions on Knowledge and Data Engineering.IEEE Computer Society. 3. Chong E, Das S, Eadon G, Srinivasan J: An efficient SQL-based RDF querying scheme. VLDB’05. 4. Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007. 5. Neumann T, Weikum G: RDF3X: RISC style engine for RDF. VLDB’08.