Efficient Source Selection for SPARQL Endpoint Federation

EFFICIENT SOURCE SELECTION FOR SPARQL ENDPOINT QUERY FEDERATION
Muhammad Saleem
Faculty of Mathematics and Computer Science
University of Leipzig
PhD Defense
May 13th, 2016
SUPERVISORS
PROF. DR.-ING. HABIL. KLAUS-PETER FÄHNRICH, UNIVERSITY OF LEIPZIG
DR. AXEL-CYRILLE NGONGA NGOMO, UNIVERSITY OF LEIPZIG

OUTLINE
1. Introduction
2. Problem Statement
3. State-of-the-art Analysis
4. HIBISCUS: Hypergraph-based source selection
5. DAW: Duplicate-aware source selection
6. SAFE: Policy-aware source selection
7. TopFed: Data distribution-aware source selection
8. FEASIBLE and LSQ
9. LargeRDFBench
10. Conclusion
11. Publication and Awards

INTRODUCTION
 Linked, decentralized, and distributed architecture
 9,960 datasets
 ~150B triples
 Complex information needs
 Need for federated queries

INTRODUCTION: EXAMPLE
Return the party membership and news pages about all US presidents.
 Party memberships
 US presidents
 US presidents
 News pages
 Computing the results requires data from both sources

INTRODUCTION: EXECUTION OF FEDERATION
Federation engine over the RDF sources S1, S2, S3, S4:
1. Parsing/Rewriting: rewrite the query and get individual triple patterns
2. Source Selection: identify capable/relevant sources
3. Federator Optimizer: generate an optimized query execution plan
4. Execute sub-queries at the sources
5. Integrator: integrate sub-query results

MOTIVATION: SOURCE SELECTION
FedBench (LD3): Return for all US presidents their party
membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .                  # TP1
?president dbpedia:nationality dbpedia:United_States .   # TP2
?president dbpedia:party ?party .                        # TP3
?x nyt:topicPage ?page .                                 # TP4
?x owl:sameAs ?president .                               # TP5
}
Source Selection Algorithm: triple pattern-wise source selection
Sources (RDF): S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = Geo, S9 = DrugBank
TP1 = S1

Triple pattern-wise source selection, step by step:
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4, S5-S9
Total triple pattern-wise sources selected = 1+1+1+1+8 = 12
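The count of 12 can be reproduced with a minimal Python sketch of triple pattern-wise source selection. The predicate index below is a hypothetical stand-in for asking each endpoint (for example with SPARQL ASK) whether it can answer a pattern; its contents are assumed so that the outcome matches the slide and are not taken from the real datasets.

```python
# Hypothetical predicate index: sources that hold triples for a predicate.
# The contents are assumptions chosen to mirror the example on the slide.
PREDICATE_INDEX = {
    "rdf:type": {"S1"},  # simplified: only DBpedia holds dbpedia:President
    "dbpedia:nationality": {"S1"},
    "dbpedia:party": {"S1"},
    "nyt:topicPage": {"S4"},
    "owl:sameAs": {"S1", "S2", "S4", "S5", "S6", "S7", "S8", "S9"},
}

QUERY = [
    ("?president", "rdf:type", "dbpedia:President"),                 # TP1
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),  # TP2
    ("?president", "dbpedia:party", "?party"),                       # TP3
    ("?x", "nyt:topicPage", "?page"),                                # TP4
    ("?x", "owl:sameAs", "?president"),                              # TP5
]


def select_sources(triple_patterns, index):
    """Return, per triple pattern, every source whose index lists the predicate."""
    selection = {}
    for number, (_subject, predicate, _object) in enumerate(triple_patterns, start=1):
        selection[f"TP{number}"] = sorted(index.get(predicate, set()))
    return selection


selection = select_sources(QUERY, PREDICATE_INDEX)
total = sum(len(sources) for sources in selection.values())
for tp, sources in selection.items():
    print(tp, "=", " ".join(sources))
print("Total triple pattern-wise sources selected =", total)
```

Each pattern is looked at in isolation, which is why owl:sameAs alone pulls in eight sources.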

MOTIVATION: ANYTHING WRONG?
TP1 = S1, TP2 = S1, TP3 = S1, TP4 = S4
TP5 = S4 (S1, S2, and S5-S9 are selected but do not contribute to the final result)
Optimal number of triple pattern-wise sources = 1+1+1+1+1 = 5
The overestimated sources produce 317068 irrelevant intermediate results

PROBLEM STATEMENT
Overestimation of sources is expensive
 Extra intermediate results
 Extra network traffic
 Increased overall runtime
1. How to perform join-aware source selection with ensured result set completeness?
2. How to test the efficiency of the source selection?
Comprehensive benchmarks
 Which system is better and why?
 What are the limitations of a given system?
 How can one improve a given system?
3. How to design comprehensive benchmarks for federated SPARQL engines as well as triple stores?

STATE-OF-THE-ART
Saleem et al. A Fine-Grained Evaluation of SPARQL Endpoint Federation Systems (Semantic Web Journal, 2015)

PROBLEM STATEMENT AND CONTRIBUTIONS
Research Questions
1. How to perform join-aware source selection with ensured result set completeness?
2. How to perform duplicate-aware source selection?
3. How to perform policy-aware source selection?
4. How to perform data distribution-aware source selection?
5. How to design comprehensive benchmarks for federated SPARQL engines as well as triple stores?
Contributions, placed on the federation engine (Parsing/Rewriting, Source Selection, Federator Optimizer, Integrator) over the RDF sources S1-S4:
 Source selection: HiBISCuS, DAW, SAFE, TopFed
 Federation engine and evaluation: QUETSAL, LargeRDFBench, state-of-the-art evaluation


MOTIVATION: JOIN-AWARE SOURCE
SELECTION
Join-aware selection for the example query: TP1 = S1, TP2 = S1, TP3 = S1, TP4 = S4, TP5 = S4 (S1, S2, and S5-S9 pruned)
Total triple pattern-wise sources selected = 1+1+1+1+1 = 5

HIBISCUS: HYPERGRAPH-BASED SOURCE SELECTION
 Models SPARQL queries as hypergraphs
 Makes use of URI authorities in its index
 Performs join-aware triple pattern-wise source selection
 Can be combined with any existing SPARQL endpoint federation system
Muhammad Saleem, Axel-Cyrille Ngonga Ngomo. HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation (ESWC, 2014)

SOURCE SELECTION
 Makes use of the URI's authorities
http://dbpedia.org/ontology/party
Scheme: http   Authority: dbpedia.org   Path: /ontology/party
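The split into scheme, authority, and path can be illustrated with a short Python sketch based on the standard library. The helper name `uri_parts` is hypothetical and only serves this example.

```python
from urllib.parse import urlsplit


def uri_parts(uri):
    """Split a URI into (scheme, authority, path) with the standard library."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


scheme, authority, path = uri_parts("http://dbpedia.org/ontology/party")
print(scheme, authority, path)
```

An index that stores only the distinct authorities per predicate stays small, because many resources of one dataset share the same authority.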

SOURCE SELECTION
The query is modelled as a hypergraph, built up one triple pattern (subject, predicate, object) at a time:
(?president, rdf:type, dbpedia:President)
(?president, dbpedia:nationality, dbpedia:United_States)
(?president, dbpedia:party, ?party)
(?x, nyt:topicPage, ?page)
(?x, owl:sameAs, ?president)
Legend: star, simple, hybrid, tail of hyperedge

SOURCE SELECTION
Figure: the query hypergraph, with each hyperedge labelled by its candidate sources (DBpedia, KEGG, NYT, SWDF, LMDB, Geo, Jamendo, DrugBank) and their subject authorities (Sbj. auth.) and object authorities (Obj. auth.)

SOURCE SELECTION
Total triple pattern-wise sources selected = 5 instead of 12
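How authorities can prune sources across a join is sketched below in Python. This is a simplified illustration under stated assumptions, not the HiBISCuS algorithm itself: it only checks that, for two patterns sharing a variable, a source can produce a URI authority that the other pattern's sources can also produce. All authority sets are hypothetical values chosen to mirror the example.

```python
SBJ, OBJ = 0, 1  # positions inside each (subject authorities, object authorities) pair

# Hypothetical candidate sources per triple pattern with their authority sets.
candidates = {
    "TP1": {"S1": ({"dbpedia.org"}, {"dbpedia.org"})},
    "TP2": {"S1": ({"dbpedia.org"}, {"dbpedia.org"})},
    "TP3": {"S1": ({"dbpedia.org"}, {"dbpedia.org"})},
    "TP4": {"S4": ({"data.nytimes.com"}, {"topics.nytimes.com"})},
    "TP5": {
        "S1": ({"dbpedia.org"}, {"rdf.freebase.com"}),
        "S2": ({"bio2rdf.org"}, {"bio2rdf.org"}),
        "S4": ({"data.nytimes.com"}, {"dbpedia.org", "sws.geonames.org"}),
        "S5": ({"data.semanticweb.org"}, {"dbpedia.org"}),
        "S6": ({"data.linkedmdb.org"}, {"dbpedia.org"}),
        "S7": ({"dbtune.org"}, {"sws.geonames.org"}),
        "S8": ({"sws.geonames.org"}, {"dbpedia.org"}),
        "S9": ({"www4.wiwiss.fu-berlin.de"}, {"dbpedia.org"}),
    },
}

# ?president joins the subjects of TP1-TP3 with the object of TP5;
# ?x joins the subject of TP4 with the subject of TP5.
JOINS = [
    ("TP1", SBJ, "TP5", OBJ),
    ("TP2", SBJ, "TP5", OBJ),
    ("TP3", SBJ, "TP5", OBJ),
    ("TP4", SBJ, "TP5", SBJ),
]


def prune(candidates, joins):
    """Drop sources whose authorities cannot match the other side of a join."""
    changed = True
    while changed:
        changed = False
        for tp_a, pos_a, tp_b, pos_b in joins:
            for this, p_this, other, p_other in (
                (tp_a, pos_a, tp_b, pos_b),
                (tp_b, pos_b, tp_a, pos_a),
            ):
                reachable = set()
                for pair in candidates[other].values():
                    reachable |= pair[p_other]
                for source in list(candidates[this]):
                    if not (candidates[this][source][p_this] & reachable):
                        del candidates[this][source]
                        changed = True
    return candidates


pruned = prune(candidates, JOINS)
total = sum(len(sources) for sources in pruned.values())
print({tp: sorted(sources) for tp, sources in pruned.items()})
print("Total triple pattern-wise sources selected =", total)
```

With these assumed authorities only S4 survives for TP5, because it is the only source whose owl:sameAs subjects share an authority with the nyt:topicPage subjects and whose objects point into dbpedia.org.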

EFFICIENT SOURCE SELECTION
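#TP: total triple pattern-wise sources selected, #AR: number of SPARQL ASK requests, SST: source selection time (msec)
Query | FedX (warm)       | SPLENDID          | DARQ              | ANAPSID           | HiBISCuS (warm)
      | #TP   #AR  SST    | #TP   #AR  SST    | #TP   #AR  SST    | #TP   #AR  SST    | #TP   #AR  SST
CD    | 78    0    7.33   | 78    99   320.9  | 84    0    7.286  | 36    43   186    | 35    0    30.43
LS    | 56    0    7.99   | 56    90   307.3  | 77    0    7.571  | 44    63   477.4  | 41    0    23.14
LD    | 97    0    8.09   | 97    126  279    | 113   0    7.727  | 54    37   803.5  | 47    0    16
Net   | 231   0    8      | 231   315  299    | 274   0    7.56   | 134   143  554    | 123   0    22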

FEDX EXTENSION WITH HIBISCUS
Figure: query execution time (msec) for the FedBench queries CD1-CD7, LS1-LS7, LD1-LD11 and their average, FedX (warm) vs. FedX+HiBISCuS
Improvement in 20/25 queries with net performance improvement 24.61%

SPLENDID EXTENSION WITH
HIBISCUS
Figure: query execution time (msec) for the FedBench queries CD1-CD7, LS1-LS7, LD1-LD11 and their average, SPLENDID vs. SPLENDID+HiBISCuS
Net performance improvement 82.72%

DARQ EXTENSION WITH HIBISCUS
Figure: query execution time (msec, log scale) for the FedBench queries CD1-CD7, LS1-LS7, LD1-LD11 and their average, without and with HiBISCuS; several queries are marked as not supported, runtime error, or timeout
Net performance improvement 92.22%

SPLENDID+HIBISCUS VS. ANAPSID
Figure: query execution time (msec, log scale) for the FedBench queries CD1-CD7, LS1-LS7, LD1-LD11 and their average, ANAPSID vs. SPLENDID+HiBISCuS; one query is marked as returning zero results
Net performance improvement 98%


DAW: DUPLICATE-AWARE SOURCE
SELECTION
Triple pattern-wise source selection and skipping
TP1 (?uri <p1> ?v1): candidate sources S1, S2, S3
TP2 (?uri <p2> ?v2): candidate sources S1, S2, S4
Min. number of new triples (threshold) = 20
Total triple pattern-wise selected sources = 4
Total triple pattern-wise skipped sources = 2
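The skipping rule can be sketched in Python as a greedy loop: keep taking the source that adds the most unseen triples, and stop once the best remaining source adds fewer than the threshold. This is an illustration under assumptions, not DAW's implementation: exact sets of integer IDs stand in for the estimated overlaps, and the toy result sets are hypothetical values chosen so that the counts match the slide.

```python
def select_and_skip(candidates, threshold):
    """Greedy sketch: select sources while they add at least `threshold` new triples."""
    remaining = dict(candidates)
    seen, selected = set(), []
    while remaining:
        best = max(remaining, key=lambda name: len(remaining[name] - seen))
        if len(remaining[best] - seen) < threshold:
            break  # every remaining source is (mostly) duplicates
        selected.append(best)
        seen |= remaining.pop(best)
    return selected, sorted(remaining)


# Hypothetical result sets (integer triple IDs) for the two triple patterns.
tp1 = {"S1": set(range(0, 100)), "S2": set(range(80, 180)), "S3": set(range(50, 130))}
tp2 = {"S1": set(range(0, 60)), "S2": set(range(50, 70)), "S4": set(range(200, 260))}

selected1, skipped1 = select_and_skip(tp1, threshold=20)
selected2, skipped2 = select_and_skip(tp2, threshold=20)
print("TP1 selected:", selected1, "skipped:", skipped1)
print("TP2 selected:", selected2, "skipped:", skipped2)
print("Total selected =", len(selected1) + len(selected2))
print("Total skipped =", len(skipped1) + len(skipped2))
```

In this toy data S3 is fully covered by S1 and S2 for TP1, and S2 adds only 10 new triples for TP2, so both fall below the threshold of 20.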

DAW: DUPLICATE-AWARE SOURCE SELECTION
 A combination of min-wise independent permutations (MIPs) with compact data summaries
 Uses average selectivity values for bound subjects and objects
 Can be combined with any existing SPARQL endpoint federation system
 Can be used for partial result retrieval
Saleem et al. DAW: Duplicate-AWare Federated Query Processing over the Web of Data (ISWC, 2013)

DAW: MIN-WISE INDEPENDENT
PERMUTATIONS
MIPs estimated operations
1. Map each triple T1(s,p,o), ..., T6(s,p,o) to an integer ID: h(concat(s,o))
2. Apply N permutations to all IDs of the ID set: $h_i(x) = (a_i x + b_i) \bmod U$, e.g. h1 = (7x + 3) mod 51, h2 = (5x + 6) mod 51, ..., hN = (3x + 9) mod 51
3. Create the MIP vector from the minima of the permutations
Example:
VA = (8, 9, 30, 24, 36, 9)
VB = (8, 24, 20, 48, 36, 13)
Union (VA, VB) = (8, 9, 20, 24, 36, 9)
Resemblance (VA, VB) = 2/6 => 0.33
Overlap (VA, VB) = 0.33*(6+6) / (1+0.33) => 3
$$\mathrm{Resemblance}(S_A, S_B) = \frac{|S_A \cap S_B|}{|S_A \cup S_B|} \approx \frac{|V_A \cap V_B|}{N}$$
$$\mathrm{Overlap}(S_A, S_B) \approx \frac{\mathrm{Resemblance}(V_A, V_B) \times (|S_A| + |S_B|)}{\mathrm{Resemblance}(V_A, V_B) + 1}$$
$$\text{Error estimation} = O(1/\sqrt{N})$$
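The estimation can be sketched in Python as follows. The function names, the seed, and the choice of a large prime as universe size U are assumptions made for this illustration (the slide uses U = 51); the resemblance is computed as the fraction of vector positions that hold the same minimum, and the overlap follows the formula above.

```python
import random

UNIVERSE = 2_147_483_647  # assumed universe size U (the prime 2^31 - 1)


def make_permutations(n, universe=UNIVERSE, seed=0):
    """Create n linear hash functions h_i(x) = (a_i * x + b_i) mod U."""
    rng = random.Random(seed)
    return [(rng.randrange(1, universe), rng.randrange(universe)) for _ in range(n)]


def mip_vector(ids, permutations, universe=UNIVERSE):
    """MIP vector: the minimum permuted ID under each hash function."""
    return [min((a * x + b) % universe for x in ids) for a, b in permutations]


def resemblance(va, vb):
    """Fraction of positions at which both MIP vectors hold the same minimum."""
    matches = sum(1 for x, y in zip(va, vb) if x == y)
    return matches / len(va)


def overlap(res, size_a, size_b):
    """Estimated size of the intersection from resemblance and set sizes."""
    return res * (size_a + size_b) / (res + 1)


# The two vectors from the slide: positions 1 and 5 agree, so resemblance = 2/6.
va = [8, 9, 30, 24, 36, 9]
vb = [8, 24, 20, 48, 36, 13]
r = resemblance(va, vb)
print("resemblance:", round(r, 2), "overlap:", round(overlap(r, 6, 6), 2))

# Estimating the overlap of two larger ID sets that share 500 IDs.
perms = make_permutations(256)
set_a, set_b = set(range(0, 1000)), set(range(500, 1500))
r_ab = resemblance(mip_vector(set_a, perms), mip_vector(set_b, perms))
print("estimated overlap:", overlap(r_ab, len(set_a), len(set_b)))
```

More permutations give a longer vector and a tighter estimate, at the cost of a larger data summary per predicate.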

FEDX EXTENSION WITH DAW
Figure: execution time (sec) per query type (STP, S-1, S-2, P-1, P-2, P-3) for the Diseasome, Publication, Geo Data, and Movie datasets, FedX vs. DAW
Overall performance evaluation (average execution time in sec, gain of DAW over FedX in %)
         Diseasome  Publication  Geo Data  Movie  Overall
FedX     2.44       1.48         4.60      1.74   2.44
DAW      1.98       1.67         3.92      1.61   2.20
Gain %   18.79      -12.38       14.71     7.59   9.76

SPLENDID EXTENSION WITH DAW
Figure: execution time (sec) for the Diseasome, Publication, Geo, and Movie datasets, SPLENDID vs. DAW
Average execution time in sec, gain of DAW over SPLENDID in %
           Diseasome  Publication  Geo    Movie  Overall
SPLENDID   3.78       2.18         7.27   1.9    3.71
DAW        3.04       2.37         6.22   1.688  3.30
Gain %     19.48      -8.94        14.40  11.16  11.11

DARQ EXTENSION WITH DAW
Figure: execution time (sec) for the Diseasome, Publication, Geo, and Movie datasets, DARQ vs. DAW
Average execution time in sec, gain of DAW over DARQ in %
         Diseasome  Publication  Geo    Movie  Overall
DARQ     8.27       5.26         23.44  1.96   9.59
DAW      6.34       4.94         19.62  1.688  8.01
Gain %   23.34      6.14         16.31  13.88  16.46


SAFE: POLICY-AWARE SOURCE
SELECTION
Return the number of patients that have been administered the drug Insulin and exhibit
BMI > 25 and Hypertension and Diabetes as adverse events
Switzerland, Cyprus, Greece
Khan et al. SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes (SWAT4LS, 2014)

SAFE: POLICY-AWARE SOURCE SELECTION
Pipeline: Source Selection -> Access Policy Filtering -> Query Execution
Access Policy Framework
 Input: user profile (Oya, Clinical Researcher, Expertise: Diabetes)
 Input: requested data (S1, S2, S3)
 Output: access is granted or denied for each of S1, S2, S3
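The filtering step between source selection and query execution can be sketched in Python. The policy rules below are purely hypothetical (the slide does not state which source denies access); each policy is modelled as a predicate over the user profile.

```python
def policy_filter(selected_sources, user, policies):
    """Keep only the sources whose (hypothetical) policy grants the user access."""
    return [source for source in selected_sources if policies[source](user)]


# User profile from the slide; the policies themselves are assumptions.
user = {"name": "Oya", "role": "Clinical Researcher", "expertise": "Diabetes"}
policies = {
    "S1": lambda u: u["role"] == "Clinical Researcher",
    "S2": lambda u: u["expertise"] == "Diabetes",
    "S3": lambda u: False,  # assumed to deny access to every external user
}

allowed = policy_filter(["S1", "S2", "S3"], user, policies)
print("Sources passed on to query execution:", allowed)
```

Filtering before execution means no sub-query is ever sent to a source that would reject it.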

SAFE: SOURCE SELECTION
EVALUATION
Sum of triple-pattern-wise sources selected for each query
Systems Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Avg
SAFE 8 10 13 16 15 13 15 16 7 7 9 7 11
FedX 9 13 16 24 20 14 16 19 15 17 9 16 16
Number of SPARQL ASK requests used for source selection
Systems Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Avg
SAFE 0 0 0 0 0 0 0 0 0 0 0 0 0
FedX 36 28 40 64 48 40 44 40 21 21 9 21 35

SAFE: QUERY RUNTIME
EVALUATION
Figure: query runtime (msec, log scale) for Q1-Q12 and their average, SAFE vs. FedX
SAFE is 3.61 times faster than FedX


TOPFED: DATA DISTRIBUTION-AWARE SOURCE SELECTION
 Intelligent data distribution combined with efficient source selection to handle federation over Big Data
 Federation over 20.4 billion triples of Linked TCGA data
Saleem et al. TopFed: TCGA Tailored Federated Query Processing and Linking to LOD (Journal of Biomedical Semantics, 2014)

TOPFED
Sets of predicates and classes:
A = {chromosome, result, bcr_patient_barcode}
B = {DNA-Methylation}
C = {CNV, SNP, E-Gene, E-Protein, miRNA, Clinical}
D = {seg_mean, rpmmm, scaled_est, p_exp_val}
E = {RPKM}
F = {Expression-Exon}
G = {start, stop}
M = {beta_value, position}
For each query triple t(s, p, o) ∈ T:
C-1 = {{p ∈ {D ∪ A ∪ G} ∨ {p = rdf:type ∧ o ∈ C}} ∧ {{S-Join(p, D ∪ C) ∨ P-Join(p, D ∪ C) } ∨ {!S-Join(p, M ∪ B ∪ E ∪ F) ∧ !P-Join(p, M ∪ B ∪ E ∪ F) }}}
C-2 = {{p ∈ {E ∪ A ∪ G} ∨ {p = rdf:type ∧ o ∈ F}} ∧ {{S-Join(p, E ∪ F) ∨ P-Join(p, E ∪ F)} ∨ {!S-Join(p, M ∪ B ∪ D ∪ C) ∧ !P-Join(p, M ∪ B ∪ D ∪ C) }}}
C-3 = {{p ∈ {M ∪ A} ∨ {p = rdf:type ∧ o ∈ B}} ∧ {{S-Join(p, M ∪ B) ∨ P-Join(p, M ∪ B) } ∨ {!S-Join(p, E ∪ F ∪ D ∪ C) ∧ !P-Join(p, E ∪ F ∪ D ∪ C) }}}
C-1 ∨ Category Colour = blue (CNV, SNP, E-Gene, miRNA, E-Protein, Clinical): SPARQL endpoints b1, b2 holding tumours 1-16, 17-33
C-2 ∨ Category Colour = pink (Exon-Expression): SPARQL endpoints p1-p6 holding tumours 1-5, 6-11, 12-16, 17-22, 23-27, 28-33
C-3 ∨ Category Colour = green (Methylation): SPARQL endpoints g1-g9 holding tumours 1-4, 5-8, 9-12, 13-16, 17-20, 21-24, 25-27, 28-30, 31-33
IF tumour lookup is successful
forward to corresponding leaf
Else
broadcast to every one

TOPFED VS. FEDX
Selects 50% fewer data sources than FedX without losing recall

TOPFED VS. FEDX
 TopFed outperforms FedX significantly on 90% of the queries
 On average, the query run time of TopFed is about 1/3 of that of FedX
Figure: query execution time (ms, log scale) for queries 1-10 and their average, FedX (cached) vs. TopFed


SPARQL BENCHMARKS
Non-Federated Benchmarks
 Centralized repositories
 Queries span a single dataset
 Real or synthetic
 Examples: LUBM, SP2Bench, BSBM, WatDiv, DBPSB, FEASIBLE
Federated Benchmarks
 Multiple interlinked datasets
 Queries span multiple datasets
 Real or synthetic
 Examples: FedBench, LargeRDFBench

FEASIBLE: BENCHMARK
GENERATION FRAMEWORK
 Dataset cleaning
 Feature vectors and normalization
 Selection of exemplars
 Selection of benchmark queries
Saleem et al. FEASIBLE: A Featured-Based SPARQL Benchmark Generation Framework (ISWC, 2015)

FEATURE VECTORS AND
NORMALIZATION
SELECT DISTINCT ?entita ?nome
WHERE
{
?entita rdf:type dbo:VideoGame .
?entita rdfs:label ?nome
FILTER regex(?nome, "konami", "i")
}
LIMIT 100
Query Type: SELECT
Results Size: 13
Basic Graph Patterns (BGPs): 1
Triple Patterns: 2
Join Vertices: 1
Mean Join Vertices Degree: 2.0
Mean triple patterns selectivity: 0.01709761619798973
UNION: No
DISTINCT: Yes
ORDER BY: No
REGEX: Yes
LIMIT: Yes
OFFSET: No
OPTIONAL: No
FILTER: Yes
GROUP BY: No
Runtime (ms): 65
Feature Vector: 13 1 2 1 2 0.017 0 1 0 1 1 0 0 1 0 65
Normalized Feature Vector: 0.11 0.53 0.67 0.14 0.08 0.017 0 1 0 1 1 0 0 1 0 0.14

FEASIBLE
Plot feature vectors in a multidimensional space
Query F1 F2
Q1 0.2 0.2
Q2 0.5 0.3
Q3 0.8 0.3
Q4 0.9 0.1
Q5 0.5 0.5
Q6 0.2 0.7
Q7 0.1 0.8
Q8 0.13 0.65
Q9 0.9 0.5
Q10 0.1 0.5
Suppose we need a benchmark of 3 queries

FEASIBLE
1. Calculate the average point of all feature vectors
2. Select the point of minimum Euclidean distance to the average point; this is our first exemplar (shown in red)
3. Select the point that is farthest from the exemplars as the next exemplar; repeat until 3 exemplars are selected
4. Calculate the distance from Q1 to each exemplar and assign Q1 to the minimum-distance exemplar
5. Repeat the process for Q2 and all remaining queries; each exemplar now defines one cluster
6. Calculate the average across each cluster
7. Calculate the distance of each point in a cluster to the cluster average
8. Select the minimum-distance query as the final benchmark query from that cluster
Q2 is the final selected query from the yellow cluster, Q3 from the green cluster, and Q8 from the brown cluster
Our benchmark queries are Q2, Q3, and Q8
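The selection procedure can be sketched in Python on the ten example queries. Two points are assumptions of this sketch rather than statements from the slides: "farthest from the exemplars" is implemented as the largest summed distance to all exemplars chosen so far, and ties are broken by input order. On the tabulated coordinates Q9 lies at distance 0.4 from two exemplars, so the query picked from that cluster depends on tie-breaking; the slides report Q3.

```python
from math import dist

QUERIES = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}


def mean(points):
    """Component-wise average of a list of points."""
    return tuple(sum(component) / len(points) for component in zip(*points))


def select_benchmark(queries, k):
    """Return (exemplars, clusters, benchmark queries) for a benchmark of size k."""
    # First exemplar: the query closest to the average point.
    average = mean(list(queries.values()))
    exemplars = [min(queries, key=lambda q: dist(queries[q], average))]
    # Further exemplars: largest summed distance to the exemplars so far (assumed).
    while len(exemplars) < k:
        rest = [q for q in queries if q not in exemplars]
        exemplars.append(
            max(rest, key=lambda q: sum(dist(queries[q], queries[e]) for e in exemplars))
        )
    # Assign every query to its nearest exemplar to form the clusters.
    clusters = {e: [] for e in exemplars}
    for q, point in queries.items():
        nearest = min(exemplars, key=lambda e: dist(point, queries[e]))
        clusters[nearest].append(q)
    # Per cluster, pick the query closest to the cluster average.
    benchmark = []
    for members in clusters.values():
        centre = mean([queries[m] for m in members])
        benchmark.append(min(members, key=lambda m: dist(queries[m], centre)))
    return exemplars, clusters, benchmark


exemplars, clusters, benchmark = select_benchmark(QUERIES, k=3)
print("Exemplars:", exemplars)
print("Clusters:", clusters)
print("Benchmark queries:", benchmark)
```

The exemplars only seed the clusters; the final benchmark queries are the ones closest to each cluster's average, which is why an exemplar need not end up in the benchmark.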

COMPARISON OF COMPOSITE
ERROR
FEASIBLE's composite error is 54.9% lower than that of DBPSB

RANK-WISE RANKING OF TRIPLE
STORES
All values are in percentages
 None of the systems is the sole winner or loser for a particular rank
 Virtuoso mostly lies in the higher ranks, i.e., rank 1 and 2 (68.29%)
 Fuseki is mostly in the middle ranks, i.e., rank 2 and 3 (65.14%)
 OWLIM-SE is usually on the slower side, i.e., rank 3 and 4 (60.86%)
 Sesame is either fast or slow: rank 1 (31.71% of the queries) and rank 4 (23.14%)


LARGERDFBENCH
32 Queries
 14 simple
 10 complex
 8 large data
14 Interlinked datasets
Datasets in the figure: LinkedMDB, DBpedia, New York Times, Linked TCGA-M, Linked TCGA-E, Linked TCGA-A, Affymetrix, SW Dog Food, KEGG, Drugbank, Jamendo, ChEBI, Geonames
Categories: Life Sciences, Cross Domain, Large Data
Links between datasets: basedNear, owl:sameAs, x-geneid, country, ethnicity, race, keggCompoundId, bcr_patient_barcode, same instance
Link counts in the figure: 251.3k, 1.7k, 4.1k, 21.7k, 1.3k
Saleem et al. LargeRDFBench: A Billion Triples Benchmark for SPARQL Endpoint

LARGERDFBENCH QUERIES
PROPERTIES
14 Simple
 2-7 triple patterns
 Subset of SPARQL clauses
 Query execution time around 2 seconds on avg.
10 Complex
 8-13 triple patterns
 Use more SPARQL clauses
 Query execution time up to 10 min
8 Large Data
 Minimum 80459 results
 Large intermediate results
 Query execution time in hours

SOURCE SELECTION EVALUATION

RESULT SET COMPLETENESS AND
CORRECTNESS

QUERIES RUNTIME RESULTS
Figure: runtime (msec, log scale) of the simple queries S1-S14 and their average for FedX (cold), FedX (100% cached), SPLENDID, ANAPSID, FedX+HiBISCuS, and SPLENDID+HiBISCuS
Ranking: FedX+HiBISCuS, FedX > SPLENDID+HiBISCuS > ANAPSID > SPLENDID (12/14, 8/14, 10/14)

QUERIES RUNTIME RESULTS
Figure: runtime (msec, log scale) of the complex queries C1-C10 and their average for FedX (cold), FedX (100% cached), SPLENDID, ANAPSID, FedX+HiBISCuS, and SPLENDID+HiBISCuS; three entries are marked as runtime errors
Ranking: ANAPSID > SPLENDID+HiBISCuS > FedX+HiBISCuS, FedX > SPLENDID (4/7, 5/7, 5/7)

CONCLUSIONS
Better source selection leads to overall improvement of runtime performance
• HIBISCUS: 24.61% - 92.22%
• DAW: 9.79% - 16.46%
• SAFE: 84%
• TopFed: 68%
Better benchmarking allows for informed selection of RDF stores
• 55% less error than DBPSB
• Column stores (Virtuoso) not always best
LargeRDFBench addresses drawbacks of current federated benchmarks
• SPARQL features
• Size of intermediary results
• Total runtime of queries
Contributions allow for
• Informed selection of triple stores and of federation engines
• Better source selection
• Efficient query planning
• Reduction of intermediate results
• Time-efficient query execution

FUTURE DIRECTIONS
 Top-K relevant source selection
 Cost-based query planning
 Caching intermediate results
 Intelligent data distribution
 Provenance and runtime estimation
 Federated benchmarks generated from query logs
 Synthetic benchmarks that are more like real benchmarks

AWARDS
1. Best paper award at conference on Semantics in Healthcare and
Life Sciences (CSHALS 2014) with paper titled GenomeSnip:
Fragmenting the Genomic Wheel to augment discovery in cancer
research
2. Semantic Web Challenge-Big Data Track winner at ISWC 2013 with
paper titled Fostering Serendipity through Big Linked Data
3. I-CHALLENGE (Linked Data Cup) winner at I-Semantics 2013 with
paper titled Linked Cancer Genome Atlas Database

PUBLICATIONS AND CITATIONS
Total Publications: 25
 5 Journals (I.F. 2.55, 2.55, 2.26, 0.44)
 10 Conference (5 A ranked, CORE)
 4 Workshops
 2 Tutorials (A ranked, CORE)
 1 Technical report
 3 Demo (A ranked, CORE)

PUBLICATIONS
2016
1. Muhammad Saleem, Ricardo Usbeck, Michael Roder, and Axel-Cyrille Ngonga Ngomo SPARQL Querying
Benchmarks Tutorial at International Semantic Web Conference (ISWC), 2015
2. Ethem Cem Ozkan, Muhammad Saleem, Erdogan Dogdu, and Axel-Cyrille Ngonga Ngomo UPSP: Unique
Predicate-based Source Selection for SPARQL Endpoint Federation PROFILES at Extended Semantic Web
Conference (ESWC), 2016

PUBLICATIONS
2015
1. Muhammad Saleem, Yasar Khan, Ali Hasnain, Ivan Ermilov, and Axel-Cyrille Ngonga Ngomo A Fine-
Grained Evaluation of SPARQL Endpoint Federation Systems Semantic Web Journal, 2015
2. Muhammad Saleem, Qaiser Mehmood, and Axel-Cyrille Ngonga Ngomo FEASIBLE: A Featured-Based
SPARQL Benchmark Generation Framework International Semantic Web Conference (ISWC), 2015
3. Muhammad Saleem, Muhammad Intizar Ali, Ruben Verborgh, Qaiser Mehmood, and Axel-Cyrille
Ngonga Ngomo LSQ: The Linked SPARQL Queries Dataset International Semantic Web Conference
(ISWC), 2015
4. Muhammad Saleem, Muhammad Intizar Ali, Ruben Verborgh, and Axel-Cyrille Ngonga
Ngomo Federated Query Processing over Linked Data Tutorial at International Semantic Web
Conference (ISWC), 2015
5. Muhammad Saleem, Intizar Ali, Aidan Hogan, Qaiser Mehmood, and Axel-Cyrille Ngonga Ngomo LSQ:
The Linked SPARQL Queries Dataset LSQ Technical Report
6. Muhammad Saleem, Qaiser Mehmood, and Axel-Cyrille Ngonga Ngomo Automatic SPARQL Benchmark
Generation Using FEASIBLE Demo at International Semantic Web Conference (ISWC), 2015
7. Muhammad Saleem, Muhammad Intizar Ali, Aidan Hogan, Qaiser Mehmood, and Axel-Cyrille Ngonga
Ngomo The LSQ Dataset: Querying for Queries Demo at International Semantic Web Conference (ISWC),
2015
8. Syeda Sana e Zainab, Ali Hasnain, Muhammad Saleem, Qaiser Mehmood, Durre Zehra, and Stefan
Decker SPARQL Query Formulation and Execution using FedViz Demo at International Semantic Web
Conference (ISWC), 2015
9. Syeda Sana e Zainab, Ali Hasnain, Muhammad Saleem, Qaiser Mehmood, Durre Zehra, and Stefan
Decker FedViz: A Visual Interface for SPARQL Queries Formulation and Execution VOILA

PUBLICATIONS
2014
1. Yasar Khan, Muhammad Saleem, Aftab Iqbal, Muntazir Mehdi, Aidan Hogan, Panagiotis Hasapis, Axel-
Cyrille Ngonga Ngomo, Stefan Decker, and Ratnesh Sahay SAFE: Policy Aware SPARQL Query Federation
Over RDF Data Cubes Semantic Web Applications and Tools for Life Sciences (SWAT4LS), 2014
2. Nur Aini Rakhmawati, Muhammad Saleem, Sarasi Lalithsena, and Stefan Decker QFed: Query Set For
Federated SPARQL Query Benchmark 16th International Conference on Information Integration and Web-
based Applications & Services (iiWAS), 2014
3. Bühmann, Lorenz, Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Muhammad Saleem, Andreas Both, Valter
Crescenzi, Paolo Merialdo, and Disheng Qiu Web-Scale Extension of RDF Knowledge Bases from
Templated Websites International Semantic Web Conference (ISWC), 2014
4. Muhammad Saleem, Axel-Cyrille Ngonga Ngomo HiBISCuS: Hypergraph-Based Source Selection for SPARQL
Endpoint Federation Extended Semantic Web Conference (ESWC), 2014
5. Maulik R. Kamdar, Aftab Iqbal, Muhammad Saleem, Helena F. Deus, and Stefan Decker GenomeSnip:
Fragmenting the Genomic Wheel to augment discovery in cancer research CSHALS, 2014, (Best paper
award)
6. Muhammad Saleem, Shanmukha Sampath, Axel-Cyrille Ngonga Ngomo, Aftab Iqbal, Jonas Almeida,
and Helena Deus TopFed: TCGA Tailored Federated Query Processing and Linking to LOD Journal of
Biomedical Semantics, 2014
7. Muhammad Saleem, Maulik R. Kamdar, Aftab Iqbal, Shanmukha Sampath, Helena F. Deus, and Axel-
Cyrille Ngonga Ngomo Big Linked Cancer Data: Integrating Linked TCGA and PubMed Journal of Web
Semantics, 2014

PUBLICATIONS
2009-2013
1. Muhammad Saleem, Maulik R. Kamdar, Aftab Iqbal, Shanmukha Sampath, Helena F. Deus, and Axel-Cyrille
Ngonga Ngomo Fostering Serendipity through Big Linked Data Semantic Web Challenge at International
Semantic Web Conference (ISWC), 2013, Semantic Web Challenge (Big Data Track) Winner
2. Muhammad Saleem, Shanmukha S Padmanabhuni, Axel-Cyrille Ngonga Ngomo, Jonas S Almeida, and
Stefan Decker, Helena Deus Linked Cancer Genome Atlas Database In Linked Data Cup, I-
Semantics2013, I-CHALLENGE (Linked Data Cup) Winner
3. Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josian Xavier Pariera, Helena F. Deus, and Manfred
Hauswirth DAW: Duplicate-AWare Federated Query Processing over the Web of Data International Semantic
Web Conference (ISWC), 2013
4. Muhammad Saleem, Ali Zahir, Yasir Ismail, and Bilal Saeed Enhanced Generic Information Services Using
Mobile Messaging Grid and Pervasive Computing (GPC), 2010
5. Muhammad Saleem, and Kyung-Goo Doh Generic Information System Using SMS Gateway The Fourth
International Conference on Computer Sciences and Convergence Information Technology (ICCIT), 2009
6. Muhammad Saleem, Rasheed Hussain, Yasir Ismail, and Shaikh Mohsin Cost Effective Software Engineering
using Program Slicing Techniques The 2nd International Conference on Interaction Sciences: Information
Technology, Culture and Human (ICIS), 2009

STATE-OF-THE-ART: SPARQL
FEDERATION APPROACHES
 SPARQL Endpoint Federation (SEF)
 Linked Data Federation (LDF)
 Distributed Hash Tables (DHTs)
 Hybrid of SEF+LDF
Saleem et al. A Fine-Grained Evaluation of SPARQL Endpoint
Federation Systems (Semantic Web Journal, 2015)

STATE-OF-THE-ART: SOURCE
SELECTION
 Index-only
 Index-free (SPARQL ASK Queries)
 Hybrid (Index+ SPARQL ASK Queries)

HIBISCUS: DATA SUMMARIES
[] a ds:Service ;
ds:endpointUrl <http://dbpedia.org/sparql> ;
ds:capability [
ds:predicate dbpedia:party ;
ds:sbjAuthority <http://dbpedia.org/> ;
ds:objAuthority <http://dbpedia.org/> ;
] ;
ds:capability [
ds:predicate rdf:type ;
ds:objAuthority owl:Thing, dbpedia:President ; # we store all distinct classes
] ;
ds:capability [
ds:predicate dbpedia:postalCode ;
# No objAuthority, as the object value for dbpedia:postalCode is a string
] ;


DAW: DATA SUMMARIES
[] a sd:Service ;
sd:endpointUrl <http://localhost:8890/sparql> ;
sd:capability [
sd:predicate diseasome:name ;
sd:totalTriples 147 ;
sd:avgSbjSel "0.0068" ;
sd:avgObjSel "0.0069" ;
sd:MIPs "-6908232 -7090543 -6892373 -7064247 ..." ; ] ;
sd:capability [
sd:predicate diseasome:chromosomalLocation ;
sd:totalTriples 160 ;
sd:avgSbjSel "0.0062" ;
sd:avgObjSel "0.0072" ;
sd:MIPs "-7056448 -7056410 -6845713 -6966021 ..." ; ] ;

SOURCE RANKING VS. RECALL
Figure: recall (in %) against ranked sources for the Diseasome and Publication datasets, Optimal vs. DAW

TRIPLE STORE BENCHMARKS
Synthetic Benchmarks
 Make use of synthetic queries and/or data
 Suitable for testing scalability
 Often fail to reflect real datasets
 Examples: LUBM, SP2Bench, BSBM, WatDiv
Query Log Benchmarks
 Make use of real queries from query logs
 Can be closer to reality
 Scalability can be tested
 Examples: DBPSB, FEASIBLE

FEASIBLE: COMPOSITE ERROR
ESTIMATION
L is the query log, B is the benchmark and K is the set of all features

LARGERDFBENCH DATASETS
STATISTICS

SPARQL 1.1 QUERIES RUNTIME
RESULTS
Figure: runtime (msec, log scale) of the simple queries S1-S14 and their average, FedX (100% cached) vs. ANAPSID
Ranking: ANAPSID > FedX (8/14)

SPARQL 1.1 QUERIES RUNTIME
RESULTS
Figure: runtime (msec, log scale) of the complex queries C1-C10 and their average, FedX (100% cached) vs. ANAPSID; several entries are marked as runtime error or timeout
Ranking: FedX > ANAPSID (6/8)

CONCLUSION
 HIBISCUS: Hypergraph-based source selection
 FedX: 20/25 queries, net improvement 24.61%
 SPLENDID: 24/25 queries, net improvement 82.75%
 DARQ: 20/20 queries, net improvement 92.22%
 DAW: Duplicate-aware source selection
 FedX: 63/79 queries, net improvement 9.79%
 SPLENDID: 66/79 queries, net improvement 11.11%
 DARQ: 70/79 queries, net improvement 16.46%
 SAFE: Policy-aware source selection
 FedX: 12/12 queries, net improvement 84%
 TopFed: Data distribution-aware selection
 FedX: 10/10 queries, net improvement 68%
 Join-aware source selection leads to
 Efficient query planning
 Reduced intermediate results
 Decreased overall runtime
 FEASIBLE: Triple store benchmark
 FEASIBLE's composite error is 55% smaller than that of DBPSB
 New insights on the performance of triple stores
 LargeRDFBench
 Benchmarks with only simple queries are not sufficient
 Ranking changes from simple to complex queries
