1. FEASIBLE: A Feature-Based SPARQL Benchmark
Generation Framework
Muhammad Saleem1, Qaiser Mehmood2, Axel-Cyrille Ngonga Ngomo1
http://feasible.aksw.org/
1Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany
2Insight Center for Data Analytics, National University of Ireland, Galway
International Semantic Web Conference, Bethlehem, USA, 2015
10/14/2015 1
2. Triple Store Benchmarks
• Synthetic Benchmarks
• Make use of synthetic queries and/or data
• Benchmarks of different data sizes are possible
• Suitable for testing scalability
• Often fail to reflect reality
• For example, LUBM, SP2Bench, BSBM, WatDiv, etc.
• Query Log Benchmarks
• Make use of real queries from query logs
• Can be closer to reality
• Can be used with different data sizes
• Scalability can be tested
• For example, DBPSB, FEASIBLE
3. DBpedia SPARQL Benchmark
• Based on real DBpedia queries log
• Benchmarks of different data sizes are possible
• Suitable for testing scalability
• Only considers SPARQL SELECT queries
• Does not consider important query features
• For example, number of join vertices, triple pattern selectivities
• Not customizable for given use cases or needs of an application
4. FEASIBLE SPARQL Benchmark
• Can be applied to any SPARQL query log
• Considers SPARQL SELECT, ASK, DESCRIBE, and CONSTRUCT queries
• Considers important query features
• For example, number of join vertices, triple pattern selectivities,
query runtime, result set size, number of BGPs, mean join vertex
degree, number of triple patterns, etc.
• Customizable for given use cases or needs of an application
5. FEASIBLE SPARQL Benchmark
• Dataset cleaning
• Feature vectors and normalization
• Selection of exemplars
• Selection of benchmark queries
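The "feature vectors and normalization" step above can be sketched in a few lines. This is an illustrative min-max normalization over per-query feature vectors, not FEASIBLE's actual implementation; the list-of-lists representation is an assumption.

```python
# Sketch of the normalization step (illustrative, not the actual FEASIBLE code).
# Each query is a feature vector, e.g. [num_triple_patterns, result_size, ...];
# every dimension is rescaled independently to [0, 1].

def normalize(vectors):
    """Min-max normalize each feature dimension across all query vectors."""
    dims = len(vectors[0])
    mins = [min(v[d] for v in vectors) for d in range(dims)]
    maxs = [max(v[d] for v in vectors) for d in range(dims)]
    return [
        [(v[d] - mins[d]) / (maxs[d] - mins[d]) if maxs[d] > mins[d] else 0.0
         for d in range(dims)]
        for v in vectors
    ]
```

Normalizing first keeps features with large raw ranges (e.g. result size) from dominating the distance computations used during clustering.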
6. Dataset Cleaning
• Remove syntactically incorrect queries
• Remove queries that return zero results
• This step is optional
• Not a theoretical necessity
• But leads to more reliable benchmarks in practice
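The cleaning step amounts to a filter over the log. In this sketch the two checks are supplied by the caller (`parses` and `result_count` are hypothetical helpers standing in for a SPARQL parser and an endpoint call; they are not part of FEASIBLE's published API):

```python
# Sketch of the dataset-cleaning step. `parses` and `result_count` are
# caller-supplied callables (assumptions): a syntax check and a function
# that executes the query and returns its result count.

def clean_log(log, parses, result_count):
    """Keep only queries that are syntactically valid and return results."""
    return [q for q in log if parses(q) and result_count(q) > 0]
```

In practice the result-count check is the expensive part, since every candidate query must be executed once against the dataset.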
24. Selection of Benchmark Queries
Calculate the distance of each point in a cluster to the cluster average
[Figure: queries Q1–Q10 plotted in a normalized 2-D feature space (both axes 0–1), with each cluster's average marked as "Avg."]
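The distance computation shown in the figure can be sketched as follows; the Euclidean metric and the list-of-vectors cluster representation are illustrative assumptions.

```python
import math

def centroid(points):
    """Component-wise average of a cluster's feature vectors ("Avg." in the figure)."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def distances_to_centroid(points):
    """Euclidean distance of each point in the cluster to the cluster average."""
    c = centroid(points)
    return [math.dist(p, c) for p in points]
```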
25. Selection of Benchmark Queries
Select the minimum-distance query as the final benchmark query from that cluster
[Figure: the same scatter plot; Q2, shown in black, is the query selected from the yellow cluster]
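The selection rule above, applied once per cluster, can be sketched directly: pick the query whose feature vector is closest to the cluster average. The dict-based cluster representation is an assumption for illustration.

```python
import math

def select_benchmark_query(cluster):
    """Return the query closest to the cluster average.
    `cluster` maps query names to normalized feature vectors (illustrative)."""
    vecs = list(cluster.values())
    n = len(vecs)
    avg = [sum(v[d] for v in vecs) / n for d in range(len(vecs[0]))]
    return min(cluster, key=lambda q: math.dist(cluster[q], avg))
```

Running this once per cluster yields one representative query each, matching the Q2/Q8/Q3 selections on these slides.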
26. Selection of Benchmark Queries
Select the minimum-distance query as the final benchmark query from that cluster
[Figure: the same scatter plot; Q8, shown in black, is the query selected from the brown cluster]
27. Selection of Benchmark Queries
Select the minimum-distance query as the final benchmark query from that cluster
[Figure: the same scatter plot; Q3, shown in black, is the query selected from the green cluster]
Our benchmark queries are Q2, Q3, and Q8
28. Experimental Setup
• Composite Error Estimation
• L is the query log, B is the benchmark, and K is the set of all features
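The slide's equation is an image and is not reproduced in the text; a plausible root-mean-square formulation over per-feature errors, written here as an assumption consistent with the bullet's definitions, is:

```latex
% Hedged reconstruction -- not necessarily the paper's exact formula.
E(L, B) = \sqrt{\frac{1}{|K|} \sum_{k \in K} \delta_k(L, B)^2}
```

where each δ_k(L, B) is a per-feature error comparing the distribution of feature k ∈ K in the query log L with its distribution in the benchmark B.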
29. Experimental Setup
• Virtuoso Open-Source Edition version 7.2
• NumberOfBuffers = 680000, MaxDirtyBuffers = 500000
• Sesame Version 2.7.8
• Tomcat 7 as the HTTP interface, with the native storage layout
• Set the spoc, posc, opsc indices to those specified in the native storage configuration
• The Java heap size was set to 6GB
• Jena-TDB (Fuseki) Version 2.0
• Java heap size set to 6GB
• OWLIM-SE Version 6.1
• Tomcat 7.0 as HTTP interface
• Set the entity index size to 45,000,000 and enabled the predicate list
• Rule set was empty and the Java heap size was set to 6GB.
• We configured all triple stores to use 6GB of memory and used default values otherwise
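The Virtuoso buffer settings listed above map to entries in virtuoso.ini; a fragment matching the stated values (the `[Parameters]` section is Virtuoso's standard location for these keys):

```ini
; virtuoso.ini fragment matching the settings above
[Parameters]
NumberOfBuffers = 680000
MaxDirtyBuffers = 500000
```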
30. Comparison of Composite Error
FEASIBLE’s composite error is 54.9% lower than DBPSB’s
33. Rank-wise Ranking of Triple Stores
All values are in percentages
• No system is the sole winner or loser for any particular rank
• Virtuoso mostly occupies the higher ranks, i.e., ranks 1 and 2 (68.29%)
• Fuseki mostly occupies the middle ranks, i.e., ranks 2 and 3 (65.14%)
• OWLIM-SE is usually on the slower side, i.e., ranks 3 and 4 (60.86%)
• Sesame is either fast or slow: rank 1 (31.71% of the queries) or rank 4 (23.14%)