FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework
Muhammad Saleem (1), Qaiser Mehmood (2), Axel-Cyrille Ngonga Ngomo (1)
http://feasible.aksw.org/
(1) Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany
(2) Insight Center for Data Analytics, National University of Ireland, Galway
International Semantic Web Conference, Bethlehem, USA, 2015
10/14/2015 1
Triple Stores Benchmarks
• Synthetic benchmarks
  • Use synthetic queries and/or data
  • Benchmarks of different data sizes are possible
  • Suitable for testing scalability
  • Often fail to reflect reality
  • Examples: LUBM, SP2Bench, BSBM, WatDiv
• Query log benchmarks
  • Use real queries from query logs
  • Can be closer to reality
  • Can be used with different data sizes
  • Scalability can be tested
  • Examples: DBPSB, FEASIBLE
DBpedia SPARQL Benchmark
• Based on a real DBpedia query log
• Benchmarks of different data sizes are possible
• Suitable for testing scalability
• Only considers SPARQL SELECT queries
• Does not consider important query features
  • For example, number of join vertices and triple pattern selectivities
• Not customizable for specific use cases or application needs
FEASIBLE SPARQL Benchmark
• Can be applied to any SPARQL query log
• Considers SPARQL SELECT, ASK, DESCRIBE, and CONSTRUCT queries
• Considers important query features
  • For example, number of join vertices, triple pattern selectivities, query runtime, result set size, number of BGPs, mean join vertex degree, and number of triple patterns
• Customizable for specific use cases or application needs
FEASIBLE SPARQL Benchmark
• Dataset cleaning
• Feature vectors and normalization
• Selection of exemplars
• Selection of benchmark queries
Dataset Cleaning
• Remove syntactically incorrect queries
• Remove queries with zero result size
• This step is optional
  • Not theoretically necessary
  • Leads to more reliable benchmarks in practice
Feature Vectors and Normalization
SELECT DISTINCT ?entita ?nome
WHERE
{
?entita rdf:type dbo:VideoGame .
?entita rdfs:label ?nome
FILTER regex(?nome, "konami", "i")
}
LIMIT 100
Query Type: SELECT
Results Size: 13
Basic Graph Patterns (BGPs): 1
Triple Patterns: 2
Join Vertices: 1
Mean Join Vertices Degree: 2.0
Mean triple patterns selectivity: 0.01709761619798973
UNION: No
DISTINCT: Yes
ORDER BY: No
REGEX: Yes
LIMIT: Yes
OFFSET: No
OPTIONAL: No
FILTER: Yes
GROUP BY: No
Runtime (ms): 65
Feature Vector:
13 1 2 1 2 0.017 0 1 0 1 1 0 0 1 0 65
Normalized Feature Vector:
0.11 0.53 0.67 0.14 0.08 0.017 0 1 0 1 1 0 0 1 0 0.14
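The mapping from the feature list above to a numeric vector can be sketched as follows. This is a minimal illustration, not FEASIBLE's implementation: the key names, the helpers `to_feature_vector` and `scale_by_max`, and the choice of dividing by a per-feature maximum are assumptions, since the slide does not state the normalization formula.

```python
# Feature order follows the slide; the key names themselves are hypothetical.
FEATURE_ORDER = [
    "result_size", "bgps", "triple_patterns", "join_vertices",
    "mean_join_vertex_degree", "mean_tp_selectivity",
    "union", "distinct", "order_by", "regex", "limit",
    "offset", "optional", "filter", "group_by", "runtime_ms",
]


def to_feature_vector(features):
    """Map a feature dict to a numeric vector; True/False become 1.0/0.0."""
    return [float(features[name]) for name in FEATURE_ORDER]


def scale_by_max(vector, maxima):
    """Assumed normalization: divide each value by a per-feature maximum.

    A maximum of 0 maps to 0.0 to avoid division by zero.
    """
    return [v / m if m else 0.0 for v, m in zip(vector, maxima)]


# Values taken from the example query on the slide.
example = {
    "result_size": 13, "bgps": 1, "triple_patterns": 2, "join_vertices": 1,
    "mean_join_vertex_degree": 2.0,
    "mean_tp_selectivity": 0.01709761619798973,
    "union": False, "distinct": True, "order_by": False, "regex": True,
    "limit": True, "offset": False, "optional": False, "filter": True,
    "group_by": False, "runtime_ms": 65,
}

vector = to_feature_vector(example)
print(vector)
```

The per-feature maxima would have to come from the whole query log, so the normalized values shown on the slide are not reproduced here.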
Selection of exemplars
Plot feature vectors in a multidimensional space
[Figure: scatter plot of queries Q1 to Q10 in the two-dimensional feature space]
Query Feature 1 Feature 2
Q1 0.2 0.2
Q2 0.5 0.3
Q3 0.8 0.3
Q4 0.9 0.1
Q5 0.5 0.5
Q6 0.2 0.7
Q7 0.1 0.8
Q8 0.13 0.65
Q9 0.9 0.5
Q10 0.1 0.5
Suppose we need a benchmark of 3 queries
Selection of exemplars
Calculate average point
[Figure: scatter plot of Q1 to Q10 with the average point (Avg.) marked]
Selection of exemplars
Select the point with minimum Euclidean distance to the average point
[Figure: scatter plot of Q1 to Q10; the first exemplar is highlighted in red]
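The two steps so far (average point, then nearest query) can be sketched on the ten example points from the table. The helper `centroid` and the variable names are illustrative assumptions; only the coordinates come from the slides.

```python
import math

# Feature 1 / Feature 2 values from the example table.
points = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}


def centroid(vectors):
    """Component-wise mean of a collection of equal-length vectors."""
    vectors = list(vectors)
    dims = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dims))


avg = centroid(points.values())

# First exemplar: the query closest (Euclidean distance) to the average point.
first_exemplar = min(points, key=lambda q: math.dist(points[q], avg))
print(avg, first_exemplar)
```

On these values the average point is roughly (0.433, 0.455), and the closest query is Q5.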
Selection of exemplars
Select the point that is farthest from the exemplars (this step is applied twice to reach 3 exemplars)
[Figure: scatter plot of Q1 to Q10]
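A minimal sketch of the farthest-point step on the example table follows. It assumes that "farthest to exemplars" means the largest summed Euclidean distance to the exemplars chosen so far, and that Q5 (the query nearest to the average point on these values) is the first exemplar. The function name `select_exemplars` is hypothetical.

```python
import math

# Feature 1 / Feature 2 values from the example table.
points = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}


def select_exemplars(points, first, k):
    """Greedily add the query with the largest summed distance to the
    exemplars selected so far (assumed reading of "farthest")."""
    exemplars = [first]
    while len(exemplars) < k:
        candidates = [q for q in points if q not in exemplars]
        farthest = max(
            candidates,
            key=lambda q: sum(math.dist(points[q], points[e]) for e in exemplars),
        )
        exemplars.append(farthest)
    return exemplars


exemplars = select_exemplars(points, "Q5", 3)
print(exemplars)
```

With these inputs the sketch picks Q4 and then Q7. Using the minimum distance to any exemplar instead of the sum gives the same two picks on this table.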
Selection of Benchmark Queries
Calculate the distance from Q1 to each exemplar
[Figure: scatter plot of Q1 to Q10]
Selection of Benchmark Queries
Assign Q1 to the exemplar at minimum distance
[Figure: scatter plot of Q1 to Q10]
Selection of Benchmark Queries
Repeat the process for Q2, Q3, Q6, Q8, Q9, and Q10
[Figure: scatter plot of Q1 to Q10]
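The assignment step can be sketched as a nearest-exemplar lookup. The exemplar list Q5, Q4, Q7 is an assumption derived from the example table (the slides show exemplars only by colour), and `assign_to_exemplars` is a hypothetical helper name.

```python
import math

# Feature 1 / Feature 2 values from the example table.
points = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}

# Assumed exemplars for this table; not stated explicitly on the slides.
exemplars = ["Q5", "Q4", "Q7"]


def assign_to_exemplars(points, exemplars):
    """Put every query into the cluster of its nearest exemplar."""
    clusters = {e: [] for e in exemplars}
    for query, vector in points.items():
        nearest = min(exemplars, key=lambda e: math.dist(vector, points[e]))
        clusters[nearest].append(query)
    return clusters


clusters = assign_to_exemplars(points, exemplars)
print(clusters)
```

On these values Q9 lies at the same distance (0.4) from Q5 and Q4, so its cluster depends on tie-breaking, which the slides do not specify. All other assignments are unambiguous.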
Selection of Benchmark Queries
Calculate the average of each cluster
[Figure: scatter plot of Q1 to Q10 with the three cluster averages (Avg.) marked]
Selection of Benchmark Queries
Calculate the distance of each point in a cluster to the cluster average
[Figure: scatter plot of Q1 to Q10 with the three cluster averages (Avg.) marked]
Selection of Benchmark Queries
Select the query at minimum distance from the cluster average as the final benchmark query from that cluster
[Figure: scatter plot of Q1 to Q10 with the three cluster averages (Avg.) marked; the selected query is shown in black]
• Q2 is the final selected query from the yellow cluster
• Q8 is the final selected query from the brown cluster
• Q3 is the final selected query from the green cluster
Our benchmark queries are Q2, Q3, and Q8
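The final step can be checked on the example table with a short sketch. The cluster memberships below are an assumption: they follow from nearest-exemplar assignment on the table values, with Q9 (which is equidistant from two exemplars) placed in the green cluster. The slides identify clusters only by colour, and `pick_representative` is a hypothetical helper name.

```python
import math

# Feature 1 / Feature 2 values from the example table.
points = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}

# Assumed cluster memberships (inferred, not listed on the slides).
clusters = {
    "yellow": ["Q1", "Q2", "Q5"],
    "green": ["Q3", "Q4", "Q9"],
    "brown": ["Q6", "Q7", "Q8", "Q10"],
}


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dims = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dims))


def pick_representative(members, points):
    """Return the cluster member closest to the cluster average."""
    avg = centroid([points[q] for q in members])
    return min(members, key=lambda q: math.dist(points[q], avg))


benchmark = {name: pick_representative(m, points) for name, m in clusters.items()}
print(benchmark)
```

Under these assumptions the sketch selects Q2, Q3, and Q8, which matches the result stated on the slide.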
Experimental Setup
• Composite Error Estimation
• L is the query log, B is the benchmark and K is the set of all features
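The formula on this slide did not survive extraction. The sketch below assumes the composite error is the harmonic mean of two terms: the mean squared difference of per-feature means and the mean squared difference of per-feature standard deviations between L and B, averaged over the k = |K| features. Both this form and the use of the population standard deviation are assumptions that should be checked against the FEASIBLE paper; `composite_error` is a hypothetical name.

```python
from statistics import mean, pstdev


def composite_error(log_vectors, benchmark_vectors):
    """Assumed composite error between a query log L and a benchmark B.

    Each argument is a list of equal-length feature vectors.
    """
    k = len(log_vectors[0])

    def column(rows, i):
        return [row[i] for row in rows]

    # Mean squared difference of per-feature means.
    e_mu = mean([
        (mean(column(log_vectors, i)) - mean(column(benchmark_vectors, i))) ** 2
        for i in range(k)
    ])
    # Mean squared difference of per-feature standard deviations.
    e_sigma = mean([
        (pstdev(column(log_vectors, i)) - pstdev(column(benchmark_vectors, i))) ** 2
        for i in range(k)
    ])
    if e_mu + e_sigma == 0:
        return 0.0  # identical means and deviations
    return 2 * e_mu * e_sigma / (e_mu + e_sigma)


# Toy one-feature example, not data from the talk.
print(composite_error([[0.0], [1.0]], [[1.0], [1.0]]))
```

A lower value means the benchmark's feature distribution is closer to that of the log.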
Experimental Setup
• Virtuoso Open-Source Edition version 7.2
  • NumberOfBuffers = 680000, MaxDirtyBuffers = 500000
• Sesame version 2.7.8
  • Tomcat 7 as HTTP interface and native storage layout
  • Set the spoc, posc, opsc indices to those specified in the native storage configuration
  • Java heap size set to 6GB
• Jena-TDB (Fuseki) version 2.0
  • Java heap size set to 6GB
• OWLIM-SE version 6.1
  • Tomcat 7.0 as HTTP interface
  • Entity index size set to 45,000,000 and predicate list enabled
  • Rule set was empty and Java heap size was set to 6GB
• All triple stores were configured to use 6GB of memory; default values were used otherwise
Comparison of Composite Error
FEASIBLE’s composite error is 54.9% lower than that of DBPSB
Comparison of Triple Stores: QpS
[Figure: four bar charts of QpS for Sesame, Virtuoso, OWLIM-SE, and Fuseki on SWDF and DBpedia; panels: SPARQL ASK, SPARQL CONSTRUCT, SPARQL DESCRIBE, SPARQL SELECT]
Comparison of Triple Stores: Mix Queries
[Figure: four bar charts for Sesame, Virtuoso, OWLIM-SE, and Fuseki; panels: QMpH on SWDF, QMpH on DBpedia, QpS on SWDF, QpS on DBpedia]
Rank-wise Ranking of Triple Stores
All values are in percentages
• No system is the sole winner or loser for a particular rank
• Virtuoso mostly lies in the higher ranks, i.e., ranks 1 and 2 (68.29%)
• Fuseki is mostly in the middle ranks, i.e., ranks 2 and 3 (65.14%)
• OWLIM-SE is usually on the slower side, i.e., ranks 3 and 4 (60.86%)
• Sesame is either fast or slow: rank 1 (31.71% of the queries) and rank 4 (23.14%)
Thanks
saleem@informatik.uni-Leipzig.de
Try Yourself
http://feasible.aksw.org/
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 

FEASIBLE-Benchmark-Framework-ISWC2015

  • 1. FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework
      Muhammad Saleem (1), Qaiser Mehmood (2), Axel-Cyrille Ngonga Ngomo (1)
      http://feasible.aksw.org/
      (1) Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany
      (2) Insight Center for Data Analytics, National University of Ireland, Galway
      International Semantic Web Conference, Bethlehem, USA, 2015 (10/14/2015)
  • 2. Triple Store Benchmarks
      • Synthetic benchmarks
          • Use synthetic queries and/or data
          • Benchmarks of different data sizes are possible
          • Suitable for testing scalability
          • Often fail to reflect reality
          • Examples: LUBM, SP2Bench, BSBM, WatDiv
      • Query log benchmarks
          • Use real queries from query logs
          • Can be closer to reality
          • Can be used with different data sizes
          • Scalability can be tested
          • Examples: DBPSB, FEASIBLE
  • 3. DBpedia SPARQL Benchmark
      • Based on a real DBpedia query log
      • Benchmarks of different data sizes are possible
      • Suitable for testing scalability
      • Considers only SPARQL SELECT queries
      • Does not consider important query features
          • For example, number of join vertices, triple pattern selectivities
      • Not customizable for given use cases or the needs of an application
  • 4. FEASIBLE SPARQL Benchmark
      • Can be applied to any SPARQL query log
      • Considers SPARQL SELECT, ASK, DESCRIBE and CONSTRUCT queries
      • Considers important query features
          • For example, number of join vertices, triple pattern selectivities, query runtime, result set size, number of BGPs, mean join vertex degree, number of triple patterns, etc.
      • Customizable for given use cases or the needs of an application
  • 5. FEASIBLE SPARQL Benchmark
      • Dataset cleaning
      • Feature vectors and normalization
      • Selection of exemplars
      • Selection of benchmark queries
  • 6. Dataset Cleaning
      • Remove syntactically incorrect queries
      • Remove queries with zero results
      • It is an optional step
      • Not a theoretical necessity
      • Leads to practically reliable benchmarks
  • 7. Feature Vectors and Normalization
      Example query:
          SELECT DISTINCT ?entita ?nome
          WHERE {
            ?entita rdf:type dbo:VideoGame .
            ?entita rdfs:label ?nome
            FILTER regex(?nome, "konami", "i")
          }
          LIMIT 100
      Features:
          Query Type: SELECT
          Results Size: 13
          Basic Graph Patterns (BGPs): 1
          Triple Patterns: 2
          Join Vertices: 1
          Mean Join Vertices Degree: 2.0
          Mean triple patterns selectivity: 0.01709761619798973
          UNION: No
          DISTINCT: Yes
          ORDER BY: No
          REGEX: Yes
          LIMIT: Yes
          OFFSET: No
          OPTIONAL: No
          FILTER: Yes
          GROUP BY: No
          Runtime (ms): 65
      Feature Vector: 13 1 2 1 2 0.017 0 1 0 1 1 0 0 1 0 65
      Normalized Feature Vector: 0.11 0.53 0.67 0.14 0.08 0.017 0 1 0 1 1 0 0 1 0 0.14
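The slide shows a raw and a normalized feature vector but does not state the normalization rule. As a minimal sketch, assuming each feature is divided by its maximum over the whole query log (so that every value lands in [0, 1]), normalization could look like this. The function name `normalize` and the three-query log are hypothetical and are not taken from the slides.

```python
def normalize(vectors):
    """Scale every feature column into [0, 1] by dividing by its column maximum.

    Assumes non-negative feature values; the max-scaling rule is an assumption,
    the slides do not say which rule FEASIBLE uses.
    """
    maxima = [max(column) for column in zip(*vectors)]
    return [
        [value / m if m else 0.0 for value, m in zip(vector, maxima)]
        for vector in vectors
    ]


# Hypothetical log of three queries with four features each:
# (result size, triple patterns, uses DISTINCT, runtime in ms)
log = [
    [13, 2, 1, 65],
    [120, 3, 0, 450],
    [40, 1, 1, 90],
]
normalized = normalize(log)
```

Boolean features such as DISTINCT are already 0 or 1 and pass through unchanged, which is consistent with the 0/1 entries in the vectors on the slide.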
  • 8. Selection of exemplars
      • Plot the feature vectors in a multidimensional space
      • Suppose we need a benchmark of 3 queries
      Query   Feature 1   Feature 2
      Q1      0.2         0.2
      Q2      0.5         0.3
      Q3      0.8         0.3
      Q4      0.9         0.1
      Q5      0.5         0.5
      Q6      0.2         0.7
      Q7      0.1         0.8
      Q8      0.13        0.65
      Q9      0.9         0.5
      Q10     0.1         0.5
      [Figure: scatter plot of Q1 to Q10 over the two features]
  • 9. Selection of exemplars: calculate the average point [Figure: scatter plot with the average point (Avg.) added]
  • 10. Selection of exemplars: select the point with the minimum Euclidean distance to the average point; the red point is our first exemplar
  • 11. Selection of exemplars: select the point that is farthest from the exemplars
  • 12. Selection of exemplars [Figure: scatter plot only, no caption]
  • 13. Selection of exemplars: select the point that is farthest from the exemplars
  • 14. Selection of exemplars [Figure: scatter plot only, no caption]
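The exemplar steps on slides 8 to 14 can be sketched in a few lines of Python using the ten points from the table on slide 8. The slides do not spell out what "farthest to exemplars" means when more than one exemplar exists; this sketch assumes it means the largest summed Euclidean distance to all exemplars chosen so far. The function name `select_exemplars` is hypothetical.

```python
import math

# Normalized two-feature vectors from the table on slide 8.
POINTS = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}


def select_exemplars(points, n):
    """Pick n exemplars: first the point nearest the average, then farthest points."""
    names = list(points)
    dims = len(points[names[0]])
    # Average point over all queries.
    avg = tuple(sum(points[q][d] for q in names) / len(names) for d in range(dims))
    # First exemplar: minimum Euclidean distance to the average point.
    exemplars = [min(names, key=lambda q: math.dist(points[q], avg))]
    while len(exemplars) < n:
        rest = [q for q in names if q not in exemplars]
        # Assumption: "farthest" = largest summed distance to current exemplars.
        exemplars.append(
            max(rest, key=lambda q: sum(math.dist(points[q], points[e]) for e in exemplars))
        )
    return exemplars


print(select_exemplars(POINTS, 3))  # ['Q5', 'Q4', 'Q7']
```

For this data set, taking the largest minimum distance instead of the largest summed distance yields the same three exemplars, so the choice between the two readings does not affect the example.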
  • 15. Selection of Benchmark Queries: calculate the distance from Q1 to each exemplar
  • 16. Selection of Benchmark Queries: assign Q1 to the exemplar at minimum distance
  • 17. Selection of Benchmark Queries: repeat the process for Q2
  • 18. Selection of Benchmark Queries: repeat the process for Q3
  • 19. Selection of Benchmark Queries: repeat the process for Q6
  • 20. Selection of Benchmark Queries: repeat the process for Q8
  • 21. Selection of Benchmark Queries: repeat the process for Q9
  • 22. Selection of Benchmark Queries: repeat the process for Q10
  • 23. Selection of Benchmark Queries: calculate the average of each cluster [Figure: scatter plot with one average point (Avg.) per cluster]
  • 24. Selection of Benchmark Queries: calculate the distance of each point in a cluster to the cluster average
  • 25. Selection of Benchmark Queries: select the query at minimum distance as the final benchmark query from that cluster. Q2 (shown in black) is the final selected query from the yellow cluster
  • 26. Selection of Benchmark Queries: select the query at minimum distance as the final benchmark query from that cluster. Q8 (shown in black) is the final selected query from the brown cluster
  • 27. Selection of Benchmark Queries: select the query at minimum distance as the final benchmark query from that cluster. Q3 (shown in black) is the final selected query from the green cluster. Our benchmark queries are Q2, Q3, and Q8
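The clustering and final selection on slides 15 to 27 can be reproduced with the sketch below. It assumes the exemplars are Q5, Q4 and Q7; the slides do not name them, and these are the points that the average-then-farthest rule yields for the table on slide 8. Exact fractions are used because some points are equidistant from two candidates (Q9 lies exactly between Q5 and Q4), and ties here simply go to whichever candidate comes first. The function names are hypothetical.

```python
from fractions import Fraction as F

# Points from the table on slide 8, stored as exact fractions so ties stay exact.
POINTS = {
    name: (F(x), F(y))
    for name, x, y in [
        ("Q1", "0.2", "0.2"), ("Q2", "0.5", "0.3"), ("Q3", "0.8", "0.3"),
        ("Q4", "0.9", "0.1"), ("Q5", "0.5", "0.5"), ("Q6", "0.2", "0.7"),
        ("Q7", "0.1", "0.8"), ("Q8", "0.13", "0.65"), ("Q9", "0.9", "0.5"),
        ("Q10", "0.1", "0.5"),
    ]
}
EXEMPLARS = ["Q5", "Q4", "Q7"]  # assumed; not named on the slides


def sq_dist(a, b):
    """Squared Euclidean distance; it orders points the same way as the distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members, points):
    """Average point of a cluster."""
    dims = len(points[members[0]])
    return tuple(sum(points[q][d] for q in members) / len(members) for d in range(dims))


def select_benchmark(points, exemplars):
    # Step 1: assign every query to its nearest exemplar (ties go to the earlier one).
    clusters = {e: [] for e in exemplars}
    for q, p in points.items():
        clusters[min(exemplars, key=lambda e: sq_dist(p, points[e]))].append(q)
    # Step 2: from each cluster, keep the query closest to the cluster average.
    chosen = []
    for members in clusters.values():
        avg = centroid(members, points)
        chosen.append(min(members, key=lambda q: sq_dist(points[q], avg)))
    return chosen


print(select_benchmark(POINTS, EXEMPLARS))  # ['Q2', 'Q3', 'Q8']
```

The result agrees with the benchmark queries named on slide 27. Note that the selected queries need not be the exemplars themselves: the exemplars only seed the clusters.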
  • 28. Experimental Setup
      • Composite Error Estimation
      • L is the query log, B is the benchmark and K is the set of all features
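The formula for the composite error did not survive in this transcript. As a rough sketch only, one plausible definition compares the per-feature means and standard deviations of L and B and combines the two squared errors with a harmonic mean. That definition, the function name `composite_error` and the use of population standard deviation are all assumptions made for illustration, not the formula from the slide.

```python
from statistics import mean, pstdev


def composite_error(log, bench):
    """Harmonic mean of the mean-error and the standard-deviation-error.

    log and bench are lists of equally long feature vectors (L and B);
    the number of features plays the role of |K|. Assumed definition.
    """
    log_cols = list(zip(*log))
    bench_cols = list(zip(*bench))
    k = len(log_cols)
    # Mean squared difference between per-feature means of L and B.
    e_mu = sum((mean(l) - mean(b)) ** 2 for l, b in zip(log_cols, bench_cols)) / k
    # Mean squared difference between per-feature standard deviations of L and B.
    e_sigma = sum((pstdev(l) - pstdev(b)) ** 2 for l, b in zip(log_cols, bench_cols)) / k
    if e_mu + e_sigma == 0:
        return 0.0  # benchmark matches the log in both statistics
    return 2 * e_mu * e_sigma / (e_mu + e_sigma)


# Hypothetical two-feature log and a one-query benchmark drawn from it.
log = [(0, 0), (1, 1)]
bench = [(1, 1)]
print(composite_error(log, bench))
```

Under this definition a lower value means the benchmark preserves the feature distribution of the log more closely, and a benchmark identical to the log scores 0.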
  • 29. Experimental Setup
      • Virtuoso Open-Source Edition version 7.2
          • NumberOfBuffers = 680000, MaxDirtyBuffers = 500000
      • Sesame version 2.7.8
          • Tomcat 7 as HTTP interface and native storage layout
          • Set the spoc, posc, opsc indices to those specified in the native storage configuration
          • The Java heap size was set to 6GB
      • Jena-TDB (Fuseki) version 2.0
          • Java heap size set to 6GB
      • OWLIM-SE version 6.1
          • Tomcat 7.0 as HTTP interface
          • Set the entity index size to 45,000,000 and enabled the predicate list
          • Rule set was empty and the Java heap size was set to 6GB
      • We configured all triple stores to use 6GB of memory and used default values otherwise
  • 30. Comparison of Composite Error: FEASIBLE's composite error is 54.9% lower than that of DBPSB
  • 31. Comparison of Triple Stores: QpS [Figure: four bar charts of queries per second (QpS) for Sesame, Virtuoso, OWLIM-SE and Fuseki on SWDF and DBpedia; panels: SPARQL ASK, SPARQL CONSTRUCT, SPARQL DESCRIBE, SPARQL SELECT]
  • 32. Comparison of Triple Stores: Mix Queries [Figure: four bar charts for Sesame, Virtuoso, OWLIM-SE and Fuseki; panels: QMpH on SWDF, QMpH on DBpedia, QpS on SWDF, QpS on DBpedia]
  • 33. Rank-wise Ranking of Triple Stores (all values are in percentages)
      • No system is the sole winner or loser for a particular rank
      • Virtuoso mostly lies in the higher ranks, i.e., ranks 1 and 2 (68.29%)
      • Fuseki is mostly in the middle ranks, i.e., ranks 2 and 3 (65.14%)
      • OWLIM-SE is usually on the slower side, i.e., ranks 3 and 4 (60.86%)
      • Sesame is either fast or slow: rank 1 for 31.71% of the queries and rank 4 for 23.14%