Diefficiency Metrics:
Measuring the Continuous Efficiency of Query Processing Approaches
Maribel Acosta, Maria-Esther Vidal, York Sure-Vetter
Presented at the International Semantic Web Conference 2017
Best Resource Paper Nominee
Motivation (1)
Query: Retrieve DBpedia resources classified as alcohols that have SMILES identifiers.
SELECT ?d1 WHERE {
  ?d1 dcterms:subject dbc:Alcohols .
  ?d1 dbp:smiles ?s .
}
Blocking Approach:
Produces all results at the end of execution.
[Diagram: Input → Query Engine → Output]
Answer                        Time
{?d1 → dbr:Zuclopenthixol}    0.37 sec.
{?d1 → dbr:Ziprepol}          0.37 sec.
{?d1 → dbr:Viminol}           0.37 sec.
{?d1 → dbr:Trifluperidol}     0.37 sec.
{?d1 → dbr:Trabectedin}       0.37 sec.
{?d1 → dbr:Tolvaptan}         0.37 sec.
Motivation (1)
Incremental Approach:
Produces results as soon as they are ready, e.g., ANAPSID, nLDE, TPF Client.
[Diagram: Input → Query Engine → Output]
Same query as on the previous slide:
Answer                        Time
{?d1 → dbr:Zuclopenthixol}    0.33 sec.
{?d1 → dbr:Ziprepol}          0.35 sec.
{?d1 → dbr:Viminol}           0.35 sec.
{?d1 → dbr:Trifluperidol}     0.36 sec.
{?d1 → dbr:Trabectedin}       0.36 sec.
{?d1 → dbr:Tolvaptan}         0.37 sec.
Motivation (2)
Traditional Metrics
Metric                       nLDE Not Adaptive   nLDE Selective   nLDE Random
Time First Answer (sec.)     0.37                0.24             0.33
Execution Time (sec.)        10.59               12.10            9.30
Throughput (answers/sec.)    486.27              421.87           553.66
Completeness                 100%                100%             100%
Overall, nLDE Random outperforms the other approaches.
Continuous Performance
[Plot: #Answers Produced vs. Time (sec.) for nLDE Not Adaptive, nLDE Selective, and nLDE Random]
nLDE Not Adaptive outperforms the other approaches in the first 7.5 sec. of execution.
Motivation (3)
We need quantitative methods to measure the continuous efficiency of query processing approaches.
Related Work
Current Performance Metrics
Effectiveness:
• Answer Completeness [Guo05] [Montoya12]
• Correctness [Zhang12]
• Answer Soundness [Guo05]
Efficiency:
• Execution Time [Guo05] [Bizer09] [Montoya12] [Zhang12]
• Loading Time [Guo05]
• Throughput [Zhang12]
• Time for the First Tuple [Acosta11]
• Queries per Second [Bizer09]
• Average Slowdown [Sharaf08]
Effectiveness and Efficiency:
• Combined Metric [Guo05]
These metrics do not consider continuous performance; they are not tailored to benchmark incremental approaches.
Our Approach: Measuring Continuous Efficiency
Diefficiency Metrics
• Diefficiency: continuous efficiency.
• Combination of the Greek prefix di(a)- (which means “through” or “across”) and efficiency.
• Continuous performance of approaches is recorded in answer traces.
• Our metrics quantify the diefficiency of incremental approaches.
Answer                        Time
{?d1 → dbr:Zuclopenthixol}    0.33
{?d1 → dbr:Ziprepol}          0.35
{?d1 → dbr:Viminol}           0.35
{?d1 → dbr:Trifluperidol}     0.36
{?d1 → dbr:Tolvaptan}         0.37
Answer Distribution Function
• Defined as $X : [0; t_n] \to \mathbb{N}$.
• $t_n$ is the point in time when the last answer was produced.
• $X(x)$ indicates the number of answers produced until the time $x$.
• $X$ is built from answer traces (applying linear interpolations).
[Figure: Answer Trace and Answer Distribution Function for Q9.sparql; plot of #Answers Produced vs. Time for nLDE Not Adaptive, nLDE Selective, and nLDE Random]
Answer Trace
Answer                        Time
{?d1 → dbr:Zuclopenthixol}    0.33
{?d1 → dbr:Ziprepol}          0.35
{?d1 → dbr:Viminol}           0.35
{?d1 → dbr:Trifluperidol}     0.36
{?d1 → dbr:Tolvaptan}         0.37
…
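A minimal Python sketch of how an answer distribution function could be built from an answer trace, assuming the trace is reduced to a list of answer timestamps. The name `answer_distribution` is hypothetical and is not taken from the dief R package; returning 0 before the first recorded answer is also an assumption of this sketch. The timestamps are the five listed in the answer trace on this slide.

```python
from bisect import bisect_right


def answer_distribution(times):
    """Return X(x), built from answer timestamps by linear interpolation.

    Hypothetical helper: trace point i is (t_i, i), i.e. the i-th answer
    is produced at time t_i.
    """
    ts = sorted(times)
    counts = list(range(1, len(ts) + 1))

    def X(x):
        if x < ts[0]:
            return 0.0                      # assumption: nothing before the first answer
        if x >= ts[-1]:
            return float(counts[-1])        # all answers produced
        j = bisect_right(ts, x)             # ts[j-1] <= x < ts[j]
        t0, t1 = ts[j - 1], ts[j]
        c0, c1 = counts[j - 1], counts[j]
        return c0 + (c1 - c0) * (x - t0) / (t1 - t0)

    return X


# Timestamps (sec.) from the answer trace on the slide.
trace = [0.33, 0.35, 0.35, 0.36, 0.37]
X = answer_distribution(trace)
print(X(0.35), X(0.34))
```

With this trace, `X(0.35)` evaluates to 3.0 (three answers by 0.35 sec.) and `X(0.34)` to roughly 1.5, a value interpolated between the first and second answers.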
Metric dief@t
• Quantifies diefficiency during the first t time units of execution.
• Measures the area under the curve of $X(x)$ in the interval $[0; t]$.
$\mathrm{dief@}t := \int_{0}^{t} X(x)\,dx$
dief@t interpretation: Higher is better.
[Plot: #Answers Produced vs. Time (sec.) for nLDE Not Adaptive, nLDE Selective, and nLDE Random]
Not Adaptive   Selective   Random
7323.46        1148.63     5031.90
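The integral can be approximated with the trapezoidal rule over the points of an answer trace. The sketch below is one possible reading of the definition, not the implementation of the dief R package: the name `dief_at_t`, the toy timestamps, and the choice to start integrating at the first recorded answer (so nothing is accumulated before it) are all assumptions.

```python
def dief_at_t(times, t):
    """Approximate dief@t: area under the answer distribution function up to t."""
    ts = sorted(times)
    # Trace points (t_i, i): the i-th answer is produced at time t_i.
    points = [(ti, i) for i, ti in enumerate(ts, start=1)]
    kept = [p for p in points if p[0] <= t]
    # If t falls strictly between two recorded answers, close the curve at t
    # with a linearly interpolated point.
    if kept and len(kept) < len(points) and kept[-1][0] < t:
        (t0, c0), (t1, c1) = kept[-1], points[len(kept)]
        kept.append((t, c0 + (c1 - c0) * (t - t0) / (t1 - t0)))
    # Trapezoidal rule over consecutive points.
    return sum((b[0] - a[0]) * (a[1] + b[1]) / 2.0 for a, b in zip(kept, kept[1:]))


# Hypothetical trace: one answer per second.
toy_trace = [1.0, 2.0, 3.0, 4.0]
print(dief_at_t(toy_trace, 4.0))  # whole trace
print(dief_at_t(toy_trace, 2.5))  # cut between the 2nd and 3rd answers
```

On the toy trace the whole-trace value is 7.5 and the value at t = 2.5 is 2.625. When comparing approaches, the same t has to be used for all of them.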
Metric dief@k
• Quantifies diefficiency while producing the first k answers.
• Measures the area under the curve of $X(x)$ in the interval $[0; t_k]$.
• $t_k$ is the point in time where the kth answer is produced.
$\mathrm{dief@}k := \int_{0}^{t_k} X(x)\,dx$
dief@k interpretation: Lower is better.
[Plot: #Answers Produced vs. Time (sec.) for nLDE Not Adaptive, nLDE Selective, and nLDE Random]
k = 2000
Not Adaptive   Selective   Random
4686.11        3235.67     3517.85
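A short Python sketch of dief@k under the same trapezoidal reading of the integral. The function name `dief_at_k` and both traces are hypothetical; the sketch only illustrates why a lower value is better when two approaches produce the same k answers.

```python
def dief_at_k(times, k):
    """Approximate dief@k: area under the answer distribution function
    until the k-th answer is produced (trapezoidal rule)."""
    ts = sorted(times)[:k]  # timestamps of the first k answers
    # Between answers i and i+1 the curve rises linearly from i to i+1.
    return sum(
        (t1 - t0) * (2 * i + 1) / 2.0
        for i, (t0, t1) in enumerate(zip(ts, ts[1:]), start=1)
    )


# Hypothetical traces: both approaches produce 4 answers,
# but `fast` needs 2 seconds and `slow` needs 4.
slow = [1.0, 2.0, 3.0, 4.0]
fast = [0.5, 1.0, 1.5, 2.0]
print(dief_at_k(slow, 4), dief_at_k(fast, 4))
```

Here `slow` yields 7.5 and `fast` yields 3.75: the approach that reaches the kth answer sooner accumulates less area, which matches the "lower is better" interpretation.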
Extensions of dief@t and dief@k
Measuring diefficiency at any time interval
• With dief@t it is possible to measure the diefficiency of an approach during the interval $[t_a; t_b]$, as follows:
$\mathrm{dief@}t_b - \mathrm{dief@}t_a$
[Plot: #Answers Produced vs. Time for nLDE Not Adaptive, nLDE Selective, and nLDE Random]
Not Adaptive   Selective   Random
5073.37        869.18      4024.21
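The difference of two dief@t values measures an interval because integrals over adjacent intervals add up. A short derivation, writing $t_a$ and $t_b$ for the interval bounds and assuming $0 \le t_a \le t_b$:

```latex
\begin{align*}
\mathrm{dief@}t_b - \mathrm{dief@}t_a
  &= \int_{0}^{t_b} X(x)\,dx - \int_{0}^{t_a} X(x)\,dx \\
  &= \int_{t_a}^{t_b} X(x)\,dx .
\end{align*}
```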
Extensions of dief@t and dief@k
Measuring diefficiency between the $k_a$th and $k_b$th answers
• With dief@k it is possible to measure the diefficiency of an approach while producing the answers between the $k_a$th and the $k_b$th (with $k_a \le k_b$), as follows:
$\mathrm{dief@}k_b - \mathrm{dief@}k_a$
[Plot: #Answers Produced vs. Time for nLDE Not Adaptive, nLDE Selective, and nLDE Random]
Not Adaptive   Selective   Random
5847.05        5457.67     3468.71
Properties of dief@t and dief@k
Analytical Relationship Between dief@t and dief@k
Let $t_k$ be the point in time when the $k$th answer is produced. Then:
$\mathrm{dief@}t_k = \mathrm{dief@}k$
Theorem 1:
The diefficiency of blocking approaches is always zero.
Theorem 2:
In queries where the number of answers is greater than one, the total diefficiency of incremental approaches is higher than zero.
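Both theorems can be checked numerically on the two answer traces from the Motivation (1) slides. This is a sketch, assuming diefficiency is computed with the trapezoidal rule over the recorded trace points only; `area_under_trace` is a hypothetical helper, not a function of the dief R package.

```python
def area_under_trace(times):
    """Total diefficiency of a trace: trapezoidal area under the
    points (t_i, i), where the i-th answer is produced at time t_i."""
    ts = sorted(times)
    return sum(
        (t1 - t0) * (2 * i + 1) / 2.0
        for i, (t0, t1) in enumerate(zip(ts, ts[1:]), start=1)
    )


# Blocking approach: all six answers are produced at 0.37 sec.
blocking = [0.37] * 6
# Incremental approach: answers arrive between 0.33 and 0.37 sec.
incremental = [0.33, 0.35, 0.35, 0.36, 0.36, 0.37]

print(area_under_trace(blocking))     # Theorem 1: zero
print(area_under_trace(incremental))  # Theorem 2: greater than zero
```

For the blocking trace every interval between consecutive answers has zero width, so the area is exactly zero. The incremental trace spreads its answers over time and accumulates a positive area of about 0.12.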
Empirical Study
Experimental Settings
• Query engine: nLDE [Acosta15] with three configurations:
  • nLDE Not Adaptive (NA)
  • nLDE Selective (Sel)
  • nLDE Random (Ran)
• Queries and dataset:
  • nLDE Benchmark 1: 16 non-selective queries (4–14 triple patterns)
  • DBpedia dataset (v. 2015)
• Technical specifications: Debian Wheezy 64 bit with 2x Intel(R) Xeon(R) CPU E5-2670 2.60GHz (16 physical cores), and 256GB RAM.
Comparing dief@t with Other Metrics (1)
Results for Query Q17
[Plot: #Answers Produced vs. Time for nLDE Not Adaptive, nLDE Selective, and nLDE Random on Q17.sparql]
[Plot: NA, Ran, and Sel compared on (Time for the First Tuple)^-1 (TFFT), (Execution Time)^-1 (ET), Completeness (Comp), Throughput (T), and dief@t]
Plot interpretation: Higher is better.
Uncovered pattern: Ran outperforms NA.
Comparing dief@t with Other Metrics (2)
Queries in which dief@t uncovers unknown patterns
Measuring Answer Rate with dief@k (1)
Results for Query Q2
[Plot: #Answers Produced vs. Time for nLDE Not Adaptive, nLDE Selective, and nLDE Random on Q2.sparql]
[Plot: dief@k of NA, Ran, and Sel at k=25%, k=50%, k=75%, and k=100%]
Plot interpretation: Lower is better.
• Sel produces the first 25% slower than Ran.
• Sel produces the last portions of the answer at a faster rate.
Measuring Answer Rate with dief@k (2)
Only in these queries, all the approaches produced results at a uniform rate.
Conclusions & Outlook
Conclusions
dief@t and dief@k: Measure the diefficiency of incremental approaches.
• We have demonstrated the theoretical soundness of the metrics.
• Our empirical study indicates that dief@t and dief@k allow for uncovering performance particularities.
• A final remark: dief@t and dief@k can measure the performance of any incremental approach.
✔ Streaming query processing ✔ Top-k ✔ Monotonic reasoning ✔ Crowdsourcing
Available Resources
• dief R package to compute dief@t and dief@k: https://github.com/maribelacosta/dief
• Jupyter notebook: https://github.com/maribelacosta/dief-notebooks
• Online demo: http://km.aifb.kit.edu/services/dief-app/
References
[Acosta15] M. Acosta and M.-E. Vidal. Networks of linked data eddies: An adaptive web query processing engine for RDF data. In ISWC, pages 111–127, 2015.
[Acosta11] M. Acosta, M.-E. Vidal, J. Castillo, T. Lampo, and E. Ruckhaus. ANAPSID: An adaptive query processing engine for SPARQL endpoints. In ISWC, pages 18–34, 2011.
[Bizer09] C. Bizer and A. Schultz. The Berlin SPARQL benchmark. Int. J. Semantic Web Inf. Syst., 5(2):1–24, 2009.
[Guo05] Y. Guo, Z. Pan, and J. Heflin. LUBM: A benchmark for OWL knowledge base systems. Web Semant., 3(2-3):158–182, Oct. 2005.
[Montoya12] G. Montoya, M.-E. Vidal, Ó. Corcho, E. Ruckhaus, and C. B. Aranda. Benchmarking federated SPARQL query engines: Are existing testbeds enough? In ISWC, pages 313–324, 2012.
[Sharaf08] M. A. Sharaf, P. K. Chrysanthis, A. Labrinidis, and K. Pruhs. Algorithms and metrics for processing multiple heterogeneous continuous queries. ACM Trans. Database Syst., 33(1):5:1–5:44, 2008.
[Zhang12] Y. Zhang, M. Pham, Ó. Corcho, and J. Calbimonte. SRBench: A streaming RDF/SPARQL benchmark. In ISWC, pages 641–657, 2012.