SlideShare una empresa de Scribd logo
1 de 11
DICE
Horizon	2020		Research	&	Innovation	Action
Grant	Agreement	no.	644869
http://www.dice-h2020.eu
Funded	by	the	Horizon	 2020
Framework	Programme	of	the	European	Union
Configuration	Optimization	
for	Big	Data	Software
Pooyan Jamshidi
Imperial	College	London
Configuration	Optimization	(WP5)
2©DICE
• T5.1;	Lead:	IMP
• Contributors:	XLAB,	IEAT
• Experimentally produce	an	optimal	configuration
• Parameters	not	assigned	by	WP3	optimization
Big	Data	Application	Configuration
3©DICE
Performance	Gain
4©DICE
0 1 2 3 4 5
average read latency (µs) ×10
4
0
20
40
60
80
100
120
140
160
observations
1000 1200 1400 1600 1800 2000
average read latency (µs)
0
10
20
30
40
50
60
70
observations
1
1
×10
4
(a) cass-20 (b) cass-10
question explicitly in the paper. Instead, here we define
the questions and provide evidences based on the results re-
ported in the paper. Here, we only focus on the research
questions that we believe are most important for software
engineering community.
RQ1: How large is the potential di↵erence and as a result
potential impact of a wrong choice of parameter settings?
We demonstrated for both SPS in Figure 2 (in the paper)
and for NoSQL systems in Figure 6 and also in Figure 7 that
the performance di↵erence between di↵erent configurations
is significant. The error measures in the experimental results
in the paper also demonstrate this large di↵erence. The
performance gain between the worst and best settings are
measured for each datasets in Table 4.
Table 4: Performance gain between best and worst settings.
Dataset Best (ms) Worst(ms) Gain (%)
1 wc(6D) 55209 3.3172 99%
2 sol(6D) 40499 1.2000 100%
3 rs(6D) 34733 1.9000 99%
4 wc(3D) 94553 1.2994 100%
5 wc(5D) 405.5 47.387 88%
6 cass-10 1.955 1.206 38%
7 cass-20 40.011 2.045 95%
Di↵erent parameter settings cause very large vari-
ance in the performance of the system under test.
RQ2: How does a “default” setting compare to the worst
and optimum configuration in terms of performance?
Figure 16 reports the relative performance for the de-
fault setting, in terms of throughput and latency with the
worst and best configuration as well as the one prescribed
by NoSQL experts. As opposed our expectations, a default
configuration is worst that majority of other configurations,
it performs better than only 10% of other settings for read
operations. However, for write operations the default con-
figurations performs better than half of the other settings.
This was expected because Apache Cassandra is designed to
be write e cient. Even the configuration that is prescribed
pert’s prescription is also far from optimal on indi-
vidual problem instances.
RQ3: How is the performance of a configuration optimiza-
tion on a version when it has been tuned on another version?
If one makes tuning on a specific version of the system
under test, then we would expect it would be optimum for
other versions of the system after it has gone under some
changes. The results for both SPS and NoSQL in Figures
2 and 7 show that the optimum setting will be di↵erent for
di↵erent versions. We also observed that parameter settings
that should work well on average can be particularly ine -
cient on new versions compared to the best configuration for
those versions. In other words, there is a very high variance
in the performance of parameter settings. This has been
also observed by another study in search-based software en-
gineering community [1]. One interesting observations that
has not reported previously is that the performance data
have correlations among di↵erent versions.
Tuned parameters can improve upon default values
on average, but they are far from optimal on individ-
ual problem instances.
RQ4: How much can transfer learning improve tuning with
a larger number of observations from other versions?
The larger the training set is, the more accurate the model
predictions will likely be. In the context of configuration
tuning, carrying over a larger observation set will result in
better tuning, but it would be more expensive to carry out
the optimization process as we have shown in Section 4.6.
The results in Figure 18(b) and Figure 19(b) show that al-
though the models become more accurate slightly, the gain
in performance tuning does not pay o↵ the costs.
Although transferring more observations improves
performance and reduce the entropy, the improve-
ment is low.
3. REFERENCES
[1] A. Arcuri and G. Fraser. Parameter tuning or default
values? an empirical investigation in search-based
software engineering. Empirical Software Engineering,
18(3):594–623, 2013.
6 cass-10 1.955 1.206 38%
7 cass-20 40.011 2.045 95%
Di↵erent parameter settings cause very large vari-
ance in the performance of the system under test.
RQ2: How does a “default” setting compare to the worst
and optimum configuration in terms of performance?
Figure 16 reports the relative performance for the de-
fault setting, in terms of throughput and latency with the
worst and best configuration as well as the one prescribed
by NoSQL experts. As opposed our expectations, a default
configuration is worst that majority of other configurations,
it performs better than only 10% of other settings for read
operations. However, for write operations the default con-
figurations performs better than half of the other settings.
This was expected because Apache Cassandra is designed to
be write e cient. Even the configuration that is prescribed
a larger nu
The larg
predictions
tuning, ca
better tun
the optimi
The result
though the
in perform
Althou
perform
ment is
3. REF
[1] A. Arc
values?
softwar
18(3):5
3
Configuration	Optimization	features
• A	traditionally	time-costly	operation	considerably	
reduced	in	duration
• Configurations	get	more	efficient	with	each	build
• Support	for	Apache	Storm	and	Cassandra
• Metrics:	latency,	throughput
5©DICE
0 5 10 15 20
Number of counters
100
120
140
160
180
200
220
240
Latency(ms)
splitters=2
splitters=3
number of counters
number of splitters
latency(ms)
100
150
1
200
250
2
300
Cubic Interpolation Over Finer Grid
243 684 10125 14166 18
Overview	of	our	approach
6©DICE
-1.5 -1 -0.5 0 0.5 1 1.5
-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
Configuration
Space
Empirical
Model
2
4
6
8
10
12
1
2
3
4
5
6
160
140
120
100
80
60
180
Experiment
(exhastive)
Experiment
Experiment
0 20 40 60 80 100 120 140 160 180 200
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
Selection Criteria
(b) Sequential Design
(a) Design of Experiment
BO4CO	Architecture
7©DICE
Configuration
Optimisation Tool
(BO4CO)
performance
repository
Monitoring
Deployment Service
Data Preparation
configuration
parameters
values
configuration
parameters
values
Experimental Suite
Testbed
Doc
Data Broker
Tester
TL4CO	(DevOps compatible)
8©DICE
Configuration
Optimisation Tool
(TL4CO)
performance
repository
Monitoring
Deployment Service
Data Preparation
configuration
parameters
values
configuration
parameters
values
Experimental Suite
Testbed
Doc
Data Broker
Tester
experiment time
polling interval
previous versions
configuration
parameters
GP model
Kafka
Vn
V1 V2
System Under Test
historical
data
Workload
Generator
Technology Interface
Storm
Cassandra
Spark
0
4
0.5
1
4
×104
Latency(µs)
3
1.5
3
2
2
2
1 1
-1
4
0
1
2
4
Latency(µs)
×105
3
3
4
3
5
2
2
1 1
1000
4
1500
2000
2500
4
3000
Latency(µs)
3
3500
4000
3
4500
2
2
1 1
1300
4
1350
1400
4
Latency(µs)
3
1450
1500
3
1550
2
2
1 1
(a) cass-20 v1 (b) cass-20 v2
concurrent_reads
concurrent_writes
concurrent_reads
concurrent_writes
concurrent_reads
concurrent_writes
concurrent_reads
concurrent_writes
Experiments	(Storm+Cassandra)
9©DICE
sults that were not included in the main text.
1. FURTHER DETAILS ON SETTINGS
1.1 Configuration Parameters
The Apache Storm parameters (cf. Table 1):
• max spout (topology.max.spout.pending). The maximum
number of tuples that can be pending on a spout.
• spout wait (topology.sleep.spout.wait.strategy.time.ms).
Time in ms the SleepEmptyEmitStrategy should sleep.
• netty min wait (storm.messaging.netty.min wait ms). The
min time netty waits to get the control back from OS.
• spouts, splitters, counters, bolts. Parallelism level.
• heap. The size of the worker heap.
• bu↵er size (storm.messaging.netty.bu↵er size). The size
of the transfer queue.
• emit freq (topology.tick.tuple.freq.secs). The frequency
at which tick tuples are received.
• top level. The length of a linear topology.
• message size, chunk size. The size of tuples and chunk
of messages sent across PEs respectively.
The Apache Cassandra parameters (cf. Table 1):
• trickle fsync: when enabled it forces OS to flush the
write bu↵er at a regular time.
• auto snapshot: enables the system to take snapshots,
avoiding to lose data during truncation or table drop.
• concurrent reads: number of threads dedicated to the
read operations.
• concurrent writes: number of threads dedicated to the
write operations.
• file cache size in mb: the memory used as reading bu↵ers.
1.2 Dataset
Table 1: Experimental datasets, note that this is the complete
set of datasets that we experimentally collected over the course of
3 month of 6 di↵erent cloud infrastructures. We only have shown
part of this in the paper.
Dataset Parameters Size Testbed
1 wc(6D)
1-spouts: {1,3},
2-max spout: {1,2,10,100,1000,10000},
3-spout wait:{1,2,3,10,100},
4-splitters:{1,2,3,6},
5-counters:{1,3,6,12},
6-netty min wait:{10,100,1000}
2880 C1
2 sol(6D)
1-spouts:{1,3},
2-max spout:{1,10,100,1000,10000},
3-top level:{2,3,4,5},
4-netty min wait:{10,100,1000},
5-message size: {10,100,1e3,1e4,1e5,1e6},
6-bolts: {1,2,3,6}
2866 C2
3 rs(6D)
1-spouts:{1,3},
2-max spout:{10,100,1000,10000},
3-sorters:{1,2,3,6,9,12,15,18},
4-emit freq:{1,10,60,120,300},
5-chunk size:{1e5,1e6,2e6,1e7},
6-message size:{1e3,1e4,1e5}
3840 C3
4 wc(3D)
1-max spout:{1,10,100,1e3, 1e4,1e5,1e6},
2-splitters:{1,2,3,4,5,6},
3-counters:{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18}
756 C4
5 wc+rs
1-max spout:{1,10,100,1e3, 1e4,1e5,1e6},
2-splitters:{1,2,3,6},
3-counters:{1,3,6,9,12,15,18}
196 C4
6 wc+sol
1-max spout:{1,10,100,1e3, 1e4,1e5,1e6},
2-splitters:{1,2,3,6},
3-counters:{1,3,6,9,12,15,18}
196 C4
7 wc+wc
1-max spout:{1,10,100,1e3, 1e4,1e5,1e6},
2-splitters:{1,2,3,6},
3-counters:{1,3,6,9,12,15,18}
196 C4
8 wc(5D)
1-spouts:{1,2,3},
2-splitters:{1,2,3,6},
3-counters:{1,2,3,6,9,12},
4-bu↵er-size:{256k,1m,5m,10m,100m},
5-heap:{“-Xmx512m”, “-Xmx1024m”, “-Xmx2048m”}
1080 C5
9 wc-c1
1-spout wait:{1,2,3,4,5,6,7,8,9,10,100,1e3,1e1,6e4},
2-splitters:{1,2,3,4,5,6},
3-counters:{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18}
1343 C1
10 wc-c3
1-spout wait:{1,2,3,4,5,6,7,8,9,10,100,1e3,1e4,6e4},
2-splitters:{1,2,3,4,5,6},
3-counters:{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18}
1512 C3
11
12
cass-10
cass-20
1-trickle fsync:{false,true},
2-auto snapshot:{false,true},
3-con. reads:{16,24,32,40},
4-con. writes:{16,24,32,40},
5-file cache size in mb:{256,384,512,640},
6-con. compactors:{1,3,5,7}
1024 C6
- Over 4 months (24/7) of
experimental efforts
- on 6 different cloud
platforms with 4 benchmark
systems (3 Storm, 1 NoSQL)
How	does	a	“default”	setting	compare	to	the	worst	and	
optimum	configuration	in	terms	of	performance?	
10©DICE
0 500 1000 1500
Throughput (ops/sec)
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
Averagereadlatency(µs)
×104
TL4CO
BO4CO
BO4CO after
20 iterations TL4CO after
20 iterations
TL4CO after
100 iterations
0 500 1000 1500
Throughput (ops/sec)
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
Averagewritelatency(µs)
TL4CO
BO4CO
Default configuration
Configuration
recommended
by expert
TL4CO after
100 iterations
BO4CO after
100 iterations
Default configuration
Configuration
recommended
by expert
IGHTS
to eval-
ch ques-
fied the
e define
sults re-
research
software
a result
ings?
e paper)
e 7 that
urations
l results
e. The
ings are
ettings.
(%)
by practitioners and NoSQL experts is far from the optimum
setting TL4CO found after only 100 iterations.
Default parameter settings perform poorly. The ex-
pert’s prescription is also far from optimal on indi-
vidual problem instances.
RQ3: How is the performance of a configuration optimiza-
tion on a version when it has been tuned on another version?
If one makes tuning on a specific version of the system
under test, then we would expect it would be optimum for
other versions of the system after it has gone under some
changes. The results for both SPS and NoSQL in Figures
2 and 7 show that the optimum setting will be di↵erent for
di↵erent versions. We also observed that parameter settings
that should work well on average can be particularly ine -
cient on new versions compared to the best configuration for
those versions. In other words, there is a very high variance
in the performance of parameter settings. This has been
also observed by another study in search-based software en-
gineering community [1]. One interesting observations that
has not reported previously is that the performance data
Interested	to	know	more	about	this?
11©DICE
please email us at
p.jamshidi@imperial.ac.uk
Or
g.casale@imperial.ac.uk

Más contenido relacionado

La actualidad más candente

Optimal load balancing in cloud computing
Optimal load balancing in cloud computingOptimal load balancing in cloud computing
Optimal load balancing in cloud computingPriyanka Bhowmick
 
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...AzarulIkhwan
 
A Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud ComputingA Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud Computingijujournal
 
Auto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream ProcessingAuto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream ProcessingZbigniew Jerzak
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 
Adaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream ProcessingAdaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream ProcessingZbigniew Jerzak
 
Qos aware data replication for data-intensive applications in cloud computing...
Qos aware data replication for data-intensive applications in cloud computing...Qos aware data replication for data-intensive applications in cloud computing...
Qos aware data replication for data-intensive applications in cloud computing...Papitha Velumani
 
Load Balancing In Cloud Computing newppt
Load Balancing In Cloud Computing newpptLoad Balancing In Cloud Computing newppt
Load Balancing In Cloud Computing newpptUtshab Saha
 
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...ijgca
 
IRJET- Enhance Dynamic Heterogeneous Shortest Job first (DHSJF): A Task Schedu...
IRJET- Enhance Dynamic Heterogeneous Shortest Job first (DHSJF): A Task Schedu...IRJET- Enhance Dynamic Heterogeneous Shortest Job first (DHSJF): A Task Schedu...
IRJET- Enhance Dynamic Heterogeneous Shortest Job first (DHSJF): A Task Schedu...IRJET Journal
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Zbigniew Jerzak
 
Fault tolerant mechanisms in Big Data
Fault tolerant mechanisms in Big DataFault tolerant mechanisms in Big Data
Fault tolerant mechanisms in Big DataKaran Pardeshi
 
An efficient approach for load balancing using dynamic ab algorithm in cloud ...
An efficient approach for load balancing using dynamic ab algorithm in cloud ...An efficient approach for load balancing using dynamic ab algorithm in cloud ...
An efficient approach for load balancing using dynamic ab algorithm in cloud ...bhavikpooja
 
REVIEW PAPER on Scheduling in Cloud Computing
REVIEW PAPER on Scheduling in Cloud ComputingREVIEW PAPER on Scheduling in Cloud Computing
REVIEW PAPER on Scheduling in Cloud ComputingJaya Gautam
 
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...IRJET Journal
 
Dynamic Cloud Partitioning and Load Balancing in Cloud
Dynamic Cloud Partitioning and Load Balancing in Cloud Dynamic Cloud Partitioning and Load Balancing in Cloud
Dynamic Cloud Partitioning and Load Balancing in Cloud Shyam Hajare
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...IJSRD
 
Error tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemError tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemJPINFOTECH JAYAPRAKASH
 
Improved Max-Min Scheduling Algorithm
Improved Max-Min Scheduling AlgorithmImproved Max-Min Scheduling Algorithm
Improved Max-Min Scheduling Algorithmiosrjce
 

La actualidad más candente (20)

Optimal load balancing in cloud computing
Optimal load balancing in cloud computingOptimal load balancing in cloud computing
Optimal load balancing in cloud computing
 
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...
Task Scheduling using Tabu Search algorithm in Cloud Computing Environment us...
 
A Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud ComputingA Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud Computing
 
Auto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream ProcessingAuto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream Processing
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Adaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream ProcessingAdaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream Processing
 
Qos aware data replication for data-intensive applications in cloud computing...
Qos aware data replication for data-intensive applications in cloud computing...Qos aware data replication for data-intensive applications in cloud computing...
Qos aware data replication for data-intensive applications in cloud computing...
 
Load Balancing In Cloud Computing newppt
Load Balancing In Cloud Computing newpptLoad Balancing In Cloud Computing newppt
Load Balancing In Cloud Computing newppt
 
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...
 
IRJET- Enhance Dynamic Heterogeneous Shortest Job first (DHSJF): A Task Schedu...
IRJET- Enhance Dynamic Heterogeneous Shortest Job first (DHSJF): A Task Schedu...IRJET- Enhance Dynamic Heterogeneous Shortest Job first (DHSJF): A Task Schedu...
IRJET- Enhance Dynamic Heterogeneous Shortest Job first (DHSJF): A Task Schedu...
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...
 
Fault tolerant mechanisms in Big Data
Fault tolerant mechanisms in Big DataFault tolerant mechanisms in Big Data
Fault tolerant mechanisms in Big Data
 
An efficient approach for load balancing using dynamic ab algorithm in cloud ...
An efficient approach for load balancing using dynamic ab algorithm in cloud ...An efficient approach for load balancing using dynamic ab algorithm in cloud ...
An efficient approach for load balancing using dynamic ab algorithm in cloud ...
 
REVIEW PAPER on Scheduling in Cloud Computing
REVIEW PAPER on Scheduling in Cloud ComputingREVIEW PAPER on Scheduling in Cloud Computing
REVIEW PAPER on Scheduling in Cloud Computing
 
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...
Scheduling Algorithm Based Simulator for Resource Allocation Task in Cloud Co...
 
moveMountainIEEE
moveMountainIEEEmoveMountainIEEE
moveMountainIEEE
 
Dynamic Cloud Partitioning and Load Balancing in Cloud
Dynamic Cloud Partitioning and Load Balancing in Cloud Dynamic Cloud Partitioning and Load Balancing in Cloud
Dynamic Cloud Partitioning and Load Balancing in Cloud
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
 
Error tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud systemError tolerant resource allocation and payment minimization for cloud system
Error tolerant resource allocation and payment minimization for cloud system
 
Improved Max-Min Scheduling Algorithm
Improved Max-Min Scheduling AlgorithmImproved Max-Min Scheduling Algorithm
Improved Max-Min Scheduling Algorithm
 

Destacado

Kdd15 - distributed personalization
Kdd15  - distributed personalizationKdd15  - distributed personalization
Kdd15 - distributed personalizationXu Miao
 
Neural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for PhysicistsNeural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for PhysicistsHéloïse Nonne
 
Distributed machine learning
Distributed machine learningDistributed machine learning
Distributed machine learningStanley Wang
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learningArnaud Rachez
 
Challenges in Large Scale Machine Learning
Challenges in Large Scale  Machine LearningChallenges in Large Scale  Machine Learning
Challenges in Large Scale Machine LearningSudarsun Santhiappan
 
UNET: Massive Scale DNN on Spark
UNET: Massive Scale DNN on SparkUNET: Massive Scale DNN on Spark
UNET: Massive Scale DNN on SparkZhan Zhang
 
Using Gradient Descent for Optimization and Learning
Using Gradient Descent for Optimization and LearningUsing Gradient Descent for Optimization and Learning
Using Gradient Descent for Optimization and LearningDr. Volkan OBAN
 
Distributed machine learning examples
Distributed machine learning examplesDistributed machine learning examples
Distributed machine learning examplesStanley Wang
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learningjie cao
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
 
Performance and scalability for machine learning
Performance and scalability for machine learningPerformance and scalability for machine learning
Performance and scalability for machine learningArnaud Rachez
 
Machine learning pt.1: Artificial Neural Networks ® All Rights Reserved
Machine learning pt.1: Artificial Neural Networks ® All Rights ReservedMachine learning pt.1: Artificial Neural Networks ® All Rights Reserved
Machine learning pt.1: Artificial Neural Networks ® All Rights ReservedJonathan Mitchell
 
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...Tuukka Ylä-Anttila
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 

Destacado (14)

Kdd15 - distributed personalization
Kdd15  - distributed personalizationKdd15  - distributed personalization
Kdd15 - distributed personalization
 
Neural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for PhysicistsNeural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for Physicists
 
Distributed machine learning
Distributed machine learningDistributed machine learning
Distributed machine learning
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
 
Challenges in Large Scale Machine Learning
Challenges in Large Scale  Machine LearningChallenges in Large Scale  Machine Learning
Challenges in Large Scale Machine Learning
 
UNET: Massive Scale DNN on Spark
UNET: Massive Scale DNN on SparkUNET: Massive Scale DNN on Spark
UNET: Massive Scale DNN on Spark
 
Using Gradient Descent for Optimization and Learning
Using Gradient Descent for Optimization and LearningUsing Gradient Descent for Optimization and Learning
Using Gradient Descent for Optimization and Learning
 
Distributed machine learning examples
Distributed machine learning examplesDistributed machine learning examples
Distributed machine learning examples
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learning
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
Performance and scalability for machine learning
Performance and scalability for machine learningPerformance and scalability for machine learning
Performance and scalability for machine learning
 
Machine learning pt.1: Artificial Neural Networks ® All Rights Reserved
Machine learning pt.1: Artificial Neural Networks ® All Rights ReservedMachine learning pt.1: Artificial Neural Networks ® All Rights Reserved
Machine learning pt.1: Artificial Neural Networks ® All Rights Reserved
 
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engag...
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 

Similar a Configuration Optimization for Big Data Software

Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Rusif Eyvazli
 
Comparing Write-Ahead Logging and the Memory Bus Using
Comparing Write-Ahead Logging and the Memory Bus UsingComparing Write-Ahead Logging and the Memory Bus Using
Comparing Write-Ahead Logging and the Memory Bus Usingjorgerodriguessimao
 
CH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptxCH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptxHafizSaifullah4
 
MySQL Replication Performance in the Cloud
MySQL Replication Performance in the CloudMySQL Replication Performance in the Cloud
MySQL Replication Performance in the CloudVitor Oliveira
 
Runtime Performance Optimizations for an OpenFOAM Simulation
Runtime Performance Optimizations for an OpenFOAM SimulationRuntime Performance Optimizations for an OpenFOAM Simulation
Runtime Performance Optimizations for an OpenFOAM SimulationFisnik Kraja
 
IEEE Parallel and distributed system 2016 Title and Abstract
IEEE Parallel and distributed system 2016 Title and AbstractIEEE Parallel and distributed system 2016 Title and Abstract
IEEE Parallel and distributed system 2016 Title and Abstracttsysglobalsolutions
 
Everything comes in 3's
Everything comes in 3'sEverything comes in 3's
Everything comes in 3'sdelagoya
 
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxCS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxfaithxdunce63732
 
Database performance management
Database performance managementDatabase performance management
Database performance managementscottaver
 
Get a clearer picture of potential cloud performance by looking beyond SPECra...
Get a clearer picture of potential cloud performance by looking beyond SPECra...Get a clearer picture of potential cloud performance by looking beyond SPECra...
Get a clearer picture of potential cloud performance by looking beyond SPECra...Principled Technologies
 
Conference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environmentConference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environmentEricsson
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goalskamaelian
 
Scalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityScalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityPapitha Velumani
 
Interface for Performance Environment Autoconfiguration Framework
Interface for Performance Environment Autoconfiguration FrameworkInterface for Performance Environment Autoconfiguration Framework
Interface for Performance Environment Autoconfiguration FrameworkLiang Men
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesMurtadha Alsabbagh
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Adaptive algorithm for minimizing clo...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Adaptive algorithm for minimizing clo...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Adaptive algorithm for minimizing clo...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Adaptive algorithm for minimizing clo...IEEEGLOBALSOFTSTUDENTPROJECTS
 
2014 IEEE JAVA CLOUD COMPUTING PROJECT Adaptive algorithm for minimizing clou...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Adaptive algorithm for minimizing clou...2014 IEEE JAVA CLOUD COMPUTING PROJECT Adaptive algorithm for minimizing clou...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Adaptive algorithm for minimizing clou...IEEEFINALSEMSTUDENTPROJECTS
 

Similar a Configuration Optimization for Big Data Software (20)

Model checking
Model checkingModel checking
Model checking
 
Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...
 
Comparing Write-Ahead Logging and the Memory Bus Using
Comparing Write-Ahead Logging and the Memory Bus UsingComparing Write-Ahead Logging and the Memory Bus Using
Comparing Write-Ahead Logging and the Memory Bus Using
 
Super Computer
Super ComputerSuper Computer
Super Computer
 
CH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptxCH02-Computer Organization and Architecture 10e.pptx
CH02-Computer Organization and Architecture 10e.pptx
 
MySQL Replication Performance in the Cloud
MySQL Replication Performance in the CloudMySQL Replication Performance in the Cloud
MySQL Replication Performance in the Cloud
 
Runtime Performance Optimizations for an OpenFOAM Simulation
Runtime Performance Optimizations for an OpenFOAM SimulationRuntime Performance Optimizations for an OpenFOAM Simulation
Runtime Performance Optimizations for an OpenFOAM Simulation
 
Resisting skew accumulation
Resisting skew accumulationResisting skew accumulation
Resisting skew accumulation
 
IEEE Parallel and distributed system 2016 Title and Abstract
IEEE Parallel and distributed system 2016 Title and AbstractIEEE Parallel and distributed system 2016 Title and Abstract
IEEE Parallel and distributed system 2016 Title and Abstract
 
Everything comes in 3's
Everything comes in 3'sEverything comes in 3's
Everything comes in 3's
 
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxCS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
 
Database performance management
Database performance managementDatabase performance management
Database performance management
 
Get a clearer picture of potential cloud performance by looking beyond SPECra...
Get a clearer picture of potential cloud performance by looking beyond SPECra...Get a clearer picture of potential cloud performance by looking beyond SPECra...
Get a clearer picture of potential cloud performance by looking beyond SPECra...
 
Conference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environmentConference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environment
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goals
 
Scalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availabilityScalable analytics for iaas cloud availability
Scalable analytics for iaas cloud availability
 
Interface for Performance Environment Autoconfiguration Framework
Interface for Performance Environment Autoconfiguration FrameworkInterface for Performance Environment Autoconfiguration Framework
Interface for Performance Environment Autoconfiguration Framework
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Adaptive algorithm for minimizing clo...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Adaptive algorithm for minimizing clo...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Adaptive algorithm for minimizing clo...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Adaptive algorithm for minimizing clo...
 
2014 IEEE JAVA CLOUD COMPUTING PROJECT Adaptive algorithm for minimizing clou...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Adaptive algorithm for minimizing clou...2014 IEEE JAVA CLOUD COMPUTING PROJECT Adaptive algorithm for minimizing clou...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Adaptive algorithm for minimizing clou...
 

Más de Pooyan Jamshidi

Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachLearning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachPooyan Jamshidi
 
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...Pooyan Jamshidi
 
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...Pooyan Jamshidi
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Pooyan Jamshidi
 
Transfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning SystemsTransfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning SystemsPooyan Jamshidi
 
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...Pooyan Jamshidi
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOpsPooyan Jamshidi
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsPooyan Jamshidi
 
Transfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable SoftwareTransfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable SoftwarePooyan Jamshidi
 
Architectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based SoftwareArchitectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based SoftwarePooyan Jamshidi
 
Production-Ready Machine Learning for the Software Architect
Production-Ready Machine Learning for the Software ArchitectProduction-Ready Machine Learning for the Software Architect
Production-Ready Machine Learning for the Software ArchitectPooyan Jamshidi
 
Sensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic SoftwareSensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic SoftwarePooyan Jamshidi
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Pooyan Jamshidi
 
Transfer Learning for Improving Model Predictions in Robotic Systems
Transfer Learning for Improving Model Predictions  in Robotic SystemsTransfer Learning for Improving Model Predictions  in Robotic Systems
Transfer Learning for Improving Model Predictions in Robotic SystemsPooyan Jamshidi
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOpsPooyan Jamshidi
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...Pooyan Jamshidi
 
Configuration Optimization Tool
Configuration Optimization ToolConfiguration Optimization Tool
Configuration Optimization ToolPooyan Jamshidi
 
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...Pooyan Jamshidi
 

Más de Pooyan Jamshidi (20)

Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery ApproachLearning LWF Chain Graphs: A Markov Blanket Discovery Approach
Learning LWF Chain Graphs: A Markov Blanket Discovery Approach
 
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Aut...
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
 
Transfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning SystemsTransfer Learning for Performance Analysis of Machine Learning Systems
Transfer Learning for Performance Analysis of Machine Learning Systems
 
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...Transfer Learning for Performance Analysis of Configurable Systems:A Causal ...
Transfer Learning for Performance Analysis of Configurable Systems: A Causal ...
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
 
Learning to Sample
Learning to SampleLearning to Sample
Learning to Sample
 
Integrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of RobotsIntegrated Model Discovery and Self-Adaptation of Robots
Integrated Model Discovery and Self-Adaptation of Robots
 
Transfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable SoftwareTransfer Learning for Performance Analysis of Highly-Configurable Software
Transfer Learning for Performance Analysis of Highly-Configurable Software
 
Architectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based SoftwareArchitectural Tradeoff in Learning-Based Software
Architectural Tradeoff in Learning-Based Software
 
Production-Ready Machine Learning for the Software Architect
Production-Ready Machine Learning for the Software ArchitectProduction-Ready Machine Learning for the Software Architect
Production-Ready Machine Learning for the Software Architect
 
Architecting for Scale
Architecting for ScaleArchitecting for Scale
Architecting for Scale
 
Sensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic SoftwareSensitivity Analysis for Building Adaptive Robotic Software
Sensitivity Analysis for Building Adaptive Robotic Software
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
 
Transfer Learning for Improving Model Predictions in Robotic Systems
Transfer Learning for Improving Model Predictions  in Robotic SystemsTransfer Learning for Improving Model Predictions  in Robotic Systems
Transfer Learning for Improving Model Predictions in Robotic Systems
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
 
Configuration Optimization Tool
Configuration Optimization ToolConfiguration Optimization Tool
Configuration Optimization Tool
 
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
Microservices Architecture Enables DevOps: Migration to a Cloud-Native Archit...
 

Último

Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 

Último (20)

Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 

Configuration Optimization for Big Data Software

  • 2. Configuration Optimization (WP5) 2©DICE • T5.1; Lead: IMP • Contributors: XLAB, IEAT • Experimentally produce an optimal configuration • Parameters not assigned by WP3 optimization
  • 4. Performance Gain 4©DICE 0 1 2 3 4 5 average read latency (µs) ×10 4 0 20 40 60 80 100 120 140 160 observations 1000 1200 1400 1600 1800 2000 average read latency (µs) 0 10 20 30 40 50 60 70 observations 1 1 ×10 4 (a) cass-20 (b) cass-10 question explicitly in the paper. Instead, here we define the questions and provide evidences based on the results re- ported in the paper. Here, we only focus on the research questions that we believe are most important for software engineering community. RQ1: How large is the potential di↵erence and as a result potential impact of a wrong choice of parameter settings? We demonstrated for both SPS in Figure 2 (in the paper) and for NoSQL systems in Figure 6 and also in Figure 7 that the performance di↵erence between di↵erent configurations is significant. The error measures in the experimental results in the paper also demonstrate this large di↵erence. The performance gain between the worst and best settings are measured for each datasets in Table 4. Table 4: Performance gain between best and worst settings. Dataset Best (ms) Worst(ms) Gain (%) 1 wc(6D) 55209 3.3172 99% 2 sol(6D) 40499 1.2000 100% 3 rs(6D) 34733 1.9000 99% 4 wc(3D) 94553 1.2994 100% 5 wc(5D) 405.5 47.387 88% 6 cass-10 1.955 1.206 38% 7 cass-20 40.011 2.045 95% Di↵erent parameter settings cause very large vari- ance in the performance of the system under test. RQ2: How does a “default” setting compare to the worst and optimum configuration in terms of performance? Figure 16 reports the relative performance for the de- fault setting, in terms of throughput and latency with the worst and best configuration as well as the one prescribed by NoSQL experts. As opposed our expectations, a default configuration is worst that majority of other configurations, it performs better than only 10% of other settings for read operations. However, for write operations the default con- figurations performs better than half of the other settings. This was expected because Apache Cassandra is designed to be write e cient. Even the configuration that is prescribed pert’s prescription is also far from optimal on indi- vidual problem instances. RQ3: How is the performance of a configuration optimiza- tion on a version when it has been tuned on another version? If one makes tuning on a specific version of the system under test, then we would expect it would be optimum for other versions of the system after it has gone under some changes. The results for both SPS and NoSQL in Figures 2 and 7 show that the optimum setting will be di↵erent for di↵erent versions. We also observed that parameter settings that should work well on average can be particularly ine - cient on new versions compared to the best configuration for those versions. In other words, there is a very high variance in the performance of parameter settings. This has been also observed by another study in search-based software en- gineering community [1]. One interesting observations that has not reported previously is that the performance data have correlations among di↵erent versions. Tuned parameters can improve upon default values on average, but they are far from optimal on individ- ual problem instances. RQ4: How much can transfer learning improve tuning with a larger number of observations from other versions? The larger the training set is, the more accurate the model predictions will likely be. In the context of configuration tuning, carrying over a larger observation set will result in better tuning, but it would be more expensive to carry out the optimization process as we have shown in Section 4.6. The results in Figure 18(b) and Figure 19(b) show that al- though the models become more accurate slightly, the gain in performance tuning does not pay o↵ the costs. Although transferring more observations improves performance and reduce the entropy, the improve- ment is low. 3. REFERENCES [1] A. Arcuri and G. Fraser. Parameter tuning or default values? an empirical investigation in search-based software engineering. Empirical Software Engineering, 18(3):594–623, 2013. 6 cass-10 1.955 1.206 38% 7 cass-20 40.011 2.045 95% Di↵erent parameter settings cause very large vari- ance in the performance of the system under test. RQ2: How does a “default” setting compare to the worst and optimum configuration in terms of performance? Figure 16 reports the relative performance for the de- fault setting, in terms of throughput and latency with the worst and best configuration as well as the one prescribed by NoSQL experts. As opposed our expectations, a default configuration is worst that majority of other configurations, it performs better than only 10% of other settings for read operations. However, for write operations the default con- figurations performs better than half of the other settings. This was expected because Apache Cassandra is designed to be write e cient. Even the configuration that is prescribed a larger nu The larg predictions tuning, ca better tun the optimi The result though the in perform Althou perform ment is 3. REF [1] A. Arc values? softwar 18(3):5 3
  • 5. Configuration Optimization features • A traditionally time-costly operation considerably reduced in duration • Configurations get more efficient with each build • Support for Apache Storm and Cassandra • Metrics: latency, throughput 5©DICE 0 5 10 15 20 Number of counters 100 120 140 160 180 200 220 240 Latency(ms) splitters=2 splitters=3 number of counters number of splitters latency(ms) 100 150 1 200 250 2 300 Cubic Interpolation Over Finer Grid 243 684 10125 14166 18
  • 6. Overview of our approach 6©DICE -1.5 -1 -0.5 0 0.5 1 1.5 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 Configuration Space Empirical Model 2 4 6 8 10 12 1 2 3 4 5 6 160 140 120 100 80 60 180 Experiment (exhastive) Experiment Experiment 0 20 40 60 80 100 120 140 160 180 200 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 Selection Criteria (b) Sequential Design (a) Design of Experiment
  • 7. BO4CO Architecture 7©DICE Configuration Optimisation Tool (BO4CO) performance repository Monitoring Deployment Service Data Preparation configuration parameters values configuration parameters values Experimental Suite Testbed Doc Data Broker Tester
  • 8. TL4CO (DevOps compatible) 8©DICE Configuration Optimisation Tool (TL4CO) performance repository Monitoring Deployment Service Data Preparation configuration parameters values configuration parameters values Experimental Suite Testbed Doc Data Broker Tester experiment time polling interval previous versions configuration parameters GP model Kafka Vn V1 V2 System Under Test historical data Workload Generator Technology Interface Storm Cassandra Spark 0 4 0.5 1 4 ×104 Latency(µs) 3 1.5 3 2 2 2 1 1 -1 4 0 1 2 4 Latency(µs) ×105 3 3 4 3 5 2 2 1 1 1000 4 1500 2000 2500 4 3000 Latency(µs) 3 3500 4000 3 4500 2 2 1 1 1300 4 1350 1400 4 Latency(µs) 3 1450 1500 3 1550 2 2 1 1 (a) cass-20 v1 (b) cass-20 v2 concurrent_reads concurrent_writes concurrent_reads concurrent_writes concurrent_reads concurrent_writes concurrent_reads concurrent_writes
  • 9. Experiments (Storm+Cassandra) 9©DICE sults that were not included in the main text. 1. FURTHER DETAILS ON SETTINGS 1.1 Configuration Parameters The Apache Storm parameters (cf. Table 1): • max spout (topology.max.spout.pending). The maximum number of tuples that can be pending on a spout. • spout wait (topology.sleep.spout.wait.strategy.time.ms). Time in ms the SleepEmptyEmitStrategy should sleep. • netty min wait (storm.messaging.netty.min wait ms). The min time netty waits to get the control back from OS. • spouts, splitters, counters, bolts. Parallelism level. • heap. The size of the worker heap. • bu↵er size (storm.messaging.netty.bu↵er size). The size of the transfer queue. • emit freq (topology.tick.tuple.freq.secs). The frequency at which tick tuples are received. • top level. The length of a linear topology. • message size, chunk size. The size of tuples and chunk of messages sent across PEs respectively. The Apache Cassandra parameters (cf. Table 1): • trickle fsync: when enabled it forces OS to flush the write bu↵er at a regular time. • auto snapshot: enables the system to take snapshots, avoiding to lose data during truncation or table drop. • concurrent reads: number of threads dedicated to the read operations. • concurrent writes: number of threads dedicated to the write operations. • file cache size in mb: the memory used as reading bu↵ers. 1.2 Dataset Table 1: Experimental datasets, note that this is the complete set of datasets that we experimentally collected over the course of 3 month of 6 di↵erent cloud infrastructures. We only have shown part of this in the paper. Dataset Parameters Size Testbed 1 wc(6D) 1-spouts: {1,3}, 2-max spout: {1,2,10,100,1000,10000}, 3-spout wait:{1,2,3,10,100}, 4-splitters:{1,2,3,6}, 5-counters:{1,3,6,12}, 6-netty min wait:{10,100,1000} 2880 C1 2 sol(6D) 1-spouts:{1,3}, 2-max spout:{1,10,100,1000,10000}, 3-top level:{2,3,4,5}, 4-netty min wait:{10,100,1000}, 5-message size: {10,100,1e3,1e4,1e5,1e6}, 6-bolts: {1,2,3,6} 2866 C2 3 rs(6D) 1-spouts:{1,3}, 2-max spout:{10,100,1000,10000}, 3-sorters:{1,2,3,6,9,12,15,18}, 4-emit freq:{1,10,60,120,300}, 5-chunk size:{1e5,1e6,2e6,1e7}, 6-message size:{1e3,1e4,1e5} 3840 C3 4 wc(3D) 1-max spout:{1,10,100,1e3, 1e4,1e5,1e6}, 2-splitters:{1,2,3,4,5,6}, 3-counters:{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18} 756 C4 5 wc+rs 1-max spout:{1,10,100,1e3, 1e4,1e5,1e6}, 2-splitters:{1,2,3,6}, 3-counters:{1,3,6,9,12,15,18} 196 C4 6 wc+sol 1-max spout:{1,10,100,1e3, 1e4,1e5,1e6}, 2-splitters:{1,2,3,6}, 3-counters:{1,3,6,9,12,15,18} 196 C4 7 wc+wc 1-max spout:{1,10,100,1e3, 1e4,1e5,1e6}, 2-splitters:{1,2,3,6}, 3-counters:{1,3,6,9,12,15,18} 196 C4 8 wc(5D) 1-spouts:{1,2,3}, 2-splitters:{1,2,3,6}, 3-counters:{1,2,3,6,9,12}, 4-bu↵er-size:{256k,1m,5m,10m,100m}, 5-heap:{“-Xmx512m”, “-Xmx1024m”, “-Xmx2048m”} 1080 C5 9 wc-c1 1-spout wait:{1,2,3,4,5,6,7,8,9,10,100,1e3,1e1,6e4}, 2-splitters:{1,2,3,4,5,6}, 3-counters:{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18} 1343 C1 10 wc-c3 1-spout wait:{1,2,3,4,5,6,7,8,9,10,100,1e3,1e4,6e4}, 2-splitters:{1,2,3,4,5,6}, 3-counters:{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18} 1512 C3 11 12 cass-10 cass-20 1-trickle fsync:{false,true}, 2-auto snapshot:{false,true}, 3-con. reads:{16,24,32,40}, 4-con. writes:{16,24,32,40}, 5-file cache size in mb:{256,384,512,640}, 6-con. compactors:{1,3,5,7} 1024 C6 - Over 4 months (24/7) of experimental efforts - on 6 different cloud platforms with 4 benchmark systems (3 Storm, 1 NoSQL)
  • 10. How does a “default” setting compare to the worst and optimum configuration in terms of performance? 10©DICE 0 500 1000 1500 Throughput (ops/sec) 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 Averagereadlatency(µs) ×104 TL4CO BO4CO BO4CO after 20 iterations TL4CO after 20 iterations TL4CO after 100 iterations 0 500 1000 1500 Throughput (ops/sec) 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Averagewritelatency(µs) TL4CO BO4CO Default configuration Configuration recommended by expert TL4CO after 100 iterations BO4CO after 100 iterations Default configuration Configuration recommended by expert IGHTS to eval- ch ques- fied the e define sults re- research software a result ings? e paper) e 7 that urations l results e. The ings are ettings. (%) by practitioners and NoSQL experts is far from the optimum setting TL4CO found after only 100 iterations. Default parameter settings perform poorly. The ex- pert’s prescription is also far from optimal on indi- vidual problem instances. RQ3: How is the performance of a configuration optimiza- tion on a version when it has been tuned on another version? If one makes tuning on a specific version of the system under test, then we would expect it would be optimum for other versions of the system after it has gone under some changes. The results for both SPS and NoSQL in Figures 2 and 7 show that the optimum setting will be di↵erent for di↵erent versions. We also observed that parameter settings that should work well on average can be particularly ine - cient on new versions compared to the best configuration for those versions. In other words, there is a very high variance in the performance of parameter settings. This has been also observed by another study in search-based software en- gineering community [1]. One interesting observations that has not reported previously is that the performance data
  • 11. Interested to know more about this? 11©DICE please email us at p.jamshidi@imperial.ac.uk Or g.casale@imperial.ac.uk