Transfer Learning for Performance Analysis of Configurable Systems: A Causal Analysis
Mohammad Ali Javidian, Pooyan Jamshidi, Marco Valtorta
Many systems are now configurable
2
Empirical observations confirm that systems
are becoming increasingly configurable
3
Modern systems:
• Increasingly configurable with software evolution
• Deployed in dynamic and uncertain environments
[Tianyin Xu, et al., “Too Many Knobs…”, FSE’15]
[Excerpt from the abstract and Figure 1 of Xu et al.: configuring software with so many parameters ("knobs") is a daunting, error-prone task; Figure 1 shows the number of configuration parameters increasing with software evolution, for Apache (releases 1.3.14 through 2.3.4, 1998-2014) and for Storage-A, a commercial storage system (2006-2014); axes show number of parameters vs. release time.]
5
Influence of options is typically significant
[Plot: latency (ms) as a function of the number of counters and the number of splitters in Apache Storm.]
By tweaking only 2 options out of 200 in Apache Storm, we observed a ~100% change in latency.
7
How does transfer learning come into play?
[Diagram: a model is learned from data in the source (given); transferable knowledge is extracted and reused to learn a model in the target.]
• An ML approach uses the knowledge learned on the source…
• …to learn a cheaper model for the target (sketched below)
[Pooyan Jamshidi, et al., “Transfer Learning for
Performance Analysis…”, ASE’17]
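As a concrete illustration of this idea (not the authors' method; the data, the model choice, and the function names below are invented): many cheap measurements train a source model, and only a handful of target measurements fit a correction on top of it, giving a cheaper target model.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_options = 10

# Hypothetical measurements: x = binary configuration options, y = latency.
# Source environment: many cheap measurements.
X_src = rng.integers(0, 2, size=(200, n_options))
y_src = 50 + 30 * X_src[:, 0] + 10 * X_src[:, 1] + rng.normal(0, 2, 200)

# Target environment: only a few expensive measurements; assume the target
# responds similarly but is shifted/scaled (a linear "model shift").
X_tgt = rng.integers(0, 2, size=(15, n_options))
y_tgt = 1.4 * (50 + 30 * X_tgt[:, 0] + 10 * X_tgt[:, 1]) + 20 + rng.normal(0, 2, 15)

# 1) Learn a performance model in the source environment.
src_model = LinearRegression().fit(X_src, y_src)

# 2) Transfer: use the source model's prediction as a single feature and learn
#    only the source-to-target correction from the few target samples.
correction = LinearRegression().fit(src_model.predict(X_tgt).reshape(-1, 1), y_tgt)

def predict_target(X):
    return correction.predict(src_model.predict(X).reshape(-1, 1))

X_new = rng.integers(0, 2, size=(5, n_options))
print(predict_target(X_new))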
8
(Javidian, Jamshidi, Valtorta. AAAI Spring Symposium 2019, Stanford, CA.)
[Diagram: a causal model is transferred from the source to the target.]
II. INTUITION

Understanding the performance behavior of configurable software systems can enable (i) performance debugging, (ii) performance tuning, (iii) design-time evolution, or (iv) runtime adaptation. We lack empirical understanding of how the performance behavior of a system will vary when the environment of the system changes. Such empirical understanding will provide important insights to develop faster and more accurate learning techniques that allow us to make predictions and optimizations of performance for highly configurable systems in changing environments [10]. For instance, we can learn the performance behavior of a system on cheap hardware in a controlled lab environment and use that to understand the performance behavior of the system on a production server before shipping to the end user. More specifically, we would like to know what the relationship is between the performance of a system in a specific environment (characterized by software configuration, hardware, workload, and system version) and its performance when we vary its environmental conditions.

In this research, we aim for an empirical understanding of performance behavior to improve learning via an informed sampling process. In other words, we aim at learning a performance model in a changed environment based on a well-suited sampling set that has been determined by the knowledge we gained in other environments. Therefore, the main research question is whether there exists common information (transferable knowledge) that applies to both source and target environments of systems and therefore can be carried over from one environment to the other.
A. Preliminary concepts

In this section, we provide formal definitions of four concepts that we use throughout this study. The formal notations enable us to concisely convey concepts throughout the paper.

1) Configuration and environment space: Let F_i indicate the i-th feature of a configurable system A, which is either enabled or disabled, and one of them holds by default. The configuration space is mathematically a Cartesian product of all the features, C = Dom(F_1) × · · · × Dom(F_d), where Dom(F_i) = {0, 1}. A configuration of a system is then a member of the configuration space (feature space) where all the parameters are assigned a specific value in their range (i.e., complete instantiations of the system's parameters). We also describe an environment instance by 3 variables e = [w, h, v] drawn from a given environment space E = W × H × V, where W, H, and V respectively represent sets of possible values for workload, hardware, and system version.

2) Performance model: Given a software system A with configuration space F and environmental instances E, a performance model is a black-box function f : F × E → R given some observations of the system performance for each combination of the system's features x ∈ F in an environment e ∈ E. To construct a performance model for a system A with configuration space F, we run A in environment instance e ∈ E on various combinations of configurations x_i ∈ F, and record the resulting performance values y_i = f(x_i) + ε_i, x_i ∈ F, where ε_i ∼ N(0, σ_i). The training data for our regression models is then simply D_tr = {(x_i, y_i)}_{i=1}^n. In other words, a response function is simply a mapping from the input space to a measurable performance metric that produces interval-scaled data (here we assume it produces real numbers).

3) Performance distribution: For the performance model, we measured and associated the performance response to each configuration; now let us introduce another concept where we vary the environment and measure the performance. An empirical performance distribution is a stochastic process, pd : E → Δ(R), that defines a probability distribution over performance measures for each environmental condition. To construct a performance distribution for a system A with configuration space F, similarly to the process of deriving the performance models, we run A on various combinations of configurations x_i ∈ F for a specific environment instance e ∈ E and record the resulting performance values y_i. We then fit a probability distribution to the set of measured performance values.
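A minimal Python sketch of these definitions under stated assumptions: a hypothetical 3-option system, an invented response function standing in for real measurements, an off-the-shelf regressor as the black-box performance model, and a kernel density estimate as the empirical performance distribution for one environment.

import itertools
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from scipy import stats

rng = np.random.default_rng(1)

# Configuration space C = Dom(F1) x Dom(F2) x Dom(F3), with Dom(Fi) = {0, 1}.
configs = np.array(list(itertools.product([0, 1], repeat=3)))

def measure(x):
    # Hypothetical response: latency in ms in one environment e = [w, h, v].
    return 100 + 40 * x[0] + 25 * x[0] * x[2] - 10 * x[1] + rng.normal(0, 3)

# Training data D_tr = {(x_i, y_i)} for a black-box performance model f.
y = np.array([measure(x) for x in configs])
perf_model = RandomForestRegressor(random_state=0).fit(configs, y)
print(perf_model.predict([[1, 0, 1]]))   # predicted latency for one configuration

# Empirical performance distribution for this environment: fit a distribution
# to the set of measured performance values.
pd_e = stats.gaussian_kde(y)
print(pd_e.integrate_box_1d(100, 150))   # Pr(100 ms <= latency <= 150 ms)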
[Diagram: a causal structure over configuration options O1–O5, performance P, and S is learned from observational data in the source, extracted as transferable knowledge, and reused in the target to answer interventional queries pr(P | do(Ci)) = ? together with interventional data.]
[Diagram: combining observational and interventional data to estimate the causal effect of configuration options on performance.]
How do causal inference tools come into play?
Research Questions
Is it possible to identify causal relations from observational
data and how generalizable are they in highly-configurable
systems?
• RQ1 (Identifiability): Is it possible to estimate causal
effects of configuration options on performance from
observational studies alone?
• RQ2 (Transportability): Is the causal effect of influential
configuration options on performance transportable
across environments?
• RQ3 (Recoverability): Is it possible to recover
conditional probabilities from selection-biased data to the
entire population?
9
RQ1 (Identifiability): Is it possible to
estimate causal effects of configuration
options on performance from observational
studies alone?
12
P(encoding-time | do(visualize = 1)) = P(encoding-time | visualize = 1),
with mean 0.37 and variance 0.14.
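The sketch below (synthetic data and hypothetical variable names, not the authors' code) shows how such a do-query is estimated from observational data: with an observed confounder Z, backdoor adjustment is used; when no confounder is present, as in the equality above, the interventional quantity reduces to the plain conditional.

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical observational data: Z is another option that influences both
# whether 'visualize' is enabled and the encoding time (a confounder).
z = rng.integers(0, 2, n)
visualize = (rng.random(n) < 0.3 + 0.4 * z).astype(int)
encoding_time = 1.0 + 0.5 * visualize + 0.8 * z + rng.normal(0, 0.1, n)
df = pd.DataFrame({"z": z, "visualize": visualize, "encoding_time": encoding_time})

# Naive conditional E[perf | visualize = 1]: biased by the confounder Z.
naive = df.loc[df.visualize == 1, "encoding_time"].mean()

# Backdoor adjustment:
# E[perf | do(visualize = 1)] = sum_z E[perf | visualize = 1, z] * P(z).
adjusted = sum(
    df.loc[(df.visualize == 1) & (df.z == zv), "encoding_time"].mean()
    * (df.z == zv).mean()
    for zv in (0, 1)
)
print(f"naive:    {naive:.3f}")
print(f"adjusted: {adjusted:.3f}")   # close to 1.0 + 0.5 + 0.8*E[Z] = 1.9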
RQ1: Results and Implications
Results:
§ Small number of influential configuration options
§ P(perf|do(O_i=o')) is estimable in environments with a
single performance measurement
§ P(perf|do(O_i=o')) is estimable in environments with
multiple performance measurements
Implications:
§ Leading to effective exploration strategies
13
15
RQ2 (Transportability): Is the causal effect
of influential configuration options on
performance transportable across
environments?
16
RQ2: Results and Implications
Results:
§ Trivial transportability: [causal diagram with two parents pointing into perf]
§ Small environmental changes lead to transportability of
causal relations
§ With severe environmental changes, transportability of
some causal relations is still possible
Implications:
§ Running new costly experiments in the target environment can be avoided (see the sketch below)
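A hedged sketch of the transport formula behind these results (synthetic data and invented variable names): if source and target differ only in the distribution of a covariate Z, the target effect is obtained by reweighting Z-specific source effects, P*(perf | do(O)) = Σ_z P(perf | do(O), z) P*(z), so no new experiments are needed in the target.

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

def simulate(n, p_z):
    # Hypothetical environment: Z (e.g., a workload indicator) modifies the
    # effect of option O on performance; environments differ only in P(Z).
    z = (rng.random(n) < p_z).astype(int)
    o = rng.integers(0, 2, n)        # O randomized, i.e., interventional data
    perf = 100 + (20 + 30 * z) * o + rng.normal(0, 1, n)
    return pd.DataFrame({"z": z, "o": o, "perf": perf})

src = simulate(50_000, p_z=0.2)      # source: experiments are cheap here
tgt = simulate(50_000, p_z=0.7)      # target: pretend we only observe P*(z)

# Z-specific causal effects of O on perf, estimated in the source.
effect_given_z = {
    zv: src.loc[(src.z == zv) & (src.o == 1), "perf"].mean()
        - src.loc[(src.z == zv) & (src.o == 0), "perf"].mean()
    for zv in (0, 1)
}

# Transport formula: reweight source effects by the target's P*(z).
p_z_target = tgt["z"].mean()
transported = effect_given_z[0] * (1 - p_z_target) + effect_given_z[1] * p_z_target
print(f"transported effect: {transported:.2f}")   # true target effect is 20 + 30*0.7 = 41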
18
RQ3 (Recoverability): Is it possible to
recover conditional probabilities from
selection-biased data to the entire population?
19
RQ3: Results and Implications
Results:
§ Recoverability without external data is possible
§ Small sample size may lead to unrecoverable selection bias
Implications:
§ Cost-efficient sampling for performance prediction of
configurable systems
§ Avoiding biased estimates of causal/statistical effects (see the sketch below)
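A hedged sketch of recoverability from selection-biased data (synthetic data, invented names): if selection S depends on performance only through the option O, then P(perf | O, S = 1) = P(perf | O), so the conditional is recoverable from the biased sample, while the unconditional mean still needs external information about P(O); with few samples the stratum estimates become unreliable, which is the small-sample caveat above.

import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 100_000

o = rng.integers(0, 2, n)                        # configuration option
perf = 10 + 5 * o + rng.normal(0, 1, n)          # performance
s = (rng.random(n) < np.where(o == 1, 0.9, 0.1)).astype(int)  # selection depends on O only

df = pd.DataFrame({"o": o, "perf": perf, "s": s})
biased = df[df.s == 1]

# Population quantity vs. its estimate from the selection-biased sample:
for ov in (0, 1):
    full = df.loc[df.o == ov, "perf"].mean()
    recov = biased.loc[biased.o == ov, "perf"].mean()
    print(f"E[perf | O={ov}]  population: {full:.2f}   from biased data: {recov:.2f}")

# The unconditional mean is NOT recoverable without external data on P(O):
print("biased E[perf]:", biased.perf.mean(), " vs population:", df.perf.mean())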
20
Possibility of identifiability and transportability