1. A Methodology to Predict the Performance of Distributed Simulation Systems
Daniele Gianni (1,2), Giuseppe Iazeolla (2), and Andrea D'Ambrogio (2)
(1) European Space Agency, daniele.gianni@esa.int
(2) Dept. of Computer Science, University of Rome Tor Vergata, {iazeolla, dambro}@info.uniroma2.it
24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation (PADS 2010),
May 17 – 19, 2010, Atlanta, GA, US
2. Methodology Objective
A local simulator is available
We wish to turn it into a distributed simulator
Before implementing it (i.e., at design time) we wonder: will its execution time be shorter than the local simulator's execution time?
(In some cases the distributed version may run slower than the equivalent local version)
Methodology objective: predict (at design time) the distributed simulator execution time
If the prediction meets the performance requirements, then implement it
3. Presentation Overview
Terminology
Problem Statement
Methodology Presentation
• Modelling Assumptions / Execution Graph /
Performance Model / Model Parameterization
Case Study
• Simulated System / OMNET++ Simulator /
Model Prediction / Model Validation
4. Terminology
Σ = Simulated System
SL(Σ) = Local Simulator (LS)
SD(Σ) = Distributed Simulator (DS)
PM(SD(Σ)) = Performance Model to predict SD(Σ) execution time
5. Problem Statement
We want to simulate Σ
Σ presents intrinsic parallelism
Will SD(Σ) be faster than SL(Σ)?
6. Predictive Evaluation Process
1. Build SL(Σ) and measure its execution time
2. Is the SL(Σ) execution time OK? If YES, stop
3. If NO, identify a tentative partitioning
4. Build and evaluate the prediction model PM(SD(Σ))
5. Is the predicted SD(Σ) execution time OK? If YES, build SD(Σ)
6. If NO, is an alternative partitioning available? If YES, return to step 4 with it; if NO, stop
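The decision flow above can be sketched as a small loop. This is an illustrative sketch only: the function and parameter names (`measure_local`, `predict_distributed`, `target_time`) are assumptions, not tooling from the methodology itself.

```python
# Sketch of the predictive evaluation loop (names are illustrative).

def predictive_evaluation(measure_local, predict_distributed,
                          partitionings, target_time):
    """Decide whether to keep SL(), build SD() with a partitioning, or stop."""
    # Steps 1-2: build SL(Sigma) and check its measured execution time.
    if measure_local() <= target_time:
        return ("keep-local", None)
    # Steps 3-6: evaluate PM(SD(Sigma)) for each candidate partitioning.
    for p in partitionings:
        if predict_distributed(p) <= target_time:
            return ("build-distributed", p)
    return ("stop", None)  # no partitioning meets the requirement

decision = predictive_evaluation(
    measure_local=lambda: 30.0,                       # SL() takes 30 s
    predict_distributed=lambda p: {"P1": 25.0, "P2": 8.0}[p],
    partitionings=["P1", "P2"],
    target_time=10.0)
print(decision)  # ('build-distributed', 'P2')
```

Note that SD(Σ) is only built once a partitioning whose *predicted* time meets the requirement has been found, which is the whole point of the methodology.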
7. Modelling Assumptions
Communication between federates is based on a decentralized RTI
Federation time management is conservative
Hosts' CPUs and communication networks are the only machinery devices affecting the federation execution
We consider SD() consisting of 2 federates (however, the model can be immediately generalized to n federates)
8. Model Building
Identification of machinery devices and their interconnections
Derivation of the Execution Graph (an extended flow chart whose blocks are associated with device time requests)
Derivation of the Performance Model (Extended Queueing Network)
Model parameterization based on data from the RTI implementation and from the local simulator
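The Execution Graph blocks mentioned above each carry time requests against devices (CPUs, networks). A minimal sketch of such a structure follows; the class, field names, and demand values are illustrative assumptions, not the paper's notation.

```python
from dataclasses import dataclass, field

@dataclass
class EGBlock:
    """A block of an Execution Graph: an activity with per-device time requests."""
    name: str
    # Device time requests in seconds, e.g. {"CPU1": 0.010, "Net12": 0.002}
    demands: dict = field(default_factory=dict)

def total_demand(blocks, device):
    """Aggregate the time requested from one device along a sequential path."""
    return sum(b.demands.get(device, 0.0) for b in blocks)

# A tiny sequential path: local event processing followed by an RTI call
# (demand values are made up for illustration).
path = [
    EGBlock("LEX",  {"CPU1": 0.010}),
    EGBlock("HLAR", {"CPU1": 0.003, "Net12": 0.002}),
]
print(total_demand(path, "CPU1"))  # ~0.013 s of CPU1 demand
```

Aggregated demands of this kind are what feed the service times of the Extended Queueing Network in the next step.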
9. Execution Graph
[Figure: Execution Graph of the two-federate SD(). After a START/Fork, each federate performs Local Initialization Execution (LEXI), Start RTI Interface, and HLA Initialization, then loops through Local Execution (LEX), an exit test, HLA RTI Service Execution (HLAR) guarded by Semaphore MT1/MT2, and HLA Execution Wait/Exit blocks (HLAF-Wait, HLAF-Ex), exchanging synchronization messages with the other federate (from Federate 2's SM4, and vice versa) until a Join and END. Branches are labelled with routing probabilities (p1RTI, p1QUIT, p1SYNC, p1NSYNC, and their p2 counterparts); LS segments and HLA segments are distinguished.]
10. Federate 1 Execution Graph (B1)
[Figure: detail of Federate 1's block B1: Local Execution (LEX), an exit test, HLA RTI Service Execution (HLAR), and Semaphore MT1, with routing probabilities p1RTI, p1QUIT, p1SYNC, and p1NSYNC; LS segments and HLA segments are distinguished.]
11. Federate 1 Execution Graph (B2)
[Figure: detail of Federate 1's block B2: HLA Execution Wait (HLAF-Wait), HLA Execution Exit (HLAF-Ex), Release Semaphore MT1, and a release test, with probabilities p1SYNC and 1 – p1SYNC and an input from Federate 2's SM4; LS segments and HLA segments are distinguished.]
19. Model Parameterization
Procedure to determine the model parameters from SL() and the RTI (as an instance of simulation infrastructure)
From the EG, standard performance engineering procedures can derive, for each class of jobs:
• Service times at each center (i.e., tCPU1, tCPU2, tNet12, and tNet21)
• Routing probabilities (i.e., p1QUIT, p2QUIT, p1SYNC, and p2SYNC)
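The routing probabilities can be estimated as observed frequencies from counters logged by the local simulator. The sketch below assumes such counters exist; the counter names and values are illustrative, not the paper's actual procedure.

```python
# Sketch: deriving EG routing probabilities from event counts logged by SL().
# All counter names and sample values are illustrative assumptions.

def routing_probabilities(n_events, n_rti_calls, n_sync_waits, n_quit):
    """Estimate routing probabilities from observed frequencies."""
    p_rti = n_rti_calls / n_events               # branch into the HLAR block
    p_quit = n_quit / n_events                   # branch that ends the run
    p_sync = n_sync_waits / max(n_rti_calls, 1)  # RTI calls that block
    return {"pRTI": p_rti, "pQUIT": p_quit, "pSYNC": p_sync}

params = routing_probabilities(n_events=10_000, n_rti_calls=2_500,
                               n_sync_waits=500, n_quit=1)
print(params["pRTI"])  # 0.25
```

Service times (tCPU1, tCPU2, tNet12, tNet21) would analogously come from measured per-event CPU times and from the network figures of the target deployment.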
20. Case Study
Simulated System (LS)
21. Case Study
DS partitioning
22. Case Study: Hardware and SD() Configuration
Hardware
• Host 1 and Host 2 connected by a MAN
• RTT = 20.5 ms
• Bandwidth = 94 KB/s
SD() consisting of 2 federates
• Federate 1: simulates S1 and S2 and runs on Host 1
• Federate 2: simulates S3 and S4 and runs on Host 2
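With these figures, a one-way network service time can be estimated as half the RTT plus the transmission time. This is a back-of-envelope sketch: the 1 KB message size is an assumed value, not from the slides.

```python
# Estimate one-way network time over the MAN (RTT and bandwidth from the
# slides; the message size is an assumption for illustration).
RTT_S = 20.5e-3          # round-trip time: 20.5 ms
BANDWIDTH_BPS = 94_000   # 94 KB/s (decimal kilobytes assumed)
MSG_BYTES = 1_000        # assumed HLA message size

t_net = RTT_S / 2 + MSG_BYTES / BANDWIDTH_BPS
print(f"{t_net * 1e3:.1f} ms")  # ~20.9 ms per message
```

A per-message cost in this range is what makes the HLA overhead dominant when per-event processing times are small, as the two scenarios below explore.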
23. Case Study: Problem Domain
We consider two scenarios (A and B)
Scenario A: light processing times to simulate S1 through S4, specifically DCPU = 10 ms wall-clock time for each processing event
Scenario B: heavy processing times to simulate S1 through S4, specifically DCPU = 500 ms wall-clock time for each processing event
24. Case Study: Problem Statement
Is SD() faster than SL() in Scenario A?
Is SD() faster than SL() in Scenario B?
25. Case Study: Design Question
Scenario A: DCPU = 10 ms (light processing)
Scenario B: DCPU = 500 ms (heavy processing)
In Scenario B, will the parallelization speed-up compensate for the HLA overhead? I.e., is the predicted SD() execution time shorter than the SL() time?
Can we give a YES answer in the Predictive Evaluation Process, and then proceed to implement SD()?
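The intuition behind the design question can be checked with a back-of-envelope comparison. This is a sketch under stated assumptions: the event count, synchronization count, and per-message overhead are illustrative values, not the paper's measurements, and ideal two-way parallelism is assumed.

```python
# Rough comparison of local vs. 2-federate distributed execution time:
#   local       ~ n_events * DCPU
#   distributed ~ (n_events / 2) * DCPU + n_sync * overhead
# (dcpu_ms and overhead_ms in milliseconds; results in seconds)

def compare(dcpu_ms, n_events=1_000, n_sync=1_000, overhead_ms=21):
    t_local = n_events * dcpu_ms / 1000
    t_distr = (n_events // 2) * dcpu_ms / 1000 + n_sync * overhead_ms / 1000
    return t_local, t_distr

# Scenario A (DCPU = 10 ms): the HLA overhead dominates, distribution loses.
print(compare(10))   # (10.0, 26.0)
# Scenario B (DCPU = 500 ms): the parallel speed-up wins.
print(compare(500))  # (500.0, 271.0)
```

The methodology's prediction model refines exactly this kind of estimate by accounting for queueing, blocking, and routing probabilities rather than assuming ideal parallelism.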
26. OMNeT++ Performance Model Implementation
28. Model Validation
The execution time prediction has been validated by:
• implementing the actual DS system, and
• measuring and comparing the actual execution time with the predicted execution time
29. Model Validation
Scenario   SD() predicted execution time (sec)   SD() actual execution time (sec)
A          17                                    20
B          846                                   880
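From the table above, the relative prediction error can be computed as |predicted − actual| / actual:

```python
# Relative prediction error for the two scenarios (values from the table).
results = {"A": (17, 20), "B": (846, 880)}  # (predicted, actual) in seconds
for scenario, (pred, actual) in results.items():
    err = abs(pred - actual) / actual
    print(f"Scenario {scenario}: {err:.1%}")
# Scenario A: 15.0%
# Scenario B: 3.9%
```

The prediction is thus within a few percent in the heavy-processing scenario, where the absolute times (and the design risk) are largest.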
30. Conclusions
We have introduced a predictive methodology that, given
• a simulation model,
• a distributed simulation infrastructure,
• a model partitioning, and
• a set of distributed hosts,
enables simulator designers to predict the distributed simulator execution time
31. Conclusions
We have shown an example application to the evaluation of the distribution of a local simulator SL() over a metropolitan area network, in the cases of:
• light event processing times (Scenario A)
• heavy event processing times (Scenario B)
32. Conclusions
We have validated the execution time prediction by:
• implementing the actual DS simulator
• measuring and comparing the actual execution time with the predicted execution time
33. Acknowledgements
FIRB Projects "Performance Evaluation of Complex Systems" and "Software frameworks and technologies for the development of open-source distributed simulation code" (Italian Ministry of Research and CERTIA Research Center of the University of Rome Tor Vergata); FP7 euHeart Project (European Commission); Research Fellowship of the European Space Agency.
Ylenia Cannone and Luca Marcheggiani for their implementation and validation of the OMNeT++ simulator.