1. A Methodology to Predict the Performance of Distributed Simulation Systems
Daniele Gianni (1,2), Giuseppe Iazeolla (2), and Andrea D'Ambrogio (2)
(1) European Space Agency, daniele.gianni@esa.int
(2) Dept. of Computer Science, University of Rome Tor Vergata, {iazeolla, dambro}@info.uniroma2.it
24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation (PADS 2010),
May 17 – 19, 2010, Atlanta, GA, US
2. Methodology Objective
A local simulator is available
We wish to turn it into a distributed simulator
Before implementing it (i.e., at design time) we wonder: will its execution time be shorter than the local simulator's execution time?
(In some cases the distributed version may run slower than the equivalent local version)
Methodology objective: predict (at design time) the distributed simulator execution time
If the prediction meets the performance requirements, then implement it
3. Presentation Overview
Terminology
Problem Statement
Methodology Presentation
• Modelling Assumptions / Execution Graph /
Performance Model / Model Parameterization
Case Study
• Simulated System / OMNET++ Simulator /
Model Prediction / Model Validation
4. Terminology
Σ = Simulated System
SL(Σ) = Local Simulator (LS)
SD(Σ) = Distributed Simulator (DS)
PM(SD(Σ)) = Performance Model to predict SD(Σ) execution time
5. Problem Statement
We want to simulate Σ
Σ presents intrinsic parallelism
Will SD(Σ) be faster than SL(Σ)?
6. Predictive Evaluation Process
1. Build SL(Σ) and measure its execution time
2. Is the SL(Σ) execution time OK? If YES, stop
3. If NO, identify a tentative partitioning
4. Build and evaluate the prediction model PM(SD(Σ))
5. Is the predicted SD(Σ) execution time OK? If YES, build SD(Σ)
6. If NO, is an alternative partitioning available? If YES, return to step 4 with it; if NO, stop
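The decision flow above can be sketched as a small loop. This is an illustrative sketch only: the function and parameter names (`measure_local`, `predict_distributed`, `target_time`) are assumptions, not tooling from the methodology itself.

```python
# Sketch of the predictive evaluation loop (names are illustrative).

def predictive_evaluation(measure_local, predict_distributed,
                          partitionings, target_time):
    """Decide whether to keep SL(), build SD() with a partitioning, or stop."""
    # Steps 1-2: build SL(Sigma) and check its measured execution time.
    if measure_local() <= target_time:
        return ("keep-local", None)
    # Steps 3-6: evaluate PM(SD(Sigma)) for each candidate partitioning.
    for p in partitionings:
        if predict_distributed(p) <= target_time:
            return ("build-distributed", p)
    return ("stop", None)  # no partitioning meets the requirement

decision = predictive_evaluation(
    measure_local=lambda: 30.0,                       # SL() takes 30 s
    predict_distributed=lambda p: {"P1": 25.0, "P2": 8.0}[p],
    partitionings=["P1", "P2"],
    target_time=10.0)
print(decision)  # ('build-distributed', 'P2')
```

Note that SD(Σ) is only built once a partitioning whose *predicted* time meets the requirement has been found, which is the whole point of the methodology.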
7. Modelling Assumptions
Communication between federates is based on a decentralized RTI
Federation time management is conservative
Hosts' CPUs and communication networks are the only machinery devices affecting the federation execution
We consider SD() consisting of 2 federates (however, the model can be immediately generalized to n federates)
8. Model Building
Identification of machinery devices and their interconnections
Derivation of the Execution Graph (an extended flow chart whose blocks are associated with device time requests)
Derivation of the Performance Model (Extended Queueing Network)
Model parameterization based on data from the RTI implementation and from the local simulator
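The Execution Graph blocks mentioned above each carry time requests against devices (CPUs, networks). A minimal sketch of such a structure follows; the class, field names, and demand values are illustrative assumptions, not the paper's notation.

```python
from dataclasses import dataclass, field

@dataclass
class EGBlock:
    """A block of an Execution Graph: an activity with per-device time requests."""
    name: str
    # Device time requests in seconds, e.g. {"CPU1": 0.010, "Net12": 0.002}
    demands: dict = field(default_factory=dict)

def total_demand(blocks, device):
    """Aggregate the time requested from one device along a sequential path."""
    return sum(b.demands.get(device, 0.0) for b in blocks)

# A tiny sequential path: local event processing followed by an RTI call
# (demand values are made up for illustration).
path = [
    EGBlock("LEX",  {"CPU1": 0.010}),
    EGBlock("HLAR", {"CPU1": 0.003, "Net12": 0.002}),
]
print(total_demand(path, "CPU1"))  # ~0.013 s of CPU1 demand
```

Aggregated demands of this kind are what feed the service times of the Extended Queueing Network in the next step.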
9. Execution Graph
[Figure: Execution Graph of the two-federate SD(). After a START/Fork, each federate performs Local Initialization Execution (LEXI), Start RTI Interface, and HLA Initialization, then loops through Local Execution (LEX), an exit test, HLA RTI Service Execution (HLAR) guarded by Semaphore MT1/MT2, and HLA Execution Wait/Exit blocks (HLAF-Wait, HLAF-Ex), exchanging synchronization messages with the other federate (from Federate 2's SM4, and vice versa) until a Join and END. Branches are labelled with routing probabilities (p1RTI, p1QUIT, p1SYNC, p1NSYNC, and their p2 counterparts); LS segments and HLA segments are distinguished.]
10. Federate 1 Execution Graph (B1)
[Figure: detail of Federate 1's block B1: Local Execution (LEX), an exit test, HLA RTI Service Execution (HLAR), and Semaphore MT1, with routing probabilities p1RTI, p1QUIT, p1SYNC, and p1NSYNC; LS segments and HLA segments are distinguished.]
11. Federate 1 Execution Graph (B2)
[Figure: detail of Federate 1's block B2: HLA Execution Wait (HLAF-Wait), HLA Execution Exit (HLAF-Ex), Release Semaphore MT1, and a release test, with probabilities p1SYNC and 1 – p1SYNC and an input from Federate 2's SM4; LS segments and HLA segments are distinguished.]
19. Model Parameterization
Procedure to determine the model parameters from SL() and the RTI (as an instance of simulation infrastructure)
From the EG, standard performance engineering procedures can derive, for each class of jobs:
• Service times at each center (i.e., tCPU1, tCPU2, tNet12, and tNet21)
• Routing probabilities (i.e., p1QUIT, p2QUIT, p1SYNC, and p2SYNC)
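The routing probabilities can be estimated as observed frequencies from counters logged by the local simulator. The sketch below assumes such counters exist; the counter names and values are illustrative, not the paper's actual procedure.

```python
# Sketch: deriving EG routing probabilities from event counts logged by SL().
# All counter names and sample values are illustrative assumptions.

def routing_probabilities(n_events, n_rti_calls, n_sync_waits, n_quit):
    """Estimate routing probabilities from observed frequencies."""
    p_rti = n_rti_calls / n_events               # branch into the HLAR block
    p_quit = n_quit / n_events                   # branch that ends the run
    p_sync = n_sync_waits / max(n_rti_calls, 1)  # RTI calls that block
    return {"pRTI": p_rti, "pQUIT": p_quit, "pSYNC": p_sync}

params = routing_probabilities(n_events=10_000, n_rti_calls=2_500,
                               n_sync_waits=500, n_quit=1)
print(params["pRTI"])  # 0.25
```

Service times (tCPU1, tCPU2, tNet12, tNet21) would analogously come from measured per-event CPU times and from the network figures of the target deployment.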
20. Case Study
Simulated System (LS)
21. Case Study
DS partitioning
22. Case Study: Hardware and SD() Configuration
Hardware
• Host 1 and Host 2 connected by a MAN
• RTT = 20.5 ms
• Bandwidth = 94 KB/s
SD() consisting of 2 federates
• Federate 1: simulates S1 and S2 and runs on Host 1
• Federate 2: simulates S3 and S4 and runs on Host 2
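With these figures, a one-way network service time can be estimated as half the RTT plus the transmission time. This is a back-of-envelope sketch: the 1 KB message size is an assumed value, not from the slides.

```python
# Estimate one-way network time over the MAN (RTT and bandwidth from the
# slides; the message size is an assumption for illustration).
RTT_S = 20.5e-3          # round-trip time: 20.5 ms
BANDWIDTH_BPS = 94_000   # 94 KB/s (decimal kilobytes assumed)
MSG_BYTES = 1_000        # assumed HLA message size

t_net = RTT_S / 2 + MSG_BYTES / BANDWIDTH_BPS
print(f"{t_net * 1e3:.1f} ms")  # ~20.9 ms per message
```

A per-message cost in this range is what makes the HLA overhead dominant when per-event processing times are small, as the two scenarios below explore.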
23. Case Study: Problem Domain
We consider two scenarios (A and B)
Scenario A: light processing times to simulate S1 through S4, specifically DCPU = 10 ms wall-clock time for each processing event
Scenario B: heavy processing times to simulate S1 through S4, specifically DCPU = 500 ms wall-clock time for each processing event
24. Case Study: Problem Statement
Is SD() faster than SL() in Scenario A?
Is SD() faster than SL() in Scenario B?
25. Case Study: Design Question
Scenario A: DCPU = 10 ms (light processing)
Scenario B: DCPU = 500 ms (heavy processing)
In Scenario B, will the parallelization speed-up compensate for the HLA overhead? I.e., is the predicted SD() execution time shorter than the SL() time?
Can we give a YES answer in the Predictive Evaluation Process, and then proceed to implement SD()?
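The intuition behind the design question can be checked with a back-of-envelope comparison. This is a sketch under stated assumptions: the event count, synchronization count, and per-message overhead are illustrative values, not the paper's measurements, and ideal two-way parallelism is assumed.

```python
# Rough comparison of local vs. 2-federate distributed execution time:
#   local       ~ n_events * DCPU
#   distributed ~ (n_events / 2) * DCPU + n_sync * overhead
# (dcpu_ms and overhead_ms in milliseconds; results in seconds)

def compare(dcpu_ms, n_events=1_000, n_sync=1_000, overhead_ms=21):
    t_local = n_events * dcpu_ms / 1000
    t_distr = (n_events // 2) * dcpu_ms / 1000 + n_sync * overhead_ms / 1000
    return t_local, t_distr

# Scenario A (DCPU = 10 ms): the HLA overhead dominates, distribution loses.
print(compare(10))   # (10.0, 26.0)
# Scenario B (DCPU = 500 ms): the parallel speed-up wins.
print(compare(500))  # (500.0, 271.0)
```

The methodology's prediction model refines exactly this kind of estimate by accounting for queueing, blocking, and routing probabilities rather than assuming ideal parallelism.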
26. OMNeT++ Performance Model Implementation
28. Model Validation
The execution time prediction has been validated by:
• implementing the actual DS system, and
• measuring and comparing the actual execution time with the predicted execution time
29. Model Validation
Scenario   SD() predicted execution time (sec)   SD() actual execution time (sec)
A          17                                    20
B          846                                   880
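From the table above, the relative prediction error can be computed as |predicted − actual| / actual:

```python
# Relative prediction error for the two scenarios (values from the table).
results = {"A": (17, 20), "B": (846, 880)}  # (predicted, actual) in seconds
for scenario, (pred, actual) in results.items():
    err = abs(pred - actual) / actual
    print(f"Scenario {scenario}: {err:.1%}")
# Scenario A: 15.0%
# Scenario B: 3.9%
```

The prediction is thus within a few percent in the heavy-processing scenario, where the absolute times (and the design risk) are largest.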
30. Conclusions
We have introduced a predictive methodology that, given
• a simulation model,
• a distributed simulation infrastructure,
• a model partitioning, and
• a set of distributed hosts,
enables simulator designers to predict the distributed simulator execution time
31. Conclusions
We have shown an example application to the evaluation of the distribution of a local simulator SL() over a metropolitan area network, in the cases of:
• light event processing times (Scenario A)
• heavy event processing times (Scenario B)
32. Conclusions
We have validated the execution time prediction by:
• implementing the actual DS simulator
• measuring and comparing the actual execution time with the predicted execution time
33. Acknowledgements
FIRB Projects "Performance Evaluation of Complex Systems" and "Software frameworks and technologies for the development of open-source distributed simulation code" (Italian Ministry of Research and CERTIA Research Center of the University of Rome Tor Vergata); FP7 euHeart Project (European Commission); Research Fellowship of the European Space Agency.
Ylenia Cannone and Luca Marcheggiani for their implementation and validation of the OMNeT++ simulator.