This document discusses the RAMSES project, which aims to develop a new science of end-to-end analytical performance modeling of science workflows in extreme-scale science environments. The RAMSES research agenda involves developing component and end-to-end models, tools to provide performance advice, data-driven estimation methods, automated experiments, and a performance database. The models will be evaluated using five challenge workflows: high-performance file transfer, diffuse scattering experimental data analysis, data-intensive distributed analytics, exascale application kernels, and in-situ analysis placement.
RAMSES: Robust Analytic Models for Science at Extreme Scales
1. Gagan Agrawal¹* Prasanna Balaprakash² Ian Foster²* Raj Kettimuthu²
Sven Leyffer² Vitali Morozov² Todd Munson² Nagi Rao³*
Saday Sadayappan¹ Brad Settlemyer³ Brian Tierney⁴* Don Towsley⁵*
Venkat Vishwanath² Yao Zhang²
¹ Ohio State University  ² Argonne National Laboratory
³ Oak Ridge National Laboratory  ⁴ ESnet  ⁵ UMass Amherst  (* Co-PIs)
Advanced Scientific Computing Research
Program manager: Rich Carlson
2. Prediction, explanation, & optimization are challenging for even “simple” E2E workflows
[Diagram: source data store → wide-area network → destination data store]
For example, file transfer, for which we want to:
• Predict achievable throughput for a specific configuration
• Explain factors influencing performance
• Optimize parameter values to achieve high speeds
3. Prediction, explanation, & optimization are challenging for even “simple” E2E workflows
[Diagram: source and destination data transfer nodes (application, OS, file-system stack, TCP/IP, NIC, HBA/HCA), LAN switches and routers, wide-area network, storage array, and Lustre file system (MDS/MDT, OSS/OST)]
+ diverse environments
+ diverse workloads
+ contention
4. 85 Gbps sustained disk-to-disk over 100 Gbps network, Ottawa–New Orleans
Raj Kettimuthu and team, Argonne
5. High-speed transfers to/from AWS cloud, via Globus transfer service
• UChicago → AWS S3 (US region): sustained 2 Gbps
– 2 GridFTP servers, GPFS file system at UChicago
– Multi-part upload via 16 concurrent HTTP connections
• AWS → AWS (same region): sustained 5 Gbps
(Globus endpoint: go#s3)
10. How to create more accurate, useful, and portable models of such systems?
Simple analytical model: T = α + β·l [startup cost plus sustained-bandwidth term]
Experiment + regression to estimate α, β
First-principles modeling to better capture details of system & application components
Data-driven modeling to learn unknown details of system & application components
Model composition; model–data comparison
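The experiment-plus-regression step can be sketched in a few lines. The measurements below are synthetic stand-ins for real transfer timings, generated from assumed "true" coefficients so the fit has something to recover.

```python
import numpy as np

# Synthetic measurements: transfer time T for file sizes l (GB),
# generated from an assumed startup cost and bandwidth plus noise.
rng = np.random.default_rng(0)
sizes = np.array([0.5, 1, 2, 4, 8, 16, 32])           # l, in GB
true_alpha, true_beta = 1.2, 0.8                      # seconds, seconds/GB
times = true_alpha + true_beta * sizes + rng.normal(0, 0.05, sizes.size)

# Least-squares fit of T = alpha + beta * l (polyfit returns slope first).
beta, alpha = np.polyfit(sizes, times, 1)

print(f"alpha ~= {alpha:.2f} s (startup), beta ~= {beta:.2f} s/GB "
      f"(=> ~{1/beta:.2f} GB/s sustained)")
```

The same two-coefficient fit applies whenever the component is well described by a fixed cost plus a rate; the first-principles and data-driven refinements on this slide take over where it is not.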
11. The RAMSES vision
To develop a new science of end-to-end analytical performance modeling that will transform understanding of the behavior of science workflows in extreme-scale science environments.
Based on the integration of first-principles and data-driven modeling, and a structured approach to model evaluation & composition.
12. The RAMSES research agenda & platform
Modeling: develop, evaluate, and refine component and end-to-end models
Tools: develop easy-to-use tools to provide end users with actionable advice
Estimation: develop and apply data-driven estimation methods: differential regression, surrogate models, etc.
Experiments: extensive, automated experiments to test models & build a database
[Platform components: Evaluators, Advisor, Estimators, Tester, Database]
13. We are informed by five challenge workflows
Transfer: high-performance, end-to-end file transfer
Scattering: capture and analysis of diffuse scattering experimental data
MapReduce: data-intensive, distributed data analytics
Exascale: performance of exascale application kernels on memory hierarchies
In-situ: configuration and placement of in-situ analysis computations
14. Transfer: End-to-end file movement
[Diagram: source and destination data transfer nodes (application, OS, file-system stack, TCP/IP, NIC, HBA/HCA), LAN switches and routers, wide-area network, storage array, and Lustre file system (MDS/MDT, OSS/OST)]
Predict: throughput for a configuration
Explain: factors influencing performance
Optimize: parameters for high speeds
15. Scattering: Linking simulation and experiment to study disordered structures
[Diagram: experimental scattering from a sample (material composition, e.g., La 60% / Sr 40%) is compared with simulated scattering from a simulated structure. Errors are detected in seconds to minutes; experiments are selected in minutes to hours; simulations driven by experiments run for minutes to days. Results contribute to a knowledge base of past experiments, simulations, literature, and expert knowledge, which supports knowledge-driven decision making and evolutionary optimization.]
Diffuse scattering images from Ray Osborn et al., Argonne
16. Immediate assessment of alignment quality in near-field high-energy diffraction microscopy
[Workflow diagram (Blue Gene/Q and Orthros; all data in NFS): a detector produces a dataset of 360 files, 4 GB total, moved by Globus transfer. Step 1: median calculation (MedianImage.c, 75 s, 90% I/O; uses Swift/K). Step 2: peak search (ImageProcessing.c, 15 s per file; uses Swift/K), producing a reduced dataset of 360 files, 5 MB total. Step 3: generate parameters (FOP.c, 50 tasks, 25 s/task, ¼ CPU hour; uses Swift/K) and convert files to network-endian format (2 min for all files). Step 4: analysis pass (FitOrientation.c, 60 s/task on PC or BG/Q, 1667 CPU hours; uses Swift/T), with feedback to the experiment. Up to 2.2 M CPU hours per week! Scientific metadata and workflow progress are recorded in the Globus Catalog; control is a Bash script, launched manually via ssh. This is a single workflow.]
Hemant Sharma, Justin Wozniak, Mike Wilde, Jon Almer
17. MapReduce: Distributing data and computation for data analytics
[Diagram: a master assigns jobs to slaves in a local cluster and in a cloud environment; each side performs a local reduction over its data; an index supports remote data analysis; job assignment and a global reduction span the two sites.]
18. Exascale simulation
Images courtesy: Joseph Insley (Argonne)
HACC Cosmology
• Compute-intensive phase with regular stride-one access
• Tree-walk phase: irregular memory access with high branching and integer ops
• 3D FFT communication-intensive phase
• I/O phase
Nek5000 CFD
• Matrix-vector product phase
• Conjugate gradient iteration
• Communication phase involving nearest-neighbor exchange and vector reductions
19. In situ analysis on the DOE Leadership Computing Infrastructure
[Diagram: compute resource (multi-petaflop, high-radix interconnect: Dragonfly, 5D torus), I/O nodes, switch complex, analysis nodes/cluster (InfiniBand), file server nodes, storage system (1536 GB/s), and DTN nodes; candidate analysis placements are marked 1–4.]
We need to perform the right computation at the right place and time, taking into account details of the simulation, resources, and analysis.
20. A diverse set of components
[Table: workflows (Transfer, Scattering, Exascale, Distributed MapReduce, In-Situ) versus components (server, parallel computer, router, storage system, LAN, WAN, TCP/UDT, GridFTP, file systems, GridFTP server, Nekbone, HACCbone, checksum, encryption, MapReduce, other apps); a Y marks each component a workflow exercises. Transfer exercises 11 of the 16 components, Distributed MapReduce 9, Scattering and In-Situ 8 each, and Exascale 6.]
21. Develop, evaluate, and refine component and end-to-end models
• Models from the literature
• Fluid models for network flows
• SKOPE modeling system
Develop and apply data-driven estimation methods
• Differential regression
• Surrogate models
• Other methods from the literature
Develop easy-to-use tools to provide end users with actionable advice
• Runtime advisor, integrated with Globus transfer system
Automated experiments to test models and build a database
• Experiment design
• Testbeds
22. SKOPE performance modeling framework
[Diagram of inputs, outputs, and engines. Front end: source code is converted, semi-automatically with a source-to-source translator, into code skeletons in the SKOPE language (the main user effort); a parser builds a per-function intermediate representation (block skeleton trees); a behavior modeling engine combines this with workload input to produce an execution-based intermediate representation (a Bayesian execution tree). Back end (automatic): a transformation engine applies schemas for suggested transformations, yielding transformed Bayesian execution trees; a characterization engine uses hardware models and system specifications plus synthesized characteristics to produce performance projections and bottleneck analysis.]
23. Differential regression for combining data from different sources
Example of use: predict performance on a connection length L not realizable on physical infrastructure, e.g., IB-RDMA or HTCP throughput on a 900-mile connection.
1) Make multiple measurements of performance on path lengths d:
– MS(d): OPNET simulation
– ME(d): ANUE-emulated path
– MU(d): real network (USN)
2) Compute measurement regressions on d: ṀA(·), A ∈ {S, E, U}
3) Compute differential regressions: ΔṀA,B(·) = ṀA(·) − ṀB(·), A, B ∈ {S, E, U}
4) Apply a differential regression to obtain estimates, for C ∈ {S, E}:
M̂U(d) = MC(d) − ΔṀC,U(d)
(point measurement from simulation/emulation, minus the regression estimate of the difference)
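Steps 1–4 can be sketched on synthetic data: here an "emulator" covers all path lengths, the "real network" only short ones, and the differential regression transfers the emulator's coverage to the real network. All functions and constants below are illustrative assumptions, not USN or ANUE measurements.

```python
import numpy as np

rng = np.random.default_rng(1)

def emulated(d):   # throughput vs. path length on an emulated path (synthetic)
    return 9.0 - 0.004 * np.asarray(d) + rng.normal(0, 0.02, np.asarray(d).shape)

def real(d):       # real network: a systematic offset from the emulation (synthetic)
    return 8.4 - 0.004 * np.asarray(d) + rng.normal(0, 0.02, np.asarray(d).shape)

# 1) Measurements: the emulator covers all lengths; the testbed only short ones.
d_emu = np.arange(100.0, 1001.0, 100.0)
d_usn = np.array([100.0, 200.0, 300.0, 400.0])
m_emu, m_usn = emulated(d_emu), real(d_usn)

# 2) Regressions on d for each measurement source.
emu_fit = np.poly1d(np.polyfit(d_emu, m_emu, 1))
usn_fit = np.poly1d(np.polyfit(d_usn, m_usn, 1))

# 3) Differential regression: the difference of the two fitted curves.
delta = lambda d: emu_fit(d) - usn_fit(d)

# 4) Estimate real-network throughput at 900 miles: an emulated point
#    measurement minus the differential regression at that length.
d_target = 900.0
point = emulated(np.array([d_target]))[0]
estimate = point - delta(d_target)
print(f"estimated real throughput at {d_target:.0f} miles: {estimate:.2f} Gbps")
```

The key property is that the systematic emulator-versus-real bias is captured by the regression of the differences, so it cancels even at lengths the real testbed never measured.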
24. We will extend the differential regression
method in several areas
• To compare different component models
– E.g., different models of network elements, storage
systems, protocol implementations
• To compare different composite models
– E.g., different methods for combining memory and
CPU models
• To compare model outputs with measurements
25. Component model
[Diagram: component i has system parameters sᵢ and task-size parameters pᵢ; analytical and empirical models, refined by experiment design (active learning), yield cost terms and a performance/quality model. Q̂ᵢ(pᵢ, sᵢ) is a regression estimate of the component's performance Qᵢ(pᵢ, sᵢ).]
26. End-to-end profile composition
[Diagram: a source-LAN profile (configuration for host and edge devices), a WAN profile (configuration for WAN devices), and a destination-LAN profile (configuration for host and edge devices) are combined via composition operations.]
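One simple composition operator can be sketched as follows, under the assumption that each profile reduces to a latency and a bandwidth: serial composition adds latencies and takes the minimum bandwidth along the path. This deliberately ignores component interactions, which is exactly why composed models need the corrections discussed later.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    latency_s: float       # one-way latency contributed by this segment
    bandwidth_gbps: float  # sustainable bandwidth of this segment

def compose(*profiles: Profile) -> Profile:
    """Serial composition: latencies add; the narrowest segment bounds bandwidth."""
    return Profile(
        latency_s=sum(p.latency_s for p in profiles),
        bandwidth_gbps=min(p.bandwidth_gbps for p in profiles),
    )

# Illustrative profiles for the three segments on this slide (assumed values).
src_lan = Profile(latency_s=0.0002, bandwidth_gbps=40.0)
wan     = Profile(latency_s=0.0300, bandwidth_gbps=100.0)
dst_lan = Profile(latency_s=0.0002, bandwidth_gbps=10.0)

e2e = compose(src_lan, wan, dst_lan)
print(e2e)   # bandwidth limited by the 10 Gbps destination LAN
```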
27. End-to-end model composition & analysis
• End-to-end model using composition
– It is an approximation, due to component interactions not modeled by the composition operator
• Actual end-to-end performance model
– Component models are “corrected” to account for unmodeled effects; this form is assumed to exist
28. Using end-to-end measurements and differential regression to correct regression estimates
• Regression estimate Q̂(p,s) of the composed model
– “Estimated,” since component models are “incomplete,” as derived from first principles and/or measurements
• Error due to the regression estimate: [Q(p,s) − Q̂(p,s)]²
• The error can be mitigated using measurements. Corrected estimate of Q(p,s):
Q̂(p,s) + Δ̂(p,s)
where Q̂(p,s) is the analytical model and Δ̂(p,s) is the correction from differential regression using measurements.
29. Performance guarantees
• Vapnik–Chervonenkis theory: under finite VC-dim(F),
P{ I(Δ̂, Q̂, p) − I(Δ*, Q̂, p) > ε } < δ(F, l, ε)    [Δ̂: estimated; Δ*: optimal]
– Guarantees that the error of the regression estimate is close to optimal with a certain probability
– Distribution-free: does not require detailed knowledge of error distributions; uses end-to-end measurements
• Error of the corrected estimate:
I(Δ, Q̂, p) = ∫ [Q(p,s) − Q̂(p,s) − Δ(p,s)]² dP
31. Fluid models of network flows
GridFTP flow i: parallelism kᵢ, round-trip time Rᵢ, throughput Tᵢ(t)
Bottleneck router: capacity C, queue length Q(t), loss rate p(t)
Flow dynamics: dTᵢ/dt = kᵢ/Rᵢ² − (Tᵢ(t) Tᵢ(t − Rᵢ) / (2kᵢ)) · p(t − Rᵢ)
Queue dynamics: dQ/dt = 1{Q > 0} · (Σⱼ Tⱼ(t) − C)
Solve for throughputs and transfer delays. Special case: known p.
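A minimal numerical sketch of the flow equation, simplified by dropping the feedback delay and queue dynamics and holding the loss rate p constant (an assumption for illustration). Under that simplification the Euler iteration converges to the analytic steady state T* = (k/R)·√(2/p).

```python
import math

# Euler integration of a single-flow fluid model with constant loss rate p:
#   dT/dt = k/R**2 - (T**2 / (2*k)) * p
k, R, p = 4, 0.1, 0.001      # parallelism, RTT (s), loss probability
T, dt = 0.0, 0.001           # throughput state, time step (s)
for _ in range(30000):       # 30 s of model time
    T += dt * (k / R**2 - (T * T / (2 * k)) * p)

T_star = (k / R) * math.sqrt(2 / p)   # analytic steady state
print(f"simulated {T:.1f}, analytic steady state {T_star:.1f} (segments/s)")
```

The T ∝ k/(R·√p) shape is the familiar TCP throughput law; the full model with delay and queue dynamics must be solved as a delay differential equation instead.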
32. Our multi-modal approach
[Diagram: model composition joins analytical models, regression models, and SKOPE to produce performance projections. System models (of current or future systems) draw on experiments, historical logs, emulators, simulators, and benchmarks; application behavior models draw on source code, code skeletons in the SKOPE language, and workload parameters.]
33. File transfer performance projections
[Diagram: the multi-modal approach applied to file transfer. System models of storage, TCP, and WAN draw on experiments (iperf, GridFTP, XDD), historical logs, and emulators, feeding analytical and regression models. Application behavior comes from source code, code skeletons in the SKOPE language, and workload parameters via SKOPE models. Model composition yields the performance projections.]
34. Exascale simulation performance projections
[Diagram: the multi-modal approach applied to exascale simulation. System models of compute, memory, and interconnect draw on experiments (MPI benchmarks, STREAM, DGEMM, IOR) and historical logs, feeding analytical and regression models. Application behavior comes from source code, code skeletons in the SKOPE language, and workload parameters via SKOPE. Model composition yields the performance projections.]
Listing 1: MatMul's CPU code
    float A[N][K], B[K][M];
    float C[N][M];
    int i, j, k;
    for (i = 0; i < N; ++i) {
      for (j = 0; j < M; ++j) {
        float sum = 0;
        for (k = 0; k < K; ++k) {
          sum += A[i][k] * B[k][j];
        }
        C[i][j] = sum;
      }
    }

Listing 2: MatMul's code skeleton
    float A[N][K]
    float B[K][M]
    float C[N][M]
    /* the loop space */
    parallel_for (N, M) : i, j
    {
      /* computation w/ instruction count */
      comp 1
      /* streaming loop */
      stream k = 0:K {
        /* load */
        ld A[i][k]
        ld B[k][j]
        comp 3
      }
      comp 5
      /* store */
      st C[i][j]
    }

A code skeleton captures the following information about a computational kernel. Data parallelism: homogeneous tasks, repeated to express data parallelism; a task corresponds to the innermost parallel for loop and is expressed as computation. Data accesses: loads and stores are explicit operations; the accessed indices, array sizes, and access patterns can be expressed as well, and are treated as random unless users specify otherwise.
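As an illustration of how such a skeleton supports analytic projection, the sketch below tallies the operation counts implied by Listing 2 symbolically. The comp weights (1, 3, 5) come from the listing; the problem sizes and the idea of a simple count-based tally are illustrative assumptions, not the SKOPE engine itself.

```python
def matmul_skeleton_counts(N, M, K):
    """Tally loads, stores, and 'comp' units implied by the MatMul skeleton."""
    tasks = N * M                     # parallel_for (N, M): one task per (i, j)
    loads = tasks * 2 * K             # ld A[i][k] and ld B[k][j] per stream step
    stores = tasks                    # st C[i][j] once per task
    comp = tasks * (1 + 3 * K + 5)    # comp 1, comp 3 per stream step, comp 5
    return {"loads": loads, "stores": stores, "comp": comp}

counts = matmul_skeleton_counts(N=1024, M=1024, K=512)
print(counts)
```

Fed with per-operation costs from a hardware model, such counts become a first-order performance projection.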
35. A performance database
• We aim to collect instrumentation data in a
central database to simplify model validation
• We plan to use the perfSONAR measurement
archive tool as a starting point
– REST API on top of Cassandra and Postgres
– Optimized for time series data
– Will extend as needed
– http://software.es.net/esmond/
36. Application to transfer optimization
[Diagram: Globus sends (1) a transfer service description to a performance predictor, which returns (2) a prediction based on a parameter database; (3) transfer performance feeds a performance analyst, whose analysis drives a model refiner and parameter updates; (4) user feedback is collected by a user feedback agent.]
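The prediction/refinement loop can be sketched as follows. Every name and coefficient here is hypothetical: a toy linear model per endpoint pair stands in for the real parameter database, and the refinement step is a simple gradient update rather than the project's actual model refiner.

```python
# Illustrative sketch of the predict/observe/refine loop (all names hypothetical;
# the real advisor integrates with the Globus transfer service).
class TransferAdvisor:
    def __init__(self):
        # parameter database: per-endpoint-pair model coefficients (alpha, beta)
        self.params = {("src", "dst"): (1.0, 0.5)}

    def predict(self, pair, size_gb):
        alpha, beta = self.params[pair]
        return alpha + beta * size_gb          # predicted transfer time (s)

    def refine(self, pair, size_gb, observed_s, rate=0.01):
        # nudge coefficients toward the observed time (gradient step on error)
        alpha, beta = self.params[pair]
        err = (alpha + beta * size_gb) - observed_s
        self.params[pair] = (alpha - rate * err, beta - rate * err * size_gb)

advisor = TransferAdvisor()
before = advisor.predict(("src", "dst"), 10.0)
advisor.refine(("src", "dst"), 10.0, observed_s=8.0)
after = advisor.predict(("src", "dst"), 10.0)
print(before, "->", after)   # prediction moves toward the observed 8.0 s
```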
37. Summary
• We focus on the science of modeling: integration
of first-principles and data-driven models; model
composition and evaluation
• Our challenge applications span a broad
spectrum of DOE resources and disciplines
• We see big opportunities for cooperation: e.g.,
on development and evaluation of component
models
38. Thanks, and for more information
• Thanks to our sponsors:
Advanced Scientific Computing Research
Program manager: Rich Carlson
• Thanks to my RAMSES project co-participants
• For more information, please see
https://sites.google.com/site/ramsesdoeproject/
ianfoster.org and @ianfoster
Editor's notes
Yes. The entire namespace is stored on Lustre Metadata Servers (MDSs); file data is stored on Lustre Object Storage Servers (OSSs).
Note that unlike many block-based clustered filesystems where the MDS is still in charge of block allocation, the Lustre MDS is not involved in file IO in any manner and is not a source of contention for file IO.
The data for each file may reside in multiple objects on separate servers. Lustre 1.x manages these objects in a RAID-0 (striping) configuration, so each object in a multi-object file contains only a part of the file's data. Future versions of Lustre will allow the user or administrator to choose other striping methods, such as RAID-1 or RAID-5 redundancy.
What is the difference between an OST and an OSS?
As the architecture has evolved, we refined these terms.
An Object Storage Server (OSS) is a server node, running the Lustre software stack. It has one or more network interfaces and usually one or more disks.
An Object Storage Target (OST) is an interface to a single exported backend volume. It is conceptually similar to an NFS export, except that an OST does not contain a whole namespace, but rather file system objects.
“Most of materials science is bottlenecked by disordered structures”—Littlewood.
Solve inverse problem.
How do we make this sort of application routine? Allow thousands—millions?—to contribute to the knowledge base.
Challenge: takes months to do a single loop through cycle.
Just as important, it is an incredibly labor intensive and expensive process.
DS, NF-HEDM, FF-HEDM, PD workflows operational
Catalog integrated into workflow, supports rich user interface
Workflows use large-scale compute resources outside of APS
Data publication service demonstrated
Parallel algs for 3-D image reconstruction, structure determination, etc.
Globus Galaxies platform integrated with Swift for scalability
HACC:
The short force evaluation kernel is compute intensive with regular stride one memory accesses. This kernel can be fully vectorized and/or threaded.
The tree walk phase has essentially irregular indirect memory accesses, and has very high number of branching and integer operations.
The 3D FFT phase is implemented with point-to-point communication operations and is executed only every long time step; thus significantly reducing the overall communication complexity of the code.
Nekbone kernel: The Nekbone kernel is a single-core code focused on the matrix-vector product at the heart of the spectral element method. The code allows for analysis and optimization of the performance of the matrix-vector product kernel, which is recast as a set of computationally intense matrix-matrix products with relatively low operation count and minimal data movement.
Nekbone: The Nekbone mini-app allows users to study the computationally intense linear solvers that account for a large percentage of the more intricate Nek5000 software, as well as the communication costs required for nearest-neighbor data exchanges and vector reductions. Nekbone embeds the nekbone_kernel in a conjugate gradient iteration to solve the 3D Poisson equation. Preconditioning in the current version is based on diagonal scaling, which allows for simpler code than the full multigrid structure found in Nek5000. Nekbone has been created to be easily adapted and manipulated to different platforms, communication structures, and scalability studies.