High Performance Computing: an Introduction for the Society of Actuaries

High Performance Computing

Adam DeConinck
R Systems NA, Inc.

Development of models begins at small scale.

Working on your laptop is convenient, simple.

Actual analysis, however, is slow.

“Scaling up” typically means a small server or
fast multi-core desktop.

There is some speedup, but for very large models it is not
significant.

Single machines don't scale up forever.

For the largest models, a different approach is required.

High-Performance Computing involves many
distinct computer processors working together on
the same calculation.

Large problems are divided into smaller parts and
distributed among the many computers.

These are usually clusters of quasi-independent computers
coordinated by a central scheduler.
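
Dividing a large problem into smaller parts can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `split_into_chunks` and the workload of 1,000,000 scenarios on 8 workers are hypothetical, and on a real cluster each range would be handed to the scheduler rather than printed.

```python
def split_into_chunks(n_items, n_workers):
    """Return (start, stop) index ranges that cover n_items as evenly as possible."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    base, extra = divmod(n_items, n_workers)
    chunks = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers each take one additional item.
        size = base + (1 if worker < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 scenarios spread over 8 workers.
    for start, stop in split_into_chunks(1_000_000, 8):
        print(f"scenarios {start} to {stop - 1}")
```

Each range is half-open, so the ranges neither overlap nor leave gaps.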

Typical HPC Cluster

[Diagram: the cluster consists of a login node with an external connection, a scheduler, compute nodes, and a file server, joined by an Ethernet network and a high-speed network (10GigE / InfiniBand).]

Performance gains
[Figure: duration (s) plotted against number of cores; a label marks the high-end workstation.]

Performance test: stochastic finance model on R Systems cluster

High-end workstation: 8 cores. Maximum speedup of 20x: 4.5 hrs → 14 minutes

Speedup is heavily model-dependent: 5x to 100x in our tests, and it can be higher

No more performance gain after ~500 cores: why? Some operations can't be parallelized.

Additional cores? Run multiple models simultaneously
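
The plateau described above is what Amdahl's law predicts: if some fraction of the run time cannot be parallelized, that fraction caps the speedup no matter how many cores are added. The sketch below illustrates the formula and is not a fit to the test results. The 95% parallel fraction is an assumed value, picked only because it gives a ceiling of 20x, and the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Ideal speedup on n_cores when only parallel_fraction of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time on any number of cores;
    # only the parallel part shrinks as cores are added.
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, not a measured value
    for cores in (8, 64, 500, 1000):
        print(f"{cores:5d} cores: {amdahl_speedup(p, cores):5.1f}x")
    print(f"ceiling with unlimited cores: {1.0 / (1.0 - p):.1f}x")
```

Under this assumption, going from 500 to 1000 cores barely changes the result, which is why spare cores are better spent running several models at once.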

Performance comes at a price: complexity.


New paradigm: real-time analysis vs batch jobs.

Applications must be written specifically to take
advantage of distributed computing.

Performance characteristics of applications change.

Debugging becomes more of a challenge.

New paradigm: real-time analysis vs batch jobs.

Most small analyses are done in real time:    Large jobs are typically done in a batch model:
“At-your-desk” analysis                       Submit job to a queue
Small models only                             Much larger models
Fast iterations                               Slow iterations
No waiting for resources                      May need to wait

Applications must be written specifically to
take advantage of distributed
computing.

Explicitly split your problem into smaller
“chunks”

“Message passing” between processes

Entire computation can be slowed by one
or two slow chunks

Exception: “embarrassingly parallel”
problems

Easy-to-split, independent chunks of
computation

Thankfully, many useful models fall under
this heading (e.g. stochastic models).

“Embarrassingly parallel” = no inter-process communication
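
An embarrassingly parallel stochastic model might look like the sketch below. Everything in it is an assumption made for illustration: the exponential "loss" is a toy stand-in for a real model, the names `simulate_chunk` and `run_model` are hypothetical, and a local thread pool stands in for cluster nodes (on a real cluster each chunk would be its own process or batch job). The point is that each chunk needs only its own seed and size, so no messages pass between chunks.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(seed, n_paths):
    """Sum n_paths draws from a toy loss model; needs no data from other chunks."""
    rng = random.Random(seed)  # private generator, so chunks never share state
    return sum(rng.expovariate(1.0) for _ in range(n_paths))


def run_model(n_chunks, paths_per_chunk, executor=None):
    """Average the toy loss over all chunks, serially or on the given executor."""
    seeds = range(n_chunks)
    sizes = [paths_per_chunk] * n_chunks
    mapper = executor.map if executor is not None else map
    # Both map variants return results in submission order.
    totals = list(mapper(simulate_chunk, seeds, sizes))
    return sum(totals) / (n_chunks * paths_per_chunk)


if __name__ == "__main__":
    serial = run_model(8, 10_000)
    with ThreadPoolExecutor(max_workers=4) as pool:
        pooled = run_model(8, 10_000, executor=pool)
    print(serial, pooled)
```

Because every chunk is seeded independently and results are combined in a fixed order, the serial and pooled runs give the same answer.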

Performance characteristics of applications change.

On a single machine:     On a cluster:
CPU speed (compute)      Single-machine metrics
Cache                    Network
Memory                   File server
Disk                     Scheduler contention
                         Results from other nodes

Debugging becomes more of a challenge.


More complexity = more pieces that can fail

Race conditions: sequence of events no longer deterministic

Single nodes can “stall” and slow the entire computation

Scheduler, file server, login server all have their own challenges
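
The cost of a stalled node can be made concrete with a toy timing model. The node counts and run times below are invented for illustration, and the function names are hypothetical; the only claim is the arithmetic: a job that must wait for every node finishes when the slowest node does.

```python
def wall_clock_time(node_times):
    """A job that waits for every node finishes when the slowest node does."""
    return max(node_times)


def efficiency(node_times):
    """Fraction of reserved node-time that was spent computing."""
    reserved = len(node_times) * wall_clock_time(node_times)
    return sum(node_times) / reserved


if __name__ == "__main__":
    healthy = [60.0] * 16                # assumed: 16 nodes, 60 s each
    one_stalled = [60.0] * 15 + [600.0]  # same job, but one node stalls
    for label, times in (("healthy", healthy), ("one stalled", one_stalled)):
        print(f"{label}: {wall_clock_time(times):.0f} s, "
              f"efficiency {efficiency(times):.0%}")
```

In this made-up case, one slow node out of sixteen makes the whole job ten times longer while the other fifteen nodes sit idle.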

External resources

One solution to handling complexity: outsource it!

Historical HPC facilities: universities, national labs

They often have the greatest absolute compute capacity, and will sell
excess capacity

Customers compete with academic projects for resources, and these facilities
typically do not include an SLA or high-level support

Dedicated commercial HPC facilities providing “on-demand”
compute power.

External HPC                   Internal HPC
Outsource HPC sysadmin         Requires in-house expertise
No hardware investment         Major investment in hardware
Pay-as-you-go                  Possible idle time
Easy to migrate to new tech    Upgrades require new hardware

Internal HPC                       External HPC
No external contention             No guaranteed access
All internal: easy security        Security arrangements complex
Full control over configuration    Limited control of configuration
Simpler licensing control          Some licensing complex

Requires in-house expertise        Outsource HPC sysadmin
Major investment in hardware       No hardware investment
Possible idle time                 Pay-as-you-go
Upgrades require new hardware      Easy to migrate to new tech

“The Cloud”

“Cloud computing”: virtual machines, with resources allocated dynamically
at an external provider

Lower performance (virtualization), higher flexibility

Usually no contracts necessary: pay with your credit card, get 16 nodes

Often have to do all your own sysadmin

Low support, high control

CASE STUDY:
Windows cluster for Actuarial
Application

Global insurance company


Needed 500-1000 cores on a temporary basis

Preferred a utility, “pay-as-you-go” model

Experimenting with external resources for “burst”
capacity during high-activity periods

Commercially licensed and supported application

Requested a proof of concept

Cluster configuration

Application embarrassingly parallel, small-to-medium data files,
computationally and memory-intensive

Prioritize computation (processors), access to fileserver over
inter-node communication, large storage

Upgraded memory in compute nodes to 2 GB/core

128-node cluster: 3.0 GHz Intel Xeon processors, 8 cores per node for
1024 cores total

Windows HPC Server 2008 R2 operating system

Application and fileserver on login node
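
The sizing figures above can be checked with a few lines of arithmetic. The node count, cores per node, and memory per core come from the slide; the per-node and cluster-wide memory totals are derived here and were not stated in the original.

```python
NODES = 128
CORES_PER_NODE = 8
MEMORY_PER_CORE_GB = 2

total_cores = NODES * CORES_PER_NODE
# Derived values (not stated on the slide):
memory_per_node_gb = CORES_PER_NODE * MEMORY_PER_CORE_GB
total_memory_gb = NODES * memory_per_node_gb

print(f"total cores: {total_cores}")
print(f"memory per node: {memory_per_node_gb} GB")
print(f"memory across the cluster: {total_memory_gb} GB")
```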

Stumbling blocks

Application optimization
Customer had a wide variety of models which generated different usage
patterns. (IO, compute, memory-intensive jobs) Required dynamic
reconfiguration for different conditions.

Technical issue
Iterative testing process. Application turned out to be generating massive
fileserver contention. Had to make changes to both software and hardware.

Human processes
Users were accustomed to internal access model. Required changes both
for providers (increase ease-of-use) and users (change workflow)

Security
Customer had never worked with an external provider before. Complex
internal security policy had to be reconciled with remote access.

Lessons learned:


Security was the biggest delaying factor. The initial security setup took over
3 months from the first expression of interest, even though cluster setup
was done in less than a week.

Only mattered the first time though: subsequent runs started much
more smoothly.

A low-cost proof-of-concept run was important to demonstrate feasibility,
and for working the bugs out.

A good relationship with the application vendor was extremely important
to solving problems and properly optimizing the model for performance.

Recent developments: GPUs

Graphics processing units


CPU: complex, general-purpose processor

GPU: highly-specialized parallel processor, optimized for performing operations for
common graphics routines

Highly specialized → many more “cores” for same cost and space

Intel Core i7: 4 cores @ 3.4 GHz: $300 = $75/core

NVIDIA Tesla M2070: 448 cores @ 575 MHz: $4500 = $10/core

Also higher bandwidth: 100+ GB/s for GPU vs 10-30 GB/s for CPU

Same operations can be adapted for non-graphics applications: “GPGPU”
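
The per-core prices quoted above follow from simple division, reproduced here as a quick check. The prices and core counts are the ones on the slide; the helper name `cost_per_core` is hypothetical. The comparison is per core only: a GPU core is much simpler and runs at a lower clock than a CPU core, so this is not a like-for-like performance measure.

```python
def cost_per_core(price_usd, cores):
    """Price divided by core count, in US dollars."""
    return price_usd / cores


# (price in USD, number of cores), as quoted on the slide
processors = {
    "Intel Core i7": (300, 4),
    "NVIDIA Tesla M2070": (4500, 448),
}

for name, (price, cores) in processors.items():
    print(f"{name}: ${cost_per_core(price, cores):.2f} per core")
```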

Image from http://blogs.nvidia.com/2009/12/whats-the-difference-between-a-cpu-and-a-gpu/

HPC/Actuarial using GPUs

Random-number generation

Finite-difference modeling

Image processing


Numerical Algorithms Group:
GPU random-number generator

MATLAB: operations on large arrays/matrices

Wolfram Mathematica: symbolic math analysis

Data from http://www.nvidia.com/object/computational_finance.html
