High Performance Computing

        Adam DeConinck
       R Systems NA, Inc.




Development of models begins at small scale.

Working on your laptop is convenient, simple.

Actual analysis, however, is slow.


“Scaling up” typically means a small server or
fast multi-core desktop.

Speedup exists, but for very large models, not
significant.

Single machines don't scale up forever.


For the largest models, a different approach is required.


High-Performance Computing involves many
  distinct computer processors working together on
  the same calculation.

Large problems are divided into smaller parts and
  distributed among the many computers.

Usually clusters of quasi-independent computers
  which are coordinated by a central scheduler.


Typical HPC Cluster

[Diagram: external connection, login node, scheduler, file server, and compute nodes, linked by an Ethernet network and a high-speed network (10GigE / InfiniBand).]

Performance gains

[Plot: duration (s) versus number of cores, with a high-end workstation marked for reference.]

     Performance test: stochastic finance model on R Systems cluster

     High-end workstation: 8 cores. Maximum speedup of 20x: 4.5 hrs → 14 minutes

          Scale-up heavily model-dependent: 5x – 100x in our tests, can be faster

     No more performance gain after ~500 cores: why? Some operations can't be parallelized.

     Additional cores? Run multiple models simultaneously
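
The plateau near 500 cores is the behavior Amdahl's law describes: if only part of the run time can be parallelized, the serial remainder caps the speedup no matter how many cores are added. A minimal Python sketch follows; the parallel fractions are hypothetical illustrations, not values measured in the R Systems test.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when `parallel_fraction` of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the core count grows without limit."""
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only for illustration.
    for p in (0.95, 0.99, 0.999):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (8, 64, 500, 4000)
        )
        print(f"p={p}: {row} (limit {speedup_limit(p):.0f}x)")
```

With a parallel fraction of 0.95, for example, the limit is 20x, so going from 500 to 4000 cores buys almost nothing. Spare cores are then better spent running additional models side by side.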


Performance comes at a price: complexity.



    New paradigm: real-time analysis vs batch jobs.

    Applications must be written specifically to take
    advantage of distributed computing.

    Performance characteristics of applications change.

    Debugging becomes more of a challenge.



New paradigm: real-time analysis vs batch jobs.

Most small analyses are done in real time:    Large jobs are typically done in a batch model:

    “At-your-desk” analysis                       Submit job to a queue
    Small models only                             Much larger models
    Fast iterations                               Slow iterations
    No waiting for resources                      May need to wait

Applications must be written specifically to take advantage of distributed computing.

    Explicitly split your problem into smaller “chunks”

    “Message passing” between processes

    Entire computation can be slowed by one or two slow chunks

    Exception: “embarrassingly parallel” problems

    Easy-to-split, independent chunks of computation

    Thankfully, many useful models fall under this heading. (e.g. stochastic models)

“Embarrassingly parallel” = No inter-process communication

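An embarrassingly parallel job can be sketched in Python with the standard-library `concurrent.futures` module. The toy payoff model, the seeds, and the function names below are assumptions made for illustration; on a real cluster the chunks would go to the scheduler rather than to local processes.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_chunk(task):
    """Run one independent chunk of a toy stochastic model (hypothetical payoff)."""
    seed, n_paths = task
    rng = random.Random(seed)  # each chunk has its own seeded generator
    total = 0.0
    for _ in range(n_paths):
        # Toy payoff: the positive part of a normally distributed return.
        total += max(rng.gauss(0.05, 0.2), 0.0)
    return total, n_paths


def combine(results):
    """Merge per-chunk sums into one average; the only step that sees all chunks."""
    total = sum(t for t, _ in results)
    count = sum(n for _, n in results)
    return total / count


def estimate(n_chunks, paths_per_chunk, workers=None):
    """Split the work into chunks, run them independently, then combine."""
    tasks = [(seed, paths_per_chunk) for seed in range(n_chunks)]
    if workers == 1:
        results = [simulate_chunk(t) for t in tasks]  # serial fallback
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(simulate_chunk, tasks))
    return combine(results)

# On platforms that start workers by spawning a fresh interpreter, call
# estimate(...) from inside an `if __name__ == "__main__":` block.
```

Because the chunks never exchange messages, the serial and parallel paths return the same estimate for the same seeds. Only the wall-clock time differs.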
Performance characteristics of applications change.

On a single machine:        On a cluster:

    CPU speed (compute)         Single-machine metrics
    Cache                       Network
    Memory                      File server
    Disk                        Scheduler contention
                                Results from other nodes

Debugging becomes more of a challenge.


    More complexity = more pieces that can fail

    Race conditions: sequence of events no longer deterministic

    Single nodes can “stall” and slow the entire computation

    Scheduler, file server, login server all have their own challenges
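
The stall effect can be made concrete with a small Python sketch: a job finishes only when its slowest worker does. The function name and the chunk timings are hypothetical.

```python
import heapq


def makespan(chunk_seconds, workers):
    """Wall-clock time when each chunk goes to whichever worker frees up first."""
    finish_times = [0.0] * workers  # when each worker next becomes idle
    heapq.heapify(finish_times)
    for seconds in chunk_seconds:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + seconds)
    return max(finish_times)


if __name__ == "__main__":
    even = [10.0] * 8               # eight equal chunks on eight workers
    stalled = [10.0] * 7 + [60.0]   # same job, but one chunk stalls
    print(makespan(even, 8), makespan(stalled, 8))
```

In the second case seven workers sit idle for most of the run, so the whole job takes as long as the one stalled chunk.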




External resources

    One solution to handling complexity: outsource it!

    Historical HPC facilities: universities, national labs
    
        Often have the most absolute compute capacity, and will sell
        excess capacity
    
        Competition with academic projects, typically do not include
        SLA or high-level support

    Dedicated commercial HPC facilities providing “on-demand”
    compute power.



External HPC                      Internal HPC

    Outsource HPC sysadmin            Requires in-house expertise
    No hardware investment            Major investment in hardware
    Pay-as-you-go                     Possible idle time
    Easy to migrate to new tech       Upgrades require new hardware

Internal HPC                          External HPC

    No external contention                No guaranteed access
    All internal: easy security           Security arrangements complex
    Full control over configuration       Limited control of configuration
    Simpler licensing control             Some licensing complex

    Requires in-house expertise           Outsource HPC sysadmin
    Major investment in hardware          No hardware investment
    Possible idle time                    Pay-as-you-go
    Upgrades require new hardware         Easy to migrate to new tech

“The Cloud”

    “Cloud computing”: virtual machines, dynamic allocation of resources in
    an external resource

    Lower performance (virtualization), higher flexibility

    Usually no contracts necessary: pay with your credit card, get 16 nodes

    Often have to do all your own sysadmin

    Low support, high control



CASE STUDY: Windows cluster for Actuarial Application

Global insurance company


    Needed 500-1000 cores on a temporary basis

    Preferred a utility, “pay-as-you-go” model

    Experimenting with external resources for “burst”
    capacity during high-activity periods

    Commercially licensed and supported application

    Requested a proof of concept

Cluster configuration

    Application embarrassingly parallel, small-to-medium data files,
    computationally and memory-intensive
    
        Prioritize computation (processors), access to fileserver over
        inter-node communication, large storage
    
        Upgraded memory in compute nodes to 2 GB/core

    128-node cluster: 3.0 GHz Intel Xeon processors, 8 cores per node for
    1024 cores total

    Windows HPC Server 2008 R2 operating system

    Application and fileserver on login node
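
The sizing figures above can be cross-checked with a few lines of Python. The class and method names are hypothetical; the numbers are the ones stated on this slide and in the customer's request for 500-1000 cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    nodes: int
    cores_per_node: int
    gb_per_core: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def gb_per_node(self) -> int:
        return self.cores_per_node * self.gb_per_core

    @property
    def total_gb(self) -> int:
        return self.total_cores * self.gb_per_core

    def covers(self, requested_cores: int) -> bool:
        """True if the cluster has at least the requested number of cores."""
        return self.total_cores >= requested_cores


# Figures from the slide: 128 nodes, 8 cores per node, 2 GB of memory per core.
case_study = ClusterSpec(nodes=128, cores_per_node=8, gb_per_core=2)

if __name__ == "__main__":
    print(case_study.total_cores, case_study.gb_per_node, case_study.total_gb)
```

Under these figures each node carries 16 GB of memory, and the 1024 cores cover the upper end of the requested range.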


Stumbling blocks

    Application optimization
    Customer had a wide variety of models which generated different usage
    patterns. (IO, compute, memory-intensive jobs) Required dynamic
    reconfiguration for different conditions.

    Technical issue
    Iterative testing process. Application turned out to be generating massive
    fileserver contention. Had to make changes to both software and hardware.

    Human processes
    Users were accustomed to internal access model. Required changes both
    for providers (increase ease-of-use) and users (change workflow)

    Security
    Customer had never worked with an external provider before. Complex
    internal security policy had to be reconciled with remote access.

Lessons learned:


    Security was the biggest delaying factor. The initial security setup took over
    3 months from the first expression of interest, even though cluster setup
    was done in less than a week.
    
        Only mattered the first time though: subsequent runs started much
        more smoothly.

    A low-cost proof-of-concept run was important to demonstrate feasibility,
    and for working the bugs out.

    A good relationship with the application vendor was extremely important
    to solving problems and properly optimizing the model for performance.



Recent developments: GPUs




Graphics processing units




    CPU: complex, general-purpose processor

    GPU: highly-specialized parallel processor, optimized for performing operations for
    common graphics routines

    Highly specialized → many more “cores” for same cost and space
     
         Intel Core i7: 4 cores @ 3.4 GHz: $300 = $75/core
     
         NVIDIA Tesla M2070: 448 cores @ 575 MHz: $4500 = $10/core

    Also higher bandwidth: 100+ GB/s for GPU vs 10-30 GB/s for CPU

    Same operations can be adapted for non-graphics applications: “GPGPU”
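
The per-core prices quoted above follow from a simple division, sketched here in Python. The prices and core counts are the slide's figures; the function name is hypothetical. A GPU core at 575 MHz is not equivalent to a CPU core at 3.4 GHz, so the ratio is only a rough guide.

```python
def cost_per_core(price_usd: float, cores: int) -> float:
    """Purchase price divided by core count, in US dollars per core."""
    return price_usd / cores


# (price in USD, core count), as quoted on the slide.
chips = {
    "Intel Core i7": (300.0, 4),
    "NVIDIA Tesla M2070": (4500.0, 448),
}

if __name__ == "__main__":
    for name, (price, cores) in chips.items():
        print(f"{name}: ${cost_per_core(price, cores):.2f}/core")
```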

Image from http://blogs.nvidia.com/2009/12/whats-the-difference-between-a-cpu-and-a-gpu/

HPC/Actuarial using GPUs

    Random-number generation

    Finite-difference modeling

    Image processing

    Numerical Algorithms Group: GPU random-number generator

    MATLAB: operations on large arrays/matrices

    Wolfram Mathematica: symbolic math analysis

Data from http://www.nvidia.com/object/computational_finance.html


High Performance Computing: an Introduction for the Society of Actuaries

  • 1. High Performance Computing Adam DeConinck R Systems NA, Inc. 1
  • 2. Development of models begins at small scale. Working on your laptop is convenient, simple. Actual analysis, however, is slow. 2
  • 3. Development of models begins at small scale. Working on your laptop is convenient, simple. Actual analysis, however, is slow. “Scaling up” typically means a small server or fast multi-core desktop. Speedup exists, but for very large models, not significant. Single machines don't scale up forever. 3
  • 4. For the largest models, a different approach is required. 4
  • 5. High-Performance Computing involves many distinct computer processors working together on the same calculation. Large problems are divided into smaller parts and distributed among the many computers. Usually clusters of quasi-independent computers which are coordinated by a central scheduler. 5
  • 6. Typical HPC Cluster Login External connection Ethernet network Scheduler Computes File Server High-speed network (10GigE / Infiniband) 6
  • 7. Performance gains High-end workstation Duration (s) Number of cores  Performance test: stochastic finance model on R Systems cluster  High-end workstation: 8 cores. Maximum speedup of 20x: 4.5 hrs → 14 minutes  Scale-up heavily model-dependent: 5x – 100x in our tests, can be faster  No more performance gain after ~500 cores: why? Some operations can't be parallelized.  Additional cores? Run multiple models simultaneously 7
  • 8. Performance comes at a price: complexity.  New paradigm: real-time analysis vs batch jobs.  Applications must be written specifically to take advantage of distributed computing.  Performance characteristics of applications change.  Debugging becomes more of a challenge. 8
  • 9. New paradigm: real-time analysis vs batch jobs. Most small analyses are done in Large jobs are typically done in a real time: batch model:  “At-your-desk” analysis  Submit job to a queue  Small models only  Much larger models  Fast iterations  Slow iterations  No waiting for resources  May need to wait 9
  • 10. Applications must be written specifically to take advantage of distributed computing.  Explicitly split your problem into smaller “chunks”  “Message passing” between processes  Entire computation can be slowed by one or two slow chunks  Exception: “embarrassingly parallel” problems  Easy-to-split, independent chunks of computation  Thankfully, many useful models fall under “Embarrassingly parallel” = this heading. (e.g. stochastic models) No inter-process communication 10
  • 11. Performance characteristics of applications change. On a single machine: On a cluster:  CPU speed (compute)  Single-machine metrics  Cache  Network  Memory  File server  Disk  Scheduler contention  Results from other nodes 11
  • 12. Debugging becomes more of a challenge.
      • More complexity = more pieces that can fail.
      • Race conditions: the sequence of events is no longer deterministic.
      • Single nodes can “stall” and slow the entire computation.
      • Scheduler, file server, and login server all have their own challenges.
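The effect of a stalled node can be shown with simple arithmetic. This sketch assumes a step that cannot finish until every chunk has reported back, so its wall-clock time is set by the slowest chunk. The chunk counts and timings are hypothetical.

```python
def wall_time(chunk_seconds):
    """A step that waits for every chunk finishes when the slowest chunk does."""
    return max(chunk_seconds)


healthy = [60.0] * 16            # hypothetical: 16 chunks, 60 s each
stalled = [60.0] * 15 + [600.0]  # same job with one node stalled

print(wall_time(healthy))  # 60.0
print(wall_time(stalled))  # 600.0
```

One slow node out of sixteen makes the whole step ten times slower, even though the other fifteen finished on time.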
  • 13. External resources
      • One solution to handling complexity: outsource it!
      • Historical HPC facilities: universities and national labs. They often have the most absolute compute capacity and will sell excess capacity, but commercial jobs compete with academic projects, and access typically does not include an SLA or high-level support.
      • Dedicated commercial HPC facilities provide “on-demand” compute power.
  • 14. External HPC vs. internal HPC
      • External HPC: outsource HPC sysadmin; no hardware investment; pay-as-you-go; easy to migrate to new tech.
      • Internal HPC: requires in-house expertise; major investment in hardware; possible idle time; upgrades require new hardware.
  • 15. Internal HPC vs. external HPC
      • Internal HPC: no external contention; all internal, so security is easy; full control over configuration; simpler licensing control; requires in-house expertise; major investment in hardware; possible idle time; upgrades require new hardware.
      • External HPC: no guaranteed access; security arrangements complex; limited control of configuration; some licensing complex; outsource HPC sysadmin; no hardware investment; pay-as-you-go; easy to migrate to new tech.
  • 16. “The Cloud”
      • “Cloud computing”: virtual machines and dynamic allocation of resources at an external facility.
      • Lower performance (virtualization), higher flexibility.
      • Usually no contracts necessary: pay with your credit card, get 16 nodes.
      • Often have to do all your own sysadmin.
      • Low support, high control.
  • 17. CASE STUDY: Windows cluster for Actuarial Application
  • 18. Global insurance company
      • Needed 500-1000 cores on a temporary basis.
      • Preferred a utility, “pay-as-you-go” model.
      • Experimenting with external resources for “burst” capacity during high-activity periods.
      • Commercially licensed and supported application.
      • Requested a proof of concept.
  • 19. Cluster configuration
      • Application: embarrassingly parallel, small-to-medium data files, computationally and memory-intensive.
      • Prioritized computation (processors) and access to the fileserver over inter-node communication and large storage.
      • Upgraded memory in compute nodes to 2 GB/core.
      • 128-node cluster: 3.0 GHz Intel Xeon processors, 8 cores per node, for 1024 cores total.
      • Windows HPC Server 2008 R2 operating system.
      • Application and fileserver on the login node.
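The sizing figures on this slide can be checked with a short calculation. The node count, cores per node, and memory per core come from the slide; the per-node and cluster-wide memory totals are derived here and are not stated in the original.

```python
NODES = 128           # from the slide
CORES_PER_NODE = 8    # from the slide
MEM_PER_CORE_GB = 2   # from the slide

total_cores = NODES * CORES_PER_NODE                 # matches the 1024 on the slide
mem_per_node_gb = CORES_PER_NODE * MEM_PER_CORE_GB   # derived: memory per compute node
total_mem_gb = total_cores * MEM_PER_CORE_GB         # derived: memory across the cluster

print(total_cores, mem_per_node_gb, total_mem_gb)
```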
  • 20. Stumbling blocks
      • Application optimization: The customer had a wide variety of models that generated different usage patterns (IO-, compute-, and memory-intensive jobs). This required dynamic reconfiguration for different conditions.
      • Technical issue: Testing was an iterative process. The application turned out to be generating massive fileserver contention, so changes had to be made to both software and hardware.
      • Human processes: Users were accustomed to an internal access model. This required changes both for providers (increasing ease of use) and for users (changing workflow).
      • Security: The customer had never worked with an external provider before. A complex internal security policy had to be reconciled with remote access.
  • 21. Lessons learned:
      • Security was the biggest delaying factor. The initial security setup took over 3 months from the first expression of interest, even though cluster setup was done in less than a week.
      • This only mattered the first time, though: subsequent runs started much more smoothly.
      • A low-cost proof-of-concept run was important for demonstrating feasibility and for working the bugs out.
      • A good relationship with the application vendor was extremely important for solving problems and properly optimizing the model for performance.
  • 23. Graphics processing units
      • CPU: complex, general-purpose processor.
      • GPU: highly specialized parallel processor, optimized for the operations used in common graphics routines.
      • Highly specialized → many more “cores” for the same cost and space.
      • Intel Core i7: 4 cores @ 3.4 GHz, $300 = $75/core.
      • NVIDIA Tesla M2070: 448 cores @ 575 MHz, $4500 ≈ $10/core.
      • Also higher bandwidth: 100+ GB/s for GPU vs 10-30 GB/s for CPU.
      • The same operations can be adapted for non-graphics applications: “GPGPU”.
      Image from http://blogs.nvidia.com/2009/12/whats-the-difference-between-a-cpu-and-a-gpu/
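The per-core prices quoted on this slide follow from dividing list price by core count, as the short sketch below shows. The prices and core counts are the ones given on the slide; the helper name `cost_per_core` is hypothetical. This is a rough comparison only, since a GPU “core” is far simpler and lower-clocked than a CPU core.

```python
def cost_per_core(price_usd: float, cores: int) -> float:
    """Price divided by core count; ignores clock speed and core complexity."""
    return price_usd / cores


# (price in USD, core count), as quoted on the slide
processors = {
    "Intel Core i7": (300, 4),
    "NVIDIA Tesla M2070": (4500, 448),
}

for name, (price, cores) in processors.items():
    print(f"{name}: ${cost_per_core(price, cores):.2f} per core")
```

This prints $75.00 per core for the CPU and $10.04 per core for the GPU, which the slide rounds to $10.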
  • 24. HPC/Actuarial using GPUs
      • Random-number generation
      • Finite-difference modeling
      • Image processing
      • Numerical Algorithms Group: GPU random-number generator
      • MATLAB: operations on large arrays/matrices
      • Wolfram Mathematica: symbolic math analysis
      Data from http://www.nvidia.com/object/computational_finance.html