April 2011




High Performance Computing
Challenges on the Road to Exascale Computing




H. J. Schick

IBM Germany Research & Development GmbH        © 2011 IBM Corporation
Agenda

 Introduction


 The What and Why of High Performance Computing


 Exascale Challenges


 Balanced Systems


 Blue Gene Architecture and Blue Gene Active Storage


 Supercomputers in a Sugar Cube




Origination of the “Jugene” Supercomputer




Supercomputer Satisfies Need for FLOPS


 FLOPS = FLoating point OPerations per Second.
    – Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15, Exa = 10^18


 Simulation is a major application area.


 Many simulations are based on the notion of a "timestep".
    – At each timestep, advance the constituent parts according to their physics or
      chemistry.
    – Example challenge:
      Molecular dynamics has a picosecond (10^-12 s) timescale,
      but many biological processes unfold on a millisecond (10^-3 s) timescale.
       • Such a simulation needs about 10^9 timesteps,
         and each timestep requires many operations!



Simulation Pseudo-code:

// Initialize state of atoms.


While time < 1 millisecond {
    // Calculate forces on 40,000 atoms.
    // Calculate velocities of all atoms.
    // Advance position of all atoms.
    time = time + 1 picosecond
}


// Write biology result.
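
The pseudo-code above can be made concrete. Below is a minimal, self-contained C sketch of the same loop, assuming a toy harmonic force model and the slide's numbers (40,000 atoms, 1 ps steps, 1 ms of simulated time, i.e. about 10^9 iterations); a real molecular-dynamics code would replace compute_forces() with actual physics and would of course be parallelized.

#include <stdio.h>

#define N_ATOMS 40000
#define DT      1e-12            /* timestep: 1 picosecond (seconds)   */
#define T_END   1e-3             /* simulate 1 millisecond of biology  */

static double pos[N_ATOMS], vel[N_ATOMS], force[N_ATOMS];
static const double mass = 1.0;  /* placeholder mass                   */

static void compute_forces(void)
{
    /* Toy stand-in for the real physics: a harmonic restoring force. */
    for (int i = 0; i < N_ATOMS; i++)
        force[i] = -pos[i];
}

int main(void)
{
    double t = 0.0;
    long steps = 0;

    while (t < T_END) {                       /* ~10^9 iterations     */
        compute_forces();                     /* forces on all atoms  */
        for (int i = 0; i < N_ATOMS; i++) {
            vel[i] += force[i] / mass * DT;   /* update velocities    */
            pos[i] += vel[i] * DT;            /* advance positions    */
        }
        t += DT;
        steps++;
    }
    printf("Finished after %ld timesteps.\n", steps);  /* write result */
    return 0;
}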




Supercomputing is Capability Computing

 A single instance of an application using large tightly-coupled computer resources.
   – For example, a single 1000-year climate simulation.




 Contrast to Capacity Computing:
   – Many instances of one or more applications using large loosely-coupled computer
     resources.
   – For example, 1000 independent 1-year climate simulations.
   – Often trivially parallel, and thus well suited for GRID or SETI@home-style systems.




Supercomputer Versus Your Desktop

 Assume a 2000-processor supercomputer delivers a simulation result in 1 day.


 Assuming memory size is not a problem, your 1-processor desktop would deliver the same
  result in 2000 days ≈ 5.5 years.


 So supercomputers make results available on a human timescale.




But what could you do if all objects were intelligent…




                              …and connected?


What could you do with
unlimited computing power…
for pennies?




Could you predict the path of a storm down to the square kilometer?

Could you identify another 20% of proven oil reserves without drilling one hole?
Grand Challenges


“A grand challenge is a fundamental problem in science or
engineering, with broad applications, whose solution would be
enabled by the application of high performance computing resources
that could become available in the near future.”




     Computational fluid dynamics:
     • Design of hypersonic aircraft, efficient automobile bodies, and extremely quiet submarines.
     • Weather forecasting for short- and long-term effects.
     • Efficient recovery of oil, and many other applications.

     Electronic structure calculations for the design of new materials:
     • Chemical catalysts
     • Immunological agents
     • Superconductors

     Calculations to understand the fundamental nature of matter:
     • Quantum chromodynamics
     • Condensed matter theory



Enough Atoms to See Grains in Solidification of Metal
http://www-phys.llnl.gov/Research/Metals_Alloys/news.html




Building Blocks of Matter




 QPACE = QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)


 Quarks are the constituents of matter; they interact strongly by exchanging gluons.


 Particular phenomena
     – Confinement
     – Asymptotic freedom (Nobel Prize 2004)


 Theory of strong interactions = Quantum Chromodynamics (QCD)




Projected Performance Development




                   Almost a doubling every year !!!


Extrapolating an Exaflop in 2018
Standard technology scaling will not get us there in 2018

Columns: BlueGene/L (2005) → Exaflop, directly scaled → Exaflop compromise using traditional technology. The note under each row is the assumption behind the "compromise guess".

Node peak performance:                    5.6 GF → 20 TF → 20 TF
  Same node count (64k).

Hardware concurrency per node:            2 → 8,000 → 1,600
  Assumes 3.5 GHz.

System power in compute chip:             1 MW → 3.5 GW → 25 MW
  Expected from technology improvement through 4 technology generations (only compute-chip power scaling; I/Os scaled the same way).

Link bandwidth (each unidirectional
3-D link):                                1.4 Gbps → 5 Tbps → 1 Tbps
  Not possible to maintain the bandwidth ratio.

Wires per unidirectional 3-D link:        2 → 400 wires → 80 wires
  A large wire count would eliminate high density and drive links onto cables, where they are 100x more expensive. Assumes 20 Gbps signaling.

Pins in network on node:                  24 pins → 5,000 pins → 1,000 pins
  20 Gbps differential assumed. 20 Gbps over copper will be limited to 12 inches, so optics will be needed for in-rack interconnects. 10 Gbps is possible today in both copper and optics.

Power in network:                         100 kW → 20 MW → 4 MW
  10 mW/Gbps assumed. Today: 25 mW/Gbps for long distances (greater than 2 feet on copper) for both ends, one direction; 45 mW/Gbps for optics (both ends, one direction) plus 15 mW/Gbps electrical. Future electrical power: separately optimized links.

Memory bandwidth per node:                5.6 GB/s → 20 TB/s → 1 TB/s
  Not possible to maintain external bandwidth per FLOP.

L2 cache per node:                        4 MB → 16 GB → 500 MB
  About 6-7 technology generations with expected eDRAM density improvements.

Data pins associated with memory/node:    128 data pins → 40,000 pins → 2,000 pins
  3.2 Gbps per pin.

Power in memory I/O (not DRAM):           12.8 kW → 80 MW → 4 MW
  10 mW/Gbps assumed; most current power is in the address bus. Future: probably about 15 mW/Gbps, maybe down to 10 mW/Gbps (2.5 mW/Gbps is C·V²·f for random data on data pins); address power is higher.
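
As a sanity check on the table, the "Power in network" rows follow directly from the stated assumptions. The sketch below (my own back-of-envelope arithmetic, not from the slide) charges 10 mW/Gbps once per unidirectional 3-D link, with 6 such links per node over 64k nodes, and reproduces the ~20 MW and ~4 MW figures.

#include <stdio.h>

int main(void)
{
    const double nodes       = 65536;   /* same node count as BlueGene/L (64k) */
    const double links       = 6;       /* unidirectional 3-D links per node   */
    const double mw_per_gbps = 10.0;    /* assumed signalling power            */

    const double scaled_gbps     = 5000.0;   /* 5 Tbps, "directly scaled" link */
    const double compromise_gbps = 1000.0;   /* 1 Tbps, "compromise" link      */

    /* total Gbps * mW/Gbps gives milliwatts; divide by 1e9 to get megawatts */
    printf("directly scaled network power: ~%.0f MW\n",
           nodes * links * scaled_gbps * mw_per_gbps / 1e9);
    printf("compromise network power:      ~%.0f MW\n",
           nodes * links * compromise_gbps * mw_per_gbps / 1e9);
    return 0;
}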
The Big Leap from Petaflops to Exaflops

 We will hit 20 petaflops in 2011/2012; research is now beginning for the ~2018 exascale generation.


 IT/CMOS industry is trying to double performance every 2 years.
  HPC industry is trying to double performance every year.


 Technology disruptions in many areas.


     – BAD NEWS: Scalability of current technologies?
        • Silicon Power, Interconnect, Memory, Packaging.

     – GOOD NEWS: Emerging technologies?
        • Memory technologies (e.g. storage class memory), 3D-chips, etc.


 Exploiting exascale machines.
   – Want to maximize science output per €.
   – Need multiple partner applications to evaluate HW trade-offs.



Exascale Challenges – Energy

 Power consumption will increase in the future!


 What is the critical limit?
   – JSC has 5 MW, potential of 10 MW
   – 1 MW costs roughly 1 M€ per year (see the cost sketch below)
   – 20 MW expected to be the critical limit


 Are Exascale systems a Large Scale Facility?
   – LHC uses 100 MW


 Energy efficiency
   – Cooling consumes a significant fraction of total power (PUE > 1.2 today, target → 1.0)
   – Hot cooling water (40°C and more) might help
   – Free cooling: use outside air to cool the water
   – Heat recycling: use waste heat for heating, cooling, etc.
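
A quick sketch of the cost arithmetic behind the bullets above, using the slide's rule of thumb of roughly 1 M€ per MW-year and today's PUE of about 1.2. The 5 MW IT load is only an illustrative assumption (borrowed from the JSC figure), not a measured value.

#include <stdio.h>

int main(void)
{
    const double it_power_mw    = 5.0;   /* assumed IT load (cf. JSC's 5 MW) */
    const double pue            = 1.2;   /* total facility power / IT power  */
    const double meur_per_mw_yr = 1.0;   /* ~1 M EUR per MW and year         */

    double facility_mw  = it_power_mw * pue;
    double cost_meur_yr = facility_mw * meur_per_mw_yr;
    double cooling_meur = (facility_mw - it_power_mw) * meur_per_mw_yr;

    printf("facility draw %.1f MW, ~%.1f MEUR/year (%.1f MEUR for cooling alone)\n",
           facility_mw, cost_meur_yr, cooling_meur);
    return 0;
}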




Exascale Challenges – Resiliency

 Ever increasing number of components
   – O(10000) nodes
   – O(100000) DIMMs of RAM


 Each component's MTBF will not increase
   – Optimistic: Remains constant
   – Realistic: Smaller structures, lower voltages → decrease


 Global MTBF will decrease
   – What is the critical limit? 1 day? 1 hour? It is bounded by the time needed to write a checkpoint (see the sketch below)!


 How to handle failures
   – Try to anticipate failures via monitoring
   – Software must help to handle failures
       • checkpoints, process-migration, transactional computing
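
To see why the time to write a checkpoint sets the critical limit, here is a small sketch: with N components the system MTBF shrinks to roughly MTBF_component / N, and a common first-order rule (Young's approximation) puts the optimal checkpoint interval at sqrt(2 × checkpoint time × system MTBF). The component MTBF and checkpoint time below are illustrative assumptions, not measurements.

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double component_mtbf_h = 1.0e6;  /* assumed 10^6 h per component      */
    const double n_components     = 1.0e5;  /* O(100000) DIMMs, as on the slide  */
    const double checkpoint_h     = 0.25;   /* assumed 15 minutes to write state */

    double system_mtbf_h = component_mtbf_h / n_components;          /* ~10 h   */
    double interval_h    = sqrt(2.0 * checkpoint_h * system_mtbf_h); /* ~2.2 h  */

    printf("system MTBF ~ %.1f h, checkpoint roughly every %.1f h\n",
           system_mtbf_h, interval_h);
    return 0;
}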




Exascale Challenges – Applications

 Ever increasing levels of parallelism
   – Thousands of nodes, hundreds of cores, dozens of registers
   – Automatic parallelization vs. explicit exposure
   – How large are coherency domains?
   – How many languages do we have to learn?


 MPI + X alone is most probably not sufficient (see the hybrid sketch below)
   – One process per core makes orchestration of processes harder
   – GPUs require explicit handling today (CUDA, OpenCL)


 What is the future paradigm?
   – MPI + X + Y? PGAS + X (+Y)?
   – PGAS: UPC, Co-Array Fortran, X10, Chapel, Fortress, …


 Which applications are inherently scalable enough in the first place?
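
For illustration, a minimal "MPI + X" sketch with X = OpenMP: one MPI process per node and threads across the cores. It only prints which rank/thread combination is running; it is not a claim about which programming-model combination an exascale machine will actually require.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* One MPI rank per node; OpenMP threads fill the cores within it. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel
    {
        printf("rank %d of %d, thread %d of %d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}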




Balanced Systems




 Example: caxpy (y ← a·x + y on complex vectors)


 Processor          FPU throughput     Memory bandwidth    Balance
                    [FLOPS / cycle]    [words / cycle]     [FLOPS / word]

 apeNEXT                  8                  2                  4
 QCDOC (MM)               2                  0.63               3.2
 QCDOC (LS)               2                  2                  1
 Xeon                     2                  0.29               7
 GPU                    128 x 2             17.3 (*)           14.8
 Cell/B.E. (MM)          8 x 4               1                 32
 Cell/B.E. (LS)          8 x 4              8 x 4               1
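
For reference, a plain C version of the caxpy kernel used for this comparison. Per element it performs 8 floating-point operations (a complex multiply plus a complex add) against roughly 6 words of memory traffic (load x, load y, store y), so its intrinsic ratio is only about 1.3 FLOPS per word; machines whose FLOPS/word figure in the last column is much higher than that cannot keep their floating-point units busy on this kernel from main memory.

#include <complex.h>
#include <stddef.h>

/* caxpy: y <- a*x + y on complex vectors.
 * Per element: 4 multiplies + 4 additions = 8 flops, versus ~6 words moved. */
void caxpy(size_t n, float complex a, const float complex *x, float complex *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}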


Balanced Systems ???




… but are they Reliable, Available and Serviceable ???




Blue Gene/P




Blue Gene/P packaging hierarchy

 Chip:          4 processors, 8 MB eDRAM; 13.6 GF/s
 Compute card:  1 chip, 20 DRAMs; 13.6 GF/s, 2.0 GB DDR2 (4.0 GB from 6/30/08)
 Node card:     32 chips (4x4x2), 32 compute + 0-1 I/O cards; 435 GF/s, 64 (128) GB
 Rack:          32 node cards; 13.9 TF/s, 2 (4) TB
 System:        72 racks, 72x32x32 (cabled 8x8x16); 1 PF/s, 144 (288) TB
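
The peak-performance numbers in this hierarchy are simple products, as the sketch below shows; the 850 MHz clock and 4 flops per cycle per core mentioned in the comment are the standard Blue Gene/P figures rather than numbers taken from this slide.

#include <stdio.h>

int main(void)
{
    const double chip_gf        = 13.6;  /* 4 cores x 850 MHz x 4 flops/cycle */
    const int    chips_per_card = 32;    /* node card                         */
    const int    cards_per_rack = 32;    /* rack                              */
    const int    racks          = 72;    /* full system                       */

    double node_card_gf = chip_gf * chips_per_card;            /* ~435 GF/s  */
    double rack_tf      = node_card_gf * cards_per_rack / 1e3; /* ~13.9 TF/s */
    double system_pf    = rack_tf * racks / 1e3;               /* ~1 PF/s    */

    printf("node card %.0f GF/s, rack %.1f TF/s, system %.2f PF/s\n",
           node_card_gf, rack_tf, system_pf);
    return 0;
}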
Blue Gene/P Compute ASIC
Block diagram summary: four PPC450 cores, each with 32 KB L1 instruction and data caches, a double FPU, a snoop filter and a private L2, are connected through multiplexing switches to two shared 4 MB eDRAM banks (usable as L3 cache or on-chip memory, 512 b data + 72 b ECC, each with a shared L3 directory with ECC) and a shared SRAM. Also on chip: DMA engine, arbiter, hybrid PMU with 256x64 b SRAM, JTAG access, the torus network (6 bidirectional links at 3.4 Gb/s), the collective network (3 bidirectional links at 6.8 Gb/s), 4 global barriers/interrupts, 10 Gb/s Ethernet, and two DDR-2 controllers with ECC driving a 13.6 GB/s DRAM bus.
Blue Gene/P Compute Card

Photo callouts:
 • BGQ ASIC, 29 mm x 29 mm FC-PBGA
 • 2 x 16B interface to 2 or 4 GB SDRAM-DDR2
 • NVRAM, monitors, decoupling, Vtt termination
 • All network and I/O, power input
Blue Gene/P Node Board

Photo callouts:
 • 32 compute nodes
 • Optional I/O card (one of 2 possible)
 • Local DC-DC regulators (6 required, 8 with redundancy)
 • 10 Gb optical link
Blue Gene Interconnection Networks
Optimized for Parallel Programming and Scalable Management



                                 3D Torus
                                     –   Interconnects all compute nodes (65,536)
                                     –   Virtual cut-through hardware routing
                                     –   1.4 Gb/s on all 12 node links (2.1 GB/s per node; see the sketch below)
                                     –   Communications backbone for computations
                                     –   0.7/1.4 TB/s bisection bandwidth, 67TB/s total bandwidth

                                 Global Collective Network
                                     –   One-to-all broadcast functionality
                                     –   Reduction operations functionality
                                     –   2.8 Gb/s of bandwidth per link; One-way global latency 2.5 µs
                                     –   ~23TB/s total bandwidth (64k machine)
                                     –   Interconnects all compute and I/O nodes (1024)

                                 Low Latency Global Barrier and Interrupt
                                     – Round trip latency 1.3 µs
                                 Control Network
                                     – Boot, monitoring and diagnostics
                                 Ethernet
                                     – Incorporated into every node ASIC
                                     – Active in the I/O nodes (1:64)
                                     – All external comm. (file I/O, control, user interaction, etc.)
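
The per-node torus figure quoted above is just the sum over the 12 links (6 neighbours, send and receive), as the small sketch below spells out.

#include <stdio.h>

int main(void)
{
    const int    links         = 12;    /* 6 neighbours, send + receive      */
    const double gbit_per_link = 1.4;   /* per-link bandwidth from the slide */

    double gbyte_per_node = links * gbit_per_link / 8.0;   /* bits -> bytes  */
    printf("per-node torus bandwidth: %.1f GB/s\n", gbyte_per_node); /* 2.1  */
    return 0;
}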



Source: Kirk Borne, "Data Science Challenges from Distributed Petabyte Astronomical Data Collections: Preparing for the Data Avalanche through Persistence, Parallelization, and Provenance".
Blue Gene Architecture in Review
     Blue Gene is not just FLOPs …




     … it’s also the torus network, power efficiency, and dense packaging.

     A focus on scalability rather than on configurability gives the Blue Gene family’s System-on-a-Chip architecture unprecedented scalability and reliability.
Thought Experiment: A Blue Gene Active Storage Machine
• Integrate significant storage class memory (SCM) at each node
      •   For now, Flash memory, perhaps similar in function to a Fusion-io ioDrive Duo
      •   Future systems may deploy Phase Change Memory (PCM), Memristor, or …?
      •   Assume node density drops by 50% – 512 nodes/rack for embedded apps
      •   Objective: balance Flash bandwidth to network all-to-all throughput

      ioDrive Duo                     One board      512 nodes
      SLC NAND capacity               320 GB         160 TB
      Read BW (64 KB)                 1450 MB/s      725 GB/s
      Write BW (64 KB)                1400 MB/s      700 GB/s
      Read IOPS (4 KB)                270,000        138 Mega
      Write IOPS (4 KB)               257,000        131 Mega
      Mixed R/W IOPS (75/25 @ 4 KB)   207,000        105 Mega

• Resulting system attributes:
      •   Rack: 0.5 petabyte, 512 Blue Gene processors, and an embedded torus network
      •   700 GB/s I/O bandwidth to Flash – competitive with ~70 large disk controllers
               •   Order of magnitude less space and power than equivalent performance via a disk solution
               •   Can configure fewer disk controllers and optimize them for archival use
      •   With network all-to-all throughput at 1 GB/s per node, anticipate:
               •   1 TB sort from/to persistent storage in order 10 seconds
               •   130 million IOPS per rack, 700 GB/s I/O bandwidth
      •   Inherit Blue Gene attributes: scalability, reliability, power efficiency, …

• Research challenges (list not exhaustive):
      •   Packaging – can the integration succeed?
      •   Resilience – storage, network, system management, middleware
      •   Data management – need a clear split between on-line and archival data
      •   Data structures and algorithms can take specific advantage of the BGAS
          architecture – no one cares that it is not x86, since the software is embedded in storage

• Related work:
      •   Gordon (UCSD) http://nvsl.ucsd.edu/papers/Asplos2009Gordon.pdf
      •   FAWN (CMU) http://www.cs.cmu.edu/~fawnproj/papers/fawn-sosp2009.pdf
      •   RamCloud (Stanford) http://www.stanford.edu/~ouster/cgi-bin/papers/ramcloud.pdf
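
The 512-node column of the ioDrive Duo table and the rack-level claims follow from simple scaling of the per-board numbers, as the sketch below shows (per-board figures taken from the slide; the GB/s conversion uses 1 GB = 1024 MB).

#include <stdio.h>

int main(void)
{
    const int    nodes        = 512;       /* nodes (and boards) per rack */
    const double read_mb_s    = 1450.0;    /* one board, 64 KB reads      */
    const double write_mb_s   = 1400.0;    /* one board, 64 KB writes     */
    const double read_iops_4k = 270000.0;  /* one board, 4 KB reads       */

    printf("rack read bandwidth : %.0f GB/s\n", nodes * read_mb_s  / 1024.0);
    printf("rack write bandwidth: %.0f GB/s\n", nodes * write_mb_s / 1024.0);
    printf("rack read IOPS      : %.0f million\n", nodes * read_iops_4k / 1e6);
    return 0;
}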
From individual transistors to the globe
Energy-consumption issues (and thermal issues) propagate through hardware levels




Energy consumption of datacenters today




                                                   Source: APC, Whitepaper #154 (2008)




 Current air-cooled datacenters are extremely inefficient: cooling needs as
  much energy as the IT equipment itself, and both end up as waste heat.
 Provocative: the datacenter is a huge "heater with integrated logic".
 For a 10 MW datacenter, US$ 3-5M is wasted per year.

Hot-water-cooled datacenters – towards zero emission



Figure callouts:
 • Micro-channel liquid coolers
 • Heat exchanger
 • CMOS at 80°C
 • Water at 60°C
 • Direct "waste"-heat usage, e.g. heating
Paradigm change: Moore’s law goes 3D


Figure: from multi-chip design to system-on-chip to 3D integration (global wire-length reduction), approaching the brain's synapse network (after Meindl et al., 2005).

Benefits:
 • High core-cache bandwidth
 • Separation of technologies
 • Reduction in wire length
 • Equivalent to two generations of scaling
 • No impact on software development
Scalable Heat Removal by Interlayer Cooling
Figure callouts: cross-section through fluid port and cavities; microchannel and pin-fin structures; through-silicon-via electrical bonding and water insulation scheme; test vehicle with fluid manifold and connection.

 3D integration requires (scalable) interlayer liquid cooling
 Challenge: isolate the electrical interconnects from the liquid

 A large fraction of the energy in computers is spent on data transport
 Shrinking computers saves energy
On the Cube Road


     Paradigm changes
     – Energy will cost more than servers
     – Coolers are a million-fold larger than transistors

     Moore's Law goes 3D
     – Single-layer scaling slows down
     – Stacking of layers allows an extension of Moore's Law
     – Approaching the functional density of the human brain

     Future computers look different
     – Liquid cooling and heat re-use, e.g. Aquasar
     – Interlayer-cooled 3D chip stacks
     – Smarter energy by bionic designs

     Energy aspects are key
     – Cooling, power delivery, photonics
     – Shrink a rack to a "sugar cube": 50x efficiency
Thank you very much for your attention.

Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

High Performance Computing - Challenges on the Road to Exascale Computing

  • 11. Grand Challenges
    "A grand challenge is a fundamental problem in science or engineering, with broad applications, whose solution would be enabled by the application of high performance computing resources that could become available in the near future."
    – Computational fluid dynamics for the:
      • Design of hypersonic aircraft, efficient automobile bodies, and extremely quiet submarines.
      • Weather forecasting for short and long term effects.
      • Efficient recovery of oil, and for many other applications.
    – Electronic structure calculations for the design of new materials:
      • Chemical catalysts
      • Immunological agents
      • Superconductors
    – Calculations to understand the fundamental nature of matter:
      • Quantum chromodynamics
      • Condensed matter theory
    11 © 2011 IBM Corporation
  • 12. Enough Atoms to See Grains in Solidification of Metal http://www-phys.llnl.gov/Research/Metals_Alloys/news.html 12 © 2011 IBM Corporation
  • 13. Building Blocks of Matter
     QPACE = QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
     Quarks are the constituents of matter; they interact strongly by exchanging gluons.
     Particular phenomena:
      – Confinement
      – Asymptotic freedom (Nobel Prize 2004)
     The theory of the strong interaction is Quantum Chromodynamics (QCD).
    13 © 2011 IBM Corporation
  • 14. Projected Performance Development Almost a doubling every year !!! 14 © 2011 IBM Corporation
  • 15. Extrapolating an Exaflop in 2018
    Standard technology scaling will not get us there in 2018.
    Columns: BlueGene/L (2005) | Exaflop, directly using traditional scaled technology | Exaflop "compromise guess" | Assumption for the compromise guess
     Node peak performance: 5.6 GF | 20 TF | 20 TF — same node count (64k).
     Hardware concurrency/node: 2 | 8,000 | 1,600 — assume 3.5 GHz.
     System power in compute chip: 1 MW | 3.5 GW | 25 MW — expected based on technology improvement through 4 technology generations (only compute-chip power scaling; I/Os also scaled the same way).
     Link bandwidth (each unidirectional 3-D link): 1.4 Gbps | 5 Tbps | 1 Tbps — not possible to maintain the bandwidth ratio.
     Wires per unidirectional 3-D link: 2 | 400 | 80 — a large wire count will eliminate high density and drive links onto cables, where they are 100x more expensive; assume 20 Gbps signaling.
     Pins in network on node: 24 | 5,000 | 1,000 — 20 Gbps differential assumed; 20 Gbps over copper will be limited to 12 inches, so optics will be needed for in-rack interconnects; 10 Gbps now possible in both copper and optics.
     Power in network: 100 kW | 20 MW | 4 MW — 10 mW/Gbps assumed. Now: 25 mW/Gbps for long distance (greater than 2 feet on copper), both ends, one direction; 45 mW/Gbps for optics, both ends, one direction, + 15 mW/Gbps of electrical. Electrical power in future: separately optimized links for power.
     Memory bandwidth/node: 5.6 GB/s | 20 TB/s | 1 TB/s — not possible to maintain external bandwidth per Flop.
     L2 cache/node: 4 MB | 16 GB | 500 MB — about 6-7 technology generations with expected eDRAM density improvements.
     Data pins associated with memory/node: 128 | 40,000 | 2,000 — 3.2 Gbps per pin.
     Power in memory I/O (not DRAM): 12.8 kW | 80 MW | 4 MW — 10 mW/Gbps assumed; most current power is in the address bus. Future probably about 15 mW/Gbps, maybe down to 10 mW/Gbps (2.5 mW/Gbps is C·V²·f for random data on data pins); address power is higher.
    15 © 2011 IBM Corporation
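To make the power rows concrete, here is a small back-of-the-envelope check (an editorial sketch in C, not from the deck); the node count, link bandwidth, pin rates and the 10 mW/Gbps figure are the table's own assumptions.

/* Back-of-the-envelope check of the "compromise" column above.
 * Assumptions taken from the table, not a design spec: 64k nodes,
 * 6 unidirectional torus links of 1 Tbps per node, 2,000 memory data
 * pins at 3.2 Gbps, and 10 mW per Gbps of signalling energy. */
#include <stdio.h>

int main(void)
{
    const double nodes       = 65536.0;   /* 64k nodes         */
    const double mw_per_gbps = 10.0;      /* signalling energy */

    /* Network: 6 torus links per node at 1 Tbps each. */
    double net_gbps_per_node = 6.0 * 1000.0;
    double net_power_mw = nodes * net_gbps_per_node * mw_per_gbps / 1e9; /* mW -> MW */

    /* Memory I/O: 2,000 data pins per node at 3.2 Gbps each. */
    double mem_gbps_per_node = 2000.0 * 3.2;
    double mem_power_mw = nodes * mem_gbps_per_node * mw_per_gbps / 1e9; /* mW -> MW */

    printf("Network power    : %.1f MW (table: 4 MW)\n", net_power_mw);
    printf("Memory I/O power : %.1f MW (table: 4 MW)\n", mem_power_mw);
    return 0;
}

Both values come out close to the 4 MW listed in the compromise column, which is the point of the "compromise guess".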
  • 16. The Big Leap from Petaflops to Exaflops
     We will hit 20 Petaflops in 2011/2012 … and are now beginning research for ~2018 Exascale.
     The IT/CMOS industry is trying to double performance every 2 years; the HPC industry is trying to double performance every year.
     Technology disruptions in many areas.
      – BAD NEWS: Scalability of current technologies? Silicon power, interconnect, memory, packaging.
      – GOOD NEWS: Emerging technologies? Memory technologies (e.g. storage class memory), 3D chips, etc.
     Exploiting exascale machines.
      – Want to maximize science output per €.
      – Need multiple partner applications to evaluate HW trade-offs.
    16 © 2011 IBM Corporation
  • 17. Exascale Challenges – Energy
     Power consumption will increase in the future.
     What is the critical limit?
      – JSC has 5 MW, with potential for 10 MW.
      – 1 MW costs about 1 M€ per year.
      – 20 MW is expected to be the critical limit.
     Are exascale systems a large-scale facility? The LHC uses 100 MW.
     Energy efficiency:
      – Cooling uses a significant fraction of the power (PUE > 1.2 today → 1.0).
      – Hot cooling water (40°C and more) might help.
      – Free cooling: use outside air to cool the water.
      – Heat recycling: use waste heat for heating, cooling, etc.
    17 © 2011 IBM Corporation
  • 18. Exascale Challenges – Resiliency
     Ever increasing number of components:
      – O(10,000) nodes
      – O(100,000) DIMMs of RAM
     Each component's MTBF will not increase.
      – Optimistic: it remains constant.
      – Realistic: smaller structures and lower voltages → it decreases.
     The global MTBF will therefore decrease.
      – What is the critical limit? 1 day? 1 hour? The time to write a checkpoint!
     How to handle failures:
      – Try to anticipate failures via monitoring.
      – Software must help to handle failures: checkpoints, process migration, transactional computing.
    18 © 2011 IBM Corporation
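The resiliency argument can be made concrete with a short sketch (not from the deck): with N independent components the system MTBF is roughly the component MTBF divided by N, and the widely used Young approximation T_opt ≈ sqrt(2·C·MTBF) gives a checkpoint interval. The component MTBF and checkpoint cost below are illustrative assumptions.

/* Minimal resiliency sketch: system MTBF vs. component count, and the
 * Young approximation for the optimal checkpoint interval.
 * The numeric inputs are illustrative assumptions. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double mtbf_component_h = 1.0e6;    /* assumed: 1,000,000 h per component      */
    double components       = 100000.0; /* e.g. O(100,000) DIMMs                   */
    double checkpoint_h     = 0.1;      /* assumed: 6 minutes to write a checkpoint */

    /* With independent, identical failure rates the system MTBF is
     * roughly the component MTBF divided by the number of components. */
    double mtbf_system_h = mtbf_component_h / components;

    double t_opt_h = sqrt(2.0 * checkpoint_h * mtbf_system_h);

    printf("System MTBF                : %.1f h\n", mtbf_system_h);
    printf("Optimal checkpoint interval: %.2f h\n", t_opt_h);
    return 0;
}

With these (assumed) inputs the machine fails about every 10 hours and should checkpoint roughly every 1.4 hours, which illustrates why checkpoint write time becomes the critical limit.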
  • 19. Exascale Challenges – Applications
     Ever increasing levels of parallelism:
      – Thousands of nodes, hundreds of cores, dozens of registers.
      – Automatic parallelization vs. explicit exposure.
      – How large are coherency domains?
      – How many languages do we have to learn?
     MPI + X is most probably not sufficient.
      – 1 process per core makes orchestration of processes harder.
      – GPUs require explicit handling today (CUDA, OpenCL).
     What is the future paradigm?
      – MPI + X + Y? PGAS + X (+ Y)?
      – PGAS: UPC, Co-Array Fortran, X10, Chapel, Fortress, …
     Which applications are inherently scalable enough at all?
    19 © 2011 IBM Corporation
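As a minimal illustration of the "MPI + X" model (not taken from the deck), the following C sketch combines MPI across nodes with OpenMP threads within a node; the loop and its bounds are arbitrary examples.

/* Illustrative only: MPI between processes, OpenMP ("X") within a process. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;
    /* X = OpenMP: threads share the work of one MPI process. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (1.0 + (double)i);

    double global = 0.0;
    /* MPI: combine the per-process partial results across the machine. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global);
    MPI_Finalize();
    return 0;
}

Orchestrating thousands of such processes, each with hundreds of threads, is exactly the difficulty the slide points to.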
  • 20. Balanced Systems
     Example caxpy:
    Processor      | FPU throughput [FLOPS/cycle] | Memory bandwidth [words/cycle] | [FLOPS/word]
    apeNEXT        | 8       | 2        | 4
    QCDOC (MM)     | 2       | 0.63     | 3.2
    QCDOC (LS)     | 2       | 2        | 1
    Xeon           | 2       | 0.29     | 7
    GPU            | 128 x 2 | 17.3 (*) | 14.8
    Cell/B.E. (MM) | 8 x 4   | 1        | 32
    Cell/B.E. (LS) | 8 x 4   | 8 x 4    | 1
    20 © 2011 IBM Corporation
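For reference, a plain C version of the caxpy kernel behind the table (an editorial sketch); the operation and traffic counts in the comments reflect one common way of counting and are not taken from the deck.

/* caxpy: y[i] = a*x[i] + y[i] for complex single-precision vectors.
 * One common counting gives 8 floating-point operations per element
 * (4 multiplies + 4 adds) against 6 real words of memory traffic
 * (load x, load y, store y), i.e. roughly 1.3 FLOPS per word.
 * Machines whose hardware balance (right-hand column above) is far
 * higher than that are memory-bound on this kernel when the operands
 * come from main memory. */
#include <complex.h>
#include <stddef.h>

void caxpy(size_t n, float complex a,
           const float complex *x, float complex *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];   /* ~8 FLOPS, ~6 words moved per element */
}

This is why the "(LS)" rows, where data is streamed from local store, sit close to the kernel's intrinsic ratio, while the "(MM)" rows do not.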
  • 21. Balanced Systems ??? 21 © 2011 IBM Corporation
  • 22. … but are they Reliable, Available and Serviceable ??? 22 © 2011 IBM Corporation
  • 23. Blue Gene/P 23 © 2011 IBM Corporation
  • 24. Blue Gene/P System – packaging hierarchy
     Chip: 4 processors, 13.6 GF/s, 8 MB eDRAM.
     Compute Card: 1 chip + 20 DRAMs, 13.6 GF/s, 2.0 GB DDR2 (4.0 GB as of 6/30/08).
     Node Card: 32 compute cards (32 chips, 4x4x2) and 0-1 I/O cards, 435 GF/s, 64 (128) GB.
     Rack: 32 node cards, 13.9 TF/s, 2 (4) TB.
     System: 72 racks, 72x32x32, cabled 8x8x16, 1 PF/s, 144 (288) TB.
    24 © 2011 IBM Corporation
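A quick arithmetic sketch of the hierarchy above; the 850 MHz clock and 4 FLOPS per cycle per core (double FPU, two fused multiply-adds) are the usual Blue Gene/P figures and should be read as assumptions here.

/* Peak performance at each packaging level of the hierarchy above. */
#include <stdio.h>

int main(void)
{
    double ghz           = 0.85;                 /* assumed clock frequency   */
    double flops_per_clk = 4.0;                  /* assumed: 2 FMAs per cycle */
    double chip      = 4 * ghz * flops_per_clk;  /* GF/s per 4-core chip      */
    double card      = chip;                     /* 1 chip per compute card   */
    double node_card = 32 * card;                /* 32 compute cards          */
    double rack      = 32 * node_card;           /* 32 node cards             */
    double system    = 72 * rack;                /* 72 racks                  */

    printf("Chip      : %7.1f GF/s\n", chip);         /*  13.6 GF/s */
    printf("Node card : %7.1f GF/s\n", node_card);    /* 435.2 GF/s */
    printf("Rack      : %7.1f TF/s\n", rack / 1e3);   /*  13.9 TF/s */
    printf("System    : %7.2f PF/s\n", system / 1e6); /*  1.00 PF/s */
    return 0;
}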
  • 25. Blue Gene/P Compute ASIC (block diagram)
     4 PPC450 cores, each with 32k I1 / 32k D1 caches, a snoop filter, a double FPU and an L2.
     A multiplexing switch connects the cores to 2 x 4 MB shared L3 (eDRAM; 512b data, 72b ECC; usable as L3 cache or on-chip memory) and a shared on-chip SRAM.
     2 DDR-2 controllers with ECC (13.6 GB/s DDR-2 DRAM bus), DMA, arbiter, hybrid PMU with 256x64b SRAM, JTAG access.
     Network interfaces: torus (6 links at 3.4 Gb/s, bidirectional), collective (3 links at 6.8 Gb/s, bidirectional), global barrier (4 global barriers or interrupts), 10 Gbit Ethernet.
    25 © 2011 IBM Corporation
  • 26. Blue Gene/P Compute Card (callouts)
     BG/P ASIC, 29mm x 29mm FC-PBGA.
     2 x 16B interface to 2 or 4 GB SDRAM-DDR2.
     NVRAM, monitors, decoupling, Vtt termination.
     All network and I/O, power input.
    26 © 2011 IBM Corporation
  • 27. Blue Gene/P Node Board (callouts)
     32 compute nodes.
     Optional I/O card (one of 2 possible).
     Local DC-DC regulators (6 required, 8 with redundancy).
     10Gb optical link.
    27 © 2011 IBM Corporation
  • 28. Blue Gene Interconnection Networks – Optimized for Parallel Programming and Scalable Management
    3D Torus
      – Interconnects all compute nodes (65,536).
      – Virtual cut-through hardware routing.
      – 1.4 Gb/s on all 12 node links (2.1 GB/s per node).
      – Communications backbone for computations.
      – 0.7/1.4 TB/s bisection bandwidth, 67 TB/s total bandwidth.
    Global Collective Network
      – One-to-all broadcast functionality.
      – Reduction operations functionality.
      – 2.8 Gb/s of bandwidth per link; one-way global latency 2.5 µs.
      – ~23 TB/s total bandwidth (64k machine).
      – Interconnects all compute and I/O nodes (1024).
    Low Latency Global Barrier and Interrupt
      – Round trip latency 1.3 µs.
    Control Network
      – Boot, monitoring and diagnostics.
    Ethernet
      – Incorporated into every node ASIC.
      – Active in the I/O nodes (1:64).
      – All external communication (file I/O, control, user interaction, etc.).
    28 © 2011 IBM Corporation
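To illustrate the 3D-torus topology above (an editorial sketch, not Blue Gene's actual routing code), the following C fragment maps (x, y, z) coordinates to a rank and finds the six nearest neighbours with periodic wrap-around, which is what gives every node its 12 unidirectional links. The dimensions are an assumed example.

/* 3D-torus addressing sketch with periodic wrap-around. */
#include <stdio.h>

#define DX 32
#define DY 32
#define DZ 64   /* assumed example dimensions, 64k nodes total */

static int rank_of(int x, int y, int z)
{
    /* wrap coordinates back into the torus */
    x = (x + DX) % DX;
    y = (y + DY) % DY;
    z = (z + DZ) % DZ;
    return (z * DY + y) * DX + x;
}

int main(void)
{
    int x = 0, y = 0, z = 0;   /* a corner node, so the wrap-around is visible */
    printf("+x:%d -x:%d +y:%d -y:%d +z:%d -z:%d\n",
           rank_of(x + 1, y, z), rank_of(x - 1, y, z),
           rank_of(x, y + 1, z), rank_of(x, y - 1, z),
           rank_of(x, y, z + 1), rank_of(x, y, z - 1));
    return 0;
}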
  • 29. Source: Kirk Borne, "Data Science Challenges from Distributed Petabyte Astronomical Data Collections: Preparing for the Data Avalanche through Persistence, Parallelization, and Provenance"
    29 © 2011 IBM Corporation
  • 30. Blue Gene Architecture in Review
    Blue Gene is not just FLOPs … it is also the torus network, power efficiency, and dense packaging.
    The focus on scalability rather than configurability gives the Blue Gene family's System-on-a-Chip architecture unprecedented scalability and reliability.
    30 Blue Gene Active Storage, HEC FSIO 2010 © 2011 IBM Corporation
  • 31. Thought Experiment: A Blue Gene Active Storage Machine
    • Integrate significant storage class memory (SCM) at each node.
      • For now, Flash memory, maybe similar in function to a Fusion-io ioDrive Duo.
      • Future systems may deploy Phase Change Memory (PCM), Memristor, or …?
    • Assume node density drops 50% – 512 nodes/rack for embedded apps.
    • Objective: balance Flash bandwidth to network all-to-all throughput.
    • ioDrive Duo (SLC NAND), one board vs. 512 nodes:
      – Capacity: 320 GB | 160 TB
      – Read BW (64K): 1450 MB/s | 725 GB/s
      – Write BW (64K): 1400 MB/s | 700 GB/s
      – Read IOPS (4K): 270,000 | 138 Mega
      – Write IOPS (4K): 257,000 | 131 Mega
      – Mixed R/W IOPS (75/25 @ 4K): 207,000 | 105 Mega
    • Resulting system attributes:
      • Rack: 0.5 petabyte, 512 Blue Gene processors, and embedded torus network.
      • 700 GB/s I/O bandwidth to Flash – competitive with ~70 large disk controllers.
      • Order of magnitude less space and power than an equivalent-performance disk solution.
      • Can configure fewer disk controllers and optimize them for archival use.
      • With network all-to-all throughput at 1 GB/s per node, anticipate:
        • 1 TB sort from/to persistent storage in order 10 secs.
        • 130 million IOPS per rack, 700 GB/s I/O bandwidth.
      • Inherits Blue Gene attributes: scalability, reliability, power efficiency.
    • Research challenges (list not exhaustive):
      • Packaging – can the integration succeed?
      • Resilience – storage, network, system management, middleware.
      • Data management – need a clear split between on-line and archival data.
      • Data structures and algorithms can take specific advantage of the BGAS architecture – no one cares it's not x86, since the software is embedded in the storage.
    • Related work:
      • Gordon (UCSD) http://nvsl.ucsd.edu/papers/Asplos2009Gordon.pdf
      • FAWN (CMU) http://www.cs.cmu.edu/~fawnproj/papers/fawn-sosp2009.pdf
      • RamCloud (Stanford) http://www.stanford.edu/~ouster/cgi-bin/papers/ramcloud.pdf
    31 Blue Gene Active Storage, HEC FSIO 2010 © 2011 IBM Corporation
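The rack-level numbers follow from simple aggregation. The sketch below redoes the arithmetic under the stated assumptions (512 nodes/rack, the single-board ioDrive Duo figures, 1 GB/s all-to-all per node); the sort estimate is only an order-of-magnitude pass count and is not taken from the deck.

/* Rough arithmetic behind the rack-level claims above. */
#include <stdio.h>

int main(void)
{
    const double nodes      = 512.0;
    const double read_mbs   = 1450.0, write_mbs = 1400.0;  /* MB/s per board */
    const double write_iops = 257000.0;                    /* 4K writes      */
    const double net_gbs    = 1.0;                         /* GB/s per node  */

    double rack_read_gbs  = nodes * read_mbs  / 1000.0;    /* ~725 GB/s  */
    double rack_write_gbs = nodes * write_mbs / 1000.0;    /* ~700 GB/s  */
    double rack_miops     = nodes * write_iops / 1e6;      /* ~131 MIOPS */

    double tb = 1000.0;                                    /* 1 TB in GB */
    double sort_s = tb / rack_read_gbs                     /* read       */
                  + tb / (nodes * net_gbs)                 /* all-to-all */
                  + tb / rack_write_gbs;                   /* write back */

    printf("Rack read/write BW : %.0f / %.0f GB/s\n", rack_read_gbs, rack_write_gbs);
    printf("Rack write IOPS    : %.0f million\n", rack_miops);
    printf("1 TB sort, 1 pass  : ~%.1f s (order 10 s with a second pass and overheads)\n", sort_s);
    return 0;
}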
  • 32. From individual transistors to the globe Energy-consumption issues (and thermal issues) propagate through hardware levels 32 © 2011 IBM Corporation
  • 33. Energy Consumption of Datacenters Today (Source: APC, Whitepaper #154, 2008)
     Current air-cooled datacenters are extremely inefficient: cooling needs as much energy as the IT equipment, and both end up as waste heat.
     Provocative: a datacenter is a huge "heater with integrated logic".
     For a 10 MW datacenter, US$ 3-5M is wasted per year.
    33 © 2011 IBM Corporation
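One possible reading of the US$ 3-5M figure is sketched below, assuming a 10 MW IT load with an equal cooling load (PUE of about 2) and illustrative electricity prices; these inputs are assumptions, not data from the whitepaper.

/* Back-of-the-envelope check of the wasted-cost figure above. */
#include <stdio.h>

int main(void)
{
    double it_load_mw = 10.0;                  /* assumed: 10 MW of IT load     */
    double pue        = 2.0;                   /* cooling roughly equals IT     */
    double wasted_mw  = it_load_mw * (pue - 1.0);
    double hours_year = 24.0 * 365.0;
    double wasted_mwh = wasted_mw * hours_year;

    double price_low = 35.0, price_high = 60.0; /* assumed US$/MWh */
    printf("Wasted energy : %.0f MWh/year\n", wasted_mwh);
    printf("Wasted cost   : US$ %.1fM - %.1fM per year\n",
           wasted_mwh * price_low  / 1e6,
           wasted_mwh * price_high / 1e6);
    return 0;
}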
  • 34. Hot-Water-Cooled Datacenters – Towards Zero Emission (figure callouts)
     Micro-channel liquid coolers.
     Heat exchanger.
     CMOS at 80°C, water at 60°C.
     Direct "waste"-heat usage, e.g. heating.
    34 © 2011 IBM Corporation
  • 35. Paradigm Change: Moore's Law Goes 3D
    (Figure: multi-chip design → system on chip → 3D integration; brain as a synapse network; Meindl et al., 2005)
    3D integration benefits:
      High core-cache bandwidth.
      Separation of technologies.
      Reduction in wire length (global wire-length reduction).
      Equivalent to two generations of scaling.
      No impact on software development.
    35 © 2011 IBM Corporation
  • 36. Scalable Heat Removal by Interlayer Cooling
    (Figures: cross-section through fluid port and cavities; microchannel and pin-fin structures; through-silicon via, electrical bonding and water insulation scheme; test vehicle with fluid manifold and connection)
     3D integration requires (scalable) interlayer liquid cooling.
     Challenge: isolate the electrical interconnects from the liquid.
     A large fraction of the energy in computers is spent on data transport, so shrinking computers saves energy.
    36 © 2011 IBM Corporation
  • 37. On the Cube Road
    Paradigm changes
      - Energy will cost more than servers.
      - Coolers are million-fold larger than transistors.
    Moore's law goes 3D
      - Single-layer scaling slows down.
      - Stacking of layers allows extension of Moore's law.
      - Approaching the functional density of the human brain.
    Future computers look different
      - Liquid cooling and heat re-use, e.g. Aquasar.
      - Interlayer-cooled 3D chip stacks.
      - Smarter energy by bionic designs.
    Energy aspects are key
      - Cooling – power delivery – photonics.
      - Shrink a rack to a "sugar cube": 50x efficiency.
    37 © 2011 IBM Corporation
  • 38. Thank you very much for your attention. 38 © 2011 IBM Corporation