What is supercomputing? An explanation for the business world
1. What is supercomputing? An explanation for the business world. José M. Cela, Director of the CASE Department, BSC-CNS, [email_address]
3. Technology trend: microprocessor capability. 2X transistors per chip every 1.5 years, known as "Moore's Law". Microprocessors are getting smaller, denser and more powerful, and other devices are also improving their performance. Gordon Moore (co-founder of Intel) predicted in 1965 that transistor density per unit area would double every 18 months.
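As a rough illustration of what that doubling rate implies (a minimal sketch, not from the original slides; the starting transistor count and time span are arbitrary assumptions), the following C snippet projects a chip's transistor count over a few technology generations, assuming a clean doubling every 1.5 years:

    #include <math.h>
    #include <stdio.h>

    /* Moore's Law as stated on the slide: density doubles every 1.5 years.
     * The starting count (1e8 transistors) is purely illustrative. */
    int main(void) {
        const double doubling_period_years = 1.5;
        const double start_transistors = 1.0e8;

        for (int year = 0; year <= 15; year += 3) {
            double factor = pow(2.0, year / doubling_period_years);
            printf("year %2d: x%8.0f -> %.2e transistors\n",
                   year, factor, start_transistors * factor);
        }
        return 0;
    }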
13. Increasing CPU performance: a delicate balance. Until recently we saw both the number of transistors and the clock frequency grow. Dissipating the power has become the main problem: an Intel processor draws more than 100 watts. The clock frequency cannot be raised any further; the number of transistors, however, will keep growing, so the extra transistors go into more cores per chip. [Diagram: lowering the voltage while increasing clock rate and transistor density, and the evolution from a single core with its cache to chips with four and more cores (C1-C4) sharing a cache.]
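Since clock rates have stalled, extra performance has to come from keeping all the cores busy. The fragment below is a minimal sketch of that idea (not taken from the slides), using OpenMP to spread a simple summation over the available cores; it assumes a compiler with OpenMP support (e.g. gcc -fopenmp):

    #include <omp.h>
    #include <stdio.h>

    /* Minimal illustration: the same loop, split across all available cores.
     * N and the workload are arbitrary; the point is that the chip's peak
     * performance is only reachable when every core is doing useful work. */
    #define N 100000000

    int main(void) {
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < N; i++) {
            sum += 1.0 / (double)(i + 1);   /* harmonic series, just to do work */
        }

        printf("max OpenMP threads: %d, sum = %f\n", omp_get_max_threads(), sum);
        return 0;
    }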
16. Blue Gene/P. Blue Gene/P continues Blue Gene's leadership performance in a space-saving, power-efficient package for the most demanding and scalable high-performance computing applications. Building blocks (from the slide's scaling diagram):
• Chip: 4 processors, 13.6 GF/s, 8 MB EDRAM
• Compute card: 1 chip, 20 DRAMs, 13.6 GF/s, 2.0 (or 4.0) GB DDR, supports 4-way SMP
• Node card (32 chips, 4x4x2): 32 compute cards, 0-1 I/O cards, 435 GF/s, 64 GB
• Rack: 32 node cards, 1024 chips, 4096 processors, 14 TF/s, 2 TB
• System: 72 racks, cabled 8x8x16; final system 1 PF/s and 144 TB (November 2007: 0.596 PF/s)
• Front-end node / service node: JS21 / Power5, Linux SLES 10
• HPC software: compilers, GPFS, ESSL, LoadLeveler
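As a quick sanity check of how the per-chip figure rolls up to the rack and system numbers (a sketch, not part of the slide), one can simply multiply through the hierarchy:

    #include <stdio.h>

    /* Aggregate Blue Gene/P peak performance from the slide's figures:
     * 13.6 GF/s per chip, 1024 chips per rack, 72 racks in the full system. */
    int main(void) {
        const double gflops_per_chip = 13.6;
        const int chips_per_rack = 1024;
        const int racks = 72;

        double rack_tflops = gflops_per_chip * chips_per_rack / 1000.0;
        double system_pflops = rack_tflops * racks / 1000.0;

        printf("per rack:    %.1f TF/s\n", rack_tflops);    /* ~13.9 TF/s (slide rounds to 14) */
        printf("full system: %.2f PF/s\n", system_pflops);  /* ~1.0 PF/s */
        return 0;
    }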
17. Cell Broadband Engine Architecture™ (CBEA) technology: competitive roadmap, 2006-2010.
• Cell BE (1 + 8: one PPE plus eight SPEs), 90 nm SOI
• Cell BE (1 + 8), 65 nm SOI: cost reduction
• Advanced Cell BE (1 + 8 eDP SPEs), 65 nm SOI: performance enhancements / scaling
• Next generation (2 PPE' + 32 SPE'), 45 nm SOI, ~1 TFlop (est.); shown as a concept design
All future dates and specifications are estimates only and subject to change without notice; dashed outlines in the original figure indicate concept designs.
19. Green500 (energy-efficiency ranking; MFLOPS/W, with total power and TOP500 rank):
• Green500 #9 (TOP500 #57): Blue Gene/P Solution, RZG/Max-Planck-Gesellschaft MPI/IPP, 371.67 MFLOPS/W, 126 kW
• Green500 #9 (TOP500 #56): Blue Gene/P Solution, IBM - Rochester, 371.67 MFLOPS/W, 126 kW
• Green500 #8 (TOP500 #75): Blue Gene/P Solution, ASTRON/University Groningen, 371.67 MFLOPS/W, 94.5 kW
• Green500 #7 (TOP500 #1): BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire InfiniBand, DOE/NNSA/LANL, 444.94 MFLOPS/W, 2483.47 kW
• Green500 #5 (TOP500 #42): BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, InfiniBand, IBM Poughkeepsie Benchmarking Center, 458.33 MFLOPS/W, 138 kW
• Green500 #5 (TOP500 #41): BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, InfiniBand, DOE/NNSA/LANL, 458.33 MFLOPS/W, 138 kW
• Green500 #2 (TOP500 #431): BladeCenter QS22 Cluster, PowerXCell 8i 3.2 GHz, InfiniBand, Repsol YPF, 530.33 MFLOPS/W, 26.38 kW
• Green500 #2 (TOP500 #430): BladeCenter QS22 Cluster, PowerXCell 8i 3.2 GHz, InfiniBand, Repsol YPF, 530.33 MFLOPS/W, 26.38 kW
• Green500 #2 (TOP500 #429): BladeCenter QS22 Cluster, PowerXCell 8i 3.2 GHz, InfiniBand, Repsol YPF, 530.33 MFLOPS/W, 26.38 kW
• Green500 #1 (TOP500 #220): BladeCenter QS22 Cluster, PowerXCell 8i 4.0 GHz, InfiniBand, Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, 536.24 MFLOPS/W, 34.63 kW
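The metric in that list is simply sustained performance divided by power. As a small worked example (a sketch using only values from the table above; the implied Linpack figures are derived, not stated on the slide), inverting the ratio recovers the performance behind two of the entries:

    #include <stdio.h>

    /* Green500 efficiency: MFLOPS/W = Rmax / power.
     * Inverting it gives the implied Rmax for an entry in the list above. */
    static double implied_rmax_tflops(double mflops_per_watt, double power_kw) {
        double mflops = mflops_per_watt * power_kw * 1000.0;  /* kW -> W */
        return mflops / 1.0e6;                                /* MFLOPS -> TFLOPS */
    }

    int main(void) {
        /* Efficiency and power values taken from the Green500 entries above. */
        printf("DOE/NNSA/LANL (444.94 MFLOPS/W, 2483.47 kW): ~%.0f TFLOPS\n",
               implied_rmax_tflops(444.94, 2483.47));
        printf("Univ. of Warsaw (536.24 MFLOPS/W, 34.63 kW): ~%.1f TFLOPS\n",
               implied_rmax_tflops(536.24, 34.63));
        return 0;
    }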
24. Blades, blade centers and racks
JS21 processor blade:
• 2x2 PPC 970MP, 2.3 GHz
• 8 GB memory
• 36 GB SAS hard disk
• 2x1 Gb Ethernet on board
• Myrinet daughter card
Blade center (7U chassis):
• 14 blades per chassis
• 56 processors
• 112 GB memory
• Gigabit Ethernet switch
Rack (42U):
• 6 chassis per rack
• 336 processors
• 672 GB memory
25. Myrinet interconnect. [Diagram: Clos network built from ten Clos 256x256 switches and two Spine 1280 switches; each Clos 256x256 switch has 256 links (one to each node, numbered 0 to 255) at 250 MB/s in each direction, with 128 links shown towards the spines.]
35. Why supercomputing? IDC asked 33 US companies from the aerospace, automotive, oil, electronics, pharmaceutical, financial, logistics and entertainment sectors where their business would be without access to HPC. [Chart: answers split among "Could not exist as a business", "Unable to compete, product testing and quality issues", "Unable to compete, time to market and cost issues" and "Could still exist and compete", with shares of 3%, 47%, 34% and 16%.] Source: Fortune Magazine. "The country that out-computes will be the one that out-competes", Council on Competitiveness, http://compete.org
Editor's notes
Tomasulo comment on the complexity of out-of-order (OoO) execution.
Access latency for main memory, even using a modern SDRAM with a CAS latency of 2, will typically be around 9 cycles of the memory system clock, the sum of:
• the latency between the FSB and the chipset (Northbridge) (+/- 1 clock cycle)
• the latency between the chipset and the DRAM (+/- 1 clock cycle)
• the RAS-to-CAS latency (2-3 clocks, charging the right row)
• the CAS latency (2-3 clocks, getting the right column)
• 1 cycle to transfer the data
• the latency to get the data back from the DRAM output buffer to the CPU (via the chipset) (+/- 2 clock cycles)
Assuming a typical 133 MHz SDRAM memory system (e.g. either PC133 or DDR266/PC2100) and a 1.3 GHz processor, this makes 9*10 = 90 cycles of the CPU clock to access main memory! Yikes, you say! And it gets worse: a 1.6 GHz processor would take it to 108 cycles, a 2.0 GHz processor to 135 cycles, and even if the memory system were increased to 166 MHz (and still stayed CL2), a 3.0 GHz processor would wait a staggering 162 cycles!
Caches make the memory system seem almost as fast as the L1 cache, yet as large as main memory. A modern primary (L1) cache has a latency of just two or three processor cycles, which is dozens of times faster than accessing main memory, and modern primary caches achieve hit rates of around 90% for most applications. So 90% of the time, accessing memory only takes a couple of cycles.
Good overview: http://www.pattosoft.com.au/Articles/ModernMicroprocessors/
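The cycle counts in that note are just the ~9 memory-clock cycles scaled by the CPU-to-memory frequency ratio. A small sketch of the same arithmetic (illustrative only, using the figures quoted above):

    #include <stdio.h>

    /* Memory access cost in CPU cycles, as in the note above:
     * ~9 cycles of the memory clock, scaled by the CPU/memory frequency ratio. */
    static double mem_latency_cpu_cycles(double cpu_ghz, double mem_mhz,
                                         double mem_clock_cycles) {
        return mem_clock_cycles * (cpu_ghz * 1000.0) / mem_mhz;
    }

    int main(void) {
        printf("1.3 GHz CPU, 133 MHz SDRAM: ~%.0f cycles\n",
               mem_latency_cpu_cycles(1.3, 133.0, 9.0));  /* ~88; the note rounds the ratio to 10 and gets 90 */
        printf("2.0 GHz CPU, 133 MHz SDRAM: ~%.0f cycles\n",
               mem_latency_cpu_cycles(2.0, 133.0, 9.0));  /* ~135 */
        printf("3.0 GHz CPU, 166 MHz SDRAM: ~%.0f cycles\n",
               mem_latency_cpu_cycles(3.0, 166.0, 9.0));  /* ~163; the note says 162 */
        return 0;
    }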
The 72x32x32 configuration may become 48x48x32: twice as many cards, and the FRU is half as big. Still 1024 chips per rack.
This is what we could have according to the June list, but it still has to be confirmed; the full Linpack run is still pending. Congratulations to the teams that made it possible: traces, Linpack, systems.