QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
Heiko Joerg Schick
1.
QPACE: QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
Heiko J Schick, IBM Deutschland R&D GmbH
November 2010
© 2009 IBM Corporation
2.
Agenda
Chapter 1: Overview
Chapter 2: Application optimized supercomputers
Chapter 3: QPACE
Chapter 4: Review and Summary
Chapter 5: Unforgettable Impressions ;-)
3.
Chapter 1: Overview
Building Blocks of Matter
QPACE = QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
Quarks are the constituents of matter; they interact strongly by exchanging gluons.
Particular phenomena:
- Confinement
- Asymptotic freedom (Nobel Prize 2004)
Theory of strong interactions = Quantum Chromodynamics (QCD)
4.
Chapter 1: Overview
Computing Resource Requests
The lattice QCD community aims for O(1-3) PFlop/s sustained beyond 2010.
Europe
- "The computational requirements voiced by these European groups sum up to more than 1 sustained Petaflop/s by 2009." [HPC in Europe Taskforce (HET), 2006]
US (USQCD)
- Hope for O(1) PFlop/s sustained in 2010-11. "A goal with very substantial scientific rewards." [USQCD SciDAC-2 proposal, 2006]
Similar requests from Japan.
5.
Chapter 2: Application optimized supercomputers
Performance Critical Kernels
Overall performance of lattice QCD simulations is dominated by a few kernels:
- Linear algebra
  • Single processor operations
  • Typically memory bandwidth limited
- Global reductions
  • Typically limited by network latency
  • d-dimensional torus network: [formula not preserved in the transcript]
- Sparse matrix-vector multiplication
6.
Chapter 2: Application optimized supercomputers
Relevant Performance Signatures
Arithmetic operations
- Floating-point arithmetic with complex operands
- Dominant operation: a × b + c
Memory operations
- High data re-use
- Access pattern:
  • Random, small blocks (optimize for cache)
  • 3 streams, large blocks (vector-like architectures)
Flow control
- Simple / predictable
7.
Chapter 2: Application optimized supercomputers
Parallelization
Parallelization strategy
- Spatial domain decomposition partitions the simulation domain into small 3d sub-domains; one sub-domain is assigned to each processor.
Nearest neighbour communication
- 3-4 dimensional torus
Homogeneous communication patterns
Large bandwidth
Access pattern
- Medium size messages = O(10) kBytes (large local problem size)
- Small messages = O(0.1) kBytes (small local problem size)
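The nearest-neighbour pattern on a torus can be sketched in a few lines. This is only an illustration: the function names `torus_neighbours` and `local_extent` and the 4 x 4 x 4 processor grid are assumptions made for the example, not part of the QPACE software.

```python
def torus_neighbours(coord, dims):
    """Return the 2*d nearest neighbours of `coord` on a periodic d-dimensional grid."""
    neighbours = []
    for axis, size in enumerate(dims):
        for step in (+1, -1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wrap around: torus topology
            neighbours.append(tuple(n))
    return neighbours


def local_extent(global_lattice, dims):
    """Split each lattice direction evenly across the processor grid."""
    if any(g % p != 0 for g, p in zip(global_lattice, dims)):
        raise ValueError("lattice extent must be divisible by the processor grid")
    return tuple(g // p for g, p in zip(global_lattice, dims))


if __name__ == "__main__":
    grid = (4, 4, 4)  # hypothetical processor grid
    print(torus_neighbours((0, 0, 3), grid))
    print(local_extent((32, 32, 32), grid))
```

Every processor talks to the same fixed set of neighbours, which is why the communication pattern is homogeneous and a torus network fits it well.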
8.
Chapter 2: Application optimized supercomputers
Performance Signature: caxpy
Multiply a vector X by a scalar, add to a vector Y, and store in the vector Y.
Task: $y_i \leftarrow a\,x_i + y_i$, where $a$ is a complex scalar and $x_i$, $y_i$ are complex 3x4 matrices
Operation per $i$: 12 complex multiply-adds = 96 FLOPS
Information transfer between storage (M) and register file (RF) (front-end to processing device):
- Load: 48 8-byte words
- Store: 24 8-byte words
Balance: 96 / (48 + 24) = 1.3 FLOPS / word
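The caxpy numbers follow from counting operations per complex element. The sketch below reproduces them under the usual convention that a complex multiply costs 6 real FLOPS and a complex add costs 2; the function and constant names are chosen for this example.

```python
COMPLEX_MUL_FLOPS = 6   # 4 real multiplies + 2 real adds
COMPLEX_ADD_FLOPS = 2   # 2 real adds
WORDS_PER_COMPLEX = 2   # real and imaginary part, one 8-byte word each


def caxpy_signature(elements=12):
    """FLOPS, loaded words, stored words and balance for one 3x4 complex matrix."""
    flops = elements * (COMPLEX_MUL_FLOPS + COMPLEX_ADD_FLOPS)
    load_words = 2 * elements * WORDS_PER_COMPLEX   # read both input matrices
    store_words = elements * WORDS_PER_COMPLEX      # write the result matrix
    balance = flops / (load_words + store_words)    # FLOPS per transferred word
    return flops, load_words, store_words, balance


if __name__ == "__main__":
    print(caxpy_signature())
```

With 12 complex elements this gives 96 FLOPS, 48 words loaded, 24 words stored, and a balance of 96 / 72, or about 1.3 FLOPS per word.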
9.
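Chapter 2: Application optimized supercomputers
Sustained Performance
Bandwidth/throughput of a device: $\beta_i$
Time needed to execute task $i$: $T_i \approx I_i / \beta_i + \lambda_i$, where
- $I_i$: amount of processed data
- $\lambda_i$: latency
Efficiency is $\epsilon = T_{\mathrm{ideal}} / T_{\mathrm{real}}$
- "Ideal" execution time $T_{\mathrm{ideal}}$
- "Real" execution time $T_{\mathrm{real}}$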
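A minimal sketch of this kind of performance model follows. It assumes that each task takes its data volume divided by the device bandwidth plus a latency term, and that all tasks overlap perfectly so the slowest one sets the real execution time. Both the overlap assumption and the function names are introduced here for illustration.

```python
def task_time(data, bandwidth, latency=0.0):
    """Time for one task: data volume over device throughput, plus latency."""
    return data / bandwidth + latency


def efficiency(ideal_time, task_times):
    """Ratio of ideal to real execution time.

    Assumption: tasks overlap perfectly, so the real time is the slowest task.
    """
    real_time = max(task_times)
    return ideal_time / real_time


if __name__ == "__main__":
    # Hypothetical device: 8 FLOPS/cycle and 2 words/cycle, running a kernel
    # with 96 FLOPS and 72 transferred words.
    t_fpu = task_time(96, 8)   # cycles spent in the floating-point unit
    t_mem = task_time(72, 2)   # cycles spent moving data
    print(efficiency(t_fpu, [t_fpu, t_mem]))
```

In this example the memory transfer takes three times as long as the arithmetic, so the floating-point unit can be at most one third busy.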
10.
Chapter 2: Application optimized supercomputers
Relevant Hardware Characteristics
Floating point unit throughput:
- Caveat: processor instruction set matching
  • No support for complex arithmetic (e.g. Cell/B.E.)
  • Additional shuffle operations needed
Memory bandwidth:
- Multi-level memory hierarchy
  • External memory
  • Cache
  • Register file
11.
Chapter 2: Application optimized supercomputers
Balanced Hardware
Example caxpy:

Processor        FPU throughput    Memory bandwidth   Balance
                 [FLOPS / cycle]   [words / cycle]    [FLOPS / word]
apeNEXT          8                 2                  4
QCDOC (MM)       2                 0.63               3.2
QCDOC (LS)       2                 2                  1
Xeon             2                 0.29               7
GPU              128 x 2           17.3 (*)           14.8
Cell/B.E. (MM)   8 x 4             1                  32
Cell/B.E. (LS)   8 x 4             8 x 4              2
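One way to read the table is to compare each hardware balance with the caxpy balance of 96 / 72 FLOPS per word. If the hardware needs more FLOPS per word than the kernel offers, memory bandwidth caps the achievable floating-point efficiency. The sketch below applies that bandwidth-bound estimate to the FLOPS / word column as printed. The function name and the simple `min` bound are assumptions of this example.

```python
CAXPY_BALANCE = 96 / 72  # FLOPS per transferred word for caxpy

# FLOPS / word column, copied from the table above
HARDWARE_BALANCE = {
    "apeNEXT": 4,
    "QCDOC (MM)": 3.2,
    "QCDOC (LS)": 1,
    "Xeon": 7,
    "GPU": 14.8,
    "Cell/B.E. (MM)": 32,
    "Cell/B.E. (LS)": 2,
}


def fpu_efficiency_bound(kernel_balance, hardware_balance):
    """Upper bound on FPU utilisation when data movement is the only other limit."""
    return min(1.0, kernel_balance / hardware_balance)


if __name__ == "__main__":
    for name, balance in HARDWARE_BALANCE.items():
        bound = fpu_efficiency_bound(CAXPY_BALANCE, balance)
        print(f"{name:15s} {100 * bound:5.1f} %")
```

Under this estimate caxpy from main memory (MM) on Cell/B.E. stays below 5 % of peak, which is why keeping data in the local store (LS) matters.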
12.
Chapter 2: Application optimized supercomputers
Cell/B.E. Architecture
13.
Chapter 2: Application optimized supercomputers
Balanced Systems ?!?
14.
Chapter 2: Application optimized supercomputers
… but are they Reliable, Available and Serviceable ?!?
15.
Chapter 3: QPACE
Collaboration and Credits
QPACE = QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
Academic Partners
- University Regensburg: S. Heybrock, D. Hierl, T. Maurer, N. Meyer, A. Nobile, A. Schaefer, S. Solbrig, T. Streuer, T. Wettig
- University Wuppertal: Z. Fodor, A. Frommer, M. Huesken
- University Ferrara: M. Pivanti, F. Schifano, R. Tripiccione
- University Milano: H. Simma
- DESY Zeuthen: D. Pleiter, K.-H. Sulanke, F. Winter
- Research Lab Juelich: M. Drochner, N. Eicker, T. Lippert
Industrial Partner
- IBM (DE, US, FR): H. Baier, H. Boettiger, A. Castellane, J.-F. Fauh, U. Fischer, G. Goldrian, C. Gomez, T. Huth, B. Krill, J. Lauritsen, J. McFadden, I. Ouda, M. Ries, H.J. Schick, J.-S. Vogt
Main Funding
- DFG (SFB TR55), IBM
Support by Others
- Eurotech (IT), Knuerr (DE), Xilinx (US)
16.
Project Timetable
- 01/08: Official project start
- 06/08: Node card bring-up
- 10/08: Fully populated backplane
- 01/09: Hardware integration tests
- 02-03/09: Release to manufacturing
- 05/09: Integration of 1st rack
- 07/09: Deployment of 2 racks at JSC
- 08/09: Deployment of 4 racks at JSC and 4 racks at University Wuppertal complete
17.
Production Chain
Major steps
- Pre-integration at University Regensburg
- Integration at IBM / Boeblingen
- Installation at FZ Juelich and University Wuppertal
18.
Chapter 3: QPACE
Concept
System
- Node card with IBM® PowerXCell™ 8i processor and network processor (NWP)
  • Important feature: fast double precision arithmetic
- Commodity processor interconnected by a custom network
- Custom system design
- Liquid cooling system
Rack parameters
- 256 node cards
  • 26 TFLOPS peak (double precision)
  • 1 TB memory
- O(35) kWatt power consumption
Applications
- Target sustained performance of 20-30%
- Optimized for calculations in theoretical particle physics: simulation of Quantum Chromodynamics
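The rack parameters can be cross-checked with simple arithmetic. The sketch below assumes 8 SPEs per PowerXCell 8i, each completing 4 double-precision FLOPS per cycle at 3.2 GHz, and 4 GB of memory per node card. These per-node figures come from other slides in this deck, not from this one, and the function names are chosen for the example.

```python
NODES_PER_RACK = 256


def node_peak_gflops(spes=8, flops_per_cycle=4, clock_ghz=3.2):
    """Double-precision peak of one node card (SPEs only; the PPE is ignored)."""
    return spes * flops_per_cycle * clock_ghz


def rack_summary(power_kw=35.0, memory_per_node_gb=4):
    """Peak, memory, sustained target range and peak-per-watt for one rack."""
    peak_tflops = NODES_PER_RACK * node_peak_gflops() / 1000.0
    return {
        "peak_tflops": peak_tflops,
        "memory_tb": NODES_PER_RACK * memory_per_node_gb / 1024.0,
        "sustained_tflops": (0.20 * peak_tflops, 0.30 * peak_tflops),
        "peak_gflops_per_watt": peak_tflops * 1000.0 / (power_kw * 1000.0),
    }


if __name__ == "__main__":
    for key, value in rack_summary().items():
        print(key, value)
```

This reproduces roughly 26 TFLOPS and 1 TB per rack. A 20-30% sustained target then corresponds to about 5 to 8 TFLOPS in double precision.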
19.
Chapter 3: QPACE
Networks
Torus network
- Nearest-neighbor communication, 3-dimensional torus topology
- Aggregate bandwidth 6 GByte/s per node and direction
- Remote DMA communication (local store to local store)
Interrupt tree network
- Evaluation of global conditions and synchronization
- Global exceptions
- 2 signals per direction
Ethernet network
- 1 Gigabit Ethernet link per node card to rack-level switches (switched network)
- I/O to parallel file system (user input / output)
- Linux network boot
- Aim of O(10) GB bandwidth per rack
20.
Chapter 3: QPACE
Rack
[Figure: rack components]
- Root Card (16 per rack)
- Backplane (8 per rack)
- Node Card (256 per rack)
- Power Supply and Power Adapter Card (24 per rack)
21.
Chapter 3: QPACE
Node Card
Components
- IBM PowerXCell 8i processor, 3.2 GHz
- 4 Gigabyte DDR2 memory, 800 MHz, with ECC
- Network processor (NWP): Xilinx LX110T FPGA
- Ethernet PHY
- 6 x 1 GB/s external links using PCI Express physical layer
- Service Processor (SP): Freescale 52211
- FLASH (firmware and FPGA configuration)
- Power subsystem
- Clocking
Network Processor
- FLEXIO interface to PowerXCell 8i processor, 2 bytes with 3 GHz bit rate
- Gigabit Ethernet
- UART FW Linux console
- UART SP communication
- SPI Master (boot flash)
- SPI Slave for training and configuration
- GPIO
22.
Chapter 3: QPACE
Node Card
[Figure: node card, with labels: Network Processor (FPGA), Network PHYs, PowerXCell 8i Processor, Memory]
23.
Chapter 3: QPACE
Node Card
[Figure: node card block diagram. Recoverable labels:]
- 4 x DDR2 memory, 800 MHz
- PowerXCell 8i, with two FLEXIO connections to the FPGA labelled 6 GB/s each
- FPGA Virtex-5 (LX110T): 4*8*2*6 = 384 IO at 250 MHz, 680 available
- SP Freescale MCF52211 (UART, I2C, SPI, RS232)
- GigE PHY, SPI Flash, Power Subsystem, Clocking, Debug (RW)
- 6 x 1 GB/s PHY to the compute network
24.
Chapter 3: QPACE
Network Processor
[Figure: FPGA block diagram. Recoverable labels: Link PHY interfaces (x+, x-, ... z-), FlexIO interface, network logic (routing, arbitration, FIFOs), Ethernet PHY interface, configuration, global signals, serial interfaces, SPI Flash]

FPGA resource utilization
- Slices: 92 %
- PINs: 86 %
- LUT-FF pairs: 73 %
- Flip-Flops: 55 %
- LUTs: 53 %
- BRAM / FIFOs: 35 %

Block                 Flip-Flops   LUTs
Processor Interface   53 %         46 %
Torus                 36 %         39 %
Ethernet              4 %          2 %
25.
Chapter 3: QPACE
Network Processor
[Figure: network processor block diagram. Recoverable labels: FlexIO, RocketIO, IOC (IOIF), GBIF slave and master, receive requests, send requests, switch / address decoder / FIFOs / bus controller, 6 x 1 GB/s]
IBM:
  • RocketIO Logic
  • IOC Logic
  • GBIF Logic
Academic Partners:
  • Network Processor Logic
26.
Chapter 3: QPACE
Processor Bus Interface
FlexIO Interface
- High bandwidth interface between IBM PowerXCell 8i processor and Xilinx Virtex-5 FPGA
- Implementation from Rambus Inc
- Optimized for intra-board environments
- Uses RocketIO GTP transceiver features
- Requires link training after power-on
  • Phase calibration (aligns the data for optimal sampling point)
  • Parallel calibration (synchronizes the receive deserializer with the transmit serializer)
  • Levelization calibration (aligns all data lanes)
Challenges
- Speed, Latency, Bandwidth and Timing (Clock)
- 3 Gbyte/sec communication channel
- 2 Byte link wide
27.
Chapter 3: QPACE
Torus Network Physical Layer
Physical layer
- 10GbE @ 2.5 GHz → 1 GByte/s
[Figure: eye diagram for bad case link]
- 3.125 GHz
- 40 cm PCB, 50 cm cable
- 1 PCB-PCB, 2 PCB-cable connectors
Custom data link layer
- Fixed size messages
- 128 Byte payload + 4 Byte header + 4 Byte CRC → Minimal protocol overhead
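The link figures can be checked with back-of-the-envelope arithmetic. The packet overhead follows directly from the 128 + 4 + 4 byte format. The raw rate calculation assumes a XAUI-style 10GbE PHY with 4 lanes and 8b/10b line coding, which is an assumption of this sketch and not stated on the slide.

```python
def raw_link_rate_gbyte_s(lanes=4, lane_gbaud=2.5, coding_efficiency=8 / 10):
    """Usable link rate in GByte/s (assumed: 4 lanes, 8b/10b coding)."""
    gbit_per_s = lanes * lane_gbaud * coding_efficiency
    return gbit_per_s / 8.0


def payload_fraction(payload=128, header=4, crc=4):
    """Share of each fixed-size packet that carries user data."""
    return payload / (payload + header + crc)


if __name__ == "__main__":
    rate = raw_link_rate_gbyte_s()
    frac = payload_fraction()
    print(f"link rate        : {rate:.2f} GByte/s")
    print(f"payload fraction : {100 * frac:.1f} %")
    print(f"payload rate     : {rate * frac:.3f} GByte/s")
```

With this packet format about 94 % of the transmitted bytes are payload, which is the sense in which the protocol overhead is minimal.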
28.
Torus Network Architecture
2-sided communication
- Node A initiates send, node B initiates receive
- Send and receive commands have to match
- Multiple use of same link by virtual channels
Send / receive from / to local store or main memory
- CPU → NWP
  • CPU moves data and control info to NWP
  • Back-pressure controlled
- NWP → NWP
  • Independent of processor
  • Each datagram has to be acknowledged
- NWP → CPU
  • CPU provides credits to NWP
  • NWP writes data into processor
  • Completion indicated by notification
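The receive side of this scheme can be illustrated with a toy model of credit-based delivery. It is a sketch only: the class and method names are invented here, and details such as acknowledging a datagram before a credit is available are assumptions, not a description of the actual NWP logic.

```python
from collections import deque


class ReceiveEndpoint:
    """Toy model of the NWP -> CPU step: data is delivered only against credits."""

    def __init__(self):
        self.credits = 0
        self.pending = deque()   # datagrams held by the NWP, waiting for a credit
        self.delivered = []      # datagrams written into the processor
        self.notifications = 0   # completion notifications seen by the CPU

    def provide_credit(self, count=1):
        """The CPU posts a receive: it grants the NWP permission to write data."""
        self.credits += count
        self._drain()

    def datagram_arrived(self, datagram):
        """A datagram arrives from the neighbouring NWP and is acknowledged."""
        self.pending.append(datagram)
        self._drain()
        return "ACK"

    def _drain(self):
        # Deliver buffered datagrams for as long as credits are available.
        while self.credits > 0 and self.pending:
            self.delivered.append(self.pending.popleft())
            self.credits -= 1
            self.notifications += 1


if __name__ == "__main__":
    rx = ReceiveEndpoint()
    rx.datagram_arrived(b"packet-0")   # no credit yet: stays in the NWP
    rx.provide_credit()                # matching receive posted: data is delivered
    print(rx.delivered, rx.notifications)
```

The point of the credits is that the NWP never writes into processor memory unless the CPU has posted a matching receive.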
29.
Chapter 3: QPACE
Torus Network Reconfiguration
Torus network PHYs provide 2 interfaces
- Used for network reconfiguration by selecting primary or secondary interface
Example
- 1x8 or 2x4 node-cards
Partition sizes (1,2,2N) * (1,2,4,8,16) * (1,2,4,8)
- N ... number of racks connected via cables
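The partition-size rule can be enumerated directly. The sketch below lists every shape allowed by (1,2,2N) * (1,2,4,8,16) * (1,2,4,8) for a given number of racks N. The function name is an assumption for this example.

```python
from itertools import product


def partition_shapes(racks):
    """All torus partition shapes (x, y, z) for `racks` cabled racks."""
    x_sizes = sorted({1, 2, 2 * racks})  # 2N collapses to 2 when N = 1
    y_sizes = (1, 2, 4, 8, 16)
    z_sizes = (1, 2, 4, 8)
    return list(product(x_sizes, y_sizes, z_sizes))


if __name__ == "__main__":
    shapes = partition_shapes(racks=4)
    largest = max(shapes, key=lambda s: s[0] * s[1] * s[2])
    print(len(shapes), "shapes, largest:", largest)
```

For a single rack the largest partition is 2 x 16 x 8 = 256 node cards, which matches the 256 node cards per rack.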
30.
Chapter 3: QPACE
Cooling
Concept
- Node card mounted in housing = heat conductor
- Housing connected to liquid cooled cold plate
- Critical thermal interfaces
  • Processor to thermal box
  • Thermal box to cold plate
- Dry connection between node card and cooling circuit
Node card housing
- Closed node card housing acts as heat conductor.
- Heat conductor is linked with liquid-cooled "cold plate".
- Cold plate is placed between two rows of node cards.
Simulation results for one cold plate
- Ambient: 12°C
- Water: 10 L / min
- Load: 4224 Watt (2112 Watt / side)
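The simulated load and flow rate imply a coolant temperature rise that can be estimated from the heat balance ΔT = P / (ṁ · c_p). The sketch below assumes that all 4224 W end up in the water and uses textbook values for the density and specific heat of water; these values and the function name are assumptions, not taken from the slides.

```python
def coolant_temperature_rise(power_w, flow_l_per_min,
                             density_kg_per_l=1.0,
                             heat_capacity_j_per_kg_k=4186.0):
    """Steady-state temperature rise of the coolant across one cold plate, in kelvin."""
    mass_flow_kg_per_s = flow_l_per_min * density_kg_per_l / 60.0
    return power_w / (mass_flow_kg_per_s * heat_capacity_j_per_kg_k)


if __name__ == "__main__":
    delta_t = coolant_temperature_rise(power_w=4224.0, flow_l_per_min=10.0)
    print(f"water temperature rise: {delta_t:.2f} K")
```

Under these assumptions the water warms by roughly 6 K across one cold plate.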
31.
Chapter 3: QPACE
Power Efficiency
32.
Chapter 4: Review and Summary
Project Review
Hardware design
- Almost all critical problems solved in time
- Network Processor implementation still a challenge
- No serious problems due to wrong design decisions
Hardware status
- Manufacturing quality good: small bone pile, few defects during operation
Time schedule
- Essentially stayed within planned schedule
- Implementation of system / application software delayed
33.
Chapter 4: Review and Summary
Summary
QPACE is a new, scalable LQCD machine based on the PowerXCell 8i processor.
Design highlights
- FPGA directly attached to processor
- LQCD optimized, low latency torus network
- Novel, cost-efficient liquid cooling system
- High packaging density
- Very power efficient architecture
O(20-30%) sustained performance for key LQCD kernels is reached / feasible
→ O(10-16) TFLOPS / rack (SP)
34.
Chapter 5: Unforgettable Impressions ;-)
46.
Thank you very much for your attention.
47.
Disclaimer
IBM®, DB2®, MVS/ESA, AIX®, S/390®, AS/400®, OS/390®, OS/400®, iSeries, pSeries, xSeries, zSeries, z/OS, AFP, Intelligent Miner, WebSphere®, Netfinity®, Tivoli®, Informix and Informix® Dynamic Server™, IBM, BladeCenter and POWER and others are trademarks of the IBM Corporation in US and/or other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Linux is a trademark of Linus Torvalds in the United States, other countries or both.
Other company, product, or service names may be trademarks or service marks of others.
The information and materials are provided on an "as is" basis and are subject to change.