Systems and Technology Group
© 2005 IBM Corporation
Cell today and tomorrow
H. Peter Hofstee, Ph. D.
Cell Chief Scientist and
Chief Architect, Cell Synergistic Processor
IBM Systems and Technology Group
SCEI/Sony Toshiba IBM (STI) Design Center
Austin, Texas
Acknowledgements
ƒ Cell Broadband Engine (“Cell”) is the result of a deep
partnership between SCEI/Sony, Toshiba, and IBM
ƒ Cell represents the work of more than 400 people
starting in 2001 and a design investment of about
$400M
Agenda
ƒ Basics
– Performance: Power wall, Memory/Latency wall
– Multicore and specialization
ƒ Cell
– Asynchronous load/store (DMA)
– Microarchitecture decisions
ƒ Cell Performance
– Things that work really well
– Things that will likely work well
– Question marks
ƒ Cell Systems
ƒ Future of Cell and things for Academia to look at
BASICS
Computing Paradigm Shift
Today:
– Single thread performance hitting limits
• Architecture and process technology saturated
• Small percentage gains expected to remain
But:
– Signs of paradigm shift to application-specific
system customization
• Large multiple gains for specific applications
• Cell
–~50x on TRE (Terrain Rendering Engine), ~100x on FFT
• Datapower
–XML acceleration
• Many examples in embedded markets
Future:
– Greater performance demands
• Immersive Interaction
–3D, real-time, gaming inspired applications
–Rich media, data-intensive content
• Sensory Computing
–New network tier
–Autonomous agents performing intelligent analysis on streaming data
>A&D: battlefield coordination
[Figure: Single Thread Performance (SPECint). Historical trend: 45% CGR. Single thread performance growth rate slows dramatically.]
Solutions
ƒ Memory wall:
– More, slower threads
– Asynchronous loads
ƒ Efficiency wall:
– More, slower threads
– Specialized function
ƒ Power wall:
– Reduce transistor power
• operating voltage
• limit oxide thickness scaling
• limit channel length
– Reduce switching per function
INCREASE
CONCURRENCY:
Multi-Core
INCREASE
SPECIALIZATION:
Non-Homogeneous
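One way to see why lower operating voltage combined with more, slower threads attacks the power wall is the standard first-order CMOS switching-power relation, P ≈ a·C·V²·f. The sketch below is illustrative only: the effective capacitance, the activity factor, and both operating points are hypothetical values, not figures from the slides, and the comparison assumes a perfectly parallel workload.

```python
def dynamic_power(c_eff, voltage, freq_hz, activity=1.0):
    """First-order CMOS switching power in watts: activity * C * V^2 * f."""
    return activity * c_eff * voltage ** 2 * freq_hz


# Hypothetical effective switched capacitance per core (farads).
C_EFF = 1e-9

# One fast core at a higher voltage (hypothetical operating point).
one_fast = dynamic_power(C_EFF, voltage=1.2, freq_hz=4.0e9)

# Two cores at half the frequency and a lower voltage (hypothetical),
# giving the same aggregate clock rate if the work parallelizes perfectly.
two_slow = 2 * dynamic_power(C_EFF, voltage=0.9, freq_hz=2.0e9)

print(f"one fast core : {one_fast:.2f} W")
print(f"two slow cores: {two_slow:.2f} W")
print(f"power ratio   : {two_slow / one_fast:.2f}")
```

Because voltage enters squared, the two slower cores in this model use a little over half the switching power of the single fast core for the same aggregate cycle count.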
CELL
Motivation: Cell Goals
ƒ Outstanding performance, especially on
game/multimedia applications.
– Challenges: Power Wall, Frequency Wall, Memory Wall
ƒ Real time responsiveness to the user and the
network.
– Challenges: Real-time in an SMP environment, Security
ƒ Applicable to a wide range of platforms.
– Challenge: Maintain programmability while increasing performance
ƒ Support an introduction in 2005/6.
– Challenge: Structure innovation such that 5yr. schedule can be met
Cell Concept
ƒ Compatibility with 64b Power Architecture™
– Builds on and leverages IBM investment and community
ƒ Increased efficiency and performance
– Non-Homogeneous Coherent Chip Multiprocessor
• Allows an attack on the “Frequency Wall”
– Streaming DMA architecture attacks “Memory Wall”
– High design frequency, low operating voltage attacks “Power Wall”
– Highly optimized implementation
ƒ Interface between user and networked world
– Flexibility and security
– Multi-OS support, including RTOS/non-RTOS
– Architectural extensions for real-time management
Cell Architecture is … 64b Power Architecture™
[Block diagram: multiple Power ISA cores, each with MMU/BIU, attached to a COHERENT BUS together with IO transl. and Memory.]
Incl. coherence/memory compatible with 32/64b Power Arch. Applications and OS’s
Cell Architecture is … 64b Power Architecture™
Plus Memory Flow Control (MFC)
[Block diagram: COHERENT BUS (+RAG) with Power ISA +RMT cores (MMU/BIU +RMT), IO transl., and Memory, plus multiple MFC blocks, each with MMU/DMA +RMT and Local Store Memory (LS Alias).]
Cell Architecture is … 64b Power Architecture™+ MFC
Plus Synergistic Processors
[Block diagram: COHERENT BUS (+RAG) with Power ISA +RMT cores (MMU/BIU +RMT), IO transl., and Memory, plus multiple blocks each consisting of MMU/DMA +RMT, Local Store Memory (LS Alias), and a Syn. Proc. ISA core.]
Asynchronous Load/Store (DMA)
ƒ THE major architectural decision in Cell
– Motivated by memory wall
– Enabled by a large market
ƒ Fundamental change to programmers.
– Transition from demand-fetch to software controlled
prefetch
– Bill Dally’s “plumbing project analogy”
– “Bucket brigade” analogy
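The move from demand-fetch to software-controlled prefetch can be illustrated with a very simple timing model. This is a sketch under stated assumptions, not a model of the actual hardware: it assumes one block in flight at a time, a fixed fetch latency per block, and cycle counts that are purely hypothetical.

```python
def demand_fetch_time(n_blocks, latency, compute):
    """Demand fetch: every block stalls for the full latency, then computes."""
    return n_blocks * (latency + compute)


def prefetch_time(n_blocks, latency, compute):
    """Software prefetch: the fetch of block i+1 is issued before computing
    block i, so only the first fetch is fully exposed. Each later step costs
    whichever is longer, the fetch or the compute."""
    if n_blocks == 0:
        return 0
    return latency + (n_blocks - 1) * max(latency, compute) + compute


# Hypothetical numbers: 100 blocks, 500-cycle fetch, 600-cycle compute.
blocks, fetch_cycles, compute_cycles = 100, 500, 600
print(demand_fetch_time(blocks, fetch_cycles, compute_cycles))  # 110000
print(prefetch_time(blocks, fetch_cycles, compute_cycles))      # 60500
```

In this model the prefetching version approaches pure compute time whenever compute per block is at least as long as the fetch, which is the point of the "bucket brigade" picture.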
SPE BLOCK DIAGRAM
[Block diagram components: Instruction Issue Unit / Instruction Line Buffer; Permute Unit; Load-Store Unit; Floating-Point Unit; Fixed-Point Unit; Branch Unit; Channel Unit; Result Forwarding and Staging; Register File; Local Store (256kB), Single Port SRAM, 128B Read / 128B Write; DMA Unit; On-Chip Coherent Bus. Data paths are labeled 8 Byte/Cycle, 16 Byte/Cycle, 64 Byte/Cycle, and 128 Byte/Cycle.]
Other (Micro)Architectural Decisions
ƒ Large shared register file
ƒ Local store size tradeoffs
ƒ Dual issue, In order
ƒ Software branch prediction
ƒ Channels
Microarchitecture decisions, more so than architecture decisions
show bias towards compute-intensive codes
[Figure: Hardware Performance Measurement (85°C). First pass hardware measurement in the lab, nominal voltage = 1V. Fmax, Frequency [GHz] (axis 3 to 4.5) versus Supply Voltage (axis 0.9 to 1.2).]
ƒ 250M transistors … 235 mm²
ƒ Top frequency >4GHz
– Lab conditions
– Most efficient at ~1V
ƒ > 200 GFlops (SP) @3.2GHz
ƒ > 20 GFlops (DP) @3.2GHz
ƒ Up to 25.6 GB/s memory B/W
ƒ Up to 70+ GB/s I/O B/W
– Practical ~ 50GB/s
ƒ 100+ simultaneous bus
transactions
– 16+8 entry DMA queue per SPE
CELL PROCESSOR STATISTICS
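The headline numbers above can be cross-checked with simple arithmetic. The clock, memory bandwidth, and DMA queue depth come from the slide; the SPE count, SIMD width, and multiply-add issue rate are not stated on this slide and are entered here as assumptions about the first-generation chip.

```python
SPES = 8                    # assumption: SPE count, not stated on the slide
SIMD_LANES = 4              # assumption: four 32-bit floats per SIMD operation
FLOPS_PER_LANE = 2          # assumption: one fused multiply-add per lane per cycle
CLOCK_GHZ = 3.2             # from the slide
MEM_BW_GBS = 25.6           # from the slide
DMA_QUEUE_PER_SPE = 16 + 8  # from the slide

# Peak single-precision rate from the SPEs alone.
peak_sp_gflops = SPES * SIMD_LANES * FLOPS_PER_LANE * CLOCK_GHZ

# Compute-to-memory balance: flops that must be done per byte fetched
# from memory to stay compute-bound at peak.
flops_per_byte = peak_sp_gflops / MEM_BW_GBS

# Total DMA queue entries across all SPEs.
dma_entries = SPES * DMA_QUEUE_PER_SPE

print(f"peak SP      : {peak_sp_gflops:.1f} GFlops")
print(f"flops / byte : {flops_per_byte:.1f}")
print(f"DMA entries  : {dma_entries}")
```

Under these assumptions the SPEs alone account for the "> 200 GFlops (SP)" figure, and the queue depth is consistent with "100+ simultaneous bus transactions". The ratio of roughly 8 flops per byte of memory traffic suggests why data reuse in the local store matters.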
CELL PERFORMANCE
(AND PROGRAMMING)
Things that work extremely well today (up to 100x)
ƒ Problem can be re-coded
ƒ Predictable non-trivial memory access pattern
– Can build scatter-gather lists
ƒ Problem can benefit from SIMD
ƒ Focus on 32b float, or <=32b integer
ƒ Examples:
– FFTw (best result about 100 GFlops)
– Terrain Rendering Engine
– Volume rendering
ƒ Typical code is double-buffered gather-compute-scatter
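The double-buffered gather-compute-scatter pattern can be sketched as follows. This is a structural illustration only: Python has no DMA, so `gather` and `scatter` are synchronous stand-ins for asynchronous DMA commands driven by a scatter-gather list, and all function names are hypothetical. The sketch also assumes the index lists of different blocks do not overlap.

```python
def gather(memory, indices):
    """Stand-in for a DMA 'get' driven by a scatter-gather list."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Stand-in for a DMA 'put' driven by the same list."""
    for i, v in zip(indices, values):
        memory[i] = v


def process(memory, index_lists, kernel):
    """Double-buffered loop: while one buffer is being computed on,
    the next block is fetched into the other buffer."""
    if not index_lists:
        return
    buffers = [None, None]
    buffers[0] = gather(memory, index_lists[0])  # prime the pipeline
    for step, indices in enumerate(index_lists):
        cur = step % 2
        nxt = 1 - cur
        if step + 1 < len(index_lists):
            # On real hardware this fetch would be issued asynchronously
            # and overlap with the compute below.
            buffers[nxt] = gather(memory, index_lists[step + 1])
        result = [kernel(x) for x in buffers[cur]]
        scatter(memory, indices, result)


memory = list(range(8))
index_lists = [[0, 2], [1, 3], [4, 6], [5, 7]]  # non-overlapping blocks
process(memory, index_lists, lambda x: x * 10)
print(memory)  # [0, 10, 20, 30, 40, 50, 60, 70]
```

Two buffers are the minimum needed to keep one transfer in flight while computing. More buffers would allow deeper prefetch at the cost of local store space.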
Things that work well today (about 10-20x)
ƒ Compute bound codes
ƒ Small enough to be rewritten
ƒ Main datatype is 32b float or <= 32b Int
ƒ Benefits from SIMD
ƒ Examples:
– Crypto codes (RSA, SHA, DES, etc.)
– Media codes (MPEG 2, MPEG 4, H.264, JPEG)
– … many others …
Things likely to work well
ƒ Library .. Device/API based applications
– Graphics and physics and sound and …
ƒ Scientific codes … library based
– No rewrite
– If granularity is ok
Question marks
ƒ Can a compiler-based approach, without restructuring code
specifically for the SPEs, result in a chip-level advantage?
– About 3-4x more SPEs in same area or power
– But the compiler has to manage the local store
ƒ Interesting benchmarks: SpecFP, MediaBench, EEMBC, etc.
– New more explicitly parallel benchmarks?
ƒ Would you ever use an SPE for a SpecInt-type workload?
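One possible form of compiler-managed local store is a software cache that the compiler or runtime places in front of ordinary loads. The slides do not describe a specific mechanism, so the sketch below is an assumption for illustration: a read-only, direct-mapped cache with hypothetical line and cache sizes, where a miss stands in for a DMA transfer into the local store.

```python
class SoftwareCache:
    """Read-only, direct-mapped cache of main-memory lines in a local store."""

    def __init__(self, memory, n_lines=4, line_size=4):
        self.memory = memory
        self.n_lines = n_lines
        self.line_size = line_size
        self.tags = [None] * n_lines   # which memory line each slot holds
        self.lines = [None] * n_lines  # the cached data for each slot
        self.misses = 0

    def load(self, addr):
        line_no = addr // self.line_size
        slot = line_no % self.n_lines
        if self.tags[slot] != line_no:
            # Miss: copy the whole line in (a DMA 'get' on real hardware).
            base = line_no * self.line_size
            self.lines[slot] = self.memory[base:base + self.line_size]
            self.tags[slot] = line_no
            self.misses += 1
        return self.lines[slot][addr % self.line_size]


memory = list(range(100, 164))            # 64 words of "main memory"
cache = SoftwareCache(memory)
values = [cache.load(a) for a in range(8)]
print(values)        # [100, 101, 102, 103, 104, 105, 106, 107]
print(cache.misses)  # 2
```

Every load pays for a tag check in software, which is part of why it is an open question whether this approach yields a chip-level advantage without restructuring the code.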
Cell based systems
Cell Processor Isn't Just for Games.
Innovative Chip is best high-performance embedded processor of 2005
We chose the Cell BE as the best high-performance embedded processor of 2005 because of its
innovative design and future potential....Even if the Cell BE accumulates no more design wins, the
PlayStation 3 could drive sales to nearly 100 million units over the likely five-year lifespan of the
console. That would make the Cell BE one of the most successful microprocessors in history.
“…Cell could power hundreds of new apps, create a new video-processing industry and fuel a multibillion-dollar build out of tech hardware over ten years.”
-- Forbes
“It was originally conceived as the microprocessor to power Sony's [PS3], but it is expected to find a home in lots of other broadband-connected consumer items and in servers too.”
-- IEEE Spectrum
Cell BE based Systems: SCEI, Mercury, … and IBM!
ƒ Toshiba Announces Cell Chip Set and Cell Reference Set
20 September, 2005
ƒ Tokyo--Toshiba Corporation today took major steps toward creating a comprehensive development environment for applications based on the Cell
microprocessor with the announcement of a Cell Chip Set consisting of the new microprocessor and key peripheral chips, and a Cell Reference Set
development platform. The chip set and the reference set will support development of digital consumer products and communication equipment that
draw on the powerful broadband capabilities of the Cell microprocessor.
ƒ "Software developers and other customers will be eager to make full use of Cell's unsurpassed multitasking and real-time processing functions," said
Tomotaka Saito, General Manager of Broadband System LSI Division, Toshiba Corporation Semiconductor Company. "The Cell Chip Set and Reference
Set will support them in developing products and applications that reach new levels of performance and excitement."
ƒ The Cell Chip Set is composed of the Cell processor, a Super Companion Chip—the interface between Cell and external audio/visual input/output
equipment—and a power supply system chip optimized to drive the Cell microprocessor.
ƒ The Cell Reference Set development platform consists of a Cell microprocessor, peripheral chips mounted on a printed circuit board with a general-use
interface, peripheral equipment, such as DVD and HDD drives, and cooling equipment required for stable operation, all housed in a case. The available
software includes operating systems and middleware and software development tools. This combination of hardware and software reduces
development costs, cuts turnaround time and simplifies testing.
ƒ Toshiba expects to start marketing the chip set and reference set in April 2006 or later, once it has assured supply of the component chips and all
related documentation.
ƒ Toshiba Corporation will showcase the Cell Chip Set and Cell Reference Set, and demonstrate digital media applications on the Cell Reference Set at
the Toshiba booth of CEATEC JAPAN 2005, from October 4 to October 8 at Makuhari Messe.
ƒ Outlines of Cell Chip Set and Cell Reference Set:
ƒ Cell Chip Set:
ƒ Cell microprocessor: Next generation microprocessor jointly developed by IBM, Sony Group and Toshiba. Adopts a multi-core architecture and offers
super high-speed data transfer capability. The processor is expected to find application in equipment handling data-rich media applications.
ƒ Super Companion Chip: Cell's peripheral LSI, which houses audio and image interfaces supporting Cell's super high-speed data transfer capability.
The chip also supports a group of interfaces for various systems (video, audio input/output, digital AV interface, IEEE1394, digital tuner interface) and a
group of interfaces that make it easier to connect standard input/output devices (standard bus interface, high speed network interface and storage
device interface.)
ƒ Highly efficient power supply system: The supply system is optimized to drive the Cell processor. Includes controller LSI, TB6814FLG, which makes it
possible to offer high-speed response and high-accuracy required by Cell. Includes multi-chip module, TB7003FL, which embeds power device in a
small 8mm x 8mm package. Realizes small, high-power and high-efficient power supply system which has 4 phases of 1MHz high-speed switches.
ƒ Cell Reference Set:
ƒ Development platform for Cell-based, next generation digital consumer products,
ƒ High-speed multi-bit wiring technology and wide variety of interfaces that supports broadband system architecture
ƒ Linux and ITRON are both provided on the hypervisor OS that manages hardware resources. This approach facilitates the reuse of application property.
ƒ A comprehensive development environment including the Eclipse framework based editor, compiler, debugger, and performance monitor.
ƒ An audio-visual application model includes simultaneous multiple digital and analog broadcast television reception, recording and playback.
SOURCE: TOSHIBA
Future of Cell and
Things for Academia to look at
User Interaction Drives Innovation in Computing
[Chart: Level of Interaction versus Time. Labels: Punch Cards; Green Screen/Teletype; Spreadsheet; WWW; Gaming; Main Frame Batch; Main Frame Multitasking; Mini-Computer; Client/Server; Internet; Stand Alone PC; WYSIWYG; Windows; Word Processing; Online Gaming; Immersive Interaction.]
Source: J.A. Kahle
Characteristics of the Latest Transition in User Interaction
ƒ Windows → Immersive, 3D interactivity
ƒ Click and wait… → Real-time
ƒ Client-centric → Distributed
ƒ User data accessible from client only → User data accessible everywhere
ƒ Device-centric → Device-agnostic
ƒ Connected → Collaborative
ƒ Wired, sporadic → Wireless, always-on
ƒ E-mail/newsgroups → Text messaging/blogs
Some things for Academia to look at
ƒ Specialization in computer architectures
– Beyond OS/Application, what specialization makes sense in
a general-(enough) purpose chip/system multiprocessor?
ƒ Programming paradigms and compilation techniques
to deal with memory wall
ƒ New types of applications (often real-time) made
possible by a dramatic jump in performance
– E.g. gesture and emotion recognition

Más contenido relacionado

Similar a Cell Today and Tomorrow - IBM Systems and Technology Group

Performance of State-of-the-Art Cryptography on ARM-based Microprocessors
Performance of State-of-the-Art Cryptography on ARM-based MicroprocessorsPerformance of State-of-the-Art Cryptography on ARM-based Microprocessors
Performance of State-of-the-Art Cryptography on ARM-based MicroprocessorsHannes Tschofenig
 
Deview 2013 rise of the wimpy machines - john mao
Deview 2013   rise of the wimpy machines - john maoDeview 2013   rise of the wimpy machines - john mao
Deview 2013 rise of the wimpy machines - john maoNAVER D2
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3mustafa sarac
 
A15 ibm informix on power8 power linux
A15 ibm informix on power8  power linuxA15 ibm informix on power8  power linux
A15 ibm informix on power8 power linuxBeGooden-IT Consulting
 
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...Filipe Miranda
 
Michael Gschwind, Cell Broadband Engine: Exploiting multiple levels of parall...
Michael Gschwind, Cell Broadband Engine: Exploiting multiple levels of parall...Michael Gschwind, Cell Broadband Engine: Exploiting multiple levels of parall...
Michael Gschwind, Cell Broadband Engine: Exploiting multiple levels of parall...Michael Gschwind
 
System On Chip (SOC)
System On Chip (SOC)System On Chip (SOC)
System On Chip (SOC)Shivam Gupta
 
BrightTalk session-The right SDS for your OpenStack Cloud
BrightTalk session-The right SDS for your OpenStack CloudBrightTalk session-The right SDS for your OpenStack Cloud
BrightTalk session-The right SDS for your OpenStack CloudEitan Segal
 
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and Visualization
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and VisualizationCell/B.E. Servers: A Platform for Real Time Scalable Computing and Visualization
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and VisualizationSlide_N
 
The Future of Hardware and Software Design Technologies
The Future of Hardware and Software Design TechnologiesThe Future of Hardware and Software Design Technologies
The Future of Hardware and Software Design TechnologiesS3
 
How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?Hannes Tschofenig
 
Industrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric computeIndustrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric computePerry Lea
 
Trends and challenges in IP based SOC design
Trends and challenges in IP based SOC designTrends and challenges in IP based SOC design
Trends and challenges in IP based SOC designAishwaryaRavishankar8
 
FUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGNFUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGNvenkatraman227
 
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)Heiko Joerg Schick
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit MumbaiAnand Haridass
 

Similar a Cell Today and Tomorrow - IBM Systems and Technology Group (20)

SoC: System On Chip
SoC: System On ChipSoC: System On Chip
SoC: System On Chip
 
Performance of State-of-the-Art Cryptography on ARM-based Microprocessors
Performance of State-of-the-Art Cryptography on ARM-based MicroprocessorsPerformance of State-of-the-Art Cryptography on ARM-based Microprocessors
Performance of State-of-the-Art Cryptography on ARM-based Microprocessors
 
Deview 2013 rise of the wimpy machines - john mao
Deview 2013   rise of the wimpy machines - john maoDeview 2013   rise of the wimpy machines - john mao
Deview 2013 rise of the wimpy machines - john mao
 
SOC Design Challenges and Practices
SOC Design Challenges and PracticesSOC Design Challenges and Practices
SOC Design Challenges and Practices
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
A15 ibm informix on power8 power linux
A15 ibm informix on power8  power linuxA15 ibm informix on power8  power linux
A15 ibm informix on power8 power linux
 
Deeplearningusingcloudpakfordata
DeeplearningusingcloudpakfordataDeeplearningusingcloudpakfordata
Deeplearningusingcloudpakfordata
 
IBM HPC Transformation with AI
IBM HPC Transformation with AI IBM HPC Transformation with AI
IBM HPC Transformation with AI
 
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
 
Michael Gschwind, Cell Broadband Engine: Exploiting multiple levels of parall...
Michael Gschwind, Cell Broadband Engine: Exploiting multiple levels of parall...Michael Gschwind, Cell Broadband Engine: Exploiting multiple levels of parall...
Michael Gschwind, Cell Broadband Engine: Exploiting multiple levels of parall...
 
System On Chip (SOC)
System On Chip (SOC)System On Chip (SOC)
System On Chip (SOC)
 
BrightTalk session-The right SDS for your OpenStack Cloud
BrightTalk session-The right SDS for your OpenStack CloudBrightTalk session-The right SDS for your OpenStack Cloud
BrightTalk session-The right SDS for your OpenStack Cloud
 
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and Visualization
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and VisualizationCell/B.E. Servers: A Platform for Real Time Scalable Computing and Visualization
Cell/B.E. Servers: A Platform for Real Time Scalable Computing and Visualization
 
The Future of Hardware and Software Design Technologies
The Future of Hardware and Software Design TechnologiesThe Future of Hardware and Software Design Technologies
The Future of Hardware and Software Design Technologies
 
How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?
 
Industrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric computeIndustrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric compute
 
Trends and challenges in IP based SOC design
Trends and challenges in IP based SOC designTrends and challenges in IP based SOC design
Trends and challenges in IP based SOC design
 
FUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGNFUNDAMENTALS OF COMPUTER DESIGN
FUNDAMENTALS OF COMPUTER DESIGN
 
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai
 

Más de Slide_N

SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...Slide_N
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfSlide_N
 
Experiences with PlayStation VR - Sony Interactive Entertainment
Experiences with PlayStation VR  - Sony Interactive EntertainmentExperiences with PlayStation VR  - Sony Interactive Entertainment
Experiences with PlayStation VR - Sony Interactive EntertainmentSlide_N
 
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3Slide_N
 
Filtering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingFiltering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingSlide_N
 
Chip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfChip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfSlide_N
 
New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiSlide_N
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSlide_N
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60 Slide_N
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomSlide_N
 
The Technology behind PlayStation 2
The Technology behind PlayStation 2The Technology behind PlayStation 2
The Technology behind PlayStation 2Slide_N
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignSlide_N
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotSlide_N
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: TheorySlide_N
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMSlide_N
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Slide_N
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionSlide_N
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerSlide_N
 
The Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesThe Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesSlide_N
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Slide_N
 

Más de Slide_N (20)

SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
 
Experiences with PlayStation VR - Sony Interactive Entertainment
Experiences with PlayStation VR  - Sony Interactive EntertainmentExperiences with PlayStation VR  - Sony Interactive Entertainment
Experiences with PlayStation VR - Sony Interactive Entertainment
 
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
 
Filtering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingFiltering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-Aliasing
 
Chip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfChip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdf
 
New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - Kutaragi
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - Kutaragi
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living Room
 
The Technology behind PlayStation 2
The Technology behind PlayStation 2The Technology behind PlayStation 2
The Technology behind PlayStation 2
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor Design
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: Theory
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTM
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of Destruction
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM Power
 
The Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesThe Visual Computing Revolution Continues
The Visual Computing Revolution Continues
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
 

Cell Today and Tomorrow - IBM Systems and Technology Group

  • 1. Cell today and tomorrow
      H. Peter Hofstee, Ph.D.
      Cell Chief Scientist and Chief Architect, Cell Synergistic Processor
      IBM Systems and Technology Group
      SCEI/Sony Toshiba IBM (STI) Design Center, Austin, Texas
      Systems and Technology Group, © 2005 IBM Corporation
  • 2. Acknowledgements
      ▪ Cell Broadband Engine (“Cell”) is the result of a deep partnership between SCEI/Sony, Toshiba, and IBM
      ▪ Cell represents the work of more than 400 people starting in 2001 and a design investment of about $400M
  • 3. Agenda
      ▪ Basics
        – Performance: Power wall, Memory/Latency wall
        – Multicore and specialization
      ▪ Cell
        – Asynchronous load/store (DMA)
        – Microarchitecture decisions
      ▪ Cell Performance
        – Things that work really well
        – Things that will likely work well
        – Question marks
      ▪ Cell Systems
      ▪ Future of Cell and things for Academia to look at
  • 4. BASICS
  • 5. Computing Paradigm Shift
      Today:
        – Single thread performance hitting limits
          • Architecture and process technology saturated
          • Small percentage gains expected to remain
      But:
        – Signs of paradigm shift to application specific system customization
          • Large multiple gains for specific applications
          • Cell
            – ~50x on TRE, ~100x on FFT
          • Datapower
            – XML acceleration
          • Many examples in embedded markets
      Future:
        – Greater performance demands
          • Immersive Interaction
            – 3D, real-time, gaming inspired applications
            – Rich media, data-intensive content
          • Sensory Computing
            – New network tier
            – Autonomous agents performing intelligent analysis on streaming data
              > A&D: battlefield coordination
      [Figure: Single Thread Performance (SPECint). Historical trend of 45% CGR; single-thread performance growth rate slows dramatically.]
  • 6. Solutions
      ▪ Memory wall:
        – More slower threads
        – Asynchronous loads
      ▪ Efficiency wall:
        – More slower threads
        – Specialized function
      ▪ Power wall:
        – Reduce transistor power
          • operating voltage
          • limit oxide thickness scaling
          • limit channel length
        – Reduce switching per function
      INCREASE CONCURRENCY: Multi-Core
      INCREASE SPECIALIZATION: Non-Homogeneous
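The "more slower threads" and "operating voltage" items can be illustrated with the standard first-order CMOS switching-power model, P ≈ a·C·V²·f. What follows is a minimal sketch and is not taken from the slides: the function name `dynamic_power` and every operating point are hypothetical, normalized values, and leakage power is ignored.

```python
def dynamic_power(voltage, frequency, capacitance=1.0, activity=1.0):
    """First-order CMOS switching power: P = a * C * V^2 * f (normalized units)."""
    return activity * capacitance * voltage ** 2 * frequency


# Hypothetical baseline: one core at normalized voltage 1.0 and frequency 1.0.
one_fast_core = dynamic_power(voltage=1.0, frequency=1.0)

# Hypothetical alternative: two cores, each at 0.8x voltage and 0.75x frequency.
two_slow_cores = 2 * dynamic_power(voltage=0.8, frequency=0.75)

# Aggregate throughput, assuming the workload parallelizes perfectly across
# both cores (an optimistic assumption).
throughput_ratio = (2 * 0.75) / 1.0

print(f"power ratio:      {two_slow_cores / one_fast_core:.2f}")
print(f"throughput ratio: {throughput_ratio:.2f}")
```

Under these assumed numbers the two slower cores draw slightly less switching power than the single fast core while offering more aggregate throughput, which is the trade the slide summarizes as "increase concurrency". The benefit disappears if the workload cannot be split across cores.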
  • 7. CELL
  • 8. Motivation: Cell Goals
      ▪ Outstanding performance, especially on game/multimedia applications.
        – Challenges: Power Wall, Frequency Wall, Memory Wall
      ▪ Real-time responsiveness to the user and the network.
        – Challenges: Real-time in an SMP environment, Security
      ▪ Applicable to a wide range of platforms.
        – Challenge: Maintain programmability while increasing performance
      ▪ Support an introduction in 2005/6.
        – Challenge: Structure innovation such that the 5-year schedule can be met
  • 9. Cell Concept
      ▪ Compatibility with 64b Power Architecture™
        – Builds on and leverages IBM investment and community
      ▪ Increased efficiency and performance
        – Non-Homogeneous Coherent Chip Multiprocessor
          • Allows an attack on the “Frequency Wall”
        – Streaming DMA architecture attacks “Memory Wall”
        – High design frequency, low operating voltage attacks “Power Wall”
        – Highly optimized implementation
      ▪ Interface between user and networked world
        – Flexibility and security
        – Multi-OS support, including RTOS/non-RTOS
        – Architectural extensions for real-time management
  • 10. Cell Architecture is … 64b Power Architecture™
      [Figure: Power ISA cores, each with an MMU/BIU, attached to a COHERENT BUS together with IO translation and Memory. Caption: includes coherence/memory; compatible with 32/64b Power Arch. applications and OS’s.]
  • 11. Cell Architecture is … 64b Power Architecture™ plus Memory Flow Control (MFC)
      [Figure: the slide 10 diagram with +RMT added to the Power ISA and MMU/BIU blocks and +RAG added to the COHERENT BUS, plus MMU/DMA (+RMT) blocks, each with a Local Store Memory and an LS Alias.]
  • 12. Cell Architecture is … 64b Power Architecture™ + MFC plus Synergistic Processors
      [Figure: the slide 11 diagram with a Syn. Proc. ISA block added to each MMU/DMA and Local Store Memory pair.]
  • 13. Asynchronous Load/Store (DMA)
      ▪ THE major architectural decision in Cell
        – Motivated by memory wall
        – Enabled by a large market
      ▪ Fundamental change to programmers.
        – Transition from demand-fetch to software-controlled prefetch
        – Bill Dally’s “plumbing project” analogy
        – “Bucket brigade” analogy
  • 14. SPE BLOCK DIAGRAM
      [Figure: Permute Unit, Load-Store Unit, Floating-Point Unit, Fixed-Point Unit, Branch Unit, Channel Unit, Result Forwarding and Staging, Register File, Instruction Issue Unit / Instruction Line Buffer, Local Store (256 kB, single-port SRAM, 128B read, 128B write), DMA Unit, On-Chip Coherent Bus. Datapath widths shown: 8, 16, 64, and 128 bytes/cycle.]
  • 15. Other (Micro)Architectural Decisions
      ▪ Large shared register file
      ▪ Local store size tradeoffs
      ▪ Dual issue, in order
      ▪ Software branch prediction
      ▪ Channels
      Microarchitecture decisions, more so than architecture decisions, show a bias towards compute-intensive codes
  • 16. Systems and Technology Group © 2005 IBM Corporation 16
  • 17. CELL PROCESSOR STATISTICS
      ▪ 250M transistors, 235 mm²
      ▪ Top frequency >4 GHz
        – Lab conditions
        – Most efficient at ~1 V
      ▪ >200 GFlops (SP) @ 3.2 GHz
      ▪ >20 GFlops (DP) @ 3.2 GHz
      ▪ Up to 25.6 GB/s memory B/W
      ▪ Up to 70+ GB/s I/O B/W
        – Practical ~50 GB/s
      ▪ 100+ simultaneous bus transactions
        – 16+8 entry DMA queue per SPE
      [Figure: Hardware Performance Measurement (85°C), first-pass hardware measurement in the lab. Fmax frequency (3 to 4.5 GHz) versus supply voltage (0.9 to 1.2 V); nominal voltage = 1 V.]
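The single-precision figure on this slide can be sanity-checked with back-of-the-envelope arithmetic. The sketch below rests on assumptions the slide does not state: 8 SPEs per chip, a 4-wide single-precision SIMD unit per SPE, and one fused multiply-add (counted as 2 flops) per lane per cycle. Only the clock rate and the memory bandwidth come from the slide.

```python
CLOCK_HZ = 3.2e9          # from the slide: figures quoted at 3.2 GHz
MEMORY_BW_GB_S = 25.6     # from the slide: peak memory bandwidth

NUM_SPES = 8              # assumption: SPE count is not stated on the slide
SP_LANES = 4              # assumption: 128-bit SIMD / 32-bit floats
FLOPS_PER_FMA = 2         # assumption: multiply-add counted as two flops

# Peak single-precision rate of the SPEs alone (the PPE is ignored here).
peak_sp_gflops = NUM_SPES * SP_LANES * FLOPS_PER_FMA * CLOCK_HZ / 1e9

# Machine balance: flops that must be performed per byte fetched from
# memory to keep the SPEs busy at peak rate.
flops_per_byte = peak_sp_gflops / MEMORY_BW_GB_S

print(f"peak SP: {peak_sp_gflops:.1f} GFlops")
print(f"balance: {flops_per_byte:.1f} flops per byte of memory traffic")
```

Under these assumptions the estimate is consistent with the ">200 GFlops" claim, and the balance figure suggests why kernels that reuse data held in the local store are the ones that approach peak.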
  • 18. CELL PERFORMANCE (AND PROGRAMMING)
  • 19. Things that work extremely well today (up to 100x)
      ▪ Problem can be re-coded
      ▪ Predictable non-trivial memory access pattern
        – Can build scatter-gather lists
      ▪ Problem can benefit from SIMD
      ▪ Focus on 32b float, or <=32b integer
      ▪ Examples:
        – FFTW (best result about 100 GFlops)
        – Terrain Rendering Engine
        – Volume rendering
      ▪ Typical code is double-buffered gather-compute-scatter
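The "double-buffered gather-compute-scatter" pattern can be sketched in ordinary Python. This is an illustration under stated assumptions, not SPE code: a single worker thread stands in for the DMA engine, Python lists stand in for main memory and the local store, and the names `gather`, `scatter`, `compute`, and `run_double_buffered` are hypothetical. The scatter step is done synchronously here to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor


def gather(memory, indices):
    """Stand-in for a DMA 'get' driven by a scatter-gather list."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Stand-in for a DMA 'put' that writes results back to memory."""
    for i, value in zip(indices, values):
        memory[i] = value


def compute(block):
    """Placeholder kernel applied to one buffer of gathered data."""
    return [2.0 * x + 1.0 for x in block]


def run_double_buffered(src, dst, index_lists):
    """Overlap the fetch of block n+1 with the computation on block n."""
    if not index_lists:
        return
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(gather, src, index_lists[0])
        for n, indices in enumerate(index_lists):
            block = pending.result()  # wait for the current buffer to arrive
            if n + 1 < len(index_lists):
                # Start fetching the next buffer before computing on this one.
                pending = dma.submit(gather, src, index_lists[n + 1])
            scatter(dst, indices, compute(block))


src = [float(i) for i in range(8)]
dst = [0.0] * 8
run_double_buffered(src, dst, [[0, 2, 4, 6], [1, 3, 5, 7]])
print(dst)
```

The index lists play the role of the scatter-gather lists mentioned on the slide: because the access pattern is known ahead of time, the next transfer can be issued before the current block is processed, which is the software-controlled prefetch that replaces demand fetch.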
  • 20. Things that work well today (about 10-20x)
      ▪ Compute-bound codes
      ▪ Small enough to be rewritten
      ▪ Main datatype is 32b float or <=32b int
      ▪ Benefits from SIMD
      ▪ Examples:
        – Crypto codes (RSA, SHA, DES, etc.)
        – Media codes (MPEG-2, MPEG-4, H.264, JPEG)
        – … many, many others …
  • 21. Things likely to work well
      ▪ Library .. Device/API based applications
        – Graphics and physics and sound and …
      ▪ Scientific codes … library based
        – No rewrite
        – If granularity is ok
  • 22. Question marks
      ▪ Can a compiler-based approach, without restructuring code specifically for the SPEs, result in a chip-level advantage?
        – About 3-4x more SPEs in the same area or power
        – But the compiler has to manage the local store
      ▪ Interesting benchmarks: SpecFP, MediaBench, EEMBC, etc.
        – New, more explicitly parallel benchmarks?
      ▪ Would you ever use an SPE for a SpecInt-type workload?
  • 23. Cell based systems
  • 24. Cell Processor Isn't Just for Games. Innovative Chip is best high-performance embedded processor of 2005
      “We chose the Cell BE as the best high-performance embedded processor of 2005 because of its innovative design and future potential.... Even if the Cell BE accumulates no more design wins, the PlayStation 3 could drive sales to nearly 100 million units over the likely five-year lifespan of the console. That would make the Cell BE one of the most successful microprocessors in history.”
      “…Cell could power hundreds of new apps, create a new video-processing industry and fuel a multibillion-dollar build out of tech hardware over ten years.” (Forbes)
      “It was originally conceived as the microprocessor to power Sony's [PS3], but it is expected to find a home in lots of other broadband-connected consumer items and in servers too.” (IEEE Spectrum)
  • 25. Cell BE based Systems: SCEI, Mercury, … and IBM!
  • 26. Toshiba Announces Cell Chip Set and Cell Reference Set (20 September 2005)
      Tokyo: Toshiba Corporation today took major steps toward creating a comprehensive development environment for applications based on the Cell microprocessor with the announcement of a Cell Chip Set, consisting of the new microprocessor and key peripheral chips, and a Cell Reference Set development platform. The chip set and the reference set will support development of digital consumer products and communication equipment that draw on the powerful broadband capabilities of the Cell microprocessor.
      "Software developers and other customers will be eager to make full use of Cell's unsurpassed multitasking and real-time processing functions," said Tomotaka Saito, General Manager of Broadband System LSI Division, Toshiba Corporation Semiconductor Company. "The Cell Chip Set and Reference Set will support them in developing products and applications that reach new levels of performance and excitement."
      The Cell Chip Set is composed of the Cell processor, a Super Companion Chip (the interface between Cell and external audio/visual input/output equipment), and a power supply system chip optimized to drive the Cell microprocessor.
      The Cell Reference Set development platform consists of a Cell microprocessor, peripheral chips mounted on a printed circuit board with a general-use interface, peripheral equipment such as DVD and HDD drives, and the cooling equipment required for stable operation, all housed in a case. The available software includes operating systems, middleware, and software development tools. This combination of hardware and software reduces development costs, cuts turnaround time, and simplifies testing.
      Toshiba expects to start marketing the chip set and reference set in April 2006 or later, once it has assured supply of the component chips and all related documentation.
      Toshiba Corporation will showcase the Cell Chip Set and Cell Reference Set, and demonstrate digital media applications on the Cell Reference Set, at the Toshiba booth of CEATEC JAPAN 2005, from October 4 to October 8 at Makuhari Messe.
      Outlines of Cell Chip Set and Cell Reference Set:
      ▪ Cell Chip Set:
        – Cell microprocessor: Next-generation microprocessor jointly developed by IBM, Sony Group and Toshiba. Adopts a multi-core architecture and offers super high-speed data transfer capability. The processor is expected to find application in equipment handling data-rich media applications.
        – Super Companion Chip: Cell's peripheral LSI, which houses audio and image interfaces supporting Cell's super high-speed data transfer capability. The chip also supports a group of interfaces for various systems (video, audio input/output, digital AV interface, IEEE1394, digital tuner interface) and a group of interfaces that make it easier to connect standard input/output devices (standard bus interface, high-speed network interface, and storage device interface).
        – Highly efficient power supply system: The supply system is optimized to drive the Cell processor. It includes a controller LSI, TB6814FLG, which provides the high-speed response and high accuracy required by Cell, and a multi-chip module, TB7003FL, which embeds the power device in a small 8mm x 8mm package. The result is a small, high-power, high-efficiency power supply system with 4 phases of 1MHz high-speed switches.
      ▪ Cell Reference Set:
        – Development platform for Cell-based, next-generation digital consumer products
        – High-speed multi-bit wiring technology and a wide variety of interfaces that support broadband system architecture
        – Linux and ITRON are both provided on the hypervisor OS that manages hardware resources. This approach facilitates the reuse of existing application assets.
        – A comprehensive development environment including an Eclipse framework based editor, compiler, debugger, and performance monitor
        – An audio-visual application model that includes simultaneous multiple digital and analog broadcast television reception, recording, and playback
      SOURCE: TOSHIBA
  • 27. Future of Cell and Things for Academia to look at
  • 28. User Interaction Drives Innovation in Computing
      [Figure: level of interaction versus time, rising towards Immersive Interaction. Labels shown: Punch Cards, Green Screen/Teletype, Spreadsheet, Word Processing, WYSIWYG, Windows, WWW, Gaming, Online Gaming; Main Frame Batch, Main Frame Multitasking, Mini-Computer, Stand Alone PC, Client/Server, Internet. Source: J.A. Kahle]
  • 29. Characteristics of the Latest Transition in User Interaction
      ▪ Windows → Immersive, 3D interactivity
      ▪ Click and wait… → Real-time
      ▪ Client-centric → Distributed
      ▪ User data accessible from client only → User data accessible everywhere
      ▪ Device-centric → Device-agnostic
      ▪ Connected → Collaborative
      ▪ Wired, sporadic → Wireless, always-on
      ▪ E-mail/newsgroups → Text messaging/blogs
  • 30. Some things for Academia to look at
      ▪ Specialization in computer architectures
        – Beyond OS/Application, what specialization makes sense in a general-(enough) purpose chip/system multiprocessor?
      ▪ Programming paradigms and compilation techniques to deal with the memory wall
      ▪ New types of applications (often real-time) made possible by a dramatic jump in performance
        – E.g. gesture and emotion recognition