26.4: 3D V-Cache: The Implementation of a Hybrid-Bonded 64MB Stacked Cache for a 7nm x86-64 CPU
© 2022 IEEE
International Solid-State Circuits Conference 1 of 36
3D V-Cache™: The Implementation of a Hybrid-Bonded 64MB Stacked Cache for a 7nm x86-64 CPU
J. Wuu¹, R. Agarwal², M. Ciraula¹, C. Dietz¹, B. Johnson¹, D. Johnson¹, R. Schreiber³, R. Swaminathan³, W. Walker¹, S. Naffziger¹
¹AMD, Fort Collins, CO; ²AMD, Santa Clara, CA; ³AMD, Austin, TX
Author Introduction
John Wuu
• B.S. in Electrical Engineering & M.Eng. in Electrical Engineering and Computer Science from MIT, 1997
• Senior Fellow Design Engineer at AMD in Fort Collins, CO
• Hewlett-Packard and Intel in Fort Collins, CO between 1997 and 2006
• Interests include memory technology, memory & cache designs, and 3D design DTCO
Outline
• Motivation
• AMD 3D V-Cache™ Overview
• AMD 3D V-Cache™ Technology
• Architecture
• Arrays and 3D Circuits
• Performance
• Summary
Large L3 Caches Provide IPC Uplift
[Figure: IPC % uplift (linear scale) versus L3 cache size, from 1x to 32x]
Barriers to Large Cache Size
• SRAM scaling & cost = barriers to large on-die caches
• Cost
• Product flexibility
• Latency
[Figure: "Moore's Law keeps slowing while costs continue to increase". Normalized cost per yielded mm² (axis 0 to 6) for the 45nm, 32nm, 28nm, 20nm, 14nm, 7nm and 5nm nodes; timeline from 2004 to 2019 covering the 90nm, 65nm, 45nm, 32nm, 22nm, 14nm and 10/7nm nodes; callout: Chiplets]
[Figure: Silicon Area Scaling by Function. Analog, SRAM and Logic across 28nm, 16nm, 10nm, 7nm and 5nm]
[1] Naffziger, VLSI Short Course, 2020. [2] Cost per yielded mm² for a 250mm² die.
More than Moore
• 2.5D chiplets can provide product flexibility and reduce cost
• However, 3D can be even better!
  • Improves effective memory latency
  • Reduces the dynamic power of long datapaths and I/Os
  • Fits more transistors within a given package cavity size
[Figure: hypothetical processor with large cache]
AMD 3D V-Cache™
• Industry’s first high-performance processor product with Hybrid Bonded 3D cache die
AMD 3D V-Cache™ Components: CCD
• “Zen 3” x86-64 CPU Core Complex Die (CCD)
• TSMC 7nm technology
• 8 cores per Core Complex (CCX)
• 32MB shared L3 Cache
• +19%¹ IPC (Ave) vs. “Zen 2”
• 81mm²
• AMD 3D V-Cache™ support integrated from Day 1
¹ See endnotes: R5K-003
AMD 3D V-Cache™ Components: L3D
• AMD 3D V-Cache™ extended L3 Die (L3D)
• TSMC 7nm FinFET Technology
• 13 layers Cu + 1 layer Al metal stack
• 64MB L3 Cache Extension
• 41mm²
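As a rough sanity check on the component figures, the capacity per unit area of the L3D can be set against the CCD. The sketch below uses only values quoted on the CCD and L3D slides; the variable names are illustrative. Note that the CCD area also holds eight cores and their L2 caches, so the CCD figure is a per-die number, not an SRAM density.

```python
# Values quoted on the CCD and L3D component slides.
L3D_CAPACITY_MB = 64   # L3 cache extension on the stacked die
L3D_AREA_MM2 = 41
CCD_L3_MB = 32         # shared L3 on the base die
CCD_AREA_MM2 = 81      # whole CCD, including cores and L2


def mb_per_mm2(capacity_mb: float, area_mm2: float) -> float:
    """Cache capacity per unit of die area."""
    return capacity_mb / area_mm2


l3d_density = mb_per_mm2(L3D_CAPACITY_MB, L3D_AREA_MM2)
ccd_l3_per_die_area = mb_per_mm2(CCD_L3_MB, CCD_AREA_MM2)
area_ratio = L3D_AREA_MM2 / CCD_AREA_MM2

print(f"L3D: {l3d_density:.2f} MB/mm^2")
print(f"CCD (L3 over whole die): {ccd_l3_per_die_area:.2f} MB/mm^2")
print(f"L3D area is {area_ratio:.0%} of the CCD area")
```

The stacked die adds twice the base L3 capacity in roughly half the CCD's footprint.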
AMD 3D V-Cache™ Components: Structural Die
• AMD 3D V-Cache™ Structural Dies
  • Structural support for thinned CCD
  • Thermal dissipation for CPU cores
Physical Organization
• CCD face-down
  • C4 interface to substrate
  • TSV interface to L3D
• L3D face-down
  • Hybrid Bonded (HB) to CCD
• Structural Dies
  • Oxide bonded to CCD
Micro Bump vs. Hybrid Bond
• Compared to Micro Bump 3D solutions, Hybrid Bond offers
  • >15x interconnect density
  • >3x interconnect energy efficiency
  • Superior thermal conductance
[Figure [1]: C4, Micro Bump 3D and Hybrid Bond 3D interconnects compared. C4 and Micro Bump 3D illustrations are hypothetical]
[1] Swaminathan, Hot Chips Tutorial, 2021
See endnotes: EPYC-027
3D V-Cache™ Bonding Technology
• TSMC SoIC process
• Cu bonded using Bond Pad Metal (BPM) pads
• BPM interfaces with TSV
• Bond Pad Via (BPV) connects BPM to M13
• 9µm minimum TSV pitch
[Figure: cross-section of the die interface. Labels: Top Die, Bottom Die, M11, M12, M13, Al, BPV, BPM, TSV, Silicon, Die Interface]
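The ">15x interconnect density" claim can be related to pitch with simple geometry: for connections laid out on a square grid, areal density scales with the inverse square of the pitch. The sketch below is a back-of-the-envelope illustration, not the authors' calculation. It treats the 9µm minimum TSV pitch as the hybrid bond pitch, and the 36µm micro-bump pitch is an assumed example value that does not come from the slides.

```python
import math

HYBRID_BOND_PITCH_UM = 9.0   # from the slide: 9um minimum TSV pitch
MICRO_BUMP_PITCH_UM = 36.0   # ASSUMPTION: illustrative value, not from the slides


def pads_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 on a square grid with the given pitch."""
    pads_per_mm = 1000.0 / pitch_um
    return pads_per_mm ** 2


def density_ratio(coarse_pitch_um: float, fine_pitch_um: float) -> float:
    """How many times denser the fine-pitch grid is than the coarse one."""
    return (coarse_pitch_um / fine_pitch_um) ** 2


ratio = density_ratio(MICRO_BUMP_PITCH_UM, HYBRID_BOND_PITCH_UM)

# Coarsest micro-bump pitch at which a 9um grid is exactly 15x denser.
break_even_pitch_um = HYBRID_BOND_PITCH_UM * math.sqrt(15)

print(f"9um grid: {pads_per_mm2(HYBRID_BOND_PITCH_UM):.0f} connections/mm^2")
print(f"density ratio vs assumed 36um micro bump: {ratio:.1f}x")
print(f"15x threshold pitch: {break_even_pitch_um:.1f} um")
```

Under these assumptions, any micro-bump pitch above roughly 35µm gives a ratio greater than 15x, which is consistent in magnitude with the slide's claim.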
Server Configurations
• AMD 3rd Gen EPYC™ Server CPU
  • Up to 8 "Zen 3" CCDs
  • 1 I/O Die (IOD)
• AMD 3rd Gen EPYC™ Server CPU with AMD 3D V-Cache™
  • Up to 8 thinned CCDs + L3Ds
  • Support silicon added to match 2D CCD Z-height
• Both designs compatible with the same package
[Figure: package cross-sections without 3D stacking (CCD, IOD) and with 3D stacking (CCD with support silicon, IOD)]
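Combining the per-die capacities from the component slides (32MB of L3 on each CCD, 64MB on each L3D) with the "up to 8 CCDs" limit gives the maximum L3 per server package. The helper below is a minimal sketch of that arithmetic; the function and constant names are illustrative.

```python
CCD_L3_MB = 32   # shared L3 on each "Zen 3" CCD
L3D_MB = 64      # L3 extension on each stacked L3D
MAX_CCDS = 8     # up to 8 CCDs per server package


def package_l3_mb(ccds: int, stacked: bool) -> int:
    """Total L3 capacity in MB for a package with the given number of CCDs."""
    if not 1 <= ccds <= MAX_CCDS:
        raise ValueError(f"ccds must be between 1 and {MAX_CCDS}")
    per_ccd = CCD_L3_MB + (L3D_MB if stacked else 0)
    return ccds * per_ccd


without_stacking = package_l3_mb(MAX_CCDS, stacked=False)
with_stacking = package_l3_mb(MAX_CCDS, stacked=True)

print(f"8 CCDs, no stacking: {without_stacking} MB L3")
print(f"8 CCDs + L3Ds:       {with_stacking} MB L3")
```

A fully populated package therefore triples its L3 capacity, from 256MB to 768MB, without changing the package itself.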
"Zen 3" Cache Hierarchy
• 8 Cores per CCD
• 32K I-Cache + 32K D-Cache
• Private 512K L2 per core
• Shared 32MB L3 between 8 cores
• 16-way set associative
• 32B/cycle interface to each core
• DECTED ECC for enhanced data reliability
[Figure [4]: Core 0, Core 1 … Core 7, each with 3 load / 2 store, a 512K 8-way L2 I+D cache and a 32B/cycle link to the shared L3 (up to 32M, 16-way, I+D cache)]
[4] Evers, Hot Chips 2021
"Zen 3" + AMD 3D V-Cache™
• 96MB shared L3 Cache between 8 cores
• 16-way set associative
• 32B/cycle interface to each core
• >2 TB/s L3 bandwidth
• +4 cycles latency
• Each die’s L3 includes its own
  • Data arrays
  • Tag arrays
  • LRU arrays
[Figure: Core 0, Core 1 … Core 7, each with 3 load / 2 store, a 512K 8-way L2 I+D cache and a 32B/cycle link to the on-die L3 (up to 32M, 16-way), extended to up to 96M total L3 (16-way, I+D cache)]
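Because associativity stays at 16 ways while capacity grows from 32MB to 96MB, the extra capacity must show up as additional sets. The sketch below works out the aggregate set counts. It assumes a 64-byte cache line, which the slides do not state, and it ignores how the L3 is sliced or how index bits are distributed across the two dies.

```python
MIB = 1 << 20
WAYS = 16          # from the slide: 16-way set associative
LINE_BYTES = 64    # ASSUMPTION: cache line size is not given on the slides


def num_sets(capacity_bytes: int, ways: int = WAYS, line_bytes: int = LINE_BYTES) -> int:
    """Aggregate number of sets for a set-associative cache."""
    lines, remainder = divmod(capacity_bytes, line_bytes)
    if remainder or lines % ways:
        raise ValueError("capacity must be a whole number of sets")
    return lines // ways


base_sets = num_sets(32 * MIB)      # CCD-only L3
stacked_sets = num_sets(96 * MIB)   # CCD + L3D

print(f"32MB, 16-way: {base_sets} sets")
print(f"96MB, 16-way: {stacked_sets} sets ({stacked_sets // base_sets}x)")
```

Tripling the set count at constant associativity is consistent with each die carrying its own data, tag and LRU arrays, as the slide lists.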
Cache Interface Illustration
[Figure [5]: cache interface layout. Labels: TSV Columns; 32 B/Cycle Bi-Directional Bus]
[5] Burd, ISSCC, 2022
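The ">2 TB/s" L3 bandwidth figure can be related to the per-core interface width with simple arithmetic. The slides do not state the clock frequency or how the aggregate is counted, so the sketch below rests on two labeled assumptions: a 4 GHz clock, and counting both directions of the 32 B/cycle bi-directional bus for all eight cores.

```python
CORES = 8              # from the slides: 8 cores share the L3
BYTES_PER_CYCLE = 32   # from the slides: 32B/cycle interface to each core

ASSUMED_CLOCK_GHZ = 4.0   # ASSUMPTION: clock is not given on the slides
ASSUMED_DIRECTIONS = 2    # ASSUMPTION: both directions counted in the aggregate


def l3_bandwidth_tb_s(clock_ghz: float, directions: int) -> float:
    """Peak aggregate L3 bandwidth in TB/s (1 TB = 1e12 bytes)."""
    bytes_per_second = CORES * BYTES_PER_CYCLE * directions * clock_ghz * 1e9
    return bytes_per_second / 1e12


def clock_for_bandwidth_ghz(target_tb_s: float, directions: int) -> float:
    """Clock needed to reach a target aggregate bandwidth."""
    bytes_per_cycle_total = CORES * BYTES_PER_CYCLE * directions
    return target_tb_s * 1e12 / bytes_per_cycle_total / 1e9


peak = l3_bandwidth_tb_s(ASSUMED_CLOCK_GHZ, ASSUMED_DIRECTIONS)
needed = clock_for_bandwidth_ghz(2.0, ASSUMED_DIRECTIONS)

print(f"peak at assumed 4 GHz: {peak:.3f} TB/s")
print(f"clock needed for 2 TB/s: {needed:.2f} GHz")
```

Under these assumptions the quoted figure is reached at a clock just under 4 GHz; counting a single direction would require twice that clock, so the bi-directional reading is the more plausible one.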
"Zen 3" Power Delivery
• CCD primary supplies
  • RVDD: Ungated supply
  • VDD: Per-core gated supply
  • VDDM: Gated L2/L3 SRAM supply
L3D Power Delivery
• RVDD and VDDM delivered to L3D through power TSVs
• Power TSVs in channels between CCD array macros
L3D SRAM Arrays
• L3D consists of
  • 512 128KB data macros
  • 1088 6KB tag/LRU macros
• Dual-rail array design
  • VDDM powers bitcells
  • RVDD powers peripheral circuits
• L3D arrays optimized for high density and low power
  • HD SRAM bitcell
  • Extensive power reduction features
[Figure: 128KB Data Macro floorplan. Labels: Local I/O, Global I/O, Sense Amp, WL Buffers, WL Drivers]
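The macro counts can be checked against the 64MB capacity quoted for the L3D. The sketch below multiplies out the figures from the slide; the tag/LRU overhead ratio is a derived number shown for illustration, not one stated by the authors.

```python
KIB_PER_MIB = 1024

DATA_MACROS = 512       # from the slide
DATA_MACRO_KIB = 128
TAG_LRU_MACROS = 1088   # from the slide
TAG_LRU_MACRO_KIB = 6

data_mib = DATA_MACROS * DATA_MACRO_KIB / KIB_PER_MIB
tag_lru_mib = TAG_LRU_MACROS * TAG_LRU_MACRO_KIB / KIB_PER_MIB
overhead = tag_lru_mib / data_mib

print(f"data arrays:    {data_mib:.0f} MiB")
print(f"tag/LRU arrays: {tag_lru_mib:.3f} MiB")
print(f"tag/LRU storage relative to data: {overhead:.1%}")
```

The 512 data macros account for exactly the 64MB extension, and the tag/LRU macros add about 6.4MB of array storage on top of that.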
3D Signal Interface
• Simple digital signal interface between dies
• Enabled by HB technology’s low parasitics
[Figure: TSV signal interface circuit. Labels: CLK In, CLK Out, IsolateX, ESD, Isolate, TSV, weak]
AMD 3D V-Cache™ Product Portfolio
• AMD 3D V-Cache™ supports L3 Cache extension for both server and desktop product families
[Figure: AMD 3rd Gen EPYC™ Server CPU; AMD Ryzen™ 7 5800X3D Gaming CPU]
Desktop Performance
AMD Ryzen™ 7 5800X3D with AMD 3D V-Cache™ vs. AMD Ryzen™ 9 5900X
~15% faster gaming at 1080p high
[Figure: per-game results for Watch Dogs®…, Far Cry® 6, Gears 5™, Final Fantasy™ XIV, Shadow of the… and CS:GO™. Bar labels: up to 1.36x, up to 1.24x, up to 1.21x, up to 1.16x, up to 1.09x, tie]
See endnotes: R5K-106
State of the Art Comparison
AMD Ryzen™ 7 5800X3D with AMD 3D V-Cache™ vs. Core i9 12900K
World’s fastest gaming processor
[Figure: per-game results for Watch Dogs®…, Far Cry® 6, Gears 5™, Final Fantasy™ XIV, Shadow of the… and CS:GO™. Bar labels: up to 1.17x, up to 1.08x, up to 1.06x, up to 1.01x, up to 0.98x, tie]
See endnotes: R5K-107
Server Performance
Faster RTL verification
• 3rd Gen AMD EPYC™ 16-core with AMD 3D V-Cache™: 40.6 jobs/hour
• 3rd Gen AMD EPYC™ 16-core without AMD 3D V-Cache™: 24.4 jobs/hour
Results may vary. See endnotes: MLNX-001R
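The two throughput figures on this slide translate directly into a speedup and an average time per job. The sketch below does that arithmetic using only the jobs/hour values shown; the variable names are illustrative.

```python
JOBS_PER_HOUR_WITH_VCACHE = 40.6      # from the slide
JOBS_PER_HOUR_WITHOUT_VCACHE = 24.4   # from the slide

speedup = JOBS_PER_HOUR_WITH_VCACHE / JOBS_PER_HOUR_WITHOUT_VCACHE
percent_faster = (speedup - 1.0) * 100.0

# Average wall-clock minutes per job implied by each throughput figure.
minutes_per_job_with = 60.0 / JOBS_PER_HOUR_WITH_VCACHE
minutes_per_job_without = 60.0 / JOBS_PER_HOUR_WITHOUT_VCACHE

print(f"speedup: {speedup:.2f}x ({percent_faster:.0f}% more jobs per hour)")
print(f"minutes per job: {minutes_per_job_with:.2f} vs {minutes_per_job_without:.2f}")
```

The ratio works out to roughly 1.66x, or about one minute saved per job on average.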
Summary
• AMD 3D V-Cache™: the industry’s first 3D stacked cache for high performance processors utilizing Hybrid Bond technology
• Product definition and design co-optimized up front with technology development
• AMD 3D V-Cache™ is compatible with "Zen 3" server and desktop products, extending the L3 to provide performance uplift
Acknowledgement
• The authors would like to thank the TSMC R&D, DTP, BD, and CSV organizations for their collaboration and support, and colleagues in AMD Cores, Advanced Packaging Technology, Product Development Engineering, Device Analysis Labs, and Foundry Technology Operations for their contributions.
Endnotes
R5K-003: Testing by AMD performance labs as of 09/01/2020. IPC evaluated with a selection of 25 workloads running at a locked 4GHz frequency on 8-core "Zen 2" Ryzen 7 3800XT and "Zen 3" Ryzen 7 5800X desktop processors configured with Windows® 10, NVIDIA GeForce RTX 2080 Ti (451.77), Samsung 860 Pro SSD, and 2x8GB DDR4-3600. Results may vary.
EPYC-026: Based on calculated areal density and on bump pitch, comparing AMD hybrid bond AMD 3D V-Cache stacked technology to AMD 2D chiplet technology and Intel 3D stacked micro-bump technology.
EPYC-027: Based on AMD internal simulations and published Intel data on “Foveros” technology specifications.
R5K-106: Based on testing by AMD as of 12/14/2021. Performance evaluated with Watch Dogs Legion, Far Cry 6, Gears 5, Final Fantasy XIV, Shadow of the Tomb Raider and CS:GO. All games tested at 1920x1080 resolution with the HIGH in-game quality preset (or equivalent). System configuration: Ryzen 7 5800X3D and AMD Reference Motherboard, Ryzen 9 5900X and ASUS Crosshair VIII Hero with BIOS 3801. Both systems configured with 2x8GB DDR4-3600, GeForce RTX 3080 with 472.12 driver, Samsung 980 Pro 1TB, NZXT Kraken X62, and Windows 11 28000.282.
R5K-107: Based on testing by AMD as of 12/14/2021. Performance evaluated with Watch Dogs Legion, Far Cry 6, Gears 5, Final Fantasy XIV, Shadow of the Tomb Raider and CS:GO. All games tested at 1920x1080 resolution with the HIGH in-game quality preset (or equivalent). System configuration: Ryzen 7 5800X3D and AMD Reference Motherboard with 2x8GB DDR4-3600. Core i9-12900K and ROG Maximus Z690 Hero motherboard with BIOS 0702 and 2x16GB DDR5-5200. Both systems configured with GeForce RTX 3080 on driver 472.12, Samsung 980 Pro 1TB, NZXT Kraken X62, Windows 11 28000.282.
MLNX-001R: EDA RTL simulation comparison based on AMD internal testing completed on 9/20/2021, measuring the average time to complete a test case simulation and comparing 1x 16C 3rd Gen EPYC CPU with AMD 3D V-Cache Technology versus 1x 16C AMD EPYC™ 73F3 on the same AMD “Daytona” reference platform. Results may vary based on factors including silicon version, hardware and software configuration and driver versions.
Disclaimer and Copyright
©2022 Advanced Micro Devices, Inc. All rights reserved.
AMD, the AMD Arrow logo, EPYC, Ryzen, Infinity fabric, and combinations thereof are trademarks of Advanced Micro Devices, Inc.
Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and
typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons,
including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any
computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to
update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes
from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
This information is provided "as is." AMD makes no representations or warranties with respect to the contents hereof and assumes no
responsibility for any inaccuracies, errors, or omissions that may appear in this information. AMD specifically disclaims any implied
warranties of non-infringement, merchantability, or fitness for any particular purpose. In no event will AMD be liable to any person for
any reliance, direct, indirect, special, or other consequential damages arising from the use of any information contained herein, even if
AMD is expressly advised of the possibility of such damages.

Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 

3D V-Cache

  • 1. 26.4: 3D V-Cache™: The Implementation of a Hybrid-Bonded 64MB Stacked Cache for a 7nm x86-64 CPU. © 2022 IEEE International Solid-State Circuits Conference.
      • J. Wuu¹, R. Agarwal², M. Ciraula¹, C. Dietz¹, B. Johnson¹, D. Johnson¹, R. Schreiber³, R. Swaminathan³, W. Walker¹, S. Naffziger¹
      • ¹AMD, Fort Collins, CO; ²AMD, Santa Clara, CA; ³AMD, Austin, TX
  • 2. Author Introduction: John Wuu
      • B.S. in Electrical Engineering & M.Eng. in Electrical Engineering and Computer Science from MIT, 1997
      • Senior Fellow Design Engineer at AMD in Fort Collins, CO
      • Hewlett-Packard and Intel in Fort Collins, CO between 1997 and 2006
      • Interests include memory technology, memory & cache designs, and 3D design DTCO
  • 3. Outline
      • Motivation
      • AMD 3D V-Cache™ Overview
      • AMD 3D V-Cache™ Technology
      • Architecture
      • Arrays and 3D Circuits
      • Performance
      • Summary
  • 5. Large L3 Caches Provide IPC Uplift
      • [Figure: IPC % uplift (linear scale) vs. L3 cache size, from 1x to 32x]
  • 6. Barriers to Large Cache Size
      • SRAM scaling & cost = barriers to large on-die caches
          • Cost
          • Product flexibility
          • Latency
      • [Figure: "Moore's Law keeps slowing while costs continue to increase": process nodes from 90nm (2004) to 10/7nm (2019); normalized cost per yielded mm² for 45nm through 5nm; label "Chiplets"]
      • [Figure: Silicon Area Scaling by Function (analog, SRAM, logic) from 28nm to 5nm]
      • [1] Naffziger, VLSI Short Course, 2020; [2] Cost per yielded mm² for a 250mm² die
  • 7. More than Moore
      • 2.5D chiplets can provide product flexibility and reduce cost
      • However, 3D can be even better!
          • Improves effective memory latency
          • Reduces the dynamic power of long datapaths and I/Os
          • Fits more transistors within a given package cavity size
      • [Figure: hypothetical processor with large cache]
  • 8. AMD 3D V-Cache™
      • Industry’s first high-performance processor product with Hybrid Bonded 3D cache die
  • 10. AMD 3D V-Cache™ Components: CCD
      • “Zen 3” x86-64 CPU Core Complex Die (CCD)
      • TSMC 7nm technology
      • 8 cores per Core Complex (CCX)
      • 32MB shared L3 Cache
      • +19% IPC (average) vs. “Zen 2” (see endnotes: R5K-003)
      • 81mm²
      • AMD 3D V-Cache™ support integrated from Day 1
  • 11. AMD 3D V-Cache™ Components: L3D
      • AMD 3D V-Cache™ extended L3 Die (L3D)
      • TSMC 7nm FinFET Technology
      • 13 layers Cu + 1 layer Al metal stack
      • 64MB L3 Cache Extension
      • 41mm²
  • 12. AMD 3D V-Cache™ Components: Structural Die
      • AMD 3D V-Cache™ Structural Dies
          • Structural support for thinned CCD
          • Thermal dissipation for CPU cores
  • 14. Physical Organization
      • CCD face-down
          • C4 interface to substrate
          • TSV interface to L3D
      • L3D face-down
          • Hybrid Bonded (HB) to CCD
      • Structural Dies
          • Oxide bonded to CCD
  • 15. Micro Bump vs. Hybrid Bond
      • Compared to Micro Bump 3D solutions, Hybrid Bond offers
          • >15x interconnect density
          • >3x interconnect energy efficiency
          • Superior thermal conductance
      • [Figure: C4, Micro Bump 3D, and Hybrid Bond 3D interconnects compared [1]; the C4 and Micro Bump 3D illustrations are hypothetical]
      • [1] Swaminathan, Hot Chips Tutorial, 2021
      • See endnotes: EPYC-027
  • 16. 3D V-Cache™ Bonding Technology
      • TSMC SoIC process
      • Cu bonded using Bond Pad Metal (BPM) pads
      • BPM interfaces with TSV
      • Bond Pad Via (BPV) connects BPM to M13
      • 9µm minimum TSV pitch
      • [Figure: cross-section of the die interface between top die and bottom die, showing M11, M12, M13, Al, BPV, BPM, TSV, and silicon]
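
The 9µm pitch on the bonding slide gives a feel for where a density advantage over micro bumps can come from: on a regular square grid, connections per unit area scale with the inverse square of the pitch, and endnote EPYC-026 states that the density claim is based on bump pitch. The sketch below is illustrative only. The 36µm micro-bump pitch is an assumed comparison value that does not appear in the slides, and `connections_per_mm2` is a hypothetical helper name.

```python
def connections_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 on a square grid with the given pitch in micrometres."""
    per_side = 1000.0 / pitch_um  # connections along one 1 mm edge
    return per_side ** 2

HYBRID_BOND_PITCH_UM = 9.0   # minimum TSV pitch quoted on the slide
MICRO_BUMP_PITCH_UM = 36.0   # ASSUMPTION: illustrative micro-bump pitch, not from the slides

hybrid_bond_density = connections_per_mm2(HYBRID_BOND_PITCH_UM)
micro_bump_density = connections_per_mm2(MICRO_BUMP_PITCH_UM)
density_ratio = hybrid_bond_density / micro_bump_density

print(f"hybrid bond: {hybrid_bond_density:.0f} per mm^2")
print(f"micro bump:  {micro_bump_density:.0f} per mm^2")
print(f"ratio:       {density_ratio:.1f}x")
```

Under that assumed pitch, a 4x reduction in pitch yields a 16x increase in areal density, which is the kind of quadratic gain the ">15x" claim reflects.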
  • 17. Server Configurations
      • AMD 3rd Gen EPYC™ Server CPU
          • Up to 8 "Zen 3" CCDs
          • 1 I/O Die (IOD)
      • AMD 3rd Gen EPYC™ Server CPU with AMD 3D V-Cache™
          • Up to 8 thinned CCDs + L3Ds
          • Support silicon added to match 2D CCD Z-height
      • Both designs compatible with the same package
      • [Figure: package views with and without 3D stacking, showing CCDs, IOD, and support silicon]
  • 19. "Zen 3" Cache Hierarchy
      • 8 Cores per CCD
      • 32K I-Cache + 32K D-Cache
      • Private 512K L2 per core
      • Shared 32MB L3 between 8 cores
      • 16-way set associative
      • 32B/cycle interface to each core
      • DECTED ECC for enhanced data reliability
      • [Figure: Core 0 through Core 7, each with 3 load / 2 store ports and a 512K 8-way L2 I+D cache, connected at 32B/cycle to an L3 I+D cache of up to 32M, 16-way [4]]
      • [4] Evers, Hot Chips 2021
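
As a worked example of what "32MB, 16-way set associative" implies, the sketch below derives the number of cache lines and sets for the base L3. It assumes a 64-byte cache line, which the slides do not state, and `cache_geometry` is a hypothetical helper, not anything from AMD.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> tuple[int, int]:
    """Return (total_lines, sets) for a set-associative cache."""
    total_lines = size_bytes // line_bytes
    sets = total_lines // ways
    return total_lines, sets

L3_SIZE_BYTES = 32 * 1024 * 1024  # shared 32MB L3 from the slide
L3_WAYS = 16                      # 16-way set associative, from the slide
LINE_BYTES = 64                   # ASSUMPTION: 64-byte line, not stated in the slides

total_lines, sets = cache_geometry(L3_SIZE_BYTES, L3_WAYS, LINE_BYTES)
index_bits = sets.bit_length() - 1  # valid because `sets` is a power of two here

print(f"{total_lines} lines, {sets} sets, {index_bits} index bits")
```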
  • 20. "Zen 3" + AMD 3D V-Cache™
      • 96MB shared L3 Cache between 8 cores
      • 16-way set associative
      • 32B/cycle interface to each core
      • >2 TB/s L3 bandwidth
      • +4 cycles latency
      • Each die’s L3 includes its own
          • Data arrays
          • Tag arrays
          • LRU arrays
      • [Figure: Core 0 through Core 7, each with 3 load / 2 store ports and a 512K 8-way L2 I+D cache, connected at 32B/cycle to an L3 I+D cache of up to 32M, extended to up to 96M total L3 I+D cache, 16-way]
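
The capacity and bandwidth figures on this slide can be sanity-checked with simple arithmetic. The capacity sum uses only numbers from the slides (32MB on the CCD plus 64MB on the L3D). The bandwidth estimate is a rough sketch under two assumptions that the slides do not state: a 4.0 GHz clock, and counting both directions of all eight 32B/cycle per-core interfaces. It shows one combination that lands at this order of magnitude, not AMD's derivation of the ">2 TB/s" figure.

```python
CCD_L3_MB = 32           # from the CCD slide
L3D_MB = 64              # from the L3D slide
total_l3_mb = CCD_L3_MB + L3D_MB

CORES = 8                # from the slide
BYTES_PER_CYCLE = 32     # per-core interface width, from the slide
DIRECTIONS = 2           # ASSUMPTION: read and write paths counted together
CLOCK_HZ = 4.0e9         # ASSUMPTION: illustrative clock, not from the slides

aggregate_bytes_per_s = CORES * BYTES_PER_CYCLE * DIRECTIONS * CLOCK_HZ
aggregate_tb_per_s = aggregate_bytes_per_s / 1e12

print(f"total L3: {total_l3_mb} MB")
print(f"aggregate bandwidth: {aggregate_tb_per_s:.3f} TB/s")
```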
  • 21. Cache Interface Illustration
      • [Figure: TSV columns and the 32 B/cycle bi-directional bus [5]]
      • [5] Burd, ISSCC, 2022
  • 23. "Zen 3" Power Delivery
      • CCD primary supplies
          • RVDD: Ungated supply
          • VDD: Per-core gated supply
          • VDDM: Gated L2/L3 SRAM supply
  • 24. L3D Power Delivery
      • RVDD and VDDM delivered to L3D through power TSVs
      • Power TSVs in channels between CCD array macros
  • 25. L3D SRAM Arrays
      • L3D consists of
          • 512 128KB data macros
          • 1088 6KB tag/LRU macros
      • Dual-rail array design
          • VDDM powers bitcells
          • RVDD powers peripheral circuits
      • L3D arrays optimized for high density and low power
          • HD SRAM bitcell
          • Extensive power reduction features
      • [Figure: 128KB data macro floorplan showing local I/O, global I/O, sense amp, WL buffers, and WL drivers]
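
The macro counts on the L3D SRAM Arrays slide can be checked against the 64MB capacity quoted elsewhere in the deck. The short calculation below uses only the counts and sizes from the slide; the variable names and the "overhead" ratio are illustrative and not terms used by the authors.

```python
DATA_MACROS = 512
DATA_MACRO_KB = 128
TAG_LRU_MACROS = 1088
TAG_LRU_MACRO_KB = 6

data_kb = DATA_MACROS * DATA_MACRO_KB           # total data array capacity
tag_lru_kb = TAG_LRU_MACROS * TAG_LRU_MACRO_KB  # total tag/LRU array capacity

data_mb = data_kb / 1024
tag_lru_mb = tag_lru_kb / 1024
overhead = tag_lru_kb / data_kb  # tag/LRU storage relative to data storage

print(f"data arrays:    {data_mb:.0f} MB")
print(f"tag/LRU arrays: {tag_lru_mb:.3f} MB ({overhead:.1%} of data capacity)")
```

The data macros account for exactly 64MB, matching the L3 extension size, while the tag/LRU macros add roughly 10% more SRAM on top of that.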
  • 26. 3D Signal Interface
      • Simple digital signal interface between dies
      • Enabled by HB technology’s low parasitics
      • [Figure: signal interface circuit with labels TSV, CLK In, CLK Out, Isolate, IsolateX, ESD, and weak]
  • 28. AMD 3D V-Cache™ Product Portfolio
      • AMD 3D V-Cache™ supports L3 Cache extension for both server and desktop product families
      • [Figure: AMD 3rd Gen EPYC™ Server CPU and AMD Ryzen™ 7 5800X3D Gaming CPU]
  • 29. Desktop Performance: AMD Ryzen™ 7 5800X3D with AMD 3D V-Cache™ vs. AMD Ryzen™ 9 5900X
      • ~15% faster gaming at 1080p high
      • [Chart: per-game results, values in chart order: up to 1.36x, up to 1.24x, up to 1.21x, up to 1.16x, up to 1.09x, tie; games in chart order: Watch Dogs®…, Far Cry® 6, Gears 5™, Final Fantasy™ XIV, Shadow of the…, CS:GO™]
      • See endnotes: R5K-106
  • 30. State of the Art Comparison: AMD Ryzen™ 7 5800X3D with AMD 3D V-Cache™ vs. Core i9 12900K
      • World’s fastest gaming processor
      • [Chart: per-game results, values in chart order: up to 1.17x, up to 1.08x, up to 1.06x, up to 1.01x, up to 0.98x, tie; games in chart order: Watch Dogs®…, Far Cry® 6, Gears 5™, Final Fantasy™ XIV, Shadow of the…, CS:GO™]
      • See endnotes: R5K-107
  • 31. Server Performance
      • Faster RTL verification
      • 3rd Gen AMD EPYC™ 16-core with AMD 3D V-Cache™: 40.6 jobs/hour
      • 3rd Gen AMD EPYC™ 16-core without AMD 3D V-Cache™: 24.4 jobs/hour
      • Results may vary. See endnotes: MLNX-001R
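
The two throughput figures on the Server Performance slide translate directly into a speedup and into average time per job. The calculation below uses only the jobs/hour values from the slide; the variable names are illustrative.

```python
JOBS_PER_HOUR_WITH_VCACHE = 40.6
JOBS_PER_HOUR_WITHOUT_VCACHE = 24.4

speedup = JOBS_PER_HOUR_WITH_VCACHE / JOBS_PER_HOUR_WITHOUT_VCACHE
uplift_percent = (speedup - 1.0) * 100.0

# Average minutes per job is the reciprocal of throughput.
minutes_per_job_with = 60.0 / JOBS_PER_HOUR_WITH_VCACHE
minutes_per_job_without = 60.0 / JOBS_PER_HOUR_WITHOUT_VCACHE

print(f"speedup: {speedup:.2f}x ({uplift_percent:.0f}% more jobs per hour)")
print(f"minutes per job: {minutes_per_job_with:.2f} vs {minutes_per_job_without:.2f}")
```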
  • 33. Summary
      • AMD 3D V-Cache™: the industry’s first 3D stacked cache for high performance processors utilizing Hybrid Bond technology
      • Product definition and design co-optimized up front with technology development
      • AMD 3D V-Cache™ is compatible with "Zen 3" server and desktop products, extending the L3 to provide performance uplift
  • 34. Acknowledgement
      • The authors would like to thank TSMC R&D, DTP, BD, and CSV organizations for their collaboration and support, and colleagues in AMD Cores, Advanced Packaging Technology, Product Development Engineering, Device Analysis Labs, and Foundry Technology Operations for their contributions.
  • 35. Endnotes
      • R5K-003: Testing by AMD performance labs as of 09/01/2020. IPC evaluated with a selection of 25 workloads running at a locked 4GHz frequency on 8-core "Zen 2" Ryzen 7 3800XT and "Zen 3" Ryzen 7 5800X desktop processors configured with Windows® 10, NVIDIA GeForce RTX 2080 Ti (451.77), Samsung 860 Pro SSD, and 2x8GB DDR4-3600. Results may vary.
      • EPYC-026: Based on calculated areal density and based on bump pitch between AMD hybrid bond AMD 3D V-Cache stacked technology compared to AMD 2D chiplet technology and Intel 3D stacked micro-bump technology.
      • EPYC-027: Based on AMD internal simulations and published Intel data on “Foveros” technology specifications.
      • R5K-106: Based on testing by AMD as of 12/14/2021. Performance evaluated with Watch Dogs Legion, Far Cry 6, Gears 5, Final Fantasy XIV, Shadow of the Tomb Raider and CS:GO. All games test at 1920x1080 resolution with the HIGH in-game quality preset (or equivalent). System configuration: Ryzen 7 5800X3D and AMD Reference Motherboard, Ryzen 9 5900X and ASUS Crosshair VIII Hero with BIOS 3801. Both systems configured with 2x8GB DDR4-3600, GeForce RTX 3080 with 472.12 driver, Samsung 980 Pro 1TB, NZXT Kraken X62, and Windows 11 28000.282.
      • R5K-107: Based on testing by AMD as of 12/14/2021. Performance evaluated with Watch Dogs Legion, Far Cry 6, Gears 5, Final Fantasy XIV, Shadow of the Tomb Raider and CS:GO. All games test at 1920x1080p resolution with the HIGH in-game quality preset (or equivalent). System configuration: Ryzen 7 5800X3D and AMD Reference Motherboard with 2x8GB DDR4-3600. Core i9-12900K and ROG Maximus Z690 Hero motherboard with BIOS 0702 and 2x16GB DDR5-5200. Both systems configured with GeForce RTX 3080 on driver 472.12, Samsung 980 Pro 1TB, NZXT Kraken X62, Windows 11 28000.282.
      • MLNX-001R: EDA RTL Simulation comparison based on AMD internal testing completed on 9/20/2021 measuring the average time to complete a test case simulation. Comparing: 1x 16C 3rd Gen EPYC CPU with AMD 3D V-Cache Technology versus 1x 16C AMD EPYC™ 73F3 on the same AMD “Daytona” reference platform. Results may vary based on factors including silicon version, hardware and software configuration and driver versions.
  • 36. Disclaimer and Copyright: ©2022 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, Ryzen, Infinity fabric, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. This information is provided "as is." AMD makes no representations or warranties with respect to the contents hereof and assumes no responsibility for any inaccuracies, errors, or omissions that may appear in this information. AMD specifically disclaims any implied warranties of non-infringement, merchantability, or fitness for any particular purpose.
In no event will AMD be liable to any person for any reliance, direct, indirect, special, or other consequential damages arising from the use of any information contained herein, even if AMD is expressly advised of the possibility of such damages.