3D V-Cache
- 1. 26.4: 3D V-Cache: The Implementation of a Hybrid-Bonded 64MB Stacked Cache for a 7nm x86-64 CPU
© 2022 IEEE
International Solid-State Circuits Conference 1 of 36
3D V-Cache™: The Implementation of a Hybrid-Bonded 64MB Stacked Cache for a 7nm x86-64 CPU
J. Wuu¹, R. Agarwal², M. Ciraula¹, C. Dietz¹, B. Johnson¹, D. Johnson¹,
R. Schreiber³, R. Swaminathan³, W. Walker¹, S. Naffziger¹
¹AMD, Fort Collins, CO; ²AMD, Santa Clara, CA; ³AMD, Austin, TX
Author Introduction: John Wuu
B.S. in Electrical Engineering and M.Eng. in Electrical Engineering and Computer Science from MIT, 1997
Senior Fellow Design Engineer at AMD in Fort Collins, CO
Previously with Hewlett-Packard and Intel in Fort Collins, CO, between 1997 and 2006
Interests include memory technology, memory and cache design, and 3D design-technology co-optimization (DTCO)
Outline
Motivation
AMD 3D V-Cache™ Overview
AMD 3D V-Cache™: Technology, Architecture, Arrays and 3D Circuits
Performance
Summary
Large L3 Caches Provide IPC Uplift
[Figure: IPC % uplift (linear scale) vs. L3 cache size, from 1x to 32x]
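As a rough illustration of the trend in this chart (not AMD's methodology), the sketch below assumes the "square-root rule" of thumb, in which miss rate scales roughly as cache size to the -0.5 power, plus hypothetical values for the baseline miss rate (`BASE_MISS_RATE`), L3 hit latency (`L3_HIT_CYCLES`), and DRAM penalty (`DRAM_PENALTY_CYCLES`). It estimates the average memory access time (AMAT) of requests that reach the L3 across the 1x to 32x sweep.

```python
"""Toy model of L3 size vs. average memory access time (AMAT).

Assumptions (hypothetical, not from the paper): the miss rate follows the
"square-root rule" power law, and the latencies below are illustrative.
"""

BASE_MISS_RATE = 0.40        # hypothetical L3 miss rate at the 1x size
EXPONENT = -0.5              # square-root rule of thumb
L3_HIT_CYCLES = 46.0         # hypothetical L3 hit latency, held constant
DRAM_PENALTY_CYCLES = 300.0  # hypothetical extra cycles for an L3 miss


def l3_miss_rate(size_multiple: float) -> float:
    """Estimated miss rate for an L3 that is `size_multiple` times the baseline."""
    return BASE_MISS_RATE * size_multiple ** EXPONENT


def amat_cycles(size_multiple: float) -> float:
    """AMAT for requests reaching the L3: hit time + miss rate * miss penalty."""
    return L3_HIT_CYCLES + l3_miss_rate(size_multiple) * DRAM_PENALTY_CYCLES


if __name__ == "__main__":
    for multiple in (1, 2, 4, 8, 16, 32):
        print(f"{multiple:>2}x L3: miss rate {l3_miss_rate(multiple):.3f}, "
              f"AMAT {amat_cycles(multiple):.1f} cycles")
```

Because the model holds hit latency constant, it overstates the benefit of very large caches; real arrays get slower as they grow.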
Barriers to Large Cache Size
SRAM scaling and cost are barriers to large on-die caches:
Cost
Product flexibility
Latency
Chiplets
Moore's law keeps slowing while costs continue to increase.
[Figure: Normalized cost per yielded mm² by process node, 45nm to 5nm [2]]
[Figure: Process node progression, 90nm to 10/7nm, over 2004 to 2019]
[Figure: Silicon area scaling by function (logic, SRAM, analog), 28nm to 5nm]
[1] Naffziger, VLSI Short Course, 2020. [2] Cost per yielded mm² for a 250mm² die.
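Footnote [2] normalizes cost per yielded mm² for a 250mm² die. A first-order way to see why that metric rises with process cost and die size is a Poisson yield model, Y = exp(-A·D0). The sketch below uses that textbook model with a standard dies-per-wafer approximation; the wafer cost and defect density (`wafer_cost`, `defects_per_cm2`) are hypothetical inputs, not AMD or TSMC data.

```python
"""Cost per yielded mm^2 under a Poisson defect model.

All inputs (wafer cost, defect density) are hypothetical; the formulas are
standard first-order textbook approximations, not AMD's cost model.
"""
import math

WAFER_DIAMETER_MM = 300.0


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of good dies: Y = exp(-A * D0), with A converted to cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def gross_dies_per_wafer(die_area_mm2: float) -> float:
    """Common approximation for whole dies per wafer, including edge loss."""
    radius = WAFER_DIAMETER_MM / 2.0
    return (math.pi * radius ** 2 / die_area_mm2
            - math.pi * WAFER_DIAMETER_MM / math.sqrt(2.0 * die_area_mm2))


def cost_per_yielded_mm2(die_area_mm2: float, wafer_cost: float,
                         defects_per_cm2: float) -> float:
    """Wafer cost spread over the silicon area of the good dies only."""
    good_dies = (gross_dies_per_wafer(die_area_mm2)
                 * poisson_yield(die_area_mm2, defects_per_cm2))
    return wafer_cost / (good_dies * die_area_mm2)


if __name__ == "__main__":
    for area in (50.0, 100.0, 250.0):  # die areas in mm^2
        cost = cost_per_yielded_mm2(area, wafer_cost=10_000.0, defects_per_cm2=0.1)
        print(f"{area:>5.0f} mm^2 die: {cost:.4f} cost units per yielded mm^2")
```

With these made-up inputs, the 250mm² die costs more per good mm² than the smaller dies, which is the basic economic argument for splitting large designs into chiplets.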
More than Moore
2.5D chiplets can provide product flexibility and reduce cost.
However, 3D can be even better!
Improves effective memory latency
Reduces the dynamic power of long datapaths and I/O
Fits more transistors within a given package cavity size
[Figure: hypothetical processor with a large cache]
AMD 3D V-Cache™
Industry's first high-performance processor product with a hybrid-bonded 3D cache die
AMD 3D V-Cache™ Components: CCD
“Zen 3” x86-64 CPU Core Complex Die (CCD)
TSMC 7nm technology
8 cores per Core Complex (CCX)
32MB shared L3 cache
+19%¹ IPC (average) vs. “Zen 2”
81mm²
AMD 3D V-Cache™ support integrated from day 1
¹See endnotes: R5K-003
AMD 3D V-Cache™ Components: L3D
AMD 3D V-Cache™ extended L3 Die (L3D)
TSMC 7nm FinFET technology
Metal stack: 13 Cu layers + 1 Al layer
64MB L3 cache extension
41mm²
AMD 3D V-Cache™ Components: Structural Die
AMD 3D V-Cache™ structural dies
Structural support for the thinned CCD
Thermal dissipation for the CPU cores
Physical Organization
CCD face-down: C4 interface to substrate; TSV interface to L3D
L3D face-down: hybrid bonded (HB) to CCD
Structural dies: oxide bonded to CCD
Micro Bump vs. Hybrid Bond
Compared to micro bump 3D solutions, hybrid bond offers:
>15x interconnect density
>3x interconnect energy efficiency
Superior thermal conductance
[Figure: C4, micro bump 3D, and hybrid bond (HB) 3D interconnect illustrations [1]]
[1] Swaminathan, Hot Chips Tutorial, 2021
C4 and micro bump 3D illustrations are hypothetical.
See endnotes: EPYC-027
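To first order, the energy to move one bit across a die-to-die link is α·C·V², where C is the total switched capacitance (driver, pad, bump or bond, receiver) and α is the probability of a 0-to-1 transition. The sketch below assumes that model and made-up capacitances (`micro_bump_ff`, `hybrid_bond_ff`); it only shows why lower interconnect parasitics translate directly into an energy-efficiency gain.

```python
"""First-order dynamic energy per bit for a die-to-die connection.

E_bit = alpha * C * V^2, where C is the total switched capacitance of the
link and alpha is the 0->1 transition probability. All values are hypothetical.
"""


def energy_per_bit_fj(cap_ff: float, vdd_v: float, activity: float = 0.25) -> float:
    """Energy per transferred bit in femtojoules (fF * V^2 = fJ).

    activity: probability of a 0->1 transition per bit per transfer
    (0.25 for uniformly random data).
    """
    return activity * cap_ff * vdd_v ** 2


def efficiency_ratio(cap_coarse_ff: float, cap_fine_ff: float, vdd_v: float) -> float:
    """How many times less energy per bit the lower-capacitance link uses."""
    return energy_per_bit_fj(cap_coarse_ff, vdd_v) / energy_per_bit_fj(cap_fine_ff, vdd_v)


if __name__ == "__main__":
    vdd = 0.75             # hypothetical signaling voltage
    micro_bump_ff = 60.0   # hypothetical switched capacitance, micro bump link
    hybrid_bond_ff = 15.0  # hypothetical switched capacitance, hybrid bond link
    print(f"micro bump:  {energy_per_bit_fj(micro_bump_ff, vdd):.2f} fJ/bit")
    print(f"hybrid bond: {energy_per_bit_fj(hybrid_bond_ff, vdd):.2f} fJ/bit")
    print(f"ratio: {efficiency_ratio(micro_bump_ff, hybrid_bond_ff, vdd):.1f}x")
```

At equal voltage the ratio reduces to the capacitance ratio, so V drops out of the comparison.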
AMD 3D V-Cache™ Bonding Technology
[Figure: cross-section of the top and bottom dies at the die interface, showing the metal stacks (M11 to M13, Al), the bond pad metal (BPM) and bond pad via (BPV), and a TSV through silicon]
TSMC SoIC process
Cu bonded using bond pad metal (BPM) pads
BPM interfaces with TSV
Bond pad via (BPV) connects BPM to M13
9µm minimum TSV pitch
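Areal interconnect density scales with the inverse square of pitch. Assuming a uniform square grid at the 9µm minimum TSV pitch (an idealization; real layouts also need keep-out and routing space), the sketch below converts pitch to connections per mm² and shows the pitch shrink that a given density gain implies. The function names are illustrative.

```python
"""Areal interconnect density for a square grid of connections.

Assumes a uniform square array at the stated pitch; real bond and TSV
layouts are not uniform, so these are upper-bound style estimates.
"""
import math


def connections_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 on a square grid with the given pitch."""
    return (1000.0 / pitch_um) ** 2


def density_ratio(fine_pitch_um: float, coarse_pitch_um: float) -> float:
    """How many times denser the fine-pitch grid is than the coarse one."""
    return (coarse_pitch_um / fine_pitch_um) ** 2


def pitch_ratio_for_density(ratio: float) -> float:
    """Pitch shrink factor needed for a density gain (density ~ 1/pitch^2)."""
    return math.sqrt(ratio)


if __name__ == "__main__":
    print(f"9 um pitch: {connections_per_mm2(9.0):,.0f} connections per mm^2")
    print(f"15x density needs a {pitch_ratio_for_density(15.0):.2f}x finer pitch")
```

Under this square-grid assumption, a >15x density advantage corresponds to roughly a 3.9x finer pitch.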
Server Configurations
AMD 3rd Gen EPYC™ Server CPU: up to 8 "Zen 3" CCDs, 1 I/O Die (IOD)
AMD 3rd Gen EPYC™ Server CPU with AMD 3D V-Cache™: up to 8 thinned CCDs + L3Ds; support silicon added to match 2D CCD Z-height
Both designs are compatible with the same package.
[Figure: CCD and IOD package views with and without 3D stacking, including support silicon]
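The per-die capacities on the component slides (32MB on each CCD, 64MB on each L3D) make the package totals simple arithmetic. The sketch below tallies them for up to 8 CCDs; the function names are illustrative.

```python
"""Aggregate L3 capacity for a server package, from the per-die figures.

Each CCD's L3 is shared only by that CCD's 8 cores, so the package total is
a sum of separate per-CCD caches, not one unified cache.
"""

CCD_L3_MB = 32  # on-die L3 per "Zen 3" CCD
L3D_MB = 64     # stacked L3D extension per CCD


def l3_per_ccd_mb(stacked: bool) -> int:
    """L3 capacity visible to one CCD's cores."""
    return CCD_L3_MB + (L3D_MB if stacked else 0)


def package_l3_mb(num_ccds: int, stacked: bool) -> int:
    """Sum of the per-CCD L3 capacities in one package."""
    if not 1 <= num_ccds <= 8:
        raise ValueError("server configurations use up to 8 CCDs")
    return num_ccds * l3_per_ccd_mb(stacked)


if __name__ == "__main__":
    for stacked in (False, True):
        label = "with" if stacked else "without"
        print(f"8 CCDs {label} 3D V-Cache: {package_l3_mb(8, stacked)} MB total L3")
```

With 8 CCDs this gives 256MB without stacking and 768MB with it, while each group of 8 cores still sees at most 96MB.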
"Zen 3" Cache Hierarchy
8 cores per CCD
32KB I-cache + 32KB D-cache
Private 512KB L2 per core
Shared 32MB L3 between 8 cores
16-way set associative
32B/cycle interface to each core
DECTED (double-error-correct, triple-error-detect) ECC for enhanced data reliability
[Figure: cores 0 to 7, each with 3 load / 2 store and a private 512K 8-way I+D L2, connected at 32B/cycle to up to 32M of shared 16-way I+D L3 [4]]
[4] Evers, Hot Chips 2021
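The geometry of the 32MB, 16-way L3 follows from standard set-associative arithmetic. The sketch below assumes 64-byte lines and a flat, power-of-two set index; the real L3 may distribute lines across slices with its own address mapping, which this model ignores. It also shows that a 64B line crosses the 32B/cycle core interface in two cycles. `num_sets` and `split_address` are hypothetical helpers.

```python
"""Set-associative geometry for the 32MB, 16-way L3 on one CCD.

Assumes 64-byte cache lines and a flat, power-of-two index; any slicing or
address hashing in the real design is ignored.
"""

LINE_BYTES = 64       # assumed line size
INTERFACE_BYTES = 32  # 32B/cycle interface to each core


def num_sets(capacity_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets = capacity / (ways * line size)."""
    return capacity_bytes // (ways * line_bytes)


def split_address(addr: int, sets: int, line_bytes: int = LINE_BYTES) -> tuple:
    """Return (tag, set_index, byte_offset) for a flat power-of-two index."""
    if sets & (sets - 1) or line_bytes & (line_bytes - 1):
        raise ValueError("sets and line_bytes must be powers of two")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    sets = num_sets(32 * 2**20, ways=16)
    print(f"sets: {sets}")  # 32768
    print(f"cycles per line at 32B/cycle: {LINE_BYTES // INTERFACE_BYTES}")
    print(split_address(0x12345678, sets))  # (145, 20825, 56)
```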
"Zen 3" + AMD 3D V-Cache™
96MB shared L3 cache between 8 cores
16-way set associative
32B/cycle interface to each core
>2 TB/s L3 bandwidth
+4 cycles latency
Each die's L3 includes its own data arrays, tag arrays, and LRU arrays
[Figure: the "Zen 3" hierarchy (cores 0 to 7 with 3 load / 2 store, 512K 8-way I+D L2, 32B/cycle links to up to 32M of 16-way I+D L3 on the CCD), extended to up to 96M of total 16-way I+D L3]
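The +4 cycle adder is worthwhile only if the larger L3 removes enough misses. Using AMAT = hit time + miss rate × miss penalty, the break-even absolute drop in L3 miss rate is the extra hit latency divided by the miss penalty. The sketch below takes the +4 cycles from the slide; the DRAM penalty, base hit latency, and miss rates are hypothetical.

```python
"""Break-even check for adding L3 hit latency in exchange for capacity.

AMAT = hit_time + miss_rate * miss_penalty. All inputs are hypothetical
except the +4 cycle latency adder quoted on the slide.
"""

EXTRA_HIT_CYCLES = 4.0  # "+4 cycles latency" for the stacked configuration


def amat(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time in cycles."""
    return hit_cycles + miss_rate * miss_penalty_cycles


def breakeven_miss_rate_drop(extra_hit_cycles: float, miss_penalty_cycles: float) -> float:
    """Minimum absolute drop in L3 miss rate that pays for the extra hit latency."""
    return extra_hit_cycles / miss_penalty_cycles


if __name__ == "__main__":
    penalty = 300.0               # hypothetical DRAM penalty in core cycles
    base_hit = 46.0               # hypothetical 32MB L3 hit latency
    base_mr, big_mr = 0.40, 0.25  # hypothetical miss rates at 32MB and 96MB
    drop = breakeven_miss_rate_drop(EXTRA_HIT_CYCLES, penalty)
    print(f"break-even miss-rate drop: {drop:.4f}")
    print(f"32MB AMAT: {amat(base_hit, base_mr, penalty):.1f} cycles")
    print(f"96MB AMAT: {amat(base_hit + EXTRA_HIT_CYCLES, big_mr, penalty):.1f} cycles")
```

With a 300-cycle penalty, the larger L3 needs to cut the miss rate by only about 1.3 percentage points to break even; workloads that already fit in 32MB see just the added latency.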
Cache Interface Illustration
[Figure: TSV columns carrying a 32 B/cycle bi-directional bus [5]]
[5] Burd, ISSCC, 2022
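Peak bandwidth of a fixed-width interface is bytes per cycle times clock frequency, times the number of such interfaces. The sketch below applies that to a 32B/cycle bus; the 4.0 GHz clock and the interface count are hypothetical, and the slide does not say how the bi-directional bus splits its width between directions.

```python
"""Peak bandwidth of fixed-width per-cycle interfaces.

bandwidth = bytes_per_cycle * clock_frequency * number_of_interfaces.
The frequency and interface count used below are hypothetical.
"""


def peak_bandwidth_gb_s(bytes_per_cycle: int, freq_ghz: float, interfaces: int = 1) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_per_cycle * freq_ghz * interfaces


if __name__ == "__main__":
    one = peak_bandwidth_gb_s(32, 4.0)
    eight = peak_bandwidth_gb_s(32, 4.0, interfaces=8)
    print(f"one 32B/cycle interface at 4.0 GHz: {one:.0f} GB/s")
    print(f"eight such interfaces: {eight:.0f} GB/s")
```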
"Zen 3" Power Delivery
CCD primary supplies:
RVDD: ungated supply
VDD: per-core gated supply
VDDM: gated L2/L3 SRAM supply
L3D Power Delivery
RVDD and VDDM delivered to L3D through power TSVs
Power TSVs placed in channels between CCD array macros
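The supply arrangement on the two power-delivery slides can be captured as a small lookup table, which makes it easy to check that every L3D circuit sits on a rail that actually crosses the power TSVs. Rail names and roles come from the slides, as does the dual-rail split (bitcells on VDDM, periphery on RVDD) from the L3D SRAM Arrays slide; marking VDD as not delivered reflects only that it is not listed among the L3D rails. The `Rail` class and `undelivered_loads` helper are hypothetical.

```python
"""Which supply rails reach the L3D, as a small lookup model.

Rail names and properties follow the slides; the data structure and the
consistency check are illustrative only.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Rail:
    name: str
    gating: str
    delivered_to_l3d: bool  # brought up to the L3D through power TSVs


RAILS = {
    r.name: r
    for r in (
        Rail("RVDD", "ungated", True),
        Rail("VDD", "per-core gated", False),
        Rail("VDDM", "gated L2/L3 SRAM supply", True),
    )
}

# L3D circuit -> supply rail (dual-rail arrays: bitcells on VDDM, periphery on RVDD)
L3D_LOADS = {"bitcells": "VDDM", "peripheral circuits": "RVDD"}


def undelivered_loads(loads: dict, rails: dict) -> list:
    """Return loads whose rail is not delivered to the L3D."""
    return [load for load, rail in loads.items() if not rails[rail].delivered_to_l3d]


if __name__ == "__main__":
    print(sorted(r.name for r in RAILS.values() if r.delivered_to_l3d))  # ['RVDD', 'VDDM']
    print(undelivered_loads(L3D_LOADS, RAILS))  # []
```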
L3D SRAM Arrays
L3D consists of:
512 128KB data macros
1088 6KB tag/LRU macros
Dual-rail array design:
VDDM powers bitcells
RVDD powers peripheral circuits
L3D arrays optimized for high density and low power:
High-density (HD) SRAM bitcell
Extensive power reduction features
[Figure: 128KB data macro with WL drivers, WL buffers, sense amps, and local and global I/O]
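The macro counts above can be cross-checked against the 64MB capacity. The sketch below uses binary units (1 KB = 1024 bytes) and the 41mm² L3D die area from the component slide; the resulting MB per mm² is a whole-die average that includes tag/LRU arrays and periphery, not a bitcell density.

```python
"""Capacity bookkeeping for the L3D arrays, using the slide's macro counts.

KB and MB are binary (1 KB = 1024 bytes). The density figure is a whole-die
average over the 41mm^2 L3D, not a bitcell density.
"""

DATA_MACROS, DATA_MACRO_KB = 512, 128
TAG_LRU_MACROS, TAG_LRU_MACRO_KB = 1088, 6
L3D_AREA_MM2 = 41.0

data_kb = DATA_MACROS * DATA_MACRO_KB           # 65536 KB
tag_lru_kb = TAG_LRU_MACROS * TAG_LRU_MACRO_KB  # 6528 KB
data_mb = data_kb / 1024                        # 64.0 MB

if __name__ == "__main__":
    print(f"data arrays: {data_mb:.0f} MB")
    print(f"tag/LRU arrays: {tag_lru_kb} KB ({tag_lru_kb / data_kb:.1%} of data)")
    print(f"average data capacity density: {data_mb / L3D_AREA_MM2:.2f} MB per mm^2")
```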
3D Signal Interface
Simple digital signal interface between dies, enabled by HB technology's low parasitics
[Figure: TSV signal interface circuit with CLK In/CLK Out, IsolateX and Isolate controls, ESD protection, and a weak device]
AMD 3D V-Cache™ Product Portfolio
AMD 3D V-Cache™ supports L3 cache extension for both server and desktop product families:
AMD 3rd Gen EPYC™ Server CPU
AMD Ryzen™ 7 5800X3D Gaming CPU
Desktop Performance
AMD Ryzen™ 7 5800X3D with AMD 3D V-Cache™ vs. AMD Ryzen™ 9 5900X:
Watch Dogs®…: up to 1.36X
Far Cry® 6: up to 1.24X
Gears 5™: up to 1.21X
Final Fantasy™ XIV: up to 1.16X
Shadow of the…: up to 1.09X
CS:GO™: tie
~15% faster gaming at 1080p high
See endnotes: R5K-106
State of the Art Comparison
AMD Ryzen™ 7 5800X3D with AMD 3D V-Cache™ vs. Intel Core i9-12900K:
Watch Dogs®…: up to 1.17X
Far Cry® 6: up to 1.08X
Gears 5™: up to 1.06X
Final Fantasy™ XIV: up to 1.01X
Shadow of the…: up to 0.98X
CS:GO™: tie
World's fastest gaming processor
See endnotes: R5K-107
Server Performance: Faster RTL Verification
3rd Gen AMD EPYC™ 16-core with AMD 3D V-Cache™: 40.6 jobs/hour
3rd Gen AMD EPYC™ 16-core without AMD 3D V-Cache™: 24.4 jobs/hour
Results may vary. See endnotes: MLNX-001R
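Because the results are throughput (jobs per hour), the speedup is the ratio of the two figures, and the time per job scales with the inverse. The short calculation below uses only the two numbers on the slide.

```python
"""Throughput and time-per-job comparison from the RTL verification results."""

WITH_VCACHE_JOBS_PER_HOUR = 40.6
WITHOUT_VCACHE_JOBS_PER_HOUR = 24.4

# Throughput ratio: how many more jobs per hour the stacked part completes.
speedup = WITH_VCACHE_JOBS_PER_HOUR / WITHOUT_VCACHE_JOBS_PER_HOUR
# Time per job is the inverse of throughput, so the saving is 1 - 1/speedup.
time_saved = 1.0 - WITHOUT_VCACHE_JOBS_PER_HOUR / WITH_VCACHE_JOBS_PER_HOUR

if __name__ == "__main__":
    print(f"throughput ratio: {speedup:.2f}x")          # 1.66x
    print(f"time per job reduced by {time_saved:.1%}")  # 39.9%
```

That is about 1.66x the throughput, or roughly 40% less time per job.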
Summary
AMD 3D V-Cache™: the industry's first 3D stacked cache for high-performance processors using hybrid bond technology
Product definition and design co-optimized up front with technology development
AMD 3D V-Cache™ is compatible with "Zen 3" server and desktop products, extending the L3 to provide performance uplift
Acknowledgement
The authors thank the TSMC R&D, DTP, BD, and CSV organizations for their collaboration and support, and colleagues in AMD Cores, Advanced Packaging Technology, Product Development Engineering, Device Analysis Labs, and Foundry Technology Operations for their contributions.
Endnotes
R5K-003: Testing by AMD performance labs as of 09/01/2020. IPC evaluated with a selection of 25 workloads running at a locked 4GHz frequency on 8-core "Zen 2" Ryzen 7 3800XT and "Zen 3" Ryzen 7 5800X desktop processors configured with Windows® 10, NVIDIA GeForce RTX 2080 Ti (451.77), Samsung 860 Pro SSD, and 2x8GB DDR4-3600. Results may vary.
EPYC-026: Based on calculated areal density and on bump pitch between AMD hybrid bond AMD 3D V-Cache stacked technology compared to AMD 2D chiplet technology and Intel 3D stacked micro-bump technology.
EPYC-027: Based on AMD internal simulations and published Intel data on “Foveros” technology specifications.
R5K-106: Based on testing by AMD as of 12/14/2021. Performance evaluated with Watch Dogs Legion, Far Cry 6, Gears 5, Final Fantasy XIV, Shadow of the Tomb Raider, and CS:GO. All games tested at 1920x1080 resolution with the HIGH in-game quality preset (or equivalent). System configuration: Ryzen 7 5800X3D and AMD Reference Motherboard; Ryzen 9 5900X and ASUS Crosshair VIII Hero with BIOS 3801. Both systems configured with 2x8GB DDR4-3600, GeForce RTX 3080 with 472.12 driver, Samsung 980 Pro 1TB, NZXT Kraken X62, and Windows 11 28000.282.
R5K-107: Based on testing by AMD as of 12/14/2021. Performance evaluated with Watch Dogs Legion, Far Cry 6, Gears 5, Final Fantasy XIV, Shadow of the Tomb Raider, and CS:GO. All games tested at 1920x1080 resolution with the HIGH in-game quality preset (or equivalent). System configuration: Ryzen 7 5800X3D and AMD Reference Motherboard with 2x8GB DDR4-3600; Core i9-12900K and ROG Maximus Z690 Hero motherboard with BIOS 0702 and 2x16GB DDR5-5200. Both systems configured with GeForce RTX 3080 on driver 472.12, Samsung 980 Pro 1TB, NZXT Kraken X62, and Windows 11 28000.282.
MLNX-001R: EDA RTL simulation comparison based on AMD internal testing completed on 9/20/2021 measuring the average time to complete a test case simulation, comparing 1x 16C 3rd Gen EPYC CPU with AMD 3D V-Cache Technology versus 1x 16C AMD EPYC™ 73F3 on the same AMD “Daytona” reference platform. Results may vary based on factors including silicon version, hardware and software configuration, and driver versions.
Disclaimer and Copyright
©2022 Advanced Micro Devices, Inc. All rights reserved.
AMD, the AMD Arrow logo, EPYC, Ryzen, Infinity Fabric, and combinations thereof are trademarks of Advanced Micro Devices, Inc.
Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and
typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons,
including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any
computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to
update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes
from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
This information is provided "as is." AMD makes no representations or warranties with respect to the contents hereof and assumes no
responsibility for any inaccuracies, errors, or omissions that may appear in this information. AMD specifically disclaims any implied
warranties of non-infringement, merchantability, or fitness for any particular purpose. In no event will AMD be liable to any person for
any reliance, direct, indirect, special, or other consequential damages arising from the use of any information contained herein, even if
AMD is expressly advised of the possibility of such damages.