How many cores will we need?
Chien-Ping Lu, PhD
Sr. Director, MediaTek Inc.
A group of hippos is called …

A Crash

| how many cores will we need? | December 4, 2013 | Confidential
A group of crows is called …

A Murder

A group of giraffes is called …

From Wikipedia

A Tower

So, it is not surprising that we use

“A Parade” of elephants

“A Herd” of sheep

“An Army” of ants
From frequency to multicore scaling

[Figure: performance over time. Labels: frequency, power, serial computing, parallel computing, power wall (2005).]

How many cores will we need?

[Figure: performance over time. Labels: moderate, massive.]

Dark silicon (or dark cores)?

[Figure: performance over time. Labels: 2x; 4x  3x; 8x  4x; 16x  4x.]

Light up the cores

Redefine the cores to be heterogeneous

Dark silicon: a concern on power
Amdahl’s law: an argument against parallel computing

[Figure: power against degree of parallelism (number of cores). Labels: power ceiling, parallelism wall, big cores, little cores, GPU-style “cores”, body tracking, ray tracing.]

The elephants: CPU cores

For multiple-instruction-multiple-data (MIMD) execution

Retrofitted for moderately parallel workloads, and not very efficient for massively parallel workloads

A CPU core runs 1 iteration of the parallel loop

[Figure: Labels: Parallel.For (…), Else, Front End, ALU. The same color means the same piece of code.]

An army of ants: SIMT cores

For SIMT (single-instruction-multiple-thread) execution

A SIMT core runs 1 iteration of the parallel loop

SIMT is the execution model of HSA and is implemented in modern GPUs, with MIMD flexibility and SIMD efficiency

A cluster of SIMT cores shares one front end in a SIMD manner

Can achieve better power efficiency with more specialized function units given the right workload

[Figure: Labels: Parallel.For (…), Else, Front End, ALU, SFU 0, SFU 1.]

A branch is emulated through divergence
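
The statement that a branch is emulated through divergence can be illustrated with a small Python sketch. It is a conceptual model only, not a description of how any particular GPU is built: `simt_if_else`, the lane mask, and the pass counter are hypothetical names introduced here. All lanes share one instruction stream, so the two sides of the branch run one after the other, each with the non-participating lanes masked off.

```python
from typing import Callable, List, Tuple


def simt_if_else(
    values: List[int],
    predicate: Callable[[int], bool],
    then_op: Callable[[int], int],
    else_op: Callable[[int], int],
) -> Tuple[List[int], int]:
    """Apply an if/else to every lane of a SIMT cluster (conceptual model).

    Returns the per-lane results and the number of serialized passes
    issued by the shared front end (a hypothetical cost measure).
    """
    # Each lane evaluates the branch condition on its own data.
    mask = [predicate(v) for v in values]
    results = list(values)
    passes = 0

    # Pass 1: the "then" side, executed only by lanes whose mask bit is set.
    if any(mask):
        passes += 1
        results = [then_op(v) if m else r
                   for v, m, r in zip(values, mask, results)]

    # Pass 2: the "else" side, executed only by the remaining lanes.
    if not all(mask):
        passes += 1
        results = [else_op(v) if not m else r
                   for v, m, r in zip(values, mask, results)]

    return results, passes


def is_even(v: int) -> bool:
    return v % 2 == 0


def times_ten(v: int) -> int:
    return v * 10


def negate(v: int) -> int:
    return -v


# Lanes disagree on the condition: both sides are issued.
divergent = simt_if_else([1, 2, 3, 4], is_even, times_ten, negate)
# Lanes agree on the condition: only one side is issued.
uniform = simt_if_else([2, 4, 6, 8], is_even, times_ten, negate)
print(divergent)
print(uniform)
```

In this model a uniform branch costs one pass and a divergent branch costs two, which is the price of sharing one front end across the cluster.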
Properties of massively data-parallel workloads
• Problem size N of the parallel workload can keep growing
• Visible serial workload s can be kept constant
• Communication overhead is proportional to log P (by a factor of r)
• Parallel workload is speeded up linearly by P, the number of cores
• "Embarrassingly" parallel, when there is no communication overhead (r=0)

[Figure: time segments s, N, r log P, and N/P. Label: time saved by P cores.]

Revisiting Amdahl's law for trend prediction
$$\text{Speedup} = \frac{s + N}{s + r \log P + \dfrac{N}{P}}$$
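
As a rough sketch, the speedup model built from the workload properties (serial part s, parallel part N, communication overhead r log P, parallel time N/P) can be evaluated numerically. The function name `speedup`, the sample values, and the base-2 logarithm are assumptions made for illustration; the slides do not state the base of the logarithm or any concrete values of s, r, and N.

```python
import math


def speedup(P: int, N: float, s: float = 1.0, r: float = 0.0) -> float:
    """Speedup of P cores over one core: (s + N) / (s + r*log2(P) + N/P).

    s: visible serial workload, N: parallel workload (problem size),
    r: communication overhead factor. The base-2 logarithm is an assumption.
    """
    if P < 1:
        raise ValueError("P must be at least 1")
    time_one_core = s + N
    time_p_cores = s + r * math.log2(P) + N / P
    return time_one_core / time_p_cores


# Illustrative values only: a small and a large problem size, with s = 1.
for n in (1e3, 1e6):
    for p in (1, 16, 256, 4096):
        value = speedup(p, n, s=1.0, r=0.5)
        print(f"N={n:.0e}  P={p:5d}  speedup={value:8.1f}")
```

Under these assumed values, the larger problem size keeps the speedup closer to P as cores are added, while the smaller one flattens out earlier. That matches the premise that N can keep growing while s stays constant.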
MediaTek face beautification

When it comes to beauty, there seems to be no limit

Before

Skin tone adjustment
Wrinkle removal

Thinner face, bigger eyes
Graphics keeps moving

Pac-Man, 1980
• Recognized by 94% of American consumers
• Highest-grossing video game of all time

Mobile 3D graphics
• GLBenchmark 2.1 Egypt, 2011
• GLBenchmark 2.5 Egypt, 2012
• GFXBench 2.7 T-Rex, 2013
• GFXBench 3.0 Manhattan, 2013

High-performance computing (HPC) keeps scaling out
• HPC from 1993 to 2012
‒ GFLOPS ~ 130,000x
‒ Cores ~ 11,000x
‒ GHz ~ 10x
More atoms
Higher grid resolution
More time steps

Parallel killer apps are just around the corner
Completing the positive feedback loop

[Figure: feedback loop. Labels: Moore’s law; higher frequency, more cores; more complex software; better user experience; bigger data-parallel workloads in graphics and HPC; data; mining bigger data with machine learning; bigger problems.]

What bigger problems to solve with bigger data?

How does solving bigger problems lead to better user experience?

How to distinguish cat photos from dog ones?
ASIRRA
Animal Species Image Recognition for Restricting Access (from Microsoft Research)


Why is it hard?

Source: training set of Kaggle.com Dogs vs. Cats competition

Is there a solution to relate photos from the same dog?

Prancer, a 5-year-old toy poodle, before and after grooming

Mine the solutions from the data

Dog-cat classifier

Theory of the differences between dogs and cats?

Learn from many (12,500) photos labeled as dogs or cats

Machine learning

Machine learning: prediction with powerful models

• More powerful models have more knobs, which need to be determined with a bigger data set
• The explosive growth of data has made very powerful models feasible

[Figure: a 6th-order polynomial over-fits the 4 samples]
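
The over-fitting point can be reproduced in a few lines of Python. This is a simplified sketch under stated assumptions, not the slide's actual figure. The data are made up (the true relation is y = x plus small noise), and a cubic is used instead of a 6th-order polynomial because four samples already determine a cubic exactly; a 6th-order polynomial would have even more spare knobs. The helper names `lagrange_eval` and `fit_line` are hypothetical.

```python
from typing import Sequence, Tuple


def lagrange_eval(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the unique degree-(n-1) polynomial through n samples at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit of y = a0 + a1*x (a two-knob model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    return mean_y - a1 * mean_x, a1


# Made-up training samples: y = x plus small noise.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.2, 0.8, 2.3, 2.7]
# Held-out inputs whose true outputs are y = x.
held_out_x = [2.5, 4.0]

a0, a1 = fit_line(train_x, train_y)

# The four-knob cubic reproduces every training sample.
cubic_train_error = sum(
    (lagrange_eval(train_x, train_y, x) - y) ** 2
    for x, y in zip(train_x, train_y)
)
# On held-out inputs, compare both models against the true relation.
cubic_test_error = sum(
    (lagrange_eval(train_x, train_y, x) - x) ** 2 for x in held_out_x
)
line_test_error = sum((a0 + a1 * x - x) ** 2 for x in held_out_x)

print("cubic training error:", cubic_train_error)
print("cubic held-out error:", cubic_test_error)
print("line held-out error: ", line_test_error)
```

The model with as many knobs as samples fits the training data perfectly yet predicts the held-out points worse than the two-knob line, which is why more knobs call for more data.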

From data to user experience

Examples:
• x: dog/cat photos; y: dog or cat
• x: sensor readings; y: jogging, walking or climbing
• x: depth images; y: body motion

Bigger data lead to more powerful models

Powerful models with more knobs lead to better user experience

[Figure: Labels: client, cloud, web-scale data $(x_n, y_n)$, model $f(x; \{a_i\})$, knobs $\{a_i\}$, machine learning.]

Machine learning: determine $\{a_i\}$ to minimize the error between the model output $f(x_n; \{a_i\})$ and $y_n$
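
A minimal sketch of determining the knobs by minimizing the error between model output and labels, assuming a two-knob linear model f(x) = a0 + a1*x, a mean-squared-error measure, and plain gradient descent. None of these choices come from the slides; the function name `fit_knobs`, the learning rate, the step count, and the training set are all made up for illustration.

```python
from typing import List, Tuple


def fit_knobs(
    samples: List[Tuple[float, float]],
    learning_rate: float = 0.05,
    steps: int = 5000,
) -> Tuple[float, float]:
    """Gradient descent on the mean squared error of f(x) = a0 + a1*x."""
    a0, a1 = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        grad_a0 = 0.0
        grad_a1 = 0.0
        for x, y in samples:
            # Error between the model output f(x_n) and the label y_n.
            error = (a0 + a1 * x) - y
            grad_a0 += 2.0 * error / n
            grad_a1 += 2.0 * error * x / n
        # Move each knob against its gradient.
        a0 -= learning_rate * grad_a0
        a1 -= learning_rate * grad_a1
    return a0, a1


# Made-up training set (x_n, y_n) generated from y = 1 + 2x.
training_set = [(float(x), 1.0 + 2.0 * x) for x in range(5)]
a0, a1 = fit_knobs(training_set)
print("a0 =", a0, "a1 =", a1)
```

With more knobs the same loop applies, but more samples are needed to pin the knobs down.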
Smart clients in the era of data

[Figure: Labels: client (smarter client), cloud, input data, sensing (better sensing), connectivity (better connectivity), data mining (bigger data mining), big training set (bigger training set), powerful model (more powerful model), local machine learning, user experience (better user experience). Annotation: in the cloud or the clients.]

Looking forward
• The future is here
‒ There are already massively parallel heterogeneous processors
• There is no shame in being data-parallel
‒ One of the smartest things achieved in computing is data parallel
Source: Le et al., Building High-level Features Using Large Scale Unsupervised Learning
• Go parallel and go heterogeneous to keep
‒ Mobile devices cool in our palms
‒ Data centers clean for our environment
Carbon footprint of US datacenters is at the same level as the airline industry

Disclaimer & Attribution
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers,
software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information.
However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to
notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other
names are for informational purposes only and may be trademarks of their respective owners.

26

| how many cores will we need? | December 4, 2013 | Confidential

Más contenido relacionado

La actualidad más candente

HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013 HSA Foundation
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaTim Ellison
 
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Titus Damaiyanti
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterAltoros
 
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn Yahoo Developer Network
 
Architecture of Hadoop
Architecture of HadoopArchitecture of Hadoop
Architecture of HadoopKnoldus Inc.
 
JMI Techtalk: 한재근 - How to use GPU for developing AI
JMI Techtalk: 한재근 - How to use GPU for developing AIJMI Techtalk: 한재근 - How to use GPU for developing AI
JMI Techtalk: 한재근 - How to use GPU for developing AILablup Inc.
 
Introduction to Hadoop part 2
Introduction to Hadoop part 2Introduction to Hadoop part 2
Introduction to Hadoop part 2Giovanna Roda
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hoodAdarsh Pannu
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldSean Roberts
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)dibyendu.das
 
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben SanderHSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben SanderAMD Developer Central
 

La actualidad más candente (20)

HSA Introduction
HSA IntroductionHSA Introduction
HSA Introduction
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013
 
Hadoop + GPU
Hadoop + GPUHadoop + GPU
Hadoop + GPU
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with Java
 
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop Cluster
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
 
Architecture of Hadoop
Architecture of HadoopArchitecture of Hadoop
Architecture of Hadoop
 
JMI Techtalk: 한재근 - How to use GPU for developing AI
JMI Techtalk: 한재근 - How to use GPU for developing AIJMI Techtalk: 한재근 - How to use GPU for developing AI
JMI Techtalk: 한재근 - How to use GPU for developing AI
 
Introduction to Hadoop part 2
Introduction to Hadoop part 2Introduction to Hadoop part 2
Introduction to Hadoop part 2
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hood
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
Apache Hadoop 0.22 and Other Versions
Apache Hadoop 0.22 and Other VersionsApache Hadoop 0.22 and Other Versions
Apache Hadoop 0.22 and Other Versions
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben SanderHSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
 

Destacado

Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.HSA Foundation
 
HSA From A Software Perspective
HSA From A Software Perspective HSA From A Software Perspective
HSA From A Software Perspective HSA Foundation
 
ISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsHSA Foundation
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - CompilationsHSA Foundation
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMDHSA Foundation
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime HSA Foundation
 
HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Foundation
 
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUKeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUHSA Foundation
 

Destacado (8)

Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.
 
HSA From A Software Perspective
HSA From A Software Perspective HSA From A Software Perspective
HSA From A Software Perspective
 
ISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsISCA Final Presentation - Applications
ISCA Final Presentation - Applications
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - Compilations
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime
 
HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013
 
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUKeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
 

Similar a Apu13 cp lu-keynote-final-slideshare

Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Johnny Miller
 
A Survey on Domain-Specific Languages for Machine.pdfA Sur.docx
A Survey on Domain-Specific Languages for Machine.pdfA Sur.docxA Survey on Domain-Specific Languages for Machine.pdfA Sur.docx
A Survey on Domain-Specific Languages for Machine.pdfA Sur.docxbartholomeocoombs
 
Virtual Human Brain Simulations with Abaqus in the Cloud
Virtual Human Brain Simulations with Abaqus in the CloudVirtual Human Brain Simulations with Abaqus in the Cloud
Virtual Human Brain Simulations with Abaqus in the CloudThe UberCloud
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0Ganesan Narayanasamy
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdfQualcomm Research
 
Simplify Data Management and Go Green with Supermicro & Qumulo
Simplify Data Management and Go Green with Supermicro & QumuloSimplify Data Management and Go Green with Supermicro & Qumulo
Simplify Data Management and Go Green with Supermicro & QumuloRebekah Rodriguez
 
Big Data Meetup #7
Big Data Meetup #7Big Data Meetup #7
Big Data Meetup #7Paul Lo
 
Deep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devicesDeep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devicesNumenta
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learningGanesan Narayanasamy
 
Google cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGoogle cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGDSCNiT
 
Big Data - Need of Converged Data Platform
Big Data - Need of Converged Data PlatformBig Data - Need of Converged Data Platform
Big Data - Need of Converged Data PlatformGeekNightHyderabad
 
Cassandra summit-2013
Cassandra summit-2013Cassandra summit-2013
Cassandra summit-2013dfilppi
 
Refactoring Applications for the XK7 and Future Hybrid Architectures
Refactoring Applications for the XK7 and Future Hybrid ArchitecturesRefactoring Applications for the XK7 and Future Hybrid Architectures
Refactoring Applications for the XK7 and Future Hybrid ArchitecturesJeff Larkin
 
Cloud Data Management: Protecting your Cloud strategy
Cloud Data Management: Protecting your Cloud strategyCloud Data Management: Protecting your Cloud strategy
Cloud Data Management: Protecting your Cloud strategyFujitsu Middle East
 
CIKB - Software Architecture Analysis Design
CIKB - Software Architecture Analysis DesignCIKB - Software Architecture Analysis Design
CIKB - Software Architecture Analysis DesignAntonio Castellon
 
2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdf2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdfjimjones227147
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERinside-BigData.com
 
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangIntroduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangPAPIs.io
 

Similar a Apu13 cp lu-keynote-final-slideshare (20)

Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
 
A Survey on Domain-Specific Languages for Machine.pdfA Sur.docx
A Survey on Domain-Specific Languages for Machine.pdfA Sur.docxA Survey on Domain-Specific Languages for Machine.pdfA Sur.docx
A Survey on Domain-Specific Languages for Machine.pdfA Sur.docx
 
Virtual Human Brain Simulations with Abaqus in the Cloud
Virtual Human Brain Simulations with Abaqus in the CloudVirtual Human Brain Simulations with Abaqus in the Cloud
Virtual Human Brain Simulations with Abaqus in the Cloud
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdf
 
Simplify Data Management and Go Green with Supermicro & Qumulo
Simplify Data Management and Go Green with Supermicro & QumuloSimplify Data Management and Go Green with Supermicro & Qumulo
Simplify Data Management and Go Green with Supermicro & Qumulo
 
Big Data Meetup #7
Big Data Meetup #7Big Data Meetup #7
Big Data Meetup #7
 
Presentation1
Presentation1Presentation1
Presentation1
 
Deep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devicesDeep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devices
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learning
 
Google cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGoogle cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptx
 
Big Data - Need of Converged Data Platform
Big Data - Need of Converged Data PlatformBig Data - Need of Converged Data Platform
Big Data - Need of Converged Data Platform
 
Cassandra summit-2013
Cassandra summit-2013Cassandra summit-2013
Cassandra summit-2013
 
Refactoring Applications for the XK7 and Future Hybrid Architectures
Refactoring Applications for the XK7 and Future Hybrid ArchitecturesRefactoring Applications for the XK7 and Future Hybrid Architectures
Refactoring Applications for the XK7 and Future Hybrid Architectures
 
Cloud Data Management: Protecting your Cloud strategy
Cloud Data Management: Protecting your Cloud strategyCloud Data Management: Protecting your Cloud strategy
Cloud Data Management: Protecting your Cloud strategy
 
CIKB - Software Architecture Analysis Design
CIKB - Software Architecture Analysis DesignCIKB - Software Architecture Analysis Design
CIKB - Software Architecture Analysis Design
 
2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdf2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdf
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
 
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangIntroduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
 

Más de HSA Foundation

Hsa Runtime version 1.00 Provisional
Hsa Runtime version  1.00  ProvisionalHsa Runtime version  1.00  Provisional
Hsa Runtime version 1.00 ProvisionalHSA Foundation
 
Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)HSA Foundation
 
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...HSA Foundation
 
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed HSA Foundation
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013HSA Foundation
 
HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation
 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Foundation
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...HSA Foundation
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012HSA Foundation
 
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...HSA Foundation
 
What Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAWhat Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAHSA Foundation
 
Fabric Engine: Why HSA is Invaluable
Fabric Engine: Why HSA is  InvaluableFabric Engine: Why HSA is  Invaluable
Fabric Engine: Why HSA is InvaluableHSA Foundation
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”HSA Foundation
 

Más de HSA Foundation (14)

Hsa Runtime version 1.00 Provisional
Hsa Runtime version  1.00  ProvisionalHsa Runtime version  1.00  Provisional
Hsa Runtime version 1.00 Provisional
 
Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)
 
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
Apu13 cp lu-keynote-final-slideshare

  • 1. How many cores will we need? Chien-Ping Lu, PhD, Sr. Director, MediaTek Inc.
  • 2. a group of hippos is called … A Crash 2 | how many cores will we need? | December 4, 2013 | Confidential
  • 3. A group of crows is called … A Murder
  • 4. A group of giraffes is called … A Tower (from Wikipedia)
  • 5. So, it is not surprising that we use “A Parade” of elephants, “A Herd” of sheep, “An Army” of ants
  • 6. From frequency to multicore scaling [Figure: performance, frequency, and power over time; serial computing before the power wall (2005), parallel computing after it]
  • 7. How many cores will we need? [Figure: performance over time; two curves labeled Moderate and Massive]
  • 8. Dark silicon (or dark cores)? [Figure: performance over time; annotation pairs “8x 4x”, “2x”, “4x 3x”, “16x 4x”]
  • 9. Light up the cores. Redefine the cores to be heterogeneous. [Figure: power versus degree of parallelism (number of cores), bounded by a power ceiling (dark silicon: a concern on power) and a parallelism wall (Amdahl’s law: an argument against parallel computing); regions labeled big cores, little cores, and GPU-style “cores”; example workloads: body tracking, ray tracing]
  • 10. The elephants: CPU cores. For multiple-instruction-multiple-data (MIMD) execution. Retrofitted for moderately parallel workloads, and not very efficient for massively parallel workloads. A CPU core runs 1 iteration of the parallel loop. [Figure: the iterations of Parallel.For (…), including an Else branch, mapped onto CPU cores that each have their own front end and ALU; the same color means the same piece of code]
  • 11. Army of ants: SIMT cores. For SIMT (single-instruction-multiple-thread) execution. A SIMT core runs 1 iteration of the parallel loop. SIMT is the execution model of HSA and is implemented in modern GPUs, with MIMD flexibility and SIMD efficiency. A cluster of SIMT cores shares one front end in a SIMD manner. A branch is emulated through divergence. Can achieve better power efficiency with more specialized function units, given the right workload. [Figure: the iterations of Parallel.For (…), including an Else branch, mapped onto clusters of ALUs; each cluster shares one front end, alongside SFU 0 and SFU 1]
  • 12. Properties of massively data-parallel workloads: problem size N of the parallel workload can keep growing; visible serial workload s can be kept constant; communication overhead is proportional to log P (by a factor of r); the parallel workload is sped up linearly by P, the number of cores; the workload is "embarrassingly" parallel when there is no communication overhead (r = 0). [Figure: time bars with segments s and N versus segments s, r log P, and N/P; the difference is the time saved by P cores]
  • 13. Revisiting Amdahl's law for trend prediction: $\text{Speedup} = \dfrac{s + N}{s + r \log P + N/P}$
  • 14. MediaTek face beautification. When it comes to beauty, there seems to be no limit. [Figure: before; then skin tone adjustment, wrinkle removal, thinner face, bigger eyes]
  • 15. Graphics keeps moving. Pac-Man, 1980: highest-grossing video game of all time, recognized by 94% of American consumers. Mobile 3D graphics: GLBenchmark 2.1 Egypt, 2011; GLBenchmark 2.5 Egypt, 2012; GFXBench 2.7 T-Rex, 2013; GFXBench 3.0 Manhattan, 2013
  • 16. High-performance computing (HPC) keeps scaling out. HPC from 1993 to 2012: GFLOPS ~130,000x; cores ~11,000x; GHz ~10x. More atoms, higher grid resolution, more time steps
  • 17. Parallel killer apps are just around the corner, completing the positive feedback loop. What bigger problems to solve with bigger data? How does solving bigger problems lead to better user experience? [Figure: feedback loop linking Moore’s law, higher frequency and more cores, bigger data-parallel workloads in graphics and HPC, mining bigger data with machine learning, bigger problems, more complex software, and better user experience]
  • 18. How to distinguish cat photos from dog ones? ASIRRA: Animal Species Image Recognition for Restricting Access (from Microsoft Research)
  • 19. Why is it hard? Source: training set of the Kaggle.com Dogs vs. Cats competition
  • 20. Is there a solution to relate photos of the same dog? Prancer, a 5-year-old toy poodle, before and after grooming
  • 21. Mine the solutions from the data. Dog-cat classifier: a theory of the differences between dogs and cats? Machine learning: learn from many (12,500) photos labeled as dogs or cats
  • 22. Machine learning: prediction with powerful models. More powerful models have more knobs, which need to be determined with a bigger data set. The explosive growth of data has made very powerful models feasible. [Figure: a 6th-order polynomial over-fits the 4 samples]
  • 23. From data to user experience. Examples of $x$: dog/cat photos, sensor readings, depth images. Examples of $y$: dog or cat; jogging, walking or climbing; body motion. Bigger data lead to more powerful models. Powerful models with more knobs lead to better user experience. Machine learning: determine $\{a_i\}$ to minimize the error between $f(x_n; \{a_i\})$ and $y_n$. [Figure: web-scale data $(x_n, y_n)$, cloud, client, and a model $f(x; \{a_i\})$ with knobs $\{a_i\}$ mapping $x$ to $y$]
  • 24. Smart clients in the era of data. [Figure: loop linking better connectivity, better sensing, bigger data, data mining, bigger training set, more powerful model, smarter client, and better user experience; the training set and model can be in the cloud or the clients, with local machine learning on input data]
  • 25. Looking forward. The future is here: there are already massively parallel heterogeneous processors. There is no shame in being data-parallel: one of the smartest things achieved in computing is data parallel (source: Le et al., Building High-level Features Using Large Scale Unsupervised Learning). Go parallel and go heterogeneous to keep mobile devices cool in our palms and data centers clean for our environment. The carbon footprint of US datacenters is at the same level as that of the airline industry
  • 26. Disclaimer & Attribution The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
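
Slide 11 says that, on SIMT cores, a branch is emulated through divergence. The sketch below is one way to picture that idea in plain Python, not a description of any HSA or GPU interface. The function name `simt_select` and its parameters are hypothetical. It assumes all lanes of a cluster share one instruction stream, so the "then" side and the "else" side are issued one after the other, and a per-lane mask decides which lanes commit a result in each pass.

```python
def simt_select(values, predicate, then_fn, else_fn):
    """Emulate an if/else across lanes that share one front end.

    Hypothetical illustration: each element of `values` is one lane.
    Returns the per-lane results and the number of passes issued.
    """
    # Every lane evaluates the branch condition to build the active mask.
    mask = [predicate(v) for v in values]
    out = list(values)

    # Pass 1: the "then" code is issued once; only active lanes commit.
    for lane, v in enumerate(values):
        if mask[lane]:
            out[lane] = then_fn(v)

    # Pass 2: the "else" code is issued once with the mask inverted.
    for lane, v in enumerate(values):
        if not mask[lane]:
            out[lane] = else_fn(v)

    # A pass is only needed if at least one lane takes that side,
    # so a uniform branch costs one pass and a divergent one costs two.
    passes = int(any(mask)) + int(not all(mask))
    return out, passes


if __name__ == "__main__":
    # Mixed even and odd lanes diverge; all-even lanes do not.
    print(simt_select([1, 2, 3, 4], lambda v: v % 2 == 0,
                      lambda v: v * 10, lambda v: -v))
    print(simt_select([2, 4], lambda v: v % 2 == 0,
                      lambda v: v * 10, lambda v: -v))
```

The pass count is the point of the sketch: when lanes disagree on the condition, the cluster spends time on both sides of the branch, which is the cost of sharing one front end.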
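
Slides 12 and 13 describe a speedup model with a constant serial workload s, a growing parallel workload N, and a communication overhead of r log P on P cores. A minimal sketch of that model follows, assuming the time on one core is s + N and the time on P cores is s + r log P + N/P, as the slide 12 quantities suggest. The function name `speedup` and the default values are hypothetical, and the base-2 logarithm is an assumption, since the slides do not state a base (a different base only rescales r).

```python
import math


def speedup(P, N, s=1.0, r=0.0):
    """Speedup of P cores over one core under the slide 12 assumptions.

    P: number of cores, N: parallel workload, s: serial workload,
    r: communication overhead factor (r = 0 is "embarrassingly" parallel).
    """
    if P < 1:
        raise ValueError("P must be at least 1")
    time_one_core = s + N
    # Assumed base-2 logarithm; the slides only say "log P".
    time_p_cores = s + r * math.log2(P) + N / P
    return time_one_core / time_p_cores


if __name__ == "__main__":
    # With s held constant, a larger N keeps more cores useful.
    for n in (100, 10_000, 1_000_000):
        row = [round(speedup(p, n, s=1.0, r=0.5), 1) for p in (4, 64, 1024)]
        print(n, row)
```

With N fixed, the serial term s caps the speedup as in Amdahl's law. Letting N grow while s stays constant is what moves that cap, which is the trend the slides argue for.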
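
Slide 23 frames machine learning as determining the knobs {a_i} that minimize the error between the model output f(x_n; {a_i}) and the labels y_n. The sketch below shows that step for the simplest case, purely as an illustration. The straight-line model with two knobs `a0` and `a1`, the squared-error measure, and the function names `fit_line` and `squared_error` are assumptions; the slides do not specify a model or an error measure.

```python
def squared_error(samples, a0, a1):
    """Sum of squared differences between f(x) = a0 + a1 * x and y."""
    return sum((a0 + a1 * x - y) ** 2 for x, y in samples)


def fit_line(samples):
    """Choose knobs (a0, a1) that minimize the squared error.

    Uses the closed-form least-squares solution for a straight line.
    `samples` is a list of (x, y) pairs with at least two distinct x values.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


if __name__ == "__main__":
    data = [(0.0, 0.9), (1.0, 3.2), (2.0, 4.8), (3.0, 7.1)]
    a0, a1 = fit_line(data)
    print(a0, a1, squared_error(data, a0, a1))
```

A model with more knobs would replace the closed form with an iterative optimizer over many more samples, and that is where the data-parallel workload discussed in the slides comes from.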