THE PROGRAMMER’S GUIDE
TO THE APU GALAXY
Phil Rogers, Corporate Fellow
AMD
THE OPPORTUNITY WE ARE SEIZING




Make the unprecedented processing capability of the APU
as accessible to programmers as the CPU is today.



2 | The Programmer’s Guide to the APU Galaxy | June 2011
OUTLINE


The APU today and its programming
 environment

The future of the heterogeneous platform

AMD Fusion System Architecture

Roadmap

Software evolution

A visual view of the new command
 and data flow




APU: ACCELERATED PROCESSING UNIT


The APU has arrived and it is a great advance
 over previous platforms
Combines scalar processing on CPU with
 parallel processing on the GPU and high
 bandwidth access to memory
How do we make it even better going forward?
   – Easier to program
   – Easier to optimize
   – Easier to load balance
   – Higher performance
   – Lower power




LOW POWER E-SERIES AMD FUSION APU: “ZACATE”


    E-Series APU
 2 x86 Bobcat CPU cores
 Array of Radeon™ Cores
        Discrete-class DirectX® 11 performance
        80 Stream Processors
 3rd Generation Unified Video Decoder
 PCIe® Gen2
 Single-channel DDR3 @ 1066
 18W TDP



    Performance:
 Up to 8.5GB/s System Memory Bandwidth
 Up to 90 GFLOPS of Single Precision Compute
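
The 8.5 GB/s figure is consistent with the memory configuration listed above. As a rough check, assuming a 64-bit (8-byte) DDR3 channel, which the slide does not state, theoretical peak bandwidth is the transfer rate times the channel width times the channel count. The function name and parameters below are illustrative only.

```cpp
#include <cassert>

// Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).
// mega_transfers_per_sec: data rate in MT/s, e.g. 1066 for DDR3-1066.
// channels: number of memory channels.
// bus_bytes: width of one channel in bytes (ASSUMPTION: 8, i.e. 64 bits).
double peak_bandwidth_gbs(double mega_transfers_per_sec, int channels,
                          int bus_bytes = 8) {
    assert(mega_transfers_per_sec > 0 && channels > 0 && bus_bytes > 0);
    return mega_transfers_per_sec * 1e6 * bus_bytes * channels / 1e9;
}
```

Calling `peak_bandwidth_gbs(1066, 1)` gives roughly 8.5, matching the single-channel DDR3 @ 1066 entry. Sustained bandwidth in practice is lower than this theoretical peak.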



TABLET Z-SERIES AMD FUSION APU: “DESNA”


    Z-Series APU
 2 x86 “Bobcat” CPU cores
 Array of Radeon™ Cores
        Discrete-class DirectX® 11 performance
        80 Stream Processors
 3rd Generation Unified Video Decoder
 PCIe® Gen2
 Single-channel DDR3 @ 1066
 6W TDP w/ Local Hardware Thermal Control



    Performance:
 Up to 8.5GB/s System Memory Bandwidth
 Suitable for sealed, passively cooled designs



MAINSTREAM A-SERIES AMD FUSION APU: “LLANO”


    A-Series APU
 Up to four x86 CPU cores
        AMD Turbo CORE frequency acceleration
 Array of Radeon™ Cores
        Discrete-class DirectX® 11 performance
 3rd Generation Unified Video Decoder
 Blu-ray 3D stereoscopic display
 PCIe® Gen2
 Dual-channel DDR3
 45W TDP


    Performance:
 Up to 29GB/s System Memory Bandwidth
 Up to 500 GFLOPS of Single Precision Compute



COMMITTED TO OPEN STANDARDS


AMD drives open and de-facto standards

   – Compete on the best implementation

Open standards are the basis for large
 ecosystems

Open standards always win over time
   – SW developers want their applications
     to run on multiple platforms from
     multiple hardware vendors

[Slide logo: DirectX®]




A NEW ERA OF PROCESSOR PERFORMANCE

Single-Core Era
   – Enabled by: Moore’s Law, voltage scaling
   – Constrained by: power, complexity
   – Programming: Assembly, C/C++, Java …
   – [Chart: single-thread performance vs. time, with a “we are here” marker]

Multi-Core Era
   – Enabled by: Moore’s Law, SMP architecture
   – Constrained by: power, parallel SW scalability
   – Programming: pthreads, OpenMP / TBB …
   – [Chart: throughput performance vs. time (# of processors), with a “we are here” marker]

Heterogeneous Systems Era
   – Enabled by: abundant data parallelism, power efficient GPUs
   – Temporarily constrained by: programming models, communication overhead
   – Programming: Shader, CUDA, OpenCL !!!
   – [Chart: modern application performance vs. time (data-parallel exploitation), with a “we are here” marker]




EVOLUTION OF HETEROGENEOUS COMPUTING

[Chart: architecture maturity and programmer accessibility, from poor to excellent, across three eras]

Proprietary Drivers Era (2002 - 2008)
   – Graphics & proprietary driver-based APIs
   – “Adventurous” programmers
   – Exploit early programmable “shader cores” in the GPU
   – Make your program look like “graphics” to the GPU
   – CUDA™, Brook+, etc

Standards Drivers Era (2009 - 2011)
   – OpenCL™, DirectCompute
   – Driver-based APIs
   – Expert programmers
   – C and C++ subsets
   – Compute centric APIs, data types
   – Multiple address spaces with explicit data movement
   – Specialized work queue based structures
   – Kernel mode dispatch

Architected Era (2012 - 2020)
   – AMD Fusion System Architecture
   – GPU Peer Processor
   – Mainstream programmers
   – Full C++
   – GPU as a co-processor
   – Unified coherent address space
   – Task parallel runtimes
   – Nested Data Parallel programs
   – User mode dispatch
   – Pre-emption and context switching

See Herb Sutter’s Keynote tomorrow for a cool example of
 plans for the architected era!


FSA FEATURE ROADMAP

Physical Integration
   – Integrate CPU & GPU in silicon
   – Unified Memory Controller
   – Common Manufacturing Technology

Optimized Platforms
   – GPU Compute C++ support
   – User mode scheduling
   – Bi-Directional Power Mgmt between CPU and GPU

Architectural Integration
   – Unified Address Space for CPU and GPU
   – GPU uses pageable system memory via CPU pointers
   – Fully coherent memory between CPU & GPU

System Integration
   – GPU compute context switch
   – GPU graphics pre-emption
   – Quality of Service
   – Extend to Discrete GPU




FUSION SYSTEM ARCHITECTURE – AN OPEN PLATFORM

Open Architecture, published specifications
  – FSAIL virtual ISA
  – FSA memory model
  – FSA dispatch
ISA agnostic for both CPU and GPU

Inviting partners to join us, in all areas
   – Hardware companies
   – Operating Systems
   – Tools and Middleware
   – Applications

FSA review committee planned



FSA INTERMEDIATE LAYER - FSAIL

FSAIL is a virtual ISA for parallel programs
    – Finalized to ISA by a JIT compiler or
      “Finalizer”

Explicitly parallel
    – Designed for data parallel programming

Support for exceptions, virtual functions,
 and other high level language features

Syscall methods
    – GPU code can call directly to system
      services, IO, printf, etc

Debugging support


FSA MEMORY MODEL


Designed to be compatible with C++0x,
 Java and .NET Memory Models

Relaxed consistency memory model for
 parallel compute performance

Loads and stores can be re-ordered by
 the finalizer

Visibility controlled by:
    – Load.Acquire, Store.Release
    – Fences
    – Barriers
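
The slides do not show FSA's Load.Acquire and Store.Release operations in code. Since the memory model is described as compatible with C++0x, the sketch below uses standard C++ atomics as a stand-in to illustrate what acquire/release visibility means. The function name `publish_and_consume` is illustrative, not part of FSA.

```cpp
#include <atomic>
#include <thread>

// A producer writes plain data, then publishes it with a release store.
// A consumer spins on an acquire load; once it sees the flag, the earlier
// plain store is guaranteed to be visible to it.
int publish_and_consume() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // synchronization flag
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // plain store
        ready.store(true, std::memory_order_release);  // Store.Release analogue
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {  // Load.Acquire analogue
            std::this_thread::yield();
        }
        seen = payload;  // ordered after the acquire load, so it reads 42
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Without the release/acquire pair, a relaxed model would allow the store to `payload` to become visible after the flag, which is the kind of reordering the slide attributes to the finalizer.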




Driver Stack (top to bottom)
   – Apps
   – Domain Libraries
   – OpenCL™ 1.x, DX Runtimes, User Mode Drivers
   – Graphics Kernel Mode Driver
   – Hardware - APUs, CPUs, GPUs

FSA Software Stack (top to bottom)
   – Apps
   – FSA Domain Libraries
   – Task Queuing Libraries, FSA JIT, FSA Runtime
   – FSA Kernel Mode Driver
   – Hardware - APUs, CPUs, GPUs

Legend: AMD user mode component; AMD kernel mode component; all others contributed by third parties or AMD


OPENCL™ AND FSA

FSA is an optimized platform architecture
 for OpenCL™
    – Not an alternative to OpenCL™
OpenCL™ on FSA will benefit from
    – Avoidance of wasteful copies
    – Low latency dispatch
    – Improved memory model
    – Pointers shared between CPU and GPU
FSA also exposes a lower level programming
 interface, for those that want the ultimate in
 control and performance
    – Optimized libraries may choose the lower
      level interface


TASK QUEUING RUNTIMES

Popular pattern for task and data parallel
 programming on SMP systems today
Characterized by:
    – A work queue per core
    – Runtime library that divides large
      loops into tasks and distributes to
      queues
    – A work stealing runtime that keeps the
      system balanced
FSA is designed to extend this pattern to
 run on heterogeneous systems
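
The pattern described above can be sketched on CPU threads in standard C++. This is a minimal illustration, not the FSA runtime: `WorkQueue` and `parallel_sum` are hypothetical names, the queues use a mutex rather than a lock-free deque, and all chunks are deliberately placed in one queue so that the other workers have to steal.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// One queue per worker. The owner pops at the back; idle workers steal
// from the front.
struct WorkQueue {
    std::mutex m;
    std::deque<std::function<void()>> tasks;

    void push(std::function<void()> t) {
        std::lock_guard<std::mutex> lock(m);
        tasks.push_back(std::move(t));
    }
    bool pop(std::function<void()>& out) {    // owner side
        std::lock_guard<std::mutex> lock(m);
        if (tasks.empty()) return false;
        out = std::move(tasks.back());
        tasks.pop_back();
        return true;
    }
    bool steal(std::function<void()>& out) {  // thief side
        std::lock_guard<std::mutex> lock(m);
        if (tasks.empty()) return false;
        out = std::move(tasks.front());
        tasks.pop_front();
        return true;
    }
};

// Divides the loop over [0, data.size()) into chunk-sized tasks and lets
// the workers balance the load by stealing.
long long parallel_sum(const std::vector<int>& data, std::size_t workers,
                       std::size_t chunk) {
    if (workers == 0) workers = 1;
    if (chunk == 0) chunk = 1;
    std::vector<WorkQueue> queues(workers);
    std::atomic<long long> total{0};

    // Unbalanced on purpose: every task starts in queue 0.
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        queues[0].push([&data, &total, begin, end] {
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }

    std::vector<std::thread> pool;
    for (std::size_t id = 0; id < workers; ++id) {
        pool.emplace_back([&queues, id, workers] {
            std::function<void()> task;
            for (;;) {
                bool found = queues[id].pop(task);       // own queue first
                for (std::size_t k = 1; !found && k < workers; ++k)
                    found = queues[(id + k) % workers].steal(task);
                if (!found) return;  // tasks never spawn tasks, so empty means done
                task();
            }
        });
    }
    for (auto& t : pool) t.join();
    return total.load();
}
```

For example, `parallel_sum(std::vector<int>(1000, 1), 4, 64)` sums 1000 ones across four workers. Extending the pattern to a heterogeneous system, as the slide proposes, would add a queue whose worker dispatches to the GPU.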




TASK QUEUING RUNTIME ON CPUS

[Diagram: a work stealing runtime feeds four work queues (Q), one per CPU worker; each CPU worker runs on an x86 CPU. Legend: CPU Threads, GPU Threads, Memory.]



TASK QUEUING RUNTIME ON THE FSA PLATFORM

[Diagram: the work stealing runtime feeds five work queues (Q): four served by CPU workers on x86 CPUs and one served by a GPU Manager on the Radeon™ GPU. Legend: CPU Threads, GPU Threads, Memory.]



TASK QUEUING RUNTIME ON THE FSA PLATFORM

[Diagram: the same five work queues (Q); the GPU Manager's queue is backed by memory and drives a Fetch and Dispatch stage that feeds the GPU's SIMD units. Legend: CPU Threads, GPU Threads, Memory.]




FSA SOFTWARE EXAMPLE - REDUCTION

float foo(float);
float myArray[…];

// Each task instance sums foo() over its own slice of the index range.
Task<float, ReductionBin> task([myArray]( IndexRange<1> index) [[device]] {
        float sum = 0.;
        for (size_t i = index.begin(); i != index.end(); i++) {
                sum += foo(myArray[i]);
        }
        return sum;
});

// Partition 1920 elements automatically, then combine the partial sums.
float result = task.enqueueWithReduce( Partition<1, Auto>(1920),
                                       [] (float x, float y) [[device]] { return x+y; }, 0.);
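
The `Task`, `IndexRange`, and `Partition` types on this slide belong to a proposed FSA programming interface and cannot be compiled with a standard toolchain. As a rough CPU-only analogue of the same partition-then-reduce structure, the sketch below uses plain C++ threads. The name `partitioned_reduce` and the body given to `foo` are assumptions for illustration; the slide only declares `foo`.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// ASSUMPTION: placeholder body. The slide declares foo but does not define it.
float foo(float x) { return x * 2.0f; }

// Splits the array into `parts` contiguous ranges, sums foo() over each range
// on its own thread, then combines the partial sums with x + y.
float partitioned_reduce(const std::vector<float>& a, std::size_t parts) {
    if (parts == 0) parts = 1;
    std::vector<float> partial(parts, 0.0f);
    std::vector<std::thread> pool;
    const std::size_t chunk = (a.size() + parts - 1) / parts;  // ceiling division

    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t begin = std::min(p * chunk, a.size());
        const std::size_t end = std::min(begin + chunk, a.size());
        pool.emplace_back([&a, &partial, p, begin, end] {
            float sum = 0.0f;
            for (std::size_t i = begin; i != end; ++i) {
                sum += foo(a[i]);
            }
            partial[p] = sum;  // each thread writes only its own slot
        });
    }
    for (auto& t : pool) t.join();

    float result = 0.0f;  // identity value, like the trailing 0. on the slide
    for (float s : partial) result += s;
    return result;
}
```

The per-range loop corresponds to the task lambda on the slide, and the final loop corresponds to the reduction lambda passed to `enqueueWithReduce`. Unlike the slide's version, this sketch fixes the partition count by hand and never runs on a GPU.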



HETEROGENEOUS COMPUTE DISPATCH




How compute dispatch operates today in the driver model

How compute dispatch improves tomorrow under FSA




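TODAY’S COMMAND AND DISPATCH FLOW

[Diagram: Application A issues work through Direct3D to the User Mode Driver, which fills a Command Buffer; the work passes through a Soft Queue to the Kernel Mode Driver, which fills a DMA Buffer and submits it to the Hardware Queue of the GPU hardware. Legend: Command Flow, Data Flow.]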




TODAY’S COMMAND AND DISPATCH FLOW

[Diagram: Applications A, B, and C each have their own path of Direct3D, User Mode Driver, Command Buffer, Soft Queue, Kernel Mode Driver, and DMA Buffer; all three paths converge on the single Hardware Queue of the GPU hardware, which currently holds work from A. Legend: Command Flow, Data Flow.]




TODAY’S COMMAND AND DISPATCH FLOW

[Diagram: the same three application paths; the single Hardware Queue now holds interleaved work items from Applications A, B, and C waiting for the GPU hardware. Legend: Command Flow, Data Flow.]




FUTURE COMMAND AND DISPATCH FLOW

[Diagram: Applications A, B, and C each place work items directly into their own Hardware Queue, with an optional dispatch buffer; the GPU hardware reads from all three queues.]

Application codes to the hardware
User mode queuing
Hardware scheduling
Low dispatch times

No APIs
No Soft Queues
No User Mode Drivers
No Kernel Mode Transitions
No Overhead!
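
The slides do not specify the queue format. As a minimal sketch of what user mode queuing can look like, the code below implements a single-producer, single-consumer ring buffer in which the application enqueues a packet with ordinary stores and an index update, with no driver call. `DispatchPacket`, its fields, and `UserModeQueue` are hypothetical names, and the consumer method stands in for a hardware scheduler.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical dispatch packet; the fields are illustrative only.
struct DispatchPacket {
    std::uint64_t kernel_id;
    std::uint64_t arg_address;
};

// Single-producer, single-consumer ring living in memory visible to both sides.
template <std::size_t N>
class UserModeQueue {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
    std::array<DispatchPacket, N> ring_{};
    std::atomic<std::uint64_t> write_index_{0};  // advanced by the application
    std::atomic<std::uint64_t> read_index_{0};   // advanced by the consumer

public:
    // Application side: write the packet, then publish it by bumping the
    // write index with release ordering. No system call is involved.
    bool enqueue(const DispatchPacket& p) {
        const std::uint64_t w = write_index_.load(std::memory_order_relaxed);
        const std::uint64_t r = read_index_.load(std::memory_order_acquire);
        if (w - r == N) return false;  // ring is full
        ring_[w & (N - 1)] = p;
        write_index_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Consumer side: stands in for the hardware scheduler fetching work.
    bool dequeue(DispatchPacket& out) {
        const std::uint64_t r = read_index_.load(std::memory_order_relaxed);
        const std::uint64_t w = write_index_.load(std::memory_order_acquire);
        if (r == w) return false;      // ring is empty
        out = ring_[r & (N - 1)];
        read_index_.store(r + 1, std::memory_order_release);
        return true;
    }
};
```

Giving each application its own queue of this kind is one way to read the diagram above: nothing is serialized through a shared soft queue or a kernel mode driver, so the cost of a dispatch is a few memory writes.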



FUTURE COMMAND AND DISPATCH CPU <-> GPU

[Diagram: Application / Runtime above three peers: CPU1, CPU2, and GPU]




WHERE ARE WE TAKING YOU?

Switch the compute, don’t move the data!
   – Every processor now has serial and parallel cores
   – All cores capable, with performance differences
   – Simple and efficient program model

Platform Design Goals
   – Easy support of massive data sets
   – Support for task based programming models
   – Solutions for all platforms
   – Open to all




THE FUTURE OF HETEROGENEOUS COMPUTING

The architectural path for the future is clear
    – Programming patterns established on
      Symmetric Multi-Processor (SMP)
      systems migrate to the heterogeneous
      world
    – An open architecture, with published
      specifications and an open source
      execution software stack
    – Heterogeneous cores working together
      seamlessly in coherent memory
    – Low latency dispatch
    – No software fault lines




 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”

Más contenido relacionado

La actualidad más candente

HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013 HSA Foundation
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime HSA Foundation
 
ISCA final presentation - Memory Model
ISCA final presentation - Memory ModelISCA final presentation - Memory Model
ISCA final presentation - Memory ModelHSA Foundation
 
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUKeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUHSA Foundation
 
ISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILHSA Foundation
 
ISCA Final Presentation - Applications
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”

  • 1. THE PROGRAMMER’S GUIDE TO THE APU GALAXY Phil Rogers, Corporate Fellow AMD
  • 2. THE OPPORTUNITY WE ARE SEIZING Make the unprecedented processing capability of the APU as accessible to programmers as the CPU is today. 2 | The Programmer’s Guide to the APU Galaxy | June 2011
  • 3. OUTLINE
      The APU today and its programming environment
      The future of the heterogeneous platform
      AMD Fusion System Architecture
      Roadmap
      Software evolution
      A visual view of the new command and data flow
  • 4. APU: ACCELERATED PROCESSING UNIT
      The APU has arrived and it is a great advance over previous platforms
      Combines scalar processing on CPU with parallel processing on the GPU and high bandwidth access to memory
      How do we make it even better going forward?
        – Easier to program
        – Easier to optimize
        – Easier to load balance
        – Higher performance
        – Lower power
  • 5. LOW POWER E-SERIES AMD FUSION APU: “ZACATE”
      E-Series APU
        2 x86 Bobcat CPU cores
        Array of Radeon™ Cores
          – Discrete-class DirectX® 11 performance
          – 80 Stream Processors
        3rd Generation Unified Video Decoder
        PCIe® Gen2
        Single-channel DDR3 @ 1066
        18W TDP
      Performance:
        Up to 8.5GB/s System Memory Bandwidth
        Up to 90 Gflop of Single Precision Compute
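The 8.5GB/s figure on slide 5 is consistent with simple peak-bandwidth arithmetic. This is a rough check under two assumptions the slide does not state: that "DDR3 @ 1066" means 1066 million transfers per second, and that the single channel is the standard 64 bits (8 bytes) wide.

```latex
\[
\underbrace{1066 \times 10^{6}\ \tfrac{\text{transfers}}{\text{s}}}_{\text{DDR3 @ 1066 (assumed)}}
\times
\underbrace{8\ \tfrac{\text{bytes}}{\text{transfer}}}_{\text{64-bit channel (assumed)}}
\times
\underbrace{1}_{\text{channels}}
= 8.528 \times 10^{9}\ \tfrac{\text{bytes}}{\text{s}}
\approx 8.5\ \text{GB/s}
\]
```

This is a theoretical peak; sustained bandwidth seen by either the CPU or the GPU will be lower.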
  • 6. TABLET Z-SERIES AMD FUSION APU: “DESNA”
      Z-Series APU
        2 x86 “Bobcat” CPU cores
        Array of Radeon™ Cores
          – Discrete-class DirectX® 11 performance
          – 80 Stream Processors
        3rd Generation Unified Video Decoder
        PCIe® Gen2
        Single-channel DDR3 @ 1066
        6W TDP w/ Local Hardware Thermal Control
      Performance:
        Up to 8.5GB/s System Memory Bandwidth
        Suitable for sealed, passively cooled designs
  • 7. MAINSTREAM A-SERIES AMD FUSION APU: “LLANO”
      A-Series APU
        Up to four x86 CPU cores
          – AMD Turbo CORE frequency acceleration
        Array of Radeon™ Cores
          – Discrete-class DirectX® 11 performance
        3rd Generation Unified Video Decoder
        Blu-ray 3D stereoscopic display
        PCIe® Gen2
        Dual-channel DDR3
        45W TDP
      Performance:
        Up to 29GB/s System Memory Bandwidth
        Up to 500 Gflops of Single Precision Compute
  • 8. COMMITTED TO OPEN STANDARDS
      AMD drives open and de-facto standards
        – Compete on the best implementation
      Open standards are the basis for large ecosystems
      Open standards always win over time
        – SW developers want their applications to run on multiple platforms from multiple hardware vendors
      DirectX®
  • 9. A NEW ERA OF PROCESSOR PERFORMANCE
      Single-Core Era
        Enabled by: Moore’s Law, Voltage Scaling
        Constrained by: Power, Complexity
        Assembly → C/C++ → Java …
      Multi-Core Era
        Enabled by: Moore’s Law, SMP architecture
        Constrained by: Power, Parallel SW, Scalability
        pthreads → OpenMP / TBB …
      Heterogeneous Systems Era
        Enabled by: Abundant data parallelism, Power efficient GPUs
        Temporarily Constrained by: Programming models, Comm. overhead
        Shader → CUDA → OpenCL !!!
      Charts, each marked “we are here”: Single-thread Performance vs. Time; Throughput Performance vs. Time (# of processors); Modern Application Performance vs. Time (Data-parallel exploitation)
  • 10. EVOLUTION OF HETEROGENEOUS COMPUTING
      Chart axis: Architecture Maturity & Programmer Accessibility, from Poor to Excellent
      Proprietary Drivers Era, 2002 - 2008
        Graphics & Proprietary Driver-based APIs
          – “Adventurous” programmers
          – Exploit early programmable “shader cores” in the GPU
          – Make your program look like “graphics” to the GPU
          – CUDA™, Brook+, etc
      Standards Drivers Era, 2009 - 2011
        OpenCL™, DirectCompute Driver-based APIs
          – Expert programmers
          – C and C++ subsets
          – Compute centric APIs, data types
          – Multiple address spaces with explicit data movement
          – Specialized work queue based structures
          – Kernel mode dispatch
      Architected Era, 2012 - 2020
        AMD Fusion System Architecture, GPU Peer Processor
          – Mainstream programmers
          – Full C++
          – GPU as a co-processor
          – Unified coherent address space
          – Task parallel runtimes
          – Nested Data Parallel programs
          – User mode dispatch
          – Pre-emption and context switching
      See Herb Sutter’s Keynote tomorrow for a cool example of plans for the architected era!
  • 11. FSA FEATURE ROADMAP
      Physical Integration
        – Integrate CPU & GPU in silicon
        – Unified Memory Controller
        – Common Manufacturing Technology
      Optimized Platforms
        – GPU Compute C++ support
        – User mode scheduling
        – Bi-Directional Power Mgmt between CPU and GPU
      Architectural Integration
        – Unified Address Space for CPU and GPU
        – GPU uses pageable system memory via CPU pointers
        – Fully coherent memory between CPU & GPU
      System Integration
        – GPU compute context switch
        – GPU graphics pre-emption
        – Quality of Service
        – Extend to Discrete GPU
  • 12. FUSION SYSTEM ARCHITECTURE – AN OPEN PLATFORM Open architecture, published specifications: – FSAIL virtual ISA – FSA memory model – FSA dispatch. ISA agnostic for both CPU and GPU. Inviting partners to join us, in all areas: – Hardware companies – Operating systems – Tools and middleware – Applications. FSA review committee planned.
  • 13. FSA INTERMEDIATE LAYER - FSAIL FSAIL is a virtual ISA for parallel programs: – Finalized to ISA by a JIT compiler or “Finalizer”. Explicitly parallel: – Designed for data parallel programming. Support for exceptions, virtual functions, and other high level language features. Syscall methods: – GPU code can call directly to system services, IO, printf, etc. Debugging support.
  • 14. FSA MEMORY MODEL Designed to be compatible with the C++0x, Java and .NET memory models. Relaxed consistency memory model for parallel compute performance. Loads and stores can be re-ordered by the finalizer. Visibility controlled by: – Load.Acquire, Store.Release – Fences – Barriers.
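The acquire/release visibility rule on this slide can be illustrated with standard C++11 atomics, which the FSA memory model is said to be compatible with. This is only a sketch under that assumption: `Mailbox`, `publish`, `consume` and `run_handoff` are hypothetical names, and `std::memory_order_release` / `std::memory_order_acquire` stand in for Store.Release / Load.Acquire rather than showing any actual FSA instruction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration: C++11 atomics standing in for the
// Store.Release / Load.Acquire operations named on the slide.
struct Mailbox {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
};

// Producer: write the data first, then publish it with a release store.
// The release store keeps the payload write from being reordered after it.
void publish(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Consumer: spin on an acquire load. Once it observes `true`, the payload
// written before the matching release store is guaranteed to be visible.
int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Runs one producer and one consumer thread and returns what the consumer saw.
int run_handoff(int value) {
    Mailbox box;
    int seen = 0;
    std::thread consumer([&] { seen = consume(box); });
    std::thread producer([&] { publish(box, value); });
    producer.join();
    consumer.join();
    assert(box.ready.load());  // the flag was published before both threads finished
    return seen;
}
```

Calling `run_handoff(42)` hands the value from one thread to the other. Without the release/acquire pair, a relaxed model would be free to make the flag visible before the payload.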
  • 15. DRIVER STACK AND FSA SOFTWARE STACK [Figure: two software stacks side by side, both running on Hardware - APUs, CPUs, GPUs.] Driver Stack: Apps; Domain Libraries; OpenCL™ 1.x, DX Runtimes, User Mode Drivers; Graphics Kernel Mode Driver. FSA Software Stack: Apps; FSA Domain Libraries; Task Queuing Libraries; FSA Runtime; FSA JIT; FSA Kernel Mode Driver. Legend: AMD user mode component; AMD kernel mode component; all others contributed by third parties or AMD.
  • 16. OPENCL™ AND FSA FSA is an optimized platform architecture for OpenCL™: – Not an alternative to OpenCL™. OpenCL™ on FSA will benefit from: – Avoidance of wasteful copies – Low latency dispatch – Improved memory model – Pointers shared between CPU and GPU. FSA also exposes a lower level programming interface, for those who want the ultimate in control and performance: – Optimized libraries may choose the lower level interface.
  • 17. TASK QUEUING RUNTIMES Popular pattern for task and data parallel programming on SMP systems today. Characterized by: – A work queue per core – A runtime library that divides large loops into tasks and distributes them to queues – A work stealing runtime that keeps the system balanced. FSA is designed to extend this pattern to run on heterogeneous systems.
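The pattern described on this slide (one queue per worker, a large loop split into tasks, idle workers stealing from busy ones) can be sketched in portable C++ as follows. This is a minimal CPU-only illustration, not the FSA runtime: `WorkQueue` and `sum_range_with_stealing` are hypothetical names, queues are guarded by a plain mutex for clarity, and every task is deliberately placed on worker 0's queue so that the other workers have to steal.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// One queue per worker (hypothetical type; a mutex keeps the sketch simple).
struct WorkQueue {
    std::deque<std::function<void()>> tasks;
    std::mutex lock;

    void push(std::function<void()> t) {
        std::lock_guard<std::mutex> g(lock);
        tasks.push_back(std::move(t));
    }
    // The owning worker takes the newest task from the back.
    bool pop(std::function<void()>& out) {
        std::lock_guard<std::mutex> g(lock);
        if (tasks.empty()) return false;
        out = std::move(tasks.back());
        tasks.pop_back();
        return true;
    }
    // A thief takes the oldest task from the front.
    bool steal(std::function<void()>& out) {
        std::lock_guard<std::mutex> g(lock);
        if (tasks.empty()) return false;
        out = std::move(tasks.front());
        tasks.pop_front();
        return true;
    }
};

// Sums the integers in [0, n) by splitting the loop into chunk-sized tasks.
// All tasks start on queue 0, so balance comes only from stealing.
long long sum_range_with_stealing(std::size_t n, std::size_t chunk, unsigned workers) {
    assert(workers > 0 && chunk > 0);
    std::vector<WorkQueue> queues(workers);
    std::atomic<long long> total{0};

    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        queues[0].push([begin, end, &total] {
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) local += static_cast<long long>(i);
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }

    auto worker = [&](unsigned self) {
        std::function<void()> task;
        for (;;) {
            if (queues[self].pop(task)) { task(); continue; }
            bool stole = false;
            for (unsigned victim = 0; victim < workers && !stole; ++victim) {
                if (victim != self && queues[victim].steal(task)) stole = true;
            }
            // Tasks never spawn new tasks here, so "all queues empty" means done.
            if (!stole) return;
            task();
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) pool.emplace_back(worker, w);
    for (auto& t : pool) t.join();
    return total.load();
}
```

For example, `sum_range_with_stealing(1000, 64, 4)` splits the loop into 16 tasks and returns 499500 regardless of which worker ran which chunk. Production runtimes typically replace the mutex with a lock-free deque, which this sketch does not attempt.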
  • 18. TASK QUEUING RUNTIME ON CPUS [Figure: a Work Stealing Runtime above four queues (Q); each queue is served by a CPU Worker running on an x86 CPU. Legend: CPU Threads, GPU Threads, Memory.]
  • 19. TASK QUEUING RUNTIME ON THE FSA PLATFORM [Figure: a Work Stealing Runtime above five queues (Q); four are served by CPU Workers on x86 CPUs and the fifth by a GPU Manager on a Radeon™ GPU. Legend: CPU Threads, GPU Threads, Memory.]
  • 20. TASK QUEUING RUNTIME ON THE FSA PLATFORM [Figure: a Work Stealing Runtime above five queues (Q); four are served by CPU Workers on x86 CPUs; the fifth is served by a GPU Manager whose Fetch and Dispatch stage feeds five SIMD units, with Memory shown alongside. Legend: CPU Threads, GPU Threads, Memory.]
  • 21. FSA SOFTWARE EXAMPLE - REDUCTION float foo(float); float myArray[…]; Task<float, ReductionBin> task([myArray](IndexRange<1> index) [[device]] { float sum = 0.; for (size_t i = index.begin(); i != index.end(); i++) { sum += foo(myArray[i]); } return sum; }); float result = task.enqueueWithReduce(Partition<1, Auto>(1920), [](float x, float y) [[device]] { return x + y; }, 0.);
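The slide's `Task`, `IndexRange`, `Partition` and `enqueueWithReduce` belong to a proposed FSA runtime and cannot be compiled as shown. The same shape (partition the index range, compute a partial sum per partition, then combine the partials with a binary reduction) can be approximated in standard C++ with `std::async`. In this sketch `partial_sum` and `reduce_in_chunks` are hypothetical names, and the body of `foo` is a placeholder because the slide only declares it.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Placeholder body: the slide declares foo() but does not define it.
float foo(float x) { return x * 2.0f; }

// Partial sum over [begin, end), mirroring the body of the slide's task lambda.
float partial_sum(const std::vector<float>& data, std::size_t begin, std::size_t end) {
    float sum = 0.0f;
    for (std::size_t i = begin; i != end; ++i) {
        sum += foo(data[i]);
    }
    return sum;
}

// Splits the index range into `parts` chunks, runs each chunk asynchronously
// on the CPU, and folds the partial results together with `combine`.
template <typename Combine>
float reduce_in_chunks(const std::vector<float>& data, std::size_t parts,
                       Combine combine, float init) {
    assert(parts > 0);
    const std::size_t n = data.size();
    std::vector<std::future<float>> pending;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t begin = n * p / parts;
        const std::size_t end = n * (p + 1) / parts;
        pending.push_back(std::async(std::launch::async, [&data, begin, end] {
            return partial_sum(data, begin, end);
        }));
    }
    float result = init;
    for (auto& f : pending) {
        result = combine(result, f.get());
    }
    return result;
}
```

A call such as `reduce_in_chunks(data, 8, [](float a, float b) { return a + b; }, 0.0f)` plays the role of `enqueueWithReduce` on the slide. The difference is that here the partitions run only on CPU threads, whereas the slide's `[[device]]` lambdas are meant to be dispatchable to the GPU as well.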
  • 22. HETEROGENEOUS COMPUTE DISPATCH How compute dispatch operates today in the driver model. How compute dispatch improves tomorrow under FSA.
  • 23. TODAY’S COMMAND AND DISPATCH FLOW [Figure: command flow and data flow for Application A. Components shown: Application A, Direct3D, User Mode Driver, Soft Queue, Kernel Mode Driver, Command Buffer, DMA Buffer, Hardware Queue, GPU hardware.]
  • 24. TODAY’S COMMAND AND DISPATCH FLOW [Figure: three applications (A, B, C), each with its own Direct3D, User Mode Driver, Soft Queue, Kernel Mode Driver, Command Buffer and DMA Buffer, all feeding a single Hardware Queue in front of the GPU hardware.]
  • 25. TODAY’S COMMAND AND DISPATCH FLOW [Figure: the same three application stacks (A, B, C) feeding a single Hardware Queue, which now holds interleaved work from all three: A, B, B, C.]
  • 27. FUTURE COMMAND AND DISPATCH FLOW Application codes to the hardware. User mode queuing. Hardware scheduling. Low dispatch times. No APIs. No Soft Queues. No User Mode Drivers. No Kernel Mode Transitions. No Overhead! [Figure: Applications A, B and C each write directly to their own Hardware Queue, with an optional Dispatch Buffer, and the queues feed the GPU hardware.]
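User mode queuing, as listed on this slide, means the application writes work directly into a queue the hardware reads, with no driver call or kernel transition per dispatch. One plausible shape for such a queue is a single-producer, single-consumer ring buffer whose write index acts as the doorbell. The slides do not define a packet layout or queue format, so `DispatchPacket`, `UserModeQueue` and the field names below are assumptions made purely for illustration, and the consumer side is ordinary C++ standing in for the hardware scheduler.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical packet: the source does not specify a dispatch packet format.
struct DispatchPacket {
    std::uint32_t kernel_id;
    std::uint32_t work_items;
};

// Single-producer, single-consumer ring buffer living in ordinary memory.
template <std::size_t Capacity>
class UserModeQueue {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Application side: enqueue with plain loads and stores, no system call.
    // Returns false when the ring is full.
    bool enqueue(const DispatchPacket& packet) {
        const std::uint64_t w = write_.load(std::memory_order_relaxed);
        const std::uint64_t r = read_.load(std::memory_order_acquire);
        if (w - r == Capacity) return false;
        slots_[w & (Capacity - 1)] = packet;
        // Publishing the new write index plays the role of ringing a doorbell.
        write_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Consumer side, standing in for the hardware scheduler.
    // Returns false when the ring is empty.
    bool dequeue(DispatchPacket& out) {
        const std::uint64_t r = read_.load(std::memory_order_relaxed);
        const std::uint64_t w = write_.load(std::memory_order_acquire);
        if (r == w) return false;
        out = slots_[r & (Capacity - 1)];
        read_.store(r + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<DispatchPacket, Capacity> slots_{};
    std::atomic<std::uint64_t> write_{0};
    std::atomic<std::uint64_t> read_{0};
};
```

With one such queue per application, as in the slide's diagram, applications never contend on a shared soft queue, and the dispatch cost reduces to a memory write plus an index update.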
  • 28. FUTURE COMMAND AND DISPATCH CPU <-> GPU [Figure: an Application / Runtime layer above CPU1, CPU2 and the GPU.]
  • 32. WHERE ARE WE TAKING YOU? Switch the compute, don’t move the data! Every processor now has serial and parallel cores. All cores capable, with performance differences. Simple and efficient program model. Platform Design Goals: easy support of massive data sets; support for task based programming models; solutions for all platforms; open to all.
  • 33. THE FUTURE OF HETEROGENEOUS COMPUTING The architectural path for the future is clear: – Programming patterns established on Symmetric Multi-Processor (SMP) systems migrate to the heterogeneous world – An open architecture, with published specifications and an open source execution software stack – Heterogeneous cores working together seamlessly in coherent memory – Low latency dispatch – No software fault lines.