ITRI
Industrial Technology
Research Institute
Heterogeneous System Architecture
(HSA) Design
王振傑 (Jay Wang)
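Division for Embedded System and SoC Technology, System Architecture Design Department (D200)
Information and Communications Research Laboratories (ICL)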
ccwang.jay@itri.org.tw
2015-04-30
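Industrial Technology Research Institute (ITRI)
Information and Communications Research Laboratories (ICL)
Division for Embedded System and SoC Technology
• Embedded System Hardware Technology Department (D100)
• System Architecture Design Department (D200)
• Embedded System Software Technology Department (D300)
• Intelligent Electronics Industry Promotion Department (D400)
• System Integration and Application Department (D500)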
HSA Design (2015-04-30) @ NCKU, Tainan
What is HSA?
An intelligent computing architecture that enables CPU, GPU and other
processors to work in harmony on a single piece of silicon by seamlessly
moving the right tasks to the best suited processing element.
Three Eras of Processor Performance
Single-Core Era (single-thread performance over time)
• Enabled by: Moore's Observation, voltage scaling, micro-architecture
• Constrained by: power, complexity
• Programming: Assembly → C/C++ → Java …
Multi-Core Era (throughput performance over time, scaling with # of processors)
• Enabled by: Moore's Observation, desire for throughput, 20 years of SMP arch
• Constrained by: power, parallel SW availability, scalability
• Programming: pthreads → OpenMP / TBB …
Heterogeneous Systems Era (modern application performance over time, through data-parallel exploitation)
• Enabled by: Moore's Observation, abundant data parallelism, power-efficient data-parallel processing (GPUs)
• Constrained by: programming models, communication overheads
• Programming: Shader → CUDA → OpenCL → C++ and Java
[Figure: one performance-versus-time curve per era, each with a "we are here" marker.]
SOURCE : HSA INTRODUCTION, HSA FOUNDATION (PHIL ROGERS, AMD)
HSA Foundation
• Founded in June 2012
• www.hsafoundation.com
• Developing a new platform for heterogeneous systems
• Launched the official v1.0 specification set in March 2015
HSA Foundation Members (April 2015)
Founders
Promoters
Contributors
Academics
Supporters
HSA Platform Model
In an HSA system, a regular device is called an HSA agent; if the HSA agent can run kernels, it is also an HSA kernel agent.
[Figure: HSA platform model. The host CPU (an HSA agent running the OS and the HSA runtime) is labeled with serial and task-parallel workloads; the HSA kernel agent is labeled with SIMD data-parallel workloads. A kernel agent contains several compute units (CUs), and each CU contains lanes (processing elements). The wavefront size is a power of 2 in the range from 1 to 256 inclusive. Jay Wang, Taiwan, 2015.03]
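The wavefront-size rule on this slide (a power of 2 between 1 and 256) can be checked mechanically. The sketch below is only an illustration: the function names are hypothetical, and the idea that a work-group is split into consecutive wavefronts of that size, with a possibly partial last one, is an assumption about how work-items are grouped rather than something the slide states.

```python
def is_valid_wavefront_size(size: int) -> bool:
    """True if size is a power of 2 in the range 1..256 inclusive."""
    return 1 <= size <= 256 and (size & (size - 1)) == 0


def wavefronts_per_work_group(work_group_size: int, wavefront_size: int) -> int:
    """Wavefronts needed to hold one work-group (assumed: the last may be partial)."""
    if not is_valid_wavefront_size(wavefront_size):
        raise ValueError(f"invalid wavefront size: {wavefront_size}")
    if work_group_size < 1:
        raise ValueError("work-group size must be positive")
    # Integer ceiling division.
    return -(-work_group_size // wavefront_size)


if __name__ == "__main__":
    for candidate in (1, 48, 64, 256, 512):
        print(candidate, is_valid_wavefront_size(candidate))
    print(wavefronts_per_work_group(100, 64))
```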
HSA Intermediate Language (HSAIL)
The HSA Foundation members are building a heterogeneous compute software ecosystem
built on open, royalty-free industry standards and open-source software: the HSA
runtimes and compilation tools are based on open-source technologies such as LLVM and
GCC. ( https://github.com/HSAFoundation )
[Figure: parallel programming languages (OpenCL, C++AMP, Java, OpenMP, DSL) are compiled to the HSA Intermediate Language (HSAIL), a virtual parallel ISA; for example, CLOC compiles OpenCL kernels to HSAIL. Vendor finalizers (Company A CPU, Company B CPU, Company C GPU, Company D GPU, Company E DSP, ...) translate HSAIL for each vendor's hardware (CPUs, GPUs, DSP, other hardware accelerators). HSA runtime libraries sit alongside. Jay Wang, Taiwan, 2014.10]
HSAIL Programming Model
HSA Runtime Stack
[Figure: on the host CPU, the HSA application (HSA agent) stacks the user application (CPU code + HSAIL kernel code), the language runtime (e.g., OpenCL runtime), the HSA runtime, the HSA finalizers (which produce the target ISA), and the HSA kernel mode driver. It reaches the HSA kernel agents (CPU, GPU, DSP) through HSA user mode queuing (Architected Queuing Language) plus HSA signaling. Jay Wang, Taiwan, 2015.04]
Kernel Execution
HSA Memory Consistency Model
(Relaxed Model)
Second operation (columns):
  A: ld_rlx
  B: st_rlx
  C: atomic_rlx, atomicNoRet_rlx
  D: atomic_acq, atomicNoRet_acq, fence_acq
  E: atomic_rel, atomicNoRet_rel, fence_rel
  F: atomic_ar, atomicNoRet_ar, fence_ar

First operation                           A    B    C    D    E    F
ld_rlx or st_rlx                          yes  yes  yes  yes  no   no
atomic_rlx, atomicNoRet_rlx               yes  yes  yes  no   no   no
atomic_acq, atomicNoRet_acq, fence_acq    no   no   no   no   no   no
atomic_rel, atomicNoRet_rel               yes  yes  no   no   no   no
fence_rel                                 yes  no   no   no   no   no
atomic_ar, atomicNoRet_ar, fence_ar       no   no   no   no   no   no

Memory orders shown in the accompanying code fragment: relaxed, acquire, release, acq_rel.
System Arch. Requirements
1. Shared Virtual Memory
2. Cache Coherency Domains
3. Flat Addressing
4. Endianness
5. Signaling and Synchronization
6. Atomic Memory Operations
7. HSA System Timestamp
8. User Mode Queuing
9. Architected Queuing Language (AQL)
10. Agent Scheduling
11. Kernel Agent Context Switching
12. IEEE754-2008 Floating Point Exceptions
13. Kernel Agent Hardware Debug Infrastructure
14. HSA Platform Topology Discovery
15. Images
@ HSA PLATFORM SYSTEM ARCHITECTURE SPECIFICATION, VERSION 1.0 FINAL (2015-03-16)
Legacy GPU Compute
• Multiple memory pools and address spaces
• Data copies before/after GPU compute
[Figure: the host CPUs (HSA agent) use system memory through virtual memory #1, and the GPU (HSA kernel agent) uses GPU memory through virtual memory #2. Numbered steps 1 to 3 mark the sequence between the two sides. Jay Wang, Taiwan, 2015.04]
Shared Virtual Memory (HSA)
[Figure: the host CPUs (HSA agent) and the GPU (HSA kernel agent) share one virtual memory that spans system memory and GPU memory. The MMU, the IOMMU, and the OS page table are shown on the address translation path. Jay Wang, Taiwan, 2015.04]
• 32-bit HSA system: 32 bits of virtual address
• 64-bit HSA system: ≥ 48 bits of virtual address
HSA Memory Hierarchy
Segments:
1) Global
2) Group
3) Private
4) Kernarg
5) Readonly
6) Spill
7) Arg
[Figure: a kernel dispatch grid on an HSA agent is divided into work-groups, each containing work-items (WI). Every work-item has its own private segment, every work-group has its own group segment, and there is one global segment. The private, group, and global segments all sit within the flat address space, which is a virtual address range reservation (system memory or device local memory). Jay Wang, Taiwan, 2014.10]
Local registers per work-item:
• 1-bit control registers (c registers): $c0 to $c7
• 32-bit registers (s registers): $s0 to $s127
• 64-bit registers (d registers): $d0 to $d63
• 128-bit registers (q registers): $q0 to $q31
Cache Coherency Domains
[Figure: the segment layout of the memory hierarchy slide (kernel dispatch grid, work-groups, work-items, private, group, and global segments within the flat address space), with the host CPUs and the HSA kernel agent reaching system memory through caches that are kept coherent. Jay Wang, Taiwan, 2015.04]
Signaling and Synchronization
• The required mechanisms for HSAIL and the HSA runtime are:
  • Allocate/destroy an HSA signal
  • Read the current HSA signal value
  • Wait on an HSA signal to meet a specified condition (with a maximum wait duration requested)
  • Send an HSA signal value
  • Atomic read-modify-write of an HSA signal value
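The five mechanisms listed above can be mimicked with a small host-side model. This is a sketch only: the `Signal` class and its method names are hypothetical, it uses a Python condition variable rather than the HSA runtime, and it says nothing about the memory-ordering variants (relaxed, acquire, release) that the real interfaces distinguish.

```python
import threading


class Signal:
    """Toy stand-in for an HSA signal: a shared integer that threads can wait on."""

    def __init__(self, initial: int = 0):
        self._value = initial
        self._cond = threading.Condition()

    def load(self) -> int:
        """Read the current signal value."""
        with self._cond:
            return self._value

    def store(self, value: int) -> None:
        """Send a new signal value and wake any waiters."""
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def add(self, delta: int) -> None:
        """Atomic read-modify-write of the signal value."""
        with self._cond:
            self._value += delta
            self._cond.notify_all()

    def wait(self, condition, timeout: float) -> int:
        """Block until condition(value) holds or the timeout expires.

        Returns the value observed when the wait ended, so the caller can
        tell a satisfied wait from a timed-out one.
        """
        with self._cond:
            self._cond.wait_for(lambda: condition(self._value), timeout=timeout)
            return self._value


# A producer decrements the signal to 0; the consumer waits for that.
done = Signal(1)
worker = threading.Thread(target=lambda: done.add(-1))
worker.start()
observed = done.wait(lambda v: v == 0, timeout=5.0)
worker.join()
print(observed)
```

Returning the observed value from `wait` is a design choice of this model: a maximum wait duration means the caller must always re-check whether the condition was actually met.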
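[Figure: an HSA signal consists of a signal handle (hsa_signal_t), a signal value (hsa_signal_value_t), and implementation-defined data, with Sig32 or Sig64 as the signal size. The host CPU uses it through the HSA runtime APIs and the HSA kernel agent through HSAIL instructions. For comparison, the slide lists the POSIX semaphore calls (sem_init(), sem_wait(), sem_post(), sem_destroy()) and mutex calls (pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock(), pthread_mutex_destroy()). Jay Wang, Taiwan, 2015.04]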
HSA Runtime APIs for Signaling
HSA Runtime APIs ( for HSA application )
• hsa_signal_create ( )
• hsa_signal_destroy ( )
• hsa_signal_load_{acquire, relaxed} ( )
• hsa_signal_store_{relaxed, release} ( )
• hsa_signal_exchange_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_cas_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_add_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_subtract_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_and_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_or_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_xor_{acq_rel, acquire, relaxed, release} ( )
• hsa_signal_wait_{acquire, relaxed} ( )
HSA Runtime Programmer’s Reference Manual (v1.0)
2.4 Signals
HSAIL Instructions for Signaling
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model,
Compiler Writer’s Guide, and Object Format (BRIG) (v1.0)
6.8 Notification (signal) Instructions
Atomic Memory Operations
• HSA requires the following standard atomic memory operations to be supported by HSA Kernel Agents (other HSA Agents only need to support the subset of these operations required by their role in the system):
  • Load from memory
  • Store to memory
  • Fetch from memory, apply a logic operation (bitwise AND/OR/XOR) with one additional operand, and store back.
  • Fetch from memory, apply an integer arithmetic operation (add, subtract, increment, decrement, minimum, maximum) with one additional operand, and store back.
  • Exchange memory location with operand.
  • Compare-and-swap (CAS): load memory location, compare with first operand, and if equal then store second operand back to memory location.
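The semantics of these operations can be sketched with a lock-protected cell. The `AtomicCell` class and `cas_increment` helper are hypothetical names for illustration; real HSA agents provide these operations in hardware, and a Python lock only imitates their indivisibility.

```python
import operator
import threading


class AtomicCell:
    """Toy memory location whose operations are made indivisible by a lock."""

    def __init__(self, value: int = 0):
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def store(self, value: int) -> None:
        with self._lock:
            self._value = value

    def fetch_op(self, op, operand: int) -> int:
        """Fetch, apply op(old, operand), store back; returns the old value."""
        with self._lock:
            old = self._value
            self._value = op(old, operand)
            return old

    def exchange(self, value: int) -> int:
        with self._lock:
            old, self._value = self._value, value
            return old

    def compare_and_swap(self, expected: int, desired: int) -> int:
        """Store desired only if the cell equals expected; returns the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = desired
            return old


def cas_increment(cell: AtomicCell) -> None:
    """Increment built from CAS alone: retry until no other thread interfered."""
    while True:
        seen = cell.load()
        if cell.compare_and_swap(seen, seen + 1) == seen:
            return


counter = AtomicCell(0)


def worker() -> None:
    for _ in range(1000):
        cas_increment(counter)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())
```

The retry loop in `cas_increment` shows why CAS is listed separately: any of the other read-modify-write operations can be built from it.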
HSA System Timestamp
• The HSA system provides a low-overhead mechanism for determining the passage of time.
• A system timestamp is required that can be read from HSAIL or through the HSA runtime.
• It is also possible to determine the system timestamp frequency through the HSA runtime.
[Figure: a 64-bit timestamp is read by the host CPU through the HSA runtime APIs and by the HSA kernel agent through the HSAIL clock instruction; the timestamp frequency (1~400MHz) is available through the HSA runtime. Jay Wang, Taiwan, 2015.04]
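Given two timestamp readings and the reported frequency, elapsed time is a single division. The sketch below uses a hypothetical helper name, takes the 64-bit width and the 1 to 400 MHz range from the slide, and assumes the counter wraps modulo 2^64.

```python
TIMESTAMP_BITS = 64
MIN_FREQUENCY_HZ = 1_000_000      # 1 MHz, lower bound shown on the slide
MAX_FREQUENCY_HZ = 400_000_000    # 400 MHz, upper bound shown on the slide


def elapsed_seconds(start_ticks: int, end_ticks: int, frequency_hz: int) -> float:
    """Convert two timestamp readings into elapsed seconds."""
    if not MIN_FREQUENCY_HZ <= frequency_hz <= MAX_FREQUENCY_HZ:
        raise ValueError(f"frequency outside 1-400 MHz: {frequency_hz}")
    # Assumption: the counter wraps modulo 2**64, so subtract modulo 2**64.
    ticks = (end_ticks - start_ticks) % (1 << TIMESTAMP_BITS)
    return ticks / frequency_hz


if __name__ == "__main__":
    print(elapsed_seconds(0, 100_000_000, 100_000_000))
```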
User Mode Queuing
• Multiple user-level command queues
• Runtime-allocated
• Architected Queuing Language (AQL)
[Figure: the HSA application (HSA agent) on the CPU, made up of the user application, the language runtime (e.g., OpenCL runtime), the HSA runtime, the HSA finalizers, and the HSA kernel mode driver, submits AQL packets to user mode queues. Kernel dispatch packets (K) go to AQL kernel dispatch queues served by HSA kernel agents (GPU, CPU); agent dispatch packets (A) go to AQL agent dispatch queues. Jay Wang, Taiwan, 2015.04]
HSA Packet Processor
User Mode Queue Structure (hsa_queue_t):
Offset  Field
0x00    type
0x04    features
0x08    base_address (64-bit)
0x10    doorbell_signal (64-bit)
0x18    size
0x1C    reserved (must be 0)
0x20    id (64-bit)
Ring buffer:
• write_index (64-bit) and read_index (64-bit)
• Read slot address: base_address + ( (read_index % size) * AQL packet size )
• Write slot address: base_address + ( (write_index % size) * AQL packet size )
• AQL packet size: 64 bytes
• Supports single or multiple producers
• Supports KERNEL_DISPATCH and/or AGENT_DISPATCH packets
Jay Wang, Taiwan, 2015.03
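The slot address formula on this slide is easy to exercise directly. In the sketch below the function names are hypothetical, and the full-queue test (write_index minus read_index reaching the queue size) is an assumption about how a producer would use the two monotonically increasing indices.

```python
AQL_PACKET_SIZE = 64  # bytes, as stated on the slide


def packet_address(base_address: int, index: int, queue_size: int) -> int:
    """base_address + ((index % size) * AQL packet size)."""
    return base_address + (index % queue_size) * AQL_PACKET_SIZE


def is_full(write_index: int, read_index: int, queue_size: int) -> bool:
    """Assumed producer-side check: no free slot while size packets are in flight."""
    return write_index - read_index >= queue_size


if __name__ == "__main__":
    base, size = 0x1000, 4
    for write_index in range(6):
        print(write_index, hex(packet_address(base, write_index, size)))
```

Because the indices only ever grow and the modulo wraps them onto the ring, slot 4 reuses the address of slot 0 in a queue of size 4.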
User Mode Queue Operations
[Figure: the user application, the language runtime (e.g., OpenCL runtime), and the HSA runtime inside the HSA application (HSA agent) on the CPU exchange kernel dispatch (K) and agent dispatch (A) packets with the HSA kernel agent (GPU). Jay Wang, Taiwan, 2015.04]
HSA Runtime APIs ( for HSA application )
• hsa_queue_create ( )
• hsa_soft_queue_create ( )
• hsa_queue_destroy ( )
• hsa_queue_inactivate ( )
• hsa_queue_load_write_index_{acquire, relaxed} ( )
• hsa_queue_store_write_index_{relaxed, release} ( )
• hsa_queue_cas_write_index_{acq_rel, acquire, relaxed, release} ( )
• hsa_queue_add_write_index_{acq_rel, acquire, relaxed, release} ( )
• hsa_queue_load_read_index_{acquire, relaxed} ( )
• hsa_queue_store_read_index_{relaxed, release} ( )
HSAIL Instructions ( for HSA Kernel Agent)
• queueid_u32 dest
• queueptr_uLength dest
• ldqueuewriteindex_segment_order_u64 dest, address
• stqueuewriteindex_segment_order_u64 address, src
• casqueuewriteindex_segment_order_u64 dest, address, src0, src1
• addqueuewriteindex_segment_order_u64 dest, address, src
• ldqueuereadindex_segment_order_u64 dest, address
• stqueuereadindex_segment_order_u64 address, src
AQL Packet Types
Each AQL packet occupies offsets 0x00 to 0x3C and starts with a 16-bit header. The slide shows three layouts:
• Kernel Dispatch Packet: header, dimensions (2-bit), workgroup_size_x/y/z, grid_size_x/y/z, private_segment_size_bytes, group_segment_size_bytes, kernel_object, kernarg_address, completion_signal
• Agent Dispatch Packet: header, type, return_address, arg0, arg1, arg2, arg3, completion_signal
• Barrier-AND / Barrier-OR Packet: header, dep_signal0, dep_signal1, dep_signal2, dep_signal3, dep_signal4, completion_signal
completion_signal: HSA signaling object handle used to indicate completion of the job.
Header (16 bits):
Bits    Field
7:0     format (8-bit)
8       barrier (1-bit)
10:9    acquire_fence_scope (2-bit)
12:11   release_fence_scope (2-bit)
15:13   reserved (3-bit)
AQL_FORMAT:
0  VENDOR_SPECIFIC
1  INVALID
2  KERNEL_DISPATCH
3  BARRIER_AND
4  AGENT_DISPATCH
5  BARRIER_OR
Jay Wang, Taiwan, 2015.03
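The 16-bit header can be assembled with shifts and masks. This sketch takes the field widths and AQL_FORMAT values from the slide; the bit positions (format in bits 0 to 7, barrier in bit 8, acquire scope in bits 9 to 10, release scope in bits 11 to 12) are read from the slide's bit ruler, and the function names are hypothetical.

```python
AQL_FORMAT = {
    "VENDOR_SPECIFIC": 0,
    "INVALID": 1,
    "KERNEL_DISPATCH": 2,
    "BARRIER_AND": 3,
    "AGENT_DISPATCH": 4,
    "BARRIER_OR": 5,
}


def encode_header(packet_format: int, barrier: bool,
                  acquire_fence_scope: int, release_fence_scope: int) -> int:
    """Pack the header fields into one 16-bit value (reserved bits left 0)."""
    if not 0 <= acquire_fence_scope <= 3 or not 0 <= release_fence_scope <= 3:
        raise ValueError("fence scopes are 2-bit fields")
    return ((packet_format & 0xFF)
            | (int(barrier) << 8)
            | (acquire_fence_scope << 9)
            | (release_fence_scope << 11))


def decode_header(header: int) -> dict:
    """Split a 16-bit header back into its fields."""
    return {
        "format": header & 0xFF,
        "barrier": bool((header >> 8) & 0x1),
        "acquire_fence_scope": (header >> 9) & 0x3,
        "release_fence_scope": (header >> 11) & 0x3,
    }


if __name__ == "__main__":
    header = encode_header(AQL_FORMAT["KERNEL_DISPATCH"], True, 2, 2)
    print(hex(header), decode_header(header))
```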
Kernel Dispatch Packet
Offset       Bits 31:16                      Bits 15:0
0x00         dimensions (2-bit), reserved    header
0x04         workgroup_size_y                workgroup_size_x
0x08         reserved                        workgroup_size_z
0x0C         grid_size_x
0x10         grid_size_y
0x14         grid_size_z
0x18         private_segment_size_bytes
0x1C         group_segment_size_bytes
0x20-0x24    kernel_object
0x28-0x2C    kernarg_address
0x30-0x34    reserved
0x38-0x3C    completion_signal
• workgroup_size_x/y/z: work-group size
• grid_size_x/y/z: grid size
• private_segment_size_bytes, group_segment_size_bytes: segment size
• kernel_object: pointer to the kernel
• kernarg_address: pointer to the arguments
Jay Wang, Taiwan, 2015.03
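A 64-byte kernel dispatch packet can be laid out with Python's `struct` module. The field order and widths below are an assumed reading of this slide (16-bit header, dimensions and work-group sizes; 32-bit grid and segment sizes; 64-bit kernel_object, kernarg_address, reserved, and completion_signal, little-endian); the function name and the sample addresses are hypothetical.

```python
import struct

# header, dimensions, workgroup x/y/z, reserved (6 x 16-bit),
# grid x/y/z, private/group segment sizes (5 x 32-bit),
# kernel_object, kernarg_address, reserved, completion_signal (4 x 64-bit).
KERNEL_DISPATCH_LAYOUT = struct.Struct("<HHHHHHIIIIIQQQQ")


def pack_kernel_dispatch(header, dimensions, workgroup_size, grid_size,
                         private_segment_size, group_segment_size,
                         kernel_object, kernarg_address, completion_signal):
    """Return the packet as 64 raw bytes; reserved fields are written as 0."""
    wx, wy, wz = workgroup_size
    gx, gy, gz = grid_size
    return KERNEL_DISPATCH_LAYOUT.pack(
        header, dimensions & 0x3, wx, wy, wz, 0,
        gx, gy, gz, private_segment_size, group_segment_size,
        kernel_object, kernarg_address, 0, completion_signal,
    )


packet = pack_kernel_dispatch(
    header=2, dimensions=1,
    workgroup_size=(256, 1, 1), grid_size=(1024, 1, 1),
    private_segment_size=0, group_segment_size=0,
    kernel_object=0x1000, kernarg_address=0x2000, completion_signal=0x3000,
)
print(len(packet))
```

Reading a field back with `struct.unpack_from` at its table offset (for example 0x0C for grid_size_x) is a quick way to confirm the layout matches the offsets on the slide.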
Agent Dispatch Packet
Offset       Bits 31:16    Bits 15:0
0x00         type          header
0x04         reserved
0x08-0x0C    return_address
0x10-0x14    arg0
0x18-0x1C    arg1
0x20-0x24    arg2
0x28-0x2C    arg3
0x30-0x34    reserved
0x38-0x3C    completion_signal
• type: the function to be performed by the destination agent. The function codes are application defined.
• return_address: pointer to the location in which to store the function return value(s)
• arg0 to arg3: 64-bit direct or indirect arguments
Jay Wang, Taiwan, 2015.03
Barrier-AND / Barrier-OR Packet
• The Barrier packet defines dependencies for the HSA Packet Processor to monitor.
• The HSA Packet Processor will not launch any further packets until the Barrier-AND / Barrier-OR packet is complete.
Offset       Bits 31:16    Bits 15:0
0x00         reserved      header
0x04         reserved
0x08-0x0C    dep_signal0
0x10-0x14    dep_signal1
0x18-0x1C    dep_signal2
0x20-0x24    dep_signal3
0x28-0x2C    dep_signal4
0x30-0x34    reserved
0x38-0x3C    completion_signal
• dep_signal0 to dep_signal4: handles for dependent signaling objects to be evaluated by the packet processor.
Jay Wang, Taiwan, 2015.03
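How the packet processor might evaluate the five dependency slots can be sketched as a pure function. The slide does not spell out the condition, so the rule used here is an assumption: a Barrier-AND is satisfied when every used dependent signal has value 0, a Barrier-OR when at least one has value 0, and unused slots (modeled as `None`) are ignored. The function name is hypothetical.

```python
from typing import Optional, Sequence


def barrier_satisfied(kind: str, dep_signal_values: Sequence[Optional[int]]) -> bool:
    """Evaluate a barrier packet against the current dependent signal values.

    kind is "AND" or "OR"; each entry is a signal value, or None for an
    unused dep_signal slot. Assumed rule: a signal counts as met at value 0.
    """
    if len(dep_signal_values) > 5:
        raise ValueError("a barrier packet has at most 5 dependent signals")
    used = [v for v in dep_signal_values if v is not None]
    if not used:
        return True  # assumption: nothing to wait for
    if kind == "AND":
        return all(v == 0 for v in used)
    if kind == "OR":
        return any(v == 0 for v in used)
    raise ValueError(f"unknown barrier kind: {kind}")


if __name__ == "__main__":
    print(barrier_satisfied("AND", [0, 1, None, None, None]))
    print(barrier_satisfied("OR", [0, 1, None, None, None]))
```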
Packet Process Flow
Launch Phase
• All preceding packets in the queue must have completed their launch phase.
• If the barrier bit in the packet header is set, then all preceding packets in the queue must have completed.
• An acquire memory fence is applied for Kernel/Agent Dispatch packets before the packet enters the active phase.
Active Phase
• Kernel Dispatch packets and Agent Dispatch packets execute on the Kernel Agent/Agent, and the active phase ends when the task completes.
• Barrier-AND and Barrier-OR packets remain in the active phase until their condition is met.
Completion Phase
• If the packet is a Barrier-AND or Barrier-OR packet, then an acquire memory fence is applied as the first step.
• After execution of the acquire fence, the memory release fence is applied.
• After the memory release fence completes, the signal specified by the completion_signal field in the AQL packet is signaled with a decrementing atomic operation.
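The ordering rules of the launch, active, and completion phases can be traced with a tiny timeline model. This is a simplification under stated assumptions: time is counted in abstract ticks, launching is instantaneous, memory fences are not modeled, and the `Packet` class and `run_queue` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    name: str
    barrier: bool = False   # barrier bit in the AQL header
    duration: int = 1       # ticks spent in the active phase
    # One-element list used as a mutable counter, standing in for the signal.
    completion_signal: list = field(default_factory=lambda: [1])


def run_queue(packets):
    """Return {name: (launch_tick, completion_tick)} for one in-order queue."""
    timeline = {}
    previous_launch = 0      # when the preceding packet finished launching
    latest_completion = 0    # when all preceding packets have completed
    for packet in packets:
        # Launch phase: wait for preceding launches; with the barrier bit
        # set, also wait for all preceding packets to complete.
        launch = previous_launch
        if packet.barrier:
            launch = max(launch, latest_completion)
        # Active phase: the task runs for its duration.
        completion = launch + packet.duration
        # Completion phase: decrement the completion signal.
        packet.completion_signal[0] -= 1
        timeline[packet.name] = (launch, completion)
        previous_launch = launch
        latest_completion = max(latest_completion, completion)
    return timeline


queue = [Packet("A", duration=5), Packet("B", duration=1),
         Packet("C", barrier=True, duration=1)]
timeline = run_queue(queue)
print(timeline)
```

In this run A and B overlap, while C, which carries the barrier bit, cannot launch until A has completed.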
Barrier-bit Example
• If the barrier bit is set, then processing of the packet will only begin when all preceding packets are complete.
[Figure: AQL packets travel from enqueue to dequeue through the launch, active, and completion phases, each ending with its completionSignal; a packet with barrier bit = 1 waits for the packets ahead of it. Jay Wang, Taiwan, 2015.04]
Barrier-AND Packet Example
Agent Scheduling
[Figure: applications #1 to #3 submit AQL packets (Agent/Kernel Dispatch packets or Barrier-AND/OR packets) to several AQL queues; together with a non-HSA task pool, these feed agent scheduling on the HSA (kernel) agent. Scheduling is poked by three events: (1) task execution completed, (2) new AQL packet submission, (3) barrier packet completed. Jay Wang, Taiwan, 2015.04]
Kernel Agent Context Switching
[Figure: HSA agent scheduling draws work from AQL queues #1 to #3 and a non-HSA task pool; context switching places kernel programs and their work-groups (WG) on the compute units (CUs) of the HSA kernel agent. Jay Wang, Taiwan, 2015.04]
Context switching operations:
1. Switch (required)
2. Preempt (required as soon as possible)
3. Terminate and context reset (terminated as fast as possible)
FP Exception Reporting
• A Kernel Agent shall report certain defined exceptions related to the execution of the HSAIL code to the HSA Runtime.
IEEE754-2008 exceptions:
Code  Description
0     Invalid operation
1     Divide-by-zero
2     Overflow
3     Underflow
4     Inexact
Control directives:
• enablebreakexceptions #EC (BREAK policy, BreakEn bits)
• enabledetectexceptions #EC (DETECT policy, DetectEn bits, status bits)
HSAIL instructions:
• cleardetectexcept_u32
• getdetectexcept_u32
• setdetectexcept_u32
[Figure: work-groups 0 to X are split into wavefronts 0 to Y. A compute unit (CU) of the HSA kernel agent executes one wavefront (here wavefront 2 of work-group 1) in SIMD (Single Instruction, Multiple Data) style with a single PC, mapping work-items onto lanes 0 to N-1. The exception module in the kernel agent uses signaling to reach the exception handler in the HSA runtime on the host CPU. Jay Wang, Taiwan, 2015.04]
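The DETECT policy can be pictured as a set of enable bits plus sticky status bits that the get, set, and clear instructions operate on. The sketch below is a toy model: mapping exception code n to bit n of a mask is an assumption, and the `DetectState` class and its methods are hypothetical names, not HSAIL semantics.

```python
EXCEPTION_CODES = {
    0: "Invalid operation",
    1: "Divide-by-zero",
    2: "Overflow",
    3: "Underflow",
    4: "Inexact",
}


def mask_of(*codes: int) -> int:
    """Build a bitmask with bit n set for each exception code n (assumed mapping)."""
    mask = 0
    for code in codes:
        if code not in EXCEPTION_CODES:
            raise ValueError(f"unknown exception code: {code}")
        mask |= 1 << code
    return mask


class DetectState:
    """Sticky status bits, recorded only for exceptions whose detect bit is enabled."""

    def __init__(self, detect_enabled: int):
        self.detect_enabled = detect_enabled
        self.status = 0

    def record(self, code: int) -> None:
        """Called when an operation raises the given exception."""
        self.status |= mask_of(code) & self.detect_enabled

    def get(self) -> int:
        return self.status

    def set(self, mask: int) -> None:
        self.status |= mask

    def clear(self, mask: int) -> None:
        self.status &= ~mask

    def describe(self) -> list:
        return [name for code, name in EXCEPTION_CODES.items()
                if self.status & (1 << code)]


state = DetectState(detect_enabled=mask_of(1, 2))
state.record(1)   # enabled: recorded
state.record(4)   # not enabled: ignored
print(state.describe())
```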
Debug Infrastructure
• The Kernel Agent shall provide mechanisms to allow system software and some select application software (for example, debuggers and profilers) to set breakpoints and collect throughput information for profiling.
[Figure: a grid is divided into work-groups 0 to 2 and wavefronts 0 to 3. A compute unit of the HSA kernel agent executes one wavefront (here wavefront 2 of work-group 1) in SIMD (Single Instruction, Multiple Data) style with a single PC, mapping work-items onto lanes 0 to N-1. Debuggers and profilers on the host CPU (HSA agent) reach the debug module through the HSA kernel agent debug interface; the debug module supports instruction breakpoints, memory breakpoints, and conditional breakpoints. Jay Wang, Taiwan, 2015.04]
Execution Environment
You have 2 OpenCL platform(s)
----------------------------------------------
Platform[0].Name = NVIDIA CUDA
Platform[0].Vendor = NVIDIA Corporation
Platform[0].Version = OpenCL 1.1 CUDA 4.2.1
Platform[0].Profile = FULL_PROFILE
----------------------------------------------
Platform[1].Name = Intel(R) OpenCL
Platform[1].Vendor = Intel(R) Corporation
Platform[1].Version = OpenCL 1.2
Platform[1].Profile = FULL_PROFILE
----------------------------------------------
Platform[0] has 1 device(s)
----------------------------------------------
Device[0].Type = CL_DEVICE_TYPE_GPU
Device[0].Name = GeForce GT 625
Device[0].Vendor = NVIDIA Corporation
Device[0].Version = OpenCL 1.1 CUDA
Device[0].DriverVersion = 320.49
Device[0].Profile = FULL_PROFILE
Device[0].OpenCL_C = OpenCL C 1.1
Device[0].MaxComputeUnits = 1
Device[0].MaxWiDimensions = 3
Device[0].MaxWiSize = (1024,1024,64)
Device[0].MaxWgSize = 1024
Device[0].MaxClkFrequency = 1747 MHz
Device[0].AddrSpaceSize = 32 bits
Platform[1] has 1 device(s)
----------------------------------------------
Device[0].Type = CL_DEVICE_TYPE_CPU
Device[0].Name = Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz
Device[0].Vendor = Intel(R) Corporation
Device[0].Version = OpenCL 1.2 (Build 80752)
Device[0].DriverVersion = 3.0.1.15216
Device[0].Profile = FULL_PROFILE
Device[0].OpenCL_C = OpenCL C 1.2
Device[0].MaxComputeUnits = 4
Device[0].MaxWiDimensions = 3
Device[0].MaxWiSize = (1024,1024,1024)
Device[0].MaxWgSize = 1024
Device[0].MaxClkFrequency = 3100 MHz
Device[0].AddrSpaceSize = 32 bits
OpenCL APIs
HSA Platform Topology Discovery
• HSA platform resources: Agent, Memory, Compute Properties, Caches, and I/O
[Figure: an example HSA platform with nodes 0 to 4. The HSA APU nodes each contain CPU cores, a GPU with H-CUs, an HSA MMU, and system memory with coherent (cacheable) and non-coherent (non-cacheable) regions, with SBIOS/UEFI firmware; they are joined by a socket interconnect. The HSA discrete GPUs contain H-CUs and device local memory with coherent and non-coherent regions, with VBIOS/UEFI GOP firmware; they attach over PCIe, in some cases through a PCIe bridge or on an optional add-in board.]
Images
• A graphics feature that can sometimes be useful in data-parallel computing
• Used to store one-, two-, or three-dimensional images
• Predefined image formats
• Image memory is a special kind of memory access
• Dedicated hardware to speed up image operations
• The OpenCL™ Specification Version 2.1, 5.3 Image Objects: https://www.khronos.org/registry/cl/specs/opencl-2.1.pdf
[Figure: an image object consists of an image handle (hsa_ext_image_handle_t) held by the HSA runtime, an image descriptor (image channel type, image channel order, image geometry, image data size), and the image data (1D, 2D, or 3D images) stored in the global segment. The HSA kernel agent works on image data with the rdimage, ldimage, and stimage instructions. Jay Wang, Taiwan, 2015.04]
Summary
• Programming model issues
  • HSA Intermediate Language (HSAIL) + HSA Runtime
  • Architected Queuing Language (AQL) + Signaling
  • Debug infrastructure
• Communication overhead issues
  • Cache coherent shared virtual memory (CC-SVM)
  • Architected Queuing Language (AQL) for user mode queuing
  • Hardware-assisted signaling and atomic operations for synchronization
[Figure: CPUs, GPU, DSP, ... joined by HSAIL, unified coherent memory, the HSA runtime, and AQL. Jay Wang, Taiwan, 2015.04]
[Figure: the HSA runtime stack (user application with CPU code + HSAIL kernel code, language runtime such as the OpenCL runtime, HSA runtime, HSA finalizers, HSA kernel mode driver on the host CPU; CPU, GPU, and DSP HSA kernel agents; HSA user mode queuing with the Architected Queuing Language + HSA signaling) annotated with four roles: parallel application designer, HSA system software designer, HSA system architecture designer, and HSA kernel agent designer. A callout reads "Mom~ I'm here!". Jay Wang, Taiwan, 2015.04]
• OpenCL Standards ( https://www.khronos.org/opencl/ )
• HSA Standards ( http://www.hsafoundation.com/html/HSA_Library.htm )
  • HSA Platform System Architecture Specification v1.0
  • HSA Programmer Reference Manual Specification v1.0
  • HSA Runtime Specification v1.0
• HSA Foundation Github ( https://github.com/HSAFoundation )
Taiwan HSA Group @ Facebook

Más contenido relacionado

La actualidad más candente

OpenStack概要 ~仮想ネットワーク~
OpenStack概要 ~仮想ネットワーク~OpenStack概要 ~仮想ネットワーク~
OpenStack概要 ~仮想ネットワーク~Masaya Aoyama
 
Chips alliance omni xtend overview
Chips alliance omni xtend overviewChips alliance omni xtend overview
Chips alliance omni xtend overviewRISC-V International
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)shimosawa
 
AMD EPYC™ Microprocessor Architecture
AMD EPYC™ Microprocessor ArchitectureAMD EPYC™ Microprocessor Architecture
AMD EPYC™ Microprocessor ArchitectureAMD
 
OPTEE on QEMU - Build Tutorial
OPTEE on QEMU - Build TutorialOPTEE on QEMU - Build Tutorial
OPTEE on QEMU - Build TutorialDalton Valadares
 
TEE (Trusted Execution Environment)は第二の仮想化技術になるか?
TEE (Trusted Execution Environment)は第二の仮想化技術になるか?TEE (Trusted Execution Environment)は第二の仮想化技術になるか?
TEE (Trusted Execution Environment)は第二の仮想化技術になるか?Kuniyasu Suzaki
 
Reliability, Availability and Serviceability on Linux
Reliability, Availability and Serviceability on LinuxReliability, Availability and Serviceability on Linux
Reliability, Availability and Serviceability on LinuxSamsung Open Source Group
 
BUD17-209: Reliability, Availability, and Serviceability (RAS) on ARM64
BUD17-209: Reliability, Availability, and Serviceability (RAS) on ARM64 BUD17-209: Reliability, Availability, and Serviceability (RAS) on ARM64
BUD17-209: Reliability, Availability, and Serviceability (RAS) on ARM64 Linaro
 
Rust で RTOS を考える
Rust で RTOS を考えるRust で RTOS を考える
Rust で RTOS を考えるryuz88
 
Embedded Linux BSP Training (Intro)
Embedded Linux BSP Training (Intro)Embedded Linux BSP Training (Intro)
Embedded Linux BSP Training (Intro)RuggedBoardGroup
 
HKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewHKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewLinaro
 
Lcu14 107- op-tee on ar mv8
Lcu14 107- op-tee on ar mv8Lcu14 107- op-tee on ar mv8
Lcu14 107- op-tee on ar mv8Linaro
 
U-Boot presentation 2013
U-Boot presentation  2013U-Boot presentation  2013
U-Boot presentation 2013Wave Digitech
 
Dave Gilbert - KVM and QEMU
Dave Gilbert - KVM and QEMUDave Gilbert - KVM and QEMU
Dave Gilbert - KVM and QEMUDanny Abukalam
 

La actualidad más candente (20)

It's Time to ROCm!
It's Time to ROCm!It's Time to ROCm!
It's Time to ROCm!
 
OpenStack概要 ~仮想ネットワーク~
OpenStack概要 ~仮想ネットワーク~OpenStack概要 ~仮想ネットワーク~
OpenStack概要 ~仮想ネットワーク~
 
Chips alliance omni xtend overview
Chips alliance omni xtend overviewChips alliance omni xtend overview
Chips alliance omni xtend overview
 
Linux Audio Drivers. ALSA
Linux Audio Drivers. ALSALinux Audio Drivers. ALSA
Linux Audio Drivers. ALSA
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
 
AMD EPYC™ Microprocessor Architecture
AMD EPYC™ Microprocessor ArchitectureAMD EPYC™ Microprocessor Architecture
AMD EPYC™ Microprocessor Architecture
 
U-Boot - An universal bootloader
U-Boot - An universal bootloader U-Boot - An universal bootloader
U-Boot - An universal bootloader
 
OPTEE on QEMU - Build Tutorial
OPTEE on QEMU - Build TutorialOPTEE on QEMU - Build Tutorial
OPTEE on QEMU - Build Tutorial
 
TEE (Trusted Execution Environment)は第二の仮想化技術になるか?
TEE (Trusted Execution Environment)は第二の仮想化技術になるか?TEE (Trusted Execution Environment)は第二の仮想化技術になるか?
TEE (Trusted Execution Environment)は第二の仮想化技術になるか?
 
Reliability, Availability and Serviceability on Linux
Reliability, Availability and Serviceability on LinuxReliability, Availability and Serviceability on Linux
Reliability, Availability and Serviceability on Linux
 
BUD17-209: Reliability, Availability, and Serviceability (RAS) on ARM64
BUD17-209: Reliability, Availability, and Serviceability (RAS) on ARM64 BUD17-209: Reliability, Availability, and Serviceability (RAS) on ARM64
BUD17-209: Reliability, Availability, and Serviceability (RAS) on ARM64
 
Rust で RTOS を考える
Rust で RTOS を考えるRust で RTOS を考える
Rust で RTOS を考える
 
Embedded Linux BSP Training (Intro)
Embedded Linux BSP Training (Intro)Embedded Linux BSP Training (Intro)
Embedded Linux BSP Training (Intro)
 
HKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewHKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting Review
 
Lcu14 107- op-tee on ar mv8
Lcu14 107- op-tee on ar mv8Lcu14 107- op-tee on ar mv8
Lcu14 107- op-tee on ar mv8
 
Cuda
CudaCuda
Cuda
 
Andes RISC-V processor solutions
Andes RISC-V processor solutionsAndes RISC-V processor solutions
Andes RISC-V processor solutions
 
Bootloaders
BootloadersBootloaders
Bootloaders
 
U-Boot presentation 2013
U-Boot presentation  2013U-Boot presentation  2013
U-Boot presentation 2013
 
Dave Gilbert - KVM and QEMU
Dave Gilbert - KVM and QEMUDave Gilbert - KVM and QEMU
Dave Gilbert - KVM and QEMU
 

HSA Design (2015-04-30)

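  • 1. ITRI (Industrial Technology Research Institute): Heterogeneous System Architecture (HSA) Design. 王振傑 (Jay Wang), Division for Embedded System and SoC Technology, System Architecture Design Department (D200), Information and Communications Research Laboratories (ICL). ccwang.jay@itri.org.tw, 2015-04-30
  • 2. Division for Embedded System and SoC Technology, Information and Communications Research Laboratories, Industrial Technology Research Institute. Departments: Embedded System Hardware Technology Department (D100); System Architecture Design Department (D200); Embedded System Software Technology Department (D300); Smart Electronics Industry Promotion Department (D400); System Integration and Application Department (D500).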
  • 3. HSA Design (2015-04-30) @ NCKU, Tainan. What is HSA? An intelligent computing architecture that enables the CPU, GPU, and other processors to work in harmony on a single piece of silicon by seamlessly moving the right tasks to the best-suited processing element.
  • 4. Three Eras of Processor Performance. Single-Core Era (single-thread performance over time): enabled by Moore’s Observation, voltage scaling, and micro-architecture; constrained by power and complexity; programmed with Assembly → C/C++ → Java. Multi-Core Era (throughput performance over number of processors): enabled by Moore’s Observation, the desire for throughput, and 20 years of SMP architecture; constrained by power, parallel software availability, and scalability; programmed with pthreads → OpenMP / TBB. Heterogeneous Systems Era (modern application performance over data-parallel exploitation): enabled by Moore’s Observation, abundant data parallelism, and power-efficient data-parallel processing (GPUs); constrained by programming models and communication overheads; programmed with Shader → CUDA → OpenCL → C++ and Java. Each chart carries a "we are here" marker. Source: HSA Introduction, HSA Foundation (Phil Rogers, AMD).
  • 5. HSA Foundation: founded in June 2012 (www.hsafoundation.com); developing a new platform for heterogeneous systems; launched the official v1.0 specification set in March 2015.
  • 6. HSA Foundation Members (April 2015): Founders, Promoters, Contributors, Academics, Supporters.
  • 7. HSA Platform Model. In an HSA system, a regular device is called an HSA agent; if the HSA agent can run kernels, it is also an HSA kernel agent. [Figure: the host CPU (OS, HSA runtime) is an HSA agent for serial and task-parallel workloads; the HSA kernel agent is built from compute units (CUs), each made of lanes (processing elements), for SIMD data-parallel workloads. The wavefront size is a power of 2 in the range from 1 to 256 inclusive. Jay Wang, Taiwan, 2015.03]
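The wavefront-size rule on this slide can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and the assumption that a work-group is covered by whole wavefronts (with the last one possibly partial) is added here and is not stated on the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n is a legal wavefront size per the slide:
   a power of two in the range 1..256 inclusive. */
static bool is_valid_wavefront_size(uint32_t n)
{
    return n >= 1 && n <= 256 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: number of wavefronts needed to cover a work-group
   of `workgroup_size` work-items (assumption: the last may be partial). */
static uint32_t wavefronts_per_workgroup(uint32_t workgroup_size,
                                         uint32_t wavefront_size)
{
    assert(is_valid_wavefront_size(wavefront_size));
    return (workgroup_size + wavefront_size - 1) / wavefront_size;
}
```

For example, under these assumptions a 100-item work-group on a 64-lane wavefront needs two wavefronts, the second only partly filled.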
  • 8. HSA Intermediate Language (HSAIL). The HSA Foundation members are building a heterogeneous compute software ecosystem on open, royalty-free industry standards and open-source software: the HSA runtimes and compilation tools are based on open-source technologies such as LLVM and GCC ( https://github.com/HSAFoundation ). [Figure: parallel programming languages (OpenCL, C++ AMP, Java, OpenMP, DSL) compile to HSAIL, a virtual parallel ISA; CLOC compiles OpenCL kernels to HSAIL. Per-vendor finalizers (Company A CPU, Company B CPU, Company C GPU, Company D GPU, Company E DSP, ...) target each company's CPUs, GPUs, DSPs, and other hardware accelerators. Also shown: HSA Runtime Libraries. Jay Wang, Taiwan, 2014.10]
  • 9. HSAIL Programming Model
  • 10. HSA Runtime Stack. [Figure: User Application (CPU code + HSAIL kernel code); Language Runtime (e.g., OpenCL runtime); HSA Application (HSA Agent); HSA Runtime; HSA Finalizers producing the target ISA; HSA User Mode Queuing (Architected Queuing Language) + HSA Signaling; HSA Kernel Mode Driver; Host CPU; HSA Kernel Agents (CPU, GPU, DSP). Jay Wang, Taiwan, 2015.04]
  • 11. Kernel Execution
  • 12. HSA Memory Consistency Model (Relaxed Model). Rows give the first operation, columns the second operation. Operation groups: (1) ld_rlx, st_rlx; (2) atomic_rlx, atomicNoRet_rlx; (3) atomic_acq, atomicNoRet_acq, fence_acq; (4) atomic_rel, atomicNoRet_rel; (5) fence_rel; (6) atomic_ar, atomicNoRet_ar, fence_ar.

      First \ Second | (1) | (2) | (3) | (4) | (5) | (6)
      (1)            | yes | yes | yes | yes | no  | no
      (2)            | yes | yes | yes | no  | no  | no
      (3)            | no  | no  | no  | no  | no  | no
      (4)            | yes | yes | no  | no  | no  | no
      (5)            | yes | no  | no  | no  | no  | no
      (6)            | no  | no  | no  | no  | no  | no

      relaxed ; ….. acquire ; ….. release ; ….. acq_rel ; …..
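The acquire and release orderings named on this slide have close counterparts in C11 atomics, which makes the idea easy to try on a host CPU. The sketch below is an analogy under that assumption, not HSAIL: `publish` and `try_consume` are hypothetical names, and `memory_order_release` / `memory_order_acquire` stand in for the `_rel` and `_acq` orderings.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;        /* ordinary (non-atomic) data */
static atomic_int ready;   /* flag; static storage starts at 0 (not ready) */

/* Producer: write the data, then set the flag with a release store.
   Release ordering keeps the payload write from moving below the flag. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: acquire-load the flag; once it reads 1, the payload written
   before the matching release store is guaranteed to be visible. */
static bool try_consume(int *out)
{
    assert(out != NULL);
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return false;
    *out = payload;
    return true;
}
```

In a real program `publish` and `try_consume` would run on different threads or agents; called in sequence on one thread they behave the same way, which is enough to exercise the logic.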
  • 13. System Arch. Requirements: 1. Shared Virtual Memory 2. Cache Coherency Domains 3. Flat Addressing 4. Endianness 5. Signaling and Synchronization 6. Atomic Memory Operations 7. HSA System Timestamp 8. User Mode Queuing 9. Architected Queuing Language (AQL) 10. Agent Scheduling 11. Kernel Agent Context Switching 12. IEEE754-2008 Floating Point Exceptions 13. Kernel Agent Hardware Debug Infrastructure 14. HSA Platform Topology Discovery 15. Images. Source: HSA Platform System Architecture Specification, Version 1.0 Final (2015-03-16).
  • 14. Legacy GPU Compute: multiple memory pools and address spaces; data copies before/after GPU compute. [Figure: the host CPUs (HSA agent) use system memory through virtual memory #1 and the GPU (HSA kernel agent) uses GPU memory through virtual memory #2; numbered steps 1, 2, 3 mark the flow between them. Jay Wang, Taiwan, 2015.04]
  • 15. Shared Virtual Memory (HSA). [Figure: the host CPUs (HSA agent) and the GPU (HSA kernel agent) share one virtual memory space over system memory and GPU memory. Components shown: MMU, IOMMU, OS page table. A 32-bit HSA system uses 32-bit virtual addresses; a 64-bit HSA system uses virtual addresses of at least 48 bits. Jay Wang, Taiwan, 2015.04]
  • 16. HSA Memory Hierarchy. Segments: 1) Global 2) Group 3) Private 4) Kernarg 5) Readonly 6) Spill 7) Arg. [Figure: within the HSA agent's flat address space, a kernel dispatch grid is divided into work-groups of work-items (WI); each work-item has a private segment, each work-group a group segment, and there is one global segment. Virtual address range reservation (system memory or device local memory). Local registers per work-item: 1-bit control registers $c0 to $c7 (c registers); 32-bit registers $s0 to $s127 (s registers); 64-bit registers $d0 to $d63 (d registers); 128-bit registers $q0 to $q31 (q registers). Jay Wang, Taiwan, 2014.10]
  • 17. Cache Coherency Domains. [Figure: the flat address space (global, group, and private segments; kernel dispatch grid of work-groups and work-items) with cache coherency between the caches of the host CPUs and the HSA kernel agent over system memory. Jay Wang, Taiwan, 2015.04]
  • 18. System Arch. Requirements: 1. Shared Virtual Memory 2. Cache Coherency Domains 3. Flat Addressing 4. Endianness 5. Signaling and Synchronization 6. Atomic Memory Operations 7. HSA System Timestamp 8. User Mode Queuing 9. Architected Queuing Language (AQL) 10. Agent Scheduling 11. Kernel Agent Context Switching 12. IEEE754-2008 Floating Point Exceptions 13. Kernel Agent Hardware Debug Infrastructure 14. HSA Platform Topology Discovery 15. Images. Source: HSA Platform System Architecture Specification, Version 1.0 Final (2015-03-16).
  • 19. Signaling and Synchronization. The required mechanisms for HSAIL and the HSA runtime are: allocate/destroy an HSA signal; read the current HSA signal value; wait on an HSA signal to meet a specified condition (with a maximum wait duration requested); send an HSA signal value; atomic read-modify-write of an HSA signal value. [Figure: an HSA signal comprises a signal handle (hsa_signal_t), a signal value (hsa_signal_value_t; Sig32 or Sig64), and implementation-defined data. The host CPU reaches it through HSA runtime APIs and the HSA kernel agent through HSAIL instructions. POSIX calls shown alongside: sem_init(), sem_wait(), sem_post(), sem_destroy(), pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock(), pthread_mutex_destroy(). Jay Wang, Taiwan, 2015.04]
  • 20. HSA Runtime APIs for Signaling
      HSA Runtime APIs (for HSA applications):
        • hsa_signal_create()
        • hsa_signal_destroy()
        • hsa_signal_load_{acquire, relaxed}()
        • hsa_signal_store_{relaxed, release}()
        • hsa_signal_exchange_{acq_rel, acquire, relaxed, release}()
        • hsa_signal_cas_{acq_rel, acquire, relaxed, release}()
        • hsa_signal_add_{acq_rel, acquire, relaxed, release}()
        • hsa_signal_subtract_{acq_rel, acquire, relaxed, release}()
        • hsa_signal_and_{acq_rel, acquire, relaxed, release}()
        • hsa_signal_or_{acq_rel, acquire, relaxed, release}()
        • hsa_signal_xor_{acq_rel, acquire, relaxed, release}()
        • hsa_signal_wait_{acquire, relaxed}()
      Source: HSA Runtime Programmer's Reference Manual (v1.0), 2.4 Signals
  • 21. HSAIL Instructions for Signaling
      Source: HSA Programmer's Reference Manual: HSAIL Virtual ISA and Programming Model, Compiler Writer's Guide, and Object Format (BRIG) (v1.0), 6.8 Notification (signal) Instructions
  • 22. Atomic Memory Operations
      HSA requires HSA kernel agents to support the following standard atomic memory operations (other HSA agents only need to support the subset of these operations required by their role in the system):
        • Load from memory.
        • Store to memory.
        • Fetch from memory, apply a logic operation (bitwise AND/OR/XOR) with one additional operand, and store back.
        • Fetch from memory, apply an integer arithmetic operation (add, subtract, increment, decrement, minimum, maximum) with one additional operand, and store back.
        • Exchange memory location with operand.
        • Compare-and-swap (CAS): load the memory location, compare it with the first operand, and if they are equal, store the second operand back to the memory location.
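The operations listed above can be sketched with a lock-protected cell. The class `AtomicCell` and the helper `fetch_max_via_cas` are hypothetical names for illustration. A Python lock stands in for hardware atomicity here, so this shows the semantics only, not how a kernel agent implements them.

```python
import threading


class AtomicCell:
    """Toy model of one memory location with atomic operations (illustrative)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def store(self, value):
        with self._lock:
            self._value = value

    def exchange(self, operand):
        # Swap in the operand and return the previous contents.
        with self._lock:
            old, self._value = self._value, operand
            return old

    def fetch_add(self, operand):
        # Fetch, apply integer add, store back; returns the old value.
        with self._lock:
            old = self._value
            self._value = old + operand
            return old

    def compare_and_swap(self, expected, desired):
        # Store `desired` only if the location still holds `expected`.
        # Always returns the value that was loaded.
        with self._lock:
            old = self._value
            if old == expected:
                self._value = desired
            return old


def fetch_max_via_cas(cell, operand):
    """Build an atomic maximum out of CAS with the usual retry loop."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, max(old, operand)) == old:
            return old


counter = AtomicCell(0)
threads = [
    threading.Thread(target=lambda: [counter.fetch_add(1) for _ in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())
```

The retry loop in `fetch_max_via_cas` shows why CAS is a useful primitive: any read-modify-write operation can be built from it, at the cost of retrying when another agent updates the location in between.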
  • 23. HSA System Timestamp
      • The HSA system provides a low-overhead mechanism for determining the passing of time.
      • A system timestamp is required that can be read from HSAIL or through the HSA runtime.
      • The system timestamp frequency can also be determined through the HSA runtime.
      [Figure (Jay Wang, Taiwan, 2015.04): a 64-bit timestamp with a timestamp frequency of 1~400 MHz, read by the host CPU through the HSA runtime APIs and by the HSA kernel agent through the HSAIL clock instruction.]
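As a worked example of how a timestamp pair and the timestamp frequency combine into elapsed time, here is a minimal sketch. The function name `ticks_to_seconds` and the sample values are assumptions for illustration; the two tick values would come from the HSAIL clock instruction or the HSA runtime.

```python
TIMESTAMP_BITS = 64  # the slide describes a 64-bit timestamp


def ticks_to_seconds(start_ticks, end_ticks, frequency_hz):
    """Convert two timestamp readings into elapsed seconds.

    The subtraction is taken modulo 2**64 so that a counter wrap between
    the two readings still yields the correct tick count.
    """
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    elapsed_ticks = (end_ticks - start_ticks) % (1 << TIMESTAMP_BITS)
    return elapsed_ticks / frequency_hz


# Example: 100 MHz timestamp frequency, 100000 ticks between readings.
elapsed = ticks_to_seconds(1_000, 101_000, 100_000_000)
print(elapsed)
```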
  • 25. User Mode Queuing
      • Multiple user-level command queues
      • Runtime-allocated
      • Architected Queuing Language (AQL)
      [Figure (Jay Wang, Taiwan, 2015.04): software stack of user application, language runtime (e.g. the OpenCL runtime), HSA runtime, HSA finalizers, and HSA kernel mode driver. The HSA application (an HSA agent) runs on the CPU and submits work to the HSA kernel agents (GPU, CPU) through user mode queues. Legend: K = AQL kernel dispatch queue, A = AQL agent dispatch queue.]
  • 26. HSA Packet Processor
      User mode queue structure (hsa_queue_t):
        Offset       Field
        0x00         type
        0x04         features
        0x08-0x0C    base_address
        0x10-0x14    doorbell_signal
        0x18         size
        0x1C         reserved (must be 0)
        0x20-0x24    id
      Queue indices: write_index (64-bit) and read_index (64-bit).
      Ring buffer of AQL packets (64 bytes each):
        • Read position: base_address + ((read_index % size) * AQL packet size)
        • Write position: base_address + ((write_index % size) * AQL packet size)
      • Supports single or multiple producers
      • Supports KERNEL_DISPATCH and/or AGENT_DISPATCH packets
      (Figure: Jay Wang, Taiwan, 2015.03)
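The ring-buffer address formula on this slide can be checked with a few lines of arithmetic. This is a sketch only: `packet_address` and `is_full` are hypothetical helper names, and the full-queue test (write_index minus read_index reaching the queue size) is a common ring-buffer convention assumed here rather than something stated on the slide.

```python
AQL_PACKET_SIZE = 64  # bytes, as stated on the slide


def packet_address(base_address, index, queue_size):
    """Address of the packet slot that a monotonically growing index maps to."""
    return base_address + (index % queue_size) * AQL_PACKET_SIZE


def is_full(write_index, read_index, queue_size):
    """Assumed convention: the queue is full when the producer is a whole
    queue length ahead of the consumer."""
    return write_index - read_index >= queue_size


base = 0x1000
size = 8  # number of packet slots in this example queue

# Index 9 wraps around to slot 1, i.e. one packet past the base address.
print(hex(packet_address(base, 9, size)))
```

Because the indices only ever increase and are reduced modulo the queue size on use, producers and the packet processor never need to reset them.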
  • 27. User Mode Queue Operations
      HSA Runtime APIs (for HSA applications):
        • hsa_queue_create()
        • hsa_soft_queue_create()
        • hsa_queue_destroy()
        • hsa_queue_inactivate()
        • hsa_queue_load_write_index_{acquire, relaxed}()
        • hsa_queue_store_write_index_{relaxed, release}()
        • hsa_queue_cas_write_index_{acq_rel, acquire, relaxed, release}()
        • hsa_queue_add_write_index_{acq_rel, acquire, relaxed, release}()
        • hsa_queue_load_read_index_{acquire, relaxed}()
        • hsa_queue_store_read_index_{relaxed, release}()
      HSAIL instructions (for HSA kernel agents):
        • queueid_u32 dest
        • queueptr_uLength dest
        • ldqueuewriteindex_segment_order_u64 dest, address
        • stqueuewriteindex_segment_order_u64 address, src
        • casqueuewriteindex_segment_order_u64 dest, address, src0, src1
        • addqueuewriteindex_segment_order_u64 dest, address, src
        • ldqueuereadindex_segment_order_u64 dest, address
        • stqueuereadindex_segment_order_u64 address, src
      [Figure (Jay Wang, Taiwan, 2015.04): the HSA application (an HSA agent) on the CPU uses the HSA runtime APIs, and the HSA kernel agent (GPU) uses HSAIL instructions, to operate on the kernel dispatch (K) and agent dispatch (A) queues.]
  • 28. AQL Packet Types
      [Figures (Jay Wang, Taiwan, 2015.03): byte layouts (offsets 0x00 to 0x3C) of the three packet types.]
        • Kernel Dispatch Packet: header, dimensions (2-bit), workgroup_size_x, workgroup_size_y, workgroup_size_z, reserved, grid_size_x, grid_size_y, grid_size_z, private_segment_size_bytes, group_segment_size_bytes, kernel_object, kernarg_address, reserved, completion_signal
        • Agent Dispatch Packet: header, type, reserved, return_address, arg0, arg1, arg2, arg3, reserved, completion_signal
        • Barrier-AND / Barrier-OR Packet: header, reserved, reserved, dep_signal0, dep_signal1, dep_signal2, dep_signal3, dep_signal4, reserved, completion_signal
      completion_signal: HSA signaling object handle used to indicate completion of the job.
      Header (16 bits):
        Bits     Field
        0-7      format (8-bit)
        8        barrier (1-bit)
        9-10     acquire_fence_scope (2-bit)
        11-12    release_fence_scope (2-bit)
        13-15    reserved (3-bit)
      AQL_FORMAT values:
        0  VENDOR_SPECIFIC
        1  INVALID
        2  KERNEL_DISPATCH
        3  BARRIER_AND
        4  AGENT_DISPATCH
        5  BARRIER_OR
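The 16-bit header layout and the AQL_FORMAT values from this slide can be exercised with a small encoder and decoder. The function names `encode_header` and `decode_header` are hypothetical, and the scope value 2 in the example is only a placeholder, since the slide does not define what the fence scope values mean.

```python
from enum import IntEnum


class PacketFormat(IntEnum):
    """AQL_FORMAT values as listed on the slide."""
    VENDOR_SPECIFIC = 0
    INVALID = 1
    KERNEL_DISPATCH = 2
    BARRIER_AND = 3
    AGENT_DISPATCH = 4
    BARRIER_OR = 5


def encode_header(fmt, barrier, acquire_fence_scope, release_fence_scope):
    """Pack the header fields into 16 bits; reserved bits 13-15 stay zero."""
    if not 0 <= int(fmt) <= 0xFF:
        raise ValueError("format must fit in 8 bits")
    if barrier not in (0, 1):
        raise ValueError("barrier must be 0 or 1")
    if not 0 <= acquire_fence_scope <= 3 or not 0 <= release_fence_scope <= 3:
        raise ValueError("fence scopes must fit in 2 bits")
    return (int(fmt)
            | (barrier << 8)
            | (acquire_fence_scope << 9)
            | (release_fence_scope << 11))


def decode_header(header):
    """Split a 16-bit header back into its named fields."""
    return {
        "format": PacketFormat(header & 0xFF),
        "barrier": (header >> 8) & 0x1,
        "acquire_fence_scope": (header >> 9) & 0x3,
        "release_fence_scope": (header >> 11) & 0x3,
    }


header = encode_header(PacketFormat.KERNEL_DISPATCH, barrier=1,
                       acquire_fence_scope=2, release_fence_scope=2)
print(hex(header))
```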
  • 29. Kernel Dispatch Packet
      [Figure (Jay Wang, Taiwan, 2015.03): byte layout of the packet, offsets 0x00 to 0x3C.]
        • header
        • dimensions (2-bit)
        • workgroup_size_x, workgroup_size_y, workgroup_size_z: work-group size
        • grid_size_x, grid_size_y, grid_size_z: grid size
        • private_segment_size_bytes, group_segment_size_bytes: segment size
        • kernel_object: pointer to the kernel
        • kernarg_address: pointer to the arguments
        • reserved fields
        • completion_signal
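To make the 64-byte layout concrete, the sketch below packs a kernel dispatch packet with Python's `struct` module. The exact field widths and offsets are an assumption based on my reading of the v1.0 layout (six 16-bit fields, five 32-bit fields, four 64-bit fields, little-endian), because the flattened slide does not preserve them; `pack_kernel_dispatch` is a hypothetical helper, and the addresses are made-up sample values.

```python
import struct

# Assumed layout (little-endian, no padding):
#   0x00 header (u16)            0x02 setup/dimensions (u16)
#   0x04 workgroup_size_x (u16)  0x06 workgroup_size_y (u16)
#   0x08 workgroup_size_z (u16)  0x0A reserved (u16)
#   0x0C grid_size_x (u32)       0x10 grid_size_y (u32)
#   0x14 grid_size_z (u32)       0x18 private_segment_size_bytes (u32)
#   0x1C group_segment_size_bytes (u32)
#   0x20 kernel_object (u64)     0x28 kernarg_address (u64)
#   0x30 reserved (u64)          0x38 completion_signal (u64)
KERNEL_DISPATCH_LAYOUT = struct.Struct("<HHHHHHIIIIIQQQQ")


def pack_kernel_dispatch(header, dimensions, workgroup_size, grid_size,
                         private_segment_size_bytes, group_segment_size_bytes,
                         kernel_object, kernarg_address, completion_signal):
    """Serialize one kernel dispatch packet; reserved fields are zeroed."""
    wx, wy, wz = workgroup_size
    gx, gy, gz = grid_size
    return KERNEL_DISPATCH_LAYOUT.pack(
        header, dimensions & 0x3, wx, wy, wz, 0,
        gx, gy, gz,
        private_segment_size_bytes, group_segment_size_bytes,
        kernel_object, kernarg_address, 0, completion_signal)


# A one-dimensional dispatch: 1024 work-items in work-groups of 256.
packet = pack_kernel_dispatch(
    header=0x1502,            # sample value: format 2 with the barrier bit set
    dimensions=1,
    workgroup_size=(256, 1, 1),
    grid_size=(1024, 1, 1),
    private_segment_size_bytes=0,
    group_segment_size_bytes=0,
    kernel_object=0x1000,     # made-up handle
    kernarg_address=0x2000,   # made-up address
    completion_signal=0x3000) # made-up signal handle
print(len(packet))
```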
  • 30. Agent Dispatch Packet
      [Figure (Jay Wang, Taiwan, 2015.03): byte layout of the packet, offsets 0x00 to 0x3C.]
        • header
        • type: the function to be performed by the destination agent. The function codes are application defined.
        • reserved
        • return_address: pointer to the location in which to store the function return value(s)
        • arg0, arg1, arg2, arg3: 64-bit direct or indirect arguments
        • reserved
        • completion_signal
  • 31. Barrier-AND / Barrier-OR Packet
      • The Barrier packet defines dependencies for the HSA Packet Processor to monitor.
      • The HSA Packet Processor will not launch any further packets until the Barrier-AND / Barrier-OR packet is complete.
      [Figure (Jay Wang, Taiwan, 2015.03): byte layout of the packet, offsets 0x00 to 0x3C.]
        • header
        • reserved fields
        • dep_signal0, dep_signal1, dep_signal2, dep_signal3, dep_signal4: handles for dependent signaling objects to be evaluated by the packet processor
        • completion_signal
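The difference between the two barrier packet types can be sketched as two predicates over the dependent signal values. This assumes, as I understand the v1.0 specification, that a dependency counts as satisfied once its signal value is 0; the slide itself does not state the condition. The function names are hypothetical, and each function takes the current values of the signals actually referenced by the packet.

```python
MAX_DEP_SIGNALS = 5  # dep_signal0 .. dep_signal4 on the slide


def _check(dep_values):
    if not 1 <= len(dep_values) <= MAX_DEP_SIGNALS:
        raise ValueError("expected between 1 and 5 dependent signal values")


def barrier_and_satisfied(dep_values):
    """Barrier-AND: every dependent signal must be satisfied (assumed: == 0)."""
    _check(dep_values)
    return all(value == 0 for value in dep_values)


def barrier_or_satisfied(dep_values):
    """Barrier-OR: one satisfied dependent signal is enough (assumed: == 0)."""
    _check(dep_values)
    return any(value == 0 for value in dep_values)


# Two of three upstream jobs have finished (their signals reached 0).
values = [0, 0, 1]
print(barrier_and_satisfied(values), barrier_or_satisfied(values))
```

A Barrier-AND packet is the natural way to express "run after all of these jobs", while Barrier-OR expresses "run as soon as any of these jobs is done".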
  • 32. Packet Process Flow
      Launch phase:
        • All preceding packets in the queue must have completed their launch phase.
        • If the barrier bit in the packet header is set, then all preceding packets in the queue must have completed.
        • For Kernel Dispatch and Agent Dispatch packets, an acquire memory fence is applied before the packet enters the active phase.
      Active phase:
        • Kernel Dispatch packets execute on the kernel agent and Agent Dispatch packets execute on the agent; the active phase ends when the task completes.
        • Barrier-AND and Barrier-OR packets remain in the active phase until their condition is met.
      Completion phase:
        • If the packet is a Barrier-AND or Barrier-OR packet, then an acquire memory fence is applied as the first step.
        • After execution of the acquire fence, the memory release fence is applied.
        • After the memory release fence completes, the signal specified by the completion_signal field in the AQL packet is signaled with a decrementing atomic operation.
  • 33. Barrier-bit Example
      If the barrier bit is set, then processing of the packet will only begin when all preceding packets are complete.
      [Figure (Jay Wang, Taiwan, 2015.04): AQL packets move from enqueue to dequeue through the launch, active, and completion phases, ending with the completion signal. One packet has barrier bit = 1.]
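The effect of the barrier bit on launch order can be illustrated with a small timeline calculation. This is an idealized sketch under stated assumptions: launching takes no time, execution resources are unlimited, and each packet simply runs for a fixed duration. `Packet` and `schedule` are hypothetical names, and fences and signals are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    name: str
    duration: int          # time spent in the active phase
    barrier: bool = False  # barrier bit from the packet header


def schedule(packets):
    """Return {name: (start, end)} for packets processed in queue order."""
    timeline = {}
    previous_start = 0  # when the preceding packet finished its launch phase
    latest_end = 0      # when all preceding packets have completed
    for packet in packets:
        # A packet cannot launch before its predecessor has launched.
        start = previous_start
        if packet.barrier:
            # Barrier bit set: wait for every preceding packet to complete.
            start = max(start, latest_end)
        end = start + packet.duration
        timeline[packet.name] = (start, end)
        previous_start = start
        latest_end = max(latest_end, end)
    return timeline


queue = [
    Packet("A", duration=5),
    Packet("B", duration=3),
    Packet("C", duration=2, barrier=True),
    Packet("D", duration=1),
]
timeline = schedule(queue)
print(timeline)
```

In this example A and B overlap, C waits for both to complete because its barrier bit is set, and D launches right after C launches, since D's own barrier bit is clear.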
  • 34. Barrier-AND Packet Example
  • 36. Agent Scheduling
      [Figure (Jay Wang, Taiwan, 2015.04): applications #1, #2, and #3 each own AQL queues holding AQL packets (Agent/Kernel Dispatch packets or Barrier-AND/OR packets). Together with a non-HSA task pool, these queues feed the agent scheduling of an HSA (kernel) agent.]
      Events that poke the agent scheduler:
        (1) Task execution completed
        (2) New AQL packet submission
        (3) Barrier packet completed
  • 37. Kernel Agent Context Switching
      1. Switch (required)
      2. Preempt (required as soon as possible)
      3. Terminate and context reset (terminated as fast as possible)
      [Figure (Jay Wang, Taiwan, 2015.04): the AQL queues of applications #1, #2, and #3 and a non-HSA task pool feed HSA agent scheduling. Inside the HSA kernel agent, context switching moves kernel programs and their work-groups (WG) on and off the compute units (CU).]
  • 39. FP Exception Reporting
      • A kernel agent shall report certain defined exceptions related to the execution of the HSAIL code to the HSA runtime.
      IEEE754-2008 exception codes:
        Code  Description
        0     Invalid operation
        1     Divide-by-zero
        2     Overflow
        3     Underflow
        4     Inexact
      Policies:
        • DETECT policy: control directive enabledetectexceptions #EC (DetectEn bits, status bits)
        • BREAK policy: control directive enablebreakexceptions #EC (BreakEn bits), signaling the exception handler
      HSAIL instructions: cleardetectexcept_u32, getdetectexcept_u32, setdetectexcept_u32
      [Figure (Jay Wang, Taiwan, 2015.04): work-groups 0 to X are split into wavefronts 0 to Y. A wavefront runs on a compute unit (CU) of the HSA kernel agent in SIMD (Single Instruction, Multiple Data) style, with one work-item per lane (lanes 0 to N-1) and a shared PC. The exception module of the kernel agent reports to the HSA runtime on the host CPU.]
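The exception codes lend themselves to a bit-mask representation, which is how the status bits might be inspected in software. The sketch below assumes that exception code n corresponds to bit n of a 32-bit mask, such as one read back with getdetectexcept_u32; that mapping is my reading rather than something the slide states, and `decode_exceptions` is a hypothetical helper.

```python
# Exception codes from the slide, assumed here to be bit positions in a mask.
FP_EXCEPTIONS = {
    0: "invalid operation",
    1: "divide-by-zero",
    2: "overflow",
    3: "underflow",
    4: "inexact",
}


def decode_exceptions(mask):
    """Return the names of all exceptions whose bit is set in the mask."""
    return [name for bit, name in FP_EXCEPTIONS.items() if mask & (1 << bit)]


def exception_mask(*codes):
    """Build a mask from exception codes, e.g. to describe an enabled set."""
    mask = 0
    for code in codes:
        if code not in FP_EXCEPTIONS:
            raise ValueError(f"unknown exception code: {code}")
        mask |= 1 << code
    return mask


detected = decode_exceptions(0b00101)
print(detected)
```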
  • 40. Debug Infrastructure
      • The kernel agent shall provide mechanisms that allow system software and selected application software (for example, debuggers and profilers) to set breakpoints and collect throughput information for profiling.
      Breakpoint types handled by the debug module: instruction breakpoint, memory breakpoint, conditional breakpoint.
      [Figure (Jay Wang, Taiwan, 2015.04): a grid is split into work-groups and wavefronts. A wavefront runs on a compute unit of the HSA kernel agent in SIMD (Single Instruction, Multiple Data) style, with one work-item per lane (lanes 0 to N-1) and a shared PC. Debuggers and profilers on the host CPU (an HSA agent) reach the debug module through the HSA kernel agent debug interface.]
  • 42. Execution Environment
      Example platform and device information obtained through the OpenCL APIs:
        You have 2 OpenCL platform(s)
        ----------------------------------------------
        Platform[0].Name    = NVIDIA CUDA
        Platform[0].Vendor  = NVIDIA Corporation
        Platform[0].Version = OpenCL 1.1 CUDA 4.2.1
        Platform[0].Profile = FULL_PROFILE
        ----------------------------------------------
        Platform[1].Name    = Intel(R) OpenCL
        Platform[1].Vendor  = Intel(R) Corporation
        Platform[1].Version = OpenCL 1.2
        Platform[1].Profile = FULL_PROFILE
        ----------------------------------------------
        Platform[0] has 1 device(s)
        ----------------------------------------------
        Device[0].Type            = CL_DEVICE_TYPE_GPU
        Device[0].Name            = GeForce GT 625
        Device[0].Vendor          = NVIDIA Corporation
        Device[0].Version         = OpenCL 1.1 CUDA
        Device[0].DriverVersion   = 320.49
        Device[0].Profile         = FULL_PROFILE
        Device[0].OpenCL_C        = OpenCL C 1.1
        Device[0].MaxComputeUnits = 1
        Device[0].MaxWiDimensions = 3
        Device[0].MaxWiSize       = (1024,1024,64)
        Device[0].MaxWgSize       = 1024
        Device[0].MaxClkFrequency = 1747 MHz
        Device[0].AddrSpaceSize   = 32 bits
        Platform[1] has 1 device(s)
        ----------------------------------------------
        Device[0].Type            = CL_DEVICE_TYPE_CPU
        Device[0].Name            = Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz
        Device[0].Vendor          = Intel(R) Corporation
        Device[0].Version         = OpenCL 1.2 (Build 80752)
        Device[0].DriverVersion   = 3.0.1.15216
        Device[0].Profile         = FULL_PROFILE
        Device[0].OpenCL_C        = OpenCL C 1.2
        Device[0].MaxComputeUnits = 4
        Device[0].MaxWiDimensions = 3
        Device[0].MaxWiSize       = (1024,1024,1024)
        Device[0].MaxWgSize       = 1024
        Device[0].MaxClkFrequency = 3100 MHz
        Device[0].AddrSpaceSize   = 32 bits
  • 43. HSA Platform Topology Discovery
      • HSA platform resources: Agent, Memory, Compute Properties, Caches, and I/O
      [Figure: an example HSA platform made up of Node 0 to Node 4, joined by a socket interconnect and PCIe (including a PCIe bridge). The APU nodes each contain an HSA APU (CPU cores, GPU H-CUs, HSA MMU) with system memory (cacheable coherent, non-cacheable non-coherent) and SBIOS/UEFI firmware. The remaining nodes are optional add-in boards, each carrying an HSA discrete GPU (H-CUs) with device local memory (coherent and non-coherent) and VBIOS/UEFI GOP firmware.]
  • 45. Images
      • A graphics feature that can sometimes be useful in data-parallel computing
      • Used to store one-, two-, or three-dimensional images
      • Predefined image formats
      • Image memory is a special kind of memory access
      • Dedicated hardware to speed up image operations
      Reference: The OpenCL™ Specification Version 2.1, 5.3 Image Objects, https://www.khronos.org/registry/cl/specs/opencl-2.1.pdf
      [Figure (Jay Wang, Taiwan, 2015.04): an image object consists of an image handle (hsa_ext_image_handle_t), an image descriptor (image geometry, image channel type, image channel order, image data size), and the image data (1D, 2D, or 3D images) held in the global segment. The HSA runtime creates the image object, and the HSA kernel agent accesses it with the rdimage, ldimage, and stimage instructions.]
  • 46. Summary
      • Programming model issues
          • HSA Intermediate Language (HSAIL) + HSA Runtime
          • Architected Queuing Language (AQL) + Signaling
          • Debug infrastructure
      • Communication overhead issues
          • Cache coherent shared virtual memory (CC-SVM)
          • Architected Queuing Language (AQL) for user mode queuing
          • Hardware-assisted signaling and atomic operations for synchronization
      [Figure (Jay Wang, Taiwan, 2015.04): CPUs, GPU, DSP, and other processors share unified coherent memory and are programmed through HSAIL, the HSA runtime, and AQL.]
  • 47. [Figure (Jay Wang, Taiwan, 2015.04): the HSA software and hardware stack, annotated with the designer roles involved. The user application (CPU code + HSAIL kernel code) sits on a language runtime (e.g. the OpenCL runtime), the HSA runtime, the HSA finalizers, and the HSA kernel mode driver. The HSA application (an HSA agent) on the host CPU reaches the HSA kernel agents (CPU, GPU, DSP) through HSA user mode queuing (Architected Queuing Language) + HSA signaling. Roles: Parallel Application Designer, HSA System Software Designer, HSA System Architecture Designer, HSA Kernel Agent Designer. Caption: "Mom, I'm here!"]
      References:
        • OpenCL Standards ( https://www.khronos.org/opencl/ )
        • HSA Standards ( http://www.hsafoundation.com/html/HSA_Library.htm )
            • HSA Platform System Architecture Specification v1.0
            • HSA Programmer Reference Manual Specification v1.0
            • HSA Runtime Specification v1.0
        • HSA Foundation Github ( https://github.com/HSAFoundation )
  • 48. Taiwan HSA Group @ Facebook