THE HSA SYSTEM ARCHITECTURE REQUIREMENTS – AN OVERVIEW
PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD
SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION

THE HSA PLATFORM SYSTEM ARCHITECTURE SPECIFICATION – AN OVERVIEW | NOVEMBER 12, 2013 | APU13

AGENDA
• What is the HSA Foundation?
• The System Architecture Workgroup and its goals
• What defines HSA platforms and components?
• The Shared Virtual Memory requirements
• The HSA Memory Model Requirements
• The HSA Queuing Architecture
• Some other requirements set by the System Architecture specification
• Where to find further information
• Q & A

WHAT IS THE HSA FOUNDATION?
This is the short version…
• The HSA Foundation is a not-for-profit consortium of SOC and SOC IP vendors, OEMs, academia, OSVs and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices
• It spans multiple host platform architectures and programmable data parallel components (e.g. CPU: x86, ARM, MIPS, …; device types: GPUs, DSPs, …) to work collaboratively within the same HSA system architecture
• It defines a set of specifications that define HW & SW platform requirements to enable applications to target the feature set from high level languages and APIs
• It is not a replacement for e.g. OpenCL but complementary to it, defining the system level properties “below the API”, leveraged by application and system software
• The System Architecture specification defines the required component and platform features for HSA compliant components
• This presentation is an overview of the current System Architecture definitions and does not represent a complete or “final” state
  ‒ That one is the specification itself, when available ☺
[Diagram labels: System Runtime Specification; Programmer’s Reference Manual; Platform (Software) System Architecture Specification; Conformance; Tools]

THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION
Who participates and what are the goals?
• The workgroup membership spans a wide variety of IP and platform architecture owners
  ‒ Several host platform architectures are targeted
• The specifications define a common set of platform properties that provide a dependable hardware and system foundation for application software, libraries and runtimes
• The goal is to eliminate “weak points” in the system software and hardware architecture of traditional platforms that lead to unnecessary overhead in the operation of data parallel workloads
• The main deliverables are:
  ‒ A well-defined, consistent and dependable memory model that all HSA agents operate in
  ‒ Shared access to process virtual memory between HSA agents (“ptr-is-ptr”)
  ‒ Low-latency workload dispatch contained in user-mode queues
  ‒ Scalability across a wide range of platforms
  ‒ These properties are leveraged in the “HSA Programmer’s Reference”, HSAIL and HSA Runtime specifications
WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
• In short, an HSA compatible platform consists of “HSA agents” (hardware components that participate in the HSA memory model) adhering to the various system architecture requirements
• Each HSA agent adheres to the same queuing & dispatch mechanics, low-latency synchronization primitives, memory coherence and data visibility (memory model) requirements
  ‒ Defined mainly in the “(Software) System Architecture” specification
  ‒ The HSAIL and “Programmer’s Reference Manual” specifications define the software execution model
  ‒ Architected mechanisms to enqueue and dispatch workloads from one HSA agent queue to another eliminate the need to use the host CPU for these purposes for a lot of scenarios
  ‒ Architected infrastructure allows exchanging data with non-HSA compliant components in a platform
  ‒ Fundamental data types are naturally aligned

WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
‒ There are two different machine models (“small” and “large”) that target different functionality levels
‒ This takes into account different feature requirements for different platform environments
‒ In all cases, the same HSA application programming model is used to target HSA agents and provides the same power-efficient and low-latency dispatch mechanisms, synchronization primitives and SW programming model
‒ Applications written to target HSA small model machines will generally work on large model machines, too
  ‒ If the large model platform and host Operating System provide a 32bit process environment

Properties | Small Machine Model | Large Machine Model
Platform targets | Embedded or personal device space (controllers, smartphones, etc.) | PC, workstation, cloud server, etc. running more demanding workloads
Native pointer size | 32bit | 64bit (+ 32bit ptr if 32bit processes are supported)
Floating point size | Half (FP16*), Single (FP32) precision | Half (FP16*), Single (FP32), Double (FP64) precision
Atomic ops size | 32bit | 32bit, 64bit
*min. load and store on memory
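
The two machine models can be written down as plain data to see why small model applications carry over to large model machines. The sketch below is only an illustration of the table: the `MachineModel` class, the `can_host` function and the `has_32bit_processes` flag are hypothetical names, not part of any HSA specification, and the rule that a large model application does not fit a small model machine is an assumption read off the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineModel:
    """Properties taken from the table; the class itself is hypothetical."""
    name: str
    pointer_bits: int        # native pointer size
    float_bits: frozenset    # supported floating point sizes
    atomic_bits: frozenset   # supported atomic operation sizes


SMALL = MachineModel("small", 32, frozenset({16, 32}), frozenset({32}))
LARGE = MachineModel("large", 64, frozenset({16, 32, 64}), frozenset({32, 64}))


def can_host(app, platform, has_32bit_processes=False):
    """Return True if an application built for `app` fits on `platform`."""
    # Every floating point and atomic size the application may use
    # has to be available on the platform.
    features_ok = (app.float_bits <= platform.float_bits
                   and app.atomic_bits <= platform.atomic_bits)
    # Pointer sizes must match, unless the platform and host OS also
    # offer a 32bit process environment for 32bit applications.
    pointers_ok = (app.pointer_bits == platform.pointer_bits
                   or (app.pointer_bits == 32 and has_32bit_processes))
    return features_ok and pointers_ok
```

With these definitions, `can_host(SMALL, LARGE, has_32bit_processes=True)` holds, while `can_host(SMALL, LARGE)` does not, which mirrors the condition stated on the slide.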
  

THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (1)
The basis of “ptr-is-ptr”
• Each HSA agent adheres to the same user process address space view as the host CPU
  ‒ HSA agent virtual address range matches the host platform (e.g. 48bit, 32bit, …)
• The process address view is established by the hardware’s page table mappings
  ‒ HSA agents always operate at “user privilege” of the host CPU, policy enforced by system
  ‒ HSA agents observe the same memory page table attributes (cache, read, write, …) and page sizes as the host CPU, policy enforced by system
• HSA operates in a “flat” virtual address space, using 64bit & 32bit ptrs depending on application/machine model
  ‒ A pointer value references the same memory for every HSA agent
  ‒ An HSA agent can “walk” or update linked data structures directly without any assistance from a host CPU
• HSA agents support page faults, allowing them to operate directly on pageable memory as provided by the Operating System environment
  ‒ For allocated pageable memory, System Software takes page faults, commits memory, loads contents from backup store and restarts execution like it does for any access from host CPU threads
  ‒ There is no tedious device buffer copy, explicit page lock or similar needed for an HSA agent to access data in allocated memory directly!

THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (2)
The basis of “ptr-is-ptr”
• On AMD processor-based platforms, the IOMMUv2 device provides the HSA MMU translation services via standard PCI Express™ ATS/PRI protocols to HSA compliant hardware when accessing memory from the HSA agent
  ‒ IOMMUv2 integration into the OS memory manager provides the low-level infrastructure (e.g. in the Linux® kernel)
  ‒ The HSA MMU functionality is provided in addition to IOMMU functionality used in device virtualization
  ‒ Separate translation levels are used (see block diagram)
• Implementation of shared virtual address space by other vendors on other host platforms may be different
  ‒ Different host platform architectures may use different detail mechanisms here
  ‒ The implementation detail is not relevant to the application and is dealt with in the system software (e.g. OS)
  ‒ As long as it follows the HSA Sysarch requirements, it is ok
[Block diagram “HSA MMU Data structures”, labels: HSA MMU (IOMMUv2 device); Device Table base register; Command Buffer base register; Page Req Log base register; Event Log base register; Event Counter registers; System memory; Device Table; Interrupt Remapping Table; Command Buffer; Page Service Request Log; Event Log; I/O page tables; Host translation; HSA MMU Translation Tables (per Process, PASID); Guest & host translation; Perf Counters & RAS Info (opt.); Peripheral Page Requests (PPR) Service]

THE HSA MEMORY MODEL REQUIREMENTS
What are its key properties?
• A memory model defines how writes by one work item or agent become visible to other work items and agents, rules that need to be adhered to by compilers and application threads
  ‒ It defines visibility and ordering rules of write and read events across work items, HSA agents and interactions with non-HSA components in the system
  ‒ Important to define scope for performance optimizations in the compiler, to allow reordering of code in the Finalizer
• At its base, the HSA memory model is based on a “relaxed” load acquire/store release model
  ‒ Inherently maps to many CPU and device architectures very easily
  ‒ Efficient sequential consistency mechanisms supported to fit high-level language programming models
• Cache coherency between HSA agents (& host CPU) is maintained by default
  ‒ Key feature of the HSA system & platform environment
• A consistent, full set of atomic operations is available
  ‒ Naturally aligned on size; the small machine model supports 32bit, the large machine model supports 32bit and 64bit

THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (1)
The basis of the workload dispatch on HSA
• The queue dispatch occurs through architected queue packets (“Architected Queuing Language”, AQL) that reference the work items & parameters
  ‒ Dispatch to HW occurs directly in user mode, eliminating a notable source of latency overhead in traditional architectures!
  ‒ Two architected packet types exist at the moment, dispatch and barrier packets
  ‒ Each queue is defined by several architected parameters (type, base address, size, read index, write index, …) that allow targeting the queue from other HSA agents and the host CPU
  ‒ The design allows an HSA agent on the platform to build & dispatch jobs to a queue using HSA architected interfaces
• Applications and runtime can build different queuing models on top of the infrastructure
  ‒ Single-producer, multi-producer queuing models, lock-free dispatch, … are all options SW can implement on top of the system architecture’s queue definition to fit the use model
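
The queue parameters named above (size, read index, write index) describe a ring buffer. As a rough sketch of that idea only, the following Python models a single-producer queue with monotonically increasing indices. The class `UserModeQueue`, the dictionary packet layout and the `drain` helper are hypothetical and do not reflect the actual AQL packet format or any HSA runtime interface; a real queue lives in shared memory and is consumed by hardware.

```python
class UserModeQueue:
    """Hypothetical single-producer ring buffer with architected-style fields."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size   # stands in for memory at the base address
        self.read_index = 0          # advanced by the consuming agent
        self.write_index = 0         # advanced by the producer

    def enqueue(self, packet):
        """Producer side: store a packet, then publish it via write_index."""
        if self.write_index - self.read_index == self.size:
            return False             # queue is full, the caller retries later
        self.slots[self.write_index & (self.size - 1)] = packet
        self.write_index += 1
        return True

    def dequeue(self):
        """Consumer side: take the oldest packet, or None if the queue is empty."""
        if self.read_index == self.write_index:
            return None
        packet = self.slots[self.read_index & (self.size - 1)]
        self.read_index += 1
        return packet


def drain(queue):
    """Consume packets in queue order; return the names of dispatched kernels."""
    launched = []
    while True:
        packet = queue.dequeue()
        if packet is None:
            return launched
        if packet["type"] == "dispatch":
            launched.append(packet["kernel"])
        # A "barrier" packet launches nothing; a real agent would wait here
        # until the work it depends on has completed.
```

Because the indices only ever grow and are masked on access, "full" and "empty" can be told apart without a separate flag, which is one reason this layout suits lock-free producers built on top of it.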
  

THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (2)
The basis of the workload dispatch on HSA
• The HSA System Architecture defines a user mode queue based dispatch mechanism
  ‒ Each queue is only valid within that process context and represents a virtual entity that is scheduled to hardware
  ‒ The job execution occurs at “user privilege” like the rest of the application code, enforced by system architecture
• Each HSA agent allows for multiple queues per application process
  ‒ HSA defines in-order dispatch semantics of work items within queues for efficient HW implementation
  ‒ HW may execute dispatch packets “out-of-order”, if no dependencies exist and in-order semantics are followed externally
  ‒ “Out of order” execution applies between queues, with explicit, memory based synchronization mechanisms between them as needed
• It is “cheap” to create queues in HSA, so applications can have one queue per HSA agent for each application thread, or leverage multiple HSA user queues per thread if needed
  ‒ This gives applications a lot of flexibility to structure the queue layout to match the problem instead of trying to fit the problem to work with one or a few queues only

OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
Miscellaneous mention, but nevertheless important to make it work well…
• HSA memory based signaling and synchronization primitives
  ‒ Defines memory based semantics to synchronize with work items processed by HSA agents
  ‒ e.g. 32bit or 64bit value, content update, wait on value by HSA agents and AQL packets
  ‒ Allows one-to-one and one-to-many signaling
  ‒ The signaling semantics follow atomicity requirements defined in the memory model
  ‒ Hardware-assisted, power-efficient & low-latency way to synchronize execution of work items between threads
  ‒ Runtime & application SW can use the infrastructure to build mutexes, semaphores and other synchronization primitives
• HSA Cache Coherency Domains
  ‒ Defines the scope of HSA cache coherency and how it relates to other non-HSA system resource operations
  ‒ Associated with the memory model requirements
  ‒ Architected way to interact with non-HSA platform infrastructure (e.g. graphics)
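
The signaling bullets above describe a shared value that one party updates and others wait on. The sketch below imitates that behavior in software with Python's `threading.Condition`, purely to show the "content update, wait on value" pattern and one-to-many wake-up. The `Signal` class and `run_jobs` function are hypothetical names; an actual HSA signal is hardware-assisted and is not implemented with a lock like this.

```python
import threading


class Signal:
    """Hypothetical stand-in for a memory based signal (a shared integer)."""

    def __init__(self, value=0):
        self._value = value
        self._cond = threading.Condition()

    def load(self):
        with self._cond:
            return self._value

    def add(self, delta):
        """Update the value atomically and wake every waiter (one-to-many)."""
        with self._cond:
            self._value += delta
            self._cond.notify_all()

    def wait_eq(self, expected, timeout=None):
        """Block until the value equals `expected`; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value == expected, timeout)


def run_jobs(job_count):
    """Run `job_count` worker threads and wait for a shared completion signal."""
    done = Signal(job_count)     # counts outstanding jobs
    results = []
    results_lock = threading.Lock()

    def agent(job_id):
        with results_lock:
            results.append(job_id * job_id)
        done.add(-1)             # the content update that waiters observe

    threads = [threading.Thread(target=agent, args=(i,)) for i in range(job_count)]
    for thread in threads:
        thread.start()
    completed = done.wait_eq(0, timeout=5.0)
    for thread in threads:
        thread.join()
    return completed, sorted(results)
```

A countdown signal like `done` is already a simple semaphore-style primitive, which hints at how a runtime could layer mutexes or semaphores on the same mechanism.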
  

OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
Miscellaneous mention, but nevertheless important
• HSA system timestamp requirements
  ‒ Defines a low-overhead mechanism to “determine the passing of time” on an HSA platform
  ‒ The value can be queried by HSAIL or HSA runtime
  ‒ Represented by a 64bit timestamp value that does not roll over and is incremented at a constant rate in HW
  ‒ Applications and tools are able to build a consistent timeline across all HSA agents
• HSA Topology requirements
  ‒ Defines system topology and properties of HSA agents discoverable on an HSA platform by an application to take advantage of platform properties
  ‒ Examples are number of compute units, max. work item dimensions, work group size, work item size, queue properties, …
  ‒ APIs like OpenCL™ and others can leverage HSA system topology data to discover memory layout, compute unit properties and other properties and consistently report the system topology for applications to leverage
[Diagram “HSA Platform - Simple”, labels: HSA APU; CPU (cores); GPU (H-CUs); HSA MMU; Mem; System Memory]
[Diagram “HSA Platform, Add-In GPU (optional)”, labels: HSA APU; CPU (cores); GPU (H-CUs); HSA MMU; Mem; System Memory; System Firmware; IOBUS; HSA GPU (H-CUs); Device Local Memory; Firmware; Mem]

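
A 64bit counter that ticks at a constant rate lets software convert raw tick values from different agents into one shared timeline. The following sketch shows that arithmetic under stated assumptions: the function names, the event layout and the example frequency are hypothetical, and the slide does not say what the actual tick rate is.

```python
def merge_timeline(events_by_agent, frequency_hz):
    """Merge per-agent (tick, label) events into one timeline in seconds.

    Times are relative to the earliest recorded tick. This works because all
    agents are assumed to read the same constant-rate system timestamp.
    """
    start = min(tick for events in events_by_agent.values() for tick, _ in events)
    timeline = [
        ((tick - start) / frequency_hz, agent, label)
        for agent, events in events_by_agent.items()
        for tick, label in events
    ]
    return sorted(timeline)


def years_until_wrap(frequency_hz, bits=64):
    """Estimate how long a counter of `bits` width runs before wrapping."""
    seconds = (1 << bits) / frequency_hz
    return seconds / (365.25 * 24 * 3600)
```

For example, at an assumed rate of 1 GHz a 64bit counter would take several hundred years to wrap, which is what makes "does not roll over" a practical guarantee rather than a mathematical one.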
WHERE TO FIND FURTHER INFORMATION ON SYSTEM ARCHITECTURE?
• HSA Foundation Website: http://www.hsafoundation.com
  ‒ The main location for specs, developer info, tools, publications and many things more
  ‒ HSA Programmer’s Reference Manual v 0.95 has been published
  ‒ HSA Platform Software Systems Architecture Specification is quickly nearing the 0.95 state
  ‒ Will be published after ratification by the HSA Foundation Board of Directors
  ‒ Stay tuned

ANY QUESTIONS?
Of course there are, so go ahead ☺


HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Blinzer

  • 1. THE HSA SYSTEM ARCHITECTURE REQUIREMENTS – AN OVERVIEW
      PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD
      SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION
      THE HSA PLATFORM SYSTEM ARCHITECTURE SPECIFICATION – AN OVERVIEW | NOVEMBER 12, 2013 | APU13
  • 2. AGENDA
      - What is the HSA Foundation?
      - The System Architecture Workgroup and its goals
      - What defines HSA platforms and components?
      - The Shared Virtual Memory requirements
      - The HSA Memory Model Requirements
      - The HSA Queuing Architecture
      - Some other requirements set by the System Architecture specification
      - Where to find further information
      - Q & A
  • 3. WHAT IS THE HSA FOUNDATION?
      This is the short version…
      - The HSA Foundation is a not-for-profit consortium of SOC and SOC IP vendors, OEMs, academia, OSVs and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices
        - It spans multiple host platform architectures and programmable data parallel components (e.g. CPU: x86, ARM, MIPS, … device types: GPUs, DSPs, …) to work collaboratively within the same HSA system architecture
        - It defines a set of specifications that define HW & SW platform requirements to enable applications to target the feature set from high level languages and APIs
        - It's not a replacement to e.g. OpenCL but complementary to it, defining the system level properties "below the API", leveraged by application- and system software
        - The System Architecture specification defines the required component and platform features for HSA compliant components
      - This presentation is an overview of the current System Architecture definitions and does not represent a complete or "final" state
        - that one is the specification itself when available ☺
      [Diagram labels: Conformance, Tools, System Runtime Specification, Programmer's Reference Manual, Platform (Software) System Architecture Specification]
  • 4. THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION
      Who participates and what are the goals?
      - The workgroup membership spans a wide variety of IP and platform architecture owners
        - Several host platform architectures are targeted
      - The specifications define a common set of platform properties that provide a dependable hardware and system foundation for application software, libraries and runtimes
      - The goal is to eliminate "weak points" in the system software- and hardware architecture of traditional platforms that lead to unnecessary overhead in the operations of data parallel workloads
      - The main deliverables are:
        - Well-defined, consistent and dependable memory model all HSA agents operate in
        - Share access to process virtual memory between HSA agents ("ptr-is-ptr")
        - Low-latency workload dispatch contained in user-mode queues
        - Scalability across a wide range of platforms
        - These properties are leveraged in the "HSA Programmer's Reference", HSAIL and HSA Runtime specifications
  • 5. WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
      - In short, an HSA compatible platform consists of "HSA agents" (hardware components that participate in the HSA memory model) adhering to the various system architecture requirements
      - Each HSA agent adheres to the same queuing & dispatch mechanics, low-latency synchronization primitives, memory coherence and data visibility (memory model) requirements
        - Defined mainly in the "(Software) System Architecture" specification
        - The HSAIL and "Programmer's Reference Manual" specifications define the software execution model
        - Architected mechanisms to enqueue and dispatch workloads from one HSA agent queue to another eliminate the need to use the host CPU for these purposes for a lot of scenarios
        - Architected infrastructure allows exchanging data with non-HSA compliant components in a platform
        - Fundamental data types are naturally aligned
  • 6. WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
      - There are two different machine models ("small" and "large") that target different functionality levels
        - It takes into account different feature requirements for different platform environments
      - In all cases, the same HSA application programming model is used to target HSA agents and provides the same power-efficient and low-latency dispatch mechanisms, synchronization primitives and SW programming model
      - Applications written to target HSA small model machines will generally work on large model machines, too
        - If the large model platform and host Operating System provides a 32bit process environment

      Properties | Small Machine Model | Large Machine Model
      Platform targets | embedded or personal device space (controllers, smartphones, etc.) | PC, workstation, cloud server, etc. running more demanding workloads
      Native pointer size | 32bit | 64bit (+ 32bit ptr if 32bit processes are supported)
      Floating point size | Half (FP16*), Single (FP32) precision | Half (FP16*), Single (FP32), Double (FP64) precision
      Atomic ops size | 32bit | 32bit, 64bit
      *min. Load and store on memory
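The machine-model table can be read as a small compatibility rule. The sketch below is only an illustration of that reading: the names `MachineModel`, `SMALL`, `LARGE` and `can_run` are hypothetical and not part of any HSA specification, and the rule that a large-model application does not run on a small-model platform is an assumption inferred from the table (no 64bit pointers or FP64 there), not something the slide states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineModel:
    """Hypothetical record mirroring one column of the slide's table."""
    name: str
    pointer_bits: tuple   # native pointer sizes
    float_bits: tuple     # supported floating point widths
    atomic_bits: tuple    # supported atomic operation widths


SMALL = MachineModel("small", (32,), (16, 32), (32,))
LARGE = MachineModel("large", (64,), (16, 32, 64), (32, 64))


def can_run(app_model, platform_model, platform_has_32bit_processes=False):
    """Return True if an application built for app_model can run on platform_model.

    Small-model applications run on a large-model platform only when the
    platform and host OS provide a 32bit process environment (per the slide).
    The large-on-small case returning False is an assumption of this sketch.
    """
    if app_model == platform_model:
        return True
    if app_model == SMALL and platform_model == LARGE:
        return platform_has_32bit_processes
    return False
```

For example, `can_run(SMALL, LARGE, platform_has_32bit_processes=True)` is true, while the same call without a 32bit process environment is false.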
  • 7. THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (1)
      ‒ The basis of "ptr-is-ptr"
      ‒ Each HSA agent adheres to the same user process address space view as the host CPU
      ‒ The process address view is established by the hardware's page table mappings
          ‒ HSA agent virtual address range matches the host platform (e.g. 48bit, 32bit, …)
          ‒ HSA agents always operate at "user privilege" of the host CPU, policy enforced by system
          ‒ HSA agents observe the same memory page table attributes (cache, read, write, …) and page sizes of the host CPU, policy enforced by system
      ‒ HSA operates in a "flat" virtual address space, using 64bit & 32bit ptrs depending on application/machine model
          ‒ A pointer value references the same memory for every HSA agent
          ‒ An HSA agent can "walk" or update linked data structures directly without any assistance from a host CPU
      ‒ HSA agents support page faults, allowing them to operate directly on pageable memory as provided by the Operating System environment
          ‒ For allocated pageable memory, System Software takes page faults, commits memory, loads contents from backup store and restarts execution like it does for any access from host CPU threads
          ‒ There is no tedious device buffer copy, explicit page lock or similar needed for an HSA agent to access data in allocated memory directly!
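The "ptr-is-ptr" idea on this slide can be sketched in plain C++. This is only an analogy under stated assumptions: `Node`, `agent_double_values`, `sum_list` and `demo` are hypothetical names, and an ordinary function stands in for code that would run on an HSA agent. The point it tries to show is that the agent side receives the host's raw pointer and follows `next` links in place, with no staging copy or address translation in application code.

```cpp
#include <cassert>

// Hypothetical node type: an ordinary pointer-linked structure in process memory.
struct Node {
    int value;
    Node* next;
};

// Stand-in for work executed by an HSA agent (assumption: in a real system this
// would be a kernel). It walks the host's pointers as-is and updates in place.
void agent_double_values(Node* head) {
    for (Node* n = head; n != nullptr; n = n->next) {
        n->value *= 2;
    }
}

// Host-side helper that reads the same list through the same pointers.
int sum_list(const Node* head) {
    int sum = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        sum += n->value;
    }
    return sum;
}

// Host-side usage: build a list, let the "agent" update it, read the result back.
void demo() {
    Node c{3, nullptr};
    Node b{2, &c};
    Node a{1, &b};
    agent_double_values(&a);     // only the head pointer is handed over
    assert(sum_list(&a) == 12);  // (1 + 2 + 3) * 2
}
```

On a platform without a shared virtual address space, the same task would typically need the list flattened into a device buffer and the links rewritten as offsets, which is the step the slide says HSA removes.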
  • 8. THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (2)
      ‒ The basis of "ptr-is-ptr"
      ‒ On AMD processor-based platforms, the IOMMUv2 device provides the HSAMMU translation services via standard PCI Express™ ATS/PRI protocols to HSA compliant hardware when accessing memory from the HSA agent
          ‒ IOMMUv2 integration into OS memory manager provides the low-level infrastructure (e.g. in Linux® kernel)
          ‒ Different host platform architectures may use different detail mechanisms here
      ‒ The HSAMMU functionality is provided in addition to IOMMU functionality used in device virtualization
          ‒ Separate translation levels are used (see block diagram)
      ‒ Implementation of shared virtual address space by other vendors on other host platforms may be different
          ‒ As long as it follows the HSA Sysarch requirements, it is ok
      ‒ The implementation detail is not relevant to the application and is dealt with in the system software (e.g. OS)
      [Block diagram: HSA MMU (IOMMUv2 device) with Device Table base register, Command Buffer base register, Page Req Log base register, Event Log base register and Event Counter registers, providing guest & host translation, host translation, Peripheral Page Requests (PPR) service and optional Perf Counters & RAS Info. HSA MMU data structures in system memory: Device Table, Interrupt Remapping Table, I/O page tables, HSA MMU Translation Tables (per Process, PASID), Command Buffer, Page Service Request Log, Event Log]
  • 9. THE HSA MEMORY MODEL REQUIREMENTS
      ‒ What are its key properties?
      ‒ A memory model defines how writes by one work item or agent become visible to other work items and agents, rules that need to be adhered to by compilers and application threads
          ‒ It defines visibility and ordering rules of write and read events across work items, HSA agents and interactions with non-HSA components in the system
          ‒ Important to define scope for performance optimizations in the compiler, to allow reordering of code in the Finalizer
      ‒ At its base, the HSA memory model is based on a "relaxed" load acquire/store release model
          ‒ Inherently maps to many CPU and device architectures very easily
          ‒ Efficient sequential consistency mechanisms supported to fit high-level language programming models
      ‒ A consistent, full set of atomic operations is available
          ‒ Naturally aligned on size, small machine model supports 32bit, large machine model supports 32bit and 64bit
      ‒ Cache coherency between HSA agents (& host CPU) is maintained by default
          ‒ Key feature of the HSA system & platform environment
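The load acquire/store release idea mentioned on this slide can be illustrated with standard C++ atomics, which use a closely related vocabulary. This is a sketch by analogy, not HSAIL and not the HSA specification's own wording: `Mailbox`, `produce`, `consume` and `run_message_passing` are hypothetical names, and two host threads stand in for two agents.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical mailbox: one plain data word plus an atomic "ready" flag.
struct Mailbox {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // synchronization variable
};

// Writer side: the release store makes the earlier plain write visible to any
// reader that later observes ready == true with an acquire load.
void produce(Mailbox& m, int value) {
    m.payload = value;
    m.ready.store(true, std::memory_order_release);
}

// Reader side: spin with acquire loads; once the flag is seen, reading the
// payload is ordered after the writer's store to it.
int consume(Mailbox& m) {
    while (!m.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return m.payload;
}

// Runs one writer and one reader thread and returns what the reader observed.
int run_message_passing(int value) {
    Mailbox m;
    int seen = -1;
    std::thread reader([&m, &seen] { seen = consume(m); });
    std::thread writer([&m, value] { produce(m, value); });
    writer.join();
    reader.join();
    assert(m.ready.load());
    return seen;
}
```

Without the release/acquire pair (for example with relaxed ordering on both sides), the compiler or hardware would be free to reorder the payload write past the flag write, which is the kind of reordering freedom the slide attributes to the "relaxed" base model.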
  • 10. THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (1)
      ‒ The basis of the workload dispatch on HSA
      ‒ The queue dispatch occurs through architected queue packets ("Architected Queuing Language", AQL) that reference the work items & parameters
          ‒ Dispatch to HW occurs directly in user mode, eliminating a notable source of latency overhead in traditional architectures!
          ‒ Two architected packet types exist at the moment, dispatch and barrier packets
          ‒ Each queue is defined by several architected parameters (type, base address, size, read index, write index, …) that allow targeting the queue from other HSA agents and the host CPU
          ‒ The design allows an HSA agent on the platform to build & dispatch jobs to a queue using HSA architected interfaces
      ‒ Applications and runtime can build different queuing models on top of the infrastructure
          ‒ Single-producer, multi-producer queuing models, lock-free dispatch, … are all options SW can implement on top of the system architecture's queue definition to fit the use model
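The queue parameters listed on this slide (base address, size, read index, write index) suggest a ring buffer of fixed-size packets. The following is a minimal single-producer, single-consumer sketch under that assumption. `Packet`, `PacketType`, `UserQueue` and all field names are hypothetical and do not reproduce the real AQL packet layout; a real packet processor would be hardware, not a `consume` method.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packet kinds; the slide names dispatch and barrier packets.
enum class PacketType : std::uint32_t { Invalid, Dispatch, Barrier };

// Hypothetical packet: a type word published last, plus two payload words.
struct Packet {
    std::atomic<PacketType> type{PacketType::Invalid};
    std::uint64_t kernel = 0;  // stand-in for "what to run"
    std::uint64_t args = 0;    // stand-in for "with which parameters"
};

class UserQueue {
public:
    // size must be a power of two so an index can be wrapped with a mask.
    explicit UserQueue(std::size_t size) : slots_(size), mask_(size - 1) {
        assert(size != 0 && (size & (size - 1)) == 0);
    }

    // Producer: fill a free slot, then publish it by writing the packet type.
    bool enqueue(std::uint64_t kernel, std::uint64_t args) {
        const std::uint64_t w = write_index_.load(std::memory_order_relaxed);
        const std::uint64_t r = read_index_.load(std::memory_order_acquire);
        if (w - r == slots_.size()) return false;  // queue is full
        Packet& p = slots_[w & mask_];
        p.kernel = kernel;
        p.args = args;
        p.type.store(PacketType::Dispatch, std::memory_order_release);
        write_index_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Consumer (stand-in for the packet processor): take packets in order.
    bool consume(std::uint64_t& kernel, std::uint64_t& args) {
        const std::uint64_t r = read_index_.load(std::memory_order_relaxed);
        Packet& p = slots_[r & mask_];
        if (p.type.load(std::memory_order_acquire) != PacketType::Dispatch) {
            return false;  // nothing published in the next slot
        }
        kernel = p.kernel;
        args = p.args;
        p.type.store(PacketType::Invalid, std::memory_order_relaxed);
        read_index_.store(r + 1, std::memory_order_release);  // frees the slot
        return true;
    }

private:
    std::vector<Packet> slots_;
    std::size_t mask_;
    std::atomic<std::uint64_t> write_index_{0};  // only ever increases
    std::atomic<std::uint64_t> read_index_{0};   // only ever increases
};
```

A multi-producer variant would typically reserve a slot with an atomic increment of the write index before filling the packet; that is one of the queuing models the slide leaves to software.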
  • 11. THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (2)
      ‒ The basis of the workload dispatch on HSA
      ‒ The HSA System Architecture defines a user mode queue based dispatch mechanism
          ‒ Each queue is only valid within that process context and represents a virtual entity that is scheduled to hardware
          ‒ The job execution occurs at "user privilege" like the rest of the application code, enforced by system architecture
      ‒ Each HSA agent allows for multiple queues per application process
          ‒ HSA defines in-order dispatch semantics of work items within queues for efficient HW implementation
          ‒ HW may execute dispatch packets "out-of-order", if no dependencies exist and in-order semantics are followed externally
          ‒ "Out of order" execution applies between queues, with explicit, memory based synchronization mechanisms between them as needed
      ‒ It is "cheap" to create queues in HSA, so applications can have one queue per HSA agent for each application thread, or leverage multiple HSA user queues per thread if needed
          ‒ This gives applications a lot of flexibility to structure the queue layout to match the problem instead of trying to fit the problem to work with one or a few queues only
  • 12. OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
      ‒ Miscellaneous mention, but nevertheless important to make it work well…
      ‒ HSA memory based signaling and synchronization primitives
          ‒ Defines memory based semantics to synchronize with work items processed by HSA agents
              ‒ e.g. 32bit or 64bit value, content update, wait on value by HSA agents and AQL packets
          ‒ Hardware-assisted, power-efficient & low-latency way to synchronize execution of work items between threads
          ‒ Allows one-to-one and one-to-many signaling
          ‒ The signaling semantics follow atomicity requirements defined in the memory model
          ‒ Runtime & application SW can use infrastructure to build mutexes, semaphores, other synchronization primitives
      ‒ HSA Cache Coherency Domains
          ‒ Defines the scope of HSA cache coherency and how it relates to other non-HSA system resource operations
          ‒ Associated with the memory model requirements
          ‒ Architected way to interact with non-HSA platform infrastructure (e.g. graphics)
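The memory based signaling described on this slide (a 32bit or 64bit value that is updated and waited on) can be approximated with a single atomic word. The sketch below is an assumption-heavy illustration: `Signal`, `subtract` and `wait_until_equal` are hypothetical names, not HSA runtime calls, and a software spin loop stands in for the hardware-assisted wait the slide mentions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical completion signal: a 64bit value in ordinary shared memory.
class Signal {
public:
    explicit Signal(std::int64_t initial) : value_(initial) {
        assert(initial >= 0);  // this sketch only counts down to zero
    }

    // Called by whoever finishes a piece of work; returns the new value.
    // The release half of the ordering publishes the work's results.
    std::int64_t subtract(std::int64_t n) {
        return value_.fetch_sub(n, std::memory_order_acq_rel) - n;
    }

    // Called by any number of waiters (one-to-many): returns once the value
    // equals `expected`. The acquire load pairs with subtract() above.
    void wait_until_equal(std::int64_t expected) const {
        while (value_.load(std::memory_order_acquire) != expected) {
            std::this_thread::yield();  // software stand-in for a HW-assisted wait
        }
    }

    std::int64_t load() const {
        return value_.load(std::memory_order_acquire);
    }

private:
    std::atomic<std::int64_t> value_;
};
```

A typical pattern would initialize the signal to the number of outstanding jobs, have each job call `subtract(1)` on completion, and let one or more waiters call `wait_until_equal(0)`. Mutexes and semaphores, as the slide notes, can be layered on the same primitive.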
  • 13. OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
      ‒ Miscellaneous mention, but nevertheless important
      ‒ HSA system timestamp requirements
          ‒ Defines a low-overhead mechanism to "determine the passing of time" on an HSA platform
          ‒ Represented by a 64bit timestamp value that does not roll over and is incremented at a constant rate in HW
          ‒ The value can be queried by HSAIL or HSA runtime
          ‒ Applications and tools are able to build a consistent timeline across all HSA agents
      ‒ HSA Topology requirements
          ‒ Defines system topology and properties of HSA agents discoverable on an HSA platform by an application to take advantage of platform properties
          ‒ Examples are # of compute units, max. work item dimensions, work group size, work item size, queue properties, …
          ‒ APIs like OpenCL™ and others can leverage HSA system topology data to discover memory layout, compute unit properties and other properties and consistently report the system topology for applications to leverage
      [Diagram "HSA Platform - Simple": HSA APU with CPU cores, GPU H-CUs, HSA MMU and memory, attached to System Memory]
      [Diagram "HSA Platform, Add-In GPU (optional)": HSA APU (CPU cores, GPU H-CUs, HSA MMU, System Memory, System Firmware) connected over IOBUS to an HSA GPU with H-CUs, Device Local Memory and firmware memory]
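Because the slide describes the timestamp as a 64bit counter incremented at a constant rate, turning two readings into elapsed time is simple arithmetic. The helper below is a hedged sketch: `ticks_to_ns` and `elapsed_ns` are hypothetical names, and how the counter value and its frequency are obtained is left out because the slide only says they can be queried by HSAIL or the HSA runtime.

```cpp
#include <cassert>
#include <cstdint>

// Converts a tick count to nanoseconds for a counter running at `hz` ticks per
// second. Whole seconds and the remainder are converted separately so the
// multiplication cannot overflow for long intervals.
// Assumption: hz is below roughly 18 GHz, so remainder * 1e9 fits in 64 bits.
std::uint64_t ticks_to_ns(std::uint64_t ticks, std::uint64_t hz) {
    assert(hz != 0);
    const std::uint64_t ns_per_s = 1000000000ULL;
    const std::uint64_t whole_seconds = ticks / hz;
    const std::uint64_t remainder = ticks % hz;
    return whole_seconds * ns_per_s + (remainder * ns_per_s) / hz;
}

// Elapsed time between two readings of the same counter. Because the counter
// does not roll over, a later reading is never smaller than an earlier one.
std::uint64_t elapsed_ns(std::uint64_t start, std::uint64_t end, std::uint64_t hz) {
    assert(end >= start);
    return ticks_to_ns(end - start, hz);
}
```

For example, with an assumed 100 MHz counter, readings of 1000 and 1150 are 150 ticks apart, which this helper reports as 1500 ns. Since every agent observes the same counter, such intervals can be placed on one timeline.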
  • 14. WHERE TO FIND FURTHER INFORMATION ON SYSTEM ARCHITECTURE?
      ‒ HSA Foundation Website: http://www.hsafoundation.com
          ‒ The main location for specs, developer info, tools, publications and many things more
          ‒ HSA Programmer's Reference Manual v 0.95 has been published
          ‒ HSA Platform Software Systems Architecture Specification is quickly nearing the 0.95 state
              ‒ Will be published after ratification by the HSA Foundation Board of Directors
          ‒ Stay tuned
  • 15. ANY QUESTIONS?
      ‒ Of course there are, so go ahead ☺