HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Blinzer
1. THE HSA SYSTEM ARCHITECTURE REQUIREMENTS – AN OVERVIEW
PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD
SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION
THE HSA PLATFORM SYSTEM ARCHITECTURE SPECIFICATION – AN OVERVIEW | NOVEMBER 12, 2013 | APU13
2. AGENDA
‒ What is the HSA Foundation?
‒ The System Architecture Workgroup and its goals
‒ What defines HSA platforms and components?
‒ The Shared Virtual Memory requirements
‒ The HSA Memory Model requirements
‒ The HSA Queuing Architecture
‒ Some other requirements set by the System Architecture specification
‒ Where to find further information
‒ Q & A
3. WHAT IS THE HSA FOUNDATION?
This is the short version…
The HSA Foundation is a not-for-profit consortium of SOC and SOC IP vendors, OEMs, academia, OSVs and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices
‒ It spans multiple host platform architectures and programmable data parallel components (e.g. CPU: x86, ARM, MIPS, …; device types: GPUs, DSPs, …) that work collaboratively within the same HSA system architecture
‒ It defines a set of specifications that define HW & SW platform requirements to enable applications to target the feature set from high level languages and APIs
‒ It's not a replacement for e.g. OpenCL but complementary to it, defining the system level properties "below the API", leveraged by application and system software
The System Architecture specification defines the required component and platform features for HSA compliant components
This presentation is an overview of the current System Architecture definitions and does not represent a complete or "final" state
‒ That one is the specification itself when available ☺
[Figure: the HSA specification set. Labels: System Runtime Specification, Programmer's Reference Manual, Platform (Software) System Architecture Specification, Conformance Tools]
4. THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION
Who participates and what are the goals?
The workgroup membership spans a wide variety of IP and platform architecture owners
‒ Several host platform architectures are targeted
The specifications define a common set of platform properties that provide a dependable hardware and system foundation for application software, libraries and runtimes
The goal is to eliminate "weak points" in the system software and hardware architecture of traditional platforms that lead to unnecessary overhead in the operation of data parallel workloads
The main deliverables are:
‒ A well-defined, consistent and dependable memory model that all HSA agents operate in
‒ Shared access to process virtual memory between HSA agents ("ptr-is-ptr")
‒ Low-latency workload dispatch contained in user-mode queues
‒ Scalability across a wide range of platforms
These properties are leveraged in the "HSA Programmer's Reference", HSAIL and HSA Runtime specifications
5. WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
In short, an HSA compatible platform consists of "HSA agents" (hardware components that participate in the HSA memory model) adhering to the various system architecture requirements
Each HSA agent adheres to the same queuing & dispatch mechanics, low-latency synchronization primitives, memory coherence and data visibility (memory model) requirements
‒ Defined mainly in the "(Software) System Architecture" specification
‒ The HSAIL and "Programmer's Reference Manual" specifications define the software execution model
‒ Architected mechanisms to enqueue and dispatch workloads from one HSA agent queue to another eliminate the need to use the host CPU for these purposes in many scenarios
‒ Architected infrastructure allows exchanging data with non-HSA compliant components in a platform
‒ Fundamental data types are naturally aligned
|
THE
HSA
PLATFORM
SYSTEM
ARCHITECTURE
SPECIFICATION
–
AN
OVERVIEW
|
NOVEMBER
12,
2013
|
APU13
6. WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
‒ There are two different machine models ("small" and "large") that target different functionality levels
‒ They take into account different feature requirements for different platform environments
‒ In all cases, the same HSA application programming model is used to target HSA agents and provides the same power-efficient and low-latency dispatch mechanisms, synchronization primitives and SW programming model
‒ Applications written to target HSA small model machines will generally work on large model machines, too
‒ If the large model platform and host Operating System provide a 32bit process environment

Properties | Small Machine Model | Large Machine Model
Platform targets | embedded or personal device space (controllers, smartphones, etc.) | PC, workstation, cloud server, etc. running more demanding workloads
Native pointer size | 32bit | 64bit (+ 32bit ptr if 32bit processes are supported)
Floating point size | Half (FP16*), Single (FP32) precision | Half (FP16*), Single (FP32), Double (FP64) precision
Atomic ops size | 32bit | 32bit, 64bit
*min. load and store on memory
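The table above can be sketched as a small data structure. This is only an illustration of the two machine models as listed on the slide; the names `MachineModel`, `SMALL`, `LARGE` and `can_run` are hypothetical and do not come from any HSA specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineModel:
    """Properties of one HSA machine model, as listed in the slide's table."""
    name: str
    pointer_bits: tuple   # native pointer sizes
    float_bits: tuple     # supported floating point sizes
    atomic_bits: tuple    # supported atomic operation sizes


SMALL = MachineModel("small", (32,), (16, 32), (32,))
# The 32bit pointer of the large model only applies if 32bit processes are supported.
LARGE = MachineModel("large", (64, 32), (16, 32, 64), (32, 64))


def can_run(app_model, platform_model, has_32bit_process_env=False):
    """Assumed reading of the slide: a small model application works on a
    large model machine if a 32bit process environment is provided."""
    if app_model == platform_model:
        return True
    if app_model == SMALL and platform_model == LARGE:
        return has_32bit_process_env
    return False
```

For example, `can_run(SMALL, LARGE, has_32bit_process_env=True)` is true, while a large model application is not assumed to run on a small model machine.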
7. THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (1)
The basis of "ptr-is-ptr"
Each HSA agent adheres to the same user process address space view as the host CPU
‒ HSA agent virtual address range matches the host platform (e.g. 48bit, 32bit, …)
The process address view is established by the hardware's page table mappings
‒ HSA agents always operate at "user privilege" of the host CPU, policy enforced by system
‒ HSA agents observe the same memory page table attributes (cache, read, write, …) and page sizes of the host CPU, policy enforced by system
‒ HSA agents support page faults, allowing them to operate directly on pageable memory as provided by the Operating System environment
  ‒ For allocated pageable memory, System Software takes page faults, commits memory, loads contents from backup store and restarts execution like it does for any access from host CPU threads
  ‒ There is no tedious device buffer copy, explicit page lock or similar needed for an HSA agent to access data in allocated memory directly!
HSA operates in a "flat" virtual address space, using 64bit & 32bit ptrs depending on application/machine model
‒ A pointer value references the same memory for every HSA agent
‒ An HSA agent can "walk" or update linked data structures directly without any assistance from a host CPU
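A toy illustration of "ptr-is-ptr", assuming a single flat address space modeled as a Python dictionary: the host builds a linked list and an agent walks it using the very same pointer values, with no copy or address translation step in between. All names here (`FlatAddressSpace`, `host_build_list`, `agent_sum_list`) are hypothetical.

```python
class FlatAddressSpace:
    """Toy flat virtual address space shared by the host and all agents."""

    def __init__(self):
        self.memory = {}        # address -> (value, next pointer)
        self.next_free = 0x1000

    def alloc_node(self, value, next_ptr=0):
        addr = self.next_free
        self.next_free += 16    # arbitrary node size for the sketch
        self.memory[addr] = (value, next_ptr)
        return addr

    def load_node(self, addr):
        return self.memory[addr]


def host_build_list(space, values):
    """Host CPU side: build a linked list, return the head pointer (0 = null)."""
    head = 0
    for value in reversed(values):
        head = space.alloc_node(value, head)
    return head


def agent_sum_list(space, head):
    """Agent side: walk the list through the same pointers the host wrote."""
    total = 0
    ptr = head
    while ptr != 0:
        value, ptr = space.load_node(ptr)
        total += value
    return total
```

The point of the sketch is that `head` is handed to the agent unchanged; in a non-shared address space the list would have to be flattened into a device buffer first.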
8. THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (2)
The basis of "ptr-is-ptr"
On AMD processor-based platforms, the IOMMUv2 device provides the HSA MMU translation services via standard PCI Express™ ATS/PRI protocols to HSA compliant hardware when accessing memory from the HSA agent
‒ IOMMUv2 integration into the OS memory manager provides the low-level infrastructure (e.g. in the Linux® kernel)
‒ The HSA MMU functionality is provided in addition to the IOMMU functionality used in device virtualization
  ‒ Separate translation levels are used (see block diagram)
Different host platform architectures may use different detail mechanisms here
‒ The implementation detail is not relevant to the application and is dealt with in the system software (e.g. OS)
‒ As long as it follows the HSA Sysarch requirements, it is ok
Implementation of shared virtual address space by other vendors on other host platforms may be different
[Figure: HSA MMU data structures. HSA MMU (IOMMUv2 device) register labels: Device Table base register, Command Buffer base register, Event Log base register, Page Req Log base register, Event Counter registers. System memory labels: Device Table, Command Buffer, Event Log, Page Service Request Log, I/O page tables, HSA MMU Translation Tables (per Process, PASID), Interrupt Remapping Table, Host translation, Guest & host translation, Perf Counters & RAS Info (opt.), Peripheral Page Requests (PPR) Service]
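As a rough mental model of the per-process (PASID) translation tables and the page request service named in the diagram, the sketch below keeps one page table per PASID and resolves a missing mapping by logging a page request that "system software" then services. This is an assumption-laden toy: `ToyHsaMmu`, its methods and the frame allocation policy are hypothetical, and it does not model the IOMMUv2 or the ATS/PRI protocols.

```python
PAGE_SIZE = 4096


class ToyHsaMmu:
    """Toy translation unit: one page table per process address space ID."""

    def __init__(self):
        self.tables = {}          # pasid -> {virtual page number: physical frame}
        self.page_requests = []   # pending (pasid, virtual page number) requests
        self.next_frame = 0

    def translate(self, pasid, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        table = self.tables.setdefault(pasid, {})
        if vpn not in table:
            # Page fault: log a request and let system software service it.
            self.page_requests.append((pasid, vpn))
            self.service_page_requests()
        return table[vpn] * PAGE_SIZE + offset

    def service_page_requests(self):
        """Stand-in for the OS: commit a frame for every logged request."""
        while self.page_requests:
            pasid, vpn = self.page_requests.pop(0)
            self.tables[pasid][vpn] = self.next_frame
            self.next_frame += 1
```

The same virtual address translates independently for two different PASIDs, which mirrors the idea that each process has its own address space view.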
9. THE HSA MEMORY MODEL REQUIREMENTS
What are its key properties?
A memory model defines how writes by one work item or agent become visible to other work items and agents, rules that need to be adhered to by compilers and application threads
‒ It defines visibility and ordering rules of write and read events across work items, HSA agents and interactions with non-HSA components in the system
‒ Important to define scope for performance optimizations in the compiler, to allow reordering of code in the Finalizer
At its base, the HSA memory model is based on a "relaxed" load acquire/store release model
‒ Inherently maps to many CPU and device architectures very easily
‒ Efficient sequential consistency mechanisms supported to fit high-level language programming models
Cache Coherency between HSA agents (& host CPU) is maintained by default
‒ Key feature of the HSA system & platform environment
A consistent, full set of atomic operations is available
‒ Naturally aligned on size, small machine model supports 32bit, large machine model supports 32bit and 64bit
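The load acquire/store release idea can be illustrated with a deliberately simplified visibility model, assuming that an agent's ordinary stores stay private until that agent performs a store release. This is a teaching toy, not the formal HSA memory model; `SharedMemory`, `ToyAgent` and the method names are hypothetical.

```python
class SharedMemory:
    """Memory that all toy agents can observe."""

    def __init__(self):
        self.cells = {}


class ToyAgent:
    """Agent whose relaxed stores are buffered until a store release."""

    def __init__(self, shared):
        self.shared = shared
        self.pending = {}   # relaxed stores not yet visible to other agents

    def store_relaxed(self, addr, value):
        self.pending[addr] = value

    def store_release(self, addr, value):
        # All earlier stores become visible before the releasing store does.
        self.shared.cells.update(self.pending)
        self.pending.clear()
        self.shared.cells[addr] = value

    def load_acquire(self, addr):
        return self.shared.cells.get(addr, 0)

    def load_relaxed(self, addr):
        if addr in self.pending:
            return self.pending[addr]
        return self.shared.cells.get(addr, 0)


shared = SharedMemory()
producer, consumer = ToyAgent(shared), ToyAgent(shared)
DATA, FLAG = 0x100, 0x200

producer.store_relaxed(DATA, 42)     # payload, not yet published in this toy
producer.store_release(FLAG, 1)      # publishes the payload, then the flag
if consumer.load_acquire(FLAG) == 1:
    payload = consumer.load_relaxed(DATA)
```

Once the consumer observes the flag through the acquire load, the payload written before the release is guaranteed to be visible to it, which is the pattern the slide's "relaxed" model supports.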
10. THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (1)
The basis of the workload dispatch on HSA
The queue dispatch occurs through architected queue packets ("Architected Queuing Language", AQL) that reference the work items & parameters
‒ Dispatch to HW occurs directly in user mode, eliminating a notable source of latency overhead in traditional architectures!
‒ Two architected packet types exist at the moment, dispatch and barrier packets
‒ Each queue is defined by several architected parameters (type, base address, size, read index, write index, …) that allow targeting the queue from other HSA agents and the host CPU
‒ The design allows an HSA agent on the platform to build & dispatch jobs to a queue using HSA architected interfaces
Applications and runtime can build different queuing models on top of the infrastructure
‒ Single-producer and multi-producer queuing models, lock-free dispatch, … are all options SW can implement on top of the system architecture's queue definition to fit the use model
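A minimal sketch of a queue defined by a base ring, a size, a read index and a write index, holding dispatch and barrier packets. The packet fields and the class names (`DispatchPacket`, `BarrierPacket`, `UserModeQueue`) are hypothetical stand-ins; they do not reproduce the actual AQL packet layout, and the barrier here is only a marker.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DispatchPacket:
    kernel: Callable    # stand-in for a reference to the work to run
    args: tuple = ()


@dataclass
class BarrierPacket:
    pass


class UserModeQueue:
    """Toy single-producer ring buffer with monotonically increasing indices."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.ring = [None] * size
        self.read_index = 0
        self.write_index = 0

    def enqueue(self, packet):
        if self.write_index - self.read_index == self.size:
            return False                      # queue is full
        self.ring[self.write_index % self.size] = packet
        self.write_index += 1                 # publishes the packet
        return True

    def drain(self):
        """Packet processor side: consume packets in order, return results."""
        results = []
        while self.read_index != self.write_index:
            packet = self.ring[self.read_index % self.size]
            self.read_index += 1
            if isinstance(packet, DispatchPacket):
                results.append(packet.kernel(*packet.args))
            # A BarrierPacket is only an ordering marker in this toy.
        return results
```

Because the producer only touches the write index and the consumer only touches the read index, this layout is the usual starting point for the lock-free and multi-producer variants the slide mentions.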
11. THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (2)
The basis of the workload dispatch on HSA
The HSA System Architecture defines a user mode queue based dispatch mechanism
‒ Each queue is only valid within its process context and represents a virtual entity that is scheduled to hardware
‒ The job execution occurs at "user privilege" like the rest of the application code, enforced by system architecture
Each HSA agent allows for multiple queues per application process
‒ HSA defines in-order dispatch semantics of work items within queues for efficient HW implementation
‒ HW may execute dispatch packets "out-of-order", if no dependencies exist and in-order semantics are followed externally
‒ "Out of order" execution applies between queues, with explicit, memory based synchronization mechanisms between them as needed
It is "cheap" to create queues in HSA, so applications can have one queue per HSA agent for each application thread, or leverage multiple HSA user queues per thread if needed
‒ This gives applications a lot of flexibility to structure the queue layout to match the problem instead of trying to fit the problem to work with one or a few queues only
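The ordering rules above can be pictured with a toy scheduler, assuming a simple round-robin policy between queues: packets from the same queue are always launched in their queue order, while packets from different queues may interleave freely. The function name and the policy are hypothetical; real hardware scheduling is implementation specific.

```python
from collections import deque


def schedule_round_robin(queues):
    """queues: dict mapping a queue name to its packets in queue order.

    Returns the launch order as (queue name, packet) pairs. Order within a
    queue is preserved; order between queues is not constrained.
    """
    pending = {name: deque(packets) for name, packets in queues.items()}
    launch_order = []
    while any(pending.values()):
        for name, packets in pending.items():
            if packets:
                launch_order.append((name, packets.popleft()))
    return launch_order
```

An application that needs ordering between packets in different queues would add explicit, memory based synchronization, which this sketch leaves out.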
12. OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
Miscellaneous mention, but nevertheless important to make it work well…
HSA memory based signaling and synchronization primitives
‒ Defines memory based semantics to synchronize with work items processed by HSA agents
‒ e.g. 32bit or 64bit value, content update, wait on value by HSA agents and AQL packets
‒ Allows one-to-one and one-to-many signaling
‒ The signaling semantics follow atomicity requirements defined in the memory model
‒ Hardware-assisted, power-efficient & low-latency way to synchronize execution of work items between threads
‒ Runtime & application SW can use the infrastructure to build mutexes, semaphores and other synchronization primitives
HSA Cache Coherency Domains
‒ Defines the scope of HSA cache coherency and how it relates to other non-HSA system resource operations
‒ Associated with the memory model requirements
‒ Architected way to interact with non-HSA platform infrastructure (e.g. graphics)
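The memory based signaling primitives above (a value, content updates, and waiting on a value) can be approximated in software with a condition variable. This is a sketch of the semantics only, using Python's standard `threading.Condition`; the class `ToySignal` and its methods are hypothetical and say nothing about how hardware implements signals.

```python
import threading


class ToySignal:
    """A value that can be updated atomically and waited on by many waiters."""

    def __init__(self, initial=0):
        self._value = initial
        self._cond = threading.Condition()

    def load(self):
        with self._cond:
            return self._value

    def store(self, value):
        with self._cond:
            self._value = value
            self._cond.notify_all()     # one-to-many: wake every waiter

    def add(self, delta):
        with self._cond:
            self._value += delta
            self._cond.notify_all()
            return self._value

    def wait_eq(self, expected, timeout=None):
        """Block until the value equals `expected`; False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value == expected, timeout)


# Usage: a completion signal that starts at the number of outstanding jobs.
done = ToySignal(2)
workers = [threading.Thread(target=done.add, args=(-1,)) for _ in range(2)]
for worker in workers:
    worker.start()
finished = done.wait_eq(0, timeout=5)
for worker in workers:
    worker.join()
```

A counting signal like this is one way a runtime could build the mutexes and semaphores the slide refers to.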
13. OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
Miscellaneous mention, but nevertheless important
HSA system timestamp requirements
‒ Defines a low-overhead mechanism to "determine the passing of time" on an HSA platform
‒ The value can be queried by HSAIL or HSA runtime
‒ Represented by a 64bit timestamp value that does not roll over and is incremented at a constant rate in HW
‒ Applications and tools are able to build a consistent timeline across all HSA agents
HSA Topology requirements
‒ Defines system topology and properties of HSA agents discoverable on an HSA platform by an application to take advantage of platform properties
‒ Examples are # of compute units, max. work item dimensions, work group size, work item size, queue properties, …
‒ APIs like OpenCL™ and others can leverage HSA system topology data to discover memory layout, compute unit properties and other properties and consistently report the system topology for applications to leverage
[Figure: "HSA Platform - Simple". Labels: HSA APU, CPU cores, GPU H-CUs, Mem, HSA MMU, System Memory]
[Figure: "HSA Platform". Labels: HSA APU, Add-In GPU (optional), GPU]
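A small sketch of how an application might consume topology data and the system timestamp, assuming the platform reports a list of agents with a few properties plus one shared timestamp frequency. The types and field names (`AgentInfo`, `PlatformTopology`, `timestamp_hz`) are hypothetical and are not taken from the HSA runtime API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentInfo:
    name: str
    compute_units: int
    max_workgroup_size: int


@dataclass(frozen=True)
class PlatformTopology:
    agents: tuple
    timestamp_hz: int   # constant rate of the shared 64bit timestamp

    def find_agents(self, min_compute_units):
        """Pick agents by a discovered property."""
        return [a for a in self.agents if a.compute_units >= min_compute_units]

    def elapsed_seconds(self, start_ticks, end_ticks):
        """Convert a tick interval to seconds; valid because the rate is constant."""
        return (end_ticks - start_ticks) / self.timestamp_hz


def merge_timeline(events):
    """events: (ticks, agent name, label) tuples from any agent.

    Since all agents share one timestamp, sorting by ticks gives one timeline.
    """
    return sorted(events)
```

With a single constant-rate timestamp, events recorded on a CPU and on a GPU can be ordered directly, without per-device clock calibration.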
[Figure labels for the HSA platform with add-in GPU: CPU cores, GPU H-CUs, HSA MMU, System Memory, System Firmware, IOBUS, HSA GPU, H-CUs, Device Local Memory, Firmware, Mem]
14. WHERE TO FIND FURTHER INFORMATION ON SYSTEM ARCHITECTURE?
HSA Foundation Website: http://www.hsafoundation.com
‒ The main location for specs, developer info, tools, publications and many things more
‒ HSA Programmer's Reference Manual v0.95 has been published
‒ HSA Platform Software Systems Architecture Specification is quickly nearing the 0.95 state
‒ Will be published after ratification by the HSA Foundation Board of Directors
‒ Stay tuned
15. ANY QUESTIONS?
Of course there are, so go ahead ☺