This document provides an overview of heterogeneous architectures and the challenges they present for developers. It discusses how hardware is becoming more specialized and complex as Moore's Law slows, which makes it harder to deliver high performance and efficiency in applications. The document then summarizes several available compute devices, from easiest to hardest to program, including GPUs, MICs, FPGAs, and automata. It argues that software and tools are needed to abstract this complexity and automatically realize performance gains across heterogeneous systems. Bitfusion technology aims to do this through remote virtualization that scales applications horizontally, vertically, and across different device types transparently.
2. The problem in computing
[Figure: a spectrum from "abstract and slow" to "complex and fast", plotted against time]
Delivering performance and efficiency to today's applications is becoming more difficult.
5. Moore's law slowing -> complexity
Era of frequency → era of multi-core → era of many-core
6. The problem in computing
[Figure: a spectrum from "abstract and slow" to "complex and fast", plotted against time]
Help!
7. The solution(s)
• Hardware
  • Specialized hardware is required to keep up with the accelerated performance curve
  • Encourage accessibility: low hourly pricing
• Software
  • Abstractions: libraries, APIs, tool chain up to compiler IR; use translations where possible
  • Ecosystem: learning materials, user groups, university engagement
• What makes this happen: developers
The remainder of this talk is about the hardware out there and how to develop for it.
8. Current State of Developer Experience for Accelerators
- Update to the right operating system
- Install vendor tool-flows, which only work on specific operating systems
- Set up the environment and licenses
- Install the board
- Set up the board
- Read numerous pages of documentation
Unhappy developer experience
In many cases, developers give up before even starting real work because of this poor developer experience.
16. Vision
To bring supercomputing to the masses by:
◦ building software to automatically realize the benefits of heterogeneous hardware
17. Enabling scaling automatically
• Horizontal scaling with BF Boost remoting technology
• Vertical scaling with BF Boost splitting technology
• Heterogeneous scaling with BF Boost interception technology
[Figure: CPU system vs. GPU system]
• 3X: machine learning with Caffe, Torch: 2 local vs. 8 remote GPUs
• 3.5X: rendering with Blender: 1 local vs. 4 remote GPUs
• 20X: rendering with Blender: 4 remote GPUs
• 8X: image processing with ImageMagick: 1 vs. 12 local GPUs
• 10X: computer vision (face detect) with OpenCV: 12 CPU cores vs. 4 GPUs
• 7X: computational science with NAMD: 2 remote GPUs
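One way to read these results is as scaling efficiency: the measured speedup divided by the factor by which the GPU count grew. The sketch below assumes each speedup is measured against the stated baseline configuration (the slide does not say so explicitly) and covers only the three entries that give both a baseline and a target GPU count; `ScalingResult` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class ScalingResult:
    workload: str
    baseline_gpus: int
    target_gpus: int
    speedup: float  # assumption: measured relative to the baseline configuration

    def efficiency(self) -> float:
        """Speedup divided by the growth in GPU count (1.0 = perfectly linear)."""
        return self.speedup / (self.target_gpus / self.baseline_gpus)


# Entries from the slide that state both a baseline and a target GPU count.
RESULTS = [
    ScalingResult("Caffe/Torch machine learning", 2, 8, 3.0),
    ScalingResult("Blender rendering", 1, 4, 3.5),
    ScalingResult("ImageMagick image processing", 1, 12, 8.0),
]

if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.workload}: {r.efficiency():.0%} of linear scaling")
```

Under that assumption, the three workloads reach roughly 75%, 88%, and 67% of linear scaling respectively.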
18. Bitfusion Tech: Remote Virtualization
Features
• Scale-out: connect one server to many accelerators to boost performance
• Scale-in: connect many servers to few accelerators to pool resources and lower cost
• Service discovery: local and remote machines can discover each other on demand, without complex or time-consuming configuration
• Virtual pools: segment resources by class of users or hardware
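Scale-in and virtual pools amount to sharing a small set of accelerators among many servers while keeping each class of users in its own segment. The following is a minimal, hypothetical sketch of that bookkeeping only: `VirtualPool`, its method names, and the device IDs are illustrative assumptions, not Bitfusion's API, and no real devices are attached.

```python
from collections import defaultdict


class VirtualPool:
    """Hypothetical accelerator pool, segmented by user class (illustration only)."""

    def __init__(self, segments: dict[str, list[str]]):
        # Free device IDs per user class, e.g. {"research": ["gpu0", "gpu1"]}.
        self._free = {cls: list(devs) for cls, devs in segments.items()}
        # Leased (user class, device ID) pairs per client server.
        self._leased: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def acquire(self, client: str, user_class: str, count: int) -> list[str]:
        """Lease `count` devices from one segment; raise if too few are free."""
        free = self._free[user_class]
        if count > len(free):
            raise RuntimeError(f"only {len(free)} free device(s) in {user_class!r}")
        devices = [free.pop() for _ in range(count)]
        self._leased[client].extend((user_class, dev) for dev in devices)
        return devices

    def release(self, client: str) -> None:
        """Return every device leased by `client` to the segment it came from."""
        for user_class, dev in self._leased.pop(client, []):
            self._free[user_class].append(dev)

    def available(self, user_class: str) -> int:
        """Number of free devices in one segment."""
        return len(self._free[user_class])


if __name__ == "__main__":
    # Two segments; three servers take turns on two "research" GPUs (scale-in).
    pool = VirtualPool({"research": ["gpu0", "gpu1"], "students": ["gpu2"]})
    print(pool.acquire("server-a", "research", 1))
    print(pool.acquire("server-b", "research", 1))
    try:
        pool.acquire("server-c", "research", 1)  # segment exhausted
    except RuntimeError as err:
        print(err)
    pool.release("server-a")
    print(pool.acquire("server-c", "research", 1))  # reuses server-a's GPU
```

Keeping a separate free list per segment means one class of users can exhaust only its own share, which is the isolation the virtual-pools bullet describes.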
Remote virtualization enables varied virtual configurations by combining or sharing the resources of local and remote servers:
• Binary-level API interception
• Distribute work across local and remote machines
• Advanced performance features, including synchronization elision and data pipelining
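Binary-level API interception means the application keeps calling the same accelerator API while a shim underneath decides whether each call runs on a local or a remote device. In native code, one common way to build such a shim is to interpose a shared library that exports the same symbols; the Python sketch below illustrates only the redirection idea, using a proxy object with round-robin placement. `InterceptingProxy`, `LocalBackend`, and `RemoteBackend` are hypothetical stand-ins, and the "remote" backend here computes locally instead of sending anything over a network.

```python
import itertools
from typing import Any, Callable


class LocalBackend:
    """Stand-in for an accelerator attached to this server (hypothetical)."""
    name = "local"

    def scale(self, data: list[float], factor: float) -> list[float]:
        return [x * factor for x in data]


class RemoteBackend:
    """Stand-in for an accelerator on another server. A real shim would
    serialize the call and its arguments over the network (hypothetical)."""

    def __init__(self, host: str):
        self.name = f"remote:{host}"

    def scale(self, data: list[float], factor: float) -> list[float]:
        return [x * factor for x in data]


class InterceptingProxy:
    """Presents one API to the application and redirects each call,
    round-robin, to one of several backends."""

    def __init__(self, backends: list[Any]):
        self._cycle = itertools.cycle(backends)
        self.log: list[tuple[str, str]] = []  # (function name, backend) per call

    def __getattr__(self, func_name: str) -> Callable[..., Any]:
        # Only called for names not found on the proxy itself, i.e. API calls.
        def redirected(*args: Any, **kwargs: Any) -> Any:
            backend = next(self._cycle)
            self.log.append((func_name, backend.name))
            return getattr(backend, func_name)(*args, **kwargs)
        return redirected


if __name__ == "__main__":
    # The "application" only sees `device.scale(...)`; placement is hidden.
    device = InterceptingProxy([LocalBackend(), RemoteBackend("gpu-host-1")])
    print(device.scale([1.0, 2.0], 3.0))
    print(device.scale([4.0], 0.5))
    print(device.log)
```

The caller only ever touches `device.scale`, so adding or removing backends changes where work runs without changing application code.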
[Figure, system view: an application on a local server connected to remote servers]
• Software sees all new hardware as if it were directly connected
• No change to software required
[Figure, application view: the application on a virtual server with combined resources]
• Data and compute pipelining
• Advanced caching and data directories
• Auto service discovery, metering
• Function redirection for advanced coprocessors
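Data and compute pipelining overlaps moving the next chunk of data with computing on the current one, so transfer latency to a remote accelerator is partly hidden. The sketch below shows the generic two-stage pattern with a producer thread and a bounded queue; `transfer` and `compute` are hypothetical placeholders (a short sleep and a sum), not Bitfusion functions.

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the stream


def transfer(chunk: list[int]) -> list[int]:
    """Placeholder for copying a chunk to a (remote) device."""
    time.sleep(0.01)
    return chunk


def compute(chunk: list[int]) -> int:
    """Placeholder for the kernel that runs on the device."""
    time.sleep(0.01)
    return sum(chunk)


def pipelined(chunks: list[list[int]], depth: int = 2) -> list[int]:
    """Transfer chunk i+1 while chunk i is being computed.

    `depth` bounds how many transferred chunks may wait, like a fixed
    number of staging buffers."""
    staged: queue.Queue = queue.Queue(maxsize=depth)

    def producer() -> None:
        for chunk in chunks:
            staged.put(transfer(chunk))  # blocks while all buffers are full
        staged.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while (item := staged.get()) is not _DONE:
        results.append(compute(item))
    return results


if __name__ == "__main__":
    data = [[i, i + 1] for i in range(0, 20, 2)]
    start = time.perf_counter()
    print(pipelined(data), f"{time.perf_counter() - start:.2f}s")
```

When transfer and compute take about the same time, the overlapped loop approaches half the time of running the two stages back to back as the number of chunks grows.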
19. Helping to solve accessibility
[Figure: scale-out; pooling; inexpensive micro-client; shared heterogeneous server]
20. Offer the most affordable high-performance developer instances
[Figure: heterogeneous cloud; developer machine]