HSA ENABLEMENT OF APARAPI
EASING THE DEVELOPER PATH TO APU/GPU ACCELERATED JAVA APPLICATIONS

VIGNESH RAVI, SOFTWARE DEVELOPER, HSA TEAM, AMD
GARY FROST, SOFTWARE FELLOW, AMD
  
HSA ENABLEMENT OF APARAPI: AGENDA

• Java GPU enablement via Aparapi
  - Why Java?
  - Aparapi: what is it and how is it used?
• Introduction to HSA
• How HSA simplifies Java GPU programming with Aparapi
  - Simpler programming model using lambda expressions
  - Removal of previous constraints thanks to SVM (Shared Virtual Memory)
• The nuts and bolts of our current HSA enablement
  - HSAIL generation
  - Dispatch via HSA Runtime APIs
• Summary
• Q&A
2 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013
  
WHY JAVA?

• Java by the numbers
  - 9 million developers
  - 1 billion Java downloads per year
  - 97% of enterprise desktops run Java
  - 100% of Blu-ray players ship with Java
  - Source: http://oracle.com.edgesuite.net/timeline/java/
• The Java 7 language and libraries already include concurrency features
  - primitives (threads, locks, monitors, atomic ops)
  - libraries (fork/join, thread pools, executors, futures)
• The upcoming Java 8 includes stream processing enhancements
  - support for 'lambda' expressions
  - lambda-centric concurrent stream processing libraries/APIs (java.util.stream.*)
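The sketch below makes the Java 7 concurrency libraries listed above concrete. It squares an int array with fork/join, using the same data parallel pattern as the later Aparapi examples. The class name and the THRESHOLD split size are arbitrary choices made for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Java 7 style data parallelism: split the index range until it is small,
// then square each element of that sub-range sequentially.
class SquareTask extends RecursiveAction {
    static final int THRESHOLD = 1024; // arbitrary split size for this sketch
    final int[] in, out;
    final int lo, hi; // half-open range [lo, hi)

    SquareTask(int[] in, int[] out, int lo, int hi) {
        this.in = in; this.out = out; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            for (int i = lo; i < hi; i++) {
                out[i] = in[i] * in[i];
            }
        } else {
            int mid = (lo + hi) >>> 1;
            invokeAll(new SquareTask(in, out, lo, mid), new SquareTask(in, out, mid, hi));
        }
    }

    static int[] squares(int[] in) {
        int[] out = new int[in.length];
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new SquareTask(in, out, 0, in.length));
        } finally {
            pool.shutdown();
        }
        return out;
    }

    public static void main(String[] args) {
        int[] in = new int[5000];
        for (int i = 0; i < in.length; i++) in[i] = i;
        System.out.println(squares(in)[4999]); // 24990001
    }
}
```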
  	
  	
  
  
INITIAL APARAPI PROJECT OVERVIEW (2011)

• Open source framework
• Allows Java developers access to GPU compute
• Aparapi Java API for expressing data parallel workloads
  - Extend the Aparapi Kernel base class and override its run() method:

      // square[] and in[] are int arrays captured by the anonymous class
      Kernel kernel = new Kernel(){
         @Override public void run(){
            int i = getGlobalId();      // index of this work item
            square[i] = in[i] * in[i];
         }
      };
      kernel.execute(size);             // run() executes once per id in [0, size)

• Aparapi runtime capable of converting bytecode to OpenCL™
  - Execution on OpenCL™ 1.1+ capable devices (GPUs and APUs)
  - Or... execute via a thread pool if OpenCL™ is unavailable

[Figure: Aparapi (2011) stack. Java application → Aparapi Kernel base class (overridden run() method) → Aparapi converts bytecode to OpenCL™ → OpenCL™ compiler & runtime → GPU ISA → GPU; JVM → CPU ISA → CPU]
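Because of the thread-pool fallback, the same run() body must also work when ordinary Java threads execute it. The sketch below illustrates this with a hypothetical MiniKernel class. Its getGlobalId() and execute(range) only mirror the names used on the slide. It is not Aparapi's code, and it performs no bytecode conversion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical Aparapi-style kernel whose run() is executed on a Java thread pool.
abstract class MiniKernel {
    private final ThreadLocal<Integer> globalId = new ThreadLocal<>();

    protected int getGlobalId() { return globalId.get(); }

    public abstract void run();

    // Calls run() once for every id in [0, range).
    public void execute(int range) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> parts = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int first = t;
            parts.add(pool.submit(() -> {
                for (int id = first; id < range; id += threads) { // strided ids per worker
                    globalId.set(id);
                    run(); // resolves to MiniKernel.run()
                }
            }));
        }
        try {
            for (Future<?> f : parts) f.get(); // wait; Future.get() also publishes the writes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}

class MiniKernelDemo {
    static int[] squares(int[] in) {
        final int[] square = new int[in.length];
        MiniKernel kernel = new MiniKernel() {
            @Override public void run() {
                int i = getGlobalId();
                square[i] = in[i] * in[i];
            }
        };
        kernel.execute(in.length);
        return square;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(squares(new int[]{1, 2, 3, 4}))); // [1, 4, 9, 16]
    }
}
```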
  
MEET HSA AND HSAIL

• Heterogeneous System Architecture standardizes CPU/GPU functionality
  - Be ISA-agnostic for both CPUs and accelerators
  - Support high-level programming languages
  - Provide the ability to access pageable system memory from the GPU
  - Maintain cache coherency for system memory between CPU and GPU
• Specifications and simulator from the HSA Foundation
  - HSAIL portable ISA is "finalized" to a particular hardware ISA at runtime
  - Runtime specification for job launch and control
  - HSAIL™ simulator for development and testing before hardware availability
  

  
APARAPI HSA ENABLEMENT (2013-2014)

• Sponsored open source project
• Enhanced to support HSA and Java 8 lambda expressions
• Allow developers to efficiently represent data parallel algorithms using new Java 8 lambda expressions
• APIs have the same look & feel as the proposed Java 8 stream API features
• No modifications to the JVM
  - We provide external JNI/Java libraries.

    Device.hsa().forEach(size,
       i -> square[i] = in[i] * in[i]
    );

[Figure: Aparapi HSA stack. Java application → Aparapi lambda-based API → Aparapi converts bytecode to HSAIL → HSA finalizer & runtime → GPU ISA → GPU; JVM → CPU ISA → CPU]
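The lambda form mirrors the Java 8 stream API, so the closest plain-JDK equivalent of the forEach call above is a parallel IntStream over the index range. This CPU-only sketch uses only java.util.stream. The helper name forEachIndex is hypothetical, and the sketch says nothing about how Aparapi's Device class is actually implemented.

```java
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

class LambdaForEach {
    // Hypothetical helper with the same shape as forEach(size, i -> ...):
    // runs body once for every index in [0, size), possibly in parallel.
    static void forEachIndex(int size, IntConsumer body) {
        IntStream.range(0, size).parallel().forEach(body);
    }

    public static void main(String[] args) {
        int size = 8;
        int[] in = new int[size];
        int[] square = new int[size];
        for (int i = 0; i < size; i++) in[i] = i + 1;

        // Same lambda body as the slide; each index writes a distinct element.
        forEachIndex(size, i -> square[i] = in[i] * in[i]);

        System.out.println(square[7]); // 64
    }
}
```

Each invocation writes only its own element square[i], so running the lambda in parallel needs no locking.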
  	
  

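HSA AND LAMBDA ENABLED APARAPI EXECUTION EXAMPLE

    Device.hsa().forEach(size,
       i -> square[i] = in[i] * in[i]
    );

• Does the platform support HSA?
  - N: execute the kernel using a Java thread pool
  - Y: is this the first execution of this lambda instance?
    - Y: can the bytecode be converted to HSAIL? If Y, convert the bytecode to HSAIL and execute the HSAIL kernel on the GPU/APU; if N, execute the kernel using a Java thread pool
    - N: do we have HSAIL for this lambda? If Y, execute the HSAIL kernel on the GPU/APU; if N, execute the kernel using a Java thread pool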
  

	
  

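The decision flow above can be sketched as a dispatcher that caches the conversion result for each lambda instance, so each instance's bytecode is analysed at most once. Every part of this sketch is hypothetical. The platform check is a boolean, the bytecode-to-HSAIL converter is an injected function, and runOnGpu is a placeholder that simply runs the lambda on the CPU.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

// Hypothetical dispatcher following the flowchart: HSA path when possible,
// Java thread pool (here a parallel stream) otherwise.
class Dispatcher {
    private final boolean platformSupportsHsa;
    // Returns HSAIL text, or empty if this lambda's bytecode cannot be converted.
    private final Function<IntConsumer, Optional<String>> toHsail;
    // Keyed by lambda instance; a cached empty Optional means "not convertible".
    private final Map<IntConsumer, Optional<String>> cache = new IdentityHashMap<>();
    int gpuRuns, poolRuns; // counters that make the sketch observable

    Dispatcher(boolean platformSupportsHsa, Function<IntConsumer, Optional<String>> toHsail) {
        this.platformSupportsHsa = platformSupportsHsa;
        this.toHsail = toHsail;
    }

    synchronized void forEach(int size, IntConsumer lambda) {
        if (platformSupportsHsa) {
            // First execution of this instance converts; later executions reuse the result.
            Optional<String> hsail = cache.computeIfAbsent(lambda, toHsail);
            if (hsail.isPresent()) {
                runOnGpu(hsail.get(), size, lambda);
                return;
            }
        }
        poolRuns++;
        IntStream.range(0, size).parallel().forEach(lambda);
    }

    private void runOnGpu(String hsail, int size, IntConsumer lambda) {
        // Placeholder for finalize + dispatch; this sketch just runs the lambda on the CPU.
        gpuRuns++;
        IntStream.range(0, size).forEach(lambda);
    }

    public static void main(String[] args) {
        Dispatcher d = new Dispatcher(false, l -> Optional.empty());
        int[] out = new int[4];
        d.forEach(4, i -> out[i] = i * i);
        System.out.println(out[3] + ", pool runs: " + d.poolRuns); // 9, pool runs: 1
    }
}
```

Keying the cache by object identity matches the flowchart's "this lambda instance" wording. The trade-off is that a capturing lambda re-created on every call would be converted again. That is one reason a real implementation might prefer to key on the lambda's class.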
  
SUMATRA PROJECT: NATIVE SUPPORT FOR GPU OFFLOAD ADDED TO JAVA

• AMD/Oracle sponsored open source (OpenJDK) project
• Targeted at OpenJDK Java 9 (2015)
• Allow developers to efficiently represent data parallel algorithms in Java using the Stream API + lambda expressions
• Sumatra is not pushing a new 'programming model'
• Instead, we 'repurpose' the Stream API + lambdas to enable either CPU or GPU computing
• A Sumatra-enabled Java Virtual Machine™ will dispatch 'selected' constructs to HSA-enabled devices at runtime
• Developers are already refactoring the JDK to use the stream & lambda APIs
  - So anyone using the existing JDK should see GPU acceleration without any code changes
• Links:
  - http://openjdk.java.net/projects/sumatra
  - https://wikis.oracle.com/display/HotSpotInternals/Sumatra
  - http://mail.openjdk.java.net/pipermail/sumatra-dev

[Figure: Sumatra stack. Java application → Java JDK Stream + Lambda API → Java GRAAL JIT backend → HSAIL → HSA finalizer & runtime → GPU ISA → GPU; JVM → CPU ISA → CPU]
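An ordinary Java 8 library call whose parallelism lives inside the JDK illustrates why JDK code refactored onto streams and lambdas would benefit without source changes. In OpenJDK 8, Arrays.parallelSetAll is implemented on top of a parallel IntStream. The slides do not say whether a Sumatra JVM would have offloaded this particular method.

```java
import java.util.Arrays;

class LibraryParallelism {
    static int[] squares(int[] in) {
        int[] square = new int[in.length];
        // The caller only supplies a lambda; the JDK decides how to parallelize it.
        Arrays.parallelSetAll(square, i -> in[i] * in[i]);
        return square;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squares(new int[]{1, 2, 3}))); // [1, 4, 9]
    }
}
```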
  

HSA ENABLEMENT OF JAVA

Java 7: OpenCL enabled Aparapi
• AMD initiated open source project
• APIs for data parallel algorithms
  - GPU accelerate Java applications
  - No need to learn OpenCL
• Active community captured mindshare
  - ~20 contributors
  - >7000 downloads
  - ~150 visits per day
[Figure: Java application → APARAPI API → OpenCL™ → OpenCL™ compiler and runtime → GPU ISA → GPU; JVM → CPU ISA → CPU]

Java 8: HSA enabled Aparapi
• Java 8 brings the Stream + Lambda API
  - A more natural way of expressing data parallel algorithms
  - Initially targeted at multi-core
• APARAPI will:
  - Support Java 8 lambdas
  - Dispatch code to HSA enabled devices at runtime via HSAIL
[Figure: Java application → APARAPI + Lambda API → HSAIL™ → HSA finalizer & runtime → GPU ISA → GPU; JVM → CPU ISA → CPU]

Java 9: HSA enabled Java (Sumatra)
• Adds native GPU compute support to the Java Virtual Machine (JVM)
• Developer uses the JDK-provided Lambda + Stream API
• JVM uses the GRAAL compiler to generate HSAIL
• JVM decides at runtime whether to execute on the CPU or the GPU, depending on workload characteristics
[Figure: Java application → Java JDK Stream + Lambda API → Java GRAAL JIT backend → HSAIL™ → HSA™ finalizer & runtime → GPU ISA → GPU; JVM → CPU ISA → CPU]

We plan to provide HSA enabled Aparapi (Java 8) as a bridge technology between OpenCL based Aparapi (Java 7) and HSA enabled Sumatra (Java 9).
  

  
A CASE STUDY CENTERED ON NBODY

• A Java developer implementing a sequential version of NBody would probably...
  - Create a class to represent each body

      class Body{
         float x,y,z,m,vx,vy,vz;
         // Include method to update position and display
         void updateAndShow(Screen screen, Body[] bodies){
            for (Body other:bodies){
               // accumulate forces between other and this
            }
            // update vx,vy,vz,x,y and z from accumulated data
            screen.paint(x,y,z);
         }
      }

  - Loop through each Body (in the array bodies[]) to update and display

      for (Body b: bodies)
         b.updateAndShow(screen, bodies);
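The Body code above is pseudocode: Screen and the force calculation are left open. Below is a runnable version of the same sequential structure. It assumes G = 1, a softened inverse-square force and a fixed time step, and it defines a minimal Screen interface only so that the code compiles. One change is deliberate: all accelerations are computed before any body moves. If bodies moved inside the same loop, later bodies would see positions that had already been updated during this step.

```java
// Sequential NBody sketch. Assumptions (not from the slides): G = 1,
// softened gravity, fixed time step, and a Screen interface defined here.
class SequentialNBody {
    interface Screen { void paint(float x, float y, float z); }

    static final float DT = 0.01f;        // assumed time step
    static final float SOFTENING = 0.01f; // avoids division by zero for close bodies

    static class Body {
        float x, y, z, m, vx, vy, vz;
        float ax, ay, az; // acceleration accumulated from all other bodies

        Body(float x, float y, float z, float m) {
            this.x = x; this.y = y; this.z = z; this.m = m;
        }

        // Accumulate the pull of every other body on this one.
        void accumulate(Body[] bodies) {
            ax = ay = az = 0f;
            for (Body other : bodies) {
                if (other == this) continue;
                float dx = other.x - x, dy = other.y - y, dz = other.z - z;
                float r2 = dx * dx + dy * dy + dz * dz + SOFTENING;
                float s = other.m / (r2 * (float) Math.sqrt(r2));
                ax += dx * s; ay += dy * s; az += dz * s;
            }
        }

        // Update velocity and position from the accumulated acceleration, then display.
        void updateAndShow(Screen screen) {
            vx += ax * DT; vy += ay * DT; vz += az * DT;
            x += vx * DT; y += vy * DT; z += vz * DT;
            screen.paint(x, y, z);
        }
    }

    // One time step: all accelerations first, then all moves.
    static void step(Body[] bodies, Screen screen) {
        for (Body b : bodies) b.accumulate(bodies);
        for (Body b : bodies) b.updateAndShow(screen);
    }

    public static void main(String[] args) {
        Body[] bodies = { new Body(-1f, 0f, 0f, 1f), new Body(1f, 0f, 0f, 1f) };
        step(bodies, (px, py, pz) -> System.out.println(px + " " + py + " " + pz));
    }
}
```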

  
WITHOUT HSA WE CAN'T (EFFICIENTLY) USE OBJECTS

• In Java, allocated objects are scattered on the heap.
  - There is no way to allocate an array of objects in contiguous memory (as with C++)
  - We force the developer to resort to using parallel arrays of primitives (which are contiguous)

      float x[], y[], z[], m[], vx[], vy[], vz[];

  - And to infer that x[n], y[n] and z[n] hold the state for bodies[n].

      Kernel kernel = new Kernel(){
         @Override public void run(){
            int i = getGlobalId(0);   // this work item's body index
            for (int j=0; j<bodies.length; j++){
               // accumulate forces between (x,y,z)[j] and (x,y,z)[i]
            }
            // update vx[i],vy[i],vz[i],x[i],y[i] and z[i]
         }
      };

  - Then the kernel can be used to execute the code on the GPU

      kernel.execute(bodies.length);
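The parallel-array layout can be tried without a GPU. The sketch below keeps one float[] per field and runs one task per index i, using a parallel IntStream where Aparapi would use kernel.execute(). The physics (G = 1, softening, fixed step) is assumed for illustration. This sketch also chooses to split the step into a velocity phase and a position phase, so that no task writes positions that other tasks are still reading.

```java
import java.util.stream.IntStream;

// Structure-of-arrays NBody sketch: index n in every array describes body n.
class SoaNBody {
    static final float DT = 0.01f, SOFTENING = 0.01f; // assumed constants, G = 1

    final float[] x, y, z, m, vx, vy, vz;

    SoaNBody(int n) {
        x = new float[n]; y = new float[n]; z = new float[n]; m = new float[n];
        vx = new float[n]; vy = new float[n]; vz = new float[n];
    }

    // Body of the "kernel" for one global id i (velocity phase).
    // j == i contributes nothing because dx = dy = dz = 0.
    void accumulate(int i) {
        float ax = 0f, ay = 0f, az = 0f;
        for (int j = 0; j < x.length; j++) {
            float dx = x[j] - x[i], dy = y[j] - y[i], dz = z[j] - z[i];
            float r2 = dx * dx + dy * dy + dz * dz + SOFTENING;
            float s = m[j] / (r2 * (float) Math.sqrt(r2));
            ax += dx * s; ay += dy * s; az += dz * s;
        }
        vx[i] += ax * DT; vy[i] += ay * DT; vz[i] += az * DT;
    }

    void step() {
        int n = x.length;
        // Phase 1 reads positions and writes only vx[i], vy[i], vz[i].
        IntStream.range(0, n).parallel().forEach(this::accumulate);
        // Phase 2 moves each body using its own velocity.
        IntStream.range(0, n).parallel().forEach(i -> {
            x[i] += vx[i] * DT; y[i] += vy[i] * DT; z[i] += vz[i] * DT;
        });
    }

    public static void main(String[] args) {
        SoaNBody sim = new SoaNBody(2);
        sim.x[0] = -1f; sim.x[1] = 1f;
        sim.m[0] = 1f; sim.m[1] = 1f;
        sim.step();
        System.out.println(sim.x[0] + " " + sim.x[1]);
    }
}
```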

  
HSA ENABLED APARAPI (AND SUMATRA) ALLOWS USE OF OBJECTS

• So we code our Body class exactly as we would for ordinary Java execution.

      class Body{
         float x,y,z,m,vx,vy,vz;
         // Include method to update position and display
         void updateAndShow(Screen screen, Body[] bodies){
            for (Body other:bodies){
               // accumulate forces between other and this
            }
            // update vx,vy,vz,x,y and z from accumulated data
            screen.paint(x,y,z);
         }
      }

• Then use the new Aparapi lambda-enabled API to coordinate dispatch to the GPU

      Device.hsa().forEach(bodies, b -> {
         b.updateAndShow(screen, bodies);
      });
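On the CPU, the object-based dispatch above has the same shape as a parallel stream over the Body[] array. This sketch stands in for Device.hsa().forEach(bodies, ...) with Arrays.stream(bodies).parallel().forEach(...). It assumes simplified physics (G = 1, softening, fixed step) and leaves out painting. Each step is split into an accumulate pass and a move pass, so no task changes a position that another task is reading.

```java
import java.util.Arrays;

// Object-based NBody sketch: one lambda invocation per Body, no parallel arrays.
class ObjectNBody {
    static final float DT = 0.01f, SOFTENING = 0.01f; // assumed constants, G = 1

    static class Body {
        float x, y, z, m, vx, vy, vz;

        Body(float x, float y, float z, float m) {
            this.x = x; this.y = y; this.z = z; this.m = m;
        }

        // Reads every body's position, writes only this body's velocity.
        void accumulate(Body[] bodies) {
            for (Body other : bodies) {
                float dx = other.x - x, dy = other.y - y, dz = other.z - z;
                float r2 = dx * dx + dy * dy + dz * dz + SOFTENING;
                float s = other.m / (r2 * (float) Math.sqrt(r2));
                vx += dx * s * DT; vy += dy * s * DT; vz += dz * s * DT;
            }
        }

        // Writes only this body's position.
        void move() { x += vx * DT; y += vy * DT; z += vz * DT; }
    }

    static void step(Body[] bodies) {
        Arrays.stream(bodies).parallel().forEach(b -> b.accumulate(bodies));
        Arrays.stream(bodies).parallel().forEach(Body::move);
    }

    public static void main(String[] args) {
        Body[] bodies = { new Body(-1f, 0f, 0f, 1f), new Body(1f, 0f, 0f, 1f) };
        step(bodies);
        System.out.println(bodies[0].x + " " + bodies[1].x);
    }
}
```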

  
OVERVIEW OF HSA ENABLED APARAPI

• Development time: MyLambda.java → javac (compiler) → MyLambda.class
• HSA enabled Aparapi, at runtime:
  - Step 0: Generate HSAIL from bytecode
  - Step 1: Generate host HSA Runtime calls
    - Step 1.1: Initialize HSA runtime, device, queue …
    - Step 1.2: Finalize HSAIL to generate GPU ISA
    - Step 1.3: Bind Java args to HSA args
    - Step 1.4: Dispatch the kernel
    - Step 1.5: Wait for completion
  - Repeat steps 1.3 - 1.5 for the next iteration of the same kernel
  - Repeat steps 0 - 1 for each new kernel

[Figure: Runtime flow. The application (which contains MyLambda.class) runs on the JVM (CPU ISA → CPU); Aparapi takes its bytecode as input, generates HSAIL, then generates HSA RT calls (Initialize → Finalize → Bind Args → Dispatch), producing GPU ISA that runs on the GPU.]

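The step list separates one-time work (steps 0 to 1.2) from per-iteration work (steps 1.3 to 1.5). The sketch below encodes that split with a cache of finalized kernels. HsaSteps and its method names are hypothetical stand-ins for the JNI and HSA runtime calls, not real HSA functions. RecordingSteps is a fake that only logs the order of calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical host-side interface, one method per step on the slide.
interface HsaSteps {
    String generateHsail(String kernelName);        // Step 0
    void initialize();                              // Step 1.1
    Object finalizeToIsa(String hsail);             // Step 1.2
    void bindArgs(Object isa, Object... javaArgs);  // Step 1.3
    void dispatch(Object isa, int globalSize);      // Step 1.4
    void waitForCompletion();                       // Step 1.5
}

class KernelLauncher {
    private final HsaSteps rt;
    private final Map<String, Object> isaByKernel = new HashMap<>();
    private boolean initialized;

    KernelLauncher(HsaSteps rt) { this.rt = rt; }

    void launch(String kernelName, int globalSize, Object... args) {
        if (!initialized) { rt.initialize(); initialized = true; }
        // Steps 0 and 1.2 run only for a kernel that has not been seen before.
        Object isa = isaByKernel.computeIfAbsent(kernelName,
                k -> rt.finalizeToIsa(rt.generateHsail(k)));
        // Steps 1.3 - 1.5 run on every iteration.
        rt.bindArgs(isa, args);
        rt.dispatch(isa, globalSize);
        rt.waitForCompletion();
    }

    public static void main(String[] args) {
        RecordingSteps rec = new RecordingSteps();
        KernelLauncher launcher = new KernelLauncher(rec);
        launcher.launch("run", 1024);
        launcher.launch("run", 1024);
        System.out.println(rec.log);
    }
}

// Fake runtime that records the order of calls instead of doing any work.
class RecordingSteps implements HsaSteps {
    final List<String> log = new ArrayList<>();
    public String generateHsail(String k) { log.add("generate " + k); return "kernel &" + k; }
    public void initialize() { log.add("initialize"); }
    public Object finalizeToIsa(String hsail) { log.add("finalize"); return hsail; }
    public void bindArgs(Object isa, Object... a) { log.add("bind"); }
    public void dispatch(Object isa, int n) { log.add("dispatch"); }
    public void waitForCompletion() { log.add("wait"); }
}
```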
HIGH LEVEL HSA FEATURES

• Features currently being defined in the HSA Working Groups**
  - Unified addressing across all processors
  - Operation into pageable system memory
  - Full memory coherency
  - Platform atomics
  - User mode dispatch
    - Enables fast dispatch with no driver involvement
  - Architected queuing language
    - Flexible compute dispatch, easier GPU self-enqueue
  - High level language support for GPU compute processors
  - Preemption and context switching

** All features subject to change, pending completion and ratification of specifications in the HSA Working Groups
  
  

© Copyright 2012 HSA Foundation. All Rights Reserved.
  
HSA INTERMEDIATE LANGUAGE (HSAIL)**

• HSAIL is a virtual ISA for parallel programs
  - Finalized to a vendor-specific ISA by a JIT compiler or "Finalizer"
  - ISA independent by design for CPU & GPU
• Explicitly parallel
  - Designed for data parallel programming
• Support for exceptions, virtual functions, and other high level language features
• Lower level than OpenCL™ SPIR
  - Fits naturally in the OpenCL™ compilation stack
• Suitable to support additional high level languages and programming models:
  - Java, C++, OpenMP, etc.

** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
  
  
HSAIL OVERVIEW**

INSTRUCTION SET
• Similar to assembly language for a RISC CPU
  - Load-store architecture

      ld_global_u64 $d0, [$d6 + 120];   // $d0 = load($d6 + 120)
      add_u64 $d1, $d2, 24;             // $d1 = $d2 + 24

• 136 opcodes (Java™ bytecode has 200)
  - Floating point (single, double, half (f16))
  - Integer (32-bit, 64-bit)
  - Some packed operations
  - Branches
  - Function calls
  - Platform atomic operations: and, or, xor, exch, add, sub, inc, dec, max, min, cas
    - Synchronize host CPU and HSA Component!
• Text and binary formats ("BRIG")

REGISTERS
• Four classes of registers
  - C: 1-bit, control registers
  - S: 32-bit, single-precision FP or int
  - D: 64-bit, double-precision FP or long int
  - Q: 128-bit, packed data
• Fixed number of registers:
  - 8 C
  - S, D, Q share a single pool of resources

      S + 2*D + 4*Q <= 128
      Up to 128 S or 64 D or 32 Q (or a blend)

• Register allocation done in the high-level compiler
  - Finalizer doesn't have to perform expensive register allocation

** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
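Here is a worked example of the shared register pool rule S + 2*D + 4*Q <= 128. A kernel that uses 10 S and 4 Q registers consumes 10 + 4*4 = 26 of the 128 slots. That leaves 102 slots, or at most 51 D registers. The helper below performs this bookkeeping. It models only the arithmetic rule stated on the slide, not a real register allocator.

```java
class HsailRegisterBudget {
    static final int POOL = 128; // S + 2*D + 4*Q must not exceed this

    // True if the given mix of S (32-bit), D (64-bit) and Q (128-bit) registers fits.
    static boolean fits(int s, int d, int q) {
        if (s < 0 || d < 0 || q < 0) throw new IllegalArgumentException("negative count");
        return s + 2 * d + 4 * q <= POOL;
    }

    // Largest D count still available after s S-registers and q Q-registers,
    // or -1 if s and q alone already exceed the pool.
    static int maxD(int s, int q) {
        int left = POOL - s - 4 * q;
        return left < 0 ? -1 : left / 2;
    }

    public static void main(String[] args) {
        System.out.println(fits(128, 0, 0)); // true
        System.out.println(fits(0, 64, 0));  // true
        System.out.println(fits(0, 0, 33));  // false (132 > 128)
        System.out.println(maxD(10, 4));     // 51, since (128 - 10 - 16) / 2 = 51
    }
}
```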
  
  
SEGMENTS AND MEMORY**

• 7 segments of memory
  - global, readonly, group, spill, private, arg, kernarg
  - Memory instructions can (optionally) specify a segment
• Global segment
  - Visible to all HSA agents (including the host CPU)

      ld_global_u64 $d0, [$d6]

• Group segment
  - Provides high-performance memory shared in the work-group
  - Group memory can be read and written by any work-item in the work-group
  - HSAIL provides sync operations to control visibility of group memory

      ld_group_u64 $d0, [$d6+24]

• Spill, private, arg segments
  - Represent different regions of a per-work-item stack
  - Typically generated by the compiler, not specified by the programmer

      st_spill_f32 $s1, [$d6+4]

• Kernarg segment
  - Programmer writes the kernarg segment to pass arguments to a kernel

      ld_kernarg_u64 $d6, [%_arg0]

• Read-only segment
  - Remains constant during execution of the kernel
• Flat addressing
  - Each segment is mapped into the virtual address space
  - Flat addresses can map to segments based on the virtual address
  - Instructions with no explicit segment use flat addressing

      ld_u64 $d0, [$d6+24]   // flat

  - Very useful for high-level language support (e.g. classes, libraries)
  - Aligns well with the OpenCL 2.0 "generic" addressing feature

** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
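One way to picture flat addressing is as a lookup from a virtual address to the segment whose window contains it. The toy model below uses a sorted map of window base addresses. The window positions and sizes are invented. Treating every address outside the group and private windows as global is an assumption of this sketch.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model: some segments occupy windows of the virtual address space.
// Window bases/sizes are made up; "everything else is global" is assumed.
class FlatAddressing {
    enum Segment { GLOBAL, GROUP, PRIVATE }

    private final TreeMap<Long, Segment> windowByBase = new TreeMap<>();
    private final Map<Segment, Long> windowEnd = new EnumMap<>(Segment.class);

    void mapWindow(Segment segment, long base, long size) {
        windowByBase.put(base, segment);
        windowEnd.put(segment, base + size); // exclusive end
    }

    // Which segment does a flat (segment-less) address refer to?
    Segment resolve(long flatAddress) {
        Map.Entry<Long, Segment> e = windowByBase.floorEntry(flatAddress);
        if (e != null && flatAddress < windowEnd.get(e.getValue())) {
            return e.getValue();
        }
        return Segment.GLOBAL;
    }

    public static void main(String[] args) {
        FlatAddressing f = new FlatAddressing();
        f.mapWindow(Segment.GROUP, 0x1000_0000L, 0x1_0000L);
        System.out.println(f.resolve(0x1000_0018L)); // GROUP
        System.out.println(f.resolve(0x7fff_0000L)); // GLOBAL
    }
}
```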
  
  
EXAMPLE: BYTECODE TO HSAIL GENERATION

Java source, compiled with javac -g squares.java:

    int in[], out[];
    Device.hsa().forEach(len, i->
       out[i] = in[i] * in[i]
    );

  

Bytecode:

    0: aload_0    // out[]
    1: iload_2    // i
    2: aload_1    // in[]
    3: iload_2
    4: iaload
    5: aload_1
    6: iload_2
    7: iaload
    8: imul
    9: iastore
    10: return

Generated HSAIL:

    version 0:95: $full : $large;
    kernel &run(
       kernarg_u64 %_arg0,   // out[]
       kernarg_u64 %_arg1,   // in[]
       kernarg_s32 %_arg2
    ){
       ld_kernarg_u64 $d0, [%_arg0];
       ld_kernarg_u64 $d1, [%_arg1];
       ld_kernarg_s32 $s2, [%_arg2];
       workitemabsid_u32 $s2, 0;   // i
       mov_b64 $d3, $d0;
       mov_b32 $s4, $s2;
       mov_b64 $d5, $d1;
       mov_b32 $s6, $s2;
       cvt_u64_s32 $d6, $s6;
       mad_u64 $d6, $d6, 4, $d5;
       ld_global_s32 $s5, [$d6+24];
       mov_b64 $d6, $d1;
       mov_b32 $s7, $s2;
       cvt_u64_s32 $d7, $s7;
       mad_u64 $d7, $d7, 4, $d6;
       ld_global_s32 $s6, [$d7+24];
       mul_s32 $s5, $s5, $s6;
       cvt_u64_s32 $d4, $s4;
       mad_u64 $d4, $d4, 4, $d3;
       st_global_s32 $s5, [$d4+24];
       ret;
    };
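The HSAIL listing computes each element address as base + 4*i and then adds a constant 24. That constant is presumably the size of the array header that precedes the element data in that JVM configuration. The sketch below replays the generated instructions for every work item against a ByteBuffer that stands in for the heap. The 24-byte header and the placement of in[] and out[] in the buffer are assumptions. The comments map each line to the HSAIL it imitates.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class HsailSquaresEmulation {
    static final int HEADER = 24; // the +24 in [$d6+24]; assumed array header size

    static int[] run(int[] input) {
        int n = input.length;
        int arrayBytes = HEADER + 4 * n;
        ByteBuffer heap = ByteBuffer.allocate(2 * arrayBytes).order(ByteOrder.LITTLE_ENDIAN);
        long inBase = 0;            // value loaded from %_arg1 (in[])
        long outBase = arrayBytes;  // value loaded from %_arg0 (out[])
        for (int i = 0; i < n; i++) heap.putInt((int) (inBase + HEADER + 4L * i), input[i]);

        // One iteration per work item; wi plays the role of workitemabsid_u32.
        for (int wi = 0; wi < n; wi++) {
            long d6 = (long) wi * 4 + inBase;           // cvt_u64_s32 + mad_u64 $d6, $d6, 4, $d5
            int s5 = heap.getInt((int) (d6 + HEADER));  // ld_global_s32 $s5, [$d6+24]
            long d7 = (long) wi * 4 + inBase;           // mad_u64 $d7, $d7, 4, $d6
            int s6 = heap.getInt((int) (d7 + HEADER));  // ld_global_s32 $s6, [$d7+24]
            s5 = s5 * s6;                               // mul_s32 $s5, $s5, $s6
            long d4 = (long) wi * 4 + outBase;          // mad_u64 $d4, $d4, 4, $d3
            heap.putInt((int) (d4 + HEADER), s5);       // st_global_s32 $s5, [$d4+24]
        }

        int[] out = new int[n];
        for (int i = 0; i < n; i++) out[i] = heap.getInt((int) (outBase + HEADER + 4L * i));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(new int[]{1, 2, 3}))); // [1, 4, 9]
    }
}
```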
APARAPI JNI CALL -> HSA RUNTIME API

Device Discovery & Queue Creation APIs**

• Discover HSA device
  - Both count and device_list are out params
  - User can iterate over HSA devices in the list

      HsaStatus HsaGetDevices(unsigned int *count,
                              const HsaDevice **device_list);

• User-mode queue creation
  - User can provide a pre-allocated buffer
  - If not, the API will allocate a buffer
  - queue is the user-mode queue

      HsaStatus HsaCreateUserModeQueue(const HsaDevice *device,
                                       void *buffer, size_t buffer_size,
                                       HsaQueuePriority queue_priority,
                                       HsaQueueFraction queue_fraction,
                                       HsaQueue **queue);

** All APIs subject to change, pending completion and ratification of specifications in the HSA Working Groups
  
  
APARAPI JNI -> HSA RUNTIME API

Finalize HSAIL to GPU ISA**

• Translating HSAIL text to binary (BRIG)
  - BRIG is a binary container for several sections
    - Code
    - String
    - Directive
    - …
  - libHsail is an assembler/disassembler
    - This is a standalone compiler library
    - Not part of the runtime

      Status Assemble(const char* hsail_text, HsaBrig *brig);

• Finalize BRIG to IHV-specific GPU ISA
  - Input: BRIG
  - Output: HsaKernelCode, which contains the ISA

      HsaStatus HsaFinalizeBrig(const HsaDevice *device,
                                HsaBrig *brig,
                                const char *kernel_name,
                                const char *options,
                                HsaKernelCode **kernel);

** All APIs subject to change, pending completion and ratification of specifications in the HSA Working Groups
  
  
APARAPI JNI -> POPULATION OF AQL DISPATCH PACKET

• AQL Dispatch Packet**
  - Header enables:
    - Different packet types
    - Specifying whether this packet should wait for all previous packets to complete
    - Control of data visibility and memory fences before and after dispatch
  - Body enables:
    - Specifying the problem fan-out using launch-config related fields
    - How much workgroup memory?
    - Location of the IHV-specific GPU ISA
    - Location where the kernel args can be found
    - A signal mechanism to wait on kernel completion
• Only the kernel info and the completion signal are opaque, so populating them requires runtime APIs
  - Other fields are open, so simple assignments suffice

      typedef struct HsaAqlDispatchPacket {
         // Header fields
         uint32_t format : 8;
         uint32_t barrier : 1;
         uint32_t acquire_fence_scope : 2;
         uint32_t release_fence_scope : 2;
         uint32_t invalidate_instruction_cache : 1;
         uint32_t invalidate_roi_image_cache : 1;
         uint32_t dimensions : 2;
         uint32_t reserved : 15;
         // Launch config
         uint16_t workgroup_size[3];
         uint16_t reserved2;
         uint32_t grid_size[3];
         uint32_t private_segment_size_bytes;
         uint32_t group_segment_size_bytes;
         // Kernel info
         uint64_t kernel_object_address;
         uint64_t kernel_arg_address;
         uint64_t reserved3;
         // Kernel synchronization
         uint64_t completion_signal;
      } HsaAqlDispatchPacket;

** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
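The header fields are plain bit-fields, so filling them in is bit arithmetic. The sketch packs the eight header fields into one 32-bit word using the widths from the struct above (8+1+2+2+1+1+2+15 = 32). It assumes fields are allocated from the least significant bit upward in declaration order, as common compilers such as GCC do on x86-64. C itself leaves bit-field layout implementation-defined. The field values in the example are arbitrary, not real packet-type or fence-scope codes.

```java
// Packs the AQL header bit-fields into one 32-bit word.
// Assumption: LSB-first allocation in declaration order.
class AqlHeader {
    // format, barrier, acquire_fence_scope, release_fence_scope,
    // invalidate_instruction_cache, invalidate_roi_image_cache, dimensions, reserved
    static final int[] WIDTHS = {8, 1, 2, 2, 1, 1, 2, 15};

    static int pack(int format, int barrier, int acquireScope, int releaseScope,
                    int invalidateICache, int invalidateRoiImageCache, int dimensions) {
        int[] values = {format, barrier, acquireScope, releaseScope,
                        invalidateICache, invalidateRoiImageCache, dimensions, 0 /* reserved */};
        int word = 0, shift = 0;
        for (int f = 0; f < WIDTHS.length; f++) {
            int mask = (1 << WIDTHS[f]) - 1;
            if ((values[f] & ~mask) != 0) {
                throw new IllegalArgumentException("field " + f + " does not fit in " + WIDTHS[f] + " bits");
            }
            word |= values[f] << shift;
            shift += WIDTHS[f];
        }
        return word;
    }

    // Extracts field f (0-based, declaration order) from a packed header.
    static int field(int word, int f) {
        int shift = 0;
        for (int k = 0; k < f; k++) shift += WIDTHS[k];
        return (word >>> shift) & ((1 << WIDTHS[f]) - 1);
    }

    public static void main(String[] args) {
        int word = pack(2, 1, 2, 2, 0, 0, 1);
        System.out.println(word);           // 38146
        System.out.println(field(word, 6)); // 1 (dimensions)
    }
}
```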
  
  
POPULATING KERNEL INFO AND SIGNAL USING HSA RT API**

    HsaStatus HsaFinalizeBrig(const HsaDevice *device,
                              HsaBrig *brig,
                              const char *kernel_name,
                              const char *options,
                              HsaKernelCode **kernel);

    typedef struct HsaKernelCode {
       …
       uint32_t workitem_private_segment_byte_size;
       uint32_t workgroup_group_segment_byte_size;
       uint64_t kernarg_segment_byte_size;
       …
    } HsaKernelCode;

    typedef struct HsaAqlDispatchPacket {
       …
       uint32_t private_segment_size_bytes;
       uint32_t group_segment_size_bytes;
       uint64_t kernel_object_address;
       uint64_t kernel_arg_address;
       …
       uint64_t completion_signal;
    } HsaAqlDispatchPacket;

    HsaStatus HsaCreateSignal(HsaSignal *signal);

** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
  
  

DISPATCH AND WAIT ON KERNEL COMPLETION

• Pack Java args into a vector in JNI
• Register the vector's data address

      HsaStatus HsaRegisterSystemMemory(void *address, size_t size);

• Dispatch
  - Submit the AQL packet into the HsaQueue
  - Thread-safe API

      HsaStatus HsaSubmitAql(HsaQueue *queue, HsaAqlDispatchPacket *aql_packet);

• Wait on kernel completion

      HsaStatus status;
      bool is_done = false;
      while (!is_done) {
         status = HsaQuerySignal(signal, &is_done);
         assert(status == kHsaStatusSuccess);
      }

** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
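The submit-then-poll pattern above can be modelled in plain Java. One thread plays the device, a bounded queue stands in for the user-mode queue, and an AtomicBoolean stands in for the completion signal. This is a hypothetical model that calls no HSA API. The device sets the flag only after the work is done, and the host reads the flag with volatile semantics, so the results are visible once the polling loop exits.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of "submit a packet, then poll its completion signal".
class SubmitAndPoll {
    static final class Packet {
        final Runnable kernel;
        final AtomicBoolean done = new AtomicBoolean(false); // completion "signal"
        Packet(Runnable kernel) { this.kernel = kernel; }
    }

    static void squareInPlace(int[] data) {
        BlockingQueue<Packet> queue = new ArrayBlockingQueue<>(16);
        Thread device = new Thread(() -> {
            try {
                Packet p = queue.take(); // the "device" picks up the packet
                p.kernel.run();
                p.done.set(true);        // signal only after the work is finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        device.start();

        Packet packet = new Packet(() -> {
            for (int i = 0; i < data.length; i++) data[i] *= data[i];
        });
        try {
            queue.put(packet);           // host submits (cf. HsaSubmitAql)
            while (!packet.done.get()) { // host polls (cf. the HsaQuerySignal loop)
                Thread.yield();
            }
            device.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        squareInPlace(data);
        System.out.println(java.util.Arrays.toString(data)); // [1, 4, 9]
    }
}
```

Busy-waiting keeps a host core spinning while it polls. That is reasonable for short kernels but wasteful for long ones.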
  
  

• After completion, dispose of the HSA resources
  - Release the queue
  - Release the signal
  - Release the kernel object
  - Deregister memory related to the kernel args

      HsaStatus HsaDestroyUserModeQueue(HsaQueue *queue);
      HsaStatus HsaDestroySignal(HsaSignal signal);
      HsaStatus HsaFreeKernelCode(HsaKernelCode *kernel);
      HsaStatus HsaDeregisterSystemMemory(void *address);
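The four release steps map naturally onto Java's try-with-resources, which closes resources in reverse order of acquisition even when the body throws. In the hypothetical wrapper below, each Handle only logs the name of the matching create and destroy function from the slides; no native call is made. Releasing in reverse order is this sketch's choice, not something the slide specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java-side handles; close() stands in for the matching
// HsaDestroy*/HsaFree*/HsaDeregister* call and only logs its name.
class HsaResources {
    static final List<String> log = new ArrayList<>();

    static final class Handle implements AutoCloseable {
        private final String releaseCall;

        Handle(String acquireCall, String releaseCall) {
            log.add(acquireCall);
            this.releaseCall = releaseCall;
        }

        @Override public void close() { log.add(releaseCall); }
    }

    static void runKernelOnce() {
        // Closed in reverse order: args, signal, kernel, queue.
        try (Handle queue = new Handle("HsaCreateUserModeQueue", "HsaDestroyUserModeQueue");
             Handle kernel = new Handle("HsaFinalizeBrig", "HsaFreeKernelCode");
             Handle signal = new Handle("HsaCreateSignal", "HsaDestroySignal");
             Handle args = new Handle("HsaRegisterSystemMemory", "HsaDeregisterSystemMemory")) {
            log.add("HsaSubmitAql"); // the dispatch/wait work would happen here
        }
    }

    public static void main(String[] args) {
        runKernelOnce();
        System.out.println(log);
    }
}
```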
  
	
  
DEMO	
  

  
SUMMARY

• Aparapi is already an established framework for simplifying execution of Java on GPU devices
• HSA enabled Aparapi further simplifies GPU acceleration of Java applications
  - Aligns with Java 8 features to support 'lambda' expressions for compactness
  - Enables 'large unified' system memory for GPU acceleration
  - Eases programming by enabling direct access to Java objects on the heap
  - Enables fast offload of Java kernels through user-mode queues and AQL
• HSA enabled Aparapi lends itself to more interesting future possibilities
  - Simplified communication and workload balancing across both CPU and GPU
  - Exploit new computation patterns and recursion through kernel self-enqueue
  

  
QUESTIONS & ANSWERS?
  

  
DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a trademark of Apple Inc. HSA is a trademark of the Heterogeneous System Architecture Foundation. Other names are for informational purposes only and may be trademarks of their respective owners.
  
27	
   |	
  	
  	
  HSA	
  ENABLEMENT	
  	
  OF	
  APARAPI	
  	
  	
  |	
  NOVEMBER	
  2013|	
  

CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Java applications, by Gary Frost and Vignesh Ravi

Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Último (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Java applications, by Gary Frost and Vignesh Ravi

  • 1. HSA ENABLEMENT OF APARAPI: EASING THE DEVELOPER PATH TO APU/GPU ACCELERATED JAVA APPLICATIONS
    VIGNESH RAVI, SOFTWARE DEVELOPER, HSA TEAM, AMD
    GARY FROST, SOFTWARE FELLOW, AMD
  • 2. HSA ENABLEMENT OF APARAPI: AGENDA
    - Java GPU enablement via Aparapi
      - Why Java?
      - Aparapi: what is it and how is it used?
    - Introduction to HSA
    - How HSA simplifies Java GPU programming with Aparapi
      - Simpler programming model using lambda expressions
      - Removal of previous constraints thanks to SVM (Shared Virtual Memory)
    - The nuts and bolts of our current HSA enablement
      - HSAIL generation
      - Dispatch via HSA Runtime APIs
    - Summary
    - Q&A
    2 | HSA ENABLEMENT OF APARAPI | NOVEMBER 2013
  • 3. WHY JAVA?
    - Java by the numbers
      - 9 million developers
      - 1 billion Java downloads per year
      - 97% of enterprise desktops run Java
      - 100% of Blu-ray players ship with Java
      (Source: http://oracle.com.edgesuite.net/timeline/java/)
    - The Java 7 language and libraries already include concurrency features
      - Primitives (threads, locks, monitors, atomic ops)
      - Libraries (fork/join, thread pools, executors, futures)
    - The upcoming Java 8 includes stream processing enhancements
      - Support for 'lambda' expressions
      - Lambda-centric concurrent stream processing libraries/APIs (java.util.stream.*)
  • 4. INITIAL APARAPI PROJECT OVERVIEW (2011)
    - Open Source framework
    - Allows Java developers access to GPU compute
    - Aparapi Java API for expressing data parallel workloads
      - Overload the Aparapi Kernel base class's run() method:
          Kernel kernel = new Kernel(){
             @Override public void run(){
                int i = getGlobalId();
                square[i] = in[i]*in[i];
             }
          };
          kernel.execute(size);
    - Aparapi runtime capable of converting bytecode to OpenCL™
      - Execution on OpenCL™ 1.1+ capable devices (GPUs and APUs), or…
      - Execution via a thread pool if OpenCL™ is unavailable
    [Diagram: Java application → Aparapi converts bytecode to OpenCL™ → OpenCL™ compiler & runtime → GPU ISA on the GPU; the JVM runs CPU ISA on the CPU]
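The thread-pool fallback can be pictured in plain Java. The sketch below is not Aparapi's implementation. `MiniKernel` is a hypothetical stand-in for Aparapi's `Kernel` class. It runs `run()` once per global id on the common fork/join pool and keeps the current id in a thread-local.

```java
import java.util.stream.IntStream;

// Hypothetical stand-in for Aparapi's Kernel class (not the real API):
// execute(size) runs run() once per global id on the CPU, the way the
// thread-pool fallback conceptually does when OpenCL is unavailable.
abstract class MiniKernel {
    private final ThreadLocal<Integer> globalId = new ThreadLocal<>();

    protected int getGlobalId() {
        return globalId.get();
    }

    public abstract void run();

    public void execute(int size) {
        IntStream.range(0, size).parallel().forEach(id -> {
            globalId.set(id);   // each work item sees its own id
            run();
        });
    }
}

class SquareDemo {
    static int[] squares(int[] in) {
        final int[] square = new int[in.length];
        MiniKernel kernel = new MiniKernel() {
            @Override
            public void run() {
                int i = getGlobalId();
                square[i] = in[i] * in[i];
            }
        };
        kernel.execute(in.length);
        return square;
    }

    public static void main(String[] args) {
        int[] result = squares(new int[] {1, 2, 3, 4});
        System.out.println(java.util.Arrays.toString(result)); // [1, 4, 9, 16]
    }
}
```

Each work item writes only its own `square[i]`, so no locking is needed. This is the same property that makes the kernel safe to run on a GPU.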
  • 5. MEET HSA AND HSAIL
    - Heterogeneous System Architecture standardizes CPU/GPU functionality
      - Be ISA-agnostic for both CPUs and accelerators
      - Support high-level programming languages
      - Provide the ability to access pageable system memory from the GPU
      - Maintain cache coherency for system memory between CPU and GPU
    - Specifications and simulator from the HSA Foundation
      - The HSAIL portable ISA is "finalized" to a particular hardware ISA at runtime
      - Runtime specification for job launch and control
      - HSAIL™ simulator for development and testing before hardware availability
  • 6. APARAPI HSA ENABLEMENT (2013-2014)
    - Open Source project sponsored
    - Enhanced to support HSA and Java 8 lambda expressions:
          Device.hsa().forEach(size, i -> square[i] = in[i]*in[i]);
    - Allows developers to efficiently represent data parallel algorithms using new Java 8 lambda expressions
    - APIs have the same look & feel as the proposed Java 8 stream API features
    - No modifications to the JVM
      - We provide external JNI/Java libraries.
    [Diagram: Java application → Aparapi lambda-based API → Aparapi converts bytecode to HSAIL → HSA finalizer & runtime → GPU ISA on the GPU; the JVM runs CPU ISA on the CPU]
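The lambda form has the same shape as Java 8 streams, so its CPU semantics can be approximated with `IntStream`. In this sketch, `MiniDevice` and its `forEach` are hypothetical names that mirror `Device.hsa().forEach(size, lambda)`. They are not the Aparapi API, and the body always runs on the CPU.

```java
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

// Hypothetical CPU-only analogue of Device.hsa().forEach(size, lambda).
final class MiniDevice {
    private MiniDevice() {}

    static MiniDevice cpu() {
        return new MiniDevice();
    }

    // Invoke body once for each index in [0, size), possibly in parallel.
    void forEach(int size, IntConsumer body) {
        IntStream.range(0, size).parallel().forEach(body);
    }
}

class LambdaSquareDemo {
    static int[] squares(int[] in) {
        int[] square = new int[in.length];
        MiniDevice.cpu().forEach(in.length, i -> square[i] = in[i] * in[i]);
        return square;
    }

    public static void main(String[] args) {
        int[] result = squares(new int[] {3, 5, 7});
        System.out.println(java.util.Arrays.toString(result)); // [9, 25, 49]
    }
}
```

Compared with the anonymous `Kernel` subclass, the lambda captures `in` and `square` directly. This compactness is what the slide credits to Java 8 lambdas.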
  • 7. HSA AND LAMBDA ENABLED APARAPI EXECUTION EXAMPLE
    Decision flow for Device.hsa().forEach(size, i -> square[i] = in[i]*in[i]):
    1. Does the platform support HSA? If not, execute the kernel using a Java thread pool.
    2. Is this the first execution of this lambda instance?
       - Yes: can the bytecode be converted to HSAIL? If so, convert the bytecode to HSAIL and execute the HSAIL kernel on the GPU/APU; if not, execute the kernel using the Java thread pool.
       - No: do we have HSAIL for this lambda? If so, execute the HSAIL kernel on the GPU/APU; if not, execute the kernel using the Java thread pool.
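The decision flow can be written as a small dispatcher. Everything here is hypothetical: the `platformSupportsHsa` flag, the `BytecodeToHsail` converter, and the string standing in for generated HSAIL are all placeholders. A cache keyed by lambda identity models the question "do we have HSAIL for this lambda?". The sketch only reports which path would be taken.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the HSA-enabled Aparapi dispatch decision.
class DispatchDecider {
    enum Path { HSA_GPU, JAVA_THREAD_POOL }

    // Placeholder for the bytecode-to-HSAIL converter; empty means "cannot convert".
    interface BytecodeToHsail {
        Optional<String> convert(Object lambda);
    }

    private final boolean platformSupportsHsa;
    private final BytecodeToHsail converter;
    // Lambda instance -> HSAIL text, or null if conversion was tried and failed.
    private final Map<Object, String> hsailCache = new IdentityHashMap<>();

    DispatchDecider(boolean platformSupportsHsa, BytecodeToHsail converter) {
        this.platformSupportsHsa = platformSupportsHsa;
        this.converter = converter;
    }

    Path decide(Object lambda) {
        if (!platformSupportsHsa) {
            return Path.JAVA_THREAD_POOL;
        }
        if (!hsailCache.containsKey(lambda)) {
            // First execution of this lambda instance: attempt conversion once.
            hsailCache.put(lambda, converter.convert(lambda).orElse(null));
        }
        // Subsequent executions: do we have HSAIL for this lambda?
        return hsailCache.get(lambda) != null ? Path.HSA_GPU : Path.JAVA_THREAD_POOL;
    }

    public static void main(String[] args) {
        Runnable convertible = () -> {};
        Runnable notConvertible = () -> {};
        DispatchDecider d = new DispatchDecider(true,
            l -> l == convertible ? Optional.of("kernel &run(...)") : Optional.empty());
        System.out.println(d.decide(convertible));     // HSA_GPU
        System.out.println(d.decide(notConvertible));  // JAVA_THREAD_POOL
    }
}
```

The cache also remembers failed conversions. As a result, a lambda that cannot be converted falls back to the thread pool on every later call without retrying conversion.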
  • 8. SUMATRA PROJECT: NATIVE SUPPORT FOR GPU OFFLOAD ADDED TO JAVA
    - AMD/Oracle sponsored Open Source (OpenJDK) project
    - Targeted at OpenJDK Java 9 (2015)
    - Allow developers to efficiently represent data parallel algorithms in Java using the Stream API + lambda expressions
    - Sumatra is not pushing a new 'programming model'
    - Instead we 'repurpose' the Stream API + lambdas to enable both CPU and GPU computing
    - A Sumatra-enabled Java Virtual Machine™ will dispatch 'selected' constructs to HSA-enabled devices at runtime
    - Developers are already refactoring the JDK to use the stream & lambda APIs
      - So anyone using the existing JDK should see GPU acceleration without any code changes
    - Links:
      - http://openjdk.java.net/projects/sumatra
      - https://wikis.oracle.com/display/HotSpotInternals/Sumatra
      - http://mail.openjdk.java.net/pipermail/sumatra-dev
    [Diagram: Java application → Java JDK Stream + Lambda API → Java GRAAL JIT backend → HSAIL → HSA finalizer & runtime → GPU ISA on the GPU; the JVM runs CPU ISA on the CPU]
  • 9. HSA ENABLEMENT OF JAVA
    Java 7: OpenCL-enabled Aparapi
    - AMD-initiated Open Source project
    - APIs for data parallel algorithms
      - GPU-accelerate Java applications
      - No need to learn OpenCL
    - Active community captured mindshare
      - ~20 contributors
      - >7000 downloads
      - ~150 visits per day
    [Stack: Java application → Aparapi API → OpenCL™ → OpenCL™ compiler and runtime → GPU ISA on the GPU; JVM → CPU ISA on the CPU]

    Java 8: HSA-enabled Aparapi
    - Java 8 brings the Stream + Lambda API
      - A more natural way of expressing data parallel algorithms
      - Initially targeted at multi-core
    - Aparapi will:
      - Support Java 8 lambdas
      - Dispatch code to HSA-enabled devices at runtime via HSAIL
    [Stack: Java application → Aparapi + Lambda API → HSAIL™ → HSA finalizer & runtime → GPU ISA on the GPU; JVM → CPU ISA on the CPU]

    Java 9: HSA-enabled Java (Sumatra)
    - Adds native GPU compute support to the Java Virtual Machine (JVM)
    - Developer uses the JDK-provided Lambda + Stream API
    - JVM uses the GRAAL compiler to generate HSAIL
    - JVM decides at runtime to execute on either CPU or GPU depending on workload characteristics
    [Stack: Java application → Java JDK Stream + Lambda API → Java GRAAL JIT backend → HSAIL™ → HSA™ finalizer & runtime → GPU ISA on the GPU; JVM → CPU ISA on the CPU]

    We plan to provide HSA-enabled Aparapi (Java 8) as a bridge technology between OpenCL-based Aparapi (Java 7) and HSA-enabled Sumatra (Java 9).
  • 10. A CASE STUDY CENTERED ON NBODY
    - A Java developer implementing a sequential version of NBody would probably…
      - Create a class to represent each body:
          class Body{
             float x, y, z, m, vx, vy, vz;
             // Include method to update position and display
             void updateAndShow(Screen screen, Body[] bodies){
                for (Body other : bodies){
                   // accumulate forces between other and this
                }
                // update vx, vy, vz, x, y and z from accumulated data
                screen.paint(x, y, z);
             }
          }
    - Loop through each Body (in the array bodies[]) to update and display:
          for (Body b : bodies) b.updateAndShow(screen, bodies);
  • 11. WITHOUT HSA WE CAN'T (EFFICIENTLY) USE OBJECTS
    - In Java, allocated objects are scattered on the heap.
      - There is no way to allocate an array of objects in contiguous memory (as with C++).
      - We force the developer to resort to using parallel arrays of primitives (which are contiguous):
          float x[], y[], z[], m[], vx[], vy[], vz[];
      - The developer must also infer that x[n], y[n] and z[n] hold the state for bodies[n]:
          Kernel kernel = new Kernel(){
             @Override public void run(){
                int i = getGlobalId(0);
                for (int j = 0; j < x.length; j++){
                   // accumulate forces between (x,y,z)[j] and (x,y,z)[i]
                }
                // update vx[i], vy[i], vz[i], x[i], y[i] and z[i]
             }
          };
      - Then the kernel can be used to execute the code on the GPU:
          kernel.execute(x.length);
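Below is a runnable CPU sketch of the parallel-array layout. Body n's state lives at index n of each primitive array, and one parallel "work item" per body plays the role of `getGlobalId(0)`. The physics is an assumption, because the slide elides it: softened gravity with the constant taken as 1. Positions are updated in a second pass, so no work item reads a body that another work item has already moved.

```java
import java.util.stream.IntStream;

// Hypothetical structure-of-arrays N-body step mirroring the slide's layout.
class BodiesSoA {
    final float[] x, y, z, m, vx, vy, vz;

    BodiesSoA(int n) {
        x = new float[n]; y = new float[n]; z = new float[n]; m = new float[n];
        vx = new float[n]; vy = new float[n]; vz = new float[n];
    }

    void step(float dt) {
        final float eps = 0.01f;   // softening term (assumption) avoids dividing by zero
        int n = x.length;
        // Pass 1: one work item per body i accumulates forces and updates velocity.
        IntStream.range(0, n).parallel().forEach(i -> {
            float ax = 0, ay = 0, az = 0;
            for (int j = 0; j < n; j++) {
                float dx = x[j] - x[i], dy = y[j] - y[i], dz = z[j] - z[i];
                float distSq = dx * dx + dy * dy + dz * dz + eps;
                float invDist = (float) (1.0 / Math.sqrt(distSq));
                float s = m[j] * invDist * invDist * invDist;
                ax += dx * s; ay += dy * s; az += dz * s;
            }
            vx[i] += ax * dt; vy[i] += ay * dt; vz[i] += az * dt;
        });
        // Pass 2: move every body using its new velocity.
        IntStream.range(0, n).parallel().forEach(i -> {
            x[i] += vx[i] * dt; y[i] += vy[i] * dt; z[i] += vz[i] * dt;
        });
    }

    public static void main(String[] args) {
        BodiesSoA s = new BodiesSoA(2);
        s.x[0] = -1; s.x[1] = 1; s.m[0] = 1; s.m[1] = 1;
        s.step(0.01f);
        System.out.println(s.x[0] + " " + s.x[1]); // the two bodies move toward each other
    }
}
```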
  • 12. HSA ENABLED APARAPI (AND SUMATRA) ALLOWS USE OF OBJECTS
    - So we code our Body class exactly as we would for plain Java execution:
          class Body{
             float x, y, z, m, vx, vy, vz;
             // Include method to update position and display
             void updateAndShow(Screen screen, Body[] bodies){
                for (Body other : bodies){
                   // accumulate forces between other and this
                }
                // update vx, vy, vz, x, y and z from accumulated data
                screen.paint(x, y, z);
             }
          }
    - Then use the new Aparapi lambda-enabled API to coordinate dispatch to the GPU:
          Device.hsa().forEach(bodies, b -> {
             b.updateAndShow(screen, bodies);
          });
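For comparison, below is a runnable object-based sketch that uses a Java 8 parallel stream as the CPU analogue of `Device.hsa().forEach(bodies, ...)`. It has some assumptions and deviations from the slide. `Screen`/`paint` are dropped. The elided physics is filled in with softened gravity (constant 1). `updateAndShow` is split into `accumulate` and `move`, so that no body reads another body's position after it has moved within the same step.

```java
import java.util.Arrays;

// Hypothetical, runnable variant of the slide's Body class (display omitted).
class Body {
    float x, y, z, m, vx, vy, vz;

    Body(float x, float y, float z, float m) {
        this.x = x; this.y = y; this.z = z; this.m = m;
    }

    // Phase 1: accumulate forces from all bodies; update velocity only.
    void accumulate(Body[] bodies, float dt) {
        final float eps = 0.01f;   // softening term (assumption)
        float ax = 0, ay = 0, az = 0;
        for (Body other : bodies) {
            float dx = other.x - x, dy = other.y - y, dz = other.z - z;
            float distSq = dx * dx + dy * dy + dz * dz + eps;
            float invDist = (float) (1.0 / Math.sqrt(distSq));
            float s = other.m * invDist * invDist * invDist;
            ax += dx * s; ay += dy * s; az += dz * s;
        }
        vx += ax * dt; vy += ay * dt; vz += az * dt;
    }

    // Phase 2: move using the new velocity.
    void move(float dt) {
        x += vx * dt; y += vy * dt; z += vz * dt;
    }

    // CPU analogue of Device.hsa().forEach(bodies, b -> ...), run as two passes.
    static void step(Body[] bodies, float dt) {
        Arrays.stream(bodies).parallel().forEach(b -> b.accumulate(bodies, dt));
        Arrays.stream(bodies).parallel().forEach(b -> b.move(dt));
    }

    public static void main(String[] args) {
        Body[] bodies = { new Body(-1, 0, 0, 1), new Body(1, 0, 0, 1) };
        step(bodies, 0.01f);
        System.out.println(bodies[0].x + " " + bodies[1].x);
    }
}
```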
  • 13. OVERVIEW OF HSA ENABLED APARAPI
    - Development time: MyLambda.java → javac (compiler) → MyLambda.class
    - HSA-enabled Aparapi, at runtime:
      - Step 0: Generate HSAIL from bytecode
      - Step 1: Generate host HSA runtime calls
        - Step 1.1: Initialize HSA runtime, device, queue …
        - Step 1.2: Finalize HSAIL to generate GPU ISA
        - Step 1.3: Bind Java args to HSA args
        - Step 1.4: Dispatch the kernel
        - Step 1.5: Wait for completion
      - Repeat steps 1.3-1.5 for the next iteration of the same kernel
      - Repeat steps 0-1 for each new kernel
    [Diagram: the application (MyLambda.class) runs in the JVM as CPU ISA; Aparapi takes it as input, generates HSAIL and HSA runtime calls (initialize, finalize, bind args, dispatch), and the finalized GPU ISA is dispatched to the GPU]
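The repeat rules above come down to doing the per-kernel steps (0 to 1.2) once and caching the result, while steps 1.3 to 1.5 run on every iteration. The sketch below models only that bookkeeping. `StepTrace` and its step bodies are hypothetical placeholders that log step names. They do not make HSA runtime calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trace of the runtime step sequence: steps 0-1.2 run once per
// new kernel, steps 1.3-1.5 run on every iteration of a known kernel.
class StepTrace {
    final List<String> log = new ArrayList<>();
    private final Map<String, String> finalizedKernels = new HashMap<>();

    void run(String kernelName, Object[] javaArgs) {
        String isa = finalizedKernels.get(kernelName);
        if (isa == null) {                                   // new kernel
            log.add("0: generate HSAIL for " + kernelName);
            log.add("1.1: initialize runtime, device, queue");
            log.add("1.2: finalize " + kernelName);
            isa = "ISA(" + kernelName + ")";                 // placeholder for GPU ISA
            finalizedKernels.put(kernelName, isa);
        }
        log.add("1.3: bind " + javaArgs.length + " args");   // every iteration
        log.add("1.4: dispatch " + isa);
        log.add("1.5: wait for completion");
    }

    public static void main(String[] args) {
        StepTrace t = new StepTrace();
        t.run("square", new Object[2]);
        t.run("square", new Object[2]);
        t.log.forEach(System.out::println);
    }
}
```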
  • 14. HIGH LEVEL HSA FEATURES
    - Features currently being defined in the HSA Working Groups**
      - Unified addressing across all processors
      - Operation into pageable system memory
      - Full memory coherency
      - Platform atomics
      - User mode dispatch
        - Enables fast dispatch with no driver involvement
      - Architected queuing language
        - Flexible compute dispatch, easier GPU self-enqueue
      - High level language support for GPU compute processors
      - Preemption and context switching
    ** All features subject to change, pending completion and ratification of specifications in the HSA Working Groups
    © Copyright 2012 HSA Foundation. All Rights Reserved.
  • 15. HSA INTERMEDIATE LANGUAGE (HSAIL)**
    - HSAIL is a virtual ISA for parallel programs
      - Finalized to a vendor-specific ISA by a JIT compiler or "Finalizer"
      - ISA-independent by design for CPU & GPU
    - Explicitly parallel
      - Designed for data parallel programming
    - Support for exceptions, virtual functions, and other high level language features
    - Lower level than OpenCL™ SPIR
      - Fits naturally in the OpenCL™ compilation stack
    - Suitable to support additional high level languages and programming models:
      - Java, C++, OpenMP, etc.
    ** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
  • 16. HSAIL OVERVIEW**
    INSTRUCTION SET
    - Similar to assembly language for a RISC CPU
      - Load-store architecture
          ld_global_u64 $d0, [$d6 + 120];   // $d0 = load($d6 + 120)
          add_u64 $d1, $d2, 24;             // $d1 = $d2 + 24
    - 136 opcodes (Java™ bytecode has 200)
      - Floating point (single, double, half (f16))
      - Integer (32-bit, 64-bit)
      - Some packed operations
      - Branches
      - Function calls
      - Platform atomic operations: and, or, xor, exch, add, sub, inc, dec, max, min, cas
        - Synchronize host CPU and HSA component!
    - Text and binary formats ("BRIG")
    REGISTERS
    - Four classes of registers
      - C: 1-bit, control registers
      - S: 32-bit, single-precision FP or int
      - D: 64-bit, double-precision FP or long int
      - Q: 128-bit, packed data
    - Fixed number of registers:
      - 8 C
      - S, D, Q share a single pool of resources:
          S + 2*D + 4*Q <= 128
          Up to 128 S or 64 D or 32 Q (or a blend)
    - Register allocation is done in the high-level compiler
      - The finalizer doesn't have to perform expensive register allocation
    ** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
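The register-pool rule is simple arithmetic. A high-level compiler doing its own allocation could check a candidate blend as shown below. The limits are taken from the slide, which marks them as subject to change. `HsailRegisterBudget` is a hypothetical helper, not part of any HSA tool.

```java
// Hypothetical check of an S/D/Q register blend against the shared pool
// described on the slide (S + 2*D + 4*Q <= 128), plus the separate 8 C registers.
class HsailRegisterBudget {
    static final int POOL = 128;
    static final int MAX_C = 8;

    static boolean fits(int c, int s, int d, int q) {
        if (c < 0 || s < 0 || d < 0 || q < 0) {
            throw new IllegalArgumentException("register counts must be non-negative");
        }
        return c <= MAX_C && s + 2 * d + 4 * q <= POOL;
    }

    public static void main(String[] args) {
        System.out.println(fits(0, 128, 0, 0));  // true
        System.out.println(fits(0, 0, 64, 0));   // true
        System.out.println(fits(0, 0, 0, 32));   // true
        System.out.println(fits(2, 40, 20, 12)); // 40 + 40 + 48 = 128, true
        System.out.println(fits(0, 1, 0, 32));   // 1 + 128 = 129, false
    }
}
```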
  • 17. SEGMENTS AND MEMORY**
    - 7 segments of memory
      - global, readonly, group, spill, private, arg, kernarg
      - Memory instructions can (optionally) specify a segment
    - Global segment
      - Visible to all HSA agents (including host CPU)
          ld_global_u64 $d0, [$d6];
    - Group segment
      - Provides high-performance memory shared in the work-group
      - Group memory can be read and written by any work-item in the work-group
      - HSAIL provides sync operations to control visibility of group memory
          ld_group_u64 $d0, [$d6+24];
    - Spill, private, arg segments
      - Represent different regions of a per-work-item stack
      - Typically generated by the compiler, not specified by the programmer
          st_spill_f32 $s1, [$d6+4];
    - Kernarg segment
      - Programmer writes the kernarg segment to pass arguments to a kernel
          ld_kernarg_u64 $d6, [%_arg0];
    - Read-only segment
      - Remains constant during execution of the kernel
    - Flat addressing
      - Each segment is mapped into the virtual address space
      - Flat addresses can map to segments based on the virtual address
      - Instructions with no explicit segment use flat addressing
      - Very useful for high-level language support (i.e., classes, libraries)
      - Aligns well with the OpenCL 2.0 "generic" addressing feature
          ld_u64 $d0, [$d6+24];   // flat
    ** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
  • 18. EXAMPLE: BYTECODE TO HSAIL GENERATION
    Java source (compiled with javac -g squares.java):
          int in[], out[];
          Device.hsa().forEach(len, i -> out[i] = in[i] * in[i]);
    Bytecode of the lambda body:
           0: aload_0    // out[]
           1: iload_2    // i
           2: aload_1    // in[]
           3: iload_2
           4: iaload
           5: aload_1
           6: iload_2
           7: iaload
           8: imul
           9: iastore
          10: return
    Generated HSAIL:
          version 0:95: $full : $large;
          kernel &run(
             kernarg_u64 %_arg0,   // out[]
             kernarg_u64 %_arg1,   // in[]
             kernarg_s32 %_arg2
          ){
             ld_kernarg_u64 $d0, [%_arg0];
             ld_kernarg_u64 $d1, [%_arg1];
             ld_kernarg_s32 $s2, [%_arg2];
             workitemabsid_u32 $s2, 0;   // i
             mov_b64 $d3, $d0;
             mov_b32 $s4, $s2;
             mov_b64 $d5, $d1;
             mov_b32 $s6, $s2;
             cvt_u64_s32 $d6, $s6;
             mad_u64 $d6, $d6, 4, $d5;
             ld_global_s32 $s5, [$d6+24];
             mov_b64 $d6, $d1;
             mov_b32 $s7, $s2;
             cvt_u64_s32 $d7, $s7;
             mad_u64 $d7, $d7, 4, $d6;
             ld_global_s32 $s6, [$d7+24];
             mul_s32 $s5, $s5, $s6;
             cvt_u64_s32 $d4, $s4;
             mad_u64 $d4, $d4, 4, $d3;
             st_global_s32 $s5, [$d4+24];
             ret;
          };
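The generated HSAIL computes each element address as the array reference plus 4*i, plus a further 24. `cvt_u64_s32` widens i, `mad_u64` scales it by 4 and adds the array reference, and the load or store adds the constant 24. The sketch reproduces that arithmetic in Java. Treating 24 as the offset of element 0 inside the `int[]` object is our reading of the generated code for the JVM used here. It is an assumption: other JVM configurations may use a different object-header size.

```java
// Models the address arithmetic in the generated HSAIL for in[i]:
//   cvt_u64_s32 $d6, $s6         widen the int index i to 64 bits
//   mad_u64 $d6, $d6, 4, $d5     $d6 = i * 4 + array reference
//   ld_global_s32 $s5, [$d6+24]  load from $d6 + 24
class HsailArrayAddress {
    static final long INT_SIZE = 4;
    // Offset of element 0 from the start of the int[] object as seen in the
    // generated code; depends on the JVM's object-header layout (assumption).
    static final long ARRAY_BASE_OFFSET = 24;

    static long elementAddress(long arrayObjectAddress, int i) {
        long widened = (long) i;                                // cvt_u64_s32
        long scaled = widened * INT_SIZE + arrayObjectAddress;  // mad_u64
        return scaled + ARRAY_BASE_OFFSET;                      // [$d6+24]
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(elementAddress(0x1000L, 0))); // 1018
        System.out.println(Long.toHexString(elementAddress(0x1000L, 3))); // 1024
    }
}
```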
  • 19. APARAPI JNI CALL -> HSA RUNTIME API
    Device Discovery & Queue Creation APIs**
    - Discover HSA devices
      - Both count and device_list are out params
      - User can iterate over the HSA devices in the list
          HsaStatus HsaGetDevices(unsigned int *count,
                                  const HsaDevice **device_list);
    - User-mode queue creation
      - User can provide a pre-allocated buffer
      - If not, the API will allocate a buffer
      - queue is the user-mode queue
          HsaStatus HsaCreateUserModeQueue(const HsaDevice *device,
                                           void *buffer, size_t buffer_size,
                                           HsaQueuePriority queue_priority,
                                           HsaQueueFraction queue_fraction,
                                           HsaQueue **queue);
    ** All APIs subject to change, pending completion and ratification of specifications in the HSA Working Groups
  • 20. APARAPI JNI -> HSA RUNTIME API
    Finalize HSAIL to GPU ISA**
    - Translating HSAIL text to binary (BRIG)
      - BRIG is a binary container for several sections
        - Code
        - String
        - Directive
        - …
      - libHsail is an assembler/disassembler
        - This is a standalone compiler library
        - Not part of the runtime
          Status Assemble(const char* hsail_text, HsaBrig *brig);
    - Finalize BRIG to IHV-specific GPU ISA
      - Input: BRIG
      - Output: HsaKernelCode, which contains the ISA
          HsaStatus HsaFinalizeBrig(const HsaDevice *device,
                                    HsaBrig *brig,
                                    const char *kernel_name,
                                    const char *options,
                                    HsaKernelCode **kernel);
    ** All APIs subject to change, pending completion and ratification of specifications in the HSA Working Groups
  • 21. APARAPI JNI -> POPULATION OF AQL DISPATCH PACKET
    - AQL dispatch packet**
      - The header enables:
        - Different packet types
        - Specifying whether this packet should wait for all previous packets to complete
        - Controlling visibility of data and memory fences before and after dispatch
      - The body enables:
        - Specifying the problem fan-out using the launch-config fields
        - How much workgroup memory?
        - Location of the IHV-specific GPU ISA
        - Location where the kernel args can be found
        - A signal mechanism to wait on kernel completion
    - Only the kernel info and signal fields are opaque, so populating them requires runtime APIs
      - The other fields are open, so simple assignments suffice
          typedef struct HsaAqlDispatchPacket {
             // Header fields
             uint32_t format : 8;
             uint32_t barrier : 1;
             uint32_t acquire_fence_scope : 2;
             uint32_t release_fence_scope : 2;
             uint32_t invalidate_instruction_cache : 1;
             uint32_t invalidate_roi_image_cache : 1;
             uint32_t dimensions : 2;
             uint32_t reserved : 15;
             // Launch config
             uint16_t workgroup_size[3];
             uint16_t reserved2;
             uint32_t grid_size[3];
             // Kernel info
             uint32_t private_segment_size_bytes;
             uint32_t group_segment_size_bytes;
             uint64_t kernel_object_address;
             uint64_t kernel_arg_address;
             uint64_t reserved3;
             // Kernel synchronization
             uint64_t completion_signal;
          } HsaAqlDispatchPacket;
    ** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
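The header fields above add up to exactly 32 bits (8 + 1 + 2 + 2 + 1 + 1 + 2 + 15). When code outside C fills in such a header, it has to pack the fields itself. The sketch below does this in Java under a stated assumption: the bit-fields are allocated from the least significant bit upward in declaration order. This is typical for little-endian C compilers, but C leaves bit-field layout implementation-defined. `AqlHeader` is a hypothetical helper.

```java
// Hypothetical packer for the 32-bit AQL dispatch packet header word.
// Assumes bit-fields are laid out LSB-first in declaration order.
class AqlHeader {
    static int pack(int format, boolean barrier, int acquireFenceScope,
                    int releaseFenceScope, boolean invalidateInstructionCache,
                    boolean invalidateRoiImageCache, int dimensions) {
        check(format, 8, "format");
        check(acquireFenceScope, 2, "acquire_fence_scope");
        check(releaseFenceScope, 2, "release_fence_scope");
        check(dimensions, 2, "dimensions");
        int h = format;                                   // bits 0-7
        h |= (barrier ? 1 : 0) << 8;                      // bit 8
        h |= acquireFenceScope << 9;                      // bits 9-10
        h |= releaseFenceScope << 11;                     // bits 11-12
        h |= (invalidateInstructionCache ? 1 : 0) << 13;  // bit 13
        h |= (invalidateRoiImageCache ? 1 : 0) << 14;     // bit 14
        h |= dimensions << 15;                            // bits 15-16
        return h;                                         // bits 17-31 reserved, left 0
    }

    private static void check(int value, int bits, String name) {
        if (value < 0 || value >= (1 << bits)) {
            throw new IllegalArgumentException(name + " does not fit in " + bits + " bits");
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(pack(1, true, 3, 3, true, true, 3))); // 1ff01
    }
}
```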
  • 22. POPULATING KERNEL INFO AND SIGNAL USING HSA RT API**
    - Kernel info comes from the finalized kernel code:
          HsaStatus HsaFinalizeBrig(const HsaDevice *device,
                                    HsaBrig *brig,
                                    const char *kernel_name,
                                    const char *options,
                                    HsaKernelCode **kernel);

          typedef struct HsaKernelCode {
             …
             uint32_t workitem_private_segment_byte_size;
             uint32_t workgroup_group_segment_byte_size;
             uint64_t kernarg_segment_byte_size;
             …
          } HsaKernelCode;

          typedef struct HsaAqlDispatchPacket {
             …
             uint32_t private_segment_size_bytes;
             uint32_t group_segment_size_bytes;
             uint64_t kernel_object_address;
             uint64_t kernel_arg_address;
             …
             uint64_t completion_signal;
          } HsaAqlDispatchPacket;
    - Completion signal:
          HsaStatus HsaCreateSignal(HsaSignal *signal);
    - Kernel arguments: pack the Java args into a vector in JNI, then register the vector's data address:
          HsaStatus HsaRegisterSystemMemory(void *address, size_t size);
    ** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
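Packing the Java arguments into a vector can be sketched with a direct `ByteBuffer`. The layout follows the kernarg signature from the squares example: two 64-bit array references, then a 32-bit int. Several things here are assumptions: aligning each argument to its own size, little-endian byte order, and the numeric addresses. Pure Java cannot obtain real object addresses, which is why the slides do this step in JNI. `KernargPacker` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical kernarg packer: appends 64-bit and 32-bit arguments with
// natural alignment into a little-endian direct buffer.
class KernargPacker {
    private final ByteBuffer buf;

    KernargPacker(int capacity) {
        buf = ByteBuffer.allocateDirect(capacity).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Advance the write position to the next multiple of alignment.
    private void align(int alignment) {
        int pos = buf.position();
        buf.position((pos + alignment - 1) / alignment * alignment);
    }

    KernargPacker putU64(long value) {
        align(8);
        buf.putLong(value);
        return this;
    }

    KernargPacker putS32(int value) {
        align(4);
        buf.putInt(value);
        return this;
    }

    int size() {
        return buf.position();
    }

    // Read-only view of the packed bytes; duplicate() resets byte order, so set it again.
    ByteBuffer bytes() {
        ByteBuffer view = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        view.flip();
        return view;
    }

    public static void main(String[] args) {
        // out[] address, in[] address, length (all values illustrative)
        KernargPacker p = new KernargPacker(64).putU64(0x1000L).putU64(0x2000L).putS32(5);
        System.out.println(p.size()); // 20
    }
}
```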
  • 23. DISPATCH AND WAIT ON KERNEL COMPLETION
    - Dispatch
      - Submit the AQL packet into the HsaQueue
      - Thread-safe API
          HsaStatus HsaSubmitAql(HsaQueue *queue, HsaAqlDispatchPacket *aql_packet);
    - Wait on kernel completion
          bool is_done = false;
          while (!is_done) {
             status = HsaQuerySignal(signal, &is_done);
             assert(status == kHsaStatusSuccess);
          }
    - After completion, dispose of HSA resources
      - Release queue
      - Release signal
      - Release kernel object
      - Deregister kernel-args-related memory
          HsaStatus HsaDestroyUserModeQueue(HsaQueue *queue);
          HsaStatus HsaDestroySignal(HsaSignal signal);
          HsaStatus HsaFreeKernelCode(HsaKernelCode *kernel);
          HsaStatus HsaDeregisterSystemMemory(void *address);
    ** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
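The wait loop above polls `HsaQuerySignal` until the kernel reports completion. Below is a Java analogue. It assumes an `AtomicBoolean` stands in for the HSA signal and a worker thread stands in for the GPU. `SignalWait` is hypothetical, and the "kernel" is just a sum of squares.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical analogue of the completion-signal loop: a worker thread plays
// the role of the GPU and sets the "signal" when its work is finished.
class SignalWait {
    static long runAndWait(long[] data) {
        AtomicBoolean done = new AtomicBoolean(false);   // stands in for HsaSignal
        long[] result = new long[1];
        Thread device = new Thread(() -> {
            long sum = 0;
            for (long v : data) sum += v * v;
            result[0] = sum;
            done.set(true);                              // "kernel complete"
        });
        device.start();
        while (!done.get()) {                            // like polling HsaQuerySignal
            Thread.yield();
        }
        try {
            device.join();                               // release the "device" thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runAndWait(new long[] {1, 2, 3})); // 14
    }
}
```

The result is written before the atomic flag is set. As a result, a thread that observes `done == true` also sees the result, which is the same ordering role the release fence plays for an HSA completion signal.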
  • 24. DEMO
  • 25. SUMMARY
!  Aparapi is already an established framework for simplifying the execution of Java on GPU devices
!  HSA-enabled Aparapi further simplifies GPU acceleration of Java applications
‒  Aligns with Java 8 by supporting 'lambda' expressions for compact kernel code
‒  Makes large, unified system memory available for GPU acceleration
‒  Eases programming by enabling direct access to Java objects on the heap
‒  Enables fast offload of Java kernels through user-mode queues and AQL (Architected Queuing Language) packets
!  HSA-enabled Aparapi opens up interesting future possibilities
‒  Simplified communication and workload balancing across both CPU and GPU
‒  Exploiting new computation patterns and recursion through kernel self-enqueue
  • 26. QUESTIONS & ANSWERS?
  • 27. DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a trademark of Apple Inc. HSA is a trademark of the Heterogeneous System Architecture Foundation. Other names are for informational purposes only and may be trademarks of their respective owners.