SlideShare una empresa de Scribd logo
1 de 30
Descargar para leer sin conexión
SYSTAP LLC | © 2006-2015 All Rights Reserved
MLConf @ NYC: March 27, 2015
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graph Databases Grew at Over 500% in the Last Two Years
Popularity changes per category – March 2015
PopularityChanges
Graph Databases
SYSTAP LLC | © 2006-2015 All Rights Reserved
The Amount of Graph Data is Exploding
Billion+ Edges
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graphs are different. You need the
right paradigm and hardware to scale.
https://
datatake.files.wordpress.com/
Graph Cache Thrash
The CPU just waits for
graph data from main
memory...
TypeofCacheorMemory
Access Latency Per Clock Cycle
SYSTAP LLC | © 2006-2015 All Rights Reserved
SYSTAP Solves the Graph Scaling Problem Using GPUs
●  100s of Times Faster than CPU
main memory-based systems
●  Up to 40X Cheaper
●  10,000X Faster than disk-based
technologies
●  High Level API
SYSTAP LLC | © 2006-2015 All Rights Reserved
Uncovering influence links in molecular knowledge networks to streamline personalized medicine | Shin, Dmitriy et al.Journal of
Biomedical Informatics , Volume 52 , 394 - 405
Finding the Next Cure for Cancer is a
Billion+ Edge Graph Challenge
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graph is BIG and changing

(Trillion+ Edges)
SYSTAP LLC | © 2006-2015 All Rights Reserved
Graphs Enable People to Find Knowledge
A Bunch of Pages An Answer
http://BlazeGraph.com
http://MapGraph.io
Parallel&Breadth&First&Search&on&
GPU&Clusters&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
1
3/15/15
This work was (partially) funded by the DARPA XDATA program under AFRL Contract #FA8750-13-C-0002. This material is
based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. D14PC00029.
The authors would like to thank Dr. White, NVIDIA, and the MVAPICH group at Ohio State University for their support of this
work.
Z. Fu, H.K. Dasari, B. Bebee, M. Berzins, B. Thompson. Parallel Breadth First Search on GPU Clusters. IEEE Big Data.
Bethesda, MD. 2014.
h6p://mapgraph.io&
http://BlazeGraph.com
http://MapGraph.io
Many?Core&is&Your&Future&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
2
3/10/15
http://BlazeGraph.com
http://MapGraph.io
MapGraph:&Extreme&performance&
•  GTEPS&is&Billions&(10^9)&of&Traversed&Edges&per&Second.&
–  This&is&the&basic&measure&of&performance&for&graph&traversal.&
Configura)on* Cost* GTEPS* $/GTEPS*
4?Core&CPU& $4,000& 0.2& $5,333&
45Core*CPU*+**K20*GPU* $7,000* 3.0* $2,333*
XMT?2&(rumored&price)& $1,800,000& 10.0& $188,000&
64*GPUs*(32*nodes*with*2x*K20*GPUs*per*
node*and*InfiniBand*DDRx4*–*today)*
$500,000* 30.0* $16,666*
16*GPUs*(2*nodes*with*8x*Pascal*GPUs*per*
node*and*InfiniBand*DDRx4*–*Q1,*2016)*
$125,000* >30.0* <$4,166*
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
3
3/10/15
http://BlazeGraph.com
http://MapGraph.io
SYSTAP’s&MapGraph&APIs&Roadmap&
•  Vertex&Centric&API&
–  Same&performance&as&CUDA.&
•  Schema?flexible&data?model&
–  Property&Graph&/&RDF&
–  Reduce&import&hassles&
–  Opera_ons&over&merged&graphs&
•  Graph&pa6ern&matching&language&
•  DSL/Scala&=>&CUDA&code&genera_on&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
Ease&of&Use&
4
3/10/15
http://BlazeGraph.com
http://MapGraph.io
Breadth&First&Search&
•  Fundamental&building&block&for&graph&algorithms&
–  Including&SPARQL&(JOINs)&
•  In&level&synchronous&steps&
–  Label&visited&verDces&
•  For&this&graph&
–  IteraDon&0&
–  IteraDon&1&
–  IteraDon&2&
•  Hard&problem!&
–  Basis&for&Graph&500&benchmark&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
10
3/10/15
0&
1&
1&
2&
1&
1&
2&
2&
2&
2&
1&
3&
2&
3&
2&
2& 3&
2&
http://BlazeGraph.com
http://MapGraph.io
GPUs&–&A&Game&Changer&for&Graph&AnalyDcs&
0
500
1000
1500
2000
2500
3000
3500
1 10 100 1000 10000 100000
MillionTraversedEdgesperSecond
Average Traversal Depth
NVIDIA Tesla C2050
Multicore per socket
Sequential
Breadth-First Search on Graphs
10x Speedup on GPUs
3 GTEPS
•  Graphs&are&a&hard&problem&
•  Non?locality&
•  Data&dependent&parallelism&
•  Memory&bus&and&
communicaDon&bo6lenecks&
•  GPUs&deliver&effecDve&
parallelism&
•  10x+&memory&bandwidth&
•  Dynamic&parallelism&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
11
3/10/15
Graphic: Merrill, Garland, and Grimshaw, “GPU Sparse Graph
Traversal”, GPU Technology Conference, 2012
http://BlazeGraph.com
http://MapGraph.io
2D&Par__oning&(aka&Vertex&Cuts)&
•  p&x&p&compute&grid&
–  Edges&in&rows/cols&
–  Minimize&messages&
•  log(p)&(versus&p2)&
–  One&par__on&per&GPU&
•  Batch&parallel&opera_on&
–  Grid&row:&out?edges&
–  Grid&column:&in?edges&
•  Representa_ve&fron_ers&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
5
3/10/15
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
out-edges
in-edges
BFS& PR&Query&
•  Parallelism&–&work&must&be&distributed&
and&balanced.&
•  Memory&bandwidth&–&memory,&not&disk,&
is&the&bo6leneck&
http://BlazeGraph.com
http://MapGraph.io
1
10
100
1,000
10,000
100,000
1,000,000
10,000,000
0 1 2 3 4 5 6
FrontierSize
Iteration
Scale&25&Traversal&
•  Work&spans&mul_ple&orders&of&magnitude.&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
6
3/10/15
http://BlazeGraph.com
http://MapGraph.io
Distributed&BFS&Algorithm&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
7
3/10/15
1: procedure BFS(Root, Predecessor)
2: In0
ij LocalVertex(Root)
3: for t 0 do
4: Expand(Int
i, Outt
ij)
5: LocalFrontiert Count(Outt
ij)
6: GlobalFrontiert Reduce(LocalFrontiert)
7: if GlobalFrontiert > 0 then
8: Contract(Outt
ij, Outt
j, Int+1
i , Assignij)
9: UpdateLevels(Outt
j, t, level)
10: else
11: UpdatePreds(Assignij,Predsij, level)
12: break
13: end if
14: t++
15: end for
16: end procedure
Figure 3. Distributed Breadth First Search algorithm
1: procedure UPDATEPRED
2: for all v 2 Assignedij
3: Pred[v] 1
4: for i ColOff[v]
5: if level[v] ==
6: Pred[v]
7: end if
8: end for
9: end for
10: end procedure
Figure 7. Pre
for finding the Prefixt
ij, whic
performing Bitwise-OR Exclu
across each row j as shown
are several tree-based algori
Since the bitmap union opera
! Data parallel 1-hop expand on all GPUs
! Termination check
! Local frontier size
! Global frontier size
! Compute predecessors from local
In / Out levels (no communication)! Done
! Starting vertex
! Global Frontier contraction (“wave”)
! Record level of vertices in the In /
Out frontier
! Next iteration
http://BlazeGraph.com
http://MapGraph.io
Distributed&BFS&Algorithm&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
8
3/10/15
1: procedure BFS(Root, Predecessor)
2: In0
ij LocalVertex(Root)
3: for t 0 do
4: Expand(Int
i, Outt
ij)
5: LocalFrontiert Count(Outt
ij)
6: GlobalFrontiert Reduce(LocalFrontiert)
7: if GlobalFrontiert > 0 then
8: Contract(Outt
ij, Outt
j, Int+1
i , Assignij)
9: UpdateLevels(Outt
j, t, level)
10: else
11: UpdatePreds(Assignij,Predsij, level)
12: break
13: end if
14: t++
15: end for
16: end procedure
Figure 3. Distributed Breadth First Search algorithm
1: procedure UPDATEPRED
2: for all v 2 Assignedij
3: Pred[v] 1
4: for i ColOff[v]
5: if level[v] ==
6: Pred[v]
7: end if
8: end for
9: end for
10: end procedure
Figure 7. Pre
for finding the Prefixt
ij, whic
performing Bitwise-OR Exclu
across each row j as shown
are several tree-based algori
Since the bitmap union opera
•  Key&differences&
•  log(p)&parallel&scan&(vs&sequen_al&wave)&
•  GPU?local&computa_on&of&predecessors&
•  1&par__on&per&GPU&
•  GPUDirect&(vs&RDMA)&
! Data parallel 1-hop expand on all GPUs
! Termination check
! Local frontier size
! Global frontier size
! Compute predecessors from local
In / Out levels (no communication)! Done
! Starting vertex
! Global Frontier contraction (“wave”)
! Record level of vertices in the In /
Out frontier
! Next iteration
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
9
3/10/15
10: Outij convert(Lout)
11: end procedure
Figure 4. Expand algorithm
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
Figure 5. Global Frontier contraction and communication algorithm
the row. This is done to update t
vertices so that they do not prod
those vertices in the future iteratio
updating the levels of the vertice
Int+1
j . Since both Outt
j and Int+1
i
in the diagonal (i = j), in line 9,
the diagonal GPUs and then broa
Ci using MPI_Broadcast in line
setup along every row and colum
facilitate these parallel scan and b
During each BFS iteration, we
of the algorithm by checking if
zero as in line 6 of Figure 3. We
frontier using a MPI_Allreduce op
edge expansion and before the g
phase. However, an unexplored o
the termination check with the g
and communication phase in orde
! Distributed prefix sum. Prefix is vertices
discovered by your left neighbors (log p)
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
10
3/10/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
!&Distributed&prefix&sum.&Prefix&is&ver_ces&
discovered&by&your&lel&neighbors&(log&p)&
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights Reserved
11
3/10/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
•  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).&
•  This&improves&the&overall&scaling&efficiency&by&30%.&
!*Distributed*prefix*sum.*Prefix*is*ver)ces*
discovered*by*your*leZ*neighbors*(log*p)*
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights Reserved
12
3/15/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
•  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).&
•  This&improves&the&overall&scaling&efficiency&by&30%.&
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
!*Distributed*prefix*sum.*Prefix*is*ver)ces*
discovered*by*your*leZ*neighbors*(log*p)*
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights Reserved
13
3/15/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
•  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).&
•  This&improves&the&overall&scaling&efficiency&by&30%.&
! Vertices first discovered by this GPU.
! Right most column has global frontier
for the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
!&Distributed&prefix&sum.&Prefix&is&ver_ces&
discovered&by&your&lel&neighbors&(log&p)&
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights Reserved
14
3/15/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
! Distributed prefix sum. Prefix is vertices
discovered by your left neighbors (log p)
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
http://BlazeGraph.com
http://MapGraph.io
Global&Fron_er&Contrac_on&and&Communica_on&
SYSTAP™, LLC
© 2006-2014 All Rights Reserved
15
3/15/15
1: procedure CONTRACT(Outt
ij, Outt
j, Int+1
i , Assignij)
2: Prefixt
ij ExclusiveScanj(Outt
ij)
3: Assignedij Assignedij [ (Outt
ij Prefixt
ij)
4: if i = p then
5: Outt
j Outt
ij [ Prefixt
ij
6: end if
7: Broadcast(Outt
j, p , ROW)
8: if i = j then
9: Int+1
i Outt
j
10: end if
11: Broadcast(Int+1
i , i, COL)
12: end procedure
! Distributed prefix sum. Prefix is vertices
discovered by your left neighbors (log p)
! Vertices first discovered by this GPU.
! Right most column has global frontier for
the row
! Broadcast frontier over row.
! Broadcast frontier over column.
G11
G22
G33
G44
C1
C2
C3
C4
R1
In2
t
Out1
t
Out2
t
Out3
t
Out4
t
In1
t
In3
t
In4
t
R2
R3
R4
Source&Ver_ces&
Target&Ver_ces&
•  All&GPUs&now&have&the&new&fron_er.&
http://BlazeGraph.com
http://MapGraph.io
Weak&Scaling&
•  Scaling&the&problem&size&with&
more&GPUs&
–  Fixed&problem&size&per&GPU.&
•  Maximum&scale&27&(4.3B&edges)&
–  64&K20&GPUs&=>&.147s&=>&29&GTEPS&
–  64&K40&GPUs&=>&.135s&=>&32&GTEPS&
•  K40&has&faster&memory&bus.&
0
5
10
15
20
25
30
35
1 4 16 64
GTEPS
#GPUs
GPUs& Scale& Ver)ces& Edges& Time*(s)& GTEPS&
1& 21& 2,097,152& 67,108,864& 0.0254* 2.5&
4& 23& 8,388,608& 268,435,456& 0.0429* 6.3&
16& 25& 33,554,432& 1,073,741,824& 0.0715* 15.0&
64& 27& 134,217,728& 4,294,967,296& 0.1478* 29.1&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
16
3/10/15
Scaling the problem size with
more GPUs.
http://BlazeGraph.com
http://MapGraph.io
Central&Itera_on&Costs&(Weak&Scaling)&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
17
3/10/15
0.0000
0.0020
0.0040
0.0060
0.0080
0.0100
0.0120
4 16 64
Time(Seconds)
#GPUs
Computation
Communication
Termination
•  Communica_on&
costs&are&not&
constant.&
•  2D&design&implies&
cost&grows&as&
2*log(2p)/log(p)*
•  How&to&scale?&
–  Overlapping&
–  Compression&
•  Graph&
•  Message&
–  Hybrid&par__oning&
–  Heterogeneous&
compu_ng&
http://BlazeGraph.com
http://MapGraph.io
Costs&During&BFS&Traversal&
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
18
3/10/15
•  Chart&shows&the&different&costs&for&each&GPU&in&each&itera_on&(64&GPUs).&
•  Wave&_me&is&essen_ally&constant,&as&expected.&
•  Compute&_me&peaks&during&the&central&itera_ons.&
•  Costs&are&reasonably&well&balanced&across&all&GPUs&aler&the&2nd&itera_on.&
http://BlazeGraph.com
http://MapGraph.io
Strong&Scaling&
•  Speedup&on&a&constant&problem&size&with&more&GPUs&
•  Problem&scale&25&
–  2^25&ver_ces&(33,554,432)&
–  2^26&directed&edges&(1,073,741,824)&
•  Strong&scaling&efficiency&of&48%&
–  Versus&44%&for&BG/Q&
GPUs* GTEPS* Time*(s)*
16* 15.2* 0.071*
25* 18.2* 0.059*
36* 20.5* 0.053*
49& 21.8& 0.049*
64* 22.7* 0.047*
SYSTAP™, LLC
© 2006-2015 All Rights Reserved
19
3/10/15
Speedup on a constant
problem size with more GPUs.
http://BlazeGraph.com
http://MapGraph.io
MapGraph&Beta&Customer?&
&
Contact&
Bradley&Bebee&
beebs@systap.com&
20
3/10/15
SYSTAP™, LLC
© 2006-2015 All Rights Reserved

Más contenido relacionado

La actualidad más candente

Computer Graphics & Visualization - 06
Computer Graphics & Visualization - 06Computer Graphics & Visualization - 06
Computer Graphics & Visualization - 06Pankaj Debbarma
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」Preferred Networks
 
Deep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for FinanceDeep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for FinanceBen Ball
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...Preferred Networks
 
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...Ganesan Narayanasamy
 
Compilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVMCompilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVMLinaro
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
 
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...Shinya Takamaeda-Y
 
Magical float repr
Magical float reprMagical float repr
Magical float reprdickinsm
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Tyrone Systems
 
Code GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowCode GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowMarina Kolpakova
 
End of Sprint 5
End of Sprint 5End of Sprint 5
End of Sprint 5dm_work
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Tyrone Systems
 

La actualidad más candente (19)

Parallel computation
Parallel computationParallel computation
Parallel computation
 
Computer Graphics & Visualization - 06
Computer Graphics & Visualization - 06Computer Graphics & Visualization - 06
Computer Graphics & Visualization - 06
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
Deep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for FinanceDeep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for Finance
 
Naist2015 dec ver1
Naist2015 dec ver1Naist2015 dec ver1
Naist2015 dec ver1
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
 
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
 
Compilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVMCompilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVM
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
MaPU-HPCA2016
MaPU-HPCA2016MaPU-HPCA2016
MaPU-HPCA2016
 
Chainer v3
Chainer v3Chainer v3
Chainer v3
 
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
Veriloggen.Thread & Stream: 最高性能FPGAコンピューティングを 目指したミックスドパラダイム型高位合成 (FPGAX 201...
 
Magical float repr
Magical float reprMagical float repr
Magical float repr
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1
 
Code GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowCode GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flow
 
Graph500
Graph500Graph500
Graph500
 
End of Sprint 5
End of Sprint 5End of Sprint 5
End of Sprint 5
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
 

Similar a Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC

Crude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimizationCrude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimizationBrenno Menezes
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesJason Riedy
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIJim Dowling
 
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsToyotaro Suzumura
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph✔ Eric David Benari, PMP
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...AMD Developer Central
 
GPUs for GEC Competition @ GECCO-2013
GPUs for GEC Competition @ GECCO-2013GPUs for GEC Competition @ GECCO-2013
GPUs for GEC Competition @ GECCO-2013Daniele Loiacono
 
GPU Accelerated Machine Learning
GPU Accelerated Machine LearningGPU Accelerated Machine Learning
GPU Accelerated Machine LearningSri Ambati
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESSubhajit Sahu
 
Open power topics20191023
Open power topics20191023Open power topics20191023
Open power topics20191023Yutaka Kawai
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)Yu Liu
 
GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processinghuguk
 
IRJET- To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
IRJET-  	  To Design 16 bit Synchronous Microprocessor using VHDL on FPGAIRJET-  	  To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
IRJET- To Design 16 bit Synchronous Microprocessor using VHDL on FPGAIRJET Journal
 
Comprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and IstioComprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and IstioFred Moyer
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsPetr Novotný
 
UC2010_BRS1280_Eastman_Chemical_Johnston
UC2010_BRS1280_Eastman_Chemical_JohnstonUC2010_BRS1280_Eastman_Chemical_Johnston
UC2010_BRS1280_Eastman_Chemical_JohnstonH Eddie Newton
 
String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?Jeremy Schneider
 

Similar a Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC (20)

6. Implementation
6. Implementation6. Implementation
6. Implementation
 
Crude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimizationCrude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimization
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AI
 
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
 
GPUs for GEC Competition @ GECCO-2013
GPUs for GEC Competition @ GECCO-2013GPUs for GEC Competition @ GECCO-2013
GPUs for GEC Competition @ GECCO-2013
 
GPU Accelerated Machine Learning
GPU Accelerated Machine LearningGPU Accelerated Machine Learning
GPU Accelerated Machine Learning
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
 
Open power topics20191023
Open power topics20191023Open power topics20191023
Open power topics20191023
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)
 
GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processing
 
IRJET- To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
IRJET-  	  To Design 16 bit Synchronous Microprocessor using VHDL on FPGAIRJET-  	  To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
IRJET- To Design 16 bit Synchronous Microprocessor using VHDL on FPGA
 
Comprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and IstioComprehensive Container Based Service Monitoring with Kubernetes and Istio
Comprehensive Container Based Service Monitoring with Kubernetes and Istio
 
I017425763
I017425763I017425763
I017425763
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
UC2010_BRS1280_Eastman_Chemical_Johnston
UC2010_BRS1280_Eastman_Chemical_JohnstonUC2010_BRS1280_Eastman_Chemical_Johnston
UC2010_BRS1280_Eastman_Chemical_Johnston
 
String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?
 

Más de MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Más de MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Último

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 

Último (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 

Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC

  • 1. SYSTAP LLC | © 2006-2015 All Rights Reserved MLConf @ NYC: March 27, 2015
  • 2. SYSTAP LLC | © 2006-2015 All Rights Reserved Graph Databases Grew at Over 500% in the Last Two Years Popularity changes per category – March 2015 PopularityChanges Graph Databases
  • 3. SYSTAP LLC | © 2006-2015 All Rights Reserved The Amount of Graph Data is Exploding Billion+ Edges
  • 4. SYSTAP LLC | © 2006-2015 All Rights Reserved Graphs are different. You need the right paradigm and hardware to scale. https:// datatake.files.wordpress.com/ Graph Cache Thrash The CPU just waits for graph data from main memory... TypeofCacheorMemory Access Latency Per Clock Cycle
  • 5. SYSTAP LLC | © 2006-2015 All Rights Reserved SYSTAP Solves the Graph Scaling Problem Using GPUs ●  100s of Times Faster than CPU main memory-based systems ●  Up to 40X Cheaper ●  10,000X Faster than disk-based technologies ●  High Level API
  • 6. SYSTAP LLC | © 2006-2015 All Rights Reserved Uncovering influence links in molecular knowledge networks to streamline personalized medicine | Shin, Dmitriy et al.Journal of Biomedical Informatics , Volume 52 , 394 - 405 Finding the Next Cure for Cancer is a Billion+ Edge Graph Challenge
  • 7. SYSTAP LLC | © 2006-2015 All Rights Reserved Graph is BIG and changing
 (Trillion+ Edges)
  • 8. SYSTAP LLC | © 2006-2015 All Rights Reserved Graphs Enable People to Find Knowledge A Bunch of Pages An Answer
  • 9. http://BlazeGraph.com http://MapGraph.io Parallel&Breadth&First&Search&on& GPU&Clusters& SYSTAP™, LLC © 2006-2015 All Rights Reserved 1 3/15/15 This work was (partially) funded by the DARPA XDATA program under AFRL Contract #FA8750-13-C-0002. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. D14PC00029. The authors would like to thank Dr. White, NVIDIA, and the MVAPICH group at Ohio State University for their support of this work. Z. Fu, H.K. Dasari, B. Bebee, M. Berzins, B. Thompson. Parallel Breadth First Search on GPU Clusters. IEEE Big Data. Bethesda, MD. 2014. h6p://mapgraph.io&
  • 11. http://BlazeGraph.com http://MapGraph.io MapGraph:&Extreme&performance& •  GTEPS&is&Billions&(10^9)&of&Traversed&Edges&per&Second.& –  This&is&the&basic&measure&of&performance&for&graph&traversal.& Configura)on* Cost* GTEPS* $/GTEPS* 4?Core&CPU& $4,000& 0.2& $5,333& 45Core*CPU*+**K20*GPU* $7,000* 3.0* $2,333* XMT?2&(rumored&price)& $1,800,000& 10.0& $188,000& 64*GPUs*(32*nodes*with*2x*K20*GPUs*per* node*and*InfiniBand*DDRx4*–*today)* $500,000* 30.0* $16,666* 16*GPUs*(2*nodes*with*8x*Pascal*GPUs*per* node*and*InfiniBand*DDRx4*–*Q1,*2016)* $125,000* >30.0* <$4,166* SYSTAP™, LLC © 2006-2015 All Rights Reserved 3 3/10/15
  • 12. http://BlazeGraph.com http://MapGraph.io SYSTAP’s&MapGraph&APIs&Roadmap& •  Vertex&Centric&API& –  Same&performance&as&CUDA.& •  Schema?flexible&data?model& –  Property&Graph&/&RDF& –  Reduce&import&hassles& –  Opera_ons&over&merged&graphs& •  Graph&pa6ern&matching&language& •  DSL/Scala&=>&CUDA&code&genera_on& SYSTAP™, LLC © 2006-2015 All Rights Reserved Ease&of&Use& 4 3/10/15
  • 13. http://BlazeGraph.com http://MapGraph.io Breadth&First&Search& •  Fundamental&building&block&for&graph&algorithms& –  Including&SPARQL&(JOINs)& •  In&level&synchronous&steps& –  Label&visited&verDces& •  For&this&graph& –  IteraDon&0& –  IteraDon&1& –  IteraDon&2& •  Hard&problem!& –  Basis&for&Graph&500&benchmark& SYSTAP™, LLC © 2006-2015 All Rights Reserved 10 3/10/15 0& 1& 1& 2& 1& 1& 2& 2& 2& 2& 1& 3& 2& 3& 2& 2& 3& 2&
  • 14. http://BlazeGraph.com http://MapGraph.io GPUs&–&A&Game&Changer&for&Graph&AnalyDcs& 0 500 1000 1500 2000 2500 3000 3500 1 10 100 1000 10000 100000 MillionTraversedEdgesperSecond Average Traversal Depth NVIDIA Tesla C2050 Multicore per socket Sequential Breadth-First Search on Graphs 10x Speedup on GPUs 3 GTEPS •  Graphs&are&a&hard&problem& •  Non?locality& •  Data&dependent&parallelism& •  Memory&bus&and& communicaDon&bo6lenecks& •  GPUs&deliver&effecDve& parallelism& •  10x+&memory&bandwidth& •  Dynamic&parallelism& SYSTAP™, LLC © 2006-2015 All Rights Reserved 11 3/10/15 Graphic: Merrill, Garland, and Grimshaw, “GPU Sparse Graph Traversal”, GPU Technology Conference, 2012
  • 15. http://BlazeGraph.com http://MapGraph.io 2D&Par__oning&(aka&Vertex&Cuts)& •  p&x&p&compute&grid& –  Edges&in&rows/cols& –  Minimize&messages& •  log(p)&(versus&p2)& –  One&par__on&per&GPU& •  Batch&parallel&opera_on& –  Grid&row:&out?edges& –  Grid&column:&in?edges& •  Representa_ve&fron_ers& SYSTAP™, LLC © 2006-2015 All Rights Reserved 5 3/10/15 G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& out-edges in-edges BFS& PR&Query& •  Parallelism&–&work&must&be&distributed& and&balanced.& •  Memory&bandwidth&–&memory,&not&disk,& is&the&bo6leneck&
  • 16. http://BlazeGraph.com http://MapGraph.io 1 10 100 1,000 10,000 100,000 1,000,000 10,000,000 0 1 2 3 4 5 6 FrontierSize Iteration Scale&25&Traversal& •  Work&spans&mul_ple&orders&of&magnitude.& SYSTAP™, LLC © 2006-2015 All Rights Reserved 6 3/10/15
  • 17. http://BlazeGraph.com http://MapGraph.io Distributed&BFS&Algorithm& SYSTAP™, LLC © 2006-2015 All Rights Reserved 7 3/10/15 1: procedure BFS(Root, Predecessor) 2: In0 ij LocalVertex(Root) 3: for t 0 do 4: Expand(Int i, Outt ij) 5: LocalFrontiert Count(Outt ij) 6: GlobalFrontiert Reduce(LocalFrontiert) 7: if GlobalFrontiert > 0 then 8: Contract(Outt ij, Outt j, Int+1 i , Assignij) 9: UpdateLevels(Outt j, t, level) 10: else 11: UpdatePreds(Assignij,Predsij, level) 12: break 13: end if 14: t++ 15: end for 16: end procedure Figure 3. Distributed Breadth First Search algorithm 1: procedure UPDATEPRED 2: for all v 2 Assignedij 3: Pred[v] 1 4: for i ColOff[v] 5: if level[v] == 6: Pred[v] 7: end if 8: end for 9: end for 10: end procedure Figure 7. Pre for finding the Prefixt ij, whic performing Bitwise-OR Exclu across each row j as shown are several tree-based algori Since the bitmap union opera ! Data parallel 1-hop expand on all GPUs ! Termination check ! Local frontier size ! Global frontier size ! Compute predecessors from local In / Out levels (no communication)! Done ! Starting vertex ! Global Frontier contraction (“wave”) ! Record level of vertices in the In / Out frontier ! Next iteration
  • 18. http://BlazeGraph.com http://MapGraph.io Distributed&BFS&Algorithm& SYSTAP™, LLC © 2006-2015 All Rights Reserved 8 3/10/15 1: procedure BFS(Root, Predecessor) 2: In0 ij LocalVertex(Root) 3: for t 0 do 4: Expand(Int i, Outt ij) 5: LocalFrontiert Count(Outt ij) 6: GlobalFrontiert Reduce(LocalFrontiert) 7: if GlobalFrontiert > 0 then 8: Contract(Outt ij, Outt j, Int+1 i , Assignij) 9: UpdateLevels(Outt j, t, level) 10: else 11: UpdatePreds(Assignij,Predsij, level) 12: break 13: end if 14: t++ 15: end for 16: end procedure Figure 3. Distributed Breadth First Search algorithm 1: procedure UPDATEPRED 2: for all v 2 Assignedij 3: Pred[v] 1 4: for i ColOff[v] 5: if level[v] == 6: Pred[v] 7: end if 8: end for 9: end for 10: end procedure Figure 7. Pre for finding the Prefixt ij, whic performing Bitwise-OR Exclu across each row j as shown are several tree-based algori Since the bitmap union opera •  Key&differences& •  log(p)&parallel&scan&(vs&sequen_al&wave)& •  GPU?local&computa_on&of&predecessors& •  1&par__on&per&GPU& •  GPUDirect&(vs&RDMA)& ! Data parallel 1-hop expand on all GPUs ! Termination check ! Local frontier size ! Global frontier size ! Compute predecessors from local In / Out levels (no communication)! Done ! Starting vertex ! Global Frontier contraction (“wave”) ! Record level of vertices in the In / Out frontier ! Next iteration
  • 19. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2015 All Rights Reserved 9 3/10/15 10: Outij convert(Lout) 11: end procedure Figure 4. Expand algorithm 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure Figure 5. Global Frontier contraction and communication algorithm the row. This is done to update t vertices so that they do not prod those vertices in the future iteratio updating the levels of the vertice Int+1 j . Since both Outt j and Int+1 i in the diagonal (i = j), in line 9, the diagonal GPUs and then broa Ci using MPI_Broadcast in line setup along every row and colum facilitate these parallel scan and b During each BFS iteration, we of the algorithm by checking if zero as in line 6 of Figure 3. We frontier using a MPI_Allreduce op edge expansion and before the g phase. However, an unexplored o the termination check with the g and communication phase in orde ! Distributed prefix sum. Prefix is vertices discovered by your left neighbors (log p) ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column.
  • 20. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2015 All Rights Reserved 10 3/10/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure !&Distributed&prefix&sum.&Prefix&is&ver_ces& discovered&by&your&lel&neighbors&(log&p)& ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces&
  • 21. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 11 3/10/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure •  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).& •  This&improves&the&overall&scaling&efficiency&by&30%.& !*Distributed*prefix*sum.*Prefix*is*ver)ces* discovered*by*your*leZ*neighbors*(log*p)* ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces&
  • 22. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 12 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure •  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).& •  This&improves&the&overall&scaling&efficiency&by&30%.& ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& !*Distributed*prefix*sum.*Prefix*is*ver)ces* discovered*by*your*leZ*neighbors*(log*p)*
  • 23. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 13 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure •  We&use&a&parallel&scan&that&minimizes&communica_on&steps&(vs&work).& •  This&improves&the&overall&scaling&efficiency&by&30%.& ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& !&Distributed&prefix&sum.&Prefix&is&ver_ces& discovered&by&your&lel&neighbors&(log&p)&
  • 24. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 14 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure ! Distributed prefix sum. Prefix is vertices discovered by your left neighbors (log p) ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces&
  • 25. http://BlazeGraph.com http://MapGraph.io Global&Fron_er&Contrac_on&and&Communica_on& SYSTAP™, LLC © 2006-2014 All Rights Reserved 15 3/15/15 1: procedure CONTRACT(Outt ij, Outt j, Int+1 i , Assignij) 2: Prefixt ij ExclusiveScanj(Outt ij) 3: Assignedij Assignedij [ (Outt ij Prefixt ij) 4: if i = p then 5: Outt j Outt ij [ Prefixt ij 6: end if 7: Broadcast(Outt j, p , ROW) 8: if i = j then 9: Int+1 i Outt j 10: end if 11: Broadcast(Int+1 i , i, COL) 12: end procedure ! Distributed prefix sum. Prefix is vertices discovered by your left neighbors (log p) ! Vertices first discovered by this GPU. ! Right most column has global frontier for the row ! Broadcast frontier over row. ! Broadcast frontier over column. G11 G22 G33 G44 C1 C2 C3 C4 R1 In2 t Out1 t Out2 t Out3 t Out4 t In1 t In3 t In4 t R2 R3 R4 Source&Ver_ces& Target&Ver_ces& •  All&GPUs&now&have&the&new&fron_er.&
  • 26. http://BlazeGraph.com http://MapGraph.io Weak&Scaling& •  Scaling&the&problem&size&with& more&GPUs& –  Fixed&problem&size&per&GPU.& •  Maximum&scale&27&(4.3B&edges)& –  64&K20&GPUs&=>&.147s&=>&29&GTEPS& –  64&K40&GPUs&=>&.135s&=>&32&GTEPS& •  K40&has&faster&memory&bus.& 0 5 10 15 20 25 30 35 1 4 16 64 GTEPS #GPUs GPUs& Scale& Ver)ces& Edges& Time*(s)& GTEPS& 1& 21& 2,097,152& 67,108,864& 0.0254* 2.5& 4& 23& 8,388,608& 268,435,456& 0.0429* 6.3& 16& 25& 33,554,432& 1,073,741,824& 0.0715* 15.0& 64& 27& 134,217,728& 4,294,967,296& 0.1478* 29.1& SYSTAP™, LLC © 2006-2015 All Rights Reserved 16 3/10/15 Scaling the problem size with more GPUs.
  • 27. http://BlazeGraph.com http://MapGraph.io Central&Itera_on&Costs&(Weak&Scaling)& SYSTAP™, LLC © 2006-2015 All Rights Reserved 17 3/10/15 0.0000 0.0020 0.0040 0.0060 0.0080 0.0100 0.0120 4 16 64 Time(Seconds) #GPUs Computation Communication Termination •  Communica_on& costs&are&not& constant.& •  2D&design&implies& cost&grows&as& 2*log(2p)/log(p)* •  How&to&scale?& –  Overlapping& –  Compression& •  Graph& •  Message& –  Hybrid&par__oning& –  Heterogeneous& compu_ng&
  • 28. http://BlazeGraph.com http://MapGraph.io Costs&During&BFS&Traversal& SYSTAP™, LLC © 2006-2015 All Rights Reserved 18 3/10/15 •  Chart&shows&the&different&costs&for&each&GPU&in&each&itera_on&(64&GPUs).& •  Wave&_me&is&essen_ally&constant,&as&expected.& •  Compute&_me&peaks&during&the&central&itera_ons.& •  Costs&are&reasonably&well&balanced&across&all&GPUs&aler&the&2nd&itera_on.&
  • 29. http://BlazeGraph.com http://MapGraph.io Strong&Scaling& •  Speedup&on&a&constant&problem&size&with&more&GPUs& •  Problem&scale&25& –  2^25&ver_ces&(33,554,432)& –  2^26&directed&edges&(1,073,741,824)& •  Strong&scaling&efficiency&of&48%& –  Versus&44%&for&BG/Q& GPUs* GTEPS* Time*(s)* 16* 15.2* 0.071* 25* 18.2* 0.059* 36* 20.5* 0.053* 49& 21.8& 0.049* 64* 22.7* 0.047* SYSTAP™, LLC © 2006-2015 All Rights Reserved 19 3/10/15 Speedup on a constant problem size with more GPUs.