Learning at Scale:
Deep, Distributed and Multi-dimensional
Anima Anandkumar
Amazon AI & Caltech
Deep Learning
Significantly improves many applications across multiple domains:
image understanding, speech recognition, natural language processing, autonomy, …
[Figure: the "deep learning" trend over the past 10 years.]
Image Classification
[Figure: network with Layer 1, Layer 2, Output.]
Multilevel feature extraction, from raw pixels to semantic meaning
Exploit spatial information with convolution layers
Image Classification
§ Hard to define the network
§ The definition of the Inception network has >1k lines of code in Caffe
§ A single image requires billions of floating-point operations
§ Intel i7: ~500 GFLOPS
§ Nvidia Titan X: ~5 TFLOPS
§ Memory consumption is linear in the number of layers
State-of-the-art networks have tens to hundreds of layers
Outline
1 Introduction
2 Distributed Deep Learning Using MXNet
3 Learning in Multiple Dimensions
4 Conclusion
3. MXNet
image credit - wikipedia
• Imperative and Declarative Programming
• Language Support
• Backend and Automatic Parallelization
Writing Parallel Programs is Painful
Each forward-backward-update step involves O(num_layers) tensor
computations and communications, where num_layers is often 100 to 1,000
data = next_batch()

# GPU 0: forward and backward on the first half of the batch
data[gpu0].copyfrom(data[0:50])
fc1[gpu0] = FullcForward(data[gpu0], fc1_weight[gpu0])
fc2[gpu0] = FullcForward(fc1[gpu0], fc2_weight[gpu0])
fc2_ograd[gpu0] = LossGrad(fc2[gpu0], label[0:50])
fc1_ograd[gpu0], fc2_wgrad[gpu0] = FullcBackward(fc2_ograd[gpu0], fc2_weight[gpu0])
_, fc1_wgrad[gpu0] = FullcBackward(fc1_ograd[gpu0], fc1_weight[gpu0])

# GPU 1: forward and backward on the second half of the batch
data[gpu1].copyfrom(data[50:100])
fc1[gpu1] = FullcForward(data[gpu1], fc1_weight[gpu1])
fc2[gpu1] = FullcForward(fc1[gpu1], fc2_weight[gpu1])
fc2_ograd[gpu1] = LossGrad(fc2[gpu1], label[50:100])
fc1_ograd[gpu1], fc2_wgrad[gpu1] = FullcBackward(fc2_ograd[gpu1], fc2_weight[gpu1])
_, fc1_wgrad[gpu1] = FullcBackward(fc1_ograd[gpu1], fc1_weight[gpu1])

# CPU: sum the gradients, update the weights, copy them back to both GPUs
fc2_wgrad[cpu] = fc2_wgrad[gpu0] + fc2_wgrad[gpu1]
fc2_weight[cpu] -= lr * fc2_wgrad[cpu]
fc2_weight[cpu].copyto(fc2_weight[gpu0], fc2_weight[gpu1])
fc1_wgrad[cpu] = fc1_wgrad[gpu0] + fc1_wgrad[gpu1]
fc1_weight[cpu] -= lr * fc1_wgrad[cpu]
fc1_weight[cpu].copyto(fc1_weight[gpu0], fc1_weight[gpu1])
Dependency graph for 2-layer neural
networks with 2 GPUs
Auto Parallelization
Write serial programs, run in parallel
>>> import mxnet as mx
>>> A = mx.nd.ones((2,2)) *2
>>> C = A + 2
>>> B = A + 1
>>> D = B * C
>>> D.wait_to_read()
[Figure: dependency graph. A = 2 feeds both B = A + 1 and C = A + 2, which do not depend on each other; D = B ⨉ C waits for both.]
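The idea of writing serial code and letting independent steps overlap can be sketched without MXNet. The snippet below is only an illustration under stated assumptions: it uses Python's standard `concurrent.futures` thread pool in place of a real dependency engine, and the helpers `add_scalar`, `multiply`, and `run_graph` are hypothetical names, not MXNet functions.

```python
from concurrent.futures import ThreadPoolExecutor


def add_scalar(matrix, scalar):
    """Elementwise matrix + scalar on nested lists."""
    return [[x + scalar for x in row] for row in matrix]


def multiply(left, right):
    """Elementwise product of two equally shaped nested lists."""
    return [[x * y for x, y in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]


def run_graph():
    A = [[2, 2], [2, 2]]  # same values as mx.nd.ones((2, 2)) * 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        # B and C both read A but not each other, so both can be in flight.
        future_c = pool.submit(add_scalar, A, 2)
        future_b = pool.submit(add_scalar, A, 1)
        # D needs B and C; result() blocks until each input is ready,
        # which plays the role of D.wait_to_read() in the session above.
        return multiply(future_b.result(), future_c.result())


D = run_graph()
print(D)  # [[12, 12], [12, 12]]
```

The program text is still serial; only the two independent additions are allowed to overlap, and the final product is the single synchronization point.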
Data Parallelism
[Figure: workers reading examples and exchanging parameters with a key-value store.]
1. Read a data partition
2. Pull the parameters
3. Compute the gradient
4. Push the gradient
5. Update the parameters
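The five steps above can be sketched as a toy, single-process simulation. This is a minimal sketch, not MXNet's KVStore: the class `KVStore`, the functions `gradient` and `train`, the one-parameter model y = w * x, and the choice to average gradients synchronously are all assumptions made for illustration.

```python
class KVStore:
    """Toy in-process key-value store holding one parameter vector per key."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.params = {}
        self.pending = {}

    def init(self, key, value):
        self.params[key] = list(value)
        self.pending[key] = []

    def pull(self, key):
        return list(self.params[key])

    def push(self, key, grad):
        self.pending[key].append(grad)

    def update(self, key):
        """Average the pushed gradients and take one SGD step."""
        grads = self.pending[key]
        for i in range(len(self.params[key])):
            mean_grad = sum(g[i] for g in grads) / len(grads)
            self.params[key][i] -= self.learning_rate * mean_grad
        self.pending[key] = []


def gradient(weights, partition):
    """Gradient of the mean squared error for the model y = w * x."""
    w = weights[0]
    return [sum(2 * (w * x - y) * x for x, y in partition) / len(partition)]


def train(data, num_workers=2, steps=200, learning_rate=0.05):
    store = KVStore(learning_rate)
    store.init("w", [0.0])
    # 1. Read a data partition: worker k takes every num_workers-th example.
    partitions = [data[k::num_workers] for k in range(num_workers)]
    for _ in range(steps):
        for partition in partitions:              # workers, run serially here
            weights = store.pull("w")             # 2. Pull the parameters
            grad = gradient(weights, partition)   # 3. Compute the gradient
            store.push("w", grad)                 # 4. Push the gradient
        store.update("w")                         # 5. Update the parameters
    return store.pull("w")[0]


samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]  # y = 3 * x
learned_w = train(samples)
```

Every worker pulls the same parameter values within a step, and the update is applied only after all gradients have been pushed, so the simulation corresponds to synchronized updating.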
Scale to Multiple GPU Machines
[Figure: GPUs connect through a PCIe switch to the CPU, which connects to a network switch.]
Bandwidths:
4 × PCIe 3.0 16x: 63 GB/s
PCIe 3.0 16x: 15.75 GB/s
10 Gbit Ethernet: 1.25 GB/s
Hierarchical parameter server
[Figure: workers, level-1 servers and level-2 servers, mapped onto GPUs and CPUs.]
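The bandwidth figures can be checked with a few lines of arithmetic. The snippet below is a rough sketch: the link speeds come from the slide, while `GRADIENT_GB` is a hypothetical payload size chosen only for illustration, and latency and protocol overhead are ignored.

```python
# Bandwidths quoted on the slide, in GB/s.
PCIE3_X16 = 15.75
ETHERNET_10GBIT = 10 / 8            # 10 Gbit/s is 1.25 GB/s

# Four PCIe 3.0 x16 links give the aggregate figure on the slide.
aggregate_pcie = 4 * PCIE3_X16      # 63.0 GB/s


def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Lower bound on the time to move size_gb over one link (no latency)."""
    return size_gb / bandwidth_gb_per_s


GRADIENT_GB = 0.25   # hypothetical gradient size, not a number from the talk

for name, bandwidth in [("4 x PCIe 3.0 x16", aggregate_pcie),
                        ("PCIe 3.0 x16", PCIE3_X16),
                        ("10 Gbit Ethernet", ETHERNET_10GBIT)]:
    millis = transfer_seconds(GRADIENT_GB, bandwidth) * 1000
    print(f"{name}: {millis:.1f} ms")
```

With these numbers the Ethernet link is about 50 times slower than the aggregate PCIe bandwidth inside one machine (63 / 1.25 = 50.4), which is one reason to aggregate within a machine before sending anything over the network.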
Experiment Setup
✧ 1.2 million images with 1000 classes
✧ ResNet 152-layer model
✧ EC2 P2.16xlarge
[Figure: GPUs 0-15 connected through PCIe switches to the CPU.]
✧ Minibatch SGD
✧ Synchronized updating
Scalability over Multiple Machines
[Figure: time (sec) per batch (0 to 1) vs. number of GPUs (0 to 128), with curves for communication cost and for batch size/GPU = 2, 4, 8, 16; annotation: 115x.]
[Figure: timeline from before 2012 to 2017 with the labels imperative, symbolic, MXNet, Gluon.]
Back-end System
✧ Optimization
✓ Memory optimization
✓ Operator fusion
✧ Scheduling
✓ Auto-parallelization
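Back-end
[Figure: two computation graphs. One combines a and b with ⨉, then adds 1 to give c; the other feeds weight and bias into fullc, followed by softmax.]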
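Front-end
# Imperative: NDArray operations are issued one by one
import mxnet as mx
a = mx.nd.zeros((100, 50))
b = mx.nd.ones((100, 50))
c = a * b
c += 1

# Symbolic: declare the network first, then bind it and run it
net = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=net, num_hidden=128)
net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
texec = mx.module.Module(net)
texec.bind(data_shapes=[('data', (100, 50))],
           label_shapes=[('softmax_label', (100,))])
texec.init_params()
# Placeholder labels so that the example is self-contained
batch = mx.io.DataBatch(data=[c], label=[mx.nd.zeros((100,))])
texec.forward(batch, is_train=True)
texec.backward()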
In summary
✦ Symbolic
❖ efficient & portable
❖ but hard to use
✦ Imperative
❖ flexible
❖ may be slow
✦ Gluon
❖ imperative for developing
❖ symbolic for deploying
Outline
1 Introduction
2 Distributed Deep Learning Using MXNet
3 Learning in Multiple Dimensions
4 Conclusion
Tensors: Beyond 2D world
Modern data is inherently multi-dimensional
[Figure: network with layers Input, Hidden 1, Hidden 2, Output.]
Tensor Contraction
Extends the notion of matrix product
Matrix product
$Mv = \sum_{j} v_j M_j$
Tensor Contraction
$T(u, v, \cdot) = \sum_{i,j} u_i v_j T_{i,j,:}$
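Both formulas can be written out as plain loops. The following is a minimal sketch on nested Python lists; the function names `matvec_as_column_sum` and `contract_first_two_modes` are hypothetical, and no attempt is made at efficiency.

```python
def matvec_as_column_sum(M, v):
    """Mv = sum_j v_j * M_j, where M_j is column j of M."""
    rows = len(M)
    result = [0.0] * rows
    for j, v_j in enumerate(v):
        for r in range(rows):
            result[r] += v_j * M[r][j]
    return result


def contract_first_two_modes(T, u, v):
    """T(u, v, .) = sum_{i,j} u_i * v_j * T[i][j][:]."""
    depth = len(T[0][0])
    result = [0.0] * depth
    for i, u_i in enumerate(u):
        for j, v_j in enumerate(v):
            for k in range(depth):
                result[k] += u_i * v_j * T[i][j][k]
    return result


M = [[1, 2], [3, 4]]
T = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

print(matvec_as_column_sum(M, [1, 1]))             # [3.0, 7.0]
print(contract_first_two_modes(T, [1, 0], [0, 1]))  # [3.0, 4.0]
```

The matrix product is a weighted sum of columns, and the contraction is the same idea one order up: a weighted sum of the fibers T[i][j][:].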
Employing Tensor Contractions in AlexNet
Replace the fully connected layer with a tensor contraction layer
Enabling the Tensor Contraction Layer in MXNet
Performance of the TCL
• Trained end-to-end
• On ImageNet with VGG:
  • 65.9% space savings
  • Performance drop of only 0.6%
• On ImageNet with AlexNet:
  • 56.6% space savings
  • Performance improvement of 0.5%
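To see where space savings of this kind can come from, it helps to count parameters. The sketch below assumes a tensor contraction layer that multiplies each mode of a 3-way activation tensor by its own factor matrix, and compares it with a dense layer between the flattened input and output. The shapes are hypothetical and the result does not reproduce the percentages above, which refer to whole networks.

```python
from math import prod


def fully_connected_params(input_shape, output_shape):
    """Weights of a dense layer between flattened input and output."""
    return prod(input_shape) * prod(output_shape)


def tensor_contraction_params(input_shape, output_shape):
    """One factor matrix of size (input dim x output dim) per mode."""
    return sum(d * r for d, r in zip(input_shape, output_shape))


input_shape = (256, 6, 6)    # hypothetical activation tensor
output_shape = (192, 5, 5)   # hypothetical reduced tensor

dense = fully_connected_params(input_shape, output_shape)
tcl = tensor_contraction_params(input_shape, output_shape)
print(dense, tcl)  # 44236800 49212
```

The dense count grows with the product of all dimensions, while the contraction count grows with a sum of per-mode products.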
Low-rank tensor regression
Tensor Regression Networks, J. Kossaifi, Z. C. Lipton, A. Khanna, T. Furlanello and A. Anandkumar, arXiv preprint
Performance and rank
Speeding up Tensor Contractions
1 Tensor contractions are a core primitive of multilinear algebra.
2 BLAS 3: Unbounded compute intensity (no. of ops per I/O)
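Consider single-index contractions: $C_{\mathcal{C}} = A_{\mathcal{A}} B_{\mathcal{B}}$
e.g. $C_{mnp} = A_{mnk} B_{kp}$
[Figure: A of size 4 × 2 × 2, shown as slices A(:,1,:) and A(:,2,:), contracted with B of size 2 × 1 to give C of size 4 × 2 × 1.]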
Speeding up Tensor Contraction
Explicit permutation dominates,
especially for small tensors.
Consider $C_{mnp} = A_{km} B_{pkn}$.
1 $A_{km} \to A_{mk}$
2 $B_{pkn} \to B_{kpn}$
3 $C_{mnp} \to C_{mpn}$
4 $C_{m(pn)} = A_{mk} B_{k(pn)}$
5 $C_{mpn} \to C_{mnp}$
[Figure: fraction of time (0 to 1) spent in copies/transpositions vs. n (100 to 500). (Top) CPU. (Bottom) GPU. Lines are shown with 1, 2, 3, and 6 transpositions.]
Existing Primitives
GEMM
Suboptimal for many small matrices.
Pointer-to-Pointer BatchedGEMM
Available in MKL 11.3β and cuBLAS 4.1
C[p] = α op(A[p]) op(B[p]) + β C[p]
cublas<T>gemmBatched(cublasHandle_t handle,
cublasOperation_t transA, cublasOperation_t transB,
int M, int N, int K,
const T* alpha,
const T** A, int ldA,
const T** B, int ldB,
const T* beta,
T** C, int ldC,
int batchCount)
Tensor Contraction with Extended BLAS Primitives
$C_{mn[p]} = A_{mk} B_{kn[p]}$
cublasDgemmStridedBatched(handle,
CUBLAS_OP_N, CUBLAS_OP_N,
M, N, K,
&alpha,
A, ldA1, 0,
B, ldB1, ldB2,
&beta,
C, ldC1, ldC2,
P)
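What the strided call computes can be spelled out with a slow reference loop. The function below is a hypothetical pure-Python stand-in, not cuBLAS: it assumes column-major storage (as cuBLAS uses), handles only the no-transpose case, and follows the argument order of the call above. A batch stride of 0 for A means the same matrix is reused for every p.

```python
def gemm_strided_batched(M, N, K, alpha,
                         A, ldA, strideA,
                         B, ldB, strideB,
                         beta,
                         C, ldC, strideC,
                         batch_count):
    """C[p] = alpha * A[p] @ B[p] + beta * C[p] on flat column-major buffers.

    Matrix p of X starts at offset p * strideX, and element (row, col) of
    that matrix sits at row + col * ldX. Transposes are not handled.
    """
    for p in range(batch_count):
        a0, b0, c0 = p * strideA, p * strideB, p * strideC
        for n in range(N):
            for m in range(M):
                acc = 0.0
                for k in range(K):
                    acc += A[a0 + m + k * ldA] * B[b0 + k + n * ldB]
                idx = c0 + m + n * ldC
                C[idx] = alpha * acc + beta * C[idx]
    return C


M_, N_, K_, P_ = 2, 2, 2, 2
A_buf = [1, 3, 2, 4]                # A = [[1, 2], [3, 4]], column-major
B_buf = [1, 0, 0, 1,                # B[:, :, 0] = identity
         2, 0, 0, 2]                # B[:, :, 1] = 2 * identity
C_buf = [0.0] * (M_ * N_ * P_)

gemm_strided_batched(M_, N_, K_, 1.0,
                     A_buf, M_, 0,            # stride 0: A shared by all p
                     B_buf, K_, K_ * N_,
                     0.0,
                     C_buf, M_, M_ * N_,
                     P_)
print(C_buf)  # [1.0, 3.0, 2.0, 4.0, 2.0, 6.0, 4.0, 8.0]
```

Because the batch is described by a base pointer and a stride rather than an array of pointers, a slice of a tensor can be addressed in place, with no explicit copy or transposition.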
Tensor Contraction with Extended BLAS Primitives
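$C_{mnp} = A_{**} \times B_{***}$
$C_{mnp} \equiv C[m + n \cdot \mathrm{ldC1} + p \cdot \mathrm{ldC2}]$
Case | Contraction | Kernel1 | Kernel2
1.1 | $A_{mk} B_{knp}$ | $C_{m(np)} = A_{mk} B_{k(np)}$ | $C_{mn[p]} = A_{mk} B_{kn[p]}$
1.2 | $A_{mk} B_{kpn}$ | $C_{mn[p]} = A_{mk} B_{k[p]n}$ | $C_{m[n]p} = A_{mk} B_{kp[n]}$
1.3 | $A_{mk} B_{nkp}$ | $C_{mn[p]} = A_{mk} B_{nk[p]}$ |
1.4 | $A_{mk} B_{pkn}$ | $C_{m[n]p} = A_{mk} B_{pk[n]}$ |
1.5 | $A_{mk} B_{npk}$ | $C_{m(np)} = A_{mk} B_{(np)k}$ | $C_{mn[p]} = A_{mk} B_{n[p]k}$
1.6 | $A_{mk} B_{pnk}$ | $C_{m[n]p} = A_{mk} B_{p[n]k}$ |
2.1 | $A_{km} B_{knp}$ | $C_{m(np)} = A_{km} B_{k(np)}$ | $C_{mn[p]} = A_{km} B_{kn[p]}$
2.2 | $A_{km} B_{kpn}$ | $C_{mn[p]} = A_{km} B_{k[p]n}$ | $C_{m[n]p} = A_{km} B_{kp[n]}$
2.3 | $A_{km} B_{nkp}$ | $C_{mn[p]} = A_{km} B_{nk[p]}$ |
2.4 | $A_{km} B_{pkn}$ | $C_{m[n]p} = A_{km} B_{pk[n]}$ |
2.5 | $A_{km} B_{npk}$ | $C_{m(np)} = A_{km} B_{(np)k}$ | $C_{mn[p]} = A_{km} B_{n[p]k}$
2.6 | $A_{km} B_{pnk}$ | $C_{m[n]p} = A_{km} B_{p[n]k}$ |
3.1 | $A_{nk} B_{kmp}$ | $C_{mn[p]} = B_{km[p]} A_{nk}$ |
3.2 | $A_{nk} B_{kpm}$ | $C_{mn[p]} = B_{k[p]m} A_{nk}$ |
3.3 | $A_{nk} B_{mkp}$ | $C_{mn[p]} = B_{mk[p]} A_{nk}$ |
3.4 | $A_{nk} B_{pkm}$ | |
3.5 | $A_{nk} B_{mpk}$ | $C_{mn[p]} = B_{m[p]k} A_{nk}$ |
3.6 | $A_{nk} B_{pmk}$ | |
4.1 | $A_{kn} B_{kmp}$ | $C_{mn[p]} = B_{km[p]} A_{kn}$ |
4.2 | $A_{kn} B_{kpm}$ | $C_{mn[p]} = B_{k[p]m} A_{kn}$ |
4.3 | $A_{kn} B_{mkp}$ | $C_{mn[p]} = B_{mk[p]} A_{kn}$ |
4.4 | $A_{kn} B_{pkm}$ | |
4.5 | $A_{kn} B_{mpk}$ | $C_{mn[p]} = B_{m[p]k} A_{kn}$ |
4.6 | $A_{kn} B_{pmk}$ | |
5.1 | $A_{pk} B_{kmn}$ | $C_{(mn)p} = B_{k(mn)} A_{pk}$ | $C_{m[n]p} = B_{km[n]} A_{pk}$
5.2 | $A_{pk} B_{knm}$ | $C_{m[n]p} = B_{k[n]m} A_{pk}$ |
5.3 | $A_{pk} B_{mkn}$ | $C_{m[n]p} = B_{mk[n]} A_{pk}$ |
5.4 | $A_{pk} B_{nkm}$ | |
5.5 | $A_{pk} B_{mnk}$ | $C_{(mn)p} = B_{(mn)k} A_{pk}$ | $C_{m[n]p} = B_{m[n]k} A_{pk}$
5.6 | $A_{pk} B_{nmk}$ | |
6.1 | $A_{kp} B_{kmn}$ | $C_{(mn)p} = B_{k(mn)} A_{kp}$ | $C_{m[n]p} = B_{km[n]} A_{kp}$
6.2 | $A_{kp} B_{knm}$ | $C_{m[n]p} = B_{k[n]m} A_{kp}$ |
6.3 | $A_{kp} B_{mkn}$ | $C_{m[n]p} = B_{mk[n]} A_{kp}$ |
6.4 | $A_{kp} B_{nkm}$ | |
6.5 | $A_{kp} B_{mnk}$ | $C_{(mn)p} = B_{(mn)k} A_{kp}$ | $C_{m[n]p} = B_{m[n]k} A_{kp}$
6.6 | $A_{kp} B_{nmk}$ | |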
A new primitive: StridedBatchedGEMM
Performance on par with pure GEMM (P100 and beyond).
Applications: Tucker Decomposition
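$T_{mnp} = G_{ijk} A_{mi} B_{nj} C_{pk}$
[Figure: tensor T (indices mnp) written as a core G (indices ijk) multiplied by factor matrices A (mi), B (nj) and C (pk).]
Main steps in the algorithm
$Y_{mjk} = T_{mnp} B^{t}_{nj} C^{t}_{pk}$
$Y_{ink} = T_{mnp} A^{t+1}_{mi} C^{t}_{pk}$
$Y_{ijp} = T_{mnp} B^{t+1}_{nj} A^{t+1}_{mi}$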
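Reading repeated indices as sums, the Tucker form and the first of the main steps can be written as explicit loops. This is a minimal sketch on nested lists; `tucker_reconstruct` and `project_modes_2_and_3` are hypothetical names, and the factor updates between steps (which the slide does not spell out) are left out.

```python
def tucker_reconstruct(G, A, B, C):
    """T[m][n][p] = sum_{i,j,k} G[i][j][k] * A[m][i] * B[n][j] * C[p][k]."""
    I, J, K = len(G), len(G[0]), len(G[0][0])
    return [[[sum(G[i][j][k] * A[m][i] * B[n][j] * C[p][k]
                  for i in range(I) for j in range(J) for k in range(K))
              for p in range(len(C))]
             for n in range(len(B))]
            for m in range(len(A))]


def project_modes_2_and_3(T, B, C):
    """Y[m][j][k] = sum_{n,p} T[m][n][p] * B[n][j] * C[p][k]."""
    N, P = len(B), len(C)
    J, K = len(B[0]), len(C[0])
    return [[[sum(T[m][n][p] * B[n][j] * C[p][k]
                  for n in range(N) for p in range(P))
              for k in range(K)]
             for j in range(J)]
            for m in range(len(T))]


# Rank-(1, 1, 1) example: a 1 x 1 x 1 core and three 2 x 1 factors.
G = [[[2]]]
A = [[1], [2]]
B = [[1], [0]]
C = [[1], [1]]

T = tucker_reconstruct(G, A, B, C)
Y = project_modes_2_and_3(T, B, C)
print(T)  # [[[2, 2], [0, 0]], [[4, 4], [0, 0]]]
print(Y)  # [[[4]], [[8]]]
```

Each of the three steps is a pair of single-index contractions of the kind tabulated earlier, which is why batched kernels apply to this algorithm.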
Performance on Tucker decomposition:
[Figure: time (sec, log scale from 10^-2 to 10^6) vs. n (20 to 120) for TensorToolbox, BTAS, Cyclops, CPU Batched and GPU Batched.]
Tensor Sketches
Randomized dimensionality reduction through sketching.
◮ Complexity independent of tensor order: exponential gain!
[Figure: entries of tensor T are combined with random signs (+1, +1, -1) into sketch s.]
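The basic building block of sketching can be shown on a plain vector. The snippet below is a minimal count sketch, assuming a seeded `random.Random` in place of proper hash functions; `make_count_sketch` is a hypothetical helper, and this is the vector case only, not the tensor sketch construction from the talk.

```python
import random


def make_count_sketch(input_dim, sketch_dim, seed=0):
    """Return a function that maps a length-input_dim vector to a sketch."""
    rng = random.Random(seed)
    # Each input coordinate gets a random bucket and a random sign.
    buckets = [rng.randrange(sketch_dim) for _ in range(input_dim)]
    signs = [rng.choice((-1, 1)) for _ in range(input_dim)]

    def sketch(x):
        s = [0.0] * sketch_dim
        for value, bucket, sign in zip(x, buckets, signs):
            s[bucket] += sign * value
        return s

    return sketch


sketch = make_count_sketch(input_dim=8, sketch_dim=4, seed=0)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [8, 7, 6, 5, 4, 3, 2, 1]
s_x = sketch(x)
s_y = sketch(y)
s_sum = sketch([a + b for a, b in zip(x, y)])
```

The sketch is a linear map, so `s_sum` equals the elementwise sum of `s_x` and `s_y`; that linearity is what allows sketches of sums to be combined without revisiting the original data.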
Applications
Tensor Decomposition via Sketching
Visual Question and Answering
MCT in Visual Question & Answering
[Figure: VQA pipeline. An image goes through a CNN (feature of size C × W × H) and the question "What is the mustache made of?" goes through an RNN (feature of length L). The blocks shown are MCT, average pooling, FC, ReLU, BatchNorm, FC and softmax, with the example answer "Banana".]
Multimodal Tensor Pooling
[Figure: an image feature (C × W × H) and a text feature (length L) are pooled. Labels: spatial sketch, count sketch, 3D FFT, 1D FFT, 3D IFFT (optional), dimensions d1, d2, d3, d4.]
Tensor Decompositions
Extracting Topics from Documents
[Figure: documents shown as mixtures of topics (crime, Sports, Education) with their topic proportions; example words: police, witness, campus.]
A., D. P. Foster, D. Hsu, S.M. Kakade, Y.K. Liu.“Two SVDs Suffice: Spectral decompositions
for probabilistic topic modeling and latent Dirichlet allocation,” NIPS 2012.
Tensor Methods for Topic Modeling
Topic-word matrix P[word = i | topic = j]
Linearly independent columns
Moment Tensor: Co-occurrence of Word Triplets
[Figure: the moment tensor over words (campus, police, witness) written as a sum of three components, one per topic (crime, Sports, Education).]
Tensors vs. Variational Inference
Criterion: Perplexity = exp[−likelihood].
Learning Topics from PubMed on Spark, 8 million articles
[Figure: running time and perplexity, Tensor vs. Variational.]
Learning network communities from social network data
Facebook n ∼ 20k, Yelp n ∼ 40k, DBLP-sub n ∼ 1e5, DBLP n ∼ 1e6.
[Figure: running time and error (log scales) on FB, YP, DBLPsub and DBLP.]
Orders of Magnitude Faster & More Accurate
F. Huang, U.N. Niranjan, M. Hakeem, A, “Online tensor methods for training latent variable models,” JMLR 2014.
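The perplexity criterion used in this comparison can be computed in a few lines. The sketch below assumes that "likelihood" in the formula means the average per-word log-likelihood on held-out text, which is the usual convention but is not stated on the slide; `perplexity` is a hypothetical helper name.

```python
import math


def perplexity(word_probabilities):
    """exp(-average log-likelihood) over the held-out words."""
    log_likelihood = sum(math.log(p) for p in word_probabilities)
    return math.exp(-log_likelihood / len(word_probabilities))


# A model that gives every held-out word probability 1/4 has perplexity
# close to 4: it is as uncertain as a uniform choice among four words.
uniform_score = perplexity([0.25] * 8)
```

Lower perplexity means the model assigns higher probability to the held-out words.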
Outline
1 Introduction
2 Distributed Deep Learning Using MXNet
3 Learning in Multiple Dimensions
4 Conclusion
Conclusion
Distributed Deep Learning at Scale
MXNet has many attractive features
◮ Flexible programming
◮ Portable
◮ Highly efficient
Easy to deploy large-scale DL on AWS cloud
◮ Deep Learning AMI
◮ CloudFormation templates
Tensors are the future of ML
Tensor contractions: space savings in deep architectures.
New primitives speed up tensor contractions: extended BLAS
Anima Anandkumar, Principal Scientist, Amazon Web Services, Endowed Professor, Caltech, at MLconf SF 2017

Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
 
Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...
Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...
Yufeng Guo - Tensor Processing Units: how TPUs enable the next generation of ...
 
pMatlab on BlueGene
pMatlab on BlueGenepMatlab on BlueGene
pMatlab on BlueGene
 
A CGRA-based Approach for Accelerating Convolutional Neural Networks
A CGRA-based Approachfor Accelerating Convolutional Neural NetworksA CGRA-based Approachfor Accelerating Convolutional Neural Networks
A CGRA-based Approach for Accelerating Convolutional Neural Networks
 
Gpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaGpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cuda
 
improve deep learning training and inference performance
improve deep learning training and inference performanceimprove deep learning training and inference performance
improve deep learning training and inference performance
 
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
 
Achitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and ExascaleAchitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and Exascale
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1
 
Lrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rLrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with r
 
An35225228
An35225228An35225228
An35225228
 
Matrix multiplication
Matrix multiplicationMatrix multiplication
Matrix multiplication
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
 
Tridiagonal solver in gpu
Tridiagonal solver in gpuTridiagonal solver in gpu
Tridiagonal solver in gpu
 
Efficient Implementation of Low Power 2-D DCT Architecture
Efficient Implementation of Low Power 2-D DCT ArchitectureEfficient Implementation of Low Power 2-D DCT Architecture
Efficient Implementation of Low Power 2-D DCT Architecture
 
Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)
 

Más de MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Más de MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Último

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Último (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Anima Anandkumar, Principal Scientist, Amazon Web Services, Endowed Professor, Caltech at MLconf SF 2017

  • 1. Learning at Scale: Deep, Distributed and Multi-dimensional. Anima Anandkumar, Amazon AI & Caltech
  • 2. Deep Learning. The "deep learning" trend of the past 10 years has significantly improved many applications across multiple domains: image understanding, speech recognition, natural language processing, autonomy, …
  • 3. Image Classification (Layer 1, Layer 2, Output). Multilevel feature extraction from raw pixels to semantic meanings; explore spatial information with convolution layers.
  • 4. Image Classification. § Hard to define the network: the definition of the Inception network has >1k lines of code in Caffe. § A single image requires billions of floating-point operations: Intel i7 ~500 GFLOPS; Nvidia Titan X ~5 TFLOPS. § Memory consumption is linear in the number of layers. State-of-the-art networks have tens to hundreds of layers.
  • 5. Outline: 1. Introduction; 2. Distributed Deep Learning Using MXNet; 3. Learning in Multiple Dimensions; 4. Conclusion
  • 6. 3. MXNet (image credit: Wikipedia) • Imperative and Declarative Programming • Language Support • Backend and Automatic Parallelization
  • 7. Writing Parallel Programs is Painful. Each forward-backward-update involves O(num_layer) tensor computations and communications, where num_layer is often 100 to 1,000. Dependency graph for a 2-layer neural network with 2 GPUs; its nodes, listed in execution order:
      data = next_batch()
      # gpu0 processes the first half of the batch
      data[gpu0].copyfrom(data[0:50])
      fc1[gpu0] = FullcForward(data[gpu0], fc1_weight[gpu0])
      fc2[gpu0] = FullcForward(fc1[gpu0], fc2_weight[gpu0])
      fc2_ograd[gpu0] = LossGrad(fc2[gpu0], label[0:50])
      fc1_ograd[gpu0], fc2_wgrad[gpu0] = FullcBackward(fc2_ograd[gpu0], fc2_weight[gpu0])
      _, fc1_wgrad[gpu0] = FullcBackward(fc1_ograd[gpu0], fc1_weight[gpu0])
      # gpu1 processes the second half of the batch
      data[gpu1].copyfrom(data[50:100])
      fc1[gpu1] = FullcForward(data[gpu1], fc1_weight[gpu1])
      fc2[gpu1] = FullcForward(fc1[gpu1], fc2_weight[gpu1])
      fc2_ograd[gpu1] = LossGrad(fc2[gpu1], label[50:100])
      fc1_ograd[gpu1], fc2_wgrad[gpu1] = FullcBackward(fc2_ograd[gpu1], fc2_weight[gpu1])
      _, fc1_wgrad[gpu1] = FullcBackward(fc1_ograd[gpu1], fc1_weight[gpu1])
      # sum the gradients on the CPU, update the weights, and copy them back to both GPUs
      fc2_wgrad[cpu] = fc2_wgrad[gpu0] + fc2_wgrad[gpu1]
      fc2_weight[cpu] -= lr * fc2_wgrad[cpu]
      fc2_weight[cpu].copyto(fc2_weight[gpu0], fc2_weight[gpu1])
      fc1_wgrad[cpu] = fc1_wgrad[gpu0] + fc1_wgrad[gpu1]
      fc1_weight[cpu] -= lr * fc1_wgrad[cpu]
      fc1_weight[cpu].copyto(fc1_weight[gpu0], fc1_weight[gpu1])
  • 8. Auto Parallelization. Write serial programs, run in parallel:
      >>> import mxnet as mx
      >>> A = mx.nd.ones((2,2)) * 2
      >>> C = A + 2
      >>> B = A + 1
      >>> D = B * C
      >>> D.wait_to_read()
    Dependency graph: A = 2; C = A + 2; B = A + 1; D = B ⨉ C
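The scheduling idea behind this slide can be illustrated without MXNet. The sketch below is only an assumption about how such an engine might reason, not MXNet's actual implementation: given the inputs of each statement, it groups operations into stages whose members do not depend on each other and could therefore run concurrently. The names `deps` and `parallel_stages` are hypothetical.

```python
# Inputs of each result in the serial program on the slide.
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

def parallel_stages(deps):
    """Group operations into stages; ops within one stage are independent."""
    done, stages = set(), []
    while len(done) < len(deps):
        # An op is ready once all of its inputs have been computed.
        ready = sorted(op for op, inputs in deps.items()
                       if op not in done and all(i in done for i in inputs))
        if not ready:
            raise ValueError("cyclic dependency")
        stages.append(ready)
        done.update(ready)
    return stages

stages = parallel_stages(deps)
print(stages)  # [['A'], ['B', 'C'], ['D']]
```

B and C land in the same stage because both read only A, which is why a serial-looking program can still be executed in parallel.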
  • 9. Data Parallelism (key-value store, examples): 1. Read a data partition. 2. Pull the parameters. 3. Compute the gradient. 4. Push the gradient. 5. Update the parameters.
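The five steps can be sketched with a toy in-process key-value store. This is a minimal illustration under stated assumptions: the `KVStore` class, the least-squares model, the learning rate, and the two-worker split are all hypothetical and are not MXNet's `kvstore` API.

```python
import numpy as np

class KVStore:
    """Toy in-process key-value store (hypothetical, not mx.kvstore)."""
    def __init__(self, lr):
        self.lr, self.params, self.grads = lr, {}, {}
    def init(self, key, value):
        self.params[key] = value.copy()
    def pull(self, key):
        return self.params[key].copy()
    def push(self, key, grad):
        self.grads.setdefault(key, []).append(grad)
    def update(self, key):
        # Average the pushed gradients and apply one SGD step.
        self.params[key] -= self.lr * np.mean(self.grads.pop(key), axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

kv = KVStore(lr=0.1)
kv.init("w", np.zeros(3))
partitions = np.array_split(np.arange(100), 2)        # two simulated workers

for step in range(200):
    for idx in partitions:
        Xp, yp = X[idx], y[idx]                        # 1. read a data partition
        w = kv.pull("w")                               # 2. pull the parameters
        grad = 2 * Xp.T @ (Xp @ w - yp) / len(idx)     # 3. compute the gradient
        kv.push("w", grad)                             # 4. push the gradient
    kv.update("w")                                     # 5. update the parameters

w_final = kv.pull("w")
```

Because both workers push before the update, this corresponds to synchronized updating; an asynchronous variant would update after every push.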
  • 10. Scale to Multiple GPU Machines. Components: four GPUs, PCIe switch, CPU, network switch. Link bandwidths: 63 GB/s (4 × PCIe 3.0 x16); 15.75 GB/s (PCIe 3.0 x16); 1.25 GB/s (10 Gbit Ethernet). Hierarchical parameter server: level-1 servers, workers, level-2 servers; GPUs, CPUs.
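The bandwidth figures on this slide follow from the link specifications. The short calculation below reproduces them, assuming the standard PCIe 3.0 parameters (8 GT/s per lane, 128b/130b encoding); the 240 MB payload used for the transfer-time estimate is a hypothetical example, not a number from the talk.

```python
# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 16 lanes, 8 bits per byte.
pcie3_x16 = 8e9 * (128 / 130) * 16 / 8 / 1e9    # GB/s, about 15.75
four_links = 4 * round(pcie3_x16, 2)             # GB/s across four x16 links
ethernet_10g = 10e9 / 8 / 1e9                    # GB/s for 10 Gbit Ethernet

# Hypothetical payload: time to move 240 MB of gradients over each link.
payload_gb = 0.24
transfer_ms = {
    "4 x PCIe 3.0 x16": 1000 * payload_gb / four_links,
    "PCIe 3.0 x16": 1000 * payload_gb / pcie3_x16,
    "10 Gbit Ethernet": 1000 * payload_gb / ethernet_10g,
}
for link, ms in transfer_ms.items():
    print(f"{link}: {ms:.1f} ms")
```

The roughly 50x gap between the in-machine links and Ethernet is the motivation for aggregating within a machine before communicating across machines.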
  • 11. Experiment Setup ✧ 1.2 million images with 1000 classes ✧ ResNet 152-layer model ✧ EC2 P2.16xlarge (GPUs 0-15, PCIe switches, CPU) ✧ Minibatch SGD ✧ Synchronized updating
  • 12. Scalability over Multiple Machines. [Figure: time (sec) per batch, 0 to 1, versus number of GPUs, 0 to 128; series: communication cost and batch size/GPU = 2, 4, 8, 16; annotation: 115x.]
  • 13. [Timeline figure, before 2012 through 2017: mxnet; imperative, symbolic, gluon.]
  • 14. Back-end System ✧ Optimization ✓ Memory optimization ✓ Operator fusion ✧ Scheduling ✓ Auto-parallelization
    Front-end (imperative):
      import mxnet as mx
      a = mx.nd.zeros((100, 50))
      b = mx.nd.ones((100, 50))
      c = a * b
      c += 1
    Front-end (symbolic):
      import mxnet as mx
      net = mx.symbol.Variable('data')
      net = mx.symbol.FullyConnected(data=net, num_hidden=128)
      net = mx.symbol.SoftmaxOutput(data=net)
      texec = mx.module.Module(net)
      texec.forward(data=c)
      texec.backward()
    Back-end graph nodes: a, b, 1, +, ⨉, c; fullc, softmax, weight, bias
  • 15. In summary ✦ Symbolic ❖ efficient & portable ❖ but hard to use ✦ Imperative ❖ flexible ❖ may be slow ✦ Gluon ❖ imperative for developing ❖ symbolic for deploying
  • 16. Outline: 1. Introduction; 2. Distributed Deep Learning Using MXNet; 3. Learning in Multiple Dimensions; 4. Conclusion
  • 17. Tensors: Beyond 2D world. Modern data is inherently multi-dimensional.
  • 18. Tensors: Beyond 2D world. Modern data is inherently multi-dimensional. [Figure: network with Input, Hidden 1, Hidden 2, Output.]
  • 19. Tensor Contraction. Extends the notion of matrix product. Matrix product: $Mv = \sum_j v_j M_j$. Tensor contraction: $T(u, v, \cdot) = \sum_{i,j} u_i v_j T_{i,j,:}$.
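Both formulas on this slide can be checked numerically. The following NumPy sketch (shapes chosen arbitrarily for illustration) computes the matrix product as a weighted sum of columns $M_j$ and the contraction $T(u, v, \cdot)$ as a weighted sum of fibers $T_{i,j,:}$, then compares each against the built-in operation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matrix product: Mv is a weighted sum of the columns M_j.
M = rng.normal(size=(4, 3))
v = rng.normal(size=3)
Mv = sum(v[j] * M[:, j] for j in range(3))

# Tensor contraction: T(u, v, .) is a weighted sum of the fibers T[i, j, :].
T = rng.normal(size=(2, 3, 4))
u = rng.normal(size=2)
Tuv = sum(u[i] * v[j] * T[i, j, :] for i in range(2) for j in range(3))

# The same contraction written with einsum.
Tuv_einsum = np.einsum("i,j,ijk->k", u, v, T)
```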
  • 20. Employing Tensor Contractions in AlexNet. Replace the fully connected layer with a tensor contraction layer.
  • 21. Enabling Tensor Contraction Layer in MXNet
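As a rough illustration of what a tensor contraction layer computes, the sketch below contracts each non-batch mode of an activation tensor with its own factor matrix, and compares the parameter count with a fully connected layer of the same input and output size. This assumes the usual mode-wise formulation of such a layer; the shapes are hypothetical and it is written in NumPy rather than MXNet.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, in_shape, out_shape = 8, (256, 6, 6), (128, 4, 4)   # hypothetical sizes
X = rng.normal(size=(batch, *in_shape))

# One factor matrix per non-batch mode, each of shape (out_dim, in_dim).
W = [rng.normal(size=(o, i)) for o, i in zip(out_shape, in_shape)]

# Contract every non-batch mode of X with its factor matrix.
Y = np.einsum("bijk,pi,qj,rk->bpqr", X, *W, optimize=True)

# Parameter counts: factor matrices versus one dense matrix on flattened input.
tcl_params = sum(w.size for w in W)
fc_params = int(np.prod(in_shape)) * int(np.prod(out_shape))
print(Y.shape, tcl_params, fc_params)
```

The layer keeps the multi-dimensional structure of the activations, which is where the parameter savings come from; the actual savings depend on the chosen output sizes.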
  • 22. Performance of the TCL • Trained end-to-end • On ImageNet with VGG: 65.9% space savings, performance drop of only 0.6% • On ImageNet with AlexNet: 56.6% space savings, performance improvement of 0.5%
  • 25. Speeding up Tensor Contractions. 1. Tensor contractions are a core primitive of multilinear algebra. 2. BLAS 3: unbounded compute intensity (number of operations per I/O). Consider single-index contractions $C_{\mathcal{C}} = A_{\mathcal{A}} B_{\mathcal{B}}$, e.g. $C_{mnp} = A_{mnk} B_{kp}$. [Figure: $A_{422}$, with slices A(:,1,:) and A(:,2,:), contracted with $B_{21}$ gives $C_{421}$.]
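The example $C_{mnp} = A_{mnk} B_{kp}$ needs no data movement: grouping the indices (mn) turns it into one matrix product. A minimal NumPy sketch using the 4×2×2 and 2×1 shapes from the figure (note that NumPy is row-major, whereas BLAS is column-major, so the index grouping that is free differs between the two):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2, 2))   # A_{mnk}
B = rng.normal(size=(2, 1))      # B_{kp}

# Group (mn) into one index: C_{(mn)p} = A_{(mn)k} B_{kp} is a single GEMM.
C = (A.reshape(4 * 2, 2) @ B).reshape(4, 2, 1)

# Reference result from the index expression itself.
C_ref = np.einsum("mnk,kp->mnp", A, B)
```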
  • 26. Speeding up Tensor Contraction. Explicit permutation dominates, especially for small tensors. Consider $C_{mnp} = A_{km} B_{pkn}$: 1. $A_{km} \to A_{mk}$; 2. $B_{pkn} \to B_{kpn}$; 3. $C_{mnp} \to C_{mpn}$; 4. $C_{m(pn)} = A_{mk} B_{k(pn)}$; 5. $C_{mpn} \to C_{mnp}$. [Figure: fraction of time (0 to 1) spent in copies/transpositions versus n (100 to 500); top panel CPU, bottom panel GPU; lines are shown with 1, 2, 3, and 6 transpositions.]
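The permute-then-GEMM approach listed on this slide can be written out step by step. In the NumPy sketch below (sizes are arbitrary), `np.ascontiguousarray` forces the explicit copies that the slide identifies as the dominant cost; step 3 is skipped because the output starts empty here.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, k = 3, 4, 5, 6
A = rng.normal(size=(k, m))        # A_{km}
B = rng.normal(size=(p, k, n))     # B_{pkn}

A_mk = np.ascontiguousarray(A.T)                            # 1. A_km -> A_mk
B_kpn = np.ascontiguousarray(B.transpose(1, 0, 2))          # 2. B_pkn -> B_kpn
# 3. C_mnp -> C_mpn is only needed when C already holds data to accumulate into.
C_mpn = (A_mk @ B_kpn.reshape(k, p * n)).reshape(m, p, n)   # 4. one flattened GEMM
C = np.ascontiguousarray(C_mpn.transpose(0, 2, 1))          # 5. C_mpn -> C_mnp

C_ref = np.einsum("km,pkn->mnp", A, B)
```

Every `ascontiguousarray` call is a full pass over the data that performs no arithmetic, which is why avoiding the permutations matters most for small tensors.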
  • 27. Existing Primitives. GEMM: suboptimal for many small matrices. Pointer-to-pointer BatchedGEMM: available in MKL 11.3β and cuBLAS 4.1. $C[p] = \alpha\,\mathrm{op}(A[p])\,\mathrm{op}(B[p]) + \beta\,C[p]$
      cublas<T>gemmBatched(cublasHandle_t handle,
                           cublasOperation_t transA, cublasOperation_t transB,
                           int M, int N, int K,
                           const T* alpha,
                           const T** A, int ldA,
                           const T** B, int ldB,
                           const T* beta,
                           T** C, int ldC,
                           int batchCount)
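The semantics of the batched call can be stated in a few lines of Python. This is a reference sketch of the formula $C[p] = \alpha\,\mathrm{op}(A[p])\,\mathrm{op}(B[p]) + \beta\,C[p]$ only, not a binding to cuBLAS or MKL; the function name `gemm_batched` is hypothetical, and Python lists of separately allocated arrays stand in for the arrays of pointers.

```python
import numpy as np

def gemm_batched(alpha, A, B, beta, C, trans_a=False, trans_b=False):
    """Reference semantics: C[p] = alpha * op(A[p]) @ op(B[p]) + beta * C[p]."""
    for p in range(len(C)):
        a = A[p].T if trans_a else A[p]
        b = B[p].T if trans_b else B[p]
        C[p][...] = alpha * (a @ b) + beta * C[p]   # update each matrix in place

rng = np.random.default_rng(0)
batch = 5
# Each batch entry is its own allocation, like the pointer-to-pointer interface.
A = [rng.normal(size=(3, 4)) for _ in range(batch)]
B = [rng.normal(size=(4, 2)) for _ in range(batch)]
C = [np.ones((3, 2)) for _ in range(batch)]

gemm_batched(2.0, A, B, 0.5, C)
```

The pointer arrays are what make this interface awkward for tensors: the matrices of a tensor sit at regular offsets in one buffer, so building a pointer list for them is redundant work.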
  • 28. Tensor Contraction with Extended BLAS Primitives $C_{mn[p]} = A_{mk} B_{kn[p]}$ cublasDgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, M, N, K, &alpha, A, ldA1, 0, B, ldB1, ldB2, &beta, C, ldC1, ldC2, P)
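To make the stride arguments in this call concrete, here is a small NumPy emulation of the no-transpose case. The helpers `col_major_view` and `gemm_strided_batched` are hypothetical names written for this sketch; they mirror the argument order shown on the slide but are not part of cuBLAS. Buffers are assumed to be flat and column-major, as in BLAS.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def col_major_view(buf, offset, rows, cols, ld):
    """View a rows x cols column-major matrix stored in the flat buffer buf."""
    item = buf.itemsize
    return as_strided(buf[offset:], shape=(rows, cols), strides=(item, ld * item))


def gemm_strided_batched(M, N, K, alpha, A, ldA, strideA, B, ldB, strideB,
                         beta, C, ldC, strideC, batch_count):
    """Hypothetical emulation: C[p] = alpha * A[p] @ B[p] + beta * C[p] for each p."""
    for p in range(batch_count):
        Ap = col_major_view(A, p * strideA, M, K, ldA)
        Bp = col_major_view(B, p * strideB, K, N, ldB)
        Cp = col_major_view(C, p * strideC, M, N, ldC)
        Cp[:] = alpha * (Ap @ Bp) + beta * Cp


rng = np.random.default_rng(0)
M, N, K, P = 3, 4, 5, 6
A_mk = rng.standard_normal((M, K))
B_knp = rng.standard_normal((K, N, P))

# Flat column-major buffers, as a BLAS library would see them.
A = A_mk.ravel(order="F")
B = B_knp.ravel(order="F")
C = np.zeros(M * N * P)

# strideA = 0 reuses the same matrix A for every batch index p.
gemm_strided_batched(M, N, K, 1.0, A, M, 0, B, K, K * N, 0.0, C, M, M * N, P)
C_mnp = C.reshape((M, N, P), order="F")
```

Here `ldB1 = K` and `ldB2 = K * N` are the strides of the second and third index of B in memory, so no data is copied or permuted before the batched multiply.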
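  • 29. Tensor Contraction with Extended BLAS Primitives Contractions of the form $C_{mnp} = A_{**} \times B_{***}$, with storage $C_{mnp} \equiv C[m + n \cdot ldC1 + p \cdot ldC2]$. In the kernels, parentheses mark indices flattened into one GEMM dimension and brackets mark the batched index. A dash marks cases for which the slide lists no kernel.
    Case | Contraction | Kernels listed (Kernel1; Kernel2)
    1.1 | $A_{mk}B_{knp}$ | $C_{m(np)} = A_{mk}B_{k(np)}$; $C_{mn[p]} = A_{mk}B_{kn[p]}$
    1.2 | $A_{mk}B_{kpn}$ | $C_{mn[p]} = A_{mk}B_{k[p]n}$; $C_{m[n]p} = A_{mk}B_{kp[n]}$
    1.3 | $A_{mk}B_{nkp}$ | $C_{mn[p]} = A_{mk}B_{nk[p]}$
    1.4 | $A_{mk}B_{pkn}$ | $C_{m[n]p} = A_{mk}B_{pk[n]}$
    1.5 | $A_{mk}B_{npk}$ | $C_{m(np)} = A_{mk}B_{(np)k}$; $C_{mn[p]} = A_{mk}B_{n[p]k}$
    1.6 | $A_{mk}B_{pnk}$ | $C_{m[n]p} = A_{mk}B_{p[n]k}$
    2.1 | $A_{km}B_{knp}$ | $C_{m(np)} = A_{km}B_{k(np)}$; $C_{mn[p]} = A_{km}B_{kn[p]}$
    2.2 | $A_{km}B_{kpn}$ | $C_{mn[p]} = A_{km}B_{k[p]n}$; $C_{m[n]p} = A_{km}B_{kp[n]}$
    2.3 | $A_{km}B_{nkp}$ | $C_{mn[p]} = A_{km}B_{nk[p]}$
    2.4 | $A_{km}B_{pkn}$ | $C_{m[n]p} = A_{km}B_{pk[n]}$
    2.5 | $A_{km}B_{npk}$ | $C_{m(np)} = A_{km}B_{(np)k}$; $C_{mn[p]} = A_{km}B_{n[p]k}$
    2.6 | $A_{km}B_{pnk}$ | $C_{m[n]p} = A_{km}B_{p[n]k}$
    3.1 | $A_{nk}B_{kmp}$ | $C_{mn[p]} = B_{km[p]}A_{nk}$
    3.2 | $A_{nk}B_{kpm}$ | $C_{mn[p]} = B_{k[p]m}A_{nk}$
    3.3 | $A_{nk}B_{mkp}$ | $C_{mn[p]} = B_{mk[p]}A_{nk}$
    3.4 | $A_{nk}B_{pkm}$ | -
    3.5 | $A_{nk}B_{mpk}$ | $C_{mn[p]} = B_{m[p]k}A_{nk}$
    3.6 | $A_{nk}B_{pmk}$ | -
    4.1 | $A_{kn}B_{kmp}$ | $C_{mn[p]} = B_{km[p]}A_{kn}$
    4.2 | $A_{kn}B_{kpm}$ | $C_{mn[p]} = B_{k[p]m}A_{kn}$
    4.3 | $A_{kn}B_{mkp}$ | $C_{mn[p]} = B_{mk[p]}A_{kn}$
    4.4 | $A_{kn}B_{pkm}$ | -
    4.5 | $A_{kn}B_{mpk}$ | $C_{mn[p]} = B_{m[p]k}A_{kn}$
    4.6 | $A_{kn}B_{pmk}$ | -
    5.1 | $A_{pk}B_{kmn}$ | $C_{(mn)p} = B_{k(mn)}A_{pk}$; $C_{m[n]p} = B_{km[n]}A_{pk}$
    5.2 | $A_{pk}B_{knm}$ | $C_{m[n]p} = B_{k[n]m}A_{pk}$
    5.3 | $A_{pk}B_{mkn}$ | $C_{m[n]p} = B_{mk[n]}A_{pk}$
    5.4 | $A_{pk}B_{nkm}$ | -
    5.5 | $A_{pk}B_{mnk}$ | $C_{(mn)p} = B_{(mn)k}A_{pk}$; $C_{m[n]p} = B_{m[n]k}A_{pk}$
    5.6 | $A_{pk}B_{nmk}$ | -
    6.1 | $A_{kp}B_{kmn}$ | $C_{(mn)p} = B_{k(mn)}A_{kp}$; $C_{m[n]p} = B_{km[n]}A_{kp}$
    6.2 | $A_{kp}B_{knm}$ | $C_{m[n]p} = B_{k[n]m}A_{kp}$
    6.3 | $A_{kp}B_{mkn}$ | $C_{m[n]p} = B_{mk[n]}A_{kp}$
    6.4 | $A_{kp}B_{nkm}$ | -
    6.5 | $A_{kp}B_{mnk}$ | $C_{(mn)p} = B_{(mn)k}A_{kp}$; $C_{m[n]p} = B_{m[n]k}A_{kp}$
    6.6 | $A_{kp}B_{nmk}$ | -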
  • 30. A new primitive: StridedBatchedGEMM Performance on par with pure GEMM (P100 and beyond).
  • 31. Applications: Tucker Decomposition $T_{mnp} = G_{ijk} A_{mi} B_{nj} C_{pk}$. Main steps in the algorithm: $Y_{mjk} = T_{mnp} B^{t}_{nj} C^{t}_{pk}$; $Y_{ink} = T_{mnp} A^{t+1}_{mi} C^{t}_{pk}$; $Y_{ijp} = T_{mnp} B^{t+1}_{nj} A^{t+1}_{mi}$. Performance on Tucker decomposition: (Figure: Time (sec) vs. n for TensorToolbox, BTAS, Cyclops, CPU Batched and GPU Batched.)
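The Tucker model and the three contractions above map directly onto einsum index strings. The sketch below only illustrates the index patterns and output shapes. It uses fixed random factors rather than the iterates $A^{t+1}$, $B^{t+1}$, and it omits the factor-update step between contractions, so it is not a full decomposition algorithm. All sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 6, 5, 4     # tensor dimensions (assumed)
r1, r2, r3 = 3, 2, 2  # core dimensions (assumed)

G = rng.standard_normal((r1, r2, r3))  # core G_ijk
A = rng.standard_normal((m, r1))       # factor A_mi
B = rng.standard_normal((n, r2))       # factor B_nj
C = rng.standard_normal((p, r3))       # factor C_pk

# Tucker model: T_mnp = G_ijk A_mi B_nj C_pk, summed over i, j, k.
T = np.einsum("ijk,mi,nj,pk->mnp", G, A, B, C)

# The three contractions from the slide, each leaving one mode of T uncontracted.
Y_mjk = np.einsum("mnp,nj,pk->mjk", T, B, C)
Y_ink = np.einsum("mnp,mi,pk->ink", T, A, C)
Y_ijp = np.einsum("mnp,nj,mi->ijp", T, B, A)
```

Each step contracts two of the three modes of T with factor matrices. These are the contractions that the batched kernels from the earlier slides are meant to accelerate.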
  • 32. Tensor Sketches Randomized dimensionality reduction through sketching. ◮ Complexity independent of tensor order: exponential gain! (Figure: tensor T mapped to sketch s with random signs +1, +1, -1.) Applications: Tensor Decomposition via Sketching; Visual Question and Answering. (Figure: VQA pipeline with CNN and RNN inputs for the question "What is the mustache made of?", dimension labels C, W, H and L, an MCT block, then Avgpooling, FC, Relu, BatchNorm, FC and Softmax, with answer "Banana".)
  • 33. MCT in Visual Question & Answering (Figure: CNN and RNN inputs for the question "What is the mustache made of?", followed by FC, Relu, FC and Softmax layers, with answer "Banana".)
  • 34. Multimodal Tensor Pooling (Figure labels: text feature; image feature; dimensions C, W, H, L; spatial sketch; count sketch; 3D FFT; 1D FFT; 3D IFFT (optional); sketch dimensions d1, d2, d3, d4.)
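The pairing of count sketches with FFTs in this figure rests on a standard identity: the count sketch of an outer product equals the circular convolution of the two individual count sketches, which an FFT computes cheaply. The following sketch checks that identity on two plain vectors. It is a simplified stand-in, not the pooling layer from the talk, and `make_hash` and `count_sketch` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, b = 8, 7, 16  # input lengths and sketch length (assumed)


def make_hash(n, b, rng):
    """Random bucket index and random sign for each of the n coordinates."""
    return rng.integers(0, b, size=n), rng.choice([-1.0, 1.0], size=n)


def count_sketch(x, h, s, b):
    """Add each signed coordinate s[i] * x[i] into bucket h[i]."""
    out = np.zeros(b)
    np.add.at(out, h, s * x)
    return out


h1, s1 = make_hash(n1, b, rng)
h2, s2 = make_hash(n2, b, rng)
x = rng.standard_normal(n1)
y = rng.standard_normal(n2)

# Direct sketch of the outer product x y^T, touching all n1 * n2 entries.
direct = np.zeros(b)
for i in range(n1):
    for j in range(n2):
        direct[(h1[i] + h2[j]) % b] += s1[i] * s2[j] * x[i] * y[j]

# Same sketch from the two small sketches: multiply in the Fourier domain.
fx = np.fft.fft(count_sketch(x, h1, s1, b))
fy = np.fft.fft(count_sketch(y, h2, s2, b))
fast = np.fft.ifft(fx * fy).real

assert np.allclose(direct, fast)
```

The fast path never forms the outer product. Its cost depends on the sketch length b and the input lengths, not on their product.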
  • 36. Extracting Topics from Documents (Figure: documents with words such as police, witness, campus; topics Crime, Sports, Education; topic proportions.) A., D. P. Foster, D. Hsu, S.M. Kakade, Y.K. Liu. "Two SVDs Suffice: Spectral decompositions for probabilistic topic modeling and latent Dirichlet allocation," NIPS 2012.
  • 37. Tensor Methods for Topic Modeling Topic-word matrix: P[word = i | topic = j], with linearly independent columns. Moment tensor: co-occurrence of word triplets, written as a sum of rank-one terms, one per topic. (Figure: moment tensor over the words campus, police, witness decomposed into components for the topics Crime, Sports, Education.)
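As a concrete illustration of the moment tensor, the sketch below builds it for a simplified model in which each document has a single topic and words are drawn independently given that topic. That simplification is an assumption made here for brevity, and the topic-word matrix and topic probabilities are random placeholders, not data from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, topics = 6, 3  # assumed sizes

# Hypothetical topic-word matrix: column j holds P[word = i | topic = j].
A = rng.random((vocab, topics))
A /= A.sum(axis=0, keepdims=True)

# Hypothetical topic probabilities.
w = np.array([0.5, 0.3, 0.2])

# Word-triplet co-occurrence: M3[a, b, c] = sum_j w_j A[a, j] A[b, j] A[c, j].
M3 = np.einsum("j,aj,bj,cj->abc", w, A, A, A)

# The same tensor as an explicit sum of rank-one terms, one per topic.
rank_one_sum = sum(
    w[j] * np.multiply.outer(np.multiply.outer(A[:, j], A[:, j]), A[:, j])
    for j in range(topics)
)
assert np.allclose(M3, rank_one_sum)
```

Tensor methods run this construction in reverse: they estimate the moment tensor from observed word triplets and decompose it to recover the columns of the topic-word matrix.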
  • 38. Tensors vs. Variational Inference Criterion: Perplexity = exp[−likelihood]. Learning topics from PubMed on Spark, 8 million articles. (Figure: perplexity vs. running time for Tensor and Variational.) Learning network communities from social network data: Facebook n ∼ 20k, Yelp n ∼ 40k, DBLP-sub n ∼ 1e5, DBLP n ∼ 1e6. (Figures: running time and error on FB, YP, DBLPsub and DBLP.) F. Huang, U.N. Niranjan, M. Hakeem, A., "Online tensor methods for training latent variable models," JMLR 2014.
  • 39. Tensors vs. Variational Inference Criterion: Perplexity = exp[−likelihood]. Learning topics from PubMed on Spark, 8 million articles. (Figure: perplexity vs. running time for Tensor and Variational.) Learning network communities from social network data: Facebook n ∼ 20k, Yelp n ∼ 40k, DBLP-sub n ∼ 1e5, DBLP n ∼ 1e6. (Figures: running time and error on FB, YP, DBLPsub and DBLP.) Orders of magnitude faster, more accurate. F. Huang, U.N. Niranjan, M. Hakeem, A., "Online tensor methods for training latent variable models," JMLR 2014.
  • 40. Outline 1 Introduction 2 Distributed Deep Learning Using Mxnet 3 Learning in Multiple Dimensions 4 Conclusion
  • 41. Conclusion Distributed Deep Learning at Scale: MXNet has many attractive features ◮ Flexible programming ◮ Portable ◮ Highly efficient. Easy to deploy large-scale DL on AWS cloud ◮ Deep Learning AMI ◮ CloudFormation templates. Tensors are the future of ML: Tensor contractions: space savings in deep architectures. New primitives speed up tensor contractions: extended BLAS.