12. Apache MXNet
Programmable | Portable | High Performance
• Near-linear scaling across hundreds of GPUs
• Highly efficient models for mobile and IoT
• Simple syntax, multiple languages
Most Open | Best on AWS
• Optimized for deep learning on AWS
• Accepted into the Apache Incubator
13. Amazon Strategy | Apache MXNet
Integrate with AWS Services
• Bring scalable deep learning to AWS services such as Amazon EMR, AWS Lambda, and Amazon ECS.
Foundation for AI Services
• Amazon AI API services, internal AI research, and Amazon core AI development.
Leverage the Community
• The community brings velocity and innovation, with no single project owner or controller.
14. Deep Learning using MXNet @Amazon
• Applied Research
• Core Research
• Alexa
• Demand Forecasting
• Risk Analytics
• Search
• Recommendations
• AI Services | Rekognition, Lex, Polly
• Q&A Systems
• Supply Chain Optimization
• Advertising
• Machine Translation
• Video Content Analysis
• Robotics
• Lots of computer vision…
• Lots of NLP/NLU…
*Teams are either actively evaluating, in development, or transitioning to scale production
15. Amazon AI: Democratized Artificial Intelligence
AI Services: Amazon Rekognition, Amazon Polly, Amazon Lex (more to come in 2017)
AI Platform: Amazon Machine Learning, Amazon Elastic MapReduce, Spark & SparkML (more to come in 2017)
AI Engines: Apache MXNet, TensorFlow, Caffe, Theano, Keras, Torch, CNTK
Hardware: P2, ECS, Lambda, AWS Greengrass, FPGA, EMR/Spark (more to come in 2017)
16. Collaborations and Community
• 4th DL framework in popularity (outpacing Torch, CNTK and Theano)
• Diverse community (spans industry and academia)
[Chart: contributions by top contributors, scale 0–60,000, including Yutian Li, Liang Depeng, Tianjun Xiao, Yao Wang (AWS), Yizhi Liu, Sergey…, Tianqi Chen, Bing Su (Apple); as of 3/30/17]
[Chart: framework popularity, scale 0–200, in ascending order: Torch, CNTK, DL4J, Theano, Apache MXNet, Keras, Caffe, TensorFlow; as of 2/11/17]
17. Deep Learning Framework Comparison

Industry owner
• Apache MXNet: N/A – Apache community
• TensorFlow: Google
• Cognitive Toolkit: Microsoft

Programmability
• Apache MXNet: imperative and declarative
• TensorFlow: declarative only
• Cognitive Toolkit: declarative only

Language support
• Apache MXNet: R, Python, Scala, Julia, C++, JavaScript, Go, Matlab, and more
• TensorFlow: Python, C++; experimental Go and Java
• Cognitive Toolkit: Python, C++, BrainScript

Code length | AlexNet (Python)
• Apache MXNet: 44 sloc
• TensorFlow: 107 sloc (using TF.Slim)
• Cognitive Toolkit: 214 sloc

Memory footprint (LSTM)
• Apache MXNet: 2.6 GB
• TensorFlow: 7.2 GB
• Cognitive Toolkit: N/A

*sloc – source lines of code
22. Apache MXNet | The Basics
• NDArray: manipulate multi-dimensional arrays imperatively, command by command.
• Symbol: symbolic expressions for defining neural networks (declarative).
• Module: intermediate- and high-level interfaces for neural network training and inference.
• Loading Data: feeding data into training/inference programs.
• Mixed Programming: training algorithms developed using NDArrays in concert with Symbols.
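The "Loading Data" piece above is conceptually just an iterator over (data, label) mini-batches. A minimal NumPy stand-in might look like the sketch below; the array shapes, batch size, and function name are illustrative, not part of MXNet's actual data-iterator API.

```python
import numpy as np

def batch_iter(data, labels, batch_size):
    """Yield (data, labels) mini-batches, dropping any final partial batch."""
    for start in range(0, len(data) - batch_size + 1, batch_size):
        yield data[start:start + batch_size], labels[start:start + batch_size]

# Toy dataset: 100 samples of 20 features each, with integer class labels.
X = np.random.rand(100, 20)
y = np.random.randint(0, 10, size=100)

# 100 samples at batch size 32 gives 3 full batches; the last 4 samples are dropped.
batches = list(batch_iter(X, y, batch_size=32))
```

MXNet's own iterators (e.g. for MNIST) add shuffling, padding, and prefetching on top of this same batching pattern.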
23. Imperative Programming

import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1

Easy to tweak in Python.
PROS
• Straightforward and flexible.
• Takes advantage of language-native features (loops, conditionals, debuggers).
• E.g. NumPy, Matlab, Torch, …
CONS
• Hard to optimize.
24. Declarative Programming

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
      B=np.ones(10)*2)

[Computation graph: A and B feed a × node; its result feeds a + node with constant 1]

C can share memory with D, because C is deleted later.
PROS
• More chances for optimization.
• Crosses different languages.
• E.g. TensorFlow, Theano, Caffe.
CONS
• Less flexible.
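The compile/f snippet above is pseudocode. A minimal pure-Python sketch of the same deferred-evaluation idea, where the graph is built first and evaluated later, could look like this; the `Node` class and its `run` method are invented for illustration, not MXNet or TensorFlow API.

```python
import numpy as np

class Node:
    """A node in a tiny expression graph; evaluation is deferred until run()."""
    def __init__(self, op, inputs, name=None):
        self.op, self.inputs, self.name = op, inputs, name

    def __mul__(self, other):
        return Node('mul', [self, other])

    def __add__(self, other):
        # Supports Node + constant, as in D = C + 1.
        return Node('add', [self, other])

    def run(self, feed):
        if self.op == 'var':
            return feed[self.name]
        vals = [i.run(feed) if isinstance(i, Node) else i for i in self.inputs]
        if self.op == 'mul':
            return vals[0] * vals[1]
        if self.op == 'add':
            return vals[0] + vals[1]

def Variable(name):
    return Node('var', [], name=name)

# Building the graph performs no arithmetic; values flow only at run().
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
d = D.run({'A': np.ones(10), 'B': np.ones(10) * 2})
# Each element is (2 * 1) + 1 = 3.
```

Because a real compiler sees the whole graph before executing it, it can notice that C is only ever consumed by D and reuse C's buffer, which is exactly the memory-sharing optimization the slide describes.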
25. Mixed Programming Paradigm

IMPERATIVE NDARRAY API
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)

DECLARATIVE SYMBOLIC EXECUTOR
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()

An NDArray can be set as input to the graph.
26. Mixed Programming Paradigm
Embed symbolic expressions into imperative programming:

texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)
    texec.backward()
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad
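The symbolic executor above computes gradients, but the parameter update itself is plain imperative array arithmetic. A NumPy sketch of that SGD step makes it concrete; the parameter shapes and gradient values below are made up for illustration, and the paired lists stand in for what the slide's get_params()/get_grads() calls would return.

```python
import numpy as np

lr = 0.2  # same learning rate as the slide's update

# Hypothetical parameters and their gradients, as parallel lists of arrays
# (e.g. a 4x3 weight matrix and a length-3 bias vector).
params = [np.ones((4, 3)), np.zeros(3)]
grads  = [np.full((4, 3), 0.5), np.full(3, 0.5)]

# The slide's update, applied in place to each (param, grad) pair:
for param, grad in zip(params, grads):
    param -= lr * grad

# Each weight goes from 1.0 to 0.9, each bias from 0.0 to -0.1.
```

Keeping the update imperative is the point of the mixed paradigm: the graph handles the heavily optimized forward/backward passes, while the training logic stays as ordinary, easily tweaked host-language code.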
27. Amalgamation
• Fit the core library, with all dependencies, into a single C++ source file.
• Easy to compile on any platform; even runs in the browser via JavaScript.
[Image: BlindTool by Joseph Paul Cohen, demo on a Nexus 4]
28. Roadmap / Areas of Investment
• Usability
  • Keras integration / Gluon interface
  • MinPy being merged (dynamic computation graphs, standard NumPy interface)
  • Documentation (installation, native documents, etc.)
  • Tutorials and examples | Jupyter notebooks
• Platform support (Linux, Windows, OS X, mobile, …)
• Language bindings (Python, C++, R, Scala, Julia, JavaScript, …)
• Sparse datatypes and LSTM performance improvements
• Deploy your model your way: Lambda (+ Greengrass), Amazon EC2/Docker, Raspberry Pi
33. Training AI: Intel® Xeon® Scalable Processor (@IntelAI)
Best-in-class deep learning training performance: an accelerator for training compute density in deep-learning-centric environments.

Hardware for DL workloads
• Up to 2x better peak performance on compute-intensive analytics
• 100x improvement in inference performance on EC2 C5 instances*
• New C5 instances offer more computational power at lower cost, so customers do more with less

Blazingly fast data access
• New microarchitecture, hardware acceleration, Intel® AVX-512
• 50% more memory than the previous generation
• Novartis conducted 39 years of computational chemistry in 9 hours*

High-speed scalability
• Up to 1.73x faster completion of massively parallel research simulations than the previous generation
• Seamless data transfer via interconnects
34. Inference in the Cloud: Amazon & Intel®
Intel® Math Kernel Library for Deep Neural Networks: for developers of deep learning frameworks, featuring optimized performance on Intel hardware.

c4.8xlarge MXNet inference throughput (images/sec), without MKL → with MKL:
• AlexNet: 6.1 → 679.4
• GoogLeNet v1: 2.4 → 262.5
• ResNet-50: 1.2 → 79.7
• Inception v3: 0.8 → 73.9

• Up to 2x better peak performance on compute-intensive analytics
• 100x improvement in inference performance on EC2 C5 instances*
• Intel-optimized Caffe and Intel® MKL for high-performance distributed training and inference
• CloudFormation template with AWS services and EC2, CfnCluster, DynamoDB, EBS and Spot Instance support
• Classify text, train a convolutional neural network, and visualize the training with TensorBoard, using BigDL on AWS
35. Inference at the Edge: AWS & Intel®
Intel® IoT Gateway | AWS IoT Platform | Real-time analytics | Amazon EC2 X1
• Cost savings with scalability: end-to-end interoperability to scale applications and services
• Streamlined manageability and analytics: seamless data management and analytics from thing to network to cloud
• Multilayered, end-to-end security: a chain of trust rooted in the hardware and linked throughout the software
36. Libraries, Frameworks & Tools

Intel® Math Kernel Library (MKL)
• Overview: computation primitives; high-performance math primitives granting a low level of control.
• Primary audience: developers of higher-level libraries and applications.
• Example usage: framework developers call matrix multiplication and convolution functions.

Intel® MKL-DNN
• Overview: computation primitives; free, open-source DNN functions for high-velocity integration with deep learning frameworks.
• Primary audience: developers of the next generation of deep learning frameworks.
• Example usage: a new framework ships with functions developers call for maximum CPU performance.

Intel® MLSL
• Overview: communication primitives; building blocks to scale deep learning framework performance over a cluster.
• Primary audience: deep learning framework developers and optimizers.
• Example usage: a framework developer calls functions to distribute Caffe training compute across an Intel® Xeon Phi™ cluster.

Intel® Data Analytics Acceleration Library (DAAL)
• Overview: broad, object-oriented data analytics acceleration library supporting distributed ML at the algorithm level.
• Primary audience: the wider data analytics and ML audience; algorithm-level development for all stages of data analytics.
• Example usage: call a distributed alternating least squares algorithm for a recommendation system.

Intel® Distribution for Python
• Overview: the most popular and fastest-growing language for machine learning.
• Primary audience: application developers and data scientists.
• Example usage: call the scikit-learn k-means function for credit card fraud detection.

Open Source Frameworks
• Overview: toolkits driven by academia and industry for training machine learning algorithms.
• Primary audience: machine learning app developers, researchers, and data scientists.
• Example usage: script and train a convolutional neural network for image recognition.

Intel Deep Learning SDK
• Overview: accelerate deep learning model design, training, and deployment.
• Primary audience: application developers and data scientists.
• Example usage: deep learning training and model creation, with optimization for deployment on constrained end devices.

Intel® Computer Vision SDK
• Overview: toolkit to develop and deploy vision-oriented solutions that harness the full performance of Intel CPUs and SoC accelerators.
• Primary audience: developers who create vision-oriented solutions.
• Example usage: use deep learning to do pedestrian detection.
Find out more at software.intel.com/ai
38. One-Click GPU or CPU Deep Learning
AWS Deep Learning AMI
• Up to ~40k CUDA cores
• Apache MXNet, TensorFlow, Theano, Caffe, Torch, Keras
• Pre-configured CUDA drivers, MKL
• Anaconda, Python 3
• Ubuntu and Amazon Linux
• + AWS CloudFormation template
• + Container image
39. Application Examples | Jupyter Notebooks
• https://github.com/dmlc/mxnet-notebooks
• Basic concepts
• NDArray - multi-dimensional array computation
• Symbol - symbolic expression for neural networks
• Module - neural network training and inference
• Applications
• MNIST: recognize handwritten digits
• Check out the distributed training results
• Predict with pre-trained models
• LSTMs for sequence learning
• Recommender systems
• Train a state-of-the-art computer vision model (CNN)
• Lots more..
40. Call to Action
MXNet Resources:
• MXNet Blog Post | AWS Endorsement
• Read up on MXNet and Learn More: mxnet.io
• MXNet Github Repo
• MXNet Recommender Systems Talk | Leo Dirac
Developer Resources:
• Deep Learning AMI | Amazon Linux
• Deep Learning AMI | Ubuntu
• CloudFormation Template Instructions
• Deep Learning Benchmark
• MXNet on Lambda
• MXNet on ECS/Docker
• MXNet on Raspberry Pi | Image Detector using Inception Network