Introduction to Chainer
Seiya Tokui, July 20, 2016
Outline
• General concepts of deep learning frameworks
• Basics of Chainer (based on v1.11)
• Walk through MNIST example
General concepts
Considerations and internals of deep learning frameworks
Why are frameworks needed for
neural networks (NN)?
NN research requirements:
• Flexibility in defining NN architectures
• Fast trial-and-error
• In other words, we want to automate the programming effort as much as possible
We want to automate
• Automatic differentiation
• CUDA programming
• Efficient execution (computational optimization)
• Boilerplate in training loops
Core concept: computational graph (CG)
• Directed acyclic graph (DAG) that represents how to
compute output from input
• Two types of representations
• Data-operation graph (Theano, Chainer)
• Operation graph (a.k.a. data-flow graph) (TensorFlow, Torch.nn)
Example: CG (data-operation graph) for least-squares linear regression,
with matmul, +, and MSE nodes (MSE: Mean Squared Error)
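To make the figure concrete, here is a minimal NumPy sketch of the computation this graph represents (shapes and data are my own choice for illustration):

import numpy as np

W = np.random.randn(2, 3)            # weight matrix
b = np.random.randn(2)               # bias
x = np.random.randn(3)               # input
y = np.random.randn(2)               # target
y_hat = np.dot(W, x) + b             # matmul node, then + node
loss = np.mean((y_hat - y) ** 2)     # MSE node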
Automatic differentiation over a computational graph
• The gradient can be factorized by the chain rule
• Reducing the product of Jacobians one factor at a time is automatic
differentiation
(Figure: the matmul + MSE graph, where each factor in the chain-rule product is a Jacobian.)
Backpropagation over a computational graph
Usually the inputs (W, b) are larger than the output (loss), so
reducing the product in a right-fold manner is computationally efficient.
→ Backpropagation (reverse-mode AD)
Left-fold computation is called forward-mode AD, or Real-Time
Recurrent Learning in the RNN literature
(Figure: the same matmul + MSE graph; all factors in the product are Jacobians.)
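As a worked illustration of the right-fold reduction (hand-derived NumPy, not Chainer code; shapes are my own choice), reverse-mode AD propagates vector-Jacobian products from the loss back to the inputs:

import numpy as np

W = np.random.randn(2, 3); x = np.random.randn(3)
b = np.random.randn(2);    y = np.random.randn(2)

# Forward pass
a = np.dot(W, x)                     # matmul
y_hat = a + b                        # +
loss = np.mean((y_hat - y) ** 2)     # MSE

# Reverse pass: right-fold over the Jacobians
g_y_hat = 2 * (y_hat - y) / y.size   # d loss / d y_hat
g_b = g_y_hat                        # '+' passes the gradient through
g_a = g_y_hat
g_W = np.outer(g_a, x)               # d loss / d W, since a = W x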
Deep learning framework stack
“Backend”:
• Device-specific general routines (e.g. BLAS, CUDA, OpenCL, cuBLAS)
• Device-specific DL routines (e.g. cuDNN)
• Multi-dimensional array (e.g. Eigen::Tensor, NumPy, CuPy)
“Low level API”:
• Computational graph implementation
“High level API”:
• Neural network abstraction
• Training/evaluation loop implementation
Deep learning framework stack
(Diagram: the same stack annotated with example frameworks. Theano and TensorFlow implement the computational-graph layer, with Keras and TFLearn built on top of them; Chainer spans the graph and higher layers on top of CuPy; Torch.nn sits on top of cuTorch.)
Basics of Chainer
Based on v1.11
Chainer
• Open source framework for neural networks
• First release: v1.0.0 (June, 2015)
• Latest release: v1.11.0 (July, 2016)
URLs:
• GitHub: https://github.com/pfnet/chainer
• Official site: http://chainer.org/
• Documentation: http://docs.chainer.org/
• Forum
• English: https://groups.google.com/forum/#!forum/chainer
• Japanese: https://groups.google.com/forum/#!forum/chainer-jp
Basic information
Chainer is a Python framework for neural networks.
Components:
• Backend / Low-level API
• CuPy: GPU array library with NumPy-subset interface
• Dynamic computational graph
• High-level API
• Model composition (links and chains)
• Dataset abstraction
• Training loop abstraction
Installation
In your Python environment (2.7 or 3.5), type
pip install chainer
• Enable GPU: If CUDA is installed and paths (CPATH,
LD_LIBRARY_PATH) are properly set, the above command
also installs GPU support (including CuPy)
• NOTE: install Chainer AFTER configuring CUDA!
• When your CUDA is updated, don’t forget to reinstall Chainer
• Sometimes you might have to delete the CUDA kernel cache at
$HOME/.cupy (just rm -r it)
• Chainer also supports cuDNN v2 – v5.1
How to learn Chainer
Official tutorial
• Part of the official document (http://docs.chainer.org)
Official examples
• examples directory in the official repository
(https://github.com/pfnet/chainer)
Computational graph (CG) in Chainer
Chainer is based on dynamic CG construction
• CG is not built beforehand for the forward computation
• The forward computation is written like a regular program on
Variable and Function
• Variable remembers the history of computation = CG
• Backpropagation is done by walking through this CG
CG in Chainer: example

import chainer, chainer.functions as F
import numpy as np

# Input definition
# (use Variable if you want to extract grad; see the next slide)
W = chainer.Variable(np.array(...))
b = chainer.Variable(np.array(...))
x = np.array(...)
y = np.array(...)

# Forward computation (done on-the-fly), building the matmul + MSE graph
a = F.matmul(W, x)
y_hat = a + b
ell = F.mean_squared_error(y, y_hat)
print(ell.data)  # => print the computed error
Backpropagation in Chainer
• backward() executes backpropagation along the history of
forward computations
• Gradient w.r.t. terminal nodes can be extracted
(for non-terminal nodes, pass retain_grad=True to the
backward method)
a = F.matmul(W, x)
y_hat = a + b
ell = F.mean_squared_error(y, y_hat)
ell.backward() # Compute gradient of the error
print(W.grad) # => print the gradient w.r.t. W
print(b.grad) # => print the gradient w.r.t. b
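Putting the two snippets together, a self-contained version looks like this (a sketch: the concrete shapes and random data are my own choice):

import chainer, chainer.functions as F
import numpy as np

W = chainer.Variable(np.random.randn(2, 3).astype(np.float32))
b = chainer.Variable(np.random.randn(2, 1).astype(np.float32))
x = np.random.randn(3, 1).astype(np.float32)
y = np.random.randn(2, 1).astype(np.float32)

a = F.matmul(W, x)                    # (2, 3) x (3, 1) -> (2, 1)
y_hat = a + b
ell = F.mean_squared_error(y, y_hat)  # scalar loss
ell.backward()                        # walk the recorded CG backwards
print(W.grad.shape, b.grad.shape)     # => (2, 3) (2, 1)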
Pre-built CG vs On-the-fly CG
• Most other frameworks use “pre-built CG”
• CGs are built by scripting or from static data (e.g. prototxt)
• They are executed multiple times after the construction
• We call this paradigm “Define-and-Run”
• Pro: easy to cache the graph optimization
• Con: hard to define different graphs at every iteration, unintuitive, hard
to debug
• Chainer adopts “on-the-fly CG”
• CGs are built simultaneously with the forward computation
• The graph is only used to derive automatic differentiation
• We call this paradigm “Define-by-Run”
• Pro: flexibility, intuitiveness, easy to debug
• Con: hard to cache the graph optimization
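A small sketch of what Define-by-Run buys (the variable-depth network below is my own illustration): ordinary Python control flow shapes the graph, so each call may build a different CG.

import chainer.functions as F
import numpy as np

def forward(x, n_steps):
    # The number of nodes in the CG depends on runtime data (n_steps)
    h = x
    for _ in range(n_steps):
        h = F.tanh(h)
    return h

h = forward(np.random.randn(1, 4).astype(np.float32), n_steps=3)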
Defining neural network models
• In Object-Oriented Programming (OOP), we often bind code
to data
• In NN programming, we want to bind the forward
computation with parameters
• We can use Link and Chain for that purpose
• Link: a Function with parameters
• Chain: a forward computation routine that combines one or
more child links
• Chain itself is a link, so we can nest chains, resulting in a hierarchy
of links/chains
Example: multi-layer perceptron

import chainer, chainer.functions as F, chainer.links as L
import numpy as np

class MLP(chainer.Chain):
    def __init__(self):
        super().__init__(
            l1=L.Linear(100, 10),
            l2=L.Linear(10, 1),
        )

    def __call__(self, x):
        h = F.tanh(self.l1(x))
        return self.l2(h)

(Figure: the MLP chain contains two Linear links, each computing Wx+b; the forward pass is Linear → tanh → Linear.)
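Usage is plain Python (a sketch; the batch shape is my own choice):

mlp = MLP()
x = np.random.randn(8, 100).astype(np.float32)  # batch of 8 examples
y = mlp(x)                                      # Variable of shape (8, 1)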
Example: Classifier

We often implement a chain that defines the loss function like this:

class Classifier(chainer.Chain):
    def __init__(self, predictor):
        super().__init__(predictor=predictor)  # register as a child link

    def __call__(self, x, y):
        y_hat = self.predictor(x)
        loss = F.softmax_cross_entropy(y_hat, y)
        accuracy = F.accuracy(y_hat, y)
        # Report the performance to Reporter
        chainer.report({'loss': loss, 'accuracy': accuracy}, self)
        return loss
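For example (a sketch; the predictor and the label array below are my own stand-ins):

model = Classifier(L.Linear(100, 10))  # any link/chain producing class scores
x = np.random.randn(8, 100).astype(np.float32)
t = np.random.randint(0, 10, size=8).astype(np.int32)  # integer class labels
loss = model(x, t)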
Built-in functions and links
There are many Functions and Links provided by Chainer
Popular examples (see the reference manual for the full list):
• Layers with parameters:
Linear, Convolution2D, Deconvolution2D, EmbedID
• Activation functions and recurrent layers:
sigmoid, tanh, relu, maxout, LSTM, GRU
• Loss functions:
softmax_cross_entropy, mean_squared_error,
connectionist_temporal_classification
• Other NN components:
dropout, BatchNormalization
Many array/math functions are also supported
(mostly from NumPy)
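A few of these in action (a minimal sketch; shapes and the dropout ratio are my own choice):

import chainer.functions as F, chainer.links as L
import numpy as np

layer = L.Linear(32, 16)                    # layer with parameters
x = np.random.randn(8, 32).astype(np.float32)
h = F.dropout(F.relu(layer(x)), ratio=0.5)  # activation + regularization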
Numerical optimization
• NNs are often trained with online gradient methods
• Stochastic Gradient Descent (SGD), momentum SGD, AdaGrad,
RMSprop, Adam, etc.
• Most of these optimizers need state vectors besides the parameters
(e.g. momentum, moving average of squared gradient, etc.)
• Chainer provides Optimizer to abstract the optimization
routines
• Examples are provided in chainer.optimizers
Numerical optimization
• Optimizer accepts target link to optimize
• It enumerates the parameters in the link hierarchy, and
prepares the corresponding state vectors
• Pass a loss function and its arguments to update the
parameters (the optimizer runs forward prop and backprop,
and then updates the parameters)
model = ... # chain
optimizer = chainer.optimizers.MomentumSGD()
optimizer.setup(model)
def loss_fun(x, y): ...
optimizer.update(loss_fun, x, y)
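Filled in end to end (a sketch; the target link, loss, and data are my own stand-ins):

import chainer, chainer.functions as F, chainer.links as L
import numpy as np

model = L.Linear(3, 2)                 # any link/chain can be the target
optimizer = chainer.optimizers.MomentumSGD(lr=0.01)
optimizer.setup(model)

def loss_fun(x, y):
    return F.mean_squared_error(model(x), y)

x = np.random.randn(5, 3).astype(np.float32)
y = np.random.randn(5, 2).astype(np.float32)
optimizer.update(loss_fun, x, y)       # forward, backward, then update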
Training loop
Training loop: a loop that implements the iterative updates of
parameters
Each iteration consists of the following procedures:
• Load a mini batch
• Forward/backward propagation
• Update parameters with Optimizer
• Track the current performance
• Save a checkpoint at regular intervals
We sometimes want to resume the training from a checkpoint
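Written by hand, such a loop looks roughly like this (a self-contained sketch with toy data; checkpointing and reporting are omitted, which is exactly what the Trainer below adds):

import chainer, chainer.functions as F, chainer.links as L
import numpy as np

model = L.Linear(4, 2)
optimizer = chainer.optimizers.SGD(lr=0.01)
optimizer.setup(model)

X = np.random.randn(1000, 4).astype(np.float32)  # toy dataset
Y = np.random.randn(1000, 2).astype(np.float32)

def loss_fun(x, y):
    return F.mean_squared_error(model(x), y)

batchsize = 100
for epoch in range(10):
    order = np.random.permutation(len(X))           # reshuffle every epoch
    for i in range(0, len(X), batchsize):
        idx = order[i:i + batchsize]
        optimizer.update(loss_fun, X[idx], Y[idx])  # forward/backward/update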
Training loop abstraction
(new feature of v1.11)
Trainer implements the training loop.
• Load a mini batch
• Forward/backward propagation
• Update parameters with Optimizer
• Track the current performance
• Save a checkpoint at regular intervals
(In the diagram, the first three items are handled by the Updater and the last two by Extensions.)
Updater: parameter update routine
• It updates parameters using a mini batch
• Dataset defines a set of training points
• Iterator defines how to iterate over the dataset
(Diagram: the Updater drives an Optimizer over the target Link, drawing mini batches from an Iterator over the Dataset.)
Built-in datasets, iterators, and updaters
Dataset
• MNIST, CIFAR10/100, Penn Tree Bank (word sequence)
• Also support random split and cross-validation splits
Iterators
• Random shuffling at each epoch (epoch = 1 sweep over the
whole dataset)
• MultiprocessIterator: parallel prefetch of next mini batch
Updaters
• StandardUpdater: standard implementation
• ParallelUpdater: multi-GPU data-parallel updater
Extensions to extend the training loop
Trainer’s training loop:
• Call updater.update()
• For each extension and its corresponding trigger:
      if trigger(trainer) == True:
          call extension(trainer)
Users can register extensions with a trainer
• An extension is called at every iteration unless its trigger returns False
• The trigger manages when to invoke the extension
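For example, an extension can be registered with an explicit trigger (a fragment, assuming a trainer object like the one built in the MNIST example later; LogReport is one of the built-ins on the next slide):

from chainer.training import extensions

trainer.extend(extensions.LogReport(), trigger=(1, 'epoch'))  # fire once per epoch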
Built-in extensions
• Evaluator: evaluate the performance of current models on
a validation set
• LogReport: Collect reports made by the models and output
them to a JSON file
• PrintReport: Print selected entries from the log to console
• ProgressBar: Print a progress bar of the training process
• snapshot: Save a snapshot of the trainer object
(users can resume training by deserializing the snapshot)
Custom extensions
Users can write their own extensions.
Two ways: inherit the Extension class, or use the
@make_extension decorator (the decorator is easier)
Example: learning rate decay by the inverse of iteration count
@chainer.training.make_extension()
def adjust_learning_rate(trainer):
    optimizer = trainer.updater.get_optimizer('main')
    optimizer.lr = 0.01 / optimizer.t
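The extension is then registered like any other (a fragment, assuming a trainer object):

trainer.extend(adjust_learning_rate)  # invoked every iteration by default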
MNIST example
• MNIST: hand-written digit image dataset
• 60,000 training images and 10,000 test images
• Each image has 28x28 = 784 pixels
• Each image is labeled with a digit (0-9)
• Often used as a “Hello world” of deep learning
• Task: label prediction (multiclass classification)
• We use a multi-layer perceptron, with softmax cross
entropy as the loss function
Definition of the multi-layer perceptron
class MLP(chainer.Chain):
    def __init__(self, n_in, n_units, n_out):
        super().__init__(
            l1=L.Linear(n_in, n_units),     # first layer
            l2=L.Linear(n_units, n_units),  # second layer
            l3=L.Linear(n_units, n_out),    # output layer
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)
Instantiate a classifier with the MLP
We wrap the MLP in the Classifier chain.
to_gpu is optional: it is needed only if you want to use a GPU.
Then, set up an optimizer for it.
model = L.Classifier(MLP(784, 1000, 10))
model.to_gpu(device=0) # Copy the model to GPU-0
optimizer = chainer.optimizers.Adam()
optimizer.setup(model)
Prepare the dataset and iterators
Chainer provides a utility function to download MNIST.
train and test are instances of TupleDataset.
Each example is a tuple of an image and a label.
Each image is a 784 dimensional float32 vector.
repeat=False means only one sweep over the test dataset is
needed for validation.
train, test = chainer.datasets.get_mnist()
train_iter = chainer.iterators.SerialIterator(train, batchsize=100)
test_iter = chainer.iterators.SerialIterator(
    test, batchsize=100, repeat=False, shuffle=False)
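Each dataset behaves like a sequence of (image, label) tuples, which is easy to check:

x, label = train[0]
print(x.shape, x.dtype, label)  # => (784,) float32 and an integer label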
Set up an updater and a trainer

We have to create an updater to use Trainer.

updater = training.StandardUpdater(train_iter, optimizer, device=0)
trainer = training.Trainer(updater, (10, 'epoch'), out='result')

Option: we can easily use data-parallel computation with multiple
GPUs by using ParallelUpdater.

updater = training.ParallelUpdater(
    train_iter, optimizer, devices={'main': 0, 'second': 1})
Add extensions

# Evaluate the model with the test set at the end of each epoch
trainer.extend(extensions.Evaluator(test_iter, model, device=0))
# Save a snapshot of the trainer at the end of each epoch
trainer.extend(extensions.snapshot())
# Collect performance, save it to a log file,
# and print some entries to the console
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(
    ['epoch', 'main/loss', 'validation/main/loss',
     'main/accuracy', 'validation/main/accuracy']))
# Print a progress bar
trainer.extend(extensions.ProgressBar())
Execute the training loop

Just call trainer.run().
DEMO: executing this trainer.
trainer.run()
Summary
• The computational graph is a crucial part of any deep learning
framework
• DL frameworks also contain high-level APIs
• Chainer provides both low-level and high-level APIs
• Today, I introduced the computational graph in Chainer and its
high-level APIs
• Almost any part of Chainer can be customized, so you can try the
unusual workflows sometimes seen in cutting-edge DL research