Deep Learning and Tensorflow Implementation (Deep Learning, TensorFlow, Python, CNN)_Myungyon Kim_SNU_SHRM

Seoul National University
Seoul National University System Health & Risk Management
Deep learning and Tensorflow implementation
Myungyon Kim
2016. 11. 16.

Contents
2016/11/16 - 2 -
• Feature Engineering
• Deep Neural Network
• Tensorflow
• Tensorflow Implementation
• Future Works
• References

Feature Engineering
• Data Representation
– Raw sensor data (e.g., vibration, acceleration, temperature)
: complex, high-dimensional, redundant, and noisy
→ hard to discover useful information and insights
– We need a good, suitable way to represent the data
• Feature engineering
– Process of creating and extracting features that represent the system well
– Fundamental to the application of machine learning algorithms
– The quality and quantity of the features greatly influence the results
– Based on physical and domain knowledge and the engineer's intuition

Feature Engineering
• Rotor team case
– Time-domain features
: kinetic-energy related: RMS, max, mean
: data-statistics related: kurtosis, skewness
: waveform related: crest factor, impulse factor
– Frequency-domain features: frequency center, RMS frequency, component ratio of 1x
– System-characteristic frequency features: gear mesh freq., sideband freq., harmonic freq.
(Figure: the feature types are grouped into statistical features and frequency features)
• Image processing
– Edge detection, corner detection, HoG (Histogram of Oriented Gradients)
https://en.wikipedia.org/wiki/Edge_detection https://en.wikipedia.org/wiki/Corner_detection http://www.mdpi.com/1424-8220/16/7/1134/htm
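The time-domain features listed for the rotor case are simple statistics of the raw signal. The sketch below computes them with NumPy. The function name is hypothetical. The definitions (population statistics, with the peak taken as the maximum absolute value) are common conventions assumed here, not taken from the slide.

```python
import numpy as np


def time_domain_features(x):
    """Handcrafted time-domain features of a 1-D vibration signal (sketch)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()                      # population standard deviation
    rms = np.sqrt(np.mean(x ** 2))       # kinetic-energy related
    peak = np.max(np.abs(x))
    return {
        "rms": rms,
        "max": x.max(),
        "mean": mu,
        "kurtosis": np.mean((x - mu) ** 4) / sigma ** 4,   # data-statistics related
        "skewness": np.mean((x - mu) ** 3) / sigma ** 3,
        "crest_factor": peak / rms,                         # waveform related
        "impulse_factor": peak / np.mean(np.abs(x)),
    }


# Usage: a pure 10 Hz sine sampled at 1 kHz for 1 s (hypothetical signal)
t = np.arange(1000) / 1000.0
feats = time_domain_features(np.sin(2 * np.pi * 10 * t))
print(feats)
```

For a pure sine the crest factor is about √2 and the kurtosis about 1.5. Impulsive fault signatures typically raise both values.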

Feature Engineering
• Manual Feature engineering
– “Coming up with features is difficult, time-consuming, requires expert
knowledge. ‘Applied machine learning’ is basically feature engineering”
- Andrew Ng (Stanford University, Chief Scientist at Baidu Research)
– Feature engineering (selecting and extracting features) is fundamental to machine learning, but it is difficult, tedious, and expensive
– For some applications, we may have no idea which features to use
• Problems of Current PHM Practices
– A considerable amount of human expertise and knowledge is required.
– Different systems and data require different feature engineering approaches
→ Features relevant to the diagnosis of one system may NOT be suitable for another system.

Feature Engineering
• Automated Feature Learning
– Manual feature engineering needs to be replaced with automated feature learning using deep learning
• Deep Learning (Deep Neural Network)
– Multiple processing layers with several linear and nonlinear transformations
– Learns and extracts hierarchical features automatically
→ Replaces handcrafted, manually extracted features
– Mimics the information processing and communication patterns of the nervous system in the human brain (inspired by advances in neuroscience)
– Various algorithms and architectures (DNN, CNN, DBN, RNN) *
– Applied to various fields (computer vision, speech recognition, natural language processing, bioinformatics)
* DNN: Deep Neural Network
CNN: Convolutional Neural Network
DBN: Deep Belief Network
RNN: Recurrent Neural Network

Deep Neural Network
• History of Neural Network
– Perceptron (Artificial Neural Network) – Frank Rosenblatt, 1957
– “Perceptrons” – Marvin Minsky, 1969
– Multilayer Neural Network (composition of perceptrons)
– Backpropagation: “Learning representations by back-propagating errors”, Nature, David Rumelhart, Geoffrey Hinton & Ronald Williams, 1986 / Paul Werbos, 1974
– Several problems in multilayer (deep) neural networks
: hard to train, vanishing gradient, local minima, overfitting
– Pre-training (greedy layer-wise pre-training using RBMs) *
→ a clever way to initialize weight values
– ReLU (Rectified Linear Unit)
– Dropout
– CNN (Convolutional Neural Networks)
– Computing power / GPU (Graphics Processing Unit)
– Large amounts of digital data
* RBM: Restricted Boltzmann Machine
(Figure: timeline marking the 1st and 2nd winters of neural network research)

Deep Neural Network
• Perceptron (Artificial Neural Network) – Frank Rosenblatt, 1957
– Simplest type of feedforward neural network
– Linear, binary classifier
– Training a neural network = obtaining suitable, correct weight and bias values
– Rosenblatt believed that, with sufficiently advanced hardware, his perceptron would eventually be able to classify/recognize anything
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
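(Figure: OR and AND on the inputs (0, 0), (0, 1), (1, 0), (1, 1); in both cases a single line separates the positive outputs from the negative ones)
– AND perceptron: w1 = 1, w2 = 1, θ = 1.5
– OR perceptron: w1 = 1, w2 = 1, θ = 0.5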
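The AND and OR perceptrons above can be checked directly with the given weights and thresholds. The following is a minimal plain-Python sketch. It assumes a step activation that fires when the weighted sum is strictly greater than θ.

```python
def perceptron(x1, x2, w1, w2, theta):
    """Step-activation perceptron: fires (1) when w1*x1 + w2*x2 exceeds theta."""
    return 1 if w1 * x1 + w2 * x2 > theta else 0


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_out = [perceptron(a, b, w1=1, w2=1, theta=1.5) for a, b in inputs]
or_out = [perceptron(a, b, w1=1, w2=1, theta=0.5) for a, b in inputs]
print(and_out)  # [0, 0, 0, 1]
print(or_out)   # [0, 1, 1, 1]
```

Only the threshold differs between the two gates. Both share the same linear decision function with a shifted boundary.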

Deep Neural Network
• Perceptron (Artificial Neural Network) – Frank Rosenblatt, 1957
Activation function
– Activated (input > threshold θ) or not (input < θ)
– Sigmoid function (activation function)
→ to train a neural network with gradient-based methods, the activation function should be differentiable
– $\mathrm{sig}(X) = \frac{1}{1 + e^{-X}}$
– Nonlinear activation function
→ “squashes” the linear net input into a fixed range ((0, 1) for the sigmoid) and adds nonlinear properties to the NN
– Allows networks to compute nontrivial problems using only a small number of nodes
https://commons.wikimedia.org/wiki/File:Sigmoid-function-2.svg
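Below is a small NumPy sketch of the sigmoid and its derivative. It uses the standard identity sig'(X) = sig(X)(1 − sig(X)), and the function names are illustrative.

```python
import numpy as np


def sigmoid(x):
    """sig(X) = 1 / (1 + exp(-X)); squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_derivative(x):
    """The sigmoid is differentiable everywhere: sig'(X) = sig(X) * (1 - sig(X))."""
    s = sigmoid(x)
    return s * (1.0 - s)


x = np.array([-10.0, 0.0, 10.0])
print(sigmoid(x))             # close to 0, exactly 0.5, close to 1
print(sigmoid_derivative(x))  # largest at X = 0, where it equals 0.25
```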

Deep Neural Network
• “Perceptrons” – Marvin Minsky, 1969
– A single-layer perceptron cannot solve nonlinear (not linearly separable) classification problems
– XOR (exclusive or): logical operation that outputs true only when the inputs differ
– To solve nonlinear problems, an MLP (multilayer perceptron, multilayer neural network) is needed
– However, it is very hard to train an MLP properly
http://www.aistudy.com/neural/multilayer_perceptron.htm
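(Figure: XOR on the inputs (0, 0), (0, 1), (1, 0), (1, 1); the positive outputs (0, 1) and (1, 0) lie on one diagonal, the negative outputs on the other)
→ cannot be solved using a single perceptron (linear classifier)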
https://www.amazon.com/Perceptrons-Introduction-Computational-Geometry-Expanded/dp/0262631113
→ 1st winter of neural network research

Deep Neural Network
• “Perceptrons” – Marvin Minsky, 1969
Example: neural networks for logical operations
http://toritris.weebly.com/perceptron-2-logical-operations.html
– AND, OR → a single-layer perceptron is sufficient
– XOR → a multilayer perceptron is needed
(*Different weights and thresholds can be used, e.g., the different values for AND and OR explained before)
(Figure: XOR on the four binary inputs; no single line separates the two classes)
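To make the XOR case concrete, a two-layer network can be built by hand from the AND and OR perceptrons of the earlier slide. The output weights (1, −1) and threshold 0.5 are one possible choice assumed for this sketch, not values from the source.

```python
def step_unit(inputs, weights, theta):
    """Perceptron unit: 1 if the weighted sum exceeds the threshold theta."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > theta else 0


def xor_mlp(x1, x2):
    # Hidden layer: an OR unit and an AND unit (weights/thresholds from the AND/OR slide)
    h_or = step_unit((x1, x2), (1, 1), theta=0.5)
    h_and = step_unit((x1, x2), (1, 1), theta=1.5)
    # Output layer: "OR but not AND" is exactly XOR
    return step_unit((h_or, h_and), (1, -1), theta=0.5)


print([xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

The hidden layer re-maps the inputs so that the classes become linearly separable for the output unit. A single perceptron cannot do this.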

Deep Neural Network
• Backpropagation
(“Learning representations by back-propagating errors”, Nature, Rumelhart, Hinton & Williams, 1986 / Paul Werbos, 1974)
– “Backward propagation of errors”
– Common method for training NNs with multiple layers
– Computes the error gradient with respect to each weight
→ using the chain rule, quantifies the influence of each weight on the final error
– Used together with an optimization method such as the gradient descent algorithm
(Figure: error gradient, i.e., calculation of the partial derivative of the cost with respect to a specific weight using the chain rule)
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

Deep Neural Network
• Backpropagation
– Procedure
1. Initialize the weights randomly
2. Forward propagation (through the neural network, to obtain the output and cost)
3. Backward propagation (influence of each weight on the error)
4. Weight update: $W := W - \alpha \frac{\partial}{\partial W} \mathrm{cost}(W)$
Repeat these steps until the performance of the network is satisfactory.
• Cost function (error)
$\mathrm{Cost} = \frac{1}{2m} \sum_{i=1}^{m} \left( \mathrm{target}_i - \mathrm{output}_i \right)^2$
• Gradient descent algorithm
– Finds W, b that minimize the cost using the delta rule
– Used in many minimization problems
– For a convex cost function → converges to the global minimum (given a suitable learning rate α)
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
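The four-step procedure and the cost function above can be sketched in NumPy for a network with one sigmoid hidden layer. The layer sizes, learning rate, step count and XOR training data are assumptions for illustration. The gradients come from the chain rule and feed the weight-update rule W := W − α ∂cost/∂W.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, X):
    W1, b1, W2, b2 = params
    h = sigmoid(X @ W1 + b1)            # hidden-layer activations
    out = sigmoid(h @ W2 + b2)          # network output
    return h, out


def cost(params, X, T):
    """Cost = 1/(2m) * sum_i (target_i - output_i)^2."""
    _, out = forward(params, X)
    return float(np.sum((T - out) ** 2) / (2 * X.shape[0]))


def backprop(params, X, T):
    """Gradient of the cost w.r.t. every weight and bias, via the chain rule."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    h, out = forward(params, X)
    delta2 = -(T - out) / m * out * (1 - out)      # dCost/dz at the output layer
    delta1 = (delta2 @ W2.T) * h * (1 - h)         # error propagated back to the hidden layer
    return [X.T @ delta1, delta1.sum(axis=0), h.T @ delta2, delta2.sum(axis=0)]


def train(X, T, n_hidden=4, alpha=1.0, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize the weights randomly (biases start at zero)
    params = [rng.normal(0, 1, (X.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.normal(0, 1, (n_hidden, T.shape[1])), np.zeros(T.shape[1])]
    for _ in range(steps):
        grads = backprop(params, X, T)             # 2.-3. forward and backward propagation
        # 4. Weight update: W := W - alpha * dCost/dW
        params = [p - alpha * g for p, g in zip(params, grads)]
    return params


# Usage on XOR (inputs and targets assumed for illustration)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
before = cost(train(X, T, steps=0), X, T)   # same random start, no updates
after = cost(train(X, T), X, T)
print(before, after)                        # the cost should decrease
```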

Deep Neural Network
• Several problems in multilayer, deep Neural Network
1. Vanishing gradient
– Error gradient: product of local gradients in the backward direction
→ for early layers, the error gradients vanish
– Backpropagation fails to train the earlier-layer parameters properly
– Early layers are responsible for detecting simple patterns and building blocks (e.g., edges in facial recognition)
→ when early layers are not trained properly, the result will be inaccurate
https://www.youtube.com/watch?v=E5a3nDpaXjw
http://kawahara.ca/how-to-compute-the-derivative-of-a-sigmoid-function-fully-worked-example/
(Figure: derivative of the sigmoid function; its maximum value is 0.25)
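The shrinking of error gradients can be illustrated with a chain of sigmoid units. For simplicity, this sketch assumes that every layer has the same hypothetical weight w and pre-activation z. Even in the best case (w = 1, z = 0), each layer multiplies the gradient by 0.25.

```python
import numpy as np


def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)            # at most 0.25, reached at z = 0


def backprop_factor(depth, z=0.0, w=1.0):
    """Chain-rule factor that reaches the first layer of `depth` stacked sigmoid units.

    Each layer contributes w * sigmoid'(z); all layers share the same
    (hypothetical) weight w and pre-activation z for simplicity.
    """
    return (w * sigmoid_grad(z)) ** depth


for depth in (1, 5, 10):
    print(depth, backprop_factor(depth))   # 0.25, about 9.8e-4, about 9.5e-7
```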

Deep Neural Network
2. Local minima
– When the gradient descent algorithm is used to train a deep neural network, it can get stuck in a local minimum
– The optimal solution may then not be obtained
https://www.toptal.com/machine-learning/an-introduction-to-deep-learning-from-perceptrons-to-deep-networks
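A one-dimensional example shows how gradient descent can end in a local minimum. The function f(x) = x^4 − 3x^2 + x, the learning rate and the starting points are arbitrary choices for this sketch.

```python
def f(x):
    return x ** 4 - 3 * x ** 2 + x        # non-convex: two separate minima


def df(x):
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x


right = gradient_descent(2.0)    # stops in the local minimum near x = 1.13
left = gradient_descent(-2.0)    # reaches the global minimum near x = -1.30
print(right, f(right))
print(left, f(left))
```

The algorithm is identical in both runs, and only the starting point decides which minimum is found.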

Deep Neural Network
3. Overfitting
– Complex model: too many parameters relative to the number of observations
– Learns not only the true relationship but also noise and random errors
– Overreacts to minor fluctuations in the training data
→ poor predictive performance
4. Hard to train correctly, and training takes a long time
→ other machine learning algorithms, such as support vector machines, were used instead
http://www.slideshare.net/fcollova/introduction-to-neural-network
(Figure: under-fitting, just right, overfitting)
→ 2nd winter of neural network research
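Overfitting can be reproduced with polynomial regression, used here as a stand-in for a network with too many parameters. The sine target, noise level and polynomial degrees are assumptions for this sketch. A degree-9 polynomial through 10 noisy points fits the training data essentially perfectly. Its error on noise-free test points can then be compared with the lower-degree fits.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 10)
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.2, x_train.size)   # true relation + noise
x_test = np.linspace(-1, 1, 101)
y_test = np.sin(np.pi * x_test)                                         # noise-free truth


def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


for degree in (1, 3, 9):
    c = np.polyfit(x_train, y_train, degree)
    print(degree, "train:", mse(c, x_train, y_train), "test:", mse(c, x_test, y_test))
```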

Deep Neural Network
• Pre-training using RBM – Geoffrey Hinton
*a clever way to initialize weight values
– RBM: a restricted, special case of the Boltzmann machine; an undirected, generative, energy-based model with a visible input layer and a hidden layer, with connections between the layers but not within a layer
– Greedy, layer-wise training* of RBMs and stacking them (*Bengio, 2007)
→ initializes the weights of the DNN well (pre-training)
→ faster convergence of the fine-tuning and improved performance
– 1. Pre-training: learn generally useful feature detectors (RBM, AE)
2. Fine-tuning: the whole network is trained further by supervised backpropagation (BP)
* AE: Autoencoder
http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/DeepBeliefNetworks
(Figure: RBM)
http://deeplearning.net/tutorial/rbm.html
(Figure: layer-wise training of a DBN; DBN: Deep Belief Network)
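The slide does not say how each RBM layer is trained. A common choice, introduced by Hinton, is one-step contrastive divergence (CD-1). The NumPy sketch below trains a single binary RBM and assumes CD-1, toy data, and arbitrary layer sizes and learning rate. Stacking several such RBMs would give the greedy, layer-wise pre-training described above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cd1_step(W, b_vis, b_hid, v0, lr, rng):
    """One contrastive-divergence (CD-1) update of a binary RBM, applied in place.

    W: (n_visible, n_hidden) weights between the layers (no within-layer weights).
    Returns the mean squared reconstruction error for monitoring.
    """
    p_h0 = sigmoid(v0 @ W + b_hid)                       # positive phase
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # sample hidden states
    p_v1 = sigmoid(h0 @ W.T + b_vis)                     # reconstruction of the input
    p_h1 = sigmoid(p_v1 @ W + b_hid)                     # negative phase
    m = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / m
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return float(np.mean((v0 - p_v1) ** 2))


rng = np.random.default_rng(0)
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 8, dtype=float)  # hypothetical binary patterns
W = rng.normal(0, 0.01, (4, 3))
b_vis, b_hid = np.zeros(4), np.zeros(3)
errors = [cd1_step(W, b_vis, b_hid, data, lr=0.1, rng=rng) for _ in range(3000)]
print(errors[0], errors[-1])   # reconstruction error should drop during training
```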

Deep Neural Network
• ReLU (Rectified Linear Unit, Rectifier)
– Activation function defined as
$f(x) = \max(0, x)$
– Powerful activation function that replaces the sigmoid function
– Sigmoid: derivative is at most 0.25 → vanishing gradient problem
ReLU: derivative is 0 or 1 → for active units the error is passed back unchanged (100%), so the gradient does not vanish
– Sparse activation → fast and effective training of DNNs on large datasets
→ no need for unsupervised pre-training (RBM, AE)
http://nn.readthedocs.io/en/latest/transfer/
(Figure: ReLU and its derivative)
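Below is a NumPy sketch of ReLU and its derivative. The value of the derivative at exactly x = 0 is a convention, and 0 is assumed here. With zero-mean random pre-activations, roughly half the units output 0, which is the sparse activation mentioned above.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)            # f(x) = max(0, x)


def relu_grad(x):
    return (x > 0).astype(float)         # 1 for active units, 0 otherwise (0 chosen at x = 0)


rng = np.random.default_rng(0)
z = rng.normal(size=10_000)              # hypothetical pre-activations
print(relu(np.array([-2.0, 0.0, 3.0])))       # [0. 0. 3.]
print(relu_grad(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 1.]
print((relu(z) == 0).mean())             # about half the units are inactive (sparse)
```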

Deep Neural Network
• Dropout
– Regularization technique: prevents co-adaptation on training data
– At each training step, individual nodes are either “dropped out” of the network with probability 1 − 𝑝 or kept with probability 𝑝 → a reduced network (fewer parameters)
– By not training all nodes on all training data, dropout reduces overfitting in NNs → can be thought of as an ensemble of smaller NNs
– Each step trains a smaller, thinned network; however, the paper below notes that a dropout network typically takes 2 to 3 times longer to train than a standard one
– Reduces tightly fitted interactions between nodes
→ learns more robust features that generalize better to new data
Dropout: A Simple Way to Prevent Neural Networks from Overfitting, N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Journal of Machine Learning Research 15 (2014)
(Figure: dropout)
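Below is a minimal NumPy sketch of dropout with keep probability p. It uses the "inverted dropout" variant common in current libraries, which rescales kept units by 1/p during training. The JMLR paper instead multiplies the weights by p at test time. Both approaches keep the expected activation the same.

```python
import numpy as np


def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout: keep each unit with probability keep_prob during training.

    Kept activations are scaled by 1/keep_prob so their expected value is
    unchanged, which lets the layer be used as-is at test time.
    """
    if not training:
        return activations
    mask = rng.random(activations.shape) < keep_prob    # True = unit kept
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
a = np.ones((1000, 100))                      # hypothetical layer activations
out = dropout(a, keep_prob=0.5, rng=rng)
print((out == 0).mean())   # fraction dropped, close to 1 - keep_prob
print(out.mean())          # close to 1.0 thanks to the 1/keep_prob scaling
```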

Deep Neural Network
• Convolutional Neural Network (CNN)
– A type of feed-forward artificial neural network
– Inspired by the connectivity pattern between neurons of the visual cortex*
– Individual cortical neurons respond to stimuli only in a restricted region of space (the receptive field)
– The receptive fields of different neurons partially overlap so that they tile the visual field → mathematically, a convolution operation
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
*visual cortex: the visual area of the cerebral cortex
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
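The local receptive field and the convolution operation can be written out directly. This NumPy sketch assumes a single-channel image, stride 1 and no padding, and the filter values are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and take dot products.

    Like most CNN libraries, this computes cross-correlation (the kernel is not
    flipped); each output value depends only on one local receptive field.
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])            # hypothetical horizontal-difference filter
print(conv2d_valid(image, edge))          # every entry is -1: horizontal neighbours differ by 1
```

The same small kernel is reused at every position. This is the weight sharing discussed next.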

Deep Neural Network
MLP
– Multilayer perceptrons suffer from the curse of dimensionality due to full connectivity between nodes (large number of weights)
– They do not take the spatial structure of the data into account
→ input pixels that are far apart and those that are close together are treated the same way
– Full connectivity of neurons is wasteful for image recognition, and the huge number of parameters quickly leads to overfitting
CNN
– Mitigates these challenges of the MLP architecture by exploiting the spatially local correlation present in images (local connectivity)
– Each filter (set of weights) is shared across the entire visual field
→ weight sharing dramatically reduces the number of parameters, lowering the memory requirements and training time.
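The effect of weight sharing on the parameter count can be checked with simple arithmetic. The layer below maps a 28x28 grayscale input to 32 feature maps of 28x28 using 5x5 filters. It is a hypothetical example chosen to match the MNIST sizes used later.

```python
def fc_params(n_in, n_out):
    return n_in * n_out + n_out                 # one weight per connection + biases


def conv_params(filter_h, filter_w, in_channels, n_filters):
    return filter_h * filter_w * in_channels * n_filters + n_filters   # shared filters + biases


# Hypothetical layer: 28x28 grayscale input -> 32 feature maps of size 28x28
fully_connected = fc_params(28 * 28, 28 * 28 * 32)
convolutional = conv_params(5, 5, 1, 32)
print(fully_connected)   # 19694080
print(convolutional)     # 832
```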

Deep Neural Network
MLP (deep neural network): nodes are fully connected to the adjacent layer
CNN: locally connected (receptive field)
http://neuralnetworksanddeeplearning.com/chap6.html

Deep Neural Network
• Structure of CNN
– Convolutional layer
1) depth: number of filters
2) filter size: spatial extent of each filter (determines the number of weight parameters)
3) stride: step by which the filter moves; a larger stride gives a smaller conv. layer output
4) zero-padding: pad the input with zeros on the border to control the spatial size of the output volume
→ the size of the convolutional layer output depends on these parameters
– ReLU layer
nonlinear activation function that adds nonlinear properties
– Pooling layer
nonlinear down-sampling (e.g., max pooling, average pooling)
– Fully connected layer
after several conv. and pooling layers → high-level reasoning via FC layers
neurons are fully connected to all activations in the previous layer (same as a regular DNN)
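Depth, filter size, stride and zero-padding determine the output size through the standard relation O = (W − F + 2P)/S + 1. Here W is the input width, F the filter size, P the padding and S the stride. Below is a small Python sketch, and the function name is illustrative.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size (W - F + 2P) / S + 1 of a convolution or pooling layer."""
    span = input_size - filter_size + 2 * padding
    if span % stride != 0:
        raise ValueError("filter does not tile the padded input evenly with this stride")
    return span // stride + 1


print(conv_output_size(28, 5, stride=1, padding=2))  # 28: zero-padding keeps the size
print(conv_output_size(28, 2, stride=2))             # 14: 2x2 pooling with stride 2
print(conv_output_size(28, 5, stride=1, padding=0))  # 24: no padding shrinks the map
```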

Deep Neural Network

Deep Neural Network
(Figure: convolutional layer)

Deep Neural Network
(Figures: convolutional layer; pooling layer; stride = 1; stride; zero-padding)
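Max pooling, one of the down-sampling options listed above, can be sketched in NumPy as follows. The 2x2 window, stride 2 and input values are assumptions for illustration.

```python
import numpy as np


def max_pool(feature_map, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window, moved by `stride`."""
    h, w = feature_map.shape
    out_h, out_w = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out


fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 2, 3, 1]], dtype=float)
print(max_pool(fm))   # first row 4, 5; second row 6, 7
```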

Deep Neural Network
• Examples of CNN

Tensorflow
• Tensorflow basics
– Open-source package for machine learning and deep learning developed by the Google Brain Team (within Google's Machine Intelligence research organization)
– Library for numerical computation using data flow graphs
– The graph structure contains all the information: operations and data
– Node: represents a mathematical operation, a point of data entry, an output result, or a read/write variable
– Edge: describes the input/output relationships between nodes and carries tensors (the basic data structure of TensorFlow)

Tensorflow
• Placeholder
– “Symbolic” variables whose values are supplied during program execution
– Provide data using feed_dict when running the graph
• Session
– Create a session to evaluate the specified symbolic expressions
– Indeed, nothing in the code has been executed before the session is run
– TensorFlow is both an interface for expressing machine learning algorithms and an implementation for running them
(Figure labels: nodes that contain variables and operations; operation; result of operation)
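Below is a minimal sketch of placeholders, feed_dict and a Session. The 2016 slides use the TensorFlow 1.x API (tf.placeholder, tf.Session). This version assumes TensorFlow 2.x, where the same calls live under tf.compat.v1 and the graph is built explicitly inside a tf.Graph context.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Placeholders: symbolic inputs whose values are supplied only at run time
    a = tf.compat.v1.placeholder(tf.float32, shape=[None], name="a")
    b = tf.compat.v1.placeholder(tf.float32, shape=[None], name="b")
    # An operation node; the edges into it carry the tensors a and b
    total = tf.add(a, b, name="total")

# Nothing has been computed so far; a Session executes the graph
with tf.compat.v1.Session(graph=graph) as sess:
    result = sess.run(total, feed_dict={a: [1.0, 2.0], b: [3.0, 4.0]})
print(result)  # [4. 6.]
```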

Tensorflow
• Tensorboard
– Visualization tools
– Make it easier to understand, debug, and optimize tensorflow programs
– 1) Visualize tensorflow graphs, 2) plot quantitative metrics, 3) show
additional data
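Below is a sketch of logging a quantitative metric for TensorBoard. It assumes TensorFlow 2.x and its tf.summary API, which postdates the 2016 slides. The log directory and loss values are hypothetical. Running `tensorboard --logdir` on that directory would then plot the curve.

```python
import os
import tempfile

import tensorflow as tf

logdir = tempfile.mkdtemp()                       # hypothetical log directory
writer = tf.summary.create_file_writer(logdir)
with writer.as_default():
    for step in range(100):
        loss = 1.0 / (step + 1)                   # hypothetical, decreasing metric
        tf.summary.scalar("loss", loss, step=step)
writer.flush()
print(os.listdir(logdir))   # an events.out.tfevents.* file that TensorBoard reads
```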


Tensorflow Implementation
• Example 1. Single-layer NN
(Code screenshot labels: nodes that contain variables and operations; result of operation; Session / run; training; 784 input nodes, 10 output nodes)
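The single-layer network (784 input nodes, 10 output nodes) is softmax regression. This sketch follows the graph/Session style of the slides via tf.compat.v1 under TensorFlow 2.x. Random data stands in for MNIST so the example runs without a download. The learning rate is an assumption chosen for that random data.

```python
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    x = tf.compat.v1.placeholder(tf.float32, [None, 784])    # 784 input nodes (28x28 pixels)
    y_ = tf.compat.v1.placeholder(tf.float32, [None, 10])    # one-hot labels
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(x, W) + b                              # 10 output nodes
    y = tf.nn.softmax(logits)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
    train_step = tf.compat.v1.train.GradientDescentOptimizer(0.005).minimize(loss)
    init = tf.compat.v1.global_variables_initializer()

# Hypothetical stand-in for an MNIST batch: random pixels and random labels
rng = np.random.default_rng(0)
batch_x = rng.random((100, 784)).astype(np.float32)
batch_y = np.eye(10, dtype=np.float32)[rng.integers(0, 10, 100)]

with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init)
    first = sess.run(loss, feed_dict={x: batch_x, y_: batch_y})
    for _ in range(50):
        sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})
    last, probs = sess.run([loss, y], feed_dict={x: batch_x, y_: batch_y})
print(first, last)   # the loss on this fixed batch should fall during training
```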

• Example 2. Multi-layer NN: CNN
MNIST dataset (handwritten digits): training set of 60,000 examples, test set of 10,000 examples
http://yann.lecun.com/exdb/mnist/
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/vis_cnn_mnist.ipynb
(Code screenshot: package and MNIST loading)

https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/vis_cnn_mnist.ipynb
(Code screenshots: define the weights and parameters; define the CNN structure)

(Code screenshots: define variables and functions; session and summary; training the weights to minimize the cost)

– Tensorboard graph

– Visualization of each layer
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/cnn_mnist_simple.ipynb
Input image (28x28) → 1st conv. layer (28x28) → ReLU (28x28) → max pooling (14x14)
(Convolution filter: 5x5)
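The layer sizes in the visualization can be reproduced with TensorFlow 2.x ops. These are a 28x28 input, a 5x5 filter, 28x28 after convolution and ReLU, and 14x14 after max pooling. The filter count (32), random filter values, SAME padding and 2x2 pooling with stride 2 are assumptions for this sketch, not read from the notebook.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
image = tf.constant(rng.random((1, 28, 28, 1)).astype(np.float32))          # one 28x28 image (NHWC)
filters = tf.constant(rng.normal(0, 0.1, (5, 5, 1, 32)).astype(np.float32))  # 32 hypothetical 5x5 filters

conv = tf.nn.conv2d(image, filters, strides=1, padding="SAME")       # 1st conv. layer
act = tf.nn.relu(conv)                                               # ReLU
pooled = tf.nn.max_pool2d(act, ksize=2, strides=2, padding="SAME")   # max pooling

print(conv.shape, act.shape, pooled.shape)
# (1, 28, 28, 32) (1, 28, 28, 32) (1, 14, 14, 32)
```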

Seoul National University
Thank you

References
– http://sebastianraschka.com/faq/docs/visual-backpropagation.html
– http://kawahara.ca/how-to-compute-the-derivative-of-a-sigmoid-function-fully-worked-example/
– https://www.youtube.com/watch?v=E5a3nDpaXjw
– http://deeplearning.net/tutorial/rbm.html
– http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
– https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/cnn_mnist_simple.ipynb
– http://www.mdpi.com/1424-8220/16/7/1134/htm
– https://www.toptal.com/machine-learning/an-introduction-to-deep-learning-from-perceptrons-to-deep-networks
– http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
– http://www.aistudy.com/neural/multilayer_perceptron.htm
– https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
– https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/

– https://en.wikipedia.org/wiki/Deep_learning
– https://en.wikipedia.org/wiki/Feature_engineering
– https://en.wikipedia.org/wiki/Edge_detection
– https://en.wikipedia.org/wiki/Corner_detection
– http://www.erogol.com/brief-history-machine-learning/
– http://vaaaaaanquish.hatenablog.com/entry/2015/01/26/060622
– http://yann.lecun.com/exdb/mnist/
– http://darkpgmr.tistory.com/116
– http://neuralnetworksanddeeplearning.com/chap6.html
