Deep learning and Tensorflow implementation
2016.11.16
<Contents>
Feature Engineering
Deep Neural Network
Tensorflow
Tensorflow Implementation
Future work
References
These slides cover several topics in deep learning:
the history of deep learning, its main difficulties and breakthroughs, and related topics such as perceptrons, activation functions, backpropagation, pre-training, dropout, Convolutional Neural Networks (CNN), and a simple implementation in TensorFlow and Python.
Keywords: deep learning, machine learning, TensorFlow, Python
Deep Learning and Tensorflow Implementation (deep learning, TensorFlow, Python, CNN)_Myungyon Kim_SNU_SHRM
1. Seoul National University
Seoul National University System Health & Risk Management
Deep learning and
Tensorflow implementation
Myungyon Kim
2016. 11. 16.
3. Seoul National University
Feature Engineering
2016/11/16 - 3 -
• Data Representation
– Raw sensory data (e.g., vibration, acceleration, temperature)
: complex, high-dimensional, redundant, and noisy,
so it is hard to discover useful information and insights
– It is necessary to find a good, suitable way to represent the data
• Feature engineering
– Process of creating and extracting features that represent the system well
– Fundamental to the application of machine learning algorithms
– The quality and quantity of the features greatly influence the results
– Based on the engineer's physical and domain knowledge and intuition
4. Seoul National University
Feature Engineering
• Rotor team case (feature categories shown in the figure)
– Statistical features
Time-domain features
Kinetic energy related: RMS, Max, Mean
Data statistics related: Kurtosis, Skewness
Waveform related: Crest Factor, Impulse Factor
Frequency-domain features: Freq. Center, RMS Freq., Component Ratio of 1x
– System characteristic frequency features: Gear Mesh Freq., Sideband Freq., Harmonic Freq.
• Image processing
– Edge detection, corner detection, HoG (Histogram of Oriented Gradients)
https://en.wikipedia.org/wiki/Edge_detection
https://en.wikipedia.org/wiki/Corner_detection
http://www.mdpi.com/1424-8220/16/7/1134/htm
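The time-domain features named on this slide (RMS, Max, Mean, Kurtosis, Skewness, Crest Factor, Impulse Factor) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `time_domain_features`, the sine test signal, and the use of the common textbook definitions (crest factor = peak / RMS, impulse factor = peak / mean absolute value, population moments for skewness and kurtosis) are not taken from the slides.

```python
import math

def time_domain_features(signal):
    """Return common time-domain features of a 1-D signal (list of floats)."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)            # largest absolute amplitude
    mean_abs = sum(abs(x) for x in signal) / n    # mean absolute value
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / n)
    return {
        "rms": rms,
        "max": max(signal),
        "mean": mean,
        # Population (biased) skewness and kurtosis, assumed definitions.
        "skewness": sum((x - mean) ** 3 for x in signal) / (n * std ** 3),
        "kurtosis": sum((x - mean) ** 4 for x in signal) / (n * std ** 4),
        "crest_factor": peak / rms,
        "impulse_factor": peak / mean_abs,
    }

# Hypothetical test signal: 5 full cycles of a unit-amplitude sine wave.
signal = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
features = time_domain_features(signal)
```

For a pure sine wave the RMS is about 0.707 and the crest factor about 1.414; a signal with impulsive faults would typically show larger kurtosis and crest factor values.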
5. Seoul National University
Feature Engineering
• Manual Feature engineering
– “Coming up with features is difficult, time-consuming, requires expert
knowledge. ‘Applied machine learning’ is basically feature engineering”
- Andrew Ng (Stanford University, Chief Scientist at Baidu Research)
– Feature engineering (select and extract features) is fundamental for machine
learning, but it is very difficult, tedious and expensive
– For some applications, we may have no idea ‘which features we should use’
• Problems of Current PHM Practices
– A considerable amount of human expertise and knowledge is required.
– Different systems and data require different feature engineering approaches:
features relevant to the diagnosis of one system may NOT be suitable for
another system.
6. Seoul National University
Feature Engineering
• Automated Feature Learning
– Manual feature engineering needs to be replaced with automated feature
learning using deep learning
• Deep Learning (Deep Neural Network)
– Multiple processing layers, with several linear and nonlinear transformations
– Learns and extracts hierarchical features automatically,
replacing handcrafted, manually extracted features
– Mimics the information processing and communication patterns of the nervous
system of the human brain (inspired by advances in neuroscience)
– Various algorithms and architectures (DNN, CNN, DBN, RNN) *
– Applied to various fields (computer vision, speech recognition, natural
language processing, bioinformatics)
* DNN: Deep Neural Network
CNN: Convolutional Neural Network
DBN: Deep Belief Network
RNN: Recurrent Neural Network
7. Seoul National University
Deep Neural Network
• History of Neural Network
– Perceptron (Artificial Neural Network) – Frank Rosenblatt, 1957
– “Perceptrons” – Marvin Minsky and Seymour Papert, 1969 (followed by the 1st winter)
– Multilayer Neural Network (composition of perceptrons)
– Backpropagation – “Learning representations by back-propagating errors”,
Nature, Rumelhart, Hinton and Williams, 1986 / Paul Werbos, 1974
– Several problems in multilayer, deep neural networks (followed by the 2nd winter)
: hard to train, vanishing gradient, local minima, overfitting
– Pre-training (greedy pre-training using RBM) *
: a clever way to initialize weight values
– ReLU (Rectifier)
– Dropout
– CNN (Convolutional Neural Networks)
– Computing power / GPU (Graphics Processing Unit)
– Large amounts of digital data
* RBM: Restricted Boltzmann Machine
8. Seoul National University
Deep Neural Network
• Perceptron (Artificial Neural Network) – Frank Rosenblatt, 1957
– Simplest case of a feedforward neural network
– Linear, binary classifier
– Training a neural network = obtaining suitable weight and bias values
– Rosenblatt expected that, with sufficiently advanced hardware, the perceptron
would eventually be able to classify and recognize everything
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
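(Figures: the OR and AND problems plotted over the two binary inputs; in both cases a single straight line separates the + and - points)
AND perceptron: w1 = 1, w2 = 1, Ө = 1.5
OR perceptron: w1 = 1, w2 = 1, Ө = 0.5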
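The AND and OR perceptrons above can be checked directly with the stated weights (w1 = 1, w2 = 1) and thresholds (1.5 and 0.5). The sketch below assumes a hard-threshold unit that outputs 1 when the weighted sum exceeds Ө; the function names are illustrative only.

```python
def perceptron(x1, x2, w1, w2, theta):
    """Hard-threshold unit: fires (1) when the weighted sum exceeds theta."""
    return 1 if w1 * x1 + w2 * x2 > theta else 0

def and_gate(x1, x2):
    # Weights and threshold from the slide: w1 = 1, w2 = 1, theta = 1.5
    return perceptron(x1, x2, 1, 1, 1.5)

def or_gate(x1, x2):
    # Weights and threshold from the slide: w1 = 1, w2 = 1, theta = 0.5
    return perceptron(x1, x2, 1, 1, 0.5)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_table = [and_gate(a, b) for a, b in inputs]
or_table = [or_gate(a, b) for a, b in inputs]
```

Only the threshold differs between the two gates: the same line direction is shifted so that either one active input (OR) or two active inputs (AND) are needed to cross it.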
9. Seoul National University
Deep Neural Network
• Perceptron (Artificial Neural Network) – Frank Rosenblatt, 1957
Activation function
– Step activation: activated (input > threshold Ө) or not (input < Ө)
– Sigmoid function (activation function)
: to train a neural network with gradient-based methods, the activation function should be differentiable
– $\mathrm{sig}(X) = \dfrac{1}{1 + e^{-X}}$
– Nonlinear activation function
: “squashes” the linear net input into a specific range (0 to 1 for the sigmoid)
and adds nonlinear properties to the NN
– Allows networks to compute nontrivial problems using only a small number of
nodes
https://commons.wikimedia.org/wiki/File:Sigmoid-function-2.svg
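A minimal sketch of the sigmoid function defined above, showing the "squashing" behaviour on a few sample inputs. The sample inputs are arbitrary choices for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 maps to exactly 0.5.
values = {x: sigmoid(x) for x in (-10, -1, 0, 1, 10)}
```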
10. Seoul National University
Deep Neural Network
• “Perceptrons” – Marvin Minsky and Seymour Papert, 1969
– A single-layer perceptron cannot solve nonlinear classification problems
– XOR (exclusive or): logical operation that outputs true only when the inputs differ
– To solve nonlinear problems, an MLP (multilayer perceptron, multilayer neural
network) is needed
– However, it is very hard to train an MLP properly
http://www.aistudy.com/neural/multilayer_perceptron.htm
(Figure: XOR plotted over the two binary inputs; it cannot be solved by a single perceptron, which is a linear classifier)
https://www.amazon.com/Perceptrons-Introduction-Computational-Geometry-Expanded/dp/0262631113
(Timeline label: 1st winter)
11. Seoul National University
Deep Neural Network
• “Perceptrons” – Marvin Minsky and Seymour Papert, 1969
Example: neural networks for logical operations
http://toritris.weebly.com/perceptron-2-logical-operations.html
AND, OR: a single-layer perceptron is sufficient
XOR: a multi-layer perceptron is needed
(* Different weights and thresholds can be used,
e.g., the different values for AND and OR shown earlier)
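One way to see why XOR needs more than one layer is to build it from single perceptrons. The sketch below is an assumption-laden illustration, not the network drawn on the slide: it reuses the AND and OR weights given earlier, adds a NAND unit with hand-picked weights (w1 = -1, w2 = -1, Ө = -1.5), and computes XOR as AND(OR(x1, x2), NAND(x1, x2)).

```python
def perceptron(x1, x2, w1, w2, theta):
    """Hard-threshold unit: fires (1) when the weighted sum exceeds theta."""
    return 1 if w1 * x1 + w2 * x2 > theta else 0

def xor_mlp(x1, x2):
    # Hidden layer: two perceptrons looking at the same inputs.
    h_or = perceptron(x1, x2, 1, 1, 0.5)       # OR, weights from the earlier slide
    h_nand = perceptron(x1, x2, -1, -1, -1.5)  # NAND, hand-picked weights (assumption)
    # Output layer: AND of the two hidden units.
    return perceptron(h_or, h_nand, 1, 1, 1.5)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_table = [xor_mlp(a, b) for a, b in inputs]
```

The hidden layer maps the four input points to a representation in which the two classes become linearly separable, which is what the single output perceptron then exploits.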
12. Seoul National University
Deep Neural Network
• Backpropagation
(“Learning representations by back-propagating errors”, Nature, Rumelhart, Hinton and Williams, 1986 /
Paul Werbos, 1974)
– “Backward propagation of errors”
– Common method to train NNs with multiple layers
– Computes the error gradient with respect to each weight:
using the chain rule, it quantifies the influence of each weight on the final error
– The weights are then updated with an optimization method such as the gradient descent algorithm
(Figure: error gradient, i.e., calculation of the partial derivative of the cost
with respect to a specific weight using the chain rule)
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
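The chain-rule calculation can be sketched for the smallest possible case: a single sigmoid neuron with one weight. The setup here (one input, squared-error cost, the specific numbers) is an assumption chosen for illustration; the analytic gradient is compared against a finite-difference estimate as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Output of a single sigmoid neuron."""
    return sigmoid(w * x + b)

def cost(w, b, x, target):
    """Squared error for one example: 0.5 * (target - output)^2."""
    return 0.5 * (target - forward(w, b, x)) ** 2

def grad_w(w, b, x, target):
    """d(cost)/dw via the chain rule: dC/dout * dout/dnet * dnet/dw."""
    out = forward(w, b, x)
    d_cost_d_out = out - target        # derivative of 0.5 * (target - out)^2
    d_out_d_net = out * (1.0 - out)    # derivative of the sigmoid
    d_net_d_w = x                      # net = w * x + b
    return d_cost_d_out * d_out_d_net * d_net_d_w

# Hypothetical values for illustration.
w, b, x, target = 0.4, 0.1, 1.5, 1.0
analytic = grad_w(w, b, x, target)

# Central finite difference as an independent check of the chain-rule result.
h = 1e-6
numeric = (cost(w + h, b, x, target) - cost(w - h, b, x, target)) / (2 * h)
```

In a multilayer network the same product of local derivatives is extended backward through every layer, which is what gives the method its name.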
13. Seoul National University
Deep Neural Network
• Backpropagation
– Procedure
1. Initialize the weights randomly
2. Forward propagation (through the neural network, to obtain the output and cost)
3. Backward propagation (influence of each weight on the error)
4. Weight update: $W := W - \alpha \dfrac{\partial}{\partial W}\,\mathrm{cost}(W)$
Repeat steps 2 to 4 until the performance of the network is satisfactory
• Cost function (Error)
$\mathrm{Cost} = \dfrac{1}{2m} \sum_{i=1}^{m} \left( \mathrm{target}^{(i)} - \mathrm{output}^{(i)} \right)^{2}$
• Gradient descent algorithm
- finds W, b that minimize the cost using the delta rule
- used in many minimization problems
- for a convex cost function, the minimum found is guaranteed to be the global minimum
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
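The update rule and cost function above can be sketched for a single linear unit, output = W * x + b, where the cost is convex. The data set, learning rate, iteration count, and zero initialization (instead of random) are assumptions chosen to keep the example deterministic.

```python
def cost(w, b, xs, targets):
    """Cost = 1/(2m) * sum((target - output)^2) for output = w * x + b."""
    m = len(xs)
    return sum((t - (w * x + b)) ** 2 for x, t in zip(xs, targets)) / (2 * m)

def gradient_descent(xs, targets, alpha=0.1, steps=2000):
    w, b = 0.0, 0.0  # deterministic start; the slides initialize randomly
    m = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - t for x, t in zip(xs, targets)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m  # d(cost)/dw
        grad_b = sum(errors) / m                             # d(cost)/db
        w -= alpha * grad_w   # W := W - alpha * d(cost)/dW
        b -= alpha * grad_b
    return w, b

# Hypothetical data generated from target = 2 * x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, targets)
final_cost = cost(w, b, xs, targets)
```

Because this cost is convex in W and b, the procedure recovers the generating parameters (W close to 2, b close to 1) regardless of the starting point, provided the learning rate is small enough.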
14. Seoul National University
Deep Neural Network
• Several problems in multilayer, deep Neural Network
1. Vanishing gradient
– Error gradient: product of the local gradients, multiplied in the backward direction,
so for early layers the error gradient vanishes
– Backpropagation fails to train the early-layer parameters properly
– Early layers are responsible for detecting simple patterns and
building blocks (e.g., edges for facial recognition);
when the early layers are not trained properly, the result will be inaccurate
https://www.youtube.com/watch?v=E5a3nDpaXjw
http://kawahara.ca/how-to-compute-the-derivative-of-a-sigmoid-function-fully-worked-example/
(Figure: derivative of the sigmoid function, maximum value 0.25)
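Since the sigmoid derivative never exceeds 0.25, the product of derivatives across many layers shrinks quickly. The sketch below is a simplified illustration that ignores the weight terms in the chain rule and evaluates every layer at the point where the derivative is largest (pre-activation 0), so it shows the best case for the sigmoid factors alone.

```python
import math

def sigmoid_derivative(x):
    """Derivative of the sigmoid: s * (1 - s), with maximum 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def gradient_scale(n_layers, pre_activation=0.0):
    """Product of sigmoid derivatives through n_layers (weights ignored)."""
    scale = 1.0
    for _ in range(n_layers):
        scale *= sigmoid_derivative(pre_activation)
    return scale

# Even in the best case the factor decays as 0.25 ** n.
scales = {n: gradient_scale(n) for n in (1, 5, 10)}
```

After 10 layers the factor is already below one millionth, which is why the earliest layers receive almost no error signal.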
15. Seoul National University
Deep Neural Network
• Several problems in multilayer, deep Neural Network
2. Local minima
– When training a deep neural network with the gradient descent algorithm, it is
possible to get stuck in a local minimum
– In that case, the optimum (global) solution is not obtained
https://www.toptal.com/machine-learning/an-introduction-to-deep-learning-from-perceptrons-to-deep-networks
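The effect can be reproduced on a one-dimensional non-convex function. The function f(x) = x^4 - 3x^2 + x, the learning rate, and the two starting points below are assumptions chosen for illustration; they are unrelated to any real network.

```python
def f(x):
    """A non-convex function with one local and one global minimum."""
    return x ** 4 - 3 * x ** 2 + x

def f_prime(x):
    return 4 * x ** 3 - 6 * x + 1

def descend(x, alpha=0.01, steps=2000):
    """Plain gradient descent from the starting point x."""
    for _ in range(steps):
        x -= alpha * f_prime(x)
    return x

x_from_right = descend(2.0)    # ends in the shallower minimum (x > 0)
x_from_left = descend(-2.0)    # ends in the deeper minimum (x < 0)
```

Both runs stop where the gradient is zero, but only the run started on the left reaches the lower cost; the result depends on the initialization.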
16. Seoul National University
Deep Neural Network
• Several problems in multilayer, deep Neural Network
3. Overfitting
– Complex model: too many parameters relative to the number of observations
– Learns not only the true relation, but also noise and random errors
– Overreacts to minor fluctuations in the given training data,
resulting in poor predictive performance
4. Hard to train correctly, takes a long time
As a result, other machine learning algorithms, such as support vector machines, were used instead
http://www.slideshare.net/fcollova/introduction-to-neural-network
(Figure: under-fitting, just right, and overfitting)
(Timeline label: 2nd winter)
17. Seoul National University
Deep Neural Network
• Pre-training using RBM – Geoffrey Hinton
* A clever way to initialize weight values
– RBM: a restricted, special case of the Boltzmann machine; an undirected, generative
energy-based model with a visible input layer and a hidden layer, with
connections between the layers but not within a layer
– Greedy, layer-wise training* of RBMs and stacking them (*Bengio, 2007)
initializes the weights of the DNN well (pre-training),
giving faster convergence of the fine-tuning and improved performance
– 1. Pre-training: learn generally useful feature detectors (RBM, AE: autoencoder)
2. Fine-tuning: the whole network is trained further by supervised BP (backpropagation)
http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/DeepBeliefNetworks
RBM
http://deeplearning.net/tutorial/rbm.html
Layer-wise training of a DBN * (DBN: Deep Belief Network)
18. Seoul National University
Deep Neural Network
• ReLU (Rectified Linear Unit, Rectifier)
– Activation function defined as
$f(x) = \max(0, x)$
– Powerful activation function that replaces the sigmoid function
– Sigmoid: the derivative is at most 0.25, which causes the vanishing gradient problem
ReLU: the derivative is 0 or 1, so the error passes through active units unchanged and the gradient does not vanish there
– Sparse activation enables fast and effective training of DNNs with large datasets,
with no need for unsupervised pre-training (RBM, AE)
http://nn.readthedocs.io/en/latest/transfer/
ReLU and its derivative
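A minimal sketch of ReLU and its derivative. The derivative is undefined at exactly x = 0; returning 0 there is a common convention and an assumption of this sketch.

```python
def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)

def relu_derivative(x):
    """1 for positive inputs, 0 otherwise (value at x = 0 is a convention)."""
    return 1.0 if x > 0 else 0.0

# Through 10 active ReLU units the gradient factor stays 1,
# in contrast to at most 0.25 ** 10 for sigmoid units.
relu_scale = 1.0
for _ in range(10):
    relu_scale *= relu_derivative(2.0)
```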
19. Seoul National University
Deep Neural Network
• Dropout
– Regularization technique: prevents co-adaptation on the training data
– At each training step, individual nodes are either “dropped out” of the net
with probability 1 − p or kept with probability p, leaving a reduced network (fewer parameters)
– By avoiding training all nodes on all training data, dropout reduces
overfitting in NNs; it can be thought of as an ensemble of smaller NNs
– Each training step updates a smaller, thinned network (although overall training typically needs more iterations to converge)
– Reduces tightly fitted interactions between nodes,
so the network learns more robust features that generalize better to new data
Srivastava, Hinton, Krizhevsky, Sutskever and Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, Journal of Machine Learning Research 15 (2014)
(Figure: dropout)
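The keep/drop step can be sketched as follows. This sketch uses the "inverted dropout" variant, in which kept activations are divided by the keep probability p during training so that no rescaling is needed at test time; that variant, the function name `dropout`, and the fixed random seed are assumptions, not details from the slides.

```python
import random

def dropout(activations, keep_prob, rng):
    """Keep each activation with probability keep_prob, otherwise zero it.

    Kept values are scaled by 1 / keep_prob (inverted dropout, an assumption)
    so the expected activation is unchanged.
    """
    out = []
    for a in activations:
        if rng.random() < keep_prob:
            out.append(a / keep_prob)
        else:
            out.append(0.0)
    return out

rng = random.Random(0)          # fixed seed so the sketch is repeatable
activations = [1.0] * 10000     # hypothetical layer output
dropped = dropout(activations, keep_prob=0.5, rng=rng)
kept_fraction = sum(1 for a in dropped if a != 0.0) / len(dropped)
```

A different random mask is drawn at every training step, so each step trains a different thinned network; at test time all nodes are used.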
20. Seoul National University
Deep Neural Network
• Convolutional Neural Network (CNN)
– A type of feed-forward artificial neural network
– Inspired by the connectivity pattern between neurons of the visual cortex*
– Individual cortical neurons respond to stimuli in a small region of space
(receptive field)
– The receptive fields of different neurons partially overlap such that they tile
the visual field; mathematically, this corresponds to a convolution operation.
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
*visual cortex: the visual area of the cerebral cortex
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
21. Seoul National University
Deep Neural Network
• Convolutional Neural Network (CNN)
MLP
– Multilayer perceptrons suffer from the curse of dimensionality due to full
connectivity between nodes (large number of weights)
– They do not take into account the spatial structure of the data,
treating input pixels that are far apart and close together in the same way
– Full connectivity of neurons is wasteful in image recognition, and the
huge number of parameters quickly leads to overfitting
CNN
– Mitigates the challenges posed by the MLP architecture by exploiting the
spatially local correlation present in images (local connectivity)
– Each filter (set of weights) is shared across the entire visual field;
weight sharing reduces the number of parameters dramatically, and thus
lowers the memory requirements and training time.
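The saving from weight sharing can be made concrete by counting parameters. The numbers below are an illustrative assumption: a 28x28 single-channel input mapped to 32 feature maps of 28x28, once with a fully connected layer and once with 32 shared 5x5 filters.

```python
def fc_params(n_inputs, n_outputs):
    """Fully connected layer: one weight per input-output pair plus biases."""
    return n_inputs * n_outputs + n_outputs

def conv_params(filter_size, in_channels, n_filters):
    """Convolutional layer: each filter's weights are shared over all positions."""
    return filter_size * filter_size * in_channels * n_filters + n_filters

n_inputs = 28 * 28            # 784 input pixels
n_outputs = 28 * 28 * 32      # 32 feature maps of 28x28 (hypothetical layer size)

fully_connected = fc_params(n_inputs, n_outputs)
convolutional = conv_params(filter_size=5, in_channels=1, n_filters=32)
```

Under these assumptions the fully connected layer needs about 19.7 million parameters, while the convolutional layer needs 832.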
22. Seoul National University
Deep Neural Network
• Convolutional Neural Network (CNN)
MLP (deep neural network): nodes are fully connected with the adjacent layer
CNN: locally connected (receptive field)
http://neuralnetworksanddeeplearning.com/chap6.html
23. Seoul National University
Deep Neural Network
• Structure of CNN
– Convolutional layer
1) depth: number of filters
2) filter size: area of the filter (number of weight parameters)
3) stride: step size of the filter movement; the conv. layer's output dimension is roughly inversely proportional to it
4) zero-padding: pad the input with zeros on the border to control the
spatial size of the output volume
The size of the convolutional layer depends on these parameters
– ReLU layer
nonlinear (NL) activation function that increases the nonlinear properties of the network
– Pooling layer
nonlinear down-sampling (e.g., max pooling, average pooling)
– Fully connected layer
after several conv. and pooling layers, high-level reasoning is done via the FC layer,
which is fully connected to all activations in the previous layer (same as a common
DNN)
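How filter size, stride, and zero-padding determine the output size can be sketched with the standard relation output = (W - F + 2P) / S + 1, where W is the input width, F the filter width, P the padding, and S the stride. The symbol names and the function `conv_output_size` are introduced here for illustration.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter, stride and padding do not tile the input")
    return span // stride + 1

same_conv = conv_output_size(28, 5, stride=1, padding=2)   # 5x5 filter, padded
valid_conv = conv_output_size(28, 5, stride=1, padding=0)  # 5x5 filter, no padding
pooled = conv_output_size(28, 2, stride=2, padding=0)      # 2x2 pooling, stride 2
```

With a padding of 2, a 5x5 filter keeps a 28x28 input at 28x28, and a 2x2 pooling window with stride 2 halves it to 14x14.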
24. Seoul National University
Deep Neural Network
• Structure of CNN
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
25. Seoul National University
Deep Neural Network
• Structure of CNN
Convolutional layer
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
26. Seoul National University
Deep Neural Network
• Structure of CNN
Convolutional layer
Pooling layer
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
(Figure labels: stride = 1, stride, zero-padding)
27. Seoul National University
Deep Neural Network
• Examples of CNN
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
28. Seoul National University
Tensorflow
• Tensorflow basics
– Open source package for machine learning and deep learning developed by the
Google Brain Team (within Google’s Machine Intelligence research organization)
– Library for numerical computation using data flow graphs
– The graph structure contains all the information, operations, and data.
– Node: represents a mathematical operation, a point of data entry, an output
result, or a read/write variable.
– Edge: describes the relationships between nodes with their inputs and
outputs, and carries tensors (the basic data structure of TensorFlow).
29. Seoul National University
Tensorflow
• Placeholder
– “Symbolic” variables that can be manipulated during program execution
– Data is provided using feed_dict when we run the code
• Session
– Create a session to evaluate the specified symbolic expression.
– Before we run the session, nothing has been executed yet
– TensorFlow is both an interface to express machine learning algorithms
and an implementation to run them
(Code screenshot annotations: nodes which contain variables and operations; operation; result of operation)
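The idea of building a graph first and executing it later can be sketched with a toy data flow graph in plain Python. This is deliberately NOT TensorFlow's API: the names `Node`, `placeholder`, `add`, `multiply`, and `run` are hypothetical stand-ins that only mimic the roles played by `tf.placeholder` and `tf.Session().run(..., feed_dict=...)` in TensorFlow 1.x.

```python
class Node:
    """A graph node. Creating it records the operation but computes nothing."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs

def placeholder():
    """Symbolic input whose value is supplied only when the graph is run."""
    return Node("placeholder")

def add(a, b):
    return Node("add", (a, b))

def multiply(a, b):
    return Node("mul", (a, b))

def run(node, feed_dict):
    """Toy counterpart of a session run: evaluate a node given placeholder values."""
    if node.op == "placeholder":
        return feed_dict[node]
    values = [run(n, feed_dict) for n in node.inputs]
    if node.op == "add":
        return values[0] + values[1]
    if node.op == "mul":
        return values[0] * values[1]
    raise ValueError("unknown operation: " + node.op)

# Graph construction: no arithmetic happens on these three lines.
a = placeholder()
b = placeholder()
c = add(multiply(a, b), a)

# Execution: values flow along the edges only when the graph is run.
result = run(c, feed_dict={a: 3, b: 4})
```

The same graph can be run repeatedly with different fed values, which is the role of placeholders when training data is supplied batch by batch.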
30. Seoul National University
Tensorflow
• TensorBoard
– Visualization tool
– Makes it easier to understand, debug, and optimize TensorFlow programs
– 1) Visualize TensorFlow graphs, 2) plot quantitative metrics, 3) show
additional data
32. Seoul National University
Tensorflow Implementation
• Example 1. Single-layer NN
(Code screenshot annotations: nodes which contain variables and operations; result of operation; session / run; training)
Network: 784 input nodes, 10 output nodes
33. Seoul National University
Tensorflow Implementation
• Example 2. Multi-layer NN: CNN
MNIST dataset (handwritten digits)
Training set: 60,000 examples
Testing set: 10,000 examples
http://yann.lecun.com/exdb/mnist/
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/vis_cnn_mnist.ipynb
(Code screenshot: load packages and the MNIST dataset)
34. Seoul National University
Tensorflow Implementation
• Example 2. Multi-layer NN: CNN
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/vis_cnn_mnist.ipynb
(Code screenshots: define weights and parameters; define the CNN structure)
35. Seoul National University
Tensorflow Implementation
• Example 2. Multi-layer NN: CNN
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/vis_cnn_mnist.ipynb
(Code screenshots: define variables and functions; session and summary; train, i.e., find the weight values that minimize the cost)
36. Seoul National University
Tensorflow Implementation
• Example 2. Multi-layer NN: CNN
– TensorBoard graph
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/vis_cnn_mnist.ipynb
37. Seoul National University
Tensorflow Implementation
• Example 2. Multi-layer NN: CNN
– Visualization of each layer
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/cnn_mnist_simple.ipynb
(Figure panels: input image (28x28), 1st conv. layer (28x28), ReLU (28x28), max pooling (14x14); convolution filter (5x5))
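The layer sizes in this visualization (28x28 input, 28x28 after a 5x5 convolution, 28x28 after ReLU, 14x14 after max pooling) can be reproduced with a naive pure-Python pipeline. This is a sketch under assumptions: zero padding of 2 with stride 1 for the convolution, 2x2 max pooling with stride 2, a synthetic input image, and a hand-made filter instead of the learned filters in the notebook. As in most deep learning libraries, the "convolution" is implemented as cross-correlation.

```python
def conv2d_same(image, kernel):
    """Stride-1 cross-correlation with zero padding, so output size = input size."""
    n, k = len(image), len(kernel)
    pad = k // 2
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < n and 0 <= cc < n:   # outside the image counts as zero
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total
    return out

def relu_map(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 (halves each spatial dimension)."""
    n = len(fmap)
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, n - 1, 2)]
            for r in range(0, n - 1, 2)]

# Synthetic 28x28 "image": a single vertical stroke in column 14.
image = [[1.0 if c == 14 else 0.0 for c in range(28)] for _ in range(28)]
# Hand-made 5x5 filter that responds to vertical strokes (not a learned filter).
kernel = [[-1.0, -1.0, 4.0, -1.0, -1.0] for _ in range(5)]

conv = conv2d_same(image, kernel)   # 28x28
activated = relu_map(conv)          # 28x28
pooled = max_pool_2x2(activated)    # 14x14
```

The strongest response appears along the stroke, and pooling keeps that response while reducing the map to a quarter of its area.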