Neural Networks II
Sang Jun Lee
Ph.D. candidate, POSTECH
Email: lsj4u0208@postech.ac.kr
EECE695J Special Topics in Electrical Engineering (Fundamentals of Deep Learning and Applications to Steel Processes) – LECTURE 5 (2017. 9. 28)
Slide 2: 1-Page Review (Lecture 4: Neural Networks I)
- Perceptron → multilayer perceptron (MLP): input layer, hidden layer(s), output layer
- Backpropagation: parameter gradients are computed as products of local gradients
- Vanishing gradient: in a deep neural network, the parameter gradients become ≅ 0
Slide 3: Vanishing Gradient
XOR example: the hidden layer is set to 20 neurons; the number of output nodes is determined by the number of classes to be classified.
Slide 4: Vanishing Gradient
2-layer network (1 hidden layer + 1 output layer): the loss decreases as training proceeds.
Slide 5: Vanishing Gradient
6-layer network

Slide 6: Vanishing Gradient
6-layer network: training stalls and the loss barely decreases (!?)
Slide 7: Contents
Training of a neural network:
- Activation functions
- Data preprocessing
- Regularization
- Tips for training a neural network
Slide 8: Activation Functions
Sigmoid function
- Saturated neurons "kill" the gradient (for inputs of small or large magnitude, the gradient ≅ 0)
- Sigmoid outputs are always positive

dσ(x)/dx = σ(x)(1 − σ(x)) ≤ 1/4 < 1

- As each layer's local gradient is multiplied in, the gradient with respect to the parameters shrinks
- The input data then has almost no training effect
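A minimal numpy sketch (not from the slides) of this shrinkage: the sigmoid's local gradient is at most 1/4, so a product of k of them decays at best like (1/4)^k.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

# best-case product of local gradients across k stacked sigmoid layers
for k in [2, 6, 10]:
    print(k, sigmoid_grad(0.0) ** k)  # 0.0625, ~2.4e-4, ~9.5e-7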
Slide 9: Activation Functions
Sigmoid function: sigmoid outputs are always positive.

(Figure: a single sigmoid neuron with inputs x₀, x₁, …, x_d, output σ(wᵀx), and loss L(x, y).)

L(x, y) = (y − σ(wᵀx))²
∇_w L(x, y) = −2 (y − σ(wᵀx)) · σ(wᵀx)(1 − σ(wᵀx)) · x

The scalar factor is shared by every component, so if the inputs are all positive, the components of the parameter gradient (vector) are all + or all −.
Therefore zero-centered data, with a proper mix of +/− signs, is preferable.
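A quick numpy check of this claim (a sketch; the toy sizes are arbitrary): with all-positive inputs, every component of the parameter gradient shares one sign.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
x = np.random.rand(5)   # all-positive inputs, like raw pixel intensities
w = np.random.randn(5)
y = 1.0

s = sigmoid(w @ x)
grad = -2.0 * (y - s) * s * (1.0 - s) * x  # gradient of (y - sigmoid(w^T x))^2
print(np.sign(grad))    # all components have the same sign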
Slide 10: Activation Functions
tanh
- Maps inputs into the range [−1, 1]
- Zero-centered
- Saturated neurons still kill the gradients
Slide 11: Activation Functions
ReLU
- Computationally efficient
- Does not saturate (in the positive region)
- Output is never negative (not zero-centered)
- A dead neuron (output stuck at zero) will never activate and is never updated (slightly positive biases are commonly used)

(Figure: a single ReLU neuron with inputs x₀, x₁, …, x_d.)
Slide 12: Activation Functions
Leaky ReLU
- f(x) = max(αx, x)
- Depending on the sign of x, a local gradient of 1 or α is propagated during backpropagation

Image-classification performance by activation function on CIFAR-10 (* VLReLU: Very Leaky ReLU; Mishkin et al., 2015).
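For reference, the four activations discussed so far as numpy one-liners (a sketch; α = 0.01 is a common Leaky ReLU default, not a value fixed by the slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # output in (0, 1), always positive

def tanh(x):
    return np.tanh(x)                 # output in (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)         # does not saturate for x > 0

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)   # local gradient is 1 or alpha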
Slide 13: Data Preprocessing
Mean subtraction
- If the data are all positive, the components of the parameter gradient (vector) are all + or all −
- Zero-centered data: X̄ = X − μ_X
- Caution: compute μ_X from the training data only, and reuse the same μ_X when preprocessing validation or test data
Slide 14: Data Preprocessing
Normalization
X̄ = (X − μ_X) / σ_X
X̄ = 2(X − X_min) / (X_max − X_min) − 1 ∈ [−1, +1]
- Note: for image data, zero-centering alone is commonly used as the preprocessing step
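A sketch of both preprocessing schemes (the synthetic arrays stand in for real data); the key point from the previous slide is that all statistics come from the training set only:

import numpy as np

X_train = np.random.rand(100, 3) * 255   # illustrative stand-in for image data
X_test = np.random.rand(20, 3) * 255

mu = X_train.mean(axis=0)                # statistics from training data only
sigma = X_train.std(axis=0)

X_train_zc = X_train - mu                # zero-centering (typical for images)
X_test_zc = X_test - mu                  # reuse the SAME mu at test time

X_train_std = (X_train - mu) / sigma     # standardization
X_test_std = (X_test - mu) / sigma

x_min, x_max = X_train.min(axis=0), X_train.max(axis=0)
X_train_mm = 2 * (X_train - x_min) / (x_max - x_min) - 1  # maps into [-1, +1]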
Slide 15: Weight Initialization
RBM (Restricted Boltzmann Machine): a bipartite graph with no connections within a layer.
Slides 16–21: Weight Initialization
DBN (Deep Belief Network): unsupervised learning on two adjacent layers at a time as a pre-training step (weight initialization), applied layer by layer up the stack.
Slide 22: Weight Initialization
DBN (Deep Belief Network): minimize the KL divergence between the input and the reconstructed input.
Slides 23–25: Weight Initialization
DBN (Deep Belief Network): pre-training, one pair of layers at a time.
Slide 26: Weight Initialization
DBN (Deep Belief Network): fine-tuning of the full network after pre-training.
Slide 27: Weight Initialization
There is no need to use a complicated RBM for weight initialization; simple methods suffice. Make sure the weights are 'just right' (not too small and not too big).
- Small random numbers (e.g., Gaussian with zero mean and 10⁻² standard deviation): W ~ N(0, σ²)
- Xavier initialization: W ~ N(0, σ²)/√n (X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," International Conference on Artificial Intelligence and Statistics, 2010)
- He's initialization: W ~ N(0, σ²)/√(n/2) (K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015)
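The three recipes as numpy one-liners (a sketch; n is the fan-in of the layer and the sizes are illustrative):

import numpy as np

n, m = 512, 256                                  # fan-in, fan-out

W_small = 0.01 * np.random.randn(n, m)           # small Gaussian, std = 1e-2
W_xavier = np.random.randn(n, m) / np.sqrt(n)    # Xavier: Var(w) = 1/n
W_he = np.random.randn(n, m) * np.sqrt(2.0 / n)  # He: Var(w) = 2/n, for ReLU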
Slide 28: Weight Initialization
Xavier initialization: W ~ N(0, σ²)/√n

(Figure: a single neuron computing s from inputs x₀, x₁, …, x_d.)

- s = Σᵢ wᵢxᵢ (assume the wᵢ and xᵢ are zero-mean, i.i.d. random variables)
- Var(s) = Var(Σᵢ wᵢxᵢ) = Σᵢ Var(wᵢxᵢ) = Σᵢ Var(wᵢ) · Var(xᵢ) = n · Var(w) · Var(x)
- Choosing Var(w) = 1/n therefore keeps Var(s) = Var(x), so activations neither blow up nor shrink from layer to layer.
Slide 29: Optimization
(Figure: a single ReLU neuron with inputs x₀, x₁, …, x_d.)
Slide 30: Optimization
Stochastic gradient descent (SGD)
- What if the loss changes quickly in one direction and slowly in another? Very slow progress along the shallow dimension, jitter along the steep direction.
- Local minima or saddle points → zero gradient, so plain SGD gets stuck.
Slide 31: Optimization
SGD with momentum
- Build up "velocity" as a running mean of gradients
- ρ gives "friction" (typically ρ = 0.9 or 0.99)
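A sketch of the update rule on a toy ill-conditioned quadratic loss (the loss, learning rate, and ρ are illustrative choices):

import numpy as np

def grad_f(w):                     # gradient of f(w) = 0.5*(w0^2 + 50*w1^2)
    return np.array([1.0, 50.0]) * w

w = np.array([1.0, 1.0])
v = np.zeros_like(w)
lr, rho = 0.01, 0.9                # rho is the "friction"

for _ in range(100):
    v = rho * v + grad_f(w)        # velocity: running mean of gradients
    w = w - lr * v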
Slide 32: Optimization
AdaGrad (adaptive gradient algorithm): a modified stochastic gradient descent with a per-parameter learning rate.
- Element-wise scaling of the gradient based on the historical sum of squares in each dimension
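The AdaGrad update on the same toy loss (a sketch; the small eps avoids division by zero):

import numpy as np

grad_f = lambda w: np.array([1.0, 50.0]) * w     # same toy quadratic as above
w, lr = np.array([1.0, 1.0]), 0.1
grad_squared = np.zeros_like(w)

for _ in range(100):
    g = grad_f(w)
    grad_squared += g * g                            # historical sum of squares
    w = w - lr * g / (np.sqrt(grad_squared) + 1e-7)  # per-parameter scaling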
Slide 33: Optimization
RMSProp (Root Mean Square Propagation): AdaGrad with a decaying running average of squared gradients instead of a raw sum.
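The RMSProp update differs from AdaGrad by one line, the running average (a sketch; decay = 0.9 is a common choice):

import numpy as np

grad_f = lambda w: np.array([1.0, 50.0]) * w
w, lr, decay = np.array([1.0, 1.0]), 0.01, 0.9
grad_squared = np.zeros_like(w)

for _ in range(100):
    g = grad_f(w)
    grad_squared = decay * grad_squared + (1 - decay) * g * g  # running average
    w = w - lr * g / (np.sqrt(grad_squared) + 1e-7)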
Slide 34: Optimization
Adam (Adaptive Moment Estimation)
- Typical values: β₁ = 0.9, β₂ = 0.999, η = 10⁻³
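A sketch of the full Adam update with the slide's typical hyper-parameters (bias correction included):

import numpy as np

grad_f = lambda w: np.array([1.0, 50.0]) * w
w = np.array([1.0, 1.0])
beta1, beta2, eta, eps = 0.9, 0.999, 1e-3, 1e-8
m, v = np.zeros_like(w), np.zeros_like(w)

for t in range(1, 101):
    g = grad_f(w)
    m = beta1 * m + (1 - beta1) * g        # first moment (momentum-like)
    v = beta2 * v + (1 - beta2) * g * g    # second moment (RMSProp-like)
    m_hat = m / (1 - beta1 ** t)           # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)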
Slide 35: Optimization
In TensorFlow...
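The optimizers above correspond roughly to the TensorFlow 1.x-era tf.train API as follows (a sketch with a toy loss; pick one train_op):

import tensorflow as tf

x = tf.Variable([1.0, 1.0])
loss = tf.reduce_sum(tf.square(x))   # toy scalar loss

train_op = tf.train.GradientDescentOptimizer(learning_rate=1e-2).minimize(loss)
train_op = tf.train.MomentumOptimizer(learning_rate=1e-2, momentum=0.9).minimize(loss)
train_op = tf.train.AdagradOptimizer(learning_rate=1e-2).minimize(loss)
train_op = tf.train.RMSPropOptimizer(learning_rate=1e-3, decay=0.9).minimize(loss)
train_op = tf.train.AdamOptimizer(learning_rate=1e-3, beta1=0.9, beta2=0.999).minimize(loss)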
Slide 36: Regularization
The problem of overfitting
(Figure: training accuracy keeps improving while test accuracy lags behind.)
Basic idea:
- Add randomness during training
- Marginalize over the noise at test time
Slide 37: Regularization
Model ensemble
- Train multiple independent models
- Average their results at test time
Reference: http://www.slideshare.net/sasasiapacific/ipb-improving-the-models-predictive-power-with-ensemble-approaches
Slide 38: Regularization
Dropout
- During training, randomly set some neurons to zero (hyper-parameter: drop probability)
- Acts as a kind of ensemble model
Slide 39: Regularization
Dropout (test time): consider a single neuron.
- Standard neuron: a = w₁x + w₂y
- We want the expected output over the dropout noise z: f(x) = E_z[f(x, z)] = ∫ p(z) f(x, z) dz
- Applying dropout with drop probability 0.5, the four equally likely masks give:
E[a] = ¼(w₁x + w₂y) + ¼(w₁x + 0·y) + ¼(0·x + w₂y) + ¼(0·x + 0·y) = ½(w₁x + w₂y)
- At test time the full network computes a = w₁x + w₂y, so multiply by the keep probability (here 1 − 0.5 = 0.5) to match the training-time expectation.
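A sketch of a dropout layer with exactly this test-time scaling (p is the keep probability; the common "inverted dropout" variant instead divides by p at training time so test time needs no scaling):

import numpy as np

p = 0.5   # keep probability (1 - drop probability)

def dropout(x, train=True):
    if train:
        mask = np.random.rand(*x.shape) < p   # randomly zero some neurons
        return x * mask
    return x * p                              # test time: scale by keep prob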
Slides 40–41: Regularization
DropConnect: during training, randomly set some weights (rather than activations) to zero.
Slide 42: Regularization
Stochastic depth: during training, randomly drop entire layers.
Slide 43: Regularization
Data augmentation
Crops / scales:
- Original image: 256×480
- Sample random 224×224 patches
Randomize contrast and brightness.
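A numpy sketch of these two augmentations (the jitter ranges are illustrative choices, not from the slide):

import numpy as np

def augment(img, crop=224):
    h, w, _ = img.shape                          # img: H x W x 3, values 0-255
    top = np.random.randint(0, h - crop + 1)     # random 224x224 patch
    left = np.random.randint(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    alpha = np.random.uniform(0.8, 1.2)          # contrast jitter
    beta = np.random.uniform(-20.0, 20.0)        # brightness jitter
    return np.clip(alpha * patch + beta, 0, 255)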
Slides 44–47: ReLU, Xavier, Dropout
(Figures: experiments applying ReLU, Xavier initialization, and dropout to the earlier networks.)
Slide 48: Practical Tips for Training a Neural Network
Learning rate
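One common way to manage the learning rate in practice is a step decay schedule; a minimal sketch (all constants are illustrative):

def step_decay(epoch, base_lr=1e-3, drop=0.5, every=10):
    # halve the learning rate every 10 epochs
    return base_lr * (drop ** (epoch // every))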
Slide 49: Practical Tips for Training a Neural Network
Transfer learning
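In the TF 1.x-era API, fine-tuning only a new head on top of a pre-trained network can be sketched by restricting the variables handed to the optimizer (the scope name 'fc8' and the existence of a scalar loss tensor are assumptions for illustration):

import tensorflow as tf

tune_vars = [v for v in tf.trainable_variables()
             if v.name.startswith('fc8')]    # train the new head only
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=tune_vars)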
Slide 50: Practical Tips for Training a Neural Network
Activation function: ReLU or Leaky ReLU
Optimization: Adam optimizer, ...
Regularization: dropout or batch normalization is generally sufficient
Slide 51: Summary
- Activation functions: sigmoid, tanh, ReLU, Leaky ReLU
- Data preprocessing: mean subtraction, normalization
- Regularization: model ensemble, dropout, data augmentation, ...
- Tips for training a neural network: learning rate, transfer learning
Slide 52: Preview (Lecture 6)
- Computer vision: understanding image data
- Convolutional neural networks: why CNNs are effective for images