2. About me
Data Science Freelancer
Machine Learning
Programmer
@BasiaFusinska
Barbara@Fusinska.com
BarbaraFusinska.com
https://katacoda.com/basiafusinska/courses/deep-learning-with-tensorflow
3. Agenda
• Introduction to Machine Learning
• Main concepts of Deep Learning
• TensorFlow Basics
• Building Neural Networks with TensorFlow
• MNIST Classification
• Convolutional Networks
• TensorFlow abstraction levels
6. Movies Genres
Title           | # Kisses | # Kicks | Genre
Taken           | 3        | 47      | Action
Love Story      | 24       | 2       | Romance
P.S. I Love You | 17       | 3       | Romance
Rush Hour       | 5        | 51      | Action
Bad Boys        | 7        | 42      | Action
Question: What is the genre of "Gone with the Wind"?
7. Data-based classification
Id | Feature 1 | Feature 2 | Class
1  | 3         | 47        | A
2  | 24        | 2         | B
3  | 17        | 3         | B
4  | 5         | 51        | A
5  | 7         | 42        | A
Question: What is the class of the entry with the following features: F1: 31, F2: 4?
13. Supervised Machine Learning workflow
Clean data → Preprocess data → Data split
Training data → Machine Learning algorithm → Trained model
Test data + Trained model → Evaluation
20. Task: Logistic Regression
• Load the dataset
• Split into the train and test sets
• Train the algorithm, using Logistic Regression
• Evaluate the algorithm (see the sketch below)
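A minimal sketch of this task, assuming scikit-learn and a stand-in dataset (the course's own data and loader are not shown here):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the dataset (a stand-in for the course's data)
X, y = load_breast_cancer(return_X_y=True)

# Split into the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Train the algorithm using Logistic Regression
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Evaluate the algorithm on the held-out test set
print(accuracy_score(y_test, model.predict(X_test)))
```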
31. Task: Forward Propagation
• Read data for both the linear and non-linear examples
• Write the forward propagation function
• Initialise weights and biases
• Perform forward propagation on both datasets (sketch below)
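A minimal NumPy sketch of the forward propagation step; the sigmoid activation and the shapes are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, W, b):
    # Z = X . W^T + b, then apply the activation
    Z = X @ W.T + b
    return sigmoid(Z)

# Initialise weights and biases (shapes are illustrative)
W = np.random.randn(1, 2) * 0.01
b = np.zeros(1)

X = np.random.rand(5, 2)   # 5 examples, 2 features
print(forward_propagation(X, W, b))
```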
35. Task: Hidden layers
• Use the non-linear dataset
• Write the forward propagation function for the hidden layer (use tanh)
• Prepare the weights and biases for both layers
• Set up forward propagation for the whole network (see the sketch below)
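A sketch of forward propagation for the whole two-layer network, with tanh in the hidden layer; the layer sizes and the sigmoid output are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Prepare the weights and biases for both layers (sizes are illustrative)
W1, b1 = np.random.randn(4, 2) * 0.01, np.zeros(4)   # hidden layer
W2, b2 = np.random.randn(1, 4) * 0.01, np.zeros(1)   # output layer

def forward(X):
    hidden = np.tanh(X @ W1.T + b1)      # hidden layer uses tanh
    return sigmoid(hidden @ W2.T + b2)   # output layer

X = np.random.rand(5, 2)   # stand-in for the non-linear dataset
print(forward(X))
```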
37. Optimisation problem
• Loss function: $J(\theta)$
• Minimising/Maximising
• Local extremes
• Finding the value of the function or the arguments
• ML problems are usually converted to optimisation problems
38. Gradient Descent
• Climbing down the hill
• Iterative process
• Learning rate
• Tuning techniques
• Initialisation values matter
$Z^{(m)} = Z^{(m-1)} - \alpha \frac{\partial L}{\partial Z}$
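A minimal sketch of the update rule above, applied to an illustrative one-dimensional loss (the function and the learning rate are assumptions):

```python
# Illustrative loss: L(z) = (z - 3)^2, so dL/dz = 2 * (z - 3)
def grad(z):
    return 2 * (z - 3)

z = 10.0       # initialisation values matter
alpha = 0.1    # learning rate
for _ in range(100):
    z = z - alpha * grad(z)   # Z(m) = Z(m-1) - alpha * dL/dZ
print(z)       # converges towards the minimum at z = 3
```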
39. Loss function for the classification process
$Z^{(L)} = o^{(L-1)} \cdot W^{(L)T} + b^{(L)}$
$o^{(L)} = \varphi^{(L)}(Z^{(L)})$
Cross entropy:
$J = -\frac{1}{m} \sum \left( Y \log o^{(L)} + (1 - Y) \log(1 - o^{(L)}) \right)$
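A small NumPy sketch of the cross-entropy formula above; the labels and network outputs are made up for illustration:

```python
import numpy as np

def cross_entropy(Y, o):
    # J = -1/m * sum(Y * log(o) + (1 - Y) * log(1 - o))
    m = Y.shape[0]
    return -np.sum(Y * np.log(o) + (1 - Y) * np.log(1 - o)) / m

Y = np.array([1.0, 0.0, 1.0])   # true labels
o = np.array([0.9, 0.2, 0.7])   # network outputs o(L)
print(cross_entropy(Y, o))      # low loss for confident, correct outputs
```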
47. Task: Optimising the function
• Set up the computation graph for the quadratic function
• Define the Optimiser (use Gradient Descent)
• Initialise the session and run the optimisation
• Print the results and close the session (see the sketch below)
$y = x^2 - 10x + 24$
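A minimal sketch of this task in TensorFlow 1.x style (the compat import, the starting point, and the learning rate are assumptions):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Computation graph for the quadratic function
x = tf.Variable(0.0)             # starting point is an assumption
y = x * x - 10 * x + 24          # y = x^2 - 10x + 24

# Gradient Descent optimiser minimising y
optimiser = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_step = optimiser.minimize(y)

# Initialise the session and run the optimisation
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(100):
    sess.run(train_step)

# Print the results and close the session
print(sess.run([x, y]))          # minimum at x = 5, where y = -1
sess.close()
```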
48. Neural Network training in TensorFlow
Neural Network Architecture: Input (Placeholder) → W, b (Variables) → Loss function → Optimiser
50. Task: TensorFlow Deep Network Training
• Set up placeholders for the input data and the labels
• Define the hidden layer and connect it with the input placeholders
• Define the output layer and connect it with the hidden one
• Set up the feed_dict with the actual data (a sketch follows)
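A condensed sketch of this task in TensorFlow 1.x style; the layer sizes and the random stand-in data are assumptions:

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

n_features, n_hidden, n_classes = 2, 4, 2   # assumed sizes

# Placeholders for the input data and the labels
X = tf.placeholder(tf.float32, [None, n_features])
Y = tf.placeholder(tf.float32, [None, n_classes])

# Hidden layer, connected with the input placeholder
W1 = tf.Variable(tf.random_normal([n_features, n_hidden], stddev=0.1))
b1 = tf.Variable(tf.zeros([n_hidden]))
hidden = tf.tanh(tf.matmul(X, W1) + b1)

# Output layer, connected with the hidden one
W2 = tf.Variable(tf.random_normal([n_hidden, n_classes], stddev=0.1))
b2 = tf.Variable(tf.zeros([n_classes]))
logits = tf.matmul(hidden, W2) + b2

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# feed_dict with (made-up) actual data
X_data = np.random.rand(10, n_features).astype(np.float32)
Y_data = np.eye(n_classes)[np.random.randint(0, n_classes, 10)]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_step, feed_dict={X: X_data, Y: Y_data})
```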
54. Working with batches
• Accuracy vs. time
• Batch size becomes a hyperparameter
• Stochastic Gradient Descent
• The true gradient is approximated by the gradient at a single example (see the sketch below)
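A minimal NumPy sketch of retrieving mini-batches; the shapes and the batch size are illustrative:

```python
import numpy as np

def batches(X, Y, batch_size):
    # Shuffle once per epoch, then yield consecutive mini-batches
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        chunk = idx[start:start + batch_size]
        yield X[chunk], Y[chunk]

X = np.random.rand(1000, 784)          # made-up data
Y = np.random.randint(0, 10, 1000)     # made-up labels
for X_batch, Y_batch in batches(X, Y, batch_size=100):
    pass   # run one training step per mini-batch here
```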
59. Task: MNIST Dataset Deep Learning
• Load the MNIST dataset
• Define placeholders for the input data and labels
• Set up the hidden layer (weights, biases, and connect it with the input data)
• Set up the output layer and connect it with the hidden one
• Define the Adam Optimizer
• Set up retrieving the batches and the feed_dict values for the training
• Set up the feed_dict for the training examples in the evaluation phase (a sketch follows)
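A condensed sketch of this task in TensorFlow 1.x style; the Keras MNIST loader stands in for the course's loader, and the layer size and learning rate are assumptions:

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Load the MNIST dataset (Keras loader as a stand-in)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype(np.float32) / 255.0
x_test = x_test.reshape(-1, 784).astype(np.float32) / 255.0
y_train, y_test = np.eye(10)[y_train], np.eye(10)[y_test]   # one-hot labels

# Placeholders for the input data and labels
X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 10])

# Hidden layer (size is an assumption)
W1 = tf.Variable(tf.random_normal([784, 128], stddev=0.1))
b1 = tf.Variable(tf.zeros([128]))
hidden = tf.nn.relu(tf.matmul(X, W1) + b1)

# Output layer, connected with the hidden one
W2 = tf.Variable(tf.random_normal([128, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(hidden, W2) + b2

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=logits))
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)
accuracy = tf.reduce_mean(tf.cast(
    tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1)), tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(0, len(x_train), 100):   # mini-batches of 100
        sess.run(train_step,
                 feed_dict={X: x_train[i:i + 100], Y: y_train[i:i + 100]})
    print(sess.run(accuracy, feed_dict={X: x_test, Y: y_test}))
```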
60. Weights initialisation issue
• Weights range:
• Too small causes the signal to shrink
• Too big amplifies the signal until it's too massive
• Symmetry problem
• The same values in all hidden nodes
• Zero output
61. Weights initialisation methods
• Constant:
• Zero: a critical point, the error signal will not propagate, the gradient will be zero (no progress)
• Symmetry
• Random, [-1, +1], [0, 1]
• Use small random values
• E.g. Gaussian $\mu = 0$, constant $\sigma$
• Xavier-Glorot
• Keeping the weights ‘just right’, so the signal stays in a proper range through many layers (see the sketch below)
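A minimal NumPy sketch of the Glorot (Xavier) uniform initialisation; the layer sizes are illustrative:

```python
import numpy as np

def glorot_uniform(n_in, n_out):
    # Keep the signal variance roughly constant across layers
    limit = np.sqrt(6.0 / (n_in + n_out))
    return np.random.uniform(-limit, limit, size=(n_in, n_out))

W = glorot_uniform(784, 128)
print(W.std())   # roughly sqrt(2 / (n_in + n_out))
```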
63. Convolutional Neural Network
• Inputs have higher dimensions
• Reduce the number of parameters (W, b)
• Neurons arranged in 3D
• Connected to a region of the previous layer
65. Pooling
• Reduce spatial size
• Control overfitting
• Applied to every depth slice
• Window size, stride
• Max, Average (see the sketch below)
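A small sketch of max pooling in TensorFlow 1.x style; the 2x2 window with stride 2 is an illustrative choice:

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# One 4x4 single-channel image in NHWC layout
image = tf.constant(np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1))

# 2x2 window, stride 2: reduces the spatial size from 4x4 to 2x2
pooled = tf.nn.max_pool(image, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

with tf.Session() as sess:
    print(sess.run(pooled).reshape(2, 2))   # the max of each 2x2 window
```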
67. Task: MNIST Dataset Convolution
• Reshape the data to 2D images
• Initialise the variables for the first convolutional layer
• Define the convolutional and the max pooling layers
• Define the second convolutional layer
• Flatten the output (see the sketch below)
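A sketch of the convolutional part of this task in TensorFlow 1.x style; the filter sizes and feature-map counts are assumptions:

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

X = tf.placeholder(tf.float32, [None, 784])
# Reshape the flat data to 2D images (28x28, one channel)
images = tf.reshape(X, [-1, 28, 28, 1])

# First convolutional layer: 5x5 filters, 32 feature maps (assumed sizes)
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b1 = tf.Variable(tf.zeros([32]))
conv1 = tf.nn.relu(tf.nn.conv2d(images, W1, strides=[1, 1, 1, 1],
                                padding='SAME') + b1)
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1],
                       strides=[1, 2, 2, 1], padding='SAME')   # 28 -> 14

# Second convolutional layer: 5x5 filters, 64 feature maps
W2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b2 = tf.Variable(tf.zeros([64]))
conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W2, strides=[1, 1, 1, 1],
                                padding='SAME') + b2)
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1],
                       strides=[1, 2, 2, 1], padding='SAME')   # 14 -> 7

# Flatten the output for the dense layers that follow
flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
```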