Introduction to Neural Networks and Deep Learning from Scratch
http://ahmedbesbes.com
https://github.com/ahmedbesbes
• Five decades of research in machine learning
• Cloud services and hardware (CPU, GPU, TPU)
• Lots of data from “the internet”
• Tools and culture of collaborative and reproducible science
• Resources and efforts from large corporations
Biological model → Computational model:
• Incoming impulses → Inputs
• Synapses → Weights
• Outcoming impulse → Output activation
• Firing rate → Activation function
A neuron computes a weighted sum of its inputs plus a bias, then applies an activation function:

a = σ(x · wᵀ + b) = σ( Σ_{i=1}^{d} w_i x_i + b )

where:
• x ∈ ℝ^d: input vector (x_1, …, x_d)
• w ∈ ℝ^d: weight vector (w_1, …, w_d)
• b: bias (scalar)
• z = x · wᵀ + b: pre-activation
• a: output scalar, or activation
• w, b: weights and bias (the learnable parameters of the neuron)

More generally, with an arbitrary activation function f:

a = f( Σ_{i=1}^{d} w_i x_i + b )
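A minimal NumPy sketch (an illustration, not from the slides; the values are arbitrary) of this single-neuron computation:

import numpy as np

sigma = lambda z: 1 / (1 + np.exp(-z))   # sigmoid activation

x = np.array([0.5, -1.0, 2.0])           # input vector, d = 3 (arbitrary values)
w = np.array([0.1, 0.4, -0.2])           # weight vector
b = 0.05                                 # bias (scalar)

z = np.dot(w, x) + b                     # pre-activation z = x . w^T + b
a = sigma(z)                             # activation a = sigma(z)
print(z, a)                              # -0.7 and ~0.33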
[Diagram: input layer (x1, x2) → hidden layer → output layer producing the predictions (y1, y2)]
Input layer, hidden layer, output layer
What happens inside the hidden layer(s):
• Activations of previous neurons become inputs to adjacent neurons
How to interpret that?
• Intermediate non-linear computations ~ feature engineering
• Transformation of the input space
• New representations of the data over one/many layers
George Cybenko (universal approximation theorem)
X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 0
[Diagram: a network solving XOR. Hidden neuron h1 gets weights 20, 20 and bias b = −10; hidden neuron h2 gets weights −20, −20 and bias b = 30; output neuron o gets weights 20, 20 and bias b = −30.]

h1 = σ(20·x1 + 20·x2 − 10)
h2 = σ(−20·x1 − 20·x2 + 30)
o = σ(20·h1 + 20·h2 − 30)
X1 X2 h1 h2 O
0 0 0 1 0
0 1 1 1 1
1 0 1 1 1
1 1 1 0 0
h1 = σ (20 * 0 + 20 * 0 – 10) = σ ( -10) ~ 0
h1 = σ (20 * 0 + 20 * 1 – 10) = σ (10) ~ 1
h1 = σ (20 * 1 + 20 * 0 – 10) = σ (10) ~ 1
h1 = σ (20 * 1 + 20 * 1 – 10) = σ (30) ~ 1
h2 = σ (-20 * 0 - 20 * 0 + 30) = σ (30) ~ 1
h2 = σ (-20 * 0 - 20 * 1 + 30) = σ (10) ~ 1
h2 = σ (-20 * 1 - 20 * 0 + 30) = σ (10) ~ 1
h2 = σ (-20 * 1 - 20 * 1 + 30) = σ (-10) ~ 0
o = σ (20 * 0 + 20 * 1 - 30) = σ (-10) ~ 0
o = σ (20 * 1 + 20 * 1 - 30) = σ (10) ~ 1
o = σ (20 * 1 + 20 * 1 - 30) = σ (10) ~ 1
o = σ (20 * 1 + 20 * 0 - 30) = σ (-10) ~ 0
X1 X2 h1
0 0 0
0 1 1
1 0 1
1 1 1
X1 X2 h2
0 0 1
0 1 1
1 0 1
1 1 0
h1 h2 O
0 1 0
1 1 1
1 1 1
1 0 0
h1 = X1 OR X2    h2 = NOT (X1 AND X2), i.e. X1 NAND X2    o = h1 AND h2
[Diagram: the same XOR network with each neuron labelled by the logic gate it implements: h1 ≈ X1 OR X2, h2 ≈ X1 NAND X2, o ≈ h1 AND h2. A scatter plot shows the data points in the (h1, h2) plane.]
X1 X2 h1 h2 O Y
0 0 0 1 0 0
0 1 1 1 1 1
1 0 1 1 1 1
1 1 1 0 0 0
Linearly separable problem
The network learnt a new space (h1, h2) where the
data is linearly separable
h1 = σ (20x1 + 20x2 -10)
h2 = σ (-20x1 - 20x2 + 30)
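A minimal NumPy sketch (not part of the original slides) that plugs the four XOR inputs through the hand-picked weights above and reproduces the truth table:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# the four XOR inputs, pushed through the hand-picked weights
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)    # ~ x1 OR x2
    h2 = sigmoid(-20 * x1 - 20 * x2 + 30)   # ~ NOT (x1 AND x2)
    o = sigmoid(20 * h1 + 20 * h2 - 30)     # ~ h1 AND h2
    print(x1, x2, int(o > 0.5))             # prints 0, 1, 1, 0 -> XOR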
https://playground.tensorflow.org/
Wait … How did we come up with the weights
to solve the XOR problem?
…
We trained the network!
[Diagram: training pipeline. 0 - Input: raw pixels → 1 - Forward propagation → 2 - Prediction ("Car") → 3 - Loss computation]
l( f(x^(i); W), y^(i) )

where x^(i) is a training example, W are the parameters of the network, f(x^(i); W) is the model prediction ("car"), y^(i) is the ground-truth label ("boat"), and l is the loss function. The loss function quantifies the cost that we pay when misclassifying a boat as a car.
Examples of loss functions:

• Mean Square Error (MSE), used for regression
  single training example: l = (1/2) (ŷ^(i) − y^(i))²
  all training data: L = (1/2N) Σ_{i=1}^{N} (ŷ^(i) − y^(i))²
• Cross Entropy, used for classification
  single training example: l = −y^(i) log(ŷ^(i))
  all training data: L = −(1/N) Σ_{i=1}^{N} y^(i) log(ŷ^(i))
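As an illustration (not from the slides; the example values are arbitrary), both losses can be written in a few lines of NumPy:

import numpy as np

def mse(y_true, y_pred):
    # mean square error over all examples, with the 1/2 factor used above
    return np.mean((y_pred - y_true) ** 2) / 2

def cross_entropy(y_true, y_pred, eps=1e-12):
    # cross-entropy for one-hot targets and predicted probabilities
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[0., 1.], [1., 0.]])       # arbitrary one-hot labels
y_pred = np.array([[0.2, 0.8], [0.6, 0.4]])   # arbitrary predicted probabilities
print(mse(y_true, y_pred), cross_entropy(y_true, y_pred))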
Optimization problem

min_W (1/n) Σ_{i=1}^{n} l( f(x^(i); W), y^(i) )

where l is the loss function, f(x^(i); W) is the model prediction on training example x^(i), y^(i) is the example label, W are the model parameters, and (1/n) Σ is the average over the training set.

Train the network: find the parameters that minimize the average loss on the training set.
Gradient descent algorithm

w_{n+1} ← w_n − η · df(w)/dw,   with η > 0
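A minimal sketch (not from the slides) of this update rule on the one-dimensional function f(w) = (w − 3)², whose minimum is at w = 3:

# gradient descent on f(w) = (w - 3)^2
w, eta = 0.0, 0.1           # initial weight and learning rate
for _ in range(100):
    grad = 2 * (w - 3)      # df/dw
    w = w - eta * grad      # w_{n+1} <- w_n - eta * df/dw
print(w)                    # converges to ~3.0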
Applied layer by layer:

W^[l]_{n+1} ← W^[l]_n − η ∇_{W^[l]} ( (1/n) Σ_{i=1}^{n} l( f(x^(i); W), y^(i) ) )

where W^[l]_n and W^[l]_{n+1} are the weight values of layer l at iterations n and n+1, η is the learning rate, the term in parentheses is the average training loss, and ∇_{W^[l]} is the gradient of that loss w.r.t. the weights of layer l.
[Diagram: training pipeline, continued. 0 - Input: raw pixels → 1 - Forward propagation → 2 - Prediction ("Car") → 3 - Loss computation → 4 - Backward propagation, computing ∂L/∂W^[6], ∂L/∂W^[5], …, ∂L/∂W^[1] → 5 - Weight update using gradient descent]
A single training example at each step: stochastic gradient descent
A batch of training examples at each step: (mini-)batch gradient descent
Notation (per neuron): term — definition — formula
• z_j^l — weighted input (pre-activation) to neuron j in layer l: z_j^l = Σ_{k=1}^{n_{l−1}} w_{jk}^l · a_k^{l−1} + b_j^l
• a_j^l — activation of neuron j in layer l: a_j^l = σ(z_j^l)
• b_j^l — bias of neuron j in layer l
• w_{jk}^l — weight connecting neuron k in layer l−1 to neuron j in layer l
[Diagram: a 2-3-2 network illustrating the notation across layers L0, L1, L2, with input activations a_k^0, hidden pre-activations z_1^1, z_2^1, z_3^1 and activations a_1^1, a_2^1, a_3^1, output activations a_1^2, a_2^2, and example weights w_22^1 and w_23^2]
[Diagrams: the same network in vectorized form across layers L0, L1, L2. Layer 1: z^1 = W^1 a^0 + b^1, with W^1 holding the weights w_11^1, w_12^1, w_21^1, w_22^1, w_31^1, w_32^1 and bias vector b^1. Layer 2: z^2 = W^2 a^1 + b^2, with W^2 holding the weights w_11^2, w_12^2, w_13^2, w_21^2, w_22^2, w_23^2 and bias vector b^2, producing the outputs y1, y2.]
Notation (vectorized, for a generic layer l): term — definition — formula — shape
• z^l — vector of weighted inputs to the neurons in layer l: z^l = W^l a^(l−1) + b^l, shape (n_l,)
• a^l — vector of neuron activations in layer l: a^l = σ(z^l), shape (n_l,)
• b^l — vector of neuron biases in layer l, shape (n_l,)
• W^l — weight matrix connecting the neurons in layer l−1 to the neurons in layer l, shape (n_l, n_{l−1})
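A minimal NumPy sketch (the layer sizes are illustrative assumptions, not from the slides) of this vectorized forward pass and the resulting shapes:

import numpy as np

sigma = lambda z: 1 / (1 + np.exp(-z))

n0, n1, n2 = 2, 3, 2                         # neurons per layer L0, L1, L2 (assumed sizes)
a0 = np.random.rand(n0, 1)                   # input activations, shape (n0, 1)
W1, b1 = np.random.randn(n1, n0), np.zeros((n1, 1))
W2, b2 = np.random.randn(n2, n1), np.zeros((n2, 1))

z1 = W1 @ a0 + b1                            # z^1 = W^1 a^0 + b^1, shape (n1, 1)
a1 = sigma(z1)                               # a^1 = sigma(z^1)
z2 = W2 @ a1 + b2                            # z^2 = W^2 a^1 + b^2, shape (n2, 1)
a2 = sigma(z2)                               # network output
print(z1.shape, a2.shape)                    # (3, 1) (2, 1)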
Chain rule
[Figure: hyperparameter grid, number of layers ∈ {2, 5, 8, 10, 20} versus learning rate ∈ {1e-3, 1e-2, 1e-1, 0.5, 1}]
Don’t:
• Initialize weights to 0 → this causes symmetry and the same gradient for all weights
• Initialize very small weights → this causes very small gradients
Do:
• He initialization: w = np.random.randn(D, H) * np.sqrt(2.0 / n), where n is the number of inputs to the layer (see the sketch below)
• Initialize all biases with a small constant value ~ 0.01
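A minimal sketch of He initialization; the layer sizes here are illustrative assumptions:

import numpy as np

D, H = 256, 128                                   # assumed layer sizes: D inputs, H outputs
w = np.random.randn(D, H) * np.sqrt(2.0 / D)      # He initialization, scaled by the number of inputs
b = np.full(H, 0.01)                              # small constant biases
print(w.std())                                    # ~ sqrt(2 / 256) ~ 0.09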
Dropout:
Ideal situation
http://cs231n.github.io/
http://neuralnetworksanddeeplearning.com/
https://www.coursera.org/learn/neural-networks-deep-learning
Basic imports:

import numpy as np
from sklearn.metrics import accuracy_score
from tqdm import tqdm, tqdm_notebook
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split  # sklearn.cross_validation is deprecated

Sigmoid activation function, with derivative σ′(z) = σ(z)(1 − σ(z)):

def activation(z, derivative=False):
    if derivative:
        return activation(z) * (1 - activation(z))
    else:
        return 1 / (1 + np.exp(-z))

Mean square error (loss) and its derivative:

def cost_function(y_true, y_pred):
    n = y_pred.shape[0]
    cost = (1. / (2 * n)) * np.sum((y_true - y_pred) ** 2)
    return cost

def cost_function_prime(y_true, y_pred):
    cost_prime = y_pred - y_true
    return cost_prime
class NeuralNetwork(object):
def __init__(self, size):
self.size = size
self.weights = [np.random.randn(self.size[i], self.size[i-1]) * np.sqrt(2 / self.size[i-1]) for i in range(1, len(self.size))]
self.biases = [np.random.rand(n, 1) for n in self.size[1:]]
def forward(self, input):
# input shape : (input_shape, batch_size)
a = input
pre_activations = []
activations = [a]
for w, b in zip(self.weights, self.biases):
z = np.dot(w, a) + b
a = activation(z)
pre_activations.append(z)
activations.append(a)
return a, pre_activations, activations
a^0 = X
z^l = W^l a^(l−1) + b^l
a^l = σ(z^l)
a^L = Ŷ
def compute_deltas(self, pre_activations, y_true, y_pred):
delta_L = cost_function_prime(y_true, y_pred) * activation(pre_activations[-1], derivative=True)
deltas = [0] * (len(self.size) - 1)
deltas[-1] = delta_L
for l in range(len(deltas) - 2, -1, -1):
delta = np.dot(self.weights[l + 1].transpose(), deltas[l + 1]) * activation(pre_activations[l], derivative=True)
deltas[l] = delta
return deltas
δ^L = ∂L/∂z^L = ∇_a L ⊙ σ′(z^L)
δ^l = ∂L/∂z^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ′(z^l)
def backpropagate(self, deltas, pre_activations, activations):
dW = []
db = []
deltas = [0] + deltas
for l in range(1, len(self.size)):
dW_l = np.dot(deltas[l], activations[l-1].transpose())
db_l = deltas[l]
dW.append(dW_l)
db.append(np.expand_dims(db_l.mean(axis=1), 1))
return dW, db
∂L/∂W^l = (∂L/∂z^l) (a^(l−1))^T = δ^l (a^(l−1))^T
∂L/∂b^l = ∂L/∂z^l = δ^l
def train(self, X, y, batch_size, epochs, learning_rate, validation_split=0.2,
print_every=10):
history_train_losses = []
history_train_accuracies = []
history_test_losses = []
history_test_accuracies = []
x_train, x_test, y_train, y_test = train_test_split(X.T, y.T,
test_size=validation_split)
x_train, x_test, y_train, y_test = x_train.T, x_test.T, y_train.T, y_test.T
for e in tqdm_notebook(range(epochs)):
if x_train.shape[1] % batch_size == 0:
n_batches = int(x_train.shape[1] / batch_size)
else:
n_batches = int(x_train.shape[1] / batch_size ) - 1
x_train, y_train = shuffle(x_train.T, y_train.T, random_state=0)
x_train, y_train = x_train.T, y_train.T
batches_x = [x_train[:, batch_size*i:batch_size*(i+1)] for i in
range(0, n_batches)]
batches_y = [y_train[:, batch_size*i:batch_size*(i+1)] for i in
range(0, n_batches)]
train_losses = []
train_accuracies = []
test_losses = []
test_accuracies = []
train/test split
Preparation of mini-batches of data and labels
Keep track of KPIs (accuracy/loss) on the train and validation sets
Training over mini-batches:
dw_per_epoch = [np.zeros(w.shape) for w in self.weights]
db_per_epoch = [np.zeros(b.shape) for b in self.biases]
for batch_x, batch_y in zip(batches_x, batches_y):
batch_y_pred, pre_activations, activations = self.forward(batch_x)
deltas = self.compute_deltas(pre_activations, batch_y, batch_y_pred)
dW, db = self.backpropagate(deltas, pre_activations, activations)
for i, (dw_i, db_i) in enumerate(zip(dW, db)):
dw_per_epoch[i] += dw_i / batch_size
db_per_epoch[i] += db_i / batch_size
batch_y_train_pred = self.predict(batch_x)
train_loss = cost_function(batch_y, batch_y_train_pred)
train_losses.append(train_loss)
train_accuracy = accuracy_score(batch_y.T, batch_y_train_pred.T)
train_accuracies.append(train_accuracy)
batch_y_test_pred = self.predict(x_test)
test_loss = cost_function(y_test, batch_y_test_pred)
test_losses.append(test_loss)
test_accuracy = accuracy_score(y_test.T, batch_y_test_pred.T)
test_accuracies.append(test_accuracy)
# weight update
for i, (dw_epoch, db_epoch) in enumerate(zip(dw_per_epoch, db_per_epoch)):
self.weights[i] = self.weights[i] - learning_rate * dw_epoch
self.biases[i] = self.biases[i] - learning_rate * db_epoch
history_train_losses.append(np.mean(train_losses))
history_train_accuracies.append(np.mean(train_accuracies))
history_test_losses.append(np.mean(test_losses))
history_test_accuracies.append(np.mean(test_accuracies))
if e % print_every == 0:
print('Epoch {} / {} | train loss: {} | train accuracy: {} | val loss : {} | val accuracy : {}'.format(
e, epochs, np.round(np.mean(train_losses), 3), np.round(np.mean(train_accuracies), 3),
np.round(np.mean(test_losses), 3), np.round(np.mean(test_accuracies), 3)))
history = {'epochs': epochs,
'train_loss': history_train_losses,
'train_acc': history_train_accuracies,
'test_loss': history_test_losses,
'test_acc': history_test_accuracies
}
return history
def predict(self, a):
# input shape : (input_shape, batch_size)
for w, b in zip(self.weights, self.biases):
z = np.dot(w, a) + b
a = activation(z)
predictions = (a > 0.5).astype(int)
# predictions = predictions.reshape(-1)
return predictions
Monitoring the model performance
Inference method
The same one-hidden-layer network, written and trained in PyTorch:

import torch
import torch.nn as nn
import torch.optim as optim

h = 50  # number of hidden neurons

# Architecture: one-hidden-layer neural net
net = nn.Sequential(
    nn.Linear(2, h),
    nn.ReLU(),
    nn.Linear(h, 1),
    nn.Sigmoid()
)

# Defining the optimizer and the loss
optimizer = optim.SGD(net.parameters(), lr=1)
criterion = nn.BCELoss()  # binary cross-entropy; instantiated once, then called on (output, target)

# Training loop over the data: one epoch
for i in range(100):
    optimizer.zero_grad()           # set stored gradients to zero
    output = net(X[i])              # forward pass
    loss = criterion(output, Y[i])  # compute the loss
    loss.backward()                 # backprop
    optimizer.step()                # weight update
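Here X and Y are assumed to be float tensors of shapes (N, 2) and (N, 1) respectively, e.g. inputs and labels cast with torch.tensor(..., dtype=torch.float32); they are not defined in the snippet itself.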