An introduction to neural networks
Nathalie Vialaneix & Hyphen
nathalie.vialaneix@inra.fr
http://www.nathalievialaneix.eu &
https://www.hyphen-stat.com/en/
DATE AND LOCATION
Nathalie Vialaneix & Hyphen | Introduction to neural networks 1/41
Outline
1 Statistical / machine / automatic learning
Background and notations
Underfitting / Overfitting
Consistency
Avoiding overfitting
2 (non deep) Neural networks
Presentation of multi-layer perceptrons
Theoretical properties of perceptrons
Learning perceptrons
Learning in practice
3 Deep neural networks
Overview
CNN
Nathalie Vialaneix & Hyphen | Introduction to neural networks 2/41
Background
Purpose: predict Y from X;
What we have: n observations of (X, Y), (x1, y1), . . . , (xn, yn);
What we want: estimate unknown Y from new X: xn+1, . . . , xm.
Nathalie Vialaneix & Hyphen | Introduction to neural networks 4/41
Basics
From (x_i, y_i)_i, definition of a machine Φ_n s.t.: ŷ_new = Φ_n(x_new).
if Y is numeric, Φ_n is called a regression function (fonction de régression);
if Y is a factor, Φ_n is called a classifier (classifieur);
Φ_n is said to be trained or learned from the observations (x_i, y_i)_i.
Desirable properties
accuracy to the observations: predictions made on known data are close to observed values;
generalization ability: predictions made on new data are also accurate.
Conflicting objectives!!
Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
Underfitting/Overfitting (sous/sur-apprentissage)
(sequence of figures, not reproduced here; the captions were:)
Function x → y to be estimated
Observations we might have
Observations we do have
First estimation from the observations: underfitting
Second estimation from the observations: accurate estimation
Third estimation from the observations: overfitting
Summary
Nathalie Vialaneix & Hyphen | Introduction to neural networks 6/41
Errors
training error (measures the accuracy to the observations)
if y is a factor: misclassification rate
#{i : ŷ_i ≠ y_i, i = 1, . . . , n} / n
if y is numeric: mean square error (MSE)
(1/n) Σ_{i=1}^n (ŷ_i − y_i)²
or root mean square error (RMSE), or pseudo-R²: 1 − MSE / Var((y_i)_i)
test error: a way to prevent overfitting (estimates the generalization error) is the simple validation
1 split the data into training/test sets (usually 80%/20%)
2 train Φ_n from the training dataset
3 compute the test error from the remaining data
Nathalie Vialaneix & Hyphen | Introduction to neural networks 7/41
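A minimal sketch of this simple validation in R, using nnet as the machine Φ_n; the synthetic data and all settings below are illustrative assumptions:

```r
## Simple validation: 80%/20% split, training error vs. test error (illustrative synthetic data)
library(nnet)
set.seed(42)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- sin(1 / (dat$x + 0.1)) + rnorm(n, sd = 0.1)

train_id <- sample(seq_len(n), size = round(0.8 * n))   # 80% for training
train <- dat[train_id, ]
test  <- dat[-train_id, ]

phi_n <- nnet(y ~ x, data = train, size = 5, linout = TRUE, trace = FALSE)

mse <- function(y, yhat) mean((y - yhat)^2)
mse(train$y, predict(phi_n, train))   # training error
mse(test$y,  predict(phi_n, test))    # test error (estimates the generalization error)
```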
Example
(sequence of figures, not reproduced here; the captions were: Observations; Training/Test datasets; Training/Test errors; Summary)
Nathalie Vialaneix & Hyphen | Introduction to neural networks 8/41
Practical consequences
A machine or an algorithm is a method that:
starts from a given set of prediction functions C
uses observations (x_i, y_i)_i to pick the “best” prediction function according to what has already been observed.
This is often done by the estimation of some parameters (e.g., coefficients in linear models, weights in neural networks, ...)
C can depend on user choices (e.g., architecture, number of neurons in neural networks) ⇒ hyper-parameters
parameters (learned from data) vs. hyper-parameters (cannot be learned from data)
hyper-parameters often control the richness of C: they must be carefully tuned to ensure a good accuracy and avoid overfitting
Nathalie Vialaneix & Hyphen | Introduction to neural networks 9/41
Model selection / Fair error evaluation
Several strategies can be used to estimate the error of a set C and to be able to select the best family of models C (tuning of hyper-parameters):
split the data into a training/test set (∼ 67%/33%): the model is estimated with the training dataset and a test error is computed with the remaining observations;
cross validation: split the data into L folds. For each fold, train a model without the data included in the current fold and compute the error with the data included in the fold: the average of these L errors is the cross validation error;
(diagram: fold 1 | fold 2 | fold 3 | . . . | fold L; train without fold 2: Φ^{−2}; error on fold 2: err(Φ^{−2}, fold 2); CV error: (1/L) Σ_l err(Φ^{−l}, fold l))
out-of-bag error: create B bootstrap samples (of size n, taken with replacement from the original sample) and compute a prediction based on Out-Of-Bag samples (see next slide).
Nathalie Vialaneix & Hyphen | Introduction to neural networks 10/41
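The cross-validation strategy can be sketched in R as follows; the synthetic data and the choice of comparing hidden-layer sizes are illustrative assumptions:

```r
## L-fold cross validation to compare families of models (here: nnet with Q hidden neurons)
library(nnet)
set.seed(42)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- sin(1 / (dat$x + 0.1)) + rnorm(n, sd = 0.1)

L <- 5
folds <- sample(rep(1:L, length.out = n))   # random assignment of observations to folds

cv_error <- function(Q) {
  errs <- sapply(1:L, function(l) {
    fit <- nnet(y ~ x, data = dat[folds != l, ], size = Q,
                linout = TRUE, trace = FALSE)
    mean((dat$y[folds == l] - predict(fit, dat[folds == l, ]))^2)
  })
  mean(errs)                                # CV error: average of the L fold errors
}

sapply(c(2, 5, 10), cv_error)               # compare several candidate values of Q
```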
Out-of-bag observations, prediction and error
OOB (Out-Of-Bag) error: error based on the observations not included in the “bag”:
(diagram not reproduced: each bootstrap sample T^b is used to train a prediction algorithm Φ^b; for an observation (x_i, y_i), the OOB prediction, e.g. Φ^B(x_i), is computed from the machines Φ^b such that i ∉ T^b)
Nathalie Vialaneix & Hyphen | Introduction to neural networks 11/41
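A sketch of the out-of-bag error in R; the synthetic data, the use of nnet and the bagging-style averaging of OOB predictions are illustrative assumptions:

```r
## Out-of-bag error with B bootstrap samples (illustrative sketch)
library(nnet)
set.seed(42)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- sin(1 / (dat$x + 0.1)) + rnorm(n, sd = 0.1)

B <- 25
preds <- matrix(NA, nrow = n, ncol = B)          # OOB predictions, one column per bootstrap
for (b in 1:B) {
  in_bag <- sample(seq_len(n), size = n, replace = TRUE)   # bootstrap sample T^b
  fit <- nnet(y ~ x, data = dat[in_bag, ], size = 5, linout = TRUE, trace = FALSE)
  oob <- setdiff(seq_len(n), in_bag)             # observations not included in the "bag"
  preds[oob, b] <- predict(fit, dat[oob, ])
}
oob_pred <- rowMeans(preds, na.rm = TRUE)        # OOB prediction per observation
mean((dat$y - oob_pred)^2, na.rm = TRUE)         # OOB error
```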
Other model selection methods and strategies to avoid overfitting
Main idea: penalize the training error with something that represents the complexity of the model
model selection with an explicit penalization strategy: minimize (∼) error + complexity
Example: BIC/AIC criteria in linear models: −2L + (log n) × p
avoiding overfitting with an implicit penalization strategy: constrain the parameters w to take their values within a reduced subset
min_w error(w) + λ ‖w‖
with w the parameters of the model (learned) and ‖·‖ typically the ℓ1 or ℓ2 norm
Nathalie Vialaneix & Hyphen | Introduction to neural networks 12/41
Outline
1 Statistical / machine / automatic learning
Background and notations
Underfitting / Overfitting
Consistency
Avoiding overfitting
2 (non deep) Neural networks
Presentation of multi-layer perceptrons
Theoretical properties of perceptrons
Learning perceptrons
Learning in practice
3 Deep neural networks
Overview
CNN
Nathalie Vialaneix & Hyphen | Introduction to neural networks 13/41
What are (artificial) neural networks?
Common properties
(artificial) “Neural networks”: general name for supervised and
unsupervised methods developed in (vague) analogy to the brain;
combination (network) of simple elements (neurons).
Example of graphical representation: (figure not reproduced; INPUTS on the left connected through the network to OUTPUTS on the right)
Nathalie Vialaneix & Hyphen | Introduction to neural networks 14/41
Different types of neural networks
A neural network is defined by:
1 the network structure;
2 the neuron type.
Standard examples
Multilayer perceptrons (MLP, perceptron multi-couches): dedicated to supervised problems (classification and regression);
Radial basis function networks (RBF): same purpose but based on
local smoothing;
Self-organizing maps (SOM also sometimes called Kohonen’s maps)
or Topographic maps: dedicated to unsupervised problems
(clustering), self-organized;
. . .
In this talk, focus on MLP.
Nathalie Vialaneix & Hyphen | Introduction to neural networks 15/41
MLP: Advantages/Drawbacks
Advantages
classification OR regression (i.e., Y can be a numeric variable or a
factor);
non parametric method: flexible;
good theoretical properties.
Drawbacks
hard to train (high computational cost, especially when the input dimension is large);
overfit easily;
“black box” models (hard to interpret).
Nathalie Vialaneix & Hyphen | Introduction to neural networks 16/41
References
Recommended references:
[Bishop, 1995, Ripley, 1996]: overview of the topic from a learning (more than statistical) perspective
[Devroye et al., 1996, Györfi et al., 2002]: present statistical properties of perceptrons in dedicated chapters
Nathalie Vialaneix & Hyphen | Introduction to neural networks 17/41
Analogy to the brain
1 a neuron collects signals from neighboring neurons through its dendrites
2 when the total signal is above a given threshold, the neuron is activated
3 ... and a signal is sent to other neurons through the axon
connections which frequently lead to activating a neuron are reinforced (they tend to have an increasing impact on the destination neuron)
Nathalie Vialaneix & Hyphen | Introduction to neural networks 18/41
First model of artificial neuron
[Mc Culloch and Pitts, 1943, Rosenblatt, 1958, Rosenblatt, 1962]
(diagram not reproduced: inputs x^(1), . . . , x^(p) weighted by w_1, . . . , w_p, summed with a bias w_0, then thresholded to give f(x))
f : x ∈ R^p → 1_{ Σ_{j=1}^p w_j x^(j) + w_0 ≥ 0 }
Nathalie Vialaneix & Hyphen | Introduction to neural networks 19/41
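A one-line R version of this first artificial neuron; the weights and inputs below are arbitrary illustrative values:

```r
## McCulloch-Pitts / Rosenblatt unit: threshold of a weighted sum plus bias
perceptron_unit <- function(x, w, w0) {
  as.numeric(sum(w * x) + w0 >= 0)   # returns 1 if the weighted input reaches the threshold
}
perceptron_unit(x = c(0.2, -1, 0.5), w = c(1, 0.5, -2), w0 = 0.1)
```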
(artificial) Perceptron
Layers
MLP have one input layer (x ∈ R^p), one output layer (y ∈ R or taking values in {1, . . . , K − 1}) and several hidden layers;
no connections within a layer;
connections between two consecutive layers (feedforward).
Example (regression, y ∈ R): (diagram not reproduced: inputs x = (x^(1), . . . , x^(p)) connected to Layer 1 by weights w^(1)_jk, then to Layer 2, then to the output y: a 2 hidden layer MLP)
Nathalie Vialaneix & Hyphen | Introduction to neural networks 20/41
A neuron in MLP
(diagram not reproduced: inputs v_1, v_2, v_3 multiplied by weights w_1, w_2, w_3, summed with a bias w_0 (biais), then passed through an activation function)
Standard activation functions (fonctions de transfert / d’activation)
Biologically inspired: Heaviside function
h(z) = 0 if z < 0, 1 otherwise.
Main issue with the Heaviside function: not continuous!
Identity
h(z) = z
But the identity activation function gives a linear model if used with one hidden layer: not flexible enough.
Logistic function
h(z) = 1 / (1 + exp(−z))
Another popular activation function (useful to model positive real numbers):
Rectified linear (ReLU)
h(z) = max(0, z)
General sigmoid
sigmoid: a nondecreasing function h : R → R such that lim_{z→+∞} h(z) = 1 and lim_{z→−∞} h(z) = 0
Nathalie Vialaneix & Hyphen | Introduction to neural networks 21/41
Focus on one-hidden-layer perceptrons (R package nnet)
Regression case
(diagram not reproduced: inputs x^(1), . . . , x^(p), first-layer weights w^(1) and biases w^(0)_1, . . . , w^(0)_Q, second-layer weights w^(2), output Φ(x))
Φ(x) = Σ_{k=1}^Q w^(2)_k h_k( xᵀ w^(1)_k + w^(0)_k ) + w^(2)_0, with h_k a (logistic) sigmoid
Binary classification case (y ∈ {0, 1})
φ(x) = h_0( Σ_{k=1}^Q w^(2)_k h_k( xᵀ w^(1)_k + w^(0)_k ) + w^(2)_0 )
with h_0 the logistic sigmoid or the identity, and decision rule: Φ(x) = 0 if φ(x) < 1/2, 1 otherwise
Extension to any classification problem in {1, . . . , K − 1}
Straightforward extension to multiple classes with a multiple output perceptron (number of output units equal to K) and a maximum probability rule for the decision.
Nathalie Vialaneix & Hyphen | Introduction to neural networks 22/41
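For the regression case, the formula Φ(x) above can be coded directly; the R sketch below uses arbitrary random weights only to show the computation:

```r
## Forward computation of a one-hidden-layer perceptron (regression case)
phi <- function(x, W1, b1, w2, b2) {
  # W1: Q x p matrix of first-layer weights w^(1)_k, b1: biases w^(0)_k
  # w2: Q second-layer weights w^(2)_k,              b2: output bias w^(2)_0
  a1 <- as.vector(W1 %*% x + b1)    # x' w^(1)_k + w^(0)_k
  z1 <- 1 / (1 + exp(-a1))          # logistic sigmoid h_k
  sum(w2 * z1) + b2                 # linear combination of the hidden units
}

set.seed(1)
p <- 3; Q <- 4
phi(x = rnorm(p),
    W1 = matrix(rnorm(Q * p), nrow = Q), b1 = rnorm(Q),
    w2 = rnorm(Q), b2 = rnorm(1))
```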
Theoretical properties of perceptrons
This section answers two questions:
1 can we approximate any function g : [0, 1]^p → R arbitrarily well with a perceptron?
2 when a perceptron is trained with i.i.d. observations from an arbitrary random variable pair (X, Y), is it consistent? (i.e., does it reach the minimum possible error asymptotically when the number of observations grows to infinity?)
Nathalie Vialaneix & Hyphen | Introduction to neural networks 23/41
Illustration of the universal approximation property
Simple examples
a function to approximate: g : x ∈ [0, 1] → sin(1/(x + 0.1))
trying to approximate this function (how this is performed is explained later in this talk) with MLPs having different numbers of neurons on their hidden layer
Nathalie Vialaneix & Hyphen | Introduction to neural networks 24/41
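A possible R version of this illustration using nnet; the sizes tried and the plotting details are illustrative choices:

```r
## Approximating sin(1/(x + 0.1)) with one-hidden-layer MLPs of increasing size
library(nnet)
set.seed(1)
x <- seq(0, 1, length.out = 500)
dat <- data.frame(x = x, y = sin(1 / (x + 0.1)))

sizes <- c(2, 5, 20)
fits <- lapply(sizes, function(Q) {
  nnet(y ~ x, data = dat, size = Q, linout = TRUE, maxit = 1000, trace = FALSE)
})

plot(dat$x, dat$y, type = "l", lwd = 2, xlab = "x", ylab = "y")
for (k in seq_along(fits)) lines(dat$x, predict(fits[[k]], dat), col = k + 1, lty = 2)
legend("topright", legend = paste("Q =", sizes), col = seq_along(sizes) + 1, lty = 2)
```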
Sketch of main properties of 1-hidden-layer MLP
1 universal approximation [Pinkus, 1999, Devroye et al., 1996, Hornik, 1991, Hornik, 1993, Stinchcombe, 1999]: 1-hidden-layer MLP can approximate arbitrarily well any (regular enough) function (but, in theory, the number of neurons can grow arbitrarily for that)
2 consistency [Farago and Lugosi, 1993, Devroye et al., 1996, White, 1990, White, 1991, Barron, 1994, McCaffrey and Gallant, 1994]: 1-hidden-layer MLP are universally consistent with a number of neurons that grows to +∞ slower than O(n / log n)
⇒ the number of neurons controls the richness of the MLP and has to increase with n
Nathalie Vialaneix & Hyphen | Introduction to neural networks 25/41
Empirical error minimization
Given i.i.d. observations of (X, Y), (x_i, y_i)_i, how to choose the weights w?
Standard approach: minimize the empirical L2 risk:
L_n(w) = Σ_{i=1}^n [Φ_w(x_i) − y_i]²
with
y_i ∈ R for the regression case
y_i ∈ {0, 1} for the classification case, with the associated decision rule x → 1_{Φ_w(x) ≥ 1/2}.
But: L_n(w) is not convex in w ⇒ general (non-convex) optimization problem
Nathalie Vialaneix & Hyphen | Introduction to neural networks 26/41
Optimization with gradient descent
Method: initialize (randomly or with some prior knowledge) the weights w(0) ∈ R^{Qp+2Q+1}
Batch approach: for t = 1, . . . , T,
w(t + 1) = w(t) − µ(t) ∇_w L_n(w(t));
online (or stochastic) approach: write L_n(w) = Σ_{i=1}^n E_i(w) with E_i(w) = [Φ_w(x_i) − y_i]²,
and for t = 1, . . . , T, randomly pick i ∈ {1, . . . , n} and update:
w(t + 1) = w(t) − µ(t) ∇_w E_i(w(t)).
Nathalie Vialaneix & Hyphen | Introduction to neural networks 27/41
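The stochastic update rule can be sketched in R; to keep the gradient explicit, the model below is a simple linear one (Φ_w(x) = w_0 + w_1 x), an illustrative simplification rather than the MLP case:

```r
## Stochastic gradient descent on E_i(w) = (Phi_w(x_i) - y_i)^2 for Phi_w(x) = w0 + w1 * x
set.seed(1)
n <- 100
x <- runif(n)
y <- 2 * x + 1 + rnorm(n, sd = 0.1)

w <- c(0, 0)                            # w(0): initial weights (w0, w1)
for (t in 1:5000) {
  i  <- sample(n, 1)                    # randomly pick one observation
  e  <- (w[1] + w[2] * x[i]) - y[i]     # Phi_w(x_i) - y_i
  g  <- 2 * e * c(1, x[i])              # gradient of E_i with respect to (w0, w1)
  mu <- 0.5 / sqrt(t)                   # decreasing step size mu(t)
  w  <- w - mu * g                      # update w(t+1) = w(t) - mu(t) grad E_i(w(t))
}
w                                       # should be close to (1, 2)
```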
Discussion about practical choices for this approach
the batch version converges (from an optimization point of view) to a local minimum of the error for a good choice of µ(t), but convergence can be slow
the stochastic version is usually less efficient (noisier updates) but is useful for large datasets (n large)
more efficient algorithms exist to solve the optimization task. The one implemented in the R package nnet uses higher order derivatives (BFGS algorithm), but this is a very active area of research in which many improvements have been made recently
in all cases, the solutions returned are, at best, local minima which strongly depend on the initialization: using more than one initialization is advised, as well as checking the variability among a few runs
Nathalie Vialaneix & Hyphen | Introduction to neural networks 28/41
Gradient backpropagation method
[Rumelhart and Mc Clelland, 1986]
The gradient backpropagation (rétropropagation du gradient) principle is used to easily compute gradients in perceptrons (or in other types of neural networks).
This way, stochastic gradient descent alternates:
a forward step which aims at computing the outputs for all observations x_i given a value of the weights w
a backward step in which the gradient backpropagation principle is used to obtain the gradient for the current weights w
Nathalie Vialaneix & Hyphen | Introduction to neural networks 29/41
Backpropagation in practice
(diagram not reproduced: inputs x^(1), . . . , x^(p), hidden units with weights w^(1) and biases w^(0)_1, . . . , w^(0)_Q, output weights w^(2), output Φ_w(x))
initialize weights
Forward step: for all k, compute a^(1)_k = x_iᵀ w^(1)_k + w^(0)_k
Forward step: for all k, compute z^(1)_k = h_k(a^(1)_k)
Forward step: compute a^(2) = Σ_{k=1}^Q w^(2)_k z^(1)_k + w^(2)_0 (and the output Φ_w(x_i) = h_0(a^(2)))
Backward step: compute ∂E_i/∂w^(2)_k = δ^(2) × z^(1)_k with
δ^(2) = ∂E_i/∂a^(2) = ∂[h_0(a^(2)) − y_i]² / ∂a^(2) = 2 h_0′(a^(2)) × [h_0(a^(2)) − y_i]
Backward step: ∂E_i/∂w^(1)_kj = δ^(1)_k × x^(j)_i with
δ^(1)_k = ∂E_i/∂a^(1)_k = ∂E_i/∂a^(2) × ∂a^(2)/∂a^(1)_k = δ^(2) × w^(2)_k × h_k′(a^(1)_k)
Backward step: ∂E_i/∂w^(0)_k = δ^(1)_k
Nathalie Vialaneix & Hyphen | Introduction to neural networks 30/41
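These formulas translate almost line by line into R; the sketch below performs one forward and one backward pass for a single observation, assuming logistic hidden activations and an identity output h_0 (the weights and the observation are arbitrary illustrative values):

```r
## One forward/backward pass of backpropagation for a one-hidden-layer MLP (regression)
set.seed(1)
p <- 3; Q <- 4
xi <- rnorm(p); yi <- 0.7                       # one observation (x_i, y_i)
W1 <- matrix(rnorm(Q * p), nrow = Q)            # first-layer weights w^(1)
b1 <- rnorm(Q); w2 <- rnorm(Q); b2 <- rnorm(1)  # biases w^(0), weights w^(2), bias w^(2)_0

sigmoid <- function(z) 1 / (1 + exp(-z))

## forward step
a1 <- as.vector(W1 %*% xi + b1)   # a^(1)_k
z1 <- sigmoid(a1)                 # z^(1)_k = h_k(a^(1)_k)
a2 <- sum(w2 * z1) + b2           # a^(2); output Phi_w(x_i) = a^(2) since h_0 = identity

## backward step
delta2  <- 2 * (a2 - yi)                 # delta^(2) (h_0 = identity, so h_0' = 1)
grad_w2 <- delta2 * z1                   # dE_i / dw^(2)_k = delta^(2) z^(1)_k
grad_b2 <- delta2                        # dE_i / dw^(2)_0
delta1  <- delta2 * w2 * z1 * (1 - z1)   # delta^(1)_k, with h_k' = h_k (1 - h_k)
grad_W1 <- outer(delta1, xi)             # dE_i / dw^(1)_kj = delta^(1)_k x^(j)_i
grad_b1 <- delta1                        # dE_i / dw^(0)_k
```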
Initialization and stopping of the training algorithm
1 How to initialize weights? Standard choices: w^(1)_jk ∼ N(0, 1/√p) and w^(2)_k ∼ N(0, 1/√Q)
In the R package nnet, weights are sampled uniformly in [−0.5, 0.5], or in [−1/max_i x^(j)_i , 1/max_i x^(j)_i] if X^(j) is large.
2 When to stop the algorithm (gradient descent or alike)? Standard choices:
bounded number of iterations T
target value of the error L̂_n(w)
target value of the evolution L̂_n(w(t)) − L̂_n(w(t + 1))
In the R package nnet, a combination of the three criteria is used and is tunable.
Nathalie Vialaneix & Hyphen | Introduction to neural networks 31/41
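These choices map onto arguments of nnet; the call below only illustrates where they appear (the data and values are arbitrary, and the argument names follow the nnet documentation):

```r
## Initialization range and stopping criteria as exposed by nnet
library(nnet)
set.seed(42)
dat <- data.frame(x = runif(100))
dat$y <- sin(1 / (dat$x + 0.1)) + rnorm(100, sd = 0.1)

fit <- nnet(y ~ x, data = dat, size = 5, linout = TRUE,
            rang   = 0.5,    # initial weights drawn uniformly in [-rang, rang]
            maxit  = 500,    # bound T on the number of iterations
            abstol = 1e-4,   # stop if the fitting criterion falls below this value
            reltol = 1e-8,   # stop if the relative improvement is below this value
            trace  = FALSE)
```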
Strategies to avoid overfitting
Properly tune Q with a CV or a bootstrap estimation of the generalization ability of the method
Early stopping: for Q large enough, use a part of the data as a validation set and stop the training (gradient descent) when the empirical error computed on this dataset starts to increase
Weight decay: for Q large enough, penalize the empirical risk with a function of the weights, e.g., L̂_n(w) + λ wᵀw
Noise injection: modify the input data with a random noise during the training
Nathalie Vialaneix & Hyphen | Introduction to neural networks 32/41
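Weight decay, for instance, is directly available in nnet through the decay argument; a sketch of tuning it on a validation set (the synthetic data and candidate values are illustrative):

```r
## Tuning the weight decay penalty lambda with a validation set
library(nnet)
set.seed(42)
n <- 300
dat <- data.frame(x = runif(n))
dat$y <- sin(1 / (dat$x + 0.1)) + rnorm(n, sd = 0.1)
val_id <- sample(n, n / 3)                     # held-out validation set

lambdas <- c(0, 1e-4, 1e-2, 1e-1)
val_error <- sapply(lambdas, function(lambda) {
  fit <- nnet(y ~ x, data = dat[-val_id, ], size = 20, decay = lambda,
              linout = TRUE, maxit = 500, trace = FALSE)
  mean((dat$y[val_id] - predict(fit, dat[val_id, ]))^2)
})
rbind(lambda = lambdas, val_error)             # pick the lambda with the smallest validation error
```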
Outline
1 Statistical / machine / automatic learning
Background and notations
Underfitting / Overfitting
Consistency
Avoiding overfitting
2 (non deep) Neural networks
Presentation of multi-layer perceptrons
Theoretical properties of perceptrons
Learning perceptrons
Learning in practice
3 Deep neural networks
Overview
CNN
Nathalie Vialaneix & Hyphen | Introduction to neural networks 33/41
What is “deep learning”?
(diagram not reproduced: a 2 hidden layer MLP with inputs x = (x^(1), . . . , x^(p)), Layer 1, Layer 2 and output y; image taken from https://towardsdatascience.com/)
Learning with neural networks, except that... the number of neurons is huge!!!
Nathalie Vialaneix & Hyphen | Introduction to neural networks 34/41
Main types of deep NN
deep MLP
CNN (Convolutional Neural Network): used in image processing (images taken from [Krizhevsky et al., 2012, Zeiler and Fergus, 2014])
RNN (Recurrent Neural Network): used for data with a temporal sequence (in natural language processing for instance)
Nathalie Vialaneix & Hyphen | Introduction to neural networks 35/41
Why such a big buzz around deep learning?...
ImageNet results http://www.image-net.org/challenges/LSVRC/
Image by courtesy of Stéphane Canu
See also: http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
Nathalie Vialaneix & Hyphen | Introduction to neural networks 36/41
Improvements in learning parameters
optimization algorithm: gradient descent is used but the choice of the descent step µ(t) has been improved [Ruder, 2016] and mini-batches are also often used to speed up the learning (e.g., ADAM, [Kingma and Ba, 2017])
avoid overfitting with dropout: randomly remove a fraction p of the neurons (and their connections) during each step of the training (larger number of iterations to converge but faster update at each iteration) [Srivastava et al., 2014]
(diagram not reproduced: a 2 hidden layer MLP with inputs x = (x^(1), . . . , x^(p)), Layer 1, Layer 2 and output y, where some neurons are dropped)
Nathalie Vialaneix & Hyphen | Introduction to neural networks 37/41
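In the R interface to Keras (assuming the keras package and a backend are installed), dropout is added as a layer between dense layers; the architecture below is only an illustrative sketch:

```r
## Dense network with dropout layers (illustrative architecture)
library(keras)

p <- 20                                        # hypothetical input dimension
model <- keras_model_sequential() %>%
  layer_dense(units = 128, activation = "relu", input_shape = c(p)) %>%
  layer_dropout(rate = 0.5) %>%                # randomly drop 50% of the units at each training step
  layer_dense(units = 64, activation = "relu") %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1)                       # single output unit (regression)

model %>% compile(optimizer = optimizer_adam(), loss = "mse")
```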
Main available implementations
Framework                | Date      | Main dev                                 | Language(s)
Theano                   | 2008-2017 | Montreal University (Y. Bengio)          | Python
TensorFlow               | 2015-...  | Google Brain                             | C++ / Python
Caffe                    | 2013-...  | Berkeley University                      | C++ / Python / Matlab
PyTorch (and autograd)   | 2016-...  | community driven (Facebook, Google, ...) | Python (with strong GPU accelerations)
Keras                    | 2017-...  | François Chollet / RStudio               | high level API on top of TensorFlow (and CNTK, Theano...): Python, R
Nathalie Vialaneix & Hyphen | Introduction to neural networks 38/41
General architecture of CNN
Image taken from the article https://doi.org/10.3389/fpls.2017.02235
Nathalie Vialaneix & Hyphen | Introduction to neural networks 39/41
Convolutional layer
Images taken from https://towardsdatascience.com
Nathalie Vialaneix & Hyphen | Introduction to neural networks 40/41
Pooling layer
Basic principle: select a subsample of the previous layer values
Generally used: max pooling
Image taken from Wikimedia Commons, by Aphex34
Nathalie Vialaneix & Hyphen | Introduction to neural networks 41/41
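A small convolution + max pooling stack in the R interface to Keras (again assuming keras is installed; the 28x28 grayscale input and the 10 classes are hypothetical):

```r
## Minimal CNN: convolutional layer, max pooling, then dense layers
library(keras)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%      # convolutional layer
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%      # max pooling: keep the max of each 2x2 block
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")    # one output unit per class

model %>% compile(optimizer = optimizer_adam(),
                  loss = "categorical_crossentropy", metrics = "accuracy")
```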
References
Barron, A. (1994).
Approximation and estimation bounds for artificial neural networks.
Machine Learning, 14:115–133.
Bishop, C. (1995).
Neural Networks for Pattern Recognition.
Oxford University Press, New York, USA.
Devroye, L., Györfi, L., and Lugosi, G. (1996).
A Probabilistic Theory of Pattern Recognition.
Springer-Verlag, New York, NY, USA.
Farago, A. and Lugosi, G. (1993).
Strong universal consistency of neural network classifiers.
IEEE Transactions on Information Theory, 39(4):1146–1151.
Györfi, L., Kohler, M., Krzy˙zak, A., and Walk, H. (2002).
A Distribution-Free Theory of Nonparametric Regression.
Springer-Verlag, New York, NY, USA.
Hornik, K. (1991).
Approximation capabilities of multilayer feedforward networks.
Neural Networks, 4(2):251–257.
Hornik, K. (1993).
Some new results on neural network approximation.
Neural Networks, 6(8):1069–1072.
Kingma, D. P. and Ba, J. (2017).
Adam: a method for stochastic optimization.
arXiv preprint arXiv: 1412.6980.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012).
ImageNet classification with deep convolutional neural networks.
In Pereira, F., Burges, C., Bottou, L., and Weinberger, K., editors, Proceedings of Neural Information Processing Systems (NIPS 2012), volume 25.
Mc Culloch, W. and Pitts, W. (1943).
A logical calculus of ideas immanent in nervous activity.
Bulletin of Mathematical Biophysics, 5(4):115–133.
McCaffrey, D. and Gallant, A. (1994).
Convergence rates for single hidden layer feedforward networks.
Neural Networks, 7(1):115–133.
Pinkus, A. (1999).
Approximation theory of the MLP model in neural networks.
Acta Numerica, 8:143–195.
Ripley, B. (1996).
Pattern Recognition and Neural Networks.
Cambridge University Press.
Rosenblatt, F. (1958).
The perceptron: a probabilistic model for information storage and organization in the brain.
Psychological Review, 65:386–408.
Rosenblatt, F. (1962).
Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.
Spartan Books, Washington, DC, USA.
Ruder, S. (2016).
An overview of gradient descent optimization algorithms.
arXiv preprint arXiv: 1609.04747.
Rumelhart, D. and Mc Clelland, J. (1986).
Parallel Distributed Processing: Exploration in the MicroStructure of Cognition.
MIT Press, Cambridge, MA, USA.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014).
Dropout: a simple way to prevent neural networks from overfitting.
Journal of Machine Learning Research, 15:1929–1958.
Stinchcombe, M. (1999).
Neural network approximation of continuous functionals and continuous functions on compactifications.
Neural Networks, 12(3):467–477.
White, H. (1990).
Connectionist nonparametric regression: multilayer feedforward networks can learn arbitrary mappings.
Neural Networks, 3:535–549.
White, H. (1991).
Nonparametric estimation of conditional quantiles using neural networks.
In Proceedings of the 23rd Symposium of the Interface: Computing Science and Statistics, pages 190–199, Alexandria, VA,
USA. American Statistical Association.
Zeiler, M. D. and Fergus, R. (2014).
Visualizing and understanding convolutional networks.
In Proceedings of European Conference on Computer Vision (ECCV 2014).
Nathalie Vialaneix & Hyphen | Introduction to neural networks 41/41

A short and naive introduction to using network in prediction modelstuxette
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random foresttuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 

Más de tuxette (20)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 

Último

FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Silpa
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxANSARKHAN96
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsSérgio Sacani
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.Silpa
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 

Último (20)

FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 

An introduction to neural networks

  • 1. An introduction to neural networks Nathalie Vialaneix & Hyphen nathalie.vialaneix@inra.fr http://www.nathalievialaneix.eu & https://www.hyphen-stat.com/en/ DATE ET LIEU Nathalie Vialaneix & Hyphen | Introduction to neural networks 1/41
  • 2. Outline 1 Statistical / machine / automatic learning Background and notations Underfitting / Overfitting Consistency Avoiding overfitting 2 (non deep) Neural networks Presentation of multi-layer perceptrons Theoretical properties of perceptrons Learning perceptrons Learning in practice 3 Deep neural networks Overview CNN Nathalie Vialaneix & Hyphen | Introduction to neural networks 2/41
  • 3. Outline 1 Statistical / machine / automatic learning Background and notations Underfitting / Overfitting Consistency Avoiding overfitting 2 (non deep) Neural networks Presentation of multi-layer perceptrons Theoretical properties of perceptrons Learning perceptrons Learning in practice 3 Deep neural networks Overview CNN Nathalie Vialaneix & Hyphen | Introduction to neural networks 3/41
  • 4. Background Purpose: predict Y from X; Nathalie Vialaneix & Hyphen | Introduction to neural networks 4/41
  • 5. Background Purpose: predict Y from X; What we have: n observations of (X, Y), (x1, y1), . . . , (xn, yn); Nathalie Vialaneix & Hyphen | Introduction to neural networks 4/41
  • 6. Background Purpose: predict Y from X; What we have: n observations of (X, Y), (x1, y1), . . . , (xn, yn); What we want: estimate unknown Y from new X: xn+1, . . . , xm. Nathalie Vialaneix & Hyphen | Introduction to neural networks 4/41
  • 7. Basics From (xi, yi)i, definition of a machine, Φn s.t.: ˆynew = Φn (xnew). Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 8. Basics From (xi, yi)i, definition of a machine, Φn s.t.: ˆynew = Φn (xnew). if Y is numeric, Φn is called a regression function fonction de régression; if Y is a factor, Φn is called a classifier classifieur; Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 9. Basics From (xi, yi)i, definition of a machine, Φn s.t.: ˆynew = Φn (xnew). if Y is numeric, Φn is called a regression function fonction de régression; if Y is a factor, Φn is called a classifier classifieur; Φn is said to be trained or learned from the observations (xi, yi)i. Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 10. Basics From (xi, yi)i, definition of a machine, Φn s.t.: ˆynew = Φn (xnew). if Y is numeric, Φn is called a regression function fonction de régression; if Y is a factor, Φn is called a classifier classifieur; Φn is said to be trained or learned from the observations (xi, yi)i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 11. Basics From (xi, yi)i, definition of a machine, Φn s.t.: ˆynew = Φn (xnew). if Y is numeric, Φn is called a regression function fonction de régression; if Y is a factor, Φn is called a classifier classifieur; Φn is said to be trained or learned from the observations (xi, yi)i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; generalization ability: predictions made on new data are also accurate. Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 12. Basics From (xi, yi)i, definition of a machine, Φn s.t.: ˆynew = Φn (xnew). if Y is numeric, Φn is called a regression function fonction de régression; if Y is a factor, Φn is called a classifier classifieur; Φn is said to be trained or learned from the observations (xi, yi)i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; generalization ability: predictions made on new data are also accurate. Conflicting objectives!! Nathalie Vialaneix & Hyphen | Introduction to neural networks 5/41
  • 13. Underfitting/Overfitting sous/sur - apprentissage Function x → y to be estimated Nathalie Vialaneix & Hyphen | Introduction to neural networks 6/41
  • 14. Underfitting/Overfitting sous/sur - apprentissage Observations we might have Nathalie Vialaneix & Hyphen | Introduction to neural networks 6/41
  • 15. Underfitting/Overfitting sous/sur - apprentissage Observations we do have Nathalie Vialaneix & Hyphen | Introduction to neural networks 6/41
  • 16. Underfitting/Overfitting sous/sur - apprentissage First estimation from the observations: underfitting Nathalie Vialaneix & Hyphen | Introduction to neural networks 6/41
  • 17. Underfitting/Overfitting sous/sur - apprentissage Second estimation from the observations: accurate estimation Nathalie Vialaneix & Hyphen | Introduction to neural networks 6/41
  • 18. Underfitting/Overfitting sous/sur - apprentissage Third estimation from the observations: overfitting Nathalie Vialaneix & Hyphen | Introduction to neural networks 6/41
  • 19. Underfitting/Overfitting sous/sur - apprentissage Summary Nathalie Vialaneix & Hyphen | Introduction to neural networks 6/41
  • 20. Errors training error (measures the accuracy to the observations) Nathalie Vialaneix & Hyphen | Introduction to neural networks 7/41
  • 21. Errors training error (measures the accuracy to the observations) if y is a factor: misclassification rate #{i = 1, . . . , n : ŷi ≠ yi} / n Nathalie Vialaneix & Hyphen | Introduction to neural networks 7/41
  • 22. Errors training error (measures the accuracy to the observations) if y is a factor: misclassification rate #{i = 1, . . . , n : ŷi ≠ yi} / n if y is numeric: mean square error (MSE) (1/n) ∑_{i=1}^n (ŷi − yi)² Nathalie Vialaneix & Hyphen | Introduction to neural networks 7/41
  • 23. Errors training error (measures the accuracy to the observations) if y is a factor: misclassification rate #{i = 1, . . . , n : ŷi ≠ yi} / n if y is numeric: mean square error (MSE) (1/n) ∑_{i=1}^n (ŷi − yi)² or root mean square error (RMSE) or pseudo-R²: 1 − MSE/Var((yi)i) Nathalie Vialaneix & Hyphen | Introduction to neural networks 7/41
  • 24. Errors training error (measures the accuracy to the observations) if y is a factor: misclassification rate #{i = 1, . . . , n : ŷi ≠ yi} / n if y is numeric: mean square error (MSE) (1/n) ∑_{i=1}^n (ŷi − yi)² or root mean square error (RMSE) or pseudo-R²: 1 − MSE/Var((yi)i) test error: a way to prevent overfitting (estimates the generalization error) is the simple validation Nathalie Vialaneix & Hyphen | Introduction to neural networks 7/41
  • 25. Errors training error (measures the accuracy to the observations) if y is a factor: misclassification rate #{i = 1, . . . , n : ŷi ≠ yi} / n if y is numeric: mean square error (MSE) (1/n) ∑_{i=1}^n (ŷi − yi)² or root mean square error (RMSE) or pseudo-R²: 1 − MSE/Var((yi)i) test error: a way to prevent overfitting (estimates the generalization error) is the simple validation 1 split the data into training/test sets (usually 80%/20%) 2 train Φn from the training dataset 3 compute the test error from the remaining data Nathalie Vialaneix & Hyphen | Introduction to neural networks 7/41
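To make this simple-validation recipe concrete, here is a minimal R sketch; the toy data-generating step, the 80%/20% proportions and the use of nnet (the R package discussed later in this talk) as the learner are illustrative assumptions, not part of the original slides.

    library(nnet)
    set.seed(1)
    # toy regression data (illustrative)
    n <- 200
    x <- runif(n)
    dat <- data.frame(x = x, y = sin(1 / (x + 0.1)) + rnorm(n, sd = 0.1))
    # 1. split into training (80%) / test (20%) sets
    in_train <- sample(n, size = round(0.8 * n))
    train <- dat[in_train, ]
    test  <- dat[-in_train, ]
    # 2. train the machine Phi_n on the training set
    phi_n <- nnet(y ~ x, data = train, size = 5, linout = TRUE, maxit = 500, trace = FALSE)
    # 3. training and test errors (MSE)
    mse <- function(y, yhat) mean((y - yhat)^2)
    mse(train$y, predict(phi_n, train))   # accuracy to the observations
    mse(test$y, predict(phi_n, test))     # estimate of the generalization error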
  • 26. Example Observations Nathalie Vialaneix & Hyphen | Introduction to neural networks 8/41
  • 27. Example Training/Test datasets Nathalie Vialaneix & Hyphen | Introduction to neural networks 8/41
  • 28. Example Training/Test errors Nathalie Vialaneix & Hyphen | Introduction to neural networks 8/41
  • 29. Example Summary Nathalie Vialaneix & Hyphen | Introduction to neural networks 8/41
  • 30. Practical consequences A machine or an algorithm is a method that: starts from a given set of prediction functions C; uses observations (xi, yi)i to pick the “best” prediction function according to what has already been observed. This is often done by estimating some parameters (e.g., coefficients in linear models, weights in neural networks, ...) Nathalie Vialaneix & Hyphen | Introduction to neural networks 9/41
  • 31. Practical consequences A machine or an algorithm is a method that: starts from a given set of prediction functions C; uses observations (xi, yi)i to pick the “best” prediction function according to what has already been observed. This is often done by estimating some parameters (e.g., coefficients in linear models, weights in neural networks, ...) C can depend on user choices (e.g., architecture, number of neurons in neural networks) ⇒ hyper-parameters parameters (learned from data) hyper-parameters (cannot be learned from data) Nathalie Vialaneix & Hyphen | Introduction to neural networks 9/41
  • 32. Practical consequences A machine or an algorithm is a method that: starts from a given set of prediction functions C; uses observations (xi, yi)i to pick the “best” prediction function according to what has already been observed. This is often done by estimating some parameters (e.g., coefficients in linear models, weights in neural networks, ...) C can depend on user choices (e.g., architecture, number of neurons in neural networks) ⇒ hyper-parameters parameters (learned from data) hyper-parameters (cannot be learned from data) hyper-parameters often control the richness of C: must be carefully tuned to ensure a good accuracy and avoid overfitting Nathalie Vialaneix & Hyphen | Introduction to neural networks 9/41
  • 33. Model selection / Fair error evaluation Several strategies can be used to estimate the error of a set C and to be able to select the best family of models C (tuning of hyper-parameters): split the data into a training/test set (∼ 67/33%): the model is estimated with the training dataset and a test error is computed with the remaining observations; Nathalie Vialaneix & Hyphen | Introduction to neural networks 10/41
  • 34. Model selection / Fair error evaluation Several strategies can be used to estimate the error of a set C and to be able to select the best family of models C (tuning of hyper-parameters): split the data into a training/test set (∼ 67/33%) cross validation: split the data into L folds. For each fold, train a model without the data included in the current fold and compute the error with the data included in the fold: the averaging of these L errors is the cross validation error; (illustration with folds 1, . . . , K: train without fold 2 to obtain Φ−2, then compute err(Φ−2, fold 2)) CV error: (1/K) ∑_l err(Φ−l, fold l) Nathalie Vialaneix & Hyphen | Introduction to neural networks 10/41
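Below is a minimal R sketch of the K-fold cross-validation error described above, reusing the toy dat data frame, the mse helper and the nnet learner from the previous sketch; K = 5 is an arbitrary choice.

    K <- 5
    fold <- sample(rep(1:K, length.out = nrow(dat)))
    cv_errors <- sapply(1:K, function(l) {
      # train without fold l ...
      phi_l <- nnet(y ~ x, data = dat[fold != l, ], size = 5, linout = TRUE,
                    maxit = 500, trace = FALSE)
      # ... and compute the error on fold l
      mse(dat$y[fold == l], predict(phi_l, dat[fold == l, ]))
    })
    mean(cv_errors)   # cross validation error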
  • 35. Model selection / Fair error evaluation Several strategies can be used to estimate the error of a set C and to be able to select the best family of models C (tuning of hyper-parameters): split the data into a training/test set (∼ 67/33%) cross validation: split the data into L folds. For each fold, train a model without the data included in the current fold and compute the error with the data included in the fold: the averaging of these L errors is the cross validation error; out-of-bag error: create B bootstrap samples (size n taken with replacement in the original sample) and compute a prediction based on Out-Of-Bag samples (see next slide). Nathalie Vialaneix & Hyphen | Introduction to neural networks 10/41
  • 36. Out-of-bag observations, prediction and error OOB (Out-Of-Bag) error: error based on the observations not included in the “bag” (illustration: observation (xi, yi) belongs to some bootstrap samples, e.g., T1 and Tb, but not to TB; only the machines trained on bags that do not contain xi, here ΦB, are used to compute its OOB prediction ΦB(xi)) Nathalie Vialaneix & Hyphen | Introduction to neural networks 11/41
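A minimal R sketch of the out-of-bag idea, with the same toy data and learner as above; B = 25 bootstrap samples is an arbitrary choice, and averaging the OOB predictions of each observation (the usual convention for regression) is an assumption of the sketch.

    B <- 25
    oob_pred <- matrix(NA, nrow(dat), B)
    for (b in 1:B) {
      bag <- sample(nrow(dat), replace = TRUE)        # bootstrap sample (the "bag")
      phi_b <- nnet(y ~ x, data = dat[bag, ], size = 5, linout = TRUE,
                    maxit = 500, trace = FALSE)
      oob <- setdiff(seq_len(nrow(dat)), bag)         # observations not in the bag
      oob_pred[oob, b] <- predict(phi_b, dat[oob, ])
    }
    oob_hat <- rowMeans(oob_pred, na.rm = TRUE)       # OOB prediction of each observation
    mse(dat$y, oob_hat)                               # OOB error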
  • 37. Other model selection methods and strategies to avoid overfitting Main idea: Penalize the training error with something that represents the complexity of the model model selection with explicit penalization strategy: minimize (∼) error + complexity Example: AIC/BIC criteria in linear models (e.g., BIC: −2L + log(n) × p) avoiding overfitting with implicit penalization strategy: constrain the parameters w to take their values within a reduced subset min_w error(w) + λ‖w‖ with w the parameter of the model (learned) and ‖·‖ typically the ℓ1 or ℓ2 norm Nathalie Vialaneix & Hyphen | Introduction to neural networks 12/41
  • 38. Outline 1 Statistical / machine / automatic learning Background and notations Underfitting / Overfitting Consistency Avoiding overfitting 2 (non deep) Neural networks Presentation of multi-layer perceptrons Theoretical properties of perceptrons Learning perceptrons Learning in practice 3 Deep neural networks Overview CNN Nathalie Vialaneix & Hyphen | Introduction to neural networks 13/41
  • 39. What are (artificial) neural networks? Common properties (artificial) “Neural networks”: general name for supervised and unsupervised methods developed in (vague) analogy to the brain; Nathalie Vialaneix & Hyphen | Introduction to neural networks 14/41
  • 40. What are (artificial) neural networks? Common properties (artificial) “Neural networks”: general name for supervised and unsupervised methods developed in (vague) analogy to the brain; combination (network) of simple elements (neurons). Nathalie Vialaneix & Hyphen | Introduction to neural networks 14/41
  • 41. What are (artificial) neural networks? Common properties (artificial) “Neural networks”: general name for supervised and unsupervised methods developed in (vague) analogy to the brain; combination (network) of simple elements (neurons). Example of graphical representation: INPUTS OUTPUTS Nathalie Vialaneix & Hyphen | Introduction to neural networks 14/41
  • 42. Different types of neural networks A neural network is defined by: 1 the network structure; 2 the neuron type. Nathalie Vialaneix & Hyphen | Introduction to neural networks 15/41
  • 43. Different types of neural networks A neural network is defined by: 1 the network structure; 2 the neuron type. Standard examples Multilayer perceptrons (MLP) Perceptron multi-couches: dedicated to supervised problems (classification and regression); Nathalie Vialaneix & Hyphen | Introduction to neural networks 15/41
  • 44. Different types of neural networks A neural network is defined by: 1 the network structure; 2 the neuron type. Standard examples Multilayer perceptrons (MLP) Perceptron multi-couches: dedicated to supervised problems (classification and regression); Radial basis function networks (RBF): same purpose but based on local smoothing; Nathalie Vialaneix & Hyphen | Introduction to neural networks 15/41
  • 45. Different types of neural networks A neural network is defined by: 1 the network structure; 2 the neuron type. Standard examples Multilayer perceptrons (MLP) Perceptron multi-couches: dedicated to supervised problems (classification and regression); Radial basis function networks (RBF): same purpose but based on local smoothing; Self-organizing maps (SOM also sometimes called Kohonen’s maps) or Topographic maps: dedicated to unsupervised problems (clustering), self-organized; . . . Nathalie Vialaneix & Hyphen | Introduction to neural networks 15/41
  • 46. Different types of neural networks A neural network is defined by: 1 the network structure; 2 the neuron type. Standard examples Multilayer perceptrons (MLP) Perceptron multi-couches: dedicated to supervised problems (classification and regression); Radial basis function networks (RBF): same purpose but based on local smoothing; Self-organizing maps (SOM also sometimes called Kohonen’s maps) or Topographic maps: dedicated to unsupervised problems (clustering), self-organized; . . . In this talk, focus on MLP. Nathalie Vialaneix & Hyphen | Introduction to neural networks 15/41
  • 47. MLP: Advantages/Drawbacks Advantages classification OR regression (i.e., Y can be a numeric variable or a factor); non parametric method: flexible; good theoretical properties. Nathalie Vialaneix & Hyphen | Introduction to neural networks 16/41
  • 48. MLP: Advantages/Drawbacks Advantages classification OR regression (i.e., Y can be a numeric variable or a factor); non parametric method: flexible; good theoretical properties. Drawbacks hard to train (high computational cost, especially when d is large); overfit easily; “black box” models (hard to interpret). Nathalie Vialaneix & Hyphen | Introduction to neural networks 16/41
  • 49. References Advised references: [Bishop, 1995, Ripley, 1996] overview of the topic from a learning (more than statistical) perspective [Devroye et al., 1996, Györfi et al., 2002] in dedicated chapters present statistical properties of perceptrons Nathalie Vialaneix & Hyphen | Introduction to neural networks 17/41
  • 50. Analogy to the brain 1 a neuron collects signals from neighboring neurons through its dendrites connexions which frequently lead to activating a neuron are reinforced (tend to have an increasing impact on the destination neuron) Nathalie Vialaneix & Hyphen | Introduction to neural networks 18/41
  • 51. Analogy to the brain 1 a neuron collects signals from neighboring neurons through its dendrites 2 when the total signal is above a given threshold, the neuron is activated connexions which frequently lead to activating a neuron are reinforced (tend to have an increasing impact on the destination neuron) Nathalie Vialaneix & Hyphen | Introduction to neural networks 18/41
  • 52. Analogy to the brain 1 a neuron collects signals from neighboring neurons through its dendrites 2 when the total signal is above a given threshold, the neuron is activated 3 ... and a signal is sent to other neurons through the axon connexions which frequently lead to activating a neuron are reinforced (tend to have an increasing impact on the destination neuron) Nathalie Vialaneix & Hyphen | Introduction to neural networks 18/41
  • 53. First model of artificial neuron [Mc Culloch and Pitts, 1943, Rosenblatt, 1958, Rosenblatt, 1962] (diagram: inputs x(1), . . . , x(p) weighted by w1, . . . , wp, summed with a bias w0 and thresholded) f : x ∈ Rp → 1{∑_{j=1}^p wj x(j) + w0 ≥ 0} Nathalie Vialaneix & Hyphen | Introduction to neural networks 19/41
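As a sketch, this threshold unit can be written in a couple of lines of R; the example weights below are arbitrary.

    # f(x) = 1 if sum_j w_j x^(j) + w_0 >= 0, and 0 otherwise
    mcp_neuron <- function(x, w, w0) as.numeric(sum(w * x) + w0 >= 0)
    mcp_neuron(c(1, 0, 1), w = c(0.5, -0.2, 0.3), w0 = -0.6)   # returns 1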
  • 54. (artificial) Perceptron Layers MLP have one input layer (x ∈ Rp ), one output layer (y ∈ R or ∈ {1, . . . , K − 1} values) and several hidden layers; no connections within a layer; connections between two consecutive layers (feedforward). Example (regression, y ∈ R): INPUTS x = (x(1) , . . . , x(p) ) Nathalie Vialaneix & Hyphen | Introduction to neural networks 20/41
  • 55. (artificial) Perceptron Layers MLP have one input layer (x ∈ Rp ), one output layer (y ∈ R or ∈ {1, . . . , K − 1} values) and several hidden layers; no connections within a layer; connections between two consecutive layers (feedforward). Example (regression, y ∈ R): INPUTS x = (x(1) , . . . , x(p) ) Layer 1 weights w (1) jk Nathalie Vialaneix & Hyphen | Introduction to neural networks 20/41
  • 56. (artificial) Perceptron Layers MLP have one input layer (x ∈ Rp ), one output layer (y ∈ R or ∈ {1, . . . , K − 1} values) and several hidden layers; no connections within a layer; connections between two consecutive layers (feedforward). Example (regression, y ∈ R): INPUTS x = (x(1) , . . . , x(p) ) Layer 1 Nathalie Vialaneix & Hyphen | Introduction to neural networks 20/41
  • 57. (artificial) Perceptron Layers MLP have one input layer (x ∈ Rp ), one output layer (y ∈ R or ∈ {1, . . . , K − 1} values) and several hidden layers; no connections within a layer; connections between two consecutive layers (feedforward). Example (regression, y ∈ R): INPUTS x = (x(1) , . . . , x(p) ) Layer 1 Layer 2 Nathalie Vialaneix & Hyphen | Introduction to neural networks 20/41
  • 58. (artificial) Perceptron Layers MLP have one input layer (x ∈ Rp ), one output layer (y ∈ R or ∈ {1, . . . , K − 1} values) and several hidden layers; no connections within a layer; connections between two consecutive layers (feedforward). Example (regression, y ∈ R): INPUTS x = (x(1) , . . . , x(p) ) Layer 1 Layer 2 y OUTPUTS 2 hidden layer MLP Nathalie Vialaneix & Hyphen | Introduction to neural networks 20/41
  • 59. A neuron in MLP (diagram: inputs v1, v2, v3 weighted by w1, w2, w3, plus a bias w0 Biais) Nathalie Vialaneix & Hyphen | Introduction to neural networks 21/41
  • 60. A neuron in MLP (diagram: the weighted inputs w1 v1 + w2 v2 + w3 v3 + w0 are summed and passed through an activation function) Nathalie Vialaneix & Hyphen | Introduction to neural networks 21/41
  • 61. A neuron in MLP Standard activation functions fonctions de transfert / d’activation Biologically inspired: Heaviside function h(z) = 0 if z < 0, and 1 otherwise. Nathalie Vialaneix & Hyphen | Introduction to neural networks 21/41
  • 62. A neuron in MLP Standard activation functions Main issue with the Heaviside function: not continuous! Identity h(z) = z Nathalie Vialaneix & Hyphen | Introduction to neural networks 21/41
  • 63. A neuron in MLP Standard activation functions But the identity activation function gives a linear model if used with one hidden layer: not flexible enough Logistic function h(z) = 1 / (1 + exp(−z)) Nathalie Vialaneix & Hyphen | Introduction to neural networks 21/41
  • 64. A neuron in MLP Standard activation functions Another popular activation function (useful to model positive real numbers) Rectified linear (ReLU) h(z) = max(0, z) Nathalie Vialaneix & Hyphen | Introduction to neural networks 21/41
  • 65. A neuron in MLP General sigmoid sigmoid: nondecreasing function h : R → R such that lim_{z→+∞} h(z) = 1 and lim_{z→−∞} h(z) = 0 Nathalie Vialaneix & Hyphen | Introduction to neural networks 21/41
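These standard activation functions are one-liners in R; a small sketch (the final plot line is optional and only a visual check):

    heaviside <- function(z) as.numeric(z >= 0)    # biologically inspired, not continuous
    logistic  <- function(z) 1 / (1 + exp(-z))     # a sigmoid
    relu      <- function(z) pmax(0, z)            # rectified linear
    curve(logistic, from = -5, to = 5)             # optional visual check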
  • 66. Focus on one-hidden-layer perceptrons (R package nnet) Regression case (diagram: inputs x(1), . . . , x(p), one hidden layer with Q neurons, weights w(1) and w(2), biases w(0) and w(2)_0, output Φ(x)) Φ(x) = ∑_{k=1}^Q w(2)_k hk(xᵀ w(1)_k + w(0)_k) + w(2)_0, with hk a (logistic) sigmoid Nathalie Vialaneix & Hyphen | Introduction to neural networks 22/41
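A minimal R sketch of this forward pass for the regression case, reusing the logistic function defined above; the names W1 (Q × p matrix), b1, w2 and b2 for the weights and biases are illustrative.

    mlp_forward <- function(x, W1, b1, w2, b2) {
      z <- logistic(W1 %*% x + b1)   # hidden layer: h_k(x' w^(1)_k + w^(0)_k)
      sum(w2 * z) + b2               # output: sum_k w^(2)_k z_k + w^(2)_0
    }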
  • 67. Focus on one-hidden-layer perceptrons (R package nnet) Binary classification case (y ∈ {0, 1}) φ(x) = h0(∑_{k=1}^Q w(2)_k hk(xᵀ w(1)_k + w(0)_k) + w(2)_0) with h0 logistic sigmoid or identity. Nathalie Vialaneix & Hyphen | Introduction to neural networks 22/41
  • 68. Focus on one-hidden-layer perceptrons (R package nnet) Binary classification case (y ∈ {0, 1}) decision with: Φ(x) = 0 if φ(x) < 1/2, and 1 otherwise Nathalie Vialaneix & Hyphen | Introduction to neural networks 22/41
  • 69. Focus on one-hidden-layer perceptrons (R package nnet) Extension to any classification problem in {1, . . . , K − 1} Straightforward extension to multiple classes with a multiple output perceptron (number of output units equal to K) and a maximum probability rule for the decision. Nathalie Vialaneix & Hyphen | Introduction to neural networks 22/41
  • 70. Theoretical properties of perceptrons This section answers two questions: 1 can we approximate any function g : [0, 1]^p → R arbitrarily well with a perceptron? Nathalie Vialaneix & Hyphen | Introduction to neural networks 23/41
  • 71. Theoretical properties of perceptrons This section answers two questions: 1 can we approximate any function g : [0, 1]^p → R arbitrarily well with a perceptron? 2 when a perceptron is trained with i.i.d. observations from an arbitrary random variable pair (X, Y), is it consistent? (i.e., does it reach the minimum possible error asymptotically when the number of observations grows to infinity?) Nathalie Vialaneix & Hyphen | Introduction to neural networks 23/41
  • 72. Illustration of the universal approximation property Simple examples a function to approximate: g : x ∈ [0, 1] → sin(1/(x + 0.1)) Nathalie Vialaneix & Hyphen | Introduction to neural networks 24/41
  • 73. Illustration of the universal approximation property Simple examples a function to approximate: g : x ∈ [0, 1] → sin(1/(x + 0.1)) trying to approximate this function (how this is performed is explained later in this talk) with MLPs having different numbers of neurons on their hidden layer Nathalie Vialaneix & Hyphen | Introduction to neural networks 24/41
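A possible way to reproduce this illustration in R with nnet, reusing the toy dat data frame from the earlier sketches; the grid of hidden-layer sizes is arbitrary, and how the weights are fitted is the topic of the next slides.

    sizes <- c(1, 2, 5, 10, 20)
    fits <- lapply(sizes, function(Q)
      nnet(y ~ x, data = dat, size = Q, linout = TRUE, maxit = 1000, trace = FALSE))
    # predictions of each MLP on a fine grid, e.g. to overlay on a plot of g
    grid <- data.frame(x = seq(0, 1, length.out = 200))
    preds <- sapply(fits, predict, newdata = grid)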
  • 74. Sketch of main properties of 1-hidden layer MLP 1 universal approximation [Pinkus, 1999, Devroye et al., 1996, Hornik, 1991, Hornik, 1993, Stinchcombe, 1999]: 1-hidden-layer MLP can approximate arbitrarily well any (regular enough) function (but, in theory, the number of neurons can grow arbitrarily for that) 2 consistency [Farago and Lugosi, 1993, Devroye et al., 1996, White, 1990, White, 1991, Barron, 1994, McCaffrey and Gallant, 1994]: 1-hidden-layer MLP are universally consistent with a number of neurons that grows to +∞ slower than O(n / log n) ⇒ the number of neurons controls the richness of the MLP and has to increase with n Nathalie Vialaneix & Hyphen | Introduction to neural networks 25/41
  • 75. Empirical error minimization Given i.i.d. observations of (X, Y), (xi, yi)i, how to choose the weights w? (diagram: one-hidden-layer perceptron with weights w and output Φw(x)) Nathalie Vialaneix & Hyphen | Introduction to neural networks 26/41
  • 76. Empirical error minimization Given i.i.d. observations of (X, Y), (xi, yi)i, how to choose the weights w? Standard approach: minimize the empirical L2 risk: Ln(w) = ∑_{i=1}^n [fw(xi) − yi]² with yi ∈ R for the regression case, yi ∈ {0, 1} for the classification case, with the associated decision rule x → 1{Φw(x) ≥ 1/2}. Nathalie Vialaneix & Hyphen | Introduction to neural networks 26/41
  • 77. Empirical error minimization Given i.i.d. observations of (X, Y), (xi, yi)i, how to choose the weights w? Standard approach: minimize the empirical L2 risk: Ln(w) = ∑_{i=1}^n [fw(xi) − yi]² with yi ∈ R for the regression case, yi ∈ {0, 1} for the classification case, with the associated decision rule x → 1{Φw(x) ≥ 1/2}. But: Ln(w) is not convex in w ⇒ general optimization problem Nathalie Vialaneix & Hyphen | Introduction to neural networks 26/41
  • 78. Optimization with gradient descent Method: initialize (randomly or with some prior knowledge) the weights w(0) ∈ R^(Qp+2Q+1) Batch approach: for t = 1, . . . , T, w(t + 1) = w(t) − µ(t) ∇w Ln(w(t)); Nathalie Vialaneix & Hyphen | Introduction to neural networks 27/41
  • 79. Optimization with gradient descent Method: initialize (randomly or with some prior knowledge) the weights w(0) ∈ R^(Qp+2Q+1) Batch approach: for t = 1, . . . , T, w(t + 1) = w(t) − µ(t) ∇w Ln(w(t)); online (or stochastic) approach: write Ln(w) = ∑_{i=1}^n [fw(xi) − yi]² = ∑_{i=1}^n Ei(w) and for t = 1, . . . , T, randomly pick i ∈ {1, . . . , n} and update: w(t + 1) = w(t) − µ(t) ∇w Ei(w(t)). Nathalie Vialaneix & Hyphen | Introduction to neural networks 27/41
  • 80. Discussion about practical choices for this approach the batch version converges (from an optimization point of view) to a local minimum of the error for a good choice of µ(t) but convergence can be slow; the stochastic version is usually inefficient but is useful for large datasets (n large); more efficient algorithms exist to solve the optimization task: the one implemented in the R package nnet uses higher order derivatives (BFGS algorithm), but this is a very active area of research in which many improvements have been made recently; in all cases, solutions returned are, at best, local minima which strongly depend on the initialization: using more than one initialization state is advised, as well as checking the variability among a few runs Nathalie Vialaneix & Hyphen | Introduction to neural networks 28/41
  • 81. Gradient backpropagation method [Rumelhart and Mc Clelland, 1986] The gradient backpropagation rétropropagation du gradient principle is used to easily compute gradients in perceptrons (or in other types of neural networks): Nathalie Vialaneix & Hyphen | Introduction to neural networks 29/41
  • 82. Gradient backpropagation method [Rumelhart and Mc Clelland, 1986] The gradient backpropagation rétropropagation du gradient principle is used to easily compute gradients in perceptrons (or in other types of neural networks): This way, stochastic gradient descent alternates: a forward step which aims at calculating outputs for all observations xi given a value of the weights w; a backward step in which the gradient backpropagation principle is used to obtain the gradient for the current weights w Nathalie Vialaneix & Hyphen | Introduction to neural networks 29/41
  • 83. Backpropagation in practice (diagram: one-hidden-layer MLP with weights w(1), w(2) and biases w(0), w(2)_0, output Φw(x)) initialize weights Nathalie Vialaneix & Hyphen | Introduction to neural networks 30/41
  • 84. Backpropagation in practice Forward step: for all k, compute a(1)_k = xiᵀ w(1)_k + w(0)_k Nathalie Vialaneix & Hyphen | Introduction to neural networks 30/41
  • 85. Backpropagation in practice Forward step: for all k, compute z(1)_k = hk(a(1)_k) Nathalie Vialaneix & Hyphen | Introduction to neural networks 30/41
  • 86. Backpropagation in practice Forward step: compute a(2) = ∑_{k=1}^Q w(2)_k z(1)_k + w(2)_0 Nathalie Vialaneix & Hyphen | Introduction to neural networks 30/41
  • 87. Backpropagation in practice Backward step: compute ∂Ei/∂w(2)_k = δ(2) × z(1)_k with δ(2) = ∂Ei/∂a(2) = ∂[h0(a(2)) − yi]²/∂a(2) = 2 h0′(a(2)) × [h0(a(2)) − yi] Nathalie Vialaneix & Hyphen | Introduction to neural networks 30/41
  • 88. Backpropagation in practice Backward step: ∂Ei/∂w(1)_kj = δ(1)_k × x(j)_i with δ(1)_k = ∂Ei/∂a(1)_k = ∂Ei/∂a(2) × ∂a(2)/∂a(1)_k = δ(2) × w(2)_k h′k(a(1)_k) Nathalie Vialaneix & Hyphen | Introduction to neural networks 30/41
  • 89. Backpropagation in practice Backward step: ∂Ei/∂w(0)_k = δ(1)_k Nathalie Vialaneix & Hyphen | Introduction to neural networks 30/41
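Putting the forward and backward steps together, here is a sketch in R of one stochastic gradient update for the one-hidden-layer regression perceptron (identity output activation h0, logistic hidden activations, reusing the logistic function defined earlier); the names and shapes (W1 a Q × p matrix, b1 and w2 vectors of length Q, b2 and mu scalars) are illustrative, and the factor 2 follows the squared-error derivative of the slides.

    dlogistic <- function(z) logistic(z) * (1 - logistic(z))
    sgd_step <- function(xi, yi, W1, b1, w2, b2, mu) {
      a1 <- as.vector(W1 %*% xi + b1)      # forward: a^(1)_k
      z1 <- logistic(a1)                   # forward: z^(1)_k
      a2 <- sum(w2 * z1) + b2              # forward: a^(2) (identity output)
      d2 <- 2 * (a2 - yi)                  # backward: delta^(2)
      d1 <- d2 * w2 * dlogistic(a1)        # backward: delta^(1)_k
      list(W1 = W1 - mu * (d1 %o% xi),     # dE_i/dw^(1)_kj = delta^(1)_k x^(j)_i
           b1 = b1 - mu * d1,              # dE_i/dw^(0)_k  = delta^(1)_k
           w2 = w2 - mu * d2 * z1,         # dE_i/dw^(2)_k  = delta^(2) z^(1)_k
           b2 = b2 - mu * d2)              # dE_i/dw^(2)_0  = delta^(2)
    }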
  • 90. Initialization and stopping of the training algorithm 1 How to initialize weights? Standard choices w(1)_jk ∼ N(0, 1/√p) and w(2)_k ∼ N(0, 1/√Q) Nathalie Vialaneix & Hyphen | Introduction to neural networks 31/41
  • 91. Initialization and stopping of the training algorithm 1 How to initialize weights? Standard choices w(1)_jk ∼ N(0, 1/√p) and w(2)_k ∼ N(0, 1/√Q) 2 When to stop the algorithm? (gradient descent or alike) Standard choices: bounded T; target value of the error ˆLn(w); target value of the evolution ˆLn(w(t)) − ˆLn(w(t + 1)) Nathalie Vialaneix & Hyphen | Introduction to neural networks 31/41
  • 92. Initialization and stopping of the training algorithm 1 How to initialize weights? Standard choices w(1)_jk ∼ N(0, 1/√p) and w(2)_k ∼ N(0, 1/√Q) In the R package nnet, weights are sampled uniformly in [−0.5, 0.5], or in [−1/maxi x(j)_i, 1/maxi x(j)_i] if X(j) is large. 2 When to stop the algorithm? (gradient descent or alike) Standard choices: bounded T; target value of the error ˆLn(w); target value of the evolution ˆLn(w(t)) − ˆLn(w(t + 1)) In the R package nnet, a combination of the three criteria is used and tunable. Nathalie Vialaneix & Hyphen | Introduction to neural networks 31/41
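The Gaussian initialization above can be sketched in R as follows; reading 1/√p and 1/√Q as standard deviations is an assumption of the sketch, and the dimensions are arbitrary.

    p <- 1; Q <- 5
    W1 <- matrix(rnorm(Q * p, sd = 1 / sqrt(p)), nrow = Q)   # w^(1)_jk ~ N(0, 1/sqrt(p))
    b1 <- rnorm(Q, sd = 1 / sqrt(p))                         # first-layer biases
    w2 <- rnorm(Q, sd = 1 / sqrt(Q))                         # w^(2)_k ~ N(0, 1/sqrt(Q))
    b2 <- rnorm(1, sd = 1 / sqrt(Q))                         # output bias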
  • 93. Strategies to avoid overfitting Properly tune Q with a CV or a bootstrap estimation of the generalization ability of the method Nathalie Vialaneix & Hyphen | Introduction to neural networks 32/41
  • 94. Strategies to avoid overfitting Properly tune Q with a CV or a bootstrap estimation of the generalization ability of the method Early stopping: for Q large enough, use a part of the data as a validation set and stop the training (gradient descent) when the empirical error computed on this dataset starts to increase Nathalie Vialaneix & Hyphen | Introduction to neural networks 32/41
  • 95. Strategies to avoid overfitting Properly tune Q with a CV or a bootstrap estimation of the generalization ability of the method Early stopping: for Q large enough, use a part of the data as a validation set and stop the training (gradient descent) when the empirical error computed on this dataset starts to increase Weight decay: for Q large enough, penalize the empirical risk with a function of the weights, e.g., ˆLn(w) + λ wᵀw Nathalie Vialaneix & Hyphen | Introduction to neural networks 32/41
  • 96. Strategies to avoid overfitting Properly tune Q with a CV or a bootstrap estimation of the generalization ability of the method Early stopping: for Q large enough, use a part of the data as a validation set and stop the training (gradient descent) when the empirical error computed on this dataset starts to increase Weight decay: for Q large enough, penalize the empirical risk with a function of the weights, e.g., ˆLn(w) + λ wᵀw Noise injection: modify the input data with a random noise during the training Nathalie Vialaneix & Hyphen | Introduction to neural networks 32/41
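In the R package nnet, weight decay is exposed through the decay argument; below is a sketch of tuning both Q and the decay λ by cross-validation, reusing dat, mse, K and fold from the earlier CV sketch. The grids of candidate values are arbitrary.

    grid2 <- expand.grid(Q = c(2, 5, 10), lambda = c(0, 0.001, 0.01, 0.1))
    cv_err <- apply(grid2, 1, function(g) {
      mean(sapply(1:K, function(l) {
        fit <- nnet(y ~ x, data = dat[fold != l, ], size = g["Q"], decay = g["lambda"],
                    linout = TRUE, maxit = 500, trace = FALSE)
        mse(dat$y[fold == l], predict(fit, dat[fold == l, ]))
      }))
    })
    grid2[which.min(cv_err), ]   # selected hyper-parameters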
  • 97. Outline 1 Statistical / machine / automatic learning Background and notations Underfitting / Overfitting Consistency Avoiding overfitting 2 (non deep) Neural networks Presentation of multi-layer perceptrons Theoretical properties of perceptrons Learning perceptrons Learning in practice 3 Deep neural networks Overview CNN Nathalie Vialaneix & Hyphen | Introduction to neural networks 33/41
  • 98. What is “deep learning”? INPUTS = (x(1) , . . . , x(p) ) Layer 1 Layer 2 y OUTPUTS 2 hidden layer MLP Learning with neural networks except that... the number of neurons is huge!!! Image taken from https://towardsdatascience.com/ Nathalie Vialaneix & Hyphen | Introduction to neural networks 34/41
  • 99. Main types of deep NN deep MLP Nathalie Vialaneix & Hyphen | Introduction to neural networks 35/41
  • 100. Main types of deep NN deep MLP CNN (Convolutional Neural Network): used in image processing Images taken from [Krizhevsky et al., 2012, Zeiler and Fergus, 2014] Nathalie Vialaneix & Hyphen | Introduction to neural networks 35/41
  • 101. Main types of deep NN deep MLP CNN (Convolutional Neural Network): used in image processing RNN (Recurrent Neural Network): used for data with a temporal sequence (in natural language processing for instance) Nathalie Vialaneix & Hyphen | Introduction to neural networks 35/41
  • 102. Why such a big buzz around deep learning?... ImageNet results http://www.image-net.org/challenges/LSVRC/ Image by courtesy of Stéphane Canu See also: http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/ Nathalie Vialaneix & Hyphen | Introduction to neural networks 36/41
  • 103. Improvements in learning parameters optimization algorithm: gradient descent is used but the choice of the descent step (µ(t)) has been improved [Ruder, 2016] and mini-batches are also often used to speed up the learning (e.g., ADAM, [Kingma and Ba, 2017]) Nathalie Vialaneix & Hyphen | Introduction to neural networks 37/41
  • 104. Improvements in learning parameters optimization algorithm: gradient descent is used but the choice of the descent step (µ(t)) has been improved [Ruder, 2016] and mini-batches are also often used to speed up the learning (e.g., ADAM, [Kingma and Ba, 2017]) avoid overfitting with dropouts: randomly removing a fraction p of the neurons (and their connexions) during each step of the training (a larger number of iterations is needed to converge but each update is faster) [Srivastava et al., 2014] INPUTS x = (x(1) , . . . , x(p) ) Layer 1 Layer 2 y OUTPUTS Nathalie Vialaneix & Hyphen | Introduction to neural networks 37/41
  • 105. Improvements in learning parameters optimization algorithm: gradient descent is used but the choice of the descent step (µ(t)) has been improved [Ruder, 2016] and mini-batches are also often used to speed up the learning (e.g., ADAM, [Kingma and Ba, 2017]) avoid overfitting with dropouts: randomly removing a fraction p of the neurons (and their connexions) during each step of the training (a larger number of iterations is needed to converge but each update is faster) [Srivastava et al., 2014] INPUTS x = (x(1) , . . . , x(p) ) Layer 1 Layer 2 y OUTPUTS Nathalie Vialaneix & Hyphen | Introduction to neural networks 37/41
  • 106. Main available implementations
    Theano — date: 2008-2017; main dev: Montreal University (Y. Bengio); language(s): Python
    TensorFlow — date: 2015-...; main dev: Google Brain; language(s): C++ / Python
    Caffe — date: 2013-...; main dev: Berkeley University; language(s): C++ / Python / Matlab
    PyTorch (and autograd) — date: 2016-...; main dev: community driven (Facebook, Google, ...); language(s): Python (with strong GPU accelerations)
    Keras — date: 2017-...; main dev: François Chollet / RStudio; language(s): high level API on top of TensorFlow (and CNTK, Theano...): Python, R
    Nathalie Vialaneix & Hyphen | Introduction to neural networks 38/41
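With Keras from R, a deep MLP with dropout of the kind discussed above could be sketched as follows; the layer sizes, dropout rates, input dimension and optimizer settings are illustrative choices, and the exact API details may vary with the keras version installed.

    library(keras)
    model <- keras_model_sequential() %>%
      layer_dense(units = 64, activation = "relu", input_shape = 10) %>%
      layer_dropout(rate = 0.3) %>%             # randomly drop 30% of the neurons at each step
      layer_dense(units = 64, activation = "relu") %>%
      layer_dropout(rate = 0.3) %>%
      layer_dense(units = 1)                    # single output for regression
    model %>% compile(optimizer = "adam", loss = "mse")
    # model %>% fit(x_train, y_train, epochs = 50, batch_size = 32)   # x_train/y_train assumed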
  • 107. General architecture of CNN Image taken from the article https://doi.org/10.3389/fpls.2017.02235 Nathalie Vialaneix & Hyphen | Introduction to neural networks 39/41
  • 108. Convolutional layer Images taken from https://towardsdatascience.com Nathalie Vialaneix & Hyphen | Introduction to neural networks 40/41
  • 109. Convolutional layer Images taken from https://towardsdatascience.com Nathalie Vialaneix & Hyphen | Introduction to neural networks 40/41
  • 110. Convolutional layer Images taken from https://towardsdatascience.com Nathalie Vialaneix & Hyphen | Introduction to neural networks 40/41
  • 111. Convolutional layer Images taken from https://towardsdatascience.com Nathalie Vialaneix & Hyphen | Introduction to neural networks 40/41
  • 112. Convolutional layer Images taken from https://towardsdatascience.com Nathalie Vialaneix & Hyphen | Introduction to neural networks 40/41
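The sliding-window principle of a convolutional layer can be sketched in plain R for a single 2D channel and a single filter (a "valid" convolution with stride 1); the input matrix and kernel values below are arbitrary.

    conv2d <- function(img, kernel) {
      kh <- nrow(kernel); kw <- ncol(kernel)
      out <- matrix(0, nrow(img) - kh + 1, ncol(img) - kw + 1)
      for (i in seq_len(nrow(out)))
        for (j in seq_len(ncol(out)))
          out[i, j] <- sum(img[i:(i + kh - 1), j:(j + kw - 1)] * kernel)  # dot product of the window and the filter
      out
    }
    img <- matrix(runif(36), 6, 6)                                  # toy 6x6 "image"
    filt <- matrix(c(1, 0, -1, 2, 0, -2, 1, 0, -1), 3, 3)           # one 3x3 filter
    conv2d(img, filt)                                               # 4x4 feature map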
  • 113. Pooling layer Basic principle: select a subsample of the previous layer values Generally used: max pooling Image taken from Wikimedia Commons, by Aphex34 Nathalie Vialaneix & Hyphen | Introduction to neural networks 41/41
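And max pooling, which keeps the maximum of each (here non-overlapping) s × s block of the previous feature map, can be sketched as follows, reusing conv2d, img and filt from the previous sketch.

    max_pool <- function(x, s = 2) {
      ri <- seq(1, nrow(x), by = s); ci <- seq(1, ncol(x), by = s)
      out <- matrix(0, length(ri), length(ci))
      for (i in seq_along(ri))
        for (j in seq_along(ci))
          out[i, j] <- max(x[ri[i]:min(ri[i] + s - 1, nrow(x)),
                             ci[j]:min(ci[j] + s - 1, ncol(x))])   # maximum over the block
      out
    }
    max_pool(conv2d(img, filt), s = 2)   # 2x2 pooled map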
  • 114. References Barron, A. (1994). Approximation and estimation bounds for artificial neural networks. Machine Learning, 14:115–133. Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford University Press, New York, USA. Devroye, L., Györfi, L., and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York, NY, USA. Farago, A. and Lugosi, G. (1993). Strong universal consistency of neural network classifiers. IEEE Transactions on Information Theory, 39(4):1146–1151. Györfi, L., Kohler, M., Krzyżak, A., and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, New York, NY, USA. Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257. Hornik, K. (1993). Some new results on neural network approximation. Neural Networks, 6(8):1069–1072. Kingma, D. P. and Ba, J. (2017). Adam: a method for stochastic optimization. arXiv preprint arXiv: 1412.6980. Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Nathalie Vialaneix & Hyphen | Introduction to neural networks 41/41
  • 115. ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C., Bottou, L., and Weinberger, K., editors, Proceedings of Neural Information Processing Systems (NIPS 2012), volume 25. Mc Culloch, W. and Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4):115–133. McCaffrey, D. and Gallant, A. (1994). Convergence rates for single hidden layer feedforward networks. Neural Networks, 7(1):115–133. Pinkus, A. (1999). Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195. Ripley, B. (1996). Pattern Recognition and Neural Networks. Cambridge University Press. Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408. Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, DC, USA. Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv: 1609.04747. Rumelhart, D. and Mc Clelland, J. (1986). Parallel Distributed Processing: Exploration in the MicroStructure of Cognition. Nathalie Vialaneix & Hyphen | Introduction to neural networks 41/41
  • 116. MIT Press, Cambridge, MA, USA. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958. Stinchcombe, M. (1999). Neural network approximation of continuous functionals and continuous functions on compactifications. Neural Networks, 12(3):467–477. White, H. (1990). Connectionist nonparametric regression: multilayer feedforward networks can learn arbitrary mappings. Neural Networks, 3:535–549. White, H. (1991). Nonparametric estimation of conditional quantiles using neural networks. In Proceedings of the 23rd Symposium of the Interface: Computing Science and Statistics, pages 190–199, Alexandria, VA, USA. American Statistical Association. Zeiler, M. D. and Fergus, R. (2014). Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV 2014). Nathalie Vialaneix & Hyphen | Introduction to neural networks 41/41