2. Y. LeCun
Deep Learning: Assembling differentiable modules
Deep Learning Definition:
Assembling networks of differentiable modules and training them with a gradient-based optimization method.
Any graph of modules is fine
As long as there exists a partial order on the
modules, and we can backpropagate gradients
with respect to the relevant variables
If the graph has loops…
we need to “unroll” them.
Recurrent networks and backprop through time
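Unrolling a loop into a feed-forward graph can be sketched as follows (a minimal, assumed example, not code from the slides; sizes and sequence length are arbitrary). PyTorch's autograd records the unrolled graph as the loop runs, so a single `backward()` call performs backpropagation through time.

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, 4, requires_grad=True)   # recurrent weight (hypothetical sizes)
u = torch.randn(4, 3, requires_grad=True)   # input-to-hidden weight
h = torch.zeros(4)                          # initial hidden state
xs = [torch.randn(3) for _ in range(5)]     # a length-5 input sequence

for x in xs:                                # the "loop" being unrolled
    h = torch.tanh(w @ h + u @ x)

loss = h.sum()
loss.backward()                             # gradients flow through all 5 steps
print(w.grad.shape)                         # torch.Size([4, 4])
```

Each iteration reuses the same `w` and `u`, so the five time steps are another instance of weight sharing: their gradient contributions are summed.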
3. Y. LeCun
Simple Neural Net
Object-oriented version
Uses the predefined nn.Linear class
(which includes a bias vector)
Uses torch.relu function
State variables are temporary
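A minimal sketch of the object-oriented net described above (layer sizes are my assumption, not from the slides). `nn.Linear` supplies the weight matrix and bias vector; the hidden activation is a temporary that exists only during the forward call.

```python
import torch
from torch import nn

class SimpleNet(nn.Module):
    def __init__(self, n_in=4, n_hidden=8, n_out=2):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)   # weight matrix + bias vector
        self.fc2 = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        # The state variable h is temporary: it lives only for this call.
        h = torch.relu(self.fc1(x))
        return self.fc2(h)

net = SimpleNet()
y = net(torch.randn(4))
print(y.shape)   # torch.Size([2])
```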
58. Y. LeCun
Simple parameter transform: weight sharing
Function H(u) replicates one component of u into multiple
components of w
H is like a “Y” branch.
Gradients are summed during backprop:
the gradients w.r.t. shared parameters are added.
[Diagram: parameter u feeds H(u), which replicates it into the shared weights w; input x and w feed G(x,w), producing output y, which is compared to the target ȳ by the cost C(y, ȳ).]
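A toy check (an assumed example, not from the slides) that backprop adds gradients over shared parameters. H(u) is the "Y" branch: the same scalar u is used in two places, and its gradient is the sum of both branch gradients.

```python
import torch

u = torch.tensor(2.0, requires_grad=True)

w1 = u                        # first copy produced by H(u)
w2 = u                        # second copy produced by H(u)
loss = 3.0 * w1 + 5.0 * w2    # each branch contributes its own gradient

loss.backward()
print(u.grad)                 # tensor(8.) — the branch gradients 3 and 5 are added
```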
59. Y. LeCun
Shared Weights for Motif Detection
Detecting motifs anywhere on an input
[Diagram: one shared weight vector w feeds replicated copies of G(x,w) applied at every input position; a MAX over their outputs detects the motif anywhere in the input.]
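The shared-weight motif detector can be sketched as below (window size and dimensions are assumptions). The same module G(·, w) scores every window of the input, and the max over positions makes detection translation-invariant.

```python
import torch
from torch import nn

torch.manual_seed(0)
g = nn.Linear(3, 1)              # one shared detector G(x, w), window of 3
x = torch.randn(10)              # 1-D input signal

windows = x.unfold(0, 3, 1)      # all length-3 windows, shape (8, 3)
scores = g(windows).squeeze(-1)  # same weights w applied at each position
detection = scores.max()         # MAX: fires if the motif appears anywhere
```

This is exactly the structure of a convolutional layer followed by max pooling: replication of G(x,w) is the convolution, and MAX is the pooling.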