The Reversible Residual Network
1. The Reversible Residual Network:
Backpropagation Without Storing Activations
Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse
presentation by Jiaqi Yang
LAMDA Group
2. Idea
Deep residual networks (ResNets) are the state-of-the-art
architecture across multiple computer vision tasks. The key
architectural innovation behind ResNets was the residual block.
Memory consumption is a bottleneck of deep neural networks, as
one needs to store the activations in order to calculate gradients
using backpropagation.
If we can reconstruct the activations from the layer outputs, then
backpropagation can be as memory-efficient as the forward pass.
4. Related Work
Trade memory for computation.
Checkpointing: divide the network into O(√n) blocks, reducing memory to O(√n).
Exploit the idea of checkpointing recursively:
g(n) = k + g(n/(k + 1)) =⇒ g(n) = k · log_{k+1}(n).
k = 1 =⇒ g(n) = log_2(n).
Computational complexity: O(n log n).
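A minimal sketch of the recursion above, assuming g(n) counts the activations
held in memory for an n-layer network and that k checkpoints split the network
into k + 1 equal sub-blocks (the function names are ours, for illustration):

    import math

    def g(n, k):
        # Memory for backprop through n layers: keep k checkpoints,
        # then recurse into one sub-block of n/(k + 1) layers at a time.
        if n <= 1:
            return 0
        return k + g(n / (k + 1), k)

    print(g(2 ** 10, 1), math.log2(2 ** 10))      # 10 10.0  (k = 1)
    print(g(3 ** 6, 2), 2 * math.log(3 ** 6, 3))  # 12 12.0  (k = 2)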
5. ResNet
One of the main difficulties in training very deep networks is the
problem of exploding and vanishing gradients.
Residual block:
y = x + f(x)
The basic and bottleneck residual blocks:
a(x) = ReLU(BN(x))
c_k(x) = Conv_{k×k}(a(x))
Basic(x) = c_3(c_3(x))
Bottleneck(x) = c_1(c_3(c_1(x)))
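A hedged PyTorch sketch of these pre-activation blocks, assuming stride 1 and
equal input/output channels; the class and function names are ours, not the
paper's code:

    import torch
    import torch.nn as nn

    def conv_unit(in_ch, out_ch, k):
        # c_k(x) = Conv_{k×k}(a(x)), with a(x) = ReLU(BN(x))
        return nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2),
        )

    class BasicBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.f = nn.Sequential(conv_unit(ch, ch, 3), conv_unit(ch, ch, 3))

        def forward(self, x):
            return x + self.f(x)  # y = x + f(x)

    class BottleneckBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            mid = ch // 4  # our choice of bottleneck width
            self.f = nn.Sequential(
                conv_unit(ch, mid, 1),   # c_1
                conv_unit(mid, mid, 3),  # c_3
                conv_unit(mid, ch, 1),   # c_1
            )

        def forward(self, x):
            return x + self.f(x)

    x = torch.randn(2, 64, 8, 8)
    print(BasicBlock(64)(x).shape)       # torch.Size([2, 64, 8, 8])
    print(BottleneckBlock(64)(x).shape)  # torch.Size([2, 64, 8, 8])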
6. Reversible Residual Blocks
Partition the units in each layer into two groups, denoted x1 and x2
(in practice, partition the channels).
Each reversible block takes inputs (x1, x2) and produces outputs
(y1, y2).
y1 = x1 + f(x2)
y2 = x2 + g(y1)
Each layer’s activations can be reconstructed from the next layer’s
activations:
x2 = y2 − g(y1)
x1 = y1 − f(x2)
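A minimal NumPy check of this coupling; f and g here are arbitrary fixed maps
(our choice), since reversibility does not depend on their form:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 4
    Wf, Wg = rng.standard_normal((d, d)), rng.standard_normal((d, d))
    f = lambda x: np.tanh(x @ Wf)
    g = lambda x: np.tanh(x @ Wg)

    x1, x2 = rng.standard_normal(d), rng.standard_normal(d)

    # forward pass of one reversible block
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)

    # inverse: reconstruct the inputs from the outputs alone
    x2_rec = y2 - g(y1)
    x1_rec = y1 - f(x2_rec)

    # True True (exact up to floating-point round-off)
    print(np.allclose(x1, x1_rec), np.allclose(x2, x2_rec))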
8. Extend to RNN
Reversible Recurrent Neural Networks (NIPS 2018).
Trouble: the forget gate.
h_t = z_t ⊙ h_{t−1} + (1 − z_t) ⊙ g_t
The forget gate makes it hard to use the same idea directly.
Drop the forget gate?
h_t = h_{t−1} + (1 − z_t) ⊙ g_t
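One way to see why dropping the forget gate helps: with a RevNet-style split
of the hidden state into halves (h1, h2), where each half's gates depend only
on the other half and the input, the additive update becomes invertible. This
is a hedged illustration of the coupling idea, not the paper's exact RevGRU;
all names are ours:

    import numpy as np

    rng = np.random.default_rng(1)
    d = 4  # size of each half of the hidden state
    Wz1, Wg1, Wz2, Wg2 = (rng.standard_normal((2 * d, d)) for _ in range(4))
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

    def gates(h_other, x, Wz, Wg):
        inp = np.concatenate([h_other, x])
        return sigmoid(inp @ Wz), np.tanh(inp @ Wg)

    def step(h1, h2, x):
        z1, g1 = gates(h2, x, Wz1, Wg1)
        h1 = h1 + (1 - z1) * g1      # h1_t = h1_{t-1} + (1 - z1) ⊙ g1
        z2, g2 = gates(h1, x, Wz2, Wg2)
        h2 = h2 + (1 - z2) * g2
        return h1, h2

    def inverse_step(h1, h2, x):
        z2, g2 = gates(h1, x, Wz2, Wg2)  # recomputable from the new h1
        h2 = h2 - (1 - z2) * g2
        z1, g1 = gates(h2, x, Wz1, Wg1)  # recomputable from the old h2
        h1 = h1 - (1 - z1) * g1
        return h1, h2

    h1, h2, x = (rng.standard_normal(d) for _ in range(3))
    n1, n2 = step(h1, h2, x)
    r1, r2 = inverse_step(n1, n2, x)
    print(np.allclose(h1, r1), np.allclose(h2, r2))  # True True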
9. Extend to RNN
Simply dropping the forget gate harms performance (they call this the
impossibility of no forgetting), demonstrated with a repeat task.
Deal with fixed-point arithmetic explicitly (still need to tolerate some
information loss) =⇒ Gradient-based Hyperparameter Optimization through
Reversible Learning (ICML 2015).
Attention mechanism: store only a fraction of the hidden state to attend over.
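A toy sketch of the fixed-point idea, assuming the forget gate is a rational
z = p/q and the hidden value is stored as a nonnegative integer; saving the
division remainder in a side buffer makes the multiplicative step exactly
invertible. This simplification and all names are ours:

    def forget_forward(h, p, q, buffer):
        # h_t = (p/q) * h_{t-1}, computed in integer (fixed-point) arithmetic
        buffer.append((h * p) % q)  # the bits that would otherwise be lost
        return (h * p) // q

    def forget_inverse(h, p, q, buffer):
        r = buffer.pop()
        return (h * q + r) // p     # exact: h*q + r == h_prev * p

    buf = []
    h0 = 123457
    h1 = forget_forward(h0, 3, 4, buf)          # multiply by z = 3/4
    print(forget_inverse(h1, 3, 4, buf) == h0)  # True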