[Figure: layer-by-layer architecture diagrams showing the growth of network depth — AlexNet (8 layers), VGG (19 layers, ILSVRC 2014), and GoogLeNet (22 layers, ILSVRC 2014).]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. "Deep Residual Learning for Image Recognition". arXiv 2015.
[Figure: ResNet architecture diagram — 7x7 conv, 64, /2, pool/2, followed by stacked bottleneck blocks (1x1, 3x3, 1x1 convolutions at widths 64/256, 128/512, 256/1024, 512/2048), ending in average pooling and fc-1000.]
Excerpt from "Learning of Occlusion-Aware Attention for Pedestrian Detection" (anonymous submission):

[…]tion, outputting the classification scores using global average pooling or global max pooling on the feature map f(·). However, global average pooling raises the response value of the entire feature map for a specific class, because it averages all pixels of a feature map. Global max pooling, on the other hand, does not raise the entire feature map for a specific class, because it uses only the maximum pixel value in a feature map. The response score for each class under global average pooling and global max pooling is computed by Eq. (1):

$$
v_i^c =
\begin{cases}
\dfrac{1}{M \times N} \sum_{m=1}^{M} \sum_{n=1}^{N} f_{m,n}^{c}(x_i) & \text{(global average pooling)} \\[4pt]
\max\limits_{m,n} f_{m,n}^{c}(x_i) & \text{(global max pooling)}
\end{cases}
\quad (1)
$$

After the score for each class is output, the attentions of the pedestrian and occlusion regions are generated. First, we fuse the multi-channel feature map into one channel. In this work, we evaluate the three fusion types shown in Fig. 1(b)–(d): 1) standard fusion, 2) softmax-weighting fusion, and 3) squeeze-and-excitation (SE) block fusion. Standard fusion is simply the summation of the feature maps. Softmax weighting weights the feature map of each channel by its softmax score, Eq. (2); this weighting can mask unnecessary channel feature maps. SE block fusion weights the feature maps of each channel with the attention of an SE block, as in Squeeze-and-Excitation Networks. After fusing to one channel, the pedestrian-classification and occlusion-state attentions are fused: in this work, we calculate the attention by subtracting the occlusion attention from the pedestrian-classification attention. We call the result the attention map, because it contains both positive and negative values.

$$
\mathrm{Attention}_i = \sum_{c=1}^{C} f^{c}(x_i) \cdot \frac{\exp(v_i^{c})}{\sum_{j=1}^{J} \exp(v_i^{j})}
\quad (2)
$$
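Both equations are straightforward to check numerically. Below is a minimal numpy sketch, assuming a feature map of shape (C, M, N) for a single proposal x_i; function and variable names are illustrative, not from the paper's code:

```python
# Sketch of Eq. (1) and Eq. (2) for one proposal's feature map f (C, M, N).
import numpy as np

def response_scores(f, mode="gap"):
    """Per-class response v_i^c via global average or global max pooling."""
    if mode == "gap":
        return f.mean(axis=(1, 2))   # (1/(M*N)) * sum over spatial positions
    return f.max(axis=(1, 2))        # maximum pixel value per channel

def attention_map(f):
    """Eq. (2): softmax-weighted sum of the C channel maps into one map."""
    v = response_scores(f, mode="gap")   # v_i^c, shape (C,)
    w = np.exp(v - v.max())              # stabilized softmax weights
    w /= w.sum()                         # the J-term denominator in Eq. (2)
    return np.tensordot(w, f, axes=1)    # sum_c w_c * f^c, shape (M, N)

f = np.random.rand(8, 7, 7)              # C=8 channels, 7x7 feature map
att = attention_map(f)
print(att.shape)                         # (7, 7)
```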
3.4 Perception branch
The perception branch outputs the final score from the attention map and the RoI-pooled feature map. The attention map can refine the RoI-pooled feature map, for example by masking unnecessary background features and enhancing important locations. The converted feature map is the inner product of the attention map and the feature map from RoI pooling. The perception branch is composed of two fully connected layers, as in Fast R-CNN; its structure is the same as conventional Fast R-CNN, however, our model e[…]
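The refinement step described above can be sketched as follows; the shapes are assumptions, and "inner product" is read here as an elementwise product broadcast over channels:

```python
# Hedged sketch of the perception-branch input: the RoI-pooled feature
# map is refined by the single-channel attention map.
import numpy as np

roi_feat = np.random.rand(256, 7, 7)   # RoI-pooled features (C, H, W), assumed shape
att = np.random.rand(7, 7)             # attention map from Eq. (2)

refined = roi_feat * att[None, :, :]   # broadcast the map over all channels
# `refined` then feeds the two fully connected layers, as in Fast R-CNN.
```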
[Figure: pages of an anonymous CVPR submission draft (Paper ID ****), showing Abstract, 1. Introduction, numbered equations (1)–(5) — including the cross-entropy term t log y + (1 − t) log(1 − y) and the GAP response of Eq. (1) above — and a Conclusion section.]
[Figure: another draft page, "How Small Network Can Detect Ped…" (anonymous CVPR submission, Paper ID ****), with the same Abstract / Introduction / equation skeleton.]
Table 1. Classification error on the ILSVRC validation set.

  Networks         top-1 val. error   top-5 val. error
  VGGnet-GAP       33.4               12.2
  GoogLeNet-GAP    35.0               13.2
  AlexNet∗-GAP     44.9               20.9
  AlexNet-GAP      51.1               26.3
  GoogLeNet        31.9               11.3
  VGGnet           31.2               11.4
  AlexNet          42.6               19.5
  NIN              41.9               19.6
  GoogLeNet-GMP    35.6               13.9

Table 2. Localization error on the ILSVRC validation set. Backprop refers to using [23] for localization instead of CAM.

  Method                  top-1 val. error   top-5 val. error
  GoogLeNet-GAP           56.40              43.00
  VGGnet-GAP              57.20              45.14
  GoogLeNet               60.09              49.34
  AlexNet∗-GAP            63.75              49.53
  AlexNet-GAP             67.19              52.16
  NIN                     65.47              54.19
  Backprop on GoogLeNet   61.31              50.55
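For context, the GAP-based entries above localize objects with class activation maps (CAM): the feature maps of the last convolutional layer are weighted by the fully connected weights of the target class. A minimal sketch with illustrative array shapes:

```python
# CAM sketch: M_cls(x, y) = sum_k w_k^cls * f_k(x, y).
import numpy as np

feat = np.random.rand(512, 14, 14)   # last conv feature maps f_k (assumed shape)
W = np.random.rand(1000, 512)        # FC weights after GAP: class x channel

def cam(feat, W, cls):
    """Class activation map for one class: weighted sum over channels."""
    return np.tensordot(W[cls], feat, axes=1)   # (14, 14) heat map

heat = cam(feat, W, cls=281)         # e.g., one ImageNet class index
```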
L_all(x) = E_att(x) + E_per(x)
where E_att(x) is the training error of the attention branch and E_per(x) is the training error of the perception branch.
g′(x) = (1 + M(x)) ⋅ g(x)
where g(x) is the feature map from the feature extractor, M(x) is the attention map, and g′(x) is the attention-weighted feature map fed to the perception branch.
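A minimal PyTorch sketch of these two pieces: the residual attention g′(x) = (1 + M(x)) ⋅ g(x) and the combined loss L_all = E_att + E_per. Names are illustrative, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def apply_attention(g, M):
    """g: features (B, C, H, W); M: attention map (B, 1, H, W).
    The identity term (the '1 +') keeps the original features intact."""
    return (1.0 + M) * g

def abn_loss(att_logits, per_logits, target):
    """Both branches are trained with the same classification target."""
    E_att = F.cross_entropy(att_logits, target)
    E_per = F.cross_entropy(per_logits, target)
    return E_att + E_per

g = torch.randn(2, 256, 14, 14)
M = torch.sigmoid(torch.randn(2, 1, 14, 14))     # attention in [0, 1]
g_prime = apply_attention(g, M)
loss = abn_loss(torch.randn(2, 10), torch.randn(2, 10), torch.tensor([3, 7]))
```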
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie¹, Ross Girshick², Piotr Dollár², Zhuowen Tu¹, Kaiming He²
¹UC San Diego  ²Facebook AI Research
{s9xie,ztu}@ucsd.edu  {rbg,pdollar,kaiminghe}@fb.com
Abstract
We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call "cardinality" (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online¹.
1. Introduction
[Figure 1. Left: A block of ResNet [14]. Right: A block of ResNeXt with cardinality = 32, with roughly the same complexity. A layer is shown as (# in channels, filter size, # out channels); the ResNeXt block aggregates 32 parallel paths of (256, 1x1, 4) → (4, 3x3, 4) → (4, 1x1, 256), summed with the 256-d identity shortcut.]
[…]ing blocks of the same shape. This strategy is inherited by ResNets [14] which stack modules of the same topology. This simple rule reduces the free choices of hyper-parameters, and depth is exposed as an essential dimension in neural networks. Moreover, we argue that the simplicity of this rule may reduce the risk of over-adapting the hyper-parameters to a specific dataset. The robustness of VGG-nets and ResNets has been proven by various visual recognition tasks [7, 10, 9, 28, 31, 14] and by non-visual tasks involving speech [42, 30] and language [4, 41, 20]. Unlike VGG-nets, the family of Inception models [38, 17, 39, 37] have demonstrated that carefully designed […]
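A hedged PyTorch sketch of the ResNeXt block in Figure 1, using the grouped-convolution form that is equivalent to 32 parallel paths (batch normalization omitted for brevity; an illustration, not the reference implementation):

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    def __init__(self, channels=256, cardinality=32, path_width=4):
        super().__init__()
        mid = cardinality * path_width       # 32 paths * width 4 = 128 total
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        self.group = nn.Conv2d(mid, mid, kernel_size=3, padding=1,
                               groups=cardinality, bias=False)  # 32 paths
        self.expand = nn.Conv2d(mid, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.reduce(x))      # 256 -> 128 (4 per path)
        out = self.relu(self.group(out))     # grouped 3x3, one group per path
        out = self.expand(out)               # 128 -> 256
        return self.relu(out + x)            # 256-d identity shortcut

block = ResNeXtBlock()
y = block(torch.randn(1, 256, 56, 56))       # -> (1, 256, 56, 56)
```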
Densely Connected Convolutional Networks
Gao Huang* (Cornell University, gh349@cornell.edu), Zhuang Liu* (Tsinghua University, liuzhuang13@mails.tsinghua.edu.cn), Laurens van der Maaten (Facebook AI Research, lvdmaaten@fb.com), Kilian Q. Weinberger (Cornell University, kqw4@cornell.edu)
*Authors contributed equally
Abstract
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
1. Introduction
Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago [18], improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently. The original LeNet5 [19] consisted of 5 layers, VGG featured 19 [29], and only last year Highway Networks [34] and Residual Networks (ResNets) [11] have surpassed the 100-layer barrier.

[Figure 1: A 5-layer dense block with a growth rate of k = 4. Each layer takes all preceding feature-maps as input.]

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and "wash out" by the time it reaches the end (or beginning) of the network. Many recent publications address this or related problems. ResNets [11] and Highway Networks [34] bypass signal from one layer to the next via identity connections. Stochastic depth [13] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow. FractalNets [17] repeatedly combine several parallel layer sequences with different number of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network. Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.
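A hedged PyTorch sketch of the dense block in Figure 1: each layer takes the concatenation of all preceding feature-maps and contributes k = 4 new ones (the growth rate). This simplifies the paper's BN-ReLU-Conv composite function and omits the bottleneck variants:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate=4, num_layers=5):
        super().__init__()
        self.layers = nn.ModuleList()
        for l in range(num_layers):
            # Layer l sees the input plus the l earlier layers' outputs.
            c_in = in_channels + l * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(c_in),
                nn.ReLU(inplace=True),
                nn.Conv2d(c_in, growth_rate, kernel_size=3,
                          padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # all preceding maps
            features.append(out)
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=16)
y = block(torch.randn(1, 16, 32, 32))   # -> (1, 16 + 5*4, 32, 32)
```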
[Figure: gating diagram with tanh, ×, and Σ nodes over f(s_t), g(s_t), and g′(s_t).]
[MIRU2018] Attention Branch Network Using the Characteristics of Global Average Pooling