Pyramid Scene Parsing Network
Hengshuang Zhao¹, Jianping Shi², Xiaojuan Qi¹, Xiaogang Wang¹, Jiaya Jia¹
¹ The Chinese University of Hong Kong, ² SenseTime Group Limited
Presentation: Shunta Saito
Slide: Powered by Deckset
(c) Preferred Networks 1
Summary
• Introduces the Pyramid Pooling Module, which captures context better through sub-region awareness
Why did I choose this paper?
• Presented at CVPR 2017
• 1st place in the ImageNet Scene Parsing Challenge 2016 (ADE20K)
• Was 1st place on the Cityscapes leaderboard
• Now it's in 2nd place (I noticed this last week!)
Agenda
1. Common building blocks in semantic segmentation
2. Major Issue
3. Prior Work
4. Pyramid Pooling Module
5. Experiment results
Semantic Segmentation
• Predict pixel-wise labels from natural images
• Each pixel in an image belongs to an object class
• So it's not instance-aware!
Common Building Blocks (1)
Fully convolutional network (FCN) [1]
• A deep convolutional neural network that doesn't include any fully-connected layers
• Almost all recent methods are based on FCN
• Typically pre-trained on ImageNet in a classification setting
[1] "Fully Convolutional Networks for Semantic Segmentation", PAMI 2016
Common Building Blocks (2)
Dilated convolution [2]
• Widen the receptive field without reducing feature map resolution
• Important for leveraging the global context prior efficiently
[2] "Multi-Scale Context Aggregation by Dilated Convolutions", ICLR 2016
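How dilation widens the receptive field without changing resolution can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions; the function names and the example layer stacks are illustrative, not taken from the slides.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer grows the field by (effective kernel size - 1).
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


# Three plain 3x3 convs versus three 3x3 convs with dilation 1, 2, 4.
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
print(plain, dilated)
```

Both stacks have the same number of weights and keep the feature map size (given suitable padding), yet the dilated stack sees a much wider window.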
Common Building Blocks (3)
Multi-scale feature ensemble
• Higher-layer features contain more semantic meaning and less location information
• Combining multi-scale features can improve performance [3]
[3] "Hypercolumns for Object Segmentation and Fine-grained Localization", CVPR 2015
Common Building Blocks (4)
Conditional random field (CRF)
• Post-processing to refine the segmentation result (DeepLab [4])
• Some later methods refined the network via end-to-end modeling (DPN [5], CRF as RNN [6], Detections and Superpixels [7])
[4] "Semantic image segmentation with deep convolutional nets and fully connected crfs", ICLR 2015
[5] "Semantic image segmentation via deep parsing network", ICCV 2015
[6] "Conditional random fields as recurrent neural networks", ICCV 2015
[7] "Higher order conditional random fields in deep neural networks", ECCV 2016
Common Building Blocks (5)
Global average pooling (GAP)
• ParseNet [8] showed that global average pooling with FCN can improve semantic segmentation results
• But the global descriptors used in the paper are not representative enough for some challenging datasets like ADE20K
[8] "Parsenet: Looking wider to see better", ICLR 2016
Major Issue (1)
Mismatched relationship
• Co-occurrent visual patterns imply some contexts
• e.g., an airplane is likely to fly in the sky, not over a road
• Lacking the ability to collect contextual information increases the chance of misclassification
• In the right figure, FCN predicts the boat in the yellow box as a "car" based on its appearance
Major Issue (2)
Confusing Classes
• There are confusing classes in major datasets: field and earth; mountain and hill; wall, house, building and skyscraper, etc.
• Even an expert human annotator still makes 17.6% pixel error on ADE20K [9]
• FCN predicts the object in the box as part skyscraper and part building, but the whole object should be either skyscraper or building, not both
• Utilizing the relationship between classes is important
[9] "Semantic understanding of scenes through the ADE20K dataset", CVPR 2017
Major Issue (3)
Inconspicuous Classes
• Small objects like streetlights and signboards are inconspicuous and hard to find, though they may be important
• Big objects may get discontinuous predictions; FCN couldn't correctly label the pillow, which has a similar appearance to the sheet
• To improve performance for small or very big objects, more attention should be paid to sub-regions
Summary of Issues
• Use co-occurrent visual patterns as context
• Consider relationship between classes
• Pay more attention to sub-regions
Prior Work
Global Average Pooling (GAP) [10]
• The receptive field of ResNet is already larger than the input image, so GAP sounds like a good way to summarize all the information
• But pixels in an image may belong to various objects of different sizes, so directly fusing them into a single vector may lose the spatial relation and cause ambiguity
[10] "Parsenet: Looking wider to see better", ICLR 2016
Prior Work
Spatial Pyramid Pooling (SPP) [11]
• Apply pooling with different kernel/stride sizes to the feature maps
• Then flatten and concatenate the pooling results to make a fixed-length representation
• There is still context information loss
[11] "Spatial pyramid pooling in deep convolutional networks for visual recognition", ECCV 2014
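Why SPP yields a fixed-length vector regardless of input size can be sketched as follows. The pyramid levels (1, 2, 4), the 256-channel feature map, and the function names are assumptions for illustration. The ceil/floor rule for window and stride is the one described in the SPP paper.

```python
def spp_output_length(channels: int, levels=(1, 2, 4)) -> int:
    """Length of the flattened SPP vector: one value per channel per bin."""
    return channels * sum(n * n for n in levels)


def pooling_window(size: int, bins: int):
    """Window and stride that tile a side of length `size` into `bins` cells."""
    window = -(-size // bins)  # ceil(size / bins)
    stride = size // bins      # floor(size / bins)
    return window, stride


# The vector length depends only on channels and levels, not on the map size.
print(spp_output_length(256))

# The window and stride adapt to the feature map size instead.
for side in (13, 10):
    print(side, [pooling_window(side, n) for n in (1, 2, 4)])
```

Flattening is what discards the spatial layout. The Pyramid Pooling Module described next keeps each pooled map as a map instead.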
Pyramid Pooling Module
• A hierarchical global prior, containing information at different scales and varying among different sub-regions
• The Pyramid Pooling Module builds this global scene prior on top of the final-layer feature map
Pyramid Pooling Module
• Use a 1x1 conv on each pooled map to reduce the number of channels
• Then upsample (bilinear) them to the same size and concatenate them all
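The data flow of the module (pool into sub-regions, upsample, concatenate) can be sketched in plain Python on a single-channel map. This is a simplified illustration, not the authors' implementation: the learned 1x1 conv is omitted, the bilinear interpolation uses a corner-aligned convention chosen for brevity, and the 1/N channel reduction in `pyramid_channels` follows my reading of the paper rather than anything stated on the slide.

```python
def ceil_div(a, b):
    return -(-a // b)


def adaptive_avg_pool(fmap, bins):
    """Average a 2D map (list of rows) over a bins x bins grid of sub-regions."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, ceil_div((i + 1) * h, bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, ceil_div((j + 1) * w, bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def bilinear_upsample(grid, out_h, out_w):
    """Corner-aligned bilinear resize of a 2D grid to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def pyramid_pooling(fmap, levels=(1, 2, 3, 6)):
    """Return the input map plus one upsampled pooled map per pyramid level."""
    h, w = len(fmap), len(fmap[0])
    branches = [fmap]
    for bins in levels:
        pooled = adaptive_avg_pool(fmap, bins)
        # A learned 1x1 conv would reduce channels here (omitted in this sketch).
        branches.append(bilinear_upsample(pooled, h, w))
    return branches  # stacking these along the channel axis is the concatenation


def pyramid_channels(in_channels, levels=(1, 2, 3, 6)):
    """Channels after concatenation, assuming each branch is reduced to 1/N."""
    return in_channels + len(levels) * (in_channels // len(levels))


feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]
branches = pyramid_pooling(feature_map)
print(len(branches), pyramid_channels(2048))
```

The 1x1 level is plain global average pooling, while the finer levels keep a coarse record of where in the image each piece of context came from.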
Implementation details (1)
• Average pooling is applied at four pyramid levels, with output bin sizes of 1x1, 2x2, 3x3, and 6x6 (ksize and stride are set accordingly)
• A pre-trained ResNet model with dilated convolution is used as the feature extractor (the output size is 1/8 of the input image)
• They use two losses:
1. softmax loss between the final layer and the labels
2. softmax loss between an intermediate output of ResNet and the labels [12] (weighted by 0.4)
[12] "Relay backpropagation for effective learning of deep convolutional neural networks", ECCV 2016
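The two-loss setup can be sketched for a single pixel as below. The function names are illustrative. In real training the loss is averaged over all labeled pixels, and the auxiliary branch is only used during training.

```python
import math


def softmax_cross_entropy(logits, label):
    """Softmax loss for one pixel, using a numerically stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[label]


def total_loss(main_logits, aux_logits, label, aux_weight=0.4):
    """Main loss plus the auxiliary loss scaled by the weight from the slide."""
    main = softmax_cross_entropy(main_logits, label)
    aux = softmax_cross_entropy(aux_logits, label)
    return main + aux_weight * aux


print(total_loss([2.0, 0.5, -1.0], [1.0, 0.8, 0.1], label=0))
```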
Implementation details (2)
Optimization
• MomentumSGD with weight decay
• Momentum: 0.9
• Weight decay: 0.0001
LR Scheduling
• [formula not preserved in this transcript]
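The learning-rate formula on this slide cannot be recovered from the text. The PSPNet paper describes a "poly" policy, so the sketch below assumes that form, with the base learning rate 0.01 and power 0.9 reported there. The momentum SGD step is one common formulation with weight decay added to the gradient; frameworks differ in the details, so treat it as an illustration only.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """'Poly' schedule: decays from base_lr at iteration 0 to 0 at max_iter."""
    return base_lr * (1 - iteration / max_iter) ** power


def momentum_sgd_step(weight, grad, velocity, lr, momentum=0.9, weight_decay=1e-4):
    """One scalar momentum SGD update with L2 weight decay."""
    velocity = momentum * velocity - lr * (grad + weight_decay * weight)
    return weight + velocity, velocity


# Learning rate at a few points of a 150K-iteration run (ADE20K setting).
for it in (0, 75_000, 149_999):
    print(it, poly_lr(0.01, it, 150_000))
```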
Implementation details (3)
Training iterations
• ADE20K: 150K
• PASCAL VOC: 30K
• Cityscapes: 90K
Dataset augmentation
• Random mirror
• Random resize between 0.5 and 2
• Random rotation between -10 and 10 degrees
• Random Gaussian blur for ADE20K and PASCAL VOC
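Drawing the augmentation parameters per training sample might look like the sketch below. The ranges come from the slide. The uniform sampling, the 50% probabilities for mirror and blur, and the function and key names are assumptions made for illustration.

```python
import random


def sample_augmentation(dataset, rng):
    """Draw one set of augmentation parameters for a training image."""
    params = {
        "mirror": rng.random() < 0.5,              # assumed 50% chance
        "scale": rng.uniform(0.5, 2.0),            # random resize factor
        "rotation_deg": rng.uniform(-10.0, 10.0),  # random rotation angle
        "gaussian_blur": False,
    }
    # Blur is only listed for ADE20K and PASCAL VOC, not Cityscapes.
    if dataset in ("ADE20K", "PASCAL VOC"):
        params["gaussian_blur"] = rng.random() < 0.5  # assumed 50% chance
    return params


rng = random.Random(0)
print(sample_augmentation("ADE20K", rng))
print(sample_augmentation("Cityscapes", rng))
```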
Implementation details (4)
• An appropriately large "cropsize" can yield good performance
• "batchsize" in the batch normalization layer is of great importance
Cropsize
• ADE20K: 473 x 473
• PASCAL VOC: 473 x 473
• Cityscapes: 713 x 713
Batchsize
• 16 for all datasets
Implementation details (5)
MultiNode Batch Normalization
• To increase the "batchsize" in batch normalization layers, they used a custom BN layer applied to data gathered from multiple GPUs using OpenMPI
• We have Akiba-san's implementation of multi-node batch normalization!
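The idea behind batch normalization across GPUs can be sketched without any communication library: each worker reports a count, a sum, and a sum of squares for its local batch, and the reduced totals give the statistics of the combined batch. The sketch below simulates the workers in one process with scalar activations. It is not the authors' OpenMPI layer or the ChainerMN implementation, and all names are illustrative.

```python
def local_stats(batch):
    """What one worker would contribute: count, sum, and sum of squares."""
    return len(batch), sum(batch), sum(x * x for x in batch)


def global_mean_var(stats):
    """Combine per-worker statistics (the role an all-reduce would play)."""
    count = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / count
    var = total_sq / count - mean * mean  # biased variance, as used in BN
    return mean, var


def normalize(batch, mean, var, eps=1e-5):
    """Normalize a local batch with the shared statistics."""
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]


workers = [[1.0, 2.0], [3.0, 4.0]]  # two workers, two samples each
mean, var = global_mean_var([local_stats(b) for b in workers])
print(mean, var)
print([normalize(b, mean, var) for b in workers])
```

With per-worker statistics, each worker here would normalize against only two samples. Sharing the sums gives every worker the mean and variance of the whole batch.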
ImageNet Scene Parsing Challenge 2016
• Dataset: ADE20K
• 150 classes and 1,038 image-level labels
• 20,000/2,000/3,000 pixel-level labels for train/val/test
Ablation Study for Pyramid Pooling Module
• Average pooling works better than max pooling in all settings
• Pooling with pyramid parsing outperforms global pooling
• With dimension reduction (DR; reducing the number of channels after pyramid pooling), the performance is further enhanced
Ablation Study for Auxiliary Loss
• Set the auxiliary loss weight between 0 and 1 and compared the final results
• A weight of 0.4 yields the best performance
Ablation Study for the depth of ResNet
Deeper is better
More Detailed Performance Analysis
Additional processing               Improvement (% in mIoU)
Data augmentation (DA)              +1.54
Auxiliary loss (AL)                 +1.41
Pyramid pooling module (PSP)        +4.45
Use deeper ResNet (50 to 269)       +2.13
Multi-scale testing (MS)            +1.13
• For multi-scale testing, they create predictions at 6 different scales (0.5, 0.75, 1, 1.25, 1.5, and 1.75) and average them.
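The averaging step of multi-scale testing can be sketched as below. It assumes each per-scale prediction has already been resized back to the original resolution and flattened to a list of per-pixel class probabilities. The function names and the toy numbers are illustrative.

```python
SCALES = (0.5, 0.75, 1.0, 1.25, 1.5, 1.75)  # test-time scales from the slide


def average_predictions(per_scale_probs):
    """Average [pixel][class] probability lists over all scales."""
    n = len(per_scale_probs)
    num_pixels = len(per_scale_probs[0])
    num_classes = len(per_scale_probs[0][0])
    return [
        [sum(p[i][c] for p in per_scale_probs) / n for c in range(num_classes)]
        for i in range(num_pixels)
    ]


def argmax_labels(probs):
    """Pick the most probable class for each pixel."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Toy example: 2 pixels, 2 classes, predictions from two of the scales.
pred_small = [[0.6, 0.4], [0.1, 0.9]]
pred_large = [[0.2, 0.8], [0.3, 0.7]]
print(argmax_labels(average_predictions([pred_small, pred_large])))
```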
Results on PASCAL VOC 2012
• Extended with the Semantic Boundaries Dataset (SBD) [13], they used 10582, 1449, and 1456 images for train/val/test
• Mismatched relationship: For "aeroplane" and "sky" in the second and third rows, PSPNet finds missing parts.
• Confusing classes: For "cows" in row one, the baseline model treats it as "horse" and "dog" while PSPNet corrects these errors
• Inconspicuous objects: For "person", "bottle" and "plant" in the following rows, PSPNet performs well on these small-size-object classes compared to the baseline model
[13] "Semantic Contours from Inverse Detectors", ICCV 2011, http://home.bharathh.info/pubs/codes/SBD/download.html
Results on PASCAL VOC 2012
• Compares PSPNet with previous best-performing methods on the test set under two settings, i.e., with or without pre-training on the MS-COCO dataset
Results on Cityscapes
• The Cityscapes dataset consists of 2975, 500, and 1525 train/val/test images (19 classes)
• 20000 coarsely annotated images are also available (in the table below, ‡ means they are used)
Thank you for your attention
• The official repository doesn't include any training code
• My own implementation for both training and testing is ready:
• mitmul/chainer-pspnet: https://github.com/mitmul/chainer-pspnet
• Now I'm training a model to confirm reproducibility
• Once the reproduction work is finished, I'll send the code to ChainerCV
• In semantic segmentation tasks,
• the input image is large (713 for PSPNet on Cityscapes)
• an appropriate batchsize, e.g., 16 or so, is important for batch normalization
• As the authors said, distributed batch normalization seems to be important in multi-GPU training
• So ChainerMN is now a necessary tool for such large-scale datasets and deep models
• It means that we need more GPU machines connected with InfiniBand

Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Último (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

[unofficial] Pyramid Scene Parsing Network (CVPR 2017)

  • 1. Pyramid Scene Parsing Network • Hengshuang Zhao¹, Jianping Shi², Xiaojuan Qi¹, Xiaogang Wang¹, Jiaya Jia¹ • ¹The Chinese University of Hong Kong, ²SenseTime Group Limited • Presentation: Shunta Saito • Slide: Powered by Deckset
  • 2. Summary • Introduce Pyramid Pooling Module for better context grasp with sub-region awareness
  • 3. Why did I choose this paper? • Presented at CVPR 2017 • 1st place in ImageNet Scene Parsing Challenge 2016 (ADE20K) • was 1st place in Cityscapes leaderboard • now it's in 2nd place (I noticed this last week!)
  • 4. Agenda 1. Common building blocks in semantic segmentation 2. Major Issue 3. Prior Work 4. Pyramid Pooling Module 5. Experiment results
  • 5. Semantic Segmentation • Predict pixel-wise labels from natural images • Each pixel in an image belongs to an object class • So it's not instance-aware!
  • 6. Common Building Blocks (1) Fully convolutional network (FCN)¹ • A deep convolutional neural network which doesn't include any fully-connected layers • Almost all recent methods are based on FCN • Typically pre-trained with ImageNet under classification problem setting • ¹ "Fully Convolutional Networks for Semantic Segmentation", PAMI 2016
  • 7. Common Building Blocks (2) Dilated convolution² • Widen receptive field without reducing feature map resolution • Important for leveraging global context prior efficiently • ² "Multi-Scale Context Aggregation by Dilated Convolutions", ICLR 2016
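To make the "wider receptive field at the same resolution" point concrete, here is a minimal sketch that computes the receptive field of a stack of stride-1 convolutions. The function name and the example layer stacks are illustrative assumptions, not taken from the slides or the paper.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs. With stride 1 and
    suitable padding the feature map resolution is unchanged.
    """
    rf = 1
    for kernel_size, dilation in layers:
        # A dilated kernel spans dilation * (kernel_size - 1) + 1 positions,
        # so each layer adds dilation * (kernel_size - 1) to the field.
        rf += dilation * (kernel_size - 1)
    return rf


# Three ordinary 3x3 convolutions vs. the same stack with dilations 1, 2, 4.
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
print(plain, dilated)  # 7 15
```

Both stacks have the same number of weights and keep the resolution, but the dilated one sees roughly twice as far in each direction.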
  • 8. Common Building Blocks (3) Multi-scale feature ensemble • Higher-layer features contain more semantic meaning and less location information • Combining multi-scale features can improve the performance³ • ³ "Hypercolumns for Object Segmentation and Fine-grained Localization", CVPR 2015
  • 9. Common Building Blocks (4) Conditional random field (CRF) • Post-processing to refine the segmentation result (DeepLab⁴) • Some follow-up methods refine the network via end-to-end modeling (DPN⁵, CRF as RNN⁶, Detections and Superpixels⁷) • ⁴ "Semantic image segmentation with deep convolutional nets and fully connected crfs", ICLR 2015 • ⁵ "Semantic image segmentation via deep parsing network", ICCV 2015 • ⁶ "Conditional random fields as recurrent neural networks", ICCV 2015 • ⁷ "Higher order conditional random fields in deep neural networks", ECCV 2016
  • 10. Common Building Blocks (5) Global average pooling (GAP) • ParseNet⁸ showed that global average pooling with FCN can improve semantic segmentation results • But the global descriptors used in the paper are not representative enough for some challenging datasets like ADE20K • ⁸ "Parsenet: Looking wider to see better", ICLR 2016
  • 11. Major Issue (1) Mismatched relationship • Co-occurrent visual patterns imply some contexts • e.g., an airplane is likely to fly in the sky, not over a road • Lacking the ability to collect contextual information increases the chance of misclassification • In the right figure, FCN predicts the boat in the yellow box as a "car" based on its appearance
  • 12. Major Issue (2) Confusing Classes • There are confusing classes in major datasets: field and earth; mountain and hill; wall, house, building and skyscraper, etc. • Even an expert human annotator still makes 17.6% pixel error on ADE20K⁹ • FCN predicts the object in the box as part skyscraper and part building, but the whole object should be either skyscraper or building, not both • Utilizing the relationship between classes is important • ⁹ "Semantic understanding of scenes through the ADE20K dataset", CVPR 2017
  • 13. Major Issue (3) Inconspicuous Classes • Small objects like streetlights and signboards are inconspicuous and hard to find, although they may be important • Predictions for big objects may come out discontinuous; e.g., FCN couldn't correctly label the pillow, which looks similar to the sheet • To improve performance for small or very big objects, more attention should be paid to sub-regions
  • 14. Summary of Issues • Use co-occurrent visual patterns as context • Consider relationship between classes • Sub-regions should be paid more attention
  • 15. Prior Work Global Average Pooling (GAP)¹⁰ • The receptive field of ResNet is already larger than the input image, so GAP sounds good for summarizing all the information • But pixels in an image may belong to various objects of different sizes, so directly fusing them into a single vector may lose the spatial relation and cause ambiguity • ¹⁰ "Parsenet: Looking wider to see better", ICLR 2016
  • 16. Prior Work Spatial Pyramid Pooling (SPP)¹¹ • Apply pooling with different kernel/stride sizes to the feature maps • Then flatten and concatenate the pooling results to make a fixed-length representation • There is still context information loss • ¹¹ "Spatial pyramid pooling in deep convolutional networks for visual recognition", ECCV 2014
  • 17. Pyramid Pooling Module • A hierarchical global prior, containing information at different scales and varying among different sub-regions • The Pyramid Pooling Module for the global scene prior is constructed on top of the final-layer feature map
  • 18. Pyramid Pooling Module • Use 1x1 conv to reduce the number of channels • Then upsample (bilinear) them to the same size and concatenate all
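The pooling, upsampling, and concatenation steps can be sketched in plain Python for a single-channel feature map. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, the 1x1 convolution is omitted because there is only one channel, and nearest-neighbor upsampling is swapped in for the bilinear upsampling named on the slide to keep the code short.

```python
def _bounds(index, size, bins):
    # Start (floor) and end (ceiling) of sub-region `index` along one axis.
    start = (index * size) // bins
    end = -((-(index + 1) * size) // bins)  # ceiling division
    return start, end


def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0, r1 = _bounds(i, h, bins)
        row = []
        for j in range(bins):
            c0, c1 = _bounds(j, w, bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Resize a bins x bins map back to h x w (nearest neighbor, not bilinear)."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pooling(fmap, levels=(1, 2, 3, 6)):
    """Return the input map plus one upsampled pooled map per pyramid level."""
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]  # the original feature map is kept in the concatenation
    for bins in levels:
        channels.append(upsample_nearest(adaptive_avg_pool(fmap, bins), h, w))
    return channels


fmap = [[r * 6 + c for c in range(6)] for r in range(6)]
out = pyramid_pooling(fmap)
print(len(out), out[1][0][0], out[2][0][0])  # 5 17.5 7.0
```

The 1-bin level carries the global average (the GAP prior), while the finer levels keep separate averages per sub-region, which is the "sub-region awareness" the slides refer to.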
  • 19. Implementation details (1) • Average pooling is applied at four levels: 1x1, 2x2, 3x3, and 6x6 (ksize, stride) • A pre-trained ResNet model with dilated convolution is used as the feature extractor (the output size will be 1/8 of the input image) • They use two losses: 1. softmax loss between the final layer and the labels 2. softmax loss between an intermediate output of ResNet and the labels¹² (weighted by 0.4) • ¹² "Relay backpropagation for effective learning of deep convolutional neural networks", ECCV 2016
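As a rough illustration of how the two losses combine, the sketch below computes a softmax cross-entropy for a single pixel and adds the auxiliary term with the 0.4 weight from the slide. The function names and the single-pixel simplification are assumptions made for the example; a real training loop would average over all pixels in the batch.

```python
import math


def softmax_cross_entropy(logits, label):
    """Softmax loss for one pixel: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def total_loss(main_logits, aux_logits, label, aux_weight=0.4):
    """Main loss plus the auxiliary loss scaled by aux_weight."""
    main = softmax_cross_entropy(main_logits, label)
    aux = softmax_cross_entropy(aux_logits, label)
    return main + aux_weight * aux


# The final layer is fairly confident, the intermediate branch less so.
print(total_loss([2.0, 0.5, -1.0], [0.8, 0.6, 0.1], label=0))
```

The auxiliary branch only shapes training; at test time only the final-layer prediction would be used.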
  • 20. Implementation details (2) • Optimization: MomentumSGD with weight decay (momentum: 0.9, weight decay: 0.0001) • LR scheduling: (equation omitted)
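The learning-rate formula on this slide did not survive in the transcript. The PSPNet paper describes a "poly" policy, where the current rate is the base rate multiplied by (1 - iter / max_iter) raised to a power; the sketch below assumes that policy, and the defaults base_lr=0.01 and power=0.9 are recalled from the paper rather than read from the slide, so treat them as assumptions.

```python
def poly_lr(iteration, max_iter, base_lr=0.01, power=0.9):
    """'Poly' learning-rate policy: decays smoothly from base_lr to 0."""
    return base_lr * (1.0 - iteration / max_iter) ** power


# Schedule for a 150K-iteration run (the ADE20K setting on the next slide).
for it in (0, 50_000, 100_000, 150_000):
    print(it, poly_lr(it, max_iter=150_000))
```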
  • 21. Implementation details (3) • Training iterations: ADE20K: 150K; PASCAL VOC: 30K; Cityscapes: 90K • Dataset augmentation: random mirror; random resize between 0.5 and 2; random rotation between -10 and 10 degrees; random Gaussian blur for ADE20K and PASCAL VOC
  • 22. Implementation details (4) • An appropriately large "cropsize" can yield good performance • "batchsize" in the batch normalization layer is of great importance • Cropsize: ADE20K: 473 x 473; PASCAL VOC: 473 x 473; Cityscapes: 713 x 713 • Batchsize: 16 for all datasets
  • 23. Implementation details (5) Multi-Node Batch Normalization • To increase the "batchsize" in batch normalization layers, they used a custom BN layer applied to data gathered from multiple GPUs using OpenMPI • We have Akiba-san's implementation of multi-node batch normalization!
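The idea behind batch normalization across GPUs can be sketched as follows: each device computes a count, a sum, and a sum of squares over its local batch, these three numbers are summed across devices, and every device then normalizes with the same global mean and variance. This is a minimal single-process sketch, not the authors' layer or the ChainerMN implementation; the cross-device reduction (OpenMPI on the slide) is simulated with an ordinary Python sum, and the function names are hypothetical.

```python
def local_stats(values):
    """Per-device statistics: (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)


def global_mean_var(per_device_stats):
    """Combine per-device statistics as an all-reduce (sum) would."""
    n = sum(s[0] for s in per_device_stats)
    total = sum(s[1] for s in per_device_stats)
    total_sq = sum(s[2] for s in per_device_stats)
    mean = total / n
    var = total_sq / n - mean * mean  # E[x^2] - E[x]^2 (biased variance)
    return mean, var


def normalize(values, mean, var, eps=1e-5):
    """Normalize one device's activations with the shared statistics."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Two "devices", each holding a small local batch of activations.
batches = [[1.0, 2.0], [3.0, 4.0]]
mean, var = global_mean_var([local_stats(b) for b in batches])
print(mean, var)  # 2.5 1.25
```

With only the local batch, each device would see a mean of 1.5 or 3.5 here; sharing the statistics makes the layer behave as if the batch size were the total across devices.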
  • 24. ImageNet Scene Parsing Challenge 2016 • Dataset: ADE20K • 150 classes and 1,038 image-level labels • 20,000/2,000/3,000 pixel-level labels for train/val/test
  • 25. Ablation Study for Pyramid Pooling Module • Average pooling works better than max pooling in all settings • Pooling with pyramid parsing outperforms that using global pooling • With dimension reduction (DR; reducing the number of channels after pyramid pooling), the performance is further enhanced
  • 26. Ablation Study for Auxiliary Loss • Set the auxiliary loss weight between 0 and 1 and compared the final results • A weight of 0.4 yields the best performance
  • 27. Ablation Study for the depth of ResNet • Deeper is better
  • 28. More Detailed Performance Analysis • Additional processing and improvement (% in mIoU): Data augmentation (DA): +1.54; Auxiliary loss (AL): +1.41; Pyramid pooling module (PSP): +4.45; Use deeper ResNet (50 to 269): +2.13; Multi-scale testing (MS): +1.13 • For multi-scale testing, they create predictions at 6 different scales (0.5, 0.75, 1, 1.25, 1.5, and 1.75) and take their average
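The averaging step of multi-scale testing can be sketched like this. The sketch assumes that the per-scale predictions have already been resized back to the original resolution and flattened to a list of per-pixel class probabilities; that preprocessing, the function name, and the toy numbers are assumptions for illustration only.

```python
SCALES = (0.5, 0.75, 1.0, 1.25, 1.5, 1.75)  # the six test scales on the slide


def average_predictions(prob_maps):
    """Average per-scale probability maps and pick the best class per pixel.

    `prob_maps` is a list with one entry per scale; each entry is a list of
    pixels, and each pixel is a list of class probabilities.
    """
    n = len(prob_maps)
    n_pixels = len(prob_maps[0])
    n_classes = len(prob_maps[0][0])
    avg = [[sum(m[p][k] for m in prob_maps) / n for k in range(n_classes)]
           for p in range(n_pixels)]
    labels = [max(range(n_classes), key=lambda k: row[k]) for row in avg]
    return avg, labels


# Toy example: 2 pixels, 2 classes, predictions from two of the scales.
maps = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.5, 0.5], [0.2, 0.8]],
]
avg, labels = average_predictions(maps)
print(labels)  # [0, 1]
```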
  • 29. Results on PASCAL VOC 2012 • Extended with the Semantic Boundaries Dataset (SBD)¹³, they used 10582, 1449, and 1456 images for train/val/test • Mismatched relationship: For "aeroplane" and "sky" in the second and third rows, PSPNet finds missing parts • Confusing classes: For "cows" in row one, the baseline model treats it as "horse" and "dog" while PSPNet corrects these errors • Inconspicuous classes: For "person", "bottle" and "plant" in the following rows, PSPNet performs well on these small-size-object classes compared to the baseline model • ¹³ "Semantic Contours from Inverse Detectors", ICCV 2011, http://home.bharathh.info/pubs/codes/SBD/download.html
  • 30. Results on PASCAL VOC 2012 • Comparing PSPNet with previous best-performing methods on the testing set based on two settings, i.e., with or without pre-training on MS-COCO dataset
  • 31. Results on Cityscapes • The Cityscapes dataset consists of 2975, 500, and 1525 train/val/test images (19 classes) • 20000 coarsely annotated images are also available (in the table below, ‡ means they are used)
  • 32. Thank you for your attention • The official repository doesn't include any training code • My own implementation for both training and testing is ready: • mitmul/chainer-pspnet: https://github.com/mitmul/chainer-pspnet • Now I'm training a model to ensure the reproducibility • Once the reproduction work is finished, I'll send the code to ChainerCV • In the semantic segmentation task, • the input image is large (713 for PSPNet on Cityscapes) • an appropriate batchsize, e.g., 16 or so, is important for batch normalization • As the authors said, distributed batch normalization seems to be important in multi-GPU training • So ChainerMN is now a necessary tool for such large-scale datasets and deep models • It means that we need more GPU machines connected with InfiniBand