SlideShare una empresa de Scribd logo
1 de 20
Descargar para leer sin conexión
Multi-Label Image Recognition with Graph
Convolutional Network
CVPR2019
Zhao-MinChen, Xiu-ShenWei, PengWang, YanwenGuo
Wonbeom Jang
Introduction
Multi-Label Image Recognition
- Predict a set of object labels that present in an image
- A Naïve way -a set of binary classification problems to predict
whether each object
- Ignoring the complex topology structure between objects
→ Inter-dependent object classifiers from prior label representations
→ Utilize GCN to propagate information between multiple labels
and consequently learn inter-dependent classifiers for each of
image labels
Approach
Overall Architecture
Approach
Motivations
Use a graph to model the inter dependencies between labels, which is a flexible way to capture the topological
structure in the label space
1. the learned classifiers can retain the weak semantic structures in the word embedding space, where semantic
related concepts are close to each other
2. Design a novel label correlation matrix based on their co-occurrence patterns to explicitly model the label
dependencies by GCN, with which the update of node features will absorb information from correlated nodes
(labels)
Approach
Graph Convolutional Network Recap
Graph Convolutional Network (GCN) was introduced in[14] to perform semi-supervised classification.
The goal of GCN is to learn a function 𝑓(·,·) on a graph 𝒢.
𝑯𝒍+𝟏 = 𝒇(𝑯𝒍, 𝑨)
𝑯𝒍+𝟏
= 𝒉(෡𝑨 𝑯𝒍
𝑾𝒍
)
Feature description: 𝑯𝒍
∈ 𝑹 𝒏×𝒅
Input: 𝑨 ∈ 𝑹 𝒏×𝒏
Transformation Matrix: 𝑾𝒍 ∈ 𝑹 𝒅×𝒅′
Normalized version of correlation matrix: ෡𝑨 ∈ 𝑹 𝒏×𝒏
Non-linear function: 𝒉(·)
Approach
GCN for Multi-label Recognition
ResNet-101 → global max-pooling
𝑥 = 𝑓𝐺𝑀𝑃(𝑓𝑐 𝑛𝑛 (𝐼; 𝜃𝑐𝑛𝑛)) ∈ 𝑅 𝐷
𝐼 is with the 448 × 448 resolution
𝜃𝑐𝑛𝑛 indicates model parameters and D = 2048.
Image representation learning
Approach
GCN for Multi-label Recognition
GCN based classifier learning
We use stacked GCNs where each GCN layer l takes the node representations from previous layer
𝐻 𝑙 as inputs and outputs new node representations, i.e., 𝐻 𝑙+1
Approach
GCN for Multi-label Recognition
GCN based classifier learning
For the first layer, the input is the 𝑍 ∈ 𝑅 𝐶×𝑑 matrix, where d is the dimensionality of the label-level
word embedding
For the last layer, the output is 𝑊 ∈ 𝑅 𝐶×𝐷 with D denoting the dimensionality of the image
representation
ො𝑦 = 𝑊𝑥
Loss function
ground truth: 𝑦 ∈ 𝑅 𝐶 𝑤ℎ𝑒𝑟𝑒 𝑦 𝑖 = {0, 1}
Approach
GCN for Multi-label Recognition
Approach
Correlation Matrix of ML-GCN
We model the label correlation dependency in the form of conditional probability, i.e., 𝑃 𝐿𝑖 𝐿𝑗
* 𝑃 𝐿𝑖 𝐿𝑗 ≠ 𝑃 𝐿𝑖 𝐿𝑗
Make matrix
1. 𝑀 ∈ 𝑅 𝐶×𝐷 where 𝑀𝑖𝑗 denotes the concurring times of 𝐿𝑖 and 𝐿𝑗
2. 𝑃𝑖 = 𝑀𝑖/𝑁𝑖
Introduction
Two drawbacks
1. May exhibit a long-tail distribution
2. the absolute number of co-occurrences from training and test may not be completely consistent
→ Overfitting
Solution
The threshold 𝜏
A is the binary correlation matrix
Correlation Matrix of ML-GCN
Approach
Over-smoothing problem
From Eq.(2), we can conclude that after GCN, the feature of a node will be the weighted sum of its own feature
and the adjacent nodes’ features. → Oversmooth
𝑯𝑙+1 = 𝒉 ෡𝑨 𝑯𝑙 𝑾𝑙 (2)
→ Propose the following re-weighted scheme,
where 𝐴′ is the re-weighted correlation matrix, and 𝑝 determines the weights assigned to a node itself and other
correlated nodes.
Experimental Results
Results on MS-COCO
Experimental Results
Results on VOC 2007
Experimental Results
ML-GCN under different types of word embeddings
Experimental Results
Effects of different threshold values 𝜏
Experimental Results
Effects of different p for correlation matrix re-weighting
Experimental Results
The deeper, the better?
The possible reasonfor the performance drop may be that when using more GCN layers, the propagation
between nodes will be accumulated, which can result in over-smoothing.
Experimental Results
Classifier Visualization
In this section, we visualize the learned inter-dependent classifiers
to show if meaningful semantic topology can be maintained.
→ t-SNE
It is clear to see that, the classifiers learned by our method maintain
meaningful semantic topology.
The classifiers learned through vanilla ResNet uniformly distribute
in the space and do not shown any meaningful topology.
Experimental Results
Performance on image retrieval
Evaluate if our model can learn better image representations.
We use the k-NN algorithm.
→ Better than the vanilla ResNet base line.

Más contenido relacionado

La actualidad más candente

Manifold learning with application to object recognition
Manifold learning with application to object recognitionManifold learning with application to object recognition
Manifold learning with application to object recognition
zukun
 

La actualidad más candente (20)

Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018
Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018
Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network Approaches
 
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic SegmentationSemantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
 
markov chain map generation
markov chain map generationmarkov chain map generation
markov chain map generation
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Manifold learning with application to object recognition
Manifold learning with application to object recognitionManifold learning with application to object recognition
Manifold learning with application to object recognition
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
 
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
 
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
 
E0152531
E0152531E0152531
E0152531
 
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data Stream
 
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
 
Clique and sting
Clique and stingClique and sting
Clique and sting
 
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
 
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelClustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
 
Steganography using reversible texture synthesis
Steganography using reversible texture synthesisSteganography using reversible texture synthesis
Steganography using reversible texture synthesis
 
#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentation#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentation
 
Steganography using reversible texture synthesis
Steganography using reversible texture synthesisSteganography using reversible texture synthesis
Steganography using reversible texture synthesis
 

Similar a ML-GCN

Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNNAutomatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
Zihao(Gerald) Zhang
 

Similar a ML-GCN (20)

論文紹介:Learning With Neighbor Consistency for Noisy Labels
論文紹介:Learning With Neighbor Consistency for Noisy Labels論文紹介:Learning With Neighbor Consistency for Noisy Labels
論文紹介:Learning With Neighbor Consistency for Noisy Labels
 
Bridging knowledge graphs_to_generate_scene_graphs
Bridging knowledge graphs_to_generate_scene_graphsBridging knowledge graphs_to_generate_scene_graphs
Bridging knowledge graphs_to_generate_scene_graphs
 
Deep Graph Contrastive Representation Learning.pptx
Deep Graph Contrastive Representation Learning.pptxDeep Graph Contrastive Representation Learning.pptx
Deep Graph Contrastive Representation Learning.pptx
 
AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...
AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...
AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learning
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Colloquium.pptx
Colloquium.pptxColloquium.pptx
Colloquium.pptx
 
VJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNVJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCN
 
2013 KDD conference presentation--"Multi-Label Relational Neighbor Classifica...
2013 KDD conference presentation--"Multi-Label Relational Neighbor Classifica...2013 KDD conference presentation--"Multi-Label Relational Neighbor Classifica...
2013 KDD conference presentation--"Multi-Label Relational Neighbor Classifica...
 
Qcce quality constrained co saliency estimation for common object detection
Qcce quality constrained co saliency estimation for common object detectionQcce quality constrained co saliency estimation for common object detection
Qcce quality constrained co saliency estimation for common object detection
 
Lecture 8
Lecture 8Lecture 8
Lecture 8
 
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
 
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNNAutomatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
 
Review: [CIKM'21]UltraGCN.pptx
Review: [CIKM'21]UltraGCN.pptxReview: [CIKM'21]UltraGCN.pptx
Review: [CIKM'21]UltraGCN.pptx
 
Learning Graph Representation for Data-Efficiency RL
Learning Graph Representation for Data-Efficiency RLLearning Graph Representation for Data-Efficiency RL
Learning Graph Representation for Data-Efficiency RL
 
Towards Deep Attention in Graph Neural Networks: Problems and Remedies.pptx
Towards Deep Attention in Graph Neural Networks: Problems and Remedies.pptxTowards Deep Attention in Graph Neural Networks: Problems and Remedies.pptx
Towards Deep Attention in Graph Neural Networks: Problems and Remedies.pptx
 
Deep Semi-supervised Learning methods
Deep Semi-supervised Learning methodsDeep Semi-supervised Learning methods
Deep Semi-supervised Learning methods
 
Laplacian-regularized Graph Bandits
Laplacian-regularized Graph BanditsLaplacian-regularized Graph Bandits
Laplacian-regularized Graph Bandits
 
Unsupervised Object Detection
Unsupervised Object DetectionUnsupervised Object Detection
Unsupervised Object Detection
 

Último

Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
jaanualu31
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 

Último (20)

Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 

ML-GCN

  • 1. Multi-Label Image Recognition with Graph Convolutional Network CVPR2019 Zhao-MinChen, Xiu-ShenWei, PengWang, YanwenGuo Wonbeom Jang
  • 2. Introduction Multi-Label Image Recognition - Predict a set of object labels that present in an image - A Naïve way -a set of binary classification problems to predict whether each object - Ignoring the complex topology structure between objects → Inter-dependent object classifiers from prior label representations → Utilize GCN to propagate information between multiple labels and consequently learn inter-dependent classifiers for each of image labels
  • 4. Approach Motivations Use a graph to model the inter dependencies between labels, which is a flexible way to capture the topological structure in the label space 1. the learned classifiers can retain the weak semantic structures in the word embedding space, where semantic related concepts are close to each other 2. Design a novel label correlation matrix based on their co-occurrence patterns to explicitly model the label dependencies by GCN, with which the update of node features will absorb information from correlated nodes (labels)
  • 5. Approach Graph Convolutional Network Recap Graph Convolutional Network (GCN) was introduced in[14] to perform semi-supervised classification. The goal of GCN is to learn a function 𝑓(·,·) on a graph 𝒢. 𝑯𝒍+𝟏 = 𝒇(𝑯𝒍, 𝑨) 𝑯𝒍+𝟏 = 𝒉(෡𝑨 𝑯𝒍 𝑾𝒍 ) Feature description: 𝑯𝒍 ∈ 𝑹 𝒏×𝒅 Input: 𝑨 ∈ 𝑹 𝒏×𝒏 Transformation Matrix: 𝑾𝒍 ∈ 𝑹 𝒅×𝒅′ Normalized version of correlation matrix: ෡𝑨 ∈ 𝑹 𝒏×𝒏 Non-linear function: 𝒉(·)
  • 6. Approach GCN for Multi-label Recognition ResNet-101 → global max-pooling 𝑥 = 𝑓𝐺𝑀𝑃(𝑓𝑐 𝑛𝑛 (𝐼; 𝜃𝑐𝑛𝑛)) ∈ 𝑅 𝐷 𝐼 is with the 448 × 448 resolution 𝜃𝑐𝑛𝑛 indicates model parameters and D = 2048. Image representation learning
  • 7. Approach GCN for Multi-label Recognition GCN based classifier learning We use stacked GCNs where each GCN layer l takes the node representations from previous layer 𝐻 𝑙 as inputs and outputs new node representations, i.e., 𝐻 𝑙+1
  • 8. Approach GCN for Multi-label Recognition GCN based classifier learning For the first layer, the input is the 𝑍 ∈ 𝑅 𝐶×𝑑 matrix, where d is the dimensionality of the label-level word embedding For the last layer, the output is 𝑊 ∈ 𝑅 𝐶×𝐷 with D denoting the dimensionality of the image representation ො𝑦 = 𝑊𝑥 Loss function ground truth: 𝑦 ∈ 𝑅 𝐶 𝑤ℎ𝑒𝑟𝑒 𝑦 𝑖 = {0, 1}
  • 10. Approach Correlation Matrix of ML-GCN We model the label correlation dependency in the form of conditional probability, i.e., 𝑃 𝐿𝑖 𝐿𝑗 * 𝑃 𝐿𝑖 𝐿𝑗 ≠ 𝑃 𝐿𝑖 𝐿𝑗 Make matrix 1. 𝑀 ∈ 𝑅 𝐶×𝐷 where 𝑀𝑖𝑗 denotes the concurring times of 𝐿𝑖 and 𝐿𝑗 2. 𝑃𝑖 = 𝑀𝑖/𝑁𝑖
  • 11. Introduction Two drawbacks 1. May exhibit a long-tail distribution 2. the absolute number of co-occurrences from training and test may not be completely consistent → Overfitting Solution The threshold 𝜏 A is the binary correlation matrix Correlation Matrix of ML-GCN
  • 12. Approach Over-smoothing problem From Eq.(2), we can conclude that after GCN, the feature of a node will be the weighted sum of its own feature and the adjacent nodes’ features. → Oversmooth 𝑯𝑙+1 = 𝒉 ෡𝑨 𝑯𝑙 𝑾𝑙 (2) → Propose the following re-weighted scheme, where 𝐴′ is the re-weighted correlation matrix, and 𝑝 determines the weights assigned to a node itself and other correlated nodes.
  • 15. Experimental Results ML-GCN under different types of word embeddings
  • 16. Experimental Results Effects of different threshold values 𝜏
  • 17. Experimental Results Effects of different p for correlation matrix re-weighting
  • 18. Experimental Results The deeper, the better? The possible reasonfor the performance drop may be that when using more GCN layers, the propagation between nodes will be accumulated, which can result in over-smoothing.
  • 19. Experimental Results Classifier Visualization In this section, we visualize the learned inter-dependent classifiers to show if meaningful semantic topology can be maintained. → t-SNE It is clear to see that, the classifiers learned by our method maintain meaningful semantic topology. The classifiers learned through vanilla ResNet uniformly distribute in the space and do not shown any meaningful topology.
  • 20. Experimental Results Performance on image retrieval Evaluate if our model can learn better image representations. We use the k-NN algorithm. → Better than the vanilla ResNet base line.