Robust Inference via Generative Classifiers
for Handling Noisy Labels
1 Korea Advanced Institute of Science and Technology (KAIST)
2 University of Michigan
3 University of Illinois at Urbana-Champaign
4 Google Brain
5 AItrics
ICML 2019
Kimin Lee1 Sukmin Yun1 Kibok Lee2 Honglak Lee4,2 Bo Li3 Jinwoo Shin1,5
• Large-scale datasets collect class labels from
• Data mining on social media and web data
Introduction: Noisy Labels
1
Web-crawled dataset [Krause’ 16]
[Mahajan’ 18] Exploring the Limits of Weakly Supervised Pretraining. In ECCV, 2018.
[Krause’ 16] The unreasonable effectiveness of noisy data for fine-grained recognition. In ECCV 2016.
[Kuznetsova’ 18] The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint 2018.
Open Images dataset [Kuznetsova’ 18]
Facebook’s Instagram dataset
[Mahajan’ 18]
• Large-scale datasets collect class labels from
• Data mining on social media and web data
• Large-scale datasets may contain noisy (incorrect) labels
Introduction: Noisy Labels
1
Facebook’s Instagram dataset
[Mahajan’ 18]
[Example images labeled “Cat”: four show cats, while two show humans, i.e., carry noisy labels]
• Large-scale datasets collect class labels from
• Data mining on social media and web data
• Large-scale datasets may contain noisy (incorrect) labels
• DNNs do not generalize well from such noisy datasets
Introduction: Noisy Labels
1
• Large-scale datasets collect class labels from
• Data mining on social media and web data
• Large-scale datasets may contain noisy (incorrect) labels
• DNNs do not generalize well from such noisy datasets
Introduction: Noisy Labels
[Figure: test set accuracy (%) of DenseNet-100 vs. noise fraction (0–0.6)]
[Huang’ 17] Densely connected convolutional networks. In CVPR, 2017.
[Krizhevsky’ 09] Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
1
[Uniform noise illustration: Deer, Dog, Frog, …]
• Dataset: CIFAR-10 [Krizhevsky’ 09]
• Model: DenseNet-100 [Huang’ 17]
• Large-scale datasets collect class labels from
• Data mining on social media and web data
• Large-scale datasets may contain noisy (incorrect) labels
• DNNs do not generalize well from such noisy datasets
• Several training strategies have also been investigated
Introduction: Noisy Labels
1
• Utilizing an estimated/corrected label
• Bootstrapping [Reed’ 14; Ma’ 18]
• Loss correction [Patrini’ 17; Hendrycks’ 18]
• Training on selected (cleaner) samples
• Ensemble [Malach’ 17; Han’ 18]
• Meta-learning [Jiang’ 18]
[Reed’ 14] Training deep neural networks on noisy labels with bootstrapping. arXiv preprint 2014.
[Hendrycks’ 18] Using trusted data to train deep networks on labels corrupted by severe noise.
In NeurIPS, 2018
[Ma’ 18] Dimensionality-driven learning with noisy labels. In ICML, 2018
[Patrini’ 17] Making deep neural networks robust to label noise: A loss correction approach.
In CVPR, 2017
[Han’ 18] Co-teaching: Robust training of deep neural networks with extremely noisy labels.
In NeurIPS, 2018.
[Jiang’ 18] Mentornet: Regularizing very deep neural networks on corrupted labels.
In ICML, 2018.
[Malach’ 17] Decoupling “when to update” from “how to update”. In NeurIPS, 2017.
• We propose a new inference method which can be applied to any pre-trained DNNs
Our Contributions
2
• We propose a new inference method which can be applied to any pre-trained DNNs
Our Contributions
2
[Figure: test set accuracy (%) vs. noise fraction (0–0.6) — softmax classifier vs. generative classifier (sample mean on noisy labels)]
• Inducing a “generative classifier”
Our Contributions
2
[Figure: test set accuracy (%) vs. noise fraction (0–0.6) — softmax, generative (sample mean on noisy labels), and generative (MCD on noisy labels)]
• We propose a new inference method which can be applied to any pre-trained DNNs
• Inducing a “generative classifier”
• Applying a “robust inference” to estimate
parameters of generative classifier
• Breakdown points
• Generalization bounds
Our Contributions
2
• Inducing a “generative classifier”
• Applying a “robust inference” to estimate
parameters of generative classifier
• Breakdown points
• Generalization bounds
• Introducing “ensemble of generative
classifiers”
[Figure: test set accuracy (%) vs. noise fraction (0–0.6) — softmax, generative (sample mean), generative (MCD), and generative (MCD) + ensemble]
• We propose a new inference method which can be applied to any pre-trained DNNs
Outline
• Our method: Robust Inference via Generative Classifiers
• Generative classifier
• Minimum covariance determinant estimator
• Ensemble of generative classifiers
• Experiments
• Experimental results on synthetic noisy labels
• Experimental results on semantic and open-set noisy labels
• Conclusion
• t-SNE embedding of DenseNet-100 trained on CIFAR-10 with uniform noisy labels
Motivation: Why Generative Classifier?
3
Training samples with clean labels
Training samples with noisy labels
• t-SNE embedding of DenseNet-100 trained on CIFAR-10 with uniform noisy labels
• Features from training samples with noisy labels (red stars) are distributed like outliers
Motivation: Why Generative Classifier?
3
Training samples with clean labels
Training samples with noisy labels
• t-SNE embedding of DenseNet-100 trained on CIFAR-10 with uniform noisy labels
• Features from training samples with noisy labels (red stars) are distributed like outliers
• Features from training samples with clean labels (black dots) are still clustered!!
Motivation: Why Generative Classifier?
3
Training samples with clean labels
Training samples with noisy labels
• t-SNE embedding of DenseNet-100 trained on CIFAR-10 with uniform noisy labels
• Features from training samples with noisy labels (red stars) are distributed like outliers
• Features from training samples with clean labels (black dots) are still clustered!!
• If we remove the outliers and induce decision boundaries, they can be more robust
Motivation: Why Generative Classifier?
3
Training samples with clean labels
Training samples with noisy labels
Training samples with clean labels
• t-SNE embedding of DenseNet-100 trained on CIFAR-10 with uniform noisy labels
• Features from training samples with noisy labels (red stars) are distributed like outliers
• Features from training samples with clean labels (black dots) are still clustered!!
• If we remove the outliers and induce decision boundaries, they can be more robust
• Generative classifier: model the class-conditional data distribution P(x|y) instead of the decision boundary P(y|x)
Motivation: Why Generative Classifier?
3
Training samples with clean labels
• Given pre-trained softmax classifier with DNNs
• Inducing a generative classifier on the hidden feature space
Robust Inference via Generative Classifier
4
Penultimate layer Gaussian distribution
• Given pre-trained softmax classifier with DNNs
• Inducing a generative classifier on the hidden feature space
• Why Gaussian?
• Features of a softmax classifier can be well fitted by class-conditional Gaussian distributions [Lee’ 18]
Robust Inference via Generative Classifier
4
Penultimate layer Gaussian distribution
[Lee’ 18] A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In NeurIPS, 2018.
• Given pre-trained softmax classifier with DNNs
• Inducing a generative classifier on the hidden feature space
Robust Inference via Generative Classifier
4
Penultimate layer
Bayes’ rule
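For reference, the posterior that Bayes’ rule yields under this class-conditional Gaussian model can be written out; this is the standard LDA form, and the symbols below (feature map f, class prior \beta_c) are notational assumptions since the slide equations were not preserved:

P(y = c \mid \mathbf{x}) = \frac{\beta_c \,\mathcal{N}\big(f(\mathbf{x});\, \mu_c, \Sigma\big)}{\sum_{c'} \beta_{c'} \,\mathcal{N}\big(f(\mathbf{x});\, \mu_{c'}, \Sigma\big)},
\qquad \hat{y}(\mathbf{x}) = \arg\max_c P(y = c \mid \mathbf{x}),

where f(x) is the penultimate-layer feature, \mu_c is the class mean, \Sigma is the (tied) covariance, and \beta_c is the class prior.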
• Given pre-trained softmax classifier with DNNs
• Inducing a generative classifier on the hidden feature space
• How to estimate the parameters of the generative classifier?
Robust Inference via Generative Classifier
4
Penultimate layer
Bayes’ rule
• Given pre-trained softmax classifier with DNNs
• Inducing a generative classifier on the hidden feature space
• How to estimate the parameters of the generative classifier?
• With training data
Robust Inference via Generative Classifier
4
Penultimate layer
Bayes’ rule
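To make the construction concrete, here is a minimal NumPy sketch of inducing an LDA-style generative classifier on precomputed penultimate-layer features and predicting via Bayes’ rule; all names are illustrative rather than the paper’s code, and the plain sample estimator is used here (the robust MCD variant follows).

import numpy as np

def fit_generative_classifier(features, labels, num_classes):
    # features: (N, D) penultimate-layer activations f(x); labels: (N,) possibly noisy labels
    dim = features.shape[1]
    means = np.zeros((num_classes, dim))
    priors = np.zeros(num_classes)
    cov = np.zeros((dim, dim))
    for c in range(num_classes):
        fc = features[labels == c]
        means[c] = fc.mean(axis=0)                      # class mean (naive sample estimate here)
        priors[c] = len(fc) / len(features)             # class prior
        cov += (fc - means[c]).T @ (fc - means[c])
    cov = cov / len(features) + 1e-6 * np.eye(dim)      # tied (shared) covariance, as in LDA
    return means, cov, priors

def predict(features, means, cov, priors):
    # Bayes' rule: P(y=c | x) is proportional to N(f(x); mu_c, Sigma) * P(y=c);
    # the argmax can ignore the normalizer shared across classes
    precision = np.linalg.inv(cov)
    scores = []
    for c in range(len(means)):
        diff = features - means[c]
        log_likelihood = -0.5 * np.einsum('nd,dk,nk->n', diff, precision, diff)
        scores.append(log_likelihood + np.log(priors[c]))
    return np.argmax(np.stack(scores, axis=1), axis=1)

Only the parameter estimation changes in the robust version described next; the prediction rule stays the same.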
• Naïve sample estimator (green circle) can be affected by outliers (i.e., noisy labels)
Minimum Covariance Determinant (MCD)
5
Naïve sample estimator
• Naïve sample estimator (green circle) can be affected by outliers (i.e., noisy labels)
• Minimum Covariance Determinant (MCD) estimator (blue circle)
Minimum Covariance Determinant (MCD)
5
MCD estimator
Naïve sample estimator
• Naïve sample estimator (green circle) can be affected by outliers (i.e., noisy labels)
• Minimum Covariance Determinant (MCD) estimator (blue circle)
• For each class c, find a subset for which the determinant of the sample covariance
matrix is minimum
Minimum Covariance Determinant (MCD)
5
“Variance ∝ determinant”
MCD estimator
Motivation of MCD
- Outliers (e.g., samples with
noisy labels) are scattered in
the feature space
• Naïve sample estimator (green circle) can be affected by outliers (i.e., noisy labels)
• Minimum Covariance Determinant (MCD) estimator (blue circle)
• For each class c, find a subset for which the determinant of the sample covariance
matrix is minimum
• Compute the mean and covariance matrix only using selected samples
Minimum Covariance Determinant (MCD)
5
Motivation of MCD
- Outliers (e.g., samples with
noisy labels) are scattered in
the feature space
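As a sketch, the per-class MCD estimate can be obtained with scikit-learn’s MinCovDet; the support_fraction value below (the fraction of samples per class assumed clean) is an illustrative hyperparameter, not a value taken from the paper.

import numpy as np
from sklearn.covariance import MinCovDet

def fit_mcd_parameters(features, labels, num_classes, support_fraction=0.75):
    # assumes each class has more samples than feature dimensions, as MCD requires
    dim = features.shape[1]
    means, cov, n_selected = [], np.zeros((dim, dim)), 0
    for c in range(num_classes):
        fc = features[labels == c]
        mcd = MinCovDet(support_fraction=support_fraction, random_state=0).fit(fc)
        means.append(mcd.location_)                 # robust class mean
        selected = fc[mcd.support_]                 # samples kept by MCD (likely clean labels)
        cov += (selected - mcd.location_).T @ (selected - mcd.location_)
        n_selected += len(selected)
    cov = cov / n_selected + 1e-6 * np.eye(dim)     # tied covariance from the selected samples only
    return np.stack(means), cov

The robust means and tied covariance plug directly into the Bayes’-rule prediction sketched earlier.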
• 1. Breakdown points
• The smallest fraction of outliers that can carry the estimate beyond all bounds.
• High breakdown points = robust to outliers
Advantages of MCD estimators
6
• 1. Breakdown points
• The smallest fraction of outliers that can carry the estimate beyond all bounds.
• High breakdown points = robust to outliers
• Theorem 1 (Lopuhaa et al., 1991)
Advantages of MCD estimators
6
Under some mild assumptions, the MCD estimator has a
near-optimal breakdown point, i.e., almost 50%
[Lopuhaa et al., 1991] Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. The Annals of Statistics, 1991.
Note: the naïve sample estimator has a 0% breakdown point
• 2. Tighter generalization error bound under noisy labels
• Theorem 2 (Lee et al., 2019)
Advantages of MCD estimators
7
[Durrant et al., 2010] Compressed Fisher linear discriminant analysis: Classification of randomly projected data. In ACM SIGKDD, 2010.
Under some mild assumptions, the parameters estimated by MCD are closer
to the true parameters than those from the sample estimator and yield a larger
inter-class distance
• 2. Tighter generalization error bound under noisy labels
• Theorem 2 (Lee et al., 2019)
Advantages of MCD estimators
7
[Durrant et al., 2010] Compressed Fisher linear discriminant analysis: Classification of randomly projected data. In ACM SIGKDD, 2010.
Theorem 3 (Durrant et al., 2010)
The generalization error of the generative classifier is bounded by a term that
decreases with the inter-class distance and increases with the distance between
the true and estimated parameters
Under some mild assumptions, the parameters estimated by MCD are closer
to the true parameters than those from the sample estimator and yield a larger
inter-class distance
• Finding the optimal solution of MCD is computationally intractable
• [Bernholt’ 06] showed that the NP-hard “Vertex Cover” problem reduces to MCD
How to Solve MCD?
8
[Bernholt’ 06] Robust estimators are hard to compute. Technical report, Universität Dortmund, SFB 475: Komplexitätsreduktion in multivariaten Datenstrukturen, 2006.
How to Solve MCD?
8
Two-step approach [Hubert’ 04]
[Hubert’ 04] Fast and robust discriminant analysis.
Computational Statistics & Data Analysis, 2004.
How to Solve MCD?
8
[Hubert’ 04] Fast and robust discriminant analysis.
Computational Statistics & Data Analysis, 2004.
Two-step approach [Hubert’ 04]
• Step 1. For each class, find a subset as follows:
• A. Uniformly sample an initial subset
& compute sample mean and covariance matrix
How to Solve MCD?
8
[Hubert’ 04] Fast and robust discriminant analysis.
Computational Statistics & Data Analysis, 2004.
• Step 1. For each class, find a subset as follows:
• A. Uniformly sample an initial subset
& compute sample mean and covariance matrix
• B. Compute the Mahalanobis distance of each sample,
d(x) = sqrt( (f(x) − μ)ᵀ Σ⁻¹ (f(x) − μ) ), w.r.t. the current subset mean μ and covariance Σ
Two-step approach [Hubert’ 04]
How to Solve MCD?
8
[Hubert’ 04] Fast and robust discriminant analysis.
Computational Statistics & Data Analysis, 2004.
• Step 1. For each class, find a subset as follows:
• A. Uniformly sample an initial subset
& compute sample mean and covariance matrix
• B. Compute the Mahalanobis distance of each sample,
d(x) = sqrt( (f(x) − μ)ᵀ Σ⁻¹ (f(x) − μ) ), w.r.t. the current subset mean μ and covariance Σ
• C. Construct a new subset which contains samples with
smaller distances
Two-step approach [Hubert’ 04]
How to Solve MCD?
8
[Hubert’ 04] Fast and robust discriminant analysis.
Computational Statistics & Data Analysis, 2004.
• Step 1. For each class, find a subset as follows:
• A. Uniformly sample an initial subset
& compute sample mean and covariance matrix
• B. Compute the Mahalanobis distance of each sample,
d(x) = sqrt( (f(x) − μ)ᵀ Σ⁻¹ (f(x) − μ) ), w.r.t. the current subset mean μ and covariance Σ
• C. Construct a new subset which contains samples with
smaller distances
• D. Update the sample mean and covariance matrix
Two-step approach [Hubert’ 04]
How to Solve MCD?
8
• Step 1. For each class, find a subset as follows:
• A. Uniformly sample an initial subset
& compute sample mean and covariance matrix
• B. Compute the Mahalanobis distance of each sample,
d(x) = sqrt( (f(x) − μ)ᵀ Σ⁻¹ (f(x) − μ) ), w.r.t. the current subset mean μ and covariance Σ
• C. Construct a new subset which contains samples with
smaller distances
• D. Update the sample mean and covariance matrix
• Repeat Step B ~ D until the determinant of covariance is
not decreasing
Two-step approach [Hubert’ 04]
[Hubert’ 04] Fast and robust discriminant analysis.
Computational Statistics & Data Analysis, 2004.
How to Solve MCD?
8
• Step 1. For each class, find a subset as follows:
• A. Uniformly sample an initial subset
& compute sample mean and covariance matrix
• B. Compute the Mahalanobis distance of each sample,
d(x) = sqrt( (f(x) − μ)ᵀ Σ⁻¹ (f(x) − μ) ), w.r.t. the current subset mean μ and covariance Σ
• C. Construct a new subset which contains samples with
smaller distances
• D. Update the sample mean and covariance matrix
• Repeat Step B ~ D until the determinant of covariance is
not decreasing
• Step 2. Compute the mean and covariance only using
selected samples
Two-step approach [Hubert’ 04]
[Hubert’ 04] Fast and robust discriminant analysis.
Computational Statistics & Data Analysis, 2004.
How to Solve MCD?
8
• Step 1. For each class, find a subset as follows:
• A. Uniformly sample an initial subset
& compute sample mean and covariance matrix
• B. Compute the Mahalanobis distance of each sample,
d(x) = sqrt( (f(x) − μ)ᵀ Σ⁻¹ (f(x) − μ) ), w.r.t. the current subset mean μ and covariance Σ
• C. Construct a new subset which contains samples with
smaller distances
• D. Update the sample mean and covariance matrix
• Repeat Step B ~ D until the determinant of covariance is
not decreasing
• Step 2. Compute the mean and covariance only using
selected samples
Two-step approach [Hubert’ 04]
[Hubert’ 04] Fast and robust discriminant analysis.
Computational Statistics & Data Analysis, 2004.
Each iteration monotonically decreases
the objective of the MCD
estimator [Hubert’ 04]!
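A minimal NumPy sketch of Step 1 for a single class (the iterative subset selection, often called C-steps); the subset size h and the iteration cap are hyperparameters, and the log-determinant replaces the raw determinant purely for numerical stability.

import numpy as np

def find_mcd_subset(features, h, seed=0, max_iter=100):
    # features: (N, D) features of one class; h < N is the number of samples to keep
    rng = np.random.default_rng(seed)
    n, dim = features.shape
    subset = rng.choice(n, size=h, replace=False)                 # A. random initial subset
    best_logdet, best_subset = np.inf, subset
    for _ in range(max_iter):
        mean = features[subset].mean(axis=0)                      # sample mean of the current subset
        cov = np.cov(features[subset], rowvar=False) + 1e-6 * np.eye(dim)
        logdet = np.linalg.slogdet(cov)[1]
        if logdet >= best_logdet:                                 # repeat B–D until the determinant stops decreasing
            break
        best_logdet, best_subset = logdet, subset
        precision = np.linalg.inv(cov)
        diff = features - mean
        dist = np.einsum('nd,dk,nk->n', diff, precision, diff)    # B. squared Mahalanobis distances
        subset = np.argsort(dist)[:h]                             # C./D. keep the h closest samples and update
    return best_subset                                            # Step 2 recomputes mean/covariance on this subset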
Summary
9
• Given pre-trained softmax classifier with DNNs
• Inducing a generative classifier on the hidden feature space
Penultimate layer
MCD estimator
Is it possible to utilize the low-level features?
Ensemble of Generative Classifiers
9
• Boosting the performance: utilizing low-level features
• Inducing the generative classifiers with respect to low-level features
• Why?
• Low-level features can often be more robust to noisy labels [Morcos’ 18, Arpit’ 17]
• They capture similar patterns from the input data
[Arpit’ 17] A closer look at memorization in deep networks. In ICML, 2017.
[Morcos’ 18] Insights on representational similarity in neural networks with canonical correlation. In NeurIPS, 2018.
Ensemble of Generative Classifiers
9
• Boosting the performance: utilizing low-level features
• Inducing the generative classifiers with respect to low-level features
• Ensemble of generative classifiers
Posterior distribution from ℓ-th layer
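A sketch of the ensemble step: each chosen layer ℓ gets its own MCD-based generative classifier, and their per-layer posteriors are combined. Uniform weighting is used here for illustration; the paper’s exact layer weighting may differ.

import numpy as np

def ensemble_predict(per_layer_log_posteriors, weights=None):
    # per_layer_log_posteriors: list of (N, C) arrays, one per selected layer
    num_layers = len(per_layer_log_posteriors)
    if weights is None:
        weights = np.full(num_layers, 1.0 / num_layers)   # illustrative uniform weights
    stacked = np.stack(per_layer_log_posteriors)          # (L, N, C)
    combined = np.tensordot(weights, stacked, axes=1)     # weighted sum over layers -> (N, C)
    return np.argmax(combined, axis=1)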
Outline
• Our method: Robust Inference via Generative Classifiers
• Generative classifier
• Minimum covariance determinant estimator
• Ensemble of generative classifiers
• Experiments
• Experimental results on synthetic noisy labels
• Experimental results on semantic and open-set noisy labels
• Conclusion
• Model: DenseNet-100 [Huang’ 17] and ResNet-34 [He’ 16]
• Image classification on CIFAR-10, CIFAR-100 [Krizhevsky’ 09] and SVHN [Netzer’ 11]
Experiments: Setup
10
[Huang’ 17] Densely connected convolutional networks. In CVPR, 2017.
[Krizhevsky’ 09] Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
[Netzer’ 11] Reading digits in natural images with unsupervised feature learning. In NeurIPS workshop, 2011.
[He’ 16] Deep residual learning for image recognition. In CVPR, 2016.
CIFAR-10 / CIFAR-100
• 32 × 32 RGB
• 10 / 100 classes
• 50,000 training images
• 10,000 test images
SVHN
• 32 × 32 RGB
• 10 classes
• 73,257 training images
• 26,032 test images
• Model: DenseNet-100 [Huang’ 17] and ResNet-34 [He’ 16]
• Image classification on CIFAR-10, CIFAR-100 [Krizhevsky’ 09] and SVHN [Netzer’ 11]
• NLP tasks on Twitter [Gimpel’ 11] and Reuters [Lewis’ 04]
Experiments: Setup
10
Part-of-speech tagging on Twitter
• 19 classes
• 14,468 tweets for training
• 7,082 tweets for test
[Gimpel’ 11] Part-of-speech tagging for twitter: Annotation, features, and experiments. In ACL, 2011
[Lewis’ 04] Rcv1: A new benchmark collection for text categorization research. JMLR, 2004
Text categorization on Reuters
• 7 classes
• 5,444 training data
• 2,179 test data
• Model: DenseNet-100 [Huang’ 17] and ResNet-34 [He’ 16]
• Image classification on CIFAR-10, CIFAR-100 [Krizhevsky’ 09] and SVHN [Netzer’ 11]
• NLP tasks on Twitter [Gimpel’ 11] and Reuters [Lewis’ 04]
• Noise type (a small sketch of the injection procedure follows this slide)
• Uniform: corrupting a label to another class chosen uniformly at random
• Flip: corrupting a label only to one specific class
Experiments: Setup
10
[Illustrations of uniform noise and flip noise with Deer / Dog / Frog examples]
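As referenced above, a small sketch of how the synthetic uniform and flip noise could be injected; the flip target (y + 1 mod C) is an illustrative fixed pairing rather than the exact mapping used in the paper.

import numpy as np

def corrupt_labels(labels, num_classes, noise_fraction, noise_type="uniform", seed=0):
    rng = np.random.default_rng(seed)
    noisy = np.array(labels).copy()
    n_corrupt = int(noise_fraction * len(noisy))
    corrupt_idx = rng.choice(len(noisy), size=n_corrupt, replace=False)
    for i in corrupt_idx:
        if noise_type == "uniform":
            # replace with another class chosen uniformly at random
            candidates = [c for c in range(num_classes) if c != noisy[i]]
            noisy[i] = rng.choice(candidates)
        else:
            # flip: always map to one specific target class (illustrative pairing)
            noisy[i] = (noisy[i] + 1) % num_classes
    return noisy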
• Test set accuracy of ResNet-34 trained on CIFAR-10 with 60% uniform noise
Experiments: Empirical Analysis
11
• Test set accuracy of ResNet-34 trained on CIFAR-10 with 60% uniform noise
• MCD estimator improves the performance by removing outliers
Experiments: Empirical Analysis
11
• Test set accuracy of ResNet-34 trained on CIFAR-10 with 60% uniform noise
• MCD estimator improves the performance by removing outliers
• MCD estimator can select clean data samples more effectively than sample estimator!
Experiments: Empirical Analysis
11
[The fraction of clean samples]
• Test set accuracy of ResNet-34 trained on CIFAR-10 with 60% uniform noise
• MCD estimator improves the performance by removing outliers
• Ensemble method significantly improves the classification accuracy
Experiments: Empirical Analysis
11
• Test set accuracy of ResNet-34 trained on CIFAR-10 with 60% uniform noise
• MCD estimator improves the performance by removing outliers
• Ensemble method significantly improves the classification accuracy
• Generative classifiers from lower layers are more robust to noisy labels
• Generative classifier does not decrease the classification accuracy on clean dataset
Experiments: Empirical Analysis
11
[Figure: layer-wise test set accuracy (%) vs. basic block index (2–14) under clean labels and 20% / 40% / 60% uniform noise]
• Test set accuracy of ResNet-44 trained on CIFAR-10 with 60% uniform noise
Comparison with Prior Training Methods
12
[Reed’ 14] Training deep neural networks on noisy labels with bootstrapping. arXiv preprint 2014.
[Ma’ 18] Dimensionality-driven learning with noisy labels. In ICML, 2018
[Patrini’ 17] Making deep neural networks robust to label noise: A loss correction approach. In CVPR, 2017
• Utilizing an estimated/corrected label
• Bootstrap [Reed’ 14]
• Forward [Patrini’ 17]
• D2L [Ma’ 18]
[Bar chart: test set accuracy (%) of the softmax classifier vs. Generative + MCD (ours) under cross entropy, Bootstrap, Forward, and D2L training]
• Training methods with an ensemble of
classifiers or meta-learning model
• Model: 9-layer CNNs
• Dataset: CIFAR-100
• Noise: 45% Flip noise
Comparison with Prior Training Methods
12
[Bar chart: test set accuracy (%) — cross entropy, MentorNet, Co-teaching, Co-teaching + ours]
• Training methods utilizing clean labels
on NLP datasets
• Model: 2-layer FCNs
• Dataset: Twitter
• Noise: 60% uniform noise
[Bar chart: test set accuracy (%) — cross entropy, Forward (gold), GLC, GLC + ours]
• Semantic noisy labels from a weak machine labeler
Experiments: Machine Noisy Labels
13
[Diagram: a classifier trained on a small number of labeled data]
• Semantic noisy labels from a weak machine labeler
• Confusion graph from ResNet-34 trained on 5% of
CIFAR-10 labels
Experiments: Machine Noisy Labels
13
[Diagram: a pre-trained (weak) classifier assigns pseudo-labels to the unlabeled data]
* Node: class, Edge: its most confusing class
[Bar chart: test set accuracy (%) of softmax vs. Generative + MCD (ours) under cross entropy, Bootstrap, Forward, and D2L with machine-generated noisy labels]
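A sketch of how such machine-generated (semantic) noisy labels could be produced: a weak classifier is first trained on the small labeled subset and then used to pseudo-label the remaining data. Here weak_model and unlabeled_loader are placeholders for a PyTorch model and DataLoader, not objects from the paper’s code.

import torch

@torch.no_grad()
def pseudo_label(weak_model, unlabeled_loader, device="cuda"):
    # Label the unlabeled pool with a classifier trained on ~5% of the true labels;
    # the resulting labels are the "semantic" noisy labels used in this experiment.
    weak_model.eval()
    pseudo = []
    for images in unlabeled_loader:
        logits = weak_model(images.to(device))
        pseudo.append(logits.argmax(dim=1).cpu())
    return torch.cat(pseudo)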
• What are open-set noisy labels?
• Noisy samples from out-of-distribution [Wang’ 18]
• E.g., “Cat” in CIFAR-10
Experiments: Open-set Noisy Labels
[Wang’ 18] Iterative learning with open-set noisy labels. In CVPR, 2018.
14
• What are open-set noisy labels?
• Noisy samples from out-of-distribution [Wang’ 18]
• E.g., “Cat” in CIFAR-10 (which does not contain “apple” and “chair”)
Experiments: Open-set Noisy Labels
[Wang’ 18] Iterative learning with open-set noisy labels. In CVPR, 2018.
14
Open-set noisy labels
• What are open-set noisy labels?
• Noisy samples from out-of-distribution [Wang’ 18]
• E.g., “Cat” in CIFAR-10 (which does not contain “apple” and “chair”)
Experiments: Open-set Noisy Labels
[Wang’ 18] Iterative learning with open-set noisy labels. In CVPR, 2018.
14
• Experimental setup
• In-distribution: CIFAR-10
• 60% of noise samples from
ImageNet and CIFAR-100
• Model: DenseNet-100
Open-set noisy labels
[Test accuracy (%) of DenseNet-100 on CIFAR-10 with open-set noisy labels]
• To handle noisy labels, we proposed a new inference method with three components (summarized below)
• We believe that our results can be useful for many machine learning problems:
• Defense against adversarial attacks
• Detecting out-of-distribution samples
• Poster session: Pacific Ballroom #16
Conclusion
15
New inference method:
- Generative classifier: LDA-based generative classifier
- Robust inference: MCD estimator, generalization error analysis
- Ensemble method: generative classifiers from multiple layers
Thank you for your attention!
Más contenido relacionado

Similar a Robust inference via generative classifiers for handling noisy labels

Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit vmalathieswaran29
 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Daniel Roggen
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017Shuai Zhang
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingLionel Briand
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)Pravinkumar Landge
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningAkshay Kanchan
 
To bag, or to boost? A question of balance
To bag, or to boost? A question of balanceTo bag, or to boost? A question of balance
To bag, or to boost? A question of balanceAlex Henderson
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Dalei Li
 
DS9 - Clustering.pptx
DS9 - Clustering.pptxDS9 - Clustering.pptx
DS9 - Clustering.pptxJK970901
 
MLSEV Virtual. Searching for Anomalies
MLSEV Virtual. Searching for AnomaliesMLSEV Virtual. Searching for Anomalies
MLSEV Virtual. Searching for AnomaliesBigML, Inc
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?Tuan Yang
 
intership summary
intership summaryintership summary
intership summaryJunting Ma
 
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique Sujeet Suryawanshi
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systemsyoualab
 
Barga Data Science lecture 5
Barga Data Science lecture 5Barga Data Science lecture 5
Barga Data Science lecture 5Roger Barga
 
Jillian ms defense-4-14-14-ja
Jillian ms defense-4-14-14-jaJillian ms defense-4-14-14-ja
Jillian ms defense-4-14-14-jaJillian Aurisano
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Fatimakhan325
 
Probability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsProbability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsChirag Gupta
 
[CVPR2022, LongVersion] Online Continual Learning on a Contaminated Data Stre...
[CVPR2022, LongVersion] Online Continual Learning on a Contaminated Data Stre...[CVPR2022, LongVersion] Online Continual Learning on a Contaminated Data Stre...
[CVPR2022, LongVersion] Online Continual Learning on a Contaminated Data Stre...Jihwan Bang
 

Similar a Robust inference via generative classifiers for handling noisy labels (20)

Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit v
 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software Testing
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
To bag, or to boost? A question of balance
To bag, or to boost? A question of balanceTo bag, or to boost? A question of balance
To bag, or to boost? A question of balance
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...
 
DS9 - Clustering.pptx
DS9 - Clustering.pptxDS9 - Clustering.pptx
DS9 - Clustering.pptx
 
MLSEV Virtual. Searching for Anomalies
MLSEV Virtual. Searching for AnomaliesMLSEV Virtual. Searching for Anomalies
MLSEV Virtual. Searching for Anomalies
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
 
DM_clustering.ppt
DM_clustering.pptDM_clustering.ppt
DM_clustering.ppt
 
intership summary
intership summaryintership summary
intership summary
 
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systems
 
Barga Data Science lecture 5
Barga Data Science lecture 5Barga Data Science lecture 5
Barga Data Science lecture 5
 
Jillian ms defense-4-14-14-ja
Jillian ms defense-4-14-14-jaJillian ms defense-4-14-14-ja
Jillian ms defense-4-14-14-ja
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)
 
Probability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsProbability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional Experts
 
[CVPR2022, LongVersion] Online Continual Learning on a Contaminated Data Stre...
[CVPR2022, LongVersion] Online Continual Learning on a Contaminated Data Stre...[CVPR2022, LongVersion] Online Continual Learning on a Contaminated Data Stre...
[CVPR2022, LongVersion] Online Continual Learning on a Contaminated Data Stre...
 

Último

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 

Último (20)

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

Robust inference via generative classifiers for handling noisy labels

  • 1. Robust Inference via Generative Classifiers for Handling Noisy Labels 1 Korea Advanced Institute of Science and Technology (KAIST) 2 University of Michigan 3University of Illinois at Urbana Champaign 4Google Brain 5AItrics ICML 2019 Kimin Lee1 Sukmin Yun1 Kibok Lee2 Honglak Lee4,2 Bo Li3 Jinwoo Shin1,5
  • 2. • Large-scale datasets collect class labels from • Data mining on social media and web data Introduction: Noisy Labels 1 Web-crawled dataset [Krause’ 16] [Mahajan’ 18] Exploring the Limits of Weakly Supervised Pretraining. In ECCV, 2018. [Krause’ 16] The unreasonable effectiveness of noisy data for fine-grained recognition. In ECCV 2016. [Kuznetsova’ 18] The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint 2018. Open Images dataset [Kuznetsova’ 18] FaceBook’s Instagram dataset [Mahajan’ 18]
  • 3. • Large-scale datasets collect class labels from • Data mining on social media and web data • Large-scale datasets may contain noisy (incorrect) labels Introduction: Noisy Labels 1 FaceBook’s Instagram dataset [Mahajan’ 18] Cat Human (noisy label) Human (noisy label) Cat Cat Cat
  • 4. • Large-scale datasets collect class labels from • Data mining on social media and web data • Large-scale datasets may contain noisy (incorrect) labels • DNNs do not generalize well from such noisy datasets Introduction: Noisy Labels 1
  • 5. • Large-scale datasets collect class labels from • Data mining on social media and web data • Large-scale datasets may contain noisy (incorrect) labels • DNNs do not generalize well from such noisy datasets Introduction: Noisy Labels [Test set accuracy] DenseNet-100 Testsetaccuracy(%) 50 60 70 80 90 100 Noise fraction 0 0.2 0.4 0.6 [Huang’ 17] Densely connected convolutional networks. In CVPR, 2017. [Krizhevsky’ 09] Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. 1 DeerDog Frog [Uniform noise] …• Dataset: CIFAR-10 [Krizhevsky’ 09] • Model: DenseNet-100 [Huang’ 17]
  • 6. • Large-scale datasets collect class labels from • Data mining on social media and web data • Large-scale datasets may contain noisy (incorrect) labels • DNNs do not generalize well from such noisy datasets • Several training strategies have also been investigated Introduction: Noisy Labels 1 • Utilizing an estimated/corrected label • Bootstrapping [Reed’ 14; Ma’ 18] • Loss correction [Patrini’ 17; Hendrycks’ 18] • Training on selected (cleaner) samples • Ensemble [Malach’ 17; Han’ 18] • Meta-learning [Jiang’ 18] [Reed’ 14] Training deep neural networks on noisy labels with bootstrapping. arXiv preprint 2014. [Hendrycks’ 18] Using trusted data to train deep networks on labels corrupted by severe noise. In NeurIPS, 2018 [Ma’ 18] Dimensionality-driven learning with noisy labels. In ICML, 2018 [Partrini’ 17] Making deep neural networks robust to label noise: A loss correction approach. In CVPR, 2017 [Han’ 18] Co-teaching: robust training deep neural networks with extremely noisy labels. In NeurIPS, 2018. [Jiang’ 18] Mentornet: Regularizing very deep neural networks on corrupted labels. In ICML, 2018. [Malach ‘ 17] Decoupling” when to update” from” how to update”. In NeurIPS, 2017.
  • 7. • We propose a new inference method which can be applied to any pre-trained DNNs Our Contributions 2
  • 8. • We propose a new inference method which can be applied to any pre-trained DNNs Our Contributions 2 Softmax Generative (sample mean on noisy labels) Testsetaccuracy(%) 40 50 60 70 80 90 100 Noise fraction 0 0.2 0.4 0.6 • Inducing a “generative classifier”
  • 9. Our Contributions 2 Softmax Generative (sample mean on noisy labels) Generative (MCD on noisy labels) Testsetaccuracy(%) 40 50 60 70 80 90 100 Noise fraction 0 0.2 0.4 0.6 • We propose a new inference method which can be applied to any pre-trained DNNs • Inducing a “generative classifier” • Applying a “robust inference” to estimate parameters of generative classifier • Breakdown points • Generalization bounds
  • 10. Our Contributions 2 • Inducing a “generative classifier” • Applying a “robust inference” to estimate parameters of generative classifier • Breakdown points • Generalization bounds • Introducing “ensemble of generative classifiers” Softmax Generative (sample mean on noisy labels) Generative (MCD on noisy labels) Generative (MCD on noisy labels) + ensemble Testsetaccuracy(%) 40 50 60 70 80 90 100 Noise fraction 0 0.2 0.4 0.6 • We propose a new inference method which can be applied to any pre-trained DNNs
  • 11. Outline • Our method: Robust Inference via Generative Classifiers • Generative classifier • Minimum covariance determinant estimator • Ensemble of generative classifiers • Experiments • Experimental results on synthetic noisy labels • Experimental results on semantic and open-set noisy labels • Conclusion
  • 12. Outline • Our method: Robust Inference via Generative Classifiers • Generative classifier • Minimum covariance determinant estimator • Ensemble of generative classifiers • Experiments • Experimental results on synthetic noisy labels • Experimental results on semantic and open-set noisy labels • Conclusion
  • 13. • t-SNE embedding of DenseNet-100 trained on CIFAR-10 with uniform noisy labels Motivation: Why Generative Classifier? 3 Training samples with clean labels Training samples with noisy labels
  • 14. • t-SNE embedding of DenseNet-100 trained on CIFAR-10 with uniform noisy labels • Features from training samples with noisy labels (red stars) are distributed like outliers Motivation: Why Generative Classifier? 3 Training samples with clean labels Training samples with noisy labels
  • 15. • t-SNE embedding of DenseNet-100 trained on CIFAR-10 with uniform noisy labels • Features from training samples with noisy labels (red stars) are distributed like outliers • Features from training samples with clean labels (black dots) are still clustered!! Motivation: Why Generative Classifier? 3 Training samples with clean labels Training samples with noisy labels
  • 16. • t-SNE embedding of DenseNet-100 trained on CIFAR-10 with uniform noisy labels • Features from training samples with noisy labels (red stars) are distributed like outliers • Features from training samples with clean labels (black dots) are still clustered!! • If we remove the outliers and induce decision boundaries, they can be more robust Motivation: Why Generative Classifier? 3 Training samples with clean labels Training samples with noisy labels Training samples with clean labels
  • 17. • t-SNE embedding of DenseNet-100 trained on CIFAR-10 with uniform noisy labels • Features from training samples with noisy labels (red stars) are distributed like outliers • Features from training samples with clean labels (black dots) are still clustered!! • If we remove the outliers and induce decision boundaries, they can be more robust • Generative classifier: model of a data distribution instead of Motivation: Why Generative Classifier? 3 Training samples with clean labels
  • 18. • Given pre-trained softmax classifier with DNNs • Inducing a generative classifier on the hidden feature space Robust Inference via Generative Classifier 4 Penultimate layer Gaussian distribution
  • 19. • Given pre-trained softmax classifier with DNNs • Inducing a generative classifier on the hidden feature space • Why Gaussian? • Features from softmax classifier could be well-fitted in Gaussian distributions [Lee’ 18] Robust Inference via Generative Classifier 4 Penultimate layer Gaussian distribution [Lee’ 18] A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In NeurIPS, 2018.
  • 20. • Given pre-trained softmax classifier with DNNs • Inducing a generative classifier on the hidden feature space Robust Inference via Generative Classifier 4 Penultimate layer Bayes’ rule
  • 21. • Given pre-trained softmax classifier with DNNs • Inducing a generative classifier on the hidden feature space • How to estimate the parameters of the generative classifier? Robust Inference via Generative Classifier 4 Penultimate layer Bayes’ rule
  • 22. • Given pre-trained softmax classifier with DNNs • Inducing a generative classifier on the hidden feature space • How to estimate the parameters of the generative classifier? • With training data Robust Inference via Generative Classifier 4 Penultimate layer Bayes’ rule
  • 23. • Naïve sample estimator (green circle) can be affected by outliers (i.e., noisy labels) Minimum Covariance Determinant (MCD) 5 Naïve sample estimator
  • 24. • Naïve sample estimator (green circle) can be affected by outliers (i.e., noisy labels) • Minimum Covariance Determinant (MCD) estimator (blue circle) Minimum Covariance Determinant (MCD) 5 MCD estimator Naïve sample estimator
  • 25. • Naïve sample estimator (green circle) can be affected by outliers (i.e., noisy labels) • Minimum Covariance Determinant (MCD) estimator (blue circle) • For each class c, find a subset for which the determinant of the sample covariance matrix is minimum Minimum Covariance Determinant (MCD) 5 “Variance ∝ determinant” MCD estimator Motivation of MCD - Outliers (e.g., sample with noisy labels) are scattered in the sample spaces
  • 26. • Naïve sample estimator (green circle) can be affected by outliers (i.e., noisy labels) • Minimum Covariance Determinant (MCD) estimator (blue circle) • For each class c, find a subset for which the determinant of the sample covariance matrix is minimum • Compute the mean and covariance matrix only using selected samples Minimum Covariance Determinant (MCD) 5 Motivation of MCD - Outliers (e.g., sample with noisy labels) are scattered in the sample spaces
  • 27. • 1. Breakdown points • The smallest fraction of outliers to carry the estimate beyond all bounds. • High breakdown points = robust to outliers Advantages of MCD estimators 6
  • 28. • 1. Breakdown points • The smallest fraction of outliers to carry the estimate beyond all bounds. • High breakdown points = robust to outliers • Theorem 1 (Lopuhaa et al., 1991) Advantages of MCD estimators 6 Under some mild assumptions, MCD estimator has near-optimal breakdown points, i.e., almost 50 % [Lopuhaa et al., 1991] Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. The Annals of Statistics, 1991. Note: Naïve sample estimator has 0% breakdown points
  • 29. • 2. Tighter generalization errors under noisy labels • Theorem 2 (Lee et al., 2019) • Where Advantages of MCD estimators 7 [Durrant et al., 2010] A. Compressed fisher linear discriminant analysis: Classification of randomly projected data. In ACM SIGKDD, 2010. Under some mild assumptions, parameters from MCD estimator are more closer to true parameters than parameters from sample estimator and has larger inter- class distance
  • 30. • 2. Tighter generalization errors under noisy labels • Theorem 2 (Lee et al., 2019) Advantages of MCD estimators 7 [Durrant et al., 2010] A. Compressed fisher linear discriminant analysis: Classification of randomly projected data. In ACM SIGKDD, 2010. Theorem 3 (Durrant et al., 2010) Generalization error of generative classifier is bounded by negative of inter-class distance and distance between true and estimated parameters Under some mild assumptions, parameters from MCD estimator are more closer to true parameters than parameters from sample estimator and has larger inter- class distance
  • 31. • Finding the optimal solution of MCD is computationally intractable • [Bernholt’ 06] showed “Vertex Cover” (NP-hard) reduces to MCD How to Solve MCD? 8 [Bernholt’ 06] Robust estimators are hard to compute. Technical report, Technical Report/Universit at Dortmund, SFB 475 Komplexitatsreduktion in Multivariaten Datenstrukturen, 2006.
  • 32. How to Solve MCD? 8 Two-step approach [Hubert’ 04] [Hubert’ 04] Fast and robust discriminant analysis. Computational Statistics & Data Analysis, 2004.
  • 33. How to Solve MCD? 8 [Hubert’ 04] Fast and robust discriminant analysis. Computational Statistics & Data Analysis, 2004. Two-step approach [Hubert’ 04] • Step 1. For each class, find a subset as follows: • A. Uniformly sample an initial subset & compute sample mean and covariance matrix
  • 34. How to Solve MCD? 8 [Hubert’ 04] Fast and robust discriminant analysis. Computational Statistics & Data Analysis, 2004. • Step 1. For each class, find a subset as follows: • A. Uniformly sample an initial subset & compute sample mean and covariance matrix • B. Compute the Mahalanobis distance • ‘ Two-step approach [Hubert’ 04]
  • 35. How to Solve MCD? 8 [Hubert’ 04] Fast and robust discriminant analysis. Computational Statistics & Data Analysis, 2004. • Step 1. For each class, find a subset as follows: • A. Uniformly sample an initial subset & compute sample mean and covariance matrix • B. Compute the Mahalanobis distance • ‘ • C. Construct a new subset which contains samples with smaller distances Two-step approach [Hubert’ 04]
  • 36. How to Solve MCD? 8 [Hubert’ 04] Fast and robust discriminant analysis. Computational Statistics & Data Analysis, 2004. • Step 1. For each class, find a subset as follows: • A. Uniformly sample an initial subset & compute sample mean and covariance matrix • B. Compute the Mahalanobis distance • ‘ • C. Construct a new subset which contains samples with smaller distances • D. Update the sample mean and covariance matrix Two-step approach [Hubert’ 04]
  • 37. How to Solve MCD? 8 • Step 1. For each class, find a subset as follows: • A. Uniformly sample an initial subset & compute sample mean and covariance matrix • B. Compute the Mahalanobis distance • ‘ • C. Construct a new subset which contains samples with smaller distances • D. Update the sample mean and covariance matrix • Repeat Step B ~ D until the determinant of covariance is not decreasing Two-step approach [Hubert’ 04] [Hubert’ 04] Fast and robust discriminant analysis. Computational Statistics & Data Analysis, 2004.
  • 38. How to Solve MCD? 8 • Step 1. For each class, find a subset as follows: • A. Uniformly sample an initial subset & compute sample mean and covariance matrix • B. Compute the Mahalanobis distance • ‘ • C. Construct a new subset which contains samples with smaller distances • D. Update the sample mean and covariance matrix • Repeat Step B ~ D until the determinant of covariance is not decreasing • Step 2. Compute the mean and covariance only using selected samples Two-step approach [Hubert’ 04] [Hubert’ 04] Fast and robust discriminant analysis. Computational Statistics & Data Analysis, 2004.
  • 39. How to Solve MCD? 8 • Step 1. For each class, find a subset as follows: • A. Uniformly sample an initial subset & compute sample mean and covariance matrix • B. Compute the Mahalanobis distance • ‘ • C. Construct a new subset which contains samples with smaller distances • D. Update the sample mean and covariance matrix • Repeat Step B ~ D until the determinant of covariance is not decreasing • Step 2. Compute the mean and covariance only using selected samples Two-step approach [Hubert’ 04] [Hubert’ 04] Fast and robust discriminant analysis. Computational Statistics & Data Analysis, 2004. Monotonically decreasing a objective of MCD estimator [Hubert’ 04] !
  • 40. Summary 9 • Given pre-trained softmax classifier with DNNs • Inducing a generative classifier on the hidden feature space Penultimate layer MCD estimator Is it possible to utilize the low-level features
  • 41. Ensemble of Generative Classifiers 9 • Boosting the performance: utilizing low-level features • Inducing the generative classifiers with respect to low-level features • Why? • Low-level features can be often more robust to noisy labels [Morcos’ 18, Arpit’ 17] • They can capture the similar patterns from the input data [Arpit’ 17] A closer look at memorization in deep networks. In ICML, 2017. [Morcos’ 18] Insights on representational similarity in neural networks with canonical correlation. In NeurIPS, 2018.
  • 42. Ensemble of Generative Classifiers 9 • Boosting the performance: utilizing low-level features • Inducing the generative classifiers with respect to low-level features • Ensemble of generative classifiers Posterior distribution from ℓ-th layer
  • 43. Outline • Our method: Robust Inference via Generative Classifiers • Generative classifier • Minimum covariance determinant estimator • Ensemble of generative classifiers • Experiments • Experimental results on synthetic noisy labels • Experimental results on semantic and open-set noisy labels • Conclusion
  • 44. • Model: DenseNet-100 [Huang’ 17] and ResNet-34 [He’ 16] • Image classification on CIFAR-10, CIFAR-100 [Krizhevsky’ 09] and SVHN [Netzer’ 11] Experiments: Setup 10 [Huang’ 17] Densely connected convolutional networks. In CVPR, 2017. [Krizhevsky’ 09] Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. [Netzer’ 11] Reading digits in natural images with unsupervised feature learning. In NeurIPS workshop, 2011. [He’ 16] Deep residual learning for image recognition. In CVPR, 2016. • 32 × 32 RGB • 10/ 100 classes • 50,000 training set • 10,000 test set • 32 × 32 RGB • 10 classes • 73,257 training set • 26,032 test set
  • 45. • Model: DenseNet-100 [Huang’ 17] and ResNet-34 [He’ 16] • Image classification on CIFAR-10, CIFAR-100 [Krizhevsky’ 09] and SVHN [Netzer’ 11] • NLP tasks on Tweeter [Gimpel’ 11] and Reuters [Lewis’ 04] Experiments: Setup 10 Parts-of-Speech Tagging using Tweets • 19 classes • 14,468 tweets for training • 7,082 tweets for test [Gimpel’ 11] Part-of-speech tagging for twitter: Annotation, features, and experiments. In ACL, 2011 [Lewis’ 04] Rcv1: A new benchmark collection for text categorization research. JMLR, 2004 Text categorization on Reuters • 7 classes • 5,444 training data • 2,179 test data
  • 46. • Model: DenseNet-100 [Huang’ 17] and ResNet-34 [He’ 16] • Image classification on CIFAR-10, CIFAR-100 [Krizhevsky’ 09] and SVHN [Netzer’ 11] • NLP tasks on Tweeter [Gimpel’ 11] and Reuters [Lewis’ 04] • Noise type • Uniform: corrupting a label to other class uniformly at random • Flip: corrupting a label only to a specific class Experiments: Setup 10 Deer Dog Frog [Uniform noise] … Deer Dog Frog [Flip noise] …
  • 47. • Test set accuracy of ResNet-34 trained on CIFAR-10 with 60% uniform noise Experiments: Empirical Analysis 11
  • 48. • Test set accuracy of ResNet-34 trained on CIFAR-10 with 60% uniform noise • MCD estimator improves the performance by removing outliers Experiments: Empirical Analysis 11
  • 49. • Test set accuracy of ResNet-34 trained on CIFAR-10 with 60% uniform noise • MCD estimator improves the performance by removing outliers • MCD estimator can select clean data samples more effectively than the sample estimator! Experiments: Empirical Analysis 11 [The fraction of clean samples]
  • 50. • Test set accuracy of ResNet-34 trained on CIFAR-10 with 60% uniform noise • MCD estimator improves the performance by removing outliers • Ensemble method significantly improves the classification accuracy Experiments: Empirical Analysis 11
  • 51. • Test set accuracy of ResNet-34 trained on CIFAR-10 with 60% uniform noise • MCD estimator improves the performance by removing outliers • Ensemble method significantly improves the classification accuracy • Generative classifiers from lower layers are more robust to noisy labels • The generative classifier does not decrease the classification accuracy on the clean dataset Experiments: Empirical Analysis 11 [Layer-wise accuracy: test set accuracy (%) vs. index of basic block, for clean data and 20/40/60% uniform noise]
  • 52. Comparison with Prior Training Methods 12 • Test set accuracy of ResNet-44 trained on CIFAR-10 with 60% uniform noise • Utilizing an estimated/corrected label • Bootstrap [Reed’ 14] • Forward [Patrini’ 17] • D2L [Ma’ 18] [Bar chart: test set accuracy (%), range 60–80, comparing Cross entropy, Bootstrap, Forward, and D2L (softmax) with Generative + MCD (ours)] [Reed’ 14] Training deep neural networks on noisy labels with bootstrapping. arXiv preprint 2014. [Ma’ 18] Dimensionality-driven learning with noisy labels. In ICML, 2018. [Patrini’ 17] Making deep neural networks robust to label noise: A loss correction approach. In CVPR, 2017.
  • 53. Comparison with Prior Training Methods 12 • Training methods with an ensemble of classifiers or a meta-learning model • Model: 9-layer CNNs • Dataset: CIFAR-100 • Noise: 45% flip noise [Bar chart: test set accuracy (%), range 25–50, comparing Cross entropy, MentorNet, Co-teaching, and Co-teaching + ours] • Training methods utilizing clean labels on NLP datasets • Model: 2-layer FCNs • Dataset: Twitter • Noise: 60% uniform noise [Bar chart: test set accuracy (%), range 45–75, comparing Cross entropy, Forward (gold), GLC, and GLC + ours]
  • 54. Experiments: Machine Noisy Labels 13 • Semantic noisy labels from a weak machine labeler [Diagram: a classifier is trained on a small amount of labeled data]
  • 55. Experiments: Machine Noisy Labels 13 • Semantic noisy labels from a weak machine labeler: a pre-trained (weak) classifier assigns pseudo-labels to the unlabeled data • Confusion graph from ResNet-34 trained on 5% of CIFAR-10 labels (* node: class, edge: its most confusing class) [Bar chart: test set accuracy (%), range 66.0–69.0, comparing Cross entropy, Bootstrap, Forward, and D2L (softmax) with Generative + MCD (ours)]
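A minimal sketch of this weak-labeler setting follows; the names (`weak_model`, `small_x`, `small_y`, `unlabeled_x`) are placeholders, and `weak_model` is assumed to expose scikit-learn-style `fit`/`predict` methods.

```python
# A sketch of generating semantic (machine) noisy labels with a weak labeler.
# A classifier trained on a small labeled subset (e.g. 5% of CIFAR-10 labels)
# pseudo-labels the remaining data, producing semantically structured label noise.
def build_machine_labeled_set(weak_model, small_x, small_y, unlabeled_x):
    weak_model.fit(small_x, small_y)                 # train the weak classifier
    pseudo_labels = weak_model.predict(unlabeled_x)  # its mistakes become the noisy labels
    return unlabeled_x, pseudo_labels
```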
  • 56. Experiments: Open-set Noisy Labels 14 • What are open-set noisy labels? • Noisy samples from out-of-distribution data [Wang’ 18] • E.g., “Cat” in CIFAR-10 [Wang’ 18] Iterative learning with open-set noisy labels. In CVPR, 2018.
  • 57. Experiments: Open-set Noisy Labels 14 • What are open-set noisy labels? • Noisy samples from out-of-distribution data [Wang’ 18] • E.g., “Cat” in CIFAR-10 (which does not contain “apple” and “chair”) [Wang’ 18] Iterative learning with open-set noisy labels. In CVPR, 2018. [Illustration: open-set noisy labels]
  • 58. Experiments: Open-set Noisy Labels 14 • What are open-set noisy labels? • Noisy samples from out-of-distribution data [Wang’ 18] • E.g., “Cat” in CIFAR-10 (which does not contain “apple” and “chair”) • Experimental setup • In-distribution: CIFAR-10 • 60% open-set noise, with noisy samples drawn from ImageNet and CIFAR-100 • Model: DenseNet-100 [Wang’ 18] Iterative learning with open-set noisy labels. In CVPR, 2018. [Table: test accuracy (%) of DenseNet on CIFAR-10]
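Following this setup description, here is a sketch of how such an open-set noisy training set can be constructed; the array names are placeholders, and sampling without replacement assumes `ood_x` holds enough out-of-distribution images.

```python
# A sketch of building an open-set noisy training set: replace a fraction of the
# in-distribution images with out-of-distribution images (e.g. from CIFAR-100 or
# ImageNet) while keeping the labels and the number of images per class unchanged.
import numpy as np

def make_open_set_noise(in_dist_x, in_dist_y, ood_x, noise_fraction=0.6, seed=0):
    rng = np.random.default_rng(seed)
    x = in_dist_x.copy()
    n_replace = int(noise_fraction * len(x))
    replace_idx = rng.choice(len(x), size=n_replace, replace=False)
    ood_idx = rng.choice(len(ood_x), size=n_replace, replace=False)
    x[replace_idx] = ood_x[ood_idx]   # these images are now out-of-distribution
    return x, in_dist_y               # labels are kept unchanged
```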
  • 59. Conclusion 15 • To handle noisy labels, we propose: • Generative classifier: a new inference method based on an LDA-based generative classifier • Robust inference: the MCD estimator, with a generalization error analysis • Ensemble method: generative classifiers from multiple layers • We believe that our results can be useful for many machine learning problems: • Defense against adversarial attacks • Detecting out-of-distribution samples • Poster session: Pacific Ballroom #16 Thank you for your attention

Editor's Notes

  1. Hi, I’m Kimin. In this presentation, I’m gonna introduce a new inference method based on a generative classifier to handle noisy labels in training data. This is joint work with Sukmin Yun, Kibok Lee, Honglak Lee, Bo Li and Jinwoo Shin.
  2. Ok, let me start the presentation. Large-scale datasets collect class labels using data mining techniques because it is hard to obtain class labels from human domain experts. For example, Facebook’s Instagram dataset collects class labels using hashtags from users, and the Open Images dataset uses machine-generated labels.
  3. However, if we collect the labels using data mining techniques, datasets can contain incorrect (noisy) labels by mistake, as shown in this example.
  4. And it is well known that DNNs cannot generalize well from such noisy datasets.
  5. For example, if we corrupt the labels of CIFAR-10 data to other classes uniformly at random, the test set accuracy of DenseNet decreases as shown in this figure. This result shows that handling noisy labels is a really important problem in classification tasks.
  6. There are many kinds of training methods, like bootstrapping, loss correction, ensemble methods and meta-learning methods. However, most of them might increase the complexity of training, such as training time and hyper-parameter tuning.
  7. So, we propose an inference method which can be applied to any pre-trained DNN.
  8. Our main idea is post-processing a “generative classifier” which models the data distribution on hidden feature spaces.
  9. And by applying a “robust inference” method to estimate the parameters of the generative classifier, we induce a more robust decision boundary, as shown in this figure. Here I wanna remark that, in this paper, we analyze the advantages of such a robust inference method with respect to breakdown points and generalization bounds.
  10. Finally, we introduce an “ensemble of generative classifiers” to further improve the performance of the proposed method.
  11. And I will talk about more details of our method and experimental results in the next slides.
  12. First, let me explain our motivation using the t-SNE embedding of a DenseNet trained on CIFAR-10 with uniform noise.
  13. As shown in this figure, we first found that hidden features from training samples with noisy labels are scattered in the hidden space like outliers.
  14. On the other hand, features from training samples with clean labels are still clustered even though we train deep neural networks on noisy labels. This implies that deep representations have some clustering properties even when trained with noisy labels.
  15. So, if we can induce a posterior distribution from hidden features by removing outliers, it can be robust to noisy labels. Motivated by this, we utilize a generative classifier which models the data distribution P(x|y), where x is the data and y is the label.
  16. So, if we can induce a posterior distribution from hidden features by removing outliers, it can be robust to noisy labels. Motivated by this, we utilize a generative classifier which models the data distribution P(x|y), where x is the data and y is the label.
  17. Specifically, given a pre-trained softmax classifier, we post-process a generative classifier by defining class-conditional Gaussian distributions of the hidden features.
  18. Here, the reason why we assume Gaussian distributions is that hidden features from a softmax classifier can be well fitted by Gaussian distributions: our approach is based on a theoretical connection that the posterior distribution of a generative classifier with a tied covariance is equivalent to the softmax classifier.
  19. With this Gaussian assumption, a new posterior distribution can be obtained from the generative classifier based on Bayes’ rule.
  20. However, to get the new posterior distribution, we have to estimate the parameters of the generative classifier.
  21. One naïve solution is to compute the sample estimator using features from the training data.
  22. However, the naïve sample estimator can be highly influenced by outliers.
  23. To minimize the negative effects from outliers, we utilize the Minimum Covariance Determinant (MCD) estimator in this paper.
  24. The main idea of MCD is finding a subset which has the smallest covariance determinant. One can imagine that outliers are usually scattered in the sample space. Therefore, if we find a subset with the smallest variance, outliers are likely not included in that subset. Motivated by this, the MCD estimator first selects a subset with minimum covariance determinant for each class,
  25. and then estimates the parameters of the generative classifier using only the selected samples.
  26. Actually, such an MCD estimator has theoretical advantages compared to the naïve estimator. The first one is the breakdown point, which corresponds to the smallest fraction of outliers that can make the distance between the true and estimated parameters arbitrarily large. Here, I wanna remark that a high breakdown point implies that the estimator is robust to outliers.
  27. As shown in this theorem, it is well known that the MCD estimator has a near-optimal breakdown point, while the sample estimator has a 0% breakdown point.
  28. In addition, we also prove that the parameters from the MCD estimator are closer to the true parameters than those from the sample estimator, as shown in this theorem.
  29. This result implies that the generative classifier from the MCD estimator has a tighter generalization error bound than that from the sample estimator, because the generalization error of the generative classifier is bounded by the distance between the true and estimated parameters.
  30. Even though the MCD estimator has several advantages, finding the optimal solution of MCD is computationally intractable.
  31. To handle this complexity issue, we utilize the two-step approach proposed by Hubert.
  32. First, given the training samples, we choose a random subset and compute its mean and covariance matrix.
  33. Then, for all training samples, we compute the Mahalanobis distance, which corresponds to the distance between a sample and the Gaussian distribution.
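For reference, a standard form of this distance (notation ours, not from the slides): the squared Mahalanobis distance of a feature f to the Gaussian with estimated mean and covariance is

$$ d_M^2(f) = (f - \hat{\mu})^\top \, \hat{\Sigma}^{-1} \, (f - \hat{\mu}) $$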
  34. Next, we construct a new subset which contains the samples with the smallest distances.
  35. And update the mean and covariance matrix again.
  36. We repeat these steps until the determinant of the covariance stops decreasing.
  37. And compute the final solution using the selected subsets from each class.
  38. Here, I remark that this greedy method can give us a locally optimal solution because finding a subset using Mahalanobis distances monotonically decreases the determinant of the covariance matrix.
  39. In summary, we propose a new inference method based on a generative classifier and utilize the MCD estimator to improve the robustness of the classifier. One natural question here is: is it possible to also utilize the low-level features?
  40. To solve this problem, we utilize the two-step approach proposed by Hubert. The idea is simple. First, for each class, we compute the sample mean and covariance from a random initial subset and compute the Mahalanobis distance for every sample.
  41. In order to improve the performance, we can also post-process generative classifiers from low-level features, as shown in this figure. Then, one can induce additional posterior distributions from the low-level features.
  42. So, up to now, I have explained the main idea of our method. Next, let’s move on to the experiments.
  43. To evaluate our method, we consider image classification tasks
  44. and NLP tasks like part-of-speech tagging and text categorization.
  45. To make noisy labels, we utilize uniform and flip noise, defined as follows.
  46. In order to verify the stepwise improvement of each component, we incrementally apply our method using a ResNet trained on the CIFAR-10 dataset.
  47. First, as shown in this table, the MCD estimator improves the performance of the generative classifier over the sample estimator by removing outliers.
  48. When we measure the fraction of clean samples selected by each estimator, we indeed find that MCD selects clean samples more effectively than the sample estimator, as shown in this graph.
  49. Next, I want to remark that the ensemble method significantly improves the classification accuracy.
  50. To analyze the effects of the ensemble method, we measure the layer-wise accuracy of the generative classifier, where the x-axis corresponds to the index of the layer and the y-axis is the classification accuracy.
  51. In this case, some training images are often from the open world and not relevant to the targeted classification task at all, i.e., out-of-distribution. However, they are still labeled as certain classes. For example, as shown in Figure 3(b), noisy samples like “chair” from CIFAR-100 and “door” from (downsampled) ImageNet (Chrabaszcz et al., 2017) can be labeled as “bird” for training, even though their true labels are not contained within the set of training classes in the CIFAR-10 dataset.
  52. In our experiments, open-set noisy datasets are built by replacing some training samples in CIFAR-10 with out-of-distribution samples, while keeping the labels and the number of images per class unchanged. We train DenseNet-100 on CIFAR-10 with 60% open-set noise. As shown in Table 7, our method achieves comparable or significantly better test accuracy than the softmax classifier.
  53. Let me conclude this presentation.