Slideshare breaking inter layer co-adaptation

Masayuki Tanaka
Breaking Inter-Layer Co-Adaptation
by Classifier Anonymization
Ikuro Sato†, Kohta Ishikawa†, Guoqing Liu†, Masayuki Tanaka‡
(ICML2019)

Meta-reviewer's comment
"… This paper seems to me like a perfect example of a 'High Risk High Reward' paper, …"
Acceptance ratio of ICML2019: 773/3424 = 22.6%
We took that as a compliment. It is research!

What I'm going to talk about
Let's consider a classification task.
Input $x$ → Feature extractor $F_\phi(x)$ → Feature $\xi$ → Classifier $C_\theta(\xi)$ → Output $\eta$
[Figure: two scatter plots of + and - samples in the feature space $\xi$, one labeled "End-to-end DNN", compared with "<<".]
Which is better? Why? How can we obtain good features?

Summary
About what? Breaking co-adaptation between the feature extractor and the classifier.
How? By the classifier anonymization technique.
Theory? Proved: features form a simple point-like distribution.
In reality? The point-like property is largely confirmed on real datasets.

What is co-adaptation?
Let's consider a classification task.
Input $x$ → Feature extractor $F_\phi(x)$ → Feature $\xi$ → Classifier $C_\theta(\xi)$ → Output $\eta$
Co-adaptation:
The feature extractor adapts to a particular classifier.
The classifier adapts to a particular feature extractor.
[Figure: two feature-space scatter plots of + and - samples, with the labels "Decision boundary", "Classifiers", and "End-to-end DNN", joined by an arrow labeled "Break co-adaptation".]
To break co-adaptation, the feature extractor should be trained for many classifiers.

Proposed algorithm: FOCA
FOCA: Feature-extractor Optimization through Classifier Anonymization
FOCA can train the feature extractor to make any weak classifier strong.
Under several conditions, we theoretically proved that FOCA can train a feature extractor that projects the samples of each class onto a single point.
[Figure: + and - samples, each class collapsed into a tight cluster, with classifiers "for given feature extractor".]
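
The phrase "make any weak classifier strong" can be read as an objective over the feature-extractor parameters. The following is only a sketch under assumptions: $\Theta_\phi$ is notation introduced here for a distribution of weak classifiers for the current feature extractor, while $L$, $C_\theta$, $F_\phi$, and $p(x,t)$ follow the deck's own symbols.

```latex
\phi^{\star}
  = \arg\min_{\phi}\;
    \mathbb{E}_{(x,t)\sim p(x,t)}\;
    \mathbb{E}_{\theta\sim\Theta_\phi}\;
    L\bigl(C_\theta(F_\phi(x)),\, t\bigr)
```

Averaging over $\theta$ means that no single classifier is singled out, so the feature extractor cannot adapt to any one of them.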

Message of FOCA
Traditional training: feature extractor (junior researcher) with a strong classifier (smart boss).
FOCA training: feature extractor (junior researcher) with weak classifiers (a variety of bosses???).
Transfer learning: new boss, new domain.
FOCA can train a strong feature extractor.

Weak classifier assumption
Definition: A weak classifier is slightly better than a random guess.
Strong classifier: a strong classifier is strong for the entire data.
$$\theta_\phi^{*} = \arg\min_{\theta}\; \mathbb{E}_{(x,t)\sim p(x,t)}\, L\bigl(C_\theta(F_\phi(x)),\, t\bigr)$$
Weak classifier assumption: we assume that a strong classifier for a small sample $B$ of the entire data is a weak classifier for the entire data.
$$\theta_\phi^{B} = \arg\min_{\theta} \sum_{(x,t)\in B} L\bigl(C_\theta(F_\phi(x)),\, t\bigr)$$
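
To make the two arg-min definitions concrete, here is a minimal sketch that fits the same classifier model once on the entire data and once on a small sample $B$. Everything specific is an assumption for illustration: the 2-D Gaussian toy features, the logistic-regression classifier, the sample size of 4, and the helper names `fit_logistic` and `accuracy`.

```python
import numpy as np

def fit_logistic(xi, t, steps=500, lr=0.5):
    """Fit a linear classifier on features xi by gradient descent on the logistic loss."""
    X = np.hstack([xi, np.ones((len(xi), 1))])   # append a bias column
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        theta -= lr * X.T @ (p - t) / len(t)
    return theta

def accuracy(theta, xi, t):
    """Fraction of samples whose predicted label matches t."""
    X = np.hstack([xi, np.ones((len(xi), 1))])
    return float(np.mean((X @ theta > 0) == (t == 1)))

rng = np.random.default_rng(0)
n = 2000
t = rng.integers(0, 2, size=n)
# Toy stand-in for the features F_phi(x): one Gaussian blob per class (assumption).
xi = rng.normal(size=(n, 2)) + np.where(t[:, None] == 1, 1.0, -1.0)

theta_star = fit_logistic(xi, t)           # optimized on the entire data
B = rng.choice(n, size=4, replace=False)   # small sample B
theta_B = fit_logistic(xi[B], t[B])        # optimized on B only

acc_star = accuracy(theta_star, xi, t)
acc_B = accuracy(theta_B, xi, t)
print(f"entire-data classifier: {acc_star:.3f}, small-sample classifier: {acc_B:.3f}")
```

Both accuracies are measured on the entire data, which is where the assumption expects the small-sample classifier to be the weaker of the two.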

Practical FOCA algorithm
Components: feature extractor $F_\phi(x)$, classifier model $C_\theta(\xi)$, previous feature extractor, and weak classifier generator.
Weak classifier generator: sample a small set from the training data, then optimize the classifier for the given small samples with the previous feature extractor. The result is a weak classifier $C_\theta(\xi)$.
Feature extractor update: sample a mini-batch from the training data, then update the feature extractor for the given mini-batch with the weak classifier.
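
The loop described on this slide might be sketched as follows. This is not the authors' implementation: the tanh-linear feature extractor, the logistic classifier, the toy data, the step counts, and the choice to refresh the previous feature extractor every 10 steps are all assumptions, and `features`, `fit_weak_classifier`, and `foca_step` are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def features(W, x):
    """Toy feature extractor F_phi(x) = tanh(x W); the matrix W plays the role of phi."""
    return np.tanh(x @ W)

def fit_weak_classifier(xi, t, steps=50, lr=0.5):
    """Optimize a linear classifier on a small sample of features."""
    w, b = np.zeros(xi.shape[1]), 0.0
    for _ in range(steps):
        g = sigmoid(xi @ w + b) - t      # gradient of the logistic loss w.r.t. the logit
        w -= lr * xi.T @ g / len(t)
        b -= lr * g.mean()
    return w, b

def foca_step(W, W_prev, x, t, rng, n_small=8, n_batch=32, lr=0.1):
    # Weak classifier generator: small sample, features from the previous extractor.
    B = rng.choice(len(x), size=n_small, replace=False)
    w, b = fit_weak_classifier(features(W_prev, x[B]), t[B])
    # Feature extractor update: one gradient step on a mini-batch, classifier held fixed.
    M = rng.choice(len(x), size=n_batch, replace=False)
    xi = features(W, x[M])
    g_logit = (sigmoid(xi @ w + b) - t[M]) / n_batch
    g_xi = np.outer(g_logit, w)                  # gradient w.r.t. the features
    g_W = x[M].T @ (g_xi * (1.0 - xi ** 2))      # chain rule through tanh
    return W - lr * g_W

rng = np.random.default_rng(0)
n, d_in, d_feat = 512, 5, 2
t = rng.integers(0, 2, size=n).astype(float)
x = rng.normal(size=(n, d_in)) + (2.0 * t[:, None] - 1.0)   # toy training data

W_init = 0.1 * rng.normal(size=(d_in, d_feat))
W, W_prev = W_init.copy(), W_init.copy()
for step in range(200):
    W = foca_step(W, W_prev, x, t, rng)
    if (step + 1) % 10 == 0:     # refresh interval is an assumption of this sketch
        W_prev = W.copy()
```

A fresh weak classifier is drawn at every step, so the feature extractor never sees the same classifier twice.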

Experimental validation
Two-step training: train the feature extractor, then train the classifier with the given feature extractor fixed.
Co-adaptation: many samples are required to train the classifier.
Point-like: a few samples are good enough to train the classifier.
[Figure: scattered + and - features (co-adaptation) versus tightly clustered + and - features (point-like).]
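
The second step of the two-step protocol can be illustrated on synthetic features. The sketch below is a toy illustration, not the experiment from the talk: the feature distributions are hand-made, the nearest-class-mean classifier is a stand-in for the trained classifier, and `make_features` and `few_shot_accuracy` are hypothetical names.

```python
import numpy as np

def make_features(rng, n_per_class, std):
    """Toy 2-D features with class means (-1, 0) and (+1, 0) and per-axis std `std`."""
    t = np.repeat([0, 1], n_per_class)
    mean = np.stack([2.0 * t - 1.0, np.zeros(len(t))], axis=1)
    return mean + rng.normal(size=mean.shape) * np.asarray(std), t

def few_shot_accuracy(xi, t, k, rng):
    """Fit a nearest-class-mean classifier on k samples per class; features stay fixed."""
    centroids = []
    for c in (0, 1):
        idx = rng.choice(np.flatnonzero(t == c), size=k, replace=False)
        centroids.append(xi[idx].mean(axis=0))
    centroids = np.stack(centroids)
    dist = np.linalg.norm(xi[:, None, :] - centroids[None], axis=2)
    return float(np.mean(dist.argmin(axis=1) == t))

rng = np.random.default_rng(0)
xi_point, t_point = make_features(rng, 500, std=[0.05, 0.05])   # point-like features
xi_spread, t_spread = make_features(rng, 500, std=[0.3, 5.0])   # widely spread features

acc_point = few_shot_accuracy(xi_point, t_point, k=2, rng=rng)
acc_spread = few_shot_accuracy(xi_spread, t_spread, k=2, rng=rng)
print(f"point-like: {acc_point:.3f}, spread: {acc_spread:.3f}")
```

With only two labeled samples per class, the class means are estimated reliably when each class is nearly a point, while widely spread features leave the estimate dominated by noise.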

Links
Official proceedings of ICML2019
http://proceedings.mlr.press/v97/
arXiv: Breaking Inter-Layer Co-Adaptation by Classifier Anonymization
https://arxiv.org/abs/1906.01150
Twitter: Masayuki Tanaka
https://twitter.com/likesilkto
Twitter: Ikuro Sato
https://twitter.com/ikuro_s
