January 31, 2021
Deep Learning Paper Reading Group
Image Processing Team: 김병현 박동훈 안종식 홍은기 허다운
Training data-Efficient Image transformer &
Distillation through Attention (DeiT)
Contents
01 Summary
02 Prerequisites
03 Architecture
04 Experiments
05 Discussion
Summary
01
Summary of DeiT
01. Summary
1. Published December 2020 by Facebook AI
2. Improves on ViT and introduces a distillation concept
3. Contribution
- Image classification without using CNNs
- Trained on ImageNet alone
- Trained in only 2–3 days on a single 8-GPU node
- Accuracy comparable to SOTA CNN-based models
- Introduces a distillation concept
4. Conclusion
- CNN-based architectures have benefited from many years of research and steady performance gains
- Transformers for image tasks have only just begun to be studied
> Achieving comparable performance already demonstrates the potential of Transformers
Prerequisites
02
Vision Transformer & Knowledge Distillation
02. Prerequisites
1. Vision Transformer
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Google
> Reference: Deformable DETR: Deformable Transformers for End to End Object Detection paper review - 홍은기
02. Prerequisites
1. Vision Transformer
- Training dataset: JFT-300M
- Pre-training at low resolution, fine-tuning at high resolution
> Position embeddings are resized via bicubic interpolation
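The resolution change above means the patch grid grows, so ViT's learned position embeddings must be resized to match. A minimal numpy sketch of that resizing; the helper name `resize_pos_embed` and the use of scipy's cubic-spline `zoom` as a stand-in for bicubic image resizing are assumptions, not the paper's exact code:

```python
import numpy as np
from scipy.ndimage import zoom

def resize_pos_embed(pos_embed, old_grid, new_grid):
    """Resize ViT patch position embeddings for a new input resolution.

    pos_embed: (old_grid*old_grid, dim) array of patch position embeddings.
    The class token's embedding, if any, is handled separately.
    """
    dim = pos_embed.shape[1]
    grid = pos_embed.reshape(old_grid, old_grid, dim)
    scale = new_grid / old_grid
    # order=3 -> cubic spline interpolation over the 2D grid,
    # leaving the embedding dimension untouched
    resized = zoom(grid, (scale, scale, 1), order=3)
    return resized.reshape(new_grid * new_grid, dim)

# 14x14 patch grid (224px / 16px patches) -> 24x24 grid (384px input)
pe = np.random.randn(14 * 14, 64)
pe_384 = resize_pos_embed(pe, 14, 24)
print(pe_384.shape)  # (576, 64)
```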
02. Prerequisites
2. Knowledge Distillation
- The idea of transferring the knowledge of a large, well-trained teacher model to a smaller student model
> Reference: Explaining knowledge distillation by quantifying the knowledge - 김동희
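The teacher-to-student transfer above is classically implemented as a KL-divergence term between temperature-softened output distributions (Hinton et al., 2015). A minimal numpy sketch, assuming the function names and the temperature value:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_distillation_loss(student_logits, teacher_logits, T=3.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return (T ** 2) * kl.mean()

# identical logits -> the student already matches the teacher, loss is zero
logits = np.array([[2.0, 0.5, -1.0]])
print(soft_distillation_loss(logits, logits))  # 0.0
```

In practice this term is added to the usual cross-entropy on the ground-truth labels with a weighting coefficient.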
Q & A
Architecture
03
Implementation of DeiT
03. Architecture
1. Knowledge Distillation
- Adds a distillation token with the same structure as the class token
- Soft distillation: the student matches the teacher's softened output distribution
- Hard distillation: the student matches the teacher's predicted label
- The teacher's prediction can prevent mislearning caused by aggressive random crops
(e.g., GT: Cat / Prediction: Cat vs. GT: Cat / Prediction: ???)
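The soft/hard distinction above can be sketched as follows: in hard distillation, the distillation-token head is trained against the teacher's argmax label while the class-token head keeps the ordinary ground-truth cross-entropy, and the two terms are averaged. This is a numpy sketch of that objective under assumed names, not the authors' implementation:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, y):
    """0.5 * CE(class-token head, ground truth)
     + 0.5 * CE(distillation-token head, teacher's hard prediction)."""
    y_teacher = teacher_logits.argmax(axis=-1)
    return (0.5 * cross_entropy(cls_logits, y)
            + 0.5 * cross_entropy(dist_logits, y_teacher))

rng = np.random.default_rng(0)
cls_l = rng.normal(size=(4, 10))    # class-token head logits
dist_l = rng.normal(size=(4, 10))   # distillation-token head logits
teacher_l = rng.normal(size=(4, 10))
y = np.array([1, 3, 5, 7])
print(hard_distillation_loss(cls_l, dist_l, teacher_l, y))
```

At inference, the paper fuses the two heads by averaging their softmax outputs.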
03. Architecture
2. Bag of Tricks
- Basically reuses the ViT architecture as-is (ViT-B = DeiT-B)
> Same basic training procedure
> Performance gains come from hyperparameter tuning
Q & A
Experiments
04
Experimental Results of DeiT
04. Experiments
1. Distillation
- Teacher model: RegNetY-16GF
> A ConvNet teacher works better than a Transformer teacher
"Probably" due to inductive bias!
- Distillation comparison: hard distillation is better
* Inductive bias
- The distillation method lets the student learn the ConvNet's inductive bias more effectively
04. Experiments
2. Efficiency vs Accuracy
- Compares parameter count, throughput, and accuracy
> On the throughput-accuracy trade-off, DeiT performs on par with ConvNets
- Base model: DeiT-B (= ViT-B)
3. Transfer Learning
- The model pre-trained on ImageNet is tested on other datasets
Discussion
05
Conclusion & Discussion
05. Discussion
1. Contribution
1) Improves the performance of the Transformer-based ViT model (without ConvNets)
2) Trains with less data and faster than ViT
3) Achieves accuracy comparable to SOTA ConvNets
4) Proposes a simple knowledge distillation method
2. Opinion
1) Still requires many epochs (300–500)
2) Exposes the weaknesses of Transformers
> Sensitive to hyperparameters
> Needs more data and training time than ConvNets
> Plenty of room for research, but hard to apply in production
3) Follows the research style of deep learning's early days
> Quantitative research (experiment → theory)
> The experimental results are not fully interpreted
3. Conclusion
1) A field that still needs much more research
2) Reaching CNN-level performance at this early stage suggests that,
as in NLP, Transformers may eventually replace CNNs
Q & A
THANK YOU
for Watching
