Paper Reviews in
Visual Attention
1
2018.3.29
SNU DATAMINING CENTER
MINKI CHUNG
WHO AM I 2
▸ Chung Minki
▸ BS, KAIST, IE, 2016
▸ MS, SNU, IE, 2018..?!
▸ Vision Projects
▸ Working on Semantic Image Inpainting
WHAT IS VISUAL ATTENTION 3
▸ Attention is HOT nowadays
▸ http://openaccess.thecvf.com/CVPR2017_search.py
▸ http://search.iclr2018.smerity.com/search/?query=attention
WHAT IS VISUAL ATTENTION 4
▸ You may have heard of
▸ "Neural Machine Translation by Jointly Learning to Align and Translate"
▸ "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015, ICLR. "Neural Machine Translation by Jointly Learning to Align and Translate"
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015, ICML. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
WHAT IS VISUAL ATTENTION 5
▸ More:
Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition With Visual Attention"
Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks"
Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition"
Siavash Gorji, James J. Clark, 2017, CVPR. "Attentional Push: A Deep Convolutional Network for Augmenting Image Salience with Shared Attention Modeling in Social Scenes"
WHAT IS VISUAL ATTENTION 6
▸ Visual Attention:
▸ Attend to certain parts of an image to solve a task more efficiently
▸ Deep learning is a black-box model → attention adds interpretability
TABLE OF CONTENTS 7
▸ Early Works
▸ Recurrent Attention Model (RAM)
▸ Spatial Transformer Network (STN)
▸ Recent Works of visual attention
▸ in ICLR
▸ in CVPR
PREREQUISITE 8
▸ CNN, Transposed Convolution (or Deconvolution), Dilated Convolution
▸ RNN
▸ MLP
▸ GAN
https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
EARLY WORKS
:RAM, STN
9
RECURRENT ATTENTION MODEL 10
▸ Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, 2014, NIPS.
"Recurrent Models of Visual Attention"
▸ Google DeepMind, 563 citations
▸ Motivation: Confronted with a large image, humans process it sequentially,
selecting where and what to look at
▸ Tackles a ConvNet limitation: poor scalability with increasing input image size
RECURRENT ATTENTION MODEL 11
▸ Multiple Object Recognition with Visual Attention (DRAM), 2015, ICLR
▸ Refined version of the RAM architecture
▸ RNN structure with multi-resolution crops, called glimpses
▸ Architecture:
RECURRENT ATTENTION MODEL 12
▸ Architecture:
[Figure: DRAM architecture, annotated: where to see; what to see; provide initial state; locate glimpse; outputs the inputs for rnn(1); for multiple objects]
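The multi-resolution crop ("glimpse") can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name `extract_glimpse`, the patch size, the number of scales, the zero padding, and the average-pool downsampling are all assumptions made for the example.

```python
import numpy as np

def extract_glimpse(image, center, size=8, num_scales=3):
    """Crop num_scales square patches around `center`, each twice as wide as
    the previous one, and downsample every patch to size x size.
    `size` must be even. All names and defaults here are illustrative."""
    cy, cx = center
    patches = []
    for s in range(num_scales):
        half = (size * 2 ** s) // 2
        # Zero-pad so that crops near the image border stay well defined.
        padded = np.pad(image, half, mode="constant")
        # In padded coordinates the centre is (cy + half, cx + half),
        # so the crop starts at (cy, cx).
        crop = padded[cy:cy + 2 * half, cx:cx + 2 * half]
        # Average-pool by a factor of 2**s to return to size x size.
        f = 2 ** s
        patches.append(crop.reshape(size, f, size, f).mean(axis=(1, 3)))
    return np.stack(patches)  # shape: (num_scales, size, size)

image = np.arange(28 * 28, dtype=float).reshape(28, 28)
glimpse = extract_glimpse(image, center=(14, 14))
print(glimpse.shape)
```

The finest scale is an exact crop, while coarser scales cover more context at lower resolution, which is what keeps the per-step cost independent of the full image size.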
RECURRENT ATTENTION MODEL 13
▸ Demo
▸ Single object classification
https://github.com/kevinzakka/recurrent-visual-attention
RECURRENT ATTENTION MODEL 14
▸ Training:
▸ Maximize the lower bound F on the log-likelihood
▸ Extended form for the multiple-object case
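The lower bound F named on this slide is the usual Jensen bound on the log-likelihood when the glimpse locations are treated as latent variables. A sketch, assuming notation not given on the slide: I is the image, y the label, l a sequence of glimpse locations, and W the model parameters.

```latex
\log p(y \mid I, W)
  = \log \sum_{l} p(l \mid I, W)\, p(y \mid l, I, W)
  \;\ge\; \sum_{l} p(l \mid I, W)\, \log p(y \mid l, I, W)
  \;=\; \mathcal{F}
```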
RECURRENT ATTENTION MODEL 15
▸ Cont'd: gradient estimated with REINFORCE
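REINFORCE is the score-function gradient estimator: sample an action, then weight the gradient of its log-probability by the reward. The sketch below shows it on a toy categorical policy rather than on glimpse locations; `reinforce_gradient`, `reward_fn`, and the baseline argument are hypothetical names chosen for the illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_gradient(logits, reward_fn, num_samples, rng, baseline=0.0):
    """Monte Carlo estimate of d E[R] / d logits for a categorical policy."""
    probs = softmax(logits)
    n = len(probs)
    actions = rng.choice(n, size=num_samples, p=probs)
    rewards = np.array([reward_fn(a) for a in actions], dtype=float)
    # For a softmax policy, grad of log pi(a) w.r.t. the logits is onehot(a) - probs.
    scores = np.eye(n)[actions] - probs
    return ((rewards - baseline)[:, None] * scores).mean(axis=0)

rng = np.random.default_rng(0)
grad = reinforce_gradient(np.zeros(3), lambda a: float(a == 2), 20000, rng)
print(grad)  # positive for action 2, negative for the others
```

Subtracting a baseline from the reward leaves the estimate unbiased and usually lowers its variance.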
RECURRENT ATTENTION MODEL 16
▸ Experiments & Results
▸ MNIST, SVHN
SPATIAL TRANSFORMER NETWORK 17
▸ Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015,
NIPS. "Spatial Transformer Networks"
▸ Google DeepMind, 624 citations
▸ Motivation: Humans process distorted objects by first un-distorting them
▸ A ConvNet is not actually invariant to large transformations (invariance is only realised over a
deep hierarchy of max-pooling)
https://kevinzakka.github.io/2017/01/18/stn-part2/
SPATIAL TRANSFORMER NETWORK 18
▸ Architecture:
▸ three parts: localisation net, sampling grid, sampler
▸ Assume 𝛵𝜃 is a 2D affine transformation A𝜃
[Figure: STN architecture. The localisation net outputs 𝜃 by regression; input feature map H,W,C, output feature map H',W',C]
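For the affine case, each target (output) grid point is mapped to a source (input) sampling point. A sketch of that mapping, assuming the superscripts t and s denote target and source coordinates and the six entries of 𝜃 are the localisation net outputs:

```latex
\begin{pmatrix} x_i^{s} \\ y_i^{s} \end{pmatrix}
= \mathcal{T}_\theta(G_i)
= A_\theta \begin{pmatrix} x_i^{t} \\ y_i^{t} \\ 1 \end{pmatrix}
= \begin{bmatrix}
    \theta_{11} & \theta_{12} & \theta_{13} \\
    \theta_{21} & \theta_{22} & \theta_{23}
  \end{bmatrix}
  \begin{pmatrix} x_i^{t} \\ y_i^{t} \\ 1 \end{pmatrix}
```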
SPATIAL TRANSFORMER NETWORK 19
▸ 𝛵𝜃 for attention becomes:
▸ Allows cropping, translation, and isotropic scaling
▸ In the case of a bilinear sampling kernel:
▸ Differentiable, modular
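A minimal NumPy sketch of the attention case: it assumes the constrained matrix [[s, 0, t_x], [0, s, t_y]] (isotropic scale s, translation t_x, t_y), coordinates normalised to [-1, 1], and border clamping for out-of-range samples. The function names are illustrative and this is not the TensorLayer implementation.

```python
import numpy as np

def attention_theta(s, tx, ty):
    """Constrained affine matrix: isotropic scale s and translation (tx, ty)."""
    return np.array([[s, 0.0, tx],
                     [0.0, s, ty]])

def affine_grid(theta, out_h, out_w):
    """Source coordinates (normalised to [-1, 1]) for every output pixel."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])
    src = theta @ target  # shape (2, out_h * out_w): rows are x_s, y_s
    return src[0].reshape(out_h, out_w), src[1].reshape(out_h, out_w)

def bilinear_sample(image, x_s, y_s):
    """Sample a 2D image (at least 2x2) at normalised coordinates."""
    h, w = image.shape
    x = (x_s + 1) * (w - 1) / 2  # map [-1, 1] to pixel coordinates
    y = (y_s + 1) * (h - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    wx = np.clip(x - x0, 0.0, 1.0)  # clamping the weights clamps to the border
    wy = np.clip(y - y0, 0.0, 1.0)
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x0 + 1]
    bottom = (1 - wx) * image[y0 + 1, x0] + wx * image[y0 + 1, x0 + 1]
    return (1 - wy) * top + wy * bottom

image = np.arange(16.0).reshape(4, 4)
x_s, y_s = affine_grid(attention_theta(0.5, 0.5, 0.5), 2, 2)
print(bilinear_sample(image, x_s, y_s))  # zoomed crop of the lower-right quadrant
```

Because the output is a weighted sum of input pixels whose weights depend smoothly on 𝜃, gradients can flow both to the input and to the localisation net.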
SPATIAL TRANSFORMER NETWORK 20
▸ Experiments and Results
▸ MNIST
▸ SVHN
SPATIAL TRANSFORMER NETWORK 21
▸ Experiments and Results
▸ Fine-grained classification (CUB-200-2011 bird classification dataset)
SPATIAL TRANSFORMER NETWORK 22
▸ Already implemented in TensorLayer
RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION 23
▸ Jason Kuen, Zhenhua Wang, Gang Wang, 2016, CVPR. "Recurrent Attentional
Networks for Saliency Detection"
▸ RAM (glimpse system) + STN (differentiability) for saliency detection
RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION 24
▸ Recurrent Attentional Convolutional-Deconvolutional Network (RACDNN)
▸ Architecture
RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION 25
▸ Experiments & Results
RECENT WORKS
:ICLR, CVPR
26
GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION 27
▸ Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang, 2018, CVPR.
"Generative Image Inpainting with Contextual Attention"
▸ Adobe Research
GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION 28
▸ Architecture
▸ Two-stage (coarse to fine)
▸ Global and local W-GANs
▸ Spatially discounted reconstruction loss (𝑙1): 𝛾^𝑙
[Figure: two-stage architecture, annotated: use W-GAN; attention]
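A sketch of the spatial discount: it assumes the weighting described in the cited paper, where each masked pixel gets weight 𝛾 raised to 𝑙, its distance to the nearest known pixel. It is restricted to a rectangular hole, and the function name, the axis-aligned distance, and the default 𝛾 are assumptions of this example.

```python
import numpy as np

def spatial_discount_mask(hole_h, hole_w, gamma=0.99):
    """Weights gamma**l for a rectangular hole, where l is the number of
    pixels to the nearest hole border (axis-aligned distance, an assumption)."""
    i = np.arange(hole_h)[:, None]
    j = np.arange(hole_w)[None, :]
    dist = np.minimum(np.minimum(i + 1, hole_h - i),
                      np.minimum(j + 1, hole_w - j))
    return gamma ** dist

def discounted_l1(pred, target, weights):
    """Weighted l1 reconstruction loss over the hole region."""
    return np.mean(weights * np.abs(pred - target))

weights = spatial_discount_mask(5, 5, gamma=0.5)
print(weights[2, 2], weights[0, 0])  # centre is discounted more than the border
```

Pixels deep inside the hole are more ambiguous, so their reconstruction error is weighted less than that of pixels close to known content.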
GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION 29
▸ Attention
Calculate cosine similarity between foreground patches f_{x,y} and background patches b_{x',y'}:
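The cosine-similarity matching can be sketched with flattened patch features. This is a simplified illustration: the function name, the `scale` value, and the plain weighted sum used for reconstruction are assumptions, whereas the cited paper implements the matching with convolutions over image patches.

```python
import numpy as np

def contextual_attention(foreground, background, scale=10.0):
    """foreground: (Nf, D) patch features from the hole region.
    background: (Nb, D) patch features from the known region.
    Returns attention weights (Nf, Nb) and reconstructed features (Nf, D)."""
    f = foreground / np.linalg.norm(foreground, axis=1, keepdims=True)
    b = background / np.linalg.norm(background, axis=1, keepdims=True)
    sim = f @ b.T  # cosine similarity between every f and every b
    logits = scale * sim
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over background patches
    # Each foreground patch is rebuilt as a weighted mix of background patches.
    return attn, attn @ background

background = np.eye(3)
foreground = np.array([[2.0, 0.0, 0.0]])
attn, recon = contextual_attention(foreground, background)
print(attn.argmax(axis=1))  # index of the best-matching background patch
```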
GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION 30
▸ Experiments & Results
LEARN TO PAY ATTENTION 31
▸ Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr, 2018, ICLR. "Learn
to Pay Attention"
▸ Very simple
LEARN TO PAY ATTENTION 32
▸ Architecture
[Figure: architecture, annotated: attention; compatibility function (dot product)]
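A minimal sketch of dot-product compatibility: local features from an intermediate layer are scored against the global feature, the scores are normalised with a softmax, and the local features are pooled with those weights. The names `attention_pool`, `local_feats`, and `global_feat` are illustrative, and both features are assumed to share the same dimension.

```python
import numpy as np

def attention_pool(local_feats, global_feat):
    """local_feats: (n, d), one feature vector per spatial position.
    global_feat: (d,), the global image descriptor.
    Returns the attention map (n,) and the attended descriptor (d,)."""
    scores = local_feats @ global_feat  # dot-product compatibility
    scores = scores - scores.max()  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()  # softmax over spatial positions
    return weights, weights @ local_feats

local_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
global_feat = np.array([10.0, 0.0])
weights, attended = attention_pool(local_feats, global_feat)
print(weights.argmax())  # the position most compatible with the global feature
```

The attention map doubles as a visualisation of which spatial positions drove the prediction.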
LEARN TO PAY ATTENTION 33
▸ Experiments & Results
▸ Image classification and fine-grained recognition
LEARN TO PAY ATTENTION 34
▸ Experiments & Results
▸ Weakly supervised semantic segmentation
LOOK CLOSER TO SEE BETTER 35
▸ Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better:
Recurrent Attention Convolutional Neural Network for Fine-grained Image
Recognition"
▸ Fine-grained image recognition:
▸ Discriminative region localization + fine-grained feature learning
LOOK CLOSER TO SEE BETTER 36
▸ Recurrent Attention Convolutional Neural Network (RA-CNN)
▸ Multi-scale networks: classification sub-network, attention proposal sub-network (APN)
▸ Finer-scale network (coarse to fine)
▸ Intra-scale softmax loss for classification, inter-scale pairwise ranking loss for APN
LOOK CLOSER TO SEE BETTER 37
▸ RA-CNN architecture:
[Figure: RA-CNN architecture, annotated: bilinear interpolation to amplify]
LOOK CLOSER TO SEE BETTER 38
▸ Training:
▸ Multi-task loss:
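The multi-task loss combines a softmax classification loss at each scale with a pairwise ranking loss between neighbouring scales. The ranking term is positive unless the finer scale assigns the true class a probability higher than the coarser scale by at least a margin. A sketch under stated assumptions: the function names and the margin value are illustrative, and the two terms are simply summed.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Intra-scale classification loss for one sample."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def pairwise_ranking_loss(p_coarse, p_fine, margin=0.05):
    """Inter-scale loss: zero only if p_fine >= p_coarse + margin."""
    return max(0.0, p_coarse - p_fine + margin)

def multi_task_loss(logits_per_scale, label, margin=0.05):
    """Sum of classification losses over scales plus ranking losses
    between each pair of neighbouring scales (coarse to fine)."""
    cls_losses = [softmax_cross_entropy(l, label) for l in logits_per_scale]
    # Probability of the true class at each scale.
    p_true = [np.exp(-c) for c in cls_losses]
    rank = sum(pairwise_ranking_loss(p_true[s], p_true[s + 1], margin)
               for s in range(len(p_true) - 1))
    return sum(cls_losses) + rank

logits = [np.array([1.0, 0.0]), np.array([3.0, 0.0])]
print(multi_task_loss(logits, label=0))
```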
LOOK CLOSER TO SEE BETTER 39
▸ Experiments & Results
▸ CUB-200-2011 Bird Dataset
LOOK CLOSER TO SEE BETTER 40
▸ Experiments & Results
▸ Stanford Dogs, Stanford Cars
SUMMARY 41
▸ Attention for efficiency, better performance, interpretability
▸ Many types of Attention:
▸ RAM
▸ STN
▸ RAM+STN
▸ Others
ANY Q?
42
REFERENCES 43
▸ Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015, ICLR. "Neural Machine Translation by Jointly Learning to Align and Translate"
▸ Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015, ICML. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
▸ Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, 2014, NIPS. "Recurrent Models of Visual Attention"
▸ Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition With Visual Attention"
▸ Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks"
▸ Jason Kuen, Zhenhua Wang, Gang Wang, 2016, CVPR. "Recurrent Attentional Networks for Saliency Detection"
▸ Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang, 2018, CVPR. "Generative Image Inpainting with Contextual Attention"
▸ Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr, 2018, ICLR. "Learn to Pay Attention"
▸ Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition"
END OF
DOCUMENT
44

Más contenido relacionado

Similar a Paper Reviews on Visual Attention

Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Universitat Politècnica de Catalunya
 
(Research Note) Delving deeper into convolutional neural networks for camera ...
(Research Note) Delving deeper into convolutional neural networks for camera ...(Research Note) Delving deeper into convolutional neural networks for camera ...
(Research Note) Delving deeper into convolutional neural networks for camera ...Jacky Liu
 
Cs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingCs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingYanbin Kong
 
Modeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksModeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksNAVER Engineering
 
What Would Shannon Do?
What Would Shannon Do?What Would Shannon Do?
What Would Shannon Do?Karen Ullrich
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisAhmed Gad
 
Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks曾 子芸
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesFellowship at Vodafone FutureLab
 
DLD_WeightSharing_Slide
DLD_WeightSharing_SlideDLD_WeightSharing_Slide
DLD_WeightSharing_SlideKang-Ho Lee
 
Supervised Learning of Sparsity-Promoting Regularizers for Denoising
Supervised Learning of Sparsity-Promoting Regularizers for DenoisingSupervised Learning of Sparsity-Promoting Regularizers for Denoising
Supervised Learning of Sparsity-Promoting Regularizers for DenoisingMike McCann
 
capsule network
capsule networkcapsule network
capsule network민기 정
 
Deep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleDeep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleRoelof Pieters
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Wanjin Yu
 
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...Electronic Arts / DICE
 
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-ResolutionTaegyun Jeon
 
Intermediate inception network for person re-identification
Intermediate inception network for person re-identificationIntermediate inception network for person re-identification
Intermediate inception network for person re-identificationHuan-Cheng Hsu
 

Similar a Paper Reviews on Visual Attention (20)

Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
 
(Research Note) Delving deeper into convolutional neural networks for camera ...
(Research Note) Delving deeper into convolutional neural networks for camera ...(Research Note) Delving deeper into convolutional neural networks for camera ...
(Research Note) Delving deeper into convolutional neural networks for camera ...
 
Cs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingCs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and Understanding
 
Modeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksModeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networks
 
One Perceptron to Rule Them All: Language and Vision
One Perceptron to Rule Them All: Language and VisionOne Perceptron to Rule Them All: Language and Vision
One Perceptron to Rule Them All: Language and Vision
 
What Would Shannon Do?
What Would Shannon Do?What Would Shannon Do?
What Would Shannon Do?
 
Learning where to look: focus and attention in deep vision
Learning where to look: focus and attention in deep visionLearning where to look: focus and attention in deep vision
Learning where to look: focus and attention in deep vision
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
 
Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network Approaches
 
DLD_WeightSharing_Slide
DLD_WeightSharing_SlideDLD_WeightSharing_Slide
DLD_WeightSharing_Slide
 
Supervised Learning of Sparsity-Promoting Regularizers for Denoising
Supervised Learning of Sparsity-Promoting Regularizers for DenoisingSupervised Learning of Sparsity-Promoting Regularizers for Denoising
Supervised Learning of Sparsity-Promoting Regularizers for Denoising
 
capsule network
capsule networkcapsule network
capsule network
 
Deep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleDeep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with style
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
 
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
 
Trip Report Seattle
Trip Report SeattleTrip Report Seattle
Trip Report Seattle
 
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
EPC 2018 - SEED - Exploring The Collaboration Between Proceduralism & Deep Le...
 
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
 
Intermediate inception network for person re-identification
Intermediate inception network for person re-identificationIntermediate inception network for person re-identification
Intermediate inception network for person re-identification
 

Último

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Paper Reviews on Visual Attention

  • 1. Paper Reviews in Visual Attention, 2018.3.29, SNU DATAMINING CENTER, MINKI CHUNG
  • 2. WHO AM I ▸ Chung Minki ▸ BS, KAIST, IE, 2016 ▸ MS, SNU, IE, 2018..?! ▸ Vision Projects ▸ Working on Semantic Image Inpainting
  • 3. WHAT IS VISUAL ATTENTION ▸ Attention is HOT nowadays ▸ http://openaccess.thecvf.com/CVPR2017_search.py ▸ http://search.iclr2018.smerity.com/search/?query=attention
  • 4. WHAT IS VISUAL ATTENTION ▸ You may have heard of: ▸ Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015, ICLR. "Neural Machine Translation by Jointly Learning to Align and Translate" ▸ Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015, ICML. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
  • 5. WHAT IS VISUAL ATTENTION ▸ More: ▸ Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition with Visual Attention" ▸ Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks" ▸ Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition" ▸ Siavash Gorji, James J. Clark, 2017, CVPR. "Attentional Push: A Deep Convolutional Network for Augmenting Image Salience with Shared Attention Modeling in Social Scenes"
  • 6. WHAT IS VISUAL ATTENTION ▸ Visual attention: ▸ Attend to a certain part of the image to solve a task more efficiently ▸ Deep learning is a black-box model → attention adds interpretability
  • 7. TABLE OF CONTENTS ▸ Early Works ▸ Recurrent Attention Model (RAM) ▸ Spatial Transformer Network (STN) ▸ Recent works on visual attention ▸ in ICLR ▸ in CVPR
  • 8. PREREQUISITE ▸ CNN, Transposed Convolution (or Deconvolution), Dilated Convolution ▸ RNN ▸ MLP ▸ GAN ▸ https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
  • 10. RECURRENT ATTENTION MODEL ▸ Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, 2014, NIPS. "Recurrent Models of Visual Attention" ▸ Google DeepMind, 563 citations (as of March 2018) ▸ Motivation: confronted with a large image, humans process it sequentially, selecting where and what to look at ▸ Tackles a ConvNet limitation: poor scalability with increasing input image size
  • 11. RECURRENT ATTENTION MODEL ▸ Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition with Visual Attention" (DRAM) ▸ Refined architecture version of RAM ▸ RNN structure with a multi-resolution crop, called a glimpse ▸ Architecture: [figure not preserved]
  • 12. RECURRENT ATTENTION MODEL ▸ Architecture: [Figure labels: "where to see", "what to see", "provide initial state", "locate glimpse", "outputs the inputs for rnn(1)", "for multiple objects"]
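The multi-resolution glimpse mentioned on slide 11 can be illustrated with a small sketch. This is not the authors' code: it assumes a grayscale image stored as a list of lists, zero padding outside the image, and average pooling for the downsampling step. The names `crop`, `avg_pool`, and `glimpse` are hypothetical.

```python
def crop(image, cy, cx, size):
    """Return a size x size patch centred at (cy, cx); pixels outside the image are 0."""
    h, w = len(image), len(image[0])
    top, left = cy - size // 2, cx - size // 2
    return [[image[r][c] if 0 <= r < h and 0 <= c < w else 0.0
             for c in range(left, left + size)]
            for r in range(top, top + size)]


def avg_pool(patch, factor):
    """Downsample a square patch by averaging non-overlapping factor x factor blocks."""
    n = len(patch) // factor
    return [[sum(patch[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(n)]
            for i in range(n)]


def glimpse(image, cy, cx, base=2, scales=3):
    """Patches of side base, 2*base, 4*base, ... around one location,
    all resized to base x base (assumption: average pooling as the resize)."""
    patches = []
    for s in range(scales):
        factor = 2 ** s
        patches.append(avg_pool(crop(image, cy, cx, base * factor), factor))
    return patches


if __name__ == "__main__":
    img = [[r * 8 + c for c in range(8)] for r in range(8)]
    for patch in glimpse(img, 4, 4):
        print(patch)
```

The finest patch keeps full detail near the chosen location, while the coarser ones give low-resolution context, so the cost per glimpse does not grow with the image size.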
  • 13. RECURRENT ATTENTION MODEL ▸ Demo ▸ Single object classification ▸ https://github.com/kevinzakka/recurrent-visual-attention
  • 14. RECURRENT ATTENTION MODEL ▸ Training: ▸ Maximize the lower bound F (multiple-object case) [equations not preserved]
  • 15. RECURRENT ATTENTION MODEL ▸ Cont'd: REINFORCE [equation not preserved]
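Because the glimpse location is sampled, the location policy cannot be trained by ordinary backpropagation, which is where REINFORCE comes in. The sketch below is a toy illustration rather than the papers' training code: it assumes a Gaussian location policy with a fixed standard deviation `sigma`, a scalar reward per sampled location, and a constant `baseline` for variance reduction. All function names are hypothetical.

```python
import random


def sample_location(mean, sigma, rng=random):
    """Draw a location from a Gaussian policy centred at `mean` (fixed sigma)."""
    return [rng.gauss(m, sigma) for m in mean]


def log_prob_grad(loc, mean, sigma):
    """Gradient of log N(loc; mean, sigma^2 I) with respect to `mean`."""
    return [(l - m) / sigma ** 2 for l, m in zip(loc, mean)]


def reinforce_grad(samples, mean, sigma, baseline=0.0):
    """Monte Carlo REINFORCE estimate of d E[reward] / d mean.

    samples: list of (location, reward) pairs drawn from the current policy.
    """
    grad = [0.0] * len(mean)
    for loc, reward in samples:
        g = log_prob_grad(loc, mean, sigma)
        for i in range(len(mean)):
            grad[i] += (reward - baseline) * g[i]
    return [x / len(samples) for x in grad]


if __name__ == "__main__":
    # Two hand-picked samples: the rewarded one lies to the right of the mean,
    # so the estimate pushes the mean to the right.
    samples = [((1.0, 0.0), 1.0), ((-1.0, 0.0), 0.0)]
    print(reinforce_grad(samples, [0.0, 0.0], sigma=1.0, baseline=0.5))
```

Subtracting the baseline leaves the expected gradient unchanged but reduces its variance, which matters when only a few locations are sampled per image.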
  • 16. RECURRENT ATTENTION MODEL ▸ Experiments & Results ▸ MNIST, SVHN
  • 17. SPATIAL TRANSFORMER NETWORK ▸ Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks" ▸ Google DeepMind, 624 citations (as of March 2018) ▸ Motivation: humans process distorted objects by un-distorting them ▸ A ConvNet is not actually invariant to large transformations (invariance is only realised over a deep hierarchy of max-pooling) ▸ https://kevinzakka.github.io/2017/01/18/stn-part2/
  • 18. SPATIAL TRANSFORMER NETWORK ▸ Architecture: ▸ Three parts: localisation net, sampling grid, sampler ▸ Assume 𝛵𝜃 is a 2D affine transformation A𝜃 [equation not preserved] [Figure labels: regression; H, W, C; H', W', C]
  • 19. SPATIAL TRANSFORMER NETWORK ▸ 𝛵𝜃 for attention becomes: [equation not preserved] ▸ Allows cropping, translation, isotropic scaling ▸ In the case of a bilinear sampling kernel: [equation not preserved] ▸ Differentiable, modular
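The grid generator and bilinear sampler can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference implementation. It assumes a single-channel feature map stored as a list of lists, normalised coordinates in [-1, 1] with the corners of the map at ±1, and the attention form of the transform from the STN paper, A𝜃 = [[s, 0, t_x], [0, s, t_y]]. The names `attention_theta`, `affine_grid`, and `bilinear_sample` are hypothetical.

```python
def attention_theta(s, tx, ty):
    """Attention-only affine matrix: isotropic scale s plus translation (tx, ty)."""
    return [[s, 0.0, tx], [0.0, s, ty]]


def _coords(n):
    """n evenly spaced normalised coordinates covering [-1, 1]."""
    if n == 1:
        return [0.0]
    return [-1.0 + 2.0 * k / (n - 1) for k in range(n)]


def affine_grid(theta, out_h, out_w):
    """For each output pixel (x_t, y_t), compute the source point (x_s, y_s)."""
    grid = []
    for y_t in _coords(out_h):
        row = []
        for x_t in _coords(out_w):
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(U, grid):
    """V = sum_n sum_m U[n][m] * max(0, 1-|x-m|) * max(0, 1-|y-n|)."""
    h, w = len(U), len(U[0])
    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            x = (x_s + 1.0) * (w - 1) / 2.0  # normalised -> pixel column
            y = (y_s + 1.0) * (h - 1) / 2.0  # normalised -> pixel row
            v = 0.0
            for n in range(h):
                wy = max(0.0, 1.0 - abs(y - n))
                for m in range(w):
                    wx = max(0.0, 1.0 - abs(x - m))
                    v += U[n][m] * wx * wy
            out_row.append(v)
        out.append(out_row)
    return out


if __name__ == "__main__":
    U = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    # Scale 0.5 and shift towards the bottom-right: crops that quadrant.
    print(bilinear_sample(U, affine_grid(attention_theta(0.5, 0.5, 0.5), 2, 2)))
```

Every output value is a piecewise-linear function of the sampling coordinates, so gradients can flow back to s, t_x, and t_y. That is what lets the module be trained end to end, in contrast to the sampled glimpse locations of RAM.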
  • 20. SPATIAL TRANSFORMER NETWORK ▸ Experiments and Results ▸ MNIST ▸ SVHN
  • 21. SPATIAL TRANSFORMER NETWORK ▸ Experiments and Results ▸ Fine-grained classification (CUB-200-2011 bird classification dataset)
  • 22. SPATIAL TRANSFORMER NETWORK ▸ Already implemented in TensorLayer
  • 23. RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION ▸ Jason Kuen, Zhenhua Wang, Gang Wang, 2016, CVPR. "Recurrent Attentional Networks for Saliency Detection" ▸ RAM (glimpse system) + STN (differentiability) for saliency detection
  • 24. RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION ▸ Recurrent Attentional Convolutional-Deconvolutional Network (RACDNN) ▸ Architecture [figure not preserved]
  • 25. RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION ▸ Experiments & Results
  • 27. GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION ▸ Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang, 2018, CVPR. "Generative Image Inpainting with Contextual Attention" ▸ Adobe Research
  • 28. GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION ▸ Architecture ▸ Two-stage (coarse to fine) ▸ Global and local WGANs ▸ Spatially discounted reconstruction loss (𝑙1) with pixel weight 𝛾^𝑙 [Figure labels: use WGAN; attention]
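As a rough sketch of the spatially discounted 𝑙1 loss: each missing pixel is weighted by 𝛾^𝑙, where 𝑙 is its distance to the nearest known pixel, so pixels deep inside the hole contribute less. The code below is illustrative only. It assumes Euclidean distance, a brute-force nearest-pixel search, and normalisation by the sum of the weights; none of these choices is taken from the slides. The default 𝛾 = 0.99 follows the value reported in the paper. Function names are hypothetical.

```python
import math


def discount_weights(mask, gamma=0.99):
    """mask[r][c] == 1 marks a missing pixel; known pixels get weight 0.

    Assumes the mask contains at least one known pixel.
    """
    h, w = len(mask), len(mask[0])
    known = [(r, c) for r in range(h) for c in range(w) if mask[r][c] == 0]
    weights = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 1:
                dist = min(math.hypot(r - kr, c - kc) for kr, kc in known)
                weights[r][c] = gamma ** dist
    return weights


def discounted_l1(pred, target, mask, gamma=0.99):
    """Weighted mean absolute error over the missing region."""
    weights = discount_weights(mask, gamma)
    h, w = len(mask), len(mask[0])
    total = sum(weights[r][c] * abs(pred[r][c] - target[r][c])
                for r in range(h) for c in range(w))
    norm = sum(sum(row) for row in weights)
    return total / norm if norm > 0 else 0.0


if __name__ == "__main__":
    # 5x5 image with a 3x3 hole in the middle.
    mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
            for r in range(5)]
    for row in discount_weights(mask, gamma=0.5):
        print(row)
```

The intuition is that pixels near the hole boundary are strongly constrained by their known neighbours, whereas the centre of a large hole has many plausible completions and should be penalised less for deviating from the ground truth.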
  • 29. GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION ▸ Attention: calculate the cosine similarity between foreground patches f_{x,y} and background patches b_{x',y'} [equation not preserved]
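The matching step can be sketched as follows: each foreground (missing-region) patch is compared with every background patch by cosine similarity, the similarities are turned into attention weights with a scaled softmax, and the foreground patch is rebuilt as a weighted sum of background patches. This is a simplified illustration on flattened patch vectors, not the convolutional formulation used in the paper. The softmax scale `lam=10.0` and the function names are assumptions.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two flattened patches."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def attention_weights(fg_patch, bg_patches, lam=10.0):
    """Softmax over background patches of lam * cosine similarity."""
    sims = [cosine(fg_patch, b) for b in bg_patches]
    top = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(lam * (s - top)) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct(fg_patch, bg_patches, lam=10.0):
    """Rebuild the foreground patch as an attention-weighted sum of background patches."""
    weights = attention_weights(fg_patch, bg_patches, lam)
    size = len(bg_patches[0])
    return [sum(w * b[i] for w, b in zip(weights, bg_patches)) for i in range(size)]


if __name__ == "__main__":
    fg = [1.0, 0.0]
    bgs = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]
    print(attention_weights(fg, bgs))
    print(reconstruct(fg, bgs))
```

Cosine similarity ignores patch magnitude, so a background patch with the same texture but different brightness still matches; a larger `lam` makes the attention closer to a hard nearest-patch copy.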
  • 30. GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION ▸ Experiments & Results
  • 31. LEARN TO PAY ATTENTION ▸ Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr, 2018, ICLR. "Learn to Pay Attention" ▸ Very simple method
  • 32. LEARN TO PAY ATTENTION ▸ Architecture [Figure labels: attention; compatibility function (dot product)]
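The dot-product compatibility named on the architecture slide can be sketched as below: each local feature vector is scored against the global feature vector, the scores are normalised with a softmax, and the attention-weighted sum of local features replaces the global descriptor for classification. This is a minimal sketch under the assumption that local and global features already have the same dimension; the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(local_feats, global_feat):
    """Return (attention weights, attended descriptor).

    local_feats: one feature vector per spatial location.
    global_feat: the network's global image descriptor.
    """
    scores = [dot(l, global_feat) for l in local_feats]  # compatibility c_i
    weights = softmax(scores)                            # attention a_i
    dim = len(global_feat)
    attended = [sum(w * l[i] for w, l in zip(weights, local_feats))
                for i in range(dim)]                     # sum_i a_i * l_i
    return weights, attended


if __name__ == "__main__":
    local_feats = [[1.0, 0.0], [0.0, 1.0]]
    weights, attended = attend(local_feats, [1.0, 0.0])
    print(weights, attended)
```

Because the weights are indexed by spatial location, they can be reshaped back into a map and viewed as a heat map over the image, which is the link to the interpretability and weakly supervised segmentation results.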
  • 33. LEARN TO PAY ATTENTION ▸ Experiments & Results ▸ Image classification and fine-grained recognition
  • 34. LEARN TO PAY ATTENTION ▸ Experiments & Results ▸ Weakly supervised semantic segmentation
  • 35. LOOK CLOSER TO SEE BETTER ▸ Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition" ▸ Fine-grained image recognition: ▸ Discriminative region localization + fine-grained feature learning
  • 36. LOOK CLOSER TO SEE BETTER ▸ Recurrent Attention Convolutional Neural Network (RA-CNN) ▸ Multi-scale networks: classification sub-network, attention proposal sub-network (APN) ▸ Finer-scale network (coarse to fine) ▸ Intra-scale softmax loss for classification, inter-scale pairwise ranking loss for APN
  • 37. LOOK CLOSER TO SEE BETTER ▸ RA-CNN architecture: [Figure label: bilinear interpolation to amplify]
  • 38. LOOK CLOSER TO SEE BETTER ▸ Training: ▸ Multi-task loss: [equation not preserved; figure label: forces]
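The two loss terms named on slide 36 can be combined in a short sketch: a softmax cross-entropy per scale, plus a pairwise ranking term that is positive whenever a finer scale is not more confident in the true class than the coarser scale by at least a margin. The code is an illustration under stated assumptions, not the authors' implementation: it takes one logit vector per scale for a single image, and the default `margin=0.05` follows the value reported in the paper. Function names are hypothetical.

```python
import math


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_loss(p_coarse, p_fine, margin=0.05):
    """Hinge on true-class probabilities: zero only if p_fine >= p_coarse + margin."""
    return max(0.0, p_coarse - p_fine + margin)


def racnn_loss(logits_per_scale, label, margin=0.05):
    """Sum of per-scale classification losses and adjacent-scale ranking losses."""
    probs = [softmax(logits)[label] for logits in logits_per_scale]
    cls = sum(-math.log(p) for p in probs)
    rank = sum(rank_loss(probs[s], probs[s + 1], margin)
               for s in range(len(probs) - 1))
    return cls + rank


if __name__ == "__main__":
    # The finer scale (second entry) is less confident, so the ranking term is active.
    print(racnn_loss([[2.0, 0.0], [1.0, 0.0]], label=0))
```

The classification term trains the feature extractor at each scale, while the ranking term is the signal that pushes the APN to propose regions where zooming in actually helps.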
  • 39. LOOK CLOSER TO SEE BETTER ▸ Experiments & Results ▸ CUB-200-2011 Bird Dataset
  • 40. LOOK CLOSER TO SEE BETTER ▸ Experiments & Results ▸ Stanford Dogs, Stanford Cars
  • 41. SUMMARY ▸ Attention for efficiency, better performance, interpretability ▸ Many types of attention: ▸ RAM ▸ STN ▸ RAM+STN ▸ Others
  • 43. REFERENCES ▸ Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015, ICLR. "Neural Machine Translation by Jointly Learning to Align and Translate" ▸ Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015, ICML. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" ▸ Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, 2014, NIPS. "Recurrent Models of Visual Attention" ▸ Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition with Visual Attention" ▸ Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks" ▸ Jason Kuen, Zhenhua Wang, Gang Wang, 2016, CVPR. "Recurrent Attentional Networks for Saliency Detection" ▸ Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang, 2018, CVPR. "Generative Image Inpainting with Contextual Attention" ▸ Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr, 2018, ICLR. "Learn to Pay Attention" ▸ Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition"