SlideShare una empresa de Scribd logo
1 de 25
Descargar para leer sin conexión
[course site]
Image Retrieval
Day 3 Lecture 6
Eva Mohedano
Content Based Image Retrieval
2
Given an image query, generate a rank of all
similar images.
Classification
3
Query: This chair Results from dataset classified as “chair”
Retrieval
4
Query: This chair Similar images
Retrieval Pipeline
5
Image RepresentationsQuery image
Image
Dataset
Image Matching Ranking List
Similarity score Image
..
.
0.98
0.97
0.10
0.01
v = (v1
, …, vn
)
v1
= (v11
, …, v1n
)
vk
= (vk1
, …, vkn
)
...
Euclidean distance
Cosine Similarity
Similarity
Metric
Retrieval Pipeline
6
v1
= (v11
, …, v1n
)
vk
= (vk1
, …, vkn
)
...
k feature vectors per
image
Bag of Visual
Words
N-Dimensional
feature space
M visual words
(M clusters)
INVERTED FILE
word Image ID
1 1, 12,
2 1, 30, 102
3 10, 12
4 2,3
6 10
...
Large vocabularies (50k-1M)
Very fast!
Typically used with SIFT features
CNN for retrieval
7
Classification Object Detection
Segmentation
Off-the-shelf CNN representations
8
Babenko, A., Slesarev, A., Chigorin, A., & Lempitsky, V. (2014). Neural codes for image retrieval. In ECCV
Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: an astounding baseline for recognition. In CVPRW
FC layers as global feature representation
Off-the-shelf CNN representations
9
sum/max pool conv features across filters
Babenko, A., & Lempitsky, V. (2015). Aggregating local deep features for image retrieval. ICCV
Tolias, G., Sicre, R., & Jégou, H. (2015). Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879.
Kalantidis, Y., Mellina, C., & Osindero, S. (2015). Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv preprint arXiv:1512.04065.
Off-the-shelf CNN representations
10
Descriptors from convolutional layers
Off-the-shelf CNN representations
11
R-MAC: Regional Maximum Activation of Convolutions
Tolias, G., Sicre, R., & Jégou, H. (2015). Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint
arXiv:1511.05879.
Off-the-shelf CNN representations
12
BoW, VLAD encoding of conv features
Ng, J., Yang, F., & Davis, L. (2015). Exploiting local features from deep networks for image retrieval. In CVPRW
Mohedano, E., Salvador A., McGuinnes K, Marques F, O’Connor N, Giro-i-Nieto X (2016). Bags of Local Convolutional Features for Scalable Instance
Search. In ICMR
Off-the-shelf CNN representations
13
(336x256)
Resolution
conv5_1 from
VGG16
(42x32)
25K centroids 25K-D vector
Descriptors from convolutional layers
Mohedano, E., Salvador A., McGuinnes K, Marques F, O’Connor N, Giro-i-Nieto X (2016). Bags of Local Convolutional Features for Scalable
Instance Search. In ICMR
Off-the-shelf CNN representations
14
(336x256)
Resolution
conv5_1 from
VGG16
(42x32)
25K centroids 25K-D vector
Descriptors from convolutional layers
Off-the-shelf CNN representations
15
Paris Buildings 6k Oxford Buildings 5k
TRECVID Instance Search 2013
(subset of 23k frames)
[7] Kalantidis, Y., Mellina, C., & Osindero, S. (2015).
Cross-dimensional Weighting for Aggregated Deep Convolutional
Features. arXiv preprint arXiv:1512.04065.
Mohedano, E., Salvador A., McGuinnes K, Marques F, O’Connor
N, Giro-i-Nieto X (2016). Bags of Local Convolutional Features for
Scalable Instance Search. In ICMR
Off-the-shelf CNN representations
CNN representations
- l2 Normalization + PCA whitening + l2 Normalization
- Cosine similarity
- Convolutional features better than fully connected features
- Convolutional features keep spatial information → Retrieval+object location
- Convolutional layers allows custom input size.
- If data labels available, fine tuning the network to the image domain improves
CNN representations.
16
Learning representations for retrieval
Siamese Network: Network to learn a function that maps input
patterns into a target space such that l2-norm in the target
space approximates the semantic distance in the input space.
Applied in:
Dimensionality reduction[1]
Face verification[2]
Learning local image representations[3]
17
[1] Song, H.O., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: CVPR.
[2] S. Chopra, R. Hadsell and Y. LeCun, Learning a similarity metric discriminatively, with application to face verification.(CVPR'05)
[3] Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, and F. Moreno-Noguer. Fracking deep convolutional image descriptors. CoRR,
abs/1412.6537, 2014
Learning representations for retrieval
Siamese Network: Network to learn a function that maps input
patterns into a target space such that l2-norm in the target
space approximates the semantic distance in the input space.
18
Image from: Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, and F. Moreno-Noguer. Fracking deep convolutional image descriptors. CoRR,
abs/1412.6537, 2014
Learning representations for retrieval
Siamese Network with Triplet Loss: Loss function minimizes distance between query and
positive and maximizes distance between query and negative
19
Schroff, F; Kalenichenko, D and Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR 2015
w w
CNN CNN CNN
a p n
L2 embedding space
Triplet Loss
Learning representations for retrieval
20
Deep Image Retrieval: Learning global representations for image
search, Gordo A. et al. Xerox Research Centre, 2016
- R-MAC representation
- Learning descriptors for retrieval using three channels
siamese loss: Ranking objective:
- Learning where to pool within an image: predicting object
locations
- Local features (from predicted ROI) pooled into a more
discriminative space (learned fc)
- Building and cleaning a dataset to generate triplets
Learning representations for retrieval
21
Learning representations for retrieval
22
Deep Image Retrieval: Learning global representations for image search,
Gordo A. et al. Xerox Research Centre, 2016
Dataset: Landmarks dataset:
● 214K images of 672 famous landmark site.
● Dataset processing based on a matching
baseline: SIFT + Hessian-Affine keypoint
detector.
● Important to select the “useful” triplets.
Learning representations for retrieval
23
Deep Image Retrieval: Learning global representations for image search,
Gordo A. et al. Xerox Research Centre, 2016
Comparison between training for Classification (C) of training for Rankings (R)
Learning representations for retrieval
24
Deep Image Retrieval: Learning global representations for image search,
Gordo A. et al. Xerox Research Centre, 2016
Summary
25
Pre-trained CNN are useful to generate image descriptors for retrieval
Convolutional layers allow us to encode local information
Knowing how to rank similarity is the primary task in retrieval
Designing CNN architectures to learn how to rank

Más contenido relacionado

La actualidad más candente

Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkNader Karimi
 
Object Detection Methods using Deep Learning
Object Detection Methods using Deep LearningObject Detection Methods using Deep Learning
Object Detection Methods using Deep LearningSungjoon Choi
 
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...Universitat Politècnica de Catalunya
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationDat Nguyen
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Universitat Politècnica de Catalunya
 
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...Universitat Politècnica de Catalunya
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksUsman Qayyum
 
Transformer in Computer Vision
Transformer in Computer VisionTransformer in Computer Vision
Transformer in Computer VisionDongmin Choi
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetRishabh Indoria
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
YolactEdge Review [cdm]
YolactEdge Review [cdm]YolactEdge Review [cdm]
YolactEdge Review [cdm]Dongmin Choi
 
How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...Dongmin Choi
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Jihong Kang
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Dmytro Mishkin
 

La actualidad más candente (20)

Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 
Object Detection Methods using Deep Learning
Object Detection Methods using Deep LearningObject Detection Methods using Deep Learning
Object Detection Methods using Deep Learning
 
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
Visualization of Deep Learning Models (D1L6 2017 UPC Deep Learning for Comput...
 
Deep 3D Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2018
Deep 3D Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2018Deep 3D Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2018
Deep 3D Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2018
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
 
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
 
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
Deep Learning for Computer Vision: Transfer Learning and Domain Adaptation (U...
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Transformer in Computer Vision
Transformer in Computer VisionTransformer in Computer Vision
Transformer in Computer Vision
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
 
YolactEdge Review [cdm]
YolactEdge Review [cdm]YolactEdge Review [cdm]
YolactEdge Review [cdm]
 
How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...
 

Similar a Deep Learning for Computer Vision: Image Retrieval (UPC 2016)

[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-ResolutionTaegyun Jeon
 
Fcv learn yu
Fcv learn yuFcv learn yu
Fcv learn yuzukun
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Universitat Politècnica de Catalunya
 
[Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Featu...
[Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Featu...[Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Featu...
[Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Featu...Susang Kim
 
Semantic Mapping of Road Scenes
Semantic Mapping of Road ScenesSemantic Mapping of Road Scenes
Semantic Mapping of Road ScenesSunando Sengupta
 
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Universitat Politècnica de Catalunya
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - Hiroshi Fukui
 
Bundling Features for Large Scale Partial-Duplicate Web Image Search
Bundling Features for Large Scale Partial-Duplicate Web Image SearchBundling Features for Large Scale Partial-Duplicate Web Image Search
Bundling Features for Large Scale Partial-Duplicate Web Image SearchWin Yu
 
Currency recognition on mobile phones
Currency recognition on mobile phonesCurrency recognition on mobile phones
Currency recognition on mobile phoneshabeebsab
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...CSCJournals
 
Large Scale Image Retrieval 2022.pdf
Large Scale Image Retrieval 2022.pdfLarge Scale Image Retrieval 2022.pdf
Large Scale Image Retrieval 2022.pdfSamuCerezo
 
Deep learning in Computer Vision
Deep learning in Computer VisionDeep learning in Computer Vision
Deep learning in Computer VisionDavid Dao
 
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...Seiya Ito
 
Face Detection for identification of people in Images of Internet
Face Detection for identification of people in Images of InternetFace Detection for identification of people in Images of Internet
Face Detection for identification of people in Images of Internetijceronline
 
Remotesensing 07-14680##
Remotesensing 07-14680##Remotesensing 07-14680##
Remotesensing 07-14680##Ajaz Ansari
 

Similar a Deep Learning for Computer Vision: Image Retrieval (UPC 2016) (20)

[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
 
Fcv learn yu
Fcv learn yuFcv learn yu
Fcv learn yu
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
 
[Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Featu...
[Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Featu...[Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Featu...
[Paper] GIRAFFE: Representing Scenes as Compositional Generative Neural Featu...
 
Semantic Mapping of Road Scenes
Semantic Mapping of Road ScenesSemantic Mapping of Road Scenes
Semantic Mapping of Road Scenes
 
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
 
EIS_REVIEW_1.pptx
EIS_REVIEW_1.pptxEIS_REVIEW_1.pptx
EIS_REVIEW_1.pptx
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に -
 
Bundling Features for Large Scale Partial-Duplicate Web Image Search
Bundling Features for Large Scale Partial-Duplicate Web Image SearchBundling Features for Large Scale Partial-Duplicate Web Image Search
Bundling Features for Large Scale Partial-Duplicate Web Image Search
 
conv_nets.pptx
conv_nets.pptxconv_nets.pptx
conv_nets.pptx
 
Lec07 aggregation-and-retrieval-system
Lec07 aggregation-and-retrieval-systemLec07 aggregation-and-retrieval-system
Lec07 aggregation-and-retrieval-system
 
Currency recognition on mobile phones
Currency recognition on mobile phonesCurrency recognition on mobile phones
Currency recognition on mobile phones
 
Blurclassification
BlurclassificationBlurclassification
Blurclassification
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
 
Large Scale Image Retrieval 2022.pdf
Large Scale Image Retrieval 2022.pdfLarge Scale Image Retrieval 2022.pdf
Large Scale Image Retrieval 2022.pdf
 
Deep learning in Computer Vision
Deep learning in Computer VisionDeep learning in Computer Vision
Deep learning in Computer Vision
 
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
 
Face Detection for identification of people in Images of Internet
Face Detection for identification of people in Images of InternetFace Detection for identification of people in Images of Internet
Face Detection for identification of people in Images of Internet
 
Remotesensing 07-14680##
Remotesensing 07-14680##Remotesensing 07-14680##
Remotesensing 07-14680##
 
Thesis Presentation
Thesis PresentationThesis Presentation
Thesis Presentation
 

Más de Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 

Más de Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 

Último

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 

Último (20)

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 

Deep Learning for Computer Vision: Image Retrieval (UPC 2016)

  • 1. [course site] Image Retrieval Day 3 Lecture 6 Eva Mohedano
  • 2. Content Based Image Retrieval 2 Given an image query, generate a rank of all similar images.
  • 3. Classification 3 Query: This chair Results from dataset classified as “chair”
  • 5. Retrieval Pipeline 5 Image RepresentationsQuery image Image Dataset Image Matching Ranking List Similarity score Image .. . 0.98 0.97 0.10 0.01 v = (v1 , …, vn ) v1 = (v11 , …, v1n ) vk = (vk1 , …, vkn ) ... Euclidean distance Cosine Similarity Similarity Metric
  • 6. Retrieval Pipeline 6 v1 = (v11 , …, v1n ) vk = (vk1 , …, vkn ) ... k feature vectors per image Bag of Visual Words N-Dimensional feature space M visual words (M clusters) INVERTED FILE word Image ID 1 1, 12, 2 1, 30, 102 3 10, 12 4 2,3 6 10 ... Large vocabularies (50k-1M) Very fast! Typically used with SIFT features
  • 7. CNN for retrieval 7 Classification Object Detection Segmentation
  • 8. Off-the-shelf CNN representations 8 Babenko, A., Slesarev, A., Chigorin, A., & Lempitsky, V. (2014). Neural codes for image retrieval. In ECCV Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: an astounding baseline for recognition. In CVPRW FC layers as global feature representation
  • 9. Off-the-shelf CNN representations 9 sum/max pool conv features across filters Babenko, A., & Lempitsky, V. (2015). Aggregating local deep features for image retrieval. ICCV Tolias, G., Sicre, R., & Jégou, H. (2015). Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879. Kalantidis, Y., Mellina, C., & Osindero, S. (2015). Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv preprint arXiv:1512.04065.
  • 11. Off-the-shelf CNN representations 11 R-MAC: Regional Maximum Activation of Convolutions Tolias, G., Sicre, R., & Jégou, H. (2015). Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879.
  • 12. Off-the-shelf CNN representations 12 BoW, VLAD encoding of conv features Ng, J., Yang, F., & Davis, L. (2015). Exploiting local features from deep networks for image retrieval. In CVPRW Mohedano, E., Salvador A., McGuinnes K, Marques F, O’Connor N, Giro-i-Nieto X (2016). Bags of Local Convolutional Features for Scalable Instance Search. In ICMR
  • 13. Off-the-shelf CNN representations 13 (336x256) Resolution conv5_1 from VGG16 (42x32) 25K centroids 25K-D vector Descriptors from convolutional layers Mohedano, E., Salvador A., McGuinnes K, Marques F, O’Connor N, Giro-i-Nieto X (2016). Bags of Local Convolutional Features for Scalable Instance Search. In ICMR
  • 14. Off-the-shelf CNN representations 14 (336x256) Resolution conv5_1 from VGG16 (42x32) 25K centroids 25K-D vector Descriptors from convolutional layers
  • 15. Off-the-shelf CNN representations 15 Paris Buildings 6k Oxford Buildings 5k TRECVID Instance Search 2013 (subset of 23k frames) [7] Kalantidis, Y., Mellina, C., & Osindero, S. (2015). Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv preprint arXiv:1512.04065. Mohedano, E., Salvador A., McGuinnes K, Marques F, O’Connor N, Giro-i-Nieto X (2016). Bags of Local Convolutional Features for Scalable Instance Search. In ICMR
  • 16. Off-the-shelf CNN representations CNN representations - l2 Normalization + PCA whitening + l2 Normalization - Cosine similarity - Convolutional features better than fully connected features - Convolutional features keep spatial information → Retrieval+object location - Convolutional layers allows custom input size. - If data labels available, fine tuning the network to the image domain improves CNN representations. 16
  • 17. Learning representations for retrieval Siamese Network: Network to learn a function that maps input patterns into a target space such that l2-norm in the target space approximates the semantic distance in the input space. Applied in: Dimensionality reduction[1] Face verification[2] Learning local image representations[3] 17 [1] Song, H.O., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: CVPR. [2] S. Chopra, R. Hadsell and Y. LeCun, Learning a similarity metric discriminatively, with application to face verification.(CVPR'05) [3] Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, and F. Moreno-Noguer. Fracking deep convolutional image descriptors. CoRR, abs/1412.6537, 2014
  • 18. Learning representations for retrieval Siamese Network: Network to learn a function that maps input patterns into a target space such that l2-norm in the target space approximates the semantic distance in the input space. 18 Image from: Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, and F. Moreno-Noguer. Fracking deep convolutional image descriptors. CoRR, abs/1412.6537, 2014
  • 19. Learning representations for retrieval Siamese Network with Triplet Loss: Loss function minimizes distance between query and positive and maximizes distance between query and negative 19 Schroff, F; Kalenichenko, D and Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR 2015 w w CNN CNN CNN a p n L2 embedding space Triplet Loss
  • 20. Learning representations for retrieval 20 Deep Image Retrieval: Learning global representations for image search, Gordo A. et al. Xerox Research Centre, 2016 - R-MAC representation - Learning descriptors for retrieval using three channels siamese loss: Ranking objective: - Learning where to pool within an image: predicting object locations - Local features (from predicted ROI) pooled into a more discriminative space (learned fc) - Building and cleaning a dataset to generate triplets
  • 22. Learning representations for retrieval 22 Deep Image Retrieval: Learning global representations for image search, Gordo A. et al. Xerox Research Centre, 2016 Dataset: Landmarks dataset: ● 214K images of 672 famous landmark site. ● Dataset processing based on a matching baseline: SIFT + Hessian-Affine keypoint detector. ● Important to select the “useful” triplets.
  • 23. Learning representations for retrieval 23 Deep Image Retrieval: Learning global representations for image search, Gordo A. et al. Xerox Research Centre, 2016 Comparison between training for Classification (C) of training for Rankings (R)
  • 24. Learning representations for retrieval 24 Deep Image Retrieval: Learning global representations for image search, Gordo A. et al. Xerox Research Centre, 2016
  • 25. Summary 25 Pre-trained CNN are useful to generate image descriptors for retrieval Convolutional layers allow us to encode local information Knowing how to rank similarity is the primary task in retrieval Designing CNN architectures to learn how to rank