SlideShare una empresa de Scribd logo
1 de 28
Descargar para leer sin conexión
DEEP
LEARNING
WORKSHOP
Dublin City University
27-28 April 2017
Xavier Giro-i-Nieto
xavier.giro@upc.edu
Associate Professor
Universitat Politecnica de Catalunya
Technical University of Catalonia
Language and Vision
Day 2 Lecture 11
#InsightDL2017
2
Acknowledgments
Santi Pascual
3
Previously in the RNN lecture...
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger
Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for
statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
Language IN
Language OUT
4
Motivation
5
Encoder-Decoder: Beyond text
6
Captioning: Show & Tell
Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption
generator." CVPR 2015.
7
Captioning: DeepImageSent
(Slides by Marc Bolaños): Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating
image descriptions." CVPR 2015
8
Captioning: DeepImageSent
(Slides by Marc Bolaños): Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for
generating image descriptions." CVPR 2015
only takes into account
image features in the first
hidden state
Multimodal Recurrent
Neural Network
9
Captioning: Show & Tell
Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption
generator." CVPR 2015.
10
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for
dense captioning." CVPR 2016
Captioning (+ Detection): DenseCap
11
Captioning (+ Detection): DenseCap
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for
dense captioning." CVPR 2016
12
Captioning (+ Detection): DenseCap
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for
dense captioning." CVPR 2016
XAVI: “man has
short hair”, “man
with short hair”
AMAIA:”a woman
wearing a black
shirt”, “
BOTH: “two men
wearing black
glasses”
13
Captioning (+ Retrieval): DenseCap
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for
dense captioning." CVPR 2016
14
Captioning: Video
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor
Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. code
15
( Slides by Marc Bolaños) Pingbo Pan, Zhongwen Xu, Yi Yang,Fei Wu,Yueting Zhuang Hierarchical
Recurrent Neural Encoder for Video Representation with Application to Captioning, CVPR 2016.
LSTM unit
(2nd layer)
Time
Image
t = 1 t = T
hidden state
at t = T
first chunk
of data
Captioning: Video
16
Visual Question Answering
[z1
, z2
, … zN
] [y1
, y2
, … yM
]
“Is economic growth decreasing ?”
“Yes”
Encode
Encode
Decode
17
Antol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and
Devi Parikh. "VQA: Visual question answering." CVPR 2015.
Visual Question Answering
18
Extract visual
features
Embedding
Predict answerMerge
Question
What object is flying?
Answer
Kite
Visual Question Answering
Slide credit: Issey Masuda
19
Visual Question Answering
Noh, H., Seo, P. H., & Han, B. Image question answering using convolutional neural network with dynamic parameter
prediction. CVPR 2016
Dynamic Parameter Prediction Network (DPPnet)
20
Visual Question Answering: Dynamic
(Slides and Slidecast by Santi Pascual): Xiong, Caiming, Stephen Merity, and Richard Socher. "Dynamic Memory Networks
for Visual and Textual Question Answering." arXiv preprint arXiv:1603.01417 (2016).
21
Visual Question Answering: Dynamic
(Slides and Slidecast by Santi Pascual): Xiong, Caiming, Stephen Merity, and Richard Socher. "Dynamic Memory Networks
for Visual and Textual Question Answering." ICML 2016.
Main idea: split image into local regions.
Consider each region equivalent to a
sentence.
Local Region Feature Extraction: CNN
(VGG-19):
(1) Rescale input to 448x448.
(2) Take output from last pooling layer →
D=512x14x14 → 196 512-d local region vectors.
Visual feature embedding: W matrix to project
image features to “q”-textual space.
22
Visual Question Answering: Grounded
(Slides and Screencast by Issey Masuda): Zhu, Yuke, Oliver Groth, Michael Bernstein, and Li Fei-Fei."Visual7W: Grounded
Question Answering in Images." CVPR 2016.
23
Visual Dialog (Image Guessing Game)
Das, Abhishek, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra.
"Visual Dialog." CVPR 2017
24
Visual Dialog (Image Guessing Game)
Das, Abhishek, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra.
"Visual Dialog." CVPR 2017
25
Visual Reasoning
Johnson, Justin, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. "CLEVR:
A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning." CVPR 2017
26
Conclusions
New Turing test? How to evaluate AI’s image
understanding?
Slide credit: Issey Masuda
27
Learn more
Julia Hockenmeirer
28
Thanks ! Q&A ?
Follow me at
https://imatge.upc.edu/web/people/xavier-giro
@DocXavi
/ProfessorXavi

Más contenido relacionado

La actualidad más candente

Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaUniversitat Politècnica de Catalunya
 
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019Universitat Politècnica de Catalunya
 
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaUniversitat Politècnica de Catalunya
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Universitat Politècnica de Catalunya
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Universitat Politècnica de Catalunya
 
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Universitat Politècnica de Catalunya
 
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019Universitat Politècnica de Catalunya
 
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019Universitat Politècnica de Catalunya
 

La actualidad más candente (20)

Neural Architectures for Video Encoding
Neural Architectures for Video EncodingNeural Architectures for Video Encoding
Neural Architectures for Video Encoding
 
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
 
Deep Video Object Tracking - Xavier Giro - UPC Barcelona 2019
Deep Video Object Tracking - Xavier Giro - UPC Barcelona 2019Deep Video Object Tracking - Xavier Giro - UPC Barcelona 2019
Deep Video Object Tracking - Xavier Giro - UPC Barcelona 2019
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
 
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
 
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
 
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
 
Deep Learning from Videos (UPC 2018)
Deep Learning from Videos (UPC 2018)Deep Learning from Videos (UPC 2018)
Deep Learning from Videos (UPC 2018)
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
 
Deep Audio and Vision - Eva Mohedano - UPC Barcelona 2018
Deep Audio and Vision - Eva Mohedano - UPC Barcelona 2018Deep Audio and Vision - Eva Mohedano - UPC Barcelona 2018
Deep Audio and Vision - Eva Mohedano - UPC Barcelona 2018
 
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
 
Deep Learning for Video: Language (UPC 2018)
Deep Learning for Video: Language (UPC 2018)Deep Learning for Video: Language (UPC 2018)
Deep Learning for Video: Language (UPC 2018)
 
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019
Self-supervised Audiovisual Learning - Xavier Giro - UPC Barcelona 2019
 
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
 

Similar a Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)

Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018Universitat Politècnica de Catalunya
 
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Universitat Politècnica de Catalunya
 
Modeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksModeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksNAVER Engineering
 
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Universitat Politècnica de Catalunya
 
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)Universitat Politècnica de Catalunya
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
210610 SSIIi2021 Computer Vision x Trasnformer
210610 SSIIi2021 Computer Vision x Trasnformer210610 SSIIi2021 Computer Vision x Trasnformer
210610 SSIIi2021 Computer Vision x Trasnformerexwzds
 
Sequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural NetworksSequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural NetworksNguyen Quang
 
A Survey on Cross-Modal Embedding
A Survey on Cross-Modal EmbeddingA Survey on Cross-Modal Embedding
A Survey on Cross-Modal EmbeddingYasuhide Miura
 
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Universitat Politècnica de Catalunya
 
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-ResolutionTaegyun Jeon
 
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...Universitat Politècnica de Catalunya
 

Similar a Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017) (20)

Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
 
Deep Language and Vision - Xavier Giro-i-Nieto - UPC Barcelona 2018
Deep Language and Vision - Xavier Giro-i-Nieto - UPC Barcelona 2018Deep Language and Vision - Xavier Giro-i-Nieto - UPC Barcelona 2018
Deep Language and Vision - Xavier Giro-i-Nieto - UPC Barcelona 2018
 
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
 
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
Neural Machine Translation (D2L10 Insight@DCU Machine Learning Workshop 2017)
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
 
Deep Learning for Video: Action Recognition (UPC 2018)
Deep Learning for Video: Action Recognition (UPC 2018)Deep Learning for Video: Action Recognition (UPC 2018)
Deep Learning for Video: Action Recognition (UPC 2018)
 
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Deep Learning for Computer Vision: Video Analytics (UPC 2016)Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
 
Modeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networksModeling perceptual similarity and shift invariance in deep networks
Modeling perceptual similarity and shift invariance in deep networks
 
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
 
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
 
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
One Perceptron to Rule Them All (Re-Work Deep Learning Summit, London 2017)
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
 
210610 SSIIi2021 Computer Vision x Trasnformer
210610 SSIIi2021 Computer Vision x Trasnformer210610 SSIIi2021 Computer Vision x Trasnformer
210610 SSIIi2021 Computer Vision x Trasnformer
 
Sequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural NetworksSequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural Networks
 
A Survey on Cross-Modal Embedding
A Survey on Cross-Modal EmbeddingA Survey on Cross-Modal Embedding
A Survey on Cross-Modal Embedding
 
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
 
Deep and Young Vision Learning at UPC BarcelonaTech (NIPS 2016)
Deep and Young Vision Learning at UPC BarcelonaTech (NIPS 2016)Deep and Young Vision Learning at UPC BarcelonaTech (NIPS 2016)
Deep and Young Vision Learning at UPC BarcelonaTech (NIPS 2016)
 
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
[OSGeo-KR Tech Workshop] Deep Learning for Single Image Super-Resolution
 
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
Wav2Pix: Speech-conditioned face generation using Generative Adversarial Netw...
 

Más de Universitat Politècnica de Catalunya

Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Universitat Politècnica de Catalunya
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Universitat Politècnica de Catalunya
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic ModerationHate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic ModerationUniversitat Politècnica de Catalunya
 

Más de Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Backpropagation for Deep Learning
Backpropagation for Deep LearningBackpropagation for Deep Learning
Backpropagation for Deep Learning
 
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic ModerationHate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
 

Último

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 

Último (20)

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 

Language and Vision (D2L11 Insight@DCU Machine Learning Workshop 2017)

  • 1. DEEP LEARNING WORKSHOP Dublin City University 27-28 April 2017 Xavier Giro-i-Nieto xavier.giro@upc.edu Associate Professor Universitat Politecnica de Catalunya Technical University of Catalonia Language and Vision Day 2 Lecture 11 #InsightDL2017
  • 3. 3 Previously in the RNN lecture... Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014). Language IN Language OUT
  • 6. 6 Captioning: Show & Tell Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption generator." CVPR 2015.
  • 7. 7 Captioning: DeepImageSent (Slides by Marc Bolaños): Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR 2015
  • 8. 8 Captioning: DeepImageSent (Slides by Marc Bolaños): Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR 2015 only takes into account image features in the first hidden state Multimodal Recurrent Neural Network
  • 9. 9 Captioning: Show & Tell Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption generator." CVPR 2015.
  • 10. 10 Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for dense captioning." CVPR 2016 Captioning (+ Detection): DenseCap
  • 11. 11 Captioning (+ Detection): DenseCap Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for dense captioning." CVPR 2016
  • 12. 12 Captioning (+ Detection): DenseCap Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for dense captioning." CVPR 2016 XAVI: “man has short hair”, “man with short hair” AMAIA:”a woman wearing a black shirt”, “ BOTH: “two men wearing black glasses”
  • 13. 13 Captioning (+ Retrieval): DenseCap Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. "Densecap: Fully convolutional localization networks for dense captioning." CVPR 2016
  • 14. 14 Captioning: Video Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrel. Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR 2015. code
  • 15. 15 ( Slides by Marc Bolaños) Pingbo Pan, Zhongwen Xu, Yi Yang,Fei Wu,Yueting Zhuang Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning, CVPR 2016. LSTM unit (2nd layer) Time Image t = 1 t = T hidden state at t = T first chunk of data Captioning: Video
  • 16. 16 Visual Question Answering [z1 , z2 , … zN ] [y1 , y2 , … yM ] “Is economic growth decreasing ?” “Yes” Encode Encode Decode
  • 17. 17 Antol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. "VQA: Visual question answering." CVPR 2015. Visual Question Answering
  • 18. 18 Extract visual features Embedding Predict answerMerge Question What object is flying? Answer Kite Visual Question Answering Slide credit: Issey Masuda
  • 19. 19 Visual Question Answering Noh, H., Seo, P. H., & Han, B. Image question answering using convolutional neural network with dynamic parameter prediction. CVPR 2016 Dynamic Parameter Prediction Network (DPPnet)
  • 20. 20 Visual Question Answering: Dynamic (Slides and Slidecast by Santi Pascual): Xiong, Caiming, Stephen Merity, and Richard Socher. "Dynamic Memory Networks for Visual and Textual Question Answering." arXiv preprint arXiv:1603.01417 (2016).
  • 21. 21 Visual Question Answering: Dynamic (Slides and Slidecast by Santi Pascual): Xiong, Caiming, Stephen Merity, and Richard Socher. "Dynamic Memory Networks for Visual and Textual Question Answering." ICML 2016. Main idea: split image into local regions. Consider each region equivalent to a sentence. Local Region Feature Extraction: CNN (VGG-19): (1) Rescale input to 448x448. (2) Take output from last pooling layer → D=512x14x14 → 196 512-d local region vectors. Visual feature embedding: W matrix to project image features to “q”-textual space.
  • 22. 22 Visual Question Answering: Grounded (Slides and Screencast by Issey Masuda): Zhu, Yuke, Oliver Groth, Michael Bernstein, and Li Fei-Fei."Visual7W: Grounded Question Answering in Images." CVPR 2016.
  • 23. 23 Visual Dialog (Image Guessing Game) Das, Abhishek, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra. "Visual Dialog." CVPR 2017
  • 24. 24 Visual Dialog (Image Guessing Game) Das, Abhishek, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra. "Visual Dialog." CVPR 2017
  • 25. 25 Visual Reasoning Johnson, Justin, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning." CVPR 2017
  • 26. 26 Conclusions New Turing test? How to evaluate AI’s image understanding? Slide credit: Issey Masuda
  • 28. 28 Thanks ! Q&A ? Follow me at https://imatge.upc.edu/web/people/xavier-giro @DocXavi /ProfessorXavi