Xavier Giro-i-Nieto
@DocXavi
xavier.giro@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Open Challenges in
Sign Language Translation & Production
UPC Intelligent Data Science and Artificial Intelligence
VASC Seminar
September 8, 2021
Current & former students
2
Benet Oriol
Jordi Aguilar
Cayetana López
Lucas Ventura
Amanda Duarte
Laia Tarrés
Andrea Iturralde
Maram A. Mohamed
Álvaro Budria
Sandra Roca
Daniel Moreno
Janna Escur
Mireia Hernández
Peter Muschick
Pol Pérez
Görkem Camli
Jordi López
Gerard Gállego
Acknowledgements
3
Shruti Palaskar
Deepti Ghadiyaram
Kenneth DeHaan
Florian Metze
Francesc Moreno
Jordi Torres
Marta R. Costa-jussà
Kevin McGuinness
Outline
4
Motivation
A crash course on sign languages (SL)
State of the art
Challenges
Conclusion
Classic Motivation: Accessibility
5
“World Report on Hearing”. World Health Organization 2021.
Number of people and percentage prevalence according to grades of hearing loss.
Classic Motivation: Accessibility
6
Shelly Chadha, “Launch of the World Report on Hearing”. World Health Organization 2021.
Classic Motivation: Accessibility to basic services
7
“World Report on Hearing”. World Health Organization 2021.
● Sign language interpretation improves access to education and health services.
○ A survey conducted in 2009 by the World Federation of the Deaf revealed that 68% of the 93 responding countries did not have access to professional sign language interpretation.
○ Professional sign language interpreters are even scarcer in developing countries.
Classic Motivation: Accessibility
8
● New challenges for the deaf community
because of the COVID-19 pandemic.
https://whereistheinterpreter.com/
#whereistheinterpreter
“Due to the pandemic, more and more medical professionals are treating COVID-19 patients from behind a barrier, using masks that impede lip-reading, and not allowing in-person interpreters,” says the National Association of the Deaf.
Summer Epps, “COVID’s Forgotten Victims: The Deaf Community”. WebMD 2021.
Classic Motivation: Accessibility
9
Amit Moryossef, “Google Translate for Sign Language”. 2021. [talk] [code]
Classic Motivation: Accessibility
10
Google Home Max, Amazon Echo Show 10, Facebook Portal
Novel Motivation: Human-Computer Interaction
11
Samsung, How to use the Gesture Control on Smart TV? (2020)
Novel Motivation: Human-Computer Interaction
12
Novel Motivation: Human-Computer Interaction
13
[Diagram: Human-Computer Interaction, Human-Human Interaction, teaching that scales]
Outline
14
Motivation
A crash course on sign languages (SL)
State of the art
Challenges
Conclusion
A crash course on Sign Languages (SL)
Sign languages are culturally diverse, just like spoken languages:
○ American (ASL), British (BSL), German (DGS), Chinese (CSL)… sign languages.
15
Irish Sign Language (ISL), Catalan Sign Language (LSC)
A crash course on Sign Languages (SL)
Sign languages are NOT a one-to-one mapping from spoken languages.
16
Sign Language (video) → Look-Up Table → Spoken Language (transcription): “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”
A crash course on Sign Languages (SL)
There exists a textual transcription method called “glosses”.
17
Sign Language (transcription): HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR
Spoken Language (transcription): Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
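A minimal Python sketch of how a gloss line like the one above can be split into sign tokens. It assumes the conventions visible in this deck: uppercase, space-separated glosses, punctuation ignored, and an FS-/fs- prefix marking a fingerspelled word (both FS-AMELIA and fs-A-M-I-R appear in these slides). Real annotation schemes differ between corpora, and `parse_glosses` and `GlossToken` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GlossToken:
    text: str            # gloss label, e.g. "REMOVE" or "AMELIA"
    fingerspelled: bool  # True if the token carried the (assumed) FS- prefix

    def letters(self) -> list:
        """Letters a signer would fingerspell; empty for lexical signs."""
        return list(self.text) if self.fingerspelled else []


def parse_glosses(line: str) -> list:
    """Split a gloss transcription into GlossToken objects (hypothetical helper)."""
    tokens = []
    for raw in line.split():
        raw = raw.strip(",.?!").upper()  # punctuation is not part of a gloss
        if not raw:
            continue
        if raw.startswith("FS-"):
            # "FS-AMELIA" and "FS-A-M-I-R" both become one fingerspelled word
            tokens.append(GlossToken(raw[3:].replace("-", ""), fingerspelled=True))
        else:
            tokens.append(GlossToken(raw, fingerspelled=False))
    return tokens


tokens = parse_glosses("HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR")
print(len(tokens))                                  # 11
print([t.text for t in tokens if t.fingerspelled])  # ['AMELIA']
print(tokens[2].letters())                          # ['A', 'M', 'E', 'L', 'I', 'A']
```

Keeping the fingerspelling flag separate from the label lets a downstream model treat fingerspelled names as letter sequences rather than as rare whole-word glosses.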
A crash course on Sign Languages (SL)
● Manual features:
○ Handshape
○ Palm orientation
○ Movement
○ Location
● Non-manual features:
○ Head (nod / shake / tilt)
○ Mouth
○ Eyebrows
○ Cheeks
○ Facial grammar (or expressions)
○ Body position
18
Stokoe Jr, William C. "Sign language structure: An outline of the visual communication systems of the American deaf." Journal of deaf studies and deaf education (2005).
Figure: Arizona State University
A crash course on Sign Languages (SL)
SLs use persistent spatial grounding (e.g., by pointing and placing)!
19
Liddell, Scott K. "Spatial representations in discourse: Comparing spoken and signed language." Lingua (1996).
“Right along here…” ...immobile entity is located here,
A crash course on Sign Languages (SL)
SLs use persistent spatial grounding (e.g., by pointing and placing)!
20
Liddell, Scott K. "Spatial representations in discourse: Comparing spoken and signed language." Lingua (1996).
“Not far and to the right of,” ...tall, vertical entity at this place.
Outline
21
Motivation
A crash course on sign languages (SL)
State of the art
Challenges
Conclusion
Sign-to-Spoken Language Tasks
22
● SL Translation: Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
● Isolated SL Recognition: “I”
● Continuous SL Recognition: HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR
● Finger-spelling: A, B, C, D...
(Video: GIPHY/SIGNN WITH ROBERT)
Sign-to-Spoken Language Tasks
23
SL Translation Hi, I’m Amelia and I’m going to talk to you
about how to remove gum from hair.
Sign-Spoken Language Tasks
24
Sign Language (video) → SL Translation → Spoken Language (transcription)
Spoken Language (transcription) → SL Production → Sign Language (video)
Example transcription: “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”
Neural Machine Translation
25
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." NeurIPS 2014.
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.
Encoder → Representation → Decoder
English: Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
Irish: Dia duit, is mise Amelia agus beidh mé ag caint leat faoi conas guma a bhaint de ghruaig.
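The encoder → representation → decoder pattern on this slide can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions: a single-layer Elman RNN with random weights, a zero vector as the first decoder input, greedy decoding, and an end-of-sentence token at index 0 of the target vocabulary. It is not the LSTM/GRU models of Sutskever et al. or Cho et al., and every name is hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the random weights are reproducible
HIDDEN = 8      # size of the hidden state / sentence representation (assumed)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyEncoderDecoder:
    """Untrained Elman-RNN encoder-decoder with random weights (illustration only)."""

    def __init__(self, src_vocab, tgt_vocab):
        self.tgt_vocab = tgt_vocab  # assumption: index 0 is the end-of-sentence token
        self.src_emb = {w: [random.uniform(-1, 1) for _ in range(HIDDEN)] for w in src_vocab}
        self.tgt_emb = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in tgt_vocab]
        self.w_enc = rand_matrix(HIDDEN, 2 * HIDDEN)
        self.w_dec = rand_matrix(HIDDEN, 2 * HIDDEN)
        self.w_out = rand_matrix(len(tgt_vocab), HIDDEN)

    @staticmethod
    def _step(w, x, h):
        # One recurrent step: h_t = tanh(W [x_t; h_{t-1}])
        return [math.tanh(a) for a in matvec(w, x + h)]

    def encode(self, words):
        h = [0.0] * HIDDEN
        for word in words:
            h = self._step(self.w_enc, self.src_emb[word], h)
        return h  # fixed-size "Representation" of the whole input sentence

    def decode(self, representation, max_len=10):
        h, prev, output = representation, [0.0] * HIDDEN, []
        for _ in range(max_len):
            h = self._step(self.w_dec, prev, h)
            logits = matvec(self.w_out, h)
            best = max(range(len(logits)), key=logits.__getitem__)  # greedy choice
            if best == 0:  # end-of-sentence token
                break
            output.append(self.tgt_vocab[best])
            prev = self.tgt_emb[best]  # feed the chosen word back into the decoder
        return output

    def translate(self, words, max_len=10):
        return self.decode(self.encode(words), max_len)


model = ToyEncoderDecoder(["hi", "i'm", "amelia"], ["</s>", "dia", "duit", "is", "mise", "amelia"])
print(model.translate(["hi", "i'm", "amelia"]))  # untrained, so the tokens are arbitrary
```

Only the interface matters here: whatever the input modality (text, speech, video), the encoder compresses it into a representation and the decoder emits one target token at a time, which is the same diagram reused in the ASR, captioning and sign language slides.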
Automatic Speech Recognition (ASR)
26
Encoder → Representation → Decoder
Speech → Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
Graves, Alex, and Navdeep Jaitly. "Towards end-to-end speech recognition with recurrent neural networks." ICML 2014.
#LAS Chan, William, Navdeep Jaitly, Quoc Le, and Oriol Vinyals. "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition." ICASSP 2016.
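Graves and Jaitly train their recognizer with Connectionist Temporal Classification (CTC), which predicts one label (or a special blank) per input frame. Its simplest decoding rule, best-path or greedy decoding, is: take the most likely label at every frame, merge consecutive repeats, then drop blanks. A minimal sketch, assuming per-frame probabilities are already computed and that "_" is the blank symbol; the same rule could be applied to per-frame gloss predictions in continuous SL recognition, with whole glosses as labels.

```python
BLANK = "_"  # assumed blank symbol


def ctc_greedy_decode(frame_probs, alphabet):
    """Best-path CTC decoding: argmax per frame, collapse repeats, remove blanks."""
    best_path = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    decoded, previous = [], None
    for label in best_path:
        # A repeat is merged only if it is not separated by a blank,
        # so "l _ l" still yields two l's.
        if label != previous and label != BLANK:
            decoded.append(label)
        previous = label
    return decoded


alphabet = [BLANK, "h", "i"]
frames = [
    [0.1, 0.8, 0.1],    # h
    [0.2, 0.7, 0.1],    # h again (merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # i
]
print("".join(ctc_greedy_decode(frames, alphabet)))  # hi
```

Greedy decoding is the cheapest option; beam search with a language model usually gives better transcriptions.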
Image Captioning
27
Encoder → Representation → Decoder
A group of people shopping at an outdoor market.
Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption generator." CVPR 2015.
Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR 2015.
Neural Sign Language Translation
28
Encoder → Representation → Decoder
Sign language video → Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
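For sign language translation, the encoder needs one feature vector per video frame. One option, assumed here purely for illustration, is to feed 2D body keypoints (such as the OpenPose keypoints released with How2Sign) instead of raw pixels. The sketch centers each frame on a reference joint, divides by the distance between two other joints so that signer size and camera distance matter less, and flattens the result. The joint indices are hypothetical and depend on the keypoint layout actually used.

```python
import math


def frame_features(keypoints, center=0, left=1, right=2):
    """Normalize one frame of (x, y) keypoints and flatten it to [x0, y0, x1, y1, ...].

    center, left, right: hypothetical joint indices (e.g. neck and two shoulders).
    """
    cx, cy = keypoints[center]
    lx, ly = keypoints[left]
    rx, ry = keypoints[right]
    scale = math.hypot(lx - rx, ly - ry) or 1.0  # avoid division by zero
    features = []
    for x, y in keypoints:
        features.extend([(x - cx) / scale, (y - cy) / scale])
    return features


def clip_to_sequence(frames):
    """Turn a clip (list of frames) into the encoder's input sequence."""
    return [frame_features(kp) for kp in frames]


# Two hypothetical frames with four (x, y) joints each.
clip = [
    [(100, 50), (80, 50), (120, 50), (90, 100)],
    [(102, 50), (82, 50), (122, 50), (95, 90)],
]
seq = clip_to_sequence(clip)
print(len(seq), len(seq[0]))  # 2 8
print(seq[0][:4])             # [0.0, 0.0, -0.5, 0.0]
```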
Neural Sign Language Translation
29
Camgoz, Necati Cihan, Simon Hadfield, Oscar Koller, Hermann Ney, and Richard Bowden. "Neural sign language translation." CVPR 2018.
Neural Sign Language Translation
30
Camgoz, Necati Cihan, Oscar Koller, Simon Hadfield, and Richard Bowden. "Sign language transformers: Joint end-to-end sign language recognition and translation." CVPR 2020.
Neural Sign Language Production
31
Encoder → Representation → Decoder
Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. → Sign language
Neural Sign Language Production
32
Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives." ICCV 2021.
Neural Sign Language Production
33
Encoder → Representation → Decoder
Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. → Sign language
Neural Sign Language Production
34
Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language production." ECCV 2020.
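Sign language production decodes continuous skeleton poses rather than words, so there is no end-of-sentence token to stop on. The Progressive Transformer of Saunders et al. addresses this with a counter that is predicted alongside every frame, rising from 0 to 1 over the sequence, and generation ends when it reaches 1. The sketch below shows only that stopping rule; `predict_next` stands for the trained decoder and is replaced by a hypothetical dummy that shifts every joint and advances the counter by a fixed step.

```python
def generate_poses(predict_next, first_pose, max_frames=500):
    """Autoregressively generate (pose, counter) frames until the counter reaches 1."""
    pose, counter = first_pose, 0.0
    frames = [(pose, counter)]
    while counter < 1.0 and len(frames) < max_frames:  # max_frames is a safety cap
        pose, counter = predict_next(frames)
        frames.append((pose, min(counter, 1.0)))
    return frames


def dummy_decoder(frames, step=0.25):
    """Hypothetical model: move every joint by +1 in x and advance the counter by `step`."""
    pose, counter = frames[-1]
    return [(x + 1, y) for x, y in pose], counter + step


frames = generate_poses(dummy_decoder, [(0, 0), (10, 0)])
print(len(frames))  # 5
print(frames[-1])   # ([(4, 0), (14, 0)], 1.0)
```

Because the counter encodes progress rather than a discrete token, the sequence length is decided by a continuous regression output, which fits a decoder that regresses joint coordinates.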
Neural Sign Language Production
35
Stoll, Stephanie, Necati Cihan Camgoz, Simon Hadfield, and Richard Bowden. "Text2Sign: Towards sign language production using neural machine translation and generative adversarial networks." IJCV 2020.
Neural Sign Language Production
36
Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Everybody sign now: Translating spoken language to photo realistic sign language video." arXiv 2020.
Outline
37
Motivation
A crash course on sign languages (SL)
State of the art
Challenges
Conclusion
Challenges
38
Computer Vision
Speech
NLP
Training Data
Challenges in Computer Vision
39
Off-the-shelf pose detectors and generators struggle with hands.
40
Zhou, Yuxiao, Marc Habermann, Weipeng Xu, Ikhsanul Habibie, Christian Theobalt, and Feng Xu. "Monocular real-time hand shape and motion capture using multi-modal data." CVPR 2020.
Challenges in Computer Vision
41
Weinzaepfel, Philippe, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, and Grégory Rogez. "DOPE: Distillation of part experts for whole-body 3D pose estimation in the wild." ECCV 2020.
Challenges in Computer Vision
42
Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language production." ECCV 2020.
Challenges in Computer Vision
43
Ng, Evonne, Shiry Ginosar, Trevor Darrell, and Hanbyul Joo. "Body2hands: Learning to infer 3D hands from conversational gesture body dynamics." CVPR 2021.
Challenges in Computer Vision
Challenges
44
Computer Vision
Speech
NLP
Training Data
Challenges in NLP
Sign languages are:
45
🤔
(Very) low-resource languages…
...in a (very) high-dimensional space (video).
Challenges in NLP
46
Figure: TensorFlow tutorial
Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A neural probabilistic language model." Journal of Machine Learning Research 3 (Feb 2003): 1137-1155.
🤔
What are “language models” in sign language?
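To make the question concrete: a language model assigns a probability to a sequence by chaining conditionals, P(w_1 … w_n) = Π_t P(w_t | w_1 … w_{t-1}). Bengio et al. parameterize these conditionals with a neural network; the sketch below swaps in the simplest count-based alternative, a bigram model with add-one smoothing, trained on the gloss lines that appear in this deck (fingerspelling lightly normalized to FS-NAME). It only illustrates what a "language model over glosses" would compute; four sentences are far too few for real use.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (assumed)


class BigramGlossLM:
    """Count-based bigram language model over glosses with add-one smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each token precedes another one
        self.bigram_counts = Counter()
        self.vocab = {EOS}
        for sentence in sentences:
            tokens = [BOS] + sentence.split() + [EOS]
            self.vocab.update(tokens[1:-1])
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(self, prev, word):
        """P(word | prev); sums to 1 over self.vocab for any context."""
        return (self.bigram_counts[(prev, word)] + 1) / (
            self.context_counts[prev] + len(self.vocab)
        )

    def log_prob(self, sentence):
        """log P(sentence) as a sum of log P(w_t | w_{t-1}), including </s>."""
        tokens = [BOS] + sentence.split() + [EOS]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens[:-1], tokens[1:]))


corpus = [
    "HI ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR",
    "ME NAME FS-BARBARA",
    "HE TEACHER HE",
    "FS-AMIR HE TALL HE",
]
lm = BigramGlossLM(corpus)
print(lm.log_prob("HE TALL HE") > lm.log_prob("TALL HE HE"))  # True
```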
Challenges in NLP
47
How to transfer from large pre-trained (“foundation”) models?
#GPT-3 Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Agarwal, S. Language models are few-shot learners. NeurIPS 2020 (best paper award).
Source: [OpenAI API]
English: My name is Barbara.
ASL: ME NAME fs-B-A-R-B-A-R-A.
English: Is he a teacher?
ASL: HE TEACHER HE
English: Amir is tall.
ASL: fs-A-M-I-R, HE TALL HE
English: I’m not sad.
ASL: ME SAD ME 🤔
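The example above is a few-shot prompt: a handful of English/ASL-gloss pairs followed by a new English sentence, with the model expected to continue the pattern. A minimal sketch of how such a prompt string could be assembled; it only builds the text and does not call any API, and `build_few_shot_prompt` is a hypothetical name that simply mirrors the "English:"/"ASL:" layout shown on the slide.

```python
EXAMPLES = [
    ("My name is Barbara.", "ME NAME fs-B-A-R-B-A-R-A."),
    ("Is he a teacher?", "HE TEACHER HE"),
    ("Amir is tall.", "fs-A-M-I-R, HE TALL HE"),
]


def build_few_shot_prompt(examples, query):
    """Concatenate demonstration pairs, then leave the ASL line open for the model."""
    lines = []
    for english, gloss in examples:
        lines.append(f"English: {english}")
        lines.append(f"ASL: {gloss}")
    lines.append(f"English: {query}")
    lines.append("ASL:")  # the model's completion would be read from here
    return "\n".join(lines)


prompt = build_few_shot_prompt(EXAMPLES, "I'm not sad.")
print(prompt.splitlines()[-2:])  # ["English: I'm not sad.", 'ASL:']
```

Note that the completion on the slide, ME SAD ME, contains no negation sign for "not": a prompt alone does not guarantee a faithful gloss translation.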
Challenges
48
Computer Vision
Speech
NLP
Training Data
Challenges in Speech Translation
49
Jia, Ye, Michelle Tadmor Ramanovich, Tal Remez, and Roi Pomerantz. "Translatotron 2: Robust direct speech-to-speech translation." arXiv preprint arXiv:2107.08661 (2021).
End-to-end: Speech → Speech
End-to-end: Speech → Video 🤔
Challenges
50
Computer Vision
Speech
NLP
Training Data
Challenges in Training Data
51
Damen, Dima, and Michael Wray. "Supervision Levels Scale (SLS)." arXiv (2020). [tweet]
Data (X), Labels (y)
Challenges in Training Data
52
Damen, Dima, and Michael Wray. "Supervision Levels Scale (SLS)." arXiv (2020). [tweet]
Parallel corpus
53
Fully supervised learning requires a large dataset of sentence pairs in the two languages to be translated.
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.
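For sign language translation, each pair of such a parallel corpus links a segment of signing video to its spoken-language sentence, optionally with a gloss annotation. A minimal sketch of one such record and of turning records into (X, y) training pairs; the field names, the "clip_0001" identifier, the segment times and the 24 fps frame rate are hypothetical and do not reflect the How2Sign file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParallelExample:
    video_id: str                # hypothetical identifier of the source video
    start_s: float               # segment start time (seconds)
    end_s: float                 # segment end time (seconds)
    sentence: str                # spoken-language (English) target
    gloss: Optional[str] = None  # optional gloss annotation


def to_training_pairs(examples, fps=24.0):
    """Return (X, y) pairs where X is a (video_id, first_frame, last_frame) slice."""
    pairs = []
    for ex in examples:
        if ex.end_s <= ex.start_s:
            continue  # skip empty or malformed segments
        first, last = int(ex.start_s * fps), int(ex.end_s * fps)
        pairs.append(((ex.video_id, first, last), ex.sentence))
    return pairs


corpus = [
    ParallelExample("clip_0001", 0.0, 4.5,
                    "Hi, I'm Amelia and I'm going to talk to you about how to remove gum from hair.",
                    "HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR"),
    ParallelExample("clip_0001", 5.0, 5.0, "Empty segment."),
]
pairs = to_training_pairs(corpus)
print(len(pairs), pairs[0][0])  # 1 ('clip_0001', 0, 108)
```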
Continuous Sign Language Datasets
54
The How2Sign dataset
55
● How2 dataset [1]: instructional videos with speech signal and English transcription (“Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”)
● Gloss annotation: HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR
● Multi-view RGB videos and RGB-D videos
● Body-face-hands keypoints: 2D keypoints estimation from OpenPose [2]
● Multi-view recordings (only for a subset): multi-view VGA and HD videos [3] and 3D keypoints estimation
Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X.
How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
Continuous Sign Language Datasets
56
Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X.
How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
The How2Sign dataset: Recorded at CMU
57
Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X.
How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
The largest dataset in ASL
58
Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X.
How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
59
Built on top of How2
Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X.
How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
Built on top of How2
60
Spoken Language (speech) → ASR → Spoken Language (transcription) → SL Production → Sign Language (video)
Sign Language (video) → SL Translation → Spoken Language (transcription) → Synthesis → Spoken Language (speech)
Example transcription: “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”
#How2 Sanabria, Ramon, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, and Florian Metze. "How2: a large-scale dataset for multimodal language understanding." arXiv 2018.
Built on top of How2
● How2 dataset [1]: instructional videos with speech signal and English transcription (“Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”)
[1] Sanabria, Ramon, et al. "How2: a large-scale dataset for multimodal language understanding." arXiv preprint arXiv:1811.00347 (2018).
● English Speech: a speech track is available for end-to-end English to ASL.
● English Transcriptions: automatically generated subtitles aligned at the sentence level.
● English to Brazilian Portuguese Translations: allow multilingual research.
61
62
Built on top of How2
Front+side RGB, Front Depth & Multi-view RGB
63
Green Studio: multi-view RGB videos and RGB-D videos
Panoptic Studio: multi-view recordings (only for a subset), multi-view VGA and HD videos
Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S., Sheikh, Y.: Panoptic studio: A massively multiview system for social motion capture. In: ICCV, 2015.
64
2D & 3D pose estimation
65
Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X.
How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
2D & 3D pose estimation
● Multi-view RGB videos → body-face-hands keypoints: 2D keypoints estimation from OpenPose [1]
● Multi-view recordings (only for a subset), multi-view VGA and HD videos → 3D keypoints estimation [2]
[1] Z. Cao, G. Hidalgo Martinez, T. Simon, S. Wei and Y. A. Sheikh, "OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" in TPAMI, 2019.
[2] Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S., Sheikh, Y.: Panoptic studio: A massively multiview system for social motion capture. In: ICCV, 2015.
66
67
Dataset statistics
Dataset hierarchy
68
[Figure: dataset hierarchy with levels Camera view, Recording, Video, Clip and Frame. Labels shown: Studio (Green studio or Panoptic multi-view); camera view (Green studio: frontal or side; Panoptic: multi-view); category; signer; ASL gloss; English transcription; RGB, depth; OpenPose.]
Dataset statistics
69
Dataset statistics
Clip lengths and sentence lengths
70
Application: Human motion transfer
71
Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign language video generation from 2D poses." ECCV 2020 SLRTP Workshop.
Application: Human motion transfer
72
2D Pose estimation [OpenPose] → GAN-generated video [Everybody dance now]
Application: Human motion transfer
73
Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign language video generation from 2D poses." ECCV 2020 SLRTP Workshop.
74
Can ASL signers understand our generated videos?
“Choose one category”: classification accuracy, Skeleton vs. GAN-generated
75
Can ASL signers understand our generated videos?
“How well could you understand the video?”: Mean Opinion Score, Skeleton vs. GAN-generated
76
Can ASL signers understand our generated videos?
“Translate the ASL signs into written English.”: Skeleton vs. GAN-generated
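The two quantitative measures in this user study are simple to compute: classification accuracy is the fraction of videos whose category the signer chose correctly, and the Mean Opinion Score (MOS) is the average of the ratings given to “How well could you understand the video?”. A minimal sketch; the 1-5 rating scale is an assumption and the inputs below are placeholders, not the study's responses.

```python
from statistics import fmean


def accuracy(responses):
    """responses: list of (chosen_category, true_category) pairs."""
    return sum(chosen == true for chosen, true in responses) / len(responses)


def mean_opinion_score(ratings, low=1, high=5):
    """Average of ratings on an assumed low..high scale."""
    if any(not low <= r <= high for r in ratings):
        raise ValueError("rating outside the assumed scale")
    return fmean(ratings)


# Placeholder inputs, computed separately per condition (Skeleton, GAN-generated).
print(accuracy([("A", "A"), ("B", "A")]))  # 0.5
print(mean_opinion_score([2, 4, 3]))       # 3.0
```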
Challenges in Training Data
77
Damen, Dima, and Michael Wray. "Supervision Levels Scale (SLS)." arXiv (2020). [tweet]
78
Challenges in Training Data
Yin, Kayo, and Jesse Read. "Better Sign Language Translation with STMC-Transformer." COLING 2020. [talk]
Moryossef, Amit, Kayo Yin, Graham Neubig, and Yoav Goldberg. "Data Augmentation for Sign Language Gloss Translation." arXiv 2021.
Generation of gloss pseudo-labels by training a transformer.
Moreno D, Duarte A, Costa-jussà MR, Giró-i-Nieto X. English to ASL Translator for Speech2Signs. UPC 2018.
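To make “gloss pseudo-labels” concrete: a cheap way to obtain approximate glosses for sentences that have none is a rule-based text-to-gloss heuristic, in the spirit of the rule-based augmentation of Moryossef et al. (the Speech2Signs work cited above trains a transformer instead). The toy rules below (lowercase tokens, drop articles, "to" and forms of "to be", map I/my/I'm to ME, uppercase the rest) are assumptions for illustration and far simpler than any published rule set.

```python
import re

DROP = {"a", "an", "the", "is", "am", "are", "to"}  # assumed function words to drop
MAP = {"i": "ME", "my": "ME", "i'm": "ME"}         # assumed pronoun mapping


def pseudo_gloss(sentence):
    """Produce a rough, rule-based gloss pseudo-label for an English sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    glosses = []
    for w in words:
        if w in DROP:
            continue
        glosses.append(MAP.get(w, w.upper()))
    return " ".join(glosses)


print(pseudo_gloss("My name is Barbara."))  # ME NAME BARBARA
print(pseudo_gloss("Amir is tall."))        # AMIR TALL
```

Such pseudo-labels are noisy (no fingerspelling marks, no reordering, no non-manual information), which is why they are used for data augmentation rather than as ground truth.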
79
Challenges in Training Data
Renz, Katrin, Nicolaj C. Stache, Samuel Albanie, and Gül Varol. "Sign language segmentation with temporal convolutional networks." ICASSP 2021.
Sign segmentation in continuous sign language videos.
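A segmentation model such as the temporal convolutional network of Renz et al. produces frame-level predictions; turning them into sign segments is a post-processing step. A minimal sketch, assuming the model outputs a per-frame probability that the frame lies inside a sign (the exact labels, threshold and post-processing in the paper may differ): threshold, group consecutive inside-sign frames into [start, end) intervals, and discard very short runs.

```python
def frames_to_segments(probs, threshold=0.5, min_len=2):
    """Group consecutive frames with prob >= threshold into [start, end) segments."""
    segments, start = [], None
    for t, p in enumerate(probs):
        if p >= threshold and start is None:
            start = t  # a sign begins
        elif p < threshold and start is not None:
            if t - start >= min_len:  # ignore very short runs
                segments.append((start, t))
            start = None  # boundary or non-sign frame
    if start is not None and len(probs) - start >= min_len:
        segments.append((start, len(probs)))  # sign lasting until the last frame
    return segments


probs = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1, 0.6, 0.9, 0.95]
print(frames_to_segments(probs))  # [(1, 3), (6, 9)]
```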
80
Challenges in Training Data
Bull, Hannah, Triantafyllos Afouras, Gül Varol, Samuel Albanie, Liliane Momeni, and Andrew Zisserman. "Aligning Subtitles in Sign Language Videos." ICCV 2021.
Temporal alignment of ASR subtitles with on-screen sign language video.
Outline
81
Motivation
A crash course on sign languages (SL)
State of the art
Challenges
Conclusion
82
Conclusion: Speech2Signs (and Signs2Speech)
End-to-end translation & production
Spoken language: Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
Gloss: HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR
Representations: Speech, Language, Gloss [1], Sign transcription [2], Video; 3D Poses, 2D Poses, Segments [3]
Multiple vision, natural language & speech challenges for a societally impactful task.
[1] Yin, Kayo, and Jesse Read. "Better Sign Language Translation with STMC-Transformer." COLING 2020.
[2] Hanke, Thomas. "HamNoSys: representing sign language data in language resources and language processing contexts." In LREC, vol. 4, pp. 1-6. 2004.
[3] Renz, Katrin, Nicolaj C. Stache, Samuel Albanie, and Gül Varol. "Sign language segmentation with temporal convolutional networks." ICASSP 2021.
Supported by
Facebook AI
Interested in working with us on SL?
● @DocXavi
● xavier.giro@upc.edu
● Full list of publications & tech reports.
Thank you
These slides & talk
https://how2sign.github.io/

Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
 

Último

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 

Último (20)

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 

Open challenges in sign language translation and production

  • 1. Open Challenges in Sign Language Translation & Production. Xavier Giro-i-Nieto (@DocXavi, xavier.giro@upc.edu), Associate Professor, Universitat Politècnica de Catalunya, UPC Intelligent Data Science and Artificial Intelligence. VASC Seminar, September 8, 2021.
  • 2. Current & former students: Benet Oriol, Jordi Aguilar, Cayetana López, Lucas Ventura, Amanda Duarte, Laia Tarrés, Andrea Iturralde, Maram A. Mohamed, Álvaro Budria, Sandra Roca, Daniel Moreno, Janna Escur, Mireia Hernández, Peter Muschick, Pol Pérez, Görkem Camli, Jordi López, Gerard Gállego.
  • 4. Outline: Motivation; A crash course on sign languages (SL); State of the art; Challenges; Conclusion.
  • 5. Classic Motivation: Accessibility. Figure: number of people and percentage prevalence according to grades of hearing loss. Source: “World Report on Hearing”. World Health Organization 2021.
  • 6. Classic Motivation: Accessibility. Shelly Chadha, “Launch of the World Report on Hearing”. World Health Organization 2021.
  • 7. Classic Motivation: Accessibility to basic services. Source: “World Report on Hearing”. World Health Organization 2021. ● Sign language interpretation improves access to education and health services. ○ A survey conducted in 2009 by the World Federation of the Deaf revealed that 68% of the 93 responding countries did not have access to professional sign language interpreters. ○ Professional sign language interpreters are even scarcer in developing countries.
  • 8. Classic Motivation: Accessibility. ● New challenges for the deaf community because of the COVID-19 pandemic (https://whereistheinterpreter.com/, #whereistheinterpreter). “Due to the pandemic, more and more medical professionals are treating COVID-19 patients from behind a barrier, using masks that impede lip-reading, and not allowing in-person interpreters,” says the National Association of the Deaf. Summer Epps, “COVID’s Forgotten Victims: The Deaf Community”. WebMD 2021.
  • 9. Classic Motivation: Accessibility. Amit Moryossef, “Google Translate for Sign Language”. 2021. [talk] [code]
  • 10. Classic Motivation: Accessibility. Google Home Max, Amazon Echo Show 10, Facebook Portal.
  • 11. Novel Motivation: Human-Computer Interaction. Samsung, “How to use the Gesture Control on Smart TV?” (2020).
  • 13. Novel Motivation: Human-Computer Interaction. Diagram: human-computer interaction, human-human interaction, teaching that scales.
  • 14. Outline: Motivation; A crash course on sign languages (SL); State of the art; Challenges; Conclusion.
  • 15. A crash course on Sign Languages (SL). Sign languages are culturally diverse, like spoken languages: ○ American (ASL), British (BSL), German (GSL), Chinese (CSL)… sign languages. Examples shown: Irish Sign Language (ISL), Catalan Sign Language (LSC).
  • 16. A crash course on Sign Languages (SL). Sign languages are NOT a one-to-one mapping of spoken languages: there is no look-up table between Sign Language (video) and Spoken Language (transcription), e.g. “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”
  • 17. A crash course on Sign Languages (SL). There is a textual transcription method for sign languages called “glosses”. Sign Language (transcription): HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR. Spoken Language (transcription): Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
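To make the gloss convention concrete, here is a minimal Python sketch that splits a gloss line into tokens and expands fingerspelled words into letters. It assumes glosses are whitespace-separated tokens and that fingerspelling is marked with an `FS-` prefix (case-insensitive, letters optionally hyphen-separated), as in `FS-AMELIA` here and `fs-B-A-R-B-A-R-A` later in the talk; real annotation schemes vary, and the function name `parse_glosses` is hypothetical.

```python
def parse_glosses(gloss_line: str) -> list[tuple[str, list[str]]]:
    """Split a gloss transcription into (kind, symbols) tuples.

    kind is "fs" for fingerspelled tokens (expanded into single letters)
    and "sign" for ordinary lexical glosses.
    """
    tokens = []
    for raw in gloss_line.split():
        token = raw.strip(",.?!")  # drop punctuation attached to glosses
        if not token:
            continue
        if token.upper().startswith("FS-"):
            body = token[3:]
            # Accept both "AMELIA" and "B-A-R-B-A-R-A" spellings.
            letters = [c.upper() for c in body if c.isalpha()]
            tokens.append(("fs", letters))
        else:
            tokens.append(("sign", [token.upper()]))
    return tokens


print(parse_glosses("HI, ME FS-AMELIA WILL EXPLAIN"))
```

Keeping fingerspelled words as letter sequences, rather than as whole-word tokens, reflects that they are produced letter by letter rather than as a single lexical sign.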
  • 18. A crash course on Sign Languages (SL). ● Manual features: ○ Handshape ○ Palm orientation, movement, location. ● Non-manual features: ○ Head (nod / shake / tilt) ○ Mouth ○ Eyebrows ○ Cheeks ○ Facial grammar (or expressions) ○ Body position. Stokoe Jr, William C. "Sign language structure: An outline of the visual communication systems of the American deaf." Journal of deaf studies and deaf education (2005). Figure: Arizona State University.
  • 19. A crash course on Sign Languages (SL). SLs use persistent spatial grounding (e.g., by pointing and placing)! Example: “Right along here…” ...immobile entity is located here. Liddell, Scott K. "Spatial representations in discourse: Comparing spoken and signed language." Lingua (1996).
  • 20. A crash course on Sign Languages (SL). SLs use persistent spatial grounding (e.g., by pointing and placing)! Example: “Not far and to the right of…” ...tall, vertical entity at this place. Liddell, Scott K. "Spatial representations in discourse: Comparing spoken and signed language." Lingua (1996).
  • 21. Outline: Motivation; A crash course on sign languages (SL); State of the art; Challenges; Conclusion.
  • 22. Sign-to-Spoken Language Tasks. ● SL Translation: “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.” ● Continuous SL Recognition: HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR ● Isolated SL Recognition: “I” ● Finger-spelling: A, B, C, D... Video: GIPHY/SIGNN WITH ROBERT.
  • 23. Sign-to-Spoken Language Tasks. SL Translation: “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”
  • 24. Sign-Spoken Language Tasks. SL Translation: Sign Language (video) → Spoken Language (transcription). SL Production: Spoken Language (transcription) → Sign Language (video). Example transcription: “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”
  • 25. Neural Machine Translation. Encoder → Representation → Decoder. English: “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.” Irish: “Dia duit, is mise Amelia agus beidh mé ag caint leat faoi conas guma a bhaint de ghruaig.” Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." NeurIPS 2014. Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.
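The encoder, representation, decoder pattern above can be sketched in a few lines of stdlib Python. This is a toy, untrained model: it uses a plain Elman RNN with random weights instead of the LSTM/GRU units of the cited papers, so its output is meaningless. It only shows the data flow, where the encoder folds the source tokens into one fixed-size vector and the decoder greedily emits target tokens from it. The class and method names are hypothetical.

```python
import math
import random


class ToySeq2Seq:
    """Untrained RNN encoder-decoder: tokens -> fixed vector -> tokens."""

    def __init__(self, src_vocab, tgt_vocab, hidden=8, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.src_index = {w: i for i, w in enumerate(src_vocab)}
        self.tgt_vocab = list(tgt_vocab) + ["</s>"]  # last id doubles as start/end
        self.hidden = hidden
        self.enc_in = rand(hidden, len(src_vocab))       # source token embeddings
        self.enc_hh = rand(hidden, hidden)               # encoder recurrence
        self.dec_in = rand(hidden, len(self.tgt_vocab))  # target token embeddings
        self.dec_hh = rand(hidden, hidden)               # decoder recurrence
        self.out = rand(len(self.tgt_vocab), hidden)     # hidden -> vocabulary logits

    @staticmethod
    def _rnn_step(w_in, token_id, w_hh, h):
        # Elman update: h_new[i] = tanh(w_in[i][token] + sum_j w_hh[i][j] * h[j])
        return [
            math.tanh(w_in[i][token_id] + sum(w_hh[i][j] * h[j] for j in range(len(h))))
            for i in range(len(h))
        ]

    def encode(self, tokens):
        h = [0.0] * self.hidden
        for tok in tokens:
            h = self._rnn_step(self.enc_in, self.src_index[tok], self.enc_hh, h)
        return h  # the fixed-size "Representation"

    def decode(self, h, max_len=10):
        end_id = len(self.tgt_vocab) - 1
        prev, output = end_id, []
        for _ in range(max_len):
            h = self._rnn_step(self.dec_in, prev, self.dec_hh, h)
            logits = [sum(w * x for w, x in zip(row, h)) for row in self.out]
            prev = max(range(len(logits)), key=logits.__getitem__)  # greedy choice
            if prev == end_id:
                break
            output.append(self.tgt_vocab[prev])
        return output

    def translate(self, tokens, max_len=10):
        return self.decode(self.encode(tokens), max_len)


model = ToySeq2Seq(["hi", "i'm", "amelia"], ["dia", "duit", "is", "mise"])
print(model.translate(["hi", "i'm", "amelia"], max_len=5))
```

Swapping the source tokens for per-frame video features turns the same skeleton into the sign language translation setting discussed next; training (e.g. cross-entropy on a parallel corpus) is deliberately omitted.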
  • 26. Automatic Speech Recognition (ASR). Encoder → Representation → Decoder. Speech → “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.” Graves, Alex, and Navdeep Jaitly. "Towards end-to-end speech recognition with recurrent neural networks." ICML 2014. #LAS Chan, William, Navdeep Jaitly, Quoc Le, and Oriol Vinyals. "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition." ICASSP 2016.
  • 27. Image Captioning. Encoder → Representation → Decoder. Image → “A group of people shopping at an outdoor market.” Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption generator." CVPR 2015. Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR 2015.
  • 28. Neural Sign Language Translation. Encoder → Representation → Decoder. Sign language video → “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”
  • 29. Neural Sign Language Translation. Camgoz, Necati Cihan, Simon Hadfield, Oscar Koller, Hermann Ney, and Richard Bowden. "Neural sign language translation." CVPR 2018.
  • 30. Neural Sign Language Translation. Camgoz, Necati Cihan, Oscar Koller, Simon Hadfield, and Richard Bowden. "Sign language transformers: Joint end-to-end sign language recognition and translation." CVPR 2020.
  • 31. Neural Sign Language Production. Encoder → Representation → Decoder. “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.” → sign language video.
  • 32. Neural Sign Language Production. Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives." ICCV 2021.
  • 33. Neural Sign Language Production. Encoder → Representation → Decoder. “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.” → sign language video.
  • 34. Neural Sign Language Production. Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language production." ECCV 2020.
  • 35. Neural Sign Language Production. Stoll, Stephanie, Necati Cihan Camgoz, Simon Hadfield, and Richard Bowden. "Text2Sign: Towards sign language production using neural machine translation and generative adversarial networks." IJCV 2020.
  • 36. Neural Sign Language Production. Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Everybody sign now: Translating spoken language to photo realistic sign language video." arXiv 2020.
  • 37. Outline: Motivation; A crash course on sign languages (SL); State of the art; Challenges; Conclusion.
  • 39. Challenges in Computer Vision. Off-the-shelf pose detectors and generators struggle with hands.
  • 40. Challenges in Computer Vision. Zhou, Yuxiao, Marc Habermann, Weipeng Xu, Ikhsanul Habibie, Christian Theobalt, and Feng Xu. "Monocular real-time hand shape and motion capture using multi-modal data." CVPR 2020.
  • 41. Challenges in Computer Vision. Weinzaepfel, Philippe, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, and Grégory Rogez. "Dope: Distillation of part experts for whole-body 3d pose estimation in the wild." ECCV 2020.
  • 42. Challenges in Computer Vision. Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language production." ECCV 2020.
  • 43. Challenges in Computer Vision. Ng, Evonne, Shiry Ginosar, Trevor Darrell, and Hanbyul Joo. "Body2hands: Learning to infer 3d hands from conversational gesture body dynamics." CVPR 2021.
  • 45. Challenges in NLP. 🤔 Sign languages are (very) low-resource languages… ...in a (very) high-dimensional space (video).
  • 46. Challenges in NLP. 🤔 What are “language models” in sign language? Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A neural probabilistic language model." Journal of machine learning research 3, no. Feb (2003): 1137-1155. Figure: TensorFlow tutorial.
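One possible, very partial answer to the question above is to treat glosses as the tokens of a language model. The sketch below is a count-based bigram model with add-one smoothing rather than the neural model of Bengio et al.; it assumes gloss sequences are available as token lists (which drops non-manual features entirely), and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_lm(corpus):
    """Count gloss bigrams. corpus: list of sentences, each a list of glosses."""
    counts = defaultdict(Counter)
    vocab = {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        vocab.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def next_gloss_prob(counts, vocab, prev, nxt):
    """P(nxt | prev) with add-one (Laplace) smoothing."""
    # Any token except the start symbol "<s>" may follow prev.
    num_outcomes = len(vocab) - 1
    seen = counts.get(prev, Counter())
    return (seen[nxt] + 1) / (sum(seen.values()) + num_outcomes)


glosses = [["ME", "NAME", "fs-B-A-R-B-A-R-A"],
           ["HE", "TEACHER", "HE"],
           ["fs-A-M-I-R", "HE", "TALL", "HE"]]
counts, vocab = train_bigram_lm(glosses)
print(next_gloss_prob(counts, vocab, "HE", "</s>"))
```

The gloss sentences are the ones shown on the next slide; with so little data, smoothing dominates the estimates, which is the low-resource problem in miniature.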
  • 47. Challenges in NLP. How to transfer from large pre-trained (“foundation”) models? #GPT-3 Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Agarwal, S. Language models are few-shot learners. NeurIPS 2020 (best paper award). Source: [OpenAI API] English: My name is Barbara. ASL: ME NAME fs-B-A-R-B-A-R-A. English: Is he a teacher? ASL: HE TEACHER HE English: Amir is tall. ASL: fs-A-M-I-R, HE TALL HE English: I’m not sad. ASL: ME SAD ME 🤔
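The few-shot prompt on this slide is just alternating `English:` / `ASL:` lines, ending with an open `ASL:` for the model to complete. A minimal sketch of building such a prompt is below; sending it to a model is left out on purpose, since no specific API is assumed, and `build_few_shot_prompt` is a hypothetical name.

```python
def build_few_shot_prompt(examples, query):
    """Format (English, ASL gloss) pairs as a few-shot prompt.

    A language model is expected to continue the text after the final "ASL:".
    """
    lines = []
    for english, gloss in examples:
        lines.append(f"English: {english}")
        lines.append(f"ASL: {gloss}")
    lines.append(f"English: {query}")
    lines.append("ASL:")
    return "\n".join(lines)


shots = [("My name is Barbara.", "ME NAME fs-B-A-R-B-A-R-A."),
         ("Is he a teacher?", "HE TEACHER HE"),
         ("Amir is tall.", "fs-A-M-I-R, HE TALL HE")]
print(build_few_shot_prompt(shots, "I’m not sad."))
```

The completion shown on the slide (“ME SAD ME”) drops the negation, a reminder that a model trained mostly on spoken-language text has seen very few glosses.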
  • 49. Challenges in Speech Translation. Jia, Ye, Michelle Tadmor Ramanovich, Tal Remez, and Roi Pomerantz. "Translatotron 2: Robust direct speech-to-speech translation." arXiv preprint arXiv:2107.08661 (2021). Diagram: Speech → Speech (end-to-end) vs. Speech → Video (end-to-end) 🤔
  • 51. Challenges in Training Data. Damen, Dima, and Michael Wray. "Supervision Levels Scale (SLS)." arXiv (2020). [tweet] Axes: Data (X), Labels (y).
  • 52. Challenges in Training Data. Damen, Dima, and Michael Wray. "Supervision Levels Scale (SLS)." arXiv (2020). [tweet] X
  • 53. Parallel corpus. Fully supervised learning requires a large dataset of sentence pairs in the two languages to translate between. Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.
  • 55. The How2Sign dataset. From the How2 dataset [1]: instructional videos, speech signal and English transcription (“Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”). How2Sign modalities: multi-view RGB videos, RGB-D videos, body-face-hands keypoints (2D keypoint estimation from OpenPose [2]), gloss annotation (HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR) and, only for a subset, multi-view VGA and HD recordings [3] with 3D keypoint estimation. Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 56. Continuous Sign Language Datasets. Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 57. The How2Sign dataset: Recorded at CMU. Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 58. The largest dataset in ASL. Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 59. Built on top of How2. Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 60. Built on top of How2. Spoken Language (speech) ↔ Spoken Language (transcription) via ASR and speech synthesis; Spoken Language (transcription) ↔ Sign Language (video) via SL Translation and SL Production. Example transcription: “Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.” #How2 Sanabria, Ramon, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, and Florian Metze. "How2: a large-scale dataset for multimodal language understanding." arXiv 2018.
  • 61. Built on top of How2. How2 dataset [1]: instructional videos with speech signal and English transcription (“Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”). ● English speech: speech track available for end-to-end English-to-ASL translation. ● English transcriptions: automatically generated subtitles aligned at the sentence level. ● English-to-Brazilian-Portuguese translations: allow multilingual research. [1] Sanabria, Ramon, et al. "How2: a large-scale dataset for multimodal language understanding." arXiv preprint arXiv:1811.00347 (2018).
  • 62. Built on top of How2.
  • 63. Front+side RGB, Front Depth & Multi-view RGB.
  • 64. Green Studio: multi-view RGB videos and RGB-D videos. Panoptic Studio: multi-view VGA and HD videos (multi-view recordings only for a subset). Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S., Sheikh, Y.: Panoptic studio: A massively multiview system for social motion capture. In: ICCV, 2015.
  • 65. 2D & 3D pose estimation. Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 66. 2D & 3D pose estimation. Body-face-hands keypoints: 2D keypoint estimation from OpenPose [1] on the multi-view RGB videos; 3D keypoint estimation [2] from the multi-view VGA and HD videos (multi-view recordings only for a subset). [1] Z. Cao, G. Hidalgo Martinez, T. Simon, S. Wei and Y. A. Sheikh, "OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" in TPAMI, 2019. [2] Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S., Sheikh, Y.: Panoptic studio: A massively multiview system for social motion capture. In: ICCV, 2015.
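OpenPose writes one JSON file per frame with a `people` array; each person carries flat `[x1, y1, c1, x2, y2, c2, ...]` lists such as `pose_keypoints_2d`, `face_keypoints_2d`, `hand_left_keypoints_2d` and `hand_right_keypoints_2d`, and undetected points are written as zeros. Assuming the 2D keypoints are stored in that standard per-frame format, the sketch below groups them into `(x, y, confidence)` triplets and masks low-confidence points, which is one simple way to deal with the unreliable hand detections mentioned earlier. The threshold value and the function name are assumptions.

```python
import json

PARTS = ("pose_keypoints_2d", "face_keypoints_2d",
         "hand_left_keypoints_2d", "hand_right_keypoints_2d")


def load_openpose_frame(json_text, min_conf=0.1):
    """Parse one OpenPose JSON frame into per-part lists of (x, y, conf).

    Keypoints whose confidence is below min_conf (including OpenPose's
    all-zero "not detected" entries) are replaced by None.
    """
    frame = json.loads(json_text)
    people = []
    for person in frame.get("people", []):
        parsed = {}
        for part in PARTS:
            flat = person.get(part, [])
            triplets = [tuple(flat[i:i + 3]) for i in range(0, len(flat) - 2, 3)]
            parsed[part] = [t if t[2] >= min_conf else None for t in triplets]
        people.append(parsed)
    return people


example = ('{"people": [{"pose_keypoints_2d": [100.0, 200.0, 0.9, 0, 0, 0],'
           ' "hand_left_keypoints_2d": [5.0, 6.0, 0.05]}]}')
print(load_openpose_frame(example))
```

Masking with `None` rather than dropping points keeps keypoint indices aligned across frames, which downstream models usually rely on.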
  • 68. Dataset hierarchy. Levels: camera view, recording, video, clip, frame. Camera views: Green Studio (frontal or side), Panoptic (multi-view). Annotations: ASL gloss, English transcription, RGB, depth, OpenPose keypoints, category, signer, studio (Green Studio or Panoptic multi-view).
  • 70. Dataset statistics: clip length and sentence length.
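The hierarchy and the statistics slide fit together naturally: a recording holds per-view videos, a video holds sentence-level clips, and clip and sentence lengths are computed over clips. The dataclasses below are a hypothetical in-memory layout, not How2Sign's actual file format; frame-level data (RGB, depth, OpenPose) is omitted, and statistics are computed for a single camera view so that the same clip is not counted once per view.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clip:
    start_s: float               # clip start inside the video (seconds)
    end_s: float                 # clip end inside the video (seconds)
    english: str                 # sentence-level English transcription
    gloss: Optional[str] = None  # ASL gloss, when annotated

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


@dataclass
class Video:
    camera_view: str             # e.g. "frontal" or "side" (Green Studio)
    clips: List[Clip] = field(default_factory=list)


@dataclass
class Recording:
    signer: str
    category: str
    studio: str                  # e.g. "green" or "panoptic"
    videos: List[Video] = field(default_factory=list)


def clip_statistics(recordings: List[Recording], view: str = "frontal") -> dict:
    """Clip-duration (s) and sentence-length (words) statistics for one view.

    Raises statistics.StatisticsError if no clip matches the view.
    """
    clips = [c for r in recordings for v in r.videos if v.camera_view == view
             for c in v.clips]
    durations = [c.duration_s for c in clips]
    lengths = [len(c.english.split()) for c in clips]
    return {
        "clips": len(clips),
        "mean_duration_s": statistics.mean(durations),
        "median_duration_s": statistics.median(durations),
        "mean_words": statistics.mean(lengths),
    }
```

A usage example with made-up timings (not dataset values): a frontal video with clips `Clip(0.0, 4.0, "Hi, I’m Amelia.")` and `Clip(4.0, 10.0, "I’m going to talk about gum.")` yields a mean duration of 5.0 s and a mean sentence length of 4.5 words.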
  • 71. Application: Human motion transfer. Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign language video generation from 2D poses." ECCV 2020 SLRTP Workshop.
  • 72. Application: Human motion transfer. 2D pose estimation [OpenPose] → GAN-generated video [Everybody Dance Now].
  • 73. Application: Human motion transfer. Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign language video generation from 2D poses." ECCV 2020 SLRTP Workshop.
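Before a pose-conditioned generator can render one signer moving like another, the source keypoints usually have to be mapped into the target signer's position and scale. Everybody Dance Now uses its own global pose normalization; as an assumption for upper-body signing footage, the sketch below instead aligns the neck and rescales by shoulder width, using OpenPose BODY_25 indices (1 = neck, 2 = right shoulder, 5 = left shoulder). The reference poses (e.g. a neutral frame of each signer) and function names are hypothetical.

```python
import math

NECK, R_SHOULDER, L_SHOULDER = 1, 2, 5  # OpenPose BODY_25 keypoint indices


def shoulder_width(pose):
    """Euclidean distance between the two shoulder keypoints."""
    (x1, y1), (x2, y2) = pose[R_SHOULDER], pose[L_SHOULDER]
    return math.hypot(x2 - x1, y2 - y1)


def transfer_pose(source_pose, source_ref, target_ref):
    """Map source 2D keypoints into the target signer's image frame.

    source_ref / target_ref: reference poses of each signer, lists of (x, y).
    Raises ZeroDivisionError if the source shoulders coincide.
    """
    scale = shoulder_width(target_ref) / shoulder_width(source_ref)
    sx, sy = source_ref[NECK]
    tx, ty = target_ref[NECK]
    # Translate so the necks coincide, then scale around the neck.
    return [(tx + (x - sx) * scale, ty + (y - sy) * scale) for x, y in source_pose]
```

Only body keypoints are handled here; hand and face keypoints could be mapped with the same neck-anchored transform, though fine hand detail is exactly where the computer vision challenges above bite.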
  • 74. Can ASL signers understand our generated videos? Task: “Choose one category”. Skeleton vs. GAN-generated videos: classification accuracy.
  • 75. Can ASL signers understand our generated videos? Question: “How well could you understand the video?” Skeleton vs. GAN-generated videos: Mean Opinion Score.
  • 76. Can ASL signers understand our generated videos? Task: “Translate the ASL signs into written English.” Skeleton vs. GAN-generated videos.
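The first two questions reduce to simple aggregate metrics: the fraction of videos whose category is identified correctly, and a Mean Opinion Score (the average of listener ratings). The sketch below computes both; the 1-to-5 rating scale and the normal-approximation confidence interval are assumptions, not details from the slides, and the function names are hypothetical.

```python
import statistics


def classification_accuracy(predicted, reference):
    """Fraction of videos whose chosen category matches the true one."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def mean_opinion_score(ratings):
    """Mean rating plus a rough 95% normal-approximation interval."""
    mean = statistics.mean(ratings)
    if len(ratings) < 2:
        return mean, (mean, mean)
    half = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
    return mean, (mean - half, mean + half)
```

Computing both metrics separately for skeleton and GAN-generated videos gives the side-by-side comparison on these slides; with few raters, the interval makes it easier to judge whether a difference is meaningful.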
  • 77. Challenges in Training Data. Damen, Dima, and Michael Wray. "Supervision Levels Scale (SLS)." arXiv (2020). [tweet] X
  • 78. Challenges in Training Data. Generation of gloss pseudo-labels by training a transformer. Yin, Kayo, and Jesse Read. "Better Sign Language Translation with STMC-Transformer." COLING 2020. [talk] Moryossef, Amit, Kayo Yin, Graham Neubig, and Yoav Goldberg. "Data Augmentation for Sign Language Gloss Translation." arXiv 2021. Moreno D, Duarte A, Costa-jussà MR, Giró-i-Nieto X. English to ASL Translator for Speech2Signs. UPC 2018.
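To see why gloss annotation is hard to replace, compare a naive rule-based pseudo-gloss with the human gloss from the crash course. The sketch below is a deliberately crude heuristic (lowercase, drop a few function words, map some pronouns, uppercase the rest); its word lists are hypothetical and it is neither the learned transformer nor the rules of the cited works.

```python
import re

# Hypothetical, deliberately crude word lists for illustration only.
DROP = {"a", "an", "the", "to", "am", "is", "are", "be", "going", "about"}
PRONOUNS = {"i": "ME", "i'm": "ME", "you": "YOU", "your": "YOUR"}


def pseudo_gloss(sentence: str) -> str:
    """Turn an English sentence into a rough uppercase pseudo-gloss."""
    text = sentence.lower().replace("’", "'")  # normalize curly apostrophes
    glosses = []
    for word in re.findall(r"[a-z']+", text):
        if word in DROP:
            continue
        glosses.append(PRONOUNS.get(word, word.upper()))
    return " ".join(glosses)


print(pseudo_gloss("Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair."))
```

It produces `HI ME AMELIA AND ME TALK YOU HOW REMOVE GUM FROM HAIR`, while the human gloss is `HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR`: fingerspelling, lexical choice (EXPLAIN vs. TALK) and restructuring are all missed, which is why learned pseudo-labels are attractive.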
  • 79. Challenges in Training Data. Sign segmentation in continuous sign language videos. Renz, Katrin, Nicolaj C. Stache, Samuel Albanie, and Gül Varol. "Sign language segmentation with temporal convolutional networks." ICASSP 2021.
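Segmentation models of this kind produce per-frame predictions that still have to be turned into sign segments. Assuming a model that outputs, for every frame, the probability that the frame lies inside a sign (the cited TCN paper frames segmentation as per-frame classification, but its exact labels and post-processing may differ), a minimal thresholding step could look like this; the threshold, minimum length and function name are assumptions.

```python
def probabilities_to_segments(frame_probs, threshold=0.5, min_len=3):
    """Turn per-frame 'inside a sign' probabilities into (start, end) spans.

    end is exclusive (frame indices). Runs shorter than min_len frames
    are discarded as noise.
    """
    segments, start = [], None
    for t, p in enumerate(frame_probs):
        if p >= threshold and start is None:
            start = t  # a run of in-sign frames begins
        elif p < threshold and start is not None:
            if t - start >= min_len:
                segments.append((start, t))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and len(frame_probs) - start >= min_len:
        segments.append((start, len(frame_probs)))
    return segments


probs = [0.1, 0.8, 0.9, 0.7, 0.2, 0.6, 0.1, 0.9, 0.9, 0.9]
print(probabilities_to_segments(probs))
```

With the default settings this prints `[(1, 4), (7, 10)]`: the single-frame spike at index 5 is rejected by `min_len`.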
  • 80. Challenges in Training Data. Temporal alignment of automatic (ASR) subtitles with on-screen sign language video. Bull, Hannah, Triantafyllos Afouras, Gül Varol, Samuel Albanie, Liliane Momeni, and Andrew Zisserman. "Aligning Subtitles in Sign Language Videos." ICCV 2021.
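Subtitle alignment is commonly measured with temporal overlap. The sketch below shows temporal intersection-over-union and a brute-force baseline that tries candidate time shifts for a subtitle and keeps the one overlapping the detected signing most. This naive baseline is for illustration only; the cited work learns the alignment with a neural model. Function names and the candidate shifts are hypothetical.

```python
def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) time intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def best_shift(subtitle, sign_segments, shifts):
    """Return the candidate shift whose shifted subtitle best overlaps signing.

    Score = summed temporal IoU with every sign segment (a crude heuristic).
    """
    def score(s):
        shifted = (subtitle[0] + s, subtitle[1] + s)
        return sum(temporal_iou(shifted, seg) for seg in sign_segments)

    return max(shifts, key=score)


print(best_shift((0.0, 2.0), [(1.0, 3.0)], [-1.0, 0.0, 1.0, 2.0]))
```

Here the subtitle lags the signing by one second, so the best shift is `1.0`, which makes the intervals coincide (IoU of 1).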
  • 81. Outline: Motivation; A crash course on sign languages (SL); State of the art; Challenges; Conclusion.
  • 82. Conclusion: Speech2Signs (and Signs2Speech). End-to-end translation & production across representations: speech, spoken language (“Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.”), gloss [1] (HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR), sign transcription [2], video, 3D poses, 2D poses, segments [3]. Multiple vision, natural language & speech challenges for a societally impactful task. [1] Yin, Kayo, and Jesse Read. "Better Sign Language Translation with STMC-Transformer." COLING 2020. [2] Hanke, Thomas. "HamNoSys-representing sign language data in language resources and language processing contexts." In LREC, vol. 4, pp. 1-6. 2004. [3] Renz, Katrin, Nicolaj C. Stache, Samuel Albanie, and Gül Varol. "Sign language segmentation with temporal convolutional networks." ICASSP 2021.
  • 83. Supported by Facebook AI. Interested in working with us on SL? ● @DocXavi ● xavier.giro@upc.edu ● Full list of publications & tech reports. Thank you! These slides & talk. https://how2sign.github.io/