https://imatge-upc.github.io/pathgan/
We introduce PathGAN, a deep neural network for visual scanpath prediction trained on adversarial examples. A visual scanpath is defined as the sequence of fixation points over an image defined by a human observer with its gaze. PathGAN is composed of two parts, the generator and the discriminator. Both parts extract features from images using off-the-shelf networks, and train recurrent layers to generate or discriminate scanpaths accordingly. In scanpath prediction, the stochastic nature of the data makes it very difficult to generate realistic predictions using supervised learning strategies, but we adopt adversarial training as a suitable alternative. Our experiments prove how PathGAN improves the state of the art of visual scanpath prediction on the iSUN and Salient360! datasets.
PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks
1. bit.ly/PathGAN
@DocXavi
EPIC
Third International
Workshop on
Egocentric Perception,
Interaction and
Computing
Munich, Germany
9th September, 2018
PathGAN:
Visual Scanpath Prediction with
Generative Adversarial Networks
@DocXavi
Marc
Assens
Kevin
McGuinness
Noel E.
O’Connor
Xavier
Giro-i-Nieto
10. bit.ly/PathGAN
@DocXavi
Why don’t we see the changes?
We don’t really see the whole image
We only focus on small specific regions: the salient parts
Human beings reliably attend to the same regions of images
when shown
22. bit.ly/PathGAN
@DocXavi
Our brief history of saliency models
Eva
Mohedano
Panagiotis
Linardos
Cristian
Canton
Junting
Pan
Èric
Arazo
Marta
Coll
Elisa
Sayrol
Jordi
Torres
36. bit.ly/PathGAN
@DocXavi
36
Winners of IEEE ICME
2017 Challenge
Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “SaltiNet: Scan-Path
Prediction on 360 Degree Images Using Saliency Volumes.” ICCVW 2017.
40. bit.ly/PathGAN
@DocXavi
40
Create a model able to predict visual scanpaths
PathGAN solution:
Generate a sequence of fixation points with
Recurrent Neural Networks (RNNs).
Challenge:
44. bit.ly/PathGAN
@DocXavi
44
● Pick a real scanpath x from training set
● Show x to D and update weights to
output 1 (real)
Adversarial training in a nutshell
45. bit.ly/PathGAN
@DocXavi
45
● Pick a real scanpath x from training set
● Show x to D and update weights to predict “real” (1).
Adversarial training in a nutshell
46. bit.ly/PathGAN
@DocXavi
46
Adversarial training in a nutshell
● G generates a synthetic scanpath ẍ (adding noise & drop out)
● Show synthetic scanpath ẍ to D and update its weights to predict “fake” (0).
47. bit.ly/PathGAN
@DocXavi
47
Adversarial training in a nutshell
● Freeze D weights
● Keep showing the same synthetic scanpath ẍ to D
● Update G weights to make D predict “real” (1) (only G weights!)
● Unfreeze D Weights and repeat
59. bit.ly/PathGAN
@DocXavi
59
Next: What about video ?
Check our other poster presented by Eva Mohedano !
Panagiotis Linardos, Eva Mohedano, Monica Cherto, Cathal Gurrin, Xavier Giro-i-Nieto. “Temporal
Saliency Adaptation in Egocentric Videos”, Extended abstract at the ECCV Workshop on Egocentric
Perception, Interaction and Computing (EPIC), 2018.
EgoMon datasetSalGAN + ConvLSTM Saliency maps