Can you see it?
Annotating Image Regions based on Users' Gaze Information

Ansgar Scherp, Tina Walber, Steffen Staab
Technical University of Vienna, October 2012
Idea




Benefiting from Eye-Tracking Information for Image Region Annotation

Eye-tracking Hardware




[Photo: Tobii X60 eye tracker]
Recorded Data




[Figure: recorded gaze path showing saccades and fixations]
Scenario: Image Tagging
[Image: street scene with the tags tree, girl, car, store, people, sidewalk]
 Find specific objects in images
 Analyze the user's gaze path
Investigation in 3 Steps



                 3 Interactive Tagging Application

                 2 Gaze + Automatic Segments

                 1 Gaze + Manual Regions


1st Step


1. Best fixation measure to find the correct
   image region given a specific tag?



2. Can we differentiate two regions in the
   same image?


3 Steps Conducted by Users




 Look at the red blinking dot
 Decide whether the tag can be seen (“y” or “n”)
Dataset
 LabelMe community images
   Manually drawn polygons
   Regions annotated with tags
 182,657 images (August 2010)
   http://labelme.csail.mit.edu/Release3.0/


 High-quality segmentation and annotation
 Used as ground truth

Dataset (continued)




Experiment Images and Tags
 Randomly selected images from LabelMe
 Each image: at least two regions, 1000 × 700 px

 Created three sets of 51 images each
 Assigned a tag to each image

 Tags are either “true” or “false”
   “true” → the object described by the tag can be seen
   “false” → the object cannot be seen in the image
 False tags keep subjects concentrated during the experiment
Subjects & Experiment System
 30 subjects
   21 male, 9 female (age: 22-45, Ø=28.7)
   Undergrads (10), PhD (17), office clerks (3)


 Experiment system
    Simple web page in Internet Explorer
    Standard notebook, resolution 1680x1050
    Tobii X60 eye-tracker (60 Hz, 0.5° accuracy)

Conducting the Experiment
 Each user looked at 51 tag-image-pairs
 First tag-image-pair discarded from the analysis

 94.6% correct answers
 Roughly equal for true/false tags
 ~2.8s avg. until decision (true), ~3.8s avg. (false)

 Users felt comfortable during the experiment (avg.: 4.4, SD: 0.75)
   Eye tracker did not much influence comfort
Pre-processing of Eye-tracking Data
 Obtained 799 gaze paths from 30 users where
   Image has “true” tag assigned
   Users gave correct answers

 Fixation extraction
   Tobii Studio's velocity & distance thresholds
   Fixation: focus on a particular point on the screen

 Requirement: at least one fixation inside or near the correct region
 656 gaze paths fulfill this requirement (82%)
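Tobii Studio's own fixation filter is proprietary; purely as an illustration of how velocity-threshold fixation extraction works, here is a minimal sketch. The `GazeSample` structure and the threshold values are assumptions for this sketch, not the parameters used in the experiment.

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    t: float  # timestamp in seconds
    x: float  # screen x in pixels
    y: float  # screen y in pixels

def extract_fixations(samples, velocity_threshold=100.0, min_duration=0.1):
    """Group consecutive low-velocity gaze samples into fixations.

    velocity_threshold (px/s) and min_duration (s) are assumed values;
    returns a list of (t_start, t_end, x_mean, y_mean) tuples.
    """
    def close(current, out):
        # Emit a fixation if the low-velocity run lasted long enough.
        if current and current[-1].t - current[0].t >= min_duration:
            xs = [s.x for s in current]
            ys = [s.y for s in current]
            out.append((current[0].t, current[-1].t,
                        sum(xs) / len(xs), sum(ys) / len(ys)))

    fixations, current = [], []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue
        speed = ((cur.x - prev.x) ** 2 + (cur.y - prev.y) ** 2) ** 0.5 / dt
        if speed < velocity_threshold:
            current.append(cur)            # still fixating
        else:
            close(current, out=fixations)  # a saccade ends the fixation
            current = []
    close(current, out=fixations)
    return fixations
```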
Analysis of Gaze Fixations (1)
 Applied 13 fixation measures to the 656 paths
   (2 new, 7 standard Tobii, 4 from the literature)

 Fixation measure: a function on users' gaze paths
 Calculated for each image region, over all users
   viewing the same tag-image-pair




Considered Fixation Measures

Nr  Name                     Measure for favorite region r                   Origin
1   firstFixation            No. of fixations before 1st on r                Tobii
2   secondFixation           No. of fixations before 2nd on r                [13]
3   fixationsAfter           No. of fixations after last on r                [4]
4   fixationsBeforeDecision  fixationsAfter, but before decision             New
5   fixationsAfterDecision   fixationsBeforeDecision and after               New
6   fixationDuration         Total duration of all fixations on r            Tobii
7   firstFixationDuration    Duration of first fixation on r                 Tobii
8   lastFixationDuration     Duration of last fixation on r                  [11]
9   fixationCount            Number of fixations on r                        Tobii
10  maxVisitDuration         Max time from first fixation until outside r    Tobii
11  meanVisitDuration        Mean time from first fixation until outside r   Tobii
12  visitCount               No. of fixations until outside r                Tobii
13  saccLength               Saccade length before fixation on r             [6]
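To make the table concrete, here is a minimal sketch of two of the measures (Nr 8 and Nr 9) operating on extracted fixations, assuming fixations as (t_start, t_end, x, y) tuples and a hypothetical point-in-region test `region_contains(x, y)` (e.g. a point-in-polygon check against a LabelMe polygon):

```python
def fixation_count(fixations, region_contains):
    """Nr 9, fixationCount: number of fixations on the region."""
    return sum(1 for (_, _, x, y) in fixations if region_contains(x, y))

def last_fixation_duration(fixations, region_contains):
    """Nr 8, lastFixationDuration: duration of the last fixation on the region."""
    durations = [t_end - t_start
                 for (t_start, t_end, x, y) in fixations
                 if region_contains(x, y)]
    return durations[-1] if durations else 0.0
```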
Analysis of Gaze Fixations (2)

[Figure: (a) tag, (b) image regions, (c) gaze paths of multiple users, (d) favorite region]
 For every image region (b) the fixation
  measure is calculated over all gaze paths (c)
 Results are summed up per region
 Regions ordered according to fixation measure
 If favorite region (d) and tag (a) match, result is
  true positive (tp), otherwise false positive (fp)
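The procedure above translates directly into a few lines of code; a minimal sketch, assuming a `measure(region, gaze_path)` function implementing one of the 13 fixation measures and a hypothetical `tag_of` lookup for the ground-truth LabelMe tags. For measures where a smaller value marks the favorite region (e.g. firstFixation), `max` would be replaced by `min`.

```python
def favorite_region(regions, gaze_paths, measure):
    """Sum the fixation measure per region over all gaze paths of a
    tag-image-pair; the region with the highest total is the favorite."""
    totals = {region: sum(measure(region, path) for path in gaze_paths)
              for region in regions}
    return max(totals, key=totals.get)

def is_true_positive(tag, regions, gaze_paths, measure, tag_of):
    """tp if the favorite region carries the given tag, otherwise fp."""
    return tag_of(favorite_region(regions, gaze_paths, measure)) == tag
```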
Precision per Fixation Measure

[Bar chart: precision P, computed over the sum of tp and fp assignments, per fixation measure; labeled bars include lastFixationDuration, fixationsBeforeDecision, fixationDuration, and meanVisitDuration]
Adding Boundaries and Weights
 Take eye-tracker inaccuracies into account
 Extend region boundaries by 13 pixels




 Larger regions are more likely to be fixated
 Apply a weight to regions < 5% of the image size
 lastFixationDuration increases to P = 0.65
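Extending the region polygons to absorb eye-tracker inaccuracy amounts to a standard geometric buffer; a sketch using Shapely (the 13 px margin is from the slide, the library choice and names are mine):

```python
from shapely.geometry import Point, Polygon

def extended_region(polygon_points, margin_px=13.0) -> Polygon:
    """Dilate a region polygon by margin_px in every direction."""
    return Polygon(polygon_points).buffer(margin_px)

# A fixation counts as 'on the region' if it falls inside the extended polygon:
region = extended_region([(10, 10), (200, 20), (180, 150), (30, 140)])
print(region.contains(Point(205, 25)))  # True: within 13 px of the boundary
```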
Weighted Measure Function




 Measure function f_m(r) on region r, with m = 1…13
 Relative region size: s_r
 Threshold below which the weighting is applied: T
 Maximum weighting value: M
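The function itself appears on the slide only as an image. From the symbols listed above and the stated behaviour on the previous slide (regions below T = 5% of the image size are boosted, up to a maximum weight M), one plausible reconstruction is a linearly interpolated weight; this is an assumption, not necessarily the exact formula from the paper:

```latex
f'_m(r) = w(s_r)\, f_m(r),
\qquad
w(s_r) =
\begin{cases}
1 + (M - 1)\,\dfrac{T - s_r}{T} & \text{if } s_r < T,\\[4pt]
1 & \text{otherwise,}
\end{cases}
\qquad T = 0.05 .
```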
Weighted Measure Function




Examples: Tag-Region-Assignments




Comparison with Baselines
[Bar chart: precision P for the gaze-based measures (Gaze, Gaze*) and the baselines]
 Naïve baseline: largest region r is favorite
 Salience baseline: Itti et al., TPAMI, 20(11), Nov 1998
 Random baseline: randomly select favorite r
 Gaze / Gaze* significantly better (all tests: p < 0.0015)
 Least significant result: χ²(1, N = 124) = 10.723
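The χ² statistics presumably come from 2×2 contingency tables comparing tp/fp counts of a gaze-based measure against a baseline. A sketch of such a test with SciPy; the counts below are illustrative placeholders chosen only so the table totals N = 124, not the experiment's numbers:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: method (gaze vs. baseline); columns: outcome (tp vs. fp).
table = np.array([[40, 22],   # gaze:     tp, fp (placeholder counts)
                  [21, 41]])  # baseline: tp, fp (placeholder counts)
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2({dof}, N={table.sum()}) = {chi2:.3f}, p = {p:.4f}")
```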
Effect of Gaze Path Aggregation
[Line chart: precision P (y-axis) over the number of gaze paths used (x-axis)]
 Aggregation of precision P for Gaze*
Research Questions


1. Best fixation measure to find the correct
   image region given a specific tag?
   → lastFixationDuration, with a precision of 65%


2. Can we differentiate two regions in the
   same image?


Experiment Images and Tags
 Randomly selected images from LabelMe
 Images contained at least two tagged regions
 Organized in three sets of 51 images each

 Assigned a tag to each image

 Tags are either “true” or “false”

 Two of the image sets share the same images
 Thus, these images have two tags each

Differentiate Two Objects
 Use first and second tag set to identify different
  objects in the same images
 16 images (of our 51) have two “true” tags
 6 images had two correct regions identified
   Proportion: 6/16 ≈ 38%

 Average precision for a single object is 63%
   Expected rate of two correct assignments per image: 0.63² ≈ 40%


Correctly Differentiated Objects




Research Questions


1. Best fixation measure to find the correct
   image region given a specific tag?
   → lastFixationDuration, with a precision of 65%


2. Can we differentiate two regions in the
   same image?
   → Accuracy of 38%

Investigation in 3 Steps



                 3 Interactive Tagging Application

                 2 Gaze + Automatic Segments

                 1 Gaze + Manual Regions


So far …

[Diagram: tag "car" + image + gaze path = image region labeled "car"]

For 63% of the images, we can identify the correct region.

T. Walber, A. Scherp, and S. Staab: Identifying Objects in Images from Analyzing the Users' Gaze Movements for Provided Tags, MMM, Klagenfurt, Austria, 2012.
Now:

[Diagram: tag "car" + image + gaze path = automatically segmented region labeled "car"]

 Automatic segmentation
 LabelMe segments used only as ground truth

T. Walber, A. Scherp, and S. Staab: Can you see it? Two Novel Eye-Tracking-Based Measures for Assigning Tags to Image Regions, MMM, Huangshan, China, 2013.
2nd Step: New Measure
 Automatic segmentation measure
 Berkeley Segmentation Data Set and
  Benchmarks 500 (BSDS500)
 Berkeley's gPb-owt-ucm algorithm
   Segmentation on different hierarchy levels
   Combination of contour detection and segmentation
   Oriented Watershed Transform and Ultrametric Contour Map

P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and
hierarchical image segmentation. IEEE TPAMI, 33(5):898–916, May 2011.
Segmentation Example
 Segmentations with different k = 0 … 0.4




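In the gPb-owt-ucm pipeline, each value of k is a threshold on the ultrametric contour map (UCM): contours not stronger than k are dissolved, so larger k yields coarser segmentations. A minimal sketch of that thresholding step, assuming `ucm` is a precomputed 2-D float array that is 0 inside regions and holds contour strengths on boundary pixels:

```python
import numpy as np
from scipy import ndimage

def segments_at_scale(ucm: np.ndarray, k: float) -> np.ndarray:
    """Label connected regions after removing contours not stronger than k.

    Returns an integer label image; larger k merges more regions.
    """
    labels, num_segments = ndimage.label(ucm <= k)
    return labels

# e.g.: segmentations = [segments_at_scale(ucm, k) for k in (0.0, 0.1, 0.2, 0.3, 0.4)]
```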
Automatic Segments + Gaze
 Conducted same computations as before
 But on the automatically extracted segments




Results for Different k's: P/R/F

[Charts: precision P over k for the eye-tracking-based automatic segmentation measure (left) and the golden sections rule baseline (right)]
Baseline: Golden Sections Rule

[Figure: line divided into segments a and b by the golden section]

(a + b) / a = a / b
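Solving the golden-section condition for the ratio φ = a/b gives the familiar constant:

```latex
\frac{a+b}{a} = \frac{a}{b} = \varphi
\;\Longrightarrow\;
1 + \frac{1}{\varphi} = \varphi
\;\Longrightarrow\;
\varphi^2 - \varphi - 1 = 0
\;\Longrightarrow\;
\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618 .
```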
Best Precision & Best F-measure




 Eye-tracking-based automatic segmentation measure
  significantly outperforms golden sections baseline
 Also shown: eye-tracking-based heatmap measure
  (no automatic segmentation)
Investigation in 3 Steps



                 3 Interactive Tagging Application

                 2 Gaze + Automatic Segments

                 1 Gaze + Manual Regions


3rd Step: Interactive Application




 car ; house ; girl
► tree_
APPENDIX




Influence of Red Dot




 First 5 fixations, over all subjects and all images
Experiment Data Cleaning
 Manually replaced images with
 a) Tags that are incomprehensible, require expert knowledge, or are nonsense
 b) Tags that refer to multiple regions, but not all regions are drawn into the image (e.g., bicycle)
 c) Obstructed objects (e.g., a bicycle behind a car)
 d) “False” tags that actually refer to a visible part of the image and thus were “true” tags


How to Compute P/R?
 R_fav is calculated from
    the automatic segmentation measure, or
    the baseline measure




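The formulas themselves appear on the slide only as an image. A plausible reading, assuming pixel-wise overlap between the favorite region R_fav and the ground-truth LabelMe region R_gt (an assumption consistent with the evaluation setup, not a formula quoted from the paper):

```latex
P = \frac{|R_{\mathrm{fav}} \cap R_{\mathrm{gt}}|}{|R_{\mathrm{fav}}|},
\qquad
R = \frac{|R_{\mathrm{fav}} \cap R_{\mathrm{gt}}|}{|R_{\mathrm{gt}}|},
\qquad
F = \frac{2PR}{P + R} .
```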
