Presentation on eye-tracking-based annotation of image regions, given in Vienna on Oct 19, 2012. Download the original PowerPoint file to enjoy all animations. For the papers, please refer to: http://www.ansgarscherp.net/publications
Can you see it? Annotating Image Regions based on Users' Gaze Information
1. Can you see it?
Annotating Image Regions
based on Users' Gaze
Information
Ansgar Scherp, Tina Walber, Steffen Staab
Technical University of Vienna
October 2012
2. Idea
Benefiting from Eye-Tracking Information
for Image Region Annotation
A. Scherp, T. Walber, S. Staab – Identifying Objects in Images Slide 2 of 40
3. Eye-tracking Hardware
X60
4. Recorded Data
Saccade Fixation
5. Scenario: Image Tagging
tree
girl
car
store
people
sidewalk
Find specific objects in images
Analyzing the user's gaze path
6. Investigation in 3 Steps
3 Interactive Tagging Application
2 Gaze + Automatic Segments
1 Gaze + Manual Regions
7. 1st Step
1. Best fixation measure to find the correct
image region given a specific tag?
2. Can we differentiate two regions in the
same image?
8. 3 Steps Conducted by Users
Look at red blinking dot
Decide whether tag can be seen (“y” or “n”)
9. Dataset
LabelMe community images
Manually drawn polygons
Regions annotated with tags
182,657 images (August 2010)
http://labelme.csail.mit.edu/Release3.0/
High-quality segmentation and annotation
Used as ground truth
10. Dataset (continued)
11. Experiment Images and Tags
Randomly selected images from LabelMe
Each image: at least two regions, 1000 × 700 pixels
Created three sets of 51 images each
Assigned a tag to each image
Tags are either “true” or “false”
“true”: object described by the tag can be seen
“false”: object cannot be seen on the image
False tags keep subjects concentrated during the experiment
12. Subjects & Experiment System
30 subjects
21 male, 9 female (age: 22-45, Ø=28.7)
Undergrads (10), PhD (17), office clerks (3)
Experiment system
Simple web page in Internet Explorer
Standard notebook, resolution 1680x1050
Tobii X60 eye-tracker (60 Hz, 0.5° accuracy)
13. Conducting the Experiment
Each user looked at 51 tag-image-pairs
First tag-image-pair dismissed
94.6% correct answers
Roughly equal for true/false tags
~2.8s avg. until decision (true), ~3.8s avg. (false)
Users felt comfortable during the experiment
(avg.: 4.4, SD: 0.75)
Eye tracker had little influence on comfort
14. Pre-processing of Eye-tracking Data
Obtained 799 gaze paths from 30 users where
Image has “true” tag assigned
Users gave correct answers
Fixation extraction
Tobii Studio's velocity & distance thresholds
Fixation: focus on particular point on screen
One fixation inside or near the correct region
656 gaze paths fulfill this requirement (82%)
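The slide names Tobii Studio's velocity and distance thresholds for fixation extraction. A minimal velocity-threshold (I-VT-style) sketch in Python, with an illustrative threshold and sample format rather than Tobii Studio's actual parameters:

```python
# Minimal velocity-threshold (I-VT) fixation filter sketch.
# Sample format and threshold are illustrative, not Tobii Studio's.

def extract_fixations(samples, hz=60, velocity_threshold=100.0):
    """samples: list of (x, y) screen points at a fixed sampling rate.
    Consecutive samples whose point-to-point velocity (px/s) stays
    below the threshold are grouped into one fixation (its centroid)."""
    fixations, group = [], []
    for i, (x, y) in enumerate(samples):
        if group:
            px, py = samples[i - 1]
            velocity = ((x - px) ** 2 + (y - py) ** 2) ** 0.5 * hz
            if velocity >= velocity_threshold:  # saccade: close the group
                if len(group) >= 2:
                    fixations.append((sum(gx for gx, _ in group) / len(group),
                                      sum(gy for _, gy in group) / len(group)))
                group = []
        group.append((x, y))
    if len(group) >= 2:
        fixations.append((sum(gx for gx, _ in group) / len(group),
                          sum(gy for _, gy in group) / len(group)))
    return fixations
```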
15. Analysis of Gaze Fixations (1)
Applied 13 fixation measures on the 656 paths
(2 new, 7 standard Tobii, 4 from the literature)
Fixation measure: function on users' gaze paths
Calculated for each image region, over all users
viewing the same tag-image-pair
16. Considered Fixation Measures
Nr Name Favorite region r Origin
1 firstFixation No. of fixations before 1st on r Tobii
2 secondFixation No. of fixations before 2nd on r [13]
3 fixationsAfter No. of fixations after last on r [4]
4 fixationsBeforeDecision fixationsAfter, but before decision New
5 fixationsAfterDecision fixationsBeforeDecision and after New
6 fixationDuration Total duration of all fixations on r Tobii
7 firstFixationDuration Duration of first fixation on r Tobii
8 lastFixationDuration Duration of last fixation on r [11]
9 fixationCount Number of fixations on r Tobii
10 maxVisitDuration Max time first fixation until outside r Tobii
11 meanVisitDuration Mean time first fixation until outside r Tobii
12 visitCount No. of fixations until outside r Tobii
13 saccLength Saccade length before fixation on r [6]
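Several of the simpler measures are easy to state in code. A sketch assuming a gaze path is a list of (x, y, duration_ms) fixations and region membership is a plain hit test (`in_rect` is a hypothetical helper, not from the paper):

```python
# Sketch of three fixation measures on a single gaze path.
# A fixation is (x, y, duration_ms); `inside` tests region membership.

def fixation_count(fixations, inside):
    """Measure 9: number of fixations on region r."""
    return sum(1 for f in fixations if inside(f))

def fixation_duration(fixations, inside):
    """Measure 6: total duration of all fixations on r."""
    return sum(f[2] for f in fixations if inside(f))

def first_fixation(fixations, inside):
    """Measure 1: number of fixations before the first one on r
    (None if r is never fixated)."""
    for i, f in enumerate(fixations):
        if inside(f):
            return i
    return None

def in_rect(x0, y0, x1, y1):
    """Hypothetical rectangular region hit test."""
    return lambda f: x0 <= f[0] <= x1 and y0 <= f[1] <= y1
```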
17. Analysis of Gaze Fixations (2)
For every image region (b) the fixation
measure is calculated over all gaze paths (c)
Results are summed up per region
Regions ordered according to fixation measure
If favorite region (d) and tag (a) match, result is
true positive (tp), otherwise false positive (fp)
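The favorite-region selection and tp/fp scoring described above can be sketched as follows; `measure` stands in for any of the 13 fixation measures:

```python
# Sketch: pick the favorite region by summing a fixation measure over
# all gaze paths, then score tp/fp against the ground-truth region.

def favorite_region(regions, gaze_paths, measure):
    """measure(region, path) -> score; higher means more attention.
    Returns the region with the largest score summed over all paths."""
    totals = {r: sum(measure(r, p) for p in gaze_paths) for r in regions}
    return max(totals, key=totals.get)

def score(favorite, tagged_region):
    """True positive if the favorite matches the tagged region."""
    return "tp" if favorite == tagged_region else "fp"
```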
18. Precision per Fixation Measure
[Bar chart: precision P (over the sum of tp and fp assignments) per fixation measure; labeled measures include lastFixationDuration, fixationsBeforeDecision, meanVisitDuration, and fixationDuration]
19. Adding Boundaries and Weights
Take eye-tracker inaccuracies into account
Extension of region boundaries by 13 pixels
Larger regions more likely to be fixated
Give weight to regions < 5% of image size
lastFixationDuration increases to P = 0.65
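The 13-pixel boundary extension can be sketched as a morphological dilation of the region mask (the slide does not specify the exact geometry; a 4-connected dilation is assumed here):

```python
# Sketch: extend a binary region mask by 13 px to absorb eye-tracker
# inaccuracy before testing whether a fixation lies inside the region.
import numpy as np
from scipy.ndimage import binary_dilation

def extend_region(mask, pixels=13):
    """One dilation iteration grows the mask by one pixel."""
    return binary_dilation(mask, iterations=pixels)
```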
20. Weighted Measure Function
Measure function f_m(r) on region r with m = 1…13
Relative region size: s_r
Threshold when weighting is applied: T
Maximum weighting value: M
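The formula image on this slide is not in the transcript. One plausible reconstruction, consistent with the bullets above (weight M applied fully at size 0, decaying linearly to 1 at threshold T; this form is an assumption, not the paper's verified definition):

```latex
f'_m(r) = w(s_r)\, f_m(r), \qquad
w(s_r) =
\begin{cases}
M - (M - 1)\,\dfrac{s_r}{T} & \text{if } s_r < T,\\[4pt]
1 & \text{otherwise.}
\end{cases}
```

With T = 0.05 (regions below 5% of the image size), small regions are boosted while larger regions are left unchanged, and w is continuous at s_r = T.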
21. Weighted Measure Function
23. Comparison with Baselines
[Bar chart: precision P of the gaze-based measures vs. the three baselines]
Naïve baseline: largest region r is favorite
Salience baseline: Itti et al., TPAMI, 20(11), Nov 1998
Random baseline: randomly select favorite r
Gaze / Gaze* significantly better (all tests: p < 0.0015)
Least significant result: χ²(1, N = 124) = 10.723
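A χ² test on tp/fp counts of this kind can be reproduced with SciPy; the counts below are illustrative, not the paper's actual numbers:

```python
# Sketch: 2x2 chi-square test comparing tp/fp counts of the gaze-based
# measure against a baseline. Counts are hypothetical; only
# N = 124 decisions per condition comes from the slide.
from scipy.stats import chi2_contingency

gaze = (75, 49)      # hypothetical (tp, fp) for Gaze*, N = 124
baseline = (40, 84)  # hypothetical (tp, fp) for a baseline

chi2, p, dof, _ = chi2_contingency([gaze, baseline], correction=False)
```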
24. Effect of Gaze Path Aggregation
[Plot: precision P for Gaze* over the number of gaze paths aggregated]
25. Research Questions
1. Best fixation measure to find the correct
image region given a specific tag?
lastFixationDuration with precision of 65%
2. Can we differentiate two regions in the
same image?
26. Experiment Images and Tags
Randomly selected images from LabelMe
Images contained at least two tagged regions
Organized in three sets of 51 images each
Assigned a tag to each image
Tags are either “true” or “false”
Two of the image sets share the same images
Thus, these images have two tags each
27. Differentiate Two Objects
Use first and second tag set to identify different
objects in the same images
16 images (of our 51) have two “true” tags
6 images had two correct regions identified
Proportion of 38%
Average precision for single object is 63%
Correct tag assignment for two images: 40%
29. Research Questions
1. Best fixation measure to find the correct
image region given a specific tag?
lastFixationDuration with precision of 65%
2. Can we differentiate two regions in the
same image?
Accuracy of 38%
30. Investigation in 3 Steps
3 Interactive Tagging Application
2 Gaze + Automatic Segments
1 Gaze + Manual Regions
31. So far …
[Figure: image + gaze paths = identified region “car”]
For 63% of the images, we can identify the correct region.
T. Walber, A. Scherp, and S. Staab: Identifying Objects in Images from Analyzing the Users' Gaze Movements for Provided Tags, MMM, Klagenfurt, Austria, 2012.
32. Now:
[Figure: image + gaze paths = automatically segmented region “car”]
Automatic segmentation
LabelMe segments only used as ground truth
T. Walber, A. Scherp, and S. Staab: Can you see it? Two Novel Eye-Tracking-Based Measures for Assigning Tags to Image Regions, MMM, Huangshan, China, 2013.
33. 2nd Step: New Measure
Automatic segmentation measure
Berkeley Segmentation Data Set and
Benchmarks 500 (BSDS500)
Berkeley's gPb-owt-ucm algorithm
Segmentation on different hierarchy levels
Combination of contour detection and
segmentation
Oriented Watershed Transform and
Ultrametric Contour Map
P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and
hierarchical image segmentation. IEEE TPAMI, 33(5):898–916, May 2011.
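A UCM encodes the whole segmentation hierarchy: thresholding it at level k and taking connected components yields one segmentation per hierarchy level (lower k gives finer segments). A toy sketch; real gPb-owt-ucm maps come from the BSDS500 tooling, not from this code:

```python
# Sketch: an Ultrametric Contour Map (UCM) assigns each boundary a
# strength; cutting at threshold k and labeling connected components
# gives one segmentation per hierarchy level.
import numpy as np
from scipy.ndimage import label

def segments_at_level(ucm, k):
    """Segments are connected components where boundary strength < k."""
    labels, n = label(ucm < k)
    return labels, n
```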
35. Automatic Segments + Gaze
Conducted same computations as before
But on the automatically extracted segments
36. Results for different k’s: P/R/F
[Plots: P/R/F over segmentation levels k for the eye-tracking-based automatic segmentation measure and the golden sections rule baseline]
37. Baseline: Golden Sections Rule
(a + b) / a = a / b
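The golden ratio φ satisfies (a + b)/a = a/b. A sketch of the baseline's golden-section points for an image; the exact region-selection rule is an assumption, not stated on the slide:

```python
# The golden ratio phi satisfies (a + b) / a = a / b.
# Golden-sections baseline sketch (selection rule assumed): favor
# regions near the crossings of the golden-section lines.

PHI = (1 + 5 ** 0.5) / 2  # ~1.618

def golden_points(width, height):
    """Four crossings of the vertical/horizontal golden-section lines."""
    xs = (width / PHI, width - width / PHI)
    ys = (height / PHI, height - height / PHI)
    return [(x, y) for x in xs for y in ys]
```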
38. Best Precision & Best F-measure
Eye-tracking-based automatic segmentation measure
significantly outperforms golden sections baseline
Also shown: eye-tracking-based heatmap measure
(no automatic segmentation)
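One plausible reading of a heatmap measure (its exact definition is not on the slide) is a duration-weighted sum of Gaussians over fixations:

```python
# Sketch: fixation heatmap as a duration-weighted sum of Gaussians.
# This is one possible reading of the "heatmap measure", not the
# paper's verified definition.
import numpy as np

def heatmap(fixations, shape, sigma=30.0):
    """fixations: (x, y, duration_ms); returns a duration-weighted map."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    h = np.zeros(shape)
    for x, y, d in fixations:
        h += d * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return h
```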
39. Investigation in 3 Steps
3 Interactive Tagging Application
2 Gaze + Automatic Segments
1 Gaze + Manual Regions
40. 3rd Step: Interactive Application
[Screenshot: interactive tagging application; tags entered so far: car; house; girl; tree being typed]
41. APPENDIX
42. Influence of Red Dot
First 5 fixations, over all subjects and all images
43. Experiment Data Cleaning
Manually replaced images with
a) tags that are incomprehensible, require expert knowledge, or are nonsense
b) tags that refer to multiple regions, of which not all are drawn into the image (e.g., bicycle)
c) obstructed objects (bicycle behind a car)
d) “false” tags that actually refer to a visible part of the image and thus were “true” tags
44. How to Compute P/R?
Rfav is calculated from
Automatic segmentation measure
Baseline measure
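Precision, recall, and F-measure over the favorite-region decisions can be sketched as follows (tp: favorite matches ground truth; fp: mismatch; fn: no favorite found; the counts in the usage note are illustrative):

```python
# Sketch: precision/recall/F-measure from favorite-region decisions.

def precision_recall_f(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, 6 correct assignments with 2 mismatches and 2 misses gives P = R = F = 0.75.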