Can you see it?
Annotating Image Regions based on Users' Gaze Information

Ansgar Scherp, Tina Walber, Steffen Staab
Technical University of Vienna, October 2012
Idea




Benefiting from Eye-Tracking Information for Image Region Annotation

Eye-tracking Hardware




[Photo: Tobii X60 eye tracker]
Recorded Data




[Figure: recorded gaze path showing saccades and fixations]
Scenario: Image Tagging
[Image: street scene with the tags tree, girl, car, store, people, sidewalk]
 Find specific objects in images
 Analyze the user's gaze path
Investigation in 3 Steps



                 3 Interactive Tagging Application

                 2 Gaze + Automatic Segments

                 1 Gaze + Manual Regions


1st Step


1. Best fixation measure to find the correct
   image region given a specific tag?



2. Can we differentiate two regions in the
   same image?


3 Steps Conducted by Users




 Look at the red blinking dot
 Decide whether the tag can be seen (“y” or “n”)
Dataset
 LabelMe community images
   Manually drawn polygons
   Regions annotated with tags
 182,657 images (August 2010)
   http://labelme.csail.mit.edu/Release3.0/


 High-quality segmentation and annotation
 Used as ground truth

Dataset (continued)




Experiment Images and Tags
 Randomly selected images from LabelMe
 Each image: at least two regions, 1000 × 700 px

 Created three sets of 51 images each
 Assigned a tag to each image

 Tags are either “true” or “false”
   “true” → the object described by the tag can be seen
   “false” → the object cannot be seen in the image
 False tags keep subjects concentrated during the experiment
Subjects & Experiment System
 30 subjects
   21 male, 9 female (age: 22-45, Ø=28.7)
   Undergrads (10), PhD (17), office clerks (3)


 Experiment system
    Simple web page in Internet Explorer
    Standard notebook, resolution 1680x1050
    Tobii X60 eye-tracker (60 Hz, 0.5° accuracy)

Conducting the Experiment
 Each user looked at 51 tag-image-pairs
 First tag-image-pair discarded from the analysis

 94.6% correct answers
 Roughly equal for true/false tags
 ~2.8s avg. until decision (true), ~3.8s avg. (false)

 Users felt comfortable during the experiment (avg.: 4.4, SD: 0.75)
   Eye tracker did not much influence comfort
Pre-processing of Eye-tracking Data
 Obtained 799 gaze paths from 30 users where
   Image has “true” tag assigned
   Users gave correct answers

 Fixation extraction
   Tobii Studio's velocity & distance thresholds
   Fixation: focus on a particular point on the screen

 Requirement: at least one fixation inside or near the correct region
 656 gaze paths fulfill this requirement (82%)
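Tobii Studio's own fixation filter is proprietary; purely as an illustration of how velocity-threshold fixation extraction works, here is a minimal sketch. The `GazeSample` structure and the threshold values are assumptions for this sketch, not the parameters used in the experiment.

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    t: float  # timestamp in seconds
    x: float  # screen x in pixels
    y: float  # screen y in pixels

def extract_fixations(samples, velocity_threshold=100.0, min_duration=0.1):
    """Group consecutive low-velocity gaze samples into fixations.

    velocity_threshold (px/s) and min_duration (s) are assumed values;
    returns a list of (t_start, t_end, x_mean, y_mean) tuples.
    """
    def close(current, out):
        # Emit a fixation if the low-velocity run lasted long enough.
        if current and current[-1].t - current[0].t >= min_duration:
            xs = [s.x for s in current]
            ys = [s.y for s in current]
            out.append((current[0].t, current[-1].t,
                        sum(xs) / len(xs), sum(ys) / len(ys)))

    fixations, current = [], []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue
        speed = ((cur.x - prev.x) ** 2 + (cur.y - prev.y) ** 2) ** 0.5 / dt
        if speed < velocity_threshold:
            current.append(cur)            # still fixating
        else:
            close(current, out=fixations)  # a saccade ends the fixation
            current = []
    close(current, out=fixations)
    return fixations
```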
Analysis of Gaze Fixations (1)
 Applied 13 fixation measures to the 656 paths
   (2 new, 7 standard Tobii, 4 from the literature)

 Fixation measure: a function on users' gaze paths
 Calculated for each image region, over all users
   viewing the same tag-image-pair




Considered Fixation Measures

Nr  Name                     Measure for favorite region r                   Origin
1   firstFixation            No. of fixations before 1st on r                Tobii
2   secondFixation           No. of fixations before 2nd on r                [13]
3   fixationsAfter           No. of fixations after last on r                [4]
4   fixationsBeforeDecision  fixationsAfter, but before decision             New
5   fixationsAfterDecision   fixationsBeforeDecision and after               New
6   fixationDuration         Total duration of all fixations on r            Tobii
7   firstFixationDuration    Duration of first fixation on r                 Tobii
8   lastFixationDuration     Duration of last fixation on r                  [11]
9   fixationCount            Number of fixations on r                        Tobii
10  maxVisitDuration         Max time from first fixation until outside r    Tobii
11  meanVisitDuration        Mean time from first fixation until outside r   Tobii
12  visitCount               No. of fixations until outside r                Tobii
13  saccLength               Saccade length before fixation on r             [6]
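To make the table concrete, here is a minimal sketch of two of the measures (Nr 8 and Nr 9) operating on extracted fixations, assuming fixations as (t_start, t_end, x, y) tuples and a hypothetical point-in-region test `region_contains(x, y)` (e.g. a point-in-polygon check against a LabelMe polygon):

```python
def fixation_count(fixations, region_contains):
    """Nr 9, fixationCount: number of fixations on the region."""
    return sum(1 for (_, _, x, y) in fixations if region_contains(x, y))

def last_fixation_duration(fixations, region_contains):
    """Nr 8, lastFixationDuration: duration of the last fixation on the region."""
    durations = [t_end - t_start
                 for (t_start, t_end, x, y) in fixations
                 if region_contains(x, y)]
    return durations[-1] if durations else 0.0
```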
Analysis of Gaze Fixations (2)

[Figure: (a) tag, (b) image regions, (c) gaze paths of multiple users, (d) favorite region]
 For every image region (b) the fixation
  measure is calculated over all gaze paths (c)
 Results are summed up per region
 Regions ordered according to fixation measure
 If favorite region (d) and tag (a) match, result is
  true positive (tp), otherwise false positive (fp)
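The procedure above translates directly into a few lines of code; a minimal sketch, assuming a `measure(region, gaze_path)` function implementing one of the 13 fixation measures and a hypothetical `tag_of` lookup for the ground-truth LabelMe tags. For measures where a smaller value marks the favorite region (e.g. firstFixation), `max` would be replaced by `min`.

```python
def favorite_region(regions, gaze_paths, measure):
    """Sum the fixation measure per region over all gaze paths of a
    tag-image-pair; the region with the highest total is the favorite."""
    totals = {region: sum(measure(region, path) for path in gaze_paths)
              for region in regions}
    return max(totals, key=totals.get)

def is_true_positive(tag, regions, gaze_paths, measure, tag_of):
    """tp if the favorite region carries the given tag, otherwise fp."""
    return tag_of(favorite_region(regions, gaze_paths, measure)) == tag
```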
Precision per Fixation Measure

[Bar chart: precision P, computed over the sum of tp and fp assignments, per fixation measure; labeled bars include lastFixationDuration, fixationsBeforeDecision, fixationDuration, and meanVisitDuration]
Adding Boundaries and Weights
 Take eye-tracker inaccuracies into account
 Extend region boundaries by 13 pixels




 Larger regions are more likely to be fixated
 Apply a weight to regions < 5% of the image size
 lastFixationDuration increases to P = 0.65
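Extending the region polygons to absorb eye-tracker inaccuracy amounts to a standard geometric buffer; a sketch using Shapely (the 13 px margin is from the slide, the library choice and names are mine):

```python
from shapely.geometry import Point, Polygon

def extended_region(polygon_points, margin_px=13.0) -> Polygon:
    """Dilate a region polygon by margin_px in every direction."""
    return Polygon(polygon_points).buffer(margin_px)

# A fixation counts as 'on the region' if it falls inside the extended polygon:
region = extended_region([(10, 10), (200, 20), (180, 150), (30, 140)])
print(region.contains(Point(205, 25)))  # True: within 13 px of the boundary
```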
Weighted Measure Function




 Measure function f_m(r) on region r, with m = 1…13
 Relative region size: s_r
 Threshold below which the weighting is applied: T
 Maximum weighting value: M
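The function itself appears on the slide only as an image. From the symbols listed above and the stated behaviour on the previous slide (regions below T = 5% of the image size are boosted, up to a maximum weight M), one plausible reconstruction is a linearly interpolated weight; this is an assumption, not necessarily the exact formula from the paper:

```latex
f'_m(r) = w(s_r)\, f_m(r),
\qquad
w(s_r) =
\begin{cases}
1 + (M - 1)\,\dfrac{T - s_r}{T} & \text{if } s_r < T,\\[4pt]
1 & \text{otherwise,}
\end{cases}
\qquad T = 0.05 .
```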
Weighted Measure Function




Examples: Tag-Region-Assignments




Comparison with Baselines
[Bar chart: precision P for the gaze-based measures (Gaze, Gaze*) and the baselines]
 Naïve baseline: largest region r is favorite
 Salience baseline: Itti et al., TPAMI, 20(11), Nov 1998
 Random baseline: randomly select favorite r
 Gaze / Gaze* significantly better (all tests: p < 0.0015)
 Least significant result: χ²(1, N = 124) = 10.723
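The χ² statistics presumably come from 2×2 contingency tables comparing tp/fp counts of a gaze-based measure against a baseline. A sketch of such a test with SciPy; the counts below are illustrative placeholders chosen only so the table totals N = 124, not the experiment's numbers:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: method (gaze vs. baseline); columns: outcome (tp vs. fp).
table = np.array([[40, 22],   # gaze:     tp, fp (placeholder counts)
                  [21, 41]])  # baseline: tp, fp (placeholder counts)
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2({dof}, N={table.sum()}) = {chi2:.3f}, p = {p:.4f}")
```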
Effect of Gaze Path Aggregation
[Line chart: precision P (y-axis) over the number of gaze paths used (x-axis)]
 Aggregation of precision P for Gaze*
Research Questions


1. Best fixation measure to find the correct
   image region given a specific tag?
   → lastFixationDuration, with a precision of 65%


2. Can we differentiate two regions in the
   same image?


Experiment Images and Tags
 Randomly selected images from LabelMe
 Images contained at least two tagged regions
 Organized in three sets of 51 images each

 Assigned a tag to each image

 Tags are either “true” or “false”

 Two of the image sets share the same images
 Thus, these images have two tags each

Differentiate Two Objects
 Use first and second tag set to identify different
  objects in the same images
 16 images (of our 51) have two “true” tags
 6 images had two correct regions identified
   Proportion: 6/16 ≈ 38%

 Average precision for a single object is 63%
   Expected rate of two correct assignments per image: 0.63² ≈ 40%


Correctly Differentiated Objects




Research Questions


1. Best fixation measure to find the correct
   image region given a specific tag?
   → lastFixationDuration, with a precision of 65%


2. Can we differentiate two regions in the
   same image?
   → Accuracy of 38%

Investigation in 3 Steps



                 3 Interactive Tagging Application

                 2 Gaze + Automatic Segments

                 1 Gaze + Manual Regions


So far …

[Diagram: tag "car" + image + gaze path = image region labeled "car"]

For 63% of the images, we can identify the correct region.

T. Walber, A. Scherp, and S. Staab: Identifying Objects in Images from Analyzing the Users' Gaze Movements for Provided Tags, MMM, Klagenfurt, Austria, 2012.
Now:

[Diagram: tag "car" + image + gaze path = automatically segmented region labeled "car"]

 Automatic segmentation
 LabelMe segments used only as ground truth

T. Walber, A. Scherp, and S. Staab: Can you see it? Two Novel Eye-Tracking-Based Measures for Assigning Tags to Image Regions, MMM, Huangshan, China, 2013.
2nd Step: New Measure
 Automatic segmentation measure
 Berkeley Segmentation Data Set and
  Benchmarks 500 (BSDS500)
 Berkeley's gPb-owt-ucm algorithm
   Segmentation on different hierarchy levels
   Combination of contour detection and segmentation
   Oriented Watershed Transform and Ultrametric Contour Map

P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and
hierarchical image segmentation. IEEE TPAMI, 33(5):898–916, May 2011.
Segmentation Example
 Segmentations with different k = 0 … 0.4




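In the gPb-owt-ucm pipeline, each value of k is a threshold on the ultrametric contour map (UCM): contours not stronger than k are dissolved, so larger k yields coarser segmentations. A minimal sketch of that thresholding step, assuming `ucm` is a precomputed 2-D float array that is 0 inside regions and holds contour strengths on boundary pixels:

```python
import numpy as np
from scipy import ndimage

def segments_at_scale(ucm: np.ndarray, k: float) -> np.ndarray:
    """Label connected regions after removing contours not stronger than k.

    Returns an integer label image; larger k merges more regions.
    """
    labels, num_segments = ndimage.label(ucm <= k)
    return labels

# e.g.: segmentations = [segments_at_scale(ucm, k) for k in (0.0, 0.1, 0.2, 0.3, 0.4)]
```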
Automatic Segments + Gaze
 Conducted same computations as before
 But on the automatically extracted segments




Results for Different k's: P/R/F

[Charts: precision P over k for the eye-tracking-based automatic segmentation measure (left) and the golden sections rule baseline (right)]
Baseline: Golden Sections Rule

[Figure: line divided into segments a and b by the golden section]

(a + b) / a = a / b
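Solving the golden-section condition for the ratio φ = a/b gives the familiar constant:

```latex
\frac{a+b}{a} = \frac{a}{b} = \varphi
\;\Longrightarrow\;
1 + \frac{1}{\varphi} = \varphi
\;\Longrightarrow\;
\varphi^2 - \varphi - 1 = 0
\;\Longrightarrow\;
\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618 .
```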
Best Precision & Best F-measure




 Eye-tracking-based automatic segmentation measure
  significantly outperforms golden sections baseline
 Also shown: eye-tracking-based heatmap measure
  (no automatic segmentation)
Investigation in 3 Steps



                 3 Interactive Tagging Application

                 2 Gaze + Automatic Segments

                 1 Gaze + Manual Regions


3rd Step: Interactive Application




 car ; house ; girl
► tree_
APPENDIX




Influence of Red Dot




 First 5 fixations, over all subjects and all images
Experiment Data Cleaning
 Manually replaced images with
 a) Tags that are incomprehensible, require expert knowledge, or are nonsense
 b) Tags that refer to multiple regions, but not all regions are drawn into the image (e.g., bicycle)
 c) Obstructed objects (e.g., a bicycle behind a car)
 d) “False” tags that actually refer to a visible part of the image and thus were “true” tags


How to Compute P/R?
 R_fav is calculated from
    the automatic segmentation measure, or
    the baseline measure




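The formulas themselves appear on the slide only as an image. A plausible reading, assuming pixel-wise overlap between the favorite region R_fav and the ground-truth LabelMe region R_gt (an assumption consistent with the evaluation setup, not a formula quoted from the paper):

```latex
P = \frac{|R_{\mathrm{fav}} \cap R_{\mathrm{gt}}|}{|R_{\mathrm{fav}}|},
\qquad
R = \frac{|R_{\mathrm{fav}} \cap R_{\mathrm{gt}}|}{|R_{\mathrm{gt}}|},
\qquad
F = \frac{2PR}{P + R} .
```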
