SlideShare una empresa de Scribd logo
1 de 17
Descargar para leer sin conexión
The PASCAL Visual Object Classes
   (VOC) Dataset and Challenge


         Andrew Zisserman

        Visual Geometry Group
          University of Oxford
The PASCAL VOC Challenge

• Challenge in visual object
  recognition funded by
  PASCAL network of excellence

• Publicly available dataset of
  annotated images



• Main competitions in classification (is there an X in this image), detection
  (where are the X’s), and segmentation (which pixels belong to X)

• Competition has run each year since 2006.

• Organized by Mark Everingham, Luc Van Gool, Chris Williams, John Winn
  and Andrew Zisserman
Dataset Content
• 20 classes: aeroplane, bicycle, boat, bottle, bus, car, cat,
  chair, cow, dining table, dog, horse, motorbike, person,
  potted plant, sheep, train, TV

• Real images downloaded from flickr, not filtered for “quality”




• Complex scenes, scale, pose, lighting, occlusion, truncation ...
Annotation
• Complete annotation of all objects

• Annotated in one session with written guidelines




Occluded                                             Difficult
Object is                                            Not scored
significantly                                        in evaluation
occluded within BB



   Truncated                                         Pose
   Object extends                                    Facing left
   beyond BB
Examples
 Aeroplane   Bicycle   Bird   Boat    Bottle




   Bus         Car      Cat   Chair   Cow
Examples
Dining Table   Dog     Horse   Motorbike    Person




Potted Plant   Sheep    Sofa     Train     TV/Monitor
Dataset Statistics – VOC 2010

                     Training             Testing
        Images         10,103              9,637

       Objects         23,374             22,992




• Minimum ~500 training objects per category
   – ~1700 cars, 1500 dogs, 7000 people


• Approximately equal distribution across training and test sets

• Data augmented each year
Design choices: Things we got right …
1. Standard method of assessment

•   Train/validation/test splits given
•   Standard evaluation protocol – AP per class
•   Software supplied
    – Includes baseline classifier/detector/segmenter
    – Runs from training to generating PR curve and AP on validation or
      test data out of the box
•   Has meant that results on VOC can be consistently
    compared in publications

•   “Best Practice” webpage on how the training/validation/test
    data should be used for the challenge
Design choices: Things we got right …
2. Evaluation of test data

Three possibilities

   1. Release test data and annotation (most liberal) and participants can
      assess performance
       – Cons: open to abuse

   2. Release test data, but test annotation withheld - participants submit
      results and organizers assess performance (use an evaluation
      server)

   3. No release of test data - participants have to submit software and
      organizers run this and assess performance
Design choices: Things we got right …
3. Augmentation of dataset each year

•   VOC2010 around 40% increase in size over VOC2009
                              Training                        Testing

            Images        10,103       (7,054)           9,637      (6,650)

           Objects        23,374      (17,218)         22,992       (16,829)

                           VOC2009 counts shown in brackets



•   Has prevented over fitting on data

•   2008/9 datasets retained as subset of 2010
    –   Assignments to training/test sets maintained
    –   So can measure progress from 2008 to 2010
Detection Challenge: Progress 2008-2010
             60

             50
Max AP (%)




             40
                                                                                                                                                                                                 2008
             30                                                                                                                                                                                  2009
                                                                                                                                                                                                 2010
             20

             10

             0




                                                                                                                                                                                     tvmonitor
                                                                                                                                                pottedplant
                                                      bottle




                                                                                                                           motorbike
                                                                                               diningtable
                  aeroplane




                                                                                                                   horse




                                                                                                                                                                      sofa
                                                                                                                                       person




                                                                                                                                                                             train
                                                                                                                                                              sheep
                              bicycle




                                                                                         cow
                                                                           cat
                                               boat


                                                               bus
                                        bird




                                                                                                             dog
                                                                     car


                                                                                 chair




• Results on 2008 data improve for best 2009 and 2010 methods
  for all categories, by over 100% for some categories
             – Caveat: Better methods or more training data?
Design choices: Things we got right …
4. Equivalence bands


•   Uses Friedman/Nemenyi ANOVA significance test
•   Makes explicit that many methods are just slight
    perturbations of each other
•   Need more research on this area for obtaining equivalence
    classes for ranking (rather than classification)
•   Plan to include banner/header results on evaluation server
    (to aid comparisons for reviewing etc)
Statistical Significance
• Friedman/Nemenyi analysis
     – Compare differences in mean rank of methods over classes using non-
       parametric version of ANOVA
     – Mean rank must differ by at least 5.4 to be considered significant
       (p=0.05)                       Mean rank
                                         45   40   35   30   25   20   15   10   5




                              (chance)                                               NUSPSL_KERNELREGFUSING
                 HIT_PROTOLEARN_2                                                    NUSPSL_MFDETSVM
                         NII_SVMSIFT                                                 NUSPSL_EXCLASSIFIER
             BUPT_LPBETA_MULTFEAT                                                    NEC_V1_HOGLBP_NONLIN_SVMDET
                 NTHU_LINSPARSE_2                                                    NLPR_VSTAR_CLS_DICTLEARN
               BUPT_SVM_MULTFEAT                                                     NEC_V1_HOGLBP_NONLIN_SVM
          LIG_MSVM_FUSE_CONCEPT                                                      UVA_BW_NEWCOLOURSIFT
                   WLU_SPM_EMDIST                                                    UVA_BW_NEWCOLOURSIFT_SRKDA
                 BUPT_SPM_SC_HOG                                                     CVC_PLUSDET
                 LIP6UPMC_RANKING                                                    SURREY_MK_KDA
          LIP6UPMC_KSVM_BASELINE                                                     CVC_PLUS
                   LIP6UPMC_MKL_L1                                                   BUT_FU_SVM_SIFT
                      UC3M_GENDISC                                                   XRCE_IFV
 NUDT_SVM_WHGO_SIFT_CENTRIST_LLM                                                     CVC_FLAT
                   RITSU_CBVR_WKF                                                    BONN_FGT_SEGM
                  TIT_SIFT_GMM_MKL                                                   LIRIS_MKL_TRAINVAL
     NUDT_SVM_LDP_SIFT_PMK_SPMK
The future …
Action Classification Taster Challenge – 2010 on
• Given the bounding box of a person, predict whether they are performing
  a given action




           Playing Instrument?                   Reading?
• Developed with Ivan Laptev
• Strong participation from the start (11 methods, 8 groups)
Ten Action Classes
 Phoning    Playing Instrument      Reading       Riding Bike   Riding Horse




 Running      Taking Photo       Using Computer   Walking        Jumping




• 2011: added jumping + `other’ class
Person Layout Taster Challenge – 2009 on
• Given the bounding box of a person, predict the positions of head, hands
  and feet.
• aim: to encourage research on more detailed image interpretation



                                                          hand          hand
                                     head


                        head

                                                                 head

                 hand                   hand
                                         hand




          foot


                                                   foot
                                            foot




• 2009: no participation
• 2010: relaxed success criteria, 2 participants
• Assessment method still not satisfactory?
Futures
2012
• Last official year of PASCAL2
• May introduce a new taster competition, e.g
   – Materials
   – Actions in video


• Open to suggestions

Más contenido relacionado

Más de zukun

ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 
Icml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featuresIcml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featureszukun
 
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...zukun
 

Más de zukun (20)

ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 
Icml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featuresIcml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant features
 
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
 

Último

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Fcv dataset zisserman_2

  • 1. The PASCAL Visual Object Classes (VOC) Dataset and Challenge Andrew Zisserman Visual Geometry Group University of Oxford
  • 2. The PASCAL VOC Challenge • Challenge in visual object recognition funded by PASCAL network of excellence • Publicly available dataset of annotated images • Main competitions in classification (is there an X in this image), detection (where are the X’s), and segmentation (which pixels belong to X) • Competition has run each year since 2006. • Organized by Mark Everingham, Luc Van Gool, Chris Williams, John Winn and Andrew Zisserman
  • 3. Dataset Content • 20 classes: aeroplane, bicycle, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, train, TV • Real images downloaded from flickr, not filtered for “quality” • Complex scenes, scale, pose, lighting, occlusion, truncation ...
  • 4. Annotation • Complete annotation of all objects • Annotated in one session with written guidelines Occluded Difficult Object is Not scored significantly in evaluation occluded within BB Truncated Pose Object extends Facing left beyond BB
  • 5. Examples Aeroplane Bicycle Bird Boat Bottle Bus Car Cat Chair Cow
  • 6. Examples Dining Table Dog Horse Motorbike Person Potted Plant Sheep Sofa Train TV/Monitor
  • 7. Dataset Statistics – VOC 2010 Training Testing Images 10,103 9,637 Objects 23,374 22,992 • Minimum ~500 training objects per category – ~1700 cars, 1500 dogs, 7000 people • Approximately equal distribution across training and test sets • Data augmented each year
  • 8. Design choices: Things we got right … 1. Standard method of assessment • Train/validation/test splits given • Standard evaluation protocol – AP per class • Software supplied – Includes baseline classifier/detector/segmenter – Runs from training to generating PR curve and AP on validation or test data out of the box • Has meant that results on VOC can be consistently compared in publications • “Best Practice” webpage on how the training/validation/test data should be used for the challenge
  • 9. Design choices: Things we got right … 2. Evaluation of test data Three possibilities 1. Release test data and annotation (most liberal) and participants can assess performance – Cons: open to abuse 2. Release test data, but test annotation withheld - participants submit results and organizers assess performance (use an evaluation server) 3. No release of test data - participants have to submit software and organizers run this and assess performance
  • 10. Design choices: Things we got right … 3. Augmentation of dataset each year • VOC2010 around 40% increase in size over VOC2009 Training Testing Images 10,103 (7,054) 9,637 (6,650) Objects 23,374 (17,218) 22,992 (16,829) VOC2009 counts shown in brackets • Has prevented over fitting on data • 2008/9 datasets retained as subset of 2010 – Assignments to training/test sets maintained – So can measure progress from 2008 to 2010
  • 11. Detection Challenge: Progress 2008-2010 60 50 Max AP (%) 40 2008 30 2009 2010 20 10 0 tvmonitor pottedplant bottle motorbike diningtable aeroplane horse sofa person train sheep bicycle cow cat boat bus bird dog car chair • Results on 2008 data improve for best 2009 and 2010 methods for all categories, by over 100% for some categories – Caveat: Better methods or more training data?
  • 12. Design choices: Things we got right … 4. Equivalence bands • Uses Friedman/Nemenyi ANOVA significance test • Makes explicit that many methods are just slight perturbations of each other • Need more research on this area for obtaining equivalence classes for ranking (rather than classification) • Plan to include banner/header results on evaluation server (to aid comparisons for reviewing etc)
  • 13. Statistical Significance • Friedman/Nemenyi analysis – Compare differences in mean rank of methods over classes using non- parametric version of ANOVA – Mean rank must differ by at least 5.4 to be considered significant (p=0.05) Mean rank 45 40 35 30 25 20 15 10 5 (chance) NUSPSL_KERNELREGFUSING HIT_PROTOLEARN_2 NUSPSL_MFDETSVM NII_SVMSIFT NUSPSL_EXCLASSIFIER BUPT_LPBETA_MULTFEAT NEC_V1_HOGLBP_NONLIN_SVMDET NTHU_LINSPARSE_2 NLPR_VSTAR_CLS_DICTLEARN BUPT_SVM_MULTFEAT NEC_V1_HOGLBP_NONLIN_SVM LIG_MSVM_FUSE_CONCEPT UVA_BW_NEWCOLOURSIFT WLU_SPM_EMDIST UVA_BW_NEWCOLOURSIFT_SRKDA BUPT_SPM_SC_HOG CVC_PLUSDET LIP6UPMC_RANKING SURREY_MK_KDA LIP6UPMC_KSVM_BASELINE CVC_PLUS LIP6UPMC_MKL_L1 BUT_FU_SVM_SIFT UC3M_GENDISC XRCE_IFV NUDT_SVM_WHGO_SIFT_CENTRIST_LLM CVC_FLAT RITSU_CBVR_WKF BONN_FGT_SEGM TIT_SIFT_GMM_MKL LIRIS_MKL_TRAINVAL NUDT_SVM_LDP_SIFT_PMK_SPMK
  • 14. The future … Action Classification Taster Challenge – 2010 on • Given the bounding box of a person, predict whether they are performing a given action Playing Instrument? Reading? • Developed with Ivan Laptev • Strong participation from the start (11 methods, 8 groups)
  • 15. Ten Action Classes Phoning Playing Instrument Reading Riding Bike Riding Horse Running Taking Photo Using Computer Walking Jumping • 2011: added jumping + `other’ class
  • 16. Person Layout Taster Challenge – 2009 on • Given the bounding box of a person, predict the positions of head, hands and feet. • aim: to encourage research on more detailed image interpretation hand hand head head head hand hand hand foot foot foot • 2009: no participation • 2010: relaxed success criteria, 2 participants • Assessment method still not satisfactory?
  • 17. Futures 2012 • Last official year of PASCAL2 • May introduce a new taster competition, e.g – Materials – Actions in video • Open to suggestions