22-05-13

A Rosetta Stone for Image Understanding

Cees Snoek
University of Amsterdam, The Netherlands
Euvision Technologies, The Netherlands

A classical problem

Understanding was lost from 394 CE to 1822
  
Rosetta Stone discovery in 1799

A decree by King Ptolemy V
– Hieroglyphs
– Demotic script
– Ancient Greek

Key to decipherment in 1822
JF Champollion
  
RECOGNIZING WORDS

Understanding images
Mazloom et al., ICMR 201

How difficult is the problem?

Human vision consumes 50% brain power…
Van Essen, Science 1992

Visual labeling in a nutshell

Visualization by Jasper Schulte
  
Visual labeling by machine

[Figure: labeling pipeline with the stages Encode, Reduce, Learn, Label]
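The four stages named on this slide can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions: the slide gives the stage names and nothing more, so the 2-D descriptors, the fixed codebook, the histogram pooling and the nearest-centroid learner below are all hypothetical stand-ins chosen to keep the example small.

```python
from collections import defaultdict
from math import dist

# Hypothetical 2-D codebook of "visual words"; a real system would learn it from data.
CODEBOOK = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

def encode(descriptors):
    """Encode: map each local descriptor to the index of its nearest codeword."""
    return [min(range(len(CODEBOOK)), key=lambda i: dist(d, CODEBOOK[i]))
            for d in descriptors]

def reduce_(codes):
    """Reduce: pool the codes of one image into a normalized histogram."""
    hist = [0.0] * len(CODEBOOK)
    for c in codes:
        hist[c] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def learn(examples):
    """Learn: average the histograms per label (a nearest-centroid model)."""
    grouped = defaultdict(list)
    for descriptors, name in examples:
        grouped[name].append(reduce_(encode(descriptors)))
    return {name: [sum(col) / len(col) for col in zip(*hists)]
            for name, hists in grouped.items()}

def label(model, descriptors):
    """Label: pick the label whose centroid is closest to the image histogram."""
    hist = reduce_(encode(descriptors))
    return min(model, key=lambda name: dist(hist, model[name]))

# Two made-up training images, each a list of local descriptors plus a label.
train = [
    ([(0.1, 0.0), (0.0, 0.2), (0.9, 1.0)], "beach"),
    ([(1.0, 0.9), (0.9, 1.1), (0.1, 0.9)], "city"),
]
model = learn(train)
prediction = label(model, [(0.0, 0.1), (0.1, 0.1), (1.0, 1.0)])
print(prediction)
```

In practice each stage would be far richer (learned codebooks, stronger classifiers), but the data flow from descriptors to a label stays the same.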
International competition

NIST TRECVID Benchmark
Promote progress in video retrieval research
Open data, tasks, evaluation and innovation
http://trecvid.nist.gov/
  
Are we making progress?

[Figure: MediaMill team, TRECVID 2004-2012. Legend: x MediaMill team, • 1000+ others]

Performance doubled in just 3 years
Snoek & Smeulders, IEEE Computer 2010

Software licensed by Euvision Technologies
  
MediaMill video search engines

Learning from social-tagged images

Xirong Li et al., TMM 2009
Exploit consistency in tagging behavior of different users for visually similar images
  
Tag relevance

Objective tags are identified and reinforced
Based on 3.5 million images downloaded from Flickr
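The idea of exploiting consistent tagging among visually similar images can be sketched as a neighbor vote. This is an illustrative sketch, not the published implementation: the toy 2-D features, the tag sets and the choice to subtract a chance-level prior are assumptions made for the example, and practical details such as limiting each user to one vote are left out.

```python
from math import dist

def tag_relevance(tag, query_feature, collection, k):
    """Votes for `tag` among the k visually nearest images, minus the number
    of votes expected by chance given the tag's frequency in the collection."""
    neighbors = sorted(collection,
                       key=lambda img: dist(query_feature, img["feature"]))[:k]
    votes = sum(tag in img["tags"] for img in neighbors)
    tag_frequency = sum(tag in img["tags"] for img in collection) / len(collection)
    return votes - k * tag_frequency

# Hypothetical collection: three beach-like images and three city-like images.
collection = [
    {"feature": (0.0, 0.0), "tags": {"beach", "holiday"}},
    {"feature": (0.1, 0.0), "tags": {"beach", "sea"}},
    {"feature": (0.0, 0.1), "tags": {"beach", "me"}},
    {"feature": (1.0, 1.0), "tags": {"city", "holiday"}},
    {"feature": (1.1, 1.0), "tags": {"city", "me"}},
    {"feature": (1.0, 1.1), "tags": {"street", "holiday"}},
]
query = (0.05, 0.05)
scores = {t: tag_relevance(t, query, collection, k=3)
          for t in ("beach", "holiday", "me")}
print(scores)
```

In this toy data the objective tag "beach" is shared by all three visual neighbors and scores highest, while subjective tags such as "holiday" and "me" score no better than chance.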
  
RECOGNIZING SENTENCES

Understanding images
Mazloom et al., ICMR 2013

Human event description on web video

We analyze 13K web videos and their descriptions
"People competing in a sand sculpting competition and children playing on the beach."
"A woman folds and packages a scarf she has made."
Habibian et al., ICMR 2013
  
	
  
Human concept vocabulary

Consists of 5K distinct and mostly rare concepts
Includes general and specialized concepts
It is composed of various concept types

[Bar chart: portions (in %, axis 0 to 50) per concept type: Non Visual, Attribute, Scene, Action, Object, Animal, People]
  
Concepts categorized by type

Object, People, Animal, Scene, Action, Attribute
  
From concepts to sentences

[Figure elements: Input Video; Concept Vocabulary (Concept 1, Concept 2, …, Concept K); Train SVM; Event Models; example label: Attempting a board trick]

Creating the concept vocabulary is critical
Sadanand, CVPR12
Merler, TMM12
Althoff, MM12
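The flow on this slide, where a video is described by the scores of K concept detectors and an SVM is trained on those scores to model an event, can be sketched as follows. Everything concrete here is an assumption for illustration: the four concept names, the detector scores and the small hinge-loss trainer are hypothetical, and a real system would use a library SVM on far larger vocabularies.

```python
# Hypothetical concept vocabulary; each video becomes a vector of detector scores.
VOCABULARY = ["skateboard", "street", "wood", "saw"]

def train_linear_svm(X, y, epochs=200, lr=0.1, reg=0.01):
    """Linear SVM fitted by subgradient descent on the regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi - lr * reg * wi for wi in w]           # L2 shrinkage
            if margin < 1:                                  # hinge loss is active
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def detect_event(model, concept_scores):
    """Return True when the event model fires on a video's concept scores."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, concept_scores)) + b > 0

# Made-up training videos: +1 = "Attempting a board trick", -1 = other events.
X = [[0.9, 0.8, 0.1, 0.0], [0.8, 0.9, 0.2, 0.1],
     [0.1, 0.2, 0.9, 0.8], [0.0, 0.1, 0.8, 0.9]]
y = [+1, +1, -1, -1]
event_model = train_linear_svm(X, y)

is_board_trick = detect_event(event_model, [0.85, 0.7, 0.1, 0.1])
is_other_event = detect_event(event_model, [0.1, 0.1, 0.9, 0.7])
print(is_board_trick, is_other_event)
```

The sketch makes the slide's point concrete: the event model only sees what the vocabulary exposes, so the choice of concepts bounds what the SVM can learn.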
  
Video sentence examples

Attempting a board trick
Working on a woodworking project
Changing a vehicle tire

Are more concepts better?

In general, more is better. But a vocabulary of 500 concepts exists that outperforms all others.
Mazloom et al., ICMR 2013
  
	
  
Results for “Landing a fish in”

A vocabulary of 100 concepts is the best performer

Informative concepts vs All concepts

The 23% most informative concepts lead to a 65% relative increase in event detection accuracy.
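Keeping only the most informative fraction of a vocabulary, and the meaning of a relative increase, can be sketched as below. This is a hedged illustration rather than the selection method behind the reported result: the informativeness score (gap between mean detector scores on positive and negative videos), the concept names and all numbers are assumptions.

```python
def informativeness(scores_pos, scores_neg):
    """Assumed score: gap between mean detector output on positive and negative videos."""
    mean = lambda values: sum(values) / len(values)
    return abs(mean(scores_pos) - mean(scores_neg))

def select_concepts(pos, neg, fraction):
    """Keep the top `fraction` of concepts, ranked by the score above."""
    ranked = sorted(pos, key=lambda c: informativeness(pos[c], neg[c]), reverse=True)
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]

def relative_increase(baseline, improved):
    """Relative increase of `improved` over `baseline`, e.g. 0.65 means +65%."""
    return (improved - baseline) / baseline

# Hypothetical detector scores for the event "Landing a fish".
pos = {"fishing rod": [0.9, 0.8], "water": [0.8, 0.7],
       "person": [0.9, 0.9], "car": [0.1, 0.1]}
neg = {"fishing rod": [0.1, 0.2], "water": [0.4, 0.5],
       "person": [0.9, 0.8], "car": [0.2, 0.1]}

selected = select_concepts(pos, neg, fraction=0.5)
gain = relative_increase(baseline=0.20, improved=0.33)  # made-up accuracies
print(selected, gain)
```

With the made-up accuracies above, going from 0.20 to 0.33 is a relative increase of about 65%, the kind of figure the slide reports for the selected vocabulary.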
  	
  
What concepts are informative

[Figure: informative concepts for the events Wedding Ceremony and Landing a Fish. Font size correlates with informativeness]

Visual translation

Represent images and text in unified semantic space

[Figure: textual concept detectors and visual concept detectors map text and images to concepts C1, C2, …, Cn in a shared semantic space. Example text: "The 18th-largest country in the world in terms of area at 1,648,195 Iran has a population of around 75 million. It is a country of particular geo.."]
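Representing images and text in one semantic space can be sketched by mapping both to a vector of concept scores and comparing those vectors. The sketch below is an assumption-laden toy: the three concepts, the word-count "textual detector", the hand-written visual scores and the cosine similarity are illustrative choices, not the method from the talk.

```python
from math import sqrt

# Hypothetical shared concepts C1..Cn.
CONCEPTS = ["beach", "wedding", "fish"]

def textual_detectors(text):
    """Toy textual concept detectors: count how often each concept word occurs."""
    words = text.lower().split()
    return [float(words.count(concept)) for concept in CONCEPTS]

def cosine(u, v):
    """Cosine similarity between two concept-score vectors (0.0 for zero vectors)."""
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

# Made-up outputs of visual concept detectors for two images.
visual_scores = {
    "img_fishing": [0.7, 0.0, 0.9],
    "img_wedding": [0.1, 0.9, 0.0],
}

query = textual_detectors("A man lands a fish on the beach")
best = max(visual_scores, key=lambda name: cosine(query, visual_scores[name]))
print(best)
```

Because both modalities end up as scores over the same concepts, the same comparison works in either direction, from text to images or from an image or video to text.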
  
Example: query by a video

Video translation

Summary of most likely translations
Habibian et al., submitted
  
Conclusion

AI progress and human descriptions on the web act as a ‘Rosetta Stone’ for image understanding.

Automatic metadata generation jumps from words to sentences.

www.ceessnoek.info
  


Presentation 17 May morning keynote Cees Snoek
