This document discusses progress in image understanding and summarizes research from the MediaMill team. It describes how understanding of images has improved from recognizing words to generating sentences to describe images. Key steps included learning from social image tags, creating concept vocabularies, and developing techniques to translate between images and text in a unified semantic space. The research aims to automate metadata generation for images at a level comparable to human descriptions.
1. 22-05-13
A Rosetta Stone for Image Understanding
Cees Snoek
University of Amsterdam, The Netherlands
Euvision Technologies, The Netherlands
A classical problem
Understanding was lost from 394 CE to 1822
2. Rosetta Stone discovery in 1799
A decree by King Ptolemy V
– Hieroglyphs
– Demotic script
– Ancient Greek
Key to decipherment in 1822
JF Champollion
RECOGNIZING WORDS
Understanding images
Mazloom et al., ICMR 201
3. How difficult is the problem?
Human vision consumes 50% brain power…
Van Essen, Science 1992
Visual labeling in a nutshell
Visualization by Jasper Schulte
4. Visual labeling by machine
Encode Reduce
Encode Reduce
Learn
Label
International competition
NIST TRECVID Benchmark
Promote progress in video retrieval research
Open data, tasks, evaluation and innovation
http://trecvid.nist.gov/
5. Are we making progress?
• 1000+ others
x MediaMill team
MediaMill team, TRECVID 2004-2012
Performance doubled in just 3 years
Snoek & Smeulders, IEEE Computer 2010
Software licensed by Euvision Technologies
6. MediaMill video search engines
Learning from social-tagged images
Xirong Li et al., TMM 2009
Exploit consistency in tagging behavior of different users for visually similar images
7. Tag relevance
Objective tags are identified and reinforced
Based on 3.5 million images downloaded from Flickr
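The idea of exploiting consistent tagging by different users on visually similar images can be illustrated with a small neighbor-voting sketch. This is an illustration under stated assumptions, not the implementation from Li et al.: the two-dimensional features, the Euclidean distance, the one-neighbor-per-user rule, and the names `tag_relevance` and `collection` are all hypothetical. A tag scores high when the visual neighbors of an image carry it more often than its overall frequency in the collection would predict.

```python
import math
from collections import Counter


def tag_relevance(query_features, query_tags, collection, k=3):
    """Score each tag of a query image by votes from its visual neighbors."""
    # Rank collection images by Euclidean distance to the query (toy features).
    ranked = sorted(
        collection, key=lambda img: math.dist(query_features, img["features"])
    )
    # Keep at most one neighbor per user, so one prolific tagger cannot dominate.
    neighbors, seen_users = [], set()
    for img in ranked:
        if img["user"] in seen_users:
            continue
        seen_users.add(img["user"])
        neighbors.append(img)
        if len(neighbors) == k:
            break
    # Votes: how many neighbors carry each tag.
    votes = Counter(tag for img in neighbors for tag in set(img["tags"]))
    # Prior: votes expected if the neighbors were drawn at random.
    tag_freq = Counter(tag for img in collection for tag in set(img["tags"]))
    n = len(collection)
    return {
        tag: votes[tag] - len(neighbors) * tag_freq[tag] / n for tag in query_tags
    }


# Hypothetical toy collection: three beach-like images, three city-like images.
collection = [
    {"user": "u1", "features": (0.90, 0.10), "tags": ["beach", "sea"]},
    {"user": "u2", "features": (0.80, 0.20), "tags": ["beach", "holiday"]},
    {"user": "u3", "features": (0.85, 0.15), "tags": ["beach"]},
    {"user": "u4", "features": (0.10, 0.90), "tags": ["city", "holiday"]},
    {"user": "u5", "features": (0.20, 0.80), "tags": ["city"]},
    {"user": "u6", "features": (0.15, 0.95), "tags": ["street", "holiday"]},
]

scores = tag_relevance((0.88, 0.12), ["beach", "holiday"], collection, k=3)
print(scores)
```

In this toy data the objective tag "beach" is reinforced by the neighbors, while the subjective tag "holiday", which is spread evenly over unrelated images, ends up below its prior.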
RECOGNIZING SENTENCES
Understanding images
Mazloom et al., ICMR 2013
8. Human event description on web video
We analyze 13K web videos and their descriptions
People competing in a sand sculpting competition and children playing on the beach.
A woman folds and packages a scarf she has made.
Habibian et al., ICMR 2013
Human concept-vocabulary
Consists of 5K distinct and mostly rare concepts
Includes general and specialized concepts
It is composed of various concept types
[Bar chart of concept types (Non Visual, Attribute, Scene, Action, Object, Animal, People); axis: Portions (in %), 0 to 50]
9. Concepts categorized by type
Object, People, Animal, Scene, Action, Attribute
From concepts to sentences
[Diagram: Input Video, Concept Vocabulary (Concept 1, Concept 2, …, Concept K), Train SVM, Event Models]
Creating the concept vocabulary is critical
Sadanand, CVPR12
Merler, TMM12
Althoff, MM12
Attempting a board trick
10. Video sentence examples
Attempting a board trick
Working on a woodworking project
Changing a vehicle tire
Are more concepts better?
In general, more is better.
But, a vocabulary of 500 concepts exists that outperforms all others
Mazloom et al., ICMR 2013
11. Results for “Landing a fish in”
A vocabulary of 100 concepts is the best performer
Informative concepts vs All concepts
The 23% most informative concepts lead to a 65% relative increase in event detection accuracy.
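To make the notion of keeping only the informative part of a vocabulary concrete, here is a minimal sketch. The slides do not state how the informative concepts are found, so the criterion below is an assumption chosen for simplicity: each concept is ranked by the gap between its mean detector score on positive and on negative example videos. The function `select_informative`, the concept names, and the scores are hypothetical.

```python
def select_informative(positives, negatives, keep_fraction):
    """Return the concepts whose mean detector score best separates the classes."""

    def mean_score(videos, concept):
        return sum(video[concept] for video in videos) / len(videos)

    # Gap between the average score on positive and on negative videos.
    gap = {
        concept: abs(mean_score(positives, concept) - mean_score(negatives, concept))
        for concept in positives[0]
    }
    # Keep the requested fraction of the vocabulary, at least one concept.
    n_keep = max(1, round(keep_fraction * len(gap)))
    return sorted(gap, key=gap.get, reverse=True)[:n_keep]


# Hypothetical detector scores for videos that do or do not show the event.
positives = [
    {"fish": 0.9, "boat": 0.6, "water": 0.8, "sky": 0.5},
    {"fish": 0.8, "boat": 0.7, "water": 0.7, "sky": 0.4},
]
negatives = [
    {"fish": 0.1, "boat": 0.2, "water": 0.6, "sky": 0.5},
    {"fish": 0.2, "boat": 0.1, "water": 0.5, "sky": 0.6},
]

selected = select_informative(positives, negatives, keep_fraction=0.5)
print(selected)
```

An event model trained only on the selected concepts sees fewer noisy dimensions, which is one plausible reason a small vocabulary can outperform the full one.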
12. What concepts are informative
Font size correlates with informativeness
Wedding Ceremony
Landing a Fish
Visual translation
Represent images and text in unified semantic space
C1, C2, Cn
The 18th-largest country in the world in terms of area at 1,648,195 Iran has a population of around 75 million. It is a country of particular geo..
Concept Detectors (Textual)
Concept Detectors (Visual)
Semantic Space
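A unified semantic space can be pictured as one vector of concept scores per item, whether the item is an image, a video, or a piece of text. The sketch below is a toy illustration under assumptions: the four-concept vocabulary, the word-counting `textual_detectors` function, and the hard-coded visual scores are hypothetical stand-ins for real textual and visual concept detectors. Once both modalities live in the same space, translation reduces to ranking candidates by similarity.

```python
import math

# Hypothetical shared concept vocabulary (C1 ... Cn on the slide).
CONCEPTS = ["beach", "fish", "wedding", "city"]


def textual_detectors(text):
    """Toy textual concept detectors: relative frequency of each concept word."""
    words = text.lower().split()
    return [words.count(concept) / len(words) for concept in CONCEPTS]


def cosine(a, b):
    """Cosine similarity between two concept-score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical output of visual concept detectors for one query video.
video_scores = [0.1, 0.8, 0.0, 0.1]

sentences = [
    "a wedding ceremony in the city",
    "a man landing a big fish",
]

# Rank candidate sentences by similarity to the video in the shared space.
ranked = sorted(
    sentences,
    key=lambda s: cosine(video_scores, textual_detectors(s)),
    reverse=True,
)
print(ranked[0])
```

The same ranking works in the other direction: a sentence mapped through the textual detectors can retrieve the videos whose visual concept scores lie closest to it.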
13. Example: query by a video
Video translation
Summary of most likely translations
Habibian et al., submitted
14. Conclusion
AI-progress and human descriptions on the web act as ‘Rosetta Stone’ for image understanding.
Automatic metadata generation jumps from words to sentences.
www.ceessnoek.info