Presentation 17 May morning keynote Cees Snoek

22-05-13

A Rosetta Stone for Image Understanding

Cees Snoek

University of Amsterdam, The Netherlands

Euvision Technologies, The Netherlands

A classical problem

Understanding was lost from 394 CE to 1822

Rosetta Stone discovery in 1799

A decree by King Ptolemy V

– Hieroglyphs
– Demotic script
– Ancient Greek

Key to decipherment in 1822

JF Champollion

RECOGNIZING WORDS

Understanding images

Mazloom et al., ICMR 201

How difficult is the problem?

Human vision consumes 50% brain power…

Van Essen, Science 1992

Visual labeling in a nutshell

Visualization by Jasper Schulte

Visual labeling by machine

Encode Reduce
Encode Reduce
Learn
Label

International competition

NIST TRECVID Benchmark

Promote progress in video retrieval research

Open data, tasks, evaluation and innovation

http://trecvid.nist.gov/

Are we making progress?

• 1000+ others
x MediaMill team
MediaMill team, TRECVID 2004-2012

Performance doubled in just 3 years

Snoek & Smeulders, IEEE Computer 2010

Software licensed by Euvision Technologies
MediaMill video search engines

Learning from social-tagged images

Xirong Li et al., TMM 2009

Exploit consistency in tagging behavior of different users for visually similar images

Tag relevance

Objective tags are identified and reinforced

Based on 3.5 Million images downloaded from Flickr
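
The idea of exploiting tagging consistency across visually similar images can be illustrated with a small neighbor-voting sketch. This is a toy illustration, not the authors' implementation: the function name `tag_relevance`, the two-dimensional features, the Euclidean distance, and the five example images are all assumptions made for the example. A tag scores high when the visual neighbors of an image use it more often than its overall frequency in the collection would predict.

```python
import math
from collections import Counter


def tag_relevance(image_id, features, tags, k=2):
    """Score each tag of image_id by neighbor votes minus the expected prior votes."""
    query = features[image_id]
    # Rank all other images by Euclidean distance in the (toy) visual feature space.
    others = sorted(
        (other for other in features if other != image_id),
        key=lambda other: math.dist(query, features[other]),
    )
    neighbors = others[:k]
    # Each visual neighbor votes once for every tag it carries.
    votes = Counter(tag for n in neighbors for tag in tags[n])
    # Prior: votes a tag would receive from k images drawn at random.
    n_images = len(features)
    frequency = Counter(tag for image_tags in tags.values() for tag in image_tags)
    return {
        tag: votes[tag] - k * frequency[tag] / n_images
        for tag in tags[image_id]
    }


# Hypothetical collection: three beach-like images and two city-like images.
features = {
    "a": (0.0, 0.0), "b": (0.1, 0.0), "c": (0.0, 0.1),
    "d": (5.0, 5.0), "e": (5.1, 5.0),
}
tags = {
    "a": {"beach", "holiday"}, "b": {"beach", "sea"}, "c": {"beach"},
    "d": {"holiday", "city"}, "e": {"city"},
}

scores = tag_relevance("a", features, tags, k=2)
# The objective tag "beach" is reinforced; the subjective tag "holiday" is not.
print(sorted(scores.items(), key=lambda item: -item[1]))
```

In this toy collection the visual neighbors of image "a" both carry "beach" and neither carries "holiday", so "beach" ends up with a positive score and "holiday" with a negative one.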

RECOGNIZING SENTENCES

Understanding images

Mazloom et al., ICMR 2013

Human event description on web video

We analyze 13K web videos and their descriptions

People competing in a sand sculpting competition and children playing on the beach.

A woman folds and packages a scarf she has made.

Habibian et al., ICMR 2013

Human concept-vocabulary

Consists of 5K distinct and mostly rare concepts

Includes general and specialized concepts

It is composed of various concept types

[Bar chart: portions (in %) per concept type: Non Visual, Attribute, Scene, Action, Object, Animal, People]

Concepts categorized by type

Object
People
Animal
Scene
Action
Attribute

From concepts to sentences

Input Video
Event Models
Concept 1
Concept 2
…
Concept K
Concept Vocabulary
Train
SVM

Creating the concept vocabulary is critical

Sadanand, CVPR12
Merler, TMM12
Althoff, MM12

Attempting a board trick
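
The pipeline on the "From concepts to sentences" slide (a video is described by the scores of K concept detectors, and an event model is trained on those scores) might be sketched as follows. This is a minimal illustration under stated assumptions: a simple perceptron stands in for the SVM named on the slide so that the example needs no external library, and the concept names, scores, and labels are invented for the example.

```python
def train_perceptron(vectors, labels, epochs=20):
    """Train a linear event model on concept-score vectors (labels are +1 or -1)."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: shift the boundary toward this example
                weights = [w + y * v for w, v in zip(weights, x)]
                bias += y
    return weights, bias


def detect_event(weights, bias, x):
    """Return True when the concept-score vector x falls on the positive side."""
    return sum(w * v for w, v in zip(weights, x)) + bias > 0


# Hypothetical concept vocabulary (K = 3) and detector scores per training video.
CONCEPTS = ["skateboard", "ramp", "kitchen"]
train_vectors = [
    [0.9, 0.8, 0.1],  # positive: "Attempting a board trick"
    [0.8, 0.6, 0.0],  # positive
    [0.1, 0.0, 0.9],  # negative
    [0.2, 0.1, 0.7],  # negative
]
train_labels = [1, 1, -1, -1]

weights, bias = train_perceptron(train_vectors, train_labels)
print(detect_event(weights, bias, [0.7, 0.9, 0.1]))  # skateboard-like test video
print(detect_event(weights, bias, [0.0, 0.1, 0.8]))  # kitchen-like test video
```

Because the event model only sees concept scores, its quality depends directly on which concepts are in the vocabulary, which is the point the slide makes.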

Video sentence examples

Attempting a board trick

Working on a woodworking project

Changing a vehicle tire

Are more concepts better?

In general, more is better. But, a vocabulary of 500 concepts exists that outperforms all others

Mazloom et al., ICMR 2013

Results for “Landing a fish in”

A vocabulary of 100 concepts is the best performer

Informative concepts vs All concepts

The 23% most informative concepts lead to a 65% relative increase in event detection accuracy.
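
To make the two figures above concrete, the sketch below keeps the top-scoring fraction of a vocabulary and computes a relative increase. Everything numeric here is hypothetical: the slides do not say how informativeness is measured, so the scores are placeholders, and the accuracy values 0.20 and 0.33 are chosen only because they happen to give a 65% relative increase.

```python
def top_fraction(scores, fraction):
    """Return the names of the highest-scoring `fraction` of concepts."""
    keep = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


def relative_increase(baseline, improved):
    """Relative change of `improved` over `baseline`, e.g. 0.65 for +65%."""
    return (improved - baseline) / baseline


# Hypothetical informativeness scores for a nine-concept vocabulary.
informativeness = {
    "fishing rod": 0.90, "boat": 0.70, "water": 0.60,
    "person": 0.40, "tree": 0.30, "sky": 0.25,
    "car": 0.10, "office": 0.05, "keyboard": 0.02,
}

selected = top_fraction(informativeness, 0.23)
print(selected)

# Hypothetical accuracies: all concepts versus the selected concepts only.
gain = relative_increase(0.20, 0.33)
print(f"{gain:.0%}")
```

A relative increase is measured against the baseline, so a 65% relative increase means the new score is 1.65 times the old one, not 65 percentage points higher.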

What concepts are informative

Font size correlates with informativeness
Wedding Ceremony
Landing a Fish

Visual translation

Represent images and text in unified semantic space

C1
C2
Cn

The 18th-largest country in the world in terms of area at 1,648,195 Iran has a population of around 75 million. It is a country of particular geo..

Concept Detectors (Textual)
Concept Detectors (Visual)

Semantic Space
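
One way to picture the unified semantic space: textual and visual concept detectors each produce a vector of scores over the same concepts C1 to Cn, so a video and a text can be compared directly. The sketch below assumes cosine similarity as the comparison, which the slides do not specify; the concept names, the scores, and the helper names `cosine` and `rank_texts` are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two concept-score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_texts(video_vector, text_vectors):
    """Order text names from most to least similar to the video."""
    return sorted(
        text_vectors,
        key=lambda name: cosine(video_vector, text_vectors[name]),
        reverse=True,
    )


# Hypothetical shared concepts C1..C3 scored by visual and textual detectors.
CONCEPTS = ["mountain", "mosque", "beach"]
video_vector = [0.8, 0.6, 0.1]  # output of the visual concept detectors
text_vectors = {                # output of the textual concept detectors
    "Iran article": [0.7, 0.7, 0.0],
    "Surf report": [0.0, 0.1, 0.9],
}

ranking = rank_texts(video_vector, text_vectors)
print(ranking)
```

Querying by a video then amounts to ranking candidate texts by their similarity to the video's concept vector.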

Example: query by a video

Video translation

Summary of most likely translations

Habibian et al., submitted

Conclusion

AI progress and human descriptions on the web act as a ‘Rosetta Stone’ for image understanding.

Automatic metadata generation jumps from words to sentences.

www.ceessnoek.info
