4. Current
situaRon
• A
huge
wealth
of
audiovisual
data
• Lots
of
data
in
archive
virtually
irretrievable
• Search/indexing
based
on
metadata
manually
entered
by
the
archivists
www.axes-‐project.eu
5. Some
causes
of
irretrievable
data
1. Old
material
from
pre-‐digital
era
2. Metadata
provided
at
document-‐level
3. Ideal
annotaRons
do
not
exist
– Different
uses
– Different
users
www.axes-‐project.eu
6. Developing
tools
providing
A ( u s e r / c e n t e e d ( a t p nteract
new
engaging
rways
p o
rio a c h
Our$aim$is$to$open$up$audiovisual$digital$libraries,$increasing$their$cul=
with
audiovisual
libraries…
Not
just
search:
an$early$stage.$Targeted$users$are$media$professionals,$educators,$stu=
dents,$amateur$researchers$and$home$users.
Browse
Explore
Experience
www.axes-‐project.eu
7. ...
for
a
mulRtude
of
end
users...
Media
professionals
(Y1-‐Y2)
Researchers
(Y2-‐Y3)
Home
users
(Y3-‐Y4)
!
!
www.axes-‐project.eu
8. d to define
on. Simiconstraints
fier. These
diction and
approaches
n addition,
d to define
eline to ex-
…
building
on
state-‐of-‐the-‐art
content
analysis
techniques…
Computer
vision
Speech
recogniRon
/
audio
analysis
Figure 1. An overview of our processing pipeline. (a) A face detector is applied to each video frame. (b) Face tracks are created
by associating face detections. (c) Facial points are localized. (d)
Locally SIFT appearance descriptors are extracted on the facial
features, and concatenated to form the final face descriptor.
Search
&
navigaRon
verviewFig- processing pipeline. (a) A face de1, see of our
d to present
to each video frame. (b) Face tracks1. We then extract SIFT descriptors at
nose, see Figure are created
ace the exdetections. (c) Facial points are localized. (d)
om
these nine locations at three different scales, which we conppearance
the facial
ack identi-descriptors are extracted on feature vector f ∈ IRD of dimension
catenate to form a
ncatenated to form the final face descriptor.
D = 3 × 9 × 128 = 3456. As the descriptors are computed
at facial feature points, it is robust to pose and expression
Weakly-‐supervised
ure 1. We thenchanges. SIFT descriptorsdescriptorethods
robust to
extract Using the SIFT at m makes it also
small errors in localization.
first use a
ations at three different scales, which we conen link the vector f ∈ IRD of dimension
rm a feature
3.2. Metric learning from face tracks
proach for
128 = 3456. As the descriptors are computed
ncontrolledit is robust to pose andface tracks we can extract face pairs from
re points,
Given a set of expression
ng the SIFT descriptorto learn it also robust to face descriptors in an unsuthem makes a metric over the
www.axes-‐project.eu
9. …
building
on
state-‐of-‐the-‐art
content
analysis
techniques…
People
Places
Events
Object
categories
www.axes-‐project.eu
10. …
with
a
user-‐centric
approach…
www.axes-‐project.eu
11. …
working
along
3
axes:
• Users
• Technology
• Content
AXES
OVERVIEW
Hilversum-‐
31/1/2013
www.axes-‐project.eu
13. Processing
pipeline
MulRmodal
query
Query
Language
LIMAS
Filter
Sigmoid
normalize
Category
Search
Instance
Search
Person
Clustering
Face
Search
Similarity
Search
MulRmedia
Event
DetecRon
Face
Similarity
Search
Pose
Retrieval
ASR/
Meta
…
www.axes-‐project.eu
14. Processing
pipeline
Data
Gathering
Metadata
Normalisation
Video
Normalisation
Shot
Extraction
Subtitle
Normalisation
Face
Detection
and
Tracking
Speech
Processing
Merge
Named
Entity
Extraction
Data
Exposer
VISOR
Indexer
Link
Management
Place
PresentaRon
Rtle
Indexer
LocaRon
-‐
Date
Face
Indexer
www.axes-‐project.eu
15. On-‐the-‐fly
Visual
Category
Search
Pre-‐computed
features
(NISV/
TRECVID)
~100
posiRve
(‘present’)
training
images
Visual
Model
~1000
negaRve
(‘not
present’)
training
images
Training
of
visual
model
from
mul$ple
images
allows
more
general
visual
searches
to
be
made
e.g.
all
motorbikes
and
not
just
Harleys
www.axes-‐project.eu
18. On-‐the-‐fly
Face
Search
ON-LINE
PROCESSING
Text Query
“Courteney Cox”
Negative Training
Images
Facial Features &
Descriptors
Google Image Search
“Courteney Cox”
OFF-LINE
PROCESSING
Facial Features &
Descriptors
Video Search Corpus
Fast Linear
Classifier
Ranking
Face Tracks
Facial Features &
Descriptors
Results
www.axes-‐project.eu
19. Face
Search
–
Feature
ComputaRon
Face
detecRon
and
facial
landmark
detecRon
Feature
region
descriptors
Faces
clustered
into
tracks
ConcatenaRon
Feature
Vector
www.axes-‐project.eu
20. On-‐the-‐fly
Face
Search
Facial landmark
detection
Aligned and
cropped face
Dense SIFT,
GMM, and FV
PresentaRon
Rtle
LocaRon
-‐
Date
Discriminative
dim. reduction
Compact face
representation
www.axes-‐project.eu
21. New
Face
Descriptor
Face
descriptor
for
face
recogniRon
• dense
sampling
• relevant
face
parts
learnt
automaRcally
• compact
and
discriminaRve
Current approach
(describe landmarks)
New
approach
(describe everything)
www.axes-‐project.eu
23. On-‐the-‐fly
Instance
Search
Text Query
“Earth”
ON-LINE
PROCESSING
OFF-LINE
PROCESSING
Precomputed
descriptors for the
entire corpus
Descriptors
Google Image Search
“Earth”
Inverted index for
fast search
Fast search
Reranking based on the
geometric configuration
of matched descriptors
Results
www.axes-‐project.eu
24. Instance
Search:
Feature
ComputaRon
Salient
image
regions
are
detected
and
a
descriptor
is
extracted
for
each
one
www.axes-‐project.eu
25. Instance
Search:
Geometric
verificaRon
SpaRal
arrangement
of
descriptors
is
used
to
verify
a
match
www.axes-‐project.eu
26. On-‐the-‐fly
Instance
Search
-‐
variant
Text Query
“Earth”
ON-LINE
PROCESSING
OFF-LINE
PROCESSING
Precomputed
descriptors for the
entire corpus
Descriptors
Unsupervised
pattern mining
Google Image Search
“Earth”
Inverted index for
fast search
Fast search
Reranking based on the
geometric configuration
of matched descriptors
Results
www.axes-‐project.eu
28. Instance
Search
–
Examples
‘Golden
Gate
Bridge’
–
TRECVid
2012
INS
‘Red
Bull’
–
TRECVid
2012
INS
www.axes-‐project.eu
29. Instance
Search
–
Examples
‘
US
Army
Corps
of
Engineers
logo’
–
TRECVid
2012
INS
‘Eiffel
tower’
–
TRECVid
2012
INS
www.axes-‐project.eu
30. Instance
Search
–
Examples
‘
US
Capitol’
–
TRECVid
2012
INS
‘Obama
logo’
–
TRECVid
2012
INS
www.axes-‐project.eu
31. Pretrained
classifiers
• On-‐the-‐fly
is
very
powerful,
but
takes
Rme
• Compromise
between
speed
and
accuracy
• For
common
queries:
bejer
use
models
that
have
been
trained
offline
• Cache
results
• Imagenet
categories,
celebriRes,
…
www.axes-‐project.eu
32. MulRmedia
Event
detecRon
Short
acRon,
i.e.
answer
phone,
hand
shake
answer
phone
hand
shake
www.axes-‐project.eu
33. MulRmedia
Event
detecRon
• AcRviRes/events,
i.e.
birthday
party,
parade
Birthday
party
Parade
TrecVid
MulR-‐media
event
detecRon
dataset
www.axes-‐project.eu
39. User
Requirements
PRO
vs.
RESEARCH
•
•
•
•
•
•
•
•
Browsing
and
Linking
Tagging
and
AnnotaRon
Search
trails
Metadata
export
CustomizaRon
and
personalizaRon
CollaboraRon
Usage
staRsRcs
Thesaurus
PresentaRon
Rtle
LocaRon
-‐
Date
www.axes-‐project.eu
45. TRECVID
INS
• Results
– About
average
in
terms
of
MAP
– Good
for
P@10,
P@30,
P@100
• Issues
– Duplicated
shots
from
SBD
WP7
Bonn,
Germany-‐
15
Oct
2013
www.axes-‐project.eu
53. TrecVid
MED
2012
-‐
results
• CombinaRon
of
MBH,SIFT,
audio
and
text
recogniRon
• AggregaRon
with
Fisher
vector
• First
in
adhoc
event
task,
second
in
known
event
task
Making sandwich – results
Rank 1
Rank 19
Rank 20
www.axes-‐project.eu
56. Lessons
learnt
• On-‐the-‐fly
model
learning
gives
great
flexibility.
High
potenRal
especially
when
combined
with
offline
learning
/
lifelong
learning.
• The
more
data,
the
bejer.
• It’s
important
to
educate
your
users.
• A
good
user
interface
is
important.
• MulRmodal
and
spaRo-‐temporal
processing:
easier
said
than
done.
www.axes-‐project.eu
57. AXES
HOME
What
do
home
users
want
?
– Google-‐like
search
?
– Have
fun
?
PresentaRon
Rtle
LocaRon
-‐
Date
www.axes-‐project.eu
58. Immediate
Pose
Retrieval
N.
Jammalamadaka,
A.
Zisserman,
M.
Eichner,
V.
Ferrari,
C.
V.
Jawahar
Video
Retrieval
by
Mimicking
Poses
ACM
InternaRonal
Conference
on
MulRmedia
Retrieval,
2012
www.axes-‐project.eu
60. AXES
HOME
What
do
home
users
want
?
– Google-‐like
search
?
– Have
fun
?
– Browse
?
PresentaRon
Rtle
LocaRon
-‐
Date
www.axes-‐project.eu
61. Linking
• What
are
interesRng
links
?
• SemanRcs,
semanRcs,
semanRcs
• Not
too
similar,
not
too
different
www.axes-‐project.eu
62. AXES
HOME
What
do
home
users
want
?
– Google-‐like
search
?
– Have
fun
?
– Browse
?
– Suggest
content
to
user
based
on
his/her
behavior
?
www.axes-‐project.eu