The 2012 ICSI/Berkeley Video Location Estimation System

The 2012 ICSI / Berkeley
Location Estimation System
Jaeyoung Choi, Venkatesan Ekambaram,
Gerald Friedland and Kannan Ramchandran

ICSI / UC Berkeley, USA
October 4th, 2012


Agenda

• Baseline Approach
• Drawbacks
• Graphical Model Framework
• Result


Baseline Approach

• Investigate the ‘spatial variance’ of a feature:
  • small spatial variance: the feature is likely location-indicative
  • large spatial variance: the feature is likely not indicative
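
A minimal sketch of how the spatial variance of a tag could be computed. The slides do not define the measure, so this assumes the mean squared great-circle distance (in km²) of the tag's training locations from their centroid. The names `haversine_km` and `spatial_variance` and the sample coordinates are illustrative, not the authors' code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_variance(coords):
    """Assumed definition: mean squared distance (km^2) of points from their centroid.

    `coords` is a list of (lat, lon) pairs for the training items carrying a tag.
    Returns None when the tag has no matches in the training set.
    """
    if not coords:
        return None
    n = len(coords)
    # Naive centroid: averaging degrees ignores wrap-around at the antimeridian.
    lat0 = sum(lat for lat, _ in coords) / n
    lon0 = sum(lon for _, lon in coords) / n
    return sum(haversine_km(lat, lon, lat0, lon0) ** 2 for lat, lon in coords) / n


# Hypothetical coordinates: a tight cluster versus a widely spread tag.
campus = [(37.871, -122.259), (37.872, -122.258), (37.870, -122.260)]
spread = [(37.871, -122.259), (38.900, -77.040), (51.507, -0.128)]
```

Under this definition the tight cluster yields a far smaller value than the spread one, which is the property the baseline relies on.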


Example
Tag               Matches in training set   Spatial variance
pavement          2                         5.739
ucberkeley        4                         0.132
berkeley          14                        68.138
greek             0                         N/A
greektheatre      0                         N/A
spitonastranger   0                         N/A
live              91                        6453.109
video             2967                      6735.844


Problem:
Sparsity coming from biased dataset


The effect of sparsity

[Figure: bar chart. x-axis: distance error (e) between ground truth and estimation [km]; y-axis: percentage [%], 0 to 60; legend: 100, 400, 1600, 6400, >6400]

* A test video from a dense area has a higher chance of being estimated with a lower distance error.


Geo-tagging: an estimation-theoretic viewpoint

Observations:
  Images: [four example images]
  Tags: {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, {campanile, haas}
        $\{t_1^k\}$, $\{t_2^k\}$, $\{t_3^k\}$, $\{t_4^k\}$
Estimate:
  Geo locations: $x_1, x_2, x_3, x_4$

Interpreting traditional approaches

Locations are random variables: $\{x_1, x_2, \ldots, x_N\}$

Probability of location given tags. Traditional approaches estimate:

$$p(x_i \mid \{t_i^k\}) \propto \prod_k p(x_i \mid t_i^k)$$

where $p(x_i \mid t_i^k)$ is obtained from the training set.


Example: the distribution for the tag “washington” is depicted here [figure].

Location estimate:

$$\hat{x}_i = \int x_i \, p(x_i \mid \{t_i^k\}) \, dx_i$$
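
The product-of-tags estimate and the posterior mean can be sketched on a discrete set of candidate locations. This is an illustration under assumptions: the grid cells, the per-tag probabilities, and the function names are made up, and averaging latitude/longitude directly is a simplification.

```python
def combine_tag_distributions(tag_dists, tags):
    """Multiply per-tag distributions p(x | t) over shared cells and renormalize.

    `tag_dists` maps a tag to a list of probabilities, one per candidate cell.
    Tags absent from `tag_dists` (unseen in training) are skipped.
    Returns None when no tag is known or the product is zero everywhere.
    """
    known = [tag_dists[t] for t in tags if t in tag_dists]
    if not known:
        return None
    posterior = [1.0] * len(known[0])
    for dist in known:
        posterior = [p * q for p, q in zip(posterior, dist)]
    total = sum(posterior)
    if total == 0.0:
        return None
    return [p / total for p in posterior]


def posterior_mean(cells, posterior):
    """Discrete counterpart of the integral: probability-weighted mean of the cells."""
    lat = sum(w * c[0] for c, w in zip(cells, posterior))
    lon = sum(w * c[1] for c, w in zip(cells, posterior))
    return lat, lon


# Hypothetical cells (Berkeley, Washington DC, Seattle) and per-tag distributions.
cells = [(37.87, -122.26), (38.90, -77.04), (47.61, -122.33)]
tag_dists = {
    "washington": [0.0, 0.6, 0.4],
    "capitol": [0.1, 0.9, 0.0],
}
post = combine_tag_distributions(tag_dists, ["washington", "capitol", "unseen_tag"])
estimate = posterior_mean(cells, post)
```

Here the ambiguous tag "washington" is disambiguated by the second tag, while the unseen tag contributes nothing.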


Drawbacks

Data sparsity:
Not all tags in the test set are available in the training set. Hence the estimate of $p(x_i \mid t_i^k)$ can be poor.

Sub-optimality:
The approaches are suboptimal given the data.

What we ideally want:

$$p(x_1, x_2, \ldots, x_N \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$$

The mean of the above distribution gives the best estimate of the locations, i.e. for each image we want

$$p(x_i \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$$

Traditional algorithms only give:

$$p(x_i \mid \{t_i^k\})$$


Bayesian graphical framework

[Figure: graph with four image nodes tagged {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, and {campanile, haas}]

Node: geolocation of the image, with node potentials $p(x_i \mid \{t_i^k\})$, $p(x_j \mid \{t_j^k\})$
Edge: correlated locations (e.g. common tag)
Edge potential: strength of an edge (e.g. posterior distribution of locations given common tags), $p(x_i, x_j \mid \{t_i^k\} \cap \{t_j^k\})$
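
A small sketch of how such a graph could be built from tag sets, assuming an edge is added whenever two images share at least one tag. The function name `build_tag_graph` is illustrative; the tag sets are the four from the slide.

```python
from itertools import combinations


def build_tag_graph(tag_sets):
    """Return {(i, j): common_tags} for every pair of images sharing a tag."""
    edges = {}
    for i, j in combinations(range(len(tag_sets)), 2):
        common = tag_sets[i] & tag_sets[j]
        if common:
            edges[(i, j)] = common
    return edges


tag_sets = [
    {"berkeley", "sathergate", "campanile"},
    {"berkeley", "haas"},
    {"campanile"},
    {"campanile", "haas"},
]
edges = build_tag_graph(tag_sets)
```

With these four tag sets, every pair is connected except images 1 and 2, which share no tag.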

Cooperative geo-tagging

Intuition: Images in the training set having common tags have correlated geo-locations, captured by the joint distribution.

Joint probability modeling:

$$p(x_1, x_2, \ldots, x_N \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\}) \propto \prod_i p(x_i \mid \{t_i^k\}) \prod_{(i,j)} p(x_i, x_j \mid \{t_i^k\} \cap \{t_j^k\})$$

The second product is the pairwise distribution given at least one common tag.

$p(x_i \mid \{t_i^k\})$ is obtained from the training set as before.

$p(x_i, x_j \mid \{t_i^k\} \cap \{t_j^k\})$ is modeled as an indicator function $I(x_i = x_j)$ if the common tag has low spatial variance or occurs infrequently; e.g. if the common tag is “haas”, it's very likely the locations are the same.
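
The hard-threshold rule for the indicator edge potential could look like the sketch below. The thresholds `max_var` and `max_count` are hypothetical, as is the choice to measure frequency by training-set matches. The variances and counts are taken from the example table.

```python
def should_link(tags_i, tags_j, spatial_var, tag_count, max_var, max_count):
    """Return True if some common tag has low spatial variance or occurs infrequently.

    `spatial_var` maps tag -> spatial variance (missing when the tag is unseen).
    `tag_count` maps tag -> number of matches in the training set.
    """
    for tag in tags_i & tags_j:
        var = spatial_var.get(tag)
        low_variance = var is not None and var <= max_var
        infrequent = tag_count.get(tag, 0) <= max_count
        if low_variance or infrequent:
            return True
    return False


spatial_var = {"ucberkeley": 0.132, "berkeley": 68.138, "live": 6453.109, "video": 6735.844}
tag_count = {"ucberkeley": 4, "berkeley": 14, "live": 91, "video": 2967}

# Hypothetical thresholds, chosen only for illustration.
linked = should_link({"ucberkeley", "video"}, {"ucberkeley"}, spatial_var, tag_count, 10.0, 5)
not_linked = should_link({"video", "live"}, {"video"}, spatial_var, tag_count, 10.0, 5)
```

A generic tag such as "video" never creates an edge here, while a specific one such as "ucberkeley" does.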

Question: How to estimate the optimal marginal distribution $p(x_i \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$?

Belief propagation updates

Iterative algorithm to approximate the posterior distribution $p(x_i \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$.

Gaussian modeling: $p(x_i \mid \{t_i^k\}) \sim \mathcal{N}(\mu_i, \sigma_i^2)$

At iteration 0, each node calculates $(\mu_i, \sigma_i^2)$.

At iteration $t$, each node updates its location as a weighted mean of its previous location and that of its neighbors:

$$\mu_i^{(t)} = (\sigma_i^{(t)})^2 \left[ \frac{\mu_i^{(t-1)}}{(\sigma_i^{(t-1)})^2} + \sum_{k \in N(i)} \frac{\mu_k^{(t-1)}}{(\sigma_k^{(t-1)})^2} \right]$$

$$\frac{1}{(\sigma_i^{(t)})^2} = \frac{1}{(\sigma_i^{(t-1)})^2} + \sum_{k \in N(i)} \frac{1}{(\sigma_k^{(t-1)})^2}$$

The weights reflect the confidence in the measurements, i.e. the higher the spatial variance, the lower the weight.
Belief propagation

[Figure: graph whose nodes carry the posterior mean and variance $(\mu_1, \sigma_1^2)$, $(\mu_2, \sigma_2^2)$, $(\mu_3, \sigma_3^2)$, assuming Gaussian beliefs]

Audio-visual features are incorporated in modeling the edge and node potentials.


Incorporating Audio-Visual features
• GIST features are extracted for the images.
• MFCC features are extracted for the audio.
• These are now incorporated into the node and edge potentials as exponential distributions.

$$p(x_i, x_j \mid a_i, a_j) \propto \exp\left(-\frac{\|x_i - x_j\|}{\|a_i - a_j\|}\right)$$

$a_i$ are the audio features associated with image $i$.
The intuition is that the closer the audio features are, the higher the probability that the geo-locations are closer.
Similarly, this can be included in the node potentials as well as for the visual features.
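
The unnormalized edge potential can be evaluated directly, as in this sketch. It assumes Euclidean norms for both locations and feature vectors, and adds a small `eps` (not in the slides) to avoid division by zero when two feature vectors coincide.

```python
import math


def audio_edge_potential(x_i, x_j, a_i, a_j, eps=1e-9):
    """Unnormalized exp(-||x_i - x_j|| / ||a_i - a_j||) for two images."""
    geo_dist = math.dist(x_i, x_j)      # distance between candidate locations
    feature_dist = math.dist(a_i, a_j)  # distance between audio feature vectors
    return math.exp(-geo_dist / (feature_dist + eps))


# Same location pair, scored with similar and with dissimilar audio features.
similar_audio = audio_edge_potential((0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.1, 0.0))
different_audio = audio_edge_potential((0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (1.0, 0.0))
```

When the audio features are close, the potential decays faster with geographic distance, so the model favors nearby locations for acoustically similar recordings.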


Result
• Percentage of test videos (out of 4182 videos) correctly estimated under the distances in the top row from the ground-truth location.

– run1 - baseline approach without using gazetteer
– run2 - graphical model based approach with gazetteer
– run3 - baseline approach with gazetteer
– run4 - k-NN with GIST visual feature

• The graphical model approach with gazetteer outperforms the baseline approaches in the range above 1 km.
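
The evaluation measure described above, the percentage of test videos estimated within a given distance of the ground truth, can be sketched as follows. The thresholds and coordinates are placeholders, not the values or data used in the evaluation.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlam = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def percent_within(estimates, truths, thresholds_km):
    """Map each threshold to the percentage of videos whose error is at most that distance."""
    errors = [haversine_km(e, g) for e, g in zip(estimates, truths)]
    return {d: 100.0 * sum(err <= d for err in errors) / len(errors) for d in thresholds_km}


# Placeholder data: one exact estimate and one that is thousands of km off.
estimates = [(37.87, -122.26), (38.90, -77.04)]
truths = [(37.87, -122.26), (37.87, -122.26)]
scores = percent_within(estimates, truths, [1, 10000])
```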



Conclusion
• The graphical model framework can achieve a performance improvement over the baseline approach by incorporating results from test data
• Various issues remain to be explored
  – the modeling of the edge potential
    • text: hard threshold (current) --> soft
    • visual/audio features
  – the assumption of conditional independence of the location distribution given multiple tags


Thank You!
Questions?
http://mmle.icsi.berkeley.edu

Work together with:
Venkatesan Ekambaram, Kannan Ramchandran, Giulia Fanti,
Howard Lei, Adam Janin, and Gerald Friedland

