Dr. Oge Marques discusses using games to improve computer vision solutions. Specifically, Dr. Marques describes a two-player web-based guessing game called Ask'nSeek that helps solve the computer vision problems of object detection, labeling, and semantic scene segmentation. Ask'nSeek logs spatial relationships and labels from a small number of games per image to train machine learning models for these tasks.
1. Using games to improve computer vision solutions
Dr. Oge Marques
May 21, 2013
2. Take-home message
Solutions to many problems in computer vision can be improved using human computation, particularly through properly designed games (with a purpose).
Oge Marques
4. Related work
• ESP
Axel Carlier, Oge Marques, Vincent Charvillat
6 Conclusions
This paper proposed a novel approach to solving a selected subset of computer vision problems using games and described Ask’nSeek, a novel, simple, fun, web-based guessing game based on images, their most relevant regions, and the spatial relationships among them. Two noteworthy aspects of the proposed game are: (i) it does in one game what ESP [2] and Peekaboom [3] do in two games (namely, collecting labels and locating the objects associated with those labels); and (ii) it avoids explicitly asking the user to map labels and regions thanks to our novel semi-supervised learning algorithm. We also described how the information collected from very few game logs per image was used to feed a machine learning algorithm, which in turn produces the outline of the most relevant regions within the image and their labels.
Our game can also be extended and improved in several directions, among them: different game modes, timer(s), addition of a social component (e.g., play against your Facebook friends), extending the interface to allow touchscreen gestures for tablet-based play, and incorporation of incentives to the game, e.g., badges or coins, which should – among other things – encourage switching roles (master-seeker) periodically.
References
1. von Ahn, L., Dabbish, L.: Designing games with a purpose. Commun. ACM 51 (2008) 58–67
2. von Ahn, L., Dabbish, L.: ESP: Labeling images with a computer game. In: AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors. (2005) 91–98
3. von Ahn, L., Liu, R., Blum, M.: Peekaboom: A game for locating objects in images. In: CHI. (2006) 55–64
4. Ho, C.J., Chang, T.H., Lee, J.C., Hsu, J.Y., Chen, K.T.: KissKissBan: A competitive human computation game for image annotation. SIGKDD Explorations 12 (2010) 21–24
5. Related work
• Peekaboom
6. Our work
• Ask’nSeek (with Vincent Charvillat and Axel Carlier, ECCV 2012)
– A two-player guessing game that helps solve the problems of object detection and labeling, and semantic scene segmentation.
7. Ask'nSeek
• Contributions
– It solves two computer vision problems – object detection and labeling – in a single game.
– It learns spatial relationships within the image from game logs.
8. Ask'nSeek
• The basics
– Ask’nSeek is a two-player, web-based game that can be played in a contemporary browser without any need for plug-ins.
– One player, the master, hides a rectangular region somewhere within a randomly chosen image.
– The second player (the seeker) tries to guess the location of the hidden region through a series of successive guesses, expressed by clicking at some point in the image.
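The basics above boil down to a simple hit test: the seeker wins as soon as a click lands inside the master's hidden rectangle. A minimal sketch (class and field names are illustrative, not from the game's actual code):

```python
# Hypothetical model of one Ask'nSeek round: the master hides a
# rectangular region; the seeker wins by clicking any pixel inside it.

from dataclasses import dataclass

@dataclass
class Box:
    x: int      # left edge
    y: int      # top edge
    w: int      # width
    h: int      # height

    def contains(self, px: int, py: int) -> bool:
        """True if pixel (px, py) falls inside the box."""
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

hidden = Box(x=120, y=80, w=60, h=40)   # master's hidden region
print(hidden.contains(150, 100))        # seeker's click inside -> True
print(hidden.contains(10, 10))          # click outside -> False
```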
11. Ask'nSeek - terminology
• According to the classification in [von Ahn, CACM 2008], Ask’nSeek is an Inversion-Problem game, because “given an input, Player 1 (in our case, called master) produces an output, and Player 2 (the seeker) guesses the input”.
– More specifically, the input in question is the location of the hidden region within an image, and the outputs produced by Player 1 are what we call indications.
12. Ask'nSeek - terminology
• Initial Setup: Two players are randomly chosen by the game itself.
• Rules: The master produces an input (by hiding a rectangular region within an image). Based on this input, the master produces outputs (spatial clues, i.e., indications) that are sent to the seeker. The outputs from the master should help the seeker produce the original input, i.e., locate the hidden box.
• Winning Condition: The seeker produces the input that was originally produced by the master, i.e., guesses the correct location by clicking on any pixel within the hidden bounding box.
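The master's outputs are spatial clues relating the hidden region to regions in the image. One plausible way to derive such an indication, sketched here with illustrative names and simplified rules (the game's real logic may differ):

```python
# Hedged sketch: classify the hidden box relative to a labeled object
# region, yielding an indication such as "on" or "left of".
# Boxes are (x, y, w, h) tuples; rules are illustrative.

def relation(hidden, obj):
    """Return 'on' if the hidden box is fully inside the object box,
    'partially on' if they overlap, otherwise a directional relation."""
    hx, hy, hw, hh = hidden
    ox, oy, ow, oh = obj
    inside = ox <= hx and oy <= hy and hx + hw <= ox + ow and hy + hh <= oy + oh
    if inside:
        return "on"
    overlap = hx < ox + ow and ox < hx + hw and hy < oy + oh and oy < hy + hh
    if overlap:
        return "partially on"
    if hx + hw <= ox:
        return "left of"
    if hx >= ox + ow:
        return "right of"
    if hy + hh <= oy:
        return "above"
    return "below"

dog = (100, 100, 80, 60)                  # labeled object region
print(relation((110, 110, 20, 20), dog))  # -> on
print(relation((10, 120, 30, 20), dog))   # -> left of
```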
13. Ask'nSeek
• Game logs
– The game logs contain labels (often in multiple languages) as well as ‘on’, ‘partially on’, and ‘left-right-above-below’ relations.
• Examples of labels include foreground objects (e.g., dog, bus) as well as other semantically meaningful regions within the image (e.g., sky, road).
14. Ask'nSeek
• Game logs
– Very few (~7) games per image are needed!
…we considered only the game logs that use the spatial relations “above”, “on the left of”, and “on the right of”. We then used that information to derive a bounding box limiting the region that respects all the constraints defined by the indications, within which the object must reside. Figure 3(a) plots in black all the pixels outside this bounding box for the ‘dog’ object.
Fig. 3. Analysis of simulation logs with different number of simulated games: (a) 10,0…
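The excerpt above describes intersecting the spatial constraints gathered from game logs into a single bounding box in which the object must reside. A hedged sketch of that intersection step (the function name and constraint encoding are assumptions, not the paper's code):

```python
# Shrink a feasible bounding box for an object by intersecting
# half-plane constraints such as ("left of", x): object lies left
# of column x. Returns None if the constraints contradict each other.

def feasible_box(width, height, constraints):
    """Return (x0, y0, x1, y1) respecting all constraints, or None."""
    x0, y0, x1, y1 = 0, 0, width, height
    for rel, v in constraints:
        if rel == "left of":      # object entirely left of column v
            x1 = min(x1, v)
        elif rel == "right of":   # object entirely right of column v
            x0 = max(x0, v)
        elif rel == "above":      # object entirely above row v
            y1 = min(y1, v)
        elif rel == "below":      # object entirely below row v
            y0 = max(y0, v)
    if x0 >= x1 or y0 >= y1:
        return None               # contradictory constraints
    return (x0, y0, x1, y1)

print(feasible_box(640, 480, [("right of", 100), ("left of", 400), ("below", 200)]))
# -> (100, 200, 400, 480)
```

Each additional game log adds constraints, so the feasible box only shrinks, which is consistent with the slide's claim that very few games per image are needed.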
20. Ask'nSeek
• Comparison against baseline detector (*)
Ask’nSeek: a new game for object detection and labeling
Fig. 5. Representative results: (a) baseline dog detector from [9]; (b) result from our approach for label ‘dog’; (c) baseline cat detector from [9]; (d) result from our approach for label ‘cat’.
…the candidate updates $\mu_l^{\alpha} = \alpha\,\hat{\mu}_l^{(k-1)} + (1-\alpha)\,\tilde{\mu}_l^{(k)}$ and $\Sigma_l^{\beta} = \beta\,\hat{\Sigma}_l^{(k-1)} + (1-\beta)\,\tilde{\Sigma}_l^{(k)}$ actually improve the likelihood:

$\alpha^{*}, \beta^{*} = \operatorname{argmax}_{\alpha,\beta} P(X, R \mid \mu_l^{\alpha}, \Sigma_l^{\beta}, \gamma^{(k)})$  (11)

$\theta^{(k)}$ is updated with $K$ new cluster locations $\hat{\mu}_l^{(k)} \leftarrow \mu_l^{\alpha^{*}}$ and covariances $\hat{\Sigma}_l^{(k)} \leftarrow \Sigma_l^{\beta^{*}}$, produced by $K$ optimizations (equation 11 for $l = 1, \ldots, K$) that make the likelihood monotonically increasing.
Until $P(X, R \mid \theta^{(k)}, \gamma^{(k)})$ converges.
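The update above blends the previous parameter estimates with the fresh EM estimates and keeps the mixing weights that maximize the likelihood. A small numpy sketch of that idea for a single Gaussian cluster (the grid search and all names are illustrative, not the paper's implementation):

```python
# Illustrative sketch: candidate mean/covariance are convex blends of
# the previous estimate and the fresh EM estimate; (alpha, beta) are
# chosen by grid search to maximize the data likelihood.

import numpy as np

def gauss_loglik(X, mu, sigma):
    """Sum of log N(x | mu, sigma) over the rows of X (2-D Gaussian)."""
    d = X - mu
    inv = np.linalg.inv(sigma)
    _, logdet = np.linalg.slogdet(sigma)
    quad = np.einsum("ni,ij,nj->n", d, inv, d)  # Mahalanobis terms
    return float(np.sum(-0.5 * (quad + logdet + X.shape[1] * np.log(2 * np.pi))))

def blended_update(X, mu_prev, sig_prev, mu_new, sig_new,
                   grid=np.linspace(0, 1, 11)):
    """Pick alpha, beta maximizing the likelihood of the blended Gaussian."""
    best = (-np.inf, None, None)
    for a in grid:
        mu = a * mu_prev + (1 - a) * mu_new
        for b in grid:
            sig = b * sig_prev + (1 - b) * sig_new
            ll = gauss_loglik(X, mu, sig)
            if ll > best[0]:
                best = (ll, mu, sig)
    return best  # (log-likelihood, blended mean, blended covariance)

rng = np.random.default_rng(0)
X = rng.normal([2.0, -1.0], 0.5, size=(200, 2))
ll, mu, sig = blended_update(X, np.zeros(2), np.eye(2),
                             X.mean(axis=0), np.cov(X.T))
print(np.allclose(mu, X.mean(axis=0)))  # -> True (alpha=0 keeps the EM mean)
```

Because the grid includes the previous parameters (alpha = beta = 1), the selected blend can never decrease the likelihood, which mirrors the monotonicity claim in the excerpt.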
8. REFERENCES
[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and
S. Süsstrunk. SLIC superpixels. Technical report, EPFL, 2010.
[2] D. Batra, A. Kowdle, D. Parikh, J. Luo, and T. Chen.
Interactively co-segmentating topically related images with
intelligent scribble guidance. IJCV, 93(3):273–292, 2011.
[3] S. Branson, P. Perona, and S. Belongie. Strong supervision
from weak annotation: Interactive training of deformable
part models. In ICCV, Barcelona, 2011.
[4] S. Branson, C. Wah, B. Babenko, F. Schroff, P. Welinder,
P. Perona, and S. Belongie. Visual recognition with humans
in the loop. In ECCV, Heraklion, 2010.
[5] A. Carlier, G. Ravindra, V. Charvillat, and W. T. Ooi.
Combining content-based analysis and crowdsourcing to
improve user interaction with zoomable video. In ACM
MM’11, pages 43–52, 2011.
[6] O. Chapelle, B. Schölkopf, and A. Zien, editors.
Semi-Supervised Learning. MIT Press, Cambridge, 2006.
[7] D. Comaniciu and P. Meer. Mean shift: A robust approach
toward feature space analysis. IEEE Trans. PAMI,
24(5):603–619, 2002.
[8] S. Cooper, F. Khatib, A. Treuille, J. Barbero, J. Lee,
M. Beenen, A. Leaver-Fay, D. Baker, Z. Popovic, and
F. Players. Predicting protein structures with a multiplayer
online game. Nature, 466(7307):756–760, Aug. 2010.
[9] P. Felzenszwalb, R. Girshick, and D. McAllester.
Discriminatively trained deformable part models, release 4.
http://people.cs.uchicago.edu/~pff/latent-release4/.
[10] P. Felzenszwalb, R. Girshick, D. McAllester, and
D. Ramanan. Object detection with discriminatively trained
part-based models. IEEE Trans. PAMI, 32:1627–1645, 2010.
Tagcaptcha: annotating images with captchas. In ACM
MM’10, pages 1557–1558, 2010.
[18] Y. Ni, J. Dong, J. Feng, and S. Yan. Purposive
hidden-object-game: embedding human computation in
popular game. In ACM MM’11, pages 1121–1124, 2011.
[19] A. J. Quinn and B. B. Bederson. Human computation: a
survey and taxonomy of a growing field. In CHI’11, pages
1403–1412, 2011.
[20] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman.
Labelme: A database and web-based tool for image
annotation. IJCV, 77(1-3):157–173, 2008.
[21] N. Savage. Gaining wisdom from crowds. Commun. ACM,
55(3):13–15, Mar. 2012.
[22] S. Seung. Eyewire. http://eyewire.org/, 2012.
[23] J. Steggink and C. G. M. Snoek. Adding semantics to
image-region annotations with the name-it-game.
Multimedia Systems, 17(5):367–378, October 2011.
[24] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable
library of computer vision algorithms.
http://www.vlfeat.org/, 2008.
[25] S. Vijayanarasimhan and K. Grauman. Cost-sensitive active
visual category learning. IJCV, 91(1):24–44, 2011.
[26] L. von Ahn and L. Dabbish. ESP: Labeling images with a
computer game. In AAAI Spring Symposium: Knowledge
Collection from Volunteer Contributors, pages 91–98, 2005.
[27] L. von Ahn and L. Dabbish. Designing games with a
purpose. Commun. ACM, 51(8):58–67, 2008.
[28] L. von Ahn, R. Liu, and M. Blum. Peekaboom: a game for
locating objects in images. In CHI, pages 55–64, 2006.
[29] C. Wah, S. Branson, P. Perona, and S. Belongie. Multiclass
recognition and part localization with humans in the loop. In
ICCV, Barcelona, 2011.
[30] W. Wu and J. Yang. Smartlabel: an object labeling tool using
iterated harmonic energy minimization. In ACM MM’06,
pages 891–900, 2006.
[31] J. Yuen, B. C. Russell, C. Liu, and A. Torralba. Labelme
video: Building a video database with human annotations. In
ICCV, pages 1451–1458, 2009.
[32] M.-C. Yuen, L.-J. Chen, and I. King. A survey of human
computation systems. In Computational Science and
Engineering, 2009. CSE ’09. International Conference on,
volume 4, pages 723–728, Aug. 2009.
[33] X. Zhu and A. B. Goldberg. Introduction to Semi-Supervised
Learning. Synthesis Lectures on Artificial Intelligence and
Machine Learning. Morgan & Claypool Publishers, 2009.
21. Ask'nSeek
• Comparison against saliency map results
– (e): Harel's implementation of Itti et al.
– (f): results from our game
Figure 7: Representative results: (a) baseline dog detector from [9]; (b) result from our approach for label ‘dog’; (c) baseline cat detector from [9]; (d) result from our approach for label ‘cat’; (e) saliency map produced by [11]; (f) comparable result from our approach.
22. Ask'nSeek
• Final remarks
– It does in one game what ESP and Peekaboom do in two games (namely, collecting labels and locating the objects associated with those labels).
– It avoids explicitly asking the user to map labels and regions thanks to our novel semi-supervised learning algorithm.
– It requires very few game logs per image to feed a machine learning algorithm, which in turn produces the outline of the most relevant regions within the image and their labels.
23. Ask'nSeek
• Ongoing work
– Different segmentation algorithms and figures of merit
• Joint work with UPC (Barcelona)
– Extensive evaluation against the PASCAL VOC dataset
24. Ask'nSeek
• Many possibilities
– Different segmentation algorithms
– (Multi-language) text-based processing
– "Objects in context"
• Biggest challenge
– Making the game fun!
– Increasing the player base
26. Our work
• Guess That Face (with Mathias Lux and Justyn Snyder, CHI 2013)
– A face recognition game that reverse-engineers the human biological threshold for accurately recognizing blurred faces of celebrities under time-varying conditions.
27. • Motivation:
– Human vision: we are remarkably good at recognizing (severely blurred) famous faces.
Fig. 1. Unlike current machine-based systems, human observers are able to handle significant degradations in face images. For instance, subjects are able to recognize more than half of all familiar faces shown to them at the resolution depicted here. Individuals shown in order are: Michael Jordan, Woody Allen, Goldie Hawn, Bill Clinton, Tom Hanks, Saddam Hussein, Elvis Presley, Jay Leno.
Sinha et al.: Face Recognition by Humans: Nineteen Results Researchers Should Know About
28. • Motivation:
– Computer vision: face recognition still not mature
– Machine learning: what if we train algorithms with blurry faces instead?
– Game play: SongPop and Facebook
29. • The game:
– The player must analyze a series of randomly generated images of celebrities while the images transition from an initially severely blurred state to their original state over a constant interval of time.
– While the image is being progressively de-blurred on the screen, players are prompted to select the name of the celebrity they believe is correct, which they do once they have confidence in their answer.
30. • Dataset (original):
– 48 popular celebrities + 2 politicians
– Each image in the dataset also has two variations: de-saturated and horizontally flipped
31. • Game logs:
– the image identifier in the dataset that contained the correct answer,
– the variation of the image (normal, de-saturated, or horizontally flipped),
– the degree of blurring (ranging from 0 to 65),
– the answer the user selected (or the time-out),
– the orientation of both the correct answer and the selected answer,
– the time stamp,
– the round number.
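Collected together, one log entry per round might look like the following record; the field names here are hypothetical, and only the fields themselves come from the slide:

```python
# Illustrative model of one Guess That Face round log.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class RoundLog:
    image_id: str                        # dataset image holding the correct answer
    variation: str                       # "normal", "de-saturated", or "horizontally flipped"
    blur_radius: int                     # degree of blurring at answer time (0..65)
    selected: Optional[str]              # player's answer, or None on a time-out
    correct_orientation: str             # orientation of the correct answer
    selected_orientation: Optional[str]  # orientation of the selected answer
    timestamp: datetime                  # when the answer was given
    round_number: int                    # position of the round within the game

log = RoundLog("celeb_017", "normal", 42, "Tom Hanks",
               "normal", "normal", datetime(2013, 5, 21, 10, 0), 3)
print(log.selected)  # -> Tom Hanks
```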
32. • Terminology:
– Guess That Face is an inversion-problem game, where one player provides output while the second player guesses the input provided by the first player.
– Specifically, for Guess That Face, the "player" who generates the random output is a computer, while the player who guesses the correct input is a human.
– The output in question is both the image of the celebrity and the set of options, and the input is the correct answer selected by the player.
33. • User studies:
– 28 undergraduate students at FAU participated.
– The mean age was 21.
– 82% were male and 18% female.
– Many strongly agreed with "I found the game to be very easy to understand," with a mean of 4.93 on a 5-point Likert scale [1..5].
34. • User studies:
– 33.3% of the students confessed to random guessing. Still, 80% of the students played at least one game without using an exploit or randomly guessing.
– 80% of the participants saw potential for broader deployment, e.g., as a social media application.
– 80% of the players would recommend the game to their friends.
35. • Deblurring method
– StackBlur Gaussian blurring algorithm
– Blurring radius ranging from 0 to 65.
– Total de-blurring time = 8.125 seconds.
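These numbers imply a simple schedule: the blur radius falls from 65 to 0 over 8.125 seconds. A sketch, assuming a linear ramp (the game's actual easing is not stated on the slide):

```python
# De-blurring schedule implied by the slide: radius 65 -> 0 over 8.125 s,
# assumed linear here.

TOTAL_TIME = 8.125   # seconds for a full de-blur
MAX_RADIUS = 65      # starting blur radius

def blur_radius(t: float) -> int:
    """Blur radius at t seconds into the round (clamped to [0, MAX_RADIUS])."""
    t = min(max(t, 0.0), TOTAL_TIME)
    return round(MAX_RADIUS * (1 - t / TOTAL_TIME))

print(blur_radius(0.0))          # 65: fully blurred
print(blur_radius(TOTAL_TIME))   # 0: original image
```

A linear ramp drops the radius one unit every 0.125 s (8.125 s / 65 steps), which is consistent with the slide's figures.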
37. • Conclusions:
– Promising preliminary results
– Work in progress
– Potential to extend to other areas and age groups
38. Guess That Face
• Let's play!
• http://tinyurl.com/guessthatface
39. Ongoing and future work
• Build a framework for the development of games for vision problems
• Extend the work on Ask'nSeek and GTF to other vision problems, e.g., scene classification, object recognition
• Develop mobile games for children to help human vision scientists study developmental aspects of vision
40. Concluding remarks
• "Recipe for success"
– Games that are well-designed (look like games, have a certain addictive component, make you want to share with friends, have a long shelf life)
– Games that are fun to play
– Game logs that convey useful information that would not be easily obtained otherwise
– Meaningful (open) computer vision problems
– Sound machine learning strategies to leverage the knowledge acquired through game logs.
41. Discussion
• Which (types of) games?
• Which (group of) vision problems?
• How to avoid the trap of "games that aren't"?
• How to make a game "go viral"?
• What else can go wrong?
42. Let's get to work!
• Which computer vision problem would you like to solve using games?
• Contact me with ideas: omarques@fau.edu