Dr. Oge Marques discusses using games to improve computer vision solutions. Specifically, Dr. Marques describes a two-player web-based guessing game called Ask'nSeek that helps solve the computer vision problems of object detection, labeling, and semantic scene segmentation. Ask'nSeek logs spatial relationships and labels from a small number of games per image to train machine learning models for these tasks.
1. Using games to improve computer vision solutions
Dr. Oge Marques
May 21, 2013
2. Take-home message
Solutions to many problems in computer vision can be improved using human computation, particularly through properly designed games (with a purpose).
Oge Marques
4. Related work
• ESP
Axel Carlier, Oge Marques, Vincent Charvillat
6 Conclusions
This paper proposed a novel approach to solving a selected subset of computer vision problems using games and described Ask’nSeek, a novel, simple, fun, web-based guessing game based on images, their most relevant regions, and the spatial relationships among them. Two noteworthy aspects of the proposed game are: (i) it does in one game what ESP [2] and Peekaboom [3] do in two games (namely, collecting labels and locating the objects associated with those labels); and (ii) it avoids explicitly asking the user to map labels and regions thanks to our novel semi-supervised learning algorithm. We also described how the information collected from very few game logs per image was used to feed a machine learning algorithm, which in turn produces the outline of the most relevant regions within the image and their labels.
Our game can also be extended and improved in several directions, among them: different game modes, timer(s), addition of a social component (e.g., play against your Facebook friends), extending the interface to allow touchscreen gestures for tablet-based play, and incorporation of incentives to the game, e.g., badges or coins, which should – among other things – encourage switching roles (master-seeker) periodically.
References
1. von Ahn, L., Dabbish, L.: Designing games with a purpose. Commun. ACM 51 (2008) 58–67
2. von Ahn, L., Dabbish, L.: ESP: Labeling images with a computer game. In: AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors. (2005) 91–98
3. von Ahn, L., Liu, R., Blum, M.: Peekaboom: A game for locating objects in images. In: CHI. (2006) 55–64
4. Ho, C.J., Chang, T.H., Lee, J.C., Hsu, J.Y., Chen, K.T.: KissKissBan: A competitive human computation game for image annotation. SIGKDD Explorations 12 (2010) 21–24
5. Related work
• Peekaboom
6. Our work
• Ask’nSeek (with Vincent Charvillat and Axel Carlier, ECCV 2012)
– A two-player guessing game that helps solve the problems of object detection and labeling, and semantic scene segmentation.
7. Ask'nSeek
• Contributions
– It solves two computer vision problems – object detection and labeling – in a single game.
– It learns spatial relationships within the image from game logs.
8. Ask'nSeek
• The basics
– Ask’nSeek is a two-player, web-based game that can be played in a contemporary browser without any need for plug-ins.
– One player, the master, hides a rectangular region somewhere within a randomly chosen image.
– The second player (the seeker) tries to guess the location of the hidden region through a series of successive guesses, expressed by clicking at some point in the image.
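The basics above boil down to a simple hit test: the seeker wins as soon as a click lands inside the master's hidden rectangle. A minimal sketch (class and field names are illustrative, not from the game's actual code):

```python
# Hypothetical model of one Ask'nSeek round: the master hides a
# rectangular region; the seeker wins by clicking any pixel inside it.

from dataclasses import dataclass

@dataclass
class Box:
    x: int      # left edge
    y: int      # top edge
    w: int      # width
    h: int      # height

    def contains(self, px: int, py: int) -> bool:
        """True if pixel (px, py) falls inside the box."""
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

hidden = Box(x=120, y=80, w=60, h=40)   # master's hidden region
print(hidden.contains(150, 100))        # seeker's click inside -> True
print(hidden.contains(10, 10))          # click outside -> False
```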
11. Ask'nSeek - terminology
• According to the classification in [von Ahn, CACM 2008], Ask’nSeek is an Inversion-Problem game, because “given an input, Player 1 (in our case, called master) produces an output, and Player 2 (the seeker) guesses the input”.
– More specifically, the input in question is the location of the hidden region within an image, and the outputs produced by Player 1 are what we call indications.
12. Ask'nSeek - terminology
• Initial Setup: Two players are randomly chosen by the game itself.
• Rules: The master produces an input (by hiding a rectangular region within an image). Based on this input, the master produces outputs (spatial clues, i.e., indications) that are sent to the seeker. The outputs from the master should help the seeker produce the original input, i.e., locate the hidden box.
• Winning Condition: The seeker produces the input that was originally produced by the master, i.e., guesses the correct location by clicking on any pixel within the hidden bounding box.
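The master's outputs are spatial clues relating the hidden region to regions in the image. One plausible way to derive such an indication, sketched here with illustrative names and simplified rules (the game's real logic may differ):

```python
# Hedged sketch: classify the hidden box relative to a labeled object
# region, yielding an indication such as "on" or "left of".
# Boxes are (x, y, w, h) tuples; rules are illustrative.

def relation(hidden, obj):
    """Return 'on' if the hidden box is fully inside the object box,
    'partially on' if they overlap, otherwise a directional relation."""
    hx, hy, hw, hh = hidden
    ox, oy, ow, oh = obj
    inside = ox <= hx and oy <= hy and hx + hw <= ox + ow and hy + hh <= oy + oh
    if inside:
        return "on"
    overlap = hx < ox + ow and ox < hx + hw and hy < oy + oh and oy < hy + hh
    if overlap:
        return "partially on"
    if hx + hw <= ox:
        return "left of"
    if hx >= ox + ow:
        return "right of"
    if hy + hh <= oy:
        return "above"
    return "below"

dog = (100, 100, 80, 60)                  # labeled object region
print(relation((110, 110, 20, 20), dog))  # -> on
print(relation((10, 120, 30, 20), dog))   # -> left of
```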
13. Ask'nSeek
• Game logs
– The game logs contain labels (often in multiple languages) as well as ‘on’, ‘partially on’, and ‘left-right-above-below’ relations.
• Examples of labels include foreground objects (e.g., dog, bus) as well as other semantically meaningful regions within the image (e.g., sky, road).
14. Ask'nSeek
• Game logs
– Very few (~7) games per image are needed!
…we considered only the game logs that use the spatial relations “above”, “on the left of”, and “on the right of”. We then used that information to derive a bounding box limiting the region that respects all the constraints defined by the indications, within which the object must reside. Figure 3(a) plots in black all the pixels outside this bounding box for the ‘dog’ object.
Fig. 3. Analysis of simulation logs with different number of simulated games: (a) 10,0…
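The excerpt above describes intersecting the spatial constraints gathered from game logs into a single bounding box in which the object must reside. A hedged sketch of that intersection step (the function name and constraint encoding are assumptions, not the paper's code):

```python
# Shrink a feasible bounding box for an object by intersecting
# half-plane constraints such as ("left of", x): object lies left
# of column x. Returns None if the constraints contradict each other.

def feasible_box(width, height, constraints):
    """Return (x0, y0, x1, y1) respecting all constraints, or None."""
    x0, y0, x1, y1 = 0, 0, width, height
    for rel, v in constraints:
        if rel == "left of":      # object entirely left of column v
            x1 = min(x1, v)
        elif rel == "right of":   # object entirely right of column v
            x0 = max(x0, v)
        elif rel == "above":      # object entirely above row v
            y1 = min(y1, v)
        elif rel == "below":      # object entirely below row v
            y0 = max(y0, v)
    if x0 >= x1 or y0 >= y1:
        return None               # contradictory constraints
    return (x0, y0, x1, y1)

print(feasible_box(640, 480, [("right of", 100), ("left of", 400), ("below", 200)]))
# -> (100, 200, 400, 480)
```

Each additional game log adds constraints, so the feasible box only shrinks, which is consistent with the slide's claim that very few games per image are needed.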
20. Ask'nSeek
• Comparison against baseline detector (*)
Ask’nSeek: a new game for object detection and labeling
Fig. 5. Representative results: (a) baseline dog detector from [9]; (b) result from our approach for label ‘dog’; (c) baseline cat detector from [9]; (d) result from our approach for label ‘cat’.
…the candidate updates $\mu_l^{\alpha} = \alpha\,\hat{\mu}_l^{(k-1)} + (1-\alpha)\,\tilde{\mu}_l^{(k)}$ and $\Sigma_l^{\beta} = \beta\,\hat{\Sigma}_l^{(k-1)} + (1-\beta)\,\tilde{\Sigma}_l^{(k)}$ actually improve the likelihood:

$\alpha^{*}, \beta^{*} = \operatorname{argmax}_{\alpha,\beta} P(X, R \mid \mu_l^{\alpha}, \Sigma_l^{\beta}, \gamma^{(k)})$  (11)

$\theta^{(k)}$ is updated with $K$ new cluster locations $\hat{\mu}_l^{(k)} \leftarrow \mu_l^{\alpha^{*}}$ and covariances $\hat{\Sigma}_l^{(k)} \leftarrow \Sigma_l^{\beta^{*}}$, produced by $K$ optimizations (equation 11 for $l = 1, \ldots, K$) that make the likelihood monotonically increasing.
Until $P(X, R \mid \theta^{(k)}, \gamma^{(k)})$ converges.
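The update above blends the previous parameter estimates with the fresh EM estimates and keeps the mixing weights that maximize the likelihood. A small numpy sketch of that idea for a single Gaussian cluster (the grid search and all names are illustrative, not the paper's implementation):

```python
# Illustrative sketch: candidate mean/covariance are convex blends of
# the previous estimate and the fresh EM estimate; (alpha, beta) are
# chosen by grid search to maximize the data likelihood.

import numpy as np

def gauss_loglik(X, mu, sigma):
    """Sum of log N(x | mu, sigma) over the rows of X (2-D Gaussian)."""
    d = X - mu
    inv = np.linalg.inv(sigma)
    _, logdet = np.linalg.slogdet(sigma)
    quad = np.einsum("ni,ij,nj->n", d, inv, d)  # Mahalanobis terms
    return float(np.sum(-0.5 * (quad + logdet + X.shape[1] * np.log(2 * np.pi))))

def blended_update(X, mu_prev, sig_prev, mu_new, sig_new,
                   grid=np.linspace(0, 1, 11)):
    """Pick alpha, beta maximizing the likelihood of the blended Gaussian."""
    best = (-np.inf, None, None)
    for a in grid:
        mu = a * mu_prev + (1 - a) * mu_new
        for b in grid:
            sig = b * sig_prev + (1 - b) * sig_new
            ll = gauss_loglik(X, mu, sig)
            if ll > best[0]:
                best = (ll, mu, sig)
    return best  # (log-likelihood, blended mean, blended covariance)

rng = np.random.default_rng(0)
X = rng.normal([2.0, -1.0], 0.5, size=(200, 2))
ll, mu, sig = blended_update(X, np.zeros(2), np.eye(2),
                             X.mean(axis=0), np.cov(X.T))
print(np.allclose(mu, X.mean(axis=0)))  # -> True (alpha=0 keeps the EM mean)
```

Because the grid includes the previous parameters (alpha = beta = 1), the selected blend can never decrease the likelihood, which mirrors the monotonicity claim in the excerpt.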
8. REFERENCES
[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and
S. Süsstrunk. SLIC superpixels. Technical report, EPFL, 2010.
[2] D. Batra, A. Kowdle, D. Parikh, J. Luo, and T. Chen.
Interactively co-segmentating topically related images with
intelligent scribble guidance. IJCV, 93(3):273–292, 2011.
[3] S. Branson, P. Perona, and S. Belongie. Strong supervision
from weak annotation: Interactive training of deformable
part models. In ICCV, Barcelona, 2011.
[4] S. Branson, C. Wah, B. Babenko, F. Schroff, P. Welinder,
P. Perona, and S. Belongie. Visual recognition with humans
in the loop. In ECCV, Heraklion, 2010.
[5] A. Carlier, G. Ravindra, V. Charvillat, and W. T. Ooi.
Combining content-based analysis and crowdsourcing to
improve user interaction with zoomable video. In ACM
MM’11, pages 43–52, 2011.
[6] O. Chapelle, B. Schölkopf, and A. Zien, editors.
Semi-Supervised Learning. MIT Press, Cambridge, 2006.
[7] D. Comaniciu and P. Meer. Mean shift: A robust approach
toward feature space analysis. IEEE Trans. PAMI,
24(5):603–619, 2002.
[8] S. Cooper, F. Khatib, A. Treuille, J. Barbero, J. Lee,
M. Beenen, A. Leaver-Fay, D. Baker, Z. Popovic, and
F. Players. Predicting protein structures with a multiplayer
online game. Nature, 466(7307):756–760, Aug. 2010.
[9] P. Felzenszwalb, R. Girshick, and D. McAllester.
Discriminatively trained deformable part models, release 4.
http://people.cs.uchicago.edu/~pff/latent-release4/.
[10] P. Felzenszwalb, R. Girshick, D. McAllester, and
D. Ramanan. Object detection with discriminatively trained
part-based models. IEEE Trans. PAMI, 32:1627–1645, 2010.
Tagcaptcha: annotating images with captchas. In ACM
MM’10, pages 1557–1558, 2010.
[18] Y. Ni, J. Dong, J. Feng, and S. Yan. Purposive
hidden-object-game: embedding human computation in
popular game. In ACM MM’11, pages 1121–1124, 2011.
[19] A. J. Quinn and B. B. Bederson. Human computation: a
survey and taxonomy of a growing field. In CHI’11, pages
1403–1412, 2011.
[20] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman.
Labelme: A database and web-based tool for image
annotation. IJCV, 77(1-3):157–173, 2008.
[21] N. Savage. Gaining wisdom from crowds. Commun. ACM,
55(3):13–15, Mar. 2012.
[22] S. Seung. Eyewire. http://eyewire.org/, 2012.
[23] J. Steggink and C. G. M. Snoek. Adding semantics to
image-region annotations with the name-it-game.
Multimedia Systems, 17(5):367–378, October 2011.
[24] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable
library of computer vision algorithms.
http://www.vlfeat.org/, 2008.
[25] S. Vijayanarasimhan and K. Grauman. Cost-sensitive active
visual category learning. IJCV, 91(1):24–44, 2011.
[26] L. von Ahn and L. Dabbish. ESP: Labeling images with a
computer game. In AAAI Spring Symposium: Knowledge
Collection from Volunteer Contributors, pages 91–98, 2005.
[27] L. von Ahn and L. Dabbish. Designing games with a
purpose. Commun. ACM, 51(8):58–67, 2008.
[28] L. von Ahn, R. Liu, and M. Blum. Peekaboom: a game for
locating objects in images. In CHI, pages 55–64, 2006.
[29] C. Wah, S. Branson, P. Perona, and S. Belongie. Multiclass
recognition and part localization with humans in the loop. In
ICCV, Barcelona, 2011.
[30] W. Wu and J. Yang. Smartlabel: an object labeling tool using
iterated harmonic energy minimization. In ACM MM’06,
pages 891–900, 2006.
[31] J. Yuen, B. C. Russell, C. Liu, and A. Torralba. Labelme
video: Building a video database with human annotations. In
ICCV, pages 1451–1458, 2009.
[32] M.-C. Yuen, L.-J. Chen, and I. King. A survey of human
computation systems. In Computational Science and
Engineering, 2009. CSE ’09. International Conference on,
volume 4, pages 723–728, Aug. 2009.
[33] X. Zhu and A. B. Goldberg. Introduction to Semi-Supervised
Learning. Synthesis Lectures on Artificial Intelligence and
Machine Learning. Morgan & Claypool Publishers, 2009.
21. Ask'nSeek
• Comparison against saliency map results
– (e): Harel's implementation of Itti et al.
– (f): results from our game
Figure 7: Representative results: (a) baseline dog detector from [9]; (b) result from our approach for label ‘dog’; (c) baseline cat detector from [9]; (d) result from our approach for label ‘cat’; (e) saliency map produced by [11]; (f) comparable result from our approach.
22. Ask'nSeek
• Final remarks
– It does in one game what ESP and Peekaboom do in two games (namely, collecting labels and locating the objects associated with those labels).
– It avoids explicitly asking the user to map labels and regions thanks to our novel semi-supervised learning algorithm.
– It requires very few game logs per image to feed a machine learning algorithm, which in turn produces the outline of the most relevant regions within the image and their labels.
23. Ask'nSeek
• Ongoing work
– Different segmentation algorithms and figures of merit
• Joint work with UPC (Barcelona)
– Extensive evaluation against the PASCAL VOC dataset
24. Ask'nSeek
• Many possibilities
– Different segmentation algorithms
– (Multi-language) text-based processing
– "Objects in context"
• Biggest challenge
– Making the game fun!
– Increasing the player base
26. Our work
• Guess That Face (with Mathias Lux and Justyn Snyder, CHI 2013)
– A face recognition game that reverse-engineers the human biological threshold for accurately recognizing blurred faces of celebrities under time-varying conditions.
27. • Motivation:
– Human vision: we are remarkably good at recognizing (severely blurred) famous faces.
Fig. 1. Unlike current machine-based systems, human observers are able to handle significant degradations in face images. For instance, subjects are able to recognize more than half of all familiar faces shown to them at the resolution depicted here. Individuals shown in order are: Michael Jordan, Woody Allen, Goldie Hawn, Bill Clinton, Tom Hanks, Saddam Hussein, Elvis Presley, Jay Leno.
Sinha et al.: Face Recognition by Humans: Nineteen Results Researchers Should Know About
28. • Motivation:
– Computer vision: face recognition still not mature
– Machine learning: what if we train algorithms with blurry faces instead?
– Game play: SongPop and Facebook
29. • The game:
– The player must analyze a series of randomly generated images of celebrities while the images transition from an initially severely blurred state to their original state over a constant interval of time.
– While the image is being progressively de-blurred on the screen, players are prompted to select the name of the celebrity they believe is correct, which they do once they have confidence in their answer.
30. • Dataset (original):
– 48 popular celebrities + 2 politicians
– Each image in the dataset also has two variations: de-saturated and horizontally flipped
31. • Game logs:
– the image identifier in the dataset that contained the correct answer,
– the variation of the image (normal, de-saturated, or horizontally flipped),
– the degree of blurring (ranging from 0 to 65),
– the answer the user selected (or the time-out),
– the orientation of both the correct answer and the selected answer,
– the time stamp,
– the round number.
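Collected together, one log entry per round might look like the following record; the field names here are hypothetical, and only the fields themselves come from the slide:

```python
# Illustrative model of one Guess That Face round log.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class RoundLog:
    image_id: str                        # dataset image holding the correct answer
    variation: str                       # "normal", "de-saturated", or "horizontally flipped"
    blur_radius: int                     # degree of blurring at answer time (0..65)
    selected: Optional[str]              # player's answer, or None on a time-out
    correct_orientation: str             # orientation of the correct answer
    selected_orientation: Optional[str]  # orientation of the selected answer
    timestamp: datetime                  # when the answer was given
    round_number: int                    # position of the round within the game

log = RoundLog("celeb_017", "normal", 42, "Tom Hanks",
               "normal", "normal", datetime(2013, 5, 21, 10, 0), 3)
print(log.selected)  # -> Tom Hanks
```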
32. • Terminology:
– Guess That Face is an inversion-problem game, where one player provides output while the second player guesses the input provided by the first player.
– Specifically, for Guess That Face, the "player" who generates the random output is a computer, while the player who guesses the correct input is a human.
– The output in question is both the image of the celebrity and the set of options, and the input is the correct answer selected by the player.
33. • User studies:
– 28 undergraduate students at FAU participated.
– The mean age was 21.
– 82% were male and 18% female.
– Many strongly agreed with "I found the game to be very easy to understand," with a mean of 4.93 on a 5-point Likert scale [1..5].
34. • User studies:
– 33.3% of the students confessed to random guessing. Still, 80% of the students played at least one game without using an exploit or randomly guessing.
– 80% of the participants saw potential for broader deployment, e.g., as a social media application.
– 80% of the players would recommend the game to their friends.
35. • Deblurring method
– StackBlur Gaussian blurring algorithm
– Blurring radius ranging from 0 to 65.
– Total de-blurring time = 8.125 seconds.
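These numbers imply a simple schedule: the blur radius falls from 65 to 0 over 8.125 seconds. A sketch, assuming a linear ramp (the game's actual easing is not stated on the slide):

```python
# De-blurring schedule implied by the slide: radius 65 -> 0 over 8.125 s,
# assumed linear here.

TOTAL_TIME = 8.125   # seconds for a full de-blur
MAX_RADIUS = 65      # starting blur radius

def blur_radius(t: float) -> int:
    """Blur radius at t seconds into the round (clamped to [0, MAX_RADIUS])."""
    t = min(max(t, 0.0), TOTAL_TIME)
    return round(MAX_RADIUS * (1 - t / TOTAL_TIME))

print(blur_radius(0.0))          # 65: fully blurred
print(blur_radius(TOTAL_TIME))   # 0: original image
```

A linear ramp drops the radius one unit every 0.125 s (8.125 s / 65 steps), which is consistent with the slide's figures.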
37. • Conclusions:
– Promising preliminary results
– Work in progress
– Potential to extend to other areas and age groups
38. Guess That Face
• Let's play!
• http://tinyurl.com/guessthatface
39. Ongoing and future work
• Build a framework for the development of games for vision problems
• Extend the work on Ask'nSeek and GTF to other vision problems, e.g., scene classification, object recognition
• Develop mobile games for children to help human vision scientists study developmental aspects of vision
40. Concluding remarks
• "Recipe for success"
– Games that are well-designed (look like games, have a certain addictive component, make you want to share with friends, have a long shelf life)
– Games that are fun to play
– Game logs that convey useful information that would not be easily obtained otherwise
– Meaningful (open) computer vision problems
– Sound machine learning strategies to leverage the knowledge acquired through game logs.
41. Discussion
• Which (types of) games?
• Which (group of) vision problems?
• How to avoid the trap of "games that aren't"?
• How to make a game "go viral"?
• What else can go wrong?
42. Let's get to work!
• Which computer vision problem would you like to solve using games?
• Contact me with ideas: omarques@fau.edu