@inproceedings{buffet:hal-00750577,
hal_id = {hal-00750577},
url = {http://hal.inria.fr/hal-00750577},
title = {{Optimistic Heuristics for MineSweeper}},
author = {Buffet, Olivier and Lee, Chang-Shing and Lin, Woanting and Teytaud, Olivier},
abstract = {{We present a combination of Upper Confidence Tree (UCT) and domain-specific solvers, aimed at improving the behavior of UCT for long-term aspects of a problem. Results improve the state of the art, combining top performance on small boards (where UCT is the state of the art) and on big boards (where variants of CSP rule).}},
language = {English},
affiliation = {MAIA - INRIA Nancy - Grand Est / LORIA, Department of Computer Science and Information Engineering - CSIE, National University of Tainan - NUTN, TAO - INRIA Saclay - Ile de France, Laboratoire de Recherche en Informatique - LRI, Department of Electrical Engineering and Computer Science - Institut Montefiore},
booktitle = {{International Computer Symposium}},
address = {Hualien, Taiwan, Province of China},
audience = {international},
year = {2012},
pdf = {http://hal.inria.fr/hal-00750577/PDF/mines3.pdf},
}
16. What is the optimal move?
Remark: the question makes sense without knowing the history.
You don't need the history to play optimally.
==> (this fact is mathematically nontrivial!)
17. What is the optimal move?
This one is easy.
Both remaining locations win with probability 50%.
24. Probability of a mine?
- Top: 33%
- Middle: 33%
- Bottom: 33%
==> so all moves are equivalent?
==> NOOOOO!!!
25. Probability of a mine?
- Top: 33%
- Middle: 33%
- Bottom: 33%
Top or bottom: 66% chance of winning!
Middle: 33%!
26. The myopic (one-step-ahead) approach plays randomly here.
The middle is a bad move!
Even with the same probability of a mine, some moves are better than others!
27. State of the art:
- solved exactly in 4x4
- NP-complete
- Constraint Satisfaction Problem approach:
= find the location which is least likely to be a mine, and play there.
==> 80% success in “beginner” (9x9, 10 mines)
==> 45% success in “intermediate” (16x16, 40 mines)
==> 34% success in “expert” (30x16, 99 mines)
28. 1. Rules of MineSweeper
2. State of the art
3. The CSP approach (and other older methods)
4. The UCT approach
5. The best of both worlds
29. - Exact MDP: very expensive (4x4 has been solved).
- Single Point Strategy (SPS): simple local solving.
- CSP (constraint satisfaction problem): the main approach.
- (unknown) state:
x(i) = 1 if there is a mine at location i, 0 otherwise
- each visible location is a constraint:
if location 15 shows 4, then the constraint is
x(04) + x(05) + x(06) + x(14) + x(16) + x(24) + x(25) + x(26) = 4.
- the total number of mines is also a constraint: the sum of all x(i) equals the number of mines.
- find all solutions x1, x2, x3, ..., xN
- P(mine in j) = (x1(j) + x2(j) + ... + xN(j)) / N, i.e. the fraction of solutions with a mine at j <== this is mathematically proved!
- play j such that P(mine in j) is minimal
- if several such j, break ties randomly.
MDP = Markov Decision Process
CSP = Constraint Satisfaction Problem
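A brute-force sketch of the CSP rule above, usable on toy boards only: it enumerates every placement of the known number of mines on the hidden cells (so the total-count constraint is included), keeps the placements consistent with every visible number, and returns exact probabilities as fractions. The function names and the `revealed` dictionary (cell index to displayed number, row-major numbering) are hypothetical choices for this illustration; practical CSP solvers enumerate only the frontier and weight each solution with binomial coefficients for the unconstrained cells, instead of iterating over all placements.

```python
import random
from fractions import Fraction
from itertools import combinations


def neighbours(cell, rows, cols):
    """Indices of the cells adjacent to `cell` (row-major numbering)."""
    r, c = divmod(cell, cols)
    return [rr * cols + cc
            for rr in range(max(0, r - 1), min(rows, r + 2))
            for cc in range(max(0, c - 1), min(cols, c + 2))
            if (rr, cc) != (r, c)]


def csp_mine_probabilities(rows, cols, mines, revealed):
    """P(mine in j) for every hidden cell j, by enumerating all solutions.

    `revealed` maps each visible cell to the number it shows (hypothetical format).
    """
    hidden = [i for i in range(rows * cols) if i not in revealed]
    solutions = []
    for placement in combinations(hidden, mines):  # total mine count is fixed
        x = set(placement)
        # Each visible number is one linear constraint on the x's.
        if all(sum(nb in x for nb in neighbours(cell, rows, cols)) == shown
               for cell, shown in revealed.items()):
            solutions.append(x)
    n_sol = len(solutions)  # N on the slide; 0 would mean an inconsistent input
    return {j: Fraction(sum(j in x for x in solutions), n_sol) for j in hidden}


def csp_move(probs, rng=random):
    """Play a cell of minimal mine probability, breaking ties at random."""
    p_min = min(probs.values())
    return rng.choice([j for j, p in probs.items() if p == p_min])


# 3x3 board, 2 mines, top-left corner revealed and showing 1.
probs = csp_mine_probabilities(3, 3, 2, {0: 1})
print(probs[1], probs[2])  # 1/3 1/5
print(csp_move(probs))     # one of 2, 5, 6, 7, 8
```

Exact fractions are used so that ties are detected exactly; with floats, rounding could silently break a tie that the rule says should be random.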
30. CSP as modified by Legendre et al., 2012:
- same state, constraints, solutions and P(mine in j) as in the CSP approach above
- play j such that P(mine in j) is minimal
- if several such j, choose one “closest to the frontier”
(proposed by Legendre et al.)
- if several such j remain, break ties randomly.
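The slides do not define “closest to the frontier”. The sketch below uses one hypothetical reading: among the cells of minimal mine probability, prefer those at the smallest Chebyshev (king-move) distance from an already revealed cell, then pick randomly. `frontier_tie_break` and this distance measure are assumptions for illustration, not the exact rule of Legendre et al.

```python
import random
from fractions import Fraction


def chebyshev(a, b, cols):
    """King-move distance between two cells in row-major numbering."""
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    return max(abs(ra - rb), abs(ca - cb))


def frontier_tie_break(probs, revealed, cols, rng=random):
    """Minimal P(mine), then (hypothetically) closest to revealed cells, then random."""
    p_min = min(probs.values())
    best = [j for j, p in probs.items() if p == p_min]
    if revealed:
        # Hypothetical frontier distance: distance to the nearest revealed cell.
        dist = {j: min(chebyshev(j, r, cols) for r in revealed) for j in best}
        d_min = min(dist.values())
        best = [j for j in best if dist[j] == d_min]
    return rng.choice(best)


# Hand-made probabilities on a 3x3 board: cells 1 and 5 tie at 1/5,
# cell 1 touches the revealed cell 0 while cell 5 does not.
probs = {1: Fraction(1, 5), 5: Fraction(1, 5), 8: Fraction(1, 2)}
print(frontier_tie_break(probs, {0}, cols=3))  # 1
```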
31. CSP
- is very fast
- but it's not optimal
- because of positions like this one: here CSP plays randomly!
Also for the initial move: don't play the first move randomly! (sometimes an opening book is used)
32. 1. Rules of MineSweeper
2. State of the art
3. The CSP approach
4. The UCT approach
5. The best of both worlds
33. Why not UCT?
- looks like a stupid idea at first sight
- cannot compete with CSP in terms of speed
- but at least UCT is consistent: given sufficient time, it will play optimally.
- tested in Couetoux and Teytaud, 2011
46. UCT in one slide
We use the CSP by Legendre et al., 2012 for expansion and simulation.
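As a reminder of the core of UCT: at each node of the tree, the selection step applies the UCB1 formula, i.e. it descends into the action maximizing mean reward + c * sqrt(ln N / n), where N counts visits to the node and n visits to the action. The sketch below shows only that step. The exploration constant c = sqrt(2) is a common textbook default assumed here, not a value taken from the paper, and the parent count N is approximated by the sum of the children's visits. For MineSweeper, a natural reward is 1 for a win and 0 for a loss.

```python
import math


def ucb1_select(children, c=math.sqrt(2)):
    """Return the index of the child UCT should descend into.

    `children` is a list of (visits, total_reward) pairs for the actions of one node.
    """
    parent_visits = sum(n for n, _ in children)  # approximates N
    best, best_score = None, -math.inf
    for i, (n, total) in enumerate(children):
        if n == 0:
            return i  # every action is tried once before UCB1 applies
        score = total / n + c * math.sqrt(math.log(parent_visits) / n)
        if score > best_score:
            best, best_score = i, score
    return best


print(ucb1_select([(10, 6.0), (0, 0.0)]))   # 1: unvisited action first
print(ucb1_select([(10, 9.0), (10, 2.0)]))  # 0: same visits, higher mean
```

The exploration term shrinks as an action is visited more often, which is what makes UCT consistent: with enough simulations, every action keeps being sampled, but the best ones dominate.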
47. Applying UCT here?
• Might look like a hammer for a drosophila
• But in many cases CSP is suboptimal
• We have seen an example of a suboptimal move by CSP a few slides ago
• Let's see two additional examples
48. An example showing that the initial move matters (UCT finds it, not CSP).
3x3, 7 mines:
the optimal move is anything but the center.
Optimal winning rate: 25% (= 18/72).
Optimal winning rate if the initial move is random uniform: 17/72.
(Yes, we get a 1/72 improvement!)
49. Second such example:
15 mines on a 5x5 board with the GnoMine rule
(i.e. the initial move is a 0, i.e. no mine in its neighborhood)
Optimal success rate = 100%!
Play the center, and you win (well, you have to work...)
The myopic CSP approach does not find it.
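The numbers on the last two slides can be checked by brute force, because the boards are tiny. The sketch below is not UCT: it is an exact expectimax over belief states (the set of mine placements still consistent with what has been seen), feasible only for very small boards. The rule labels are hypothetical names for this sketch: `"safe"` means the first click is never a mine (the 25% figure requires this, since a possibly-mined first click would cap the win rate at 2/9), and `"zero"` is the GnoMine rule (no mine on or around the first click). Automatic opening of zeros is not simulated; this does not change the optimal value, because the cells it would open are known to be safe and can be clicked by hand at no risk.

```python
from collections import defaultdict
from fractions import Fraction
from functools import lru_cache
from itertools import combinations


def neighbours(cell, rows, cols):
    r, c = divmod(cell, cols)
    return tuple(rr * cols + cc
                 for rr in range(max(0, r - 1), min(rows, r + 2))
                 for cc in range(max(0, c - 1), min(cols, c + 2))
                 if (rr, cc) != (r, c))


def win_probability(rows, cols, mines, first, rule="safe"):
    """Optimal probability of winning after opening at `first` (exact)."""
    n = rows * cols
    nbrs = [neighbours(i, rows, cols) for i in range(n)]
    # "safe": only the first cell is mine-free; "zero": its neighbours too.
    excluded = {first} if rule == "safe" else {first, *nbrs[first]}
    free = [i for i in range(n) if i not in excluded]
    # All placements allowed by the rule are equally likely a priori.
    prior = [frozenset(p) for p in combinations(free, mines)]

    def split(belief, cell):
        """Placements where `cell` is safe, grouped by the number it would show."""
        groups = defaultdict(list)
        for mines_at in belief:
            if cell not in mines_at:
                groups[sum(nb in mines_at for nb in nbrs[cell])].append(mines_at)
        return [frozenset(g) for g in groups.values()]

    @lru_cache(maxsize=None)
    def value(revealed, belief):
        # A single consistent placement means every safe cell is known: sure win.
        if len(revealed) == n - mines or len(belief) == 1:
            return Fraction(1)
        best = Fraction(0)
        for cell in range(n):
            if cell in revealed:
                continue
            v = sum((Fraction(len(g), len(belief)) * value(revealed | {cell}, g)
                     for g in split(belief, cell)), Fraction(0))
            best = max(best, v)
            if best == 1:
                break  # nothing beats a certain win
        return best

    return sum((Fraction(len(g), len(prior)) * value(frozenset({first}), g)
                for g in split(prior, first)), Fraction(0))


# 3x3, 7 mines: centre 1/8, any other opening 1/4, uniform opening 17/72.
rates = [win_probability(3, 3, 7, c) for c in range(9)]
print(rates[4], rates[0], sum(rates) / 9)  # 1/8 1/4 17/72
# 5x5, 15 mines, GnoMine rule, opening in the centre (cell 12).
print(win_probability(5, 5, 15, 12, rule="zero"))  # 1
```

In the 3x3 case the centre sees all 8 other cells, so its number (always 7) carries no information, which is why it is the only bad opening.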
50. 1. Rules of MineSweeper
2. State of the art
3. The CSP approach
4. The UCT approach
5. The best of both worlds
51. Summary
I have two approaches:
• CSP:
• Fast
• Suboptimal (myopic, only 1-step ahead)
• UCT:
• needs a generative model (probability of
next states, given my action),
• Asymptotically optimal
52. The best of both worlds ?
• CSP:
• Fast
• Suboptimal (myopic, only 1-step ahead)
• UCT:
• needs a generative model, provided by CSP,
• Asymptotically optimal
53. What do I need for implementing UCT?
A complete generative model.
Given a state and an action,
I must be able to simulate possible transitions.
State S, Action a:
(S,a) ==> S'
Example: given the state below, and the action “top left”, what are the possible next states?
58. What do I need for implementing UCT?
We published a version of UCT for MineSweeper in which this generative model was implemented using the rejection method only.
59. What do I need for implementing UCT?
Rejection algorithm:
1- randomly draw the mines
2- if the draw is consistent with the current state, return the new observation
3- otherwise, go back to 1.
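A minimal sketch of the rejection generative model, assuming mines are placed uniformly at random with a known total and that `revealed` maps each opened cell to its number (a hypothetical format). It returns `None` when the action hits a mine, otherwise the number the clicked cell would show. The `max_tries` limit is an added safeguard, not part of the slide's algorithm.

```python
import random


def neighbours(cell, rows, cols):
    r, c = divmod(cell, cols)
    return [rr * cols + cc
            for rr in range(max(0, r - 1), min(rows, r + 2))
            for cc in range(max(0, c - 1), min(cols, c + 2))
            if (rr, cc) != (r, c)]


def count_mines(cell, mine_set, rows, cols):
    return sum(nb in mine_set for nb in neighbours(cell, rows, cols))


def rejection_step(rows, cols, mines, revealed, action, rng=random, max_tries=10**6):
    """Sample the observation produced by clicking `action`, given `revealed`."""
    for _ in range(max_tries):
        # 1- randomly draw the mines (uniformly, with the known total).
        mine_set = set(rng.sample(range(rows * cols), mines))
        # 2- if the draw agrees with everything seen so far, return the observation.
        if all(cell not in mine_set and count_mines(cell, mine_set, rows, cols) == shown
               for cell, shown in revealed.items()):
            if action in mine_set:
                return None  # the click hits a mine: game lost
            return count_mines(action, mine_set, rows, cols)
        # 3- otherwise, go back to 1.
    raise RuntimeError("no consistent draw found; the state may be inconsistent")


# 1x4 strip, 1 mine, cell 0 shows 1: the mine must be in cell 1.
print(rejection_step(1, 4, 1, {0: 1}, action=2))  # 1
print(rejection_step(1, 4, 1, {0: 1}, action=1))  # None
```

Accepted draws are exactly uniform over the placements consistent with the state, which is why the method is mathematically sound; its cost is the rejected draws, whose proportion grows as more numbers are revealed.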
60. What do I need for implementing UCT?
It is mathematically correct, but it is too slow.
Then, we used a weak CSP implementation.
Still too slow.
Now: a reasonably fast implementation, with the Legendre et al. heuristic.
61. EXPERIMENTAL RESULTS
10 000 UCT simulations per move
Huge computation time (total = a few days)
[Results table: “Our results”]
62. CONCLUSIONS: a methodology for sequential decision making
- When you have a myopic solver
(i.e. one that neglects long-term effects, as too often in industry!)
==> improve it with heuristics (as Legendre et al. did)
==> combine it with UCT (as we did)
==> significant improvements
- We have similar experiments on industrial testbeds