Optimistic Heuristics &
Application to MineSweeper

O. Buffet, W. Lin, O. Teytaud
A great challenge: MineSweeper.

- looks easy
- in fact, not easy: many approaches are myopic (one-step-ahead).
- partially observable
1. Rules of MineSweeper
2. State of the art
3. The CSP approach
4. The UCT approach
5. The best of both worlds
RULES

At the beginning, all locations are covered (unknown).

I play here!

Good news! No mine in the neighborhood! I can "click" all the neighbors.

I have 3 covered neighbors, and there are 3 mines in the neighborhood ==> 3 flags!

I know it's a mine, so I put a flag!
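The two local deductions on the rule slides can be applied mechanically. If every mine around a number is already flagged, the remaining covered neighbors are safe. If the number equals the flags plus the covered neighbors, all of those neighbors are mines. These two deductions are roughly what the Single Point Strategy listed later automates. Below is a minimal sketch under an assumed, hypothetical encoding: cells are `(row, col)` tuples, `revealed` maps each uncovered cell to its number, and `flagged` is a set. The function names are illustrative and do not come from the authors.

```python
def neighbors(cell, rows, cols):
    """Yield the up-to-8 cells adjacent to `cell` on a rows x cols board."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= nr < rows and 0 <= nc < cols:
                yield (nr, nc)


def single_point_step(rows, cols, revealed, flagged):
    """One pass of local deduction over every revealed number.

    Returns (safe, mines): cells that can be clicked and cells that can be flagged.
    """
    safe, mines = set(), set()
    for cell, value in revealed.items():
        around = list(neighbors(cell, rows, cols))
        n_flags = sum(nb in flagged for nb in around)
        unknown = [nb for nb in around if nb not in revealed and nb not in flagged]
        if value == n_flags:
            # All mines around this number are flagged (e.g. a 0): the rest is safe.
            safe.update(unknown)
        elif value == n_flags + len(unknown):
            # As many mines as covered neighbors: every one of them is a mine.
            mines.update(unknown)
    return safe, mines
```

For example, a corner showing 3 on a 2x2 board gives `single_point_step(2, 2, {(0, 0): 3}, set())`, which returns three mines and no safe cell. That is the "3 flags!" situation.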
No info!

I play here and I lose...

The most successful game ever!
Who in this room has never played MineSweeper?
1. Rules of MineSweeper
2. State of the art
3. The CSP approach
4. The UCT approach
5. The best of both worlds
Do you think it's easy? (10 mines)

MineSweeper is not simple.
What is the optimal move?

Remark: the question makes sense without knowing the history.
You don't need the history to play optimally.
==> (this fact is mathematically nontrivial!)
What is the optimal move?

This one is easy.

Both remaining locations win with probability 50%.
More difficult! Which move is optimal?

Here, the classical approach (CSP) is wrong.
Probability of a mine?
- Top: 33%
- Middle: 33%
- Bottom: 33%

==> so all moves are equivalent?
==> NOOOOO!!!

Top or bottom: 66% chance of winning!
Middle: 33%!

The myopic (one-step-ahead) approach plays randomly.
The middle is a bad move!
Even with the same probability of a mine, some moves are better than others!
State of the art:
- solved in 4x4
- NP-complete
- Constraint Satisfaction Problem approach:
    = find the location that is least likely to be a mine, and play there.
  ==> 80% success on “beginner” (9x9, 10 mines)
  ==> 45% success on “intermediate” (16x16, 40 mines)
  ==> 34% success on “expert” (30x16, 99 mines)
1. Rules of MineSweeper
2. State of the art
3. The CSP approach (and other older, well-known methods)
4. The UCT approach
5. The best of both worlds
- Exact MDP: very expensive; solved only up to 4x4.
- Single Point Strategy (SPS): simple local solving.
- CSP (constraint satisfaction problem): the main approach.
    - (unknown) state:
          x(i) = 1 if there is a mine at location i, 0 otherwise
    - each visible location is a constraint
      (locations numbered row then column, e.g. 15 = row 1, column 5):
           if location 15 shows 4, then the constraint is
           x(04)+x(05)+x(06)
          +x(14)+         x(16)
          +x(24)+x(25)+x(26) = 4.
    - find all solutions x_1, x_2, ..., x_N
    - $P(\text{mine in } j) = \frac{1}{N}\sum_{i=1}^{N} x_i(j)$ <== this is mathematically proven!
    - play j such that P(mine in j) is minimal
    - if several such j, break ties randomly.

                MDP = Markov Decision Process
              CSP = Constraint Satisfaction Problem
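The enumerate-and-count rule above can be sketched by brute force. This is a minimal sketch for tiny boards only. It assumes mines are placed uniformly at random with a known total count, so that every complete solution is equally likely. It also assumes a hypothetical encoding: cells are `(row, col)` tuples, and `revealed` maps each uncovered cell to its number. The function names are illustrative. Enumerating every placement is exponential, so practical solvers need something far less naive.

```python
import random
from fractions import Fraction
from itertools import combinations


def neighbors(cell, rows, cols):
    """Yield the up-to-8 cells adjacent to `cell` on a rows x cols board."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= nr < rows and 0 <= nc < cols:
                yield (nr, nc)


def mine_probabilities(rows, cols, n_mines, revealed):
    """P(mine in j) = (number of solutions with a mine at j) / N, for each covered j."""
    covered = [(r, c) for r in range(rows) for c in range(cols) if (r, c) not in revealed]
    hits = {cell: 0 for cell in covered}
    n_solutions = 0
    for placement in combinations(covered, n_mines):
        mines = set(placement)
        # A placement is a CSP solution if it satisfies every visible number.
        if all(sum(nb in mines for nb in neighbors(cell, rows, cols)) == value
               for cell, value in revealed.items()):
            n_solutions += 1
            for cell in placement:
                hits[cell] += 1
    if n_solutions == 0:
        raise ValueError("no mine placement is consistent with the board")
    return {cell: Fraction(hits[cell], n_solutions) for cell in covered}


def csp_move(probs, rng):
    """Play a covered cell of minimal mine probability, breaking ties randomly."""
    best = min(probs.values())
    return rng.choice(sorted(cell for cell, p in probs.items() if p == best))
```

On a 1x3 board with one mine and a 1 shown in the left cell, `mine_probabilities(1, 3, 1, {(0, 0): 1})` gives probability 1 for the middle cell and 0 for the right cell, so `csp_move` picks `(0, 2)`.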
CSP as modified by Legendre et al., 2012:

   - (unknown) state:
         x(i) = 1 if there is a mine at location i, 0 otherwise
   - each visible location is a constraint:
          if location 15 shows 4, then the constraint is
          x(04)+x(05)+x(06)
         +x(14)+         x(16)
         +x(24)+x(25)+x(26) = 4.
   - find all solutions x_1, x_2, ..., x_N
   - $P(\text{mine in } j) = \frac{1}{N}\sum_{i=1}^{N} x_i(j)$ <== this is mathematically proven!
   - play j such that P(mine in j) is minimal
   - if several such j, choose one “closest to the frontier”
                        (proposed by Legendre et al.)
   - if several such j remain, break ties randomly.
CSP
- is very fast
- but it's not optimal
- because of positions like this one:

[Figure not reproduced]

Here CSP plays randomly!
Also for the initial move: don't play the first move randomly! (an opening book is sometimes used)
1. Rules of MineSweeper
2. State of the art
3. The CSP approach
4. The UCT approach
5. The best of both worlds
Why not UCT?
- looks like a stupid idea at first sight
- cannot compete with CSP in terms of speed
- but at least UCT is consistent: given sufficient time, it will play optimally.
- tested in Couetoux and Teytaud, 2011
UCT (Upper Confidence Trees)
Coulom (06)
Chaslot, Saito & Bouzy (06)
Kocsis & Szepesvari (06)

[Figures: UCT, Kocsis & Szepesvari (06)]
Exploitation ...
            SCORE = 5/7 + k.sqrt( log(10)/7 )

... or exploration?
            SCORE = 0/2 + k.sqrt( log(10)/2 )
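The two scores above come from the same rule: the empirical win rate plus an exploration bonus. This is a minimal sketch that assumes `log` is the natural logarithm, as in UCB1, and treats the parent visit count of 10 exactly as given on the slide. The helper names are hypothetical. With k = 1, the 5/7 child wins. With k = 2, the unexplored-looking 0/2 child wins. The crossover is near k ≈ 1.43.

```python
import math


def ucb_score(wins, visits, parent_visits, k):
    """Empirical mean plus k * sqrt(log(parent_visits) / visits)."""
    if visits == 0:
        return math.inf  # unvisited moves are tried first
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)


def pick(children, parent_visits, k):
    """Return the child name (children: name -> (wins, visits)) with the best score."""
    return max(children, key=lambda name: ucb_score(*children[name], parent_visits, k))


children = {"exploit": (5, 7), "explore": (0, 2)}
print(pick(children, 10, 1.0), pick(children, 10, 2.0))
```

This prints `exploit explore`. The constant k is the only knob trading exploitation against exploration.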
UCT in one slide

[Figure: UCT in one slide]

We use the CSP by Legendre et al. 2012 for expansion and simulation.
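The UCT loop (select with the score above, expand, simulate, backpropagate) can be sketched generically. This is a minimal sketch, not the authors' implementation. It assumes a hypothetical generative model `step(state, action, rng) -> (next_state, reward, done)` and hashable states, with statistics shared between identical states. It also uses a uniformly random default policy, where the slides use the CSP of Legendre et al. for expansion and simulation. The toy problem is an assumption chosen to show a case where the one-step-greedy move ("safe", immediate reward 0.5) is worse than the move UCT finds ("risky", eventual reward 1).

```python
import math
import random
from collections import defaultdict


def uct_search(root, actions, step, n_iter=1000, k=1.0, seed=0):
    """Return the root action most visited by UCT after n_iter simulations."""
    rng = random.Random(seed)
    n_state = defaultdict(int)   # visits of each state
    n_sa = defaultdict(int)      # visits of each (state, action)
    w_sa = defaultdict(float)    # summed returns of each (state, action)
    expanded = set()

    def rollout(state):
        # Default policy: uniformly random actions until the episode ends.
        total, done = 0.0, False
        while not done:
            state, reward, done = step(state, rng.choice(actions(state)), rng)
            total += reward
        return total

    def score(state, a):
        if n_sa[state, a] == 0:
            return math.inf
        return (w_sa[state, a] / n_sa[state, a]
                + k * math.sqrt(math.log(n_state[state]) / n_sa[state, a]))

    def simulate(state):
        if state not in expanded:
            # Expansion: add the state to the tree and estimate it by a rollout.
            expanded.add(state)
            n_state[state] += 1
            return rollout(state)
        a = max(actions(state), key=lambda act: score(state, act))
        nxt, reward, done = step(state, a, rng)
        value = reward + (0.0 if done else simulate(nxt))
        # Backpropagation of the sampled return.
        n_state[state] += 1
        n_sa[state, a] += 1
        w_sa[state, a] += value
        return value

    for _ in range(n_iter):
        simulate(root)
    return max(actions(root), key=lambda a: n_sa[root, a])


# Hypothetical toy problem: "safe" ends at once with 0.5; "risky" leads to a sure 1.
def toy_actions(state):
    return {"start": ["safe", "risky"], "mid": ["finish"]}[state]


def toy_step(state, action, rng):
    if action == "safe":
        return "end", 0.5, True
    if action == "risky":
        return "mid", 0.0, False
    return "end", 1.0, True
```

`uct_search("start", toy_actions, toy_step, n_iter=200)` returns `"risky"`, while a one-step-ahead rule would prefer `"safe"`.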
Applying UCT here?
•   Might look like a hammer for a drosophila
•   But in many cases CSP is suboptimal
•   We have seen an example of a suboptimal move by CSP a few slides ago
•   Let's see two additional examples
An example showing that the initial move matters (UCT finds it, not CSP):

3x3, 7 mines: the optimal move is anything but the center.
Optimal winning rate: 25%.
Optimal winning rate if the initial move is uniformly random: 17/72.

(yes, we get a 1/72 improvement!)
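The 1/72 figure can be checked, and it also pins down the center's value. Assume the 17/72 rate averages optimal play over the 9 equally likely first clicks, and that every non-center first click reaches the optimal 25% = 18/72. Under those readings of the slide, the numbers imply that a first click in the center wins only 1/8 of the time:

```latex
\frac{1}{9}\left(8 \cdot \frac{18}{72} + w_{\mathrm{center}}\right) = \frac{17}{72}
\quad\Longrightarrow\quad
w_{\mathrm{center}} = \frac{9 \cdot 17 - 8 \cdot 18}{72} = \frac{9}{72} = \frac{1}{8},
\qquad
\frac{18}{72} - \frac{17}{72} = \frac{1}{72}.
```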
Second such example:
15 mines on a 5x5 board with the GnoMine rule
(i.e. the initial move is a 0, i.e. no mine in the neighborhood).
Optimal success rate = 100%!!!!!
Play the center, and you win (well, you have to work...).
The myopic CSP approach does not find it.
1. Rules of MineSweeper
2. State of the art
3. The CSP approach
4. The UCT approach
5. The best of both worlds
Summary
I have two approaches:
•   CSP:

     •     Fast

     •     Suboptimal (myopic, only 1-step ahead)

•   UCT:

     •     needs a generative model (probability of
           next states, given my action),

     •     Asymptotically optimal
The best of both worlds?

•   CSP:

     •     Fast

     •     Suboptimal (myopic, only 1-step ahead)

•   UCT:

     •     needs a generative model, provided by CSP,

     •     Asymptotically optimal
What do I need to implement UCT?

A complete generative model.
Given a state and an action,
I must be able to simulate possible transitions.

State S, Action a:
(S,a) ==> S'

Example: given the state below, and the action “top left”, what are the possible next states?

We published a version of UCT for MineSweeper in which this was implemented using the rejection method only.

Rejection algorithm:
      1- randomly draw the mines
      2- if they are consistent with the visible numbers, return the new observation
      3- otherwise, go back to 1.

It is mathematically correct, but it is too slow.
Then, we used a weak CSP implementation. Still too slow.
Now: a reasonably fast implementation, with the Legendre et al. heuristic.
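The three-step rejection algorithm can be sketched as follows. This is a minimal sketch under stated assumptions. It uses a hypothetical encoding (cells as `(row, col)` tuples, `revealed` mapping uncovered cells to their numbers) and a known total number of mines. It also returns a simplified observation: only the number under the clicked cell, ignoring the automatic opening of zero regions. The function name and the `max_tries` cap are illustrative. The sampler is correct, but the acceptance rate typically drops quickly as visible constraints accumulate, which is why it is slow.

```python
import random


def neighbors(cell, rows, cols):
    """Yield the up-to-8 cells adjacent to `cell` on a rows x cols board."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= nr < rows and 0 <= nc < cols:
                yield (nr, nc)


def sample_observation(rows, cols, n_mines, revealed, action, rng, max_tries=100_000):
    """Sample the outcome of clicking `action`: "mine", or the number it would show."""
    covered = [(r, c) for r in range(rows) for c in range(cols) if (r, c) not in revealed]
    for _ in range(max_tries):
        # 1- randomly draw the mines among the covered cells
        mines = set(rng.sample(covered, n_mines))
        # 2- if consistent with every visible number, return the new observation
        if all(sum(nb in mines for nb in neighbors(cell, rows, cols)) == value
               for cell, value in revealed.items()):
            if action in mines:
                return "mine"
            return sum(nb in mines for nb in neighbors(action, rows, cols))
        # 3- otherwise, go back to 1
    raise RuntimeError("no consistent board found within max_tries")
```

Each accepted draw is a uniformly random board consistent with what is visible, so repeated calls sample the transition (S, a) ==> S' as UCT requires.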
EXPERIMENTAL RESULTS

[Figure: our results, with 10 000 UCT simulations per move; huge computation time (a few days in total).]
CONCLUSIONS: a methodology for sequential decision making

- When you have a myopic solver (i.e. one that neglects long-term
  effects, as too often in industry!)
     ==> improve it with heuristics (as Legendre et al. did)
     ==> combine it with UCT (as we did)
     ==> significant improvements

- We have similar experiments on industrial testbeds.
Thanks for your attention!

9 mines. What is the optimal move?

Combining UCT and Constraint Satisfaction Problems for Minesweeper

  • 1. Optimistic Heuristics & Application to MineSweeper O. Buffet, W. Lin, O. Teytaud
  • 2. A great challenge: MineSweeper. - looks easy - in fact, not easy: many myopic (one- step-ahead) approaches. - partially observable
  • 3. 1. Rules of MineSweeper 2. State of the art 3. The CSP approach 4. The UCT approach 5. The best of both worlds
  • 4. RULES At the beginning, all locations are Covered (unkwown).
  • 6. Good news! No mine in the neighborhood! I can “click” all the neighbours.
  • 7. I have 3 uncovered neighbors, and I have 3 mines in the neighborhood ==> 3 flags!
  • 8.
  • 9. I know it's a mine, so I put a flag!
  • 11. I play here and I lose...
  • 12. The most successful game ever! Who in this room never played Mine- Sweeper ?
  • 13. 1. Rules of MineSweeper 2. State of the art 3. The CSP approach 4. The UCT approach 5. The best of both worlds
  • 14. Do you think it's easy ? (10 mines) MineSweeper is not simple.
  • 16. What is the optimal move ? Remark: the question makes sense, without Knowing the history. You don't need the history for playing optimaly. ==> (this fact is mathematically non trivial!)
  • 17. What is the optimal move ? This one is easy. Both remaining locations win with proba 50%.
  • 18. More difficult! Which move is optimal ? Here, the classical approach (CSP) is wrong.
  • 19. Probability of a mine ? - Top: - Middle: - Bottom:
  • 20. Probability of a mine ? - Top: 33% - Middle: - Bottom:
  • 21. Probability of a mine ? - Top: 33% - Middle: 33% - Bottom:
  • 22. Probability of a mine ? - Top: 33% - Middle: 33% - Bottom: 33%
  • 23. Probability of a mine ? - Top: 33% - Middle: 33% - Bottom: 33% ==> so all moves equivalent ?
  • 24. Probability of a mine ? - Top: 33% - Middle: 33% - Bottom: 33% ==> so all moves equivalent ? ==> NOOOOO!!!
  • 25. Probability of a mine ? - Top: 33% - Middle: 33% - Bottom: 33% Top or bottom: 66% of win! Middle: 33%!
  • 26. The myopic (one-step ahead) approach plays randomly. The middle is a bad move! Even with same proba of mine, some moves are better than others!
  • 27. State of the art: - solved in 4x4 - NP-complete - Constraint Satisfaction Problem approach: = Find the location which is less likely to be a mine, play there. ==> 80% success “beginner” (9x9, 10 mines) ==> 45% success “intermediate” (16x16, 40 mines) ==> 34% success “expert” (30x40, 99 mines)
  • 28. 1. Rules of MineSweeper 2. State of the art 3. The CSP approach (and other old known methods) 4. The UCT approach 5. The best of both worlds
  • 29. - Exact MDP: very expensive. 4x4 solved. - Single Point Strategy (SPS): simple local solving - CSP (constraint satisf. problem): the main approach. - (unknown) state: x(i) = 1 if there is a mine at location i - each visible location is a constraint: If location 15 is 4, then the constraint is x(04)+x(05)+x(06) +x(14)+ x(16) +x(24)+x(25)+x(26) = 4. - find all solutions x1, x2, x3,...,xN - P(mine in j) = (sumi Xij ) / N <== this is math. proved! - play j such that P(mine in j) minimal - if several such j, randomly break ties. MDP= Markov Decision Process CSP = Constraint Satisfaction Problem
  • 30. CSP as modified by Legendre et al. (2012): - (unknown) state: x(i) = 1 if there is a mine at location i, 0 otherwise. - each visible location is a constraint: if location 15 shows a 4, then the constraint is x(04)+x(05)+x(06)+x(14)+x(16)+x(24)+x(25)+x(26) = 4. - find all solutions x1, x2, x3, ..., xN - P(mine in j) = (x1(j) + x2(j) + ... + xN(j)) / N <== this is mathematically proved! - play j such that P(mine in j) is minimal - if there are several such j, choose one “closest to the frontier” (proposed by Legendre et al.) - if there are still several such j, break ties randomly.
  • 31. CSP: - is very fast - but is not optimal - because it is myopic: here CSP plays randomly! The same holds for the initial move: don't play the first move randomly! (an opening book is sometimes used)
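The three-level move choice above (minimal mine probability, then closest to the frontier, then random) can be sketched as a simple filter. This is an illustration under assumptions: `choose_probe` is a hypothetical name, and the distance to the frontier is passed in as a precomputed number per cell because the slide does not say how Legendre et al. measure it. Probabilities are expected as exact `Fraction`s so that equality tests are safe.

```python
import random
from fractions import Fraction


def choose_probe(mine_prob, frontier_dist, rng=random):
    """Pick the covered cell to probe.

    mine_prob:     cell -> P(mine in cell), e.g. from a CSP enumeration
    frontier_dist: cell -> distance to the frontier (HYPOTHETICAL input:
                   how this distance is computed is an assumption)
    """
    p_min = min(mine_prob.values())
    # 1) keep only the cells with minimal mine probability
    safest = [c for c, p in mine_prob.items() if p == p_min]
    # 2) among them, keep those closest to the frontier (Legendre et al. tie-break)
    d_min = min(frontier_dist[c] for c in safest)
    closest = [c for c in safest if frontier_dist[c] == d_min]
    # 3) break any remaining tie uniformly at random
    return rng.choice(closest)


# "a" and "b" are equally safe, but "b" is closer to the frontier
probs = {"a": Fraction(1, 3), "b": Fraction(1, 3), "c": Fraction(1, 2)}
dists = {"a": 2, "b": 0, "c": 0}
print(choose_probe(probs, dists))
```

Without step 2, the plain CSP rule of slide 29 would pick "a" or "b" with equal probability.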
  • 32. 1. Rules of MineSweeper 2. State of the art 3. The CSP approach 4. The UCT approach 5. The best of both worlds
  • 33. Why not UCT? - it looks like a stupid idea at first sight - it cannot compete with CSP in terms of speed - but at least UCT is consistent: given sufficient time, it plays optimally. - Tested in Couetoux and Teytaud (2011).
  • 34. UCT (Upper Confidence Trees): Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06)
  • 35. UCT
  • 39. UCT Kocsis & Szepesvari (06)
  • 41. Exploitation ... SCORE = 5/7 + k.sqrt( log(10)/7 )
  • 44. ... or exploration? SCORE = 0/2 + k.sqrt( log(10)/2 )
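The two scores on these slides can be checked numerically. A minimal sketch, assuming `log` is the natural logarithm and that 10 is the visit count of the parent node, as the formula suggests; `ucb_score` is our own name.

```python
import math


def ucb_score(wins, visits, parent_visits, k):
    """UCT score of a child: win rate + k * sqrt(log(parent visits) / child visits)."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)


for k in (1.0, 2.0):
    exploit = ucb_score(5, 7, 10, k)   # child that won 5 of its 7 simulations
    explore = ucb_score(0, 2, 10, k)   # child that won 0 of its 2 simulations
    choice = "exploitation" if exploit > explore else "exploration"
    print(f"k={k}: {exploit:.3f} vs {explore:.3f} -> {choice}")
```

With k = 1 the well-performing child keeps the higher score; with k = 2 the exploration bonus of the rarely visited child dominates. The constant k is what sets this trade-off.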
  • 45. UCT in one slide
  • 46. UCT in one slide. We use the CSP by Legendre et al. (2012) for expansion and simulation.
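The UCT loop summarized on slides 45–46 (selection by the score above, expansion of one node, simulation, back-propagation) can be sketched generically as follows. This is not the authors' code: the function and parameter names are hypothetical, `step` stands for the generative model, and `rollout_policy` is where the slides plug in the CSP of Legendre et al. States must be hashable. A toy one-move problem stands in for MineSweeper.

```python
import math
import random


def uct_search(root, actions, step, is_terminal, rollout_policy,
               n_iter, k=1.0, rng=None):
    """Generic UCT sketch; returns the most-visited action at `root`.

    actions(state)              -> list of legal actions
    step(state, action, rng)    -> (next_state, reward): the generative model
    is_terminal(state)          -> bool
    rollout_policy(state, rng)  -> action used in simulations (a CSP in the slides)
    """
    rng = rng or random.Random(0)
    tree = {}  # state -> {"n": visits, "a": {action: [visits, total return]}}

    def rollout(state):
        total = 0.0
        while not is_terminal(state):
            state, reward = step(state, rollout_policy(state, rng), rng)
            total += reward
        return total

    def select(state):
        node = tree[state]
        best, best_score = None, -math.inf
        for a in actions(state):
            n_a, w_a = node["a"].get(a, (0, 0.0))
            if n_a == 0:
                return a  # play every action once before using the score
            score = w_a / n_a + k * math.sqrt(math.log(node["n"]) / n_a)
            if score > best_score:
                best, best_score = a, score
        return best

    def simulate(state):
        if is_terminal(state):
            return 0.0
        if state not in tree:            # expansion: one new node per iteration
            tree[state] = {"n": 0, "a": {}}
            return rollout(state)        # simulation from the new node
        a = select(state)
        next_state, reward = step(state, a, rng)
        ret = reward + simulate(next_state)
        node = tree[state]               # back-propagation of the return
        node["n"] += 1
        stats = node["a"].setdefault(a, [0, 0.0])
        stats[0] += 1
        stats[1] += ret
        return ret

    for _ in range(n_iter):
        simulate(root)
    root_stats = tree[root]["a"]
    return max(root_stats, key=lambda a: root_stats[a][0])


# Toy one-move problem: "safe" wins with probability 0.7, "risky" with 0.4.
def toy_actions(state):
    return ["safe", "risky"] if state == "start" else []


def toy_step(state, action, rng):
    p_win = 0.7 if action == "safe" else 0.4
    return "end", (1.0 if rng.random() < p_win else 0.0)


best = uct_search("start", toy_actions, toy_step,
                  is_terminal=lambda s: s == "end",
                  rollout_policy=lambda s, rng: rng.choice(toy_actions(s)),
                  n_iter=2000)
print(best)
```

Returning the most-visited action rather than the best average is a common, robust choice; the reward is 1 for a win and 0 otherwise, as in MineSweeper.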
  • 47. Applying UCT here? • It might look like using a hammer on a drosophila • But in many cases CSP is suboptimal • We saw an example of a suboptimal move by CSP a few slides ago • Let's see two additional examples
  • 48. An example showing that the initial move matters (UCT finds it, CSP does not): 3x3, 7 mines: the optimal first move is anything but the center. Optimal winning rate: 25% (= 18/72). Optimal winning rate with a uniformly random initial move: 17/72. (Yes, we gain 1/72!)
  • 49. Second such example: 15 mines on a 5x5 board with the GnoMine rule (i.e. the initial move is a 0, i.e. there is no mine in its neighborhood). Optimal success rate = 100%!!!!! Play the center, and you win (well, you have to work...). The myopic CSP approach does not find it.
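The figures of slide 48 can be recomputed exactly. The sketch assumes the first click is never a mine (under this assumption the slide's 25% and 17/72 come out exactly), so the 7 mines fill 7 of the 8 other cells and a single safe cell remains. After the first number is shown, that safe cell is uniform among the positions consistent with it, so the best second click wins with probability 1 / (number of such positions). The names below are our own.

```python
from fractions import Fraction

ROWS = COLS = 3
N_MINES = 7
CELLS = [(r, c) for r in range(ROWS) for c in range(COLS)]


def neighbours(cell):
    r, c = cell
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < ROWS and 0 <= c + dc < COLS]


def win_rate(first):
    """Exact win probability when the game starts by clicking `first`."""
    others = [c for c in CELLS if c != first]
    assert len(others) - N_MINES == 1          # exactly one safe cell s is left
    # group the possible positions of s by the number revealed on `first`
    by_number = {}
    for s in others:
        shown = sum(nb != s for nb in neighbours(first))  # every neighbour but s is a mine
        by_number.setdefault(shown, []).append(s)
    # P(number) * P(best guess is safe | number) = len(g)/8 * 1/len(g)
    return sum(Fraction(len(g), len(others)) * Fraction(1, len(g))
               for g in by_number.values())


rates = {c: win_rate(c) for c in CELLS}
print(rates[(0, 0)], rates[(0, 1)], rates[(1, 1)])   # corner, edge, center
print(sum(rates.values()) / len(CELLS))              # uniformly random first move
```

A corner or edge click can show two different numbers, which splits the possible safe cells into two groups; the center always shows 7 and gives no information, so it wins only 1/8 of the time. Averaging over a uniform first click gives (8 × 1/4 + 1/8) / 9 = 17/72.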
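Why playing the center wins every time on slide 49 can be checked by enumeration. A sketch under the GnoMine rule as stated: the center shows 0, so its 3x3 block is mine-free and opens automatically, leaving 16 border cells for 15 mines, hence exactly one safe border cell. The check below verifies that each possible safe cell produces a different set of numbers on the 8 opened cells around the center, so the safe cell can always be deduced. Names are our own.

```python
ROWS = COLS = 5
N_MINES = 15
CENTER = (2, 2)


def neighbours(cell):
    r, c = cell
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < ROWS and 0 <= c + dc < COLS]


# The center shows 0, so the 3x3 block around it is mine-free and gets uncovered.
inner = [CENTER] + neighbours(CENTER)
border = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in inner]
assert len(border) - N_MINES == 1   # exactly one border cell is safe


def numbers_shown(safe):
    """Numbers on the 8 cells around the center if `safe` is the only safe border cell."""
    return tuple(sum(nb in border and nb != safe for nb in neighbours(v))
                 for v in neighbours(CENTER))


signatures = {numbers_shown(s) for s in border}
print(len(signatures))   # one distinct observation per possible safe cell
```

Every border cell touches a different set of the 8 uncovered cells, so the missing mine lowers a different set of numbers each time. Once identified, the safe cell is clicked and the game is won.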
  • 50. 1. Rules of MineSweeper 2. State of the art 3. The CSP approach 4. The UCT approach 5. The best of both worlds
  • 51. Summary: I have two approaches: • CSP: • Fast • Suboptimal (myopic, only 1-step ahead) • UCT: • needs a generative model (probability of next states, given my action), • Asymptotically optimal
  • 52. The best of both worlds? • CSP: • Fast • Suboptimal (myopic, only 1-step ahead) • UCT: • needs a generative model, provided by CSP, • Asymptotically optimal
  • 53. What do I need to implement UCT? A complete generative model: given a state and an action, I must be able to simulate possible transitions. State S, action a: (S,a) ==> S'. Example: given the state below and the action “top left”, what are the possible next states?
  • 58. We published a version of UCT for MineSweeper in which this generative model was implemented using the rejection method only.
  • 59. Rejection algorithm: 1- randomly draw the mines; 2- if it's ok (consistent with what has been observed), return the new observation; 3- otherwise, go back to 1.
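A minimal sketch of the rejection algorithm of slide 59 as a generative model, assuming mines are drawn uniformly with a known total count. It returns only what the clicked cell reveals ("mine", or the number shown) and ignores the automatic opening of zeros; `sample_transition` is a hypothetical name. If the observations are inconsistent, the loop never ends; and when consistent layouts are rare, it becomes very slow, which is the problem slide 60 mentions.

```python
import random


def neighbours(cell, rows, cols):
    r, c = cell
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols]


def sample_transition(rows, cols, n_mines, revealed, action, rng=random):
    """Sample the outcome of clicking `action`, given the revealed numbers.

    revealed: dict cell -> number already shown on that uncovered cell.
    Returns "mine" if the sampled layout has a mine on `action`,
    otherwise the number that would be shown there.
    """
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    while True:
        mines = set(rng.sample(cells, n_mines))          # 1- randomly draw the mines
        consistent = all(
            v not in mines
            and sum(nb in mines for nb in neighbours(v, rows, cols)) == number
            for v, number in revealed.items())
        if consistent:                                   # 2- ok: return the observation
            if action in mines:
                return "mine"
            return sum(nb in mines for nb in neighbours(action, rows, cols))
        # 3- otherwise, go back to 1


# 1x3 board, 1 mine; the left cell shows 1, so the mine is in the middle
rng = random.Random(0)
print(sample_transition(1, 3, 1, {(0, 0): 1}, (0, 2), rng))
print(sample_transition(1, 3, 1, {(0, 0): 1}, (0, 1), rng))
```

Because layouts are drawn uniformly and only inconsistent ones are discarded, accepted layouts are uniform among those consistent with the observations, which is why the method is mathematically correct.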
  • 60. It is mathematically ok, but it is too slow. Then, we used a weak CSP implementation. Still too slow. Now: a reasonably fast implementation, with the Legendre et al. heuristic.
  • 61. EXPERIMENTAL RESULTS: 10 000 UCT simulations per move; huge computation time (a few days in total). Our results
  • 62. CONCLUSIONS: a methodology for sequential decision making - When you have a myopic solver (i.e. one that neglects long-term effects, as too often in industry!) ==> improve it with heuristics (as Legendre et al. did) ==> combine it with UCT (as we did) ==> significant improvements - We have similar experiments on industrial testbeds
  • 63. Thanks for your attention! 9 mines. What is the optimal move?