SlideShare a Scribd company logo
1 of 103
Download to read offline
Monte-Carlo Tree Search (MCTS)
       for Computer Go

            Bruno Bouzy
    bruno.bouzy@parisdescartes.fr
      Université Paris Descartes

          Séminaire BigMC
            5 mai 2011
Outline

●   The game of Go: a 9x9 game
●   The « old » approach (*-2002)
●   The Monte-Carlo approach (2002-2005)
●   The MCTS approach (2006-today)
●   Conclusion


                   MCTS for Computer Go    2
The game of Go




   MCTS for Computer Go   3
The game of Go

●   4000 years
●   Originated from China
●
    Developed by Japan (20th century)
●   Best players in Korea, Japan, China
●   19x19: official board size
●   9x9: beginners' board size
                    MCTS for Computer Go   4
A 9x9 game
●   The board has 81 « intersections ». Initially,it
    is empty.




                      MCTS for Computer Go             5
A 9x9 game
●   Black moves first. A « stone » is played on
    an intersection.




                   MCTS for Computer Go           6
A 9x9 game
●   White moves second.




                 MCTS for Computer Go   7
A 9x9 game
●   Moves alternate between Black and White.




                  MCTS for Computer Go         8
A 9x9 game
●   Two adjacent stones of the same color
    builds a « string » with « liberties ».
●   4-adjacency




                    MCTS for Computer Go      9
A 9x9 game
●   Strings are created.




                   MCTS for Computer Go   10
A 9x9 game
●   A white stone is in « atari » (one liberty).




                     MCTS for Computer Go          11
A 9x9 game
●   The white string has five liberties.




                    MCTS for Computer Go   12
A 9x9 game
●   The black stone is « atari ».




                    MCTS for Computer Go   13
A 9x9 game
●   White « captures » the black stone.




                   MCTS for Computer Go   14
A 9x9 game
●   For human players, the game is over.

       –   Hu?

       –   Why?




                    MCTS for Computer Go   15
A 9x9 game
●   What happens if White contests black
    « territory »?




                   MCTS for Computer Go    16
A 9x9 game
●   White has invaded. Two strings are atari!




                   MCTS for Computer Go         17
A 9x9 game
●   Black captures !




                   MCTS for Computer Go   18
A 9x9 game
●   White insists but its string is atari...




                      MCTS for Computer Go     19
A 9x9 game
●   Black has proved is « territory ».




                    MCTS for Computer Go   20
A 9x9 game
●   Black may contest white territory too.




                    MCTS for Computer Go     21
A 9x9 terminal position
●   The game is over for computers.

       –   Hu?

       –   Who won ?




                       MCTS for Computer Go   22
A 9x9 game
●   The game ends when both players pass.
●   One black (resp. white) point for each black
    (resp. white) stone and each black (resp.
    white) « eye » on the board.
●   One black (resp. white) eye = an empty
    intersection surrounded by black (resp.
    white) stones.

                   MCTS for Computer Go            23
A 9x9 game
●   Scoring:
       –   Black = 44
       –   White = 37
       –   Komi = 7.5
       –   Score = -0.5
●   White wins!


                        MCTS for Computer Go   24
Go ranking: « kyu » and « dan »
                            Pro ranking                 Amateur ranking

 Top professional players
                                    9 dan                      9 dan

  Very strong players              1 dan                        6 dan

                                                                1 dan
                                       Strong players           1 kyu

                                    Average playersl            10 kyu

                                            Beginners           20 kyu


                                     Very beginners             30 kyu
                             MCTS for Computer Go                         25
Computer Go (old history)
●   First go program (Lefkovitz 1960)
●   Zobrist hashing (Zobrist 1969)
●   Interim2 (Wilcox 1979)
●   Life and death model (Benson 1988)
●   Patterns: Goliath (Boon 1990)
●   Mathematical Go (Berlekamp 1991)
●   Handtalk (Chen 1995)
                    MCTS for Computer Go   26
The old approach
●   Evaluation of non terminal positions
        –   Knowledge-based
        –   Breaking-down of a position into sub-
             positions
●   Fixed-depth global tree search
        –   Depth = 0 : action with the best value
        –   Depth = 1: action leading to the position
             with the best evaluation
        –   Depth > 1: alfa-beta or minmax
                       MCTS for Computer Go             27
The old approach
                             Current position
2 or 3

Bounded depth Tree search
                                          Evaluation of
                                          non terminal positions




                                                               Huhu?
361

                        MCTS for Computer Go              Terminal positions
                                                                          28
Position evaluation
●   Break-down
       –   Whole game (win/loss or score)
       –   Goal-oriented sub-game
               ●   String capture
               ●   Connections, dividers, eyes, life and death
●   Local searches
       –   Alpha-beta and enhancements
       –   Proof-number search
                         MCTS for Computer Go                    29
A 19x19 middle-game position




          MCTS for Computer Go   30
A possible black break-down




          MCTS for Computer Go   31
A possible white break-down




          MCTS for Computer Go   32
Possible local evaluations (1)
                                                   unstable

                         Alive and territory


 Not important                                     alive




                                                  dead


                 alive
                           MCTS for Computer Go               33
Possible local evaluations (2)
      alive


                                          unstable




    unstable




          alive + big territory             unstable




                   MCTS for Computer Go                34
Position evaluation
●   Local results
        –   Obtained with local tree search
        –   Result if white plays first (resp. black)
        –   Combinatorial game theory (Conway)
        –   Switches {a|b}, >, <, *, 0
●   Global recomposition
        –   move generation and evaluation
        –   position evaluation
                        MCTS for Computer Go            35
Position evaluation




     MCTS for Computer Go   36
Drawbacks (1/2)
●   The break-down is not unique
●   Performing a (wrong) local tree search on a
    (possibly irrelevant) local position
●   Misevaluating the size of the local position
●   Different kinds of local information
        –   Symbolic (group: dead alive unstable)
        –   Numerical (territory size, reduction,
             increase)
                        MCTS for Computer Go        37
Drawbacks (2/2)
●   Local positions interact
●   Complicated
●   Domain-dependent knowledge
●   Need of human expertise
●   Difficult to program and maintain
●   Holes of knowledge
●   Erratic behaviour
                    MCTS for Computer Go   38
Upsides
●   Feasible on 1990's computers
●   Execution is fast

●   Some specific local tree searches are
    accurate and fast



                    MCTS for Computer Go    39
The old approach
                            Pro ranking                 Amateur ranking

 Top professional players
                                    9 dan                      9 dan

 Very strong players               1 dan                        6 dan

                                                                1 dan
                       Strong players                           1 kyu

                                    Average playersl            10 kyu

                                            Beginners           20 kyu
Old
approach                             Very beginners             30 kyu
                             MCTS for Computer Go                         40
End of part one!
●   Next: the Monte-Carlo approach...




                   MCTS for Computer Go   41
The Monte-Carlo (MC) approach
●   Games containing chance
       –   Backgammon (Tesauro 1989)

●   Games with hidden information
       –   Bridge (Ginsberg 2001)
       –   Poker (Billings & al. 2002)
       –   Scrabble (Sheppard 2002)

                       MCTS for Computer Go   42
The Monte-Carlo approach
●   Games with complete information
       –   A general model (Abramson 1990)

●   Simulated annealing Go
       –   (Brügmann 1993)
       –   2 sequences of moves
       –   « all moves as first » heuristic
       –   Gobble on 9x9
                       MCTS for Computer Go   43
The Monte-Carlo approach
●   Position evaluation:
         Launch N random games
         Evaluation = mean value of outcomes
●   Depth-one MC algorithm:
         For each move m {
               Play m on the ref position
               Launch N random games
               Move value (m) = mean value
         }
                     MCTS for Computer Go      44
Depth-one Monte-Carlo


                         ...


...    ...                     ...




       MCTS for Computer Go          45
Progressive pruning
●   (Billings 2002, Sheppard 2002, Bouzy &
    Helmstetter 2003)

                                          Second best

                                          Current best move

                                          Still explored

                                          Pruned

                   MCTS for Computer Go                    46
Upper bound
●   Optimism in face of uncertainty
       –   Intestim (Kaelbling 1993),
       –   UCB multi-armed bandit (Auer & al 2002)
                                  Second best promising

                                  Current best proven move

                                  Current best promising move



                      MCTS for Computer Go                   47
All-moves-as-first heuristic (1/3)
                                              B   C
                   r
                                      A       D


  A        B                   C          D




               MCTS for Computer Go                   48
All-moves-as-first heuristic (2/3)
                                                            B   C
                                 r
                                                    A       D
                 A

  A                      B                   C          D

      B


      C       Actual
              simulation
      D
                     B       C
          o      A   D

                             MCTS for Computer Go                   49
All-moves-as-first heuristic (3/3)
                                                                      B   C
                                   r
                                                              A       D
                  A                    C
  A                        B                   C                  D

      B                                            B


      C       Actual                               A       Virtual simulation =
              simulation                                   actual simulation
                                                           assuming c is played
      D                                            D
                       B
                                                           « as first »
                               C
          o        A   D                               o

                               MCTS for Computer Go                           50
The Monte-Carlo approach
●   Upsides
       –   Robust evaluation
       –   Global search
       –   Move quality increases with computing power
●   Way of playing
       –   Good strategical sense but weak tactically
●   Easy to program
       –   Follow the rules of the game
       –   No break-down problem
                       MCTS for Computer Go             51
Monte-Carlo and knowledge
●   Pseudo-random simulations using Go
    knowledge (Bouzy 2003)
       –   Moves played with a probability depending on
            specific domain-dependent knowledge
●   2 basic concepts
       –   string capture                  3x3 shapes




                        MCTS for Computer Go            52
Monte-Carlo and knowledge
●   Results are impressive
       –   MC(random) << MC(pseudo random)
       –   Size      9x9               13x13   19x19
       –   % wins    68                93      98
●   Other works on simulations
       –   Patterns in MoGo, proximity rule (Wang & al
            2006)
       –   Simulation balancing (Silver & Tesauro 2009)
                       MCTS for Computer Go              53
Monte-Carlo and knowledge
●   Pseudo-random player
       –   3x3 pattern urgency table with 38 patterns
       –   Few dizains of relevant patterns only
       –   Patterns gathered by
               ●   Human expertise
               ●   Reinforcement Learning (Bouzy & Chaslot
                    2006)
●   Warning
       –   p1 better than p2 does not mean MC(p1)
            better than MC(p2)
                          MCTS for Computer Go               54
Monte-Carlo Tree Search (MCTS)
●   How to integrate MC and TS ?
●   UCT = UCB for Trees
       –   (Kocsis & Szepesvari 2006)
       –   Superposition of UCB (Auer & al 2002)
●   MCTS
       –   Selection, expansion, updating (Chaslot & al)
            (Coulom 2006)
       –   Simulation (Bouzy 2003) (Wang & Gelly
             2006)
                       MCTS for Computer Go            55
MCTS (1/2)
while (hasTime) {
     playOutTreeBasedGame()
     expandTree()
     outcome = playOutRandomGame()
     updateNodes(outcome)
}
then choose the node with...
     ... the best mean value
     ... the highest visit number
                  MCTS for Computer Go   56
MCTS (2/2)
PlayOutTreeBasedGame() {
    node = getNode(position)
    while (node) {
         move=selectMove(node)
         play(move)
         node = getNode(position)
    }
}

              MCTS for Computer Go   57
UCT move selection
●   Move selection rule to browse the tree:
            move=argmax (s*mean + C*sqrt(log(t)/n)
●   Mean value for exploitation
        –   s (=+-1): color to move
●   UCT bias for exploration
        –   C: constant term set up by experiments
        –   t: number of visits of the parent node
        –   n: number of visits of the current node
                         MCTS for Computer Go         58
Example
●   1 iteration                          1/1




                                           1


                  MCTS for Computer Go         59
Example
●   2 iterations                          1/2



                                            0/1




                                                  0


                   MCTS for Computer Go               60
Example
●   3 iterations                              2/3



                                   1/1          0/1




                                          1

                   MCTS for Computer Go               61
Example

                                          2/4
●   4 iterations
                       0/1         1/1      0/1




                   MCTS for Computer Go           62
Example

                                          3/5
●   5 iterations
                      0/1          1/1      0/1   1/1




                   MCTS for Computer Go                 63
Example
●   6 iterations
                                          3/6



                      0/1         1/2       0/1   1/1



                                     0/1




                   MCTS for Computer Go                 64
Example
●   7 iterations                          3/7



                      0/1          1/2      0/1   1/2



                                      0/1               0/1




                   MCTS for Computer Go                       65
Example
●   8 iterations                          4/8



                      0/1          2/3      0/1   1/2



                            1/1       0/1               0/1




                   MCTS for Computer Go                       66
Example
●   9 iterations                           4/9



                         0/1         2/4     0/1   1/2


                   0/1         1/1     0/1               0/1




                   MCTS for Computer Go                        67
Example
●   10 iterations                           5/10




                          0/1         2/4      0/1         2/3


                    0/1         1/1     0/1          1/1         0/1




                    MCTS for Computer Go                               68
Example
●   11 iterations                           6/11




                          0/1         2/4      0/1         3/4


                    0/1         1/1     0/1          1/1         0/1   1/1




                     MCTS for Computer Go                                    69
Example
●   12 iterations                           7/12




                          0/1         2/4      0/1         4/5


                    0/1         1/1     0/1          1/1         1/2   1/1



                                                       1/1




                    MCTS for Computer Go                                     70
Example
●   Clarity
        –   C=0
●   Notice
        –   with C != 0 a node cannot stay unvisited
        –   min or max rule according to the node depth
        –   not visited children have an infinite mean
●   Practice
        –   Mean initialized optimistically
                        MCTS for Computer Go             71
MCTS enhancements
●   The raw version can be enhanced
       –   Tuning UCT C value
       –   Outcome = score or win loss info (+1/-1)
       –   Doubling the simulation number
       –   RAVE
       –   Using Go knowledge
               ●   In the tree or in the simulations
       –   Speed-up
               ●   Optimizing, pondering, parallelizing
                          MCTS for Computer Go            72
Assessing an enhancement
●   Self-play
        –   The new version vs the reference version
        –   % wins with few hundred games
        –   9x9 (or 19x19 boards)
●   Against differently designed programs
        –   GTP (Go Text Protocol)
        –   CGOS (Computer Go Operating System)
●   Competitions
                       MCTS for Computer Go            73
Move selection formula tuning
●   Using UCB
       –   Best value for C ?
       –   60-40%

●   Using « UCB-tuned » (Auer & al 2002)
       –   C replaced by min(1/4,variance)
       –   55-45%

                      MCTS for Computer Go   74
Exploration vs exploitation
●   General idea: explore at the beginning and
    exploit in the end of thinking time
●   Diminishing C linearly in the remaining time
        –   (Vermorel & al 2005)
        –   55-45%
●   At the end:
        –   Argmax over the mean value or over the
             number of visits ?
        –   55-45%     MCTS for Computer Go          75
Kind of outcome
●   2 kinds of outcomes
       –   Score (S) or win loss information (WLI) ?
       –   Probability of winning or expected score ?
       –   Combining both (S+WLI) (score +45 if win)
●   Results
       –   WLI vs S      65-35%
       –   S+WLI vs S 65-35%

                      MCTS for Computer Go              76
Doubling the number of
             simulations
●   N = 100,000

●   Results
       –   2N vs N       60-40%
       –   4N vs 2N      58-42%


                      MCTS for Computer Go   77
Tree management
●   Transposition tables
       –   Tree -> Directed Acyclic Graph (DAG)
       –   Different sequences of moves may lead to
            the same position
       –   Interest for MC Go: merge the results
       –   Result: 60-40%
●   Keeping the tree from one move to the next
       –   Result: 65-35%
                      MCTS for Computer Go            78
RAVE (1/3)
●   Rapid Action Value Estimation
       –   Mogo 2007
       –   Use the AMAF heuristic (Brugmann 1993)
       –   There are « many » virtual sequences that
            are transposed from the actually played
            sequence
●   Result:
       –   70-30%
                       MCTS for Computer Go            79
RAVE (2/3)
●   AMAF heuristic
●   Which nodes to update?                  A           B

●   Actual                                          C
                                       C    D               D
        –   Sequence ACBD
        –   Nodes                               B   A
                                                            A
                                      B
●   Virtual
        –   BCAD, ADBC, BDAC                D           C

        –   Nodes
                     MCTS for Computer Go                       80
RAVE (3/3)
●   3 variables
        –   Usual mean value Mu
        –   AMAF mean value Mamaf
        –   M = β Mamaf + (1-β) Mu
        –   β = sqrt(k/(k+3N))
        –   K set up experimentally
●   M varies from Mamaf to Mu
                       MCTS for Computer Go   81
Knowledge in the simulations
●   High urgency for...
        –   capture/escape                        55-45%
        –   3x3 patterns                          60-40%
        –   Proximity rule                        60-40%
●   Mercy rule
        –   Interrupt the game when the difference of
              captured stones is greater than a
              threshold (Hillis 2006)
        –   51-49%         MCTS for Computer Go            82
Knowledge in the tree

●   Virtual wins for good looking moves
●   Automatic acquisition of patterns of pro
    games (Coulom 2007) (Bouzy & Chaslot 2005)
●   Matching has a high cost
●   Progressive widening (Chaslot & al 2008)
●   Interesting under strong time constraints
●   Result: 60-40%
                     MCTS for Computer Go        83
Speeding up the simulations
●   Fully random simulations (2007)
       –   50,000 game/second (Lew 2006)
       –   20,000 (commonly eared)
       –   10,000 (my program)
●   Pseudo-random
       –   5,000 (my program in 2007)
●   Rough optimization is worthwhile
                     MCTS for Computer Go   84
Pondering

●   Think on the opponent time
       –   55-45%
       –   Possible doubling of thinking time
       –   The move of the opponent may not be
            the planned move on which you think
       –   Side effect: play quickly to think on the
             opponent time

                     MCTS for Computer Go              85
Summing up the enhancements
●   MCTS with all enhancements vs raw MCTS
       –   Exploration and exploitation:            60-40%
       –   Win/loss outcome:                        65-35%
       –   Rough optimization of simulations        60-40%
       –   Transposition table                      60-40%
       –   RAVE                                     70-30%
       –   Knowledge in the simulations             70-30%
       –   Knowledge in the tree                    60-40%
       –   Pondering                                55-45%
       –   Parallelization                          70-30%

●   Result: 99-1%
                             MCTS for Computer Go            86
Parallelization
●   Computer Chess: Deep Blue
●   Multi-core computer
       –   Symmetric MultiProcessor (SMP)
       –   one thread per processor
       –   shared memory, low latency
       –   mutual exclusion (mutex) mechanism
●   Cluster of computers
       –   Message Passing Information (MPI)
                     MCTS for Computer Go       87
Parallelization

while (hasTime) {
    playOutTreeBasedGame()
    expandTree()
    outcome = playOutRandomGame()
    updateNodes(outcome)
}


             MCTS for Computer Go   88
Leaf parallelization




      MCTS for Computer Go   89
Leaf parallelization
●   (Cazenave Jouandeau 2007)
●   Easy to program
●   Drawbacks
       –   Wait for the longest simulation
       –   When part of the simulation outcomes is a
            loss, performing the remaining may not
            be a relevant strategy.

                      MCTS for Computer Go             90
Root parallelization




      MCTS for Computer Go   91
Root parallelization
●   (Cazenave Jouandeau 2007)
●   Easy to program
●   No communication
●   At completion, merge the trees
●   4 MCTS for 1sec > 1 MCTS for 4 sec
●   Good way for low time settings and a small
    number of threads
                   MCTS for Computer Go          92
Tree parallelization – global mutex




             MCTS for Computer Go   93
Tree parallelization – local mutex




             MCTS for Computer Go    94
Tree parallelization
●   One shared tree, several threads
●   Mutex
        –   Global: the whole tree has a mutex
        –   Local: each node has a mutex
●   « Virtual loss »
        –   Given to a node browsed by a thread
        –   Removed at update stage
        –   Preventing threads from similar simulations
                       MCTS for Computer Go               95
Computer-computer results
●   Computer Olympiads
                        19x19                      9x9
       –   2010 Erica, Zen, MFGo                MyGoFriend
       –   2009 Zen, Fuego, Mogo                  Fuego
       –   2008 MFGo, Mogo, Leela                  MFGo
       –   2007 Mogo, CrazyStone, GNU Go        Steenvreter
       –   2006 GNU Go, Go Intellect, Indigo    CrazyStone
       –   2005 Handtalk, Go Intellect, Aya     Go Intellect
       –   2004 Go Intellect, MFGo, Indigo      Go Intellect
                         MCTS for Computer Go                  96
Human-computer results
●   9x9
          –   2009: Mogo won a pro with black
          –   2009: Fuego won a pro with white
●   19x19:
          –   2008: Mogo won a pro with 9 stones
                   Crazy Stone won a pro with 8 stones
                   Crazy Stone won a pro with 7 stones
          –   2009: Mogo won a pro with 6 stones
                        MCTS for Computer Go             97
MCTS and the old approach
                       Pro ranking                     Amateur ranking

 Top professional players
                                   9 dan                      9 dan
                                              9x9 go
 Very strong players              1 dan                        6 dan

                                                               1 dan     19x19 go
                                    Strong players             1 kyu

                                   Average players             10 kyu

MCTS                                       Beginners           20 kyu

Old
                                    Very beginners             30 kyu
approach
                            MCTS for Computer Go                               98
Computer Go (MC history)

●   Monte-Carlo Go (Brugmann 1993)
●   MCGo devel. (Bouzy & Helmstetter 2003)
●   MC+knowledge (Bouzy 2003)
●   UCT (Kocsis & Szepesvari 2006)
●   Crazy Stone (Coulom 2006)
●   Mogo (Wang & Gelly 2006)
                  MCTS for Computer Go       99
Conclusion
●   Monte-Carlo brought a Big
    improvement in Computer Go over the last
    decade!
       –   No old approach based program anymore!
       –   All go programs are MCTS based!
       –   Professional level on 9x9!
       –   Dan level on 19x19!
●   Unbelievable 10 years ago!
                      MCTS for Computer Go          100
Some references

●   PhD, MCTS and Go (Chaslot 2010)
●   PhD, Reinf. Learning and Go (Silver 2010)
●   PhD, R. Learning: applic. to Go (Gelly 2007)
●   UCT (Kocsis & Szepesvari 2006)
●
    1st MCTS go program (Coulom 2006)


                    MCTS for Computer Go           101
Web links
●   http://www.grappa.univ-lille3.fr/icga/
●   http://cgos.boardspace.net/
●   http://www.gokgs.com/
●   http://www.lri.fr/~gelly/MoGo.htm
●   http://remi.coulom.free.fr/CrazyStone/
●   http://fuego.sourceforge.net/
●   ...
                    MCTS for Computer Go     102
Thank you for your attention!




          MCTS for Computer Go   103

More Related Content

What's hot

What's hot (20)

The Next Generation of PhyreEngine
The Next Generation of PhyreEngineThe Next Generation of PhyreEngine
The Next Generation of PhyreEngine
 
LAFS Game Mechanics - Resource Management Mechanics
LAFS Game Mechanics - Resource Management MechanicsLAFS Game Mechanics - Resource Management Mechanics
LAFS Game Mechanics - Resource Management Mechanics
 
LAFS Game Mechanics - Narrative Elements
LAFS Game Mechanics - Narrative ElementsLAFS Game Mechanics - Narrative Elements
LAFS Game Mechanics - Narrative Elements
 
Artificial Intelligence in Gaming.pptx
Artificial Intelligence in Gaming.pptxArtificial Intelligence in Gaming.pptx
Artificial Intelligence in Gaming.pptx
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
Game Analytics: Opening the Black Box
Game Analytics: Opening the Black BoxGame Analytics: Opening the Black Box
Game Analytics: Opening the Black Box
 
11 Awesome Quotes About Game Design
11 Awesome Quotes About Game Design11 Awesome Quotes About Game Design
11 Awesome Quotes About Game Design
 
LAFS Game Mechanics - The Core Mechanic
LAFS Game Mechanics - The Core MechanicLAFS Game Mechanics - The Core Mechanic
LAFS Game Mechanics - The Core Mechanic
 
Enabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesEnabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. Lowndes
 
1-Introduction (Game Design and Development)
1-Introduction (Game Design and Development)1-Introduction (Game Design and Development)
1-Introduction (Game Design and Development)
 
Horizon Zero Dawn: A Game Design Post-Mortem
Horizon Zero Dawn: A Game Design Post-MortemHorizon Zero Dawn: A Game Design Post-Mortem
Horizon Zero Dawn: A Game Design Post-Mortem
 
LAFS Game Mechanics - Social Mechanics
LAFS Game Mechanics - Social MechanicsLAFS Game Mechanics - Social Mechanics
LAFS Game Mechanics - Social Mechanics
 
Game Design Fundamentals: The Formal Elements
Game Design Fundamentals: The  Formal ElementsGame Design Fundamentals: The  Formal Elements
Game Design Fundamentals: The Formal Elements
 
LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019
 
Unity Roadmap 2020: Live games
Unity Roadmap 2020: Live games Unity Roadmap 2020: Live games
Unity Roadmap 2020: Live games
 
Process of Game Design
Process of Game DesignProcess of Game Design
Process of Game Design
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
U-Net: Convolutional Networks for Biomedical Image Segmentation
U-Net: Convolutional Networks for Biomedical Image SegmentationU-Net: Convolutional Networks for Biomedical Image Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation
 
Online games
Online gamesOnline games
Online games
 
Octalysis Gamification Design for Tesla (Intro)
Octalysis Gamification Design for Tesla (Intro)Octalysis Gamification Design for Tesla (Intro)
Octalysis Gamification Design for Tesla (Intro)
 

Viewers also liked

Viewers also liked (20)

Mcts ai
Mcts aiMcts ai
Mcts ai
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?
 
Alpha go 16110226_김영우
Alpha go 16110226_김영우Alpha go 16110226_김영우
Alpha go 16110226_김영우
 
Applying fuzzy control in fighting game ai
Applying fuzzy control in fighting game aiApplying fuzzy control in fighting game ai
Applying fuzzy control in fighting game ai
 
Monte Carlo Tree Search for the Super Mario Bros
Monte Carlo Tree Search for the Super Mario BrosMonte Carlo Tree Search for the Super Mario Bros
Monte Carlo Tree Search for the Super Mario Bros
 
Monte carlo tree search
Monte carlo tree searchMonte carlo tree search
Monte carlo tree search
 
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
 
2016 Fighting Game Artificial Intelligence Competition
2016 Fighting Game Artificial Intelligence Competition2016 Fighting Game Artificial Intelligence Competition
2016 Fighting Game Artificial Intelligence Competition
 
2013 Fighting Game Artificial Intelligence Competition
2013 Fighting Game Artificial Intelligence Competition2013 Fighting Game Artificial Intelligence Competition
2013 Fighting Game Artificial Intelligence Competition
 
Machine learning Lecture 3
Machine learning Lecture 3Machine learning Lecture 3
Machine learning Lecture 3
 
Bayesian intro
Bayesian introBayesian intro
Bayesian intro
 
Bayesian statistics using r intro
Bayesian statistics using r   introBayesian statistics using r   intro
Bayesian statistics using r intro
 
Mastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: PresentationMastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: Presentation
 
Algorithmic Game Theory
Algorithmic Game TheoryAlgorithmic Game Theory
Algorithmic Game Theory
 
How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of Go
 
An introduction to bayesian statistics
An introduction to bayesian statisticsAn introduction to bayesian statistics
An introduction to bayesian statistics
 
Openstack에 컨트리뷰션 해보기
Openstack에 컨트리뷰션 해보기Openstack에 컨트리뷰션 해보기
Openstack에 컨트리뷰션 해보기
 
Introduction to Bayesian Methods
Introduction to Bayesian MethodsIntroduction to Bayesian Methods
Introduction to Bayesian Methods
 
Using Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical ApplicationsUsing Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical Applications
 
Key Expert Systems Concepts
Key Expert Systems ConceptsKey Expert Systems Concepts
Key Expert Systems Concepts
 

More from BigMC

Learning spline-based curve models (Laure Amate)
Learning spline-based curve models (Laure Amate)Learning spline-based curve models (Laure Amate)
Learning spline-based curve models (Laure Amate)
BigMC
 

More from BigMC (12)

Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
 
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
 
Stability of adaptive random-walk Metropolis algorithms
Stability of adaptive random-walk Metropolis algorithmsStability of adaptive random-walk Metropolis algorithms
Stability of adaptive random-walk Metropolis algorithms
 
Hedibert Lopes' talk at BigMC
Hedibert Lopes' talk at  BigMCHedibert Lopes' talk at  BigMC
Hedibert Lopes' talk at BigMC
 
Andreas Eberle
Andreas EberleAndreas Eberle
Andreas Eberle
 
Olivier Féron's talk at BigMC March 2011
Olivier Féron's talk at BigMC March 2011Olivier Féron's talk at BigMC March 2011
Olivier Féron's talk at BigMC March 2011
 
Olivier Cappé's talk at BigMC March 2011
Olivier Cappé's talk at BigMC March 2011Olivier Cappé's talk at BigMC March 2011
Olivier Cappé's talk at BigMC March 2011
 
Estimation de copules, une approche bayésienne
Estimation de copules, une approche bayésienneEstimation de copules, une approche bayésienne
Estimation de copules, une approche bayésienne
 
Comparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsComparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering models
 
Computation of the marginal likelihood
Computation of the marginal likelihoodComputation of the marginal likelihood
Computation of the marginal likelihood
 
Learning spline-based curve models (Laure Amate)
Learning spline-based curve models (Laure Amate)Learning spline-based curve models (Laure Amate)
Learning spline-based curve models (Laure Amate)
 
Omiros' talk on the Bernoulli factory problem
Omiros' talk on the  Bernoulli factory problemOmiros' talk on the  Bernoulli factory problem
Omiros' talk on the Bernoulli factory problem
 

"Monte-Carlo Tree Search for the game of Go"

  • 1. Monte-Carlo Tree Search (MCTS) for Computer Go Bruno Bouzy bruno.bouzy@parisdescartes.fr Université Paris Descartes Séminaire BigMC 5 mai 2011
  • 2. Outline ● The game of Go: a 9x9 game ● The « old » approach (*-2002) ● The Monte-Carlo approach (2002-2005) ● The MCTS approach (2006-today) ● Conclusion MCTS for Computer Go 2
  • 3. The game of Go MCTS for Computer Go 3
  • 4. The game of Go ● 4000 years ● Originated from China ● Developed by Japan (20th century) ● Best players in Korea, Japan, China ● 19x19: official board size ● 9x9: beginners' board size MCTS for Computer Go 4
  • 5. A 9x9 game ● The board has 81 « intersections ». Initially,it is empty. MCTS for Computer Go 5
  • 6. A 9x9 game ● Black moves first. A « stone » is played on an intersection. MCTS for Computer Go 6
  • 7. A 9x9 game ● White moves second. MCTS for Computer Go 7
  • 8. A 9x9 game ● Moves alternate between Black and White. MCTS for Computer Go 8
  • 9. A 9x9 game ● Two adjacent stones of the same color builds a « string » with « liberties ». ● 4-adjacency MCTS for Computer Go 9
  • 10. A 9x9 game ● Strings are created. MCTS for Computer Go 10
  • 11. A 9x9 game ● A white stone is in « atari » (one liberty). MCTS for Computer Go 11
  • 12. A 9x9 game ● The white string has five liberties. MCTS for Computer Go 12
  • 13. A 9x9 game ● The black stone is « atari ». MCTS for Computer Go 13
  • 14. A 9x9 game ● White « captures » the black stone. MCTS for Computer Go 14
  • 15. A 9x9 game ● For human players, the game is over. – Hu? – Why? MCTS for Computer Go 15
  • 16. A 9x9 game ● What happens if White contests black « territory »? MCTS for Computer Go 16
  • 17. A 9x9 game ● White has invaded. Two strings are atari! MCTS for Computer Go 17
  • 18. A 9x9 game ● Black captures ! MCTS for Computer Go 18
  • 19. A 9x9 game ● White insists but its string is atari... MCTS for Computer Go 19
  • 20. A 9x9 game ● Black has proved is « territory ». MCTS for Computer Go 20
  • 21. A 9x9 game ● Black may contest white territory too. MCTS for Computer Go 21
  • 22. A 9x9 terminal position ● The game is over for computers. – Hu? – Who won ? MCTS for Computer Go 22
  • 23. A 9x9 game ● The game ends when both players pass. ● One black (resp. white) point for each black (resp. white) stone and each black (resp. white) « eye » on the board. ● One black (resp. white) eye = an empty intersection surrounded by black (resp. white) stones. MCTS for Computer Go 23
  • 24. A 9x9 game ● Scoring: – Black = 44 – White = 37 – Komi = 7.5 – Score = -0.5 ● White wins! MCTS for Computer Go 24
  • 25. Go ranking: « kyu » and « dan » Pro ranking Amateur ranking Top professional players 9 dan 9 dan Very strong players 1 dan 6 dan 1 dan Strong players 1 kyu Average playersl 10 kyu Beginners 20 kyu Very beginners 30 kyu MCTS for Computer Go 25
  • 26. Computer Go (old history) ● First go program (Lefkovitz 1960) ● Zobrist hashing (Zobrist 1969) ● Interim2 (Wilcox 1979) ● Life and death model (Benson 1988) ● Patterns: Goliath (Boon 1990) ● Mathematical Go (Berlekamp 1991) ● Handtalk (Chen 1995) MCTS for Computer Go 26
  • 27. The old approach ● Evaluation of non terminal positions – Knowledge-based – Breaking-down of a position into sub- positions ● Fixed-depth global tree search – Depth = 0 : action with the best value – Depth = 1: action leading to the position with the best evaluation – Depth > 1: alfa-beta or minmax MCTS for Computer Go 27
  • 28. The old approach Current position 2 or 3 Bounded depth Tree search Evaluation of non terminal positions Huhu? 361 MCTS for Computer Go Terminal positions 28
  • 29. Position evaluation ● Break-down – Whole game (win/loss or score) – Goal-oriented sub-game ● String capture ● Connections, dividers, eyes, life and death ● Local searches – Alpha-beta and enhancements – Proof-number search MCTS for Computer Go 29
  • 30. A 19x19 middle-game position MCTS for Computer Go 30
  • 31. A possible black break-down MCTS for Computer Go 31
  • 32. A possible white break-down MCTS for Computer Go 32
  • 33. Possible local evaluations (1) unstable Alive and territory Not important alive dead alive MCTS for Computer Go 33
  • 34. Possible local evaluations (2) alive unstable unstable alive + big territory unstable MCTS for Computer Go 34
  • 35. Position evaluation ● Local results – Obtained with local tree search – Result if white plays first (resp. black) – Combinatorial game theory (Conway) – Switches {a|b}, >, <, *, 0 ● Global recomposition – move generation and evaluation – position evaluation MCTS for Computer Go 35
  • 36. Position evaluation MCTS for Computer Go 36
  • 37. Drawbacks (1/2) ● The break-down is not unique ● Performing a (wrong) local tree search on a (possibly irrelevant) local position ● Misevaluating the size of the local position ● Different kinds of local information – Symbolic (group: dead alive unstable) – Numerical (territory size, reduction, increase) MCTS for Computer Go 37
  • 38. Drawbacks (2/2) ● Local positions interact ● Complicated ● Domain-dependent knowledge ● Need of human expertise ● Difficult to program and maintain ● Holes of knowledge ● Erratic behaviour MCTS for Computer Go 38
  • 39. Upsides ● Feasible on 1990's computers ● Execution is fast ● Some specific local tree searches are accurate and fast MCTS for Computer Go 39
  • 40. The old approach Pro ranking Amateur ranking Top professional players 9 dan 9 dan Very strong players 1 dan 6 dan 1 dan Strong players 1 kyu Average playersl 10 kyu Beginners 20 kyu Old approach Very beginners 30 kyu MCTS for Computer Go 40
  • 41. End of part one! ● Next: the Monte-Carlo approach... MCTS for Computer Go 41
  • 42. The Monte-Carlo (MC) approach ● Games containing chance – Backgammon (Tesauro 1989) ● Games with hidden information – Bridge (Ginsberg 2001) – Poker (Billings & al. 2002) – Scrabble (Sheppard 2002) MCTS for Computer Go 42
  • 43. The Monte-Carlo approach ● Games with complete information – A general model (Abramson 1990) ● Simulated annealing Go – (Brügmann 1993) – 2 sequences of moves – « all moves as first » heuristic – Gobble on 9x9 MCTS for Computer Go 43
  • 44. The Monte-Carlo approach ● Position evaluation: Launch N random games Evaluation = mean value of outcomes ● Depth-one MC algorithm: For each move m { Play m on the ref position Launch N random games Move value (m) = mean value } MCTS for Computer Go 44
  • 45. Depth-one Monte-Carlo ... ... ... ... MCTS for Computer Go 45
  • 46. Progressive pruning ● (Billings 2002, Sheppard 2002, Bouzy & Helmstetter 2003) Second best Current best move Still explored Pruned MCTS for Computer Go 46
  • 47. Upper bound ● Optimism in face of uncertainty – Intestim (Kaelbling 1993), – UCB multi-armed bandit (Auer & al 2002) Second best promising Current best proven move Current best promising move MCTS for Computer Go 47
  • 48. All-moves-as-first heuristic (1/3) B C r A D A B C D MCTS for Computer Go 48
  • 49. All-moves-as-first heuristic (2/3) B C r A D A A B C D B C Actual simulation D B C o A D MCTS for Computer Go 49
  • 50. All-moves-as-first heuristic (3/3) B C r A D A C A B C D B B C Actual A Virtual simulation = simulation actual simulation assuming c is played D D B « as first » C o A D o MCTS for Computer Go 50
  • 51. The Monte-Carlo approach ● Upsides – Robust evaluation – Global search – Move quality increases with computing power ● Way of playing – Good strategical sense but weak tactically ● Easy to program – Follow the rules of the game – No break-down problem MCTS for Computer Go 51
  • 52. Monte-Carlo and knowledge ● Pseudo-random simulations using Go knowledge (Bouzy 2003) – Moves played with a probability depending on specific domain-dependent knowledge ● 2 basic concepts – string capture 3x3 shapes MCTS for Computer Go 52
  • 53. Monte-Carlo and knowledge ● Results are impressive – MC(random) << MC(pseudo random) – Size 9x9 13x13 19x19 – % wins 68 93 98 ● Other works on simulations – Patterns in MoGo, proximity rule (Wang & al 2006) – Simulation balancing (Silver & Tesauro 2009) MCTS for Computer Go 53
  • 54. Monte-Carlo and knowledge ● Pseudo-random player – 3x3 pattern urgency table with 38 patterns – Few dizains of relevant patterns only – Patterns gathered by ● Human expertise ● Reinforcement Learning (Bouzy & Chaslot 2006) ● Warning – p1 better than p2 does not mean MC(p1) better than MC(p2) MCTS for Computer Go 54
  • 55. Monte-Carlo Tree Search (MCTS) ● How to integrate MC and TS ? ● UCT = UCB for Trees – (Kocsis & Szepesvari 2006) – Superposition of UCB (Auer & al 2002) ● MCTS – Selection, expansion, updating (Chaslot & al) (Coulom 2006) – Simulation (Bouzy 2003) (Wang & Gelly 2006) MCTS for Computer Go 55
  • 56. MCTS (1/2) while (hasTime) { playOutTreeBasedGame() expandTree() outcome = playOutRandomGame() updateNodes(outcome) } then choose the node with... ... the best mean value ... the highest visit number MCTS for Computer Go 56
  • 57. MCTS (2/2) PlayOutTreeBasedGame() { node = getNode(position) while (node) { move=selectMove(node) play(move) node = getNode(position) } } MCTS for Computer Go 57
  • 58. UCT move selection ● Move selection rule to browse the tree: move=argmax (s*mean + C*sqrt(log(t)/n) ● Mean value for exploitation – s (=+-1): color to move ● UCT bias for exploration – C: constant term set up by experiments – t: number of visits of the parent node – n: number of visits of the current node MCTS for Computer Go 58
  • 59. Example ● 1 iteration 1/1 1 MCTS for Computer Go 59
  • 60. Example ● 2 iterations 1/2 0/1 0 MCTS for Computer Go 60
  • 61. Example ● 3 iterations 2/3 1/1 0/1 1 MCTS for Computer Go 61
  • 62. Example 2/4 ● 4 iterations 0/1 1/1 0/1 MCTS for Computer Go 62
  • 63. Example 3/5 ● 5 iterations 0/1 1/1 0/1 1/1 MCTS for Computer Go 63
  • 64. Example ● 6 iterations 3/6 0/1 1/2 0/1 1/1 0/1 MCTS for Computer Go 64
  • 65. Example ● 7 iterations 3/7 0/1 1/2 0/1 1/2 0/1 0/1 MCTS for Computer Go 65
  • 66. Example ● 8 iterations 4/8 0/1 2/3 0/1 1/2 1/1 0/1 0/1 MCTS for Computer Go 66
  • 67. Example ● 9 iterations 4/9 0/1 2/4 0/1 1/2 0/1 1/1 0/1 0/1 MCTS for Computer Go 67
  • 68. Example ● 10 iterations 5/10 0/1 2/4 0/1 2/3 0/1 1/1 0/1 1/1 0/1 MCTS for Computer Go 68
  • 69. Example ● 11 iterations 6/11 0/1 2/4 0/1 3/4 0/1 1/1 0/1 1/1 0/1 1/1 MCTS for Computer Go 69
  • 70. Example ● 12 iterations 7/12 0/1 2/4 0/1 4/5 0/1 1/1 0/1 1/1 1/2 1/1 1/1 MCTS for Computer Go 70
  • 71. Example ● Clarity – C=0 ● Notice – with C != 0 a node cannot stay unvisited – min or max rule according to the node depth – not visited children have an infinite mean ● Practice – Mean initialized optimistically MCTS for Computer Go 71
  • 72. MCTS enhancements ● The raw version can be enhanced – Tuning UCT C value – Outcome = score or win loss info (+1/-1) – Doubling the simulation number – RAVE – Using Go knowledge ● In the tree or in the simulations – Speed-up ● Optimizing, pondering, parallelizing MCTS for Computer Go 72
  • 73. Assessing an enhancement ● Self-play – The new version vs the reference version – % wins with few hundred games – 9x9 (or 19x19 boards) ● Against differently designed programs – GTP (Go Text Protocol) – CGOS (Computer Go Operating System) ● Competitions MCTS for Computer Go 73
  • 74. Move selection formula tuning ● Using UCB – Best value for C ? – 60-40% ● Using « UCB-tuned » (Auer & al 2002) – C replaced by min(1/4,variance) – 55-45% MCTS for Computer Go 74
  • 75. Exploration vs exploitation ● General idea: explore at the beginning and exploit in the end of thinking time ● Diminishing C linearly in the remaining time – (Vermorel & al 2005) – 55-45% ● At the end: – Argmax over the mean value or over the number of visits ? – 55-45% MCTS for Computer Go 75
  • 76. Kind of outcome ● 2 kinds of outcomes – Score (S) or win loss information (WLI) ? – Probability of winning or expected score ? – Combining both (S+WLI) (score +45 if win) ● Results – WLI vs S 65-35% – S+WLI vs S 65-35% MCTS for Computer Go 76
  • 77. Doubling the number of simulations ● N = 100,000 ● Results – 2N vs N 60-40% – 4N vs 2N 58-42% MCTS for Computer Go 77
  • 78. Tree management ● Transposition tables – Tree -> Directed Acyclic Graph (DAG) – Different sequences of moves may lead to the same position – Interest for MC Go: merge the results – Result: 60-40% ● Keeping the tree from one move to the next – Result: 65-35% MCTS for Computer Go 78
  • 79. RAVE (1/3) ● Rapid Action Value Estimation – Mogo 2007 – Use the AMAF heuristic (Brugmann 1993) – There are « many » virtual sequences that are transposed from the actually played sequence ● Result: – 70-30% MCTS for Computer Go 79
  • 80. RAVE (2/3) ● AMAF heuristic ● Which nodes to update? A B ● Actual C C D D – Sequence ACBD – Nodes B A A B ● Virtual – BCAD, ADBC, BDAC D C – Nodes MCTS for Computer Go 80
  • 81. RAVE (3/3) ● 3 variables – Usual mean value Mu – AMAF mean value Mamaf – M = β Mamaf + (1-β) Mu – β = sqrt(k/(k+3N)) – K set up experimentally ● M varies from Mamaf to Mu MCTS for Computer Go 81
  • 82. Knowledge in the simulations ● High urgency for... – capture/escape 55-45% – 3x3 patterns 60-40% – Proximity rule 60-40% ● Mercy rule – Interrupt the game when the difference of captured stones is greater than a threshold (Hillis 2006) – 51-49% MCTS for Computer Go 82
  • 83. Knowledge in the tree ● Virtual wins for good looking moves ● Automatic acquisition of patterns of pro games (Coulom 2007) (Bouzy & Chaslot 2005) ● Matching has a high cost ● Progressive widening (Chaslot & al 2008) ● Interesting under strong time constraints ● Result: 60-40% MCTS for Computer Go 83
  • 84. Speeding up the simulations ● Fully random simulations (2007) – 50,000 game/second (Lew 2006) – 20,000 (commonly eared) – 10,000 (my program) ● Pseudo-random – 5,000 (my program in 2007) ● Rough optimization is worthwhile MCTS for Computer Go 84
  • 85. Pondering ● Think on the opponent time – 55-45% – Possible doubling of thinking time – The move of the opponent may not be the planned move on which you think – Side effect: play quickly to think on the opponent time MCTS for Computer Go 85
  • 86. Summing up the enhancements ● MCTS with all enhancements vs raw MCTS – Exploration and exploitation: 60-40% – Win/loss outcome: 65-35% – Rough optimization of simulations 60-40% – Transposition table 60-40% – RAVE 70-30% – Knowledge in the simulations 70-30% – Knowledge in the tree 60-40% – Pondering 55-45% – Parallelization 70-30% ● Result: 99-1% MCTS for Computer Go 86
  • 87. Parallelization ● Computer Chess: Deep Blue ● Multi-core computer – Symmetric MultiProcessor (SMP) – one thread per processor – shared memory, low latency – mutual exclusion (mutex) mechanism ● Cluster of computers – Message Passing Information (MPI) MCTS for Computer Go 87
  • 88. Parallelization while (hasTime) { playOutTreeBasedGame() expandTree() outcome = playOutRandomGame() updateNodes(outcome) } MCTS for Computer Go 88
  • 89. Leaf parallelization MCTS for Computer Go 89
  • 90. Leaf parallelization ● (Cazenave Jouandeau 2007) ● Easy to program ● Drawbacks – Wait for the longest simulation – When part of the simulation outcomes is a loss, performing the remaining may not be a relevant strategy. MCTS for Computer Go 90
  • 91. Root parallelization MCTS for Computer Go 91
  • 92. Root parallelization ● (Cazenave Jouandeau 2007) ● Easy to program ● No communication ● At completion, merge the trees ● 4 MCTS for 1sec > 1 MCTS for 4 sec ● Good way for low time settings and a small number of threads MCTS for Computer Go 92
  • 93. Tree parallelization – global mutex MCTS for Computer Go 93
  • 94. Tree parallelization – local mutex MCTS for Computer Go 94
  • 95. Tree parallelization ● One shared tree, several threads ● Mutex – Global: the whole tree has a mutex – Local: each node has a mutex ● « Virtual loss » – Given to a node browsed by a thread – Removed at update stage – Preventing threads from similar simulations MCTS for Computer Go 95
  • 96. Computer-computer results ● Computer Olympiads 19x19 9x9 – 2010 Erica, Zen, MFGo MyGoFriend – 2009 Zen, Fuego, Mogo Fuego – 2008 MFGo, Mogo, Leela MFGo – 2007 Mogo, CrazyStone, GNU Go Steenvreter – 2006 GNU Go, Go Intellect, Indigo CrazyStone – 2005 Handtalk, Go Intellect, Aya Go Intellect – 2004 Go Intellect, MFGo, Indigo Go Intellect MCTS for Computer Go 96
  • 97. Human-computer results ● 9x9 – 2009: Mogo won a pro with black – 2009: Fuego won a pro with white ● 19x19: – 2008: Mogo won a pro with 9 stones Crazy Stone won a pro with 8 stones Crazy Stone won a pro with 7 stones – 2009: Mogo won a pro with 6 stones MCTS for Computer Go 97
  • 98. MCTS and the old approach Pro ranking Amateur ranking Top professional players 9 dan 9 dan 9x9 go Very strong players 1 dan 6 dan 1 dan 19x19 go Strong players 1 kyu Average players 10 kyu MCTS Beginners 20 kyu Old Very beginners 30 kyu approach MCTS for Computer Go 98
  • 99. Computer Go (MC history) ● Monte-Carlo Go (Brugmann 1993) ● MCGo devel. (Bouzy & Helmstetter 2003) ● MC+knowledge (Bouzy 2003) ● UCT (Kocsis & Szepesvari 2006) ● Crazy Stone (Coulom 2006) ● Mogo (Wang & Gelly 2006) MCTS for Computer Go 99
  • 100. Conclusion ● Monte-Carlo brought a Big improvement in Computer Go over the last decade! – No old approach based program anymore! – All go programs are MCTS based! – Professional level on 9x9! – Dan level on 19x19! ● Unbelievable 10 years ago! MCTS for Computer Go 100
  • 101. Some references ● PhD, MCTS and Go (Chaslot 2010) ● PhD, Reinf. Learning and Go (Silver 2010) ● PhD, R. Learning: applic. to Go (Gelly 2007) ● UCT (Kocsis & Szepesvari 2006) ● 1st MCTS go program (Coulom 2006) MCTS for Computer Go 101
  • 102. Web links ● http://www.grappa.univ-lille3.fr/icga/ ● http://cgos.boardspace.net/ ● http://www.gokgs.com/ ● http://www.lri.fr/~gelly/MoGo.htm ● http://remi.coulom.free.fr/CrazyStone/ ● http://fuego.sourceforge.net/ ● ... MCTS for Computer Go 102
  • 103. Thank you for your attention! MCTS for Computer Go 103