Monte-Carlo Tree Search (MCTS)
       for Computer Go

            Bruno Bouzy
    bruno.bouzy@parisdescartes.fr
      Université Paris Descartes

          Séminaire BigMC
            5 mai 2011
Outline

●   The game of Go: a 9x9 game
●   The « old » approach (*-2002)
●   The Monte-Carlo approach (2002-2005)
●   The MCTS approach (2006-today)
●   Conclusion


                   MCTS for Computer Go    2
The game of Go




   MCTS for Computer Go   3
The game of Go

●   4000 years
●   Originated from China
●   Developed in Japan (20th century)
●   Best players in Korea, Japan, China
●   19x19: official board size
●   9x9: beginners' board size
                    MCTS for Computer Go   4
A 9x9 game
●   The board has 81 « intersections ». Initially, it
    is empty.




                      MCTS for Computer Go             5
A 9x9 game
●   Black moves first. A « stone » is played on
    an intersection.




                   MCTS for Computer Go           6
A 9x9 game
●   White moves second.




                 MCTS for Computer Go   7
A 9x9 game
●   Moves alternate between Black and White.




                  MCTS for Computer Go         8
A 9x9 game
●   Two adjacent stones of the same color
    build a « string » with « liberties ».
●   4-adjacency




                    MCTS for Computer Go      9
A 9x9 game
●   Strings are created.




                   MCTS for Computer Go   10
A 9x9 game
●   A white stone is in « atari » (one liberty).




                     MCTS for Computer Go          11
A 9x9 game
●   The white string has five liberties.




                    MCTS for Computer Go   12
A 9x9 game
●   The black stone is in « atari ».




                    MCTS for Computer Go   13
A 9x9 game
●   White « captures » the black stone.




                   MCTS for Computer Go   14
A 9x9 game
●   For human players, the game is over.

       –   Huh?

       –   Why?




                    MCTS for Computer Go   15
A 9x9 game
●   What happens if White contests black
    « territory »?




                   MCTS for Computer Go    16
A 9x9 game
●   White has invaded. Two strings are atari!




                   MCTS for Computer Go         17
A 9x9 game
●   Black captures !




                   MCTS for Computer Go   18
A 9x9 game
●   White insists but its string is atari...




                      MCTS for Computer Go     19
A 9x9 game
●   Black has proved its « territory ».




                    MCTS for Computer Go   20
A 9x9 game
●   Black may contest white territory too.




                    MCTS for Computer Go     21
A 9x9 terminal position
●   The game is over for computers.

       –   Huh?

       –   Who won?




                       MCTS for Computer Go   22
A 9x9 game
●   The game ends when both players pass.
●   One black (resp. white) point for each black
    (resp. white) stone and each black (resp.
    white) « eye » on the board.
●   One black (resp. white) eye = an empty
    intersection surrounded by black (resp.
    white) stones.

                   MCTS for Computer Go            23
A 9x9 game
●   Scoring:
       –   Black = 44
       –   White = 37
       –   Komi = 7.5
       –   Score = 44 - (37 + 7.5) = -0.5
●   White wins!


                        MCTS for Computer Go   24
Go ranking: « kyu » and « dan »
                              Pro ranking        Amateur ranking
  Top professional players       9 dan                9 dan
  Very strong players            1 dan                6 dan
  Strong players                                      1 dan - 1 kyu
  Average players                                     10 kyu
  Beginners                                           20 kyu
  Complete beginners                                  30 kyu

                             MCTS for Computer Go                         25
Computer Go (old history)
●   First go program (Lefkovitz 1960)
●   Zobrist hashing (Zobrist 1969)
●   Interim2 (Wilcox 1979)
●   Life and death model (Benson 1988)
●   Patterns: Goliath (Boon 1990)
●   Mathematical Go (Berlekamp 1991)
●   Handtalk (Chen 1995)
                    MCTS for Computer Go   26
The old approach
●   Evaluation of non terminal positions
        –   Knowledge-based
        –   Breaking-down of a position into sub-
             positions
●   Fixed-depth global tree search
        –   Depth = 0 : action with the best value
        –   Depth = 1: action leading to the position
             with the best evaluation
        –   Depth > 1: alpha-beta or minimax
                       MCTS for Computer Go             27
The old approach
[Diagram] Bounded-depth tree search (depth 2 or 3) from the current
position: non-terminal leaf positions are evaluated by the knowledge-based
evaluation function. With a branching factor of up to 361, terminal
positions lie far beyond the search horizon.

                        MCTS for Computer Go                              28
Position evaluation
●   Break-down
       –   Whole game (win/loss or score)
       –   Goal-oriented sub-game
               ●   String capture
               ●   Connections, dividers, eyes, life and death
●   Local searches
       –   Alpha-beta and enhancements
       –   Proof-number search
                         MCTS for Computer Go                    29
A 19x19 middle-game position




          MCTS for Computer Go   30
A possible black break-down




          MCTS for Computer Go   31
A possible white break-down




          MCTS for Computer Go   32
Possible local evaluations (1)
[Diagram] Local evaluations of the sub-positions: groups marked alive
(one of them alive with territory), dead, unstable, or not important.
                           MCTS for Computer Go               33
Possible local evaluations (2)
[Diagram] Further local evaluations: groups marked alive, alive with a big
territory, or unstable.
                   MCTS for Computer Go                34
Position evaluation
●   Local results
        –   Obtained with local tree search
        –   Result if white plays first (resp. black)
        –   Combinatorial game theory (Conway)
        –   Switches {a|b}, >, <, *, 0
●   Global recomposition
        –   move generation and evaluation
        –   position evaluation
                        MCTS for Computer Go            35
Position evaluation




     MCTS for Computer Go   36
Drawbacks (1/2)
●   The break-down is not unique
●   Performing a (wrong) local tree search on a
    (possibly irrelevant) local position
●   Misevaluating the size of the local position
●   Different kinds of local information
        –   Symbolic (group status: dead, alive, unstable)
        –   Numerical (territory size, reduction,
             increase)
                        MCTS for Computer Go        37
Drawbacks (2/2)
●   Local positions interact
●   Complicated
●   Domain-dependent knowledge
●   Needs human expertise
●   Difficult to program and maintain
●   Gaps in the knowledge
●   Erratic behaviour
                    MCTS for Computer Go   38
Upsides
●   Feasible on 1990's computers
●   Execution is fast

●   Some specific local tree searches are
    accurate and fast



                    MCTS for Computer Go    39
The old approach
                              Pro ranking        Amateur ranking
  Top professional players       9 dan                9 dan
  Very strong players            1 dan                6 dan
  Strong players                                      1 dan - 1 kyu
  Average players                                     10 kyu
  Beginners                                           20 kyu
  Complete beginners                                  30 kyu

  Old approach: programs sit near the bottom of the amateur scale,
  around the beginner levels (roughly 20-30 kyu).
                             MCTS for Computer Go                         40
End of part one!
●   Next: the Monte-Carlo approach...




                   MCTS for Computer Go   41
The Monte-Carlo (MC) approach
●   Games containing chance
       –   Backgammon (Tesauro 1989)

●   Games with hidden information
       –   Bridge (Ginsberg 2001)
       –   Poker (Billings & al. 2002)
       –   Scrabble (Sheppard 2002)

                       MCTS for Computer Go   42
The Monte-Carlo approach
●   Games with complete information
       –   A general model (Abramson 1990)

●   Simulated annealing Go
       –   (Brügmann 1993)
       –   2 sequences of moves
       –   « all moves as first » heuristic
       –   Gobble on 9x9
                       MCTS for Computer Go   43
The Monte-Carlo approach
●   Position evaluation:
         Launch N random games
         Evaluation = mean value of outcomes
●   Depth-one MC algorithm:
         For each move m {
               Play m on the ref position
               Launch N random games
               Move value (m) = mean value
         }
                     MCTS for Computer Go      44
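A minimal Python sketch of this depth-one scheme; legal_moves, play and
random_playout are placeholders for the program's own primitives (they are
not defined in the slides):

def depth_one_mc(position, legal_moves, play, random_playout, n_games=1000):
    """Evaluate every move by the mean outcome of n_games random games."""
    values = {}
    for m in legal_moves(position):
        after = play(position, m)              # play m on the reference position
        outcomes = [random_playout(after) for _ in range(n_games)]
        values[m] = sum(outcomes) / n_games    # move value = mean outcome
    best = max(values, key=values.get)         # move with the best mean value
    return best, values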
Depth-one Monte-Carlo

[Diagram] Depth-one Monte-Carlo: from the current position, each candidate
move leads to a child position, and each child is evaluated by the mean
outcome of N random games.
       MCTS for Computer Go          45
Progressive pruning
●   (Billings 2002, Sheppard 2002, Bouzy &
    Helmstetter 2003)

[Diagram] Move value estimates with confidence intervals: the current best
move, the second best, moves still explored, and moves already pruned.

                   MCTS for Computer Go                    46
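As a rough sketch (the exact statistical test used by the cited programs may
differ): after each batch of playouts, discard every move whose upper bound no
longer reaches the lower bound of the current best move. The helper below
assumes per-move statistics (mean, standard deviation, number of playouts):

import math

def progressive_pruning(stats, z=2.0):
    """stats: {move: (mean, std_dev, n_playouts)} -> list of moves kept."""
    def bound(move, sign):
        mean, sd, n = stats[move]
        return mean + sign * z * sd / math.sqrt(n)   # mean +/- z standard errors
    best = max(stats, key=lambda m: stats[m][0])     # current best mean
    best_lower = bound(best, -1)
    return [m for m in stats if bound(m, +1) >= best_lower]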
Upper bound
●   Optimism in face of uncertainty
       –   Intestim (Kaelbling 1993),
       –   UCB multi-armed bandit (Auer & al 2002)
[Diagram] Move value estimates with upper bounds: the current best proven
move, the current best promising move, and the second best promising move.



                      MCTS for Computer Go                   47
All-moves-as-first heuristic (1/3)
[Diagram] A position r with four empty intersections A, B, C, D, and the
corresponding depth-one tree with one child node per move.




               MCTS for Computer Go                   48
All-moves-as-first heuristic (2/3)
[Diagram] An actual simulation from r: the sequence A, B, C, D is played
down to a terminal position o, and its outcome updates the child node A.

                             MCTS for Computer Go                   49
All-moves-as-first heuristic (3/3)
[Diagram] The same playout reused as a virtual simulation: assuming C had
been played « as first », the identical outcome also updates the child node C.

                               MCTS for Computer Go                           50
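A small sketch of the all-moves-as-first bookkeeping, assuming a playout is
recorded as a list of (player, move) pairs starting from the evaluated
position; the outcome is credited to every move of the player to move, as if
each had been played first:

def amaf_update(playout, outcome, amaf_stats, player_to_move):
    """playout: list of (player, move) pairs; amaf_stats: {move: (wins, visits)}."""
    seen = set()
    for player, move in playout:
        if player != player_to_move or move in seen:
            continue                               # credit each of our moves once
        seen.add(move)
        wins, visits = amaf_stats.get(move, (0, 0))
        amaf_stats[move] = (wins + outcome, visits + 1)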
The Monte-Carlo approach
●   Upsides
       –   Robust evaluation
       –   Global search
       –   Move quality increases with computing power
●   Way of playing
        –   Good strategic sense but tactically weak
●   Easy to program
       –   Follow the rules of the game
       –   No break-down problem
                       MCTS for Computer Go             51
Monte-Carlo and knowledge
●   Pseudo-random simulations using Go
    knowledge (Bouzy 2003)
       –   Moves played with a probability depending on
            specific domain-dependent knowledge
●   2 basic concepts
       –   string capture
       –   3x3 shapes




                        MCTS for Computer Go            52
Monte-Carlo and knowledge
●   Results are impressive
       –   MC(random) << MC(pseudo random)
       –   Size      9x9               13x13   19x19
       –   % wins    68                93      98
●   Other works on simulations
       –   Patterns in MoGo, proximity rule (Wang & al
            2006)
       –   Simulation balancing (Silver & Tesauro 2009)
                       MCTS for Computer Go              53
Monte-Carlo and knowledge
●   Pseudo-random player
       –   3x3 pattern urgency table with 3^8 patterns
       –   Only a few dozen of them are actually relevant
       –   Patterns gathered by
               ●   Human expertise
               ●   Reinforcement Learning (Bouzy & Chaslot
                    2006)
●   Warning
       –   p1 better than p2 does not mean MC(p1)
            better than MC(p2)
                          MCTS for Computer Go               54
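The general idea of such a pseudo-random policy can be sketched as follows;
legal_moves, capture_or_escape_moves and pattern_urgency are placeholders for
the program's own knowledge, and the weights are illustrative, not the values
of any particular program:

import random

def pseudo_random_move(position, legal_moves, capture_or_escape_moves,
                       pattern_urgency, default_urgency=1.0):
    """Draw a move with probability proportional to its urgency."""
    moves = legal_moves(position)
    urgent = set(capture_or_escape_moves(position))   # string capture / escape
    weights = []
    for m in moves:
        if m in urgent:
            weights.append(20.0 * default_urgency)    # illustrative weight
        else:
            weights.append(pattern_urgency(position, m)   # 3x3 shape urgency
                           or default_urgency)
    return random.choices(moves, weights=weights, k=1)[0]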
Monte-Carlo Tree Search (MCTS)
●   How to integrate MC and TS ?
●   UCT = UCB for Trees
       –   (Kocsis & Szepesvari 2006)
       –   Superposition of UCB (Auer & al 2002)
●   MCTS
       –   Selection, expansion, updating (Chaslot & al)
            (Coulom 2006)
       –   Simulation (Bouzy 2003) (Wang & Gelly
             2006)
                       MCTS for Computer Go            55
MCTS (1/2)
while (hasTime) {
     playOutTreeBasedGame()
     expandTree()
     outcome = playOutRandomGame()
     updateNodes(outcome)
}
then choose the node with...
     ... the best mean value
     ... the highest visit number
                  MCTS for Computer Go   56
MCTS (2/2)
playOutTreeBasedGame() {
    node = getNode(position)
    while (node) {
         move=selectMove(node)
         play(move)
         node = getNode(position)
    }
}

              MCTS for Computer Go   57
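Putting the two slides together, the loop looks roughly like the sketch below.
get_node, expand, select_move, play, random_playout and update stand for the
program's own routines and are not defined in the slides; this is a schematic
version, not the exact code of any engine.

import time

def mcts(root_position, get_node, expand, select_move, play,
         random_playout, update, time_budget=1.0):
    """One MCTS decision: descend the tree, expand, play out, update."""
    deadline = time.time() + time_budget
    while time.time() < deadline:                     # while (hasTime)
        position = root_position.copy()
        path = []
        node = get_node(position)
        while node is not None:                       # playOutTreeBasedGame()
            move = select_move(node)
            play(position, move)
            path.append((node, move))
            node = get_node(position)
        expand(position)                              # expandTree()
        outcome = random_playout(position)            # playOutRandomGame()
        for node, move in path:                       # updateNodes(outcome)
            update(node, move, outcome)
    # then choose the child with the best mean value or the highest visit count
    root = get_node(root_position)
    return max(root.children, key=lambda child: child.visits)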
UCT move selection
●   Move selection rule to browse the tree:
            move = argmax ( s*mean + C*sqrt(log(t)/n) )
●   Mean value for exploitation
        –   s (=+-1): color to move
●   UCT bias for exploration
        –   C: constant term set up by experiments
        –   t: number of visits of the parent node
        –   n: number of visits of the current node
                         MCTS for Computer Go         58
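Written out as code, the rule of the slide could look like this sketch; the
child statistics and the sign convention s for the color to move are
assumptions about the data structure, and C is tuned experimentally:

import math

def uct_select(node, C=0.5):
    """argmax over the children of s*mean + C*sqrt(log(t)/n)."""
    t = node.visits                                   # visits of the parent node
    s = 1 if node.black_to_move else -1               # s = +/-1: color to move
    def uct_value(child):
        if child.visits == 0:
            return float("inf")                       # unvisited children first
        mean = child.value_sum / child.visits         # mean outcome (exploitation)
        return s * mean + C * math.sqrt(math.log(t) / child.visits)
    return max(node.children, key=uct_value)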
Example
●   Iterations 1 to 12 (tree diagrams): each iteration adds one playout and
    one new node, and the win/visit counters of the nodes on the path are
    updated. The root statistics after successive iterations are 1/1, 1/2,
    2/3, 2/4, 3/5, 3/6, 3/7, 4/8, 4/9, 5/10, 6/11 and 7/12, i.e. the playout
    outcomes are 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1.
                         MCTS for Computer Go                        59-70
Example
●   Clarity
        –   C=0
●   Notice
        –   with C != 0 a node cannot stay unvisited
        –   min or max rule according to the node depth
        –   unvisited children have an infinite mean
●   Practice
        –   Mean initialized optimistically
                        MCTS for Computer Go             71
MCTS enhancements
●   The raw version can be enhanced
       –   Tuning UCT C value
       –   Outcome = score or win loss info (+1/-1)
       –   Doubling the simulation number
       –   RAVE
       –   Using Go knowledge
               ●   In the tree or in the simulations
       –   Speed-up
               ●   Optimizing, pondering, parallelizing
                          MCTS for Computer Go            72
Assessing an enhancement
●   Self-play
        –   The new version vs the reference version
        –   % wins over a few hundred games
        –   9x9 (or 19x19 boards)
●   Against differently designed programs
        –   GTP (Go Text Protocol)
        –   CGOS (Computer Go Operating System)
●   Competitions
                       MCTS for Computer Go            73
Move selection formula tuning
●   Using UCB
       –   Best value for C ?
       –   60-40%

●   Using « UCB-tuned » (Auer & al 2002)
       –   C replaced by min(1/4,variance)
       –   55-45%

                      MCTS for Computer Go   74
Exploration vs exploitation
●   General idea: explore at the beginning and
    exploit at the end of the thinking time
●   Decrease C linearly with the remaining time
        –   (Vermorel & al 2005)
        –   55-45%
●   At the end:
        –   Argmax over the mean value or over the
             number of visits ?
        –   55-45%
                        MCTS for Computer Go              75
Kind of outcome
●   2 kinds of outcomes
       –   Score (S) or win loss information (WLI) ?
       –   Probability of winning or expected score ?
       –   Combining both (S+WLI) (score +45 if win)
●   Results
       –   WLI vs S      65-35%
       –   S+WLI vs S 65-35%

                      MCTS for Computer Go              76
Doubling the number of
             simulations
●   N = 100,000

●   Results
       –   2N vs N       60-40%
       –   4N vs 2N      58-42%


                      MCTS for Computer Go   77
Tree management
●   Transposition tables
       –   Tree -> Directed Acyclic Graph (DAG)
       –   Different sequences of moves may lead to
            the same position
       –   Interest for MC Go: merge the results
       –   Result: 60-40%
●   Keeping the tree from one move to the next
       –   Result: 65-35%
                      MCTS for Computer Go            78
RAVE (1/3)
●   Rapid Action Value Estimation
       –   Mogo 2007
       –   Uses the AMAF heuristic (Brügmann 1993)
       –   There are « many » virtual sequences that
            are transposed from the actually played
            sequence
●   Result:
       –   70-30%
                       MCTS for Computer Go            79
RAVE (2/3)
●   AMAF heuristic
●   Which nodes to update?
●   Actual
        –   Sequence ACBD
        –   [Diagram] the nodes along the played path are updated
●   Virtual
        –   Transposed sequences: BCAD, ADBC, BDAC
        –   [Diagram] the corresponding nodes elsewhere in the tree also
             receive the outcome
                     MCTS for Computer Go                       80
RAVE (3/3)
●   3 variables
        –   Usual mean value Mu
        –   AMAF mean value Mamaf
        –   M = β Mamaf + (1-β) Mu
        –   β = sqrt(k/(k+3N))
        –   k set up experimentally
●   M varies from Mamaf (few visits) to Mu (many visits)
                       MCTS for Computer Go   81
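The blending rule of the slide, written directly as a function; the value of
k here is only a placeholder, since the slide says it is set up experimentally:

import math

def rave_value(mu, m_amaf, n_visits, k=1000.0):
    """M = beta*Mamaf + (1-beta)*Mu with beta = sqrt(k/(k+3N)).
    For small N, M is close to Mamaf; as N grows, beta -> 0 and M -> Mu."""
    beta = math.sqrt(k / (k + 3.0 * n_visits))
    return beta * m_amaf + (1.0 - beta) * mu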
Knowledge in the simulations
●   High urgency for...
        –   capture/escape                        55-45%
        –   3x3 patterns                          60-40%
        –   Proximity rule                        60-40%
●   Mercy rule
        –   Interrupt the game when the difference of
              captured stones is greater than a
              threshold (Hillis 2006)
        –   51-49%
                           MCTS for Computer Go              82
Knowledge in the tree

●   Virtual wins for good-looking moves
●   Automatic acquisition of patterns from pro
    games (Coulom 2007) (Bouzy & Chaslot 2005)
●   Matching has a high cost
●   Progressive widening (Chaslot & al 2008)
●   Interesting under strong time constraints
●   Result: 60-40%
                     MCTS for Computer Go        83
Speeding up the simulations
●   Fully random simulations (2007)
       –   50,000 games/second (Lew 2006)
       –   20,000 (commonly heard)
       –   10,000 (my program)
●   Pseudo-random
       –   5,000 (my program in 2007)
●   Rough optimization is worthwhile
                     MCTS for Computer Go   84
Pondering

●   Think during the opponent's time
       –   55-45%
       –   Possible doubling of the thinking time
       –   The opponent may not play the move you
            were thinking on
       –   Side effect: play quickly so as to think during
             the opponent's time

                     MCTS for Computer Go              85
Summing up the enhancements
●   MCTS with all enhancements vs raw MCTS
       –   Exploration and exploitation:            60-40%
       –   Win/loss outcome:                        65-35%
       –   Rough optimization of simulations        60-40%
       –   Transposition table                      60-40%
       –   RAVE                                     70-30%
       –   Knowledge in the simulations             70-30%
       –   Knowledge in the tree                    60-40%
       –   Pondering                                55-45%
       –   Parallelization                          70-30%

●   Result: 99-1%
                             MCTS for Computer Go            86
Parallelization
●   Computer Chess: Deep Blue
●   Multi-core computer
       –   Symmetric MultiProcessor (SMP)
       –   one thread per processor
       –   shared memory, low latency
       –   mutual exclusion (mutex) mechanism
●   Cluster of computers
       –   Message Passing Interface (MPI)
                     MCTS for Computer Go       87
Parallelization

while (hasTime) {
    playOutTreeBasedGame()
    expandTree()
    outcome = playOutRandomGame()
    updateNodes(outcome)
}


             MCTS for Computer Go   88
Leaf parallelization




      MCTS for Computer Go   89
Leaf parallelization
●   (Cazenave Jouandeau 2007)
●   Easy to program
●   Drawbacks
       –   Wait for the longest simulation
       –   When some of the simulation outcomes are
            already losses, running the remaining ones
            may not be worthwhile.

                      MCTS for Computer Go             90
Root parallelization




      MCTS for Computer Go   91
Root parallelization
●   (Cazenave Jouandeau 2007)
●   Easy to program
●   No communication
●   At completion, merge the trees
●   4 MCTS for 1sec > 1 MCTS for 4 sec
●   Works well with short time settings and a small
    number of threads
                   MCTS for Computer Go          92
Tree parallelization – global mutex




             MCTS for Computer Go   93
Tree parallelization – local mutex




             MCTS for Computer Go    94
Tree parallelization
●   One shared tree, several threads
●   Mutex
        –   Global: the whole tree has a mutex
        –   Local: each node has a mutex
●   « Virtual loss »
        –   Given to a node browsed by a thread
        –   Removed at update stage
        –   Prevents several threads from running similar simulations
                       MCTS for Computer Go               95
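One common way to implement the virtual loss with a local mutex per node is
sketched below (a minimal illustration with Python threading locks, not the
code of any particular engine):

import threading

class Node:
    def __init__(self):
        self.lock = threading.Lock()    # local mutex: one per node
        self.wins = 0
        self.visits = 0

def add_virtual_loss(node):
    """At selection time: count a visit with no win (a 'virtual loss'),
    so that other threads are steered away from the same path."""
    with node.lock:
        node.visits += 1

def backup(node, outcome):
    """At update time: add the real outcome; the visit was already counted
    at selection, which cancels the virtual loss."""
    with node.lock:
        node.wins += outcome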
Computer-computer results
●   Computer Olympiads
                        19x19                      9x9
       –   2010 Erica, Zen, MFGo                MyGoFriend
       –   2009 Zen, Fuego, Mogo                  Fuego
       –   2008 MFGo, Mogo, Leela                  MFGo
       –   2007 Mogo, CrazyStone, GNU Go        Steenvreter
       –   2006 GNU Go, Go Intellect, Indigo    CrazyStone
       –   2005 Handtalk, Go Intellect, Aya     Go Intellect
       –   2004 Go Intellect, MFGo, Indigo      Go Intellect
                         MCTS for Computer Go                  96
Human-computer results
●   9x9
           –   2009: Mogo beat a pro with black
           –   2009: Fuego beat a pro with white
●   19x19:
           –   2008: Mogo beat a pro with 9 handicap stones
                    Crazy Stone beat a pro with 8 stones
                    Crazy Stone beat a pro with 7 stones
           –   2009: Mogo beat a pro with 6 stones
                        MCTS for Computer Go             97
MCTS and the old approach
                              Pro ranking        Amateur ranking
  Top professional players       9 dan                9 dan
  Very strong players            1 dan                6 dan
  Strong players                                      1 dan - 1 kyu
  Average players                                     10 kyu
  Beginners                                           20 kyu
  Complete beginners                                  30 kyu

  MCTS: around professional level on 9x9 go, around amateur dan level
  on 19x19 go.
  Old approach: still around the beginner levels.
                            MCTS for Computer Go                               98
Computer Go (MC history)

●   Monte-Carlo Go (Brügmann 1993)
●   MC Go developments (Bouzy & Helmstetter 2003)
●   MC+knowledge (Bouzy 2003)
●   UCT (Kocsis & Szepesvari 2006)
●   Crazy Stone (Coulom 2006)
●   Mogo (Wang & Gelly 2006)
                  MCTS for Computer Go       99
Conclusion
●   Monte-Carlo brought a big
    improvement in Computer Go over the last
    decade!
       –   No old-approach-based programs anymore!
       –   All Go programs are MCTS-based!
       –   Professional level on 9x9!
       –   Dan level on 19x19!
●   Unbelievable 10 years ago!
                      MCTS for Computer Go          100
Some references

●   PhD, MCTS and Go (Chaslot 2010)
●   PhD, Reinf. Learning and Go (Silver 2010)
●   PhD, R. Learning: applic. to Go (Gelly 2007)
●   UCT (Kocsis & Szepesvari 2006)
●   1st MCTS Go program (Coulom 2006)


                    MCTS for Computer Go           101
Web links
●   http://www.grappa.univ-lille3.fr/icga/
●   http://cgos.boardspace.net/
●   http://www.gokgs.com/
●   http://www.lri.fr/~gelly/MoGo.htm
●   http://remi.coulom.free.fr/CrazyStone/
●   http://fuego.sourceforge.net/
●   ...
                    MCTS for Computer Go     102
Thank you for your attention!




          MCTS for Computer Go   103

Más contenido relacionado

La actualidad más candente

Machine learning Lecture 1
Machine learning Lecture 1Machine learning Lecture 1
Machine learning Lecture 1Srinivasan R
 
Njohuribazemarketing 100202085715-phpapp01
Njohuribazemarketing 100202085715-phpapp01Njohuribazemarketing 100202085715-phpapp01
Njohuribazemarketing 100202085715-phpapp01valmirejetishi
 
Game Playing in Artificial Intelligence
Game Playing in Artificial IntelligenceGame Playing in Artificial Intelligence
Game Playing in Artificial Intelligencelordmwesh
 
Adversarial search
Adversarial searchAdversarial search
Adversarial searchDheerendra k
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksFrancesco Collova'
 
Apprentissage Automatique et moteurs de recherche
Apprentissage Automatique et moteurs de rechercheApprentissage Automatique et moteurs de recherche
Apprentissage Automatique et moteurs de recherchePhilippe YONNET
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clusteringMegha Sharma
 
Unit3:Informed and Uninformed search
Unit3:Informed and Uninformed searchUnit3:Informed and Uninformed search
Unit3:Informed and Uninformed searchTekendra Nath Yogi
 
Perzantimi enver kutllovci mrh
Perzantimi enver kutllovci   mrhPerzantimi enver kutllovci   mrh
Perzantimi enver kutllovci mrhdrilon emini
 
Metoda Kërkimi Shkencor - Kapitulli 2 - Formulimi i problemit të kërkimit
Metoda Kërkimi Shkencor - Kapitulli 2 - Formulimi i problemit të kërkimitMetoda Kërkimi Shkencor - Kapitulli 2 - Formulimi i problemit të kërkimit
Metoda Kërkimi Shkencor - Kapitulli 2 - Formulimi i problemit të kërkimitSokol Luzi
 
sjellje organizative - berimi pytje
sjellje organizative - berimi pytjesjellje organizative - berimi pytje
sjellje organizative - berimi pytjedrilon emini
 
Pytje dhe pergjigjje nga lwnda analiza e tw dhwnave pwr hulumtime nw biznes (...
Pytje dhe pergjigjje nga lwnda analiza e tw dhwnave pwr hulumtime nw biznes (...Pytje dhe pergjigjje nga lwnda analiza e tw dhwnave pwr hulumtime nw biznes (...
Pytje dhe pergjigjje nga lwnda analiza e tw dhwnave pwr hulumtime nw biznes (...Fidan Haliti
 
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & BackpropagationArtificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & BackpropagationMohammed Bennamoun
 
MENAXHIMI I INOVACIONEVE - Dr. Besnik Krasniqi
MENAXHIMI I INOVACIONEVE - Dr. Besnik KrasniqiMENAXHIMI I INOVACIONEVE - Dr. Besnik Krasniqi
MENAXHIMI I INOVACIONEVE - Dr. Besnik Krasniqifatonbajrami1
 
Projekti per call center
Projekti per call centerProjekti per call center
Projekti per call centershar bakalli
 

La actualidad más candente (20)

Machine learning Lecture 1
Machine learning Lecture 1Machine learning Lecture 1
Machine learning Lecture 1
 
Njohuribazemarketing 100202085715-phpapp01
Njohuribazemarketing 100202085715-phpapp01Njohuribazemarketing 100202085715-phpapp01
Njohuribazemarketing 100202085715-phpapp01
 
Game Playing in Artificial Intelligence
Game Playing in Artificial IntelligenceGame Playing in Artificial Intelligence
Game Playing in Artificial Intelligence
 
Adversarial search
Adversarial searchAdversarial search
Adversarial search
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Decision tree
Decision treeDecision tree
Decision tree
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural Networks
 
Apprentissage Automatique et moteurs de recherche
Apprentissage Automatique et moteurs de rechercheApprentissage Automatique et moteurs de recherche
Apprentissage Automatique et moteurs de recherche
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clustering
 
Unit8: Uncertainty in AI
Unit8: Uncertainty in AIUnit8: Uncertainty in AI
Unit8: Uncertainty in AI
 
Unit3:Informed and Uninformed search
Unit3:Informed and Uninformed searchUnit3:Informed and Uninformed search
Unit3:Informed and Uninformed search
 
Lecture4 - Machine Learning
Lecture4 - Machine LearningLecture4 - Machine Learning
Lecture4 - Machine Learning
 
Perzantimi enver kutllovci mrh
Perzantimi enver kutllovci   mrhPerzantimi enver kutllovci   mrh
Perzantimi enver kutllovci mrh
 
Metoda Kërkimi Shkencor - Kapitulli 2 - Formulimi i problemit të kërkimit
Metoda Kërkimi Shkencor - Kapitulli 2 - Formulimi i problemit të kërkimitMetoda Kërkimi Shkencor - Kapitulli 2 - Formulimi i problemit të kërkimit
Metoda Kërkimi Shkencor - Kapitulli 2 - Formulimi i problemit të kërkimit
 
sjellje organizative - berimi pytje
sjellje organizative - berimi pytjesjellje organizative - berimi pytje
sjellje organizative - berimi pytje
 
Pytje dhe pergjigjje nga lwnda analiza e tw dhwnave pwr hulumtime nw biznes (...
Pytje dhe pergjigjje nga lwnda analiza e tw dhwnave pwr hulumtime nw biznes (...Pytje dhe pergjigjje nga lwnda analiza e tw dhwnave pwr hulumtime nw biznes (...
Pytje dhe pergjigjje nga lwnda analiza e tw dhwnave pwr hulumtime nw biznes (...
 
Lecture7 - IBk
Lecture7 - IBkLecture7 - IBk
Lecture7 - IBk
 
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & BackpropagationArtificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
Artificial Neural Networks Lect5: Multi-Layer Perceptron & Backpropagation
 
MENAXHIMI I INOVACIONEVE - Dr. Besnik Krasniqi
MENAXHIMI I INOVACIONEVE - Dr. Besnik KrasniqiMENAXHIMI I INOVACIONEVE - Dr. Besnik Krasniqi
MENAXHIMI I INOVACIONEVE - Dr. Besnik Krasniqi
 
Projekti per call center
Projekti per call centerProjekti per call center
Projekti per call center
 

Destacado

Mcts ai
Mcts aiMcts ai
Mcts aiftgaic
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?Tobias Pfeiffer
 
Alpha go 16110226_김영우
Alpha go 16110226_김영우Alpha go 16110226_김영우
Alpha go 16110226_김영우영우 김
 
Applying fuzzy control in fighting game ai
Applying fuzzy control in fighting game aiApplying fuzzy control in fighting game ai
Applying fuzzy control in fighting game aiftgaic
 
Monte Carlo Tree Search for the Super Mario Bros
Monte Carlo Tree Search for the Super Mario BrosMonte Carlo Tree Search for the Super Mario Bros
Monte Carlo Tree Search for the Super Mario BrosChih-Sheng Lin
 
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...Carlo Lancia
 
2016 Fighting Game Artificial Intelligence Competition
2016 Fighting Game Artificial Intelligence Competition2016 Fighting Game Artificial Intelligence Competition
2016 Fighting Game Artificial Intelligence Competitionftgaic
 
2013 Fighting Game Artificial Intelligence Competition
2013 Fighting Game Artificial Intelligence Competition2013 Fighting Game Artificial Intelligence Competition
2013 Fighting Game Artificial Intelligence Competitionftgaic
 
Machine learning Lecture 3
Machine learning Lecture 3Machine learning Lecture 3
Machine learning Lecture 3Srinivasan R
 
Bayesian statistics using r intro
Bayesian statistics using r   introBayesian statistics using r   intro
Bayesian statistics using r introBayesLaplace1
 
Mastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: PresentationMastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: PresentationKarel Ha
 
Algorithmic Game Theory
Algorithmic Game TheoryAlgorithmic Game Theory
Algorithmic Game TheoryKarel Ha
 
How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoTim Riser
 
An introduction to bayesian statistics
An introduction to bayesian statisticsAn introduction to bayesian statistics
An introduction to bayesian statisticsJohn Tyndall
 
Openstack에 컨트리뷰션 해보기
Openstack에 컨트리뷰션 해보기Openstack에 컨트리뷰션 해보기
Openstack에 컨트리뷰션 해보기영우 김
 
Introduction to Bayesian Methods
Introduction to Bayesian MethodsIntroduction to Bayesian Methods
Introduction to Bayesian MethodsCorey Chivers
 
Using Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical ApplicationsUsing Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical ApplicationsGreg Makowski
 
Key Expert Systems Concepts
Key Expert Systems ConceptsKey Expert Systems Concepts
Key Expert Systems ConceptsHarmony Kwawu
 

Destacado (20)

Mcts ai
Mcts aiMcts ai
Mcts ai
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?
 
Alpha go 16110226_김영우
Alpha go 16110226_김영우Alpha go 16110226_김영우
Alpha go 16110226_김영우
 
Applying fuzzy control in fighting game ai
Applying fuzzy control in fighting game aiApplying fuzzy control in fighting game ai
Applying fuzzy control in fighting game ai
 
Monte Carlo Tree Search for the Super Mario Bros
Monte Carlo Tree Search for the Super Mario BrosMonte Carlo Tree Search for the Super Mario Bros
Monte Carlo Tree Search for the Super Mario Bros
 
Monte carlo tree search
Monte carlo tree searchMonte carlo tree search
Monte carlo tree search
 
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
 
2016 Fighting Game Artificial Intelligence Competition
2016 Fighting Game Artificial Intelligence Competition2016 Fighting Game Artificial Intelligence Competition
2016 Fighting Game Artificial Intelligence Competition
 
2013 Fighting Game Artificial Intelligence Competition
2013 Fighting Game Artificial Intelligence Competition2013 Fighting Game Artificial Intelligence Competition
2013 Fighting Game Artificial Intelligence Competition
 
Machine learning Lecture 3
Machine learning Lecture 3Machine learning Lecture 3
Machine learning Lecture 3
 
Bayesian intro
Bayesian introBayesian intro
Bayesian intro
 
Bayesian statistics using r intro
Bayesian statistics using r   introBayesian statistics using r   intro
Bayesian statistics using r intro
 
Mastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: PresentationMastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: Presentation
 
Algorithmic Game Theory
Algorithmic Game TheoryAlgorithmic Game Theory
Algorithmic Game Theory
 
How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of Go
 
An introduction to bayesian statistics
An introduction to bayesian statisticsAn introduction to bayesian statistics
An introduction to bayesian statistics
 
Openstack에 컨트리뷰션 해보기
Openstack에 컨트리뷰션 해보기Openstack에 컨트리뷰션 해보기
Openstack에 컨트리뷰션 해보기
 
Introduction to Bayesian Methods
Introduction to Bayesian MethodsIntroduction to Bayesian Methods
Introduction to Bayesian Methods
 
Using Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical ApplicationsUsing Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical Applications
 
Key Expert Systems Concepts
Key Expert Systems ConceptsKey Expert Systems Concepts
Key Expert Systems Concepts
 

Más de BigMC

Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...BigMC
 
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...BigMC
 
Stability of adaptive random-walk Metropolis algorithms
Stability of adaptive random-walk Metropolis algorithmsStability of adaptive random-walk Metropolis algorithms
Stability of adaptive random-walk Metropolis algorithmsBigMC
 
Hedibert Lopes' talk at BigMC
Hedibert Lopes' talk at  BigMCHedibert Lopes' talk at  BigMC
Hedibert Lopes' talk at BigMCBigMC
 
Andreas Eberle
Andreas EberleAndreas Eberle
Andreas EberleBigMC
 
Olivier Féron's talk at BigMC March 2011
Olivier Féron's talk at BigMC March 2011Olivier Féron's talk at BigMC March 2011
Olivier Féron's talk at BigMC March 2011BigMC
 
Olivier Cappé's talk at BigMC March 2011
Olivier Cappé's talk at BigMC March 2011Olivier Cappé's talk at BigMC March 2011
Olivier Cappé's talk at BigMC March 2011BigMC
 
Estimation de copules, une approche bayésienne
Estimation de copules, une approche bayésienneEstimation de copules, une approche bayésienne
Estimation de copules, une approche bayésienneBigMC
 
Comparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsComparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsBigMC
 
Computation of the marginal likelihood
Computation of the marginal likelihoodComputation of the marginal likelihood
Computation of the marginal likelihoodBigMC
 
Learning spline-based curve models (Laure Amate)
Learning spline-based curve models (Laure Amate)Learning spline-based curve models (Laure Amate)
Learning spline-based curve models (Laure Amate)BigMC
 
Omiros' talk on the Bernoulli factory problem
Omiros' talk on the  Bernoulli factory problemOmiros' talk on the  Bernoulli factory problem
Omiros' talk on the Bernoulli factory problemBigMC
 

Más de BigMC (12)

Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
 
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
 
Stability of adaptive random-walk Metropolis algorithms
Stability of adaptive random-walk Metropolis algorithmsStability of adaptive random-walk Metropolis algorithms
Stability of adaptive random-walk Metropolis algorithms
 
Hedibert Lopes' talk at BigMC
Hedibert Lopes' talk at  BigMCHedibert Lopes' talk at  BigMC
Hedibert Lopes' talk at BigMC
 
Andreas Eberle
Andreas EberleAndreas Eberle
Andreas Eberle
 
Olivier Féron's talk at BigMC March 2011
Olivier Féron's talk at BigMC March 2011Olivier Féron's talk at BigMC March 2011
Olivier Féron's talk at BigMC March 2011
 
Olivier Cappé's talk at BigMC March 2011
Olivier Cappé's talk at BigMC March 2011Olivier Cappé's talk at BigMC March 2011
Olivier Cappé's talk at BigMC March 2011
 
Estimation de copules, une approche bayésienne
Estimation de copules, une approche bayésienneEstimation de copules, une approche bayésienne
Estimation de copules, une approche bayésienne
 
Comparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsComparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering models
 
Computation of the marginal likelihood
Computation of the marginal likelihoodComputation of the marginal likelihood
Computation of the marginal likelihood
 
Learning spline-based curve models (Laure Amate)
Learning spline-based curve models (Laure Amate)Learning spline-based curve models (Laure Amate)
Learning spline-based curve models (Laure Amate)
 
Omiros' talk on the Bernoulli factory problem
Omiros' talk on the  Bernoulli factory problemOmiros' talk on the  Bernoulli factory problem
Omiros' talk on the Bernoulli factory problem
 

"Monte-Carlo Tree Search for the game of Go"

  • 1. Monte-Carlo Tree Search (MCTS) for Computer Go Bruno Bouzy bruno.bouzy@parisdescartes.fr Université Paris Descartes Séminaire BigMC 5 mai 2011
  • 2. Outline ● The game of Go: a 9x9 game ● The « old » approach (*-2002) ● The Monte-Carlo approach (2002-2005) ● The MCTS approach (2006-today) ● Conclusion MCTS for Computer Go 2
  • 3. The game of Go MCTS for Computer Go 3
  • 4. The game of Go ● 4000 years ● Originated from China ● Developed by Japan (20th century) ● Best players in Korea, Japan, China ● 19x19: official board size ● 9x9: beginners' board size MCTS for Computer Go 4
  • 5. A 9x9 game ● The board has 81 « intersections ». Initially,it is empty. MCTS for Computer Go 5
  • 6. A 9x9 game ● Black moves first. A « stone » is played on an intersection. MCTS for Computer Go 6
  • 7. A 9x9 game ● White moves second. MCTS for Computer Go 7
  • 8. A 9x9 game ● Moves alternate between Black and White. MCTS for Computer Go 8
  • 9. A 9x9 game ● Two adjacent stones of the same color builds a « string » with « liberties ». ● 4-adjacency MCTS for Computer Go 9
  • 10. A 9x9 game ● Strings are created. MCTS for Computer Go 10
  • 11. A 9x9 game ● A white stone is in « atari » (one liberty). MCTS for Computer Go 11
  • 12. A 9x9 game ● The white string has five liberties. MCTS for Computer Go 12
  • 13. A 9x9 game ● The black stone is « atari ». MCTS for Computer Go 13
  • 14. A 9x9 game ● White « captures » the black stone. MCTS for Computer Go 14
  • 15. A 9x9 game ● For human players, the game is over. – Hu? – Why? MCTS for Computer Go 15
  • 16. A 9x9 game ● What happens if White contests black « territory »? MCTS for Computer Go 16
  • 17. A 9x9 game ● White has invaded. Two strings are atari! MCTS for Computer Go 17
  • 18. A 9x9 game ● Black captures ! MCTS for Computer Go 18
  • 19. A 9x9 game ● White insists but its string is atari... MCTS for Computer Go 19
  • 20. A 9x9 game ● Black has proved is « territory ». MCTS for Computer Go 20
  • 21. A 9x9 game ● Black may contest white territory too. MCTS for Computer Go 21
  • 22. A 9x9 terminal position ● The game is over for computers. – Hu? – Who won ? MCTS for Computer Go 22
  • 23. A 9x9 game ● The game ends when both players pass. ● One black (resp. white) point for each black (resp. white) stone and each black (resp. white) « eye » on the board. ● One black (resp. white) eye = an empty intersection surrounded by black (resp. white) stones. MCTS for Computer Go 23
  • 24. A 9x9 game ● Scoring: – Black = 44 – White = 37 – Komi = 7.5 – Score = -0.5 ● White wins! MCTS for Computer Go 24
  • 25. Go ranking: « kyu » and « dan » Pro ranking Amateur ranking Top professional players 9 dan 9 dan Very strong players 1 dan 6 dan 1 dan Strong players 1 kyu Average playersl 10 kyu Beginners 20 kyu Very beginners 30 kyu MCTS for Computer Go 25
  • 26. Computer Go (old history) ● First go program (Lefkovitz 1960) ● Zobrist hashing (Zobrist 1969) ● Interim2 (Wilcox 1979) ● Life and death model (Benson 1988) ● Patterns: Goliath (Boon 1990) ● Mathematical Go (Berlekamp 1991) ● Handtalk (Chen 1995) MCTS for Computer Go 26
  • 27. The old approach ● Evaluation of non terminal positions – Knowledge-based – Breaking-down of a position into sub- positions ● Fixed-depth global tree search – Depth = 0 : action with the best value – Depth = 1: action leading to the position with the best evaluation – Depth > 1: alfa-beta or minmax MCTS for Computer Go 27
  • 28. The old approach Current position 2 or 3 Bounded depth Tree search Evaluation of non terminal positions Huhu? 361 MCTS for Computer Go Terminal positions 28
  • 29. Position evaluation ● Break-down – Whole game (win/loss or score) – Goal-oriented sub-game ● String capture ● Connections, dividers, eyes, life and death ● Local searches – Alpha-beta and enhancements – Proof-number search MCTS for Computer Go 29
  • 30. A 19x19 middle-game position MCTS for Computer Go 30
  • 31. A possible black break-down MCTS for Computer Go 31
  • 32. A possible white break-down MCTS for Computer Go 32
  • 33. Possible local evaluations (1) unstable Alive and territory Not important alive dead alive MCTS for Computer Go 33
  • 34. Possible local evaluations (2) alive unstable unstable alive + big territory unstable MCTS for Computer Go 34
  • 35. Position evaluation ● Local results – Obtained with local tree search – Result if white plays first (resp. black) – Combinatorial game theory (Conway) – Switches {a|b}, >, <, *, 0 ● Global recomposition – move generation and evaluation – position evaluation MCTS for Computer Go 35
  • 36. Position evaluation MCTS for Computer Go 36
  • 37. Drawbacks (1/2) ● The break-down is not unique ● Performing a (wrong) local tree search on a (possibly irrelevant) local position ● Misevaluating the size of the local position ● Different kinds of local information – Symbolic (group: dead alive unstable) – Numerical (territory size, reduction, increase) MCTS for Computer Go 37
  • 38. Drawbacks (2/2) ● Local positions interact ● Complicated ● Domain-dependent knowledge ● Need of human expertise ● Difficult to program and maintain ● Holes of knowledge ● Erratic behaviour MCTS for Computer Go 38
  • 39. Upsides ● Feasible on 1990's computers ● Execution is fast ● Some specific local tree searches are accurate and fast MCTS for Computer Go 39
  • 40. The old approach Pro ranking Amateur ranking Top professional players 9 dan 9 dan Very strong players 1 dan 6 dan 1 dan Strong players 1 kyu Average playersl 10 kyu Beginners 20 kyu Old approach Very beginners 30 kyu MCTS for Computer Go 40
  • 41. End of part one! ● Next: the Monte-Carlo approach... MCTS for Computer Go 41
  • 42. The Monte-Carlo (MC) approach ● Games containing chance – Backgammon (Tesauro 1989) ● Games with hidden information – Bridge (Ginsberg 2001) – Poker (Billings & al. 2002) – Scrabble (Sheppard 2002) MCTS for Computer Go 42
  • 43. The Monte-Carlo approach ● Games with complete information – A general model (Abramson 1990) ● Simulated annealing Go – (Brügmann 1993) – 2 sequences of moves – « all moves as first » heuristic – Gobble on 9x9 MCTS for Computer Go 43
  • 44. The Monte-Carlo approach ● Position evaluation: Launch N random games Evaluation = mean value of outcomes ● Depth-one MC algorithm: For each move m { Play m on the ref position Launch N random games Move value (m) = mean value } MCTS for Computer Go 44
  • 45. Depth-one Monte-Carlo ... ... ... ... MCTS for Computer Go 45
  • 46. Progressive pruning ● (Billings 2002, Sheppard 2002, Bouzy & Helmstetter 2003) Second best Current best move Still explored Pruned MCTS for Computer Go 46
  • 47. Upper bound ● Optimism in face of uncertainty – Intestim (Kaelbling 1993), – UCB multi-armed bandit (Auer & al 2002) Second best promising Current best proven move Current best promising move MCTS for Computer Go 47
  • 48. All-moves-as-first heuristic (1/3) B C r A D A B C D MCTS for Computer Go 48
  • 49. All-moves-as-first heuristic (2/3) B C r A D A A B C D B C Actual simulation D B C o A D MCTS for Computer Go 49
  • 50. All-moves-as-first heuristic (3/3) B C r A D A C A B C D B B C Actual A Virtual simulation = simulation actual simulation assuming c is played D D B « as first » C o A D o MCTS for Computer Go 50
  • 51. The Monte-Carlo approach ● Upsides – Robust evaluation – Global search – Move quality increases with computing power ● Way of playing – Good strategical sense but weak tactically ● Easy to program – Follow the rules of the game – No break-down problem MCTS for Computer Go 51
  • 52. Monte-Carlo and knowledge ● Pseudo-random simulations using Go knowledge (Bouzy 2003) – Moves played with a probability depending on specific domain-dependent knowledge ● 2 basic concepts – string capture 3x3 shapes MCTS for Computer Go 52
  • 53. Monte-Carlo and knowledge ● Results are impressive – MC(random) << MC(pseudo random) – Size 9x9 13x13 19x19 – % wins 68 93 98 ● Other works on simulations – Patterns in MoGo, proximity rule (Wang & al 2006) – Simulation balancing (Silver & Tesauro 2009) MCTS for Computer Go 53
  • 54. Monte-Carlo and knowledge ● Pseudo-random player – 3x3 pattern urgency table with 38 patterns – Few dizains of relevant patterns only – Patterns gathered by ● Human expertise ● Reinforcement Learning (Bouzy & Chaslot 2006) ● Warning – p1 better than p2 does not mean MC(p1) better than MC(p2) MCTS for Computer Go 54
  • 55. Monte-Carlo Tree Search (MCTS) ● How to integrate MC and TS ? ● UCT = UCB for Trees – (Kocsis & Szepesvari 2006) – Superposition of UCB (Auer & al 2002) ● MCTS – Selection, expansion, updating (Chaslot & al) (Coulom 2006) – Simulation (Bouzy 2003) (Wang & Gelly 2006) MCTS for Computer Go 55
  • 56. MCTS (1/2) while (hasTime) { playOutTreeBasedGame() expandTree() outcome = playOutRandomGame() updateNodes(outcome) } then choose the node with... ... the best mean value ... the highest visit number MCTS for Computer Go 56
  • 57. MCTS (2/2) PlayOutTreeBasedGame() { node = getNode(position) while (node) { move=selectMove(node) play(move) node = getNode(position) } } MCTS for Computer Go 57
  • 58. UCT move selection ● Move selection rule to browse the tree: move=argmax (s*mean + C*sqrt(log(t)/n) ● Mean value for exploitation – s (=+-1): color to move ● UCT bias for exploration – C: constant term set up by experiments – t: number of visits of the parent node – n: number of visits of the current node MCTS for Computer Go 58
  • 59. Example ● 1 iteration 1/1 1 MCTS for Computer Go 59
  • 60. Example ● 2 iterations 1/2 0/1 0 MCTS for Computer Go 60
  • 61. Example ● 3 iterations 2/3 1/1 0/1 1 MCTS for Computer Go 61
  • 62. Example 2/4 ● 4 iterations 0/1 1/1 0/1 MCTS for Computer Go 62
  • 63. Example 3/5 ● 5 iterations 0/1 1/1 0/1 1/1 MCTS for Computer Go 63
  • 64. Example ● 6 iterations 3/6 0/1 1/2 0/1 1/1 0/1 MCTS for Computer Go 64
  • 65. Example ● 7 iterations 3/7 0/1 1/2 0/1 1/2 0/1 0/1 MCTS for Computer Go 65
  • 66. Example ● 8 iterations 4/8 0/1 2/3 0/1 1/2 1/1 0/1 0/1 MCTS for Computer Go 66
  • 67. Example ● 9 iterations 4/9 0/1 2/4 0/1 1/2 0/1 1/1 0/1 0/1 MCTS for Computer Go 67
  • 68. Example ● 10 iterations 5/10 0/1 2/4 0/1 2/3 0/1 1/1 0/1 1/1 0/1 MCTS for Computer Go 68
  • 69. Example ● 11 iterations 6/11 0/1 2/4 0/1 3/4 0/1 1/1 0/1 1/1 0/1 1/1 MCTS for Computer Go 69
  • 70. Example ● 12 iterations 7/12 0/1 2/4 0/1 4/5 0/1 1/1 0/1 1/1 1/2 1/1 1/1 MCTS for Computer Go 70
  • 71. Example ● Clarity – C=0 ● Notice – with C != 0 a node cannot stay unvisited – min or max rule according to the node depth – not visited children have an infinite mean ● Practice – Mean initialized optimistically MCTS for Computer Go 71
  • 72. MCTS enhancements ● The raw version can be enhanced – Tuning UCT C value – Outcome = score or win loss info (+1/-1) – Doubling the simulation number – RAVE – Using Go knowledge ● In the tree or in the simulations – Speed-up ● Optimizing, pondering, parallelizing MCTS for Computer Go 72
• 73. Assessing an enhancement ● Self-play – The new version vs the reference version – % of wins over a few hundred games – on 9x9 (or 19x19) boards ● Against differently designed programs – GTP (Go Text Protocol) – CGOS (Computer Go Operating System) ● Competitions MCTS for Computer Go 73
• 74. Move selection formula tuning ● Using UCB – Best value for C ? – 60-40% ● Using « UCB-tuned » (Auer et al. 2002) – C replaced by min(1/4, variance) – 55-45% MCTS for Computer Go 74
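A sketch of one possible reading of « C replaced by min(1/4, variance) ». Note that the full UCB1-tuned rule of Auer et al. 2002 also adds a sqrt(2*log(t)/n) term to the variance estimate; this simplified form follows the slide and is illustrative only.

    import math

    def ucb_tuned_value(child, parent_visits):
        if child.visits == 0:
            return float('inf')
        mean = child.wins / child.visits
        variance = mean * (1.0 - mean)       # empirical variance of 0/1 outcomes
        c = min(0.25, variance)              # min(1/4, variance) replaces C
        t = max(1, parent_visits)
        return mean + math.sqrt(c * math.log(t) / child.visits)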
• 75. Exploration vs exploitation ● General idea: explore at the beginning of the thinking time and exploit at the end ● Decreasing C linearly with the remaining time – (Vermorel et al. 2005) – 55-45% ● At the end: – Argmax over the mean value or over the number of visits ? – 55-45% MCTS for Computer Go 75
• 76. Kind of outcome ● 2 kinds of outcomes – Score (S) or win/loss information (WLI) ? – Probability of winning or expected score ? – Combining both (S+WLI) (score + 45 if win) ● Results – WLI vs S: 65-35% – S+WLI vs S: 65-35% MCTS for Computer Go 76
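A tiny sketch of the combined S+WLI outcome quoted on this slide. The slide only gives « +45 if win »; treating a loss symmetrically as -45 is an assumption made here for illustration.

    def combined_outcome(black_score, white_score):
        margin = black_score - white_score          # signed score (S), from Black
        win_bonus = 45.0 if margin > 0 else -45.0   # win/loss information (WLI); -45 is assumed
        return margin + win_bonus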
  • 77. Doubling the number of simulations ● N = 100,000 ● Results – 2N vs N 60-40% – 4N vs 2N 58-42% MCTS for Computer Go 77
  • 78. Tree management ● Transposition tables – Tree -> Directed Acyclic Graph (DAG) – Different sequences of moves may lead to the same position – Interest for MC Go: merge the results – Result: 60-40% ● Keeping the tree from one move to the next – Result: 65-35% MCTS for Computer Go 78
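A sketch of the transposition table idea (tree -> DAG): positions reached by different move orders share one node, so their results are merged. Zobrist hashing is a common way to key positions; the code below is illustrative (it reuses the Node class from the earlier sketch and, for simplicity, ignores the side to move and ko).

    import random

    BOARD_POINTS = 81                                        # 9x9 board
    ZOBRIST = [[random.getrandbits(64) for _ in range(2)]    # black / white
               for _ in range(BOARD_POINTS)]

    def position_hash(stones):
        """stones: dict mapping point index -> 'black' or 'white'."""
        h = 0
        for point, color in stones.items():
            h ^= ZOBRIST[point][0 if color == 'black' else 1]
        return h

    table = {}                                               # hash -> Node, shared by all paths

    def get_node(stones):
        return table.setdefault(position_hash(stones), Node())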
• 79. RAVE (1/3) ● Rapid Action Value Estimation – Mogo 2007 – Uses the AMAF (« All Moves As First ») heuristic (Brugmann 1993) – « Many » virtual sequences can be obtained by transposing the actually played sequence ● Result: – 70-30% MCTS for Computer Go 79
• 80. RAVE (2/3) ● AMAF heuristic: which nodes to update? (tree diagram) ● Actual – the played sequence ACBD updates the nodes on its path as usual ● Virtual – the transposed sequences BCAD, ADBC and BDAC update the corresponding nodes as well (same moves, different order) MCTS for Computer Go 80
• 81. RAVE (3/3) ● 3 variables – Usual mean value Mu – AMAF mean value Mamaf – M = β Mamaf + (1-β) Mu – β = sqrt(k/(k+3N)) – k set up experimentally ● M shifts from Mamaf (few visits) towards Mu (many visits) MCTS for Computer Go 81
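The mixing formula of this slide, written out as a small Python helper. The value of k below is only a placeholder, since the slide says k is set experimentally.

    import math

    def rave_value(mu, m_amaf, n_visits, k=1000.0):
        # beta goes to 1 when the node has few visits (trust AMAF),
        # and to 0 as visits accumulate (trust the usual mean)
        beta = math.sqrt(k / (k + 3.0 * n_visits))
        return beta * m_amaf + (1.0 - beta) * mu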
  • 82. Knowledge in the simulations ● High urgency for... – capture/escape 55-45% – 3x3 patterns 60-40% – Proximity rule 60-40% ● Mercy rule – Interrupt the game when the difference of captured stones is greater than a threshold (Hillis 2006) – 51-49% MCTS for Computer Go 82
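An illustrative playout policy combining the ideas on this slide: moves receive an urgency (capture/escape, 3x3 patterns, proximity), one move is drawn with probability proportional to its urgency, and the game is cut short by the mercy rule. The urgency function, the capture accessors and the threshold are assumptions, not Bouzy's actual tables.

    import random

    def choose_playout_move(pos, urgency, base=1.0):
        moves = pos.legal_moves()
        weights = [base + urgency(pos, m) for m in moves]     # knowledge-biased sampling
        return random.choices(moves, weights=weights, k=1)[0]

    def playout(pos, urgency, mercy_threshold=30):
        while not pos.is_over():
            # mercy rule: stop when the capture difference is already decisive
            if abs(pos.black_captures() - pos.white_captures()) > mercy_threshold:
                break
            pos.play(choose_playout_move(pos, urgency))
        return pos.black_wins()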
• 83. Knowledge in the tree ● Virtual wins for good-looking moves ● Automatic acquisition of patterns from pro games (Coulom 2007) (Bouzy & Chaslot 2005) ● Matching has a high cost ● Progressive widening (Chaslot et al. 2008) ● Interesting under strong time constraints ● Result: 60-40% MCTS for Computer Go 83
• 84. Speeding up the simulations ● Fully random simulations (2007) – 50,000 games/second (Lew 2006) – 20,000 (commonly heard) – 10,000 (my program) ● Pseudo-random – 5,000 (my program in 2007) ● Rough optimization is worthwhile MCTS for Computer Go 84
• 85. Pondering ● Think during the opponent's thinking time – 55-45% – Possibly doubles the thinking time – The opponent may not play the move you were thinking about – Side effect: it pays to play quickly so as to think during the opponent's time MCTS for Computer Go 85
  • 86. Summing up the enhancements ● MCTS with all enhancements vs raw MCTS – Exploration and exploitation: 60-40% – Win/loss outcome: 65-35% – Rough optimization of simulations 60-40% – Transposition table 60-40% – RAVE 70-30% – Knowledge in the simulations 70-30% – Knowledge in the tree 60-40% – Pondering 55-45% – Parallelization 70-30% ● Result: 99-1% MCTS for Computer Go 86
• 87. Parallelization ● Computer Chess: Deep Blue ● Multi-core computer – Symmetric MultiProcessor (SMP) – one thread per processor – shared memory, low latency – mutual exclusion (mutex) mechanism ● Cluster of computers – Message Passing Interface (MPI) MCTS for Computer Go 87
  • 88. Parallelization while (hasTime) { playOutTreeBasedGame() expandTree() outcome = playOutRandomGame() updateNodes(outcome) } MCTS for Computer Go 88
  • 89. Leaf parallelization MCTS for Computer Go 89
• 90. Leaf parallelization ● (Cazenave & Jouandeau 2007) ● Easy to program ● Drawbacks – Must wait for the longest simulation – When some of the parallel simulation outcomes are already losses, finishing the remaining ones may not be worthwhile MCTS for Computer Go 90
  • 91. Root parallelization MCTS for Computer Go 91
• 92. Root parallelization ● (Cazenave & Jouandeau 2007) ● Easy to program ● No communication ● At completion, merge the trees (see the sketch below) ● 4 MCTS runs of 1 sec > 1 MCTS run of 4 sec ● A good option for short time settings and a small number of threads MCTS for Computer Go 92
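A sketch of the merging step: each worker (for instance one process per core) runs the mcts() function from the earlier sketch on its own tree, and at completion the per-move statistics of the roots are summed before choosing the move. Illustrative only.

    def merge_roots(roots):
        """roots: list of root Nodes from independent searches.
        Returns the move with the most visits summed over all trees."""
        totals = {}                                    # move -> [wins, visits]
        for root in roots:
            for move, child in root.children.items():
                w, v = totals.setdefault(move, [0.0, 0])
                totals[move] = [w + child.wins, v + child.visits]
        return max(totals, key=lambda m: totals[m][1])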
  • 93. Tree parallelization – global mutex MCTS for Computer Go 93
  • 94. Tree parallelization – local mutex MCTS for Computer Go 94
• 95. Tree parallelization ● One shared tree, several threads ● Mutex – Global: the whole tree has one mutex – Local: each node has its own mutex ● « Virtual loss » – Given to a node while a thread traverses it – Removed at the update stage – Prevents several threads from running near-identical simulations (see the sketch below) MCTS for Computer Go 95
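A sketch of the virtual-loss trick with the « local mutex » variant (one lock per node): while a thread is descending through a node, that node temporarily looks like it played one extra game without winning it, which steers other threads toward different branches. It reuses the Node class from the earlier sketch and is illustrative only.

    import threading

    class SharedNode(Node):
        def __init__(self):
            super().__init__()
            self.lock = threading.Lock()     # local mutex: one lock per node

    def enter(node):
        # called during the tree descent: count the visit now, with zero wins,
        # so the node looks like it just lost a game (the virtual loss)
        with node.lock:
            node.visits += 1

    def update(node, outcome):
        # called at the update stage: the virtual visit receives its real outcome
        with node.lock:
            node.wins += outcome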
• 96. Computer-computer results ● Computer Olympiads (19x19 podium / 9x9 winner) – 2010: Erica, Zen, MFGo / MyGoFriend – 2009: Zen, Fuego, Mogo / Fuego – 2008: MFGo, Mogo, Leela / MFGo – 2007: Mogo, CrazyStone, GNU Go / Steenvreter – 2006: GNU Go, Go Intellect, Indigo / CrazyStone – 2005: Handtalk, Go Intellect, Aya / Go Intellect – 2004: Go Intellect, MFGo, Indigo / Go Intellect MCTS for Computer Go 96
• 97. Human-computer results ● 9x9 – 2009: Mogo beat a professional player taking Black – 2009: Fuego beat a professional player taking White ● 19x19: – 2008: Mogo beat a professional with a 9-stone handicap; Crazy Stone won with 8 stones, then with 7 stones – 2009: Mogo beat a professional with a 6-stone handicap MCTS for Computer Go 97
• 98. MCTS and the old approach ● Figure: the professional/amateur ranking ladder (from 9 dan down to 30 kyu) with the two approaches placed on it – MCTS programs: professional level on 9x9, amateur dan level on 19x19 – Old-approach programs: down in the amateur kyu range MCTS for Computer Go 98
  • 99. Computer Go (MC history) ● Monte-Carlo Go (Brugmann 1993) ● MCGo devel. (Bouzy & Helmstetter 2003) ● MC+knowledge (Bouzy 2003) ● UCT (Kocsis & Szepesvari 2006) ● Crazy Stone (Coulom 2006) ● Mogo (Wang & Gelly 2006) MCTS for Computer Go 99
• 100. Conclusion ● Monte-Carlo brought a big improvement in Computer Go over the last decade! – No old-approach-based programs anymore! – All Go programs are MCTS-based! – Professional level on 9x9! – Dan level on 19x19! ● Unbelievable 10 years ago! MCTS for Computer Go 100
• 101. Some references ● PhD thesis, MCTS and Go (Chaslot 2010) ● PhD thesis, Reinforcement Learning and Go (Silver 2010) ● PhD thesis, Reinforcement Learning: application to Go (Gelly 2007) ● UCT (Kocsis & Szepesvari 2006) ● 1st MCTS Go program (Coulom 2006) MCTS for Computer Go 101
  • 102. Web links ● http://www.grappa.univ-lille3.fr/icga/ ● http://cgos.boardspace.net/ ● http://www.gokgs.com/ ● http://www.lri.fr/~gelly/MoGo.htm ● http://remi.coulom.free.fr/CrazyStone/ ● http://fuego.sourceforge.net/ ● ... MCTS for Computer Go 102
  • 103. Thank you for your attention! MCTS for Computer Go 103