Sequential decision making:
 decidability and complexity



Searching with partial
observation
 Olivier.Teytaud@inria.fr + too many people to all be cited. Includes Inria, Cnrs, Univ.
Paris-Sud, LRI, Taiwan universities (including NUTN), CITINES project

TAO, Inria-Saclay IDF, Cnrs 8623,
Lri, Univ. Paris-Sud,
Digiteo Labs, Pascal
Network of Excellence.


Bielefeld
September 2012.
A quite general model

        A directed graph (finite).
 A starting point on the graph, a target (or
  several targets, with different rewards).
          I want to reach a target.

     Labels(=decisions) on edges:
  Next node = f( current node, decision)

            Each node is either:
      - random node (random decision).
    - decision node (I choose a decision)
  - opponent node (an opponent chooses)
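
A minimal Python sketch of this model (every name below is mine, purely illustrative, not from the talk):

```python
# Minimal sketch of the model above: a finite directed graph, edges labeled by
# decisions, and three kinds of nodes (random / decision / opponent).
import random

class Node:
    def __init__(self, kind, edges=None, reward=0.0):
        self.kind = kind          # "random", "decision" (ours) or "opponent"
        self.edges = edges or {}  # decision label -> successor Node
        self.reward = reward      # > 0 on target nodes

def step(node, choose, choose_opponent):
    """Next node = f(current node, decision); who decides depends on the node kind."""
    if node.kind == "random":
        label = random.choice(list(node.edges))   # random node: random decision
    elif node.kind == "decision":
        label = choose(node)                      # decision node: our policy decides
    else:
        label = choose_opponent(node)             # opponent node: the opponent decides
    return node.edges[label]
```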
Partial observation



           Each decision node
   is equipped with an observation;
     you can make decisions using
      the list of past observations

       ==> you don't know
     where you are in the graph
Overview

●   10%: overview of Alternating Turing
    machine & computational complexity
                          (great tool for complexity upper bounds)

●   50%: general culture on games
                          (including undecidability)
●   35%: general culture on fictitious play
         (matrix games)       (probably no time for this...)
●   4%: my results on that stuff
    ==> 2 detailed proofs (one new)
    ==> feel free to interrupt
Outline


●   Complexity and ATM


●   Complexity and games (incl. planning)


●   Bounded horizon games
Classical complexity classes
 P ⊂ NP ⊂ PSPACE ⊂ EXPTIME ⊂ NEXPTIME ⊂ EXPSPACE


 Proved:
 PSPACE ≠ EXPSPACE       P ≠ EXPTIME
 NP ≠ NEXPTIME


 Believed, not proved:
 P≠NP                    EXPTIME≠NEXPTIME
 NEXPTIME≠EXPSPACE
Complexity and alternating
 Turing machines
●   Turing machine (TM)= abstract computer
●   Non-deterministic Turing Machine (NTM)
       = TM with “exists” states (i.e. several
       transitions, accepts if at least one
       accepts)
●   Co-NTM: TM with “exists” states (i.e.
    several transitions, accepts if at least one
    transition accepts)
●   ATM: TM with both “exists” and “for all”
    states.
Complexity and alternating
 Turing machines
●   Turing machine (TM)= abstract computer
●   Non-deterministic Turing Machine (NTM)
       = TM with “exists” states (i.e. several
       transitions, accepts if at least one
       accepts)
●   Co-NTM: TM with “for all” states (i.e.
    several transitions, accepts if all lead to
    accept)
●   ATM: TM with both “exists” and “for all”
    states.
Complexity and alternating
 Turing machines
●   Turing machine (TM)= abstract computer
●   Non-deterministic Turing Machine (NTM)
       = TM with “exists” states (i.e. several
       transitions, accepts if at least one
       accepts)
●   Co-NTM: TM with “for all” states (i.e.
    several transitions, accepts if all lead to
    accept)
●   ATM: TM with both “exists” and “for all”
    states.
Alternation
Non-determinism & alternation
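
For intuition, a toy evaluation of an alternating exists/forall tree in Python (illustrative code, not from the deck): an “exists” node accepts if some child accepts, a “for all” node accepts if all children accept.

```python
def accepts(node):
    kind, payload = node               # node = ("leaf", bool) or ("exists"/"forall", [children])
    if kind == "leaf":
        return payload
    if kind == "exists":
        return any(accepts(c) for c in payload)
    if kind == "forall":
        return all(accepts(c) for c in payload)

# Exists a move such that, for all replies, we reach an accepting leaf?
tree = ("exists", [("forall", [("leaf", True), ("leaf", False)]),
                   ("forall", [("leaf", True), ("leaf", True)])])
print(accepts(tree))  # True (the second move works against every reply)
```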
Outline


●   Complexity and ATM


●   Complexity and games (incl.
    planning)


●   Bounded horizon games
Computational complexity:
 framework



 Uncertainty can be:
     –   Adversarial: I focus on worst case
     –   Stochastic: I focus on average result
     –   Or both.


 “Stochastic = adversarial” if goal = 100%
 success.
 “Stochastic != adversarial” in the general case.
Computational complexity:
 framework

 Many representations for problems. E.g.:
    –   Succinct: a circuit computes the ith bit of
         the proba that action a leads to a
         transition from s to s'
    –   Compressed: a circuit computes many bits
         simultaneously
    –   Flat: longer encoding (transition tables)

 ==> does not matter for decidability
 ==> matters for complexity
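
A toy contrast between a flat and a compressed encoding of the same kind of transition function (the dynamics below are made up, only to show the gap between the description size and the state space):

```python
# Flat: an explicit table, size ~ |states| x |actions|.
flat_table = {(s, a): (s + a) % 8 for s in range(8) for a in range(2)}

# Compressed: a short program (think: a circuit) producing transitions on demand;
# the state space (2^n_bits states) can be exponentially larger than the description.
def compressed_transition(s, a, n_bits=30):
    return (s + a) % (2 ** n_bits)

assert flat_table[(3, 1)] == compressed_transition(3, 1, n_bits=3)
```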
Computational complexity:
 framework

 Many representations for problems. E.g.:
    –   Succinct
    –   Compressed
    –   Flat


 Compressed representation “somehow” natural
 (state space has exponential size, transitions
 are fast): see e.g. Mundhenk for detailed defs
 and flat representations.
Computational complexity:
 framework
 We use mainly compressed representation; see
 also Mundhenk for flat representations.


 Typically, exponentially small representations
 lead to exponentially higher complexity
 ==> but it's not always the case...


 Simple things can change the complexity a lot:
 “superko”: rules forbid repeating the same position;
 some fully observable 2-player games become
 EXPSPACE instead of EXP ==> discussed later
Computational complexity: framework
 for first tables of results

  Either search (find a target)
        or optimize (cumulate rewards over time)

  Compressed (written with circuits or others...)
  or not (flat).

  Horizon:
  - Short horizon: horizon ≤ size of input
  - Long horizon: log2(horizon) ≤ size of input
  - Infinite horizon: no limit
Mundhenk's summary: one player,
 limited horizon: expected reward >0 ?
Mundhenk's summary: one player, non-negative
 reward, looking for non-neg. average reward
 (= positive proba of reaching): easier
Complexity, partial observation, infinite
 horizon, proba of reaching a target


●   1P+random, unobservable: undecidable
    (Madani et al)
●   1P+random, P(win=1),
        or equivalently 2P, P(win=1):
                      [Rintanen and refs therein]
         –   Fully observable: EXP   [Littman94]

         –   Unobservable: EXPSPACE       [Haslum et al 2000]
         –   Partial observability: 2EXP [Rintanen, 2003]


             Rmk: “2P, P(win=1)” is not “2P”!
Complexity, partial observation,
 infinite horizon

●   2P vs 1P, P(win)=1?: undecidable! [Hearn, Demaine]
●   2P (random or not):
       –   Existence of sure win: equiv. to 1P+random !
              ●   EXP full-observable (e.g. Go, Robson 1984)
              ●   PSPACE unobservable
              ●   2EXP partially observable
       –   Existence of sure win, same state forbidden:
            EXPSPACE-complete (Go with Chinese rules ?
            rather conjectured EXPTIME or PSPACE...)
       –   General case (optimal play): undecidable
            (Auger, Teytaud) (what about phantom-Go ?)
Complexity, partial observation

    Remarks:
●   Continuous case ?
●   Purely epistemic (we gather information, we
    don't change the state) ? [Sabbadin et al]
●   Restrictions on the policy, on the set of
    actions...
●   Discounted reward
●   DEC-POMDP, POSG : many players,
    same/opposite/different reward functions...
What are the approaches ?

 –   Dynamic programming              (Massé – Bellman 50's) (still
     the main approach in industry), alpha-beta, retrograde analysis
 –   Reinforcement learning
 –   MCTS (R. Coulom. Efficient Selectivity and Backup
     Operators in Monte-Carlo Tree Search. In
     Proceedings of the 5th International Conference on
     Computers and Games, Turin, Italy, 2006)
 –   Scripts + Tuning / Direct Policy Search
 –   Coevolution


     All have their PO extensions but the last two
     are the most convenient in this case.
Partially observable games

    Many tools for fully observable games.
    Not so many for partially observable ones.


●   Shi-Fu-Mi (Rock Paper Scissor)


●   Card games


●   Phantom games
Shi-Fu-Mi (Rock-Paper-Scissors)
●   Fully observable in simultaneous play, but
    partially observable in turn-based version.




●   Computers stronger than humans (yes, it's
    true).
Card games, phantom games
●   Phantomized version of a game:
       –   You don't see the move of your opponents
       –   If you play an illegal move, you are
              informed that it's illegal, you play again
       –   Usually, you get a bit more information
            (captures, threats...) <== game-dependent
●   Phantom-games:
       –   phantom-Chess = Kriegspiel
           ==> Dark Chess: more info
       –   phantom-Go
       –   etc.
Partially observable games
●   Usually quite heuristic algorithms
●   Best performing algorithms combine:
       –   Opponent modelling (as for Shi-Fu-Mi)
       –   Belief state (often by Monte-Carlo
            simulations)
       –   Not a lot of tree search
       –   A lot of tuning
           ==> usually no consistency analysis
Part I: Complexity analysis
(unbounded horizon)


 –   Game:
             ●   One or two players
             ●   Win, loss, draw (incl. endless loop)


 –   Partial observability, no random part


 –   Finite state space:
             ●   state=transition(state,action)
             ●   action decided by each player in turn
State of the art




 - makes sense in fully observable games
 - not so much in non-observable games
State of the art




 EXPTIME-complete in the general
   fully-observable case
EXPTIME-complete fully
observable games


  - Chess (for some nxn generalization)

  - Go (with no superko)

  - Draughts (international or English)

  - Chinese checkers

  - Shogi
PSPACE-complete fully
observable games

    - Amazons
    - Hex
    - Go-moku
    - Connect-6
    - Qubic
    - Reversi
    - Tic-Tac-Toe

      polynomial horizon + full observation ==> PSPACE

      Many games with filling of each cell once and only once
EXPSPACE-complete
unobservable games                (Haslum & Jonsson)



      The two-player unobservable case is
      EXPSPACE-complete
      (games in succinct form, infinite horizon).

              (still for 100%win “UD” criterion -
                   for not fully observable cases it
                       is necessary to be precise...)

Importantly, the UD criterion means that strategies
  are the same whether the opponent has full observation
  or no observation ==> UD is very bad :-(
EXPSPACE-complete
unobservable games (Haslum & Jonsson)

      The two-player unobservable case is
      EXPSPACE-complete
      (games in succinct form).


PROOF:
 (I) First note that strategies are just sequences of actions
    (no observability!)
 (II) It is in EXPSPACE=NEXPSPACE, because of the
   following algorithm:
   (a) Non-deterministically choose the sequence of
        actions
   (b) Check the result against all possible strategies
 (III) We have to check the hardness only.
EXPSPACE-complete
unobservable games (Haslum & Jonsson)

      The two-player unobservable case is
      EXPSPACE-complete
      (games in succinct form).


PROOF:
 (I) First note that strategies are just sequences of actions
    (no observability!)
 (II) It is in EXPSPACE=NEXPSPACE, because of the
   following algorithm:
   (a) Non-deterministically choose the sequence of
       actions (exponential list of actions is enough...)
   (b) Check the result against all possible strategies
 (III) We have to check the hardness only.
EXPSPACE-complete
unobservable games (Haslum & Jonsson)

      The two-player unobservable case is
      EXPSPACE-complete
      (games in succinct form).


PROOF:
 (I) First note that strategies are just sequences of actions
    (no observability!)
 (II) It is in EXPSPACE=NEXPSPACE, because of the
   following algorithm:
   (a) Non-deterministically choose the sequence of
       actions
   (b) Check the result against all possible strategies
 (III) We have to check the hardness only.
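
For intuition only, a deterministic (and exponentially slow) analogue of step (II): enumerate Player 1's action sequences in place of the non-deterministic guess, and check each against every opponent sequence. The play_out function is a hypothetical placeholder for the game dynamics.

```python
from itertools import product

def sure_win_exists(actions, horizon, play_out):
    """play_out(p1_seq, p2_seq) -> True iff Player 1 reaches the target."""
    for p1_seq in product(actions, repeat=horizon):          # the "guessed" strategy
        if all(play_out(p1_seq, p2_seq)                      # ...checked against all opponents
               for p2_seq in product(actions, repeat=horizon)):
            return True
    return False
```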
EXPSPACE-complete
unobservable games (Haslum & Jonsson)

      The two-player unobservable case is
      EXPSPACE-complete
      (games in succinct form).
  PROOF of the hardness:
   Reduction from: does my TM with exponential tape
    halt ?

  Consider a TM with tape of size N=2^n.

  We must find a game
  - with size n              ( n= log2(N) )
  - such that player 1 has a winning
         strategy iff the TM halts.
EXPSPACE-complete
unobservable games (Haslum & Jonsson)

   Encoding a Turing machine with tape of size N
   as a game with state O(log(N))


        Player 1 chooses the sequence of
        configurations of the tape (N=4):

         x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
         x(1,1),x(1,2),x(1,3),x(1,4)
         x(2,1),x(2,2),x(2,3),x(2,4)
         x(3,1),x(3,2),x(3,3),x(3,4)
          .....................................
EXPSPACE-complete
unobservable games (Haslum & Jonsson)

   Encoding a Turing machine with tape of size N
   as a game with state O(log(N))


                Player 1 chooses the sequence of
                configurations of the tape (N=4):

                 x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
                 x(1,1),x(1,2),x(1,3),x(1,4)
                 x(2,1),x(2,2),x(2,3),x(2,4)
                 x(3,1),x(3,2),x(3,3),x(3,4)
                  .....................................
                 x(N,1), x(N,2), x(N,3), x(N,4)

  Wins by final state !
EXPSPACE-complete
unobservable games (Haslum & Jonsson)

   Encoding a Turing machine with tape of size N
   as a game with state O(log(N))


                Player 1 chooses the sequence of
                configurations of the tape (N=4):

                 x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
                 x(1,1),x(1,2),x(1,3),x(1,4)
                 x(2,1),x(2,2),x(2,3),x(2,4)
                 x(3,1),x(3,2),x(3,3),x(3,4)
                  .....................................
                 x(N,1), x(N,2), x(N,3), x(N,4)

  Wins by final state !
  Except if P2 finds an illegal transition!
  ==> P2 can check the consistency of one 3-uple per line
  ==> requires space log(N) ( = position of the 3-uple)
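
The check Player 2 performs is the usual local-window condition of a Turing-machine tableau: cell x(t+1, j) must follow from x(t, j-1), x(t, j), x(t, j+1). A tiny sketch, where window_ok is a stand-in (my name) for the machine's transition rule:

```python
def cell_consistent(prev_row, next_row, j, window_ok):
    """True iff next_row[j] is a legal successor of the 3-uple around position j."""
    window = (prev_row[j - 1] if j > 0 else None,
              prev_row[j],
              prev_row[j + 1] if j + 1 < len(prev_row) else None)
    return window_ok(window, next_row[j])
```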
EXPSPACE-complete
unobservable games



     The 1P+unknown initial state in the
     unobservable case is
     EXPSPACE-complete
     (games in succinct form).

     2P+unobservable as well.
2EXPTIME-complete PO games




     The two-player PO case,
       or 1P+random PO is
     2EXP-complete
     (games in succinct form).

   (2P = 1P+random because of UD)
Undecidable games               (B. Hearn)




     The three-player PO case is
     undecidable. (two players against one,
     not allowed to communicate)
Hummm ?




 Do you know a PO game in which you can
 ensure a win with probability 1 ?
Another formalization




                       (criterion: reach the target with probability at least c)




 ==> much more satisfactory
    (might have drawbacks as well...)
Madani et al.








  1 player + random = undecidable
                    (even without opponent!)
Madani et al.

 1 player + random = undecidable.
 ==> answers a (related) question by
          Papadimitriou and Tsitsiklis.

 Proof ?

 Based on the emptiness problem for
 probabilistic finite automata (see Paz 71):

 Given a probabilistic finite automaton,
 is there a word accepted with proba at least c ?
 ==> undecidable
Consequence for unobservable
games








 1 player + random = undecidable
 ==> 2 players = undecidable.
Proof of “undecidability with 1 player
against random” ==> “undecidability with
2 players”


   How to simulate 1 player + random with 2
   players ?
A random node to be rewritten

     Rewritten as follows:
 ●   Player 1 chooses a in [[0,N-1]]
 ●   Player 2 chooses b in [[0,N-1]]
 ●   c=(a+b) modulo N
 ●   Go to tc

Each player can force the game to be equivalent to
the initial one (by playing uniformly)
==> the proba of winning for player 1 (in case of perfect play)
   is the same as for the initial game
==> undecidability!
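
A quick numerical check of the key fact (illustrative script): if one player draws uniformly in {0,...,N-1}, then c = (a+b) mod N is uniform whatever the other player does.

```python
import random
from collections import Counter

N = 5
counts = Counter()
for _ in range(100000):
    a = random.randrange(N)     # Player 1 plays uniformly
    b = 3                       # Player 2 plays anything (here: always 3)
    counts[(a + b) % N] += 1
print(counts)                   # each of the N values shows up ~20000 times
```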
Important remark



  Existence of a strategy for winning with
  proba 0.5 = also undecidable for the
  restriction to games in which the proba
  is >0.6 or <0.4 ==> not just a subtle
  precision trouble.
So what ?

 We have seen that
           unbounded horizon
        + partial observability
        + natural criterion (not sure win)
        ==> undecidability
        contrary to what usual definitions would suggest.



 What about bounded horizon, 2P ?
    –   Clearly decidable
    –   Complexity ?
    –   Algorithms ? (==> coevolution & LP)
Complexity (2P, 0-sum, no
  random)
                      Unbounded                Exponential      Polynomial
                      horizon                  horizon          horizon
Full
Observability         EXP                      EXP              PSPACE

No obs                EXPSPACE                 NEXP
(X=100%)              (Haslum et al, 2000)


Partially             2EXP                     EXPSPACE
Observable            (Rintanen)               (Mundhenk)
(X=100%)

Simult. Actions       ? EXPSPACE ?             <= EXP           <= EXP

No obs                undecidable              <= 2EXP (PL)     <= EXP (PL)
                                               (concise matrix games)
Partially             undecidable              <= 2EXP (PL)     <= EXP (PL)
Observable
Part II: Fictitious play (bounded
horizon) in the antagonist case

    Fictitious play ?
    Somehow an abstract version of
    antagonist coevolution with full memory

●   unlimited population (finite, but
    increasing): one more indiv. per iteration
●   perfect choice of each mutation against
    the current population of opponents
Part II: Fictitious play in the
zero-sum case

  Why zero-sum cases ?


  Evolutionarily stable solutions (found by
  FP) are usually sub-optimal (so is nature,
  when choosing lions' strategies or cheating behaviors in the Scaly-
  breasted Munia)
What is a matrix 0-sum game ?


●   A matrix M is given (type n x m).
●   Player 1 chooses (privately) i in [[1,n]]
●   Player 2 chooses              j in [[1,m]]
●   Reward
      = Mij for player 1
      = -Mij for player 2 (zero-sum game)
    ==> Model for finite antagonist games
Nash equilibrium

●   Nash equilibrium: there is a distribution
    of probability for each player
             (= mixed strategy)
    such that the reward is optimum (for the
    worst case on the distribution of
    probabilities by the opponent)
●   Linear programming is a polynomial
    algorithm for finding the Nash eq.
●   FP= tool for approximating it
                   (at least in 0-sum cases)
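
A minimal sketch of the LP route for 0-sum matrix games (assuming numpy and scipy are available; names are mine): maximize the value v such that player 1's mixed strategy x guarantees at least v against every column of the opponent.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(M):
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    # Variables: x_1..x_n (player 1's mixed strategy) and v (the guaranteed value).
    c = np.zeros(n + 1)
    c[-1] = -1.0                                      # linprog minimizes, so minimize -v
    A_ub = np.hstack([-M.T, np.ones((m, 1))])         # v - sum_i x_i M[i,j] <= 0 for all j
    b_ub = np.zeros(m)
    A_eq = np.ones((1, n + 1)); A_eq[0, -1] = 0.0     # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]                       # optimal mixed strategy, game value

x, v = solve_zero_sum([[1, -1], [-1, 1]])             # matching pennies
print(x, v)                                           # ~[0.5, 0.5], value ~0
```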
Fictitious play                 (Brown 1949)


●   Each player starts with a distribution on
    its strategies
●   Each player in turn:
       –   Finds an optimal strategy against the
             current opponent's distribution (randomly
            break ties)

       –   Adds it to its distribution (the distrib. does
            not sum to 1!)
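
A minimal sketch of fictitious play as described above (illustrative code; it starts each count vector at all-ones rather than at a single pure strategy as in the traces on the next slides):

```python
import numpy as np

def fictitious_play(M, iterations=10000, seed=0):
    rng = np.random.default_rng(seed)
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    count1, count2 = np.ones(n), np.ones(m)           # unnormalized "distributions"
    for _ in range(iterations):
        # Player 1 best-responds to player 2's empirical mixture (ties broken at random).
        payoff1 = M @ (count2 / count2.sum())
        count1[rng.choice(np.flatnonzero(payoff1 == payoff1.max()))] += 1
        # Player 2 (the minimizer) best-responds to player 1's empirical mixture.
        payoff2 = (count1 / count1.sum()) @ M
        count2[rng.choice(np.flatnonzero(payoff2 == payoff2.min()))] += 1
    return count1 / count1.sum(), count2 / count2.sum()

print(fictitious_play([[1, -1], [-1, 1]]))            # both tend to ~(0.5, 0.5)
```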
Matching pennies

    1 -1        (i.e. player 1 wins iff i=j)
    -1 1
●   HT1=(1,0)    HT2=(0,1)
●   HT1=(1,1)    HT2=(0,2)
●   HT1=(1,2)    HT2=(1,2)
●   HT1=(1,3)    HT2=(2,2)
●   HT1=(1,4)    HT2=(3,2)
●   HT1=(2,4)    HT2=(4,2)
●   HT1=(3,4)    HT2=(5,2)
●   HT1=(4,4)    HT2=(6,2)
●   HT1=(5,4)    HT2=(6,3) .......
Rock-paper-scissors


●   Rock: 1, Paper: 0, Scissors: 0
●   RPS1=(1,0,0)          RPS2=(1,0,0)
●   RPS1=(1,1,0)          RPS2=(1,1,0)
●   RPS1=(1,2,0)          RPS2=(1,1,1)
●   RPS1=(1,3,0)          RPS2=(1,1,2)
●   RPS1=(2,3,0)          RPS2=(1,2,2)
●   …
    ===> converges to Nash (Robinson 51)
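
Running the fictitious_play sketch from a few slides back on rock-paper-scissors (payoff convention assumed here: +1 win, -1 loss, 0 tie for the row player) reproduces this convergence to the uniform mixture:

```python
RPS = [[ 0, -1,  1],          # rock     vs rock, paper, scissors
       [ 1,  0, -1],          # paper
       [-1,  1,  0]]          # scissors
print(fictitious_play(RPS))   # both mixtures tend to ~(1/3, 1/3, 1/3)
```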
Fictitious play

  TODO
Improvements for KxK matrix
game: approximations

●   There exist ε-approximations in size
    O(log(K)/ε²) [Althoefer]
●   Such an approximation can be found in
    time O(K log(K)/ε²) [Grigoriadis et al]: basically a
    stochastic FP
Improvements for KxK matrix
game: exact solution if k-sparse

●   There exist ε-approximations in size
    O(log(K)/ε²) [Althoefer]
●   Such an approximation can be found in
    time O(K log(K)/ε²) [Grigoriadis et al]: basically a
    stochastic FP
Improvements for KxK matrix
game: approximations

●   There exist ε-approximations in size
    O(log(K)/ε²) [Althoefer]
●   Such an approximation can be found in
    time O(K log(K)/ε²) [Grigoriadis et al]: basically a
    stochastic FP
●   Exact solution in time         (Auger, Ruette, Teytaud)

      O( K log(K) · k^(2k) + poly(k) )

       if the solution is k-sparse (good only if k is
       smaller than log(K)/log(log(K)) !
       better ?)
Improvements for KxK matrix
game: approximations

 So, LP & FP are two tools for matrix
 games.


 LP can be adapted to PO
 games without building the complete
 matrix (using information sets).


 The same for FP variants ?
Conclusions

  There are still natural questions which
  provide nice decidability problems:
  Madani et al (1 player against random, no observability), extended here to
  2 players with no random



  ==> undecidable problems “less than”
     the Halting problem ?

  Solving zero-sum matrix-games is still an
  active area of research
                ●   Approximate cases
                ●   Sparse case
Open problems

●   Phantom-Go undecidable ?             (or other “real” game...)
●   Complexity of Go with Chinese rules ?
      (conjectured: PSPACE or EXPTIME;
       proved: PSPACE-hard, and in EXPSPACE)
●   More to say about “epistemic” games (internal
    state not modified)
●   Frontier of undecidability in PO games ?
    (100% halting game: 2P become decidable)
●   Chess with finitely many pieces on infinite board:
    decidability of forced-mate ?
    (n-move: Brumleve et al, 2012, simulation in Presburger arithmetic)
                                               (thanks S. Riis :-) )
