Survey of the computational complexity and computability of sequential decision making (games, planning).
Contains two more detailed proofs:
- EXPSPACE-completeness of unobservable adversarial planning for the existence of a 100% winning strategy (Haslum & Jonsson)
- undecidability of unobservable adversarial planning for an arbitrary winning rate (including optimal play in the Nash sense)
Complexity of planning and games with partial information
1. Sequential decision making:
decidability and complexity
Searching with partial
observation
Olivier.Teytaud@inria.fr + too many people to all be cited. Includes Inria, Cnrs, Univ.
Paris-Sud, LRI, Taiwan universities (including NUTN), CITINES project,
TAO, Inria-Saclay IDF, Cnrs 8623,
Lri, Univ. Paris-Sud,
Digiteo Labs, Pascal
Network of Excellence.
Bielefeld
September 2012.
2. A quite general model
A directed graph (finite).
A starting point on the graph, a target (or
several targets, with different rewards).
I want to reach a target.
Labels (= decisions) on edges:
Next node = f(current node, decision)
Each node is either:
- a random node (random decision),
- a decision node (I choose a decision), or
- an opponent node (an opponent chooses).
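The model above can be sketched in a few lines of code; the graph, labels, and node kinds below are an illustrative toy instance, not something from the talk:

```python
import random

# Illustrative toy instance of the model: each node is a 'decision',
# 'random', or 'opponent' node; edges carry labels (= decisions).
GRAPH = {
    "start": {"kind": "decision", "edges": {"a": "risk", "b": "lose"}},
    "risk":  {"kind": "random",   "edges": {"h": "goal", "t": "lose"}},
    "goal":  {"kind": "decision", "edges": {}},  # target (reward 1)
    "lose":  {"kind": "decision", "edges": {}},  # dead end (reward 0)
}

def step(node, decision=None, rng=random):
    """Next node = f(current node, decision); random nodes draw a label."""
    edges = GRAPH[node]["edges"]
    if GRAPH[node]["kind"] == "random":
        decision = rng.choice(sorted(edges))
    return edges[decision]

node = step("start", "a")                 # my decision at a decision node
node = step(node, rng=random.Random(0))   # a random node resolves by itself
```

Opponent nodes would work like decision nodes, with the other player choosing the label.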
3. Partial observation
Each decision node
is equipped with an observation;
you can only make decisions using
the list of past observations
==> you don't know
where you are in the graph.
4. Overview
● 10%: overview of Alternating Turing
machine & computational complexity
(great tool for complexity upper bounds)
● 50%: general culture on games
(including undecidability)
● 35%: general culture on fictitious play
(matrix games) (probably no time for this...)
● 4%: my results on that stuff
==> 2 detailed proofs (one new)
==> feel free to interrupt
5. Outline
● Complexity and ATM
● Complexity and games (incl. planning)
● Bounded horizon games
7. Complexity and alternating
Turing machines
● Turing machine (TM) = abstract computer
● Non-deterministic Turing Machine (NTM)
= TM with “exists” states (i.e. several
transitions; accepts if at least one
transition accepts)
● Co-NTM: TM with “for all” states (i.e.
several transitions; accepts if all
transitions lead to accept)
● ATM: TM with both “exists” and “for all”
states.
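The acceptance condition of an ATM can be read as an AND-OR evaluation over its configuration graph: “exists” nodes accept if some successor accepts, “for all” nodes if every successor does. A minimal sketch on an invented toy graph:

```python
# Toy sketch (not from the slides) of ATM acceptance as AND-OR evaluation.
def accepts(node, kind, succ):
    """kind[n] is 'exists', 'forall', or a boolean at the leaves."""
    k = kind[node]
    if isinstance(k, bool):
        return k  # halting configuration: accept or reject
    results = (accepts(s, kind, succ) for s in succ[node])
    return any(results) if k == "exists" else all(results)

kind = {"r": "exists", "u": "forall", "v": "forall",
        "a": True, "b": False, "c": True, "d": True}
succ = {"r": ["u", "v"], "u": ["a", "b"], "v": ["c", "d"]}
# r accepts: the "exists" root can pick v, all of whose leaves accept.
```

This is exactly the game-reading of an ATM: “exists” states belong to the player trying to accept, “for all” states to the adversary.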
13. Outline
● Complexity and ATM
● Complexity and games (incl.
planning)
● Bounded horizon games
14. Computational complexity:
framework
Uncertainty can be:
– Adversarial: I focus on worst case
– Stochastic: I focus on average result
– Or both.
“Stochastic = adversarial” if goal = 100%
success.
“Stochastic != adversarial” in the general case.
15. Computational complexity:
framework
Many representations for problems. E.g.:
– Succinct: a circuit computes the i-th bit of
the probability that action a leads to a
transition from s to s'
– Compressed: a circuit computes many bits
simultaneously
– Flat: longer encoding (transition tables)
==> does not matter for decidability
==> matters for complexity
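The gap between representations can be made concrete: the same deterministic transition function, once as a tiny circuit over the bits of the state (succinct) and once as an explicit table (flat). The function below is an arbitrary toy chosen for illustration:

```python
# Same transition function, two representations of wildly different size.
N_BITS = 16  # 2**16 states: the circuit is tiny, the flat table is not

def succinct_next(state: int) -> int:
    """A small 'circuit' over the bits of the state: rotate left by one."""
    return ((state << 1) | (state >> (N_BITS - 1))) & ((1 << N_BITS) - 1)

# Flat representation: materialize all 2**N_BITS entries explicitly.
flat_table = [succinct_next(s) for s in range(1 << N_BITS)]
```

Complexity results are sensitive to which of these two encodings the input uses, even though they describe the same game.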
16. Computational complexity:
framework
Many representations for problems. E.g.:
– Succinct
– Compressed
– Flat
Compressed representation “somehow” natural
(state space has exponential size, transitions
are fast): see e.g. Mundhenk for detailed defs
and flat representations.
17. Computational complexity:
framework
We use mainly compressed representation; see
also Mundhenk for flat representations.
Typically, exponentially small representations
lead to exponentially higher complexity
==> but this is not always the case...
Simple rule changes can greatly affect the complexity:
“superko”: the rules forbid repeating a position;
some fully observable 2-player games become
EXPSPACE instead of EXP ==> discussed later
18. Computational complexity: framework
for first tables of results
Either search (find a target)
or optimize (cumulate rewards over time)
Compressed (written with circuits or others...)
or not (flat).
Horizon:
- Short horizon: horizon ≤ size of input
- Long horizon: log2(horizon) ≤ size of input
- Infinite horizon: no limit
20. Mundhenk's summary: one player, non-negative
reward, looking for a positive average reward
(= positive proba of reaching a target): easier
21. Complexity, partial observation, infinite
horizon, proba of reaching a target
● 1P+random, unobservable: undecidable
(Madani et al.)
● 1P+random, P(win)=1,
or equivalently 2P, P(win)=1:
[Rintanen and refs therein]
– Fully observable: EXP [Littman, 1994]
– Unobservable: EXPSPACE [Haslum & Jonsson, 2000]
– Partially observable: 2EXP [Rintanen, 2003]
Rmk: “2P, P(win)=1” is not “2P”!
22. Complexity, partial observation,
infinite horizon
● 2P vs 1P, P(win)=1?: undecidable! [Hearn, Demaine]
● 2P (random or not):
– Existence of a sure win: equiv. to 1P+random!
● EXP fully observable (e.g. Go, Robson 1984)
● PSPACE unobservable
● 2EXP partially observable
– Existence of a sure win, repeating a state forbidden:
EXPSPACE-complete (Go with Chinese rules?
rather conjectured EXPTIME or PSPACE...)
– General case (optimal play): undecidable
(Auger, Teytaud) (what about phantom-Go?)
23. Complexity, partial observation
Remarks:
● Continuous case ?
● Purely epistemic (we gather information, we
don't change the state) ? [Sabbadin et al]
● Restrictions on the policy, on the set of
actions...
● Discounted reward
● DEC-POMDP, POSG : many players,
same/opposite/different reward functions...
24. What are the approaches ?
– Dynamic programming (Massé – Bellman 50's) (still
the main approach in industry), alpha-beta, retrograde analysis
– Reinforcement learning
– MCTS (R. Coulom. Efficient Selectivity and Backup
Operators in Monte-Carlo Tree Search. In
Proceedings of the 5th International Conference on
Computers and Games, Turin, Italy, 2006)
– Scripts + Tuning / Direct Policy Search
– Coevolution
All have PO extensions, but the last two
are the most convenient in this case.
25. Partially observable games
Many tools for fully observable games.
Not so many for partially observable ones.
● Shi-Fu-Mi (Rock Paper Scissor)
● Card games
● Phantom games
26. Shi-Fu-Mi (Rock-Paper-Scissors)
● Fully observable in simultaneous play, but
partially observable in turn-based version.
● Computers stronger than humans (yes, it's
true).
27. Card games, phantom games
● Phantomized version of a game:
– You don't see the move of your opponents
– If you play an illegal move, you are
informed that it's illegal, you play again
– Usually, you get some more information
(captures, threats...) <== game-dependent
● Phantom-games:
– phantom-Chess = Kriegspiel
==> Dark Chess: more info
– phantom-Go
– etc.
28. Partially observable games
● Usually quite heuristic algorithms
● Best performing algorithms combine:
– Opponent modelling (as for Shi-Fu-Mi)
– Belief state (often by Monte-Carlo
simulations)
– Not a lot of tree search
– A lot of tuning
==> usually no consistency analysis
29. Part I: Complexity analysis
(unbounded horizon)
– Game:
● One or two players
● Win, loss, draw (incl. endless loop)
– Partial observability, no random part
– Finite state space:
● state=transition(state,action)
● action decided by each player in turn
30. State of the art
- makes sense in fully observable games
- not so much in non-observable games
31. State of the art
EXPTIME-complete in the general
fully-observable case
32. EXPTIME-complete fully
observable games
- Chess (for some nxn generalization)
- Go (with no superko)
- Draughts (international or English)
- Chinese checkers
- Shogi
33. PSPACE-complete fully
observable games
- Amazons
- Hex
- Go-moku
- Connect-6
- Qubic
- Reversi
- Tic-Tac-Toe
Polynomial horizon + full observation ==> PSPACE.
Many games in which each cell is filled once and only once.
34. EXPSPACE-complete
unobservable games (Haslum & Jonsson)
The two-player unobservable case is
EXPSPACE-complete
(games in succinct form, infinite horizon).
(Still for the 100%-win “UD” criterion;
for not fully observable cases it
is necessary to be precise about the criterion...)
Importantly, under the UD criterion, strategies
are the same whether the opponent has full observation
or no observation ==> UD is very bad :-(
35. EXPSPACE-complete
unobservable games (Haslum & Jonsson)
The two-player unobservable case is
EXPSPACE-complete
(games in succinct form).
PROOF:
(I) First note that strategies are just sequences of actions
(no observability!).
(II) It is in EXPSPACE = NEXPSPACE, because of the
following algorithm:
(a) Non-deterministically choose the sequence of
actions (an exponential-length list of actions is enough...)
(b) Check the result against all possible opponent strategies
(III) We only have to check the hardness.
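On a toy instance, steps (a)-(b) of the membership argument can be made concrete by replacing the nondeterministic guess with enumeration: a strategy is just a blind action sequence, checked against every opponent sequence. The game below is invented purely for illustration:

```python
from itertools import product

# Toy unobservable game: states 0..2, start 1, target 0 (absorbing).
# TRANS[state][my_action][opp_action] -> next state.
TRANS = {
    0: {"A": {"x": 0, "y": 0}, "B": {"x": 0, "y": 0}},
    1: {"A": {"x": 2, "y": 2}, "B": {"x": 1, "y": 2}},
    2: {"A": {"x": 2, "y": 1}, "B": {"x": 0, "y": 0}},
}

def outcome(seq, opp_seq, start=1):
    s = start
    for a, b in zip(seq, opp_seq):
        s = TRANS[s][a][b]
    return s

def sure_winning_sequence(horizon):
    """Enumerate blind action sequences; keep one winning for ALL opponents."""
    for seq in product("AB", repeat=horizon):
        if all(outcome(seq, opp) == 0 for opp in product("xy", repeat=horizon)):
            return seq
    return None
```

The real algorithm avoids this doubly-exponential enumeration by guessing the sequence nondeterministically, giving NEXPSPACE = EXPSPACE.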
38. EXPSPACE-complete
unobservable games (Haslum & Jonsson)
The two-player unobservable case is
EXPSPACE-complete
(games in succinct form).
PROOF of the hardness:
Reduction from: is a given TM with exponential tape
going to halt?
Consider a TM with a tape of size N = 2^n.
We must find a game
- of size n (n = log2(N))
- such that player 1 has a winning
strategy iff the TM halts.
39. EXPSPACE-complete unobservable games:
encoding a Turing machine with tape of size N
as a game with state O(log(N))
Player 1 chooses the sequence of
configurations of the tape (N=4):
x(0,1), x(0,2), x(0,3), x(0,4) ==> initial state
x(1,1), x(1,2), x(1,3), x(1,4)
x(2,1), x(2,2), x(2,3), x(2,4)
x(3,1), x(3,2), x(3,3), x(3,4)
.....................................
x(N,1), x(N,2), x(N,3), x(N,4)
Player 1 wins by reaching the final state...
except if player 2 finds an illegal transition!
==> Player 2 can check the consistency of one 3-tuple per line
==> this requests space log(N) (= the position of the 3-tuple).
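Player 2's job can be sketched as a purely local check: in a valid tableau, each cell of row t+1 is determined by a window of three cells of row t, so pointing at a single inconsistent position suffices. The local rule below is a toy stand-in for a real TM transition table:

```python
BLANK = "#"

def window(row, i):
    """The 3 cells of the previous row that determine cell i of the next."""
    return (row[i - 1] if i > 0 else BLANK,
            row[i],
            row[i + 1] if i + 1 < len(row) else BLANK)

def rule(w):
    """Toy local rule standing in for the TM's transition table."""
    return "1" if w.count("1") >= 2 else "0"

def legal_cell(prev_row, i, claimed):
    """P2 challenges ONE position i: storing i takes space O(log N)."""
    return rule(window(prev_row, i)) == claimed

row0 = list("0110")
row1 = [rule(window(row0, i)) for i in range(len(row0))]  # a consistent row
```

Since player 2 only remembers a position index (and one window), its memory is logarithmic in the tape size N, which is what keeps the game's state small.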
42. EXPSPACE-complete
unobservable games
The 1-player case with unknown initial state,
unobservable, is
EXPSPACE-complete
(games in succinct form).
2P + unobservable as well.
43. 2EXPTIME-complete PO games
The two-player PO case,
or 1P+random PO is
2EXP-complete
(games in succinct form).
(2P = 1P+random because of UD)
44. Undecidable games (B. Hearn)
The three-player PO case is
undecidable (a team of two players against one,
the teammates not allowed to communicate).
45. Hummm ?
Do you know a PO game in which you can
ensure a win with probability 1 ?
46. Another formalization
Criterion: win with probability at least c (not only c = 1)
==> much more satisfactory
(might have drawbacks as well...)
47. Madani et al.
1 player + random = undecidable
(even without opponent!)
48. Madani et al.
1 player + random = undecidable.
==> answers a (related) question by
Papadimitriou and Tsitsiklis.
Proof ?
Based on the emptiness problem for
probabilistic finite automata (see Paz 71):
Given a probabilistic finite automaton,
is there a word accepted with proba at least c ?
==> undecidable
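The emptiness problem itself is easy to state in code: the acceptance probability of a word is obtained by chaining one stochastic matrix per letter; the undecidable question is whether ANY word exceeds the threshold c. A toy two-state automaton with invented numbers:

```python
# Toy probabilistic finite automaton: one row-stochastic transition
# matrix per letter; we start in state 0, state 1 is accepting.
A = {
    "a": [[0.5, 0.5], [0.0, 1.0]],
    "b": [[1.0, 0.0], [0.3, 0.7]],
}
START = [1.0, 0.0]
ACCEPT = [0.0, 1.0]

def accept_proba(word):
    """Probability that the automaton accepts the given word."""
    d = START
    for letter in word:
        m = A[letter]
        d = [sum(d[i] * m[i][j] for i in range(2)) for j in range(2)]
    return sum(d[j] * ACCEPT[j] for j in range(2))

# Computing accept_proba(w) for a GIVEN w is trivial; deciding whether
# some word reaches probability >= c is the undecidable part (Paz 71).
```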
53. A random node to be rewritten
Rewritten as follows:
● Player 1 chooses a in [[0,N-1]]
● Player 2 chooses b in [[0,N-1]]
● c=(a+b) modulo N
● Go to t_c
Each player can force the game to be equivalent to
the initial one (by playing uniformly)
==> the proba of winning for player 1 (in case of perfect play)
is the same as for the initial game
==> undecidability!
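The claim that either player can force the rewritten node to behave like the original random node is a one-line computation: if a is uniform, then c = (a+b) mod N is uniform whatever the distribution of b (and symmetrically for b). A quick exact check:

```python
from fractions import Fraction
from itertools import product

N = 5

def dist_of_c(p_a, p_b):
    """Distribution of c = (a+b) mod N when a ~ p_a and b ~ p_b."""
    p_c = [Fraction(0)] * N
    for a, b in product(range(N), repeat=2):
        p_c[(a + b) % N] += p_a[a] * p_b[b]
    return p_c

uniform = [Fraction(1, N)] * N
always_zero = [Fraction(1)] + [Fraction(0)] * (N - 1)  # P2 always plays b=0
```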
54. Important remark
Existence of a strategy for winning with
proba 0.5 = also undecidable for the
restriction to games in which the proba
is >0.6 or <0.4 ==> not just a subtle
precision issue.
55. So what ?
We have seen that
unbounded horizon
+ partial observability
+ natural criterion (not sure win)
==> undecidability
contrary to what is expected from the usual definitions.
What about bounded horizon, 2P ?
– Clearly decidable
– Complexity ?
– Algorithms ? (==> coevolution & LP)
57. Part II: Fictitious play (bounded
horizon) in the antagonist case
Fictitious play ?
Somehow an abstract version of
antagonist coevolution with full memory
● unlimited population (finite, but
increasing): one more individual per iteration
● perfect choice of each mutation against
the current population of opponents
58. Part II: Fictitious play in the
zero-sum case
Why zero-sum cases ?
Evolutionary stable solutions (found by
FP) are usually sub-optimal (as are nature's
choices of lion strategies or of cheating behaviors in the
Scaly-breasted Munia)
59. What is a matrix 0-sum game ?
● A matrix M is given (type n x m).
● Player 1 chooses (privately) i in [[1,n]]
● Player 2 chooses j in [[1,m]]
● Reward
= Mij for player 1
= -Mij for player 2 (zero-sum game)
==> Model for finite antagonist games
60. Nash equilibrium
● Nash equilibrium: there is a distribution
of probability for each player
(= mixed strategy)
such that the reward is optimum (for the
worst case on the distribution of
probabilities by the opponent)
● Linear programming is a polynomial
algorithm for finding the Nash eq.
● FP= tool for approximating it
(at least in 0-sum cases)
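Checking a candidate equilibrium needs no LP solver: a mixed strategy's guaranteed value is its worst case over the opponent's pure strategies. For Rock-Paper-Scissors the uniform strategy guarantees the game's value 0, while any pure strategy guarantees only -1:

```python
# Row player's payoff matrix for Rock-Paper-Scissors (zero-sum).
M = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def guaranteed_value(p, M):
    """Worst case, over opponent pure strategies j, of sum_i p[i]*M[i][j]."""
    n, m = len(M), len(M[0])
    return min(sum(p[i] * M[i][j] for i in range(n)) for j in range(m))
```

A Nash equilibrium of the zero-sum game is exactly a pair of mixed strategies whose guaranteed values (max for one player, min for the other) coincide; LP finds it in polynomial time.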
61. Fictitious play (Brown 1949)
● Each player starts with a distribution on
its strategies
● Each player in turn:
– Finds an optimal strategy against the
current opponent's distribution (randomly
break ties)
– Adds it to its distribution (the distribution does
not sum to 1!)
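A minimal sketch of the procedure above, with counts playing the role of the growing, non-normalized distribution and random tie-breaking as stated:

```python
import random

def fictitious_play(M, iters=2000, seed=0):
    """Fictitious play on a zero-sum matrix game; returns the two
    empirical mixed strategies (normalized counts)."""
    rng = random.Random(seed)
    n, m = len(M), len(M[0])
    cnt1, cnt2 = [1] * n, [1] * m  # initial "distributions" (counts)
    for _ in range(iters):
        # best response of player 1 (maximizer) to P2's empirical mixture
        v1 = [sum(M[i][j] * cnt2[j] for j in range(m)) for i in range(n)]
        best1 = max(v1)
        cnt1[rng.choice([i for i in range(n) if v1[i] == best1])] += 1
        # best response of player 2 (minimizer) to P1's empirical mixture
        v2 = [sum(M[i][j] * cnt1[i] for i in range(n)) for j in range(m)]
        best2 = min(v2)
        cnt2[rng.choice([j for j in range(m) if v2[j] == best2])] += 1
    t1, t2 = sum(cnt1), sum(cnt2)
    return [c / t1 for c in cnt1], [c / t2 for c in cnt2]

# On matching pennies or Rock-Paper-Scissors, the empirical frequencies
# slowly approach the uniform equilibrium (Robinson's theorem, 0-sum case).
p, q = fictitious_play([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
```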
67. Improvements for KxK matrix
games: approximations
● There exist ε-approximations with support of size
O(log(K)/ε²) [Althoefer]
● Such an approximation can be found in
time O(K·log(K)/ε²) [Grigoriadis et al.]: basically a
stochastic FP
69. Improvements for KxK matrix
games: exact solution if k-sparse
● Exact solution in time (Auger, Ruette, Teytaud):
O(K·log(K)·k^(2k) + poly(k))
if the solution is k-sparse (good only if k is
smaller than log(K)/log(log(K))!
Better?)
70. Improvements for KxK matrix
game: approximations
So, LP & FP are two tools for matrix
games.
LP can be adapted to PO
games without building the complete
matrix (using information sets).
The same for FP variants ?
71. Conclusions
There are still natural questions which
provide nice decidability problems:
Madani et al. (1 player against random, no observability), extended here to
2 players with no randomness
==> undecidable problems “less than”
the Halting problem?
Solving zero-sum matrix-games is still an
active area of research
● Approximate cases
● Sparse case
72. Open problems
● Phantom-Go undecidable? (or another “real” game...)
● Complexity of Go with Chinese rules?
(conjectured: PSPACE or EXPTIME;
proved: PSPACE-hard and in EXPSPACE)
● More to say about “epistemic” games (internal
state not modified)
● Frontier of undecidability in PO games?
(100%-halting games: 2P becomes decidable)
● Chess with finitely many pieces on an infinite board:
decidability of forced mate?
(in n moves: Brumleve et al., 2012, by simulation in Presburger
arithmetic; thanks to S. Riis :-) )