6. TYPES OF BANDIT PROBLEMS
- Stationary
  - Fixed horizon
  - Infinite horizon
- Dynamic (restless)
  - Fixed horizon
  - Infinite horizon
7. TYPES OF BANDIT PROBLEMS
One-armed bandit problems refer to a choice between an option with a known payout and a different option with an unknown payout.
8. TYPES OF BANDIT PROBLEMS
Multi-armed bandit problems refer to situations where there are multiple alternatives with unknown payouts.
9. APPLICATIONS
Managing research projects
The stock market
Sports coaches tracking changes in team performance
Drivers choosing among a number of possible routes
10. EXPLORATION AND EXPLOITATION
Exploration: selecting arms in order to gain information about their hidden payoffs.
Exploitation: focusing on a single arm in order to obtain rewards from an option believed to be sufficiently good compared to the other competing options.
Expected behavior: exploration first, shifting toward exploitation later (a simple heuristic illustrating this shift is sketched below).
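As a rough illustration of the trade-off (this is a generic epsilon-greedy heuristic, not the model from the paper, and the parameter values are arbitrary assumptions):

```python
import random

def epsilon_greedy_choice(estimated_values, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best current estimate."""
    if random.random() < epsilon:
        return random.randrange(len(estimated_values))        # explore: pick a random arm
    return max(range(len(estimated_values)),
               key=lambda arm: estimated_values[arm])         # exploit: pick the best arm

def update_estimate(estimated_values, counts, arm, reward):
    """Running-mean update of the chosen arm's value estimate."""
    counts[arm] += 1
    estimated_values[arm] += (reward - estimated_values[arm]) / counts[arm]
```

Small epsilon means mostly exploiting; larger epsilon means more exploring.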
11. SHORT SUMMARY
Stationary bandit problems: the reward rate for each alternative is kept constant over all of the trials.
The number of trials in each game may be known, creating a finite-horizon problem, or unknown, creating an infinite-horizon problem.
Optimal solutions can be found for all cases in
finite horizon environments by using a dynamic
programming approach, where optimal decisions
are computed for all potential cases starting from
the final trial and solving for each trial toward the
first (Kaelbling et al., 1996).
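As a minimal sketch of this idea for a two-armed Bernoulli bandit with uniform Beta(1, 1) priors (an illustration of the general backward computation, not the exact procedure from Kaelbling et al.):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(s1, f1, s2, f2, trials_left):
    """Expected total future reward from this state under optimal play.
    The recursion bottoms out at the final trial, so values are effectively
    computed from the last trial backward via memoization."""
    if trials_left == 0:
        return 0.0
    # Posterior mean reward probability of each arm under a Beta(1, 1) prior.
    p1 = (s1 + 1) / (s1 + f1 + 2)
    p2 = (s2 + 1) / (s2 + f2 + 2)
    # Value of pulling each arm: immediate expected reward plus expected future value.
    v1 = p1 * (1 + value(s1 + 1, f1, s2, f2, trials_left - 1)) \
         + (1 - p1) * value(s1, f1 + 1, s2, f2, trials_left - 1)
    v2 = p2 * (1 + value(s1, f1, s2 + 1, f2, trials_left - 1)) \
         + (1 - p2) * value(s1, f1, s2, f2 + 1, trials_left - 1)
    return max(v1, v2)

print(value(0, 0, 0, 0, trials_left=10))   # value of a fresh 10-trial game
```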
12. SHORT SUMMARY
As the length of a game increases or the number of
alternatives increases, the computation necessary
to create a complete decision tree increases
exponentially.
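A back-of-the-envelope way to see this (the notation here is an assumption, not from the slides): with $K$ arms, binary payoffs, and a horizon of $T$ trials, each trial multiplies the number of choice-outcome paths by $K$ possible choices times $2$ possible outcomes, so the complete tree contains

```latex
\text{number of choice--outcome paths} = (2K)^{T}
```

paths, which is exponential in the horizon $T$.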
Restless bandit problems: the reward rates for the alternatives may change over time, rather than remaining stationary through each trial of the game.
This requires change detection, which forces repeated switches between exploration and exploitation (a small simulation of such an environment is sketched below).
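As a minimal sketch of what a restless environment can look like (the random-walk drift and all parameter values here are illustrative assumptions, not the paper's generative model):

```python
import random

def simulate_restless_bandit(n_arms=4, n_trials=50, drift_sd=0.05, seed=1):
    """Each arm's reward probability takes a small, clipped random-walk step per trial."""
    rng = random.Random(seed)
    probs = [rng.random() for _ in range(n_arms)]
    history = []
    for _ in range(n_trials):
        probs = [min(1.0, max(0.0, p + rng.gauss(0.0, drift_sd))) for p in probs]
        rewards = [1 if rng.random() < p else 0 for p in probs]   # payoff of each arm
        history.append((list(probs), rewards))
    return history
```

Because the probabilities drift, an arm that was best early in the game may no longer be best later, which is why a learner must keep exploring.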
13. OPTIMAL SOLUTIONS VERSUS HEURISTICS
Optimal solutions tend to be fairly ponderous in terms of computational cost and often can only be applied in limited situations.
Heuristics are geared toward obtaining performance that, while not optimal, is still good, with comparatively much less work.
Of course, there are also models that fall between the two extremes in complexity; the particle filter model used in the paper cannot really be counted in either of the two groups.
14. GITTINS INDEX?
A Gittins index gives each alternative a utility value that takes into account the alternative's current estimated value and the information that can be gained from choosing it; the optimal decision is to choose the arm with the largest index value.
Gittins indices are only applicable to a limited number
of bandit problems, and can be difficult to compute
even in those cases (Berry & Fristedt, 1985).
15. GITTINS INDEX?
The Gittins index is a measure of the reward that can be achieved by a process evolving from its present state onward, given the probability that it will be terminated in the future.
It is a real scalar value associated with the state of a stochastic process that has a reward function and a probability of termination.
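In the standard discounted formulation (the notation below is an assumption, not taken from the slides), the index of arm $i$ in state $s$ is

```latex
\nu_i(s) \;=\; \sup_{\tau \ge 1}
\frac{\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t} R_i(t) \,\middle|\, s_i(0)=s\right]}
     {\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, s_i(0)=s\right]}
```

where $\tau$ ranges over stopping times and $\beta \in (0,1)$ is the discount factor; discounting plays the role of the termination probability mentioned above.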
16. TSP AND BANDIT PROBLEMS: THE SAME?
Bandit problems are highly sequential, where
information you gain on each trial can be used to
inform your decisions on subsequent trials.
TSPs are spatial tasks where, generally, all information is available at the outset of the task.
The connections you make between nodes on each
step are really only sequential in the sense that
they aren't made simultaneously.
17. AUTHORS’ MOTIVATION?
When optimal solutions are available, bandit problems
provide an opportunity to examine whether or how people
make the best possible decisions.
For this reason, many previous empirical studies have been motivated by economic theories, with a focus on deviations from rationality in human decision-making (e.g., Banks, Olson, & Porter, 1997; Meyer & Shi, 1995).
More recently, human performance on the bandit problem has been studied within cognitive neuroscience (e.g., Cohen, McClure, & Yu, 2007; Daw, O'Doherty, Dayan, Seymour, & Dolan, 2006) and probabilistic models of human cognition (e.g., Steyvers, Lee, & Wagenmakers, 2009).
18. PARTICLE FILTERS
http://www.youtube.com/watch?v=O-lAJVra1PU
Particle filter:
- Depending on the design, needs less computation time.
- A sophisticated model-estimation technique based on simulation; particle filters are usually used to estimate Bayesian models in which the latent variables are connected in a Markov chain.
- Estimates the distribution of only one of the latent variables at a time, rather than attempting to estimate them all at once, and produces a set of weighted samples, rather than a (usually much larger) set of unweighted samples.

MCMC:
- More computation time with increasing information.
- A class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution.
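As a minimal sketch of a bootstrap particle filter for tracking one arm's drifting reward probability from its 0/1 payoffs (a generic illustration, not the paper's exact model; the drift size, particle count, and clipping bounds are assumed values):

```python
import random

def particle_filter(rewards, n_particles=500, drift_sd=0.05, seed=0):
    """Track a single arm's drifting reward probability from a sequence of 0/1 payoffs."""
    rng = random.Random(seed)
    particles = [rng.random() for _ in range(n_particles)]    # initial guesses for p
    estimates = []
    for r in rewards:                                          # r is 0 or 1
        # 1. Propagate: each particle drifts like the latent reward rate
        #    (clipped away from 0 and 1 so the weights never all collapse to zero).
        particles = [min(0.99, max(0.01, p + rng.gauss(0.0, drift_sd)))
                     for p in particles]
        # 2. Weight: Bernoulli likelihood of the observed payoff.
        weights = [p if r == 1 else 1.0 - p for p in particles]
        # 3. Resample: draw a new, unweighted particle set.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Point estimate of the current reward rate.
        estimates.append(sum(particles) / n_particles)
    return estimates

print(particle_filter([1, 1, 0, 1, 0, 0, 0])[-1])              # toy usage
```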
19. THE PAPER
Modeling Human Performance in Restless Bandits with Particle Filters
(Fall 2009)
20. EXPERIMENT 1
The restless bandit problem is an extension of the sequential stationary infinite-horizon problem.
The behavior of human participants in a restless bandit environment is observed and compared to two different particle filter solution methods: one optimal, the other suboptimal.
27 participants from UCI took part for course credit.
24. OVERALL CONCLUSIONS
Many potential applications:
Clinical trials
Advertising: which ad to put on a web page?
Labor markets: which job should a worker choose?
Optimization of noisy functions
Numerical resource allocation
25. OVERALL CONCLUSIONS
How to solve: Monte Carlo methods, Markov chain methods, particle filters, or the Gittins index.
The paper focuses on human performance rather than on the optimal solution, and does not use the Gittins index.
27. REFERENCES
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5), 527-535.
Berry, D. A., & Fristedt, B. (1985). Bandit problems: Sequential allocation of experiments. Monographs on Statistics and Applied Probability. London: Chapman & Hall. ISBN 0-412-24810-7.
Gittins, J. C. (1989). Multi-armed bandit allocation indices. Wiley-Interscience Series in Systems and Optimization. Chichester: John Wiley & Sons. ISBN 0-471-92059-2.
Doucet, A., De Freitas, N., & Gordon, N. J. (2001). Sequential Monte Carlo Methods in Practice. Springer.
28. QUESTIONS?
That’s All Folks!
How do we make money?
If we understand this model well, Vegas is waiting!