METAGAMING:
Bandits with simple regret and small
budget
Chen-Wei Chou, Ping-Chiang Chou,
Chang-Shing Lee, David Lupien St-Pierre,
Olivier Teytaud, Mei-Hui Wang, Li-Wen Wu
and Shi-Jim Yen
Outline:
- what is a bandit problem ?
- what is a strategic bandit problem ?
- is a strategic bandit different from a bandit ?
- algorithms
- results
What is a bandit problem?
A finite number of time steps
A finite number of options,
each of them equipped with an unknown probability distribution
At each time step (exploration): here we collect information, so we explore
- you choose one option
- you get a reward, distributed according to that option's probability distribution
At the end (recommendation): here we use the information for the final choice, so we take no risk
- you choose one option (you cannot change it afterwards)
- your reward is the expected reward associated with this option
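The two phases described above (exploration, then a single recommendation) can be sketched as a small simulation. This is only an illustrative sketch, not the authors' code: the Bernoulli rewards, the names `run_bandit`, `uniform_explore` and `empirically_best`, and the example means are all assumptions made here.

```python
import random

def run_bandit(means, budget, explore, recommend, rng):
    """Run `budget` exploration steps, then one recommendation step."""
    k = len(means)
    counts = [0] * k   # how many times each option was tried
    sums = [0.0] * k   # total reward collected per option
    for t in range(budget):
        arm = explore(counts, sums, t, rng)
        # Assumption: rewards are Bernoulli with the (hidden) mean of the option.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    chosen = recommend(counts, sums)
    # The final payoff is the *expected* reward of the recommended option.
    return chosen, means[chosen]

def uniform_explore(counts, sums, t, rng):
    """Uniform exploration: cycle through the options."""
    return t % len(counts)

def empirically_best(counts, sums):
    """Recommend the option with the best empirical average."""
    avgs = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return max(range(len(avgs)), key=avgs.__getitem__)

rng = random.Random(0)
means = [0.2, 0.5, 0.8]  # hypothetical option means, unknown to the player
chosen, value = run_bandit(means, 300, uniform_explore, empirically_best, rng)
simple_regret = max(means) - value
```

Only the quality of the final recommendation is scored here; the rewards observed during exploration serve purely as information.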
Which kind of bandit?
- in the bandit literature, options are
also termed “arms”
- here the criterion is the expected reward
of the option chosen at the end (simple regret);
in other settings it is the sum
of the rewards collected during exploration (cumulative regret)
- what we presented here are stochastic bandits
(one probability distribution
per option) ==> the next slide is different
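The two criteria can be written side by side. The notation is an assumption introduced here, not taken from the slides: K options with expected rewards \mu_1, ..., \mu_K, \mu^* = \max_i \mu_i, I_t the option tried at exploration step t, and J_T the option recommended after T steps.

```latex
\[
  r_T = \mu^* - \mu_{J_T}
  \quad \text{(simple regret)},
  \qquad
  R_T = \sum_{t=1}^{T} \left( \mu^* - \mu_{I_t} \right)
  \quad \text{(cumulative regret)}.
\]
```

With simple regret, a poor choice during exploration costs nothing in itself; it only matters through its effect on the final recommendation J_T.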
And adversarial bandits?
A finite number of time steps
A finite number of options for player 1 (P1),
and a finite number of options for player 2 (P2).
An unknown probability distribution for each pair of options
At each time step:
- you choose one option for P1 and one option for P2
- you get a reward, distributed according to the
corresponding probability distribution
At the end:
- you choose one **probabilistic** option for P1, i.e. a probability distribution over P1's options
(you cannot change it afterwards)
- your reward is the expected reward associated with this option,
for the worst choice by P2
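The final criterion of the adversarial setting can be illustrated with a short sketch. It assumes, for illustration only, that the expected rewards are known and stored in a matrix `payoff` (in the real problem they are unknown and must be estimated by simulation); the function name and the example matrix are hypothetical.

```python
def worst_case_value(payoff, mix):
    """Expected reward of P1's mixed strategy `mix` against P2's best reply.

    payoff[i][j] is the expected reward for P1 when P1 plays option i
    and P2 plays option j (assumed known here, for illustration).
    """
    n_cols = len(payoff[0])
    # Expected reward of the mixed strategy against each pure option of P2.
    col_values = [
        sum(p * row[j] for p, row in zip(mix, payoff))
        for j in range(n_cols)
    ]
    # P2 picks the option that is worst for P1.
    return min(col_values)

# Hypothetical 2x2 game: P1 wins only when both players pick the same index.
payoff = [[1.0, 0.0],
          [0.0, 1.0]]
pure = worst_case_value(payoff, [1.0, 0.0])    # a deterministic choice
mixed = worst_case_value(payoff, [0.5, 0.5])   # a probabilistic choice
```

In this toy game the deterministic choice is fully exploited by P2, while the uniform mix guarantees half of the reward, which is why the recommendation is a probabilistic option.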
What is meta-gaming?
What is a “strategic choice”?
Strategic choices:
- decisions made once and for all, at a high level
- different from the tactical level
Meta-gaming: choice at a strategic level, in games:
- choosing cards, in card games
- choosing the handicap positioning, in Go
==> once and for all, at the beginning of the game
Example of stochastic bandit
(i.e. 1P strategic choice)
Game of Go handicap bandit problem, at each time step:
- you choose one handicap positioning
- then you simulate one game from this position
==> only one player has a strategic choice
==> stochastic bandit
Example of adversarial bandit
(i.e. 2P strategic choice)
Urban Rivals bandit problem, at each time step:
- you choose
- one set of cards for you (P1)
- one set of cards for P2
- then you simulate one Urban Rivals game from this position
[Figure: panels labeled "Player 1" and "Player 2"; images not recovered]
==> two players have a strategic choice
==> adversarial bandit
Is a strategic bandit problem
different from
a classical bandit problem ?
No difference in nature
Just a much
smaller budget
Algorithms
Reminder:
- two algorithms needed:
- one for choosing during N exploration steps
- one for choosing during 1 recommendation step
- two settings
- one-player case
- two-player case
Algorithms for exploration
Uniform: test all options uniformly
Bernstein races:
- sample uniformly among non-discarded options,
- discard options with statistical tests
Successive reject:
- sample uniformly among non-discarded options,
- periodically discard the worst option
UCB: choose the option with the best average result + bonus
for weakly sampled options
Adaptive-UCB-E: a variant of UCB aimed at removing
hyper-parameters
EXP3: empirically best option + random perturbation
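As an illustration of the "average + bonus" idea, here is a minimal UCB exploration sketch. It uses the classical UCB1-style bonus sqrt(c * log(t) / n); the constant `c`, the Bernoulli rewards and the example means are assumptions, and the exact variant used in the paper may differ.

```python
import math
import random

def ucb_choose(counts, sums, t, c=2.0):
    """Pick the option maximizing average + bonus; untried options go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    def index(arm):
        mean = sums[arm] / counts[arm]
        # The bonus is large for weakly sampled options and shrinks as counts grow.
        bonus = math.sqrt(c * math.log(t + 1) / counts[arm])
        return mean + bonus

    return max(range(len(counts)), key=index)

rng = random.Random(1)
means = [0.3, 0.7]          # hypothetical option means, unknown to the player
counts = [0, 0]
sums = [0.0, 0.0]
for t in range(200):        # exploration budget
    arm = ucb_choose(counts, sums, t)
    counts[arm] += 1
    sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
```

After the loop, `counts` and `sums` hold the statistics that a recommendation rule would use for the final choice.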
Algorithms for recommendation
Empirically Best Arm (EBA): choose the empirically best option
Most Played Arm (MPA): choose the most simulated option
Successive reject: the only non-discarded option
UCB: choose the option with the best average result + bonus
for weakly sampled options
LCB: choose the option with the best average result + penalty (malus) for
weakly sampled options
Empirical distribution of play: an option has its
frequency (during exploration) as probability (for
recommendation)
TEXP3: same as the empirical distribution of play, but discard low-probability options
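Several of these recommendation rules fit in a few lines each. The sketch below is illustrative only: the LCB penalty sqrt(c * log(total) / n) and the fixed `threshold` used for truncation are assumptions (the slides do not give the exact formulas), and all function names are hypothetical.

```python
import math

def empirically_best_arm(counts, sums):
    """EBA: option with the best empirical average."""
    avg = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    return max(range(len(avg)), key=avg.__getitem__)

def most_played_arm(counts, sums):
    """MPA: option simulated most often during exploration."""
    return max(range(len(counts)), key=counts.__getitem__)

def lcb_arm(counts, sums, c=2.0):
    """LCB: best average minus a penalty for weakly sampled options."""
    total = sum(counts)

    def index(arm):
        if counts[arm] == 0:
            return float("-inf")  # never recommend an untried option
        penalty = math.sqrt(c * math.log(total) / counts[arm])
        return sums[arm] / counts[arm] - penalty

    return max(range(len(counts)), key=index)

def empirical_distribution(counts):
    """Empirical distribution of play: exploration frequencies as probabilities."""
    total = sum(counts)
    return [n / total for n in counts]

def truncated_distribution(counts, threshold):
    """Truncation in the spirit of TEXP3: drop low-probability options, renormalize."""
    kept = [p if p >= threshold else 0.0 for p in empirical_distribution(counts)]
    z = sum(kept)
    if z == 0.0:
        raise ValueError("threshold discards every option")
    return [p / z for p in kept]

# Hypothetical exploration statistics: option 0 looks great but was tried twice.
counts = [2, 50, 48]
sums = [2.0, 30.0, 24.0]
eba = empirically_best_arm(counts, sums)
mpa = most_played_arm(counts, sums)
lcb = lcb_arm(counts, sums)
probs = truncated_distribution(counts, threshold=0.05)
```

In this example EBA picks the barely sampled option 0, while MPA and LCB both pick option 1, which shows why penalizing uncertainty can matter when the budget is small.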
Experimental results
Big boring tables of results
are in the paper.
Only a sample of the clearest
results is shown here.
One-player case: Killall Go stone positioning
[Figure: results tables not recovered]
- Uncertainty should have a penalty (malus) in the recommendation.
- EXP3 is for the two-player case.
Experimental results: TEXP3
outperforms EXP3 by far
2-player case, game =
Urban Rivals (free online card game)
Do you know Killall Go?
Black has stones in advance (e.g. 8 in 13x13).
If White makes life, White wins.
If Black kills everything, Black wins.
Black chooses the positioning of its stones
(a strategic decision).
Left: human is Black and chooses E3 C4.
Right: computer is Black and chooses D3 D5.
White won both.
Human said that the computer choice D3 D5 is good.
Killall Go, H8 (left) H9 (right)
Left: Human Pro Player (5P) as black has 8 handicap stones.
White (computer) makes life and wins.
Right: Human Pro Player (5P) as black has 9 handicap stones
and kills everything and wins.
CONCLUSIONS
1 player case:
UCB for exploration,
LCB or MPA for recommendation
2 player case:
TEXP3 performs best.
Killall-Go
Win against pro with H2 in 7x7 Killall-Go as white.
Loss against pro with H2 in 7x7 Killall-Go as black.
13x13: Computer won as white with H8, lost with H9.
13x13: Computer lost as black with H8 and with H9.
Further work:
Structured bandit: some options are close to each other.
Batoo: Go with strategic choice for both players; nice test case.
Industry: choosing investments for power grid simulations – in progress.


Choosing between several options in uncertain environments
