MULTI-ARMED BANDITS FOR FUN AND PROFIT
JANANI SRIRAM
MAD STREET DEN
MULTI-ARMED BANDITS
OVERVIEW
▸ Overview - Background, Introduction and Formulation
▸ Optimality - Gittins Index
▸ Optimization Strategies
▸ Epsilon greedy
▸ Upper Confidence Bound
▸ Boltzmann Exploration
▸ Bayesian Bandits
OVERVIEW
EXPLORATION VS. EXPLOITATION
▸ Tradeoff between the necessity to try out all arms
and to minimize the total regret suffered due to
sub-optimal arms.
▸ The agent can gain knowledge about the
environment only by pulling an arm.
▸ But by pulling a bad arm it suffers some regret.
▸ An algorithm that explores forever or exploits forever will have linear total regret
▸ Usage Scenarios
▸ Clinical Trials
▸ A/B Testing online ads*
▸ Restaurant selection
▸ Feynman’s restaurant problem
$$V^*(s) = R(s) + \max_a \sum_{s'} P(s' \mid s, a)\, V^*(s')$$
* https://support.google.com/analytics/answer/2844870?hl=en
OVERVIEW
MARKOV DECISION PROCESSES
▸ Sequential decision-making process with the stationary Markov property, described by the tuple $(S, A, \Pr, R, \gamma)$:
▸ States: $S = \{s_1, s_2, \ldots, s_n\}$
▸ Transition model: $P^a_{ss'} = P(S_{t+1} = s' \mid S_t = s, A_t = a)$
▸ Reward: $R^a_s = E(R_{t+1} \mid S_t = s, A_t = a)$
▸ Actions: $A = \{a_1, a_2, \ldots, a_n\}$
▸ Discount factor: $\gamma \in [0, 1]$
* Reinforcement Learning: An Introduction, Sutton and Barto [1998]
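A minimal sketch of how such a tuple might be represented and solved by value iteration. The two-state MDP, its numbers, and the names `value_iteration`, `P` and `R` are hypothetical, chosen only for illustration; the backup uses a state-reward form, $V(s) = R(s) + \gamma \max_a \sum_{s'} P(s' \mid s, a) V(s')$, with the discount written explicitly.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality backup until the values stop changing.

    P[(s, a)] maps next states to probabilities; R[s] is the state reward.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Best expected next-state value over all actions.
            best = max(
                sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in actions
            )
            new_v = R[s] + gamma * best
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


# Hypothetical two-state MDP: "stay" keeps the state, "move" switches it.
states = ["s1", "s2"]
actions = ["stay", "move"]
P = {
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s2": 1.0},
    ("s2", "stay"): {"s2": 1.0},
    ("s2", "move"): {"s1": 1.0},
}
R = {"s1": 0.0, "s2": 1.0}

V = value_iteration(states, actions, P, R, gamma=0.9)
```

In this toy case the rewarding state `s2` is worth roughly 1 / (1 - 0.9) = 10, and `s1` is worth 0.9 times that, since the best move from `s1` is to switch.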
OVERVIEW
WHAT ARE BANDITS?
▸ Originally described by Robbins [1952]
▸ A gambler is faced with K slot
machines each with an unknown
distribution of rewards. The goal is to
maximize cumulative rewards over a
finite number of trials (horizon T).
▸ A Bernoulli bandit is a special case of
MAB that has a Bernoulli distributed
reward.
▸ Stochastic MABs - Each arm k is associated with an unknown probability. Rewards are drawn i.i.d. from $v_k \in [0, 1]$
▸ Adversarial Bandits - Rewards are generated by an adversary.
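A Bernoulli bandit of this kind could be simulated as follows. This is only a sketch: the class name `BernoulliBandit`, the arm probabilities and the seed are assumptions made for illustration.

```python
import random


class BernoulliBandit:
    """K-armed bandit; arm k pays 1 with probability probs[k], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)   # private RNG for reproducibility

    @property
    def n_arms(self):
        return len(self.probs)

    def pull(self, k):
        # Draw an i.i.d. Bernoulli reward for arm k.
        return 1 if self.rng.random() < self.probs[k] else 0


# Hypothetical 3-armed bandit; always pulling the last arm for T trials.
bandit = BernoulliBandit([0.2, 0.5, 0.8], seed=42)
T = 1000
total = sum(bandit.pull(2) for _ in range(T))
```

The gambler does not see `probs`; the strategies in the rest of the deck only observe what `pull` returns.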
OVERVIEW
BACKGROUND
▸ Notation
Trials: $t \in \{1, 2, \ldots, T\}$
Choice: $i_t \in \{1, 2, \ldots, K\}$
Reward: $r_{i_t} \in \mathbb{R}$ for chosen arm i at trial t
▸ Goal: To maximize total reward $\sum_{t=1}^{T} r_{i_t}$
▸ Or minimize total expected regret (optimal - obtained reward)
$i^* = \arg\max_{i=1,\ldots,K} \mu_i$, $\mu^* = \max_{i=1,\ldots,K} \mu_i$, $\Delta_i = \mu^* - \mu_i$
Regret: $T\mu^* - \sum_{t=1}^{T} E[\mu_{i_t}] = \sum_{j=1}^{K} \Delta_j E[T_j(T)]$, where $T_j(T)$ is the no. of times arm j is selected
▸ Lai and Robbins [1985] showed that optimal regret is $O(\log T)$
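The two forms of the regret can be checked against each other numerically. The sketch below uses made-up arm means and pull counts, and the function name `expected_regret` is hypothetical.

```python
def expected_regret(means, pull_counts):
    """Gap-weighted form: sum over arms j of (mu* - mu_j) * T_j(T)."""
    best = max(means)
    return sum((best - mu) * n for mu, n in zip(means, pull_counts))


# Hypothetical example: three arms, T = 1000 pulls in total.
means = [0.2, 0.5, 0.8]
pull_counts = [10, 30, 960]
T = sum(pull_counts)

regret = expected_regret(means, pull_counts)

# Direct form: T * mu* minus the summed means of the arms actually played.
direct = T * max(means) - sum(mu * n for mu, n in zip(means, pull_counts))
# Both forms give 15 here (up to floating-point rounding).
```

Only the pulls of sub-optimal arms contribute, which is why a good strategy needs $E[T_j(T)]$ to grow slowly for every arm with $\Delta_j > 0$.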
STRATEGIES
EPSILON GREEDY
▸ Select initial empirical means for each arm i, $\hat{\mu}_i(0)$
▸ At time t, with probability $1 - \epsilon_t$ play the arm with the highest empirical mean, and with probability $\epsilon_t$ play a random arm
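A minimal sketch of the rule above on simulated Bernoulli arms. It assumes a constant exploration rate `epsilon` instead of a schedule $\epsilon_t$ and initial empirical means of zero; the function name, arm probabilities and seed are hypothetical.

```python
import random


def epsilon_greedy(probs, T, epsilon=0.1, seed=0):
    """Run epsilon-greedy for T trials on Bernoulli arms with means `probs`."""
    rng = random.Random(seed)
    K = len(probs)
    counts = [0] * K
    means = [0.0] * K            # initial empirical means (assumed to be 0)
    total_reward = 0
    for _ in range(T):
        if rng.random() < epsilon:
            arm = rng.randrange(K)                        # explore: random arm
        else:
            arm = max(range(K), key=lambda i: means[i])   # exploit: best mean
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return counts, means, total_reward


counts, means, total_reward = epsilon_greedy([0.1, 0.9], T=5000, epsilon=0.1, seed=1)
```

With a constant `epsilon` the strategy never stops exploring, so its regret grows linearly in T; a decaying $\epsilon_t$ is one way to avoid that.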
BOLTZMANN EXPLORATION
▸ At trial t, arm k is selected with probability given by the Gibbs distribution
$$p_k = \frac{e^{\hat{\mu}_k(t)/\tau}}{\sum_{j=1}^{K} e^{\hat{\mu}_j(t)/\tau}}, \quad k = 1, \ldots, K$$
▸ $\tau$ is a temperature parameter controlling the randomness of the choice
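The selection probabilities can be computed with a numerically stable softmax. This is a sketch only; the function names and the example means are hypothetical.

```python
import math
import random


def boltzmann_probs(means, tau):
    """Gibbs (softmax) probabilities, p_k proportional to exp(mean_k / tau)."""
    m = max(means)   # subtracting the max avoids overflow; ratios are unchanged
    weights = [math.exp((mu - m) / tau) for mu in means]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(means, tau, rng=random):
    """Sample one arm index according to the Boltzmann probabilities."""
    arms = range(len(means))
    return rng.choices(arms, weights=boltzmann_probs(means, tau))[0]


# Hypothetical empirical means for two arms.
cold = boltzmann_probs([0.2, 0.8], tau=0.01)    # low temperature: near-greedy
hot = boltzmann_probs([0.2, 0.8], tau=100.0)    # high temperature: near-uniform
arm = boltzmann_select([0.2, 0.8], tau=0.1, rng=random.Random(0))
```

A small `tau` makes the choice almost greedy, while a large `tau` makes it almost uniform.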
STRATEGIES
UPPER CONFIDENCE BOUND
▸ ‘Optimism in the face of uncertainty’.
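▸ Chernoff-Hoeffding bound on deviation from mean: $P(\bar{Y}_n + a \le \mu) \le e^{-2na^2}$ for the mean $\bar{Y}_n$ of n rewards in $[0, 1]$
▸ Algorithm:
▸ Setup: Select empirical mean payoffs for each arm i, $\hat{\mu}_i$
▸ For each round t, pick the arm $j(t) = \arg\max_i \left( \hat{\mu}_i + \sqrt{\frac{2 \ln t}{n_i}} \right)$, where $n_i$ is the number of times arm i has been played; $\hat{\mu}_i$ is the knowledge term and $\sqrt{2 \ln t / n_i}$ the uncertainty term
▸ Regret is $O(\log n)$, the same order as the optimal lower bound
*Using Confidence Bounds for Exploitation-Exploration Trade-offs Auer, Cesa-Bianchi & Fischer [2002]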
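A sketch of the UCB selection rule on simulated Bernoulli arms. It assumes each arm is played once at the start so that every $n_i$ is positive; the function name `ucb1`, the arm probabilities and the seed are hypothetical.

```python
import math
import random


def ucb1(probs, T, seed=0):
    """Play T rounds, choosing the arm with the highest upper confidence bound."""
    rng = random.Random(seed)
    K = len(probs)
    counts = [0] * K
    means = [0.0] * K
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1          # initialisation: play every arm once
        else:
            # Empirical mean (knowledge) plus confidence width (uncertainty).
            arm = max(
                range(K),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


counts, means = ucb1([0.1, 0.9], T=5000, seed=1)
```

Rarely played arms keep a wide confidence term, so they are revisited occasionally without any explicit randomisation.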
STRATEGIES
BAYESIAN BANDITS
▸ Assume a prior distribution on parameters $P(\theta)$
▸ The likelihood of reward is given by $P(r \mid a, \theta)$
▸ Sample from the posterior distribution and update priors
▸ For bandits with Bernoulli rewards start with standard conjugate prior - Beta distribution. The posterior is also a Beta distribution.
pdf of a Beta distribution with parameters $\alpha > 0$, $\beta > 0$:
$$f(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}$$
[Figure: Beta pdfs. red: $\alpha = 2, \beta = 2$; green: $\alpha = 12, \beta = 12$; blue: $\alpha = 102, \beta = 102$]
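Because the Beta prior is conjugate to the Bernoulli likelihood, the posterior update is just a pair of counter increments. The sketch below starts from a Beta(2, 2) prior and a made-up reward sequence; both are arbitrary choices for illustration, as are the function names.

```python
def beta_update(alpha, beta, reward):
    """Posterior Beta parameters after observing one Bernoulli reward (0 or 1)."""
    return alpha + reward, beta + (1 - reward)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


alpha, beta = 2, 2                 # assumed prior
for r in [1, 1, 0, 1]:             # hypothetical observed rewards
    alpha, beta = beta_update(alpha, beta, r)

posterior_mean = beta_mean(alpha, beta)   # 5 / (5 + 3) = 0.625
```

Each success adds one to $\alpha$ and each failure adds one to $\beta$, so larger parameters correspond to a more concentrated belief about the arm's success probability.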
STRATEGIES
GITTINS INDEX (INFORMATION STATE SEARCH)
▸ Goal: to maximize the total expected discounted reward
▸ Reduces to solving the stopping problem
▸ Bayesian adaptive MDP: Assume prior on reward distribution and geometric discounting. Each state transition is a Bayes model update. For Bernoulli bandits this means a Beta prior,
$$\pi(r \mid \alpha, \beta) = \frac{r^{\alpha - 1} (1 - r)^{\beta - 1}}{B(\alpha, \beta)}$$
where B is the Beta function
▸ Optimal policy: Select the arm that maximizes the Gittins dynamic allocation index, which is a normalized sum of time-discounted reward.
▸ For arm i,
$$v_i = \max_{\tau > 0} \frac{E\left( \sum_{t=0}^{\tau - 1} \gamma^t\, r_i^t(x_i^t) \right)}{E\left[ \sum_{t=0}^{\tau - 1} \gamma^t \right]}$$
$\gamma$: Reward discount parameter
$\tau$: Stopping time
STRATEGIES
THOMPSON SAMPLING (PROBABILITY MATCHING)
▸ Start with a prior belief on parameters of the distribution
▸ Play arm according to probability that it is optimal
▸ After every trial, observe a reward and do a Bayesian update
▸ Shown to have logarithmic expected regret [Agrawal 2012]
$$a_t = \arg\max_a E(r \mid a, \theta_t)$$
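For Bernoulli arms the expected reward of arm a given the parameters is just the sampled success probability of that arm, so the rule reduces to drawing one sample per arm from its Beta posterior and playing the largest. A minimal sketch, assuming a uniform Beta(1, 1) prior on every arm; the function name, arm probabilities and seed are hypothetical.

```python
import random


def thompson_sampling(probs, T, seed=0):
    """Thompson sampling for Bernoulli arms with Beta posteriors."""
    rng = random.Random(seed)
    K = len(probs)
    alpha = [1] * K    # Beta(1, 1) prior per arm (an assumption)
    beta = [1] * K
    counts = [0] * K
    for _ in range(T):
        # Sample a plausible success probability for each arm.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(K)]
        arm = max(range(K), key=lambda i: samples[i])
        reward = 1 if rng.random() < probs[arm] else 0
        # Bayesian update of the played arm's posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        counts[arm] += 1
    return counts, alpha, beta


counts, alpha, beta = thompson_sampling([0.1, 0.9], T=2000, seed=1)
```

An arm is played with the probability that its sample is the largest, which is the probability-matching behaviour named in the slide title.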
STRATEGIES
THOMPSON SAMPLING
▸ Simulation from http://bit.ly/2fqR57P
CONCLUSION
REFERENCES
▸ D. Berry and B. Fristedt. Bandit Problems. Chapman and Hall, 1985.
▸ J. Gittins. Multi-armed Bandit Allocation Indices. Wiley, 1989.
▸ Lai and Robbins. Asymptotically Efficient Adaptive Allocation Rules.
▸ Shipra Agrawal and Navin Goyal. Analysis of Thompson Sampling for the Multi-armed Bandit Problem.
▸ Volodymyr Kuleshov and Doina Precup. Algorithms for the Multi-armed Bandit Problem.
▸ P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time Analysis of the Multi-armed Bandit Problem.
THANK YOU
