SlideShare una empresa de Scribd logo
1 de 19
Descargar para leer sin conexión
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
1/29
Multi Armed Bandits
Alireza Shafaei
Machine Learning Reading Group
The University of British Columbia
Summer 2017
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
2/29
Outline
A Quick Review
Online Convex Optimization (OCO)
Measuring The Performance
Bandit Convex Optimization
Motivation
Multi Armed Bandit
A Simple MAB Algorithm
EXP3
Stochastic Multi Armed Bandit
Definition
Bernoulli Multi Armed Bandit
Algorithms
Contextual Bandits
Motivation
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
4/29
Online Convex Optimization
Definition
At each iteration t, the player chooses xt ∈ K.
A convex loss function ft ∈ F : K → R is revealed.
A cost ft(xt) is incurred.
→ F is a set of bounded functions.
→ ft is revealed after choosing xt.
→ ft can be adversarially chosen.
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
6/29
Online Convex Optimization
Regret
Given an algorithm A.
Regret of A after T iterations is defined as:
regretT (A) := sup
{fi }T
i=1∈F
{
T
t=1
ft(xt) − min
x∈K
T
t=1
ft(x)}
Online Gradient Descent O(GD
√
T).
If fi is α-strongly convex then O(G2
2α (1 + log(T)).
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
7/29
Bandit Convex Optimization
Definition1
In OCO we had access to ft(xt).
in BCO we only observe ft(xt).
Multi Armed Bandit further constrains the BCO setting.
1
Material from [1].
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
9/29
Bandit Convex Optimization
Motivation
In data networks, the decision maker can measure the
RTD of a packet, but rarely has access to the
congestion pattern of the entire network.
In ad-placement, the search engine can inspect which
ads were clicked through, but cannot know whether
different ads, had they been chosen, would have been
click through or not.
Given a fixed budget, how to allocate resources among
the research projects whose outcome is only partially
known at the time of allocation and may change
through time.
Wikipedia. Originally considered by Allied scientists in
World War II, it proved so intractable that, according to
Peter Whittle, the problem was proposed to be dropped
over Germany so that German scientists could also
waste their time on it.
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
11/29
Multi Armed Bandit (MAB)
Definition
On each iteration t, the player choses an action it from
a predefined set of discrete actions {1, . . . , n}.
An adversary, independently, chooses a loss ∈ [0, 1] for
each action.
The loss associated with it is then revealed to the
player.
There are a variety of MAB specifications with various
assumptions and constraints.
This definition is similar to the multi-expert problem,
except we do not observe the loss associated with the
other experts.
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
12/29
MAB as a BCO
The algorithms usually choose an action w.r.t a
distribution over the actions.
If we define K = ∆n, an n-dimensional simplex then
ft(x) = t x =
n
i=1
t(i)x(i) ∀x ∈ K
We have an exploration-exploitation trade-off.
A simple approach would be to
→ Exploration With some probability, explore by choosing
actions uniformly at random. Construct an estimate of
the actions’ losses with the feedback.
→ Exploitation Otherwise, use the estimates to make a
decision.
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
14/29
A Simple MAB Algorithm
Definition
An algorithm can be constructed:
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
15/29
A Simple MAB Algorithm
Analysis
This algorithm guarantees:
E[
T
t=1
t(it) − min
i
T
t=1
t(i)] ≤ O(T
3
4
√
n)
E[ˆt(i)] = P[bt = 1] · P[it = i|bt = 1] ·
n
δ
t(i) = t(i)
ˆt 2 ≤
n
δ
· | t(it)| ≤
n
δ
E[ˆft] = ft
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
16/29
A Simple MAB Algorithm
Analysis
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
18/29
EXP3
Has a worst-case near-optimal regret bound of
O(
√
Tn log n).
See P104 of the OCO book for proof.
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
20/29
Stochastic Multi Armed Bandit
Definition
On each iteration t, the player choses an action it from
a predefined set of discrete actions {1, . . . , n}.
Each action i has an underlying (fixed) probability
distribution Pi with mean µi .
The loss associated with it is then revealed to the
player. (A sample is taken from Pit ).
Pi ’s could be a simple Bernoulli variable.
A more complex version could assume a Markov process
for each action, within which the state of one or all
processes change after each iteration.
We still have the exploration-exploitation trade-off.
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
22/29
Bernoulli Multi Armed Bandit
Definition2
N[a] The number of times arm a is pulled.
Q[a] The running average of rewards for arm a.
S[a] The number of successes for arm a.
F[a] The number of failures for arm a.
The same notion of regret, except the optimal strategy
is to pull the arm with the largest mean, µ∗.
2
Material from [2].
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
24/29
Algorithms
Random Selection ⇒ O(T).
Greedy Selection ⇒ O(T).
-Greedy Selection ⇒ O(T).
Boltzmann Exploration ⇒ O(T).
Upper-Confidence Bound ⇒ O(ln(T)).
Thompson Sampling ⇒ O(ln(T)).
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
25/29
Algorithms
Empirical Evaluation
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
27/29
Contextual Bandits
Motivation
There are scenarios within which we can have access to
more information.
The extra information can be encoded as a context
vector.
In online advertising, the behaviour of each user, or the
search context for instance, can provide valuable
information.
One simple way is to treat each context having its own
bandit problem.
Variations of the previous algorithms relate the context
vector with the expected reward through linear models,
neural networks, kernels, or random forests.
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
28/29
Thanks
Thanks!
Questions?
Multi Armed
Bandits
Alireza Shafaei
A Quick Review
Online Convex
Optimization (OCO)
Measuring The
Performance
Bandit Convex
Optimization
Motivation
Multi Armed Bandit
A Simple MAB
Algorithm
EXP3
Stochastic Multi
Armed Bandit
Definition
Bernoulli Multi Armed
Bandit
Algorithms
Contextual Bandits
Motivation
29/29
References I
E. Hazan.
Introduction to Online Convex Optimization.
http://ocobook.cs.princeton.edu/OCObook.pdf
S. Raja.
Multi Armed Bandits and Exploration Strategies.
https://sudeepraja.github.io/Bandits/

Más contenido relacionado

Similar a Multi Armed Bandits

Multi-Armed Bandit: an algorithmic perspective
Multi-Armed Bandit: an algorithmic perspectiveMulti-Armed Bandit: an algorithmic perspective
Multi-Armed Bandit: an algorithmic perspectiveGabriele Sottocornola
 
Claudia Vicol - Solving the multi armed bandit problem - Codemotion Amsterdam...
Claudia Vicol - Solving the multi armed bandit problem - Codemotion Amsterdam...Claudia Vicol - Solving the multi armed bandit problem - Codemotion Amsterdam...
Claudia Vicol - Solving the multi armed bandit problem - Codemotion Amsterdam...Codemotion
 
Music Recommender Systems
Music Recommender SystemsMusic Recommender Systems
Music Recommender Systemsfuchaoqun
 
Building and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility AnalyticsBuilding and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility AnalyticsEmiliano De Cristofaro
 
Two Step Gravitational Search Algorithm Essay
Two Step Gravitational Search Algorithm EssayTwo Step Gravitational Search Algorithm Essay
Two Step Gravitational Search Algorithm EssayKristen Wilson
 
Using Diversity for Automated Boundary Value Testing
Using Diversity for Automated Boundary Value TestingUsing Diversity for Automated Boundary Value Testing
Using Diversity for Automated Boundary Value TestingFelix Dobslaw
 
Nt1330 Week 2 Agression Analysis Paper
Nt1330 Week 2 Agression Analysis PaperNt1330 Week 2 Agression Analysis Paper
Nt1330 Week 2 Agression Analysis PaperAshley Richards
 
Bandit algorithms for website optimization - A summary
Bandit algorithms for website optimization - A summaryBandit algorithms for website optimization - A summary
Bandit algorithms for website optimization - A summaryKwanghee Choi
 
Monte Carlo Tree Search for the Super Mario Bros
Monte Carlo Tree Search for the Super Mario BrosMonte Carlo Tree Search for the Super Mario Bros
Monte Carlo Tree Search for the Super Mario BrosChih-Sheng Lin
 
SIGEVOlution Summer 2007
SIGEVOlution Summer 2007SIGEVOlution Summer 2007
SIGEVOlution Summer 2007Pier Luca Lanzi
 
List intersection for web search: Algorithms, Cost Models, and Optimizations
List intersection for web search: Algorithms, Cost Models, and OptimizationsList intersection for web search: Algorithms, Cost Models, and Optimizations
List intersection for web search: Algorithms, Cost Models, and OptimizationsSunghwan Kim
 
Laplace Daemon: from a math theory to AI practice
Laplace Daemon: from a math theory to AI practiceLaplace Daemon: from a math theory to AI practice
Laplace Daemon: from a math theory to AI practiceEric Horesnyi
 
[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAPUniversity of Bologna
 
Neural Networks Are Used For Forecasting
Neural Networks Are Used For ForecastingNeural Networks Are Used For Forecasting
Neural Networks Are Used For ForecastingStephanie Roberts
 
Clonal Selection Algorithm Parallelization with MPJExpress
Clonal Selection Algorithm Parallelization with MPJExpressClonal Selection Algorithm Parallelization with MPJExpress
Clonal Selection Algorithm Parallelization with MPJExpressAyi Purbasari
 
Essay On Biomechanics Of Running
Essay On Biomechanics Of RunningEssay On Biomechanics Of Running
Essay On Biomechanics Of RunningKrystal Ellison
 

Similar a Multi Armed Bandits (20)

Multi-Armed Bandit: an algorithmic perspective
Multi-Armed Bandit: an algorithmic perspectiveMulti-Armed Bandit: an algorithmic perspective
Multi-Armed Bandit: an algorithmic perspective
 
Claudia Vicol - Solving the multi armed bandit problem - Codemotion Amsterdam...
Claudia Vicol - Solving the multi armed bandit problem - Codemotion Amsterdam...Claudia Vicol - Solving the multi armed bandit problem - Codemotion Amsterdam...
Claudia Vicol - Solving the multi armed bandit problem - Codemotion Amsterdam...
 
Music Recommender Systems
Music Recommender SystemsMusic Recommender Systems
Music Recommender Systems
 
Ag35183189
Ag35183189Ag35183189
Ag35183189
 
Building and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility AnalyticsBuilding and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility Analytics
 
Two Step Gravitational Search Algorithm Essay
Two Step Gravitational Search Algorithm EssayTwo Step Gravitational Search Algorithm Essay
Two Step Gravitational Search Algorithm Essay
 
Security of Machine Learning
Security of Machine LearningSecurity of Machine Learning
Security of Machine Learning
 
Using Diversity for Automated Boundary Value Testing
Using Diversity for Automated Boundary Value TestingUsing Diversity for Automated Boundary Value Testing
Using Diversity for Automated Boundary Value Testing
 
Nt1330 Week 2 Agression Analysis Paper
Nt1330 Week 2 Agression Analysis PaperNt1330 Week 2 Agression Analysis Paper
Nt1330 Week 2 Agression Analysis Paper
 
Bandit algorithms for website optimization - A summary
Bandit algorithms for website optimization - A summaryBandit algorithms for website optimization - A summary
Bandit algorithms for website optimization - A summary
 
06FPBasic02.pdf
06FPBasic02.pdf06FPBasic02.pdf
06FPBasic02.pdf
 
J011119198
J011119198J011119198
J011119198
 
Monte Carlo Tree Search for the Super Mario Bros
Monte Carlo Tree Search for the Super Mario BrosMonte Carlo Tree Search for the Super Mario Bros
Monte Carlo Tree Search for the Super Mario Bros
 
SIGEVOlution Summer 2007
SIGEVOlution Summer 2007SIGEVOlution Summer 2007
SIGEVOlution Summer 2007
 
List intersection for web search: Algorithms, Cost Models, and Optimizations
List intersection for web search: Algorithms, Cost Models, and OptimizationsList intersection for web search: Algorithms, Cost Models, and Optimizations
List intersection for web search: Algorithms, Cost Models, and Optimizations
 
Laplace Daemon: from a math theory to AI practice
Laplace Daemon: from a math theory to AI practiceLaplace Daemon: from a math theory to AI practice
Laplace Daemon: from a math theory to AI practice
 
[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP
 
Neural Networks Are Used For Forecasting
Neural Networks Are Used For ForecastingNeural Networks Are Used For Forecasting
Neural Networks Are Used For Forecasting
 
Clonal Selection Algorithm Parallelization with MPJExpress
Clonal Selection Algorithm Parallelization with MPJExpressClonal Selection Algorithm Parallelization with MPJExpress
Clonal Selection Algorithm Parallelization with MPJExpress
 
Essay On Biomechanics Of Running
Essay On Biomechanics Of RunningEssay On Biomechanics Of Running
Essay On Biomechanics Of Running
 

Último

CW marking grid Analytical BS - M Ahmad.docx
CW  marking grid Analytical BS - M Ahmad.docxCW  marking grid Analytical BS - M Ahmad.docx
CW marking grid Analytical BS - M Ahmad.docxmarwaahmad357
 
Pests of wheat_Identification, Bionomics, Damage symptoms, IPM_Dr.UPR.pdf
Pests of wheat_Identification, Bionomics, Damage symptoms, IPM_Dr.UPR.pdfPests of wheat_Identification, Bionomics, Damage symptoms, IPM_Dr.UPR.pdf
Pests of wheat_Identification, Bionomics, Damage symptoms, IPM_Dr.UPR.pdfPirithiRaju
 
Role of herbs in hair care Amla and heena.pptx
Role of herbs in hair care  Amla and  heena.pptxRole of herbs in hair care  Amla and  heena.pptx
Role of herbs in hair care Amla and heena.pptxVaishnaviAware
 
Physics Serway Jewett 6th edition for Scientists and Engineers
Physics Serway Jewett 6th edition for Scientists and EngineersPhysics Serway Jewett 6th edition for Scientists and Engineers
Physics Serway Jewett 6th edition for Scientists and EngineersAndreaLucarelli
 
Pests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPRPests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPRPirithiRaju
 
Contracts with Interdependent Preferences (2)
Contracts with Interdependent Preferences (2)Contracts with Interdependent Preferences (2)
Contracts with Interdependent Preferences (2)GRAPE
 
Gene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdfGene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdfNetHelix
 
Substances in Common Use for Shahu College Screening Test
Substances in Common Use for Shahu College Screening TestSubstances in Common Use for Shahu College Screening Test
Substances in Common Use for Shahu College Screening TestAkashDTejwani
 
Pests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPRPests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPRPirithiRaju
 
IB Biology New syllabus B3.2 Transport.pptx
IB Biology New syllabus B3.2 Transport.pptxIB Biology New syllabus B3.2 Transport.pptx
IB Biology New syllabus B3.2 Transport.pptxUalikhanKalkhojayev1
 
Pests of Redgram_Identification, Binomics_Dr.UPR
Pests of Redgram_Identification, Binomics_Dr.UPRPests of Redgram_Identification, Binomics_Dr.UPR
Pests of Redgram_Identification, Binomics_Dr.UPRPirithiRaju
 
Application of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxApplication of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxRahulVishwakarma71547
 
TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)chatterjeesoumili50
 
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky WayShiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky WaySérgio Sacani
 
Human brain.. It's parts and function.
Human brain.. It's parts and function. Human brain.. It's parts and function.
Human brain.. It's parts and function. MUKTA MANJARI SAHOO
 
Basic Concepts in Pharmacology in molecular .pptx
Basic Concepts in Pharmacology in molecular  .pptxBasic Concepts in Pharmacology in molecular  .pptx
Basic Concepts in Pharmacology in molecular .pptxVijayaKumarR28
 
Applied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxApplied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxmarwaahmad357
 

Último (20)

CW marking grid Analytical BS - M Ahmad.docx
CW  marking grid Analytical BS - M Ahmad.docxCW  marking grid Analytical BS - M Ahmad.docx
CW marking grid Analytical BS - M Ahmad.docx
 
Pests of wheat_Identification, Bionomics, Damage symptoms, IPM_Dr.UPR.pdf
Pests of wheat_Identification, Bionomics, Damage symptoms, IPM_Dr.UPR.pdfPests of wheat_Identification, Bionomics, Damage symptoms, IPM_Dr.UPR.pdf
Pests of wheat_Identification, Bionomics, Damage symptoms, IPM_Dr.UPR.pdf
 
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
 
Role of herbs in hair care Amla and heena.pptx
Role of herbs in hair care  Amla and  heena.pptxRole of herbs in hair care  Amla and  heena.pptx
Role of herbs in hair care Amla and heena.pptx
 
Physics Serway Jewett 6th edition for Scientists and Engineers
Physics Serway Jewett 6th edition for Scientists and EngineersPhysics Serway Jewett 6th edition for Scientists and Engineers
Physics Serway Jewett 6th edition for Scientists and Engineers
 
Pests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPRPests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPR
 
Contracts with Interdependent Preferences (2)
Contracts with Interdependent Preferences (2)Contracts with Interdependent Preferences (2)
Contracts with Interdependent Preferences (2)
 
Gene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdfGene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdf
 
Substances in Common Use for Shahu College Screening Test
Substances in Common Use for Shahu College Screening TestSubstances in Common Use for Shahu College Screening Test
Substances in Common Use for Shahu College Screening Test
 
Pests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPRPests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPR
 
IB Biology New syllabus B3.2 Transport.pptx
IB Biology New syllabus B3.2 Transport.pptxIB Biology New syllabus B3.2 Transport.pptx
IB Biology New syllabus B3.2 Transport.pptx
 
Pests of Redgram_Identification, Binomics_Dr.UPR
Pests of Redgram_Identification, Binomics_Dr.UPRPests of Redgram_Identification, Binomics_Dr.UPR
Pests of Redgram_Identification, Binomics_Dr.UPR
 
Application of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxApplication of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptx
 
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
 
TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)
 
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
Data delivery from the US-EPA Center for Computational Toxicology and Exposur...
 
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky WayShiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
 
Human brain.. It's parts and function.
Human brain.. It's parts and function. Human brain.. It's parts and function.
Human brain.. It's parts and function.
 
Basic Concepts in Pharmacology in molecular .pptx
Basic Concepts in Pharmacology in molecular  .pptxBasic Concepts in Pharmacology in molecular  .pptx
Basic Concepts in Pharmacology in molecular .pptx
 
Applied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxApplied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docx
 

Multi Armed Bandits

  • 1. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 1/29 Multi Armed Bandits Alireza Shafaei Machine Learning Reading Group The University of British Columbia Summer 2017
  • 2. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 2/29 Outline A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation
  • 3. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 4/29 Online Convex Optimization Definition At each iteration t, the player chooses xt ∈ K. A convex loss function ft ∈ F : K → R is revealed. A cost ft(xt) is incurred. → F is a set of bounded functions. → ft is revealed after choosing xt. → ft can be adversarially chosen.
  • 4. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 6/29 Online Convex Optimization Regret Given an algorithm A. Regret of A after T iterations is defined as: regretT (A) := sup {fi }T i=1∈F { T t=1 ft(xt) − min x∈K T t=1 ft(x)} Online Gradient Descent O(GD √ T). If fi is α-strongly convex then O(G2 2α (1 + log(T)).
  • 5. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 7/29 Bandit Convex Optimization Definition1 In OCO we had access to ft(xt). in BCO we only observe ft(xt). Multi Armed Bandit further constrains the BCO setting. 1 Material from [1].
  • 6. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 9/29 Bandit Convex Optimization Motivation In data networks, the decision maker can measure the RTD of a packet, but rarely has access to the congestion pattern of the entire network. In ad-placement, the search engine can inspect which ads were clicked through, but cannot know whether different ads, had they been chosen, would have been click through or not. Given a fixed budget, how to allocate resources among the research projects whose outcome is only partially known at the time of allocation and may change through time. Wikipedia. Originally considered by Allied scientists in World War II, it proved so intractable that, according to Peter Whittle, the problem was proposed to be dropped over Germany so that German scientists could also waste their time on it.
  • 7. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 11/29 Multi Armed Bandit (MAB) Definition On each iteration t, the player choses an action it from a predefined set of discrete actions {1, . . . , n}. An adversary, independently, chooses a loss ∈ [0, 1] for each action. The loss associated with it is then revealed to the player. There are a variety of MAB specifications with various assumptions and constraints. This definition is similar to the multi-expert problem, except we do not observe the loss associated with the other experts.
  • 8. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 12/29 MAB as a BCO The algorithms usually choose an action w.r.t a distribution over the actions. If we define K = ∆n, an n-dimensional simplex then ft(x) = t x = n i=1 t(i)x(i) ∀x ∈ K We have an exploration-exploitation trade-off. A simple approach would be to → Exploration With some probability, explore by choosing actions uniformly at random. Construct an estimate of the actions’ losses with the feedback. → Exploitation Otherwise, use the estimates to make a decision.
  • 9. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 14/29 A Simple MAB Algorithm Definition An algorithm can be constructed:
  • 10. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 15/29 A Simple MAB Algorithm Analysis This algorithm guarantees: E[ T t=1 t(it) − min i T t=1 t(i)] ≤ O(T 3 4 √ n) E[ˆt(i)] = P[bt = 1] · P[it = i|bt = 1] · n δ t(i) = t(i) ˆt 2 ≤ n δ · | t(it)| ≤ n δ E[ˆft] = ft
  • 11. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 16/29 A Simple MAB Algorithm Analysis
  • 12. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 18/29 EXP3 Has a worst-case near-optimal regret bound of O( √ Tn log n). See P104 of the OCO book for proof.
  • 13. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 20/29 Stochastic Multi Armed Bandit Definition On each iteration t, the player choses an action it from a predefined set of discrete actions {1, . . . , n}. Each action i has an underlying (fixed) probability distribution Pi with mean µi . The loss associated with it is then revealed to the player. (A sample is taken from Pit ). Pi ’s could be a simple Bernoulli variable. A more complex version could assume a Markov process for each action, within which the state of one or all processes change after each iteration. We still have the exploration-exploitation trade-off.
  • 14. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 22/29 Bernoulli Multi Armed Bandit Definition2 N[a] The number of times arm a is pulled. Q[a] The running average of rewards for arm a. S[a] The number of successes for arm a. F[a] The number of failures for arm a. The same notion of regret, except the optimal strategy is to pull the arm with the largest mean, µ∗. 2 Material from [2].
  • 15. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 24/29 Algorithms Random Selection ⇒ O(T). Greedy Selection ⇒ O(T). -Greedy Selection ⇒ O(T). Boltzmann Exploration ⇒ O(T). Upper-Confidence Bound ⇒ O(ln(T)). Thompson Sampling ⇒ O(ln(T)).
  • 16. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 25/29 Algorithms Empirical Evaluation
  • 17. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 27/29 Contextual Bandits Motivation There are scenarios within which we can have access to more information. The extra information can be encoded as a context vector. In online advertising, the behaviour of each user, or the search context for instance, can provide valuable information. One simple way is to treat each context having its own bandit problem. Variations of the previous algorithms relate the context vector with the expected reward through linear models, neural networks, kernels, or random forests.
  • 18. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 28/29 Thanks Thanks! Questions?
  • 19. Multi Armed Bandits Alireza Shafaei A Quick Review Online Convex Optimization (OCO) Measuring The Performance Bandit Convex Optimization Motivation Multi Armed Bandit A Simple MAB Algorithm EXP3 Stochastic Multi Armed Bandit Definition Bernoulli Multi Armed Bandit Algorithms Contextual Bandits Motivation 29/29 References I E. Hazan. Introduction to Online Convex Optimization. http://ocobook.cs.princeton.edu/OCObook.pdf S. Raja. Multi Armed Bandits and Exploration Strategies. https://sudeepraja.github.io/Bandits/