SlideShare una empresa de Scribd logo
1 de 32
Descargar para leer sin conexión
Lab Seminar: Contextual Bandit Survey
Sangwoo Mo
KAIST
swmo@kaist.ac.kr
August 4, 2016
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 1 / 32
Overview
1 Problem Setting
2 Na¨ıve Approach: Reduce to MAB
3 Stochastic Contextual Bandit
UCB & Thompson Sampling
Arbitrary Set of Policies
4 Adversarial Contextual Bandit
5 Supervised Learning to Contextual Bandit
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 2 / 32
Problem Setting
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 3 / 32
Multi-Armed Bandit
At each time t, the agent selects an arm at (at ∈ {1, ..., K})
Then, the agent recieves a reward rt(= rat ,t) from the enviroment
If ri,t is i.i.d. of some distribution, we call it stochastic bandit, and if
ri,t is selected by the enviroment, we call it adversarial bandit
The goal of MAB is to find the policy π ∈ Π s.t.
π(a1, r1, ...at−1, rt−1) = at
which minimizes the regret1
RT := max
i=1,...,K
E
T
t=1
ri,t −
T
t=1
rat ,t
1
Properly speaking, cumulative pseudo-regret.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 4 / 32
Contextual Bandit
In contextual bandit, the agent recieves an additional information
(=context) ct
1 ∈ C at the begining of time t
In stochastic contextual bandit, the reward ri,t can be represented as
a function of the context ci,t and noise i,t
ri,t = f (ci,t) + i,t
or simply ri,t = fi (ct) + i,t if ct is independent to i
In adversarial contextual bandit, the reward ri,t is selected by the
enviroment, as in the non-contextual MAB
1
Many literatures often notate ci,t to emphasize that each arm i has a corresponding context ci,t . However, both notations
are identical since we can construct a single vector ct by concatenating ci,t s.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 5 / 32
Optimal Regret Bound
Stochastic Bandit: Ω(log T)1
Adversarial Bandit: Ω(
√
KT)2
Contextual Bandit: Ω(d
√
T)3
1
Lai & Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 1985.
2
Auer et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem. FOCS, 1995. By minmax strategy.
Note that adversarial bandit can be thought as a 2-player game by the agent and the enviroment.
3
Dani et al. Stochastic Linear Optimization under Bandit Feedback. COLT, 2012. Remark that the lower bound is Ω(
√
T)
even for the stochastic contextual bandit, since context may come in adversarially.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 6 / 32
Na¨ıve Approach: Reduce to MAB
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 7 / 32
Na¨ıve Approach: Reduce to MAB
Approach 1: assume the context set is finite (|C| = N)
Run MAB algorithm (ex. EXP3) for each context independently
The regret bound is O(
√
TNK log K)1 (w/ EXP3)
Approach 2: assume the policy space is finite (|H| = M)
Run MAB algorithm (ex. EXP3) on policies, instead of arms
The regret bound is O(
√
TM log M) (w/ EXP3)
1 N
c=1 O(nc
√
K log K) ≤ O(
√
TN
√
K log K) where nc is number of context c observed (by Cauchy-Schwarz inequality)
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 8 / 32
Stochastic Contextual Bandit
UCB & Thompson Sampling
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 9 / 32
Review: Index Policy and Greedy Algorithm
Since Gittins Index1, index policy became one of the most popular
strategy for MAB problems
Idea: for each time t, define a score si,t (=index) for each arm i.
Select an arm which has the highest score
Question: how to define proper si,t?
Na¨ıve approach: use empirical mean2! (greedy algorithm)
However, na¨ıve greedy algorithm may occur O(T) regret
1
Gittins. Bandit Processes and Dynamic Allocation Indices. Journal of the Royal Statistical Society, 1979.
2
Note that MAB becomes trivial if we know the true mean. The general goal of MAB algorithms is to estimate mean
correctly and rapidly (explore-exploit dilema)
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 10 / 32
Review: UCB1
Assume ri,t ∼ Pi with support [0, 1] and mean µi
Idea: select more seldom-selected arms and less often-selected arms.
In other words, give a confidence bonus1!
UCB12: define score as
si,t = ˆµi,t +
2 log t
ni,t
where ˆµi,t is empirical mean, and ni,t is number of arm i selected
UCB1 policy garantees the optimal regret O(log T)
Also, there are other choices for UCB (ex. KL-UCB3, Bayes-UCB4)
1
We call this bonus UCB(upper confidence bound). Thus, score = estimated mean + UCB.
2
Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002.
3
Garivier & Capp´e. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. COLT, 2011.
4
Kaufmann et al. On Bayesian Upper Confidence Bounds for Bandit Problems. AISTATS, 2012.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 11 / 32
LinUCB
Assume ri,t ∼ P(ri,t | ci,t, θ) where E[ri,t] = cT
i,tθ∗ (ci,t, θ ∈ Rd )
Like UCB1, want to define score as
si,t = cT
i,t
ˆθt + UCBi,t
Question: how to choose proper UCBi,t?
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 12 / 32
LinUCB
Idea: let ˆθt be an estimator of θ∗ by ridge regression
ˆθt = (CT
t Ct + λId )−1
CT
t Rt
where Ct = {c1, ..., ct−1} and Rt = {r1, ..., rt−1}
Then, the inequality below holds with probability 1 − δ
T
cT
i,t
ˆθt − cT
i,tθ∗
≤ ( + 1) cT
i,tA−1
t ci,t
where At = CT
t Ct + Id and = 1
2 log 2TK
δ
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 13 / 32
LinUCB
LinUCB1: define score as
si,t = cT
i,t
ˆθt + α cT
i,tA−1
t ci,t
Regret bound (with probability 1 − δ) is
O(d T log
1 + T
δ
)
LinUCB policy garantees the optimal regret ˜O(d
√
T)
Also, there are other choices for UCB (ex. LinREL2, CoFineUCB3)
1
Li et al. A contextual-bandit approach to personalized news article recommendation. WWW, 2010.
2
Auer. Using Confidence Bounds for Exploitation-Exploration Trade-offs. JMLR, 2002.
3
Yue et al. Hierarchical Exploration for Accelerating Contextual Bandits. ICML, 2012.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 14 / 32
Review: Thompson Sampling
Another popular strategy for MAB is Thompson Sampling1
It can be applied to both contextual and non-contextual bandit
Assume ri,t ∼ P(ri,t | ci,t, θ∗) with prior θ∗ ∼ P(θ)
Idea: sample estimator ˆθt from the posterior distribution
step 1. draw θt from posterior P(θ | D = {ct, at, rt})
step 2. select arm ai = arg maxi E[ri,t | ci,t, θt]
The idea is simple, but it works well both in theory2 and in practice3
1
Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples.
Biometrica, 1933.
2
Agrawal et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem. COLT, 2012.
3
Scott. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 2010.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 15 / 32
LinTS
Assume ri,t ∼ N(cT
i,tθ∗, v2) and θ∗ ∼ N(θt, v2B−1
t ) where
Bt =
t−1
τ=1
ci,τ cT
i,τ + Id , ˆθt = B−1
t
t−1
τ=1
ci,τ ri,τ
ri,t ∈ [¯ri,t − R, ¯ri,t + R], v = R
24
d log
t
δ
Then, the posterior of θ∗ is N(θt+1, v2B−1
t+1)
LinTS1: run Thompson Sampling in this assumption
Regret bound (with probability 1 − δ) is
O(
d2 √
T1+ log(Td) log
1
δ
)
1
Agrawal et al. Thompson Sampling for Contextual Bandits with Linear Payoffs. ICML, 2013.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 16 / 32
UCB & TS: Nonlinear Case
Assume E[ri,t] = f (ci,t) is general nonlinear function
If we assume f is a member of exponential family, we can use
GLM-UCB1
If we assume f is sampled from a Guassian Process, we can use
GP-UCB2/CGP-UCB3
If we assume f is an element of Reproducing Kernel Hilbert Space,
we can use KernelUCB4
Also, we can use Thompson Sampling if we know the form of
probability distribution
1
Filippi et al. Parametric Bandits: The Generalized Linear Case. NIPS, 2010.
2
Srinivas et al. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. ICML, 2010.
3
Krause & Ong. Contextual Gaussian Process Bandit Optimization. NIPS, 2011.
4
Valko et al. Finite-Time Analysis of Kernelised Contextual Bandits. UAI, 2013.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 17 / 32
Stochastic Contextual Bandit
Arbitrary Set of Policies
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 18 / 32
Epoch-Greedy
Assume policy space H if finite1
Idea: explore T steps and exploit T − T steps (epsilon-first)
issue 1. how to get an unbiased estimator of the best policy?
issue 2. how to balance explore and exploit if we don’t know T?
trick 1: use D = {ct, at, rt} observed in explore step
ˆπ = max
π∈H
(ct ,at ,rt )∈D
raI(π(ct) = at)
1/K
trick 2: run epsilon-first in mini-batches (partition of T)
1
Infinite w/ finite VC-dimension can be derived in similar way
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 19 / 32
Epoch-Greedy
Epoch-Greedy1: combine trick 1 & trick 2
Regret bound is ˜O(T2/3) (not optimal!)
1
Langford & Zhang. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information. NIPS, 2007.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 20 / 32
RandomizedUCB
Idea: estimate the distribution Pt over the policy space H
RandomizedUCB1:
Regret bound is ˜O(
√
T), but time complexity is O(T6)
1
Dudik et al. Efficient Optimal Learning for Contextual Bandits. UAI, 2011.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 21 / 32
ILOVECONBANDITS
Idea: similar to RandomizedUCB, improve time complexity
ILOVECONBANDITS1 (Importance-weighted LOw-Variance
Epoch-Timed Oracleized CONtextual BANDITS):
Regret bound is ˜O(
√
T), and time complexity is O(T1.5)
1
Agrawal et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. ICML, 2014.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 22 / 32
Adversarial Contextual Bandit
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 23 / 32
Review: EXP3
Assume ri,t ∈ [0, 1] is selected by the enviroment
In adversarial setting, the agent must select arm randomly
Idea: weight more probability to higher-reward ovserved arms
EXP31 (EXPonential-weight algorithm for EXPloration and
EXPloitation):
Regret bound is O(
√
TK log K)
1
Auer et al. The nonstochastic multiarmed bandit problem. SIAM, 2002.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 24 / 32
EXP4
Idea: run EXP3 on policies, instead of arms
EXP41 (EXPonential-weight algorithm for EXPloration and
EXPloitation using EXPert advice):
Regret bound is O(
√
TK log N), but variance is high
1
Auer et al. The nonstochastic multiarmed bandit problem. SIAM, 2002.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 25 / 32
EXP4.P
Idea: run EXP4 with better weight, to make algorithm stable
EXP4.P1 (EXP4 with Probability):
Regret bound is O(
√
TK log N), with high probability
1
Beygelzimer et al. Contextual Bandit Algorithms with Supervised Learning Guarantees. AISTATS, 2011.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 26 / 32
Supervised Learning to Contextual Bandit
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 27 / 32
Supervised Learning to Contextual Bandit
Idea: note that contextual bandit can be thought as a supervised
learing problem with partially-observed restriction
Trick: use randomized algorithm (ex. epsilon-greedy) and unbiased
(true) reward estimator ˆrat ,t =
rat ,t
pat
instead of observed reward rat ,t.
Then,
E[ˆri,t] = pi ·
ri,t
pi
+ (1 − pi ) · 0 = ri,t
Using this trick, any supervised learning algorithm can be converted
to a contextual bandit algorithm
Banditron and NeuralBandit are examples using neural network
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 28 / 32
Banditron and NeuralBandit
Both Banditron1 and NeuralBandit2 uses multi-layer perceptron and
epsilon-greedy algorithm w/ unbiased reward estimator
However, Banditron uses 0-1 loss (classification) while NeuralBandit
uses L2 loss (regression)
Regret bound of original Banditron is O(T2/3), and a 2nd-order
variant3 reduced it to ˜O(
√
T)
No theoretical garnatee is proved for NeuralBandit yet
1
Kakade et al. Efficient Bandit Algorithms for Online Multiclass Prediction. ICML, 2008.
2
Allesiardo et al. A Neural Networks Committee for the Contextual Bandit Problem. ICONIP, 2014.
3
Crammer & Gentile. Multiclass Classification with Bandit Feedback using Adaptive Regularization. ICML, 2013.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 29 / 32
Summary & Reference
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 30 / 32
Summary
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 31 / 32
Reference
[Zhou 2015] A Survey on Contextual Multi-armed Bandits. arXiv,
2015.
[Burtini’ 2015] A Survey of Online Experiment Design with the
Stochastic Multi-Armed Bandit. arXiv, 2015.
[Bubeck’ 2012] Regret Analysis of Stochastic and Nonstochastic
Multi-armed Bandit Problems. arXiv, 2012.
Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 32 / 32

Más contenido relacionado

La actualidad más candente

Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed BanditsDongmin Lee
 
Bandit Algorithms
Bandit AlgorithmsBandit Algorithms
Bandit AlgorithmsSC5.io
 
Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)Shohei Taniguchi
 
MLaPP 5章 「ベイズ統計学」
MLaPP 5章 「ベイズ統計学」MLaPP 5章 「ベイズ統計学」
MLaPP 5章 「ベイズ統計学」moterech
 
Multi-agent actor-critic for mixed cooperative-competitive environmentsの紹介
Multi-agent actor-critic for mixed cooperative-competitive environmentsの紹介Multi-agent actor-critic for mixed cooperative-competitive environmentsの紹介
Multi-agent actor-critic for mixed cooperative-competitive environmentsの紹介Yusuke Nakata
 
PCAの最終形態GPLVMの解説
PCAの最終形態GPLVMの解説PCAの最終形態GPLVMの解説
PCAの最終形態GPLVMの解説弘毅 露崎
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at NetflixJustin Basilico
 
Random Forest and KNN is fun
Random Forest and KNN is funRandom Forest and KNN is fun
Random Forest and KNN is funZhen Li
 
(2018.3) 分子のグラフ表現と機械学習
(2018.3) 分子のグラフ表現と機械学習(2018.3) 分子のグラフ表現と機械学習
(2018.3) 分子のグラフ表現と機械学習Ichigaku Takigawa
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Umberto Picchini
 
Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3Yusuke Oda
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at SpotifyOguz Semerci
 
「統計的学習理論」第1章
「統計的学習理論」第1章「統計的学習理論」第1章
「統計的学習理論」第1章Kota Matsui
 
パターン認識 第12章 正則化とパス追跡アルゴリズム
パターン認識 第12章 正則化とパス追跡アルゴリズムパターン認識 第12章 正則化とパス追跡アルゴリズム
パターン認識 第12章 正則化とパス追跡アルゴリズムMiyoshi Yuya
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewYONG ZHENG
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement LearningUsman Qayyum
 
Contexual bandit @TokyoWebMining
Contexual bandit @TokyoWebMiningContexual bandit @TokyoWebMining
Contexual bandit @TokyoWebMining正志 坪坂
 
Control as Inference.pptx
Control as Inference.pptxControl as Inference.pptx
Control as Inference.pptxssuserbd1647
 
Markov decision process
Markov decision processMarkov decision process
Markov decision processJie-Han Chen
 

La actualidad más candente (20)

Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed Bandits
 
Bandit Algorithms
Bandit AlgorithmsBandit Algorithms
Bandit Algorithms
 
Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)
 
MLaPP 5章 「ベイズ統計学」
MLaPP 5章 「ベイズ統計学」MLaPP 5章 「ベイズ統計学」
MLaPP 5章 「ベイズ統計学」
 
Multi-agent actor-critic for mixed cooperative-competitive environmentsの紹介
Multi-agent actor-critic for mixed cooperative-competitive environmentsの紹介Multi-agent actor-critic for mixed cooperative-competitive environmentsの紹介
Multi-agent actor-critic for mixed cooperative-competitive environmentsの紹介
 
PCAの最終形態GPLVMの解説
PCAの最終形態GPLVMの解説PCAの最終形態GPLVMの解説
PCAの最終形態GPLVMの解説
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Random Forest and KNN is fun
Random Forest and KNN is funRandom Forest and KNN is fun
Random Forest and KNN is fun
 
T-sne
T-sneT-sne
T-sne
 
(2018.3) 分子のグラフ表現と機械学習
(2018.3) 分子のグラフ表現と機械学習(2018.3) 分子のグラフ表現と機械学習
(2018.3) 分子のグラフ表現と機械学習
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)
 
Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at Spotify
 
「統計的学習理論」第1章
「統計的学習理論」第1章「統計的学習理論」第1章
「統計的学習理論」第1章
 
パターン認識 第12章 正則化とパス追跡アルゴリズム
パターン認識 第12章 正則化とパス追跡アルゴリズムパターン認識 第12章 正則化とパス追跡アルゴリズム
パターン認識 第12章 正則化とパス追跡アルゴリズム
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick View
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Contexual bandit @TokyoWebMining
Contexual bandit @TokyoWebMiningContexual bandit @TokyoWebMining
Contexual bandit @TokyoWebMining
 
Control as Inference.pptx
Control as Inference.pptxControl as Inference.pptx
Control as Inference.pptx
 
Markov decision process
Markov decision processMarkov decision process
Markov decision process
 

Similar a Contextual Bandit Survey

A Transformational Approach to Resource Analysis with Typed-Norms
A Transformational Approach to Resource Analysis  with Typed-NormsA Transformational Approach to Resource Analysis  with Typed-Norms
A Transformational Approach to Resource Analysis with Typed-NormsFacultad de Informática UCM
 
Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition P...
Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition P...Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition P...
Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition P...Dmitrii Ignatov
 
The comparative study of finite difference method and monte carlo method for ...
The comparative study of finite difference method and monte carlo method for ...The comparative study of finite difference method and monte carlo method for ...
The comparative study of finite difference method and monte carlo method for ...Alexander Decker
 
11.the comparative study of finite difference method and monte carlo method f...
11.the comparative study of finite difference method and monte carlo method f...11.the comparative study of finite difference method and monte carlo method f...
11.the comparative study of finite difference method and monte carlo method f...Alexander Decker
 
Statistical Decision Theory
Statistical Decision TheoryStatistical Decision Theory
Statistical Decision TheorySangwoo Mo
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsJulyan Arbel
 
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Umberto Picchini
 
Reinforcement Learning in Economics and Finance
Reinforcement Learning in Economics and FinanceReinforcement Learning in Economics and Finance
Reinforcement Learning in Economics and FinanceArthur Charpentier
 
1. Consider experiments with the following censoring mechanism A gr.docx
1. Consider experiments with the following censoring mechanism A gr.docx1. Consider experiments with the following censoring mechanism A gr.docx
1. Consider experiments with the following censoring mechanism A gr.docxstilliegeorgiana
 
07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit Mahajan07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit MahajanAMDSeminarSeries
 
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...SYRTO Project
 
Exploring the feature space of large collections of time series
Exploring the feature space of large collections of time seriesExploring the feature space of large collections of time series
Exploring the feature space of large collections of time seriesRob Hyndman
 
Rotting Infinitely Many-Armed Bandits
Rotting Infinitely Many-Armed BanditsRotting Infinitely Many-Armed Bandits
Rotting Infinitely Many-Armed BanditsJunghunKim27
 
Meta-learning of exploration-exploitation strategies in reinforcement learning
Meta-learning of exploration-exploitation strategies in reinforcement learningMeta-learning of exploration-exploitation strategies in reinforcement learning
Meta-learning of exploration-exploitation strategies in reinforcement learningUniversité de Liège (ULg)
 
Controlled sequential Monte Carlo
Controlled sequential Monte Carlo Controlled sequential Monte Carlo
Controlled sequential Monte Carlo JeremyHeng10
 
Session II - Estimation methods and accuracy - Brunero Liseo, Discussion
Session II - Estimation methods and accuracy - Brunero Liseo, Discussion Session II - Estimation methods and accuracy - Brunero Liseo, Discussion
Session II - Estimation methods and accuracy - Brunero Liseo, Discussion Istituto nazionale di statistica
 
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...Jihun Yun
 

Similar a Contextual Bandit Survey (20)

A Transformational Approach to Resource Analysis with Typed-Norms
A Transformational Approach to Resource Analysis  with Typed-NormsA Transformational Approach to Resource Analysis  with Typed-Norms
A Transformational Approach to Resource Analysis with Typed-Norms
 
Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition P...
Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition P...Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition P...
Turning Krimp into a Triclustering Technique on Sets of Attribute-Condition P...
 
The comparative study of finite difference method and monte carlo method for ...
The comparative study of finite difference method and monte carlo method for ...The comparative study of finite difference method and monte carlo method for ...
The comparative study of finite difference method and monte carlo method for ...
 
11.the comparative study of finite difference method and monte carlo method f...
11.the comparative study of finite difference method and monte carlo method f...11.the comparative study of finite difference method and monte carlo method f...
11.the comparative study of finite difference method and monte carlo method f...
 
Statistical Decision Theory
Statistical Decision TheoryStatistical Decision Theory
Statistical Decision Theory
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian Nonparametrics
 
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
 
Reinforcement Learning in Economics and Finance
Reinforcement Learning in Economics and FinanceReinforcement Learning in Economics and Finance
Reinforcement Learning in Economics and Finance
 
Side 2019 #10
Side 2019 #10Side 2019 #10
Side 2019 #10
 
1. Consider experiments with the following censoring mechanism A gr.docx
1. Consider experiments with the following censoring mechanism A gr.docx1. Consider experiments with the following censoring mechanism A gr.docx
1. Consider experiments with the following censoring mechanism A gr.docx
 
07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit Mahajan07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit Mahajan
 
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
 
Exploring the feature space of large collections of time series
Exploring the feature space of large collections of time seriesExploring the feature space of large collections of time series
Exploring the feature space of large collections of time series
 
Phd Proposal
Phd ProposalPhd Proposal
Phd Proposal
 
Rotting Infinitely Many-Armed Bandits
Rotting Infinitely Many-Armed BanditsRotting Infinitely Many-Armed Bandits
Rotting Infinitely Many-Armed Bandits
 
Meta-learning of exploration-exploitation strategies in reinforcement learning
Meta-learning of exploration-exploitation strategies in reinforcement learningMeta-learning of exploration-exploitation strategies in reinforcement learning
Meta-learning of exploration-exploitation strategies in reinforcement learning
 
Controlled sequential Monte Carlo
Controlled sequential Monte Carlo Controlled sequential Monte Carlo
Controlled sequential Monte Carlo
 
PhD defense talk slides
PhD  defense talk slidesPhD  defense talk slides
PhD defense talk slides
 
Session II - Estimation methods and accuracy - Brunero Liseo, Discussion
Session II - Estimation methods and accuracy - Brunero Liseo, Discussion Session II - Estimation methods and accuracy - Brunero Liseo, Discussion
Session II - Estimation methods and accuracy - Brunero Liseo, Discussion
 
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...
 

Más de Sangwoo Mo

Brief History of Visual Representation Learning
Brief History of Visual Representation LearningBrief History of Visual Representation Learning
Brief History of Visual Representation LearningSangwoo Mo
 
Learning Visual Representations from Uncurated Data
Learning Visual Representations from Uncurated DataLearning Visual Representations from Uncurated Data
Learning Visual Representations from Uncurated DataSangwoo Mo
 
Hyperbolic Deep Reinforcement Learning
Hyperbolic Deep Reinforcement LearningHyperbolic Deep Reinforcement Learning
Hyperbolic Deep Reinforcement LearningSangwoo Mo
 
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...Sangwoo Mo
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSangwoo Mo
 
Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)Sangwoo Mo
 
Deep Learning Theory Seminar (Chap 1-2, part 1)
Deep Learning Theory Seminar (Chap 1-2, part 1)Deep Learning Theory Seminar (Chap 1-2, part 1)
Deep Learning Theory Seminar (Chap 1-2, part 1)Sangwoo Mo
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion ModelsSangwoo Mo
 
Object-Region Video Transformers
Object-Region Video TransformersObject-Region Video Transformers
Object-Region Video TransformersSangwoo Mo
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksSangwoo Mo
 
Learning Theory 101 ...and Towards Learning the Flat Minima
Learning Theory 101 ...and Towards Learning the Flat MinimaLearning Theory 101 ...and Towards Learning the Flat Minima
Learning Theory 101 ...and Towards Learning the Flat MinimaSangwoo Mo
 
Sharpness-aware minimization (SAM)
Sharpness-aware minimization (SAM)Sharpness-aware minimization (SAM)
Sharpness-aware minimization (SAM)Sangwoo Mo
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density ModelsSangwoo Mo
 
Score-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential EquationsScore-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential EquationsSangwoo Mo
 
Self-Attention with Linear Complexity
Self-Attention with Linear ComplexitySelf-Attention with Linear Complexity
Self-Attention with Linear ComplexitySangwoo Mo
 
Meta-Learning with Implicit Gradients
Meta-Learning with Implicit GradientsMeta-Learning with Implicit Gradients
Meta-Learning with Implicit GradientsSangwoo Mo
 
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Sangwoo Mo
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General AudiencesSangwoo Mo
 
Bayesian Model-Agnostic Meta-Learning
Bayesian Model-Agnostic Meta-LearningBayesian Model-Agnostic Meta-Learning
Bayesian Model-Agnostic Meta-LearningSangwoo Mo
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingSangwoo Mo
 

Más de Sangwoo Mo (20)

Brief History of Visual Representation Learning
Brief History of Visual Representation LearningBrief History of Visual Representation Learning
Brief History of Visual Representation Learning
 
Learning Visual Representations from Uncurated Data
Learning Visual Representations from Uncurated DataLearning Visual Representations from Uncurated Data
Learning Visual Representations from Uncurated Data
 
Hyperbolic Deep Reinforcement Learning
Hyperbolic Deep Reinforcement LearningHyperbolic Deep Reinforcement Learning
Hyperbolic Deep Reinforcement Learning
 
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture Note
 
Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)
 
Deep Learning Theory Seminar (Chap 1-2, part 1)
Deep Learning Theory Seminar (Chap 1-2, part 1)Deep Learning Theory Seminar (Chap 1-2, part 1)
Deep Learning Theory Seminar (Chap 1-2, part 1)
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion Models
 
Object-Region Video Transformers
Object-Region Video TransformersObject-Region Video Transformers
Object-Region Video Transformers
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural Networks
 
Learning Theory 101 ...and Towards Learning the Flat Minima
Learning Theory 101 ...and Towards Learning the Flat MinimaLearning Theory 101 ...and Towards Learning the Flat Minima
Learning Theory 101 ...and Towards Learning the Flat Minima
 
Sharpness-aware minimization (SAM)
Sharpness-aware minimization (SAM)Sharpness-aware minimization (SAM)
Sharpness-aware minimization (SAM)
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density Models
 
Score-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential EquationsScore-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential Equations
 
Self-Attention with Linear Complexity
Self-Attention with Linear ComplexitySelf-Attention with Linear Complexity
Self-Attention with Linear Complexity
 
Meta-Learning with Implicit Gradients
Meta-Learning with Implicit GradientsMeta-Learning with Implicit Gradients
Meta-Learning with Implicit Gradients
 
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General Audiences
 
Bayesian Model-Agnostic Meta-Learning
Bayesian Model-Agnostic Meta-LearningBayesian Model-Agnostic Meta-Learning
Bayesian Model-Agnostic Meta-Learning
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 

Último

Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdftheeltifs
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 

Último (20)

Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 

Contextual Bandit Survey

  • 1. Lab Seminar: Contextual Bandit Survey Sangwoo Mo KAIST swmo@kaist.ac.kr August 4, 2016 Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 1 / 32
  • 2. Overview 1 Problem Setting 2 Na¨ıve Approach: Reduce to MAB 3 Stochastic Contextual Bandit UCB & Thompson Sampling Arbitrary Set of Policies 4 Adversarial Contextual Bandit 5 Supervised Learning to Contextual Bandit Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 2 / 32
  • 3. Problem Setting Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 3 / 32
  • 4. Multi-Armed Bandit At each time t, the agent selects an arm at (at ∈ {1, ..., K}) Then, the agent recieves a reward rt(= rat ,t) from the enviroment If ri,t is i.i.d. of some distribution, we call it stochastic bandit, and if ri,t is selected by the enviroment, we call it adversarial bandit The goal of MAB is to find the policy π ∈ Π s.t. π(a1, r1, ...at−1, rt−1) = at which minimizes the regret1 RT := max i=1,...,K E T t=1 ri,t − T t=1 rat ,t 1 Properly speaking, cumulative pseudo-regret. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 4 / 32
  • 5. Contextual Bandit In contextual bandit, the agent recieves an additional information (=context) ct 1 ∈ C at the begining of time t In stochastic contextual bandit, the reward ri,t can be represented as a function of the context ci,t and noise i,t ri,t = f (ci,t) + i,t or simply ri,t = fi (ct) + i,t if ct is independent to i In adversarial contextual bandit, the reward ri,t is selected by the enviroment, as in the non-contextual MAB 1 Many literatures often notate ci,t to emphasize that each arm i has a corresponding context ci,t . However, both notations are identical since we can construct a single vector ct by concatenating ci,t s. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 5 / 32
  • 6. Optimal Regret Bound Stochastic Bandit: Ω(log T)1 Adversarial Bandit: Ω( √ KT)2 Contextual Bandit: Ω(d √ T)3 1 Lai & Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 1985. 2 Auer et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem. FOCS, 1995. By minmax strategy. Note that adversarial bandit can be thought as a 2-player game by the agent and the enviroment. 3 Dani et al. Stochastic Linear Optimization under Bandit Feedback. COLT, 2012. Remark that the lower bound is Ω( √ T) even for the stochastic contextual bandit, since context may come in adversarially. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 6 / 32
  • 7. Na¨ıve Approach: Reduce to MAB Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 7 / 32
  • 8. Na¨ıve Approach: Reduce to MAB Approach 1: assume the context set is finite (|C| = N) Run MAB algorithm (ex. EXP3) for each context independently The regret bound is O( √ TNK log K)1 (w/ EXP3) Approach 2: assume the policy space is finite (|H| = M) Run MAB algorithm (ex. EXP3) on policies, instead of arms The regret bound is O( √ TM log M) (w/ EXP3) 1 N c=1 O(nc √ K log K) ≤ O( √ TN √ K log K) where nc is number of context c observed (by Cauchy-Schwarz inequality) Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 8 / 32
  • 9. Stochastic Contextual Bandit UCB & Thompson Sampling Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 9 / 32
  • 10. Review: Index Policy and Greedy Algorithm Since Gittins Index1, index policy became one of the most popular strategy for MAB problems Idea: for each time t, define a score si,t (=index) for each arm i. Select an arm which has the highest score Question: how to define proper si,t? Na¨ıve approach: use empirical mean2! (greedy algorithm) However, na¨ıve greedy algorithm may occur O(T) regret 1 Gittins. Bandit Processes and Dynamic Allocation Indices. Journal of the Royal Statistical Society, 1979. 2 Note that MAB becomes trivial if we know the true mean. The general goal of MAB algorithms is to estimate mean correctly and rapidly (explore-exploit dilema) Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 10 / 32
  • 11. Review: UCB1 Assume ri,t ∼ Pi with support [0, 1] and mean µi Idea: select more seldom-selected arms and less often-selected arms. In other words, give a confidence bonus1! UCB12: define score as si,t = ˆµi,t + 2 log t ni,t where ˆµi,t is empirical mean, and ni,t is number of arm i selected UCB1 policy garantees the optimal regret O(log T) Also, there are other choices for UCB (ex. KL-UCB3, Bayes-UCB4) 1 We call this bonus UCB(upper confidence bound). Thus, score = estimated mean + UCB. 2 Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002. 3 Garivier & Capp´e. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. COLT, 2011. 4 Kaufmann et al. On Bayesian Upper Confidence Bounds for Bandit Problems. AISTATS, 2012. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 11 / 32
  • 12. LinUCB Assume ri,t ∼ P(ri,t | ci,t, θ) where E[ri,t] = cT i,tθ∗ (ci,t, θ ∈ Rd ) Like UCB1, want to define score as si,t = cT i,t ˆθt + UCBi,t Question: how to choose proper UCBi,t? Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 12 / 32
  • 13. LinUCB Idea: let ˆθt be an estimator of θ∗ by ridge regression ˆθt = (CT t Ct + λId )−1 CT t Rt where Ct = {c1, ..., ct−1} and Rt = {r1, ..., rt−1} Then, the inequality below holds with probability 1 − δ T cT i,t ˆθt − cT i,tθ∗ ≤ ( + 1) cT i,tA−1 t ci,t where At = CT t Ct + Id and = 1 2 log 2TK δ Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 13 / 32
  • 14. LinUCB LinUCB1: define score as si,t = cT i,t ˆθt + α cT i,tA−1 t ci,t Regret bound (with probability 1 − δ) is O(d T log 1 + T δ ) LinUCB policy garantees the optimal regret ˜O(d √ T) Also, there are other choices for UCB (ex. LinREL2, CoFineUCB3) 1 Li et al. A contextual-bandit approach to personalized news article recommendation. WWW, 2010. 2 Auer. Using Confidence Bounds for Exploitation-Exploration Trade-offs. JMLR, 2002. 3 Yue et al. Hierarchical Exploration for Accelerating Contextual Bandits. ICML, 2012. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 14 / 32
  • 15. Review: Thompson Sampling Another popular strategy for MAB is Thompson Sampling1 It can be applied to both contextual and non-contextual bandit Assume ri,t ∼ P(ri,t | ci,t, θ∗) with prior θ∗ ∼ P(θ) Idea: sample estimator ˆθt from the posterior distribution step 1. draw θt from posterior P(θ | D = {ct, at, rt}) step 2. select arm ai = arg maxi E[ri,t | ci,t, θt] The idea is simple, but it works well both in theory2 and in practice3 1 Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrica, 1933. 2 Agrawal et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem. COLT, 2012. 3 Scott. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 2010. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 15 / 32
  • 16. LinTS Assume ri,t ∼ N(cT i,tθ∗, v2) and θ∗ ∼ N(θt, v2B−1 t ) where Bt = t−1 τ=1 ci,τ cT i,τ + Id , ˆθt = B−1 t t−1 τ=1 ci,τ ri,τ ri,t ∈ [¯ri,t − R, ¯ri,t + R], v = R 24 d log t δ Then, the posterior of θ∗ is N(θt+1, v2B−1 t+1) LinTS1: run Thompson Sampling in this assumption Regret bound (with probability 1 − δ) is O( d2 √ T1+ log(Td) log 1 δ ) 1 Agrawal et al. Thompson Sampling for Contextual Bandits with Linear Payoffs. ICML, 2013. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 16 / 32
  • 17. UCB & TS: Nonlinear Case Assume E[ri,t] = f (ci,t) is general nonlinear function If we assume f is a member of exponential family, we can use GLM-UCB1 If we assume f is sampled from a Guassian Process, we can use GP-UCB2/CGP-UCB3 If we assume f is an element of Reproducing Kernel Hilbert Space, we can use KernelUCB4 Also, we can use Thompson Sampling if we know the form of probability distribution 1 Filippi et al. Parametric Bandits: The Generalized Linear Case. NIPS, 2010. 2 Srinivas et al. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. ICML, 2010. 3 Krause & Ong. Contextual Gaussian Process Bandit Optimization. NIPS, 2011. 4 Valko et al. Finite-Time Analysis of Kernelised Contextual Bandits. UAI, 2013. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 17 / 32
  • 18. Stochastic Contextual Bandit Arbitrary Set of Policies Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 18 / 32
  • 19. Epoch-Greedy Assume policy space H if finite1 Idea: explore T steps and exploit T − T steps (epsilon-first) issue 1. how to get an unbiased estimator of the best policy? issue 2. how to balance explore and exploit if we don’t know T? trick 1: use D = {ct, at, rt} observed in explore step ˆπ = max π∈H (ct ,at ,rt )∈D raI(π(ct) = at) 1/K trick 2: run epsilon-first in mini-batches (partition of T) 1 Infinite w/ finite VC-dimension can be derived in similar way Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 19 / 32
  • 20. Epoch-Greedy Epoch-Greedy1: combine trick 1 & trick 2 Regret bound is ˜O(T2/3) (not optimal!) 1 Langford & Zhang. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information. NIPS, 2007. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 20 / 32
  • 21. RandomizedUCB Idea: estimate the distribution Pt over the policy space H RandomizedUCB1: Regret bound is ˜O( √ T), but time complexity is O(T6) 1 Dudik et al. Efficient Optimal Learning for Contextual Bandits. UAI, 2011. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 21 / 32
  • 22. ILOVECONBANDITS Idea: similar to RandomizedUCB, improve time complexity ILOVECONBANDITS1 (Importance-weighted LOw-Variance Epoch-Timed Oracleized CONtextual BANDITS): Regret bound is ˜O( √ T), and time complexity is O(T1.5) 1 Agrawal et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. ICML, 2014. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 22 / 32
  • 23. Adversarial Contextual Bandit Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 23 / 32
  • 24. Review: EXP3 Assume ri,t ∈ [0, 1] is selected by the enviroment In adversarial setting, the agent must select arm randomly Idea: weight more probability to higher-reward ovserved arms EXP31 (EXPonential-weight algorithm for EXPloration and EXPloitation): Regret bound is O( √ TK log K) 1 Auer et al. The nonstochastic multiarmed bandit problem. SIAM, 2002. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 24 / 32
  • 25. EXP4 Idea: run EXP3 on policies, instead of arms EXP41 (EXPonential-weight algorithm for EXPloration and EXPloitation using EXPert advice): Regret bound is O( √ TK log N), but variance is high 1 Auer et al. The nonstochastic multiarmed bandit problem. SIAM, 2002. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 25 / 32
  • 26. EXP4.P Idea: run EXP4 with better weight, to make algorithm stable EXP4.P1 (EXP4 with Probability): Regret bound is O( √ TK log N), with high probability 1 Beygelzimer et al. Contextual Bandit Algorithms with Supervised Learning Guarantees. AISTATS, 2011. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 26 / 32
  • 27. Supervised Learning to Contextual Bandit Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 27 / 32
  • 28. Supervised Learning to Contextual Bandit Idea: note that contextual bandit can be thought as a supervised learing problem with partially-observed restriction Trick: use randomized algorithm (ex. epsilon-greedy) and unbiased (true) reward estimator ˆrat ,t = rat ,t pat instead of observed reward rat ,t. Then, E[ˆri,t] = pi · ri,t pi + (1 − pi ) · 0 = ri,t Using this trick, any supervised learning algorithm can be converted to a contextual bandit algorithm Banditron and NeuralBandit are examples using neural network Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 28 / 32
  • 29. Banditron and NeuralBandit Both Banditron1 and NeuralBandit2 uses multi-layer perceptron and epsilon-greedy algorithm w/ unbiased reward estimator However, Banditron uses 0-1 loss (classification) while NeuralBandit uses L2 loss (regression) Regret bound of original Banditron is O(T2/3), and a 2nd-order variant3 reduced it to ˜O( √ T) No theoretical garnatee is proved for NeuralBandit yet 1 Kakade et al. Efficient Bandit Algorithms for Online Multiclass Prediction. ICML, 2008. 2 Allesiardo et al. A Neural Networks Committee for the Contextual Bandit Problem. ICONIP, 2014. 3 Crammer & Gentile. Multiclass Classification with Bandit Feedback using Adaptive Regularization. ICML, 2013. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 29 / 32
  • 30. Summary & Reference Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 30 / 32
  • 31. Summary Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 31 / 32
  • 32. Reference [Zhou 2015] A Survey on Contextual Multi-armed Bandits. arXiv, 2015. [Burtini’ 2015] A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit. arXiv, 2015. [Bubeck’ 2012] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. arXiv, 2012. Sangwoo Mo (KAIST) Contextual Bandit Survey August 4, 2016 32 / 32