Sponsored content in contextual bandits.
Deconfounding Targeting Not At Random

MIUE 2023
Hubert Drążkowski
GRAPE|FAME, Warsaw University of Technology
September 22, 2023
Motivational examples

Recommender systems
• Suggest the best ads/movies a ∈ {a_1, a_2, ..., a_K}
• Users X_1, X_2, ..., X_T
• Design of the study {n_{a_1}, n_{a_2}, ..., n_{a_K}}, with Σ_i n_{a_i} = T
• Measured satisfaction {R_t(a_1), ..., R_t(a_K)}_{t=1}^T
The framework

Exploration vs exploitation
• Allocating limited resources under uncertainty
• Decisions are made sequentially
• Partial feedback (bandit feedback)
• Adaptive (non-iid) data
• Maximizing cumulative gain
• Current actions do not change the future environment
The elements of the bandit model
(see Lattimore and Szepesvári (2020))
• Context X_t ∈ 𝒳, with X_t ∼ D_X
• Actions A_t ∈ 𝒜 = {a_1, ..., a_K}, with A_t ∼ π_t(a|x)
• Policy π ∈ Π, where π = {π_t}_{t=1}^T and π_t : 𝒳 → 𝒫(𝒜), with 𝒫(𝒜) := {q ∈ [0, 1]^K : Σ_{a∈𝒜} q_a = 1}
• Rewards R_t ∈ ℝ_+: potential outcomes (R(a_1), R(a_2), ..., R(a_K)) and R_t = Σ_{k=1}^K 1(A_t = a_k) R(a_k), with R_t ∼ D_{R|A,X}
• History H_t ∈ ℋ_t, where H_t = σ({(X_s, A_s, R_s)}_{s=1}^t)
Details
• In short, (X_t, A_t, R_t) ∼ D(π_t)
• We know π_t(a|x) (the propensity score)
• We do not know D_{X, R⃗}
• We have 1(A_t = a) ⊥⊥ R(a) | X_t
• We want to choose π to maximize
  E_{D(π)} [ Σ_{t=1}^T R_t(A_t) ]
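The interaction protocol implied by these elements can be sketched as a minimal simulation loop. Everything below (the two-arm environment, the reward function, and the epsilon-greedy exploration schedule) is an illustrative assumption, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 2, 1000
history = []          # H_t = sigma({(X_s, A_s, R_s)}_{s=1}^t)
total_reward = 0.0

def pi_t(x, t):
    """A known propensity score pi_t(a|x): epsilon-greedy on a guessed rule."""
    eps = max(0.05, (t + 1) ** -0.5)          # decaying exploration
    probs = np.full(K, eps / K)
    probs[int(x > 0)] += 1.0 - eps            # guess: arm 1 is better when x > 0
    return probs

for t in range(T):
    x = rng.normal()                          # context X_t ~ D_X
    probs = pi_t(x, t)
    a = rng.choice(K, p=probs)                # action A_t ~ pi_t(.|x)
    # bandit feedback: only the chosen arm's reward R_t(A_t) is observed
    r = 1.0 + (2 * a - 1) * x + rng.normal(scale=0.5)
    total_reward += r
    history.append((x, a, r))

print(len(history))
```

The policy depends on the growing history only through `t` here; a real contextual bandit algorithm would refit its reward estimates from `history` each round.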
The flow of information
(figure)

Inverse Gap Weighting
(figure)
1 Authoritarian Sponsor
2 Deconfounding
3 Experiment
4 Conclusions
Authoritarian Sponsor model
• The act of sponsoring
  • Recommender systems: marketing campaigns, testing products
  • Healthcare: funding experiments, lobbying doctors
• The sponsor, described by a targeting policy κ and a content policy π̃, intervenes in an authoritarian manner:
  A_t = S_t Ã_t + (1 − S_t) Ā_t,
  S_t ∈ {0, 1}, S_t ∼ κ_t(·|X),
  Ā_t ∼ π_t(·|X), Ã_t ∼ π̃_t(·|X),
so the observed action distribution is the mixture
  π̆_t(a|x) = κ_t(1|x) π̃_t(a|x) + κ_t(0|x) π_t(a|x).
• The learner lacks knowledge of the sponsor's policy (κ, π̃)
  • Not sharing technology or strategy
  • Lost in human-to-algorithm translation
  • Hard-to-model processes such as auctions
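The intervention mechanism can be sketched as follows. The learner policy, the targeting probability κ, and the sponsor content policy π̃ below are hypothetical stand-ins, chosen only to illustrate the mixture formula for the observed action distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3

def learner_policy(x):
    """pi_t(.|x): an illustrative softmax policy."""
    scores = np.array([0.0, x, -x])
    p = np.exp(scores - scores.max())
    return p / p.sum()

def kappa(x):
    """kappa_t(1|x): probability that the sponsor takes over this round."""
    return 0.4 if x > 0 else 0.1

def sponsor_policy(x):
    """pi~_t(.|x): here the sponsor always pushes arm 0."""
    return np.array([1.0, 0.0, 0.0])

def observed_action(x):
    s = rng.random() < kappa(x)                 # S_t ~ kappa_t(.|x)
    p = sponsor_policy(x) if s else learner_policy(x)
    return rng.choice(K, p=p)                   # A_t = S_t A~_t + (1 - S_t) A-_t

def mixed_propensity(x):
    """pi-breve_t(a|x) = kappa_t(1|x) pi~_t(a|x) + kappa_t(0|x) pi_t(a|x)."""
    return kappa(x) * sponsor_policy(x) + (1 - kappa(x)) * learner_policy(x)

# empirical action frequencies should match the mixture formula
x = 0.7
draws = np.array([observed_action(x) for _ in range(20000)])
emp = np.bincount(draws, minlength=K) / len(draws)
print(np.round(emp, 2), np.round(mixed_propensity(x), 2))
```

The point of the check is that the learner observes draws from π̆_t, not from its own π_t, even though it only knows π_t.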
Targeting mechanisms
Introducing an unobserved confounder Z (κ is the sponsor's targeting policy, π̃ its content policy)
1 Targeting Completely At Random (TCAR):
  • κ(1|X) = κ(1), π̃(a|X, R, Z) = π̃(a)
  • analogous to MCAR
2 Targeting At Random (TAR):
  • κ(1|X) may depend on X, π̃(a|X, R, Z) = π̃(a|X)
  • analogous to MAR
3 Targeting Not At Random (TNAR):
  • π̃(a|X, R, Z) depends on (R, Z), so R(a) ⊥̸⊥ A | X, S = 1
  • analogous to MNAR
Causal interpretation
Figure 1: TCAR
Figure 2: TAR
Figure 3: TNAR
Data fusion
(see Colnet et al. (2020))

                    RCT    OS    Learner    Sponsor
Internal validity
External validity                   ∼          ∼
Propensity score            ?                  ?

Table 2: Differences and similarities between data sources
• Unsolved challenge: sampling in interaction!
CATE
• CATE:
  τ_{a_1,a_2}(x) = E_{D_{R|A,X=x}}[R(a_1) − R(a_2)]  and  τ̂_{a_1,a_2}(x) = μ̂_{a_1}(x) − μ̂_{a_2}(x)
• Assumptions:
  • SUTVA: R_t = Σ_{a∈𝒜} 1(A_t = a) R_t(a)
  • Ignorability: 1(A_t = a) ⊥⊥ R(a) | X_t, S_t = 0
  • Ignorability of the study participation: R_t(a) ⊥⊥ S_t | X_t
  • TNAR: R(a) ⊥̸⊥ A | X, S = 1
• Biased CATE on the sponsored sample:
  ρ_{a_1,a_2}(x) = E[R | A = a_1, X = x, S = 1] − E[R | A = a_2, X = x, S = 1]
• Bias measurement:
  η_{a_1,a_2}(x) = τ_{a_1,a_2}(x) − ρ_{a_1,a_2}(x)
Two step deconfounding
(see Kallus et al. (2018)), A = {a_0, a_1}
1 On the observational (sponsored) sample, use a metalearner to obtain ρ̂_{a_1,a_0}(X).
2 Postulate a function q_t(X, a_0) such that E[q_t(X, a_0) R | X = x, S = 0] = τ_{a_1,a_0}(x), where
  q_t(X, a_0) = 1(A = a_1) / π_t(a_1|X) − 1(A = a_0) / π_t(a_0|X).
3 Using q_t(X, a_0), apply the definition of η_{a_1,a_0}(x) to adjust the ρ̂ term by solving an optimization problem on the unconfounded sample:
  η̂_{a_1,a_0}(X) = argmin_η Σ_{t : S_t = 0} ( q_t(x_t, a_0) r_t − ρ̂_{a_1,a_0}(x_t) − η(x_t) )².
4 Finally, τ̂_{a_1,a_0}(x) = ρ̂_{a_1,a_0}(x) + η̂_{a_1,a_0}(x).
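The four steps can be sketched end to end. Everything below is an illustrative assumption: a linear data-generating process, a T-learner with linear regressions as the metalearner, a constant propensity π_t(a_1|x) = 0.5 on the unconfounded sample, and a linear model for η:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
tau = lambda x: 1.0 + 2.0 * x                    # true CATE (hypothetical DGP)

# confounded sample (S = 1): unobserved U drives both the action and the reward
x1 = rng.normal(size=n)
u = rng.normal(size=n)
a1 = (x1 + 2 * u + rng.normal(size=n) > 0).astype(float)
r1 = a1 * tau(x1) + u + rng.normal(size=n)

# unconfounded sample (S = 0) with known propensity pi_t(a_1|x) = 0.5
x0 = rng.normal(size=n)
a0 = rng.integers(0, 2, size=n).astype(float)
r0 = a0 * tau(x0) + rng.normal(size=n)

def fit_linear(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predict(coef, x):
    return coef[0] + coef[1] * x

# Step 1: T-learner on the confounded sample -> biased rho_hat
c_treated = fit_linear(x1[a1 == 1], r1[a1 == 1])
c_control = fit_linear(x1[a1 == 0], r1[a1 == 0])
rho_hat = lambda x: predict(c_treated, x) - predict(c_control, x)

# Step 2: IPW pseudo-outcome q_t * R on the unconfounded sample
q = a0 / 0.5 - (1 - a0) / 0.5

# Step 3: regress q_t * r_t - rho_hat(x_t) on x_t -> eta_hat
c_eta = fit_linear(x0, q * r0 - rho_hat(x0))
eta_hat = lambda x: predict(c_eta, x)

# Step 4: corrected CATE
tau_hat = lambda x: rho_hat(x) + eta_hat(x)

grid = np.linspace(-1.0, 1.0, 5)
print(np.round(tau_hat(grid) - tau(grid), 2))    # correction error, near zero
print(np.round(rho_hat(grid) - tau(grid), 2))    # raw confounding bias, large
```

The unconfounded sample is too noisy to estimate τ well on its own; the point of the construction is that it is enough to estimate the low-complexity correction term η.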
Deconfounded CATE IGW (D-CATE-IGW)
• Let b = argmax_a μ̂_a(x_t). Then

  π(a|x) = 1 / ( K + γ_m ( μ̂^m_b(x) − μ̂^m_a(x) ) )   for a ≠ b,
  π(b|x) = 1 − Σ_{c ≠ b} π(c|x),

  which equals 1 / ( K + γ_m τ̂_{b,a}(x) ) for a ≠ b.
• Each round/epoch, deconfound the CATE.
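The inverse-gap-weighting rule is a direct transcription of the display above; the mean-reward estimates μ̂ and the learning rate γ_m passed in are placeholders:

```python
import numpy as np

def igw(mu_hat, gamma):
    """Inverse Gap Weighting over K arms: an arm with a large estimated gap
    mu_hat[b] - mu_hat[a] (i.e. a large estimated CATE against the empirical
    best arm b) gets a small probability; b gets the remaining mass."""
    K = len(mu_hat)
    b = int(np.argmax(mu_hat))
    p = np.zeros(K)
    for arm in range(K):
        if arm != b:
            p[arm] = 1.0 / (K + gamma * (mu_hat[b] - mu_hat[arm]))
    p[b] = 1.0 - p.sum()
    return p

p = igw(np.array([0.2, 0.9, 0.5]), gamma=10.0)
print(np.round(p, 3))   # p ~ [0.1, 0.757, 0.143]
```

In D-CATE-IGW, the gaps μ̂_b(x) − μ̂_a(x) would be replaced by the deconfounded estimates τ̂_{b,a}(x) recomputed each round or epoch.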
Setup I
• S_t ∼ Bern(ρ)
• No overlap scenario:
  X_t | S_t = 0 ∼ Unif([−1, 1]),  U_t | S_t = 0 ∼ N(0, 1)
• Full overlap scenario:
  X_t | S_t = 0 ∼ N(0, 1),  U_t | S_t = 0 ∼ N(0, 1)
• Sponsored rounds:
  (X_t, U_t)ᵀ | {A_t, S_t = 1} ∼ N( (0, 0)ᵀ, [ 1, (2A_t − 1)σ_A ; (2A_t − 1)σ_A, 1 ] )
• σ_A ∈ {0.6, 0.9}
• ρ ∈ {0.3, 0.6}
• Rewards:
  R_t(A_t) = 1 + A_t + X_t + 2 A_t X_t + (1/2) X_t² + (3/4) A_t X_t² + 2 U_t + (1/2) ε_t,
  where ε_t ∼ N(0, 1), so that τ(X_t) = (3/4) X_t² + 2 X_t + 1.
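The data-generating process above translates into, e.g., the following sketch of the full-overlap scenario. The binary arm is assigned uniformly here as a placeholder, since only the correlated (X, U) draw on sponsored rounds matters for the confounding:

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho, sigma_a = 20000, 0.3, 0.6

s = rng.random(n) < rho                 # S_t ~ Bern(rho)
a = rng.integers(0, 2, n)               # arm in {0, 1} (placeholder assignment)
x = rng.normal(size=n)                  # full overlap: X_t | S_t = 0 ~ N(0, 1)
w = rng.normal(size=n)
corr = (2 * a - 1) * sigma_a            # (2 A_t - 1) sigma_A
# sponsored rounds: (X_t, U_t) bivariate normal, unit variances, corr above
u = np.where(s, corr * x + np.sqrt(1 - corr ** 2) * w, w)
eps = rng.normal(size=n)
r = 1 + a + x + 2*a*x + 0.5*x**2 + 0.75*a*x**2 + 2*u + 0.5*eps
tau = 0.75 * x ** 2 + 2 * x + 1         # implied CATE

c = np.corrcoef(x[s & (a == 1)], u[s & (a == 1)])[0, 1]
print(round(c, 2))                      # close to sigma_a on sponsored a = 1 rounds
```

On non-sponsored rounds X and U are independent, so a CATE estimator that ignores S inherits the (2A_t − 1)σ_A correlation as confounding bias.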
Result I
Figure 4: Normed cumulative regret for different scenarios

Result II
Figure 5: True and estimated CATE values for different scenarios
Contribution
1 A pioneering model of sponsored content in the contextual bandit framework
2 Bandits treated not as experimental studies but as observational studies
3 A confounding scenario and an application of deconfounding
4 D-CATE-IGW works
Future research
• Theoretical
  • Mathematically model the complicated sampling, especially the flow of information
  • A consistency proof for the CATE estimator in this scenario
  • High-probability regret bounds for D-CATE-IGW: P(REWARD(π) ≥ BOUND(δ)) ≥ 1 − δ
• Empirical
  • More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
  • Other deconfounding methods (see Wu and Yang (2022))
  • A more comprehensive empirical study

Expansion
• Policy evaluation:
  V(π) = E_X E_{A∼π(·|X)} E_{R|A,X}[R],
  with V̂_t(π) estimated on {(X_s, A_s, R_s)}_{s=1}^t ∼ D(π̆), the sponsor-mixed action distribution
Authoritarian Sponsor Deconfounding Experiment Conclusions References
Future research
• Theoretical
• Mathematically model the complicated sampling. Especially the flow of information
• Consistency proof of CATE estimator in this scenario
• High probability regret bounds on D-CATE-IGW P(REWARD(π)  BOUND(δ))  1 − δ
• Empirical
• More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
• Other deconfounding methods (see Wu and Yang (2022))
• A more comprehensive empirical study
Expansion
• Policy evaluation
V (π) = EX EA∼π(·|X)ER|A,X [R]
b
Vt(π) on {(Xs, As, Rs)}t
s=1 ∼ D(H
 )
Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology
Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 24 / 26
The beginning ...
Colnet, B., I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.-P. Vert, J. Josse, and S. Yang (2020). Causal inference methods for combining randomized trials and observational studies: a review. arXiv preprint arXiv:2011.08047.
Kallus, N., A. M. Puli, and U. Shalit (2018). Removing hidden confounding by experimental grounding. Advances in Neural Information Processing Systems 31.
Künzel, S. R., J. S. Sekhon, P. J. Bickel, and B. Yu (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences 116(10), 4156–4165.
Lattimore, T. and C. Szepesvári (2020). Bandit Algorithms. Cambridge University Press.
Wu, L. and S. Yang (2022). Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. In Conference on Causal Learning and Reasoning, pp. 904–926. PMLR.
The elements of the bandit model (see Lattimore and Szepesvári (2020))
• Context Xt ∈ X
• Xt ∼ DX
• Actions At ∈ A = {a1, ..., aK}
• At ∼ πt(a|x)
• Policy π ∈ Π
• π = {πt}_{t=1}^T
• πt : X ↦ P(A), where P(A) := {q ∈ [0, 1]^K : Σ_{a∈A} qa = 1}
• Rewards Rt ∈ R+
• (R(a1), R(a2), ..., R(aK)) and Rt = Σ_{k=1}^K 1(At = ak) R(ak)
• Rt ∼ D_{R|A,X}
• History Ht ∈ ℋt
• Ht = σ({(Xs, As, Rs)}_{s=1}^t)
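The interaction protocol these elements define can be sketched as a loop: draw a context, compute a distribution over arms, sample an action, and observe only the reward of the chosen arm. This is an illustrative simulation, not the paper's algorithm; the softmax map and the context-free arm means are hypothetical placeholders for πt and DR|A,X:

```python
import numpy as np

rng = np.random.default_rng(1)
K, T = 3, 1000
theta = np.array([0.2, 0.5, 0.8])      # unknown Bernoulli mean reward per arm (toy)

def policy(x):
    """pi_t(.|x): a placeholder map from context to a point in P(A)."""
    logits = np.array([x * k for k in range(K)])
    p = np.exp(logits - logits.max())
    return p / p.sum()

history = []                           # H_t = sigma({(X_s, A_s, R_s)}_{s=1}^t)
for t in range(T):
    x = rng.normal()                   # X_t ~ D_X
    p = policy(x)                      # pi_t(.|X_t)
    a = rng.choice(K, p=p)             # A_t ~ pi_t(.|X_t)
    r = float(rng.random() < theta[a]) # bandit feedback: only R_t(A_t) is observed
    history.append((x, a, r))
```

The key restriction is in the last two lines: the full reward vector (R(a1), ..., R(aK)) is never revealed, only the coordinate of the sampled arm enters the history.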
Slide 5/26: Details
• In short, (X_t, A_t, R_t) ∼ D(π_t)
• We know π_t(a|x) (the propensity score)
• We do not know D_{X, R⃗}
• We have 1(A_t = a) ⊥⊥ R(a) | X_t
• We want to choose π to maximize E_{D(π)}[Σ_{t=1}^T R_t(A_t)]
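The interaction protocol summarized above can be written out as a short simulation loop. This is an illustrative sketch only: the context, policy, and reward functions below are stand-ins, not the paper's environment.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 1000  # number of arms, horizon

def sample_context():
    # X_t ~ D_X (illustrative: standard normal scalar context)
    return rng.normal()

def policy(x):
    # pi_t(.|x): a known distribution over the K arms (uniform for this sketch)
    return np.full(K, 1.0 / K)

def reward(a, x):
    # R_t ~ D_{R|A,X} (illustrative linear mean plus noise)
    return a * x + rng.normal()

history = []
for t in range(T):
    x = sample_context()
    p = policy(x)                 # the learner knows these propensities
    a = rng.choice(K, p=p)
    r = reward(a, x)              # only R_t(A_t) is observed: bandit feedback
    history.append((x, a, r))
```

The key point of the "Details" slide appears in the loop: the learner records its own propensities `p`, and the reward of the arms not pulled is never observed.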
Slide 6/26: The flow of information [figure]
Slide 7/26: Inverse Gap Weighting [figure]
Slide 8/26: Outline: 1. Authoritarian Sponsor, 2. Deconfounding, 3. Experiment, 4. Conclusions
Slide 9/26: Authoritarian Sponsor model
• The act of sponsoring
  • Recommender systems: marketing campaigns, testing products
  • Healthcare: funding experiments, lobbying doctors
• The sponsor (€, ℍ) intervenes in an authoritarian manner:
  A_t = S_t Ã_t + (1 − S_t) Ā_t, S_t ∈ {0, 1}, S_t ∼ €_t(·|X),
  Ā_t ∼ π_t(·|X), Ã_t ∼ ℍ_t(·|X),
  so the induced behavior policy is the mixture
  H̄_t(a|x) = €_t(1|x) ℍ_t(a|x) + €_t(0|x) π_t(a|x).
• The sponsor's policy (€, ℍ) is unknown to the learner:
  • not sharing technology or strategy
  • lost in human-to-algorithm translation
  • hard-to-model processes such as auctions
Slide 10/26: Targeting mechanisms
Introducing an unobserved confounder Z:
1. Targeting Completely At Random (TCAR): S(X) = S, ℍ(a|X, R, Z) = ℍ(a); analogous to MCAR
2. Targeting At Random (TAR): S(X) = S(X), ℍ(a|X, R, Z) = ℍ(a|X); analogous to MAR
3. Targeting Not At Random (TNAR): ℍ(a|X, R, Z) depends on (R, Z), so R(a) ⊥̸⊥ A | X, S = 1; analogous to MNAR
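The three mechanisms differ only in which variables the sponsor's selection S and sponsored action may depend on. A schematic sketch, with illustrative distributions (the sigmoid forms and constants below are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def tcar(x, r, z):
    # TCAR: selection and sponsored action ignore everything (like MCAR).
    s = rng.random() < 0.3          # S(X) = S, a constant-rate coin flip
    a = rng.integers(0, 2)          # H(a|X, R, Z) = H(a)
    return s, int(a)

def tar(x, r, z):
    # TAR: both may depend on the observed context X only (like MAR).
    s = rng.random() < sigmoid(x)         # S(X)
    a = rng.random() < sigmoid(2 * x)     # H(a|X)
    return s, int(a)

def tnar(x, r, z):
    # TNAR: the sponsored action may also depend on potential rewards and the
    # unobserved confounder Z, so R(a) is not independent of A given X, S = 1
    # (like MNAR).
    s = rng.random() < sigmoid(x)
    a = rng.random() < sigmoid(x + z + r[1] - r[0])   # H(a|X, R, Z)
    return s, int(a)
```

In the TNAR sampler the action probability rises with the reward gap r[1] − r[0], which is exactly the dependence that biases the sponsored sample.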
Slide 11/26: Causal interpretation. Figure 1: TCAR, Figure 2: TAR, Figure 3: TNAR [causal graphs]
Slide 12/26: Outline: 1. Authoritarian Sponsor, 2. Deconfounding, 3. Experiment, 4. Conclusions
Slide 13/26: Data fusion (see Colnet et al. (2020))

Table 1: Differences and similarities between data sources

                      RCT    OS
  Internal validity   ✓      ✗
  External validity   ✗      ✓
  Propensity score    ✓      ?
Slide 14/26: Data fusion (see Colnet et al. (2020))

Table 2: Differences and similarities between data sources

                      RCT    OS    Learner   Sponsor
  Internal validity   ✓      ✗     ✓         ✗
  External validity   ✗      ✓     ∼         ∼
  Propensity score    ✓      ?     ✓         ?

• Unsolved challenge: sampling in interaction!
Slide 15/26: CATE
• CATE: τ_{a1,a2}(x) = E_{D_{R|A,X=x}}[R(a1) − R(a2)], estimated by τ̂_{a1,a2}(x) = μ̂_{a1}(x) − μ̂_{a2}(x)
• Assumptions:
  • SUTVA: R_t = Σ_{a∈A} 1(A_t = a) R_t(a)
  • Ignorability: 1(A_t = a) ⊥⊥ R(a) | X_t, S_t = 0
  • Ignorability of study participation: R_t(a) ⊥⊥ S_t | X_t
  • TNAR: R(a) ⊥̸⊥ A | X, S = 1
• Biased CATE on the sponsored sample:
  ρ_{a1,a2}(x) = E[R | A = a1, X = x, S = 1] − E[R | A = a2, X = x, S = 1]
• Bias measure: η_{a1,a2}(x) = τ_{a1,a2}(x) − ρ_{a1,a2}(x)
Slide 16/26: Two-step deconfounding (see Kallus et al. (2018)), A = {a0, a1}
1. On the observational (sponsored) sample, use a metalearner to obtain ρ̂_{a1,a0}(X).
2. Postulate a function q_t(X, a0) such that E[q_t(X, a0) R | X = x, S = 0] = τ_{a1,a0}(x), where
   q_t(X, a0) = 1(A = a1)/π_t(a1|X) − 1(A = a0)/π_t(a0|X).
3. Using q_t(X, a0), apply the definition of η_{a1,a0}(x) to adjust the ρ̂ term by solving a least-squares problem on the unconfounded sample:
   η̂_{a1,a0} = argmin_η Σ_{t: S_t = 0} (q_t(x_t, a0) r_t − ρ̂_{a1,a0}(x_t) − η(x_t))².
4. Finally, τ̂_{a1,a0}(x) = ρ̂_{a1,a0}(x) + η̂_{a1,a0}(x).
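A minimal sketch of the four steps above for binary actions. Linear least squares stands in both for the metalearner of step 1 and for the correction class of step 3; these are illustrative choices, not the paper's, and the function names are hypothetical.

```python
import numpy as np

def _fit_linear(X, y):
    # Least-squares fit of y ~ [1, X] @ beta; returns a prediction function.
    Z = np.hstack([np.ones((len(X), 1)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return lambda Xn: np.hstack([np.ones((len(Xn), 1)), Xn]) @ beta

def two_step_deconfounding(x_conf, a_conf, r_conf, x_unconf, a_unconf, r_unconf, pi):
    """Kallus et al. (2018)-style correction, actions in {0, 1}.

    pi(a, x): the known logging propensities pi_t(a|x) on the unconfounded (S=0) sample.
    """
    # Step 1: biased CATE rho-hat from the confounded (sponsored) sample,
    # here a T-learner with linear outcome models (illustrative choice).
    mu1 = _fit_linear(x_conf[a_conf == 1], r_conf[a_conf == 1])
    mu0 = _fit_linear(x_conf[a_conf == 0], r_conf[a_conf == 0])
    rho_hat = lambda x: mu1(x) - mu0(x)

    # Step 2: IPW weight q_t(X, a0) = 1(A=1)/pi(1|X) - 1(A=0)/pi(0|X),
    # so that q * R is unbiased for tau on the S = 0 sample.
    q = np.where(a_unconf == 1, 1.0, -1.0) / pi(a_unconf, x_unconf)

    # Step 3: regress the residual q * r - rho_hat(x) on x to learn eta-hat.
    eta_hat = _fit_linear(x_unconf, q * r_unconf - rho_hat(x_unconf))

    # Step 4: deconfounded CATE tau-hat = rho-hat + eta-hat.
    return lambda x: rho_hat(x) + eta_hat(x)
```

Used on data where the sponsored sample happens to be unbiased, η̂ should be close to zero and τ̂ should recover the true CATE; with a confounded sponsored sample, η̂ absorbs the bias.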
Slide 17/26: Deconfounded CATE IGW (D-CATE-IGW)
• Let b = argmax_a μ̂_a(x_t). Then
  π(a|x) = 1 / (K + γ_m(μ̂^m_b(x) − μ̂^m_a(x))) for a ≠ b,
  π(b|x) = 1 − Σ_{c≠b} π(c|x),
  or equivalently π(a|x) = 1 / (K + γ_m τ̂_{b,a}(x)) for a ≠ b.
• Each round/epoch, deconfound the CATE.
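The inverse-gap-weighting rule above is easy to compute directly; a sketch, where `mu_hat` holds the estimated per-action means for one context and `gamma` is γ_m (names are illustrative):

```python
import numpy as np

def igw_probabilities(mu_hat, gamma):
    """Inverse gap weighting over K actions for one context.

    mu_hat: estimated rewards mu-hat_a(x), shape (K,).
    gamma:  exploration parameter gamma_m; larger values mean greedier play.
    """
    K = len(mu_hat)
    b = int(np.argmax(mu_hat))         # greedy action
    gaps = mu_hat[b] - mu_hat          # tau-hat_{b,a}(x) >= 0, zero at a = b
    p = 1.0 / (K + gamma * gaps)       # probability shrinks with the gap
    p[b] = 0.0
    p[b] = 1.0 - p.sum()               # remaining mass goes to the greedy arm
    return p

# Sampling an action for a context:
# a = np.random.default_rng().choice(len(mu), p=igw_probabilities(mu, 100.0))
```

Replacing μ̂_b − μ̂_a with the deconfounded τ̂_{b,a} of the previous slide gives the D-CATE-IGW variant.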
Slide 18/26: Outline: 1. Authoritarian Sponsor, 2. Deconfounding, 3. Experiment, 4. Conclusions
Slide 19/26: Setup I
• S_t ∼ Bern(ρ)
• No-overlap scenario: X_t | S_t = 0 ∼ Unif([−1, 1]), U_t | S_t = 0 ∼ N(0, 1)
• Full-overlap scenario: X_t | S_t = 0 ∼ N(0, 1), U_t | S_t = 0 ∼ N(0, 1)
• On the sponsored sample:
  (X_t, U_t) | {A_t, S_t = 1} ∼ N((0, 0), [[1, (2A_t − 1)σ_A], [(2A_t − 1)σ_A, 1]])
• σ_A ∈ {0.6, 0.9}, ρ ∈ {0.3, 0.6}
• R_t(A_t) = 1 + A_t + X_t + 2A_tX_t + (1/2)X_t² + (3/4)A_tX_t² + 2U_t + (1/2)ε_t, where ε ∼ N(0, 1)
• τ(X_t) = (3/4)X_t² + 2X_t + 1
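The data-generating process of Setup I, written as a simulation sketch (binary A_t ∈ {0, 1}; parameter names follow the slide). The uniform learner policy on the S = 0 rounds is an illustrative stand-in for whatever bandit policy is being run:

```python
import numpy as np

def setup_I(T, rho=0.3, sigma_A=0.6, overlap=True, seed=0):
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(T):
        s = rng.random() < rho                    # S_t ~ Bern(rho)
        if s:
            a = int(rng.integers(0, 2))           # sponsored action
            c = (2 * a - 1) * sigma_A             # corr(X_t, U_t) = (2A_t - 1) * sigma_A
            x, u = rng.multivariate_normal([0, 0], [[1, c], [c, 1]])
        else:
            a = int(rng.integers(0, 2))           # learner's action (uniform stand-in)
            x = rng.normal() if overlap else rng.uniform(-1, 1)
            u = rng.normal()
        eps = rng.normal()
        r = (1 + a + x + 2 * a * x + 0.5 * x**2
             + 0.75 * a * x**2 + 2 * u + 0.5 * eps)
        data.append((x, a, r, s))
    return data

def tau(x):
    # True CATE of this DGP: R(1) - R(0) in expectation given X = x.
    return 0.75 * x**2 + 2 * x + 1
```

Note that on S = 1 rounds the unobserved U_t is correlated with A_t through σ_A, which is what makes the sponsored sample confounded.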
Slide 20/26: Result I. Figure 4: Normed cumulative regret for different scenarios
Slide 21/26: Result II. Figure 5: True and estimated CATE values for different scenarios
Slide 22/26: Outline: 1. Authoritarian Sponsor, 2. Deconfounding, 3. Experiment, 4. Conclusions
Slide 23/26: Contribution
1. A pioneering model for sponsored content in the contextual bandit framework
2. Bandits treated not as experimental studies but as observational studies
3. A confounding scenario and an application of deconfounding
4. D-CATE-IGW works
Slide 24/26: Future research
• Theoretical
  • Mathematically model the complicated sampling, especially the flow of information
  • Consistency proof of the CATE estimator in this scenario
  • High-probability regret bounds on D-CATE-IGW: P(REWARD(π) ≥ BOUND(δ)) ≥ 1 − δ
• Empirical
  • More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
  • Other deconfounding methods (see Wu and Yang (2022))
  • A more comprehensive empirical study
• Expansion
  • Policy evaluation: V(π) = E_X E_{A∼π(·|X)} E_{R|A,X}[R], estimating V̂_t(π) on {(X_s, A_s, R_s)}_{s=1}^t ∼ D(ℍ)
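For the policy-evaluation expansion above, one natural candidate when the logged behavior propensities are known is an inverse-propensity estimate of V(π); a sketch, not the paper's method, with illustrative names:

```python
def ips_value(history, target_pi, behavior_prob):
    """Inverse-propensity estimate of the value of a target policy.

    V-hat_t(pi) = (1/t) * sum_s [pi(A_s|X_s) / b(A_s|X_s)] * R_s,
    where b is the (known) behavior policy that logged the data.
    """
    total = 0.0
    for x, a, r in history:
        total += target_pi(a, x) / behavior_prob(a, x) * r
    return total / len(history)
```

The catch flagged on the slide is that under sponsoring the logged data come from the mixture D(ℍ), whose propensities are only partially known, so the behavior probabilities themselves would have to be estimated or deconfounded.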
Slide 25/26: The beginning ...
Slide 26/26: References

Colnet, B., I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.-P. Vert, J. Josse, and S. Yang (2020). Causal inference methods for combining randomized trials and observational studies: a review. arXiv preprint arXiv:2011.08047.

Kallus, N., A. M. Puli, and U. Shalit (2018). Removing hidden confounding by experimental grounding. Advances in Neural Information Processing Systems 31.

Künzel, S. R., J. S. Sekhon, P. J. Bickel, and B. Yu (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences 116(10), 4156–4165.

Lattimore, T. and C. Szepesvári (2020). Bandit Algorithms. Cambridge University Press.

Wu, L. and S. Yang (2022). Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. In Conference on Causal Learning and Reasoning, pp. 904–926. PMLR.