We present recent statistical developments for evaluating dynamic treatment regimes (DTRs), which allow treatment to be dynamically tailored to evolving subject-level data. Identification of an optimal DTR is a key component of precision medicine and personalized health care. This talk covers several recent projects that develop robust and flexible methods for this problem. We first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We then develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly, through a purity measure constructed with augmented inverse probability weighted (AIPW) estimators. T-RL is robust, efficient, and easy to interpret for identifying optimal DTRs; however, ACWL seems more robust against tree-type misspecification when the true optimal DTR is non-tree-type. At the end of the talk, we also present a new stochastic tree-based reinforcement learning method, ST-RL, for estimating optimal DTRs.
Causal Inference Opening Workshop - New Statistical Learning Methods for Estimating the Optimal Dynamic Treatment Regime - Lu Wang, December 10, 2019
1. New Statistical Learning Methods for Estimating the
Optimal Dynamic Treatment Regime
Lu Wang
Associate Professor
University of Michigan, Ann Arbor
luwang@umich.edu
December 10, 2019
2. Overview
1 Introduction: Dynamic Treatment Regime (DTR)
2 Adaptive Contrast Weighted Learning (ACWL)
3 Tree-based Reinforcement Learning (T-RL)
4 Summary and Other Ongoing Research
3. Motivating Example: Cancer Management
Esophageal Cancer Patients, MD Anderson, 1998 to 2012
4. Motivation of Dynamic Treatment Regimes (DTRs)
Chronic disease management: routinely adjust, change, add, or discontinue treatments based on progress, side effects, patient burden, compliance, etc.
→ Dynamic Treatment Regimes (DTRs)
Goals: to account for patient heterogeneity and to personalize health care strategies. Two axes of tailoring (from the slide's schematic):
One Size Fits All → Targeted/Individualized (degree of tailoring by inherent characteristics)
Once and for All → Dynamic (degree of tailoring by time-varying characteristics)
7. Evidence-based Personalized Health Care
To provide meaningfully improved health outcomes for patients by delivering the right drug at the right dose at the right time.
[Diagram: along the degree of tailoring, from one-size-fits-all (low predictability, suboptimal health outcomes) to individualized therapy (better predictability, improved health outcomes), by assessing patient response heterogeneity, stratifying the patient population, and optimizing benefit/risk; tailoring inputs include family history, medical genetics, lifestyle, and environment.]
8. Key Notation
Stage $j$, $j = 1, \dots, T$, with $K_j$ ($\ge 2$) treatment options
$A_j$: treatment at stage $j$ with value $a_j \in \mathcal{A}_j = \{1, \dots, K_j\}$
$X_j$: covariate history just prior to the assignment of $A_j$
$Y$: overall outcome of interest
DTR $g = (g_1, \dots, g_T)$: a set of decision rules with $g_j$: domain of $H_j = (A_1, \dots, A_{j-1}, X_j) \to \mathcal{A}_j$
$Y^*(a)$: counterfactual outcome under treatment sequence $a$
9. Assumptions for Identifiability using Observational Data
We use the causal framework to assess DTRs under three assumptions (Murphy et al., 2001; Robins and Hernán, 2009):
1 Consistency (no interference)
If a patient’s observed treatment history is compatible with a
given DTR, his or her observed clinical outcomes are the same as
the counterfactual ones under the DTR.
2 No Unmeasured Confounding
Treatment decision at a given time is independent of future
observations and counterfactual outcomes, conditional on all
previous covariate and treatment history.
3 Positivity
With probability one, every subject follows a specific DTR with a positive probability that is bounded away from 0.
10. Adaptive Contrast Weighted Learning (ACWL)
Tao and Wang (2017), Biometrics
For $T = 1$:
$$g^{\text{opt}} = \arg\max_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K} E\{Y^*(a) \mid H\}\, I\{g(H) = a\}\right].$$
Under the causal Assumptions 1-3, we have
$$g^{\text{opt}} = \arg\max_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K} \mu_a(H)\, I\{g(H) = a\}\right],$$
where $\mu_a(H) \equiv E(Y \mid A = a, H)$.
Incorporate the order statistics $\mu_{(1)}(H) \le \cdots \le \mu_{(K)}(H)$; then
$$g^{\text{opt}} = \arg\max_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K} \mu_{(a)}(H)\, I\{g(H) = l_a(H)\}\right],$$
where $\mu_{(a)}(H) = \mu_{l_a(H)}(H)$, i.e., $l_a(H)$ denotes the treatment with the $a$-th smallest conditional mean outcome.
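The step connecting the two displays above is the standard identification argument: under the three causal assumptions,
$$E\{Y^*(a) \mid H\} = E\{Y^*(a) \mid A = a, H\} = E(Y \mid A = a, H) = \mu_a(H),$$
where the first equality uses no unmeasured confounding, the second uses consistency, and positivity ensures the conditioning event has positive probability.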
11. Property of $g^{\text{opt}}$
$$g^{\text{opt}} = \arg\min_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K-1} \{\mu_{(K)}(H) - \mu_{(a)}(H)\}\, I\{g(H) = l_a(H)\}\right].$$
Minimizes the expected loss in the outcome due to sub-optimal treatments in the entire population of interest.
Classifies as many patients as possible to their optimal treatment $l_K(H)$ (i.e., letting $I\{g(H) = l_a(H)\} = 0$, $a = 1, \dots, K-1$), while putting more emphasis on patients with larger contrasts (i.e., larger $\mu_{(K)}(H) - \mu_{(a)}(H)$).
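The equivalence with the maximization form on the previous slide follows because exactly one of the indicators $I\{g(H) = l_a(H)\}$, $a = 1, \dots, K$, equals one, so
$$\sum_{a=1}^{K} \mu_{(a)}(H)\, I\{g(H) = l_a(H)\} = \mu_{(K)}(H) - \sum_{a=1}^{K-1} \{\mu_{(K)}(H) - \mu_{(a)}(H)\}\, I\{g(H) = l_a(H)\},$$
and $E_H\{\mu_{(K)}(H)\}$ does not depend on $g$.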
12. Bounds of the Objective Function
Let $C_1(H) \equiv \mu_{(K)}(H) - \mu_{(K-1)}(H)$ and $C_2(H) \equiv \mu_{(K)}(H) - \mu_{(1)}(H)$; then for $a = 1, \dots, K-1$,
$$0 \le C_1(H) \le \mu_{(K)}(H) - \mu_{(a)}(H) \le C_2(H).$$
Therefore,
$$E_H\left[C_1(H)\, I\{g(H) \ne l_K(H)\}\right] \le E_H\left[\sum_{a=1}^{K-1} \{\mu_{(K)}(H) - \mu_{(a)}(H)\}\, I\{g(H) = l_a(H)\}\right] \le E_H\left[C_2(H)\, I\{g(H) \ne l_K(H)\}\right].$$
13. Adaptive Contrasts for Transformation to Weighted Classification
Contrasts $C_1$ and $C_2$: the minimum and maximum expected losses in the outcome given sub-optimal treatment, adaptive to each patient's own treatment effect ordering.
In the best case, where sub-optimal treatments only lead to minimal expected losses in the outcome, $g^{\text{opt}}$ is equal to
$$\arg\min_{g \in \mathcal{G}} E_H\left[C_1(H)\, I\{g(H) \ne l_K(H)\}\right];$$
in the worst case, where sub-optimal treatments all lead to maximal expected losses in the outcome, $g^{\text{opt}}$ is equal to
$$\arg\min_{g \in \mathcal{G}} E_H\left[C_2(H)\, I\{g(H) \ne l_K(H)\}\right].$$
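As a concrete illustration of the transformation to weighted classification, here is a minimal Python sketch. It assumes the estimated optimal-treatment labels $\hat{l}_K(H)$ and the adaptive contrasts $\hat{C}_1$, $\hat{C}_2$ have already been constructed from fitted outcome models (the simulated inputs below are placeholders for those estimates), and it uses CART via scikit-learn with sample weights as one possible weighted classifier:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, p = 500, 3
H = rng.normal(size=(n, p))                   # patient history (covariates)

# Placeholders: in ACWL these come from the fitted AIPW/regression
# estimates, not from simulation as below.
l_K_hat = (H[:, 0] > 0).astype(int)           # estimated optimal treatment
C1_hat = np.abs(rng.normal(size=n))           # mu_(K) - mu_(K-1), best case
C2_hat = C1_hat + np.abs(rng.normal(size=n))  # mu_(K) - mu_(1), worst case

# ACWL solves two weighted classification problems: the contrasts enter
# as misclassification weights (sample_weight), so patients with larger
# expected losses dominate the fit.
g1 = DecisionTreeClassifier(max_depth=3).fit(H, l_K_hat, sample_weight=C1_hat)
g2 = DecisionTreeClassifier(max_depth=3).fit(H, l_K_hat, sample_weight=C2_hat)
```

The two fitted rules (based on $\hat{C}_1$ and $\hat{C}_2$) can then be compared through their estimated mean counterfactual outcomes.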
14. Estimate $\mu_a(H)$ to Construct $C_1(H)$ and $C_2(H)$
Given the estimated propensity score $\hat{\pi}_a(H)$, the AIPW estimator $\hat{\mu}_a^{AIPW}$ of $E\{Y^*(a)\}$ is
$$\hat{\mu}_a^{AIPW} = \mathbb{P}_n\left[\frac{I(A = a)}{\hat{\pi}_a(H)}\, Y + \left\{1 - \frac{I(A = a)}{\hat{\pi}_a(H)}\right\} \hat{\mu}_a(H)\right].$$
Lemma (Double Robustness)
$\hat{\mu}_a^{AIPW}$ is a consistent estimator of $E\{Y^*(a)\}$ if either the propensity model $\pi_a(H)$ or the conditional mean model $\mu_a(H)$ is correctly specified.
At the subject level,
$$\hat{\mu}_a^{AIPW}(H) = \frac{I(A = a)}{\hat{\pi}_a(H)}\, Y + \left\{1 - \frac{I(A = a)}{\hat{\pi}_a(H)}\right\} \hat{\mu}_a(H).$$
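A minimal sketch of the subject-level AIPW computation, assuming the nuisance estimates $\hat{\pi}_a(H)$ and $\hat{\mu}_a(H)$ have been fit elsewhere (the function name is illustrative, not from the authors' software):

```python
import numpy as np

def aipw_subject(Y, A, a, pi_hat_a, mu_hat_a):
    """Subject-level AIPW estimates of the counterfactual mean under a.

    Y         : observed outcomes, shape (n,)
    A         : observed treatments, shape (n,)
    a         : treatment level of interest
    pi_hat_a  : estimated propensities P(A = a | H), shape (n,)
    mu_hat_a  : outcome-model predictions E(Y | A = a, H), shape (n,)
    """
    ind = (A == a).astype(float)
    # Inverse-probability-weighted outcome plus model-based augmentation.
    return ind / pi_hat_a * Y + (1.0 - ind / pi_hat_a) * mu_hat_a

# Population-level estimator: the empirical mean P_n of the subject terms,
# e.g. mu_hat_aipw = aipw_subject(Y, A, a, pi_hat_a, mu_hat_a).mean()
```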
15. Adaptive Contrast Weighted Learning (ACWL) with T > 1
For stage $T$, the assumptions and method derivation are the same as in the single-stage scenario.
For stage $j$, $T - 1 \ge j \ge 1$, we incorporate Q-learning to conduct backward induction. The stage-specific pseudo-outcome $PO_j$ used for estimating the treatment effect ordering and the adaptive contrasts is the predicted counterfactual outcome under optimal treatments at all future stages, i.e.,
$$PO_j = E\left\{Y^*\left(A_1, \dots, A_j, g_{j+1}^{\text{opt}}, \dots, g_T^{\text{opt}}\right)\right\}.$$
Replace $Y$ with $PO_j$ and apply the same method as in $T = 1$.
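A two-stage sketch of the backward-induction step, assuming stage-2 conditional mean predictions mu2_hat[i, a] are available for every subject and treatment from a fitted model. This uses one simple plug-in construction common in Q-learning-type backward induction (the observed outcome adjusted by the gap between the optimal and observed stage-2 predictions); it is an illustration, not the paper's exact estimator:

```python
import numpy as np

def pseudo_outcome_stage1(Y, A2, mu2_hat):
    """Stage-1 pseudo-outcome: predicted outcome had the stage-2 treatment
    been optimal.

    Y       : observed final outcomes, shape (n,)
    A2      : observed stage-2 treatments in {0, ..., K2-1}, shape (n,)
    mu2_hat : fitted E(Y | A2 = a, H2) per subject/treatment, shape (n, K2)
    """
    n = len(Y)
    observed_fit = mu2_hat[np.arange(n), A2]
    # Adjust the observed outcome upward by the predicted gain from
    # switching to the estimated optimal stage-2 treatment.
    return Y + mu2_hat.max(axis=1) - observed_fit
```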
16. Simulation Results
[Figure: simulation results comparing Q-learning, BOWL, and ACWL at stages 1 and 2.]
17. Tree-based Reinforcement Learning (T-RL): Motivation
Tao, Wang, and Almirall (2018), Annals of Applied Statistics
Supervised learning (SL): the true label is known, so tree-based methods, e.g., Classification and Regression Trees (CART), are easy to understand and interpret without distributional assumptions on the data.
In a DTR problem, the label (the optimal treatment) at each stage is not directly observed.
ACWL uses semiparametric regression to estimate the label first and then converts RL to SL so that existing classification methods such as CART can be applied.
18. T-RL: Methodological Basis and Features
Batch-mode RL using a sequence of unsupervised decision trees with
backward induction
Maintain the RL feature without the extra step of conversion to
SL, and thus reduce estimation uncertainty (improvement upon
ACWL)
Incorporate multiple treatment comparisons directly, to improve
efficiency (improvement upon ACWL)
Also allow multi-categorical and ordinal treatments
Similar to ACWL, we expect T-RL to be robust and efficient by
combining robust semiparametric regression estimators with
nonparametric machine learning methods.
19. Example of an Unsupervised Decision Tree
The selected split at each node must improve the expected
counterfactual outcome.
Nodes $\Omega_m$, $m = 1, 2, \dots$, are regions defined by subsets of the covariate space.
20. Tree-based RL (T-RL): Purity Measures
Use a purity measure that targets improvement in the expected counterfactual outcome, applying the doubly robust AIPW estimators as in ACWL.
For a given partition of node $\Omega$ into $\omega$ and $\omega^c$, let $g_{j,\omega,a_1,a_2}$ denote the decision rule that assigns treatment $a_1$ to subjects in $\omega$ and treatment $a_2$ to subjects in $\omega^c$ at stage $j$.
We define the purity measure $P_j(\Omega, \omega)$ as
$$\max_{a_1, a_2 \in \mathcal{A}_j} \mathbb{P}_n\left[\sum_{a_j=1}^{K_j} \hat{\mu}_{j,a_j}^{AIPW}(H_j)\, I\{g_{j,\omega,a_1,a_2}(H_j) = a_j\}\, I(H_j \in \Omega)\right].$$
$P_j(\Omega, \omega)$ evaluates the performance of the best decision rule that assigns a single treatment to each of the two arms of the partition.
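A minimal sketch of the purity computation for one candidate split, assuming the subject-level AIPW estimates $\hat{\mu}_{j,a_j}^{AIPW}(H_j)$ are precomputed as an $(n, K_j)$ array (function and variable names are illustrative):

```python
import numpy as np

def purity(mu_aipw, in_node, in_omega):
    """P_j(node, omega): best empirical mean counterfactual outcome over
    rules assigning one treatment a1 inside omega and another a2 in its
    complement within the node.

    mu_aipw  : (n, K) subject-level AIPW estimates, one column per treatment
    in_node  : (n,) boolean mask for membership in the node Omega
    in_omega : (n,) boolean mask for membership in the candidate child omega
    """
    n, K = mu_aipw.shape
    best = -np.inf
    for a1 in range(K):          # treatment assigned inside omega
        for a2 in range(K):      # treatment assigned in omega's complement
            value = np.sum(
                np.where(in_omega, mu_aipw[:, a1], mu_aipw[:, a2]) * in_node
            ) / n                # P_n averages over the whole sample
            best = max(best, value)
    return best
```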
21. T-RL: Recursive Partitioning
The best split $\omega^{\text{opt}}$ is chosen to maximize the improvement in purity, $P_j(\Omega, \omega) - P_j(\Omega)$.
Stopping rules, given minimum node size $n_0$ and minimum purity improvement $\lambda$:
1 If the node size is less than $2n_0$, the node will not be split.
2 If all possible splits of a node result in a child node with size smaller than $n_0$, the node will not be split.
3 If the current tree depth reaches the user-specified maximum depth, the tree-growing process stops.
4 If the maximum purity improvement $P_j(\Omega, \hat{\omega}^{\text{opt}}) - P_j(\Omega)$ is less than $\lambda$, the node will not be split, where
$$\hat{\omega}^{\text{opt}} = \arg\max_{\omega} \left[ P_j(\Omega, \omega) : \min\{ n\,\mathbb{P}_n I(H_j \in \omega),\; n\,\mathbb{P}_n I(H_j \in \omega^c) \} \ge n_0 \right].$$
22. T-RL Algorithm
1 Start with stage $j = T$.
2 Obtain AIPW estimates $\hat{\mu}_{j,a_j}^{AIPW}(H_j)$, $a_j = 1, \dots, K_j$.
3 At the root node $\Omega_{j,m}$, $m = 1$, set values for $\lambda$ and $n_0$.
4 At node $\Omega_{j,m}$, evaluate the four stopping rules. If any of them is satisfied, assign the single best treatment
$$\arg\max_{a_j \in \mathcal{A}_j} \mathbb{P}_n\left[\hat{\mu}_{j,a_j}^{AIPW}(H_j)\, I(H_j \in \Omega_{j,m})\right]$$
to all subjects in $\Omega_{j,m}$. Otherwise, split $\Omega_{j,m}$ into child nodes $\Omega_{j,2m}$ and $\Omega_{j,2m+1}$ by $\hat{\omega}^{\text{opt}}$.
5 Set $m = m + 1$ and repeat Step 4 until all nodes are terminal.
6 If $j > 1$, set $j = j - 1$ and repeat Steps 2 to 5. If $j = 1$, stop.
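Putting the pieces together, here is a compact, self-contained sketch of the recursive partitioning for one stage. It is an illustration of the algorithm above under the stated stopping rules, not the authors' implementation; the exhaustive split search is deliberately naive, and all names are hypothetical:

```python
import numpy as np

def grow_tree(mu_aipw, X, idx, n0=20, lam=0.01, depth=0, max_depth=5):
    """T-RL-style recursive partitioning for one stage (binary splits on X).

    mu_aipw : (n, K) precomputed subject-level AIPW estimates
    X       : (n, p) candidate tailoring covariates
    idx     : (n,) boolean mask for subjects in the current node
    Returns a nested dict: either a terminal treatment or a split.
    """
    n, K = mu_aipw.shape
    best_single = int(mu_aipw[idx].mean(axis=0).argmax())
    purity_node = mu_aipw[idx, best_single].sum() / n     # P_j(Omega)

    # Stopping rules 1 and 3: node too small or maximum depth reached.
    if idx.sum() < 2 * n0 or depth >= max_depth:
        return {"treatment": best_single}

    best = {"gain": -np.inf}
    for p_ in range(X.shape[1]):
        for cut in np.unique(X[idx, p_])[:-1]:
            omega = idx & (X[:, p_] <= cut)
            omega_c = idx & ~omega
            if omega.sum() < n0 or omega_c.sum() < n0:    # stopping rule 2
                continue
            # Purity of the split: best (a1, a2) pair, one per child node.
            val = max(
                (mu_aipw[omega, a1].sum() + mu_aipw[omega_c, a2].sum()) / n
                for a1 in range(K) for a2 in range(K)
            )
            gain = val - purity_node
            if gain > best["gain"]:
                best = {"gain": gain, "var": p_, "cut": cut, "omega": omega}

    if best["gain"] < lam:                                # stopping rule 4
        return {"treatment": best_single}

    return {
        "var": best["var"], "cut": best["cut"],
        "left": grow_tree(mu_aipw, X, best["omega"], n0, lam,
                          depth + 1, max_depth),
        "right": grow_tree(mu_aipw, X, idx & ~best["omega"], n0, lam,
                           depth + 1, max_depth),
    }
```

For multiple stages, the tree at stage $j$ would be grown on AIPW estimates built from stage-$j$ pseudo-outcomes, working backward from $j = T$.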
25. Summary
ACWL transforms batch-mode RL to SL and incorporates existing
classification methods: flexible and easy to implement.
T-RL directly deals with batch-mode RL and uses unsupervised
trees to learn optimal DTRs with purity measures maximizing the
counterfactual mean outcome.
ACWL and T-RL both combine semiparametric regression with machine learning and can handle multiple stages and multiple treatments.
Overall, both methods work well across different scenarios: T-RL is slightly more robust and works better for tree-type DTRs, while ACWL works better for non-tree-type ones.
26. Other Ongoing Research Projects
Stochastic Tree-based Reinforcement Learning (ST-RL) for
estimating optimal DTRs
Explore different structures of optimal DTRs
with restricted arms from observational data
with restricted tailoring variables
Some special DTR frameworks: e.g., Nested Test-and-Treat DTRs
Multivariate utility function: multiple competing clinical priorities
Continuous treatment options, e.g., radiation doses
Mobile health: continuous decisions and data updating
27. Acknowledgment
My Ph.D. students:
Yebin Tao, Yilun Sun, Nina Zhou, Ming Tang, Kelly Speth,
Jincheng Shen, Mochuan Liu, Yingchao Zhong.
My collaborators:
Daniel Almirall, Jeremy Taylor, Peter Thall, Stewart Wang.
Thank You!
luwang@umich.edu