New Statistical Learning Methods for Estimating the
Optimal Dynamic Treatment Regime
Lu Wang
Associate Professor
University of Michigan, Ann Arbor
luwang@umich.edu
December 10, 2019
Lu Wang ACWL and T-RL for Optimal DTR December 10, 2019 1 / 27
Overview
1 Introduction: Dynamic Treatment Regime (DTR)
2 Adaptive Contrast Weighted Learning (ACWL)
3 Tree-based Reinforcement Learning (T-RL)
4 Summary and Other Ongoing Research
Motivating Example: Cancer Management
Esophageal Cancer Patients, MD Anderson, 1998 to 2012
Motivation of Dynamic Treatment Regimes (DTRs)
Chronic Disease management: routinely adjust, change, add, or
discontinue treatment based on progress, side effects, patient burden,
compliance, etc.
→ Dynamic Treatment Regimes (DTRs)
Goals: To account for patient heterogeneity and to personalize health
care strategy:
One Size Fits All → Targeted/Individualized: degree of tailoring to inherent characteristics
Once and for All → Dynamic: degree of tailoring to time-varying characteristics
Evidence-based Personalized Health Care
To provide meaningfully improved health outcomes for patients by
delivering the right drug at the right dose at the right time.
[Figure: degree of tailoring, from one size fits all to individualized therapy]
One size fits all: low predictability; suboptimal health outcomes.
Steps toward tailoring: assess patient response heterogeneity; stratify patient population; optimize benefits/risk, etc.
Individualized therapy: better predictability; improved health outcomes.
Figure labels: Family History, Medical, Genetics, Lifestyle, Environment, Engine.
Key Notation
Stage $j$, $j = 1, \dots, T$, with $K_j$ ($\ge 2$) treatment options
$A_j$: treatment at stage $j$, with value $a_j \in \mathcal{A}_j = \{1, \dots, K_j\}$
$X_j$: covariate history just prior to assigning $A_j$
$Y$: overall outcome of interest
DTR $g = (g_1, \dots, g_T)$: a set of decision rules with
$g_j$: domain of $H_j = (A_1, \dots, A_{j-1}, X_j) \to \mathcal{A}_j$.
$Y^*(a)$: counterfactual outcome under treatment $a$
Assumptions for Identifiability using Observational Data
Use causal framework to assess DTRs under three assumptions
(Murphy et al. 2001; Robins and Hernan, 2009)
1 Consistency (no interference)
If a patient’s observed treatment history is compatible with a
given DTR, his or her observed clinical outcomes are the same as
the counterfactual ones under the DTR.
2 No Unmeasured Confounding
Treatment decision at a given time is independent of future
observations and counterfactual outcomes, conditional on all
previous covariate and treatment history.
3 Positivity
With probability one, every subject has a positive probability,
bounded away from 0, of following a specific DTR.
Adaptive Contrast Weighted Learning (ACWL)
Tao and Wang, 2017 Biometrics
For T = 1:
$$g^{\mathrm{opt}} = \arg\max_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K} E\{Y^*(a) \mid H\}\, I\{g(H) = a\}\right].$$
Under the Causal Assumptions 1-3, we have
$$g^{\mathrm{opt}} = \arg\max_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K} \mu_a(H)\, I\{g(H) = a\}\right],$$
where $\mu_a(H) \triangleq E(Y \mid A = a, H)$.
Incorporate order statistics $\mu_{(1)}(H) \le \dots \le \mu_{(K)}(H)$, then
$$g^{\mathrm{opt}} = \arg\max_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K} \mu_{(a)}(H)\, I\{g(H) = l_a(H)\}\right],$$
where $\mu_{(a)}(H) = \mu_{l_a}(H)$.
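The ordering $l_1(H), \dots, l_K(H)$ and the resulting rule can be illustrated with a minimal sketch. It assumes the conditional means $\mu_a(H)$ are already available as a table; the numbers and the helper names `treatment_ordering` and `rule_value` are hypothetical and not taken from the talk.

```python
def treatment_ordering(mu_row):
    """Return treatment labels (1-based) sorted by increasing conditional mean.

    The last element plays the role of l_K(H), the best treatment.
    """
    return sorted(range(1, len(mu_row) + 1), key=lambda a: mu_row[a - 1])


def rule_value(mu, rule):
    """Empirical analogue of E_H[ sum_a mu_a(H) I{g(H) = a} ]."""
    return sum(row[g - 1] for row, g in zip(mu, rule)) / len(mu)


# Hypothetical conditional means mu_a(H_i) for 3 subjects and K = 3 treatments.
mu = [
    [1.0, 2.5, 2.0],
    [3.0, 0.5, 1.0],
    [0.0, 1.0, 4.0],
]

orderings = [treatment_ordering(row) for row in mu]
g_opt = [order[-1] for order in orderings]   # l_K(H_i) for each subject
one_size_fits_all = [1] * len(mu)            # always give treatment 1

print(orderings)
print(g_opt, rule_value(mu, g_opt))
print(one_size_fits_all, rule_value(mu, one_size_fits_all))
```

Assigning every subject to the last label in their own ordering gives the largest value among all rules on this table, which is the sense in which the optimal rule is $g^{\mathrm{opt}}(H) = l_K(H)$.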
Property of $g^{\mathrm{opt}}$
$$g^{\mathrm{opt}} = \arg\min_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K-1} \{\mu_{(K)}(H) - \mu_{(a)}(H)\}\, I\{g(H) = l_a(H)\}\right].$$
Minimizes the expected loss in the outcome due to sub-optimal
treatments in the entire population of interest.
Classify as many patients as possible to their optimal treatment
$l_K$ (i.e., letting $I\{g(H) = l_a(H)\} = 0$, $a = 1, \dots, K - 1$) while
putting more emphasis on patients with larger contrasts (i.e.,
larger $\mu_{(K)}(H) - \mu_{(a)}(H)$).
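The step from the arg max form to the arg min form can be spelled out. This is a short derivation sketch, assuming $g(H)$ selects exactly one treatment and the labels $l_1(H), \dots, l_K(H)$ are distinct, so that $\sum_{a=1}^{K} I\{g(H) = l_a(H)\} = 1$:

```latex
\begin{aligned}
\sum_{a=1}^{K} \mu_{(a)}(H)\, I\{g(H) = l_a(H)\}
  &= \mu_{(K)}(H) - \sum_{a=1}^{K} \{\mu_{(K)}(H) - \mu_{(a)}(H)\}\, I\{g(H) = l_a(H)\} \\
  &= \mu_{(K)}(H) - \sum_{a=1}^{K-1} \{\mu_{(K)}(H) - \mu_{(a)}(H)\}\, I\{g(H) = l_a(H)\}.
\end{aligned}
```

The $a = K$ term vanishes because its contrast is zero, and $\mu_{(K)}(H)$ does not depend on $g$, so maximizing the expected outcome is the same as minimizing the expected loss from sub-optimal treatments.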
Bounds of the Objective Function
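Let $C_1(H) \triangleq \mu_{(K)}(H) - \mu_{(K-1)}(H)$ and $C_2(H) \triangleq \mu_{(K)}(H) - \mu_{(1)}(H)$;
then, for $a = 1, \dots, K - 1$,
$$0 \le C_1(H) \le \mu_{(K)}(H) - \mu_{(a)}(H) \le C_2(H).$$
Therefore,
$$E_H\left[C_1(H)\, I\{g(H) \ne l_K(H)\}\right] \le E_H\left[\sum_{a=1}^{K-1} \{\mu_{(K)}(H) - \mu_{(a)}(H)\}\, I\{g(H) = l_a(H)\}\right] \le E_H\left[C_2(H)\, I\{g(H) \ne l_K(H)\}\right].$$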
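A small numerical check of these bounds, as a sketch only: the conditional means and the rule `g` below are hypothetical, and `regret_and_bounds` is an illustrative helper name. For each subject it returns the lower bound, the actual loss, and the upper bound, where the two bounds are switched on only when the assigned treatment is not the best one.

```python
def regret_and_bounds(mu_row, assigned):
    """Return (C1 * I, regret, C2 * I) for one subject, I = I{assigned != l_K}."""
    ordered = sorted(mu_row)
    best = ordered[-1]
    c1 = best - ordered[-2]   # C1(H): loss from the second-best treatment
    c2 = best - ordered[0]    # C2(H): loss from the worst treatment
    regret = best - mu_row[assigned - 1]
    misclassified = 1 if regret > 0 else 0
    return c1 * misclassified, regret, c2 * misclassified


# Hypothetical conditional means for 3 subjects, K = 3, and an arbitrary rule g.
mu = [[1.0, 2.5, 2.0], [3.0, 0.5, 1.0], [0.0, 1.0, 4.0]]
g = [3, 1, 1]

rows = [regret_and_bounds(row, a) for row, a in zip(mu, g)]
lower = sum(r[0] for r in rows) / len(rows)
regret = sum(r[1] for r in rows) / len(rows)
upper = sum(r[2] for r in rows) / len(rows)
print(lower, regret, upper)
```

The second subject receives their best treatment, so all three of their terms are zero; the other two contribute a loss that sits between their own $C_1$ and $C_2$.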
Adaptive Contrasts for Transformation to Weighted
Classification
Contrasts C1 and C2: minimum and maximum expected losses in
the outcome given sub-optimal treatments, adaptive to each
patient’s own treatment effect ordering
In the best case where sub-optimal treatments only lead to
minimal expected losses in the outcome, $g^{\mathrm{opt}}$ is equal to
$$\arg\min_{g \in \mathcal{G}} E_H\left[C_1(H)\, I\{g(H) \ne l_K(H)\}\right].$$
In the worst case where sub-optimal treatments all lead to
maximal expected losses in the outcome, $g^{\mathrm{opt}}$ is equal to
$$\arg\min_{g \in \mathcal{G}} E_H\left[C_2(H)\, I\{g(H) \ne l_K(H)\}\right].$$
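Both objectives are weighted misclassification errors: the label is the best treatment $l_K(H)$ and the weight is the contrast. A minimal sketch follows, using a one-split decision stump on a single covariate as a simple stand-in for a real classifier such as CART. The data, the weights, and the names `weighted_error` and `fit_stump` are hypothetical.

```python
def weighted_error(rule, labels, weights):
    """Empirical analogue of E_H[ C(H) I{g(H) != l_K(H)} ]."""
    return sum(w for g, l, w in zip(rule, labels, weights) if g != l) / len(labels)


def fit_stump(x, labels, weights, treatments):
    """Search rules 'left if x <= c else right' for the smallest weighted error."""
    best = None
    for c in sorted(set(x)):
        for left in treatments:
            for right in treatments:
                rule = [left if xi <= c else right for xi in x]
                err = weighted_error(rule, labels, weights)
                if best is None or err < best[0]:
                    best = (err, c, left, right)
    return best


# Hypothetical data: one covariate, estimated best treatments, and contrasts.
x = [0.1, 0.4, 0.5, 0.8, 0.9]
labels = [1, 1, 2, 2, 1]               # estimated l_K(H_i)
weights = [2.0, 1.0, 3.0, 0.5, 0.2]    # estimated contrast, e.g. C1(H_i)

best = fit_stump(x, labels, weights, treatments=[1, 2])
print(best)
```

The selected stump misclassifies only the last subject, whose contrast is the smallest, which is the intended effect of weighting by the contrast.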
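Estimate $\mu_A(H)$ to Construct $C_1(H)$ and $C_2(H)$
Given estimated propensity score $\hat\pi_a(H)$, the AIPW estimator
$\hat\mu_a^{\mathrm{AIPW}}$ for $E\{Y^*(a)\}$ is
$$\hat\mu_a^{\mathrm{AIPW}} = \mathbb{P}_n\left[\frac{I(A = a)}{\hat\pi_a(H)}\, Y + \left\{1 - \frac{I(A = a)}{\hat\pi_a(H)}\right\} \hat\mu_a(H)\right],$$
where $\mathbb{P}_n$ denotes the empirical average over the $n$ subjects.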
Lemma (Double Robustness)
$\hat\mu_a^{\mathrm{AIPW}}$ is a consistent estimator of $E\{Y^*(a)\}$ if either the propensity
model $\pi_a(H)$ or the conditional mean model $\mu_a(H)$ is correctly specified.
At a subject level,
$$\hat\mu_a^{\mathrm{AIPW}}(H) = \frac{I(A = a)}{\hat\pi_a(H)}\, Y + \left\{1 - \frac{I(A = a)}{\hat\pi_a(H)}\right\} \hat\mu_a(H).$$
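The subject-level formula translates directly into code. The sketch below assumes the propensity and outcome-model predictions have already been fitted elsewhere and are passed in as plain lists; the function names and the toy numbers are hypothetical.

```python
def aipw_pseudo_outcomes(y, a_obs, pi_hat, mu_hat, treatment):
    """Subject-level AIPW values for one treatment.

    y       : observed outcomes
    a_obs   : observed treatments
    pi_hat  : estimated propensities, pi_hat[i] = P(A = treatment | H_i)
    mu_hat  : estimated outcome means, mu_hat[i] = E(Y | A = treatment, H_i)
    """
    values = []
    for yi, ai, pi, mu in zip(y, a_obs, pi_hat, mu_hat):
        ratio = (1.0 if ai == treatment else 0.0) / pi
        values.append(ratio * yi + (1.0 - ratio) * mu)
    return values


def aipw_mean(y, a_obs, pi_hat, mu_hat, treatment):
    """Empirical mean of the subject-level values: estimates E{Y*(a)}."""
    values = aipw_pseudo_outcomes(y, a_obs, pi_hat, mu_hat, treatment)
    return sum(values) / len(values)


# Hypothetical data for treatment a = 1 with a constant propensity of 0.5.
y = [3.0, 1.0, 2.0, 4.0]
a_obs = [1, 2, 1, 2]
pi_hat = [0.5, 0.5, 0.5, 0.5]
mu_hat = [2.5, 2.5, 2.5, 2.5]

print(aipw_pseudo_outcomes(y, a_obs, pi_hat, mu_hat, treatment=1))
print(aipw_mean(y, a_obs, pi_hat, mu_hat, treatment=1))
```

Subjects who did not receive treatment 1 fall back on the outcome-model prediction, while those who did receive it get an inverse-probability-weighted correction of that prediction toward their observed outcome.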
Adaptive Contrast Weighted Learning (ACWL) with T > 1
For stage T, the assumptions and method derivation are the same
as in a single stage scenario.
For stage $j$, $T - 1 \ge j \ge 1$, we incorporate Q-learning to conduct
backward induction. The stage-specific pseudo outcome $PO_j$ for
estimating treatment effect ordering and adaptive contrasts is a
predicted counterfactual outcome under optimal treatments at all
future stages, i.e.,
$$PO_j = E\left\{Y^*(A_1, \dots, A_j, g^{\mathrm{opt}}_{j+1}, \dots, g^{\mathrm{opt}}_T)\right\}.$$
Replace $Y$ with $PO_j$ to apply the same method as in $T = 1$.
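One simple way to form such a pseudo outcome, in the spirit of Q-learning, is to take the largest fitted conditional mean at the next stage. The sketch below assumes two stages and a hypothetical table `mu_stage2` of fitted stage-2 means evaluated at each subject's observed history; it is an illustration of backward induction, not necessarily the exact pseudo outcome used in ACWL.

```python
def stage_pseudo_outcomes(mu_next):
    """Q-learning style pseudo outcome: best predicted mean at the next stage."""
    return [max(row) for row in mu_next]


# Hypothetical fitted stage-2 means, one row per subject, one column per treatment.
mu_stage2 = [[1.0, 2.0], [3.0, 0.5], [2.0, 2.0]]

po_stage1 = stage_pseudo_outcomes(mu_stage2)
print(po_stage1)
```

The stage-1 analysis then proceeds exactly as in the single-stage case, with `po_stage1` used in place of the observed outcome.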
Simulation Results
[Figure: Stage 1 and Stage 2 panels comparing Q-learning, BOWL, and ACWL]
Tree-based Reinforcement Learning (T-RL): Motivation
Tao, Wang and Almirall (2018) Annals of Applied Statistics
Supervised learning (SL):
True label is known
Then tree-based methods are easy to understand and interpret
without distributional assumptions for data, e.g., Classification
and Regression Tree (CART)
In a DTR problem, label (the optimal treatment) at each stage is not
directly observed
ACWL uses semiparametric regression to estimate the label first
and then convert RL to SL to apply existing classification methods
such as CART
T-RL: Methodological Basis and Feature
Batch-mode RL using a sequence of unsupervised decision trees with
backward induction
Maintain the RL feature without the extra step of conversion to
SL, and thus reduce estimation uncertainty (improvement upon
ACWL)
Incorporate multiple treatment comparisons directly, to improve
efficiency (improvement upon ACWL)
Also allow multi-categorical and ordinal treatments
Similar to ACWL, we expect T-RL to be robust and efficient by
combining robust semiparametric regression estimators with
nonparametric machine learning methods.
Example of an Unsupervised Decision Tree
The selected split at each node must improve the expected
counterfactual outcome.
Nodes $\Omega_m$, $m = 1, 2, \dots$, are regions defined by subsets of the covariate space.
Tree-based RL (T-RL): Purity Measures
Use the purity measure to improve the expected counterfactual
outcome and apply doubly robust AIPW estimators as in ACWL.
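For a given partition $\omega$ and $\omega^c$ of node $\Omega$, let $g_{j,\omega,a_1,a_2}$ denote the
decision rule that assigns treatment $a_1$ to subjects in $\omega$ and
treatment $a_2$ to subjects in $\omega^c$ at stage $j$.
We define the purity measure $P_j(\Omega, \omega)$ as
$$\max_{a_1, a_2 \in \mathcal{A}_j} \mathbb{P}_n\left[\sum_{a_j=1}^{K_j} \hat\mu^{\mathrm{AIPW}}_{j,a_j}(H_j)\, I\{g_{j,\omega,a_1,a_2}(H_j) = a_j\}\, I(H_j \in \Omega)\right].$$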
Pj(Ω, ω) evaluates the performance of the best decision rule which
assigns a single treatment for each of the two arms under partition.
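A brute-force sketch of the purity measure for one candidate split is given below. It assumes the subject-level AIPW values are already computed and stored one row per subject and one column per treatment; the function name `purity`, the membership flags, and the numbers are hypothetical.

```python
def purity(mu_aipw, in_node, in_omega):
    """Best mean AIPW outcome from assigning one treatment to omega and one
    to its complement within Omega, averaged over all n subjects.

    mu_aipw  : mu_aipw[i][a] is the AIPW value of subject i for treatment a + 1
    in_node  : in_node[i] is True when H_i lies in the node Omega
    in_omega : in_omega[i] is True when H_i lies in the candidate child omega
    Returns (purity, a1, a2) with 1-based treatment labels.
    """
    n = len(mu_aipw)
    k = len(mu_aipw[0])
    best = None
    for a1 in range(k):
        for a2 in range(k):
            total = 0.0
            for row, node, left in zip(mu_aipw, in_node, in_omega):
                if node:
                    total += row[a1] if left else row[a2]
            value = total / n
            if best is None or value > best[0]:
                best = (value, a1 + 1, a2 + 1)
    return best


# Hypothetical AIPW values for 4 subjects and 2 treatments.
mu_aipw = [[2.0, 0.0], [3.0, 1.0], [0.0, 4.0], [1.0, 5.0]]
in_node = [True, True, True, True]
in_omega = [True, True, False, False]

print(purity(mu_aipw, in_node, in_omega))
```

On this toy table, giving treatment 1 to the first child and treatment 2 to the second yields a higher mean than any rule that gives a single treatment to the whole node.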
T-RL: Recursive Partitioning
The best split $\omega^{\mathrm{opt}}$ is chosen to maximize the improvement in the
purity, $P_j(\Omega, \omega) - P_j(\Omega)$.
Stopping Rules, given minimum node size $n_0$ and minimum
purity improvement $\lambda$
1 If the node size is less than $2n_0$, the node will not be split.
2 If all possible splits of a node result in a child node with size smaller
than $n_0$, the node will not be split.
3 If the current tree depth reaches the user-specified maximum depth,
the tree growing process will stop.
4 If the maximum purity improvement $P_j(\Omega, \hat\omega^{\mathrm{opt}}) - P_j(\Omega)$ is less
than $\lambda$, the node will not be split, where
$$\hat\omega^{\mathrm{opt}} = \arg\max_{\omega}\left[P_j(\Omega, \omega) : \min\{n\mathbb{P}_n I(H_j \in \omega),\ n\mathbb{P}_n I(H_j \in \omega^c)\} \ge n_0\right].$$
T-RL Algorithm
1 Start with stage $j = T$.
2 Obtain AIPW estimates $\hat\mu^{\mathrm{AIPW}}_{j,a_j}(H_j)$, $a_j = 1, \dots, K_j$.
3 At root node $\Omega_{j,m}$, $m = 1$, set values for $\lambda$ and $n_0$.
4 At node $\Omega_{j,m}$, evaluate the four Stopping Rules. If any of them
is satisfied, assign a single best treatment
$$\arg\max_{a_j \in \mathcal{A}_j} \mathbb{P}_n\left[\hat\mu^{\mathrm{AIPW}}_{j,a_j}(H_j)\, I(H_j \in \Omega_{j,m})\right]$$
to all subjects in $\Omega_{j,m}$. Otherwise, split $\Omega_{j,m}$ into child nodes
$\Omega_{j,2m}$ and $\Omega_{j,2m+1}$ by $\hat\omega^{\mathrm{opt}}$.
5 Set $m = m + 1$ and repeat Step 4 until all nodes are terminal.
6 If $j > 1$, set $j = j - 1$ and repeat Steps 2 to 5. If $j = 1$, stop.
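Steps 3 to 5 for a single stage can be sketched as a recursive function. This is an illustration under stated assumptions, not the authors' implementation: the AIPW values are taken as given, splits are of the form "covariate $\le$ cutoff", the tree is returned as a nested dict, and all names and numbers are hypothetical. The sketch uses the fact that the best two-arm rule for a split picks the best treatment separately in each child.

```python
def node_value(mu, idx, n):
    """Best single-treatment value for the subjects in idx, and that treatment."""
    k = len(mu[0])
    totals = [sum(mu[i][a] for i in idx) / n for a in range(k)]
    best_a = max(range(k), key=lambda a: totals[a])
    return totals[best_a], best_a + 1


def best_split(x, mu, idx, n, n0):
    """Search splits 'x[i][v] <= c' whose children both hold at least n0 subjects."""
    best = None
    for v in range(len(x[0])):
        for c in sorted({x[i][v] for i in idx}):
            left = [i for i in idx if x[i][v] <= c]
            right = [i for i in idx if x[i][v] > c]
            if min(len(left), len(right)) < n0:
                continue
            # Purity of the split: best treatment chosen separately per child.
            value = node_value(mu, left, n)[0] + node_value(mu, right, n)[0]
            if best is None or value > best[0]:
                best = (value, v, c, left, right)
    return best


def grow(x, mu, idx, n, n0, lam, max_depth, depth=0):
    """Recursively grow one stage's tree; returns a nested dict."""
    value, treatment = node_value(mu, idx, n)
    leaf = {"treatment": treatment, "size": len(idx)}
    if len(idx) < 2 * n0 or depth >= max_depth:    # stopping rules 1 and 3
        return leaf
    split = best_split(x, mu, idx, n, n0)
    if split is None or split[0] - value < lam:    # stopping rules 2 and 4
        return leaf
    _, v, c, left, right = split
    return {
        "variable": v,
        "cutoff": c,
        "left": grow(x, mu, left, n, n0, lam, max_depth, depth + 1),
        "right": grow(x, mu, right, n, n0, lam, max_depth, depth + 1),
    }


# Hypothetical data: one covariate, AIPW values for 2 treatments, 6 subjects.
x = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
mu = [[2.0, 0.0], [3.0, 1.0], [2.5, 0.5], [0.0, 4.0], [1.0, 5.0], [0.5, 4.5]]

tree = grow(x, mu, list(range(6)), n=6, n0=2, lam=0.1, max_depth=2)
print(tree)
```

For multiple stages, this function would be called once per stage from $j = T$ down to $j = 1$, with the AIPW values at earlier stages built from pseudo outcomes.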
Simulation 1: Tree-type DTR
Simulation 2: Non-tree-type DTR
Summary
ACWL transforms batch-mode RL to SL and incorporates existing
classification methods: flexible and easy to implement.
T-RL directly deals with batch-mode RL and uses unsupervised
trees to learn optimal DTRs with purity measures maximizing the
counterfactual mean outcome.
ACWL and T-RL both combine semiparametric regression with
machine learning and can handle multiple stages and multiple
treatments.
Overall, both methods work well across different scenarios. T-RL
has slightly better robustness. T-RL works better for tree-type
DTRs while ACWL works better for non-tree-type ones.
Other Ongoing Research Projects
Stochastic Tree-based Reinforcement Learning (ST-RL) for
estimating optimal DTRs
Explore different structures of optimal DTRs
with restricted arms from observational data
with restricted tailoring variables
Some special DTR frameworks: e.g. Nested Test-and-Treat DTRs
Multivariate utility function: multiple competing clinical priorities
Continuous treatment options, e.g., radiation doses
Mobile health: continuous decisions and data updating
Acknowledgment
My Ph.D. students:
Yebin Tao, Yilun Sun, Nina Zhou, Ming Tang, Kelly Speth,
Jincheng Shen, Mochuan Liu, Yingchao Zhong.
My collaborators:
Daniel Almirall, Jeremy Taylor, Peter Thall, Stewart Wang.
Thank You!
luwang@umich.edu
Lu Wang ACWL and T-RL for Optimal DTR December 10, 2019 27 / 27

Más contenido relacionado

La actualidad más candente

1.9.association mining 1
1.9.association mining 11.9.association mining 1
1.9.association mining 1Krish_ver2
 
Object extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningObject extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningAly Abdelkareem
 
Thresholding.ppt
Thresholding.pptThresholding.ppt
Thresholding.pptshankar64
 
Principle of photogrammetry
Principle of photogrammetryPrinciple of photogrammetry
Principle of photogrammetrySumant Diwakar
 
Image segmentation
Image segmentationImage segmentation
Image segmentationDeepak Kumar
 
Orthorectification and triangulation
Orthorectification and triangulationOrthorectification and triangulation
Orthorectification and triangulationMesfin Yeshitla
 
Land Information System Of Nepal (LIS)
Land Information System Of Nepal (LIS)Land Information System Of Nepal (LIS)
Land Information System Of Nepal (LIS)Bishwa oli
 
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptChapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptSubrata Kumer Paul
 
Image classification, remote sensing, P K MANI
Image classification, remote sensing, P K MANIImage classification, remote sensing, P K MANI
Image classification, remote sensing, P K MANIP.K. Mani
 

La actualidad más candente (11)

1.9.association mining 1
1.9.association mining 11.9.association mining 1
1.9.association mining 1
 
Object extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningObject extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learning
 
Thresholding.ppt
Thresholding.pptThresholding.ppt
Thresholding.ppt
 
Principle of photogrammetry
Principle of photogrammetryPrinciple of photogrammetry
Principle of photogrammetry
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
 
Orthorectification and triangulation
Orthorectification and triangulationOrthorectification and triangulation
Orthorectification and triangulation
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Land Information System Of Nepal (LIS)
Land Information System Of Nepal (LIS)Land Information System Of Nepal (LIS)
Land Information System Of Nepal (LIS)
 
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptChapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
 
Object tracking
Object trackingObject tracking
Object tracking
 
Image classification, remote sensing, P K MANI
Image classification, remote sensing, P K MANIImage classification, remote sensing, P K MANI
Image classification, remote sensing, P K MANI
 

Similar a Causal Inference Opening Workshop - New Statistical Learning Methods for Estimating the Optimal Dynamic Treatment Regime - Lu Wang, December 10, 2019

Subgroup identification for precision medicine. a comparative review of 13 me...
Subgroup identification for precision medicine. a comparative review of 13 me...Subgroup identification for precision medicine. a comparative review of 13 me...
Subgroup identification for precision medicine. a comparative review of 13 me...SuciAidaDahhar
 
Categorical data analysis full lecture note PPT.pptx
Categorical data analysis full lecture note  PPT.pptxCategorical data analysis full lecture note  PPT.pptx
Categorical data analysis full lecture note PPT.pptxMinilikDerseh1
 
2015.01.07 - HAI poster
2015.01.07 - HAI poster2015.01.07 - HAI poster
2015.01.07 - HAI posterFunan Shi
 
#2ECcourse.ppthjjkhkjhkjhjkjhkhhkhkhkhkj
#2ECcourse.ppthjjkhkjhkjhjkjhkhhkhkhkhkj#2ECcourse.ppthjjkhkjhkjhjkjhkhhkhkhkhkj
#2ECcourse.ppthjjkhkjhkjhjkjhkhhkhkhkhkjBereketRegassa1
 
#2ECcourse.pptgjdgjgfjgfjhkhkjkkjhkhkhkjhjkj
#2ECcourse.pptgjdgjgfjgfjhkhkjkkjhkhkhkjhjkj#2ECcourse.pptgjdgjgfjgfjhkhkjkkjhkhkhkjhjkj
#2ECcourse.pptgjdgjgfjgfjhkhkjkkjhkhkhkjhjkjBereketRegassa1
 
Strategies for setting futility analyses at multiple time points in clinical ...
Strategies for setting futility analyses at multiple time points in clinical ...Strategies for setting futility analyses at multiple time points in clinical ...
Strategies for setting futility analyses at multiple time points in clinical ...Lu Mao
 
Método Topsis - multiple decision makers
Método Topsis  - multiple decision makersMétodo Topsis  - multiple decision makers
Método Topsis - multiple decision makersLuizOlimpio4
 
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...Jihun Yun
 
Defensive Efficacy Interim Design
Defensive Efficacy Interim DesignDefensive Efficacy Interim Design
Defensive Efficacy Interim DesignZhongwen Tang
 
Chi square(hospital admin)
Chi square(hospital admin) Chi square(hospital admin)
Chi square(hospital admin) Mmedsc Hahm
 
Managerial Economics - Demand Estimation (regression analysis)
Managerial Economics - Demand Estimation (regression analysis)Managerial Economics - Demand Estimation (regression analysis)
Managerial Economics - Demand Estimation (regression analysis)JooneEltanal
 
Lec5 advanced-policy-gradient-methods
Lec5 advanced-policy-gradient-methodsLec5 advanced-policy-gradient-methods
Lec5 advanced-policy-gradient-methodsRonald Teo
 
Analytic Methods and Issues in CER from Observational Data
Analytic Methods and Issues in CER from Observational DataAnalytic Methods and Issues in CER from Observational Data
Analytic Methods and Issues in CER from Observational DataCTSI at UCSF
 

Similar a Causal Inference Opening Workshop - New Statistical Learning Methods for Estimating the Optimal Dynamic Treatment Regime - Lu Wang, December 10, 2019 (20)

Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Subgroup identification for precision medicine. a comparative review of 13 me...
Subgroup identification for precision medicine. a comparative review of 13 me...Subgroup identification for precision medicine. a comparative review of 13 me...
Subgroup identification for precision medicine. a comparative review of 13 me...
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Lg ph d_slides_vfinal
Lg ph d_slides_vfinalLg ph d_slides_vfinal
Lg ph d_slides_vfinal
 
Categorical data analysis full lecture note PPT.pptx
Categorical data analysis full lecture note  PPT.pptxCategorical data analysis full lecture note  PPT.pptx
Categorical data analysis full lecture note PPT.pptx
 
Gap correction
Gap correctionGap correction
Gap correction
 
2015.01.07 - HAI poster
2015.01.07 - HAI poster2015.01.07 - HAI poster
2015.01.07 - HAI poster
 
Medical statistics2
Medical statistics2Medical statistics2
Medical statistics2
 
#2ECcourse.ppthjjkhkjhkjhjkjhkhhkhkhkhkj
#2ECcourse.ppthjjkhkjhkjhjkjhkhhkhkhkhkj#2ECcourse.ppthjjkhkjhkjhjkjhkhhkhkhkhkj
#2ECcourse.ppthjjkhkjhkjhjkjhkhhkhkhkhkj
 
#2ECcourse.pptgjdgjgfjgfjhkhkjkkjhkhkhkjhjkj
#2ECcourse.pptgjdgjgfjgfjhkhkjkkjhkhkhkjhjkj#2ECcourse.pptgjdgjgfjgfjhkhkjkkjhkhkhkjhjkj
#2ECcourse.pptgjdgjgfjgfjhkhkjkkjhkhkhkjhjkj
 
Strategies for setting futility analyses at multiple time points in clinical ...
Strategies for setting futility analyses at multiple time points in clinical ...Strategies for setting futility analyses at multiple time points in clinical ...
Strategies for setting futility analyses at multiple time points in clinical ...
 
PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...
PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...
PMED Transition Workshop - Estimation & Optimization of Composite Outcomes - ...
 
Método Topsis - multiple decision makers
Método Topsis  - multiple decision makersMétodo Topsis  - multiple decision makers
Método Topsis - multiple decision makers
 
Cost indexes
Cost indexesCost indexes
Cost indexes
 
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...
Trimming the L1 Regularizer: Statistical Analysis, Optimization, and Applicat...
 
Defensive Efficacy Interim Design
Defensive Efficacy Interim DesignDefensive Efficacy Interim Design
Defensive Efficacy Interim Design
 
Chi square(hospital admin)
Chi square(hospital admin) Chi square(hospital admin)
Chi square(hospital admin)
 
Managerial Economics - Demand Estimation (regression analysis)
Managerial Economics - Demand Estimation (regression analysis)Managerial Economics - Demand Estimation (regression analysis)
Managerial Economics - Demand Estimation (regression analysis)
 
Lec5 advanced-policy-gradient-methods
Lec5 advanced-policy-gradient-methodsLec5 advanced-policy-gradient-methods
Lec5 advanced-policy-gradient-methods
 
Analytic Methods and Issues in CER from Observational Data
Analytic Methods and Issues in CER from Observational DataAnalytic Methods and Issues in CER from Observational Data
Analytic Methods and Issues in CER from Observational Data
 

Más de The Statistical and Applied Mathematical Sciences Institute

Más de The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
 
2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...
2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...
2019 GDRR: Blockchain Data Analytics - Tracking Criminals by Following the Mo...
 
2019 GDRR: Blockchain Data Analytics - Disclosure in the World of Cryptocurre...
2019 GDRR: Blockchain Data Analytics - Disclosure in the World of Cryptocurre...2019 GDRR: Blockchain Data Analytics - Disclosure in the World of Cryptocurre...
2019 GDRR: Blockchain Data Analytics - Disclosure in the World of Cryptocurre...
 

Último

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
URLs and Routing in the Odoo 17 Website App
Causal Inference Opening Workshop - New Statistical Learning Methods for Estimating the Optimal Dynamic Treatment Regime - Lu Wang, December 10, 2019

  • 1. New Statistical Learning Methods for Estimating the Optimal Dynamic Treatment Regime Lu Wang Associate Professor University of Michigan, Ann Arbor luwang@umich.edu December 10, 2019 Lu Wang ACWL and T-RL for Optimal DTR December 10, 2019 1 / 27
  • 2. Overview: (1) Introduction: Dynamic Treatment Regime (DTR); (2) Adaptive Contrast Weighted Learning (ACWL); (3) Tree-based Reinforcement Learning (T-RL); (4) Summary and Other Ongoing Research.
  • 3. Motivating Example: Cancer Management. Esophageal cancer patients, MD Anderson, 1998 to 2012.
  • 4. Motivation of Dynamic Treatment Regimes (DTRs). Chronic disease management: routinely adjust, change, add, or discontinue treatment based on progress, side effects, patient burden, compliance, etc. → Dynamic Treatment Regimes (DTRs). Goals: to account for patient heterogeneity and to personalize health care strategy. Degree of tailoring on inherent characteristics: One Size Fits All → Targeted/Individualized. Degree of tailoring on time-varying characteristics: Once and for All → Dynamic.
  • 7. Evidence-based Personalized Health Care. To provide meaningfully improved health outcomes for patients by delivering the right drug with the right dose at the right time. Degree of tailoring: One size fits all (low predictability, suboptimal health outcomes) → Individualized therapy (better predictability, improved health outcomes), by assessing patient response heterogeneity, stratifying the patient population, optimizing benefit/risk, etc. Diagram labels: Family History, Medical, Genetics, Lifestyle, Environment, Engine.
  • 8. Key Notation. Stage $j$, $j = 1, \dots, T$, with $K_j$ ($\geq 2$) treatment options. $A_j$: treatment at stage $j$, with value $a_j \in \mathcal{A}_j = \{1, \dots, K_j\}$. $X_j$: covariate history just prior to assigning $A_j$. $Y$: overall outcome of interest. DTR $g = (g_1, \dots, g_T)$: a set of decision rules with $g_j$: Domain of $H_j = (A_1, \dots, A_{j-1}, X_j) \to \mathcal{A}_j$. $Y^*(a)$: counterfactual outcome under treatment $a$.
  • 9. Assumptions for Identifiability using Observational Data. Use a causal framework to assess DTRs under three assumptions (Murphy et al. 2001; Robins and Hernán, 2009). (1) Consistency (no interference): if a patient’s observed treatment history is compatible with a given DTR, his or her observed clinical outcomes are the same as the counterfactual ones under the DTR. (2) No Unmeasured Confounding: the treatment decision at a given time is independent of future observations and counterfactual outcomes, conditional on all previous covariate and treatment history. (3) Positivity: with probability one, every subject follows a specific DTR with a positive probability that is bounded away from 0.
  • 10. Adaptive Contrast Weighted Learning (ACWL). Tao and Wang, 2017, Biometrics. For $T = 1$: $g^{opt} = \arg\max_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K} E\{Y^*(a) \mid H\}\, I\{g(H) = a\}\right]$. Under the Causal Assumptions 1-3, we have $g^{opt} = \arg\max_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K} \mu_a(H)\, I\{g(H) = a\}\right]$, where $\mu_a(H) \triangleq E(Y \mid A = a, H)$. Incorporate order statistics $\mu_{(1)}(H) \leq \cdots \leq \mu_{(K)}(H)$, then $g^{opt} = \arg\max_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K} \mu_{(a)}(H)\, I\{g(H) = l_a(H)\}\right]$, where $\mu_{(a)}(H) = \mu_{l_a}(H)$.
  • 11. Property of $g^{opt}$. $g^{opt} = \arg\min_{g \in \mathcal{G}} E_H\left[\sum_{a=1}^{K-1} \{\mu_{(K)}(H) - \mu_{(a)}(H)\}\, I\{g(H) = l_a(H)\}\right]$. Minimizes the expected loss in the outcome due to sub-optimal treatments in the entire population of interest. Classify as many patients as possible to their optimal treatment $l_K$ (i.e., letting $I\{g(H) = l_a(H)\} = 0$, $a = 1, \dots, K-1$) while putting more emphasis on patients with larger contrasts (i.e., larger $\mu_{(K)}(H) - \mu_{(a)}(H)$).
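  • 12. Bounds of the Objective Function. Let $C_1(H) \triangleq \mu_{(K)}(H) - \mu_{(K-1)}(H)$ and $C_2(H) \triangleq \mu_{(K)}(H) - \mu_{(1)}(H)$; then $0 \leq C_1(H) \leq \mu_{(K)}(H) - \mu_{(a)}(H) \leq C_2(H)$ for $a = 1, \dots, K-1$. Since $\sum_{a=1}^{K-1} I\{g(H) = l_a(H)\} = I\{g(H) \neq l_K(H)\}$, it follows that $E_H\left[C_1(H)\, I\{g(H) \neq l_K(H)\}\right] \leq E_H\left[\sum_{a=1}^{K-1} \{\mu_{(K)}(H) - \mu_{(a)}(H)\}\, I\{g(H) = l_a(H)\}\right] \leq E_H\left[C_2(H)\, I\{g(H) \neq l_K(H)\}\right]$.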
  • 13. Adaptive Contrasts for Transformation to Weighted Classification. Contrasts $C_1$ and $C_2$: minimum and maximum expected losses in the outcome given sub-optimal treatments, adaptive to each patient’s own treatment effect ordering. In the best case, where sub-optimal treatments only lead to minimal expected losses in the outcome, $g^{opt}$ is equal to $\arg\min_{g \in \mathcal{G}} E_H\left[C_1(H)\, I\{g(H) \neq l_K(H)\}\right]$. In the worst case, where sub-optimal treatments all lead to maximal expected losses in the outcome, $g^{opt}$ is equal to $\arg\min_{g \in \mathcal{G}} E_H\left[C_2(H)\, I\{g(H) \neq l_K(H)\}\right]$.
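The adaptive contrasts and the weighted misclassification loss they define can be illustrated with a small sketch. It assumes the estimated conditional means for each patient are already available as a plain list (index 0 for treatment 1, and so on); the function names `adaptive_contrasts` and `weighted_loss` are hypothetical and do not come from the slides.

```python
def adaptive_contrasts(mu_hat):
    """Return (l_K, C1, C2) for one patient.

    mu_hat: list of K >= 2 estimated means, mu_hat[a - 1] for treatment a.
    l_K is the 1-based label of the treatment with the largest estimated mean.
    """
    if len(mu_hat) < 2:
        raise ValueError("need at least two treatment options")
    order = sorted(range(len(mu_hat)), key=lambda a: mu_hat[a])  # ascending
    best = order[-1]
    c1 = mu_hat[best] - mu_hat[order[-2]]  # gap to the second-best treatment
    c2 = mu_hat[best] - mu_hat[order[0]]   # gap to the worst treatment
    return best + 1, c1, c2


def weighted_loss(assigned, mu_hats, use_c2=False):
    """Empirical version of E_H[C(H) * I{g(H) != l_K(H)}].

    assigned: treatment (1-based) that a candidate rule g gives each patient.
    mu_hats:  one list of estimated means per patient.
    """
    total = 0.0
    for g_h, mu_hat in zip(assigned, mu_hats):
        l_k, c1, c2 = adaptive_contrasts(mu_hat)
        if g_h != l_k:  # patient is misclassified away from the best arm
            total += c2 if use_c2 else c1
    return total / len(assigned)


# Toy data: patient 1 is best on treatment 2, patient 2 on treatment 1.
mu_hats = [[1.0, 3.5, 2.0], [4.0, 0.0, 1.0]]
rule = [2, 3]  # correct for patient 1, wrong for patient 2
loss_c1 = weighted_loss(rule, mu_hats)               # uses C1 weights
loss_c2 = weighted_loss(rule, mu_hats, use_c2=True)  # uses C2 weights
```

Any classifier that accepts per-observation weights could then be trained with labels l_K and weights C1 or C2; which classifier to use is left open here.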
  • 14. Estimate $\mu_A(H)$ to Construct $C_1(H)$ and $C_2(H)$. Given the estimated propensity score $\hat{\pi}_a(H)$, the AIPW estimator $\hat{\mu}_a^{AIPW}$ for $E\{Y^*(a)\}$ is $\hat{\mu}_a^{AIPW} = \mathbb{P}_n\left[\frac{I(A = a)}{\hat{\pi}_a(H)}\, Y + \left\{1 - \frac{I(A = a)}{\hat{\pi}_a(H)}\right\} \hat{\mu}_a(H)\right]$. Lemma (Double Robustness): $\hat{\mu}_a^{AIPW}$ is a consistent estimator of $E\{Y^*(a)\}$ if either the propensity model $\pi_a(H)$ or the conditional mean model $\mu_a(H)$ is correctly specified. At a subject level, $\hat{\mu}_a^{AIPW}(H) = \frac{I(A = a)}{\hat{\pi}_a(H)}\, Y + \left\{1 - \frac{I(A = a)}{\hat{\pi}_a(H)}\right\} \hat{\mu}_a(H)$.
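As a worked illustration of the AIPW formula, the sketch below computes the subject-level value and its empirical mean. It assumes the propensity score and the conditional mean have already been fitted elsewhere and are passed in as numbers; the names `aipw_subject` and `aipw_mean` are hypothetical.

```python
def aipw_subject(a, observed_a, y, pi_hat, mu_hat):
    """Subject-level AIPW value for treatment a.

    Implements I(A=a)/pi * Y + {1 - I(A=a)/pi} * mu, where
    pi_hat is the estimated P(A = a | H) and mu_hat the estimated E(Y | A = a, H).
    """
    if pi_hat <= 0.0:
        raise ValueError("propensity score must be positive (positivity)")
    weight = (1.0 if observed_a == a else 0.0) / pi_hat
    return weight * y + (1.0 - weight) * mu_hat


def aipw_mean(a, records):
    """Empirical mean of the subject-level values over all subjects.

    records: iterable of (observed_a, y, pi_hat, mu_hat) tuples, where pi_hat
    and mu_hat are the fitted values for treatment a for that subject.
    """
    values = [aipw_subject(a, obs, y, pi, mu) for obs, y, pi, mu in records]
    return sum(values) / len(values)


# Subject 1 received treatment 1: 2 * 2.0 + (1 - 2) * 1.0 = 3.0
# Subject 2 received treatment 2: the value falls back to mu_hat = 1.0
records = [(1, 2.0, 0.5, 1.0), (2, 5.0, 0.5, 1.0)]
estimate = aipw_mean(1, records)
```

For a subject who did not receive treatment a, the value reduces to the model prediction; for one who did, the prediction is corrected by the inverse-probability-weighted residual.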
  • 15. Adaptive Contrast Weighted Learning (ACWL) with $T > 1$. For stage $T$, the assumptions and method derivation are the same as in a single-stage scenario. For stage $j$, $T - 1 \geq j \geq 1$, we incorporate Q-learning to conduct backward induction. The stage-specific pseudo outcome $PO_j$ for estimating treatment effect ordering and adaptive contrasts is a predicted counterfactual outcome under optimal treatments at all future stages, i.e., $PO_j = E\{Y^*(A_1, \dots, A_j, g^{opt}_{j+1}, \dots, g^{opt}_T)\}$. Replace $Y$ with $PO_j$ to apply the same method as in $T = 1$.
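The slide does not spell out how the pseudo outcome is predicted. The sketch below assumes the plain Q-learning choice: take a fitted next-stage Q-function and maximise it over the next-stage treatments. The callable `q_next` and the function name `pseudo_outcome` are hypothetical stand-ins, not the authors' implementation.

```python
def pseudo_outcome(h_next, treatments, q_next):
    """Return (PO_j, optimal next-stage treatment) for one subject.

    h_next:     the subject's history available at stage j + 1
    treatments: iterable of admissible stage-(j + 1) treatments
    q_next:     fitted Q-function, q_next(h, a) -> predicted outcome (assumed given)
    """
    best_a = max(treatments, key=lambda a: q_next(h_next, a))
    return q_next(h_next, best_a), best_a


# Toy Q-function with two treatments whose ranking depends on the history h.
def toy_q(h, a):
    return 1.0 + h if a == 1 else 2.0 - h


po_low, a_low = pseudo_outcome(0.25, [1, 2], toy_q)   # treatment 2 is better
po_high, a_high = pseudo_outcome(1.0, [1, 2], toy_q)  # treatment 1 is better
```

The resulting pseudo outcome then replaces Y when the single-stage procedure is applied at stage j.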
  • 16. Simulation Results. [Figure: Stage 1 and Stage 2 panels comparing Q-learning, BOWL, and ACWL.]
  • 17. Tree-based Reinforcement Learning (T-RL): Motivation. Tao, Wang and Almirall (2018), Annals of Applied Statistics. Supervised learning (SL): the true label is known. Tree-based methods are then easy to understand and interpret, without distributional assumptions for the data, e.g., Classification and Regression Tree (CART). In a DTR problem, the label (the optimal treatment) at each stage is not directly observed. ACWL uses semiparametric regression to estimate the label first and then converts RL to SL to apply existing classification methods such as CART.
  • 18. T-RL: Methodological Basis and Feature. Batch-mode RL using a sequence of unsupervised decision trees with backward induction. Maintain the RL feature without the extra step of conversion to SL, and thus reduce estimation uncertainty (improvement upon ACWL). Incorporate multiple treatment comparisons directly, to improve efficiency (improvement upon ACWL). Also allow multi-categorical and ordinal treatments. Similar to ACWL, we expect T-RL to be robust and efficient by combining robust semiparametric regression estimators with nonparametric machine learning methods.
  • 19. Example of an Unsupervised Decision Tree. The selected split at each node must improve the expected counterfactual outcome. Nodes $\Omega_m$, $m = 1, 2, \dots$, are regions defined by subsets of the covariate space.
  • 20. Tree-based RL (T-RL): Purity Measures. Use the purity measure to improve the expected counterfactual outcome and apply doubly robust AIPW estimators as in ACWL. For a given partition $\omega$ and $\omega^c$ of node $\Omega$, let $g_{j,\omega,a_1,a_2}$ denote the decision rule that assigns treatment $a_1$ to subjects in $\omega$ and treatment $a_2$ to subjects in $\omega^c$ at stage $j$. We define the purity measure $\mathcal{P}_j(\Omega, \omega)$ as $\max_{a_1, a_2 \in \mathcal{A}_j} \mathbb{P}_n\left[\sum_{a_j=1}^{K_j} \hat{\mu}^{AIPW}_{j,a_j}(H_j)\, I\{g_{j,\omega,a_1,a_2}(H_j) = a_j\}\, I(H_j \in \Omega)\right]$. $\mathcal{P}_j(\Omega, \omega)$ evaluates the performance of the best decision rule which assigns a single treatment to each of the two arms under the partition.
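A minimal sketch of the purity measure for one candidate partition follows. It assumes the subject-level AIPW estimates are stored as one list per subject (entry a - 1 for treatment a) and that node and partition membership are given as boolean flags; the function name `purity` is hypothetical. The search over treatment pairs is written as a brute-force loop to mirror the formula.

```python
from itertools import product


def purity(mu, in_node, in_left):
    """Return (P_j(Omega, omega), a1, a2) with 1-based treatment labels.

    mu:      mu[i][a - 1] is the AIPW estimate for subject i under treatment a
    in_node: in_node[i] is True when subject i lies in the node Omega
    in_left: in_left[i] is True when subject i lies in omega (else in omega^c)
    The empirical mean is taken over all n subjects, as in P_n.
    """
    n = len(mu)
    k = len(mu[0])
    best = None
    for a1, a2 in product(range(k), repeat=2):
        total = 0.0
        for i in range(n):
            if in_node[i]:  # I(H_j in Omega)
                total += mu[i][a1] if in_left[i] else mu[i][a2]
        value = total / n
        if best is None or value > best[0]:
            best = (value, a1 + 1, a2 + 1)
    return best


# Subjects 0-1 do better on treatment 1, subjects 2-3 on treatment 2.
mu = [[3.0, 1.0], [2.0, 0.0], [0.0, 4.0], [1.0, 5.0]]
in_node = [True, True, True, True]
in_left = [True, True, False, False]
value, a1, a2 = purity(mu, in_node, in_left)  # (5 + 9) / 4 with a1 = 1, a2 = 2
```

Because the sum separates over the two arms, the best a1 and a2 could also be found independently; the joint loop is kept only for readability.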
  • 21. T-RL: Recursive Partitioning. The best split $\omega^{opt}$ is chosen to maximize the improvement in the purity, $\mathcal{P}_j(\Omega, \omega) - \mathcal{P}_j(\Omega)$. Stopping Rules, given minimum node size $n_0$ and minimum purity improvement $\lambda$: (1) If the node size is less than $2n_0$, the node will not be split. (2) If all possible splits of a node result in a child node with size smaller than $n_0$, the node will not be split. (3) If the current tree depth reaches the user-specified maximum depth, the tree growing process will stop. (4) If the maximum purity improvement $\mathcal{P}_j(\Omega, \hat{\omega}^{opt}) - \mathcal{P}_j(\Omega)$ is less than $\lambda$, the node will not be split, where $\hat{\omega}^{opt} = \arg\max_{\omega}\left[\mathcal{P}_j(\Omega, \omega) : \min\{n \mathbb{P}_n I(H_j \in \omega),\, n \mathbb{P}_n I(H_j \in \omega^c)\} \geq n_0\right]$.
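The split search and the four stopping rules can be sketched for a single numeric covariate as follows. Two assumptions are made that the slide does not state: candidate splits are thresholds of the form x <= c, and the unsplit purity P_j(Ω) is taken to be the value of the best single treatment for the whole node. All function names are hypothetical.

```python
def node_purity(mu, idx, n):
    """Assumed P_j(Omega): best single treatment for all subjects in the node."""
    k = len(mu[0])
    return max(sum(mu[i][a] for i in idx) for a in range(k)) / n


def split_purity(mu, left, right, n):
    """P_j(Omega, omega): best single treatment chosen separately in each arm."""
    k = len(mu[0])
    best_left = max(sum(mu[i][a] for i in left) for a in range(k))
    best_right = max(sum(mu[i][a] for i in right) for a in range(k))
    return (best_left + best_right) / n


def best_split(x, mu, idx, n, n0, lam, depth, max_depth):
    """Return (threshold, purity improvement), or None if a stopping rule applies.

    x: covariate values; mu: subject-level AIPW estimates; idx: subjects in the
    node; n: total sample size; n0: minimum node size; lam: minimum improvement.
    """
    if len(idx) < 2 * n0:    # rule 1: node too small to split
        return None
    if depth >= max_depth:   # rule 3: maximum depth reached
        return None
    base = node_purity(mu, idx, n)
    best = None
    for c in sorted({x[i] for i in idx}):
        left = [i for i in idx if x[i] <= c]
        right = [i for i in idx if x[i] > c]
        if min(len(left), len(right)) < n0:  # child too small, skip this split
            continue
        gain = split_purity(mu, left, right, n) - base
        if best is None or gain > best[1]:
            best = (c, gain)
    if best is None:         # rule 2: no admissible split exists
        return None
    if best[1] < lam:        # rule 4: improvement below lambda
        return None
    return best


x = [0.1, 0.2, 0.8, 0.9]
mu = [[3.0, 1.0], [2.0, 0.0], [0.0, 4.0], [1.0, 5.0]]
result = best_split(x, mu, [0, 1, 2, 3], n=4, n0=1, lam=0.1, depth=0, max_depth=3)
```

With several covariates, the same search would be repeated per covariate and the overall best split kept.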
  • 22. T-RL Algorithm. (1) Start with stage $j = T$. (2) Obtain AIPW estimates $\hat{\mu}^{AIPW}_{j,a_j}(H_j)$, $a_j = 1, \dots, K_j$. (3) At root node $\Omega_{j,m}$, $m = 1$, set values for $\lambda$ and $n_0$. (4) At node $\Omega_{j,m}$, evaluate the four Stopping Rules. If any of them is satisfied, assign a single best treatment $\arg\max_{a_j \in \mathcal{A}_j} \mathbb{P}_n\left[\hat{\mu}^{AIPW}_{j,a_j}(H_j)\, I(H_j \in \Omega_{j,m})\right]$ to all subjects in $\Omega_{j,m}$. Otherwise, split $\Omega_{j,m}$ into child nodes $\Omega_{j,2m}$ and $\Omega_{j,2m+1}$ by $\hat{\omega}^{opt}$. (5) Set $m = m + 1$ and repeat Step 4 until all nodes are terminal. (6) If $j > 1$, set $j = j - 1$ and repeat steps 2 to 5. If $j = 1$, stop.
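Steps 4 and 5 for a single stage can be sketched as a recursion over nodes, using the same numbering in which node m has children 2m and 2m + 1. The split search is passed in as a hypothetical callable `find_split(idx, depth)` that returns the two child index lists, or None when a stopping rule applies; the slide visits nodes in increasing m, whereas this sketch recurses depth-first, which produces the same tree.

```python
def grow_tree(mu, idx, find_split, m=1, depth=0, tree=None):
    """Grow one stage's tree and return it as a dict keyed by node number m.

    mu:         mu[i][a - 1] is the AIPW estimate for subject i under treatment a
    idx:        indices of the subjects in the current node
    find_split: callable (idx, depth) -> (left_idx, right_idx) or None (assumed)
    """
    if tree is None:
        tree = {}
    split = find_split(idx, depth)
    if split is None:
        # Terminal node: one best treatment for everyone in the node. The 1/n
        # factor of the empirical mean does not change the argmax, so sums suffice.
        k = len(mu[0])
        best = max(range(k), key=lambda a: sum(mu[i][a] for i in idx))
        tree[m] = ("leaf", best + 1)
    else:
        left, right = split
        tree[m] = ("split", 2 * m, 2 * m + 1)
        grow_tree(mu, left, find_split, 2 * m, depth + 1, tree)
        grow_tree(mu, right, find_split, 2 * m + 1, depth + 1, tree)
    return tree


# Stub split rule for illustration: split the root once, then stop.
def toy_split(idx, depth):
    if depth == 0:
        return [i for i in idx if i < 2], [i for i in idx if i >= 2]
    return None


mu = [[3.0, 1.0], [2.0, 0.0], [0.0, 4.0], [1.0, 5.0]]
tree = grow_tree(mu, [0, 1, 2, 3], toy_split)
```

Step 6 would wrap this in a loop from stage T down to stage 1, recomputing the AIPW estimates at each stage before growing that stage's tree.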
  • 23. Simulation 1: Tree-type DTR
  • 24. Simulation 2: Non-tree-type DTR
  • 25. Summary. ACWL transforms batch-mode RL to SL and incorporates existing classification methods: flexible and easy to implement. T-RL directly deals with batch-mode RL and uses unsupervised trees to learn optimal DTRs with purity measures maximizing the counterfactual mean outcome. ACWL and T-RL both combine semiparametric regression with machine learning and can handle multiple stages and multiple treatments. Overall, both methods work well across different scenarios. T-RL has slightly better robustness. T-RL works better for tree-type DTRs while ACWL works better for non-tree-type ones.
  • 26. Other Ongoing Research Projects. Stochastic Tree-based Reinforcement Learning (ST-RL) for estimating optimal DTRs; explore different structures of optimal DTRs with restricted arms from observational data with restricted tailoring variables; some special DTR frameworks, e.g., Nested Test-and-Treat DTRs; multivariate utility function: multiple competing clinical priorities; continuous treatment options, e.g., radiation doses; mobile health: continuous decisions and data updating.
  • 27. Acknowledgment. My Ph.D. students: Yebin Tao, Yilun Sun, Nina Zhou, Ming Tang, Kelly Speth, Jincheng Shen, Mochuan Liu, Yingchao Zhong. My collaborators: Daniel Almirall, Jeremy Taylor, Peter Thall, Stewart Wang. Thank You! luwang@umich.edu