Traditional randomized experiments allow us to determine the overall causal impact of a treatment program (e.g. marketing, medical, social, education, political). Uplift modeling (also known as true lift, net lift, incremental lift) takes a further step to identify individuals who are truly positively influenced by a treatment through data mining / machine learning. This technique allows us to identify the “persuadables” and thus optimize target selection in order to maximize treatment benefits. This important subfield of data mining/data science/business analytics has gained significant attention in areas such as personalized marketing, personalized medicine, and political election with plenty of publications and presentations appeared in recent years from both industry practitioners and academics.
In this workshop, I will introduce the concept of Uplift, review existing methods, contrast with the traditional approach, and introduce a new method that can be implemented with standard software. A method and metrics for model assessment will be recommended. Our discussion will include new approaches to handling a general situation where only observational data are available, i.e. without randomized experiments, using techniques from causal inference. Additionally, an integrated modeling approach for uplift and direct response (where it can be identified who actually responded, e.g., click-through or coupon scanning) will be discussed. Last but not least, extension to the multiple treatment situation with solutions to optimizing treatments at the individual level will also be discussed. While the talk is geared towards marketing applications (“personalized marketing”), the same methodologies can be readily applied in other fields such as insurance, medicine, education, political, and social programs. Examples from the retail and non-profit industries will be used to illustrate the methodologies.
1. O P E N
D A T A
S C I E N C E
C O N F E R E N C E_ BOSTON
2015
@opendatasci
Victor S.Y. Lo
May, 2015
Machine Learning Based
Personalization Using
Uplift Analytics:
Examples and Applications
Uplift Modeling Workshop
2. Outline
Why do we need Uplift modeling? 10 min
Various methods for Uplift modeling 30 min
Break 5 min
Direct response vs. Uplift modeling 10 min
Prescriptive Analytics for Multiple Treatments 20 min
Q&A 10 min
2
4. 4
0%
2%
4%
6%
8%
10%
1 2 3 4 5 6 7 8 9 10
Decile
Response rate
Average2.5%
Top decile lift (over random) = 4 times
Top 3 deciles lift = 2.6 times
Big Lift
Modelers: VERY SUCCESSFUL MODEL!
Response Modeling
5. 5
Top 3 Deciles Random
Treatment 6.7% 2.5%
Control 6.7% 2.5%
Lift 0.0% 0.0%
Campaign Results
No Lift
Marketers: VERY DISAPPOINTING!
Modelers:
Not my problem, it is the mail design!
7. A successful response model
1 2 3 4 5 6 7 8 9 10
A successful marketing campaign
What’s wrong with this picture?
3.3%
2.7%
3.0%
2.3%
1.7%
2.0%
Test 1 Test 2 Total
Treatment Response Rate
Control Response Rate
7
14%
7%
4%
2%
1% 1% 0% 0% 0% 0%
1 2 3 4 5 6 7 8 9 10
Decile
Incidence of Treatment
Responders
0%
50%
100%
0% 50% 100%
PctofTreatmentResponders
Pct of Treatment Group
CUME Pct of Responders Random
DM LIFT? DM LIFT?
8. Motivation
Based on the following campaign result, which of the customer
groups is the best for future targeting ?
Treatment Control Difference
<35 0.5% 0.2% 0.3%
35-60 2.5% 0.5% 2.0%
>60 3.5% 2.5% 1.0%
Age
Response Rate By Age and Treatment/Control
• >60 has the highest response rate – treatment-only
focus (common practice)
• 35-60 has the highest Lift (most positively influenced by
the treatment) 8
9. Framework for Causal and Association
Analysis
9
Causal
Inference
(Lift Analysis,
Average Treatment
Effect)
Uplift
Modeling
(Heterogeneous
Treatment Effect,
Effect Modification)
Reporting /
Summary
Statistics
Response
Modeling /
Propensity
Modeling
Population /
Sub-population
Personalized
FromAssociationtoCausality
Granularity
10. 1 2 3 4 5 6 7 8 9 10
Decile
Treatment "Responders"
Control "Responders"
1 2 3 4 5 6 7 8 9 10
Decile
Treatment "Responders"
Control "Responders"
The Uplift Model Objective
Maximize the Treatment responders while minimizing
the control “responders”
10
True
lift
True
lift
A standard response model A uplift response model
(Ideal)
Hypothetical data
11. Traditional Approach Uplift Modeling
Uplift Approaches
Previous campaign data
Control Treatment
Training
data set
Holdout
data set
Model
Previous campaign data
Control Treatment
Training
data set
Holdout
data set
Model Source: Lo (2002)
11
12. Uplift model solutions
0. Baseline results: Standard response model –
treatment-only (as a benchmark)
1. Two Model Approach: Take difference of two
models, Treatment Minus Control
2. Treatment Dummy Approach: Single combined
model using treatment interactions
3. Four Quadrant Method
12
13. Method 1: Two Model Approach:
Treatment - Control
Model 1 predicts P(R | Treatment)
Model Sample = Treatment Group
Model 2 predicts P(R | no Treatment)
Model Sample = Control Group
Final prediction of lift =
Treatment Response Score – Control Response Score
Pros: simple concept, familiar execution (x2)
Cons: indirectly models uplift, the difference may be only noise, 2x
the work, scales may not be comparable, 2x the error, variable
reduction done on indirect dependent vars
13
14. Method 2: Treatment Dummy Approach, Lo (2002)
1. Estimate both E(Yi|Xi;treatment) and E(Yi|Xi;control) and use a
dummy T to differentiate between treatment and control:
Linear logistic regression:
2. Predict the lift value (treatment minus control) for each individual:
)iTiXδ'iγTiXβ'exp(α1
)iTiXδ'iγTiXβ'exp(α
)iX|iE(YiP
)
i
Xβ'exp(α1
)
i
Xβ'exp(α
)
i
Xδ'
i
Xβ'γexp(α1
)
i
Xδ'
i
Xβ'γexp(α
control|iPtreatment|iP
i
Lift
Pros: simple concept, tests for presence of interaction effects
Cons: multicollinearity issues 14
15. Method 3: Four Quadrant Method
Model predicts probability of being in one
of four categories
Dependent variable outcome (nominal)
= TR, CR, TN, or CN
Model Population = Treatment &
Control groups together
Prediction of lift:
15
Pros: only one model required; more “success cases” to model after
Cons: not that intuitive…
Response
Yes No
Treatment
Yes TR TN
No CR CN
𝒁 𝒙 =
𝟏
𝟐
[
𝑷 𝑻𝑹 𝒙
𝑷 𝑻
+
𝑷 𝑪𝑵 𝒙
𝑷 𝑪
−
𝑷 𝑻𝑵 𝒙
𝑷 𝑻
−
𝑷 𝑪𝑹 𝒙
𝑷 𝑪
]
Lai (2006) generalized by Kane, Lo, Zheng (2014)
16. 16
Gini and Top 15% Gini in Holdout Sample
Source: Kane, Lo, and Zheng (2014)
17. Simulated Example:
Charity Donation
17
80-20% split between treatment and control
Randomly split into training (300K) and holdout (200K)
Predictors available:
Age of donor
Frequency – # times a donation was made in the past
Spent – average $ donation in the past
Recency – year of the last donation
Income
Wealth
18. 18
Holdout Sample Performance
Lift Chart on Simulated Data
Theoretical model: Two logistics for treatment and control
-0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Baseline Two model Lo (2002) Four Quadrant (KLZ) Random
19. 19
0%
20%
40%
60%
80%
100%
0% 20% 40% 60% 80% 100%
Baseline Random
Two Model Approach Treatment Dummy Approach
Four Quadrant (KLZ)
Gains Chart on Simulated Data
Gini Gini 15% Gini repeatability (R^2)
Baseline 5.6420 0.5412 0.7311
Method 1: Two Model approach 6.0384 0.7779 0.7830
Method 2: Lo(2002), Treatment Dummy 6.0353 0.7766 0.7836
Method 3: Four Quadrant Method (or KLZ) 5.9063 0.7484 0.7884
20. Online Merchandise Data
20
From blog.minethatdata.com, with women’s merchandise
online visit as response
50-50% split between treatment and control (43K in total)
Randomly split into training (70%) and holdout (30%)
Predictors available:
• Recency
• Dollar spent last year
• Merchandise purchased last year (men’s, women’s, both)
• Urban, suburban, or rural
• Channel – web, phone, or both for purchase last year
21. -0.06
-0.04
-0.02
0
0.02
0.04
0.06
0.08
0.1
0.12
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Baseline Lo(2002) trt dummy
Two model approach Four Quadrant (KLZ)
Random
Holdout Sample Performance
21
Lift Chart on Email Online Merchandise Data
22. 22
0%
20%
40%
60%
80%
100%
0% 20% 40% 60% 80% 100%
Baseline Random
Two Model Approach Treatment Dummy Approach
Four Quadrant (KLZ)
Gains Chart on Email Online Merchandise Data
Gini Gini15% Ginirepeatability(R^2)
Baseline 1.8556 -0.0240 0.2071
Method1: Two Modelapproach 2.0074 0.0786 0.2941
Method2: Lo(2002),TreatmentDummy 2.4392 0.0431 0.2945
Method3: FourQuadrantMethod(orKLZ) 2.3703 0.2288 0.3290
23. Ideal Conditions for Uplift Modeling
A randomized control group is withheld!
Treatment does not cause all “responses,” i.e.
control response rate > 0
Natural Response is not highly correlated to Lift
Lift Signal-to-Noise ratio (Lift/control rate) is
large enough
23
30. Case II:
Optimization of Multiple
Treatments -
From Predictive Analytics to
Prescriptive Analytics
30
31. 31
A (or B)Improving
Targeting
No Targeting,
Single Treatment: A (or B)
Individual Level
Targeting - Model-based
No Targeting,
Single Best Treatment
for all individuals
Improving Treatment
Best of A and B
A
B
2) Target Selection
4) Optimal Treatment
for Each Individual
3) One Size Fits All
1) Random Targeting
From Random Selection to Optimization
32. 32
Maximize
𝑖=1
𝑛
𝑗=1
𝑚
△ 𝑝𝑖𝑗 𝑥𝑖𝑗
Subject to:
𝑖=1
𝑛
𝑗=1
𝑚
𝑐𝑖𝑗 𝑥𝑖𝑗 ≤ 𝐵, Budget Constraint
𝑗=1
𝑚
𝑥𝑖𝑗 ≤ 1, for 𝑖 = 1, … , 𝑛,
𝑥𝑖𝑗 = 0 or 1, 𝑖 = 1, … , 𝑛; 𝑗 = 1, … , 𝑚.
where △ 𝑝𝑖𝑗 = estimated lift value for individual i and treatment j,
𝑥𝑖𝑗 (decision variable) = 1 if treatment j is assigned to individual i and 0
otherwise; and 𝑐𝑖𝑗= cost of promoting treatment j to i.
Integer Program Formulation
E.g., size of target population = 30 million, # treatment combinations = 10,
then # decision variables = 300 millions, and
total # possible combinations without constraints = 2300,000,000!
33. 33
A Heuristic Algorithm
1. Perform cluster analysis of the m model-based
lift scores in the holdout sample
2. Compute cluster-level lift score for each
treatment, using sample mean differences
3. Apply cluster solution to new data (for a future
marketing program)
4. Solve a linear programming model to optimize
treatment assignment at the cluster-level
Source: Lo and Pachamanova (2015)
34. 34
Maximize
𝑐=1
𝐶
𝑗=1
𝑚
△ 𝑝 𝑐𝑗 𝑥 𝑐𝑗
Subject to:
𝑐=1
𝐶
𝑗=1
𝑚
𝑐𝑗 𝑥 𝑐𝑗 ≤ 𝐵𝑢𝑑𝑔𝑒𝑡, Budget Constraint
𝑗=1
𝑚
𝑥 𝑐𝑗 ≤ 𝑁𝑐, for 𝑐 = 1, … , 𝐶, Cluster Size Constraint,
and
𝑥 𝑐𝑗 ≥ 0, 𝑐 = 1, … , 𝐶; 𝑗 = 1, … , 𝑚,
where 𝑥 𝑐𝑗 = # individuals in cluster c to receive treatment j,
𝑐𝑗 = cost of treatment j for each individual.
Becomes A Much Simpler Optimization Problem
Can be solved by Excel Solver
37. 37
CLU
STER
Cluster
Size in
New
Data
Obs. Lift
in
response:
Men's
Obs. Lift
in
response:
Women's
Cost
per
treatme
nt ($)
Decision
var on
number
of men's
Decision
var on
number
of
women's
Total
number
of
treated
by
cluster
Overa 0.07408 0.0438631 4,180 0.1587 0.0224 1 4,180 - 4,180
2 5,650 0.0652 -0.0055 1 - - -
4 60,220 0.0658 0.0628 1 2,340 - 2,340
5 12,370 0.1290 0.0618 1 12,370 - 12,370
6 8,940 0.0672 0.0760 1 - 8,940 8,940
7 29,240 0.0519 0.0213 1 - - -
8 28,070 0.0868 0.0254 1 28,070 - 28,070
9 4,100 0.2249 0.0239 1 4,100 - 4,100
10 37,080 0.0572 0.0426 1 - - -
Total 189,850 obj value 5,773 680 6,453
cost $51,060 $ 8,940 $60,000
Budget 60,000$
Linear Programming Solution from Excel Solver
38. 38
Stochastic Optimization
Lift estimates can have high degree of uncertainty, stochastic
optimization solutions take the uncertainty into account:
Stochastic Programming
Robust Optimization
Mean Variance Optimization
40. Conclusion
40
• Uplift is a very impactful emerging subfield
• Deserves more R&D
• Extensions are plenty (Lo (2008)):
• Multiple treatments
• Optimization
• Non-randomized experiments
• Direct tracking
• Applications in other fields
• E.g. Potter (2013), Yong (2015)
41. References
Cai, T., Tian, L., Wong, P., and Wei, L.J. (2011), “Analysis of Randomized Comparative Clinical Trial Data for Personalized Treatment,” Biostatistics, 12:2, p.270-282,
Collins, F.S. (2010), The Language of Life: DNA and the Revolution in Personalized Medicine, HarperCollins.
Conrady, S. and Jouffe, L. (2011). “Causal Inference and Direct Effects,” Bayesia and Conrady Applied Science, at http://www.conradyscience.com/index.php/causality
Freedman, D. (2010). Statistical Methods and Causal Inference. Cambridge.
Hamburg, M.A. and Collins, F.S. (2010). “The path to personalized medicine.” The New England Journal of Medicine, 363;4, p.301-304.
Haughton, D. and Haughton, J. (2011). Living Standards Analytics, Springer.
Holland, C. (2005). Breakthrough Business Results with MVT, Wiley.
Kane, K., Lo, V.S.Y., and Zheng, J. (2014) “Mining for the Truly Responsive Customers and Prospects Using True-Lift Modeling: Comparison of New and Existing
Methods.” Journal of Marketing Analytics, v.2, Issue 4, p.218-238.
Lai, Lilly Y.-T. (2006) Influential Marketing: A New Direct Marketing Strategy Addressing the Existence of Voluntary Buyers. Master of Science thesis, Simon Fraser
University School of Computing Science, Burnaby, BC, Canada.
Lo, V.S.Y. (2002) “The True Lift Model – A Novel Data Mining Approach to Response Modeling in Database Marketing.” SIGKDD Explorations 4, Issue 2, p.78-86, at:
http://www.acm.org/sigs/sigkdd/explorations/issues/4-2-2002-12/lo.pdf
Lo, V.S.Y. (2008), “New Opportunities in Marketing Data Mining," in Encyclopedia of Data Warehousing and Mining, Wang (2008) ed., 2nd edition, Idea Group Publishing.
Lo, V.S.Y. and D. Pachamanova (2015), “A Practical Approach to Treatment Optimization While Accounting for Estimation Risk,” Technical Report.
McKinney, R.E. et al. (1998),”A randomized study of combined zidovudine-lamivudine versus didanosine monotherapy in children with sympotomatic therapy-naïve HIV-1
infection,” J. of Pediatrics,133, no.4, p.500-508.
Mehr, I.J. (2000), “Pharmacogenomics and Industry Change,” Applied Clinical Trials, 9, no.5, p.34,36.
Morgan, S.L. and Winship C. (2007). Counterfactuals and Causal Inference. Cambridge University Press.
Pearl, J. (2000), Causality. Cambridge University Press.
Potter, Daniel (2013) Pinpointing the Persuadables: Convincing the Right Voters to Support Barack Obama. Presented at Predictive Analytics World; Oct, Boston, MA;
http://www.predictiveanalyticsworld.com/patimes/pinpointing-the-persuadables-convincing-the-right-voters-to-support-barack-obama/ (available with free subscription).
Radcliffe, N.J. and Surry, P. (1999). “Differential response analysis: modeling true response by isolating the effect of a single action,” Proceedings of Credit Scoring and
Credit Control VI, Credit Research Centre, U. of Edinburgh Management School.
Radcliffe, N.J. (2007). “Using Control Groups to Target on Predicted Lift,” DMA Analytics Annual Journal, Spring, p.14-21.
Robins, J.M. and Hernan, M.A. (2009), “Estimation of the Causal Effects of Time-Varying Exposures,” In Fitzmaurice G., Davidian, M,, Verbeke, G., and Molenberghs, G.
eds. (2009) Longitudinal Data Analysis, Chapman & Hall/CRC, p.553 – 399.
Rosenbaum, P.R. (2002), Observational Studies. Springer.
Rosenbaum, P.R. (2010), Design of Observational Studies. Springer.
Rubin, D.B. (2006), Matched Sampling for Causal Effects. Cambridge University Press.
Rubin, D.B. (2008), “For Objective Causal Inference, Design Trumps Analysis,” The Annals of Applied Statistics, p.808-840.
Rubin, D.B. and Waterman, R.P. (2006), “Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology,” Statistical Science, p.206-222.
Russek-Cohen, E. and Simon, R.M. (1997), “Evaluating treatments when a gender by treatment interaction may exist,” Statistics in Medicine, 16, issue 4, p.455-464.
Signorovitch, J. (2007), “Estimation and Evaluation of Regression for Patient-Specific Efficacy,” Harvard School of Public Health working paper.
Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, 2nd edition, MIT Press.
Wikipedia (2010), “Uplift Modeling,” at http://en.wikipedia.org/wiki/Uplift_modelling
Yong, Florence H. (2015), “Quantitative Methods for Stratified Medicine,” PhD Dissertation, Department of Biostatistics, Harvard T.H. Chan School of Public Health,
41