This document discusses reinforcement learning and its applications to optimization problems in marketing. It begins with definitions of reinforcement learning and multi-armed bandit problems. It then discusses how Bayesian AB testing, multi-armed bandits, and Thompson sampling can be used to solve single decision problems. The document also covers how reinforcement learning handles more complex multi-touchpoint optimization and attribution problems using techniques like Q-learning. It concludes by discussing how reinforcement learning approaches can be used for automation and predictive targeting based on user attributes.
6. WHAT WE WILL TALK ABOUT
• Definition of Reinforcement Learning
  – Trial and Error Learning
    • AB Testing (Bayesian)
    • Multi-Armed Bandit (Automation)
    • Bandit with Targeting
  – Multi-Touch Point Optimization
    • Attribution = Dynamics
    • Q-Learning
11. MARKETING PROBLEMS
Online Applications – websites, mobile, things communicating via HTTP
Low Risk Decisions* – e.g. ‘Which Banner’
High Volume* – not for one-off decisions or decisions that are made infrequently
* High Volume/Low Risk: from http://jtonedm.com/
12. TRIAL AND ERROR LEARNING
AB Testing/Bandit
Sequential Decisions
Targeting
22. BAYESIAN AB TESTING REVIEW
P(Green > Red | Data) = 99.99…%
Sample Size=10,000
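A minimal sketch of how a probability like the one above could be estimated, assuming Beta(1, 1) priors and Monte Carlo sampling from each posterior. The conversion counts are hypothetical (5,000 users per option, 10,000 total) and the function name `prob_a_beats_b` is illustrative, not from the talk.

```python
import random

def prob_a_beats_b(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_A > rate_B | data) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each option is Beta(1 + conversions, 1 + non-conversions).
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += a > b
    return wins / draws

# Hypothetical data: Green converts 600 of 5,000, Red converts 500 of 5,000.
p_green = prob_a_beats_b(conv_a=600, n_a=5000, conv_b=500, n_b=5000)
print(f"P(Green > Red | Data) is roughly {p_green:.4f}")
```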
23. AB TESTING -> LEARN FIRST
Time:
1. Explore/Learn – Data Collection/Sample
2. Exploit/Earn – Apply Learning
24. SINGLE LOCATION DECISIONS/AB TEST
How to Solve:
1. AB Testing
2. Multi-Armed Bandit
[Diagram: Location (Page A) -> Decision (A or B) -> Objective/Payoff (Convert or Don’t Convert)]
25. BANDIT: THOMPSON SAMPLING
Like Bayesian AB Testing
• Calculate P(A|Data) & P(B|Data), the probability that each option is the best one
Unlike AB Testing
• Don’t split selections evenly (50/50)
• Select each option in proportion to P(A|Data) & P(B|Data)
27. ADAPTIVE: THOMPSON SAMPLING
[Chart: distributions for options A, B, and C]
Adaptive – For Each User
1) Take a random sample from each distribution
A=0.49
28. ADAPTIVE: THOMPSON SAMPLING
[Chart: distributions for options A, B, and C]
Adaptive – For Each User
1) Take a random sample from each distribution
A=0.49, B=0.51
29. ADAPTIVE: THOMPSON SAMPLING
[Chart: distributions for options A, B, and C]
Adaptive – For Each User
1) Take a random sample from each distribution
A=0.49, B=0.51, C=0.46
30. ADAPTIVE: THOMPSON SAMPLING
[Chart: distributions for options A, B, and C]
Adaptive – For Each User
2) Pick the option with the highest score (Option B)
A=0.49, B=0.51, C=0.46
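The per-user procedure on slides 27 to 30 (draw one random sample from each option's distribution, then pick the option with the highest draw) can be sketched as follows. This is a minimal sketch assuming binary conversions and Beta(1, 1) priors; the counts in `stats` and the name `thompson_pick` are hypothetical.

```python
import random

def thompson_pick(stats, rng):
    """stats maps option -> (conversions, trials). Returns (choice, samples)."""
    # Step 1: take one random sample from each option's posterior distribution.
    samples = {
        name: rng.betavariate(1 + conv, 1 + trials - conv)
        for name, (conv, trials) in stats.items()
    }
    # Step 2: pick the option whose sampled value is highest.
    choice = max(samples, key=samples.get)
    return choice, samples

rng = random.Random(42)
stats = {"A": (50, 100), "B": (48, 100), "C": (47, 100)}  # hypothetical counts
choice, samples = thompson_pick(stats, rng)
print(choice, {name: round(value, 2) for name, value in samples.items()})
```

Calling `thompson_pick` again for the next user draws fresh samples, so a different option can win, which is what the "Repeat" slides illustrate.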
31. ADAPTIVE: THOMPSON SAMPLING
[Chart: distributions for options A, B, and C]
Adaptive – Repeat
1) Take a random sample from each distribution
32. ADAPTIVE: THOMPSON SAMPLING
[Chart: distributions for options A, B, and C]
Adaptive – Repeat
1) Take a random sample from each distribution
A=0.52
33. ADAPTIVE: THOMPSON SAMPLING
[Chart: distributions for options A, B, and C]
Adaptive – Repeat
1) Take a random sample from each distribution
A=0.52, B=0.43
34. ADAPTIVE: THOMPSON SAMPLING
[Chart: distributions for options A, B, and C]
Adaptive – Repeat
1) Take a random sample from each distribution
A=0.52, B=0.43, C=0.49
36. ADAPTIVE: THOMPSON SAMPLING
Selection Chance based on:
1. Relative estimated mean value of the option
2. Amount of overlap of the distributions
[Bar chart: Selection Chance. Option A 67%, Option B 8%, Option C 25%]
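One way to see how selection chance follows from the means and the overlap of the distributions is to repeat the sample-and-pick step many times and count how often each option wins. The sketch below assumes Beta(1, 1) priors; the counts are hypothetical and are not the data behind the chart on this slide.

```python
import random
from collections import Counter

def selection_chances(stats, draws=10000, seed=0):
    """Estimate how often Thompson sampling would select each option."""
    rng = random.Random(seed)
    picks = Counter()
    for _ in range(draws):
        samples = {
            name: rng.betavariate(1 + conv, 1 + trials - conv)
            for name, (conv, trials) in stats.items()
        }
        picks[max(samples, key=samples.get)] += 1
    return {name: picks[name] / draws for name in stats}

# Hypothetical counts: A has the highest observed rate, B the lowest.
stats = {"A": (52, 100), "B": (43, 100), "C": (49, 100)}
chances = selection_chances(stats)
for name, chance in chances.items():
    print(f"Option {name}: {chance:.0%}")
```

With more data per option the distributions narrow, the overlap shrinks, and the selection chance concentrates on the option with the highest estimated mean.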
59. Q-LEARNING
Analytics Interpretation of Q-Learning
1) Treat landing on the next page like a regular conversion!
60. Q-LEARNING
Analytics Interpretation of Q-Learning
1) Treat landing on the next page like a regular conversion!
2) Use the estimates at the next step as the conversion value!
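The two-step interpretation can be sketched as a standard tabular Q-learning update, in which the "conversion value" credited to an option is any immediate reward plus the best current estimate at the page the user lands on next. This is a minimal sketch; the class name `QTable`, the page and option names, and the learning rate `alpha` and discount `gamma` values are assumptions for illustration.

```python
from collections import defaultdict

class QTable:
    def __init__(self, alpha=0.1, gamma=1.0):
        self.q = defaultdict(float)  # (page, option) -> estimated value
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount applied to future value

    def best_value(self, page, options):
        """Highest current estimate among the options at a page."""
        return max((self.q[(page, o)] for o in options), default=0.0)

    def update(self, page, option, reward, next_page=None, next_options=()):
        # Landing on the next page counts as a conversion whose value is
        # the best estimate currently available at that next page.
        future = self.best_value(next_page, next_options) if next_page else 0.0
        target = reward + self.gamma * future
        key = (page, option)
        self.q[key] += self.alpha * (target - self.q[key])

q = QTable(alpha=0.5)
# A sale (reward 1.0) after showing option B on page 2.
q.update("page2", "B", reward=1.0)
# No immediate reward on page 1, but the user landed on page 2.
q.update("page1", "A", reward=0.0, next_page="page2", next_options=("A", "B"))
print(q.q[("page2", "B")], q.q[("page1", "A")])
```

Option A on page 1 earns credit even though no sale happened there, because it led the user to a page that is currently estimated to be valuable.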
81. Q-LEARNING
Attribution in just two simple steps:
1) Treat landing on the next page like a regular conversion!
2) Use predictions of future values at the next step as the conversion value!
82. Q-LEARNING + TARGETING
User: is a New User and from a Rural area
[Diagram: Page 1, Page 2, option A]
88. Q-VALUE: NEW & RURAL USER
1. For New & Rural users, Option B has the highest value
2. Use the predicted value of Option B in the Q-value calculation
Source: Conductrics Predictive Audience Discovery
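With targeting, the future value used in the Q-value calculation depends on who the user is. The sketch below assumes predicted values are stored per (segment, option) pair at the next page. All numbers, segment names, and function names are hypothetical and are not taken from the Conductrics product.

```python
# Hypothetical predicted values at Page 2, keyed by (segment, option).
predicted = {
    ("new_rural", "A"): 0.10,
    ("new_rural", "B"): 0.30,
    ("returning_urban", "A"): 0.25,
    ("returning_urban", "B"): 0.15,
}

def best_option(predicted, segment, options):
    """Option with the highest predicted value for this user segment."""
    return max(options, key=lambda o: predicted[(segment, o)])

def q_target(reward, predicted, segment, options, gamma=1.0):
    """Immediate reward plus the segment-specific best value at the next step."""
    best = best_option(predicted, segment, options)
    return reward + gamma * predicted[(segment, best)]

options = ("A", "B")
best = best_option(predicted, "new_rural", options)
target = q_target(0.0, predicted, "new_rural", options)
print(best, target)
```

For a different segment the same calculation can credit a different option, so the value attributed to Page 1 decisions varies by audience.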
91. WHAT DID WE LEARN
1) Bandits help solve Automation
2) Attribution can be solved by hacking ‘AB Testing’ (Q-Learning)
3) Extended Attribution to include decisions/experiments
4) Looked into the eye of AI and Lived
92. WAKE UP. WE ARE DONE!
Twitter: mgershoff
Email: matt.gershoff@conductrics.com