Devoxx Real-time Learning

Real-time Learning

©MapR Technologies - Confidential 1

whoami – Ted Dunning

 Chief Application Architect, MapR Technologies
 Committer, member, Apache Software Foundation
– particularly Mahout, Zookeeper and Drill

(we’re hiring)

 Contact me at
tdunning@maprtech.com
tdunning@apache.org
ted.dunning@gmail.com
@ted_dunning


 Slides and such (available late tonight):
– http://www.mapr.com/company/events/devoxx-3-29-2013
 Hash tags: #mapr #devoxxfr


Agenda

 What is real-time learning?
 A sample problem
 Philosophy, statistics and the nature of knowledge
 A solution
 System design


What is Real-time Learning?

 Training data arrives one record at a time

 The system improves a mathematical model based on a small
amount of training data

 We retain at most a fixed amount of state

 Each learning step takes O(1) time and memory
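
As an illustration of these constraints, here is a minimal sketch of a learner that keeps a fixed amount of state and does constant work per record. The class name `RunningMean` and the sample records are assumptions made for this example; the model is just a running average, not the learner used later in the talk.

```python
class RunningMean:
    """Online estimate of a mean using a fixed amount of state."""

    def __init__(self):
        self.n = 0        # number of records seen so far
        self.mean = 0.0   # current estimate

    def update(self, x):
        # Constant-time, constant-memory update; the record can then be discarded.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


model = RunningMean()
for record in [1.0, 0.0, 1.0, 1.0]:   # hypothetical stream, one record at a time
    model.update(record)
```

The state is two numbers no matter how many records arrive, which is the property the bullets above ask for.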


We have a product
to sell …
from a web-site


What picture?
What tag-line?

Bogus Dog Food is the Best!
Now available in handy 1 ton bags!

Buy 5!

What call to action?


The Challenge

 Design decisions affect probability of success
– Cheesy web-sites don’t even sell cheese

 The best designers do better when allowed to fail
– Exploration juices creativity

 But failing is expensive
– If only because we could have succeeded
– But also because offending or disappointing customers is bad


A Quick Diversion

 You see a coin
– What is the probability of heads?
– Could it be larger or smaller than that?
 I flip the coin and while it is in the air ask again
 I catch the coin and ask again
 I look at the coin (and you don’t) and ask again
 Why does the answer change?
– And did it ever have a single value?


A Philosophical Conclusion

 Probability as expressed by humans is subjective and depends on
information and experience


So now you understand
Bayesian probability


Another Quick Diversion

 Let’s play a shell game
 This is a special shell game
 It costs you nothing to play
 The pea has constant probability of being under each shell
(trust me)
 How do you find the best shell?
 How do you find it while maximizing the number of wins?


Pause for short
con-game


Conclusions

 Can you identify winners or losers without trying them out?
No

 Can you ever completely eliminate a shell with a bad streak?
No

 Should you keep trying apparent losers?
Yes, but at a decreasing rate


So now you understand
multi-armed bandits


Is there an optimum
strategy?


Thompson Sampling

 Select each shell according to the probability that it is the best

 Probability that it is the best can be computed using posterior

\[ P(i \text{ is best}) = \int \mathbb{I}\left[ E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \right] P(\theta \mid D) \, d\theta \]
 But I promised a simple answer
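
The integral can be approximated by drawing from the posterior and counting how often each shell comes out on top. The sketch below assumes win/lose rewards and a uniform Beta(1, 1) prior per shell; the helper name `prob_best` and the win/loss counts are made up for illustration.

```python
import random


def prob_best(wins, losses, draws=10000, seed=0):
    """Estimate P(i is best) for each arm by sampling from Beta posteriors."""
    rng = random.Random(seed)
    counts = [0] * len(wins)
    for _ in range(draws):
        # One draw of theta from the posterior: Beta(1 + wins, 1 + losses) per arm.
        theta = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
        # The indicator inside the integral: which arm has the largest expected reward?
        best = max(range(len(theta)), key=theta.__getitem__)
        counts[best] += 1
    return [c / draws for c in counts]


# Hypothetical data: three shells with (wins, losses) observed so far.
probs = prob_best(wins=[2, 5, 1], losses=[8, 5, 9])
```

Each pass through the loop is one posterior sample, so a single pass is already enough to make one decision.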


Thompson Sampling – Take 2

 Sample θ

\[ \theta \sim P(\theta \mid D) \]
 Pick i to maximize reward

\[ i = \arg\max_j E[r_j \mid \theta] \]

 Record result from using i


Nearly Forgotten until Recently

 Citations for Thompson sampling


Bayesian Bandit for the Shells

 Compute distributions based on data so far
 Sample p1, p2 and p3 from these distributions
 Pick shell i where \( i = \arg\max_j p_j \)

 Lemma 1: The probability of picking shell i will match the
probability it is the best shell

 Lemma 2: This is as good as it gets
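
The three steps above can be sketched as a short simulation. This assumes a Beta(1, 1) prior on each shell's win probability; the function name `thompson_shell_game` and the true win probabilities are hypothetical values chosen for the example.

```python
import random


def thompson_shell_game(true_p, rounds=2000, seed=1):
    """Play the shell game with Thompson sampling over Beta posteriors."""
    rng = random.Random(seed)
    wins = [0] * len(true_p)
    losses = [0] * len(true_p)
    for _ in range(rounds):
        # Sample p1, p2, p3 from the current posterior distributions.
        sampled = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
        # Pick the shell whose sampled probability is largest.
        i = max(range(len(sampled)), key=sampled.__getitem__)
        # Play that shell and record the result.
        if rng.random() < true_p[i]:
            wins[i] += 1
        else:
            losses[i] += 1
    return wins, losses


# Hypothetical win probabilities, unknown to the player.
wins, losses = thompson_shell_game([0.2, 0.5, 0.3])
plays = [w + l for w, l in zip(wins, losses)]
```

Printing `plays` shows how the trials are divided among the shells: the weaker shells keep getting an occasional try, but less and less often as evidence accumulates.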


And it works!

[Figure: regret (y-axis, 0 to 0.12) versus number of trials n (x-axis, 0 to 1100) for ε-greedy with ε = 0.05 and for the Bayesian Bandit with Gamma-Normal]


Video Demo


The Basic Idea

 We can encode a distribution by sampling
 Sampling allows unification of exploration and exploitation

 Can be extended to more general response models


The Original Problem

[Figure: the dog food ad, with its three design choices labeled x1, x2 and x3]


Mathematical Statement

 Logistic or probit regression

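\[ P(\text{conversion}) = w\left( \sum_i x_i \theta_i \right) \]

\[ w(x) = \frac{1}{1 + e^{-x}} \quad \text{(logistic)} \]

\[ w(x) = \frac{\operatorname{erf}(x) + 1}{2} \quad \text{(probit)} \]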
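
A small sketch of the two link functions and the resulting conversion probability. The function names and the example design vector and weights are assumptions for illustration only.

```python
import math


def logistic(x):
    """Logistic link: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def erf_link(x):
    """The erf-based link from the slide: (erf(x) + 1) / 2."""
    return (math.erf(x) + 1.0) / 2.0


def p_conversion(x, theta, link=logistic):
    """P(conversion) = w(sum_i x_i * theta_i)."""
    return link(sum(xi * ti for xi, ti in zip(x, theta)))


# Hypothetical design vector and weights.
p = p_conversion([1, 0, 1], [0.4, -1.0, -0.4])
```

Both links map any real score into the interval (0, 1) and are increasing in the score. The textbook probit link is the standard normal CDF, which equals `(erf(x / sqrt(2)) + 1) / 2`; the slide's form differs from it only by a rescaling of x.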


Same Algorithm

 Sample θ

\[ \theta \sim P(\theta \mid D) \]
 Pick design x to maximize reward

\[ x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \sum_i x_i \theta_i \]
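
One step of this loop might look like the sketch below. It assumes the posterior is approximated by an independent Gaussian per weight and that each design variable is a 0/1 switch; both are assumptions for the example, as is the helper name `pick_design`.

```python
import itertools
import random


def pick_design(post_mean, post_std, rng):
    """One Thompson step: sample theta, then search all binary designs."""
    # Sample theta from an assumed independent Gaussian posterior.
    theta = [rng.gauss(m, s) for m, s in zip(post_mean, post_std)]
    # The link function is monotonic, so maximizing the linear score is enough.
    designs = itertools.product([0, 1], repeat=len(theta))
    return max(designs, key=lambda x: sum(xi * ti for xi, ti in zip(x, theta)))


rng = random.Random(0)
# Hypothetical posterior for three design switches x1, x2, x3.
design = pick_design([0.5, -0.2, 0.1], [0.3, 0.3, 0.3], rng)
```

Because theta is re-sampled on every visit, designs whose weights are still uncertain get tried from time to time without any separate exploration rule.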


Context Variables

[Figure: the dog food ad with design choices x1, x2 and x3, shown together with the context variables]

y1 = user.geo, y2 = env.time, y3 = env.day_of_week, y4 = env.weekend


Two Kinds of Variables

 The web-site design - x1, x2, x3
– We can change these
– Different values give different web-site designs

 The environment or context – y1, y2, y3, y4
– We can’t change these
– They can change themselves

 Our model should include interactions between x and y


Same Algorithm, More Greek Letters

 Sample θ, π, φ

\[ (\theta, \Pi, \Phi) \sim P(\theta, \Pi, \Phi \mid D) \]
 Pick design x to maximize reward, y’s are constant

\[ x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \left( \sum_i x_i \theta_i + \sum_{i,j} x_i y_j \pi_{ij} + \sum_i y_i \phi_i \right) \]

 This looks very fancy, but is actually pretty simple
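
To show how simple it is, here is a sketch of the score with interaction terms and the search over designs. The parameter values, the two candidate designs and the function names are all hypothetical; in practice theta, pi and phi would be one sample from the posterior.

```python
def score(x, y, theta, pi, phi):
    """Linear score: sum x_i theta_i + sum x_i y_j pi_ij + sum y_i phi_i."""
    design_term = sum(xi * ti for xi, ti in zip(x, theta))
    interaction = sum(xi * yj * pi[i][j]
                      for i, xi in enumerate(x)
                      for j, yj in enumerate(y))
    context_term = sum(yi * fi for yi, fi in zip(y, phi))
    return design_term + interaction + context_term


def best_design(designs, y, theta, pi, phi):
    # Simple sequential search; y is fixed for the current visitor.
    return max(designs, key=lambda x: score(x, y, theta, pi, phi))


# Hypothetical sampled parameters: 2 design variables, 2 context variables.
theta = [0.2, 0.1]
pi = [[0.0, -0.5], [0.0, 0.6]]
phi = [0.3, -0.1]
designs = [(1, 0), (0, 1)]
first_context = best_design(designs, [1, 0], theta, pi, phi)
second_context = best_design(designs, [1, 1], theta, pi, phi)
```

The y-only term is the same for every design, so it never changes which design wins; the interaction term is what lets the best design differ from one context to another.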


Surprises

 We cannot record a non-conversion until we wait

 We cannot record a conversion until we wait for the same time

 Learning from conversions requires delay

 We don’t have to wait very long
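
One way to handle the delay is to hold each impression for a fixed waiting period and only then emit a training label. The sketch below is an assumption about how that could be organized; the class `ConversionTracker`, its method names and the 60-second window are invented for the example.

```python
from collections import deque


class ConversionTracker:
    """Turns impressions into training labels after a fixed waiting period."""

    def __init__(self, wait_seconds):
        self.wait = wait_seconds
        self.pending = deque()   # (impression_id, shown_at), oldest first
        self.converted = set()   # ids that converted while pending

    def impression(self, impression_id, now):
        self.pending.append((impression_id, now))

    def conversion(self, impression_id):
        self.converted.add(impression_id)

    def ready(self, now):
        """Return (impression_id, label) pairs whose waiting period has ended."""
        out = []
        while self.pending and now - self.pending[0][1] >= self.wait:
            impression_id, _ = self.pending.popleft()
            out.append((impression_id, impression_id in self.converted))
            self.converted.discard(impression_id)
        return out


tracker = ConversionTracker(wait_seconds=60)
tracker.impression("a", now=0)
tracker.impression("b", now=10)
tracker.conversion("b")
early = tracker.ready(now=30)   # nothing has waited long enough yet
late = tracker.ready(now=70)    # both impressions are now labeled
```

Conversions and non-conversions are released after the same wait, so neither outcome reaches the learner earlier than the other. A production version would also need to drop conversions that arrive after their impression has already been labeled.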


Required Steps

 Learn distribution of parameters from data
– Logistic regression or probit regression (can be on-line!)
– Need Bayesian learning algorithm

 Sample from posterior distribution
– Generally included in Bayesian learning algorithm

 Pick design
– Simple sequential search

 Record data
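
The learn and sample steps can be sketched with a deliberately crude on-line learner: an independent Gaussian per weight, updated with one Newton-like step per record. This approximation is swapped in for illustration and is not necessarily the algorithm the talk has in mind; the class name, the prior precision and the simulated conversion rates are all assumptions.

```python
import math
import random


class DiagonalBayesLogistic:
    """On-line logistic regression with an independent Gaussian per weight."""

    def __init__(self, n_features, prior_precision=1.0):
        self.mean = [0.0] * n_features
        self.precision = [prior_precision] * n_features

    def update(self, x, converted):
        # Predicted conversion probability at the current posterior mean.
        z = sum(m * xi for m, xi in zip(self.mean, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for i, xi in enumerate(x):
            # Curvature of the log-likelihood sharpens the posterior.
            self.precision[i] += xi * xi * p * (1.0 - p)
            # The gradient moves the mean, scaled by the remaining uncertainty.
            self.mean[i] += (converted - p) * xi / self.precision[i]

    def sample(self, rng):
        # Draw one plausible weight vector for Thompson sampling.
        return [rng.gauss(m, 1.0 / math.sqrt(q))
                for m, q in zip(self.mean, self.precision)]


rng = random.Random(0)
model = DiagonalBayesLogistic(n_features=2)
for _ in range(500):
    x = [1.0, float(rng.random() < 0.5)]   # bias plus one design switch
    p_true = 0.6 if x[1] else 0.2          # hypothetical true conversion rates
    model.update(x, int(rng.random() < p_true))
theta = model.sample(rng)
```

The state is two numbers per weight and each update touches every weight once, so the learner fits the one-record-at-a-time setting. The sampled `theta` is what the design search would then use.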


Required system
design


Hadoop is Not Very Real-time

[Figure: timeline t up to "now", with regions labeled "Fully processed", "Latest full period" and "Unprocessed data", and the annotation "Hadoop job takes this long for this data"]

Real-time and Long-time together

[Figure: timeline t up to "now", labeled "Blended view", with the annotations "Hadoop works great back here" and "Storm works here"]
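
A blended view can be as simple as adding the batch totals for older data to the real-time totals for recent data. The sketch below assumes both layers expose per-key counts and that the real-time layer only counts records newer than the last completed batch job; the function name, keys and numbers are made up.

```python
def blended_count(batch_counts, realtime_counts, key):
    """Combine batch totals (old data) with real-time totals (recent data)."""
    return batch_counts.get(key, 0) + realtime_counts.get(key, 0)


# Hypothetical totals: the batch layer covers everything up to the last
# completed job, the real-time layer covers only what arrived after that.
batch = {"design-a": 1200, "design-b": 950}
recent = {"design-a": 7, "design-c": 3}
total_a = blended_count(batch, recent, "design-a")
total_c = blended_count(batch, recent, "design-c")
```

The important detail is the cutoff: each record must be counted by exactly one of the two layers.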


Traditional Hadoop Design

 Can use Kafka cluster to queue log lines
 Can use Storm cluster to do real time learning
 Can host web site on NAS
 Can use Flume cluster to import data from Kafka to Hadoop
 Can record long-term history on Hadoop Cluster

 How many clusters?


[Figure: architecture diagram with the components Users, Web Site, Web Service, Design Targeting, NAS, Kafka Cluster, Kafka API, Storm, Flume, Hadoop and HDFS Data]

That is a lot of moving parts!


Alternative Design

 Can host log catcher on MapR via NFS
 Storm can read data directly from queue
 Can host web server directly on cluster

 Only one cluster needed
– Total instance count drops by 3x
– Admin burden massively decreased


[Figure: architecture diagram with the components Users, http, Web-server, Catcher, Storm, Topic Queue, Web Data and MapR]


You can do this
yourself!


Contact Me!

 We’re hiring at MapR in US and Europe

 MapR software available for research use

 Contact me at tdunning@maprtech.com or @ted_dunning

 Share news with @apachemahout

 Tweet #devoxxfr #mapr #mahout @ted_dunning

