This document discusses building machine learning infrastructure to move analytics from the lab to production. It describes shifting from ad-hoc, question-driven analytics in the lab to automated, metric-driven systems in production. The author discusses Oryx, a platform for building and serving machine learning models at scale. Oryx allows for batch model building with MapReduce and real-time scoring. The document also introduces Gertrude, a platform for running controlled experiments to test multiple parameters and explore the model space.
5. A Shift In Perspective
Analytics in the Lab
• Question-driven
• Interactive
• Ad-hoc, post-hoc
• Fixed data
• Focus on speed and flexibility
• Output is embedded into a report or in-database scoring engine
Analytics in the Factory
• Metric-driven
• Automated
• Systematic
• Fluid data
• Focus on transparency and reliability
• Output is a production system that makes customer-facing decisions
8. Oryx: Model Building and Serving
• Algorithms
  • ALS Recommenders
  • K-Means Parallel
  • RDF
• Batch model building via MapReduce*
• Server for real-time scoring and updates
• PMML 4.1 Models
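The split between batch model building and real-time scoring can be illustrated with a toy example. This is only a sketch and not Oryx's API: `build_model` and `score` are hypothetical names, the batch step runs plain Lloyd's k-means in memory rather than a parallel variant on MapReduce, and the initialization is deliberately naive.

```python
import math

def score(centroids, point):
    """Serving step: return the index of the nearest centroid."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(centroids[i], point))

def build_model(points, k, iterations=10):
    """Batch step: plain Lloyd's k-means over the full data set.

    Assumption: the first k points seed the centroids. A real system
    would use a better initialization and a distributed job.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[score(centroids, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # leave an empty cluster's centroid where it was
                centroids[i] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return centroids

# Batch: build once from historical data.
history = [(0, 0), (0, 1), (10, 10), (10, 11)]
model = build_model(history, k=2)

# Serving: score each incoming point against the published model.
cluster_for_new_point = score(model, (9, 9))
```

The model here is just a list of centroids, so the serving side needs nothing from the batch side except that list. That is the role a portable model format such as PMML plays on the slide.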
15. Simple Conditional Logic
• Declare experiment flags in compiled code
  • Settings that can vary per request
• Create a config file that contains simple rules for calculating flag values and rules for experiment diversion
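A minimal sketch of the idea on this slide, assuming a much simpler rule format than a real experiment framework would use. Everything named here (`Experiment`, `FLAG_DEFAULTS`, `CONFIG`, `bucket`, `flag_values`, and the flag names) is hypothetical and is not Gertrude's API. Flags are declared in code with defaults; the rules, which would normally be parsed from a config file, decide which requests are diverted into an experiment and which flag values they see.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Experiment:
    name: str
    salt: str        # keeps diversion independent across experiments
    percent: int     # share of requests diverted, 0..100
    overrides: dict  # flag name -> value for diverted requests

# Flags declared in code, with their default values (hypothetical names).
FLAG_DEFAULTS = {"num_recs": 10, "use_new_ranker": False}

# Stand-in for rules parsed from a config file.
CONFIG = [
    Experiment("more_recs", salt="a1", percent=50,
               overrides={"num_recs": 20}),
    Experiment("ranker_v2", salt="b2", percent=0,
               overrides={"use_new_ranker": True}),
]

def bucket(request_id, salt):
    """Map a request to a stable bucket in 0..99 for one experiment."""
    digest = hashlib.sha256(f"{salt}:{request_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def flag_values(request_id, experiments=CONFIG, defaults=FLAG_DEFAULTS):
    """Compute per-request flag values: defaults plus experiment overrides."""
    values = dict(defaults)
    for exp in experiments:
        if bucket(request_id, exp.salt) < exp.percent:
            for name, value in exp.overrides.items():
                if name not in defaults:
                    raise KeyError(f"{exp.name} sets undeclared flag {name!r}")
                values[name] = value
    return values

flags = flag_values("request-123")
```

Hashing the request identifier with a per-experiment salt keeps diversion deterministic, so the same request always sees the same flag values, while different experiments divert independently of each other.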
16. Separate Data Push from Code Push
• Validate config files and push updates to servers
  • Zookeeper via Curator
  • File-based
• Servers pick up new configs, load them, and update experiment space and flag value calculations
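The file-based variant of this flow might look like the sketch below. It assumes a JSON config with an `experiments` list. The names `validate`, `push`, and `ConfigStore` are hypothetical, and the ZooKeeper path is not shown. The pusher validates before writing and swaps the file in atomically; the server reloads only when the file changes and keeps its last good config if a reload fails.

```python
import json
import os
import tempfile
import threading

def validate(config):
    """Reject malformed configs before they reach the serving path."""
    if not isinstance(config, dict):
        raise ValueError("config must be an object")
    experiments = config.get("experiments")
    if not isinstance(experiments, list):
        raise ValueError("'experiments' must be a list")
    for exp in experiments:
        if not isinstance(exp, dict) or not isinstance(exp.get("name"), str):
            raise ValueError("each experiment needs a string 'name'")
        percent = exp.get("percent")
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"bad percent in {exp['name']!r}")
    return config

def push(path, config):
    """Validate, then replace the config file atomically."""
    validate(config)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(config, f)
    os.replace(tmp, path)  # readers see the old file or the new one

class ConfigStore:
    """Server side: holds the live config and reloads it on change."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._signature = None
        self._config = {"experiments": []}

    def current(self):
        with self._lock:
            return self._config

    def refresh(self):
        """Reload if the file changed; keep the old config on any error."""
        try:
            st = os.stat(self._path)
            signature = (st.st_mtime_ns, st.st_ino, st.st_size)
            if signature == self._signature:
                return False
            with open(self._path, encoding="utf-8") as f:
                new_config = validate(json.load(f))
        except (OSError, ValueError):
            return False
        with self._lock:
            self._config, self._signature = new_config, signature
        return True
```

A server would call `refresh()` on a timer or from a file watcher. Because the config travels separately from the binary, changing an experiment's traffic share does not require a code deployment.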
18. A Few Links I Love
• http://research.google.com/pubs/pub36500.html
  • The original paper on the overlapping experiments infrastructure at Google
• http://www.exp-platform.com/
  • Collection of all of Microsoft’s papers and presentations on their experimentation platform
• http://www.deaneckles.com/blog/596_lossy-betterthan-lossless-in-online-bootstrapping/
  • Dean Eckles on his paper about bootstrapped confidence intervals with multiple dependencies