Machine Learning in Big Data
- Look forward or be left behind
V. William Porto
Hadoop Summit Dublin 2016
Overview of RedPoint Global
2 RedPoint Global Inc. 2016 Confidential
Launched in 2006
Founded and staffed by industry veterans
Headquarters: Wellesley, Massachusetts
Offices in US, UK, Australia, Philippines
Global customer base
Serves most major industries
Overview of RedPoint Global
MAGIC QUADRANT: Data Quality
MAGIC QUADRANT: Integrated Marketing Management
MAGIC QUADRANT: Multichannel Campaign Management
MAGIC QUADRANT: Digital Marketing Hubs
FORRESTER WAVE™: Cross-channel Campaign Management
FORRESTER WAVE™: Data Quality Solutions
With apologies to Gary Larson
Hadoop
Machine Learning – why bother?
“If you have always done it that way, it is probably wrong” - Charles Kettering
Machine Learning – keeping ahead of the curve
• Three basic tenets for success in today’s world
• Prediction - you need to learn and use what you’ve learned
• Optimization - the world is a dynamic place
• Automation - because people don’t scale well
Machine Learning – what really is it all about?
• Learning vs. instruction
• Humans learn instinctively – computers not so much
• Intelligent Systems
• Memory
• Prediction (modeling)
• Assessment
• Feedback
• Adaptation
Data Modeling – what, why, how
• Regression – what happened in the past
• Prediction – what will happen in the future
“Prediction is very difficult – especially if it’s about the future”
- Niels Bohr
Data Modeling – what, why, how
The wide world of data modeling
• Supervised models
• you have historical data and known correlated outputs (truth)
• Unsupervised models
• historical data, but may not have (or trust) associated outputs
Decision Trees
Major Assumption: the world is discrete
• fast, easy to understand, no linearity assumptions
• ‘human time’ required, unbalanced and/or large trees
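The discrete-world assumption can be illustrated with the smallest possible tree, a single split. The sketch below is only an illustration: the `best_split` function and the toy data are hypothetical and not taken from the talk. It scans candidate thresholds on one numeric feature and keeps the one with the fewest misclassifications.

```python
def best_split(values, labels):
    """Return (threshold, errors) for the rule: predict 1 when value > threshold."""
    best = (None, len(labels) + 1)
    for threshold in sorted(set(values)):
        errors = sum(
            1 for v, y in zip(values, labels) if (1 if v > threshold else 0) != y
        )
        if errors < best[1]:
            best = (threshold, errors)
    return best


# Hypothetical toy data: days since last purchase, and whether the customer left (1).
days = [5, 12, 18, 26, 40, 47, 63, 80]
left = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, errors = best_split(days, left)
print(threshold, errors)  # 26 0
```

A full decision tree repeats this search recursively on each side of the split, which is where the unbalanced or very large trees mentioned above can come from.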
Standard Linear Models
Assumption: the world is linear
• the real world really isn’t linear
• not all errors are equal
• easy to get misleading results
Which line is best?
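One way to see why the question has no single answer: the "best" line depends on how errors are scored. The sketch below uses hypothetical data with one outlier and compares an ordinary least-squares line with a line through the four well-behaved points. The function names and numbers are illustrative assumptions, not material from the talk.

```python
def fit_least_squares(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_error(line, xs, ys, penalty):
    """Sum a per-point penalty over the residuals of a (slope, intercept) line."""
    slope, intercept = line
    return sum(penalty(y - (slope * x + intercept)) for x, y in zip(xs, ys))


# Hypothetical data: four points on y = x plus one outlier.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 2, 3, 14]

ols_line = fit_least_squares(xs, ys)  # (3.0, -2.0)
diagonal = (1.0, 0.0)                 # passes through the four non-outlier points

squared = lambda r: r * r
absolute = abs

print(total_error(ols_line, xs, ys, squared), total_error(diagonal, xs, ys, squared))    # 40.0 100.0
print(total_error(ols_line, xs, ys, absolute), total_error(diagonal, xs, ys, absolute))  # 12.0 10.0
```

Under squared error the least-squares line wins, while under absolute error the diagonal wins. The choice of penalty decides which line is "best".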
Generalized ‘Non-Linear’ Models
Assumptions
• underlying functional mapping is known
• all errors are equal
• data is ‘well-conditioned’
• ‘standard’ error distribution
• Polynomials
• Exponentials (e.g., Gaussian, Poisson)
• Piece-wise linear
Non-Linear Models
Assumption: data is representative
• ‘universal’ modeling tools
• fast execution
• no linearity assumptions
• lots of parameters, many techniques
• difficult to explain
Artificial Neural Network
User Story: Predict Retention / Attrition
Historical Behavioral Data
Columns: Customer Rating | Retention | Customer Name | Loyalty Member | Days Since Last Purchase | Immediate Relatives | Household Children | Customer ID | Latest Purchase Price | Latest Purchase Item ID | Region Code | Customer Capture Method | Customer Contact Code | Domicile
1 1 Allen, Geraldine yes 29 0 2 24160 211.39 B5 MW 2 6 St Louis, MO
1 1 Anderson, Harry no 48 0 3 19952 26.55 E12 NE 3 New York, NY
1 1 Andrews, Cynthia yes 63 1 0 13502 77.95 D7 NE 10 6 Hudson, NY
1 0 Andrews, Thomas Jr no 39 0 0 112050 0 A36 SW Los Angeles, CA
1 1 Appleton, Mary yes 53 2 3 11769 51.49 C101 NE D Bayside, Queens, NY
1 0 Ashbury, Jeffrey no 47 1 0 PC 17757 29.99 C62 C64 NE 124 New York, NY
1 1 Aston, Mrs. yes 18 1 0 PC 17757 29.99 C62 C64 NE 4 New York, NY
1 1 Barber, Ellen yes 26 0 2 19877 78.85 S 6
1 1 Barkley, Henry no 80 0 0 27042 30 A23 NE B Yorktown, PA
1 0 Baumann, David no 0 0 PC 17318 25.99 NE New York, NY
1 1 Bazzeno, Alice yes 32 0 1 11813 76.95 D15 C 8 34
1 0 Beattie, Mr. Samuel no 36 0 0 13050 75.29 C6 C A 11 Winnipeg, MN
1 1 Beckworth, June yes 47 1 1 11751 52.49 D35 NE 5 New York, NY
1 1 Behr, John no 26 0 0 111369 30 C148 NE 5 New York, NY
1 1 Biden, Roseanne yes 42 0 0 PC 17757 127.99 C 4
1 1 Bird, Ellen yes 29 0 0 PC 17483 18.95 C97 S 8
1 0 Birnbaum, Jason no 25 0 0 13905 26 C 148 San Francisco, CA
User Story: Predict Customer Retention / Attrition
Machine Learning Processing Chain - Training
User Story: Predict Retention / Attrition
Machine Learning Processing Chain - Prediction
• Reward predicted ‘retainees’ with targeted product offerings
• Give potential attrition customers special incentives to stay with the business
Clustering/Segmentation – group think
Collaborative Filtering
Relationship Matrix
Personalization – not really
!=
Clustering/Segmentation
Similarity?
Customer | Browser | Gender | Age Sector | Income Sector | Married | Children | Homeowner | Recent Baby Clothes Purchase
George | IE9 | M | 0 | A | N | 0 | 1 | N
Carol | Chrome | F | 1 | B | Y | 1 | 0 | Y
Mary | IE9 | F | 0 | A | N | 1 | 0 | Y
Dist(George,Carol) = 8
Dist(George,Mary) = 4
Dist(Carol,Mary) = 4
Can you afford to target (George, Mary) the same way as (Carol, Mary)?
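The three distances above are consistent with a simple count of differing attributes (a Hamming-style distance over the eight non-name columns). That reading is an assumption, since the slide does not name its metric, but it reproduces the numbers. The sketch also lists which attributes differ, which is where the two equal distances part ways.

```python
FIELDS = [
    "Browser", "Gender", "Age Sector", "Income Sector",
    "Married", "Children", "Homeowner", "Recent Baby Clothes Purchase",
]

# Attribute values as given in the customer table.
customers = {
    "George": ("IE9", "M", 0, "A", "N", 0, 1, "N"),
    "Carol": ("Chrome", "F", 1, "B", "Y", 1, 0, "Y"),
    "Mary": ("IE9", "F", 0, "A", "N", 1, 0, "Y"),
}


def differing(a, b):
    """Names of the attributes on which customers a and b disagree."""
    return [f for f, x, y in zip(FIELDS, customers[a], customers[b]) if x != y]


def dist(a, b):
    """Assumed metric: the number of differing attributes."""
    return len(differing(a, b))


print(dist("George", "Carol"), dist("George", "Mary"), dist("Carol", "Mary"))  # 8 4 4
print(differing("George", "Mary"))  # Gender, Children, Homeowner, Recent Baby Clothes Purchase
print(differing("Carol", "Mary"))   # Browser, Age Sector, Income Sector, Married
```

Mary is equally far from George and from Carol, yet the two pairs differ on completely disjoint sets of attributes, so a single distance value hides what actually separates them.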
Clustering/Segmentation
Basic Question – which one describes the data the best?
How many clusters are there?
[Figure panels: Raw data, Two Clusters, Four Clusters, Six Clusters]
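A common way to probe this question is to fit several cluster counts and compare how tightly each one groups the data. The sketch below is a plain k-means on hypothetical one-dimensional data. The algorithm choice, the deterministic initialisation and the numbers are all assumptions for illustration, not the method used in the talk.

```python
def kmeans_1d(points, k, iterations=20):
    """Plain k-means on 1-D data; returns (centers, within-cluster sum of squares)."""
    ordered = sorted(points)
    n = len(ordered)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            groups[nearest].append(p)
        # Keep the old center when a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, inertia


# Hypothetical data with three visible groups.
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 9.8, 10.0, 10.2]

inertias = {k: kmeans_1d(data, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

The within-cluster spread keeps shrinking as clusters are added, so it cannot choose the count by itself. What matters is where the improvement flattens out, which in this toy data is after three clusters.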
Clustering/Segmentation – data driven
• let the data speak for itself
• multiple data projection ‘views’
• important boundary relationships
(“swing voters”)
Customer Demographics
User Story: Clustering / Segmentation
ML Clustering - Training
ML Clustering – Processing New Data
Model Selection – how to choose?
• Basic Model Type (prediction or segmentation)
• inputs + correlated outputs
• inputs only?
• Basic Questions:
• what to use for my problem?
• parameters?
• is this the best choice?
• could I do better, and how?
Optimization – Evolving better solutions
• Simulated Evolution
• fast, efficient search
• always have a solution
• arbitrary ‘evaluation’ functions
• can start with existing solution(s)
• Variation – alter model type, parameters
• Assessment – how well does the model work?
• Selection – survival of the fittest
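The variation, assessment and selection steps can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the talk's implementation: it uses a (mu + lambda)-style scheme in which parents compete with their mutated children, and the `evolve` and `score` names, step size and target values are hypothetical.

```python
import random


def evolve(evaluate, initial, generations=30, offspring=10, step=0.5, seed=0):
    """Evolve parameter vectors; returns (best solution, best score per generation)."""
    rng = random.Random(seed)
    population = [list(p) for p in initial]  # can start from existing solution(s)
    history = []
    for _generation in range(generations):
        # Variation: mutate every parent with Gaussian noise.
        children = [
            [gene + rng.gauss(0.0, step) for gene in parent]
            for parent in population
            for _child in range(offspring)
        ]
        # Assessment: score parents and children with the same function.
        ranked = sorted(population + children, key=evaluate, reverse=True)
        # Selection: only the fittest survive into the next generation.
        population = ranked[: len(initial)]
        history.append(evaluate(population[0]))
    return population[0], history


def score(params):
    # Arbitrary evaluation function, not differentiable at its peak (3, -1).
    return -abs(params[0] - 3.0) - abs(params[1] + 1.0)


best, history = evolve(score, initial=[(0.0, 0.0)])
print(best, history[-1])
```

Because parents stay in the pool, the best score never gets worse from one generation to the next, so a usable solution is available at every step.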
Evolutionary Optimization – Evaluation Function
• can use any measurable data
• no continuity assumptions
• no differentiability assumptions
• no symmetry assumptions
Example evaluation values (axes: Prediction, Reality (Truth)):
          | Sunshine | Hurricane
Sunshine  | 20       | -1000
Hurricane | 5        | 50
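An asymmetric evaluation function like the sunshine/hurricane one can rank models very differently from plain accuracy. The sketch below scores two hypothetical forecasters against the four values from the slide. It assumes the first label is the prediction and the second is reality. That orientation is a guess, because the slide's axis assignment is not recoverable from the text, and the ten-day weather history is invented for illustration.

```python
# ASSUMPTION: keys are (prediction, reality). The slide's axis assignment
# did not survive, so this orientation is a guess.
PAYOFF = {
    ("sunshine", "sunshine"): 20,
    ("sunshine", "hurricane"): -1000,
    ("hurricane", "sunshine"): 5,
    ("hurricane", "hurricane"): 50,
}


def evaluate(predictions, reality):
    """Total payoff of a sequence of predictions against what actually happened."""
    return sum(PAYOFF[(p, r)] for p, r in zip(predictions, reality))


def accuracy(predictions, reality):
    """Fraction of predictions that matched reality."""
    return sum(p == r for p, r in zip(predictions, reality)) / len(reality)


# Hypothetical history: nine sunny days and one hurricane.
reality = ["sunshine"] * 9 + ["hurricane"]
always_sunshine = ["sunshine"] * 10
always_hurricane = ["hurricane"] * 10

print(accuracy(always_sunshine, reality), evaluate(always_sunshine, reality))    # 0.9 -820
print(accuracy(always_hurricane, reality), evaluate(always_hurricane, reality))  # 0.1 95
```

Under these assumptions the forecaster that is right 90% of the time scores far worse than the one that is right 10% of the time, because a single missed hurricane outweighs every correct sunny call.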
User Story: Optimizing Classification Models
Task: Predict Retention/Attrition
[Chart: Model Performance Optimization; x-axis: Generation, y-axis: Performance (0 to 100)]
Classification Accuracy by successive generation: 62.00, 70.2, 72.3, 73.4, 75.2
Test Set Error (RMS) by successive generation: 34.8, 28.8, 24.5, 22.1, 20.9
17 Potential input features (customer demographics)
2 outputs (retention/attrition)
1300 Training Samples (50 – 50, A / B Split)
1300 Test Samples (naïve test data)
Use Case – Fully Adaptive Feedback (Next Best Offer)
• DB: Historical User Behavior (stimulus/response)
• Train / Update Model
• Non-Adaptive (Fixed) Mode: Randomized A/B/C Offer Selection
• Adaptive ML Mode: ML Prediction Offer Selection
• Operation (Trigger)
• Ad / Offer (stimulus)
• Feedback Cycle
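The two modes and the feedback cycle can be sketched in a few lines. This is a simplified stand-in, not the system from the talk: a smoothed acceptance rate per offer takes the place of a trained ML model, and the `OfferSelector` class and the response counts are hypothetical.

```python
import random


class OfferSelector:
    """Chooses an offer in fixed (random) or adaptive mode and learns from feedback."""

    def __init__(self, offers, seed=0):
        self.offers = list(offers)
        self.shown = {offer: 0 for offer in self.offers}
        self.accepted = {offer: 0 for offer in self.offers}
        self.rng = random.Random(seed)

    def rate(self, offer):
        # Smoothed acceptance rate, so an offer never shown starts at 0.5.
        return (self.accepted[offer] + 1) / (self.shown[offer] + 2)

    def select(self, adaptive):
        if not adaptive:
            return self.rng.choice(self.offers)    # non-adaptive: randomized A/B/C
        return max(self.offers, key=self.rate)     # adaptive: best predicted response

    def feedback(self, offer, was_accepted):
        # Feedback cycle: every observed response updates the statistics.
        self.shown[offer] += 1
        self.accepted[offer] += int(was_accepted)


selector = OfferSelector(["A", "B", "C"])

# Hypothetical history: ten impressions per offer with 2, 7 and 4 acceptances.
for offer, wins in [("A", 2), ("B", 7), ("C", 4)]:
    for i in range(10):
        selector.feedback(offer, i < wins)

first_choice = selector.select(adaptive=True)   # "B"

# Behavior shifts: offer B is now rejected twenty times in a row.
for _ in range(20):
    selector.feedback("B", False)

second_choice = selector.select(adaptive=True)  # "C"
print(first_choice, second_choice)
```

The adaptive mode follows the change in behavior without anyone retuning it, while the fixed mode would keep showing all three offers at random.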
Five Keys to Successful Machine Learning
• Let the data speak for itself – don’t force fit your models
• Remember, not all errors are equal – use this to your advantage
• True learning requires continual adaptation!
• Automate the process with feedback – remove the “man-in-the-loop”
• Trust the optimization process – it really works!
Q&A
Contact Info
Visit: www.redpoint.net
Bill Porto
Sr. Engineering Analyst
RedPoint Global Inc.
vwporto@redpoint.net
Want More Information about this topic?
Fill out your card or go to redpoint.net/hadoopeurope