How ML can improve purchase conversions
- Talk + Case study by Sudeep Shukla

A bit about me
● Product Manager at SignEasy
● Earlier Data Analyst at SignEasy
● Business Analyst at Tracxn
● IIT Guwahati ’14 alum

What is Machine Learning?
Machine learning is a set of generic algorithms that let computers learn what
to do from data instead of being explicitly told. These algorithms learn from
the data they are given and can tell you something about that data without
programmers having to write custom code for each problem.

What is Machine Learning?
“Ability to learn from data”
“Automatically learn and improve from experience”
“Use historical data to make better business decisions”
“Discover patterns in data, and construct mathematical models and
predictions using these discoveries”

Formally speaking
“A computer program is said to learn from experience E with respect to
some task T and some performance measure P, if its performance on T,
as measured by P, improves with experience E.” — Tom Mitchell,
Carnegie Mellon University

Old school vs new school
Traditional programming

Old school vs new school
Machine learning
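The contrast between these two slides can be sketched in a few lines of Python. In traditional programming a person writes the rule; in machine learning the rule (here just a single cutoff) is derived from labeled examples. The data, the cutoff of 10 and the function names are hypothetical illustrations, not part of the case study.

```python
# Traditional programming: a programmer writes the rule explicitly.
def is_likely_buyer_rule(app_launches: int) -> bool:
    return app_launches >= 10  # hypothetical hand-picked cutoff


# Machine learning (in miniature): the cutoff is learned from labeled examples.
def learn_threshold(examples: list[tuple[int, bool]]) -> int:
    """Return the cutoff that classifies the most training examples correctly."""
    best_cutoff, best_correct = None, -1
    for cutoff in sorted({x for x, _ in examples}):
        correct = sum((x >= cutoff) == label for x, label in examples)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


# Hypothetical history: (number of app launches, did the user buy?)
history = [(1, False), (2, False), (3, False), (5, True), (6, False), (8, True), (9, True)]
learned_cutoff = learn_threshold(history)
print(learned_cutoff)  # 5
```

Changing the examples changes the learned cutoff without anyone editing the code, which is the point of the "new school" approach.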

Classic Machine Learning Pipeline

Classic ML Pipeline
Key components:
● Train Set: Data that is fed into the model for training. It contains the value
of the prediction variable.
● Validation Set: Usually some part of the Train Set (~20%) is kept aside and
used for validation.
● Test Set: New data that is used to test the model. Its prediction variable is
not given to the model, which has to predict it.
● Features: Data points (input variables) the model uses to make its prediction.
● Feature Engineering: Coming up with new, smart “features” based on
existing ones.
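A minimal Python sketch of these components, assuming a hypothetical list of user rows: it holds out ~20% of the labeled data for validation, engineers one new feature, and keeps a test row without the prediction variable. Field names such as `signs`, `imports` and `total_actions` are made up for illustration.

```python
import random

# Hypothetical labeled rows: two features plus the prediction variable "purchased".
rows = [
    {"user_id": i, "signs": i % 7, "imports": i % 3, "purchased": i % 10 == 0}
    for i in range(100)
]

# Test set: new rows without the prediction variable; the model must predict it.
test = [{"user_id": 100, "signs": 4, "imports": 2}]


def train_validation_split(data, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the data and hold out a fraction of it for validation."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


train, validation = train_validation_split(rows)

# Feature engineering: derive a new feature from existing ones (for every set).
for row in rows + test:
    row["total_actions"] = row["signs"] + row["imports"]

print(len(train), len(validation))  # 80 20
```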

Types of problems & applications of ML

6 pack of problems
Machine learning problems can be grouped into common types. The following
six groups cover most of the problems that machine learning is used to solve:
1. Classification
2. Regression
3. Recommendation
4. Ranking
5. Clustering
6. Anomaly detection

Classification
Classification problems involve figuring out what kind of a thing something is.
In these problems, data is labeled, meaning it is assigned a class or type.
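One simple way to solve a classification problem is a nearest-centroid classifier: average the labeled examples of each class, then assign a new item to the closest average. This is an illustrative sketch with hypothetical data and labels (`free`, `paid`), not the model used in the case study.

```python
from math import dist


def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) of each class label."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled data: [signatures made, imports] -> plan type
X = [[1, 0], [2, 1], [1, 1], [9, 5], [8, 6], [10, 4]]
y = ["free", "free", "free", "paid", "paid", "paid"]
centroids = fit_centroids(X, y)
print(predict(centroids, [7, 5]))  # paid
```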

Regression
In regression problems, you are trying to predict a numerical value of a thing.
Data is labeled with a real value.
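A minimal regression sketch, assuming a single numeric feature: ordinary least squares fits a straight line to the labeled examples and then predicts a real value for a new input. The data is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Hypothetical data: documents signed per month vs. minutes spent in the app.
documents = [1, 2, 3, 4, 5]
minutes = [12, 19, 31, 42, 49]
slope, intercept = fit_line(documents, minutes)
print(round(slope * 6 + intercept, 1))  # predicted minutes for 6 documents: 59.7
```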

Recommendation
With recommendation algorithms, you suggest to users the things they will be
most interested in. You apply recommender systems in scenarios where many
users interact with many items, and the system uses those interactions to
predict what each user will like.
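A minimal sketch of one common approach, user-based collaborative filtering: score each item a user hasn't used yet by the ratings of other users, weighted by how similar (cosine similarity) those users are. The users, items and ratings are hypothetical.

```python
from math import sqrt

# Hypothetical user -> {item: rating} interactions.
ratings = {
    "alice": {"templates": 5, "bulk_send": 4},
    "bob": {"templates": 4, "bulk_send": 5, "audit_trail": 4},
    "carol": {"reminders": 5, "bulk_send": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, top_n=1):
    """Rank unseen items by similarity-weighted ratings from other users."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("alice", ratings, top_n=2))  # ['audit_trail', 'reminders']
```

Bob's ratings resemble Alice's much more than Carol's do, so Bob's item ranks first.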

Ranking
With ranking problems, you help users find the most relevant thing from a
large set of possibilities.
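Ranking boils down to scoring every candidate and sorting by that score. In this sketch the score is a hand-written count of matching query words, an assumption for illustration; in a machine learning setting the scoring function would be learned from data. The articles and query are hypothetical.

```python
def rank(query, documents):
    """Return documents ordered by how many query words they contain, best first."""
    query_words = set(query.lower().split())

    def score(document):
        return len(query_words & set(document.lower().split()))

    # sorted() is stable, so documents with equal scores keep their original order.
    return sorted(documents, key=score, reverse=True)


# Hypothetical help-center articles.
articles = [
    "How to sign a PDF",
    "Pricing plans and upgrade options",
    "Upgrade to a paid plan to sign in bulk",
]
for article in rank("upgrade paid plan", articles):
    print(article)
```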

Clustering
With clustering problems, you divide the given data into groups based on
similarity and other measures of natural structure in the data.
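A minimal sketch of k-means, one common clustering algorithm: no labels are used; each point is assigned to its nearest centroid and each centroid then moves to the mean of its points. The naive initialization (first k points), the fixed iteration count and the sample points are simplifying assumptions.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(coords) / len(members) for coords in zip(*members)]
    return clusters, centroids


points = [(1, 1), (9, 9), (1.5, 2), (8, 8.5), (2, 1.5), (9.5, 8)]
clusters, centroids = k_means(points, k=2)
print(clusters)
```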

Anomaly detection
With anomaly detection, you are trying to identify unusual patterns and
uncommon data points, called outliers, that do not conform to expected behavior.
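A simple statistical sketch of anomaly detection: flag values that lie more than a chosen number of standard deviations (a z-score) from the mean. The threshold of 2 and the data are hypothetical; production systems often use more robust methods.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return values whose z-score (distance from the mean in std devs) exceeds threshold."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical minutes between registration and first purchase.
minutes_to_purchase = [2900, 3100, 2800, 3300, 3000, 2950, 3050, 5]
print(find_outliers(minutes_to_purchase))  # [5]
```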

Basic ML Models
● Decision Tree
● Random Forest
Other models:
● Naive Bayes, Logistic Regression, SVM, Neural Network, etc.

Decision Trees
A decision tree is a decision support tool that uses a tree-like graph or
model of decisions and their possible consequences, including
chance-event outcomes, resource costs, and utility.
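Learning a decision tree comes down to repeatedly picking the feature and threshold that best separate the labels. The sketch below finds only the first, best split (a one-level tree, or "stump") using Gini impurity, one common splitting criterion. The features and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of 0/1 labels: 0 means the group is pure."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # fraction of label 1
    return 1.0 - p * p - (1 - p) * (1 - p)


def best_split(X, y):
    """Return (feature_index, threshold) minimizing the weighted Gini of both children."""
    best, best_score = None, float("inf")
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


# Hypothetical data: [app launches, visited pricing page (0/1)] -> purchased (0/1)
X = [[2, 0], [3, 0], [10, 1], [12, 1], [4, 1], [11, 0]]
y = [0, 0, 1, 1, 0, 1]
print(best_split(X, y))  # (0, 4): split on app launches <= 4
```

A full tree applies the same search recursively to each side until the groups are pure or a depth limit is reached.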

Random Forest
To say it in simple words: a random forest builds multiple decision trees, each
on a random sample of the data, and merges their predictions (by majority vote
or averaging) to get a more accurate and stable prediction.
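A heavily simplified random forest sketch: each "tree" is a one-split stump trained on a bootstrap sample (rows drawn with replacement) using a random subset of features, and the forest's output is the share of trees voting for a purchase. Libraries such as scikit-learn's `RandomForestClassifier` grow full trees and choose random features at every split, so treat this only as an illustration; the data is hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1 - p) * (1 - p)


def majority(labels):
    """Most common label, defaulting to 0 for an empty side."""
    return Counter(labels).most_common(1)[0][0] if labels else 0


def fit_stump(X, y, features):
    """Best single split among `features`: (feature, threshold, left_label, right_label)."""
    best, best_score = None, float("inf")
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if score < best_score:
                best, best_score = (f, t, majority(left), majority(right)), score
    return best


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample using a random subset of features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        features = rng.sample(range(n_features), max(1, n_features // 2))
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest


def predict_proba(forest, row):
    """Share of stumps voting 1 ("will purchase") for this row."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


# Hypothetical data: [app launches, visited pricing page (0/1)] -> purchased (0/1)
X = [[2, 0], [3, 0], [10, 1], [12, 1], [4, 1], [11, 0], [1, 0], [9, 1]]
y = [0, 0, 1, 1, 0, 1, 0, 1]
forest = fit_forest(X, y)
print(predict_proba(forest, [12, 1]), predict_proba(forest, [1, 0]))
```

Averaging many trees, each seeing a different sample, smooths out the mistakes any single tree makes.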

Case Study: Using ML to identify paying customers before they subscribe

Inspiration
Inspired by Strong Analytics’ article:
https://www.strong.io/blog/predicting-customer-behavior-machine-learning-to-identify-paying-customers

Data Gathering
Analytics Tool, Database, Marketing Tool

Data Points
# of app launches (Mixpanel)
# of signs
# of imports
# of RS initiated
Visited pricing page or not (Mixpanel)
Tapped on pricing page or not (Mixpanel)
Registration source (Social or Email)
Generic email domain or not (Feature Engineering)
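The engineered feature at the end of this list could be computed along these lines. This is a sketch: the set of generic domains, the raw field names and the `build_features` helper are hypothetical, not SignEasy's actual schema.

```python
# Hypothetical list of free/consumer email providers.
GENERIC_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}


def is_generic_email(email: str) -> bool:
    """Engineered feature: True if the address uses a consumer email provider."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return domain in GENERIC_DOMAINS


def build_features(user: dict) -> dict:
    """Turn a raw user record into numeric model features (field names are hypothetical)."""
    return {
        "app_launches": user["app_launches"],
        "signs": user["signs"],
        "visited_pricing": int(user["visited_pricing"]),
        "social_signup": int(user["registration_source"] == "social"),
        "generic_email": int(is_generic_email(user["email"])),
    }


user = {"app_launches": 7, "signs": 2, "visited_pricing": True,
        "registration_source": "social", "email": "jane@Gmail.com"}
print(build_features(user))
```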

Data Cleaning
● Remove NULL values
● Make sure the values in each field are of the intended type (number or string)
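A minimal sketch of both cleaning steps, assuming rows arrive as Python dicts: rows with missing values are dropped, and numeric fields are converted to `int`, dropping rows whose values cannot be parsed. Field names and sample rows are hypothetical.

```python
def clean(rows, numeric_fields):
    """Drop rows with NULL/empty values and coerce numeric fields to int."""
    cleaned = []
    for row in rows:
        if any(value is None or value == "" for value in row.values()):
            continue  # remove NULL / empty values
        try:
            fixed = {**row, **{f: int(row[f]) for f in numeric_fields}}
        except (TypeError, ValueError):
            continue  # value is not of the intended type
        cleaned.append(fixed)
    return cleaned


raw = [
    {"email": "a@x.com", "signs": "3", "imports": 1},
    {"email": "b@y.com", "signs": None, "imports": 2},
    {"email": "c@z.com", "signs": "three", "imports": 0},
]
print(clean(raw, ["signs", "imports"]))
# [{'email': 'a@x.com', 'signs': 3, 'imports': 1}]
```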

Process
Fed 4 months of data (Feb, Mar, Apr, May) into a random forest model, used an
80-20 train/validation split, and tested the model on users who registered in
the first week of June.
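This split is time-based: the model learns from earlier registrations and is tested on a later window, which mimics scoring new sign-ups. A minimal sketch with synthetic users; the year is a placeholder because the deck does not state it, and the helper name is hypothetical.

```python
import random
from datetime import date

YEAR = 2000  # placeholder: the deck does not state the year


def time_based_split(users, train_window, test_window, validation_fraction=0.2, seed=0):
    """Hold out a later registration window for testing; split the earlier one 80-20."""
    (train_start, train_end), (test_start, test_end) = train_window, test_window
    history = [u for u in users if train_start <= u["registered"] <= train_end]
    test = [u for u in users if test_start <= u["registered"] <= test_end]
    random.Random(seed).shuffle(history)
    n_validation = int(len(history) * validation_fraction)
    return history[n_validation:], history[:n_validation], test


# Synthetic users registered between February and June.
users = [{"id": i, "registered": date(YEAR, 2 + i % 5, 1 + i % 28)} for i in range(50)]

train, validation, test = time_based_split(
    users,
    train_window=(date(YEAR, 2, 1), date(YEAR, 5, 31)),  # Feb to May
    test_window=(date(YEAR, 6, 1), date(YEAR, 6, 7)),    # first week of June
)
print(len(train), len(validation), len(test))  # 32 8 3
```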

Why Random Forest?
It is considered a very handy, easy-to-use algorithm, and a great choice if you
need to develop a model in a short period of time.
One of the big problems in machine learning is overfitting. A random forest is
much less prone to it than a single decision tree: each tree is trained on a
different random sample of the data and features, and averaging many such trees
reduces variance. Adding more trees does not by itself make the forest overfit.
Another great quality of the random forest algorithm is that it is very easy to
measure the relative importance of each feature for the prediction.

Results
1270 users registered in the first week of June, and 4 of them actually made a
purchase. Here are the model's prediction results for them:
● 99.69% accuracy in prediction.
● The algorithm was extremely good at filtering out users who are unlikely to
purchase: 1231 of the 1270 users were assigned a 0% chance of upgrading, and
only 1 of these actually upgraded. That user (austin.******@yahoo.com) is an
outlier: he purchased a plan within 5 minutes of registering and has not made
a single signature since then.
● It predicted that 39 users (3%) had a non-zero chance of conversion. Of
these, 3 of the top-ranked users actually purchased.
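These numbers can be cross-checked with a bit of arithmetic. For context, a trivial baseline that predicts "no purchase" for every user would be right for 1266 of 1270 users, which is also 99.69%; this is why recall and precision matter here. The recall and precision lines treat any non-zero score as a predicted purchase, which is an assumption: the deck does not state the threshold behind its accuracy figure.

```python
total_users, purchasers = 1270, 4
flagged, purchasers_flagged = 39, 3       # users given a non-zero chance
zero_score, purchasers_missed = 1231, 1   # users given a 0% chance

# The reported counts are consistent with each other.
assert flagged + zero_score == total_users
assert purchasers_flagged + purchasers_missed == purchasers

# Baseline: predict "no purchase" for every user.
baseline_accuracy = (total_users - purchasers) / total_users
print(f"{baseline_accuracy:.2%}")  # 99.69%

# Assumption: any non-zero score counts as a predicted purchase.
recall = purchasers_flagged / purchasers     # 3 of 4 purchasers were flagged
precision = purchasers_flagged / flagged     # 3 of 39 flagged users purchased
print(f"recall {recall:.0%}, precision {precision:.1%}")  # recall 75%, precision 7.7%
```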

Parameters for judging performance
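There are many metrics, but these basic ones give a good idea of performance
(TP/TN = true positives/negatives, FP/FN = false positives/negatives, n = total):
● Accuracy = (TN+TP)/n
● Recall = (TP)/(TP+FN)
● Precision = (TP)/(TP+FP)
● F-Score = harmonic mean of Recall and Precision
= 2 × Precision × Recall / (Precision + Recall)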
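These formulas translate directly into code. As a sanity check, the sketch below reproduces the 1st-pass column of the case study's results table from its TP, TN, FP and FN counts; the function name is illustrative.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F-score from confusion-matrix counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, recall, precision, f_score


# 1st-pass counts from the case study table.
acc, rec, prec, f1 = metrics(tp=420, tn=106582, fp=221, fn=620)
print(f"{acc:.2%} {rec:.2%} {prec:.2%} {f1:.2%}")  # 99.22% 40.38% 65.52% 49.97%
```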

Recall (Credits -
https://www.quora.com/What-is-the-best-way-to-understand-the-terms-precision-and-recall)
Imagine that your girlfriend has given you a birthday surprise every year for the last 10 years. (Sorry, I didn’t intend to
depress you if you don’t have one.) However, one day, your girlfriend asks you:
‘Sweetie, do you remember all the birthday surprises from me?’
This simple question puts your life in danger. To extend your life, you need to recall all 10 surprising events from
your memory. So, recall is the ratio of the number of events you correctly recall to the number of all actual events.
If you can recall all 10 events correctly, then your recall ratio is 1.0 (100%). If you can recall 7 events correctly, your
recall ratio is 0.7 (70%).
Understanding Precision and Recall

Precision (Credits -
https://www.quora.com/What-is-the-best-way-to-understand-the-terms-precision-and-recall)
However, some of your answers might be wrong.
For example, you answer 15 times: 10 events are correct and 5 are wrong. This means you can recall all the events,
but not very precisely.
So, precision is the ratio of the number of events you correctly recall to the number of all events you recall (a mix of
correct and wrong recalls). In other words, it measures how precise your recall is.
From the previous example (10 real events, 15 answers: 10 correct answers, 5 wrong answers), you get 100% recall,
but your precision is only 66.67% (10 / 15).
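The birthday example can be written with Python sets: recall divides the correct answers by all real events, precision divides them by all answers given. The event names are made up for illustration.

```python
actual = {f"birthday_{year}" for year in range(1, 11)}      # 10 real surprises
answers = actual | {f"made_up_{i}" for i in range(1, 6)}    # 15 answers, 5 wrong

correct = actual & answers
recall = len(correct) / len(actual)       # 10 / 10
precision = len(correct) / len(answers)   # 10 / 15
print(f"recall {recall:.0%}, precision {precision:.2%}")  # recall 100%, precision 66.67%
```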

Counts       1st pass   2nd pass   3rd pass
Total          107843     107631     107631
TN             106582     106449     106628
TP                420        407        365
FN                620        563        605
FP                221        212         33

Metrics      1st pass   2nd pass   3rd pass
Accuracy       99.22%     99.28%     99.41%
Recall         40.38%     41.96%     37.63%
Precision      65.52%     65.75%     91.71%
F Score        49.97%     51.23%     53.36%

Surfacing the score
Integrate into Sales CRM (High touch)

Surfacing the score
Integrate into Marketing CRM (low touch)

Application
This can be used to set up a process for reaching out to top-scored users,
either through high touch or low touch, to understand whether they are facing
any issues that are stopping them from upgrading.
If we are able to convert these top users, that would help drive up our
overall conversions.
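A minimal sketch of such a process, assuming each user already has a model score between 0 and 1: high scorers go to sales (high touch), mid scorers to marketing (low touch). Both cutoffs are hypothetical and would need tuning to the team's capacity.

```python
# Hypothetical cutoffs; in practice they would be tuned to team capacity.
HIGH_TOUCH_CUTOFF = 0.5
LOW_TOUCH_CUTOFF = 0.1


def route(scored_users):
    """Split (user, score) pairs into high-touch, low-touch and no-action lists, best first."""
    high_touch, low_touch, no_action = [], [], []
    for user, score in sorted(scored_users, key=lambda pair: pair[1], reverse=True):
        if score >= HIGH_TOUCH_CUTOFF:
            high_touch.append(user)   # e.g. hand to sales via the sales CRM
        elif score >= LOW_TOUCH_CUTOFF:
            low_touch.append(user)    # e.g. add to a marketing CRM campaign
        else:
            no_action.append(user)
    return high_touch, low_touch, no_action


scores = [("u1", 0.8), ("u2", 0.05), ("u3", 0.2), ("u4", 0.0), ("u5", 0.6)]
print(route(scores))  # (['u1', 'u5'], ['u3'], ['u2', 'u4'])
```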

Thank you!
Questions?
Find me on LinkedIn!
