Machine Learning: Business Perspective - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
2. BigML, Inc #DutchMLSchool 2
ML: Business Perspective
A Gentle Introduction to Machine Learning
Charles Parker
VP, Machine Learning Algorithms
In This Talk
• A simple introduction to supervised machine learning
• An introduction to some of the core concepts of the BigML
platform
• A tiny peek behind the curtain to see what really happens when
ML algorithms learn a model
• Ways to evaluate and interpret your model’s predictions
A Churn Problem
• You are the CEO of a mobile phone
company (congratulations!)
• Some percentage of your customers
leave the service (or “churn”) every
month
• You have a budget to reach out to
some customers each month to try to
persuade them to stay with the service
(with, for example, incentives)
• But to do that, you need to find out
who those customers are
Begin with the End In Mind
• Currently, you have a simple targeting strategy designed by
hand that identifies the 10,000 customers most likely to churn
• For every five people you call, two are actually thinking about
leaving (4,000 customers, for a precision of 40%)
• Of these customers, your operators can convince half to stay (so
about 2,000)
• Each of these saved customers has a net value of $500
• What if you could increase the precision of your targeting to
50%?
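The arithmetic behind that question can be sketched in a few lines. The function name `campaign_value` and its parameter names are illustrative assumptions; the numbers are the ones on the slide. At 40% precision the campaign saves 2,000 customers worth $1,000,000; at 50% it saves 2,500 customers worth $1,250,000, a gain of $250,000 for the same number of calls.

```python
def campaign_value(customers_called, precision, save_rate, value_per_save):
    """Net value recovered by one round of outreach calls."""
    likely_churners = customers_called * precision  # calls that reach someone about to leave
    saved = likely_churners * save_rate             # churners the operators persuade to stay
    return saved * value_per_save

# Figures from the slide: 10,000 calls, half of reached churners stay, $500 each.
current = campaign_value(10_000, 0.40, 0.5, 500)
improved = campaign_value(10_000, 0.50, 0.5, 500)

print(f"Current targeting (40% precision):  ${current:,.0f}")
print(f"Improved targeting (50% precision): ${improved:,.0f}")
print(f"Gain from better targeting:         ${improved - current:,.0f}")
```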
You Have The Data!
Minutes Used   Last Month’s Bill   Calls To Support   Website Visits   Churn?
104            103,60              0                  0                No
124            56,33               1                  0                No
56             214,60              2                  0                Yes
2410           305,60              0                  5                No
536            145,70              0                  0                No
234            122,09              0                  1                No
201            185,76              1                  7                Yes
111            83,60               3                  2                No
Now . . . Magic!
• Can we use this data to create a
better targeting strategy?
(Spoiler: Yes!)
• Can we use the very same data
to measure the effectiveness of
that strategy? (Spoiler: Yes!)
• And how do we do that? (Spoiler:
MACHINE LEARNING)
Aside: BigML Resources
• We can now upload that data to BigML
• Everything created on BigML is a resource
• Resources are:
• Mostly immutable: You can’t “screw them up”
• Assigned a unique ID
• Always available via both the API and the UI
• Working with BigML is a process of creating resources
Data Sources @ BigML
• A data source is a raw data file that you upload to the BigML
platform
• We make some initial guesses about the number and type of
columns in the file, and a bit about their content (such as the
language for text fields)
• Data can come from uploaded CSVs, Google Drive, Dropbox, an
arbitrary URL, and so on
Datasets @ BigML
• A BigML dataset represents processed row-column data
• We’ve made a final determination of the number and type of
columns in the source
• Some summary stats have been calculated for each column
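As a rough illustration of per-column summary statistics, the sketch below computes a few of them for the churn table using only the Python standard library. It is not BigML's implementation and does not reproduce what the platform reports; the column names and the `summarize` helper are assumptions made for the example, and the slide's decimal commas are written as decimal points.

```python
from collections import Counter
from statistics import mean, median

COLUMNS = ["minutes_used", "last_bill", "support_calls", "website_visits", "churn"]

# Rows from the slide table.
ROWS = [
    (104, 103.60, 0, 0, "No"),
    (124, 56.33, 1, 0, "No"),
    (56, 214.60, 2, 0, "Yes"),
    (2410, 305.60, 0, 5, "No"),
    (536, 145.70, 0, 0, "No"),
    (234, 122.09, 0, 1, "No"),
    (201, 185.76, 1, 7, "Yes"),
    (111, 83.60, 3, 2, "No"),
]

def summarize(columns, rows):
    """Per-column summary: basic stats for numeric columns, counts for the rest."""
    result = {}
    for index, name in enumerate(columns):
        values = [row[index] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            result[name] = {"type": "numeric", "min": min(values), "max": max(values),
                            "mean": mean(values), "median": median(values)}
        else:
            result[name] = {"type": "categorical", "counts": dict(Counter(values))}
    return result

summary = summarize(COLUMNS, ROWS)
for name, stats in summary.items():
    print(name, stats)
```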
Supervised Machine Learning
• Collect training data from the past about
your prediction problem, including the
right answer (e.g., statistics for each
customer month and whether or not the
customer churned at the end of that
month)
• Feed that data to a machine learning
algorithm
• The algorithm creates a program (that
we typically call a model, or classifier or
predictor) which can make that
prediction for you on future data
Traditional: Expert and Programmer
• Machine learning breaks the expert system
paradigm
• To make an expert software system before
machine learning, you used an expert and
a programmer
• The expert’s job was to know how the system
should work and be able to communicate that
knowledge
• The programmer’s job was to convert the expert’s
knowledge into a running computer program
• These could be the same person, but you
needed both roles
Now: Data and Algorithm
• Instead of an expert we have data
• Data can be easier to get (and is in some cases already there)
• You can get a volume of data much larger than any expert
could possibly see
• Humans are notoriously bad at being good at things:
• https://www.newscientist.com/article/mg21628930-400-specialist-knowledge-is-useless-and-unhelpful
• Instead of a programmer we have a learning algorithm
• Once you have the data in the proper format, learning
algorithms work much faster (enabling iteration)
• Learning algorithms are modular
Back to the Data
Minutes Used   Last Month’s Bill   Calls To Support   Website Visits   Churn?
104            103,60              0                  0                No
124            56,33               1                  0                No
56             214,60              2                  0                Yes
2410           305,60              0                  5                No
536            145,70              0                  0                No
234            122,09              0                  1                No
201            185,76              1                  7                Yes
111            83,60               3                  2                No
The Goal: A Program that Predicts
• The goal of learning is to take this sort of training data and
create a program (a model or classifier or predictor)
• This model takes as input a single row with a value for each of
the columns given in the training data
• The model will output its predicted value for the objective based
on the given column values
• Importantly, this row can contain any values for the given
columns, not just the ones seen in the training data
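To make "a program that predicts" concrete, here is a minimal sketch in which a hand-written rule stands in for a learned model. The function name `predict_churn`, the dictionary keys, and the `new_customer` row are all invented for illustration; the $180 bill threshold is one of the candidate splits this talk tries on the table. The point is only the interface: one row in, one predicted value out, for rows the training data never contained.

```python
def predict_churn(row):
    """Toy stand-in for a learned model: a single threshold on last month's bill."""
    return "Yes" if row["last_bill"] > 180 else "No"

# A hypothetical customer that does not appear in the training table.
new_customer = {
    "minutes_used": 310,
    "last_bill": 199.99,
    "support_calls": 4,
    "website_visits": 3,
}

print(predict_churn(new_customer))
```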
Behind The Scenes
• A learning algorithm is:
• A space of models that can be learned (a hypothesis space)
• A clever way of searching through that space to find a “good”
model
• A good model is one that, for example, makes accurate
predictions on the training data
• So “machine learning” is finding a model amongst all possible
models that has a good “fit” with the training data
A Simple Hypothesis Space
• Suppose we tell our machine to split the data into two parts
based on some threshold of some feature
• If a data point is on one side of the threshold, we’ll predict the
majority class of all the training points on that side
• We can measure how many points in the training data would be
correctly predicted using this method
• This is how good our “fit” is to the training data
• The best threshold is the one with the best fit (and we will try
them all)
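The procedure above can be sketched as an exhaustive search over single-threshold splits. This is a minimal illustration under stated assumptions, not any particular library's algorithm: candidate thresholds are taken as midpoints between adjacent distinct values of each feature, ties between equally good splits go to the first one found, and the names `count_correct` and `best_split` are invented for the example. The rows are the churn table, with decimal commas written as decimal points.

```python
from collections import Counter

FEATURES = ["minutes_used", "last_bill", "support_calls", "website_visits"]

# Rows from the slide table; the last element is the churn label.
ROWS = [
    (104, 103.60, 0, 0, "No"),
    (124, 56.33, 1, 0, "No"),
    (56, 214.60, 2, 0, "Yes"),
    (2410, 305.60, 0, 5, "No"),
    (536, 145.70, 0, 0, "No"),
    (234, 122.09, 0, 1, "No"),
    (201, 185.76, 1, 7, "Yes"),
    (111, 83.60, 3, 2, "No"),
]

def count_correct(rows, feature_index, threshold):
    """Training rows predicted correctly by the split `feature > threshold`."""
    correct = 0
    for above in (True, False):
        labels = [row[-1] for row in rows if (row[feature_index] > threshold) == above]
        if labels:
            # Predicting the majority class gets exactly that many rows right.
            correct += Counter(labels).most_common(1)[0][1]
    return correct

def best_split(rows, features):
    """Try every midpoint threshold on every feature and keep the best fit."""
    best = None
    for index, name in enumerate(features):
        values = sorted({row[index] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            score = count_correct(rows, index, threshold)
            if best is None or score > best[0]:
                best = (score, name, threshold)
    return best

score, feature, threshold = best_split(ROWS, FEATURES)
print(f"Best split: {feature} > {threshold} ({score}/{len(ROWS)} training rows correct)")
```

Here the hypothesis space is "one feature, one threshold" and the search is brute force; real learning algorithms use larger spaces and smarter searches, but the fit-and-compare loop is the same idea.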
Minutes Used > 200
Minutes Used   Last Month’s Bill   Calls To Support   Website Visits   Churn?
104            103,60              0                  0                No
124            56,33               1                  0                No
56             214,60              2                  0                Yes
2410           305,60              0                  5                No
536            145,70              0                  0                No
234            122,09              0                  1                No
201            185,76              1                  7                Yes
111            83,60               3                  2                No
Website Visits > 0
Minutes Used   Last Month’s Bill   Calls To Support   Website Visits   Churn?
104            103,60              0                  0                No
124            56,33               1                  0                No
56             214,60              2                  0                Yes
2410           305,60              0                  5                No
536            145,70              0                  0                No
234            122,09              0                  1                No
201            185,76              1                  7                Yes
111            83,60               3                  2                No
Last Bill > $180
Minutes Used   Last Month’s Bill   Calls To Support   Website Visits   Churn?
104            103,60              0                  0                No
124            56,33               1                  0                No
56             214,60              2                  0                Yes
2410           305,60              0                  5                No
536            145,70              0                  0                No
234            122,09              0                  1                No
201            185,76              1                  7                Yes
111            83,60               3                  2                No
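The three candidate splits (Minutes Used > 200, Website Visits > 0, Last Bill > $180) can be scored against the table directly. The sketch below assumes the "predict the majority class on each side" rule and uses invented names (`COLUMN_INDEX`, `correct_for_split`); decimal commas are written as decimal points. On these eight rows the first two splits get 6 rows right and the bill split gets 7, so the bill split has the best fit of the three.

```python
from collections import Counter

COLUMN_INDEX = {"minutes_used": 0, "last_bill": 1, "support_calls": 2, "website_visits": 3}

# Rows from the slide table; the last element is the churn label.
ROWS = [
    (104, 103.60, 0, 0, "No"),
    (124, 56.33, 1, 0, "No"),
    (56, 214.60, 2, 0, "Yes"),
    (2410, 305.60, 0, 5, "No"),
    (536, 145.70, 0, 0, "No"),
    (234, 122.09, 0, 1, "No"),
    (201, 185.76, 1, 7, "Yes"),
    (111, 83.60, 3, 2, "No"),
]

def correct_for_split(rows, column, threshold):
    """Rows predicted correctly when each side of the split predicts its majority class."""
    index = COLUMN_INDEX[column]
    above = [row[-1] for row in rows if row[index] > threshold]
    below = [row[-1] for row in rows if row[index] <= threshold]
    # The majority class on a side is right for exactly as many rows as it has members.
    return sum(max(Counter(side).values()) for side in (above, below) if side)

CANDIDATES = [("minutes_used", 200), ("website_visits", 0), ("last_bill", 180)]
results = {column: correct_for_split(ROWS, column, threshold)
           for column, threshold in CANDIDATES}

for (column, threshold) in CANDIDATES:
    print(f"{column} > {threshold}: {results[column]}/{len(ROWS)} correct")
```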
So Far, So Good!
• This is basically what machine
learning algorithms do
• Try a solution and see how well it fits
the training data
• If “not well”, take some steps to
“improve” it
• There are many, many different
ways of doing it, but this is
usually what it boils down to
Now What?
• The next step is to test the model, using the data you already have
• Split the data into training and test sets (machine learning is very good at
memorizing the data, so a model must be tested on rows it has not seen)
• Train a model on the training set
• Evaluate it using the test set (or, the “held out data”)
• We’ll get to the evaluation tool more fully later on
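The split-train-evaluate steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not BigML's evaluation tool: the helper names, the 25% test fraction, and the fixed seed are arbitrary choices, and the "model" is deliberately the simplest possible learner (always predict the most common training label) so that the workflow stays in view. Eight rows are far too few for a meaningful evaluation; the table is reused only to keep the example self-contained.

```python
import random
from collections import Counter

# Rows from the slide table; the last element is the churn label.
ROWS = [
    (104, 103.60, 0, 0, "No"),
    (124, 56.33, 1, 0, "No"),
    (56, 214.60, 2, 0, "Yes"),
    (2410, 305.60, 0, 5, "No"),
    (536, 145.70, 0, 0, "No"),
    (234, 122.09, 0, 1, "No"),
    (201, 185.76, 1, 7, "Yes"),
    (111, 83.60, 3, 2, "No"),
]

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_majority_baseline(rows):
    """Learn the most common label and return a model that always predicts it."""
    most_common_label = Counter(row[-1] for row in rows).most_common(1)[0][0]
    return lambda row: most_common_label

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(row) == row[-1] for row in rows) / len(rows)

train_rows, test_rows = train_test_split(ROWS)
model = train_majority_baseline(train_rows)  # the model only ever sees training rows
test_accuracy = accuracy(model, test_rows)   # scored only on held-out rows

print(f"{len(train_rows)} training rows, {len(test_rows)} test rows")
print(f"Held-out accuracy: {test_accuracy:.2f}")
```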
And Now?
• Is the model good enough?
• If not:
• Different modeling approaches (model types, parameter
tuning)
• Better features (more information, transformations of the
information you already have)
• The more you fiddle with things, the more you contaminate
your results (through overfitting)
• Thus, if it’s “good enough”, it’s often best to leave it alone
Field Importance
• While our model is good, we don’t really have a good high-level
overview of why it thinks what it thinks
• BigML supervised models provide this in the form of field
importance under the model summary report
Individual Explanations
• Individual predictions can be explained as well (as the model’s
reasoning for a particular point can be different from the model
at large)
• Use the magnifying glass in the prediction form
Two Takeaways
• When beginning a machine learning project, the more concrete
the goal, the better. Numbers are the lifeblood of analytics, so if
you can’t quantify your objective(s), success is unlikely
• Machine Learning isn’t the right solution for every problem! Be
wary of your algorithm being replaced by a human!
• “Before embarking on an ambitious project, try to kill it.” - Edsger
Dijkstra