2. Machine learning overview and some applications
Why should I care and why now?
Closer look at the inner workings and some algorithms
How do I get started?
Did somebody say testing? Is it important?
Give me some more pointers.....
What is covered?
4. Human
• Innate ability to think, ask questions, decide what questions need to be answered, and distinguish right from wrong (what we call the Expert Opinion)
• Usually limited to working with ~4 variables and their interactions when trying to solve a problem/accomplish a task (4 variables can be equated to 4 dimensions)
Human vs Machines – How different are we?
4
Machines
• Good at answering questions and looking for patterns in data that may not be visible to humans
• Compared to humans, machines can be trained to look at N features/variables
• Machines can learn/remember patterns to be applied to future use cases
• Never needs a break…
5. NLP – Siri, Google, Cortana
Marketing – Targeting your at-risk customers using Uplift Models
Entertainment – Netflix, IMDB movie recommendations
Health Care - Gene Sequencing, Cancer detection, Hospital Re-admissions
Politics - Election Outcomes – Nate Silver – 538 Blog
Auto Industry - Self driving cars, Auto Pilot
Insurance – Premium, Reserve, Intervention, Fraud, Subro
Productivity / Office – Email Spam vs Ham
Manufacturing - Predict Machine Failures
Other – Predicting Wait times
and yes an all time favorite “Weather Forecasting…”
What are some examples of Machine Learning?
5
6. To understand how this works, we can refer back to how a child learns… aka how we learn…
- Children's picture book – to learn about a dog, the child has to learn about its Shape, Size, Color, Key Characteristics, and Variations
- A machine learns much like a child/human: by analyzing the data set, learning/identifying patterns in the data, and learning/identifying variations/generalizations
- The process looks like…
Ok… What is Machine Learning?
6
Data → Concepts → Inferences
Data Storage → Abstraction → Generalization → Evaluation
7. The next step in this evolution is to combine the power of the two [Humans and Machines] to push the boundaries of what we can achieve.
Machine learning/technology is not going to replace humans but rather augment human intelligence with insights
Similar to the role technology [e.g. the Internet, Social Media] played/is playing in the '80s, '90s and in the new Millennium, Machine Learning / AI is going to redefine what we do and how we do it
Why should I care?
7
8. Why Now? What Happened?
8
• Data – Ability to capture and access more and more Data!! Yes Data!!
• Computing Power – N-fold increase in computing power to process data. Think about your smartphone and what all you can do with it. Yes, the servers are more powerful as well…
• Statistical Methods – Continuous research!! Algorithm implementations in PA software packages. Anybody heard about Data Scientists and ML courses?
• New Use Cases
10. Let’s look under the hood…
10
Input pre-processing: Data [Structured / Un-structured] → Featurize → Normalize → Filtering → Imputation → Data Fusion → Dimension Reduction
Training / optimization: Training / Testing
Post Analysis: Analysis / Insights → Use / Implement
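As a rough illustration of the pre-processing stage, here is a minimal Python sketch of two common steps, imputation and normalization. The `raw` values and the helper names `impute`/`normalize` are invented for illustration, not from the slides:

```python
from statistics import mean

# Toy input with a missing value (None); the numbers are illustrative only
raw = [4.0, 8.0, None, 6.0, 2.0]

def impute(values):
    # Fill missing entries with the mean of the observed ones
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def normalize(values):
    # Min-max scale into [0, 1] so features are on a comparable scale
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(normalize(impute(raw)))
```

Real pipelines chain many such steps (filtering, featurizing, dimension reduction), but each one is conceptually a small transformation like the two above.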
Supervised – Provide the algorithm with both the input and the output and let it learn from them. Classification is a typical application of supervised algorithms; they can also be used to make numeric predictions (e.g. using a Linear Regression Model)
– Spam vs Ham
Un-Supervised – Only provide the input and let the algorithm learn from the data and discover patterns [Clustering / Segmentation Analysis]
– Market Basket Analysis for a grocery chain
– Fraudulent Behavior
Meta Learners – Not tied to a specific learning algorithm but rather focused on learning how to learn more effectively (not covered in this presentation)
Types of Machine Learning Algorithms
11
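To make the un-supervised case concrete, here is a minimal one-dimensional K-Means sketch in Python. The points and starting centers are invented for illustration; the point is that the algorithm finds the two groups on its own, with no labels provided:

```python
from statistics import mean

def kmeans_1d(points, centers, iters=10):
    # Repeat: assign each point to its nearest center,
    # then move each center to the mean of its assigned points
    for _ in range(iters):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        centers = [mean(v) if v else c for c, v in clusters.items()]
    return sorted(centers)

# Toy values that form two obvious groups; centers converge near 1.0 and 9.0
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], centers=[0.0, 5.0]))
```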
12. Learning Algorithms
12
Model                         | Learning Task
Supervised
  Nearest Neighbor            | Classification
  Naïve Bayes                 | Classification
  Decision Trees              | Classification
  Classification Rule Learners| Classification
  Linear Regression           | Numeric Predictions
  Regression Trees            | Numeric Predictions
  Model Trees                 | Numeric Predictions
  Neural Networks             | Dual Use
  SVM (Support Vector Machines)| Dual Use
Un-Supervised
  Association Rules           | Pattern Detection
  K-Means Clustering          | Clustering
13. K Nearest Neighbor
Decision Trees
Neural Network
Support Vector Machines
Regression Models
Let's take a closer look at a few of them…
13
14. Classification Algorithm
Lazy Learner, Non-Parametric – no assumptions about the underlying data set
No explicit training step based on the data
K – determine the K value ~ sqrt of the data set size
Distance Function – Euclidean (multi-dimensional) / Cosine etc.
K Nearest Neighbor
14
[Scatter plot: wines W1–W10 plotted by Sweetness vs Acidity, with an unknown wine W? to classify]
Dist(W5, W?) = sqrt((W5x – W?x)² + (W5y – W?y)²)
or
Dist(W2, W?) = sqrt((W2x – W?x)² + (W2y – W?y)²)
Distance Function
Classifying wines….
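Putting the pieces together, a minimal k-NN sketch in Python. The wine coordinates and labels below are made up for illustration, and K=3 is hard-coded (in practice you would pick K ~ sqrt of the data set size):

```python
import math
from collections import Counter

# Toy labeled wines: (sweetness, acidity) -> style; values are illustrative only
wines = [
    ((1.0, 8.0), "dry"), ((1.5, 7.5), "dry"), ((2.0, 8.5), "dry"),
    ((7.0, 3.0), "sweet"), ((8.0, 2.5), "sweet"), ((7.5, 3.5), "sweet"),
]

def euclidean(a, b):
    # Dist(a, b) = sqrt((ax - bx)^2 + (ay - by)^2), as on the slide
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, data, k=3):
    # Sort neighbors by distance, then take a majority vote among the k closest
    nearest = sorted(data, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((7.2, 3.1), wines))  # → sweet
```

Note there is no training step: classifying W? just means measuring its distance to every stored wine, which is exactly why k-NN is called a lazy learner.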
15. Decision trees utilize a tree structure to model the relationships among the features and potential outcomes (Greedy Learners)
Easy to understand / explain how it works (except when the tree becomes really large)
Built using a heuristic called recursive partitioning / divide and conquer – each split increases homogeneity in the data set, i.e. decreases Entropy / Variability
Very powerful and applicable to both small and large datasets
Easy to overfit / underfit (tree decision points/nodes)
Should not be used for data sets with a large number of nominal / numeric features, as it ends up creating extremely complex decision trees
Decision Trees
15
16. The split is done along one axis / feature at a time (e.g. first by sweetness, then by acidity)
Ensure that you account for pruning the tree to control its size (pre- / post-pruning)
Optimize by cost
Decision Tree (Example)
16
[Diagram: tree with Root, Decision Nodes, and Leaves, alongside a scatter plot of wines W1–W10 partitioned by axis-aligned splits on Sweetness and Acidity]
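The "increase homogeneity / decrease entropy" idea behind each split can be sketched in a few lines of Python. The sweetness values and labels are toy data, and only one feature is considered; a real tree applies this search recursively over every feature:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy: lower means a more homogeneous (purer) set of labels
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_split(values, labels):
    # Try each midpoint threshold along one feature (e.g. sweetness) and keep
    # the one that most reduces the weighted entropy of the two halves
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        thresh = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs[:i]]
        right = [l for v, l in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if weighted < best[1]:
            best = (thresh, weighted)
    return best

sweetness = [1.0, 1.5, 2.0, 7.0, 7.5, 8.0]
labels = ["dry", "dry", "dry", "sweet", "sweet", "sweet"]
print(best_split(sweetness, labels))  # threshold 4.5 splits perfectly (entropy 0)
```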
17. Neural Networks* – Black Box Methods
17
Models are built to mimic the structure of the brain. Three pieces:
• Network Topology – the number of layers, nodes per layer, and the connections between them (Feedback / Feed-forward)
• Activation Function – transforms a neuron's combined input signals into a single output signal and sends it out
• Training Algorithm – determines how to set connection weights in order to inhibit or activate a node based on the input signal
*The above is a sample network that shows one-way flow; deep NNs flow both ways
Neural Networks are very powerful but prone to over-fitting!!
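A single neuron with a sigmoid activation function can be sketched as follows. The weights and bias here are hand-picked for illustration; learning them is exactly the training algorithm's job:

```python
import math

def sigmoid(x):
    # A common activation function: squashes any input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Combine the weighted input signals, then apply the activation function
    combined = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(combined)

# Illustrative weights only; a real network learns these during training
print(round(neuron([0.5, 0.8], [1.2, -0.4], 0.1), 3))  # ≈ 0.594
```

A full network is just many such neurons wired together in layers, which is also why so many free parameters make over-fitting easy.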
18. Models are built using multidimensional surfaces to define the relationship between features and outcomes
SVM – Support Vector Machines
18
[Scatter plot: wines W1–W10 by Sweetness vs Acidity, separated by a maximum-margin line with margin bars on either side]
• Hyper-plane – a flat surface in a high-dimensional space that separates out the different observations [Example – the wines are shown in 2 dimensions, but one can use multiple dimensions to separate them]
• Maximum Margin Hyper-plane – the plane that creates the greatest separation between the two or more classes – refer to the bars in the picture
• SVM uses vector geometry to identify the planes that separate out the different classes of observations
SVMs are usually not prone to noise or over-fitting, and are fairly popular among Data Scientists due to the number of implementations available across different packages
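The hyper-plane idea in code: classify a wine by which side of the plane w·x + b = 0 it falls on. The weights below are hand-picked for illustration; an actual SVM solver's whole job is to find the maximum-margin w and b from the training data:

```python
def classify(point, w, b):
    # The separating hyper-plane is the set of points where w·x + b = 0;
    # the sign of w·x + b tells us which side a new point falls on
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return "sweet" if score > 0 else "dry"

# Illustrative plane: roughly "sweetness minus acidity" (not a fitted model)
w, b = (1.0, -1.0), 0.0
print(classify((7.0, 3.0), w, b))  # high sweetness, low acidity
print(classify((1.5, 7.5), w, b))  # low sweetness, high acidity
```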
19. Linear Regression – one numeric predictor
Multiple Linear Regression – two or more numeric predictors
Logistic Regression – models a binary categorical outcome
Poisson Regression – models integer count data
Multinomial Logistic Regression – models a categorical outcome / used for classification
Generalized Linear Models – models that can be generalized to other patterns
A key difference between machine learning and regression modeling is that in regression modeling the feature selection and model specification are driven by the user
Regression Models
19
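For the simplest case above, one numeric predictor, linear regression fits y = a + b·x by ordinary least squares. A minimal sketch with made-up points:

```python
def linear_fit(xs, ys):
    # Ordinary least squares for y = a + b*x (one numeric predictor)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Perfectly linear toy data: y = 1 + 2x
a, b = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # → 1.0 2.0
```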
20. Regression Tree models do not use linear regression methods to make predictions; they may use the average value of the examples that reach a leaf
Model Trees differ from Regression Trees in that at each leaf, a multiple linear regression model is built from the examples reaching that node
– Combine the strengths of decision trees with the ability to model numeric data
– Use automatic feature selection and can be used with data sets that have a large number of features
– Usually more difficult to understand, but do not require advanced statistical knowledge to interpret
Regression Tree and Model Trees
20
21. DID I LOSE YOU IN THE SVM HYPER-PLANES? BACK TO REALITY!!
21
22. Define Scope and Objectives…
– What are you trying to do and why?
Remember: Data is key
– Do not under-estimate the work that needs to go into prepping data for the models [historical and on-going]
Modeling is "sexy" but not the be-all / end-all of this process
Yes, you need to test the model results too – do they make sense?
– Do they really make sense? Try to dig into the why
And do not forget how you will operationalize the findings, if any
– Using the insights is easier said than done!!
Now the most difficult part – you need a process to review how you are using the insights and whether they are delivering value!!
OK…How do I get started?
22
23. Do not overfit the model or try to model the noise [Google to find out more]
Try to understand what the model is saying and why
Never hesitate to question the findings
If it is too good to be true, then it is not true!!
Be Skeptical, Really Skeptical, of Model Results
23
24. Separate Training and Testing Data Sets – There are a number
of ways to do this…
Regression and Classification model output
– Compare predicted to actual results – Yes, you can slice/dice the
data to get more insights
– Use Confusion Matrix [Precision Rate, Recall Rate, Flag Rate]
Model Testing
24
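One of the simplest ways to separate training and testing data is a random hold-out split. A minimal sketch; the 25% test fraction and the seed are arbitrary choices for the example:

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    # Shuffle a copy of the data, then carve off a held-out test set
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

data = list(range(20))  # stand-in for 20 labeled examples
train, test = train_test_split(data)
print(len(train), len(test))  # → 15 5
```

Other approaches (k-fold cross-validation, stratified splits) follow the same principle: never evaluate the model on the examples it was trained on.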
25. Residuals – summary statistics for the errors in our predictions
[True Value – Predicted Value]
P-Value – Estimate of the probability that the true coefficient is
zero given the value of the estimate
Multiple R Squared Value – Measures how well our model as a
whole explains the values of the dependent variable (1.0
implies it is able to explain the result completely. Good luck with
that…)
Evaluating Regression Models
25
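Residuals and R-squared can be computed directly from the definitions above. A short sketch with invented actual/predicted values:

```python
def residuals(actual, predicted):
    # Residual = True Value - Predicted Value, one per prediction
    return [a - p for a, p in zip(actual, predicted)]

def r_squared(actual, predicted):
    # R-squared = 1 - SS_residual / SS_total:
    # the fraction of variance in the outcome the model explains
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual, predicted = [2.0, 4.0, 6.0, 8.0], [3.0, 3.0, 7.0, 7.0]
print(residuals(actual, predicted))  # → [-1.0, 1.0, -1.0, 1.0]
print(r_squared(actual, predicted))  # → 0.8
```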
26. Model Testing – Confusion Matrix
26
                  MODEL Predictions
ACTUAL            TRUE              FALSE
TRUE              TRUE POSITIVE     FALSE NEGATIVE
FALSE             FALSE POSITIVE    TRUE NEGATIVE

RECALL = TP / (TP + FN)
PRECISION = TP / (TP + FP)
[Chart: PRECISION % and RECALL % plotted against FLAG % / Flag Rate]
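Precision and recall follow directly from the matrix counts. A quick sketch; the counts below are invented for illustration:

```python
def precision_recall(tp, fp, fn):
    # Precision = TP / (TP + FP): of everything we flagged, how much was right
    # Recall    = TP / (TP + FN): of everything truly positive, how much we caught
    return tp / (tp + fp), tp / (tp + fn)

p, r = precision_recall(tp=80, fp=20, fn=10)
print(round(p, 3), round(r, 3))  # → 0.8 0.889
```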
27. Are we done here? Need my Chai…
27
https://medium.com/swlh/the-7-best-data-science-and-machine-learning-podcasts-e8f0d5a4a419#.rl7nhqwje
http://www.kdnuggets.com/2015/07/sameer-chopra-training-resources-predictive-analytics.html
Feel free to share / use this presentation, but not before you leave me some feedback!!