SlideShare una empresa de Scribd logo
1 de 69
Introduction to
Machine
Learning
Sanghamitra Deb
Staff Data Scientist
Chegg Inc
Outline
• Introduction
• Types of Learning
• Machine Learning Models
• Definition
• Regularization
• Metrics
• Feature Engineering
• Real World Examples
What is
Machine
Learning?
• “Learning is any process by which a system
improves performance from experience.” -
Herbert Simon
• Machine learning is training computers to
effectively achieve a performance criterion
using examples or historical data.
Why?
Machine Learning is used when
 Human expertise is unavailable (space expeditions).
 Human expertise is not explicable (speech translation).
 Information need to be personalized (education, medicine).
 Domains with huge amount of data.
Applications
• Education --- Developing Learning Paths
for Students
• Healthcare --- Personalized Medicine
• Retail --- Product recommendations
• Web --- Search
• Manufacturing --- robotics, control
• Finance – fraud detection, asset
management
• HR --- people analytics
• Medical --- drug discovery, automated
diagnosis
• ………..
Types of Learning
Supervised Unsupervised
 Examples or training data is available
o Human annotations, user Interactions
 Data contains features correlated with the
desired outcome
 A model is learned from the examples
 Goal of the model is to predict future
behavior
 Direct examples are not available
 Data contains features correlated,
outcome may not be defined.
 It is possible to create clusters
correlated with the learning objective
based on patterns in the data
Learning Objective ---
Types of Learning
Supervised Unsupervised
 Regression
 Linear
 Decision Trees
 Classification
 Logistic Regression
 Naïve Bayes
 SVM
 Decision Trees – RF, GBDT
 Clustering --- kmeans
 Similarity based results
 Transfer Learning
Models ---
Linear Regression
Linear regression was developed in the field of statistics and is studied as a model for understanding the relationship
between input and output numerical variables, but has been borrowed by machine learning. It is both a statistical
algorithm and a machine learning algorithm.
Linear Regression | simple linear regression | Ordinary Least Squares | multiple linear regression
• The dependent variable Y has a linear relationship to the independent variable X
• For each value of X, the probability distribution of Y has the same standard deviation σ.
• For any given value of X, The Y values are independent, as indicated by a random pattern on the residual plot.
• The Y values are roughly normally distributed (i.e., symmetric and unimodal).
Example: Sales Prediction  company’s advertising spend on radio, TV, and newspapers.
Decision Tree Regression
https://www.saedsayad.com/decision_tree_reg.htm
Standard Deviation -- single attribute
•Standard Deviation (S) is for tree building (branching).
•Coefficient of Deviation (CV) is used to decide when to stop
branching. We can use Count (n) as well.
•Average (Avg) is the value in the leaf nodes.
Standard Deviation -- two attributes
Standard Deviation Reduction
The dataset is then split on the different attributes. The standard deviation for each branch is calculated. The
resulting standard deviation is subtracted from the standard deviation before the split. The result is the standard
deviation reduction.
Divide the Dataset
The dataset is divided based on the values of
the selected attribute. This process is run
recursively on the non-leaf branches, until all
data is processed.
Overcast does not need more splitting
"Overcast" subset does not need any further splitting because
its CV (8%) is less than the threshold (10%). The related leaf
node gets the average of the "Overcast" subset.
"Sunny" branch --- needs splitting
“Rainy” – needs splitting
Here is the final Tree
Pros and Cons
1. Simple and easy to understand: Decision Tree looks like
simple if-else statements which are very easy to understand.
2. Decision Tree can handle both continuous and categorical
variables.
3. No feature scaling required: No feature scaling
(standardization and normalization) required in case of Decision
Tree as it uses rule based approach instead of distance
calculation.
4. Handles non-linear parameters efficiently: Non linear
parameters don't affect the performance of a Decision Tree
5. Decision Tree can automatically handle missing values.
6. Decision Tree is usually robust to outliers and can handle
them automatically.
1. Overfitting: This is the main problem of the
Decision Tree
2. High variance: Due to the overfitting, there are
very high chances of high variance in the output
which leads to many errors in the final estimation
and shows high inaccuracy in the results. In order to
achieve zero bias (overfitting), it leads to high
variance.
3. Unstable: Adding a new data point can lead to
re-generation of the overall tree and all nodes need
to be recalculated and recreated.
4. Affected by noise: Little bit of noise can make it
unstable which leads to wrong predictions.
Applications
• Trendline --- A trend line represents a trend, the long-term movement in time series data after other
components have been accounted for. It tells whether a particular data set (say GDP, oil prices or stock prices) have
increased or decreased over the period of time.
• Epidemiology --- Early evidence relating tobacco smoking to mortality and morbidity came from observational
studies employing regression analysis. In order to reduce spurious correlations when analyzing observational data,
researchers usually include several variables in their regression models in addition to the variable of primary interest.
• Finance --- The capital asset pricing model uses linear regression as well as the concept of beta for analyzing and
quantifying the systematic risk of an investment. This comes directly from the beta coefficient of the linear regression
model that relates the return on the investment to the return on all risky assets.
• Economics --- Linear regression is the predominant empirical tool in economics. For example, it is used to
predict consumption spending,[20] fixed investment spending, inventory investment, purchases of a
country's exports,[21] spending on imports,[21] the demand to hold liquid assets,[22] labor demand,[23] and labor
supply.[23]
https://en.wikipedia.org/wiki/Linear_regression
Classification
http://cs229.stanford.edu/notes2020spring/cs229-notes1.pdf
Values of Y, i.e the response can take discrete values
o Binary Classification – response can belong to two
classes (0,1)
Rating – thumbs up, thumbs down.
o Multi-class classification – there can be n classes 
[1, …. n]
 Movie/restaurant rating can range from
[1,2,3,4,5]
o Multi-class , multi – label classification.
Example –
Concept space in Probability has many
concepts (classes) – [Probability, Bayes
Theorem, DiscretePDF – Binomial, Poisson,
Continuous PDF – normal, exponential, …]
A question will typically belong to multiple
classes.
Classification Pipeline
Data Cleaning Collecting Training Data Model Building
Offline
SME
• Reduces noise
• Ensures quality
• Improves overall performance
• Training Data Collection / Examples
of classes that we are trying to model
• Model performance is directly
correlated with quality of training
data
Model Evaluation
• Model selection
• Architecture
• Parameter Tuning
Users
Online
21
Logistic Regression -- Binary Classification
Logistic Function or Sigmoid function
https://sebastianraschka.com/faq/docs/logistic-why-sigmoid.html -- discussion on the logit function
Interpretation: Output is the probability that y=1 given x.
Output > 0.5 represents y =1 and output < 0.5 represents y=0
Cost
Function –
Cross
Entropy
Multinomial Logistic / Softmax
Sigmoid gets replaced by the softmax function.
This applies when there are 1,…k classes.
Regularization --- Linear
/Logistic
• Penalty against complexity
o model does not up
"peculiarities," "noise,"
or "imagines a pattern
where there is none.”
• Helps with generalizing to
new data sets.
• Addition of a bias to the
model when suffers from
high variance or overfitting
L2 regularization
L1 regularization
Data --- Train,
Test and
Validation
Train – 90% , Test –5 %, Validation –5% ---- the
percentages can vary depending on the total size of
the dataset.
• Train --- data used to train the model
• Validation --- data that the model has not seen
but is used for parameter tuning, i.e the model is
optimized based on performance on this set.
• Test --- model has not seen this data, this data is
not used in any part of the computation. Final
performance metrics are reported on this data.
Bias – Variance Tradeoff
BIAS — When we model in a very simple, novice way, for example
just a single linear equation prediction for an actual complex model.
Of course because of this Model becomes Under fit and miss out
various important insights and relations between variables.
VARIANCE — On the other hand, when we become over concern for
a simple given data and fit a model in a very complicated way, it
results in Over fit. So each noise and outlier will be considered as
valid data point and modelled accordingly.
Decision Tree Classifications
Advantage ---
• Results are interpretable
• Works for both numerical and categorical data
• Does not require feature transformations (example
--- normalization, scaling)
• Robust to Multicollinearity – correlated features.
Single decision are rarely used in practice
Disadvantage ---
• Unstable --- small changes in data can lead to large
structural changes in the decision tree
• Prone to overfitting
• Easily becomes complex
Splitting criteria
Ensemble
Tecniques ---
Random Forest
Many trees are better than one!
N slightly differently trained
decision trees and merges
them together to get more
accurate and stable
predictions.
Regularization
• Limit tree depth.
• Pruning
• Penalize selection of new
features over features that
have similar gain
• Set stricter stopping criterion
on when to split a node
further (e.g. min gain, number
of samples etc.)
Feature Importance
A feature’s importance score measures
the contribution from the feature. It is
based on the impurity reduction of the
class due to the feature.
Bias towards features with more
categories
It is important to compute the correlation
with accuracy.
Unsupervised Learning
There is no training data …
Clustering
• Hierarchical clustering
• K-means clustering
• K-NN (k nearest neighbors)
Clustering is a technique that
finds groups (clusters) in the
data that have similar patterns.
K-means
Clustering
Similarity based
Recommendations
• Text data – news article,
study material , books, …
• For every piece of content
compute the distance to
all every other content in
the cluster --- save the top
–n in a database
• When a user views any
content surface the top-n
(typically 5) other similar
items
Applications
• Recommendations based on Text
Similarity
• Customer Segmentation
• Content Categorization
• As a pre-analysis for supervised
learning
Performance Metrics
Is the model good enough?
Regression
R² Error: The metric helps us to compare our current model
with a constant baseline and tells us how much our model is
better. The constant baseline is chosen by taking the mean of
the data and drawing a line at the mean.
Adjusted R²: Adjusted R² depicts the same meaning as R² but
is an improvement of it. R² suffers from the problem that the
scores improve on increasing terms even though the model is
not improving which may misguide the researcher. Adjusted
R² is always lower than R² as it adjusts for the increasing
predictors and only shows improvement if there is a real
improvement
Classification
True Positive --- Number of observations that model correctly
predicts the positive class
False Positive --- Number of observations where model
incorrectly predicts the positive class.
False Negatives --- Number of observations where model
incorrectly predicts the negative class.
True Negatives --- Number of observations where model
correctly predicts the negative class
https://en.wikipedia.org/wiki/Precision_and_recall
Classification
https://en.wikipedia.org/wiki/Precision_and_recall
Precision : TP/(TP+FP) --- what percentage of the positive class
is actually positive?
Recall : TP/(TP+FN) --- what percentage of the positive class
gets captured by the model?
Accuracy --- (TP+TN)/(TP+FP+TN+FN) --- what percentage of
predictions are correct?
Thresholding --- Coverage
In a binary classification if you choose randomly the probability of belonging to a class is 0.5
0.3
0.7
It is possible improve the percentage of
correct results at the cost of coverage.
Confusion
Matrix
Good for checking where your
model is incorrect
For multi-class classification it
reflects which classes are
correlated
Ranking
Model
Recommendations Search
Mean Reciprocal Rank
For implicit dataset, the relevance score is either 0 or 1, for items not bought or bought (not clicked or clicked, etc.).
Mean Average Precision
Precision@k would be the fraction of relevant items in the top k recommendations, and recall@k would be the
coverage of relevant times in the top k.
Normalized Cumulative Discounted Gain
(NDCG)
Gain
Gain for an item is essentially the
same as the relevance score
Cumulative Gain
Cumulative Gain is defined as the sum of
gains up to a position k
This measure is independent of ordering.
Discounted
Cumulative Gain
By diving the gain by its rank,
we sort of push the algorithm
to place highly relevant items to
the top to achieve the best DCG
score.
DCG score adds up with the
length of recommendation list.
Ideal Discounted
Cumulative Gain
Nirmalizing to incorporate lists of
different lengths
Between 0,1
Feature Engineering
Combining math and intuition
Imputation or missing values
https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114#1c08
Drop rows with missing values
Cons: Reduces training data. In case of multiple features values may only be missing in a subset of features.
Replace with a reasonable value – median is common
Cons: There are assumptions used when values are missing which may not be correct.
Categorical Imputation --- replace with the most common value.
Cons: Assumptions
Handling
Outliers –
drop, capp
Binning
The trade-off
between performance and overfitting is the key
point of the binning process.
Log Transform
• It helps to handle skewed data and after transformation, the
distribution becomes more approximate to normal.
• In most of the cases the magnitude order of the data changes
within the range of the data. For instance, the difference
between ages 15 and 20 is not equal to the ages 65 and 70. In
terms of years, yes, they are identical, but for all other
aspects, 5 years of difference in young ages mean a higher
magnitude difference. This type of data comes from a
multiplicative process and log transform normalizes the
magnitude differences like that.
• It also decreases the effect of the outliers, due to the
normalization of magnitude differences and the model become
One-hot encoding
Scaling -- all features have the same range
The continuous features become identical in terms of the range, after
a scaling process. This process is not mandatory for many algorithms,
but it might be still nice to apply. However, the algorithms based
on distance calculations such as k-NN or k-Means need to have scaled
continuous features as model input.
Normalization Normalization (or min-max normalization) scale all
values in a fixed range between 0 and 1.
Standardization
Real World Application
There is no training data …
Use Case --- Predict will a
user buy a book?
Online Book Store
Generating Features
Book Features User Features Book-User Features
• Tags --- genre, subject, …
• Level – Beginner, Intermediate
Advanced
• Popularity score –
1. exponential decay on clicks
2. Create time bound scores,
such number of views in the
last 7 days, last 14 days
• Price
• Length of the book
• …
• Tags --- genre, subject, …
• Level – Beginner, Intermediate
Advanced
Derived from user interactions on the
site.
• View Score
1. Exponential decay clicks on
books
2. Time bound --- number of
views in past 7/14 days
• Price category
• Time of day – categorical
• …
• Number of Views in past 14/30
days
• Already bought
• Number of views from the
same author
• …
Response Variable & Modeling
• If the site is not super active you might not have enough data on purchases
• Multi-stage model
• Stage 1 : response variable views – i.e will the user view/click on this
book in the next 3 days?
• Stage 2: response variable purchase – will the user purchase this book
in the next 3 days. The probability of view will be a feature in the Stage
2 model.
Modeling: We have a mix of categorical and numerical features --- Random Forest
How to start projects in machine learning?
• Kaggle competitions ---
• Make sure to solve the ML problems for concept development
before competing
How to start projects in machine learning?
• Self guided workshops/projects ---
lets say you have data from Zomato
• Restaurant recommendation --
user based, content similarity
based.
• Restaurant tags from reviews.
• Sentiment analysis from reviews.
Thank You
@sangha_deb
sangha123@gmail.com
Appendix
K-Nearest Neighbors
K-Nearest Neighbor is a classification
algorithm that leverages observations
close to the target point to decide which
class it belongs to
There are two parts of the algorithm: first, how to measure
“close”; second, how many close observations (K) we
need.
https://github.com/spotify/annoy
KNN - algo
Derivative of the sigmoid & cost function
Gradient Boosted Trees
Loss Functions
• Cross-Entropy
• Hinge
• Huber
• Kullback-Leibler
• MAE (L1)
• MSE (L2)
Optimization Techniques
• Gradient Descent
oStochastic
o Mini Batch
• Adagrad
• RMSprop
• Adam
• Cross-Entropy
• Others

Más contenido relacionado

La actualidad más candente

Course 2 Machine Learning Data LifeCycle in Production - Week 1
Course 2   Machine Learning Data LifeCycle in Production - Week 1Course 2   Machine Learning Data LifeCycle in Production - Week 1
Course 2 Machine Learning Data LifeCycle in Production - Week 1Ajay Taneja
 
Machine Learning Data Life Cycle in Production (Week 2 feature engineering...
 Machine Learning Data Life Cycle in Production (Week 2   feature engineering... Machine Learning Data Life Cycle in Production (Week 2   feature engineering...
Machine Learning Data Life Cycle in Production (Week 2 feature engineering...Ajay Taneja
 
Lecture 5 machine learning updated
Lecture 5   machine learning updatedLecture 5   machine learning updated
Lecture 5 machine learning updatedVajira Thambawita
 
LearningKit.ppt
LearningKit.pptLearningKit.ppt
LearningKit.pptbutest
 
H transformer-1d paper review!!
H transformer-1d paper review!!H transformer-1d paper review!!
H transformer-1d paper review!!taeseon ryu
 
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre..."An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre...Edge AI and Vision Alliance
 
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningShubhmay Potdar
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningLior Rokach
 
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Madhav Mishra
 
Week 3 data journey and data storage
Week 3   data journey and data storageWeek 3   data journey and data storage
Week 3 data journey and data storageAjay Taneja
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..butest
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningShahar Cohen
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionKishor Datta Gupta
 
Deep Learning Interview Questions and Answers | Edureka
Deep Learning Interview Questions and Answers | EdurekaDeep Learning Interview Questions and Answers | Edureka
Deep Learning Interview Questions and Answers | EdurekaEdureka!
 
Advance deep learning
Advance deep learningAdvance deep learning
Advance deep learningaliaKhan71
 
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...Madhav Mishra
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 

La actualidad más candente (20)

Course 2 Machine Learning Data LifeCycle in Production - Week 1
Course 2   Machine Learning Data LifeCycle in Production - Week 1Course 2   Machine Learning Data LifeCycle in Production - Week 1
Course 2 Machine Learning Data LifeCycle in Production - Week 1
 
C3 w4
C3 w4C3 w4
C3 w4
 
C3 w3
C3 w3C3 w3
C3 w3
 
Machine Learning Data Life Cycle in Production (Week 2 feature engineering...
 Machine Learning Data Life Cycle in Production (Week 2   feature engineering... Machine Learning Data Life Cycle in Production (Week 2   feature engineering...
Machine Learning Data Life Cycle in Production (Week 2 feature engineering...
 
Lecture 5 machine learning updated
Lecture 5   machine learning updatedLecture 5   machine learning updated
Lecture 5 machine learning updated
 
LearningKit.ppt
LearningKit.pptLearningKit.ppt
LearningKit.ppt
 
H transformer-1d paper review!!
H transformer-1d paper review!!H transformer-1d paper review!!
H transformer-1d paper review!!
 
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre..."An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
 
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter Tuning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
 
Week 3 data journey and data storage
Week 3   data journey and data storageWeek 3   data journey and data storage
Week 3 data journey and data storage
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
 
Deep Learning Interview Questions and Answers | Edureka
Deep Learning Interview Questions and Answers | EdurekaDeep Learning Interview Questions and Answers | Edureka
Deep Learning Interview Questions and Answers | Edureka
 
Advance deep learning
Advance deep learningAdvance deep learning
Advance deep learning
 
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
 
Word embedding
Word embedding Word embedding
Word embedding
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 

Similar a Intro to ML - Types, Models, Metrics & Examples

Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial IndustrySubrat Panda, PhD
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptopRising Media, Inc.
 
Pricing like a data scientist
Pricing like a data scientistPricing like a data scientist
Pricing like a data scientistMatthew Evans
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive StatisticsCIToolkit
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!Khalid Salama
 
Models of Operational research, Advantages & disadvantages of Operational res...
Models of Operational research, Advantages & disadvantages of Operational res...Models of Operational research, Advantages & disadvantages of Operational res...
Models of Operational research, Advantages & disadvantages of Operational res...Sunny Mervyne Baa
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needGibDevs
 
Data Analytics Using R - Report
Data Analytics Using R - ReportData Analytics Using R - Report
Data Analytics Using R - ReportAkanksha Gohil
 
MACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptxMACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptxNAGARAJANS68
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideSalford Systems
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - BengaluruKunal Jain
 
Goal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and RefinementGoal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and RefinementEmil Lupu
 
Machine Learning - Algorithms and simple business cases
Machine Learning - Algorithms and simple business casesMachine Learning - Algorithms and simple business cases
Machine Learning - Algorithms and simple business casesClaudio Mirti
 
Monetization of Algorithms
Monetization of AlgorithmsMonetization of Algorithms
Monetization of AlgorithmsAfik Gal, MD,MBA
 

Similar a Intro to ML - Types, Models, Metrics & Examples (20)

Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Analytics
AnalyticsAnalytics
Analytics
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
Operations Research
Operations ResearchOperations Research
Operations Research
 
Pricing like a data scientist
Pricing like a data scientistPricing like a data scientist
Pricing like a data scientist
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 
Models of Operational research, Advantages & disadvantages of Operational res...
Models of Operational research, Advantages & disadvantages of Operational res...Models of Operational research, Advantages & disadvantages of Operational res...
Models of Operational research, Advantages & disadvantages of Operational res...
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
Data Analytics Using R - Report
Data Analytics Using R - ReportData Analytics Using R - Report
Data Analytics Using R - Report
 
MACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptxMACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptx
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User Guide
 
Modeling and analysis
Modeling and analysisModeling and analysis
Modeling and analysis
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - Bengaluru
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
Goal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and RefinementGoal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
 
Machine Learning - Algorithms and simple business cases
Machine Learning - Algorithms and simple business casesMachine Learning - Algorithms and simple business cases
Machine Learning - Algorithms and simple business cases
 
Mini datathon
Mini datathonMini datathon
Mini datathon
 
Monetization of Algorithms
Monetization of AlgorithmsMonetization of Algorithms
Monetization of Algorithms
 

Más de Sanghamitra Deb

Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningSanghamitra Deb
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureSanghamitra Deb
 
Intro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic ModelingIntro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic ModelingSanghamitra Deb
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for BeginnersSanghamitra Deb
 
NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsSanghamitra Deb
 
Democratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsDemocratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsSanghamitra Deb
 
Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Sanghamitra Deb
 
Extracting knowledgebase from text
Extracting knowledgebase from textExtracting knowledgebase from text
Extracting knowledgebase from textSanghamitra Deb
 
Extracting medical attributes and finding relations
Extracting medical attributes and finding relationsExtracting medical attributes and finding relations
Extracting medical attributes and finding relationsSanghamitra Deb
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data ScienceSanghamitra Deb
 
Understanding Product Attributes from Reviews
Understanding Product Attributes from ReviewsUnderstanding Product Attributes from Reviews
Understanding Product Attributes from ReviewsSanghamitra Deb
 

Más de Sanghamitra Deb (13)

odsc_2023.pdf
odsc_2023.pdfodsc_2023.pdf
odsc_2023.pdf
 
Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learning
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and Future
 
Intro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic ModelingIntro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic Modeling
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-experts
 
Democratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsDemocratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUs
 
Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.
 
Data day2017
Data day2017Data day2017
Data day2017
 
Extracting knowledgebase from text
Extracting knowledgebase from textExtracting knowledgebase from text
Extracting knowledgebase from text
 
Extracting medical attributes and finding relations
Extracting medical attributes and finding relationsExtracting medical attributes and finding relations
Extracting medical attributes and finding relations
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 
Understanding Product Attributes from Reviews
Understanding Product Attributes from ReviewsUnderstanding Product Attributes from Reviews
Understanding Product Attributes from Reviews
 

Último

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Intro to ML - Types, Models, Metrics & Examples

  • 2. Outline • Introduction • Types of Learning • Machine Learning Models • Definition • Regularization • Metrics • Feature Engineering • Real World Examples
  • 3. What is Machine Learning? • “Learning is any process by which a system improves performance from experience.” - Herbert Simon • Machine learning is training computers to effectively achieve a performance criterion using examples or historical data.
  • 4. Why? Machine Learning is used when  Human expertise is unavailable (space expeditions).  Human expertise is not explicable (speech translation).  Information need to be personalized (education, medicine).  Domains with huge amount of data.
  • 5. Applications • Education --- Developing Learning Paths for Students • Healthcare --- Personalized Medicine • Retail --- Product recommendations • Web --- Search • Manufacturing --- robotics, control • Finance – fraud detection, asset management • HR --- people analytics • Medical --- drug discovery, automated diagnosis • ………..
  • 6. Types of Learning Supervised Unsupervised  Examples or training data is available o Human annotations, user Interactions  Data contains features correlated with the desired outcome  A model is learned from the examples  Goal of the model is to predict future behavior  Direct examples are not available  Data contains features correlated, outcome may not be defined.  It is possible to create clusters correlated with the learning objective based on patterns in the data Learning Objective ---
  • 7. Types of Learning Supervised Unsupervised  Regression  Linear  Decision Trees  Classification  Logistic Regression  Naïve Bayes  SVM  Decision Trees – RF, GBDT  Clustering --- kmeans  Similarity based results  Transfer Learning Models ---
  • 8. Linear Regression Linear regression was developed in the field of statistics and is studied as a model for understanding the relationship between input and output numerical variables, but has been borrowed by machine learning. It is both a statistical algorithm and a machine learning algorithm. Linear Regression | simple linear regression | Ordinary Least Squares | multiple linear regression • The dependent variable Y has a linear relationship to the independent variable X • For each value of X, the probability distribution of Y has the same standard deviation σ. • For any given value of X, The Y values are independent, as indicated by a random pattern on the residual plot. • The Y values are roughly normally distributed (i.e., symmetric and unimodal). Example: Sales Prediction  company’s advertising spend on radio, TV, and newspapers.
  • 10. Standard Deviation -- single attribute •Standard Deviation (S) is for tree building (branching). •Coefficient of Deviation (CV) is used to decide when to stop branching. We can use Count (n) as well. •Average (Avg) is the value in the leaf nodes.
  • 11. Standard Deviation -- two attributes
  • 12. Standard Deviation Reduction The dataset is then split on the different attributes. The standard deviation for each branch is calculated. The resulting standard deviation is subtracted from the standard deviation before the split. The result is the standard deviation reduction.
  • 13. Divide the Dataset The dataset is divided based on the values of the selected attribute. This process is run recursively on the non-leaf branches, until all data is processed.
  • 14. Overcast does not need more splitting "Overcast" subset does not need any further splitting because its CV (8%) is less than the threshold (10%). The related leaf node gets the average of the "Overcast" subset.
  • 15. "Sunny" branch --- needs splitting
  • 17. Here is the final Tree
  • 18. Pros and Cons 1. Simple and easy to understand: Decision Tree looks like simple if-else statements which are very easy to understand. 2. Decision Tree can handle both continuous and categorical variables. 3. No feature scaling required: No feature scaling (standardization and normalization) required in case of Decision Tree as it uses rule based approach instead of distance calculation. 4. Handles non-linear parameters efficiently: Non linear parameters don't affect the performance of a Decision Tree 5. Decision Tree can automatically handle missing values. 6. Decision Tree is usually robust to outliers and can handle them automatically. 1. Overfitting: This is the main problem of the Decision Tree 2. High variance: Due to the overfitting, there are very high chances of high variance in the output which leads to many errors in the final estimation and shows high inaccuracy in the results. In order to achieve zero bias (overfitting), it leads to high variance. 3. Unstable: Adding a new data point can lead to re-generation of the overall tree and all nodes need to be recalculated and recreated. 4. Affected by noise: Little bit of noise can make it unstable which leads to wrong predictions.
  • 19. Applications • Trendline --- A trend line represents a trend, the long-term movement in time series data after other components have been accounted for. It tells whether a particular data set (say GDP, oil prices or stock prices) have increased or decreased over the period of time. • Epidemiology --- Early evidence relating tobacco smoking to mortality and morbidity came from observational studies employing regression analysis. In order to reduce spurious correlations when analyzing observational data, researchers usually include several variables in their regression models in addition to the variable of primary interest. • Finance --- The capital asset pricing model uses linear regression as well as the concept of beta for analyzing and quantifying the systematic risk of an investment. This comes directly from the beta coefficient of the linear regression model that relates the return on the investment to the return on all risky assets. • Economics --- Linear regression is the predominant empirical tool in economics. For example, it is used to predict consumption spending,[20] fixed investment spending, inventory investment, purchases of a country's exports,[21] spending on imports,[21] the demand to hold liquid assets,[22] labor demand,[23] and labor supply.[23] https://en.wikipedia.org/wiki/Linear_regression
  • 20. Classification http://cs229.stanford.edu/notes2020spring/cs229-notes1.pdf Values of Y, i.e the response can take discrete values o Binary Classification – response can belong to two classes (0,1) Rating – thumbs up, thumbs down. o Multi-class classification – there can be n classes  [1, …. n]  Movie/restaurant rating can range from [1,2,3,4,5] o Multi-class , multi – label classification. Example – Concept space in Probability has many concepts (classes) – [Probability, Bayes Theorem, DiscretePDF – Binomial, Poisson, Continuous PDF – normal, exponential, …] A question will typically belong to multiple classes.
  • 21. Classification Pipeline Data Cleaning Collecting Training Data Model Building Offline SME • Reduces noise • Ensures quality • Improves overall performance • Training Data Collection / Examples of classes that we are trying to model • Model performance is directly correlated with quality of training data Model Evaluation • Model selection • Architecture • Parameter Tuning Users Online 21
  • 22. Logistic Regression -- Binary Classification Logistic Function or Sigmoid function https://sebastianraschka.com/faq/docs/logistic-why-sigmoid.html -- discussion on the logit function Interpretation: Output is the probability that y=1 given x. Output > 0.5 represents y =1 and output < 0.5 represents y=0
  • 24. Multinomial Logistic / Softmax Sigmoid gets replaced by the softmax function. This applies when there are 1,…k classes.
  • 25. Regularization --- Linear /Logistic • Penalty against complexity o model does not up "peculiarities," "noise," or "imagines a pattern where there is none.” • Helps with generalizing to new data sets. • Addition of a bias to the model when suffers from high variance or overfitting L2 regularization L1 regularization
  • 26. Data --- Train, Test and Validation Train – 90% , Test –5 %, Validation –5% ---- the percentages can vary depending on the total size of the dataset. • Train --- data used to train the model • Validation --- data that the model has not seen but is used for parameter tuning, i.e the model is optimized based on performance on this set. • Test --- model has not seen this data, this data is not used in any part of the computation. Final performance metrics are reported on this data.
  • 27. Bias – Variance Tradeoff BIAS — When we model in a very simple, novice way, for example just a single linear equation prediction for an actual complex model. Of course because of this Model becomes Under fit and miss out various important insights and relations between variables. VARIANCE — On the other hand, when we become over concern for a simple given data and fit a model in a very complicated way, it results in Over fit. So each noise and outlier will be considered as valid data point and modelled accordingly.
  • 28. Decision Tree Classifications Advantage --- • Results are interpretable • Works for both numerical and categorical data • Does not require feature transformations (example --- normalization, scaling) • Robust to Multicollinearity – correlated features. Single decision are rarely used in practice Disadvantage --- • Unstable --- small changes in data can lead to large structural changes in the decision tree • Prone to overfitting • Easily becomes complex
  • 30. Ensemble Tecniques --- Random Forest Many trees are better than one! N slightly differently trained decision trees and merges them together to get more accurate and stable predictions.
  • 31. Regularization • Limit tree depth. • Pruning • Penalize selection of new features over features that have similar gain • Set stricter stopping criterion on when to split a node further (e.g. min gain, number of samples etc.) Feature Importance A feature’s importance score measures the contribution from the feature. It is based on the impurity reduction of the class due to the feature. Bias towards features with more categories It is important to compute the correlation with accuracy.
  • 32. Unsupervised Learning There is no training data …
  • 33. Clustering • Hierarchical clustering • K-means clustering • K-NN (k nearest neighbors) Clustering is a technique that finds groups (clusters) in the data that have similar patterns.
  • 35. Similarity based Recommendations • Text data – news article, study material , books, … • For every piece of content compute the distance to all every other content in the cluster --- save the top –n in a database • When a user views any content surface the top-n (typically 5) other similar items
  • 36. Applications • Recommendations based on Text Similarity • Customer Segmentation • Content Categorization • As a pre-analysis for supervised learning
  • 37. Performance Metrics Is the model good enough?
  • 38. Regression R² Error: The metric helps us to compare our current model with a constant baseline and tells us how much our model is better. The constant baseline is chosen by taking the mean of the data and drawing a line at the mean. Adjusted R²: Adjusted R² depicts the same meaning as R² but is an improvement of it. R² suffers from the problem that the scores improve on increasing terms even though the model is not improving which may misguide the researcher. Adjusted R² is always lower than R² as it adjusts for the increasing predictors and only shows improvement if there is a real improvement
  • 39. Classification True Positive --- Number of observations that model correctly predicts the positive class False Positive --- Number of observations where model incorrectly predicts the positive class. False Negatives --- Number of observations where model incorrectly predicts the negative class. True Negatives --- Number of observations where model correctly predicts the negative class https://en.wikipedia.org/wiki/Precision_and_recall
  • 40. Classification https://en.wikipedia.org/wiki/Precision_and_recall Precision : TP/(TP+FP) --- what percentage of the positive class is actually positive? Recall : TP/(TP+FN) --- what percentage of the positive class gets captured by the model? Accuracy --- (TP+TN)/(TP+FP+TN+FN) --- what percentage of predictions are correct?
  • 41. Thresholding --- Coverage In a binary classification if you choose randomly the probability of belonging to a class is 0.5 0.3 0.7 It is possible improve the percentage of correct results at the cost of coverage.
  • 42. Confusion Matrix Good for checking where your model is incorrect For multi-class classification it reflects which classes are correlated
  • 44. Mean Reciprocal Rank For implicit dataset, the relevance score is either 0 or 1, for items not bought or bought (not clicked or clicked, etc.).
  • 45. Mean Average Precision Precision@k would be the fraction of relevant items in the top k recommendations, and recall@k would be the coverage of relevant times in the top k.
  • 46. Normalized Cumulative Discounted Gain (NDCG) Gain Gain for an item is essentially the same as the relevance score Cumulative Gain Cumulative Gain is defined as the sum of gains up to a position k This measure is independent of ordering. Discounted Cumulative Gain By diving the gain by its rank, we sort of push the algorithm to place highly relevant items to the top to achieve the best DCG score. DCG score adds up with the length of recommendation list. Ideal Discounted Cumulative Gain Nirmalizing to incorporate lists of different lengths Between 0,1
  • 48. Imputation or missing values https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114#1c08 Drop rows with missing values Cons: Reduces training data. In case of multiple features values may only be missing in a subset of features. Replace with a reasonable value – median is common Cons: There are assumptions used when values are missing which may not be correct. Categorical Imputation --- replace with the most common value. Cons: Assumptions
  • 50. Binning The trade-off between performance and overfitting is the key point of the binning process.
  • 51. Log Transform • It helps to handle skewed data and after transformation, the distribution becomes more approximate to normal. • In most of the cases the magnitude order of the data changes within the range of the data. For instance, the difference between ages 15 and 20 is not equal to the ages 65 and 70. In terms of years, yes, they are identical, but for all other aspects, 5 years of difference in young ages mean a higher magnitude difference. This type of data comes from a multiplicative process and log transform normalizes the magnitude differences like that. • It also decreases the effect of the outliers, due to the normalization of magnitude differences and the model become
  • 53. Scaling -- all features have the same range The continuous features become identical in terms of the range, after a scaling process. This process is not mandatory for many algorithms, but it might be still nice to apply. However, the algorithms based on distance calculations such as k-NN or k-Means need to have scaled continuous features as model input.
  • 54. Normalization Normalization (or min-max normalization) scale all values in a fixed range between 0 and 1.
  • 56. Real World Application There is no training data …
  • 57. Use Case --- Predict will a user buy a book? Online Book Store
  • 58. Generating Features Book Features User Features Book-User Features • Tags --- genre, subject, … • Level – Beginner, Intermediate Advanced • Popularity score – 1. exponential decay on clicks 2. Create time bound scores, such number of views in the last 7 days, last 14 days • Price • Length of the book • … • Tags --- genre, subject, … • Level – Beginner, Intermediate Advanced Derived from user interactions on the site. • View Score 1. Exponential decay clicks on books 2. Time bound --- number of views in past 7/14 days • Price category • Time of day – categorical • … • Number of Views in past 14/30 days • Already bought • Number of views from the same author • …
  • 59. Response Variable & Modeling • If the site is not super active you might not have enough data on purchases • Multi-stage model • Stage 1 : response variable views – i.e will the user view/click on this book in the next 3 days? • Stage 2: response variable purchase – will the user purchase this book in the next 3 days. The probability of view will be a feature in the Stage 2 model. Modeling: We have a mix of categorical and numerical features --- Random Forest
  • 60. How to start projects in machine learning? • Kaggle competitions --- • Make sure to solve the ML problems for concept development before competing
  • 61. How to start projects in machine learning? • Self guided workshops/projects --- lets say you have data from Zomato • Restaurant recommendation -- user based, content similarity based. • Restaurant tags from reviews. • Sentiment analysis from reviews.
  • 64. K-Nearest Neighbors K-Nearest Neighbor is a classification algorithm that leverages observations close to the target point to decide which class it belongs to There are two parts of the algorithm: first, how to measure “close”; second, how many close observations (K) we need. https://github.com/spotify/annoy
  • 66. Derivative of the sigmoid & cost function
  • 68. Loss Functions • Cross-Entropy • Hinge • Huber • Kullback-Leibler • MAE (L1) • MSE (L2)
  • 69. Optimization Techniques • Gradient Descent oStochastic o Mini Batch • Adagrad • RMSprop • Adam • Cross-Entropy • Others

Notas del editor

  1. This is how a child learns to navigate the world, they will tocuh a hot surface , get hurt and not do that again.
  2. build in the field of statistics, widely used in machine learning. third point errors are random normality, independence and constant variance of errors.
  3. Because the number of data points for both branches (FALSE and TRUE) is equal or less than 3 we stop further branching and assign the average of each branch to the related leaf node.
  4. "rainy" branch has an CV (22%) which is more than the threshold (10%). This branch needs further splitting. We select "Windy" as the best best node because it has the largest SDR. 
  5. When the number of instances is more than one at a leaf node we calculate the average as the final value for the target.
  6. . In order to fit the data (even noisy data), it keeps generating new nodes and ultimately the tree becomes too complex to interpret. In this way, it loses its generalization capabilities. It performs very well on the trained data but starts making a lot of mistakes on the unseen data.
  7.  For example, in a regression model in which cigarette smoking is the independent variable of primary interest and the dependent variable is lifespan measured in years, researchers might include education and income as additional independent variables, to ensure that any observed effect of smoking on lifespan is not due to those other socio-economic factors.
  8. if y=1 and your predicts 0 you are penalized heavily. Conversely if y=0 and your model 1 the penalization is infinite.
  9. L2 has one very important advantage to L1, and that is invariance to rotation and scale. The reason for using L1 norm to find a sparse solution is due to its special shape. It has spikes that happen to be at sparse points. Using it to touch the solution surface will very likely to find a touch point on a spike tip and thus a sparse solution. L1 regularization can address the multicollinearity problem by constraining the coefficient norm and pinning some coefficient values to 0.  L1-norm has the property of producing many coefficients with zero values or very small values with few large coefficients. Computational efficiency. L1-norm does not have an analytical solution, but L2-norm does. This allows the L2-norm solutions to be calculated computationally efficiently.
  10. Simple decision tree
  11. Highest Information gain Lowest gini impurity higher Chi-Square value
  12. https://towardsdatascience.com/regression-an-explanation-of-regression-metrics-and-what-can-go-wrong-a39a9793d914#:~:text=Mean%20Squared%20Error%3A%20MSE%20or,preferred%20metrics%20for%20regression%20tasks.&text=Root%20Mean%20Squared%20Error%3A%20RMSE,value%20predicted%20by%20the%20model.
  13. In a binary classification if you choose randomly the probability of belonging to a class is 0.5
  14. Gain: By swapping the relative order of any two items, the CG would be unaffected. This is problematic when ranking order is important. For example, on Google Search results, you would obviously not like placing the most relevant web page at the bottom.
  15. Having features on a similar scale can help the gradient descent converge more quickly towards the minima. Scaling prevents higher weight being given to features with higher magnitudes.
  16. This transformation does not change the distribution of the feature and due to the decreased standard deviations, the effects of the outliers increases. Therefore, before normalization, it is recommended to handle the outliers.
  17. Standardization is another scaling technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation. can be helpful in cases where the data follows a Gaussian distribution. However, this does not have to be necessarily true. standardization does not have a bounding range. So, even if you have outliers in your data, they will not be affected by standardization. https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/