Introduction to
Machine Learning
Girish Gore
Introducing the Speaker
• Girish Gore: 10+ years of experience in Data Analytics / Data Science
• B.E. Computer Science from VIT Pune, M.S. from BITS Pilani
• Spent time on data products, mainly in companies like
• Cognizant (Innovations Group)
• SAS (Pricing & Revenue Management)
• VuClip (Video Entertainment)
• Shoptimize (E-Commerce)
• Worked in fields like
• Text Mining
• Forecasting and Optimization
• Recommender Systems
Knowing the Audience
Average Experience in Industry ?
Average ML Experience ?
Understanding Terminologies
Artificial Intelligence
AI involves machines that can perform tasks that are characteristic of human
intelligence.
Machine Learning
Machine learning is an application of artificial intelligence (AI) that provides
systems the ability to automatically learn and improve from experience without
being explicitly programmed.
Deep Learning
Deep Learning is one of many approaches to machine learning. It uses multi-layered
artificial neural networks, loosely inspired by the workings of the brain.
The Hierarchy
Traditional Programming vs Machine Learning
• If programming automates processes,
machine learning automates program
generation, i.e. it automates automation.
• Data and the desired output are run on the computer to
create a program. This program can then be used
in traditional programming.
What is Machine Learning ?
• Machine Learning is
• study of algorithms that
• improve their performance at a particular task
• with experience (previous data, output)
• Optimize a performance criterion using example data or past experience
• Role of Computer Science : Efficient Algorithms
• Solve the optimization problem
• Represent and Evaluate the model for inference
Why Are We Here Now? (Google Trends)
• Exponential increase in data generation and accumulation
• Increasing computational power
• Growing progress in available algorithms and Research
• Software becoming too complex to write by hand
Common Applications of Machine Learning
• Web search: ranking pages based on what you are most likely to click on.
• Finance: deciding who to send which credit card offers to, evaluating risk on credit
offers, and deciding where to invest money.
• E-commerce: predicting customer churn; deciding whether or not a transaction is fraudulent.
• Robotics: handling uncertainty in new environments, e.g. autonomous, self-driving cars.
• Information extraction: asking questions over databases across the web.
• Social networks: data on relationships and preferences; machine learning extracts value
from this data.
• Debugging: use in computer science, especially in labor-intensive processes like
debugging; a model could suggest where the bug might be.
• Gaming, IBM Watson
Types Of Machine Learning
• Learning Associations
• Supervised Learning
• Regression
• Classification
• Unsupervised Learning
• Reinforcement Learning
• Semi-supervised Learning
• Training data includes a few desired outputs; sits between supervised and unsupervised learning
Learning Associations
• Market Basket analysis:
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y
are products/services.
Example: P(diaper | beer) = 0.7

TransactionID   BasketItems
1               Bread, Milk
2               Bread, Diaper, Beer, Eggs
3               Milk, Diaper, Beer, Coke
4               Bread, Milk, Diaper, Beer
5               Bread, Milk, Diaper, Coke
Learning Associations
• Support: the probability of a customer buying diaper and beer together
among all sales transactions (the higher the support, the better)
• Confidence: given that a customer picks up diaper, how likely is he/she
to also buy beer? (the closer to 1, the better)
• Lift: compares our rule with a naive model in which the two purchases are independent,
i.e. how much more likely a customer is to buy both together than would be expected
if the items were bought independently (Lift > 1 indicates a positive association)
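The three measures can be worked out by hand on the five-transaction basket table. The sketch below is illustrative only: it evaluates the rule "Diaper, then Beer" using base R, and the helper name `contains` is an assumption introduced here, not something from the slides.

```r
# Transactions copied from the five-row basket table.
baskets <- list(
  c("Bread", "Milk"),
  c("Bread", "Diaper", "Beer", "Eggs"),
  c("Milk", "Diaper", "Beer", "Coke"),
  c("Bread", "Milk", "Diaper", "Beer"),
  c("Bread", "Milk", "Diaper", "Coke")
)

# Logical vector: does each basket contain the given item?
# (contains is a hypothetical helper written for this example)
contains <- function(item) vapply(baskets, function(b) item %in% b, logical(1))

has_diaper <- contains("Diaper")
has_beer   <- contains("Beer")

support    <- mean(has_diaper & has_beer)   # P(Diaper and Beer)
confidence <- support / mean(has_diaper)    # P(Beer | Diaper)
lift       <- confidence / mean(has_beer)   # P(Beer | Diaper) / P(Beer)

print(c(support = support, confidence = confidence, lift = lift))
```

On these five baskets the rule has support 3/5, confidence 3/4 and lift 1.25, so diaper buyers are somewhat more likely than the average customer to buy beer.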
Supervised Learning
• Supervised Learning is a Machine Learning task of inferring a generalized function
from labelled training data. Training data includes desired outputs.
Example: Spam Detection, Credit Scoring, Face Detection
• In Supervised Learning for spam detection we have
• Email Contents with Labels marking Spam or Non Spam
• Task is to label newer emails
• Main two types of Supervised Learning Problems
• Regression
• Classification
Supervised Learning
• Regression Problems
• Maps input data to a continuous prediction variable
• Example: Predicting retail house prices (price as a continuous variable)
• Classification Problems
• Maps input data to a set of predefined classes
• Example: Benign or Malignant Tumours
Regression : House Price Prediction
• We have historical data on house size and price for the last 1 year
• Task is to predict the price of a house given its size
• Model Derivation:
Price = Slope of Line * Size + Constant
Classification : Credit Scoring
We have labelled data of low and high risk customers.
Task is differentiating between low-risk and high-risk customers from their
income and savings.
Model Derivation:
IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
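The IF/THEN rule translates almost directly into code. This is a minimal sketch: the threshold values for θ1 and θ2 and the three customers are made up for illustration, whereas in practice the thresholds would be learned from the labelled data.

```r
# Hypothetical thresholds; a real model would learn these from labelled customers.
theta1 <- 30000   # income threshold (assumed)
theta2 <- 10000   # savings threshold (assumed)

# Vectorised version of: IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk
classify_risk <- function(income, savings) {
  ifelse(income > theta1 & savings > theta2, "low-risk", "high-risk")
}

# Made-up customers, for illustration only
customers <- data.frame(
  income  = c(45000, 25000, 60000),
  savings = c(20000, 15000,  5000)
)
customers$risk <- classify_risk(customers$income, customers$savings)
print(customers)
```

Only the first customer clears both thresholds, so only that one is labelled low-risk.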
Unsupervised Learning
• Training data does not include desired output.
Task is to find hidden structure in unlabeled data
• Common Approaches to Unsupervised Learning
• Clustering or Segmentation (Customer Segmentation)
• Dimensionality Reduction (PCA (Principal Component Analysis), SVD
(Singular Value Decomposition))
• Summarization
Unsupervised Learning
• Customer Segmentation: Help marketers discover distinct groups in their customer bases,
and then use this knowledge to develop targeted marketing programs.
• The clustering algorithm
forms 3 different groups of
customers to target.
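A rough sketch of such a segmentation is given below. The slide does not name the clustering algorithm, so k-means (`kmeans` from base R's stats package) is used here as one common choice, and the customer data is synthetic and invented purely for this example.

```r
set.seed(42)  # k-means starts from random centres, so fix the seed

# Synthetic customers (made up for illustration): three loose groups by
# spend and number of store visits per month.
customers <- data.frame(
  spend  = c(rnorm(20, mean = 200,  sd = 20),
             rnorm(20, mean = 600,  sd = 20),
             rnorm(20, mean = 1000, sd = 20)),
  visits = c(rnorm(20, mean = 2,  sd = 0.5),
             rnorm(20, mean = 6,  sd = 0.5),
             rnorm(20, mean = 10, sd = 0.5))
)

# Scale so both features contribute comparably, then ask for 3 clusters.
fit <- kmeans(scale(customers), centers = 3, nstart = 10)

# Size of each segment and its average profile
print(table(fit$cluster))
print(aggregate(customers, by = list(segment = fit$cluster), FUN = mean))
```

No labels are supplied anywhere: the algorithm only sees the two features, and the segment profiles are what a marketer would then interpret and target.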
Reinforcement Learning
• Learning from interaction with the environment to achieve a goal.
Rewards from a sequence of actions.
• Every Action has either a
• Reward OR
• Observation
• Examples
• Self Driving Cars
• Recommender Systems
• Research Link (UT Austin)
https://www.cs.utexas.edu/~eladlieb/RLRG.html
ML – Data Science Relationship
Supervised Learning
Linear Regression
• In statistics, linear regression is an approach for modeling the
relationship between a scalar dependent variable y and one or more
explanatory variables (or independent variables) denoted X
• The case of one explanatory variable is called simple linear
regression
• For more than one explanatory
variable, the process is
called multiple linear regression
https://en.wikipedia.org/wiki/Linear_regression
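From School Book:
Linear Equations
Y = mX + b
m = Slope = (Change in Y) / (Change in X)
b = Y-intercept
[Figure: a straight line on X-Y axes, showing the Y-intercept b and the slope as Change in Y over Change in X]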
Linear Regression : A Common Example
Ohm’s Law:
• In physics, it is observed that the relationship between Voltage (V), Current (I)
and Resistance (R) is a linear relationship expressed as
V = I * R
I = V / R
• In a circuit with a given Resistance R,
as you increase the Voltage V,
the Current I increases proportionately
http://www.electronics-tutorials.ws/dccircuits/dcp_1.html
Sample Monthly Income-Expense Data of a Household
Monthly Income (in Rs.)   Monthly Expense (in Rs.)
5,000                     8,000
6,000                     7,000
10,000                    4,500
10,000                    2,000
12,500                    12,000
14,000                    8,000
15,000                    16,000
18,000                    20,000
19,000                    9,000
20,000                    9,000
20,000                    18,000
22,000                    25,000
23,400                    5,000
24,000                    10,500
24,000                    10,000
We have to find the relationship between Income and Expenses
of a household
[Figure: "Income Vs. Expense" scatter plot; x-axis: Monthly Income, y-axis: Monthly Expense; fitted line y = 0.3008x + 6319.1, R² = 0.4215]
Line of Best Fit
[Figure: "Income Vs. Expense" scatter plot (x-axis: Monthly Income, y-axis: Monthly Expense) with several candidate lines drawn through the points]
Which of these lines best
describes the relationship
between Household Income
and Expenses?
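[Figure: "Income Vs. Expense" scatter plot (x-axis: Monthly Income, y-axis: Monthly Expense) with the fitted line; the errors e_m and e_n are marked as vertical distances between points and the line]
The Line of Best Fit will be the one where the Sum of Square of Error (SSE) term will be minimum (OLS Technique).
Error: $e_m = y_m - \hat{y}_m$
$\hat{Y}_i = b_0 + b_1 X_i$ is the sample regression equation.
$$\mathrm{SSE} = \sum_{i=1}^{n} \hat{e}_i^2 \qquad (1)$$
$$= \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \qquad (2)$$
$$= \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \qquad (3)$$
Using calculus we get
$$b_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2}$$
$$b_0 = \frac{\sum Y_i - b_1 \sum X_i}{n}$$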
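As a numerical check, the closed-form least-squares estimates b1 = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²) and b0 = (ΣY − b1ΣX) / n can be evaluated directly and compared with R's built-in `lm()`. The sketch below uses only the 15 sample rows of the income/expense table. The resulting coefficients need not match the trend line quoted on the chart, which may have been fitted on more rows than the sample shown.

```r
# The 15 sample rows from the income/expense table (in Rs.)
income  <- c(5000, 6000, 10000, 10000, 12500, 14000, 15000, 18000,
             19000, 20000, 20000, 22000, 23400, 24000, 24000)
expense <- c(8000, 7000, 4500, 2000, 12000, 8000, 16000, 20000,
             9000, 9000, 18000, 25000, 5000, 10500, 10000)

n <- length(income)

# Closed-form OLS estimates for the line expense = b0 + b1 * income
b1 <- (n * sum(income * expense) - sum(income) * sum(expense)) /
      (n * sum(income^2) - sum(income)^2)
b0 <- (sum(expense) - b1 * sum(income)) / n

# Cross-check against R's built-in least-squares fit
fit <- lm(expense ~ income)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Both print statements should show the same intercept and slope, since `lm()` solves the same minimisation problem.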
Line of Best Fit
Least Squares
• ‘Best Fit’ means the difference between actual Y values and predicted Y values is
a minimum. But positive differences offset negative ones, so we square the
errors!
• LS minimizes the Sum of the Squared Differences (errors) (SSE):
$$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \hat{e}_i^2$$
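The cancelling of positive and negative errors is easy to see numerically. The following sketch uses R's built-in `cars` data set (the same one used in the later slides). The helper `sse` is a name introduced here for illustration.

```r
fit <- lm(dist ~ speed, data = cars)   # cars ships with base R
res <- cars$dist - fitted(fit)         # actual minus predicted

print(sum(res))     # essentially zero: positive and negative errors cancel
print(sum(res^2))   # SSE: the quantity least squares minimises

# SSE for an arbitrary line dist = b0 + b1 * speed (hypothetical helper)
sse <- function(b0, b1) sum((cars$dist - (b0 + b1 * cars$speed))^2)

b <- unname(coef(fit))
print(sse(b[1], b[2]))         # same value as sum(res^2)
print(sse(b[1], b[2] + 0.5))   # nudging the slope away from the fit increases the SSE
```

The raw sum of errors says nothing about fit quality, while the squared sum grows as soon as the line moves away from the least-squares solution.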
Simple Linear Regression in R
### CODE SNIPPET ###
?cars
# Investigating the basics of the data set
str(cars)
attributes(cars)
Examining the data
### CODE SNIPPET ###
# Summary statistics for speed and distance; also shows whether there are any NAs
summary(cars)
# Is there a correlation between speed and distance to stop?
cor(cars$speed, cars$dist)
Plotting the data
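### CODE SNIPPET ###
# Scatter plot of stopping distance against speed
plot(cars, main="Relationship between Speed and Distance to Stop")
# Same scatter plot with a smoothed trend line
scatter.smooth(cars, lpars = list(col = "red", lwd = 3, lty = 3))
# Box plot to spot outliers in stopping distance
boxplot(cars$dist, main="Outliers for Distance")
# Density of speed, drawn as vertical lines
plot(density(cars$speed), main="Density Distribution of Speed",
type="h", col="blue")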
Basic Linear Model
### CODE SNIPPET ###
linear_model = lm(dist ~ speed , data=cars)
summary(linear_model)
Coefficient Analysis
• Coefficient - Estimate
• The Y intercept is -17.5791
• For every 1 mph increase in the speed of a car, the required distance to stop goes up by 3.9324 feet.
• Coefficient - Standard Error
• The coefficient Standard Error estimates how much the coefficient estimate would vary
from sample to sample. We'd ideally want a lower number relative to its
coefficient.
• Coefficient - t value
• The coefficient t-value measures how many standard errors our coefficient estimate is
away from 0. We want it to be far from zero, as this would indicate we could reject the null
hypothesis, that is, we could declare that a relationship between speed and distance exists. In general,
t-values are also used to compute p-values.
• Coefficient - Pr(>|t|)
• A small p-value for the slope indicates that we can reject the null hypothesis, which
allows us to conclude that there is a relationship between speed and distance.
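The link between estimate, standard error, t value and p-value can be reproduced from the coefficient table itself. This is a small sketch on the `cars` model. The variable names `t_manual` and `p_manual` are introduced here for illustration.

```r
fit   <- lm(dist ~ speed, data = cars)
coefs <- coef(summary(fit))   # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
print(coefs)

# t value = estimate / standard error
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit), lower.tail = FALSE)

print(cbind(t_manual, p_manual))
```

The manually computed columns should agree with the `t value` and `Pr(>|t|)` columns printed by `summary()`.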
Residual Analysis
### CODE SNIPPET ###
# Assumes linear_model was fitted on cars as above
pred_dist <- predict(linear_model, newdata=cars)
# Residual = actual distance minus predicted distance
residuals <- cars$dist - pred_dist
summary(residuals)
plot(pred_dist, residuals,
xlab="Predicted Values",
ylab="Residuals",
main="Residual Plot", col="blue")
Which residual plot suggests a good
fit? (Poll)
Residual Standard Error
• Residual Standard Error is a measure of the quality of a linear
regression fit.
• The Residual Standard Error is the average amount that the response
(dist) will deviate from the true regression line.
• In our example, the actual distance required to stop can deviate from
the true regression line by approximately 15.3795867 feet, on
average (which is about 4 times the slope of 3.93).
• The Residual Standard Error was calculated with 48 degrees of
freedom. Simplistically, degrees of freedom are the number of data
points that went into the estimation of the parameters, less the number of
parameters estimated: 50 observations minus 2 coefficients (intercept and slope).
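The Residual Standard Error and its degrees of freedom can be recomputed by hand. The sketch below assumes the same `cars` model as in the earlier slides and compares the manual value with the one stored in `summary()`.

```r
fit <- lm(dist ~ speed, data = cars)
res <- residuals(fit)

n  <- nrow(cars)           # 50 observations
p  <- length(coef(fit))    # 2 estimated parameters: intercept and slope
df <- n - p                # 48 degrees of freedom

# Square root of the SSE divided by the degrees of freedom
rse_manual <- sqrt(sum(res^2) / df)
print(c(manual = rse_manual, from_summary = summary(fit)$sigma))
```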
Coefficient of Determination
• In statistics, the coefficient of determination, denoted R² or r² and pronounced
"R squared", is a number that indicates the proportion of the variance in the
dependent variable that is predictable from the independent variable(s)
• The R² we get is 0.6511. Roughly 65% of the variance found in the response
variable (distance) can be explained by the predictor variable (speed)
• What counts as a good R² depends on the domain; Adjusted R² is used for multiple linear regression
https://en.wikipedia.org/wiki/Coefficient_of_determination
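R squared can be derived from the residuals as 1 − SSE/SST. The sketch below does this for the `cars` model and, since there is only one predictor, also compares it with the squared correlation between speed and distance.

```r
fit <- lm(dist ~ speed, data = cars)

sse <- sum(residuals(fit)^2)                   # variation left unexplained by the model
sst <- sum((cars$dist - mean(cars$dist))^2)    # total variation around the mean
r2_manual <- 1 - sse / sst

print(c(manual = r2_manual,
        from_summary = summary(fit)$r.squared,
        squared_correlation = cor(cars$speed, cars$dist)^2))
```

All three numbers should coincide. The equality with the squared correlation holds only for simple linear regression with an intercept.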
F Statistic & P-Value
• Indicator of whether there is a relationship between our predictor and the
response variables
• An F-statistic well above 1 suggests we can reject the null hypothesis that no relation between
speed and distance exists
• We can consider a linear model to be statistically significant only when both
p-values (for the overall model and for the predictor) are less than the pre-determined statistical significance level,
which is commonly 0.05
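The two p-values mentioned here can be pulled out of the model summary. In this sketch on the `cars` model, the overall p-value is computed from the F-statistic with `pf()`. With a single predictor, F equals the squared t value of the slope, so the two p-values coincide.

```r
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)

f <- s$fstatistic   # named vector: value, numdf, dendf
p_model <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Slope statistics from the coefficient table
t_slope <- coef(s)["speed", "t value"]
p_slope <- coef(s)["speed", "Pr(>|t|)"]

print(c(F = unname(f["value"]), t_squared = t_slope^2))
print(c(p_model = unname(p_model), p_slope = p_slope))
```

With more than one predictor the F-test and the individual t-tests answer different questions, which is why both are checked.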
Summary
What did we do?
• Examined the data
• Plotted the data
• Simple Linear Regression Model Creation
• Coefficient Analysis
• Residual Analysis
• R² Analysis
• F Statistic
Is the current state of the model good enough to be deployed /
used live?
Evaluation of Model: Train / Test Split
### CODE SNIPPET ###
## 80% of the sample size
sample_size <- floor(0.80 * nrow(cars))
## set the seed to make your partition reproducible
set.seed(123)
train_index <- sample(seq_len(nrow(cars)), size = sample_size)
train <- cars[ train_index, ]
test <- cars[-train_index, ]
## fit on the training rows only, then predict the held-out test rows
linear_model_subset <- lm(dist ~ speed, data=train)
distPred <- predict(linear_model_subset, newdata=test)
summary(linear_model_subset)
## predicted vs. actual stopping distance on the test set
plot(distPred, test$dist)
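RMSE: To compare between models
### CODE SNIPPET ###
# Root mean squared error of a vector of prediction errors
rmse <- function(error)
{
sqrt(mean(error^2))
}
# distPred holds the test-set predictions from the train/test split
print(rmse(test$dist - distPred))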
• RMSE : Root Mean Squared Error
• Average Distance between the observed values and the model predictions
OR
• How far are the residuals from zero
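To show how RMSE is used to compare models, the sketch below scores the linear model against a naive baseline on the same held-out rows. The baseline, which always predicts the mean training distance, is an assumption added here for illustration and is not part of the original slides.

```r
rmse <- function(error) sqrt(mean(error^2))

# Same 80/20 split as in the train/test slide
set.seed(123)
train_index <- sample(seq_len(nrow(cars)), size = floor(0.80 * nrow(cars)))
train <- cars[train_index, ]
test  <- cars[-train_index, ]

# Model 1: simple linear regression on speed
lm_fit  <- lm(dist ~ speed, data = train)
lm_rmse <- rmse(test$dist - predict(lm_fit, newdata = test))

# Model 2 (baseline): always predict the mean stopping distance of the training set
baseline_rmse <- rmse(test$dist - mean(train$dist))

print(c(linear_model = lm_rmse, baseline = baseline_rmse))
```

The model with the lower test RMSE makes predictions that are, on average, closer to the observed stopping distances.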
Food for thought!
Is the train / test split model the best
generalization we have?
(Covered in upcoming sessions)

Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

Introduction to machine learning and model building using linear regression

  • 2. Introducing the Speaker • Girish Gore: 10+ years of experience in Data Analytics / Data Science • B.E. Computer Science from VIT Pune, M.S. from BITS Pilani • Spent time on data products, mainly in companies like • Cognizant (Innovations Group) • SAS (Pricing & Revenue Management) • VuClip (Video Entertainment) • Shoptimize (E-Commerce) • Worked in fields like • Text Mining • Forecasting and Optimization • Recommender Systems
  • 3. Knowing the Audience • Average experience in industry? • Average ML experience?
  • 4. Understanding Terminologies • Artificial Intelligence: AI involves machines that can perform tasks that are characteristic of human intelligence. • Machine Learning: Machine learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. • Deep Learning: Deep learning is an attempt to mimic the workings of the brain. It is one of many approaches to machine learning.
  • 6. Traditional Programming vs Machine Learning • If programming automates processes, machine learning automates program generation, i.e. it automates the automation. • Data and the expected output are run on the computer to create a program. This program can then be used in traditional programming.
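As a small illustration of the contrast above, the sketch below writes a rule by hand (traditional programming) and then recovers the same rule from inputs and outputs with `lm` (machine learning). The rule y = 2x + 1 and the names `hand_written_rule`, `inputs`, `outputs`, and `learned_rule` are made up for this example and do not come from the slides.

```r
# Traditional programming: a person writes the rule.
hand_written_rule <- function(x) 2 * x + 1    # hypothetical rule

# Machine learning: data and outputs go in, the rule comes out.
inputs  <- 1:10
outputs <- hand_written_rule(inputs)           # stands in for observed outputs
learned <- lm(outputs ~ inputs)

# The learned "program" can now be called like any other function.
learned_rule <- function(x) {
  unname(predict(learned, newdata = data.frame(inputs = x)))
}

coef(learned)      # intercept and slope recovered from the data
learned_rule(20)   # prediction for an unseen input
```

Real data is noisy, so the recovered rule is normally an approximation rather than an exact copy.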
  • 7. What is Machine Learning? • Machine learning is • the study of algorithms that • improve their performance at a particular task • with experience (previous data, output) • Optimize a performance criterion using example data or past experience • Role of computer science: efficient algorithms to • Solve the optimization problem • Represent and evaluate the model for inference
  • 8. Why are we here now? Google Trends! • Exponential increase in data generation and accumulation • Increasing computational power • Growing progress in available algorithms and research • Software becoming too complex to write by hand
  • 9. Common Applications of Machine Learning • Web search: ranking pages based on what you are most likely to click on. • Finance: deciding who to send which credit card offers to; evaluating risk on credit offers; deciding where to invest money. • E-commerce: predicting customer churn; deciding whether or not a transaction is fraudulent. • Robotics: handling uncertainty in new environments; autonomous, self-driving cars. • Information extraction: asking questions over databases across the web. • Social networks: data on relationships and preferences; machine learning to extract value from data. • Debugging: use in computer science, especially in labor-intensive processes like debugging; could suggest where the bug might be. • Gaming, IBM Watson
  • 10. Types of Machine Learning • Learning Associations • Supervised Learning • Regression • Classification • Unsupervised Learning • Reinforcement Learning • Semi-supervised Learning • Training data includes a few desired outputs; sits between supervised and unsupervised learning
  • 11. Learning Associations • Market basket analysis: P(Y | X) is the probability that somebody who buys X also buys Y, where X and Y are products/services. Example: P(diaper | beer) = 0.7
      TransactionID   BasketItems
      1               Bread, Milk
      2               Bread, Diaper, Beer, Eggs
      3               Milk, Diaper, Beer, Coke
      4               Bread, Milk, Diaper, Beer
      5               Bread, Milk, Diaper, Coke
  • 12. Learning Associations • Support: the probability of a customer buying diaper and beer together among all sales transactions (the higher the support, the better). • Confidence: given that a customer picks up a diaper, how likely is he/she to also buy beer? (The closer to 1, the better.) • Lift: compares our model with a naive model that treats the two purchases as independent, i.e. how much more likely is a customer to buy both together than would be expected if they were bought separately? (Lift > 1 indicates a positive association.)
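The three measures can be computed directly from the five-transaction table on the previous slide. This is a minimal sketch in base R; the helper `has` and the variable names are assumptions for illustration. The 0.7 quoted on the slide reads as an illustrative value, so the figures derived from the table differ from it.

```r
# The five baskets from the Learning Associations slide.
baskets <- list(
  c("Bread", "Milk"),
  c("Bread", "Diaper", "Beer", "Eggs"),
  c("Milk", "Diaper", "Beer", "Coke"),
  c("Bread", "Milk", "Diaper", "Beer"),
  c("Bread", "Milk", "Diaper", "Coke")
)

# TRUE/FALSE per transaction: does the basket contain the item?
has <- function(item) sapply(baskets, function(b) item %in% b)

# Support: share of all transactions containing both diaper and beer.
support_both <- mean(has("Diaper") & has("Beer"))

# Confidence (diaper -> beer): P(beer | diaper).
confidence <- support_both / mean(has("Diaper"))

# Lift: confidence relative to the baseline rate of buying beer.
lift <- confidence / mean(has("Beer"))

c(support = support_both, confidence = confidence, lift = lift)
```

For this table the support is 3/5, the confidence is 3/4, and the lift is 1.25, so diaper buyers are somewhat more likely than the average customer to buy beer.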
  • 13. Supervised Learning • Supervised learning is the machine learning task of inferring a generalized function from labelled training data. Training data includes the desired outputs. Examples: spam detection, credit scoring, face detection • In supervised learning for spam detection we have • Email contents with labels marking spam or non-spam • The task is to label newer emails • The two main types of supervised learning problems • Regression • Classification
  • 14. Supervised Learning • Regression Problems • Map input data to a continuous prediction variable • Example: predicting retail house prices (price as a continuous variable) • Classification Problems • Map input data to a set of predefined classes • Example: benign or malignant tumours
  • 15. Regression: House Price Prediction • We have historic data about the size of houses and their prices for the last 1 year • The task is to predict the price of a house given its size • Model derivation: Price = Slope of Line * Size + Constant
  • 16. Classification: Credit Scoring • We have labelled data of low-risk and high-risk customers. • The task is to differentiate between low-risk and high-risk customers from their income and savings. • Model derivation: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
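The rule on the credit scoring slide translates almost word for word into R. In this sketch the function name `classify_risk` and the threshold values passed for θ1 and θ2 are hypothetical; in practice the thresholds would be learned from the labelled data rather than typed in.

```r
# IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk
classify_risk <- function(income, savings, theta1, theta2) {
  ifelse(income > theta1 & savings > theta2, "low-risk", "high-risk")
}

# Hypothetical customers and thresholds, for illustration only.
income  <- c(60000, 20000, 80000)
savings <- c(15000, 30000,  2000)
classify_risk(income, savings, theta1 = 40000, theta2 = 10000)
```

Only the first customer clears both thresholds, so only that customer is labelled low-risk.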
  • 17. Unsupervised Learning • Training data does not include the desired output. The task is to find hidden structure in unlabeled data • Common approaches to unsupervised learning • Clustering or segmentation (customer segmentation) • Dimensionality reduction (PCA (Principal Component Analysis), SVD (Singular Value Decomposition)) • Summarization
  • 18. Unsupervised Learning • Customer segmentation: helps marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs. • The clustering algorithm forms 3 different groups of customers to target.
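A segmentation like the one described can be sketched with `kmeans` from base R. The slides do not say which clustering algorithm or data were used, so k-means is a stand-in here, and the customer data (`annual_spend`, `visits`) is synthetic, generated only to make the example runnable.

```r
set.seed(42)  # make the synthetic data and the clustering reproducible

# Synthetic customers drawn from three made-up groups.
customers <- data.frame(
  annual_spend = c(rnorm(20, 200, 20), rnorm(20, 600, 30), rnorm(20, 1200, 50)),
  visits       = c(rnorm(20, 2, 0.5),  rnorm(20, 8, 1),    rnorm(20, 15, 2))
)

# Scale the features so neither dominates the distance, then ask for 3 groups.
segments <- kmeans(scale(customers), centers = 3, nstart = 10)

table(segments$cluster)                       # size of each segment
aggregate(customers, by = list(segment = segments$cluster), FUN = mean)
```

No labels are supplied anywhere: the algorithm only sees the two features and groups customers that lie close together.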
  • 19. Reinforcement Learning • Learning from interaction with the environment to achieve a goal. Rewards come from a sequence of actions. • Every action has either a • Reward OR • Observation • Examples • Self-driving cars • Recommender systems • Research link: https://www.cs.utexas.edu/~eladlieb/RLRG.html
  • 20. ML – Data Science Relationship
  • 22. Linear Regression • In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X • The case of one explanatory variable is called simple linear regression • For more than one explanatory variable, the process is called multiple linear regression https://en.wikipedia.org/wiki/Linear_regression
  • 23. From School Book: Linear Equations • Y = mX + b • m = slope = (change in Y) / (change in X) • b = Y-intercept
  • 24. Linear Regression: A Common Example • Ohm's Law: in physics, it is observed that the relationship between Voltage (V), Current (I) and Resistance (R) is a linear relationship expressed as V = I * R, or I = V / R • In a circuit, for a given resistance R, as you increase the voltage V, the current I increases proportionately http://www.electronics-tutorials.ws/dccircuits/dcp_1.html
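To connect Ohm's Law with the line equation Y = mX + b, the sketch below generates currents for a fixed resistance and fits a straight line through them. The resistance of 10 ohms and the voltage range are assumed values chosen for illustration; the fitted slope should come out as 1 / R and the intercept as zero.

```r
resistance <- 10                      # ohms, hypothetical value
voltage    <- seq(1, 12, by = 1)      # volts
current    <- voltage / resistance    # amps, from I = V / R

# Fit current as a linear function of voltage.
ohm_fit <- lm(current ~ voltage)
coef(ohm_fit)                         # intercept near 0, slope near 1 / resistance
```

Because the data follows the law exactly, the fit is perfect; measured data would scatter around the same line.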
  • 25. Sample Monthly Income-Expense Data of a Household • We have to find the relationship between the income and expenses of a household.
      Monthly Income (in Rs.)   Monthly Expense (in Rs.)
      5,000                     8,000
      6,000                     7,000
      10,000                    4,500
      10,000                    2,000
      12,500                    12,000
      14,000                    8,000
      15,000                    16,000
      18,000                    20,000
      19,000                    9,000
      20,000                    9,000
      20,000                    18,000
      22,000                    25,000
      23,400                    5,000
      24,000                    10,500
      24,000                    10,000
      [Figure: Income vs. Expense scatter plot, Monthly Income (x) against Monthly Expense (y), with trend line y = 0.3008x + 6319.1, R² = 0.4215]
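The sample rows can be loaded into R and fitted with `lm` to see how a trend line like the one in the chart is obtained. This is a sketch under one caveat: the chart's axis runs up to 60,000 while the sample stops at 24,000, so the plotted line was probably fitted on more rows than are listed, and the coefficients from these 15 rows need not match y = 0.3008x + 6319.1. The data frame name `household` is an assumption.

```r
# The 15 sample rows from the slide (amounts in Rs.).
household <- data.frame(
  income  = c(5000, 6000, 10000, 10000, 12500, 14000, 15000, 18000,
              19000, 20000, 20000, 22000, 23400, 24000, 24000),
  expense = c(8000, 7000, 4500, 2000, 12000, 8000, 16000, 20000,
              9000, 9000, 18000, 25000, 5000, 10500, 10000)
)

# Fit expense as a linear function of income.
household_fit <- lm(expense ~ income, data = household)

coef(household_fit)                 # intercept and slope of the fitted line
summary(household_fit)$r.squared    # share of variance explained

plot(household, main = "Income Vs. Expense")
abline(household_fit, col = "red")  # overlay the fitted line
```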
  • 26. Line of Best Fit • [Figure: Income vs. Expense scatter plot, Monthly Income against Monthly Expense, with several candidate lines] • Which of these lines best describes the relationship between household income and expenses?
  • 27. Line of Best Fit • [Figure: Income vs. Expense scatter plot, Monthly Income against Monthly Expense, with the fitted line and the errors $e_m$ and $e_n$ marked] • The line of best fit is the one for which the Sum of Squared Errors (SSE) is minimum (OLS technique). • Error: $e_m = y_m - \hat{y}_m$ • $\hat{Y}_i = b_0 + b_1 X_i$ is the sample regression equation
      $$\text{SSE} = \sum \hat{e}_i^2 \quad (1)$$
      $$\phantom{\text{SSE}} = \sum (Y_i - \hat{Y}_i)^2 \quad (2)$$
      $$\phantom{\text{SSE}} = \sum (Y_i - b_0 - b_1 X_i)^2 \quad (3)$$
    • Using calculus we get
      $$b_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2}, \qquad b_0 = \frac{\sum Y_i - b_1 \sum X_i}{n}$$
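The closed-form OLS estimates can be checked numerically against `lm`. The sketch below uses R's built-in `cars` data set (the same one the later slides work with), taking speed as X and stopping distance as Y; the variable names are chosen for this example.

```r
x <- cars$speed
y <- cars$dist
n <- length(x)

# Slope and intercept from the normal equations.
b1 <- (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
b0 <- (sum(y) - b1 * sum(x)) / n

# The same estimates from R's linear model function.
fit <- lm(dist ~ speed, data = cars)

c(b0 = b0, b1 = b1)
coef(fit)
all.equal(c(b0, b1), unname(coef(fit)))   # TRUE when the two agree
```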
  • 28. Least Squares • "Best fit" means the difference between actual Y values and predicted Y values is a minimum. But positive differences offset negative ones, so we square the errors! • LS minimizes the sum of the squared differences (errors) (SSE):
      $$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \hat{e}_i^2$$
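The reason for squaring can be seen in a few lines of R. In this sketch on the built-in `cars` data, the signed residuals of the fitted line cancel out to essentially zero, while the squared residuals give a usable measure that gets worse as soon as the line is moved. The helper `sse` and the shift of 1 unit in the intercept are assumptions for illustration.

```r
fit <- lm(dist ~ speed, data = cars)
res <- residuals(fit)

sum(res)      # signed errors cancel: essentially zero
sum(res^2)    # squared errors do not cancel: this is the SSE

# SSE for any candidate line with intercept b0 and slope b1.
sse <- function(b0, b1) sum((cars$dist - b0 - b1 * cars$speed)^2)

best    <- sse(coef(fit)[1], coef(fit)[2])
shifted <- sse(coef(fit)[1] + 1, coef(fit)[2])   # nudge the intercept
best < shifted                                   # the fitted line has the lower SSE
```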
  • 29. Simple Linear Regression in R
      ### CODE SNIPPET ###
      ?cars
      # Investigating the basics of the data set
      str(cars)
      attributes(cars)
  • 30. Examining the data
      ### CODE SNIPPET ###
      # How do the speed and distance value summaries look? Any NAs?
      summary(cars)
      # Is there a correlation between speed and distance to stop?
      cor(cars$speed, cars$dist)
  • 31. Plotting the data
      ### CODE SNIPPET ###
      plot(cars, main = "Relationship between Speed and Distance to Stop")
      scatter.smooth(cars, lpars = list(col = "red", lwd = 3, lty = 3))
      boxplot(cars$dist, main = "Outliers for Distance")
      plot(density(cars$speed), main = "Density Distribution of Speed", type = "h", col = "blue")
  • 32. Basic Linear Model
      ### CODE SNIPPET ###
      linear_model <- lm(dist ~ speed, data = cars)
      summary(linear_model)
  • 33. Coefficient Analysis • Coefficient - Estimate • The Y-intercept is -17.5791. • For every 1 mph increase in the speed of a car, the required distance to stop goes up by 3.9324 feet. • Coefficient - Standard Error • The coefficient standard error estimates how much the coefficient estimate would vary from sample to sample. We'd ideally want it to be small relative to the coefficient itself. • Coefficient - t value • The coefficient t-value measures how many standard errors our coefficient estimate is away from 0. We want it to be far from zero, as this would indicate that we can reject the null hypothesis, that is, we can declare that a relationship between speed and distance exists. In general, t-values are also used to compute p-values. • Coefficient - Pr(>|t|) • A small p-value for the slope indicates that we can reject the null hypothesis, which allows us to conclude that there is a relationship between speed and distance. (The intercept's p-value only tests whether the intercept differs from zero.)
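The columns of the coefficient table are linked by simple arithmetic, which can be reproduced by hand. This sketch refits the model on `cars` so it runs on its own; `t_manual` and `p_manual` are names introduced here.

```r
fit   <- lm(dist ~ speed, data = cars)
coefs <- summary(fit)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs

# t value = estimate divided by its standard error.
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p-value from the t distribution with the residual degrees of freedom.
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit), lower.tail = FALSE)

all.equal(t_manual, coefs[, "t value"])
all.equal(p_manual, coefs[, "Pr(>|t|)"])
```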
  • 34. Residual Analysis
      ### CODE SNIPPET ###
      pred_dist <- predict(linear_model, newdata = cars)
      residuals <- cars$dist - pred_dist
      summary(residuals)
      plot(pred_dist, residuals, xlab = "Predicted Values", ylab = "Residuals", main = "Residual Plot", col = "blue")
  • 35. Which residual plot suggests a good fit? : Poll
  • 36. Residual Standard Error • Residual Standard Error is a measure of the quality of a linear regression fit. • The Residual Standard Error is the average amount that the response (dist) will deviate from the true regression line. • In our example, the actual distance required to stop can deviate from the true regression line by approximately 15.3795867 feet, on average (roughly 4 times the slope estimate of 3.93). • The Residual Standard Error was calculated with 48 degrees of freedom. Simplistically, degrees of freedom are the number of data points that went into the estimation, less the number of parameters estimated (50 observations - 2 coefficients = 48).
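The Residual Standard Error and its degrees of freedom can be recomputed from the residuals. A minimal sketch on `cars`, with variable names chosen for this example:

```r
fit <- lm(dist ~ speed, data = cars)

sse     <- sum(residuals(fit)^2)              # sum of squared errors
df_res  <- nrow(cars) - length(coef(fit))     # 50 observations - 2 coefficients
rse     <- sqrt(sse / df_res)                 # residual standard error

c(df = df_res, rse = rse)
summary(fit)$sigma                            # the value reported by summary()
```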
  • 37. Coefficient of Determination • In statistics, the coefficient of determination, denoted R² or r² and pronounced "R squared", is a number that indicates the proportion of the variance in the dependent variable that is predictable from the independent variable(s) • The R² we get is 0.6511. Roughly 65% of the variance found in the response variable (distance) can be explained by the predictor variable (speed) • Whether an R² value is good depends on the domain; adjusted R² is used for multiple linear regression https://en.wikipedia.org/wiki/Coefficient_of_determination
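R² and adjusted R² follow from two sums of squares. The sketch below recomputes both for the `cars` model and compares them with what `summary` reports; `p` is the number of predictors (1 here), and the variable names are assumptions.

```r
fit <- lm(dist ~ speed, data = cars)

sse <- sum(residuals(fit)^2)                  # unexplained variation
sst <- sum((cars$dist - mean(cars$dist))^2)   # total variation around the mean

r2 <- 1 - sse / sst

n <- nrow(cars)
p <- 1                                        # one predictor: speed
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

c(r2 = r2, adj_r2 = adj_r2)
c(summary(fit)$r.squared, summary(fit)$adj.r.squared)
```

Adjusted R² penalises extra predictors, which is why it is preferred once more than one explanatory variable is in the model.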
  • 38. F-Statistic & p-Value • The F-statistic is an indicator of whether there is a relationship between our predictor and the response variable • A value well above 1 suggests we can reject the null hypothesis that no relationship between speed and distance exists • We can consider a linear model to be statistically significant only when both p-values (for the predictor's coefficient and for the overall model) are less than the pre-determined statistical significance level, commonly 0.05
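The F-statistic and its p-value are stored in the model summary and can be pulled out explicitly. This sketch on `cars` also shows a property specific to simple linear regression: with a single predictor, F equals the square of the slope's t-value. Variable names are chosen for this example.

```r
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)

f <- s$fstatistic          # named vector: value, numdf, dendf
f

# Model p-value: upper tail of the F distribution.
model_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# With one predictor, F is the slope's t-value squared.
slope_t <- s$coefficients["speed", "t value"]
all.equal(unname(f["value"]), slope_t^2)

unname(model_p) < 0.05     # significant at the 0.05 level?
```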
  • 40. What did we do? • Examined the data • Plotted the data • Simple linear regression model creation • Coefficient analysis • Residual analysis • R² analysis • F-statistic • Is the current state of the model good enough to be deployed / used live?
  • 41. Evaluation of Model: Split Train / Test
      ### CODE SNIPPET ###
      ## 80% of the sample size
      sample_size <- floor(0.80 * nrow(cars))
      ## set the seed to make your partition reproducible
      set.seed(123)
      train_index <- sample(seq_len(nrow(cars)), size = sample_size)
      train <- cars[train_index, ]
      test <- cars[-train_index, ]
      linear_model_subset <- lm(dist ~ speed, data = train)
      distPred <- predict(linear_model_subset, test)
      summary(linear_model_subset)
      plot(distPred, test$dist)
  • 42. RMSE: To compare between models
      ### CODE SNIPPET ###
      rmse <- function(error) { sqrt(mean(error^2)) }
      print(rmse(test$dist - distPred))
      • RMSE: Root Mean Squared Error • Average distance between the observed values and the model predictions, OR • How far the residuals are from zero
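Since RMSE is meant for comparing models, one simple use is to set the linear model against a naive baseline that always predicts the mean training distance. The baseline is an addition for illustration and is not part of the slides; the split repeats the 80/20 partition from the previous slide so the block runs on its own.

```r
rmse <- function(error) sqrt(mean(error^2))

# Same 80/20 split as on the train/test slide.
set.seed(123)
train_index <- sample(seq_len(nrow(cars)), size = floor(0.80 * nrow(cars)))
train <- cars[train_index, ]
test  <- cars[-train_index, ]

# Model 1: linear regression on speed.
model      <- lm(dist ~ speed, data = train)
rmse_model <- rmse(test$dist - predict(model, test))

# Model 2 (hypothetical baseline): always predict the mean training distance.
rmse_baseline <- rmse(test$dist - mean(train$dist))

c(linear_model = rmse_model, mean_baseline = rmse_baseline)
```

The model with the lower RMSE on the held-out test rows is the better predictor on this particular split; a different seed can change the numbers.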
  • 43. Food for thought! Is the train / test split model the best generalization we have? Covered in upcoming sessions.