CORRELATION & REGRESSION
ANALYSIS Using SPSS
Dr Parag Shah | M.Sc., M.Phil., Ph.D. ( Statistics)
www.paragstatistics.wordpress.com
Correlation
Correlation analysis is used to study the strength of
relationship between two or more quantitative
variables. Correlation shows the degree of linear
dependence between the two variables.
Correlation doesn’t imply causation.
If variables are not related by cause and effect
relationship but show correlation then such
correlation is called Spurious or Non-sense
correlation.
Correlation
Correlation can be positive, negative or zero
depending on the change between two variables.
If the change in two variables is in the same
direction it is positive correlation.
If the change in two variables is in the opposite
direction it is negative correlation.
If the change in one variable does not affect the
change in the other variable it is zero correlation.
Correlation
Coefficient
Correlation coefficient (r) is the measure of extent
of correlation between two variables.
There are several types of correlation coefficient
but the most popular is Karl Pearson’s correlation
coefficient.
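As a rough illustration of how Karl Pearson's coefficient is computed, the sketch below uses the standard product-moment formula in plain Python. The function name `pearson_r` and the small data set are made up for this example and do not come from the deck.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not from the deck)
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # moves in the same direction as x
y_down = [10, 8, 6, 4, 2]   # moves in the opposite direction to x

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(r_up, r_down)
```

The two toy series are exactly linear in x, so they give the extreme values of r: perfect positive and perfect negative correlation.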
Testing
Correlation
Coefficient
Null Hypothesis H0: 𝜌 = 0
[There is no significant linear correlation between two variables]
Alternative Hypothesis H1: 𝜌≠ 0
[There is significant linear correlation between two variables]
Test statistic: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
The test statistic t follows Student’s t distribution with $n - 2$ degrees of freedom.
Case Study
The body temperature (in °F) of 100 adults was measured along with
their gender, age, and heart rate. Data: body_temp.xlsx.
Obtain correlation coefficient between body temperature and heart rate.
Also check its significance.
Null & Alternative
Hypothesis
Null Hypothesis H0: 𝜌 = 0
[There is no significant linear correlation between body
temperature and heart rate]
Alternative Hypothesis H1: 𝜌≠ 0
[There is significant linear correlation between body temperature
and heart rate]
Test Statistics t
and p value
The correlation coefficient (r) between heart rate and temperature is 0.448.
Here the p value is reported as 0.000 (that is, p < 0.001), which is less than 0.05, so the null hypothesis is rejected.
Thus, there is a significant linear correlation between heart rate and temperature.
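The test statistic behind this conclusion can be checked by hand. The sketch below plugs the reported r = 0.448 and n = 100 into the formula t = r√(n − 2) / √(1 − r²); the function name `correlation_t_statistic` is illustrative only, and the p value itself is not recomputed here.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Values reported in the case study: r = 0.448 for n = 100 adults
r, n = 0.448, 100
t_stat = correlation_t_statistic(r, n)
df = n - 2
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

With 98 degrees of freedom the two-sided 5% critical value is roughly 1.98, so a t of about 4.96 is consistent with the very small p value reported by SPSS.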
Regression
Regression analysis is a set of statistical processes
for estimating the relationships between
a dependent variable (often called the 'outcome' or
'response' variable) and one or more independent
variables (often called 'predictors', 'covariates',
'explanatory variables' or 'features’).
Regression
Analysis
Regression analysis helps you understand how the
dependent variable changes when one of the
independent variables varies, and allows you to
determine mathematically which of those
variables has a significant impact.
Regression analysis includes several variations,
such as linear, multiple linear, and nonlinear. The
most common models are simple linear and
multiple linear.
Types of Regression
Dependent variable | Independent variable(s) | Type of regression | Relationship between variables
One (scale) | One (scale) | Simple linear | Linear
One (scale) | Two or more (continuous / categorical) | Multiple linear | Linear
One (categorical, binary) | Two or more (continuous / categorical) | Logistic | Need not be linear
One (categorical) | Two or more (continuous / categorical) | Multinomial logistic | Need not be linear
Simple
Regression
The simple linear regression model is used to predict one
response (dependent) variable based on one predictor
(independent) variable.
The linear regression model can be stated as follows
$y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, 2, \dots, n,$
where
• $y_i$ is the value of the response variable,
• $x_i$ is the value of the predictor variable,
• $\beta_0$, $\beta_1$ are the parameters (regression coefficients),
• $e_i$ is the random error term with $E(e_i) = 0$ and $V(e_i) = \sigma^2$.
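[Figure: Graphical representation of the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ on X and Y axes, showing the observed value of Y for $X_i$, the predicted value of Y for $X_i$, the random error $\varepsilon_i$ for this $X_i$ value, the slope $\beta_1$ and the intercept $\beta_0$.]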
Assumptions of
Simple
Regression
The four important assumptions for a simple linear
regression model are:
• The regression model is Linear in the parameters.
• The errors are Independently distributed.
• The errors are Normally distributed.
• The errors have Equal variances, i.e. $V(e_i) = \sigma^2$ (Homoscedasticity).
Method
The best line of fit can be obtained by the method of
least squares. It calculates the best line of fit for the
observed data by minimizing the sum of squares of the
vertical deviations from each data point to the line,
i.e., $\sum_i (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the value predicted by the line.
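For one predictor, the least squares line has a well-known closed form: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below shows this in plain Python; the function name `fit_simple_linear` and the data are made up for illustration and are not the SPSS procedure used in the deck.

```python
def fit_simple_linear(x, y):
    """Return (b0, b1) minimising the sum of squared vertical deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept: line passes through the means
    return b0, b1

# Made-up illustrative data (not from the deck)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_linear(x, y)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
```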
Measures of Variations
Total variation is made up of two parts:
$SST = SSR + SSE$
(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)
$SST = \sum (Y_i - \bar{Y})^2$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
$SSE = \sum (Y_i - \hat{Y}_i)^2$
where:
$\bar{Y}$ = Mean value of the dependent variable
$Y_i$ = Observed value of the dependent variable
$\hat{Y}_i$ = Predicted value of Y for the given $X_i$ value
• SST = total sum of squares (Total Variation)
  • Measures the variation of the $Y_i$ values around their mean $\bar{Y}$
• SSR = regression sum of squares (Explained Variation)
  • Variation attributable to the relationship between X and Y
• SSE = error sum of squares (Unexplained Variation)
  • Variation in Y attributable to factors other than X
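[Figure: Measures of Variations. Plot of Y against X marking, at $X_i$, the observed value $Y_i$, the predicted value $\hat{Y}_i$ and the mean $\bar{Y}$, with the deviations that make up SST = $\sum (Y_i - \bar{Y})^2$, SSE = $\sum (Y_i - \hat{Y}_i)^2$ and SSR = $\sum (\hat{Y}_i - \bar{Y})^2$.]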
Coefficient of Determination
The coefficient of determination is the proportion of the total variation in the
dependent variable that is explained by variation in the independent variable.
The coefficient of determination is denoted by $R^2$:
$R^2 = \dfrac{SSR}{SST} = \dfrac{\text{Regression sum of squares}}{\text{Total sum of squares}}$
Note: $0 \le R^2 \le 1$
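To make the decomposition concrete, the sketch below fits a least squares line to a small made-up data set, computes SST, SSR and SSE from their definitions, and then forms R² as SSR / SST. All names and numbers here are illustrative assumptions, not values from the deck.

```python
# Made-up illustrative data (not from the deck)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept for a single predictor
b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
      / sum((a - mean_x) ** 2 for a in x))
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * a for a in x]   # predicted values

sst = sum((b - mean_y) ** 2 for b in y)              # total variation
ssr = sum((p - mean_y) ** 2 for p in y_hat)          # explained variation
sse = sum((b - p) ** 2 for b, p in zip(y, y_hat))    # unexplained variation

r_squared = ssr / sst
print(f"SST={sst:.3f}  SSR={ssr:.3f}  SSE={sse:.3f}  R^2={r_squared:.4f}")
```

For a least squares fit with an intercept, SSR and SSE add up to SST, which is why R² always falls between 0 and 1.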
Adjusted R Square
The Adjusted R-squared is a modified
version of R-squared that adjusts for the
number of predictors in a regression model,
so that predictors which do not improve the fit are penalized.
R-squared never decreases when you add an
independent variable to the model, even if that variable is
unrelated to the response. The Adjusted R-squared value
increases only when the new term improves the model fit
more than would be expected by chance alone, and it
decreases when the term doesn’t
improve the model fit by a sufficient amount.
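The deck does not state the formula, but the usual definition is Adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1), where n is the number of observations and p the number of predictors. The sketch below applies it to made-up numbers (not results from the deck) to show how a small gain in R² from an extra predictor can still lower the adjusted value.

```python
def adjusted_r_squared(r2, n, p):
    """Standard adjusted R-squared for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical example: 30 observations, a third predictor barely raises R^2
adj_two = adjusted_r_squared(0.800, n=30, p=2)
adj_three = adjusted_r_squared(0.802, n=30, p=3)
print(f"2 predictors: {adj_two:.4f}   3 predictors: {adj_three:.4f}")
```

Here R² rises from 0.800 to 0.802, yet the adjusted value falls, which is the behaviour described above for a term that does not improve the fit enough.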
Multiple
Regression
The multiple linear regression model is used to predict a
response (dependent) variable based on two or more
predictor (independent) variables.
The multiple linear regression model can be stated as follows
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + e_i, \quad i = 1, 2, \dots, n,$
where
• $y_i$ is the $i$th value of the response variable,
• $x_{ij}$ is the $i$th observation of the $j$th predictor variable,
• $\beta_0, \beta_1, \beta_2, \dots, \beta_p$ are the parameters (regression coefficients),
• $e_i$ is the random error term with $E(e_i) = 0$ and $V(e_i) = \sigma^2$.
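One common way to estimate the coefficients of this model is to solve the normal equations (X'X)b = X'y. The sketch below does so in plain Python with Gauss-Jordan elimination; it is a minimal illustration under that assumption, not the routine SPSS uses, and the function name and data are invented for the example.

```python
def fit_multiple_linear(X, y):
    """Least squares coefficients [b0, b1, ..., bp] via the normal equations."""
    n = len(X)
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(rows[i][a] * rows[i][b] for i in range(n)) for b in range(k)]
         + [sum(rows[i][a] * y[i] for i in range(n))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term)
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * a + 3 * b for a, b in X]

coefs = fit_multiple_linear(X, y)
print([round(c, 6) for c in coefs])
```

Because the toy response has no error term, the fit recovers the generating coefficients up to floating-point rounding.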
Case Study 1
The body temperature (in °F) of 100 adults was measured along with
their gender, age, and heart rate. The data are stored in the file body_temp.xlsx.
Build a linear regression model for body temperature using heart rate as a
predictor.
Regression
Model Summary
Multiple R = Correlation Coefficient = 0.45
R Square = Coefficient of Determination = 0.20
R Square = 0.20 shows that 20% of the variation in temperature is explained by heart rate.
ANOVA
p value ≈ 0 (below 0.001 at the displayed precision) < 0.05.
So, there is enough evidence that the fitted regression model is significant.
The regression model predicts the dependent variable, Temperature,
significantly well.
Regression Coefficients
H0: $\beta_1 = 0$ [Regression coefficient for Heart Rate is
not significant]
H1: $\beta_1 \ne 0$ [Regression coefficient for Heart Rate is
significant]
The p value of the regression coefficient of Heart Rate ≈ 0
(below 0.001 at the displayed precision) < 0.05, so H0 is rejected.
So, the regression coefficient of Heart Rate is
significant.
Regression Model:
Temperature = 92.391 + 0.081 × Heart Rate
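The fitted equation can be used directly for prediction. The sketch below wraps it in a small function; the heart rate of 75 is a hypothetical input chosen for illustration, and treating heart rate as beats per minute is an assumption, since the deck does not state the unit.

```python
def predict_temperature(heart_rate):
    """Predicted body temperature (°F) from the fitted model in the deck."""
    intercept = 92.391   # estimated beta_0
    slope = 0.081        # estimated beta_1: change in °F per unit of heart rate
    return intercept + slope * heart_rate

# Hypothetical input: a heart rate of 75 (assumed to be beats per minute)
predicted = predict_temperature(75)
print(f"predicted temperature: {predicted:.3f} °F")
```

The slope says that, on average, each additional unit of heart rate is associated with a 0.081 °F higher body temperature.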
Checking
Assumptions
• The regression model is Linear in the parameters.
• The errors are Independently distributed.
• The errors are Normally distributed.
• The errors have Equal variances, that is $V(e_i) = \sigma^2$ (Homoscedasticity).
Linearity Assumption
Assumption - Errors are Independently distributed
The value of the Durbin-Watson statistic is
1.804, which is close to 2.
So, the assumption that the errors
are independently distributed is
met.
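For reference, the Durbin-Watson statistic is conventionally defined as the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below implements that definition on made-up residuals; it does not reproduce the SPSS output quoted above.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    return numerator / denominator

# Made-up residuals in observation order (not from the deck)
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.2]
dw = durbin_watson(residuals)
print(f"Durbin-Watson = {dw:.3f}")
```

The statistic ranges from 0 to 4: values near 2 suggest little first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation.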
Normality & Homoscedasticity Assumptions
Normality Assumption
Points are very close to the
diagonal line, so the residuals of the
temperature model are approximately normally distributed.
Homoscedasticity Assumption
The plotted points do not show an obvious
pattern: there are points equally
distributed above and below zero on the
X axis, and to the left and right of zero
on the Y axis.
So the homoscedasticity assumption is met.
Case Study 2
The data were collected on a simple random sample of 20
patients with hypertension. The dataset is in arterialBp.csv.
The variables are
Y = mean arterial blood pressure (mm Hg)
X1 = age (years), X2 = weight (kg)
X3 = body surface area (sq. m)
X4 = duration of hypertension (years)
X5 = basal pulse (beats /min), X6 = measure of stress
Fit an appropriate regression equation.
Regression
Model Summary
Multiple R = Correlation Coefficient = 0.997
R Square = Coefficient of Determination = 0.995
R Square = 0.995 shows that 99.5% of the variation in blood pressure is explained by age,
weight, bsa, duration of hypertension, pulse and stress.
ANOVA
p value ≈ 0 (below 0.001 at the displayed precision) < 0.05.
So, there is enough evidence that the fitted regression model is significant.
The regression model predicts the dependent variable, blood pressure,
significantly well.
Regression Coefficients
Running the regression again after removing the insignificant variables:
hyper, pulse and stress
Model Summary
Multiple R = Correlation Coefficient = 0.997
R Square = Coefficient of Determination = 0.993
R Square = 0.993 shows that 99.3% of the variation in blood pressure is explained by age,
weight and bsa.
ANOVA
p value ≈ 0 (below 0.001 at the displayed precision) < 0.05.
So, there is enough evidence that the fitted regression model is significant.
The regression model predicts the dependent variable, blood pressure,
significantly well.
Regression Coefficients
Regression Model:
Bp = -13.401 + 0.718 * Age + 0.896 * weight + 4.553 * bsa
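As with the simple model, the fitted equation can be evaluated for a new case. The sketch below codes the three-predictor model; the patient values (age 50, weight 90 kg, body surface area 2.0 sq. m) are hypothetical inputs chosen only to show the arithmetic.

```python
def predict_bp(age, weight, bsa):
    """Predicted mean arterial blood pressure (mm Hg) from the fitted model."""
    return -13.401 + 0.718 * age + 0.896 * weight + 4.553 * bsa

# Hypothetical patient: 50 years old, 90 kg, body surface area 2.0 sq. m
predicted_bp = predict_bp(age=50, weight=90, bsa=2.0)
print(f"predicted mean arterial BP: {predicted_bp:.3f} mm Hg")
```

Each coefficient is read holding the other predictors fixed: for example, one extra kilogram of weight is associated with a 0.896 mm Hg higher predicted blood pressure at the same age and body surface area.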
Checking
Assumptions
• The regression model is Linear in the parameters.
• The errors are Independently distributed.
• The errors are Normally distributed.
• The errors have Equal variances, that is $V(e_i) = \sigma^2$ (Homoscedasticity).
• There is no Multicollinearity
(no strong correlation among the independent variables).
Linearity Assumptions
Normality & Homoscedasticity Assumptions
Normality Assumption
Points are very close to the
diagonal line, so the residuals of the Bp
model are approximately normally distributed.
Homoscedasticity Assumption
The plotted points do not show an obvious
pattern: there are points equally
distributed above and below zero on the
X axis, and to the left and right of zero
on the Y axis.
So the homoscedasticity assumption is met.
Assumption - Errors are Independently distributed
The value of the Durbin-Watson statistic is
1.537, which is close to 2.
So, the assumption that the errors
are independently distributed
is met.
Multicollinearity Assumptions
The Variance Inflation Factor (VIF) for every variable lies between 1 and 10, so there is no
serious multicollinearity, i.e. the independent variables are not strongly correlated with
one another.
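The VIF of a predictor is conventionally defined as 1 / (1 − R_j²), where R_j² is the R Square obtained by regressing that predictor on all the other predictors. The sketch below shows the relationship with hypothetical R_j² values; the deck's own VIF values come from SPSS output that is not reproduced in this transcript.

```python
def variance_inflation_factor(r2_j):
    """VIF for a predictor whose regression on the other predictors has R^2 = r2_j."""
    if not 0 <= r2_j < 1:
        raise ValueError("r2_j must be in [0, 1)")
    return 1 / (1 - r2_j)

# Hypothetical R_j^2 values, for illustration only
for r2_j in (0.0, 0.5, 0.9):
    print(f"R_j^2 = {r2_j:.1f}  ->  VIF = {variance_inflation_factor(r2_j):.1f}")
```

A VIF of 1 means the predictor is uncorrelated with the others, while the cutoff of 10 used above corresponds to an R_j² of 0.9.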
THANK YOU

Correlation & Regression Analysis using SPSS

  • 1. CORRELATION &REGRESSION ANALYSIS Using SPSS Dr Parag Shah | M.Sc., M.Phil., Ph.D. ( Statistics) www.paragstatistics.wordpress.com
  • 2. Correlation Correlation analysis is used to study the strength of relationship between two or more quantitative variables. Correlation shows the degree of linear dependence between the two variables. Correlation doesn’t imply causation. If variables are not related by cause and effect relationship but show correlation then such correlation is called Spurious or Non-sense correlation.
  • 3. Correlation Correlation can be positive, negative or zero depending on the change between two variables. If the change in two variables is in the same direction it is positive correlation. If the change in two variables is in the opposite direction it is negative correlation. If the change in one variable does not affect the change in the other variable it is zero correlation.
  • 4. Correlation Coefficient Correlation coefficient (r) is the measure of extent of correlation between two variables. There are several types of correlation coefficient but the most popular is Karl Pearson’s correlation coefficient.
  • 5. Testing Correlation Coefficient Null Hypothesis H0: 𝜌 = 0 [There is no significant linear correlation between two variables] Alternative Hypothesis H1: 𝜌≠ 0 [There is significant linear correlation between two variables] Test statistics: 𝐭 = 𝑟 𝑛−2 1−𝑟2 The test statistics t follows Student’s t distribution with 𝒏 − 𝟐 degrees of freedom.
  • 6. Case Study The body temperature (in 0 𝐹) for 100 adults were measured along with their gender, age, and heart rate. Data: body_temp.xlsx . Obtain correlation coefficient between body temperature and heart rate. Also check its significance.
  • 7. Null & Alternative Hypothesis Null Hypothesis H0: 𝜌 = 0 [There is no significant linear correlation between body temperature and heart rate] Alternative Hypothesis H1: 𝜌≠ 0 [There is significant linear correlation between body temperature and heart rate]
  • 9.
  • 10. Test Statistics t and p value Correlation coefficient (r) between two variables heart rate and temperature is 0.448. Here p value = 0.000 < 0.05, so null hypothesis is rejected. Thus, there is significant linear correlation between Heart rate and Temperature
  • 11. Regression Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features’).
  • 12. Regression Analysis Regression analysis helps you understand how the dependent variable changes when one of the independent variables varies and allows to mathematically determine which of those variables really has an impact. Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear.
  • 13. Types of Regression Dependent variable Independent variable Type of Regression Relationship between variables One (Scale ) One (Scale) Simple Linear Linear One (Scale) Two or more (Continuous / Categorical) Multiple Linear Linear One ( Categorical – binary) Two or more (Continuous / Categorical) Logistic Need not be linear One ( Categorical ) Two or more (Continuous / Categorical) Multinomial Logistic Need not be linear
  • 14. Simple Regression The simple linear regression model is used to predict one response (dependent) variable based on one predictor (independent) variable. The linear regression model can be stated as follows: 𝑦𝑖 = 𝛽0 + 𝛽1𝑥𝑖 + 𝑒𝑖, 𝑖 = 1, 2, · · ·, n, where • 𝑦𝑖 is the value of the response variable, • 𝑥𝑖 is the value of the predictor variable, • 𝛽0, 𝛽1 are the parameters (regression coefficients), • 𝑒𝑖 is the random error term with E(𝑒𝑖) = 0 and V(𝑒𝑖) = 𝜎².
  • 15. Graphical representation [Figure: plot of Y against X with the line $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, intercept = β0 and slope = β1; for a given Xi, the random error εi is the gap between the observed value of Y and the predicted value of Y.]
  • 16. Assumptions of Simple Regression The four important assumptions for a simple linear regression model are: • The regression model is linear in the parameters. • The errors are independently distributed. • The errors are normally distributed. • The errors have equal variances, i.e. V(𝑒𝑖) = 𝜎² (homoscedasticity).
  • 17. Method The line of best fit can be obtained by the method of least squares. It calculates the line of best fit for the observed data by minimizing the sum of squares of the vertical deviations from each data point to the line, i.e., $\sum_i (y_i - \hat{y}_i)^2$.
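The least-squares criterion above has a well-known closed-form solution for the simple linear model: slope b1 = Sxy / Sxx and intercept b0 = ȳ − b1·x̄. The sketch below is a minimal illustration of that result in plain Python. The function name and the small data set are made up for illustration and are not taken from the slides or from SPSS.

```python
def fit_simple_linear(x, y):
    """Return (b0, b1) minimizing sum((y_i - (b0 + b1 * x_i))^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up illustrative data (not the case-study data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_linear(x, y)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
```

For this toy data the fitted line has intercept 2.2 and slope 0.6.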
  • 18. Measures of Variations Total variation is made up of two parts: $SST = SSR + SSE$ (Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares), with $SST = \sum_i (Y_i - \bar{Y})^2$, $SSE = \sum_i (Y_i - \hat{Y}_i)^2$, $SSR = \sum_i (\hat{Y}_i - \bar{Y})^2$, where $\bar{Y}$ = mean value of the dependent variable, $Y_i$ = observed value of the dependent variable, $\hat{Y}_i$ = predicted value of Y for the given $X_i$ value. • SST = total sum of squares (Total Variation): measures the variation of the $Y_i$ values around their mean $\bar{Y}$ • SSR = regression sum of squares (Explained Variation): variation attributable to the relationship between X and Y • SSE = error sum of squares (Unexplained Variation): variation in Y attributable to factors other than X
  • 19. Measures of Variations [Figure: plot of Y against X showing, for an observation $(X_i, Y_i)$, the deviations behind $SST = \sum (Y_i - \bar{Y})^2$, $SSE = \sum (Y_i - \hat{Y}_i)^2$ and $SSR = \sum (\hat{Y}_i - \bar{Y})^2$.]
  • 20. Coefficient of Determination The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is denoted $R^2$: $R^2 = \dfrac{SSR}{SST} = \dfrac{\text{Regression sum of squares}}{\text{Total sum of squares}}$. Note: $0 \le R^2 \le 1$.
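To make the decomposition concrete, the sketch below fits a least-squares line to a small made-up data set, computes SST, SSR and SSE from their definitions, and forms R² = SSR / SST. The function name and the data are illustrative assumptions, not part of the original slides.

```python
def variation_decomposition(x, y):
    """Fit y = b0 + b1 * x by least squares and return (sst, ssr, sse)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
    return sst, ssr, sse


# Made-up illustrative data (not the case-study data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse = variation_decomposition(x, y)
r_squared = ssr / sst

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R^2 = {r_squared:.2f}")
```

With this toy data SST = 6.0 splits into SSR = 3.6 and SSE = 2.4, giving R² = 0.6.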
  • 21. Adjusted R Square The adjusted R-squared is a modified version of R-squared that adjusts for the number of predictors in a regression model. R-squared never decreases when you add an independent variable to the model, whereas the adjusted R-squared value increases only when the new term improves the model fit more than expected by chance alone. The adjusted R-squared value actually decreases when the term doesn’t improve the model fit by a sufficient amount.
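The slide does not give a formula. The usual definition is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the sample size and p the number of predictors. The sketch below applies it to illustrative numbers (chosen for this example, not SPSS output) to show how a tiny gain in R² from an extra predictor can lower the adjusted value.

```python
def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


# Illustrative values: one predictor, then a second predictor that
# raises R^2 only marginally (0.200 -> 0.201) on n = 100 observations.
adj_one = adjusted_r_squared(0.200, n=100, p=1)
adj_two = adjusted_r_squared(0.201, n=100, p=2)

print(f"adjusted R^2 with 1 predictor:  {adj_one:.4f}")
print(f"adjusted R^2 with 2 predictors: {adj_two:.4f}")
```

Here R² rises slightly with the second predictor, yet adjusted R² falls, which is the behaviour described above.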
  • 22. Multiple Regression The multiple linear regression model is used to predict a response (dependent) variable based on two or more predictor (independent) variables. The multiple linear regression model can be stated as follows: 𝑦𝑖 = 𝛽0 + 𝛽1𝑥𝑖1 + 𝛽2𝑥𝑖2 + ⋯ + 𝛽𝑝𝑥𝑖𝑝 + 𝑒𝑖, 𝑖 = 1, 2, · · ·, n, where • 𝑦𝑖 is the 𝑖th value of the response variable, • 𝑥𝑖𝑗 is the 𝑖th observation of the 𝑗th predictor variable, • 𝛽0, 𝛽1, 𝛽2, …, 𝛽𝑝 are the parameters (regression coefficients), • 𝑒𝑖 is the random error term with E(𝑒𝑖) = 0 and V(𝑒𝑖) = 𝜎².
  • 23. Case Study 1 The body temperature (in °F) of 100 adults was measured along with their gender, age, and heart rate. The data are stored in the body_temp.xlsx file. Build a linear regression model for body temperature using heart rate as a predictor.
  • 26. Model Summary Multiple R = Correlation Coefficient = 0.45 R Square = Coefficient of Determination = 0.20 R Square = 0.20 shows that 20% of the variation in temperature is explained by heart rate.
  • 27. ANOVA The p value rounds to 0, which is below 0.05. So, there is enough evidence that the fitted regression model is significant. The regression model predicts the dependent variable, temperature, significantly well.
  • 28. Regression Coefficients H0: 𝛽1 = 0 [Regression coefficient for Heart Rate is not significant] H1: 𝛽1 ≠ 0 [Regression coefficient for Heart Rate is significant] The p value of the regression coefficient of Heart Rate rounds to 0, which is below 0.05, so H0 is rejected and the regression coefficient of Heart Rate is significant. Regression Model: Temperature = 92.391 + 0.081 × Heart Rate
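Once fitted, the regression model is just a formula that can be evaluated for a new heart rate. The sketch below plugs the coefficients reported above (92.391 and 0.081) into a small function. The heart rate of 75 beats per minute is an arbitrary illustrative input, and the function name is hypothetical.

```python
# Coefficients as reported on the Regression Coefficients slide.
INTERCEPT = 92.391
SLOPE_HEART_RATE = 0.081


def predict_temperature(heart_rate: float) -> float:
    """Predicted body temperature (°F) for a given heart rate (beats/min)."""
    return INTERCEPT + SLOPE_HEART_RATE * heart_rate


# Illustrative input, not a value taken from the data set.
predicted = predict_temperature(75)
print(f"predicted temperature at 75 bpm: {predicted:.3f} °F")
```

The slope reads as follows: each additional beat per minute is associated with an increase of about 0.081 °F in predicted temperature. Predictions are only meaningful for heart rates within the range observed in the data.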
  • 29. Checking Assumptions • The regression model is linear in the parameters. • The errors are independently distributed. • The errors are normally distributed. • The errors have equal variances, that is, V(𝑒𝑖) = 𝜎² (homoscedasticity).
  • 32. Assumption - Errors are Independently distributed
  • 33. Assumption - Errors are Independently distributed The Durbin-Watson value is 1.804, which is close to 2. So, the assumption that errors are independently distributed is met.
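For reference, the Durbin-Watson statistic is computed from the ordered residuals as the sum of squared successive differences divided by the sum of squared residuals. It ranges from 0 to 4, and values near 2 suggest no first-order autocorrelation. The sketch below is a minimal plain-Python version with made-up residuals. It is not the SPSS implementation and does not reproduce the 1.804 reported above.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2 for t >= 2) / sum(e_t^2)."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    return numerator / denominator


# Made-up residuals in observation order (not the case-study residuals).
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
dw = durbin_watson(residuals)
print(f"Durbin-Watson = {dw:.3f}")
```

The order of the residuals matters: the statistic is only informative when observations have a natural sequence, such as time.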
  • 35. Normality Assumptions Points are very close to the diagonal line, so the residuals of the temperature model are approximately normally distributed.
  • 36. Homoscedastic Assumptions The plot does not show an obvious pattern: points are equally distributed above and below zero on the X axis, and to the left and right of zero on the Y axis. So the homoscedasticity assumption is met.
  • 37. Case Study 2 The data were collected on a simple random sample of 20 patients with hypertension. The dataset is in arterialBp.csv. The variables are: Y = mean arterial blood pressure (mm Hg); X1 = age (years); X2 = weight (kg); X3 = body surface area (sq. m); X4 = duration of hypertension (years); X5 = basal pulse (beats/min); X6 = measure of stress. Fit an appropriate regression equation.
  • 41. Model Summary Multiple R = Correlation Coefficient = 0.997 R Square = Coefficient of Determination = 0.995 R Square = 0.995 shows that 99.5% of the variation in blood pressure is explained by age, weight, bsa, duration of hypertension, pulse and stress.
  • 42. ANOVA The p value rounds to 0, which is below 0.05. So, there is enough evidence that the fitted regression model is significant. The regression model predicts the dependent variable, blood pressure, significantly well.
  • 43. Regression Coefficients The regression is run again after removing the non-significant variables: hyper, pulse and stress.
  • 44. Model Summary Multiple R = Correlation Coefficient = 0.997 R Square = Coefficient of Determination = 0.993 R Square = 0.993 shows that 99.3% of the variation in blood pressure is explained by age, weight and bsa.
  • 45. ANOVA The p value rounds to 0, which is below 0.05. So, there is enough evidence that the fitted regression model is significant. The regression model predicts the dependent variable, blood pressure, significantly well.
  • 46. Regression Coefficients Regression Model: Bp = -13.401 + 0.718 * Age + 0.896 * weight + 4.553 * bsa
  • 47. Checking Assumptions • The regression model is linear in the parameters. • The errors are independently distributed. • The errors are normally distributed. • The errors have equal variances, that is, V(𝑒𝑖) = 𝜎² (homoscedasticity). • There is no multicollinearity (no significant correlation between independent variables).
  • 52. Normality Assumptions Points are very close to the diagonal line, so the residuals of the Bp model are approximately normally distributed.
  • 53. Homoscedastic Assumptions The plot does not show an obvious pattern: points are equally distributed above and below zero on the X axis, and to the left and right of zero on the Y axis. So the homoscedasticity assumption is met.
  • 54. Assumption - Errors are Independently distributed
  • 55. Assumption - Errors are Independently distributed The Durbin-Watson value is 1.537, which is close to 2. So, the assumption that errors are independently distributed is met.
  • 57. Multicollinearity Assumptions The Variance Inflation Factor (VIF) for every variable lies between 1 and 10, so there is no multicollinearity, i.e. the independent variables do not have significant correlation between them.
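The VIF of predictor j equals 1 / (1 − R_j²), where R_j² comes from regressing that predictor on the others. Equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix. The sketch below uses that second form in plain Python. The helper names and the two made-up predictor columns are assumptions for illustration and are not the arterialBp.csv data.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Made-up predictor columns whose correlation is 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vifs = variance_inflation_factors([x1, x2])
print([round(v, 3) for v in vifs])
```

With two predictors correlated at 0.8, both VIFs equal 1 / (1 − 0.64), about 2.78, which falls inside the 1 to 10 range used as the rule of thumb above.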