3. Correlation
• Variables
– Independent variable (x)
– Dependent variable (y)
• Two variables may move together (directly
proportional) or in opposite directions (inversely proportional)
• Let us look at an example:
• Cut your coat (y) according to your cloth (x)
6. Correlation
• Scatter plot: A scatter plot is a useful
summary of a set of bivariate data (two
variables), usually drawn before working out a
linear correlation coefficient or fitting a
regression line. It gives a good visual picture of
the relationship between the two variables,
and aids the interpretation of the correlation
coefficient or regression model.
7. Correlation
• OK... how do we measure correlation?
– Correlation coefficient
• Correlation coefficient: The correlation coefficient is a
quantitative measure of the direction and strength of
the linear relationship between two numerically
measured variables.
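As a sketch, r can be computed directly from its definition and checked against NumPy's built-in estimate; the study-hours data below are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam score (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 70.0, 72.0])

# r = sum((x - x̄)(y - ȳ)) / sqrt(sum((x - x̄)²) * sum((y - ȳ)²))
dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

# Agrees with NumPy's built-in estimate.
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```

Here r comes out positive and close to +1: a strong uphill linear relationship in the terminology of the next slides.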
9. Correlation
• Assumptions:
– Both variables are measured on an interval or
ratio scale
– The variables follow a normal distribution
– The relationship between the variables is linear
– The sample size is adequate to assume
normality
10. Correlation
Interpretation: The value of r is always
between +1 and –1.
• Exactly –1. A perfect downhill (negative) linear
relationship
• –0.70. A strong downhill (negative) linear
relationship
• –0.50. A moderate downhill (negative)
relationship
• –0.30. A weak downhill (negative) linear
relationship
11. Correlation
• 0. No linear relationship
• +0.30. A weak uphill (positive) linear relationship
• +0.50. A moderate uphill (positive) relationship
• +0.70. A strong uphill (positive) linear relationship
• Exactly +1. A perfect uphill (positive) linear
relationship
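The verbal scale on these two slides can be wrapped in a small helper function; this is only a sketch, with the function name invented here and the cut-offs simply mirroring the list above:

```python
def strength_label(r):
    """Map a correlation coefficient to the verbal scale on the slides."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    direction = "downhill (negative)" if r < 0 else "uphill (positive)"
    a = abs(r)
    if a == 1.0:
        return f"perfect {direction} linear relationship"
    if a >= 0.7:
        return f"strong {direction} linear relationship"
    if a >= 0.5:
        return f"moderate {direction} relationship"
    if a >= 0.3:
        return f"weak {direction} linear relationship"
    return "no (or negligible) linear relationship"
```

For example, `strength_label(-0.75)` gives "strong downhill (negative) linear relationship".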
12. Correlation
• Misinterpretations:
– Correlation does not demonstrate a causal relationship
between two variables
– r = 0 does not mean that X and Y are unrelated,
only that they are not linearly related.
– Two variables can have a strong association but a
small correlation coefficient r if the relationship is
not linear.
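The last point is easy to demonstrate numerically: a perfect quadratic dependence can still yield r = 0 (illustrative sketch):

```python
import numpy as np

# y = x² is a perfect (deterministic) association, yet its *linear*
# correlation with x is zero when x is symmetric about 0.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
assert abs(r) < 1e-10  # strong association, but no linear relationship
```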
13. Correlation
• Properties:
– Coefficient of Correlation lies between -1 and +1:
The coefficient of correlation cannot take a value less than
-1 or greater than +1. Symbolically,
-1 ≤ r ≤ +1, or |r| ≤ 1
– Coefficients of Correlation are independent of Change of
Origin:
This property reveals that if we subtract any constant
from all the values of X and Y, it will not affect the
coefficient of correlation.
– Coefficients of Correlation possess the property of
symmetry:
The degree of relationship between the two variables is
symmetric, i.e. r_xy = r_yx.
14. Correlation
– Coefficient of Correlation is independent of
Change of Scale:
This property reveals that if we divide or
multiply all the values of X and Y by any positive
constant, it will not affect the coefficient of correlation.
– Coefficient of correlation measures only the linear
correlation between X and Y.
– If two variables X and Y are independent, the
coefficient of correlation between them will be
zero (though the converse need not hold).
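These invariance properties can be verified numerically; the data here are illustrative, and note that scaling by a positive constant is assumed (a negative factor would flip the sign of r):

```python
import numpy as np

# Illustrative data.
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

base = r(x, y)
assert np.isclose(base, r(x - 10, y + 3))  # change of origin
assert np.isclose(base, r(x / 2, y * 7))   # change of scale (positive factors)
assert np.isclose(base, r(y, x))           # symmetry: r_xy = r_yx
```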
24. Further study
• Proof of properties
• Distribution of the correlation coefficient
• Test of correlation coefficient
– Zero correlation test
– Non-zero correlation test
– Paired correlation test
26. Learning Objectives
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Predict Response Variable
27. What is a Model?
1. Representation of
Some Phenomenon
Non-Math/Stats Model
28. What is a Math/Stats Model?
1. Often Describes the Relationship between Variables
2. Types
- Deterministic Models (no randomness)
- Probabilistic Models (with randomness)
30. Example-1
Given days with
temperatures of 10°, 13°,
20°, and 25° C, what will
the sales scenario be?
That is a job for a fortune
teller, but we are not.
We have statistics.
31. Example-1
1. We need a straight line
2. There are infinitely many
straight lines
3. The line that best
represents the phenomenon
is the model
4. We have to build the model:
𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝜖𝑖
𝑦𝑖: Dependent variable
𝑥𝑖: Independent variable
𝜖𝑖: Random error
𝛼: Intercept
𝛽: Regression coefficient
Minimize the error and find
𝛼 and 𝛽 to fit the model
(model fitting)
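As a sketch of this fitting step, NumPy's `polyfit` with degree 1 minimizes the sum of squared errors for a straight line; the sales figures below are hypothetical, paired with the temperatures from the example:

```python
import numpy as np

# Temperatures from the example (°C) with hypothetical sales figures.
temp  = np.array([10.0, 13.0, 20.0, 25.0])
sales = np.array([12.0, 15.0, 28.0, 40.0])

# np.polyfit(deg=1) minimizes the sum of squared errors and returns
# the slope (beta) and intercept (alpha) of y = alpha + beta*x.
beta, alpha = np.polyfit(temp, sales, 1)

predicted = alpha + beta * temp
sse = ((sales - predicted) ** 2).sum()  # the minimized squared error
```

Any other line through the same points gives a larger sum of squared errors; the minimizing line is the fitted model.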
32. Types of Regression Models
• Simple: 1 dependent, 1 explanatory variable
• Multiple: 1 dependent, 2+ explanatory variables
• Multivariate: 2+ dependent variables; no restriction on
the explanatory variables
• Each type may be Linear or Non-linear
33. Simple linear regression model
• 𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝜖𝑖
– 𝑦𝑖: Dependent variable (known)
– 𝑥𝑖: Independent variable (known)
– 𝜖𝑖: Random error
– 𝛽0: Intercept (unknown)
– 𝛽1: Regression coefficient (unknown)
• Minimize the error and find out 𝛽0 and 𝛽1 for
modeling (model fitting)
34. Simple linear regression line
• 𝑦̂𝑖 = 𝛽̂0 + 𝛽̂1𝑥𝑖 (regression line / prediction equation)
– 𝑦̂𝑖: Fitted values
– 𝑥𝑖: Independent variable (known)
– 𝛽̂0: Estimated intercept (general mean)
– 𝛽̂1: Estimated regression coefficient (rate of
change / slope)
• How do we find 𝛽̂0 and 𝛽̂1?
36. Coefficient Equations
• Prediction equation:
𝑦̂𝑖 = 𝛽̂0 + 𝛽̂1𝑥𝑖
• Sample slope:
𝛽̂1 = SS𝑥𝑦 / SS𝑥𝑥 = Σ(𝑥𝑖 − 𝑥̄)(𝑦𝑖 − 𝑦̄) / Σ(𝑥𝑖 − 𝑥̄)²
• Sample Y-intercept:
𝛽̂0 = 𝑦̄ − 𝛽̂1𝑥̄
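These formulas can be checked directly; the data here are illustrative, and the result is compared against NumPy's own least-squares fit:

```python
import numpy as np

# Illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# SS_xy = Σ(x_i − x̄)(y_i − ȳ),   SS_xx = Σ(x_i − x̄)²
ss_xy = ((x - x.mean()) * (y - y.mean())).sum()
ss_xx = ((x - x.mean()) ** 2).sum()

beta1 = ss_xy / ss_xx                 # sample slope
beta0 = y.mean() - beta1 * x.mean()   # sample Y-intercept

# Matches NumPy's own least-squares fit.
b1, b0 = np.polyfit(x, y, 1)
assert np.isclose(beta1, b1) and np.isclose(beta0, b0)
```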
37. How do we find 𝛽̂0 and 𝛽̂1?
• OLS (Ordinary Least Squares)
• WLS (Weighted Least Squares)
• GLS (Generalized Least Squares)
38. Least Squares
• 1. 'Best fit' means the differences between the actual Y
values and the predicted Y values are at a minimum. But
positive differences offset negative ones, so we square the
errors:
Σᵢ (𝑦𝑖 − 𝑦̂𝑖)² = Σᵢ 𝜖𝑖²,  i = 1, …, n
• 2. Least squares minimizes the sum of the squared
differences (errors), the SSE
39. Assumptions
1. The regression model is linear in parameters.
2. The regression model is correctly specified.
3. The X's are fixed over repeated samples.
4. Errors are normally distributed with mean zero
and fixed variance, i.e. 𝜖𝑖 ~ 𝑁(0, 𝜎²).
5. No perfect multicollinearity.
6. No autocorrelation of residuals
40. Derivation of Parameters
• Least Squares (LS): Minimize the squared error
𝑆(𝛽0, 𝛽1) = Σᵢ 𝜖𝑖² = Σᵢ (𝑦𝑖 − 𝛽0 − 𝛽1𝑥𝑖)²
Setting 𝜕𝑆/𝜕𝛽0 = 0:
−2 Σᵢ (𝑦𝑖 − 𝛽0 − 𝛽1𝑥𝑖) = 0
⇒ Σᵢ 𝑦𝑖 = 𝑛𝛽0 + 𝛽1 Σᵢ 𝑥𝑖  (i.e. 𝑛𝑦̄ = 𝑛𝛽0 + 𝑛𝛽1𝑥̄)
⇒ 𝛽̂0 = 𝑦̄ − 𝛽̂1𝑥̄
41. Derivation of Parameters
• Least Squares (LS): Minimize the squared error
Setting 𝜕𝑆/𝜕𝛽1 = 0:
−2 Σᵢ 𝑥𝑖(𝑦𝑖 − 𝛽0 − 𝛽1𝑥𝑖) = 0
⇒ Σᵢ 𝑥𝑖𝑦𝑖 = 𝛽0 Σᵢ 𝑥𝑖 + 𝛽1 Σᵢ 𝑥𝑖²
Substituting 𝛽0 = 𝑦̄ − 𝛽1𝑥̄ and simplifying:
𝛽̂1 = Σᵢ (𝑥𝑖 − 𝑥̄)(𝑦𝑖 − 𝑦̄) / Σᵢ (𝑥𝑖 − 𝑥̄)² = SS𝑥𝑦 / SS𝑥𝑥
42. Derivation of Parameters
• Prediction equation:
𝑦̂𝑖 = 𝛽̂0 + 𝛽̂1𝑥𝑖
• Sample slope:
𝛽̂1 = SS𝑥𝑦 / SS𝑥𝑥 = Σ(𝑥𝑖 − 𝑥̄)(𝑦𝑖 − 𝑦̄) / Σ(𝑥𝑖 − 𝑥̄)²
• Sample Y-intercept:
𝛽̂0 = 𝑦̄ − 𝛽̂1𝑥̄
• 𝛽̂0 and 𝛽̂1 are called OLSEs (ordinary least squares
estimators)
43. Interpretation of Coefficients
• 1. Slope (𝛽̂1)
– The estimated Y changes by 𝛽̂1 for each 1-unit increase
in X
• If 𝛽̂1 = 2, then Y is expected to increase by 2 for each
1-unit increase in X
• 2. Y-intercept (𝛽̂0)
– The average value of Y when X = 0
• If 𝛽̂0 = 4, then the average Y is expected to be 4
when X is 0
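Using the slide's illustrative values (slope 2, intercept 4), this interpretation can be checked mechanically:

```python
# The slide's illustrative coefficients: intercept 4, slope 2.
beta0, beta1 = 4.0, 2.0

def predict(x):
    return beta0 + beta1 * x

# Each 1-unit increase in x raises the predicted y by the slope (2)...
assert predict(1) - predict(0) == beta1
# ...and at x = 0 the prediction equals the intercept (4).
assert predict(0) == beta0
```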
44. Example -1
• Consider the data obtained from a chemical process where the yield of the
process is thought to be related to the reaction temperature (see the table
below).
48. Example -1
• Once the fitted regression line is known, the fitted value
𝑦̂𝑖 corresponding to any observed data point can be
calculated from the prediction equation, e.g. the fitted
value corresponding to the 21st observation in the
preceding table.
49. Properties of regression coefficients
1. The correlation coefficient is the geometric mean of the two
regression coefficients. Symbolically,
𝑟 = (𝛽𝑥𝑦 𝛽𝑦𝑥)^(1/2)
2. The arithmetic mean of the two regression coefficients is equal
to or greater than the coefficient of correlation:
(𝛽𝑥𝑦 + 𝛽𝑦𝑥)/2 ≥ 𝑟
3. The value of the coefficient of correlation cannot exceed
unity. Therefore, if one of the regression coefficients is
greater than unity, the other must be less than unity.
4. The regression coefficients are independent of a change
of origin, but not of scale.
50. Definition
• Regression analysis is a technique for
studying the dependence of one variable
(called the dependent variable) on one or more
other variables (called explanatory variables), with a
view to estimating or predicting the average
value of the dependent variable in terms of
the known or fixed values of the explanatory
variables.
51. Applications
• The regression technique is primarily used to
– Estimate the relationship that exists, on average,
between the dependent variable and the explanatory
variables.
– Determine the effect of each explanatory
variable on the dependent variable, controlling for the
effects of all other explanatory variables.
– Predict the value of the dependent variable for a
given value of the explanatory variable.