3. Correlation
• Variables
– Independent variable (x)
– Dependent variable (y)
• Two variables may move together (directly
proportional) or in opposite directions (inversely proportional)
• Let us look at an example:
• Cut your coat (y) according to your cloth (x)
6. Correlation
• Scatter plot: A scatter plot is a useful
summary of a set of bivariate data (two
variables), usually drawn before working out a
linear correlation coefficient or fitting a
regression line. It gives a good visual picture of
the relationship between the two variables,
and aids the interpretation of the correlation
coefficient or regression model.
7. Correlation
• OK... how do we measure correlation?
– Correlation coefficient
• Correlation coefficient: The correlation coefficient is a
quantitative measure of the direction and strength of
the linear relationship between two numerically
measured variables.
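As a sketch, r can be computed directly from its definition and checked against NumPy's built-in estimate; the study-hours data below are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam score (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 70.0, 72.0])

# r = sum((x - x̄)(y - ȳ)) / sqrt(sum((x - x̄)²) * sum((y - ȳ)²))
dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

# Agrees with NumPy's built-in estimate.
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```

Here r comes out positive and close to +1: a strong uphill linear relationship in the terminology of the next slides.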
9. Correlation
• Assumptions:
– Both variables are measured on an interval or
ratio scale
– The variables follow a normal distribution
– The relationship between the variables is linear
– The sample size is adequate to assume
normality
10. Correlation
Interpretation: The value of r is always
between +1 and –1.
• Exactly –1. A perfect downhill (negative) linear
relationship
• –0.70. A strong downhill (negative) linear
relationship
• –0.50. A moderate downhill (negative)
relationship
• –0.30. A weak downhill (negative) linear
relationship
11. Correlation
• 0. No linear relationship
• +0.30. A weak uphill (positive) linear relationship
• +0.50. A moderate uphill (positive) relationship
• +0.70. A strong uphill (positive) linear relationship
• Exactly +1. A perfect uphill (positive) linear
relationship
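The verbal scale on these two slides can be wrapped in a small helper function; this is only a sketch, with the function name invented here and the cut-offs simply mirroring the list above:

```python
def strength_label(r):
    """Map a correlation coefficient to the verbal scale on the slides."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    direction = "downhill (negative)" if r < 0 else "uphill (positive)"
    a = abs(r)
    if a == 1.0:
        return f"perfect {direction} linear relationship"
    if a >= 0.7:
        return f"strong {direction} linear relationship"
    if a >= 0.5:
        return f"moderate {direction} relationship"
    if a >= 0.3:
        return f"weak {direction} linear relationship"
    return "no (or negligible) linear relationship"
```

For example, `strength_label(-0.75)` gives "strong downhill (negative) linear relationship".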
12. Correlation
• Misinterpretations:
– Correlation does not demonstrate a causal relationship
between two variables
– r = 0 does not mean that X and Y are unrelated,
only that they are not linearly related.
– Two variables can have a strong association but a
small correlation coefficient r if the relationship is
not linear.
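The last point is easy to demonstrate numerically: a perfect quadratic dependence can still yield r = 0 (illustrative sketch):

```python
import numpy as np

# y = x² is a perfect (deterministic) association, yet its *linear*
# correlation with x is zero when x is symmetric about 0.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
assert abs(r) < 1e-10  # strong association, but no linear relationship
```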
13. Correlation
• Properties:
– Coefficient of Correlation lies between -1 and +1:
The coefficient of correlation cannot take a value less than
-1 or greater than +1. Symbolically,
-1 ≤ r ≤ +1, or |r| ≤ 1
– Coefficients of Correlation are independent of Change of
Origin:
This property reveals that if we subtract any constant
from all the values of X and Y, it will not affect the
coefficient of correlation.
– Coefficients of Correlation possess the property of
symmetry:
The degree of relationship between the two variables is
symmetric, i.e. r_xy = r_yx.
14. Correlation
– Coefficient of Correlation is independent of
Change of Scale:
This property reveals that if we divide or
multiply all the values of X and Y by any positive
constant, it will not affect the coefficient of correlation.
– Coefficient of correlation measures only the linear
correlation between X and Y.
– If two variables X and Y are independent, the
coefficient of correlation between them will be
zero (though the converse need not hold).
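These invariance properties can be verified numerically; the data here are illustrative, and note that scaling by a positive constant is assumed (a negative factor would flip the sign of r):

```python
import numpy as np

# Illustrative data.
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

base = r(x, y)
assert np.isclose(base, r(x - 10, y + 3))  # change of origin
assert np.isclose(base, r(x / 2, y * 7))   # change of scale (positive factors)
assert np.isclose(base, r(y, x))           # symmetry: r_xy = r_yx
```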
24. Further study
• Proof of properties
• Distribution of the correlation coefficient
• Test of correlation coefficient
– Zero correlation test
– Non-zero correlation test
– Paired correlation test
26. Learning Objectives
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Predict Response Variable
27. What is a Model?
1. Representation of
Some Phenomenon
Non-Math/Stats Model
28. What is a Math/Stats Model?
1. Often Describes the Relationship between Variables
2. Types
- Deterministic Models (no randomness)
- Probabilistic Models (with randomness)
30. Example-1
Given days with
temperatures of 10°, 13°,
20°, and 25° C, what will
the sales scenario be?
That is a job for a fortune
teller, but we are not.
We have statistics.
31. Example-1
1. We need a straight line
2. There are infinitely many
straight lines
3. The line that best
represents the phenomenon
is the model
4. We have to build the model:
𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝜖𝑖
𝑦𝑖: Dependent variable
𝑥𝑖: Independent variable
𝜖𝑖: Random error
𝛼: Intercept
𝛽: Regression coefficient
Minimize the error and find
𝛼 and 𝛽 to fit the model
(model fitting)
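As a sketch of this fitting step, NumPy's `polyfit` with degree 1 minimizes the sum of squared errors for a straight line; the sales figures below are hypothetical, paired with the temperatures from the example:

```python
import numpy as np

# Temperatures from the example (°C) with hypothetical sales figures.
temp  = np.array([10.0, 13.0, 20.0, 25.0])
sales = np.array([12.0, 15.0, 28.0, 40.0])

# np.polyfit(deg=1) minimizes the sum of squared errors and returns
# the slope (beta) and intercept (alpha) of y = alpha + beta*x.
beta, alpha = np.polyfit(temp, sales, 1)

predicted = alpha + beta * temp
sse = ((sales - predicted) ** 2).sum()  # the minimized squared error
```

Any other line through the same points gives a larger sum of squared errors; the minimizing line is the fitted model.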
32. Types of Regression Models
• Simple: 1 dependent, 1 explanatory variable
• Multiple: 1 dependent, 2+ explanatory variables
• Multivariate: 2+ dependent variables; no restriction on
the explanatory variables
• Each type may be Linear or Non-linear
33. Simple linear regression model
• 𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝜖𝑖
– 𝑦𝑖: Dependent variable (known)
– 𝑥𝑖: Independent variable (known)
– 𝜖𝑖: Random error
– 𝛽0: Intercept (unknown)
– 𝛽1: Regression coefficient (unknown)
• Minimize the error and find out 𝛽0 and 𝛽1 for
modeling (model fitting)
34. Simple linear regression line
• 𝑦̂𝑖 = 𝛽̂0 + 𝛽̂1𝑥𝑖 (regression line / prediction equation)
– 𝑦̂𝑖: Fitted values
– 𝑥𝑖: Independent variable (known)
– 𝛽̂0: Estimated intercept (general mean)
– 𝛽̂1: Estimated regression coefficient (rate of
change / slope)
• How do we find 𝛽̂0 and 𝛽̂1?
36. Coefficient Equations
• Prediction equation:
𝑦̂𝑖 = 𝛽̂0 + 𝛽̂1𝑥𝑖
• Sample slope:
𝛽̂1 = SS𝑥𝑦 / SS𝑥𝑥 = Σ(𝑥𝑖 − 𝑥̄)(𝑦𝑖 − 𝑦̄) / Σ(𝑥𝑖 − 𝑥̄)²
• Sample Y-intercept:
𝛽̂0 = 𝑦̄ − 𝛽̂1𝑥̄
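These formulas can be checked directly; the data here are illustrative, and the result is compared against NumPy's own least-squares fit:

```python
import numpy as np

# Illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# SS_xy = Σ(x_i − x̄)(y_i − ȳ),   SS_xx = Σ(x_i − x̄)²
ss_xy = ((x - x.mean()) * (y - y.mean())).sum()
ss_xx = ((x - x.mean()) ** 2).sum()

beta1 = ss_xy / ss_xx                 # sample slope
beta0 = y.mean() - beta1 * x.mean()   # sample Y-intercept

# Matches NumPy's own least-squares fit.
b1, b0 = np.polyfit(x, y, 1)
assert np.isclose(beta1, b1) and np.isclose(beta0, b0)
```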
37. How do we find 𝛽̂0 and 𝛽̂1?
• OLS (Ordinary Least Squares)
• WLS (Weighted Least Squares)
• GLS (Generalized Least Squares)
38. Least Squares
• 1. 'Best fit' means the differences between the actual Y
values and the predicted Y values are at a minimum. But
positive differences offset negative ones, so we square the
errors:
Σᵢ (𝑦𝑖 − 𝑦̂𝑖)² = Σᵢ 𝜖𝑖²,  i = 1, …, n
• 2. Least squares minimizes the sum of the squared
differences (errors), the SSE
39. Assumptions
1. The regression model is linear in parameters.
2. The regression model is correctly specified.
3. The X's are fixed over repeated samples.
4. Errors are normally distributed with mean zero
and fixed variance, i.e. 𝜖𝑖 ~ 𝑁(0, 𝜎²).
5. No perfect multicollinearity.
6. No autocorrelation of residuals
40. Derivation of Parameters
• Least Squares (LS): Minimize the squared error
𝑆(𝛽0, 𝛽1) = Σᵢ 𝜖𝑖² = Σᵢ (𝑦𝑖 − 𝛽0 − 𝛽1𝑥𝑖)²
Setting 𝜕𝑆/𝜕𝛽0 = 0:
−2 Σᵢ (𝑦𝑖 − 𝛽0 − 𝛽1𝑥𝑖) = 0
⇒ Σᵢ 𝑦𝑖 = 𝑛𝛽0 + 𝛽1 Σᵢ 𝑥𝑖  (i.e. 𝑛𝑦̄ = 𝑛𝛽0 + 𝑛𝛽1𝑥̄)
⇒ 𝛽̂0 = 𝑦̄ − 𝛽̂1𝑥̄
41. Derivation of Parameters
• Least Squares (LS): Minimize the squared error
Setting 𝜕𝑆/𝜕𝛽1 = 0:
−2 Σᵢ 𝑥𝑖(𝑦𝑖 − 𝛽0 − 𝛽1𝑥𝑖) = 0
⇒ Σᵢ 𝑥𝑖𝑦𝑖 = 𝛽0 Σᵢ 𝑥𝑖 + 𝛽1 Σᵢ 𝑥𝑖²
Substituting 𝛽0 = 𝑦̄ − 𝛽1𝑥̄ and simplifying:
𝛽̂1 = Σᵢ (𝑥𝑖 − 𝑥̄)(𝑦𝑖 − 𝑦̄) / Σᵢ (𝑥𝑖 − 𝑥̄)² = SS𝑥𝑦 / SS𝑥𝑥
42. Derivation of Parameters
• Prediction equation:
𝑦̂𝑖 = 𝛽̂0 + 𝛽̂1𝑥𝑖
• Sample slope:
𝛽̂1 = SS𝑥𝑦 / SS𝑥𝑥 = Σ(𝑥𝑖 − 𝑥̄)(𝑦𝑖 − 𝑦̄) / Σ(𝑥𝑖 − 𝑥̄)²
• Sample Y-intercept:
𝛽̂0 = 𝑦̄ − 𝛽̂1𝑥̄
• 𝛽̂0 and 𝛽̂1 are called OLSEs (ordinary least squares
estimators)
43. Interpretation of Coefficients
• 1. Slope (𝛽̂1)
– The estimated Y changes by 𝛽̂1 for each 1-unit increase
in X
• If 𝛽̂1 = 2, then Y is expected to increase by 2 for each
1-unit increase in X
• 2. Y-intercept (𝛽̂0)
– The average value of Y when X = 0
• If 𝛽̂0 = 4, then the average Y is expected to be 4
when X is 0
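Using the slide's illustrative values (slope 2, intercept 4), this interpretation can be checked mechanically:

```python
# The slide's illustrative coefficients: intercept 4, slope 2.
beta0, beta1 = 4.0, 2.0

def predict(x):
    return beta0 + beta1 * x

# Each 1-unit increase in x raises the predicted y by the slope (2)...
assert predict(1) - predict(0) == beta1
# ...and at x = 0 the prediction equals the intercept (4).
assert predict(0) == beta0
```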
44. Example -1
• Consider the data obtained from a chemical process where the yield of the
process is thought to be related to the reaction temperature (see the table
below).
48. Example -1
• Once the fitted regression line is known, the fitted value
𝑦̂𝑖 corresponding to any observed data point can be
calculated from the prediction equation, e.g. the fitted
value corresponding to the 21st observation in the
preceding table.
49. Properties of regression coefficients
1. The correlation coefficient is the geometric mean of the two
regression coefficients. Symbolically,
𝑟 = (𝛽𝑥𝑦 𝛽𝑦𝑥)^(1/2)
2. The arithmetic mean of the two regression coefficients is equal
to or greater than the coefficient of correlation:
(𝛽𝑥𝑦 + 𝛽𝑦𝑥)/2 ≥ 𝑟
3. The value of the coefficient of correlation cannot exceed
unity. Therefore, if one of the regression coefficients is
greater than unity, the other must be less than unity.
4. The regression coefficients are independent of a change
of origin, but not of scale.
50. Definition
• Regression analysis is a technique for
studying the dependence of one variable
(called the dependent variable) on one or more
other variables (called explanatory variables), with a
view to estimating or predicting the average
value of the dependent variable in terms of
the known or fixed values of the explanatory
variables.
51. Applications
• The regression technique is primarily used to
– Estimate the relationship that exists, on average,
between the dependent variable and the explanatory
variables.
– Determine the effect of each explanatory
variable on the dependent variable, controlling for the
effects of all other explanatory variables.
– Predict the value of the dependent variable for a
given value of the explanatory variable.