2. Definition
Regression analysis is a technique for studying the dependence of one variable (called the
dependent variable) on one or more other variables (called explanatory variables), with a view to
estimating or predicting the average value of the dependent variable in terms of the known or
fixed values of the explanatory (independent) variables.
THE REGRESSION TECHNIQUE IS PRIMARILY USED TO:
• Estimate the relationship that exists, on average, between the dependent variable and the
explanatory variables
• Determine the effect of each explanatory variable on the dependent variable,
controlling for the effects of all other explanatory variables
• Predict the value of the dependent variable for a given value of the explanatory variables
3. Assumptions of the Linear Regression Model
1. Linear functional form
2. Fixed independent variables
3. Independent observations
4. Representative sample and proper specification of the model (no omitted variables)
5. Normality of the residuals or errors
6. Equality of variance of the errors (homogeneity of residual variance)
7. No multicollinearity
8. No autocorrelation of the errors
9. No outlier distortion
4. Introduction to Regression Analysis
• Regression analysis is among the most frequently applied techniques of statistical
analysis and modeling.
• If two variables are involved, the variable that is the basis of the
estimation is conventionally called the independent variable, and the
variable whose value is to be estimated is called the dependent
variable.
• In general, it is used to model a response variable (Y) as a function of
one or more driver variables (X1, X2, ..., Xp).
• The functional form used is:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$$
• The dependent variable is variously known as the explained
variable, predictand, response, or endogenous variable.
• The independent variable is variously known as the explanatory variable,
regressor, or exogenous variable.
5. Derivation of the Intercept
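$$y_i = a + b x_i + e_i$$
$$e_i = y_i - a - b x_i$$
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a - b \sum_{i=1}^{n} x_i$$
Because by definition $\sum_{i=1}^{n} e_i = 0$:
$$0 = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a - b \sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} a = \sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i$$
$$n a = \sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i$$
$$a = \bar{y} - b \bar{x}$$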
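The result of this derivation can be checked numerically. The sketch below uses hypothetical toy data and an illustrative helper name (`intercept_for_slope`, not from the slides). It shows that, whatever slope b is tried, setting a = ȳ − b x̄ makes the residuals sum to zero up to floating-point rounding.

```python
def intercept_for_slope(x, y, b):
    """Intercept a = mean(y) - b * mean(x), as obtained in the derivation."""
    n = len(x)
    return sum(y) / n - b * sum(x) / n


# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

residual_sums = []
for b in (0.0, 0.6, 1.5):  # arbitrary candidate slopes
    a = intercept_for_slope(x, y, b)
    # e_i = y_i - a - b * x_i, summed over all observations
    residual_sum = sum(yi - a - b * xi for xi, yi in zip(x, y))
    residual_sums.append(residual_sum)
    print(f"b={b:.1f}  a={a:.2f}  sum of residuals={residual_sum:.2e}")
```

Note that the condition only fixes the intercept for a given slope. Choosing the slope itself requires the least squares criterion.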
6. The term ε in the model is referred to as a “random
error term” and may reflect a number of things
including the general idea that knowledge of the
driver variables will not ordinarily lead to perfect
reconstruction of the response.
7. • If there is only one driver variable, X, then we usually
speak of “simple” linear regression analysis.
• When the model involves (a) multiple driver variables,
(b) a driver variable in multiple forms, or
(c) a mixture of these, then we speak of “multiple linear
regression analysis”.
• The “linear” portion of the terminology refers to the
response variable being expressed as a “linear
combination” of the driver variables.
8. In regression analysis, the data used to describe the relationship between the
variables are primarily measured on an interval scale. The chief advantage of
using the interval level of measurement is that, with such data, it is possible
to describe the relationship between variables more exactly by means of a
mathematical equation. This in turn allows more accurate prediction of one
variable from knowledge of the other variables, which is one of the most
important objectives of regression analysis.
9. It is important to note that if the relationship between X and
Y is curvilinear, the regression line will be a curve
rather than a straight line. The stronger the
relationship between X and Y, the better the prediction.
10. The problem is presented to the mathematician as
follows: "The values of a and b in the linear model
Y'i = a + b Xi are to be found which minimize the
algebraic expression $\sum_{i=1}^{n} (Y_i - Y'_i)^2$."
The mathematician begins as follows:
14. THE REGRESSION MODEL
The situation using the regression model is analogous to that of the interviewers,
except instead of using interviewers, predictions are made by performing a linear
transformation of the predictor variable. Rather than interviewers in the above
example, the predicted value would be obtained by a linear transformation of the
score. The prediction takes the form
$$Y'_i = a + b X_i$$
where a and b are parameters in the regression model.
15. Types of regression analysis:
Regression analysis is generally classified into two kinds: simple and multiple.
Simple regression involves only two variables, one of which is the dependent variable and
the other the explanatory (independent) variable. The associated model in the case of simple
regression is a simple regression model. Multiple regression involves one dependent
variable and two or more explanatory variables.
• A regression analysis may involve a linear model or a nonlinear model.
The term linear can be interpreted in two different ways:
1. Linear in the variables
2. Linear in the parameters
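The two senses of "linear" can be illustrated with a small sketch, assuming hypothetical toy data. The model y = β0 + β1 x² is not linear in the variable x, but it is linear in the parameters, so ordinary simple regression still applies once z = x² is used as the regressor. The helper name `simple_ols` is illustrative only.

```python
def simple_ols(x, y):
    """Least squares intercept and slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data generated from y = 3 + 2 * x^2 (no noise).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0 + 2.0 * xi ** 2 for xi in x]

# Transform the variable; the parameters still enter linearly.
z = [xi ** 2 for xi in x]
b0, b1 = simple_ols(z, y)
print(f"intercept={b0:.3f}, slope on x^2={b1:.3f}")
```

A model such as y = β0 · exp(β1 x), by contrast, is nonlinear in the parameter β1 and cannot be fitted this way without a further transformation.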
16. Regression Analysis: Model Assumptions
Model assumptions are stated in terms of the random
errors, ε, as follows:
• the errors are normally distributed,
• with mean zero, and
• constant variance $\sigma_\varepsilon^2$ that does not depend on the settings of
the driver variables, and
• the errors are independent of one another.
This is often summarized symbolically as: ε is NID(0, $\sigma_\varepsilon^2$)
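As a rough illustration of what NID(0, σ²) errors mean in practice, the sketch below simulates data from a simple linear model with independent normal errors and then fits a line. The true parameter values, sample size, and seed are all assumptions made for the example, not values from the slides.

```python
import random


def simple_ols(x, y):
    """Least squares intercept and slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


random.seed(0)  # fixed seed so the run is repeatable
beta0, beta1, sigma = 1.0, 2.0, 1.0  # hypothetical true values
n = 200
x = [i / 20 for i in range(n)]
# Each error is drawn independently from a normal distribution
# with mean 0 and the same standard deviation sigma.
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

a, b = simple_ols(x, y)
residuals = [yi - a - b * xi for xi, yi in zip(x, y)]
mean_resid = sum(residuals) / n
s2 = sum(r * r for r in residuals) / (n - 2)  # estimate of sigma^2
print(f"slope={b:.3f}  mean residual={mean_resid:.2e}  s^2={s2:.3f}")
```

When the assumptions hold, the fitted slope should be near the true slope and s² near σ². Plotting the residuals against x is a common informal check for non-constant variance.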
17. LINEAR REGRESSION
In linear regression, the model specification is that the dependent variable, yi, is a
linear combination of the parameters (but need not be linear in the independent
variables). For example, in simple linear regression for modeling n data points
there is one independent variable, xi, and two parameters, β0 and β1:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n$$
Fig: Illustration of linear regression on a data set
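18. In the case of simple regression, the formulas for the least squares
estimates are
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$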
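A short worked example may help here. The sketch below computes the simple regression least squares estimates (slope as the ratio of the cross-product sum to the sum of squares of x, intercept as ȳ minus slope times x̄) on hypothetical toy data. The function names are illustrative.

```python
def least_squares_estimates(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted value of the dependent variable at x_new."""
    return intercept + slope * x_new


# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = least_squares_estimates(x, y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"prediction at x=6: {predict(intercept, slope, 6.0):.2f}")
```

For this data the means are x̄ = 3 and ȳ = 4, the cross-product sum is 6 and the sum of squares of x is 10, so the slope is 0.6 and the intercept is 2.2.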
24. Regression Analysis: Model Building
General Linear Model
Determining When to Add or Delete Variables
Analysis of a Larger Problem
Multiple Regression Approach to Analysis of Variance
25. General Linear Model
Models in which the parameters (β0, β1, . . . , βp)
all have exponents of one are called linear
models.
First-Order Model with One Predictor Variable:
$$y = \beta_0 + \beta_1 x_1 + \varepsilon$$
26. Variable Selection Procedures
• Stepwise Regression
• Forward Selection
• Backward Elimination
These procedures are iterative: one independent variable at a time is added or
deleted, based on the F statistic.
27. Variable Selection Procedures
F Test
To test whether the addition of x2 to a
model involving x1 (or the deletion of x2
from a model involving x1 and x2) is
statistically significant:
$$F = \frac{(\mathrm{SSE(reduced)} - \mathrm{SSE(full)}) / \text{number of extra terms}}{\mathrm{MSE(full)}}$$
$$F_0 = \frac{MSR}{MS_{Res}}, \qquad MSR = \frac{SSR}{K}$$
The p-value corresponding to the F statistic
is the criterion used to determine if a variable
should be added or deleted.
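The partial F statistic for adding x2 to a model with x1 can be sketched as follows. Everything here is an assumption for illustration: the toy data, the helper name `sse`, and the choice to solve the normal equations by Gauss-Jordan elimination in plain Python. The sketch stops at the F value; converting it to a p-value would need an F-distribution routine, which is not included.

```python
def sse(columns, y):
    """Error sum of squares from regressing y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [u - f * v for u, v in zip(A[r], A[p])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical toy data: y depends on both x1 and x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
noise = [0.0, 0.0, 0.0, 0.1, -0.2, 0.1, 0.0, 0.0]  # small made-up errors
y = [1.0 + 2.0 * a + 3.0 * b + e for a, b, e in zip(x1, x2, noise)]

n = len(y)
sse_reduced = sse([x1], y)     # model with x1 only
sse_full = sse([x1, x2], y)    # model with x1 and x2
extra_terms = 1
mse_full = sse_full / (n - 3)  # n minus the 3 parameters of the full model
f_stat = ((sse_reduced - sse_full) / extra_terms) / mse_full
print(f"SSE(reduced)={sse_reduced:.3f}  SSE(full)={sse_full:.3f}  F={f_stat:.1f}")
```

A large F indicates that x2 explains a substantial part of the variation left unexplained by x1 alone.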
28. Forward Selection
This procedure is similar to stepwise regression, but does not permit a variable to
be deleted.
The forward-selection procedure starts
with no independent variables.
It adds variables one at a time as long as a
significant reduction in the error sum of
squares (SSE) can be achieved.
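A minimal sketch of forward selection follows, under stated assumptions. It uses a fixed F-to-enter threshold (4.0, an arbitrary illustrative value) in place of a p-value, hypothetical toy data in which y depends on x1 and x2 but not x3, and illustrative helper names (`sse`, `forward_selection`). It also assumes no candidate model fits the data exactly, since that would make the F denominator zero.

```python
def sse(columns, y):
    """Error sum of squares from regressing y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [u - f * v for u, v in zip(A[r], A[p])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_selection(candidates, y, f_to_enter=4.0):
    """Return variable names in order of entry; variables are never removed."""
    n = len(y)
    selected = []
    current_sse = sse([], y)  # start with the intercept-only model
    while len(selected) < len(candidates):
        best = None
        for name in candidates:
            if name in selected:
                continue
            cols = [candidates[s] for s in selected] + [candidates[name]]
            trial_sse = sse(cols, y)
            df = n - len(cols) - 1  # residual degrees of freedom
            f_stat = (current_sse - trial_sse) / (trial_sse / df)
            if best is None or f_stat > best[0]:
                best = (f_stat, name, trial_sse)
        if best[0] < f_to_enter:  # no significant reduction in SSE
            break
        selected.append(best[1])
        current_sse = best[2]
    return selected


# Hypothetical toy data; the noise is chosen orthogonal to all predictors
# so that the outcome of the toy run is clear-cut.
candidates = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
    "x3": [2.0, 7.0, 1.0, 8.0, 5.0, 2.0, 1.0, 8.0],
}
noise = [0.0, 0.0, 0.0, 0.1, -0.2, 0.1, 0.0, 0.0]
y = [1.0 + 2.0 * a + 3.0 * b + e
     for a, b, e in zip(candidates["x1"], candidates["x2"], noise)]

selected = forward_selection(candidates, y)
print(selected)
```

On this toy data x2 enters first because it gives the largest reduction in SSE on its own, x1 enters next, and x3 is never added.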
29. Backward Elimination
This procedure begins with a model that
includes all the independent variables the
modeler wants considered.
It then attempts to delete one variable at a
time by determining whether the least
significant variable currently in the model can
be removed because its p-value is greater than
the user-specified or default value.
Once a variable has been removed from the
model, it cannot re-enter at a subsequent step.
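Backward elimination can be sketched in the same spirit. The code below is an illustration under assumptions: it uses a fixed F-to-remove threshold (4.0, arbitrary) rather than a p-value cutoff, so a variable is dropped when its partial F is below the threshold, which corresponds to a p-value above the chosen level. The data and helper names are hypothetical, and the sketch assumes the current model never fits the data exactly.

```python
def sse(columns, y):
    """Error sum of squares from regressing y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [u - f * v for u, v in zip(A[r], A[p])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def backward_elimination(candidates, y, f_to_remove=4.0):
    """Start from all variables and drop the least significant one at a time."""
    n = len(y)
    kept = list(candidates)
    while kept:
        full_sse = sse([candidates[k] for k in kept], y)
        df = n - len(kept) - 1  # residual degrees of freedom of current model
        weakest = None
        for name in kept:
            reduced = [candidates[k] for k in kept if k != name]
            # Partial F for this variable given the others in the model
            f_stat = (sse(reduced, y) - full_sse) / (full_sse / df)
            if weakest is None or f_stat < weakest[0]:
                weakest = (f_stat, name)
        if weakest[0] >= f_to_remove:  # every remaining variable is significant
            break
        kept.remove(weakest[1])  # removed variables never re-enter
    return kept


# Hypothetical toy data; y depends on x1 and x2 but not on x3.
candidates = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
    "x3": [2.0, 7.0, 1.0, 8.0, 5.0, 2.0, 1.0, 8.0],
}
noise = [0.0, 0.0, 0.0, 0.1, -0.2, 0.1, 0.0, 0.0]
y = [1.0 + 2.0 * a + 3.0 * b + e
     for a, b, e in zip(candidates["x1"], candidates["x2"], noise)]

kept = backward_elimination(candidates, y)
print(kept)
```

On this toy data x3 is removed in the first pass and the procedure then stops with x1 and x2 in the model.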
30. Stepwise regression:
A procedure combining forward selection and backward
elimination is also available.
In a stepwise regression, predictor variables are
entered into the regression equation one at a time
based upon statistical criteria.
At each step in the analysis, the predictor variable that
contributes the most to the prediction equation, in
terms of increasing the multiple correlation, R, is
entered first. This process continues only if
additional variables add a statistically significant
contribution to the regression equation.
31. Uses of Regression Analysis
1. Regression analysis helps in establishing a functional
relationship between two or more variables.
2. Since most of the problems of economic analysis are based on
cause and effect relationships, regression analysis is a
highly valuable tool in economic and business research.
3. Regression analysis predicts the values of dependent variables
from the values of independent variables.
4. We can calculate the coefficient of correlation (r) and the coefficient
of determination (R2) with the help of regression.