Short Guides to Microeconometrics
Kurt Schmidheiny, Universität Basel
Fall 2011

The Multiple Linear Regression Model
1 Introduction

The multiple linear regression model and its estimation using ordinary least squares (OLS) is doubtless the most widely used tool in econometrics. It allows us to estimate the relation between a dependent variable and a set of explanatory variables. Prototypical examples in econometrics are:

- Wage of an employee as a function of her education and her work experience (the so-called Mincer equation).

- Price of a house as a function of its number of bedrooms and its age (an example of hedonic price regressions).

The dependent variable is an interval variable, i.e. its values represent a natural order and differences of two values are meaningful. The dependent variable can, in principle, take any real value between −∞ and +∞. In practice, this means that the variable needs to be observed with some precision and that all observed values are far from ranges which are theoretically excluded. Wages, for example, do strictly speaking not qualify, as they cannot take values beyond two digits (cents) and cannot be negative. In practice, monthly wages in dollars in a sample of full-time workers are perfectly fine with OLS, whereas wages measured in three categories (low, middle, high) for a sample that includes unemployed workers (with zero wages) call for other estimation tools.

Version: 29-11-2011, 18:29

2 The Econometric Model

The multiple linear regression model assumes a linear (in parameters) relationship between a dependent variable yi and a set of explanatory variables xi' = (xi0, xi1, ..., xiK). xik is also called an independent variable, a covariate or a regressor. The first regressor xi0 = 1 is a constant unless otherwise specified.

Consider a sample of N observations i = 1, ..., N. Every single observation i follows

    yi = xi'β + ui

where β is a (K+1)-dimensional column vector of parameters, xi' is a (K+1)-dimensional row vector and ui is a scalar called the error term. The whole sample of N observations can be expressed in matrix notation,

    y = Xβ + u

where y is an N-dimensional column vector, X is an N × (K+1) matrix and u is an N-dimensional column vector of error terms, i.e.

    [ y1 ]   [ 1  x11  ···  x1K ] [ β0 ]   [ u1 ]
    [ y2 ]   [ 1  x21  ···  x2K ] [ β1 ]   [ u2 ]
    [ y3 ] = [ 1  x31  ···  x3K ] [ ⋮  ] + [ u3 ]
    [ ⋮  ]   [ ⋮   ⋮    ⋱    ⋮  ] [ βK ]   [ ⋮  ]
    [ yN ]   [ 1  xN1  ···  xNK ]          [ uN ]

     N×1          N×(K+1)        (K+1)×1    N×1

The data generation process (dgp) is fully described by a set of assumptions. Several of the following assumptions are formulated in different alternatives. Different sets of assumptions will lead to different properties of the OLS estimator.
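The dimensions above can be illustrated numerically. The following sketch is not part of the original guide; it assumes Python with numpy is available and uses arbitrary illustrative values for N, K and β:

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 100, 2                      # N observations, K non-constant regressors
beta = np.array([-2.0, 0.5, 1.0])  # (K+1)-dimensional parameter vector

# X is N x (K+1); the first column x_i0 = 1 is the constant
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=(N, K))])
u = rng.normal(0, 1, size=N)       # error terms, here homoscedastic N(0, 1)
y = X @ beta + u                   # y = X beta + u, an N-dimensional vector

print(X.shape, y.shape)            # (100, 3) (100,)
```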
OLS1: Linearity

    yi = xi'β + ui  and  E(ui) = 0

OLS1 assumes that the functional relationship between dependent and explanatory variables is linear in parameters, that the error term enters additively and that the parameters are constant across individuals i.

OLS2: Independence

    {xi, yi}, i = 1, ..., N, i.i.d. (independent and identically distributed)

OLS2 means that the observations are independently and identically distributed. This assumption is in practice guaranteed by random sampling.

OLS3: Exogeneity

    a) ui | xi ∼ N(0, σ²)
    b) ui ⊥ xi (independent)
    c) E(ui | xi) = 0 (mean independent)
    d) cov(xi, ui) = 0 (uncorrelated)

OLS3a assumes that the error term is independent of the explanatory variables and normally distributed. OLS3b means that the error term is independent of the explanatory variables. OLS3c states that the mean of the error term is independent of the explanatory variables. OLS3d means that the error term and the explanatory variables are uncorrelated. OLS3a is the strongest assumption and implies OLS3b to OLS3d. OLS3b implies OLS3c and OLS3d. OLS3c implies OLS3d, which is the weakest assumption. Intuitively, OLS3 means that the explanatory variables contain no information about the error term.

OLS4: Identifiability

    rank(X) = K + 1 < N  and  E(xi xi') = QXX is positive definite and finite

OLS4 assumes that the regressors are not perfectly collinear, i.e. no variable is a linear combination of the others. For example, there can only be one constant. Intuitively, OLS4 means that every explanatory variable adds additional information. OLS4 also assumes that all regressors (but the constant) have non-zero variance and not too many extreme values.

OLS5: Error Structure

    a) V(ui | xi) = σ² < ∞ (homoscedasticity)
    b) V(ui | xi) = σi² = g(xi) < ∞ (conditional heteroscedasticity)

OLS5a (homoscedasticity) means that the variance of the error term is a constant. OLS5b (conditional heteroscedasticity) allows the variance of the error term to depend on the explanatory variables.

3 Estimation with OLS

Ordinary least squares (OLS) minimizes the squared distances between the observed and the predicted dependent variable y:

    S(β) = Σi (yi − xi'β)² = (y − Xβ)'(y − Xβ) → min over β

The resulting OLS estimator of β is:

    β̂ = (X'X)⁻¹ X'y

Given the OLS estimator, we can predict the dependent variable by ŷi = xi'β̂ and the error term by ûi = yi − xi'β̂. ûi is called the residual.
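As a numerical cross-check (again a Python/numpy sketch, not part of the original guide), the estimator β̂ = (X'X)⁻¹X'y can be computed directly on data simulated from the design of Figure 1 (β0 = −2, β1 = 0.5, x ∼ uniform(0, 10)) and compared with a library least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 500
x = rng.uniform(0, 10, size=N)
u = rng.normal(0, 1, size=N)                  # sigma^2 = 1
y = -2.0 + 0.5 * x + u                        # beta0 = -2, beta1 = 0.5

X = np.column_stack([np.ones(N), x])          # constant plus one regressor

# OLS estimator beta_hat = (X'X)^{-1} X'y; solving the normal equations
# is numerically preferable to explicitly inverting X'X
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat                          # predicted dependent variable
u_hat = y - y_hat                             # residuals

# cross-check against numpy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                               # close to (-2, 0.5)
```

The residuals are orthogonal to the regressors by construction (X'û = 0), which is a useful sanity check for any hand-rolled implementation.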
Figure 1: The linear regression model with one regressor (scatter plot of the data together with the conditional expectation E(y|x) and the fitted OLS line). β0 = −2, β1 = 0.5, σ² = 1, x ∼ uniform(0, 10), u ∼ N(0, σ²).

4 Goodness-of-fit

The goodness-of-fit of an OLS regression can be measured as

    R² = 1 − SSR/SST = SSE/SST

where SST = Σi (yi − ȳ)² is the total sum of squares, SSE = Σi (ŷi − ȳ)² the explained sum of squares and SSR = Σi ûi² the residual sum of squares. R² lies by definition between 0 and 1 and reports the fraction of the sample variation in y that is explained by the xs.

Note: R² is in general not meaningful in a regression without a constant. R² increases by construction with every (also irrelevant) additional regressor and is therefore not a good criterion for the selection of regressors.

5 Small Sample Properties

Assuming OLS1, OLS2, OLS3a, OLS4, and OLS5, the following properties can be established for finite, i.e. even small, samples.

- The OLS estimator of β is unbiased:

      E(β̂|X) = β

- The OLS estimator is (multivariate) normally distributed:

      β̂|X ∼ N(β, V(β̂|X))

  with variance V(β̂|X) = σ²(X'X)⁻¹ under homoscedasticity (OLS5a) and V(β̂|X) = σ²(X'X)⁻¹[X'ΩX](X'X)⁻¹ under known heteroscedasticity (OLS5b). Under homoscedasticity (OLS5a) the variance V can be unbiasedly estimated as

      V̂(β̂|X) = σ̂²(X'X)⁻¹  with  σ̂² = û'û / (N − K − 1).

- Gauß-Markov theorem: under homoscedasticity (OLS5a),

      β̂ is BLUE (best linear unbiased estimator).

Table 1 provides a systematic overview of assumptions and their implied properties.
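Continuing the numpy sketch (not part of the original guide; the simulated data are illustrative), R² can be computed both as 1 − SSR/SST and as SSE/SST, and the unbiased variance estimate σ̂²(X'X)⁻¹ yields the classical standard errors:

```python
import numpy as np

rng = np.random.default_rng(42)
N, K = 500, 1
x = rng.uniform(0, 10, size=N)
y = -2.0 + 0.5 * x + rng.normal(0, 1, size=N)
X = np.column_stack([np.ones(N), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

SST = np.sum((y - y.mean()) ** 2)               # total sum of squares
SSE = np.sum((X @ beta_hat - y.mean()) ** 2)    # explained sum of squares
SSR = np.sum(u_hat ** 2)                        # residual sum of squares

R2_a = 1 - SSR / SST
R2_b = SSE / SST                                # identical when a constant is included

sigma2_hat = SSR / (N - K - 1)                  # unbiased estimate of sigma^2
V_hat = sigma2_hat * np.linalg.inv(X.T @ X)     # classical variance estimate
se = np.sqrt(np.diag(V_hat))                    # homoscedastic standard errors
print(round(R2_a, 3), se)
```

The equality of the two R² formulas relies on the regression containing a constant, which mirrors the note above that R² is not meaningful without one.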
Table 1: Properties of the OLS estimator.

    Case                          [1]  [2]  [3]  [4]  [5]  [6]
    Assumptions
    OLS1: linearity                ✓    ✓    ✓    ✓    ✓    ✓
    OLS2: independence             ✓    ✓    ✓    ✓    ✓    ✓
    OLS3: exogeneity
      OLS3a: normality             ✓    ✓    ×    ×    ×    ×
      OLS3b: independent           ✓    ✓    ×    ×    ×    ×
      OLS3c: mean indep.           ✓    ✓    ✓    ✓    ×    ×
      OLS3d: uncorrelated          ✓    ✓    ✓    ✓    ✓    ✓
    OLS4: identifiability          ✓    ✓    ✓    ✓    ✓    ✓
    OLS5: error structure
      OLS5a: homoscedastic         ✓    ×    ✓    ×    ✓    ×
      OLS5b: heteroscedastic       ×    ✓    ×    ✓    ×    ✓
    Small sample properties of β̂
    unbiased                       ✓    ✓    ✓    ✓    ×    ×
    normally distributed           ✓    ✓    ×    ×    ×    ×
    efficient                      ✓    ×    ✓    ×    ×    ×
    t-test, F-test                 ✓    ×    ×    ×    ×    ×
    Large sample properties of β̂
    consistent                     ✓    ✓    ✓    ✓    ✓    ✓
    approx. normal                 ✓    ✓    ✓    ✓    ✓    ✓
    asymptotically efficient       ✓    ×    ✓    ×    ✓    ×
    z-test, Wald test              ✓    ∗    ✓    ∗    ✓    ∗

    Notes: ✓ = fulfilled, × = violated, ∗ = valid with corrected standard errors.

6 Tests in Small Samples

Assume OLS1, OLS2, OLS3a, OLS4, and OLS5a.

A simple null hypothesis of the form H0: βk = q is tested with the t-test. If the null hypothesis is true, the t-statistic

    t = (β̂k − q) / se(β̂k)  ∼  t(N−K−1)

follows a t-distribution with N − K − 1 degrees of freedom. The standard error se(β̂k) is the square root of the element in the (k+1)-th row and (k+1)-th column of V̂(β̂|X). For example, to perform a two-sided test of H0 against the alternative hypothesis HA: βk ≠ q at the 5% significance level, we calculate the t-statistic and compare its absolute value to the 0.975-quantile of the t-distribution. With N = 30 and K = 2, H0 is rejected if |t| > 2.052.

A null hypothesis of the form H0: Rβ = q with J linear restrictions is jointly tested with the F-test. If the null hypothesis is true, the F-statistic

    F = [ (Rβ̂ − q)' (R V̂(β̂|X) R')⁻¹ (Rβ̂ − q) ] / J  ∼  F(J, N−K−1)

follows an F-distribution with J numerator degrees of freedom and N − K − 1 denominator degrees of freedom. For example, to perform a two-sided test of H0 against the alternative hypothesis HA: Rβ ≠ q at the 5% significance level, we calculate the F-statistic and compare it to the 0.95-quantile of the F-distribution. With N = 30, K = 2 and J = 2, H0 is rejected if F > 3.35. We cannot perform one-sided F-tests.

Only under homoscedasticity (OLS5a) can the F-statistic also be computed as

    F = [ (SSR_restricted − SSR) / J ] / [ SSR / (N − K − 1) ]
      = [ (R² − R²_restricted) / J ] / [ (1 − R²) / (N − K − 1) ]  ∼  F(J, N−K−1)

where SSR_restricted and R²_restricted are, respectively, estimated by restricted least squares, which minimizes S(β) s.t. Rβ = q. Exclusionary restrictions of the form H0: βk = 0, βm = 0, ... are a special case of H0: Rβ = q. In this case, restricted least squares is simply estimated as a regression where the explanatory variables k, m, ... are excluded.
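A numerical illustration of the t- and F-test with N = 30 and K = 2, using the critical values 2.052 and 3.35 quoted above (Python/numpy sketch, not part of the original guide; the data and true coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 30, 2
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=(N, K))])
beta = np.array([1.0, 0.0, 2.0])           # true beta_1 = 0, beta_2 = 2
y = X @ beta + rng.normal(0, 1, size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (N - K - 1)
V_hat = sigma2_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(V_hat))

# t-test of H0: beta_1 = q with q = 0
t = (beta_hat[1] - 0.0) / se[1]
reject_t = abs(t) > 2.052                  # 0.975-quantile of t(27)

# F-test of H0: R beta = q with J = 2 restrictions (beta_1 = 0, beta_2 = 0)
J = 2
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q = np.zeros(J)
d = R @ beta_hat - q
F = d @ np.linalg.solve(R @ V_hat @ R.T, d) / J
reject_F = F > 3.35                        # 0.95-quantile of F(2, 27)
print(t, F, reject_t, reject_F)
```

Since the true beta_2 is far from zero, the joint restriction is clearly rejected, while the t-test on beta_1 tests a true null.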
7 Confidence Intervals in Small Samples

Assuming OLS1, OLS2, OLS3a, OLS4, and OLS5a, we can construct confidence intervals for a particular coefficient βk. The (1 − α) confidence interval is given by

    [ β̂k − t(1−α/2),(N−K−1) se(β̂k) ,  β̂k + t(1−α/2),(N−K−1) se(β̂k) ]

where t(1−α/2),(N−K−1) is the (1 − α/2) quantile of the t-distribution with N − K − 1 degrees of freedom. For example, the 95% confidence interval with N = 30 and K = 2 is [β̂k − 2.052 se(β̂k), β̂k + 2.052 se(β̂k)].

8 Asymptotic Properties of the OLS Estimator

Assuming OLS1, OLS2, OLS3d, OLS4, and OLS5a or OLS5b, the following properties can be established for large samples.

- The OLS estimator is consistent:

      plim β̂ = β

- The OLS estimator is asymptotically normally distributed:

      √N (β̂ − β) →d N(0, Σ)

  where Σ = σ² [E(xi xi')]⁻¹ = σ² QXX⁻¹ under OLS5a and Σ = [E(xi xi')]⁻¹ [E(ui² xi xi')] [E(xi xi')]⁻¹ = QXX⁻¹ [E(ui² xi xi')] QXX⁻¹ under OLS5b.

- The OLS estimator is approximately normally distributed:

      β̂ ∼A N(β, Avar(β̂))

  where the asymptotic variance Avar(β̂) = N⁻¹ Σ can be consistently estimated under OLS5a (homoscedasticity) as

      Avar̂(β̂) = σ̂² (X'X)⁻¹

  with σ̂² = û'û/N, and under OLS5b (heteroscedasticity) as the robust or Eicker-Huber-White estimator (see the handout on "Heteroscedasticity in the Linear Model")

      Avar̂(β̂) = (X'X)⁻¹ [ Σi ûi² xi xi' ] (X'X)⁻¹.

Note: In practice we can almost never be sure that the errors are homoscedastic and should therefore always use robust standard errors. Table 1 provides a systematic overview of assumptions and their implied properties.

9 Asymptotic Tests

Assume OLS1, OLS2, OLS3d, OLS4, and OLS5a or OLS5b.

A simple null hypothesis of the form H0: βk = q is tested with the z-test. If the null hypothesis is true, the z-statistic

    z = (β̂k − q) / se(β̂k)  ∼A  N(0, 1)

follows approximately the standard normal distribution. The standard error se(β̂k) is the square root of the element in the (k+1)-th row and (k+1)-th column of Avar̂(β̂). For example, to perform a two-sided test of H0 against the alternative hypothesis HA: βk ≠ q at the 5% significance level, we calculate the z-statistic and compare its absolute value to the 0.975-quantile of the standard normal distribution. H0 is rejected if |z| > 1.96.

A null hypothesis of the form H0: Rβ = q with J linear restrictions is jointly tested with the Wald test. If the null hypothesis is true, the Wald statistic

    W = (Rβ̂ − q)' [R Avar̂(β̂) R']⁻¹ (Rβ̂ − q)  ∼A  χ²(J)
follows approximately a χ²-distribution with J degrees of freedom. For example, to perform a test of H0 against the alternative hypothesis HA: Rβ ≠ q at the 5% significance level, we calculate the Wald statistic and compare it to the 0.95-quantile of the χ²-distribution. With J = 2, H0 is rejected if W > 5.99. We cannot perform one-sided Wald tests.

Under OLS5a (homoscedasticity) only, the Wald statistic can also be computed as

    W = (SSR_restricted − SSR) / (SSR/N)
      = (R² − R²_restricted) / ((1 − R²)/N)  ∼A  χ²(J)

where SSR_restricted and R²_restricted are, respectively, estimated by restricted least squares, which minimizes S(β) s.t. Rβ = q. Exclusionary restrictions of the form H0: βk = 0, βm = 0, ... are a special case of H0: Rβ = q. In this case, restricted least squares is simply estimated as a regression where the explanatory variables k, m, ... are excluded.

Note: the Wald statistic can also be calculated as

    W = J · F  ∼A  χ²(J)

where F is the small sample F-statistic. This formulation differs by a factor (N − K − 1)/N but has the same asymptotic distribution.

10 Confidence Intervals in Large Samples

Assuming OLS1, OLS2, OLS3d, OLS4, and OLS5a or OLS5b, we can construct confidence intervals for a particular coefficient βk. The (1 − α) confidence interval is given by

    [ β̂k − z(1−α/2) se(β̂k) ,  β̂k + z(1−α/2) se(β̂k) ]

where z(1−α/2) is the (1 − α/2) quantile of the standard normal distribution. For example, the 95% confidence interval is [β̂k − 1.96 se(β̂k), β̂k + 1.96 se(β̂k)].

11 Small Sample vs. Asymptotic Properties

The t-test, F-test and confidence interval for small samples depend on the normality assumption OLS3a (see Table 1). This assumption is strong and unlikely to be satisfied. The asymptotic z-test, Wald test and confidence interval for large samples rely on much weaker assumptions. Although most statistical software packages report the small sample results by default, we would typically prefer the large sample approximations. In practice, small sample and asymptotic tests and confidence intervals are very similar already for relatively small samples, i.e. for (N − K) > 30. Large sample tests also have the advantage that they can be based on heteroscedasticity-robust standard errors.
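The recommended large-sample workflow (robust standard errors, z-test, Wald test, 1.96-based confidence interval) can be sketched in numpy as follows. This is not part of the original guide; the heteroscedastic error specification g(xi) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 2000
x = rng.uniform(0, 10, size=N)
u = rng.normal(0, 1, size=N) * (0.5 + 0.3 * x)   # variance depends on x (OLS5b)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(N), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)

# classical estimate sigma2_hat (X'X)^{-1} with sigma2_hat = u'u/N
se_classical = np.sqrt(np.diag((u_hat @ u_hat / N) * XtX_inv))

# robust Eicker-Huber-White estimate: (X'X)^{-1} [sum_i u_i^2 x_i x_i'] (X'X)^{-1}
avar = XtX_inv @ ((X * u_hat[:, None] ** 2).T @ X) @ XtX_inv
se_robust = np.sqrt(np.diag(avar))

# z-test of H0: beta_1 = 0.5 (the true value here)
z = (beta_hat[1] - 0.5) / se_robust[1]

# Wald test of H0: R beta = q with J = 2 restrictions
R = np.eye(2)
q = np.array([1.0, 0.5])
d = R @ beta_hat - q
W = d @ np.linalg.solve(R @ avar @ R.T, d)       # approx. chi2 with 2 df

# 95% large-sample confidence interval for beta_1
ci = (beta_hat[1] - 1.96 * se_robust[1], beta_hat[1] + 1.96 * se_robust[1])
print(z, W, ci)
```

With this design the robust and classical standard errors of the slope differ noticeably, illustrating why the note above recommends robust standard errors by default.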
12 Implementation in Stata 11

The multiple linear regression model is estimated by OLS with the regress command. For example,

    webuse auto.dta
    regress mpg weight displacement

regresses the mileage of a car (mpg) on weight and displacement. A constant is automatically added if not suppressed by the option noconst

    regress mpg weight displacement, noconst

Estimation based on a subsample is performed as

    regress mpg weight displacement if weight>3000

where only cars heavier than 3000 lb are considered. Transformations of variables are included with new variables

    generate logmpg = log(mpg)
    generate weight2 = weight^2
    regress logmpg weight weight2 displacement

The Eicker-Huber-White covariance is reported with the option robust

    regress mpg weight displacement, vce(robust)

F-tests for one or more restrictions are calculated with the post-estimation command test. For example,

    test weight

tests H0: β1 = 0 against HA: β1 ≠ 0, and

    test weight displacement

tests H0: β1 = 0 and β2 = 0 against HA: β1 ≠ 0 or β2 ≠ 0.

New variables with residuals and fitted values are generated by

    predict uhat if e(sample), resid
    predict mpghat if e(sample)

13 Implementation in R 2.13

The multiple linear regression model is estimated by OLS with the lm function. For example,

    > library(foreign)
    > auto <- read.dta("http://www.stata-press.com/data/r11/auto.dta")
    > fm <- lm(mpg~weight+displacement, data=auto)
    > summary(fm)

regresses the mileage of a car (mpg) on weight and displacement. A constant is automatically added if not suppressed by -1

    > lm(mpg~weight+displacement-1, data=auto)

Estimation based on a subsample is performed as

    > lm(mpg~weight+displacement, subset=(weight>3000), data=auto)

where only cars heavier than 3000 lb are considered. Transformations of variables are directly included with the I() function

    > lm(I(log(mpg))~weight+I(weight^2)+displacement, data=auto)

The Eicker-Huber-White covariance is reported after estimation with

    > library(sandwich)
    > library(lmtest)
    > coeftest(fm, vcov=sandwich)

F-tests for one or more restrictions are calculated with the command waldtest, which also uses the two packages sandwich and lmtest

    > waldtest(fm, "weight", vcov=sandwich)

tests H0: β1 = 0 against HA: β1 ≠ 0 with Eicker-Huber-White, and

    > waldtest(fm, .~.-weight-displacement, vcov=sandwich)

tests H0: β1 = 0 and β2 = 0 against HA: β1 ≠ 0 or β2 ≠ 0.

New variables with residuals and fitted values are generated by

    > auto$uhat <- resid(fm)
    > auto$mpghat <- fitted(fm)
References

Introductory textbooks

Stock, James H. and Mark W. Watson (2007), Introduction to Econometrics, 2nd ed., Pearson Addison-Wesley. Chapters 4 - 9.

Wooldridge, Jeffrey M. (2009), Introductory Econometrics: A Modern Approach, 4th ed., South-Western Cengage Learning. Chapters 2 - 8.

Advanced textbooks

Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge University Press. Sections 4.1 - 4.4.

Wooldridge, Jeffrey M. (2002), Econometric Analysis of Cross Section and Panel Data, MIT Press. Chapters 4.1 - 4.23.

Companion textbooks

Angrist, Joshua D. and Jörn-Steffen Pischke (2009), Mostly Harmless Econometrics: An Empiricist's Companion, Princeton University Press. Chapter 3.

Kennedy, Peter (2008), A Guide to Econometrics, 6th ed., Blackwell Publishing. Chapters 3 - 11, 14.