Distribution of Estimates
Linear Regression Model
Assume (yt, xt) are independent and identically distributed and E(xtet) = 0
Estimation Consistency
The estimates approach the true values as the sample size increases.
Estimation variance decreases as the sample size increases.
Illustration of Consistency
Take a random sample of U.S. men
Estimate a linear regression of log(wages) on education
Total sample = 9089
Start with 100 observations, and sequentially increase the sample size until the final regression uses the whole 9089.
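This experiment can be mimicked with simulated data. The sketch below is an illustration only: the actual wage sample is not reproduced here, and the data-generating process (true slope 0.10, education drawn uniformly from 8 to 20 years) is an assumption made so the code is self-contained.

```python
import numpy as np

# Simulated stand-in for the wage example (assumed design, true slope 0.10)
rng = np.random.default_rng(0)
n_total = 9089
educ = rng.integers(8, 21, size=n_total).astype(float)   # years of education
log_wage = 1.5 + 0.10 * educ + rng.normal(0, 0.5, size=n_total)

def ols_slope(y, x):
    """Slope from a simple regression of y on x (with intercept)."""
    x_c = x - x.mean()
    return np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)

# Re-estimate on growing subsamples: 100, 600, 1100, ..., then the full 9089
sizes = list(range(100, n_total, 500)) + [n_total]
slopes = [ols_slope(log_wage[:n], educ[:n]) for n in sizes]

# The sequence of slope estimates settles down near the true value
print(sizes[0], round(slopes[0], 3))
print(sizes[-1], round(slopes[-1], 3))
```

The later estimates in the sequence cluster tightly around the true slope, which is the consistency property the slides describe.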
Sequence of Slope Coefficients
Asymptotic Normality
Illustration of Asymptotic Normality
Time Series
Do these results apply to time-series data?
Consistency
Asymptotic Normality
Variance Formula
Time-series models
AR models, i.e., xt = yt-1
Trend and seasonal models
One-step and multi-step forecasting
Derivation of Variance Formula
For simplicity:
Assume the variables have zero mean.
The regression has no intercept.

Model with no intercept:
yt = βxt + et

OLS minimizes the sum of squares
S(β) = Σt (yt − βxt)²

The first-order condition is
−2 Σt xt(yt − β̂xt) = 0

Solution:
β̂ = Σt xtyt / Σt xt²

Now substitute yt = βxt + et. We have
β̂ = β + Σt xtet / Σt xt²

The denominator, divided by T, is the sample variance (when x has mean zero), so
(1/T) Σt xt² ≈ E(xt²)

Then
β̂ − β ≈ (1 / (T E(xt²))) Σt vt
where
vt = xtet

Since E(xtet) = 0, vt has zero mean. Then
var(β̂) ≈ var(Σt vt) / (T E(xt²))²

From the covariance formula,
var(Σt vt) = Σt var(vt) + Σt Σj≠t cov(vt, vj)

When the observations are independent, the covariances are zero. And since
var(vt) = E(xt²et²)

we obtain
var(Σt vt) = T E(xt²et²)

We have found
var(β̂) ≈ E(xt²et²) / (T (E(xt²))²)

As stated at the beginning, the variance decreases as the sample size T increases.
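The variance formula can be checked by simulation. Below is a minimal Monte Carlo sketch: the no-intercept regression is estimated many times on simulated data, and the sampling variance of the estimates is compared with E(xt²et²) / (T (E(xt²))²). The heteroskedastic error design is an assumption chosen for illustration.

```python
import numpy as np

# Monte Carlo check of var(b) ≈ E(x²e²) / (T·E(x²)²) for no-intercept OLS
rng = np.random.default_rng(1)
T, beta, n_reps = 200, 2.0, 10000

estimates = np.empty(n_reps)
for r in range(n_reps):
    x = rng.normal(0, 1, T)
    e = rng.normal(0, 1, T) * (1 + 0.5 * np.abs(x))   # heteroskedastic errors
    y = beta * x + e
    estimates[r] = np.sum(x * y) / np.sum(x * x)      # OLS with no intercept

# Theoretical variance, with the moments computed from a large sample
big = rng.normal(0, 1, 2_000_000)
e2_given_x = (1 + 0.5 * np.abs(big)) ** 2             # E(e²|x) in this design
theory = np.mean(big ** 2 * e2_given_x) / (T * np.mean(big ** 2) ** 2)

print(round(estimates.var(), 6), round(theory, 6))
```

The two numbers agree closely, including under heteroskedasticity, which is exactly what the formula claims.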
Extension to Time-Series
The only place in this argument where we used the assumption of the independence of observations was to show that vt = xtet has zero covariance with vj = xjej.
This is saying that vt is not autocorrelated.
Unforecastable one-step errors
In one-step-ahead forecasting, if the regression error is unforecastable, then vt is not autocorrelated. In this case, the variance formula for the least-squares estimate is the same as in the independent case:

var(β̂) ≈ E(xt²et²) / (T (E(xt²))²)
Why is this true?
The error is unforecastable if
E(et | yt-1, yt-2, ...) = 0

For simplicity, suppose that xt = 1. Then for j < t,
E(vtvj) = E(etej) = E(E(et | yt-1, yt-2, ...) ej) = 0

so vt is not autocorrelated.
Summary
In one-step-ahead time-series models, if the error is unforecastable, then least-squares estimates satisfy the asymptotic (approximate) distribution

β̂ ~ N(β, E(xt²et²) / (T (E(xt²))²))
As the sample size T is in the denominator, the variance decreases as the sample size increases.
This means that least-squares is consistent.
Variance Formula
The variance formula for the least-squares estimate takes the form

var(β̂) ≈ E(xt²et²) / (T (E(xt²))²)
This formula is valid in time-series regression when the error is unforecastable.
Classical Variance Formula
If we make the simplifying assumption
E(et² | xt) = σ²

then E(xt²et²) = σ²E(xt²), and the variance simplifies to

var(β̂) ≈ σ² / (T E(xt²))
Homoskedasticity
The variance simplification is valid under “conditional homoskedasticity”: E(et² | xt) = σ².
This is a simplifying assumption made to make calculations easier, and is a conventional assumption in introductory econometrics courses.
It is not used in serious econometrics.
Variance Formula: AR(1) Model
Take the AR(1) model with unforecastable homoskedastic errors

yt = αyt-1 + et,  E(et | yt-1, yt-2, ...) = 0,  E(et²) = σ²

Then the variance of the OLS estimate is

var(α̂) ≈ σ² / (T E(yt-1²))

since in this model xt = yt-1.

AR(1) Asymptotic Variance
We know that E(yt²) = var(yt) = σ² / (1 − α²). So

var(α̂) ≈ (1 − α²) / T
The asymptotic distribution is very simple:

α̂ ~ N(α, (1 − α²) / T), approximately.

The variance is a function of the unknown true value of α. As |α| increases, the variance decreases, so the OLS estimate is actually more precise.
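This result can also be checked by simulation. The sketch below is a minimal Monte Carlo with Gaussian AR(1) data (the parameter values and burn-in length are assumptions for illustration), comparing T times the sampling variance of α̂ with 1 − α².

```python
import numpy as np

# Monte Carlo sketch of the AR(1) result var(α̂) ≈ (1 − α²)/T
rng = np.random.default_rng(3)
T, alpha, n_reps, burn = 800, 0.5, 5000, 100

# Simulate all replications at once, looping only over time
e = rng.normal(0, 1, (n_reps, T + burn))
y = np.zeros((n_reps, T + burn))
for t in range(1, T + burn):
    y[:, t] = alpha * y[:, t - 1] + e[:, t]
y = y[:, burn:]                               # discard burn-in observations

# OLS estimate of the AR coefficient in each replication
est = np.sum(y[:, 1:] * y[:, :-1], axis=1) / np.sum(y[:, :-1] ** 2, axis=1)

print(round(est.var() * T, 3), round(1 - alpha ** 2, 3))
```

Scaled by T, the Monte Carlo variance of the estimates is close to 1 − α², as the asymptotic formula predicts.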
Distribution of Least Squares
In classical regression, if the errors are iid normal and independent of the regressors, then the least-squares estimates have an exact normal distribution, not just an asymptotic one. This is not true in most time-series regressions.
Non Classical Distributions
Estimates in autoregressive models are:
Biased downwards
Skewed
Thick-tailed

especially when autoregressive coefficients are large and sample sizes are small. These issues diminish in large samples.
Interpretation
Estimates of autoregressive parameters are random. Even if the regression error is normal, the parameter estimates are not normally distributed. Distributions are less normal when the AR coefficient is large, and more concentrated and normal when the sample size is large.
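These small-sample features are easy to see in a simulation. The sketch below uses a large AR coefficient and a small sample (both values assumed for illustration) and reports the mean and skewness of the OLS estimates across replications.

```python
import numpy as np

# With a large AR coefficient and a small sample, the OLS estimate of α
# is biased downwards and its distribution is left-skewed
rng = np.random.default_rng(4)
T, alpha, n_reps, burn = 50, 0.95, 20000, 100

e = rng.normal(0, 1, (n_reps, T + burn))
y = np.zeros((n_reps, T + burn))
for t in range(1, T + burn):
    y[:, t] = alpha * y[:, t - 1] + e[:, t]
y = y[:, burn:]

est = np.sum(y[:, 1:] * y[:, :-1], axis=1) / np.sum(y[:, :-1] ** 2, axis=1)

print(round(est.mean(), 3))                       # noticeably below the true 0.95
skew = np.mean(((est - est.mean()) / est.std()) ** 3)
print(round(skew, 2))                             # negative: long left tail
```

The downward bias and negative skewness shrink as T grows, which matches the statement that these issues diminish in large samples.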
Asymptotic Standard Deviation
The least-squares estimate is asymptotically (approximately) normally distributed. In the simple AR(1) model, α̂ ~ N(α, (1 − α²)/T), so the asymptotic standard deviation is

s(α̂) = ((1 − α²) / T)^(1/2)

The standard deviation measures the precision of the estimate, but it is unknown.
Standard Errors
Estimates of the standard deviations are called standard errors, and are reported in the regression output. They are used to measure precision.
Classical standard errors
A classical standard error is an estimate of the standard deviation from the formula

s(β̂) = (σ̂² / Σt xt²)^(1/2),  σ̂² = (1/T) Σt êt²

This formula is valid under conditional homoskedasticity:

E(et² | xt) = σ²,  E(et² | yt-1, yt-2, ...) = σ²

This last equation is unforecastability of the variance. This is a particularly poor assumption for financial data.
12. Robust Standard Errors
“Robust” standard errors are estimates of the standard deviation from the formula

s(β̂) = (Σt xt²êt² / (Σt xt²)²)^(1/2)

These are the conventional standard errors for regression analysis, due to Halbert White (1980), one of the most referenced papers in economics. Robust standard errors will often differ by quite a lot from standard errors that use the assumption of homoskedasticity.
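The difference can be seen by computing both formulas directly. The sketch below uses simulated heteroskedastic data (an assumed design) and evaluates the classical formula σ̂²/Σxt² and White's robust formula Σxt²êt²/(Σxt²)² for a no-intercept regression.

```python
import numpy as np

# Classical vs robust (White) standard errors under heteroskedasticity
rng = np.random.default_rng(5)
T = 2000
x = rng.normal(0, 1, T)
e = rng.normal(0, 1, T) * np.abs(x)        # error variance depends on x
y = 1.0 * x + e

bhat = np.sum(x * y) / np.sum(x * x)       # OLS, no intercept for simplicity
ehat = y - bhat * x                        # residuals

se_classical = np.sqrt(np.mean(ehat ** 2) / np.sum(x ** 2))
se_robust = np.sqrt(np.sum(x ** 2 * ehat ** 2) / np.sum(x ** 2) ** 2)

print(round(se_classical, 4), round(se_robust, 4))
```

In this design the robust standard error is substantially larger than the classical one; the classical formula understates the true sampling variability when the homoskedasticity assumption fails.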
Computation
In STATA, the default is homoskedastic standard errors.
They are reported automatically with the regress command.
For robust standard errors, use the “r” option:
. reg rgdp L.rgdp, r
Example: Real GDP Growth (classical)
Real GDP Growth (robust)
Issue
With the “r” option, STATA does not report the sum-of-squared-errors table. You might want to see this, so you might want to run both commands:

. reg y x
. reg y x, r
Interpretation of standard errors
The standard errors measure the precision of the estimate. Small standard errors mean the estimate is precise, which is good for forecasting. Large standard errors mean the estimate is not precise, which can lead to inaccurate forecasts.
Interpretation of t-statistics
“t” is the coefficient estimate divided by the standard error. It is used to test if the coefficient is zero.
“P>|t|” is the p-value of the t-statistic. If p < .05, you reject the hypothesis of a zero coefficient.
Hypothesis tests are useful for assessing economic theories, but are less useful for picking good forecast models.
The 95% confidence interval is the coefficient estimate plus and minus 1.96 times the standard error. It helps to gauge possible values of the true coefficient.
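As a small worked example, the t-statistic and 95% confidence interval can be computed from the first-lag coefficient (.3204166) and robust standard error (.0733727) reported in the real GDP regression output.

```python
# t-statistic and 95% confidence interval from a coefficient and its
# standard error (numbers taken from the real GDP regression output)
coef = 0.3204166
se = 0.0733727

t_stat = coef / se                 # |t| > 1.96 rejects a zero coefficient at 5%
ci_low = coef - 1.96 * se
ci_high = coef + 1.96 * se

print(round(t_stat, 2))
print(round(ci_low, 4), round(ci_high, 4))
```

The computed values match the t and confidence-interval columns of the regression output, which is exactly how those columns are constructed.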
Summary
In one-step-ahead forecast regressions with unforecastable errors, robust standard errors are generally appropriate. Classical standard errors are appropriate under conditional homoskedasticity.
Next class: October 16. Complete the reading from Wooldridge. The topic is autocorrelation and heteroskedasticity-consistent standard errors.
. regress rgdp L(1/4).rgdp, r

Linear regression                     Number of obs =    285
                                      F(4, 280)     =   9.83
                                      Prob > F      = 0.0000
                                      R-squared     = 0.1583
                                      Root MSE      = 3.5565

-------------------------------------------------------------------------
             |             Robust
        rgdp |     Coef.  Std. Err.      t    P>|t|   [95% Conf. Interval]
-------------+-----------------------------------------------------------
        rgdp |
         L1. |  .3204166   .0733727    4.37   0.000    .1759846   .4648487
         L2. |  .1695538   .0819025    2.07   0.039    .0083309   .3307766
-------------------------------------------------------------------------
Economics 202
Homework #3
1. Use aggregate residential investment growth rates from FRED (label A011RL1Q225SBEA). Estimate an AR(4) model for this series.

a. Generate point and interval forecasts for the third and fourth quarters of 2019 and the first and second quarters of 2020 using the direct method. Create a plot of the forecasts and intervals. 3 points.

b. Generate point and interval forecasts for the third and fourth quarters of 2019 and the first and second quarters of 2020 using the iterated method. Create a plot of the forecasts and intervals. Compare the forecasts from the two methods. 3 points.

2. Use household gross fixed investment, residential structures, flow from FRED (label BOGZ1FU155012061Q). Drop all observations before the first quarter of 1952.

a. Convert the series to logarithms and estimate a linear trend. Plot the residuals from the series and discuss. Do you think that the residuals exhibit seasonality and/or a cycle component? 2 points.

b. Estimate a model of the log of the series with a linear trend plus seasonal dummy variables. Plot the residuals and discuss. Do you think that the residuals exhibit a cycle component? 2 points.

c. Estimate an AR(4) model with a trend and with or without seasonal dummy variables, depending upon your answers to a and b. Plot the residuals and discuss. 2 points.

d. Using the model in part c, generate point and interval forecasts for the third and fourth quarters of 2019 and the first and second quarters of 2020 using the direct method. Create a plot of the forecasts and intervals. 2 points.

e. What additional adjustments to the forecast model do you think might be appropriate? Why? 2 points.