1. Forecasting in State Space: theory and practice
Siem Jan Koopman
http://personal.vu.nl/s.j.koopman
Department of Econometrics
VU University Amsterdam
Tinbergen Institute
2012
2. Program
Lectures:
• Introduction to UC models
• State space methods
• Forecasting time series with different components
• Practice of Forecasting with Illustrations
Exercises and assignments will be part of the course.
3. Time Series
A time series is a set of observations yt, each one recorded at a
specific time t.
The observations are ordered over time.
We assume we have n observations, t = 1, . . . , n.
Examples of time series are:
• Number of cars sold each year
• Gross Domestic Product of a country
• Stock prices during one day
• Number of firm defaults
Our purpose is to identify and to model the serial or “dynamic”
correlation structure in the time series.
Time series analysis is relevant for economic policy, financial
decision making and forecasting.
4. Example: Nile data
[Figure: time series plot of the Nile data, 1870–1970.]
8. Sources for time series data
Data sources:
• US economics: http://research.stlouisfed.org/fred2/
• DK book data: http://www.ssfpack.com/dkbook.html
• Financial data: Datastream, Yahoo Finance
10. White noise processes
The simplest example of a stationary process is a white noise (WN)
process, which we usually denote as εt.
A white noise process is a sequence of uncorrelated random
variables, each with zero mean and constant variance σε²:
εt ∼ WN(0, σε²).
The autocovariance function is equal to zero for lags h > 0:
γY(h) = σε²  if h = 0,
γY(h) = 0    if h ≠ 0.
12. White noise ACF and SACF
[Figure, four panels: theoretical ACF of white noise (max lag 50 and max lag 500) and sample ACF of simulated white noise (n = 50 and n = 500).]
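A minimal Python/numpy sketch of this comparison (the series length, seed and the sample_acf helper are illustrative choices, not part of the slides): it simulates Gaussian white noise and computes its sample ACF.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_eps, n = 1.0, 500

# Simulate a white noise series: eps_t ~ WN(0, sigma_eps^2)
eps = rng.normal(0.0, sigma_eps, size=n)

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_{max_lag} of the series x."""
    x = x - x.mean()
    denom = np.sum(x**2)
    return np.array([np.sum(x[h:] * x[:-h]) / denom for h in range(1, max_lag + 1)])

# The theoretical ACF of white noise is zero at every lag h > 0;
# the sample ACF should stay within roughly +/- 2/sqrt(n) of zero.
print(sample_acf(eps, 20))
print(2 / np.sqrt(n))
```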
13. Random Walk processes
If ε1, ε2, . . . come from a white noise process with variance σ²,
then the process {Yt } with
Yt = ε1 + ε2 + . . . + εt for t = 1, 2, . . .
is called a random walk.
A recursive way to define a random walk is:
Yt = Yt−1 + εt for t = 2, 3, . . .
Y1 = ε1
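The two definitions are easy to reproduce numerically; a minimal numpy sketch (seed, σ and series length are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n = 1.0, 200

eps = rng.normal(0.0, sigma, size=n)

# Y_t = eps_1 + ... + eps_t: a random walk is the cumulative sum of white noise
y = np.cumsum(eps)

# Equivalent recursive construction: Y_1 = eps_1, Y_t = Y_{t-1} + eps_t
y_rec = np.empty(n)
y_rec[0] = eps[0]
for t in range(1, n):
    y_rec[t] = y_rec[t - 1] + eps[t]

assert np.allclose(y, y_rec)
```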
14. Random Walk properties I
A random walk is not stationary, because the variance of Yt is
time-varying:
E(Yt) = E(ε1 + . . . + εt) = 0
Var(Yt) = E(Yt²) = E[(ε1 + . . . + εt)²] = tσ²
The autocovariance function is equal to:
γ(t, t − h) = E(Yt Yt−h)
            = E[(Σ_{j=1}^{t−h} εj + Σ_{j=t−h+1}^{t} εj)(Σ_{j=1}^{t−h} εj)]
            = (t − h)σ²
This means that the variance and the autocovariances go to
infinity as t → ∞.
15. Random Walk properties II
The autocorrelation of Yt and Yt−h is
ρ(t, t − h) = γ(t, t − h) / √[Var(Yt) Var(Yt−h)]
            = (t − h)σ² / √[(tσ²)((t − h)σ²)]
            = √(t − h) / √t
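These moments are easy to check by Monte Carlo; the sketch below (number of replications, t = 100 and h = 30 are illustrative choices) simulates many independent random walks and compares the sample moments with tσ², (t − h)σ² and √(t − h)/√t:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, t, h, n_rep = 1.0, 100, 30, 50_000

# Each row is one simulated random walk of length t
eps = rng.normal(0.0, sigma, size=(n_rep, t))
y = np.cumsum(eps, axis=1)

y_t, y_tmh = y[:, t - 1], y[:, t - h - 1]           # Y_t and Y_{t-h}

print(y_t.var(), t * sigma**2)                      # Var(Y_t)      = t sigma^2
print(np.mean(y_t * y_tmh), (t - h) * sigma**2)     # gamma(t, t-h) = (t-h) sigma^2
print(np.corrcoef(y_t, y_tmh)[0, 1],
      np.sqrt(t - h) / np.sqrt(t))                  # rho(t, t-h)   = sqrt(t-h)/sqrt(t)
```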
19. Classical Decomposition
A basic model for representing a time series is the additive model
yt = µt + γt + εt , t = 1, . . . , n,
also known as the Classical Decomposition.
yt = observation,
µt = slowly changing component (trend),
γt = periodic component (seasonal),
εt = irregular component (disturbance).
In a Structural Time Series Model (STSM) or Unobserved
Components Model (UCM), the RHS components are modelled
explicitly as stochastic processes.
20. Nile data
[Figure: time series plot of the Nile data, 1870–1970.]
21. Local Level Model
• Components can be deterministic functions of time (e.g.
polynomials), or stochastic processes;
• Deterministic example: yt = µ + εt with εt ∼ NID(0, σε²).
• Stochastic example: the Random Walk plus Noise, or
Local Level model:
yt = µt + εt,    εt ∼ NID(0, σε²),
µt+1 = µt + ηt,    ηt ∼ NID(0, ση²);
• The disturbances εt, ηs are independent for all s, t;
• The model is incomplete without a specification for µ1 (note
the non-stationarity):
µ1 ∼ N(a, P)
22. Local Level Model
yt = µt + εt,    εt ∼ NID(0, σε²),
µt+1 = µt + ηt,    ηt ∼ NID(0, ση²),
µ1 ∼ N(a, P).
• The level µt and the irregular εt are unobserved;
• Parameters: a, P, σε², ση²;
• Trivial special cases:
• ση² = 0 =⇒ yt ∼ NID(µ1, σε²) (WN with constant level);
• σε² = 0 =⇒ yt+1 = yt + ηt (pure RW);
• Local Level is a model representation for EWMA forecasting.
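A minimal numpy simulation of the Local Level model; the variances, the initial distribution (a, P) and the series length are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
sigma_eps, sigma_eta = 1.0, 0.5      # irregular and level standard deviations
a, P = 0.0, 1.0                      # mean and variance of the initial level mu_1

mu = np.empty(n)
y = np.empty(n)
mu[0] = rng.normal(a, np.sqrt(P))
for t in range(n):
    y[t] = mu[t] + rng.normal(0.0, sigma_eps)            # y_t      = mu_t + eps_t
    if t + 1 < n:
        mu[t + 1] = mu[t] + rng.normal(0.0, sigma_eta)   # mu_{t+1} = mu_t + eta_t
```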
28. Properties of the LL model
• The ACF of ∆yt is
ρ1 = −σε² / (ση² + 2σε²) = −1/(q + 2),    q = ση²/σε²,
ρτ = 0,    τ ≥ 2.
• q is called the signal-to-noise ratio;
• The model for ∆yt is MA(1) with restricted parameters such
that
−1/2 ≤ ρ1 ≤ 0,
i.e., yt is ARIMA(0,1,1);
• Write ∆yt = ξt + θξt−1, ξt ∼ NID(0, σ²), to solve for θ:
θ = (1/2) [√(q² + 4q) − 2 − q].
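A small numerical check of this relation; theta_from_q below is an illustrative helper (not from the slides) that computes θ from q and confirms that the implied MA(1) lag-1 autocorrelation θ/(1 + θ²) equals −1/(q + 2):

```python
import numpy as np

def theta_from_q(q):
    """MA(1) coefficient of Delta y_t implied by a Local Level model
    with signal-to-noise ratio q = sigma_eta^2 / sigma_eps^2."""
    return 0.5 * (np.sqrt(q**2 + 4 * q) - 2 - q)

for q in [0.0, 0.1, 1.0, 10.0]:
    theta = theta_from_q(q)
    rho1_ma = theta / (1 + theta**2)    # lag-1 ACF of the MA(1) reduced form
    rho1_ll = -1 / (q + 2)              # lag-1 ACF of Delta y_t from the LL model
    print(q, theta, rho1_ma, rho1_ll)   # theta lies in [-1, 0]; the two rho_1 agree
```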
29. Local Level Model
• The model parameters are estimated by Maximum Likelihood;
• Advantages of model based approach: assumptions can be
tested, parameters are estimated rather than “calibrated”;
• Estimated model can be used for signal extraction;
• The estimated level µt is obtained as a locally weighted
average;
• The distribution of weights can be compared with Kernel
functions in nonparametric regressions;
• Within the model, our methods yield MMSE forecasts.
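A sketch of this model-based workflow, assuming the statsmodels package is available (its UnobservedComponents class puts UC models in state space form); the data are simulated and all parameter values are illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, sigma_eps, sigma_eta = 200, 1.0, 0.5

# Simulate Local Level data: mu is a random walk, y adds white noise
mu = np.cumsum(rng.normal(0.0, sigma_eta, size=n))
y = mu + rng.normal(0.0, sigma_eps, size=n)

mod = sm.tsa.UnobservedComponents(y, level='local level')
res = mod.fit(disp=False)                  # maximum likelihood estimation

print(res.params)                          # estimated sigma_eps^2 and sigma_eta^2
level_hat = res.smoothed_state[0]          # smoothed level (signal extraction)
print(res.forecast(5))                     # forecasts of y_{n+1}, ..., y_{n+5}
```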
30. Signal Extraction and Weights for the Nile Data
[Figure, three rows of two panels: Nile data with estimated level (left) and the corresponding weights over lags −20 to +20 (right).]
31. Local Linear Trend Model
The LLT model extends the LL model with a slope:
yt = µt + εt,    εt ∼ NID(0, σε²),
µt+1 = βt + µt + ηt,    ηt ∼ NID(0, ση²),
βt+1 = βt + ξt,    ξt ∼ NID(0, σξ²).
• All disturbances are independent at all lags and leads;
• Initial distributions for β1 and µ1 need to be specified;
• If σξ² = 0, the trend is a random walk with constant drift β1
(for β1 = 0 the model reduces to a LL model);
• If additionally ση² = 0, the trend is a straight line with slope β1
and intercept µ1;
• If σξ² > 0 but ση² = 0, the trend is a smooth curve, or an
Integrated Random Walk.
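A minimal numpy simulation of the LLT recursions; setting ση² = 0 (as below) gives the smooth Integrated Random Walk trend shown on the next slide. All parameter values and initial conditions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
sigma_eps, sigma_eta, sigma_xi = 1.0, 0.0, 0.1   # sigma_eta = 0: Integrated Random Walk trend

mu, beta, y = np.empty(n), np.empty(n), np.empty(n)
mu[0], beta[0] = 0.0, 0.5                        # illustrative initial level and slope
for t in range(n):
    y[t] = mu[t] + rng.normal(0.0, sigma_eps)                     # y_t        = mu_t + eps_t
    if t + 1 < n:
        mu[t + 1] = mu[t] + beta[t] + rng.normal(0.0, sigma_eta)  # mu_{t+1}   = mu_t + beta_t + eta_t
        beta[t + 1] = beta[t] + rng.normal(0.0, sigma_xi)         # beta_{t+1} = beta_t + xi_t
```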
33. Trend and Slope in Integrated Random Walk Model
[Figure, two panels: trend µ (top) and slope β (bottom) of an Integrated Random Walk model, t = 0–100.]
34. Local Linear Trend Model
• Reduced form of LLT is ARIMA(0,2,2);
• LLT provides a model for Holt-Winters forecasting;
• Smooth LLT provides a model for spline-fitting;
• Smoother trends: higher-order Random Walks,
∆ᵈ µt = ηt
35. Seasonal Effects
We have seen specifications for µt in the basic model
yt = µt + γt + εt .
Now we will consider the seasonal term γt . Let s denote the
number of ‘seasons’ in the data:
• s = 12 for monthly data,
• s = 4 for quarterly data,
• s = 7 for daily data when modelling a weekly pattern.
36. Dummy Seasonal
The simplest way to model seasonal effects is by using dummy
variables. The effect summed over the seasons should equal zero:
γt+1 = − Σ_{j=1}^{s−1} γt+1−j .
To allow the pattern to change over time, we introduce a new
disturbance term:
γt+1 = − Σ_{j=1}^{s−1} γt+1−j + ωt,    ωt ∼ NID(0, σω²).
The expectation of the sum of the seasonal effects is zero.
37. Trigonometric Seasonal
Defining γjt as the effect of season j at time t, an alternative
specification for the seasonal pattern is
γt = Σ_{j=1}^{[s/2]} γjt,
γj,t+1 = γjt cos λj + γ*jt sin λj + ωjt,
γ*j,t+1 = −γjt sin λj + γ*jt cos λj + ω*jt,
ωjt, ω*jt ∼ NID(0, σω²),
λj = 2πj/s.
• Without the disturbance, the trigonometric specification is
identical to the deterministic dummy specification.
• The autocorrelation in the trigonometric specification lasts
through more lags: changes occur in a smoother way;
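A minimal numpy sketch that simulates both seasonal specifications for quarterly data (s = 4); the starting values, the noise level, and the simplified treatment of the j = s/2 frequency are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(6)
s, n, sigma_omega = 4, 120, 0.05

# Stochastic dummy seasonal: gamma_{t+1} = -(gamma_t + ... + gamma_{t-s+2}) + omega_t
gamma = np.zeros(n)
gamma[:s - 1] = [1.0, -0.5, 0.3]                 # illustrative starting pattern
for t in range(s - 2, n - 1):
    gamma[t + 1] = -gamma[t - s + 2:t + 1].sum() + rng.normal(0.0, sigma_omega)

# Stochastic trigonometric seasonal: sum of [s/2] stochastic cycles at lambda_j = 2*pi*j/s
lam = 2 * np.pi * np.arange(1, s // 2 + 1) / s
gj = np.array([1.0, 0.5])                        # gamma_{j,1}
gj_star = np.zeros(s // 2)                       # gamma*_{j,1}
gamma_trig = np.empty(n)
for t in range(n):
    gamma_trig[t] = gj.sum()
    gj_new = gj * np.cos(lam) + gj_star * np.sin(lam) + rng.normal(0.0, sigma_omega, s // 2)
    gj_star = -gj * np.sin(lam) + gj_star * np.cos(lam) + rng.normal(0.0, sigma_omega, s // 2)
    gj = gj_new
```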
38. Unobserved Component Models
• Different specifications for the trend and the seasonal can be
freely combined.
• Other components of interest, like cycles, explanatory
variables, intervention effects and outliers, are easily added.
• UC models are Multiple Source of Errors models. The reduced
form is a Single Source of Errors model.
• We model non-stationarity directly.
• Components have an explicit interpretation: the model is not
just a forecasting device.
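A sketch of such a combination, assuming statsmodels is available: a local linear trend plus a stochastic dummy seasonal, estimated on simulated quarterly data (the data-generating values and forecast horizon are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, s = 160, 4
trend = np.cumsum(0.1 + rng.normal(0.0, 0.05, n))       # slowly changing trend with drift
seasonal = np.tile([1.0, -0.5, 0.3, -0.8], n // s)      # quarterly pattern summing to zero
y = trend + seasonal + rng.normal(0.0, 0.3, n)

# Local linear trend + stochastic dummy seasonal, estimated by maximum likelihood
mod = sm.tsa.UnobservedComponents(y, level='local linear trend', seasonal=s)
res = mod.fit(disp=False)

print(res.summary())
print(res.forecast(8))      # forecasts extrapolate trend and seasonal together
```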
42. Textbooks
• A.C. Harvey (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
• G. Kitagawa & W. Gersch (1996). Smoothness Priors Analysis of Time Series. Springer-Verlag.
• M. West & J. Harrison (1997). Bayesian Forecasting and Dynamic Models. Springer-Verlag.
• J. Durbin & S.J. Koopman (2001). Time Series Analysis by State Space Methods. Oxford University Press.
• J.J.F. Commandeur & S.J. Koopman (2007). An Introduction to State Space Time Series Analysis. Oxford University Press.
43. Exercises
1. Consider LL model (see slides, see DK chapter 2).
• The reduced form is an ARIMA(0,1,1) process. Derive the
relationship between the signal-to-noise ratio q of the LL model
and the θ coefficient of the ARIMA model;
• Derive the reduced form in the case ηt = √q εt and notice the
difference with the general case;
• Give the elements of the mean vector and variance matrix of
y = (y1, . . . , yn)′ when yt is generated by a LL model for
t = 1, . . . , n.
2. Consider LLT model (see slides, see DK section 3.2.1).
• Show that the reduced form is an ARIMA(0,2,2) process;
• Discuss the initial values for level and slope of LLT;
• Relate the LLT model forecasts to the Holt-Winters method
of forecasting. Comment.