# Paris2012 session2


1. State space methods for signal extraction, parameter estimation and forecasting
   Siem Jan Koopman
   http://personal.vu.nl/s.j.koopman
   Department of Econometrics, VU University Amsterdam
   Tinbergen Institute, 2012
2. Regression Model in time series
   For a time series of a dependent variable y_t and a set of regressors X_t, we can consider the regression model

       y_t = µ + X_t β + ε_t,   ε_t ~ NID(0, σ²),   t = 1, …, n.

   For a large n, is it reasonable to assume constant coefficients µ, β and σ²? If we have strong reasons to believe that µ and β are time-varying, the model may be written as

       y_t = µ_t + X_t β_t + ε_t,   µ_{t+1} = µ_t + η_t,   β_{t+1} = β_t + ξ_t.

   This is an example of a linear state space model.
3. State Space Model
   The linear Gaussian state space model is defined in three parts:

   → State equation: α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ~ NID(0, Q_t)
   → Observation equation: y_t = Z_t α_t + ε_t,   ε_t ~ NID(0, H_t)
   → Initial state distribution: α_1 ~ N(a_1, P_1)

   Notice that
   • ζ_t and ε_s are independent for all t, s, and independent from α_1;
   • the observation y_t can be multivariate;
   • the state vector α_t is unobserved;
   • the matrices T_t, Z_t, R_t, Q_t, H_t determine the structure of the model.
4. State Space Model
   • The state space model is linear and Gaussian: therefore the properties and results of the multivariate normal distribution apply.
   • The state vector α_t evolves as a VAR(1) process.
   • The system matrices usually contain unknown parameters.
   • Estimation therefore has two aspects:
     • measuring the unobservable state (prediction, filtering and smoothing);
     • estimation of unknown parameters (maximum likelihood estimation).
   • State space methods offer a unified approach to a wide range of models and techniques: dynamic regression, ARIMA, UC models, latent variable models, spline-fitting and many ad hoc filters.
   • Next, some well-known model specifications in state space form ...
5. Regression with time-varying coefficients
   General state space model:

       α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ~ NID(0, Q_t),
       y_t = Z_t α_t + ε_t,           ε_t ~ NID(0, H_t).

   Put the regressors in Z_t (that is, Z_t = X_t) and set T_t = I, R_t = I. The result is the regression model y_t = X_t β_t + ε_t with time-varying coefficient β_t = α_t following a random walk.
   In case Q_t = 0, the time-varying regression model reduces to the standard linear regression model y_t = X_t β + ε_t with β = α_1.
   Many dynamic specifications for the regression coefficient can be considered, see below.
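A minimal simulation sketch of this setup (not from the slides; NumPy assumed, parameter values and variable names illustrative): a scalar regressor with a random-walk coefficient, and the Q_t = 0 special case where the coefficient stays at β_1.

```python
import numpy as np

# Sketch: simulate y_t = X_t beta_t + eps_t with a random-walk coefficient,
# i.e. the state space form with Z_t = X_t, T_t = 1, R_t = 1.
rng = np.random.default_rng(42)
n = 200
X = rng.normal(1.0, 1.0, n)            # scalar regressor X_t
sigma_eps, sigma_xi = 0.5, 0.1         # std. devs of eps_t and xi_t

beta = np.zeros(n)                     # beta_{t+1} = beta_t + xi_t
beta[0] = 1.0
for t in range(n - 1):
    beta[t + 1] = beta[t] + rng.normal(0.0, sigma_xi)

y = X * beta + rng.normal(0.0, sigma_eps, n)   # observation equation

# With Q_t = 0 the coefficient never moves, so the model collapses to the
# standard regression y_t = X_t beta + eps_t with beta = beta_1:
beta_q0 = np.full(n, beta[0])
y_q0 = X * beta_q0 + rng.normal(0.0, sigma_eps, n)
```
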
6. AR(1) process in State Space Form
   Consider the AR(1) model y_{t+1} = φ y_t + ζ_t, in state space form:

       α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ~ NID(0, Q_t),
       y_t = Z_t α_t + ε_t,           ε_t ~ NID(0, H_t),

   with state vector α_t and system matrices

       Z_t = 1,   H_t = 0,   T_t = φ,   R_t = 1,   Q_t = σ².

   We have:
   • Z_t and H_t = 0 imply that α_{1t} = y_t;
   • the state equation implies y_{t+1} = φ y_t + ζ_t with ζ_t ~ NID(0, σ²);
   • this is the AR(1) model!
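The mapping can be checked numerically. A small sketch (NumPy assumed, illustrative φ and σ²) that simulates the state space recursions and verifies they reproduce the direct AR(1) recursion on the same disturbances:

```python
import numpy as np

# AR(1) in state space form: Z = 1, H = 0, T = phi, R = 1, Q = sigma^2.
phi, sigma2 = 0.8, 1.0
rng = np.random.default_rng(0)
n = 100
zeta = rng.normal(0.0, np.sqrt(sigma2), n)

alpha = 0.0                         # alpha_1 = 0 for simplicity
y_ss = np.zeros(n)
for t in range(n):
    y_ss[t] = alpha                 # observation equation: y_t = alpha_t (H = 0)
    alpha = phi * alpha + zeta[t]   # state equation: alpha_{t+1} = phi alpha_t + zeta_t

# direct AR(1) recursion with the same disturbances
y_ar = np.zeros(n)
for t in range(n - 1):
    y_ar[t + 1] = phi * y_ar[t] + zeta[t]
```

Both paths coincide exactly, since with H_t = 0 and Z_t = 1 the observation is the state itself.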
7. AR(2) in State Space Form
   Consider the AR(2) model y_{t+1} = φ_1 y_t + φ_2 y_{t-1} + ζ_t, in state space form:

       α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ~ NID(0, Q_t),
       y_t = Z_t α_t + ε_t,           ε_t ~ NID(0, H_t),

   with 2 × 1 state vector α_t and system matrices

       Z_t = (1  0),   H_t = 0,
       T_t = [ φ_1  1 ; φ_2  0 ],   R_t = [ 1 ; 0 ],   Q_t = σ².

   We have:
   • Z_t and H_t = 0 imply that α_{1t} = y_t;
   • the first state equation implies y_{t+1} = φ_1 y_t + α_{2t} + ζ_t with ζ_t ~ NID(0, σ²);
   • the second state equation implies α_{2,t+1} = φ_2 y_t.
8. MA(1) in State Space Form
   Consider the MA(1) model y_{t+1} = ζ_t + θ ζ_{t-1}, in state space form:

       α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ~ NID(0, Q_t),
       y_t = Z_t α_t + ε_t,           ε_t ~ NID(0, H_t),

   with 2 × 1 state vector α_t and system matrices

       Z_t = (1  0),   H_t = 0,
       T_t = [ 0  1 ; 0  0 ],   R_t = [ 1 ; θ ],   Q_t = σ².

   We have:
   • Z_t and H_t = 0 imply that α_{1t} = y_t;
   • the first state equation implies y_{t+1} = α_{2t} + ζ_t with ζ_t ~ NID(0, σ²);
   • the second state equation implies α_{2,t+1} = θ ζ_t.
9. General ARMA in State Space Form
   Consider the ARMA(2,1) model y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + ζ_t + θ ζ_{t-1}, in state space form:

       α_t = [ y_t ; φ_2 y_{t-1} + θ ζ_t ],
       Z_t = (1  0),   H_t = 0,
       T_t = [ φ_1  1 ; φ_2  0 ],   R_t = [ 1 ; θ ],   Q_t = σ².

   All ARIMA(p, d, q) models have a (non-unique) state space representation.
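As for the AR(1) case, the representation can be checked by simulation. A sketch (NumPy assumed, illustrative parameter values) that runs the ARMA(2,1) state space recursions and compares them with the direct ARMA recursion:

```python
import numpy as np

# Sketch: simulate the ARMA(2,1) state space form and check that it
# reproduces the direct ARMA recursion on the same disturbances.
phi1, phi2, theta, sigma = 0.5, 0.3, 0.4, 1.0
T = np.array([[phi1, 1.0], [phi2, 0.0]])   # T_t
R = np.array([1.0, theta])                 # R_t
Z = np.array([1.0, 0.0])                   # Z_t

rng = np.random.default_rng(2)
n = 200
zeta = rng.normal(0.0, sigma, n)
zeta[0] = 0.0                  # no pre-sample disturbance (alpha_1 = 0)

alpha = np.zeros(2)
y_ss = np.zeros(n)
for t in range(n):
    y_ss[t] = Z @ alpha                    # y_t = (1 0) alpha_t
    if t + 1 < n:
        alpha = T @ alpha + R * zeta[t + 1]

# direct ARMA(2,1) recursion on the same disturbances
y_ar = np.zeros(n)
for t in range(1, n):
    prev2 = y_ar[t - 2] if t >= 2 else 0.0
    y_ar[t] = phi1 * y_ar[t - 1] + phi2 * prev2 + zeta[t] + theta * zeta[t - 1]
```
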
10. UC models in State Space Form
    State space model: α_{t+1} = T_t α_t + R_t ζ_t, y_t = Z_t α_t + ε_t.

    LL model µ_{t+1} = µ_t + η_t and y_t = µ_t + ε_t:

        α_t = µ_t,   T_t = 1,   R_t = 1,   Q_t = σ²_η,   Z_t = 1,   H_t = σ²_ε.

    LLT model µ_{t+1} = µ_t + β_t + η_t, β_{t+1} = β_t + ξ_t and y_t = µ_t + ε_t:

        α_t = [ µ_t ; β_t ],   T_t = [ 1  1 ; 0  1 ],   R_t = [ 1  0 ; 0  1 ],
        Q_t = [ σ²_η  0 ; 0  σ²_ξ ],   Z_t = (1  0),   H_t = σ²_ε.
11. UC models in State Space Form
    State space model: α_{t+1} = T_t α_t + R_t ζ_t, y_t = Z_t α_t + ε_t.

    LLT model with season: µ_{t+1} = µ_t + β_t + η_t, β_{t+1} = β_t + ξ_t, S(L)γ_{t+1} = ω_t and y_t = µ_t + γ_t + ε_t:

        α_t = ( µ_t  β_t  γ_t  γ_{t-1}  γ_{t-2} )′,

        T_t = [ 1  1  0  0  0 ;
                0  1  0  0  0 ;
                0  0 −1 −1 −1 ;
                0  0  1  0  0 ;
                0  0  0  1  0 ],

        R_t = [ 1  0  0 ;
                0  1  0 ;
                0  0  1 ;
                0  0  0 ;
                0  0  0 ],

        Q_t = [ σ²_η  0  0 ; 0  σ²_ξ  0 ; 0  0  σ²_ω ],

        Z_t = (1  0  1  0  0),   H_t = σ²_ε.
12. Kalman Filter
    • The Kalman filter calculates the mean and variance of the unobserved state, given the observations.
    • The state is Gaussian: the complete distribution is characterized by the mean and variance.
    • The filter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained.
    • To start the recursion, we need a_1 and P_1, which we assumed given.
    • There are various ways to initialize when a_1 and P_1 are unknown, which we will not discuss here.
13. Kalman Filter
    The unobserved state α_t can be estimated from the observations with the Kalman filter:

        v_t = y_t − Z_t a_t,
        F_t = Z_t P_t Z_t′ + H_t,
        K_t = T_t P_t Z_t′ F_t⁻¹,
        a_{t+1} = T_t a_t + K_t v_t,
        P_{t+1} = T_t P_t T_t′ + R_t Q_t R_t′ − K_t F_t K_t′,

    for t = 1, …, n, starting with given values for a_1 and P_1. Writing Y_t = {y_1, …, y_t},

        a_{t+1} = E(α_{t+1} | Y_t),   P_{t+1} = Var(α_{t+1} | Y_t).
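The five recursions above translate almost line by line into code. A minimal sketch (NumPy assumed; time-invariant system matrices and a univariate observation, which is all the examples here need), applied to the local level model:

```python
import numpy as np

def kalman_filter(y, Z, H, T, R, Q, a1, P1):
    """Kalman filter for a univariate, time-invariant linear Gaussian
    state space model. Returns a_t = E(alpha_t | Y_{t-1}), P_t,
    prediction errors v_t and their variances F_t."""
    n, m = len(y), len(a1)
    a, P = a1.astype(float), P1.astype(float)
    a_pred = np.zeros((n, m))
    P_pred = np.zeros((n, m, m))
    v, F = np.zeros(n), np.zeros(n)
    for t in range(n):
        a_pred[t], P_pred[t] = a, P
        v[t] = y[t] - Z @ a                  # v_t = y_t - Z a_t
        F[t] = Z @ P @ Z + H                 # F_t = Z P_t Z' + H
        K = T @ P @ Z / F[t]                 # K_t = T P_t Z' / F_t
        a = T @ a + K * v[t]                 # a_{t+1}
        P = T @ P @ T.T + R @ Q @ R.T - np.outer(K, K) * F[t]   # P_{t+1}
    return a_pred, P_pred, v, F

# Apply to the local level model (Z = T = R = 1, H = Q = 1):
rng = np.random.default_rng(0)
n = 50
mu = np.cumsum(rng.normal(0.0, 1.0, n))      # random-walk level
y = mu + rng.normal(0.0, 1.0, n)             # observations
Z = np.array([1.0]); H = 1.0
T = np.array([[1.0]]); R = np.array([[1.0]]); Q = np.array([[1.0]])
a_pred, P_pred, v, F = kalman_filter(y, Z, H, T, R, Q,
                                     a1=np.array([0.0]),
                                     P1=np.array([[10.0]]))
```

For this local level case the variance recursion converges to a steady state P* satisfying P* = P* + Q − P*²/(P* + H); with Q = H = 1 that fixed point is the golden ratio (1 + √5)/2.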
14. Lemma I: multivariate Normal regression
    Suppose x and y are jointly Normally distributed vectors with means µ_x = E(x) and µ_y = E(y), variance matrices Σ_xx and Σ_yy, and covariance matrix Σ_xy. Then

        E(x|y) = µ_x + Σ_xy Σ_yy⁻¹ (y − µ_y),
        Var(x|y) = Σ_xx − Σ_xy Σ_yy⁻¹ Σ_xy′.

    Define e = x − E(x|y). Then

        Var(e) = Var([x − µ_x] − Σ_xy Σ_yy⁻¹ [y − µ_y]) = Σ_xx − Σ_xy Σ_yy⁻¹ Σ_xy′ = Var(x|y),

    and

        Cov(e, y) = Cov([x − µ_x] − Σ_xy Σ_yy⁻¹ [y − µ_y], y) = Σ_xy − Σ_xy = 0.
15. Lemma II: an extension
    The proof of the Kalman filter uses a slight extension of the lemma for multivariate Normal regression theory. Suppose x, y and z are jointly Normally distributed vectors with E(z) = 0 and Σ_yz = 0. Then

        E(x|y, z) = E(x|y) + Σ_xz Σ_zz⁻¹ z,
        Var(x|y, z) = Var(x|y) − Σ_xz Σ_zz⁻¹ Σ_xz′.

    Can you give a proof of Lemma II?
    Hint: apply Lemma I for x and y* = (y′, z′)′.
16. Kalman Filter: derivation
    State space model: α_{t+1} = T_t α_t + R_t ζ_t, y_t = Z_t α_t + ε_t.
    • Writing Y_t = {y_1, …, y_t}, define a_{t+1} = E(α_{t+1} | Y_t) and P_{t+1} = Var(α_{t+1} | Y_t);
    • the prediction error is

          v_t = y_t − E(y_t | Y_{t−1}) = y_t − E(Z_t α_t + ε_t | Y_{t−1}) = y_t − Z_t E(α_t | Y_{t−1}) = y_t − Z_t a_t;

    • it follows that v_t = Z_t (α_t − a_t) + ε_t and E(v_t) = 0;
    • the prediction error variance is F_t = Var(v_t) = Z_t P_t Z_t′ + H_t.
17. Kalman Filter: derivation
    State space model: α_{t+1} = T_t α_t + R_t ζ_t, y_t = Z_t α_t + ε_t.
    • We have Y_t = {Y_{t−1}, y_t} = {Y_{t−1}, v_t} and E(v_t y_{t−j}) = 0 for j = 1, …, t − 1;
    • apply the lemma E(x|y, z) = E(x|y) + Σ_xz Σ_zz⁻¹ z with x = α_{t+1}, y = Y_{t−1} and z = v_t = Z_t (α_t − a_t) + ε_t;
    • it follows that E(α_{t+1} | Y_{t−1}) = T_t a_t;
    • further, E(α_{t+1} v_t′) = T_t E(α_t v_t′) + R_t E(ζ_t v_t′) = T_t P_t Z_t′;
    • applying the lemma, we obtain the state update

          a_{t+1} = E(α_{t+1} | Y_{t−1}, y_t) = T_t a_t + T_t P_t Z_t′ F_t⁻¹ v_t = T_t a_t + K_t v_t,

      with K_t = T_t P_t Z_t′ F_t⁻¹.
18. Kalman Filter: derivation
    • Our best prediction of y_t is Z_t a_t. When the actual observation arrives, we calculate the prediction error v_t = y_t − Z_t a_t and its variance F_t = Z_t P_t Z_t′ + H_t. The new best estimate of the state mean is based on both the old estimate a_t and the new information v_t:

          a_{t+1} = T_t a_t + K_t v_t,

      and similarly for the variance:

          P_{t+1} = T_t P_t T_t′ + R_t Q_t R_t′ − K_t F_t K_t′.

      Can you derive the updating equation for P_{t+1}?
    • The Kalman gain K_t = T_t P_t Z_t′ F_t⁻¹ is the optimal weighting matrix for the new evidence.
    • You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2).
19. Kalman Filter Illustration
    [Figure: Nile data, 1870–1970, four panels: observations with the filtered level a_t; state variance P_t; prediction errors v_t; prediction error variance F_t.]
20. Prediction and Filtering
    This version of the Kalman filter computes the predictions of the states directly:

        a_{t+1} = E(α_{t+1} | Y_t),   P_{t+1} = Var(α_{t+1} | Y_t).

    The filtered estimates are defined by

        a_{t|t} = E(α_t | Y_t),   P_{t|t} = Var(α_t | Y_t),

    for t = 1, …, n. The Kalman filter can be "re-ordered" to compute both:

        a_{t|t} = a_t + M_t v_t,   P_{t|t} = P_t − M_t F_t M_t′,
        a_{t+1} = T_t a_{t|t},     P_{t+1} = T_t P_{t|t} T_t′ + R_t Q_t R_t′,

    with M_t = P_t Z_t′ F_t⁻¹, for t = 1, …, n and starting with a_1 and P_1. Compare M_t with K_t. Verify this version of the KF using the earlier derivations.
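The equivalence of the two orderings is easy to verify numerically. A sketch (NumPy assumed; local level model with unit variances, so Z_t = T_t = R_t = 1 and K_t = M_t):

```python
import numpy as np

# Check that the standard Kalman filter and the "re-ordered"
# filter-then-predict form give identical state predictions.
rng = np.random.default_rng(5)
n = 100
y = np.cumsum(rng.normal(0.0, 1.0, n)) + rng.normal(0.0, 1.0, n)

# standard (prediction) form: K_t = T_t P_t Z_t / F_t
a, P = 0.0, 1e7
a_pred = np.zeros(n)
for t in range(n):
    a_pred[t] = a
    v = y[t] - a
    F = P + 1.0
    K = P / F
    a = a + K * v
    P = P * (1.0 - K) + 1.0

# re-ordered form: filter step with M_t = P_t Z_t / F_t, then predict
a, P = 0.0, 1e7
a_pred2 = np.zeros(n)
for t in range(n):
    a_pred2[t] = a
    v = y[t] - a
    F = P + 1.0
    M = P / F
    a_filt = a + M * v          # a_{t|t} = a_t + M_t v_t
    P_filt = P - M * F * M      # P_{t|t} = P_t - M_t F_t M_t
    a = a_filt                  # a_{t+1} = T_t a_{t|t} (T = 1)
    P = P_filt + 1.0            # P_{t+1} = T P_{t|t} T' + R Q R'
```
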
21. Smoothing
    • The filter calculates the mean and variance conditional on Y_t;
    • the Kalman smoother calculates the mean and variance conditional on the full set of observations Y_n;
    • after the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back until the first:

          α̂_t = E(α_t | Y_n),   V_t = Var(α_t | Y_n),
          r_t = weighted sum of future innovations,   N_t = Var(r_t),   L_t = T_t − K_t Z_t.

    Starting with r_n = 0 and N_n = 0, the smoothing recursions are given by

        r_{t−1} = Z_t′ F_t⁻¹ v_t + L_t′ r_t,   N_{t−1} = Z_t′ F_t⁻¹ Z_t + L_t′ N_t L_t,
        α̂_t = a_t + P_t r_{t−1},              V_t = P_t − P_t N_{t−1} P_t.
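A self-contained sketch of the forward filter plus backward smoother for the local level model (NumPy assumed; scalar state, so Z_t = T_t = 1 and the recursions above simplify accordingly):

```python
import numpy as np

def llm_filter_smoother(y, sigma2_eps, sigma2_eta, a1=0.0, P1=1e7):
    """Kalman filter and smoother for the local level model.
    Forward pass stores a_t, P_t, v_t, F_t, K_t; the backward pass runs
    r_{t-1} = v_t/F_t + L_t r_t and N_{t-1} = 1/F_t + L_t^2 N_t."""
    n = len(y)
    a = np.zeros(n); P = np.zeros(n)
    v = np.zeros(n); F = np.zeros(n); K = np.zeros(n)
    at, Pt = a1, P1
    for t in range(n):                       # forward pass (T_t = Z_t = 1)
        a[t], P[t] = at, Pt
        v[t] = y[t] - at
        F[t] = Pt + sigma2_eps
        K[t] = Pt / F[t]
        at = at + K[t] * v[t]
        Pt = Pt * (1.0 - K[t]) + sigma2_eta
    alpha_hat = np.zeros(n); V = np.zeros(n)
    r, N = 0.0, 0.0                          # backward pass: r_n = N_n = 0
    for t in range(n - 1, -1, -1):
        L = 1.0 - K[t]                       # L_t = T_t - K_t Z_t
        r = v[t] / F[t] + L * r              # r_{t-1}
        N = 1.0 / F[t] + L * L * N           # N_{t-1}
        alpha_hat[t] = a[t] + P[t] * r       # smoothed mean
        V[t] = P[t] - P[t] * N * P[t]        # smoothed variance
    return alpha_hat, V

y = np.full(40, 5.0)                         # artificial flat series
alpha_hat, V = llm_filter_smoother(y, 1.0, 1.0)
```

On a flat series the smoothed level is (up to the tiny weight of the diffuse prior) a weighted average of identical observations, so it sits at the observed value throughout.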
22. Smoothing Illustration
    [Figure: Nile data, 1870–1970, four panels: observations with the smoothed state; smoothed state variance V_t; r_t; N_t.]
23. Filtering and Smoothing
    [Figure: Nile data, 1870–1970: observations with the filtered level and the smoothed level.]
24. Missing Observations
    Missing observations are very easy to handle in Kalman filtering:
    • suppose y_j is missing;
    • put v_j = 0, K_j = 0 and F_j = ∞ in the algorithm;
    • proceed with the further calculations as normal.
    The filter algorithm extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations.
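In code the v_j = 0, K_j = 0 convention means the update step degenerates to pure extrapolation. A sketch for the local level model (NumPy assumed; `np.nan` marks a missing observation):

```python
import numpy as np

def llm_filter_missing(y, sigma2_eps, sigma2_eta, a1=0.0, P1=1e7):
    """Local level Kalman filter where np.nan marks a missing y_t.
    For a missing observation we set v_t = 0 and K_t = 0 (F_t = inf),
    so the recursion reduces to a_{t+1} = a_t, P_{t+1} = P_t + sigma2_eta."""
    n = len(y)
    a = np.zeros(n + 1); P = np.zeros(n + 1)
    a[0], P[0] = a1, P1
    for t in range(n):
        if np.isnan(y[t]):                   # missing: extrapolate only
            a[t + 1] = a[t]
            P[t + 1] = P[t] + sigma2_eta
        else:                                # regular update
            v = y[t] - a[t]
            F = P[t] + sigma2_eps
            K = P[t] / F
            a[t + 1] = a[t] + K * v
            P[t + 1] = P[t] * (1.0 - K) + sigma2_eta
    return a[1:], P[1:]

y = np.array([1.0, 2.0, np.nan, np.nan, np.nan, 2.5])
a, P = llm_filter_missing(y, 1.0, 0.5)
```

Over the gap the state prediction stays flat while its variance grows by σ²_η per period, exactly the extrapolation-then-interpolation behaviour described above.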
25. Missing Observations
    [Figure: Nile data, 1870–1970, with a stretch of missing observations: filtered state a_t and state variance P_t.]
26. Missing Observations, Filter and Smoother
    [Figure: Nile data with missing observations, 1870–1970, four panels: filtered state and its variance P_t; smoothed state and its variance V_t.]
27. Forecasting
    Forecasting requires no extra theory: just treat the future observations as missing:
    • put v_j = 0, K_j = 0 and F_j = ∞ for j = n + 1, …, n + k;
    • proceed with the further calculations as normal;
    • the forecast for y_j is Z_j a_j.
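A sketch of this idea for the local level model (NumPy assumed): run the filter over the sample, then keep extrapolating for k more steps as if those observations were missing.

```python
import numpy as np

def llm_forecast(y, k, sigma2_eps, sigma2_eta, a1=0.0, P1=1e7):
    """Forecast a local level model k steps ahead by treating future
    observations as missing. Returns forecasts Z_j a_j (= a_j here)
    and the forecast error variances F_j = P_j + sigma2_eps."""
    at, Pt = a1, P1
    for obs in y:                            # standard filter pass
        v = obs - at
        F = Pt + sigma2_eps
        K = Pt / F
        at = at + K * v
        Pt = Pt * (1.0 - K) + sigma2_eta
    fc = np.zeros(k); Fvar = np.zeros(k)
    for j in range(k):                       # future y_j treated as missing
        fc[j] = at                           # v_j = 0, K_j = 0
        Fvar[j] = Pt + sigma2_eps            # forecast error variance
        Pt = Pt + sigma2_eta                 # P extrapolates
    return fc, Fvar

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(0.0, 1.0, 100)) + rng.normal(0.0, 1.0, 100)
fc, Fvar = llm_forecast(y, 5, 1.0, 1.0)
```

For the random-walk level the forecast path is flat at the last filtered level, while the forecast error variance grows linearly in the horizon.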
28. Forecasting
    [Figure: Nile data, 1870–2000: filtered state a_t extended with forecasts beyond the sample, and state variance P_t growing over the forecast period.]
29. Parameter Estimation
    The system matrices in a state space model typically depend on a parameter vector ψ. The model is completely Gaussian; we estimate ψ by maximum likelihood. The loglikelihood of a time series is

        log L = Σ_{t=1}^{n} log p(y_t | Y_{t−1}).

    In the state space model, p(y_t | Y_{t−1}) is a Gaussian density with mean Z_t a_t and variance F_t:

        log L = − (nN/2) log 2π − (1/2) Σ_{t=1}^{n} ( log |F_t| + v_t′ F_t⁻¹ v_t ),

    with v_t and F_t from the Kalman filter, and N the dimension of y_t. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L.
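A sketch of the prediction error decomposition in code (NumPy assumed; local level model, and a crude grid search standing in for a proper numerical optimiser, the true parameter values being illustrative):

```python
import numpy as np

def llm_loglik(y, sigma2_eps, sigma2_eta, a1=0.0, P1=1e7):
    """Prediction error decomposition: log L = sum_t log N(v_t; 0, F_t),
    with v_t and F_t from a local level Kalman filter (univariate, N = 1)."""
    at, Pt = a1, P1
    loglik = 0.0
    for obs in y:
        v = obs - at
        F = Pt + sigma2_eps
        loglik += -0.5 * (np.log(2.0 * np.pi) + np.log(F) + v * v / F)
        K = Pt / F
        at = at + K * v
        Pt = Pt * (1.0 - K) + sigma2_eta
    return loglik

# Maximise numerically: here a grid search over the signal-to-noise
# ratio q = sigma2_eta / sigma2_eps, with sigma2_eps fixed at 1.
rng = np.random.default_rng(7)
n = 500
mu = np.cumsum(rng.normal(0.0, np.sqrt(0.5), n))    # true q = 0.5
y = mu + rng.normal(0.0, 1.0, n)

qs = np.exp(np.linspace(np.log(0.01), np.log(10.0), 60))
lls = np.array([llm_loglik(y, 1.0, q) for q in qs])
q_hat = qs[np.argmax(lls)]
```

In practice one would hand `llm_loglik` to a numerical optimiser rather than a grid, but the grid keeps the sketch dependency-free.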
30. ML Estimate of Nile Data
    [Figure: Nile data, 1870–1970: observations with the estimated level for q = 0, q = 1000, and q = 0.0973 (the ML estimate).]
31. Diagnostics
    • Null hypothesis: the standardised residuals v_t / √F_t ~ NID(0, 1);
    • apply standard tests for Normality, heteroskedasticity and serial correlation;
    • a recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;
    • for model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC).
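A sketch of the residual-based Normality check (NumPy assumed; local level model with unit variances, and a Jarque-Bera type moment statistic as an example of a standard test):

```python
import numpy as np

# Standardised residuals v_t / sqrt(F_t) from a local level Kalman filter;
# under a correctly specified model they are NID(0, 1).
rng = np.random.default_rng(11)
n = 300
mu = np.cumsum(rng.normal(0.0, 1.0, n))     # simulate the model itself
y = mu + rng.normal(0.0, 1.0, n)

at, Pt = 0.0, 1e7                           # diffuse-ish initialisation
std_res = np.zeros(n)
for t in range(n):
    v = y[t] - at
    F = Pt + 1.0                            # F_t = P_t + sigma2_eps
    std_res[t] = v / np.sqrt(F)
    K = Pt / F
    at, Pt = at + K * v, Pt * (1.0 - K) + 1.0

std_res = std_res[1:]                       # drop the diffuse first residual
m1 = std_res.mean()
var = std_res.var()
skew = np.mean((std_res - m1) ** 3) / var ** 1.5
kurt = np.mean((std_res - m1) ** 4) / var ** 2
jb = (len(std_res) / 6.0) * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
```

Since the data are generated from the model itself, the standardised residuals should look like a NID(0, 1) sample: mean near 0, variance near 1, and a small Jarque-Bera statistic.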
32. Nile Data Residuals
    [Figure: four diagnostic panels for the Nile residuals: residual time plot; correlogram; estimated density N(s = 0.996) against the normal; QQ plot.]
33. Assignment 1 - Exercises 1 and 2
    1. Formulate the following models in state space form:
       • ARMA(3,1) process;
       • ARMA(1,3) process;
       • ARIMA(3,1,1) process;
       • AR(3) plus noise model;
       • Random Walk plus AR(3) process.
       Please discuss the initial conditions for α_t in all cases.
    2. Derive a Kalman filter for the local level model

           y_t = µ_t + ξ_t,   ξ_t ~ N(0, σ²_ξ),   ∆µ_{t+1} = η_t ~ N(0, σ²_η),

       with E(ξ_t η_t) = σ_ξη = 0 and E(ξ_t η_s) = 0 for all t, s with t ≠ s. Also discuss the problem of missing observations in this case.
34. Assignment 1 - Exercise 3
    3. Consider a time series y_t that is modelled by a state space model, and consider a time series vector of k exogenous variables X_t. The dependent time series y_t depends on the explanatory variables X_t in a linear way.
       • How would you introduce X_t as regression effects in the state space model?
       • Can you allow the transition matrix T_t to depend on X_t too?
       • Can you allow the transition matrix T_t to depend on y_t?
       For the last two questions, please also consider maximum likelihood estimation of the coefficients within the system matrices.
35. Assignment 1 - Computing work
    • Consider Chapter 2 of the Durbin and Koopman book.
    • There are 8 figures in Chapter 2.
    • Write computer code that can reproduce all these figures in Chapter 2, except Fig. 2.4.
    • Write a short documentation for the Ox program.