Time series forecasting with ARIMA
1. Time series forecasting ARIMA regARIMA
Statistical Data Analysis
9. Time series, part 1.
Evgeniy Riabenko
riabenko.e@gmail.com
2018
Evgeniy Riabenko SDA-9. Time series, part 1.
Time series forecasting
Time series: $y_1, \dots, y_T, \dots$, $y_t \in \mathbb{R}$, values of a feature measured at constant time intervals.
To forecast, we need to find a function $f_T$:
$$y_{T+d} \approx f_T(y_T, \dots, y_1, d) \equiv \hat{y}_{T+d|T},$$
where $d \in \{1, \dots, D\}$ and $D$ is the forecast horizon.
Prediction intervals
Example: in April 1997 there was a flood in Grand Forks, North Dakota.
The city was protected by a dam 51 feet high; according to the forecast,
the flood level should have been 49 feet; the actual flood was 54 feet high.
50000 citizens were evacuated, 75% of buildings damaged, total damages
$3.5 billion.
Historically the accuracy of the forecasts there was ±9 feet.
Regression
Simple idea: regression on time.
Residuals do not look like noise:
Wine sales in Australia
Sales in the adjacent months
Sales 1 month apart
Sales 2 months apart
Sales 1 year apart
Autocorrelation function (ACF)
Time series values are autocorrelated.
Autocorrelation:
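$$r_\tau = r_{y_t y_{t+\tau}} = \frac{\sum_{t=1}^{T-\tau} (y_t - \bar{y})(y_{t+\tau} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}, \quad \bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t.$$
$r_\tau \in [-1, 1]$, $\tau$ is the lag.
Testing if it is different from zero:
time series: $Y^T = Y_1, \dots, Y_T$;
null hypothesis: $H_0\colon r_\tau = 0$;
alternative: $H_1\colon r_\tau \neq 0$;
statistic: $T(Y^T) = \dfrac{r_\tau \sqrt{T - \tau - 2}}{\sqrt{1 - r_\tau^2}}$;
null distribution: $St(T - \tau - 2)$.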
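The autocorrelation and its test statistic can be computed directly from the definitions above. The following is a minimal sketch in plain Python; the function names and the toy series are illustrative assumptions, not part of the lecture.

```python
import math

def autocorrelation(y, lag):
    """Sample autocorrelation r_tau, following the formula above."""
    T = len(y)
    mean = sum(y) / T
    # Numerator runs over the T - lag pairs (y_t, y_{t+lag}).
    num = sum((y[t] - mean) * (y[t + lag] - mean) for t in range(T - lag))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

def autocorrelation_t_stat(y, lag):
    """Statistic r * sqrt(T - lag - 2) / sqrt(1 - r^2); requires |r| < 1."""
    T = len(y)
    r = autocorrelation(y, lag)
    return r * math.sqrt(T - lag - 2) / math.sqrt(1 - r ** 2)

# Hypothetical toy series with period 4, used only for illustration.
series = [1.0, 2.0, 3.0, 2.0] * 6
r4 = autocorrelation(series, 4)   # lag equal to the period: strongly positive
r2 = autocorrelation(series, 2)   # half a period: strongly negative
t4 = autocorrelation_t_stat(series, 4)
```

The statistic `t4` would then be compared with the quantiles of the Student distribution with $T - \tau - 2$ degrees of freedom.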
Correlogram:
Time series components
Trend: long-term level shift
Seasonality: cyclic level fluctuations with a fixed period
Cycle: fluctuations that are not of a fixed frequency (economic cycles, solar activity)
Error: unforecastable random component
STL decomposition:
Removing seasonality
Sometimes seasonality is removed to ease interpretation:
Calendar effects
Sometimes a time series can be simplified by taking into account the
non-uniform length of intervals:
Stationarity
Time series $y_1, \dots, y_T$ is stationary if, for every $s$, the distribution of $y_t, \dots, y_{t+s}$ does
not depend on $t$, i.e. the properties of the series do not change over time.
trend ⇒ nonstationarity
seasonality ⇒ nonstationarity
cycle ⇏ nonstationarity (we cannot be sure where the maximums and
minimums would be)
Nonstationary because of seasonality:
Nonstationary because of trend:
Nonstationary because of changing variance:
Stationary:
Residuals
Residuals are the forecast errors:
$$\hat{\varepsilon}_t = y_t - \hat{y}_t.$$
(from http://robjhyndman.com/talks/Eindhoven2017.pdf)
Necessary properties of error
Unbiasedness:
The absence of autocorrelation:
Stationarity:
Desirable properties of error
Normality:
Testing the error
unbiasedness: Wilcoxon test
stationarity: visual analysis, KPSS test
uncorrelatedness: correlogram, Ljung-Box test
normality: q-q plot, Shapiro-Wilk test
KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test
forecast errors: $\varepsilon^T = \varepsilon_1, \dots, \varepsilon_T$
null hypothesis: $H_0\colon \varepsilon$ is stationary
alternative: $H_1\colon \varepsilon_t = \alpha \varepsilon_{t-1}$
statistic: $KPSS(\varepsilon^T) = \dfrac{1}{T^2} \sum_{i=1}^{T} \left( \sum_{t=1}^{i} \varepsilon_t \right)^2 \Big/ \lambda^2$
null distribution: tabulated
Other stationarity tests: Dickey-Fuller, Phillips-Perron, and many, many more
(e.g., Patterson K. Unit root tests in time series, volume 1: key concepts and
problems, 2011).
Ljung-Box test
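forecast errors: $\varepsilon^T = \varepsilon_1, \dots, \varepsilon_T$
null hypothesis: $H_0\colon r_1 = \dots = r_L = 0$
alternative: $H_1\colon H_0$ is false
statistic: $Q(\varepsilon^T) = T(T + 2) \sum_{\tau=1}^{L} \dfrac{r_\tau^2}{T - \tau}$
null distribution: $\chi^2_{L-K}$, where $K$ is the number of fitted parameters
of the forecasting model.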
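The Ljung-Box statistic follows directly from the sample autocorrelations of the residuals. Below is a minimal sketch in plain Python; the function names and the alternating toy residuals are illustrative assumptions, and the p-value step (a $\chi^2_{L-K}$ tail probability) is deliberately left out.

```python
def autocorr(e, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    T = len(e)
    mean = sum(e) / T
    num = sum((e[t] - mean) * (e[t + lag] - mean) for t in range(T - lag))
    den = sum((v - mean) ** 2 for v in e)
    return num / den

def ljung_box_q(e, max_lag):
    """Q = T (T + 2) * sum_{tau=1..L} r_tau^2 / (T - tau)."""
    T = len(e)
    return T * (T + 2) * sum(
        autocorr(e, tau) ** 2 / (T - tau) for tau in range(1, max_lag + 1)
    )

# Hypothetical residuals that alternate in sign: strongly autocorrelated.
residuals = [1.0, -1.0] * 10
q1 = ljung_box_q(residuals, max_lag=1)
```

A large `q1` relative to the $\chi^2_{L-K}$ quantile indicates that the residuals still carry autocorrelation the model has not captured.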
Variance stabilizing transformation
If data shows variation that increases or decreases with the level of the series,
then a VST can be useful.
Logarithmic transformation often works:
Box-Cox transformation:
$$y'_t = \begin{cases} \ln y_t, & \lambda = 0, \\ \left( y_t^\lambda - 1 \right) / \lambda, & \lambda \neq 0. \end{cases}$$
Forecasts for the transformed series should be transformed back to the original
scale:
$$\hat{y}_t = \begin{cases} \exp\left( \hat{y}'_t \right), & \lambda = 0, \\ \left( \lambda \hat{y}'_t + 1 \right)^{1/\lambda}, & \lambda \neq 0. \end{cases}$$
If some $y_t \leq 0$, we need to add a constant to the series before applying
the Box-Cox transformation
$\lambda$ can be rounded for better interpretability
A VST usually has a small effect on point forecasts and a large effect on prediction
intervals
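The forward and inverse transformations can be sketched in a few lines of plain Python. The function names are illustrative assumptions; the code assumes all values are strictly positive, as required above.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a list of positive values."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

def inv_box_cox(z, lam):
    """Map transformed values (e.g. forecasts) back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in z]
    return [(lam * v + 1) ** (1 / lam) for v in z]

# Hypothetical values, used only to illustrate the round trip.
original = [1.0, 4.0, 9.0]
transformed = box_cox(original, 0.5)
restored = inv_box_cox(transformed, 0.5)
```

Applying `inv_box_cox` to forecasts made on the transformed scale recovers forecasts on the original scale, for the same $\lambda$ that was used in the forward transform.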
Differencing
Differencing a time series means computing the differences between consecutive
observations:
$$y'_t = y_t - y_{t-1}$$
stabilizes the mean level of the series and removes trend
can be applied several times
KPSS test: p < 0.01 for the original series, p > 0.1 after differencing.
Seasonal differencing
Seasonal differencing means computing the differences between every observation
and the last observation from the same season:
$$y'_t = y_t - y_{t-s}.$$
removes seasonality
seasonal and ordinary differencing can be applied in any order
if the series has prominent seasonality, it is recommended to start with
seasonal differencing
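Both kinds of differencing are the same operation with a different lag. The sketch below, in plain Python, uses a hypothetical series with a linear trend and a period-4 seasonal pattern; the function name and the data are assumptions made for illustration.

```python
def difference(y, lag=1):
    """Return y_t - y_{t-lag}; lag=1 is ordinary, lag=s is seasonal differencing."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Hypothetical series: linear trend 2*t plus a seasonal pattern of period 4.
season = [0, 5, 1, 3]
y = [2 * t + season[t % 4] for t in range(12)]

seasonal_diff = difference(y, lag=4)          # seasonality removed, constant left
both = difference(seasonal_diff, lag=1)       # remaining level removed
other_order = difference(difference(y, lag=1), lag=4)
```

Here `both` and `other_order` coincide, which illustrates that the two operations can be applied in either order.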
KPSS test:
p < 0.01 for the original series,
p < 0.01 after log,
p > 0.1 after seasonal differencing.
KPSS test:
p < 0.01 for the original series,
p < 0.01 after log,
p = 0.0355 after seasonal differencing,
p > 0.1 after one more differencing.
Autoregression
What if we regress $y$ on its own values in the past?
$$y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$
Autoregressive model of order $p$ (AR($p$)):
$y_t$ is a linear combination of the last $p$ values of the variable and a random noise
component.
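For the simplest case, AR(1), regressing the series on its own previous value is ordinary simple linear regression. The sketch below simulates such a series and recovers the coefficients by least squares; the function names, the seed, and the parameter values are assumptions chosen for illustration.

```python
import random

def simulate_ar1(alpha, phi, n, sigma=1.0, seed=0):
    """Generate y_t = alpha + phi * y_{t-1} + eps_t with Gaussian noise."""
    rng = random.Random(seed)
    y = [alpha / (1 - phi)]  # start at the stationary mean (needs |phi| < 1)
    for _ in range(n - 1):
        y.append(alpha + phi * y[-1] + rng.gauss(0.0, sigma))
    return y

def fit_ar1_ols(y):
    """Least-squares fit of y_t on y_{t-1}; returns (alpha_hat, phi_hat)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi_hat = sxz / sxx
    return mz - phi_hat * mx, phi_hat

alpha_hat, phi_hat = fit_ar1_ols(simulate_ar1(alpha=1.0, phi=0.6, n=5000))
```

With a long simulated series, `phi_hat` should land close to the value used in the simulation.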
Moving average
Let's generate noise εt i.i.d. over time:
Averages of 2 consecutive time points:
Averages of 3 consecutive time points:
Averages of 4 consecutive time points:
Generalization with weights:
$$y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$
Moving average model of order $q$ (MA($q$)):
$y_t$ is a linear combination of the last $q$ noise components.
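The MA($q$) formula can be applied to any noise sequence. The sketch below is a plain-Python illustration; the function name and the small noise lists are assumptions, chosen so the arithmetic is easy to follow.

```python
def moving_average_process(eps, thetas, alpha=0.0):
    """Build y_t = alpha + eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q}.

    The first q points are skipped because their past noise is not available.
    """
    q = len(thetas)
    return [
        alpha + eps[t] + sum(thetas[j] * eps[t - 1 - j] for j in range(q))
        for t in range(q, len(eps))
    ]

# theta_1 = 1 gives eps_t + eps_{t-1}: twice the average of 2 consecutive points.
ma1 = moving_average_process([1.0, 2.0, 3.0, 4.0], thetas=[1.0])
ma2 = moving_average_process([4.0, 2.0, 1.0, 0.0], thetas=[0.5, 0.25])
```

In practice `eps` would be i.i.d. noise, for example draws from `random.gauss`; fixed numbers are used here only to make the result checkable by hand.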
ARMA (Autoregressive moving average)
ARMA($p, q$) model:
$$y_t = \alpha + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$
Wold's theorem: every stationary time series can be approximated by an
ARMA($p, q$) model with any predetermined accuracy.
ARIMA (Autoregressive integrated moving average)
ARIMA($p, d, q$) is ARMA($p, q$) for a series that has been differenced $d$ times.
Seasonal ARMA/ARIMA
Say a time series has seasonality with period $S$.
Take ARMA($p, q$):
$$y_t = \alpha + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$$
and add the last $P$ seasonal autoregressive components:
$$+ \phi_S y_{t-S} + \phi_{2S} y_{t-2S} + \dots + \phi_{PS} y_{t-PS}$$
and the last $Q$ seasonal moving average components:
$$+ \theta_S \varepsilon_{t-S} + \theta_{2S} \varepsilon_{t-2S} + \dots + \theta_{QS} \varepsilon_{t-QS}.$$
This is the SARMA($p, q$) × ($P, Q$) model.
SARIMA($p, d, q$) × ($P, D, Q$) is SARMA($p, q$) × ($P, Q$) for a series that has
been differenced $d$ times ordinarily and $D$ times seasonally.
Often called just ARIMA.
Fitting the model
Parameters to tune:
α, φ, θ
d, D
q, Q
p, P
α, φ, θ:
If the rest is fixed, the regression coefficients are obtained by OLS.
To estimate θ, the error component is pre-estimated with the residuals from
an autoregression with small p.
If the noise is white, the estimates are MLE.
d, D:
Orders of differencing are selected such that the resulting series is
stationary
Again: if the seasonality is prominent, it is better to start with seasonal
differencing
The less we difference, the smaller the variance of the forecast
q, Q, p, P
Hyperparameters cannot be selected by the maximum likelihood principle:
the likelihood $L$ always grows with q, Q, p, P
To compare models with different numbers of parameters, one can use
Akaike's information criterion:
$$AIC = -2 \log L + 2k,$$
where $k = P + Q + p + q + 1$ is the number of parameters in the model
Initial approximations for q, Q, p, P can be selected from autocorrelations
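Comparing candidates by AIC is a one-line computation once each model's log-likelihood is known. In the sketch below the log-likelihood values are hypothetical numbers invented for illustration, not results from any fitted model; the function names are assumptions as well.

```python
def n_params(p, q, P, Q):
    """k = P + Q + p + q + 1, as defined above."""
    return P + Q + p + q + 1

def aic(log_likelihood, k):
    """AIC = -2 log L + 2k; smaller is better."""
    return -2.0 * log_likelihood + 2.0 * k

# Hypothetical candidates: (p, q, P, Q) -> made-up log-likelihood.
candidates = {
    (1, 1, 0, 0): -100.0,
    (2, 2, 0, 0): -99.5,   # slightly higher likelihood, two more parameters
}
scores = {
    orders: aic(loglik, n_params(*orders))
    for orders, loglik in candidates.items()
}
best = min(scores, key=scores.get)
```

The larger model has the higher likelihood, as expected, but the penalty $2k$ outweighs the gain, so the smaller model wins on AIC.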
q, Q
$Q \cdot S$: number of the last significant seasonal lag (here 1·12).
$q$: number of the last significant nonseasonal lag (here 2).
$q$ should be less than $S$ if $Q > 0$
p, P
Partial autocorrelation:
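$$\phi_{hh} = \begin{cases} r(y_{t+1}, y_t), & h = 1, \\ r(y_{t+h} - \hat{y}_{t+h},\; y_t - \hat{y}_t), & h \geq 2, \end{cases}$$
where $\hat{y}_{t+h}$ and $\hat{y}_t$ are the fitted regression estimates of $y_{t+h}$ and $y_t$ on
$y_{t+1}, y_{t+2}, \dots, y_{t+h-1}$:
$$\hat{y}_t = \beta_1 y_{t+1} + \beta_2 y_{t+2} + \dots + \beta_{h-1} y_{t+h-1},$$
$$\hat{y}_{t+h} = \beta_1 y_{t+h-1} + \beta_2 y_{t+h-2} + \dots + \beta_{h-1} y_{t+1}.$$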
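One common way to obtain partial autocorrelations in practice is the Durbin-Levinson recursion, which works from the autocorrelations $r_1, \dots, r_H$ rather than from explicit regressions. This is a swapped-in technique, not the regression-based definition above; for a stationary series the two agree in theory, while sample estimates may differ slightly. The function name is an assumption.

```python
def pacf_from_acf(r):
    """Durbin-Levinson recursion.

    r is [1, r_1, ..., r_H]; returns [phi_11, phi_22, ..., phi_HH].
    """
    H = len(r) - 1
    pacf = []
    prev = []  # prev[j] holds phi_{h-1, j+1}
    for h in range(1, H + 1):
        if h == 1:
            phi_hh = r[1]
        else:
            num = r[h] - sum(prev[j] * r[h - 1 - j] for j in range(h - 1))
            den = 1.0 - sum(prev[j] * r[j + 1] for j in range(h - 1))
            phi_hh = num / den
        # Update the coefficients of the order-h autoregression.
        prev = [prev[j] - phi_hh * prev[h - 2 - j] for j in range(h - 1)] + [phi_hh]
        pacf.append(phi_hh)
    return pacf

# Theoretical ACF of an AR(1) process with phi = 0.5: r_h = 0.5 ** h.
pacf = pacf_from_acf([1.0, 0.5, 0.25, 0.125])
```

For an AR(1) autocorrelation function the partial autocorrelation is nonzero only at lag 1, which is the cut-off pattern used to choose $p$.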
$P \cdot S$: number of the last significant seasonal lag (here 0).
$p$: number of the last significant nonseasonal lag (here 11).
$p$ should be less than $S$ if $P > 0$
Forecasting with ARIMA model
1. Look at the plot
2. Apply a variance stabilizing transformation if necessary
3. Select the orders of differencing d and D
4. Initialize p, q, P, Q by ACF/PACF analysis
5. Fit candidate models, compare their AIC values, select the winner
6. Check the residuals
Forecasting
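$$y_t = \hat{\alpha} + \hat{\phi}_1 y_{t-1} + \dots + \hat{\phi}_p y_{t-p} + \varepsilon_t + \hat{\theta}_1 \varepsilon_{t-1} + \dots + \hat{\theta}_q \varepsilon_{t-q}$$
Replace $t$ with $T + 1$:
$$\hat{y}_{T+1|T} = \hat{\alpha} + \hat{\phi}_1 y_T + \dots + \hat{\phi}_p y_{T+1-p} + \varepsilon_{T+1} + \hat{\theta}_1 \varepsilon_T + \dots + \hat{\theta}_q \varepsilon_{T+1-q}$$
Replace future errors with zeroes:
$$\hat{y}_{T+1|T} = \hat{\alpha} + \hat{\phi}_1 y_T + \dots + \hat{\phi}_p y_{T+1-p} + \hat{\theta}_1 \varepsilon_T + \dots + \hat{\theta}_q \varepsilon_{T+1-q}$$
Replace past errors with residuals:
$$\hat{y}_{T+1|T} = \hat{\alpha} + \hat{\phi}_1 y_T + \dots + \hat{\phi}_p y_{T+1-p} + \hat{\theta}_1 \hat{\varepsilon}_T + \dots + \hat{\theta}_q \hat{\varepsilon}_{T+1-q}$$
In the $\hat{y}_{T+2|T}$ formula there is an unknown value $y_{T+1}$:
$$\hat{y}_{T+2|T} = \hat{\alpha} + \hat{\phi}_1 y_{T+1} + \dots + \hat{\phi}_p y_{T+2-p} + \hat{\theta}_1 \hat{\varepsilon}_{T+1} + \dots + \hat{\theta}_q \hat{\varepsilon}_{T+2-q}$$
Such values should be replaced with their forecasts ($y_{T+1} \to \hat{y}_{T+1|T}$).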
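The substitution rules above translate into a short loop: future errors become zero and unknown future values become their own forecasts. The sketch below is plain Python for a non-seasonal ARMA($p, q$) with already estimated coefficients; the function name and the numbers are assumptions made for illustration.

```python
def arma_forecast(y, resid, alpha, phis, thetas, horizon):
    """Iterated forecasts y_hat_{T+1|T}, ..., y_hat_{T+horizon|T}.

    y      observed values (at least p of them)
    resid  residuals of the fitted model (at least q of them)
    """
    values = list(y)      # observations, extended with forecasts as we go
    errors = list(resid)  # residuals, extended with zeros as we go
    forecasts = []
    for _ in range(horizon):
        ar = sum(phi * values[-1 - i] for i, phi in enumerate(phis))
        ma = sum(theta * errors[-1 - j] for j, theta in enumerate(thetas))
        f = alpha + ar + ma
        forecasts.append(f)
        values.append(f)    # unknown future value replaced by its forecast
        errors.append(0.0)  # unknown future error replaced by zero
    return forecasts

# Hypothetical ARMA(1, 1): alpha = 1, phi_1 = 0.5, theta_1 = 0.4.
fc = arma_forecast(y=[2.0, 4.0], resid=[0.1, 0.5],
                   alpha=1.0, phis=[0.5], thetas=[0.4], horizon=3)
```

The first forecast uses the last residual; from the second step on, the moving average part vanishes for this MA order and the forecasts decay toward the series mean.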
Prediction intervals
If the error is Gaussian and stationary, prediction intervals can be calculated
from analytical formulas. E.g., for the next-step forecast the 95% interval is
$$\hat{y}_{T+1|T} \pm 1.96 \hat{\sigma}_\varepsilon.$$
If normality and/or stationarity are rejected, prediction intervals can be
simulated.
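The next-step interval only needs the point forecast and an estimate of the residual standard deviation. In the sketch below, $\hat{\sigma}_\varepsilon$ is estimated as the root mean square of the residuals, which is an assumption of this example (it presumes unbiased residuals and ignores degrees-of-freedom corrections); the function name and the numbers are illustrative.

```python
import math

def next_step_interval(forecast, residuals, z=1.96):
    """Return (lower, upper) = forecast -/+ z * sigma_hat.

    sigma_hat is the root mean square of the residuals (an assumption here).
    """
    sigma_hat = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    return forecast - z * sigma_hat, forecast + z * sigma_hat

# Hypothetical forecast and residuals, used only for illustration.
lower, upper = next_step_interval(10.0, [1.0, -1.0, 1.0, -1.0])
```

With `z=1.96` this gives the 95% interval under the Gaussian assumption; a different coverage would need a different normal quantile.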
auto.arima
One function to select and fit an ARIMA model:
auto.arima(y, d=NA, D=NA, max.p=5, max.q=5, max.P=2, max.Q=2,
           max.order=5, max.d=2, max.D=1, start.p=2, start.q=2,
           start.P=1, start.Q=1, stationary=FALSE, seasonal=TRUE,
           ic=c("aicc", "aic", "bic"), stepwise=TRUE, trace=FALSE,
           approximation=(length(x)>150 | frequency(x)>12),
           truncate=NULL, xreg=NULL, test=c("kpss", "adf", "pp"),
           seasonal.test=c("seas", "ocsb", "hegy", "ch"),
           allowdrift=TRUE, allowmean=TRUE, lambda=NULL,
           biasadj=FALSE, parallel=FALSE, num.cores=2, ...)
One function to forecast:
forecast(object, h=ifelse(frequency(object)>1, 2*frequency(object), 10),
         level=c(80, 95), fan=FALSE, robust=FALSE, lambda=NULL,
         biasadj=FALSE, find.frequency=FALSE,
         allow.multiplicative.trend=FALSE, model=NULL, ...)
Holidays
Daily electricity consumption in Turkey:
Drops correspond to Islamic holidays (based on the Islamic Hijri calendar, whose
year is approximately 11 days shorter than that of the Gregorian calendar)
SARIMAX/regARIMA
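$$y_t = \sum_{j=1}^{k} \beta_j x_{jt} + z_t,$$
$$z_t = \alpha + \phi_1 z_{t-1} + \dots + \phi_p z_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \phi_S z_{t-S} + \dots + \phi_{PS} z_{t-PS} + \theta_S \varepsilon_{t-S} + \dots + \theta_{QS} \varepsilon_{t-QS} + \varepsilon_t.$$
Estimation: https://otexts.org/fpp2/estimation.html
The regressors are passed through the xreg parameter in auto.arima and Arima.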
Reference
Hyndman R.J., Athanasopoulos G. Forecasting: principles and practice.
OTexts, https://www.otexts.org/book/fpp2