Indaba Malawi workshop on basic approaches to time series data, including ARIMA and SSA models. The worked example in R applies the Rssa package and base R ARIMA models to historical agricultural data from Malawi.
1. An introduction to time series data with R and Malawian agricultural data
COLLEEN M. FARRELLY, MACHINE LEARNING LEAD
2. Who Am I?
Consulting machine learning lead, Mpuza
Industry researcher in topological data analysis, natural language processing, and time series analytics
Co-author of The Shape of Data (No Starch Press)
4. Time Dependency Caveat…
Future system behavior depends on current and past system states…
◦ Data points are not independent
◦ Limits the use of machine learning methods that assume independent observations
◦ Limits the accuracy of far-off predictions
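The dependence between time points can be measured directly. The following is a minimal sketch in plain Python of the sample autocorrelation at a chosen lag; the function name and both toy series are hypothetical illustrations, not the Malawi data. Values near +1 or -1 mean neighbouring observations are strongly related, so they cannot be treated as independent.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total squared deviation from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: co-movement between each value and the value `lag` steps earlier.
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Hypothetical series with a steady upward trend.
trend = [float(t) for t in range(20)]
# Hypothetical series that flips sign at every step.
alternating = [(-1.0) ** t for t in range(20)]

print(round(autocorrelation(trend, 1), 3))        # strongly positive
print(round(autocorrelation(alternating, 1), 3))  # strongly negative
```

Shuffling the rows of either series would destroy this structure, which is why random train/test splits are usually avoided for time series.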
5. Analyzing Time Series Data: ARIMA
Moving averages
◦ Many types of models are based on averages over time
◦ Many add an autoregressive piece to account for correlations across time periods
◦ Prediction is based on autoregression and moving averages
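To make the autoregressive idea concrete, here is a minimal sketch in plain Python of an ARIMA(1,1,0)-style forecast: difference the series once, fit a one-lag autoregression by least squares, then add the predicted differences back onto the last observed level. It deliberately leaves out the moving-average term and uses simple least squares, so it is an illustration of the mechanics rather than a full ARIMA fit. All function names and the toy series are hypothetical.

```python
def difference(series):
    """First difference: the 'integrated' step of ARIMA with d = 1."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def fit_ar1(series):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1] + noise."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast `steps` values: AR(1) on the differences, then re-integrate."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predicted next change
        level += last_diff               # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical series whose changes follow d[t] = 1 + 0.5 * d[t-1] exactly.
changes = [4.0]
for _ in range(9):
    changes.append(1.0 + 0.5 * changes[-1])
series = [10.0]
for d in changes:
    series.append(series[-1] + d)

print(fit_ar1(difference(series)))
print(forecast_arima_110(series, 3))
```

Because each forecast feeds the next one, errors compound with the horizon, which is one reason far-off predictions are less reliable.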
6. Analyzing Time Series Data: SSA
Another approach is to decompose the time series (singular spectrum analysis):
◦ Embed the time series in a trajectory matrix of lagged windows
◦ Perform spectral decomposition (singular value decomposition…)
◦ Group eigentriples and average across the matrix anti-diagonals
◦ Linear prediction of future time periods
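The four SSA steps can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not the Rssa implementation: the function names, the window length, and the choice of components are all assumptions made for the example, and the forecast uses the standard linear recurrence built from the selected left singular vectors.

```python
import numpy as np


def _left_vectors(x, window):
    """Embed the series and return the trajectory matrix with its SVD."""
    k = len(x) - window + 1
    # Embed: column j of the trajectory matrix holds x[j : j + window].
    trajectory = np.column_stack([x[j:j + window] for j in range(k)])
    # Decompose: each (s[i], u[:, i], vt[i]) is one eigentriple.
    u, s, vt = np.linalg.svd(trajectory, full_matrices=False)
    return u, s, vt, k


def ssa_reconstruct(series, window, components):
    """Rebuild a series from the chosen eigentriples."""
    x = np.asarray(series, dtype=float)
    u, s, vt, k = _left_vectors(x, window)
    # Group: sum the rank-one matrices of the selected eigentriples.
    grouped = sum(s[i] * np.outer(u[:, i], vt[i]) for i in components)
    # Average over anti-diagonals (entries with row + column constant).
    flipped = grouped[::-1]
    return np.array([flipped.diagonal(d).mean() for d in range(-window + 1, k)])


def ssa_forecast(series, window, components, steps):
    """Continue the reconstructed series with a linear recurrence."""
    x = np.asarray(series, dtype=float)
    u, _, _, _ = _left_vectors(x, window)
    basis = u[:, components]
    last_row = basis[-1]                      # last coordinate of each vector
    nu2 = float(last_row @ last_row)
    coeffs = basis[:-1] @ last_row / (1.0 - nu2)
    values = list(ssa_reconstruct(series, window, components))
    for _ in range(steps):
        recent = np.array(values[-(window - 1):])
        values.append(float(coeffs @ recent))
    return values[-steps:]


# Hypothetical noiseless linear trend; two eigentriples describe it fully.
toy = [2.0 + 3.0 * t for t in range(12)]
print(np.round(ssa_reconstruct(toy, window=4, components=[0, 1]), 6))
print(np.round(ssa_forecast(toy, window=4, components=[0, 1], steps=2), 6))
```

On real data the window length and the grouping of eigentriples are modelling choices, usually guided by plots of the singular values and the reconstructed components.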
7. More Advanced Methods
Time-lag components in machine learning models (deep learning, KNN, random forest…)
Differential equation models (SIR models for epidemics…)
Forecasting future values
Detecting changes in system behavior
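As an illustration of time-lag components, the sketch below turns a series into rows of lagged values and uses them in a simple k-nearest-neighbours forecast. It is written in plain Python under stated assumptions: the function names, the number of lags, and the toy periodic series are hypothetical, and a real project would more likely feed the same lagged rows into a library model.

```python
def make_lagged(series, n_lags):
    """Pair each window of `n_lags` past values with the value that follows it."""
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags:t], series[t]))
    return rows


def knn_forecast(series, n_lags, k):
    """Predict the next value as the mean target of the k most similar past windows."""
    rows = make_lagged(series, n_lags)
    query = series[-n_lags:]  # the most recent window, whose successor is unknown

    def distance(window):
        # Squared Euclidean distance between a past window and the query.
        return sum((a - b) ** 2 for a, b in zip(window, query))

    nearest = sorted(rows, key=lambda row: distance(row[0]))[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical series that repeats 1, 2, 3, 4.
periodic = [1.0, 2.0, 3.0, 4.0] * 5
print(make_lagged(periodic, 2)[:3])
print(knn_forecast(periodic, n_lags=2, k=3))
```

The lagged rows still overlap in time, so any validation split should keep later observations out of the training rows.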
8. Example Dataset: ARIMA and SSA
1961-2013 agricultural land usage in Malawi
Cleaned up from World Bank’s data on climate change indicators by country (Humanitarian Data Exchange)