Machine Learning for Time Series, Strata London 2018
MACHINE LEARNING FOR TIME SERIES
WHAT WORKS AND WHAT DOESN'T
DR. MIKIO L. BRAUN, AI ARCHITECT AT ZALANDO
@mikiobraun
STRATA DATA LONDON, MARCH 23, 2018
TIME SERIES ANALYSIS
MIKIO BRAUN, MACHINE LEARNING FOR TIME SERIES: WHAT WORKS AND WHAT DOESN'T, STRATA DATA 2018 LONDON
MACHINE LEARNING FOR TIME SERIES
CLASSICAL METHODS
Strong assumptions of stationarity; predictions are formed as linear combinations of past values and i.i.d. noise terms.
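To make the "linear combination of past values plus i.i.d. noise" idea concrete, here is a minimal AR(2) sketch (toy coefficients chosen so the process is stationary; not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# AR(2): x_t = a1*x_{t-1} + a2*x_{t-2} + eps_t, with i.i.d. Gaussian noise.
# These coefficients keep the roots inside the unit circle, i.e. stationary.
a1, a2 = 0.6, 0.3
x = np.zeros(500)
for t in range(2, len(x)):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + rng.normal(scale=0.1)

# The one-step-ahead prediction is literally a linear combination of the past.
pred = a1 * x[-1] + a2 * x[-2]
```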
ESTIMATING WITH THE BOX-JENKINS PROGRAM
WHAT WORKS
• Solid theoretical background.
• Very explicit modeling.
• A lot of control, as it is a manual process.
• Bayesian versions available to provide uncertainty estimates.
CHALLENGES: SEASONALITY & NON-STATIONARITY
In reality, data is seldom stationary; it shows trends, seasonality, cycles, and so on.
In the classical approach, these components are removed manually before modeling.
DIFFERENCING AND SCALING
• Running means.
• De-trending by differencing.
• Variance stabilization by log, square root, or Box-Cox transformation.
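A short sketch of these transformations on a toy series with an exponential trend (series and variable names are mine):

```python
import numpy as np

# Toy series: exponential trend, so the variance grows with the level.
t = np.arange(1, 101)
y = np.exp(0.05 * t) * (1 + 0.1 * np.sin(t))

log_y = np.log(y)            # variance stabilization
diff_y = np.diff(log_y)      # de-trending by differencing; now roughly stationary
smooth = np.convolve(y, np.ones(5) / 5, mode="valid")  # 5-point running mean
```

After the log-then-difference step, the series fluctuates around the constant drift 0.05 instead of growing exponentially.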
CLASSICAL METHODS: WHAT DOESN'T WORK SO WELL
• What if the assumptions do not hold?
• Stationarity is a rather strong requirement.
• Linear autoregressive models are somewhat "boring."
MORE GENERAL MACHINE LEARNING APPROACH
By explicitly collecting the past values of a point, we can construct a supervised learning setting.
It is still different from standard ML, as the points are highly correlated.
Any number of methods can be used (linear models, SVMs, neural networks, …).
Easily extends to other areas as well:
• Multiple input variables.
• Multiple output variables.
• Additional variables to feed into the model.
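The windowing construction behind this setting can be sketched in a few lines (the helper name `make_supervised` is mine, not a library function):

```python
import numpy as np

def make_supervised(series, n_lags):
    """Turn a 1-D series into a supervised (X, y) data set:
    each row of X holds n_lags past values, y holds the next value."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

series = np.arange(10.0)
X, y = make_supervised(series, n_lags=3)
# X[0] is [0, 1, 2] and y[0] is 3: predict each point from its three predecessors
```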
EVALUATION AND CROSS-VALIDATION WITH TIME SERIES DATA
In ML, one often uses cross-validation to estimate performance on future data.
Since time series data is highly correlated, one cannot sample test data at random but should sample block-wise.
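A minimal sketch of block-wise splitting, where each test block lies strictly after its training data (similar in spirit to scikit-learn's `TimeSeriesSplit`; the function here is my own):

```python
import numpy as np

def blocked_splits(n, n_folds):
    """Yield (train_idx, test_idx) pairs: the test set is always a
    contiguous block that comes strictly after the training block."""
    fold = n // (n_folds + 1)
    for k in range(1, n_folds + 1):
        yield np.arange(0, k * fold), np.arange(k * fold, (k + 1) * fold)

for train, test in blocked_splits(100, n_folds=4):
    assert train[-1] < test[0]   # no information from the future leaks into training
```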
CHALLENGES: EXTENDED PREDICTION
Prediction can be done one point at a time, using test data as past values as they become available.
Alternatively, one can feed the predictions themselves back in as past values, which leads to much less stable predictions.
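The two strategies differ only in what is fed back in as "past values." A sketch of the recursive variant, which reuses its own predictions (toy AR weights; the helper is my own):

```python
import numpy as np

def recursive_forecast(history, coeffs, horizon):
    """Multi-step forecast that feeds each prediction back in as a past value.
    coeffs[0] weights the most recent value. Errors compound over the horizon,
    which is why this variant tends to be less stable."""
    buf = list(history)
    preds = []
    for _ in range(horizon):
        nxt = float(np.dot(coeffs, buf[-len(coeffs):][::-1]))
        preds.append(nxt)
        buf.append(nxt)
    return preds

preds = recursive_forecast([1.0, 1.0, 1.0], coeffs=[0.5, 0.5], horizon=3)
# with weights summing to 1 and a constant history, the forecast stays at 1.0
```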
CLICK DATA & BEYOND SIMPLE TIME SERIES MODELS
Another interesting data source is event data (click data, customer actions, …).
It shows very similar properties: strong dependence, predictions that depend on the past, etc.
Often, the data needs to be summarized and transformed to get good predictions.
FEATURE ENGINEERING FOR TIME SERIES
• Aggregate histograms over time scales.
• Transform into Fourier space.
• Apply band-pass / low-pass / high-pass filters.
• Intelligent filtering: independent component analysis, canonical correlation analysis.
• Downside: quite costly to retrain on each iteration.
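A small example of the Fourier-space route: transform, zero out high frequencies (a crude low-pass filter), transform back. The signal and the 10 Hz cutoff are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256, endpoint=False)
signal = np.sin(2 * np.pi * 3 * t)              # slow 3 Hz component
noisy = signal + 0.5 * rng.normal(size=t.size)  # plus broadband noise

spectrum = np.fft.rfft(noisy)
freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
spectrum[freqs > 10] = 0.0                      # low-pass: drop everything above 10 Hz
filtered = np.fft.irfft(spectrum, n=t.size)
```

Since the noise is spread across all frequencies but the signal lives below the cutoff, the filtered series is much closer to the clean signal than the noisy one.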
DEEP LEARNING: LONG SHORT-TERM MEMORY
Recurrent neural networks base their predictions on the past data points and a hidden state.
The hidden state can aggregate features automatically.
The LSTM is a particularly flexible variant with (learnable) gates and transformations that control how the hidden state is updated.
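A single LSTM step can be sketched in plain numpy to show the gate structure (an untrained cell with random weights; real implementations keep separate weight matrices per gate and learn them by backpropagation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM update. W maps the concatenated [x; h] to the four stacked
    gate pre-activations: forget, input, output, and candidate update."""
    d = h.size
    z = W @ np.concatenate([x, h]) + b
    f = sigmoid(z[:d])              # forget gate: what to keep of the old cell state
    i = sigmoid(z[d:2 * d])         # input gate: how much of the candidate to write
    o = sigmoid(z[2 * d:3 * d])     # output gate: what to expose as hidden state
    g = np.tanh(z[3 * d:])          # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
d_in, d_hid = 3, 4
W = 0.1 * rng.normal(size=(4 * d_hid, d_in + d_hid))
b = np.zeros(4 * d_hid)
h, c = np.zeros(d_hid), np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):    # roll the cell over a short sequence
    h, c = lstm_step(x, h, c, W, b)
```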
APPLICATION: ANALYZING USER ACTIONS @ ZALANDO
• The goal is to predict buy probability based on user histories.
• Before: many handcrafted features + logistic regression.
• Drawback: all the features had to be retuned again and again.
• With DL: an embedding of user histories in an RNN, plus user-specific features.
• Already performs quite well.
Lang, Rettenmeier: "Understanding Customer Behavior with Recurrent Neural Networks", MLRec 2017
DEEP LEARNING FOR CUSTOMER ACTIONS @ ZALANDO
APPLICATION: DEMAND PREDICTION FOR RARE EVENTS @ UBER
Uber is interested in having reliable models also during extreme events like Thanksgiving or New Year's Day, which have little coverage in the usual data.
https://eng.uber.com/neural-networks/
DEMAND PREDICTION AT UBER: THE DATA
The available data includes a number of exogenous features such as weather and app views.
ARCHITECTURE: TIME SERIES AUTOENCODERS
A stacked LSTM autoencoder captures the general dynamics and produces informative features.
These are then concatenated with the actual input and fed into a second LSTM forecast network.
APPLICATION: DEMAND PREDICTION @ AMAZON, MANY TIME SERIES & PROBABILITIES
https://arxiv.org/abs/1704.04110
Challenges of predicting article demand over thousands of articles:
• Numbers on many scales.
• The amount of available data varies per article.
• We want probability distributions as predictions.
• Predictions ahead in time.
DEEP AR @ AMAZON
• Use an LSTM to learn interactions in the time series.
• The LSTM also propagates knowledge about the dynamics to series with few data points.
• The LSTM predicts the parameters of a distribution at each point.
• Pre- & post-scale the time series.
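The pre-/post-scaling step can be illustrated with a toy sketch (the numbers and the simple `1 + mean` scale rule are illustrative; DeepAR derives a per-series scale factor from the series average):

```python
import numpy as np

# Two article-demand series on very different scales.
series = [np.array([1.0, 2.0, 3.0]), np.array([1000.0, 1100.0, 900.0])]

scales = [1.0 + s.mean() for s in series]           # one scale factor per series
scaled = [s / nu for s, nu in zip(series, scales)]  # network sees comparable magnitudes

# The network outputs distribution parameters per step, e.g. a Gaussian (mu, sigma)
# on the scaled series; post-scaling recovers the original magnitude.
mu_scaled, sigma_scaled = 0.9, 0.1                  # hypothetical network output
mu = mu_scaled * scales[1]
sigma = sigma_scaled * scales[1]
```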
DEEP AR: TRAINING
• Procedure:
1. Predict the distribution parameters.
2. Compute the likelihood.
3. Sample the next point.
• Train by maximizing the likelihood.
• Train directly on the requested prediction horizon.
• Sample points to go further into the future.
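The predict-parameters / sample / feed-back loop at prediction time can be sketched as ancestral sampling (the `predict_params` rule below is a toy stand-in for the trained LSTM):

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_params(history):
    """Stand-in for the trained LSTM: map the history to the parameters
    of the next point's distribution (here a made-up Gaussian rule)."""
    mu = 0.8 * history[-1]
    sigma = 0.1 + 0.05 * abs(history[-1])
    return mu, sigma

def sample_paths(history, horizon, n_paths):
    """Predict parameters, sample the next point, feed it back in, repeat.
    Many sampled paths form an empirical predictive distribution."""
    paths = np.empty((n_paths, horizon))
    for p in range(n_paths):
        buf = list(history)
        for t in range(horizon):
            mu, sigma = predict_params(buf)
            buf.append(rng.normal(mu, sigma))
            paths[p, t] = buf[-1]
    return paths

paths = sample_paths([1.0, 1.2], horizon=5, n_paths=200)
bands = np.quantile(paths, [0.1, 0.5, 0.9], axis=0)   # uncertainty bands per step
```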
SUMMARY
Classical Time Series Models
• Use to get started.
• Use if explicit modeling is good.

General Machine Learning
• If you are unsure about modeling assumptions.
• But: use proper validation to ensure good performance.

Feature Engineering
• For more complex data.
• If you have a priori knowledge about the domain.

Deep Learning
• If you have a lot of data.
• If you frequently want to iterate & experiment.
• If explicit modeling & feature engineering is too costly.