3. Introduction
• Motivation: in the traffic speed prediction domain, survey papers exist that
summarize the models at a high level, but no paper investigates the deep
learning models in depth
• Contribution: in-depth investigation of deep learning models + identify the
claimed contribution of each model and validate it
4. Method for feature extraction: spatial
• Convolutional Neural Network
Simply applies convolution to a grid map
Cannot model the road network correctly
Makes the task easier (the input is mapped onto a grid map)
• Graph-Convolutional Network
Can capture the road network structure
Limitation: only captures neighbors within K hops
Spectral domain (analogous to the frequency domain)
• Applies a filter to the graph signal
• Requires converting the adjacency matrix to a Laplacian matrix
Spatial domain
• Aggregates the features of neighboring nodes
• Directly uses the adjacency matrix without transformation
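As a minimal NumPy sketch of the spatial-domain view (toy 4-node graph, random stand-in weights, not any specific model): one layer aggregates each node's neighborhood through the degree-normalized adjacency matrix, then applies a learned projection:

```python
import numpy as np

# Toy 4-node road graph; weights are random stand-ins for learned parameters.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                       # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))    # row-normalize by node degree
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))                 # node features: 4 nodes, 2 channels
W = rng.normal(size=(2, 3))                 # "learnable" projection

# One spatial-domain graph convolution: aggregate neighbors, project, activate.
H = np.maximum(D_inv @ A_hat @ X @ W, 0.0)  # ReLU
print(H.shape)  # (4, 3)
```

Stacking K such layers reaches neighbors up to K hops away, which is exactly the K-step limitation noted above.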
5. Method for feature extraction: temporal
• Recurrent Neural Network
LSTM
• The standard RNN variant
• Complex in both memory and time (6 computations per unit, with both hidden and cell states)
GRU
• A simplified variant of LSTM
• Simpler than LSTM (4 computations per unit, no cell state)
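To make the difference concrete, here is a minimal NumPy sketch of a single GRU step (toy random weights, not any particular traffic model): two gates plus a candidate state, and only a hidden state to carry forward, whereas an LSTM additionally maintains a cell state:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Wr, Wh):
    """One GRU step: update gate, reset gate, candidate; no cell state."""
    z = sigmoid(np.concatenate([x, h]) @ Wz)             # update gate
    r = sigmoid(np.concatenate([x, h]) @ Wr)             # reset gate
    h_tilde = np.tanh(np.concatenate([x, r * h]) @ Wh)   # candidate state
    return (1 - z) * h + z * h_tilde                     # new hidden state

rng = np.random.default_rng(0)
d_in, d_h = 2, 4
x, h = rng.normal(size=d_in), np.zeros(d_h)
Wz, Wr, Wh = (rng.normal(size=(d_in + d_h, d_h)) for _ in range(3))
h_next = gru_step(x, h, Wz, Wr, Wh)
print(h_next.shape)  # (4,)
```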
[Figure: LSTM unit vs. GRU unit]
6. Method for feature extraction: temporal
• Convolutional Neural Network
Simple architecture – faster than RNN
without dilation
• Default CNN: convolution applied along the time axis
(e.g. input shape = (time steps, nodes), filter = (m, 1))
• Only captures a receptive field the same size as the filter
with dilation
• Gaps (zeros) are inserted between the filter taps
• Covers a larger receptive field than the default CNN
Can extract long-term features overall
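A toy NumPy sketch of the dilation idea (kernel, sequence, and dilation schedule are made up for illustration): stacking kernel-size-2 layers with dilations 1, 2, 4 grows the receptive field to 8 steps while each layer stays cheap:

```python
import numpy as np

def conv1d(x, w, dilation=1):
    """Valid 1-D convolution with dilation (no learned params, toy sketch)."""
    k = len(w)
    span = (k - 1) * dilation + 1          # receptive field of one layer
    out = [sum(w[j] * x[i + j * dilation] for j in range(k))
           for i in range(len(x) - span + 1)]
    return np.array(out)

x = np.arange(12, dtype=float)             # a length-12 speed sequence (toy)
w = np.array([1.0, 1.0])                   # kernel size 2

# Stacking layers with dilations 1, 2, 4 gives a receptive field of 8 steps.
y = x
for d in (1, 2, 4):
    y = conv1d(y, w, dilation=d)
print(len(y))  # 12 - (1 + 2 + 4) = 5
```

Each kernel-size-2 layer with dilation d shortens the valid output by d steps, so the receptive field grows exponentially with depth instead of linearly.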
7. Method for feature extraction: temporal
• Graph-Convolutional Network
Song et al. proposed a localized spatial-temporal
adjacency matrix that models both spatial and
temporal dependencies in one graph convolution
Adjacency matrix of shape (3N, 3N) – much slower
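Song et al.'s construction can be sketched as follows (a simplified version that omits their normalization and self-loop details): the spatial graph sits on the three diagonal blocks, and identity blocks link each node to itself at adjacent time steps:

```python
import numpy as np

def localized_st_adjacency(A):
    """Build a (3N, 3N) localized spatial-temporal adjacency: the spatial
    graph A on each diagonal block, identity links between adjacent steps."""
    N = A.shape[0]
    I = np.eye(N)
    Z = np.zeros((N, N))
    return np.block([[A, I, Z],
                     [I, A, I],
                     [Z, I, A]])

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])                 # toy 2-node spatial graph
A_st = localized_st_adjacency(A)
print(A_st.shape)  # (6, 6)
```

One graph convolution over A_st mixes information across both space and time, at the cost of a 3N x 3N matrix in every multiplication.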
• Attention
Apply an attention mechanism on top of some type of neural network
Can weigh the importance of each element in the sequence
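A minimal sketch of the idea (plain scaled dot-product self-attention over the time axis, not any specific model's variant): the softmax weights make explicit how much each time step contributes to each output step:

```python
import numpy as np

def temporal_attention(X):
    """Scaled dot-product self-attention over a time sequence (toy sketch).
    X: (T, d). Returns the re-weighted sequence and the (T, T) weights."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                     # step-to-step similarity
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)              # softmax over each row
    return w @ X, w

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 4))                          # 12 time steps, 4 features
out, w = temporal_attention(X)
print(out.shape, w.shape)  # (12, 4) (12, 12)
```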
C. Song, Y. Lin, S. Guo, and H. Wan, “Spatial-temporal synchronous graph convolutional networks: A new framework
for spatial-temporal network data forecasting.”
8. Summary of Models: DCRNN
• Spatial: GCN (diffusion convolution)
• Temporal: RNN (sequence-to-sequence, GRU)
• The most basic model in this area
• Simply replaces the matrix multiplications inside the GRU with graph convolution
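That substitution can be sketched in NumPy (a simplified single-hop graph convolution rather than DCRNN's K-step diffusion; graph and weights are toy values): every dense multiplication in the GRU gates becomes a graph convolution, so the recurrence runs over all nodes at once:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gconv(A_norm, X, W):
    """Graph convolution used in place of a plain matrix multiplication."""
    return A_norm @ X @ W

def gcgru_step(A_norm, X, H, Wz, Wr, Wh):
    """GRU step on graph signals: X (N, d_in) inputs, H (N, d_h) hidden
    states, one row per node; each gate uses a graph convolution."""
    XH = np.concatenate([X, H], axis=1)
    z = sigmoid(gconv(A_norm, XH, Wz))                  # update gate
    r = sigmoid(gconv(A_norm, XH, Wr))                  # reset gate
    H_tilde = np.tanh(gconv(A_norm, np.concatenate([X, r * H], axis=1), Wh))
    return (1 - z) * H + z * H_tilde

rng = np.random.default_rng(0)
N, d_in, d_h = 4, 2, 3
A_norm = np.ones((N, N)) / N              # toy normalized adjacency
X, H = rng.normal(size=(N, d_in)), np.zeros((N, d_h))
Wz, Wr, Wh = (rng.normal(size=(d_in + d_h, d_h)) for _ in range(3))
H_next = gcgru_step(A_norm, X, H, Wz, Wr, Wh)
print(H_next.shape)  # (4, 3)
```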
[Figure: GRU unit with graph convolution applied]
Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional recurrent neural network: Data-driven traffic
forecasting,” in Proceedings of the International Conference on Learning Representations, 2018.
9. Summary of Models: STGCN
• Spatial: GCN
• Temporal: Default CNN
• To predict 12 steps, the many-to-one architecture is applied recursively 12 times
B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: a deep learning framework for traffic
forecasting,” in Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018, pp. 3634–3640
10. Summary of Models: ASTGCN
• Spatial: Attention + GCN
• Temporal: Attention + CNN
• Uses attention to
capture features weighted by attention scores
• Blends spatial and temporal attention
(spatial attention = SAttn(TAttn(x)))
• Uses 3 types of input – recent, daily, and weekly
S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan, “Attention based spatial-temporal graph convolutional networks for traffic
flow forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 922–929.
11. Summary of Models: STSGCN
• Spatial: GCN
• Temporal: GCN
• Uses a localized spatial-temporal graph
to capture spatial-temporal correlations
• With it, the model can simultaneously
capture both spatial and temporal features
• To capture the heterogeneity in long-term spatial-temporal dependencies,
it uses individual modules for different time steps
C. Song, Y. Lin, S. Guo, and H. Wan, “Spatial-temporal synchronous graph convolutional
networks: A new framework for spatial-temporal network data forecasting.”
13. Summary of Models: Graph-WaveNet
• Spatial: GCN
• Temporal: dilated CNN
• To capture hidden graph structure, adds an adaptive adjacency matrix
that is learned during training
• Simple and fast
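The adaptive adjacency can be sketched as follows (toy sizes; in the paper the two node-embedding matrices are learned end-to-end): A_adapt = softmax(ReLU(E1 E2ᵀ)), applied row-wise so each row is a distribution over source nodes:

```python
import numpy as np

# Graph-WaveNet-style adaptive adjacency (sketch): two node-embedding
# matrices E1, E2 of shape (N, c); A_adapt = softmax(ReLU(E1 @ E2.T)).
rng = np.random.default_rng(42)
N, c = 5, 3
E1 = rng.normal(size=(N, c))              # learned in training; random here
E2 = rng.normal(size=(N, c))

logits = np.maximum(E1 @ E2.T, 0.0)                  # ReLU
e = np.exp(logits - logits.max(axis=1, keepdims=True))
A_adapt = e / e.sum(axis=1, keepdims=True)           # row-wise softmax
print(A_adapt.shape)  # (5, 5), each row sums to 1
```

Because E1 and E2 receive gradients, the model can discover edges that the given road-network adjacency misses.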
Z. Wu, S. Pan, G. Long, J. Jiang, and C. Zhang, “Graph wavenet for deep spatial-temporal graph modeling,” in
Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 2019, pp. 1907–1913
14. Additional Experiments: Model Comparison
• Temporal: the RNN-based model (DCRNN) is much slower than CNN-based ones
• Sequence-to-sequence models are slow (DCRNN, STGRAT)
• All of these models were evaluated with the same epochs and batch size, but the
comparison is not fair because of architectural differences (e.g. ASTGCN uses 3 types of
input while the other models use only 1)
more experiments with a fair setting should be conducted
• To evaluate each model's contribution, the model must be modified
15. Additional Experiments
• For now, DCRNN and STGCN are not considered in this experiment
DCRNN: the first model to utilize GCN in the traffic domain
STGCN: weak architecture – many-to-one recursion 12 times
• ASTGCN
Fairness: modify the model to take only the most recent 12 steps as input
Contribution: replace the spatial-temporal blending with other methods
(e.g. CNN, simple concatenation, spatial and temporal applied separately without blending)
• STSGCN
Contribution:
• replace the individual modules with a shared one
• replace the localized spatial-temporal graph with a spatial-only graph
• Graph-WaveNet
Contribution: apply the trained adaptive adjacency matrix to other models
16. ToDo
• Experiment
STSGCN is not reproduced as the paper reports – hyperparameter tuning needed
Additional Experiments - STSGCN
with other models – ST-MetaNet, ST-ResNet, GMAN, STGRAT
• Write model summary + contribution discussion documents
• Find