2. Relationship to Research
Currently analyzing the performance of NEAT for Time Series Forecasting (TSF)
The paper summarizes common approaches to, and issues with, using ANNs for TSF
3. Claims of the Paper
Develops an automatic TSF model using a Generalized Regression Neural Network (GRNN)
Shows promising results by winning the NN3 time-series competition against 60 other models
4. General Problems with ANN
Most approaches are ad hoc: some form of data preprocessing is applied without a principled justification
Practitioners typically try several ANN architectures to see which performs best
Nelson et al.: ANN inconsistency on TSF is the result of different preprocessing strategies
Balkin et al.: ANNs require a large number of samples to train, but real-world series (financial, etc.) often provide only short training samples
5. RBF
An RBF network can be viewed as a local linear regression model
A Gaussian kernel is applied to the input data; every input passes through nodes of the form:
G(x) = \exp\left( -\frac{\| x - c \|^2}{\sigma^2} \right) \quad (1)
Center points are found by assigning a center c to each point in the data set; each input is then measured by its distance to the centers
This is equivalent to performing a local regression (σ controls the smoothing of the approximation)
The output layer (the weights) is trained using least-squares regression
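The scheme above can be sketched in a few lines: place a Gaussian kernel at every training point and solve for the output weights by least squares. This is a minimal illustration with NumPy on synthetic data, not the paper's implementation; the kernel uses the ‖x − c‖²/σ² form from Eq. (1).

```python
import numpy as np

def gaussian_rbf(X, centers, sigma):
    # Pairwise squared distances between inputs and centers, passed
    # through the Gaussian kernel G(x) = exp(-||x - c||^2 / sigma^2).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma**2)

def fit_rbf(X, y, sigma):
    # Every training point serves as a center, as described on the slide.
    Phi = gaussian_rbf(X, X, sigma)
    # Output-layer weights via least-squares regression.
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return X, w

def predict_rbf(X_new, centers, w, sigma):
    return gaussian_rbf(X_new, centers, sigma) @ w

# Synthetic example: noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, (40, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(40)
centers, w = fit_rbf(X, y, sigma=0.5)
pred = predict_rbf(X, centers, w, sigma=0.5)
```

With a center at every data point the least-squares fit nearly interpolates the training data; σ then controls how smoothly the model generalizes between points.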
6. Generalized Definition for Regression
Computes the most probable value of Y for each value of X, based on a finite number of possibly noisy measurements of X
The conditional mean of y given X (the regression of y on X) is given by:
E[y \mid X] = \frac{\int_{-\infty}^{\infty} y \, f(X, y) \, dy}{\int_{-\infty}^{\infty} f(X, y) \, dy} \quad (2)
Since the density function f(X, y) is typically unknown, it is estimated with a Parzen window density estimator
7. Generalized Definition for Regression
The generalized definition yields the following regression function:
\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\left( -\frac{D_i^2}{2\sigma^2} \right)}{\sum_{i=1}^{n} \exp\left( -\frac{D_i^2}{2\sigma^2} \right)} \quad (3)
where D_i^2 = (X - X_i)^T (X - X_i)
In the case of the GRNN, X is the input data and the X_i are the centers
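Eq. (3) is just a kernel-weighted average of the training targets, so the whole predictor fits in one function. A minimal sketch (assuming NumPy; the toy data is illustrative, not from the paper):

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma):
    """GRNN estimate Y_hat(X) from Eq. (3): a Gaussian-kernel-weighted
    average of the training targets Y_i."""
    # D_i^2 = (X - X_i)^T (X - X_i) for every query/center pair.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma**2))     # kernel weights
    return (w @ y_train) / w.sum(axis=1)   # weighted average of Y_i

# Toy data: sample points from y = x^2, query between two samples.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 4.0, 9.0])
pred = grnn_predict(X, y, np.array([[1.5]]), sigma=0.3)
```

Because the output is a convex combination of the Y_i, a GRNN prediction always lies within the range of the training targets; here the query at 1.5 lands midway between the two nearest targets.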
8. GRNN
The G(x, x_i) are the standard radial basis functions
The weights w_i come from the generalized regression equation
The spread factor σ dictates the performance
9. Claimed Benefits of GRNN
Easy to train
Can accurately approximate functions from sparse and noisy data
Note: a recent paper, Ahmed et al., claims the GRNN is inferior to the MLP for TSF
10. Methodology Requirements
Minimal human intervention
Computationally efficient for a large number of series
Good forecasting over range of data sets
11. Preprocessing: Outliers
Real-world time series often contain outliers
An observation x_i is flagged as an outlier if
|x_i| \ge 4 \max(|m_a|, |m_b|) \quad (4)
where m_a = median(x_{i-3}, x_{i-2}, x_{i-1}) and m_b = median(x_{i+1}, x_{i+2}, x_{i+3})
If x_i is an outlier, it is replaced with the average of the two points before and after it
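The rule can be sketched directly from Eq. (4). This reads the slide literally (compare |x_i| against 4·max of the two neighboring medians, replace with the mean of the adjacent points); edge handling and the exact replacement rule are assumptions:

```python
import numpy as np

def replace_outliers(x):
    """Eq. (4): x_i is an outlier if |x_i| >= 4 * max(|m_a|, |m_b|),
    where m_a / m_b are the medians of the three points before / after.
    Outliers are replaced with the average of the neighboring points."""
    x = x.astype(float).copy()
    for i in range(3, len(x) - 3):          # need 3 points on each side
        ma = np.median(x[i - 3:i])
        mb = np.median(x[i + 1:i + 4])
        if abs(x[i]) >= 4 * max(abs(ma), abs(mb)):
            x[i] = 0.5 * (x[i - 1] + x[i + 1])
    return x

# A spike of 9.0 in an otherwise stable series around 1.0.
series = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 9.0, 1.1, 1.0, 0.9, 1.0, 1.1])
clean = replace_outliers(series)
```

Note that median-based windows make the threshold robust: a single spike cannot inflate m_a or m_b and mask itself.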
12. Preprocessing: Trends
Real-world time series have trends, which could be due to seasonality or other factors
Common approaches are curve fitting, filtering, and differencing
Identifying trends algorithmically is difficult
Proposes a detrending scheme:
Split the series into segments: 12 segments if monthly, 4 if quarterly
The mean of the historical observations within each segment is subtracted from every historical observation in that segment
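The segment-mean scheme above can be sketched as follows. This assumes a "segment" collects every 12th observation of a monthly series (one segment per calendar month), which is one plausible reading of the slide:

```python
import numpy as np

def detrend_by_segment_means(x, period=12):
    """Split a series into `period` segments (12 for monthly, 4 for
    quarterly) and subtract each segment's mean from its observations."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for s in range(period):
        idx = np.arange(s, len(x), period)  # all observations in segment s
        out[idx] -= x[idx].mean()
    return out

monthly = np.arange(24, dtype=float)        # two "years" of a linear trend
detrended = detrend_by_segment_means(monthly)
```

After the subtraction every segment has zero mean, so a level shift between, say, all Januaries and all Julys is removed, while variation within a segment is preserved.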
13. Preprocessing: Seasonality
Identifying seasonality is typically a manual process
The author used a simple approach, defining short series as n ≤ 60 and long series as n > 60
Uses autocorrelation coefficients at one and two seasonal lags to decide whether a series is seasonal
Uses a standard method for subtracting the seasonality out of the series data
14. ANN Modeling
Aspects of ANN modeling:
Spread factor: typically chosen empirically, since no good analytic approach exists. Some guidance was given by Haykin:
\sigma = \frac{d_{max}}{\sqrt{2n}}
where d_{max} is the maximum distance between training points
Proposes setting the spread factor to the 50th, 75th, and 95th percentiles (d_{50}, d_{75}, d_{95}) of the nearest-neighbor distances of all training samples
Uses three GRNNs that all take the same input and are combined to give the final output
The choice of combining three GRNNs is based on previous success in the literature
15. ANN Modeling Cont’d
Input selection is considered one of the most important aspects of TSF
Two general approaches: filter and wrapper
Filter methods select features from the data itself, independent of the learning algorithm
Wrapper methods use the learning algorithm itself, and typically perform better
The author uses contiguous lags, limited to one full season (12 lags for monthly data)
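Building contiguous-lag inputs amounts to a sliding window over the series: each row of the input matrix holds the previous 12 observations and the target is the next value. A minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def lagged_matrix(x, n_lags=12):
    """Contiguous-lag inputs: row i holds observations x_i .. x_{i+n_lags-1},
    and the target is the following value x_{i+n_lags} (one full season of
    12 lags for monthly data, as on the slide)."""
    x = np.asarray(x, dtype=float)
    X = np.stack([x[i:i + n_lags] for i in range(len(x) - n_lags)])
    y = x[n_lags:]
    return X, y

series = np.arange(20, dtype=float)
X, y = lagged_matrix(series, n_lags=12)
```

Note the cost of a full-season window: a series of length n yields only n − 12 training rows, which matters for the short real-world series mentioned earlier.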
16. Experimental Results
Uses the NN3 time-series competition dataset, which is composed of Dataset A and Dataset B
Dataset A is 111 monthly time series drawn from empirical business time series
Dataset B is a small subset of Dataset A consisting of 11 time series
Error is measured using sMAPE (symmetric mean absolute percentage error)
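For reference, sMAPE in the form commonly used by the NN competitions is the mean of |F − A| over the average magnitude of actual and forecast, expressed as a percentage:

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error:
    mean of |F - A| / ((|A| + |F|) / 2), as a percentage."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs(forecast - actual)
                           / ((np.abs(actual) + np.abs(forecast)) / 2.0))

value = smape([100, 200], [110, 180])
```

Unlike plain MAPE, the symmetric denominator bounds each term and penalizes over- and under-forecasts more evenly, which is why the competition uses it across series of very different scales.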
19. Discussion
Are TSF competitions just a demonstration of the no free lunch
theorem? Why is the theorem not mentioned?
Did he prove his approach was “better” or did this approach just
outperform on a particular contest?
Why doesn’t the training of the GRNN factor out outliers and
seasonality on its own? Isn’t that what training is for?
Why did he choose a GRNN? Previous papers said they perform
poorly.
What kind of bias does the detrending scheme introduce?
The paper was "rule of thumb" oriented. Is there a way to make an automatic approach more rigorous?