Time series and forecasting from wikipedia

Time Series and Forecasting

Compiled by M.Barros, D.Sc.

December 12th, 2012

Source: Wikipedia

PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information.
PDF generated at: Tue, 11 Dec 2012 03:49:39 UTC

Contents
Articles
Time series
Forecasting
Stationary process
Stochastic process
Covariance
Autocovariance
Autocorrelation
Cross-correlation
White noise
Random walk
Brownian motion
Wiener process
Autoregressive model
Moving average
Autoregressive–moving-average model
Fourier transform
Spectral density
Signal processing
Autoregressive conditional heteroskedasticity
Autoregressive integrated moving average
Volatility (finance)
Stable distribution
Mathematical finance
Stochastic differential equation
Brownian model of financial markets
Stochastic volatility
Black–Scholes
Black model
Black–Derman–Toy model
Cox–Ingersoll–Ross model
Monte Carlo method

References
Article Sources and Contributors

Image Sources, Licenses and Contributors

Article Licenses
License

AVAILABLE FREE OF CHARGE AT:
www.mbarros.com
http://mbarrosconsultoria.blogspot.com
http://mbarrosconsultoria2.blogspot.com

Time series
In statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting,
earthquake prediction, electroencephalography, control engineering and communications engineering, a time series is
a sequence of data points, measured typically at successive time instants spaced at uniform time intervals. Examples
of time series are the daily closing value of the Dow Jones index or the annual flow volume of the Nile River at
Aswan. Time series analysis comprises methods for analyzing time series data in order to extract meaningful
statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future
values based on previously observed values. Time series are very frequently plotted via line charts.

Figure: Time series: random data plus trend, with best-fit line and different smoothings

Time series data have a natural temporal ordering. This makes time series analysis distinct from other common data
analysis problems, in which there is no natural ordering of the observations (e.g. explaining people's wages by
reference to their respective education levels, where the individuals' data could be entered in any order). Time series
analysis is also distinct from spatial data analysis where the observations typically relate to geographical locations
(e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic
model for a time series will generally reflect the fact that observations close together in time will be more closely
related than observations further apart. In addition, time series models will often make use of the natural one-way
ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather
than from future values (see time reversibility).

Methods for time series analysis may be divided into two classes: frequency-domain methods and time-domain
methods. The former include spectral analysis and, more recently, wavelet analysis; the latter include
auto-correlation and cross-correlation analysis.
Additionally, time series analysis techniques may be divided into parametric and non-parametric methods. The
parametric approaches assume that the underlying stationary stochastic process has a certain structure which can be
described using a small number of parameters (for example, using an autoregressive or moving average model). In
these approaches, the task is to estimate the parameters of the model that describes the stochastic process. By
contrast, non-parametric approaches explicitly estimate the covariance or the spectrum of the process without
assuming that the process has any particular structure.
Methods of time series analysis may also be divided into linear and non-linear, and univariate and multivariate.
Time series analysis can be applied to:
• real-valued, continuous data
• discrete numeric data
• discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language[1]).

Analysis
There are several types of data analysis available for time series which are appropriate for different purposes.
In the context of statistics, econometrics, quantitative finance, seismology, meteorology and geophysics, the primary
goal of time series analysis is forecasting. In the context of signal processing, control engineering and
communication engineering, it is used for signal detection and estimation, while in the context of data mining,
pattern recognition and machine learning, time series analysis can be used for clustering, classification, query by
content and anomaly detection as well as forecasting.

Exploratory analysis
The clearest way to examine a regular time series manually is with a
line chart such as the one shown for tuberculosis in the United States,
made with a spreadsheet program. The number of cases was
standardized to a rate per 100,000 and the percent change per year in
this rate was calculated. The nearly steadily dropping line shows that
the TB incidence was decreasing in most years, but the percent change
in this rate varied by as much as +/- 10%, with 'surges' in 1975 and
around the early 1990s. The use of both vertical axes allows the
comparison of two time series in one graphic.

Figure: Tuberculosis incidence US 1953-2009

Other techniques include:

• Autocorrelation analysis to examine serial dependence
• Spectral analysis to examine cyclic behaviour which need not be related to seasonality. For example, sun spot
activity varies over 11 year cycles.[2][3] Other common examples include celestial phenomena, weather patterns,
neural activity, commodity prices, and economic activity.
• Separation into components representing trend, seasonality, slow and fast variation, and cyclical irregularity: see
decomposition of time series
• Simple properties of marginal distributions
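
As a minimal sketch of the autocorrelation analysis mentioned in the list above, the following Python function computes the sample autocorrelation of a series at a given lag. The function name `autocorrelation`, the normalisation by the lag-0 sum of squares, and the toy `seasonal` series are assumptions made for illustration; they do not come from the source.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Sum of squared deviations (lag-0 term); the common 1/n factor cancels.
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Sum of products of deviations that are `lag` steps apart.
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


# Hypothetical illustrative data: a pattern that repeats every 4 steps.
seasonal = [1.0, 3.0, 5.0, 3.0] * 6
acf = [autocorrelation(seasonal, k) for k in range(9)]
```

For this toy series the value at lag 4 is high and the value at lag 2 is strongly negative, which is the kind of serial dependence an autocorrelation plot is meant to reveal.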

Prediction and forecasting
• Fully formed statistical models for stochastic simulation purposes, so as to generate alternative versions of the
time series, representing what might happen over non-specific time-periods in the future
• Simple or fully formed statistical models to describe the likely outcome of the time series in the immediate future,
given knowledge of the most recent outcomes (forecasting).
• Forecasting on time series is usually done using automated statistical software packages and programming
languages, such as R, S, SAS, SPSS, Minitab and many others.

Classification
• Assigning a time series pattern to a specific category, for example identifying a word based on a series of hand
movements in sign language
See main article: Statistical classification

Regression analysis
• Estimating the future value of a signal based on its previous behavior, e.g. predicting the price of AAPL stock
based on its previous price movements for that hour, day or month, or predicting the position of the Apollo 11
spacecraft at a certain future moment based on its current trajectory (i.e. the time series of its previous
locations).[4]
• Regression analysis is usually based on a statistical interpretation of time series properties in the time domain,
pioneered by statisticians George Box and Gwilym Jenkins in the 50s: see Box–Jenkins
See main article: Regression analysis

Signal Estimation
• This approach is based on harmonic analysis and filtering of signals in the frequency domain using the Fourier
transform and spectral density estimation, the development of which was significantly accelerated during World
War II by mathematician Norbert Wiener, electrical engineers Rudolf E. Kálmán, Dennis Gabor and others for
filtering signals from noise and predicting signal values at a certain point in time. See Kalman filter, estimation
theory and digital signal processing.

Models
Models for time series data can have many forms and represent different stochastic processes. When modeling
variations in the level of a process, three broad classes of practical importance are the autoregressive (AR) models,
the integrated (I) models, and the moving average (MA) models. These three classes depend linearly[5] on previous
data points. Combinations of these ideas produce autoregressive moving average (ARMA) and autoregressive
integrated moving average (ARIMA) models. The autoregressive fractionally integrated moving average (ARFIMA)
model generalizes the former three. Extensions of these classes to deal with vector-valued data are available under
the heading of multivariate time-series models and sometimes the preceding acronyms are extended by including an
initial "V" for "vector". An additional set of extensions of these models is available for use where the observed
time-series is driven by some "forcing" time-series (which may not have a causal effect on the observed series): the
distinction from the multivariate case is that the forcing series may be deterministic or under the experimenter's
control. For these models, the acronyms are extended with a final "X" for "exogenous".
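
The "integrated" part of an ARIMA model refers to differencing: the series is differenced one or more times, and an ARMA model is then fitted to the result. The sketch below illustrates only the differencing step; the helper name `difference` and the sample values are hypothetical and chosen for illustration.

```python
def difference(series, d=1):
    """Apply first differencing d times: y_t = x_t - x_{t-1}."""
    if d < 0:
        raise ValueError("d must be non-negative")
    out = list(series)
    for _ in range(d):
        # Each pass shortens the series by one observation.
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Hypothetical series with a curved trend.
trend = [1, 3, 6, 10, 15]
once = difference(trend)        # first differences still grow
twice = difference(trend, 2)    # second differences are constant
```

Here one round of differencing leaves a linear trend, and a second round removes it, which is why the order of integration is treated as a model parameter.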
Non-linear dependence of the level of a series on previous data points is of interest, partly because of the possibility
of producing a chaotic time series. However, more importantly, empirical investigations can indicate the advantage
of using predictions derived from non-linear models, over those from linear models, as for example in nonlinear
autoregressive exogenous models.
Among other types of non-linear time series models, there are models to represent the changes of variance over
time (heteroskedasticity). These models represent autoregressive conditional heteroskedasticity (ARCH), and the
collection comprises a wide variety of representations (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.).
Here changes in variability are related to, or predicted by, recent past values of the observed series. This is in
contrast to other possible representations of locally varying variability, where the variability might be modelled as
being driven by a separate time-varying process, as in a doubly stochastic model.
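
To make the idea of variability "predicted by recent past values of the observed series" concrete, here is a rough sketch of the simplest member of the family, an ARCH(1) process, in which the conditional variance at time t is a0 + a1 times the square of the previous value. The function name `simulate_arch1`, the parameter values and the zero start-up value are assumptions made for this illustration.

```python
import math
import random


def simulate_arch1(n, a0=0.2, a1=0.5, seed=0):
    """Simulate n values of an ARCH(1) process.

    Returns (values, variances), where variances[t] is the conditional
    variance used to draw values[t].
    """
    rng = random.Random(seed)
    values, variances = [], []
    prev = 0.0  # assumed start-up value
    for _ in range(n):
        var_t = a0 + a1 * prev ** 2           # variance depends on the last value
        value = math.sqrt(var_t) * rng.gauss(0.0, 1.0)
        variances.append(var_t)
        values.append(value)
        prev = value
    return values, variances


values, variances = simulate_arch1(50)
```

Large values in `values` are followed by large entries in `variances`, which produces the clustering of volatile periods that these models are designed to capture.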
In recent work on model-free analyses, wavelet transform based methods (for example locally stationary wavelets
and wavelet decomposed neural networks) have gained favor. Multiscale (often referred to as multiresolution)
techniques decompose a given time series, attempting to illustrate time dependence at multiple scales. See also
Markov switching multifractal (MSMF) techniques for modeling volatility evolution.

Notation
A number of different notations are in use for time-series analysis. A common notation specifying a time series X
that is indexed by the natural numbers is written
X = {X1, X2, ...}.
Another common notation is
Y = {Yt: t ∈ T},
where T is the index set.

Conditions
There are two sets of conditions under which much of the theory is built:
• Stationary process
• Ergodic process
However, ideas of stationarity must be expanded to consider two important ideas: strict stationarity and second-order
stationarity. Both models and applications can be developed under each of these conditions, although the models in
the latter case might be considered as only partly specified.
In addition, time-series analysis can be applied where the series are seasonally stationary or non-stationary.
Situations where the amplitudes of frequency components change with time can be dealt with in time-frequency
analysis which makes use of a time–frequency representation of a time-series or signal.[6]

Models
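The general representation of an autoregressive model, well known as AR(p), is

$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$$

where c is a constant, φ1, ..., φp are the coefficients of the model, and the term εt is the source of randomness and
is called white noise. It is assumed to have the following characteristics:
• $E[\varepsilon_t] = 0$
• $E[\varepsilon_t^2] = \sigma^2$
• $E[\varepsilon_t \varepsilon_s] = 0$ for all $t \neq s$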
With these assumptions, the process is specified up to second-order moments and, subject to conditions on the
coefficients, may be second-order stationary.
If the noise also has a normal distribution, it is called normal or Gaussian white noise. In this case, the AR process
may be strictly stationary, again subject to conditions on the coefficients.
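
The following is a minimal simulation sketch of an autoregressive process driven by Gaussian white noise, written as the recursion "next value = constant + weighted sum of the p previous values + noise". The function name `simulate_ar`, its parameters and the zero start-up values are assumptions made for illustration. For the AR(1) case, the condition on the coefficient referred to above is that its absolute value is less than 1.

```python
import random


def simulate_ar(coeffs, n, const=0.0, sigma=1.0, seed=0):
    """Simulate n values of an AR(p) process with p = len(coeffs).

    coeffs[0] multiplies the most recent value, coeffs[1] the one before, etc.
    """
    rng = random.Random(seed)
    p = len(coeffs)
    history = [0.0] * p  # assumed start-up values; their effect fades if stationary
    out = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)  # Gaussian white noise term
        # history[-1] is the previous value, history[-2] the one before, ...
        x = const + sum(c * history[-(i + 1)] for i, c in enumerate(coeffs)) + eps
        out.append(x)
        history.append(x)
    return out


# AR(1) with coefficient 0.6, which satisfies the stationarity condition.
path = simulate_ar([0.6], 200)
```

Setting `sigma=0.0` switches the noise off, which makes the recursion easy to check by hand.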
Tools for investigating time-series data include:
• Consideration of the autocorrelation function and the spectral density function (also cross-correlation functions
and cross-spectral density functions)
• Scaled cross- and auto-correlation functions[7]
• Performing a Fourier transform to investigate the series in the frequency domain
• Use of a filter to remove unwanted noise
• Principal components analysis (or empirical orthogonal function analysis)
• Singular spectrum analysis
• "Structural" models:
  • General State Space Models
  • Unobserved Components Models
• Machine Learning
  • Artificial neural networks
  • Support Vector Machine
  • Fuzzy Logic
• Hidden Markov model
• Control chart
  • Shewhart individuals control chart
  • CUSUM chart
  • EWMA chart
• Detrended fluctuation analysis
• Dynamic time warping
• Dynamic Bayesian network
• Time-frequency analysis techniques:
  • Fast Fourier Transform
  • Continuous wavelet transform
  • Short-time Fourier transform
  • Chirplet transform
  • Fractional Fourier transform
• Chaotic analysis
  • Correlation dimension
  • Recurrence plots
  • Recurrence quantification analysis
  • Lyapunov exponents
  • Entropy encoding
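
As an illustration of investigating a series in the frequency domain, the sketch below computes a simple periodogram with a direct discrete Fourier transform. It is written for clarity rather than speed (a fast Fourier transform would normally be used), and the function name `periodogram`, the 1/n scaling and the test signal are assumptions made here.

```python
import cmath
import math


def periodogram(series):
    """Return the power at frequencies k/n cycles per sample, k = 0..n//2."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the mean before transforming
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power.append(abs(coeff) ** 2 / n)
    return power


# Hypothetical signal: a sine wave with a period of 8 samples, 64 samples long.
signal = [math.sin(2 * math.pi * t / 8) for t in range(64)]
power = periodogram(signal)
peak = max(range(len(power)), key=power.__getitem__)
```

The index `peak` corresponds to 64 / 8 = 8 cycles over the record, so the dominant period of 8 samples is recovered from the location of the largest periodogram value.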

Measures
Time series metrics or features that can be used for time series classification or regression analysis[8]:
• Univariate linear measures
• Moment (mathematics)
• Spectral band power
• Spectral edge frequency
• Accumulated Energy (signal processing)
• Characteristics of the autocorrelation function
• Hjorth parameters
• FFT parameters
• Autoregressive model parameters
• Univariate non-linear measures
• Measures based on the correlation sum
• Correlation dimension
• Correlation integral
• Correlation density
• Correlation entropy
• Approximate Entropy[9]
• Sample Entropy
• Fourier entropy
• Wavelet entropy
• Rényi entropy
• Higher-order methods
• Marginal predictability
• Dynamical similarity index
• State space dissimilarity measures
• Lyapunov exponent
• Permutation methods
• Local flow
• Other univariate measures
• Algorithmic complexity
• Kolmogorov complexity estimates
• Hidden Markov Model states
• Surrogate time series and surrogate correction
• Loss of recurrence (degree of non-stationarity)
• Bivariate linear measures
• Maximum linear cross-correlation
• Linear Coherence (signal processing)
• Bivariate non-linear measures
• Non-linear interdependence
• Dynamical Entrainment (physics)
• Measures for Phase synchronization
• Similarity measures[10]:
• Dynamic Time Warping
• Hidden Markov Models
• Edit distance
• Total correlation
• Newey–West estimator
• Prais-Winsten transformation
• Data as Vectors in a Metrizable Space
• Minkowski distance
• Mahalanobis distance
• Data as Time Series with Envelopes
• Global Standard Deviation
• Local Standard Deviation
• Windowed Standard Deviation
• Data Interpreted as Stochastic Series
• Pearson product-moment correlation coefficient
• Spearman's rank correlation coefficient
• Data Interpreted as a Probability Distribution Function
• Kolmogorov-Smirnov test
• Cramér-von Mises criterion
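
Two of the similarity measures listed above are simple enough to sketch directly: the Minkowski distance, which treats two equal-length series as vectors, and the Pearson product-moment correlation coefficient. The function names and sample values below are illustrative assumptions, not part of the source.

```python
import math


def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def pearson_correlation(a, b):
    """Pearson product-moment correlation of two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


euclid = minkowski_distance([0.0, 0.0], [3.0, 4.0])          # p = 2
manhattan = minkowski_distance([0.0, 0.0], [3.0, 4.0], p=1)
corr = pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The distance depends on the scale of the series, while the correlation does not, which is one reason the list groups these measures under different interpretations of the data.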

References
[1] Lin, Jessica and Keogh, Eamonn and Lonardi, Stefano and Chiu, Bill. A symbolic representation of time series, with implications for
streaming algorithms. Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, 2003.
URL: http://doi.acm.org/10.1145/882082.882086
[2] Bloomfield, P. (1976). Fourier analysis of time series: An introduction. New York: Wiley.
[3] Shumway, R. H. (1988). Applied statistical time series analysis. Englewood Cliffs, NJ: Prentice Hall.
[4] Lawson, Charles L., Hanson, Richard J. (1987). Solving Least Squares Problems. Society for Industrial and Applied Mathematics.
[5] Gershenfeld, N. (1999). The nature of mathematical modeling. pp. 205-08.
[6] Boashash, B. (ed.) (2003). Time-Frequency Signal Analysis and Processing: A Comprehensive Reference. Oxford: Elsevier Science.
ISBN 0-08-044335-4
[7] Nikolić D, Muresan RC, Feng W, Singer W (2012). Scaled correlation analysis: a better way to compute a cross-correlogram. European
Journal of Neuroscience, pp. 1–21, doi:10.1111/j.1460-9568.2011.07987.x
URL: http://www.danko-nikolic.com/wp-content/uploads/2012/03/Scaled-correlation-analysis.pdf
[8] Mormann, Florian and Andrzejak, Ralph G. and Elger, Christian E. and Lehnertz, Klaus. Seizure prediction: the long and winding road.
Brain, 2007, 130 (2): 314-33. URL: http://brain.oxfordjournals.org/content/130/2/314.abstract
[9] Land, Bruce and Elias, Damian. Measuring the "Complexity" of a time series.
URL: http://www.nbb.cornell.edu/neurobio/land/PROJECTS/Complexity/
[10] Ropella, G.E.P.; Nag, D.A.; Hunt, C.A., "Similarity measures for automated comparison of in silico and in vitro experimental results,"
Engineering in Medicine and Biology Society, 2003. Proceedings of the 25th Annual International Conference of the IEEE, vol. 3, pp.
2933-2936, 17-21 Sept. 2003. doi:10.1109/IEMBS.2003.1280532
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1280532&isnumber=28615

Further reading
• Bloomfield, P. (1976). Fourier analysis of time series: An introduction. New York: Wiley.
• Box, George; Jenkins, Gwilym (1976), Time series analysis: forecasting and control, rev. ed., Oakland,
California: Holden-Day
• Brillinger, D. R. (1975). Time series: Data analysis and theory. New York: Holt, Rinehart. & Winston.
• Brigham, E. O. (1974). The fast Fourier transform. Englewood Cliffs, NJ: Prentice-Hall.
• Elliott, D. F., & Rao, K. R. (1982). Fast transforms: Algorithms, analyses, applications. New York: Academic
Press.
• Gershenfeld, Neil (2000), The nature of mathematical modeling, Cambridge: Cambridge Univ. Press,
ISBN 978-0-521-57095-4, OCLC 174825352
• Hamilton, James (1994), Time Series Analysis, Princeton: Princeton Univ. Press, ISBN 0-691-04289-6
• Jenkins, G. M., & Watts, D. G. (1968). Spectral analysis and its applications. San Francisco: Holden-Day.
• Priestley, M. B. (1981). Spectral Analysis and Time Series. London: Academic Press. ISBN 978-0-12-564901-8
• Shasha, D. (2004), High Performance Discovery in Time Series, Berlin: Springer, ISBN 0-387-00857-8
• Shumway, R. H. (1988). Applied statistical time series analysis. Englewood Cliffs, NJ: Prentice Hall.
• Wiener, N.(1964). Extrapolation, Interpolation, and Smoothing of Stationary Time Series.The MIT Press.
• Wei, W. W. (1989). Time series analysis: Univariate and multivariate methods. New York: Addison-Wesley.
• Weigend, A. S., and N. A. Gershenfeld (Eds.) (1994) Time Series Prediction: Forecasting the Future and
Understanding the Past. Proceedings of the NATO Advanced Research Workshop on Comparative Time Series
Analysis (Santa Fe, May 1992) MA: Addison-Wesley.
• Durbin J., and Koopman S.J. (2001) Time Series Analysis by State Space Methods. Oxford University Press.

External links
• A First Course on Time Series Analysis (http://statistik.mathematik.uni-wuerzburg.de/timeseries/) - an open
source book on time series analysis with SAS
• Introduction to Time series Analysis (Engineering Statistics Handbook)
(http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm) - a practical guide to Time series analysis
• MATLAB Toolkit for Computation of Multiple Measures on Time Series Data Bases
(http://www.jstatsoft.org/v33/i05/paper)

Forecasting
Forecasting is the process of making statements about events whose actual outcomes (typically) have not yet been
observed. A commonplace example might be estimation of some variable of interest at some specified future date.
Prediction is a similar, but more general term. Both might refer to formal statistical methods employing time series,
cross-sectional or longitudinal data, or alternatively to less formal judgemental methods. Usage can differ between
areas of application: for example, in hydrology, the terms "forecast" and "forecasting" are sometimes reserved for
estimates of values at certain specific future times, while the term "prediction" is used for more general estimates,
such as the number of times floods will occur over a long period.
Risk and uncertainty are central to forecasting and prediction; it is generally considered good practice to indicate the
degree of uncertainty attaching to forecasts. In any case, the data must be up to date in order for the forecast to be as
accurate as possible.[1]
Although quantitative analysis can be very precise, it is not always appropriate. Some experts in the field of
forecasting have advised against the use of mean square error to compare forecasting methods.[2]

Categories of forecasting methods

Qualitative vs. quantitative methods
Qualitative forecasting techniques are subjective, based on the opinion and judgment of consumers and experts; they
are appropriate when past data are not available. They are usually applied to intermediate- to long-range decisions.
Examples of qualitative forecasting methods are informed opinion and judgment, the Delphi method, market research,
and historical life-cycle analogy.
Quantitative forecasting models are used to estimate future demands as a function of past data; they are appropriate
when past data are available. They are usually applied to short- to intermediate-range decisions. Examples of
quantitative forecasting methods are last period demand, simple and weighted moving averages (N-period), simple
exponential smoothing, and multiplicative seasonal indexes.
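
Three of the quantitative methods named above can be sketched in a few lines each. The function names, the convention that the last weight applies to the most recent observation, the choice to start exponential smoothing from the first observation, and the `demand` figures are all assumptions made for illustration.

```python
def moving_average_forecast(history, n):
    """Forecast the next value as the mean of the last n observations."""
    if not 1 <= n <= len(history):
        raise ValueError("n must be between 1 and len(history)")
    return sum(history[-n:]) / n


def weighted_moving_average_forecast(history, weights):
    """Weighted mean of the last len(weights) observations.

    weights[-1] applies to the most recent observation.
    """
    k = len(weights)
    if not 1 <= k <= len(history):
        raise ValueError("need between 1 and len(history) weights")
    window = history[-k:]
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)


def exponential_smoothing_forecast(history, alpha):
    """Simple exponential smoothing with smoothing constant 0 < alpha <= 1."""
    level = history[0]  # assumed initialisation: the first observation
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


# Hypothetical demand figures for four past periods.
demand = [10.0, 12.0, 14.0, 16.0]
ma = moving_average_forecast(demand, 2)
wma = weighted_moving_average_forecast(demand, [1, 3])
ses = exponential_smoothing_forecast(demand, 0.5)
```

On this steadily rising series all three forecasts fall below the latest observation, because each is an average over past values; the more weight a method puts on recent data, the closer its forecast sits to the last value.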

Naïve approach
Naïve forecasts are the most cost-effective and efficient objective forecasting model, and provide a benchmark
against which more sophisticated models can be compared. For stable time series data, this approach says that the
forecast for any period equals the previous period's actual value.
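
A minimal sketch of the naïve approach as described above, in which each forecast is simply the previous period's actual value. The function names and the `sales` figures are hypothetical.

```python
def naive_forecast(history):
    """Forecast for the next period: the most recent actual value."""
    if not history:
        raise ValueError("history must not be empty")
    return history[-1]


def naive_in_sample(history):
    """(actual, forecast) pairs for every period after the first,
    where each forecast is the previous period's actual value."""
    return list(zip(history[1:], history[:-1]))


# Hypothetical sales figures for four periods.
sales = [20, 22, 21, 25]
next_forecast = naive_forecast(sales)
pairs = naive_in_sample(sales)
```

The `pairs` list is what makes the method useful as a benchmark: any error measure computed on it gives a baseline that a more sophisticated model should beat.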

Reference class forecasting
Reference class forecasting was developed by Oxford professor Bent Flyvbjerg to eliminate or reduce bias in
forecasting by focusing on distributional information about past, similar outcomes to that being forecasted.[3] Daniel
Kahneman, Nobel Prize winner in economics, calls Flyvbjerg's counsel to use reference class forecasting to de-bias
forecasts "the single most important piece of advice regarding how to increase accuracy in forecasting."[4]

Time series methods
Time series methods use historical data as the basis of estimating future outcomes.
• Moving average
• Weighted moving average
• Kalman filtering
• Exponential smoothing
• Autoregressive moving average (ARMA)
• Autoregressive integrated moving average (ARIMA)
e.g. Box-Jenkins
• Extrapolation
• Linear prediction
• Trend estimation
• Growth curve

Causal / econometric forecasting methods
Some forecasting methods use the assumption that it is possible to identify the underlying factors that might
influence the variable that is being forecast. For example, including information about weather conditions might
improve the ability of a model to predict umbrella sales. This is a model of seasonality which shows a regular pattern
of up and down fluctuations. In addition to weather, seasonality can also be due to holidays and customs such as
predicting that sales in college football apparel will be higher during football season as opposed to the off season.[5]
Causal forecasting methods are also subject to the discretion of the forecaster. There are several informal methods
which do not have strict algorithms, but rather modest and unstructured guidance. One can forecast based on, for
example, linear relationships. If one variable has been linearly related to another for a long enough period of time,
it may be beneficial to extrapolate that relationship into the future. This is quite different from the aforementioned
model of seasonality, whose graph would more closely resemble a sine or cosine wave. The most important factor when
performing this operation is using concrete and substantiated data. Forecasting from another forecast produces
inconclusive and possibly erroneous results.
Such methods include:
• Regression analysis includes a large group of methods that can be used to predict future values of a variable using
information about other variables. These methods include both parametric (linear or non-linear) and
non-parametric techniques.
• Autoregressive moving average with exogenous inputs (ARMAX)[6]
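
To illustrate forecasting one variable from another, here is a small sketch of ordinary least squares with a single explanatory variable, using the umbrella example from the text. The function names and the data (rainy days and umbrella sales per month) are invented for illustration only.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted value of y for a new value of the explanatory variable."""
    return intercept + slope * x_new


# Hypothetical data: rainy days per month and umbrella sales in that month.
rainy_days = [2.0, 4.0, 6.0, 8.0]
umbrella_sales = [30.0, 50.0, 70.0, 90.0]
intercept, slope = fit_line(rainy_days, umbrella_sales)
forecast = predict(intercept, slope, 5.0)
```

The forecast is only as good as the value supplied for the explanatory variable; if that value is itself a forecast, its error carries through to the prediction.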

Judgmental methods
Judgmental forecasting methods incorporate intuitive judgements, opinions and subjective probability estimates.
• Composite forecasts
• Delphi method
• Forecast by analogy
• Scenario building
• Statistical surveys
• Technology forecasting

Artificial intelligence methods
• Artificial neural networks
• Group method of data handling
• Support vector machines
Often these are done today by specialized programs loosely labeled
• Data mining
• Machine Learning
• Pattern Recognition

Other methods
• Simulation
• Prediction market
• Probabilistic forecasting and Ensemble forecasting

Forecasting accuracy
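The forecast error is the difference between the actual value and the forecast value for the corresponding period:

$$E_t = Y_t - F_t$$

where E is the forecast error at period t, Y is the actual value at period t, and F is the forecast for period t.
Measures of aggregate error, computed over N periods:

Mean absolute error (MAE)
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} |E_t|$$

Mean Absolute Percentage Error (MAPE)
$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{t=1}^{N} \left| \frac{E_t}{Y_t} \right|$$

Mean Absolute Deviation (MAD)
$$\mathrm{MAD} = \frac{1}{N} \sum_{t=1}^{N} |E_t|$$

Percent Mean Absolute Deviation (PMAD)
$$\mathrm{PMAD} = \frac{\sum_{t=1}^{N} |E_t|}{\sum_{t=1}^{N} |Y_t|}$$

Mean squared error (MSE)
$$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} E_t^2$$

Root Mean squared error (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} E_t^2}$$

Forecast skill (SS)
$$\mathrm{SS} = 1 - \frac{\mathrm{MSE}_{\text{forecast}}}{\mathrm{MSE}_{\text{ref}}}$$

Average of Errors (E)
$$\bar{E} = \frac{1}{N} \sum_{t=1}^{N} E_t$$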
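
The sketch below computes the aggregate error measures named above from a list of actual values and a list of forecasts, taking the error in each period as actual minus forecast. The function names, the dictionary keys, the choice to express MAPE as a percentage, and the sample numbers are assumptions made for illustration; MAPE also assumes that no actual value is zero.

```python
import math


def accuracy_measures(actual, forecast):
    """Aggregate error measures, with error = actual - forecast per period."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty series of equal length")
    errors = [y - f for y, f in zip(actual, forecast)]
    n = len(errors)
    abs_sum = sum(abs(e) for e in errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": abs_sum / n,
        # Percentage form; undefined if any actual value is zero.
        "MAPE": 100.0 * sum(abs(e / y) for e, y in zip(errors, actual)) / n,
        "PMAD": abs_sum / sum(abs(y) for y in actual),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "mean_error": sum(errors) / n,
    }


def forecast_skill(mse_forecast, mse_reference):
    """Skill score relative to a reference forecast (1 is perfect, 0 is no gain)."""
    return 1.0 - mse_forecast / mse_reference


# Hypothetical actual values and forecasts for three periods.
measures = accuracy_measures([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
skill = forecast_skill(1.0, 4.0)
```

In this example the errors are -10, 10 and 0, so the average of errors is zero even though the forecasts are not perfect; the absolute and squared measures are what expose the misses.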

Business forecasters and practitioners sometimes use different terminology. They refer to the PMAD as the MAPE,
although they compute it as a volume-weighted MAPE. For more information see Calculating demand forecast accuracy.
Reference class forecasting was developed to increase forecasting accuracy by framing the forecasting problem so as
to take into account available distributional information.[7] Daniel Kahneman, winner of the Nobel Prize in
economics, calls the use of reference class forecasting "the single most important piece of advice regarding how to
increase accuracy in forecasting."[8] Contrary to common belief, forecasting accuracy cannot be increased by the
addition of experts in the subject area relevant to the phenomenon to be forecast.[9]
See also
• Calculating demand forecast accuracy
• Consensus forecasts
• Forecast error
• Predictability
• Prediction intervals, similar to confidence intervals
• Reference class forecasting

Applications of forecasting
Climate change and increasing energy prices have led to the use of Egain Forecasting for buildings. The method uses
forecasting to reduce the energy needed to heat the building, thus reducing the emission of greenhouse gases.
Forecasting is used in the practice of customer demand planning in everyday business forecasting for manufacturing
companies. Forecasting has also been used to predict the development of conflict situations. Experts in forecasting
perform research that uses empirical results to gauge the effectiveness of certain forecasting models.[10] Research
has shown that there is little difference between the accuracy of forecasts made by experts knowledgeable about the
conflict situation of interest and those made by individuals who knew much less.[11]
Similarly, experts in some studies argue that role thinking does not contribute to the accuracy of the forecast.[12]
The discipline of demand planning, also sometimes referred to as supply chain forecasting, embraces both statistical
forecasting and a consensus process. An important, albeit often ignored, aspect of forecasting is the relationship it
holds with planning. Forecasting can be described as predicting what the future will look like, whereas planning
predicts what the future should look like.[13][14] There is no single right forecasting method to use. Selection of a
method should be based on the forecaster's objectives and conditions (data etc.).[15] A good way to find a method is
to consult a selection tree.[16] Forecasting has application in many situations:
• Supply chain management - Forecasting can be used in Supply Chain Management to make sure that the right
product is at the right place at the right time. Accurate forecasting will help retailers reduce excess inventory and
therefore increase profit margin. Studies have shown that extrapolations are the least accurate, while company
earnings forecasts are the most reliable.[17] Accurate forecasting will also help them meet consumer demand.
• Economic forecasting
• Earthquake prediction
• Egain Forecasting
• Land use forecasting
• Player and team performance in sports
• Political Forecasting
• Product forecasting
• Sales Forecasting
• Technology forecasting
• Telecommunications forecasting
• Transport planning and Transportation forecasting
• Weather forecasting, Flood forecasting and Meteorology

Limitations
As proposed by Edward Lorenz in 1963, long range weather forecasts, those made at a range of two weeks or more,
cannot definitively predict the state of the atmosphere, owing to the chaotic nature of the fluid dynamics
equations involved. Extremely small errors in the initial input to numerical models, such as temperatures and winds,
double every five days.[18]
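
The doubling statement can be turned into a rough growth factor: an error that doubles every five days is multiplied by 2 raised to the power (days / 5). The sketch below assumes clean exponential growth, which is a simplification introduced here, and the function name and default value are illustrative.

```python
def error_growth_factor(days, doubling_time_days=5.0):
    """Factor by which an initial error grows after `days`,
    assuming it doubles every `doubling_time_days`."""
    return 2.0 ** (days / doubling_time_days)


# Growth over fifteen days, i.e. three doubling periods.
factor_15_days = error_growth_factor(15)
```

Under this assumption an initial error grows eightfold in fifteen days, which gives a sense of why forecasts at a range of two weeks or more lose their reliability.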

References
[1] Scott Armstrong, Fred Collopy, Andreas Graefe and Kesten C. Green (2010 (last updated)). "Answers to Frequently Asked Questions"
(http://qbox.wharton.upenn.edu/documents/mktg/research/FAQ.pdf).
[2] J. Scott Armstrong and Fred Collopy (1992). "Error Measures For Generalizing About Forecasting Methods: Empirical Comparisons"
(http://marketing.wharton.upenn.edu/ideas/pdf/armstrong2/armstrong-errormeasures-empirical.pdf). International Journal of Forecasting 8:
69–80.
[3] Flyvbjerg, B. (2008). "Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class Forecasting in Practice"
(http://www.sbs.ox.ac.uk/centres/bt/Documents/Curbing Optimism Bias and Strategic Misrepresentation.pdf). European Planning Studies 16
(1): 3–21.
[4] Daniel Kahneman (2011). Thinking, Fast and Slow (New York: Farrar, Straus and Giroux), p. 251.
[5] Nahmias, Steven (2009). Production and Operations Analysis.
[6] Ellis, Kimberly (2008). Production Planning and Inventory Control Virginia Tech. McGraw Hill. ISBN 978-0-390-87106-0.
[7] Flyvbjerg, B. (2008). "Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class Forecasting in Practice"
(http://www.sbs.ox.ac.uk/centres/bt/Documents/Curbing Optimism Bias and Strategic Misrepresentation.pdf). European Planning Studies 16
(1): 3–21.
[8] Daniel Kahneman (2011). Thinking, Fast and Slow (New York: Farrar, Straus and Giroux), p. 251.
[9] J. Scott Armstrong (1980). "The Seer-Sucker Theory: The Value of Experts in Forecasting"
(http://www.forecastingprinciples.com/paperpdf/seersucker.pdf). Technology Review: 16–24.
[10] J. Scott Armstrong, Kesten C. Green and Andreas Graefe (2010). "Answers to Frequently Asked Questions"
(http://qbox.wharton.upenn.edu/documents/mktg/research/FAQ.pdf).
[11] Kesten C. Green and J. Scott Armstrong (2007). "The Ombudsman: Value of Expertise for Forecasting Decisions in Conflicts"
(http://marketing.wharton.upenn.edu/documents/research/Value of expertise.pdf). Interfaces (INFORMS) 0: 1–12.
[12] Kesten C. Green and J. Scott Armstrong (1975). "Role thinking: Standing in other people’s shoes to forecast decisions in conflicts"
(http://www.forecastingprinciples.com/paperpdf/Escalation Bias.pdf). Role thinking: Standing in other people’s shoes to forecast decisions in
conflicts 39: 111–116.
[13] "FAQ" (http://www.forecastingprinciples.com/index.php?option=com_content&task=view&id=3&Itemid=3).
Forecastingprinciples.com. 1998-02-14. Retrieved 2012-08-28.
[14] Kesten C. Green and J. Scott Armstrong. 2015.pdf "Structured analogies for forecasting"
(http://www.qbox.wharton.upenn.edu/documents/mktg/research/INTFOR3581 - Publication%) (PDF). qbox.wharton.upenn.edu. 2015.pdf.
[15] "FAQ" (http://www.forecastingprinciples.com/index.php?option=com_content&task=view&id=3&Itemid=3#D._Choosing_the_best_method).
Forecastingprinciples.com. 1998-02-14. Retrieved 2012-08-28.
[16] "Selection Tree" (http://www.forecastingprinciples.com/index.php?option=com_content&task=view&id=17&Itemid=17).
Forecastingprinciples.com. 1998-02-14. Retrieved 2012-08-28.
[17] J. Scott Armstrong (1983). "Relative Accuracy of Judgmental and Extrapolative Methods in Forecasting Annual Earnings"
(http://www.forecastingprinciples.com/paperpdf/Monetary Incentives.pdf). Journal of Forecasting 2: 437–447.
[18] Cox, John D. (2002). Storm Watchers. John Wiley & Sons, Inc. pp. 222–224. ISBN 0-471-38108-X.

• Armstrong, J. Scott (ed.) (2001) (in English). Principles of forecasting: a handbook for researchers and
practitioners. Norwell, Massachusetts: Kluwer Academic Publishers. ISBN 0-7923-7930-6.
• Flyvbjerg, Bent, 2008, "Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class
Forecasting in Practice," European Planning Studies, vol. 16, no. 1, January, pp. 3-21. (http://www.sbs.ox.ac.
uk/centres/bt/Documents/Curbing Optimism Bias and Strategic Misrepresentation.pdf)
• Ellis, Kimberly (2010) (in English). Production Planning and Inventory Control. McGraw-Hill.
ISBN 0-412-03471-9.

• Geisser, Seymour (1 June 1993) (in English). Predictive Inference: An Introduction. Chapman & Hall, CRC
Press. ISBN 0-390-87106-0.
• Gilchrist, Warren (1976) (in English). Statistical Forecasting. London: John Wiley & Sons. ISBN 0-471-99403-0.
• Hyndman, R.J., Koehler, A.B (2005) "Another look at measures of forecast accuracy" (http://www.
robjhyndman.com/papers/mase.pdf), Monash University note.
• Makridakis, Spyros; Wheelwright, Steven; Hyndman, Rob J. (1998) (in English). Forecasting: methods and
applications (http://www.robjhyndman.com/forecasting/). New York: John Wiley & Sons.
ISBN 0-471-53233-9.
• Kress, George J.; Snyder, John (30 May 1994) (in English). Forecasting and market analysis techniques: a
practical approach. Westport, Connecticut, London: Quorum Books. ISBN 0-89930-835-X.
• Rescher, Nicholas (1998) (in English). Predicting the future: An introduction to the theory of forecasting. State
University of New York Press. ISBN 0-7914-3553-9.
• Taesler, R. (1990/91) Climate and Building Energy Management. Energy and Buildings, Vol. 15-16, pp 599 –
608.
• Turchin, P. (2007) "Scientific Prediction in Historical Sociology: Ibn Khaldun meets Al Saud". In: History &
Mathematics: Historical Dynamics and Development of Complex Societies. (http://edurss.ru/cgi-bin/db.
pl?cp=&page=Book&id=53185&lang=en&blang=en&list=Found) Moscow: KomKniga. ISBN
978-5-484-01002-8
• Sasic Kaligasidis, A et al. (2006) Upgraded weather forecast control of building heating systems. p. 951 ff in
Research in Building Physics and Building Engineering Paul Fazio (Editorial Staff), ISBN 0-415-41675-2
• United States Patent 6098893 Comfort control system incorporating weather forecast data and a method for
operating such a system (Inventor Stefan Berglund)

External links
• Forecasting Principles: "Evidence-based forecasting" (http://www.forecastingprinciples.com)
• International Institute of Forecasters (http://www.forecasters.org)
• Introduction to Time series Analysis (Engineering Statistics Handbook) (http://www.itl.nist.gov/div898/
handbook/pmc/section4/pmc4.htm) - A practical guide to Time series analysis and forecasting
• Time Series Analysis (http://www.statsoft.com/textbook/sttimser.html)
• Global Forecasting with IFs (http://www.ifs.du.edu)
• Earthquake Electromagnetic Precursor Research (http://www.quakefinder.com)


Stationary process
In mathematics, a stationary process (or strict(ly) stationary process or strong(ly) stationary process) is a
stochastic process whose joint probability distribution does not change when shifted in time or space. Consequently,
parameters such as the mean and variance, if they exist, also do not change over time or position.
Stationarity is used as a tool in time series analysis, where the raw data are often transformed to become stationary;
for example, economic data are often seasonal and/or dependent on a non-stationary price level. An important type
of non-stationary process that does not include a trend-like behavior is the cyclostationary process.
Note that a "stationary process" is not the same thing as a "process with a stationary distribution". Indeed there are
further possibilities for confusion with the use of "stationary" in the context of stochastic processes; for example a
"time-homogeneous" Markov chain is sometimes said to have "stationary transition probabilities". On the other
hand, all stationary Markov random processes are time-homogeneous.

Definition
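Formally, let $\{X_t\}$ be a stochastic process and let $F_X(x_{t_1+\tau}, \ldots, x_{t_k+\tau})$ represent the cumulative distribution
function of the joint distribution of $\{X_t\}$ at times $t_1+\tau, \ldots, t_k+\tau$. Then, $\{X_t\}$ is said to be stationary if,
for all $k$, for all $\tau$, and for all $t_1, \ldots, t_k$,

$$F_X(x_{t_1+\tau}, \ldots, x_{t_k+\tau}) = F_X(x_{t_1}, \ldots, x_{t_k}).$$

Since $\tau$ does not affect $F_X(\cdot)$, $F_X$ is not a function of time.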

Examples
As an example, white noise is stationary. The sound of a cymbal
clashing, if hit only once, is not stationary because the acoustic power
of the clash (and hence its variance) diminishes with time. However, it
would be possible to invent a stochastic process describing when the
cymbal is hit, such that the overall response would form a stationary
process.

An example of a discrete-time stationary process where the sample space is also discrete (so that the random
variable may take one of N possible values) is a Bernoulli scheme. Other examples of a discrete-time stationary
process with continuous sample space include some autoregressive and moving average processes which are both
subsets of the autoregressive moving average model. Models with a non-trivial autoregressive component may be
either stationary or non-stationary, depending on the parameter values, and important non-stationary special
cases are where unit roots exist in the model.

[Figure: Two simulated time series processes, one stationary, the other non-stationary. The Augmented
Dickey–Fuller test is reported for each process and non-stationarity cannot be rejected for the second process.]

Let Y be any scalar random variable, and define a time series $\{X_t\}$ by

$$X_t = Y \quad \text{for all } t.$$

Then $\{X_t\}$ is a stationary time series, for which realisations consist of a series of constant values, with a different
constant value for each realisation. A law of large numbers does not apply in this case, as the limiting value of an
average from a single realisation takes the random value determined by Y, rather than taking the expected value of Y.
As a further example of a stationary process for which any single realisation has an apparently noise-free structure,
let Y have a uniform distribution on (0,2π] and define the time series $\{X_t\}$ by

$$X_t = \cos(t + Y) \quad \text{for } t \in \mathbb{R}.$$

Then $\{X_t\}$ is strictly stationary.
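
The stationarity of the cosine example can be checked numerically. The sketch below is an illustration only: it assumes the series is X_t = cos(t + Y) with Y uniform on (0, 2π], and the function name `ensemble_moments`, the sample size and the seed are arbitrary choices. It estimates the mean and the second moment E[X_t X_s] across many realisations and compares pairs of times with the same lag; for this process the exact value is cos(t − s)/2 wherever the pair sits in time.

```python
import math
import random


def ensemble_moments(t, s, n_realisations=20000, seed=0):
    """Estimate E[X_t] and E[X_t * X_s] for X_t = cos(t + Y), Y ~ Uniform(0, 2*pi)."""
    rng = random.Random(seed)
    mean_sum = 0.0
    prod_sum = 0.0
    for _ in range(n_realisations):
        y = rng.uniform(0.0, 2.0 * math.pi)  # one draw of Y fixes a whole realisation
        x_t = math.cos(t + y)
        x_s = math.cos(s + y)
        mean_sum += x_t
        prod_sum += x_t * x_s
    return mean_sum / n_realisations, prod_sum / n_realisations


# Two pairs of times with the same lag (s - t = 1) at different positions.
mean_a, prod_a = ensemble_moments(t=0.0, s=1.0)
mean_b, prod_b = ensemble_moments(t=5.0, s=6.0)
theory = 0.5 * math.cos(1.0)  # exact value of E[cos(t + Y) cos(s + Y)] when s - t = 1

print(mean_a, mean_b)          # both close to 0
print(prod_a, prod_b, theory)  # all close to each other
```

Each single realisation is a perfectly regular cosine wave; the randomness lies only in the phase, which is why averages must be taken across realisations rather than along one path.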

Weaker forms of stationarity

Weak or wide-sense stationarity
A weaker form of stationarity commonly employed in signal processing is known as weak-sense stationarity,
wide-sense stationarity (WSS) or covariance stationarity. WSS random processes only require that the first moment
(the mean) and the autocovariance do not vary with respect to time. Any strictly stationary process which has a mean
and a covariance is also WSS.
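
As a rough numerical illustration of the WSS requirement, the sketch below compares the variance across realisations at two different times for Gaussian white noise and for a Gaussian random walk. The random walk is brought in here only as an assumed counter-example, and the function name `ensemble_variance`, the path count and the seed are illustrative choices. For white noise the variance stays near 1 at every time; for the random walk it grows roughly in proportion to t, so the walk cannot be wide-sense stationary.

```python
import random
import statistics


def ensemble_variance(kind, t, n_paths=4000, seed=1):
    """Variance across realisations of the process value at time t (t >= 1)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_paths):
        if kind == "white_noise":
            # Each time point is an independent standard normal draw.
            x = rng.gauss(0.0, 1.0)
        else:
            # Random walk: sum of t independent standard normal steps.
            x = sum(rng.gauss(0.0, 1.0) for _ in range(t))
        values.append(x)
    return statistics.pvariance(values)


noise_early = ensemble_variance("white_noise", 5)
noise_late = ensemble_variance("white_noise", 50)
walk_early = ensemble_variance("random_walk", 5)
walk_late = ensemble_variance("random_walk", 50)

print(noise_early, noise_late)  # both near 1
print(walk_early, walk_late)    # near 5 and near 50
```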
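So, a continuous-time random process x(t) which is WSS has the following restrictions on its mean function

$$E[x(t)] = m_x(t) = m_x(t + \tau) \quad \text{for all } \tau \in \mathbb{R}$$

and autocovariance function

$$E[(x(t_1) - m_x(t_1))(x(t_2) - m_x(t_2))] = C_x(t_1, t_2) = C_x(t_1 + \tau, t_2 + \tau) = C_x(t_1 - t_2, 0) \quad \text{for all } \tau \in \mathbb{R}.$$

The first property implies that the mean function $m_x(t)$ must be constant. The second property implies that the
covariance function depends only on the difference between $t_1$ and $t_2$ and only needs to be indexed by one variable
rather than two variables. Thus, instead of writing,

$$C_x(t_1 - t_2, 0),$$

the notation is often abbreviated and written as:

$$C_x(\tau) \quad \text{where } \tau = t_1 - t_2.$$

This also implies that the autocorrelation depends only on $\tau = t_1 - t_2$, since

$$R_x(t_1, t_2) = C_x(t_1, t_2) + m_x(t_1)\, m_x(t_2).$$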
When processing WSS random signals with linear, time-invariant (LTI) filters, it is helpful to think of the correlation
function as a linear operator. Since it is a circulant operator (depends only on the difference between the two
arguments), its eigenfunctions are the Fourier complex exponentials. Additionally, since the eigenfunctions of LTI
operators are also complex exponentials, LTI processing of WSS random signals is highly tractable—all
computations can be performed in the frequency domain. Thus, the WSS assumption is widely employed in signal
processing algorithms.

Second-order stationarity
The case of second-order stationarity arises when the requirements of strict stationarity are only applied to pairs of
random variables from the time-series. The definition of second order stationarity can be generalized to Nth order
(for finite N) and strict stationary means stationary of all orders.
A process is second order stationary if the first and second order density functions satisfy

$$f_X(x_1; t_1) = f_X(x_1; t_1 + \Delta)$$
$$f_X(x_1, x_2; t_1, t_2) = f_X(x_1, x_2; t_1 + \Delta, t_2 + \Delta)$$

for all $t_1$, $t_2$, and $\Delta$. Such a process will be wide sense stationary if the mean and correlation functions are finite.
A process can be wide sense stationary without being second order stationary.


Other terminology
The terminology used for types of stationarity other than strict stationarity can be rather mixed. Some examples
follow.
• Priestley[1][2] uses stationary up to order m if conditions similar to those given here for wide sense
stationarity apply relating to moments up to order m. Thus wide sense stationarity would be equivalent to
"stationary to order 2", which is different from the definition of second-order stationarity given here.
• Honarkhah[3] also uses the assumption of stationarity in the context of multiple-point geostatistics, where
higher n-point statistics are assumed to be stationary in the spatial domain.

References
[1] Priestley, M.B. (1981) Spectral Analysis and Time Series, Academic Press. ISBN 0-12-564922-3
[2] Priestley, M.B. (1988) Non-linear and Non-stationary Time Series Analysis, Academic Press. ISBN 0-12-564911-8
[3] Honarkhah, M and Caers, J, 2010, Stochastic Simulation of Patterns Using Distance-Based Pattern Modeling
(http://dx.doi.org/10.1007/s11004-010-9276-7), Mathematical Geosciences, 42: 487–517

External links
• Spectral decomposition of a random function (Springer) (http://eom.springer.de/s/s086360.htm)

Stochastic process
In probability theory, a stochastic process (pronunciation: /stəʊˈkæstɪk/), or sometimes random process (widely used)
is a collection of random variables; this is often used to represent the evolution of some random value, or system,
over time. This is the probabilistic counterpart to a deterministic process (or deterministic system). Instead of
describing a process which can only evolve in one way (as in the case, for example, of solutions of an ordinary
differential equation), in a stochastic or random process there is some indeterminacy: even if the initial condition (or
starting point) is known, there are several (often infinitely many) directions in which the process may evolve.
In the simple case of discrete time, a stochastic process amounts to a sequence of random variables known as a time
series (for example, see Markov chain). Another basic type of a stochastic process is a random field, whose domain
is a region of space, in other words, a random function whose arguments are drawn from a range of continuously
changing values. One approach to stochastic processes treats them as functions of one or several deterministic
arguments (inputs, in most cases regarded as time) whose values (outputs) are random variables: non-deterministic
(single) quantities which have certain probability distributions. Random variables corresponding to various times (or
points, in the case of random fields) may be completely different. The main requirement is that these different
random quantities all have the same type. Type refers to the codomain of the function. Although the random values
of a stochastic process at different times may be independent random variables, in most commonly considered
situations they exhibit complicated statistical correlations.
Familiar examples of processes modeled as stochastic time series include stock market and exchange rate
fluctuations, signals such as speech, audio and video, medical data such as a patient's EKG, EEG, blood pressure or
temperature, and random movement such as Brownian motion or random walks. Examples of random fields include
static images, random terrain (landscapes), wind waves or composition variations of a heterogeneous material.


Formal definition and basic properties

Definition
Given a probability space $(\Omega, \mathcal{F}, P)$ and a measurable space $(S, \Sigma)$, an S-valued stochastic process is a
collection of S-valued random variables on $\Omega$, indexed by a totally ordered set T ("time"). That is, a stochastic
process X is a collection

$$\{X_t : t \in T\}$$

where each $X_t$ is an S-valued random variable on $\Omega$. The space S is then called the state space of the process.

Finite-dimensional distributions
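Let X be an S-valued stochastic process. For every finite subset $T' = \{t_1, \ldots, t_k\} \subseteq T$, the k-tuple
$X_{T'} = (X_{t_1}, X_{t_2}, \ldots, X_{t_k})$ is a random variable taking values in $S^k$. The distribution
$P_{T'}(\cdot) = P(X_{T'}^{-1}(\cdot))$ of this random variable is a probability measure on $S^k$. This is called a
finite-dimensional distribution of X.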
Under suitable topological restrictions, a suitably "consistent" collection of finite-dimensional distributions can be
used to define a stochastic process (see Kolmogorov extension in the next section).

Construction
In the ordinary axiomatization of probability theory by means of measure theory, the problem is to construct a
sigma-algebra of measurable subsets of the space of all functions, and then put a finite measure on it. For this
purpose one traditionally uses a method called Kolmogorov extension.[1]
There is at least one alternative axiomatization of probability theory by means of expectations on C-star algebras of
random variables. In this case the method goes by the name of Gelfand–Naimark–Segal construction.
This is analogous to the two approaches to measure and integration, where one has the choice to construct measures
of sets first and define integrals later, or construct integrals first and define set measures as integrals of characteristic
functions.

Kolmogorov extension
The Kolmogorov extension proceeds along the following lines: assuming that a probability measure on the space of
all functions $f: X \to Y$ exists, then it can be used to specify the joint probability distribution of
finite-dimensional random variables $f(x_1), \ldots, f(x_n)$. Now, from this n-dimensional probability distribution we
can deduce an (n − 1)-dimensional marginal probability distribution for $f(x_1), \ldots, f(x_{n-1})$. Note that the
obvious compatibility condition, namely, that this marginal probability distribution be in the same class as the one
derived from the full-blown stochastic process, is not a requirement. Such a condition only holds, for example, if the
stochastic process is a Wiener process (in which case the marginals are all gaussian distributions of the exponential
class) but not in general for all stochastic processes. When this condition is expressed in terms of probability
densities, the result is called the Chapman–Kolmogorov equation.
The Kolmogorov extension theorem guarantees the existence of a stochastic process with a given family of
finite-dimensional probability distributions satisfying the Chapman–Kolmogorov compatibility condition.


Separability, or what the Kolmogorov extension does not provide
Recall that in the Kolmogorov axiomatization, measurable sets are the sets which have a probability or, in other
words, the sets corresponding to yes/no questions that have a probabilistic answer.
The Kolmogorov extension starts by declaring to be measurable all sets of functions where finitely many coordinates
$[f(x_1), \ldots, f(x_n)]$ are restricted to lie in measurable subsets of $Y^n$. In other words, if a yes/no question about f
can be answered by looking at the values of at most finitely many coordinates, then it has a probabilistic answer.
In measure theory, if we have a countably infinite collection of measurable sets, then the union and intersection of all
of them is a measurable set. For our purposes, this means that yes/no questions that depend on countably many
coordinates have a probabilistic answer.
The good news is that the Kolmogorov extension makes it possible to construct stochastic processes with fairly
arbitrary finite-dimensional distributions. Also, every question that one could ask about a sequence has a
probabilistic answer when asked of a random sequence. The bad news is that certain questions about functions on a
continuous domain don't have a probabilistic answer. One might hope that the questions that depend on uncountably
many values of a function be of little interest, but the really bad news is that virtually all concepts of calculus are of
this sort. For example:
1. boundedness
2. continuity
3. differentiability
all require knowledge of uncountably many values of the function.
One solution to this problem is to require that the stochastic process be separable. In other words, that there be some
countable set of coordinates whose values determine the whole random function f.
The Kolmogorov continuity theorem guarantees that processes that satisfy certain constraints on the moments of
their increments have continuous modifications and are therefore separable.

Filtrations
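Given a probability space $(\Omega, \mathcal{F}, P)$, a filtration is a weakly increasing collection of sigma-algebras on $\Omega$,
$\{\mathcal{F}_t, t \in T\}$, indexed by some totally ordered set T, and bounded above by $\mathcal{F}$. I.e. for $s, t \in T$ with s < t,

$$\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}.$$

A stochastic process X on the same time set T is said to be adapted to the filtration if, for every $t \in T$, $X_t$ is
$\mathcal{F}_t$-measurable.[2]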

The natural filtration
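Given a stochastic process $X = \{X_t : t \in T\}$, the natural filtration for (or induced by) this process is the
filtration where $\mathcal{F}_t$ is generated by all values of $X_s$ up to time s = t. I.e.

$$\mathcal{F}_t = \sigma(X_s^{-1}(A) : s \le t, A \in \Sigma).$$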
A stochastic process is always adapted to its natural filtration.

Classification
Stochastic processes can be classified according to the cardinality of their index set (usually interpreted as time) and
state space.


Discrete time and discrete states
If both $t$ and $X_t$ belong to $\mathbb{N}$, the set of natural numbers, then we have models which lead to Markov chains. For
example:
(a) If $X_t$ means the bit (0 or 1) in position $t$ of a sequence of transmitted bits, then $X_t$ can be modelled as a
Markov chain with 2 states. This leads to the error-correcting Viterbi algorithm in data transmission.
(b) If $X_t$ means the combined genotype of a breeding couple in the $t$-th generation in an inbreeding model, it can be
shown that the proportion of heterozygous individuals in the population approaches zero as $t$ goes to ∞.[3]
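
Example (a) can be sketched as a short simulation. The code below is a minimal illustration, not a model of any real channel: the function name `simulate_bits`, the symmetric transition rule and the value `p_stay=0.9` (the assumed probability that a bit repeats its predecessor) are all hypothetical choices. It simulates a two-state Markov chain on {0, 1} and measures how often consecutive bits agree.

```python
import random


def simulate_bits(n, p_stay=0.9, seed=2):
    """Simulate n bits from a 2-state Markov chain on {0, 1}.

    p_stay is an assumed probability that the next bit equals the current one.
    """
    rng = random.Random(seed)
    bits = [rng.randint(0, 1)]  # arbitrary initial state
    for _ in range(n - 1):
        prev = bits[-1]
        # The next state depends only on the current state (Markov property).
        bits.append(prev if rng.random() < p_stay else 1 - prev)
    return bits


bits = simulate_bits(10000)
repeats = sum(1 for a, b in zip(bits, bits[1:]) if a == b)
repeat_fraction = repeats / (len(bits) - 1)
print(repeat_fraction)  # close to p_stay
```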

Continuous time and continuous state space
The paradigm of continuous stochastic process is that of the Wiener process. In its original form the problem was
concerned with a particle floating on a liquid surface, receiving "kicks" from the molecules of the liquid. The particle
is then viewed as being subject to a random force which, since the molecules are very small and very close together,
is treated as being continuous and, since the particle is constrained to the surface of the liquid by surface tension, is
at each point in time a vector parallel to the surface. Thus the random force is described by a two component
stochastic process; two real-valued random variables are associated to each point in the index set, time, (note that
since the liquid is viewed as being homogeneous the force is independent of the spatial coordinates) with the domain
of the two random variables being R, giving the x and y components of the force. A treatment of Brownian motion
generally also includes the effect of viscosity, resulting in an equation of motion known as the Langevin equation.[4]

Discrete time and continuous state space
If the index set of the process is N (the natural numbers), and the range is R (the real numbers), there are some
natural questions to ask about the sample sequences of a process {Xi}i ∈ N, where a sample sequence is {Xi(ω)}i ∈ N.
1. What is the probability that each sample sequence is bounded?
2. What is the probability that each sample sequence is monotonic?
3. What is the probability that each sample sequence has a limit as the index approaches ∞?
4. What is the probability that the series obtained from a sample sequence converges?
5. What is the probability distribution of the sum?
Main applications of discrete time continuous state stochastic models include Markov chain Monte Carlo (MCMC)
and the analysis of Time Series.

Continuous time and discrete state space
Similarly, if the index space I is a finite or infinite interval, we can ask about the sample paths {Xt(ω)}t ∈ I:
1. What is the probability that it is bounded/integrable/continuous/differentiable...?
2. What is the probability that it has a limit at ∞?
3. What is the probability distribution of the integral?

References
[1] Karlin, Samuel & Taylor, Howard M. (1998). An Introduction to Stochastic Modeling, Academic Press. ISBN 0-12-684887-4.
[2] Durrett, Rick. Probability: Theory and Examples. Fourth Edition. Cambridge: Cambridge University Press, 2010.
[3] Allen, Linda J. S., An Introduction to Stochastic Processes with Applications to Biology, 2nd Edition, Chapman and Hall, 2010, ISBN
1-4398-1882-7
[4] Gardiner, C. Handbook of Stochastic Methods: for Physics, Chemistry and the Natural Sciences, 3rd Edition, Springer, 2004, ISBN
3540208828


Further reading
• Wio, S. Horacio, Deza, R. Roberto & Lopez, M. Juan (2012). An Introduction to Stochastic Processes and
Nonequilibrium Statistical Physics. World Scientific Publishing. ISBN 978-981-4374-78-1.
• Papoulis, Athanasios & Pillai, S. Unnikrishna (2001). Probability, Random Variables and Stochastic Processes.
McGraw-Hill Science/Engineering/Math. ISBN 0-07-281725-9.
• Boris Tsirelson. "Lecture notes in Advanced probability theory" (http://www.webcitation.org/5cfvVZ4Kd).
• Doob, J. L. (1953). Stochastic Processes. Wiley.
• Klebaner, Fima C. (2011). Introduction to Stochastic Calculus With Applications. Imperial College Press.
ISBN 1-84816-831-4.
• Bruce Hajek (July 2006). "An Exploration of Random Processes for Engineers" (http://www.ifp.uiuc.edu/
~hajek/Papers/randomprocesses.html).
• "An 8 foot tall Probability Machine (named Sir Francis) comparing stock market returns to the randomness of the
beans dropping through the quincunx pattern" (http://www.youtube.com/watch?v=AUSKTk9ENzg). Index
Funds Advisors IFA.com (http://www.ifa.com).
• "Popular Stochastic Processes used in Quantitative Finance" (http://www.sitmo.com/article/
popular-stochastic-processes-in-finance/). sitmo.com.
• "Addressing Risk and Uncertainty" (http://www.goldsim.com/Content.asp?PageID=455).

Covariance
In probability theory and statistics, covariance is a measure of how much two random variables change together. If
the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds
for the smaller values, i.e., the variables tend to show similar behavior, the covariance is positive.[1] In the opposite
case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables
tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency
in the linear relationship between the variables. The magnitude of the covariance is not that easy to interpret. The
normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of
the linear relation.
A distinction must be made between (1) the covariance of two random variables, which is a population parameter
that can be seen as a property of the joint probability distribution, and (2) the sample covariance, which serves as an
estimated value of the parameter.

Definition
The covariance between two jointly distributed real-valued random variables x and y with finite second moments is
defined[2] as

$$\operatorname{Cov}(x, y) = E[(x - E[x])(y - E[y])],$$

where E[x] is the expected value of x, also known as the mean of x. By using the linearity property of expectations,
this can be simplified to

$$\operatorname{Cov}(x, y) = E[xy] - E[x]\,E[y].$$

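For random vectors $\mathbf{x}$ and $\mathbf{y}$ (of dimension m and n respectively) the m×n covariance matrix is equal to

$$\operatorname{Cov}(\mathbf{x}, \mathbf{y}) = E[(\mathbf{x} - E[\mathbf{x}])(\mathbf{y} - E[\mathbf{y}])^{\mathrm{T}}] = E[\mathbf{x}\mathbf{y}^{\mathrm{T}}] - E[\mathbf{x}]\,E[\mathbf{y}]^{\mathrm{T}},$$

where $\mathbf{m}^{\mathrm{T}}$ is the transpose of the vector (or matrix) $\mathbf{m}$.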
The (i,j)-th element of this matrix is equal to the covariance Cov(xi, yj) between the i-th scalar component of x and
the j-th scalar component of y. In particular, Cov(y, x) is the transpose of Cov(x, y).

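For a vector $\mathbf{x} = (x_1, \ldots, x_m)^{\mathrm{T}}$ of m jointly distributed random variables with finite second moments, its
covariance matrix is defined as

$$\Sigma(\mathbf{x}) = \operatorname{Cov}(\mathbf{x}, \mathbf{x}).$$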

Random variables whose covariance is zero are called uncorrelated.
The units of measurement of the covariance Cov(x, y) are those of x times those of y. By contrast, correlation
coefficients, which depend on the covariance, are a dimensionless measure of linear dependence. (In fact, correlation
coefficients can simply be understood as a normalized version of covariance.)
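
A small worked example may help fix the definition. The sketch below uses a made-up joint distribution of two binary variables (the probabilities in `joint` are purely illustrative) and computes the covariance both from the centred definition and from the form E[xy] − E[x]E[y], then normalizes it to obtain the correlation coefficient.

```python
import math

# Hypothetical joint distribution: (x, y) -> probability. The values are illustrative.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def expect(f):
    """Expected value of f(x, y) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)

# Centred definition: E[(x - E[x]) (y - E[y])]
cov_centered = expect(lambda x, y: (x - mean_x) * (y - mean_y))
# Equivalent form obtained by linearity of expectation: E[xy] - E[x] E[y]
cov_shortcut = expect(lambda x, y: x * y) - mean_x * mean_y

var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
correlation = cov_centered / math.sqrt(var_x * var_y)  # dimensionless

print(cov_centered, cov_shortcut, correlation)
```

Rescaling x (say, changing its units) would rescale the covariance by the same factor but leave the correlation unchanged.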

Properties
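• Variance is a special case of the covariance when the two variables are identical:

$$\operatorname{Cov}(x, x) = \operatorname{Var}(x).$$

• If x, y, w, and v are real-valued random variables and a, b, c, d are constant ("constant" in this context means
non-random), then the following facts are a consequence of the definition of covariance:

$$\begin{aligned}
\operatorname{Cov}(x, a) &= 0 \\
\operatorname{Cov}(x, y) &= \operatorname{Cov}(y, x) \\
\operatorname{Cov}(ax, by) &= ab\,\operatorname{Cov}(x, y) \\
\operatorname{Cov}(x + a, y + b) &= \operatorname{Cov}(x, y) \\
\operatorname{Cov}(ax + by, cw + dv) &= ac\,\operatorname{Cov}(x, w) + ad\,\operatorname{Cov}(x, v) + bc\,\operatorname{Cov}(y, w) + bd\,\operatorname{Cov}(y, v)
\end{aligned}$$

For sequences x1, ..., xn and y1, ..., ym of random variables, we have

$$\operatorname{Cov}\left(\sum_{i=1}^{n} x_i, \sum_{j=1}^{m} y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} \operatorname{Cov}(x_i, y_j).$$

For a sequence x1, ..., xn of random variables, and constants a1, ..., an, we have

$$\operatorname{Var}\left(\sum_{i=1}^{n} a_i x_i\right) = \sum_{i=1}^{n} a_i^2 \operatorname{Var}(x_i) + 2 \sum_{i<j} a_i a_j \operatorname{Cov}(x_i, x_j).$$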

A more general identity for covariance matrices
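Let $\mathbf{X}$ be a random vector, let $\Sigma(\mathbf{X})$ denote its covariance matrix, and let $A$ be a matrix that can act on $\mathbf{X}$. The
result of applying this matrix to $\mathbf{X}$ is a new vector with covariance matrix

$$\Sigma(A\mathbf{X}) = A\,\Sigma(\mathbf{X})\,A^{\mathrm{T}}.$$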
This is a direct result of the linearity of expectation and is useful when applying a linear transformation, such as a
whitening transformation, to a vector.
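
The rule that a linear map A turns a covariance matrix Σ into A Σ Aᵀ can be verified exactly on a small discrete distribution. In the sketch below, the support points, probabilities and matrix `A` are arbitrary illustrative values, and the helper names (`matmul`, `transpose`, `cov_matrix`) are hypothetical; plain Python lists are used so that no external library is assumed.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def cov_matrix(points, probs):
    """Exact covariance matrix of a discrete random vector."""
    dim = len(points[0])
    mean = [sum(p * x[i] for x, p in zip(points, probs)) for i in range(dim)]
    return [[sum(p * (x[i] - mean[i]) * (x[j] - mean[j])
                 for x, p in zip(points, probs))
             for j in range(dim)] for i in range(dim)]


# Illustrative random vector X with three possible values.
points = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
probs = [0.2, 0.5, 0.3]
A = [[2.0, -1.0], [0.5, 3.0]]

sigma = cov_matrix(points, probs)
# Apply A to every possible value of X, keeping the same probabilities.
transformed = [tuple(sum(A[i][k] * x[k] for k in range(2)) for i in range(2))
               for x in points]

lhs = cov_matrix(transformed, probs)            # covariance of AX, computed directly
rhs = matmul(matmul(A, sigma), transpose(A))    # A Sigma A^T
print(lhs)
print(rhs)
```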

Uncorrelatedness and independence
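If x and y are independent, then their covariance is zero. This follows because under independence,

$$E[xy] = E[x]\,E[y].$$

The converse, however, is not generally true. For example, let x be uniformly distributed in [−1, 1] and let y = x².
Clearly, x and y are dependent, but

$$\operatorname{Cov}(x, y) = \operatorname{Cov}(x, x^2) = E[x \cdot x^2] - E[x]\,E[x^2] = E[x^3] - E[x]\,E[x^2] = 0 - 0 \cdot E[x^2] = 0.$$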

In this case, the relationship between y and x is non-linear, while correlation and covariance are measures of linear
dependence between two variables. Still, as in the example, if two variables are uncorrelated, that does not imply that
they are independent.
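
The example of x uniform on [−1, 1] with y equal to the square of x can be simulated directly. In the sketch below the sample size and seed are arbitrary, and the comparison with |x| is an added illustration rather than part of the original example: the covariance of x with y is close to zero, while the covariance of |x| with y is clearly positive (its exact value is 1/12), which exposes the dependence that the covariance of x and y misses.

```python
import random

rng = random.Random(3)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50000)]
ys = [x * x for x in xs]  # y is a deterministic function of x


def mean(values):
    return sum(values) / len(values)


def covariance(u, v):
    """Sample version of E[uv] - E[u] E[v]."""
    return mean([a * b for a, b in zip(u, v)]) - mean(u) * mean(v)


cov_xy = covariance(xs, ys)                     # near 0: uncorrelated
cov_abs = covariance([abs(x) for x in xs], ys)  # near 1/12: clearly dependent
print(cov_xy, cov_abs)
```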

Relationship to inner products
Many of the properties of covariance can be extracted elegantly by observing that it satisfies similar properties to
those of an inner product (here σ(x, y) denotes Cov(x, y) and σ²(x) denotes Var(x)):
1. bilinear: for constants a and b and random variables x, y, z, σ(ax + by, z) = a σ(x, z) + b σ(y, z);
2. symmetric: σ(x, y) = σ(y, x);
3. positive semi-definite: σ²(x) = σ(x, x) ≥ 0, and σ(x, x) = 0 implies that x is a constant random variable (K).
In fact these properties imply that the covariance defines an inner product over the quotient vector space obtained by
taking the subspace of random variables with finite second moment and identifying any two that differ by a constant.
(This identification turns the positive semi-definiteness above into positive definiteness.) That quotient vector space
is isomorphic to the subspace of random variables with finite second moment and mean zero; on that subspace, the
covariance is exactly the L2 inner product of real-valued functions on the sample space.
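As a result, for random variables with finite variance, the inequality

$$|\sigma(x, y)| \le \sqrt{\sigma^2(x)\,\sigma^2(y)}$$

holds via the Cauchy–Schwarz inequality.
Proof: If σ²(y) = 0, then it holds trivially. Otherwise, let random variable

$$Z = x - \frac{\sigma(x, y)}{\sigma^2(y)}\, y.$$

Then we have

$$0 \le \sigma^2(Z) = \sigma\!\left(x - \frac{\sigma(x, y)}{\sigma^2(y)}\, y,\; x - \frac{\sigma(x, y)}{\sigma^2(y)}\, y\right) = \sigma^2(x) - \frac{(\sigma(x, y))^2}{\sigma^2(y)}.$$

QED.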

Calculating the sample covariance
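The sample covariance of N observations of K variables is the K-by-K matrix $\mathbf{Q} = [q_{jk}]$ with the entries

$$q_{jk} = \frac{1}{N - 1} \sum_{i=1}^{N} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k),$$

which is an estimate of the covariance between variable j and variable k.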
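
A direct implementation of the sample covariance matrix, using an N − 1 denominator and sample means, might look like the following sketch. The function name `sample_covariance` and the four data rows are illustrative; each row is one observation and each column one variable.

```python
def sample_covariance(rows):
    """K-by-K sample covariance matrix of N observations (rows) of K variables."""
    n = len(rows)
    k = len(rows[0])
    # Sample mean of each variable (column).
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[j] - means[j]) * (r[l] - means[l]) for r in rows) / (n - 1)
             for l in range(k)] for j in range(k)]


# Four illustrative observations of two variables.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0], [4.0, 7.0]]
q = sample_covariance(data)
print(q)
```

The diagonal entries are the sample variances of the individual variables, and the matrix is symmetric by construction.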
The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of
the random vector $\mathbf{X}$, a row vector whose jth element (j = 1, ..., K) is one of the random variables. The reason the
sample covariance matrix has $N - 1$ in the denominator rather than $N$ is essentially that the population mean $E(\mathbf{X})$
is not known and is replaced by the sample mean $\bar{\mathbf{x}}$. If the population mean $E(\mathbf{X})$ is known, the analogous
unbiased estimate is given by

$$q_{jk} = \frac{1}{N} \sum_{i=1}^{N} (x_{ij} - E(X_j))(x_{ik} - E(X_k)).$$

Comments
The covariance is sometimes called a measure of "linear dependence" between the two random variables. That does
not mean the same thing as in the context of linear algebra (see linear dependence). When the covariance is
normalized, one obtains the correlation coefficient. From it, one can obtain the Pearson coefficient, which gives us
the goodness of the fit for the best possible linear function describing the relation between the variables. In this sense
covariance is a linear gauge of dependence.

References
[1] http://mathworld.wolfram.com/Covariance.html
[2] Oxford Dictionary of Statistics, Oxford University Press, 2002, p. 104.

External links
• Hazewinkel, Michiel, ed. (2001), "Covariance" (http://www.encyclopediaofmath.org/index.php?title=p/
c026800), Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4
• MathWorld page on calculating the sample covariance (http://mathworld.wolfram.com/Covariance.html)
• Covariance Tutorial using R (http://www.r-tutor.com/elementary-statistics/numerical-measures/covariance)

Autocovariance
In statistics, given a real stochastic process X(t), the autocovariance is the covariance of the variable against a
time-shifted version of itself. If the process has the mean $E[X_t] = \mu_t$, then the autocovariance is given by

$$C_{XX}(t, s) = \operatorname{cov}(X_t, X_s) = E[(X_t - \mu_t)(X_s - \mu_s)] = E[X_t X_s] - \mu_t \mu_s,$$

where E is the expectation operator.

Stationarity
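If X(t) is a stationary process, then the following are true:

$$\mu_t = \mu_s = \mu \quad \text{for all } t, s$$

and

$$C_{XX}(t, s) = C_{XX}(s - t) = C_{XX}(\tau),$$

where

$$\tau = s - t$$

is the lag time, or the amount of time by which the signal has been shifted.
As a result, the autocovariance becomes

$$C_{XX}(\tau) = E[(X(t) - \mu)(X(t + \tau) - \mu)] = E[X(t)\,X(t + \tau)] - \mu^2 = R_{XX}(\tau) - \mu^2,$$

where $R_{XX}$ represents the autocorrelation in the signal processing sense.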

Normalization
When normalized by dividing by the variance σ², the autocovariance C becomes the autocorrelation coefficient
function c,[1]

$$c_{XX}(\tau) = \frac{C_{XX}(\tau)}{\sigma^2}.$$

The autocovariance function is itself a version of the autocorrelation function with the mean level removed. If the
signal has a mean of 0, the autocovariance and autocorrelation functions are identical.[1]
However, often the autocovariance is called autocorrelation even if this normalization has not been performed.
The autocovariance can be thought of as a measure of how similar a signal is to a time-shifted version of itself, with
an autocovariance of σ² indicating perfect correlation at that lag. The normalisation with the variance will put this
into the range [−1, 1].
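
The normalization can be illustrated on a short series. The sketch below is one possible sample estimator, not the only one: it divides by the series length n at every lag (an assumption made here because it keeps the normalized values within [−1, 1]), and the function names and the eight-point series are illustrative.

```python
def autocovariance(x, lag):
    """Sample autocovariance at a non-negative lag, dividing by n at every lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def autocorrelation_coefficient(x, lag):
    """Autocovariance normalized by the lag-0 autocovariance (the sample variance)."""
    return autocovariance(x, lag) / autocovariance(x, 0)


# Illustrative series that repeats every four samples.
series = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]
coefficients = [autocorrelation_coefficient(series, lag) for lag in range(5)]
print(coefficients)
```

At lag 0 the coefficient is exactly 1, since a signal is perfectly correlated with itself; at other lags the sign shows whether the shifted copy tends to move with or against the original.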

Properties
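The autocovariance of a linearly filtered process (with real coefficients $a_k$)

$$Y_t = \sum_{k=-\infty}^{\infty} a_k X_{t+k}$$

is

$$C_{YY}(\tau) = \sum_{k,l=-\infty}^{\infty} a_k a_l\, C_{XX}(\tau + k - l).$$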

References
• P. G. Hoel, Mathematical Statistics, Wiley, New York, 1984.
• Lecture notes on autocovariance from WHOI [2]
[1] Westwick, David T. (2003). Identification of Nonlinear Physiological Systems. IEEE Press. pp. 17–18. ISBN 0-471-27456-9.
[2] http://w3eos.whoi.edu/12.747/notes/lect06/l06s02.html

Autocorrelation
Autocorrelation is the cross-correlation of
a signal with itself. Informally, it is the
similarity between observations as a
function of the time separation between
them. It is a mathematical tool for finding
repeating patterns, such as the presence of a
periodic signal which has been buried under
noise, or identifying the missing
fundamental frequency in a signal implied
by its harmonic frequencies. It is often used
in signal processing for analyzing functions
or series of values, such as time domain
signals.
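
The use of autocorrelation to reveal a periodic signal buried in noise can be sketched as follows. This is an illustration under assumed settings: a unit-amplitude sine with a period of 20 samples, Gaussian noise of standard deviation 1, a fixed seed, and a simple sample estimator named `sample_autocorrelation`. The raw samples look largely random, yet the autocorrelation is clearly positive at a lag of one period and clearly negative at half a period.

```python
import math
import random


def sample_autocorrelation(x, lag):
    """Sample autocorrelation at a non-negative lag, normalized by the lag-0 value."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


rng = random.Random(4)
period = 20   # assumed period of the hidden sine, in samples
n = 2000
signal = [math.sin(2.0 * math.pi * t / period) + rng.gauss(0.0, 1.0) for t in range(n)]

acf_full_period = sample_autocorrelation(signal, period)
acf_half_period = sample_autocorrelation(signal, period // 2)
print(acf_full_period, acf_half_period)
```

Plotting the autocorrelation against the lag gives the correlogram, in which the hidden period shows up as regularly spaced peaks.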

Definitions
Different fields of study define autocorrelation differently, and not all of these definitions are equivalent. In some
fields, the term is used interchangeably with autocovariance.

[Figure: A plot showing 100 random numbers with a "hidden" sine function, and an autocorrelation (correlogram) of
the series on the bottom.]

Statistics
In statistics, the autocorrelation of a random
process describes the correlation between
values of the process at different times, as a
function of the two times or of the time
difference. Let X be some repeatable process, and i be some point in time after the start of that process. (i may be
an integer for a discrete-time process or a real number for a continuous-time process.) Then $X_i$ is the value (or
realization) produced by a given run of the process at time i. Suppose that the process is further known to have
defined values for mean $\mu_i$ and variance $\sigma_i^2$ for all times i. Then the definition of the autocorrelation
between times s and t is

$$R(s,t) = \frac{E[(X_t - \mu_t)(X_s - \mu_s)]}{\sigma_t \sigma_s},$$

where "E" is the expected value operator. Note that this expression is not well-defined for all time series or
processes, because the variance may be zero (for a constant process) or infinite. If the function R is well-defined, its
value must lie in the range [−1, 1], with 1 indicating perfect correlation and −1 indicating perfect anti-correlation.

[Figure: Visual comparison of convolution, cross-correlation and autocorrelation.]

If $X_t$ is a second-order stationary process then the mean μ and the variance σ² are time-independent, and further the
autocorrelation depends only on the difference between t and s: the correlation depends only on the time-distance
between the pair of values but not on their position in time. This further implies that the autocorrelation can be
expressed as a function of the time-lag, and that this would be an even function of the lag τ = s − t. This gives the
more familiar form

$$R(\tau) = \frac{E[(X_t - \mu)(X_{t+\tau} - \mu)]}{\sigma^2},$$

and the fact that this is an even function can be stated as

$$R(\tau) = R(-\tau).$$

It is common practice in some disciplines, other than statistics and time series analysis, to drop the normalization by
σ² and use the term "autocorrelation" interchangeably with "autocovariance". However, the normalization is
important both because the interpretation of the autocorrelation as a correlation provides a scale-free measure of the
strength of statistical dependence, and because the normalization has an effect on the statistical properties of the
estimated autocorrelations.

Signal processing
In signal processing, the above definition is often used without the normalization, that is, without subtracting the
mean and dividing by the variance. When the autocorrelation function is normalized by mean and variance, it is
sometimes referred to as the autocorrelation coefficient.[1]
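Given a signal $f(t)$, the continuous autocorrelation $R_{ff}(\tau)$ is most often defined as the continuous
cross-correlation integral of $f(t)$ with itself, at lag τ:

$$R_{ff}(\tau) = (\overline{f}(-\tau) * f(\tau)) = \int_{-\infty}^{\infty} f(t+\tau)\,\overline{f}(t)\,dt = \int_{-\infty}^{\infty} f(t)\,\overline{f}(t-\tau)\,dt,$$

where $\overline{f}$ represents the complex conjugate and $*$ represents convolution. For a real function, $\overline{f} = f$.
The discrete autocorrelation R at lag j for a discrete signal $x_n$ is

$$R_{xx}(j) = \sum_n x_n\,\overline{x}_{n-j}.$$

The above definitions work for signals that are square integrable, or square summable, that is, of finite energy.
Signals that "last forever" are treated instead as random processes, in which case different definitions are needed,
based on expected values. For wide-sense-stationary random processes, the autocorrelations are defined as

$$R_{ff}(\tau) = E[f(t)\,\overline{f}(t-\tau)], \qquad R_{xx}(j) = E[x_n\,\overline{x}_{n-j}].$$

For processes that are not stationary, these will also be functions of t, or n.
For processes that are also ergodic, the expectation can be replaced by the limit of a time average. The
autocorrelation of an ergodic process is sometimes defined as or equated to[1]

$$R_{ff}(\tau) = \lim_{T\to\infty} \frac{1}{T} \int_0^T f(t+\tau)\,\overline{f}(t)\,dt, \qquad R_{xx}(j) = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} x_n\,\overline{x}_{n-j}.$$
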
These definitions have the advantage that they give sensible well-defined single-parameter results for periodic
functions, even when those functions are not the output of stationary ergodic processes.
Alternatively, signals that last forever can be treated by a short-time autocorrelation function analysis, using finite
time integrals. (See short-time Fourier transform for a related process.)

Multi-dimensional autocorrelation is defined similarly. For example, in three dimensions the autocorrelation of a
square-summable discrete signal would be

$$R(j,k,\ell) = \sum_{n,q,r} x_{n,q,r}\,\overline{x}_{n-j,q-k,r-\ell}.$$

When mean values are subtracted from signals before computing an autocorrelation function, the resulting function
is usually called an auto-covariance function.

Properties
In the following, we will describe properties of one-dimensional autocorrelations only, since most properties are
easily transferred from the one-dimensional case to the multi-dimensional cases.
• A fundamental property of the autocorrelation is symmetry, $R(i) = R(-i)$, which is easy to prove from the
definition. In the continuous case, the autocorrelation is an even function,
$$R_{ff}(-\tau) = R_{ff}(\tau),$$
when f is a real function, and the autocorrelation is a Hermitian function,
$$R_{ff}(-\tau) = \overline{R_{ff}(\tau)},$$
when f is a complex function.
• The continuous autocorrelation function reaches its peak at the origin, where it takes a real value, i.e. for any
delay τ, $|R_{ff}(\tau)| \le R_{ff}(0)$. This is a consequence of the Cauchy–Schwarz inequality. The same result holds
in the discrete case.
• The autocorrelation of a periodic function is, itself, periodic with the same period.
• The autocorrelation of the sum of two completely uncorrelated functions (the cross-correlation is zero for all τ)
is the sum of the autocorrelations of each function separately.
• Since autocorrelation is a specific type of cross-correlation, it maintains all the properties of cross-correlation.
• The autocorrelation of a continuous-time white noise signal will have a strong peak (represented by a Dirac delta
function) at τ = 0 and will be absolutely 0 for all other τ.
• The Wiener–Khinchin theorem relates the autocorrelation function to the power spectral density via the Fourier
transform:

$$R(\tau) = \int_{-\infty}^{\infty} S(f)\,e^{j 2\pi f \tau}\,df, \qquad S(f) = \int_{-\infty}^{\infty} R(\tau)\,e^{-j 2\pi f \tau}\,d\tau.$$

• For real-valued functions, the symmetric autocorrelation function has a real symmetric transform, so the
Wiener–Khinchin theorem can be re-expressed in terms of real cosines only:

$$R(\tau) = \int_{-\infty}^{\infty} S(f)\cos(2\pi f \tau)\,df, \qquad S(f) = \int_{-\infty}^{\infty} R(\tau)\cos(2\pi f \tau)\,d\tau.$$
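
As a worked illustration of the Wiener–Khinchin pair, consider an autocorrelation that decays exponentially, R(τ) = σ² e^{−a|τ|} with a > 0. This choice of R is an assumption made for the example (a standard textbook case), not something taken from the text above. Because R is real and even, only the cosine form is needed:

```latex
S(f) = \int_{-\infty}^{\infty} \sigma^2 e^{-a|\tau|} \cos(2\pi f \tau)\, d\tau
     = 2\sigma^2 \int_{0}^{\infty} e^{-a\tau} \cos(2\pi f \tau)\, d\tau
     = \frac{2 a \sigma^2}{a^2 + (2\pi f)^2}
```

Integrating this S(f) over all frequencies gives σ², which agrees with R(0), as the inverse relation requires.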


Efficient computation
For data expressed as a discrete sequence, it is frequently necessary to compute the autocorrelation with high
computational efficiency. The brute force method based on the definition can be used. For example, to calculate the
autocorrelation of x = (2, 3, 1), we employ the usual multiplication method with right shifts:
        2  3  1
      × 2  3  1
      _________
        2  3  1
           6  9  3
              4  6  2
      _______________
        2  9 14  9  2
Thus the required autocorrelation is (2, 9, 14, 9, 2). In this calculation we do not perform the carry-over operation
during addition because the vector x has been defined over a field of real numbers. Note that we can halve the
number of operations required by exploiting the inherent symmetry of the autocorrelation.
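
The shift-and-add scheme above can be sketched in a few lines of Python. The function name `autocorrelation_brute_force` is hypothetical, and the sketch assumes a real-valued sequence; it computes only the non-negative lags and mirrors them, which is the symmetry saving mentioned above.

```python
def autocorrelation_brute_force(x):
    """Full autocorrelation of a real sequence for lags -(n-1) .. (n-1)."""
    n = len(x)
    # Non-negative lags only: R[j] = sum over k of x[k] * x[k + j].
    positive = [sum(x[k] * x[k + j] for k in range(n - j)) for j in range(n)]
    # R[-j] = R[j] for real input, so mirror instead of recomputing.
    return positive[:0:-1] + positive


print(autocorrelation_brute_force([2, 3, 1]))  # [2, 9, 14, 9, 2]
```

The nested sum makes the quadratic cost of the direct method visible: each of the n lags requires up to n multiplications.
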
While the brute force algorithm is order n², several efficient algorithms exist which can compute the autocorrelation
in order n log(n). For example, the Wiener–Khinchin theorem allows computing the autocorrelation from the raw
data X(t) with two Fast Fourier transforms (FFT)[2]:
FR(f) = FFT[X(t)]
S(f) = FR(f) FR*(f)
R(τ) = IFFT[S(f)]
where IFFT denotes the inverse Fast Fourier transform. The asterisk denotes complex conjugate.
Alternatively, a multiple τ correlation can be performed by using brute force calculation for low τ values, and then
progressively binning the X(t) data with a logarithmic density to compute higher values, resulting in the same n
log(n) efficiency, but with lower memory requirements.
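
The three FFT steps listed above might be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the names `fft` and `autocorrelation_fft` are hypothetical, the small recursive radix-2 transform stands in for a library FFT, and the zero-padding step is an addition not spelled out in the text. Without padding, the product FR(f) FR*(f) yields a circular autocorrelation in which the ends of the sequence wrap around.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def autocorrelation_fft(x):
    """Autocorrelation of a real sequence at lags 0 .. len(x) - 1."""
    n = len(x)
    # Pad with zeros to a power of two >= 2n - 1 so that wrapped-around
    # terms never reach the lags that are kept.
    size = 1
    while size < 2 * n - 1:
        size *= 2
    padded = [complex(v) for v in x] + [0j] * (size - n)
    spectrum = fft(padded)                          # FR(f) = FFT[X(t)]
    power = [s * s.conjugate() for s in spectrum]   # S(f) = FR(f) FR*(f)
    r = fft(power, inverse=True)                    # R(tau) = IFFT[S(f)]
    return [v.real / size for v in r[:n]]           # apply the 1/N scaling


print(autocorrelation_fft([2, 3, 1]))  # approximately [14, 9, 2]
```

The result matches the non-negative lags of the worked multiplication example up to floating-point rounding.
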

Estimation
For a discrete process of length n defined as $\{X_1, X_2, \ldots, X_n\}$ with known mean and variance, an estimate
of the autocorrelation may be obtained as

$$\hat{R}(k) = \frac{1}{(n-k)\,\sigma^2} \sum_{t=1}^{n-k} (X_t - \mu)(X_{t+k} - \mu)$$

for any positive integer k < n. When the true mean μ and variance σ² are known, this estimate is unbiased. If
the true mean and variance of the process are not known there are several possibilities:
• If μ and σ² are replaced by the standard formulae for sample mean and sample variance, then this is a biased
estimate.
• A periodogram-based estimate replaces n − k in the above formula with n. This estimate is always biased;
however, it usually has a smaller mean square error.[3][4]
• Other possibilities derive from treating the two portions of data $\{X_1, \ldots, X_{n-k}\}$ and
$\{X_{k+1}, \ldots, X_n\}$ separately and calculating separate sample means and/or sample variances for use
in defining the estimate.
The advantage of estimates of the last type is that the set of estimated autocorrelations, as a function of k, then
form a function which is a valid autocorrelation in the sense that it is possible to define a theoretical process having
exactly that autocorrelation. Other estimates can suffer from the problem that, if they are used to calculate the
variance of a linear combination of the X's, the variance calculated may turn out to be negative.
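
A minimal sketch of the first two estimators follows. The function name `autocorr_estimate` and its parameters are hypothetical. When the mean or variance is not supplied, the sketch substitutes sample values, using the 1/n form of the sample variance; that choice is an assumption, since the text does not say which sample-variance formula is meant.

```python
def autocorr_estimate(x, k, mean=None, var=None, biased=False):
    """Estimate the lag-k autocorrelation of the sequence x.

    mean, var: known process mean and variance; sample values are
               substituted when they are None (1/n variance, an assumption).
    biased:    divide by n instead of n - k (the periodogram-based variant).
    """
    n = len(x)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    if mean is None:
        mean = sum(x) / n
    if var is None:
        var = sum((v - mean) ** 2 for v in x) / n
    total = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    divisor = n if biased else n - k
    return total / (divisor * var)


x = [1, -1, 1, -1, 1, -1]
print(autocorr_estimate(x, 1, mean=0, var=1))               # -1.0
print(autocorr_estimate(x, 2, mean=0, var=1))               # 1.0
print(autocorr_estimate(x, 1, mean=0, var=1, biased=True))  # about -0.833
```

For this alternating sequence the n − k divisor recovers the exact values ±1, while the n divisor shrinks the estimate toward zero, which illustrates the bias of the periodogram-based variant.
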

Regression analysis
In regression analysis using time series data, autocorrelation of the errors is a problem. Autocorrelation of the errors,
which themselves are unobserved, can generally be detected because it produces autocorrelation in the observable
residuals. (Errors are also known as "error terms" in econometrics.)
Autocorrelation violates the ordinary least squares (OLS) assumption that the error terms are uncorrelated. While it
does not bias the OLS coefficient estimates, the standard errors tend to be underestimated (and the t-scores
overestimated) when the autocorrelations of the errors at low lags are positive.
The traditional test for the presence of first-order autocorrelation is the Durbin–Watson statistic or, if the explanatory
variables include a lagged dependent variable, Durbin's h statistic. A more flexible test, covering autocorrelation of
higher orders and applicable whether or not the regressors include lags of the dependent variable, is the
Breusch–Godfrey test. This involves an auxiliary regression, wherein the residuals obtained from estimating the
model of interest are regressed on (a) the original regressors and (b) k lags of the residuals, where k is the order of the
test. The simplest version of the test statistic from this auxiliary regression is TR², where T is the sample size and R²
is the coefficient of determination. Under the null hypothesis of no autocorrelation, this statistic is asymptotically
distributed as χ² with k degrees of freedom.
Responses to nonzero autocorrelation include generalized least squares and the Newey–West HAC estimator
(Heteroskedasticity and Autocorrelation Consistent).[5]
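
The text names the Durbin–Watson statistic without stating it. Its standard definition is d = Σ(e_t − e_{t−1})² / Σ e_t², with the numerator summed over t = 2, …, T and the denominator over t = 1, …, T, where e_t are the regression residuals. Values near 2 are consistent with no first-order autocorrelation, values toward 0 point to positive autocorrelation, and values toward 4 point to negative autocorrelation. The sketch below uses a hypothetical function name and made-up residuals purely for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d for a sequence of regression residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Slowly drifting residuals: positive first-order autocorrelation, d near 0.
print(durbin_watson([-2, -1, 0, 1, 2]))
# Alternating residuals: negative first-order autocorrelation, d above 2.
print(durbin_watson([1, -1, 1, -1, 1, -1]))
```

Judging significance still requires the tabulated Durbin–Watson bounds, which this sketch does not include.
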

Applications
• One application of autocorrelation is the measurement of optical spectra and the measurement of
very-short-duration light pulses produced by lasers, both using optical autocorrelators.
• Autocorrelation is used to analyze dynamic light scattering data, which notably makes it possible to determine the
particle size distributions of nanometer-sized particles or micelles suspended in a fluid. A laser shining into the mixture
produces a speckle pattern that results from the motion of the particles. Autocorrelation of the signal can be
analyzed in terms of the diffusion of the particles. From this, knowing the viscosity of the fluid, the sizes of the
particles can be calculated.
• The Small-angle X-ray scattering intensity of a nanostructured system is the Fourier transform of the spatial
autocorrelation function of the electron density.
• In optics, normalized autocorrelations and cross-correlations give the degree of coherence of an electromagnetic
field.
• In signal processing, autocorrelation can give information about repeating events like musical beats (for example,
to determine tempo) or pulsar frequencies, though it cannot tell the position in time of the beat. It can also be used
to estimate the pitch of a musical tone.
• In music recording, autocorrelation is used as a pitch detection algorithm prior to vocal processing, as a distortion
effect or to eliminate undesired mistakes and inaccuracies.[6]
• Autocorrelation in space rather than time, via the Patterson function, is used by X-ray diffractionists to help
recover the "Fourier phase information" on atom positions not available through diffraction alone.
• In statistics, spatial autocorrelation between sample locations also helps one estimate mean value uncertainties
when sampling a heterogeneous population.
• The SEQUEST algorithm for analyzing mass spectra makes use of autocorrelation in conjunction with
cross-correlation to score the similarity of an observed spectrum to an idealized spectrum representing a peptide.
• In astrophysics, autocorrelation is used to study and characterize the spatial distribution of galaxies in the
Universe and in multi-wavelength observations of low-mass X-ray binaries.
• In panel data, spatial autocorrelation refers to correlation of a variable with itself through space.
• In analysis of Markov chain Monte Carlo data, autocorrelation must be taken into account for correct error
determination.

References
[1] Patrick F. Dunn, Measurement and Data Analysis for Engineering and Science, New York: McGraw–Hill, 2005 ISBN 0-07-282538-3
[2] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Upper Saddle River, NJ:
Prentice–Hall, 1994.
[3] Spectral analysis and time series, M.B. Priestley (London, New York : Academic Press, 1982)
[4] Percival, Donald B.; Andrew T. Walden (1993). Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate
Techniques. Cambridge University Press. pp. 190–195. ISBN 0-521-43541-2.
[5] Christopher F. Baum (2006). An Introduction to Modern Econometrics Using Stata
(http://books.google.com/?id=acxtAylXvGMC&pg=PA141&dq=newey-west-standard-errors+generalized-least-squares). Stata Press. ISBN 1-59718-013-0.
[6] Tyrangiel, Josh (2009-02-05). "Auto-Tune: Why Pop Music Sounds Perfect"
(http://www.time.com/time/magazine/article/0,9171,1877372,00.html). Time Magazine.

External links
• Weisstein, Eric W., " Autocorrelation (http://mathworld.wolfram.com/Autocorrelation.html)" from
MathWorld.
• Autocorrelation articles in Comp.DSP (DSP usenet group). (http://www.dsprelated.com/comp.dsp/keyword/Autocorrelation.php)
• GPU accelerated calculation of autocorrelation function. (http://www.iop.org/EJ/abstract/1367-2630/11/9/093024/)


Cross-correlation
In signal processing, cross-correlation is a measure of similarity of two waveforms as a function of a time-lag
applied to one of them. This is also known as a sliding dot product or sliding inner-product. It is commonly used for
searching a long signal for a shorter, known feature. It also has applications in pattern recognition, single particle
analysis, electron tomographic averaging, cryptanalysis, and neurophysiology.
For continuous functions f and g, the cross-correlation is defined as:

$$(f \star g)(t) = \int_{-\infty}^{\infty} f^*(\tau)\,g(t+\tau)\,d\tau,$$

where f * denotes the complex conjugate of f.
Similarly, for discrete functions, the cross-correlation is defined as:

$$(f \star g)[n] = \sum_{m=-\infty}^{\infty} f^*[m]\,g[n+m].$$

The cross-correlation is similar in nature to the convolution of two functions.
In an autocorrelation, which is the cross-correlation of a signal with itself, there will always be a peak at a lag of
zero unless the signal is a trivial zero signal.
In probability theory and statistics, correlation is always used to include a standardising factor in such a way that
correlations have values between −1 and +1, and the term cross-correlation is used for referring to the correlation
corr(X, Y) between two random variables X and Y, while the "correlation" of a random vector X is considered to be
the correlation matrix (matrix of correlations) between the scalar elements of X.

[Figure: Visual comparison of convolution, cross-correlation and autocorrelation.]

If X and Y are two independent random variables with probability density functions f and g, respectively, then the
probability density of the difference Y − X is formally given by the cross-correlation (in the signal-processing
sense) f ⋆ g; however this terminology is not used in probability and statistics. In contrast, the convolution f * g
(equivalent to the cross-correlation of f(−t) and g(t)) gives the probability density function of the sum X + Y.

Explanation
As an example, consider two real valued functions f and g differing only by an unknown shift along the x-axis.
One can use the cross-correlation to find how much g must be shifted along the x-axis to make it identical to f.
The formula essentially slides the g function along the x-axis, calculating the integral of their product at each
position. When the functions match, the value of $(f \star g)$ is maximized. This is because when peaks (positive areas)
are aligned, they make a large contribution to the integral. Similarly, when troughs (negative areas) align, they also
make a positive contribution to the integral because the product of two negative numbers is positive.
With complex-valued functions f and g, taking the conjugate of f ensures that aligned peaks (or aligned troughs)
with imaginary components will contribute positively to the integral.
In econometrics, lagged cross-correlation is sometimes referred to as cross-autocorrelation.[1]


Properties
• The correlation of functions f(t) and g(t) is equivalent to the convolution of f *(−t) and g(t). I.e.:

$$f \star g = f^*(-t) * g.$$

• If f is Hermitian, then $f \star g = f * g$.
• $(f \star g) \star (f \star g) = (f \star f) \star (g \star g)$.
• Analogous to the convolution theorem, the cross-correlation satisfies:

$$\mathcal{F}\{f \star g\} = (\mathcal{F}\{f\})^* \cdot \mathcal{F}\{g\},$$

where $\mathcal{F}$ denotes the Fourier transform, and an asterisk again indicates the complex conjugate. Coupled with fast
Fourier transform algorithms, this property is often exploited for the efficient numerical computation of
cross-correlations. (see circular cross-correlation)
• The cross-correlation is related to the spectral density. (see Wiener–Khinchin theorem)
• The cross-correlation of a convolution of f and h with a function g is the convolution of the correlation of f and g
with the time-reversed (and, for complex h, conjugated) kernel h:

$$(f * h) \star g = h^*(-t) * (f \star g).$$

Normalized cross-correlation
For image-processing applications in which the brightness of the image and template can vary due to lighting and
exposure conditions, the images can be first normalized. This is typically done at every step by subtracting the mean
and dividing by the standard deviation. That is, the cross-correlation of a template, $t(x,y)$, with a subimage
$f(x,y)$ is

$$\frac{1}{n} \sum_{x,y} \frac{(f(x,y) - \bar{f})(t(x,y) - \bar{t})}{\sigma_f \sigma_t},$$

where n is the number of pixels in $t(x,y)$ and $f(x,y)$, $\bar{f}$ is the average of f and $\sigma_f$ is the standard
deviation of f (likewise $\bar{t}$ and $\sigma_t$ for t, with both standard deviations computed using the 1/n convention).
In functional analysis terms, this can be thought of as the dot product of two normalized vectors. That is, if

$$F(x,y) = f(x,y) - \bar{f}$$

and

$$T(x,y) = t(x,y) - \bar{t}$$

then the above sum is equal to

$$\left\langle \frac{F}{\|F\|}, \frac{T}{\|T\|} \right\rangle,$$

where ⟨·,·⟩ is the inner product and ‖·‖ is the L² norm. Thus, if f and t are real matrices, their normalized
cross-correlation equals the cosine of the angle between the vectors F and T, being thus 1 if and only if F equals
T multiplied by a positive scalar.
Normalized correlation is one of the methods used for template matching, a process used for finding instances of a
pattern or object within an image. It is also the 2-dimensional version of the Pearson product-moment correlation
coefficient.
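
The inner-product form can be sketched for two equal-sized images stored as lists of rows. The function name and the tiny 2×2 test images are hypothetical, chosen only to show that a change of gain and offset leaves the score unchanged. The sketch does not handle a constant image, for which the norm is zero and the ratio is undefined.

```python
import math


def normalized_cross_correlation(f, t):
    """Normalized cross-correlation of two equal-sized images (lists of rows)."""
    fv = [p for row in f for p in row]
    tv = [p for row in t for p in row]
    if len(fv) != len(tv):
        raise ValueError("image and template must have the same size")
    n = len(fv)
    f_mean = sum(fv) / n
    t_mean = sum(tv) / n
    F = [p - f_mean for p in fv]   # mean-removed image
    T = [p - t_mean for p in tv]   # mean-removed template
    norm_f = math.sqrt(sum(p * p for p in F))
    norm_t = math.sqrt(sum(p * p for p in T))
    # <F/||F||, T/||T||>: cosine of the angle between F and T.
    return sum(a * b for a, b in zip(F, T)) / (norm_f * norm_t)


template = [[1, 2], [3, 4]]
brighter = [[2 * p + 10 for p in row] for row in template]  # gain and offset
inverted = [[-p for p in row] for row in template]
print(normalized_cross_correlation(brighter, template))  # close to 1
print(normalized_cross_correlation(inverted, template))  # close to -1
```

In template matching this score would be evaluated for every candidate subimage position, and the position with the highest score taken as the match.
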


Time series analysis
In time series analysis, as applied in statistics, the cross correlation between two time series describes the normalized
cross covariance function.
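Let $(X_t, Y_t)$ represent a pair of stochastic processes that are jointly wide sense stationary. Then the cross
covariance is given by [2]

$$\gamma_{xy}(\tau) = E[(X_t - \mu_x)(Y_{t+\tau} - \mu_y)],$$

where $\mu_x$ and $\mu_y$ are the means of $X_t$ and $Y_t$ respectively.
The cross correlation function $\rho_{xy}$ is the normalized cross-covariance function,

$$\rho_{xy}(\tau) = \frac{\gamma_{xy}(\tau)}{\sigma_x \sigma_y},$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of processes $X_t$ and $Y_t$ respectively.
Note that if $X_t = Y_t$ for all t, then the cross correlation function is simply the autocorrelation function.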
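
A sample version of the cross correlation function might look like the sketch below. The function name is hypothetical, and the 1/n divisors for the covariance and the standard deviations are an assumed convention, since the text defines the quantities only through expectations. The sketch handles non-negative lags for two series of equal length.

```python
import math


def cross_correlation_function(x, y, lag):
    """Sample estimate of rho_xy(lag) for equal-length series, lag >= 0."""
    n = len(x)
    if len(y) != n or not 0 <= lag < n:
        raise ValueError("series must have equal length and 0 <= lag < n")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sigma_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sigma_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    # Sample cross covariance gamma_xy(lag) with a 1/n divisor.
    gamma = sum((x[t] - mean_x) * (y[t + lag] - mean_y)
                for t in range(n - lag)) / n
    return gamma / (sigma_x * sigma_y)


x = [1, 2, 3, 4, 5]
print(cross_correlation_function(x, x, 0))                 # close to 1
print(cross_correlation_function(x, [5, 4, 3, 2, 1], 0))   # close to -1
```

Passing the same series twice reduces the function to a sample autocorrelation, in line with the remark above.
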

Scaled correlation
In the analysis of time series scaled correlation can be applied to reveal cross-correlation exclusively between fast
components of the signals, the contributions of slow components being removed.[3]

Time delay analysis
Cross-correlations are useful for determining the time delay between two signals, e.g. for determining time delays
for the propagation of acoustic signals across a microphone array.[4][5] After calculating the cross-correlation
between the two signals, the maximum (or minimum if the signals are negatively correlated) of the cross-correlation
function indicates the point in time where the signals are best aligned, i.e. the time delay between the two signals is
determined by the argument of the maximum, or arg max of the cross-correlation, as in

$$\tau_{\mathrm{delay}} = \underset{t}{\operatorname{arg\,max}}\,((f \star g)(t)).$$
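
The arg max rule can be sketched for two short real-valued sequences. The function names and the example pulse are hypothetical, and samples outside either sequence are treated as zero, which is an assumption suited to isolated pulses rather than periodic signals.

```python
def cross_correlation(f, g, lag):
    """(f star g)[lag] = sum over m of f[m] * g[m + lag], real sequences."""
    return sum(f[m] * g[m + lag]
               for m in range(len(f))
               if 0 <= m + lag < len(g))


def estimate_delay(f, g):
    """Lag at which the cross-correlation of f and g is largest."""
    lags = range(-(len(f) - 1), len(g))
    return max(lags, key=lambda lag: cross_correlation(f, g, lag))


pulse = [0, 1, 3, 1, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 3, 1, 0]   # the same pulse, three samples later
print(estimate_delay(pulse, delayed))  # 3
```

For negatively correlated signals the same search would use the minimum rather than the maximum, as noted above.
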

References
[1] Campbell, Lo, and MacKinlay 1996: The Econometrics of Financial Markets, NJ: Princeton University Press.
[2] von Storch, H.; F. W Zwiers (2001). Statistical analysis in climate research. Cambridge Univ Pr. ISBN 0-521-01230-9.
[3] Nikolić D, Muresan RC, Feng W, Singer W (2012) Scaled correlation analysis: a better way to compute a cross-correlogram. European
Journal of Neuroscience, pp. 1–21, doi:10.1111/j.1460-9568.2011.07987.x
http://www.danko-nikolic.com/wp-content/uploads/2012/03/Scaled-correlation-analysis.pdf
[4] Rhudy, Matthew; Brian Bucci, Jeffrey Vipperman, Jeffrey Allanach, and Bruce Abraham (November 2009). "Microphone Array Analysis
Methods Using Cross-Correlations". Proceedings of 2009 ASME International Mechanical Engineering Congress, Lake Buena Vista, FL.
[5] Rhudy, Matthew (November 2009). "Real Time Implementation of a Military Impulse Classifier". University of Pittsburgh, Master's Thesis.


External links
• Cross Correlation from Mathworld (http://mathworld.wolfram.com/Cross-Correlation.html)
• http://scribblethink.org/Work/nvisionInterface/nip.html
• http://www.phys.ufl.edu/LIGO/stochastic/sign05.pdf
• http://www.staff.ncl.ac.uk/oliver.hinton/eee305/Chapter6.pdf
• http://www.springerlink.com/content/0150455247264825/fulltext.pdf
