Some definitions
Auto Correlation (AC)
The autocorrelation at lag k is the correlation coefficient between the original
series and the same series lagged by k terms. A plot of these values at various
lags is called the Auto Correlation Function (ACF).
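As an illustration, the lag-k autocorrelation can be computed directly with NumPy. This is a minimal sketch: the function name acf_at_lag is ours, and np.corrcoef is a close stand-in for the textbook ACF estimator (which normalizes by the overall variance rather than the variance of each slice).

```python
import numpy as np

def acf_at_lag(x, k):
    """Correlation between the series and itself lagged by k terms."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-k], x[k:])[0, 1]

# A series with a strong relationship between consecutive values
x = np.cumsum(np.ones(50))          # 1, 2, 3, ..., 50 (perfectly trending)
print(round(acf_at_lag(x, 1), 3))   # → 1.0 for this perfectly linear series
```

Evaluating this at k = 1, 2, 3, ... and plotting the results gives the ACF plot used throughout the rest of this note.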
Partial Auto Correlation (PAC)
The partial autocorrelation at lag k is the correlation coefficient between the
original series and the same series lagged by k terms, after the effects of the
intermediate lags (1 through k-1) have been removed. A plot of these values at
various lags is called the Partial Auto Correlation Function (PACF).
Stationarity
In statistics, a stationary process (or strict(ly) stationary process or
strong(ly) stationary process) is a stochastic process whose joint
probability distribution does not change when shifted in time. Consequently,
parameters such as the mean and variance also do not change over time
and do not follow any trends.
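A crude numerical check of this idea is to compare summary statistics across different stretches of the series; for a stationary series they should be roughly equal. The sketch below (our own illustration, not a formal test such as the augmented Dickey-Fuller test) compares the means of the two halves of a white-noise series and of a trending series.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 1000)          # stationary: constant mean and variance
trend = noise + np.linspace(0, 10, 1000)    # non-stationary: mean drifts upward

def half_means(x):
    """Sample means of the first and second halves of the series."""
    n = len(x) // 2
    return x[:n].mean(), x[n:].mean()

print(half_means(noise))  # two values close to each other (both near 0)
print(half_means(trend))  # clearly different: the mean is changing over time
```

Formal stationarity tests exist, but this half-vs-half comparison captures the intuition behind the definition.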
Seasonality
Patterns that repeat over known, fixed periods of time within a time series are
known as seasonality.
ARIMA models
ARIMA models are a class of models which are used very often to forecast time
series values. This approach was proposed by George Box and Gwylim Jenkins, and
hence it is also referred to as the Box-Jenkins method.
The overall procedure for forecasting a time series consists of the following 4 steps.
Step 1 : Identification
Step 2 : Estimation (and selection)
Step 3 : Diagnostic checking (validation)
Step 4 : Forecasting
An autoregressive model of order p, AR(p), takes the form

X_t = c + φ_1 X_{t-1} + φ_2 X_{t-2} + ... + φ_p X_{t-p} + ε_t

Here, c is a constant, φ_1, ..., φ_p are parameters to be determined by linear
regression and ε_t is white noise.
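An autoregressive process is easy to generate directly from its defining recursion. The sketch below (our own illustration) simulates an AR(1) process and checks that its lag-1 sample autocorrelation is close to the coefficient φ, as theory predicts.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ar1(c, phi, n, sigma=1.0):
    """Generate X_t = c + phi * X_{t-1} + eps_t (an AR(1) process)."""
    x = np.zeros(n)
    eps = rng.normal(0.0, sigma, n)
    for t in range(1, n):
        x[t] = c + phi * x[t - 1] + eps[t]
    return x

x = simulate_ar1(c=0.0, phi=0.7, n=5000)
# For AR(1), the lag-1 autocorrelation should be close to phi = 0.7
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(round(r1, 2))
```

The same loop extends naturally to AR(p) by keeping p previous values in the recursion.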
A moving-average model of order q, MA(q), takes the form

X_t = μ + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}

Here, μ is the mean of the series, θ_1, ..., θ_q are the parameters of the model
and ε_t, ε_{t-1}, ..., ε_{t-q} are white noise error terms.
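A moving-average process can likewise be generated from its definition. This sketch (our own illustration) simulates an MA(1) process; its theoretical lag-1 autocorrelation is θ / (1 + θ²), and its autocorrelation at lag 2 and beyond is zero.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_ma1(mu, theta, n, sigma=1.0):
    """Generate X_t = mu + eps_t + theta * eps_{t-1} (an MA(1) process)."""
    eps = rng.normal(0.0, sigma, n + 1)
    return mu + eps[1:] + theta * eps[:-1]

x = simulate_ma1(mu=0.0, theta=0.8, n=5000)
# Theoretical lag-1 autocorrelation: 0.8 / (1 + 0.64) ≈ 0.488
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
r2 = np.corrcoef(x[:-2], x[2:])[0, 1]
print(round(r1, 2), round(r2, 2))  # r2 near zero: the ACF cuts off after lag q = 1
```

This abrupt cutoff in the ACF of an MA process is exactly the behavior exploited later when estimating q.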
Thus, a moving-average model is conceptually a linear regression of the current
value of the series against current and previous (unobserved) white noise error
terms or random shocks. The random shocks at each point are assumed to be
mutually independent and to come from the same distribution, typically a normal
distribution, with mean zero and constant variance.
The underlying assumption in ARMA models is that the series is stationary in the
mean and variance. If the original series is non-stationary, we use differencing
to make the series stationary. We then proceed to decide the values of p and q to
fit an ARMA model to the stationary series. The order of differencing generally will
not be more than 2.
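Differencing replaces each value with the change from the previous value, and NumPy's np.diff implements it directly. The sketch below (our own illustration) shows a quadratically trending series becoming stationary after two rounds of differencing, matching the d = 2 case discussed here.

```python
import numpy as np

t = np.arange(100, dtype=float)
series = 0.5 * t**2 + 3.0 * t + 7.0   # quadratic trend: clearly non-stationary

d1 = np.diff(series)        # first differencing: a linear trend remains
d2 = np.diff(series, n=2)   # second differencing: constant, i.e. stationary

print(d2[:5])  # → [1. 1. 1. 1. 1.]  (every value equals 2 * 0.5, the curvature)
```

In general, a polynomial trend of degree d is removed by d rounds of differencing, which is why d rarely needs to exceed 2 in practice.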
The following figure illustrates the effect of differencing.
The first graph clearly indicates that the series is not stationary. After first
differencing, the series is modified, but is still not stationary. After the second
differencing, we see that the series has become stationary.
If d is the level of differencing used, then the model is described as ARIMA (p,d,q).
In the above example, d=2.
Seasonality
A stochastic process is said to be a seasonal (or periodic) time series with
periodicity s if Z_t and Z_{t+ks} have the same distribution for every integer k.
In other words, in a scatter plot, if we see a pattern repeating at regular intervals,
we can conclude that the series has seasonality.
Seasonal differencing will generally remove seasonality.
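Seasonal differencing subtracts the value observed one full period earlier, i.e. it replaces Z_t with Z_t - Z_{t-s}. The sketch below (our own illustration, assuming a period of s = 12 as for monthly data) shows a sinusoidal seasonal pattern vanishing completely under seasonal differencing, leaving only the trend component.

```python
import numpy as np

s = 12                                    # assumed period, e.g. monthly data
t = np.arange(120, dtype=float)
seasonal = 5.0 * np.sin(2 * np.pi * t / s)
series = seasonal + 0.1 * t               # seasonality plus a mild linear trend

deseason = series[s:] - series[:-s]       # seasonal differencing at lag s
print(np.allclose(deseason, 0.1 * s))     # → True: only the constant trend step remains
```

Any remaining trend in the seasonally differenced series is then handled by ordinary differencing, as described next.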
The following plot shows a series which displays seasonality.
Once the seasonality is removed, if there is non stationarity (or trend), we need to
do a normal differencing and make the series stationary before proceeding with
further analysis.
The following plot shows a series with seasonality along with a trend (indicating
non-stationary series).
The chart below shows the steps followed for defining a model and validating it.
Estimating p and q
Once stationarity and seasonality have been addressed, the next step is to identify
the order (i.e. the p and q) of the autoregressive and moving average terms.
This can be done using the ACF and PACF plots.
The partial autocorrelation of an AR(p) process is zero at lag p + 1 and greater, so
the appropriate maximum lag is the one beyond which the partial autocorrelations
are all zero.
The autocorrelation function of an MA(q) process becomes zero at lag q + 1 and
greater, so we determine the appropriate maximum lag for the estimation by
examining the sample autocorrelation function to see where it becomes
insignificantly different from zero for all lags beyond a certain lag, which is
designated as the maximum lag q.
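This cutoff behavior can be seen numerically. The sketch below (our own illustration) computes the textbook sample ACF of a simulated MA(2) process; the autocorrelations at lags 1 and 2 are large, while those beyond lag 2 fall inside an approximate 95% significance band of ±1.96/√n.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_acf(x, nlags):
    """Sample autocorrelations r_1 .. r_nlags (textbook estimator)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc * xc)
    return np.array([np.sum(xc[:-k] * xc[k:]) / denom
                     for k in range(1, nlags + 1)])

# Simulated MA(2) process: theory says its ACF is zero beyond lag q = 2
eps = rng.normal(size=5001)
x = eps[2:] + 0.6 * eps[1:-1] + 0.3 * eps[:-2]

r = sample_acf(x, nlags=6)
print(np.round(r, 2))           # large at lags 1 and 2, near zero beyond

# Approximate 95% significance band for sample autocorrelations
band = 1.96 / np.sqrt(len(x))
print(round(band, 3))
```

Reading off the lag beyond which all sample autocorrelations stay inside the band gives the candidate value of q; the analogous reading of the PACF gives p.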
The rules for determining the values of p and q are summarized below.
              ACF                                     PACF
AR(p)         Damped sinusoidal / exponential decay   Zero at lag p+1 and greater
MA(q)         Zero at lag q+1 and greater             Damped sinusoidal / exponential decay
The shape of the ACF also indicates which model to fit:

Shape of ACF                              Indicated model
Damped sinusoidal / exponential decay     Autoregressive model (use the partial autocorrelation plot to identify the order p)
Zero at lag q+1 and greater               Moving average model (order q identified by where the autocorrelation plot becomes zero)
Decay beginning after a few lags          Mixed autoregressive and moving average model
No significant autocorrelations           White noise
No decay to zero or very slow decay       Non-stationarity: make the series stationary
High values at fixed intervals            Seasonality: use seasonal differencing

Estimating parameters
While the pure AR model parameters can be estimated by the least squares method,
the MA parameters require iterative estimation. The most commonly used method in
practice is maximum likelihood estimation.
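For the AR case, least squares estimation amounts to regressing the series on its own lagged values. The sketch below (our own illustration, for a zero-mean AR(1) with no intercept) recovers the coefficient φ from simulated data using the closed-form least squares solution.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate an AR(1) series with known parameter phi = 0.6
phi = 0.6
n = 5000
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Least squares: regress x_t on x_{t-1} (no intercept for a zero-mean series)
phi_hat = np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)
print(round(phi_hat, 2))  # close to the true value 0.6
```

No such closed form exists for the MA coefficients, because the error terms are unobserved; that is why MA estimation falls back on iterative or maximum likelihood methods.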
References
1. Box, George; Jenkins, Gwilym (1970). Time Series Analysis: Forecasting and Control.
San Francisco: Holden-Day.
2. Makridakis, Wheelwright and Hyndman (2005). Forecasting Methods and Applications,
3rd ed. John Wiley and Sons.
3. Abraham, Bovas; Ledolter, Johannes (2005). Statistical Methods for Forecasting.
John Wiley and Sons.
Note: The graphs and figures are taken from the web from various sites.