
Applied econometrics

Radu Lupu, PhD.



Curriculum
Forecasting with Regression Analysis
Sampling and Statistical Inference
Sampling error confidence intervals
Statistical significance
The t-statistic
Intro to Time Series Forecasting
Standard Econometric Time series analysis
Stationarity
Autocorrelation functions
AR, MA, ARMA processes
Curriculum
Econometrics and Risk Management
Value-at-Risk
Monte Carlo simulations
Historical Simulations

Instruments
Excel
E-views
Ppt. slides
Text notes
Evaluation
Assignments only (no exam)
Deadlines will be announced one week in
advance
Assignments are submitted via e-mail at
applied.ec@gmail.com
Students can work in teams of 2
Only one file will be submitted for each
assignment with the name
Lastname1.Firstname1_Lastname2.Firstname2,
no matter what extension
Assignments will have equal weight in the grade
Forecasting with regression
analysis
Problem
You are interested in selling your house
You need to find the proper price for it
You use prices on other houses and try to
forecast or estimate a price for your house
Indistinguishable Data
Selling Prices of a Sample of Ten Houses
$109,360
$137,980
$131,230
$130,230
$125,410
$124,370
$139,030
$140,160
$144,220
$154,190
Indistinguishable Data (cont.)
Assume that housing prices have remained
stable over the last year and any house in the
sample is as likely as any other to be
representative of the selling price of your house
Your house is indistinguishable from the
houses in the sample
A probabilistic forecast of your house price
would then simply be the frequency
distribution of the 10 selling prices
Indistinguishable Data (cont.)
The best guess would be:
The mean
The median
The mode
The mean is $133,618. How good is this estimate?
If all prices were close to this then we would feel
confident about this guess
We are less confident if the prices are widely
dispersed
One measure of dispersion is the standard
deviation: $12,406

Indistinguishable Data – the mean
We define the residual as the difference
between a point forecast and an actual value
The MEAN has the minimum sum of squared
residuals – it is a least-squares estimate
(for proof, redo the computation using the median)
The regression estimates are also least-
squares estimates
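A quick numerical check of this claim (a Python sketch, not from the slides, using the ten selling prices above): the sum of squared residuals around the mean is never larger than around any other single-number forecast, such as the median.

```python
import numpy as np

# The ten selling prices from the sample
prices = np.array([109360, 137980, 131230, 130230, 125410,
                   124370, 139030, 140160, 144220, 154190])

def sum_sq_residuals(point_forecast):
    """Sum of squared residuals when every house gets the same point forecast."""
    return np.sum((prices - point_forecast) ** 2)

mean, median = prices.mean(), np.median(prices)
print("mean:  ", mean,   "SSR:", sum_sq_residuals(mean))    # smallest possible SSR
print("median:", median, "SSR:", sum_sq_residuals(median))  # larger (or equal) SSR
```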
Distinguishable Data
What if the sample contains information on
both the selling prices and the square footage
of the ten houses?
Larger houses are distinguishable from
smaller houses.
If your house has 1,682 square feet of living
area, it seems reasonable to confine your
attention to lookalike houses.
Distinguishable Data (cont.)
Selling Prices of a Sample of Ten Houses

Selling Price   Size (sq ft)
$109,360        1,404
$137,980        1,477
$131,230        1,503
$130,230        1,552
$125,410        1,608
$124,370        1,633
$139,030        1,717
$140,160        1,775
$144,220        1,832
$154,190        1,934
Distinguishable Data (cont.)
No house is a perfect lookalike, but there are
4 houses between 1,600 and 1,800 square feet –
they look more like our house
The average for these 4 houses is $132,243,
slightly less than the average across all 10
houses, and the sample standard deviation is
$8,513, less than the sample standard deviation
across all 10 houses
Distinguishable Data (cont.)
                           All 10 houses   1,400–1,599   1,600–1,799   1,800–1,999
Sample Average             $133,618        $127,200      $132,243      $149,205
Sample Standard Deviation  $12,406         $12,381       $8,513        $7,050
Number of Observations     10              4             4             2
By restricting our attention to a subset of the data consisting of houses that
are nearly indistinguishable lookalikes,
- we have slightly refined our point forecast,
- and have somewhat increased its accuracy
Sample residuals and their standard
deviation
The standard deviation for our house was
$8,513 but was based on only 4 observations
and, therefore, 3 degrees of freedom
Questions:
Does the standard deviation properly measure the
forecast uncertainty?
Could the differences in the 3 cells have arisen
by sampling error?
Has the partitioning reduced the forecast
uncertainty?
Sample residuals and their standard
deviation (cont.)
Remember: sample residual = value of the
dependent variable minus the forecast value
We compute the residuals for all the houses:
We identify each cell
We use the cell average as a point forecast
Compute the residual as the difference between
the actual selling price and the forecast
Sample residuals and their standard deviation (cont.)

Selling Price   Residual    Size    Cell
$109,360       -$17,840     1,404   Size 1,400–1,599
$137,980        $10,780     1,477
$131,230         $4,030     1,503
$130,230         $3,030     1,552
                                    Cell average: $127,200
$125,410        -$6,833     1,608   Size 1,600–1,799
$124,370        -$7,873     1,633
$139,030         $6,788     1,717
$140,160         $7,918     1,775
                                    Cell average: $132,243
$144,220        -$4,985     1,832   Size 1,800–1,999
$154,190         $4,985     1,934
                                    Cell average: $149,205

Average of residuals: $0
Sum of Squares: 727,012,525
Degrees of Freedom: 7
RSD: $10,191
Sample residuals and their standard
deviation (cont.)
If we take size as the only distinguishing factor,
and the residuals are not likely to be larger in
magnitude in one cell than in another,
then the residuals are indistinguishable
The residual of -$17,840 for the first
observation is just as relevant a measure of
our forecast error as the residuals computed in
the 2 other cells!
Sample residuals and their standard deviation
Tricks in computing the residuals
1. Computation
Take each value and subtract its cell mean (these are the residuals)
Square the differences
Add them to obtain a sum of squares
Divide by the degrees of freedom
Take the square root
2. Degrees of freedom
Each cell mean uses up a degree of freedom
In our case there are 3 cell means, so there are 10 − 3 = 7
degrees of freedom left
The RSD (residual standard deviation) is $10,191,
which is a modest improvement over the sample
standard deviation of $12,406
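A short Python sketch of the computation just described (an illustration, not from the slides), using the prices, sizes and cells from the table above:

```python
import numpy as np

prices = np.array([109360, 137980, 131230, 130230, 125410,
                   124370, 139030, 140160, 144220, 154190], dtype=float)
sizes  = np.array([1404, 1477, 1503, 1552, 1608, 1633, 1717, 1775, 1832, 1934])

cells = [(1400, 1599), (1600, 1799), (1800, 1999)]   # the three size cells
residuals = []
for low, high in cells:
    in_cell = (sizes >= low) & (sizes <= high)
    cell_mean = prices[in_cell].mean()               # cell average = point forecast
    residuals.extend(prices[in_cell] - cell_mean)    # residual = actual - forecast

residuals = np.array(residuals)
dof = len(prices) - len(cells)                       # 10 observations - 3 cell means = 7
rsd = np.sqrt(np.sum(residuals ** 2) / dof)
print(round(np.sum(residuals ** 2)), dof, round(rsd))  # ~727,012,525  7  ~10,191
```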
Using Data More Efficiently
If there were other variables (age of houses,
neighborhood, etc.) we could continue the process of
defining narrow cells
The sample size is small though. With a larger sample
size it would have been practical
The degrees of freedom decline, so the denominator in the
RSD gets smaller
The cells adjacent to our cell may have some
information for the forecast
The partition was somewhat arbitrary
A way around these problems: create a model that
specifies the relationship between selling prices and
the variables that help us to forecast

A Regression Model
Some larger houses will sell for less than
some smaller houses, but it is reasonable to
assume that as size goes up, selling price will
go up on average
If each additional square foot is
accompanied by the same increase in
price, then the relation is linear
We could also have diminishing returns to
scale or increasing returns to scale –
examine a scatter diagram
Regression
A linear relation:

y_est = b_0 + b_1 · x_1

where:
y_est is an estimate or point forecast of the actual
selling price (y)
x_1 is a house's size in square feet
b_0 and b_1 are the regression coefficients,
estimated from the data
Least squares estimates
We can find the least-squares estimates by:
1. Choosing arbitrary values for b_0 and b_1
2. Computing a forecast y_est of each house's price
by multiplying its size by b_1 and adding b_0
3. Computing a residual y − y_est for each house
4. Squaring and summing all the residuals
We do not need to do that by hand – we have a
computer! Use Solver in Excel
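A Python sketch of the same least-squares fit (not from the slides); np.polyfit solves in closed form the same minimization that Solver performs numerically:

```python
import numpy as np

prices = np.array([109360, 137980, 131230, 130230, 125410,
                   124370, 139030, 140160, 144220, 154190], dtype=float)
sizes  = np.array([1404, 1477, 1503, 1552, 1608, 1633, 1717, 1775, 1832, 1934], dtype=float)

def ssr(b0, b1):
    """Sum of squared residuals for candidate coefficients - the quantity Solver minimizes."""
    return np.sum((prices - (b0 + b1 * sizes)) ** 2)

# Least-squares fit of price = b0 + b1 * size (the minimizer of ssr)
b1, b0 = np.polyfit(sizes, prices, 1)
print(round(b0), round(b1, 2), round(ssr(b0, b1)))
# b0 and b1 come out close to the slide's estimates of 35,524 and 59.69
```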
Inputs in a regression analysis
1. Identify the dependent variable
2. Specify the independent variable(s)
3. Specify the relevant data (e.g. don't use data
from Paris if we need a forecast for Berlin)
4. Specify the nature of the relationship
between the dependent and the
independent variables (linear, IRS, DRS)
5. Provide values of the dependent and
independent for the relevant observations
(data file)
Outputs from a Regression Analysis
Regression coefficients b_0 and b_1

Model 1
Selling price = dependent variable
Square footage = independent variable
Run the regression (Excel and Eviews)
Coefficients are:
b_0 = 35,524
b_1 = 59.69
What does this mean?
Model 1 – interpretation
The value of b_1 tells us that, if our regression
model is specified correctly, each additional
square foot of living space adds an
average of about $60 to the value of the
house
The constant term b_0 tells us that a house
with 0 square feet is sold for about $35,500
(looks nonsensical, but we come back to this)
Model 2 – two factors
Let's introduce a second independent variable – the
age of the house (in years):

y_est = b_0 + b_1 · x_1 + b_2 · x_2

                  Model 1    Model 2
Constant term      35,524      4,045
Size                59.69      86.84
Age                            -695.8

The constant term in Model 2 is smaller than in Model 1
If x_1 increases by one unit (one square foot)
while x_2 remains constant, y_est will
increase by 86.84
The value of a regression coefficient associated with a given
independent variable depends on what other independent variables are
included in the model!
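A Python sketch (not from the slides) fitting both models by ordinary least squares, using the sizes and prices above and the ages listed later in the "Probabilistic Forecasts" table; the coefficients should come out close to the values quoted here:

```python
import numpy as np

prices = np.array([109360, 137980, 131230, 130230, 125410,
                   124370, 139030, 140160, 144220, 154190], dtype=float)
sizes  = np.array([1404, 1477, 1503, 1552, 1608, 1633, 1717, 1775, 1832, 1934], dtype=float)
ages   = np.array([20, 2, 5, 4, 23, 34, 25, 23, 28, 25], dtype=float)  # from the later slide

ones = np.ones_like(prices)

# Model 1: price = b0 + b1*size
b_model1, *_ = np.linalg.lstsq(np.column_stack([ones, sizes]), prices, rcond=None)

# Model 2: price = b0 + b1*size + b2*age
b_model2, *_ = np.linalg.lstsq(np.column_stack([ones, sizes, ages]), prices, rcond=None)

print("Model 1 (b0, b1):    ", np.round(b_model1, 2))   # roughly 35,524 and 59.69
print("Model 2 (b0, b1, b2):", np.round(b_model2, 2))   # roughly 4,045, 86.84, -695.8
print("corr(size, age):     ", round(np.corrcoef(sizes, ages)[0, 1], 2))  # positive, see the proxy-effect slides
```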
Uncertainty in regression coefficients
The observations used in a regression
analysis can be thought of as a sample from
some larger population (or process)
The estimates are subject to sampling error
Each sample regression coefficient is a point
estimate of a true regression coefficient
The error of this estimate is given by the
coefficient's standard error
Sampling and statistical inference
As previously seen for the mean, the standard error for
the regression coefficient enables you to derive a
(normal) confidence distribution for the true regression
coefficient, confidence intervals and t values
The standard error for b_1 is 15.10 and its value is 59.69
With a confidence of 68% we can say that the true
population coefficient is between 44.59 and 74.79
The t value is 59.69/15.10 = 3.95 > 2, so it is virtually
certain that the true regression coefficient is positive
Proxy effects
We observe that b_1 has a value of 59.69 in
Model 1 and a value of 86.84 in Model 2
Why are they different?
In Model 2, b_1 estimates the increase in
selling price when size increases by 1 square
foot with age held constant. Why is b_1
higher in this case?
We also observe that when size is held
constant, the relation of age to selling price is
negative (-695.8): older houses tend to sell
for less than newer houses of the same size
Proxy effects (cont.)
It also turns out that age and size are
positively correlated – the correlation
coefficient is 0.81
When age is left out, size not only reflects its
own relation with selling price, but it also
proxies for the relationship of age with
selling price
Because the relation age–selling price is
negative but the relation age–size is positive,
the age effect for which size proxies is
negative, so b_1 tends to be smaller in Model 1
Proxy effects – occurrence
Proxy effects occur when two variables x_1 and x_2:
1. Are correlated with each other
2. At least one of them is related to the dependent
variable y
3. Only one of them is included in the regression
The proxy effect is stronger the higher the correlation
between the two x's and the closer the relationship
between the dependent variable and the omitted x
If both x_1 and x_2 are included in the model, their
individual relationships with y will be correctly sorted
out, unless they proxy for some other variables not
included in the regression
Forecasts – Point Forecasts
In a regression model, a forecast of the
dependent variable can be made by
multiplying known values of the independent
variables by their respective estimated
regression coefficients, adding the products
and finally adding the estimated constant
term
In Model 2, a point forecast for a house that
has 1,682 square feet and is 10 years old is
y_est = 4,045 + 86.84 × 1,682 − 695.8 × 10 = 143,152
Probabilistic Forecasts
When we perform a regression, we attempt to take
into account all distinguishing factors that can
explain how a dependent variable varies in value
What is left is a collection of indistinguishable
residuals.
Residuals are retrospective, they come from data
already in hand
We need a prospective forecast – all point forecasts
are surely in error
We need to quantify and attach probabilities to
these prospective errors
Probabilistic Forecasts – residuals
The only information about the variability of
our forecasts is contained in the variability of
the residuals
Because the residuals are indistinguishable,
each one is a representative of
the error we are likely to make
The probability of a given deviation from
the forecast in the future will not be far from
the frequency of the same deviation
in the past
Probabilistic Forecasts – simulation
scenarios
If we want a probabilistic forecast for our
1,682-square-foot, 10-year-old house, we will
start with the point estimate of $143,152
Each residual added to the point forecast
gives a possible forecast value y
Since there are 10 residuals, each one
can happen with a probability of 1/10
Probabilistic Forecasts

Selling Price   Size    Age   Y_est = 4,045 + 86.84·Size − 695.8·Age   Residual
$109,360        1,404   20    112,054                                  -2,694.2
$137,980        1,477    2    130,917                                   7,062.5
$131,230        1,503    5    131,088                                     142.0
$130,230        1,552    4    136,039                                  -5,809.0
$125,410        1,608   23    127,682                                  -2,272.5
$124,370        1,633   34    122,200                                   2,170.0
$139,030        1,717   25    135,757                                   3,273.4
$140,160        1,775   23    142,185                                  -2,024.9
$144,220        1,832   28    143,656                                     564.0
$154,190        1,934   25    154,601                                    -411.2
Probabilistic Forecasts

Point Forecast   Residual (forecast error)   Probabilistic Forecast   Probability
143,152          -5,809.0                    137,343                  1/10
143,152          -2,694.2                    140,458                  1/10
143,152          -2,272.5                    140,880                  1/10
143,152          -2,024.9                    141,127                  1/10
143,152            -411.1                    142,741                  1/10
143,152             142.0                    143,294                  1/10
143,152             564.0                    143,716                  1/10
143,152           2,170.0                    145,322                  1/10
143,152           3,273.4                    146,425                  1/10
143,152           7,062.5                    150,215                  1/10
When you are not faced with a clearly defined decision
problem but still need to convey the uncertainty, you
can provide a confidence interval
Standard practice is to assume that the sample
residuals came from a population of normally
distributed residuals, with mean 0 and standard
deviation equal to the estimated RSD
The forecast itself has a mean of y_est and a standard
deviation equal to the RSD
The RSD for the ten houses is 4,072
Thus, a 68% confidence interval for our 1,682-square-
foot, 10-year-old house is between 143,152 − 4,072 = 139,080
and 143,152 + 4,072 = 147,224
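A Python sketch of both kinds of probabilistic forecast (an illustration using the Model 2 coefficients, residuals and RSD quoted on the slides):

```python
import numpy as np

# Model 2 coefficients and the regression RSD, taken from the slides
b0, b_size, b_age, rsd = 4045, 86.84, -695.8, 4072

# The ten regression residuals from the Probabilistic Forecasts table
residuals = np.array([-5809.0, -2694.2, -2272.5, -2024.9, -411.1,
                      142.0, 564.0, 2170.0, 3273.4, 7062.5])

# Point forecast for a 1,682-square-foot, 10-year-old house
point = b0 + b_size * 1682 + b_age * 10
print(round(point))                               # ~143,152

# Scenario forecast: each residual is one equally likely outcome (probability 1/10)
print(np.round(point + residuals))

# Normal approximation: 68% confidence interval = point forecast +/- one RSD
print(round(point - rsd), round(point + rsd))     # ~139,080 to ~147,224
```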
Measures of Goodness of Fit
Residual Standard Deviation
Is the square root of the sum of squared residuals
divided by the degrees of freedom
Degrees of freedom are n minus the number of
regression coefficients (including the constant
term)
In comparing regressions having the same
observations and the same dependent variable,
the one with the lowest RSD indicates a better fit
Still, if we add more independent variables RSD
generally decreases
Measures of Goodness of Fit
Coefficient of Determination R²

RSD is measured in units of the dependent
variable – it is hard to judge if an RSD of
4,072 indicates a good fit or a bad fit
R² is an index; it measures the percent
improvement in fit that a regression provides
relative to a base case which assumes that
the values of the dependent variable are
indistinguishable
R² = (Total Sum of Squares − Sum of
Squared Residuals) / Total Sum of Squares
R²
The Total Sum of Squares is in fact a base
case – it is the case in which we consider that
the values of the dependent variable are
indistinguishable, i.e. the best forecast is the
mean (TSS is simply the sum of squared
residuals around the mean of the dependent
variable)
Thus, R² represents the percent reduction in
the base-case sum of squares achieved by
the regression
Adjusted R²
Still, R² will increase as we add more
independent variables, because the RSS will
decrease
Adjusted R² is a coefficient of determination
that penalizes the addition of extra
independent variables to the regression
model – it is an R² that will not increase simply
because we add another independent
variable to the regression model
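A short sketch of both measures (the adjusted-R² expression below is the standard one, stated here as an assumption since the slide does not write it out): R²_adj = 1 − (1 − R²)·(n − 1)/(n − k − 1), with n observations and k independent variables.

```python
import numpy as np

def r_squared(y, y_est):
    """R^2 = (TSS - SSR) / TSS; the base case forecasts every y with the sample mean."""
    ssr = np.sum((np.asarray(y) - np.asarray(y_est)) ** 2)   # sum of squared residuals
    tss = np.sum((np.asarray(y) - np.mean(y)) ** 2)          # base-case (total) sum of squares
    return 1 - ssr / tss

def adjusted_r_squared(y, y_est, k):
    """Penalizes extra regressors; k = number of independent variables (constant excluded)."""
    n = len(y)
    return 1 - (1 - r_squared(y, y_est)) * (n - 1) / (n - k - 1)
```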
Transformed variables
Transformations can increase the ability to
specify relationships between a dependent
and a number of independent variables

Use independent variables that are not
contemporaneous with the dependent in a time
series
Use ordinal and categorical variables as
independent
Lagged variables in Time Series for
Modeling Noncontemporaneous Effects
Suppose we believe that advertising affects
sales
If we have a time series of a company's
advertising expenditures and unit sales, we
could regress sales on advertising and see if
there is any relationship between them
We may include price (as another variable)
so that advertising does not proxy for its
effect
Lagged variables
On reflection we realize that although
advertising this month might be related to
sales this month, there might be a carryover
effect from advertising expenditures made in
previous months:
Advertising last month may have an influence on
this month's sales
Advertising two months ago may have an
influence (maybe smaller) on this month's sales
We say that we believe that the effects of past
advertising persist for some time
Lagged transformations
To see if sales this month depend on advertising
this month, last month and 2 months ago,
and on prices this month, we use sales as the
dependent variable (y_t) and compute the
independent variables:
x_t (advertising expenditures this month)
x_t-1 (advertising expenditures last month)
x_t-2 (advertising expenditures 2 months ago)
p_t (price this month)
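A Python sketch of how these lagged columns can be built with pandas (the numbers and column names are illustrative, not from the slides):

```python
import pandas as pd

# Monthly data: unit sales, advertising spend and price (illustrative values)
df = pd.DataFrame({
    "sales":       [120, 135, 128, 150, 160, 158, 170, 180],
    "advertising": [10, 12, 11, 15, 16, 15, 18, 19],
    "price":       [9.9, 9.9, 10.1, 10.1, 10.0, 10.2, 10.2, 10.3],
})

# x_t-1 and x_t-2: advertising last month and two months ago
df["adv_lag1"] = df["advertising"].shift(1)
df["adv_lag2"] = df["advertising"].shift(2)

# The first rows now contain missing lags and are dropped -
# this is where the observations (and degrees of freedom) are lost
model_data = df.dropna()
print(model_data)
```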
Lagged variables (illustrative table of lagged columns omitted)
Lags
We could also lag the values of the
dependent variable – we might believe that
sales tend to persist (last month's sales
influence this month's sales)
However, each additional lag costs 2
degrees of freedom – one for the additional
independent variable in the model (for
computing its mean) and another one for the
observation lost by creating the lag
Dummy Variables
Sometimes ordinal or categorical variables
may be plausible explanatory variables
Selling prices of houses may depend on their
condition
We can use a scale from 1 to 5 (1 is poor and
5 is excellent). We expect that as condition is
better, price is higher
This means that we assume that the average
difference between 2 houses with levels 1 and
2 is the same as the difference between 2
houses with levels 4 and 5
Binary dummy 0 and 1
Suppose there are only 2 types of houses:
Frame houses (coded 0)
Brick houses (coded 1)
If we use this dummy as an explanatory variable
then the regression coefficient tells us by how much
brick houses differ from frame houses in selling
price, on average
If coefficient is 1,234, it means that brick houses
sell, on average, for $1,234 more than frame
houses, all other things kept constant (if it is -3,456
then brick houses sell for $3,456 less than frame
houses)
3 categories
Suppose there are 3 categories of houses:
Mixed frame and brick
All brick
All frame
We create 2 dummies:
One will have value 1 if the house is mixed frame and brick
and 0 if not (i.e. if it is either frame or all brick)
The other will have value 1 if the house is all brick, 0 if not
All frame will be a base case
Suppose the regression coefficients for the first and
second dummies are 1,234 and -3,456
This means that, relative to the frame houses (base
case) the mixed houses are sold for $1,234 more while
all brick houses are sold for $3,456 less
In general – c categories
If we have c categories then we can define c − 1
dummy variables, using the omitted category as
the base case, and then use the dummies in a
regression together with other factors
We can change the base case at any time by
redefining the dummies and running another
regression
The coefficients will change, but they
will have the same interpretation
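A Python sketch of building c − 1 dummies for the c = 3 house types above, with the all-frame houses as the base case (the prices are illustrative):

```python
import pandas as pd

houses = pd.DataFrame({
    "price": [128000, 131500, 125400, 139900, 122000],           # illustrative prices
    "type":  ["frame", "brick", "mixed", "brick", "frame"],      # c = 3 categories
})

# One 0/1 indicator column per category ...
dummies = pd.get_dummies(houses["type"], prefix="type")
# ... but only c - 1 = 2 of them enter the regression; "frame" is left out as the base case
X = pd.concat([houses["price"], dummies[["type_mixed", "type_brick"]]], axis=1)
print(X)
```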
Sampling and statistical
inference
Sampling error confidence intervals
Statistical significance
The t-statistic


Sampling error
Problem:
You ask 100 potential customers how much they
will spend on a proposed new product next year
You compute the sample average and get
$32.51
You want to make an inference from these
responses about how much will be spent by the
average potential customer in the target
population next year
Possible inferences
My best estimate of average sales per
potential customer is $32.51
(statistical estimation)
Average sales per potential customer will be
between $27.37 and $37.65 with 95%
confidence
(confidence interval)
Average sales per potential customer will be
greater than the breakeven amount of $27 at
a 2% level of significance
(test of statistical significance)
Confidence intervals
Confidence in how close a sample estimate is
to the true population mean depends on
sample size (n)
dispersion of the sample observations
measured by sample standard deviation
Everything else being equal, your uncertainty
about the value of a population mean will be
smaller, the larger the sample size and the
smaller the dispersion in the sample values
Confidence distribution
The level of confidence about a value of a
population mean can be expressed in terms
of a confidence distribution
When we make inferences about the mean,
we will get a distribution of the sample mean:
The average of this distribution is equal to the
sample mean (m)
The standard deviation (standard error) is equal to
the standard deviation (s) divided by the square
root of the sample size
The shape of the confidence distribution is
Normal
Constructing the confidence interval
The market researcher asks a sample of 100
potential customers how much they will
spend next year
The sample mean is $32.51 and the standard
deviation is 25.7
The best estimate of what the average
potential customer will spend is $32.51
The sampling error (standard error) is 25.7/√100 = 2.57
Thus, with 68% confidence, we can say that
the average potential customer will spend
between $32.51-2.57 and $32.51+2.57
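A minimal sketch of the computation with the numbers from this slide (n = 100, sample mean $32.51, s = 25.7):

```python
import math

n, sample_mean, s = 100, 32.51, 25.7

standard_error = s / math.sqrt(n)           # 25.7 / sqrt(100) = 2.57

# Confidence intervals from the normal confidence distribution
ci_68 = (sample_mean - standard_error,     sample_mean + standard_error)
ci_95 = (sample_mean - 2 * standard_error, sample_mean + 2 * standard_error)
print(standard_error, ci_68, ci_95)         # 2.57, (29.94, 35.08), (27.37, 37.65)
```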
Statistical significance – test of significance
Rather than estimating a point at which or an
interval in which the sample mean lies, you
can indicate if the population mean is likely to
fall above or below a critical value c
For instance you may want to know if the
average sales per capita is likely to exceed
the breakeven level of $27
If the sample mean is $32.51 and the breakeven
(critical value) is $27, how likely is it that the
population mean is on the same side of the
critical value as the sample mean?
One way to answer the statistical significance
problem
Construct a confidence interval
Determine whether the interval overlaps the critical
value
If it does not, the sample outcome is said to be
statistically significant
If it does overlap the sample outcome is said to be not
significant
The tests have a corresponding level of significance.
Ex.: a sample result is statistically significant at a level
of 2% when the critical value falls outside a 95%
confidence interval
Our example
The 95% confidence interval we found was
from 27.37 to 37.65 and it does not cover the
critical value c=27
Therefore, relative to 27, the sample outcome
of 32.51 is statistically significant at the 2%
level
The t-statistic
Rather than determining statistical significance by
constructing a confidence interval, the following
shortcut procedure produces the same results
If c is the critical value, compute the statistic




If t>2 or t<-2 the sample outcome is significant at the
2% level. (redo computation for proof)
t = (m − c) / standard error = (m − c) / (s / √n)

where m is the sample mean, s the sample standard deviation and n the sample size
Intro to Time series
forecasting (in Excel)
Objectives
Often the task is to predict the next value in a
series of periodic observations of a quantity
(e.g. the demand for a product)
The observations form what is called a time
series
When we rely only on past observations of
that quantity to predict future occurrences, the
approach is called time series forecasting
This forecasting relies on the premise that
past patterns in the time series will carry on
into the future

The forecasting process
Step 1 Collect numerical data from internal
(ex. previous sales figure, production /
inventory quantities) and external sources
(national economic indicators, changes in
customer demographics)
Step 2 Generate a forecast based on the
numerical data (numerical forecasting
techniques)
Step 3 Check the accuracy of the
forecasting technique (maintain integrity of
forecasts)
The forecasting process
Step 4 Include qualitative judgements not
represented in the numerical data (adjust the
forecasts with qualitative or deterministic
judgements like regression)
Step 5 Apply the forecast in making
decisions (basing decision only on forecasted
value, and not uncertainty, can cause serious
problems particularly if the cost of the actual
outcome exceeding the forecast is quite
different than the cost of it being lower than
the forecast)
Approaches to Forecast
Qualitative techniques – when historical data do
not exist or are too expensive; gather relevant
information, including opinions of experts
Causal procedures (regression) – attempt to use
knowledge about one or more factors (independent
variables) to predict the value of another factor
(dependent variable)
Time series methods – analysis of historical data
about the quantity to be predicted and its
subsequent extrapolation into the future; useful
when hundreds of thousands of items must be
forecast
The problem
A sales manager needs to forecast sales for the
coming month for a particular type of car
It is the first of October and we have the prior nine
months of sales:

Month:      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct
Car Sales:   21   23   21   20   21   19   28   32   26    ?
Basic approaches for One-period
Forecasts – Simple Approaches
All-Period Average – assumes that sales in
October will behave like the composite of sales in
prior months; sales from all prior months have the same
importance
Prior Period – use the most recent month's sales as
the October forecast – we give 100% weight to the prior
month's sales
To decide which is the better technique, we ask how
each of the techniques would have done if it had been
used to forecast each month up to October

Forecasting with average and last period

Month               Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct
Actual Car Sales     21    23    21    20    21    19    28    32    26
All-period Average         21.0  22.0  21.7  21.3  21.2  20.8  21.9  23.1  23.4
Prior period as fc         21.0  23.0  21.0  20.0  21.0  19.0  28.0  32.0  26.0
Failure of forecasting
The All-period average can be characterized
as being quite unresponsive to abrupt
fluctuations in sales
Prior Period forecast is extremely responsive
to sale fluctuations
We should look at techniques that are
somewhere in between these two extremes
Moving Average
It is usually reasonable to assume that more
recent information is of more value than older
information
A problem with basing a forecast on only the
prior period's sales is that something
may cause sales to jump abnormally in one
period
A logical compromise is to base the forecast
on the average of a certain number of most
recent periods this is the moving average
(ex.: forecast for Oct can be mean of prior 3
months)
Smoothed Average
Sales for the month prior to the month to be
forecast are most relevant, the month prior to
that is less relevant, and so on
Each earlier month receives comparably less
relevance
This decline in significance can be
represented by an exponential function this
is why we call this forecasting technique
exponential smoothing

Exponential smoothing
If we have both a forecast and an actual figure for the
prior period, then the new forecast can be calculated
as:
New forecast = α × (most recent actual) + (1 − α) × (most recent forecast)
α is a value that is greater than 0 and less than 1. If α =
0.2, then 20% of the weight goes to the prior period's
sales and 80% goes to the prior forecast
It is like an MA, but the weights are not constant
Still, for an MA we need n periods of data; for ES we
only need 2 numbers – the current value and the previous forecast
(sometimes provided by some other model)
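A Python sketch of the update rule (an illustration, not from the slides), applied to the car-sales series with α = 0.3; the first forecast is initialized with the first actual value:

```python
def exponential_smoothing(actuals, alpha):
    """New forecast = alpha * most recent actual + (1 - alpha) * most recent forecast."""
    forecasts = [actuals[0]]                   # forecast for period 2, initialized naively
    for actual in actuals[1:]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts                           # last element = forecast for the next period

sales = [21, 23, 21, 20, 21, 19, 28, 32, 26]   # Jan..Sep car sales
print([round(f, 1) for f in exponential_smoothing(sales, alpha=0.3)])
# [21.0, 21.6, 21.4, 21.0, 21.0, 20.4, 22.7, 25.5, 25.6]  -> Feb..Oct forecasts
```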
Exponential smoothing (cont.)
To use ES, we need to determine an
appropriate α factor to use
α determines how responsive the forecast will
be to jumps in the prior period
A low α means the forecast is unresponsive to
jumps
An α higher than 0.5 means the forecast will be
extremely responsive to jumps (α = 1 is the
same as the naïve prior-period forecast)
Compute the forecast for each month using MA and ES

Month               Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct
Actual Car Sales     21    23    21    20    21    19    28    32    26     ?
3-month MA                             21.7  21.3  20.7  20.0  22.7  26.3  28.7
ES (α = 0.3)               21.0  21.6  21.4  21.0  21.0  20.4  22.7  25.5  25.6

The forecasts for October differ somewhat.
The decision maker needs some means of comparing the quality of the various
possible forecasts.
Comparison of forecasts
In order to evaluate the various forecasting
approaches, we calculate the forecasts that
each would have given in the past periods
and compare them
We provide graphs of the four approaches
used on the automobile data
The forecast patterns are quite different,
particularly in the latter months where sales
level moves up
Comparison of Forecasts
(Four charts of actual car sales, Jan–Sep, each overlaid with one forecast: All-period Average, Prior-period Forecast, 3-month MA, and ES.)
The Error of a Forecast
The error of a forecast is the degree the
forecast is off from the actual amounts
Absolute Error is the distance of the
forecast from the actual
Absolute Error = Actual - Forecast
Relative Error is the percentage the forecast
differs from actual
Relative Error = (Actual − Forecast) / Actual

How do we choose whether absolute or
relative error is more appropriate?
If an absolute error of 1 is just as severe when the
actual is low as when the actual is high then an
absolute error model is more appropriate
If an absolute error of 1 is ten times as significant
when actual = 10 as when actual = 100, then a
relative error model is more appropriate
In order to compare the forecasting approaches we
will compute summary measures of the errors over
all periods
We look for approaches that have the smallest
measures of error
We need a forecast that is unbiased and precise
Bias
A simple summary of the errors of a particular
forecast is to average the errors for all periods
If the average is > 0 then we say that the forecast
has a positive bias (if < 0, negative bias)
Bias = the degree to which the forecast tends to
overshoot (negative bias) or under-shoot (positive
bias) the actual amounts
We can compute bias in both absolute and relative
terms:
Absolute Bias = average of all absolute errors (Average
Error – AE)
Relative Bias = average of all relative errors (Mean
Percentage Error – MPE)
Bias (cont.)
If a particular forecast method has a known
amount of bias, then we can simply adjust
each future forecast for the bias
If a forecast tends to under-estimate sales by 2
cars then we can just add two to all future
forecasts to eliminate the bias
Much worse is for forecasts to be imprecise
Precision
Precision = the distance the forecast tends
to be from the actual
For ex.: a forecasting technique tends to
over-estimate by 10 half of the time and
under-estimate by 10 the other half of the
time
On average the forecast is right (unbiased)
But the forecast technique is imprecise, since it is
always off by 10
Precision 3 ways of measurement
1. Compute the average of absolute values of all the
error terms this is called MAD (mean absolute
deviation)
2. Compute the average of the square of each of the
error terms this is called MSE (mean squared
error). This is appropriate when an absolute error of
2 is four times as significant as an error of 1
3. Compute the average of the absolute values of each
relative error this is called MAPE (mean absolute
percentage error). This indicates what percent the
forecast tends on average to be off from the actual
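A Python sketch computing all the summary measures defined above for any pair of actual and forecast series (illustrated on the all-period-average forecasts):

```python
def error_summary(actuals, forecasts):
    """Bias and precision measures; errors are Actual - Forecast, as defined on the slides."""
    errors = [a - f for a, f in zip(actuals, forecasts)]
    rel_errors = [(a - f) / a for a, f in zip(actuals, forecasts)]
    n = len(errors)
    return {
        "AE (bias)": sum(errors) / n,
        "MPE":       sum(rel_errors) / n,
        "MAD":       sum(abs(e) for e in errors) / n,
        "MSE":       sum(e ** 2 for e in errors) / n,
        "MAPE":      sum(abs(r) for r in rel_errors) / n,
    }

# All-period-average forecasts for Feb..Sep against actual sales (from the slides)
actuals   = [23, 21, 20, 21, 19, 28, 32, 26]
forecasts = [21.0, 22.0, 21.7, 21.3, 21.2, 20.8, 21.9, 23.1]
print(error_summary(actuals, forecasts))   # approximately the first column of the comparison table
```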
Computation of Error Summary Measures (All-period Average forecast)

Month                          Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep
Actual Car Sales                21    23    21    20    21    19    28    32    26
All-period Average                    21.0  22.0  21.7  21.3  21.2  20.8  21.9  23.1
Absolute Error                         2.0  -1.0  -1.7  -0.3  -2.2   7.2  10.1   2.9
Absolute Value of Abs. Error           2.0   1.0   1.7   0.3   2.2   7.2  10.1   2.9
Squared Error                          4.0   1.0   2.8   0.1   4.8  51.4 102.9   8.3
Relative Error                          9%   -5%   -8%   -1%  -12%   26%   32%   11%
Absolute Value of Rel. Error            9%    5%    8%    1%   12%   26%   32%   11%
Comparison of Forecasting Techniques

Statistic       All-Per Avg   Prior Per   3-m MA   ES α=0.3   ES α=0.2
AE              2.13          0.63        2.22     1.93       2.17
MAD             3.41          3.38        3.56     2.93       3.12
MSE             21.9          18.38       26.15    19.42      20.73
MPE             6.40%         1.25%       6.29%    5.82%      6.72%
MAPE            12.86%        13.28%      12.95%   10.94%     11.60%
Oct. Forecast   23.4          26          28.7     25.6       24.5
Which one do we choose?
At first glance we might be tempted to use
the Prior Period as it has the lowest AE and
MPE
More important is the precision. The PP has
the worst MAPE
ES with α = 0.3 has the best absolute and
relative precision
There is no one best forecasting technique
for all types of data
There are some characteristics of the data
that will influence our forecasting techniques

Multi period patterns
Seasonality
Trend and Cycle
Some standard models
Considering Seasonality
We expect that the more information we have about
past sales, the more accurately we can predict the
future
In our example it would be more difficult to come up
with an accurate forecast two months out
(November), or more
However, if we have more info about the past we
may improve our potential for forecasting further
By accounting for regular patterns in the data we
may improve our forecast one-period ahead as well
Seasonality
Digging into the archives we found sales
figures for the two years prior to the data
used above
It is reasonable to assume that the
automotive sales are seasonal with
increases and decreases occurring at about
the same time each year
We observe the seasonality by looking at the
graphs for monthly observations during the 3
years of data
Sales
(Chart: monthly car sales over the three years of data, January of the first year onward.)
Slight sales decline from January to May, sharp increase about
October and a decline until the end of the year.
For any future year, it is reasonable to presume that August sales will
be higher than April sales – the actual amount will be quantified by
calculating seasonality factors
Seasonality Factor
These factors will tell us what percentage any
given month of sales will be relative to an average
of all months
For ex.:
to calculate a March seasonality factor, we observe that
for 1992, March sales were 20.9
We will compare that value to the average sales for a year
centered on March
Suppose we take the year running from Oct. 1991 to
Sept. 1992 (12 months with an average of 23.3)

Seasonality Factor (March 92) = Known March 92 Sales / Avg sales for the year surrounding March
                              = 20.9 / 23.3 = 0.896
Seasonality Factor
We interpret the 0.896 to mean that sales for
March of 1992 were 89.6% of the average
of the surrounding year
March is not really the centered month in that
year, it is the 6th month out of 12
It would be more accurate to average the
year beginning with Sept 91 with the year
beginning with Oct 91 to get a true centered-
moving-average
Computing Seasonality Factors
We will compute seasonality factors for each
observation, excepting the first 5 months and
the last 6 months, for which we do not have a
complete year surrounding the observation
We can average all seasonality factors
available for a particular month to obtain a
seasonality index for that month
For ex.: we average the seasonality factor of
March 91 with that of March 92 to come up
with a factor for March that can be used in
forecasting
Deseasonalize
When we have a seasonal index for each month we
can deseasonalize the prior sales figures by
dividing the actual sales figures by the
corresponding seasonal index
This will remove the variation which we attribute to
seasonality
We can then generate forecasts using the
deseasonalized data which we would expect not to
be mis-forecast due to seasonality
These forecasts can then be re-seasonalized by
multiplying them by the corresponding seasonality
factors which puts back in the effects of
seasonality
Considering Trend and Cycle
It was necessary to use the same deseasonalized
forecast for October 92 and for forecasts of 2
months ahead, 3 months ahead, and so on
The ES approach gives only one forecast and if we
need to forecast more than one period ahead a flat
extrapolation into the future is made
We may have reasons to believe that over time the
de-seasonalized figures are generally increasing or
decreasing
We will explore the idea of a trend – a consistent,
long-term movement
We will assume a trend that is a straight-line pattern
up or down, but exponential or other curved trends
are also possible
Trend and Cycle
The de-seasonalized data consists of the
combination of trend and cycle
Cycle is an up and down movement
associated with general business conditions
with an irregular period that is very hard to
predict
In many industries the business cycle ranges
from 2 to 10 years
We will focus on separating the trend from
the cycle
Trend
Regressing the de-seasonalized sales on the time
index (1, 2, 3, …, 33 months) we obtain the results:
              Coefficients    Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept     27.24782705     0.797328257      34.17391   3.58E-26   25.62167    28.87399
X Variable 1  -0.161728621    0.040920028      -3.95231   0.000417   -0.24519    -0.07827
The -0.16 coefficient indicates that the de-seasonalized sales
figures tend to decrease by about 0.16 cars per
month
Trend and Cycle
(Line fit plot: de-seasonalized sales Y against the time index X Variable 1, with the fitted trend line.)
The wandering of the data around the trend line reflects the cycle
that is part of the data, plus some leftover unpredictable occurrences
For ex.: observation 30 (June 92) falls below the trend line
Trend
We might improve on the ES forecast by using the trend
estimate: assume that the October forecast is
0.16 lower than the 23.75 ES forecast, that the
November forecast is 2 times 0.16 lower than the
23.75 forecast, and so on
The trend can enhance the forecast for the following
months
One weakness of the linear regression approach is
that it assumes that the trend is constant over time
If we have reasons to believe that the trend is
changing, then we would prefer an approach that
will continuously change the trend estimate over
time
Holt's Model
Regular ES attempts to estimate the LEVEL of sales
in the future
Holt's Model improves this by computing both a
LEVEL and a TREND
The TREND forecast reflects the current expected
period-to-period change in sales level for the future
Given the LEVEL and TREND estimates made in a
particular month, we can produce a k month-ahead
forecast that would be the current estimate of
LEVEL plus k times the TREND
Holt's Model – finding the new LEVEL
All we need is to be clear about how the
estimates of LEVEL and TREND are updated
each period
Holt's idea was to smooth each estimate with
ES – e.g. the Sept. actual sales can be used to
update the LEVEL forecast according to the
following equation:

LEVEL_Sept = α · SALES_Sept + (1 − α) · (LEVEL_Aug + TREND_Aug)

where α is a smoothing constant as in ES
Holt's Model – finding the new TREND
Now that we have an updated LEVEL forecast, we
can use it to update the TREND forecast
We do not have a TREND actual value, so we use
the difference between the level of Sept and the
level of Aug
TREND will be:

TREND_Sept = β · (LEVEL_Sept − LEVEL_Aug) + (1 − β) · TREND_Aug

where β is another ES constant which determines
the sensitivity of the TREND estimate to changes in
TREND. Generally β is small (ex.: 0.2) because
trend is a long-term effect – it should not change
rapidly
Holt's Model – forecasts
We can use the September LEVEL and
TREND estimates to forecast future sales as
follows:
FORECAST_Oct = LEVEL_Sept + 1 · TREND_Sept
FORECAST_Nov = LEVEL_Sept + 2 · TREND_Sept
FORECAST_Dec = LEVEL_Sept + 3 · TREND_Sept
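A Python sketch of the three Holt equations above; the starting LEVEL and TREND values are assumptions (as noted later, these models are not very sensitive to starting values):

```python
def holt_forecast(sales, alpha=0.3, beta=0.2, horizon=3):
    """Holt's model: smooth LEVEL and TREND each period, then extrapolate k steps ahead."""
    level, trend = sales[0], 0.0                  # simple starting values (an assumption)
    for actual in sales[1:]:
        prev_level = level
        level = alpha * actual + (1 - alpha) * (prev_level + trend)   # update LEVEL
        trend = beta * (level - prev_level) + (1 - beta) * trend      # update TREND
    # k-month-ahead forecast = current LEVEL + k * current TREND
    return [level + k * trend for k in range(1, horizon + 1)]

sales = [21, 23, 21, 20, 21, 19, 28, 32, 26]          # Jan..Sep car sales
print([round(f, 1) for f in holt_forecast(sales)])    # Oct, Nov, Dec forecasts
```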
Holt's Model – properties
This model does not consider seasonality
It can be applied to de-seasonalized sales,
and the resulting forecasts can be re-seasonalized
It would be easier to have a model that
considers both trend and seasonality
Winters' Model
It is similar to Holt's Model – it contains
updating estimates of both LEVEL and
TREND
In addition, this model updates seasonality
factors each period, again using ES
As with Holt, assume that we have estimates
of LEVEL and TREND from Aug
Assume further that we have a seasonality
factor which was updated from September of
the prior year
Winters' Model – include de-seasonalized
data
SEASON = seasonality factor
Observe that the de-seasonalized Sept sales
equal SALES_Sept / SEASON_priorSept
The Sept sales can be used to update the
LEVEL forecast with the equation:

LEVEL_Sept = α · (SALES_Sept / SEASON_priorSept) + (1 − α) · (LEVEL_Aug + TREND_Aug)
Winters' Model – updating TREND and
SEASON
The TREND estimate can be updated as in Holt:
TREND_Sept = β · (LEVEL_Sept − LEVEL_Aug) + (1 − β) · TREND_Aug
Note that LEVEL_Sept represents an updated de-
seasonalized forecast for Sept sales, so
SALES_Sept / LEVEL_Sept represents the implied
seasonality factor
Therefore we can update the seasonality factor:
SEASON_Sept = γ · (SALES_Sept / LEVEL_Sept) + (1 − γ) · SEASON_priorSept
where γ is a smoothing constant between 0 and 1. It is
usually higher than α and β since each seasonality
estimate is only updated once per year
Winters' Model – forecasts

FORECAST_Oct = (LEVEL_Sept + 1 · TREND_Sept) · SEASON_priorOct
FORECAST_Nov = (LEVEL_Sept + 2 · TREND_Sept) · SEASON_priorNov
FORECAST_Dec = (LEVEL_Sept + 3 · TREND_Sept) · SEASON_priorDec
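A Python sketch of one Winters update step and the resulting forecast, using the three equations above; the August LEVEL/TREND values and the prior-year seasonal factors are illustrative placeholders, not values from the slides:

```python
def winters_update(sales_sept, level_aug, trend_aug, season_prior_sept,
                   alpha=0.3, beta=0.2, gamma=0.4):
    """One update of LEVEL, TREND and the seasonal factor (multiplicative seasonality)."""
    level_sept = alpha * (sales_sept / season_prior_sept) + (1 - alpha) * (level_aug + trend_aug)
    trend_sept = beta * (level_sept - level_aug) + (1 - beta) * trend_aug
    season_sept = gamma * (sales_sept / level_sept) + (1 - gamma) * season_prior_sept
    return level_sept, trend_sept, season_sept

def winters_forecast(level, trend, season_of_target_month, k):
    """k-month-ahead forecast: (LEVEL + k * TREND) * seasonal factor of the target month."""
    return (level + k * trend) * season_of_target_month

# Illustrative numbers only
level, trend, season = winters_update(sales_sept=26, level_aug=24.0, trend_aug=0.5,
                                      season_prior_sept=1.05)
print(round(winters_forecast(level, trend, season_of_target_month=1.20, k=1), 1))  # October
```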


Other Advanced Techniques
Winters' is a general method that will work quite well
in a variety of settings. Still, other variations exist
Trend can be exponential, quadratic etc.
We assumed that trend is additive (TREND is added
to LEVEL); other models assume multiplicative
trends
We may have a dramatic jump (a competitor
suddenly goes out of business) there are models
with adjusting alpha, beta, gamma
The methods we saw assumed a given period (12
months) for the seasonality. Other methods impose
combinations of seasonal patterns with different
length of seasonality (3, 5 months) and separate out
the estimates of their effects
Implementation considerations
Choosing a technique
We should consider the reasonableness of the
assumptions (if seasonality does not exist we use
Holt)
Deciding on aggregation
A car company may sell cars, trucks, vans… We
may decide to forecast the sales of the blue
XYZ sports car with options, which is a very detailed
level – this may have a lot of error
We could instead forecast the sales of cars
in general (aggregated level) – then we have the
problem of disaggregating

Implementation considerations
Determining initial model parameters
The models presented have low sensitivity to
starting parameters after a few periods the
parameters tend to stabilize
We can use any reasonable values to start
Using forecasts in decision making
Error exists
The probability distribution of the errors can be
used to calibrate the uncertainty of forecasts
Implementation considerations
Monitoring forecast accuracy
Even if history is relevant, future conditions may
change and the forecast error rises dramatically
A way to watch this is the use of a tracking
signal – a chart could be made to track the actual
forecast error after each period
The forecaster may choose a threshold value of
error which may be considered unacceptable
If the threshold is exceeded then we re-evaluate
the forecasting technique

Standard Econometric Time
series analysis using E-views
Stationarity
Autocorrelation functions
AR, MA, ARMA processes
Where we are and what we need
We have seen how to analyze a time series
of monthly sales in an intuitive manner
We will try to extend the previous techniques
to an advanced approach to time series
forecasting
These techniques consider the time series as
a stochastic process and can provide a more
efficient approach on how to use the history
to come up with a forecast
Stochastic processes
A stochastic (random) process is a collection of
random variables ordered in time
We will use GDP as an example:
Y_t denotes a random variable (GDP at moment t)
Y_1, Y_2, Y_3, …, Y_87, Y_88 are the random variables for
moments 1, 2, 3, …, 87, 88 (the GDPs at all these
moments)
Keep in mind that each of these Y's is a
random variable

Stochastic variables and
Stochastic processes
(Diagram: on each of Day 1, Day 2, Day 3, …, Day T there is a random variable, of which we only see the realization; the whole ordered collection is the stochastic process.)
GDP is a stochastic process
US GDP was $2872.8 billion in the first quarter of
1970
In theory, the GDP for that quarter could have been
any number, depending on the economic and
political climate prevailing
The figure $2872.8 is a particular realization of all
such possibilities
Therefore we will say that GDP is a stochastic
process and the actual values we observed from
1970-I until 1991-IV are a particular realization of
that process (i.e. sample)
Stationarity
What would be the time series that I would like
best in order to do very good forecasts?
We need to assume some stability of the data
before making forecasting analysis
In econometric words, stability means
independence of time (a series without
seasonality) or stationarity (a series that has
statistical properties which are constant,
stationary, from one observation to the other)
Stationarity - definition
Broadly speaking, a stochastic process is
said to be stationary if
its mean and variance are constant over time and
the value of the covariance between the two time
periods depends only on the distance or gap or
lag between two time periods and not the actual
time at which the covariance is computed (no
seasonality, no dependence on time).
Stationarity
For a stationary process:
E(Y_t) = μ
Var(Y_t) = σ²
Cov(Y_t, Y_t-k) = γ_k
If the process is stationary, all these indicators are
constant, i.e. time invariant
A stationary time series will tend to return to its mean (called mean
reversion) and fluctuations around this mean (measured by its variance) will
have a constant amplitude on average.
Stationarity
If a series is not stationary in the sense just defined,
it is called nonstationary time series (for ex. a
series with time-varying mean or a time-varying
variance or both)
Why is stationarity so important?
If a time series is not stationary, we can only study
its behavior for the time period under consideration
each set of time series data will therefore be for a
particular episode
As a consequence, it is not possible to easily
generalize to other periods – history does not look
like the future, so be careful with forecasting!
Stationarity (cont.)
If we look at some data, weak stationarity
means that the values fluctuate with some
constant variation around a constant level
(Two simulated series: one fluctuating with constant variation around a constant level – Stationary; one wandering with growing spread – Not Stationary.)
Stationarity (cont.)
γ_k = Cov(Y_t, Y_t-k) – the covariance between the realization at moment t and the realization at t−k
γ_0 = Var(Y_t) – the variance is the same at every t
γ_k = γ_-k – the same covariance applies to variables realized at equal distance in time:
Cov(Y_t, Y_t-k) = Cov(Y_t, Y_t+k) = Cov(Y_t+k, Y_t)
This means that the covariance with k lags before t is the same as the covariance with k lags after t
Quick test for stationarity
It is common to
assume that a series
of asset returns is
weakly stationary
We can check for this
if we have a
sufficiently large
number of
observations
We can divide the
data into subsamples
and check for the
consistency of the
results!
If stationarity holds then the mean and
variance of the first subsample should
equal (statistically) the mean and variance
of the second subsample
A special stationary process – white noise
A special stochastic process is the purely
random or white noise process
This process has zero mean and a constant
variance σ², and is serially uncorrelated
So it is a stationary process (constant mean,
constant variance, and covariance of zero
no matter what lag we use)
This is the series of residuals that we use in
regression analysis (the differences between the
realized values of the dependent variable and
the fitted values of the dependent)
Some classical nonstationary processes
To better understand the properties of the
stationary time series we will look at some
important nonstationary time series
The classic example is the random walk
model (exchange rates and asset prices seem to
follow a random walk – they are
nonstationary)
We will see two types of random walks:
Random walks without drift
Random walks with drift
Random Walk without Drift
Suppose u_t is a white noise with mean 0 and
variance σ²
The series Y_t is said to be a random walk if
Y_t = Y_t-1 + u_t
The value of Y at time t is equal to its value at time
(t−1) plus a random shock
We can think of this equation as a regression of Y at
time t on its value lagged one period
The beta coefficient is 1!
It is usually said that stock prices are essentially
random and we cannot (on average) make
profitable speculations (if one could predict
tomorrow's price on the basis of today's price, we
would all be millionaires)
Random Walk without Drift
The best forecast for tomorrow is just the
price of today
Time-varying variance – nonstationarity
The variance increases linearly with time, so we lose
precision with each additional forecast step
It can be shown that the variance increases with
time – the forecast is more and more foggy the
further we go into the future
This evolution of the variance also means that
Y_t+k does not have the same variance
as Y_t, which means that the random walk without
drift is not stationary
RW – persistence of shocks
Y_t is the sum of the initial Y_0 plus the sum of the
random shocks
Thus, the impact of a particular shock does
not die away
If u_12 = 2 rather than 0 (its average), then all
values of the series from the 12th realization onward
will be higher by 2 units – the effect of this shock
never dies out!
Econometricians say the RW has infinite memory
RW – write the difference
Interestingly, if we write the RW process in first differences as
ΔY_t = Y_t − Y_t-1 = u_t
where Δ (delta) is the first difference operator,
then the resulting process (the first-difference
process) is stationary (it is the white noise)
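A Python sketch (illustrative) simulating a random walk from white noise and checking that its first difference recovers the stationary white noise:

```python
import numpy as np

rng = np.random.default_rng(0)

u = rng.normal(loc=0.0, scale=1.0, size=1000)   # white noise u_t
y = np.cumsum(u)                                # random walk: Y_t = Y_t-1 + u_t

dy = np.diff(y)                                 # first difference: delta Y_t
print(np.allclose(dy, u[1:]))                   # True - differencing recovers the white noise

# The random walk's spread grows with time; the differenced series' does not
print(round(y[:500].std(), 2), round(y[500:].std(), 2))    # typically very different
print(round(dy[:500].std(), 2), round(dy[500:].std(), 2))  # both close to 1
```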
Random Walk with Drift (RWD)
The random walk with drift has an additional
constant parameter:
Y_t = δ + Y_t-1 + u_t
where δ (delta) is the drift parameter
The name drift comes from the fact that if we
write the preceding equation as
Y_t − Y_t-1 = ΔY_t = δ + u_t
it shows that Y_t drifts upward or downward,
depending on δ being positive or negative
RWD
It can be shown that
E(Y_t) = Y_0 + t·δ
Var(Y_t) = t·σ²
which means that for the RWD both the mean
and the variance are time variant (change in
time)
Therefore, the RWD is nonstationary too!
Stochastic and Deterministic Trends
The distinction between stationary and nonstationary
stochastic processes has a crucial bearing on
whether the trend (the slow long run evolution of the
series) is deterministic or stochastic
If a trend is completely predictable it is a
deterministic trend
If a trend is not predictable it is a stochastic trend
We can have 4 situations:
1. Pure Random Walk
2. Random Walk with Drift
3. Random Walk with Deterministic Trend
4. Random Walk with Drift and Deterministic Trend
Pure Random Walk
Start with the general model Y_t = β_1 + β_2·t + β_3·Y_t-1 + u_t
If β_1 = 0, β_2 = 0 and β_3 = 1, we get the pure random walk Y_t = Y_t-1 + u_t
We say that the series Y_t is a
difference stationary process:
ΔY_t = u_t IS STATIONARY
Random Walk with Drift
Start with Y_t = β_1 + β_2·t + β_3·Y_t-1 + u_t
If β_1 ≠ 0, β_2 = 0 and β_3 = 1, we get the random walk with drift Y_t = β_1 + Y_t-1 + u_t
The trend is called a
stochastic trend
Nonstationarity can be eliminated
by taking first differences of the
time series
Deterministic Trend
Start with Y_t = β_1 + β_2·t + β_3·Y_t-1 + u_t
If β_1 ≠ 0, β_2 ≠ 0 and β_3 = 0, we get Y_t = β_1 + β_2·t + u_t
This is a trend stationary process. Even if the
mean of Y_t depends on t, so it is changing, its
variance is constant.
The procedure of removing the trend is
called detrending
RW with Drift and Deterministic Trend
Start with Y_t = β_1 + β_2·t + β_3·Y_t-1 + u_t
If β_1 ≠ 0, β_2 ≠ 0 and β_3 = 1, we get Y_t = β_1 + β_2·t + Y_t-1 + u_t
This is a RW with drift
and deterministic trend.
Deterministic vs. Stochastic Trend
The phenomenon of spurious regression
To see why stationary time series are so important
we look at two random walk models with no relation
between them whatsoever:
Y_t = Y_t-1 + u_t
X_t = X_t-1 + v_t   (u_t and v_t are independent white noises)
Regressing Y on X provides a significant coefficient –
this is the spurious regression
If we regress differences in Y on differences in X
then we get a non-significant coefficient
This is due to the stochastic trends
Before running a regression we should look at the
stationarity of the variables used in the regression
Tests of stationarity
In practice two questions:
How do we find out if a given time series is
stationary?
If we find that a given time series is not stationary, is
there a way that it can be made stationary?
Next points:
Graphical analysis
The correlogram test
Unit root test of stationarity
Transforming non-stationary time series - making
them stationary
Finding stationarity
Graphical analysis
Start with the time series graph
The GDP has been increasing, showing an upward
trend, suggesting that the mean has been
changing – perhaps the GDP is not stationary!
Autocorrelation function (ACF) and Correlogram
We could analyze the correlations (statistical
relation) of Y_t with Y_t-k for many lags k
This will tell us how much the past is
incorporated into the future (how much of the
future can be explained by the past)
We analyze the white noise, which is stationary
We also look at the RW, which is nonstationary
We try to figure out if there is a pattern in the
autocorrelations that could give us a criterion for
identifying a stationary process
Correlation and Autocorrelation Function
The correlation coefficient between two
random variables X and Y is:
ρ_XY = Cov(X, Y) / (σ_X · σ_Y) = E[(X − μ_X)(Y − μ_Y)] / (σ_X · σ_Y)
where μ_X and μ_Y are the true (population) means of X and Y
The sample estimator of the correlation coefficient
replaces these with the sample means X̄ and Ȳ:
r_XY = Σ(X_i − X̄)(Y_i − Ȳ) / √[Σ(X_i − X̄)² · Σ(Y_i − Ȳ)²]
Autocorrelation Function
Assume a weakly stationary time series Y_t
The relation between Y_t and its lag Y_t-k is of
interest; we are dealing with autocorrelation (called
the lag-k autocorrelation of Y_t)
Under weak stationarity ρ_k is a function of k only:
ρ_k = Cov(Y_t, Y_t-k) / √[Var(Y_t) · Var(Y_t-k)] = Cov(Y_t, Y_t-k) / Var(Y_t) = γ_k / γ_0
(the second equality uses Var(Y_t) = Var(Y_t-k) under stationarity)
Sample estimator of the autocorrelation
The lag-k sample autocorrelation is

ρ̂_k = Σ (from t = k+1 to T) (Y_t − Ȳ)(Y_t-k − Ȳ) / Σ (from t = 1 to T) (Y_t − Ȳ)²,   0 ≤ k < T

The Sample Autocorrelation Function is the function ρ̂_k, k = 1, 2, …
A linear time series model can be characterized by its
ACF. Linear time series modeling makes use of the
sample ACF to capture the dynamics of the data.
Testing for zero autocorrelations has been used as a
tool to check the efficient market assumption
Correlogram
Check the behavior of autocorrelations on a
graph
Look at their behavior for the white noise and
for the RW
What are the differences?
For the white noise the autocorrelations are
around 0
For the RW the autocorrelation coefficients at
various lags are very high for all the lags. The
autocorrelation coefficient starts at a very high
value and declines very slowly toward zero as the
lag lengthens
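A Python sketch of the lag-k sample autocorrelation defined above, applied to a simulated white noise and a simulated random walk (statsmodels' acf function should give essentially the same numbers):

```python
import numpy as np

def sample_acf(y, max_lag):
    """rho_hat_k = sum((Y_t - mean)(Y_t-k - mean)) / sum((Y_t - mean)^2), for k = 1..max_lag."""
    dev = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(dev ** 2)
    return [np.sum(dev[k:] * dev[:-k]) / denom for k in range(1, max_lag + 1)]

rng = np.random.default_rng(1)
noise = rng.normal(size=500)        # white noise: autocorrelations scattered around 0
walk = np.cumsum(noise)             # random walk: near 1 at lag 1, declining very slowly
print(np.round(sample_acf(noise, 5), 2))
print(np.round(sample_acf(walk, 5), 2))
```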
GDP correlogram
The correlogram of the GDP time series
looks very much like the correlogram of the
RW
The correlation coefficient starts at a very
high value at lag 1 (0.969) and declines
slowly

Choice of Lag Length
This is an empirical matter
A rule of thumb is to compute the ACF up to one-
third to one-quarter of the length of the time
series
For our data we have 88 observations, so
we look at 22-29 lags
There are also some statistical criteria (not
our purpose here)
How do we test for lag significance?
Statistical result:
If Y_t is an iid sequence with E(Y_t²) finite,
then the sample autocorrelation estimator ρ̂_k is
asymptotically normal with mean zero and
variance 1/T for any fixed integer k.
This result can be used for testing:
H_0: ρ_k = 0
H_1: ρ_k ≠ 0
This provides a test for the lags!
Eviews gives p-values! If the p-value is lower than
0.05, then the lag autocorrelation is significantly different from 0
with 95% confidence
What we did until now
We used Eviews to build the correlogram:
Took the GDP data from Excel and pasted into
the GDP series created in Eviews
Select the series gdp, click on Views, then
Correlogram
Said that the correlogram for the GDP series
looks very much like the correlogram of the RW
Unit Root Test
The starting point is the process Y_t = ρ·Y_t-1 + u_t
We know that if ρ = 1, then the process is a RW
without drift, which is nonstationary
What if we regress Y_t on its one-period lagged value
Y_t-1 and find out whether the estimated ρ is statistically
equal to 1 (i.e. a unit root)?
Equivalently, we can estimate
ΔY_t = Y_t − Y_t-1 = (ρ − 1)·Y_t-1 + u_t = δ·Y_t-1 + u_t
and see if δ = ρ − 1 is statistically 0
If δ = 0, then the process is nonstationary but the
first differences are stationary
Augmented Dickey-Fuller test
In Eviews we can test the following specifications:

ΔY_t = Y_t − Y_t-1 = δ·Y_t-1 + γ·(Y_t-1 − Y_t-2) + u_t
If δ is less than 0, then the series is stationary with 0 mean

ΔY_t = Y_t − Y_t-1 = μ + δ·Y_t-1 + γ·(Y_t-1 − Y_t-2) + u_t
If δ is less than 0, then the series is stationary with nonzero mean

ΔY_t = Y_t − Y_t-1 = μ + λ·t + δ·Y_t-1 + γ·(Y_t-1 − Y_t-2) + u_t
If δ is less than 0, then the series is stationary with nonzero mean around a deterministic trend

The lagged-difference term γ·(Y_t-1 − Y_t-2) is the "augmentation"; its presence simply
ensures that the regression provides good results (well-behaved residuals)
Eviews provides the values of the coefficients in the
DF equation and the p-values for the δ coefficient,
which tell us about the stationarity of the time series
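Outside Eviews, the same test is available in Python's statsmodels; a sketch on a simulated stand-in for the GDP series (the series itself is an assumption, not the slides' data):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
gdp = 1000 + np.cumsum(rng.normal(loc=1.0, scale=5.0, size=88))   # stand-in GDP-like series

# ADF test with a constant ('c'); use regression='ct' for constant + deterministic trend
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(gdp, regression="c", autolag="AIC")
print(stat, pvalue)          # a large p-value -> cannot reject a unit root -> nonstationary

# Test the first difference; a small p-value suggests the differenced series is stationary
stat_d, pvalue_d, *_ = adfuller(np.diff(gdp), regression="c", autolag="AIC")
print(stat_d, pvalue_d)
```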
Transforming nonstationary time series
In order to avoid spurious regressions we need to
make sure that the variables used in the regression
are stationary
We use the ADF and check the unit root
If the series have a unit root then we take the first difference (ex. GDP_t - GDP_t-1) and use this series in the regression
If the first-difference series has a unit root too, then we can use the second difference (ex. [GDP_t - GDP_t-1] - [GDP_t-1 - GDP_t-2]) and so on
Eviews allows us to test up to the second difference
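A minimal pandas sketch of these transformations (the gdp values are hypothetical):

```python
import pandas as pd

gdp = pd.Series([100.0, 102.5, 104.1, 107.8, 109.0, 112.3])  # hypothetical levels

d1 = gdp.diff()         # first difference:  GDP_t - GDP_t-1
d2 = gdp.diff().diff()  # second difference: (GDP_t - GDP_t-1) - (GDP_t-1 - GDP_t-2)
# Each differencing step drops one observation (NaN at the start); re-test the
# differenced series with the ADF before using it in a regression.
```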
Trend-stationary process
This is a process that is stationary around the trend line but nonstationary overall
Hence, the simplest way to make such a series stationary is to regress the series on time; the residuals will be stationary
Run the regression
Y_t = β_0 + β_1 t + u_t
where t is simply the series 1, 2, 3, …, T
The residual series will be stationary. It is called the detrended time series. For forecasting purposes we simply put the trend back into our forecast
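A sketch of the detrending regression in Python (the series y is a simulated placeholder; the fitted trend is what gets added back for forecasting):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 88
t = np.arange(1, T + 1)                            # the time index 1, 2, 3, ..., T
y = 50 + 2.0 * t + rng.normal(scale=5, size=T)     # toy trend-stationary series

beta1, beta0 = np.polyfit(t, y, deg=1)             # OLS of y on a linear trend
trend = beta0 + beta1 * t
detrended = y - trend                              # stationary residual series

# Forecast: model `detrended` (e.g. with an ARMA), then add the trend back;
# the trend value at T+1 is beta0 + beta1 * (T + 1).
```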
Taking seasonality out
There may be time series that show
seasonality and because of that they are
nonstationary
Looking at the Earnings per share reported
every quarter by Johnson and Johnson
The ACF shows strong serial correlations
Taking seasonality out
After taking the first difference we have a time series with the
following ACF:
[Slide figure: ACF of the first-differenced series]
We observe that the ACF is strong when the lag is a multiple of the periodicity, 4
We could take the seasonality out by simply taking another difference of the data: the value at t minus the value at t-4
Taking seasonality out
In our case we observe the seasonality for the first-difference time series D(Y_t) = Y_t - Y_t-1
To take the seasonality out we take the 4th (lag-4) difference of the series D(Y_t):
D(Y_t) - D(Y_t-4) = Y_t - Y_t-1 - Y_t-4 + Y_t-5
This is called seasonal differencing
The seasonality may be the cause of the non-
stationarity so, after taking seasonality out, the
series may become stationary
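In pandas, this seasonal differencing is a single call (a sketch; the eps values are hypothetical quarterly earnings per share):

```python
import pandas as pd

eps = pd.Series([0.71, 0.63, 0.85, 0.44, 0.61, 0.69, 0.92, 0.55])  # hypothetical quarterly EPS

d1 = eps.diff()         # D(Y_t) = Y_t - Y_t-1
seasonal = d1.diff(4)   # D(Y_t) - D(Y_t-4) = Y_t - Y_t-1 - Y_t-4 + Y_t-5
# `seasonal` is the seasonally differenced series; re-check its ACF to see
# whether the spikes at multiples of 4 have disappeared.
```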

Approaches to economic forecasting
1. Exponential smoothing methods
2. Single-equation regression models
3. Simultaneous-equation regression models
4. Autoregressive integrated moving average
models (ARIMA)
5. Vector autoregression
Stationary time series
If a time series is stationary, then we can
model it in many ways
We will use ARIMA or Box-Jenkins
methodology to analyze a stationary time
series
Autoregressive processes (AR)
Moving Average processes (MA)
Autoregressive and Moving Average processes
(ARMA)
Integrated Autoregressive and Moving Average
processes (ARIMA)
AR
If Y_t is a stationary time series (already tested with the ADF test) then it can look like:
Y_t - δ = α_1 (Y_t-1 - δ) + u_t
where δ is the mean of Y and u_t is a white noise
This is a first-order autoregressive process, or AR(1)
This model says that the forecast value of Y at time t is simply some proportion (= α_1) of its value at time t-1, plus a random shock or disturbance at time t
The Y values are expressed as deviations around their mean value
AR(p)
We can also have this stationary process
Y_t - δ = α_1 (Y_t-1 - δ) + α_2 (Y_t-2 - δ) + u_t
which is a second-order autoregressive process, or AR(2)
In general we can have AR(p) processes
Y_t - δ = α_1 (Y_t-1 - δ) + α_2 (Y_t-2 - δ) + … + α_p (Y_t-p - δ) + u_t
p is higher when more lagged values have
something to say about the future values
more values from the past have influence on
the future values
Forecasting with AR
The one-step-ahead forecast for an AR(1):
Y_forecast1 = α_1 Y_present + u_forecast1
Hence, the forecast error comes from u_t
We saw that u_t is a white noise, so the error is associated with its variance σ² (how much it moves around the mean)
The 2-step-ahead forecast:
Y_forecast2 = α_1 Y_forecast1 + u_forecast2
Y_forecast2 = α_1 (α_1 Y_present + u_forecast1) + u_forecast2
For the 2-step-ahead forecast, the error comes from α_1 u_1 + u_2. If we compute the variance of this we will see that it is (1 + α_1²) σ², which is slightly higher than the variance of the one-step-ahead forecast, but it does not increase linearly
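The point forecasts and the forecast-error variance can be traced step by step (a sketch for an AR(1) with assumed values of α_1 and σ²; neither number comes from the GDP example):

```python
# Multi-step forecasts from an AR(1): Y_t = alpha1 * Y_t-1 + u_t (demeaned series)
alpha1, sigma2 = 0.7, 1.0    # assumed AR coefficient and shock variance
y_present = 2.5              # last observed (demeaned) value

forecast, error_var = y_present, 0.0
for step in range(1, 5):
    forecast = alpha1 * forecast                   # best guess sets the future shock to 0
    error_var = sigma2 + alpha1 ** 2 * error_var   # 1-step: sigma2; 2-step: (1 + alpha1^2) * sigma2; ...
    print(step, round(forecast, 3), round(error_var, 3))
# The error variance rises with the horizon but converges to sigma2 / (1 - alpha1^2),
# so it does not grow linearly.
```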
MA
Another mechanism that may generate values for Y could be
Y_t = μ + β_1 u_t-1 + u_t
where μ is a constant and u is simply a white noise
Here Y at time t is equal to a constant plus a moving average of the current and past error terms
Thus Y follows a first-order moving average, or an MA(1)
Under the same logic as for AR(p) we may have a q-th-order moving average MA(q):
Y_t = μ + β_1 u_t-1 + β_2 u_t-2 + β_3 u_t-3 + … + β_q u_t-q + u_t
ARMA
An Autoregressive and Moving Average process ARMA(1,1) looks like this:
Y_t = θ + α_1 Y_t-1 + β_1 u_t-1 + u_t

We can also have an ARMA(p,q) process,
where p is the number of autoregressive
terms and q is the number of moving average
terms
ARIMA
I comes from integrated: a series is said to be integrated of order 1 (2, …, d) if the series is nonstationary but its first (second, …, d-th) difference is stationary
So, we use the ADF to see if the series is stationary and, if not, then we take the first difference
If the first difference is stationary then we apply the ARMA model to model the series
ARIMA(p,d,q) means a series that is stationary at its d-th difference, with the stationary series modeled using p AR terms and q MA terms
Box-Jenkins methodology
Step 1: Identification
Step 2: Estimation
Step 3: Diagnostic Checking
Step 4: Forecasting

Step 1 - Identification
We use the autocorrelation function (ACF) and
the partial autocorrelation function (PACF)
The PACF shows the significance of the last term in each of the following successive regressions:
Y_t - δ = α_1 (Y_t-1 - δ) + u_t
Y_t - δ = α_1 (Y_t-1 - δ) + α_2 (Y_t-2 - δ) + u_t
…
Y_t - δ = α_1 (Y_t-1 - δ) + α_2 (Y_t-2 - δ) + … + α_p (Y_t-p - δ) + u_t
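A sketch of this idea in Python: the lag-k PACF is obtained as the last coefficient of an OLS regression of the (demeaned) series on its first k lags:

```python
import numpy as np

def pacf_by_regression(y, max_lag):
    """Lag-k partial autocorrelation = coefficient of Y_t-k in an OLS
    regression of Y_t on Y_t-1, ..., Y_t-k (series taken in deviations)."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    out = []
    for k in range(1, max_lag + 1):
        # columns are the lag-1, lag-2, ..., lag-k values aligned with y[k:]
        X = np.column_stack([y[k - j - 1: len(y) - j - 1] for j in range(k)])
        coefs, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        out.append(coefs[-1])          # last coefficient = PACF at lag k
    return out

print(np.round(pacf_by_regression(np.random.default_rng(3).normal(size=200), 5), 2))
```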
General guidelines for patterns of ARMA
processes
ACF and PACF for GDP series
The ACF declines very slowly; the ACFs up to 23 lags are statistically significantly different from 0
After the first lag the PACF drops dramatically and all PACF lags are statistically insignificant
Much different is the correlogram for the first difference of the GDP:
The ACFs at lags 1, 8 and 12 seem statistically significant
Same for the PACF
How do we choose the correct ARMA pattern
for the GDP time series?
How do correlograms look for the AR,
MA, ARMA?
We will look at AR(1), AR(2), MA(1), MA(2),
ARMA(1,1), ARMA(2,2) and so on
Each of these stochastic processes exhibits
typical patterns of ACF and PACF
If the time series of the first difference of the
GDP fits one of these patterns we can
identify the time series with that process
Of course we will have to apply diagnostic
tests to see if the ARMA model is reasonably
accurate
Identifying the GDP lag order
Autocorrelations decline up to lag 4
Except for lags 8 and 12, the rest of them are not statistically significantly different from 0
The PACFs are also significant at the same lags
We will choose an AR(12) for the first difference of GDP, but we do not need to put in all the terms up to lag 12; we only use the terms 1, 8 and 12:
Y_t = c + α_1 Y_t-1 + α_8 Y_t-8 + α_12 Y_t-12 + u_t

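Outside Eviews, a comparable subset-AR fit can be sketched with statsmodels' AutoReg, which accepts a list of specific lags (dgdp is a placeholder for the first-differenced GDP, so the coefficients will not match the Eviews output that follows):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Placeholder for the first difference of GDP (in practice: dgdp = gdp.diff().dropna())
rng = np.random.default_rng(0)
dgdp = rng.normal(loc=21.5, scale=36.6, size=88)

# AR model with only the lags 1, 8 and 12 (plus a constant)
res = AutoReg(dgdp, lags=[1, 8, 12], trend="c").fit()
print(res.params)      # constant and the three AR coefficients
print(res.summary())   # standard errors, t-statistics and p-values, as in Eviews
```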
Step 2: Estimation
Run the ARMA we identified - Results from Eviews
Variable Coefficient Std. Error t-Statistic Prob.
C 23.08936 2.980356 7.747181 0.0000
AR(1) 0.342768 0.098794 3.469531 0.0009
AR(8) -0.299466 0.101599 -2.947523 0.0043
AR(12) -0.264371 0.098582 -2.681742 0.0091
R-squared 0.293124 Mean dependent var 21.52933
Adjusted R-squared 0.263256 S.D. dependent var 36.55936
S.E. of regression 31.38030 Akaike info criterion 9.782096
Sum squared resid 69915.33 Schwarz criterion 9.905695
Log likelihood -362.8286 F-statistic 9.813965
Durbin-Watson stat 1.766317 Prob(F-statistic) 0.000017
Step 3: Diagnostic Checking
One simple diagnostic is to obtain the
residuals from the estimation
Compute the ACF and PACF of the residuals
See if the residuals have any significant
autocorrelation
If not, it means that the ARMA process that
we used took all the autocorrelations out, so
we are ready for forecasts
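A sketch of this residual check in Python, assuming res is a fitted model such as the AutoReg result sketched earlier; the Ljung-Box statistic jointly tests that the residual autocorrelations up to a given lag are zero:

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

resid = res.resid                        # residuals of the estimated AR/ARMA model
lb = acorr_ljungbox(resid, lags=[4, 8, 12])
print(lb)                                # Q statistics and p-values per lag
# p-values above 0.05 at all lags: no significant residual autocorrelation left,
# so the model has captured the dynamics and we can move on to forecasting.
```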
Step 4: Forecasting
We used data from 1970-I to 1991-IV
We want to forecast the GDP for the first four
quarters of 1992
We analyzed the differences in the GDP
levels, so, in order to find the levels of GDP in
the 4 coming quarters we will undo the first
difference. The first forecast will be
Y_1992-I - Y_1991-IV = c + α_1 (Y_1991-IV - Y_1991-III) + α_8 (Y_1990-I - Y_1989-IV) + α_12 (Y_1989-I - Y_1988-IV) + u_1992-I

Rewrite the model in levels and fill in the numbers (for the point forecast, u_1992-I is set to its expected value of 0)
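Once the model delivers forecasts of the differences, the levels are recovered by cumulating them onto the last observed level (a sketch with hypothetical numbers, not the actual GDP estimates):

```python
import numpy as np

last_level = 4868.0                                    # hypothetical Y_1991-IV (GDP level)
diff_forecasts = np.array([24.1, 18.7, 22.4, 20.9])    # hypothetical forecasts of Y_t - Y_t-1

level_forecasts = last_level + np.cumsum(diff_forecasts)
print(level_forecasts)   # forecast levels for 1992-I ... 1992-IV
```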
Step by step forecasting:
1. Obtain the data many time series data
2. Check stationarity for all the series that we
have
3. Check stationarity of the first and second
difference in case the series of levels is not
stationary
4. Transform the data to obtain stationary time
series
5. Run regressions (if the analysis requires) using
the stationary data
Step by step forecasting
6. Do forecasts using the regressions for your analysis
7. If we do not have data for the explanatory variables, or the regression provided insignificant coefficients, or we have many time series to forecast, then we model the stationary time series obtained by transformation:
1. Identify the ARMA process at hand
2. Estimate the ARMA process
3. Diagnostic checking
4. Forecasting

Example:
Our job is to forecast sales for the following 4
quarters
Values of sales may depend a lot on
personal disposable income
personal consumption expenditure
competitors' profits
What are the forecasts?
Steps:
Check stationarity of all the series in the
analysis:
Eviews:
Type: series sales, series gdp, series income, series
comp
Open each time series and paste data from Excel
Click on View and choose Unit Root Test
Check Level (we look at the stationarity of the series itself)
and check Trend and Intercept (to test for all the possible
problems)
If Prob* is higher than 0.05, then the series is non-stationary (this happens most of the time)
If Prob* is lower than 0.05, then the series is stationary, and we can use it as it is (for regressions or time series forecasting)
Steps:
Transform the non-stationary process(es) into stationary time series
Check for the stationarity of the first-difference time series:
Use Unit Root Test, check 1st Difference
If Prob* is higher than 0.05, then we check for the 2nd Difference too
If Prob* is lower than 0.05, then the first difference is stationary, so we can use it for further analysis
If the 1st difference is not stationary, we could also look at the correlogram of the first difference
Use View, Correlogram, 1st Difference and see if there are some possible seasonality patterns in the data (have a look at the fourth lag, as we have quarterly data)
Take the seasonality out by doing the 4th difference of the series of first differences
Steps:
With the stationary data we try to find an ARMA model to fit the data
Build the correlogram (View, Correlogram, 1st difference or 2nd difference or simply the de-seasonalized data)
Find the ACFs and PACFs that are significant
Pick the ARMA model you want
Estimate the ARMA model
Click Quick (main menu), Estimate Equation, then write the name of the series you do the ARMA for (gdp, income, comp or D(gdp), D(income), D(comp) for the first difference)
Steps:
Choose other ARMA specifications to see if you can find a better adjusted R²
Do the diagnostic checking for the ARMA you choose
In the equation window click on View, Residual Tests, Correlogram - Q statistics
See if the correlogram has significant ACFs or PACFs
If the residuals are significant at some lags then we should choose another ARMA (redo the last analysis)
Steps:
If all the series are stationary, then do the
regression
Quick, Estimate Equation, write the regression:
(ex. sales c gdp income comp
or sales = c(1) + c(2)*gdp + c(3)*income + c(4)*comp)
Check for the significance of the parameters
Save the equation: click Name (same window)
Do the forecasts for all the explanatory variables
using their estimated ARMA coefficients 4 quarters
into the future
Use the forecasts of the explanatory variables in the
estimated regression in order to build forecasts for
the sales in the regression
Steps:
We could also use the ARMA procedure for
the sales itself, and come up with time series
forecasts
Compute the averages of the forecasts from
the regression and the forecasts from time
series to come up with a more calibrated
forecast for the sales
Our example:
Sales does not have a unit root, hence Sales is stationary
Still, Sales seems to have seasonality (at lag 4)
After de-seasonalizing the data, we found an AR with lags 3 and 5 to be significant
The 1st difference of PDI is stationary, with no significant autocorrelation; it looks like a white noise
The 1st difference of the PCE is stationary; we tried to fit an AR with lags 2 and 3
The 1st difference of the Profits is stationary; we found significance at lags 1 and 5
Regression and forecasts
We run the regression of the deseasonalized Sales (being stationary) on the first differences of PCE, PDI and Profits
We found no significance! The regression will not be able to help us with the forecast
Thus, in order to do the forecasts we only use the properties of the time series of sales itself
One alternative would be to use Holt's model or Winters' model
The better alternative is to use the estimated AR model with lags 3 and 5 (no MA terms) to do the forecast
Your forecast for sales
The forecast for the first quarter of 1992 will be computed from:

Sales_1992-I - Sales_1991-I = -28530.49 - 0.444 (Sales_1991-II - Sales_1990-II) - 0.286 (Sales_1990-IV - Sales_1989-IV)

The following forecasts will be computed in the same manner
If we want a forecast that combines several models, then we could simply take the average of the forecasts computed using the other models (Holt's or Winters') and the forecast coming from this ARMA estimation model
Econometrics and Risk
Management
Value-at-Risk
Monte Carlo simulations
Historical Simulations
Using time series analysis in risk management
Risk management is a huge industry
We need a way to measure the risk: how likely it is for a company to lose money due to its exposure to some economic variable
Ex.: exposure to changes in the currency rate
In order to measure the risk we need to put probabilities on all the possible events (what could possibly happen); this means knowing the distribution of the economic variable
To do this we need to look into the past, find a history of the random variable and extract the distribution
Then see how much we could lose in, say, 5% of the cases
Understanding VaR
There is an X% probability of suffering a loss of $Y or more over the next Z hours/days.
Standard Normal Distribution
A normally distributed return R is standardized as z = (R - μ_R) / σ_R, and conversely R = μ_R + z σ_R
Lower-tail probabilities of z are read from standard normal tables
[Slide figure: past evolution vs. future evolution of the return; the normal random variable R is mapped into a STANDARD normal random variable; probability density with the lower-tail probabilities marked]
Confidence Level and Confidence Factor
[Slide figure with illustrative values for the return R: 0.11% and 1.76%]
Example
Value-at-Risk (VaR)
There is a 1% probability of suffering a $100,000 loss or more over the next 24 hours.
The bad returns are in the 1% left-tail probability
Assumption: the distribution of the returns (here, a normal) is known
Risk management
For risk management purposes the
distribution of the future value of the
economic variable to which we are exposed
is very important
Knowledge of this distribution provides us
with a measurement of the risk we are
running
The indicator that shows us the value of the risk is called VaR: value at risk
VaR is the value that we could lose (or more) if the 5% worst events happen
VaR computation
The computation of VaR sometimes needs a more complex technique: the simulation technique
If the distribution of the random variable is not normal, then we may not have a formula for VaR, but we may approximate the changes of the economic variable with a process
To compute the VaR we could simply build simulations of the process, i.e. a lot of possible tracks of the values of the variable
Each track is assumed to have the same probability of showing up (of happening) as any other track
We will look at the values produced by the simulation and compute the VaR
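A minimal Monte Carlo VaR sketch in Python; every parameter here (normal daily returns, a $1,000,000 position, 10,000 tracks, a 5% tail) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(7)
position_value = 1_000_000           # current value of the exposure (assumed)
mu, sigma = 0.0005, 0.012            # assumed daily mean and volatility of the return
n_sims, confidence = 10_000, 0.95

simulated_returns = rng.normal(mu, sigma, size=n_sims)   # one-step tracks, equal probability each
simulated_pnl = position_value * simulated_returns

# VaR = loss threshold exceeded only in the worst (1 - confidence) share of tracks
var = -np.percentile(simulated_pnl, 100 * (1 - confidence))
print(f"1-day 95% VaR: ${var:,.0f}")
```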
