Introduction to Financial Econometrics

Sebastiano Manzan

© 2015 Sebastiano Manzan

Contents

1. Getting Started with R
   1.1 Working with data in R
   1.2 Plotting the data
   1.3 From prices to returns
   1.4 Distribution of the data
   1.5 Creating functions in R
   1.6 Loops in R

2. Linear Regression Model
   2.1 LRM with one independent variable
   2.2 Functional forms
   2.3 The role of outliers
   2.4 LRM with multiple independent variables
   2.5 Omitted variable bias

3. Time Series Models
   3.1 The Auto-Correlation Function (ACF)
   3.2 The Auto-Regressive (AR) Model
   3.3 Model selection for AR models
   3.4 Forecasting with AR models
   3.5 Seasonality
   3.6 Trends in time series
   3.7 Deterministic and Stochastic Trend
   3.8 Why non-stationarity is a problem for OLS?
   3.9 Testing for non-stationarity
   3.10 What to do when a time series is non-stationary

4. Volatility Models
   4.1 Moving Average (MA) and Exponential Moving Average (EMA)
   4.2 Auto-Regressive Conditional Heteroskedasticity (ARCH) models

5. Measuring Financial Risk
   5.1 Value-at-Risk (VaR)
   5.2 Historical Simulation (HS)
   5.3 Simulation Methods
   5.4 VaR for portfolios
   5.5 Backtesting VaR

1. Getting Started with R


The aim of the first chapter is to get you started with the basic steps of using R to analyze data, such as obtaining economic or financial data, loading the data into R, and conducting a preliminary analysis by estimating the mean and the standard deviation and plotting a histogram. We consider several ways of loading data in R, either from a text file or by downloading the data directly from the internet. R provides an extensive set of functions to perform basic as well as sophisticated analyses. One of the most important functions that you will use in R is help(): by typing the name of a function within the brackets you obtain detailed information about the function's arguments and its expected output. With hundreds of functions used in a typical R session, each having several arguments, even the most experienced R programmers ask for help().

1.1 Working with data in R


The first task is to load some data to analyze in R and there are two ways to do this. One is to upload a file that is stored on your computer as a text file (e.g., comma separated values or csv) using the read.csv() command. The file sp500_1970.csv is the monthly time series of the S&P 500 Index from January 1970 until December 2013 and can be loaded in R with the command sp500 = read.csv('sp500_1970.csv'). The sp500 object is an R data frame which does not have any time series connotation. However, since the data file represents a time series, it is convenient to define sp500 as a time series object, which makes some tasks easier (e.g., plotting). I will use the zoo package to define sp500 as a time series and we need to provide the function with the starting and/or ending date in addition to the frequency of the data. However, we first need to load the package in the R environment, which can be done with the command require(zoo). We can now redefine the object from data frame to zoo as follows: sp500 = zooreg(sp500, start=c(1970,1), frequency=12), and sp500 is now a zoo object that starts in January 1970 with frequency 12 since there are 12 monthly observations in a year.
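Putting these two steps together, a minimal sketch (assuming the file sp500_1970.csv is in the working directory):

require(zoo)
sp500 = read.csv('sp500_1970.csv')                    # plain data frame, no time series attributes
sp500 = zooreg(sp500, start=c(1970,1), frequency=12)  # monthly zoo series starting in January 1970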
If we do not have the data stored on our computer, we can instead download the data from data providers, such as Yahoo Finance for financial data and the Federal Reserve Economic Database (FRED) for US and international macroeconomic data. There are several packages that provide functions to do this:
- package tseries has the function get.hist.quote() to download data from Yahoo
- package quantmod has the function getSymbols() to download data from Yahoo and some other providers
- package fImport has the functions yahooSeries() and fredSeries() to download from Yahoo and FRED
First, you need to load the package with the require() command (in case the package is not installed on your machine, you need to use the command install.packages()). For example, to download data for the S&P 500 Index (ticker: ^GSPC) at the monthly frequency using the get.hist.quote() function you would type the following command:


require(tseries)
sp500 <- get.hist.quote("^GSPC", start="1970-01-01", end="2015-08-31", quote="AdjClose",
                        provider="yahoo", compression="m", quiet=TRUE)

where sp500 is defined as a zoo object and the arguments of the function are (see help(get.hist.quote) for
more details):
1. the ticker of the stock or index that you want to download
2. start and end dates in the format year-mm-dd
3. quote can be Open, High, Low, Close; if left unspecified the function downloads all of them
4. compression can be d for daily, w for weekly, or m for monthly

An alternative to using get.hist.quote() is the getSymbols() function from the quantmod package, which has the advantage of allowing you to download data for several symbols simultaneously. For example, if we are interested in obtaining data for Apple Inc. (ticker: AAPL) and the S&P 500 Index from January 1990 to August 2015 we could run the following lines of code:
require(quantmod)
getSymbols(c("AAPL","^GSPC"), src="yahoo", from='1990-01-02',to='2015-08-31')

[1] "AAPL" "GSPC"

Notice that the getSymbols() function does not require you to specify the frequency of the data, but it uses the daily frequency by default. To aggregate to lower frequencies, the package provides the functions to.weekly() and to.monthly() that convert the series (from daily) to weekly or monthly. Notice also that the getSymbols() function creates as many objects as tickers, and each contains the open, high, low, close, adjusted close price and the volume, which are zoo objects. The quantmod package provides functions to extract information from these objects, such as Ad(AAPL), which returns the adjusted closing price of the AAPL object (similarly, the functions Op(x), Hi(x), Lo(x), Cl(x), Vo(x) extract the open, high, low, close price and volume). The package also has functions such as LoHi(x), which creates a series with the difference between the highest and lowest intra-day price. Another useful function is ClCl(x), which calculates the daily percentage change of the closing price in day t relative to the closing price in day t-1.
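As a brief sketch of how these extractor functions might be used on the AAPL object created above:

aapl.adj   <- Ad(AAPL)          # extract the adjusted closing price
aapl.cc    <- ClCl(AAPL)        # daily percentage change of the closing price
aapl.month <- to.monthly(AAPL)  # convert the daily OHLC object to the monthly frequency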
The third possibility is to use the fImport package, which has the function yahooSeries() that can download several tickers for a specified period of time and frequency. An example is provided below:
require(fImport)
data <- yahooSeries(c("AAPL", "^GSPC"), from="2003-01-01",
                    to="2015-08-31", frequency="monthly")

In addition, the package fImport also provides the function fredSeries(), which allows you to download data from FRED, a database that includes thousands of macroeconomic and financial series for the U.S. and other countries. Similarly to the ticker of a stock, you need to know the FRED symbol of the variable that you are interested in downloading, which can be found on the FRED webpage. For example, UNRATE is the civilian unemployment rate for the US, CPIAUCSL is the Consumer Price Index (CPI) for all urban consumers, and GDPC1 is the real Gross Domestic Product (GDP) in billions of chained 2009 dollars. We can download the three macroeconomic variables together as follows:


require(fImport)
options(download.file.method="libcurl")
macrodata = fredSeries(c('UNRATE','CPIAUCSL','GDPC1'), from="1950-01-02", to="2015-08-31")

1.2 Plotting the data


Once we have loaded a dataset in R we might want to find the length of the time series or the number of variables, or simply look at the data. If the object contains one time series and we want to find its length, we use the command length(sp500), which returns 548, the number of monthly observations in the object. In the case of the macrodata object created above, the length is equal to 2358 since it is composed of a data frame with 3 columns and 786 rows. This can be easily verified by enquiring about the dimension of the object with the following command:
dim(macrodata)

[1] 786   3

Next, we might want to look at the data using the head() and tail() commands, which show the first and last six observations:
head(sp500)

           AdjClose
1970-01-02     85.0
1970-02-02     89.5
1970-03-02     89.6
1970-04-01     81.5
1970-05-01     76.6
1970-06-01     72.7

tail(macrodata)


GMT
           UNRATE CPIAUCSL GDPC1
2015-02-01    5.5      235    NA
2015-03-01    5.5      236    NA
2015-04-01    5.4      236 16324
2015-05-01    5.5      237    NA
2015-06-01    5.3      238    NA
2015-07-01    5.3      238    NA

Since these objects are defined as time series, the function also prints the date of each observation. Notice that in the macro dataset we have many NAs that represent missing values for the GDP variable. This is due to the fact that the unemployment rate and the CPI Index are available monthly, whilst real GDP is only available at the quarterly frequency. The object can be converted to the quarterly frequency using the to.quarterly() function in the quantmod package as follows: to.quarterly(as.zooreg(macrodata), OHLC=FALSE).
Plotting the data is a useful way to learn about the behavior of a series over time and this can be done very
easily in R using the function plot(). For example, a graph of the S&P 500 Index from 1970 to 2013 can
be easily generated with the command plot(sp500) which produces the following graph. Since we defined
sp500 as a time series object (more specifically, as a zoo object), the x-axis represents time without any further
intervention by the user.

For many economic and financial variables that display exponential growth over time, it is often convenient to
plot the log of the variable rather than its level. This has the additional advantage that differences between the
values at two points in time represent an approximate percentage change of the variable in that period of time.
This can be achieved by plotting the natural logarithm of the variable with the command plot(log(sp500),
xlab="Time",ylab="S&P 500 Index"):


Notice that we added two more arguments to the plot() function to personalize the labels on the axis, instead
of the default values (compare the two graphs).

1.3 From prices to returns


Financial analysis is typically interested in the return rather than the price of an asset. Let's denote the price of the S&P 500 in month t by $P_t$. There are two types of returns that we are interested in calculating:
1. Simple return: $R_t = (P_t - P_{t-1})/P_{t-1}$
2. Logarithmic (or continuously compounded) return: $r_t = \log(P_t) - \log(P_{t-1})$
Creating the returns from a price series in R is simple, thanks to the fact that we defined the data as time series objects (e.g., zoo), which have functions such as lag() that provide lead/lag values of a time series (i.e., $P_{t-1}$). We can thus create the simple and logarithmic returns of the S&P 500 Index with the following commands:
simpleR = 100 * (sp500 - lag(sp500, -1)) / lag(sp500, -1)
logR    = 100 * (log(sp500) - lag(log(sp500), -1))

where we multiply by 100 to obtain percentage returns. An alternative way to calculate these returns is by using the diff() function, which takes the first difference of the time series, that is, $\Delta P_t = P_t - P_{t-1}$. Using the diff() function we would calculate returns as follows:
simpleR = 100 * diff(sp500) / lag(sp500, -1)
logR    = 100 * diff(log(sp500))

A first step in data analysis is to calculate descriptive statistics that summarize the main features of the
distribution of the data, such as the average/median returns and their dispersion. One way to do this is by
using the summary() function which provides the output shown below:


summary(simpleR)

      Index                AdjClose
 Min.   :1970-02-02   Min.   :-21.76
 1st Qu.:1981-06-16   1st Qu.: -1.85
 Median :1992-11-02   Median :  0.94
 Mean   :1992-10-31   Mean   :  0.68
 3rd Qu.:2004-03-16   3rd Qu.:  3.60
 Max.   :2015-08-03   Max.   : 16.30

summary(logR)

      Index                AdjClose
 Min.   :1970-02-02   Min.   :-24.54
 1st Qu.:1981-06-16   1st Qu.: -1.87
 Median :1992-11-02   Median :  0.93
 Mean   :1992-10-31   Mean   :  0.58
 3rd Qu.:2004-03-16   3rd Qu.:  3.53
 Max.   :2015-08-03   Max.   : 15.10

You may notice that the mean, median, and 1st and 3rd quartiles (25% and 75%) are quite close for the two types of return, but the minimum (and the maximum) are quite different: for the simple return the maximum drop is -21.763% while for the logarithmic return the maximum drop is -24.543%. The reason for this is that the logarithmic return is an approximation to the simple return that works well when returns are small but becomes increasingly unreliable for large (positive or negative) returns.
An advantage of using logarithmic returns is that they simplify the calculation of multiperiod returns. This is due to the fact that the (continuously compounded) return over k periods is given by $r_t^k = \log(P_t) - \log(P_{t-k})$, which can be expressed as the sum of one-period logarithmic returns, that is

$$r_t^k = \log(P_t) - \log(P_{t-k}) = \sum_{j=1}^{k} r_{t-j+1}$$

Instead, for simple returns the multi-period return would be calculated as $R_t^k = \prod_{j=1}^{k}(1 + R_{t-j+1}) - 1$. One reason to prefer logarithmic to simple returns is that it is easier to derive the properties of the sum of random variables, rather than their product. The disadvantage of using the continuously compounded return is that when calculating the return of a portfolio the weighted average of the log returns of the individual assets is only an approximation of the log portfolio return. However, at the daily and monthly horizons returns are very small and thus the approximation error is relatively minor.
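As a quick numerical check of this aggregation property, a minimal sketch using the sp500 and logR objects defined above (rollapplyr() is a zoo helper not otherwise used in this chapter):

# 12-month log return computed directly from prices ...
r12.direct <- 100 * (log(sp500) - lag(log(sp500), -12))
# ... and as the sum of twelve one-month log returns
r12.sum    <- rollapplyr(logR, 12, sum)
# the two series coincide (up to rounding)
head(na.omit(merge(r12.direct, r12.sum)))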
Descriptive statistics can also be obtained with individual commands that calculate the mean(), sd() (standard deviation), median(), and empirical quantiles (quantile(x, tau) with tau a value between 0 and 1). The package fBasics provides additional functions such as skewness() and kurtosis(), which are particularly relevant in the analysis of financial data. This package also has a function basicStats() that provides a table of descriptive statistics as follows (see help(basicStats) for details):


require(fBasics)
basicStats(logR)

            AdjClose
nobs         547.000
NAs            0.000
Minimum      -24.543
Maximum       15.104
1. Quartile   -1.872
3. Quartile    3.534
Mean           0.576
Median         0.932
Sum          315.183
SE Mean        0.190
LCL Mean       0.204
UCL Mean       0.948
Variance      19.644
Stdev          4.432
Skewness      -0.714
Kurtosis       2.628

In addition, in the presence of several assets we might be interested in calculating the covariance and correlation among them. Let's define Ret as an object with two columns representing the return of Apple in the first column and the return of the S&P 500 Index in the second column. We can use the functions cov() and cor() to estimate the covariance and correlation as follows:
cov(Ret, use='complete.obs')

aapl sp500
aapl 193.8 24.1
sp500 24.1 18.5

where the elements on the diagonal are the variances of the Apple and S&P 500 returns and the off-diagonal element is the covariance between the two series (the off-diagonal elements are the same because the covariance between X and Y is the same as the covariance between Y and X). The correlation matrix is calculated as:
cor(Ret, use='complete.obs')

aapl sp500
aapl 1.000 0.402
sp500 0.402 1.000

where the diagonal elements are equal to 1 because they represent the correlation of X with X, and the off-diagonal element is the correlation between the two returns.
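For reference, the Ret object used above could be built along these lines (a sketch, assuming the AAPL and GSPC objects downloaded earlier with getSymbols()):

require(quantmod)
aapl.ret  <- 100 * monthlyReturn(Ad(AAPL))   # monthly percentage returns of Apple
sp500.ret <- 100 * monthlyReturn(Ad(GSPC))   # monthly percentage returns of the S&P 500
Ret       <- merge(aapl.ret, sp500.ret)      # two-column object with both return series
colnames(Ret) <- c("aapl", "sp500")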
We can now plot the monthly simple and logarithmic return as follows:

Getting Started with R

plot(simpleR)

plot(logR, col=2, xlab="Time", ylab="S&P 500 Return")


abline(h=0, col=4, lty=2, lwd=2)

where the abline() command produces a horizontal line at 0 with a certain color (col=4 is blue), type of line
(lty), and width (lwd).

1.4 Distribution of the data


Another useful tool to analyze data is to look at frequency plots, which can be interpreted as estimators of the probability density function. Histograms are a popular way to visualize the sample frequency with which a variable takes values in a certain bin/interval. The function hist() serves this purpose and can be used as follows:
hist(logR, breaks=50, main="S&P 500")

This is a basic histogram in which we set the number of bins to 50 and the title of the graph to "S&P 500". We can also add a nonparametric estimate of the density that smooths out the roughness of the histogram and makes the density estimate continuous (N.B.: the prob=TRUE option puts probabilities instead of frequencies on the y-axis):
hist(logR, breaks=50,main="S&P 500",xlab="Return",ylab="",prob=TRUE)
lines(density(logR,na.rm=TRUE),col=2,lwd=2)
box()


1.5 Creating functions in R


So far we have discussed functions that are already available in R, but the advantage of using a programming language is that you are able to write functions that are specifically tailored to the analysis you are planning to conduct. We will illustrate this with a simple example. Earlier, we calculated the average monthly return of the S&P 500 Index using the command mean(logR), which is equal to 0.576. We can write a function that calculates the mean and compare the result with that of the function mean(). Since the sample mean is obtained as $\bar{R} = \sum_{t=1}^{T} R_t / T$, we can write a function that takes a time series as input and gives the sample mean as output. We can call this function mymean and the syntax for defining a function is as follows:
mymean <- function(Y)
{
Ybar <- sum(Y) / length(Y)
return(Ybar)
}
mymean(logR)

## [1] 0.576

Not surprisingly, the result is the same as the one obtained using the mean() function. More generally, a function can take several arguments, but it returns only one object, which could be a list of items. The function we defined above is quite simple and has several limitations: 1) it does not take into account that the series might have NAs, and 2) it does not calculate the mean of each column in case there are several. As an exercise, modify the mymean function to accommodate these issues.
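One possible way to address both issues (a sketch, not the only solution to the exercise):

mymean2 <- function(Y)
{
  Y <- as.matrix(Y)   # treat a single series as a one-column matrix
  # NA-robust mean of each column: sum of non-missing values divided by their number
  apply(Y, 2, function(y) sum(y, na.rm=TRUE) / sum(!is.na(y)))
}
mymean2(logR)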

1.6 Loops in R
A loop consists of a set of commands that we are interested to repeat a pre-specified number of times and to
store the results for further analysis. There are several types of loops, with the for loop probably the most
popular. The syntax in R to implement a for loop is as follows:
for (i in 1:N)
{
## write your commands here
}

where i is an index and N is the number of times the loop is repeated. As an example, we can write a function that contains a loop to calculate the sum of a variable and compare the result to the sum() function provided in R. This function could be written as follows:


mysum <- function(Y)
{
  N    = length(Y)                 # define N as the number of elements of Y
  sumY = 0                         # initialize the variable that will store the sum of Y
  for (i in 1:N)
  {
    sumY = sumY + as.numeric(Y[i]) # current sum is equal to previous sum
  }                                # plus the i-th value of Y
  return(sumY)                     # as.numeric(): makes sure to transform
}                                  # from other classes to a number
c(sum(logR), mysum(logR))

## [1] 315 315

Notice that to define the mysum() function we only use the basic + operator and the for loop. This is just a simple illustration of how the for loop can be used to produce functions that perform a certain operation on the data. Let's consider another example of the use of the for loop that demonstrates the validity of the Central Limit Theorem (CLT). We are going to do this by simulation, which means that we simulate data, calculate some statistic of interest, and repeat these operations a large number of times. In particular, we want to demonstrate that, no matter how the data are distributed, the sample mean is normally distributed with mean equal to the population mean and variance given by $\sigma^2/N$, where $\sigma^2$ is the population variance of the data and N is the sample size. We assume that the population distribution is $N(0, 4)$ and we want to repeat a large number of times the following operations:
1. Generate a sample of length N
2. Calculate the sample mean
3. Repeat 1-2 S times
Every statistical package provides functions to simulate data from a certain distribution. The function rnorm(N, mu, sigma) simulates N observations from the normal distribution with mean mu and standard deviation sigma, whilst rt(N, df, ncp) generates a sample of length N from the t distribution with df degrees of freedom and non-centrality parameter ncp. The code to perform this simulation is as follows:
S     = 1000                   # set the number of simulations
N     = 1000                   # set the length of the sample
mu    = 0                      # population mean
sigma = 2                      # population standard deviation

Ybar  = vector('numeric', S)   # create an empty vector of S elements
                               # to store the sample mean of each simulation
for (i in 1:S)
{
  Y       = rnorm(N, mu, sigma)   # generate a sample of length N
  Ybar[i] = mean(Y)               # store the sample mean
}
c(mean(Ybar), sd(Ybar))

[1] -0.000337  0.062712

The object Ybar contains 1000 elements, each representing the sample mean of a random sample of length 1000 drawn from a certain distribution. We expect these values to be distributed as a normal distribution with mean equal to 0 (the population mean) and standard deviation $\sigma/\sqrt{N} = 2/31.623 = 0.063$. We can assess this by plotting the histogram of Ybar and overlaying the asymptotic distribution of the sample mean. The graph below shows that the two distributions are very close to each other. This is confirmed by the fact that the mean of Ybar and its standard deviation are both very close to their expected values. To evaluate the normality of the distribution, we can estimate the skewness and kurtosis of Ybar, which we expect to be close to zero to indicate normality. These values are -0.042 and -0.199, which can be considered close enough to zero to conclude that Ybar is normally distributed.
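These moments can be checked directly, for example with the fBasics functions introduced earlier (a minimal sketch):

require(fBasics)
c(skewness(Ybar), kurtosis(Ybar))   # both should be close to zero if Ybar is approximately normal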

What would happen if we generated samples from a t instead of a normal distribution? For a small number of degrees of freedom the t distribution has fatter tails than the normal, but the CLT is still valid and we should expect results similar to the previous ones. We can run the same code as above, but replace the line Y = rnorm(N, mu, sigma) with Y = rt(N, df) with df=4. The plot of the histogram and normal distribution (with $\sigma^2 = df/(df-2)$) below shows that the empirical distribution of Ybar closely tracks the asymptotic distribution of the sample mean.
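A sketch of the modified simulation loop (only the sampling line changes relative to the code above):

df <- 4
for (i in 1:S)
{
  Y       <- rt(N, df)   # draw the sample from a t distribution with 4 degrees of freedom
  Ybar[i] <- mean(Y)
}
c(sd(Ybar), sqrt(df/(df-2)) / sqrt(N))   # simulated vs. asymptotic standard deviation of the sample mean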


2. Linear Regression Model


In this Chapter we review the Linear Regression Model (LRM) in the context of some relevant financial applications and discuss the R functions that are used to estimate the model. A more detailed discussion of the LRM and the methods to conduct inference can be found in the introductory textbooks of @sw and @wool.

2.1 LRM with one independent variable


The LRM assumes that $Y_t = \beta_0 + \beta_1 X_t + \epsilon_t$, where $Y_t$ is called the dependent variable, $X_t$ is the independent variable, $\beta_0$ and $\beta_1$ are parameters, and $\epsilon_t$ is the error term. Typically, we use subscript t when we have time series data and subscript i for cross-sectional data (e.g., different individuals, firms, countries). The aim of the LRM is to explain the variation over time/units of the dependent variable Y based on the variation of the independent variable X: high values of Y are explained by high or low values of X, depending on the parameter $\beta_1$. We estimate the LRM by Ordinary Least Squares (OLS), which is a method that chooses the parameter estimates that minimize the sum of squared residuals. For the LRM we have analytical formulas for the OLS coefficient estimates. The estimate of the slope coefficient is given by $\hat{\beta}_1 = \hat{\sigma}_{X_t,Y_t}/\hat{\sigma}^2_{X_t} = \hat{\rho}_{X_t,Y_t}\,\hat{\sigma}_{Y_t}/\hat{\sigma}_{X_t}$, where $\hat{\sigma}^2_{X_t}$ and $\hat{\sigma}^2_{Y_t}$ represent the sample variances of the two variables, and $\hat{\sigma}_{X_t,Y_t}$ and $\hat{\rho}_{X_t,Y_t}$ are the sample covariance and the correlation of $X_t$ and $Y_t$, respectively. The intercept is given by $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, where $\bar{X}$ and $\bar{Y}$ represent the sample means of $X_t$ and $Y_t$. Let's assume that aaplret represents the monthly excess return of Apple, which we consider our dependent variable, and that sp500ret is the monthly excess return of the S&P 500. To calculate the estimate of the slope coefficient $\hat{\beta}_1$ we first need to estimate the covariance of the stock and index returns, and the variance of the index return. The R commands to estimate these quantities were introduced in the previous Chapter, so that $\hat{\beta}_1$ can be calculated as follows:

cov(aaplret, sp500ret) / var(sp500ret)

[1] 1.301034

The interpretation of the coefficient estimate is that if the S&P 500 changes by 1% then we expect the Apple stock price to change by 1.301%. An alternative way to calculate $\hat{\beta}_1$ is to use the formula with the correlation coefficient and the ratio of the standard deviations of the two returns:
cor(aaplret, sp500ret) * sd(aaplret) / sd(sp500ret)

[1] 1.301034

Once the slope parameter is estimated (and stored, say, in an object called beta1), we can then estimate the intercept as follows:

mean(aaplret) - beta1 * mean(sp500ret)

[1] 0.7095597

Naturally, R has functions that estimate the LRM automatically and provide a wealth of information. The
function is called lm() for linear model and below is the way it is used to estimate the LRM:
fit <- lm(aaplret ~ sp500ret)
fit

Call:
lm(formula = aaplret ~ sp500ret)

Coefficients:
(Intercept)     sp500ret
     0.7096       1.3010

The fit object provides the coefficient estimates and the function that has been used; a richer set of statistics
is provided by the function summary(fit) as shown below:
Call:
lm(formula = aaplret ~ sp500ret)

Residuals:
    Min      1Q  Median      3Q     Max
-79.660  -5.796   0.777   7.725  32.215

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.7096     0.7598   0.934    0.351
sp500ret      1.3010     0.1753   7.420 1.34e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.77 on 286 degrees of freedom
Multiple R-squared:  0.1614,    Adjusted R-squared:  0.1585
F-statistic: 55.06 on 1 and 286 DF,  p-value: 1.34e-12

The table provided by the summary() function reports the standard errors of the coefficient estimates, the t-statistics and p-values for the null hypothesis that each coefficient is equal to zero (and stars to indicate significance at 1, 5, and 10%), the $R^2$ and its adjusted version, and the F test statistic for the overall significance of the regression. Interpreting the magnitude of the $R^2$ is always related to what can be expected in the specific context being analyzed. In this case, an $R^2$ of about 16% is not a very high number and there is a lot of variability of Apple returns that is not explained by a linear relationship with the market. This becomes evident when we plot the time series of the residuals:


plot(fit$residuals)
abline(h=0, col=2, lty=2)

It is clear from this graph that the residuals are large in absolute magnitude (see the scale of the y-axis) and that there are some large negative residuals that are probably due to company-specific news hitting the market. We can also evaluate the normality of the residuals graphically using a Quantile-Quantile (QQ) plot, that is, a scatter plot of the empirical quantiles of the time series against the quantiles of a normal distribution with mean and variance estimated from the time series.
qqnorm(fit$residuals)          # Quantile-Quantile function
qqline(fit$residuals, col=2)   # diagonal line

The continuous line in the graph represents the diagonal, and we expect the QQ points to be along or close to the line if the residuals are normally distributed. Instead, in this case we observe large deviations from the diagonal on the left tail and some smaller deviations on the right tail of the residuals distribution. The dot on the
bottom-left corner has coordinates approximately equal to -3% and -80%: this means that there is only 0.1%
probability that a normal distribution with the estimated mean and variance takes values smaller than -3%. On
the other hand, we observe that for Apple that same quantile is -80%, which indicates that the residuals of the
model have fatter tails relative to the normal distribution. This is because the monthly returns of Apple have
some extreme events that are not explained by the exposure of the stock to market risk ($\beta_1 X_t$). This produces
large unexplained returns that might have occurred because of the release of firm-specific news (rather than
market-wide) which depressed the stock price in that specific month. In a later Section we discuss the effect
of outliers on OLS coefficient estimates.

2.2 Functional forms


In the example above we consider the case of a linear relationship between the independent variable X and
the dependent variable Y. However, there are situations in which the relationship between X and Y might not
be well-explained by a linear model. This can happen when the effect on Y of changes in X depends on the
level of X. In this Section, we discuss two functional forms that are relevant in some financial applications.
One functional form that can be used to account for nonlinearities in the relationship between X and Y is the quadratic model, which simply consists of adding the square of the independent variable as an additional regressor. The Quadratic Regression Model is given by $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_t^2 + \epsilon_t$, which, relative to the linear model, adds some curvature to the relationship through the quadratic term. The model can still be estimated by OLS and the expected effect of a one unit increase in X is now given by $\beta_1 + 2\beta_2 X_t$. In this case, the effect on Y of changes in X is thus a function of the level of the independent variable X, while in the linear model the effect is $\beta_1$ whatever the value of X.
Another simple variation of the LRM which introduces nonlinearities consists of assuming that the slope coefficient of X is different below and above a certain threshold value of X. For example, if $X_t$ is the market return we might want to evaluate whether there is a different (linear) relationship between the stock and the market return when the market return is, e.g., below the mean/median as opposed to above the mean/median. We can define two new variables as follows: $X_t^{UP} = X_t \cdot I(X_t \geq m)$ and $X_t^{DOWN} = X_t \cdot I(X_t < m)$, where $I(A)$ is the indicator function which takes value 1 if the event A is true and zero otherwise, and m is the threshold value that defines the variables above and below.
An interesting application of nonlinear functional forms is to model hedge fund returns. As opposed to mutual funds, which mostly buy and sell stocks and make limited use of financial derivatives and leverage, hedge funds make extensive use of these instruments to hedge downside risk and boost their returns. This produces a nonlinear response to changes in market returns which might not be well captured by a linear model. As a simple example of why this could be the case, assume that a hedge fund holds a portfolio composed of one call option on a certain stock. The performance/payoff is clearly nonlinearly related to the price of the underlying asset and will be better approximated by either of the functional forms discussed above. We can empirically investigate this issue using the Credit Suisse Hedge Fund Indexes, which provide the overall and strategy-specific performance of a portfolio of hedge funds. Indexes are available for the following strategies: convertible arbitrage, dedicated short bias, equity market neutral, event driven, global macro and long-short equity, among others. Since individual hedge fund returns data are proprietary, we will use these Indexes, which can be interpreted as a sort of fund-of-funds of the overall universe or of specific strategies of hedge funds. The file hedgefunds.csv contains the returns of 14 Credit Suisse Hedge Fund Indexes (the HF index and 13
strategy indexes) from January 1993 until December 2013 at the monthly frequency. We start the analysis by considering the HF Index, which is the first column of the file. Below we show a scatter plot of the HF Index return against the S&P 500. There seems to be a positive correlation between these indexes, although the most striking feature of the plot is the difference in scale between the x- and y-axis: the HF returns range within roughly ±6% while the equity index ranges between -20% and 12%. The standard deviation of the HF index is 2.111% compared to 4.444% for the S&P 500 Index, which shows that hedge funds, in general, provide a hedge against large movements in markets (the S&P 500 in this case).
plot(sp500, hfindex, main="Credit Suisse Hedge Fund Index", cex.main=0.75)
abline(v=0, col=2)
abline(h=0, col=2)

Before introducing nonlinearities, we estimate a linear model in which the HF index return is explained by the S&P 500 return. The results below show the existence of a statistically significant relationship between the two returns, with a 0.274 exposure of the HF return to the market return (if the market return changes by 1% then we expect the fund return to change by 0.274%). The $R^2$ of the regression is 0.331, which is not very high and might indicate that a nonlinear model could be more successful in explaining the time variation of the hedge fund returns. We can add the fitted linear relationship 0.564 + 0.274 * sp500 to the previous scatter plot to get a graphical understanding of the LRM.
fitlin <- lm(hfindex ~ sp500)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.564      0.113   5.008        0
sp500          0.274      0.025  10.864        0


Let's now consider the quadratic model in which we add the square of the market return:
sp500sq <- sp500^2
fitquad <- lm(hfindex ~ sp500 + sp500sq)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.715      0.133   5.357    0.000
sp500          0.254      0.027   9.536    0.000
sp500sq       -0.007      0.003  -2.072    0.039

If the coefficient of the squared term is statistically significant then we conclude that a nonlinear functional form is a better model for the relationship between market and fund returns. In this case, we find that the p-value for the null hypothesis that the coefficient of the squared market return is equal to zero is 0.039 (or 3.9%), which is smaller than 0.10 (or 10%), and we thus find evidence of nonlinearity in the relationship between the HF and the market return. In the plot below we see that the contribution of the quadratic term (dashed line) is more apparent at the extremes, while for small returns it overlaps with the fitted values of the linear model. In terms of goodness-of-fit, the adjusted $R^2$ of the quadratic regression is 0.338 compared to 0.329 for the linear model, a modest but significant increase that makes the quadratic model preferable.
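To interpret the quadratic fit, recall that the estimated effect of the market return on the HF return is $\hat{\beta}_1 + 2\hat{\beta}_2 X_t$ and thus varies with the level of the market return; a quick sketch using the fitquad estimates above:

b <- coef(fitquad)                              # (Intercept), sp500, sp500sq
b["sp500"] + 2 * b["sp500sq"] * c(-10, 0, 10)   # marginal effect at market returns of -10%, 0% and +10%

With the estimates above this gives exposures of roughly 0.39, 0.25, and 0.11, so the HF index appears more sensitive to large negative market moves than to large positive ones.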


The second type of nonlinearity we discussed earlier is to assume the relationship is linear but with different
slopes below and above a certain threshold. In the example below we consider as threshold the median value of
the independent variable, which in this case is the market return. We then create two variables that represent
the market return below the median (sp500down) and above the median (sp500up). These two variables can
then enter the lm() command and the OLS estimation results are reported below.
m         <- median(sp500)
sp500up   <- sp500 * (sp500 >= m)
sp500down <- sp500 * (sp500 < m)
fitupdown <- lm(hfindex ~ sp500up + sp500down)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.718      0.171   4.209        0
sp500up        0.222      0.050   4.486        0
sp500down      0.313      0.042   7.544        0

The slope coefficient (or market exposure) above the median is 0.222 and below the median it is 0.313, which suggests that the HF Index is more sensitive to downward movements of the market than to upward movements. We conclude that there is evidence of a difference in the exposure of HF returns to positive/negative market conditions, although it is not very strong. The adjusted $R^2$ of 0.330 is slightly higher than that of the linear model but lower relative to the quadratic model. The graph below shows the fitted regression line for this model.


2.3 The role of outliers


One of the indexes provided by Credit Suisse is for the equity market neutral strategy. The aim of this strategy
is to provide positive expected return, with low volatility and no correlation with the equity market. Below
is a scatter plot of the market return (proxied by the S&P 500) and the HF equity market neutral return.

The problem with this graph is that the scale of the y-axis ranges between -40% and 10% while all of the HF returns, except for one month, fall within a much smaller range of roughly ±10%. The extreme observation that skews the graph corresponds to a month in which the market index lost 7.5% and the equity market neutral index lost over 40%. To find out when the extreme observation occurred, we can use the command which(hfneutral < -30), which indicates that it is the 179th observation and corresponds to November 2008. What happened in November 2008 to create a loss of 40% in an aggregate index of market neutral hedge funds? In a press release, Credit Suisse explained that they marked down to zero the assets of
the Kingate Global Fund, a hedge fund based in the British Virgin Islands that acted as a feeder for the Madoff funds and had all of its assets wiped out. Since the circumstances were so exceptional and unlikely to happen again to such a large extent, it is probably warranted to simply drop that observation from the sample when estimating the model parameters.
The most important reason for excluding extreme observations from the sample is that they bias the coefficient estimates away from their true values. We can use the equity market neutral returns as an example to evaluate the effect of one extreme event on the parameters. Below, we first estimate a LRM of the fund returns on the market return and then the same regression, but dropping observation 179 from the sample.
fit0 <- lm(hfneutral ~ sp500)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.352      0.179   1.971     0.05
sp500          0.199      0.040   4.971     0.00

fit1 <- lm(hfneutral[-179] ~ sp500[-179])

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.560      0.063   8.818        0
sp500[-179]    0.128      0.014   8.971        0

These results indicate that by dropping the November 2008 return the estimate of $\beta_0$ increases from 0.352 to 0.56 while the market exposure of the fund declines from 0.199 to 0.128, thus indicating a smaller exposure to market returns. However, even after removing the outlier the exposure is statistically significant at 1%, which suggests that the aggregate index is not market neutral. In terms of goodness-of-fit, the $R^2$ increases from 0.094 to 0.253 as a result of dropping the large error experienced in November 2008.


2.4 LRM with multiple independent variables


In practice, we might be interested in including more than just one variable to explain the dependent variable Y. If we have K independent variables that we want to include in the regression, we can denote the observations of each variable by $X_{k,t}$ for $k = 1, \ldots, K$. The linear regression with multiple regressors is given by:
$$Y_t = \beta_0 + \beta_1 X_{1,t} + \cdots + \beta_K X_{K,t} + \epsilon_t$$
Also in this case we can use OLS to estimate the parameters $\beta_k$ (for $k = 1, \ldots, K$), chosen as the set of parameters that minimizes the sum of the squared residuals. Care should be given to the correlation among
the independent variables to avoid cases of extremely high dependence. The extreme case of correlation equal to 1 between two independent variables is called perfect collinearity, and the model cannot be estimated
because it is impossible to separately identify the coefficients of the two variables when the two variables are
actually the same (apart from scaling). A similar situation arises when an independent variable has correlation
1 with a linear combination of some other independent variables. The solution in this case is to drop one of
the variables as discussed earlier. In practice, it is more likely to happen that the regressors have very high
correlation although not equal to 1. In this case the model can be estimated but the coefficient estimates are
highly variable and thus become unreliable. For example, equity markets move together in response to news
that affect the economies worldwide. So, there are significant co-movements among these markets and thus
high correlation which can become problematic in some situations. Care should be taken to test the robustness
of the results to the inclusion of variables with high correlation.
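As a small illustration of perfect collinearity, a sketch with simulated data (unrelated to the financial application that follows):

set.seed(10)
X1 <- rnorm(100)
X2 <- 2 * X1                     # X2 is an exact linear function of X1
Y  <- 1 + 0.5 * X1 + rnorm(100)
coef(lm(Y ~ X1 + X2))            # lm() cannot identify both slopes and reports NA for X2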
In R the LRM with multiple regressors is estimated using the lm() command discussed before. As an example,
we extend the earlier 1-factor model to a 3-factor model in which we use 3 independent variables or factors to
explain the variation over time of the returns of a mutual fund. The factors are called the Fama-French factors
after the two economists that first proposed these factors to model risk in financial returns. In addition to the
market return, they construct two additional risk factors:
- Small minus Big (SMB), which is defined as the difference in the return of a portfolio of small capitalization stocks and a portfolio of large capitalization stocks. The aim of this factor is to capture the premium from investing in small cap stocks.
- High minus Low (HML), which is obtained as the difference between a portfolio of high Book-to-Market (B-M) ratio stocks and a portfolio of low B-M ratio stocks. The high B-M ratio stocks are also called value stocks while the low B-M ratio ones are referred to as growth stocks. The factor provides a proxy for the premium from investing in value relative to growth stocks.
These factors are freely available on Ken French's webpage. The file F-F_Research_Data_Factors.txt contains the market return, SMB, HML, and the risk-free rate (RF). To upload the data we can use the following command:
FF <- read.table('F-F_Research_Data_Factors.txt')
head(FF, 3)

        Mkt.RF   SMB   HML   RF
192607    2.95 -2.50 -2.67 0.22
192608    2.63 -1.20  4.50 0.25
192609    0.38 -1.33 -0.30 0.23

tail(FF, 3)

        Mkt.RF   SMB   HML RF
201310    4.17 -1.53  1.39  0
201311    3.12  1.31 -0.38  0
201312    2.81 -0.44 -0.17  0

The dataset starts in July 1926 and ends in December 2013, and each column represents one of the factors discussed above (plus the risk-free rate), with the returns expressed in percentage. The file is imported as a data.frame and it is useful to give it a time series characterization, as we did in the previous Chapter, using the command FF <- zooreg(FF, start=c(1926,7), end=c(2013,12), frequency=12), which defines the dataset as a zoo object.


Let's look at some descriptive statistics of these factors, in particular the mean and standard deviation. Since we have a matrix with 4 columns (market return, SMB, HML, and the risk-free rate), using the mean() function would calculate just one number that represents the average of all columns. Instead, we want to apply the mean() function to each column, and there is an easy way to do this using the function apply(). This function applies a function (e.g., mean or sd) to all the rows or columns of a matrix (a second argument equal to 1 applies it to rows and 2 to columns). In our case:
apply(FF, 2, mean)

   Mkt.RF       SMB       HML        RF
0.6487429 0.2342095 0.3938190 0.2871429

The results show that the average monthly market return from 1926 to 2013 has been 0.649% (approx. 7.785% yearly) in excess of the risk-free rate. The monthly average of SMB is 0.234% (approx. 2.808% yearly), which measures the monthly extra return from investing in a portfolio of small caps relative to investing in large capitalization stocks. The average of HML provides the extra return from investing in value stocks relative to growth stocks, which annualized corresponds to about 4.728%.
apply(FF, 2, sd)

   Mkt.RF       SMB       HML        RF
5.4137611 3.2317907 3.5117457 0.2539519

The standard deviation of the monthly market return is 5.414%, which can be annualized by multiplying by $\sqrt{12}$ (giving roughly 18.8% per year). The SMB and HML factors also show significant volatility, since their monthly standard deviations are 3.232 and 3.512, respectively. Another quantity that we need to calculate to better understand the behavior of these risk factors is their correlation. We can calculate the correlation matrix using the cor() function that was introduced earlier:
cor(FF)

            Mkt.RF         SMB        HML          RF
Mkt.RF  1.00000000  0.33429226 0.21574891 -0.06622179
SMB     0.33429226  1.00000000 0.11992280 -0.05650984
HML     0.21574891  0.11992280 1.00000000  0.01528333
RF     -0.06622179 -0.05650984 0.01528333  1.00000000

where we find that both SMB and HML are weakly correlated with the market return (0.334 and 0.216, respectively) and also with each other (0.12). In this sense, it seems that the factors capture relatively uncorrelated sources of risk, which is valuable from a diversification standpoint.
Let's consider an application to mutual fund returns and their relationship to the FF factors. We download from Yahoo Finance data for the DFA Small Cap Value mutual fund (ticker: DFSVX). The fund invests in
companies with small capitalization that are considered undervalued according to a valuation ratio (e.g., the Book-to-Market ratio) and holds the investment until these ratios are considered fair. In the long run, this strategy outperforms the market, although it comes with the risk that it might significantly underperform over shorter evaluation periods. We estimate a three-factor model with the monthly excess return of DFSVX as the dependent variable and the regressors represented by MKT, SMB, and HML. The three-factor model can be estimated using the lm() command. However, when dealing with time series the package dyn provides a wrapper for the lm() function which synchronizes the variables in case they span different time periods, and also allows the use of time series commands (like diff() and lag()) in the equation definition. Notice that in the regression below we are regressing ex_dfsvx on the first three columns of FF (FF[,1:3]) with the dependent variable starting in March 1993 but the independent variables starting in 1926. In this case dyn$lm() will automatically adjust the series such that they all span the same time period:
fit <- dyn$lm(ex_dfsvx ~ FF[,1:3])  # regress the excess fund returns on the 3 F-F factors
summary(fit)

Call:
lm(formula = dyn(ex_dfsvx ~ FF[, 1:3]))

Residuals:
    Min      1Q  Median      3Q     Max
-4.3860 -0.6704 -0.0041  0.7024  4.6372

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.001708   0.072859   0.023    0.981
FF[, 1:3]1  1.057903   0.016828  62.865   <2e-16 ***
FF[, 1:3]2  0.806633   0.023013  35.051   <2e-16 ***
FF[, 1:3]3  0.673636   0.024075  27.981   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.129 on 245 degrees of freedom
  (801 observations deleted due to missingness)
Multiple R-squared:  0.9618,    Adjusted R-squared:  0.9613
F-statistic:  2056 on 3 and 245 DF,  p-value: < 2.2e-16

As is clear from the regression results, the fund has exposure not only to the market factor, but also to SMB and HML with positive coefficients. These results indicate that the fund overweights value and small stocks (relative to growth and large) and thus benefits, in the long run, from the premium from investing in those stocks. We expected this result since the fund clearly states its strategy to buy undervalued small cap stocks. However, keep in mind that we could have ignored the investment strategy followed by the fund, and the regression results would still have indicated that the manager overweights small cap value stocks in the portfolio. In this model, positive coefficients for SMB and HML indicate that the fund manager overweights (relative to the market) small caps and/or value stocks, while a negative value suggests that the manager gears the investments toward large caps and/or growth stocks. The intercept estimate is typically referred to in the finance literature as alpha and is interpreted as the risk-adjusted return, with the risk being measured by $\beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t$. The example above shows how to use the LRM to analyze the investing style of a mutual fund or an investment strategy. In conjunction with nonlinear functional forms, this model can also be useful to investigate the style and the risk factors of hedge fund returns. In this case the role of nonlinearities is essential given the time variation in the exposures ($\beta$s) of this type of funds, as well as the extensive use of derivatives, which implies nonlinear exposures to the risk factors.

2.5 Omitted variable bias


An important assumption that is introduced when deriving the properties of the OLS estimator is that the set
of regressors included in the model represents all those that are relevant to explain the dependent variable.
However, in practice it is difficult to make sure that this assumption is satisfied. In some situations we might
not observe a variable that we believe relevant to explain the dependent variable, while in others we might
not know which variables are actually relevant. What is the effect of omitting relevant variables on the OLS
coefficient estimates? The answer depends on the correlation between the omitted variable and the included
variable(s). If the omitted and included variables are correlated, then the estimate of the slope coefficient of
the included variable will be biased, that is, it will be significantly different from its true value. However, if
we omit a relevant variable that is not correlated with any of the included variables, then we do not expect any
bias in the coefficient estimate. To illustrate this concept, we will discuss first an example based on real data
and then show the estimation bias in a simulation exercise.
Assume that we are given a mutual fund return series to analyze, with the aim of understanding the risk factors underlying the performance of the fund. Since we are not told the investment strategy of the fund, we could start the empirical analysis by assuming that the relevant variables to include in the regression are the Fama-French factors. The 3-factor model is thus given by $R_t^{fund} = \beta_0 + \beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t + \epsilon_t$ and its estimation by OLS provides the following results:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.320      0.298   1.073    0.285
FF[, 1:3]1     1.077      0.065  16.678    0.000
FF[, 1:3]2     0.395      0.089   4.459    0.000
FF[, 1:3]3     0.110      0.092   1.196    0.233

The fund has an exposure of 1.077 to the US equity market, which is highly significant. In addition, there is a statistically significant exposure to SMB, with a coefficient of 0.395 and a t-statistic of 4.459, but not to HML, which has a t-statistic of 1.196. In addition, the $R^2$ of the regression is equal to 0.65, which indicates a reasonable fit for this type of regression. Based on these results, we would conclude that the fund invests in
US equity with a focus on small cap stocks. However, it turns out that the fund is the Oppenheimer Developing
Markets (ticker: ODMAX) which is a fund that invests exclusively in stocks from emerging markets and does
not hold any US stock. The results above appear as inconsistent with the declared investment strategy of the
fund: how is it possible that the exposures to MKT and SMB are large and significant in the regression above when the fund does not hold any US stock? It is possible that, despite these factors not being directly
relevant to explain the performance of the fund, they indirectly proxy for the effect of an omitted risk factor
that is correlated with MKT and SMB. Given the investment objective of the fund, we could consider including
as an additional risk factor the MSCI Emerging Markets (EM) Index which seems a more appropriate choice
of benchmark for this fund. In terms of correlation between the EM returns and the FF factors, the results
below indicate that there is a strong positive correlation with the US-equity market, as demonstrated by a
correlation of 0.699, and by much lower correlations with SMB (positive) and HML (negative).

   EM Mkt.RF   SMB    HML     RF
1.000  0.699 0.290 -0.184 -0.001

It thus seems reasonable to include the EM Index returns as an additional risk factor to explain the performance of ODMAX. The table below shows the estimation results of a regression of ODMAX excess monthly returns on 4 factors: the EM factor in addition to the FF factors.
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.639      0.137   4.651    0.000
EM             0.803      0.030  27.201    0.000
FF[, 1:3]1     0.127      0.046   2.764    0.006
FF[, 1:3]2     0.218      0.041   5.294    0.000
FF[, 1:3]3     0.103      0.042   2.430    0.016

Not surprisingly, the estimated exposure to EM is 0.803 and highly significant, whilst the exposures to the FF factors decline substantially. In particular, adding EM to the regression has the effect of reducing the coefficient of MKT from 1.077 to 0.127. This large change (relative to its standard error) in the coefficient can be attributed to the effect of omitting a relevant variable (i.e., EM), which produces bias in the coefficient estimates of the FF factors. The estimate of 1.077 from the first regression is biased because it does not represent (only) the effect of MKT on ODMAX, but also acts as a proxy for the omitted source of risk represented by the EM Index, given the large and positive correlation between MKT and EM. The effect of omitted variables and the resulting bias in the coefficient estimates is not only an econometric issue, but also has important practical implications. If we use the LRM for performance attribution, that is, disentangling the systematic component of the fund return (beta) from the risk-adjusted part (alpha), then omitting some relevant risk factors produces bias in the remaining coefficients and thus changes our conclusions about the contribution of each component to the performance of the fund.
To further illustrate the effect of omitted variables in producing biased coefficient estimates, we can perform
a simulation study of the problem. The steps of the simulation are as follows:
1. We will assume that the dependent variable Y_t (for t = 1, …, T) is generated by the following model:
Y_t = 0.5·X_{1,t} + 0.5·X_{2,t} + ε_t, where X_{1,t} and X_{2,t} are simulated from the (multivariate) normal
distribution with mean 0 and variance 1 for both variables, with their correlation set equal to ρ. The
error term ε_t is also normally distributed with mean 0 and standard deviation 0.5. In the context of this
simulation exercise, the model above for Y_t represents the true model, for which we know the population
values of the parameters (i.e., β_0 = 0 and β_1 = β_2 = 0.5).
2. We then estimate by OLS the following model: Y_t = β_0 + β_1 X_{1,t} + ε_t, where we intentionally omit
X_{2,t} from the regression. Notice that X_{2,t} is both a relevant variable to explain Y_t (since β_2 = 0.5) and
correlated with X_{1,t} if we set ρ ≠ 0.
3. We repeat steps 1-2 S times and store the estimate of β_1.
We can then analyze the properties of β̂_1, the estimate of β_1, by, for example, plotting a histogram of the
S values obtained in the simulation. If omitting X_{2,t} does not introduce bias in the estimate of β_1, then we
would expect the histogram to be centered at the true value of the parameter, 0.5. Instead, the histogram will
be shifted away from the true value of the parameter if the omission introduces estimation bias. The code
below starts by setting the values of the parameters, such as the number of simulations, the length of the time
series, and the parameters of the distributions. Then the for loop iterates S times over steps 1 and 2 described
above, while the bottom part of the program plots the histogram.


require(MASS) # this package is needed for the function `mvrnorm()` to simulate from the
              # multivariate normal distribution

S     <- 1000                         # set the number of simulations
T     <- 300                          # set the number of periods
mu    <- c(0,0)                       # mean of variables X1 and X2
cor   <- 0.7                          # correlation coefficient between X1 & X2
Sigma <- matrix(c(1,cor,cor,1), 2, 2) # covariance matrix of X = [X1, X2]
beta  <- c(0.5, 0.5)                  # slope coefficients of X = [X1, X2]
eps   <- rnorm(T, 0, 0.5)             # errors

betahat <- matrix(NA, S, 1)           # vector to store the estimates of beta1

for (i in 1:S)                        # loop starts here
{
  X   <- mvrnorm(T, mu, Sigma)        # simulate the indep. variables X = [X1, X2];
                                      # `mvrnorm` from package MASS simulates from the
                                      # multivariate normal distribution (dim: Tx2)
  Y   <- beta[1]*X[,1] + beta[2]*X[,2] + eps  # simulate the dep. variable Y (dim: Tx1)
  fit <- lm(Y ~ X[,1])                # fit the linear model of Y on X1 but not X2
  betahat[i] <- coef(fit)[2]          # store the estimated coefficient of X1
}                                     # loop ends here

# set the limits of the x-axis
xmin <- min(min(betahat), 0.45)
xmax <- max(betahat)
# plot the histogram of betahat together with a vertical line at the true value of beta
hist(betahat, breaks=50, xlim=c(xmin, xmax), freq=FALSE, main="", xlab="", ylab="")
box()
abline(v=beta[1], col=2, lwd=4)
text(beta[1], 1, "TRUE VALUE", pos=4, font=2)

In the simulation exercise we set the correlation between X_{1,t} and X_{2,t} equal to 0.7 (line cor <- 0.7) and


it is clear from the histogram above that the distribution of the OLS estimate is shifted away from the true
value of 0.5. This illustrates quite well the problem of omitted variable bias: we expect the estimates of β_1
to be close to the true value of 0.5, but we find that these estimates range from 0.75 to 0.95. The bias that
arises from omitting a relevant variable does not disappear by using longer samples, since it depends on the fact
that we omitted a relevant variable which is highly correlated with an included variable. If the omitted variable
were relevant but uncorrelated with the included variable, then the histogram of the OLS estimates would look
like the following plot, which is produced with the earlier code by setting cor <- 0.

3. Time Series Models


In this chapter we introduce time series models that are very often used in economics and finance. The LRM
in the previous Chapter aims at relating the variation (over time or in the cross-section) of a variable Y with
another variable X. In many cases we cannot make a statement about causality between X and Y, but rather
about correlation, which could go either way. The approach is different in time series models: we assume
that the independent variable X is represented by past values of the dependent variable Y. We denote by
Y_t the value of a variable in time period t (e.g., minutes, hours, days, weeks, months, quarters, years). When
we write Y_{t-1} we refer to the value of the variable in the previous time period relative to today (i.e., t)
and, more generally, Y_{t-k} (for k ≥ 1) indicates the value of the variable k periods before t. k is referred to
as the lag and Y_{t-k} as the lagged value. The aim of time series models can be stated as follows: without
considering information about any other variable, is there information in past values of a variable to predict
the variable today? In addition to providing insights on the properties of the time series, these models are
useful in forecasting since they use the past to model the present and the present to forecast the future.

3.1 The Auto-Correlation Function (ACF)


A first exploratory tool that is used in time series analysis is the Auto-Correlation Function (ACF). The ACF
represents the correlation between the variable Y_t and the lagged value Y_{t-k}. It is estimated from a data
sample using the usual formula of the correlation, that is, dividing the sample covariance between Y_t and
Y_{t-k} by the sample variance of Y_t. The R function to calculate the ACF is acf() and in the example below
we apply it to the monthly returns of the S&P 500 Index from February 1990 until December 2013:
require(tseries)
spm <- get.hist.quote("^GSPC", start="1990-01-01", end="2014-01-01", quote="AdjClose",
                      compression="m", retclass="zoo", quiet=TRUE)
spm.ret <- 100 * diff(log(spm))
# spm.ret is defined as a zoo time series object but not as
# regularly spaced data (monthly); the acf() function
# applied to irregularly spaced objects gives an error, unless
# you add as argument 'na.action=na.exclude'
acf(spm.ret, lag.max=12, plot=TRUE)

Error in na.fail.default(as.ts(x)): missing values in object

acf(spm.ret, lag.max=12, plot=TRUE, na.action=na.exclude)


# An alternative is to define spm.ret to be a zooreg ('reg' for regular) object;
# in the rest of the chapter I will use spm.ret defined as a zooreg
spm.ret <- zooreg(spm.ret, start=c(1990,1), frequency=12)
acf(spm.ret, lag.max=12, plot=TRUE, main="")

The option plot=TRUE implies that the function provides a graph where the horizontal axis represents the
lag k (starting at 0 and expressed as a fraction of a year) and the vertical axis represents the auto-correlation,
which is a value between -1 and 1. The horizontal dashed lines represent the 95% confidence interval, equal
to ±1.96/√T, for the null hypothesis that the population auto-correlation at lag k is equal to 0. If the
auto-correlation at lag k is within the interval we conclude that we do not reject the null that the correlation
coefficient for that lag is equal to zero (at the 5% level). If the option plot=FALSE then the function prints the
estimates of the auto-correlation up to lag.max (equal to 12 in this example):
Autocorrelations of series 'spm.ret', by lag
0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500
1.000 0.071 -0.002 0.071 0.048 0.022 -0.073 0.037 0.058 -0.006
0.8333 0.9167 1.0000
0.011 0.028 0.078

This analysis considers monthly returns for the S&P 500 Index and shows that there is very small and
statistically insignificant serial correlation across months. We can use the ACF to investigate if there is
evidence of serial correlation at the daily frequency for the same sample period. The average daily percentage
return in this sample period is 0.027% and the daily standard deviation is 1.156% (the total number of days is 6049).
The ACF plot up to 25 lags is reported below:
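The daily series and its ACF could be obtained along the following lines (a sketch; this code is not shown in the text, and spd.ret is the object name used for the daily returns later in the chapter):

spd <- get.hist.quote("^GSPC", start="1990-01-01", end="2014-01-01", quote="AdjClose",
                      compression="d", retclass="zoo", quiet=TRUE)
spd.ret <- 100 * diff(log(spd))                  # daily percentage returns
acf(coredata(spd.ret), lag.max=25, main="")      # coredata() drops the irregular daily index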


Since the sample size (T) is quite large at the daily frequency, these bands are tight around zero. However, we
find that some of these auto-correlations are statistically significant (lags 1, 2, 5, 7 and 10) at 5%, although from
an economic standpoint they are too small to provide predictive power for the direction of future returns.
Although the S&P 500 returns do not show any significant correlation at the daily and monthly frequency,
it is common to find for several asset classes that their absolute or square returns show significant and
long-lasting serial correlations. This is a property of financial returns in general, and it is most evident at the
daily frequency and higher. The next plot shows the ACF for the absolute and square daily S&P 500 returns
up to lag 100 days:


It is clear from these graphs that the absolute and square returns are significantly positively correlated and that
the auto-correlation decays very slowly. This shows that large (small) absolute returns are likely to be followed
by large (small) absolute returns, that is, the magnitude of returns is correlated rather than their direction.
This is associated with the evidence that returns display volatility clusters that represent periods (that can
last several months) of high volatility followed by periods of low volatility. This suggests that volatility is
persistent (and thus predictable), while returns are unpredictable.
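The two ACF plots just described could be produced with commands along these lines (a sketch, assuming spd.ret holds the daily returns as above):

par(mfrow=c(1,2))
acf(coredata(abs(spd.ret)), lag.max=100, main="absolute returns")
acf(coredata(spd.ret)^2,    lag.max=100, main="squared returns")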

3.2 The Auto-Regressive (AR) Model


The first model that we consider is a linear regression that relates Y_t to the previous value of the variable
(hence, auto-regressive), that is, Y_t = β_0 + β_1 Y_{t-1} + ε_t, where β_0 and β_1 are coefficients to be estimated
and ε_t is an error term with mean zero and variance σ². This model is typically referred to as AR(1) since we are
only using the first lag of the variable to explain the time variation in the current period. Similarly to the LRM,
this model represents the E(Y_t | Y_{t-1} = y) line in the scatter plot of Y_{t-1} and Y_t. If the last period realization
of the variable was y, then we expect the realization in the current period to be E(Y_t | Y_{t-1} = y) = β_0 + β_1 y.
The parameter β_1 can be interpreted, as in the LRM, as the expected change in Y_t for a unit change of Y_{t-1}.
Since the dependent and independent variables represent the same variable observed at different points in
time, the slope β_1 is also interpreted as a measure of persistence of the time series. By persistence we mean
the extent to which past values above/below the mean are likely to persist above/below the mean in the current
period. A large positive value of β_1 represents a situation in which the series is persistently above/below the
mean for several periods, while a small positive value means that the series is likely to oscillate close to the
mean. The concept of persistence is also related to that of mean-reversion, which measures the speed at which
a time series reverts back to its long-run mean. The higher the persistence of a time series, the longer it will
take for deviations from the long-run mean to be absorbed. Persistence is associated with positive values of
β_1, while negative values of the parameter imply that the series oscillates around the mean, since a value
above the mean in the previous period is expected to be followed by a value below the mean in the current
period, and vice versa. In economics and finance we typically find positive coefficients due
to the persistent nature of economic shocks. To estimate the AR(1) model we can use OLS and, asymptotically
(for large sample sizes), we expect the estimates to be consistent.
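To make the notion of persistence concrete, the following sketch (not part of the original text) simulates two AR(1) series, one with a large and one with a small positive coefficient, using the base-R function arima.sim():

set.seed(10)
y.high <- arima.sim(model=list(ar=0.9), n=200)  # beta1 = 0.9: long swings away from the mean
y.low  <- arima.sim(model=list(ar=0.1), n=200)  # beta1 = 0.1: rapid oscillation around the mean
par(mfrow=c(2,1))
plot(y.high, ylab="", main="AR(1) with coefficient 0.9")
plot(y.low,  ylab="", main="AR(1) with coefficient 0.1")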


Let's work with an example. We estimate an AR(1) model on the S&P 500 returns at the monthly frequency.
I will discuss two (of several) ways to estimate an AR model in R. One way is to use the lm() function in
conjunction with the dyn package, which gives lm() the capability to handle time series data and operations,
such as the lag() operator. The estimation is implemented below:
require(dyn)
fit <- dyn$lm(spm.ret ~ lag(spm.ret, -1))
summary(fit)

Call:
lm(formula = dyn(spm.ret ~ lag(spm.ret, -1)))

Residuals:
    Min      1Q  Median      3Q     Max
-18.450  -2.391   0.538   2.730  10.338

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.5584     0.2573    2.17    0.031 *
lag(spm.ret, -1)   0.0707     0.0592    1.19    0.234
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.31 on 284 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.00499,   Adjusted R-squared: 0.00149
F-statistic: 1.42 on 1 and 284 DF, p-value: 0.234

where β̂_0 = 0.558 and β̂_1 = 0.071 is the estimate of the coefficient of the lagged monthly return.
The estimate of β_1 is quite close to zero, which suggests that the monthly returns have little persistence and
that it is difficult to predict the return next month knowing the return for the current month. In addition,
we can test the null hypothesis that β_1 = 0: the t-statistic is 1.193 with a p-value of 0.234, suggesting
that the coefficient is not significant even at the 10% level. The lack of persistence in financial returns at high
frequencies (intra-daily or daily) but also at lower frequencies (weekly, monthly, quarterly) is well documented
and it is one of the stylized facts of returns common across asset classes.
Another function that can be used to estimate AR models is ar(), available in base R (package stats). The inputs of
the function are the series Y_t, the maximum order/lag of the model, and the estimation method. An additional
argument of the function is aic, which can be TRUE/FALSE. This option refers to the Akaike Information Criterion
(AIC), which is a method to select the optimal number of lags in the AR model. It is calculated as a penalized
goodness-of-fit measure, where the penalization is a function of the order of the AR model. In the case of
AR(1) the search is over lags 0 and 1, as you can see by comparing the two regression outputs below:


# use 'na.action=na.exclude' if spm.ret is not a zooreg object
ar(spm.ret, aic=FALSE, order.max=1, method="ols",
   demean = FALSE, intercept = TRUE)

Call:
ar(x = spm.ret, aic = FALSE, order.max = 1, method = "ols", demean = FALSE,
    intercept = TRUE)

Coefficients:
     1
 0.071

Intercept: 0.558 (0.256)

Order selected 1  sigma^2 estimated as  18.4

ar(spm.ret, aic=TRUE, order.max=1, method="ols",
   demean = FALSE, intercept = TRUE)

Call:
ar(x = spm.ret, aic = TRUE, order.max = 1, method = "ols", demean = FALSE,
    intercept = TRUE)

Intercept: 0.601 (0.254)

Order selected 0  sigma^2 estimated as  18.5

By setting aic=TRUE the results show that the selected order is 0, since the first lag is (statistically) irrelevant
(i.e., the best model is Y_t = β_0 + ε_t). The option demean means that by default the ar() function subtracts
the mean from the series before estimating the regression. In this case, instead of estimating the model
Y_t = β_0 + β_1 Y_{t-1} + ε_t, the function estimates Y_t − Ȳ = β_1 (Y_{t-1} − Ȳ) + ε_t without an intercept. Estimating
the two models provides the same estimate of β_1 because E(Y_t) = β_0 + β_1 E(Y_{t-1}); if we denote the
expected value of Y_t by μ, then using the previous equation we can express the intercept as β_0 = μ(1 − β_1).
By replacing this value for β_0 in the AR(1) model we obtain Y_t = μ(1 − β_1) + β_1 Y_{t-1} + ε_t, which can be
rearranged as Y_t − μ = β_1 (Y_{t-1} − μ) + ε_t. So, estimating the model in deviations from the mean or with an
intercept leads to the same estimate of β_1.
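As a small check (a sketch, not in the text), the two formulations can be compared directly on the monthly returns; they coincide exactly in population and differ only marginally in a finite sample:

y.dm <- spm.ret - mean(spm.ret, na.rm=TRUE)     # demeaned monthly returns
fit1 <- dyn$lm(spm.ret ~ lag(spm.ret, -1))      # AR(1) with an intercept
fit2 <- dyn$lm(y.dm ~ -1 + lag(y.dm, -1))       # AR(1) on the demeaned series, no intercept
c(coef(fit1)[2], coef(fit2)[1])                 # the two slope estimates are essentially identical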
More generally, we do not have to restrict ourselves to explaining Y_t using only Y_{t-1}, but can also use Y_{t-2},
Y_{t-3}, and further lags in the past. The reason why more lags might be needed to model a time series relates
to the stickiness in wages, prices, and expectations, which might delay the effect of shocks on economic and
financial variables. A generalization of the AR(1) model is the AR(p), which includes p lags of the variable:
Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + … + β_p Y_{t-p} + ε_t, and which can be estimated with the dyn$lm() or ar()
commands. Since we are using monthly data, we can set p (order.max in the ar() function) equal to 12 and
the estimation results are provided below:
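The output below was presumably generated by a call along the following lines (an assumption: the assignment to fit is inferred from the use of fit$ar and fit$asy.se.coef further down):

fit <- ar(spm.ret, aic=FALSE, order.max=12, method="ols",
          demean=FALSE, intercept=TRUE)
fit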


Call:
ar(x = spm.ret, aic = FALSE, order.max = 12, method = "ols", demean = FALSE,
    intercept = TRUE)

Coefficients:
      1       2       3       4       5       6       7       8       9      10
  0.063  -0.017   0.107   0.037   0.034  -0.091   0.038   0.035  -0.003   0.011
     11      12
  0.015   0.065

Intercept: 0.436 (0.273)

Order selected 12  sigma^2 estimated as  17.6

The coefficient estimates are quite close to zero, but to evaluate their statistical significance we need to obtain
the standard errors of the estimates and calculate the t-statistics for the null hypothesis that the coefficients
are equal to zero:
coef    <- fit$ar
se_coef <- fit$asy.se.coef$ar
t       <- as.numeric(coef / se_coef)
t

 [1]  1.0564 -0.2862  1.7760  0.6109  0.5652 -1.5191  0.6445  0.5844
 [9] -0.0561  0.1854  0.2541  1.1048

where we can see that only the third lag is significant at the 10% level, although not at 5%. Since most of the lags are not
significant, we prefer to reduce the number of lags and thus the number of parameters that we are estimating.
We can select the optimal number of lags in the AR(p) model by setting the option aic=TRUE:
Call:
ar(x = spm.ret, aic = TRUE, order.max = 12, method = "ols", demean = FALSE,
    intercept = TRUE)

Intercept: 0.601 (0.254)

Order selected 0  sigma^2 estimated as  18.5

which shows that the order that minimizes the AIC criterion is zero, that is, a model with no lags. This confirms the
evidence from the ACF that past values of the series have little relevance in explaining the dynamics of the
current level of the variable Y_t.
A similar analysis can be conducted on daily S&P 500 returns and their absolute or square transformations.
Below we present the results for an AR(5) model estimated on the absolute returns of the S&P 500. It might
be preferable to use dyn$lm() over ar() because it provides the typical regression output with a column for the
coefficient estimate, the standard errors, the t-statistics, and the p-values. The results are as follows:


fit <- dyn$lm(abs(spd.ret) ~ lag(abs(spd.ret), (-1):(-5)))
round(coef(summary(fit)), 3)

                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                      0.241      0.017   14.21        0
lag(abs(spd.ret), (-1):(-5))1    0.057      0.013    4.50        0
lag(abs(spd.ret), (-1):(-5))2    0.180      0.013   14.41        0
lag(abs(spd.ret), (-1):(-5))3    0.117      0.013    9.26        0
lag(abs(spd.ret), (-1):(-5))4    0.143      0.013   11.45        0
lag(abs(spd.ret), (-1):(-5))5    0.193      0.013   15.32        0

The table shows that all lags are significant (even using a 1% significance level). The largest coefficient is 0.193
(at the fifth lag); taken together with the other coefficients, which are all positive and statistically significant, the
overall persistence of absolute daily returns implies that the series is quite predictable. Using the ar() function to
select the order by AIC results in the choice of 12 lags.

3.3 Model selection for AR models


In practice, how do we choose p? One approach to the selection of the order of AR models consists of
minimizing selection criteria that account for the goodness-of-fit of the model but penalize for the number
of lags included. The reason for the penalization is to avoid over-fitting: measures like R² do not decrease
when we increase the lag order, so relying on them would lead to the selection of large lag orders. Penalizing the
goodness-of-fit measure leads to adding lags only when the increase in the measure is larger than the penalization. The
most often used criterion is the Akaike Information Criterion (AIC), given by log(RSS/T) + 2(1 + p)/T,
where RSS is the Residual Sum of Squares of the model, T is the sample size, and p is the lag order of
the AR model. The AIC is calculated for models with different p and the selected lag order is the one that
minimizes the criterion. An alternative criterion is the Bayesian Information Criterion (BIC), which is given
by log(RSS/T) + log(T)(1 + p)/T. BIC applies a higher penalization relative to AIC (log(T) instead of
2) and leads to smaller lag orders.
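As an illustration (a sketch, not from the text), both criteria can be computed by hand for AR(p) models of the monthly S&P 500 returns with p = 0, …, 12, estimated by OLS on a common sample, and the order minimizing each criterion selected:

pmax <- 12
aic  <- bic <- rep(NA, pmax + 1)
y    <- as.numeric(na.omit(spm.ret))
n    <- length(y)
for (p in 0:pmax) {
  yy <- y[(pmax + 1):n]                     # common estimation sample for all p
  if (p == 0) {
    fit <- lm(yy ~ 1)                       # AR(0): constant only
  } else {
    X   <- sapply(1:p, function(j) y[(pmax + 1 - j):(n - j)])  # p lagged columns
    fit <- lm(yy ~ X)
  }
  RSS        <- sum(residuals(fit)^2)
  T          <- length(yy)
  aic[p + 1] <- log(RSS/T) + 2 * (1 + p) / T
  bic[p + 1] <- log(RSS/T) + log(T) * (1 + p) / T
}
which.min(aic) - 1   # lag order selected by AIC
which.min(bic) - 1   # lag order selected by BIC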

3.4 Forecasting with AR models


One of the advantages of AR models is that they are very suitable to produce forecasts about the future
value of the series. In the simple case of an AR(1) model, at time t we can make a forecast about t + 1 as
Ê(Y_{t+1} | Y_t) = β̂_0 + β̂_1 Y_t, where Y_t is known and the coefficient estimates are obtained by OLS. If the
interest is to forecast two steps ahead, then the forecast is equal to Ê(Y_{t+2} | Y_t) = β̂_0 + β̂_1 Ê(Y_{t+1} | Y_t) =
β̂_0 + β̂_1 (β̂_0 + β̂_1 Y_t) = β̂_0 (1 + β̂_1) + β̂_1² Y_t, and similarly for longer horizon forecasts.
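As a small numerical illustration (a sketch, not in the text), the one- and two-step-ahead forecasts can be computed directly from the AR(1) coefficients estimated on the monthly returns earlier in the chapter:

fit1 <- dyn$lm(spm.ret ~ lag(spm.ret, -1))  # re-estimate the AR(1) of Section 3.2
b    <- coef(fit1)                          # b[1] and b[2] are the estimates of beta0 and beta1
yT   <- as.numeric(tail(spm.ret, 1))        # last observed monthly return
f1   <- b[1] + b[2] * yT                    # one-step-ahead forecast
f2   <- b[1] * (1 + b[2]) + b[2]^2 * yT     # two-step-ahead forecast
c(f1, f2)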


As an example, assume that we are interested in forecasting next period's GDP. The results of the ADF
test (see Section 3.9) suggest that log(GDP) is non-stationary, so we need to transform the series by differencing. Taking
differences of the logarithm of GDP produces growth rates or percentage changes (if multiplied by 100). The
plot of the percentage growth rate of GDP together with the NBER recession periods is shown below.


Assume that we are at the beginning of 2014 and the GDP data for the fourth quarter of 2013 have been released,
which allows us to calculate the growth rate in quarter 4 of 2013. The aim is to forecast the percentage growth of
GDP in the first quarter of 2014 based on the information available at that time. The first step is to estimate
an AR(p) model on data up to 2013Q4. We can first use the R command ar() to select the order p of the AR(p)
model, and the order selected is 1. We can then estimate an AR(1) model for the log-difference of real GDP:
fit <- dyn$lm(dlGDP ~ lag(dlGDP,-1))

               Estimate Std. Error t value Pr(>|t|)
(Intercept)       0.482      0.071    6.81        0
lag(dlGDP, -1)    0.386      0.057    6.72        0

which has an R² of 0.152. The forecast for 2014Q1 based on the information available in 2013Q4 is obtained
as 0.482 + 0.386*0.648, which is equal to an expected GDP growth of 0.732%. The realized GDP growth in
that quarter happened to be -0.744% and the forecast error is thus -1.476%, which is quite large relative to the
standard deviation of the time series of 0.939%. In addition, qualitatively the AR(1) model predicts persistence
of growth rates, whilst in this case the realization was a temporary contraction of output in 2014Q1. R of
course has functions that produce forecasts automatically; for example:
fit <- ar(dlGDP, method="ols", order=1, demean=FALSE, intercept=TRUE)
predict(fit, n.ahead=4)


$pred
Qtr1 Qtr2 Qtr3 Qtr4
2014 0.732 0.764 0.776 0.781
$se
Qtr1 Qtr2 Qtr3 Qtr4
2014 0.855 0.917 0.926 0.927

where the variable dlGDP represents the percentage growth rates of GDP and the function predict() produces
forecasts up to n.ahead periods. The $se output of the predict() function represents the standard error of
the forecast and provides a measure of uncertainty around the forecast.
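For instance (a sketch, not part of the original text), an approximate 95% interval forecast can be formed from the point forecast and its standard error returned by predict():

fc    <- predict(fit, n.ahead=4)            # fit is the ar() object estimated above
lower <- fc$pred - 1.96 * fc$se             # approximate lower bound
upper <- fc$pred + 1.96 * fc$se             # approximate upper bound
cbind(forecast=fc$pred, lower=lower, upper=upper)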

3.5 Seasonality
A seasonal pattern in a time series represents the regular occurrence of higher/lower realizations of the
variable in certain periods of the year. The seasonal pattern is related to the frequency at which the time
series is observed. For daily data it could be by the 7 days of the week, for monthly data by the 12 months
in a year, and for quarterly data by the 4 quarters in a year. For example, electricity consumption spikes
during the summer months while being lower in the rest of the year. Of course there are many other factors
that determine the consumption of electricity which might grow over time, but seasonality captures the
characteristic of systematically higher/lower values at certain times of the year. As an example, let's assume
that we want to investigate if there is seasonality in the S&P 500 returns at the monthly frequency. To capture
the seasonal pattern we use dummy variables that take value 1 in a certain month and zero in all other months.
For example, we define the dummy variable JAN_t to be equal to 1 if month t is January and 0 otherwise,
FEB_t takes value 1 every February and is 0 otherwise, and similarly for the remaining months. We can
then include the dummy variables in a regression model, for example,
Y_t = β_0 + β_1 X_t + δ_2 FEB_t + δ_3 MAR_t + δ_4 APR_t + δ_5 MAY_t + δ_6 JUN_t + δ_7 JUL_t + δ_8 AUG_t + δ_9 SEP_t + δ_10 OCT_t + δ_11 NOV_t + δ_12 DEC_t + ε_t


where Xt can be lagged values of Yt and/or some other relevant independent variable. Notice that in this
regression we excluded one dummy variable (in this case JANt ) to avoid perfect collinearity among the
regressors (dummy variable trap). The coefficients of the other dummy variables should then be interpreted
as the expected value of Y in a certain month relative to the expectation in January, once we control for the
independent variable Xt . There is an alternative way to specify this regression which consists of including all
12 dummy variables and excluding the intercept: Yt = 1 Xt +1 JANt +2 F EBt +3 M ARt +4 AP Rt +
5 M AYt + 6 JU Nt + 7 JU Lt + 8 AU Gt + 9 SEPt + 10 OCTt + 11 N OVt + 12 DECt + t The first
regression is typically preferred because a test of the significance of the coefficients of the 11 monthly dummy
variables provides evidence in favor or against seasonality. The same hypothesis could be evaluated in the
second specification by testing that the 12 parameters of the dummy variables are equal to each other.
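One way to carry out the first of these tests (a sketch, not in the text; it uses the cycle() function introduced just below to build the monthly dummies as a factor) is to compare the restricted and unrestricted specifications with an F-test:

dat   <- data.frame(ret = coredata(spm.ret), m = factor(cycle(spm.ret)))
fit.u <- lm(ret ~ m, data=dat)   # unrestricted: month-specific means (January absorbed in the intercept)
fit.r <- lm(ret ~ 1, data=dat)   # restricted: a single constant mean, no seasonality
anova(fit.r, fit.u)              # F-test that the 11 monthly dummy coefficients are jointly zero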
We can create the seasonal dummy variables by first defining a variable month using the cycle() function
which identifies the month for each observation in a time series object. For example, for the monthly S&P 500
returns we have:


month <- cycle(spm.ret)
head(month, 12)

 1990(1)  1990(2)  1990(3)  1990(4)  1990(5)  1990(6)  1990(7)  1990(8)
       1        2        3        4        5        6        7        8
 1990(9) 1990(10) 1990(11) 1990(12)
       9       10       11       12

This shows that the month variable provides the month of each observation from 1 to 12. We can then use a
logical statement to define the monthly dummy variables, such as JAN <- as.numeric(month == 1), which
returns (printing the first 12 observations of JAN)

[1] 1 0 0 0 0 0 0 0 0 0 0 0
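Rather than typing twelve such statements, the dummies can be created in a short loop (a sketch; it relies on the month variable defined above and uses assign() to create the objects JAN, …, DEC):

mnames <- c("JAN","FEB","MAR","APR","MAY","JUN",
            "JUL","AUG","SEP","OCT","NOV","DEC")
for (j in 1:12) assign(mnames[j], as.numeric(month == j))  # e.g. FEB equals 1 in February, 0 otherwise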

To illustrate the use of seasonal dummy variables I consider a simple example in which I regress the monthly
return of the S&P 500 on the 12 monthly dummy variables (no Xt variable for now). As discussed before,
to avoid the dummy variable trap, I opt for the exclusion of the intercept/constant from the regression and
include the 12 dummy variables. The model is implemented as follows:
fit <- dyn$lm(spm.ret ~ -1 + JAN + FEB + MAR + APR + MAY +
JUN + JUL + AUG + SEP + OCT + NOV + DEC)
summary(fit)

Call:
lm(formula = dyn(spm.ret ~ -1 + JAN + FEB + MAR + APR + MAY +
    JUN + JUL + AUG + SEP + OCT + NOV + DEC))

Residuals:
    Min      1Q  Median      3Q     Max
-19.890  -2.087   0.488   2.887   8.905

Coefficients:
    Estimate Std. Error t value Pr(>|t|)
JAN   -0.171      0.874   -0.19    0.846
FEB    1.364      0.874    1.56    0.120
MAR    1.642      0.874    1.88    0.061 .
APR    0.905      0.874    1.03    0.302
MAY   -0.625      0.874   -0.72    0.475
JUN    0.688      0.874    0.79    0.432
JUL   -1.133      0.874   -1.30    0.196
AUG   -0.480      0.874   -0.55    0.584
SEP    1.326      0.874    1.52    0.131
OCT    1.322      0.874    1.51    0.132
NOV    1.859      0.874    2.13    0.034 *
DEC    0.514      0.893    0.58    0.565
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.28 on 275 degrees of freedom
Multiple R-squared: 0.0666,   Adjusted R-squared: 0.0258
F-statistic: 1.63 on 12 and 275 DF, p-value: 0.0818

The results indicate that, in most months, the expected return is not significantly different from zero, except
for March and November. In both cases the coefficient is positive, which indicates that in those months
returns are expected to be higher relative to the other months. To develop a more intuitive understanding
of the role of the seasonal dummies, the graph below shows the fitted or predicted returns from the model
above. In particular, the expected return of the S&P 500 in January is -0.171%, that is, E(R_t | JAN_t = 1) =
-0.171%, while in February the expected return is E(R_t | FEB_t = 1) = 1.364%, and so on. These coefficients
are plotted in the graph below and create a regular pattern that is expected to repeat every year:

Returns seem to be positive in the first part of the year, go into negative territory during the summer
months, and return positive toward the end of the year. However, keep in mind that only March and
November are significant at 10%. We can also add the lag of the S&P 500 return to the model above to see if
the significance of the monthly dummy variables changes and to evaluate if the goodness-of-fit of the regression
increases:
fit <- dyn$lm(spm.ret ~ -1 + lag(spm.ret, -1) + JAN + FEB + MAR + APR +
MAY + JUN + JUL + AUG + SEP + OCT + NOV + DEC)
summary(fit)


Call:
lm(formula = dyn(spm.ret ~ -1 + lag(spm.ret, -1) + JAN + FEB +
    MAR + APR + MAY + JUN + JUL + AUG + SEP + OCT + NOV + DEC))

Residuals:
    Min      1Q  Median      3Q     Max
-19.321  -2.330   0.367   2.821   9.343

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
lag(spm.ret, -1)   0.0629     0.0604    1.04    0.299
JAN               -0.2472     0.8952   -0.28    0.783
FEB                1.3751     0.8759    1.57    0.118
MAR                1.5563     0.8797    1.77    0.078 .
APR                0.8016     0.8814    0.91    0.364
MAY               -0.6822     0.8775   -0.78    0.438
JUN                0.7273     0.8766    0.83    0.407
JUL               -1.1764     0.8768   -1.34    0.181
AUG               -0.4084     0.8785   -0.46    0.642
SEP                1.3562     0.8763    1.55    0.123
OCT                1.2387     0.8795    1.41    0.160
NOV                1.7756     0.8795    2.02    0.044 *
DEC                0.3987     0.9015    0.44    0.659
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.29 on 273 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.0703,   Adjusted R-squared: 0.0261
F-statistic: 1.59 on 13 and 273 DF, p-value: 0.0877

The results show that in July and August the expected return of the S&P 500 is negative, by 1.176% and 0.408%,
respectively, while in December it is positive and equal to 0.399%.
Seasonality is a common characteristic of macroeconomic variables. Typically, we analyze these variables
on a seasonally-adjusted basis, which means that the statistical agencies have already removed the seasonal
component from the variable. However, they also provide the variables before the adjustment, and here we consider
the Department Stores Retail Trade series (FRED ticker: RSDSELDN) at the monthly frequency. The time series graph
for this variable from January 1995 is shown below.


The seasonal pattern is quite clear in the data and it seems to occur toward the end of the year. We can
conjecture that the spike in sales is probably associated with the holiday season in December, but we can
quantitatively test this hypothesis by estimating a linear regression model in which we include monthly
dummy variables as explanatory variables to account for this pattern. The regression model for the sales (in $)
of department stores, denoted by Y_t, is
Y_t = β_0 + β_1 Y_{t-1} + δ_2 FEB_t + δ_3 MAR_t + δ_4 APR_t + δ_5 MAY_t + δ_6 JUN_t + δ_7 JUL_t + δ_8 AUG_t + δ_9 SEP_t + δ_10 OCT_t + δ_11 NOV_t + δ_12 DEC_t + ε_t
where, in addition to the monthly dummy variables, we include the first-order lag of the variable. The
regression results are shown below:
fit <- dyn$lm(Y ~ lag(Y, -1) + FEB + MAR + APR + MAY +
JUN + JUL + AUG + SEP + OCT + NOV + DEC)

               Estimate Std. Error t value Pr(>|t|)
(Intercept) -13544.583    879.715 -15.397        0
lag(Y, -1)       0.902      0.029  31.033        0
FEB          15560.687    542.250  28.697        0
MAR          16863.826    523.734  32.199        0
APR          14961.717    474.782  31.513        0
MAY          16019.665    478.189  33.501        0
JUN          14490.353    455.418  31.818        0
JUL          14573.837    472.162  30.866        0
AUG          16449.176    483.557  34.017        0
SEP          13444.096    450.119  29.868        0
OCT          16245.844    493.089  32.947        0
NOV          19172.537    463.364  41.377        0
DEC          24815.757    371.732  66.757        0


The results indicate that retail sales at department stores are highly persistent, with an AR(1) coefficient of
0.902. To interpret the estimates of the seasonal coefficients, we need to notice that the dummy for the month
of January was left out, so that all other seasonal dummy coefficients should be interpreted as the difference
in sales relative to the first month of the year. The results indicate that all coefficients are positive and thus
retail sales are higher than in January. From the low levels of January, sales seem to increase toward the summer,
remain relatively stable during the summer months, and then increase significantly in November and December.
Below is a graph of the variable and the model fit (red dashed line).
plot(Y, col="gray",lwd=5)
lines(fitted(fit),col=2,lty=2,lwd=2)

3.6 Trends in time series


A trend in a time series is defined as the tendency of an economic or financial time series to grow over time.
Examples are the real Gross Domestic Product (FRED ticker: GDPC1), the Consumer Price Index (FRED ticker:
CPIAUCSL), and the S&P 500 index (YAHOO ticker: ^GSPC).


The three series have in common the feature of growing over time with no tendency to revert back to the
mean. In fact, the trending behavior of the variables implies that the mean of the series is also increasing
over time rather than being approximately constant as time progresses. For example, if we were to estimate
the mean of GDP, CPI, and the S&P 500 in 1985, it would not have been a good predictor of the future value
of the mean because the mean of a variable with a trend keeps growing over time. This type of series
is called non-stationary because the mean and the variance of the distribution are changing over time. On
the other hand, series are defined as stationary when their long-run distribution is constant as time progresses.
For these variables, we can thus conclude that their time series graphs clearly indicate that they are
non-stationary.
Very often in economics and finance we prefer to take the natural logarithm since it makes the exponential
growth of some variables approximately linear. The same variables discussed above are shown below in
natural logarithm:

Taking the log of the variables is particularly relevant for the S&P 500 Index, which shows considerable
exponential behavior, at least until the end of the 1990s. Also for GDP the time series plot seems to become
more linear, while for CPI we observe different phases of a linear trend: up to the end of the 1960s, the 1970s,
and again since the 1980s.

3.7 Deterministic and Stochastic Trend


A simple approach to model this type of time series is to assume that they follow a deterministic (non-random)
trend, and to start we can assume that this trend is linear. In this case we are assuming that the series Y_t evolves
according to the model Y_t = β_0 + β_1 t + d_t, where the deviation d_t is a zero-mean stationary random variable
(e.g., an AR(1) process). The independent variable denoted t is the time trend and takes value 1 for the first
observation, value 2 for the second observation, and value T for the last observation. This type of trend is
referred to as deterministic since it makes the prediction that every time period the series Y_t is expected to
increase by β_1. This is often referred to as the trend-stationary model, since the deterministic trend accounts for
the non-stationary behavior of the time series and the deviation from the trend is assumed to be stationary.
Estimating this model by OLS in R requires us first to create the trend variable; then we can use the lm() or
dyn$lm() function as before:
trend <- zooreg((1:length(gdp)), start=c(1950,1), frequency=4)
fit   <- dyn$lm(log(gdp) ~ trend)
coef(fit)

(Intercept)       trend
      7.761       0.008

where the estimate of β_1 is 0.008, which indicates that real GDP is expected to grow by about 0.8% every quarter. The
application of the linear trend model to the three series shown above gives as a result the dashed trend line
shown in the graph below:

The deviations of the series from the fitted trend line are small for GDP, but for the CPI and S&P 500 indices
they are persistent and last for long periods of time (i.e., 10 years or even longer). Such persistent deviations
might be due to the inadequacy of the linear trend model and the need to consider a nonlinear trend. This


can be accommodated by adding a quadratic and a cubic term to the linear trend model, which becomes
Y_t = β_0 + β_1 t + β_2 t² + β_3 t³ + d_t, where t² and t³ represent the square and cube of the trend variable.
The implementation in R only requires the additional step of creating the quadratic and cubic terms, as shown
below:
trend2 <- trend^2
trend3 <- trend^3
fitsq <- dyn$lm(log(gdp) ~ trend + trend2)
round(summary(fitsq)$coefficients, 4)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.6741     0.0063  1211.5        0
trend         0.0034     0.0000    89.1        0
trend2        0.0000     0.0000   -19.5        0

fitcb <- dyn$lm(log(gdp) ~ trend + trend2 + trend3)
round(summary(fitcb)$coefficients, 4)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.6944     0.0082 936.245   0.0000
trend         0.0031     0.0001  33.161   0.0000
trend2        0.0000     0.0000   0.362   0.7173
trend3        0.0000     0.0000  -3.749   0.0002

The linear (dashed line) and cubic (dash-dot line) deterministic trends are shown in the Figure below. For
the case of GDP the difference between the two lines is not visually large, although the quadratic and/or
cubic regression coefficients might be statistically significant at conventional levels. In addition, the AIC of
the linear model is 281.471, while for the quadratic and cubic trend models it is -1004.701 and -1016.595,
respectively. Hence, in this case we would select the cubic model, which does slightly better relative to the
quadratic and significantly better relative to the linear trend model. For CPI, the cubic trend seems to capture
the slow increase in the log CPI index at the beginning of the sample, followed by a rapid increase and then
again a slower growth of the index. The period of rapid growth of the CPI index happened in the 1970s, when
the surge in oil prices led to an increase of the rate of inflation in the US and globally. However, it could be
argued that the deterministic trend model might not represent well the behavior of the CPI and S&P 500
indices, since these series depart from the trend (even the cubic one) for long periods of time.


Another way to assess the goodness of the trend-stationary model is to plot d_t, the residuals or deviations
from the cubic trend, and investigate their time series properties. Below we show the time series graph of the
deviations and their ACF with lags up to 20 (quarters for GDP and months for CPI and the S&P 500):


The ACF shows clearly that the deviation of log GDP from the cubic trend is persistent but with rapidly
decaying values. To the contrary, for CPI and the S&P 500 we observe that the serial correlation decays very
slowly, which is typical of non-stationary time series. In other words, we find that the deviations exhibit
non-stationary behavior even after accounting for a deterministic trend.
An alternative model that is often used in asset and option pricing is the random walk with drift model. The
model takes the following form: Y_t = μ + Y_{t-1} + ε_t, where μ is a constant and ε_t is an error term with mean
zero and variance σ². The random walk with drift model assumes that the expected value of Y_t is equal to the previous
value of the series (Y_{t-1}) plus a constant term μ (which can be positive or negative). In formulas, we can write
this as E(Y_t | Y_{t-1}) = μ + Y_{t-1}. The model can also be reformulated by substituting backwards the value of
Y_{t-1}, which, based on the model, is μ + Y_{t-2} + ε_{t-1}, and we obtain Y_t = 2μ + Y_{t-2} + ε_t + ε_{t-1}. Then we
can substitute Y_{t-2}, Y_{t-3}, and so on until we reach Y_0, and the model can be written as


Y_t = μ + Y_{t-1} + ε_t
    = 2μ + Y_{t-2} + ε_{t-1} + ε_t
    = 3μ + Y_{t-3} + ε_{t-2} + ε_{t-1} + ε_t
    = ...
    = Y_0 + μt + Σ_{j=1}^{t} ε_{t-j+1}

This shows that a random walk with drift model can be expressed as the sum of a deterministic trend (μt)
and a term which is the sum of all past errors/shocks to the series. In case the drift term is set equal to zero,
the model reduces to Y_t = Y_{t-1} + ε_t = Y_0 + Σ_{j=1}^{t} ε_{t-j+1}, which is called the random walk model (without
drift, since μ is equal to zero). Hence, another way to think of the random walk model with drift is as the sum
of a deterministic linear trend and a random walk process.
The relationship between the trend-stationary and the random walk with drift models becomes clear if we
assume that the deviations from the trend d_t follow an AR(1) process, that is, d_t = ρ d_{t-1} + ε_t, where ρ is
the coefficient of the first lag and ε_t is a mean-zero random variable. Similarly to above, we can do backward
substitution of the AR term in the trend-stationary model, that is,

Y_t = β_0 + β_1 t + d_t
    = β_0 + β_1 t + ρ d_{t-1} + ε_t
    = β_0 + β_1 t + ρ² d_{t-2} + ρ ε_{t-1} + ε_t
    = ...
    = β_0 + β_1 t + ε_t + ρ ε_{t-1} + ρ² ε_{t-2} + ... + ρ^{t-1} ε_1
    = β_0 + β_1 t + Σ_{j=1}^{t} ρ^{j-1} ε_{t-j+1}

Comparing this equation with the one obtained above for the random walk with drift model, we find that the
random walk with drift corresponds to the special case ρ = 1. We have thus related the two models, which
differ only in terms of the persistence of the deviations from the trend. If the coefficient ρ is less than 1, the
deviation is stationary and thus the trend-stationary model can be used to de-trend the series and then conduct
the analysis on the deviations. However, when ρ = 1 the deviation is non-stationary (i.e., a random walk),
the approach just described is not valid anymore, and we will discuss later what to do in this case. A more
practical way to understand the issue of the (non-)stationarity of the deviation from the trend is to think
in terms of the speed at which the series is likely to revert back to the trend line. Series that oscillate often
around the trend are stationary, while persistent deviations from the trend (slow reversion) are an indication
of non-stationarity. How do we know if a series (e.g., the deviation from the trend) is stationary or not? In the
following sections we will discuss a test that evaluates this hypothesis and thus provides guidance as to what
modeling approach to take.
The previous analysis of GDP, CPI, and the S&P 500 index shows that the deviations of GDP from its trend
seem to revert to the mean faster relative to the other two series: this can be seen both in the time series plot


and also from the quickly decaying ACF. The estimate of the trend-stationary model shows that we expect
GDP to grow around 0.8% per quarter (or 3.2% annualized), although GDP alternates periods above trend
(expansions) and periods below trend (recessions). The alternation between expansions and recessions thus
captures the mean-reverting nature of the GDP deviations from the long-run trend and its stationarity. We can
evaluate the ability of the trend-stationary model to capture the features of the business cycle by comparing
the periods of positive and negative deviations with the peak and trough dates of the business cycle decided
by the NBER dating committee. In the graph below we plot the deviation from the cubic trend estimated
earlier together with the gray areas that indicate the period of recessions.
# dates from the NBER business cycle dating committee
xleft = c(1953.25, 1957.5, 1960.25, 1969.75, 1973.75, 1980,
1981.5, 1990.5, 2001, 2007.917) # beginning
xright = c(1954.25, 1958.25, 1961, 1970.75, 1975, 1980.5, 1982.75,
1991, 2001.75, 2009.417) # end
#fitgdp is the lm() object for the cubic trend model
plot(residuals(fitgdp), ylim=c(-0.10,0.10), xlab="", ylab="")
abline(h=0, col=2, lwd=2, lty=2)
rect(xleft, rep(-0.10,10), xright, rep(0.10,10), col="gray90", border=NA)

Overall, there is a tendency for the deviation to decline sharply during recessions (gray areas), and then
increase during the recovery period and the expansion, which seem to have lasted longer since the mid-1980s.
Earlier we discussed that the distribution of a non-stationary variable changes over time. We can now derive
the mean and variance of Y_t when it follows a trend-stationary model and when it follows a random walk
with drift. Under the trend-stationary model the dynamics follow Y_t = β_0 + β_1 t + d_t, and we can make the
simplifying assumption that d_t = ρ d_{t-1} + ε_t, with ε_t a mean-zero, variance-σ² error term. Based on these
assumptions, we obtain that E(d_t) = 0 and Var(d_t) = σ²/(1 − ρ²), so that E(Y_t) = β_0 + β_1 t
and Var(Y_t) = Var(d_t) = σ²/(1 − ρ²). This demonstrates that the mean of a trend-stationary variable is
a function of time and not constant, while the variance is constant. On the other hand, for the random walk
with drift model we have that E(Y_t) = E(Y_0 + μt + Σ_{j=1}^{t} ε_{t-j+1}) = Y_0 + μt and Var(Y_t) = tσ².


From these results we see that for the random walk with drift model both the mean and the variance are time-varying,
while for the trend-stationary model only the mean varies with time.
The main difference between the trend-stationary and random walk with drift models thus consists in the
(non-)stationarity properties of the deviations from the deterministic trend. For the trend-stationary model the
deviations are stationary and they can be analysed using regression models estimated by OLS to
investigate their dynamics. However, for the random walk with drift the deviations are non-stationary and the
time series cannot be used in regression models because of several statistical issues that will be discussed
in the next Section, followed by a discussion of an approach to statistically test whether a series is non-stationary
or stationary around a (deterministic) trend.

3.8 Why non-stationarity is a problem for OLS?


The estimation of time series models can be conducted by standard OLS techniques, with the only additional
requirement that the series is stationary. If this is the case, the OLS estimator is consistent and the t-statistics are
distributed according to the Student t distribution. However, these properties fail to hold when the series is
non-stationary and three problems arise:
1. the OLS estimates of the AR coefficients are biased in small samples
2. the t test statistic is not normally distributed (even in large samples)
3. the regression of a non-stationary variable on an independent non-stationary variable leads to spurious
results of dependence between the two series
We will discuss and illustrate these problems in the context of a simulation study using R. To show the first
fact, we simulate B time series from a random walk with drift model Y_t = μ + Y_{t-1} + ε_t for t = 1, …, T,
then estimate an AR(1) model Y_t = β_0 + β_1 Y_{t-1} + ε_t and store the B OLS estimates of β_1. We perform
this simulation for a short and a long time series (T = 25 and T = 500, respectively) and compare the mean,
median, and histogram of β̂_1 across the B simulations. Below is the code for T = 25:
require(dyn)
set.seed(1234)

T     = 25    # length of the time series
B     = 1000  # number of simulations
mu    = 0.1   # value of the drift term
sigma = 1     # standard deviation of the error

beta = matrix(NA, B, 1)  # object to store the OLS estimates of beta1

for (b in 1:B)                                       # loop starts here
{
  Y       <- ts(cumsum(rnorm(T, mean=mu, sd=sigma))) # simulates the random walk series
  fit     <- dyn$lm(Y ~ lag(Y,-1))                   # OLS estimation of the AR(1) regression
  beta[b] <- summary(fit)$coef[2,1]                  # stores the estimate of beta1
}
# plotting
hist(beta, breaks=50, freq=FALSE, main="")
abline(v=1, col=2, lwd=2)
box()


The histogram shows that the empirical distribution of β̂_1 over the 1000 simulations ranges between 0.158 and
1.154 with a mean of 0.812 and a median of 0.842. Both the mean and the median are significantly smaller
than the true value of 1. This demonstrates the bias in the OLS estimates in small samples when the
variable is non-stationary. However, this bias has a tendency to decline for larger sample sizes, as shown in
the histogram below that is produced by the same code as above but for T = 500:

In this case the minimum and maximum estimates are 0.948 and 1.005, with a mean and median of 0.997 and 0.998, respectively.
For the larger sample of 500 periods, even though the series is non-stationary, the coefficient estimates of β_1
are close to the theoretical value of 1 and thus there is essentially no bias.


The second fact that arises when estimating an AR model by OLS with non-stationary variables is that the t statistic
does not follow the normal distribution even when samples are large. This can be seen clearly in the histogram
below of the test statistic for the null hypothesis that β_1 = 1 with T = 500: the distribution is skewed to the
left relative to the standard normal distribution.

The third problem with non-stationary variables occurs when the interest is in the relationship between X
and Y and both variables are non-stationary. This could lead to spurious evidence of a significant
relationship between the two series when they are in fact independent of each other. An intuitive explanation
for this result can be provided by considering, e.g., two independent random walks with drift: estimating an
LRM finds co-movement between the series due to the existence of a trend in both variables
that makes the series move in the same or opposite direction. The simulation below shows this result
for two independent processes X and Y with the same drift parameter μ, but independent of
each other (i.e., Y is not a function of X). The histogram of the t statistics for the significance of β_1 in
Y_t = β_0 + β_1 X_t + ε_t is shown in the left plot, while the R² of the regression is shown on the right. The
distribution of the t test statistic has a significantly positive mean and would lead to an extremely large number
of rejections of the hypothesis that β_1 = 0, when indeed it is equal to zero. Also the distribution of the R²
shows that in the vast majority of these 1000 simulations we would find a moderate to large fit measure,
which would suggest a significant relationship between the two variables, although the truth is that there is
no relationship.


require(dyn)
set.seed(1234)

T     = 500   # length of the time series
B     = 1000  # number of simulations
mu    = 0.1   # value of the drift term
sigma = 1     # standard deviation of the error

tstat = matrix(NA, B, 1)  # object to store the t statistic of the slope
R2    = matrix(NA, B, 1)  # object to store the R^2 of each regression

for (b in 1:B)
{
  Y        <- ts(cumsum(rnorm(T, mean=mu, sd=sigma)))  # simulates the Y series
  X        <- ts(cumsum(rnorm(T, mean=mu, sd=sigma)))  # simulates the X series
  fit      <- dyn$lm(Y ~ X)                            # OLS estimation of Y on X
  tstat[b] <- summary(fit)$coef[2,3]                   # stores the t statistic of the slope
  R2[b]    <- summary(fit)$r.square                    # stores the R^2
}
# plotting
par(mfrow=c(1,2))
hist(tstat, breaks=50, freq=FALSE, main="")
box()
hist(R2, breaks=30, freq=FALSE, main="", xlim=c(0,1))
box()

3.9 Testing for non-stationarity


From an empirical point of view, our task is to establish if a series is stationary or non-stationary. In case there
is a trend/drift in the series (e.g., as for GDP, CPI and S&P 500 in the graphs above) then we need to understand


if the deviation from the trend is stationary or non-stationary. Typically, just by visually inspecting the
series it is possible to determine whether or not there is a drift in the series, and thus to narrow down the question
to the stationarity of the deviation from the deterministic trend. If a drift in the series is not apparent, then it
is likely that the series does not have a trend and thus the question is whether the series is an AR process or
a random walk without drift.
To test for stationarity we follow the usual approach of calculating a test statistic with a known distribution
under the null hypothesis, which allows us to decide whether we are confident about the stationarity of the series
or not. We do this by estimating the following processes by OLS:

Y_t = ρ Y_{t-1} + ε_t
Y_t = μ + ρ Y_{t-1} + ε_t

In the case of the first Equation we are assuming a mean-zero AR(1) model, which becomes a random walk
(without drift) in case ρ = 1, whilst in the second Equation we are estimating an AR(1) process which
becomes a random walk with drift in case ρ = 1. The null hypothesis we are interested in testing is H_0: ρ = 1
(non-stationarity) against the alternative H_1: ρ < 1 (stationarity). Rejection of the null hypothesis leads to
the conclusion that the series is stationary, while failure to reject is interpreted as evidence that the series is
non-stationary. These models are typically reformulated by subtracting Y_{t-1} from both the left and right side
of the previous Equations, which results in

ΔY_t = δ Y_{t-1} + ε_t
ΔY_t = μ + δ Y_{t-1} + ε_t

where ΔY_t = Y_t − Y_{t-1} and δ = ρ − 1. Testing the hypothesis that ρ = 1 is thus equivalent to testing δ = 0.
The test is referred to as the Dickey-Fuller (DF) test and is given by DF = δ̂/σ̂_δ̂, where δ̂ and σ̂_δ̂ are the OLS
estimate of δ and its standard error assuming homoskedasticity. Under the null hypothesis that the series
is non-stationary, the DF statistic is not distributed according to the Student t, since it requires running a
regression that involves non-stationary variables, which leads to the problems discussed earlier. Instead, it
follows a different distribution, with critical values that are tabulated and provided below.
The non-standard distribution of the DF test statistic can be investigated via a simulation study in R. The
code below performs a simulation in which we generate a random walk time series, estimate
the DF regression equation, and store the t-statistic of Y_{t-1}, which represents the DF test statistic. We repeat
these operations B times and then plot a histogram of the DF statistic together with the Student t distribution
with T-1 degrees of freedom (T represents the length of the time series set in the code).


require(dyn)
set.seed(1234)

T     = 100   # length of the time series
B     = 500   # number of simulations
mu    = 0.1   # value of the drift term
sigma = 1     # standard deviation of the error

DF = matrix(NA, B, 1)  # object to store the DF test stat

for (b in 1:B)
{
  Y     <- ts(cumsum(rnorm(T, mean=mu, sd=sigma)))  # simulates the series
  fit   <- dyn$lm(diff(Y) ~ lag(Y,-1))              # OLS estimation of the DF regression
  DF[b] <- summary(fit)$coef[2,3]                   # stores the DF test statistic
}
# plotting
hist(DF, breaks=50, xlim=c(-6,6), freq=FALSE, main="")
curve(dt(x, df=(T-1)), add=TRUE, col=2)
box()

The graph clearly shows that the distribution of the DF statistic does not follow the t distribution: it has
a negative mean and median (instead of 0), it is skewed rather than symmetric, and its empirical 5% quantile
is -2.849 instead of the theoretical value of -1.66. Since we are performing a one-sided test against the
alternative hypothesis H_1: δ < 0, using the one-sided 5% critical value from the t distribution would lead to
rejecting the null hypothesis of non-stationarity too often relative to the appropriate critical values derived
by Dickey and Fuller. For the simulation exercise above, the (asymptotic) critical value for the null of a
random walk with drift is -2.86 at the 5% significance level. The percentages of simulations for which we
reject the null based on this critical value and on that from the t distribution are:


sum(DF < -2.86) / B

[1] 0.048

sum(DF < -1.67) / B

[1] 0.376

This shows that using the critical value from the t distribution would lead to rejecting too often (37.6% of the
time) relative to the expected level of 5%. Instead, using the correct critical value the null is rejected 4.8%
of the time, which is quite close to the 5% significance level.
In practical implementations, it is typically advisable to include lags of ΔY_t to control for serial correlation
in the changes of the variable. This is called the Augmented Dickey-Fuller (ADF) test and requires estimating
the following regression model (for the case with a constant): ΔY_t = μ + δ Y_{t-1} + Σ_{j=1}^{p} γ_j ΔY_{t-j} + ε_t,
which consists of adding lags of the change of Y_t. The ADF test statistic is calculated as before by taking the
t-statistic of Y_{t-1}, that is, ADF = δ̂/σ̂_δ̂. Another variation of the test also includes a trend variable in
the regression model used to calculate the test statistic, that is,

ΔY_t = μ + δ Y_{t-1} + λ t + Σ_{j=1}^{p} γ_j ΔY_{t-j} + ε_t

The reason for including a deterministic trend in the ADF regression is that we want to be able to discriminate
between a deterministic trend model and a random walk model with drift. If we do not reject the null
hypothesis then we conclude that the time series follows a random walk with drift whilst in case of rejection
we conclude in favor of the deterministic trend model.
Once we calculate the DF or ADF test statistic we need to evaluate its statistical significance using the
appropriate critical values. As discussed earlier, these statistics have a special distribution and critical values
have been tabulated for the case with/without a constant and with/without a trend and for various sample
sizes. Below you can find the critical values obtained for various sample sizes for a model with constant and
with or without trend:
               Without Trend          With Trend
Sample Size      1%      5%           1%      5%
T = 25         -3.75   -3.00        -4.38   -3.60
T = 50         -3.58   -2.93        -4.15   -3.50
T = 100        -3.51   -2.89        -4.04   -3.45
T = 250        -3.46   -2.88        -3.99   -3.43
T = 500        -3.44   -2.87        -3.98   -3.42
T = ∞          -3.43   -2.86        -3.96   -3.41

The non-stationarity test is implemented in the urca package using the function ur.df(). Below is an


application to the logarithm of the real GDP:


require(urca)
adf <- ur.df(log(gdp), type="trend", lags=4) # type: "none", "drift", "trend"
summary(adf)

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-0.03076 -0.00466  0.00052  0.00494  0.03379

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.06e-02   8.45e-02    1.07     0.28
z.lag.1     -1.08e-02   1.09e-02   -0.99     0.32
tt           7.23e-05   8.81e-05    0.82     0.41
z.diff.lag1  3.31e-01   6.42e-02    5.16  5.1e-07 ***
z.diff.lag2  9.85e-02   6.75e-02    1.46     0.15
z.diff.lag3 -4.73e-02   6.69e-02   -0.71     0.48
z.diff.lag4 -4.37e-02   6.34e-02   -0.69     0.49
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.00852 on 245 degrees of freedom
Multiple R-squared: 0.156, Adjusted R-squared: 0.135
F-statistic: 7.53 on 6 and 245 DF, p-value: 2.04e-07

Value of test-statistic is: -0.988 12.1 2.27

Critical values for test statistics:
      1pct  5pct 10pct
tau3 -3.98 -3.42 -3.13
phi2  6.15  4.71  4.05
phi3  8.34  6.30  5.36

In this case the ADF test statistic is -0.99, which should be compared with the 5% critical value of -3.42: we do
not reject the null hypothesis that the series is non-stationary and follows a random walk with drift.


3.10 What to do when a time series is non-stationary


If the ADF test leads to the conclusion that a series is non-stationary then we need to understand whether the source
of non-stationarity is a trend-stationary model or a random walk model (with or without drift). In the first
case, the non-stationarity is solved by de-trending the series, which means that the series is regressed on a time
trend and the residuals of the regression are then modeled using time series models (e.g., AR(p)). However,
if we find that the series follows a random walk model, then the solution to the non-stationarity is to
take the difference of the series and thus analyze $\Delta Y_t = Y_t - Y_{t-1}$. The reason for this is that differencing the
series removes the trend, and this can be seen in the random walk with drift model $Y_t = \mu + Y_{t-1} + \epsilon_t$, where
taking $Y_{t-1}$ to the left-hand side results in $\Delta Y_t = \mu + \epsilon_t$. If we assume that $\epsilon_t$ is normally distributed with
mean 0 and variance $\sigma^2$, then $\Delta Y_t$ will also be normally distributed with mean $\mu$ and variance $\sigma^2$, which is
obviously stationary. A minimal sketch of the two remedies in R is given below.
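The sketch below, which assumes gdp is the same real GDP series used in the ur.df() example above, de-trends the log series and, alternatively, takes its first difference; it is only an illustration of the two remedies, not part of the original analysis.

y       <- log(gdp)
trend   <- 1:length(y)
detrend <- residuals(lm(y ~ trend))    # de-trending: keep the residuals from a regression on time
dy      <- diff(y)                     # differencing: analyze the period-to-period changes
par(mfrow=c(1,2))
plot(detrend, type="l", main="De-trended log GDP", xlab="", ylab="")
plot(dy, type="l", main="First difference of log GDP", xlab="", ylab="")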

4. Volatility Models
A stylized fact across many asset classes is that the standard deviation of returns, often referred to as volatility,
varies significantly over time. The graph below shows the time series of the daily returns of the S&P 500 from
the beginning of 1990 until the end of 2013. The mean daily return for the S&P 500 returns is 0.028% and its
standard deviation is 1.142%. This estimate of the standard deviation represents a long-run average between
periods of high and low volatility. In the Figure below we can see that returns are within the two standard
deviation confidence bands (dashed lines in the graph) for long periods of time. Occasionally, there are sudden
bursts of high volatility that last for several months before volatility decreases again. In other words, volatility
is time-varying in the sense that it alternates between regimes of low and high volatility. This fact should be
accounted for by any model of financial returns since volatility is a proxy for risk, which is an important
input to many financial decisions (e.g., from option pricing to risk management). Modeling volatility is thus
a particularly relevant task in financial econometrics.

The model for high-frequency (daily and intra-daily) returns that we will work with in this Chapter has
the following structure: $R_{t+1} = \mu_{t+1} + \sigma_{t+1}\epsilon_{t+1}$, where the asset return is decomposed into the following three
components:
Expected return: $\mu_{t+1}$ represents the expected return of the asset. At the daily and intra-daily frequency,
this component is typically assumed equal to zero. Alternatively, it could be assumed to follow an AR(p)
process, $\mu_{t+1} = \phi_0 + \phi_1 R_t + \cdots + \phi_p R_{t-p+1}$, or to be a function of other contemporaneous variables,
that is, $\mu_{t+1} = \beta_0 + \beta_1 X_t$.
Volatility: $\sigma_{t+1}$ is the standard deviation of the shock conditional on the information available at time t.
Unexpected shock: $\epsilon_{t+1}$ represents the shock occurring in period t+1. A simple assumption is that
it is normally distributed with mean zero and variance one, but more sophisticated distributional
assumptions can be introduced.
The aim of this Chapter is to discuss different models to estimate and forecast $\sigma_{t+1}$, starting with simple techniques
such as the Moving Average (MA) and the Exponential Moving Average (EMA), followed by a discussion of
more sophisticated time series models such as the Auto-Regressive Conditional Heteroskedasticity (ARCH)
model. ARCH models have been extended in several directions, but for the purpose of this Chapter we will
consider the two most important generalizations: GARCH (Generalized ARCH) and GJR-GARCH (Glosten,
Jagannathan and Runkle ARCH), which includes an asymmetric effect of positive/negative shocks on volatility.

4.1 Moving Average (MA) and Exponential Moving Average (EMA)


A simple approach to estimate the conditional variance is to average the squared returns over a recent window
of observations. Let's denote the window size by M (e.g., M = 25 days). The Moving Average (MA)
estimate of the variance in day t+1, denoted $\sigma^2_{t+1}$, is given by
$$\sigma^2_{t+1} = \frac{1}{M}\left(R_t^2 + R_{t-1}^2 + \cdots + R_{t-M+1}^2\right) = \frac{1}{M}\sum_{j=1}^{M} R_{t-j+1}^2$$
and the standard deviation is calculated as $\sigma_{t+1} = \sqrt{\sigma^2_{t+1}}$. The two extreme values of the window
are M = 1, which implies $\sigma^2_{t+1} = R_t^2$, and M = t, which leads to $\sigma^2_{t+1} = \hat{\sigma}^2$, where $\hat{\sigma}^2$ represents the
unconditional variance estimated in the full sample. Small values of M imply that the volatility estimate is
very responsive to the most recent squared returns, whilst for large values of M the estimate responds very little
to the latest returns. Another way to look at the role of the window size on the smoothness of the volatility
is to interpret the estimate as an average of the last M days, each carrying a weight of 100/M% (i.e., for M = 25 the
weight is 4%). When M increases, the weight given to each observation in the window becomes smaller, so
that each daily squared return (even when extreme) has a smaller impact on changing the volatility estimate.

The MA approach can be implemented in R using the rollmean() function provided in the package zoo, which
requires specifying the window size (M in the notation above). An example for the S&P 500 daily returns is
provided below:
require(zoo)
sigma25 <- rollmean(sp500daily^2, 25, align="right")
plot(abs(sp500daily), col="gray", xlab="", ylab="")
lines(sigma25^0.5, col=2, lwd=2)


The effect of increasing the window size from M = 25 (continuous line) to 100 (dashed line) is shown in the
graph below: the longer window smooths out the fluctuation of the MA(25) since each observation is given
a smaller weight. This implies that large (negative or positive) returns increase volatility less relative to MA
calculated on smaller windows.
require(zoo)
sigma100 <- rollmean(sp500daily^2, 100, align="right")
plot(sigma25^0.5, col=2, xlab="", ylab="")
lines(sigma100^0.5, lwd=2, col=4, lty=2)

One drawback of the MA approach is that a large daily return increases the estimate significantly when it enters
the window, and lowers it again when the observation drops out of the window. This is because the MA approach
distributes the weight across the last M observations, while returns older than M receive a zero weight. An
extension of the MA approach is to assign a smoothly decreasing weight to older returns instead of the
discrete jump of the weight from 1/M to 0 at the (M+1)th observation. This approach is called the Exponential
Moving Average (EMA) and it is calculated as follows:
$$\sigma^2_{t+1} = \lambda \sum_{j=1}^{\infty} (1-\lambda)^{j-1} R_{t-j+1}^2$$
where $\lambda$ is a smoothing parameter between zero and one. After some algebra, the expression above can be
rewritten as follows:
$$\sigma^2_{t+1} = (1-\lambda)\,\sigma^2_t + \lambda\, R_t^2$$
which shows that the conditional variance estimate in day t+1 is given by a weighted average of the previous
day's estimate and the squared return in day t, with weights equal to $1-\lambda$ and $\lambda$, respectively. A typical
value of $\lambda$ is 0.06, which means that day t has a 6% weight in determining that day's volatility, day t-1 has
weight (0.94 × 6)% = 5.64%, day t-2 has weight (0.94² × 6)% = 5.3016%, and in general day t-k has weight (0.94^k × 6)%.
The method is called Exponential MA because the weights decay exponentially; it became
popular in finance after it was proposed by J.P. Morgan to model and predict volatility for Value-at-Risk (VaR)
calculations. The recursion above is illustrated in the short sketch that follows.
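The sketch below implements the EMA recursion directly, assuming sp500daily is the daily return series used throughout the chapter and initializing the recursion at the sample variance (an arbitrary but common choice); it is only meant to illustrate the formula, since the TTR package used next does the same job.

require(zoo)
lambda <- 0.06
r      <- as.numeric(sp500daily)
s2     <- rep(NA, length(r))
s2[1]  <- var(r)                                    # starting value for the recursion
for (t in 1:(length(r)-1)) {
  s2[t+1] <- (1-lambda) * s2[t] + lambda * r[t]^2   # EMA update
}
emaByHand <- zoo(sqrt(s2), order.by=index(sp500daily))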
The typical value for M is 25 (one trading month) and for λ it is 0.06. The graph below compares the volatility
estimates from the MA(25) and EMA(0.06) methods. In this example we use the package TTR which provides
functions to calculate moving averages, both simple and of the exponential type:
require(TTR)
# SMA() function for Simple Moving Average; n = number of days
ma25 <- SMA(sp500daily^2, n=25)
# EMA() function for Exponential Moving Average; ratio = lambda
ema06 <- EMA(sp500daily^2, ratio=0.06)
plot(ma25^0.5, ylim=c(0, 6), ylab="", xlab="")
lines(ema06^0.5,col=2)


The picture shows daily volatility estimates for over 23 years and it is difficult (at this scale) to see large
differences between the two methods. However, by plotting a sub-period of time, some differences become
evident. Below we plot the period between the beginning of 2008 and the end of 2009. The biggest difference
between the two methods is that in periods of rapid change (from small/large to large/small returns) the EMA
line captures the change in volatility regime more smoothly than the simple MA. This is particularly
clear in the second part of 2008 and the beginning of 2009, with some marked differences between the two
lines.

The parameters of MA and EMA are usually chosen a priori rather than estimated from the data.
There are ways to estimate these parameters, although they are not popular in the financial literature
and typically do not provide great benefit for practical purposes. In the following Section we discuss a more
general volatility model which generalizes the EMA model and is typically estimated from the data.


4.2 Auto-Regressive Conditional Heteroskedasticity (ARCH) models
The ARCH model was proposed in the early 1980s to model the time-varying variance of the inflation rate.
Since then it has gained popularity in finance as a way to model the intermittent behavior of the volatility of
asset returns. The simplest version of the model assumes that the return is given by $R_{t+1} = \sigma_{t+1}\epsilon_{t+1}$, where the
conditional variance is a function of the previous period's squared return, that is, $\sigma^2_{t+1} = \omega + \alpha R_t^2$, where $\omega$
and $\alpha$ are parameters to be estimated. The AR part in the ARCH name relates to the fact that the variance of
returns is a function of the lagged (squared) returns. This specification is called the ARCH(1) model and can be
generalized to include p lags, which results in the ARCH(p) model: $\sigma^2_{t+1} = \omega + \alpha_1 R_t^2 + \cdots + \alpha_p R_{t-p+1}^2$. The
empirical evidence indicates that financial returns typically require p to be large. A parsimonious alternative to
the ARCH model is the Generalized ARCH (GARCH) model, which is characterized by the following equation
for the conditional variance: $\sigma^2_{t+1} = \omega + \alpha R_t^2 + \beta\sigma_t^2$, where the variance in day t+1 is affected both by the day-t
squared return and by the day-t conditional variance. This specification is parsimonious, relative
to an ARCH model with p large, since all past squared returns are implicitly included through $\sigma_t^2$, using only 3 parameters
instead of p + 1. The specification above represents a GARCH(1,1) model since it includes one lag of the
square return and the conditional variance. However, it can be easily generalized to the GARCH(p,q) case in
which p lags of the square return and q lags of the conditional variance are included. The empirical evidence
suggests that the GARCH(1,1) is typically the best model for several asset classes that is only in rare instances
outperformed by p and q different from 1. We can derive the mean of the conditional variance which, in the
case of the GARCH(1,1), is given by $\sigma^2 = \omega/(1-(\alpha+\beta))$, where by $\sigma^2$ we denote $E(\sigma^2_{t+1})$, which represents
the unconditional variance as opposed to $\sigma^2_{t+1}$, which denotes the conditional one. The difference is that $\sigma^2_{t+1}$ is
a forecast of volatility based on the information available today, while $\sigma^2$ represents an average over time
of these conditional forecasts. This result shows that a condition for the variance of returns to be finite is that
$\alpha + \beta < 1$. If this condition is not satisfied then the variance of the returns is infinite and volatility behaves
like a random walk, instead of being mean-reverting. To understand this, we can recall the discussion
of the AR(1), in which a coefficient (in absolute value) less than 1 guarantees that the time series is stationary,
and thus mean-reverting. We can think similarly about the case of the variance, where the condition that
$\alpha + \beta$ is less than 1 provides the ground for volatility to be stationary and mean-reverting. What does it
mean that variance (or volatility) is mean reverting? It means that volatility might be persistent, but oscillates
between periods in which it is higher and lower than its unconditional level. This interpretation is consistent
with the empirical observation that volatility switches between periods in which it is high and others in which
it is low. On the other hand, non-stationary volatility implies that periods of high volatility (i.e., higher than
average) are expected to persist in the long run rather than reverting back to the unconditional variance. The
same holds for period of low volatility which, in case of non-stationarity, is expected to last in the long run.
This distinction is practically relevant, in particular when the interest is to forecast future volatility and at
long horizons.
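As a quick numerical illustration of the unconditional variance formula, the sketch below computes the persistence and the implied long-run volatility of a GARCH(1,1); the parameter values are chosen to be close to the S&P 500 estimates reported later in this Section, not taken from an actual fit here.

omega <- 0.011; alpha <- 0.076; beta <- 0.915    # illustrative parameter values
persistence <- alpha + beta                      # must be < 1 for a finite unconditional variance
sigma2      <- omega / (1 - persistence)         # unconditional variance
sqrt(sigma2)                                     # unconditional daily volatility, roughly 1.1%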
The distinction between mean-reverting (i.e., stationary) and non-stationary volatility is an important
property that differentiates volatility models. In particular, the EMA model can be considered a special case of
the GARCH conditional variance when its parameters are constrained to the following values: $\omega = 0$, $\alpha = \lambda$
and $\beta = 1-\lambda$. This shows that EMA imposes the assumption that $\alpha + \beta = 1$, and thus that volatility is
non-stationary. Empirically, this hypothesis can be tested by assuming the null hypothesis $\alpha + \beta = 1$ versus
the one-sided alternative that $\alpha + \beta < 1$.
Another GARCH specification that has become popular was proposed by Glosten, Jagannathan and Runkle
(hence, GJR-GARCH), which assumes that the squared return has a different effect on volatility depending on
its sign. The conditional variance equation of this model is
$$\sigma^2_{t+1} = \omega + \alpha_1 R_t^2 + \gamma_1 R_t^2\, I(R_t < 0) + \beta\sigma_t^2$$
In this specification, when the return is positive its effect on the conditional variance is $\alpha_1$, and when it is
negative the effect is $\alpha_1 + \gamma_1$. Testing the hypothesis that $\gamma_1 = 0$ thus provides a test of the asymmetric effect
of shocks on volatility. Empirically, for many assets the estimation results show that $\alpha_1$ is estimated close to
zero and insignificant, while $\gamma_1$ is found positive and significant. The evidence thus indicates that negative
shocks lead to more uncertainty and an increase in the volatility of asset returns, while positive shocks do not
have a relevant effect. A sketch of the GJR variance recursion is given below.
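The function below is a minimal sketch of the GJR-GARCH(1,1) recursion written directly from the equation above; the parameter values passed in the example are purely illustrative (they mimic the estimates reported later for the S&P 500), and the recursion is started at the sample variance.

gjr_sigma2 <- function(r, omega, alpha1, gamma1, beta) {
  s2    <- rep(NA, length(r))
  s2[1] <- var(r)                                  # starting value for the recursion
  for (t in 1:(length(r)-1)) {
    s2[t+1] <- omega + alpha1 * r[t]^2 +
               gamma1 * r[t]^2 * (r[t] < 0) +      # extra term only after negative returns
               beta * s2[t]
  }
  s2
}
s2gjr <- gjr_sigma2(as.numeric(sp500daily), omega=0.015, alpha1=0, gamma1=0.137, beta=0.916)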

Estimation of GARCH models


ARCH/GARCH models cannot be estimated using OLS because the model is nonlinear in the parameters.
Furthermore, the volatility is not directly observable, which rules out OLS estimation of the
conditional variance model, as would be possible if the dependent variable were observable. However, recent
advances in financial econometrics that will be discussed in a later Section have partly overcome this difficulty.
The estimation of GARCH models is thus performed using an alternative (to OLS) estimation technique called
Maximum Likelihood (ML). The ML estimation method represents a general estimation principle that can
be applied to a large set of models, not only to volatility models.
It might be useful to first discuss the ML approach to estimation in comparison with the familiar OLS one.
The OLS approach is to choose the parameter values that minimize the sum of squared residuals, since this sum
measures how poorly the model explains the data for a certain set of parameter values. Instead, the
approach of ML is to choose the parameter values that maximize the likelihood, or probability, that the data
were generated by the model with that set of parameter values. For the case of a simple AR model with
homoskedastic errors it can be shown that the OLS and ML estimators are equivalent. A difference between
OLS and ML is that the former provides an analytical formula for the estimation of linear models, whilst the ML
estimator is the result of a numerical optimization.
We assume that volatility models are of the general form discussed earlier, i.e., $R_{t+1} = \mu_{t+1} + \sigma_{t+1}\epsilon_{t+1}$.
Based on the distributional assumption introduced earlier, the standardized residuals $(R_{t+1} - \mu_{t+1})/\sigma_{t+1}$
should be normally distributed with mean 0 and variance 1. Bear in mind that both $\mu_{t+1}$ and
$\sigma^2_{t+1}$ depend on parameters, which we collect in the vectors $\theta_\mu$ and $\theta_\sigma$. For example, for an AR(1)-GARCH(1,1) model the
parameters of the mean are $\theta_\mu = (\phi_0, \phi_1)$ and the parameters of the conditional variance are $\theta_\sigma = (\omega, \alpha, \beta)$.
Given the assumption of normality, the density or likelihood function of $R_{t+1}$ is
$$f(R_{t+1} \mid \theta_\mu, \theta_\sigma) = \frac{1}{\sqrt{2\pi\sigma^2_{t+1}(\theta_\sigma)}}\exp\left[-\frac{1}{2}\left(\frac{R_{t+1} - \mu_{t+1}(\theta_\mu)}{\sigma_{t+1}(\theta_\sigma)}\right)^2\right]$$
which represents the normal density evaluated at $R_{t+1}$. We write the conditional mean and variance as
$\mu_{t+1}(\theta_\mu)$ and $\sigma^2_{t+1}(\theta_\sigma)$ to make explicit their dependence on the parameters over which the likelihood function
will be maximized. Since we have T returns, we are interested in the joint likelihood of the observed returns
and, denoting by p the largest lag of the returns used in the conditional mean and variance, we can define the
(conditional) likelihood function $L(\theta_\mu, \theta_\sigma) = f(R_{p+1}, \ldots, R_T \mid \theta_\mu, \theta_\sigma, R_1, \ldots, R_p)$ as


$$L(\theta_\mu, \theta_\sigma) = \prod_{t=p}^{T-1} f(R_{t+1} \mid \theta_\mu, \theta_\sigma) = \prod_{t=p}^{T-1} \frac{1}{\sqrt{2\pi\sigma^2_{t+1}(\theta_\sigma)}}\exp\left[-\frac{1}{2}\left(\frac{R_{t+1} - \mu_{t+1}(\theta_\mu)}{\sigma_{t+1}(\theta_\sigma)}\right)^2\right]$$

The ML estimates $\hat\theta_\mu$ and $\hat\theta_\sigma$ are thus obtained by maximizing the likelihood function $L(\theta_\mu, \theta_\sigma)$. It is
convenient to log-transform the likelihood function to simplify the task of maximizing it. We
denote the log-likelihood by $l(\theta_\mu, \theta_\sigma)$ and it is given by
$$l(\theta_\mu, \theta_\sigma) = \ln L(\theta_\mu, \theta_\sigma) = -\frac{1}{2}\sum_{t=p}^{T-1}\left[\ln(2\pi) + \ln\sigma^2_{t+1}(\theta_\sigma) + \left(\frac{R_{t+1} - \mu_{t+1}(\theta_\mu)}{\sigma_{t+1}(\theta_\sigma)}\right)^2\right]$$

Since the first term, $\ln(2\pi)$, does not depend on any parameter, it can be dropped from the function. The
estimates $\hat\theta_\mu$ and $\hat\theta_\sigma$ are then obtained by maximizing
$$l(\theta_\mu, \theta_\sigma) = -\frac{1}{2}\sum_{t=p}^{T-1}\left[\ln\sigma^2_{t+1}(\theta_\sigma) + \left(\frac{R_{t+1} - \mu_{t+1}(\theta_\mu)}{\sigma_{t+1}(\theta_\sigma)}\right)^2\right]$$

The maximization of the likelihood or log-likelihood is performed numerically, which means that we use
algorithms to find the maximum of this function. The problem with this approach is that, in some situations,
the likelihood function is not well-behaved since it is characterized by local maxima or it is flat, which means
that it is relatively constant for a large set of parameter values. The numerical search has to be started at some
initial values and, in the difficult cases just mentioned, the choice of these values is extremely important to
achieve the global maximum of the function. The choice of valid starting values for the parameters can be
achieved by a small-scale grid search over the space of possible values of the parameters.
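To make the discussion concrete, the sketch below writes down the (negative) Gaussian log-likelihood of a zero-mean GARCH(1,1) and maximizes it numerically with optim(); this is only an illustration of the principle, since the packages discussed in the next Section use better optimizers and also deliver standard errors.

garch11.negloglik <- function(par, r) {
  omega <- par[1]; alpha <- par[2]; beta <- par[3]
  # penalize parameter values outside the positivity/stationarity region
  if (omega <= 0 || alpha < 0 || beta < 0 || alpha + beta >= 1) return(1e10)
  s2 <- rep(var(r), length(r))                     # start the recursion at the sample variance
  for (t in 2:length(r)) {
    s2[t] <- omega + alpha * r[t-1]^2 + beta * s2[t-1]
  }
  0.5 * sum(log(2*pi) + log(s2) + r^2 / s2)        # minus the log-likelihood
}
r   <- as.numeric(sp500daily)
fit <- optim(c(0.01, 0.05, 0.90), garch11.negloglik, r=r)
fit$par                                            # estimates of omega, alpha, beta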
In the case of volatility models the likelihood function is usually well-behaved and achieves a maximum
quite rapidly. In the following Section we discuss some R packages which implement GARCH estimation and
forecasting.

Inference for GARCH models

GARCH in R
There are several packages that provide functions to estimate models from the GARCH family. One of the
earliest is the garch() function in the tseries package, which is quite limited in the type
of models it can estimate. Below is an example of the estimation of a GARCH(1,1) model on the daily S&P 500
returns:


require(tseries)
fit <- garch(ts(sp500daily), order=c(1,1), trace=FALSE)
round(summary(fit)$coef, 3)

     Estimate  Std. Error  t value  Pr(>|t|)
a0      0.011       0.001    8.188         0
a1      0.076       0.004   17.127         0
b1      0.915       0.005  180.214         0

Where a0, a1 and b1 represent the parameters $\omega$, $\alpha$, and $\beta$, respectively. The output provides standard errors for
the parameter estimates as well as t-statistics and p-values for the null hypothesis that the coefficients are equal
to zero. The estimate of $\alpha$ is 0.076 and of $\beta$ is 0.915, with their sum equal to 0.991, which is very close to 1
and suggests that volatility is highly persistent and close to non-stationary.
More flexible functions for GARCH estimation are provided by the package fGarch, which allows flexibility in
modeling the conditional mean $\mu_{t+1}$ and the conditional variance $\sigma^2_{t+1}$ with time series models. The function that
performs the estimation is called garchFit(). The example below shows the application of the garchFit() function
to the daily returns of the S&P 500 index. The first example estimates a GARCH(1,1) without the intercept in
the conditional mean (i.e., $\mu_{t+1} = 0$), so the results are comparable to the earlier ones for the garch()
function; we then add the intercept to the model ($\mu_{t+1} = \mu$), and finally we consider an AR(1)-GARCH(1,1)
model. Notice that to specify the AR(1) for the conditional mean we use the function arma(p,q), which is
more general than those used to estimate AR(p) models.
require(fGarch)
fit <- garchFit(~garch(1,1), data=sp500daily, include.mean=FALSE, trace=FALSE)
round(fit@fit$matcoef, 3)

         Estimate  Std. Error  t value  Pr(>|t|)
omega       0.011       0.002    5.624         0
alpha1      0.076       0.007   11.375         0
beta1       0.915       0.007  125.069         0

While the point estimates are equal to those obtained earlier for the garch function, the standard errors
are different due to differences between analytical and numerical standard errors. In the example below we
consider an intercept in the conditional mean which leads to small changes in the coefficient estimates of the
volatility parameters:
fit <- garchFit(~garch(1,1), data=sp500daily, trace=FALSE)
round(fit@fit$matcoef, 3)


         Estimate  Std. Error  t value  Pr(>|t|)
mu          0.053       0.010    5.321         0
omega       0.012       0.002    5.762         0
alpha1      0.079       0.007   11.382         0
beta1       0.912       0.008  120.749         0

The results show that the mean is estimated equal to 0.053% and it is statistically significant even at 1%.
Hence, for the daily S&P 500 returns the assumption of a zero expected return is rejected. It might also be
interesting to evaluate the need to introduce some dependence in the conditional mean, for example, by
assuming an AR(1) model. The command arma(1,0) + garch(1,1) in the garchFit() function estimates an
AR(1) model with GARCH(1,1) conditional variance:
fit <- garchFit(~ arma(1,0) + garch(1,1), data=sp500daily, trace=FALSE)
round(fit@fit$matcoef, 3)

         Estimate  Std. Error  t value  Pr(>|t|)
mu          0.054       0.010    5.371     0.000
ar1        -0.013       0.013   -0.959     0.337
omega       0.011       0.002    5.760     0.000
alpha1      0.078       0.007   11.375     0.000
beta1       0.912       0.008  121.046     0.000

The estimate of the AR(1) coefficient is -0.013 and it is not statistically significant at the 10% level, which shows
that including dependence in the conditional mean of daily financial returns is of little relevance.
Based on the GARCH model estimation, we can then obtain the conditional variance and the conditional
standard deviation. The conditional standard deviation is extracted by appending @sigma.t to the garchFit
object:
sigma <- fit@sigma.t
# define sigma as a zoo object since it is numeric
sigma <- zoo(sigma, order.by=index(sp500daily))
plot(sigma, type="l", main="Standard deviation", xlab="",ylab="")


The graph shows the significant variation over time of the standard deviation that alternates between periods
of low and high volatility, in addition to sudden increases in volatility due to the occurrence of large returns.
It is also interesting to compare the fitted standard deviation from the GARCH model with the ones obtained
from the MA and EMA methods. As the plot below shows, the three estimates track each other very closely.
The correlation between the MA and GARCH conditional standard deviation is 0.98 and between EMA and
GARCH is 0.988 and, to a certain extent, they can be considered very good substitutes for each other (in
particular at short horizons).

Another quantity that we need to analyze to evaluate the goodness of the GARCH model is the residuals.
Appending @residuals to the garchFit estimation object we can extract the residuals of the GARCH
model, which represent an estimate of $\sigma_{t+1}\epsilon_{t+1}$. The plot below shows that the residuals maintain most of the
characteristics of the raw returns, in particular the clusters of volatility. This is because these residuals have
been obtained from the last fitted model, in which $\mu_{t+1} = 0.054 + (-0.013)\,R_t$, and the residuals are thus
given by $R_{t+1} - \mu_{t+1}$. The contribution of the intercept is to demean the return series, while the small
coefficient on the lagged return leads to residuals that are very close to the returns.
res <- fit@residuals
res <- zoo(res, order.by=index(sp500daily))
par(mfrow=c(1,2))
plot(res, type="l", main="Unstandardized Residuals", xlab="", ylab="", cex.main=0.8)
plot(res/sigma, type="l", main="Standardized Residuals", xlab="", ylab="", cex.main=0.8)

It is more informative to consider the standardized residuals, that is, $(R_{t+1}-\mu_{t+1})/\sigma_{t+1}$, which should satisfy
the properties of an i.i.d. sequence, that is: 1) normally distributed (given our assumptions), and 2) no serial
correlation in the residuals and squared residuals. First, we plot the standardized residuals, which do not seem
to show any sign of heteroskedasticity (e.g., volatility clustering) since it has been taken into account by $\sigma_{t+1}$.
However, we notice that there are three large negative standardized residuals, with a magnitude of about
minus 6, which are quite unlikely to happen under a normal distribution.
To evaluate the properties of the standardized residuals, we start from their distributional
properties. One way to do this is to assess their normality, for example using a QQ plot, as in the sketch below.
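A minimal way to produce such a QQ plot, assuming the res and sigma objects created above are still available:

stdres <- as.numeric(res / sigma)      # standardized residuals from the fitted model
qqnorm(stdres, main="QQ plot of standardized residuals", cex.main=0.8)
qqline(stdres, col=2)                  # reference line under normality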


It is clear from the QQ plot that the left tail of the standardized residuals distribution disagrees
with normality: there are too many large negative returns (relative to what would be expected under the
normal) to conclude that the residuals are normally distributed. We can further investigate this issue by
calculating the skewness, equal to -0.255, and the excess kurtosis, equal to 8.747. We can also consider the
auto-correlation of the residuals and squared residuals to assess whether there is neglected dependence in the conditional
mean and variance:

Overall, there is weak evidence of auto-correlation in the standardized residuals and in their squares, so that
the GARCH(1,1) model seems to be well specified to model the daily returns of the S&P 500.
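This informal reading of the ACF can be complemented with a Ljung-Box test on the standardized residuals and their squares; the sketch below uses 10 lags, an arbitrary choice made only for illustration.

stdres <- as.numeric(res / sigma)                  # standardized residuals, as above
Box.test(stdres,   lag=10, type="Ljung-Box")       # H0: no serial correlation in the residuals
Box.test(stdres^2, lag=10, type="Ljung-Box")       # H0: no serial correlation in the squared residuals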
Another package that provides functions to estimate a wide range of GARCH models is the rugarch package.
This package requires first specifying the functional form of the conditional mean and variance using the
function ugarchspec() and then proceeding with the estimation using the function ugarchfit(). Below is an
example for an AR(1)-GARCH(1,1) model estimated on the daily S&P 500 returns:


require(rugarch)
spec = ugarchspec(variance.model=list(model="sGARCH",garchOrder=c(1,1)),
mean.model=list(armaOrder=c(1,0)))
fitgarch = ugarchfit(spec = spec, data = sp500daily)

         Estimate  t value
mu          0.053    5.380
ar1        -0.013   -0.959
omega       0.011    5.459
alpha1      0.078   10.763
beta1       0.912  113.059

The estimation results are the same as those obtained for the fGarch package and more information about the
estimation results can be obtained using command show(fitgarch) as shown below:
*---------------------------------*
*          GARCH Model Fit        *
*---------------------------------*

Conditional Variance Dynamics
-----------------------------------
GARCH Model  : sGARCH(1,1)
Mean Model   : ARFIMA(1,0,0)
Distribution : norm

Optimal Parameters
------------------------------------
        Estimate  Std. Error    t value Pr(>|t|)
mu      0.053207    0.009889    5.38046  0.00000
ar1    -0.012898    0.013449   -0.95902  0.33755
omega   0.011496    0.002106    5.45930  0.00000
alpha1  0.078496    0.007293   10.76270  0.00000
beta1   0.912106    0.008068  113.05857  0.00000

Robust Standard Errors:
        Estimate  Std. Error  t value Pr(>|t|)
mu      0.053207    0.009120   5.8344 0.000000
ar1    -0.012898    0.012249  -1.0530 0.292362
omega   0.011496    0.003234   3.5553 0.000378
alpha1  0.078496    0.012923   6.0740 0.000000
beta1   0.912106    0.013591  67.1116 0.000000

LogLikelihood : -8456.918

Information Criteria
------------------------------------
Akaike       2.6863
Bayes        2.6917
Shibata      2.6863
Hannan-Quinn 2.6882

Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
                        statistic p-value
Lag[1]                    0.03057 0.86119
Lag[2*(p+q)+(p+q)-1][2]   0.38666 0.98534
Lag[4*(p+q)+(p+q)-1][5]   5.08162 0.09963
d.o.f=1
H0 : No serial correlation

Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
                        statistic  p-value
Lag[1]                       3.89 0.048562
Lag[2*(p+q)+(p+q)-1][5]     11.83 0.003082
Lag[4*(p+q)+(p+q)-1][9]     13.13 0.009965
d.o.f=2

Weighted ARCH LM Tests
------------------------------------
            Statistic Shape Scale P-Value
ARCH Lag[3]    0.0788 0.500 2.000  0.7789
ARCH Lag[5]    0.5472 1.440 1.667  0.8696
ARCH Lag[7]    0.6365 2.315 1.543  0.9644

Nyblom stability test
------------------------------------
Joint Statistic: 2.782
Individual Statistics:
mu     0.07596
ar1    1.59443
omega  0.20362
alpha1 0.23997
beta1  0.17185

Asymptotic Critical Values (10% 5% 1%)
Joint Statistic:      1.28 1.47 1.88
Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
------------------------------------
                   t-value      prob sig
Sign Bias           2.5578 1.056e-02  **
Negative Sign Bias  0.4089 6.826e-01
Positive Sign Bias  3.1726 1.518e-03 ***
Joint Effect       38.0232 2.795e-08 ***

Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
  group statistic p-value(g-1)
1    20     169.6    3.462e-26
2    30     186.2    7.020e-25
3    40     214.2    4.781e-26
4    50     234.6    4.826e-26

Elapsed time : 0.6000862

The estimation of the GJR-GARCH model is quite straightforward in this package and requires specifying the
option model='gjrGARCH' in ugarchspec(), in addition to selecting the orders for the conditional mean and
variance, as shown below:
require(rugarch)
spec = ugarchspec(variance.model=list(model="gjrGARCH",garchOrder=c(1,1)),
mean.model=list(armaOrder=c(1,0)))
fitgjr = ugarchfit(spec = spec, data = sp500daily)

         Estimate  t value
mu          0.025    2.546
ar1        -0.003   -0.211
omega       0.015    6.803
alpha1      0.000    0.001
beta1       0.916  111.416
gamma1      0.137   11.003

The results for the S&P 500 confirm the earlier discussion that positive returns have a negligible effect in
increasing volatility ($\hat\alpha_1 = 0$) while negative returns have a very large and significant effect ($\hat\gamma_1 = 0.137$). The
plot below on the left compares the time series of the volatility estimates for GARCH and GJR, and the plot on
the right-hand side shows the difference between the two estimates. This graph shows clearly that the
difference also has clusters of volatility, which are due to large negative returns that increase $\sigma_{t+1}$ significantly more
for GJR than for GARCH, as opposed to positive returns that have no effect on volatility for GJR.


The selection of the best performing volatility model can be done using the AIC selection criterion, similarly
to the selection of the optimal order p for AR(p) models. The package rugarch provides the function
infocriteria() that calculates AIC and several other selection criteria. These criteria are different in the
amount of penalization that they involve for adding more parameters (AR(1)-GJR has one parameter more
than AR(1)-GARCH). For all criteria, the best model is the one that provides the smallest value. In this case
the GJR specification clearly outperforms the basic GARCH(1,1) model for all criteria.
ciao <- cbind(infocriteria(fitgarch), infocriteria(fitgjr))
colnames(ciao) <- c("GARCH","GJR")
ciao

                GARCH      GJR
Akaike       2.686323 2.654422
Bayes        2.691679 2.660849
Shibata      2.686322 2.654421
Hannan-Quinn 2.688179 2.656649

The function ugarchforecast() allows computing the out-of-sample forecasts of a model for n.ahead periods.
The plot below shows the forecasts made on 2014-12-31, when the volatility estimate $\sigma_{t+1}$ was 0.893 for GARCH
and 0.782 for GJR. Both models forecast an increase in volatility since volatility is mean-reverting in
these models (and at the moment the forecast was made volatility was below its long-run level $\sigma$).
garchforecast <- ugarchforecast(fitgarch, n.ahead=250)
gjrforecast   <- ugarchforecast(fitgjr, n.ahead=250)

     2014-12-30 19:00:00  2014-12-30 19:00:00
T+1                0.913                0.854
T+2                0.915                0.856
T+3                0.917                0.859

plot(sigma(garchforecast), type="l", xlab="",ylab="", ylim=c(ymin, ymax))


lines(sigma(gjrforecast), col=2)

Finally, we can compare the GARCH and GJR specifications based on the effect of a shock ($\epsilon_t$) on the
conditional variance ($\sigma^2_{t+1}$). The left plot refers to the GARCH(1,1) model and clearly shows that positive and
negative shocks (of the same magnitude) increase the conditional variance by the same amount. However, the
news impact curve for the GJR model clearly shows the asymmetric effect of shocks, since there is no effect
when the shock is positive but a large effect when it is negative.
newsgarch <- newsimpact(fitgarch)
newsgjr   <- newsimpact(fitgjr)
plot(newsgarch$zx, newsgarch$zy, type="l", xlab=newsgarch$xexpr,
     ylab=newsgarch$yexpr, lwd=2, main="GARCH", cex.main=0.8)
abline(v=0, lty=2)


5. Measuring Financial Risk


The events of the early 1990s motivated the financial industry to develop methods to measure the potential
losses in portfolio values due to adverse market movements. Measuring risk is a necessary condition for managing
risk: first we need to be able to quantify the amount of risk the institution is facing, and then elaborate a
strategy to control the effect of these potential losses. The aim of this chapter is to discuss the application of
some methods that have been proposed to measure financial risk, in particular market risk, which represents
the losses deriving from adverse movements of equity or currency markets, interest rates, etc.

5.1 Value-at-Risk (VaR)


VaR was introduced in the mid-1990s by the investment bank J.P. Morgan as an approach to measure the
potential portfolio loss that an institution might face if an unlikely adverse event occurred at a certain time
horizon. Let's define the profit/loss of a financial institution in day t+1 by $R_{t+1} = 100 \cdot \ln(W_{t+1}/W_t)$,
where $W_{t+1}$ is the portfolio value in day t+1. Then Value-at-Risk (VaR) at the $100(1-\alpha)\%$ level is defined as
$$P(R_{t+1} \le VaR^{1-\alpha}_{t+1}) = \alpha$$
where the typical values of $\alpha$ are 0.01 and 0.05. In practice, $VaR_{t+1}$ is calculated every day and for a horizon
of 10 days (2 trading weeks). If $VaR^{1-\alpha}_{t+1}$ is expressed as a percentage return, it can easily be transformed into
dollars by multiplying the portfolio value in day t (denoted by $W_t$) by the expected loss, that is,
$\$VaR^{1-\alpha}_{t+1} = W_t\,(\exp(VaR^{1-\alpha}_{t+1}/100) - 1)$. From a statistical point of view, 99% VaR represents the 1% quantile, that is,
the value such that there is only a 1% probability that the random variable takes a value smaller than or equal to
it. The graphs below show the Probability Density Function (PDF) and the Cumulative Distribution
Function (CDF). Risk calculation is concerned with the left tail of the return distribution, since those are the rare
events that have a large and negative effect on the portfolios of financial institutions. This is the reason why
the profession has devoted a lot of energy to making sure that the left tail, rather than the complete distribution,
is appropriately specified, since a poor model for the left tail implies poor risk estimates (poor in a sense that
will become clear in the backtesting section).
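As a small illustration of the conversion formula, the sketch below turns a 99% one-day percentage VaR into a dollar amount for a hypothetical portfolio value; both numbers are assumptions made only for this example (the -2.629% figure anticipates the estimate obtained in the next subsection).

W_t     <- 100e6                          # hypothetical portfolio value in day t (dollars)
VaR_pct <- -2.629                         # 99% one-day VaR in percentage (log) return terms
VaR_usd <- W_t * (exp(VaR_pct/100) - 1)   # dollar VaR
VaR_usd                                   # roughly -2.6 million dollars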


VaR assuming normality


Calculating VaR requires making an assumption about the profit/loss distribution. The simplest and most
familiar assumption we can introduce is that $R_{t+1} \sim N(\mu, \sigma^2)$, where $\mu$ represents the mean of the
distribution and $\sigma^2$ its variance. Another way to state this assumption is that the profit/loss follows
the model $R_{t+1} = \mu + \sigma\epsilon_{t+1}$, with $\epsilon_{t+1} \sim N(0,1)$. The quantiles for this model are given by $\mu + \sigma z_\alpha$, where
$\alpha$ is the probability level and $z_\alpha$ the $\alpha$-quantile of the standard normal distribution. The typical levels of $\alpha$
for VaR calculations are 1 and 5%, and the corresponding $z_\alpha$ are -2.33 and -1.64, respectively. Hence, 99% VaR
under normality is obtained as follows:
$$VaR^{0.99}_{t+1} = \mu - 2.33\,\sigma$$
As an example, assume that an institution is holding a portfolio that replicates the S&P 500 Index and that
it wants to calculate the 99% VaR for this position. If we assume that returns are normally distributed, then
to calculate $VaR^{0.99}_{t+1}$ we need to estimate the expected daily return of the portfolio (i.e., $\mu$) and its expected
volatility (i.e., $\sigma$). Let's assume that we believe the distribution is approximately constant over time, so that
we can estimate the mean and standard deviation of the returns over a long period of time. In the illustration
below we use the S&P 500 time series from the volatility chapter, which consists of daily returns
from 1990 to 2014 (6300 observations).
mu    = mean(sp500daily)
sigma = sd(sp500daily)
var   = mu + qnorm(0.01) * sigma

[1] -2.629

The 99% VaR is -2.629% and represents the maximum loss from holding the S&P 500 that is expected for the
following day with 99% probability. If we had used a shorter estimation window of one year (252 observations),
the VaR estimate would have been -1.572%. The difference between the two VaR estimates is quite
remarkable, since we only changed the size of the estimation window. The standard deviation declines from

1.142% in the full sample to 0.707% in the shorter sample, whilst the mean changes from 0.028% to 0.073%. As
discussed in the volatility modeling Chapter, it is extremely important to account for time variation in the
distribution of financial returns if the interest is to estimate VaR at short horizons (e.g., a few days ahead).
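A minimal sketch of the one-year calculation, assuming the last 252 observations of sp500daily proxy the most recent year (the exact value depends on where the sample ends):

rShort   <- as.numeric(tail(sp500daily, 252))        # most recent year of daily returns
varShort <- mean(rShort) + qnorm(0.01) * sd(rShort)  # 99% VaR on the short window
varShort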

Time-varying VaR
So far we assumed that the mean and standard deviation of the return distribution are constant and represent
the long-run distribution of the variable. However, this might not be the best way to predict the distribution
of the profit/loss at very short horizons (e.g., 1 to 10 days ahead) if the return volatility changes over time. In
particular, in the volatility chapter we discussed the evidence that the volatility of financial returns changes
over time and introduced models to account for this behavior. We can model the conditional distribution of
the returns in day t+1 by assuming that both $\mu_{t+1}$ and $\sigma_{t+1}$ are time-varying conditional on the information
available in day t. Another decision that we need to make to specify the model is the distribution of the errors.
We can assume, as above, that the errors are normally distributed, so that 99% VaR is calculated as:
$$VaR^{0.99}_{t+1} = \mu_{t+1} - 2.33\,\sigma_{t+1}$$
where the expected return $\mu_{t+1}$ can be either constant (i.e., $\mu_{t+1} = \mu$) or an AR(1) process ($\mu_{t+1} = \phi_0 + \phi_1 R_t$), and the
conditional variance can be modeled as MA, EMA, or with a GARCH-type model. In the
example below, I assume that the conditional mean is constant (and equal to the sample mean) and model the
conditional variance of the demeaned returns as an EMA with parameter $\lambda = 0.06$:
require(TTR)
mu       <- mean(sp500daily)
sigmaEMA <- EMA((sp500daily-mu)^2, ratio=0.06)^0.5
var      <- mu + qnorm(0.01) * sigmaEMA

The VaR time series inherits the time variation in volatility, which alternates between calm periods of low
volatility and risk, and other periods of increased uncertainty and thus the possibility of large losses. For this
example, we find that in 2.207% of the 6299 days the return was smaller than the VaR. Since we calculated
VaR at 99% we expected to experience only 1% of days with violations.

Expected Shortfall (ES)


VaR represents the maximum (minimum) loss that is expected with, e.g., 99% (1%) probability. However, it can
be criticized on the grounds that it does not convey information on the potential loss that is expected if
an extreme event (with probability of 1% or less) does occur. For example, a VaR of -5.52% provides no information
on how large the portfolio loss is expected to be if the portfolio return happens to be smaller than VaR. That
is, how large do we expect the loss to be in case VaR is violated? A risk measure that quantifies this potential
loss is Expected Shortfall (ES), which is defined as $ES^{1-\alpha}_{t+1} = E(R_{t+1} \mid R_{t+1} \le VaR^{1-\alpha}_{t+1})$, that is, the
expected portfolio return conditional on being in a day in which the return is smaller than VaR. This risk
measure focuses attention on the left tail of the distribution and is highly dependent on the shape of the
distribution in that area, while it neglects all other parts of the distribution.
An analytical formula for ES is available if we assume that returns are normally distributed. In particular, if
$R_{t+1} = \sigma_{t+1}\epsilon_{t+1}$ with $\epsilon_{t+1} \sim N(0,1)$, then VaR is calculated as $VaR^{1-\alpha}_{t+1} = z_\alpha\,\sigma_{t+1}$. The conditioning event
is that the return in the following day is smaller than VaR, and the probability of this event is $\alpha$,
e.g. 0.01. We then need to calculate the expected value of $R_{t+1}$ over the interval from minus infinity to $VaR^{1-\alpha}_{t+1}$,
which corresponds to a truncated normal distribution with density function given by
$$f(R_{t+1} \mid R_{t+1} \le VaR^{1-\alpha}_{t+1}) = \phi(R_{t+1}/\sigma_{t+1}) \,/\, \Phi(VaR^{1-\alpha}_{t+1}/\sigma_{t+1})$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ represent the PDF and the CDF of the normal distribution (i.e., $\Phi(VaR^{1-\alpha}_{t+1}/\sigma_{t+1}) = \Phi(z_\alpha) = \alpha$). We can thus express ES as
$$ES^{1-\alpha}_{t+1} = -\sigma_{t+1}\,\frac{\phi(z_\alpha)}{\alpha}$$
where $z_\alpha$ is equal to -2.33 and -1.64 for $\alpha$ equal to 0.01 and 0.05, respectively. If we are calculating VaR at 99%,
so that $\alpha$ is equal to 0.01, then ES is equal to
$$ES^{0.99}_{t+1} = -\sigma_{t+1}\,\frac{\phi(-2.33)}{0.01} = -2.64\,\sigma_{t+1}$$
where the value 2.64 can be obtained in R by typing the command (2*pi)^(-0.5) * exp(-(2.33^2)/2) /
0.01 or using the function dnorm(-2.33) / 0.01. If $\alpha$ = 0.05, then the constant used to calculate ES is -2.08 instead
of -1.64 for VaR. Hence, ES leads to more conservative risk estimates, since the expected loss in a day in which
VaR is exceeded is always larger than VaR. We can plot the difference between VaR and ES as a function of
the probability level $\alpha$ in the following graph:


sigma = 1
alpha = seq(0.001, 0.05, by=0.001)
ES    = - dnorm(qnorm(alpha)) / alpha * sigma
VaR   = qnorm(alpha) * sigma

√K rule
The Basel Accords require VaR to be calculated at a horizon of 10 days and for a risk level of 99%. In addition,
the Accords allow financial institutions to scale up the 1-day VaR to the 10-day horizon by multiplying it
by √10. Why √10? Under what conditions is the VaR for the cumulative return over 10 days, denoted by
$VaR_{t+1:t+10}$, equal to $\sqrt{10}\,VaR_{t+1}$?
In day t we are interested in calculating the risk of holding the portfolio over a horizon of K days, that is,
assuming that we can liquidate the portfolio only on the Kth day. Regulators require banks to use K = 10,
which corresponds to two trading weeks. To calculate the risk of holding the portfolio over the next K days we need
to obtain the distribution of the sum of K daily returns, or cumulative return, denoted by $R_{t+1:t+K}$, which
is given by $\sum_{k=1}^{K} R_{t+k} = R_{t+1} + \cdots + R_{t+K}$, where $R_{t+k}$ is the return in day t+k. If we assume that
these daily returns are independent and identically distributed (i.i.d.) with mean $\mu$ and variance $\sigma^2$, then the
expected value of the cumulative return is
$$E\left(\sum_{k=1}^{K} R_{t+k}\right) = \sum_{k=1}^{K}\mu = K\mu$$
and its variance is
$$Var\left(\sum_{k=1}^{K} R_{t+k}\right) = \sum_{k=1}^{K}\sigma^2 = K\sigma^2$$
so that the standard deviation of the cumulative return is equal to $\sqrt{K}\,\sigma$. If we maintain the normality
assumption that we introduced earlier, then the 99% VaR of $R_{t+1:t+K}$ is given by
$$VaR^{0.99}_{t+1:t+K} = K\mu - 2.33\,\sqrt{K}\,\sigma$$
In this formula, the mean and standard deviation are estimated on daily returns and they are then scaled up
to horizon K.
This result relies on the assumption that returns are serially independent, which allows us to set all covariances
between returns in different days equal to zero. The empirical evidence from the ACF of daily returns indicates
that this assumption is likely to be accurate most of the time, although in times of market booms or busts
returns could be, temporarily, positively correlated. What would be the effect of positive correlation in returns
on VaR? The first-order covariance can be expressed in terms of correlation as $\rho\sigma^2$, with $\rho$ the first-order
serial correlation. To keep things simple, assume that we are interested in calculating VaR for the two-day
return, that is, K = 2. The variance of the cumulative return is $Var(R_{t+1} + R_{t+2})$, which is equal to
$Var(R_{t+1}) + Var(R_{t+2}) + 2Cov(R_{t+1}, R_{t+2}) = \sigma^2 + \sigma^2 + 2\rho\sigma^2$. This can be rewritten as $2\sigma^2(1+\rho)$,
which shows that in the presence of positive correlation the cumulative return becomes riskier, relative to the
independent case, since $2\sigma^2(1+\rho) > 2\sigma^2$. The Value-at-Risk for the two-day return is then $VaR^{0.99}_{t+1:t+2} =
2\mu - 2.33\,\sigma\sqrt{2}\sqrt{1+\rho}$, which is smaller than the VaR assuming independence, given by
$2\mu - 2.33\,\sigma\sqrt{2}$. Hence, neglecting positive correlation in returns leads to underestimating risk and the
potential portfolio loss deriving from an extreme (negative) market movement. The sketch below illustrates these scaling formulas numerically.
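A minimal numerical sketch of the scaling formulas, using the full-sample daily mean and standard deviation quoted earlier and a purely hypothetical first-order correlation of 0.1:

mu    <- 0.028; sigma <- 1.142                               # daily mean and st. dev. quoted earlier
K     <- 10
VaR1  <- mu + qnorm(0.01) * sigma                            # 1-day 99% VaR
VaRK  <- K * mu + qnorm(0.01) * sqrt(K) * sigma              # 10-day VaR under i.i.d. returns
rho   <- 0.1                                                 # hypothetical first-order correlation
VaR2  <- 2 * mu + qnorm(0.01) * sigma * sqrt(2 * (1 + rho))  # 2-day VaR with positive correlation
c(VaR1, VaRK, VaR2)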

VaR assuming non-normality


One of the stylized facts of financial returns is that the empirical frequency of large positive/negative returns
is higher relative to the frequency we would expect if returns were normally distributed. This finding of fat
tails in the return distribution can be partly explained by time-varying volatility: the extreme returns are the
outcome of a regime of high volatility that occurs occasionally, although most of the time the returns are in
a calm period with low volatility. The effect of mixing periods of high and low volatility is that the volatility
estimate based on the full sample overestimates uncertainty when volatility is low, and underestimates it
when it is indeed high. This can be illustrated with a simple example: assume that in 15% of days volatility
is high and equal to 5% while in the remaining 85% of the days it is low and equal to 0.5%. Average volatility
based on all observations is then 0.15 * 5 + 0.85 * 0.5 = 1.175%. This implies that in days of high volatility, the
returns appear large relative to a standard deviation of 1.175% although they are normal considering that they
belong to the high volatility regimes with a standard deviation of 5. On the other hand, the bulk of returns
belong to the low volatility regime which looks like a concentration of small returns relative to the average
volatility of 1.175%.
To illustrate the evidence of deviation from normality and the role of volatility modeling in financial returns,
the graphs below show the QQ plot for the S&P 500 returns (left plot; scaled to have mean 0 and variance
1) and the returns when they are standardized by the EMA volatility estimate calculated above (right; in
formula Rt /t ). The comparison shows that accounting for time-variation in volatility reduces significantly
the evidence of non-normality. However, there is still a slight deviation from the diagonal on the left tail
and, to some extent, also on the right tail. This result is very important for risk calculation since we are
mostly interested on the far left of the distribution: time-variation in volatility is the biggest factor driving
non-normality in returns so that when we calculate risk measures we are confident that a combination of


the normal distribution and a dynamic method to forecast volatility should provide reasonably accurate VaR
estimates. However, we might still want to account for the non-normality of the standardized returns $\epsilon_t$, and
we will consider two possible approaches in this Section.

One approach to relax the assumption of normally distributed errors is the Cornish-Fisher approximation,
which consists of performing a Taylor expansion of the normal distribution around its mean. This has the
effect of producing a distribution which is a function of skewness and kurtosis. We skip the mathematical
details of the derivation and focus on the VaR calculation when this approximation is adopted. If we assume
the mean is equal to zero, the 99% VaR for normally distributed returns is calculated as $-2.33\,\sigma_{t+1}$ or,
more generally, by $z_\alpha\,\sigma_{t+1}$ for $100(1-\alpha)\%$ VaR, where $z_\alpha$ represents the $\alpha$-quantile of the standard
normal distribution. With the Cornish-Fisher (CF) approximation VaR is calculated in a similar manner, that
is, $VaR^{1-\alpha}_{t+1} = z^{CF}_\alpha\,\sigma_{t+1}$, where the quantile $z^{CF}_\alpha$ is calculated as follows:
$$z^{CF}_\alpha = z_\alpha + \frac{SK}{6}\left[z_\alpha^2 - 1\right] + \frac{EK}{24}\left[z_\alpha^3 - 3z_\alpha\right] + \frac{SK^2}{36}\left[2z_\alpha^5 - 5z_\alpha\right]$$

where SK and EK represent the skewness and excess kurtosis, respectively, and for $\alpha = 0.01$ we have
that $z_\alpha = -2.33$. If the data are normally distributed then SK = EK = 0, so that $z^{CF}_\alpha = z_\alpha$. However,
in case the distribution is asymmetric and/or has fat tails the effect is that $z^{CF}_\alpha$ differs from $z_\alpha$. In practice, we
estimate the skewness and the excess kurtosis from the sample and use those values to calculate the quantile
for VaR calculations. In the plot below we show the quantile $z^{CF}_{0.01}$ (black line) and its relationship to $z_{0.01}$ (red
dashed line) as a function of the skewness and excess kurtosis parameters. The left plot shows the effect of the
skewness parameter on the quantile, while holding the excess kurtosis equal to zero. Instead, the plot on the
right shows the effect of increasing values of excess kurtosis, while the skewness parameter is kept constant
and equal to zero. As expected, $z^{CF}_{0.01}$ is smaller than $z_{0.01} = -2.33$, and it is interesting to notice that
negative skewness increases the (absolute) value of $z^{CF}_{0.01}$ more than positive skewness of the same magnitude.
This is due to the fact that negative skewness implies a higher probability of large negative returns compared
to large positive returns. The effect on VaR of accounting for asymmetry and fat tails in the data is thus to
provide more conservative risk measures.


alpha = 0.01
EK = 0; SK = seq(-1, 1, by=0.05)
z = qnorm(alpha)
zCF = z + (SK/6) * (z^2 - 1) + (EK/24) * (z^3 - 3 * z) +
(SK^2/36) * (2*z^5 - 5*z)
EK = seq(0, 10,by=0.1); SK = 0
zCF = z + (SK/6) * (z^2 - 1) + (EK/24) * (z^3 - 3 * z) +
(SK^2/36) * (2*z^5 - 5*z)
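As a complement, the sketch below plugs the sample skewness and excess kurtosis of the EMA-standardized returns into the CF quantile and combines it with the EMA volatility from the earlier example; the moments are computed by hand to avoid extra packages, and the conditional mean is set to zero for simplicity.

z     <- as.numeric(na.omit(sp500daily / sigmaEMA))    # EMA-standardized returns
SK    <- mean((z - mean(z))^3) / sd(z)^3               # sample skewness
EK    <- mean((z - mean(z))^4) / sd(z)^4 - 3           # sample excess kurtosis
za    <- qnorm(0.01)
zCF   <- za + (SK/6)*(za^2 - 1) + (EK/24)*(za^3 - 3*za) + (SK^2/36)*(2*za^5 - 5*za)
varCF <- zCF * sigmaEMA                                # time-varying 99% Cornish-Fisher VaR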

An alternative approach to allow for non-normality is to make a different distributional assumption for $\epsilon_{t+1}$
that captures the fat-tailedness of the data. A distribution that is often considered is the t distribution with a
small number of degrees of freedom. Since the t distribution assigns more probability to events in the tails,
it will provide more conservative risk estimates relative to the normal distribution. The
graphs below show the t distribution for 4, 10, and $\infty$ degrees of freedom, while the plot on the right zooms in
on the shape of the left tail of these distributions. It is clear that the smaller the d.o.f. used, the more likely are
extreme events relative to the standard normal distribution (d.o.f. = $\infty$). So this approach also delivers more
conservative risk measures relative to the normal distribution, since it assigns higher probability to extreme
events.
plot(dnorm, xlim=c(-4,4), col="black", ylab="", xlab="",yaxt="n")
curve(dt(x,df=4), add=TRUE,col="orange")
curve(dt(x,df=10), add=TRUE,col="purple")


To be able to use the t distribution for risk calculation we need to set the value of the degrees-of-freedom
parameter, denoted by d. In the context of a GARCH volatility model this can easily be done by considering
d as an additional parameter to be estimated by maximizing the likelihood function based on the assumption
that the $\epsilon_{t+1}$ shocks follow a $t_d$ distribution. A simple alternative approach to estimate the degrees-of-freedom
parameter d exploits the fact that the excess kurtosis of a $t_d$ distribution is equal to EK = 6/(d-4) (for
d > 4), which is only a function of the parameter d. Thus, based on the sample excess kurtosis we can
back out an estimate of d. The steps are as follows:
1. estimate by ML the GARCH model assuming that the errors are normally distributed
2. calculate the standardized residuals as $\epsilon_t = R_t/\sigma_t$
3. estimate the excess kurtosis of the standardized residuals and obtain d as d = 6/EK + 4 (for d > 4)
For the standardized returns of the S&P 500 the sample excess kurtosis is equal to 2.22 so that the estimate of
d is equal to approximately 7 which indicates the need for a fat-tailed distribution. In practice, it would be
advisable to estimate the parameter d jointly with the remaining parameters of the volatility model, rather
than separately. Still, this simple approach provides a starting point to evaluate the usefulness of fat tailed
distributions in risk measurement.
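A minimal sketch of step 3, assuming res and sigma are the GARCH residual and volatility objects created in the previous chapter (the exact value of d depends on the fitted model):

eps <- as.numeric(res / sigma)                      # standardized residuals from the GARCH fit
EK  <- mean((eps - mean(eps))^4) / sd(eps)^4 - 3    # sample excess kurtosis
d   <- 6 / EK + 4                                   # implied t degrees of freedom (valid if EK > 0)
d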

5.2 Historical Simulation (HS)


So far we discussed methods to calculate VaR that rely on choosing a probability distribution and a volatility
model. A distributional assumption we made is that returns are normal, which we later relaxed by introducing
the Cornish-Fisher approximation and the t distribution. In terms of the volatility model, we considered
several possible specifications, such as EMA, GARCH, and GJR-GARCH, which we can compare using
selection criteria.
An alternative approach that avoids making any assumption and lets the data speak for themselves is represented
by Historical Simulation (HS). HS consists of calculating the empirical quantile of the returns at the $100\alpha\%$ level
based on the most recent M days. This approach is called non-parametric because it estimates 99% VaR to be
the return value such that 1% of the sample is smaller, whatever the return distribution and volatility structure
might be. In addition to this flexibility, HS is very easy to compute since it does not require the estimation of


any parameter for the volatility model and the distribution. However, there are also a few difficulties in using
HS to calculate risk measures. One issue is the choice of the estimation window size M. Practitioners often
use values between M=250 and 1000, but, similarly to the choice of smoothing in MA and EMA, this is an
ad hoc value that has been validated by experience rather than being optimally selected based on a criterion.
Another complication is that HS applied to daily returns provides a VaR measure at the one-day horizon, which,
for regulatory purposes, should then be converted to a 10-day horizon. What is typically done is to apply the
√10 rule discussed before, although it does not have much theoretical justification in the context of HS, since
we are not actually making any assumption about the return distribution. An alternative would be to calculate
VaR as the 1% quantile of the (non-overlapping) cumulative return instead of the daily return. However, this
would imply a much smaller sample size, in particular for small M.
The implementation in R is quite straightforward and shown below. The function quantile() calculates the
$\alpha$-quantile of a return series, and we can combine it with the function rollapply() from the package zoo
to apply it recursively to a rolling window of size M. For example, the command rollapply(sp500daily,
250, quantile, probs=alpha, align="right") calculates the quantile (for probs=alpha and
alpha=0.01) of the sp500daily time series, with the first VaR forecast for day 251 until the end of the sample.
The graph below shows a comparison of VaR calculated with the HS method for estimation windows of 250
and 1000 days. The shorter estimation window makes the VaR more sensitive to market events, as opposed
to M=1000, which changes very slowly. A characteristic of HS that is particularly evident when M=1000 is that the VaR
might be constant for long periods of time, even though volatility might have significantly decreased.
M1 = 250
M2 = 1000
alpha = 0.01
hs1 <- rollapply(sp500daily, M1, quantile, probs=alpha, align="right")
hs2 <- rollapply(sp500daily, M2, quantile, probs=alpha, align="right")


How good is the risk model? One metric that is used as an indicator of the goodness of the risk model is the
frequency of returns that are smaller than VaR, that is, $\sum_{t=1}^{T} I(R_{t+1} \leq VaR_{t+1})/T$, where T represents the
total number of days and $I(\cdot)$ denotes the indicator function. A good risk model should have a frequency
of violations, or hits, close to the tail probability for which VaR is calculated: if we are calculating VaR at 99%, then
we would expect the model to show (approximately) 1% of violations. For the HS approach above, VaR
calculated on the 250-day window is violated on 1.405% of days, while for the longer window of
M=1000 the violations represent 1.528% of the sample. For HS we thus find that violations are more frequent than
expected, and in the backtesting Section we will evaluate whether this difference is statistically significant.
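A minimal sketch of how these violation frequencies could be computed, assuming hs1 and hs2 from the code above and the same zoo lag() convention used in the backtesting Section later in this chapter:

# violation indicator: the next day's return falls below today's HS VaR forecast
hits1 <- na.omit(lag(sp500daily, 1) <= hs1)
hits2 <- na.omit(lag(sp500daily, 1) <= hs2)
mean(coredata(hits1))   # roughly 0.014 for M = 250 (the 1.405% reported above)
mean(coredata(hits2))   # roughly 0.015 for M = 1000 (the 1.528% reported above)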
In the graph below, the VaR calculated from HS is compared to the EMA forecast. Although the VaR levels and dynamics are quite similar, HS changes less rapidly and remains constant for long periods of time, in particular in periods of rapid changes in market volatility.
require(TTR)
# EMA volatility forecast scaled to a 99% VaR (the minus sign makes the VaR a negative return)
ema <- -2.33 * EMA(sp500daily^2, ratio=0.06)^0.5

5.3 Simulation Methods


We discussed several approaches that can be used to calculate VaR based on parametric assumptions or that are
purely non-parametric in nature. In most cases we demonstrated the technique by forecasting Value-at-Risk
at the 1-day horizon. However, for regulatory purposes the risk measure should be calculated at a horizon of
10 days. One approach that is widely used by practitioners, and which is also allowed by regulators, is to scale up the 1-day VaR to 10 days by multiplying it by $\sqrt{10}$. However, there are several limitations in doing this, and it becomes particularly inaccurate when used for VaR models that assume a GARCH volatility structure.
An alternative is to use simulation methods that generate artificial future returns that are consistent with
the risk model. The model makes assumptions about the volatility model, and the distribution of the error
term. Using simulations we are able to produce a large number of future possible paths for the returns that
are conditional on the current day. In addition, it becomes very easy to obtain the distribution of cumulative
returns by summing daily simulated returns along a path. We will consider two popular approaches that
differ in the way simulated shocks are generated: Monte Carlo Simulation (MC), which consists of iterating the volatility model based on shocks simulated from a given distribution (normal, t, or something else), and Filtered Historical Simulation (FHS), which assumes the shocks are equal to the standardized returns and takes random samples of those values. The difference is that FHS does not make a parametric assumption for the shocks $\epsilon_{t+1}$ (similarly to HS), while MC does rely on such an assumption.

Monte Carlo Simulation


At the closing of January 22, 2007 and September 29, 2008 the risk manager needs to forecast the distribution of
returns from 1 up to K days ahead, including the cumulative or multi-period returns over these horizons. The
model for returns is $R_{t+1} = \sigma_{t+1}\epsilon_{t+1}$, where $\sigma_{t+1}$ is the volatility forecast based on the information available up to that day, and the $\epsilon_{t+1}$ are the random shocks that the risk manager assumes are normally distributed with mean 0 and variance 1. A Monte Carlo simulation of the future distribution of the portfolio returns requires
the following steps:
1. Estimation of the volatility model: we assume that $\sigma_{t+1}$ follows a GARCH(1,1) model and estimate the model by ML using all observations available up to and including that day. Below is the code to
estimate the model using the rugarch package. The volatility forecast for the following day is 0.483%, which is significantly lower than the sample standard deviation of 0.993% computed from January 1990 to January 2007. To use some terminology introduced earlier, the conditional forecast (0.483%) is lower than the unconditional forecast (0.993%).

spec = ugarchspec(variance.model=list(model="sGARCH",garchOrder=c(1,1)),
mean.model=list(armaOrder=c(0,0), include.mean=FALSE))
# lowvol represents the January 22, 2007 date
fitgarch = ugarchfit(spec = spec, data = window(sp500daily, end=lowvol))

 omega alpha1  beta1
 0.005  0.054  0.942

                      T+1
2007-01-22 19:00:00 0.483

2. The next step consists of simulating a large number of return paths, say S, that are consistent with the
model assumption that $R_{t+1} = \sigma_{t+1}\epsilon_{t+1}$. Since we have already produced the forecast $\sigma_{t+1} = 0.483$,
to obtain simulated values of $R_{t+1}$ we only need to generate values for the error term $\epsilon_{t+1}$. This can be
easily done in R using the command rnorm(), which returns random values from the standard normal
distribution (and rt() does the same for the t distribution). Denote by $\epsilon_{s,t+1}$ the s-th simulated value
of the shock (for $s = 1, \ldots, S$); then the s-th simulated value of the return is produced by multiplying
$\sigma_{t+1}$ and $\epsilon_{s,t+1}$, that is, $R_{s,t+1} = \sigma_{t+1}\epsilon_{s,t+1}$.
3. The next step is to use the simulated returns $R_{s,t+1}$ to predict volatility in the following period, denoted by $\sigma_{s,t+2}$.
Since we have assumed a GARCH specification, the volatility forecast is obtained as
$\sigma^2_{s,t+2} = \omega + \alpha R^2_{s,t+1} + \beta\sigma^2_{t+1}$,
and the (simulated) returns at time t + 2 are obtained as $R_{s,t+2} = \sigma_{s,t+2}\epsilon_{s,t+2}$,
where $\epsilon_{s,t+2}$ represents a new set of simulated values for the shocks in day t + 2.
4. Continue the iteration to calculate $\sigma_{s,t+k}$ and $R_{s,t+k}$ for $k = 1, \ldots, K$. The cumulative or multi-period
return between t + 1 and t + k is then obtained as $R_{s,t+1:t+k} = \sum_{j=1}^{k} R_{s,t+j}$.


The code below shows how a MC simulation can be performed and plots the quantiles of $R_{s,t+k}$ as a function
of k in the left plot and the quantiles of $\sigma_{s,t+k}$ in the right plot (from 0.10 to 0.90 at intervals of 0.10, in
addition to 0.05 and 0.95). The volatility branches out from $\sigma_{t+1} = 0.483$ and its uncertainty increases
with the forecast horizon. The green dashed line represents the average (over the simulations) volatility at
horizon k that, as expected for GARCH models, tends toward the unconditional mean of volatility. Since
January 22, 2007 was a period of low volatility, the graph shows that we should expect volatility to increase
at longer horizons. This is also clear from the left plot, where the return distribution becomes wider and wider
as the forecast horizon progresses.

set.seed(9874)
S = 10000   # number of MC simulations
K = 250     # forecast horizon
# create the matrices to store the simulated returns and volatilities
R     = zoo(matrix(sigma*rnorm(S), K, S, byrow=TRUE), order.by=futdates)
Sigma = zoo(matrix(sigma, K, S), order.by=futdates)
# iteration to calculate R and Sigma based on the previous day
for (i in 2:K)
{
  Sigma[i,] = (gcoef['omega'] + gcoef['alpha1'] * R[i-1,]^2 +
               gcoef['beta1'] * Sigma[i-1,]^2)^0.5
  R[i,]     = rnorm(S) * Sigma[i,]
}
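The loop above relies on a few objects created before it runs. A minimal sketch of how they might be constructed from the fitted model is shown below; the names sigma, gcoef, and futdates are the ones used in the code, but the exact construction here is an assumption.

# estimated GARCH(1,1) coefficients (named vector with omega, alpha1, beta1)
gcoef <- coef(fitgarch)
# one-day-ahead volatility forecast from the fitted model (the 0.483% reported above)
sigma <- as.numeric(sigma(ugarchforecast(fitgarch, n.ahead = 1)))
# K future (calendar) dates used only to index the zoo matrices of simulated paths
futdates <- seq(as.Date(lowvol) + 1, by = "day", length.out = K)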

How would the forecasts of the return and volatility distributions look if they were made in a period of high volatility? To illustrate this scenario we consider September 29, 2008 as the forecast base. The GARCH(1,1) forecast of volatility for the following day is 3.165% and the distributions of simulated returns and volatilities are shown in the graph below. Since that day fell in a period of high volatility, the assumption of stationarity of volatility made by the GARCH model implies that volatility will decline in the future. This is clear from the return quantiles converging in the left plot, as well as from the declining average volatility in the right plot.
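For this second date the simulation requires a model estimated with data up to September 29, 2008. A sketch of how the corresponding objects (fitgarch1 and gcoef1, which also appear in the FHS code later) might be obtained, where highvol is an assumed name for that date:

# re-estimate the GARCH(1,1) on data up to the high-volatility date
fitgarch1 <- ugarchfit(spec = spec, data = window(sp500daily, end = highvol))
gcoef1    <- coef(fitgarch1)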

The cumulative or multi-period return can be easily obtained by cumulatively summing each column of the K by S matrix of simulated one-day returns, Ret (in R, apply(Ret, 2, cumsum), since cumsum() applied directly to a matrix would flatten it into a vector). The outcome is also a K by S matrix with each column representing a possible path of the cumulative
return from 1 to K steps ahead. The plots below show the quantiles at each horizon k calculated across the S
simulated cumulative returns. The quantile at probability 0.01 represents the 99% VaR that financial institutions
are required to report for regulatory purposes (at the 10 day horizon). The left plot represents the distribution
of expected cumulative returns conditional on being on January 22, 2007 while the right plot is conditional
on September 29, 2008. The same scale of the y-axis for both plots highlights the striking difference in the
dispersion of the distribution of future cumulative returns. Although we saw above that the volatility of daily
returns is expected to increase after January 22, 2007 and decrease following September 29, 2008, the levels
of volatility in these two days are so different that when accumulated over a long period of time they lead to
very different distributions for the cumulative returns.
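A sketch of this calculation, assuming R is the K by S matrix of simulated daily returns produced by the Monte Carlo loop above (the name Rcum matches the ES code below):

# cumulative return along each simulated path (column-wise cumulative sums of the K x S matrix)
Rcum <- zoo(apply(R, 2, cumsum), order.by = futdates)
# 99% VaR at each horizon k: the 0.01 quantile across the S simulated paths
VaR.k <- apply(Rcum, 1, quantile, probs = 0.01)
VaR.k[10]   # the 10-day 99% VaR relevant for regulatory reporting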

MC simulations also make it easy to calculate ES. For each horizon k, ES can be calculated as the average of those
simulated returns that are smaller than VaR. The code below shows these steps for the two dates considered
and the plots compare the two risk measures conditional on being on January 22, 2007 and on September 29,
2008. Similarly to the earlier discussion, ES provides a larger (in absolute value) potential loss relative to VaR,
a difference that increases with the horizon k.
VaR    = zoo(apply(Rcum, 1, quantile, probs=0.01), order.by=futdates)
VaRmat = matrix(VaR, K, S, byrow=FALSE)
Rviol  = Rcum * (Rcum < VaRmat)
ES     = zoo(rowSums(Rviol) / rowSums(Rviol != 0), order.by=futdates)

Filtered Historical Simulation (FHS)


Filtered Historical Simulation (FHS) combines aspects of the parametric and the HS approaches to risk
calculation. This is done by assuming that the volatility of the portfolio return can be modeled with, e.g., EMA
or a GARCH specification, while the non-parametric HS approach is used to model the standardized returns
$\epsilon_{t+1} = R_{t+1}/\sigma_{t+1}$. It can be considered a simulation method since the only difference with the MC method
discussed above is that the random draws from the standardized returns are used to generate simulated returns
instead of draws from a parametric distribution (e.g., normal or t). This method might be preferred when the
risk manager is uncomfortable making assumptions about the shape of the distribution, either in terms of
the thickness of the tails or the symmetry of the distribution. In R this method is implemented by replacing
the command rnorm(S) that generates a random normal sequence of length S with sample(std.resid, S,
replace=TRUE), where std.resid represents the standardized residuals. This command produces a sample of
length S of values randomly taken from the std.resid series. Hence, we are not generating new data, but we
are simply resampling the same standardized residuals so as to preserve their distributional
properties.
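As a small illustration of the difference (std.resid is the vector of standardized residuals defined in the code further below), the two approaches generate shocks as follows:

set.seed(123)
eps.mc  <- rnorm(5)                               # parametric shocks used by MC
eps.fhs <- sample(std.resid, 5, replace = TRUE)   # resampled shocks used by FHS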
The graphs below compare the MC and FHS approaches in terms of the expected volatility of the daily returns
(i.e., average volatility at each horizon k across the S simulations) and 99% VaR (i.e., 0.01 quantile of the
simulated cumulative returns) for the two forecasting dates that we are considering. The expected future
volatility by FHS converges at a slightly lower speed relative to MC during periods of low volatility, while
the opposite is true when the forecasting point occurs during a period of high volatility. This is because FHS
does not restrict the shape of the distribution on the left tail (as MC does given the assumption of normality)
so that large negative standardized returns contribute to determine future returns and volatility. Of course,
this result is specific to the S&P 500 daily returns that we are considering as an illustrative portfolio and it
might be different when analyzing other portfolio returns.
In terms of VaR calculated on cumulative returns, we find that FHS predicts lower risk relative to MC at long horizons, with the difference becoming larger as the horizon K increases. Hence, while at short horizons the VaR forecasts are quite similar, they become increasingly different at longer horizons.
# standardized residuals
std.resid  = as.numeric(residuals(fitgarch, standardize=TRUE))
std.resid1 = as.numeric(residuals(fitgarch1, standardize=TRUE))
for (i in 2:K)
{
  Sigma.fhs[i,]  = (gcoef['omega'] + gcoef['alpha1'] * R.fhs[i-1,]^2 +
                    gcoef['beta1'] * Sigma.fhs[i-1,]^2)^0.5
  R.fhs[i,]      = sample(std.resid, S, replace=TRUE) * Sigma.fhs[i,]
  Sigma.fhs1[i,] = (gcoef1['omega'] + gcoef1['alpha1'] * R.fhs1[i-1,]^2 +
                    gcoef1['beta1'] * Sigma.fhs1[i-1,]^2)^0.5
  R.fhs1[i,]     = sample(std.resid1, S, replace=TRUE) * Sigma.fhs1[i,]
}

5.4 VaR for portfolios


So far we discussed the simple case of a portfolio composed of only one asset and calculated the Value-at-Risk for such a position. However, financial institutions hold complex portfolios that include many assets and expose them to several types of risk. How do we calculate VaR for such diversified portfolios?
Let's first characterize the portfolio return as a weighted average of the individual asset returns, that is, $R^p_t = \sum_{j=1}^{J} w_{j,t} R_{j,t}$, where $R^p_t$ represents the portfolio return in day t, and $w_{j,t}$ and $R_{j,t}$ are the weight and return of asset j in day t (and there is a total of J assets). To calculate VaR for this portfolio we need to model the distribution of $R^p_t$. If we assume that the underlying assets are normally distributed, then the portfolio return is also normally distributed and we only need to estimate its mean and standard deviation. In the case of 2 assets (i.e., J = 2) the expected return of the portfolio is given by

$$E(R^p_t) = \mu_{t,p} = w_{1,t}\mu_1 + w_{2,t}\mu_2$$

which is the weighted average of the expected returns of the individual assets. The portfolio variance is equal to
$$Var(R^p_t) = \sigma^2_{t,p} = w^2_{1,t}\sigma^2_1 + w^2_{2,t}\sigma^2_2 + 2 w_{1,t} w_{2,t}\rho_{1,2}\sigma_1\sigma_2$$

which is a function of the individual (weighted) variances and the correlation between the two assets, $\rho_{1,2}$.
The portfolio Value-at-Risk is then given by

$$VaR^{0.99}_{p,t} = \mu_{t,p} - 2.33\,\sigma_{t,p} = w_{1,t}\mu_1 + w_{2,t}\mu_2 - 2.33\sqrt{w^2_{1,t}\sigma^2_1 + w^2_{2,t}\sigma^2_2 + 2 w_{1,t} w_{2,t}\rho_{1,2}\sigma_1\sigma_2}$$

In case it is reasonable to assume that $\mu_1 = \mu_2 = 0$, the $VaR^{0.99}_{p,t}$ formula can be expressed as follows:

$$\begin{aligned}
VaR^{0.99}_{p,t} &= -2.33\sqrt{w^2_{1,t}\sigma^2_1 + w^2_{2,t}\sigma^2_2 + 2 w_{1,t} w_{2,t}\rho_{1,2}\sigma_1\sigma_2} \\
&= -\sqrt{2.33^2 w^2_{1,t}\sigma^2_1 + 2.33^2 w^2_{2,t}\sigma^2_2 + 2\cdot 2.33^2\, w_{1,t} w_{2,t}\rho_{1,2}\sigma_1\sigma_2} \\
&= -\sqrt{(VaR^{0.99}_{1,t})^2 + (VaR^{0.99}_{2,t})^2 + 2\,\rho_{1,2}\, VaR^{0.99}_{1,t}\, VaR^{0.99}_{2,t}}
\end{aligned}$$

which shows that the portfolio VaR in day t can be expressed in terms of the individual VaRs of the assets and
the correlation between the two asset returns. Since the correlation coefficient ranges between −1 and 1, the two extreme cases of correlation imply the following VaR:

- $\rho_{1,2} = 1$: $VaR^{0.99}_{p,t} = VaR^{0.99}_{1,t} + VaR^{0.99}_{2,t}$; the two assets are perfectly correlated and the total portfolio VaR is the sum of the individual VaRs.
- $\rho_{1,2} = -1$: $VaR^{0.99}_{p,t} = -\left|VaR^{0.99}_{1,t} - VaR^{0.99}_{2,t}\right|$; the two assets have perfect negative correlation, so the total risk of the portfolio is given by the difference between the two VaRs, since the risk in one asset is offset by the other asset, and vice versa.
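To make the formula concrete, here is a small numerical illustration with hypothetical weights, volatilities, and correlation (all values are made up for the example):

w1 <- 0.5;  w2 <- 0.5          # portfolio weights
s1 <- 0.01; s2 <- 0.02         # daily volatilities of asset 1 and asset 2
rho <- 0.5                     # correlation between the two assets
VaR1 <- -2.33 * w1 * s1        # individual (weighted) 99% VaRs
VaR2 <- -2.33 * w2 * s2
VaRp <- -sqrt(VaR1^2 + VaR2^2 + 2 * rho * VaR1 * VaR2)
VaRp   # about -0.031, between VaR1 + VaR2 (-0.035) and -|VaR1 - VaR2| (-0.012)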
More generally, both the mean and the variance could vary over time conditional on past information. In this case $VaR^{0.99}_{p,t}$ can be rewritten as

$$VaR^{0.99}_{p,t} = w_{1,t}\mu_{1,t} + w_{2,t}\mu_{2,t} - 2.33\sqrt{w^2_{1,t}\sigma^2_{1,t} + w^2_{2,t}\sigma^2_{2,t} + 2 w_{1,t} w_{2,t}\rho_{12,t}\sigma_{1,t}\sigma_{2,t}}$$

In the equation above we added a t subscript also to the correlation coefficient, that is, $\rho_{12,t}$ represents the correlation between the two assets conditional on the information available up to that day. There is evidence that correlations between assets change over time in response to market events or macroeconomic shocks (e.g., a recession). In the following Section we discuss some methods that can be used to model and predict correlations.

Modeling correlations
A simple approach to modeling correlations consists of using MA and EMA smoothing, similarly to the case of forecasting volatility. However, in this case the object to be smoothed is not the squared return but the product of the returns of assets 1 and 2 (N.B.: we are implicitly assuming that the means of both assets can be set equal to zero). Denote the return of asset 1 by $R_{1,t}$, the return of asset 2 by $R_{2,t}$, and by $\sigma_{12,t}$ the covariance between the two assets in day t. We can estimate $\sigma_{12,t}$ by a MA over M days:

$$\sigma_{12,t+1} = \frac{1}{M}\sum_{m=1}^{M} R_{1,t-m+1} R_{2,t-m+1}$$

and the correlation is then obtained by dividing the covariance estimate by the standard deviations of the two assets, that is:

$$\rho_{12,t+1} = \sigma_{12,t+1} / (\sigma_{1,t+1}\,\sigma_{2,t+1})$$
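As an illustration, the MA estimate over M = 250 days could be computed with rollapply(), anticipating the two-column return matrix R (GLD and SPY) used in the EMA code below:

library(zoo)
M <- 250
cov.ma  <- rollapply(R[,1] * R[,2], M, mean, align = "right")   # MA covariance
sig.ma  <- rollapply(R^2, M, mean, align = "right")^0.5         # MA volatilities (column-wise)
corr.ma <- cov.ma / (sig.ma[,1] * sig.ma[,2])                   # MA correlation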
In case the portfolio is composed of J assets, there are J(J − 1)/2 asset pairs for which we need to calculate correlations. An alternative approach is to use EMA smoothing, which can be implemented using the recursive formula discussed earlier:

$$\sigma_{12,t+1} = (1-\lambda)\,\sigma_{12,t} + \lambda\, R_{1,t} R_{2,t}$$

and the correlation is obtained by dividing the covariance by the forecasts of the standard deviations for the two assets.
To illustrate the implementation in R, we assume that the firm holds a portfolio that invests a fraction $w_1$ in a gold ETF (ticker: GLD) and the remaining fraction $1 - w_1$ in the S&P 500 ETF (ticker: SPY). The closing prices are downloaded from Yahoo Finance starting on Jan 02, 2005 and ending on Apr 14, 2015, and the goal is to forecast portfolio VaR for the following day. We will assume that the expected daily returns of both assets are equal to zero and forecast the volatilities and the correlation between the assets using EMA with $\lambda = 0.06$. In the code below, R represents a matrix with 2587 rows and two columns containing the GLD and SPY daily returns.
library(TTR)
# EMA for the product of returns
prod  <- R[,1] * R[,2]
cov   <- EMA(prod, ratio=0.06)
# apply the EMA function to each column of R^2 and make it a zoo object
sigma <- zoo(apply(R^2, 2, EMA, ratio=0.06), order.by=time(R))^0.5
corr  <- cov / (sigma[,1] * sigma[,2])

The time series plot of the EMA correlation shows that the dependence between the gold and S&P 500 returns
oscillates significantly around the long-run correlation of 0.062. In certain periods gold and the S&P 500 have
positive correlation as high as 0.81 and in other periods as low as -0.65. During 2008 the correlation between the
two assets became large and negative since investors fled the equity market toward gold that was perceived
as a safe haven during turbulent times. Based on these forecasts of volatilities and correlation, portfolio VaR
can be calculated in R as follows:
w1 = 0.5      # weight of asset 1
w2 = 1 - w1   # weight of asset 2
VaR = -2.33 * ( (w1*sigma[,1])^2 + (w2*sigma[,2])^2 +
                2*w1*w2*corr*sigma[,1]*sigma[,2] )^0.5

The one-day portfolio Value-at-Risk fluctuates substantially between -0.79% and -8.05%, the latter occurring during the 2008-09 financial crisis. It is also interesting to compare the portfolio VaR with the risk measure if
the portfolio is fully invested in either asset. The time series graph below shows the VaR for three scenarios:
a portfolio invested 50% in gold and 50% in equity, 100% gold, and 100% S&P 500. Portfolio VaR lies between the VaRs of the individual assets when correlation is positive. However, in those periods in which the two assets have negative correlation (e.g., 2008) the portfolio VaR is higher (i.e., smaller in absolute value) than both individual VaRs, since the two assets move in opposite directions and the risk exposures partly offset each other.
VaRGLD = -2.33 * sigma[,1]
VaRSPY = -2.33 * sigma[,2]

Exposure mapping

5.5 Backtesting VaR


In the risk management literature, backtesting refers to the evaluation/testing of the properties of a risk model based on past data. The approach consists of using the risk model to calculate VaR based on the information (e.g., past returns) that was available to the risk manager at that point in time. The VaR forecasts are then compared with the actual realizations of the portfolio return to evaluate whether they satisfy the properties that we expect to hold for a good risk model. One such property is that the coverage of the risk model, defined as the percentage of returns smaller than VaR, should be close to the VaR tail probability. In other words, if we are testing VaR with a tail probability $\alpha$ equal to 0.01, we should expect, approximately, 1% of days with violations. If we find that returns were smaller than VaR significantly more or less often than 1%, then we conclude that the model has inappropriate coverage. For the S&P 500 return and VaR calculated using the EMA method the coverage is equal to
V = na.omit(lag(sp500daily,1) <= var)   # var: the series of one-day VaR forecasts (the EMA-based VaR above)
T1 = sum(V)          # number of violations
TT = length(V)       # number of days
alphahat = T1/TT     # empirical violation frequency

[1] 0.022

In this case, we find that the 99% VaR is violated more often (0.022) than expected (0.01). We can test the hypothesis that $\alpha = 0.01$ (or 1%) by comparing the likelihood that the sample has been generated by $\alpha$ as opposed to its estimate $\hat\alpha$, which in this example is equal to 0.022. One way to test this hypothesis is to define the event of a violation of VaR, that is $R_{t+1} \leq VaR_{t+1}$, as a binomial random variable with probability $\alpha$ that the event occurs. Since we have a total of T days and introducing the assumption that violations are independent of each other, the joint probability of having $T_1$ violations (out of T days) is

$$L(\alpha, T_1, T) = \alpha^{T_1}(1-\alpha)^{T_0}$$

where $T_0 = T - T_1$. The hypothesis $\alpha = 0.01$ can be tested by comparing this likelihood at the estimated $\hat\alpha$ and at the theoretical value of 0.01 (more generally, if VaR is calculated at 95% then $\alpha$ is 0.05). This type of test is called a likelihood ratio test and can be interpreted as the distance between the likelihood of obtaining $T_1$ violations in T days at the theoretical value (i.e., using $\alpha = 0.01$) and the likelihood based on the sample estimate $\hat\alpha$. The statistic and distribution of the test for Unconditional Coverage (UC) are

$$UC = -2\ln\left(\frac{L(0.01, T_1, T)}{L(\hat\alpha, T_1, T)}\right) \sim \chi^2_1$$

where $\hat\alpha = T_1/T$ and $\chi^2_1$ denotes the chi-square distribution with 1 degree of freedom. The critical values at the 1, 5, and 10% levels are 6.63, 3.84, and 2.71, respectively, and the null hypothesis $\alpha = 0.01$ is rejected if UC is larger than the critical value. In practice, the test statistic can be calculated as follows:
$$UC = -2\left[T_1\ln\left(\frac{0.01}{\hat\alpha}\right) + T_0\ln\left(\frac{0.99}{1-\hat\alpha}\right)\right]$$

In the example discussed above we have $\hat\alpha = 0.022$, $T_1 = 138$, and $T = 6268$. The test statistic is thus

UC = -2 * ( T1 * log(0.01/alphahat) + (TT - T1) * log(0.99/(1-alphahat)) )

[1] 68.1

Since 68.1 is larger than 3.84, we reject the null hypothesis that $\alpha = 0.01$ at the 5% significance level and conclude that the risk model provides inappropriate coverage.
While testing for coverage, we introduced the assumption that violations are independent of each other. If
this assumption fails to hold, then a violation in day t has the effect of increasing/decreasing the probability
of experiencing a violation in day t + 1, relative to its unconditional level. A situation in which this might
happen is during financial crises when markets enter a downward spiral which is likely to lead to several
consecutive days of violations and thus to the possibility of underestimating risk. The empirical evaluation of
this assumption requires the calculation of two probabilities: 1) the probability of having a violation in day
t given that a violation occurred in day t − 1, and 2) the probability of having a violation in day t given that no violation occurred the previous day. We denote the estimates of these conditional probabilities by $\hat\pi_{1,1}$ and $\hat\pi_{0,1}$, respectively. They can be estimated from the data by calculating $T_{1,1}$ and $T_{0,1}$, which represent the number of days in which a violation was preceded by a violation and by no violation, respectively. In R we can determine these quantities as follows:
T11 = sum((lag(V,-1)==1) & (V==1))
T01 = sum((lag(V,-1)==0) & (V==1))

where we obtain that $T_{0,1} = 131$ and $T_{1,1} = 7$, with their sum equal to $T_1$. Similarly, we can calculate $T_{1,0}$ and $T_{0,0}$; the estimates are 131 and 5998, respectively, which (up to the one observation lost when pairing consecutive days) sum to $T_0$. Since we look at violations in two consecutive days we lose one observation, so the effective sample size is now $T - 1 = 6267$. We can then calculate the conditional probability of a violation in a day given that the previous day there was no violation as $\hat\pi_{0,1} = T_{0,1}/(T_{0,1} + T_{0,0})$, and the probability of a violation in two consecutive days as $\hat\pi_{1,1} = T_{1,1}/(T_{1,0} + T_{1,1})$; similarly, $\hat\pi_{1,0} = 1 - \hat\pi_{1,1}$ and $\hat\pi_{0,0} = 1 - \hat\pi_{0,1}$. For the daily S&P 500 returns introduced earlier, the estimates are $\hat\pi_{1,1} = 0.051$ and $\hat\pi_{1,0} = 0.949$, and $\hat\pi_{0,1} = 0.021$ and $\hat\pi_{0,0} = 0.979$. While the overall estimated probability of a violation is 2.2%, this probability is equal to 5.072% after a day in which the risk model was violated and 2.137% following a day without a violation. Since these probabilities are quite different from each other, we suspect that the violations are not independent over time. To assess this in statistical terms, we can test the hypothesis of independence of the violations by stating the null hypothesis as $\pi_{0,1} = \pi_{1,1} = \pi$, which can be tested using the same likelihood ratio approach discussed earlier. In this case, the statistic is calculated as the ratio of the likelihood under independence, $L(\hat\alpha, T_1, T)$, relative to the likelihood under dependence, which we denote by $L(\hat\pi_{0,1}, \hat\pi_{1,1}, T_{0,1}, T_{1,1}, T)$ and which is given by
$$L(\hat\pi_{0,1}, \hat\pi_{1,1}, T_{0,1}, T_{1,1}, T) = \hat\pi_{1,0}^{\,T_{1,0}}\,(1-\hat\pi_{1,0})^{T_{1,1}}\;\hat\pi_{0,1}^{\,T_{0,1}}\,(1-\hat\pi_{0,1})^{T_{0,0}}$$

The test statistic and distribution for the hypothesis of Independence (IND) in this case are

$$IND = -2\ln\left(\frac{L(\hat\alpha, T_1, T)}{L(\hat\pi_{0,1}, \hat\pi_{1,1}, T_{0,1}, T_{1,1}, T)}\right) \sim \chi^2_1$$

and the critical values are the same as for the UC test. The numerator of the IND test is the likelihood that appears in the denominator of the UC test, based on the empirical estimate $\hat\alpha$ rather than the theoretical value 0.01. The value of the IND test statistic is 4.04, which is larger than the 5% critical value, so we reject the null hypothesis that the violations occur independently.
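A sketch of how the IND statistic could be computed from the transition counts (V, T11, T01, and alphahat as defined above; the remaining counts and probabilities are constructed here, and the exact value depends on whether $\hat\alpha$ is re-estimated on the reduced sample, so this reproduces the reported 4.04 only approximately):

# transition counts not shown above (same convention as T11 and T01)
T10 <- sum((lag(V,-1)==1) & (V==0))
T00 <- sum((lag(V,-1)==0) & (V==0))
# conditional violation probabilities
pi01 <- T01 / (T01 + T00)
pi11 <- T11 / (T11 + T10)
# log-likelihoods under dependence and under independence (using alphahat from above)
logL.dep <- T10*log(1-pi11) + T11*log(pi11) + T01*log(pi01) + T00*log(1-pi01)
logL.ind <- (T01+T11)*log(alphahat) + (T10+T00)*log(1-alphahat)
IND <- -2 * (logL.ind - logL.dep)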
Finally, we might be interested in testing the hypothesis $\pi_{0,1} = \pi_{1,1} = 0.01$, which jointly tests the independence of the violations and the coverage, which should be equal to 1%. This test is referred to as Conditional Coverage (CC); the test statistic is given by the sum of the previous two test statistics, that is, $CC = IND + UC$, and it is distributed as a $\chi^2_2$. The critical values at the 1, 5, and 10% levels in this case are 9.21, 5.99, and 4.61, respectively.
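A minimal sketch of the CC test, assuming UC and IND have been computed as above:

CC <- UC + IND
CC > qchisq(0.95, df = 2)   # TRUE: reject the joint null of correct coverage and independence at 5%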
