Global Association of Risk Professionals (GARP)
2017 Financial Risk Manager (FRM) Part I: Quantitative Analysis, Seventh Edition by Global Association of Risk Professionals.
Copyright © 2017 by Pearson Education, Inc. All Rights Reserved. Pearson Custom Edition.
Copyright © 2017, 2016, 2015, 2014, 2013, 2012, 2011 by Pearson Education, Inc.
All rights reserved.
Pearson Custom Edition.
This copyright covers material written expressly for this volume by the editor/s as well as the compilation
itself. It does not cover the individual selections herein that first appeared elsewhere. Permission to reprint
these has been obtained by Pearson Education, Inc. for this edition only. Further reproduction by any means,
electronic or mechanical, including photocopying and recording, or by any information storage or retrieval
system, must be arranged with the individual copyright holders noted.
Grateful acknowledgment is made to the following sources for permission to reprint material
copyrighted or controlled by them:
Chapters 2, 3, 4, 6, and 7 by Michael Miller, reprinted from Mathematics and Statistics for Financial Risk
Management, Second Edition (2013), by permission of John Wiley & Sons, Inc.
Chapters 10 and 11 by John Hull, reprinted from Risk Management and Financial Institutions, Fourth Edition,
by permission of John Wiley & Sons, Inc.
Chapters 5, 6, 7, and 8 by Francis X. Diebold, reprinted from Elements of Forecasting, Fourth Edition (2006),
by permission of the author.
Chapter 13, "Simulation Methods," by Chris Brooks, reprinted from Introductory Econometrics for Finance,
Third Edition (2014), by permission of Cambridge University Press.
All trademarks, service marks, registered trademarks, and registered service marks are the property of their
respective owners and are used herein for identification purposes only.
Pearson Education, Inc., 330 Hudson Street, New York, New York 10013
A Pearson Education Company
www.pearsoned.com
Student's t Distribution 39
F-Distribution 40
Triangular Distribution 41
Beta Distribution 42
Mixture Distributions 42

CHAPTER 4 BAYESIAN ANALYSIS 47
Overview 48
Bayes' Theorem 48

CHAPTER 6 LINEAR REGRESSION WITH ONE REGRESSOR 69
The Linear Regression Model 70
Estimating the Coefficients of the Linear Regression Model 72
The Ordinary Least Squares Estimator 73
OLS Estimates of the Relationship Between Test Scores and the Student-Teacher Ratio 74
Why Use the OLS Estimator? 75
CHAPTER 7 REGRESSION WITH A SINGLE REGRESSOR 87
Testing Hypotheses about One of the Regression Coefficients 88
Two-Sided Hypotheses Concerning β1 88
One-Sided Hypotheses Concerning β1 90
Testing Hypotheses about the Intercept β0 91
Conclusion 100
Summary 101
Appendix 101
The Gauss-Markov Conditions and a Proof of the Gauss-Markov Theorem 101
The Gauss-Markov Conditions 101
The Sample Average Is the Efficient Linear Estimator of E(Y) 102
Dr. Rene Stulz*, Everett D. Reese Chair of Banking and Monetary Economics, The Ohio State University
Richard Apostolik, President and CEO, Global Association of Risk Professionals
Michelle McCarthy Beck, MD, Risk Management, Nuveen Investments
Richard Brandt, MD, Operational Risk Management, Citibank
Dr. Christopher Donohue, MD, Global Association of Risk Professionals
Herve Geny, Group Head of Internal Audit, London Stock Exchange
Keith Isaac, FRM, VP, Operational Risk Management, TD Bank
William May, SVP, Global Association of Risk Professionals
Dr. Attilio Meucci, CFA, CRO, KKR
Dr. Victor Ng, CFA, MD, Chief Risk Architect, Market Risk Management and Analysis, Goldman Sachs
Dr. Matthew Pritsker, Senior Financial Economist, Federal Reserve Bank of Boston
Dr. Samantha Roberts, FRM, SVP, Retail Credit Modeling, PNC
Liu Ruixia, Head of Risk Management, Industrial and Commercial Bank of China
Dr. Til Schuermann, Partner, Oliver Wyman
Nick Strange, FCA, Head of Risk Infrastructure, Bank of England, Prudential Regulation Authority
Sverrir Thorvaldsson, FRM, CRO, Islandsbanki

*Chairman
Learning Objectives

After completing this reading you should be able to:
• Describe and distinguish between continuous and discrete random variables.
• Define and distinguish between the probability density function, the cumulative distribution function, and the inverse cumulative distribution function.
• Calculate the probability of an event given a discrete probability function.
• Distinguish between independent and mutually exclusive events.
• Define joint probability, describe a probability matrix, and calculate joint probabilities using probability matrices.
• Define and calculate a conditional probability, and distinguish between conditional and unconditional probabilities.

Excerpt is Chapter 2 of Mathematics and Statistics for Financial Risk Management, Second Edition, by Michael B. Miller.
In this chapter we explore the application of probabilities to risk management. We also introduce basic terminology and notations that will be used throughout the rest of this book.

DISCRETE RANDOM VARIABLES

The concept of probability is central to risk management. Many concepts associated with probability are deceptively simple. The basics are easy, but there are many potential pitfalls.

In this chapter, we will be working with both discrete and continuous random variables. Discrete random variables can take on only a countable number of values: for example, a coin, which can be only heads or tails, or a bond, which can have only one of several letter ratings (AAA, AA, A, BBB, etc.). Assume we have a discrete random variable X, which can take various values, x_i. Further assume that the probability of any given x_i occurring is p_i. We write:

P[X = x_i] = p_i  s.t.  x_i ∈ {x_1, x_2, ..., x_n}   (1.1)

where P[·] is our probability operator.¹

An important property of a random variable is that the sum of all the probabilities must equal one. In other words, the probability of any event occurring must equal one. Something has to happen. Using our current notation, we have:

Σ_{i=1}^{n} p_i = 1   (1.2)

CONTINUOUS RANDOM VARIABLES

In contrast to a discrete random variable, a continuous random variable can take on any value within a given range, and the number of possible values within that range is infinite. For this reason, for a continuous variable, the probability of any specific value occurring is zero.

Even though we cannot talk about the probability of a specific value occurring, we can talk about the probability of a variable being within a certain range. Take, for example, the return on a stock market index over the next year. We can talk about the probability of the index return being between 6% and 7%, but talking about the probability of the return being exactly 6.001% is meaningless. Between 6% and 7% there are an infinite number of possible values. The probability of any one of those infinite values occurring is zero.

For a continuous random variable X, then, we can write:

P[r_1 < X < r_2] = p   (1.3)

which states that the probability of our random variable, X, being between r_1 and r_2 is equal to p.

Probability Density Functions

For a continuous random variable, the probability of a specific event occurring is not well defined, but some events are still more likely to occur than others. Using annual stock market returns as an example, if we look at 50 years of data, we might notice that there are more data points between 0% and 10% than there are between 10% and 20%. That is, the density of points between 0% and 10% is higher than the density of points between 10% and 20%.

For a continuous random variable we can define a probability density function (PDF), which tells us the likelihood of outcomes occurring between any two points. Given our random variable, X, with a probability p of being between r_1 and r_2, we can define our density function, f(x), such that:

∫_{r_1}^{r_2} f(x) dx = p   (1.4)

The density function must also satisfy:

∫_{r_min}^{r_max} f(x) dx = 1   (1.5)

where r_min and r_max define the lower and upper bounds of f(x).

¹ "s.t." is shorthand for "such that". The final term indicates that x_i is a member of a set that includes n possible values, x_1, x_2, ..., x_n. You could read the full equation as: "The probability that X equals x_i is equal to p_i, such that x_i is a member of the set x_1, x_2, to x_n."
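A quick numerical check of both normalization conditions can be sketched in a few lines of Python. The rating probabilities below are assumed values chosen for illustration; the linear density f(x) = x/50 on [0, 10] is the same shape as the zero coupon bond density used in the worked example below.

```python
# Check that discrete probabilities sum to one (1.2) and that a PDF
# integrates to one (1.5). The rating probabilities are assumptions.

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx

# Discrete: a bond rating with assumed probabilities for each outcome.
ratings = {"AAA": 0.05, "AA": 0.20, "A": 0.40, "BBB": 0.35}
total = sum(ratings.values())
print(round(total, 10))  # 1.0

# Continuous: f(x) = x/50 on [0, 10].
pdf = lambda x: x / 50.0
print(round(integrate(pdf, 0.0, 10.0), 6))  # 1.0
```

If either check fails to return one, the candidate probabilities or density are not a legitimate distribution.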
Example 1.1

Question:
Define the probability density function for the price of a zero coupon bond with a notional value of $10 as:

f(x) = x/50  s.t.  0 ≤ x ≤ 10

where x is the price of the bond. What is the probability that the price of the bond is between $8 and $9?

Answer:
First, note that this is a legitimate probability function. By integrating the PDF from its minimum to its maximum, we can show that the probability of any value occurring is indeed one:

∫_0^10 (x/50) dx = [x²/100]_0^10 = 100/100 = 1

If we graph the function, as in Figure 1-1, we can also see that the area under the curve is one. Using simple geometry:

Area of triangle = ½ · base · height = ½ · 10 · 0.2 = 1

To answer the question, we integrate the PDF from $8 to $9:

∫_8^9 (x/50) dx = [x²/100]_8^9 = (9² − 8²)/100 = 17/100 = 17%

Cumulative Distribution Functions

Closely related to the concept of a probability density function is the concept of a cumulative distribution function or cumulative density function (both abbreviated CDF). A cumulative distribution function tells us the probability of a random variable being less than a certain value. The CDF can be found by integrating the probability density function from its lower bound. Traditionally, the cumulative distribution function is denoted by the capital letter of the corresponding density function. For a random variable X with a probability density function f(x), then,

F(a) = ∫_{x_min}^{a} f(x) dx = P[X ≤ a]   (1.6)

Chapter 1 Probabilities • 5
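The bond example can be checked numerically, assuming the density f(x) = x/50 on [0, 10]: integrating the PDF directly over [8, 9] and differencing the CDF should both give 17%. A minimal sketch in pure Python:

```python
# Numerical check of Example 1.1: P[8 <= X <= 9] should be 17%,
# whether we integrate the PDF directly or difference the CDF.

def pdf(x):
    return x / 50.0  # f(x) = x/50, valid for 0 <= x <= 10

def integrate(f, lo, hi, n=100_000):
    # Midpoint rule; exact (up to rounding) for a linear integrand.
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx

def cdf(a):
    # F(a): integrate the PDF from its lower bound, 0, up to a.
    return integrate(pdf, 0.0, a)

p_direct = integrate(pdf, 8.0, 9.0)   # integral of f over [8, 9]
p_via_cdf = cdf(9.0) - cdf(8.0)       # F(9) - F(8)
print(round(p_direct, 4), round(p_via_cdf, 4))  # 0.17 0.17
```

The agreement between the two routes is exactly the point of the CDF definition: probabilities of ranges are differences of cumulative probabilities.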
Example 1.2

Question:
Calculate the cumulative distribution function for the probability density function from the previous problem:

f(x) = x/50  s.t.  0 ≤ x ≤ 10   (1.10)

Then answer the previous problem: What is the probability that the price of the bond is between $8 and $9?

Answer:
The CDF can be found by integrating the PDF:

F(a) = ∫_0^a f(x) dx = ∫_0^a (x/50) dx = (1/50)[x²/2]_0^a = a²/100

As before, the probability of the price ending up between $8 and $9 is 17%:

F(9) − F(8) = 81/100 − 64/100 = 17/100 = 17%

Figure: Relationship between cumulative distribution function and probability density function. [The figure plots the PDF on 0 ≤ x ≤ 10, with the area under the curve up to x = 7 shaded and labeled CDF(a = 7).]

The PDF, in turn, is the derivative of the CDF:

f(x) = dF(x)/dx   (1.7)

That the CDF is nondecreasing is another way of saying that the PDF cannot be negative.
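The derivative relationship in Equation (1.7) can be verified with a finite-difference check, again using the bond example's F(a) = a²/100 and f(x) = x/50:

```python
# Check f(x) = dF(x)/dx (Equation 1.7) for the bond example.

def cdf(a):
    return a * a / 100.0  # F(a) = a^2/100

def pdf(x):
    return x / 50.0       # f(x) = x/50

def derivative(g, x, h=1e-6):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

# The numerical derivative of the CDF should match the PDF pointwise.
for x in (2.0, 5.0, 8.0):
    assert abs(derivative(cdf, x) - pdf(x)) < 1e-8
print("dF/dx matches f at x = 2, 5, 8")
```

Because F is quadratic here, the central difference is essentially exact; for a general CDF the same check holds approximately.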
If instead of wanting to know the probability that a random variable is less than a certain value, what if we want to know the probability that it is greater than a certain value, or between two values? We can handle both cases by adding and subtracting cumulative distribution functions. To find the probability that a variable is between two values, a and b, assuming b is greater than a, we subtract:

P[a < X ≤ b] = ∫_a^b f(x) dx = F(b) − F(a)   (1.8)

To get the probability that a variable is greater than a certain value, we simply subtract from 1:

P[X > a] = 1 − F(a)   (1.9)

This result can be obtained by substituting infinity for b in the previous equation, remembering that the CDF at infinity must be 1.

Inverse Cumulative Distribution Functions

The inverse of the cumulative distribution can also be useful. For example, we might want to know that there is a 5% probability that a given equity index will return less than −10.6%, or that there is a 1% probability of interest rates increasing by more than 2% over a month.

More formally, if F(a) is a cumulative distribution function, then we define F⁻¹(p), the inverse cumulative distribution, as follows:

F(a) = p ⇔ F⁻¹(p) = a  s.t.  0 ≤ p ≤ 1   (1.11)

As we will see in Chapter 3, while some popular distributions have very simple inverse cumulative distribution functions, for other distributions no explicit inverse exists.
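Even when no explicit inverse exists, F⁻¹(p) can be computed numerically, since a CDF is monotone. A sketch using bisection on the bond example's CDF, F(a) = a²/100 on [0, 10], whose exact inverse F⁻¹(p) = 10·√p lets us verify the answer:

```python
# Inverting a CDF numerically by bisection. Works for any continuous,
# strictly increasing CDF; here the closed form is known, so we can check.

import math

def cdf(a):
    return a * a / 100.0  # F(a) = a^2/100 on [0, 10]

def inverse_cdf(p, lo=0.0, hi=10.0, tol=1e-10):
    """Solve F(a) = p for a by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Median of the bond price distribution: F^{-1}(0.5) = sqrt(50)
print(round(inverse_cdf(0.5), 3))       # 7.071
print(round(10.0 * math.sqrt(0.5), 3))  # 7.071
```

Bisection is slow but robust; it only requires that the CDF be monotone on the bracketing interval.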
                          Stock
                  Outperform   Underperform
Bonds  Upgrade        15%            5%        20%
       No Change      30%           25%        55%
       Downgrade       5%           20%        25%
                      50%           50%       100%

Bonds versus stock matrix.

CONDITIONAL PROBABILITY

The concept of independence is closely related to the concept of conditional probability. Rather than trying to determine the probability of the market being up and having rain, we can ask, "What is the probability that the stock market is up given that it is raining?" We can write this as a conditional probability:

P[market up | rain] = p   (1.14)
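The probability matrix above can be queried programmatically: joint probabilities are read straight from the cells, marginals from the row and column totals, and conditionals follow from P[A | B] = P[A and B] / P[B]. For instance, the probability that stocks outperform given a bond upgrade is 15%/20% = 75%. A small sketch (the event names are illustrative labels for the table's rows and columns):

```python
# Joint probabilities from the bonds-versus-stocks matrix, and conditional
# probabilities derived from them.

joint = {
    ("upgrade", "outperform"): 0.15,   ("upgrade", "underperform"): 0.05,
    ("no_change", "outperform"): 0.30, ("no_change", "underperform"): 0.25,
    ("downgrade", "outperform"): 0.05, ("downgrade", "underperform"): 0.20,
}

def p_bond(event):
    """Marginal probability of a bond event: sum over stock outcomes."""
    return sum(p for (b, _), p in joint.items() if b == event)

def p_stock_given_bond(stock_event, bond_event):
    """P[stock event | bond event] = joint / marginal."""
    return joint[(bond_event, stock_event)] / p_bond(bond_event)

print(round(p_bond("upgrade"), 4))                            # 0.2
print(round(p_stock_given_bond("outperform", "upgrade"), 4))  # 0.75
```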
The vertical bar signals that the probability of the first argument is conditional on the second. You would read Equation (1.14) as "The probability of 'market up' given 'rain' is equal to p."

Using the conditional probability, we can calculate the probability that it will rain and that the market will be up:

P[market up and rain] = P[market up | rain] · P[rain]   (1.15)

For example, if there is a 10% probability that it will rain tomorrow and the probability that the market will be up given that it is raining is 40%, then the probability of rain and the market being up is 4%: 40% × 10% = 4%.

Here a bar over a term denotes negation; r̄ain can be read as "not rain."

In general, if a random variable X has n possible values, x_1, x_2, ..., x_n, then the unconditional probability of Y can be calculated as:

P[Y] = Σ_{i=1}^{n} P[Y | x_i] P[x_i]   (1.18)

If the probability of the market being up on a rainy day is the same as the probability of the market being up on a day with no rain, then we say that the market is conditionally independent of rain. If the market is conditionally independent of rain, then the probability that the market is up given that it is raining must be equal to the unconditional probability of the market being up. To see why this is true, we replace the conditional probability of the market being up given no rain with the conditional probability of the market being up given rain in Equation (1.17) (we can do this because we are assuming that these two conditional probabilities are equal):

P[market up] = P[market up | rain] · P[rain] + P[market up | rain] · P[r̄ain]

P[market up] = P[market up | rain] · (P[rain] + P[r̄ain]) = P[market up | rain]   (1.21)

Remember that Equation (1.21) is true only if the market being up and rain are independent. If the weather somehow affects the stock market, however, then the conditional probabilities might not be equal. We could have a situation where:

P[market up | rain] ≠ P[market up | r̄ain]   (1.22)

In this case, the weather and the stock market are no longer independent. We can no longer multiply their probabilities together to get their joint probability.
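The unconditional probability formula, Equation (1.18), looks like this in code for the rain example. The 10% chance of rain and the 40% conditional probability given rain come from the text; the conditional probability given no rain (55%) is an assumed value for illustration:

```python
# Equation (1.18) applied to the rain example. p_up_given_no_rain is an
# assumption, not a value from the text.

p_rain = 0.10
p_up_given_rain = 0.40
p_up_given_no_rain = 0.55  # assumed for illustration

# Equation (1.15): joint probability of rain and the market being up.
p_up_and_rain = p_up_given_rain * p_rain
print(round(p_up_and_rain, 4))  # 0.04

# Unconditional probability of the market being up, summing over rain states.
p_up = p_up_given_rain * p_rain + p_up_given_no_rain * (1.0 - p_rain)
print(round(p_up, 4))  # 0.535

# Since P[up | rain] != P[up | no rain] here, the market and rain are not
# independent, the situation described in Equation (1.22).
```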
Learning Objectives

After completing this reading you should be able to:
• Interpret and apply the mean, standard deviation, and variance of a random variable.
• Calculate the mean, standard deviation, and variance of a discrete random variable.
• Interpret and calculate the expected value of a discrete random variable.
• Calculate and interpret the covariance and correlation between two random variables.
• Calculate the mean and variance of sums of variables.
• Describe the four central moments of a statistical variable or distribution: mean, variance, skewness, and kurtosis.
• Interpret the skewness and kurtosis of a statistical distribution, and interpret the concepts of coskewness and cokurtosis.
• Describe and interpret the best linear unbiased estimator.

Excerpt is Chapter 3 of Mathematics and Statistics for Financial Risk Management, Second Edition, by Michael B. Miller.
In this chapter we will learn how to describe a collection of data in precise statistical terms. Many of the concepts will be familiar, but the notation and terminology might be new.

AVERAGES

Everybody knows what an average is. We come across averages every day, whether they are earned run averages in baseball or grade point averages in school. In statistics there are actually three different types of averages: means, modes, and medians. By far the most commonly used average in risk management is the mean.

Population and Sample Data

If you wanted to know the mean age of people working in your firm, you would simply ask every person in the firm his or her age, add the ages together, and divide by the number of people in the firm. Assuming there are n employees and a_i is the age of the ith employee, then the mean, μ, is simply:

μ = (1/n) Σ_{i=1}^{n} a_i = (1/n)(a_1 + a_2 + ... + a_{n-1} + a_n)   (2.1)

It is important at this stage to differentiate between population statistics and sample statistics. In this example, μ is the population mean. Assuming nobody lied about his or her age, and forgetting about rounding errors and other trivial details, we know the mean age of the people in your firm exactly. We have a complete data set of everybody in your firm; we've surveyed the entire population.

This state of absolute certainty is, unfortunately, quite rare in finance. More often, we are faced with a situation such as this: estimate the mean return of stock ABC, given the most recent year of daily returns. In a situation like this, we assume there is some underlying data-generating process, whose statistical properties are constant over time. The underlying process has a true mean, but we cannot observe it directly. We can only estimate the true mean based on our limited data sample. In our example, assuming n returns, we estimate the mean using the same formula as before:

μ̂ = (1/n) Σ_{i=1}^{n} r_i = (1/n)(r_1 + r_2 + ... + r_{n-1} + r_n)   (2.2)

where μ̂ (pronounced "mu hat") is our estimate of the true mean, based on our sample of n returns. We call this the sample mean.

The median and mode are also types of averages. They are used less frequently in finance, but both can be useful. The median represents the center of a group of data; within the group, half the data points will be less than the median, and half will be greater. The mode is the value that occurs most frequently.

Example 2.1

Question:
Calculate the mean, median, and mode of the following data set:

−20%, −10%, −5%, −5%, 0%, 10%, 10%, 10%, 19%

Answer:

Mean = (1/9)(−20% − 10% − 5% − 5% + 0% + 10% + 10% + 10% + 19%) = 1%
Mode = 10%
Median = 0%

If there is an even number of data points, the median is found by averaging the two centermost points. In the following series:

5%, 10%, 20%, 25%

the median is 15%. The median can be useful for summarizing data that is asymmetrical or contains significant outliers.

A data set can also have more than one mode. If the maximum frequency is shared by two or more values, all of those values are considered modes. In the following example, the modes are 10% and 20%:

5%, 10%, 10%, 10%, 14%, 16%, 20%, 20%, 20%, 24%

In calculating the mean in Equation (2.1) and Equation (2.2), each data point was counted exactly once. In certain situations, we might want to give more or less weight to certain data points. In calculating the average return of stocks in an equity index, we might want to give more weight to larger firms, perhaps weighting their returns in proportion to their market capitalizations. Given n data points, x_1, x_2, ..., x_n, with corresponding weights, w_i, we can define the weighted mean, μ_w, as:
μ_w = Σ_{i=1}^{n} w_i x_i / Σ_{i=1}^{n} w_i   (2.3)

The standard mean from Equation (2.1) can be viewed as a special case of the weighted mean, where all the values have equal weight.

Discrete Random Variables

For a discrete random variable, we can also calculate the mean, median, and mode. For a random variable, X, with possible values, x_i, and corresponding probabilities, p_i, we define the mean, μ, as:

μ = Σ_{i=1}^{n} p_i x_i   (2.4)

Consider, for example, a portfolio whose year-end value, V, has the following probabilities:

P[V = $40] = 20%
P[V = $200] = 45%

At year-end, the value of the portfolio, V, can have only one of three values, and the sum of all the probabilities must sum to 100%. This allows us to calculate the final probability:

P[V = $120] = 100% − 20% − 45% = 35%

The mean of V is then $140:

μ = 0.20 · $40 + 0.35 · $120 + 0.45 · $200 = $140

The mode of the distribution is $200; this is the most likely single outcome. The median of the distribution is $120; half of the outcomes are less than or equal to $120.
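The statistics above can be reproduced with Python's standard library: Example 2.1's mean, median, and mode, followed by the probability-weighted mean of the portfolio value V, which is Equation (2.4) in code:

```python
# Summary statistics for Example 2.1, then the discrete-distribution mean.

import statistics

data = [-0.20, -0.10, -0.05, -0.05, 0.00, 0.10, 0.10, 0.10, 0.19]
print(round(statistics.mean(data), 4))  # 0.01, i.e., 1%
print(statistics.median(data))          # 0.0
print(statistics.mode(data))            # 0.1

# Discrete distribution of the year-end portfolio value V:
# the mean is a probability-weighted sum, Equation (2.4).
values = [40.0, 120.0, 200.0]
probs = [0.20, 0.35, 0.45]
mean_v = sum(p * v for p, v in zip(probs, values))
print(round(mean_v, 2))  # 140.0
```

Note that the discrete mean is just the weighted mean of Equation (2.3) with the probabilities as weights (which already sum to one).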
To calculate the median, we need to find m, such that the integral of f(x) from the lower bound of f(x), zero, to m is equal to 0.50. That is, we need to find:

∫_0^m (x/50) dx = 0.50

First we solve the left-hand side of the equation:

∫_0^m (x/50) dx = [x²/100]_0^m = m²/100

Setting this equal to 0.50 gives m² = 50, so the median is m = √50 ≈ $7.07.

As with the median, it is a common mistake, based on inspection of the PDF, to guess that the mean is 5. However, what the PDF is telling us is that outcomes between 5 and 10 are much more likely than values between 0 and 5 (the PDF is higher between 5 and 10 than between 0 and 5). This is why the mean is greater than 5.

Figure: Probability density function. [The figure plots f(x) on 0 ≤ x ≤ 10, rising linearly from 0 to 0.2.]

EXPECTATIONS

On January 15, 2005, the Huygens space probe landed on the surface of Titan, the largest moon of Saturn. This was the culmination of a seven-year-long mission. During its descent and for over an hour after touching down on the surface, Huygens sent back detailed images, scientific readings, and even sounds from a strange world. There are liquid oceans on Titan, the landing site was littered with "rocks" composed of water ice, and weather on the moon includes methane rain. The Huygens probe was named after Christiaan Huygens, a Dutch polymath who first discovered Titan in 1655. In addition to astronomy and physics, Huygens had more prosaic interests, including probability theory. Originally published in Latin in 1657, De Ratiociniis in Ludo Aleae, or On the Logic of Games of Chance, was one of the first texts to formally explore one of the most important concepts in probability theory, namely expectations.
Like many of his contemporaries, Huygens was interested in games of chance. As he described it, if a game has a 50% probability of paying $3 and a 50% probability of paying $7, then this is, in a way, equivalent to having $5 with certainty. This is because we expect, on average, to win $5 in this game:

50% · $3 + 50% · $7 = $5   (2.9)

As one can already see, the concepts of expectations and averages are very closely linked. In the current example, if we play the game only once, there is no chance of winning exactly $5; we can win only $3 or $7. Still, even if we play the game only once, we say that the expected value of the game is $5. That we are talking about the mean of all the potential payouts is understood.

We can express the concept of expectations more formally using the expectation operator. We could state that the random variable, X, has an expected value of $5 as follows:

E[X] = 0.50 · $3 + 0.50 · $7 = $5   (2.10)

where E[·] is the expectation operator.¹

In this example, the mean and the expected value have the same numeric value, $5. The same is true for discrete and continuous random variables. The expected value of a random variable is equal to the mean of the random variable.

While the value of the mean and the expected value may be the same in many situations, the two concepts are not exactly the same. In many situations in finance and risk management, the terms can be used interchangeably. The difference is often subtle.

As the name suggests, expectations are often thought of as being forward looking. Pretend we have a financial asset for which next year's mean annual return is known and equal to 15%. This is not an estimate; in this hypothetical scenario, we actually know that the mean is 15%. We say that the expected value of the return next year is 15%. We expect the return to be 15%, because the probability-weighted mean of all the possible outcomes is 15%.

Now pretend that we don't actually know what the mean return of the asset is, but we have 10 years' worth of historical data for which the mean is 15%. In this case the expected value may or may not be 15%. If we decide that the expected value is equal to 15%, based on the data, then we are making two assumptions: first, we are assuming that the returns in our sample were generated by the same random process over the entire sample period; second, we are assuming that the returns will continue to be generated by this same process in the future. These are very strong assumptions. If we have other information that leads us to believe that one or both of these assumptions are false, then we may decide that the expected value is something other than 15%. In finance and risk management, we often assume that the data we are interested in are being generated by a consistent, unchanging process. Testing the validity of this assumption can be an important part of risk management in practice.

The concept of expectations is also a much more general concept than the concept of the mean. Using the expectation operator, we can derive the expected value of functions of random variables. As we will see in subsequent sections, the concept of expectations underpins the definitions of other population statistics (variance, skewness, kurtosis), and is important in understanding regression analysis and time series analysis. In these cases, even when we could use the mean to describe a calculation, in practice we tend to talk exclusively in terms of expectations.

Example 2.4

Question:
At the start of the year, you are asked to price a newly issued zero coupon bond. The bond has a notional of $100. You believe there is a 20% chance that the bond will default, in which case it will be worth $40 at the end of the year. There is also a 30% chance that the bond will be downgraded, in which case it will be worth $90 in a year's time. If the bond does not default and is not downgraded, it will be worth $100. Use a continuous interest rate of 5% to determine the current price of the bond.

Answer:
We first need to determine the expected future value of the bond, that is, the expected value of the bond in one year's time. We are given the following:

P[V_{t+1} = $40] = 0.20
P[V_{t+1} = $90] = 0.30

¹ Those of you with a background in physics might be more familiar with the term expectation value and the notation ⟨X⟩ rather than E[X]. This is a matter of convention. Throughout this book we use the term expected value and E[·], which are currently more popular in finance and econometrics. Risk managers should be familiar with both conventions.
Because there are only three possible outcomes, the probability of no downgrade and no default must be 50%:

P[V_{t+1} = $100] = 1 - 0.20 - 0.30 = 0.50

The expected value of the bond in one year is then:

E[V_{t+1}] = 0.20 · $40 + 0.30 · $90 + 0.50 · $100 = $85

Discounting this expected future value at the continuous rate of 5% gives the current price of the bond: $85 · e^(-0.05) ≈ $80.85.

Imagine we have two binary options. Each pays either $100 or nothing, depending on the value of some underlying asset at expiration. The probability of receiving $100 is 50% for both options. Further, assume that it is always the case that if the first option pays $100, the second pays $0, and vice versa. The expected value of each option separately is clearly $50. If we denote the payout of the first option as X and the payout of the second as Y, we have:

E[X] = E[Y] = 0.50 · $100 + 0.50 · $0 = $50 (2.13)

It follows that E[X]E[Y] = $50 × $50 = $2,500. In each scenario, though, one option is valued at zero, so the product of the payouts is always zero: $100 · $0 = $0 · $100 = $0. The expected value of the product of the two option payouts is:

E[XY] = 0.50 · $100 · $0 + 0.50 · $0 · $100 = $0 (2.14)

In this case, the product of the expected values and the expected value of the product are clearly not equal. In the special case where E[XY] = E[X]E[Y], we say that X and Y are independent.

Question:

Given the following equation,

y = (x + 5)³ + x² + 10x

what is the expected value of y? Assume the following:

E[x] = 4
E[x²] = 9
E[x³] = 12

Answer:

Note that E[x²] and E[x³] cannot be derived from knowledge of E[x]. In this problem, E[x²] ≠ E[x]² and E[x³] ≠ E[x]³.

To find the expected value of y, then, we first expand the term (x + 5)³ within the expectation operator:

E[y] = E[(x + 5)³ + x² + 10x] = E[x³ + 16x² + 85x + 125]

Because the expectation operator is linear, we can separate the terms in the summation and move the constants outside the expectation operator:

E[y] = E[x³] + E[16x²] + E[85x] + E[125]
     = E[x³] + 16E[x²] + 85E[x] + 125

At this point, we can substitute in the values for E[x], E[x²], and E[x³], which were given at the start of the exercise:

E[y] = 12 + 16 · 9 + 85 · 4 + 125 = 621

This gives us the final answer, 621.

VARIANCE AND STANDARD DEVIATION

The variance of a random variable measures how noisy or unpredictable that random variable is. Variance is defined as the expected value of the difference between the variable and its mean squared:

σ² = E[(X - μ)²] (2.18)

where σ² is the variance of the random variable X with mean μ.

The square root of variance, typically denoted by σ, is called standard deviation. In finance we often refer to standard deviation as volatility. This is analogous to referring to the mean as the average. Standard deviation is a mathematically precise term, whereas volatility is a more general concept.

Example 2.6

Question:

A derivative has a 50/50 chance of being worth either +10 or -10 at expiry. What is the standard deviation of the derivative's value?

Answer:

μ = 0.50 · 10 + 0.50 · (-10) = 0
σ² = 0.50 · (10 - 0)² + 0.50 · (-10 - 0)² = 0.5 · 100 + 0.5 · 100 = 100
σ = 10

In the previous example, we were calculating the population variance and standard deviation. All of the possible outcomes for the derivative were known.

To calculate the sample variance of a random variable X based on n observations, x₁, x₂, ..., xₙ, we can use the following formula:

σ̂²_x = (1/(n - 1)) Σ_{i=1}^{n} (x_i - μ̂_x)²

E[σ̂²_x] = σ²_x (2.19)

where μ̂_x is the sample mean as in Equation (2.2). Given that we have n data points, it might seem odd that we are dividing the sum by (n - 1) and not n. The reason has to do with the fact that μ̂_x itself is an estimate of the true mean, which also contains a fraction of each x_i. It turns out that dividing by (n - 1), not n, produces an unbiased estimate of σ². If the mean is known or we are calculating the population variance, then we divide by n. If instead the mean is also being estimated, then we divide by n - 1.

Equation (2.18) can easily be rearranged as follows (the proof of this equation is also left as an exercise):

σ² = E[X²] - μ² = E[X²] - E[X]² (2.20)

Note that variance can be nonzero only if E[X²] ≠ E[X]². When writing computer programs, this last version of the variance formula is often useful, since it allows us to calculate the mean and the variance in the same loop.

In finance it is often convenient to assume that the mean of a random variable is equal to zero. For example, based on theory, we might expect the spread between two equity indexes to have a mean of zero in the long run. In this case, the variance is simply the mean of the squared returns.

Example 2.7

Question:

Assume that the mean of daily Standard & Poor's (S&P) 500 Index returns is zero. You observe the following returns over the course of 10 days:

7%, -4%, 11%, 8%, 3%, 9%, -21%, 10%, -9%, -1%

Estimate the standard deviation of daily S&P 500 Index returns.
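As the text notes, Equation (2.20) lets a program accumulate the mean and the variance in a single pass over the data. The sketch below applies both that identity and the zero-mean estimator of Example 2.7 to the ten observed returns (the variable names are ours, not from the text):

```python
import math

# Daily S&P 500 returns from Example 2.7, in percent.
returns = [7, -4, 11, 8, 3, 9, -21, 10, -9, -1]

# If the mean is assumed to be zero, the variance estimate is just the
# mean of the squared returns; no (n - 1) correction is needed because
# the mean is not being estimated from the data.
var_zero_mean = sum(r ** 2 for r in returns) / len(returns)
vol_zero_mean = math.sqrt(var_zero_mean)  # roughly 9.8%

# Equation (2.20), sigma^2 = E[X^2] - E[X]^2, lets us accumulate the
# mean and the variance in the same loop.
n = 0
total = 0.0
total_sq = 0.0
for r in returns:
    n += 1
    total += r
    total_sq += r * r
mean = total / n
pop_variance = total_sq / n - mean ** 2
```

Under the zero-mean assumption the estimate comes out near 9.8%, slightly different from what the one-loop population formula gives, because the latter subtracts the estimated mean.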
+ (1/3)(1 - 0)(1 - 2/3) = 0

Because the covariance is zero, the correlation is also zero:

ρ_XY = 0

There is no need to calculate the variances or standard deviations. As forewarned, even though X and Y are clearly related, their correlation is zero.

APPLICATION: PORTFOLIO VARIANCE AND HEDGING

If we have a portfolio of securities and we wish to determine the variance of that portfolio, all we need to know is the variance of the underlying securities and their respective correlations.

For example, if we have two securities with random returns X_A and X_B, with means μ_A and μ_B and standard deviations σ_A and σ_B, respectively, we can calculate the variance of X_A plus X_B as follows:

σ²_{A+B} = σ²_A + σ²_B + 2ρ_{AB}σ_Aσ_B (2.31)

where ρ_AB is the correlation between X_A and X_B. The proof is left as an exercise. Notice that the last term can either increase or decrease the total variance. Both standard deviations must be positive; therefore, if the correlation is positive, the overall variance will be higher than in the case where the correlation is negative.

If the variance of both securities is equal, then Equation (2.31) simplifies to:

σ²_{A+B} = 2σ²(1 + ρ_{AB}) where σ²_A = σ²_B = σ² (2.32)

In the special case where the correlation between the two securities is zero, we can further simplify our equation. For the standard deviation:

σ_{A+B} = √2 σ (2.33)

We can extend Equation (2.31) to any number of variables:

Y = Σ_{i=1}^{n} X_i

σ²_Y = Σ_{i=1}^{n} Σ_{j=1}^{n} ρ_{ij}σ_iσ_j (2.34)

In the case where all of the X_i's are uncorrelated and all the variances are equal to σ², Equation (2.34) simplifies to:

σ_Y = √n σ iff ρ_{ij} = 0 ∀ i ≠ j (2.35)

This is the famous square root rule for the addition of uncorrelated variables. There are many situations in statistics in which we come across collections of random variables that are independent and have the same statistical properties. We term these variables independent and identically distributed (i.i.d.). In risk management we might have a large portfolio of securities, which can be approximated as a collection of i.i.d. variables. As we will see in subsequent chapters, this i.i.d. assumption also plays an important role in estimating the uncertainty inherent in statistics derived from sampling, and in the analysis of time series. In each of these situations, we will come back to this square root rule.

By combining Equation (2.31) with Equation (2.22), we arrive at an equation for calculating the variance of a linear combination of variables. If Y is a linear combination of X_A and X_B, such that:

Y = aX_A + bX_B
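Equations (2.31) and (2.35) are easy to check numerically. The sketch below uses hypothetical volatilities and correlations of our own choosing:

```python
import math

def portfolio_variance(sigma_a, sigma_b, rho):
    """Variance of X_A + X_B per Equation (2.31)."""
    return sigma_a ** 2 + sigma_b ** 2 + 2 * rho * sigma_a * sigma_b

# Hypothetical securities with 10% and 20% volatility.
v_pos = portfolio_variance(0.10, 0.20, 0.5)   # positive correlation
v_neg = portfolio_variance(0.10, 0.20, -0.5)  # negative correlation
# The positively correlated pair has the higher total variance.

# Square root rule, Equation (2.35): for n uncorrelated variables with
# identical volatility sigma, the volatility of the sum is sqrt(n) * sigma.
n, sigma = 16, 0.10
vol_sum = math.sqrt(n * sigma ** 2)  # 4 * 0.10 = 0.40
```

With these inputs the positively correlated portfolio has variance 0.07 versus 0.03 for the negatively correlated one, illustrating how the cross term can work for or against a hedge.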
where σ is the standard deviation of X, and μ is the mean of X.

By standardizing the central moment, it is much easier to compare two random variables. Multiplying a random variable by a constant will not change the skewness.

[Figure: Negative skew versus no skew.]

Negative skew is generally considered to be more risky. Historical data suggest that many financial assets exhibit negative skew.

As with variance, the equation for skewness differs depending on whether we are calculating the population or the sample statistic. The skewness of a random variable X, based on n observations, x₁, ..., xₙ, can be calculated as:

ŝ = (1/n) Σ_{i=1}^{n} ((x_i - μ)/σ)³ (2.44)

where μ is the population mean and σ is the population standard deviation. Similar to our calculation of sample variance, if we are calculating the sample skewness there is going to be an overlap with the calculation of the sample mean and sample standard deviation. We need to correct for that. The sample skewness can be calculated as:

s̃ = (n/((n - 1)(n - 2))) Σ_{i=1}^{n} ((x_i - μ̂)/σ̂)³ (2.45)

[Figure: Positive skew versus no skew.]

Based on Equation (2.20), for variance, it is tempting to guess that the formula for the third central moment can be written simply in terms of E[X³] and μ. Be careful, as the two sides of this equation are not equal:

E[(X - μ)³] ≠ E[X³] - μ³ (2.48)
K = E[(X - μ)⁴]/σ⁴

The sample kurtosis can be calculated as:

K̂ = (n(n + 1)/((n - 1)(n - 2)(n - 3))) Σ_{i=1}^{n} ((x_i - μ̂)/σ̂)⁴ (2.50)

In the next chapter we will study the normal distribution, which has a kurtosis of 3. Because normal distributions are so common, many people refer to "excess kurtosis," which is simply the kurtosis minus 3:

K_excess = K - 3 (2.51)

In this way, the normal distribution has an excess kurtosis of 0. Distributions with positive excess kurtosis are termed leptokurtotic. Distributions with negative excess kurtosis are termed platykurtotic. Be careful; by default, many applications calculate excess kurtosis, not kurtosis.

[Figure: High kurtosis versus no excess kurtosis.]

When we are also estimating the mean and variance, calculating the sample excess kurtosis is somewhat more complicated than just subtracting 3. If we have n points, then the correct formula is:

K̂_excess = K̂ - 3(n - 1)²/((n - 2)(n - 3)) (2.52)

where K̂ is the sample kurtosis from Equation (2.50). As n increases, the last term on the right-hand side converges to 3.

[Figure: Low kurtosis versus no excess kurtosis.]
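The sample skewness and excess-kurtosis formulas, Equations (2.45), (2.50), and (2.52), can be sketched as follows (the helper names are ours; this is an illustration, not code from the text):

```python
import math

def sample_skewness(xs):
    """Sample skewness per Equation (2.45)."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in xs)

def sample_excess_kurtosis(xs):
    """Sample kurtosis per Equation (2.50), with the excess-kurtosis
    correction from Equation (2.52)."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    k = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) \
        * sum(((x - m) / s) ** 4 for x in xs)
    return k - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

# A symmetric sample has zero sample skewness; a sample with one large
# positive outlier is positively skewed.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
```

Note the small-sample corrections: with only a handful of points, the (n - 1)(n - 2) and (n - 2)(n - 3) factors matter, which is why many software packages report slightly different skew and kurtosis values than the naive population formulas.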
COSKEWNESS AND COKURTOSIS
[Figure 2-7: Combined fund returns.]

that the covariance between managers A and B is the same as the covariance between managers C and D.

If we combine A and B in an equally weighted portfolio and combine C and D in a separate equally weighted portfolio, we get the returns shown in Figure 2-7. The two portfolios have the same mean and
[Figure: Funds C and D.]

or cokurtosis. One reason that many models avoid these higher-order cross moments is practical. As the number of variables increases, the number of nontrivial cross moments increases rapidly. With 10 variables there are 30 coskewness parameters and 65 cokurtosis parameters. With 100 variables, these numbers increase to 171,600 and over 4 million, respectively. Figure 2-11 compares the number of nontrivial cross moments for a variety of sample sizes. In most cases there is simply not enough data to calculate all of these cross moments.

Risk models with time-varying volatility (e.g., GARCH) or time-varying correlation can display a wide range of behaviors with very few free parameters. Copulas can also be used to describe complex interactions between variables that go beyond covariances, and have become popular in risk management in recent years. All of these approaches capture the essence of coskewness and cokurtosis, but in a more tractable framework. As a risk manager, it is important to differentiate between these models, which address the higher-order cross moments indirectly, and models that simply omit these risk factors altogether.

         A+B     C+D
S_XXY    0.99   -0.58
S_XYY    0.58   -0.99
population data. In statistics we refer to these parameter estimates, or to the method of obtaining the estimate, as an estimator. For example, at the start of the chapter, we introduced an estimator for the sample mean:

μ̂ = (1/n) Σ_{i=1}^{n} x_i (2.56)

This formula for computing the mean is so popular that we're likely to take it for granted. Why this equation, though? One justification that we gave earlier is that this particular estimator provides an unbiased estimate of the true mean. That is:

E[μ̂] = μ (2.57)

Clearly, a good estimator should be unbiased. That said, for a given data set, we could imagine any number of unbiased estimators of the mean. For example, assuming there are three data points in our sample, x₁, x₂, and x₃, the following equation:

μ̃ = 0.75x₁ + 0.25x₂ + 0.00x₃ (2.58)

is also an unbiased estimator of the mean. Intuitively, this new estimator seems strange; we have put three times as much weight on x₁ as on x₂, and we have put no weight on x₃. There is no reason, as we have described the problem, to believe that any one data point is better than any other, so distributing the weight equally might seem more logical. Still, the estimator in Equation (2.58) is unbiased, and our criterion for judging this estimator to be strange seems rather subjective. What we need is an objective measure for comparing different unbiased estimators.

As we will see in coming chapters, just as we can measure the variance of random variables, we can measure the variance of parameter estimators as well. For example, if we measure the sample mean of a random variable several times, we can get a different answer each time. Imagine rolling a die 10 times and taking the average of all the rolls. Then repeat this process again and again. The sample mean is potentially different for each sample of 10 rolls. It turns out that this variability of the sample mean, or any other distribution parameter, is a function not only of the underlying variable, but of the form of the estimator as well.

When choosing among all the unbiased estimators, statisticians typically try to come up with the estimator with the minimum variance. In other words, we want to choose a formula that produces estimates for the parameter that are consistently close to the true value of the parameter. If we limit ourselves to estimators that can be written as a linear combination of the data, we can often prove that a particular candidate has the minimum variance among all the potential unbiased estimators. We call an estimator with these properties the best linear unbiased estimator, or BLUE. All of the estimators that we produced in this chapter for the mean, variance, covariance, skewness, and kurtosis are either BLUE or the ratio of BLUE estimators.
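The claim that the equal-weight mean beats the estimator in Equation (2.58) on variance can be illustrated with the die-rolling thought experiment from the text; the simulation below is our own sketch:

```python
import random
import statistics

random.seed(42)

def equal_weight(xs):
    return sum(xs) / len(xs)

def unequal_weight(xs):
    # The unbiased but inefficient estimator from Equation (2.58).
    return 0.75 * xs[0] + 0.25 * xs[1] + 0.00 * xs[2]

trials_eq, trials_uneq = [], []
for _ in range(20000):
    sample = [random.randint(1, 6) for _ in range(3)]  # three die rolls
    trials_eq.append(equal_weight(sample))
    trials_uneq.append(unequal_weight(sample))

# Both estimators center on the true mean of a fair die, 3.5, but the
# equal-weight estimator varies much less from sample to sample.
var_eq = statistics.pvariance(trials_eq)
var_uneq = statistics.pvariance(trials_uneq)
```

Both simulated averages land near 3.5 (unbiasedness), while the sample-to-sample variance of the unequal-weight estimator is nearly double that of the equal-weight mean, which is the sense in which the latter is "best."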
Learning Objectives

After completing this reading you should be able to:

• Distinguish the key properties among the following distributions: uniform distribution, Bernoulli distribution, binomial distribution, Poisson distribution, normal distribution, lognormal distribution, chi-squared distribution, Student's t, and F-distributions, and identify common occurrences of each distribution.
• Describe the central limit theorem and the implications it has when combining independent and identically distributed (i.i.d.) random variables.
• Describe i.i.d. random variables and the implications of the i.i.d. assumption when combining random variables.
• Describe a mixture distribution and explain the creation and characteristics of mixture distributions.

Excerpt is Chapter 4 of Mathematics and Statistics for Financial Risk Management, Second Edition, by Michael B. Miller.
In Chapter 1, we were introduced to random variables. In nature and in finance, random variables tend to follow certain patterns, or distributions. In this chapter we will learn about some of the most widely used probability distributions in risk management.

UNIFORM DISTRIBUTION

For a continuous random variable, X, recall that the probability of an outcome occurring between b₁ and b₂ can be found by integrating as follows:

P[b₁ ≤ X ≤ b₂] = ∫_{b₁}^{b₂} f(x) dx

In other words, the probability density is constant and equal to c between b₁ and b₂, and zero everywhere else. Figure 3-1 shows the plot of a uniform distribution's probability density function.

[Figure 3-1: Probability density function of a uniform distribution.]

Because the probability of any outcome occurring must be one, we can find the value of c as follows:

1 = ∫_{b₁}^{b₂} c dx = c(b₂ - b₁), so c = 1/(b₂ - b₁)
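The constant density c = 1/(b₂ - b₁) and the uniform CDF can be sketched in a few lines (the helper names are ours):

```python
def uniform_pdf(x, b1, b2):
    """Density of a uniform distribution on [b1, b2]: the constant
    c = 1 / (b2 - b1) inside the interval, zero outside."""
    c = 1.0 / (b2 - b1)
    return c if b1 <= x <= b2 else 0.0

def uniform_cdf(a, b1, b2):
    """P[X <= a] = (a - b1) / (b2 - b1), clipped to [0, 1]."""
    return min(max((a - b1) / (b2 - b1), 0.0), 1.0)
```

As required, the CDF is zero at the lower bound, one at the upper bound, and rises linearly in between.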
In other words, the mean is just the average of the start and end values of the distribution.

Similarly, for the variance, we have:

σ² = ∫_{b₁}^{b₂} c(x - μ)² dx = (1/12)(b₂ - b₁)² (3.4)

This result is not as intuitive.

For the special case where b₁ = 0 and b₂ = 1, we refer to the distribution as a standard uniform distribution. Standard uniform distributions are extremely common. The default random number generator in most computer programs (technically a pseudorandom number generator) is typically a standard uniform random variable. Because these random number generators are so ubiquitous, uniform distributions often serve as the building blocks for computer models in finance.

To calculate the cumulative distribution function (CDF) of the uniform distribution, we simply integrate the PDF. Again, assuming a lower bound of b₁ and an upper bound of b₂, we have:

P[X ≤ a] = ∫_{b₁}^{a} c dz = c[z]_{b₁}^{a} = (a - b₁)/(b₂ - b₁) (3.5)

As required, when a equals b₁, we are at the minimum, and the CDF is zero. Similarly, when a equals b₂, we are at the maximum, and the CDF equals one.

As we will see later, we can use combinations of uniform distributions to approximate other more complex distributions. As we will see in the next section, uniform distributions can also serve as the basis of other simple distributions, including the Bernoulli distribution.

BERNOULLI DISTRIBUTION

Bernoulli's principle explains how the flow of fluids or gases leads to changes in pressure. It can be used to explain a number of phenomena, including how the wings of airplanes provide lift. Without it, modern aviation would be impossible. Bernoulli's principle is named after Daniel Bernoulli, an eighteenth-century Dutch-Swiss mathematician and scientist. Daniel came from a family of accomplished mathematicians. Daniel and his cousin Nicolas Bernoulli first described and presented a proof for the St. Petersburg paradox. But it is not Daniel or Nicolas, but rather their uncle, Jacob Bernoulli, for whom the Bernoulli distribution is named. In addition to the Bernoulli distribution, Jacob is credited with first describing the concept of continuously compounded returns, and, along the way, discovering Euler's number, e.

The Bernoulli distribution is incredibly simple. A Bernoulli random variable is equal to either zero or one. If we define p as the probability that X equals one, we have:

P[X = 1] = p and P[X = 0] = 1 - p (3.6)

We can easily calculate the mean and variance of a Bernoulli variable:

μ = p · 1 + (1 - p) · 0 = p
σ² = p · (1 - p)² + (1 - p) · (0 - p)² = p(1 - p) (3.7)

Binary outcomes are quite common in finance: a bond can default or not default; the return of a stock can be positive or negative; a central bank can decide to raise rates or not to raise rates.

In a computer simulation, one way to model a Bernoulli variable is to start with a standard uniform variable. Conveniently, both the standard uniform variable and our Bernoulli probability, p, range between zero and one. If the draw from the standard uniform variable is less than p, we set our Bernoulli variable equal to one; likewise, if the draw is greater than or equal to p, we set the Bernoulli variable to zero (see Figure 3-2).

BINOMIAL DISTRIBUTION

A binomial distribution can be thought of as a collection of Bernoulli random variables. If we have two independent bonds and the probability of default for both is 10%, then there are three possible outcomes: no bond defaults, one bond defaults, or both bonds default. Labeling the number of defaults K:

P[K = 0] = (1 - 10%)² = 81%
P[K = 1] = 2 · 10% · (1 - 10%) = 18%
P[K = 2] = 10%² = 1%

Notice that for K = 1 we have multiplied the probability of a bond defaulting, 10%, and the probability of a bond not defaulting, 1 - 10%, by 2. This is because there are two ways in which exactly one bond can default: The first bond defaults and the second does not, or the second bond defaults and the first does not.
Chapter 3 Distributions • 31
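The uniform-to-Bernoulli recipe described in the text, and the two-bond default probabilities, can be sketched in a short simulation (our own illustration; the simulated frequencies are approximate and depend on the seed):

```python
import random

random.seed(7)

def bernoulli(p):
    # Draw a standard uniform; return 1 if the draw falls below p.
    return 1 if random.random() < p else 0

# Two independent bonds, each with a 10% chance of default.
p = 0.10
trials = 100000
counts = {0: 0, 1: 0, 2: 0}
for _ in range(trials):
    defaults = bernoulli(p) + bernoulli(p)
    counts[defaults] += 1

freq_no_default = counts[0] / trials  # should land near 81%
freq_one_default = counts[1] / trials  # should land near 18%
```

With enough trials the simulated frequencies settle near the exact binomial probabilities of 81%, 18%, and 1%.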
[Figure 3-2: How to generate a Bernoulli distribution from a uniform distribution.]

If we now have three bonds, still independent and with a 10% chance of defaulting, then:

P[K = 0] = (1 - 10%)³ = 72.9%
P[K = 1] = 3 · 10% · (1 - 10%)² = 24.3%
P[K = 2] = 3 · 10%² · (1 - 10%) = 2.7%
P[K = 3] = 10%³ = 0.1%

Notice that there are three ways in which we can get exactly one default and three ways in which we can get exactly two defaults.

We can extend this logic to any number of bonds. If we have n bonds, the number of ways in which k of those bonds can default is given by the number of combinations:

(n choose k) = n!/(k!(n - k)!) (3.8)

Similarly, if the probability of one bond defaulting is p, then the probability of any particular k bonds defaulting is simply p^k(1 - p)^(n-k). Putting these two together, we can calculate the probability of any k bonds defaulting as:

P[K = k] = (n choose k) p^k(1 - p)^(n-k) (3.9)

Example 3.1

Question:

Assume we have four bonds, each with a 10% probability of defaulting over the next year. The event of default for any given bond is independent of the other bonds defaulting. What is the probability that zero, one, two, three, or all of the bonds default? What is the mean number of defaults? The standard deviation?

Answer:

We can calculate the probability of each possible outcome as follows:

# of Defaults   (n choose k)   p^k(1 - p)^(n-k)   Probability
0               1              65.61%             65.61%
1               4              7.29%              29.16%
2               6              0.81%              4.86%
3               4              0.09%              0.36%
4               1              0.01%              0.01%
                                                  100.00%

We can calculate the mean number of defaults two ways. The first is to use our formula for the mean:

μ = np = 4 · 10% = 0.40
On average there are 0.40 defaults. The other way we could arrive at this result is to use the probabilities from the table. We get:

μ = Σ_{i=0}^{4} p_i x_i = 65.61% · 0 + 29.16% · 1 + 4.86% · 2 + 0.36% · 3 + 0.01% · 4 = 0.40

This is consistent with our earlier result.

To calculate the standard deviation, we also have two choices. Using our formula for variance, we have:

σ² = np(1 - p) = 4 · 10%(1 - 10%) = 0.36
σ = 0.60

As with the mean, we could also use the probabilities from the table:

σ² = Σ_{i=0}^{4} p_i(x_i - μ)² = 65.61% · 0.16 + 29.16% · 0.36 + 4.86% · 2.56 + 0.36% · 6.76 + 0.01% · 12.96 = 0.36
σ = 0.60

Again, this is consistent with our earlier result.

Figure 3-3 shows binomial distributions with p = 0.50, for n = 4, 16, and 64. The highest point of each distribution occurs in the middle. In other words, when p = 0.50, the most likely outcome for a binomial random variable, the mode, is n/2 when n is even, or the whole numbers either side of n/2 when n is odd.

[Figure 3-3: Binomial probability density functions for n = 4, 16, and 64.]

POISSON DISTRIBUTION

Another useful discrete distribution is the Poisson distribution, named for the French mathematician Simeon Denis Poisson. For a Poisson random variable X,

P[X = n] = (λⁿ/n!) e^(-λ) (3.10)

for some constant λ, it turns out that both the mean and variance of X are equal to λ. Figure 3-4 shows the probability density functions for three Poisson distributions.

The Poisson distribution is often used to model the occurrence of events over time, for example, the number of bond defaults in a portfolio or the number of crashes in equity markets. In this case, n is the number of events that occur in an interval, and λ is the expected number of events in the interval. Poisson distributions are often used to model jumps in jump-diffusion models.

If the rate at which events occur over time is constant, and the probability of any one event occurring is independent of all other events, then we say that the events follow a Poisson process, where:

P[X = n] = ((λt)ⁿ/n!) e^(-λt)

where t is the amount of time elapsed, so that the expected number of events before time t is equal to λt.

Example 3.2

Question:

Assume that defaults in a large bond portfolio follow a Poisson process. The expected number of defaults each month is four. What is the probability that there are exactly three defaults over the course of one month? Over two months?

Answer:

For the one-month horizon, the expected number of defaults is λt = 4, so:

P[X = 3] = (4³/3!) e⁻⁴ ≈ 19.5%

Over two months the expected number of defaults is λt = 4 · 2 = 8, so:

P[X = 3] = (8³/3!) e⁻⁸ ≈ 2.9%
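Examples 3.1 and 3.2 can be verified with a few lines of Python (the helper function names are ours):

```python
import math

# Example 3.1: four independent bonds, each with a 10% chance of
# defaulting over the next year.
def binom_pmf(k, n, p):
    """P[K = k] per Equation (3.9)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.10
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = n * p                          # 0.40 defaults on average
std_dev = math.sqrt(n * p * (1 - p))  # 0.60

# Example 3.2: Poisson-distributed defaults, four expected per month.
def poisson_pmf(k, lam):
    """P[X = k] per Equation (3.10)."""
    return lam ** k / math.factorial(k) * math.exp(-lam)

p_three_one_month = poisson_pmf(3, 4)   # about 19.5%
p_three_two_months = poisson_pmf(3, 8)  # about 2.9%
```

The computed binomial probabilities match the table in Example 3.1 (65.61% down to 0.01%), and the two Poisson probabilities match the one-month and two-month answers.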
NORMAL DISTRIBUTION
This is very convenient. For example, if the log returns of individual stocks are independent and normally distributed, then the average return of those stocks will also be normally distributed.

When a normal distribution has a mean of zero and a standard deviation of one, it is referred to as a standard normal distribution:

φ(x) = (1/√(2π)) e^(-x²/2) (3.15)

It is the convention to denote the standard normal PDF by φ, and the cumulative standard normal distribution by Φ. Because a linear combination of independent normal distributions is also normal, standard normal distributions are the building blocks of many financial models. To get a normal variable with a standard deviation of σ and a mean of μ, we simply multiply the standard normal variable by σ and add μ.

There is no explicit solution for the cumulative standard normal distribution, or for its inverse. That said, most statistical packages will be able to calculate values for both functions. To calculate values for the CDF or inverse CDF for the normal distribution, there are a number of well-known numerical approximations.

Because the normal distribution is so widely used, most practitioners are expected to have at least a rough idea of how much of the distribution falls within one, two, or three standard deviations. In risk management it is also useful to know how many standard deviations are needed to encompass 95% or 99% of outcomes. Figure 3-6 lists some common values. Notice that for each row in the table, there is a "one-tailed" and "two-tailed" column. If we want to know how far we have to go to encompass 95% of the mass in the density function, the one-tailed
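The one-tailed and two-tailed values discussed above can be computed with Python's standard library (our illustration; the text itself presents them in a table, Figure 3-6):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed: the point below which 95% (99%) of the mass lies.
one_tailed_95 = std_normal.inv_cdf(0.95)   # about 1.64
one_tailed_99 = std_normal.inv_cdf(0.99)   # about 2.33

# Two-tailed: the symmetric interval containing 95% (99%) of the mass.
two_tailed_95 = std_normal.inv_cdf(0.975)  # about 1.96
two_tailed_99 = std_normal.inv_cdf(0.995)  # about 2.58

# Mass within one standard deviation of the mean.
within_one = std_normal.cdf(1) - std_normal.cdf(-1)  # about 68%
```

The `inv_cdf` method is one of the numerical approximations to the inverse normal CDF that the text alludes to.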
LOGNORMAL DISTRIBUTION

It's natural to ask: if we assume that log returns are normally distributed, then how are standard returns distributed? To put it another way: rather than modeling log returns with a normal distribution, can we use another distribution and model standard returns directly?

The answer to these questions lies in the lognormal distribution, whose density function is given by:

f(x) = (1 / (xσ√(2π))) e^(−(1/2)((ln x − μ)/σ)²)   (3.18)

If a variable has a lognormal distribution, then the log of that variable has a normal distribution. So, if log returns are assumed to be normally distributed, then one plus the standard return will be lognormally distributed.

Equation (3.18) looks almost exactly like the equation for the normal distribution, Equation (3.12), with x replaced by ln(x). Be careful, though, as there is also the x in the denominator of the leading fraction. At first it might not be clear what the x is doing there. By carefully rearranging Equation (3.18), we can get something that, while slightly longer, looks more like the normal distribution in form:

f(x) = e^(σ²/2 − μ) (1 / (σ√(2π))) e^(−(1/2)((ln x − (μ − σ²))/σ)²)   (3.19)

While not as pretty, this starts to hint at what we've actually done. The exponent is now centered on (μ − σ²), which is why the mode of the lognormal distribution is e^(μ − σ²) rather than e^μ.

Unlike the normal distribution, which ranges from negative infinity to positive infinity, the lognormal distribution is undefined, or zero, for negative values. Given an asset with a standard return, R, if we model (1 + R) using the lognormal distribution, then R will have a minimum value of −100%. This feature, which we associate with limited liability, is common to most financial assets. Using the lognormal distribution provides an easy way to ensure that we avoid returns less than −100%. The probability density function for a lognormal distribution is shown in Figure 3-7.

The mean of the lognormal distribution is given by:

E[X] = e^(μ + σ²/2)   (3.20)

This result looks very similar to the Taylor expansion of the natural logarithm around one. Remember, if R is a standard return and r the corresponding log return, then:

r ≈ R − (1/2)R²   (3.21)

Be careful: Because these equations are somewhat similar, it is very easy to get the signs in front of σ² and R² backward.

The variance of the lognormal distribution is given by:

E[(X − E[X])²] = (e^(σ²) − 1) e^(2μ + σ²)   (3.22)
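To make Equations (3.20) and (3.22) concrete, here is a quick Monte Carlo check (a sketch; the parameter values μ = 0.05 and σ = 0.20 and the sample size are arbitrary illustration choices, not from the text). A lognormal draw is generated by exponentiating a normal draw, and the sample moments are compared against the two formulas:

```python
import math
import random

random.seed(42)

mu, sigma = 0.05, 0.20
n = 200_000

# Draw lognormal samples by exponentiating normal draws: X = e^Y, Y ~ N(mu, sigma^2)
samples = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]

sample_mean = sum(samples) / n
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n

# Theoretical moments from Equations (3.20) and (3.22)
theory_mean = math.exp(mu + 0.5 * sigma**2)
theory_var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)

print(round(sample_mean, 3), round(theory_mean, 3))
print(round(sample_var, 4), round(theory_var, 4))
```

With a sample this large, the simulated mean and variance land close to the theoretical values, which is a useful sanity check on the sign conventions the text warns about.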
Chapter 3 Distributions • 37
This resemblance is more than superficial; it turns out that as we add more and more variables, the distribution actually converges to a normal distribution. What's more, this is not just true if we start out with uniform distributions; it applies to any distributions with finite variance. This result is known as the central limit theorem.

More formally, if we have n i.i.d. random variables, X₁, X₂, …, Xₙ, each with mean μ and standard deviation σ, then the standardized sum converges to a standard normal distribution:

(X₁ + X₂ + ⋯ + Xₙ − nμ) / (σ√n) → N(0, 1) as n → ∞

In some cases a simple Monte Carlo simulation can be easier to understand, thereby reducing operational risk. As an example of a situation where we might use a Monte Carlo simulation, pretend we are asked to evaluate the mean and standard deviation of the profits from a fixed-strike arithmetic Asian option, where the value of the option, V, at expiry is:

V = max[(1/T) Σₜ₌₁ᵀ Sₜ − X, 0]

where Sₜ is the price of the underlying asset at time t, T is the number of periods until expiry, and X is the strike price.

To run the simulation we need to generate normally distributed random variables. One simple way to do this with standard uniform variables is to add 12 of them together and subtract 6:

X = Σᵢ₌₁¹² Uᵢ − 6   (3.25)

Because the mean of a standard uniform variable is 1/2 and the variance is 1/12, this produces a good approximation to a standard normal variable, with mean zero and standard deviation of one. By utilizing a greater number of uniform variables, we could increase the accuracy of our approximation.
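The sum-of-uniforms recipe in Equation (3.25) is easy to try directly. This sketch (the sample size is an arbitrary choice) checks that the resulting draws have mean near zero and standard deviation near one:

```python
import random
import statistics

random.seed(7)

def approx_std_normal():
    # Sum 12 standard uniforms and subtract 6 (Equation 3.25):
    # the sum has mean 12 * 1/2 = 6 and variance 12 * 1/12 = 1.
    return sum(random.random() for _ in range(12)) - 6.0

draws = [approx_std_normal() for _ in range(100_000)]
print(round(statistics.mean(draws), 3))   # close to 0
print(round(statistics.pstdev(draws), 3)) # close to 1
```

In practice a library routine such as `random.gauss` would be used instead; the point of the recipe is to show the central limit theorem doing the work.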
CHI-SQUARED DISTRIBUTION
If we have k independent standard normal variables, Z₁, Z₂, …, Zₖ, then the sum of their squares, S, has a chi-squared distribution. We write:

S = Σᵢ₌₁ᵏ Zᵢ²
S ~ χ²ₖ   (3.26)

The variable k is commonly referred to as the degrees of freedom. It follows that the sum of two independent chi-squared variables, with k₁ and k₂ degrees of freedom, will follow a chi-squared distribution with (k₁ + k₂) degrees of freedom.

Because the chi-squared variable is the sum of squared values, it can take on only nonnegative values and is asymmetrical. The mean of the distribution is k, and the variance is 2k. As k increases, the chi-squared distribution becomes increasingly symmetrical. As k approaches infinity, the chi-squared distribution converges to the normal distribution. Figure 3-10 shows the probability density functions for some chi-squared distributions with different values for k.

FIGURE 3-10 Chi-squared probability density functions (k = 1, 2, and 4).

For positive values of x, the probability density function for the chi-squared distribution is:

f(x) = (1 / (2^(k/2) Γ(k/2))) x^(k/2 − 1) e^(−x/2)   (3.27)

where Γ is the gamma function:

Γ(n) = ∫₀^∞ x^(n−1) e^(−x) dx   (3.28)

The chi-squared distribution is widely used in risk management, and in statistics in general, for hypothesis testing.

STUDENT'S t DISTRIBUTION

Another extremely popular distribution in statistics and in risk management is Student's t distribution. The distribution was first described in English, in 1908, by William Sealy Gosset, an employee at the Guinness brewery in Dublin. In order to comply with his firm's policy on publishing in public journals, he submitted his work under the pseudonym Student. The distribution has been known as Student's t distribution ever since. In practice, it is often referred to simply as the t distribution.

If Z is a standard normal variable and U is a chi-squared variable with k degrees of freedom, which is independent of Z, then the random variable X,

X = Z / √(U/k)   (3.29)

follows a t distribution with k degrees of freedom.
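The definition of the chi-squared distribution as a sum of squared standard normals, together with its stated mean k and variance 2k, can be checked by simulation (a sketch; k = 4 and the sample size are arbitrary choices):

```python
import random
import statistics

random.seed(11)

k = 4  # degrees of freedom
n = 100_000

# S = sum of k squared independent standard normals is chi-squared with k
# degrees of freedom (Equation 3.26).
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n)]

print(round(statistics.mean(draws), 2))       # theory: mean = k = 4
print(round(statistics.pvariance(draws), 2))  # theory: variance = 2k = 8
```

The same draws could also be histogrammed to see the asymmetry for small k described above.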
The probability density function of the t distribution can be written:

f(x) = (Γ((k + 1)/2) / (√(kπ) Γ(k/2))) (1 + x²/k)^(−(k+1)/2)   (3.30)

where k is the degrees of freedom and Γ is the gamma function.

Very few risk managers will memorize this PDF equation, but it is important to understand the basic shape of the distribution and how it changes with k. Figure 3-11 shows the probability density functions for three Student's t distributions. Notice how changing the value of k changes the shape of the distribution, specifically the tails.

The t distribution is symmetrical around its mean, which is equal to zero. For low values of k, the t distribution looks very similar to a standard normal distribution, except that it displays excess kurtosis. As k increases, this excess kurtosis decreases. In fact, as k approaches infinity, the t distribution converges to a standard normal distribution.

The variance of the t distribution for k > 2 is k/(k − 2). You can see that as k increases, the variance of the t distribution converges to one, the variance of the standard normal distribution.

The t distribution's popularity derives mainly from its use in hypothesis testing. The t distribution is also a popular choice for modeling the returns of financial assets, since it displays excess kurtosis.

F-DISTRIBUTION

If U₁ and U₂ are two independent chi-squared distributions with k₁ and k₂ degrees of freedom, respectively, then X,

X = (U₁/k₁) / (U₂/k₂) ~ F(k₁, k₂)   (3.31)

follows an F-distribution with parameters k₁ and k₂. Its probability density function can be written:

f(x) = √(((k₁x)^(k₁) k₂^(k₂)) / (k₁x + k₂)^(k₁+k₂)) / (x B(k₁/2, k₂/2))   (3.32)

where B(x, y) is the beta function:

B(x, y) = ∫₀¹ z^(x−1) (1 − z)^(y−1) dz   (3.33)

As with the chi-squared and Student's t distributions, memorizing the probability density function is probably not something most risk managers would be expected to do; rather, it is important to understand the general shape and some properties of the distribution.

Figure 3-12 shows the probability density functions for several F-distributions. Because the chi-squared PDF is zero for negative values, the F-distribution's density function is also zero for negative values. The mean and variance of the F-distribution are as follows:

μ = k₂ / (k₂ − 2)  for k₂ > 2
σ² = 2k₂²(k₁ + k₂ − 2) / (k₁(k₂ − 2)²(k₂ − 4))  for k₂ > 4   (3.34)

FIGURE 3-12 F-distribution probability density functions.

As k₁ and k₂ increase, the mean and mode converge to one. As k₁ and k₂ approach infinity, the F-distribution converges to a normal distribution.

There is also a nice relationship between Student's t distribution and the F-distribution. From the description of the t distribution, Equation (3.29), it is easy to see
that the square of a variable with a t distribution has an F-distribution. More specifically, if X is a random variable with a t distribution with k degrees of freedom, then X² has an F-distribution with 1 and k degrees of freedom:

X² ~ F(1, k)   (3.35)

TRIANGULAR DISTRIBUTION

With a minimum, a, a maximum, b, and a mode, c, the triangular distribution's PDF is described by the following two-part function:

f(x) = 2(x − a) / ((b − a)(c − a))   for a ≤ x ≤ c
f(x) = 2(b − x) / ((b − a)(b − c))   for c ≤ x ≤ b   (3.36)

Figure 3-13 shows a triangular distribution where a, b, and c are 0.0, 1.0, and 0.8, respectively. It is easily verified that the PDF is zero at both a and b, and that the value of f(x) reaches a maximum, 2/(b − a), at c. Because the area of a triangle is simply one half the base multiplied by the height, it is also easy to confirm that the area under the PDF is equal to one.

The mean, μ, and variance, σ², of a triangular distribution are given by:

μ = (a + b + c) / 3
σ² = (a² + b² + c² − ab − ac − bc) / 18   (3.37)
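The mean and variance formulas in Equation (3.37) can be checked by simulation. Python's standard library happens to include a triangular sampler, random.triangular(low, high, mode); using the parameters from Figure 3-13 (a = 0.0, b = 1.0, c = 0.8):

```python
import random
import statistics

random.seed(3)

a, b, c = 0.0, 1.0, 0.8  # minimum, maximum, mode, as in Figure 3-13

# random.triangular(low, high, mode) samples the triangular distribution directly
draws = [random.triangular(a, b, c) for _ in range(200_000)]

theory_mean = (a + b + c) / 3                            # 0.6
theory_var = (a*a + b*b + c*c - a*b - a*c - b*c) / 18    # about 0.0467

print(round(statistics.mean(draws), 3))
print(round(statistics.pvariance(draws), 4))
```

The simulated moments should land very close to 0.6 and 0.0467 for a sample of this size.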
BETA DISTRIBUTION

The beta distribution is another distribution with a finite range. It is more complicated than the triangular distribution mathematically, but it is also much more flexible. As with the triangular distribution, the beta distribution can be used to model default rates and recovery rates. As we will see in Chapter 4, the beta distribution is also extremely useful in Bayesian analysis.

The beta distribution is defined on the interval from zero to one. The PDF is defined as follows, where a and b are two positive constants:

f(x) = (1 / B(a, b)) x^(a−1) (1 − x)^(b−1),  0 ≤ x ≤ 1   (3.38)

where B(a, b) is the beta function as described earlier for the F-distribution. The uniform distribution is a special case of the beta distribution, where both a and b are equal to one. Figure 3-14 shows four different parameterizations of the beta distribution.

FIGURE 3-14 Beta distribution probability density functions for (a, b) = (0.5, 0.5), (5, 5), (2, 6), and (4, 1).

The mean, μ, and variance, σ², of a beta distribution are given by:

μ = a / (a + b)
σ² = ab / ((a + b)²(a + b + 1))   (3.39)

MIXTURE DISTRIBUTIONS

Imagine a stock whose log returns follow a normal distribution with low volatility 90% of the time, and a normal distribution with high volatility 10% of the time. Most of the time the world is relatively dull, and the stock just bounces along. Occasionally, though, perhaps because of an earnings announcement or some other news event, the stock's behavior is more extreme. We could write the combined density function as:

f(x) = w_L f_L(x) + w_H f_H(x)   (3.40)

where w_L = 0.90 is the probability of the return coming from the low-volatility distribution, f_L(x), and w_H = 0.10 is the probability of the return coming from the high-volatility distribution, f_H(x). We can think of this as a two-step process. First, we randomly choose the high or low distribution, with a 90% chance of picking the low distribution. Second, we generate a random return from the chosen normal distribution. The final distribution, f(x), is a legitimate probability distribution in its own right, and although it is equally valid to describe a random draw directly from this distribution, it is often helpful to think in terms of this two-step process.

Note that the two-step process is not the same as the process described in a previous section for adding two random variables together. An example of adding two random variables together is a portfolio of two stocks. At each point in time, each stock generates a random return, and the portfolio return is the sum of both returns. In the case we are describing, the return in any given period comes from one distribution or the other, not from the sum of both.
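The two-step process translates directly into code. This sketch (the volatility levels are arbitrary illustration values, not from the text) draws from a 90/10 mixture of normals and estimates the kurtosis, which comes out well above the normal distribution's value of 3, showing the mixture's fat tails:

```python
import random
import statistics

random.seed(5)

w_low, sigma_low, sigma_high = 0.90, 0.01, 0.04
n = 100_000

def mixture_draw():
    # Step 1: pick the regime (90% low volatility, 10% high volatility).
    # Step 2: draw the return from the chosen normal distribution.
    sigma = sigma_low if random.random() < w_low else sigma_high
    return random.gauss(0.0, sigma)

draws = [mixture_draw() for _ in range(n)]

m = statistics.mean(draws)
s = statistics.pstdev(draws)
kurtosis = sum(((x - m) / s) ** 4 for x in draws) / n
print(round(kurtosis, 1))  # well above 3, the kurtosis of a normal
```

Even though each component is normal, the regime-switching mechanism produces pronounced excess kurtosis, which is exactly why mixtures are popular for modeling financial returns.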
Assume we have a mixture of two distributions with the same variance but different means. Show that the variance of the mixture distribution must be greater than or equal to the variance of the two component distributions.

Answer:
Assume the two random variables, X₁ and X₂, have the same variance, σ². The means are μ₁ and μ₂, with corresponding weights w and (1 − w). The mean of the mixture distribution is μ = wμ₁ + (1 − w)μ₂, and its variance is the weighted average of the second central moments of the components:

E[(X − μ)²] = w E[(X₁ − μ)²] + (1 − w) E[(X₂ − μ)²]

For the first term:

E[(X₁ − μ)²] = σ² + (1 − w)²(μ₁ − μ₂)²

Similarly for the second term:

E[(X₂ − μ)²] = σ² + w²(μ₁ − μ₂)²

Substituting back into our original equation for variance:

E[(X − μ)²] = σ² + w(1 − w)(μ₁ − μ₂)²

Because w and (1 − w) are always positive and (μ₁ − μ₂)² has a minimum of zero, w(1 − w)(μ₁ − μ₂)² must be greater than or equal to zero. Therefore, the variance of the mixture distribution is always greater than or equal to σ², the variance of the component distributions.
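The identity just derived, Var(X) = σ² + w(1 − w)(μ₁ − μ₂)², can be checked by simulating the two-step process (w, σ, and the means below are arbitrary illustration values):

```python
import random
import statistics

random.seed(9)

w, sigma = 0.3, 1.0
mu1, mu2 = 0.0, 2.0
n = 200_000

# Draw from the mixture: X1 ~ N(mu1, sigma^2) with weight w, else X2 ~ N(mu2, sigma^2)
draws = [random.gauss(mu1 if random.random() < w else mu2, sigma)
         for _ in range(n)]

sample_var = statistics.pvariance(draws)
theory_var = sigma**2 + w * (1 - w) * (mu1 - mu2)**2  # 1 + 0.3*0.7*4 = 1.84

print(round(sample_var, 2))
```

As the derivation predicts, the sampled variance exceeds the component variance of 1.0 by almost exactly w(1 − w)(μ₁ − μ₂)².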
Learning Objectives

After completing this reading you should be able to:
• Describe Bayes' theorem and apply this theorem in the calculation of conditional probabilities.
• Compare the Bayesian approach to the frequentist approach.
• Apply Bayes' theorem to scenarios with more than two possible outcomes and calculate posterior probabilities.

Excerpt is Chapter 6 of Mathematics and Statistics for Financial Risk Management, Second Edition, by Michael B. Miller.
OVERVIEW

The foundation of Bayesian analysis is Bayes' theorem. Bayes' theorem is named after the eighteenth-century English mathematician Thomas Bayes, who first described the theorem. During his life, Bayes never actually publicized his eponymous theorem. Bayes' theorem might have been confined to the dustheap of history had not a friend submitted it to the Royal Society two years after his death.

Bayes' theorem itself is incredibly simple. For two random variables, A and B, Bayes' theorem states that:

P[A | B] = P[B | A] · P[A] / P[B]   (4.1)

In the next section we'll derive Bayes' theorem and explain how to interpret Equation (4.1). As we will see, the simplicity of Bayes' theorem is deceptive. Bayes' theorem can be applied to a wide range of problems, and its application can often be quite complex.

Bayesian analysis is used in a number of fields. It is most often associated with computer science and artificial intelligence, where it is used in everything from spam filters to machine translation and to the software that controls self-driving cars. The use of Bayesian analysis in finance and risk management has grown in recent years, and will likely continue to grow.

What follows makes heavy use of joint and conditional probabilities. If you have not already done so and you are not familiar with these topics, you can review them in Chapter 1.

BAYES' THEOREM

Assume we have two bonds, Bond A and Bond B, each with a 10% probability of defaulting over the next year. Further assume that the probability that both bonds default is 6%, and that the probability that neither bond defaults is 86%. It follows that the probability that only Bond A or only Bond B defaults is 4%. We can summarize all of this information in a probability matrix as shown in Figure 4-1.

FIGURE 4-1 Probability matrix.

                          Bond B
                          No Default   Default
Bond A   No Default       86%          4%         90%
         Default          4%           6%         10%
                          90%          10%        100%

As required, the rows and columns of the matrix add up, and the sum of all the probabilities is equal to 100%.

In the probability matrix, notice that the probability of both bonds defaulting is 6%. This is higher than the 1% probability we would expect if the default events were independent (10% × 10% = 1%). The probability that neither bond defaults, 86%, is also higher than what we would expect if the defaults were independent (90% × 90% = 81%). Because bond issuers are often sensitive to broad economic trends, bond defaults are often highly correlated.

We can also express features of the probability matrix in terms of conditional probabilities. What is the probability that Bond A defaults, given that Bond B has defaulted? Bond B defaults in 10% of the scenarios, but the probability that both Bond A and Bond B default is only 6%. In other words, Bond A defaults in 60% of the scenarios in which Bond B defaults. We write this as follows:

P[A | B] = P[A ∩ B] / P[B] = 6% / 10% = 60%   (4.2)

Notice that the conditional probability is different from the unconditional probability. The unconditional probability of default is 10%.

P[A] = 10% ≠ 60% = P[A | B]   (4.3)

It turns out that Equation (4.2) is true in general. More often the equation is written as follows:

P[A ∩ B] = P[A | B] · P[B]   (4.4)

In other words, the probability of both A and B occurring is just the probability that A occurs, given B, multiplied by the probability of B occurring. What's more, the ordering of A and B doesn't matter. We could just as easily write:

P[A ∩ B] = P[B | A] · P[A]   (4.5)
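The conditional probabilities in Equations (4.2) and (4.3) amount to one division on the joint probabilities from Figure 4-1, as this small sketch shows:

```python
# Joint and marginal probabilities from the matrix in Figure 4-1
p_both = 0.06   # Bond A defaults and Bond B defaults
p_b = 0.10      # Bond B defaults (unconditional)
p_a = 0.10      # Bond A defaults (unconditional)

p_a_given_b = p_both / p_b   # Equation (4.2)
print(round(p_a_given_b, 2), p_a)
```

The conditional probability, 60%, is six times the unconditional 10%, reflecting the strong default correlation described above.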
48 • 2017 Financial Risk Manager Exam Part I: Quantitative Analysis
Example 4.1

Question:
Imagine there is a disease that afflicts just 1 in every 100 people in the population. A new test has been developed to detect the disease that is 99% accurate. That is, for people with the disease, the test correctly indicates that they have the disease in 99% of cases. Similarly, for those who do not have the disease, the test correctly indicates that they do not have the disease in 99% of cases. If a person takes the test and the result of the test is positive, what is the probability that he or she actually has the disease?

Answer:
While not exactly financial risk, this is a classic example of how conditional probability can be far from intuitive. This type of problem is also far from being an academic curiosity. A number of studies have asked doctors similar questions; see, for example, Gigerenzer and Edwards (2003). The results are often discouraging. The physicians' answers vary widely and are often far from correct.

If the test is 99% accurate, it is tempting to guess that there is a 99% chance that the person who tests positive actually has the disease. 99% is in fact a very bad guess. The correct answer is that there is only a 50% chance that the person who tests positive actually has the disease.

To calculate the correct answer, we first need to calculate the unconditional probability of a positive test. Remember from Chapter 1 that this is simply the probability of a positive test being produced by somebody with the disease plus the probability of a positive test being produced by somebody without the disease. Using a "+" to represent a positive test result, this can be calculated as:

P[+] = P[+ ∩ have disease] + P[+ ∩ do not have disease]
P[+] = P[+ | have disease] · P[have disease] + P[+ | do not have disease] · P[do not have disease]
P[+] = 99% · 1% + 1% · 99% = 2% · 99% = 1.98%

Using Bayes' theorem, the probability of having the disease given a positive test is then:

P[have disease | +] = P[+ | have disease] · P[have disease] / P[+] = (99% · 1%) / (2% · 99%) = 50%

The reason the answer is 50% and not 99% is because the disease is so rare. Most people don't have the disease, so even a small number of false positives overwhelms the number of actual positives. It is easy to see this in a matrix. Assume 10,000 trials:

                    Actual
                    +         −
Test    +           99        99        198
        −           1         9,801     9,802
                    100       9,900     10,000

If you check the numbers, you'll see that they work out exactly as described: 1% of the population with the disease, and 99% accuracy in each column. In the end, though, the number of positive test results is identical for the two populations, 99 in each. This is why the probability of actually having the disease given a positive test is 50%.

In order for a test for a rare disease to be meaningful, it has to be extremely accurate. In the case just described, 99% accuracy was not nearly accurate enough.

Bayes' theorem is often described as a procedure for updating beliefs about the world when presented with new information. For example, pretend you had a coin that you believed was fair, with a 50% chance of landing heads or tails when flipped. If you flip the coin 10 times and it lands heads each time, you might start to suspect that the coin is not fair. Ten heads in a row could happen, but the odds of seeing 10 heads in a row are only 1:1,024 for a fair coin, (1/2)¹⁰ = 1/1,024. How do you update your beliefs after seeing 10 heads?
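As a preview of the mechanics, here is a hedged sketch of one such update. The fair-coin probability of the run is (1/2)¹⁰ = 1/1,024; the alternative hypothesis is not specified in the text, so the sketch assumes, purely for illustration, that the only alternative is a coin that always lands heads, with a 90% prior that the coin is fair:

```python
p_fair_prior = 0.90
p_heads_if_fair = 0.5
p_heads_if_biased = 1.0  # hypothetical alternative: a coin that always lands heads

like_fair = p_heads_if_fair ** 10      # 1/1,024
like_biased = p_heads_if_biased ** 10  # 1

evidence = like_fair * p_fair_prior + like_biased * (1 - p_fair_prior)
p_fair_posterior = like_fair * p_fair_prior / evidence

print(round(p_fair_posterior, 4))  # under this assumption, belief in a fair coin collapses
```

With a less extreme alternative hypothesis the posterior would fall less dramatically; the point is only that the update is mechanical once a prior and likelihoods are specified.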
If you believed there was a 90% probability that the coin was fair before you started flipping, then after seeing 10 heads your belief that the coin is fair should probably be somewhere between 0% and 90%. You believe it is less likely that the coin is fair after seeing 10 heads (so less than 90%), but there is still some probability that the coin is fair (so greater than 0%). As the following example will make clear, Bayes' theorem provides a framework for deciding exactly what our new beliefs should be.

Example 4.2

Question:
You are an analyst at Astra Fund of Funds. Based on an examination of historical data, you determine that all fund managers fall into one of two groups. Stars are the best managers. The probability that a star will beat the market in any given year is 75%. Ordinary, nonstar managers, by contrast, are just as likely to beat the market as they are to underperform it. For both types of managers, the probability of beating the market is independent from one year to the next.

Stars are rare. Of a given pool of managers, only 16% turn out to be stars. A new manager was added to your portfolio three years ago. Since then, the new manager has beaten the market every year. What was the probability that the manager was a star when the manager was first added to the portfolio? What is the probability that the manager is a star now? What is the probability that the manager will beat the market next year?

Answer:
The probability that a star, S, will beat the market, B, is 75%:

P[B | S] = 75% = 3/4

The probability that a nonstar manager, S̄, will beat the market is 50%:

P[B | S̄] = 50% = 1/2

At the time the new manager was added to the portfolio, the probability that the manager was a star was just the probability of any manager being a star, 16%, the unconditional probability:

P[S] = 16% = 4/25

To answer the second part of the question, we need to find P[S | 3B], the probability that the manager is a star, given that the manager has beaten the market three years in a row. We can find this probability using Bayes' theorem:

P[S | 3B] = P[3B | S] · P[S] / P[3B]

We already know P[S]. Because outperformance is independent from one year to the next, the other part of the numerator, P[3B | S], is just the probability that a star beats the market in any given year to the third power:

P[3B | S] = (3/4)³ = 27/64

The denominator is the unconditional probability of beating the market for three years. This is just the weighted average probability of three market-beating years over both types of managers:

P[3B] = P[3B | S] · P[S] + P[3B | S̄] · P[S̄]
P[3B] = (3/4)³(4/25) + (1/2)³(21/25) = 27/400 + 21/200 = 69/400

Putting the numerator and denominator together, the posterior probability that the manager is a star is:

P[S | 3B] = (27/64 · 4/25) / (69/400) = 27/69 = 9/23 ≈ 39%

Even though it is much more likely that a star will beat the market three years in a row, we are still far from certain that this manager is a star. In fact, at 39% the odds are more likely that the manager is not a star. As was the case in the medical test example, the reason has to do with the overwhelming number of false positives. There are so many nonstar managers that some of them are bound to beat the market three years in a row. The real stars are simply outnumbered by these lucky nonstar managers.
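The arithmetic above can be reproduced in a few lines (a sketch; the variable names are ours, not the text's):

```python
p_star = 0.16                        # prior: P[S]
p_beat_star, p_beat_ord = 0.75, 0.50

like_star = p_beat_star ** 3         # P[3B | S]  = 27/64
like_ord = p_beat_ord ** 3           # P[3B | S-bar] = 1/8

evidence = like_star * p_star + like_ord * (1 - p_star)  # P[3B] = 69/400
posterior_star = like_star * p_star / evidence           # 9/23

print(round(posterior_star, 4))  # about 0.3913
```

The structure (likelihood times prior, divided by the evidence) is identical to the disease-test example; only the numbers differ.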
Next, we answer the final part of the question. The probability that the manager beats the market next year is just the probability that a star would beat the market plus the probability that a nonstar would beat the market, weighted by our new beliefs. Our updated belief about the manager being a star is 39% = 9/23, so the probability that the manager is a nonstar is 14/23:

P[B] = P[B | S] · P[S] + P[B | S̄] · P[S̄]
P[B] = (3/4)(9/23) + (1/2)(14/23) = 55/92 ≈ 60%

The probability that the manager will beat the market next year falls somewhere between the probability for a nonstar, 50%, and for a star, 75%, but is closer to the probability for a nonstar. This is consistent with our updated belief that there is only a 39% probability that the manager is a star.

When using Bayes' theorem to update beliefs, we often refer to prior and posterior beliefs and probabilities. In the preceding sample problem, the prior probability was 16%. That is, before seeing the manager beat the market three times, our belief that the manager was a star was 16%. The posterior probability for the sample problem was 39%. That is, after seeing the manager beat the market three times, our belief that the manager was a star was 39%.

We often use the terms evidence and likelihood when referring to the conditional probability on the right-hand side of Bayes' theorem. In the sample problem, the probability of beating the market, assuming that the manager was a star, P[3B | S] = 27/64, was the likelihood. In other words, the likelihood of the manager beating the market three times, assuming that the manager was a star, was 27/64.

P[S | 3B] = P[3B | S] · P[S] / P[3B]   (4.7)

Here P[S | 3B] is the posterior, P[3B | S] is the likelihood, P[S] is the prior, and P[3B] is the evidence.

BAYES VERSUS FREQUENTISTS

Pretend that as an analyst you are given daily profit data for a fund, and that the fund has had positive returns for 560 of the past 1,000 trading days. What is the probability that the fund will generate a positive return tomorrow? Without any further instructions, it is tempting to say that the probability is 56% (560/1,000 = 56%). In the previous sample problem, though, we were presented with a portfolio manager who beat the market three years in a row. Shouldn't we have concluded that the probability that the portfolio manager would beat the market the following year was 100% (3/3 = 100%), and not 60%? How can both answers be correct?

The last approach, taking three out of three positive results and concluding that the probability of a positive result next year is 100%, is known as the frequentist approach. The conclusion is based only on the observed frequency of positive results. Prior to this chapter we had been using the frequentist approach to calculate probabilities and other parameters.

The Bayesian approach, which we have been exploring in this chapter, also counts the number of positive results. The conclusion is different because the Bayesian approach starts with a prior belief about the probability.

Which approach is better? It's hard to say. Within the statistics community there are those who believe that the frequentist approach is always correct. On the other end of the spectrum, there are those who believe the Bayesian approach is always superior.

Proponents of Bayesian analysis often point to the absurdity of the frequentist approach when applied to small data sets. Observing three out of three positive results and concluding that the probability of a positive result next year is 100% suggests that we are absolutely certain and that there is absolutely no possibility of a negative result. Clearly this certainty is unjustified.

Proponents of the frequentist approach often point to the arbitrariness of Bayesian priors. In the portfolio manager example, we started our analysis with the assumption that 16% of managers were stars. In a previous example we assumed that there was a 90% probability that a coin was fair. How did we arrive at these priors? In most cases the prior is either subjective or based on frequentist analysis.

Perhaps unsurprisingly, most practitioners tend to take a more balanced view, realizing that there are situations that lend themselves to frequentist analysis and others that lend themselves to Bayesian analysis. Situations in which there is very little data, or in which the signal-to-noise ratio is extremely low, often lend themselves to Bayesian analysis. When we have lots of data, the conclusions of frequentist analysis and Bayesian analysis are often very similar, and the frequentist results are often easier to calculate.
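The contrast between the two approaches for the manager example can be made concrete (a sketch using the posterior 9/23 computed earlier):

```python
# Frequentist estimate after 3 wins in 3 years: the observed frequency
freq_estimate = 3 / 3  # 100%

# Bayesian predictive probability: weight each type's win probability
# by the posterior beliefs (star 9/23, nonstar 14/23)
post_star = 9 / 23
p_next = 0.75 * post_star + 0.50 * (1 - post_star)  # = 55/92, about 60%

print(freq_estimate, round(p_next, 4))
```

The frequentist number leaves no room for a losing year, while the Bayesian predictive probability is pulled back toward the prior, which is exactly the small-sample behavior debated above.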
MANY-STATE PROBLEMS

In the two previous sample problems, each variable could exist in only one of two states: a person either had the disease or did not have the disease; a manager was either a star or a nonstar. We can easily extend Bayesian analysis to any number of possible outcomes. For example, suppose rather than stars and nonstars, we believe there are three types of managers: underperformers, in-line performers, and outperformers. The underperformers beat the market only 25% of the time, the in-line performers beat the market 50% of the time, and the outperformers beat the market 75% of the time.

Initially we believe that a given manager is most likely to be an in-line performer, and is less likely to be an underperformer or an outperformer. More specifically, our prior belief is that a manager has a 60% probability of being an in-line performer, a 20% chance of being an underperformer, and a 20% chance of being an outperformer. We can summarize this as:

P[p = 0.25] = 20%
P[p = 0.50] = 60%
P[p = 0.75] = 20%

Now suppose the manager beats the market two years in a row, 2B. We start by calculating the likelihood of two market-beating years for each type of manager:

P[2B | p = 0.25] = (1/4)² = 1/16
P[2B | p = 0.50] = (1/2)² = 4/16
P[2B | p = 0.75] = (3/4)² = 9/16

The unconditional probability of two market-beating years is the weighted average of these likelihoods:

P[2B] = 20% · 1/16 + 60% · 4/16 + 20% · 9/16 = 44/160

Using Bayes' theorem, the posterior probability that the manager is an underperformer is 4.55%:

P[p = 0.25 | 2B] = P[2B | p = 0.25] · P[p = 0.25] / P[2B] = (1/16 · 2/10) / (44/160) = 2/44 = 1/22 = 4.55%   (4.11)

Similarly, we can show that the posterior probability that the manager is an in-line performer is 54.55%:

P[p = 0.50 | 2B] = P[2B | p = 0.50] · P[p = 0.50] / P[2B] = (4/16 · 6/10) / (44/160) = 24/44 = 12/22 = 54.55%   (4.12)

and that the posterior probability that the manager is an outperformer is 40.91%:

P[p = 0.75 | 2B] = P[2B | p = 0.75] · P[p = 0.75] / P[2B] = (9/16 · 2/10) / (44/160) = 18/44 = 9/22 = 40.91%   (4.13)

As expected, the sum of the probabilities is still equal to 100% (the percentages seem to add to 100.01%, but that is only a rounding error):

1/22 + 12/22 + 9/22 = 22/22 = 1   (4.14)
In each case, the denominator on the right-hand side is the same, P[2B], or 44/160. We can then rewrite these equations in terms of a constant, c:

P[p = x | 2B] = c · P[2B | p = x] · P[p = x]   (4.16)

We also know that the sum of all the posterior probabilities must equal one:

Σᵢ₌₁³ c · P[2B | p = xᵢ] · P[p = xᵢ] = c · Σᵢ₌₁³ P[2B | p = xᵢ] · P[p = xᵢ] = 1   (4.17)

In our current example we have:

c · (1/16 · 2/10 + 4/16 · 6/10 + 9/16 · 2/10) = c · (2/160 + 24/160 + 18/160) = c · 44/160 = 1
c = 160/44   (4.18)

We then use this to calculate each of the posterior probabilities. For example, the posterior probability that the manager is an underperformer is:

P[p = 0.25 | 2B] = c · P[2B | p = 0.25] · P[p = 0.25] = (160/44)(1/16)(2/10) = 2/44 = 1/22   (4.19)

In the current example this might not seem like much of a shortcut, but with continuous distributions, this approach can make seemingly intractable problems very easy to solve.

Example 4.3

Question:
Using the same prior distributions as in the preceding example, what would the posterior probabilities be for an underperformer, an in-line performer, or an outperformer if instead of beating the market two years in a row, the manager beat the market in 6 of the next 10 years?

Answer:
For each possible type of manager, the likelihood of beating the market 6 times out of 10 can be determined using a binomial distribution (see Chapter 3):

P[6B | p] = C(10, 6) · p⁶(1 − p)⁴

Using our shortcut, we first calculate the posterior probabilities in terms of an arbitrary constant, c. If the manager is an underperformer:

P[p = 0.25 | 6B] = c · P[6B | p = 0.25] · P[p = 0.25]
P[p = 0.25 | 6B] = c · C(10, 6) · (1/4)⁶(3/4)⁴ · (2/10) = c · C(10, 6) · (2 · 3⁴) / (10 · 4¹⁰)

Similarly, if the manager is an in-line performer or outperformer, we have:

P[p = 0.50 | 6B] = c · C(10, 6) · (6 · 2¹⁰) / (10 · 4¹⁰)
P[p = 0.75 | 6B] = c · C(10, 6) · (2 · 3⁶) / (10 · 4¹⁰)

This may look unwieldy, but, as we will see, many of the terms will cancel out before we arrive at the final answers. Because all of the posterior probabilities sum to one, we have:

P[p = 0.25 | 6B] + P[p = 0.50 | 6B] + P[p = 0.75 | 6B] = 1
c · C(10, 6) · (2 / (10 · 4¹⁰)) · (3⁴ + 3 · 2¹⁰ + 3⁶) = 1
c · C(10, 6) · (6 · 1,294) / (10 · 4¹⁰) = 1
c = (10 · 4¹⁰) / (C(10, 6) · 6 · 1,294)

Substituting back into the equations for the posterior probabilities, we have:

P[p = 0.25 | 6B] = 2 · 3⁴ / (6 · 1,294) = 27/1,294 = 2.09%
P[p = 0.50 | 6B] = 6 · 2¹⁰ / (6 · 1,294) = 1,024/1,294 = 79.13%
P[p = 0.75 | 6B] = 2 · 3⁶ / (6 · 1,294) = 243/1,294 = 18.78%

In this case, the probability that the manager is an in-line performer has increased from 60% to 79.13%. The probability that the manager is an outperformer decreased slightly from 20% to 18.78%. It now seems very unlikely
2017 Financial Risk Manager (FRM) Psff I: Quanlilative Anslysis, Seventh Edition by Global Association of Riek Profes8ionals.
Copyright O 2017 by Peal"90n Education, Inc. All Rights ReseM!d. Pearson Custom Edition.
【梦轩考资www.mxkaozi.com】 QQ106454842 专业提供CFA FRM全程高清视频+讲义
that the manager is an underperformer (2.09% probability This example involved three possible states. The basic
compared to our prior belief of 20%). approach for solving a problem with four; five, or any finite
number of states is exactly the same, only the number of
While the calculations looked rather complicated, using
calculations increases. Because the calculations are highly
our shortcut saved us from actually having to calculate
repetitive, it is often much easier to solve these problems
many of the more complicated terms. For more complex
problems, and especially for problems involving continu using a spreadsheet or computer program.
ous distributions, this shortcut can be extremely useful.
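As the text notes, these repetitive Bayesian updates are easy to automate. The sketch below is our own illustrative code (the function and variable names are not from the text); it reproduces the three-state update for a manager who beats the market 6 years out of 10, using the chapter's priors of 20%/60%/20%:

```python
from math import comb

# Manager types: true probability of beating the market in a given year,
# with the chapter's prior beliefs (20% / 60% / 20%).
p_values = [0.25, 0.50, 0.75]
priors = [0.20, 0.60, 0.20]

def posterior(priors, p_values, k, n):
    """Posterior P[p | k beats in n years], with binomial likelihoods."""
    # likelihood x prior for each state (the terms multiplied by c in the text)
    unnormalized = [comb(n, k) * p**k * (1 - p)**(n - k) * prior
                    for p, prior in zip(p_values, priors)]
    total = sum(unnormalized)  # 1/c: the normalizing constant, P[k beats in n years]
    return [u / total for u in unnormalized]

print(posterior(priors, p_values, 6, 10))  # ≈ [0.0209, 0.7913, 0.1878]
```

The constant c is simply the reciprocal of the sum of the likelihood-times-prior terms, which is why none of the individual terms ever needs to be simplified by hand.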
Learning Objectives

After completing this reading you should be able to:
• Calculate and interpret the sample mean and sample variance.
• Construct and interpret a confidence interval.
• Construct an appropriate null and alternative hypothesis, and calculate an appropriate test statistic.
• Differentiate between a one-tailed and a two-tailed test and identify when to use each test.
• Interpret the results of hypothesis tests with a specific level of confidence.
• Demonstrate the process of backtesting VaR by calculating the number of exceedances.
Excerpt is Chapter 7 of Mathematics and Statistics for Financial Risk Management, Second Edition, by Michael B. Miller.
In this chapter we explore two closely related topics, confidence intervals and hypothesis testing. At the end of the chapter, we explore applications, including value-at-risk (VaR).

SAMPLE MEAN REVISITED

Imagine taking the output from a standard random number generator on a computer, and multiplying it by 100. The resulting data-generating process (DGP) is a uniform random variable, which ranges between 0 and 100, with a mean of 50. If we generate 20 draws from this DGP and calculate the sample mean of those 20 draws, it is unlikely that the sample mean will be exactly 50. The sample mean might round to 50, say 50.03906724, but exactly 50 is next to impossible. In fact, given that we have only 20 data points, the sample mean might not even be close to the true mean.

The sample mean is actually a random variable itself. If we continue to repeat the experiment, generating 20 data points and calculating the sample mean each time, the calculated sample mean will be different every time. As we proved in Chapter 2, even though we never get exactly 50, the expected value of each sample mean is in fact 50. It might sound strange to say it, but the mean of our sample mean is the true mean of the distribution. Using our standard notation:

E[μ̂] = μ   (5.1)

What if, instead of 20 data points, we generate 1,000 data points? With 1,000 data points, the expected value of our sample mean is still 50, just as it was with 20 data points. While we still don't expect our sample mean to be exactly 50, our sample mean will tend to be closer when we are using 1,000 data points. The reason is simple: a single outlier won't have nearly the impact in a pool of 1,000 data points that it will in a pool of 20. If we continue to generate sets of 1,000 data points, it stands to reason that the standard deviation of our sample mean will be lower with 1,000 data points than it would be if our sets contained only 20 data points.

It turns out that the variance of our sample mean doesn't just decrease with the sample size; it decreases in a predictable way, in proportion to the sample size. If our sample size is n and the true variance of our DGP is σ^2, then the variance of the sample mean is:

σ^2_μ̂ = σ^2/n   (5.2)

It follows that the standard deviation of the sample mean decreases with the square root of n. This square root is important. In order to reduce the standard deviation of the mean by a factor of 2, we need 4 times as many data points. To reduce it by a factor of 10, we need 100 times as much data. This is yet another example of the famous square root rule for independent and identically distributed (i.i.d.) variables.

In our current example, because the DGP follows a uniform distribution, we can easily calculate the variance of each data point using Equation (3.4). The variance of each data point is (100 − 0)^2/12 = 833.33. This is equivalent to a standard deviation of 28.87. For 20 data points, the standard deviation of the mean will then be 28.87/sqrt(20) = 6.45, and for 1,000 data points, the standard deviation will be 28.87/sqrt(1,000) = 0.91.

We have the mean and the standard deviation of our sample mean, but what about the shape of the distribution? You might think that the shape of the distribution would depend on the shape of the underlying distribution of the DGP. If we recast our formula for the sample mean slightly, though:

μ̂ = Σ(i=1 to n) (1/n) x_i   (5.3)

and regard each of the (1/n)x_i's as a random variable in its own right, we see that our sample mean is equivalent to the sum of n i.i.d. random variables, each with a mean of μ/n and a standard deviation of σ/n. Using the central limit theorem, we claim that the distribution of the sample mean converges to a normal distribution. For large values of n, the distribution of the sample mean will be extremely close to a normal distribution. Practitioners will often assume that the sample mean is normally distributed.

Example 5.1

Question:
You are given 10 years of monthly returns for a portfolio manager. The mean monthly return is 2.3%, and the standard deviation of the return series is 3.6%. What is the standard deviation of the mean?

The portfolio manager is being compared against a benchmark with a mean monthly return of 1.5%. What is the probability that the portfolio manager's mean return exceeds the benchmark? Assume the sample mean is normally distributed.
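The square root rule described above can be checked by simulation. In this sketch (our own code, not from the text), we repeatedly draw samples from the uniform DGP on [0, 100] and measure how the standard deviation of the sample mean shrinks as n grows from 20 to 1,000:

```python
import random
import statistics

random.seed(42)  # make the simulation repeatable

def std_of_sample_mean(n, trials=1000):
    """Empirical standard deviation of the mean of n uniform [0, 100] draws."""
    means = [statistics.fmean(random.uniform(0, 100) for _ in range(n))
             for _ in range(trials)]
    return statistics.pstdev(means)

print(std_of_sample_mean(20))    # close to 28.87/sqrt(20) ≈ 6.45
print(std_of_sample_mean(1000))  # close to 28.87/sqrt(1000) ≈ 0.91
```

Because each run averages over a finite number of trials, the printed values will hover near, not exactly at, the theoretical 6.45 and 0.91.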
(n − 1)σ̂^2/σ^2 ~ χ^2_(n−1)   (5.5)

where σ^2 is the population variance. Note that this is true only when the DGP has a normal distribution. Unfortunately, unlike the case of the sample mean, we cannot apply the central limit theorem here. Even when the sample size is large, if the underlying distribution is nonnormal, the statistic in Equation (5.5) can vary significantly from a chi-squared distribution.

CONFIDENCE INTERVALS

In our discussion of the sample mean, we assumed that the standard deviation of the underlying distribution was known. In practice, the true standard deviation is likely unknown, and the distribution of the resulting statistic is approximated by a t distribution. By looking up the appropriate values for the t distribution, we can establish the probability that our t-statistic is contained within a certain range:

P[x_L ≤ (μ̂ − μ)/(σ̂/sqrt(n)) ≤ x_U] = γ   (5.7)

where x_L and x_U are constants, which, respectively, define the lower and upper bounds of the range within the t distribution, and γ is the probability that our t-statistic will be found within that range. Typically γ is referred to as the confidence level. Rather than working directly with the confidence level, we often work with the quantity 1 − γ, which is known as the significance level and is often denoted by α. The smaller the confidence level is, the higher the significance level.
In practice, the population mean, μ, is often unknown. By rearranging the previous equation we come to an equation with a more interesting form:

P[μ̂ − x_U σ̂/sqrt(n) ≤ μ ≤ μ̂ + x_U σ̂/sqrt(n)] = γ   (5.8)

Looked at this way, we are now giving the probability that the population mean will be contained within the defined range. When it is formulated this way, we call this range the confidence interval for the population mean. Confidence intervals are not limited to the population mean. Though it may not be as simple, in theory we can define a confidence interval for any distribution parameter.

We can then look up the corresponding probability from the t distribution.

In addition to the null hypothesis, we can offer an alternative hypothesis. In the previous example, where our null hypothesis is that the expected return is greater than 10%, the logical alternative would be that the expected return is less than or equal to 10%:

H_1: μ_r ≤ 10%   (5.11)

In principle, we could test any number of hypotheses. In practice, as long as the alternative is trivial, we tend to limit ourselves to stating the null hypothesis.
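As an illustration of Equation (5.8), the following sketch (our own code) builds a 95% confidence interval for the mean monthly return from Example 5.1 (sample mean 2.3%, standard deviation 3.6%, n = 120 months). It uses the normal approximation for x_U; with 119 degrees of freedom, the exact t quantile would make the interval only slightly wider:

```python
from statistics import NormalDist

# Values taken from Example 5.1; the normal quantile stands in for the
# t quantile, which is a good approximation for n this large.
mean_hat, s, n, gamma = 2.3, 3.6, 120, 0.95

x_u = NormalDist().inv_cdf(0.5 + gamma / 2)   # two-sided critical value ≈ 1.96
half_width = x_u * s / n ** 0.5               # x_U * sigma_hat / sqrt(n)

print(f"95% CI: [{mean_hat - half_width:.2f}%, {mean_hat + half_width:.2f}%]")
# ≈ [1.66%, 2.94%]
```

Note that the benchmark mean of 1.5% from Example 5.1 lies just outside this interval.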
Notice that annualizing the standard deviation has no impact on the test statistic. The same factor would appear in the numerator and the denominator, leaving the ratio unchanged. For a chi-squared distribution with 255 degrees of freedom, 290.13 corresponds to a probability of 6.44%. We fail to reject the null hypothesis at the 95% confidence level.

Figure 5-1 shows common critical values for t-tests of varying degrees of freedom and for a normal distribution. Notice that all distributions are symmetrical. For small sample sizes, extreme values are more likely, but as the sample size increases, the t distribution converges to the normal distribution. For 5% significance with 100 degrees of freedom, our rule of thumb based on the normal distribution, 1.64 standard deviations, is very close to the actual value of 1.66.

         t_10    t_100   t_1,000   N
 1.0%   -2.76   -2.36    -2.33   -2.33
 2.5%   -2.23   -1.98    -1.96   -1.96
 5.0%   -1.81   -1.66    -1.65   -1.64
10.0%   -1.37   -1.29    -1.28   -1.28
90.0%    1.37    1.29     1.28    1.28
95.0%    1.81    1.66     1.65    1.64
97.5%    2.23    1.98     1.96    1.96
99.0%    2.76    2.36     2.33    2.33

Figure 5-1: Common critical values for Student's t distribution.

Hypothesis tests can be framed as either one-tailed or two-tailed tests.

A two-tailed null hypothesis could take the form:

H_0: μ = 0
H_1: μ ≠ 0   (5.12)

In this case, H_1 implies that extreme positive or negative values would cause us to reject the null hypothesis. If we are concerned with both sides of the distribution (both tails), we should choose a two-tailed test.

A one-tailed test could be of the form:

H_0: μ > c
H_1: μ ≤ c   (5.13)

In this case, we will reject H_0 only if the estimate of μ is significantly less than c. If we are only concerned with one side of the distribution (one tail), we should choose a one-tailed test.
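The normal-distribution column of Figure 5-1 can be reproduced with the standard library's inverse normal CDF; the t columns would require a t-quantile function such as scipy.stats.t.ppf, which is outside the standard library. A quick sketch:

```python
from statistics import NormalDist

# Reproduce the N column of Figure 5-1: critical values at each quantile
# of the standard normal distribution.
for q in (0.01, 0.025, 0.05, 0.10, 0.90, 0.95, 0.975, 0.99):
    print(f"{q:>6.1%}: {NormalDist().inv_cdf(q):+.2f}")
```

Running this prints -2.33, -1.96, -1.64, -1.28 and their positive mirror images, matching the table's normal-distribution column.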
equal to 25%. The exact probability for a standard normal variable is closer to 5%, which is indeed less than 25%.

Chebyshev's inequality makes clear how assuming normality can be very anti-conservative. If a variable is normally distributed, the probability of a three standard deviation event is very small, 0.27%. If we assume normality, we will assume that three standard deviation events are very rare. For other distributions, though, Chebyshev's inequality tells us that the probability could be as high as 1/9, or approximately 11%. Eleven percent is hardly a rare occurrence. Assuming normality when a random variable is in fact not normal can lead to a severe underestimation of risk. Risk managers take note!

P[L ≥ VaR] = 1 − γ   (5.15)

If a risk manager says that the one-day 95% VaR of a portfolio is $400, this means that there is a 5% probability that the portfolio will lose $400 or more on any given day (that L will be more than $400).

We can also define VaR directly in terms of returns. If we multiply both sides of the inequality in Equation (5.15) by −1, and replace −L with R, we come up with Equation (5.16):

P[R ≤ −VaR] = 1 − γ   (5.16)

Equations (5.15) and (5.16) are equivalent. A loss of $400 or more and a return of −$400 or less are exactly the same.
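The normal-versus-Chebyshev comparison above is easy to verify numerically. This sketch computes the exact two-sided normal probability of a move beyond three standard deviations and the corresponding Chebyshev bound:

```python
from statistics import NormalDist

# Two-sided normal tail probability beyond +/- 3 standard deviations
p_normal = 2 * (1 - NormalDist().cdf(3))

# Chebyshev's inequality: P[|X - mu| >= k*sigma] <= 1/k^2, here k = 3
chebyshev_bound = 1 / 3**2

print(f"normal: {p_normal:.2%}, Chebyshev bound: {chebyshev_bound:.1%}")
# normal: 0.27%, Chebyshev bound: 11.1%
```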
is more popular. It has the advantage that for reasonable confidence levels for most portfolios, VaR will almost always be a positive number. The alternative convention is attractive because the VaR and returns will have the same sign. Under the alternative convention, if your VaR is −$400, then a return of −$400 is just at the VaR threshold. In practice, rather than just saying that your VaR is $400, it is often best to resolve any ambiguity by stating that your VaR is a loss of $400 or that your VaR is a return of −$400.

Example 5.3

Question:
The probability density function (PDF) for daily profits at Triangle Asset Management can be described by the following function (see Figure 5-3):

p = 1/10 + π/100   −10 ≤ π ≤ 0
p = 1/10 − π/100   0 < π ≤ 10

What is the one-day 95% VaR for Triangle Asset Management?

[Figure 5-3: Triangular probability density function, running from −10 to 10 and peaking at 0.10.]

Answer:
To find the 95% VaR, we need to find a, such that:

∫(−10 to a) p dπ = 0.05

Because a lies in the left half of the distribution, we need only the first half of the function:

∫(−10 to a) (1/10 + π/100) dπ = [π/10 + π^2/200] evaluated from −10 to a = a/10 + a^2/200 + 0.50 = 0.05

Solving this quadratic equation gives a = −10 + sqrt(10) ≈ −6.84, so the one-day 95% VaR is a loss of approximately 6.84.
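The VaR in Example 5.3 can also be found numerically instead of by solving the quadratic. This sketch (our own code) bisects on the closed-form CDF of the left half of the triangular density:

```python
# Find Triangle's 95% VaR by bisection. The closed-form answer from the
# text is a = -10 + sqrt(10), i.e., a loss of about 6.84.

def cdf(a):
    # integral of (1/10 + pi/100) from -10 to a, valid for a in [-10, 0]
    return a / 10 + a * a / 200 + 0.5

lo, hi = -10.0, 0.0
for _ in range(60):              # 60 halvings: far more precision than needed
    mid = (lo + hi) / 2
    if cdf(mid) < 0.05:
        lo = mid
    else:
        hi = mid

print(round(-lo, 2))  # VaR expressed as a loss: 6.84
```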
The probability of a VaR exceedance should be conditionally independent of all available information at the time the forecast is made. In other words, if we are calculating the 95% VaR for a portfolio, then the probability of an exceedance should always be 5%. The probability shouldn't be different because today is Tuesday, because it was sunny yesterday, or because your firm has been having a good month. Importantly, the probability should not vary because there was an exceedance the previous day, or because risk levels are elevated.

A common problem with VaR models in practice is that exceedances often end up being serially correlated. When exceedances are serially correlated, you are more likely to see another exceedance in the period immediately after an exceedance than expected. To test for serial correlation in exceedances, we can look at the periods immediately following any exceedance events. The number of exceedances in these periods should also follow a binomial distribution. For example, pretend we are calculating the one-day 95% VaR for a portfolio, and we observed 40 exceedances over the past 800 days. To test for serial correlation in the exceedances, we look at the 40 days immediately following the exceedance events, and count how many of those were also exceedances. In other words, we count the number of back-to-back exceedances. Because we are calculating VaR at the 95% confidence level, of the 40 day-after days, we would expect that 2 of them, 5% × 40 = 2, would also be exceedances. The actual number of these day-after exceedances should follow a binomial distribution with n = 40 and p = 5%.

Another common problem with VaR models in practice is that exceedances tend to be correlated with the level of risk. It may seem counterintuitive, but we should be no more or less likely to see VaR exceedances in years when market volatility is high compared to when it is low. Positive correlation between exceedances and risk levels can happen when a model does not react quickly enough to changes in risk levels. Negative correlation can happen when model windows are too short. To test for correlation between exceedances and the level of risk, we can divide our exceedances into two or more buckets, based on the level of risk. As an example, pretend we have been calculating the one-day 95% VaR for a portfolio over the past 800 days. We divide the sample period in two, placing the 400 days with the highest forecasted VaR in one bucket and the 400 days with the lowest forecasted VaR in the other. After sorting the days, we would expect each 400-day bucket to contain 20 exceedances: 5% × 400 = 20. The actual number of exceedances in each bucket should follow a binomial distribution with n = 400 and p = 5%.

Subadditivity

There is a reason VaR has become so popular in risk management. The appeal of VaR is its simplicity. Because VaR can be calculated for any portfolio, it allows us to easily compare the risk of different portfolios. Because it boils risk down to a single number, VaR provides us with a convenient way to track the risk of a portfolio over time. Finally, the concept of VaR is intuitive, even to those not versed in statistics.

Because it is so popular, VaR has come under a lot of criticism. The criticism generally falls into one of three categories.

At a very high level, financial institutions have been criticized for being overly reliant on VaR. This is not so much a criticism of VaR as it is a criticism of financial institutions for trying to make risk too simple.

At the other end of the spectrum, many experts have criticized how VaR is measured in practice. This is not so much a criticism of VaR as it is a criticism of specific implementations of VaR. For example, in the early days of finance it was popular to make what is known as a delta-normal assumption. That is, when measuring VaR, you would assume that all asset returns were normally distributed, and that all options could be approximated by their delta exposures. Further, the relationship between assets was based entirely on a covariance matrix (no coskewness or cokurtosis). These assumptions made calculating VaR very easy, even for large portfolios, but the results were often disappointing. As computing power became cheaper and more widespread, this approach quickly fell out of favor. Today VaR models can be extremely complex, but many people outside of risk management still remember when delta-normal was the standard approach, and mistakenly believe that this is a fundamental shortcoming of VaR.

In between, there are more sophisticated criticisms. One such criticism is that VaR is not a subadditive risk measure. It is generally accepted that a logical risk measure should have certain properties; see, for example, Artzner, Delbaen, Eber, and Heath (1999). One such property is known as subadditivity. Subadditivity is basically a fancy way of saying that diversification is good, and a good risk measure should reflect that.
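The backtests described earlier in this section all reduce to binomial calculations. As a sketch (our own code, with an illustrative exceedance count), here is the probability of seeing 30 or more exceedances in one of the 400-day buckets when only 20 are expected under a correct 95% VaR model:

```python
from math import comb

def binom_tail(n, p, k):
    """P[X >= k] for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 30 observed exceedances in a 400-day bucket versus 20 expected at the
# 95% level; a small tail probability suggests the model understates risk.
print(f"{binom_tail(400, 0.05, 30):.2%}")
```

The same function covers the serial-correlation test as well, with n = 40 day-after days and p = 5%.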
Assume that our risk measure is a function f that takes as its input a random variable representing an asset or portfolio of assets. Higher values of the risk measure are associated with greater risk. If we have two risky portfolios, X and Y, then f is said to be subadditive if:

f(X + Y) ≤ f(X) + f(Y)   (5.18)

In other words, the risk of the combined portfolio, X + Y, is less than or equal to the sum of the risks of the separate portfolios. Variance and standard deviation are subadditive risk measures.

While there is a lot to recommend VaR, unfortunately it does not always satisfy the requirement of subadditivity. The following example demonstrates a violation of subadditivity.

Example 5.5

Question:
Imagine a portfolio with two bonds, each with a 4% probability of defaulting. Assume that default events are uncorrelated and that the recovery rate of both bonds is 0%. If a bond defaults, it is worth $0; if it does not, it is worth $100. What is the 95% VaR of each bond separately? What is the 95% VaR of the bond portfolio?

Answer:
For each bond separately, the 95% VaR is $0. For an individual bond, in (over) 95% of scenarios, there is no loss.

In the combined portfolio, however, there are three possibilities, with the following probabilities:

P[x]      x
0.16%    -$200
7.68%    -$100
92.16%   $0

The 95% VaR of the combined portfolio is therefore a loss of $100, greater than the sum of the individual VaRs, even though the bonds are uncorrelated. It seems to suggest that holding $200 of either bond would be less risky than holding a portfolio with $100 of each. Clearly this is not correct.

This example makes clear that when assets have payout functions that are discontinuous near the VaR critical level, we are likely to have problems with subadditivity. By the same token, if the payout functions of the assets in a portfolio are continuous, then VaR will be subadditive. In many settings this is not an onerous assumption. In between, we have large, diverse portfolios, which contain some assets with discontinuous payout functions. For these portfolios subadditivity will likely be only a minor issue.

Expected Shortfall

Another criticism of VaR is that it does not tell us anything about the tail of the distribution. Two portfolios could have the exact same 95% VaR but very different distributions beyond the 95% confidence level.

More than VaR, then, what we really want to know is how big the loss will be when we have an exceedance event. Using the concept of conditional probability, we can define the expected value of a loss, given an exceedance, as follows:

E[L | L ≥ VaR] = S   (5.19)

We refer to this conditional expected loss, S, as the expected shortfall.

If the profit function has a probability density function given by f(x), and VaR is the VaR at the γ confidence level, we can find the expected shortfall as:

S = (1/(1 − γ)) ∫(−∞ to −VaR) x f(x) dx   (5.20)
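Example 5.5 can be sketched in a few lines of code. Here we define a discrete VaR at confidence level γ as the smallest loss L such that P[loss > L] ≤ 1 − γ (one common convention for discrete distributions; the function and names are our own):

```python
from itertools import product

def discrete_var(dist, gamma=0.95):
    """Smallest loss L with P[loss > L] <= 1 - gamma, for a discrete loss dict."""
    for L in sorted(dist):
        if sum(p for loss, p in dist.items() if loss > L) <= 1 - gamma:
            return L

# One bond: 4% chance of a $100 loss (default with 0% recovery)
bond = {0: 0.96, 100: 0.04}

# Two independent bonds: convolve the two loss distributions
portfolio = {}
for (l1, p1), (l2, p2) in product(bond.items(), repeat=2):
    portfolio[l1 + l2] = portfolio.get(l1 + l2, 0.0) + p1 * p2

print(discrete_var(bond))       # 0
print(discrete_var(portfolio))  # 100 -> VaR(X+Y) > VaR(X) + VaR(Y)
```

The portfolio VaR of $100 exceeds the $0 + $0 sum of the stand-alone VaRs, reproducing the subadditivity violation in the example.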
VaR. As our discussion on backtesting suggests, though, because it is concerned with the tail of the distribution, the reliability of our expected shortfall measure may be difficult to gauge.

Example 5.6

Question:
In a previous example, the probability density function of Triangle Asset Management's daily profits could be described by the following function:

p = 1/10 + π/100   −10 ≤ π ≤ 0
p = 1/10 − π/100   0 < π ≤ 10

The PDF is also shown in Figure 5-4. We calculated Triangle's one-day 95% VaR as a loss of (10 − sqrt(10)) = 6.84. For the same confidence level and time horizon, what is the expected shortfall?

[Figure 5-4: Triangular probability density function, with the VaR threshold marked in the left tail.]

Answer:
Because the VaR occurs in the region where π < 0, we need to utilize only the first half of the function. Using Equation (5.20), we have:

S = (1/0.05) ∫(−10 to −VaR) π p dπ = 20 ∫(−10 to −VaR) π (1/10 + π/100) dπ

S = 20 [π^2/20 + π^3/300] evaluated from −10 to −VaR

S = ((−10 + sqrt(10))^2 + (1/15)(−10 + sqrt(10))^3) − ((−10)^2 + (1/15)(−10)^3)

S = −10 + (2/3)sqrt(10) = −7.89

Thus, the expected shortfall is a loss of 7.89. Intuitively this should make sense. The expected shortfall must be greater than the VaR, 6.84, but less than the maximum loss of 10. Because extreme events are less likely (the height of the PDF decreases away from the center), it also makes sense that the expected shortfall is closer to the VaR than it is to the maximum loss.
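The expected shortfall in Example 5.6 can be verified by numerical integration of Equation (5.20). This sketch (our own code) uses a simple midpoint rule:

```python
from math import sqrt

var_level = 10 - sqrt(10)  # one-day 95% VaR from Example 5.3, ≈ 6.84 (a loss)

def integrate(f, a, b, n=10000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# S = (1/0.05) * integral over the tail of pi * p(pi), with p = 1/10 + pi/100
es = integrate(lambda x: x * (0.1 + x / 100), -10, -var_level) / 0.05

print(round(-es, 2))  # expected shortfall as a loss: 7.89
```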
Learning Objectives

After completing this reading you should be able to:
• Explain how regression analysis in econometrics measures the relationship between dependent and independent variables.
• Interpret a population regression function, regression coefficients, parameters, slope, intercept, and the error term.
• Interpret a sample regression function, regression coefficients, parameters, slope, intercept, and the error term.
• Describe the key properties of a linear regression.
• Define an ordinary least squares (OLS) regression and calculate the intercept and slope of the regression.
• Describe the method and three key assumptions of OLS for estimation of parameters.
• Summarize the benefits of using OLS estimators.
• Describe the properties of OLS estimators and their sampling distributions, and explain the properties of consistent estimators in general.
• Interpret the explained sum of squares, the total sum of squares, the residual sum of squares, the standard error of the regression, and the regression R².
• Interpret the results of an OLS regression.
Excerpt is Chapter 4 of Introduction to Econometrics, Brief Edition, by James H. Stock and Mark W. Watson.
A state implements tough new penalties on drunk drivers: What is the effect on highway fatalities? A school district cuts the size of its elementary school classes: What is the effect on its students' standardized test scores? You successfully complete one more year of college classes: What is the effect on your future earnings?

All three of these questions are about the unknown effect of changing one variable, X (X being penalties for drunk driving, class size, or years of schooling), on another variable, Y (Y being highway deaths, student test scores, or earnings).

This chapter introduces the linear regression model relating one variable, X, to another, Y. This model postulates a linear relationship between X and Y; the slope of the line relating X and Y is the effect of a one-unit change in X on Y. Just as the mean of Y is an unknown characteristic of the population distribution of Y, the slope of the line relating X and Y is an unknown characteristic of the population joint distribution of X and Y. The econometric problem is to estimate this slope, that is, to estimate the effect on Y of a unit change in X, using a sample of data on these two variables.

This chapter describes methods for estimating this slope using a random sample of data on X and Y. For instance, using data on class sizes and test scores from different school districts, we show how to estimate the expected effect on test scores of reducing class sizes by, say, one student per class. The slope and the intercept of the line relating X and Y can be estimated by a method called ordinary least squares (OLS).

THE LINEAR REGRESSION MODEL

The superintendent of an elementary school district must decide whether to hire additional teachers and she wants your advice. If she hires the teachers, she will reduce the number of students per teacher (the student-teacher ratio) by two. She faces a trade-off. Parents want smaller classes so that their children can receive more individualized attention. But hiring more teachers means spending more money, which is not to the liking of those paying the bill! So she asks you: If she cuts class sizes, what will the effect be on student performance?

In many school districts, student performance is measured by standardized tests, and the job status or pay of some administrators can depend in part on how well their students do on these tests. We therefore sharpen the superintendent's question: If she reduces the average class size by two students, what will the effect be on standardized test scores in her district?

A precise answer to this question requires a quantitative statement about changes. If the superintendent changes the class size by a certain amount, what would she expect the change in standardized test scores to be? We can write this as a mathematical relationship using the Greek letter beta, β_ClassSize, where the subscript "ClassSize" distinguishes the effect of changing the class size from other effects. Thus,

β_ClassSize = (change in TestScore)/(change in ClassSize) = ΔTestScore/ΔClassSize   (6.1)

where the Greek letter Δ (delta) stands for "change in." That is, β_ClassSize is the change in the test score that results from changing the class size, divided by the change in the class size.

If you were lucky enough to know β_ClassSize, you would be able to tell the superintendent that decreasing class size by one student would change districtwide test scores by β_ClassSize. You could also answer the superintendent's actual question, which concerned changing class size by two students per class. To do so, rearrange Equation (6.1) so that

ΔTestScore = β_ClassSize × ΔClassSize   (6.2)

Suppose that β_ClassSize = −0.6. Then a reduction in class size of two students per class would yield a predicted change in test scores of (−0.6) × (−2) = 1.2; that is, you would predict that test scores would rise by 1.2 points as a result of the reduction in class sizes by two students per class.

Equation (6.1) is the definition of the slope of a straight line relating test scores and class size. This straight line can be written

TestScore = β_0 + β_ClassSize × ClassSize   (6.3)

where β_0 is the intercept of this straight line, and, as before, β_ClassSize is the slope. According to Equation (6.3), if you knew β_0 and β_ClassSize, not only would you be able to determine the change in test scores at a district associated with a change in class size, but you also would be able to predict the average test score itself for a given class size.

When you propose Equation (6.3) to the superintendent, she tells you that something is wrong with this formulation.
2017 Financial Risk Manager (FRM) Part I: Quantitative Analysis, Seventh Edition by Global Association of Risk Professionals.
Copyright © 2017 by Pearson Education, Inc. All Rights Reserved. Pearson Custom Edition.
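The back-of-the-envelope calculation in Equation (6.2) is easy to verify in code. This is a minimal sketch; the slope value −0.6 is the hypothetical value used in the text, not an estimate.

```python
def predicted_change(beta_class_size: float, delta_class_size: float) -> float:
    """Predicted change in test scores from Equation (6.2):
    delta_test_score = beta_class_size * delta_class_size."""
    return beta_class_size * delta_class_size

# Hypothetical slope of -0.6; class size falls by two students per class.
change = predicted_change(-0.6, -2.0)
print(change)  # 1.2: test scores are predicted to rise by 1.2 points
```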
She points out that class size is just one of many facets of elementary education, and that two districts with the same class sizes will have different test scores for many reasons. One district might have better teachers or it might use better textbooks. Two districts with comparable class sizes, teachers, and textbooks still might have very different student populations; perhaps one district has more immigrants (and thus fewer native English speakers) or wealthier families. Finally, she points out that, even if two districts are the same in all these ways, they might have different test scores for essentially random reasons having to do with the performance of the individual students on the day of the test. She is right, of course; for all these reasons, Equation (6.3) will not hold exactly for all districts. Instead, it should be viewed as a statement about a relationship that holds on average across the population of districts.

A version of this linear relationship that holds for each district must incorporate these other factors influencing test scores, including each district's unique characteristics (for example, quality of their teachers, background of their students, how lucky the students were on test day). One approach would be to list the most important factors and to introduce them explicitly into Equation (6.3) (an idea we return to in Chapter 8). For now, however, we simply lump all these "other factors" together and write the relationship for a given district as

Y_i = β0 + β1X_i + u_i   (6.5)

[The general notation β1 is used here instead of "β_ClassSize" because this equation is written in terms of a general variable X_i.]

Equation (6.5) is the linear regression model with a single regressor, in which Y is the dependent variable and X is the independent variable or the regressor.

The first part of Equation (6.5), β0 + β1X_i, is the population regression line or the population regression function. This is the relationship that holds between Y and X on average over the population. Thus, if you knew the value of X, according to this population regression line you would predict that the value of the dependent variable, Y, is β0 + β1X.

The intercept β0 and the slope β1 are the coefficients of the population regression line, also known as the parameters of the population regression line. The slope β1 is the change in Y associated with a unit change in X. The intercept is the value of the population regression line when X = 0; it is the point at which the population regression line intersects the Y axis. In some econometric applications, the intercept has a meaningful economic interpretation. In other applications, the intercept has no real-world meaning; for example, when X is the class size, strictly speaking the intercept is the predicted value of test scores when there are no students in the class! When the real-world meaning of the intercept is nonsensical it is best to think of it mathematically as the coefficient that
that districts with lower student-teacher ratios (smaller classes) tend to have higher test scores. The intercept β0 has a mathematical meaning as the value of the Y axis intersected by the population regression line, but

[Figure 6-1: Scatterplot of test score vs. student-teacher ratio (hypothetical data). The scatterplot shows hypothetical observations for seven school districts. The population regression line is β0 + β1X. The vertical distance from the i-th point to the population regression line is Y_i − (β0 + β1X_i), which is the population error term u_i for the i-th observation.]

faced in statistics. For example, suppose you want to compare the mean earnings of men and women who recently graduated from college. Although the population mean earnings are unknown, we can estimate the population means using a random sample of male and female college graduates. Then the natural estimator of the unknown population mean earnings for women, for example, is the average earnings of the female college graduates in the sample.

The same idea extends to the linear regression model. We do not know the population value of β_ClassSize, the slope of the unknown population regression line relating X (class size) and Y (test scores). But just as it was possible to learn about the population mean using a sample of data drawn from that population, so is it possible to learn about the population slope β_ClassSize using a sample of data.
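The idea that a sample can reveal the unknown population slope can be sketched in a short simulation. The population line, error distribution, and sample size below are all invented for illustration.

```python
import random

random.seed(0)
beta0, beta1 = 700.0, -2.0    # hypothetical population coefficients
n = 200                       # sample size

# Draw a random sample from the population regression model Y = b0 + b1*X + u.
x = [random.uniform(14.0, 26.0) for _ in range(n)]
y = [beta0 + beta1 * xi + random.gauss(0.0, 10.0) for xi in x]

# Estimate the slope and intercept from the sample (the OLS formulas).
x_bar, y_bar = sum(x) / n, sum(y) / n
beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
beta0_hat = y_bar - beta1_hat * x_bar

print(beta1_hat)  # close to the population slope of -2
```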
Table 6-1: Summary of the Distribution of Student-Teacher Ratios and Fifth-Grade Test Scores for 420 K-8 Districts in California in 1998

                        Average   Std. Dev.   10%     25%     40%    50% (median)   60%     75%     90%
Student-teacher ratio    19.6       1.9       17.3    18.6    19.3      19.7        20.1    20.9    21.9
Test score              654.2      19.1      630.4   640.0   649.1     654.5       659.4   666.7   679.1
[Figure 6-2: Scatterplot of test score vs. student-teacher ratio for the 420 California school districts. Test scores range from about 620 to 700; student-teacher ratios range from about 14 to 26.]

These are California school districts that serve kindergarten through eighth grade. The test score is the districtwide average of reading and math scores. Class size can be measured in various ways; the measure used here is one of the broadest, which is the districtwide student-teacher ratio. These data are
predicted value of Y_i given X_i, based on the OLS regression line, is β̂0 + β̂1X_i. The residual for the i-th observation is the difference between Y_i and its predicted value: û_i = Y_i − Ŷ_i.

Using the data plotted in Figure 6-2, the estimated slope is −2.28 and the estimated intercept is 698.9. Accordingly, the OLS regression line for these 420 observations is

TestScore-hat = 698.9 − 2.28 × STR   (6.11)

where STR denotes the student-teacher ratio.
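The estimated line in Equation (6.11) can be used for quick predictions; a small helper makes the arithmetic explicit (the coefficients 698.9 and −2.28 are the estimates reported in the text):

```python
def predicted_test_score(student_teacher_ratio: float) -> float:
    """OLS prediction from Equation (6.11): 698.9 - 2.28 * STR."""
    return 698.9 - 2.28 * student_teacher_ratio

print(round(predicted_test_score(20.0), 1))  # 653.3, as in the text

# Predicted effect of cutting the student-teacher ratio by two students:
effect = predicted_test_score(17.7) - predicted_test_score(19.7)
print(round(effect, 2))  # 4.56 points
```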
[Figure 6-3: The estimated regression line for the California data. The estimated regression line shows a negative relationship between test scores and the student-teacher ratio. If class sizes fall by 1 student, the estimated regression predicts that test scores will increase by 2.28 points.]

The negative slope indicates that more students per teacher (larger classes) is associated with poorer performance on the test.

It is now possible to predict the districtwide test score given a value of the student-teacher ratio. For example, for a district with 20 students per teacher, the predicted test score is 698.9 − 2.28 × 20 = 653.3. Of course, this prediction will not be exactly right because of the other factors that determine a district's performance. But the regression line does give a prediction (the OLS prediction) of what test scores would be for that district, based on its student-teacher ratio, absent those other factors.

Is this estimate of the slope large or small? To answer this, we return to the superintendent's problem. Recall that she is contemplating hiring enough teachers to reduce the student-teacher ratio by 2. Suppose her district is at the median of the California districts. From Table 6-1, the median student-teacher ratio is 19.7 and the median test score is 654.5. A reduction of 2 students per class, from 19.7 to 17.7, would move her student-teacher ratio from the 50th percentile to very near the 10th percentile. This is a big change, and she would need to hire many new teachers. How would it affect test scores? According to Equation (6.11), cutting the student-teacher ratio by 2 is predicted to raise test scores by 2.28 × 2 ≈ 4.6 points, moving a median district from the median to just short of the 60th percentile. Thus, a decrease in class size that would place her district close to the 10% with the smallest classes would move her test scores from the 50th to the 60th percentile. According to these estimates, at least, cutting the student-teacher ratio by a large amount (2 students per teacher) would help and might be worth doing depending on her budgetary situation, but it would not be a panacea.

What if the superintendent were contemplating a far more radical change, such as reducing the student-teacher ratio from 20 students per teacher to 5? Unfortunately, the estimates in Equation (6.11) would not be very useful to her. This regression was estimated using the data in Figure 6-2, and as the figure shows, the smallest student-teacher ratio in these data is 14. These data contain no information on how districts with extremely small classes perform, so these data alone are not a reliable basis for predicting the effect of a radical move to such an extremely low student-teacher ratio.

Why Use the OLS Estimator?

There are both practical and theoretical reasons to use the OLS estimators β̂0 and β̂1. Because OLS is the dominant method used in practice, it has become the common language for regression analysis throughout economics, finance (see the box), and the social sciences more generally. Presenting results using OLS (or its variants discussed later in this book) means that you are "speaking the same language" as other economists and statisticians. The OLS formulas are built into virtually all spreadsheet and statistical software packages, making OLS easy to use.

The OLS estimators also have desirable theoretical properties. These are analogous to the desirable properties
of Ȳ as an estimator of the population mean. Under the assumptions introduced in a later section, the OLS estimator is unbiased and consistent. The OLS estimator is also efficient among a certain class of unbiased estimators; however, this efficiency result holds under some additional special conditions, and further discussion of this result is deferred until Chapter 7.

MEASURES OF FIT

Having estimated a linear regression, you might wonder how well that regression line describes the data. Does the regressor account for much or for little of the variation in the dependent variable? Are the observations tightly clustered around the regression line, or are they spread out?

The R² and the standard error of the regression measure how well the OLS regression line fits the data. The R² ranges between 0 and 1 and measures the fraction of the variance of Y_i that is explained by X_i. The standard error of the regression measures how far Y_i typically is from its predicted value.

The R²

The regression R² is the fraction of the sample variance of Y_i explained by (or predicted by) X_i. The definitions of the predicted value and the residual (see Box 6-2) allow us to write the dependent variable Y_i as the sum of the predicted value, Ŷ_i, plus the residual û_i:

Y_i = Ŷ_i + û_i   (6.13)

In this notation, the R² is the ratio of the sample variance of Ŷ_i to the sample variance of Y_i.

Mathematically, the R² can be written as the ratio of the explained sum of squares to the total sum of squares. The
explained sum of squares (ESS) is the sum of squared deviations of the predicted values, Ŷ_i, from their average, and the total sum of squares (TSS) is the sum of squared deviations of Y_i from its average:

ESS = Σ_{i=1}^{n} (Ŷ_i − Ȳ)²   (6.14)

TSS = Σ_{i=1}^{n} (Y_i − Ȳ)²   (6.15)

Equation (6.14) uses the fact that the sample average OLS predicted value equals Ȳ.

The R² is the ratio of the explained sum of squares to the total sum of squares:

R² = ESS / TSS   (6.16)

The Standard Error of the Regression

The standard error of the regression (SER) is an estimator of the standard deviation of the regression error u_i. The units of u_i and Y_i are the same, so the SER is a measure of the spread of the observations around the regression line, measured in the units of the dependent variable. For example, if the units of the dependent variable are dollars, then the SER measures the magnitude of a typical deviation from the regression line, that is, the magnitude of a typical regression error, in dollars.

Because the regression errors u_1, . . . , u_n are unobserved, the SER is computed using their sample counterparts, the OLS residuals û_1, . . . , û_n. The formula for the SER is

SER = s_û, where s_û² = (1/(n − 2)) Σ_{i=1}^{n} û_i²   (6.19)
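These fit measures are easy to compute directly from their definitions. The toy data below are invented; the point is the mechanics of Equations (6.14) through (6.19).

```python
# Invented, nearly linear data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# OLS fit.
x_bar, y_bar = sum(x) / n, sum(y) / n
beta1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
beta0 = y_bar - beta1 * x_bar

y_hat = [beta0 + beta1 * xi for xi in x]             # predicted values
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]        # OLS residuals

ess = sum((yh - y_bar) ** 2 for yh in y_hat)         # Equation (6.14)
tss = sum((yi - y_bar) ** 2 for yi in y)             # Equation (6.15)
r_squared = ess / tss                                # Equation (6.16)
ser = (sum(u ** 2 for u in u_hat) / (n - 2)) ** 0.5  # Equation (6.19)

print(r_squared, ser)  # R-squared near 1; SER in the units of y
```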
distribution of u_i being centered on the population regression line. Assumption #2: (X_i, Y_i), i = 1, . . . , n, Are Independently and Identically Distributed.
Box 6-3: The Least Squares Assumptions

Y_i = β0 + β1X_i + u_i, i = 1, . . . , n, where

1. The error term u_i has conditional mean zero given X_i: E(u_i | X_i) = 0;
2. (X_i, Y_i), i = 1, . . . , n, are independent and identically distributed (i.i.d.) draws from their joint distribution; and
3. Large outliers are unlikely: X_i and Y_i have nonzero finite fourth moments.

SAMPLING DISTRIBUTION OF THE OLS ESTIMATORS

Because the OLS estimators β̂0 and β̂1 are computed from a randomly drawn sample, the estimators themselves are random variables with a probability distribution, the sampling distribution, that describes the values they could take over different possible random samples. This section presents these sampling distributions. In small samples, these distributions are complicated, but in large samples, they are approximately normal because of the central limit theorem.
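A Monte Carlo experiment makes the notion of a sampling distribution concrete: draw many samples from one invented population, re-estimate the slope each time, and look at the spread of the estimates. All parameter values here are illustrative assumptions.

```python
import random
import statistics

random.seed(1)
beta0, beta1 = 5.0, 2.0      # invented population coefficients
n, reps = 100, 1000          # sample size and number of simulated samples

def ols_slope(x, y):
    """OLS slope estimator for a single sample."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

slopes = []
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + random.gauss(0.0, 1.0) for xi in x]
    slopes.append(ols_slope(x, y))

# The estimates center on the true slope, and their spread is the sampling
# standard deviation that the large-sample theory approximates.
print(statistics.mean(slopes), statistics.stdev(slopes))
```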
and β̂1 is well approximated by the bivariate normal distribution. This implies that the marginal distributions of β̂0 and β̂1 are normal in large samples.

This argument invokes the central limit theorem. Technically, the central limit theorem concerns the distribution of averages (like Ȳ). If you examine the numerator in Equation (6.7) for β̂1, you will see that it, too, is a type of average, not a simple average like Ȳ, but an average of the product (Y_i − Ȳ)(X_i − X̄). The central limit theorem applies to this average so that, like the simpler average Ȳ, it is normally distributed in large samples.

[Figure 6-6: The variance of β̂1 and the variance of X. The lighter dots represent a set of X's with a small variance. The black dots represent a set of X's with a large variance. The regression line can be estimated more accurately with the black dots than with the lighter dots.]
The normal approximation to the distribution of the OLS estimators in large samples is summarized in Box 6-4. A relevant question in practice is how large n must be for these approximations to be reliable. We suggested that n = 100 is sufficiently large for the sampling distribution of Ȳ to be well approximated by a normal distribution, and sometimes smaller n suffices. This criterion carries over to the more complicated averages appearing in regression analysis. In virtually all modern econometric applications n > 100, so we will treat the normal approximations to the distributions of the OLS estimators as reliable unless there are good reasons to think otherwise.

Box 6-4: Large-Sample Distributions of β̂0 and β̂1

If the least squares assumptions in Box 6-3 hold, then in large samples β̂0 and β̂1 have a jointly normal sampling distribution. The large-sample normal distribution of β̂1 is N(β1, σ²_β̂1), where the variance of this distribution, σ²_β̂1, is

σ²_β̂1 = (1/n) × var[(X_i − μ_X)u_i] / [var(X_i)]²   (6.21)

The large-sample normal distribution of β̂0 is N(β0, σ²_β̂0), where

σ²_β̂0 = (1/n) × var(H_i u_i) / [E(H_i²)]², where H_i = 1 − [μ_X / E(X_i²)] X_i   (6.22)

The results in Box 6-4 imply that the OLS estimators are consistent; that is, when the sample size is large, β̂0 and β̂1 will be close to the true population coefficients β0 and β1 with high probability. This is because the variances σ²_β̂0 and σ²_β̂1 of the estimators decrease to zero as n increases (n appears in the denominator of the formulas for the variances), so the distribution of the OLS estimators will be tightly concentrated around their means, β0 and β1, when n is large.

Another implication of the distributions in Box 6-4 is that, in general, the larger the variance of X_i, the smaller the variance σ²_β̂1 of β̂1. Mathematically, this arises because the variance of β̂1 in Equation (6.21) is inversely proportional to the square of the variance of X_i: the larger is var(X_i), the larger is the denominator in Equation (6.21) so the smaller is σ²_β̂1. To get a better sense of why this is so, look
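The effect of var(X) on the precision of β̂1 can also be seen by simulation: two designs with the same errors but different spread in X. Everything here is invented for illustration.

```python
import random
import statistics

random.seed(2)
n, reps = 100, 1000
beta0, beta1 = 0.0, 1.0  # invented population coefficients

def simulated_slope_sd(x_sd):
    """Sampling standard deviation of the OLS slope for a given spread of X."""
    slopes = []
    for _ in range(reps):
        x = [random.gauss(0.0, x_sd) for _ in range(n)]
        y = [beta0 + beta1 * xi + random.gauss(0.0, 1.0) for xi in x]
        x_bar, y_bar = sum(x) / n, sum(y) / n
        slopes.append(sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                      / sum((xi - x_bar) ** 2 for xi in x))
    return statistics.stdev(slopes)

sd_small_var_x = simulated_slope_sd(0.5)  # X tightly clustered
sd_large_var_x = simulated_slope_sd(2.0)  # X widely spread
print(sd_small_var_x, sd_large_var_x)     # the spread-out design is more precise
```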
at Figure 6-6, which presents a scatterplot of 150 artificial data points on X and Y. The data points indicated by the colored dots are the 75 observations closest to X̄. Suppose you were asked to draw a line as accurately as possible through either the colored or the black dots; which would you choose? It would be easier to draw a precise line through the black dots, which have a larger variance than the colored dots. Similarly, the larger the variance of X, the more precise is β̂1.

The normal approximation to the sampling distribution of β̂0 and β̂1 is a powerful tool. With this approximation in hand, we are able to develop methods for making inferences about the true population values of the regression coefficients using only a sample of data.

CONCLUSION

This chapter has focused on the use of ordinary least squares to estimate the intercept and slope of a population regression line using a sample of n observations on a dependent variable, Y, and a single regressor, X. There are many ways to draw a straight line through a scatterplot, but doing so using OLS has several virtues. If the least squares assumptions hold, then the OLS estimators of the slope and intercept are unbiased, are consistent, and have a sampling distribution with a variance that is inversely proportional to the sample size n. Moreover, if n is large, then the sampling distribution of the OLS estimator is normal.

These important properties of the sampling distribution of the OLS estimator hold under the three least squares assumptions.

The first assumption is that the error term in the linear regression model has a conditional mean of zero, given the regressor X. This assumption implies that the OLS estimator is unbiased.

The results in this chapter describe the sampling distribution of the OLS estimator. By themselves, however, these results are not sufficient to test a hypothesis about the value of β1 or to construct a confidence interval for β1. Doing so requires an estimator of the standard deviation of the sampling distribution, that is, the standard error of the OLS estimator. This step, moving from the sampling distribution of β̂1 to its standard error, hypothesis tests, and confidence intervals, is taken in the next chapter.

SUMMARY

1. The population regression line, β0 + β1X, is the mean of Y as a function of the value of X. The slope, β1, is the expected change in Y associated with a 1-unit change in X. The intercept, β0, determines the level (or height) of the regression line. Box 6-1 summarizes the terminology of the population linear regression model.
2. The population regression line can be estimated using sample observations (Y_i, X_i), i = 1, . . . , n, by ordinary least squares (OLS). The OLS estimators of the regression intercept and slope are denoted by β̂0 and β̂1.
3. The R² and standard error of the regression (SER) are measures of how close the values of Y_i are to the estimated regression line. The R² is between 0 and 1, with a larger value indicating that the Y_i's are closer to the line. The standard error of the regression is an estimator of the standard deviation of the regression error.
4. There are three key assumptions for the linear regression model: (1) The regression errors, u_i, have a mean of zero conditional on the regressors X_i; (2) the sample observations are i.i.d. random draws from the population; and (3) large outliers are unlikely. If these assumptions hold, the OLS estimators β̂0 and β̂1 are (1) unbiased, (2) consistent, and (3) normally distributed when the sample is large.
ers (measured as "full-time equivalents"), number of computers per classroom, and expenditures per student. The student-teacher ratio used here is the number of students in the district, divided by the number of full-time equivalent teachers. Demographic variables for the students also are averaged across the district. The demographic variables include the percentage of students who are in the public assistance program CalWorks (formerly AFDC), the percentage of students who qualify for a reduced price lunch, and the percentage of students who are English learners (that is, students for whom English is a second language). All of these data were obtained from the California Department of Education (www.cde.ca.gov).

β̂1 = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / Σ_{i=1}^{n} (X_i − X̄)²   (6.27)

β̂0 = Ȳ − β̂1X̄   (6.28)

Equations (6.27) and (6.28) are the formulas for β̂0 and β̂1 given in Box 6-2; the formula β̂1 = s_XY / s_X² is obtained by dividing the numerator and denominator in Equation (6.27) by n − 1.
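Equations (6.27) and (6.28) translate directly into code, and the note about dividing numerator and denominator by n − 1 can be checked numerically (the data below are invented):

```python
# Invented sample of class sizes and test scores for illustration.
x = [17.0, 19.0, 20.0, 22.0, 25.0]
y = [660.0, 655.0, 650.0, 644.0, 632.0]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))   # Equation (6.27)
beta0_hat = y_bar - beta1_hat * x_bar                # Equation (6.28)

# Dividing numerator and denominator by n - 1 gives s_XY / s_X^2.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
assert abs(beta1_hat - s_xy / s_x2) < 1e-12          # same estimator

print(beta1_hat, beta0_hat)  # a negative slope, as expected for these data
```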
Learning Objectives

After completing this reading you should be able to:

• Calculate and interpret confidence intervals for regression coefficients.
• Interpret the p-value.
• Interpret hypothesis tests about regression coefficients.
• Evaluate the implications of homoskedasticity and heteroskedasticity.
• Determine the conditions under which the OLS is the best linear conditionally unbiased estimator.
• Explain the Gauss-Markov Theorem and its limitations, and alternatives to the OLS.
• Apply and interpret the t-statistic when the sample size is small.

Excerpt is Chapter 5 of Introduction to Econometrics, Brief Edition, by James H. Stock and Mark W. Watson.
The first three sections assume that the three least squares assumptions of Chapter 6 hold. If, in addition, some stronger conditions hold, then some stronger results can be derived regarding the distribution of the OLS estimator. One of these stronger conditions is that the errors are homoskedastic, a concept introduced later. The Gauss-Markov theorem, which states that, under certain conditions, OLS is efficient (has the smallest variance) among a certain class of estimators, is also discussed. The final section discusses the distribution of the OLS estimator when the population distribution of the regression errors is normal.

TESTING HYPOTHESES ABOUT ONE OF THE REGRESSION COEFFICIENTS

Your client, the superintendent, calls you with a problem. She has an angry taxpayer in her office who asserts that cutting class size will not help boost test scores, so that reducing them further is a waste of money. Class size, the taxpayer claims, has no effect on test scores.

The taxpayer's claim can be rephrased in the language of regression analysis. Because the effect on test scores of a unit change in class size is β_ClassSize, the taxpayer is asserting that the population regression line is flat; that is, the slope β_ClassSize of the population regression line is zero. Is there, the superintendent asks, evidence in your sample of 420 observations on California school districts that this

Recall that the null hypothesis that the mean of Y is a specific value μ_{Y,0} can be written as H0: E(Y) = μ_{Y,0}, and the two-sided alternative is H1: E(Y) ≠ μ_{Y,0}.

The test of the null hypothesis H0 against the two-sided alternative proceeds as in the three steps summarized. The first is to compute the standard error of Ȳ, SE(Ȳ), which is an estimator of the standard deviation of the sampling distribution of Ȳ. The second step is to compute the t-statistic, which has the general form given in Box 7-1; applied here, the t-statistic is t = (Ȳ − μ_{Y,0}) / SE(Ȳ).

The third step is to compute the p-value, which is the smallest significance level at which the null hypothesis could be rejected, based on the test statistic actually observed; equivalently, the p-value is the probability of obtaining a statistic, by random sampling variation, at least as different from the null hypothesis value as is the statistic actually observed, assuming that the null hypothesis is correct. Because the t-statistic has a standard normal distribution in large samples under the null hypothesis, the p-value for a two-sided hypothesis test is 2Φ(−|t^act|), where t^act is the value of the t-statistic

Box 7-1: General Form of the t-Statistic

In general, the t-statistic has the form

t = (estimator − hypothesized value) / (standard error of the estimator)   (7.1)
actually computed and Φ is the cumulative standard normal distribution. Alternatively, the third step can be replaced by simply comparing the t-statistic to the critical value appropriate for the test with the desired significance level. For example, a two-sided test with a 5% significance level would reject the null hypothesis if |t^act| > 1.96. In this case, the population mean is said to be statistically significantly different from the hypothesized value at the 5% significance level.

Testing Hypotheses About the Slope β1

At a theoretical level, the critical feature justifying the foregoing testing procedure for the population mean is that, in large samples, the sampling distribution of Ȳ is approximately normal. Because β̂1 also has a normal sampling distribution in large samples, hypotheses about the true value of the slope β1 can be tested using the same general approach. The null and alternative hypotheses need to be stated precisely before they can be tested.

The first step is to compute the standard error of β̂1, SE(β̂1):

SE(β̂1) = √σ̂²_β̂1   (7.3)

where

σ̂²_β̂1 = (1/n) × [ (1/(n − 2)) Σ_{i=1}^{n} (X_i − X̄)² û_i² ] / [ (1/n) Σ_{i=1}^{n} (X_i − X̄)² ]²   (7.4)

Although the formula for σ̂²_β̂1 is complicated, in applications the standard error is computed by regression software so that it is easy to use in practice.

The second step is to compute the t-statistic,

t = (β̂1 − β_{1,0}) / SE(β̂1)   (7.5)

The third step is to compute the p-value, the probability of observing a value of β̂1 at least as different from β_{1,0} as the estimate actually computed (β̂1^act), assuming that the null hypothesis is correct. Stated mathematically,

p-value = Pr_{H0}[ |β̂1 − β_{1,0}| > |β̂1^act − β_{1,0}| ]
        = Pr_{H0}[ |(β̂1 − β_{1,0}) / SE(β̂1)| > |(β̂1^act − β_{1,0}) / SE(β̂1)| ]   (7.6)

where Pr_{H0} denotes the probability computed under the null hypothesis, the second equality follows by dividing by SE(β̂1), and t^act is the value of the t-statistic actually computed. Because β̂1 is approximately normally distributed in large samples, under the null hypothesis the t-statistic

Reporting Regression Equations and Application to Test Scores

The OLS regression of the test score against the student-teacher ratio, reported in Equation (6.11), yielded β̂0 = 698.9 and β̂1 = −2.28. The standard errors of these estimates are SE(β̂0) = 10.4 and SE(β̂1) = 0.52. Because of the importance of the standard errors, by convention they are included when reporting the estimated OLS coefficients. One compact way to report the standard
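The three steps can be applied to the estimates reported in the text (β̂1 = −2.28, SE(β̂1) = 0.52) to test the taxpayer's null hypothesis β_{1,0} = 0. The standard normal CDF is computed here from the error function; this is a sketch of the calculation, not regression software output.

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

beta1_hat, se_beta1, beta1_null = -2.28, 0.52, 0.0

t = (beta1_hat - beta1_null) / se_beta1   # Equation (7.5): about -4.38
p_two_sided = 2.0 * phi(-abs(t))          # p-value = 2 * Phi(-|t_act|)

print(t, p_two_sided)  # |t| far exceeds 1.96, so H0 is rejected at the 5% level
```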
β₁ is less than β₁,₀. If the alternative is that β₁ is greater than
β₁,₀, the inequality in Equation (7.9) is reversed.

Because the null hypothesis is the same for a one- and a two-sided
hypothesis test, the construction of the t-statistic is the same. The
only difference between a one- and a two-sided hypothesis test is how
you interpret the t-statistic. For the one-sided alternative in
Equation (7.9), the null hypothesis is rejected against the one-sided
alternative for large negative, but not large positive, values of the
t-statistic: instead of rejecting if |t^act| > 1.96, the hypothesis
is rejected at the 5% significance level if t^act < −1.645.

The p-value for a one-sided test is obtained from the cumulative
standard normal distribution as

p-value = Pr(Z < t^act) = Φ(t^act)  (p-value, one-sided left-tail test).   (7.10)

If the alternative hypothesis is that β₁ is greater than β₁,₀, the
inequalities in Equations (7.9) and (7.10) are reversed, so the
p-value is the right-tail probability, Pr(Z > t^act).

When Should a One-Sided Test Be Used?

In practice, one-sided alternative hypotheses should be used only
when there is a clear reason for doing so. This reason could come
from economic theory, prior empirical evidence, or both. However,
even if it initially seems that the relevant alternative is
one-sided, upon reflection this might not necessarily be so. A newly
formulated drug undergoing clinical trials actually could prove
harmful because of previously unrecognized side effects. In the class
size example, we are reminded of the graduation joke that a
university's secret of success is to admit talented students and then
make sure that the faculty stays out of their way and does as little
damage as possible. In practice, such ambiguity often leads
econometricians to use two-sided tests.

Application to Test Scores

The t-statistic testing the hypothesis that there is no effect of
class size on test scores [so β₁,₀ = 0 in Equation (7.9)] is
t^act = −4.38. This is less than −2.33 (the critical value for a
one-sided test at the 1% level), so we can reject the hypothesis that
the negative estimated slope arose purely because of random sampling
variation at the 1% significance level.

Testing Hypotheses about the Intercept β₀

This discussion has focused on testing hypotheses about the slope,
β₁. Occasionally, however, the hypothesis concerns the intercept, β₀.
The null hypothesis concerning the intercept and the two-sided
alternative are

H₀: β₀ = β₀,₀ vs. H₁: β₀ ≠ β₀,₀  (two-sided alternative).   (7.11)

The general approach to testing this null hypothesis consists of the
three steps in Box 7-2, applied to β₀. If the alternative is
one-sided, this approach is modified as was discussed in the previous
subsection for hypotheses about the slope.

Hypothesis tests are useful if you have a specific null hypothesis in
mind (as did our angry taxpayer). Being able to accept or to reject
this null hypothesis based on the statistical evidence provides a
powerful tool for coping with the uncertainty inherent in using a
sample to learn about the population. Yet there are many times that
no single hypothesis about a regression coefficient is dominant, and
instead one would like to know a range of values of the coefficient
that are consistent with the data. This calls for constructing a
confidence interval.

CONFIDENCE INTERVALS FOR A REGRESSION COEFFICIENT

Because any statistical estimate of the slope β₁ necessarily has
sampling uncertainty, we cannot determine the true value of β₁
exactly from a sample of data. It is, however, possible to use the
OLS estimator and its standard error to construct a confidence
interval for the slope β₁ or for the intercept β₀.
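The one-sided p-value in Equation (7.10) needs nothing beyond the standard normal CDF, which can be built from the error function in the Python standard library; this sketch reproduces the test-score calculation with t^act = −4.38:

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

t_act = -4.38

# One-sided (left-tail) p-value, Equation (7.10): Pr(Z < t_act).
p_one_sided = phi(t_act)

# Two-sided p-value for comparison: Pr(|Z| > |t_act|).
p_two_sided = 2.0 * phi(-abs(t_act))

print(p_one_sided, p_two_sided)
```

Both p-values come out far below 1%, consistent with rejecting the null at the 1% significance level.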
95% probability of containing the true value of β₁; that is, in 95%
of possible samples that might be drawn, the confidence interval will
contain the true value of β₁. Because this interval contains the true
value in 95% of all samples, it is said to have a confidence level
of 95%.

The reason these two definitions are equivalent is as follows. A
hypothesis test with a 5% significance level will, by definition,
reject the true value of β₁ in only 5% of all possible samples; that
is, in 95% of all possible samples the true value of β₁ will not be
rejected. Because the 95% confidence interval (as defined in the
first definition) is the set of all values of β₁ that are not
rejected at the 5% significance level, it follows that the true value
of β₁ will be contained in the confidence interval in 95% of all
possible samples.

As in the case of a confidence interval for the population mean, in
principle a 95% confidence interval can be computed by testing all
possible values of β₁ (that is, testing the null hypothesis β₁ = β₁,₀
for all values of β₁,₀) at the 5% significance level using the
t-statistic. The 95% confidence interval is then the collection of
all the values of β₁ that are not rejected. But constructing the
t-statistic for all values of β₁ would take forever.

An easier way to construct the confidence interval is to note that
the t-statistic will reject the hypothesized value β₁,₀ whenever β₁,₀
is outside the range β̂₁ ± 1.96 SE(β̂₁). That is, the 95% confidence
interval for β₁ is the interval
[β̂₁ − 1.96 SE(β̂₁), β̂₁ + 1.96 SE(β̂₁)]. This argument parallels the
argument used to develop a confidence interval for the population
mean.

The construction of a confidence interval for β₁ is summarized in
Box 7-3.

Box 7-3  Confidence Interval for β₁

A 95% two-sided confidence interval for β₁ is an interval that
contains the true value of β₁ with a 95% probability; that is, it
contains the true value of β₁ in 95% of all possible randomly drawn
samples. Equivalently, it is the set of values of β₁ that cannot be
rejected by a 5% two-sided hypothesis test. When the sample size is
large, it is constructed as

95% confidence interval for β₁ = [β̂₁ − 1.96 SE(β̂₁), β̂₁ + 1.96 SE(β̂₁)].   (7.12)

Confidence Interval for β₀

A 95% confidence interval for β₀ is constructed as in Box 7-3, with
β̂₀ and SE(β̂₀) replacing β̂₁ and SE(β̂₁).

Application to Test Scores

The OLS regression of the test score against the student-teacher
ratio, reported in Equation (7.8), yielded β̂₁ = −2.28 and
SE(β̂₁) = 0.52. The 95% two-sided confidence interval for β₁ is
{−2.28 ± 1.96 × 0.52}, or −3.30 ≤ β₁ ≤ −1.26. The value β₁ = 0 is not
contained in this confidence interval, so (as we knew already) the
hypothesis β₁ = 0 can be rejected at the 5% significance level.

Confidence Intervals for Predicted Effects of Changing X

The 95% confidence interval for β₁ can be used to construct a 95%
confidence interval for the predicted effect of a general change
in X.

Consider changing X by a given amount, Δx. The predicted change in Y
associated with this change in X is β₁Δx. The population slope β₁ is
unknown, but because we can construct a confidence interval for β₁,
we can construct a confidence interval for the predicted effect β₁Δx.
Because one end of a 95% confidence interval for β₁ is
β̂₁ − 1.96 SE(β̂₁), the predicted effect of the change Δx using this
estimate of β₁ is [β̂₁ − 1.96 SE(β̂₁)] × Δx. The other end of the
confidence interval is β̂₁ + 1.96 SE(β̂₁), and the predicted effect of
the change using that estimate is [β̂₁ + 1.96 SE(β̂₁)] × Δx. Thus a
95% confidence interval for the effect of changing X by the amount Δx
can be expressed as

95% confidence interval for β₁Δx =
[β̂₁Δx − 1.96 SE(β̂₁) × Δx, β̂₁Δx + 1.96 SE(β̂₁) × Δx].   (7.11)

For example, our hypothetical superintendent is contemplating
reducing the student-teacher ratio by 2. Because the 95% confidence
interval for β₁ is [−3.30, −1.26], the effect of reducing the
student-teacher ratio by 2 could be as great as −3.30 × (−2) = 6.60,
or as little as −1.26 × (−2) = 2.52. Thus decreasing the
student-teacher ratio by 2 is predicted to increase test scores by
between 2.52 and 6.60 points, with a 95% confidence level.
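The interval arithmetic above is easy to verify by plugging the estimates from the text (β̂₁ = −2.28, SE(β̂₁) = 0.52) and the contemplated change Δx = −2 into Equation (7.11):

```python
beta1_hat = -2.28      # estimated slope from the text
se_beta1 = 0.52        # its standard error
dx = -2.0              # contemplated change: reduce the ratio by 2

# 95% confidence interval for beta1, Box 7-3.
ci_low = beta1_hat - 1.96 * se_beta1
ci_high = beta1_hat + 1.96 * se_beta1

# 95% confidence interval for the predicted effect beta1 * dx,
# Equation (7.11); sorting handles the sign flip when dx < 0.
effect_ends = sorted([ci_low * dx, ci_high * dx])

print((round(ci_low, 2), round(ci_high, 2)))   # (-3.3, -1.26)
print([round(e, 2) for e in effect_ends])      # [2.52, 6.6]
```

Note the sorting step: because Δx is negative here, the lower end of the interval for β₁ maps to the upper end of the interval for the effect.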
Coefficients

The mechanics of regression with a binary regressor are the same as
if it is continuous. The interpretation of β₁, however, is different,
and it turns out that regression with a binary variable is equivalent
to performing a difference-of-means analysis, as described
previously.

To see this, suppose you have a variable Dᵢ that equals either 0
or 1, depending on whether the student-teacher ratio is less than 20:

Dᵢ = 1 if the student-teacher ratio in the ith district < 20,
Dᵢ = 0 if the student-teacher ratio in the ith district ≥ 20.

Subtracting the conditional mean of Yᵢ when Dᵢ = 0 from that when
Dᵢ = 1 gives β₁ = E(Yᵢ | Dᵢ = 1) − E(Yᵢ | Dᵢ = 0). In the test score
example, β₁ is the difference between the mean test score in
districts with low student-teacher ratios and the mean test score in
districts with high student-teacher ratios. Because β₁ is the
difference in the population means, it makes sense that the OLS
estimator β̂₁ is the difference between the sample averages of Yᵢ in
the two groups, and in fact this is the case.

Hypothesis Tests and Confidence Intervals
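The equivalence between regression on a binary variable and a difference-of-means analysis can be verified numerically; the data below are simulated and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
d = rng.integers(0, 2, n).astype(float)          # 1 if STR < 20, else 0
y = 650.0 + 7.4 * d + rng.normal(0, 19, n)       # simulated test scores

# OLS slope and intercept with the binary regressor D.
d_bar, y_bar = d.mean(), y.mean()
beta1_hat = np.sum((d - d_bar) * (y - y_bar)) / np.sum((d - d_bar) ** 2)
beta0_hat = y_bar - beta1_hat * d_bar

# Difference-of-means analysis on the same data.
mean_d0 = y[d == 0].mean()
mean_d1 = y[d == 1].mean()

print(beta0_hat, mean_d0)               # intercept = mean of the D = 0 group
print(beta1_hat, mean_d1 - mean_d0)     # slope = difference in group means
```

Up to floating-point error, the intercept equals the D = 0 group mean and the slope equals the difference in group means, exactly as the text asserts.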
The OLS regression of the test score against D yields

TestScore = 650.0 + 7.4 D,  R² = 0.037, SER = 18.7,   (7.18)
           (1.3)    (1.8)

where the standard errors of the OLS estimates of the coefficients β₀
and β₁ are given in parentheses below the OLS estimates.

Thus the average test score for the subsample with student-teacher
ratios greater than or equal to 20 (that is, for which D = 0) is
650.0, and the average test score for the subsample with
student-teacher ratios less than 20 (so D = 1) is
650.0 + 7.4 = 657.4. The difference between the sample average test
scores for the two groups is 7.4. This is the OLS estimate of β₁, the
coefficient on the student-teacher ratio binary variable D.

[Figure 7-2  An example of heteroskedasticity. Like Figure 6-4, this
figure shows the conditional distribution of test scores for three
different class sizes. Unlike Figure 6-4, these distributions become
more spread out (have a larger variance) for larger class sizes.
Because the variance of the distribution of u given X, Var(u | X = x),
depends on x, u is heteroskedastic.]

Is the difference in the population mean test scores in the two
groups statistically significantly different from zero at the 5%
level? To find out, construct the t-statistic on β₁:
t = 7.4/1.8 = 4.04.
This exceeds 1.96 in absolute value, so the hypothesis that the
population mean test scores in districts with high and low
student-teacher ratios is the same can be rejected at the 5%
significance level.

The OLS estimator and its standard error can be used to construct a
95% confidence interval for the true difference in means. This is
7.4 ± 1.96 × 1.8 = (3.9, 10.9). This confidence interval excludes
β₁ = 0, so that (as we know from the previous paragraph) the
hypothesis β₁ = 0 can be rejected at the 5% significance level.

HETEROSKEDASTICITY AND HOMOSKEDASTICITY

Our only assumption about the distribution of uᵢ conditional on Xᵢ is
that it has a mean of zero (the first least squares assumption). If,
furthermore, the variance of this conditional distribution does not
depend on Xᵢ, then the errors are said to be homoskedastic. This
section discusses homoskedasticity, its theoretical implications, the
simplified formulas for the standard errors of the OLS estimators
that arise if the errors are homoskedastic, and the risks you run if
you use these simplified formulas in practice.

What Are Heteroskedasticity and Homoskedasticity?

Definitions of Heteroskedasticity and Homoskedasticity

The error term uᵢ is homoskedastic if the variance of the conditional
distribution of uᵢ given Xᵢ is constant for i = 1, …, n and in
particular does not depend on Xᵢ. Otherwise, the error term is
heteroskedastic.

As an illustration, return to Figure 6-4. The distribution of the
errors uᵢ is shown for various values of x. Because this distribution
applies specifically for the indicated value of x, this is the
conditional distribution of uᵢ given Xᵢ = x. As drawn in that figure,
all these conditional distributions have the same spread; more
precisely, the variance of these distributions is the same for the
various values of x. That is, in Figure 6-4, the conditional variance
of uᵢ given Xᵢ = x does not depend on x, so the errors illustrated in
Figure 6-4 are homoskedastic.

In contrast, Figure 7-2 illustrates a case in which the conditional
distribution of uᵢ spreads out as x increases. For small values of x,
this distribution is tight, but for larger values of x, it has a
greater spread. Thus, in Figure 7-2 the variance of uᵢ given Xᵢ = x
increases with x, so that the errors in Figure 7-2 are
heteroskedastic.
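These definitions can be checked on simulated data by comparing the spread of the errors across ranges of x; the data-generating processes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30000
x = rng.uniform(10, 30, n)                   # e.g., class sizes

u_homo = rng.normal(0, 5, n)                 # constant spread: homoskedastic
u_hetero = rng.normal(0, 1, n) * (x - 9)     # spread grows with x: heteroskedastic

def binned_sds(u, x):
    """Standard deviation of u within three ranges of x."""
    return [float(u[(x >= lo) & (x < lo + 5)].std()) for lo in (10.0, 17.5, 25.0)]

sds_homo = binned_sds(u_homo, x)
sds_hetero = binned_sds(u_hetero, x)
print([round(s, 1) for s in sds_homo])       # roughly constant
print([round(s, 1) for s in sds_hetero])     # increasing in x
```

The homoskedastic errors have about the same standard deviation in every bin, while the heteroskedastic errors show a spread that grows steadily with x, mirroring the contrast between Figures 6-4 and 7-2.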
where σ̂²ᵤ is given in Equation (6.19). In the special case that X is
a binary variable, the estimator of the variance of β̂₁ under
homoskedasticity (that is, the square of the standard error of β̂₁
under homoskedasticity) is the so-called pooled variance formula for
the difference in means.

Because these alternative formulas are derived for the special case
that the errors are homoskedastic and do not apply if the errors are
heteroskedastic, they will be referred to as the
"homoskedasticity-only" formulas for the variance and standard error
of the OLS estimators. As the name suggests, if the errors are
heteroskedastic, then the homoskedasticity-only standard errors are
inappropriate. Specifically, if the errors are heteroskedastic, then
the t-statistic computed using the homoskedasticity-only standard
error does not have a standard normal distribution, even in large
samples. In fact, the correct critical values to use for this
homoskedasticity-only t-statistic depend on the precise nature of the
heteroskedasticity, so those critical values cannot be tabulated.
Similarly, if the errors are heteroskedastic but a confidence
interval is constructed as ±1.96 homoskedasticity-only standard
errors, in general the probability that this interval contains the
true value of the coefficient is not 95%, even in large samples.

In contrast, because homoskedasticity is a special case of
heteroskedasticity, the estimators σ̂²_β̂₁ and σ̂²_β̂₀ of the variances
of β̂₁ and β̂₀ given in Equations (7.4) and (7.26) produce valid
statistical inferences whether the errors are heteroskedastic or
homoskedastic. Thus hypothesis tests and confidence intervals based
on those standard errors are valid whether or not the errors are
heteroskedastic. Because the standard errors we have used so far
[i.e., those based on Equations (7.4) and (7.26)] lead to statistical
inferences that are valid whether or not the errors are
heteroskedastic, they are called heteroskedasticity-robust standard
errors. Because such formulas were proposed by Eicker (1967), Huber
(1967), and White (1980), they are also referred to as
Eicker-Huber-White standard errors.

What Does This Mean in Practice?

Which Is More Realistic, Heteroskedasticity or Homoskedasticity?

The answer to this question depends on the application. However, the
issues can be clarified by returning to the example of the gender gap
in earnings among college graduates. Familiarity with how people are
paid in the world around us gives some clues as to which assumption
is more sensible. For many years, and to a lesser extent today, women
were not found in the top-paying jobs: there have always been poorly
paid men, but there have rarely been highly paid women. This suggests
that the distribution of earnings among women is tighter than among
men. In other words, the variance of the error term in Equation
(7.20) for women is plausibly less than the variance of the error
term in Equation (7.21) for men. Thus, the presence of a "glass
ceiling" for women's jobs and pay suggests that the error term in the
binary variable regression model in Equation (7.19) is
heteroskedastic. Unless there are compelling reasons to the contrary,
and we can think of none, it makes sense to treat the error term in
this example as heteroskedastic.

As this example of modeling earnings illustrates, heteroskedasticity
arises in many econometric applications. At a general level, economic
theory rarely gives any reason to believe that the errors are
homoskedastic. It therefore is prudent to assume that the errors
might be heteroskedastic unless you have compelling reasons to
believe otherwise.

Practical Implications

The main issue of practical relevance in this discussion is whether
one should use heteroskedasticity-robust or homoskedasticity-only
standard errors. In this regard, it is useful to imagine computing
both, then choosing between them. If the homoskedasticity-only and
heteroskedasticity-robust standard errors are the same, nothing is
lost by using the heteroskedasticity-robust standard errors; if they
differ, however, then you should use the more reliable ones that
allow for heteroskedasticity. The simplest thing, then, is always to
use the heteroskedasticity-robust standard errors.

For historical reasons, many software programs use the
homoskedasticity-only standard errors as their default setting, so it
is up to the user to specify the option of heteroskedasticity-robust
standard errors. The details of how to implement
heteroskedasticity-robust standard errors depend on the software
package you use.
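The advice to compute both kinds of standard error is easy to follow by hand; this sketch builds each from its formula on deliberately heteroskedastic simulated data, rather than relying on any particular software package's option:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0, 10, n)
u = rng.normal(0, 1, n) * x ** 1.5       # conditional spread grows with x
y = 1.0 + 2.0 * x + u

xc = x - x.mean()
beta1 = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
beta0 = y.mean() - beta1 * x.mean()
resid = y - beta0 - beta1 * x

# Homoskedasticity-only standard error: s^2 / sum((x - xbar)^2).
s2 = np.sum(resid ** 2) / (n - 2)
se_homo = np.sqrt(s2 / np.sum(xc ** 2))

# Heteroskedasticity-robust standard error, as in Equation (7.4).
num = np.sum(xc ** 2 * resid ** 2) / (n - 2)
se_robust = np.sqrt((num / (np.sum(xc ** 2) / n) ** 2) / n)

print(se_homo, se_robust)                # here the robust SE is larger
```

In this simulation the two standard errors differ noticeably, so, following the text's advice, the robust one is the one to report.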
This is an empirical question, so answering it requires analyzing
data. Figure 7-3 is a scatterplot of the hourly earnings and the
number of years of education for a sample of 2950 full-time workers
in the United States in 2004, ages 29 and 30, with between 6 and 18
years of education. The OLS regression line fit to these data is

Earnings = −3.13 + 1.47 YearsEducation,  R² = 0.130, SER = 8.77.   (7.21)
          (0.93)   (0.07)

This line is plotted in Figure 7-3. The coefficient of 1.47 in the
OLS regression line means that, on average, hourly earnings increase
by $1.47 for each additional year of education. The 95% confidence
interval for this coefficient is 1.47 ± 1.96 × 0.07, or 1.33 to 1.61.

[Figure 7-3  Scatterplot of hourly earnings and years of education
for 29- to 30-year-olds in the United States in 2004. Hourly earnings
are plotted against years of education for 2950 full-time, 29- to
30-year-old workers. The spread around the regression line increases
with the years of education, indicating that the regression errors
are heteroskedastic.]

The second striking feature of Figure 7-3 is that the spread of the
distribution of earnings increases with the years of education. While
some workers with many years of education have low-paying jobs, very
few workers with low levels of education have high-paying jobs. This
can be stated more precisely by looking at the spread of the
residuals around the OLS regression line. For workers with ten years
of education, the standard deviation of the residuals is $5.46; for
workers with a high school diploma, this standard deviation is $7.43;
and for workers with a college degree, this standard deviation
increases to $10.78. Because these standard deviations differ for
different levels of education, the variance of the residuals in the
regression of Equation (7.23) depends on the value of the regressor
(the years of education); in other words, the regression errors are
heteroskedastic. In real-world terms, not all college graduates will
be earning $50/hour by the time they are 29, but some will, and
workers with only ten years of education have no shot at those jobs.
Linear Conditionally Unbiased Estimators
and the Gauss-Markov Theorem

If the three least squares assumptions (Box 6-3) hold and if the
error is homoskedastic, then the OLS estimator has the smallest
variance, conditional on X₁, …, Xₙ, among all estimators in the class
of linear conditionally unbiased estimators. In other words, the OLS
estimator is the Best Linear conditionally Unbiased Estimator; that
is, it is BLUE. This result extends to regression the result that the
sample average Ȳ is the most efficient estimator of the population
mean among the class of all estimators that are unbiased and are
linear functions (weighted averages) of Y₁, …, Yₙ.

Linear Conditionally Unbiased Estimators

The class of linear conditionally unbiased estimators consists of all
estimators of β₁ that are linear functions of Y₁, …, Yₙ and that are
unbiased, conditional on X₁, …, Xₙ. That is, if β̃₁ is a linear
estimator, then it can be written as

β̃₁ = Σᵢ₌₁ⁿ aᵢYᵢ  (β̃₁ is linear),   (7.24)

where the weights a₁, …, aₙ can depend on X₁, …, Xₙ but not on
Y₁, …, Yₙ. The estimator β̃₁ is conditionally unbiased if the mean of
its conditional sampling distribution, given X₁, …, Xₙ, is β₁.

The Gauss-Markov theorem states that, under a set of conditions known
as the Gauss-Markov conditions, the OLS estimator β̂₁ has the smallest
conditional variance, given X₁, …, Xₙ, of all linear conditionally
unbiased estimators of β₁; that is, the OLS estimator is BLUE. The
Gauss-Markov conditions, which are stated in the chapter Appendix,
are implied by the three least squares assumptions plus the
assumption that the errors are homoskedastic. Consequently, if the
three least squares assumptions hold and the errors are
homoskedastic, then OLS is BLUE.

Limitations of the Gauss-Markov Theorem

The Gauss-Markov theorem provides a theoretical justification for
using OLS. However, the theorem has two important limitations. First,
its conditions might not hold in practice. In particular, if the
error term is heteroskedastic, as it often is in economic
applications, then the OLS estimator is no longer BLUE. As discussed
previously, the presence of heteroskedasticity does not pose a threat
to inference based on heteroskedasticity-robust standard errors, but
it does mean that OLS is no longer the efficient linear conditionally
unbiased estimator. An alternative to OLS when there is
heteroskedasticity of a known form, called the weighted least squares
estimator, is discussed below.
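Equation (7.24) can be made concrete: OLS is itself a linear estimator, with weights aᵢ = (Xᵢ − X̄) / Σⱼ (Xⱼ − X̄)² that depend on the X's but not on the Y's. A small numerical check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.normal(0, 1, n)
y = 2.0 + 3.0 * x + rng.normal(0, 1, n)

# OLS slope written as a linear function of Y, Equation (7.24):
# beta1 = sum(a_i * Y_i), with weights depending only on the X's.
xc = x - x.mean()
a = xc / np.sum(xc ** 2)
beta1_linear = np.sum(a * y)

# The usual OLS formula gives the same number.
beta1_usual = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
print(beta1_linear, beta1_usual)
```

The two expressions agree because Σᵢ (Xᵢ − X̄) Ȳ = 0, so subtracting Ȳ from each Yᵢ changes nothing.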
The Weighted Least Squares Estimator

If the errors are heteroskedastic, then OLS is no longer BLUE. If the
nature of the heteroskedasticity is known (specifically, if the
conditional variance of uᵢ given Xᵢ is known up to a constant factor
of proportionality), then it is possible to construct an estimator
that has a smaller variance than the OLS estimator. This method,
called weighted least squares (WLS), weights the ith observation by
the inverse of the square root of the conditional variance of uᵢ
given Xᵢ. Because of this weighting, the errors in this weighted
regression are homoskedastic, so OLS, when applied to the weighted
data, is BLUE. Although theoretically elegant, the practical problem
with weighted least squares is that you must know how the conditional
variance of uᵢ depends on Xᵢ, something that is rarely known in
applications.

The Least Absolute Deviations Estimator

As discussed earlier, the OLS estimator can be sensitive to outliers.
If extreme outliers are not rare, then other estimators can be more
efficient than OLS and can produce inferences that are more reliable.
One such estimator is the least absolute deviations (LAD) estimator,
in which the regression coefficients β₀ and β₁ are obtained by
solving a minimization like that in Equation (6.6), except that the
absolute value of the prediction "mistake" is used instead of its
square. That is, the least absolute deviations estimators of β₀ and
β₁ are the values of b₀ and b₁ that minimize Σᵢ₌₁ⁿ |Yᵢ − b₀ − b₁Xᵢ|.
In practice, this estimator is less sensitive to large outliers in u
than is OLS. In many economic data sets, severe outliers in u are
rare, so use of the LAD estimator, or other estimators with reduced
sensitivity to outliers, is uncommon in applications.

In general, the exact distribution of the t-statistic depends on the
unknown population distribution of the data. If, however, the three
least squares assumptions hold, the regression errors are
homoskedastic, and the regression errors are normally distributed,
then the OLS estimator is normally distributed and the
homoskedasticity-only t-statistic has a Student t distribution. These
five assumptions (the three least squares assumptions, that the
errors are homoskedastic, and that the errors are normally
distributed) are collectively called the homoskedastic normal
regression assumptions.

The t-Statistic and the Student t Distribution

The Student t distribution with m degrees of freedom is defined to be
the distribution of Z/√(W/m), where Z is a random variable with a
standard normal distribution, W is a random variable with a
chi-squared distribution with m degrees of freedom, and Z and W are
independent. Under the null hypothesis, the t-statistic computed
using the homoskedasticity-only standard error can be written in this
form.

The homoskedasticity-only t-statistic testing β₁ = β₁,₀ is
t = (β̂₁ − β₁,₀)/σ̃_β̂₁, where σ̃²_β̂₁ is defined in Equation (7.22).
Under the homoskedastic normal regression assumptions, Y has a normal
distribution, conditional on X₁, …, Xₙ. As discussed previously, the
OLS estimator is a weighted average of Y₁, …, Yₙ, where the weights
depend on X₁, …, Xₙ. Because a weighted average of independent normal
random variables is normally distributed, β̂₁ has a normal
distribution, conditional on X₁, …, Xₙ. Thus (β̂₁ − β₁,₀) has a normal
distribution under the null hypothesis, conditional on X₁, …, Xₙ.
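The WLS recipe described above can be sketched directly: divide every observation, including the constant, by the square root of the (assumed known) conditional variance, then run OLS on the weighted data. Here Var(u | X = x) is taken, for illustration only, to be proportional to x:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n) * np.sqrt(x)   # Var(u|x) proportional to x

# Weight each observation by 1/sqrt(Var(u|x)) (up to proportionality).
w = 1.0 / np.sqrt(x)
X = np.column_stack([np.ones(n), x])       # constant and regressor
Xw, yw = X * w[:, None], y * w             # weighted data: homoskedastic errors

# OLS on the weighted data is the WLS estimator (and is BLUE here).
beta_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print(beta_wls)                            # close to [1.0, 2.0]
```

The key design choice is that the constant column is weighted too: WLS transforms the whole equation, not just the regressor, so that the transformed errors w·u are homoskedastic.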
error formula for the difference of means. It follows that the result
is a special case of the result that, if the homoskedastic normal
regression assumptions hold, then the homoskedasticity-only
regression t-statistic has a Student t distribution.

Use of the Student t Distribution in Practice

If the regression errors are homoskedastic and normally distributed
and if the homoskedasticity-only t-statistic is used, then critical
values should be taken from the Student t distribution instead of the
standard normal distribution. Because the difference between the
Student t distribution and the normal distribution is negligible if n
is moderate or large, this distinction is relevant only if the sample
size is small.

In econometric applications, there is rarely a reason to believe that
the errors are homoskedastic and normally distributed. Because sample
sizes typically are large, however, inference can proceed as
described earlier: first compute heteroskedasticity-robust standard
errors, and then use the standard normal distribution to compute
p-values, hypothesis tests, and confidence intervals.

CONCLUSION

Return for a moment to the problem that started Chapter 6: the
superintendent who is considering hiring additional teachers to cut
the student-teacher ratio. What have we learned that she might find
useful?

Our regression analysis, based on the 420 observations for 1998 in
the California test score data set, showed that there was a negative
relationship between the student-teacher ratio and test scores:
Districts with smaller classes have higher test scores. The
coefficient is moderately large, in a practical sense: Districts with
2 fewer students per teacher have, on average, test scores that are
4.6 points higher. This corresponds to moving a district at the 50th
percentile of the distribution of test scores to approximately the
60th percentile.

The coefficient on the student-teacher ratio is statistically
significantly different from 0 at the 5% significance level. The
population coefficient might be 0, and we might simply have estimated
our negative coefficient by random sampling variation. However, the
probability of doing so (and of obtaining a t-statistic on β₁ as
large as we did) purely by random variation over potential samples is
exceedingly small, approximately 0.001%. A 95% confidence interval
for β₁ is −3.30 ≤ β₁ ≤ −1.26.

This represents considerable progress toward answering the
superintendent's question. Yet, a nagging concern remains. There is a
negative relationship between the student-teacher ratio and test
scores, but is this relationship necessarily the causal one that the
superintendent needs to make her decision? Districts with lower
student-teacher ratios have, on average, higher test scores. But does
this mean that reducing the student-teacher ratio will, in fact,
increase scores?

There is, in fact, reason to worry that it might not. Hiring more
teachers, after all, costs money, so wealthier school districts can
better afford smaller classes. But students at wealthier schools also
have other advantages over their poorer neighbors, including better
facilities, newer books, and better-paid teachers. Moreover, students
at wealthier schools tend themselves to come from more affluent
families, and thus have other advantages not directly associated with
their school. For example, California has a large immigrant
community; these immigrants tend to be poorer than the overall
population and, in many cases, their children are not native English
speakers. It thus might be that our negative estimated relationship
between test scores and the student-teacher ratio is a consequence of
large classes being found in conjunction with many other factors that
are, in fact, the real cause of the lower test scores.

These other factors, or "omitted variables," could mean that the OLS
analysis done so far has little value to the superintendent. Indeed,
it could be misleading: Changing the student-teacher ratio alone
would not change these other factors that determine a child's
performance at school. To address this problem, we need a method that
will allow us to isolate the effect on test scores of changing the
student-teacher ratio, holding these other factors constant. That
method is multiple regression analysis, the topic of Chapters 8
and 9.
standard errors has a Student t distribution when the null hypothesis
is true. The difference between the Student t distribution and the
normal distribution is negligible if the sample size is moderate or
large.

Key Terms

null hypothesis (89)
two-sided alternative hypothesis (89)
standard error of β̂₁ (89)
t-statistic (89)
p-value (89)
confidence interval for β₁ (91)

(iii) E(uᵢuⱼ | X₁, …, Xₙ) = 0, i ≠ j,

where the conditions hold for i, j = 1, …, n. The three conditions,
respectively, state that uᵢ has mean zero, that uᵢ has a constant
variance, and that the errors are uncorrelated for different
observations, where all these statements hold conditionally on all
observed X's (X₁, …, Xₙ). The Gauss-Markov conditions are implied by
the three least squares assumptions (Box 6-3), plus the additional
assumption that the errors are homoskedastic. Because the
observations are i.i.d. (Assumption 2),
E(uᵢ | X₁, …, Xₙ) = E(uᵢ | Xᵢ), and by Assumption 1, E(uᵢ | Xᵢ) = 0;
thus condition (i) holds. Similarly, by Assumption 2,
Var(uᵢ | X₁, …, Xₙ) = Var(uᵢ | Xᵢ), and because the errors are
assumed to be homoskedastic, Var(uᵢ | Xᵢ) = σ²ᵤ.
2017 Financial Risk Manager (FRM) Psff I: Quanlilative Anslysis, Seventh Edition by Global Association of Riek Profes8ionals.
Copyright O 2017 by Peal"90n Education, Inc. All Rights ReseM!d. Pearson Custom Edition.
【梦轩考资www.mxkaozi.com】 QQ106454842 专业提供CFA FRM全程高清视频+讲义
=
To show that condition (iii) is implied by the least squares
assumptions, note that E(u,u1 IX,, . . . , X,,) E(u,u1 IX" �) =
the case of regression without an "X," so that the only
=
regressor is the constant regressor X°' 1. Then the OLS
= =
because (X" ')lj) are i.i.d. by Assumption 2. Assumption 2
also implies that E(u1u11X" X1) E(u11Xi> E(u11X1) for i + ;j
estimator Po Y. It follows that, under the Gauss-Markov
assumptions, Y is BLUE. Note that the Gauss-Markov
=
because E(u,IXi> O for all i, it follows that E(up1IX1, ,
X,,) 0 for all i + j, so condition (iii) holds. Thus, the least
squares assumptions in Box 6-3, plus homoskedastic-
• • • requirement that the error be homoskedastic is irrelevant
in this case because there is no regressor, so it follows
that Y is BLUE if Y,, . . . , Y,, are i.i.d.
ity of the errors, imply the Gauss-Markov conditions in
Equation (7.26).
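The closing claim (that Ȳ is BLUE when Y1, . . . , Yn are i.i.d.) can be checked by simulation. The sketch below is illustrative and not from the text: any weight vector summing to 1 gives a linear unbiased estimator of E(Y), and the equal weights of the sample mean should yield the smallest variance among them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 20_000

# i.i.d. draws of Y1, ..., Yn, repeated many times.
Y = rng.normal(loc=5.0, scale=2.0, size=(reps, n))

# Two linear estimators of E(Y): the sample mean (equal weights) and an
# arbitrary alternative whose weights also sum to 1 (hence also unbiased).
w_mean = np.full(n, 1.0 / n)
w_other = np.linspace(0.5, 1.5, n)
w_other /= w_other.sum()

est_mean = Y @ w_mean
est_other = Y @ w_other

# Both center on E(Y) = 5, but the sample mean has the smaller variance,
# consistent with Ybar being BLUE when Y1, ..., Yn are i.i.d.
print(est_mean.mean(), est_mean.var())
print(est_other.mean(), est_other.var())
```

Among weights summing to 1, the variance is proportional to the sum of squared weights, which equal weights minimize; the simulation simply makes that visible.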
Learning Objectives

After completing this reading you should be able to:

• Define and interpret omitted variable bias, and describe the methods for addressing this bias.
• Distinguish between single and multiple regression.
• Interpret the slope coefficient in a multiple regression.
• Describe homoskedasticity and heteroskedasticity in a multiple regression.
• Describe the OLS estimator in a multiple regression.
• Calculate and interpret measures of fit in multiple regression.
• Explain the assumptions of the multiple linear regression model.
• Explain the concept of imperfect and perfect multicollinearity and their implications.

Excerpt is Chapter 6 of Introduction to Econometrics, Brief Edition, by James H. Stock and Mark W. Watson.
Chapter 7 ended on a worried note. Although school districts with lower student-teacher ratios tend to have higher test scores in the California data set, perhaps students from districts with small classes have other advantages that help them perform well on standardized tests. Could this have produced misleading results and, if so, what can be done?

Omitted factors, such as student characteristics, can in fact make the ordinary least squares (OLS) estimator of the effect of class size on test scores misleading or, more precisely, biased. This chapter explains this "omitted variable bias" and introduces multiple regression, a method that can eliminate omitted variable bias. The key idea of multiple regression is that, if we have data on these omitted variables, then we can include them as additional regressors and thereby estimate the effect of one regressor (the student-teacher ratio) while holding constant the other variables (such as student characteristics).

This chapter explains how to estimate the coefficients of the multiple linear regression model. Many aspects of multiple regression parallel those of regression with a single regressor, studied in Chapters 6 and 7. The coefficients of the multiple regression model can be estimated from data using OLS; the OLS estimators in multiple regression are random variables because they depend on data from a random sample; and in large samples the sampling distributions of the OLS estimators are approximately normal.

OMITTED VARIABLE BIAS

If one of these omitted factors is correlated with the student-teacher ratio, the OLS estimator of the slope could be biased; that is, the mean of the sampling distribution of the OLS estimator might not equal the true effect on test scores of a unit change in the student-teacher ratio. Here is the reasoning. Students who are still learning English might perform worse on standardized tests than native English speakers. If districts with large classes also have many students still learning English, then the OLS regression of test scores on the student-teacher ratio could erroneously find a correlation and produce a large estimated coefficient, when in fact the true causal effect of cutting class sizes on test scores is small, even zero. Accordingly, based on the analysis of Chapters 6 and 7, the superintendent might hire enough new teachers to reduce the student-teacher ratio by two, but her hoped-for improvement in test scores will fail to materialize if the true coefficient is small or zero.

A look at the California data lends credence to this concern. The correlation between the student-teacher ratio and the percentage of English learners (students who are not native English speakers and who have not yet mastered English) in the district is 0.19. This small but positive correlation suggests that districts with more English learners tend to have a higher student-teacher ratio (larger classes). If the student-teacher ratio were unrelated to the percentage of English learners, then it would be safe to ignore English proficiency in the regression of test scores against the student-teacher ratio. But because the student-teacher ratio and the percentage of English learners are correlated, it is possible that the OLS coefficient in the regression of test scores on the student-teacher ratio reflects that influence.
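The logic of this concern can be illustrated with a small simulation. The code below is a hypothetical sketch (the variable names and data-generating process are invented, not the California data): the true effect of class size is set to zero, yet the single-regressor OLS slope comes out strongly negative because class size is correlated with an omitted variable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 420  # same number of districts as the California sample, for flavor

# Hypothetical data-generating process (invented numbers): the TRUE causal
# effect of the student-teacher ratio on scores is zero, but districts with
# more English learners have both larger classes and lower scores.
pct_el = rng.uniform(0, 50, n)                          # omitted variable
str_ratio = 18 + 0.05 * pct_el + rng.normal(0, 1, n)    # corr(STR, PctEL) > 0
score = 680 - 0.6 * pct_el + rng.normal(0, 5, n)        # STR plays no role

# Single-regressor OLS slope of score on STR: clearly negative even though
# the true class-size effect is zero; the slope picks up the omitted PctEL.
slope = np.cov(str_ratio, score)[0, 1] / np.var(str_ratio, ddof=1)
print(slope)
```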
Example #2: Time of Day of the Test

Another variable omitted from the analysis is the time of day that the test was administered. For this omitted variable, it is plausible that the first condition for omitted variable bias does not hold but the second condition does. For example, if the time of day of the test varies from one district to the next in a way that is unrelated to class size, then the time of day and class size would be uncorrelated, so the first condition does not hold. Conversely, the time of day of the test could affect scores (alertness varies through the school day), so the second condition holds. However, because in this example the time that the test is administered is uncorrelated with the student-teacher ratio, the student-teacher ratio could not be incorrectly picking up the "time of day" effect. Thus omitting the time of day of the test does not result in omitted variable bias.

Example #3: Parking Lot Space per Pupil

Another omitted variable is parking lot space per pupil (the area of the teacher parking lot divided by the number of students). This variable satisfies the first but not the second condition for omitted variable bias. Specifically, schools with more teachers per pupil probably have more teacher parking space, so the first condition would be satisfied. However, under the assumption that learning takes place in the classroom, not the parking lot, parking lot space has no direct effect on learning; thus the second condition does not hold. Because parking lot space per pupil is not a determinant of test scores, omitting it from the analysis does not lead to omitted variable bias.

Omitted variable bias is summarized in Box 8-1.

Omitted Variable Bias and the First Least Squares Assumption

Omitted variable bias means that the first least squares assumption (that E(ui | Xi) = 0, as listed in Box 6-3) is incorrect. To see why, recall that the error term ui in the linear regression model with a single regressor represents all factors, other than Xi, that are determinants of Yi. If one of these other factors is correlated with Xi, this means that the error term (which contains this factor) is correlated with Xi. In other words, if an omitted variable is a determinant of Yi, then it is in the error term, and if it is correlated with Xi, then the error term is correlated with Xi. Because ui and Xi are correlated, the conditional mean of ui given Xi is nonzero. This correlation therefore violates the first least squares assumption, and the consequence is serious: The OLS estimator is biased. This bias does not vanish even in very large samples, and the OLS estimator is inconsistent.

A Formula for Omitted Variable Bias

The discussion of the previous section about omitted variable bias can be summarized mathematically by a formula for this bias. Let the correlation between Xi and ui be corr(Xi, ui) = ρXu. Suppose that the second and third least squares assumptions hold, but the first does not because ρXu is nonzero. Then the OLS estimator has the limit

β̂1 →p β1 + ρXu(σu/σX)   (8.1)

That is, as the sample size increases, β̂1 is close to β1 + ρXu(σu/σX) with increasingly high probability.
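Equation (8.1) can be verified numerically. The following sketch (with invented parameter values, not from the text) draws (X, u) with correlation ρXu, so the first least squares assumption fails, and checks that the OLS slope in a large sample is close to β1 + ρXu(σu/σX) rather than to β1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000  # large sample, to get close to the probability limit

beta1 = 2.0          # true slope (invented for the illustration)
rho = 0.5            # corr(X, u): the first least squares assumption fails
sigma_x, sigma_u = 1.5, 3.0

# Draw (X, u) jointly normal with the chosen correlation.
cov = [[sigma_x**2, rho * sigma_x * sigma_u],
       [rho * sigma_x * sigma_u, sigma_u**2]]
X, u = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
Y = 1.0 + beta1 * X + u

beta1_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
limit = beta1 + rho * sigma_u / sigma_x   # right-hand side of Equation (8.1)

# beta1_hat lands near 3.0 (= 2.0 + 0.5 x 3.0/1.5), not near the true 2.0.
print(beta1_hat, limit)
```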
The formula in Equation (8.1) summarizes several of the ideas discussed above about omitted variable bias:

1. Omitted variable bias is a problem whether the sample size is large or small. Because β̂1 does not converge in probability to the true value β1, β̂1 is inconsistent; that is, β̂1 is not a consistent estimator of β1 when there is omitted variable bias. The term ρXu(σu/σX) in Equation (8.1) is the bias in β̂1 that persists even in large samples.

2. Whether this bias is large or small in practice depends on the correlation ρXu between the regressor and the error term. The larger is |ρXu|, the larger is the bias.

3. The direction of the bias in β̂1 depends on whether X and u are positively or negatively correlated. For example, we speculated that the percentage of students learning English has a negative effect on district test scores (students still learning English have lower scores), so that the percentage of English learners enters the error term with a negative sign. In our data, the fraction of English learners is positively correlated with the student-teacher ratio (districts with more English learners have larger classes). Thus the student-teacher ratio (X) would be negatively correlated with the error term (u), so ρXu < 0 and the coefficient on the student-teacher ratio β̂1 would be biased toward a negative number. In other words, having a small percentage of English learners is associated both with high test scores and low student-teacher ratios, so one reason that the OLS estimator suggests that small classes improve test scores may be that the districts with small classes have fewer English learners.

Addressing Omitted Variable Bias by Dividing the Data Into Groups

What can you do about omitted variable bias? Our superintendent is considering increasing the number of teachers in her district, but she has no control over the fraction of immigrants in her community. As a result, she is interested in the effect of the student-teacher ratio on test scores, holding constant other factors, including the percentage of English learners. This new way of posing her question suggests that, instead of using data for all districts, perhaps we should focus on districts with percentages of English learners comparable to hers. Among this subset of districts, do those with smaller classes do better on standardized tests?
Table 8-1  Differences in Test Scores for California School Districts with Low and High Student-Teacher Ratios, by the Percentage of English Learners in the District

                     Student-Teacher Ratio < 20    Student-Teacher Ratio ≥ 20
                     Average Test Score      n     Average Test Score      n     Difference   t-Statistic
All districts              657.4            238          650.0           182         7.4          4.04
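The comparison in the first row of Table 8-1 (a difference in group means and its t-statistic), and the idea of repeating it within groups with comparable percentages of English learners, can be sketched as follows. The data below are simulated stand-ins with invented parameters, not the California data set.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 420

# Simulated stand-in data (invented parameters): scores driven by the
# percentage of English learners, not by class size; small classes are
# rarer where that percentage is high.
pct_el = rng.uniform(0, 50, n)
small = rng.uniform(0, 1, n) < (0.7 - 0.006 * pct_el)
score = 680 - 1.0 * pct_el + rng.normal(0, 5, n)

def diff_and_t(a, b):
    """Difference in group means and its two-sample t-statistic."""
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return diff, diff / se

# Overall comparison (analogue of the first row of Table 8-1): a sizable gap.
overall, t_overall = diff_and_t(score[small], score[~small])

# The same comparison within the lowest quartile of PctEL: the gap shrinks,
# because holding PctEL roughly constant removes most of the confounding.
q1 = pct_el < np.quantile(pct_el, 0.25)
within, t_within = diff_and_t(score[small & q1], score[~small & q1])

print(overall, t_overall)
print(within, t_within)
```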
Table 8-1 reports evidence on the relationship between class size and test scores within districts with comparable percentages of English learners. Districts are divided into eight groups. First, the districts are broken into four categories that correspond to the quartiles of the distribution of the percentage of English learners across districts. Second, within each of these four categories, districts are further broken down into two groups, depending on whether the student-teacher ratio is small (STR < 20) or large (STR ≥ 20).

The first row in Table 8-1 reports the overall difference in average test scores between districts with low and high student-teacher ratios, that is, the difference in test scores between these two groups without breaking them down further into the quartiles of English learners. (Recall that this difference was previously reported in regression form in Equation (7.18) as the OLS estimate of the coefficient on Di in the regression of TestScore on Di, where Di is a binary regressor that equals 1 if STRi < 20 and equals 0 otherwise.) Over the full sample of 420 districts, the average test score is 7.4 points higher in districts with a low student-teacher ratio than a high one; the t-statistic is 4.04, so the null hypothesis that the mean test score is the same in the two groups is rejected at the 1% significance level.

The final four rows in Table 8-1 report the difference in test scores between districts with low and high student-teacher ratios, broken down by the quartile of the percentage of English learners. This evidence presents a different picture. Of the districts with the fewest English learners (< 1.9%), the average test score for those 76 with low student-teacher ratios is 664.5 and the average for the 27 with high student-teacher ratios is 665.4. Thus, for the districts with the fewest English learners, test scores were on average 0.9 points lower in the districts with low student-teacher ratios! In the second quartile, districts with low student-teacher ratios had test scores that averaged 3.3 points higher than those with high student-teacher ratios; this gap was 5.2 points for the third quartile and only 1.9 points for the quartile of districts with the most English learners. Once we hold the percentage of English learners constant, the difference in performance between districts with high and low student-teacher ratios is perhaps half (or less) of the overall estimate of 7.4 points.

At first this finding might seem puzzling. How can the overall effect of class size on test scores be twice its effect within any quartile? The answer is that the districts with the most English learners tend to have both the highest student-teacher ratios and the lowest test scores. The difference in the average test score between districts in the lowest and highest quartile of the percentage of English learners is large, approximately 30 points. The districts with few English learners tend to have lower student-teacher ratios: 74% (76 of 103) of the districts in the first quartile of English learners have small classes (STR < 20), while only 42% (44 of 105) of the districts in the quartile with the most English learners have small
classes. So, the districts with the most English learners have both lower test scores and higher student-teacher ratios than the other districts.

This analysis reinforces the superintendent's worry that omitted variable bias is present in the regression of test scores against the student-teacher ratio. By looking within quartiles of the percentage of English learners, the test score differences in the second part of Table 8-1 improve upon the simple difference-of-means analysis in the first line of Table 8-1. Still, this analysis does not yet provide the superintendent with a useful estimate of the effect on test scores of changing class size, holding constant the fraction of English learners. Such an estimate can be provided, however, using the method of multiple regression.

THE MULTIPLE REGRESSION MODEL

The multiple regression model extends the single-variable regression model of Chapters 6 and 7 to include additional variables as regressors. This model permits estimating the effect on Yi of changing one variable (X1i) while holding the other regressors (X2i, X3i, and so forth) constant. In the class size problem, the multiple regression model provides a way to isolate the effect on test scores (Yi) of the student-teacher ratio (X1i) while holding constant the percentage of students in the district who are English learners (X2i).

The Population Regression Line

Suppose for the moment that there are only two independent variables, X1i and X2i. In the linear multiple regression model, the average relationship between these two independent variables and the dependent variable, Y, is given by the linear function

E(Yi | X1i = x1, X2i = x2) = β0 + β1x1 + β2x2   (8.2)

where E(Yi | X1i = x1, X2i = x2) is the conditional expectation of Yi given that X1i = x1 and X2i = x2. That is, if the student-teacher ratio in the ith district (X1i) equals some value x1 and the percentage of English learners in the ith district (X2i) equals x2, then the expected value of Yi given the student-teacher ratio and the percentage of English learners is given by Equation (8.2).

Equation (8.2) is the population regression line or population regression function in the multiple regression model. The coefficient β0 is the intercept, the coefficient β1 is the slope coefficient of X1i or, more simply, the coefficient on X1i, and the coefficient β2 is the slope coefficient of X2i or, more simply, the coefficient on X2i. One or more of the independent variables in the multiple regression model are sometimes referred to as control variables.

The interpretation of the coefficient β1 in Equation (8.2) is different than it was when X1i was the only regressor: In Equation (8.2), β1 is the effect on Y of a unit change in X1, holding X2 constant or controlling for X2.

This interpretation of β1 follows from the definition that the expected effect on Y of a change in X1, ΔX1, holding X2 constant, is the difference between the expected value of Y when the independent variables take on the values X1 + ΔX1 and X2 and the expected value of Y when the independent variables take on the values X1 and X2. Accordingly, write the population regression function in Equation (8.2) as Y = β0 + β1X1 + β2X2, and imagine changing X1 by the amount ΔX1 while not changing X2, that is, while holding X2 constant. Because X1 has changed, Y will change by some amount, say ΔY. After this change, the new value of Y, Y + ΔY, is

Y + ΔY = β0 + β1(X1 + ΔX1) + β2X2   (8.3)

An equation for ΔY in terms of ΔX1 is obtained by subtracting the equation Y = β0 + β1X1 + β2X2 from Equation (8.3), yielding ΔY = β1ΔX1. That is,

β1 = ΔY/ΔX1, holding X2 constant   (8.4)

The coefficient β1 is the effect on Y (the expected change in Y) of a unit change in X1, holding X2 fixed. Another phrase used to describe β1 is the partial effect on Y of X1, holding X2 fixed.

The interpretation of the intercept in the multiple regression model, β0, is similar to the interpretation of the intercept in the single-regressor model: It is the expected value of Yi when X1i and X2i are zero. Simply put, the intercept β0 determines how far up the Y axis the population regression line starts.

The Population Multiple Regression Model

The population regression line in Equation (8.2) is the relationship between Y and X1 and X2 that holds on average in the population. Just as in the case of regression with a
The variable X0i is sometimes called the constant regressor because it takes on the same value (the value 1) for all observations. Similarly, the intercept, β0, is sometimes called the constant term in the regression.

The two ways of writing the population regression model, Equations (8.5) and (8.6), are equivalent.

The discussion so far has focused on the case of a single additional variable, X2. In practice, however, there might be multiple factors omitted from the single-regressor model. For example, ignoring the students' economic background might result in omitted variable bias, just as ignoring the fraction of English learners did. This reasoning leads us to consider a model with three regressors or, more generally, a model that includes k regressors. The multiple regression model with k regressors, X1i, X2i, . . . , Xki, is summarized as Box 8-2.

The error term ui of the multiple regression model is homoskedastic if the variance of the conditional distribution of ui given X1i, . . . , Xki, var(ui | X1i, . . . , Xki), is constant for i = 1, . . . , n and thus does not depend on the values of X1i, . . . , Xki. Otherwise, the error term is heteroskedastic.

The multiple regression model holds out the promise of providing just what the superintendent wants to know: the effect of changing the student-teacher ratio, holding constant other factors that are beyond her control. These factors include not just the percentage of English learners, but other measurable factors that might affect test performance, including the economic background of the students. To be of practical help to the superintendent, however, we need to provide her with estimates of the unknown population coefficients β0, . . . , βk of the population regression model calculated using a sample of data. Fortunately, these coefficients can be estimated using ordinary least squares.
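As a concrete sketch of what that estimation delivers, the code below simulates a two-regressor model and recovers the coefficients by least squares. The data-generating process and its parameter values are invented for illustration (the coefficients merely echo those that appear later in Equation (8.12)); it is not the California data set.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 420

# Invented two-regressor population model (coefficients chosen to echo
# Equation (8.12); this is simulated data, not the California data set).
str_ratio = rng.uniform(14, 26, n)
pct_el = rng.uniform(0, 50, n)
u = rng.normal(0, 10, n)                       # homoskedastic error term
y = 686.0 - 1.10 * str_ratio - 0.65 * pct_el + u

# OLS picks the b that minimizes the sum of squared mistakes;
# np.linalg.lstsq solves exactly that least-squares problem.
X = np.column_stack([np.ones(n), str_ratio, pct_el])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)   # close to the population coefficients (686, -1.10, -0.65)
```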
The key idea, as in Chapter 6, is that these coefficients can be estimated by minimizing the sum of squared prediction mistakes, that is, by choosing the estimators b0 and b1 so as to minimize Σᵢ₌₁ⁿ (Yi - b0 - b1Xi)². The estimators that do so are the OLS estimators, β̂0 and β̂1.

The method of OLS also can be used to estimate the coefficients β0, β1, . . . , βk in the multiple regression model. Let b0, b1, . . . , bk be estimators of β0, β1, . . . , βk. The predicted value of Yi, calculated using these estimators, is b0 + b1X1i + . . . + bkXki, and the mistake in predicting Yi is Yi - (b0 + b1X1i + . . . + bkXki) = Yi - b0 - b1X1i - . . . - bkXki. The sum of these squared prediction mistakes over all n observations thus is

Σᵢ₌₁ⁿ (Yi - b0 - b1X1i - . . . - bkXki)²   (8.8)

The sum of the squared mistakes for the linear regression model in Equation (8.8) is the extension of the sum of the squared mistakes given in Equation (6.6) for the linear regression model with a single regressor.

The estimators of the coefficients β0, β1, . . . , βk that minimize the sum of squared mistakes in Equation (8.8) are called the ordinary least squares (OLS) estimators of β0, β1, . . . , βk. The OLS estimators are denoted β̂0, β̂1, . . . , β̂k.

The terminology of OLS in the linear multiple regression model is the same as in the linear regression model with a single regressor. The OLS regression line is the straight line constructed using the OLS estimators: β̂0 + β̂1X1 + . . . + β̂kXk. The predicted value of Yi given X1i, . . . , Xki, based on the OLS regression line, is Ŷi = β̂0 + β̂1X1i + . . . + β̂kXki. The OLS residual for the ith observation is the difference between Yi and its OLS predicted value, that is, the OLS residual is ûi = Yi - Ŷi.

The OLS estimators β̂0, β̂1, . . . , β̂k and residuals ûi are computed from a sample of n observations of (X1i, . . . , Xki, Yi), i = 1, . . . , n. These are estimators of the unknown true population coefficients β0, β1, . . . , βk and error term, ui.

In principle, you could compute the OLS estimators by trial and error, repeatedly trying values of b0, . . . , bk until you are satisfied that you have minimized the total sum of squares in Equation (8.8). It is far easier, however, to use explicit formulas for the OLS estimators that are derived using calculus. The formulas for the OLS estimators in the multiple regression model are similar to those in Box 6-2 for the single-regressor model. These formulas are incorporated into modern statistical software. In the multiple regression model, the formulas for general k are best expressed and discussed using matrix notation, so their presentation is omitted.

The definitions and terminology of OLS in multiple regression are summarized in Box 8-3.

Application to Test Scores and the Student-Teacher Ratio

Earlier, we used OLS to estimate the intercept and slope coefficient of the regression relating test scores (TestScore) to the student-teacher ratio (STR), using our 420 observations for California school districts; the estimated OLS regression line, reported in Equation (6.11), is

TestScore^ = 698.9 - 2.28 × STR   (8.11)

Our concern has been that this relationship is misleading
We are now in a position to address this concern by using OLS to estimate a multiple regression in which the dependent variable is the test score (Yi) and there are two regressors: the student-teacher ratio (X1i) and the percentage of English learners in the school district (X2i) for our 420 districts (i = 1, . . . , 420). The estimated OLS regression line for this multiple regression is

TestScore^ = 686.0 - 1.10 × STR - 0.65 × PctEL   (8.12)

where PctEL is the percentage of students in the district who are English learners. The OLS estimate of the intercept (β̂0) is 686.0, the OLS estimate of the coefficient on the student-teacher ratio (β̂1) is -1.10, and the OLS estimate of the coefficient on the percentage English learners (β̂2) is -0.65.

The estimated effect on test scores of a change in the student-teacher ratio in the multiple regression is approximately half as large as when the student-teacher ratio is the only regressor: in the single-regressor equation [Equation (8.11)], a unit decrease in the STR is estimated to increase test scores by 2.28 points, but in the multiple regression equation [Equation (8.12)], it is estimated to increase test scores by only 1.10 points. This difference occurs because the coefficient on STR in the multiple regression is the effect of a change in STR, holding constant (or controlling for) PctEL, whereas in the single-regressor regression, PctEL is not held constant.

These two estimates can be reconciled by concluding that there is omitted variable bias in the estimate in the single-regressor model in Equation (8.11). Previously, we saw that districts with a high percentage of English learners tend to have not only low test scores but also a high student-teacher ratio. If the fraction of English learners is omitted from the regression, reducing the student-teacher ratio is estimated to have a larger effect on test scores, but this estimate reflects both the effect of a change in the student-teacher ratio and the omitted effect of having fewer English learners in the district.

We have reached the same conclusion that there is omitted variable bias in the relationship between test scores and the student-teacher ratio by two different paths: the tabular approach of dividing the data into groups and the multiple regression approach [Equation (8.12)]. Of these two methods, multiple regression has two important advantages. First, it provides a quantitative estimate of the effect of a unit decrease in the student-teacher ratio, which is what the superintendent needs to make her decision. Second, it readily extends to more than two regressors, so that multiple regression can be used to control for measurable factors other than just the percentage of English learners.

The rest of this chapter is devoted to understanding and to using OLS in the multiple regression model. Much of what you learned about the OLS estimator with a single regressor carries over to multiple regression with few or no modifications, so we will focus on that which is new with multiple regression. We begin by discussing measures of fit for the multiple regression model.

MEASURES OF FIT IN MULTIPLE REGRESSION

Three commonly used summary statistics in multiple regression are the standard error of the regression, the regression R², and the adjusted R² (also known as R̄²). All three statistics measure how well the OLS estimate of the multiple regression line describes, or "fits," the data.

The Standard Error of the Regression (SER)

The standard error of the regression (SER) estimates the standard deviation of the error term ui. Thus, the SER is a measure of the spread of the distribution of Y around the regression line. In multiple regression, the SER is

SER = sû, where sû² = (1/(n - k - 1)) Σᵢ₌₁ⁿ ûi² = SSR/(n - k - 1)   (8.13)

where the SSR is the sum of squared residuals, SSR = Σᵢ₌₁ⁿ ûi².

The only difference between the definition in Equation (8.13) and the definition of the SER in Chapter 6 for the single-regressor model is that here the divisor is n - k - 1 rather than n - 2. In Chapter 6, the divisor n - 2 (rather than n) adjusts for the downward bias introduced by estimating two coefficients (the slope and intercept of the regression line). Here, the divisor n - k - 1 adjusts for the downward bias introduced by estimating k + 1 coefficients (the k slope coefficients plus the intercept). As in Chapter 6, using n - k - 1 rather than n is called a degrees-of-freedom adjustment. If there is a single regressor, then k = 1, so the formula in Chapter 6 is the same
as in Equation (8.13). When n is large, the effect of the The difference between this formula and the second defi
degrees-of-freedom adjustment is negligible. nition of the R2 in Equation (8.14) is that the ratio of the
sum of squared residuals to the total sum of squares is
The R2 multiplied by the factor (n - 1)/(n - k - 1). As the second
The regression /12 is the fraction of the sample variance expression in Equation (8.15) shows, this means that the
of Y, explained by (or predicted by) the regressors. Equiv adjusted R2 is 1 minus the ratio of the sample variance of
alently, the R2 is 1 minus the fraction of the variance of Y, the OLS residuals [with the degrees-of-freedom correc
tion in Equation (8.13)] to the sample variance of Y.
There are three useful things to know about the R2. First,
not explained by the regressors.
The mathematical definition of the R2 is the same as for (n - 1)/(n - k - 1) is always greater than 1, so R2 is always
regression with a single regressor: less than R2.
ESS SSR
TSS TSS
R2 = =1_ (8.14) Second, adding a regressor has two opposite effects on
the R2. On the one hand, the SSR falls, which increases
=
where the explained sum of squares is ESS �-1 CY; - Y)2
and the total sum of squares is TSS l:?-1 (Yi Y)2.
the R2. On the other hand, the factor (n - 1)/(n - k - 1)
=
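The two definitions above are easy to check numerically. The sketch below is illustrative only: it uses simulated data (not the California test-score data set discussed in the text) and a hypothetical helper function, fitting OLS and computing R² from Equation (8.14) along with the adjusted R² built from the (n - 1)/(n - k - 1) degrees-of-freedom factor.

```python
import numpy as np

def r2_and_adjusted_r2(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X.

    X has shape (n, k) and holds the k regressors; a constant
    column is added internally, so the divisor is n - k - 1.
    """
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])          # add the constant regressor
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # OLS coefficients
    resid = y - X1 @ beta
    ssr = float(resid @ resid)                     # sum of squared residuals
    tss = float(np.sum((y - y.mean()) ** 2))       # total sum of squares
    r2 = 1.0 - ssr / tss                           # Equation (8.14)
    adj_r2 = 1.0 - (n - 1) / (n - k - 1) * ssr / tss  # degrees-of-freedom adjusted
    return r2, adj_r2

rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = 5.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)
r2, adj_r2 = r2_and_adjusted_r2(X, y)
# (n - 1)/(n - k - 1) > 1, so the adjusted R^2 is always below R^2
print(r2, adj_r2)
```

Because the only difference between the two statistics is the multiplicative factor on SSR/TSS, the identity 1 - R̄² = [(n - 1)/(n - k - 1)](1 - R²) holds exactly.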
precise if they are made using the regression with both STR and PctEL than if they are made using the regression with only STR as a regressor.

Using the R² and Adjusted R²

Assumption #2: (X1i, X2i, . . . , Xki, Yi), i = 1, . . . , n Are i.i.d.

The second assumption is that (X1i, . . . , Xki, Yi), i = 1, . . . , n, are i.i.d. draws from their joint distribution.
handles perfect multicollinearity, if you try to estimate this regression the software will do one of three things: (1) It will drop one of the occurrences of STR; (2) it will refuse to calculate the OLS estimates and give an error message; or (3) it will crash the computer. The mathematical reason for this failure is that perfect multicollinearity produces division by zero in the OLS formulas.

At an intuitive level, perfect multicollinearity is a problem because you are asking the regression to answer an illogical question. In multiple regression, the coefficient on one of the regressors is the effect of a change in that regressor, holding the other regressors constant. In the hypothetical regression of TestScore on STR and STR, the coefficient on the first occurrence of STR is the effect on test scores of a change in STR, holding constant STR. This makes no sense, and OLS cannot estimate this nonsensical partial effect.

The solution to perfect multicollinearity in this hypothetical regression is simply to correct the typo and to replace one of the occurrences of STR with the variable you originally wanted to include. This example is typical: When perfect multicollinearity occurs, it often reflects a logical mistake in choosing the regressors or some previously unrecognized feature of the data set. In general, the solution to perfect multicollinearity is to modify the regressors to eliminate the problem.

Additional examples of perfect multicollinearity are given later in this chapter, which also defines and discusses imperfect multicollinearity.

The least squares assumptions for the multiple regression model are summarized in Box 8-4.

Box 8-4: The Least Squares Assumptions in the Multiple Regression Model

Yi = β0 + β1X1i + β2X2i + . . . + βkXki + ui, i = 1, . . . , n, where

1. ui has conditional mean zero given X1i, X2i, . . . , Xki; that is, E(ui | X1i, X2i, . . . , Xki) = 0.
2. (X1i, X2i, . . . , Xki, Yi), i = 1, . . . , n are independently and identically distributed (i.i.d.) draws from their joint distribution.
3. Large outliers are unlikely: X1i, . . . , Xki and Yi have nonzero finite fourth moments.
4. There is no perfect multicollinearity.

THE DISTRIBUTION OF THE OLS ESTIMATORS IN MULTIPLE REGRESSION

Because the data differ from one sample to the next, different samples produce different values of the OLS estimators. This variation across possible samples gives rise to the uncertainty associated with the OLS estimators of the population regression coefficients, β0, β1, . . . , βk. Just as in the case of regression with a single regressor, this variation is summarized in the sampling distribution of the OLS estimators.

Recall from Chapter 6 that, under the least squares assumptions, the OLS estimators (β̂0 and β̂1) are unbiased and consistent estimators of the unknown coefficients (β0 and β1) in the linear regression model with a single regressor. In addition, in large samples, the sampling distribution of β̂0 and β̂1 is well approximated by a bivariate normal distribution.

These results carry over to multiple regression analysis. That is, under the least squares assumptions of Box 8-4, the OLS estimators β̂0, β̂1, . . . , β̂k are unbiased and consistent estimators of β0, β1, . . . , βk in the linear multiple regression model. In large samples, the joint sampling distribution of β̂0, β̂1, . . . , β̂k is well approximated by a multivariate normal distribution, which is the extension of the bivariate normal distribution to the general case of two or more jointly normal random variables.

Although the algebra is more complicated when there are multiple regressors, the central limit theorem applies to the OLS estimators in the multiple regression model for the same reason that it applies to Ȳ and to the OLS estimators when there is a single regressor: The OLS estimators β̂0, β̂1, . . . , β̂k are averages of the randomly sampled data, and if the sample size is sufficiently large the sampling distribution of those averages becomes normal. Because the multivariate normal distribution is best handled mathematically using matrix algebra, the expressions for the joint distribution of the OLS estimators are omitted.

Box 8-5 summarizes the result that, in large samples, the distribution of the OLS estimators in multiple regression is approximately jointly normal. In general, the OLS estimators are correlated; this correlation arises from the correlation between the regressors.
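The result just stated can be illustrated with a small Monte Carlo sketch (simulated data, illustrative only, not from the text): across repeated samples the OLS estimates vary, their average is close to the true coefficients, and the slope estimators are themselves correlated because the regressors are correlated.

```python
import numpy as np

rng = np.random.default_rng(1)
beta_true = np.array([1.0, 2.0, -1.0])   # (beta0, beta1, beta2)
n, reps = 250, 2000
estimates = np.empty((reps, 3))

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.7 * x1 + np.sqrt(1 - 0.7**2) * rng.normal(size=n)  # corr ~ 0.7
    u = rng.normal(size=n)                                    # error term, mean zero
    y = beta_true[0] + beta_true[1] * x1 + beta_true[2] * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS for this sample

mean_est = estimates.mean(axis=0)       # close to beta_true (unbiasedness)
slope_corr = np.corrcoef(estimates[:, 1], estimates[:, 2])[0, 1]
print(mean_est, slope_corr)             # slope_corr is negative here
```

With positively correlated regressors, the sampling distribution of (β̂1, β̂2) exhibits negative correlation, which is the multiple-regression analogue of the joint normality result summarized in Box 8-5.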
Example #1: Fraction of English Learners

Let FracELi be the fraction of English learners in the ith district, which varies between 0 and 1. If the variable FracELi were included as a third regressor in addition to STRi and PctELi, the regressors would be perfectly multicollinear. The reason is that PctEL is the percentage of English learners, so that PctELi = 100 × FracELi for every district. Thus one of the regressors (PctELi) can be written as a perfect linear function of another regressor (FracELi).

Because of this perfect multicollinearity, it is impossible to compute the OLS estimates of the regression of TestScorei on STRi, PctELi, and FracELi. At an intuitive level, OLS fails because you are asking, What is the effect of a

Example #3: Percentage of English Speakers

Let PctESi be the percentage of "English speakers" in the ith district, defined to be the percentage of students who are not English learners. Again the regressors will be perfectly multicollinear. Like the previous example, the perfect linear relationship among the regressors involves the constant regressor X0i: For every district, PctESi = 100 × X0i - PctELi.

This example illustrates another point: Perfect multicollinearity is a feature of the entire set of regressors. If either the intercept (i.e., the regressor X0i) or PctELi were excluded from this regression, the regressors would not be perfectly multicollinear.
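Both examples can be checked mechanically with made-up district data (a sketch, not the text's data set): whenever one regressor is an exact linear combination of the others, including the constant regressor, the design matrix loses full column rank, which is the algebraic reason the OLS normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
str_ = rng.uniform(15.0, 25.0, size=n)   # student-teacher ratio (simulated)
frac_el = rng.uniform(0.0, 1.0, size=n)  # FracEL: fraction of English learners
pct_el = 100.0 * frac_el                 # PctEL = 100 x FracEL
pct_es = 100.0 - pct_el                  # PctES = 100 x X0 - PctEL
const = np.ones(n)                       # the constant regressor X0

X_ex1 = np.column_stack([const, str_, pct_el, frac_el])  # Example #1
X_ex3 = np.column_stack([const, str_, pct_el, pct_es])   # Example #3
X_fix = np.column_stack([str_, pct_el, pct_es])          # intercept excluded

ranks = [np.linalg.matrix_rank(M) for M in (X_ex1, X_ex3, X_fix)]
print(ranks)   # the first two matrices are rank deficient; the last has full rank
```

Dropping the intercept (or PctEL) breaks the exact linear relationship, so the last matrix regains full column rank, exactly as the text notes.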
The Dummy Variable Trap

Another possible source of perfect multicollinearity arises when multiple binary, or dummy, variables are used as regressors. For example, suppose you have partitioned the school districts into three categories: rural, suburban, and urban. Each district falls into one (and only one) category. Let these binary variables be Rurali, which equals 1 for a rural district and equals 0 otherwise; Suburbani; and Urbani. If you include all three binary variables in the regression along with a constant, the regressors will be perfectly multicollinear: Because each district belongs to one and only one category, Rurali + Suburbani + Urbani = 1 = X0i, where X0i denotes the constant regressor introduced in Equation (8.6). Thus, to estimate the regression, you must exclude one of these four variables, either one of the binary indicators or the constant term. By convention, the constant term is retained, in which case one of the binary indicators is excluded. For example, if Rurali were excluded, then the coefficient on Suburbani would be the average difference between test scores in suburban and rural districts, holding constant the other variables in the regression.

In general, if there are G binary variables, if each observation falls into one and only one category, if there is an intercept in the regression, and if all G binary variables are included as regressors, then the regression will fail because of perfect multicollinearity. This situation is called the dummy variable trap. The usual way to avoid the dummy variable trap is to exclude one of the binary variables from the multiple regression, so only G - 1 of the G binary variables are included as regressors. In this case, the coefficients on the included binary variables represent the incremental effect of being in that category, relative to the base case of the omitted category, holding constant the other regressors. Alternatively, all G binary regressors can be included if the intercept is omitted from the regression.

Solutions to Perfect Multicollinearity

Perfect multicollinearity typically arises when a mistake has been made in specifying the regression. Sometimes the mistake is easy to spot (as in the first example) but sometimes it is not (as in the second example). In one way or another your software will let you know if you make such a mistake because it cannot compute the OLS estimator if you have.

When your software lets you know that you have perfect multicollinearity, it is important that you modify your regression to eliminate it. Some software is unreliable when there is perfect multicollinearity, and at a minimum you will be ceding control over your choice of regressors to your computer if your regressors are perfectly multicollinear.

Imperfect Multicollinearity

Despite its similar name, imperfect multicollinearity is conceptually quite different from perfect multicollinearity. Imperfect multicollinearity means that two or more of the regressors are highly correlated, in the sense that there is a linear function of the regressors that is highly correlated with another regressor. Imperfect multicollinearity does not pose any problems for the theory of the OLS estimators; indeed, a purpose of OLS is to sort out the independent influences of the various regressors when these regressors are potentially correlated.

If the regressors are imperfectly multicollinear, then the coefficients on at least one individual regressor will be imprecisely estimated. For example, consider the regression of TestScore on STR and PctEL. Suppose we were to add a third regressor, the percentage of the district's residents who are first-generation immigrants. First-generation immigrants often speak English as a second language, so the variables PctEL and percentage immigrants will be highly correlated: Districts with many recent immigrants will tend to have many students who are still learning English. Because these two variables are highly correlated, it would be difficult to use these data to estimate the partial effect on test scores of an increase in PctEL, holding constant the percentage immigrants. In other words, the data set provides little information about what happens to test scores when the percentage of English learners is low but the fraction of immigrants is high, or vice versa. If the least squares assumptions hold, then the OLS estimator of the coefficient on PctEL in this regression will be unbiased; however, it will have a larger variance than if the regressors PctEL and percentage immigrants were uncorrelated.

The effect of imperfect multicollinearity on the variance of the OLS estimators can be seen mathematically by inspecting the variance of β̂1 in a multiple regression with two regressors (X1 and X2) for the special case of a homoskedastic error.¹ In this case, the variance of β̂1 is inversely proportional to 1 - ρ²(X1,X2).

¹ In this case, the variance of β̂1 is σ²(β̂1) = (1/n) × [1/(1 - ρ²(X1,X2))] × σ²u/σ²(X1), where ρ(X1,X2) is the correlation between X1 and X2.
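A Monte Carlo sketch of the footnote formula (simulated, homoskedastic data; the helper name is hypothetical): the sampling variance of β̂1 is inflated by the factor 1/(1 - ρ²) relative to the uncorrelated case, and the empirical variance across repeated samples lines up with the formula.

```python
import numpy as np

def var_beta1(n, rho, sigma_u=1.0, sigma_x1=1.0):
    """Footnote formula (homoskedastic case): the asymptotic variance of
    beta1-hat is (1/n) * [1/(1 - rho^2)] * sigma_u^2 / sigma_x1^2."""
    return (1.0 / n) * (1.0 / (1.0 - rho**2)) * sigma_u**2 / sigma_x1**2

rng = np.random.default_rng(3)
n, reps, rho = 100, 4000, 0.9
b1 = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)  # corr(x1, x2) = rho
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)          # homoskedastic error
    X = np.column_stack([np.ones(n), x1, x2])
    b1[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]             # beta1-hat

inflation = var_beta1(n, rho) / var_beta1(n, 0.0)   # = 1/(1 - 0.81), about 5.3
print(b1.var(), var_beta1(n, rho), inflation)
```

At ρ = 0.9 the variance of β̂1 is more than five times what it would be with uncorrelated regressors, which is the imprecision the text describes.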
Learning Objectives

After completing this reading you should be able to:

• Construct, apply, and interpret hypothesis tests and confidence intervals for a single coefficient in a multiple regression.
• Construct, apply, and interpret joint hypothesis tests and confidence intervals for multiple coefficients in a multiple regression.
• Interpret the F-statistic.
• Interpret tests of a single restriction involving multiple coefficients.
• Interpret confidence sets for multiple coefficients.
• Identify examples of omitted variable bias in multiple regressions.
• Interpret the R² and adjusted R² in a multiple regression.

Excerpt is Chapter 7 of Introduction to Econometrics, Brief Edition, by James H. Stock and Mark W. Watson.
error, how to test hypotheses, and how to construct confidence intervals for a single coefficient in a multiple regression equation.

Standard Errors for the OLS Estimators

Recall that, in the case of a single regressor, it was possible to estimate the variance of the OLS estimator by substituting sample averages for expectations, which led to the estimator σ̂²(β̂1) given in Equation (7.4). Under the least squares assumptions, the law of large numbers implies that these sample averages converge to their population

no effect on class size corresponds to the null hypothesis that β1 = 0 (so β1,0 = 0). Our task is to test the null hypothesis H0 against the alternative H1 using a sample of data. Box 7-2 gives a procedure for testing this null hypothesis when there is a single regressor. The first step in this procedure is to calculate the standard error of the coefficient. The second step is to calculate the t-statistic using the general formula in Box 7-1. The third step is to compute the p-value of the test using the cumulative normal distribution in Appendix Table 1 on page 251 or, alternatively, to compare the t-statistic to the critical value corresponding
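The three-step procedure just described can be sketched directly. The function below is a hypothetical helper, and the numbers plugged in are the STR point estimate (-1.10) and standard error (0.43) reported later in this reading; small differences from the printed interval (-1.95, -0.26) reflect rounding of those reported inputs.

```python
from math import erf, sqrt

def t_test_and_ci(beta_hat, beta_null, se):
    """Three steps: t-statistic, two-sided large-sample p-value from the
    standard normal CDF, plus a 95% confidence interval beta_hat +/- 1.96*se."""
    t = (beta_hat - beta_null) / se
    phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF
    p_value = 2.0 * phi(-abs(t))                      # p = 2 * Phi(-|t|)
    ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)
    return t, p_value, ci

# Test H0: beta1 = 0 for the STR coefficient (estimate -1.10, SE 0.43)
t, p, ci = t_test_and_ci(-1.10, 0.0, 0.43)
print(t, p, ci)
```

The p-value comes out near 0.01, so the null is rejected at the 5% level but not quite at the 1% level, consistent with the discussion of this coefficient later in the reading.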
rejected at the 5% significance level (but not quite at the 1% significance level).

A 95% confidence interval for the population coefficient on STR is -1.10 ± 1.96 × 0.43 = (-1.95, -0.26); that is, we can be 95% confident that the true value of the coefficient is between -1.95 and -0.26. Interpreted in the context of the superintendent's interest in decreasing the student-teacher ratio by 2, the 95% confidence interval for the effect on test scores of this reduction is (-1.95 × 2, -0.26 × 2) = (-3.90, -0.52).

Adding Expenditures per Pupil to the Equation

Your analysis of the multiple regression in Equation (9.5) has persuaded the superintendent that, based on the evidence so far, reducing class size will help test scores in her district. Now, however, she moves on to a more nuanced question. If she is to hire more teachers, she can pay for those teachers either through cuts elsewhere in the budget (no new computers, reduced maintenance, and so on), or by asking for an increase in her budget, which taxpayers do not favor. What, she asks, is the effect on test scores of reducing the student-teacher ratio, holding expenditures per pupil (and the percentage of English learners) constant?

This question can be addressed by estimating a regression of test scores on the student-teacher ratio, total spending per pupil, and the percentage of English learners. The OLS regression line is

    TestScore = 649.6 - 0.29 × STR + 3.87 × Expn - 0.656 × PctEL,    (9.6)
                (15.5)  (0.48)       (1.59)        (0.032)

where Expn is total annual expenditures per pupil in the district in thousands of dollars.

The result is striking. Holding expenditures per pupil and the percentage of English learners constant, changing the student-teacher ratio is estimated to have a very small effect on test scores: The estimated coefficient on STR is -1.10 in Equation (9.5) but, after adding Expn as a regressor in Equation (9.6), it is only -0.29. Moreover, the t-statistic for testing that the true value of the coefficient is zero is now t = (-0.29 - 0)/0.48 = -0.60, so the hypothesis that the population value of this coefficient is indeed zero cannot be rejected even at the 10% significance level (|-0.60| < 1.645). Thus Equation (9.6) provides no evidence that hiring more teachers improves test scores if overall expenditures per pupil are held constant.

One interpretation of the regression in Equation (9.6) is that, in these California data, school administrators allocate their budgets efficiently. Suppose, counterfactually, that the coefficient on STR in Equation (9.6) were negative and large. If so, school districts could raise their test scores simply by decreasing funding for other purposes (textbooks, technology, sports, and so on) and transferring those funds to hire more teachers, thereby reducing class sizes while holding expenditures constant. However, the small and statistically insignificant coefficient on STR in Equation (9.6) indicates that this transfer would have little effect on test scores. Put differently, districts are already allocating their funds efficiently.

Note that the standard error on STR increased when Expn was added, from 0.43 in Equation (9.5) to 0.48 in Equation (9.6). This illustrates the general point, introduced in Chapter 8 in the context of imperfect multicollinearity, that correlation between regressors (the correlation between STR and Expn is -0.62) can make the OLS estimators less precise.

What about our angry taxpayer? He asserts that the population values of both the coefficient on the student-teacher ratio (β1) and the coefficient on spending per pupil (β2) are zero; that is, he hypothesizes that both β1 = 0 and β2 = 0. Although it might seem that we can reject this hypothesis because the t-statistic testing β2 = 0 in Equation (9.6) is t = 3.87/1.59 = 2.43, this reasoning is flawed. The taxpayer's hypothesis is a joint hypothesis, and to test it we need a new tool, the F-statistic.

TESTS OF JOINT HYPOTHESES

This section describes how to formulate joint hypotheses on multiple regression coefficients and how to test them using an F-statistic.

Testing Hypotheses on Two or More Coefficients

Joint Null Hypotheses

Consider the regression in Equation (9.6) of the test score against the student-teacher ratio, expenditures per pupil, and the percentage of English learners. Our
angry taxpayer hypothesizes that neither the student-teacher ratio nor expenditures per pupil have an effect on test scores, once we control for the percentage of English learners. Because STR is the first regressor in Equation (9.6) and Expn is the second, we can write this hypothesis mathematically as

    H0: β1 = 0 and β2 = 0 vs. H1: β1 ≠ 0 and/or β2 ≠ 0    (9.7)

The hypothesis that both the coefficient on the student-teacher ratio (β1) and the coefficient on expenditures per pupil (β2) are zero is an example of a joint hypothesis on the coefficients in the multiple regression model. In this case, the null hypothesis restricts the value of two of the coefficients, so as a matter of terminology we can say that the null hypothesis in Equation (9.7) imposes two restrictions on the multiple regression model: β1 = 0 and β2 = 0.

In general, a joint hypothesis is a hypothesis that imposes two or more restrictions on the regression coefficients. We consider joint null and alternative hypotheses of the form

    H0: βj = βj,0, βm = βm,0, . . . , for a total of q restrictions, vs.
    H1: one or more of the q restrictions under H0 does not hold,    (9.8)

where βj, βm, . . . , refer to different regression coefficients, and βj,0, βm,0, . . . , refer to the values of these coefficients under the null hypothesis. The null hypothesis in Equation (9.7) is an example of Equation (9.8). Another example is that, in a regression with k = 6 regressors, the null hypothesis is that the coefficients on the 2nd, 4th, and 5th regressors are zero; that is, β2 = 0, β4 = 0, and β5 = 0, so that there are q = 3 restrictions. In general, under the null hypothesis H0 there are q such restrictions.

If any one (or more than one) of the equalities under the null hypothesis H0 in Equation (9.8) is false, then the joint null hypothesis itself is false. Thus, the alternative hypothesis is that at least one of the equalities in the null hypothesis H0 does not hold.

Why Can't I Just Test the Individual Coefficients One at a Time?

Although it seems it should be possible to test a joint hypothesis by using the usual t-statistics to test the restrictions one at a time, the following calculation shows that this approach is unreliable. Specifically, suppose that you are interested in testing the joint null hypothesis in Equation (9.6) that β1 = 0 and β2 = 0. Let t1 be the t-statistic for testing the null hypothesis that β1 = 0, and let t2 be the t-statistic for testing the null hypothesis that β2 = 0. What happens when you use the "one at a time" testing procedure: Reject the joint null hypothesis if either t1 or t2 exceeds 1.96 in absolute value?

Because this question involves the two random variables t1 and t2, answering it requires characterizing the joint sampling distribution of t1 and t2. As mentioned previously, in large samples β̂1 and β̂2 have a joint normal distribution, so under the joint null hypothesis the t-statistics t1 and t2 have a bivariate normal distribution, where each t-statistic has mean equal to 0 and variance equal to 1.

First consider the special case in which the t-statistics are uncorrelated and thus are independent. What is the size of the "one at a time" testing procedure; that is, what is the probability that you will reject the null hypothesis when it is true? More than 5%! In this special case we can calculate the rejection probability of this method exactly. The null is not rejected only if both |t1| ≤ 1.96 and |t2| ≤ 1.96. Because the t-statistics are independent, Pr(|t1| ≤ 1.96 and |t2| ≤ 1.96) = Pr(|t1| ≤ 1.96) × Pr(|t2| ≤ 1.96) = 0.95² = 0.9025 = 90.25%. So the probability of rejecting the null hypothesis when it is true is 1 - 0.95² = 9.75%. This "one at a time" method rejects the null too often because it gives you too many chances: If you fail to reject using the first t-statistic, you get to try again using the second.

If the regressors are correlated, the situation is even more complicated. The size of the "one at a time" procedure depends on the value of the correlation between the regressors. Because the "one at a time" testing approach has the wrong size, that is, its rejection rate under the null hypothesis does not equal the desired significance level, a new approach is needed.

One approach is to modify the "one at a time" method so that it uses different critical values that ensure that its size equals its significance level. This method, called the Bonferroni method, is described in the Appendix. The advantage of the Bonferroni method is that it applies very generally. Its disadvantage is that it can have low power; it frequently fails to reject the null hypothesis when in fact the alternative hypothesis is true.

Fortunately, there is another approach to testing joint hypotheses that is more powerful, especially when the regressors are highly correlated. That approach is based on the F-statistic.
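The 9.75% size calculation above can be reproduced exactly and checked by simulation; the sketch below draws independent standard normal t-statistics, which is what the joint null hypothesis implies in the uncorrelated case.

```python
import numpy as np

# Size of the "one at a time" procedure when t1 and t2 are independent:
# the exact calculation from the text plus a Monte Carlo check.
exact_size = 1.0 - 0.95**2          # = 0.0975, i.e., 9.75%, not 5%

rng = np.random.default_rng(4)
reps = 200_000
t1 = rng.normal(size=reps)          # under H0, approximately N(0, 1)
t2 = rng.normal(size=reps)          # independent of t1
reject = (np.abs(t1) > 1.96) | (np.abs(t2) > 1.96)
print(exact_size, reject.mean())    # both near 0.0975
```

The simulated rejection rate is nearly double the intended 5% significance level, which is exactly the "too many chances" problem the text describes.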
The F-Statistic

The F-statistic is used to test joint hypotheses about regression coefficients. The formulas for the F-statistic are integrated into modern regression software. We first discuss the case of two restrictions, then turn to the general case of q restrictions.

The F-Statistic with q = 2 Restrictions

When the joint null hypothesis has the two restrictions that β1 = 0 and β2 = 0, the F-statistic combines the two t-statistics t1 and t2 using the formula

    F = (1/2) × (t1² + t2² - 2ρ̂(t1,t2) t1 t2) / (1 - ρ̂²(t1,t2)),    (9.9)

where ρ̂(t1,t2) is an estimator of the correlation between the two t-statistics.

To understand the F-statistic in Equation (9.9), first suppose that we know that the t-statistics are uncorrelated so we can drop the terms involving ρ̂(t1,t2). If so, Equation (9.9) simplifies and F = (1/2)(t1² + t2²); that is, the F-statistic is the average of the squared t-statistics. Under the null hypothesis, t1 and t2 are independent standard normal random variables.

In general the t-statistics are correlated, and the formula for the F-statistic in Equation (9.9) adjusts for this correlation. This adjustment is made so that, under the null hypothesis, the F-statistic has an F(2,∞) distribution in large samples whether or not the t-statistics are correlated.

The F-Statistic with q Restrictions

The formula for the heteroskedasticity-robust F-statistic testing the q restrictions of the joint null hypothesis in Equation (9.8) is a matrix extension of the formula for the heteroskedasticity-robust t-statistic. This formula is incorporated into regression software, making the F-statistic easy to compute in practice.

Under the null hypothesis, the F-statistic has a sampling distribution that, in large samples, is given by the F(q,∞) distribution. That is, in large samples, under the null hypothesis

    the F-statistic is distributed F(q,∞).    (9.10)

Thus the critical values for the F-statistic can be obtained from the tables of the F(q,∞) distribution for the appropriate value of q and the desired significance level.

Computing the Heteroskedasticity-Robust F-Statistic in Statistical Software

If the F-statistic is computed using the general heteroskedasticity-robust formula, its large-n distribution under the null hypothesis is F(q,∞) regardless of whether the errors are homoskedastic or heteroskedastic. As discussed in Chapter 7, for historical reasons most statistical software computes homoskedasticity-only standard errors by default. Consequently, in some software packages you must select a "robust" option so that the F-statistic is computed using heteroskedasticity-robust standard errors (and, more generally, a heteroskedasticity-robust estimate of the "covariance matrix"). The homoskedasticity-only version of the F-statistic is discussed at the end of this section.

Computing the p-Value Using the F-Statistic

The p-value of the F-statistic can be computed using the large-sample F(q,∞) approximation to its distribution:

    p-value = Pr[F(q,∞) > F_act],    (9.11)

where F_act is the value of the F-statistic actually computed from the data. The p-value in Equation (9.11) can be evaluated using a table of the F(q,∞) distribution (or, alternatively, a table of the χ²(q) distribution, because a χ²(q)-distributed random variable is q times an F(q,∞)-distributed random variable). Alternatively, the p-value can be evaluated using a computer, because formulas for the cumulative chi-squared and F distributions have been incorporated into most modern statistical software.

The "Overall" Regression F-Statistic

The "overall" regression F-statistic tests the joint hypothesis that all the slope coefficients are zero. That is, the null and alternative hypotheses are

    H0: β1 = 0, β2 = 0, . . . , βk = 0 vs. H1: βj ≠ 0, at least one j, j = 1, . . . , k    (9.12)

Under this null hypothesis, none of the regressors explains any of the variation in Yi, although the intercept (which under the null hypothesis is the mean of Yi) can be nonzero.
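As a sketch of these formulas (function names are hypothetical), the snippet below implements the q = 2 combination of Equation (9.9) and uses the relation noted above, that q × F(q,∞) is chi-squared with q degrees of freedom; for q = 2 the chi-squared survival function is exp(-x/2), so the p-value is simply exp(-F). Applied to the F-statistic of 5.43 computed later in this reading, it gives a p-value of about 0.004, in line with the roughly 0.005 reported there once rounding of the F-statistic is allowed for.

```python
from math import exp

def f_stat_q2(t1, t2, rho_t):
    """Equation (9.9): the F-statistic for q = 2 restrictions, built from
    the two t-statistics and an estimate rho_t of their correlation."""
    return 0.5 * (t1**2 + t2**2 - 2.0 * rho_t * t1 * t2) / (1.0 - rho_t**2)

def p_value_F2(f):
    """p-value from the F(2, infinity) distribution: 2*F is chi-squared(2),
    whose survival function is exp(-x/2), so Pr[F(2, inf) > f] = exp(-f)."""
    return exp(-f)

# With uncorrelated t-statistics the formula collapses to the average of
# the squared t-statistics, as noted in the text.
print(f_stat_q2(2.0, 1.0, 0.0))   # 0.5 * (4 + 1) = 2.5
print(p_value_F2(5.43))           # about 0.004: rejected at the 1% level
```

Because the correlation adjustment only matters when ρ̂(t1,t2) is nonzero, setting it to zero recovers the "average of squared t-statistics" special case discussed in the text.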
nonzero. The null hypothesis in Equation (9.12) is a special case of the general null hypothesis in Equation (9.8), and the overall regression F-statistic is the F-statistic computed for the null hypothesis in Equation (9.12). In large samples, the overall regression F-statistic has an F(k, ∞) distribution when the null hypothesis is true.

The F-Statistic when q = 1

When q = 1, the F-statistic tests a single restriction. Then the joint null hypothesis reduces to the null hypothesis on a single regression coefficient, and the F-statistic is the square of the t-statistic.

Application to Test Scores and the Student-Teacher Ratio

We are now able to test the null hypothesis that the coefficients on both the student-teacher ratio and expenditures per pupil are zero, against the alternative that at least one coefficient is nonzero, controlling for the percentage of English learners in the district.

To test this hypothesis, we need to compute the heteroskedasticity-robust F-statistic of the test that β1 = 0 and β2 = 0 using the regression of TestScore on STR, Expn, and PctEL reported in Equation (9.6). This F-statistic is 5.43. Under the null hypothesis, in large samples this statistic has an F(2, ∞) distribution. The 5% critical value of the F(2, ∞) distribution is 3.00, and the 1% critical value is 4.61. The value of the F-statistic computed from the data, 5.43, exceeds 4.61, so the null hypothesis is rejected at the 1% level. It is very unlikely that we would have drawn a sample that produced an F-statistic as large as 5.43 if the null hypothesis really were true (the p-value is 0.005). Based on the evidence in Equation (9.6) as summarized in this F-statistic, we can reject the taxpayer's hypothesis that neither the student-teacher ratio nor expenditures per pupil have an effect on test scores (holding constant the percentage of English learners).

The Homoskedasticity-Only F-Statistic

One way to restate the question addressed by the F-statistic is to ask whether relaxing the q restrictions that constitute the null hypothesis improves the fit of the regression by enough that this improvement is unlikely to be the result merely of random sampling variation if the null hypothesis is true. This restatement suggests that there is a link between the F-statistic and the regression R²: A large F-statistic should, it seems, be associated with a substantial increase in the R². In fact, if the error u_i is homoskedastic, this intuition has an exact mathematical expression. That is, if the error term is homoskedastic, the F-statistic can be written in terms of the improvement in the fit of the regression as measured either by the sum of squared residuals or by the regression R². The resulting F-statistic is referred to as the homoskedasticity-only F-statistic, because it is valid only if the error term is homoskedastic. In contrast, the heteroskedasticity-robust F-statistic is valid whether the error term is homoskedastic or heteroskedastic. Despite this significant limitation of the homoskedasticity-only F-statistic, its simple formula sheds light on what the F-statistic is doing. In addition, the simple formula can be computed using standard regression output, such as might be reported in a table that includes regression R²'s but not F-statistics.

The homoskedasticity-only F-statistic is computed using a simple formula based on the sum of squared residuals from two regressions. In the first regression, called the restricted regression, the null hypothesis is forced to be true. When the null hypothesis is of the type in Equation (9.8), where all the hypothesized values are zero, the restricted regression is the regression in which those coefficients are set to zero; that is, the relevant regressors are excluded from the regression. In the second regression, called the unrestricted regression, the alternative hypothesis is allowed to be true. If the sum of squared residuals is sufficiently smaller in the unrestricted than the restricted regression, then the test rejects the null hypothesis.

The homoskedasticity-only F-statistic is given by the formula

F = [(SSR_restricted − SSR_unrestricted)/q] / [SSR_unrestricted/(n − k_unrestricted − 1)]   (9.13)

where SSR_restricted is the sum of squared residuals from the restricted regression, SSR_unrestricted is the sum of squared residuals from the unrestricted regression, q is the number of restrictions under the null hypothesis, and k_unrestricted is the number of regressors in the unrestricted regression. An alternative, equivalent formula for the homoskedasticity-only F-statistic is based on the R² of the two regressions:

F = [(R²_unrestricted − R²_restricted)/q] / [(1 − R²_unrestricted)/(n − k_unrestricted − 1)]   (9.14)
If the errors are homoskedastic, then the difference between the homoskedasticity-only F-statistic computed using Equation (9.13) or (9.14) and the heteroskedasticity-robust F-statistic vanishes as the sample size n increases. Thus, if the errors are homoskedastic, the sampling distribution of the rule-of-thumb F-statistic under the null hypothesis is, in large samples, F(q, ∞).

The restricted regression imposes the joint null hypothesis that the true coefficients on STR and Expn are zero; that is, under the null hypothesis, STR and Expn do not enter the population regression, although PctEL does (the null hypothesis does not restrict the coefficient on PctEL). The restricted regression, estimated by OLS, is

TestScore-hat = 664.7 − 0.671 × PctEL, R² = 0.4149   (9.15)
                (1.0)    (0.032)

so R²_restricted = 0.4149. The number of restrictions is q = 2, the number of observations is n = 420, and the number of regressors in the unrestricted regression is k = 3. The homoskedasticity-only F-statistic, computed using Equation (9.14), is

F = [(0.4366 − 0.4149)/2] / [(1 − 0.4366)/(420 − 3 − 1)] = 8.01

Approach #1: Test the Restriction Directly

Some statistical packages have a specialized command designed to test restrictions like Equation (9.16), and the result is an F-statistic that, because q = 1, has an F(1, ∞) distribution under the null hypothesis. (Recall that the square of a standard normal random variable has an F(1, ∞) distribution, so the 95th percentile of the F(1, ∞) distribution is 1.96² = 3.84.)
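The parenthetical claim, that the 95% critical value of the F(1, ∞) distribution equals 1.96², can be verified with the Python standard library; a minimal sketch:

```python
from statistics import NormalDist

# 97.5th percentile of the standard normal distribution: about 1.96.
z = NormalDist().inv_cdf(0.975)

# The square of a standard normal has an F(1, infinity) distribution,
# so the 95th percentile of F(1, infinity) is z squared: about 3.84.
f_crit_95 = z ** 2
```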
This method can be extended to other restrictions on regression equations using the same trick.

The two methods (Approaches #1 and #2) are equivalent, in the sense that the F-statistic from the first method equals the square of the t-statistic from the second method.

Extension to q > 1

In general it is possible to have q restrictions under the null hypothesis in which some or all of these restrictions involve multiple coefficients. The F-statistic from earlier extends to this type of joint hypothesis. The F-statistic can be computed by either of the two methods just discussed for q = 1. Precisely how best to do this in practice depends on the specific regression software being used.

Although this method of trying all possible values of β1,0 and β2,0 works in theory, in practice it is much simpler to use an explicit formula for the confidence set. This formula for the confidence set for an arbitrary number of coefficients is based on the formula for the F-statistic. When there are two coefficients, the resulting confidence sets are ellipses.

As an illustration, Figure 9-1 shows a 95% confidence set (confidence ellipse) for the coefficients on the student-teacher ratio and expenditure per pupil, holding constant the percentage of English learners, based on the estimated regression in Equation (9.6). This ellipse does not include the point (0,0). This means that the null hypothesis that these two coefficients are both zero is rejected using the F-statistic at the 5% significance level, which we already knew from earlier. The confidence ellipse is a fat sausage with the long part of the sausage oriented in the lower-left/upper-right direction. The reason for this orientation is that the estimated correlation between β̂1 and β̂2 is positive, which in turn arises because the correlation between the regressors STR and Expn is negative (schools that spend more per pupil tend to have fewer students per teacher).

[Figure 9-1: 95% confidence set for coefficients on STR and Expn from Equation (9.6). The 95% confidence set for the coefficients on STR (β1) and Expn (β2) is an ellipse. The ellipse contains the pairs of values of β1 and β2 that cannot be rejected using the F-statistic at the 5% significance level.]

MODEL SPECIFICATION FOR MULTIPLE REGRESSION

The job of determining which variables to include in multiple regression, that is, the problem of choosing a regression specification, can be quite challenging, and no single rule applies in all situations. But do not despair, because some useful guidelines are available. The starting point for choosing a regression specification is thinking through the possible sources of omitted variable bias. It is important to rely on your expert knowledge of the empirical problem and to focus on obtaining an unbiased estimate of the causal effect of interest; do not rely solely on purely statistical measures of fit such as the R² or R̄².

The OLS estimators of the coefficients in multiple regression will have omitted variable bias if an omitted determinant of Y_i is correlated with at least one of the regressors. For example, students from affluent families often have more learning opportunities than do their less affluent peers, which could lead to better test scores. Moreover, if the district is a wealthy one, then the schools will tend to have larger budgets and lower student-teacher ratios. If so, the affluence of the students and the student-teacher ratio would be negatively correlated, and the OLS estimate of the coefficient on the student-teacher ratio would pick up the effect of average district income, even after controlling for the percentage of English learners. In short, omitting the students' economic background could lead to omitted variable bias in the regression of test scores on the student-teacher ratio and the percentage of English learners.

The general conditions for omitted variable bias in multiple regression are similar to those for a single regressor: If an omitted variable is a determinant of Y_i and if it is correlated with at least one of the regressors, then the OLS estimators will have omitted variable bias. As was discussed previously, the OLS estimators are correlated, so in general the OLS estimators of all the coefficients will be biased. The two conditions for omitted variable bias in multiple regression are summarized in Box 9-3.

Box 9-3: Omitted Variable Bias in Multiple Regression

Omitted variable bias is the bias in the OLS estimator that arises when one or more included regressors are correlated with an omitted variable. For omitted variable bias to arise, two things must be true:
1. At least one of the included regressors must be correlated with the omitted variable.
2. The omitted variable must be a determinant of the dependent variable, Y.

At a mathematical level, if the two conditions for omitted variable bias are satisfied, then at least one of the
regressors is correlated with the error term. This means that the conditional expectation of u_i given X_1i, ..., X_ki is nonzero, so that the first least squares assumption is violated. As a result, the omitted variable bias persists even if the sample size is large; that is, omitted variable bias implies that the OLS estimators are inconsistent.

Model Specification in Theory and in Practice

In theory, when data are available on the omitted variable, the solution to omitted variable bias is to include the omitted variable in the regression. In practice, however, deciding whether to include a particular variable can be difficult and requires judgment.

Our approach to the challenge of potential omitted variable bias is twofold. First, a core or base set of regressors should be chosen using a combination of expert judgment, economic theory, and knowledge of how the data were collected; the regression using this base set of regressors is sometimes referred to as a base specification. This base specification should contain the variables of primary interest and the control variables suggested by expert judgment and economic theory. Expert judgment and economic theory are rarely decisive, however, and often the variables suggested by economic theory are not the ones on which you have data. Therefore the next step is to develop a list of candidate alternative specifications, that is, alternative sets of regressors. If the estimates of the coefficients of interest are numerically similar across the alternative specifications, then this provides evidence that the estimates from your base specification are reliable. If, on the other hand, the estimates of the coefficients of interest change substantially across specifications, this often provides evidence that the original specification had omitted variable bias.

Interpreting the R² and the Adjusted R² in Practice

An R² or an R̄² near 1 means that the regressors are good at predicting the values of the dependent variable in the sample, and an R² or an R̄² near 0 means they are not. This makes these statistics useful summaries of the predictive ability of the regression. However, it is easy to read more into them than they deserve.

There are four potential pitfalls to guard against when using the R² or R̄²:

1. An increase in the R² or R̄² does not necessarily mean that an added variable is statistically significant. The R² increases whenever you add a regressor, whether or not it is statistically significant. The R̄² does not always increase, but if it does this does not necessarily mean that the coefficient on that added regressor is statistically significant. To ascertain whether an added variable is statistically significant, you need to perform a hypothesis test using the t-statistic.

2. A high R² or R̄² does not mean that the regressors are a true cause of the dependent variable. Imagine regressing test scores against parking lot area per pupil. Parking lot area is correlated with the student-teacher ratio, with whether the school is in a suburb or a city, and possibly with district income, all things that are correlated with test scores. Thus the regression of test scores on parking lot area per pupil could have a high R² and R̄², but the relationship is not causal (try telling the superintendent that the way to increase test scores is to increase parking space!).

3. A high R² or R̄² does not mean there is no omitted variable bias. Recall the discussion which concerned omitted variable bias in the regression of test scores on the student-teacher ratio. The R² of the regression never came up because it played no logical role in this discussion. Omitted variable bias can occur in regressions with a low R², a moderate R², or a high R². Conversely, a low R² does not imply that there necessarily is omitted variable bias.

4. A high R² or R̄² does not necessarily mean you have the most appropriate set of regressors, nor does a low R² or R̄² necessarily mean you have an inappropriate set of regressors. The question of what constitutes the right set of regressors in multiple regression is difficult and we return to it throughout this textbook. Decisions about the regressors must weigh issues of omitted variable bias, data availability, data quality, and, most importantly, economic theory and the nature of the substantive questions being addressed. None of these questions can be answered simply by having a high (or low) regression R² or R̄².

These points are summarized in Box 9-4.
In the specification with PctEL, the coefficient is the predicted change in test scores for a one-percentage-point increase in English learners, holding STR constant; in the specification with FracEL, the coefficient is the predicted change in test scores for an increase by 1 in the fraction of English learners, that is, for a 100-percentage-point increase, holding STR constant. Although these two specifications are mathematically equivalent, for the purposes of interpretation the one with PctEL seems, to us, more natural.

Another consideration when deciding on a scale is to choose the units of the regressors so that the resulting regression coefficients are easy to read. For example, if a regressor is measured in dollars and has a coefficient of 0.00000356, it is easier to read if the regressor is converted to millions of dollars and the coefficient 3.56 is reported.

[Figure 9-2: Scatterplots of test scores vs. three student characteristics. The scatterplots show a negative relationship between test scores and (a) the percentage of English language learners (correlation = −0.64), (b) the percentage of students qualifying for a reduced-price lunch (correlation = −0.87), and (c) the percentage qualifying for income assistance (correlation = −0.63).]
Table 9-1: Results of Regressions of Test Scores on the Student-Teacher Ratio and Student Characteristic Control Variables Using California Elementary School Districts

Summary statistics, columns (1)-(5):
SER    18.58    14.46    9.08    11.65    9.08

These regressions were estimated using the data on K-8 school districts in California, described in Appendix A in Chapter 6. Standard errors are given in parentheses under coefficients. The individual coefficient is statistically significant at the *5% level or **1% significance level using a two-sided test.
rows are the estimated regression coefficients, with their standard errors below them in parentheses. The asterisks indicate whether the t-statistics, testing the hypothesis that the relevant coefficient is zero, are significant at the 5% level (one asterisk) or the 1% level (two asterisks). The final three rows contain summary statistics for the regression (the standard error of the regression, SER, and the adjusted R², R̄²) and the sample size (which is the same for all of the regressions, 420 observations).

All the information that we have presented so far in equation format appears as a column of this table. For example, consider the regression of the test score against the student-teacher ratio, with no control variables. In equation form, this regression is

TestScore-hat = 698.9 − 2.28 × STR, R̄² = 0.049, SER = 18.58, n = 420   (9.19)
                (10.4)   (0.52)

All this information appears in column (1) of Table 9-1. The estimated coefficient on the student-teacher ratio (−2.28) appears in the first row of numerical entries, and its standard error (0.52) appears in parentheses just below the estimated coefficient. The intercept (698.9) and its standard error (10.4) are given in the row labeled "Intercept." (Sometimes you will see this row labeled "constant" because, as discussed earlier, the intercept can be viewed as the coefficient on a regressor that is always equal to 1.) Similarly, the R̄² (0.049), the SER (18.58), and the sample size n (420) appear in the final rows. The blank entries in the rows of the other regressors indicate that those regressors are not included in this regression.

Although the table does not report t-statistics, these can be computed from the information provided; for example, the t-statistic testing the hypothesis that the coefficient on the student-teacher ratio in column (1) is zero is −2.28/0.52 = −4.38. This hypothesis is rejected at the 1% level, which is indicated by the double asterisk next to the estimated coefficient in the table.

Regressions that include the control variables measuring student characteristics are reported in columns (2)-(5).
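The t-statistic arithmetic for column (1) takes one line; a minimal sketch using the coefficient and standard error quoted in the text, with the large-sample (standard normal) two-sided 1% critical value from the Python standard library:

```python
from statistics import NormalDist

coef, se = -2.28, 0.52                # column (1): coefficient on STR and its SE
t_stat = coef / se                    # about -4.38

# Two-sided 1% critical value in large samples (standard normal): about 2.58.
crit_1pct = NormalDist().inv_cdf(1 - 0.01 / 2)

reject_at_1pct = abs(t_stat) > crit_1pct   # True, hence the double asterisk
```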
Column (2), which reports the regression of test scores on the student-teacher ratio and on the percentage of English learners, was previously stated as Equation (9.5). Column (3) presents the base specification, in which the regressors are the student-teacher ratio and two control variables, the percentage of English learners and the percentage of students eligible for a free lunch.

Columns (4) and (5) present alternative specifications that examine the effect of changes in the way the economic background of the students is measured. In column (4), the percentage of students on income assistance is included as a regressor, and in column (5) both of the economic background variables are included.

Discussion of Empirical Results

These results suggest three conclusions:

1. Controlling for these student characteristics cuts the effect of the student-teacher ratio on test scores approximately in half. This estimated effect is not very sensitive to which specific control variables are included in the regression. In all cases the coefficient on the student-teacher ratio remains statistically significant at the 5% level. In the four specifications with control variables, regressions (2)-(5), reducing the student-teacher ratio by one student per teacher is estimated to increase average test scores by approximately one point, holding constant student characteristics.

2. The student characteristic variables are very useful predictors of test scores. The student-teacher ratio alone explains only a small fraction of the variation in test scores: The R̄² in column (1) is 0.049. The R̄² jumps, however, when the student characteristic variables are added. For example, the R̄² in the base specification, regression (3), is 0.773. The signs of the coefficients on the student demographic variables are consistent with the patterns seen in Figure 9-2: Districts with many English learners and districts with many poor children have lower test scores.

3. The control variables are not always individually statistically significant: In specification (5), the hypothesis that the coefficient on the percentage qualifying for income assistance is zero is not rejected at the 5% level (the t-statistic is −0.82). Because adding this control variable to the base specification (3) has a negligible effect on the estimated coefficient for the student-teacher ratio and its standard error, and because the coefficient on this control variable is not significant in specification (5), this additional control variable is redundant, at least for the purposes of this analysis.

CONCLUSION

Chapter 8 began with a concern: In the regression of test scores against the student-teacher ratio, omitted student characteristics that influence test scores might be correlated with the student-teacher ratio in the district, and if so the student-teacher ratio in the district would pick up the effect on test scores of these omitted student characteristics. Thus, the OLS estimator would have omitted variable bias. To mitigate this potential omitted variable bias, we augmented the regression by including variables that control for various student characteristics (the percentage of English learners and two measures of student economic background). Doing so cuts the estimated effect of a unit change in the student-teacher ratio in half, although it remains possible to reject the null hypothesis that the population effect on test scores, holding these control variables constant, is zero at the 5% significance level. Because they eliminate omitted variable bias arising from these student characteristics, these multiple regression estimates, hypothesis tests, and confidence intervals are much more useful for advising the superintendent than the single-regressor estimates of Chapters 6 and 7.

The analysis in this and the preceding chapter has presumed that the population regression function is linear in the regressors, that is, that the conditional expectation of Y_i given the regressors is a straight line. There is, however, no particular reason to think this is so. In fact, the effect of reducing the student-teacher ratio might be quite different in districts with large classes than in districts that already have small classes. If so, the population regression line is not linear in the X's but rather is a nonlinear function of the X's.

SUMMARY

1. Hypothesis tests and confidence intervals for a single regression coefficient are carried out using essentially the same procedures that were used in the
one-variable linear regression model of Chapter 7. For example, a 95% confidence interval for β1 is given by β̂1 ± 1.96 SE(β̂1).

2. Hypotheses involving more than one restriction on the coefficients are called joint hypotheses. Joint hypotheses can be tested using an F-statistic.

3. Regression specification proceeds by first determining a base specification chosen to address concern about omitted variable bias. The base specification can be modified by including additional regressors that address other potential sources of omitted variable bias. Simply choosing the specification with the highest R² can lead to regression models that do not estimate the causal effect of interest.

Key Terms

Bonferroni test is the one-at-a-time t-statistic test done properly. The Bonferroni test of the joint null hypothesis β1 = β1,0 and β2 = β2,0 based on the critical value c > 0 uses the following rule:

Accept if |t1| ≤ c and if |t2| ≤ c; otherwise reject
(Bonferroni one-at-a-time t-statistic test)   (9.20)

where t1 and t2 are the t-statistics that test the restrictions on β1 and β2, respectively.

The trick is to choose the critical value c in such a way that the probability that the one-at-a-time test rejects when the null hypothesis is true is no more than the desired significance level, say 5%. This is done by using Bonferroni's inequality to choose the critical value c to allow both for the fact that two restrictions are being tested and for any possible correlation between t1 and t2.
Table 9-2: Bonferroni Critical Values c for the One-at-a-Time t-Statistic Test of a Joint Hypothesis

                              Significance Level
Number of Restrictions (q)    10%      5%       1%
2                             1.960    2.241    2.807
3                             2.128    2.394    2.935
4                             2.241    2.498    3.023

the standard normal distribution, so Pr(|Z| > 2.241) = 2.5%. Thus Equation (9.22) tells us that, in large samples, the one-at-a-time test in Equation (9.20) will reject at most 5% of the time under the null hypothesis.

The critical values in Table 9-2 are larger than the critical values for testing a single restriction. For example, with q = 2, the one-at-a-time test rejects if at least one t-statistic exceeds 2.241 in absolute value. This critical value is greater than 1.96 because it properly corrects for the fact that, by looking at two t-statistics, you get a second chance to reject the joint null hypothesis.
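The critical values in Table 9-2 follow from Bonferroni's inequality: to keep the overall rejection rate at or below α when q t-statistics are examined, compare each |t| with the two-sided standard-normal critical value at level α/q. A sketch using only the Python standard library reproduces the table:

```python
from statistics import NormalDist

def bonferroni_c(q, alpha):
    """Critical value c for the one-at-a-time test of q restrictions.

    Bonferroni's inequality: testing each restriction at level alpha/q
    keeps the overall size of the joint test at or below alpha.
    """
    return NormalDist().inv_cdf(1 - alpha / (2 * q))

# Reproduce the 5% column of Table 9-2: 2.241, 2.394, 2.498.
table_5pct = {q: round(bonferroni_c(q, 0.05), 3) for q in (2, 3, 4)}
```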
Learning Objectives

After completing this reading you should be able to:
• Describe linear and nonlinear trends.
• Describe trend models to estimate and forecast trends.
• Compare and evaluate model selection criteria, including mean squared error (MSE), s², the Akaike information criterion (AIC), and the Schwarz information criterion (SIC).
• Explain the necessary conditions for a model selection criterion to demonstrate consistency.
MODELING TREND

The series that we want to forecast vary over time, and we often mentally attribute that variation to unobserved underlying components, such as trends, seasonals, and cycles. In this chapter, we focus on trend.¹ Trend is slow, long-run evolution in the variables that we want to model and forecast. In business, finance, and economics, for

...that it increases or decreases like a straight line. That is, a simple linear function of time,

Tt = β0 + β1 × TIMEt,

provides a good description of the trend. The variable TIME is constructed artificially and is called a time trend or time dummy. TIME equals 1 in the first period of the sample, 2 in the second period, and so on. Thus, for a sample of size T, TIME = (1, 2, 3, ..., T − 1, T); put differently, TIMEt = t.

[Figure: Increasing and decreasing linear trends. One panel shows the decreasing line Trend = 10 − 0.25 × Time; the other shows an increasing line, Trend = −50 + ... .]

¹ Later we'll define and study seasonals and cycles. Not all components need be present in all observed series.
² Later we'll broaden our discussion to allow for stochastic trend.
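Estimating a linear trend is just an OLS regression of the series on the constructed TIME variable. Below is a self-contained sketch on simulated data (the trend line Trend = 10 − 0.25 × Time echoes the decreasing-trend panel above; the noise term and sample size are our own illustrative assumptions):

```python
import random

# Simulate T observations scattered around the decreasing trend 10 - 0.25*t.
random.seed(0)
T = 200
time = list(range(1, T + 1))                 # TIME = 1, 2, ..., T
y = [10 - 0.25 * t + random.gauss(0, 1) for t in time]

# Closed-form OLS slope and intercept for a single regressor.
t_bar = sum(time) / T
y_bar = sum(y) / T
slope = (sum((t - t_bar) * (yi - y_bar) for t, yi in zip(time, y))
         / sum((t - t_bar) ** 2 for t in time))
intercept = y_bar - slope * t_bar

# slope should be close to -0.25 and intercept close to 10.
```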
linear trends superimposed in Figures 10-4 and 10-5.³ In each case, we show the actual participation rate series together with the fitted trend, and we also show the residual, the deviation of the actual participation rate from the trend. The linear trends seem adequate. There are still obvious dynamic patterns in the residuals, but that's to be expected: persistent dynamic patterns are typically observed in the deviations of variables from trend.

Sometimes trend appears nonlinear, or curved, as, for example, when a variable increases at an increasing or decreasing rate. Ultimately, we don't require that trends be linear, only that they be smooth. Figure 10-6 shows the

³ Shortly we'll discuss how we estimated the trends. For now, just take them as given.

[Figure: Linear trend, female labor force participation rate. The panels show the participation rate with the fitted linear trend superimposed, and the residual.]

[Figure 10-6: Volume on the New York Stock Exchange.]
Volume increases at an increasing rate; the trend is therefore nonlinear.

Quadratic trend models can potentially capture nonlinearities such as those observed in the volume series. Such trends are quadratic, as opposed to linear, functions of time,

T_t = β0 + β1TIME_t + β2TIME_t².

Linear trend emerges as a special (and potentially restrictive) case when β2 = 0. Higher-order polynomial trends are sometimes entertained, but it's important to use low-order polynomials to maintain smoothness.

A variety of different nonlinear quadratic trend shapes are possible, depending on the signs and sizes of the coefficients; we show several in Figure 10-7. In particular, if β1 > 0 and β2 > 0, as in the upper-left panel, the trend is monotonically, but nonlinearly, increasing. Conversely, if β1 < 0 and β2 < 0, the trend is monotonically decreasing, while coefficients of mixed signs produce U-shaped and inverted-U-shaped trends.
FIGURE 10-7 Various quadratic trend shapes.
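Fitting a quadratic trend is an ordinary least-squares problem. The following Python sketch is illustrative only (it is not from the original text): it fits linear and quadratic trends to a hypothetical synthetic series whose coefficients and noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic series that grows at an increasing rate (loosely in the spirit
# of the NYSE volume example); all parameter values here are invented.
T = 120
time = np.arange(1, T + 1, dtype=float)
y = 5.0 + 0.2 * time + 0.03 * time**2 + rng.normal(0.0, 10.0, T)

# Least-squares fits:
#   linear:    y_t = b0 + b1*TIME_t
#   quadratic: y_t = b0 + b1*TIME_t + b2*TIME_t^2
X_lin = np.column_stack([np.ones(T), time])
X_quad = np.column_stack([np.ones(T), time, time**2])

b_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
b_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)

ssr_lin = np.sum((y - X_lin @ b_lin) ** 2)
ssr_quad = np.sum((y - X_quad @ b_quad) ** 2)

print(ssr_lin, ssr_quad)
```

Because the linear model is the special case β2 = 0, the quadratic fit's sum of squared residuals can never exceed the linear fit's.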
between them. The nonlinear trends in some series are well approximated by quadratic trend, whereas the trends in other series are better approximated by exponential trend. We have already seen, for example, that although quadratic trend looked better than linear trend for the NYSE volume data, the quadratic fit still had some undesirable features. Let's see how an exponential trend compares. In Figure 10-11, we plot the log volume data with linear trend superimposed; the log-linear trend looks quite good. Equivalently, Figure 10-12 shows the volume data in levels with exponential trend superimposed; the exponential trend looks much better than did the quadratic.
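To make the log-linear (exponential) trend concrete, here is a small Python sketch on synthetic data invented for the illustration: we regress log y on an intercept and time, then exponentiate the fitted values of log y to obtain fitted values of y in levels.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical exponentially growing series (a stand-in for something like
# NYSE volume); the level, growth rate, and noise are invented for the example.
T = 100
time = np.arange(1, T + 1, dtype=float)
y = 50.0 * np.exp(0.04 * time + rng.normal(0.0, 0.05, T))

# Exponential trend: regress log y on an intercept and TIME ...
X = np.column_stack([np.ones(T), time])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

# ... then exponentiate the fitted values of log y to get fitted values of y
fitted_y = np.exp(X @ b)

print(b[1])  # slope estimate: the trend growth rate per period
```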
FIGURE 10-11 Linear trend, log volume on the New York Stock Exchange.

We fit trend models by least squares; the computer finds θ̂ = argmin_θ Σ_{t=1}^{T} [y_t − T_t(θ)]², where θ denotes the set of parameters to be estimated. A linear trend, for example, has T_t(θ) = β0 + β1TIME_t and θ = (β0, β1), in which case the computer finds

(β̂0, β̂1) = argmin_{β0, β1} Σ_{t=1}^{T} [y_t − β0 − β1TIME_t]².

Similarly, in the quadratic trend case, the computer finds

(β̂0, β̂1, β̂2) = argmin_{β0, β1, β2} Σ_{t=1}^{T} [y_t − β0 − β1TIME_t − β2TIME_t²]².
estimate it by regressing log y on an intercept and TIME. Thus, we let the computer find

(β̂0, β̂1) = argmin_{β0, β1} Σ_{t=1}^{T} [log y_t − β0 − β1TIME_t]².

Note that the fitted values from this regression are the fitted values of log y, so they must be exponentiated to get the fitted values of y.

FORECASTING TREND

Consider first the construction of point forecasts. Suppose we're presently at time T, and we want to use a trend model to forecast the h-step-ahead value of a series y. For illustrative purposes, we'll work with a linear trend, but the procedures are identical with more complicated trends. The linear trend model, which holds for any time t, is

y_t = β0 + β1TIME_t + ε_t.

In particular, at time T + h, the future time of interest,

y_{T+h} = β0 + β1TIME_{T+h} + ε_{T+h}.

Two future values of the series appear on the right side of the equation, TIME_{T+h} and ε_{T+h}. If TIME_{T+h} and ε_{T+h} were known at time T, we could immediately crank out the forecast. In fact, TIME_{T+h} is known at time T, because the artificially constructed time variable is perfectly predictable; specifically, TIME_{T+h} = T + h. Unfortunately, ε_{T+h} is not known at time T, so we replace it with an optimal forecast constructed using information only up to time T.6 Under the assumption that ε is simply independent zero-mean random noise, the optimal forecast of ε_{T+h} for any future period is 0, yielding the point forecast7

y_{T+h,T} = β0 + β1TIME_{T+h}.

The subscript "T + h, T" on the forecast reminds us that the forecast is for time T + h and is made at time T.

The point forecast formula just given is not of practical use, because it assumes known values of the trend parameters β0 and β1. But it's a simple matter to make it operational: we just replace unknown parameters with their least squares estimates, yielding

ŷ_{T+h,T} = β̂0 + β̂1TIME_{T+h}.

To form interval forecasts, we assume that the trend regression disturbance is normally distributed, in which case a 95% interval forecast ignoring parameter estimation uncertainty is y_{T+h,T} ± 1.96σ, where σ is the standard deviation of the disturbance in the trend regression.8 To make this operational, we use ŷ_{T+h,T} ± 1.96σ̂, where σ̂ is the standard error of the trend regression, an estimate of σ.

To form a density forecast, we again assume that the trend regression disturbance is normally distributed. Then, ignoring parameter estimation uncertainty, we have the density forecast N(y_{T+h,T}, σ²), where σ is the standard deviation of the disturbance in the trend regression. To make this operational, we use the density forecast N(ŷ_{T+h,T}, σ̂²).

SELECTING FORECASTING MODELS USING THE AKAIKE AND SCHWARZ CRITERIA

We've introduced a number of trend models, but how do we select among them when fitting a trend to a specific series? What are the consequences, for example, of fitting a number of trend models and selecting the model with highest R²? Is there a better way? This issue of model selection is of tremendous importance in all of forecasting, so we introduce it now.

It turns out that model selection strategies such as selecting the model with highest R² do not produce good out-of-sample forecasting models. Fortunately, however, a number of powerful modern tools exist to assist with model selection. Here we digress to discuss some of the available methods, which will be immediately useful in selecting among alternative trend models, as well as in many other situations.

6 More formally, we say that we're "projecting ε_{T+h} on the time-T information set."

7 "Independent zero-mean random noise" is just a fancy way of saying that the regression disturbances satisfy the usual assumptions: they are identically and independently distributed.

8 When we say that we ignore parameter estimation uncertainty, we mean that we use the estimated parameters as if they were the true values, ignoring the fact that they are only estimates and subject to sampling variability. Later we'll see how to account for parameter estimation uncertainty by using simulation techniques.
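The point and interval forecasts just described can be sketched in a few lines of code. This is an illustrative Python example on invented data, not code from the text: we fit y_t = β0 + β1TIME_t + ε_t by least squares and form ŷ_{T+h,T} ± 1.96σ̂.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented linear-trend series: intercept, slope, and noise level are
# hypothetical, chosen only to illustrate the mechanics.
T, h = 80, 12
time = np.arange(1, T + 1, dtype=float)
y = 10.0 + 0.5 * time + rng.normal(0.0, 2.0, T)

X = np.column_stack([np.ones(T), time])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

k = X.shape[1]                                  # parameters estimated
resid = y - X @ b
sigma_hat = np.sqrt(resid @ resid / (T - k))    # standard error of the regression

# TIME_{T+h} = T + h is known in advance; the optimal forecast of the
# disturbance at T + h is 0, so the point forecast is the fitted trend.
point = b[0] + b[1] * (T + h)

# 95% interval forecast, ignoring parameter estimation uncertainty
lower, upper = point - 1.96 * sigma_hat, point + 1.96 * sigma_hat
print(point, (lower, upper))
```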
Most model selection criteria attempt to find the model with the smallest out-of-sample 1-step-ahead mean squared prediction error. The criteria we examine fit this general approach; the differences among criteria amount to different penalties for the number of degrees of freedom used in estimating the model (that is, the number of parameters estimated). Because all of the criteria are effectively estimates of out-of-sample mean squared prediction error, they have a negative orientation: the smaller, the better.

First, consider the mean squared error,

MSE = Σ_{t=1}^{T} e_t² / T,

where T is the sample size and the e_t are the residuals from the fitted model. MSE is closely related to R². Recall that

R² = 1 − Σ_{t=1}^{T} e_t² / Σ_{t=1}^{T} (y_t − ȳ)².

The denominator of the ratio that appears in the formula is just the sum of squared deviations of y from its sample mean (the so-called total sum of squares), which depends only on the data, not on the particular model fit. Thus, selecting the model that minimizes the sum of squared residuals, which, as we saw, is equivalent to selecting the model that minimizes MSE, is also equivalent to selecting the model that maximizes R².

Selecting the model with the smallest in-sample MSE turns out to be a bad idea. In-sample MSE can't rise when more variables are added to a model, and typically it will fall continuously as more variables are added. To see why, consider the fitting of polynomial trend models. In that context, the number of variables in the model is linked to the degree of the polynomial (call it p):

T_t = β0 + β1TIME_t + β2TIME_t² + ... + β_pTIME_t^p.

We've already considered the cases of p = 1 (linear trend) and p = 2 (quadratic trend), but there's nothing to stop us from fitting models with higher powers of time included. As we include higher powers of time, the sum of squared residuals can't rise, because the estimated parameters are explicitly chosen to minimize the sum of squared residuals. The last-included power of time could always wind up with an estimated coefficient of zero, so the sum of squared residuals can be no larger than before; to the extent that the estimate differs from zero, the sum of squared residuals must have fallen. In-sample MSE is therefore a biased estimator of out-of-sample prediction error variance, and the size of the bias increases with the number of variables included in the model. The direction of the bias is downward: in-sample MSE provides an overly optimistic (that is, too small) assessment of out-of-sample prediction error variance.

To reduce the bias associated with MSE and its relatives, we need to penalize for degrees of freedom used. Thus, let's consider the mean squared error corrected for degrees of freedom,

s² = Σ_{t=1}^{T} e_t² / (T − k),
where k is the number of degrees of freedom used in model fitting, and s² is just the usual unbiased estimate of the regression disturbance variance; that is, it is the square of the usual standard error of the regression. So selecting the model that minimizes s² is also equivalent to selecting the model that minimizes the standard error of the regression. Also, s² is intimately connected to the R² adjusted for degrees of freedom (the adjusted R², or R̄²). Recall that

R̄² = 1 − [Σ_{t=1}^{T} e_t² / (T − k)] / [Σ_{t=1}^{T} (y_t − ȳ)² / (T − 1)] = 1 − s² / [Σ_{t=1}^{T} (y_t − ȳ)² / (T − 1)].

The denominator of the R̄² expression depends only on the data, not the particular model fit, so the model that minimizes s² is also the model that maximizes R̄². In short, the strategies of selecting the model that minimizes s², or the model that minimizes the standard error of the regression, or the model that maximizes R̄², are equivalent, and they do penalize for degrees of freedom used.

To highlight the degree-of-freedom penalty, let's rewrite s² as a penalty factor times the MSE,

s² = [T / (T − k)] · (Σ_{t=1}^{T} e_t² / T).

The most important such criteria are the Akaike information criterion (AIC) and the Schwarz information criterion (SIC). Their formulas are

AIC = e^(2k/T) · Σ_{t=1}^{T} e_t² / T

and

SIC = T^(k/T) · Σ_{t=1}^{T} e_t² / T.

How do the penalty factors associated with MSE, s², AIC, and SIC compare in terms of severity? All of the penalty factors are functions of k/T, the number of parameters estimated per sample observation, and we can compare the penalty factors graphically as k/T varies. In Figure 10-13, we show the penalties as k/T moves from 0 to 0.25, for a sample size of T = 100. The s² penalty is small and rises slowly with k/T; the AIC penalty is a bit larger and still rises only slowly with k/T. The SIC penalty, on the other hand, is substantially larger and rises at a slightly increasing rate with k/T.

It's clear that the different criteria penalize degrees of freedom differently. In addition, we could propose many other criteria by altering the penalty. How, then, do we select among the criteria? More generally, what properties might we demand of a model selection criterion?
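To see the criteria in action, here is an illustrative Python sketch (the data-generating process is invented for the example): it computes in-sample MSE, s², AIC, and SIC for polynomial trends of increasing order. In-sample MSE can only fall as the order grows, while the penalized criteria need not.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented example: the true trend is quadratic (p = 2) plus noise.
T = 100
time = np.arange(1, T + 1, dtype=float)
y = 2.0 + 0.3 * time + 0.02 * time**2 + rng.normal(0.0, 5.0, T)

def criteria(p):
    """MSE, s^2, AIC, and SIC for a polynomial trend of degree p."""
    tt = time / T                                  # rescale time for numerical stability
    X = np.column_stack([tt**j for j in range(p + 1)])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    k = p + 1                                      # degrees of freedom used
    mse = e @ e / T
    s2 = e @ e / (T - k)                           # corrected for degrees of freedom
    aic = np.exp(2.0 * k / T) * mse
    sic = T ** (k / T) * mse
    return mse, s2, aic, sic

results = {p: criteria(p) for p in range(1, 6)}
best_sic = min(results, key=lambda p: results[p][3])
print(best_sic)  # the polynomial order preferred by the SIC
```

Rescaling time to tt = t/T leaves the fitted values unchanged (same column space) but keeps the regressor matrix well conditioned at higher orders.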
FIGURE 10-15 Retail sales, linear trend residual plot.
FIGURE 10-16 Retail sales, quadratic trend residual plot.
significant.13 R² is now almost 1. Figure 10-16 shows the residual plot, which now looks very nice, as the fitted nonlinear trend tracks the evolution of retail sales well. The residuals still display persistent dynamics (indicated as well by the still-low Durbin-Watson statistic), but there's little scope for explaining such dynamics with trend, because they're related to the business cycle, not the growth trend.

Now let's estimate a different type of nonlinear trend model, the exponential trend. First, we'll do it by OLS regression of the log of retail sales on a constant and linear time trend variable. We show the estimation results and residual plot in Table 10-3 and Figure 10-17. As with

13 The earlier caveat regarding the effects of serial correlation on inference applies, however.
Convergence achieved after 1 iteration
RTRR = C(1)*EXP(C(2)*TIME)
FIGURE 10-18 Retail sales, exponential trend residual plot.
FIGURE 10-19 Retail sales: History, 1990.01-1993.12; and quadratic trend forecast, 1994.01-1994.12.
seems to fit the best. The quadratic trend model, however, contains one more parameter than the other two, so it's not surprising that it fits a little better, and there's no guarantee that its better fit on historical data will translate into better out-of-sample forecasting performance. To settle on a final model, we examine the AIC and SIC, which are summarized in Table 10-5 for the three trend models.14 Both the AIC and SIC indicate that nonlinearity is important in the trend, as both rank the linear trend last. Both, moreover, favor the quadratic trend model. So let's use the quadratic trend model.

Figure 10-19 shows the history of retail sales, 1990.01-1993.12, together with out-of-sample point and 95% interval extrapolation forecasts, 1994.01-1994.12. The point forecasts look reasonable. The interval forecasts are computed under the (incorrect) assumption that the deviation of retail sales from trend is random noise, which is why they're of equal width throughout. Nevertheless, they look reasonable.

14 It's important that the exponential trend model be estimated in levels, in order to maintain comparability of the exponential trend model AIC and SIC with those of the other trend models.
FIGURE 10-20 Retail sales: History, 1990.01-1993.12; and quadratic trend forecast and realization, 1994.01-1994.12.
In Figure 10-20, we show the history of retail sales through 1993, the quadratic trend forecast for 1994, and the realization for 1994. The forecast is quite good, as the realization hugs the forecasted trend line quite closely. All of the realizations, moreover, fall inside the 95% forecast interval.

For comparison, we examine the forecasting performance of a simple linear trend model. Figure 10-21 presents the history of retail sales and the out-of-sample point and 95% interval extrapolation forecasts for 1994. The point forecasts look very strange. The huge drop forecasted relative to the historical sample path occurs because the linear trend is far below the sample path by the end of the sample. The confidence intervals are very wide, reflecting the large standard error of the linear trend regression relative to the quadratic trend regression.

Finally, Figure 10-22 shows the history, the linear trend forecast for 1994, and the realization. The forecast is terrible: far below the realization. Even the very wide interval forecasts fail to contain the realizations. The reason for the failure of the linear trend forecast is that the forecasts (point and interval) are computed under the assumption that the linear trend model is actually the true DGP, whereas in fact the linear trend model is a very poor approximation to the trend in retail sales.
FIGURE 10-21 Retail sales: History, 1990.01-1993.12; and linear trend forecast, 1994.01-1994.12.

FIGURE 10-22 Retail sales: History, 1990.01-1993.12; and linear trend forecast and realization, 1994.01-1994.12.
Learning Objectives

After completing this reading you should be able to:
• Describe the sources of seasonality and how to deal with it in time series analysis.
• Explain how to use regression analysis to model seasonality.
• Explain how to construct an h-step-ahead point forecast.

Excerpt is Chapter 6 of Elements of Forecasting, 4th Edition, by Francis X. Diebold.
THE NATURE AND SOURCES OF SEASONALITY

In the last chapter, we focused on trends; now we'll focus on seasonality. A seasonal pattern is one that repeats itself every year.1 The annual repetition can be exact, in which case we speak of deterministic seasonality, or approximate, in which case we speak of stochastic seasonality. Just as we focused exclusively on deterministic trend in Chapter 10, reserving stochastic trend for subsequent treatment, so shall we focus exclusively on deterministic seasonality here.

Seasonality arises from links of technologies, preferences, and institutions to the calendar. The weather (for example, daily high temperature in Tokyo) is a trivial but very important seasonal series, as it's always hotter in the summer than in the winter. Any technology that involves the weather, such as production of agricultural commodities, is likely to be seasonal as well.

Preferences may also be linked to the calendar. Consider, for example, gasoline sales. In Figure 11-1, we show monthly U.S. current-dollar gasoline sales, 1980.01-1992.01. People want to do more vacation travel in the summer, which tends to increase both the price and quantity of summertime gasoline sales, both of which feed into higher current-dollar sales.

Finally, social institutions that are linked to the calendar, such as holidays, are responsible for seasonal variation in a variety of series. Purchases of retail goods skyrocket, for example, every Christmas season (see Figure 11-2).

You might imagine that, although certain series are seasonal for obvious reasons, seasonality is nevertheless uncommon. On the contrary, and perhaps surprisingly, seasonality is pervasive in business and economics. Many industrialized economies, for example, expand briskly every fourth quarter and contract every first quarter.

One way to deal with seasonality in a series is simply to remove it and then to model and forecast the seasonally adjusted time series.2 This strategy is perhaps appropriate

FIGURE 11-1 Gasoline sales.
"'
given time, we can be in only one of the four
QI
60,000
iii quarters, so one seasonal dummy is 1, and all
Ill
"' others are 0.
'ti
0 50,000
0
�
The pure seasonal dummy model is
QI
:c
Ill 40,000
:::1
...
= = =
think of s as the number of observations on a series in model is:3
each year. Thus, s 4 if we have quarterly data, s 12 if I
we have monthly data, s 52 if we have weekly data, and Yr = P,TIMEt + r 1P1t + El.
1-1
so forth.
In fact, you can think of what we're doing in this chapter
Now let's construct Hasonal dummy Yilllriabl•s. which
as a generalization of what we did in the last, in which we
indicate which season we're in. If, for example, there are
focused exclusively on trend. We still want to account for
===
four seasons, we create
trend, if it's present, but we want to expand the model so
Di (1, 0 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, . . .); that we can account for seasonality as well.
02 (0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, . . .);
D'!J
04 = (0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, . . .);
(0, 0, 0, 1, 0,0, 0, 1, 0, 0, 0, 1, . . .).
J For simplicity. we have included only a linear trend. but more
Di indicates whether we're in the first quarter (it's 1 in the complicated models of trend. such as quadratic. exponential or
first quarter and 0 otherwise), 02 indicates whether we're logistic, could of course be used.
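The dummy-variable setup translates directly into code. The following Python sketch is illustrative only, using invented quarterly data: it builds the four seasonal dummies, fits y_t = β1TIME_t + Σγ_i D_it + ε_t by least squares, and forms an h-step-ahead point forecast, which is feasible because both TIME_{T+h} and the season at T + h are known in advance.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented quarterly series with a linear trend and deterministic seasonality.
# There is no separate intercept: the four seasonal dummies act as four
# season-specific intercepts.
T = 80                                   # 20 years of quarterly data
time = np.arange(1, T + 1, dtype=float)
season = np.arange(T) % 4                # 0, 1, 2, 3, 0, 1, 2, 3, ...
gamma_true = np.array([10.0, 14.0, 18.0, 12.0])
y = 0.25 * time + gamma_true[season] + rng.normal(0.0, 1.0, T)

# Seasonal dummies: D[:, i] is 1 in season i and 0 otherwise
D = np.zeros((T, 4))
D[np.arange(T), season] = 1.0

X = np.column_stack([time, D])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
beta1_hat, gamma_hat = b[0], b[1:]

# h-step-ahead point forecast: TIME_{T+h} and the season at T + h are known
h = 3
point = beta1_hat * (T + h) + gamma_hat[(T + h - 1) % 4]
print(beta1_hat, gamma_hat)
```

The estimated coefficients on the dummies are exactly the estimated seasonal factors discussed in the housing-starts application below.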
The idea of seasonality may be extended to allow for more general calendar effects. "Standard" seasonality is just one type of calendar effect. Two additional important calendar effects are holiday variation and trading-day variation.

Holiday variation refers to the fact that some holidays' dates change over time. That is, although they arrive at approximately the same time each year, the exact dates differ. Easter is a common example. Because the behavior of many series, such as sales, shipments, inventories, hours worked, and so on, depends in part on the timing of such holidays, we may want to keep track of them in our forecasting models. As with seasonality, holiday effects may be handled with dummy variables. In a monthly model, for example, in addition to a full set of seasonal dummies, we might include an "Easter dummy," which is 1 if the month contains Easter and 0 otherwise.

Trading-day variation refers to the fact that different months contain different numbers of trading days or business days, which is an important consideration when modeling and forecasting certain series. For example, in a monthly forecasting model of volume traded on the London Stock Exchange, in addition to a full set of seasonal dummies, we might include a trading-day variable, whose value each month is the number of trading days that month.

Allowing for the possibility of holiday or trading-day variation gives the complete model

y_t = β1TIME_t + Σ_{i=1}^{s} γ_i D_it + Σ_{i=1}^{v1} δ_i^HD HDV_it + Σ_{i=1}^{v2} δ_i^TD TDV_it + ε_t,

where the HDVs are the relevant holiday variables (there are v1 of them), and the TDVs are the relevant trading-day variables (here we've allowed for v2 of them, but in most applications, v2 = 1 will be adequate). This is just a standard regression equation and can be estimated by ordinary least squares.

The full model holds at any time t, so that at time T + h,

y_{T+h} = β1TIME_{T+h} + Σ_{i=1}^{s} γ_i D_{i,T+h} + Σ_{i=1}^{v1} δ_i^HD HDV_{i,T+h} + Σ_{i=1}^{v2} δ_i^TD TDV_{i,T+h} + ε_{T+h}.

As with the pure trend model of Chapter 10, we project the right side of the equation on what's known at time T (that is, the time-T information set, Ω_T) to obtain the forecast

y_{T+h,T} = β1TIME_{T+h} + Σ_{i=1}^{s} γ_i D_{i,T+h} + Σ_{i=1}^{v1} δ_i^HD HDV_{i,T+h} + Σ_{i=1}^{v2} δ_i^TD TDV_{i,T+h}.

As always, we make this point forecast operational by replacing unknown parameters with estimates,

ŷ_{T+h,T} = β̂1TIME_{T+h} + Σ_{i=1}^{s} γ̂_i D_{i,T+h} + Σ_{i=1}^{v1} δ̂_i^HD HDV_{i,T+h} + Σ_{i=1}^{v2} δ̂_i^TD TDV_{i,T+h}.

To form an interval forecast, we proceed precisely as in the pure trend models we studied earlier. We assume that the regression disturbance is normally distributed, in which case a 95% interval forecast ignoring parameter estimation uncertainty is y_{T+h,T} ± 1.96σ, where σ is the standard deviation of the regression disturbance. To make the interval forecast operational, we use ŷ_{T+h,T} ± 1.96σ̂, where σ̂ is the standard error of the regression.

To form a density forecast, we again assume that the regression disturbance is normally distributed. Then, ignoring parameter estimation uncertainty, the density forecast is N(y_{T+h,T}, σ²), where σ is the standard deviation of the regression disturbance. The operational density forecast is then N(ŷ_{T+h,T}, σ̂²).

APPLICATION: FORECASTING HOUSING STARTS
seasonal dummies account for more than a third of the variation in housing starts, as R² = 0.38. At least some of the remaining variation is cyclical, which the model is not designed to capture. (Note the very low Durbin-Watson statistic.)

The residual plot in Figure 11-6 makes clear the strengths and limitations of the model. First compare the actual and fitted values. The fitted values go through the same seasonal pattern every year; there's nothing in the model other than deterministic seasonal dummies. But that rigid seasonal pattern picks up a lot of the variation in housing starts. It doesn't pick up all of the variation, however, as evidenced by the serial correlation that's apparent in the residuals. Note the dips in the residuals in recessions (for example, 1990, 1982, 1980, and 1975), and the peaks in booms.

FIGURE 11-5 Housing starts, 1946.01-1994.11.

The estimated seasonal factors are just the 12 estimated coefficients on the seasonal dummies; we graph them in Figure 11-7. The seasonal effects are very low in January and February and then rise quickly and peak in May, after which they decline, at first slowly and then abruptly in November and December.

The figures reveal that there is no trend, so we'll work with the pure seasonal model. In Figure 11-9, we include the 1994 realization. The forecast appears highly accurate, as the realization and forecast are quite close throughout. Moreover, the realization is everywhere well within the 95% interval.
FIGURE 11-6 Residual plot.

FIGURE 11-7 Estimated seasonal factors, housing starts.
FIGURE 11-8 Housing starts: History and forecast.

FIGURE 11-9 Housing starts: History, forecast, and realization.
Learning Objectives

After completing this reading you should be able to:
• Define covariance stationary, autocovariance function, autocorrelation function, partial autocorrelation function, and autoregression.
• Describe the requirements for a series to be covariance stationary.
• Explain the implications of working with models that are not covariance stationary.
• Define white noise, and describe independent white noise and normal (Gaussian) white noise.
• Explain the characteristics of the dynamic structure of white noise.
• Explain how a lag operator works.
• Describe Wold's theorem.
• Define a general linear process.
• Relate rational distributed lags to Wold's theorem.
• Calculate the sample mean and sample autocorrelation, and describe the Box-Pierce Q-statistic and the Ljung-Box Q-statistic.
• Describe sample partial autocorrelation.
We've already built forecasting models with trend and seasonal components. In this chapter, as well as the next one, we consider a crucial third component, cycles. When you think of a "cycle," you probably think of the sort of rigid up-and-down pattern depicted in Figure 12-1. Such cycles can sometimes arise, but cyclical fluctuations in business, finance, economics, and government are typically much less rigid. In fact, when we speak of cycles, we have in mind a much more general, all-encompassing notion of cyclicality: any sort of dynamics not captured by trends or seasonals.

Cycles, according to our broad interpretation, may display the sort of back-and-forth movement characterized in Figure 12-1, but they don't have to. All we require is that there be some dynamics, some persistence, some way in which the present is linked to the past and the future to the present. Cycles are present in most of the series that concern us, and it's crucial that we know how to model and forecast them, because their history conveys information regarding their future.

Trend and seasonal dynamics are simple, so we can capture them with simple models. Cyclical dynamics, however, are more complicated. Because of the wide variety of cyclical patterns, the sorts of models we need are substantially more involved. Thus, we split the discussion into three parts. Here in Chapter 12 we develop methods for characterizing cycles, and in Chapter 13 we discuss models of cycles. This material is crucial to a real understanding of forecasting and forecasting models, and it's also a bit difficult the first time around because it's unavoidably rather mathematical, so careful, systematic study is required.

[Figure 12-1: A rigid cyclical pattern.]

COVARIANCE STATIONARY TIME SERIES

A realization of a time series is an ordered set, {. . . , y_-2, y_-1, y_0, y_1, y_2, . . .}. Typically the observations are ordered in time, hence the name time series, but they don't have to be. We could, for example, examine a spatial series, such as office space rental rates as we move along a line from a point in Midtown Manhattan to a point in the New York suburbs 30 miles away. But the most important case for forecasting, by far, involves observations ordered in time, so that's what we'll stress.

In theory, a time series realization begins in the infinite past and continues into the infinite future. This perspective may seem abstract and of limited practical applicability, but it will be useful in deriving certain very important properties of the forecasting models we'll be using soon. In practice, of course, the data we observe are just a finite subset of a realization, {y_1, . . . , y_T}, called a sample path.

Shortly we'll be building forecasting models for cyclical time series. If the underlying probabilistic structure of the series were changing over time, we'd be doomed: there would be no way to predict the future accurately on the basis of the past, because the laws governing the future would differ from those governing the past. If we want to forecast a series, at a minimum we'd like its mean and its covariance structure (i.e., the covariances between current and past values) to be stable over time, in which case we say that the series is covariance stationary.

Let's discuss covariance stationarity in greater depth. The first requirement for a series to be covariance stationary is that the mean of the series be stable over time. The mean of the series at time t is

E(y_t) = μ_t.

If the mean is stable over time, as required by covariance stationarity, then we can write

E(y_t) = μ,

for all t.
The second requirement of covariance stationarity concerns the covariance structure of the series. The autocovariance at displacement τ will of course depend on τ, and it may also depend on t, so in general we write

γ(t, τ) = cov(y_t, y_{t-τ}) = E(y_t - μ)(y_{t-τ} - μ).

If the covariance structure is stable over time, as required by covariance stationarity, then the autocovariances depend only on displacement, τ, not on time, t, and we write

γ(t, τ) = γ(τ),

for all t.

The autocovariance function is important because it provides a basic summary of cyclical dynamics in a covariance stationary series. By examining the autocovariance structure of a series, we learn about its dynamic behavior. We graph and examine the autocovariances as a function of τ. Note that the autocovariance function is symmetric; that is,

γ(τ) = γ(-τ),

for all τ. Typically, we'll consider only nonnegative values of τ. Symmetry reflects the fact that the autocovariance of a covariance stationary series depends only on displacement; it doesn't matter whether we go forward or backward. Note also that

γ(0) = cov(y_t, y_t) = var(y_t).

There is one more technical requirement of covariance stationarity: We require that the variance of the series, the autocovariance at displacement 0, γ(0), be finite. It can be shown that no autocovariance can be larger in absolute value than γ(0), so if γ(0) < ∞, then so, too, are all the other autocovariances.

It may seem that the requirements for covariance stationarity are quite stringent, which would bode poorly for our forecasting models, almost all of which invoke covariance stationarity in one way or another. It is certainly true that many economic, business, financial, and government series are not covariance stationary. An upward trend, for example, corresponds to a steadily increasing mean, and seasonality corresponds to means that vary with the season, both of which are violations of covariance stationarity.

But appearances can be deceptive. Although many series are not covariance stationary, it is frequently possible to work with models that give special treatment to nonstationary components such as trend and seasonality, so that the cyclical component that's left over is likely to be covariance stationary. We'll often adopt that strategy. Alternatively, simple transformations often appear to transform nonstationary series to covariance stationarity. For example, many series that are clearly nonstationary in levels appear covariance stationary in growth rates.

In addition, note that although covariance stationarity requires means and covariances to be stable and finite, it places no restrictions on other aspects of the distribution of the series, such as skewness and kurtosis.¹ The upshot is simple: Whether we work directly in levels and include special components for the nonstationary elements of our models, or we work on transformed data such as growth rates, the covariance stationarity assumption is not as unrealistic as it may seem.

Recall that the correlation between two random variables x and y is defined by

corr(x, y) = cov(x, y) / (σ_x σ_y).

That is, the correlation is simply the covariance, "normalized" or "standardized," by the product of the standard deviations of x and y. Both the correlation and the covariance are measures of linear association between two random variables. The correlation is often more informative and easily interpreted, however, because the construction of the correlation coefficient guarantees that corr(x, y) ∈ [-1, 1], whereas the covariance between the same two random variables may take any value. The correlation, moreover, does not depend on the units in which x and y are measured, whereas the covariance does. Thus, for example, if x and y have a covariance of 10 million, they're not necessarily very strongly associated, whereas if they have a correlation of .95, it is unambiguously clear that they are very strongly associated.

In light of the superior interpretability of correlations as compared with covariances, we often work with the correlation, rather than the covariance, between y_t and y_{t-τ}. That is, we work with the autocorrelation function, ρ(τ), rather than the autocovariance function, γ(τ). The autocorrelation function is obtained by dividing the autocovariance function by the variance,

ρ(τ) = γ(τ) / γ(0),   τ = 0, 1, 2, . . . .

¹ For that reason, covariance stationarity is sometimes called second-order stationarity or weak stationarity.
The formula for the autocorrelation is just the usual correlation formula, specialized to the correlation between y_t and y_{t-τ}. By covariance stationarity, var(y_t) = var(y_{t-τ}) = γ(0), so

ρ(τ) = cov(y_t, y_{t-τ}) / (√var(y_t) √var(y_{t-τ})) = γ(τ) / (√γ(0) √γ(0)) = γ(τ) / γ(0).
[Figure: Autocorrelation function, sharp cutoff. Horizontal axis: displacement.]

WHITE NOISE

Consider the process

y_t = ε_t
ε_t ~ (0, σ²),

where ε_t is serially uncorrelated. Throughout, unless explicitly stated otherwise, we assume that σ² < ∞. Such a process, with zero mean, constant variance, and no serial correlation, is called zero-mean white noise, or simply white noise.⁴ Sometimes for short we write

ε_t ~ WN(0, σ²)

and hence

y_t ~ WN(0, σ²).
We read "y is independently and identically distributed as normal, with zero mean and constant variance," or simply "y is Gaussian white noise." In Figure 12-6 we show a sample path of Gaussian white noise, of length T = 150, simulated on a computer. There are no patterns of any kind in the series due to the independence over time.

The autocovariance function for a white noise process is

γ(τ) = σ²,  τ = 0
       0,   τ ≥ 1,

and the autocorrelation function for a white noise process is

ρ(τ) = 1,  τ = 0
       0,  τ ≥ 1.
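A Gaussian white noise sample path like the one just described is easy to simulate. The sketch below is illustrative (the seed and variable names are mine); it draws T = 150 independent N(0, 1) shocks and confirms that the sample moments sit close to the population values of 0, σ² = 1, and ρ(1) = 0.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 150
y = rng.normal(loc=0.0, scale=1.0, size=T)   # Gaussian white noise, sigma^2 = 1

mean_hat = y.mean()                          # population value: 0
var_hat = y.var()                            # population value: 1

# Sample autocorrelation at displacement 1: near 0 by independence over time
d = y - mean_hat
rho1_hat = np.sum(d[1:] * d[:-1]) / np.sum(d * d)
```

The sample moments fluctuate around the population values with standard errors of roughly 1/√T, which is the key fact exploited later when we test whether a series is white noise.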
Similarly, the partial autocorrelation function for a white noise process is

p(τ) = 1,  τ = 0
       0,  τ ≥ 1.

We show the partial autocorrelation function of a white noise process in Figure 12-8. Again, it's degenerate and exactly the same as the autocorrelation function!

By now you've surely noticed that if you were assigned the task of forecasting independent white noise, you'd likely be doomed to failure. What happens to a white noise series at any time is uncorrelated with anything in the past; similarly, what happens in the future is uncorrelated with anything in the present or past. But understanding white noise is tremendously important for at least two reasons.

Finally, consider the conditional mean and variance of independent white noise, where the conditioning information set Ω_{t-1} contains either the past history of the observed series, Ω_{t-1} = {y_{t-1}, y_{t-2}, . . .}, or the past history of the shocks, Ω_{t-1} = {ε_{t-1}, ε_{t-2}, . . .}. (They're the same in the white noise case.) In contrast to the unconditional mean and variance, which must be constant by covariance stationarity, the conditional mean and variance need not be constant, and in general we'd expect them not to be constant. The unconditionally expected growth of laptop

¹⁰ If you need to refresh your memory on conditional means, consult any good introductory statistics book, such as Wonnacott and Wonnacott (1990).
computer sales next quarter may be 10%, but expected sales growth may be much higher, conditional on knowledge that sales grew this quarter by 20%. For the independent white noise process, the conditional mean is

E(y_t | Ω_{t-1}) = 0,

and the conditional variance is

var(y_t | Ω_{t-1}) = E[(y_t - E(y_t | Ω_{t-1}))² | Ω_{t-1}] = E(ε_t²) = σ².

Conditional and unconditional means and variances are identical for an independent white noise series; there are no dynamics in the process and hence no dynamics in the conditional moments to exploit for forecasting.

A polynomial in the lag operator operating on a series produces a weighted sum, or distributed lag, of current and past values. All forecasting models, one way or another, must contain such distributed lags, because they've got to quantify how the past evolves into the present and future; hence, lag operator notation is a useful shorthand for stating and manipulating forecasting models.

Thus far, we've considered only finite-order polynomials in the lag operator; it turns out that infinite-order polynomials are also of great interest. We write the infinite-order lag operator polynomial as

B(L) = b_0 + b_1 L + b_2 L² + ··· = Σ_{i=0}^∞ b_i L^i.
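Mechanically, applying a finite-order lag polynomial to a series, B(L)x_t = Σ b_i x_{t-i}, is just a one-sided convolution. A minimal sketch (my own helper, not from the text; pre-sample values are treated as zero by assumption):

```python
import numpy as np

def apply_lag_polynomial(b, x):
    """Compute y_t = b_0 x_t + b_1 x_{t-1} + ... + b_m x_{t-m},
    treating pre-sample values (x_t for t < 0) as zero."""
    return np.convolve(x, b)[: len(x)]

# B(L) = 1 + 0.5 L applied to x = (1, 2, 3):
y = apply_lag_polynomial([1.0, 0.5], [1.0, 2.0, 3.0])
# y_0 = 1, y_1 = 2 + 0.5*1 = 2.5, y_2 = 3 + 0.5*2 = 4
```

The convolution viewpoint makes the "distributed lag" interpretation concrete: each output value is a weighted sum of the current and lagged inputs, with weights given by the polynomial coefficients.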
y_t = B(L)ε_t = Σ_{i=0}^∞ b_i ε_{t-i}
ε_t ~ WN(0, σ²),

where b_0 = 1 and Σ_{i=0}^∞ b_i² < ∞. In short, the correct "model" for any covariance-stationary series is some infinite distributed lag of white noise, called the Wold representation.¹² The ε_t's are often called innovations, because they correspond to the 1-step-ahead forecast errors that we'd make if we were to use a particularly good forecast. That is, the ε_t's represent that part of the evolution of y that's linearly unpredictable on the basis of the past of y. Note also that the ε_t's, although uncorrelated, are not necessarily independent. Again, it's only for Gaussian random variables that lack of correlation implies independence, and the innovations are not necessarily Gaussian.

In our statement of Wold's theorem we assumed a zero mean. That may seem restrictive, but it's not. Rather, whenever you see y_t, just read y_t - μ, so that the process is expressed in deviations from its mean. The deviation from the mean has a zero mean, by construction. Working with zero-mean processes therefore involves no loss of generality while facilitating notational economy. We'll use this device frequently.

The General Linear Process

Wold's theorem tells us that when formulating forecasting models for covariance stationary time series, we need only consider models of the form

y_t = B(L)ε_t = Σ_{i=0}^∞ b_i ε_{t-i}
ε_t ~ WN(0, σ²),

where b_0 = 1 and Σ_{i=0}^∞ b_i² < ∞. The unconditional mean is E(y_t) = 0, and the unconditional variance is

var(y_t) = var(Σ_{i=0}^∞ b_i ε_{t-i}) = Σ_{i=0}^∞ b_i² var(ε_{t-i}) = Σ_{i=0}^∞ b_i² σ² = σ² Σ_{i=0}^∞ b_i².

At this point, in parallel to our discussion of white noise, we could compute and examine the autocovariance and autocorrelation functions of the general linear process. Those calculations, however, are rather involved, and not particularly revealing, so we'll proceed instead to examine the conditional mean and variance, where the information set Ω_{t-1} on which we condition contains past innovations; that is, Ω_{t-1} = {ε_{t-1}, ε_{t-2}, . . .}. In this manner, we can see how dynamics are modeled via conditional moments.¹³ The conditional mean is

E(y_t | Ω_{t-1}) = 0 + b_1 ε_{t-1} + b_2 ε_{t-2} + ··· = Σ_{i=1}^∞ b_i ε_{t-i},

and the conditional variance is

var(y_t | Ω_{t-1}) = E[(y_t - E(y_t | Ω_{t-1}))² | Ω_{t-1}] = E(ε_t² | Ω_{t-1}) = E(ε_t²) = σ².

The key insight is that the conditional mean moves over time in response to the evolving information set. The model captures the dynamics of the process, and the evolving conditional mean is one crucial way of summarizing them. An important goal of time series modeling,

¹² Moreover, we require that the covariance stationary processes don't contain any deterministic components.

¹³ Although Wold's theorem guarantees only serially uncorrelated white noise innovations, we shall sometimes make a stronger assumption of independent white noise innovations to focus the discussion. We do so, for example, in the following characterization of the conditional moment structure of the general linear process.
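The unconditional-variance formula var(y_t) = σ² Σ b_i² is easy to check by simulation. The sketch below is illustrative only: the geometric weights b_i = 0.5^i and all names are my own choices, and the infinite distributed lag is truncated at 50 terms, beyond which the omitted weights are negligible.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0
n_lags = 50
b = 0.5 ** np.arange(n_lags)           # b_0 = 1, b_i = 0.5**i (truncated Wold weights)

T = 100_000
eps = rng.normal(0.0, sigma, size=T + n_lags)

# y_t = sum over i of b_i * eps_{t-i}, computed as a convolution;
# the first n_lags outputs are discarded so every kept y_t uses a full lag window
y = np.convolve(eps, b)[n_lags : n_lags + T]

theory_var = sigma**2 * np.sum(b**2)   # close to 1 / (1 - 0.25) = 4/3
sample_var = y.var()
```

With T = 100,000 draws, the sample variance lands within a few hundredths of the theoretical value, illustrating that the variance of the general linear process is driven entirely by σ² and the squared weights.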
especially for forecasters, is capturing such conditional mean dynamics: the unconditional mean is constant (a requirement of stationarity), but the conditional mean varies in response to the evolving information set.¹⁴

Rational Distributed Lags

As we've seen, the Wold representation points to the crucial importance of models with infinite distributed lags. Infinite distributed lag models, in turn, are stated in terms of infinite polynomials in the lag operator, which are therefore very important as well. Infinite distributed lag models are not of immediate practical use, however, because they contain infinitely many parameters, which certainly inhibits practical application! Fortunately, infinite polynomials in the lag operator needn't contain infinitely many free parameters. The infinite polynomial B(L) may, for example, be a ratio of finite-order (and perhaps very low-order) polynomials. Such polynomials are called rational polynomials, and distributed lags constructed from them are called rational distributed lags.

Suppose, for example, that

B(L) = Θ(L) / Φ(L).

Then we can find an approximation of the Wold representation using a rational distributed lag. Rational distributed lags produce models of cycles that economize on parameters (they're parsimonious), while nevertheless providing accurate approximations to the Wold representation. The popular ARMA and ARIMA forecasting models, which we'll study shortly, are simply rational approximations to the Wold representation.

ESTIMATION AND INFERENCE FOR THE MEAN, AUTOCORRELATION, AND PARTIAL AUTOCORRELATION FUNCTIONS

Now suppose we have a sample of data on a time series, and we don't know the true model that generated the data, or the mean, autocorrelation function, or partial autocorrelation function associated with that true model. Instead, we want to use the data to estimate the mean, autocorrelation function, and partial autocorrelation function, which we might then use to help us learn about the underlying dynamics and to decide on a suitable model or set of models to fit to the data.
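To see how a ratio of low-order polynomials implies an infinite distributed lag, note that Φ(L)B(L) = Θ(L) pins down the coefficients of B(L) recursively. The sketch below is illustrative (the helper and the particular Θ, Φ are my own choices): with only two free parameters, Θ(L) = 1 + 0.5L and Φ(L) = 1 - 0.9L, the implied B(L) has infinitely many nonzero, geometrically decaying coefficients.

```python
def rational_lag_coeffs(theta, phi, n):
    """First n coefficients b_j of B(L) = Theta(L) / Phi(L), where
    Theta(L) = theta[0] + theta[1]*L + ... and
    Phi(L)   = 1 - phi[0]*L - phi[1]*L**2 - ...
    Uses the recursion implied by Phi(L) * B(L) = Theta(L):
    b_j = theta_j + sum over i of phi_i * b_{j-i}."""
    b = []
    for j in range(n):
        val = theta[j] if j < len(theta) else 0.0
        for i, p in enumerate(phi, start=1):
            if j - i >= 0:
                val += p * b[j - i]
        b.append(val)
    return b

# Two free parameters generate an infinite, geometrically decaying distributed lag:
b = rational_lag_coeffs(theta=[1.0, 0.5], phi=[0.9], n=6)
# b_0 = 1, b_1 = 0.5 + 0.9 = 1.4, and b_j = 0.9 * b_{j-1} thereafter
```

This is exactly the parsimony argument in the text: a low-order rational polynomial can stand in for an infinite-order Wold polynomial.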
This estimator, viewed as a function of τ, is called the sample autocorrelation function or correlogram. Note that some of the summations begin at t = τ + 1, not at t = 1; this is necessary because of the appearance of y_{t-τ} in the sum. Note that we divide those same sums by T, even though only (T - τ) terms appear in the sum. When T is large relative to τ (which is the relevant case), division by T or by T - τ will yield approximately the same result, so it won't make much difference for practical purposes; moreover, there are good mathematical reasons for preferring division by T.

It's often of interest to assess whether a series is reasonably approximated as white noise, which is to say whether all its autocorrelations are 0 in population. A key result, which we simply assert, is that if a series is white noise, then the distribution of the sample autocorrelations in large samples is

ρ̂(τ) ~ N(0, 1/T).

Using this result, we provide 95% bounds for the sample autocorrelations taken one at a time. Ultimately, we're often interested in whether a series is white noise, that is, whether all its autocorrelations are jointly 0. A simple extension lets us test that hypothesis. Rewrite the expression as

√T ρ̂(τ) ~ N(0, 1).

Squaring both sides yields¹⁵

T ρ̂²(τ) ~ χ²_1.

It can be shown that, in addition to being approximately normally distributed, the sample autocorrelations at various displacements are approximately independent of one another. Recalling that the sum of independent χ² variables is also χ² with degrees of freedom equal to the sum of the degrees of freedom of the variables summed, we have shown that the Box-Pierce Q-statistic,

Q_BP = T Σ_{τ=1}^m ρ̂²(τ),

is approximately distributed as a χ²_m random variable under the null hypothesis that y is white noise.¹⁶ A slight modification of this, designed to follow more closely the χ² distribution in small samples, is

Q_LB = T(T + 2) Σ_{τ=1}^m (1 / (T - τ)) ρ̂²(τ).

¹⁵ We square the sample autocorrelations ρ̂(τ) so that positive and negative values don't cancel when we sum across various values of τ, as we will soon do.

¹⁶ m is a maximum displacement selected by the user. Shortly we'll discuss how to choose it.
[Figure: a time series plot, with values roughly between 80 and 115 on the vertical axis and time on the horizontal axis.]

We compute the Q-statistic and its p-value under the null hypothesis of white noise for various values of m (the number of terms in the sum that underlies the Q-statistic).¹⁸ The p-value is consistently 0 to four decimal places, so the null hypothesis of white noise is decisively rejected.

¹⁷ A correlogram analysis simply means examination of the sample autocorrelation and partial autocorrelation functions (with two-standard-error bands), together with related diagnostics, such as Q-statistics.

¹⁸ We show the Ljung-Box version of the Q-statistic.
[Figure: sample autocorrelations plotted against displacement (up to 12).]

Bibliographical and Computational Notes

Wold's theorem was originally proved in a 1938 monograph. The use of rational distributed lags in econometric modeling dates at least to Jorgenson (1966). Bartlett (1946) derived the standard errors of the sample autocorrelations and partial autocorrelations of white noise. In fact, the plus-or-minus two-standard-error bands are often called the "Bartlett bands."

The two variants of the Q-statistic that we introduced were developed by Box and Pierce (1970) and Ljung and Box (1978).
Learning Objectives

After completing this reading you should be able to:

• Describe the properties of the first-order moving average (MA(1)) process, and distinguish between autoregressive representation and moving average representation.
• Describe the properties of a general finite-order moving average process of order q (MA(q)).
• Describe the properties of the first-order autoregressive (AR(1)) process, and define and explain the Yule-Walker equation.
• Describe the properties of a general pth-order autoregressive (AR(p)) process.
• Define and describe the properties of the autoregressive moving average (ARMA) process.
• Describe the application of AR and ARMA processes.
When building forecasting models, we don't want to pretend that the model we fit is true. Instead, we want to be aware that we're approximating a more complex reality. That's the modern view, and it has important implications for forecasting. In particular, we've seen that the key to successful time series modeling and forecasting is parsimonious, yet accurate, approximation of the Wold representation. In this chapter, we consider three approximations: moving average (MA) models, autoregressive (AR) models, and autoregressive moving average (ARMA) models. The three models vary in their specifics and have different strengths in capturing different sorts of autocorrelation behavior.

We begin by characterizing the autocorrelation functions and related quantities associated with each model, under the assumption that the model is "true." We do this separately for MA, AR, and ARMA models.¹ These characterizations have nothing to do with data or estimation, but they're crucial for developing a basic understanding of the properties of the models, which is necessary to perform intelligent modeling and forecasting. They enable us to make statements such as "If the data were really generated by an autoregressive process, then we'd expect its autocorrelation function to have property x." Armed with that knowledge, we use the sample autocorrelations and partial autocorrelations, in conjunction with the AIC and the SIC, to suggest candidate forecasting models, which we then estimate.

MOVING AVERAGE (MA) MODELS

The finite-order moving average process is a natural and obvious approximation to the Wold representation, which is an infinite-order moving average process. Finite-order moving average processes also arise naturally when series are expressed directly as distributed lags of current and past shocks, that is, as moving average processes.²

The MA(1) Process

The first-order moving average process, or MA(1) process, is

y_t = ε_t + θε_{t-1} = (1 + θL)ε_t
ε_t ~ WN(0, σ²).

The defining characteristic of the MA process in general, and the MA(1) in particular, is that the current value of the observed series is expressed as a function of current and lagged unobservable shocks. Think of it as a regression model with nothing but current and lagged disturbances on the right-hand side.

To help develop a feel for the behavior of the MA(1) process, we show two simulated realizations of length 150 in Figure 13-1. The processes are

y_t = ε_t + 0.4ε_{t-1}

and

y_t = ε_t + 0.95ε_{t-1},

where in each case ε_t ~ iid N(0, 1). To construct the realizations, we used the same series of underlying white noise shocks; the only difference in the realizations comes from the different coefficients. Past shocks feed positively into the current value of the series, with a small weight of θ = 0.4 in one case and a large weight of θ = 0.95 in the other. You might think that θ = 0.95 would induce much more persistence than θ = 0.4, but it doesn't. The structure of the MA(1) process, in which only the first lag of the shock appears on the right, limits its memory regardless of the parameter value.

[Figure 13-1: Realizations of two MA(1) processes (θ = 0.4 and θ = 0.95).]

¹ Sometimes, especially when characterizing population properties under the assumption that the models are correct, we refer to them as processes, which is short for stochastic processes; hence the terms moving average process, autoregressive process, and ARMA process.

² Economic equilibria, for example, may be disturbed by shocks that take some time to be fully assimilated.
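Realizations like those in Figure 13-1 take only a few lines to generate. The sketch below is illustrative (the seed and names are mine): it reuses one underlying shock series for both MA(1) processes, exactly as the text describes, so the two paths differ only through the coefficient.

```python
import numpy as np

rng = np.random.default_rng(13)
T = 150
eps = rng.normal(size=T + 1)          # one extra draw supplies the t = 1 lagged shock

# Same underlying white noise shocks, different MA(1) coefficients:
y_small = eps[1:] + 0.40 * eps[:-1]   # theta = 0.4
y_large = eps[1:] + 0.95 * eps[:-1]   # theta = 0.95
```

A quick sanity check: the population variance of an MA(1) is (1 + θ²)σ², so the θ = 0.95 realization should display visibly larger swings than the θ = 0.4 realization built from the same shocks.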
To compute the autocorrelation function for the MA(1) process, we must first compute the autocovariance function. We have

γ(τ) = E(y_t y_{t-τ}) = E((ε_t + θε_{t-1})(ε_{t-τ} + θε_{t-τ-1}))
     = θσ²,  τ = 1
       0,    otherwise.

The autocorrelation function is just the autocovariance function scaled by the variance,

ρ(τ) = γ(τ) / γ(0) = θ / (1 + θ²),  τ = 1
                     0,             otherwise.

The key feature here is the sharp cutoff in the autocorrelation function. All autocorrelations are 0 beyond displacement 1, the order of the MA process. In Figures 13-2 and 13-3, we show the autocorrelation functions for our two MA(1) processes with parameters θ = 0.4 and θ = 0.95. At displacement 1, the process with parameter θ = 0.4 has a smaller autocorrelation (0.34) than the process with parameter θ = 0.95 (0.50), but both drop to 0 beyond displacement 1.

[Figure 13-3: Population autocorrelation function, MA(1) process, θ = 0.95.]

Note that the requirements of covariance stationarity (constant unconditional mean, constant and finite unconditional variance, autocorrelation dependent only on displacement) are met for any MA(1) process, regardless of the values of its parameters. If, moreover, |θ| < 1, then we say that the MA(1) process is invertible. In that case, we can "invert" the MA(1) process and express the current value of the series not in terms of a current shock and a lagged shock but rather in terms of a current shock and lagged values of the series. That's called an autoregressive representation. An autoregressive representation
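The two quoted displacement-1 autocorrelations follow directly from ρ(1) = θ/(1 + θ²); a tiny worked check (function name mine):

```python
def ma1_rho1(theta):
    """rho(1) = theta / (1 + theta**2) for an MA(1); rho(tau) = 0 for tau > 1."""
    return theta / (1.0 + theta**2)

rho1_small = ma1_rho1(0.40)   # 0.4 / 1.16   ~ 0.34
rho1_large = ma1_rho1(0.95)   # 0.95 / 1.9025 ~ 0.50
```

Note also that θ/(1 + θ²) is maximized at θ = 1, where it equals 0.5, so no MA(1) process can have a displacement-1 autocorrelation larger than one-half.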
has a current shock and lagged observable values of the Autoregressive representations are appealing to fore
series on the right, whereas a moving average representa casters, because one way or another, if a model is to be
tion has a current shock and lagged unobservable shocks used for real-world forecasting, it must link the present
on the right. observables to the past history of observables, so that we
can extrapolate to form a forecast of future observables
Let's compute the autoregressive representation. The
based on present and past observables. Superficially, mov
process is
ing average models don't seem to meet that requirement,
Yt = Et + llEt-1 because the current value of a series is expressed in terms
E1 - WN(O, cil). of current and lagged unobservable shocks, not observable
variables. But under the invertibility conditions that we've
Thus, we can solve for the innovation as
ε_t = y_t − θε_{t-1}.

Lagging by successively more periods gives expressions for the innovations at various dates,

ε_{t-1} = y_{t-1} − θε_{t-2}
ε_{t-2} = y_{t-2} − θε_{t-3},

and so forth. Making use of these expressions for lagged innovations, we can substitute backward in the MA(1) process, yielding

y_t = ε_t + θy_{t-1} − θ²y_{t-2} + θ³y_{t-3} − ···.

In lag operator notation, we write the infinite autoregressive representation as

(1/(1 + θL)) y_t = ε_t.

Note that the back substitution used to obtain the autoregressive representation only makes sense, and in fact a convergent autoregressive representation only exists, if |θ| < 1, because in the back substitution we raise θ to progressively higher powers.

We can restate the invertibility condition in another way: The inverse of the root of the moving average lag operator polynomial (1 + θL) must be less than 1 in absolute value. Recall that a polynomial of degree m has m roots. Thus, the MA(1) lag operator polynomial has one root, which is the solution to

1 + θL = 0,

namely L = −1/θ, whose inverse is less than 1 in absolute value precisely when |θ| < 1.

As just described, moving average processes have equivalent autoregressive representations. Thus, although we want autoregressive representations for forecasting, we don't have to start with an autoregressive model. However, we typically restrict ourselves to invertible processes, because for forecasting purposes we want to be able to express current observables as functions of past observables.

Finally, let's consider the partial autocorrelation function for the MA(1) process. From the infinite autoregressive representation of the MA(1) process, we see that the partial autocorrelation function will decay gradually to 0. As we discussed in Chapter 12, the partial autocorrelations are just the coefficients on the last included lag in a sequence of progressively higher-order autoregressive approximations. If θ > 0, then the pattern of decay will be one of damped oscillation; otherwise, the decay will be one-sided.

In Figures 13-4 and 13-5 we show the partial autocorrelation functions for our example MA(1) processes. For each process, |θ| < 1, so that a convergent autoregressive representation exists,
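The back-substitution result can be checked numerically: for an invertible MA(1), a truncated version of the infinite autoregressive representation recovers the current innovation from past observables. A minimal sketch (simulated data; the parameter θ = 0.4 and the truncation lag are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.4  # invertible: |theta| < 1

# simulate the MA(1) process y_t = e_t + theta * e_{t-1}
e = rng.standard_normal(1000)
y = e + theta * np.concatenate(([0.0], e[:-1]))

# truncated autoregressive representation:
# e_t ~= y_t - theta*y_{t-1} + theta^2*y_{t-2} - ... (m terms)
m, t = 30, 500
e_hat = y[t] - sum((-1) ** (j + 1) * theta**j * y[t - j] for j in range(1, m + 1))

# theta^31 is negligible, so the innovation is recovered almost exactly
print(abs(e_hat - e[t]) < 1e-6)  # True
```

With θ = 0.95 the same truncation would need far more lags, which is the practical content of the convergence condition |θ| < 1.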
y_t = ε_t + θy_{t-1} − θ²y_{t-2} + θ³y_{t-3} − ···,

so the autoregressive representation for the process with θ = 0.4 is

y_t = ε_t + 0.4y_{t-1} − 0.4²y_{t-2} + ···
    = ε_t + 0.4y_{t-1} − 0.16y_{t-2} + ···,

and the autoregressive representation for the process with θ = 0.95 is

y_t = ε_t + 0.95y_{t-1} − 0.95²y_{t-2} + ···
    = ε_t + 0.95y_{t-1} − 0.9025y_{t-2} + ···.

The partial autocorrelations display a similar damped oscillation.³ The decay, however, is slower for the θ = 0.95 case.

³ Note, however, that the partial autocorrelations are not the successive coefficients in the infinite autoregressive representation. Rather, they are the coefficients on the last included lag in a sequence of progressively longer autoregressions. The two are related but distinct.

The MA(q) Process

Now consider the general finite-order moving average process of order q, or MA(q) for short,

y_t = ε_t + θ_1 ε_{t-1} + ··· + θ_q ε_{t-q} = Θ(L)ε_t
ε_t ~ WN(0, σ²),

where

Θ(L) = 1 + θ_1 L + ··· + θ_q L^q

is a qth-order lag operator polynomial. If the inverses of all roots of Θ(L) are inside the unit circle, then the process is invertible, in which case we have the convergent autoregressive representation

(1/Θ(L)) y_t = ε_t.

The conditional mean of the MA(q) process evolves with the information set, in contrast to the unconditional moments, which are fixed. In contrast to the MA(1) case, in which the conditional mean depends on only the first lag of the innovation, in the MA(q) case the conditional mean depends on q lags of the innovation. Thus, the MA(q) process has the potential for longer memory.

The potentially longer memory of the MA(q) process emerges clearly in its autocorrelation function. In the MA(1) case, all autocorrelations beyond displacement 1 are 0; in the MA(q) case, all autocorrelations beyond displacement q are 0. This autocorrelation cutoff is a distinctive property of moving average processes. The partial autocorrelation function of the MA(q) process, in contrast, decays gradually, in accord with the infinite autoregressive representation, in either an oscillating or a one-sided fashion, depending on the parameters of the process.

In closing this section, let's step back for a moment and consider in greater detail the precise way in which finite-order moving average processes approximate the Wold representation. The Wold representation is

y_t = B(L)ε_t,
where B(L) is of infinite order. The MA(q) process approximates this infinite moving average with a qth-order moving average,

y_t = Θ(L)ε_t,

whereas the MA(1) process approximates the infinite moving average with only a first-order moving average, which can sometimes be very restrictive.
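The autocorrelation cutoff beyond displacement q can be checked directly from the coefficients. A small sketch with hypothetical MA(2) parameters θ_1 = 0.6, θ_2 = 0.3 (not from the text), using the standard finite-MA autocovariance formula γ(τ) = σ² Σ_j θ_j θ_{j+τ} with θ_0 = 1:

```python
import numpy as np

theta = np.array([1.0, 0.6, 0.3])  # hypothetical MA(2); theta_0 = 1 by convention

def ma_autocorr(theta, tau):
    """Theoretical autocorrelation of a finite MA process at displacement tau."""
    if tau >= len(theta):
        return 0.0  # the cutoff: autocorrelations vanish beyond displacement q
    return float(np.sum(theta[: len(theta) - tau] * theta[tau:]) / np.sum(theta**2))

print([round(ma_autocorr(theta, t), 3) for t in range(5)])  # [1.0, 0.538, 0.207, 0.0, 0.0]
```

The exact zeros at displacements 3 and 4 are the distinctive moving average cutoff; an autoregression would instead decay gradually.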
AUTOREGRESSIVE (AR) MODELS

The autoregressive process is also a natural approximation to the Wold representation. We've seen, in fact, that under certain conditions a moving average process has an autoregressive representation, so an autoregressive process is in a sense the same as a moving average process. Like the moving average process, the autoregressive process has direct motivation; it's simply a stochastic difference equation, a simple mathematical model in which the current value of a series is linearly related to its past values, plus an additive stochastic shock. Stochastic difference equations are a natural vehicle for discrete-time stochastic dynamic modeling.

The AR(1) Process

The first-order autoregressive process, AR(1) for short, is

y_t = φy_{t-1} + ε_t
ε_t ~ WN(0, σ²).

In lag operator form, we write

(1 − φL)y_t = ε_t.

Our two example realizations are generated by the processes y_t = 0.4y_{t-1} + ε_t and y_t = 0.95y_{t-1} + ε_t, where in each case ε_t ~ iid N(0, 1), and the same innovation sequence underlies each realization. The fluctuations in the AR(1) with parameter φ = 0.95 appear much more persistent than those of the AR(1) with parameter φ = 0.4. This contrasts sharply with the MA(1) process, which has a very short memory regardless of parameter value. Thus, the AR(1) model is capable of capturing much more persistent dynamics than is the MA(1).

Recall that a finite-order moving average process is always covariance stationary but that certain conditions must be satisfied for invertibility, in which case an autoregressive representation exists. For autoregressive processes, the situation is precisely the reverse. Autoregressive processes are always invertible (in fact, invertibility isn't even an issue, as finite-order autoregressive processes already are in autoregressive form), but certain conditions must be satisfied for an autoregressive process to be covariance stationary.

If we begin with the AR(1) process,

y_t = φy_{t-1} + ε_t,

and substitute backward for lagged y's on the right side, we obtain

y_t = ε_t + φε_{t-1} + φ²ε_{t-2} + ···.
In lag operator form, we write

y_t = (1/(1 − φL)) ε_t.

This moving average representation for y is convergent if and only if |φ| < 1; thus, |φ| < 1 is the condition for covariance stationarity in the AR(1) case. Equivalently, the condition for covariance stationarity is that the inverse of the root of the autoregressive lag operator polynomial be less than 1 in absolute value.

From the moving average representation of the covariance stationary AR(1) process, we can compute the unconditional mean and variance,

E(y_t) = E(ε_t + φε_{t-1} + φ²ε_{t-2} + ···)
       = E(ε_t) + φE(ε_{t-1}) + φ²E(ε_{t-2}) + ···
       = 0

and

var(y_t) = var(ε_t + φε_{t-1} + φ²ε_{t-2} + ···)
         = σ² + φ²σ² + φ⁴σ² + ···
         = σ²/(1 − φ²).

To find the autocovariances, we proceed as follows. The process is

y_t = φy_{t-1} + ε_t,

so that, multiplying both sides of the equation by y_{t-τ}, we obtain

y_t y_{t-τ} = φ y_{t-1} y_{t-τ} + ε_t y_{t-τ}.

For τ ≥ 1, taking expectations of both sides gives

γ(τ) = φγ(τ − 1).

This is called the Yule-Walker equation. It is a recursive equation; that is, given γ(τ), for any τ, the Yule-Walker equation immediately tells us how to get γ(τ + 1). If we knew γ(0) to start things off (an "initial condition"), we could use the Yule-Walker equation to determine the entire autocovariance sequence. And we do know γ(0); it's just the variance of the process, which we already showed to be γ(0) = σ²/(1 − φ²). Thus, we have

γ(0) = σ²/(1 − φ²)
γ(1) = φσ²/(1 − φ²),

and so on. In general, then,

γ(τ) = φ^τ σ²/(1 − φ²),   τ = 0, 1, 2, ….

The autocorrelations ρ(τ) = φ^τ therefore decay gradually. The partial autocorrelation function of the AR(1) process, in contrast, cuts off abruptly; specifically,

p(τ) = φ,  τ = 1
p(τ) = 0,  τ > 1.
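The Yule-Walker recursion can be sketched numerically (hypothetical values φ = 0.95 and σ² = 1), checking the iterates against the closed form γ(τ) = φ^τ σ²/(1 − φ²):

```python
sigma2, phi = 1.0, 0.95  # hypothetical AR(1) parameters

# initial condition: gamma(0) = sigma^2 / (1 - phi^2)
gamma = [sigma2 / (1 - phi**2)]
# Yule-Walker recursion: gamma(tau) = phi * gamma(tau - 1)
for _ in range(5):
    gamma.append(phi * gamma[-1])

# agrees with the closed form gamma(tau) = phi^tau * sigma^2 / (1 - phi^2)
closed = [phi**tau * sigma2 / (1 - phi**2) for tau in range(6)]
print(all(abs(a - b) < 1e-12 for a, b in zip(gamma, closed)))  # True
```

Given only the variance as an initial condition, the recursion generates the whole autocovariance sequence, which is exactly the point made in the text.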
FIGURE 13-9 Population autocorrelation function, AR(1) process, φ = 0.95.

FIGURE 13-10 Population partial autocorrelation function, AR(1) process, φ = 0.95.
For the covariance stationary AR(p) process, the moving average representation is

y_t = (1/Φ(L)) ε_t.

The autocorrelation function for the general AR(p) process, as with that of the AR(1) process, decays gradually with displacement. Finally, the AR(p) partial autocorrelation function has a sharp cutoff at displacement p, for the same reason that the AR(1) partial autocorrelation function has a sharp cutoff at displacement 1.

Let's discuss the AR(p) autocorrelation function in a bit greater depth. The key insight is that, in spite of the fact that its qualitative behavior (gradual damping) matches that of the AR(1) autocorrelation function, it can nevertheless display a richer variety of patterns, depending on the order and parameters of the process. It can, for example, have damped monotonic decay, as in the AR(1) case with a positive coefficient, but it can also have damped oscillation in ways that AR(1) can't have. In the AR(1) case, the only possible oscillation occurs when the coefficient is negative, in which case the autocorrelations switch signs at each successively longer displacement. In higher-order autoregressive models, however, the autocorrelations can oscillate with much richer patterns reminiscent of cycles in the more traditional sense. This occurs when some roots of the autoregressive lag operator polynomial are complex.⁵

Consider, for example, the AR(2) process,

y_t = 1.5y_{t-1} − 0.9y_{t-2} + ε_t.

The corresponding lag operator polynomial is 1 − 1.5L + 0.9L², with two complex conjugate roots, 0.83 ± 0.65i. The inverse roots are 0.75 ± 0.58i, both of which are close to, but inside, the unit circle; thus, the process is covariance stationary.⁴ It can be shown that the autocorrelation function for an AR(2) process is

ρ(0) = 1
ρ(1) = φ_1/(1 − φ_2)
ρ(τ) = φ_1 ρ(τ − 1) + φ_2 ρ(τ − 2),   τ = 2, 3, ….

Using this formula, we can evaluate the autocorrelation function for the process at hand; we plot it in Figure 13-11. Because the roots are complex, the autocorrelation function oscillates, and because the roots are close to the unit circle, the oscillation damps slowly.

FIGURE 13-11 Population autocorrelation function, AR(2) process with complex roots.

Finally, let's step back once again to consider in greater detail the precise way that finite-order autoregressive processes approximate the Wold representation. As always, the Wold representation is

y_t = B(L)ε_t,

where B(L) is of infinite order. The AR(1), as compared to the MA(1), is simply a different approximation to the Wold representation. The moving average representation associated with the AR(1) process is

y_t = (1/(1 − φL)) ε_t.

Thus, when we fit an AR(1) model, we're using 1/(1 − φL), a rational polynomial with degenerate numerator polynomial (degree 0) and denominator polynomial of degree 1, to approximate B(L). The moving average representation associated with the AR(1) process is of infinite order, as is the Wold representation, but it does not have infinitely many free coefficients. In fact, only one parameter, φ, underlies it.

⁴ A necessary condition for covariance stationarity, which is often useful as a quick check, is Σ_{i=1}^p φ_i < 1. If the condition is satisfied, the process may or may not be stationary; but if the condition is violated, the process can't be stationary.

⁵ Note that complex roots can't occur in the AR(1) case.
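The root calculation for this AR(2) example is easy to reproduce; a sketch using NumPy, mirroring the values quoted in the text:

```python
import numpy as np

phi1, phi2 = 1.5, -0.9  # AR(2): y_t = 1.5*y_{t-1} - 0.9*y_{t-2} + e_t

# roots of the lag operator polynomial 1 - 1.5L + 0.9L^2
roots = np.roots([0.9, -1.5, 1.0])        # coefficients in decreasing powers of L
inverse_roots = 1.0 / roots
print(np.all(np.abs(inverse_roots) < 1))  # True: covariance stationary
print(np.round(np.abs(inverse_roots), 2)) # modulus about 0.95, near the unit circle

# autocorrelation function from the recursion in the text
rho = [1.0, phi1 / (1 - phi2)]
for tau in range(2, 31):
    rho.append(phi1 * rho[-1] + phi2 * rho[-2])
print(min(rho) < 0)  # True: complex roots produce oscillation
```

Because the inverse roots have modulus near 1, the oscillation damps slowly, exactly as Figure 13-11 shows.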
The AR(p) is an obvious generalization of the AR(1) strategy for approximating the Wold representation. The moving average representation associated with the AR(p) process is

y_t = (1/Φ(L)) ε_t.

When we fit an AR(p) model to approximate the Wold representation we're still using a rational polynomial with degenerate numerator polynomial (degree 0), but the denominator polynomial is of higher degree.

AUTOREGRESSIVE MOVING AVERAGE (ARMA) MODELS

Autoregressive and moving average models are often combined in attempts to obtain better and more parsimonious approximations to the Wold representation, yielding the autoregressive moving average process, ARMA(p, q) process for short. As with moving average and autoregressive processes, ARMA processes also have direct motivation.⁶ First, if the random shock that drives an autoregressive process is itself a moving average process, then it can be shown that we obtain an ARMA process. Second, ARMA processes can arise from aggregation. For example, sums of AR processes, or sums of AR and MA processes, can be shown to be ARMA processes. Finally, AR processes observed subject to measurement error also turn out to be ARMA processes.

The simplest ARMA process that's not a pure autoregression or pure moving average is the ARMA(1, 1), given by

y_t = φy_{t-1} + ε_t + θε_{t-1}
ε_t ~ WN(0, σ²),

or, in lag operator form,

(1 − φL)y_t = (1 + θL)ε_t,

where |φ| < 1 is required for stationarity and |θ| < 1 is required for invertibility.⁷ If the covariance stationarity condition is satisfied, then we have the moving average representation

y_t = ((1 + θL)/(1 − φL)) ε_t,

which is an infinite distributed lag of current and past innovations. Similarly, if the invertibility condition is satisfied, then we have the infinite autoregressive representation,

((1 − φL)/(1 + θL)) y_t = ε_t.

The ARMA(p, q) process is a natural generalization of the ARMA(1, 1) that allows for multiple moving average and autoregressive lags. We write

y_t = φ_1 y_{t-1} + ··· + φ_p y_{t-p} + ε_t + θ_1 ε_{t-1} + ··· + θ_q ε_{t-q}
ε_t ~ WN(0, σ²),

or

Φ(L)y_t = Θ(L)ε_t,

where

Φ(L) = 1 − φ_1 L − φ_2 L² − ··· − φ_p L^p

and

Θ(L) = 1 + θ_1 L + θ_2 L² + ··· + θ_q L^q.

If the inverses of all roots of Φ(L) are inside the unit circle, then the process is covariance stationary and has convergent infinite moving average representation

y_t = (Θ(L)/Φ(L)) ε_t.

If the inverses of all roots of Θ(L) are inside the unit circle, then the process is invertible and has convergent infinite autoregressive representation

(Φ(L)/Θ(L)) y_t = ε_t.

As with autoregressions and moving averages, ARMA processes have a fixed unconditional mean but a time-varying conditional mean. In contrast to pure moving average or pure autoregressive processes, however, neither the autocorrelation nor partial autocorrelation functions of ARMA processes cut off at any particular displacement. Instead, each damps gradually, with the precise pattern depending on the process.

ARMA models approximate the Wold representation by a ratio of two finite-order lag operator polynomials, neither of which is degenerate. Thus, ARMA models use ratios of full-fledged polynomials in the lag operator to approximate the Wold representation.

⁶ For more extensive discussion, see Granger and Newbold (1986).

⁷ Both stationarity and invertibility need to be checked in the ARMA case, because both autoregressive and moving average components are present.
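The infinite moving average representation of the ARMA(1, 1) can be checked by simulation. A sketch with hypothetical parameters φ = 0.5, θ = 0.4: expanding (1 + θL)/(1 − φL) gives the weights ψ_0 = 1 and ψ_j = φ^{j-1}(φ + θ) for j ≥ 1, and the truncated expansion should reproduce the simulated series.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, theta = 0.5, 0.4  # hypothetical ARMA(1,1) parameters (|phi| < 1, |theta| < 1)

# simulate y_t = phi*y_{t-1} + e_t + theta*e_{t-1}
n = 200
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t] + theta * e[t - 1]

# MA weights from expanding (1 + theta*L) / (1 - phi*L):
# psi_0 = 1, psi_j = phi**(j-1) * (phi + theta) for j >= 1
m = 50
psi = np.concatenate(([1.0], (phi + theta) * phi ** np.arange(m)))

# the truncated infinite moving average reproduces the simulated value
t = 150
y_hat = sum(psi[j] * e[t - j] for j in range(m + 1))
print(abs(y_hat - y[t]) < 1e-8)  # True
```

The single geometric decay rate φ in the ψ-weights illustrates the parsimony point: an infinite-order representation governed by only two free parameters.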
ARMA models, by allowing for both moving average and autoregressive components, often provide accurate approximations to the Wold representation that nevertheless have just a few parameters. That is, ARMA models are often both highly accurate and highly parsimonious.

To estimate an MA(1) model, for example, we can express it in autoregressive form and use a computer to solve for the parameters that minimize the sum of squared residuals,

(μ̂, θ̂) = argmin_{μ,θ} Σ_t [y_t − (μ/(1 + θ) + θy_{t-1} − θ²y_{t-2} + ··· + (−1)^{m+1} θ^m y_{t-m})]².
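A crude version of this numerical minimization can be sketched with a grid search over θ (μ is set to 0 for simplicity; the simulated θ = 0.5 and the truncation lag m are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true = 0.5
e = rng.standard_normal(2000)
y = e + theta_true * np.concatenate(([0.0], e[:-1]))  # MA(1) with mu = 0

def ssr(theta, y, m=25):
    """Sum of squared residuals from the truncated AR representation."""
    resid = y[m:].astype(float).copy()
    for j in range(1, m + 1):
        resid -= (-1) ** (j + 1) * theta**j * y[m - j : len(y) - j]
    return float(np.sum(resid**2))

grid = np.linspace(-0.95, 0.95, 381)
theta_hat = grid[np.argmin([ssr(th, y) for th in grid])]
print(theta_hat)  # close to the true value 0.5, up to sampling error
```

In practice one would use a proper numerical optimizer rather than a grid, but the objective function being minimized is the same.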
TABLE 13-2 Employment MA(4) Model

LS // Dependent variable is CANEMP.
Sample: 1962:1 1993:4
Included observations: 128
Convergence achieved after 49 iterations

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          100.5438      0.843322     119.2234      0.0000
MA(1)      1.587641      0.063908     24.84246      0.0000
MA(2)      0.994369      0.089995     11.04917      0.0000
MA(3)      -0.020305     0.046550     -0.436189     0.6635
MA(4)      -0.298387     0.020489     -14.56311     0.0000

R²                   0.849951    Mean dependent var.     101.0176
Adjusted R²          0.845071    SD dependent var.       7.499163
SE of regression     2.951747    Akaike info criterion   2.203073
Sum squared resid.   1071.676    Schwarz criterion       2.314481
Log likelihood       -317.6208   F-statistic             174.1826
Durbin-Watson stat.  1.246600    Prob(F-statistic)       0.000000

In the AR(1) case, we simply regress y on one lag of y; in the AR(p) case, we regress y on p lags of y.

We estimate AR(p) models, p = 1, 2, 3, 4. Both the AIC and the SIC suggest that the AR(2) is best. To save space, we report only the results of AR(2) estimation in Table 13-3. The estimation results look good, and the residuals (Figure 13-14) look like white noise. The residual correlogram (Table 13-4 and Figure 13-15) supports that conclusion.

Finally, we consider ARMA(p, q) approximations to the Wold representation. ARMA models are estimated in a fashion similar to moving average models; they have autoregressive approximations with nonlinear restrictions on the parameters, which we impose when doing a numerical sum of squares minimization. We examine all ARMA(p, q) models with p and q less than or equal to 4; the SIC and AIC values appear in Tables 13-5 and 13-6. The SIC selects the AR(2) (an ARMA(2, 0)),
Employment MA(4) Model: Residual Correlogram

Sample: 1962:1 1993:4
Included observations: 128
Q-statistic probabilities adjusted for 4 ARMA term(s)

     Acorr.   P. Acorr.   Std. Error   Ljung-Box   p-value
1    0.345    0.345       .088         15.614
6    0.484    0.145       .088         183.70      0.000
11   0.081    -0.098      .088         207.89      0.000
12   0.029    -0.113      .088         208.01      0.000

FIGURE 13-13 Employment MA(4) model: residual sample autocorrelation functions, with plus or minus two-standard-error bands.

TABLE 13-3 Employment AR(2) Model

LS // Dependent variable is CANEMP.
Sample: 1962:1 1993:4
Included observations: 128
Convergence achieved after 3 iterations

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          101.2413      3.399620     29.78017      0.0000
AR(1)      1.438810      0.078487     18.33188      0.0000
AR(2)      -0.476451     0.077902     -6.116042     0.0000

R²                   0.963372    Mean dependent var.     101.0176
Adjusted R²          0.962786    SD dependent var.       7.499163
SE of regression     1.446663    Akaike info criterion   0.761677
Sum squared resid.   261.6041    Schwarz criterion       0.828522
Log likelihood       -227.3715   F-statistic             1643.837
Durbin-Watson stat.  2.067024    Prob(F-statistic)       0.000000
Inverted AR roots    .92   .52

whereas the AIC selects an ARMA(3, 1). The estimated ARMA(3, 1), however, has nearly identical autoregressive and moving average roots; canceling them brings us down to an ARMA(2, 0), or AR(2), model with roots virtually indistinguishable from those of our earlier estimated AR(2) process! We refer to this situation as one of common factors in an ARMA model. Be on the lookout for such situations, which arise frequently and can lead to substantial model simplification. Thus, we arrive at an AR(2) model for employment.

Bibliographical and Computational Notes

Characterization of time series by means of autoregressive, moving average, or ARMA
FIGURE 13-14 Employment AR(2) model, residual plot.

TABLE 13-4 Employment AR(2) Model: Residual Correlogram

Sample: 1962:1 1993:4
Included observations: 128
Q-statistic probabilities adjusted for 2 ARMA term(s)

     Acorr.   P. Acorr.   Std. Error   Ljung-Box   p-value
1    -0.035   -0.035      .088         0.1606
2    0.044    0.042       .088         0.4115
3    0.011    0.014       .088         0.4291      0.512
4    0.051    0.050       .088         0.7786      0.678
5    0.002    0.004       .088         0.7790      0.854
6    0.019    0.015       .088         0.8272      0.935
7    -0.024   -0.024      .088         0.9036      0.970
8    0.078    0.072       .088         1.7382      0.942
9    0.080    0.087       .088         2.6236      0.918
10   0.050    0.050       .088         2.9727      0.936
11   -0.023   -0.027      .088         3.0504      0.962
12   -0.129   -0.148      .088         5.4385      0.860

TABLE 13-5 Employment AIC Values, Various ARMA Models

                      MA Order
              0       1       2       3       4
AR Order  0           2.86    2.32    2.47    2.20
          1   1.01    0.83    0.79    0.80    0.81
          2   0.762   0.77    0.78    0.80    0.80
          3   0.77    0.761   0.77    0.78    0.79
          4   0.79    0.79    0.77    0.79    0.80

TABLE 13-6 Employment SIC Values, Various ARMA Models

                      MA Order
              0       1       2       3       4
AR Order  0           2.91    2.38    2.56    2.31
          1   1.05    0.90    0.88    0.91    0.94
          2   0.83    0.86    0.89    0.92    0.96
          3   0.86    0.87    0.90    0.94    0.96
          4   0.90    0.92    0.93    0.97    1.00
FIGURE 13-15 Employment AR(2) model: residual sample autocorrelation and partial autocorrelation functions, with plus or minus two-standard-error bands.
FIGURE 13-16 Employment ARMA(3, 1) model, residual plot.
y_t = μ + v_t
Φ(L)v_t = ε_t
ε_t ~ WN(0, σ²),

and the ARMA model is

y_t = μ + v_t
Φ(L)v_t = Θ(L)ε_t.

⁸ That's why, for example, information on the number of iterations required for convergence is presented even for estimation of the autoregressive model.

⁹ Hence the notation "μ" for the intercept.
in threshold models with observed states (Granger and Teräsvirta, 1993) and allowing for time-varying transition probabilities in threshold models with latent states (Diebold, Lee, and Weinbach, 1994).

Concepts for Review

Moving average (MA) model
Autoregressive (AR) model
Autoregressive moving average (ARMA) model
Stochastic process
MA(1) process
Cutoff in the autocorrelation function
Invertibility
Autoregressive representation
MA(q) process
Complex roots
Condition for invertibility of the MA(q) process
Yule-Walker equation
AR(p) process
Condition for covariance stationarity
ARMA(p, q) process
Common factors
Box-Jenkins model

References and Additional Readings

Bollerslev, T. (1986). "Generalized Autoregressive Conditional Heteroskedasticity." Journal of Econometrics, 31, 307-327.

Bollerslev, T., Chou, R. Y., and Kroner, K. F. (1992). "ARCH Modeling in Finance: A Selective Review of the Theory and Empirical Evidence." Journal of Econometrics, 52, 5-59.

Box, G. E. P., Jenkins, G. W., and Reinsel, G. (1994). Time Series Analysis, Forecasting and Control. 3rd ed. Englewood Cliffs, N.J.: Prentice Hall.

Burns, A. F., and Mitchell, W. C. (1946). Measuring Business Cycles. New York: National Bureau of Economic Research.

Diebold, F. X., Lee, J.-H., and Weinbach, G. (1994). "Regime Switching with Time-Varying Transition Probabilities." In C. Hargreaves (ed.), Nonstationary Time Series Analysis and Cointegration. Oxford: Oxford University Press, 283-302. Reprinted in Diebold and Rudebusch (1999).

Diebold, F. X., and Lopez, J. (1995). "Modeling Volatility Dynamics." In Kevin Hoover (ed.), Macroeconometrics: Developments, Tensions and Prospects. Boston: Kluwer Academic Press, 427-472.

Diebold, F. X., and Rudebusch, G. D. (1996). "Measuring Business Cycles: A Modern Perspective." Review of Economics and Statistics, 78, 67-77. Reprinted in Diebold and Rudebusch (1999).

Diebold, F. X., and Rudebusch, G. D. (1999). Business Cycles: Durations, Dynamics, and Forecasting. Princeton, N.J.: Princeton University Press.

Engle, R. F. (1982). "Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K. Inflation." Econometrica, 50, 987-1008.

Fildes, R., and Steckler, H. (2000). "The State of Macroeconomic Forecasting." Manuscript.

Granger, C. W. J. (1990). "Aggregation of Time Series Variables: A Survey." In T. Barker and M. H. Pesaran (eds.), Disaggregation in Econometric Modelling. London: Routledge.

Granger, C. W. J., and Newbold, P. (1986). Forecasting Economic Time Series. 2nd ed. Orlando, Fla.: Academic Press.

Granger, C. W. J., and Teräsvirta, T. (1993). Modelling Nonlinear Economic Relationships. Oxford: Oxford University Press.

Hamilton, J. D. (1989). "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle." Econometrica, 57, 357-384.

McCullough, B. D., and Vinod, H. D. (1999). "The Numerical Reliability of Econometric Software." Journal of Economic Literature, 37, 633-665.

Newbold, P., Agiakloglou, C., and Miller, J. P. (1994). "Adventures with ARIMA Software." International Journal of Forecasting, 10, 573-581.

Pesaran, M. H., Pierse, R. G., and Kumar, M. S. (1989). "Econometric Analysis of Aggregation in the Context of Linear Prediction Models." Econometrica, 57, 861-888.

Slutsky, E. (1927). "The Summation of Random Causes as the Source of Cyclic Processes." Econometrica, 5, 105-146.

Stock, J. H., and Watson, M. W. (2003). "Macroeconomic Forecasting in the Euro Area: Country-Specific versus Area-Wide Information." European Economic Review, 47, 1-18.

Taylor, S. (1996). Modeling Financial Time Series. 2nd ed. New York: Wiley.

Tong, H. (1990). Non-linear Time Series. Oxford: Clarendon Press.
Learning Objectives

After completing this reading you should be able to:

• Define and distinguish between volatility, variance rate, and implied volatility.
• Describe the power law.
• Explain how various weighting schemes can be used in estimating volatility.
• Apply the exponentially weighted moving average (EWMA) model to estimate volatility.
• Describe the generalized autoregressive conditional heteroskedasticity (GARCH(p, q)) model for estimating volatility and its properties.
• Calculate volatility using the GARCH(1,1) model.
• Explain mean reversion and how it is captured in the GARCH(1,1) model.
• Explain the weights in the EWMA and GARCH(1,1) models.
• Explain how GARCH models perform in volatility forecasting.
• Describe the volatility term structure and the impact of volatility changes.

Excerpt is Chapter 10 of Risk Management and Financial Institutions, Fourth Edition, by John C. Hull.
An alternative definition of daily volatility of a variable is therefore the standard deviation of the proportional change in the variable during a day. This is the definition that is usually used in risk management.

Business Days vs. Calendar Days

One issue is whether time should be measured in calendar days or business days. As shown in Box 14-1, research
shows that volatility is much higher on business days than on non-business days. As a result, analysts tend to ignore weekends and holidays when calculating and using volatilities. The usual assumption is that there are 252 days per year.

Assuming that the returns on successive days are independent and have the same standard deviation, this means that

σ_year = σ_day √252

or

σ_day = σ_year / √252,

showing that the daily volatility is about 6% of annual volatility.

IMPLIED VOLATILITIES

Although risk managers usually calculate volatilities from historical data, they also try to keep track of what are known as implied volatilities. The one parameter in the Black-Scholes-Merton option pricing model that cannot be observed directly is the volatility of the underlying asset price. The implied volatility of an option is the volatility that gives the market price of the option when it is substituted into the pricing model.

The VIX Index

The CBOE publishes indices of implied volatility. The most popular index, the VIX, is an index of the implied volatility of 30-day options on the S&P 500 calculated from a wide range of calls and puts.¹ Trading in futures on the VIX started in 2004 and trading in options on the VIX started in 2006. A trade involving options on the S&P 500 is a bet on both the future level of the S&P 500 and the volatility of the S&P 500. By contrast, a futures or options contract on the VIX is a bet only on volatility. One contract is on 1,000 times the index.

¹ Similarly, the VXN is an index of the volatility of the NASDAQ 100 index and the VXD is an index of the volatility of the Dow Jones Industrial Average.
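The √252 annualization rule above is easy to verify; a quick sketch with a hypothetical 25% annual volatility:

```python
import math

annual_vol = 0.25  # hypothetical 25% annual volatility
daily_vol = annual_vol / math.sqrt(252)
print(round(daily_vol, 4))               # 0.0157, i.e. about 1.57% per day
print(round(daily_vol / annual_vol, 3))  # 0.063: daily vol is about 6% of annual
```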
Example 14.3

Suppose that a trader buys an April futures contract on the VIX when the futures price is $18.5 (corresponding to a 30-day S&P 500 volatility of 18.5%) and closes out the contract when the futures price is $19.3 (corresponding to an S&P 500 volatility of 19.3%). The trader makes a gain of $800.
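The arithmetic of Example 14.3 follows directly from the contract multiplier:

```python
# VIX futures P&L: one contract is on 1,000 times the index
buy_price, sell_price, multiplier = 18.5, 19.3, 1000
gain = (sell_price - buy_price) * multiplier
print(round(gain, 2))  # 800.0
```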
³ Kurtosis measures the size of a distribution's tails. A leptokurtic distribution has heavier tails than the normal distribution. A platykurtic distribution has less heavy tails than the normal distribution; a distribution with the same sized tails as the normal distribution is termed mesokurtic.

FIGURE: Comparison of normal distribution with a heavy-tailed distribution. Note: The two distributions have the same mean and standard deviation.
Example 14.4

Suppose that we know from experience that α = 3 for a particular financial variable and we observe that the probability that v > 10 is 0.05. Equation (14.1) gives

0.05 = K × 10⁻³

so that K = 50. The probability that v > 20 can now be calculated as

50 × 20⁻³ = 0.00625

The probability that v > 30 is

50 × 30⁻³ = 0.0019

and so on.

Equation (14.1) implies that

ln[Prob(v > x)] = ln K − α ln x
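The arithmetic in Example 14.4 is easy to check. The sketch below recovers K from the condition Prob(v > 10) = 0.05 with α = 3 and then evaluates the tail probabilities:

```python
# Power law: Prob(v > x) = K * x**(-alpha), Equation (14.1).
alpha = 3

# Calibrate K from the observed tail probability Prob(v > 10) = 0.05.
K = 0.05 / 10 ** (-alpha)

p20 = K * 20 ** (-alpha)   # Prob(v > 20)
p30 = K * 30 ** (-alpha)   # Prob(v > 30)
print(round(K, 6), round(p20, 5), round(p30, 4))  # 50.0 0.00625 0.0019
```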
Table 14-3 shows a possible sequence of stock prices. Suppose that we are interested in estimating the volatility for day 21 using 20 observations on the u_i, so that n = 21 and m = 20. In this case, ū = 0.00074 and the estimate of the standard deviation of the daily return calculated using Equation (14.2) is 1.49%.

Table 14-3 (extract)

Day i   Closing Price ($)   S_i/S_{i−1}   u_i = ln(S_i/S_{i−1})
15      20.70               1.00976        0.00971
16      20.90               1.00966        0.00962
17      20.40               0.97608       -0.02421
18      20.50               1.00490        0.00489
19      20.60               1.00488        0.00487
20      20.30               0.98544       -0.01467

For risk management purposes, the formula in Equation (14.2) is usually changed in a number of ways:

1. As explained, u_i is defined as the percentage change in the market variable between the end of day i − 1 and the end of day i, so that

   u_i = (S_i − S_{i−1})/S_{i−1}   (14.3)

   This makes very little difference to the values of u_i that are computed.
2. ū is assumed to be zero. The justification for this is that the expected change in a variable in one day is very small when compared with the standard deviation of changes.⁴
3. m − 1 is replaced by m. This moves us from an unbiased estimate of the volatility to a maximum likelihood estimate.

⁴ This is likely to be the case even if the variable happened to increase or decrease quite fast during the m days of our data.
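The last two columns of the Table 14-3 extract can be reproduced from the price relatives: before the change listed as item 1, u_i is the log return ln(S_i/S_{i−1}), and the percentage-change definition of Equation (14.3) is close to it. A short check (values agree with the table to within rounding):

```python
import math

# Price relatives S_i/S_{i-1} and tabulated returns u_i for days 15-20.
rows = [
    (15, 1.00976, 0.00971),
    (16, 1.00966, 0.00962),
    (17, 0.97608, -0.02421),
    (18, 1.00490, 0.00489),
    (19, 1.00488, 0.00487),
    (20, 0.98544, -0.01467),
]
for day, ratio, u_table in rows:
    u = math.log(ratio)                       # u_i = ln(S_i / S_{i-1})
    assert math.isclose(u, u_table, abs_tol=2e-5)
    # Equation (14.3)'s percentage change is very close to the log return:
    assert math.isclose(ratio - 1.0, u, abs_tol=5e-4)
```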
These three changes allow the formula for the variance rate to be simplified to

σ²_n = (1/m) Σ_{i=1}^{m} u²_{n−i}   (14.4)

where u_i is given by Equation (14.3).

Example 14.6

Consider again Example 14.5. When n = 21 and m = 20,

Σ_{i=1}^{20} u²_{n−i} = 0.00424

so that Equation (14.4) gives σ²_n = 0.00424/20 = 0.000212.

σ²_n = Σ_{i=1}^{m} α_i u²_{n−i}   (14.5)

The variable α_i is the amount of weight given to the observation i days ago. The α's are positive. If we choose them so that α_i < α_j when i > j, less weight is given to older observations. The weights must sum to unity, so that

Σ_{i=1}^{m} α_i = 1

An extension of the idea in Equation (14.5) is to assume that there is a long-run average variance rate and that this should be given some weight. This leads to the model that takes the form

σ²_n = γV_L + Σ_{i=1}^{m} α_i u²_{n−i}   (14.6)

where V_L is the long-run variance rate and γ is the weight assigned to V_L. Because the weights must sum to unity, we have

γ + Σ_{i=1}^{m} α_i = 1

This is known as an ARCH(m) model. It was first suggested by Engle.⁵ The estimate of the variance is based on a long-run average variance and m observations. The older an observation, the less weight it is given. Defining ω = γV_L, the model in Equation (14.6) can be written

σ²_n = ω + Σ_{i=1}^{m} α_i u²_{n−i}   (14.7)

In the next two sections, we discuss two important approaches to monitoring volatility using the ideas in Equations (14.5) and (14.6).

THE EXPONENTIALLY WEIGHTED MOVING AVERAGE MODEL

The estimate, σ_n, of the volatility for day n (made at the end of day n − 1) is calculated from σ_{n−1} (the estimate that was made at the end of day n − 2 of the volatility for day n − 1) and u_{n−1} (the most recent daily percentage change):

σ²_n = λσ²_{n−1} + (1 − λ)u²_{n−1}   (14.8)

To understand why Equation (14.8) corresponds to weights that decrease exponentially, we substitute for σ²_{n−1} to get

σ²_n = λ[λσ²_{n−2} + (1 − λ)u²_{n−2}] + (1 − λ)u²_{n−1}

or

σ²_n = (1 − λ)(u²_{n−1} + λu²_{n−2}) + λ²σ²_{n−2}

Substituting in a similar way for σ²_{n−2} gives

σ²_n = (1 − λ)(u²_{n−1} + λu²_{n−2} + λ²u²_{n−3}) + λ³σ²_{n−3}

⁵ See R. F. Engle, "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of U.K. Inflation," Econometrica 50 (1982): 987-1008. Robert Engle won the Nobel Prize for Economics in 2003 for his work on ARCH models.
Continuing in this way, we see that

σ²_n = (1 − λ) Σ_{i=1}^{m} λ^{i−1} u²_{n−i} + λ^m σ²_{n−m}

For a large m, the term λ^m σ²_{n−m} is sufficiently small to be ignored, so that Equation (14.8) is the same as Equation (14.5) with α_i = (1 − λ)λ^{i−1}. The weights for the u_i decline at rate λ as we move back through time. Each weight is λ times the previous weight.

Example 14.7

Suppose that λ is 0.90, the volatility estimated for a market variable for day n − 1 is 1% per day, and during day n − 1 the market variable increased by 2%. This means that σ²_{n−1} = 0.01² = 0.0001 and u²_{n−1} = 0.02² = 0.0004. Equation (14.8) gives

σ²_n = 0.9 × 0.0001 + 0.1 × 0.0004 = 0.00013

The estimate of the volatility for day n, σ_n, is therefore √0.00013, or 1.14% per day. Note that the expected value of u²_{n−1} is σ²_{n−1}, or 0.0001. In this example, the realized value of u²_{n−1} is greater than the expected value, and as a result our volatility estimate increases. If the realized value of u²_{n−1} had been less than its expected value, our estimate of the volatility would have decreased.

The EWMA approach has the attractive feature that the data storage requirements are modest. At any given time, we need to remember only the current estimate of the variance rate and the most recent observation on the value of the market variable. When we get a new observation on the value of the market variable, we calculate a new daily percentage change and use Equation (14.8) to update our estimate of the variance rate. The old estimate of the variance rate and the old value of the market variable can then be discarded.

The EWMA approach is designed to track changes in the volatility. Suppose there is a big move in the market variable on day n − 1 so that u²_{n−1} is large. From Equation (14.8) this causes our estimate of the current volatility to move upward. The value of λ governs how responsive the estimate of the daily volatility is to the most recent daily percentage change. A low value of λ leads to a great deal of weight being given to the most recent observation; a high value of λ (i.e., a value close to 1.0) produces estimates of the daily volatility that respond relatively slowly to new information provided by the daily percentage change.

The RiskMetrics database, which was originally created by JPMorgan and made publicly available in 1994, used the EWMA model with λ = 0.94 for updating daily volatility estimates. The company found that, across a range of different market variables, this value of λ gives forecasts of the variance rate that come closest to the realized variance rate.⁶ In 2006, RiskMetrics switched to using a long memory model. This is a model where the weights assigned to the u²_{n−i} decline less fast as i increases than in EWMA.

THE GARCH(1,1) MODEL

We now move on to discuss what is known as the GARCH(1,1) model proposed by Bollerslev in 1986.⁷ The difference between the EWMA model and the GARCH(1,1) model is analogous to the difference between Equation (14.5) and Equation (14.6). In GARCH(1,1), σ²_n is calculated from a long-run average variance rate, V_L, as well as from σ_{n−1} and u_{n−1}. The equation for GARCH(1,1) is

σ²_n = γV_L + αu²_{n−1} + βσ²_{n−1}   (14.9)

where γ is the weight assigned to V_L, α is the weight assigned to u²_{n−1}, and β is the weight assigned to σ²_{n−1}. Because the weights must sum to one:

γ + α + β = 1

The EWMA model is a particular case of GARCH(1,1) where γ = 0, α = 1 − λ, and β = λ.

The "(1,1)" in GARCH(1,1) indicates that σ²_n is based on the most recent observation of u² and the most recent estimate of the variance rate. The more general GARCH(p, q) model calculates σ²_n from the most recent p observations on u² and the most recent q estimates of the variance rate.

⁶ See JP Morgan, RiskMetrics Monitor, Fourth Quarter, 1995. We will explain an alternative (maximum likelihood) approach to estimating parameters later in the chapter.
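Example 14.7 and the exponential weight structure can be verified with a few lines (a sketch; the update is Equation (14.8)):

```python
import math

lam = 0.90                 # lambda in the EWMA model
var_prev = 0.01 ** 2       # sigma^2 for day n-1 (1% daily volatility)
u_sq_prev = 0.02 ** 2      # u^2 for day n-1 (2% daily move)

var_n = lam * var_prev + (1 - lam) * u_sq_prev   # Equation (14.8)
vol_n = math.sqrt(var_n)
assert math.isclose(var_n, 0.00013)
assert round(100 * vol_n, 2) == 1.14             # 1.14% per day

# Weights alpha_i = (1 - lambda) * lambda**(i-1) decline exponentially,
# each one lambda times the previous, and sum (almost) to one for large m.
weights = [(1 - lam) * lam ** (i - 1) for i in range(1, 101)]
assert math.isclose(sum(weights), 1.0, abs_tol=1e-4)
assert all(math.isclose(w2 / w1, lam) for w1, w2 in zip(weights, weights[1:]))
```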
This is the form of the model that is usually used for the purposes of estimating the parameters. Once ω, α, and β have been estimated, we can calculate γ as 1 − α − β. The long-term variance V_L can then be calculated as ω/γ. For a stable GARCH(1,1) model, we require α + β < 1. Otherwise the weight applied to the long-term variance is negative.

Substituting for σ²_{n−2} in the GARCH(1,1) equation, and continuing in this way, we see that the weight applied to u²_{n−i} is αβ^{i−1}. The weights decline exponentially at rate β. The parameter β can be interpreted as a decay rate. It is similar to λ in the EWMA model. It defines the relative importance of the observations on the u_i in determining the current variance rate. For example, if β = 0.9, u²_{n−2} is only 90% as important as u²_{n−1}; u²_{n−3} is 81% as important as u²_{n−1}; and so on. The GARCH(1,1) model is the same as the EWMA model except that, in addition to assigning weights that decline exponentially to past u², it also assigns some weight to the long-run average variance rate.

Example 14.8

Suppose that a GARCH(1,1) model is estimated from daily data as

σ²_n = 0.000002 + 0.13u²_{n−1} + 0.86σ²_{n−1}

This corresponds to α = 0.13, β = 0.86, and ω = 0.000002. Because γ = 1 − α − β, it follows that γ = 0.01 and the long-term variance rate is V_L = ω/γ = 0.0002.
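The long-run variance implied by Example 14.8 follows directly from ω/γ. A quick check:

```python
import math

omega, alpha, beta = 0.000002, 0.13, 0.86   # estimated GARCH(1,1) parameters
gamma = 1 - alpha - beta                    # weight on the long-run variance
V_L = omega / gamma                         # long-run variance rate

assert math.isclose(gamma, 0.01)
assert math.isclose(V_L, 0.0002)
# Long-run daily volatility implied by the model:
assert round(100 * math.sqrt(V_L), 2) == 1.41   # about 1.41% per day
```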
the parameters that maximize the chance (or likelihood) of the data occurring.

We start with a very simple example. Suppose that we sample 10 stocks at random on a certain day and find that the price of one of them declined during the day and the prices of the other nine either remained the same or increased. What is our best estimate of the proportion of stock prices that declined during the day? The natural answer is 0.1. Let us see if this is the result given by maximum likelihood methods. Suppose that the probability of a price decline is p. The probability that one particular stock declines and the other nine do not is p(1 − p)⁹. (There is a probability p that it will decline and 1 − p that each of the other nine will not.) Using the maximum likelihood approach, the best estimate of p is the one that maximizes p(1 − p)⁹. Differentiating this expression with respect to p and setting the result equal to zero, it can be shown that p = 0.1 maximizes the expression. The maximum likelihood estimate of p is therefore 0.1, as expected.

Estimating a Constant Variance

In our next example of maximum likelihood methods, we consider the problem of estimating a variance of a variable X from m observations on X when the underlying distribution is normal. We assume that the observations are u₁, u₂, . . . , u_m and that the mean of the underlying normal distribution is zero. Denote the variance by v. The likelihood of u_i being observed is the probability density function for X when X = u_i. This is

(1/√(2πv)) exp(−u_i²/(2v))

The likelihood of m observations occurring in the order in which they are observed is

Π_{i=1}^{m} (1/√(2πv)) exp(−u_i²/(2v))

Taking logarithms and ignoring multiplicative factors, it can be seen that we wish to maximize

Σ_{i=1}^{m} [−ln(v) − u_i²/v]   (14.12)

or

−m ln(v) − Σ_{i=1}^{m} u_i²/v

Differentiating this expression with respect to v and setting the result equal to zero, we find that the maximum likelihood estimator of v is

(1/m) Σ_{i=1}^{m} u_i²

This maximum likelihood estimator is the one we used in Equation (14.4). The corresponding unbiased estimator is the same with m replaced by m − 1.

Estimating EWMA or GARCH(1,1)

We now consider how the maximum likelihood method can be used to estimate the parameters when EWMA, GARCH(1,1), or some other volatility updating procedure is used. Define v_i = σ²_i as the variance estimated for day i. Assume that the probability distribution of u_i conditional on the variance is normal. A similar analysis to the one just given shows the best parameters are the ones that maximize

Π_{i=1}^{m} (1/√(2πv_i)) exp(−u_i²/(2v_i))

Taking logarithms we see that this is equivalent to maximizing

Σ_{i=1}^{m} [−ln(v_i) − u_i²/v_i]   (14.13)
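Both maximum likelihood calculations above can be confirmed numerically. The sketch below maximizes p(1 − p)⁹ over a grid, and checks that v = (1/m)Σu_i² maximizes the log-likelihood in Equation (14.12) for some illustrative (made-up) observations:

```python
import math

# 1. Bernoulli example: maximize p * (1 - p)**9 over a fine grid of p values.
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=lambda p: p * (1 - p) ** 9)
assert math.isclose(best_p, 0.1)

# 2. Constant-variance example: the log-likelihood -m*ln(v) - sum(u^2)/v
#    is maximized at v = mean of the squared observations.
u = [0.011, -0.006, 0.002, -0.014, 0.008]   # illustrative returns (made up)
m = len(u)
sum_sq = sum(x * x for x in u)

def log_likelihood(v):
    return -m * math.log(v) - sum_sq / v    # Equation (14.12)

v_hat = sum_sq / m
# The MLE beats nearby candidate values of v:
for v in (0.5 * v_hat, 0.9 * v_hat, 1.1 * v_hat, 2.0 * v_hat):
    assert log_likelihood(v_hat) > log_likelihood(v)
```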
Table 14-4 (extract): Estimation of GARCH(1,1) parameters for the S&P 500

Date          i      S_i       u_i         v_i = σ²_i    −ln(v_i) − u²_i/v_i
18-Jul-2005   1      1221.13
19-Jul-2005   2      1229.35   0.006731
20-Jul-2005   3      1235.20   0.004759    0.00004531    9.5022
21-Jul-2005   4      1227.04   -0.006606   0.00004447    9.0393
22-Jul-2005   5      1233.68   0.005411    0.00004546    9.3545
25-Jul-2005   6      1229.03   -0.003769   0.00004517    9.6906
...
11-Aug-2010   1277   1089.47   -0.028179   0.00011834    2.3322
12-Aug-2010   1278   1083.61   -0.005379   0.00017527    8.4841
13-Aug-2010   1279   1079.25   -0.004024   0.00016327    8.6209
                                           Sum:          10,228.2349

Trial estimates of GARCH parameters: ω = 0.000001347, α = 0.08339, β = 0.9101

The table analyzes data on the S&P 500 between July 18, 2005, and August 13, 2010.⁹ The numbers in the table are based on trial estimates of the three GARCH(1,1) parameters: ω, α, and β. The first column in the table records the date. The second column counts the days. The third column shows the S&P 500 at the end of day i, S_i. The fourth column shows the proportional change in the index between the end of day i − 1 and the end of day i. This is u_i = (S_i − S_{i−1})/S_{i−1}. The fifth column shows the estimate of the variance rate, v_i = σ²_i, for day i made at the end of day i − 1. On day three, we start things off by setting the variance equal to u²₂. On subsequent days, Equation (14.10) is used. The sixth column tabulates the likelihood measure, −ln(v_i) − u²_i/v_i. The values in the fifth and sixth columns are based on the current trial estimates of ω, α, and β. We are interested in choosing ω, α, and β to maximize the sum of the numbers in the sixth column. This involves an iterative search procedure.¹⁰

In our example, the optimal values of the parameters turn out to be

ω = 0.0000013465
α = 0.083394
β = 0.910116

and the maximum value of the function in Equation (14.13) is 10,228.2349. (The numbers shown in Table 14-4 are actually those calculated on the final iteration of the search for the optimal ω, α, and β.)

⁹ The data and calculations can be found at http://www-2.rotman.utoronto.ca/~hull/ofod/GarchExample/index.html.
¹⁰ As discussed later, a general purpose algorithm such as Solver in Microsoft's Excel can be used.
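The first rows of Table 14-4 can be reproduced in a few lines. The sketch below applies the GARCH(1,1) updating formula with the trial parameters; the updating equation referred to as Equation (14.10) in the text is taken to be σ²_n = ω + αu²_{n−1} + βσ²_{n−1}:

```python
import math

omega, alpha, beta = 0.000001347, 0.08339, 0.9101   # trial GARCH parameters
S = {i + 1: p for i, p in enumerate(
    [1221.13, 1229.35, 1235.20, 1227.04, 1233.68, 1229.03])}  # days 1-6

u = {i: S[i] / S[i - 1] - 1 for i in range(2, 7)}   # proportional daily changes

v = {3: u[2] ** 2}                                  # day 3 seeded with u_2^2
for i in range(4, 7):                               # Equation (14.10)
    v[i] = omega + alpha * u[i - 1] ** 2 + beta * v[i - 1]

ll = {i: -math.log(v[i]) - u[i] ** 2 / v[i] for i in range(3, 7)}

# Compare with the fifth and sixth columns of Table 14-4:
assert math.isclose(v[3], 0.00004531, rel_tol=1e-3)
assert math.isclose(v[6], 0.00004517, rel_tol=1e-3)
assert math.isclose(ll[3], 9.5022, abs_tol=5e-4)
assert math.isclose(ll[6], 9.6906, abs_tol=5e-4)
```

Maximizing the sum of the `ll` values over ω, α, and β (the iterative search the text describes) is what Solver, or any general-purpose optimizer, does.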
FIGURE 14-4 S&P 500 index: July 18, 2005, to August 13, 2010.

FIGURE 14-5 GARCH(1,1) daily volatility of S&P 500 index: July 18, 2005, to August 13, 2010.
by the data. Most of the time, the volatility was less than 2% per day, but volatilities over 5% per day were experienced during the credit crisis. (Very high volatilities are also indicated by the VIX index; see Figure 14-1.)

An alternative, more robust approach to estimating parameters in GARCH(1,1) is known as variance targeting.¹¹ This involves setting the long-run average variance rate, V_L, equal to the sample variance calculated from the data (or to some other value that is believed to be reasonable). The value of ω then equals V_L(1 − α − β) and only two parameters, α and β, have to be estimated. In the data in Table 14-4, the sample variance gives a daily volatility of 1.5531%. Setting V_L equal to the sample variance, the values of α and β that maximize the objective function in Equation (14.13) can then be found.

When the EWMA model is used, the estimation procedure is relatively simple because only one parameter, λ, has to be estimated. In the data in Table 14-4, the value of λ that maximizes the objective function in Equation (14.13) is 0.9374 and the value of the objective function is 10,192.5104.

For both GARCH(1,1) and EWMA, we can use the Solver routine in Excel to search for the values of the parameters that maximize the likelihood function. The routine works well provided we structure our spreadsheet so that the parameters we are searching for have roughly equal values. For example, in GARCH(1,1) we could let cells A1, A2, and A3 contain ω × 10⁵, 10α, and β. We could then set B1 = A1/100,000, B2 = A2/10, and B3 = A3, use B1, B2, and B3 for the calculations, but ask Solver to calculate the values of A1, A2, and A3 that maximize the likelihood function. Sometimes, Solver gives a local maximum, so a number of different starting values for the parameters should be tested.

¹¹ See R. Engle and J. Mezrich, "GARCH for Groups," Risk (August 1996): 36-40.
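Variance targeting fixes ω from the sample variance rather than estimating it. A minimal sketch (the sample variance is backed out from the reported 1.5531% daily volatility, and the α and β values are the full-MLE estimates reported above, used here only to illustrate the formula):

```python
import math

V_L = 0.0002412   # sample variance of daily returns (1.5531% daily volatility)
assert round(100 * math.sqrt(V_L), 4) == 1.5531

alpha, beta = 0.083394, 0.910116   # illustrative weights (full-MLE estimates)
omega = V_L * (1 - alpha - beta)   # omega is implied, not estimated
assert math.isclose(omega, 1.565e-6, rel_tol=1e-3)
```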
¹² For a series x_t, the autocorrelation with a lag of k is the coefficient of correlation between x_t and x_{t+k}.

¹³ See G. M. Ljung and G. E. P. Box, "On a Measure of Lack of Fit in Time Series Models," Biometrika 65 (1978): 297-303.
On day n + t in the future, we have

σ²_{n+t} − V_L = α(u²_{n+t−1} − V_L) + β(σ²_{n+t−1} − V_L)

The expected value of u²_{n+t−1} is σ²_{n+t−1}. Hence,

E[σ²_{n+t} − V_L] = (α + β)E[σ²_{n+t−1} − V_L]

where E denotes expected value. Using this equation repeatedly yields

E[σ²_{n+t} − V_L] = (α + β)^t (σ²_n − V_L)

or

E[σ²_{n+t}] = V_L + (α + β)^t (σ²_n − V_L)   (14.14)

This equation forecasts the volatility on day n + t using the information available at the end of day n − 1. In the EWMA model, α + β = 1 and Equation (14.14) shows that the expected future variance rate equals the current variance rate. When α + β < 1, the final term in the equation becomes progressively smaller as t increases. Figure 14-6 shows the expected path followed by the variance rate for situations where the current variance rate is different from V_L. As mentioned earlier, the variance rate exhibits mean reversion with a reversion level of V_L and a reversion rate of 1 − α − β. Our forecast of the future variance rate tends toward V_L as we look further and further ahead. This analysis emphasizes the point that we must have α + β < 1 for a stable GARCH(1,1) process. When α + β > 1, the weight given to the long-term average variance is negative and the process is mean fleeing rather than mean reverting.

For the S&P 500 data considered earlier, α + β = 0.9935 and V_L = 0.0002075. Suppose that our estimate of the current variance rate per day is 0.0003. (This corresponds to a volatility of 1.732% per day.) In 10 days, the expected variance rate is

0.0002075 + 0.9935¹⁰(0.0003 − 0.0002075) = 0.0002942

The expected volatility per day is √0.0002942, or 1.72%, still well above the long-term volatility of 1.44% per day. However, the expected variance rate in 500 days is

0.0002075 + 0.9935⁵⁰⁰(0.0003 − 0.0002075) = 0.0002110

and the expected volatility per day is 1.45%, very close to the long-term volatility.

FIGURE 14-6 Expected path for the variance rate when (a) current variance rate is above long-term variance rate and (b) current variance rate is below long-term variance rate.
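The two forecasts can be reproduced directly from Equation (14.14):

```python
import math

V_L = 0.0002075    # long-run variance rate
decay = 0.9935     # alpha + beta
v_now = 0.0003     # current estimate of the daily variance rate

def forecast(t):
    """Expected variance rate t days ahead, Equation (14.14)."""
    return V_L + decay ** t * (v_now - V_L)

assert round(forecast(10), 7) == 0.0002942
assert math.isclose(forecast(500), 0.0002110, abs_tol=2e-7)
# Corresponding expected daily volatilities:
assert round(100 * math.sqrt(forecast(10)), 2) == 1.72
assert round(100 * math.sqrt(forecast(500)), 2) == 1.45
```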
Volatility Term Structures

Suppose it is day n. Define

V(t) = E(σ²_{n+t})

and

a = ln[1/(α + β)]

so that Equation (14.14) becomes

V(t) = V_L + e^{−at}[V(0) − V_L]

As t increases, this approaches V_L. Define σ(T) as the volatility per annum that should be used to price a T-day option under GARCH(1,1). Assuming 252 days per year, σ(T)² is 252 times the average variance rate per day, so that

σ(T)² = 252 [ V_L + (1 − e^{−aT})/(aT) × (V(0) − V_L) ]   (14.15)

The relationship between the volatilities of options and their maturities is referred to as the volatility term structure. The volatility term structure is usually calculated from implied volatilities, but Equation (14.15) provides an alternative approach for estimating it from the GARCH(1,1) model. Although the volatility term structure estimated from GARCH(1,1) is not the same as that calculated from implied volatilities, it is often used to predict the way that the actual volatility term structure will respond to volatility changes.

When the current volatility is above the long-term volatility, the GARCH(1,1) model estimates a downward-sloping volatility term structure. When the current volatility is below the long-term volatility, it estimates an upward-sloping volatility term structure. In the case of the S&P 500 data, a = ln(1/0.99351) = 0.006511 and V_L = 0.0002075. Suppose that the current variance rate per day, V(0), is estimated as 0.0003 per day. It follows from Equation (14.15) that

σ(T)² = 252 [ 0.0002075 + (1 − e^{−0.006511T})/(0.006511T) × (0.0003 − 0.0002075) ]

where T is measured in days. Table 14-6 shows the volatility per year for different values of T.

TABLE 14-6 S&P 500 Volatility Term Structure Predicted from GARCH(1,1)

Option life (days):              10     30     50     100    500
Option volatility (% per annum): 27.36  27.10  26.87  26.35  24.32

Impact of Volatility Changes

Equation (14.15) can be written

σ(T)² = 252 [ V_L + (1 − e^{−aT})/(aT) × (σ(0)²/252 − V_L) ]

When σ(0) changes by Δσ(0), σ(T) changes by approximately

(1 − e^{−aT})/(aT) × (σ(0)/σ(T)) × Δσ(0)

We assume as before that V(0) = 0.0003, so that σ(0) = √252 × √0.0003 = 27.50%. The table considers a 100-basis-point change in the instantaneous volatility from 27.50% per year to 28.50% per year. This means that Δσ(0) = 0.01, or 1%.

Many financial institutions use analyses such as this when determining the exposure of their books to volatility changes. Rather than consider an across-the-board increase of 1% in implied volatilities when calculating vega, they relate the size of the volatility increase that is considered to the maturity of the option. Based on Table 14-7, a

TABLE 14-7 Impact of 1% Increase in the Instantaneous Volatility Predicted from GARCH(1,1)

Option life (days): 10  30  50  100  500
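Table 14-6 can be reproduced from Equation (14.15); a sketch:

```python
import math

V_L = 0.0002075                    # long-run daily variance rate
V0 = 0.0003                        # current daily variance rate, V(0)
a = math.log(1 / 0.99351)          # about 0.006511

def annual_vol(T):
    """GARCH(1,1) option volatility (% per annum) for a T-day option."""
    avg_var = V_L + (1 - math.exp(-a * T)) / (a * T) * (V0 - V_L)
    return 100 * math.sqrt(252 * avg_var)   # Equation (14.15)

table_14_6 = {10: 27.36, 30: 27.10, 50: 26.87, 100: 26.35, 500: 24.32}
for T, vol in table_14_6.items():
    assert math.isclose(annual_vol(T), vol, abs_tol=0.02)
```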
0.97% volatility increase would be considered for a 10-day option, a 0.92% increase for a 30-day option, a 0.87% increase for a 50-day option, and so on.

SUMMARY

In risk management, the daily volatility of a market variable is defined as the standard deviation of the percentage daily change in the market variable. The daily variance rate is the square of the daily volatility. Volatility tends to be much higher on trading days than on nontrading days. As a result, nontrading days are ignored in volatility calculations. It is tempting to assume that daily changes in market variables are normally distributed. In fact, this is far from true. Most market variables have distributions for percentage daily changes with much heavier tails than the normal distribution. The power law has been found to be a good description of the tails of many distributions that are encountered in practice.

This chapter has discussed methods for attempting to keep track of the current level of volatility. Define u_i as the percentage change in a market variable between the end of day i − 1 and the end of day i. The variance rate of the market variable is calculated as a weighted average of the u²_i. The key feature of the methods that have been discussed here is that they do not give equal weight to the observations on the u²_i. The more recent an observation, the greater the weight assigned to it. In the EWMA model and the GARCH(1,1) model, the weights assigned to observations decrease exponentially as the observations become older. The GARCH(1,1) model differs from the EWMA model in that some weight is also assigned to the long-run average variance rate. Both the EWMA and GARCH(1,1) models have structures that enable forecasts of the future level of the variance rate to be produced relatively easily.

Maximum likelihood methods are usually used to estimate parameters in GARCH(1,1) and similar models from historical data. These methods involve using an iterative procedure to determine the parameter values that maximize the chance or likelihood that the historical data will occur. Once its parameters have been determined, a model can be judged by how well it removes autocorrelation from the u²_i/σ²_i.

The GARCH(1,1) model can be used to estimate a volatility for options from historical data. This analysis is often used to calculate the impact of a shock to volatility on the
Learning Objectives
After completing this reading you should be able to:
• Define correlation and covariance and differentiate • Define copula and describe the key properties of
between correlation and dependence. copulas and copula correlation.
• Calculate covariance using the EWMA and • Explain tail dependence.
GARCH(1,1) models. • Describe Gaussian copula, Student's t-copula,
• Apply the consistency condition to covariance. multivariate copula, and one-factor copula.
• Describe the procedure of generating samples from
a bivariate normal distribution.
• Describe properties of correlations between
normally distributed variables when using a
one-factor model.
Chapter 11 of Risk Management and Financial Institutions, Fourth Edition, by John C. Hull.
Excerpts
Suppose that a company has an exposure to two different market variables. In the case of each variable, it gains $10 million if there is a one-standard-deviation increase and loses $10 million if there is a one-standard-deviation decrease. If changes in the two variables have a high positive correlation, the company's total exposure is very high; if they have a correlation of zero, the exposure is less but still quite large; if they have a high negative correlation, the exposure is quite low because a loss on one of the variables is likely to be offset by a gain on the other. This example shows that it is important for a risk manager to estimate correlations between the changes in market variables as well as their volatilities when assessing risk exposures.

This chapter explains how correlations can be monitored in a similar way to volatilities. It also covers what are known as copulas. These are tools that provide a way of defining a correlation structure between two or more variables, regardless of the shapes of their probability distributions. Copulas have a number of applications in risk management. The chapter shows how a copula can be used to create a model of default correlation for a portfolio of loans. This model is used in the Basel II capital requirements.

DEFINITION OF CORRELATION

The coefficient of correlation, ρ, between two variables V₁ and V₂ is defined as

ρ = [E(V₁V₂) − E(V₁)E(V₂)] / [SD(V₁)SD(V₂)]   (15.1)

An analogy here is that variance rates were the fundamental variables for the EWMA and GARCH models, even though it is easier to develop intuition about volatilities.

Correlation vs. Dependence

Two variables are defined as statistically independent if knowledge about one of them does not affect the probability distribution for the other. Formally, V₁ and V₂ are independent if:

f(V₂ | V₁ = x) = f(V₂)

for all x, where f(.) denotes the probability density function and | is the symbol denoting "conditional on."

If the coefficient of correlation between two variables is zero, does this mean that there is no dependence between the variables? The answer is no. We can illustrate this with a simple example. Suppose that there are three equally likely values for V₁: −1, 0, and +1. If V₁ = −1 or V₁ = +1, then V₂ = +1. If V₁ = 0, then V₂ = 0. In this case, there is clearly a dependence between V₁ and V₂. If we observe the value of V₁, we know the value of V₂. Also, a knowledge of the value of V₂ will cause us to change our probability distribution for V₁. However, because E(V₁V₂) = 0 and E(V₁) = 0, it is easy to see that the coefficient of correlation between V₁ and V₂ is zero.

This example emphasizes the point that the coefficient of correlation measures one particular type of dependence between two variables. This is linear dependence. There are many other ways in which two variables can be
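The example above is easy to verify numerically. Treating the three equally likely outcomes as a population:

```python
import math

# V1 takes values -1, 0, +1 with equal probability; V2 is determined by V1.
v1 = [-1, 0, 1]
v2 = [1 if x != 0 else 0 for x in v1]   # V2 = 1 when V1 = -1 or +1, else 0

def mean(xs):
    return sum(xs) / len(xs)

cov = mean([x * y for x, y in zip(v1, v2)]) - mean(v1) * mean(v2)
assert cov == 0.0   # E(V1 V2) - E(V1)E(V2) = 0

var1 = mean([x * x for x in v1]) - mean(v1) ** 2   # 2/3
var2 = mean([y * y for y in v2]) - mean(v2) ** 2   # 2/9
rho = cov / math.sqrt(var1 * var2)
assert rho == 0.0   # zero correlation, yet V1 completely determines V2
```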
conditional on V₁. As we will see later, this is constant when V₁ and V₂ have a bivariate normal distribution. But, in other situations, the standard deviation of V₂ is liable to depend on the value of V₁.

FIGURE 15-1 Examples of ways in which V₂ can be dependent on V₁.

MONITORING CORRELATION

Exponentially weighted moving average and GARCH methods can be developed to monitor the variance rate of a variable. Similar approaches can be used to monitor the covariance rate between two variables. The variance rate per day of a variable is the variance of daily returns. Similarly, the covariance rate per day between two variables is defined as the covariance between the daily returns of the variables.

Suppose that X_i and Y_i are the values of two variables, X and Y, at the end of day i. The returns on the variables on day i are

x_i = (X_i − X_{i−1})/X_{i−1},    y_i = (Y_i − Y_{i−1})/Y_{i−1}

The covariance rate between X and Y on day n is

cov_n = E(x_n y_n) − E(x_n)E(y_n)

Risk managers assume that expected daily returns are zero when the variance rate per day is calculated. They do the same when calculating the covariance rate per day. This means that the covariance rate per day between X and Y on day n is assumed to be

cov_n = E(x_n y_n)

Using equal weights for the last m observations on the x_i and y_i gives the estimate

cov_n = (1/m) Σ_{i=1}^{m} x_{n−i} y_{n−i}   (15.3)

Similarly, the variance rate is estimated for variable X as

var_{x,n} = (1/m) Σ_{i=1}^{m} x²_{n−i}

and for variable Y as

var_{y,n} = (1/m) Σ_{i=1}^{m} y²_{n−i}

The correlation estimate on day n is

cov_n / (√var_{x,n} √var_{y,n})

EWMA

Most risk managers would agree that observations from long ago should not have as much weight as recent observations. The exponentially weighted moving average (EWMA) model for variances leads to weights that decline exponentially as we move back through time. A similar weighting scheme can be used for covariances. The formula for updating a covariance estimate in the EWMA model is:

cov_n = λ cov_{n−1} + (1 − λ)x_{n−1}y_{n−1}

A similar analysis to that presented for the EWMA volatility model shows that the weight given to x_{n−i}y_{n−i} declines as i increases (i.e., as we move back through time). The lower the value of λ, the greater the weight that is given to recent observations.
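A one-day EWMA update of the variances, the covariance, and the implied correlation can be sketched as follows (λ and the input numbers are illustrative, not taken from the text):

```python
import math

lam = 0.94                      # illustrative lambda

# Yesterday's estimates: 1% and 2% daily vols, correlation 0.5.
var_x, var_y = 0.01 ** 2, 0.02 ** 2
cov_xy = 0.5 * 0.01 * 0.02      # cov = rho * sigma_x * sigma_y

# Today's observed returns:
x, y = 0.005, 0.025

# EWMA updates: the same recursion applies to variances and the covariance.
var_x = lam * var_x + (1 - lam) * x * x
var_y = lam * var_y + (1 - lam) * y * y
cov_xy = lam * cov_xy + (1 - lam) * x * y

rho = cov_xy / math.sqrt(var_x * var_y)
assert math.isclose(rho, 0.5108, abs_tol=1e-3)   # correlation drifts up
```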
If we make a small change to a positive-semidefinite matrix that is calculated from observations on three variables (e.g., for the purposes of doing a sensitivity analysis), it is likely that the matrix will remain positive-semidefinite. However, if we do the same thing for observations on 100 variables, we have to be much more careful. An arbitrary small change to a positive-semidefinite 100 × 100 matrix is quite likely to lead to it no longer being positive-semidefinite.

MULTIVARIATE NORMAL DISTRIBUTIONS

Multivariate normal distributions are well understood and relatively easy to deal with. As we will explain in the next section, they can be useful tools for specifying the correlation structure between variables, even when the distributions of the variables are not normal.

We start by considering a bivariate normal distribution where there are only two variables, V1 and V2. Suppose that we know V1 has some value. Conditional on this, the value of V2 is normal with mean

μ2 + ρσ2(V1 − μ1)/σ1

and standard deviation

σ2√(1 − ρ²)

Here μ1 and μ2 are the unconditional means of V1 and V2, σ1 and σ2 are their unconditional standard deviations, and ρ is the coefficient of correlation between V1 and V2. Note that the expected value of V2 conditional on V1 is linearly dependent on the value of V1. This corresponds to Figure 15-1(a). Also, the standard deviation of V2 conditional on the value of V1 is the same for all values of V1.

Generating Random Samples from Normal Distributions

Most programming languages have routines for sampling a random number between zero and one, and many have routines for sampling from a normal distribution. (In Excel, the instruction =NORMSINV(RAND()) gives a random sample from a normal distribution.) When samples ε1 and ε2 from a bivariate normal distribution (where both variables have mean zero and standard deviation one) are required, the usual procedure involves first obtaining independent samples z1 and z2 from a univariate standardized normal distribution. The required samples ε1 and ε2 are then calculated as follows:

ε1 = z1
ε2 = ρz1 + z2√(1 − ρ²)

where ρ is the coefficient of correlation in the bivariate normal distribution.

Consider next the situation where we require samples from a multivariate normal distribution (where all variables have mean zero and standard deviation one) and the coefficient of correlation between variable i and variable j is ρij. We first sample n independent variables zi (1 ≤ i ≤ n) from univariate standardized normal distributions. The required samples are εi (1 ≤ i ≤ n), where

εi = Σ (j = 1 to i) αij zj        (15.5)

and the αij are parameters chosen to give the correct variances and the correct correlations for the εi. For each i, we must have

Σ (j = 1 to i) αij² = 1

and, for all j < i,

Σ (k = 1 to j) αik αjk = ρij

The first sample, ε1, is set equal to z1. These equations can be solved so that ε2 is calculated from z1 and z2, ε3 is calculated from z1, z2, and z3, and so on. The procedure is known as the Cholesky decomposition.

If we find ourselves trying to take the square root of a negative number when using the Cholesky decomposition, the variance-covariance matrix assumed for the variables is not internally consistent. As explained earlier, this is equivalent to saying that the matrix is not positive-semidefinite.

Factor Models

Sometimes the correlations between normally distributed variables are defined using a factor model. Suppose that U1, U2, ..., UN have standard normal distributions (i.e., normal distributions with mean zero and standard deviation one). In a one-factor model, each Ui has a component
dependent on a common factor, F, and a component that is uncorrelated with the other variables. Formally,

Ui = ai F + √(1 − ai²) Zi        (15.6)

where F and the Zi have standard normal distributions and ai is a constant between −1 and +1. The Zi are uncorrelated with each other and uncorrelated with F. The coefficient of Zi is chosen so that Ui has a mean of zero and a variance of one. In this model, all the correlation between Ui and Uj arises from their dependence on the common factor, F. The coefficient of correlation between Ui and Uj is ai aj.

A one-factor model imposes some structure on the correlations and has the advantage that the resulting covariance matrix is always positive-semidefinite. Without assuming a factor model, the number of correlations that have to be estimated for the N variables is N(N − 1)/2. With the one-factor model, we need only estimate N parameters: a1, a2, ..., aN. An example of a one-factor model from the world of investments is the capital asset pricing model, where the return on a stock has a component dependent on the return from the market and an idiosyncratic (nonsystematic) component that is independent of the return on other stocks.

The one-factor model can be extended to a two-factor, three-factor, or M-factor model. In the M-factor model,

Ui = ai1 F1 + ai2 F2 + ... + aiM FM + √(1 − ai1² − ai2² − ... − aiM²) Zi        (15.7)

The factors, F1, F2, ..., FM, have uncorrelated standard normal distributions and the Zi are uncorrelated both with each other and with the factors. In this case, the correlation between Ui and Uj is

Σ (m = 1 to M) aim ajm

COPULAS

If the marginal distributions of V1 and V2 are normal, a convenient and easy-to-work-with assumption is that the joint distribution of the variables is bivariate normal.⁴ (The correlation structure between the variables is then as described earlier.) Similar assumptions are possible for some other marginal distributions. But often there is no natural way of defining a correlation structure between two marginal distributions. This is where copulas come in.

⁴ Although the bivariate normal assumption is a convenient one, it is not the only one that can be made. There are many other ways in which two normally distributed variables can be dependent on each other. For example, we could have V2 = V1 for −k ≤ V1 ≤ k and V2 = −V1 otherwise.

As an example of the application of copulas, suppose that variables V1 and V2 have the triangular probability density functions shown in Figure 15-2. Both variables have values between 0 and 1. The density function for V1 peaks at 0.2. The density function for V2 peaks at 0.5. For both density functions, the maximum height is 2.0 (so that the area under the density function is 1.0). To use what is known as a Gaussian copula, we map V1 and V2 into new variables U1 and U2 that have standard normal distributions. (A standard normal distribution is a normal distribution with mean zero and standard deviation one.) The mapping is accomplished on a percentile-to-percentile basis. The one-percentile point of the V1 distribution is mapped to the one-percentile point of the U1 distribution; the 10-percentile point of the V1 distribution is mapped to the 10-percentile point of the U1 distribution; and so on. V2 is mapped into U2 in a similar way. Table 15-1 shows how values of V1 are mapped into values of U1. Table 15-2 similarly shows how values of V2 are mapped into values of U2. Consider the V1 = 0.1 calculation in Table 15-1. The cumulative probability that V1 is less than 0.1 is (by calculating areas of triangles) 0.5 × 0.1 × 1 = 0.05, or 5%. The value 0.1 for V1 therefore gets mapped to the five-percentile point of the standard normal distribution, which is −1.64.
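The percentile-to-percentile mapping just described can be reproduced numerically. The sketch below is not from the text; it assumes SciPy is available, and uses scipy.stats.triang(c), the triangular distribution on [0, 1] with mode c, to match the densities described for Figure 15-2.

```python
# Sketch of the Gaussian-copula percentile-to-percentile mapping for
# triangular marginals (V1 peaks at 0.2, V2 peaks at 0.5), assuming SciPy.
from scipy.stats import norm, triang

def to_standard_normal(v, mode):
    """Map a value v of a triangular variable to U = N^-1(G(v))."""
    return norm.ppf(triang.cdf(v, mode))

u1 = to_standard_normal(0.1, 0.2)   # V1 = 0.1 is the 5th percentile
u2 = to_standard_normal(0.1, 0.5)   # V2 = 0.1 is the 2nd percentile
print(round(u1, 2), round(u2, 2))   # -1.64 -2.05
```

The first value reproduces the −1.64 quoted in the text for V1 = 0.1; the second is the corresponding mapped value for V2 = 0.1 (computed here, not quoted from the text).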
Table 15-1  Mapping of V1, Which Has the Triangular Distribution in Figure 15-2(a), to U1, Which Has a Standard Normal Distribution (columns: V1 value, percentile of distribution, U1 value)

Table 15-2  Mapping of V2, Which Has the Triangular Distribution in Figure 15-2(b), to U2, Which Has a Standard Normal Distribution (columns: V2 value, percentile of distribution, U2 value)
The variables U1 and U2 have normal distributions. We assume that they are jointly bivariate normal. This in turn implies a joint distribution and a correlation structure between V1 and V2. The essence of the copula approach is therefore that, instead of defining a correlation structure between V1 and V2 directly, we do so indirectly: we map V1 and V2 into other variables that have well-behaved distributions and for which it is easy to define a correlation structure.

Suppose that we assume the correlation between U1 and U2 is 0.5. The joint cumulative probability distribution between V1 and V2 is shown in Table 15-3. To illustrate the calculations, consider the first entry, where we are calculating the probability that V1 < 0.1 and V2 < 0.1.

Table 15-3  Cumulative Joint Probability Distribution for V1 and V2 in the Gaussian Copula Model (correlation parameter = 0.5; the table shows the joint probability that V1 and V2 are less than the specified values)

V1 \ V2   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
0.1     0.006  0.017  0.028  0.037  0.044  0.048  0.049  0.050  0.050
0.2     0.013  0.043  0.081  0.120  0.156  0.181  0.193  0.198  0.200
0.3     0.017  0.061  0.124  0.197  0.273  0.331  0.364  0.381  0.387
0.4     0.019  0.071  0.149  0.248  0.358  0.449  0.505  0.535  0.548
0.5     0.019  0.076  0.164  0.281  0.417  0.537  0.616  0.663  0.683
0.6     0.020  0.078  0.173  0.301  0.456  0.600  0.701  0.763  0.793
0.7     0.020  0.079  0.177  0.312  0.481  0.642  0.760  0.837  0.877
0.8     0.020  0.080  0.179  0.318  0.494  0.667  0.798  0.887  0.936
0.9     0.020  0.080  0.180  0.320  0.499  0.678  0.816  0.913  0.970
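The first entry of Table 15-3 can be checked numerically. This sketch assumes SciPy; the thresholds are computed from the 5% and 2% cumulative probabilities of V1 and V2 (the −1.64 value appears in the text; the second threshold is computed here, not quoted).

```python
# Sketch (assuming SciPy) of the first Table 15-3 entry: the joint
# probability that V1 < 0.1 and V2 < 0.1 under a Gaussian copula with
# correlation 0.5.
from scipy.stats import multivariate_normal, norm

rho = 0.5
u1 = norm.ppf(0.05)   # V1 < 0.1 maps to U1 < -1.64
u2 = norm.ppf(0.02)   # V2 < 0.1 maps to U2 < -2.05
p = multivariate_normal.cdf([u1, u2],
                            mean=[0.0, 0.0],
                            cov=[[1.0, rho], [rho, 1.0]])
print(p)  # Table 15-3 reports 0.006 for this entry
```

With ρ = 0 the joint probability would collapse to the product 0.05 × 0.02 = 0.001, so the copula correlation materially raises the chance of jointly small values.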
From Tables 15-1 and 15-2, V1 < 0.1 corresponds to U1 < −1.64 and V2 < 0.1 corresponds to U2 < −2.05. The joint probability that U1 < −1.64 and U2 < −2.05 under the bivariate normal distribution with correlation 0.5 is 0.006. (Note that the probability would be only 0.02 × 0.05 = 0.001 if ρ = 0.)

The correlation between U1 and U2 is referred to as the copula correlation. This is not, in general, the same as the coefficient of correlation between V1 and V2. Because U1 and U2 are bivariate normal, the conditional mean of U2 is linearly dependent on U1 and the conditional standard deviation of U2 is constant (as discussed earlier). However, a similar result does not in general apply to V1 and V2.

Expressing the Approach Algebraically

Suppose that the cumulative marginal (i.e., unconditional) probability distributions of V1 and V2 are G1 and G2. The percentile-to-percentile mapping implies that

G1(v1) = N(u1)        G2(v2) = N(u2)

where N is the cumulative normal distribution function. This means that

u1 = N⁻¹[G1(v1)]        u2 = N⁻¹[G2(v2)]
v1 = G1⁻¹[N(u1)]        v2 = G2⁻¹[N(u2)]

The variables U1 and U2 are then assumed to be bivariate normal with correlation ρ, as described earlier. The key property of a copula model is that it preserves the marginal distributions of V1 and V2 (however unusual these may be) while defining a correlation structure between them.

To sample from the bivariate Student's t-distribution used in the next subsection:
1. Sample a value χ from a chi-square distribution with f degrees of freedom.
2. Sample from a bivariate normal distribution with correlation ρ, as described earlier.
3. Multiply the normally distributed samples by √(f/χ).
Figure 15-3  The way in which a copula model defines a joint distribution: one-to-one mappings take V1 and V2 to U1 and U2, and a correlation assumption is made for U1 and U2.

Tail Dependence

Figure 15-4 shows plots of 5,000 random samples from a bivariate normal distribution, while Figure 15-5 does the same for the bivariate Student's t. The correlation parameter is 0.5 and the number of degrees of freedom for the Student's t is 4. Define a tail value of a distribution as a value in the left or right 1% tail of the distribution. There is a tail value for the normal distribution when the variable is greater than 2.33 or less than −2.33. Similarly, there is a tail value for the Student's t-distribution when the variable is beyond its 1% points. Comparing the two figures shows that tail values of the two variables tend to occur together more often for the bivariate Student's t: it exhibits more tail dependence than the bivariate normal.
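The comparison in Figures 15-4 and 15-5 can be imitated with a quick simulation. This is an illustrative sketch, not the text's code: NumPy is assumed, the sample size is raised to 200,000 and the seed fixed so that the tail counts are stable, the bivariate normal uses the ε2 = ρz1 + √(1 − ρ²)z2 construction, and the bivariate t is obtained by multiplying the normal samples by √(f/χ).

```python
# Tail-dependence comparison: bivariate normal vs. bivariate Student's t
# (4 degrees of freedom), both with correlation 0.5.  NumPy assumed.
import numpy as np

rng = np.random.default_rng(0)
n, rho, df = 200_000, 0.5, 4

z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # eps2 formula

chi = rng.chisquare(df, n)
scale = np.sqrt(df / chi)
t1, t2 = z1 * scale, z2 * scale   # bivariate Student's t samples

# Fraction of draws where BOTH variables fall in their own empirical 1% left tail.
joint_normal = np.mean((z1 < np.quantile(z1, 0.01)) & (z2 < np.quantile(z2, 0.01)))
joint_t = np.mean((t1 < np.quantile(t1, 0.01)) & (t2 < np.quantile(t2, 0.01)))
print(joint_normal, joint_t)  # the t samples show more joint tail events
```

If the variables were independent, the joint 1%-tail fraction would be about 0.0001; the Gaussian copula raises it, and the Student's t copula raises it further.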
Multivariate Copulas

Copulas can be used to define a correlation structure between more than two variables. The simplest example of this is the multivariate Gaussian copula. Suppose that there are N variables, V1, V2, ..., VN, and that we know the marginal distribution of each variable. Each Vi is mapped on a percentile-to-percentile basis to a variable Ui that has a standard normal distribution, and the Ui are assumed to have a multivariate normal distribution. A factor model such as

Ui = ai F + √(1 − ai²) Zi        (15.8)

where F and the Zi have independent standard normal distributions, is often assumed for the correlation structure between the Ui.

Figure 15-4  5,000 random samples from a bivariate normal distribution.

Figure 15-5  5,000 random samples from a bivariate Student's t-distribution with four degrees of freedom.

Other factor copula models are obtained by choosing F and the Zi to have other zero-mean, unit-variance distributions. For example, if Zi is normal and F has a Student's t-distribution, we obtain a multivariate Student's t-distribution for Ui. These distributional choices affect the nature of the dependence between the U-variables and therefore that between the V-variables.
APPLICATION TO LOAN PORTFOLIOS: VASICEK'S MODEL

We now present an application of the one-factor Gaussian copula model that will prove useful in understanding the Basel II capital requirements. Suppose a bank has a large portfolio of loans where the probability of default per year for each loan is 1%. If the loans defaulted independently of each other, we would expect the default rate to be almost exactly 1% every year. In practice, loans do not default independently of each other. They are all influenced by macroeconomic conditions. As a result, in some years the default rate is high whereas in others it is low. This is illustrated by Table 15-4, which shows the default rate for all rated companies between 1970 and 2013. The default rate varies from a low of 0.088% in 1979 to a high of 6.002% in 2009. Other high-default-rate years were 1970, 1989, 1990, 1991, 1999, 2000, 2001, 2002, and 2008.

To model the defaults of the loans in a portfolio, we define Ti as the time when company i defaults. (There is an implicit assumption that all companies will default eventually, but the default may happen a long time, perhaps even hundreds of years, in the future.) We make the simplifying assumption that all loans have the same cumulative probability distribution for the time to default and define PD as the probability of default by time T:

PD = Prob(Ti < T)

The Gaussian copula model can be used to define a correlation structure between the times to default of the loans. Following the procedure we have described, each time to default Ti is mapped to a variable Ui that has a standard normal distribution on a percentile-to-percentile basis. We assume the factor model in Equation (15.8) for the correlation structure between the Ui and make the simplifying assumption that the ai are all the same and equal to a, so that

Ui = aF + √(1 − a²) Zi

As in Equation (15.8), the variables F and Zi have independent standard normal distributions. The copula correlation between each pair of loans is in this case the same. It is

ρ = a²

so that the expression for Ui can be written

Ui = √ρ F + √(1 − ρ) Zi        (15.9)

Table 15-4  Annual Percentage Default Rate for All Rated Companies, 1970–2013

Year   Default Rate   Year   Default Rate   Year   Default Rate
1970   2.621          1985   0.960          2000   2.852
1971   0.285          1986   1.875          2001   4.345
1972   0.451          1987   1.588          2002   3.319
1973   0.453          1988   1.372          2003   2.018
1974   0.274          1989   2.386          2004   0.939
1975   0.359          1990   3.750          2005   0.760
1976   0.175          1991   3.091          2006   0.721
1977   0.352          1992   1.500          2007   0.401
1978   0.352          1993   0.890          2008   2.252
1979   0.088          1994   0.663          2009   6.002
1980   0.342          1995   1.031          2010   1.408
1981   0.162          1996   0.588          2011   0.890
1982   1.032          1997   0.765          2012   1.381

Define the "worst-case default rate," WCDR(T, X), as the default rate (i.e., percentage of loans defaulting) during time T that will not be exceeded with probability X%. (In many applications T will be one year.) As shown in what follows, the assumptions we have made lead to

WCDR(T, X) = N( (N⁻¹(PD) + √ρ N⁻¹(X)) / √(1 − ρ) )        (15.10)

This is a strange-looking result, but a very important one. It was first developed by Vasicek in 1987. The right-hand side of the equation can easily be calculated using the NORMSDIST and NORMSINV functions in Excel. Note that if ρ = 0, the loans default independently of each other and WCDR = PD. As ρ increases, WCDR increases.
Example 15.2

Suppose that a bank has a large number of loans to retail customers. The one-year probability of default for each loan is 2% and the copula correlation parameter, ρ, in Vasicek's model is estimated as 0.1. In this case,

WCDR(1, 0.999) = N( (N⁻¹(0.02) + √0.1 N⁻¹(0.999)) / √(1 − 0.1) ) = 0.128

showing that the 99.9% worst-case one-year default rate is 12.8%.

Proof of Vasicek's Result

From the properties of the Gaussian copula model,

PD = Prob(Ti < T) = Prob(Ui < U)

where U = N⁻¹(PD). The probability of default by time T depends on the value of the factor, F, in Equation (15.9). The factor can be thought of as an index of macroeconomic conditions. If F is high, macroeconomic conditions are good. Each Ui will then tend to be high and the corresponding Ti will therefore also tend to be high, meaning that the probability of an early default is low and therefore Prob(Ti < T) is low. If F is low, macroeconomic conditions are bad. Each Ui and the corresponding Ti will then tend to be low, so that the probability of an early default is high. To explore this further, we consider the probability of default conditional on F.

From Equation (15.9),

Zi = (Ui − √ρ F) / √(1 − ρ)

The probability that Ui < U conditional on the factor value, F, is

Prob(Ui < U | F) = Prob( Zi < (U − √ρ F)/√(1 − ρ) ) = N( (U − √ρ F)/√(1 − ρ) )

so that

Prob(Ti < T | F) = N( (N⁻¹(PD) − √ρ F)/√(1 − ρ) )        (15.11)

For a large portfolio of loans with the same PD, where the copula correlation for each pair of loans is ρ, this equation provides a good estimate of the percentage of loans defaulting by time T conditional on F. We will refer to this as the default rate.

As F decreases, the default rate increases. How bad can the default rate become? Because F has a normal distribution, the probability that F will be less than N⁻¹(Y) is Y. There is therefore a probability of Y that the default rate will be greater than

N( (N⁻¹(PD) − √ρ N⁻¹(Y)) / √(1 − ρ) )

The default rate that we are X% certain will not be exceeded in time T is obtained by substituting Y = 1 − X into the preceding expression. Because N⁻¹(X) = −N⁻¹(1 − X), we obtain Equation (15.10).

Estimating PD and ρ

Maximum likelihood methods can be used to estimate PD and ρ from historical data on default rates. We used Equation (15.10) to calculate a high percentile of the default rate distribution, but it is actually true for all percentiles. If DR is the default rate and G(DR) is the cumulative probability distribution function for DR, Equation (15.10) shows that

DR = N( (N⁻¹(PD) + √ρ N⁻¹(G(DR))) / √(1 − ρ) )

Rearranging this equation,

G(DR) = N( (√(1 − ρ) N⁻¹(DR) − N⁻¹(PD)) / √ρ )        (15.14)

Differentiating this, the probability density function for the default rate is

g(DR) = √((1 − ρ)/ρ) exp{ ½ [ (N⁻¹(DR))² − ( (√(1 − ρ) N⁻¹(DR) − N⁻¹(PD)) / √ρ )² ] }        (15.15)

The estimation procedure is then as follows:
1. Choose trial values of PD and ρ.
2. Calculate the logarithm of the probability density in Equation (15.15) for each of the observations on DR.
3. Use Solver to search for the values of PD and ρ that maximize the sum of the values in 2.
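Equation (15.10) and the numbers in Example 15.2 can be checked directly. The sketch below assumes SciPy; the function name wcdr is ours, not the text's.

```python
# Vasicek's worst-case default rate, Equation (15.10), assuming SciPy.
from math import sqrt
from scipy.stats import norm

def wcdr(pd, rho, x):
    """Default rate not exceeded with probability x (Eq. 15.10)."""
    return norm.cdf((norm.ppf(pd) + sqrt(rho) * norm.ppf(x)) / sqrt(1 - rho))

# Example 15.2: PD = 2%, copula correlation 0.1, 99.9% confidence.
print(round(wcdr(0.02, 0.1, 0.999), 3))   # 0.128, i.e. 12.8%

# With rho = 0 the loans default independently and WCDR = PD.
print(round(wcdr(0.02, 0.0, 0.999), 3))   # 0.02
```

The same function reproduces the 10.6% figure quoted later in the chapter for the parameter estimates PD = 0.0141 and ρ = 0.108 obtained from Table 15-4.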
When this maximum likelihood procedure is applied to the default rates in Table 15-4, the estimates are PD = 0.0141 and ρ = 0.108. The 99.9% worst-case default rate is then

N( (N⁻¹(0.0141) + √0.108 N⁻¹(0.999)) / √(1 − 0.108) ) = 0.106

or 10.6% per annum.

Figure  Probability distribution of the default rate when parameters are estimated using the data in Table 15-4 (horizontal axis: default rate, 0 to 0.06).

Alternatives to the Gaussian Copula

The one-factor Gaussian copula model has its limitations. As Figure 15-4 illustrates, it leads to very little tail dependence. This means that an unusually early default for one company does not often happen at the same time as an unusually early default for another company. It can also be difficult to find a ρ to fit data. For example, there is no ρ that is consistent with a PD of 1% and a situation where one year in 10 the default rate is greater than 3%. Other one-factor copula models with more tail dependence can provide a better fit to data.

An approach to developing other one-factor copulas is to choose F, or Zi, or both, as distributions with heavier tails than the normal distribution in Equation (15.9). (They have to be scaled so that they have a mean of zero and standard deviation of one.) The distribution of Ui is then determined (possibly numerically) from the distributions of F and Zi. Equation (15.10) becomes

WCDR(T, X) = Φ( (Ψ⁻¹(PD) + √ρ Θ⁻¹(X)) / √(1 − ρ) )

where Ψ is the cumulative distribution of Ui, Θ is the cumulative distribution of F, and Φ is the cumulative distribution of Zi.⁸

⁸ This approach is applied to evaluating the risk of tranches created from mortgages in J. Hull and A. White, "The Risk of Tranches Created from Mortgages," Financial Analysts Journal 66, no. 5 (September/October 2010): 54–67. It provides a better fit to historical data in many situations. Its main disadvantage is that the distributions used are not as easy to deal with as the normal distribution, and numerical analysis may be necessary to determine Ψ and G(DR).

SUMMARY

Risk managers use correlations or covariances to describe the relationship between two variables. The daily covariance rate is the correlation between the daily returns on the variables multiplied by the product of their daily volatilities. The methods for monitoring a covariance rate are similar to those for monitoring a variance rate. Risk managers often try to keep track of a variance-covariance matrix for all the variables to which they are exposed.

The marginal distribution of a variable is the unconditional distribution of the variable. Very often an analyst is in a situation where he or she has estimated the marginal distributions of a set of variables and wants to make an assumption about their correlation structure. If the marginal distributions of the variables happen to be normal, it is natural to assume that the variables have a multivariate normal distribution. In other situations, copulas are used. The marginal distributions are transformed on a percentile-to-percentile basis to normal distributions (or to some other distribution for which there is a multivariate counterpart). The correlation structure between the variables of interest is then defined indirectly from an assumed correlation structure between the transformed variables.

When there are many variables, analysts often use a factor model. This is a way of reducing the number of correlation estimates that have to be made. The correlation between any two variables is assumed to derive solely from their correlations with the factors. The default correlation between different companies can be modeled using a factor-based Gaussian copula model of their times to default.
Learning Objectives

After completing this reading you should be able to:

• Describe the basic steps to conduct a Monte Carlo simulation.
• Describe ways to reduce Monte Carlo sampling error.
• Explain how to use the antithetic variate technique to reduce Monte Carlo sampling error.
• Explain how to use control variates to reduce Monte Carlo sampling error and when they are effective.
• Describe the benefits of reusing sets of random number draws across Monte Carlo experiments and how to reuse them.
• Describe the bootstrapping method and its advantage over Monte Carlo simulation.
• Describe the pseudo-random number generation method and how a good simulation design alleviates the effects the choice of the seed has on the properties of the generated series.
• Describe situations where the bootstrapping method is ineffective.
• Describe the disadvantages of the simulation approach to financial problem solving.

Excerpt is Chapter 13 of Introductory Econometrics for Finance, Third Edition, by Chris Brooks.
There are numerous situations, in finance and in econometrics, where the researcher has essentially no idea what is going to happen! To offer one illustration, in the context of complex financial risk measurement models for portfolios containing large numbers of assets whose movements are dependent on one another, it is not always clear what will be the effect of changing circumstances. For example, following full European Monetary Union (EMU) and the replacement of member currencies with the euro, it is widely believed that European financial markets have become more integrated, leading the correlation between movements in their equity markets to rise. What would be the effect on the properties of a portfolio containing equities of several European countries if correlations between the markets rose to 99%? Clearly, it is probably not possible to answer such a question using actual historical data alone, since the event (a correlation of 99%) has not yet happened.

The practice of econometrics is made difficult by the behaviour of series and inter-relationships between them that render model assumptions at best questionable. For example, the existence of fat tails, structural breaks and bi-directional causality between dependent and independent variables, etc. will make the process of parameter estimation and inference less reliable. Real data is messy, and no one really knows all of the features that lurk inside it. Clearly, it is important for researchers to have an idea of what the effects of such phenomena will be for model estimation and inference.

By contrast, simulation is the econometrician's chance to behave like a 'real scientist', conducting experiments under controlled conditions. A simulations experiment enables the econometrician to determine what the effect of changing one factor or aspect of a problem will be, while leaving all other aspects unchanged. Thus, simulations offer the possibility of complete flexibility. Simulation may be defined as an approach to modelling that seeks to mimic a functioning system as it evolves. The simulations model will express in mathematical equations the assumed form of operation of the system. In econometrics, simulation is particularly useful when models are very complex or sample sizes are small.

Simulations studies are usually used to investigate the properties and behaviour of various statistics of interest. The technique is often used in econometrics when the properties of a particular estimation method are not known. For example, it may be known from asymptotic theory how a particular test behaves with an infinite sample size, but how will the test behave if only fifty observations are available? Will the test still have the desirable properties of being correctly sized and having high power? In other words, if the null hypothesis is correct, will the test lead to rejection of the null 5% of the time if a 5% rejection region is used? And if the null is incorrect, will it be rejected a high proportion of the time?

Examples from econometrics of where simulation may be useful include:
• Quantifying the simultaneous equations bias induced by treating an endogenous variable as exogenous
• Determining the appropriate critical values for a Dickey-Fuller test
• Determining what effect heteroscedasticity has upon the size and power of a test for autocorrelation.

Simulations are also often extremely useful tools in finance, in situations such as:
• The pricing of exotic options, where an analytical pricing formula is unavailable
• Determining the effect on financial markets of substantial changes in the macroeconomic environment
• 'Stress-testing' risk management models to determine whether they generate capital requirements sufficient to cover losses in all situations.

In all of these instances, the basic way that such a study would be conducted (with additional steps and modifications where necessary) is shown in Box 16-1.

A brief explanation of each of these steps is in order. The first stage involves specifying the model that will be used to generate the data. This may be a pure time series model or a structural model. Pure time series models are usually simpler to implement, as a full structural model would also require the researcher to specify a data generating process for the explanatory variables as well.
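The sizing question posed above (does a 5% test reject a true null about 5% of the time when only fifty observations are available?) can be answered with a small Monte Carlo experiment of exactly this kind. The design below (a one-sample t-test, 2,000 replications, a fixed seed) is an illustrative assumption, not taken from the text.

```python
# Monte Carlo study of the empirical size of a 5% t-test with 50 observations.
# NumPy and SciPy assumed; replication count and seed are example choices.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
n_obs, n_reps = 50, 2000

rejections = 0
for _ in range(n_reps):
    sample = rng.standard_normal(n_obs)          # data generated under H0: mean = 0
    if ttest_1samp(sample, 0.0).pvalue < 0.05:   # 5% rejection region
        rejections += 1

print(rejections / n_reps)  # a correctly sized test rejects close to 5% of the time
```

The same skeleton — generate data under known conditions, compute the statistic, repeat many times, and summarise across replications — underlies all of the applications listed above.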
Suppose that the value of the parameter of interest for replication i is denoted xi. If the average value of this parameter is calculated for a set of, say, N = 1,000 replications, and another researcher conducts an otherwise identical study with different sets of random draws, a different average value of x is almost certain to result.

¹ Obviously, for a continuous random variable, there will be an infinite number of possible values. In this context, the problem is simply that if the probability space is split into arbitrarily small intervals, some of those intervals will not have been adequately covered by the random draws that were actually selected.
parallel simulation on those. For example, if the driving stochastic force is a set of T N(0, 1) draws, denoted u_t, for each replication, an additional replication with errors given by -u_t is also used. It can be shown that the Monte Carlo standard error is reduced when antithetic variates are used. For a simple illustration of this, suppose that the average value of the parameter of interest across two sets of Monte Carlo replications is given by

x̄ = (x_1 + x_2)/2    (16.2)

where x_1 and x_2 are the average parameter values for replications sets 1 and 2, respectively. The variance of x̄ will be given by

var(x̄) = (1/4)(var(x_1) + var(x_2) + 2cov(x_1, x_2))    (16.3)

If no antithetic variates are used, the two sets of Monte Carlo replications will be independent, so that their covariance will be zero, i.e.

var(x̄) = (1/4)(var(x_1) + var(x_2))    (16.4)

However, the use of antithetic variates would lead the covariance in Equation (16.3) to be negative, and therefore the Monte Carlo sampling error to be reduced.

It may at first appear that the reduction in Monte Carlo sampling variation from using antithetic variates will be huge since, by definition, corr(u_t, -u_t) = cov(u_t, -u_t) = -1. However, it is important to remember that the relevant covariance is between the simulated quantity of interest for the standard replications and those using the antithetic variates, whereas the perfect negative covariance is between the random draws (i.e. the error terms) and their antithetic variates. For example, in the context of option pricing (discussed below), the production of a price for the underlying security (and therefore for the option) constitutes a non-linear transformation of u_t. Therefore the covariances between the terminal prices of the underlying assets based on the draws and based on the antithetic variates will be negative, but not -1.

Several other variance reduction techniques that operate using similar principles are available, including stratified sampling, moment-matching and low-discrepancy sequencing. The latter are also known as quasi-random sequences of draws. These involve the selection of a specific sequence of representative samples from a given probability distribution. Successive samples are selected so that the unselected gaps left in the probability distribution are filled by subsequent replications. The result is a set of random draws that are appropriately distributed across all of the outcomes of interest. The use of low-discrepancy sequences leads the Monte Carlo standard errors to be reduced in direct proportion to the number of replications rather than in proportion to the square root of the number of replications. Thus, for example, to reduce the Monte Carlo standard error by a factor of 10, the number of replications would have to be increased by a factor of 100 for standard Monte Carlo random sampling, but only 10 for low-discrepancy sequencing. Further details of low-discrepancy techniques are beyond the scope of this text, but can be seen in Boyle (1977) or Press et al. (1992). The former offers a detailed and relevant example in the context of options pricing.

Control Variates

The application of control variates involves employing a variable similar to that used in the simulation, but whose properties are known prior to the simulation. Denote the variable whose properties are known by y, and that whose properties are under simulation by x. The simulation is conducted on x and also on y, with the same sets of random number draws being employed in both cases. Denoting the simulation estimates of x and y by x̂ and ŷ, respectively, a new estimate of x can be derived from

x* = y + (x̂ - ŷ)    (16.5)

Again, it can be shown that the Monte Carlo sampling error of this quantity, x*, will be lower than that of x̂ provided that a certain condition holds. The control variates help to reduce the Monte Carlo variation owing to particular sets of random draws by using the same draws on a related problem whose solution is known. It is expected that the effects of sampling error for the problem under study and the known problem will be similar, and hence can be reduced by calibrating the Monte Carlo results using the analytic ones.

It is worth noting that control variates succeed in reducing the Monte Carlo sampling error only if the control and simulation problems are very closely related. As the correlation between the values of the control statistic and the statistic of interest is reduced, the variance reduction is weakened. Consider again Equation (16.5), and take the variance of both sides

var(x*) = var(y + (x̂ - ŷ))    (16.6)

var(y) = 0 since y is a quantity which is known analytically and is therefore not subject to sampling variation, so Equation (16.6) can be written

var(x*) = var(x̂) + var(ŷ) - 2cov(x̂, ŷ)    (16.7)

The condition that must hold for the Monte Carlo sampling variance to be lower with control variates than without is that var(x*) is less than var(x̂). Taken from Equation (16.7), this condition can also be expressed as

var(ŷ) - 2cov(x̂, ŷ) < 0

or

corr(x̂, ŷ) > (1/2)√(var(ŷ)/var(x̂))    (16.8)

[The same sets of random number draws can also be re-used across experiments, which reduces the variability of the difference in] estimates across those experiments. For example, it may be of interest to examine the power of the Dickey-Fuller test for samples of size 100 observations and for different values of φ. Thus, for each experiment involving a different value of φ, the same set of standard normal random numbers could be used to reduce the sampling variation across experiments. However, the accuracy of the actual estimates in each case will not be increased, of course.

Another possibility involves taking long series of draws and then slicing them up into several smaller sets to be used in different experiments. For example, Monte Carlo simulation may be used to price several options of different times to maturity, but which are identical in all other respects [...]

[...] available and it is desired to estimate some parameter θ. An approximation to the statistical properties of θ̂_T can be
obtained by studying a sample of bootstrap estimators. This is done by taking N samples of size T with replacement from y and re-calculating θ̂ with each new sample. A series of θ̂ estimates is then obtained, and their distribution can be considered.

The advantage of bootstrapping over the use of analytical results is that it allows the researcher to make inferences without making strong distributional assumptions, since the distribution employed will be that of the actual data. Instead of imposing a shape on the sampling distribution of the θ̂ value, bootstrapping involves empirically estimating the sampling distribution by looking at the variation of the statistic within-sample.

A set of new samples is drawn with replacement from the sample and the test statistic of interest calculated from each of these. Effectively, this involves sampling from the sample, i.e. treating the sample as a population from which samples can be drawn. Call the test statistics calculated from the new samples θ̂*. The samples are likely to be quite different from each other and from the original θ̂ value, since some observations may be sampled several times and others not at all. Thus a distribution of values of θ̂* is obtained, from which standard errors or some other statistics of interest can be calculated.

Along with advances in computational speed and power, the number of bootstrap applications in finance and in econometrics has increased rapidly in recent years. For example, in econometrics, the bootstrap has been used in the context of unit root testing. Scheinkman and LeBaron (1989) also suggest that the bootstrap can be used as a 'shuffle diagnostic', where as usual the original data are sampled with replacement to form new data series. Successive applications of this procedure should generate a collection of data sets with the same distributional properties, on average, as the original data. But any kind of dependence in the original series (e.g. linear or non-linear autocorrelation) will, by definition, have been removed. Applications of econometric tests to the shuffled series can then be used as a benchmark with which to compare the results on the actual data or to construct standard error estimates or confidence intervals.

In finance, an application of bootstrapping in the context of risk management is discussed below. Another important recent proposed use of the bootstrap is as a method for detecting data snooping (data mining) in the context of tests of the profitability of technical trading rules. Data snooping occurs when the same set of data is used to construct trading rules and also to test them. In such cases, if a sufficient number of trading rules are examined, some of them are bound, purely by chance alone, to generate statistically significant positive returns. Intra-generational data snooping is said to occur when, over a long period of time, technical trading rules that 'worked' in the past continue to be examined, while the ones that did not fade away. Researchers are then made aware of only the rules that worked, and not the other, perhaps thousands, of rules that failed.

Data snooping biases are apparent in other aspects of estimation and testing in finance. Lo and MacKinlay (1990) find that tests of financial asset pricing models (CAPM) may yield misleading inferences when properties of the data are used to construct the test statistics. These properties relate to the construction of portfolios based on some empirically motivated characteristic of the stock, such as market capitalisation, rather than a theoretically motivated characteristic, such as dividend yield.

Sullivan, Timmermann and White (1999) and White (2000) propose the use of a bootstrap to test for data snooping. The technique works by placing the rule under study in the context of a 'universe' of broadly similar trading rules. This gives some empirical content to the notion that a variety of rules may have been examined before the final rule is selected. The bootstrap is applied to each trading rule, by sampling with replacement from the time series of observed returns for that rule. The null hypothesis is that there does not exist a superior technical trading rule. Sullivan, Timmermann and White show how a p-value of the 'reality check' bootstrap-based test can be constructed, which evaluates the significance of the returns (or excess returns) to the rule after allowing for the fact that the whole universe of rules may have been examined.

An Example of Bootstrapping in a Regression Context

Consider a standard regression model

y = Xβ + u    (16.9)

The regression model can be bootstrapped in two ways.

Re-Sample the Data

This procedure involves taking the data, and sampling entire rows corresponding to observation i together. The steps would then be as shown in Box 16-2.
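Box 16-2 itself is not reproduced in this excerpt, but the row-resampling idea can be sketched as follows. This is a minimal, self-contained Python illustration; the data generating process, sample size and replication count are assumptions made for the example, not taken from the text:

```python
import random
import statistics

def ols_slope(x, y):
    """Closed-form OLS slope for a univariate regression y = a + b*x + u."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(12345)

# Hypothetical data: y = 1 + 2x + noise
x = [i / 10 for i in range(100)]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5) for xi in x]

# Bootstrap by re-sampling entire (x_i, y_i) rows with replacement
N = 1000                       # number of bootstrap replications
rows = list(zip(x, y))
slopes = []
for _ in range(N):
    sample = [random.choice(rows) for _ in rows]   # T draws with replacement
    xs, ys = zip(*sample)
    slopes.append(ols_slope(xs, ys))

# The spread of the bootstrap estimates approximates the standard error of b-hat
boot_se = statistics.stdev(slopes)
print(round(statistics.fmean(slopes), 2), round(boot_se, 3))
```

Because each row (x_i, y_i) is kept together, the bootstrap preserves the relationship between the regressor and the dependent variable within each resampled observation.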
for T random draws, where y_0 is the seed (the initial value of y), a is a multiplier and c is an increment. All three of these are simply constants. The 'modulo operator' simply functions as a clock, returning to one after reaching m.

Any simulation study involving a recursion, such as that described by Equation (16.11) to generate the random draws, will require the user to specify an initial value, y_0, to get the process started. The choice of this value will, undesirably, affect the properties of the generated series. This effect will be strongest for y_1, y_2, ..., but will gradually die away. For example, if a set of random draws is used to construct a time series that follows a GARCH process, early observations on this series will behave less like the GARCH process required than subsequent data points. Consequently, a good simulation design will allow for this phenomenon by generating more data than are required and then dropping the first few observations. For example, if 1,000 observations are required, 1,200 observations might be generated, with observations 1 to 200 subsequently deleted and 201 to 1,200 used to conduct the analysis.

These computer-generated random number draws are known as pseudo-random numbers, since they are in fact not random at all, but entirely deterministic, since they have been derived from an exact formula! By carefully choosing the values of the user-adjustable parameters, it is possible to get the pseudo-random number generator to meet all the statistical properties of true random numbers. Eventually, the random number sequences will start to repeat, but this should take a long time to happen. See Press et al. (1992) for more details and Fortran code, or Greene (2002) for an example.

The U(0, 1) draws can be transformed into draws from any desired distribution, for example a normal or a Student's t. Usually, econometric software packages with simulation facilities would do this automatically.

DISADVANTAGES OF THE SIMULATION APPROACH TO ECONOMETRIC OR FINANCIAL PROBLEM SOLVING

• It might be computationally expensive

That is, the number of replications required to generate precise solutions may be very large, depending upon the nature of the task at hand. If each replication is relatively complex in terms of estimation issues, the problem might be computationally infeasible, such that it could take days, weeks or even years to run the experiment. Although CPU time is becoming ever cheaper as faster computers are brought to market, the technicality of the problems studied seems to accelerate just as quickly!

• The results might not be precise

Even if the number of replications is made very large, the simulation experiments will not give a precise answer to the problem if some unrealistic assumptions have been made of the data generating process. For example, in the context of option pricing, the option valuations obtained from a simulation will not be accurate if the data generating process assumed normally distributed errors while the actual underlying returns series is fat-tailed.

• The results are often hard to replicate

Unless the experiment has been set up so that the sequence of random draws is known and can be reconstructed, which is rarely done in practice, the results of a Monte Carlo study will be somewhat specific to the given investigation. In that case, a repeat of the experiment would involve different sets of random draws and therefore would be likely to yield different results, particularly if the number of replications is small.

• Simulation results are experiment-specific

The need to specify the data generating process using a single set of equations or a single equation implies that the results could apply to only that exact type of data. Any conclusions reached may or may not hold for other data generating processes. To give one illustration, examining the power of a statistical test would, by definition, involve determining how frequently a wrong null hypothesis is rejected. In the context of DF tests, for example, the power of the test as determined by a Monte Carlo study would be given by the percentage of times that the null of a unit root is rejected. Suppose that the following data generating process is used for such a simulation experiment

y_t = 0.99y_t-1 + u_t    (16.13)
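A power experiment of this kind, using the data generating process in Equation (16.13), can be sketched as follows. This is a hypothetical illustration: the no-constant DF regression and the approximate 5% critical value of -1.95 are standard choices, but the sample size, replication count and seed are assumptions for the example:

```python
import math
import random

random.seed(1)

def df_tstat(y):
    """t-ratio on psi in the no-constant DF regression: dy_t = psi*y_{t-1} + e_t."""
    ylag = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in ylag)
    psi = sum(a * b for a, b in zip(ylag, dy)) / sxx
    resid = [d - psi * v for d, v in zip(dy, ylag)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)
    return psi / math.sqrt(s2 / sxx)

T, reps = 100, 2000
crit = -1.95        # assumed 5% critical value for the no-constant DF test
rejections = 0
for _ in range(reps):
    y = [0.0]
    for _ in range(T):
        y.append(0.99 * y[-1] + random.gauss(0, 1))   # DGP of Equation (16.13)
    if df_tstat(y) < crit:
        rejections += 1

power = rejections / reps   # fraction of times the (wrong) unit root null is rejected
print(power)
```

With φ = 0.99 so close to one, the rejection frequency typically comes out only modestly above the 5% nominal size, which is exactly the point made in the discussion that follows.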
Clearly, the null of a unit root would be wrong in this case, as is necessary to examine the power of the test. However, for modest sample sizes, the null is likely to be rejected quite infrequently. It would not be appropriate to conclude from such an experiment that the DF test is generally not powerful, since in this case the null (φ = 1) is not very wrong! This is a general problem with many Monte Carlo studies. The solution is to run simulations using as many different and relevant data generating processes as feasible. Finally, it should be obvious that the Monte Carlo data generating process should match the real-world problem of interest as far as possible.

To conclude, simulation is an extremely useful tool that can be applied to an enormous variety of problems. The technique has grown in popularity over the past decade, and continues to do so. However, like all tools, it is dangerous in the wrong hands. It is very easy to jump into a simulation experiment without thinking about whether such an approach is valid or not.

AN EXAMPLE OF MONTE CARLO SIMULATION IN ECONOMETRICS: DERIVING A SET OF CRITICAL VALUES FOR A DICKEY-FULLER TEST

Recall that the equation for a Dickey-Fuller (DF) test applied to some series y_t is the regression

y_t = φy_t-1 + u_t    (16.14)

so that the test is one of H0: φ = 1 against H1: φ < 1. The relevant test statistic is given by

τ = (φ̂ - 1)/SE(φ̂)    (16.15)

Under the null hypothesis of a unit root, the test statistic does not follow a standard distribution, and therefore a simulation would be required to obtain the relevant critical values. Obviously, these critical values are well documented, but it is of interest to see how one could generate them. A very similar approach could then potentially be adopted for situations where there has been less research and where the results are relatively less well known.

AN EXAMPLE OF HOW TO SIMULATE THE PRICE OF A FINANCIAL OPTION

A simple example of how to use a Monte Carlo study for obtaining a price for a financial option is shown below. Although the option used for illustration here is just a plain vanilla European call option which could be valued analytically using the standard Black-Scholes (1973) formula, again, the method is sufficiently general that only relatively minor modifications would be required to value more complex options. Boyle (1977) gives an excellent and highly readable introduction to the pricing of financial options using Monte Carlo. The steps involved are shown in Box 16-4.

BOX 16-4 Simulating the Price of a Financial Option

1. Specify a data generating process for the underlying asset. A random walk with drift model is usually assumed. Specify also the assumed size of the drift component and the assumed size of the volatility parameter. Specify also a strike price K, and a time to maturity, T.
2. Draw a series of length T, the required number of observations for the life of the option, from a normal distribution. This will be the error series, so that ε_t ~ N(0, 1).
3. Form a series of observations of length T on the underlying asset.
4. Observe the price of the underlying asset at maturity observation T. For a call option, if the value of the underlying asset on the maturity date, P_T ≤ K, the option expires worthless for this replication. If the value of the underlying asset on the maturity date, P_T > K, the option expires in the money, and has value on that date equal to P_T - K, which should be discounted back to the present day using the risk-free rate. Use of the risk-free rate relies upon risk-neutrality arguments (see Duffie, 1996).
5. Repeat steps 1 to 4 a total of N times, and take the average value of the option over the N replications. This average will be the price of the option.
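The steps in Box 16-4 can be sketched as follows. This is a minimal Python illustration: the geometric random-walk-with-drift dynamics for the underlying and all numerical parameter values are assumptions made for the example:

```python
import math
import random

random.seed(42)

# Assumed contract and market parameters (illustrative only)
S0, K, r, sigma, T_years = 100.0, 100.0, 0.05, 0.20, 1.0
steps, N = 100, 5000          # observations over the option's life; N replications
dt = T_years / steps

payoffs = []
for _ in range(N):
    s = S0
    for _ in range(steps):
        eps = random.gauss(0, 1)   # step 2: N(0, 1) error series
        # step 3: lognormal random walk with drift for the underlying
        s *= math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * eps)
    # step 4: terminal payoff, discounted back at the risk-free rate
    payoffs.append(math.exp(-r * T_years) * max(s - K, 0.0))

price = sum(payoffs) / N       # step 5: average across the N replications
print(round(price, 2))
```

For these assumed parameters the Monte Carlo price should lie close to the analytical Black-Scholes value (roughly 10.45), with the remaining gap attributable to Monte Carlo sampling error.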
Simulating the Price of a Financial Option Using a Fat-Tailed Underlying Process

A fairly limiting and unrealistic assumption in the above methodology for pricing options is that the underlying asset returns are normally distributed, whereas in practice, it is well known that asset returns are fat-tailed. There are several ways to remove this assumption. First, one could employ draws from a fat-tailed distribution, such as a Student's t, in step 2 above. Another method, which would generate a distribution of returns with fat tails, would be to assume that the errors and therefore the returns follow a GARCH process. To generate draws from a GARCH process, do the steps shown in Box 16-5.

BOX 16-5 Generating Draws from a GARCH Process

1. Draw a series of length T, the required number of observations for the life of the option, from a normal distribution. This will be the error series, so that ε_t ~ N(0, 1).
2. Recall that one way of expressing a GARCH model is

   r_t = μ + u_t,  u_t = ε_t σ_t,  ε_t ~ N(0, 1)    (16.16)

   σ_t^2 = α_0 + α_1 u_t-1^2 + β σ_t-1^2    (16.17)

   A series of ε_t have been constructed and it is necessary to specify initialising values y_1 and σ_1^2 and plausible parameter values for α_0, α_1, β. Assume that y_1 and σ_1^2 are set to μ and one, respectively, and the parameters are given by α_0 = 0.01, α_1 = 0.15, β = 0.80. The equations above can then be used to generate the model for r_t as described above.

Simulating the Price of an Asian Option
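The recursion in Box 16-5, Equations (16.16) and (16.17), can be sketched as follows. This is a minimal Python illustration using the parameter values assumed in the box; setting μ to zero, and the series length and seed, are extra assumptions made here:

```python
import math
import random

random.seed(7)

# Parameter values assumed in Box 16-5; mu = 0 is an additional assumption here
mu, alpha0, alpha1, beta = 0.0, 0.01, 0.15, 0.80
T = 1000

sigma2 = 1.0     # initialising value for the conditional variance, as in the box
r = []
for _ in range(T):
    eps = random.gauss(0, 1)              # epsilon_t ~ N(0, 1)
    u = eps * math.sqrt(sigma2)           # u_t = epsilon_t * sigma_t   (16.16)
    r.append(mu + u)                      # r_t = mu + u_t
    sigma2 = alpha0 + alpha1 * u * u + beta * sigma2   # (16.17)

# Unconditional variance is alpha0 / (1 - alpha1 - beta) = 0.2 for these values
print(round(sum(x * x for x in r) / T, 3))
```

Because large shocks raise the next period's conditional variance, the generated return series exhibits volatility clustering and an unconditional distribution with fatter tails than the normal, which is exactly the property wanted for the option-pricing exercise.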
APPENDIX TABLE 1
Reference Table: Let Z be a standard normal random variable.
z      P(Z < z)   z      P(Z < z)   z      P(Z < z)   z      P(Z < z)   z      P(Z < z)   z      P(Z < z)
-3.00  0.0013    -2.50  0.0062    -2.00  0.0228    -1.50  0.0668    -1.00  0.1587    -0.50  0.3085
-2.99  0.0014    -2.49  0.0064    -1.99  0.0233    -1.49  0.0681    -0.99  0.1611    -0.49  0.3121
-2.98  0.0014    -2.48  0.0066    -1.98  0.0239    -1.48  0.0694    -0.98  0.1635    -0.48  0.3156
-2.97  0.0015    -2.47  0.0068    -1.97  0.0244    -1.47  0.0708    -0.97  0.1660    -0.47  0.3192
-2.96  0.0015    -2.46  0.0069    -1.96  0.0250    -1.46  0.0721    -0.96  0.1685    -0.46  0.3228
-2.95  0.0016    -2.45  0.0071    -1.95  0.0256    -1.45  0.0735    -0.95  0.1711    -0.45  0.3264
-2.94  0.0016    -2.44  0.0073    -1.94  0.0262    -1.44  0.0749    -0.94  0.1736    -0.44  0.3300
-2.93  0.0017    -2.43  0.0075    -1.93  0.0268    -1.43  0.0764    -0.93  0.1762    -0.43  0.3336
-2.92  0.0018    -2.42  0.0078    -1.92  0.0274    -1.42  0.0778    -0.92  0.1788    -0.42  0.3372
-2.91  0.0018    -2.41  0.0080    -1.91  0.0281    -1.41  0.0793    -0.91  0.1814    -0.41  0.3409
-2.90  0.0019    -2.40  0.0082    -1.90  0.0287    -1.40  0.0808    -0.90  0.1841    -0.40  0.3446
-2.89  0.0019    -2.39  0.0084    -1.89  0.0294    -1.39  0.0823    -0.89  0.1867    -0.39  0.3483
-2.88  0.0020    -2.38  0.0087    -1.88  0.0301    -1.38  0.0838    -0.88  0.1894    -0.38  0.3520
-2.87  0.0021    -2.37  0.0089    -1.87  0.0307    -1.37  0.0853    -0.87  0.1922    -0.37  0.3557
-2.86  0.0021    -2.36  0.0091    -1.86  0.0314    -1.36  0.0869    -0.86  0.1949    -0.36  0.3594
-2.85  0.0022    -2.35  0.0094    -1.85  0.0322    -1.35  0.0885    -0.85  0.1977    -0.35  0.3632
-2.84  0.0023    -2.34  0.0096    -1.84  0.0329    -1.34  0.0901    -0.84  0.2005    -0.34  0.3669
-2.83  0.0023    -2.33  0.0099    -1.83  0.0336    -1.33  0.0918    -0.83  0.2033    -0.33  0.3707
-2.82  0.0024    -2.32  0.0102    -1.82  0.0344    -1.32  0.0934    -0.82  0.2061    -0.32  0.3745
-2.81  0.0025    -2.31  0.0104    -1.81  0.0351    -1.31  0.0951    -0.81  0.2090    -0.31  0.3783
-2.80  0.0026    -2.30  0.0107    -1.80  0.0359    -1.30  0.0968    -0.80  0.2119    -0.30  0.3821
-2.79  0.0026    -2.29  0.0110    -1.79  0.0367    -1.29  0.0985    -0.79  0.2148    -0.29  0.3859
-2.78  0.0027    -2.28  0.0113    -1.78  0.0375    -1.28  0.1003    -0.78  0.2177    -0.28  0.3897
-2.77  0.0028    -2.27  0.0116    -1.77  0.0384    -1.27  0.1020    -0.77  0.2206    -0.27  0.3936
-2.76  0.0029    -2.26  0.0119    -1.76  0.0392    -1.26  0.1038    -0.76  0.2236    -0.26  0.3974
-2.75  0.0030    -2.25  0.0122    -1.75  0.0401    -1.25  0.1056    -0.75  0.2266    -0.25  0.4013
-2.74  0.0031    -2.24  0.0125    -1.74  0.0409    -1.24  0.1075    -0.74  0.2296    -0.24  0.4052
-2.73  0.0032    -2.23  0.0129    -1.73  0.0418    -1.23  0.1093    -0.73  0.2327    -0.23  0.4090
-2.72  0.0033    -2.22  0.0132    -1.72  0.0427    -1.22  0.1112    -0.72  0.2358    -0.22  0.4129
-2.71  0.0034    -2.21  0.0136    -1.71  0.0436    -1.21  0.1131    -0.71  0.2389    -0.21  0.4168
-2.70  0.0035    -2.20  0.0139    -1.70  0.0446    -1.20  0.1151    -0.70  0.2420    -0.20  0.4207
-2.69  0.0036    -2.19  0.0143    -1.69  0.0455    -1.19  0.1170    -0.69  0.2451    -0.19  0.4247
-2.68  0.0037    -2.18  0.0146    -1.68  0.0465    -1.18  0.1190    -0.68  0.2483    -0.18  0.4286
-2.67  0.0038    -2.17  0.0150    -1.67  0.0475    -1.17  0.1210    -0.67  0.2514    -0.17  0.4325
-2.66  0.0039    -2.16  0.0154    -1.66  0.0485    -1.16  0.1230    -0.66  0.2546    -0.16  0.4364
-2.65  0.0040    -2.15  0.0158    -1.65  0.0495    -1.15  0.1251    -0.65  0.2578    -0.15  0.4404
-2.64  0.0041    -2.14  0.0162    -1.64  0.0505    -1.14  0.1271    -0.64  0.2611    -0.14  0.4443
-2.63  0.0043    -2.13  0.0166    -1.63  0.0516    -1.13  0.1292    -0.63  0.2643    -0.13  0.4483
-2.62  0.0044    -2.12  0.0170    -1.62  0.0526    -1.12  0.1314    -0.62  0.2676    -0.12  0.4522
-2.61  0.0045    -2.11  0.0174    -1.61  0.0537    -1.11  0.1335    -0.61  0.2709    -0.11  0.4562
-2.60  0.0047    -2.10  0.0179    -1.60  0.0548    -1.10  0.1357    -0.60  0.2743    -0.10  0.4602
-2.59  0.0048    -2.09  0.0183    -1.59  0.0559    -1.09  0.1379    -0.59  0.2776    -0.09  0.4641
-2.58  0.0049    -2.08  0.0188    -1.58  0.0571    -1.08  0.1401    -0.58  0.2810    -0.08  0.4681
-2.57  0.0051    -2.07  0.0192    -1.57  0.0582    -1.07  0.1423    -0.57  0.2843    -0.07  0.4721
-2.56  0.0052    -2.06  0.0197    -1.56  0.0594    -1.06  0.1446    -0.56  0.2877    -0.06  0.4761
-2.55  0.0054    -2.05  0.0202    -1.55  0.0606    -1.05  0.1469    -0.55  0.2912    -0.05  0.4801
-2.54  0.0055    -2.04  0.0207    -1.54  0.0618    -1.04  0.1492    -0.54  0.2946    -0.04  0.4840
-2.53  0.0057    -2.03  0.0212    -1.53  0.0630    -1.03  0.1515    -0.53  0.2981    -0.03  0.4880
-2.52  0.0059    -2.02  0.0217    -1.52  0.0643    -1.02  0.1539    -0.52  0.3015    -0.02  0.4920
-2.51  0.0060    -2.01  0.0222    -1.51  0.0655    -1.01  0.1562    -0.51  0.3050    -0.01  0.4960
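As a quick consistency check, the entries in the table can be reproduced from the standard normal CDF, computed here via the error function in the Python standard library:

```python
from math import erf, sqrt

def norm_cdf(z):
    """P(Z < z) for a standard normal Z: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Spot-check a few entries of Appendix Table 1
for z in (-3.00, -1.96, -0.50):
    print(z, round(norm_cdf(z), 4))
```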
European Monetary Union (EMU), 240
evidence, 51
excess kurtosis, 24
expectations, 14-17
expected shortfall, 66-67
expected value, 15
explained sum of squares (ESS), 77
exponential trend, 145, 146, 147, 156, 157
exponentially weighted moving average model (EWMA), 214-215, 216, 217-219
  correlations and, 227-228
factor copula model, 233
factor models, 229-230
fair value, of bond, 16
fat-tailed underlying process, 248
F-distribution, 40-41
finite-order moving average, 192
first least squares assumption, omitted variable bias and, 107
first-order moving average, 192
forecasting
  application: housing starts, 164-168
  application: retail sales, 151-159
  employment, estimating models for, 197-204
  using GARCH(1,1) for future volatility, 220-223
foreign currency options, 211
frequentists, vs. Bayes, 51-52
F-statistic, 128-129
GARCH models, 228
GARCH(1,1) model, 215-216, 217-219, 220-223
Gauss, Johann, 34
Gaussian copula model, 230, 231, 232, 234
Gaussian distribution, 34
Gaussian white noise, 176
Gauss-Markov conditions, 101-102
Gauss-Markov theorem, 95, 98, 101-102
general linear process, 179-180
Gosset, William Sealy, 39-40
hedging, portfolio variance and, 20-21
heteroskedastic error term, 111
heteroskedasticity, 94-97, 111
  F-statistic and, 128
heteroskedasticity-robust standard errors, 96
holding X2 constant, 110
holiday variation, 164
homoskedastic error term, 111
homoskedastic normal regression assumptions, 99
homoskedasticity, 94-97, 111
  F-statistic and, 129-130
homoskedasticity-only F-statistic, 129-130
homoskedasticity-only standard error, 95-96
Hull, John C., 161-168, 225-237
Huygens, Christiaan, 14-15
hypothesis testing, 60-62
  about intercept β0, 91
  about one of the regression coefficients, 88-91
  about population mean, 88-89
  about slope, 89
  confidence intervals and, 93
  for single coefficient, 124-126
  on two or more coefficients, 126-127
imperfect multicollinearity, 118-119
implied volatilities, 209-210
independent events, 7-8
independent variable, 71, 72
independent white noise, 175
independently and identically distributed (i.i.d.), 37, 79-80, 115
indicator variable, 93
innovations, 179
in-sample overfitting, 149
intercept, 71, 110
inverse cumulative distribution functions (CDFs), 6-7
invertible MA(1) process, 189
joint hypotheses, 138-139
  Bonferroni test of, 138-139
  tests of, in multiple regression, 126-130
joint null hypotheses, 126-127
J.P. Morgan, 62
JPMorgan, 215
kurtosis, 23-24, 211
lag operator, 178
least absolute deviations (LAD) estimator, 99
least squares assumptions, 78-81
  in multiple regression, 115-116
least-squares regression, 147
left-hand variable, 72
leptokurtic distribution, 211
likelihood, 51
linear conditionally unbiased estimators, 98
linear regression model, 70-76
linear regression with multiple regressors
  distribution of OLS estimators in, 116
  least squares assumptions in, 115-116
  measures of fit in, 113-115
  multicollinearity, 117-119
  multiple regression model, 110-111
  OLS estimator in, 112-113
  omitted variable bias, 106-110
  overview, 106
value-at-risk (VaR)
  backtesting, 64-65
  expected shortfall, 66-67
  overview, 62-64
  subadditivity, 65-66
variables. See also omitted variable bias
  binary, 93-94
  control, 110
  dummy, 93, 118
  indicator, 93
  standardized, 18