
Topic 1

Background Material
John Stapleton


Table of Contents
1.1 A review of some basic statistical concepts
1.2 Random regressors
1.3 Modelling the conditional mean
1.3.1 Specifying a functional form for the conditional mean
1.3.2 Choosing the regressors

1.4 Some asymptotic theory


1.4.1 Introduction
1.4.2 Consistency
1.4.3 Asymptotic normality
1.4.4 Asymptotic efficiency

1.5 Testing linear restrictions on the parameters


1.6 A review of generalized least squares (GLS)
(ETC3410)

Background Material

2 / 85

1.1 A Review of some basic statistical concepts I


Definition (1.1)
Let x be a discrete random variable which can take on the values (x1, x2, ..., xn) with probabilities (f(x1), f(x2), ..., f(xn)) respectively. Then the mean or expected value or expectation of x, which we denote by E(x), is defined as:

E(x) = ∑_{i=1}^{n} xi f(xi).

If x is a continuous random variable with probability density function f(x), then

E(x) = ∫ x f(x) dx.

For any set of random variables x, y and z, the expectations operator satisfies the following rules:

1.1 A Review of some basic statistical concepts II

R1 E(x + y + z) = E(x) + E(y) + E(z).
R2 E(k) = k for any constant k.
R3 E(kx) = kE(x) for any constant k.
R4 E(k + x) = k + E(x) for any constant k.
R5 In general, E(xy) ≠ E(x)E(y).
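Rules R1–R5 can be checked numerically on simulated draws. The distributions and constants below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent draws: here E(xy) = E(x)E(y) holds (approximately, in the sample).
x = rng.normal(2.0, 1.0, n)
y = rng.normal(3.0, 1.0, n)
gap_indep = np.mean(x * y) - np.mean(x) * np.mean(y)

# Dependent draws: in general E(xy) != E(x)E(y) (rule R5).
z = x + rng.normal(0.0, 1.0, n)   # z is built from x, so x and z are correlated
gap_dep = np.mean(x * z) - np.mean(x) * np.mean(z)

# Linearity (rules R1-R4): E(2 + 3x) = 2 + 3E(x) holds exactly in the sample.
lin_gap = np.mean(2 + 3 * x) - (2 + 3 * np.mean(x))
```

With independent draws `gap_indep` is near zero, while `gap_dep` is clearly not.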


1.1 A Review of some basic statistical concepts III

Definition (1.2)
The variance of the random variable x, which we denote by Var(x), is defined as:

Var(x) = E{[x − E(x)]²}.

Notice that

Var(x) = E{[x − E(x)]²}
       = E[x² − 2xE(x) + E(x)²]
       = E(x²) − 2E(x)E(x) + E(x)²
       = E(x²) − 2E(x)² + E(x)²
       = E(x²) − E(x)².

Informally, Var (x ) measures how tightly the values of x are clustered around the
mean.
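The equivalence of the two variance expressions in Definition (1.2) can be verified on a simulated sample (the exponential distribution here is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500_000)  # arbitrary example distribution, Var = 4

mu = x.mean()
var_def = np.mean((x - mu) ** 2)        # E{[x - E(x)]^2}
var_short = np.mean(x ** 2) - mu ** 2   # E(x^2) - E(x)^2
```

The two sample quantities agree up to floating-point error, mirroring the algebraic identity.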


1.1 A Review of some basic statistical concepts IV


Definition (1.3)
Let x and y be two random variables. Then the covariance between x and y, which we denote by Cov(x, y), is defined as:

Cov(x, y) = E{[x − E(x)][y − E(y)]}.

Cov(x, y) measures the degree of linear association between x and y.

Notice that

Cov(x, y) = E{[x − E(x)][y − E(y)]}
          = E[xy − xE(y) − yE(x) + E(x)E(y)]
          = E(xy) − E(x)E(y) − E(y)E(x) + E(x)E(y)
          = E(xy) − E(x)E(y).


1.1 A Review of some basic statistical concepts V


Therefore, in the special case in which

E(x) = 0 and/or E(y) = 0,

the formula for the covariance between x and y simplifies to

Cov(x, y) = E(xy).

For any pair of random variables x and y and any constants a and b, the Var operator satisfies the following rules:

R6 Var(a) = 0.
R7 Var(ax) = a²Var(x).
R8 Var(ax + by) = a²Var(x) + b²Var(y) + 2abCov(x, y).
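Rule R8 can likewise be checked on simulated data; the constants a = 2 and b = −3 and the data-generating process below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(0, 1, n)
y = 0.5 * x + rng.normal(0, 1, n)   # y is correlated with x by construction

a, b = 2.0, -3.0
lhs = np.var(a * x + b * y)                       # Var(ax + by) computed directly
rhs = (a**2 * np.var(x) + b**2 * np.var(y)
       + 2 * a * b * np.cov(x, y)[0, 1])          # R8: a^2 Var(x) + b^2 Var(y) + 2ab Cov(x, y)
```

The direct variance and the R8 decomposition coincide (up to a negligible degrees-of-freedom difference in `np.cov`).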

1.1 A Review of some basic statistical concepts VI


R9 If x and y are independent random variables, Cov(x, y) = 0 and
Var(ax + by) = a²Var(x) + b²Var(y).

As a measure of linear association, the covariance suffers from two serious limitations:
The value of Cov(x, y) depends on the units in which x and y are measured.
The value of Cov(x, y) is difficult to interpret. For example, how do we interpret the statement that Cov(x, y) = 2?
Correlation, which we define below, is a superior measure of the degree of linear association between two random variables.


1.1 A Review of some basic statistical concepts VII

Definition (1.4)
Let x and y be two random variables. Then the correlation between x and y, which we denote by Corr(x, y), is defined as:

Corr(x, y) = Cov(x, y) / [SD(x) SD(y)],

where

SD(x) = Var(x)^(1/2), SD(y) = Var(y)^(1/2).


It can be shown that

|Corr(x, y)| ≤ 1.

1.1 A Review of some basic statistical concepts VIII

Corr(x, y) is unit free and is easy to interpret. For example, if

Corr(x, y) = 0.8

we conclude that there is a strong, positive, linear relationship between x and y.
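Because correlation is unit free, rescaling x (say, from dollars to cents) changes Cov(x, y) but leaves Corr(x, y) untouched. A quick sketch, using an invented data-generating process built to give Corr(x, y) = 0.8:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(0, 1, n)
y = 0.8 * x + 0.6 * rng.normal(0, 1, n)   # Corr(x, y) = 0.8 by construction

def corr(u, v):
    """Corr(u, v) = Cov(u, v) / (SD(u) SD(v)), as in Definition (1.4)."""
    cov = np.mean((u - u.mean()) * (v - v.mean()))
    return cov / (u.std() * v.std())

r_original = corr(x, y)
r_rescaled = corr(100 * x, y)   # change the units of x: correlation is unchanged
```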


1.2 Random regressors I

In introductory econometrics units it is often assumed that the regressors in the model are not random variables. For example, in the simple bivariate regression model

yi = β0 + β1 xi + ui,

yi and ui are assumed to be random variables, but xi is assumed to be a fixed number which does not change in value from sample to sample.
While this assumption is useful for pedagogical purposes because it simplifies the analysis, it is inappropriate for the nonexperimental data with which we typically work in disciplines such as economics and finance.
Nonexperimental data is data that is not generated by performing a controlled experiment.


1.2 Random regressors II


When working with nonexperimental data, it is appropriate to treat both
the dependent variable and the regressors in our regression models as
random variables. Under this more realistic assumption, when we collect a
sample of data

(yi , xi ), i = 1, 2, ..., N
we are effectively making a drawing from the joint probability distribution of the random variables (yi, xi).
Consider the multivariate linear regression model

yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + ui.    (1.1)

Let

fJ(yi, xi1, ..., xik | θ),



1.2 Random regressors III


denote the joint probability distribution of the random variables (yi, xi1, ..., xik), with parameter vector θ. That is, θ is the vector of parameters that appears in the mathematical formula for the joint probability distribution of (yi, xi1, ..., xik).
Recall from elementary statistics that

fJ(yi, xi1, ..., xik | θ) = fC(yi | xi1, ..., xik, θ) fJ(xi1, ..., xik | θ),    (1.2)

where:
fJ(yi, xi1, ..., xik | θ) is the joint probability distribution of (yi, xi1, ..., xik).
fC(yi | xi1, ..., xik, θ) is the probability distribution of yi conditional on (xi1, ..., xik).
fJ(xi1, ..., xik | θ) is the joint probability distribution of (xi1, ..., xik).

1.2 Random regressors IV


Notice that the conditional probability distribution, fC(yi | xi1, ..., xik, θ), enables us to make probability statements about yi conditional on the values of (xi1, ..., xik) being fixed.
The most general statistical analysis of the behavior of (yi, xi1, ..., xik) would involve constructing a mathematical model of fJ(yi, xi1, ..., xik | θ). However, this task is usually too difficult and instead we restrict our attention to modelling fC(yi | xi1, ..., xik, θ).
Since

fJ(yi, xi1, ..., xik | θ) = fC(yi | xi1, ..., xik, θ) fJ(xi1, ..., xik | θ),    (1.2)

this strategy obviously means that we ignore fJ(xi1, ..., xik | θ), and lose any information that it contains regarding the parameter vector θ.


1.2 Random regressors V


The strategy of focusing on fC(yi | xi1, ..., xik, θ) and ignoring fJ(xi1, ..., xik | θ) does not entail any loss of information in the following special case. Let

θ = (θ1, θ2),

where θ1 is the vector of parameters of interest, and assume that

fJ(yi, xi1, ..., xik | θ) = fC(yi | xi1, ..., xik, θ1) fJ(xi1, ..., xik | θ2).    (1.3)

Notice that in (1.3) the parameter vector of interest θ1 appears only in the conditional distribution of yi.
When (1.3) holds, (xi1, ..., xik) are said to be weakly exogenous with respect to θ1, and there is no loss of information as a result of ignoring fJ(xi1, ..., xik | θ2) and focusing exclusively on fC(yi | xi1, ..., xik, θ1).


1.2 Random regressors VI


In fact, even modelling fC(y | x1, x2, ..., xk) is usually too difficult. Instead, we typically focus on only one feature of the conditional distribution of yi, namely the conditional mean, which we denote by E(y | x1, x2, ..., xk). (To economize on notation, the parameter vector and the subscript i are suppressed.)
In particular, we are usually most interested in estimating and testing hypotheses about how the conditional mean of y changes in response to changes in (x1, x2, ..., xk).
Typically, y will not assume its conditional mean value. Let u denote the deviation of y from its conditional mean. Then, by definition,

u = y − E(y | x1, x2, ..., xk).    (1.4)

Rearranging (1.4) we obtain

y = E(y | x1, x2, ..., xk) + u.    (1.5)


1.2 Random regressors VII

Equation (1.5) is sometimes referred to as the error form of the model, or the model in error form.
When we take conditional expectations of both sides of (1.5) we obtain

E(y | x1, x2, ..., xk) = E(y | x1, x2, ..., xk) + E(u | x1, x2, ..., xk),

which implies that

E(u | x1, x2, ..., xk) = 0.    (1.6)

Equations (1.5) and (1.6) together imply that we can always express y as
the sum of its true conditional mean and a random error term, which itself
has a conditional mean of zero.
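The decomposition y = E(y|x) + u with E(u|x) = 0 can be illustrated by simulation. With an invented conditional mean E(y|x) = 1 + 2x, the deviations u average to roughly zero within any narrow band of x:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
x = rng.uniform(0, 10, n)
cond_mean = 1.0 + 2.0 * x                 # assumed E(y | x), invented for illustration
y = cond_mean + rng.normal(0, 3, n)       # y = E(y|x) + u, with u ~ N(0, 9)

u = y - cond_mean                         # deviation of y from its conditional mean
# Average u within each of 10 bands of x: every band mean should be near zero.
band = np.digitize(x, np.linspace(0, 10, 11)[1:-1])
band_means = np.array([u[band == b].mean() for b in range(10)])
```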


1.2 Random regressors VIII


If xj is a continuous variable, the marginal or partial effect of xj on the average value of y is given by

∂E(y | x1, x2, ..., xk)/∂xj.    (1.7)

A great deal of applied econometrics consists of trying to correctly specify the conditional mean of the dependent variable y, and trying to obtain an estimator of the marginal effects of interest that has good statistical properties.
There are two aspects to specifying the conditional mean of the dependent variable:
We must specify a functional form for the conditional mean.
We must decide what explanatory variables to include in the conditional mean function.
We briefly consider each of these issues in the following two subsections.

1.3 Modelling the conditional mean I


1.3.1 Specifying a functional form for the conditional mean

In order to model the conditional mean we have to make an assumption about its functional form. The assumption that we make has important implications for:
How we compute the marginal effects of the x variables.
The properties of the marginal effects.
How we interpret the regression coefficients.
The method we use to estimate the regression coefficients.
In this section we briefly consider the most common specifications that are used for the conditional mean. To economize on notation, we assume a model with two explanatory variables and an intercept.
M1 The conditional mean is assumed to be linear in both the parameters and the regressors.

1.3 Modelling the conditional mean II


1.3.1 Specifying a functional form for the conditional mean

Under this specification the conditional mean is given by

E(y | x1, x2) = α + β1 x1 + β2 x2,    (1.8)

and the model in error form is

y = E(y | x1, x2) + u
  = α + β1 x1 + β2 x2 + u.    (1.9)

From (1.8) we have

∂E(y | x1, x2)/∂xj = βj,  j = 1, 2.    (1.10)

Under this specification for the conditional mean:

The marginal effect of xj is constant and equal to βj.

1.3 Modelling the conditional mean III


1.3.1 Specifying a functional form for the conditional mean

βj measures the change in the conditional mean of the dependent variable arising from a one unit change in xj, holding the other regressor constant.
The marginal effect of xj does not vary across observations and does not depend on the value of any of the regressors.
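Under M1 the marginal effects are the coefficients themselves, so OLS on simulated data recovers them directly. A sketch with invented values α = 1, β1 = 2, β2 = −0.5:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
# Invented data-generating process: alpha = 1, beta1 = 2, beta2 = -0.5
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta_hat[1] estimates the (constant) marginal effect of x1; beta_hat[2] that of x2.
```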
M2 The conditional mean is assumed to be linear in the parameters but
nonlinear in one or more of the regressors.
For example,

E(y | x1, x2) = α + β1 x1² + β2 x2²,    (1.11)

or, in error form,

y = E(y | x1, x2) + u
  = α + β1 x1² + β2 x2² + u.    (1.12)

1.3 Modelling the conditional mean IV


1.3.1 Specifying a functional form for the conditional mean

From (1.11) we have

∂E(y | x1, x2)/∂x1 = 2β1 x1,   ∂E(y | x1, x2)/∂x2 = 2β2 x2.    (1.13)

Under this specification:

The marginal effect of xj is not measured by βj.
The marginal effect of xj varies with the value of xj.
The marginal effect of xj measures the change in the conditional mean of the dependent variable arising from a one unit change in xj holding the other regressor constant.


1.3 Modelling the conditional mean V


1.3.1 Specifying a functional form for the conditional mean

In some cases, a model specification that allows some of the marginal effects to vary, such as M2, may be more realistic than one that constrains all the marginal effects to be constant. For example, if we wished to study the effect of education on average wages, we might specify the conditional mean of wages as

E(wage | educ, exper, race, gender)
  = α + β1 educ + β2 exper + β3 race + β4 gender + β5 exper².    (1.14)

Since (1.14) implies that

∂E(wage | educ, exper, race, gender)/∂exper = β2 + 2β5 exper,

1.3 Modelling the conditional mean VI


1.3.1 Specifying a functional form for the conditional mean

this specification allows the marginal effect of experience to depend on the level of experience.
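To make this concrete, suppose (purely hypothetically) β2 = 0.30 and β5 = −0.005; the marginal effect β2 + 2β5·exper then falls as experience accumulates:

```python
import numpy as np

# Hypothetical coefficient values (not estimates from the notes):
beta2, beta5 = 0.30, -0.005

exper = np.array([0, 5, 10, 20, 30])
marginal_effect = beta2 + 2 * beta5 * exper
# The effect of one more year of experience shrinks as experience accumulates.
```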
M3 The conditional mean of the natural log of the dependent variable is assumed to be linear in the parameters and the natural log of the explanatory variables (log-linear model).
Under this specification

E(ln y | x1, x2) = α + β1 ln x1 + β2 ln x2,    (1.15)

or, in error form,

ln y = E(ln y | x1, x2) + u
     = α + β1 ln x1 + β2 ln x2 + u.    (1.16)

1.3 Modelling the conditional mean VII


1.3.1 Specifying a functional form for the conditional mean

Although the model is nonlinear in the regressors, it is linear in the natural log of the regressors and in the parameters, and can easily be estimated by OLS.
From (1.16) we have

∂ln y/∂ln xj = βj,  j = 1, 2.    (1.17)

This specification is often attractive because the regression coefficients can be interpreted as elasticities or percentage changes.
In (1.17) βj measures the percentage change in the level of y arising from a one percent change in the level of xj, holding the other regressor constant. That is, βj measures the elasticity of y (not ln y) with respect to xj (not ln xj), holding the other regressor constant.

1.3 Modelling the conditional mean VIII


1.3.1 Specifying a functional form for the conditional mean

To see this note that

∂ln y/∂ln x = lim_{Δln x → 0} Δln y/Δln x ≈ Δln y/Δln x, for small Δln x.

Let

Δln y = ln y1 − ln y0,   Δln x = ln x1 − ln x0.

1.3 Modelling the conditional mean IX


1.3.1 Specifying a functional form for the conditional mean

Then

Δln y = ln y1 − ln y0
      = ln(y1/y0)
      = ln(y1/y0 − 1 + 1)
      = ln((y1 − y0)/y0 + 1)
      = ln(Δy/y0 + 1)
      ≈ Δy/y0 for small changes in y,

so that

100 Δln y ≈ 100 (y1 − y0)/y0 = % change in y.

1.3 Modelling the conditional mean I


1.3.1 Specifying a functional form for the conditional mean

In deriving this approximation we have used the fact that

ln(N + 1) ≈ N

for any "small" number N. For example,

ln(0.2 + 1) = 0.18 ≈ 0.2.

Using the same logic,

100 Δln x ≈ % change in x.
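The approximation ln(N + 1) ≈ N is easy to check directly, and the check also shows how it deteriorates as N grows:

```python
import math

# |ln(1 + N) - N| for a few values of N: small for small N, larger as N grows.
approx_error = {N: abs(math.log(1 + N) - N) for N in (0.01, 0.05, 0.2, 0.5)}
# e.g. ln(1.2) = 0.1823..., so the error at N = 0.2 is about 0.018.
```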

1.3 Modelling the conditional mean II


1.3.1 Specifying a functional form for the conditional mean

Therefore, for small changes in x and y,

∂ln y/∂ln x ≈ Δln y/Δln x = (100 Δln y)/(100 Δln x) ≈ (% change in y)/(% change in x).

For example, if

β1 = 2

in M3, then a one percent increase in x1, holding x2 fixed, is associated with a two percent increase in y.
M4 The conditional mean of the log of the dependent variable is assumed to be linear in the parameters and in the level of the regressors (log-level model).


1.3 Modelling the conditional mean III


1.3.1 Specifying a functional form for the conditional mean

Under this specification the model is given by

E(ln y | x1, x2) = α + β1 x1 + β2 x2,    (1.18)

or, in error form,

ln y = E(ln y | x1, x2) + u
     = α + β1 x1 + β2 x2 + u.    (1.19)

From (1.19) we have

∂ln y/∂xj = βj,  j = 1, 2.    (1.20)

Under this specification:


1.3 Modelling the conditional mean IV


1.3.1 Specifying a functional form for the conditional mean

100βj measures the percentage change in the level of y arising from a one unit change in the level of xj, holding the other regressor constant, since

100βj = 100 ∂ln y/∂xj ≈ (100 Δln y)/Δxj ≈ (% change in y)/Δxj.

For example, if

β1 = 0.2

1.3 Modelling the conditional mean V


1.3.1 Specifying a functional form for the conditional mean

in M4, then a one unit increase in x1, holding x2 fixed, is associated with a twenty percent increase in y.
The marginal effect of xj on the % change in y is constant.
All of the specifications for the conditional mean of y that we have considered so far have the property that they are linear in the parameters. Models that are linear in the parameters can generally be estimated by OLS.
Of course, whether or not the OLS estimator has good statistical properties depends on other features of the model such as, for example, whether or not the errors are homoskedastic.
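The 100βj interpretation can be illustrated by simulation: fitting an invented log-level model with β1 = 0.05 gives 100·β̂1 ≈ 5, close to the exact percentage change exp(0.05) − 1 ≈ 5.13% from a one-unit rise in x1:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
x1 = rng.uniform(0, 10, n)
beta1 = 0.05
ln_y = 2.0 + beta1 * x1 + rng.normal(0, 0.2, n)   # log-level model, invented values

X = np.column_stack([np.ones(n), x1])
b_hat, *_ = np.linalg.lstsq(X, ln_y, rcond=None)

approx_pct = 100 * b_hat[1]              # 100*beta1: approximate % change in y
exact_pct = 100 * (np.exp(beta1) - 1)    # exact % change from a one-unit rise in x1
```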


1.3 Modelling the conditional mean VI


1.3.1 Specifying a functional form for the conditional mean

Many models that appear to be nonlinear in the parameters can be transformed into models that are linear in the parameters. For example, the model given by

y = e^(α + β1 x1 + β2 x2) e^u    (1.21)

is nonlinear in the parameters. However, taking logs on both sides of (1.21) we obtain

ln y = α + β1 x1 + β2 x2 + u,    (1.22)

which is linear in the parameters.
Notice that the parameters in (1.22) are exactly the same as the parameters in (1.21), so when we estimate (1.22) we get estimates of the parameters in (1.21). However, because it is linear in the parameters, (1.22) is much easier to estimate than (1.21).
M5 The conditional mean of the dependent variable is intrinsically nonlinear in
the parameters.

1.3 Modelling the conditional mean VII


1.3.1 Specifying a functional form for the conditional mean

Some models of the conditional mean of the dependent variable are intrinsically nonlinear in the parameters in the sense that they cannot be made linear by applying a mathematical transformation, such as taking logs. For example, assume that

E(y | x1, x2) = 1 / (1 + e^(−(α + β1 x1 + β2 x2))).    (1.23)

This model is known as the logit model and is studied in Topic 2. The logit model is intrinsically nonlinear since it cannot be made linear in the parameters by applying a mathematical transformation.
Intrinsically nonlinear models cannot be estimated by OLS. They are typically estimated by using the method of maximum likelihood or, less commonly, the method of nonlinear least squares.

1.3 Modelling the conditional mean VIII


1.3.1 Specifying a functional form for the conditional mean

In intrinsically nonlinear models such as (1.23) the marginal effects of the regressors:
Are not given by the regression coefficients.
Depend on the values of the regressors.
As we will see in Topic 2, a nonlinear specification for the conditional mean of the dependent variable is sometimes more appropriate than a linear specification, given the nature of the dependent variable.
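For the logit mean (1.23), differentiating gives ∂E(y|x1, x2)/∂x1 = β1Λ(1 − Λ), where Λ denotes the conditional mean itself, so the marginal effect changes with the regressors. A sketch with invented coefficient values:

```python
import numpy as np

# Hypothetical logit coefficients (illustrative only):
alpha, beta1, beta2 = -1.0, 0.8, 0.5

def logit_mean(x1, x2):
    """E(y | x1, x2) = 1 / (1 + exp(-(alpha + beta1*x1 + beta2*x2))) -- eq. (1.23)."""
    return 1.0 / (1.0 + np.exp(-(alpha + beta1 * x1 + beta2 * x2)))

def marginal_effect_x1(x1, x2):
    """dE(y|x)/dx1 = beta1 * L * (1 - L): depends on the regressors, not just beta1."""
    L = logit_mean(x1, x2)
    return beta1 * L * (1.0 - L)

# The effect of x1 rises and then falls as x1 moves the mean from 0 toward 1.
effects = np.array([marginal_effect_x1(x1, 0.0) for x1 in (-2.0, 0.0, 2.0, 4.0)])
```

Note the effect is bounded by β1/4, attained where Λ = 0.5.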


1.3 Modelling the conditional mean I


1.3.2 Choosing the regressors

Consider the linear regression model

y = α + β1 x1 + β2 x2 + ... + βk−1 xk−1 + βk xk + u.    (1.24)

It is very important to understand the role of the error term, u, in (1.24). The error term represents all those variables that affect the dependent variable that have not been explicitly included as regressors in the model.
If one of the regressors, say xi, in (1.24) is correlated with any of the omitted variables that are contained in u, then xi will necessarily be correlated with u. A regressor that is correlated with the error term is referred to as an endogenous regressor.


1.3 Modelling the conditional mean II


1.3.2 Choosing the regressors

For example, suppose that the correct model in error form is

y = α + β1 x1 + β2 x2 + ... + βk−1 xk−1 + βk xk + u,    (1.24)

but we estimate

y = α + β1 x1 + β2 x2 + ... + βk−1 xk−1 + v.    (1.25)

In this case, we have omitted the relevant regressor xk. It follows from (1.24) and (1.25) that

v = βk xk + u.    (1.26)

The omitted variable xk is now incorporated in the error term, v, in (1.25).
If, for example, xk is correlated with x2, then x2 will be correlated with v in (1.25). That is, x2 will be an endogenous regressor.
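The omitted-variable problem in (1.24)–(1.26) can be simulated directly: generate data with two correlated regressors, omit one, and the OLS slope on the included regressor is distorted even in a very large sample. All numbers below are invented:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500_000
x2 = rng.normal(0, 1, n)
x1 = 0.7 * x2 + rng.normal(0, 1, n)                   # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, n)   # true beta1 = 2

# Correctly specified regression: include both regressors.
X_full = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Misspecified regression: omit x2, which is absorbed into the error v = 3*x2 + u.
X_omit = np.column_stack([np.ones(n), x1])
b_omit, *_ = np.linalg.lstsq(X_omit, y, rcond=None)
# b_omit[1] stays far from 2 no matter how large n is, because x1 is correlated with v.
```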

1.3 Modelling the conditional mean III


1.3.2 Choosing the regressors

As we will see in Topic 3, when a regression equation contains one or more endogenous regressors both the OLS and GLS estimators of the regression coefficients lose their desirable statistical properties. Specifically, both estimators are inconsistent. (The concept of consistency is discussed in section 1.4 below.)
In light of this result, it is clearly very important to think carefully about
which regressors to include in the model, and in particular what factors we
wish to control for.
However, even when we are very careful in selecting the regressors, omitting
a relevant regressor may be unavoidable. This will be the case when one or
more of the relevant regressors is unobservable.


1.3 Modelling the conditional mean IV


1.3.2 Choosing the regressors

For example, suppose that we are interested in estimating the marginal effect of education on an individual's wage, controlling for experience, race, gender and ability. In this case the conditional mean of interest is

E(wage | educ, exper, race, gender, exper², ability)
  = α + β1 educ + β2 exper + β3 race + β4 gender + β5 exper² + β6 ability,    (1.27)

which implies that the model in error form is

wage = α + β1 educ + β2 exper + β3 race + β4 gender + β5 exper² + β6 ability + u.    (1.28)

1.3 Modelling the conditional mean V


1.3.2 Choosing the regressors

In (1.28)

∂E(wage | educ, exper, race, gender, exper², ability)/∂educ = β1.

That is, β1 measures the marginal effect of education on the average wage, controlling for differences in experience, race, gender and ability.
Unfortunately, since ability is unobservable, we can't explicitly include it in the model. Consequently, the equation that we actually estimate is

wage = α + β1 educ + β2 exper + β3 race + β4 gender + β5 exper² + v,    (1.28a)

where

v = β6 ability + u.

1.3 Modelling the conditional mean VI


1.3.2 Choosing the regressors

We will see in Topic 3 that if, as we suspect, education and ability are correlated, the OLS estimator of β1 in equation (1.28a) will no longer be "reliable" even in very large samples. More specifically, the OLS estimator of β1 will be an inconsistent estimator of the marginal effect of education on the average wage controlling for differences in experience, race, gender and ability. (The concept of consistency is discussed in section 1.4 below.)
Informally, if we estimate (1.28a) by OLS, the OLS estimate of β1 will be an "unreliable" estimate of the marginal effect of education on wages, controlling for exper, race, gender and ability.
In Topic 4 we will discuss how to deal with the problem of endogenous regressors.


1.4 Some asymptotic theory I


1.4.1 Introduction

In Topic 2 we will study models in which it is desirable to allow the conditional mean of the dependent variable to be nonlinear in the parameters, and in Topics 3 to 8 we will allow the regressors in our models to be correlated with the error term. In these models it is generally impossible to derive estimators that can be shown to be unbiased, efficient and normally distributed in finite samples. In fact, in these models:
The finite sample properties of the estimators that we use are typically unknown.
In addition, the finite sample distributions of our test statistics are also typically unknown.
When conducting inference in these models we are forced to rely almost entirely on asymptotic results, that is, results that can be proved to hold only as the sample size goes to infinity.

1.4 Some asymptotic theory II


1.4.1 Introduction

The strategy researchers use in these circumstances is to derive the asymptotic distributions of estimators and test statistics and to use these asymptotic distributions as approximations to the finite sample distributions of the estimators and test statistics. In effect, we proceed "as if" the asymptotic distributions are valid in finite samples. However, we never know how accurate these approximations are in a given application.
In this section we provide a brief and relatively informal discussion of the important concepts of consistency, asymptotic normality and asymptotic efficiency. A more detailed and technical discussion of these concepts is provided in ETC3400.


1.4 Some asymptotic theory I


1.4.2 Consistency

Let θ̂n denote an estimator of the parameter θ, given a sample of size n. Formally, θ̂n is said to be a consistent estimator of θ if

Pr(|θ̂n − θ| < ε) → 1 as n → ∞, for all ε > 0.    (1.29)

When (1.29) holds we say that θ̂n converges in probability to θ, or that θ is the probability limit of θ̂n, which we denote by

plim(θ̂n) = θ.    (1.30)

Intuitively, θ̂n is a consistent estimator of θ if the probability that θ̂n is arbitrarily close to θ goes to 1 as the sample size gets infinitely large.

1.4 Some asymptotic theory II


1.4.2 Consistency

The practical implication of θ̂n being a consistent estimator of θ is that there is a very high probability that θ̂n will be very "close" to θ when the sample size is large, and in this sense θ̂n will be a "good" estimator of θ in large samples.
Obviously, consistency is a very desirable property for an estimator.
There are four useful properties of the plim operator which we state below without proof. We will use these properties on several occasions later in the lecture notes.
Let x1n and x2n be two random variables such that

plim(x1n) = x1,  plim(x2n) = x2.

That is, the random variables x1n and x2n converge in probability to the random variables x1 and x2 respectively. Then the following properties can be shown to hold:

1.4 Some asymptotic theory III


1.4.2 Consistency

P1 The plim of a sum is the sum of the plims. That is,
   plim(x1n + x2n) = plim(x1n) + plim(x2n) = x1 + x2.

P2 The plim of a product is the product of the plims. That is,
   plim(x1n x2n) = plim(x1n) plim(x2n) = x1 x2.

P3 The plim of the inverse is the inverse of the plim. That is,
   plim(x1n⁻¹) = [plim(x1n)]⁻¹ = 1/x1,  x1 ≠ 0.

P4 The plim of a ratio is the ratio of the plims. That is,
   plim(x1n/x2n) = plim(x1n)/plim(x2n) = x1/x2,  x2 ≠ 0.

1.4 Some asymptotic theory IV


1.4.2 Consistency

Although P1, P2, P3 and P4 above have been stated for scalar random
variables, they can be generalized to random vectors and random matrices.
(That is, vectors and matrices whose elements are random variables).
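Consistency can be visualized with the simplest estimator, the sample mean: by (1.29), Pr(|x̄n − μ| < ε) should climb toward 1 as n grows. A simulation sketch with an invented exponential population (μ = 5, ε = 0.2):

```python
import numpy as np

rng = np.random.default_rng(8)
mu, eps, reps = 5.0, 0.2, 2000

def coverage(n):
    """Estimate Pr(|xbar_n - mu| < eps) by simulation, for sample size n."""
    draws = rng.exponential(scale=mu, size=(reps, n))   # population mean = 5
    xbar = draws.mean(axis=1)
    return np.mean(np.abs(xbar - mu) < eps)

# The probabilities climb toward 1 as n increases -- the defining property (1.29).
probs = [coverage(n) for n in (10, 100, 1000, 10_000)]
```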


1.4 Some asymptotic theory I


1.4.3 Asymptotic normality

Let the scalar θ̂n denote an estimator of the unknown parameter θ, given a sample of size n. The estimator θ̂n is a random variable and, like any random variable, has a probability distribution. The form of this distribution may depend on n. That is, as n increases the form of the probability distribution of θ̂n may change.
Using a body of mathematics known as central limit theorems, many random variables whose probability distribution based on a finite sample (finite sample distribution) is unknown can be shown to have a well-defined probability distribution as the sample size tends to infinity.
When this is the case, the random variable in question is said to "converge in distribution" and the probability distribution to which it converges is called a limiting (or limit) distribution.


1.4 Some asymptotic theory II


1.4.3 Asymptotic normality

When θ̂n is a consistent estimator,

plim(θ̂n) = θ,

which means that the distribution of θ̂n collapses to a single point as n goes to infinity, in which case the limiting distribution of θ̂n is degenerate.
In order to obtain a non-degenerate limiting distribution for a consistent estimator we "normalize" θ̂n as described below.
Formally, we say that θ̂n has a limiting normal distribution if

√n(θ̂n − θ) →d N(0, V),    (1.31)

where N(0, V) denotes a normally distributed random variable with mean zero and some unknown variance V, and the notation →d denotes convergence in distribution as n tends to infinity.

1.4 Some asymptotic theory III


1.4.3 Asymptotic normality

Although we refer to the estimator θ̂n as having a limiting normal distribution, it is clear from (1.31) that it is actually the random variable

√n(θ̂n − θ)

that converges to a normal random variable as n goes to infinity.
Equation (1.31) is an exact result, not an approximation. It states that

√n(θ̂n − θ) →d N(0, V)

is strictly true as n tends to infinity.
However, assume that

√n(θ̂n − θ) ≈ N(0, V)    (1.32)

for large, but finite, n (where the symbol ≈ denotes "is approximately").

1.4 Some asymptotic theory IV


1.4.3 Asymptotic normality

Recall that if x is a random variable and c and d are constants, then

E(c + dx) = c + dE(x),
var(c + dx) = d² var(x).

Using these results it follows that if

√n(θ̂n − θ) ≈ N(0, V),

then

θ̂n − θ ≈ (1/√n) N(0, V) ~ N(0, V/n),

1.4 Some asymptotic theory V


1.4.3 Asymptotic normality

so that

θ̂n ≈ θ + N(0, V/n) ~ N(θ, V/n).    (1.33)

Equation (1.33) states that in a large finite sample θ̂n is approximately normally distributed with mean θ and variance V/n.
It is conventional to rewrite (1.33) as

θ̂n ~asy N(θ, V/n).    (1.34)

Equation (1.34) is referred to as the asymptotic distribution of θ̂n, and V/n is referred to as the asymptotic variance of θ̂n.

1.4 Some asymptotic theory VI


1.4.3 Asymptotic normality

In summary, whenever

√n(θ̂n − θ) →d N(0, V),    (1.31)

we say that θ̂n is asymptotically normally distributed with asymptotic distribution

θ̂n ~asy N(θ, V/n).    (1.34)

In econometrics we use the asymptotic distribution of θ̂n as an approximation to the true distribution of θ̂n in a finite sample (i.e. we use the asymptotic distribution of θ̂n as an approximation to its finite sample distribution).
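Result (1.31) can be illustrated with the sample mean of a (deliberately non-normal) exponential population: √n(x̄n − μ) behaves approximately like N(0, V) with V = Var(x). All values below are invented:

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps = 1000, 10_000
scale = 2.0                      # exponential: mean mu = 2, variance V = 4

draws = rng.exponential(scale, size=(reps, n))
z = np.sqrt(n) * (draws.mean(axis=1) - scale)   # sqrt(n)*(xbar - mu), one per replication

sim_mean = z.mean()              # should be close to 0
sim_var = z.var()                # should be close to V = 4
# Normality check: about 95% of z should lie within 1.96*sqrt(V) of zero.
frac_in = np.mean(np.abs(z) < 1.96 * 2.0)
```

Even though the underlying data are skewed, the normalized sample mean is already close to normal at this n.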


1.4 Some asymptotic theory VII


1.4.3 Asymptotic normality

Notice that the asymptotic distribution (1.34) is derived from the limiting distribution (1.31) by assuming that the latter is approximately true in large finite samples.
Obviously, the larger the sample size the more likely it is that the asymptotic distribution is a good approximation to the true finite sample distribution of θ̂n.
Note:
Most estimators used in econometrics satisfy

√n(θ̂n − θ) →d N(0, V).    (1.31)

The results stated in (1.31) and (1.34) generalize to the case in which θ̂n is a k×1 vector rather than a scalar, as assumed above.
In the case in which θ̂n is a k×1 vector, θ is also a k×1 vector and V/n is a k×k variance matrix.

1.4 Some asymptotic theory VIII


1.4.3 Asymptotic normality

Knowledge of the asymptotic distribution of θ̂n is useful for two principal reasons:
It can be used to construct confidence intervals for our estimates.
It can be used to construct (asymptotically valid) hypothesis tests, as we will see in section 1.5 below.

(ETC3410)

Background Material

55 / 85
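The approximation in (1.33)-(1.34) can be checked with a small simulation. The sketch below is an illustration, not from the slides; the Exponential(1) design is an arbitrary choice with θ = 1 and V = 1, so the standardized sample mean should behave like a draw from N(0, 1) when n is large.

```python
import numpy as np

# Monte Carlo sketch: the sample mean of i.i.d. Exponential(1) data
# (true mean theta = 1, true asymptotic variance V = 1) is
# approximately N(theta, V/n) in large samples.
rng = np.random.default_rng(0)
n, reps = 500, 5000
draws = rng.exponential(scale=1.0, size=(reps, n))
b_n = draws.mean(axis=1)               # the estimator in each replication
z = np.sqrt(n) * (b_n - 1.0) / 1.0     # sqrt(n)(b_n - theta)/sqrt(V)

# If the normal approximation is good, z has mean near 0 and std near 1.
print(round(z.mean(), 2), round(z.std(), 2))
```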

1.4 Some asymptotic theory I


1.4.4 Asymptotic efficiency

The estimator bn is asymptotically efficient if:
(i) bn is a consistent estimator of θ.
(ii) The asymptotic variance of bn is at least as small as that of any other consistent estimator. That is,

Avar(bn) ≤ Avar(b̃n),

where b̃n denotes any other consistent estimator of θ.
Notice that, just as we restrict our attention to unbiased estimators when defining finite sample efficiency, we restrict our attention to consistent estimators when defining asymptotic efficiency.

1.4 Some asymptotic theory II


1.4.4 Asymptotic efficiency

Asymptotic variance is the criterion that we use to choose between two or more consistent estimators. The consistent estimator with the smallest asymptotic variance is generally preferred.
In Topic 2 we will introduce the estimation method known as maximum likelihood estimation. One of the most attractive features of maximum likelihood estimation is that, provided the statistical/econometric model is correctly specified, the maximum likelihood estimator will be:
consistent
asymptotically normally distributed
asymptotically efficient
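A quick numerical illustration of this criterion (not from the slides): for N(0, 1) data, the sample mean and the sample median are both consistent for the population mean, but the mean has the smaller asymptotic variance (1/n versus π/(2n)), so it is the preferred estimator.

```python
import numpy as np

# Compare the sampling variance of two consistent estimators of theta = 0
# under N(0, 1) data: the sample mean (Avar = 1/n) and the sample
# median (Avar = pi/(2n)).
rng = np.random.default_rng(4)
n, reps = 400, 4000
draws = rng.normal(size=(reps, n))
var_mean = draws.mean(axis=1).var()
var_median = np.median(draws, axis=1).var()

print(var_mean < var_median)  # True: the mean is more efficient
```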

1.5 Testing linear restrictions on the parameters of an


econometric model I
A hypothesis test that is valid in a sample of any size is called an exact test. Tests that are valid only in large samples are called asymptotic tests.
Generally speaking, exact tests are available only in the linear regression model with normally distributed, homoskedastic, serially uncorrelated errors. Once we relax these very restrictive assumptions, we are forced to use asymptotic tests.
Many hypotheses of economic interest can be expressed as linear restrictions on the parameters of an econometric model. For example, consider the wage equation

wage = β₀ + β₁educ + β₂exper + β₃race + β₄gender + β₅exper² + v.     (1.28)

1.5 Testing linear restrictions on the parameters of an


econometric model II
Suppose that we wish to simultaneously test the following hypotheses:
(i) The marginal effect of educ is equal but opposite in sign to the marginal effect of exper for someone who has one year of experience.
(ii) The marginal effect of gender is twice that of race.
Since

MEeduc = β₁,   MEexper = β₂ + 2β₅,

the hypothesis that the marginal effect of educ is equal but opposite in sign to the marginal effect of exper implies that

β₁ = −(β₂ + 2β₅), or β₁ + β₂ + 2β₅ = 0.

1.5 Testing linear restrictions on the parameters of an


econometric model III

Since

MEgender = β₄,   MErace = β₃,

the hypothesis that the marginal effect of gender is twice that of race implies that

β₄ = 2β₃, or β₄ − 2β₃ = 0.

Notice that each of these economic hypotheses has been expressed as a restriction on the parameters of the model.

1.5 Testing linear restrictions on the parameters of an


econometric model IV

The two hypotheses we wish to test impose the following two linear restrictions on the parameters of the wage equation:

β₁ + β₂ + 2β₅ = 0
β₄ − 2β₃ = 0.     (1.35)

The restrictions in (1.35) can be written more compactly as

Rβ = r,     (1.36)

where R is 2x6, β is 6x1 and r is 2x1.

1.5 Testing linear restrictions on the parameters of an


econometric model V
where

R = [ 0  1  1   0  0  2 ]        r = [ 0 ]
    [ 0  0  0  −2  1  0 ],           [ 0 ],

and β = (β₀, β₁, β₂, β₃, β₄, β₅)′.
To see this note that

Rβ = r

⇒

[ 0  1  1   0  0  2 ]
[ 0  0  0  −2  1  0 ] (β₀, β₁, β₂, β₃, β₄, β₅)′ = (0, 0)′

1.5 Testing linear restrictions on the parameters of an


econometric model VI
⇒

[ β₁ + β₂ + 2β₅ ]   [ 0 ]
[ −2β₃ + β₄     ] = [ 0 ]

⇒

β₁ + β₂ + 2β₅ = 0
β₄ − 2β₃ = 0.     (1.35)

In general, q (independent) linear restrictions on the kx1 vector β can be written as

Rβ = r,     (1.37)

where R is qxk, β is kx1 and r is qx1.
The precise definitions of R and r depend on the particular restrictions being tested.
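The R and r above are easy to verify numerically. A short sketch (the β values below are arbitrary, chosen only so that both restrictions hold):

```python
import numpy as np

# Restriction matrix for the wage equation, beta = (b0, b1, ..., b5)':
# row 1: b1 + b2 + 2*b5 = 0;  row 2: -2*b3 + b4 = 0.
R = np.array([[0, 1, 1,  0, 0, 2],
              [0, 0, 0, -2, 1, 0]], dtype=float)
r = np.zeros(2)

# A beta that satisfies both restrictions: b1 = -(b2 + 2*b5), b4 = 2*b3.
beta = np.array([1.0, -(0.5 + 2 * 0.1), 0.5, 0.2, 0.4, 0.1])
print(np.allclose(R @ beta, r))  # True
```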

1.5 Testing linear restrictions on the parameters of an


econometric model VII
The advantage of expressing our restrictions in the form of (1.37) is that it enables us to represent a set of linear restrictions on β without specifying exactly what the restrictions are, and to derive results that will hold for any set of linear restrictions on β.
Under the null hypothesis that the restrictions in (1.37) are correct,

Rβ − r = 0.     (1.38)

However, since β is unknown, how do we determine whether or not (1.38) holds?
An obvious approach is to consider whether or not

Rβ̂ − r = 0,

where β̂ is our estimator of β.

1.5 Testing linear restrictions on the parameters of an


econometric model VIII
However, β̂ is a random variable, the value of which varies from sample to sample. Therefore, the question we need to consider is whether or not Rβ̂ − r is statistically significantly different from zero.
To determine whether or not Rβ̂ − r is statistically significantly different from zero we need to know the probability distribution of Rβ̂ − r. We next show that the asymptotic distribution of Rβ̂ − r can be derived from our knowledge of the asymptotic distribution of β̂.
Assume that

β̂ ~asy N(β, V/n).     (1.39)

Then,

Rβ̂ ~asy R·N(β, V/n)

1.5 Testing linear restrictions on the parameters of an


econometric model IX
⇒ Rβ̂ ~asy N(Rβ, RVR′/n)

⇒ Rβ̂ − r ~asy N(Rβ − r, RVR′/n).     (1.40)

In going from the second line to the third line of the derivation we used the result that

Var(Rβ̂) = R·Var(β̂)·R′ = RVR′/n.

1.5 Testing linear restrictions on the parameters of an


econometric model X
Equation (1.40) implies that under the null hypothesis that

Rβ − r = 0,

Rβ̂ − r ~asy N(0, RVR′/n).     (1.41)

In principle, we could use (1.41) as our test statistic. However, if we did so, the critical value for our test would depend on R, and there would be a different critical value for each possible choice of R.
We can eliminate the dependence on R of the critical value for our test statistic by transforming our test statistic from a normal random variable into a chi-square variable. The transformation is achieved by appealing to the following well known theorem in mathematical statistics.

1.5 Testing linear restrictions on the parameters of an


econometric model XI

Theorem
1. Let Z be a kx1 random vector. If

Z ~asy N(0, Σ),

then

Z′Σ⁻¹Z ~asy χ²(q),

where q is the rank of the matrix Σ.

1.5 Testing linear restrictions on the parameters of an


econometric model XII
Applying Theorem 1 to

Rβ̂ − r ~asy N(0, RVR′/n),     (1.41)

with Rβ̂ − r playing the role of Z, we conclude that, under the null hypothesis

Rβ − r = 0,     (1.42)

(Rβ̂ − r)′ (RVR′/n)⁻¹ (Rβ̂ − r) ~asy χ²(q),     (1.43)

where q is the number of restrictions imposed under the null hypothesis.

1.5 Testing linear restrictions on the parameters of an


econometric model XIII
The statistic on the left-hand side of (1.43) is not feasible, since it depends on the unknown matrix V. A feasible test statistic for testing (1.42) is given by

W = (Rβ̂ − r)′ (RV̂R′/n)⁻¹ (Rβ̂ − r) ~asy χ²(q),     (1.44)

where V̂ is a consistent estimator of V, i.e.

plim(V̂) = V.

As long as V̂ is a consistent estimator of V, the left-hand sides of (1.43) and (1.44) are asymptotically equivalent.
Note:

1.5 Testing linear restrictions on the parameters of an


econometric model XIV

A hypothesis test based on (1.44) is called a Wald test (because it was first proposed by Abraham Wald in 1943).
A Wald test is the most common form of hypothesis test used in econometrics because, unlike other tests, a Wald test can be conducted no matter what estimation method is used to estimate the regression equation.
Since only the asymptotic distribution of W is known, the Wald test is an asymptotic test, and may be unreliable in small samples.
The Wald test statistic in (1.44) is sometimes written as

W = n(Rβ̂ − r)′ (RV̂R′)⁻¹ (Rβ̂ − r).     (1.45)
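The Wald statistic in (1.44) can be computed directly. The sketch below is illustrative, not from the slides: it assumes a simulated OLS setting with homoskedastic errors, so that V̂/n = s²(X′X)⁻¹, and a true β chosen to satisfy both restrictions from the wage-equation example.

```python
import numpy as np

# Illustrative Wald test of R*beta = r, with q = 2 restrictions,
# on simulated data estimated by OLS (homoskedastic errors assumed,
# so Var-hat(beta-hat) = s^2 (X'X)^{-1}).
rng = np.random.default_rng(1)
n, k = 400, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, -0.7, 0.5, 0.2, 0.4, 0.1])  # satisfies both restrictions
y = X @ beta_true + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)                  # OLS estimate
u = y - X @ b
Vhat_n = np.linalg.inv(X.T @ X) * (u @ u / (n - k))    # estimated Var(b) = Vhat/n

R = np.array([[0, 1, 1,  0, 0, 2.0],
              [0, 0, 0, -2, 1, 0.0]])
r = np.zeros(2)
diff = R @ b - r
W = diff @ np.linalg.solve(R @ Vhat_n @ R.T, diff)     # Wald statistic (1.44)
print(W > 0)  # True: a nonnegative scalar, chi2(2) under H0
```

Because the true β satisfies the null here, W is typically a small draw from a χ²(2) distribution, well below the 5% critical value.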

1.5 Testing linear restrictions on the parameters of an


econometric model XV
The derivation of (1.44) depends crucially on the result that

β̂ ~asy N(β, V/n),     (1.39)

and illustrates how knowledge of the asymptotic distribution of an estimator can be used to construct an asymptotically valid test statistic.
Testing at the 5% significance level, we reject the null hypothesis that

Rβ − r = 0     (1.42)

if

Wcalc > χ²₀.₉₅(q),

where Wcalc denotes the sample value of the test statistic, and χ²₀.₉₅(q) denotes the 95th percentile of the chi-square distribution with q degrees of freedom.

1.5 Testing linear restrictions on the parameters of an


econometric model XVI
By Theorem 1 above,

q = rank(RV̂R′/n),

which in turn can be shown to equal the number of restrictions imposed under the null hypothesis. Therefore q in

W = (Rβ̂ − r)′ (RV̂R′/n)⁻¹ (Rβ̂ − r) ~asy χ²(q)     (1.44)

is always equal to the number of restrictions imposed under the null hypothesis that

Rβ − r = 0.     (1.42)

1.5 Testing linear restrictions on the parameters of an


econometric model XVII
Equivalently, we reject (1.42) if

p-value < 0.05,

where

p-value = prob[χ²(q) > Wcalc].

It can be shown that

W/q ~asy F(q, n − k),     (1.46)

where n is the sample size and k denotes the number of regressors in the model (including the constant).
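Both decision rules can be sketched numerically. The Wcalc, q, n and k values below are hypothetical, chosen only for illustration:

```python
from scipy.stats import chi2, f

# Hypothetical sample value of the Wald statistic, with q = 2 restrictions,
# n = 400 observations and k = 6 regressors.
W_calc, q, n, k = 7.3, 2, 400, 6

p_chi2 = chi2.sf(W_calc, df=q)       # p-value for the chi-square version (1.44)
F_calc = W_calc / q                  # asymptotic F version (1.46)
p_F = f.sf(F_calc, dfn=q, dfd=n - k)

print(p_chi2 < 0.05, p_F < 0.05)     # both reject at the 5% level here
```

The two p-values are close but not identical in a finite sample, which is exactly the point made below about packages reporting (1.44), (1.46), or both.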

1.5 Testing linear restrictions on the parameters of an


econometric model XVIII
Consequently, one can also implement the Wald test as an asymptotic F-test. In this case we reject the null hypothesis if

Fcalc = Wcalc/q > F₀.₉₅(q, n − k),

where F₀.₉₅(q, n − k) denotes the 95th percentile of an F distribution with q degrees of freedom in the numerator and n − k degrees of freedom in the denominator.
Note:
Tests based on (1.44) and (1.46) are asymptotically equivalent. However, they produce different p-values in finite samples.
Some software packages report results based on (1.44), some report results based on (1.46) and some report both.

1.5 Testing linear restrictions on the parameters of an


econometric model XIX
Some researchers believe that F(q, n − k) is a better approximation to the finite sample distribution of W/q than χ²(q) is to the finite sample distribution of W. Consequently, they use (1.46) in the hope that it will produce more reliable results in a finite sample.
In the special case of testing

H₀: βk = 0

in the linear regression equation

y = β₁ + β₂x₂ + ..... + βk xk + u

(i.e. testing the individual significance of xk), it can be shown that the test statistic

W = (Rβ̂ − r)′ (RV̂R′/n)⁻¹ (Rβ̂ − r) ~asy χ²(q)     (1.44)

1.5 Testing linear restrictions on the parameters of an


econometric model XX
reduces to

Wz = β̂k / se(β̂k) ~asy N(0, 1).     (1.47)

If the model is estimated by maximum likelihood, the null hypothesis

Rβ − r = 0     (1.42)

can also be tested by performing a likelihood ratio (LR) test. It can be shown that under (1.42) the test statistic

LR = 2(lu − lr) ~asy χ²(q),     (1.48)

where lu and lr respectively denote the maximized values of the log-likelihood function of the unrestricted and restricted models, and q again denotes the number of restrictions imposed under the null. The LR test will be discussed in more detail in Topic 2.

1.6 A review of generalized least squares (GLS) I


Consider the linear regression model

yi = β₀ + β₁xi1 + β₂xi2 + ... + βk xik + ui,   i = 1, ..., n,     (1.49)

or, in matrix notation,

y = Xβ + u.     (1.50)

In introductory econometrics units it is often assumed that

var(ui | xi) = σ²,   i = 1, ..., n,     (1.51)

and

cov(ui, uj | xi, xj) = 0,   i ≠ j,     (1.52)

where

xi = (xi1, xi2, ....., xik).

Equations (1.51) and (1.52) respectively state that the errors in (1.50) are conditionally homoskedastic and conditionally serially uncorrelated.

1.6 A review of generalized least squares (GLS) II


When (1.51) and (1.52) hold, the nxn matrix

Var(u|X) = diag(σ², σ², ..., σ²) = σ²·diag(1, 1, ..., 1) = σ²In.

When

Var(u|X) = σ²In,     (1.53)

the errors in (1.50) are said to be "spherical", and when (1.53) is violated they are said to be "non-spherical".
Notice that when the errors are spherical, the error covariance matrix is a scalar identity matrix, that is, an identity matrix multiplied by the scalar σ².
Assumption (1.51) is usually unrealistic for cross-section data, and assumption (1.52) is usually unrealistic for time series data.

1.6 A review of generalized least squares (GLS) III


Denote the conditional error variance matrix for non-spherical errors by

Var(u|X) = Ω ≠ σ²In,     (1.54)

where the precise form of Ω depends on the nature of the departure from sphericity. For example, in the case of conditionally uncorrelated, heteroskedastic errors,

Ω = diag(σ₁², σ₂², ..., σₙ²).

1.6 A review of generalized least squares (GLS) IV


It is well known that when (1.54) holds the OLS estimator of β in

y = Xβ + u     (1.50)

is inefficient. In this case an efficient estimator can be obtained by executing the following steps:

S1 Multiply both sides of

y = Xβ + u     (1.50)

by Ω^(−1/2) to obtain

Ω^(−1/2)y = Ω^(−1/2)Xβ + Ω^(−1/2)u,

or

y* = X*β + u*,     (1.55)

where

y* ≡ Ω^(−1/2)y,   X* ≡ Ω^(−1/2)X,   u* ≡ Ω^(−1/2)u.

1.6 A review of generalized least squares (GLS) V

Notice that

Var(u*|X) = Var(Ω^(−1/2)u | X)
          = Ω^(−1/2) Var(u|X) Ω^(−1/2)′
          = Ω^(−1/2) Ω Ω^(−1/2)                 (using (1.54) and the symmetry of Ω)
          = Ω^(−1/2) Ω^(1/2) Ω^(1/2) Ω^(−1/2)
          = In.     (1.56)

Therefore, the errors in (1.55) are spherical, and by the Gauss-Markov theorem β in (1.55) can be efficiently estimated by OLS.

1.6 A review of generalized least squares (GLS) VI


S2 Applying the usual OLS formula to

y* = X*β + u*,     (1.55)

we obtain

β̂ = (X*′X*)⁻¹X*′y*
  = [(Ω^(−1/2)X)′(Ω^(−1/2)X)]⁻¹(Ω^(−1/2)X)′Ω^(−1/2)y
  = [X′Ω^(−1/2)Ω^(−1/2)X]⁻¹X′Ω^(−1/2)Ω^(−1/2)y
  = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y.     (1.57)

The estimator in (1.57) is called the generalized least squares (GLS) estimator of β in the regression equation

y = Xβ + u,     (1.50)

1.6 A review of generalized least squares (GLS) VII


and is denoted by

β̂GLS = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y.     (1.58)

In summary, the OLS estimator of β in

y = Xβ + u,     (1.50)

is

β̂OLS = (X′X)⁻¹X′y,

and the GLS estimator of β is

β̂GLS = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y.     (1.58)
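The equivalence of the two routes to GLS, transforming by Ω^(−1/2) and then applying OLS versus using the direct formula (1.58), can be verified numerically. A sketch assuming a known diagonal Ω (the heteroskedastic design is illustrative only):

```python
import numpy as np

# GLS with a KNOWN Omega = diag(sigma_i^2): the transform-then-OLS route
# and the direct formula (X' Omega^{-1} X)^{-1} X' Omega^{-1} y agree.
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sig2 = rng.uniform(0.5, 3.0, size=n)          # known error variances
u = rng.normal(size=n) * np.sqrt(sig2)
y = X @ np.array([1.0, 2.0]) + u

Om_inv = np.diag(1.0 / sig2)                  # Omega^{-1}
b_gls = np.linalg.solve(X.T @ Om_inv @ X, X.T @ Om_inv @ y)   # formula (1.58)

W = np.diag(1.0 / np.sqrt(sig2))              # Omega^{-1/2}
ys, Xs = W @ y, W @ X                         # transformed model (1.55)
b_ols_star = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)            # OLS on (1.55)

print(np.allclose(b_gls, b_ols_star))  # True
```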

1.6 A review of generalized least squares (GLS) VIII


Notice that β̂GLS can be obtained from the formula for the OLS estimator,

β̂OLS = (X′X)⁻¹X′y,

by inserting Ω⁻¹ between X′ and X, and between X′ and y, where

Ω = Var(u|X).

β̂GLS is not a feasible estimator, since it depends on the unknown matrix Ω⁻¹. A feasible GLS (FGLS) estimator is given by

β̂FGLS = (X′Ω̂⁻¹X)⁻¹X′Ω̂⁻¹y,     (1.59)

where

plim(Ω̂) = Ω.

That is, Ω̂ is a consistent estimator of Ω.
Many of the estimators that we will discuss in this unit are FGLS estimators.
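A sketch of one common FGLS recipe under heteroskedasticity. The exponential variance function and the log-squared-residual regression used to build Ω̂ are assumptions for illustration, not from the slides:

```python
import numpy as np

# Hypothetical FGLS sketch: assume Var(u_i | x_i) = exp(d0 + d1*x_i),
# estimate the variance function from OLS residuals, then plug
# Omega-hat^{-1} into (1.59).
rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
sig2 = np.exp(0.5 + 0.8 * x)                        # true variance function
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(sig2)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)           # step 1: OLS
u = y - X @ b_ols
g = np.linalg.solve(X.T @ X, X.T @ np.log(u**2))    # step 2: variance regression
sig2_hat = np.exp(X @ g)                            # fitted variances

Om_inv_hat = np.diag(1.0 / sig2_hat)                # Omega-hat^{-1}
b_fgls = np.linalg.solve(X.T @ Om_inv_hat @ X, X.T @ Om_inv_hat @ y)
print(b_fgls.shape)  # (2,)
```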