
Lecture Series 1

Linear Random and Fixed Effect Models and
Their (Less) Recent Extensions
Stefanie Schurer
stefanie.schurer@rmit.edu.au
RMIT University
School of Economics, Finance, and Marketing

January 21, 2014

1 / 62

Overview
1. Recap: linear model set-up, random effects estimation and fixed effects estimation;
2. Relationship between random and fixed (and between) effects estimators;
3. Is fixed effects estimation always preferable to random effects estimation?
4. Hausman-Taylor (1981) approach to estimating coefficients on both time-varying and time-invariant variables;
5. Correlated random effects (CRE): a flexible extension to random effects models to relax the orthogonality condition;
6. Pluemper and Troeger's Fixed Effects Vector Decomposition approach and rule of thumb;
7. Application: estimating the effects of health on wages.


2 / 62

References for Lecture 1


1. Greene, W.H. (2011). Econometric Analysis. Pearson Education Limited. 399-438.
2. Wooldridge, J. (2009). Econometric Analysis of Cross Section and Panel Data. The MIT Press. 285-382, 345-361.
3. Hsiao, C. (2003). Analysis of Panel Data. Econometric Society Monographs. CUP: New York. 27-44.
4. Mundlak, Y. (1978). On the Pooling of Time Series and Cross-section Data. Econometrica 46: 69-85.
5. Hausman, J., Taylor, W.E. (1981). Panel Data and Unobservable Individual Effects. Econometrica 49: 1377-1398.
6. Pluemper, T., Troeger, V. (2007). Efficient estimation of time-invariant and rarely changing variables in finite sample panel analyses with unit fixed effects. Political Analysis 15: 124-139.
7. Contoyannis, P., Rice, N. (2001). The impact of health on wages. Evidence from the BHPS. Empirical Economics 26: 599-622.

3 / 62

1. Recap: Linear model set-up

4 / 62

Heterogeneous intercept models


Consider the following linear regression model, which allows for individual-specific heterogeneity \alpha_i:

Y_{it} = X_{it}\beta + \varepsilon_{it},    (1)

for all i = 1, \ldots, N and t = 1, \ldots, T, with

\varepsilon_{it} = \alpha_i + u_{it}.    (2)

- Y_{it} is some outcome of interest;
- X_{it} is a vector of covariates (X_{it1}, \ldots, X_{itK}) and generally includes a constant term, i.e. X_{it1} = 1 for all i and t. These may also include time-invariant variables such as X_i.
- The unobserved errors consist of two components: \alpha_i, which is constant across time, and u_{it}, an idiosyncratic error term that varies across individuals and time, with u_{it} \sim iid(0, \sigma_u^2) and E(u_{it} \mid \alpha_i, X_{i1}, \ldots, X_{iT}) = 0.
5 / 62

The model in matrix notation

The NT observations are ordered first by i units, and then by t observations, such that:

Y = X\beta + \varepsilon.    (3)

The dimensions are:

- Y: NT \times 1 vector of the Y_{it};
- X: NT \times K matrix with typical element X_{itk};
- \varepsilon: NT \times 1 vector of the \varepsilon_{it}.

6 / 62

OLS estimator in matrix form


The OLS estimator for \beta is:

\hat{\beta}_{OLS} = (X'X)^{-1}(X'Y)    (4)

Our focus here is how to estimate this model under different assumptions about the individual-specific heterogeneity \alpha_i. Early discussions in the literature were concerned with whether \alpha_i should be treated as a random variable (which would add an error term) or as a fixed parameter to be estimated for each cross-sectional group.

More modern approaches to panel data econometrics are more concerned with the question of whether \alpha_i is correlated with the explanatory variables of interest (e.g. Wooldridge, 2009, pp. 285-286).
7 / 62
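As a quick illustration of Eq. 4, the following minimal numpy sketch simulates a small panel (dimensions, coefficients and seed are made up for illustration only) and computes the pooled OLS estimator in matrix form:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 200, 5, 2                      # hypothetical panel dimensions
alpha = rng.normal(size=N)               # individual-specific heterogeneity alpha_i
X = rng.normal(size=(N, T, K))           # time-varying covariates
beta = np.array([1.0, -0.5])             # "true" slope coefficients (assumed)
Y = X @ beta + alpha[:, None] + rng.normal(size=(N, T))   # Y_it = X_it b + a_i + u_it

# Stack to NT rows, ordered first by i and then by t (as on the slide)
Xs = np.column_stack([np.ones(N * T), X.reshape(N * T, K)])   # include a constant
Ys = Y.reshape(N * T)

# OLS in matrix form: beta_hat = (X'X)^{-1} X'Y
beta_ols = np.linalg.solve(Xs.T @ Xs, Xs.T @ Ys)
print(beta_ols)      # intercept followed by slope estimates
```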

Random versus fixed effect models

We will examine the implications for OLS estimation under the alternative assumptions that:

1. \alpha_i is uncorrelated with X_{it} for all t = 1, \ldots, T (referred to as the random effects model);
2. \alpha_i is allowed to arbitrarily correlate with X_{it} for all t = 1, \ldots, T (referred to as the fixed effects model);
3. \alpha_i is assumed to linearly depend on X_{it} (referred to as the correlated random effects model).

We will consider the suitable estimators in each case.

8 / 62

Random effect models


In the random effects model we assume Cov(X_{it}, \alpha_i) = 0 for all t = 1, \ldots, T, or the stronger assumption of zero conditional expectation, i.e. E(\alpha_i \mid X_{i1}, \ldots, X_{iT}) = 0. In this scenario, using OLS will yield unbiased parameter estimates, but wrong standard errors and thus unreliable statistical inference. Let's take a look at why.

Consider the properties of the OLS estimator:

E(\hat{\beta}_{OLS} \mid X) = E\{(X'X)^{-1}X'Y \mid X\}    (5)
 = E\{(X'X)^{-1}X'(X\beta + \varepsilon) \mid X\}    (6)
 = \beta + (X'X)^{-1}X'E\{\varepsilon \mid X\}    (7)
 = \beta.    (8)

9 / 62

Random effect models


Now think of the sampling properties of the OLS estimator:

Var(\hat{\beta}_{OLS} \mid X) = Var\{(X'X)^{-1}X'Y \mid X\}    (9)
 = Var\{(X'X)^{-1}X'(X\beta + \varepsilon) \mid X\}    (10)
 = Var\{\beta + (X'X)^{-1}X'\varepsilon \mid X\}    (11)
 = (X'X)^{-1}X'Var\{\varepsilon \mid X\}X(X'X)^{-1}    (12)

Recall, the OLS assumption about \varepsilon is that \varepsilon_{it} \sim iid(0, \sigma^2), and so:

Var(\hat{\beta}_{OLS} \mid X) = \sigma^2(X'X)^{-1},    (13)

replacing \sigma^2 by an estimate, typically the sample variance of the regression errors:

s^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} e_{it}^2.    (14)

10 / 62

Random effect models


But what's wrong with the variance when we allow for unobserved heterogeneity? Due to \varepsilon_{it} = \alpha_i + u_{it}, the assumption of independent errors across observations fails. In particular, if \alpha_i \sim N(0, \sigma_\alpha^2) and u_{it} \sim iid(0, \sigma_u^2), where \sigma^2 = \sigma_u^2 + \sigma_\alpha^2, then the variance-covariance matrix of \varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iT})' is:

Var(\varepsilon_i \mid X_i) = \begin{pmatrix} \sigma_u^2 + \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_u^2 + \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_u^2 + \sigma_\alpha^2 \end{pmatrix} = \sigma_u^2 I_{T \times T} + \sigma_\alpha^2\, i_{T \times 1} i_{1 \times T} = \Sigma

(i is a vector of ones).

11 / 62

Random effect models

Since the observations for individuals i and j are independent, the disturbance covariance matrix for the full NT observations is block diagonal:

\Omega = \begin{pmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{pmatrix}_{NT \times NT} = I_{N \times N} \otimes \Sigma_{T \times T}.

12 / 62
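A tiny numpy sketch of the two matrices just described (the variance components are hypothetical numbers, and \Sigma, \Omega follow the notation used above):

```python
import numpy as np

T, N = 4, 3
sigma_u2, sigma_a2 = 1.0, 0.5                     # hypothetical variance components

# T x T block for one individual: sigma_u^2 I_T + sigma_a^2 i i'
ones = np.ones((T, 1))
Sigma = sigma_u2 * np.eye(T) + sigma_a2 * (ones @ ones.T)

# Full NT x NT disturbance covariance: block diagonal, Omega = I_N (x) Sigma
Omega = np.kron(np.eye(N), Sigma)
print(Sigma)           # equicorrelated block
print(Omega.shape)     # (12, 12); zeros off the diagonal blocks
```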

Random effect models

There are two solutions to fix the wrong standard errors implied by cross-sectional unobserved heterogeneity when using OLS:

1. Correcting the OLS standard errors: robust covariance matrix estimation; estimate the model with OLS, then adjust the standard errors ex post.
2. Random effects estimation: obtain a more efficient estimator of \beta using generalised least squares; transform the data first, then use OLS on the transformed data. This approach is similar to (feasible) GLS when controlling for e.g. heteroskedasticity.

13 / 62

1. Correcting OLS standard errors ex post

- Note that Var(\hat{\beta}_{OLS} \mid X) = (X'X)^{-1}X'Var\{\varepsilon \mid X\}X(X'X)^{-1} implies that Var(\varepsilon \mid X) is an NT \times NT matrix with a block-diagonal structure;
- For each of the N cross-sectional groups there will be a T \times T diagonal block corresponding to Var(\varepsilon_i \mid X);
- Off these diagonal blocks the matrix has zeros, due to the assumed independence of the cross-sectional sample;
- Thus we can correct the OLS standard errors by replacing Var(\varepsilon \mid X) with a suitable estimate from the sample data.

14 / 62

1. Correcting OLS standard errors ex post


Suitable estimators are:

- Estimate \sigma_\alpha^2 by:

s_\alpha^2 = \left[\frac{NT(T-1)}{2}\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T-1}\sum_{s=t+1}^{T} e_{it}e_{is}    (15)

- Estimate \sigma^2 by:

s^2 = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} e_{it}^2    (16)

This approach is nothing else than robust covariance matrix estimation (see p. 390 in Greene, 2012).

15 / 62
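A minimal sketch of this ex-post correction on simulated data (the data-generating process and sample sizes are made up; the "meat" of the sandwich uses each group's residual outer product, which is one common way to estimate the block-diagonal Var(\varepsilon \mid X)):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 300, 5
alpha = rng.normal(scale=0.8, size=N)
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))

X = np.column_stack([np.ones(N * T), x.reshape(-1)])
Y = y.reshape(-1)
b = np.linalg.solve(X.T @ X, X.T @ Y)
e = (Y - X @ b).reshape(N, T)                   # OLS residuals, one row per group

# Moment estimators from the slide: s_alpha^2 from within-group cross products,
# s^2 from the overall residual variance
pairs = sum(e[:, t] * e[:, s] for t in range(T - 1) for s in range(t + 1, T))
s_alpha2 = pairs.sum() / (N * T * (T - 1) / 2)
s2 = (e ** 2).sum() / (N * T)
print(s_alpha2, s2)

# Robust (cluster-by-i) sandwich: (X'X)^-1 X' Var(eps|X) X (X'X)^-1,
# estimating each T x T block by the residual outer product e_i e_i'
XtX_inv = np.linalg.inv(X.T @ X)
meat = np.zeros((X.shape[1], X.shape[1]))
for i in range(N):
    Xi = X[i * T:(i + 1) * T]
    meat += Xi.T @ np.outer(e[i], e[i]) @ Xi
V_robust = XtX_inv @ meat @ XtX_inv
print(np.sqrt(np.diag(V_robust)))               # corrected standard errors
```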

Random effects (or GLS) estimation


- We want to transform the data in a way that the variance of the transformed errors is equal to the identity matrix, i.e.

Var(\varepsilon^*) = Var(\Omega^{-1/2}\varepsilon) = I_{NT}    (17)

- A good candidate for transforming the data for each individual is \Sigma^{-1/2}; hence, if we find this term, we can pre-multiply Y_i, X_i and \varepsilon_i by \Sigma^{-1/2} (or, in terms of matrix notation, pre-multiply Y, X and \varepsilon by \Omega^{-1/2}).
- See the derivation of \Sigma^{-1/2} on the blackboard;
- The final result is:

\Sigma^{-1/2} = \frac{1}{\sigma_u}\left[I - \frac{\theta}{T} i_{T \times 1} i_{1 \times T}\right],    (18)

where

\theta = 1 - \frac{\sigma_u}{\sqrt{\sigma_u^2 + T\sigma_\alpha^2}}.    (19)
16 / 62

Random effects estimation


Consider the following transformation of our benchmark linear regression model:

\Omega^{-1/2}Y = \Omega^{-1/2}X\beta + \Omega^{-1/2}\varepsilon,    (20)

or

Y^* = X^*\beta + \varepsilon^*,    (21)

where, for instance:

\Sigma^{-1/2}Y_i = \frac{1}{\sigma_u}\begin{pmatrix} Y_{i1} - \theta\bar{Y}_i \\ Y_{i2} - \theta\bar{Y}_i \\ \vdots \\ Y_{iT} - \theta\bar{Y}_i \end{pmatrix}, \qquad \Sigma^{-1/2}X_i = \frac{1}{\sigma_u}\begin{pmatrix} X_{i1} - \theta\bar{X}_i \\ X_{i2} - \theta\bar{X}_i \\ \vdots \\ X_{iT} - \theta\bar{X}_i \end{pmatrix}.

17 / 62

Random effects estimation


We can show that the transformed errors have the property that Var(\varepsilon^*) = Var(\Omega^{-1/2}\varepsilon) = I_{NT} (try to do at home). Thus, feasible GLS regression based on this transformation satisfies the necessary assumptions for efficient estimation of \beta, and is referred to as the random effects estimator:

\hat{\beta}_{RE} = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^* = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y.    (22)

The variance of this estimator is (homework: check whether you can derive all steps by yourself; we will talk about it in class next week):

Var(\hat{\beta}_{RE} \mid X) = Var\{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y \mid X\}    (23)
 = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Var(\varepsilon \mid X)\Omega^{-1}X(X'\Omega^{-1}X)^{-1}    (24)
 = (X'\Omega^{-1}X)^{-1}.    (25)
18 / 62
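The quasi-demeaning ("partial deviation from group means") form of this estimator is easy to sketch in numpy. The snippet below plugs in the true variance components of a simulated panel for simplicity; a feasible version would estimate \sigma_u^2 and \sigma_\alpha^2 first, as in the previous slide:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 300, 5
sigma_a, sigma_u = 0.8, 1.0
alpha = rng.normal(scale=sigma_a, size=N)
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + alpha[:, None] + rng.normal(scale=sigma_u, size=(N, T))

# theta = 1 - sigma_u / sqrt(sigma_u^2 + T sigma_a^2)  (Eq. 19), true values used here
theta = 1.0 - sigma_u / np.sqrt(sigma_u ** 2 + T * sigma_a ** 2)

# Quasi-demeaned data: z*_it = z_it - theta * zbar_i (the constant becomes 1 - theta)
y_star = y - theta * y.mean(axis=1, keepdims=True)
x_star = x - theta * x.mean(axis=1, keepdims=True)
const_star = np.full(N * T, 1.0 - theta)

Xs = np.column_stack([const_star, x_star.reshape(-1)])
b_re = np.linalg.solve(Xs.T @ Xs, Xs.T @ y_star.reshape(-1))
print(b_re)     # random effects (GLS) estimates of the intercept and slope
```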

Fixed effect models


In the fixed effects model we allow for the possibility that Cov(X_{it}, \alpha_i) \neq 0. In this case, the OLS estimator \hat{\beta}_{OLS} will be biased and inconsistent. This is so because:

E(\hat{\beta}_{OLS} \mid X) = E\{(X'X)^{-1}X'Y \mid X\}    (26)
 = E\{(X'X)^{-1}X'(X\beta + \varepsilon) \mid X\}    (27)
 = \beta + (X'X)^{-1}X'E\{\varepsilon \mid X\}    (28)
 \neq \beta,    (29)

where the last inequality stems from the fact that E(\alpha_i \mid X_{it}) \neq 0.

19 / 62

Fixed effect models


There are two solutions to the problem. Re-consider the original model:

Y_{it} = X_{it}\beta + \varepsilon_{it},    (30)

for all i = 1, \ldots, N and t = 1, \ldots, T, with

\varepsilon_{it} = \alpha_i + u_{it}.    (31)

- Within-group fixed effects: subtract the within-group means from the original regression equation that combines Eqs. 30 and 31.
- First differences between two adjacent time periods.

20 / 62

Within-group fixed effects


Construct the within-group average of the benchmark linear regression model:

\bar{Y}_i = \bar{X}_i\beta + \alpha_i + \bar{u}_i,    (32)

where \bar{Y}_i = T^{-1}\sum_{t=1}^{T} Y_{it}, \bar{X}_i = T^{-1}\sum_{t=1}^{T} X_{it}, and \bar{u}_i = T^{-1}\sum_{t=1}^{T} u_{it}. Then, subtract Eq. 32 from the combined Eqs. 30 and 31:

Y_{it} - \bar{Y}_i = (X_{it} - \bar{X}_i)\beta + (u_{it} - \bar{u}_i) + (\alpha_i - \alpha_i).    (33)

And so the within-group fixed effects estimator is:

\hat{\beta}_{FE} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar{X}_i)(X_{it} - \bar{X}_i)'\right]^{-1}\left[\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar{X}_i)(Y_{it} - \bar{Y}_i)\right]    (34)
21 / 62
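A minimal sketch of the within estimator with a single regressor, on simulated data in which X is deliberately made correlated with \alpha_i (all numbers are invented for illustration); pooled OLS is computed alongside for comparison:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 300, 5
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T)) + 0.7 * alpha[:, None]   # x correlated with alpha_i
y = 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))

# Within transformation: subtract group means from y and x
y_dm = y - y.mean(axis=1, keepdims=True)
x_dm = x - x.mean(axis=1, keepdims=True)

# beta_FE = [sum (x_it - xbar_i)^2]^{-1} [sum (x_it - xbar_i)(y_it - ybar_i)]
beta_fe = (x_dm * y_dm).sum() / (x_dm ** 2).sum()

# Pooled OLS slope for comparison -- biased upwards here because Cov(x, alpha) > 0
xf, yf = x.reshape(-1), y.reshape(-1)
beta_ols = ((xf - xf.mean()) * (yf - yf.mean())).sum() / ((xf - xf.mean()) ** 2).sum()
print(beta_fe, beta_ols)
```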

Within-group fixed effects


- In contrast to the random effects or GLS procedure, which uses both within-group (across time) and between-group (across cross-sectional units) variation to estimate \beta, the within-group fixed effects approach uses only the within-group variation.
- Any time-invariant observable characteristics will also difference out, so that their coefficients cannot be identified (unless they are interacted with time-varying variables).
- N degrees of freedom will be lost, since this approach estimates the group sample means (one for each group).
- Even though the transformed errors in Eq. 33, (u_{it} - \bar{u}_i), are non-classical (which means what?), the OLS standard errors from the fixed effects regression are correct.

22 / 62

First differences approach

Alternatively, one can eliminate \alpha_i from the equations by taking a first difference of the regression model combining Eqs. 30 and 31:

Y_{it} - Y_{it-1} = (X_{it} - X_{it-1})\beta + (u_{it} - u_{it-1}).    (35)

- Only coefficients on time-varying regressors are estimable.
- First differences is equivalent to within-group fixed effects estimation for T = 2.

23 / 62
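For completeness, the same simulated example as above estimated by first differences (a sketch with made-up numbers; \alpha_i drops out of the differenced equation):

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 300, 5
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T)) + 0.7 * alpha[:, None]
y = 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))

# First differences within each i: alpha_i is eliminated
dy = np.diff(y, axis=1).reshape(-1)
dx = np.diff(x, axis=1).reshape(-1)
beta_fd = (dx * dy).sum() / (dx ** 2).sum()
print(beta_fd)
```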

Pros and cons

- The first differences approach is easy to implement manually, and keeping track of the correct number of degrees of freedom is more straightforward.
- If the model is correctly specified and there is no serial correlation, then within-group fixed effects estimation is more efficient than first differences.
- The relative efficiency of the two estimators depends on the degree of serial correlation in the idiosyncratic errors (Cov(u_{it}, u_{is}), for t \neq s). (Why?)

24 / 62

2. Relationship between random and fixed (and between) effects estimators

25 / 62

Some transformations

Consider the following transformations, where I_{N \times N} is the identity matrix of dimension N \times N, \otimes is the Kronecker product, and i_{T \times 1} is a T \times 1 vector of ones:

- Group-means transformation: P = I_{N \times N} \otimes T^{-1} i_{T \times 1} i_{1 \times T};
- Deviations from group means: Q = I_{NT \times NT} - P.

26 / 62

Some transformations, cont.


P and Q have the effect of transforming the data to group means and to deviations from group means, respectively:

PY = \begin{pmatrix} \bar{Y}_1 \\ \vdots \\ \bar{Y}_1 \\ \bar{Y}_2 \\ \vdots \\ \bar{Y}_2 \\ \vdots \\ \bar{Y}_N \\ \vdots \\ \bar{Y}_N \end{pmatrix}, \qquad QY = \begin{pmatrix} Y_{11} - \bar{Y}_1 \\ \vdots \\ Y_{1T} - \bar{Y}_1 \\ Y_{21} - \bar{Y}_2 \\ \vdots \\ Y_{2T} - \bar{Y}_2 \\ \vdots \\ Y_{N1} - \bar{Y}_N \\ \vdots \\ Y_{NT} - \bar{Y}_N \end{pmatrix}, and so on.

Note that P and Q are idempotent (P^2 = P, Q^2 = Q) and orthogonal (PQ = 0).
27 / 62
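These two projection matrices are easy to build and check numerically; a small numpy sketch (toy dimensions chosen for illustration):

```python
import numpy as np

N, T = 3, 4
P = np.kron(np.eye(N), np.full((T, T), 1.0 / T))   # I_N (x) T^{-1} i i'
Q = np.eye(N * T) - P

# P replaces each observation by its group mean, Q by the deviation from it
y = np.arange(N * T, dtype=float)
print(P @ y)          # group means, each repeated T times
print(Q @ y)          # deviations from group means

# Idempotent and orthogonal
print(np.allclose(P @ P, P), np.allclose(Q @ Q, Q), np.allclose(P @ Q, 0))
```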

Fixed and random effects similarity


Hence, the fixed effects estimator of \beta can be expressed in more compact notation:

\hat{\beta}_{FE} = (X'QX)^{-1}X'QY    (36)

The random effects transformation described above is a partial deviation from group means:

Y_{it}^* = Y_{it} - \theta\bar{Y}_i,    (37)

and

X_{it}^* = X_{it} - \theta\bar{X}_i,    (38)

where \theta = 1 - \left[\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}\right]^{1/2}.

28 / 62

Fixed and random effects similarity


The partial-deviations framework provides an optimal use of the within-group and the between-group variation. Note that the larger is the between-group fraction of total variation (i.e. \sigma_\alpha^2 relative to \sigma_u^2) and/or the larger is T, the greater will be \theta (closer to 1), and the more weight is given to within-group, compared to between-group, variation.

- Suppose T = 3 and \sigma_\alpha^2 = 0: then \theta = 0 and the full variation in the data is used, compared to \theta = 0.5 if \sigma_\alpha^2 = \sigma_u^2;
- Alternatively, suppose \sigma_\alpha^2 = \sigma_u^2: then if T = 3, \theta = 0.5, compared to \theta = 0.75 if T = 15.

29 / 62
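The bullet-point numbers above can be reproduced with a two-line helper (a throwaway sketch; the function name and normalisation \sigma_u^2 = 1 are my choices):

```python
import numpy as np

def theta(T, sigma_a2, sigma_u2=1.0):
    # theta = 1 - [sigma_u^2 / (sigma_u^2 + T * sigma_a^2)]^(1/2)
    return 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))

print(theta(3, 0.0))    # 0.0  : no unobserved heterogeneity
print(theta(3, 1.0))    # 0.5  : sigma_a^2 = sigma_u^2, T = 3
print(theta(15, 1.0))   # 0.75 : same variances, longer panel
```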

Random effects as weighted average


The random effects estimator can be thought of as a weighted average of the within-group estimator \hat{\beta}_{FE} and the between-group estimator \hat{\beta}_{BE} based on the group-means data:

\hat{\beta}_{RE} = \Delta_{k \times k}\hat{\beta}_{FE} + (I_{k \times k} - \Delta_{k \times k})\hat{\beta}_{BE},    (39)

where

\hat{\beta}_{BE} = \left[\sum_{i=1}^{N} T(\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})'\right]^{-1}\sum_{i=1}^{N} T(\bar{X}_i - \bar{X})(\bar{Y}_i - \bar{Y}) = (X'PX)^{-1}X'PY.

30 / 62

Random effects as weighted average

The weight matrix is

\Delta = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar{X}_i)(X_{it} - \bar{X}_i)' + \lambda\sum_{i=1}^{N} T(\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})'\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar{X}_i)(X_{it} - \bar{X}_i)',

where

\lambda = (1 - \theta)^2 = \frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}.    (40)

- If \lambda = 0, then FE and RE are equivalent (a lot of weight is given to within-group variation).
- If \lambda = 1, a lot of weight is given to between-group variation.
- However, 0 < \lambda < 1 is the more likely case.
31 / 62

Summary

If E(\alpha_i \mid X_{it}) = 0:
- Both RE and FE are consistent for \beta (and so would be OLS).
- RE is efficient; OLS has biased standard errors.

If E(\alpha_i \mid X_{it}) \neq 0:
- RE is inconsistent for \beta.
- FE is consistent for \beta.

32 / 62

Testing
The efficiency/consistency trade-off between RE and FE suggests a method to test the random effects restriction. One of these tests is the Hausman test. Under the null hypothesis, H_0: E(\alpha_i \mid X_{it}) = 0, RE is efficient, but it is inconsistent under the alternative hypothesis (H_a: E(\alpha_i \mid X_{it}) \neq 0). In contrast, FE is consistent under both H_0 and H_a.

The Hausman test statistic for this test is:

H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'\{Var(\hat{\beta}_{FE} - \hat{\beta}_{RE})\}^{-1}(\hat{\beta}_{FE} - \hat{\beta}_{RE}),    (41)

where Var(\hat{\beta}_{FE} - \hat{\beta}_{RE}) = Var(\hat{\beta}_{FE}) - Var(\hat{\beta}_{RE}) is the variance-covariance matrix of the difference between the fixed effects and random effects estimators.
33 / 62

Testing

Under the null hypothesis, the Hausman test statistic has a \chi^2 distribution with degrees of freedom equal to the dimension of \beta, i.e.:

H \sim \chi^2_k    (42)

Note: since the fixed effects estimation method can only identify coefficients on time-varying variables, the relevant dimension of \beta is the number of time-varying variable coefficients.

34 / 62
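A compact sketch of Eqs. 41-42 as a function; the estimates and covariance matrices below are hypothetical placeholders, not results from any real data set:

```python
import numpy as np
from scipy import stats

def hausman(b_fe, V_fe, b_re, V_re):
    """H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^{-1} (b_FE - b_RE), df = len(b)."""
    d = b_fe - b_re
    Vd = V_fe - V_re
    H = float(d @ np.linalg.solve(Vd, d))
    k = d.size                       # number of time-varying coefficients compared
    return H, stats.chi2.sf(H, df=k)

# Hypothetical estimates for two time-varying coefficients
b_fe = np.array([1.95, -0.40]); V_fe = np.diag([0.010, 0.012])
b_re = np.array([1.70, -0.55]); V_re = np.diag([0.006, 0.008])
H, p = hausman(b_fe, V_fe, b_re, V_re)
print(H, p)     # a large H / small p-value rejects H0: E(alpha_i | X_it) = 0
```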

3. Is fixed effects estimation always preferable to random effects estimation?

35 / 62

Is FE always better than RE?

Recall: the fixed effects estimator uses only the within-group (= difference from group mean) variation and ignores the between-group variation. This method is used because of a concern that the between-group variation is contaminated with unobserved heterogeneity.

In some cases, the cross-sectional variation may be more reliable than the within-group time variation, in which case fixed effects estimation may be worse than the OLS or RE alternatives.

36 / 62

Is FE always better than RE?


Examples are:

- Measurement error in X_{it}: if X_{it} is measured with classical, i.e. purely random, error, then taking either differences-from-means or first differences will exacerbate the noise-to-signal ratio in the resulting data, leading to serious attenuation bias in FE.
- Endogenous changes in X_{it}: if X is endogenous, i.e. changes in X_{it} over time are not exogenous to changes in Y_{it}, then fixed effects estimation may be worse than random effects or OLS. In this case, (X_{it} - \bar{X}_i) may be strongly correlated with (\varepsilon_{it} - \bar{\varepsilon}_i).
- There may not be enough variation in the X variables, although FE can estimate the coefficient even if X rarely changes (Pluemper and Troeger, 2007).

37 / 62

4. Hausman-Taylor (1981) approach to estimating coefficients on both time-varying and time-invariant variables

38 / 62

Hausman-Taylor (1981) approach


If we have a situation in which we have both time-varying and time-invariant variables of interest, Hausman and Taylor show that consistent estimation of the coefficients of interest is possible if not all of the time-varying variables are correlated with the unobserved heterogeneity.

The basic idea is to use the group means of the time-varying variables that are uncorrelated with the unobserved heterogeneity as instruments for the time-invariant variables, to obtain consistent estimates of their coefficients, while consistent estimates of the time-varying variable coefficients can be obtained using standard fixed effects estimation.

This requires that there are at least as many uncorrelated time-varying variables as correlated time-invariant variables, and also that there is suitable correlation between these.
39 / 62

Hausman-Taylor (1981) approach

Consider the linear regression of Y_{it} on k time-varying covariates (X_{it}) and g time-invariant covariates (Z_i):

Y_{it} = X_{it}\beta + Z_i\gamma + \varepsilon_{it},    (43)

where i = 1, \ldots, N, t = 1, \ldots, T, and

\varepsilon_{it} = \alpha_i + u_{it}.    (44)

40 / 62

Hausman-Taylor (1981) approach

Sub-divide each of X_{it} = (X_{1it}, X_{2it}) and Z_i = (Z_{1i}, Z_{2i}):

- X_{1it} and X_{2it} consist of k_1 and k_2 variables, respectively (k_1 + k_2 = k);
- Z_{1i} and Z_{2i} consist of g_1 and g_2 variables, respectively (g_1 + g_2 = g);
- E(\alpha_i \mid X_{1it}) = 0 and E(\alpha_i \mid Z_{1i}) = 0; and
- E(\alpha_i \mid X_{2it}) \neq 0 and E(\alpha_i \mid Z_{2i}) \neq 0.

41 / 62

Hausman-Taylor (1981) approach


The intuition for the Hausman-Taylor approach is as follows:

- STEP 1: Fixed effects provides consistent estimation of the coefficients on the time-varying variables:

\hat{\beta}_{FE} = (X'QX)^{-1}X'QY    (45)

Remember that Q = I - P, where P = I_N \otimes T^{-1} i i'. The residual variance obtained in this step is a consistent estimator of \sigma_u^2.

- STEP 2: Use \hat{\beta}_{FE} to construct the group means of the within-group residuals:

d_i = \bar{Y}_i - \bar{X}_i\hat{\beta}_{FE} = Z_i\gamma + \alpha_i + \bar{u}_i,    (46)

where \bar{u}_i is the group mean of the residuals u_{it}. If (46) was estimated with OLS or GLS, then \hat{\gamma} is likely to be biased, due to the correlation of Z_{2i} with \alpha_i.
42 / 62

Hausman-Taylor (1981) approach


Where does the expression for d in (46) come from? The group means of the within-group residuals are derived as follows:

d = P(Y - X\hat{\beta}_{FE}) = P\{I - X(X'QX)^{-1}X'Q\}Y
 = P\{I - X(X'QX)^{-1}X'Q\}(X\beta + Z\gamma + \alpha + u)
 = P(X\beta + Z\gamma + \alpha + u - X\hat{\beta}_{FE})
 = P(Z\gamma + \alpha + u)
 = Z\gamma + \alpha + Pu

This is a regression of the group-mean residuals from the fixed effects regression on the Z_i, with \alpha_i + \bar{u}_i being the group-mean residuals.

43 / 62

Hausman-Taylor (1981) approach


- STEP 3: Use \bar{X}_{1i} as instruments for Z_{2i}. This will provide consistent estimation of \gamma if there are sufficient X_1 variables (order condition: k_1 \geq g_2), and the \bar{X}_{1i} are correlated with the Z_{2i} (rank condition).

Then estimate (46) with a 2SLS approach, where:

\hat{\gamma} = (Z'P_A Z)^{-1}Z'P_A d    (47)

where A = [\bar{X}_{1i}, Z_{1i}], P_A is the projection matrix

P_A = A(A'A)^{-1}A'    (48)

and

\hat{Z}_2 = A(A'A)^{-1}A'Z_2.    (49)

44 / 62

Hausman-Taylor (1981) approach


NOTE: Both \hat{\beta}_{FE} and \hat{\gamma}_{2SLS} are consistent. However, since \hat{\beta}_{FE} is likely to be inefficient, the \hat{\gamma}_{2SLS}, which stem from the FE approach, are likely to be inefficient too. Therefore, Hausman and Taylor suggest an extension to estimate \beta and \gamma in a more efficient way.

- STEP 4: The residual variance in the step above is a consistent estimator of \sigma^{*2} = \sigma_u^2/T + \sigma_\alpha^2. Using the consistent estimator of \sigma_u^2 from the first step, we deduce an estimator for \sigma_\alpha^2 = \sigma^{*2} - \sigma_u^2/T. The weight for feasible GLS is:

\theta = 1 - \frac{\sigma_u}{\sqrt{\sigma_u^2 + T\sigma_\alpha^2}}    (50)

45 / 62

Hausman-Taylor (1981) approach


- STEP 5: Construct a weighted instrumental variable estimator. The full set of variables is:

w_{it} = (X_{1it}, X_{2it}, Z_{1i}, Z_{2i}) = W_{NT \times (k_1 + k_2 + g_1 + g_2)},    (51)

so the transformed variables for GLS are:

w_{it}^* = w_{it} - \theta\bar{w}_i,
Y_{it}^* = Y_{it} - \theta\bar{Y}_i.

The instruments used are:

v_{it} = [(X_{1it} - \bar{X}_{1i}), (X_{2it} - \bar{X}_{2i}), Z_{1i}, \bar{X}_{1i}]    (52)

46 / 62

Hausman-Taylor (1981) approach


- Instrumental variable estimator (efficient):

(\hat{\beta}, \hat{\gamma})_{IV} = [(W^{*\prime}V)(V'V)^{-1}(V'W^*)]^{-1}[(W^{*\prime}V)(V'V)^{-1}(V'Y^*)]    (53)

- Instrumental variable estimator using un-weighted variables (inefficient):

(\hat{\beta}, \hat{\gamma})_{IV} = [(W'V)(V'V)^{-1}(V'W)]^{-1}[(W'V)(V'V)^{-1}(V'Y)]    (54)

- Feasible GLS estimator:

(\hat{\beta}, \hat{\gamma})_{GLS} = [W^{*\prime}W^*]^{-1}[W^{*\prime}Y^*]    (55)
47 / 62
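To fix the intuition, here is a rough sketch of Steps 1-3 only (the efficient weighted IV of Steps 4-5 is omitted) on simulated data. The dimensions, coefficients and the way Z_2 is made endogenous and correlated with \bar{X}_1 are all invented for illustration; this is not the full Hausman-Taylor estimator:

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 500, 5
alpha = rng.normal(size=N)

# Hypothetical design: X1, Z1 exogenous; X2, Z2 correlated with alpha_i
X1 = rng.normal(size=(N, T, 2))
X2 = rng.normal(size=(N, T, 1)) + 0.8 * alpha[:, None, None]
Z1 = rng.normal(size=(N, 1))
Z2 = 0.8 * alpha[:, None] + 1.0 * X1[:, :, 0].mean(axis=1, keepdims=True) \
     + rng.normal(size=(N, 1))

beta = np.array([1.0, -1.0, 0.5])                   # on [X1, X2]
gamma = np.array([0.7, -0.3])                       # on [Z1, Z2]
X = np.concatenate([X1, X2], axis=2)                # (N, T, 3)
Z = np.concatenate([Z1, Z2], axis=1)                # (N, 2)
Y = X @ beta + (Z @ gamma)[:, None] + alpha[:, None] + rng.normal(size=(N, T))

# STEP 1: within (FE) estimation of beta, using only the time-varying variation
Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
Xd2, Yd2 = Xd.reshape(N * T, -1), Yd.reshape(N * T)
b_fe = np.linalg.solve(Xd2.T @ Xd2, Xd2.T @ Yd2)

# STEP 2: group means of the residuals, d_i = Ybar_i - Xbar_i b_FE
d = Y.mean(axis=1) - X.mean(axis=1) @ b_fe

# STEP 3: 2SLS of d on Z, instrumenting Z2 with A = [X1bar, Z1]
A = np.column_stack([X1.mean(axis=1), Z1])          # exogenous means plus Z1
PA = A @ np.linalg.solve(A.T @ A, A.T)              # projection matrix P_A
g_iv = np.linalg.solve(Z.T @ PA @ Z, Z.T @ PA @ d)
print(b_fe)   # close to beta
print(g_iv)   # consistent for gamma; naive OLS of d on Z would be biased
```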

5. Correlated random effects (CRE): a flexible extension to random effects models

48 / 62

Intuition of CRE

Recall that the random effects estimator is biased if \alpha_i is correlated with X_{it}. Chamberlain (1984) and Mundlak (1978) observed that if \alpha_i is correlated with X_{it} in period t, then it will also be correlated with X_{is} in period s, where t \neq s. One interpretation of this observation is that X_{it} should be included in the period s regression. More generally, all the realisations of the X's should be included in each period's regression.

That is, if \alpha_i is correlated with X_{it} in the structural form, then all leads and lags of X_{it} should be included in the regression.

49 / 62

Formalisation of CRE
Specify the linear projection of \alpha_i on the set of X_{it}'s:

\alpha_i = X_{i1}\lambda_1 + X_{i2}\lambda_2 + \ldots + X_{iT}\lambda_T + \omega_i.    (56)

Eq. 56 provides a way to decompose \alpha_i into two components:

1. A component (X_{i1}\lambda_1 + X_{i2}\lambda_2 + \ldots + X_{iT}\lambda_T) that is correlated with the observable covariates; and
2. A component (\omega_i) that is uncorrelated with the covariates.

The \lambda's are the projection coefficients that reflect the extent of the correlation between \alpha_i and X_{it}, and \omega_i is, by construction, a true random effect, i.e. uncorrelated with X_{it} for all t.

50 / 62

Formalisation of CRE
Note:

- E(\alpha_i \mid X_{it}) does not have to be linear in the X_{it}'s. It is only the linear correlation that causes bias/inconsistency in the OLS and (random effects/GLS) estimators. Hence, only the linear projection is required for CRE to be unbiased/consistent.
- Mundlak (1978) adopted the more restricted specification that \lambda_1 = \lambda_2 = \ldots = \lambda_T = \lambda. This restriction implies that Eq. 56 reduces to:

\alpha_i = (T\bar{X}_i)\lambda + \omega_i    (57)

51 / 62

Mundlak's assumption and consequences


The assumption that the individual-specific effect is equally correlated with the X_{it}'s of all time periods implies a very easy implementation of the correction. All you need to do is to replace \alpha_i in Eq. 58:

Y_{it} = X_{it}\beta + \alpha_i + u_{it},    (58)

with (ignoring the scaling factor of T):

\alpha_i = (T\bar{X}_i)\lambda + \omega_i    (59)

to get:

Y_{it} = X_{it}\beta + \bar{X}_i\lambda + \omega_i + u_{it},    (60)

where \omega_i is a true random effect.

52 / 62
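The Mundlak correction amounts to adding the group mean \bar{X}_i as an extra regressor; a minimal sketch on simulated data (the data-generating process and coefficient values are invented, with \alpha_i built exactly as in Eq. 57):

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 400, 5
x = rng.normal(size=(N, T))
alpha = 0.8 * x.mean(axis=1) + 0.3 * rng.normal(size=N)   # alpha_i = xbar_i*lambda + omega_i
y = 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))

# Mundlak / CRE regression: include the group mean xbar_i alongside x_it
xbar = np.repeat(x.mean(axis=1), T)                 # aligned with row-major stacking
X = np.column_stack([np.ones(N * T), x.reshape(-1), xbar])
b = np.linalg.solve(X.T @ X, X.T @ y.reshape(-1))
print(b)      # approx [0, 2.0, 0.8]: beta and the projection coefficient lambda

# Pooled OLS that omits xbar_i: the slope absorbs part of lambda and is biased
X0 = np.column_stack([np.ones(N * T), x.reshape(-1)])
print(np.linalg.solve(X0.T @ X0, X0.T @ y.reshape(-1)))
```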

Chamberlain's approach

If you do not want to make the strong assumption made by Mundlak, then implementation of this correction is slightly more difficult. Using Eq. 56 to substitute for \alpha_i in the combined Eqs. 30 and 31, we get:

Y_{it} = X_{it}\beta + X_{i1}\lambda_1 + X_{i2}\lambda_2 + \ldots + X_{iT}\lambda_T + \omega_i + u_{it}    (61)
 = X_{it}(\beta + \lambda_t) + \sum_{s \neq t} X_{is}\lambda_s + \omega_i + u_{it},    (62)

or, in more compact form:

Y_{it} = X_{i1}\pi_{t1} + X_{i2}\pi_{t2} + \ldots + X_{iT}\pi_{tT} + \omega_i + u_{it},    (63)

where

\pi_{ts} = \lambda_s for s \neq t,
\pi_{ts} = \beta + \lambda_t for s = t.
53 / 62

Some explanations
Eq. 62 is the reduced-form equation for the model. The errors (\omega_i + u_{it}) are uncorrelated with the regressors. This expression shows that one way to view the problem of ignoring the correlation between the covariates and the unobserved heterogeneity is as an omitted variables problem, which can be solved by including all the out-of-period realisations X_{is} in the period t equation.

In Eq. 63, the coefficient on X_{it}, i.e. \pi_{tt}, consists of two components:

1. The structural effect of interest, \beta;
2. The component \lambda_t, which reflects the correlation of X_{it} with the unobserved heterogeneity.

54 / 62

Estimation of CRE
The parameters of interest (\beta and the \lambda_t's) can be estimated by the minimum distance approach; it requires two steps:

1. Estimate the unrestricted reduced-form equations, as outlined in Eq. 63, by OLS. Include all the leads and lags of the X_{it}'s in the period t regression, and estimate this regression separately for each time period.
2. Estimate the parameters of interest by imposing the implied restrictions (see below) on the first-stage reduced-form coefficients using a minimum distance estimation method. This means using a quadratic-form criterion as the basis for estimating the parameters of interest in the second stage.

The implied cross-equation restrictions are:

1. \pi_{ts} = \lambda_s for t \neq s;
2. \pi_{tt} - \pi_{st} = \beta for t \neq s.

The details of minimum distance are explained on the white-board.


55 / 62

Evaluation of CRE
- This approach is called "random effects" because it parameterises the distribution of \alpha_i (i.e. by projecting \alpha_i onto the set of sample realisations of X_{it});
- It requires estimating 1 + TK + K parameters (risk of a proliferation of parameters);
- It relies on the measured X_{it}'s being time-varying. Time-invariant variables will be absorbed into \alpha_i in this specification;
- A test of the (zero) correlation between the covariates and the unobserved heterogeneity is given by testing H_0: \lambda_1 = \lambda_2 = \ldots = \lambda_T = 0 vs H_a: not all \lambda_t are zero;
- An important caveat to the CRE discussion is that X_{is} enters the period t equation only via its correlation with \alpha_i. In some situations, out-of-period regressors may have independent, structural reasons for being included (this approach may fail then).
56 / 62

6. Pluemper and Troeger (2007) approach to modelling (nearly) time-invariant variables

57 / 62

Three-stage procedure

1. Run a fixed effects model and predict the individual fixed effect;
2. Decompose the individual fixed effects into the part explained by time-invariant and/or rarely changing variables and an error term (h_i);
3. Re-estimate the first stage by pooled OLS, including the time-invariant variables plus the error term of stage 2.

58 / 62

Three-stage procedure
Stage 1 (within regression):

y_{it} - \bar{y}_i = \sum_{k=1}^{K}\beta_k(x_{kit} - \bar{x}_{ki}) + \sum_{m=1}^{M}\gamma_m(z_{mi} - z_{mi}) + (e_{it} - \bar{e}_i) + (u_i - u_i),    (64)

where the time-invariant terms (z_{mi} - z_{mi}) and (u_i - u_i) are wiped out by the demeaning. Let:

\hat{u}_i = \bar{y}_i - \sum_{k=1}^{K}\hat{\beta}_k\bar{x}_{ki} - \bar{e}_i    (65)

Stage 2 (decomposition of the unit effects):

\hat{u}_i = \sum_{m=1}^{M}\gamma_m z_{mi} + h_i    (66)

and

\hat{h}_i = \hat{u}_i - \sum_{m=1}^{M}\hat{\gamma}_m z_{mi}    (67)

Stage 3 (pooled OLS including the stage-2 residual):

y_{it} = \alpha + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \hat{h}_i + \varepsilon_{it}    (68)
59 / 62
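A rough sketch of the three FEVD stages on simulated data with one time-varying and one time-invariant regressor (the data-generating process, variable names and coefficient values are invented for illustration; this is not a full replication of Pluemper and Troeger's estimator or its standard errors):

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 300, 5
z = rng.normal(size=N)                     # time-invariant variable z_i
alpha = rng.normal(size=N)                 # unit effect, here uncorrelated with z_i
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + 0.7 * z[:, None] + alpha[:, None] + rng.normal(size=(N, T))

# Stage 1: within (FE) regression of demeaned y on demeaned x
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd * yd).sum() / (xd ** 2).sum()

# Predicted unit effects: u_i = ybar_i - b_fe * xbar_i (absorbs the constant and z_i)
u_hat = y.mean(axis=1) - b_fe * x.mean(axis=1)

# Stage 2: decompose u_i into the part explained by z_i and a residual h_i
Z = np.column_stack([np.ones(N), z])
g = np.linalg.solve(Z.T @ Z, Z.T @ u_hat)
h = u_hat - Z @ g

# Stage 3: pooled OLS of y on x, z and the stage-2 residual h_i
X3 = np.column_stack([np.ones(N * T), x.reshape(-1), np.repeat(z, T), np.repeat(h, T)])
b3 = np.linalg.solve(X3.T @ X3, X3.T @ y.reshape(-1))
print(b_fe)   # within estimate of the x coefficient
print(b3)     # [const, beta_x, gamma_z, coefficient on h_i (about 1)]
```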

Monte Carlo Simulations


1. Compare the finite-sample properties of the FEVD estimator against those of the pooled OLS, RE, and Hausman-Taylor IV estimators (using RMSE as the criterion);
2. If both time-invariant and time-varying variables correlate strongly with the individual FE, then FEVD outperforms all estimators;
3. When considering the estimates of coefficients on rarely changing variables, FEVD outperforms FE if:
   - the ratio of between to within variation is high (the threshold is 1.7), and
   - the overall R^2 is low, and
   - the correlation between the rarely changing variables and the individual FE is low.

60 / 62

7. Application: Effect of Health on Hourly Wages (Contoyannis and Rice, 2001) using six waves of the BHPS

61 / 62

Assumptions
- Remember: use the mean values of the exogenous time-varying variables to instrument the time-invariant endogenous variables.
- Time-invariant endogenous variable: higher degree;
- Time-varying endogenous variables: health (psychological and physiological), workforce sector, occupation;
- Test the validity of the instruments in the Hausman and Taylor approach using a Hausman test (comparing the estimated coefficients with those of a FE model): they should be sufficiently close.
- The approach is valid only if health is correlated with the individual, time-invariant effect in the wage equation, but not with the period-specific errors.
62 / 62
