
Basic Econometrics

Econometric Analysis (56268)


Dr. Keshab Bhattarai
Hull Univ. Business School
February 7, 2011
Algebra of Matrices
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}; \quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}; \quad C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}

Addition:

A + B = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \end{bmatrix}   (1)

Subtraction:

A - B = \begin{bmatrix} a_{11}-b_{11} & a_{12}-b_{12} \\ a_{21}-b_{21} & a_{22}-b_{22} \end{bmatrix}   (2)

Multiplication:

AB = \begin{bmatrix} a_{11}b_{11}+a_{12}b_{21} & a_{11}b_{12}+a_{12}b_{22} \\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22} \end{bmatrix}   (3)
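A quick numerical check of these operations (not part of the original slides), assuming NumPy is available; the entries of A and B are arbitrary:

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

print(A + B)   # element-by-element addition, equation (1)
print(A - B)   # element-by-element subtraction, equation (2)
print(A @ B)   # row-by-column matrix multiplication, equation (3)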
Determinant and Transpose of Matrices
Determinant of A:

|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{21}a_{12}   (4)

Determinant of B: |B| = b_{11}b_{22} - b_{21}b_{12}

Determinant of C: |C| = c_{11}c_{22} - c_{21}c_{12}

Transposes of A, B and C:

A' = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}; \quad B' = \begin{bmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \end{bmatrix}; \quad C' = \begin{bmatrix} c_{11} & c_{21} \\ c_{12} & c_{22} \end{bmatrix}   (5)

A matrix D is singular if |D| = 0 and non-singular if |D| \neq 0.
Inverse of A
A^{-1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1} = \frac{1}{|A|}\,\mathrm{adj}(A)   (6)

\mathrm{adj}(A) = C'   (7)

where C is the cofactor matrix: for each element, cross out its row and column and multiply the remaining element by (-1)^{i+j}:

C = \begin{bmatrix} |a_{22}| & -|a_{21}| \\ -|a_{12}| & |a_{11}| \end{bmatrix} = \begin{bmatrix} a_{22} & -a_{21} \\ -a_{12} & a_{11} \end{bmatrix}   (8)

C' = \begin{bmatrix} a_{22} & -a_{21} \\ -a_{12} & a_{11} \end{bmatrix}' = \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}   (9)
Inverse of A
A^{-1} = \frac{1}{a_{11}a_{22} - a_{21}a_{12}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} = \begin{bmatrix} \frac{a_{22}}{a_{11}a_{22}-a_{21}a_{12}} & \frac{-a_{12}}{a_{11}a_{22}-a_{21}a_{12}} \\ \frac{-a_{21}}{a_{11}a_{22}-a_{21}a_{12}} & \frac{a_{11}}{a_{11}a_{22}-a_{21}a_{12}} \end{bmatrix}   (10)

Exercise: Find B^{-1}.
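A small check of equations (4)-(10) with NumPy (not from the slides); the 2x2 matrix used here is arbitrary and happens to reappear in the market example below:

import numpy as np

A = np.array([[5.0, -1.0], [-1.0, 3.0]])
detA = np.linalg.det(A)                         # a11*a22 - a21*a12, equation (4)
adjA = np.array([[ A[1, 1], -A[0, 1]],          # adj(A) = C', equations (8)-(9)
                 [-A[1, 0],  A[0, 0]]])
print(np.allclose(adjA / detA, np.linalg.inv(A)))   # True: equation (10)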
Matrix Algebra
Market 1:

X_1^d = 10 - 2p_1 + p_2   (11)

X_1^S = -2 + 3p_1   (12)

Market 2:

X_2^d = 15 + p_1 - p_2   (13)

X_2^S = -1 + 2p_2   (14)

X_1^d = X_1^S implies 10 - 2p_1 + p_2 = -2 + 3p_1

X_2^d = X_2^S implies 15 + p_1 - p_2 = -1 + 2p_2

\begin{bmatrix} 5 & -1 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} = \begin{bmatrix} 12 \\ 16 \end{bmatrix}   (15)
Application of Matrix in Solving Equations
\begin{bmatrix} p_1 \\ p_2 \end{bmatrix} = \begin{bmatrix} 5 & -1 \\ -1 & 3 \end{bmatrix}^{-1} \begin{bmatrix} 12 \\ 16 \end{bmatrix}   (16)

|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = \begin{vmatrix} 5 & -1 \\ -1 & 3 \end{vmatrix} = (5)(3) - (-1)(-1) = 15 - 1 = 14

C' = \begin{bmatrix} a_{22} & -a_{21} \\ -a_{12} & a_{11} \end{bmatrix}' = \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 1 & 5 \end{bmatrix}

\begin{bmatrix} p_1 \\ p_2 \end{bmatrix} = \frac{1}{14}\begin{bmatrix} 3 & 1 \\ 1 & 5 \end{bmatrix}\begin{bmatrix} 12 \\ 16 \end{bmatrix} = \frac{1}{14}\begin{bmatrix} (3)(12) + (1)(16) \\ (1)(12) + (5)(16) \end{bmatrix} = \begin{bmatrix} 52/14 \\ 92/14 \end{bmatrix} = \begin{bmatrix} 26/7 \\ 46/7 \end{bmatrix}   (17)
Cramer's Rule

p_1 = \frac{\begin{vmatrix} 12 & -1 \\ 16 & 3 \end{vmatrix}}{\begin{vmatrix} 5 & -1 \\ -1 & 3 \end{vmatrix}} = \frac{36 + 16}{15 - 1} = \frac{26}{7}; \qquad p_2 = \frac{\begin{vmatrix} 5 & 12 \\ -1 & 16 \end{vmatrix}}{\begin{vmatrix} 5 & -1 \\ -1 & 3 \end{vmatrix}} = \frac{80 + 12}{15 - 1} = \frac{46}{7}   (18)

Market 1:

LHS = 10 - 2p_1 + p_2 = 10 - 2\left(\frac{26}{7}\right) + \frac{46}{7} = \frac{64}{7} = -2 + 3p_1 = RHS   (19)

Market 2:

LHS = 15 + p_1 - p_2 = 15 + \frac{26}{7} - \frac{46}{7} = \frac{85}{7} = -1 + 2p_2 = RHS   (20)

QED.
The extension to N markets is straightforward; this builds confidence for solving large models.
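The same two-market equilibrium can be solved numerically; this sketch (not in the slides) uses NumPy's linear solver, which is equivalent to applying the inverse or Cramer's rule:

import numpy as np

A = np.array([[5.0, -1.0], [-1.0, 3.0]])
b = np.array([12.0, 16.0])
p = np.linalg.solve(A, b)       # solves A p = b
print(p)                        # [3.714..., 6.571...], i.e. 26/7 and 46/7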
Spectral Decomposition of a Matrix
|A - \lambda I| = \begin{vmatrix} 5-\lambda & -1 \\ -1 & 3-\lambda \end{vmatrix} = 0   (21)

where \lambda is an eigenvalue.

(5 - \lambda)(3 - \lambda) - 1 = 0   (22)

15 - 5\lambda - 3\lambda + \lambda^2 - 1 = 0 \quad \text{or} \quad \lambda^2 - 8\lambda + 14 = 0   (23)

Eigenvalues:

\lambda_1, \lambda_2 = \frac{8 \pm \sqrt{8^2 - 4(14)}}{2} = \frac{8 \pm \sqrt{8}}{2} = \frac{8 \pm 2.83}{2} = 5.4, \; 2.6   (24)

\begin{bmatrix} 5-\lambda & -1 \\ -1 & 3-\lambda \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}   (25)
Spectral Decomposition of a Matrix
For \lambda_1 = 5.4:

\begin{bmatrix} 5-5.4 & -1 \\ -1 & 3-5.4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}   (26)

\begin{bmatrix} -0.4 & -1 \\ -1 & -2.4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}   (27)

so x_2 = -0.4 x_1. Normalisation:

x_1^2 + x_2^2 = 1; \qquad x_1^2 + (0.4 x_1)^2 = 1   (28)

1.16 x_1^2 = 1; \quad x_1^2 = \frac{1}{1.16}; \quad x_1 = \sqrt{0.862} = 0.928   (29)

x_2 = -0.4 x_1 = -0.4(0.928) = -0.371   (30)
Eigenvector 1
V_1 = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0.928 \\ -0.371 \end{bmatrix}   (31)

For \lambda_2 = 2.6:

\begin{bmatrix} 5-2.6 & -1 \\ -1 & 3-2.6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}   (32)

\begin{bmatrix} 2.4 & -1 \\ -1 & 0.4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}   (33)

so x_2 = 2.4 x_1.

x_1^2 + x_2^2 = 1; \qquad x_1^2 + (2.4 x_1)^2 = 1   (34)

6.76 x_1^2 = 1; \quad x_1^2 = \frac{1}{6.76}; \quad x_1 = 0.373   (35)

x_2 = 2.4 x_1 = 2.4(0.373) = 0.895   (36)
Eigenvector 2
V_2 = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0.373 \\ 0.895 \end{bmatrix}   (37)

Orthogonality (required for GLS):

(V_1)'(V_2) = 0   (38)

V_1 = \begin{bmatrix} 0.928 \\ -0.371 \end{bmatrix}   (39) \qquad V_2 = \begin{bmatrix} 0.373 \\ 0.895 \end{bmatrix}   (40)

\begin{bmatrix} 0.928 & -0.371 \end{bmatrix} \begin{bmatrix} 0.373 \\ 0.895 \end{bmatrix} = 0.346 - 0.332 \approx 0   (41)
Orthogonality (Required for GLS)
(V_1)'(V_2) = 0   (42)

\begin{bmatrix} 0.928 & -0.371 \end{bmatrix} \begin{bmatrix} 0.373 \\ 0.895 \end{bmatrix} = 0.346 - 0.332 \approx 0   (43)

(V_1 \; V_2)'(V_1 \; V_2) = (V_1 \; V_2)(V_1 \; V_2)' = I   (44)

\begin{bmatrix} 0.928 & -0.371 \\ 0.373 & 0.895 \end{bmatrix} \begin{bmatrix} 0.928 & -0.371 \\ 0.373 & 0.895 \end{bmatrix}^{T} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}   (45)
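A numerical counterpart of the spectral decomposition (not in the slides), assuming NumPy; small rounding and sign differences from the hand-computed eigenvectors above are expected:

import numpy as np

A = np.array([[5.0, -1.0], [-1.0, 3.0]])
lam, Q = np.linalg.eigh(A)      # eigenvalues and orthonormal eigenvectors of a symmetric matrix
print(lam)                      # approx [2.586, 5.414], roots of lambda^2 - 8*lambda + 14 = 0
print(Q.T @ Q)                  # identity: eigenvectors are orthonormal, as in (44)
print(Q.T @ A @ Q)              # diagonal matrix of eigenvalues (diagonalisation)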
Diagonalisation, Trace of Matrix
The inverse of an orthogonal matrix equals its transpose. With Q = (V_1 \; V_2):

Q^{-1} = Q'   (46)

Q'AQ = \Lambda   (47)

\Lambda = \begin{bmatrix} \lambda_1 & 0 & .. & 0 \\ 0 & \lambda_2 & .. & 0 \\ : & : & : & : \\ 0 & 0 & .. & \lambda_n \end{bmatrix}   (48)

\sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} a_{ii}   (49)

|A| = \lambda_1 \lambda_2 \cdots \lambda_n   (50)
Quadratic forms, Positive and Negative Definite Matrices

Quadratic form:

q(x) = (x_1 \; x_2) \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}   (51)

Positive definite matrix (all eigenvalues positive):

q(x) = x'Ax > 0   (52)

Positive semi-definite matrix:

q(x) = x'Ax \geq 0   (53)

Negative definite matrix (all eigenvalues negative):

q(x) = x'Ax < 0   (54)

Negative semi-definite matrix:

q(x) = x'Ax \leq 0   (55)
Generalised Least Square
Take a regression

Y = X\beta + e   (56)

in which the assumptions of homoskedasticity and no autocorrelation are violated:

var(e_i) \neq \sigma^2 \text{ for all } i   (57)

covar(e_i e_j) \neq 0   (58)

The variance-covariance matrix of the errors is

\Omega = E\left[ee'\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & .. & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & .. & \sigma_{2n} \\ : & : & : & : \\ \sigma_{n1} & \sigma_{n2} & .. & \sigma_n^2 \end{bmatrix}   (59)

Q'\Omega Q = \Lambda   (60)
Generalised Least Square
\Omega = Q\Lambda Q' = Q\Lambda^{1/2}\Lambda^{1/2}Q'   (61)

P = \Lambda^{-1/2}Q'   (62)

P\Omega P' = I; \qquad P'P = \Omega^{-1}   (63)

Transform the model:

PY = PX\beta + Pe   (64)

Y^{*} = X^{*}\beta + e^{*}   (65)

where Y^{*} = PY, X^{*} = PX and e^{*} = Pe.

\hat{\beta}_{GLS} = (X'P'PX)^{-1}(X'P'PY)

\hat{\beta}_{GLS} = \left(X'\Omega^{-1}X\right)^{-1}\left(X'\Omega^{-1}Y\right)   (66)
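A minimal GLS sketch (not from the slides) under an assumed known diagonal Omega, i.e. pure heteroskedasticity; it only illustrates equation (66) with simulated data:

import numpy as np

rng = np.random.default_rng(0)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, 2.0])
sig = np.linspace(0.5, 2.0, N)                      # non-constant error standard deviations
y = X @ beta + rng.normal(size=N) * sig

Omega_inv = np.diag(1.0 / sig**2)                   # Omega^{-1}, assumed known here
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)   # equation (66)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(b_gls, b_ols)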
Regression Model
Consider a linear regression model:

Y_i = \beta_1 + \beta_2 X_i + e_i \qquad i = 1 \ldots N   (67)

The errors represent all elements missing from this relationship; pluses and minuses cancel out. The mean of the error is zero, E(e_i) = 0.

e_i \sim N\left(0, \sigma^2\right)   (68)

Normal equations of the above regression:

\sum Y_i = \hat{\beta}_1 N + \hat{\beta}_2 \sum X_i   (69)

\sum Y_i X_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2   (70)
Ordinary Least Square (OLS): Assumptions
List the OLS assumptions on the error terms e_i.

Normality of errors (zero mean):

E(e_i) = 0   (71)

Homoskedasticity:

var(e_i) = \sigma^2 \text{ for all } i   (72)

No autocorrelation:

covar(e_i e_j) = 0   (73)

Independence of errors from the explanatory variable:

covar(e_i X_i) = 0   (74)
Derivation of normal equations for the OLS estimators
Choose \hat{\beta}_1 and \hat{\beta}_2 to minimise the sum of squared errors:

\min_{\hat{\beta}_1, \hat{\beta}_2} S = \sum e_i^2 = \sum \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right)^2   (75)

First order conditions:

\frac{\partial S}{\partial \hat{\beta}_1} = 0; \qquad \frac{\partial S}{\partial \hat{\beta}_2} = 0   (76)

\sum \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right)(-1) = 0   (77)

\sum \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right)(-X_i) = 0   (78)

\sum Y_i = \hat{\beta}_1 N + \hat{\beta}_2 \sum X_i   (79)

\sum Y_i X_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2   (80)

There are two unknowns, \hat{\beta}_1 and \hat{\beta}_2, and two equations. One way to find \hat{\beta}_1 and \hat{\beta}_2 is to use the substitution (reduced form) method.
Slope estimator by the reduced form equation method
Multiply the second equation by N and the first by \sum X_i:

\sum X_i \sum Y_i = \hat{\beta}_1 N \sum X_i + \hat{\beta}_2 \left(\sum X_i\right)^2   (81)

N \sum Y_i X_i = \hat{\beta}_1 N \sum X_i + \hat{\beta}_2 N \sum X_i^2   (82)

By subtraction this reduces to

\sum X_i \sum Y_i - N \sum Y_i X_i = \hat{\beta}_2 \left[\left(\sum X_i\right)^2 - N \sum X_i^2\right]   (83)

\hat{\beta}_2 = \frac{\sum X_i \sum Y_i - N \sum Y_i X_i}{\left(\sum X_i\right)^2 - N \sum X_i^2} = \frac{\sum x_i y_i}{\sum x_i^2}   (84)

This is the OLS estimator of \beta_2, the slope parameter.
Intercept estimator by the reduced form equation method
When \hat{\beta}_2 is known it is easy to find \hat{\beta}_1 by averaging the regression Y_i = \beta_1 + \beta_2 X_i + e_i:

\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}   (85)

Proof that \frac{\sum X_i \sum Y_i - N \sum Y_i X_i}{\left(\sum X_i\right)^2 - N \sum X_i^2} = \frac{\sum x_i y_i}{\sum x_i^2}:

LHS = \frac{\sum X_i \sum Y_i - N \sum Y_i X_i}{\left(\sum X_i\right)^2 - N \sum X_i^2}
= \frac{N\bar{X} N\bar{Y} - N \sum Y_i X_i}{\left(N\bar{X}\right)^2 - N \sum X_i^2}
= \frac{N\bar{X}\bar{Y} - \sum Y_i X_i}{N\bar{X}^2 - \sum X_i^2}
= \frac{\sum Y_i X_i - N\bar{X}\bar{Y}}{\sum X_i^2 - N\bar{X}^2}
= \frac{\sum \left(Y_i - \bar{Y}\right)\left(X_i - \bar{X}\right)}{\sum \left(X_i - \bar{X}\right)^2}
= \frac{\sum x_i y_i}{\sum x_i^2} = RHS   (86)
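A small numerical check (not in the slides) that the raw-sum form and the deviation form of the slope estimator in equation (86) coincide; the data are simulated purely for illustration:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=50)
Y = 2.0 + 0.5 * X + rng.normal(size=50)
N = len(X)
lhs = (X.sum() * Y.sum() - N * (X * Y).sum()) / (X.sum()**2 - N * (X**2).sum())
x, y = X - X.mean(), Y - Y.mean()
rhs = (x * y).sum() / (x**2).sum()
print(np.isclose(lhs, rhs))     # True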
Normal equations in matrix form and OLS Estimators
Y = X\beta + e   (87)

\sum Y_i = \hat{\beta}_1 N + \hat{\beta}_2 \sum X_i   (88)

\sum Y_i X_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2   (89)

\begin{bmatrix} \sum Y_i \\ \sum Y_i X_i \end{bmatrix} = \begin{bmatrix} N & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}   (90)

OLS estimators:

\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} N & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum Y_i \\ \sum Y_i X_i \end{bmatrix}; \qquad \hat{\beta} = \left(X'X\right)^{-1} X'Y   (91)
Analysis of Variance
var(Y_i) = \sum \left(Y_i - \bar{Y}\right)^2 = \sum \left(\hat{Y}_i - \bar{Y} + e_i\right)^2 = \sum \left(\hat{Y}_i - \bar{Y}\right)^2 + \sum e_i^2 + 2\sum \left(\hat{Y}_i - \bar{Y}\right) e_i

\sum \left(Y_i - \bar{Y}\right)^2 = \sum \left(\hat{Y}_i - \bar{Y}\right)^2 + \sum e_i^2   (92)

TSS = RSS + ESS   (93)

For N observations and K explanatory variables:

[Total variation] = [Explained variation] + [Residual variation]
df:      N-1            K-1                   N-K
Relation between Rsquare and Rbarsquare
Prove that the two forms \bar{R}^2 = 1 - \left(1 - R^2\right)\frac{N-1}{N-K} and \bar{R}^2 = R^2 \frac{N-1}{N-K} - \frac{K-1}{N-K} are equivalent.

Proof:

LHS = \bar{R}^2 = 1 - \left(1 - R^2\right)\frac{N-1}{N-K}
= R^2 + \left(1 - R^2\right) - \left(1 - R^2\right)\frac{N-1}{N-K}
= R^2 - \left(1 - R^2\right)\left[\frac{N-1}{N-K} - 1\right]
= R^2 - \left(1 - R^2\right)\left[\frac{N-1-N+K}{N-K}\right]
= R^2 - \left(1 - R^2\right)\frac{K-1}{N-K}
= R^2 + R^2\frac{K-1}{N-K} - \frac{K-1}{N-K}
= R^2\left[1 + \frac{K-1}{N-K}\right] - \frac{K-1}{N-K}
= R^2\left[\frac{N-K+K-1}{N-K}\right] - \frac{K-1}{N-K}
= R^2\frac{N-1}{N-K} - \frac{K-1}{N-K} = RHS; \quad QED   (94)
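A quick numerical check of the two equivalent forms in (94); the R^2, N and K values come from the worked example later in these notes and are used here only for illustration:

def rbar2_form1(r2, n, k):
    # 1 - (1 - R^2)(N - 1)/(N - K)
    return 1 - (1 - r2) * (n - 1) / (n - k)

def rbar2_form2(r2, n, k):
    # R^2 (N - 1)/(N - K) - (K - 1)/(N - K)
    return r2 * (n - 1) / (n - k) - (k - 1) / (n - k)

print(rbar2_form1(0.826, 6, 2), rbar2_form2(0.826, 6, 2))   # both approx 0.78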
Linearity of slope and intercept parameters (pages 42-46)
Consider a linear regression

Y_i = \beta_1 + \beta_2 X_i + e_i \qquad i = 1 \ldots N   (95)

e_i \sim N\left(0, \sigma^2\right)   (96)

The intercept and slope estimators are linear in the dependent variable:

\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \sum w_i y_i   (97)

where w_i = \frac{x_i}{\sum x_i^2} is a constant.

\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} = \frac{\sum Y_i}{N} - \bar{X} \sum w_i y_i   (98)

Thus \hat{\beta}_2 and \hat{\beta}_1 are linear in y_i.
Unbiasedness of Intercept Parameter

\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} = \frac{\sum Y_i}{N} - \bar{X}\sum w_i y_i; \qquad w_i = \frac{x_i}{\sum x_i^2}   (99)

E\left(\hat{\beta}_1\right) = E\left[\frac{\sum\left(\beta_1 + \beta_2 X_i + e_i\right)}{N}\right] - E\left(\hat{\beta}_2 \bar{X}\right)   (100)

E\left(\hat{\beta}_1\right) = E\left[\frac{N\beta_1}{N} + \frac{\beta_2 \sum X_i}{N} + \frac{\sum e_i}{N}\right] - E\left(\hat{\beta}_2 \bar{X}\right)   (101)

E\left(\hat{\beta}_1\right) = \beta_1 + \beta_2 \bar{X} - E\left(\hat{\beta}_2\right)\bar{X}   (102)

\left[E\left(\hat{\beta}_1\right) - \beta_1\right] = \bar{X}\left[\beta_2 - E\left(\hat{\beta}_2\right)\right]   (103)

Since E\left(\hat{\beta}_2\right) = \beta_2,

E\left(\hat{\beta}_1\right) = \beta_1   (104)
Unbiasedness of Slope Parameter

\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \sum w_i y_i; \qquad w_i = \frac{x_i}{\sum x_i^2}   (105)

E\left(\hat{\beta}_2\right) = E\left[\sum w_i y_i\right] = E\sum w_i \left(\beta_1 + \beta_2 X_i + e_i\right)   (106)

E\left(\hat{\beta}_2\right) = \beta_1 E\left[\sum w_i\right] + \beta_2 E\left[\sum w_i x_i\right] + E\left[\sum w_i e_i\right]   (107)

Since \sum w_i = 0, \sum w_i x_i = 1 and E\left[\sum w_i e_i\right] = 0,

E\left(\hat{\beta}_2\right) = \beta_2   (108)
Minimum Variance of Slope Parameter
\hat{\beta}_2 = \sum w_i y_i   (109)

Var\left(\hat{\beta}_2\right) = var\left[\frac{\sum\left(X_i - \bar{X}\right)y_i}{\sum\left(X_i - \bar{X}\right)^2}\right] = \frac{1}{\sum x_i^2}\,\sigma^2   (110)

Take b_2, any other linear and unbiased estimator. Then we need to prove that var(b_2) > var(\hat{\beta}_2).

b_2 = \sum k_i y_i; \qquad k_i = w_i + c_i   (111)

E(b_2) = E\left[\sum k_i \left(\beta_1 + \beta_2 X_i + e_i\right)\right]   (112)

= E\left[\sum w_i \left(\beta_1 + \beta_2 X_i + e_i\right) + \sum c_i \left(\beta_1 + \beta_2 X_i + e_i\right)\right]   (113)

E(b_2) = E\left[\beta_1\sum w_i + \beta_2\sum w_i x_i + \sum w_i e_i + \beta_1\sum c_i + \beta_2\sum c_i x_i + \sum c_i e_i\right]   (114)

With \sum c_i = 0 and \sum c_i x_i = 0 for unbiasedness,

E(b_2) = \beta_2   (115)
Minimum Variance of Slope Parameter (cont.)
E(b_2) = \beta_2   (116)

var(b_2) = E\left[b_2 - \beta_2\right]^2 = E\left[\sum k_i e_i\right]^2 = E\left[\sum (w_i + c_i) e_i\right]^2   (117)

var(b_2) = \frac{1}{\sum x_i^2}\,\sigma^2 + \sigma^2 \sum c_i^2 = var\left(\hat{\beta}_2\right) + \sigma^2 \sum c_i^2   (118)

var(b_2) > var\left(\hat{\beta}_2\right)   (119)

QED. Thus the OLS slope estimator is the best, linear and unbiased estimator (BLUE).
A similar proof can be applied for Var\left(\hat{\beta}_1\right).
Consistency of OLS Estimator: Large Sample or Asymptotic Property

Var\left(\hat{\beta}_2\right) = \frac{1}{\sum x_i^2}\,\sigma^2   (120)

\lim_{N \to \infty} Var\left(\hat{\beta}_2\right) = \lim_{N \to \infty} \frac{\sigma^2 / N}{\sum x_i^2 / N} = 0   (121)
Covariance between Slope and Intercept Parameters
cov\left(\hat{\beta}_1, \hat{\beta}_2\right) = E\left[\hat{\beta}_1 - E\left(\hat{\beta}_1\right)\right]\left[\hat{\beta}_2 - E\left(\hat{\beta}_2\right)\right] = E\left[\hat{\beta}_1 - \beta_1\right]\left[\hat{\beta}_2 - \beta_2\right] = -\bar{X}\,E\left[\hat{\beta}_2 - \beta_2\right]^2 \quad \text{using } \left[E\left(\hat{\beta}_1\right) - \beta_1\right] = \bar{X}\left[\beta_2 - E\left(\hat{\beta}_2\right)\right]   (122)

= -\bar{X}\,\frac{1}{\sum x_i^2}\,\sigma^2   (123)
Regression in matrix
Let Y be an N \times 1 vector of dependent variables, X an N \times K matrix of explanatory variables, and e an N \times 1 vector of independently and identically distributed normal random errors with mean zero and constant variance, e \sim N(0, \sigma^2 I); \beta is a K \times 1 vector of unknown coefficients.

Y = X\beta + e   (124)

The objective is to minimise the sum of squared errors:

\min_{\beta} S(\beta) = e'e = (Y - X\beta)'(Y - X\beta) = Y'Y - Y'X\beta - (X\beta)'Y + (X\beta)'(X\beta)   (125)

= Y'Y - 2\beta'X'Y + \beta'X'X\beta   (126)
First order condition in Matrix Method
\frac{\partial S(\beta)}{\partial \beta} = -2X'Y + 2X'X\hat{\beta} = 0   (127)

\Longrightarrow \hat{\beta} = \left(X'X\right)^{-1}X'Y   (128)
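The matrix OLS formula in (128) is a one-liner with NumPy (a sketch, not part of the slides); solving the normal equations is numerically preferable to forming the inverse explicitly:

import numpy as np

def ols(X, y):
    # beta_hat = (X'X)^{-1} X'y, equation (128)
    return np.linalg.solve(X.T @ X, X.T @ y)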
Blue Property in Matrix: Linearity and Unbiasedness

\hat{\beta} = \left(X'X\right)^{-1}X'Y   (129)

\hat{\beta} = aY; \qquad a = \left(X'X\right)^{-1}X'   (130)

Linearity proved.

E\left(\hat{\beta}\right) = E\left[\left(X'X\right)^{-1}X'\left(X\beta + e\right)\right]   (131)

E\left(\hat{\beta}\right) = E\left[\left(X'X\right)^{-1}X'X\beta\right] + E\left[\left(X'X\right)^{-1}X'e\right]   (132)

E\left(\hat{\beta}\right) = \beta + E\left[\left(X'X\right)^{-1}X'e\right]   (133)

E\left(\hat{\beta}\right) = \beta   (134)

Unbiasedness is proved.
Blue Property in Matrix: Minimum Variance

\hat{\beta} - E\left(\hat{\beta}\right) = \left(X'X\right)^{-1}X'e   (135)

E\left[\hat{\beta} - E\left(\hat{\beta}\right)\right]^2 = E\left[\left(X'X\right)^{-1}X'e\right]\left[\left(X'X\right)^{-1}X'e\right]'   (136)

= \left(X'X\right)^{-1}X'E\left[ee'\right]X\left(X'X\right)^{-1} = \sigma^2\left(X'X\right)^{-1}   (137)

Take an alternative estimator b:

b = \left[\left(X'X\right)^{-1}X' + c\right]Y   (138)

b = \left[\left(X'X\right)^{-1}X' + c\right]\left(X\beta + e\right)   (139)

b - \beta = \left(X'X\right)^{-1}X'e + ce \qquad \text{(using } cX = 0 \text{ for unbiasedness)}   (140)
Blue Property in Matrix: Minimum Variance
Now it needs to be shown that

cov(b) > cov\left(\hat{\beta}\right)   (141)

For the alternative estimator b:

b - \beta = \left(X'X\right)^{-1}X'e + ce   (142)

cov(b) = E\left[(b - \beta)(b - \beta)'\right] = E\left[\left(X'X\right)^{-1}X'e + ce\right]\left[\left(X'X\right)^{-1}X'e + ce\right]' = \sigma^2\left(X'X\right)^{-1} + \sigma^2 cc'   (143)

cov(b) > cov\left(\hat{\beta}\right)   (144)

Proved. Thus the OLS estimator is BLUE = Best, Linear, Unbiased Estimator.
What is the statistical inference?
Inference is a statement about a population based on sample information. Economic theory provides these relations. Statistical inference empirically tests their validity based on cross section, time series or panel data.

Hypotheses are set up according to economic theory, and estimates of parameters are obtained using OLS (or similar) estimators. Consider a linear regression

Y_i = \beta_1 + \beta_2 X_i + e_i \qquad i = 1 \ldots N   (145)

The true values of \beta_1 and \beta_2 are unknown; their values can be estimated using the OLS technique. \hat{\beta}_1 and \hat{\beta}_2 are such estimators of \beta_1 and \beta_2. The validity of these estimators/estimates is tested using statistical distributions. The two most important tests are:

1 Significance of an individual coefficient: t-test
2 Overall significance of the model: F-test

The overall fit of the data to the model is indicated by R^2. (\chi^2, Durbin-Watson and unit root tests follow later.)
Standard hypothesis about individual coefficients (t-test)

Null hypothesis: the values of the intercept and slope coefficients are zero.

H_0: \beta_1 = 0
H_0: \beta_2 = 0

Alternative hypotheses: the intercept and slope coefficients are non-zero.

H_A: \beta_1 \neq 0
H_A: \beta_2 \neq 0

Parameter \beta_2 is the slope, \frac{\partial Y}{\partial X}; it measures how much Y will change when X changes by one unit. Parameter \beta_1 is the intercept; it shows the amount of Y when X is zero.

Economic theory: a normal demand function should have \beta_1 > 0 and \beta_2 < 0; a normal supply function should have \beta_1 \neq 0 and \beta_2 > 0. This is the hypothesis to be tested empirically.
Standard hypothesis about the validity of the model
(F-test)
Null hypothesis: both the intercept and slope coefficients are zero; the model is meaningless and irrelevant:

H_0: \beta_1 = \beta_2 = 0

Alternative hypothesis: at least one of the parameters is non-zero; the model is relevant:

H_A: \text{either } \beta_1 \neq 0 \text{ or } \beta_2 \neq 0, \text{ or both } \beta_1 \neq 0 \text{ and } \beta_2 \neq 0

As is often seen, some of the coefficients in a regression may be insignificant while the F-statistic is significant and the model is valid.
An Example of regression on deviations from the mean
It is easy to work with a simple example.

Table: Data on Price and Quantity

X   1   2   3   4   5   6
Y   6   3   4   3   2   1

What are the estimates of \hat{\beta}_1 and \hat{\beta}_2?

Here \sum X_i = 21; \sum Y_i = 19; \sum Y_i X_i = 52; \sum X_i^2 = 91; \sum Y_i^2 = 75; \bar{Y} = 3.17; \bar{X} = 3.5.

OLS estimators:

\hat{\beta}_2 = \frac{\sum y_i x_i}{\sum x_i^2}; \qquad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}   (146)
Normal Equations and Deviation Form
Normal equations of the above regression:

\sum Y_i = \hat{\beta}_1 N + \hat{\beta}_2 \sum X_i   (147)

\sum Y_i X_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2   (148)

Define deviations as

x_i = \left(X_i - \bar{X}\right)   (149)

y_i = \left(Y_i - \bar{Y}\right)   (150)

\sum \left(X_i - \bar{X}\right) = 0; \qquad \sum \left(Y_i - \bar{Y}\right) = 0   (151)
Normal Equations and Deviation Form
Putting these into the normal equations:

\sum \left(Y_i - \bar{Y}\right) = \hat{\beta}_1 N + \hat{\beta}_2 \sum \left(X_i - \bar{X}\right)   (152)

\sum \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) = \hat{\beta}_1 \sum \left(X_i - \bar{X}\right) + \hat{\beta}_2 \sum \left(X_i - \bar{X}\right)^2   (153)

The terms \sum \left(X_i - \bar{X}\right) = 0 and \sum \left(Y_i - \bar{Y}\right) = 0 drop out, while \sum \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) = \sum x_i y_i and \sum \left(X_i - \bar{X}\right)^2 = \sum x_i^2.

This is a regression through the origin. Therefore the estimator of the slope coefficient in deviation form is

\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}   (154)

\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}   (155)

The reliability of \hat{\beta}_2 and \hat{\beta}_1 depends on their variances; the t-test is used to determine their significance.
Deviations from the mean
Useful short-cuts (though the matrix method is more accurate, quick short-cuts like these can be handy):

\sum x_i^2 = \sum \left(X_i - \bar{X}\right)^2 = \sum X_i^2 - N\bar{X}^2 = 91 - 6(3.5)^2 = 17.5   (156)

\sum y_i^2 = \sum \left(Y_i - \bar{Y}\right)^2 = \sum Y_i^2 - N\bar{Y}^2 = 75 - 6(3.17)^2 = 14.7   (157)

\sum y_i x_i = \sum \left(Y_i - \bar{Y}\right)\left(X_i - \bar{X}\right) = \sum Y_i X_i - \bar{Y}\sum X_i - \bar{X}\sum Y_i + N\bar{Y}\bar{X} = \sum Y_i X_i - \bar{Y}N\bar{X} - \bar{X}N\bar{Y} + N\bar{Y}\bar{X}
= \sum Y_i X_i - N\bar{X}\bar{Y} = 52 - 6(3.5)(3.17) = -14.57   (158)
OLS estimates by the deviation method
Estimate of the slope coefficient:

\hat{\beta}_2 = \frac{\sum y_i x_i}{\sum x_i^2} = \frac{-14.57}{17.5} = -0.833   (159)

This is negative, as expected.

Estimate of the intercept coefficient:

\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} = 3.17 - (-0.833)(3.5) = 6.09   (160)

It is positive, as expected.

Thus the regression line fitted from the data is:

\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i = 6.09 - 0.833 X_i   (161)

How reliable is this line? The answer should be based on the analysis of variance and statistical tests.
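The worked example can be reproduced with a few NumPy lines (not in the slides); the slides round the means, so they report -0.833 and 6.09 rather than the slightly different unrounded values:

import numpy as np

X = np.array([1., 2., 3., 4., 5., 6.])
Y = np.array([6., 3., 4., 3., 2., 1.])
x, y = X - X.mean(), Y - Y.mean()
b2 = (y * x).sum() / (x**2).sum()    # about -0.83, equation (159)
b1 = Y.mean() - b2 * X.mean()        # about  6.07, equation (160)
print(b1, b2)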
Variation of Y, Predicted Y and error
Total variation to be explained:

\sum y_i^2 = \sum \left(Y_i - \bar{Y}\right)^2 = \sum Y_i^2 - N\bar{Y}^2 = 75 - 6(3.17)^2 = 14.707   (162)

Variation explained by the regression:

\sum \hat{y}_i^2 = \sum \left(\hat{\beta}_2 x_i\right)^2 = \hat{\beta}_2^2 \sum x_i^2 = \left(\frac{\sum y_i x_i}{\sum x_i^2}\right)^2 \sum x_i^2 = \frac{\left(\sum y_i x_i\right)^2}{\sum x_i^2} = \frac{(-14.57)^2}{17.5} = \frac{212.28}{17.5} = 12.143   (163)

Note that in deviation form \hat{y}_i = \hat{\beta}_2 x_i.

Unexplained variation (accounted for by various errors):

\sum e_i^2 = \sum y_i^2 - \sum \hat{y}_i^2 = 14.707 - 12.143 = 2.564   (164)
Measure of Fit: Rsquare and Rbar-square
The measure of fit R^2 is the ratio of the variation explained by the regression \left(\sum \hat{y}_i^2\right) to the total variation that needs to be explained \left(\sum y_i^2\right):

R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = \frac{12.143}{14.707} = 0.826   (165)

This regression model explains about 83 percent of the variation in Y.

\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{N-1}{N-K} = 1 - (1 - 0.826)\frac{5}{4} = 0.78   (166)

The variance of the error indicates the unexplained variation:

var(e_i) = \hat{\sigma}^2 = \frac{\sum e_i^2}{N-K} = \frac{2.564}{4} = 0.641   (167)

var(y_i) = \frac{\sum y_i^2}{N-1} = \frac{14.7}{5} = 2.94   (168)
Variance of Parameters
The reliability of the estimated parameters depends on their variances, standard errors and t-values.

var\left(\hat{\beta}_2\right) = \frac{1}{\sum x_i^2}\,\hat{\sigma}^2 = \frac{0.641}{17.5} = 0.037   (169)

var\left(\hat{\beta}_1\right) = \left[\frac{1}{N} + \frac{\bar{X}^2}{\sum x_i^2}\right]\hat{\sigma}^2 = \left[\frac{1}{6} + \frac{3.5^2}{17.5}\right] 0.641 = (0.867)(0.641) = 0.556   (170)

Prove these formulae (see later on).

Standard errors:

SE\left(\hat{\beta}_2\right) = \sqrt{var\left(\hat{\beta}_2\right)} = \sqrt{0.037} = 0.192   (171)

SE\left(\hat{\beta}_1\right) = \sqrt{var\left(\hat{\beta}_1\right)} = \sqrt{0.556} = 0.746   (172)
T-test
The theoretical value of the t distribution is derived by dividing a mean by its standard error; the mean is a normally distributed variable and the standard error is based on a \chi^2 distribution. The t-distribution was originally established by W. S. Gosset of the Guinness Brewery in 1908.

One- and Two-Tailed Tests

If the area in only one tail of a curve is used in testing a statistical hypothesis, the test is called a one-tailed test; if the areas of both tails are used, the test is called two-tailed. The decision as to whether a one-tailed or a two-tailed test is to be used depends on the alternative hypothesis.

One-tailed test: X \geq z_{\alpha}
Two-tailed test: -z_{\alpha/2} \leq X \leq z_{\alpha/2}
Test of significance of parameters (t-test)

t\left(\hat{\beta}_2\right) = \frac{\hat{\beta}_2 - \beta_2}{SE\left(\hat{\beta}_2\right)} = \frac{\hat{\beta}_2 - 0}{SE\left(\hat{\beta}_2\right)} = \frac{-0.833}{0.192} = -4.339   (173)

t\left(\hat{\beta}_1\right) = \frac{\hat{\beta}_1}{SE\left(\hat{\beta}_1\right)} = \frac{6.09}{0.746} = 8.16   (174)

These calculated t-values need to be compared with t-values from the theoretical t-table.

Decision rule (one-tail test following economic theory):

Accept H_0: \beta_1 > 0 if t\left(\hat{\beta}_1\right) < t_{\alpha, df}; reject H_0: \beta_1 > 0 (accept H_A: \beta_1 \leq 0) if t\left(\hat{\beta}_1\right) > t_{\alpha, df}.

Accept H_0: \beta_2 < 0 if \left|t\left(\hat{\beta}_2\right)\right| < t_{\alpha, df}; reject H_0: \beta_2 < 0 (accept H_A: \beta_2 \geq 0) if \left|t\left(\hat{\beta}_2\right)\right| > t_{\alpha, df}.

P-value: the probability of the test statistic exceeding the table value.
Test of significance of parameters (t-test)

Theoretical values of t are given in a t-table. The columns of the t-table give the level of significance (\alpha) and the rows give the degrees of freedom.

Here t_{\alpha, df} is the t-table value for degrees of freedom df = n - k and level of significance \alpha; df = 6 - 2 = 4.

Table: Relevant t-values (one tail) from the t-table

(df, \alpha)   0.05     0.025    0.005
1              6.314    12.706   63.657
2              2.920    4.303    9.925
4              2.132    2.776    4.604

t\left(\hat{\beta}_1\right) = 8.16 > t_{\alpha, df} = t_{0.05, 4} = 2.132, so the intercept is statistically significant; \left|t\left(\hat{\beta}_2\right)\right| = 4.339 > t_{\alpha, df} = t_{0.05, 4} = 2.132, so the slope is also statistically significant at the 5% and 2.5% levels of significance.
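The table values can be reproduced numerically (an illustration, not part of the slides), assuming SciPy is available; scipy.stats.t.ppf returns one-tail critical values:

from scipy import stats

print(stats.t.ppf(0.95, df=4))      # about 2.132, the 5% one-tail value used above
print(stats.t.ppf(0.975, df=4))     # about 2.776, the 2.5% one-tail value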
Confidence interval on the slope parameter

A researcher may be more interested in knowing the interval in which the true parameter may lie than in the point estimate, where \alpha is the level of significance or the probability of error, such as 1% or 5%. That means the accuracy of the estimate is (1 - \alpha)%.

A 95% confidence interval for \beta_2 is:

P\left[\hat{\beta}_2 - SE\left(\hat{\beta}_2\right)t_{\alpha, n} < \beta_2 < \hat{\beta}_2 + SE\left(\hat{\beta}_2\right)t_{\alpha, n}\right] = (1 - \alpha)   (175)

P\left[-0.833 - 0.192(2.132) < \beta_2 < -0.833 + 0.192(2.132)\right] = (1 - 0.05) = 0.95   (176)

P\left[-1.242 < \beta_2 < -0.424\right] = 0.95   (177)

There is 95% confidence that the true value of the slope \beta_2 lies between -1.242 and -0.424.
Confidence interval on the intercept parameter

The 95% confidence interval for the intercept parameter:

P\left[\hat{\beta}_1 - SE\left(\hat{\beta}_1\right)t_{\alpha, n} < \beta_1 < \hat{\beta}_1 + SE\left(\hat{\beta}_1\right)t_{\alpha, n}\right] = (1 - \alpha)   (178)

P\left[6.09 - 0.746(2.132) < \beta_1 < 6.09 + 0.746(2.132)\right] = (1 - 0.05) = 0.95   (179)

P\left[4.500 < \beta_1 < 7.680\right] = 0.95   (180)

There is 95% confidence that the true value of the intercept \beta_1 lies between 4.500 and 7.680.
F-Test
The F-value is the ratio of sums of squared normally distributed variables (\chi^2), each adjusted for its degrees of freedom:

F = \frac{V_1 / n_1}{V_2 / n_2} = F(n_1, n_2)   (181)

where V_1 and V_2 are the variances of the numerator and denominator, and n_1 and n_2 are the degrees of freedom of the numerator and denominator.

H_0: the variances are the same; H_A: the variances are different. F_{crit} values are obtained from the F-distribution table. Accept H_0 if F_{calc} < F_{crit} and reject it if F_{calc} > F_{crit}.
F is the ratio of two \chi^2-distributed variables with degrees of freedom n_1 and n_2:

F_{calc} = \frac{\sum \hat{y}_i^2 / (K-1)}{\sum e_i^2 / (N-K)} = \frac{12.143 / 1}{2.564 / 4} = \frac{12.143}{0.641} = 18.94; \qquad n_1 = K - 1 \text{ and } n_2 = N - K   (182)

Table: Relevant F-values from the F-table

              1% level of significance      5% level of significance
(n_2, n_1)    1        2        3           1       2       3
1             4052     4999.5   5403        161.4   199.5   215.7
2             98.50    99.00    99.17       18.51   19.00   19.16
4             21.20    18.00    16.69       7.71    6.94    6.59

n_1 = degrees of freedom of the numerator; n_2 = degrees of freedom of the denominator. For the 5% level of significance F_{n_1, n_2} = F_{1,4} = 7.71 and F_{calc} > F_{1,4}; for the 1% level of significance F_{n_1, n_2} = F_{1,4} = 21.20 and F_{calc} < F_{1,4}. This implies that the model is statistically significant at the 5% level but not at the 1% level of significance; the model is meaningful only at the 5% level of significance.
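A short numerical version of equation (182) and the table look-up (not in the slides), assuming SciPy is available:

from scipy import stats

rss, ess, K, N = 12.143, 2.564, 2, 6
f_calc = (rss / (K - 1)) / (ess / (N - K))          # about 18.94
print(f_calc)
print(stats.f.ppf(0.95, 1, 4))                      # about 7.71, 5% critical value
print(stats.f.ppf(0.99, 1, 4))                      # about 21.20, 1% critical value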
Prediction and error of prediction
What is the prediction of Y when X is 0.5?

\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i = 6.09 - 0.833(0.5) = 5.673   (183)

Prediction error:

f = Y_0 - \hat{Y}_0 = \beta_1 + \beta_2 X_0 + e_0 - \hat{\beta}_1 - \hat{\beta}_2 X_0   (184)

Mean of the prediction error:

E(f) = E\left[\beta_1 + \beta_2 X_0 + e_0 - \hat{\beta}_1 - \hat{\beta}_2 X_0\right] = 0   (185)

The predictor is unbiased.
t-test for variance of forecast
t_f = \frac{Y_0 - \hat{Y}_0}{SE(f)} \sim t_{N-2}   (186)

Standard error of the forecast (find var(f)):

SE(f) = \sqrt{var(f)}   (187)

Confidence interval of the forecast:

\Pr\left[-t_c \leq \frac{Y_0 - \hat{Y}_0}{SE(f)} \leq t_c\right] = (1 - \alpha)   (188)

\Pr\left[\hat{Y}_0 - t_c\, SE(f) \leq Y_0 \leq \hat{Y}_0 + t_c\, SE(f)\right] = (1 - \alpha)   (189)
Variance of Y and error
E(e_i)^2 = \frac{\sum e_i^2}{N - k} = \hat{\sigma}^2   (190)

where N is the number of observations and k is the number of parameters including the intercept.

var(Y_i) = E\left[Y_i - E(Y_i)\right]^2 = E\left[\beta_1 + \beta_2 X_i + e_i - \beta_1 - \beta_2 X_i\right]^2 = E\left[e_i\right]^2 = \sigma^2   (191)
Variance of Slope Parameter

\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}   (192)

\hat{\beta}_2 = \sum w_i y_i   (193)

where

w_i = \frac{x_i}{\sum x_i^2} = \frac{\left(X_i - \bar{X}\right)}{\sum \left(X_i - \bar{X}\right)^2}   (194)

Var\left(\hat{\beta}_2\right) = var\left[\frac{\sum \left(X_i - \bar{X}\right) y_i}{\sum \left(X_i - \bar{X}\right)^2}\right] = \sum w_i^2\, var(y_i) = \frac{1}{\sum x_i^2}\,\sigma^2   (195)
Variance of Intercept Parameter

\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}   (196)

var\left(\hat{\beta}_1\right) = var\left(\bar{Y} - \hat{\beta}_2 \bar{X}\right) = E\left[\frac{\sum y_i}{N} - \bar{X}\frac{\sum x_i y_i}{\sum x_i^2}\right]^2 = E\left[\sum \left(\frac{1}{N} - \bar{X}\frac{x_i}{\sum x_i^2}\right) y_i\right]^2

= \left[\frac{N}{N^2} + \bar{X}^2 \sum w_i^2 - 2\frac{1}{N}\bar{X}\sum w_i\right]\sigma^2   (197)

= \left[\frac{1}{N} + \frac{\bar{X}^2}{\sum x_i^2}\right]\sigma^2   (198)

The term 2\frac{1}{N}\bar{X}\sum w_i = 0 because \sum w_i = 0.
Covariance of Parameters (with Matrix)
b = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \left(X'X\right)^{-1}X'Y = \left(X'X\right)^{-1}X'\left(X\beta + e\right) = \beta + \left(X'X\right)^{-1}X'e   (199)

b - \beta = \left(X'X\right)^{-1}X'e   (200)

cov(b - \beta) = E\left[\left(X'X\right)^{-1}X'ee'X\left(X'X\right)^{-1}\right] = \left(X'X\right)^{-1}\sigma^2   (201)

\left(X'X\right)^{-1} = \begin{bmatrix} N & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}^{-1}   (202)

cov\left(\hat{\beta}\right) = \left(X'X\right)^{-1}\sigma^2 = \frac{1}{N\sum X_i^2 - \left(\sum X_i\right)^2}\begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & N \end{bmatrix}\sigma^2   (203)
Covariance of Parameters (with Matrix)
\left(X'X\right)^{-1} = \begin{bmatrix} N & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}^{-1}   (204)

cov(b - \beta) = \begin{bmatrix} var(b_1) & cov(b_1, b_2) \\ cov(b_1, b_2) & var(b_2) \end{bmatrix} = \sigma^2 \begin{bmatrix} \frac{\sum X_i^2}{N\sum \left(X_i - \bar{X}\right)^2} & \frac{-\bar{X}}{\sum \left(X_i - \bar{X}\right)^2} \\ \frac{-\bar{X}}{\sum \left(X_i - \bar{X}\right)^2} & \frac{1}{\sum \left(X_i - \bar{X}\right)^2} \end{bmatrix}   (205)
Variance of Prediction error
var(f) = \left[1 + \frac{1}{N} + \frac{\left(X_0 - \bar{X}\right)^2}{\sum \left(X_i - \bar{X}\right)^2}\right]\sigma^2   (206)

Proof:

Y_0 = \hat{Y}_0 + e_0   (207)

var(Y_0) = var\left(\hat{Y}_0\right) + var(e_0)   (208)

var\left(\hat{Y}_0\right) = var\left(\hat{\beta}_1 + \hat{\beta}_2 X_0\right) = var\left(\hat{\beta}_1\right) + X_0^2\, var\left(\hat{\beta}_2\right) + 2X_0\, cov\left(\hat{\beta}_1, \hat{\beta}_2\right)   (209)
Variance of Prediction
var\left(\hat{Y}_0\right) = \frac{\sum X_i^2}{N\sum \left(X_i - \bar{X}\right)^2}\sigma^2 + X_0^2\frac{1}{\sum \left(X_i - \bar{X}\right)^2}\sigma^2 + 2X_0\left[\frac{-\bar{X}}{\sum \left(X_i - \bar{X}\right)^2}\right]\sigma^2   (210)

Add and subtract \frac{N\bar{X}^2}{N\sum \left(X_i - \bar{X}\right)^2}\sigma^2:

var\left(\hat{Y}_0\right) = \frac{\sum X_i^2 - N\bar{X}^2}{N\sum \left(X_i - \bar{X}\right)^2}\sigma^2 + \frac{X_0^2 - 2X_0\bar{X} + \bar{X}^2}{\sum \left(X_i - \bar{X}\right)^2}\sigma^2   (211)
Variance of forecast
Taking common elements out:

var\left(\hat{Y}_0\right) = \sigma^2\left[\frac{\sum \left(X_i - \bar{X}\right)^2}{N\sum \left(X_i - \bar{X}\right)^2} + \frac{X_0^2 - 2X_0\bar{X} + \bar{X}^2}{\sum \left(X_i - \bar{X}\right)^2}\right]   (212)

var\left(\hat{Y}_0\right) = \sigma^2\left[\frac{\sum \left(X_i - \bar{X}\right)^2}{N\sum \left(X_i - \bar{X}\right)^2} + \frac{\left(X_0 - \bar{X}\right)^2}{\sum \left(X_i - \bar{X}\right)^2}\right]   (213)
Variance of forecast
var\left(\hat{Y}_0\right) = \sigma^2\left[\frac{1}{N} + \frac{\left(X_0 - \bar{X}\right)^2}{\sum \left(X_i - \bar{X}\right)^2}\right]   (214)

var(f) = var\left(\hat{Y}_0\right) + var(e_0)   (215)

var(f) = \sigma^2\left[\frac{1}{N} + \frac{\left(X_0 - \bar{X}\right)^2}{\sum \left(X_i - \bar{X}\right)^2}\right] + \sigma^2   (216)

var(f) = \sigma^2\left[1 + \frac{1}{N} + \frac{\left(X_0 - \bar{X}\right)^2}{\sum \left(X_i - \bar{X}\right)^2}\right]   (217)
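A sketch of a 95% forecast interval for the earlier worked example, combining equation (217) with the t value t(0.025, 4) = 2.776 (not part of the slides):

import numpy as np

X = np.array([1., 2., 3., 4., 5., 6.])
sigma2, X0, N = 0.641, 0.5, 6
var_f = sigma2 * (1 + 1/N + (X0 - X.mean())**2 / ((X - X.mean())**2).sum())   # equation (217)
se_f = np.sqrt(var_f)
y0_hat = 6.09 - 0.833 * X0
print(y0_hat - 2.776 * se_f, y0_hat + 2.776 * se_f)    # approximate 95% forecast interval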
Estimation and Inference
Type I and type II errors

Elaborate on the following with relevant diagrams:

                H_0 True          H_0 False
Accept H_0      Correct           Type II error
Reject H_0      Type I error      Correct
Distributions: Normal, t, F and chi-square

Normal distribution:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2}\right]   (218)

Lognormal distribution:

f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\frac{(\ln x - \mu)^2}{\sigma^2}\right]   (219)

Standard normal:

e \sim N(0, 1)   (220)

Any distribution can be converted to the standard normal distribution by normalization.
Distributions: Normal, t, F and chi-square

Chi-square: the sum of squares of standard normal variables,

Z = \sum_{i=1}^{k} Z_i^2   (221)

with k degrees of freedom.

t distribution: the ratio of a standard normal to the square root of a chi-square divided by its degrees of freedom,

t = \frac{Z_1}{\sqrt{Z_2 / k}}   (222)

F distribution: the ratio of two chi-square distributed variables with degrees of freedom k_1 and k_2,

F = \frac{Z_1 / k_1}{Z_2 / k_2}   (223)
Large Sample Theory
Probability limit:

p\lim\left(\hat{\beta}\right) = \beta   (224)

Central limit theorem: a standardised sample mean converges to the standard normal distribution,

\frac{\bar{Y} - \mu}{\sigma / \sqrt{T}} \to N(0, 1)   (225)

Convergence in the limit:

\lim_{T \to \infty} \Pr\left[\left|\hat{\beta} - \beta\right| \leq \delta\right] = 1 \Longrightarrow p\lim\left(\hat{\beta}\right) = \beta   (226)

The t-distribution is more accurate for finite samples, but the normal distribution asymptotically approximates other distributions according to the central limit theorem.
Large Sample Theory
The probability limit of a sum of two random variables is the sum of their probability limits.
The probability limit of a product of two random variables is the product of their probability limits.
The probability limit of a function is the function of the probability limit (Slutsky theorem).
Multiple Regression Model in Matrix
Consider a linear regression

Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + \ldots + \beta_k X_{k,i} + e_i \qquad i = 1 \ldots N   (227)

and assumptions

E(e_i) = 0   (228)

E(e_i x_{j,i}) = 0; \quad var(e_i) = \sigma^2 \text{ for all } i; \quad e_i \sim N\left(0, \sigma^2\right)   (229)

covar(e_i e_j) = 0   (230)

Explanatory variables are uncorrelated:

E(X_{1,i} X_{1,j}) = 0   (231)

The objective is to choose parameters that minimise the sum of squared errors:

\min_{\hat{\beta}_0 \ldots \hat{\beta}_k} S = \sum e_i^2 = \sum \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1,i} - \hat{\beta}_2 X_{2,i} - \ldots - \hat{\beta}_k X_{k,i}\right)^2   (232)
Derivation of Normal Equations
;
S

0
= 0;
S

1
= 0;
S

2
= 0;
S

3
= 0; ......
S

k
= 0 (233)
Normal equations for two explanatory variable case

Y
i
=

0
N +

X
1,i
+

X
2,i
(234)

X
1,i
Y
i
=

X
1,i
+

X
2
1,i
+

X
1,i
X
2,i
(235)

X
2,i
Y
i
=

X
2,i
+

X
1,i
X
2,i
+

X
2
2,i
(236)
_
_

Y
i

X
1,i
Y
i

X
2,i
Y
i
_
_
=
_
_
N

X
1,i

X
2,i

X
1,i

X
2
1,i

X
1,i
X
2,i

X
2,i

X
1,i
X
2,i

X
2
2,i
_
_
_

2
_

_
(237)
Matrix must be non-singular
_
X
/
X
_
1
,= 0 (238)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 82 / 243
Normal equations in matrix form
_

2
_

_
=
_
_
N

X
1,i

X
2,i

X
1,i

X
2
1,i

X
1,i
X
2,i

X
2,i

X
1,i
X
2,i

X
2
2,i
_
_
1
_
_

Y
i

Y
i
X
1,i

Y
i
X
2,i
_
_
(239)
=
_
X
/
X
_
1
X
/
Y (240)

0
=

Y
i

X
1,i

X
2,i

Y
i
X
1,i

X
2
1,i

X
1,i
X
2,i

Y
i
X
2,i

X
1,i
X
2,i

X
2
2,i

N

X
1,i

X
2,i

X
1,i

X
2
1,i

X
1,i
X
2,i

X
2,i

X
1,i
X
2,i

X
2
2,i

(241)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 83 / 243
Use Cramer Rule to solve for paramers

1
=

N

Y
i

X
2,i

X
1,i

Y
i
X
1,i

X
1,i
X
2,i

X
2,i

Y
i
X
2,i

X
2
2,i

N

X
1,i

X
2,i

X
1,i

X
2
1,i

X
1,i
X
2,i

X
2,i

X
1,i
X
2,i

X
2
2,i

(242)

2
=

N

X
1,i

Y
i

X
1,i

X
2
1,i

Y
i
X
1,i

X
2,i

X
1,i
X
2,i

Y
i
X
2,i

N

X
1,i

X
2,i

X
1,i

X
2
1,i

X
1,i
X
2,i

X
2,i

X
1,i
X
2,i

X
2
2,i

(243)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 84 / 243
Covariance of Parameters
cov
_

_
=
_
X
/
X
_
1

2
(244)
cov
_

_
=
_
_
_
var (

1
) cov(

2
) cov(

3
)
cov(

2
) var (

2
) cov(

3
)
cov(

3
) cov(

3
) var (

3
)
_
_
_
(245)
cov
_

_
=
_
_
N

X
1,i

X
2,i

X
1,i

X
2
1,i

X
1,i
X
2,i

X
2,i

X
1,i
X
2,i

X
2
2,i
_
_
1

2
(246)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 85 / 243
Determinant and cofactor matrix required for inverse
[X
/
X[ =
N

X
2
1,i

X
2
2,i
+

X
1,i

X
1,i
X
2,i

X
2,i
+

X
2,i

X
1,i
X
2,i

X
1,i

X
2,i

X
2,i

X
2
1,i
N

X
1,i
X
2,i

X
1,i
X
2,i

X
2
2,i

X
1,i

X
1,i
Adj (X/X) = C
/
C =
_

X
2
1,i
X
1,i
X
2,i
X
1,i
X
2,i
X
2
2,i

X
1,i
X
1,i
X
2,i
X
2,i
X
2
2,i

X
1,i
X
2
1,i
X
2,i
X
1,i
X
2,i

X
1,i
X
2,i
X
1,i
X
2,i
X
2
2,i

N X
2,i
X
2,i
X
2
2,i

N X
1,i
X
2,i
X
1,i
X
2,i

X
1,i
X
2,i
X
2
1,i
X
1,i
X
2,i

N X
2,i
X
1,i
X
1,i
X
2,i

N X
1,i
X
1,i
X
2
1,i

_
(247)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 86 / 243
Variance of parameters
var
_

0
_
=


X
2
1,i

X
1,i
X
2,i

X
1,i
X
2,i

X
2
2,i

[X
/
X[

2
(248)
var
_

1
_
=

N

X
2,i

X
2,i

X
2
2,i

[X
/
X[

2
(249)
var
_

2
_
=

N

X
1,i

X
1,i

X
2
1,i

[X
/
X[

2
(250)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 87 / 243
Standard Error and t-values
SE
_

0
_
=
_
var
_

0
_
; t
_

0
_
=

0

0
SE
_

0
_
(251)
SE
_

1
_
=
_
var
_

1
_
; t
_

1
_
=

1

1
SE
_

1
_
(252)
SE
_

2
_
=
_
var
_

2
_
; t
_

2
_
=

2

2
SE
_

2
_
(253)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 88 / 243
Analysis of Variance

y
2
i
= Y
/
Y NY
2
(254)

y
2
=

x
1
y +

x
2
y =

/
x
/
y (255)

e
2
i
=

y
2
i

y
2
i
(256)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 89 / 243
Rsquare and F Statistics
R
2
=

/
x
/
y
Y
/
Y
(257)
F
calc
=

y
2
i
K1
e
/
e
NK
=

y
2
i
K 1
N K
e
/
e
(258)
F
calc
=

y
2
i

y
2
i
(K 1)
(N K)

y
2
i
e
/
e
=
R
2
K 1
N K
(1 R
2
)
(259)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 90 / 243
Numerical Example: Does the level of unemployment depend on the claimant count, strikes and work hours?

How does the level of unemployment (Y_i) relate to the level of claimant counts (X_{1,i}), the number of stoppages (X_{2,i}) due to industrial strikes, and the number of work hours (X_{3,i}) in the UK? Data are from the Labour Force Survey for 19 years; N = 19.

Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + e_i \qquad i = 1 \ldots N   (260)

X'X = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} & \sum X_{3,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{2,i} & \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{3,i} & \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{bmatrix}
= \begin{bmatrix} 19 & 29057 & 4109 & 16904.6 \\ 29057 & 53709128.8 & 6872065.8 & 25461639.46 \\ 4109 & 6872065.8 & 1132419 & 3638145 \\ 16904.6 & 25461639.46 & 3638145 & 15059252.96 \end{bmatrix}
Numerical Example: OLS Setup
X'Y = \begin{bmatrix} \sum Y_i \\ \sum X_{1,i}Y_i \\ \sum X_{2,i}Y_i \\ \sum X_{3,i}Y_i \end{bmatrix} = \begin{bmatrix} 37326 \\ 63415261.4 \\ 8476146 \\ 32958009 \end{bmatrix}

\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 19 & 29057 & 4109 & 16904.6 \\ 29057 & 53709128.8 & 6872065.8 & 25461639.46 \\ 4109 & 6872065.8 & 1132419 & 3638145 \\ 16904.6 & 25461639.46 & 3638145 & 15059252.96 \end{bmatrix}^{-1} \begin{bmatrix} 37326 \\ 63415261.4 \\ 8476146 \\ 32958009 \end{bmatrix}
Numerical Example: Estimates of parameters, their
standard errors and t-values
\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 402.4319485 & 0.018888662 & 0.013959234 & 0.423181717 \\ 0.018888662 & 1.00E-06 & 9.85E-07 & 1.97E-05 \\ 0.013959234 & 9.85E-07 & 5.37E-06 & 1.53E-05 \\ 0.423181717 & 1.97E-05 & 1.53E-05 & 0.000445415 \end{bmatrix} \begin{bmatrix} 37326 \\ 63415261.4 \\ 8476146 \\ 32958009 \end{bmatrix}

\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 5560.880967 \\ 0.98458293 \\ 0.223288328 \\ 6.820108368 \end{bmatrix}; \quad
\begin{bmatrix} SE(\hat{\beta}_0) \\ SE(\hat{\beta}_1) \\ SE(\hat{\beta}_2) \\ SE(\hat{\beta}_3) \end{bmatrix} = \begin{bmatrix} 871.2384101 \\ 0.04348893 \\ 0.100627026 \\ 0.916586018 \end{bmatrix}; \quad
\begin{bmatrix} t(\hat{\beta}_0) \\ t(\hat{\beta}_1) \\ t(\hat{\beta}_2) \\ t(\hat{\beta}_3) \end{bmatrix} = \begin{bmatrix} 6.382731641 \\ 22.63985164 \\ 2.218969753 \\ 7.440772858 \end{bmatrix}

t-test

Hypotheses: H_0: \beta_i = 0 against H_A: \beta_i \neq 0.

The critical value of t for 15 degrees of freedom at the 5% level of significance is 2.13. Each of the computed t-values above is greater than the table value; therefore there is enough statistical evidence to reject the null hypothesis. All four parameters are statistically significant.
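The coefficient estimates can be recovered directly from the X'X and X'Y blocks shown above; a sketch (not from the slides), assuming NumPy:

import numpy as np

XtX = np.array([[19,      29057,       4109,      16904.6],
                [29057,   53709128.8,  6872065.8, 25461639.46],
                [4109,    6872065.8,   1132419,   3638145],
                [16904.6, 25461639.46, 3638145,   15059252.96]])
XtY = np.array([37326, 63415261.4, 8476146, 32958009])
beta_hat = np.linalg.solve(XtX, XtY)     # OLS estimates for the unemployment example
print(beta_hat)                          # compare with the beta-hat column reported above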
Numerical Example: Sum of Squared Errors and Covariance of Beta

var(e) = E(e_i)^2 = \frac{\sum e_i^2}{N - k} = \frac{28292.59842}{19 - 4} = 1886.173228 = \hat{\sigma}^2   (261)

cov\left(\hat{\beta}\right) = \begin{bmatrix} 402.4319485 & 0.018888662 & 0.013959234 & 0.423181717 \\ 0.018888662 & 1.00E-06 & 9.85E-07 & 1.97E-05 \\ 0.013959234 & 9.85E-07 & 5.37E-06 & 1.53E-05 \\ 0.423181717 & 1.97E-05 & 1.53E-05 & 0.000445415 \end{bmatrix} (1886.173228)   (262)

= \begin{bmatrix} 759056.3673 & 35.62728928 & 26.32953287 & 798.1940248 \\ 35.62728928 & 0.001891287 & 0.001858782 & 0.037244366 \\ 26.32953287 & 0.001858782 & 0.010125798 & 0.028859446 \\ 798.1940248 & 0.037244366 & 0.028859446 & 0.840129929 \end{bmatrix}
Rsquare and Adjusted Rsquare
R^2 = \frac{4428800.138}{4457092.737} = 0.99365223   (263)

\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{N-1}{N-K} = 0.992382676   (264)
Analysis of Variance
Source of variance                   Sum            Degrees of freedom   Mean           F-value
Total sum of squares (TSS)           4457092.737    18                   247616.2632
Regression sum of squares (RSS)      4428800.138    3                    1476266.713    782.6782243
Sum of squared errors                28292.59842    15                   1886.173228

Hypothesis: H_0: \beta_0 = \beta_1 = \beta_2 = \beta_3 = 0 (the model is meaningless) against H_A: at least one \beta_i \neq 0 (the model explains something).

The critical value of F for degrees of freedom (3, 15) at the 5 percent level of significance is 3.29. The calculated F-statistic is much higher than the critical value, so there is statistical evidence to reject the null hypothesis; in general this model is statistically significant.
Normal equations for K variables

Y
i
=

0
N +

X
1,i
+

X
2,i
+

X
3,i
+ .. +

X
k,i
(265)

Y
i
X
1,i
=

X
1,i
+

X
2
1,i
+

X
1,i
X
2,i
+ . +

X
1,i
X
k,i
(266)
...............................................................................

Y
i
X
k,i
=

X
k,i
+

X
1,i
X
k,i
+

X
k,i
X
2,i
+ . +

X
2
k,i
(267)
Process is similar to the three variable model - except that this
general model will have more coecients to evaluate and test and
requires data on more variables.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 97 / 243
Regression in Matrix (pages 42-46)
Let Y is N 1 vector of dependent variables X is N K matrix of
explanatory variables
Errors e is N 1 vector of independently and identically distributed
normal random variable with mean equal to zero and a constant variance
e~N(0,
2
I ); is a K 1 vector of unknown coecients
Y = X +e (268)
Objective is to minimise sum square errors
Min

S
_

_
= e
/
e =
_
Y

X
_
/
_
Y

X
_
= Y
/
Y Y
/
_

X
_

X
_
/
Y +
_

X
_
/
_

X
_
(269)
= Y
/
Y 2

X
/
Y +
_

X
_
/
_

X
_
(270)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 98 / 243
First order condition in Matrix Method
S ()

= 2X
/
Y + 2

X
/
X = 0 (271)
==

=
_
X
/
X
_
1
X
/
Y (272)
e =
_
Y

X
_
(273)
Estimate of variance of errors

2
=

e
2
i
N k
=
e
/
e
N k
(274)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 99 / 243
Derivation of Parameters (with Matrix Inverse)
For two variable, Y
i
=
1
+
2
X
i
+
i
i = 1 ...N, case
_
X
/
X
_
1
=
_
N

X
i

X
i

X
2
i
_
1
=
1
N

X
2
i
(

X
i
)
2
_

X
2
i

X
i

X
i
N
_
(275)
_
X
/
X
_
1
=
_
_

X
2
i
N

X
2
i
(
X
i
)
2


X
i
N

X
2
i
(
X
i
)
2


X
i
N

X
2
i
(
X
i
)
2
N
N

X
2
i
(
X
i
)
2
_
_
(276)
_

2
_
=
_
_

X
2
i
N

X
2
i
(
X
i
)
2


X
i
N

X
2
i
(
X
i
)
2


X
i
N

X
2
i
(
X
i
)
2
N
N

X
2
i
(
X
i
)
2
_
_
_

Y
i

X
/
i
Y
i
_
(277)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 100 / 243
Derivation of Parameters (with Matrix Inverse)
_

2
_
=
_
_

X
2
i

Y
i

X
i

X
/
i
Y
i
N

X
2
i
(
X
i
)
2
N

X
/
i
Y
i

X
i

Y
i
N

X
2
i
(
X
i
)
2
_
_
=
_
_

X
i

X
/
i
Y
i

X
2
i

Y
i
N

X
2
i
(
X
i
)
2

X
i

Y
i
N

X
/
i
Y
i
N

X
2
i
(
X
i
)
2
_
_
(278)
Compares to what we had earlier:

2
=

X
i

Y
i
N

Y
i
X
i
(

X
i
)
2
N

X
2
i
=

x
i
y
i

x
2
i
(279)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 101 / 243
Covariance of Parameters (with Matrix)
cov
_

_
=
_
N

X
i

X
i

X
2
i
_
1

2
(280)
cov
_

_
=
_
X
/
X
_
1

2
=
1
N

X
2
i
(

X
i
)
2
_

X
2
i

X
i

X
i
N
_

2
(281)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 102 / 243
Variance of Parameters (with Matrix)
Take the corresponding diagonal element for variance:
var
_

2
_
=
N
N

X
2
i
(

X
i
)
2

2
=
1

x
2
i

2
(282)
var
_

1
_
=

X
2
i
N

X
2
i
(

X
i
)
2

2
(283)
Standard errors
SE
_

2
_
=
_
var
_

2
_
; SE
_

1
_
=
_
var
_

1
_
(284)
t-values
t
_

2
_
=

2

2
SE
_

2
_
; t
_

1
_
=

1

1
SE
_

1
_
(285)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 103 / 243
Variances

e
2
i
=

y
2
i

y
2
i
(286)

y
2
=

(x)
/
(x) ; x = X X (287)

y
2
i
=

2
x
i
)
2
=

2
2

x
i
2
(288)
R
2
=

y
2
i

y
2
i
and F
calc
=

y
2
i
K1

e
2
i
NK
; F
calc
=
R
2
K 1
N K
(1 R
2
)
(289)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 104 / 243
Variances in multiple regression
For Y
i
=
0
+
1
X
1,i
+
2
X
2,i
+
i
Y =

Y +e =

X +e (290)

y
2
i
= Y
/
Y NY
2
(291)
Regression of two explantory variables in the deviation from the mean:
y =

1
x
1
+

1
x
2
(292)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 105 / 243
Explained variation in multiple regression

y
2
=
_

x
1
+

x
2
_
2
=

2
1

x
2
1
+

x
1
x
2
+

x
1
x
2
+

2
2

x
2
2
=

1
_

x
2
1
+

x
1
x
2
_
+

2
_

x
1
x
2
+

x
2
1
_
=

x
1
y +

x
2
y =

/
x
/
y (293)

e
2
i
=

y
2
i

y
2
i
(294)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 106 / 243
Explained variation in multiple regression

y
2
=
_

2
_
_
x
11
x
12
. x
1N
x
21
x
22
. x
2N
_
_

_
y
1
y
2
.
y
N
_

_
=

/
x
/
y (295)
e
/
e = Y
/
Y

/
x
/
y (296)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 107 / 243
R-square and F-statistics in multiple regression
R
2
=

/
x
/
y
Y
/
Y
(297)
F
calc
=

y
2
i
K1
e
/
e
NK
(298)
F
calc
=
R
2
K 1
N K
(1 R
2
)
(299)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 108 / 243
Blue Property in Matrix: Linearity and Unbiasedness

=
_
X
/
X
_
1
X
/
Y (300)

= aY; a =
_
X
/
X
_
1
X
/
(301)
Linearity proved.
E
_

_
= E
_
_
X
/
X
_
1
X
/
(X +e)
_
(302)
E
_

_
= E
_
_
X
/
X
_
1
X
/
X
_
+E
_
_
X
/
X
_
1
X
/
e
_
(303)
E
_

_
= +E
_
_
X
/
X
_
1
X
/
e
_
(304)
E
_

_
= (305)
Unbiasedness is proved.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 109 / 243
Blue Property in Matrix: Minimum Variance
E
_

_
= E
_
_
X
/
X
_
1
X
/
e
_
(306)
E
_
E
_

_

_
2
= E
_
_
X
/
X
_
1
X
/
e
_
/
_
_
X
/
X
_
1
X
/
e
_
(307)
=
_
X
/
X
_
1
X
/
XE
_
e
/
e
_ _
X
/
X
_
1
=
2
_
X
/
X
_
1
(308)
Take an alternative estimator b
b =
_
_
X
/
X
_
1
X
/
+c
_
Y (309)
b =
_
_
X
/
X
_
1
X
/
+c
_
(X +e) (310)
b = E
_
_
X
/
X
_
1
X
/
e +ce
_
(311)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 110 / 243
Blue Property in Matrix: Minimum Variance
Now it need to be shown that
cov (b) > cov
_

_
(312)
Take an alternative estimator b
b = E
_
_
X
/
X
_
1
X
/
e +ce
_
(313)
cov (b) = E
_
(b ) (b )
/

= E
_
_
X
/
X
_
1
X
/
e +ce
_ _
_
X
/
X
_
1
X
/
e +ce
_
=
2
_
X
/
X
_
1
+
2
c
2
(314)
cov (b) > cov
_

_
(315)
Proved.
Thus the OLS is BLUE =Best, Linear, Unbiased Estimator.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 111 / 243
Multiple Regression Model in Matrix
Consider a linear regression without intercept term
Y
i
=
1
X
1,i
+
2
X
2,i
+
3
X
3,i
+
i
i = 1 ...N (316)
and assumptions
E (
i
) = 0 (317)
E (
i
x
j ,i
) = 0 (318)
var (
i
) =
2
for \ i (319)
covar (
i

j
) = 0 (320)

i
~N
_
0,
2
_
(321)
Objective is to choose parameters that minimise the sum of squared
errors
Min S

1,

,
3
=

2
i
=
_
Y
i

1
X
1,i

2
X
2,i

3
X
3,i
_
2
(322)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 112 / 243
Derivation of Normal Equations
S

1
= 0;
S

2
= 0;
S

3
= 0; (323)
Normal equations for three explanatory variable case

X
1,i
Y
i
=

X
2
1,i
+

X
1,i
X
2,i
+

X
1,i
X
3,i
(324)

X
2,i
Y
i
=

X
1,i
X
2,i
+

X
2
2,i
+

X
2,i
X
3,i
(325)

X
3,i
Y
i
=

X
1,i
X
3,i
+

X
2,i
X
3,i
+

X
2
3,i
(326)
_
_

X
1,i
Y
i

X
2,i
Y
i

X
3,i
Y
i
_
_
=
_
_

X
2
1,i

X
1,i
X
2,i

X
1,i
X
3,i

X
1,i
X
2,i

X
2
2,i

X
2,i
X
3,i

X
1,i
X
3,i

X
2,i
X
3,i

X
2
3,i
_
_
_

3
_

_
(327)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 113 / 243
Normal equations in matrix form
_

3
_

_
=
_
_

X
2
1,i

X
1,i
X
2,i

X
1,i
X
3,i

X
1,i
X
2,i

X
2
2,i

X
2,i
X
3,i

X
1,i
X
3,i

X
2,i
X
3,i

X
2
3,i
_
_
1
_
_

X
1,i
Y
i

X
2,i
Y
i

X
3,i
Y
i
_
_
(328)
=
_
X
/
X
_
1
X
/
Y (329)

1
=

X
1,i
Y
i

X
2,i
Y
i

X
3,i
Y
i

X
1,i
X
2,i

X
1,i
X
3,i

X
2
2,i

X
2,i
X
3,i

X
2,i
X
3,i

X
2
3,i

X
2
1,i

X
1,i
X
2,i

X
1,i
X
3,i

X
1,i
X
2,i

X
2
2,i

X
2,i
X
3,i

X
1,i
X
3,i

X
2,i
X
3,i

X
2
3,i

(330)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 114 / 243
Use Cramer Rule to solve for paramers

2
=

X
2
1,i

X
1,i
X
2,i

X
1,i
Y
i

X
1,i
X
2,i

X
2
2,i

X
2,i
Y
i

X
1,i
X
3,i

X
2,i
X
3,i

X
3,i
Y
i

X
2
1,i

X
1,i
X
2,i

X
1,i
X
3,i

X
1,i
X
2,i

X
2
2,i

X
2,i
X
3,i

X
1,i
X
3,i

X
2,i
X
3,i

X
2
3,i

(331)

2
=

X
2
1,i

X
1,i
Y
i

X
1,i
X
3,i

X
1,i
X
2,i

X
2,i
Y
i

X
2,i
X
3,i

X
1,i
X
3,i

X
3,i
Y
i

X
2
3,i

X
2
1,i

X
1,i
X
2,i

X
1,i
X
3,i

X
1,i
X
2,i

X
2
2,i

X
2,i
X
3,i

X
1,i
X
3,i

X
2,i
X
3,i

X
2
3,i

(332)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 115 / 243
Covariance of Parameters
Matrix must be non-singular (X
/
X)
1
,= 0
cov
_

_
=
_
_
_
var (

1
) var (

2
) var (

3
)
var (

2
) var (

2
) var (

3
)
var (

3
) var (

3
) var (

3
)
_
_
_
(333)
cov
_

_
=
_
X
/
X
_
1

2
(334)
cov
_

_
=
_
_

X
2
1,i

X
1,i
X
2,i

X
1,i
X
3,i

X
1,i
X
2,i

X
2
2,i

X
2,i
X
3,i

X
1,i
X
3,i

X
2,i
X
3,i

X
2
3,i
_
_
1

2
(335)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 116 / 243
Data (text book example)
Table: Data for a multiple regression

y    1   -1    2    0    4    2    2    0    2
x1   1   -1    1    0    1    0    0    1    0
x2   0    1    0    1    2    3    0   -1    0
x3  -1    0    0    0    0    0    1    1    1
Squares and cross products
X'X = \begin{bmatrix} 1 & -1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 2 & 3 & 0 & -1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 5 & 0 & 0 \\ 0 & 16 & -1 \\ 0 & -1 & 4 \end{bmatrix}
Sum and cross products
X'Y = \begin{bmatrix} 1 & -1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 2 & 3 & 0 & -1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \\ 2 \\ 0 \\ 4 \\ 2 \\ 2 \\ 0 \\ 2 \end{bmatrix}
= \begin{bmatrix} 8 \\ 13 \\ 3 \end{bmatrix}
\begin{bmatrix} \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \sum X_{1,i}X_{3,i} \\ \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \sum X_{2,i}X_{3,i} \\ \sum X_{1,i}X_{3,i} & \sum X_{2,i}X_{3,i} & \sum X_{3,i}^2 \end{bmatrix} = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 16 & -1 \\ 0 & -1 & 4 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} \sum X_{1,i}Y_i \\ \sum X_{2,i}Y_i \\ \sum X_{3,i}Y_i \end{bmatrix} = \begin{bmatrix} 8 \\ 13 \\ 3 \end{bmatrix}
Estimation of Parameters
\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 16 & -1 \\ 0 & -1 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 8 \\ 13 \\ 3 \end{bmatrix}   (336)

\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 0.2 & 0 & 0 \\ 0 & 0.063 & 0.016 \\ 0 & 0.016 & 0.254 \end{bmatrix} \begin{bmatrix} 8 \\ 13 \\ 3 \end{bmatrix} = \begin{bmatrix} 1.6 \\ 0.873 \\ 0.968 \end{bmatrix}   (337)

Prediction equation:

\hat{Y}_i = 1.6 X_{1,i} + 0.873 X_{2,i} + 0.968 X_{3,i}   (338)
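The same estimates can be reproduced from the raw data table (a sketch, not from the slides), assuming NumPy:

import numpy as np

X = np.array([[ 1,  0, -1],
              [-1,  1,  0],
              [ 1,  0,  0],
              [ 0,  1,  0],
              [ 1,  2,  0],
              [ 0,  3,  0],
              [ 0,  0,  1],
              [ 1, -1,  1],
              [ 0,  0,  1]], dtype=float)
y = np.array([1, -1, 2, 0, 4, 2, 2, 0, 2], dtype=float)
print(X.T @ X)                               # [[5, 0, 0], [0, 16, -1], [0, -1, 4]]
print(X.T @ y)                               # [8, 13, 3]
print(np.linalg.solve(X.T @ X, X.T @ y))     # approx [1.6, 0.873, 0.968], equation (337)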
Sum Squares

y
2
i
=

Y
2
NY
2
= 34 9 (1.3333)
2
= 18.00 (339)
y =

1
x
1
+

2
x
2
+

3
x
3
(340)

y
2
=

x
1
y +

x
2
y +

x
3
y (341)
= 1.6 + 4 + 0.873 + 5 + 0.968 + 0.333 = 11.087 (342)

e
2
i
=

y
2
i

y
2
i
= 18 11.087 = 6.913 (343)
R
2
is not reliable for regression from the origin.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 122 / 243
Estimation of Errors
e
i
= Y
i
1.6X
1,i
+ 0.873X
2,i
+ 0.968X
3,i
(344)
e
1
= 1 1.6 (1) + 0.873 (0) + 0.968 (1) = 0.368 (345)
e
2
= 1 1.6 (1) + 0.873 (1) + 0.968 (0) = 0.273 (346)
e
3
= 2 1.6 (1) + 0.873 (0) + 0.968 (0) = 0.4 (347)
e
4
= 0 1.6 (0) + 0.873 (1) + 0.968 (0) = 0.873 (348)
e
5
= 4 1.6 (1) + 0.873 (2) + 0.968 (0) = 0.654 (349)
e
6
= 2 1.6 (0) + 0.873 (3) + 0.968 (0) = 0.619 (350)
e
7
= 2 1.6 (0) + 0.873 (0) + 0.968 (1) = 1.032 (351)
e
8
= 0 1.6 (1) + 0.873 (1) + 0.968 (1) = 1.695 (352)
e
9
= 2 1.6 (0) + 0.873 (0) + 0.968 (1) = 1.032 (353)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 123 / 243
Sum of Error square, variance and covariance of Beta

e
2
i
= 0.368
2
+ (0.273)
2
+ 0.4
2
+ (0.873)
2
+ (0.654)
2
+ (0.619)
2
+ 1.032
2
+ (1.695)
2
+ 1.032
2
= 6.9460 (354)
Variance of errors
var (e) = E (
i
)
2
=

e
2
i
N k
=
6.9460
9 3
= 1.1577 =
2
(355)
cov
_

_
=
_
_

X
2
1,i

X
1,i
X
2,i

X
1,i
X
3,i

X
1,i
X
2,i

X
2
2,i

X
2,i
X
3,i

X
1,i
X
3,i

X
2,i
X
3,i

X
2
3,i
_
_
1

2
(356)
=
_
_
0.2 0 0
0 0.063 0.016
0 0.016 0.254
_
_
(1.1577) =
_
_
0.232 0 0
0 0.074 0.018
0 0.018 0.294
_
_
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 124 / 243
var
_

1
_
= 0.232; var
_

2
_
= 0.074; var
_

1
_
= 0.294; (357)
cov
_

2
_
= cov
_

3
_
= 0; cov
_

3
_
= cov
_

2
_
= 0; (358)
SE
_

1
_
=
_
0.232 = 0.482; SE
_

2
_
=
_
0.074 = 0.272;
var
_

3
_
=
_
0.294 = 0.542; (359)
t
_

1
_
=
1.6
0.482
= 3.32; t
_

2
_
=
0.873
0.272
= 3.20; t
_

3
_
=
0.968
0.542
= 1.79;
(360)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 125 / 243
Test of Restrictions
Hypothesis H
0
:
1
=
2
=
3
= 0 against H
A
:
1
,= 0;
2
,= 0;
or
3
,= 0
Here J = 3 is the number of restrictions
F-test
F =
(Rb r )
/
[Rcov (b) R
/
]
1
(Rb r )
J
(361)
R =
_
_
1 0 0
0 1 0
0 0 1
_
_
; b =
_

3
_

_
; r =
_
_
0
0
0
_
_
(362)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 126 / 243
Test of Restrictions
F =
_
_
_
_
_
1 0 0
0 1 0
0 0 1
_
_
_

3
_

_
_
0
0
0
_
_
_
_
_
/
_

_
_
_
1 0 0
0 1 0
0 0 1
_
_
_
_
0.232 0 0
0 0.074 0.018
0 0.018 0.294
_
_
_
_
1 0 0
0 1 0
0 0 1
_
_
/
_

_
1
_
_
_
_
_
1 0 0
0 1 0
0 0 1
_
_
_

3
_

_
_
0
0
0
_
_
_
_
_
J = 3
(363)
See matrix_restrictions.xls for calculations.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 127 / 243
Test of Restrictions
F =
_
1.6 0.873 0.968
_
_
_
4.3190 0 0
0 13.821 0.8638
0 0.8638 3.455
_
_
_
_
1.6
0.873
0.968
_
_
3
(364)
F =
_
1.6 0.873 0.968
_
_
_
6.91042
11.22943
2.59141
_
_
3
=
23.37
3
= 7.79 (365)
.F
(m1,m2),
= F
(3,6),5%
= 4.76; critical value for F at degrees of freedom of
(3,6) at 5% condence interval is 4.76.
F calculated is bigger than F critical => Reject null hypothesis, which
says .H
0
:
1
=
2
=
3
= 0
At least one of these parameters is signicant and explains variation in y,
in other words accept H
A
:
1
,= 0;
2
,= 0; or
3
,= 0
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 128 / 243
Multiple Regression Model in Matrix
Consider a linear regression
Y
i
=
0
+
1
X
1,i
+
2
X
2,i
+
3
X
3,i
+ .... +
k
X
k,i
+
i
i = 1 ...N
(366)
and assumptions
E (
i
) = 0 (367)
E (
i
x
j ,i
) = 0; var (
i
) =
2
for \ i ;
i
~N
_
0,
2
_
(368)
covar (
i

j
) = 0 (369)
Explanatory variables are uncorrelated.
E (X
1,i
X
1,j
) = 0 (370)
Objective is to choose parameters that minimise the sum of squared
errors
Min S

2
...

k
=

i
=
_
Y
i

1
X
1,i

2
X
2,i
....

k
X
k,i
_
(371)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 129 / 243
Derivation of Normal Equations
;
S

0
= 0;
S

1
= 0;
S

2
= 0;
S

3
= 0; ......
S

k
= 0 (372)
Normal equations for two explanatory variable case

Y
i
=

0
N +

X
1,i
+

X
2,i
(373)

X
1,i
Y
i
=

X
1,i
+

X
2
1,i
+

X
1,i
X
2,i
(374)

X
2,i
Y
i
=

X
2,i
+

X
1,i
X
2,i
+

X
2
2,i
(375)
_
_

Y
i

X
1,i
Y
i

X
2,i
Y
i
_
_
=
_
_
N

X
1,i

X
2,i

X
1,i

X
2
1,i

X
1,i
X
2,i

X
2,i

X
1,i
X
2,i

X
2
2,i
_
_
_

2
_

_
(376)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 130 / 243
Normal equations in matrix form

\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum Y_i \\ \sum Y_i X_{1,i} \\ \sum Y_i X_{2,i} \end{bmatrix}   (377)

\hat{\beta} = (X'X)^{-1} X'Y   (378)

By Cramer's rule the intercept is
\hat{\beta}_0 = \begin{vmatrix} \sum Y_i & \sum X_{1,i} & \sum X_{2,i} \\ \sum Y_i X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum Y_i X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix} \bigg/ \begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}   (379)
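A minimal numpy sketch of solving the normal equations, with hypothetical two-regressor data standing in for the model above:

import numpy as np

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=20), rng.normal(size=20)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(scale=0.1, size=20)
X = np.column_stack([np.ones(20), X1, X2])   # column of ones carries beta_0

XtX = X.T @ X                        # the matrix of N, sums and cross products in (376)-(377)
XtY = X.T @ Y
beta_hat = np.linalg.solve(XtX, XtY) # beta = (X'X)^{-1} X'Y, as in (378)
print(beta_hat)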
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 131 / 243
Use Cramer's Rule to solve for parameters

\hat{\beta}_1 = \begin{vmatrix} N & \sum Y_i & \sum X_{2,i} \\ \sum X_{1,i} & \sum Y_i X_{1,i} & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum Y_i X_{2,i} & \sum X_{2,i}^2 \end{vmatrix} \bigg/ \begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}   (380)

\hat{\beta}_2 = \begin{vmatrix} N & \sum X_{1,i} & \sum Y_i \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum Y_i X_{1,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum Y_i X_{2,i} \end{vmatrix} \bigg/ \begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}   (381)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 132 / 243
Evaluate the determinant

|X'X| = \begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix}   (382)

The determinant equals (cross products from top left to bottom right minus cross products from bottom left to top right):

|X'X| = N \sum X_{1,i}^2 \sum X_{2,i}^2 + \sum X_{1,i} \sum X_{1,i}X_{2,i} \sum X_{2,i} + \sum X_{2,i} \sum X_{1,i}X_{2,i} \sum X_{1,i} - \sum X_{2,i} \sum X_{1,i}^2 \sum X_{2,i} - N \sum X_{1,i}X_{2,i} \sum X_{1,i}X_{2,i} - \sum X_{2,i}^2 \sum X_{1,i} \sum X_{1,i}   (383)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 133 / 243
Multicollinearity problem: Singularity

Significant R^2 but insignificant t-ratios. Why? In the presence of exact multicollinearity X'X is singular, i.e. |X'X| = 0.
Suppose X_{1,i} = \lambda X_{2,i}.

|X'X| = N \sum X_{1,i}^2 \sum X_{2,i}^2 + \sum X_{1,i} \sum X_{1,i}X_{2,i} \sum X_{2,i} + \sum X_{2,i} \sum X_{1,i}X_{2,i} \sum X_{1,i} - \sum X_{2,i} \sum X_{1,i}^2 \sum X_{2,i} - N \sum X_{1,i}X_{2,i} \sum X_{1,i}X_{2,i} - \sum X_{2,i}^2 \sum X_{1,i} \sum X_{1,i}

Substituting out X_{1,i} = \lambda X_{2,i}:

|X'X| = N \lambda^2 \left( \sum X_{2,i}^2 \right)^2 + \lambda^2 \sum X_{2,i} \sum X_{2,i}^2 \sum X_{2,i} + \lambda^2 \sum X_{2,i} \sum X_{2,i}^2 \sum X_{2,i} - \lambda^2 \sum X_{2,i} \sum X_{2,i}^2 \sum X_{2,i} - N \lambda^2 \left( \sum X_{2,i}^2 \right)^2 - \lambda^2 \sum X_{2,i}^2 \sum X_{2,i} \sum X_{2,i} = 0   (384)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 134 / 243
Parameters are indeterminate in a model with exact multicollinearity

Each of the Cramer's rule ratios now has the zero determinant |X'X| = 0 in its denominator, so none of the parameters can be determined:

\hat{\beta}_0 = \begin{vmatrix} \sum Y_i & \sum X_{1,i} & \sum X_{2,i} \\ \sum Y_i X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum Y_i X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix} \bigg/ 0  \text{ is undefined}   (385)

\hat{\beta}_1 = \begin{vmatrix} N & \sum Y_i & \sum X_{2,i} \\ \sum X_{1,i} & \sum Y_i X_{1,i} & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum Y_i X_{2,i} & \sum X_{2,i}^2 \end{vmatrix} \bigg/ 0  \text{ is undefined}   (386)

\hat{\beta}_2 = \begin{vmatrix} N & \sum X_{1,i} & \sum Y_i \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum Y_i X_{1,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum Y_i X_{2,i} \end{vmatrix} \bigg/ 0  \text{ is undefined}   (387)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 135 / 243
Covariance of parameters cannot be estimated in a model with exact multicollinearity

(X'X)^{-1} does not exist when |X'X| = 0, so   (388)

cov(\hat{\beta}) = \begin{bmatrix} var(\hat{\beta}_1) & cov(\hat{\beta}_1 \hat{\beta}_2) & cov(\hat{\beta}_1 \hat{\beta}_3) \\ cov(\hat{\beta}_2 \hat{\beta}_1) & var(\hat{\beta}_2) & cov(\hat{\beta}_2 \hat{\beta}_3) \\ cov(\hat{\beta}_3 \hat{\beta}_1) & cov(\hat{\beta}_3 \hat{\beta}_2) & var(\hat{\beta}_3) \end{bmatrix}  \text{ cannot be computed}   (389)

cov(\hat{\beta}) = (X'X)^{-1} \hat{\sigma}^2   (390)

cov(\hat{\beta}) = \begin{bmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{bmatrix}^{-1} \hat{\sigma}^2  \text{ is not defined with a singular } X'X   (391)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 136 / 243
Numerical example of exact multicollinearity

Table: Data for a multiple regression
y    3   5   7   6   9   6   7
x1   1   2   3   4   5   6   7
x2   5  10  15  20  25  30  35

Evaluate the determinant:
|X'X| = \begin{vmatrix} N & \sum X_{1,i} & \sum X_{2,i} \\ \sum X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i} X_{2,i} \\ \sum X_{2,i} & \sum X_{1,i} X_{2,i} & \sum X_{2,i}^2 \end{vmatrix} = \begin{vmatrix} 7 & 28 & 140 \\ 28 & 140 & 700 \\ 140 & 700 & 3500 \end{vmatrix};   (392)

X'Y = \begin{bmatrix} \sum Y_i \\ \sum Y_i X_{1,i} \\ \sum Y_i X_{2,i} \end{bmatrix} = \begin{bmatrix} 43 \\ 188 \\ 940 \end{bmatrix}
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 137 / 243
Numerical example of exact multicollinearity

|X'X| = N \sum X_{1,i}^2 \sum X_{2,i}^2 + \sum X_{1,i} \sum X_{1,i}X_{2,i} \sum X_{2,i} + \sum X_{2,i} \sum X_{1,i}X_{2,i} \sum X_{1,i} - \sum X_{2,i} \sum X_{1,i}^2 \sum X_{2,i} - N \sum X_{1,i}X_{2,i} \sum X_{1,i}X_{2,i} - \sum X_{2,i}^2 \sum X_{1,i} \sum X_{1,i}

= (7 \times 140 \times 3500 + 28 \times 700 \times 140 + 140 \times 700 \times 28 - 140 \times 140 \times 140 - 7 \times 700 \times 700 - 28 \times 28 \times 3500) = 0

You can evaluate determinants easily in Excel using the following steps:
1. Select the cell where the result is to be placed.
2. Choose the Math & Trig function category and pick MDETERM.
3. Select the matrix (cell range) for which to evaluate the determinant.
4. Press OK (holding Shift and Control for an array formula) and you will see the result.
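The same check can be done in a few lines of Python, using the X'X matrix computed above; nothing beyond the numbers already in the example is assumed:

import numpy as np

# X'X from the numerical example (x2 = 5*x1, exact multicollinearity)
XtX = np.array([[  7.,   28.,  140.],
                [ 28.,  140.,  700.],
                [140.,  700., 3500.]])
print(np.linalg.det(XtX))   # 0 up to floating point error: the matrix is singular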
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 138 / 243
Normal equations in deviation form

\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \sum x_{1,i}^2 & \sum x_{1,i} x_{2,i} \\ \sum x_{1,i} x_{2,i} & \sum x_{2,i}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum y_i x_{1,i} \\ \sum y_i x_{2,i} \end{bmatrix}   (393)

\hat{\beta} = (X'X)^{-1} X'Y   (394)

\hat{\beta}_1 = \begin{vmatrix} \sum y_i x_{1,i} & \sum x_{1,i} x_{2,i} \\ \sum y_i x_{2,i} & \sum x_{2,i}^2 \end{vmatrix} \bigg/ \begin{vmatrix} \sum x_{1,i}^2 & \sum x_{1,i} x_{2,i} \\ \sum x_{1,i} x_{2,i} & \sum x_{2,i}^2 \end{vmatrix}   (395)

\hat{\beta}_2 = \begin{vmatrix} \sum x_{1,i}^2 & \sum y_i x_{1,i} \\ \sum x_{1,i} x_{2,i} & \sum y_i x_{2,i} \end{vmatrix} \bigg/ \begin{vmatrix} \sum x_{1,i}^2 & \sum x_{1,i} x_{2,i} \\ \sum x_{1,i} x_{2,i} & \sum x_{2,i}^2 \end{vmatrix}   (396)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 139 / 243
Variances of parameters

\begin{bmatrix} \sum x_{1,i}^2 & \sum x_{1,i} x_{2,i} \\ \sum x_{1,i} x_{2,i} & \sum x_{2,i}^2 \end{bmatrix}^{-1} = \frac{1}{\sum x_{1,i}^2 \sum x_{2,i}^2 - (\sum x_{1,i} x_{2,i})^2} \begin{bmatrix} \sum x_{2,i}^2 & -\sum x_{1,i} x_{2,i} \\ -\sum x_{1,i} x_{2,i} & \sum x_{1,i}^2 \end{bmatrix}   (397)

var(\hat{\beta}_1) = \frac{\sum x_{2,i}^2}{\sum x_{1,i}^2 \sum x_{2,i}^2 - (\sum x_{1,i} x_{2,i})^2} \sigma^2   (398)

var(\hat{\beta}_2) = \frac{\sum x_{1,i}^2}{\sum x_{1,i}^2 \sum x_{2,i}^2 - (\sum x_{1,i} x_{2,i})^2} \sigma^2   (399)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 140 / 243
Variance Inflation Factor in Inexact Multicollinearity

Let the correlation between X_{1,i} and X_{2,i} be given by r_{12}. Then the variance inflation factor is \frac{1}{(1 - r_{12}^2)}.

var(\hat{\beta}_2) = \frac{\sum x_{1,i}^2}{\sum x_{1,i}^2 \sum x_{2,i}^2 - (\sum x_{1,i} x_{2,i})^2} \sigma^2 = \frac{1}{\sum x_{2,i}^2 - \frac{(\sum x_{1,i} x_{2,i})^2}{\sum x_{1,i}^2}} \sigma^2 = \frac{1}{\sum x_{2,i}^2 \left[ 1 - \frac{(\sum x_{1,i} x_{2,i})^2}{\sum x_{1,i}^2 \sum x_{2,i}^2} \right]} \sigma^2 = \frac{1}{\sum x_{2,i}^2 [1 - r_{12}^2]} \sigma^2 = \frac{1}{(1 - r_{12}^2)} \frac{1}{\sum x_{2,i}^2} \sigma^2   (400)
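A minimal sketch of the variance inflation factor for two regressors, using hypothetical highly correlated data:

import numpy as np

def vif(x1, x2):
    """Variance inflation factor 1/(1 - r12^2) as in equation (400)."""
    r12 = np.corrcoef(x1, x2)[0, 1]
    return 1.0 / (1.0 - r12 ** 2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=100)   # nearly collinear with x1
print(vif(x1, x2))                            # large value signals inexact multicollinearity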
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 141 / 243
Solutions for the Multicollinearity Problem

When the variance is high the standard errors are high, and that makes t-statistics very small and insignificant:

SE(\hat{\beta}_2) = \sqrt{var(\hat{\beta}_2)};  SE(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)};
t_{\hat{\beta}_1} = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)};  t_{\hat{\beta}_2} = \frac{\hat{\beta}_2 - \beta_2}{SE(\hat{\beta}_2)}   (401)

Since 0 < r_{12} < 1, multicollinearity raises the variance and hence the standard errors and lowers the t-values.
First detect the pairwise correlations between explanatory variables such as X_{1,i} and X_{2,i}, given by r_{12}.
Drop highly correlated variables.
Adopt Klein's rule of thumb: compare R^2_y from the overall regression to R^2_x from the auxiliary regression. Conclude multicollinearity if R^2_x > R^2_y. Drop highly correlated variables.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 142 / 243
Heteroskedasticity

Consider a linear regression
Y_i = \beta_1 + \beta_2 X_i + e_i,  i = 1...N   (402)

and OLS assumptions
E(e_i) = 0   (403)
E(e_i x_i) = 0   (404)
var(e_i) = \sigma^2 for all i   (405)
covar(e_i e_j) = 0   (406)

Then the OLS regression coefficients are:
\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2};  \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}   (407)

Heteroskedasticity occurs when the variances of the errors are not constant, var(e_i) = \sigma_i^2: the error variance differs across observations i. This is mainly a cross-section problem.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 143 / 243
Main reasons for heteroskedasticity
Learning reduces errors;
driving practice, driving errors and accidents
typing practice and typing errors,
defects in productions and improved machines
experience in jobs reduces number of errors or wrong decisions
Improved data collection: better formulas and good software
More heteroscedasticity exists in cross section than in time series data.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 144 / 243
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 145 / 243
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 146 / 243
Nature of Heteroskedasticity

E(e_i)^2 = \sigma_i^2   (408)

\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}   (409)

\hat{\beta}_2 = \sum w_i y_i   (410)

where
w_i = \frac{x_i}{\sum x_i^2} = \frac{(X_i - \bar{X})}{\sum (X_i - \bar{X})^2}   (411)

Var(\hat{\beta}_2) = var\left[ \sum \frac{(X_i - \bar{X})}{\sum (X_i - \bar{X})^2} y_i \right] = \frac{\sum x_i^2 \sigma_i^2}{\left[ \sum x_i^2 \right]^2}   (412)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 147 / 243
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 148 / 243
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 149 / 243
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 150 / 243
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 151 / 243
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 152 / 243
OLS Estimator is still unbiased

\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \sum w_i y_i   (413)

E(\hat{\beta}_2) = E\left( \sum w_i y_i \right) = E \sum w_i (\beta_1 + \beta_2 X_i + e_i)   (414)

E(\hat{\beta}_2) = \beta_1 E\left( \sum w_i \right) + \beta_2 E\left( \sum w_i x_i \right) + E\left( \sum w_i e_i \right)   (415)

E(\hat{\beta}_2) = \beta_2   (416)

since \sum w_i = 0, \sum w_i x_i = 1 and E(w_i e_i) = 0.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 153 / 243
OLS Parameter is inefficient with Heteroskedasticity

\hat{\beta}_2 = \sum w_i y_i   (417)

E(\hat{\beta}_2) = E\left( \sum w_i y_i \right) = E \sum w_i (\beta_1 + \beta_2 X_i + e_i)   (418)

E(\hat{\beta}_2) = \beta_1 E\left( \sum w_i \right) + \beta_2 E\left( \sum w_i x_i \right) + E\left( \sum w_i e_i \right)   (419)

E(\hat{\beta}_2) = \beta_2 + E\left( \sum w_i e_i \right)   (420)

Var(\hat{\beta}_2) = E\left[ \hat{\beta}_2 - \beta_2 \right]^2 = E\left( \sum w_i e_i \right)^2   (421)

Var(\hat{\beta}_2) = E\left( \sum w_i^2 e_i^2 \right) + \sum_{i \neq j} w_i w_j\, cov(e_i e_j)   (422)

Var(\hat{\beta}_2) = \frac{\sum x_i^2 \sigma_i^2}{\left[ \sum x_i^2 \right]^2}   (423)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 154 / 243
OLS Estimator is inconsistent asymptotically

Var(\hat{\beta}_2) = \frac{\sum x_i^2 \sigma_i^2}{\left[ \sum x_i^2 \right]^2}   (424)

\lim_{N \to \infty} Var(\hat{\beta}_2) = \lim_{N \to \infty} \frac{\sum x_i^2 \sigma_i^2}{\left[ \sum x_i^2 \right]^2} \neq 0   (425)

The variance of the OLS estimator does not collapse to zero as the sample grows, so the usual OLS inference is not reliable under heteroskedasticity.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 155 / 243
Various tests of heteroskedasticity
Spearman Rank Test
Park Test
Goldfeld-Quandt Test
Glesjer Test
Breusch-Pagan,Godfrey test
White Test
ARCH test
(See the food_hetro.xls Excel spreadsheet for some examples of how to compute these. Gujarati (2003) Basic Econometrics, McGraw Hill, is a good text for heteroskedasticity; see the x-hetero test in PcGive.)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 156 / 243
GLS Solution of the Heteroskedasticity Problem When Variance is Known

\frac{Y_i}{\sigma_i} = \frac{\beta_1}{\sigma_i} + \beta_2 \frac{X_i}{\sigma_i} + \frac{e_i}{\sigma_i},  i = 1...N   (426)

The variance of the error in this transformed equation equals 1:
var\left( \frac{e_i}{\sigma_i} \right) = \frac{\sigma_i^2}{\sigma_i^2} = 1   (427)

If \sigma_i^2 = \sigma^2 X_i^2, divide the model through by X_i:
\frac{Y_i}{X_i} = \frac{\beta_1}{X_i} + \beta_2 + \frac{e_i}{X_i};  var\left( \frac{e_i}{x_i} \right) = \frac{\sigma^2 x_i^2}{x_i^2} = \sigma^2   (428)
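A minimal sketch of this weighted (GLS) transformation in Python, assuming a hypothetical data set whose error standard deviation is proportional to X_i:

import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(1, 10, 50)
Y = 2.0 + 0.5 * X + rng.normal(scale=0.3 * X)      # heteroskedastic errors

# Divide every term by X_i as in (426)-(428): Y/X = beta_1 (1/X) + beta_2 + e/X
Xs = np.column_stack([1.0 / X, np.ones_like(X)])   # regressors of the transformed model
Ys = Y / X
b1, b2 = np.linalg.lstsq(Xs, Ys, rcond=None)[0]    # beta_1 and beta_2 of the original model
print(b1, b2)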
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 157 / 243
In matrix notation

\hat{\beta}_{OLS} = (X'X)^{-1} (X'Y)   (429)

\hat{\beta}_{GLS} = (X' \Omega^{-1} X)^{-1} (X' \Omega^{-1} Y)   (430)

\Omega^{-1} is the inverse of the variance-covariance matrix of the errors:

\Omega = E(ee') = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & .. & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & .. & \sigma_{2n} \\ : & : & : & : \\ \sigma_{n1} & \sigma_{n2} & .. & \sigma_n^2 \end{bmatrix}   (431)

P \Omega P' = I;  P'P = \Omega^{-1}   (432)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 158 / 243
Spearman rank test of heteroskedasticity

r_s = 1 - 6 \frac{\sum_i d_i^2}{n (n^2 - 1)}   (433)

Steps:
run OLS of y on x;
obtain the errors e;
rank e and y (or x);
find the difference of the ranks, d_i;
use a t-statistic to test whether the rank correlation is significantly different from zero, assuming n > 8 and a population rank correlation coefficient of 0:

t = \frac{r_s \sqrt{n - 2}}{\sqrt{1 - r_s^2}}  with df = (n - 2)   (434)

If t_{cal} > t_{crit} there is heteroskedasticity.
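A minimal sketch of these steps in Python, using hypothetical heteroskedastic data and scipy's rank correlation:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.linspace(1, 20, 40)
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 * x)

b, a = np.polyfit(x, y, 1)                 # OLS slope and intercept
e = y - (a + b * x)                        # residuals
r_s, _ = stats.spearmanr(np.abs(e), x)     # rank correlation of |e| with x, eq. (433)
n = len(x)
t = r_s * np.sqrt(n - 2) / np.sqrt(1 - r_s ** 2)   # eq. (434)
print(r_s, t, stats.t.ppf(0.95, n - 2))    # compare t with the 5% critical value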
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 159 / 243
Glejser Test of heteroskedasticity

Model
Y_i = \beta_1 + \beta_2 X_i + e_i,  i = 1...N   (435)

There are a number of versions of it:
|e_i| = \beta_1 + \beta_2 X_i + v_i   (436)
|e_i| = \beta_1 + \beta_2 \sqrt{X_i} + v_i   (437)
|e_i| = \beta_1 + \beta_2 \frac{1}{X_i} + v_i   (438)
|e_i| = \beta_1 + \beta_2 \frac{1}{\sqrt{X_i}} + v_i   (439)
|e_i| = \sqrt{\beta_1 + \beta_2 X_i} + v_i   (440)
|e_i| = \sqrt{\beta_1 + \beta_2 X_i^2} + v_i   (441)

In each case do a t-test of H_0: \beta_2 = 0 against H_A: \beta_2 \neq 0. If \beta_2 is significant then that is evidence of heteroskedasticity.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 160 / 243
White test

The White test of heteroskedasticity is a more general test.
Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + e_i,  i = 1...N
Run OLS and obtain the squared errors e_i^2.
Regress e_i^2 = \alpha_0 + \alpha_1 X_{1,i} + \alpha_2 X_{2,i} + \alpha_3 X_{1,i}^2 + \alpha_4 X_{2,i}^2 + \alpha_5 X_{1,i} X_{2,i} + v_i.
Compute the test statistic n R^2 \sim \chi^2_{df}.
If the calculated \chi^2_{df} value is greater than the \chi^2_{df} table value then there is evidence of heteroskedasticity.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 161 / 243
White test of heteroskedasticity

1. This is a more general test.
2. Model: Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + e_i
3. Run OLS on this and get e_i^2.
4. Regress e_i^2 = \alpha_0 + \alpha_1 X_{1,i} + \alpha_2 X_{2,i} + \alpha_3 X_{3,i} + \alpha_4 X_{1,i}^2 + \alpha_5 X_{2,i}^2 + \alpha_6 X_{3,i}^2 + \alpha_7 X_{1,i} X_{2,i} + \alpha_8 X_{2,i} X_{3,i} + v_i
5. Compute the test statistic
6. n R^2 \sim \chi^2_{df}
7. Again, if the calculated \chi^2_{df} is greater than the table value there is evidence of heteroskedasticity.
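A minimal sketch of the auxiliary regression behind the White test, written with numpy and scipy on hypothetical data rather than any particular data set from the course:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 0.5 * x1 - 0.3 * x2 + rng.normal(scale=1 + np.abs(x1), size=n)

X = np.column_stack([np.ones(n), x1, x2])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]       # OLS residuals

# Auxiliary regression of e^2 on levels, squares and the cross product
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
e2 = e**2
fit = Z @ np.linalg.lstsq(Z, e2, rcond=None)[0]
R2 = 1 - ((e2 - fit)**2).sum() / ((e2 - e2.mean())**2).sum()
stat, df = n * R2, Z.shape[1] - 1
print(stat, stats.chi2.ppf(0.95, df))                  # reject homoskedasticity if stat > critical value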
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 162 / 243
Park test of heteroskedasticity

Model
Y_i = \beta_1 + \beta_2 X_i + e_i,  i = 1...N   (442)

Error variance:
\sigma_i^2 = \sigma^2 X_i^{\beta_2} e^{v_i}   (443)

Or taking logs:
\ln \sigma_i^2 = \ln \sigma^2 + \beta_2 \ln X_i + v_i   (444)

Steps: run the OLS regression for Y_i and get the estimates of the error terms e_i.
Square e_i and then run a regression of \ln e_i^2 on the x variable. Do a t-test of H_0: \beta_2 = 0 against H_A: \beta_2 \neq 0. If \beta_2 is significant then that is evidence of heteroskedasticity.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 163 / 243
Goldfeld-Quandt test of heteroskedasticity

Model
Y_i = \beta_1 + \beta_2 X_i + e_i,  i = 1...N   (445)

Steps:
Rank observations in ascending order of one of the x variables.
Omit c central observations, leaving two groups of (N - c)/2 observations each.
Fit OLS to the first (N - c)/2 and the last (N - c)/2 observations and find the sum of squared errors from both of them.
Set the hypothesis \sigma_1^2 = \sigma_2^2 against \sigma_1^2 \neq \sigma_2^2.
Compute \lambda = \frac{ERSS_2 / df_2}{ERSS_1 / df_1}.
It follows an F distribution.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 164 / 243
Breusch-Pagan-Godfrey test of heteroskedasticity

Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + .... + \beta_k X_{k,i} + e_i,  i = 1...N

Run OLS and obtain the squared errors.
Obtain the average error square \tilde{\sigma}^2 = \frac{\sum_i e_i^2}{n} and p_i = \frac{e_i^2}{\tilde{\sigma}^2}.
Regress p_i on a set of explanatory variables:
p_i = \alpha_0 + \alpha_1 X_{1,i} + \alpha_2 X_{2,i} + \alpha_3 X_{3,i} + .... + \alpha_k X_{k,i} + v_i
Obtain the explained sum of squares (EXSS) of this regression and compute
\Theta = \frac{1}{2} (EXSS) \sim \chi^2_{m-1}
H_0: \alpha_0 = \alpha_1 = \alpha_2 = \alpha_3 = .. = \alpha_k = 0: no heteroskedasticity and \sigma_i^2 = \sigma^2, a constant. If the calculated \chi^2_{m-1} is greater than the table value there is evidence of heteroskedasticity.
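A minimal sketch of the Breusch-Pagan-Godfrey statistic on hypothetical data; the helper name breusch_pagan and the simulated series are illustrative only:

import numpy as np
from scipy import stats

def breusch_pagan(y, X):
    """EXSS/2 statistic, chi-square with (m - 1) df, as described above."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    p = e**2 / (e**2).mean()                        # p_i = e_i^2 / sigma_tilde^2
    fit = X @ np.linalg.lstsq(X, p, rcond=None)[0]
    exss = ((fit - p.mean())**2).sum()              # explained sum of squares
    return exss / 2, stats.chi2.ppf(0.95, X.shape[1] - 1)

rng = np.random.default_rng(5)
x = rng.normal(size=150)
y = 1 + 0.8 * x + rng.normal(scale=1 + np.abs(x), size=150)
print(breusch_pagan(y, np.column_stack([np.ones(150), x])))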
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 165 / 243
ARCH test of heteroskedasticity

Engle's (1982) autoregressive conditional heteroskedasticity (ARCH) test is more useful for time series data.
Model: Y_t = \beta_0 + \beta_1 X_{1,t} + \beta_2 X_{2,t} + \beta_3 X_{3,t} + .... + \beta_k X_{k,t} + e_t,  e_t \sim N\left( 0, (\alpha_0 + \alpha_2 e_{t-1}^2) \right)

\sigma_t^2 = \alpha_0 + \alpha_2 e_{t-1}^2   (446)

1. Here \sigma_t^2 is not observed. A simple way is to run OLS for Y_t and get e_t^2.
2. ARCH(1):
3. e_t^2 = \alpha_0 + \alpha_2 e_{t-1}^2 + v_t
4. ARCH(p):
5. e_t^2 = \alpha_0 + \alpha_2 e_{t-1}^2 + \alpha_3 e_{t-2}^2 + \alpha_4 e_{t-3}^2 + .. + \alpha_p e_{t-p}^2 + v_t
6. Compute the test statistic
7. n R^2 \sim \chi^2_{df}
8. Again, if the calculated \chi^2_{df} is greater than the table value there is evidence of an ARCH effect and heteroskedasticity.
9. Both ARCH and GARCH models are estimated using an iterative Maximum Likelihood procedure.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 166 / 243
GARCH tests of heteroskedasticity

Bollerslev's generalised autoregressive conditional heteroskedasticity (GARCH) process is more general.

1. GARCH(1,1):
\sigma_t^2 = \alpha_0 + \alpha_2 e_{t-1}^2 + \gamma_1 \sigma_{t-1}^2 + v_t   (447)

2. GARCH(p,q):
3. \sigma_t^2 = \alpha_0 + \alpha_2 e_{t-1}^2 + \alpha_3 e_{t-2}^2 + \alpha_4 e_{t-3}^2 + .. + \alpha_p e_{t-p}^2 + \gamma_1 \sigma_{t-1}^2 + \gamma_2 \sigma_{t-2}^2 + .. + \gamma_q \sigma_{t-q}^2 + v_t
4. Compute the test statistic n R^2 \sim \chi^2_{df}. The process is sometimes written as
5. h_t = \alpha_0 + \alpha_2 e_{t-1}^2 + \alpha_3 e_{t-2}^2 + \alpha_4 e_{t-3}^2 + .. + \alpha_p e_{t-p}^2 + \gamma_1 h_{t-1} + \gamma_2 h_{t-2} + .. + \gamma_q h_{t-q} + v_t
6. where h_t = \sigma_t^2.
7. Various functional forms of h_t are used: h_t = \alpha_0 + \alpha_2 e_{t-1}^2 + \gamma_1 \sqrt{h_{t-1}} + v_t, or h_t = \alpha_0 + \alpha_2 e_{t-1}^2 + \sqrt{\gamma_1 h_{t-1} + \gamma_2 h_{t-2}} + v_t.
8. Both ARCH and GARCH models are estimated using an iterative Maximum Likelihood procedure. The Volatility package in PcGive estimates ARCH-GARCH models.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 167 / 243
Autocorrelation

Consider a linear regression
Y_t = \beta_1 + \beta_2 X_t + e_t,  t = 1...T   (448)

Classical assumptions:
E(e_t) = 0   (449)
E(e_t x_t) = 0   (450)
var(e_t) = \sigma^2 for all t;  covar(e_t e_{t-1}) = 0   (451)

In the presence of (first order) autocorrelation
e_t = \rho e_{t-1} + v_t   (452)

Then the OLS regression coefficients are:
\hat{\beta}_2 = \frac{\sum x_t y_t}{\sum x_t^2};  \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X};  \hat{\rho} = \frac{\sum e_t e_{t-1}}{\sum e_t^2}   (453)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 168 / 243
Causes and consequences of autocorrelation

Autocorrelation occurs when the covariances of the errors are not zero, covar(e_t e_{t-1}) \neq 0. This is mainly a problem observed in time series data.
Causes of autocorrelation:
inertia, specification bias, cobweb phenomena
manipulation of data
Consequences of autocorrelation:
Estimators are still linear and unbiased, but
they are not the best; they are inefficient.
Remedial measures:
When \rho is known: transform the model.
When \rho is unknown: estimate it and transform the model.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 169 / 243
Negative autocorrelation
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 170 / 243
Cyclical autocorrelation
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 171 / 243
Nature of Autocorrelation

\hat{\beta}_2 = \frac{\sum x_t y_t}{\sum x_t^2}   (454)

\hat{\beta}_2 = \sum w_t y_t   (455)

where
E(e_t)^2 = \sigma^2   (456)

E(\hat{\beta}_2) = \beta_2 + E\left( \sum w_t e_t \right)   (457)

E(\hat{\beta}_2) = \beta_1 E\left( \sum w_t \right) + \beta_2 E\left( \sum w_t x_t \right) + E\left( \sum w_t e_t \right)   (458)

Var(\hat{\beta}_2) = E\left[ \hat{\beta}_2 - E(\hat{\beta}_2) \right]^2 = E\left( \sum w_t e_t \right)^2   (459)

Var(\hat{\beta}_2) = \frac{1}{\sum x_t^2} \sigma^2 + 2 \frac{\sum (x_t x_{t-1})}{\left[ \sum x_t^2 \right]^2} cov(e_t e_{t-1})   (460)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 172 / 243
OLS Estimator is still unbiased

e_t = \rho e_{t-1} + v_t   (461)

\hat{\beta}_2 = \frac{\sum x_t y_t}{\sum x_t^2} = \sum w_t y_t   (462)

E(\hat{\beta}_2) = E\left( \sum w_t y_t \right) = E \sum w_t (\beta_1 + \beta_2 X_t + e_t)   (463)

E(\hat{\beta}_2) = \beta_1 E\left( \sum w_t \right) + \beta_2 E\left( \sum w_t x_t \right) + E\left( \sum w_t e_t \right)   (464)

E(\hat{\beta}_2) = \beta_2   (465)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 173 / 243
OLS Parameter is inefficient with Autocorrelation

\hat{\beta}_2 = \sum w_t y_t   (466)

E(\hat{\beta}_2) = E\left( \sum w_t y_t \right) = E \sum w_t (\beta_1 + \beta_2 X_t + e_t)   (467)

E(\hat{\beta}_2) = \beta_1 E\left( \sum w_t \right) + \beta_2 E\left( \sum w_t x_t \right) + E\left( \sum w_t e_t \right)   (468)

E(\hat{\beta}_2) = \beta_2 + E\left( \sum w_t e_t \right)   (469)

Var(\hat{\beta}_2) = E\left[ \hat{\beta}_2 - E(\hat{\beta}_2) \right]^2 = E\left( \sum w_t e_t \right)^2   (470)

Var(\hat{\beta}_2) = E\left( \sum w_t^2 e_t^2 \right) + 2 \sum w_t w_{t-1}\, cov(e_t e_{t-1})   (471)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 174 / 243
Variance of the OLS parameter in the presence of autocorrelation

Var(\hat{\beta}_2) = \frac{1}{\sum x_t^2} \sigma^2 \left[ 1 + 2 \frac{\sum x_t x_{t-1}}{\sum x_t^2} \frac{cov(e_t e_{t-1})}{\sqrt{var(e_t)} \sqrt{var(e_{t-1})}} \right],  \text{ since } var(e_t) = var(e_{t-1})   (472)

Var(\hat{\beta}_2) = \frac{1}{\sum x_t^2} \sigma^2 \left[ 1 + 2 \frac{\sum (x_t - \bar{x})(x_{t-1} - \bar{x})}{\sum x_t^2} \rho + 2 \frac{\sum (x_t - \bar{x})(x_{t-2} - \bar{x})}{\sum x_t^2} \rho^2 + .. + 2 \frac{\sum (x_t - \bar{x})(x_{t-s} - \bar{x})}{\sum x_t^2} \rho^s \right]   (473)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 175 / 243
OLS Estimator is inconsistent asymptotically

Var(\hat{\beta}_2) = \frac{1}{\sum x_t^2} \sigma^2 \left[ 1 + 2 \frac{\sum (x_t - \bar{x})(x_{t-1} - \bar{x})}{\sum x_t^2} \rho + 2 \frac{\sum (x_t - \bar{x})(x_{t-2} - \bar{x})}{\sum x_t^2} \rho^2 + .. + 2 \frac{\sum (x_t - \bar{x})(x_{t-s} - \bar{x})}{\sum x_t^2} \rho^s \right]   (474)

\lim_{N \to \infty} Var(\hat{\beta}_2) = \lim_{N \to \infty} \frac{1}{\sum x_t^2} \sigma^2 \left[ 1 + 2 \frac{\sum (x_t - \bar{x})(x_{t-1} - \bar{x})}{\sum x_t^2} \rho + .. + 2 \frac{\sum (x_t - \bar{x})(x_{t-s} - \bar{x})}{\sum x_t^2} \rho^s \right] \neq 0   (475)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 176 / 243
Durbin-Watson Distribution
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 177 / 243
Durbin-Watson test of autocorrelation

d = \frac{\sum_{t=1}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}   (476)

d = \frac{\sum_{t=1}^{T} \left( e_t^2 - 2 e_t e_{t-1} + e_{t-1}^2 \right)}{\sum_{t=1}^{T} e_t^2} \approx 2 (1 - \hat{\rho})   (477)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 178 / 243
Autocorrelation and Durbin-Watson Statistics

d = 2 (1 - \hat{\rho})   (478)
\hat{\rho} = 0 ==> d = 2   (479)
\hat{\rho} = 1 ==> d = 0;  \hat{\rho} = -1 ==> d = 4   (480)
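A minimal sketch of the statistic in equation (476), applied to hypothetical AR(1) residuals:

import numpy as np

def durbin_watson(e):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2, eq. (476); roughly 2(1 - rho_hat)."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(6)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.7 * e[t - 1] + rng.normal()   # residuals with rho = 0.7
print(durbin_watson(e))                    # well below 2: positive autocorrelation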
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 179 / 243
Autocorrelation

Estimates of \beta_1 and \beta_2 are given in this table. Both are statistically significant, as is the overall model.

Table: Consumption on income and prices (double log model): Estimates of elasticities
              Coefficient   Standard Error   t-value   t-prob
Intercept        3.164          0.705          4.49     0.000
Log income       1.143          0.156          7.33     0.000
Log prices      -0.829          0.036         -23.0     0.000
R^2 = 0.97, F = 266 (0.00), DW = 1.93, N = 17.
\chi^2_2 = 0.355 [0.837]; ARCH F[1,12] = 1.01 [0.33]

The calculated value of the Durbin-Watson statistic is 1.93. Theoretical Durbin-Watson table values for N = 12 are d_L = 0.812 and d_U = 1.579. Clearly the computed Durbin-Watson statistic of 1.93 is above these values. There is no evidence of statistically significant autocorrelation in this problem.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 180 / 243
Steps for testing Autocorrelation in PcGive

1. Run the regression using a single equation dynamic model in the econometrics package.
2. Look at the Durbin-Watson statistic (d = 2 means no autocorrelation).
3. Click Test/Test, select the error autocorrelation test and choose the order of autocorrelation.
4. Error autocorrelation coefficients in the auxiliary regression:
   Lag   Coefficient   Std.Error
   1     -0.12025      0.3316
   2     -0.20083      0.4231
   RSS = 0.0132044  sigma = 0.00110037
   Testing for error autocorrelation from lags 1 to 2:
   Chi^2(2) = 0.51033 [0.7748] and F-form F(2,12) = 0.18569 [0.8329]
5. Read the above estimates. Here the autocorrelation is not significant.
6. Also check the normality of errors.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 181 / 243
UK supply function

Estimation of a UK supply function. Here Y_t is the output index or national income and X_t is inflation or the price index.

Y_t = \beta_0 + \beta_1 X_t + e_t,  t = 1...T   (481)

Estimates below are from quarterly data, 1960:1 to 2008:3.

Table: National income on GDP Deflator of UK (Spurious regression)
            Coefficient   Standard Error   t-value   t-prob
Intercept    -26199.8        2729           -9.60     0.000
Deflator      2766.23        46.91           59.0     0.000
R^2 = 0.95, F(1,183) = 3477 [0.000]**, DW = 0.0269, N = 185
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 182 / 243
Residual function

A supply function should be positively sloped; the results here show the deflator has the expected positive sign and the t-value is very significant. However, this relation is spurious because R^2 > DW. This is called a spurious regression because the variables are non-stationary. A spurious regression is meaningless. It is also evident from the non-normality of the errors. When DW = 0.0269 the autocorrelation coefficient \rho is close to 1. It can be estimated from the residuals.

Table: Estimation of autocorrelation in the UK supply function
               Coefficient   Standard Error   t-value   t-prob
Residual lag1     1.005          0.01237        81.3     0.000
Intercept       237.045          263.0          0.901    0.369
R^2 = 0.97, F(1,182) = 6604 [0.000]**, DW = 2.2, N = 184
The estimated value of \rho is 1.005 and this is significant.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 183 / 243
UK supply function

One remedy to solve the autocorrelation problem is to take the first difference of the dependent variable.

Table: Change in income on GDP Deflator of UK
            Coefficient   Standard Error   t-value   t-prob
Intercept      129.5          422.0          0.307     0.759
Deflator       34.005         7.217          4.71      0.000
R^2 = 0.11, F(1,181) = 22.2 [0.00]*, DW = 2.37, N = 183

Taking the first difference has largely solved the problem of autocorrelation. For N = 200 the lower and upper bounds of the Durbin-Watson table values are d_L = 1.758 and d_U = 1.778. The statistic indicates a negative autocorrelation, but this seems to lie in the Durbin-Watson table's inconclusive region. It is difficult to be definite on the evidence of autocorrelation as the calculated statistic falls in the inconclusive region. Let us check this by estimating the value of \rho. This is now -0.19 and is significant, though far away from the 1.005 seen above.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 184 / 243
Residual

Table: Estimation of autocorrelation in the UK supply function
               Coefficient   Standard Error   t-value   t-prob
Residual lag1    -0.19           0.074         -2.64     0.009
Intercept         6.74           242.1          0.03     0.978
R^2 = 0.04, F(1,180) = 6.984 [0.000]**, DW = 2.2, N = 182

UK supply function

Table: Growth rate of income on inflation in UK
            Coefficient   Standard Error   t-value   t-prob
Intercept      0.010          0.003          3.10      0.002
Inflation      0.736          0.1578         4.66      0.000
R^2 = 0.11, F(1,183) = 21.8 [0.00]*, DW = 2.7, N = 182
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 185 / 243
Growth and GDP deflator

Table: Growth rate of income on GDP Deflator of UK
            Coefficient   Standard Error    t-value   t-prob
Intercept      0.028          0.004           6.95     0.000
Deflator      -0.00014        6.923e-005     -2.04     0.043
R^2 = 0.022, F(1,183) = 4.2 [0.04]*, DW = 2.7, N = 183

Looking at these latest two tables there is evidence for an aggregate supply function for the UK, though there is slight evidence of negative autocorrelation.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 186 / 243
Growth and deficit in UK

A table of results summarising all the above calculations is presented as:

Table: Growth on net borrowing
                Coefficient   Standard Error   t-value
Intercept          3.283          0.783         4.191
Net borrowing      0.349          0.133         2.613
R^2 = 0.406, F = 6.147, N = 12.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 187 / 243
Growth and deficit in UK

Theoretical values of t are given in a t-table. Columns of the t-table have the level of significance (\alpha) and rows have degrees of freedom.
Here t_{\alpha,df} is the t-table value for degrees of freedom (df = n - k) and level of significance \alpha; df = 12 - 2 = 10.

Table: Relevant t-values (one tail) from the t-Table
(df, \alpha)   0.05     0.025     0.005
1             6.314    12.706    63.657
2             2.920     4.303     9.925
10            1.812     2.228     3.169

t(\hat{\beta}_1) = 4.191 > t_{\alpha,df} = t_{0.05,10} = 1.812. Thus the intercept is statistically significant. t(\hat{\beta}_2) = |2.613| > t_{\alpha,df} = t_{0.05,10} = 1.812. Thus the slope is also statistically significant at the 5% and 2.5% levels of significance.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 188 / 243
Growth and deficit in UK

F is the ratio of two \chi^2 distributed variables with degrees of freedom n2 and n1.

F_{calc} = \frac{\sum \hat{y}_i^2 / (K-1)}{\sum e_i^2 / (N-K)} = \frac{22.96 / 1}{33.629 / 10} = 6.15   (482)

Table: Relevant F-values from the F-Table
              1% level of significance          5% level of significance
(n2, n1)      1        2        3               1        2        3
1           4052     4999.5   5403            161.4    199.5    215.7
2           98.50    99.00    99.17           18.51    19.00    19.16
10          10.04     7.56     6.55            4.96     4.10     3.71

n1 = degrees of freedom of the numerator; n2 = degrees of freedom of the denominator. For the 5% level of significance F_{n1,n2} = F_{1,10} = 4.96 and F_{calc} > F_{1,10}; for the 1% level of significance F_{n1,n2} = F_{1,10} = 10.04 and F_{calc} < F_{1,10} ==> the model is not statistically significant at the 1% level but is significant at the 5% level of significance. The model is meaningful.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 189 / 243
Growth and deficit in UK

Testing autocorrelation:

d = \frac{\sum_{t=1}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} = \frac{58.473}{33.629} = 1.74   (483)

\hat{\rho} = \frac{\sum_{t=1}^{T} e_t e_{t-1}}{\sum_{t=1}^{T} e_t^2} = \frac{1.2832}{33.629} = 0.0381   (484)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 190 / 243
Durbin-Watson Table (part of it)

Table: Durbin-Watson tables, relevant part (5% level of significance)
        K = 2            K = 3            K = 4
n     d_L     d_U      d_L     d_U      d_L     d_U
9    0.824   1.320    0.629   1.699    0.455   2.128
10   0.879   1.320    0.697   1.641    0.525   2.016
12   0.971   1.331    0.812   1.579    0.658   1.864
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 191 / 243
Here the calculated Durbin-Watson statistic is d = 1.74. The table values are d_L(12,2) = 0.971 and d_U(12,2) = 1.331, so d = 1.74 > d_U(12,2) = 1.331.
Autocorrelation is positive because d = 1.74 < 2, but that autocorrelation is not statistically significant. The calculated DW value d = 1.74 is clearly out of the inconclusive region as it does not fall in the range d_L < d < d_U.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 192 / 243
Testing for heteroskedasticity

One way is to regress the squared errors e_i^2 on the squared predicted values \hat{Y}_i^2. The test statistic is n R^2 \sim \chi^2_{df}, with df = 1 here.

e_i^2 = \alpha_0 + \alpha_1 \hat{Y}_i^2 + v_i   (485)

n R^2 = 6.089

Table: Table values of Chi-Square
(df, \alpha)    0.10      0.05      0.01
1             2.7055    3.8415    6.6349
2             4.605     5.991     9.210
10           15.987    18.307    23.209

The null hypothesis is no heteroskedasticity. n R^2 = 6.089 > \chi^2_{df} = 2.7055 ==> there is heteroskedasticity. The White test or ARCH and AR tests suggest there is a slight problem of heteroskedasticity in the errors of this model. However, heteroskedasticity is more serious for cross-section than for time series data. Therefore the conclusions of the above model are still valid.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 193 / 243
Transformation of the model in the presence of autocorrelation

When the autocorrelation coefficient \rho is known:

Y_t = \beta_1 + \beta_2 X_t + e_t,  t = 1...T   (486)
e_t = \rho e_{t-1} + v_t   (487)

Y_t - \rho Y_{t-1} = (\beta_1 - \rho \beta_1) + \beta_2 (X_t - \rho X_{t-1}) + e_t - \rho e_{t-1}   (488)

Y_t^* = \beta_1^* + \beta_2 X_t^* + e_t^*   (489)

Apply OLS to this transformed model; \beta_1^* and \beta_2 will have BLUE properties.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 194 / 243
Transformation of the model in the presence of autocorrelation

When the autocorrelation coefficient \rho is unknown:
This method is similar to the one above, except that it involves multiple iterations for estimating \rho. The steps are as follows:
1. Get estimates \hat{\beta}_1 and \hat{\beta}_2 from the original model; get the error terms e_i and estimate \rho.
2. Transform the original model by multiplying it by \rho and taking the quasi first difference.
3. Estimate \hat{\beta}_1 and \hat{\beta}_2 from the transformed model and get the errors \hat{e}_i of this transformed model.
4. Then again estimate \hat{\rho} and use those values to transform the original model as
Y_t - \rho Y_{t-1} = (\beta_1 - \rho \beta_1) + \beta_2 (X_t - \rho X_{t-1}) + e_t - \rho e_{t-1}   (490)
5. Continue this iteration process until \hat{\rho} converges.
PcGive suggests using differences in variables. The Diagnos/ACF options in OLS in Shazam will generate these iterations.
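A minimal sketch of this Cochrane-Orcutt style iteration in Python; the function name and simulated data are hypothetical and stand in for steps 1-5 above:

import numpy as np

def cochrane_orcutt(y, x, n_iter=20):
    """Iterate between estimating rho and the quasi-differenced model."""
    rho = 0.0
    for _ in range(n_iter):
        ys, xs = y[1:] - rho * y[:-1], x[1:] - rho * x[:-1]
        b = np.linalg.lstsq(np.column_stack([np.ones_like(xs), xs]), ys, rcond=None)[0]
        b1, b2 = b[0] / (1 - rho), b[1]     # recover beta_1 from (1 - rho) beta_1
        e = y - b1 - b2 * x                 # residuals of the original model
        rho = (e[1:] @ e[:-1]) / (e @ e)    # update rho as in eq. (453)
    return b1, b2, rho

rng = np.random.default_rng(7)
x = rng.normal(size=300)
u = np.zeros(300)
for t in range(1, 300):
    u[t] = 0.6 * u[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + u
print(cochrane_orcutt(y, x))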
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 195 / 243
GLS to solve autocorrelation

In matrix notation:

\hat{\beta}_{OLS} = (X'X)^{-1} (X'Y)   (491)

\hat{\beta}_{GLS} = (X' \Omega^{-1} X)^{-1} (X' \Omega^{-1} Y)   (492)

\Omega^{-1} is the inverse of the variance-covariance matrix of the errors.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 196 / 243
Generalised Least Squares

Take a regression
Y = X\beta + e   (493)

The assumptions of homoskedasticity and no autocorrelation are violated:
var(e_i) \neq \sigma^2 for all i   (494)
covar(e_i e_j) \neq 0   (495)

The variance-covariance matrix of the errors is given by
\Omega = E(ee') = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & .. & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & .. & \sigma_{2n} \\ : & : & : & : \\ \sigma_{n1} & \sigma_{n2} & .. & \sigma_n^2 \end{bmatrix}   (496)

\Omega = Q \Lambda Q'   (497)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 197 / 243
Generalised Least Squares

\Omega = Q \Lambda Q' = Q \Lambda^{\frac{1}{2}} \Lambda^{\frac{1}{2}} Q'   (498)

P = \Lambda^{-\frac{1}{2}} Q'   (499)

P \Omega P' = I;  P'P = \Omega^{-1}   (500)

Transform the model:
PY = PX\beta + Pe   (501)
Y^* = X^* \beta + e^*   (502)
where Y^* = PY, X^* = PX and e^* = Pe.

\hat{\beta}_{GLS} = (X'P'PX)^{-1} (X'P'PY)
\hat{\beta}_{GLS} = (X' \Omega^{-1} X)^{-1} (X' \Omega^{-1} Y)   (503)
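A minimal sketch of equation (503) in Python, assuming a hypothetical diagonal Omega with known variances:

import numpy as np

def gls(y, X, Omega):
    """beta_GLS = (X' Omega^{-1} X)^{-1} X' Omega^{-1} Y, as in (503)."""
    Oi = np.linalg.inv(Omega)
    return np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)

rng = np.random.default_rng(8)
n = 50
x = np.linspace(1, 5, n)
sig2 = 0.2 * x**2                              # known heteroskedastic variances
y = 1.0 + 0.5 * x + rng.normal(scale=np.sqrt(sig2))
X = np.column_stack([np.ones(n), x])
print(gls(y, X, np.diag(sig2)))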
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 198 / 243
Dummy Variables in a Regression Model

A dummy variable represents a qualitative aspect or characteristic in the data:
Quality: good, bad; Location: south/north/east/west; characteristics: fat/thin or tall/short
Time: annual, 1970s/1990s; seasonal: Summer, Autumn, Winter, Spring
Gender: male/female; Education: GCSE/UG/PG/PhD
Subjects: Maths/English/Science/Economics
Ethnic backgrounds: Black, White, Asian, Caucasian, European, American, Latino, Mongol, Aussie.

Y_i = \beta_1 + \beta_2 X_i + \beta_3 D_i + e_i,  i = 1...N   (504)
e_i \sim N(0, \sigma^2)   (505)

Here D_i is a special type of variable:
D_i = 1 if the certain quality exists, 0 otherwise   (506)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 199 / 243
Dummy Variables in a Regression Model

Three types of dummy:
1. Slope dummy
2. Intercept dummy
3. Interaction between slope and intercept

Examples:
Earnings differences by gender, region, ethnicity or religion, occupation, education level.
Unemployment duration by gender, region, ethnicity or religion, occupation, education level.
Demand for a product by weather, season, gender, region, ethnicity or religion, occupation, education level.
Test scores by gender, previous background, ethnic origin.
Growth rates by decades, countries, exchange rate regimes.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 200 / 243
Dummy Variables Trap

Consider seasonal dummies as
Y_i = \beta_1 + \beta_2 X_i + \alpha_1 D_1 + \alpha_2 D_2 + \alpha_3 D_3 + \alpha_4 D_4 + e_i   (507)

where
D_1 = 1 if summer, 0 otherwise   (508)
D_2 = 1 if autumn, 0 otherwise   (509)
D_3 = 1 if winter, 0 otherwise   (510)
D_4 = 1 if spring, 0 otherwise   (511)

Since \sum D_i = 1 for every observation, this will cause multicollinearity:
D_1 + D_2 + D_3 + D_4 = 1   (512)

Drop one of the D_i to avoid the dummy variable trap.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 201 / 243
Dummy Variables in piecewise linear regression models

Threshold effects in sales
Tariff charges by volume of transaction (mobile phones)
Panel regression: time and individual dummies
Pay according to hierarchy in an organisation
Profit from wholesale and retail sales
Age dependent earnings: scholarships for students, pensions and allowances for the elderly
Tax allowances by level of income or business
Investment credit by size of investment
Prices, employment, profits or sales for small, medium and large scale corporations
Requirements according to weight or height of body
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 202 / 243
Analysis of Structural change in deficit regime

Suppose fiscal policy regimes have changed since 1970. Regress the growth rate of output (Y_i) on net borrowing (X_i) as:

Expansionary fiscal policy regime from 1970 to 1990:
Y_i = \alpha_1 + \alpha_2 X_i + e_i,  i = 1...T

Balanced fiscal policy regime from 1990 to 2009:
Y_i = \beta_1 + \beta_2 X_i + e_i,  i = 1...T

H_0: the link between growth and deficit has remained the same: \alpha_1 = \beta_1; \alpha_2 = \beta_2.
H_A: there has been a shift in regime.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 203 / 243
Chow Test for stability of parameters or structural change

Use n1 and n2 observations to estimate overall and separate regressions with (n1+n2-k), (n1-k) and (n2-k) degrees of freedom.
Obtain the sum of squared residuals SSR1 (with n1+n2-k dfs) assuming \alpha_1 = \beta_1; \alpha_2 = \beta_2 for the whole sample (restricted estimation).
SSR2 (with n1-k dfs): first sample.
SSR3 (with n2-k dfs): second sample.
SSR4 = SSR2 + SSR3 (with n1+n2-2k dfs): unrestricted sum of squared errors.
Obtain SSR5 = SSR1 - SSR4.
Do an F-test (as sketched below):

F = \frac{SSR_5 / k}{SSR_4 / (n1 + n2 - 2k)}   (513)

The advantage of this approach to the Chow test is that it does not require the construction of dummy and interaction variables.
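A minimal sketch of the Chow F statistic of equation (513), applied to two hypothetical simulated regimes; the helper names are illustrative only:

import numpy as np
from scipy import stats

def ssr(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

def chow(y1, X1, y2, X2):
    """Chow F statistic: (SSR5/k) / (SSR4/(n1+n2-2k)), as in (513)."""
    k = X1.shape[1]
    ssr1 = ssr(np.concatenate([y1, y2]), np.vstack([X1, X2]))   # restricted (pooled)
    ssr4 = ssr(y1, X1) + ssr(y2, X2)                            # unrestricted
    F = ((ssr1 - ssr4) / k) / (ssr4 / (len(y1) + len(y2) - 2 * k))
    return F, stats.f.ppf(0.95, k, len(y1) + len(y2) - 2 * k)

rng = np.random.default_rng(9)
x1, x2 = rng.normal(size=30), rng.normal(size=30)
y1 = 1 + 0.3 * x1 + rng.normal(scale=0.5, size=30)   # first regime
y2 = 3 + 1.2 * x2 + rng.normal(scale=0.5, size=30)   # second regime
print(chow(y1, np.column_stack([np.ones(30), x1]), y2, np.column_stack([np.ones(30), x2])))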
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 204 / 243
Distributed Lag Model

C_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 X_{t-2} + \beta_4 X_{t-3} + ... + \beta_k X_{t-k} + e_t,  t = 1...T   (514)

Reasons for lags:
Psychological reasons: it takes time to believe something.
Technological reasons: it takes time to change to new machines or to update.
Institutional reasons: rules, regulations, notices, contracts.

The lagged marginal effects trace out the response of consumption to an increase in income at period 0.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 205 / 243
Koyck's Model

short run multiplier: \beta_1
intermediate run multiplier: \beta_1 + \beta_2 + \beta_3
long run multiplier: \beta_1 + \beta_2 + \beta_3 + ... + \beta_k
proportion of the long run impact felt by a certain period: \beta_i^* = \frac{\beta_1 + ... + \beta_i}{\beta_1 + \beta_2 + ... + \beta_k}

Koyck's procedure: \beta_2 = \lambda \beta_1; \beta_3 = \lambda^2 \beta_1; ...; \beta_k = \lambda^k \beta_1 and so on.

C_t = \beta_0 + \beta_1 X_t + \lambda \beta_1 X_{t-1} + \lambda^2 \beta_1 X_{t-2} + \lambda^3 \beta_1 X_{t-3} + ... + \lambda^k \beta_1 X_{t-k} + e_t   (515)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 206 / 243
Koyck's procedure

The Koyck procedure converts a distributed lag model into an autoregressive model. It involves (a) multiplying (515) by \lambda, which lies between 0 and 1, 0 < \lambda < 1; (b) taking the one period lag of that; and (c) subtracting it from (515).

C_t = \beta_0 + \beta_1 X_t + \lambda \beta_1 X_{t-1} + \lambda^2 \beta_1 X_{t-2} + \lambda^3 \beta_1 X_{t-3} + ... + \lambda^k \beta_1 X_{t-k} + e_t   (516)

C_t = \beta_0 + \beta_1 X_t + \lambda \beta_1 X_{t-1} + \lambda^2 \beta_1 X_{t-2} + \lambda^3 \beta_1 X_{t-3} + ... + \lambda^k \beta_1 X_{t-k} + e_t   (517)

\lambda C_{t-1} = \lambda \beta_0 + \lambda \beta_1 X_{t-1} + \lambda^2 \beta_1 X_{t-2} + \lambda^3 \beta_1 X_{t-3} + \lambda^4 \beta_1 X_{t-4} + ... + \lambda^{k+1} \beta_1 X_{t-k-1} + \lambda e_{t-1}   (518)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 207 / 243
Koyck's procedure

Take the difference between these two:

C_t - \lambda C_{t-1} = (1 - \lambda) \beta_0 + \beta_1 X_t - \lambda^{k+1} \beta_1 X_{t-k-1} + e_t - \lambda e_{t-1}   (519)

The term \lambda^{k+1} \beta_1 X_{t-k-1} \to 0 as 0 < \lambda < 1, so

C_t = (1 - \lambda) \beta_0 + \beta_1 X_t + \lambda C_{t-1} + u_t   (520)

where u_t = e_t - \lambda e_{t-1}. By cancelling terms the model transforms into an autoregressive equation.
In the steady state C_t = C_{t-1} = C:

C = \beta_0 + \frac{\beta_1}{(1 - \lambda)} X_t + \frac{u_t}{(1 - \lambda)}   (521)

The term \frac{\beta_1}{(1 - \lambda)} gives the long run impact of a change in X_t on C_t.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 208 / 243
Choice of Length of Lag in the Koyck Model

Median lag: -\frac{\log 2}{\log(\lambda)}: 50% of the long run impact is felt over this lag.
Mean lag: \frac{\sum_{k=0}^{\infty} k \beta_k}{\sum_{k=0}^{\infty} \beta_k}: mean of the total impact.
Koyck mean lag: \frac{\lambda}{(1 - \lambda)}: average lag length.

How to choose the lag length (minimise these values):
Akaike information criterion
AIC = \ln \frac{SSE_N}{T - N} + \frac{2 (n + 2)}{T - N}   (522)

Schwarz criterion
SC(N) = \ln \frac{SSE_N}{T - N} + \frac{2 (n + 2) \ln (T - N)}{T - N}   (523)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 209 / 243
Problems with the Koyck Model

It is very restrictive. The successive coefficients may not decline geometrically when 0 < \lambda < 1.
There is no a-priori guide to the maximum length of lag; Tinbergen suggests using trial and error: first regress C_t on X_t and X_{t-1}; if the coefficients are significant, keep introducing lagged terms of higher order. But more lags implies fewer degrees of freedom.
Multicollinearity may appear.
Data mining.
The autoregressive term is correlated with the error term, so the Durbin-Watson statistic cannot be applied in this case. Need to use the Durbin h statistic, which is defined as

h = \left( 1 - \frac{d}{2} \right) \sqrt{\frac{T - 1}{1 - (T - 1) \left[ SE(\hat{\lambda}) \right]^2}}   (524)

where \hat{\lambda} is the estimated coefficient on the lagged dependent variable.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 210 / 243
Almon's polynomial lag model

The Koyck procedure is very restrictive in that the values of the coefficients decline in geometric proportion:

C_t = \beta_0 + \beta_1 X_t + \lambda \beta_1 X_{t-1} + \lambda^2 \beta_1 X_{t-2} + \lambda^3 \beta_1 X_{t-3} + ... + \lambda^k \beta_1 X_{t-k} + e_t   (525)

However, the impact of economic variables may be better explained by a quadratic, cubic or higher order polynomial:
quadratic impact structure: \beta_i = \alpha_0 + \alpha_1 i + \alpha_2 i^2
cubic impact structure: \beta_i = \alpha_0 + \alpha_1 i + \alpha_2 i^2 + \alpha_3 i^3
k-order polynomial lag structure: \beta_i = \alpha_0 + \alpha_1 i + \alpha_2 i^2 + \alpha_3 i^3 + ... + \alpha_k i^k

In contrast to the Koyck long run form
C_t = \beta_0 + \frac{\beta_1}{(1 - \lambda)} X_t + \frac{u_t}{(1 - \lambda)}   (526)

the Almon model becomes
C_t = \beta_0 + \sum_{i=0}^{k} \left( \alpha_0 + \alpha_1 i + \alpha_2 i^2 + \alpha_3 i^3 + ... + \alpha_k i^k \right) X_{t-i} + u_t   (527)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 211 / 243
Advantages of the Almon model over Koyck

a. Flexible; can incorporate a variety of lag structures.
b. Coefficients do not have to decline geometrically; Koyck had a rigid lag structure.
c. No lagged dependent variable in the regression.
d. The number of coefficients estimated is significantly smaller than in the Koyck model.
e. The constructed lag variables are, however, likely to be multicollinear.
Estimates of a polynomial distributed lag model follow.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 212 / 243
Autoregressive Distributed Lag Model: ARDL(1,1)

Y_t = \mu + \beta_0 X_t + \beta_1 X_{t-1} + \gamma Y_{t-1} + e_t   (528)

This can be represented by an infinite distributed lag as follows:

Y_t = \mu + \beta_0 X_t + \beta_1 X_{t-1} + \sum_{i=0}^{\infty} \gamma^i (\beta_1 + \gamma \beta_0) X_{t-1-i} + e_t   (529)

Lag weights: \delta_0 = \beta_0;  \delta_1 = (\beta_1 + \gamma \beta_0);  \delta_2 = \gamma (\beta_1 + \gamma \beta_0) = \gamma \delta_1;  ....;  \delta_S = \gamma \delta_{S-1}.

ARDL(2,2):
Y_t = \mu + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \gamma_1 Y_{t-1} + \gamma_2 Y_{t-2} + e_t   (530)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 213 / 243
Main Features of Simultaneous Equation System
Single equation models have Y dependent variable to be determined
by a X or a set of X variables and the error term.
one way causation from independent variables to the dependent
variable.
However, many variables in economics are interdependent and there is
two way causation.
Consider a market model with demand and supply.
Price determines quantity and quantity determines price.
Same is true in national income determination model with
consumption and income.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 214 / 243
Main Feature of Simultaneous Equation System
Both quantities and prices ( or income and consumption) are
determined simultaneously.
A system of equations, not a single equation, need to be estimated in
order to be able to capture this interdependency among variables.
The main features of a simultaneous equation model are:
(i) two or more dependent (endogenous) variables and a set of
explanatory (exogenous) variables
(ii) a set of equations
Computationally cumbersome and errors in one equation transmitted
through the whole system. High non-linearity in parameters.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 215 / 243
Keynesian Model

C_t = \beta_0 + \beta_1 Y_t + u_t   (531)
Y_t = C_t + I_t   (532)

Here \beta_0 and \beta_1 are structural parameters; income (Y_t) and consumption (C_t) are endogenous variables and investment (I_t) is an exogenous variable.

Table: Coefficient matrix for rank test
constant     C_t    Y_t    I_t
-\beta_0      1    -\beta_1   0
   0         -1      1       1
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 216 / 243
Derivation of the reduced form equation

Y_t = \beta_0 + \beta_1 Y_t + u_t + I_t   (533)

Y_t = \frac{\beta_0}{1 - \beta_1} + \frac{1}{1 - \beta_1} I_t + \frac{1}{1 - \beta_1} u_t   (534)

C_t = \beta_0 + \beta_1 \left[ \frac{\beta_0}{1 - \beta_1} + \frac{1}{1 - \beta_1} I_t + \frac{1}{1 - \beta_1} u_t \right] + u_t   (535)

C_t = \frac{\beta_0}{1 - \beta_1} + \frac{\beta_1}{1 - \beta_1} I_t + \frac{1}{1 - \beta_1} u_t   (536)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 217 / 243
Keynesian model: Estimation of the reduced form of the model

In the income determination model the reduced form is obtained by expressing the endogenous variables C and Y in terms of I, which is the only exogenous variable in the model.

C_t = \pi_{1,1} + \pi_{1,2} I_t + V_{1,t}   (537)
Y_t = \pi_{2,1} + \pi_{2,2} I_t + V_{2,t}   (538)

Cons = + 3.647*Invest + 1.272e+010
(SE)     (0.017)        (4.34e-013)
GDP  = + 5.014*Invest + 5.441e+010
(SE)     (0.0228)       (5.8e-013)

\pi_{1,1} = \frac{\beta_0}{1 - \beta_1} = 12.72;  \pi_{1,2} = \frac{\beta_1}{1 - \beta_1} = 3.65   (539)

\pi_{2,1} = \frac{\beta_0}{1 - \beta_1} = 54.41;  \pi_{2,2} = \frac{1}{1 - \beta_1} = 5.01   (540)
Empirical Part: Exercise in PcGive

Construct a data set of macroeconomic variables (Y, C, I, G, T, X, M, MS, i, inflation, wage rate, exchange rate etc.)
Save the data in *.csv format.
Start GiveWin and PcGive and open the data file.
Choose multiple equation dynamic modelling.
Determine endogenous and exogenous variables and run the simultaneous equations using 3SLS or FIML.
Study the coefficients.
Change policy variables and construct a few scenarios.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 219 / 243
Estimation of the reduced form

Table: Simultaneous equation model of UK, 1971:2-2010:2
                      Consumption equation                  GDP equation
exogenous variables   Coefficient   t-value      prob       Coefficient   t-value      prob
Investment             3.64682       214.0       0.000       5.01427       220.0       0.000
Constant               12.7228     2.932e+022    0.000       54.408      9.379e+022    0.000
Vector Portmanteau(12): 836.726; Vector Normality test: Chi^2(4) = 5.6050 [0.2307]
MOD(2) Estimating the model by FIML (using macro_2010.csv)
The estimation sample is: 1971(2) to 2010(2)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 220 / 243
Retrieval of the structural parameters:

\beta_1 = \frac{\pi_{1,2}}{\pi_{2,2}} = \frac{\beta_1}{1 - \beta_1} \bigg/ \frac{1}{1 - \beta_1} = \frac{3.64682}{5.01427} = 0.727   (541)

\beta_0 = \pi_{1,1} (1 - \beta_1) = 12.7228 (1 - 0.727) = 3.47   (542)

Estimated system:
\hat{C}_t = 3.47 + 0.727 \hat{Y}_t   (543)
\hat{Y}_t = \hat{C}_t + I_t   (544)

This seems a very plausible result. Its validity is tested by the Vector Portmanteau, Vector Normality and Vector hetero tests.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 221 / 243
Techniques of estimation of simultaneous equation models
Single Equations Methods: Recursive OLS
Ordinary Least Squares
Indirect Least Squares
Two Stage Least Squares Method
System Method
Generalised Least Square
Seemingly Unrelated Regression Equations
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 222 / 243
Rank and Order Conditions for Identification

Order condition:
K - k \geq m - 1   (545)

Rank condition:
rank(A) = M - 1 for at least one (M - 1) \times (M - 1) submatrix A   (546)

M = number of endogenous variables in the model
K = number of exogenous variables in the model including the intercept
m = number of endogenous variables in an equation
k = number of exogenous variables in a given equation
The rank condition is defined by the rank of the matrix A, which should have dimension (M-1), where M is the number of endogenous variables in the model.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 223 / 243
Determining the Rank of the Matrix

Table: Coefficient matrix for rank test
constant   Y_t   I_t
   0        1     1

The rank of a matrix is the order of its largest non-singular square submatrix.
The rank matrix is formed from the coefficients of the variables (both endogenous and exogenous) excluded from that particular equation but included in the other equations in the model.
The rank condition tells us whether the equation under consideration is identified or not.
The order condition tells us if it is exactly identified or overidentified.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 224 / 243
Steps for Rank Condition
Write down the system in the tabular form
Strike out the coecients corresponding to the equation to be
identied
Strike out the columns corresponding to those coecients in 2 which
are nonzero.
The entries left in the table will give only the coecients of the
variables included in the system but not in the equation under
consideration. From these coecients form all possible A matrices of
order M-1 and obtain a corresponding determinant. If at least one of
these determinants is non-zero then that equation is identied.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 225 / 243
Summary of Order and Rank Conditions of Identification

If (K - k) > (m - 1) and the order of rank(A) is M-1, then the concerned equation is overidentified.
If (K - k) = (m - 1) and the order of rank(A) is M-1, then the equation is exactly identified.
If (K - k) > (m - 1) and the order of rank(A) is less than M-1, then the equation is underidentified.
If (K - k) < (m - 1), the structural equation is unidentified. The order of rank(A) is less than M-1 in this case.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 226 / 243
Keynesian Model: Simultaneity Bias

\hat{\beta}_1 = \frac{\sum c_t y_t}{\sum y_t^2} = \frac{\sum (C_t - \bar{C}) y_t}{\sum y_t^2} = \frac{\sum C_t y_t}{\sum y_t^2}   (547)

\hat{\beta}_1 = \frac{\sum C_t y_t}{\sum y_t^2} = \frac{\sum (\beta_0 + \beta_1 Y_t + u_t) y_t}{\sum y_t^2}   (548)

cov(Y, u) = E\left[ Y_t - E(Y_t) \right] E\left[ u_t - E(u_t) \right] = E\left[ \frac{u_t}{1 - \beta_1} u_t \right] = \frac{\sigma_u^2}{1 - \beta_1}   (549)

p\lim \left( \hat{\beta}_1 \right) = \beta_1 + \frac{\sum u_t y_t}{\sum y_t^2} = \beta_1 + \frac{\sum u_t y_t / T}{\sum y_t^2 / T} = \beta_1 + \frac{\sigma_u^2}{(1 - \beta_1) \sigma_y^2}   (550)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 227 / 243
Identification issue in a Market Model

Consider a relation between quantity and price:
Q_t = \alpha_0 + \alpha_1 P_t + u_t   (551)

A priori it is impossible to say whether this is a demand or a supply model; both of them have the same variables.
If we estimate a regression model like this, how can we be sure whether the parameters belong to a demand or a supply model?
We need extra information. Economic theory suggests that demand is related to the income of individuals, while supply may respond to cost or weather conditions.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 228 / 243
Market Model

Q_t^d = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1,t}   (552)
Q_t^s = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2,t}   (553)

In equilibrium quantity demanded equals quantity supplied, Q_t^d = Q_t^s:
\alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1,t} = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + u_{2,t}   (554)

Solve for P_t:
\alpha_1 P_t - \beta_1 P_t = \beta_0 - \alpha_0 + \beta_2 P_{t-1} - \alpha_2 I_t + u_{2,t} - u_{1,t}   (555)

P_t = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} - \frac{\alpha_2}{\alpha_1 - \beta_1} I_t + \frac{\beta_2}{\alpha_1 - \beta_1} P_{t-1} + \frac{u_{2,t} - u_{1,t}}{\alpha_1 - \beta_1}   (556)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 229 / 243
Reduced Form of the Market Model

Using this price to solve for quantity:

Q_t^d = \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + u_{1,t} = \alpha_0 + \alpha_1 \left[ \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} - \frac{\alpha_2}{\alpha_1 - \beta_1} I_t + \frac{\beta_2}{\alpha_1 - \beta_1} P_{t-1} + \frac{u_{2,t} - u_{1,t}}{\alpha_1 - \beta_1} \right] + \alpha_2 I_t + u_{1,t}

Q_t = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1} - \frac{\alpha_2 \beta_1}{\alpha_1 - \beta_1} I_t + \frac{\alpha_1 \beta_2}{\alpha_1 - \beta_1} P_{t-1} + \frac{\alpha_1 u_{2,t} - \beta_1 u_{1,t}}{\alpha_1 - \beta_1}   (557)

P_t = \pi_{1,0} + \pi_{1,1} P_{t-1} + \pi_{1,2} I_t + V_{1,t}   (558)
Q_t = \pi_{2,0} + \pi_{2,1} P_{t-1} + \pi_{2,2} I_t + V_{2,t}   (559)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 230 / 243
Market Model: Reduced form coefficients

\pi_{1,0} = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1};  \pi_{1,1} = \frac{\beta_2}{\alpha_1 - \beta_1};  \pi_{1,2} = -\frac{\alpha_2}{\alpha_1 - \beta_1};  V_{1,t} = \frac{u_{2,t} - u_{1,t}}{\alpha_1 - \beta_1}

\pi_{2,0} = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1};  \pi_{2,1} = \frac{\alpha_1 \beta_2}{\alpha_1 - \beta_1};  \pi_{2,2} = -\frac{\alpha_2 \beta_1}{\alpha_1 - \beta_1};   (560)

V_{1,t} = \frac{u_{2,t} - u_{1,t}}{\alpha_1 - \beta_1};  V_{2,t} = \frac{\alpha_1 u_{2,t} - \beta_1 u_{1,t}}{\alpha_1 - \beta_1}   (561)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 231 / 243
Recursive estimation

Y_{1,t} = \beta_{10} + \gamma_{11} X_{1,t} + \gamma_{12} X_{2,t} + e_{1,t}   (562)
Y_{2,t} = \beta_{20} + \beta_{21} Y_{1,t} + \gamma_{21} X_{1,t} + \gamma_{22} X_{2,t} + e_{2,t}   (563)
Y_{3,t} = \beta_{30} + \beta_{31} Y_{1,t} + \beta_{33} Y_{2,t} + \gamma_{31} X_{1,t} + \gamma_{32} X_{2,t} + e_{3,t}   (564)

Apply OLS to (562) and get the predicted value \hat{Y}_{1,t}. Then use \hat{Y}_{1,t} in equation (563) and apply OLS to equation (563) to get the predicted value \hat{Y}_{2,t}. Finally use the predicted values \hat{Y}_{1,t} and \hat{Y}_{2,t} to estimate equation (564).
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 232 / 243
Normal equations with instrumental variables

Y_t = \beta_0 + \beta_1 X_t + \beta_2 Y_{t-1} + u_t   (565)

Use X_{t-1} as an instrument for Y_{t-1}.
Normal equations for the two explanatory variable case:

\sum Y_t = \beta_0 N + \beta_1 \sum X_t + \beta_2 \sum X_{t-1}   (566)
\sum X_t Y_t = \beta_0 \sum X_t + \beta_1 \sum X_t^2 + \beta_2 \sum X_{t-1} Y_{t-1}   (567)
\sum X_{t-1} Y_t = \beta_0 \sum X_{t-1} + \beta_1 \sum X_t X_{t-1} + \beta_2 \sum X_{t-1} Y_{t-1}   (568)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 233 / 243
Normal equations with instrumental variables

These differ from the normal equations when instruments are not used:

\sum Y_t = \beta_0 N + \beta_1 \sum X_t + \beta_2 \sum Y_{t-1}   (569)
\sum X_t Y_t = \beta_0 \sum X_t + \beta_1 \sum X_t^2 + \beta_2 \sum X_{t-1} Y_{t-1}   (570)
\sum Y_{t-1} Y_t = \beta_0 \sum Y_{t-1} + \beta_1 \sum X_t Y_{t-1} + \beta_2 \sum Y_{t-1}^2   (571)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 234 / 243
Sargan test (SARG) is used for validity of instruments

Divide the variables into those uncorrelated and those correlated with the error term: X_1, X_2, ..., X_p and Z_1, Z_2, ..., Z_s. Use instruments W_1, W_2, ..., W_p.
Obtain estimates of u_t from the original regression.
Replace Z_1, Z_2, ..., Z_s by the instruments W_1, W_2, ..., W_p.
Regress u_t on all the X's and W's but exclude the Z's. Obtain the R^2 of this regression.
Compute the SARG statistic SARG = (n - k) R^2, where n is the number of observations and k is the number of coefficients; SARG follows a \chi^2 distribution with df = s - p.
H_0: the W instruments are valid. If the computed SARG exceeds the \chi^2 critical value, H_0 is rejected and at least one instrument is not valid.
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 235 / 243
Two Stage Least Squares Estimation (2SLS)

Consider a hybrid of the Keynesian and classical models in which income Y_{1,t} is a function of money Y_{2,t}, investment X_{1,t} and government spending X_{2,t}:

Y_{1,t} = \beta_{1,0} + \beta_{11} Y_{2,t} + \gamma_{11} X_{1,t} + \gamma_{12} X_{2,t} + e_{1,t}   (572)
Y_{2,t} = \beta_{2,0} + \beta_{21} Y_{1,t} + e_{2,t}   (573)

First regress Y_{1,t} on all exogenous variables:
Y_{1,t} = \hat{\pi}_{1,0} + \hat{\pi}_{1,1} X_{1,t} + \hat{\pi}_{1,2} X_{2,t} + e_{1,t}   (574)

Obtain the predicted \hat{Y}_{1,t}:
\hat{Y}_{1,t} = \hat{\pi}_{1,0} + \hat{\pi}_{1,1} X_{1,t} + \hat{\pi}_{1,2} X_{2,t}   (575)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 236 / 243
Two Stage Least Squares Estimation (2SLS)

Y_{1,t} = \hat{Y}_{1,t} + e_{1,t}   (576)

In the second stage put this into the money supply equation:
Y_{2,t} = \beta_{2,0} + \beta_{21} Y_{1,t} + e_{2,t}
Y_{2,t} = \beta_{2,0} + \beta_{21} \left( \hat{Y}_{1,t} + e_{1,t} \right) + e_{2,t}   (577)
Y_{2,t} = \beta_{2,0} + \beta_{21} \hat{Y}_{1,t} + \beta_{21} e_{1,t} + e_{2,t}   (578)
Y_{2,t} = \beta_{2,0} + \beta_{21} \hat{Y}_{1,t} + e_{2,t}^*   (579)
e_{2,t}^* = \beta_{21} e_{1,t} + e_{2,t}   (580)

Application of OLS to this equation gives consistent estimators.
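A minimal sketch of these two stages in Python; the simulated data and the helper name two_sls are purely illustrative and only show the mechanics of equations (574)-(579):

import numpy as np

def two_sls(y2, y1, Z):
    """Stage 1: Y1 on the exogenous Z; Stage 2: Y2 on a constant and fitted Y1."""
    pi = np.linalg.lstsq(Z, y1, rcond=None)[0]          # reduced form, eq. (574)
    y1_hat = Z @ pi                                     # fitted values, eq. (575)
    W = np.column_stack([np.ones_like(y1_hat), y1_hat])
    return np.linalg.lstsq(W, y2, rcond=None)[0]        # second stage, eq. (579)

rng = np.random.default_rng(10)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)          # investment, government spending
y1 = 10 + 2 * x1 + 1.5 * x2 + rng.normal(size=n)         # income
y2 = 0.5 + 0.4 * y1 + rng.normal(size=n)                 # money
Z = np.column_stack([np.ones(n), x1, x2])
print(two_sls(y2, y1, Z))                                # close to (0.5, 0.4)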
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 237 / 243
Restricted Least Squares Estimation

Restrictions in Multiple Regression: Restricted Least Squares Estimation (Judge-Hill-Griffiths-Lutkepohl-Lee (1988): 236)

The OLS procedure minimises the sum of squared error terms:

Min_\beta S(\beta) = e'e = (Y - X\beta)'(Y - X\beta) = Y'Y - Y'(X\beta) - (X\beta)'Y + (X\beta)'(X\beta)   (581)
 = Y'Y - 2\beta'X'Y + (X\beta)'(X\beta)   (582)

\frac{\partial S(\beta)}{\partial \beta} = -2X'Y + 2\hat{\beta} X'X = 0 ==> \hat{\beta} = (X'X)^{-1} X'Y   (583)

Imposing a restriction involves constrained optimisation with a Lagrange multiplier:

L = e'e + 2\lambda' \left( r - R\beta \right) = (Y - X\beta)'(Y - X\beta) + 2\lambda' \left( r - R\beta \right) = Y'Y - 2\beta'X'Y + (X\beta)'(X\beta) + 2\lambda' \left( r - R\beta \right)   (584)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 238 / 243
Restricted Least Squares Estimation

Partial derivatives of this constrained minimisation (Lagrangian) function with respect to \beta and \lambda yield:

\frac{\partial L}{\partial \beta} = -2X'Y + 2X'Xb - 2R'\lambda = 0   (585)
\frac{\partial L}{\partial \lambda} = 2 (r - Rb) = 0   (586)

X'Xb = X'Y + R'\lambda   (587)

b = (X'X)^{-1} X'Y + (X'X)^{-1} R'\lambda   (588)

b = \hat{\beta} + (X'X)^{-1} R'\lambda   (589)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 239 / 243
Restricted Least Squares Estimation

This is the restricted least squares estimator, but it still needs to be solved for \lambda. For that, multiply the above equation on both sides by R:

Rb = R\hat{\beta} + R(X'X)^{-1} R'\lambda   (590)

\lambda = \left[ R(X'X)^{-1} R' \right]^{-1} \left( Rb - R\hat{\beta} \right)   (591)

\lambda = \left[ R(X'X)^{-1} R' \right]^{-1} \left[ r - R\hat{\beta} \right]   (592)

b = \hat{\beta} + (X'X)^{-1} R'\lambda = \hat{\beta} + (X'X)^{-1} R' \left[ R(X'X)^{-1} R' \right]^{-1} \left[ r - R\hat{\beta} \right]   (593)

Thus the restricted least squares estimator is a linear function of the restriction, [r - R\hat{\beta}].
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 240 / 243
Restricted Least Squares Estimation

Thus the restricted least squares estimator is a linear function of the restriction, [r - R\hat{\beta}].

E(b) = E(\hat{\beta}) + (X'X)^{-1} R' \left[ R(X'X)^{-1} R' \right]^{-1} \left[ r - R E(\hat{\beta}) \right]   (594)

E(b) = E(\hat{\beta}) = \beta   (595)

For the variance we need to use the property of an idempotent matrix, AA = A, such as

A = \begin{bmatrix} 0.4 & 0.8 \\ 0.3 & 0.6 \end{bmatrix}   (596)

Recall that in the unrestricted case \hat{\beta} = (X'X)^{-1} X'Y = \beta + (X'X)^{-1} X'e, so

b - \beta = (X'X)^{-1} X'e + (X'X)^{-1} R' \left[ R(X'X)^{-1} R' \right]^{-1} \left[ r - R\beta - R(X'X)^{-1} X'e \right]   (597)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 241 / 243
Restricted Least Squares Estimation

Since R\beta - r = 0,

b - \beta = M (X'X)^{-1} X'e   (598)

where M is the idempotent matrix:

M = I - (X'X)^{-1} R' \left[ R(X'X)^{-1} R' \right]^{-1} R   (599)
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 242 / 243
Restricted Least Squares Estimation

The variance-covariance matrix of b is

cov(b) = E[b - \beta][b - \beta]' = E\left[ M (X'X)^{-1} X'ee'X (X'X)^{-1} M' \right]   (600)

cov(b) = \sigma^2 M (X'X)^{-1} M'   (601)

cov(b) = \sigma^2 M (X'X)^{-1}   (602)

 = \sigma^2 \left[ I - (X'X)^{-1} R' \left[ R(X'X)^{-1} R' \right]^{-1} R \right] (X'X)^{-1}   (603)

Thus the variance of the restricted least squares estimator is smaller than the variance of the unrestricted least squares estimator.
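A minimal sketch of the restricted least squares estimator of equation (593); the restriction beta_1 + beta_2 = 1 and the simulated data are hypothetical illustrations:

import numpy as np

def restricted_ls(y, X, R, r):
    """b = beta_hat + (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (r - R beta_hat), eq. (593)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                              # unrestricted OLS
    adj = XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, r - R @ beta)
    return beta + adj

rng = np.random.default_rng(11)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 0.6 * x1 + 0.4 * x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), x1, x2])
R = np.array([[0.0, 1.0, 1.0]])      # impose beta_1 + beta_2 = 1
r = np.array([1.0])
b = restricted_ls(y, X, R, r)
print(b, R @ b)                      # the restriction R b = r holds exactly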
Dr. Bhattarai (Hull Univ. Business School) Classical and multiple regression February 7, 2011 243 / 243