DERIVING LINEAR REGRESSION COEFFICIENTS

This sequence shows how the regression coefficients for a simple regression model are derived, using the least squares criterion (OLS, for ordinary least squares).

[Figure: scatter diagram of Y against X, with the true model $Y = \beta_1 + \beta_2 X + u$.]
We will start with a numerical example with just three observations: (1, 3), (2, 5), and (3, 6).
[Figure: the three observations together with a candidate fitted line of intercept $b_1$ and slope $b_2$; the fitted values are $\hat{Y}_1 = b_1 + b_2$, $\hat{Y}_2 = b_1 + 2b_2$, and $\hat{Y}_3 = b_1 + 3b_2$.]

Writing the fitted regression as $\hat{Y} = b_1 + b_2 X$, we will determine the values of $b_1$ and $b_2$ that minimize RSS, the sum of the squares of the residuals.
True model: $Y = \beta_1 + \beta_2 X + u$. Fitted model: $\hat{Y} = b_1 + b_2 X$.
Given our choice of $b_1$ and $b_2$, the residuals are as shown:

$$e_1 = Y_1 - \hat{Y}_1 = 3 - b_1 - b_2$$

$$e_2 = Y_2 - \hat{Y}_2 = 5 - b_1 - 2b_2$$

$$e_3 = Y_3 - \hat{Y}_3 = 6 - b_1 - 3b_2$$

The sum of the squares of the residuals is thus

$$RSS = e_1^2 + e_2^2 + e_3^2 = (3 - b_1 - b_2)^2 + (5 - b_1 - 2b_2)^2 + (6 - b_1 - 3b_2)^2$$
The quadratics have been expanded:

$$RSS = 9 + b_1^2 + b_2^2 - 6b_1 - 6b_2 + 2b_1b_2 \;+\; 25 + b_1^2 + 4b_2^2 - 10b_1 - 20b_2 + 4b_1b_2 \;+\; 36 + b_1^2 + 9b_2^2 - 12b_1 - 36b_2 + 6b_1b_2$$
Like terms have been added together:

$$RSS = 70 + 3b_1^2 + 14b_2^2 - 28b_1 - 62b_2 + 12b_1b_2$$

For a minimum, the partial derivatives of RSS with respect to $b_1$ and $b_2$ should be zero. (We should also check a second-order condition.)
The first-order conditions give us two equations in two unknowns:

$$\frac{\partial RSS}{\partial b_1} = 6b_1 + 12b_2 - 28 = 0$$

$$\frac{\partial RSS}{\partial b_2} = 12b_1 + 28b_2 - 62 = 0$$
Solving them, we find that RSS is minimized when $b_1$ and $b_2$ are equal to 1.67 and 1.50, respectively:

$$b_1 = 1.67, \qquad b_2 = 1.50$$
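The derivation above can be checked by machine. Below is a minimal sketch, assuming Python with the sympy library is available; it rebuilds RSS from the three observations, expands it, and solves the two first-order conditions, reproducing $b_1 = 5/3 \approx 1.67$ and $b_2 = 3/2 = 1.50$.

```python
# Minimal sketch (assumes Python with sympy): rebuild RSS for the
# observations (1,3), (2,5), (3,6) and solve the first-order conditions.
import sympy as sp

b1, b2 = sp.symbols('b1 b2')
data = [(1, 3), (2, 5), (3, 6)]

# RSS as a function of the choice variables b1 and b2
RSS = sum((Y - b1 - b2 * X) ** 2 for X, Y in data)
# Equals 70 + 3*b1**2 + 14*b2**2 - 28*b1 - 62*b2 + 12*b1*b2
# (printed term order may differ)
print(sp.expand(RSS))

# First-order conditions: both partial derivatives equal to zero
sol = sp.solve([sp.diff(RSS, b1), sp.diff(RSS, b2)], [b1, b2])
print(sol)  # {b1: 5/3, b2: 3/2}
```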
Here is the scatter diagram again.

[Figure: the scatter diagram with the fitted line $\hat{Y} = b_1 + b_2 X$ drawn through the three observations.]
The fitted line and the fitted values of Y are as shown:

$$\hat{Y} = 1.67 + 1.50X, \qquad \hat{Y}_1 = 3.17, \quad \hat{Y}_2 = 4.67, \quad \hat{Y}_3 = 6.17$$
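As a quick check on these values, the following sketch (plain Python, no libraries) evaluates the fitted line at the three X values. Note that the residuals sum to zero, as they always do when the regression includes an intercept.

```python
# Evaluate the fitted line Yhat = 1.67 + 1.50*X at the three observations.
b1, b2 = 5 / 3, 3 / 2
data = [(1, 3), (2, 5), (3, 6)]

for X, Y in data:
    Y_hat = b1 + b2 * X
    print(f"X={X}: fitted {Y_hat:.2f}, residual {Y - Y_hat:+.2f}")
# X=1: fitted 3.17, residual -0.17
# X=2: fitted 4.67, residual +0.33
# X=3: fitted 6.17, residual -0.17
```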
Before we move on to the general case, it is as well to make a small but important mathematical point.
When we establish the expression for RSS, we do so as a function of $b_1$ and $b_2$. At this stage, $b_1$ and $b_2$ are not specific values. Our task is to determine the particular values that minimize RSS.
We should give these values special names, to differentiate them from the rest.
Obvious names would be $b_1^{OLS}$ and $b_2^{OLS}$, OLS standing for Ordinary Least Squares and meaning that these are the values that minimize RSS. We have re-written the first-order conditions and their solution accordingly:

$$\frac{\partial RSS}{\partial b_1} = 6b_1^{OLS} + 12b_2^{OLS} - 28 = 0$$

$$\frac{\partial RSS}{\partial b_2} = 12b_1^{OLS} + 28b_2^{OLS} - 62 = 0$$

$$b_1^{OLS} = 1.67, \qquad b_2^{OLS} = 1.50$$
Now we will proceed to the general case with n observations.

[Figure: scatter diagram for the general case, with observations $(X_1, Y_1), \dots, (X_n, Y_n)$ and the true model $Y = \beta_1 + \beta_2 X + u$.]
Given our choice of $b_1$ and $b_2$, we will obtain a fitted line $\hat{Y} = b_1 + b_2 X$, with fitted values running from $\hat{Y}_1 = b_1 + b_2 X_1$ to $\hat{Y}_n = b_1 + b_2 X_n$.
The residual for the first observation is defined as $e_1 = Y_1 - \hat{Y}_1 = Y_1 - b_1 - b_2 X_1$. Similarly we define the residuals for the remaining observations, down to $e_n = Y_n - \hat{Y}_n = Y_n - b_1 - b_2 X_n$.
RSS, the sum of the squares of the residuals, is defined for the general case just as in the numerical example:

$$RSS = e_1^2 + \dots + e_n^2 = (Y_1 - b_1 - b_2X_1)^2 + \dots + (Y_n - b_1 - b_2X_n)^2$$
The quadratics are expanded:

$$RSS = Y_1^2 + b_1^2 + b_2^2X_1^2 - 2b_1Y_1 - 2b_2X_1Y_1 + 2b_1b_2X_1 + \dots + Y_n^2 + b_1^2 + b_2^2X_n^2 - 2b_1Y_n - 2b_2X_nY_n + 2b_1b_2X_n$$

Like terms are added together:

$$RSS = \sum Y_i^2 + nb_1^2 + b_2^2\sum X_i^2 - 2b_1\sum Y_i - 2b_2\sum X_iY_i + 2b_1b_2\sum X_i$$
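As a sanity check on this algebra, the sketch below (sympy again assumed) compares the direct definition of RSS with the collected-sums form, using the three observations of the numerical example as data; the difference expands to zero.

```python
# Verify that the collected-sums form of RSS matches the direct definition.
import sympy as sp

b1, b2 = sp.symbols('b1 b2')
X = [1, 2, 3]
Y = [3, 5, 6]
n = len(X)

direct = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))
sums = (sum(y ** 2 for y in Y) + n * b1 ** 2 + b2 ** 2 * sum(x ** 2 for x in X)
        - 2 * b1 * sum(Y) - 2 * b2 * sum(x * y for x, y in zip(X, Y))
        + 2 * b1 * b2 * sum(X))

print(sp.expand(direct - sums))  # 0
```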
Note that in this equation the observations on X and Y are just data that determine the coefficients in the expression for RSS. Compare the corresponding expression in the numerical example, $RSS = 70 + 3b_1^2 + 14b_2^2 - 28b_1 - 62b_2 + 12b_1b_2$.
The choice variables in the expression are $b_1$ and $b_2$. This may seem a bit strange, because in elementary calculus courses $b_1$ and $b_2$ are usually constants and X and Y are variables.
However, if you have any doubts, compare what we are doing in the general case with what we did in the numerical example.
The first derivative with respect to $b_1$:

$$\frac{\partial RSS}{\partial b_1} = 2nb_1 - 2\sum Y_i + 2b_2\sum X_i = 0$$
With some simple manipulation we obtain a tidy expression for $b_1$:

$$nb_1 = \sum Y_i - b_2\sum X_i \quad\Rightarrow\quad b_1 = \bar{Y} - b_2\bar{X}$$

Next, the first derivative with respect to $b_2$:
$$\frac{\partial RSS}{\partial b_2} = 2b_2\sum X_i^2 - 2\sum X_iY_i + 2b_1\sum X_i = 0$$

Divide through by 2:

$$b_2\sum X_i^2 - \sum X_iY_i + b_1\sum X_i = 0$$
We now substitute for $b_1$, using the expression obtained for it, and we thus obtain an equation that contains $b_2$ only:

$$b_2\sum X_i^2 - \sum X_iY_i + (\bar{Y} - b_2\bar{X})\sum X_i = 0$$
The definition of the sample mean, $\bar{X} = \frac{1}{n}\sum X_i$, so that $\sum X_i = n\bar{X}$, has been used:

$$b_2\sum X_i^2 - \sum X_iY_i + (\bar{Y} - b_2\bar{X})\,n\bar{X} = 0$$
The last two terms have been disentangled:

$$b_2\sum X_i^2 - \sum X_iY_i + n\bar{X}\bar{Y} - nb_2\bar{X}^2 = 0$$

Terms not involving $b_2$ have been transferred to the right side:

$$b_2\left(\sum X_i^2 - n\bar{X}^2\right) = \sum X_iY_i - n\bar{X}\bar{Y}$$
Hence we obtain an expression for $b_2$:

$$b_2 = \frac{\sum X_iY_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$$

In practice, we shall use an alternative expression. We will demonstrate that it is equivalent:

$$b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
Expanding the numerator, we obtain the terms shown:

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum \left( X_iY_i - X_i\bar{Y} - \bar{X}Y_i + \bar{X}\bar{Y} \right)$$

In the second term the mean value of Y is a common factor. In the third, the mean value of X is a common factor. The last term is the same for all i:

$$= \sum X_iY_i - \bar{Y}\sum X_i - \bar{X}\sum Y_i + n\bar{X}\bar{Y}$$
We use the definitions of the sample means, $\sum X_i = n\bar{X}$ and $\sum Y_i = n\bar{Y}$, to simplify the expression:

$$= \sum X_iY_i - n\bar{X}\bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} = \sum X_iY_i - n\bar{X}\bar{Y}$$

Hence we have shown that the numerators of the two expressions are the same.
The denominator is mathematically a special case of the numerator, replacing Y by X:

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_iY_i - n\bar{X}\bar{Y}, \qquad \sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2$$

Hence the expressions are equivalent.
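A quick numerical illustration of the equivalence, assuming numpy is available, using the three observations from the example; both formulas give $b_2 = 1.5$.

```python
# Both expressions for b2, evaluated on the data (1,3), (2,5), (3,6).
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([3.0, 5.0, 6.0])
n = len(X)
Xbar, Ybar = X.mean(), Y.mean()

# b2 = (sum XiYi - n Xbar Ybar) / (sum Xi^2 - n Xbar^2)
b2_raw = (np.sum(X * Y) - n * Xbar * Ybar) / (np.sum(X ** 2) - n * Xbar ** 2)

# b2 = sum (Xi - Xbar)(Yi - Ybar) / sum (Xi - Xbar)^2
b2_dev = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)

print(b2_raw, b2_dev)  # 1.5 1.5
```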
The scatter diagram is shown again. We will summarize what we have done. We hypothesized that the true model is $Y = \beta_1 + \beta_2 X + u$, we obtained some data, and we fitted the line $\hat{Y} = b_1 + b_2 X$.
We chose the parameters of the fitted line so as to minimize the sum of the squares of the residuals. As a result, we derived the expressions for $b_1$ and $b_2$:

$$b_1 = \bar{Y} - b_2\bar{X}, \qquad b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
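These two formulas translate directly into code. The following is a minimal sketch, assuming numpy; on the data of the numerical example it reproduces $b_1 \approx 1.67$ and $b_2 = 1.50$.

```python
# OLS estimators for the simple regression Yhat = b1 + b2*X.
import numpy as np

def ols_simple(X, Y):
    """Return (b1, b2) minimizing the residual sum of squares."""
    Xbar, Ybar = X.mean(), Y.mean()
    b2 = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)
    b1 = Ybar - b2 * Xbar
    return b1, b2

X = np.array([1.0, 2.0, 3.0])
Y = np.array([3.0, 5.0, 6.0])
b1, b2 = ols_simple(X, Y)
print(round(b1, 2), round(b2, 2))  # 1.67 1.5
```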
Again, we should make the mathematical point discussed in the context of the numerical example. These are the particular values of $b_1$ and $b_2$ that minimize RSS, and we should differentiate them from the rest by giving them special names, for example $b_1^{OLS}$ and $b_2^{OLS}$:

$$b_1^{OLS} = \bar{Y} - b_2^{OLS}\bar{X}, \qquad b_2^{OLS} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
However, for the next few chapters we shall mostly be concerned with the OLS estimators, and so the superscript 'OLS' is not really necessary. It will be dropped, to simplify the notation.
Typically, an intercept should be included in the regression specification. Occasionally, however, one may have reason to fit the regression without an intercept. In the case of a simple regression model, the true and fitted models become

$$\text{True model: } Y = \beta_2 X + u \qquad \text{Fitted model: } \hat{Y} = b_2 X$$
We will derive the expression for $b_2$ from first principles using the least squares criterion. The residual in observation i is

$$e_i = Y_i - \hat{Y}_i = Y_i - b_2X_i$$
With this, we obtain the expression for the sum of the squares of the residuals:

$$RSS = \sum (Y_i - b_2X_i)^2 = \sum Y_i^2 - 2b_2\sum X_iY_i + b_2^2\sum X_i^2$$

We differentiate with respect to $b_2$. The OLS estimator is the value that makes this slope equal to zero (the first-order condition for a minimum). Note that we have differentiated properly between the general $b_2$ and the specific $b_2^{OLS}$.
$$\frac{dRSS}{db_2} = 2b_2\sum X_i^2 - 2\sum X_iY_i, \qquad 2b_2^{OLS}\sum X_i^2 - 2\sum X_iY_i = 0$$
Hence, we obtain the OLS estimator of $b_2$ for this model:

$$b_2^{OLS} = \frac{\sum X_iY_i}{\sum X_i^2}$$
The second derivative is positive, confirming that we have found a minimum:

$$\frac{d^2 RSS}{db_2^2} = 2\sum X_i^2 > 0$$
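A brief sketch of this estimator, again assuming numpy; `np.linalg.lstsq` with a single-column design matrix fits the same no-intercept model, so the two values agree.

```python
# No-intercept OLS slope: b2 = sum(Xi*Yi) / sum(Xi^2).
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([3.0, 5.0, 6.0])

b2 = np.sum(X * Y) / np.sum(X ** 2)  # 31/14, about 2.214
b2_lstsq, *_ = np.linalg.lstsq(X[:, None], Y, rcond=None)
print(b2, b2_lstsq[0])  # the same value twice
```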
Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 1.3 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics (http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx) or the University of London International Programmes distance learning course EC2020 Elements of Econometrics (www.londoninternational.ac.uk/lse).

2012.10.28