$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n, \tag{1.1}$$
where $\beta_0$ and $\beta_1$ are unknown parameters and $\varepsilon_i$ is the random error.
EXAMPLE 1 (Soybean Yield and Fertilizer) Suppose that soybean yield is determined by the model
$$\text{yield} = \beta_0 + \beta_1\,\text{fertilizer} + \varepsilon.$$
Here $y$ is the soybean yield of each plot, $x$ is the amount of fertilizer applied to each plot, and $\varepsilon$ contains other factors such as land quality, rainfall, and so on.
$$\text{wage} = \beta_0 + \beta_1\,\text{educ} + \varepsilon.$$
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i,$$
$$e_i = y_i - \hat y_i = y_i - \hat\beta_0 - \hat\beta_1 x_i.$$
• Fact 2:
$$\sum_{i=1}^n (x_i - \bar x)^2 = \sum_{i=1}^n (x_i - \bar x)x_i = \sum_{i=1}^n x_i^2 - n\bar x^2. \tag{2.2}$$
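Identity (2.2) is easy to verify numerically. A minimal sketch, using made-up numbers rather than data from the text:

```python
# Numerical check of identity (2.2); the data are made up for illustration.
x = [1.0, 2.0, 4.0, 7.0]
n = len(x)
xbar = sum(x) / n

lhs = sum((xi - xbar) ** 2 for xi in x)          # sum (x_i - xbar)^2
mid = sum((xi - xbar) * xi for xi in x)          # sum (x_i - xbar) x_i
rhs = sum(xi ** 2 for xi in x) - n * xbar ** 2   # sum x_i^2 - n xbar^2

print(lhs, mid, rhs)  # all three equal 21.0
```

The middle form follows because $\sum (x_i - \bar x)\bar x = 0$, which is exactly what the check confirms.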
DEFINITION 2 (Least squares (LS) method) Choose $\hat\beta_0$ and $\hat\beta_1$ to minimize the sum of squared residuals, that is,

find $\hat\beta_0, \hat\beta_1$ to minimize $Q = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$ w.r.t. $\hat\beta_0, \hat\beta_1$.

The LSEs of $\beta_0$ and $\beta_1$ are found by solving the normal equations for $\hat\beta_0, \hat\beta_1$:
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x \quad\text{and}\quad \hat\beta_1 = S_{xy}/S_{xx},$$
where the sample mean $\bar y$ is defined as $\bar y = \frac{1}{n}\sum_{i=1}^n y_i$ and the sums of squares are defined by
$$S_{xx} = \sum_{i=1}^n (x_i - \bar x)^2, \qquad S_{xy} = \sum_{i=1}^n (x_i - \bar x)(y_i - \bar y).$$
The fitted regression line is
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i \quad\text{or}\quad \hat y_i - \bar y = \hat\beta_1 (x_i - \bar x).$$
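The formulas in Definition 2 translate directly into code. A minimal sketch on made-up data; the helper name `ls_fit` is ours, not from the text:

```python
def ls_fit(x, y):
    """Least squares estimates (b0, b1) via the normal equations."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx          # slope: S_xy / S_xx
    b0 = ybar - b1 * xbar   # intercept: ybar - b1 * xbar
    return b0, b1

# Made-up illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ls_fit(x, y)
print(b0, b1)  # approximately 0.05 and 1.99
```

On data that lie exactly on a line, `ls_fit` recovers the line's intercept and slope exactly.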
Figure 1: Residual for the ith observation.
$$\hat\beta_0 = 1.280, \qquad \hat\beta_1 = 0.0526.$$
Fitted regression line:
$$\hat y = 1.280 + 0.0526\,x,$$
which is shown in Figure 1.1(b) for reference.
Note: Graphically, the regression line is chosen to minimize the sum of squared
vertical departures of all observations from the line.
First, the sum of the residuals is zero, i.e.
$$\sum_{i=1}^n e_i = 0,$$
in view of (2.3). Second, the sum of the observed values equals the sum of the fitted values:
$$\sum_{i=1}^n y_i = \sum_{i=1}^n \hat y_i.$$
Third, the sample covariance between the regressors and the LS residuals is zero; mathematically, by (2.4),
$$\sum_{i=1}^n x_i e_i = 0.$$
Lastly, the regression line always goes through the point $(\bar x, \bar y)$.
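All four properties can be checked numerically. A sketch on made-up data (the variable names are ours):

```python
# Verify the four residual properties of the LS fit on made-up data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * xi for xi in x]
e = [yi - yh for yi, yh in zip(y, yhat)]

prop1 = sum(e)                                 # sum of residuals = 0
prop2 = sum(y) - sum(yhat)                     # observed sum = fitted sum
prop3 = sum(xi * ei for xi, ei in zip(x, e))   # sum x_i e_i = 0
prop4 = (b0 + b1 * xbar) - ybar                # line passes through (xbar, ybar)
```

All four quantities come out as zero (up to floating-point rounding).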
4 Estimation of $\sigma^2$ in SLR
It seems reasonable to assume that the greater the variability of the random error (measured by its variance $\sigma^2$), the greater will be the errors in the estimation of the model parameters $\beta_0$ and $\beta_1$, and in the error of prediction when $\hat y$ is used to predict $y$ for some value of $x$. Consequently, you should not be surprised, as we proceed through this chapter, to find that $\sigma^2$ appears in the formulas for all confidence intervals and test statistics that we use.
In most practical situations, $\sigma^2$ will be unknown and we must use the data to estimate its value.
Recall the way of estimating $\sigma^2$ from an i.i.d. sample $Y_1, \ldots, Y_n$ with $EY_i = \mu$ and $\mathrm{var}(Y_i) = \sigma^2$.
• find
$$\sum_{i=1}^n (Y_i - \widehat{EY_i})^2 = \sum_{i=1}^n (Y_i - \bar Y)^2.$$
Square the difference between each observation and the estimate of its mean.
• find
$$\sum_{i=1}^n (y_i - \widehat{Ey_i})^2 = \sum_{i=1}^n \big(y_i - (\hat\beta_0 + \hat\beta_1 x_i)\big)^2 = SSE.$$
Square the difference between each observation and the estimate of its mean. Here SSE denotes the error sum of squares.
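A sketch of the computation on made-up data. We assume the usual unbiased SLR estimator $s^2 = SSE/(n-2)$, dividing by the $n-2$ degrees of freedom left after estimating the two parameters (the text introduces $s^2$ later without restating this formula):

```python
# SSE and the error variance estimate s^2 = SSE / (n - 2)
# (standard SLR estimator, assumed here; the data are made up).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)   # two parameters estimated, so n - 2 degrees of freedom
```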
PROOF Note that
$$\hat\beta_1 = \frac{1}{S_{xx}} \sum_{i=1}^n (x_i - \bar x)(y_i - \bar y) = \frac{1}{S_{xx}}\Big[\sum_{i=1}^n (x_i - \bar x)y_i - \bar y \sum_{i=1}^n (x_i - \bar x)\Big]$$
$$= \frac{1}{S_{xx}} \sum_{i=1}^n (x_i - \bar x)y_i = \sum_{i=1}^n \frac{x_i - \bar x}{S_{xx}}\, y_i = \sum_{i=1}^n k_i y_i \quad\text{with } k_i = \frac{x_i - \bar x}{S_{xx}},$$
and
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = \sum_{i=1}^n \Big(\frac{1}{n} - k_i \bar x\Big) y_i = \sum_{i=1}^n l_i y_i \quad\text{with } l_i = \frac{1}{n} - k_i \bar x.$$
Thus Theorem 2 is true and hence $\hat\beta_0$ and $\hat\beta_1$ are normal variables by Fact 3. What about their means and variances?
We have
$$\sum_{i=1}^n k_i = \frac{1}{S_{xx}} \sum_{i=1}^n (x_i - \bar x) = 0$$
and
$$\sum_{i=1}^n k_i x_i = \sum_{i=1}^n \frac{x_i - \bar x}{S_{xx}}\, x_i = \frac{S_{xx}}{S_{xx}} = 1.$$
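The weights $k_i$ and $l_i$ and their identities can be checked numerically. A sketch on made-up data:

```python
# Check: b1 = sum k_i y_i, b0 = sum l_i y_i, sum k_i = 0, sum k_i x_i = 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

k = [(xi - xbar) / sxx for xi in x]     # k_i = (x_i - xbar) / Sxx
l = [1.0 / n - ki * xbar for ki in k]   # l_i = 1/n - k_i * xbar

b1_from_k = sum(ki * yi for ki, yi in zip(k, y))
b0_from_l = sum(li * yi for li, yi in zip(l, y))
sum_k = sum(k)                                  # should be 0
sum_kx = sum(ki * xi for ki, xi in zip(k, x))   # should be 1
```

This makes concrete the proof's point: both estimators are linear combinations of the $y_i$.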
We also have
$$\mathrm{Var}(\hat\beta_1) = \mathrm{Var}\Big(\sum_{i=1}^n k_i y_i\Big) = \sum_{i=1}^n k_i^2\, \mathrm{Var}(y_i) = \sigma^2 \sum_{i=1}^n k_i^2 = \frac{\sigma^2}{S_{xx}},$$
where
$$\sum_{i=1}^n k_i^2 = \sum_{i=1}^n \frac{(x_i - \bar x)^2}{S_{xx}^2} = \frac{1}{S_{xx}}.$$
$$\frac{\hat\beta_1 - \beta_1}{\sqrt{\sigma^2 / S_{xx}}} \sim N(0, 1).$$
But this is not useful because we do not know $\sigma^2$. Replacing $\sigma^2$ with the estimator $s^2$, we obtain
$$\frac{\hat\beta_1 - \beta_1}{\sqrt{s^2 / S_{xx}}} \sim t(n - 2).$$
In what follows, $\alpha$ is the type I error probability $\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$, always between 0 and 1 and usually set at 0.01, 0.05 or 0.10. The level-$\alpha$ hypothesis tests concerning $\beta_1$ are classified as follows:
• A. Two-sided test $H_0: \beta_1 = c$, $H_1: \beta_1 \neq c$
Alternatively, reject $H_0$ in case A if the $p$-value $= P(|t_{n-2}| > |T_{\text{observed}}|)$ is smaller than $\alpha$. Moreover, the confidence interval for case A is
$$\hat\beta_1 \pm s\, S_{xx}^{-1/2}\, t_{n-2}^{(\alpha/2)}.$$
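A sketch of the two-sided test and interval for $H_0: \beta_1 = 0$ on made-up data. The critical value 3.182 is $t_3^{(0.025)}$, looked up from a t table for df $= n - 2 = 3$:

```python
# t statistic and 95% CI for beta_1; t critical value taken from a table.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = (sse / (n - 2)) ** 0.5

t_crit = 3.182                 # t_{n-2}^{(alpha/2)} for alpha = 0.05, df = 3
se_b1 = s / sxx ** 0.5         # standard error of b1: sqrt(s^2 / Sxx)
t_stat = (b1 - 0.0) / se_b1    # test H0: beta_1 = 0
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
reject = abs(t_stat) > t_crit
```

Here $|T| \approx 33$ far exceeds 3.182, so $H_0$ is rejected at the 5% level.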
$$E(y) = \beta_0 + \beta_1 x_0.$$
EXAMPLE 5 Example 1.1 (contd)
Suppose you are thinking of predicting the cholesterol level of several patients whose age is 60. Then the predicted value is $\hat\beta_0 + 60\hat\beta_1 = 4.436$. The 95% confidence interval for the mean response becomes
$$4.436 \pm 0.3340\,\Big[\frac{1}{24} + \frac{(60 - 39.417)^2}{4139.833}\Big]^{1/2} \times 2.074,$$
i.e. $[4.173, 4.699]$.
$$\hat y_{\text{new}} = \hat\beta_0 + \hat\beta_1 x_0,$$
as in (5.3). Note that the future response $y_{\text{new}}$ is independent of $\hat y_{\text{new}}$, which depends on the past observations $x_1, \ldots, x_n$. Since it is a linear combination of normal random variables, $y_{\text{new}} - \hat y_{\text{new}}$ is a normal random variable with mean
$$E(y_{\text{new}}) - E(\hat y_{\text{new}}) = 0$$
and variance
$$\mathrm{Var}(y_{\text{new}} - \hat y_{\text{new}}) = \mathrm{Var}(y_{\text{new}}) + \mathrm{Var}(\hat y_{\text{new}}) = \sigma^2 + \sigma^2\Big[\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\Big].$$
Therefore
$$\frac{\hat y_{\text{new}} - y_{\text{new}}}{s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}} \sim t(n - 2).$$
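Inverting the t ratio above gives the prediction interval $\hat y_{\text{new}} \pm t_{n-2}^{(\alpha/2)}\, s \sqrt{1 + 1/n + (x_0 - \bar x)^2/S_{xx}}$. A sketch on made-up data, with the critical value again taken from a t table:

```python
# 95% prediction interval for a new response at x0 (made-up data;
# t critical value 3.182 for df = 3 taken from a t table).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
s = (sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)) ** 0.5

x0 = 6.0
pred = b0 + b1 * x0
half = 3.182 * s * (1 + 1 / n + (x0 - xbar) ** 2 / sxx) ** 0.5
interval = (pred - half, pred + half)
```

The leading 1 under the square root is what distinguishes this interval from the narrower confidence interval for the mean response.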
EXAMPLE 6 Example 1.1 (contd)
Suppose we wish to predict the cholesterol level of a future patient whose age is 60, say. Then the predicted value is
$$\hat\beta_0 + 60\hat\beta_1 = 4.436.$$
6 Analysis of variance
An analysis of variance is a formal method, tabulating the results in an analysis of variance (ANOVA) table, to check whether the fitted model is adequate. It provides a different way of looking at what we have already done.
Consider the decomposition
$$y_i - \bar y = (\hat y_i - \bar y) + (y_i - \hat y_i).$$
and
$$S_{yy} = \sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n \big[(y_i - \hat y_i) + (\hat y_i - \bar y)\big]^2$$
$$= \sum_{i=1}^n (y_i - \hat y_i)^2 + \sum_{i=1}^n (\hat y_i - \bar y)^2 + 2\sum_{i=1}^n (y_i - \hat y_i)(\hat y_i - \bar y)$$
$$= SSE + SSR,$$
since the cross-product term is zero by the residual properties. The coefficient of determination $R^2 = SSR/S_{yy}$ measures how strong the linear relationship between $x$ and $y$ is. The higher the $R^2$, the stronger the linear relationship.
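The decomposition $S_{yy} = SSE + SSR$ and the $R^2$ value can be verified numerically. A sketch on made-up data:

```python
# Check Syy = SSE + SSR and compute R^2 = SSR / Syy (made-up data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

syy = sum((yi - ybar) ** 2 for yi in y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
ssr = sum((yh - ybar) ** 2 for yh in yhat)
r2 = ssr / syy   # close to 1 here: the made-up data are nearly linear
```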
Based on the ANOVA table there is another way to test
$$H_0: \beta_1 = 0 \quad\text{vs}\quad H_1: \beta_1 \neq 0.$$
Under the assumption that $\beta_1 = 0$, the F-ratio $F = MSR/MSE$ follows an $F(1, n-2)$ distribution. Hence, this F-ratio can be used to check whether or not the predictor variable $X$ significantly contributes to the explanation of the response variable $Y$. The rejection rule is to reject $H_0$ if $F > F^{(\alpha)}(1, n-2)$. Note that we now know the exact distribution of $F$, and hence the $p$-value can be calculated accordingly. Loosely speaking,
• For a size-$\alpha$ test, we read from standard statistical tables or computer software:

$\alpha$   Critical value $F_{1,22}^{(\alpha)}$
5%   4.301
1%   7.945
In either case, the F-ratio 102.7473 is much bigger than the critical value
and we should reject H0 at the stated significance level.
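The F-ratio computation can be sketched as follows on made-up data; the 5% critical value $F^{(0.05)}(1, 3) \approx 10.13$ is taken from an F table for these degrees of freedom:

```python
# F test of H0: beta_1 = 0 via the ANOVA quantities (made-up data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
msr = ssr / 1          # 1 degree of freedom for regression
mse = sse / (n - 2)    # n - 2 degrees of freedom for error
f_ratio = msr / mse
reject = f_ratio > 10.13   # F-table critical value for (1, 3) df at 5%
```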
6.2 $R^2$ statistic and F statistic
A significant result for an F test of the regression does not necessarily imply a big $R^2$. The test is used to judge whether there is evidence of a linear relationship between $Y$ and $X$, but $R^2$ is used to measure how good that relationship is. To illustrate this point, we consider some examples below.
X 1 2 3 4 5 6 7 8 9 10 11 12
Y 0.700 0.193 3.423 6.553 7.680 10.245 1.223 7.338 12.285 6.564 8.711 6.144
• $p$-value $= P(F_{1,10} > 5.152688) = 0.0466$, which is smaller than 0.05. (Alternatively, check that the F-ratio $5.153 > F_{1,10}^{(0.05)} = 4.965$.) Hence the linear relationship between $X$ and $Y$ is significant at the 5% level.
• The quantity $R^2 = 0.3401$, which shows that the regression line is far from being a good fit to the data.
This illustrates the earlier remark that a significant linear relationship is not necessarily accompanied by a good fit of the regression line.
[ In fact, significant linear relationship means SLR is better than no SLR, but
does not mean that the SLR is a good fit. ]
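The quantities quoted above can be reproduced from the data table; a sketch (the tolerances allow for rounding in the text):

```python
# Reproduce R^2 = 0.3401 and F = 5.153 from the X, Y data above.
x = list(range(1, 13))
y = [0.700, 0.193, 3.423, 6.553, 7.680, 10.245,
     1.223, 7.338, 12.285, 6.564, 8.711, 6.144]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

syy = sum((yi - ybar) ** 2 for yi in y)
ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = syy - ssr
r2 = ssr / syy                          # ~0.3401: a weak fit
f_ratio = (ssr / 1) / (sse / (n - 2))   # ~5.153: significant at the 5% level
```

Both numbers match the text: the slope is significant, yet the fit is poor.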
Figure 3 shows the scatterplot of the data, together with the fitted regression line. Clearly, the observed $Y$ values display a strong upward trend as $X$ increases, but the observations still suffer from large fluctuations around the fitted line.
Generally, if $T \sim t(n-2)$ then $T^2 \sim F(1, n-2)$. This implies that the F test rejects if and only if the t-test rejects.
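For the fitted statistics the relationship is exact: the squared t statistic for $H_0: \beta_1 = 0$ equals the F-ratio. A quick numerical check on made-up data:

```python
# Check that the squared t statistic equals the F-ratio (made-up data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
ssr = sum((yh - ybar) ** 2 for yh in yhat)
s2 = sse / (n - 2)

t_stat = b1 / (s2 / sxx) ** 0.5   # t statistic for H0: beta_1 = 0
f_ratio = ssr / s2                # F = MSR / MSE with MSR = SSR / 1
```

Algebraically, $T^2 = \hat\beta_1^2 S_{xx} / s^2 = SSR / s^2 = F$, which the check confirms.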
Figure 3: Scatterplot of X vs Y.
$$y_i = \beta_1 x_i + \varepsilon_i.$$