Chapter 8

8. Multiple Regression: Tests of Hypothesis & Confidence Intervals
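Consider hypothesis tests and confidence intervals (CIs) for the parameters $\beta_0, \beta_1, \ldots, \beta_k$ in $\boldsymbol{\beta}$ in the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$.

Assume throughout the chapter that $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, where $\mathbf{X}$ is $n \times (k+1)$ of rank $k + 1 < n$. The $x$'s are fixed constants.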

8.1 Test of Overall Regression
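The hypothesis is expressed as $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$, where $\boldsymbol{\beta}_1 = (\beta_1, \beta_2, \ldots, \beta_k)'$. We want to test $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$, not $H_0: \boldsymbol{\beta} = \mathbf{0}$, where
$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \boldsymbol{\beta}_1 \end{pmatrix}.$$
Use the centered model (7.28),
$$\mathbf{y} = (\mathbf{j}, \mathbf{X}_c)\begin{pmatrix} \alpha \\ \boldsymbol{\beta}_1 \end{pmatrix} + \boldsymbol{\varepsilon},$$
where
$$\mathbf{X}_c = \left(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\right)\mathbf{X}_1 = \begin{pmatrix} x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1k} - \bar{x}_k \\ x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2k} - \bar{x}_k \\ \vdots & \vdots & & \vdots \\ x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{nk} - \bar{x}_k \end{pmatrix}$$
is the centered matrix and $\mathbf{X}_1$ contains all the columns of $\mathbf{X}$ except the first. The total sum of squares $\sum_{i=1}^n (y_i - \bar{y})^2$ can be partitioned as
$$\sum_{i=1}^n (y_i - \bar{y})^2 = \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{y} + \left[\sum_{i=1}^n (y_i - \bar{y})^2 - \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{y}\right] = \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{X}_c\hat{\boldsymbol{\beta}}_1 + SSE = SSR + SSE. \tag{8.1}$$
SSE is given in (7.33). The regression sum of squares $SSR = \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{X}_c\hat{\boldsymbol{\beta}}_1$ is clearly due to $\boldsymbol{\beta}_1$.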

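To construct an F-test involving SSR and SSE, we need to express the sums of squares in (8.1) as quadratic forms in $\mathbf{y}$ so that we can use theorems to show that these sums of squares have chi-square distributions and are independent.

Using $\sum_{i=1}^n (y_i - \bar{y})^2 = \mathbf{y}'\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\right)\mathbf{y}$, $\hat{\boldsymbol{\beta}}_1 = (\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\mathbf{y}$ and $SSE = \sum_{i=1}^n (y_i - \bar{y})^2 - \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{y}$, write (8.1) as
$$\begin{aligned} \mathbf{y}'\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\right)\mathbf{y} &= SSR + SSE \\ &= \mathbf{y}'\mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\mathbf{y} + \left[\mathbf{y}'\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\right)\mathbf{y} - \mathbf{y}'\mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\mathbf{y}\right] \\ &= \mathbf{y}'\mathbf{A}\mathbf{y} + \mathbf{y}'\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J} - \mathbf{A}\right)\mathbf{y}, \end{aligned} \tag{8.2}$$
where $\mathbf{A} = \mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'$.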

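Properties of the matrices of the quadratic forms in (8.2):

Theorem 8.1A. The matrices $\mathbf{I} - \tfrac{1}{n}\mathbf{J}$, $\mathbf{A} = \mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'$, and $\mathbf{I} - \tfrac{1}{n}\mathbf{J} - \mathbf{A}$ have the following properties:

(i) $\mathbf{A}\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\right) = \mathbf{A}$,
(ii) $\mathbf{A}$ is idempotent of rank $k$,
(iii) $\mathbf{I} - \tfrac{1}{n}\mathbf{J} - \mathbf{A}$ is idempotent of rank $n - k - 1$,
(iv) $\mathbf{A}\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J} - \mathbf{A}\right) = \mathbf{O}$.
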
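Theorem 8.1B. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, then $SSR/\sigma^2 = \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{X}_c\hat{\boldsymbol{\beta}}_1/\sigma^2$ and $SSE/\sigma^2 = \left[\sum_{i=1}^n (y_i - \bar{y})^2 - \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{X}_c\hat{\boldsymbol{\beta}}_1\right]/\sigma^2$ have the following distributions:

(i) $SSR/\sigma^2$ is $\chi^2(k, \lambda_1)$, where $\lambda_1 = \boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}/2\sigma^2 = \boldsymbol{\beta}_1'\mathbf{X}_c'\mathbf{X}_c\boldsymbol{\beta}_1/2\sigma^2$.
(ii) $SSE/\sigma^2$ is $\chi^2(n - k - 1)$.

Theorem 8.1C. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, then SSR and SSE are independent, where SSR and SSE are defined in (8.2) and Theorem 8.1B.
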
Theorem 8.1D. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, the distribution of
$$F = \frac{SSR/(k\sigma^2)}{SSE/[(n-k-1)\sigma^2]} = \frac{SSR/k}{SSE/(n-k-1)} \tag{8.3}$$
is as follows:

(i) If $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ is false, then $F$ is distributed as $F(k, n-k-1, \lambda_1)$, where $\lambda_1 = \boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}/2\sigma^2 = \boldsymbol{\beta}_1'\mathbf{X}_c'\mathbf{X}_c\boldsymbol{\beta}_1/2\sigma^2$.

(ii) If $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ is true, then $\lambda_1 = 0$ and $F$ is distributed as $F(k, n-k-1)$.

The test for $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ is carried out as follows:
Reject $H_0$ if $F \ge F_{\alpha,k,n-k-1}$, where $F_{\alpha,k,n-k-1}$ is the upper $\alpha$ percentage point of the (central) F distribution. Alternatively, a p-value can be used to carry out the test. A p-value is the tail area of the central F distribution beyond the calculated F-value, that is, the probability of exceeding the calculated F-value, assuming $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ is true. A p-value less than $\alpha$ is equivalent to $F > F_{\alpha,k,n-k-1}$.
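
The matrix properties in Theorem 8.1A, which supply the degrees of freedom $k$ and $n-k-1$ used in this F-test, can be checked numerically. The following is a minimal sketch in plain Python, assuming the small data set of Example 7.2 (listed in Example 8.5(a)), for which $n = 12$ and $k = 2$. The helper names (`matmul`, `transpose`, `trace`, `max_abs_diff`) are illustrative only. It relies on the fact that the rank of an idempotent matrix equals its trace.

```python
# Numerical check of Theorem 8.1A for the Example 7.2 data (n = 12, k = 2).
x1 = [0, 2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8]
x2 = [2, 6, 7, 5, 9, 8, 7, 10, 11, 9, 15, 13]
n, k = len(x1), 2

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def max_abs_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j])
               for i in range(len(P)) for j in range(len(P[0])))

# Centered matrix X_c = (I - J/n) X_1: subtract each column mean.
m1, m2 = sum(x1) / n, sum(x2) / n
Xc = [[a - m1, b - m2] for a, b in zip(x1, x2)]

# (X_c'X_c)^{-1} from the closed-form inverse of a 2 x 2 matrix.
S = matmul(transpose(Xc), Xc)
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det, -S[0][1] / det],
         [-S[1][0] / det, S[0][0] / det]]

A = matmul(matmul(Xc, S_inv), transpose(Xc))      # A = X_c (X_c'X_c)^{-1} X_c'
C = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)]
     for i in range(n)]                           # I - J/n
B = [[C[i][j] - A[i][j] for j in range(n)] for i in range(n)]  # I - J/n - A
zero = [[0.0] * n for _ in range(n)]

print("(i)   max |A(I - J/n) - A| =", max_abs_diff(matmul(A, C), A))
print("(ii)  max |AA - A| =", max_abs_diff(matmul(A, A), A), " trace(A) =", trace(A))
print("(iii) max |BB - B| =", max_abs_diff(matmul(B, B), B), " trace(B) =", trace(B))
print("(iv)  max |AB| =", max_abs_diff(matmul(A, B), zero))
```

Up to floating-point error, the differences are zero and the traces equal $k = 2$ and $n - k - 1 = 9$, the degrees of freedom of SSR and SSE.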

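We summarize the results leading to the F-test in the analysis of variance in Table 8.1.

Table 8.1. Analysis of Variance for F-test of $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$

| Source of variation | d.f. | Sum of Squares | Mean Square | Expected Mean Square |
|---|---|---|---|---|
| Due to $\boldsymbol{\beta}_1$ | $k$ | $SSR = \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{y}$ | $SSR/k$ | $\sigma^2 + \frac{1}{k}\boldsymbol{\beta}_1'\mathbf{X}_c'\mathbf{X}_c\boldsymbol{\beta}_1$ |
| Error | $n-k-1$ | $SSE = \sum_i (y_i - \bar{y})^2 - \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{y}$ | $SSE/(n-k-1)$ | $\sigma^2$ |
| Total | $n-1$ | $SST = \sum_i (y_i - \bar{y})^2$ | | |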

Note:
1. The entries in the column for expected mean squares in Table 8.1 are simply $E(SSR/k)$ and $E[SSE/(n-k-1)]$. They follow because if $v$ is distributed as $\chi^2(n, \lambda)$, then $E(v) = n + 2\lambda$, and because we have shown that $E(s^2) = \sigma^2$.
2. If $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ is true, both of the expected mean squares in Table 8.1 are equal to $\sigma^2$, and we expect $F$ to be near 1. If $\boldsymbol{\beta}_1 \ne \mathbf{0}$, then $E(SSR/k) > \sigma^2$ since $\mathbf{X}_c'\mathbf{X}_c$ is positive definite, and we expect $F$ to exceed 1. Therefore, $H_0$ is rejected for large values of $F$.
3. The test of $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ in Table 8.1 has been developed using the centered model (7.26).
4. SSR and SSE can also be expressed in terms of the noncentered model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ in (7.3):
$$SSR = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^2, \qquad SSE = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}. \tag{8.4}$$

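Example 8.1: Using the data in Example 7.2, test $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$, where $\boldsymbol{\beta}_1 = (\beta_1, \beta_2)'$. From Example 7.2, $\mathbf{X}'\mathbf{y} = (90, 482, 872)'$ and $\hat{\boldsymbol{\beta}} = (5.3754, 3.0118, -1.2855)'$. We now find $\mathbf{y}'\mathbf{y}$, $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$ and $n\bar{y}^2$:
$$\mathbf{y}'\mathbf{y} = \sum_{i=1}^{12} y_i^2 = 2^2 + 3^2 + \cdots + 14^2 = 840,$$
$$\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = (5.3754, 3.0118, -1.2855)\begin{pmatrix} 90 \\ 482 \\ 872 \end{pmatrix} = 814.5410,$$
$$n\bar{y}^2 = n\left(\frac{\sum_i y_i}{n}\right)^2 = 12\left(\frac{90}{12}\right)^2 = 675.$$
(The value 814.5410 is obtained with the unrounded $\hat{\boldsymbol{\beta}}$; the four-decimal values shown give about 814.52.) Thus, by (8.4), SSR = 139.5410 and SSE = 25.4590, and
$$\sum_{i=1}^n (y_i - \bar{y})^2 = \mathbf{y}'\mathbf{y} - n\bar{y}^2 = 165.$$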

Table 8.2. Analysis of Variance for Overall Regression Test for the Data in Example 7.2

| Source of variation | d.f. | Sum of Squares | Mean Square | F |
|---|---|---|---|---|
| Due to $\boldsymbol{\beta}_1$ | 2 | 139.5410 | 69.7705 | 24.665 |
| Error | 9 | 25.4590 | 2.8288 | |
| Total | 11 | 165.0000 | | |

The F-test is given in Table 8.2. Since $24.665 > F_{0.05,2,9} = 4.26$, we reject $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ and conclude that at least one of $\beta_1$ or $\beta_2$ is not zero. The p-value is approximately 0.0002.
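
The arithmetic of Example 8.1 can be reproduced directly from (8.4). The sketch below is plain Python and assumes the twelve observations listed in Example 8.5(a) together with the unrounded estimates printed in the PROC REG output there. The closed-form tail area $(1 + 2F/d)^{-d/2}$ is a standard identity that holds only when the numerator degrees of freedom equal 2; it is used here to avoid a statistics library.

```python
# Overall regression F-test for the Example 7.2 data, using (8.4).
y  = [2, 3, 2, 7, 6, 8, 10, 7, 8, 12, 11, 14]
x1 = [0, 2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8]
x2 = [2, 6, 7, 5, 9, 8, 7, 10, 11, 9, 15, 13]
beta_hat = [5.3753943218, 3.011829653, -1.285488959]  # from the PROC REG output

n, k = len(y), 2

# X'y, y'y and the correction term n * ybar^2
Xty = [sum(y),
       sum(a * b for a, b in zip(x1, y)),
       sum(a * b for a, b in zip(x2, y))]
yty = sum(v * v for v in y)
correction = n * (sum(y) / n) ** 2

bXy = sum(b * t for b, t in zip(beta_hat, Xty))   # beta_hat' X'y
ssr = bXy - correction                            # SSR in (8.4)
sse = yty - bXy                                   # SSE in (8.4)
F = (ssr / k) / (sse / (n - k - 1))               # (8.3)

# Tail area of the central F(2, d) distribution (valid for numerator d.f. 2 only).
d = n - k - 1
p_value = (1 + 2 * F / d) ** (-d / 2)

print("X'y =", Xty, " y'y =", yty, " n*ybar^2 =", correction)
print("SSR =", round(ssr, 4), " SSE =", round(sse, 4))
print("F =", round(F, 3), " p-value =", round(p_value, 5))
```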

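8.2. Test on a Subset of the $\beta$'s
We wish to test the hypothesis that a subset of the $x$'s is not useful in predicting $y$. Without loss of generality, assume that the $\beta$'s to be tested have been arranged last in $\boldsymbol{\beta}$, with a corresponding arrangement of the columns of $\mathbf{X}$. Then $\boldsymbol{\beta}$ and $\mathbf{X}$ can be partitioned accordingly, and by (7.50), the model for all $n$ observations becomes
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} = (\mathbf{X}_1, \mathbf{X}_2)\begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix} + \boldsymbol{\varepsilon} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}, \tag{8.5}$$
where $\boldsymbol{\beta}_2$ contains the $\beta$'s to be tested; the intercept $\beta_0$ is included in $\boldsymbol{\beta}_1$.
The hypothesis of interest is $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$. If the number of parameters in $\boldsymbol{\beta}_2$ is $h$, then $\mathbf{X}_2$ is $n \times h$, $\boldsymbol{\beta}_1$ is $(k-h+1) \times 1$, and $\mathbf{X}_1$ is $n \times (k-h+1)$. Thus, $\boldsymbol{\beta}_1 = (\beta_0, \beta_1, \ldots, \beta_{k-h})'$ and $\boldsymbol{\beta}_2 = (\beta_{k-h+1}, \ldots, \beta_k)'$.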

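For example, consider the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$, and suppose we wish to test the hypothesis $H_0: \beta_3 = \beta_4 = \beta_5 = 0$. For this, we have $\boldsymbol{\beta}_1 = (\beta_0, \beta_1, \beta_2)'$ and $\boldsymbol{\beta}_2 = (\beta_3, \beta_4, \beta_5)'$.
Note that $\boldsymbol{\beta}_1$ in (8.5) is different from $\boldsymbol{\beta}_1$ in Section 8.1, in which $\boldsymbol{\beta}$ was partitioned as
$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \boldsymbol{\beta}_1 \end{pmatrix}$$
so that $\boldsymbol{\beta}_1$ constituted all of $\boldsymbol{\beta}$ except $\beta_0$.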

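To test $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ versus $H_1: \boldsymbol{\beta}_2 \ne \mathbf{0}$, use a full-and-reduced-model approach. The full model is given by (8.5), and under $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ the reduced model becomes
$$\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1^* + \boldsymbol{\varepsilon}^*. \tag{8.6}$$
To compare the fit of the full model (8.5) to the fit of the reduced model (8.6), add and subtract $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$ and $\hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}$ to the total sum of squares $\mathbf{y}'\mathbf{y}$ so as to obtain the partitioning
$$\mathbf{y}'\mathbf{y} = \left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right) + \left(\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}\right) + \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y} \tag{8.7}$$
$$= SSE + SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) + SS(\boldsymbol{\beta}_1^*), \tag{8.8}$$
where the sum of squares $SS(\boldsymbol{\beta}_1^*) = \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}$ is from the reduced model (8.6) and $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}$ is the "extra" regression sum of squares due to $\boldsymbol{\beta}_2$ after adjusting for $\boldsymbol{\beta}_1$. Note that $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)$ can also be expressed as
$$SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) = \left(\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^2\right) - \left(\hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y} - n\bar{y}^2\right) = SSR(\text{full}) - SSR(\text{reduced}),$$
which is the difference between the overall regression sum of squares for the full model and the overall regression sum of squares for the reduced model.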

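If $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ is true, we would expect $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)$ to be small, so that $\mathbf{y}'\mathbf{y}$ in (8.8) is mostly composed of $SS(\boldsymbol{\beta}_1^*)$ and SSE. If $\boldsymbol{\beta}_2 \ne \mathbf{0}$, we expect $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)$ to be larger and to account for more of $\mathbf{y}'\mathbf{y}$. Thus, we are testing $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ in the full model, in which there are no restrictions on $\boldsymbol{\beta}_1$. We are not ignoring $\boldsymbol{\beta}_1$ (that is, assuming $\boldsymbol{\beta}_1 = \mathbf{0}$) but are testing $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ in the presence of $\boldsymbol{\beta}_1$.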

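To develop a test statistic based on $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)$, first write (8.7) in terms of quadratic forms in $\mathbf{y}$. Using $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ and $\hat{\boldsymbol{\beta}}_1^* = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y}$, (8.7) becomes
$$\begin{aligned} \mathbf{y}'\mathbf{y} &= \left[\mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right] + \left[\mathbf{y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} - \mathbf{y}'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y}\right] + \mathbf{y}'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y} \\ &= \mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y} + \mathbf{y}'\left[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\right]\mathbf{y} + \mathbf{y}'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y} \qquad (8.9) \\ &= \mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y} + \mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y} + \mathbf{y}'\mathbf{A}_2\mathbf{y}, \qquad (8.10) \end{aligned}$$
where $\mathbf{A}_1 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ and $\mathbf{A}_2 = \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'$. The matrix $\mathbf{I} - \mathbf{A}_1$ is idempotent with rank $n - k - 1$, where $k + 1$ is the rank of $\mathbf{X}$ ($k + 1$ is also the number of elements in $\boldsymbol{\beta}$). The matrix $\mathbf{A}_1 - \mathbf{A}_2$ is also idempotent, as shown in the following theorem.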

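Theorem 8.2A. The matrix $\mathbf{A}_1 - \mathbf{A}_2 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'$ is idempotent with rank $h$, where $h$ is the number of elements in $\boldsymbol{\beta}_2$.

Theorem 8.2B. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, and $\mathbf{A}_1$ and $\mathbf{A}_2$ are defined as in (8.9) and (8.10), then
(i) $\mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y}/\sigma^2$ is $\chi^2(n - k - 1)$;
(ii) $\mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y}/\sigma^2$ is $\chi^2(h, \lambda_1)$, where
$$\lambda_1 = \boldsymbol{\beta}_2'\left[\mathbf{X}_2'\mathbf{X}_2 - \mathbf{X}_2'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2\right]\boldsymbol{\beta}_2/2\sigma^2;$$
(iii) $\mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y}$ and $\mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y}$ are independent.
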
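Theorem 8.2C. Let $\mathbf{y}$ be $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$ and define an F-statistic as follows:
$$F = \frac{\mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y}/h}{\mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y}/(n-k-1)} = \frac{SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)/h}{SSE/(n-k-1)} = \frac{\left(\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}\right)/h}{\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)/(n-k-1)}, \tag{8.11}$$
where $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is from the full model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and $\hat{\boldsymbol{\beta}}_1^* = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y}$ is from the reduced model $\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1^* + \boldsymbol{\varepsilon}^*$. The distribution of $F$ in (8.11) is as follows:

(i) If $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ is false, then $F$ is distributed as $F(h, n-k-1, \lambda_1)$, where
$$\lambda_1 = \boldsymbol{\beta}_2'\left[\mathbf{X}_2'\mathbf{X}_2 - \mathbf{X}_2'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2\right]\boldsymbol{\beta}_2/2\sigma^2.$$

(ii) If $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ is true, then $\lambda_1 = 0$ and $F$ is distributed as $F(h, n-k-1)$.

The test for $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ is carried out as follows: Reject $H_0$ if $F \ge F_{\alpha,h,n-k-1}$, where $F_{\alpha,h,n-k-1}$ is the upper $\alpha$ percentage point of the (central) F-distribution. Alternatively, we reject $H_0$ if $p \le \alpha$, where $p$ is the p-value.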

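We summarize the results leading to the F-test in the analysis of variance in Table 8.3, where $\boldsymbol{\beta}_1$ is $(k-h+1) \times 1$, $\boldsymbol{\beta}_2$ is $h \times 1$, $\mathbf{X}_1$ is $n \times (k-h+1)$ and $\mathbf{X}_2$ is $n \times h$.

The expected mean squares are given by
$$E\left[\frac{SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)}{h}\right] = \sigma^2 + \frac{1}{h}\boldsymbol{\beta}_2'\left[\mathbf{X}_2'\mathbf{X}_2 - \mathbf{X}_2'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2\right]\boldsymbol{\beta}_2, \tag{8.12}$$
$$E\left[\frac{SSE}{n-k-1}\right] = \sigma^2.$$

Note that if $H_0$ is true, both of the expected mean squares are equal to $\sigma^2$, and if $H_0$ is false, $E[SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)/h] > E[SSE/(n-k-1)]$ since $\mathbf{X}_2'\mathbf{X}_2 - \mathbf{X}_2'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2$ is positive definite.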

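Theorem 8.2D. If the model is partitioned as in (8.5), then $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}$ can be written as
$$SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) = \hat{\boldsymbol{\beta}}_2'\left[\mathbf{X}_2'\mathbf{X}_2 - \mathbf{X}_2'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2\right]\hat{\boldsymbol{\beta}}_2, \tag{8.13}$$
where $\hat{\boldsymbol{\beta}}_2$ is from a partitioning of $\hat{\boldsymbol{\beta}}$ in the full model:
$$\hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat{\boldsymbol{\beta}}_1 \\ \hat{\boldsymbol{\beta}}_2 \end{pmatrix} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}. \tag{8.14}$$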
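Table 8.3. Analysis of Variance for F-test of $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$

| Source of variation | d.f. | Sum of Squares | Mean Square | F-statistic |
|---|---|---|---|---|
| Due to $\boldsymbol{\beta}_2$ adjusted for $\boldsymbol{\beta}_1$ | $h$ | $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}$ | $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)/h$ | $\dfrac{SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)/h}{SSE/(n-k-1)}$ |
| Error | $n-k-1$ | $SSE = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$ | $SSE/(n-k-1)$ | |
| Total | $n-1$ | $SST = \mathbf{y}'\mathbf{y} - n\bar{y}^2$ | | |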

Example 8.2(a): Given in the separate handout.

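Example 8.2(b): The full-and-reduced-model test of $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ in Table 8.3 can be used to test for significance of a single $\hat{\beta}_j$. Suppose we wish to test $H_0: \beta_k = 0$, where $\boldsymbol{\beta}$ is partitioned as
$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k-1} \\ \beta_k \end{pmatrix} = \begin{pmatrix} \boldsymbol{\beta}_1 \\ \beta_k \end{pmatrix}.$$
$\mathbf{X}$ is partitioned as $\mathbf{X} = (\mathbf{X}_1, \mathbf{x}_k)$, where $\mathbf{x}_k$ is the last column of $\mathbf{X}$ and $\mathbf{X}_1$ contains all columns except $\mathbf{x}_k$. The reduced model is $\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1^* + \boldsymbol{\varepsilon}^*$, and $\boldsymbol{\beta}_1^*$ is estimated as $\hat{\boldsymbol{\beta}}_1^* = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y}$. In this case, $h = 1$ and the F-statistic in (8.11) becomes
$$F = \frac{\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}}{\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)/(n-k-1)}, \tag{8.15}$$
which is distributed as $F(1, n-k-1)$ if $H_0: \beta_k = 0$ is true.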
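
As an illustration of (8.15), the sketch below tests $H_0: \beta_2 = 0$ for the small data set of Example 7.2 (listed in Example 8.5(a)) by comparing the full model with the reduced model that contains only $x_1$. It is plain Python and takes the full-model estimates and the diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$ from the PROC REG output in Example 8.5(a). The reduced model is a simple linear regression, so its regression sum of squares is computed from the familiar closed form $S_{xy}^2/S_{xx}$. The variable names are illustrative.

```python
# Full-and-reduced-model F-test of H0: beta_2 = 0 (h = 1), Example 7.2 data.
y  = [2, 3, 2, 7, 6, 8, 10, 7, 8, 12, 11, 14]
x1 = [0, 2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8]
x2 = [2, 6, 7, 5, 9, 8, 7, 10, 11, 9, 15, 13]
n, k, h = len(y), 2, 1

# Full model: estimates and g_22 taken from the PROC REG output.
beta_hat = [5.3753943218, 3.011829653, -1.285488959]
g22 = 0.0835962145            # diagonal element of (X'X)^{-1} for x2

Xty = [sum(y),
       sum(a * b for a, b in zip(x1, y)),
       sum(a * b for a, b in zip(x2, y))]
yty = sum(v * v for v in y)
bXy_full = sum(b * t for b, t in zip(beta_hat, Xty))      # beta_hat' X'y

# Reduced model y = b0* + b1* x1 + e*: beta1*_hat' X1'y = n*ybar^2 + Sxy^2/Sxx.
ybar, x1bar = sum(y) / n, sum(x1) / n
sxy = sum((a - x1bar) * (b - ybar) for a, b in zip(x1, y))
sxx = sum((a - x1bar) ** 2 for a in x1)
bXy_reduced = n * ybar ** 2 + sxy ** 2 / sxx

ss_extra = bXy_full - bXy_reduced          # SS(beta_2 | beta_1)
s2 = (yty - bXy_full) / (n - k - 1)        # SSE / (n - k - 1)
F = (ss_extra / h) / s2                    # (8.15)

# The same test from the general linear hypothesis form: F = beta_2_hat^2 / (s^2 g_22).
t2_squared = beta_hat[2] ** 2 / (s2 * g22)

print("SS(beta_2 | beta_1) =", round(ss_extra, 4))
print("F =", round(F, 3), " t_2^2 =", round(t2_squared, 3))
```

The two statistics agree, which is the equivalence between the full-and-reduced-model test of a single $\beta$ and the square of its t-statistic.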

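Example 8.2(c): The test in Section 8.1 for overall regression can be obtained as a full-and-reduced-model test. In this case, we partition $\mathbf{X} = (\mathbf{j}, \mathbf{X}_1)$ and
$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \boldsymbol{\beta}_1 \end{pmatrix}.$$
The reduced model is $\mathbf{y} = \beta_0^*\mathbf{j} + \boldsymbol{\varepsilon}^*$, for which we have
$$\hat{\beta}_0^* = \bar{y} \quad \text{and} \quad SS(\beta_0^*) = n\bar{y}^2.$$
Then $SS(\boldsymbol{\beta}_1 \mid \beta_0) = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^2$, which is the same as SSR in (8.4).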

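8.3. F-Test in Terms of $R^2$
The F-statistics in Sections 8.1 and 8.2 can be expressed in terms of $R^2$ as defined in (7.46).

Theorem 8.3A. The F-statistics in (8.3) and (8.11) for testing $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ and $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$, respectively, can be written in terms of $R^2$ as
$$F = \frac{\left(\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^2\right)/k}{\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)/(n-k-1)} \tag{8.16}$$
$$= \frac{R^2/k}{(1 - R^2)/(n-k-1)} \tag{8.17}$$
and
$$F = \frac{\left(\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}\right)/h}{\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)/(n-k-1)} \tag{8.18}$$
$$= \frac{(R^2 - R_r^2)/h}{(1 - R^2)/(n-k-1)}, \tag{8.19}$$
where $R^2$ for the full model is given in (7.46) and $R_r^2$ for the reduced model $\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1^* + \boldsymbol{\varepsilon}^*$ in (8.6) is similarly defined as
$$R_r^2 = \frac{\hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y} - n\bar{y}^2}{\mathbf{y}'\mathbf{y} - n\bar{y}^2}. \tag{8.20}$$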

Example 8.3. For the dependent variable $y_2$ in the chemical reaction data in Table 8.2(a), a full model with nine $x$'s and a reduced model with three $x$'s were considered in Example 8.2(a). The values of $R^2$ for the full model and reduced model are 0.8485 and 0.3771, respectively. To test the significance of the increase in $R^2$ from 0.3771 to 0.8485, we use (8.19):
$$F = \frac{(R^2 - R_r^2)/h}{(1 - R^2)/(n-k-1)} = \frac{(0.8485 - 0.3771)/6}{(1 - 0.8485)/9} = 4.6671,$$
which is the same as the value obtained for $F$ in Example 8.2(a).
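
Formulas (8.17) and (8.19) translate into a one-line function. The sketch below is plain Python; the function name `f_from_r2` is illustrative, and the sums of squares are those printed in the SAS output of Example 8.2(a) for $y_2$. Setting the reduced-model $R^2$ to zero and $h = k$ recovers the overall test (8.17).

```python
def f_from_r2(r2_full, r2_reduced, h, n, k):
    """F-statistic of (8.19); with r2_reduced = 0 and h = k it reduces to (8.17)."""
    return ((r2_full - r2_reduced) / h) / ((1.0 - r2_full) / (n - k - 1))

# Sums of squares for y2 from the SAS output in Example 8.2(a).
ssr_full, sse_full = 339.7887457, 60.6754648     # full model, nine x's
ssr_reduced = 151.00219                          # reduced model, three x's
sst = ssr_full + sse_full                        # corrected total sum of squares

r2_full = ssr_full / sst
r2_reduced = ssr_reduced / sst

F_subset = f_from_r2(r2_full, r2_reduced, h=6, n=19, k=9)   # test of the six second-order terms
F_overall = f_from_r2(r2_full, 0.0, h=9, n=19, k=9)         # overall regression test

print("R^2 (full) =", round(r2_full, 4), " R^2 (reduced) =", round(r2_reduced, 4))
print("F for the subset =", round(F_subset, 4))
print("overall F =", round(F_overall, 2))
```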

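8.4. The General Linear Hypothesis Test for $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$
The hypothesis $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$, where $\mathbf{C}$ is a $q \times (k+1)$ coefficient matrix of rank $q \le k+1$, is known as the general linear hypothesis. The hypothesis $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ in Section 8.1 can be expressed in the form $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ as follows:
$$H_0: \mathbf{C}\boldsymbol{\beta} = (\mathbf{0}, \mathbf{I}_k)\begin{pmatrix} \beta_0 \\ \boldsymbol{\beta}_1 \end{pmatrix} = \boldsymbol{\beta}_1 = \mathbf{0},$$
where $\mathbf{0}$ is a $k \times 1$ vector.
Similarly, $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ in Section 8.2 can be expressed in the form $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ as follows:
$$H_0: \mathbf{C}\boldsymbol{\beta} = (\mathbf{O}, \mathbf{I}_h)\begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix} = \boldsymbol{\beta}_2 = \mathbf{0},$$
where the matrix $\mathbf{O}$ is $h \times (k-h+1)$ and the vector $\mathbf{0}$ is $h \times 1$.
A more general hypothesis such as
$$H_0: 2\beta_1 - \beta_2 = \beta_2 - 2\beta_3 + 3\beta_4 = \beta_1 - \beta_4 = 0$$
can be expressed as
$$H_0: \begin{pmatrix} 0 & 2 & -1 & 0 & 0 \\ 0 & 0 & 1 & -2 & 3 \\ 0 & 1 & 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},$$
and $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4$ can be expressed in terms of three differences, $H_0: \beta_1 - \beta_2 = \beta_2 - \beta_3 = \beta_3 - \beta_4 = 0$, or equivalently, as $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$:
$$H_0: \begin{pmatrix} 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 & -1 \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
The following theorem gives the sums of squares used in the test of $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ versus $H_1: \mathbf{C}\boldsymbol{\beta} \ne \mathbf{0}$, along with their properties.
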
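Theorem 8.4A. If $\mathbf{y}$ is distributed as $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$ and $\mathbf{C}$ is $q \times (k+1)$ of rank $q \le k+1$, then
(i) $\mathbf{C}\hat{\boldsymbol{\beta}}$ is $N_q\left[\mathbf{C}\boldsymbol{\beta}, \sigma^2\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right]$;
(ii) $SSH/\sigma^2 = (\mathbf{C}\hat{\boldsymbol{\beta}})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}/\sigma^2$ is $\chi^2(q, \lambda)$, where $\lambda = (\mathbf{C}\boldsymbol{\beta})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right]^{-1}\mathbf{C}\boldsymbol{\beta}/2\sigma^2$;
(iii) $SSE/\sigma^2 = \mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y}/\sigma^2$ is $\chi^2(n-k-1)$;
(iv) SSH and SSE are independent.

Note: The sum of squares due to $\mathbf{C}\boldsymbol{\beta}$ (due to the hypothesis) is denoted as SSH.
The F-test for $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ versus $H_1: \mathbf{C}\boldsymbol{\beta} \ne \mathbf{0}$ is given in the following theorem.
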
Theorem 8.4B. Let $\mathbf{y}$ be $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$ and define the statistic
$$F = \frac{SSH/q}{SSE/(n-k-1)} = \frac{(\mathbf{C}\hat{\boldsymbol{\beta}})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}/q}{SSE/(n-k-1)}, \tag{8.21}$$
where $\mathbf{C}$ is $q \times (k+1)$ of rank $q \le k+1$ and $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$. The distribution of $F$ in (8.21) is as follows:
(i) If $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ is false, then $F$ is distributed as $F(q, n-k-1, \lambda)$.

(ii) If $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ is true, then $F$ is distributed as $F(q, n-k-1)$.

Note:
1. The F-test in Theorem 8.4B is called the general linear hypothesis test.
2. The degrees of freedom $q$ is the number of linear combinations in $\mathbf{C}\boldsymbol{\beta}$.
3. The test for $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ is carried out as follows: Reject $H_0$ if $F \ge F_{\alpha,q,n-k-1}$, where $F$ is given in (8.21) and $F_{\alpha,q,n-k-1}$ is the upper $\alpha$ percentage point of the (central) F-distribution. Alternatively, reject $H_0$ if the p-value $\le \alpha$.

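The expected mean squares for the F-test are given by
$$E\left(\frac{SSH}{q}\right) = \sigma^2 + \frac{1}{q}(\mathbf{C}\boldsymbol{\beta})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right]^{-1}\mathbf{C}\boldsymbol{\beta}, \tag{8.22}$$
$$E\left(\frac{SSE}{n-k-1}\right) = \sigma^2.$$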

Example 8.4(a). Sometimes, the hypothesis can be incorporated directly into the model to obtain the reduced model. Suppose the full model is
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i$$
and the hypothesis is $H_0: \beta_1 = 2\beta_2$. Then the reduced model becomes
$$y_i = \beta_0 + 2\beta_2 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i = \beta_{c0} + \beta_{c2}(2x_{i1} + x_{i2}) + \beta_{c3} x_{i3} + \varepsilon_i,$$
where $\beta_{ci}$ indicates a parameter subject to the constraint $\beta_1 = 2\beta_2$. The full model and reduced model could be fit, and the difference $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}^{*\prime}\mathbf{X}_1'\mathbf{y}$ would be the same as SSH in (8.21).

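Example 8.4(b). Consider the dependent variable $y_1$ in the chemical reaction data in Example 8.2(a), with the model $y_1 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$. Test $H_0: 2\beta_1 = 2\beta_2 = \beta_3$ using (8.21). To express $H_0$ in the form $\mathbf{C}\boldsymbol{\beta} = \mathbf{0}$, the matrix $\mathbf{C}$ becomes
$$\mathbf{C} = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & 0 & 2 & -1 \end{pmatrix},$$
and we obtain
$$\mathbf{C}\hat{\boldsymbol{\beta}} = \begin{pmatrix} -0.1214 \\ -0.6118 \end{pmatrix}, \qquad \mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}' = \begin{pmatrix} 0.003366 & -0.006943 \\ -0.006943 & 0.044974 \end{pmatrix},$$
$$F = \frac{\dfrac{1}{2}\begin{pmatrix} -0.1214 \\ -0.6118 \end{pmatrix}'\begin{pmatrix} 0.003366 & -0.006943 \\ -0.006943 & 0.044974 \end{pmatrix}^{-1}\begin{pmatrix} -0.1214 \\ -0.6118 \end{pmatrix}}{5.3449} = \frac{28.62301/2}{5.3449} = 2.6776.$$
Here $s^2 = 5.3449$ has $n - k - 1 = 19 - 3 - 1 = 15$ degrees of freedom. Since $F = 2.6776 < F_{0.05,2,15} = 3.68$, we cannot reject $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$.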
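
The quadratic form in (8.21) for Example 8.4(b) is small enough to evaluate by hand or with a few lines of code. The sketch below is plain Python and assumes that both entries of $\mathbf{C}\hat{\boldsymbol{\beta}}$ and the off-diagonal entries of $\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'$ are negative; this choice of signs is the one that reproduces the reported value of $F$. Because the inputs are rounded, the result agrees with 2.6776 only to about three significant figures.

```python
# General linear hypothesis F-statistic (8.21) for Example 8.4(b), q = 2.
Cb = [-0.1214, -0.6118]                     # C beta_hat (signs assumed, see text)
M = [[0.003366, -0.006943],
     [-0.006943, 0.044974]]                 # C (X'X)^{-1} C'
s2 = 5.3449                                 # SSE / (n - k - 1)
q = 2

# Inverse of the 2 x 2 matrix M.
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
M_inv = [[M[1][1] / det, -M[0][1] / det],
         [-M[1][0] / det, M[0][0] / det]]

# SSH = (C beta_hat)' [C (X'X)^{-1} C']^{-1} (C beta_hat)
ssh = sum(Cb[i] * M_inv[i][j] * Cb[j] for i in range(2) for j in range(2))
F = (ssh / q) / s2

print("SSH =", round(ssh, 3), " F =", round(F, 3))
```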

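8.5 Tests on $\beta_j$ and $\mathbf{a}'\boldsymbol{\beta}$

Testing One $\beta_j$ or One $\mathbf{a}'\boldsymbol{\beta}$.
A test for an individual $\beta_j$ or for a single linear combination $\mathbf{a}'\boldsymbol{\beta}$ can be obtained using either the full-and-reduced-model approach (Section 8.2) or the general linear hypothesis approach (Section 8.4).
The test statistic for $H_0: \beta_k = 0$ using a full and reduced model is given in (8.15) as
$$F = \frac{\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}}{\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)/(n-k-1)}, \tag{8.23}$$
which is distributed as $F(1, n-k-1)$ if $H_0$ is true. In this case, $\beta_k$ is the last $\beta$, so that $\boldsymbol{\beta}$ is partitioned as $\boldsymbol{\beta} = (\boldsymbol{\beta}_1', \beta_k)'$ and $\mathbf{X}$ is partitioned as $\mathbf{X} = (\mathbf{X}_1, \mathbf{x}_k)$, where $\mathbf{x}_k$ is the last column of $\mathbf{X}$. Then $\mathbf{X}_1$ in the reduced model $\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1^* + \boldsymbol{\varepsilon}^*$ contains all the columns of $\mathbf{X}$ except the last.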

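To test $H_0: \mathbf{a}'\boldsymbol{\beta} = 0$ for a single linear combination, for example $\mathbf{a}'\boldsymbol{\beta} = (0, -2, -2, 3, 1)\boldsymbol{\beta}$, we use $\mathbf{a}'$ in place of the matrix $\mathbf{C}$ in $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$. Then $q = 1$, and (8.21) becomes
$$F = \frac{(\mathbf{a}'\hat{\boldsymbol{\beta}})'\left[\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}\right]^{-1}\mathbf{a}'\hat{\boldsymbol{\beta}}/q}{SSE/(n-k-1)} = \frac{(\mathbf{a}'\hat{\boldsymbol{\beta}})^2}{s^2\,\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}, \tag{8.24}$$
where $s^2 = SSE/(n-k-1)$. The F-statistic in (8.24) is distributed as $F(1, n-k-1)$ if $H_0: \mathbf{a}'\boldsymbol{\beta} = 0$ is true.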
To test $H_0: \beta_j = 0$ using the general linear hypothesis test statistic in (8.24), define $\mathbf{a}' = (0, \ldots, 0, 1, 0, \ldots, 0)$, where the 1 is in the $j$-th position. This gives
$$F = \frac{\hat{\beta}_j^2}{s^2 g_{jj}}, \tag{8.25}$$
where $g_{jj}$ is the $j$-th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. If $H_0: \beta_j = 0$ is true, $F$ in (8.25) is distributed as $F(1, n-k-1)$. Reject $H_0: \beta_j = 0$ if $F \ge F_{\alpha,1,n-k-1}$ or, equivalently, if the p-value $\le \alpha$.
Since the F-statistic in (8.25) has 1 and $n-k-1$ degrees of freedom, we can equivalently use the t-statistic
$$t_j = \frac{\hat{\beta}_j}{s\sqrt{g_{jj}}} \tag{8.26}$$
to test the effect of $\beta_j$ above and beyond the other $\beta$'s. We reject $H_0: \beta_j = 0$ if $|t_j| \ge t_{\alpha/2,n-k-1}$ or, equivalently, if the p-value $\le \alpha$.

Example 8.5(a). Test $H_{01}: \beta_1 = 0$ and $H_{02}: \beta_2 = 0$ for the following data.

y 2 3 2 7 6 8 10 7 8 12 11 14
x1 0 2 2 2 4 4 4 6 6 6 8 8
x2 2 6 7 5 9 8 7 10 11 9 15 13

data example;
input y x1 x2;
cards;
2 0 2
3 2 6
2 2 7
7 2 5
6 4 9
8 4 8
10 4 7
7 6 10
8 6 11
12 6 9
11 8 15
14 8 13
;
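proc reg data=example;
/* xpx prints the X'X, X'y and y'y crossproducts; i prints (X'X)^{-1}, the estimates and SSE */
model y=x1 x2/xpx i;
run;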

OUTPUT:
The REG Procedure
Model: MODEL1
Model Crossproducts X'X X'Y Y'Y
Variable Intercept x1 x2 y
Intercept 12 52 102 90
x1 52 296 536 482
x2 102 536 1004 872
y 90 482 872 840
The REG Procedure

Model: MODEL1
Dependent Variable: y
Number of Observations Read 12

Number of Observations Used 12
X'X Inverse, Parameter Estimates, and SSE
Variable Intercept x1 x2 y
Intercept 0.9747634069 0.2429022082 -0.228706625 5.3753943218

x1 0.2429022082 0.1620662461 -0.111198738 3.011829653
x2 -0.228706625 -0.111198738 0.0835962145 -1.285488959
y 5.3753943218 3.011829653 -1.285488959 25.458990536
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 2 139.54101 69.77050 24.66 0.0002

Error 9 25.45899 2.82878
Corrected Total 11 165.00000
Root MSE 1.68190 R-Square 0.8457

Dependent Mean 7.50000 Adj R-Sq 0.8114
Coeff Var 22.42529
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 5.37539 1.66054 3.24 0.0102

x1 1 3.01183 0.67709 4.45 0.0016
x2 1 -1.28549 0.48629 -2.64 0.0268

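Using (8.26) and the results from the output, we have
$$t_1 = \frac{\hat{\beta}_1}{s\sqrt{g_{11}}} = \frac{3.0118}{\sqrt{2.8288}\sqrt{0.16207}} = \frac{3.0118}{0.67709} = 4.448,$$
$$t_2 = \frac{\hat{\beta}_2}{s\sqrt{g_{22}}} = \frac{-1.2855}{\sqrt{2.8288}\sqrt{0.08360}} = \frac{-1.2855}{0.48629} = -2.643.$$
Using $\alpha = 0.05$ for each test, we reject both $H_{01}: \beta_1 = 0$ and $H_{02}: \beta_2 = 0$ because $|t_1|$ and $|t_2|$ both exceed $t_{0.025,9} = 2.262$.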
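
The two t-statistics can be recomputed from the quantities printed in the PROC REG output. The sketch below is plain Python; the dictionary names are illustrative, and the critical value 2.262 is the tabled $t_{0.025,9}$ quoted in the text.

```python
import math

# Quantities from the PROC REG output for the Example 7.2 data.
sse, n, k = 25.458990536, 12, 2
s2 = sse / (n - k - 1)                                  # s^2 = SSE / (n - k - 1)
beta_hat = {"x1": 3.011829653, "x2": -1.285488959}      # parameter estimates
g = {"x1": 0.1620662461, "x2": 0.0835962145}            # diagonal of (X'X)^{-1}
t_crit = 2.262                                          # t_{0.025,9} from tables

t_stats = {}
for name in beta_hat:
    std_err = math.sqrt(s2 * g[name])                   # s * sqrt(g_jj)
    t_stats[name] = beta_hat[name] / std_err            # (8.26)
    decision = "reject H0" if abs(t_stats[name]) >= t_crit else "do not reject H0"
    print(name, "std. error =", round(std_err, 5),
          " t =", round(t_stats[name], 3), " ->", decision)
```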

8.6 Confidence Intervals & Prediction Intervals.

Assume that $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$.

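8.6.1 Confidence Region for $\boldsymbol{\beta}$.
If $\mathbf{C}$ is equal to $\mathbf{I}$ in (8.21), $q$ becomes $k+1$, and we can subtract $\boldsymbol{\beta}$ from $\hat{\boldsymbol{\beta}}$ to obtain a central F-distribution and make the probability statement
$$P\left[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})/(k+1)s^2 \le F_{\alpha,k+1,n-k-1}\right] = 1 - \alpha,$$
where $s^2 = SSE/(n-k-1)$. Thus, a $100(1-\alpha)\%$ joint confidence region for $\beta_0, \beta_1, \ldots, \beta_k$ in $\boldsymbol{\beta}$ is given by all vectors $\boldsymbol{\beta}$ that satisfy
$$(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \le (k+1)s^2 F_{\alpha,k+1,n-k-1}. \tag{8.27}$$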

8.6.2. Confidence Interval for $\beta_j$.
If $\beta_j \ne 0$, we can subtract $\beta_j$ in (8.26) so that $t_j = (\hat{\beta}_j - \beta_j)/s\sqrt{g_{jj}}$ has the central t-distribution, where $g_{jj}$ is the $j$-th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. Then
$$P\left[-t_{\alpha/2,n-k-1} \le \frac{\hat{\beta}_j - \beta_j}{s\sqrt{g_{jj}}} \le t_{\alpha/2,n-k-1}\right] = 1 - \alpha.$$
Solving the inequality for $\beta_j$ gives
$$P\left[\hat{\beta}_j - t_{\alpha/2,n-k-1}\,s\sqrt{g_{jj}} \le \beta_j \le \hat{\beta}_j + t_{\alpha/2,n-k-1}\,s\sqrt{g_{jj}}\right] = 1 - \alpha.$$
Before taking the sample, the probability that the random interval will contain $\beta_j$ is $1 - \alpha$. After taking the sample, the $100(1-\alpha)\%$ confidence interval for $\beta_j$,
$$\hat{\beta}_j \pm t_{\alpha/2,n-k-1}\,s\sqrt{g_{jj}}, \tag{8.28}$$
is no longer random, and we say that we are $100(1-\alpha)\%$ confident that the interval contains $\beta_j$.

Example 8.6.2. Compute a 95% CI for each $\beta_j$ using $y_2$ in the chemical reaction data in Example 8.2(a). The matrix $(\mathbf{X}'\mathbf{X})^{-1}$ and the estimate $\hat{\boldsymbol{\beta}}$ have the following values:
$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{pmatrix} 65.37550 & -0.33885 & -0.31252 & -0.02041 \\ -0.33885 & 0.00184 & 0.00127 & -0.00043 \\ -0.31252 & 0.00127 & 0.00408 & -0.00176 \\ -0.02041 & -0.00043 & -0.00176 & 0.02161 \end{pmatrix}, \qquad \hat{\boldsymbol{\beta}} = \begin{pmatrix} -26.0353 \\ 0.4046 \\ 0.2930 \\ 1.0338 \end{pmatrix}.$$
For $\beta_1$, we obtain, by (8.28),
$$\hat{\beta}_1 \pm t_{0.025,15}\,s\sqrt{g_{11}} = 0.4046 \pm (2.1314)(4.0781)\sqrt{0.00184} = 0.4046 \pm 0.3723,$$
that is, $(0.0322, 0.7769)$.
For the other $\beta_j$'s, we have
$\beta_0$: $-26.0353 \pm 70.2812$, that is, $(-96.3165, 44.2459)$;
$\beta_2$: $0.2930 \pm 0.5551$, that is, $(-0.2621, 0.8481)$;
$\beta_3$: $1.0338 \pm 1.2777$, that is, $(-0.2439, 2.3115)$.
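8.6.3 Confidence Interval for $\mathbf{a}'\boldsymbol{\beta}$.

If $\mathbf{a}'\boldsymbol{\beta} \ne 0$, we can subtract $\mathbf{a}'\boldsymbol{\beta}$ from $\mathbf{a}'\hat{\boldsymbol{\beta}}$ to obtain
$$F = \frac{(\mathbf{a}'\hat{\boldsymbol{\beta}} - \mathbf{a}'\boldsymbol{\beta})^2}{s^2\,\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}},$$
which is distributed as $F(1, n-k-1)$. Then,
$$t = \frac{\mathbf{a}'\hat{\boldsymbol{\beta}} - \mathbf{a}'\boldsymbol{\beta}}{s\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}} \tag{8.29}$$
is distributed as $t(n-k-1)$, and a $100(1-\alpha)\%$ confidence interval for a single value of $\mathbf{a}'\boldsymbol{\beta}$ is given by
$$\mathbf{a}'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2,n-k-1}\,s\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}. \tag{8.30}$$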

8.6.4 Confidence Interval for $E(y)$.
Let $\mathbf{x}_0 = (1, x_{01}, x_{02}, \ldots, x_{0k})'$ denote a particular choice of $\mathbf{x} = (1, x_1, x_2, \ldots, x_k)'$. Note that $\mathbf{x}_0$ need not be one of the $\mathbf{x}$'s in the sample; that is, $\mathbf{x}_0'$ need not be a row of $\mathbf{X}$. If $\mathbf{x}_0$ is very far outside the area covered by the sample, however, prediction based on $\mathbf{x}_0$ may be poor.

Let $y_0$ be an observation corresponding to $\mathbf{x}_0$. Then
$$y_0 = \mathbf{x}_0'\boldsymbol{\beta} + \varepsilon, \quad \text{and}$$
$$E(y_0) = \mathbf{x}_0'\boldsymbol{\beta}. \tag{8.31}$$
We wish to find a CI for $E(y_0)$, that is, for the mean of the distribution of $y$-values corresponding to $\mathbf{x}_0$. By Corollary 1 to Theorem 7.4D, the minimum variance unbiased estimator of $E(y_0)$ is given by
$$\hat{E}(y_0) = \mathbf{x}_0'\hat{\boldsymbol{\beta}}. \tag{8.32}$$
Since (8.31) and (8.32) are of the form $\mathbf{a}'\boldsymbol{\beta}$ and $\mathbf{a}'\hat{\boldsymbol{\beta}}$, respectively, we obtain a $100(1-\alpha)\%$ confidence interval for $E(y_0) = \mathbf{x}_0'\boldsymbol{\beta}$ from (8.30):
$$\mathbf{x}_0'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2,n-k-1}\,s\sqrt{\mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}. \tag{8.33}$$
Note: The confidence coefficient $1 - \alpha$ for the interval in (8.33) holds only for a single choice of the vector $\mathbf{x}_0$.

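In terms of the centered model, with $\mathbf{x}_{01} = (x_{01}, x_{02}, \ldots, x_{0k})'$ and $\bar{\mathbf{x}}_1 = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k)'$, (8.31), (8.32) and (8.33) become
$$E(y_0) = \alpha + \boldsymbol{\beta}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1), \tag{8.34}$$
$$\hat{E}(y_0) = \bar{y} + \hat{\boldsymbol{\beta}}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1), \tag{8.35}$$
$$\bar{y} + \hat{\boldsymbol{\beta}}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1) \pm t_{\alpha/2,n-k-1}\,s\sqrt{\frac{1}{n} + (\mathbf{x}_{01} - \bar{\mathbf{x}}_1)'(\mathbf{X}_c'\mathbf{X}_c)^{-1}(\mathbf{x}_{01} - \bar{\mathbf{x}}_1)}. \tag{8.36}$$

For the case of simple linear regression, (8.31), (8.32) and (8.36) reduce to
$$E(y_0) = \beta_0 + \beta_1 x_0, \tag{8.37}$$
$$\hat{E}(y_0) = \hat{\beta}_0 + \hat{\beta}_1 x_0, \tag{8.38}$$
$$\hat{\beta}_0 + \hat{\beta}_1 x_0 \pm t_{\alpha/2,n-2}\,s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}, \tag{8.39}$$
where
$$s^2 = \frac{\sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n-2} = \frac{SSE}{n-2}.$$

Note: The width of the interval in (8.39) depends on how far $x_0$ is from $\bar{x}$.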

Example 8.6.4. For the grades data in Example 6.1, find a 95% CI for $E(y_0)$, where $x_0 = 80$.
Using (8.39),
$$\hat{\beta}_0 + \hat{\beta}_1(80) \pm t_{0.025,16}\,s\sqrt{\frac{1}{18} + \frac{(80 - 58.056)^2}{19530.944}},$$
$$80.5386 \pm 2.1199(13.8547)(0.2832),$$
$$80.5386 \pm 8.3183,$$
$$(72.2204, 88.8569).$$

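8.6.5 Prediction Interval for a Future Observation
An interval for a future observation $y_0$ corresponding to $\mathbf{x}_0$ is called a prediction interval rather than a confidence interval because $y_0$ is an individual observation and is therefore a random variable rather than a parameter. To be $100(1-\alpha)\%$ confident that the interval contains $y_0$, the prediction interval will clearly have to be wider than a confidence interval for the parameter $E(y_0)$.
Since $y_0 = \mathbf{x}_0'\boldsymbol{\beta} + \varepsilon_0$, we predict $y_0$ by $\hat{y}_0 = \mathbf{x}_0'\hat{\boldsymbol{\beta}}$, which is also the estimator of $E(y_0) = \mathbf{x}_0'\boldsymbol{\beta}$. The random variables $y_0$ and $\hat{y}_0$ are independent because $y_0$ is a future observation to be obtained independently of the $n$ observations used to compute $\hat{y}_0 = \mathbf{x}_0'\hat{\boldsymbol{\beta}}$. Therefore,
$$\operatorname{var}(y_0 - \hat{y}_0) = \operatorname{var}(y_0) + \operatorname{var}(\hat{y}_0) = \operatorname{var}(\mathbf{x}_0'\boldsymbol{\beta} + \varepsilon_0) + \operatorname{var}(\mathbf{x}_0'\hat{\boldsymbol{\beta}}).$$
Since $\mathbf{x}_0'\boldsymbol{\beta}$ is a constant, this becomes
$$\operatorname{var}(y_0 - \hat{y}_0) = \operatorname{var}(\varepsilon_0) + \operatorname{var}(\mathbf{x}_0'\hat{\boldsymbol{\beta}}) = \sigma^2 + \sigma^2\mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0 = \sigma^2\left[1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0\right], \tag{8.40}$$
which is estimated by $s^2\left[1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0\right]$. It can be shown that $E(y_0 - \hat{y}_0) = 0$ and that $s^2$ is independent of both $y_0$ and $\hat{y}_0 = \mathbf{x}_0'\hat{\boldsymbol{\beta}}$. Therefore, the t-statistic
$$t = \frac{y_0 - \hat{y}_0 - 0}{s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}} \tag{8.41}$$
is distributed as $t(n-k-1)$, and
$$P\left[-t_{\alpha/2,n-k-1} \le \frac{y_0 - \hat{y}_0}{s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}} \le t_{\alpha/2,n-k-1}\right] = 1 - \alpha.$$
The inequality can be solved for $y_0$ to obtain the $100(1-\alpha)\%$ prediction interval
$$\hat{y}_0 - t_{\alpha/2,n-k-1}\,s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0} \le y_0 \le \hat{y}_0 + t_{\alpha/2,n-k-1}\,s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}$$
or, using $\hat{y}_0 = \mathbf{x}_0'\hat{\boldsymbol{\beta}}$,
$$\mathbf{x}_0'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2,n-k-1}\,s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}. \tag{8.42}$$

Note: The confidence coefficient $1 - \alpha$ for the prediction interval in (8.42) holds for only one value of $\mathbf{x}_0$.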

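In terms of the centered model, the $100(1-\alpha)\%$ prediction interval in (8.42) becomes
$$\bar{y} + \hat{\boldsymbol{\beta}}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1) \pm t_{\alpha/2,n-k-1}\,s\sqrt{1 + \frac{1}{n} + (\mathbf{x}_{01} - \bar{\mathbf{x}}_1)'(\mathbf{X}_c'\mathbf{X}_c)^{-1}(\mathbf{x}_{01} - \bar{\mathbf{x}}_1)}. \tag{8.43}$$
For the case of simple linear regression, (8.42) and (8.43) reduce to
$$\hat{\beta}_0 + \hat{\beta}_1 x_0 \pm t_{\alpha/2,n-2}\,s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}, \tag{8.44}$$
where
$$s^2 = \frac{\sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n-2} = \frac{SSE}{n-2}.$$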
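
The simple linear regression intervals (8.39) and (8.44) differ only by the extra 1 under the square root, which the following sketch makes explicit. It is plain Python; the function name `interval_half_widths` is illustrative, and the numerical inputs are the summary values quoted for the grades data in Examples 8.6.4 and 8.6.5 ($n = 18$, $\bar{x} = 58.056$, $\sum (x_i - \bar{x})^2 = 19530.944$, $s = 13.8547$, $t_{0.025,16} = 2.1199$, $\hat{y}_0 = 80.5386$).

```python
import math

def interval_half_widths(x0, n, xbar, sxx, s, t_crit):
    """Half-widths of the CI for E(y0), as in (8.39), and of the
    prediction interval for y0, as in (8.44), in simple linear regression."""
    core = 1.0 / n + (x0 - xbar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(core)          # confidence interval for E(y0)
    pi_half = t_crit * s * math.sqrt(1.0 + core)    # prediction interval for y0
    return ci_half, pi_half

# Summary values quoted in Examples 8.6.4 and 8.6.5 (grades data, x0 = 80).
y0_hat = 80.5386
ci_half, pi_half = interval_half_widths(
    x0=80, n=18, xbar=58.056, sxx=19530.944, s=13.8547, t_crit=2.1199)

print("95% CI for E(y0):", (round(y0_hat - ci_half, 3), round(y0_hat + ci_half, 3)))
print("95% PI for y0:   ", (round(y0_hat - pi_half, 3), round(y0_hat + pi_half, 3)))
```

The prediction interval is much wider than the confidence interval because it also carries the variance $\sigma^2$ of the new observation itself.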

Example 8.6.5. Using the grades data in Example 6.1, find a 95% prediction interval for $y_0$ when $x_0 = 80$. Using (8.44),
$$\hat{\beta}_0 + \hat{\beta}_1(80) \pm t_{0.025,16}\,s\sqrt{1 + \frac{1}{18} + \frac{(80 - 58.056)^2}{19530.944}},$$
$$80.5386 \pm 2.1199(13.8547)(1.0393),$$
$$80.5386 \pm 30.5258,$$
$$(50.0128, 111.0644).$$

8.6.6 Confidence Interval for $\sigma^2$.
By theorem, $(n-k-1)s^2/\sigma^2$ is $\chi^2(n-k-1)$. Therefore,
$$P\left[\chi^2_{1-\alpha/2,n-k-1} \le \frac{(n-k-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2,n-k-1}\right] = 1 - \alpha, \tag{8.45}$$
where $\chi^2_{\alpha/2,n-k-1}$ is the upper $\alpha/2$ percentage point of the chi-square distribution and $\chi^2_{1-\alpha/2,n-k-1}$ is the lower $\alpha/2$ percentage point. Solving the inequality for $\sigma^2$ yields the $100(1-\alpha)\%$ confidence interval
$$\frac{(n-k-1)s^2}{\chi^2_{\alpha/2,n-k-1}} \le \sigma^2 \le \frac{(n-k-1)s^2}{\chi^2_{1-\alpha/2,n-k-1}}. \tag{8.46}$$
A $100(1-\alpha)\%$ confidence interval for $\sigma$ is
$$\sqrt{\frac{(n-k-1)s^2}{\chi^2_{\alpha/2,n-k-1}}} \le \sigma \le \sqrt{\frac{(n-k-1)s^2}{\chi^2_{1-\alpha/2,n-k-1}}}. \tag{8.47}$$

Exercises: Using $y_2$ in the chemical reaction data in Example 8.2(a),
(i) Find a 95% confidence interval for $E(y_0) = \mathbf{x}_0'\boldsymbol{\beta}$, where $\mathbf{x}_0' = (1, 165, 32, 5)$.
(ii) Find a 95% prediction interval for $y_0 = \mathbf{x}_0'\boldsymbol{\beta} + \varepsilon$, where $\mathbf{x}_0' = (1, 165, 32, 5)$.
(iii) Test $H_0: 2\beta_1 = 2\beta_2 = \beta_3$.
Example 8.2(a)
In an experiment to obtain maximum yield in a chemical reaction, the following variables were chosen: x1 = temperature (degrees Celsius), x2 = concentration of a reagent (%), x3 = time of reaction (hours). Two different response variables were observed: y1 = percent of unchanged starting material, y2 = percent converted to the desired product.

Table 8.2(a): Chemical Reaction Data
Y1 Y2 X1 X2 X3
41.5 45.9 162 23 3
33.8 53.3 162 23 8
27.7 57.5 162 30 5
21.7 58.8 162 30 8
19.9 60.6 172 25 5
15 58 172 25 8
12.2 58.6 172 30 5
4.3 52.4 172 30 8
19.3 56.9 167 27.5 6.5
6.4 55.4 177 27.5 6.5
37.6 46.9 157 27.5 6.5
18 57.3 167 32.5 6.5
26.3 55 167 22.5 6.5
9.9 58.9 167 27.5 9.5
25 50.3 167 27.5 3.5
14.1 61.1 177 20 6.5
15.2 62.9 177 20 6.5
15.9 60 160 34 7.5
19.6 60.6 160 34 7.5

Consider the dependent variable $y_2$ in the chemical reaction data. To check the usefulness of second-order terms in predicting $y_2$, we use as a full model
$$y_2 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_3^2 + \beta_7 x_1 x_2 + \beta_8 x_1 x_3 + \beta_9 x_2 x_3 + \varepsilon,$$
and test $H_0: \beta_4 = \beta_5 = \cdots = \beta_9 = 0$.

data chemical;
input y1 y2 x1 x2 x3;
cards;
41.5 45.9 162 23 3
33.8 53.3 162 23 8
27.7 57.5 162 30 5
21.7 58.8 162 30 8
19.9 60.6 172 25 5
15 58 172 25 8
12.2 58.6 172 30 5
4.3 52.4 172 30 8
19.3 56.9 167 27.5 6.5
6.4 55.4 177 27.5 6.5
37.6 46.9 157 27.5 6.5
18 57.3 167 32.5 6.5

26.3 55 167 22.5 6.5
9.9 58.9 167 27.5 9.5
25 50.3 167 27.5 3.5
14.1 61.1 177 20 6.5
15.2 62.9 177 20 6.5
15.9 60 160 34 7.5
19.6 60.6 160 34 7.5
;
proc reg data=chemical;
/* reduced (first-order) model */
model y2=x1 x2 x3;
run;

output:
The REG Procedure
Model: MODEL1
Dependent Variable: y2

Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 151.00219 50.33406 3.03 0.0623

Error 15 249.46202 16.63080
Root MSE 4.07809 R-Square 0.3771

Dependent Mean 56.33684 Adj R-Sq 0.2525
Coeff Var 7.23876
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 -26.03526 32.97343 -0.79 0.4421

x1 1 0.40455 0.17469 2.32 0.0351
x2 1 0.29299 0.26045 1.12 0.2783
x3 1 1.03380 0.59943 1.72 0.1051
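
The parameter estimates and standard errors in this output are all that (8.28) needs, since the printed standard error is $s\sqrt{g_{jj}}$. The sketch below is plain Python and reproduces the 95% confidence intervals of Example 8.6.2; the critical value $t_{0.025,15} = 2.13145$ is a standard table value entered by hand, and the dictionary names are illustrative.

```python
# 95% confidence intervals (8.28) from the PROC REG parameter estimates for y2.
t_crit = 2.13145            # t_{0.025,15}, from standard tables

# name: (estimate, standard error) as printed in the output
estimates = {
    "Intercept": (-26.03526, 32.97343),
    "x1": (0.40455, 0.17469),
    "x2": (0.29299, 0.26045),
    "x3": (1.03380, 0.59943),
}

intervals = {}
for name, (b, se) in estimates.items():
    half_width = t_crit * se                     # t * s * sqrt(g_jj)
    intervals[name] = (b - half_width, b + half_width)
    print(name, ":", round(b, 4), "+/-", round(half_width, 4),
          "->", tuple(round(v, 4) for v in intervals[name]))
```

Only the interval for the coefficient of x1 excludes zero, in agreement with the t-tests in the output.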
data chemical;
input y1 y2 x1 x2 x3;
cards;
41.5 45.9 162 23 3
33.8 53.3 162 23 8
27.7 57.5 162 30 5
21.7 58.8 162 30 8
19.9 60.6 172 25 5
15 58 172 25 8
12.2 58.6 172 30 5
4.3 52.4 172 30 8
19.3 56.9 167 27.5 6.5
6.4 55.4 177 27.5 6.5
37.6 46.9 157 27.5 6.5
18 57.3 167 32.5 6.5
26.3 55 167 22.5 6.5
9.9 58.9 167 27.5 9.5
25 50.3 167 27.5 3.5
14.1 61.1 177 20 6.5
15.2 62.9 177 20 6.5
15.9 60 160 34 7.5
19.6 60.6 160 34 7.5
;
proc glm data=chemical;
/* unlike proc reg, proc glm allows polynomial and crossproduct terms in the model statement */
model y2=x1 x2 x3 x1*x1 x2*x2 x3*x3 x1*x2 x1*x3 x2*x3;
run;
output:

The GLM Procedure


Dependent Variable: y2

Sum of
Source DF Squares Mean Square F Value Pr > F

Model 9 339.7887457 37.7543051 5.60 0.0086

Error 9 60.6754648 6.7417183


R-Square Coeff Var Root MSE y2 Mean

0.848487 4.608852 2.596482 56.33684

Source DF Type I SS Mean Square F Value Pr > F

x1 1 65.34630867 65.34630867 9.69 0.0125
x2 1 36.18984581 36.18984581 5.37 0.0457
x3 1 49.46603624 49.46603624 7.34 0.0240
x1*x1 1 0.22191886 0.22191886 0.03 0.8600
x2*x2 1 83.13759745 83.13759745 12.33 0.0066
x3*x3 1 11.00115058 11.00115058 1.63 0.2334
x1*x2 1 73.87317137 73.87317137 10.96 0.0091
x1*x3 1 20.29147665 20.29147665 3.01 0.1168
x2*x3 1 0.26124008 0.26124008 0.04 0.8483

Source DF Type III SS Mean Square F Value Pr > F

x1 1 53.47793092 53.47793092 7.93 0.0202
x2 1 36.03765833 36.03765833 5.35 0.0461
x3 1 24.62826755 24.62826755 3.65 0.0883
x1*x1 1 37.33001918 37.33001918 5.54 0.0431
x2*x2 1 0.12012988 0.12012988 0.02 0.8967
x3*x3 1 0.74964520 0.74964520 0.11 0.7464
x1*x2 1 60.52113170 60.52113170 8.98 0.0150
x1*x3 1 18.09124540 18.09124540 2.68 0.1358
x2*x3 1 0.26124008 0.26124008 0.04 0.8483

Standard
Parameter Estimate Error t Value Pr > |t|

Intercept ‐2282.928273 739.5889589 ‐3.09 0.0130
x1 22.859973 8.1165920 2.82 0.0202
x2 21.414751 9.2623257 2.31 0.0461
x3 33.609745 17.5846445 1.91 0.0883
x1*x1 ‐0.053803 0.0228645 ‐2.35 0.0431
x2*x2 ‐0.007749 0.0580525 ‐0.13 0.8967
x3*x3 ‐0.085891 0.2575754 ‐0.33 0.7464
x1*x2 ‐0.123372 0.0411765 ‐3.00 0.0150
x1*x3 ‐0.186260 0.1137027 ‐1.64 0.1358
x2*x3 -0.034100 0.1732286 -0.20 0.8483

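Test the hypothesis $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ versus $H_1: \boldsymbol{\beta}_2 \neq \mathbf{0}$, where $\boldsymbol{\beta}_2 = (\beta_4, \beta_5, \ldots, \beta_9)'$ contains the coefficients of the six second-order terms.
For the full model,
$$y_2 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_3^2 + \beta_7 x_1 x_2 + \beta_8 x_1 x_3 + \beta_9 x_2 x_3 + \varepsilon,$$
we obtain $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^2 = 339.7888$, and for the reduced model,
$$y_2 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon,$$
we have $\hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y} - n\bar{y}^2 = 151.0022$. The difference is $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y} = 188.7866$.
The error sum of squares is SSE = 60.6755 (from the full model output), and the F-statistic is given by (8.11) as
$$F = \frac{(\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y})/h}{(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y})/(n-k-1)} = \frac{188.7866/6}{60.6755/9} = \frac{31.4644}{6.7417} = 4.6671.$$

From the statistical table, $F_{0.05,6,9} = 3.374$. Since $F > F_{0.05,6,9}$, reject $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$.
Conclusion: The second-order terms are useful in predicting $y_2$.

Note: The overall F for the reduced model is 3.03 with p-value 0.0623, so $x_1$, $x_2$ and $x_3$ alone are inadequate for predicting $y_2$ at the 0.05 level. The overall F for the full model is 5.60 with p-value 0.0086.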
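The arithmetic of this full-and-reduced-model test can be rechecked with a short script. The sketch below only re-uses the sums of squares printed in the two SAS listings; the function name `partial_f` is a hypothetical helper, and the critical value is the table value quoted in the text rather than something the script computes.

```python
# Sums of squares copied from the SAS listings above.
ssr_full = 339.7887457     # Model SS, full second-order model (PROC GLM)
ssr_reduced = 151.00219    # Model SS, reduced first-order model (PROC REG)
sse_full = 60.6754648      # Error SS, full model
h = 6                      # number of second-order coefficients tested
df_error = 9               # n - k - 1 = 19 - 9 - 1


def partial_f(ssr_full, ssr_reduced, sse_full, h, df_error):
    """Return [SS(beta2 | beta1) / h] / [SSE / (n - k - 1)]."""
    extra_ss = ssr_full - ssr_reduced
    return (extra_ss / h) / (sse_full / df_error)


f_stat = partial_f(ssr_full, ssr_reduced, sse_full, h, df_error)
f_crit = 3.374             # F_{0.05,6,9}, table value quoted in the text

print(round(f_stat, 4), f_stat > f_crit)
```

The script reproduces the hand calculation and the decision to reject $H_0$; it does not replace the SAS fit that produced the sums of squares.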
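Chapter 8. Consider the model under the normality assumption.
Model: $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$ and $\mathbf{X}$ is $n \times (k+1)$ of rank $k+1 < n$ (full-rank model). The $x$'s are fixed constants.
8.1 Test of Overall Regression
The hypothesis is expressed as $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$, where $\boldsymbol{\beta}_1 = (\beta_1, \beta_2, \ldots, \beta_k)'$. We want to test $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$, not $H_0: \boldsymbol{\beta} = \mathbf{0}$, where
$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \boldsymbol{\beta}_1 \end{pmatrix}.$$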

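To construct an F-test involving SSR and SSE, we need to express the sums of squares in SST = SSR + SSE as quadratic forms in $\mathbf{y}$, so that theorems on quadratic forms can be used to show that these sums of squares have chi-square distributions and are independent:
$$\begin{aligned}
\mathbf{y}'\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\right)\mathbf{y} &= SSR + SSE \\
&= \mathbf{y}'\mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\mathbf{y} + \left[\mathbf{y}'\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\right)\mathbf{y} - \mathbf{y}'\mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\mathbf{y}\right] \\
&= \mathbf{y}'\mathbf{A}\mathbf{y} + \mathbf{y}'\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J} - \mathbf{A}\right)\mathbf{y}, \qquad (8.2)
\end{aligned}$$
where $\mathbf{A} = \mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'$.
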
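The matrices of these quadratic forms have the properties stated in Theorem 8.1A.
Theorem 8.1A. The matrices $\mathbf{I} - \tfrac{1}{n}\mathbf{J}$, $\mathbf{A} = \mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'$, and $\mathbf{I} - \tfrac{1}{n}\mathbf{J} - \mathbf{A}$ have the following properties:

(i) $\mathbf{A}\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\right) = \mathbf{A}$,
(ii) $\mathbf{A}$ is idempotent of rank $k$,
(iii) $\mathbf{I} - \tfrac{1}{n}\mathbf{J} - \mathbf{A}$ is idempotent of rank $n-k-1$,
(iv) $\mathbf{A}\left(\mathbf{I} - \tfrac{1}{n}\mathbf{J} - \mathbf{A}\right) = \mathbf{O}$.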
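Theorem 8.1B. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, then $SSR/\sigma^2 = \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{X}_c\hat{\boldsymbol{\beta}}_1/\sigma^2$ and $SSE/\sigma^2 = \left[\sum_{i=1}^n (y_i - \bar{y})^2 - \hat{\boldsymbol{\beta}}_1'\mathbf{X}_c'\mathbf{X}_c\hat{\boldsymbol{\beta}}_1\right]/\sigma^2$ have the following distributions:
(i) $SSR/\sigma^2$ is $\chi^2(k, \lambda_1)$, where $\lambda_1 = \boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}/(2\sigma^2) = \boldsymbol{\beta}_1'\mathbf{X}_c'\mathbf{X}_c\boldsymbol{\beta}_1/(2\sigma^2)$.
(ii) $SSE/\sigma^2$ is $\chi^2(n-k-1)$.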
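The four properties in Theorem 8.1A can be checked numerically on a toy design. The sketch below is an illustration only: it assumes a single hypothetical predictor ($k = 1$, $n = 4$), so that $\mathbf{X}_c'\mathbf{X}_c$ is a scalar, and it uses the fact that the rank of an idempotent matrix equals its trace. All names are made up for the example.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_close(a, b, tol=1e-12):
    """Entrywise comparison of two equally sized matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(len(a)) for j in range(len(a[0])))


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


x = [1.0, 2.0, 3.0, 4.0]          # hypothetical predictor values (k = 1)
n, k = len(x), 1
x_bar = sum(x) / n
xc = [xi - x_bar for xi in x]      # the centered column Xc
sxx = sum(v * v for v in xc)       # Xc'Xc is a scalar when k = 1

A = [[xc[i] * xc[j] / sxx for j in range(n)] for i in range(n)]
C = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)]
     for i in range(n)]                                        # I - J/n
R = [[C[i][j] - A[i][j] for j in range(n)] for i in range(n)]  # I - J/n - A
zero = [[0.0] * n for _ in range(n)]

prop_i = is_close(matmul(A, C), A)
prop_ii = is_close(matmul(A, A), A) and abs(trace(A) - k) < 1e-12
prop_iii = is_close(matmul(R, R), R) and abs(trace(R) - (n - k - 1)) < 1e-12
prop_iv = is_close(matmul(A, R), zero)
print(prop_i, prop_ii, prop_iii, prop_iv)
```

A numerical check on one design is not a proof, but it makes the roles of $\mathbf{A}$ and $\mathbf{I} - \tfrac{1}{n}\mathbf{J} - \mathbf{A}$ in the partition (8.2) concrete.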
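Theorem 8.1C. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, then SSR and SSE are independent, where SSR and SSE are defined in (8.2) and Theorem 8.1B.
Theorem 8.1D. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, the distribution of
$$F = \frac{SSR/(k\sigma^2)}{SSE/[(n-k-1)\sigma^2]} = \frac{SSR/k}{SSE/(n-k-1)} \qquad (8.3)$$
is as follows:
(i) If $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ is false, then $F$ is distributed as $F(k, n-k-1, \lambda_1)$, where $\lambda_1 = \boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}/(2\sigma^2) = \boldsymbol{\beta}_1'\mathbf{X}_c'\mathbf{X}_c\boldsymbol{\beta}_1/(2\sigma^2)$.
(ii) If $H_0: \boldsymbol{\beta}_1 = \mathbf{0}$ is true, then $\lambda_1 = 0$ and $F$ is distributed as $F(k, n-k-1)$.
Reject $H_0$ if $F \geq F_{\alpha,k,n-k-1}$, where $F_{\alpha,k,n-k-1}$ is the upper $\alpha$ percentage point of the (central) F distribution.
Remember that, in terms of the noncentered model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, SSR and SSE are
$$SSR = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^2 \quad \text{and} \quad SSE = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}.$$
Try Example 8.1.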
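As a quick illustration of (8.3), the overall F statistic for the first-order fit of the chemical data can be recomputed from the analysis-of-variance line of the PROC REG listing shown earlier. This is a sketch that only re-uses the printed sums of squares; the variable names are arbitrary.

```python
# Values copied from the PROC REG analysis-of-variance table for y2 on x1, x2, x3.
ssr = 151.00219     # Model sum of squares
sse = 249.46202     # Error sum of squares
n, k = 19, 3        # observations and number of x's

f_stat = (ssr / k) / (sse / (n - k - 1))   # equation (8.3)
r_square = ssr / (ssr + sse)               # SSR / SST

print(round(f_stat, 2), round(r_square, 4))
```

Both values agree with the "F Value" and "R-Square" entries of the listing.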
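8.2. Test On a Subset of the $\beta$'s
We wish to test the hypothesis that a subset of the $x$'s is not useful in predicting $y$.
The model is partitioned as
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} = (\mathbf{X}_1, \mathbf{X}_2)\begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix} + \boldsymbol{\varepsilon} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon},$$
where $\boldsymbol{\beta}_2$ contains the $\beta$'s to be tested and the intercept $\beta_0$ is included in $\boldsymbol{\beta}_1$.
The hypothesis of interest is $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$. If the number of parameters in $\boldsymbol{\beta}_2$ is $h$, then $\mathbf{X}_2$ is $n \times h$, $\boldsymbol{\beta}_1$ is $(k-h+1) \times 1$, and $\mathbf{X}_1$ is $n \times (k-h+1)$. Thus, $\boldsymbol{\beta}_1 = (\beta_0, \beta_1, \ldots, \beta_{k-h})'$ and $\boldsymbol{\beta}_2 = (\beta_{k-h+1}, \ldots, \beta_k)'$.
We test $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ versus $H_1: \boldsymbol{\beta}_2 \neq \mathbf{0}$ by the full-and-reduced-model approach: under $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$, the full model reduces to
$$\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1^* + \boldsymbol{\varepsilon}^*.$$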
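The regression sum of squares is partitioned as
$$SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) = \left(\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^2\right) - \left(\hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y} - n\bar{y}^2\right) = SSR(\text{full}) - SSR(\text{reduced}),$$
so that $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}$ is the difference between the overall regression sum of squares for the full model and the overall regression sum of squares for the reduced model.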
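To develop a test statistic based on $SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)$, we write the sums of squares as quadratic forms in $\mathbf{y}$:
$$\mathbf{y}'\mathbf{y} = \mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y} + \mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y} + \mathbf{y}'\mathbf{A}_2\mathbf{y},$$
where $\mathbf{A}_1 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ and $\mathbf{A}_2 = \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'$. The matrix $\mathbf{I} - \mathbf{A}_1$ is idempotent with rank $n-k-1$, where $k+1$ is the rank of $\mathbf{X}$ ($k+1$ is also the number of elements in $\boldsymbol{\beta}$). The matrix $\mathbf{A}_1 - \mathbf{A}_2$ is also idempotent, as shown in the following theorem.
Theorem 8.2A. The matrix $\mathbf{A}_1 - \mathbf{A}_2 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'$ is idempotent with rank $h$, where $h$ is the number of elements in $\boldsymbol{\beta}_2$.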
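Theorem 8.2B. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, and $\mathbf{A}_1$ and $\mathbf{A}_2$ are as defined before, then
(i) $\mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y}/\sigma^2$ is $\chi^2(n-k-1)$;
(ii) $\mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y}/\sigma^2$ is $\chi^2(h, \lambda_1)$, where $\lambda_1 = \boldsymbol{\beta}_2'\left[\mathbf{X}_2'\mathbf{X}_2 - \mathbf{X}_2'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2\right]\boldsymbol{\beta}_2/(2\sigma^2)$;
(iii) $\mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y}$ and $\mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y}$ are independent.

Theorem 8.2C. Let $\mathbf{y}$ be $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$ and define an F-statistic as follows:
$$F = \frac{\mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y}/h}{\mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y}/(n-k-1)} = \frac{SS(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)/h}{SSE/(n-k-1)} = \frac{\left(\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}\right)/h}{\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)/(n-k-1)},$$
where $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is from the full model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and $\hat{\boldsymbol{\beta}}_1^* = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y}$ is from the reduced model $\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1^* + \boldsymbol{\varepsilon}^*$. The distribution of $F$ is as follows:
(i) If $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ is false, then $F$ is distributed as $F(h, n-k-1, \lambda_1)$, where $\lambda_1 = \boldsymbol{\beta}_2'\left[\mathbf{X}_2'\mathbf{X}_2 - \mathbf{X}_2'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2\right]\boldsymbol{\beta}_2/(2\sigma^2)$.
(ii) If $H_0: \boldsymbol{\beta}_2 = \mathbf{0}$ is true, then $\lambda_1 = 0$ and $F$ is distributed as $F(h, n-k-1)$.
Reject $H_0$ if $F \geq F_{\alpha,h,n-k-1}$, where $F_{\alpha,h,n-k-1}$ is the upper $\alpha$ percentage point of the (central) F-distribution. Alternatively, we reject $H_0$ if $p \leq \alpha$, where $p$ is the p-value.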
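As a check on the full-and-reduced-model statistic, consider the special case in which every slope is tested, so that $h = k$ and the reduced model contains only the intercept. The sketch below is a worked derivation under that assumption; $\mathbf{j}$ denotes the $n \times 1$ vector of ones, as in the centered model of Section 8.1.

```latex
\begin{aligned}
\text{Reduced model:}\quad & \mathbf{y} = \beta_0^{*}\,\mathbf{j} + \boldsymbol{\varepsilon}^{*},
  \qquad \mathbf{X}_1 = \mathbf{j},
  \qquad \hat{\beta}_0^{*} = (\mathbf{j}'\mathbf{j})^{-1}\mathbf{j}'\mathbf{y} = \bar{y}, \\
\hat{\beta}_0^{*}\,\mathbf{j}'\mathbf{y} &= \bar{y}\,(n\bar{y}) = n\bar{y}^{2}, \\
SS(\boldsymbol{\beta}_1 \mid \beta_0) &= \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{y}^{2} = SSR, \\
F &= \frac{SSR/k}{SSE/(n-k-1)} .
\end{aligned}
```

In this case the subset test coincides with the overall regression test in (8.3), with $\boldsymbol{\beta}_1$ here meaning the vector of all $k$ slopes.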
Example 8.2(a)

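8.4 F test for the general linear hypothesis
Consider the regression model
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + \varepsilon_i, \quad i = 1, \ldots, n,$$
with $E(\varepsilon_i) = 0$, $E(\varepsilon_i\varepsilon_j) = 0$ for $i \neq j$, and $\mathrm{Var}(\varepsilon_i) = \sigma^2$.
Suppose we want to test the following linear hypotheses:
(a)
$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
(b)
$H_0: \beta_2 = 3$
$H_1: \beta_2 \neq 3$
(c)
$H_0: \beta_1 = \beta_5$ or $\beta_1 - \beta_5 = 0$
$H_1: \beta_1 \neq \beta_5$ or $\beta_1 - \beta_5 \neq 0$

(d)
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_1$: At least one $\beta_i \neq 0$
This hypothesis can be expressed as
$H_0: \boldsymbol{\beta}_1 = \mathbf{0}$
$H_1: \boldsymbol{\beta}_1 \neq \mathbf{0}$, where $\boldsymbol{\beta}_1 = (\beta_1, \beta_2, \beta_3, \beta_4, \beta_5)'$.
(e)
$H_0: (\beta_2, \beta_5)' = \mathbf{0}$
$H_1: (\beta_2, \beta_5)' \neq \mathbf{0}$
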
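All of the hypotheses above can be expressed through the general linear hypothesis:
$H_0: \mathbf{C}\boldsymbol{\beta} - \boldsymbol{\gamma} = \mathbf{0}$
$H_1: \mathbf{C}\boldsymbol{\beta} - \boldsymbol{\gamma} \neq \mathbf{0}$
Now let us find the matrix $\mathbf{C}$ and the vector $\boldsymbol{\gamma}$ for each of the hypotheses (a)-(e), with $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_5)'$:
(a) $\mathbf{C} = (0, 0, 1, 0, 0, 0)$ and $\gamma = 0$
(b) $\mathbf{C} = (0, 0, 1, 0, 0, 0)$ and $\gamma = 3$
(c) $\mathbf{C} = (0, 1, 0, 0, 0, -1)$ and $\gamma = 0$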
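(d) This is also called the test of overall significance of the model. The matrix $\mathbf{C}$ and the vector $\boldsymbol{\gamma}$ are defined as follows:

$$\mathbf{C}\boldsymbol{\beta} - \boldsymbol{\gamma} = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad \boldsymbol{\gamma} = \mathbf{0}.$$
This gives the hypothesis $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$.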

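(e) Here we test whether the two parameters $(\beta_2, \beta_5)$ are simultaneously significant. The matrix $\mathbf{C}$ and the vector $\boldsymbol{\gamma}$ are defined as follows:
$$\mathbf{C}\boldsymbol{\beta} - \boldsymbol{\gamma} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \boldsymbol{\gamma} = \mathbf{0}.$$
This gives the hypothesis $\beta_2 = \beta_5 = 0$.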
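The way $\mathbf{C}$ and $\boldsymbol{\gamma}$ encode hypotheses (a)-(e) can be illustrated by evaluating $\mathbf{C}\boldsymbol{\beta} - \boldsymbol{\gamma}$ for a given coefficient vector. In the sketch below the vector `beta` is hypothetical (chosen so that (b) and (c) hold and the others do not), and `c_beta_minus_gamma` is a made-up helper name; in practice $\boldsymbol{\beta}$ is unknown and the hypothesis is tested with the F statistic of this section.

```python
def c_beta_minus_gamma(C, beta, gamma):
    """Return the vector C*beta - gamma, with C given as a list of rows."""
    return [sum(c * b for c, b in zip(row, beta)) - g
            for row, g in zip(C, gamma)]


# Hypothetical coefficient vector (beta0, beta1, ..., beta5), for illustration only.
beta = [10.0, 2.0, 3.0, 4.0, 5.0, 2.0]

# (C, gamma) pairs for hypotheses (a)-(e) as written in the text.
hypotheses = {
    "a": ([[0, 0, 1, 0, 0, 0]], [0]),
    "b": ([[0, 0, 1, 0, 0, 0]], [3]),
    "c": ([[0, 1, 0, 0, 0, -1]], [0]),
    "d": ([[0, 1, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0],
           [0, 0, 0, 1, 0, 0],
           [0, 0, 0, 0, 1, 0],
           [0, 0, 0, 0, 0, 1]], [0, 0, 0, 0, 0]),
    "e": ([[0, 0, 1, 0, 0, 0],
           [0, 0, 0, 0, 0, 1]], [0, 0]),
}

# H0 holds for this beta exactly when every element of C*beta - gamma is zero.
holds = {}
for name, (C, gamma) in hypotheses.items():
    diff = c_beta_minus_gamma(C, beta, gamma)
    holds[name] = all(v == 0 for v in diff)
    print(name, diff, holds[name])
```

The number of rows of $\mathbf{C}$ is the number of linear combinations tested, which becomes the numerator degrees of freedom $q$.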
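To test $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ versus $H_1: \mathbf{C}\boldsymbol{\beta} \neq \mathbf{0}$, we use the following theorem on the sums of squares involved and their properties:
Theorem 8.4A. If $\mathbf{y}$ is distributed as $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$ and $\mathbf{C}$ is $q \times (k+1)$ of rank $q \leq k+1$, then
(i) $\mathbf{C}\hat{\boldsymbol{\beta}}$ is $N_q\left(\mathbf{C}\boldsymbol{\beta}, \sigma^2\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right)$;
(ii) $SSH/\sigma^2 = (\mathbf{C}\hat{\boldsymbol{\beta}})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}/\sigma^2$ is $\chi^2(q, \lambda)$, where $\lambda = (\mathbf{C}\boldsymbol{\beta})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right]^{-1}\mathbf{C}\boldsymbol{\beta}/(2\sigma^2)$;
(iii) $SSE/\sigma^2 = \mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y}/\sigma^2$ is $\chi^2(n-k-1)$;
(iv) SSH and SSE are independent.

Note: The sum of squares due to $\mathbf{C}\boldsymbol{\beta}$ (due to the hypothesis) is denoted by SSH.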
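The F-test is given as follows:
Theorem 8.4B. Let $\mathbf{y}$ be $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$ and define the statistic
$$F = \frac{SSH/q}{SSE/(n-k-1)} = \frac{(\mathbf{C}\hat{\boldsymbol{\beta}})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}/q}{SSE/(n-k-1)},$$
where $\mathbf{C}$ is $q \times (k+1)$ of rank $q \leq k+1$ and $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$. The distribution of this $F$ is as follows:
(i) If $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ is false, then $F$ is distributed as $F(q, n-k-1, \lambda)$.
(ii) If $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ is true, then $F$ is distributed as $F(q, n-k-1)$.
This F-test for $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ is called the general linear hypothesis test. The degrees of freedom $q$ is the number of linear combinations in $\mathbf{C}\boldsymbol{\beta}$.
We reject $H_0$ if $F \geq F_{\alpha,q,n-k-1}$, where $F_{\alpha,q,n-k-1}$ is the upper $\alpha$ percentage point of the (central) F-distribution. Alternatively, reject $H_0$ if p-value $\leq \alpha$.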
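The expected mean squares for this F-test are given by
$$E\left(\frac{SSH}{q}\right) = \sigma^2 + \frac{1}{q}(\mathbf{C}\boldsymbol{\beta})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right]^{-1}\mathbf{C}\boldsymbol{\beta},$$
$$E\left(\frac{SSE}{n-k-1}\right) = \sigma^2.$$
Now try Example 8.4(b). Use SAS to find all the relevant values for the F-test.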

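8.5 Testing One $\beta_k$ or One $\mathbf{a}'\boldsymbol{\beta}$
The F-test statistic for $H_0: \beta_k = 0$
(i) by using a full and reduced model is
$$F = \frac{\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_1^{*\prime}\mathbf{X}_1'\mathbf{y}}{\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)/(n-k-1)},$$
which is distributed as $F(1, n-k-1)$ if $H_0$ is true.
Note that, in this case, $\beta_k$ is the last $\beta$, so that $\boldsymbol{\beta}$ is partitioned as $\boldsymbol{\beta} = (\boldsymbol{\beta}_1', \beta_k)'$ and $\mathbf{X}$ is partitioned as $\mathbf{X} = (\mathbf{X}_1, \mathbf{x}_k)$, where $\mathbf{x}_k$ is the last column of $\mathbf{X}$. Then $\mathbf{X}_1$ in the reduced model $\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1^* + \boldsymbol{\varepsilon}^*$ contains all the columns of $\mathbf{X}$ except the last.
(ii) by using the general linear hypothesis approach:
Suppose we want to test $H_0: \beta_j = 0$.
We define $\mathbf{a}' = (0, \ldots, 0, 1, 0, \ldots, 0)$, where the 1 is in the $j$-th position.
This gives
$$F = \frac{\hat{\beta}_j^2}{s^2 g_{jj}},$$
where $g_{jj}$ is the $j$-th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. If $H_0: \beta_j = 0$ is true, $F$ is distributed as $F(1, n-k-1)$. We reject $H_0: \beta_j = 0$ if $F \geq F_{\alpha,1,n-k-1}$ or, equivalently, if p-value $\leq \alpha$.
Note that since the F-statistic has 1 and $n-k-1$ degrees of freedom, we can equivalently use the t-statistic
$$t_j = \frac{\hat{\beta}_j}{s\sqrt{g_{jj}}}$$
to test the effect of $\beta_j$ above and beyond the other $\beta$'s. We reject $H_0: \beta_j = 0$ if $|t_j| \geq t_{\alpha/2,n-k-1}$ or, equivalently, if p-value $\leq \alpha$.
F-test for $H_0: \mathbf{a}'\boldsymbol{\beta} = 0$
To test $H_0: \mathbf{a}'\boldsymbol{\beta} = 0$ for a single linear combination, for example $\mathbf{a}'\boldsymbol{\beta} = (0, -2, -2, 3, 1)\boldsymbol{\beta}$, we use $\mathbf{a}'$ in place of the matrix $\mathbf{C}$ in $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$. Then $q = 1$, and the F-test becomes
$$F = \frac{(\mathbf{a}'\hat{\boldsymbol{\beta}})'\left[\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}\right]^{-1}\mathbf{a}'\hat{\boldsymbol{\beta}}/q}{SSE/(n-k-1)} = \frac{(\mathbf{a}'\hat{\boldsymbol{\beta}})^2}{s^2\,\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}},$$
where $s^2 = SSE/(n-k-1)$. This F-statistic is distributed as $F(1, n-k-1)$ if $H_0: \mathbf{a}'\boldsymbol{\beta} = 0$ is true.
Try Example 8.5(a).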
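The equivalence $F = t_j^2$ can be illustrated with the "Parameter Estimates" table of the first-order PROC REG fit of the chemical data shown earlier, where the "Standard Error" column is $s\sqrt{g_{jj}}$. The sketch below re-uses those printed values; the critical value $t_{0.025,15} = 2.131$ is taken from a standard t table and is an assumption of the example, not something printed in the listing.

```python
# (estimate, standard error) pairs copied from the PROC REG parameter table.
estimates = {
    "x1": (0.40455, 0.17469),
    "x2": (0.29299, 0.26045),
    "x3": (1.03380, 0.59943),
}
t_crit = 2.131   # t_{0.025,15}, standard table value (assumed, alpha = 0.05)

results = {}
for name, (b_hat, std_err) in estimates.items():
    t_j = b_hat / std_err            # t_j = beta_hat_j / (s * sqrt(g_jj))
    f_j = t_j ** 2                   # the equivalent F statistic on (1, n-k-1) df
    results[name] = (t_j, f_j, abs(t_j) >= t_crit)
    print(name, round(t_j, 2), round(f_j, 2), abs(t_j) >= t_crit)
```

Only `x1` exceeds the critical value, which agrees with the p-values 0.0351, 0.2783 and 0.1051 in the listing.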

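8.6 Confidence & Prediction Intervals
Assume that $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$.
8.6.1 Confidence Region of $\boldsymbol{\beta}$
To test $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ versus $H_1: \mathbf{C}\boldsymbol{\beta} \neq \mathbf{0}$, we have
$$F = \frac{SSH/q}{SSE/(n-k-1)} = \frac{(\mathbf{C}\hat{\boldsymbol{\beta}})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'\right]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}/q}{SSE/(n-k-1)}.$$
If $\mathbf{C}$ is equal to $\mathbf{I}$, then $q$ becomes $k+1$. We can subtract $\boldsymbol{\beta}$ from $\hat{\boldsymbol{\beta}}$ to obtain a central F-distribution, and we can make the probability statement
$$P\left[\frac{(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})}{(k+1)s^2} \leq F_{\alpha,k+1,n-k-1}\right] = 1 - \alpha,$$
where $s^2 = SSE/(n-k-1)$. Thus, a $100(1-\alpha)\%$ joint confidence region for $\beta_0, \beta_1, \ldots, \beta_k$ in $\boldsymbol{\beta}$ is given by all vectors $\boldsymbol{\beta}$ that satisfy
$$(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \leq (k+1)s^2 F_{\alpha,k+1,n-k-1}.$$
This region is not easy to interpret, so we are usually more interested in confidence intervals for the individual $\beta_j$.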
8.6.2 Confidence Interval for $\beta_j$
To test the hypothesis $H_0: \beta_j = 0$ we can use the t-statistic
$$t_j = \frac{\hat{\beta}_j}{s\sqrt{g_{jj}}}.$$
If $\beta_j \neq 0$, we can subtract $\beta_j$ in the equation above so that $t_j = (\hat{\beta}_j - \beta_j)/(s\sqrt{g_{jj}})$ has the central t-distribution, where $g_{jj}$ is the $j$-th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. Then
$$P\left[-t_{\alpha/2,n-k-1} \leq \frac{\hat{\beta}_j - \beta_j}{s\sqrt{g_{jj}}} \leq t_{\alpha/2,n-k-1}\right] = 1 - \alpha.$$
Solving the inequality for $\beta_j$ gives
$$P\left[\hat{\beta}_j - t_{\alpha/2,n-k-1}\,s\sqrt{g_{jj}} \leq \beta_j \leq \hat{\beta}_j + t_{\alpha/2,n-k-1}\,s\sqrt{g_{jj}}\right] = 1 - \alpha.$$
So the $100(1-\alpha)\%$ confidence interval for $\beta_j$ is
$$\hat{\beta}_j \pm t_{\alpha/2,n-k-1}\,s\sqrt{g_{jj}},$$
and we say that we are $100(1-\alpha)\%$ confident that the interval contains $\beta_j$.
Now try Example 8.6.2!
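As a small numerical illustration of the interval $\hat{\beta}_j \pm t_{\alpha/2,n-k-1}\,s\sqrt{g_{jj}}$, the sketch below computes a 95% interval for $\beta_1$ in the first-order fit of the chemical data, re-using the estimate and standard error ($s\sqrt{g_{11}}$) printed in the PROC REG listing. The helper name `beta_ci` is hypothetical, and $t_{0.025,15} = 2.131$ is taken from a standard t table.

```python
def beta_ci(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


b1_hat = 0.40455     # estimate of beta_1 from the PROC REG listing
se_b1 = 0.17469      # its standard error, s * sqrt(g_11)
t_crit = 2.131       # t_{0.025,15}, standard table value (assumed)

lower, upper = beta_ci(b1_hat, se_b1, t_crit)
print(round(lower, 4), round(upper, 4))
```

The interval excludes zero, which agrees with the t-test for $\beta_1$ (p-value 0.0351) in the same listing.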
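8.6.3 Confidence Interval for $\mathbf{a}'\boldsymbol{\beta}$
To test the hypothesis $H_0: \mathbf{a}'\boldsymbol{\beta} = 0$ we use the following F-test:
$$F = \frac{(\mathbf{a}'\hat{\boldsymbol{\beta}})'\left[\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}\right]^{-1}\mathbf{a}'\hat{\boldsymbol{\beta}}/q}{SSE/(n-k-1)} = \frac{(\mathbf{a}'\hat{\boldsymbol{\beta}})^2}{s^2\,\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}.$$
If $\mathbf{a}'\boldsymbol{\beta} \neq 0$, we can subtract $\mathbf{a}'\boldsymbol{\beta}$ from $\mathbf{a}'\hat{\boldsymbol{\beta}}$ to obtain
$$F = \frac{(\mathbf{a}'\hat{\boldsymbol{\beta}} - \mathbf{a}'\boldsymbol{\beta})^2}{s^2\,\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}},$$
which is distributed as $F(1, n-k-1)$. Then,
$$t = \frac{\mathbf{a}'\hat{\boldsymbol{\beta}} - \mathbf{a}'\boldsymbol{\beta}}{s\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}}$$
is distributed as $t(n-k-1)$.
Therefore, a $100(1-\alpha)\%$ confidence interval for a single value of $\mathbf{a}'\boldsymbol{\beta}$ is given by
$$\mathbf{a}'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2,n-k-1}\,s\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}.$$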

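8.6.4 Confidence Interval for $E(y)$ (the predicted values)
Let $\mathbf{x}_0 = (1, x_{01}, x_{02}, \ldots, x_{0k})'$ denote a particular choice of $\mathbf{x} = (1, x_1, x_2, \ldots, x_k)'$ and let $y_0$ be an observation corresponding to $\mathbf{x}_0$. Then
$$y_0 = \mathbf{x}_0'\boldsymbol{\beta} + \varepsilon \quad \text{and} \quad E(y_0) = \mathbf{x}_0'\boldsymbol{\beta}.$$
We want a confidence interval for $E(y_0)$, that is, for the mean of the distribution of $y$-values corresponding to $\mathbf{x}_0$. The minimum variance unbiased estimator of $E(y_0)$ is $\hat{E}(y_0) = \mathbf{x}_0'\hat{\boldsymbol{\beta}}$.

$E(y_0)$ and $\hat{E}(y_0)$ are of the form $\mathbf{a}'\boldsymbol{\beta}$ and $\mathbf{a}'\hat{\boldsymbol{\beta}}$, respectively. Therefore, a $100(1-\alpha)\%$ confidence interval for $E(y_0) = \mathbf{x}_0'\boldsymbol{\beta}$ is
$$\mathbf{x}_0'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2,n-k-1}\,s\sqrt{\mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0},$$
which holds only for a single choice of the vector $\mathbf{x}_0$.
In terms of the centered form $y_0 = \alpha + \boldsymbol{\beta}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1) + \varepsilon$, where $\mathbf{x}_{01} = (x_{01}, x_{02}, \ldots, x_{0k})'$ and $\bar{\mathbf{x}}_1 = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k)'$, we can write $E(y_0) = \alpha + \boldsymbol{\beta}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1)$ and $\hat{E}(y_0) = \bar{y} + \hat{\boldsymbol{\beta}}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1)$, with the $100(1-\alpha)\%$ confidence interval
$$\bar{y} + \hat{\boldsymbol{\beta}}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1) \pm t_{\alpha/2,n-k-1}\,s\sqrt{\frac{1}{n} + (\mathbf{x}_{01} - \bar{\mathbf{x}}_1)'(\mathbf{X}_c'\mathbf{X}_c)^{-1}(\mathbf{x}_{01} - \bar{\mathbf{x}}_1)}.$$

For simple linear regression, $E(y_0) = \beta_0 + \beta_1 x_0$ and $\hat{E}(y_0) = \hat{\beta}_0 + \hat{\beta}_1 x_0$, so the $100(1-\alpha)\%$ confidence interval is
$$\hat{\beta}_0 + \hat{\beta}_1 x_0 \pm t_{\alpha/2,n-2}\,s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}},$$
where $s^2 = SSE/(n-2) = \sum_{i=1}^n (y_i - \hat{y}_i)^2/(n-2)$.
Note: The width of the interval depends on how far $x_0$ is from $\bar{x}$.
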
Try Example 8.6.4!
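8.6.5 Prediction Interval for a Future Observation
Now we want to find a $100(1-\alpha)\%$ prediction interval that contains a future observation $y_0$.
The t-statistic for this future observation,
$$t = \frac{y_0 - \hat{y}_0 - 0}{s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}},$$
is distributed as $t(n-k-1)$, and
$$P\left[-t_{\alpha/2,n-k-1} \leq \frac{y_0 - \hat{y}_0}{s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}} \leq t_{\alpha/2,n-k-1}\right] = 1 - \alpha.$$
This inequality can be solved for $y_0$ to obtain the $100(1-\alpha)\%$ prediction interval
$$\hat{y}_0 - t_{\alpha/2,n-k-1}\,s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0} \leq y_0 \leq \hat{y}_0 + t_{\alpha/2,n-k-1}\,s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}$$
or, using $\hat{y}_0 = \mathbf{x}_0'\hat{\boldsymbol{\beta}}$,
$$\mathbf{x}_0'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2,n-k-1}\,s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}.$$
This interval holds for only one value of $\mathbf{x}_0$.
How do we write this in terms of the centered form?

In terms of the centered model, the $100(1-\alpha)\%$ prediction interval can be written as
$$\bar{y} + \hat{\boldsymbol{\beta}}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1) \pm t_{\alpha/2,n-k-1}\,s\sqrt{1 + \frac{1}{n} + (\mathbf{x}_{01} - \bar{\mathbf{x}}_1)'(\mathbf{X}_c'\mathbf{X}_c)^{-1}(\mathbf{x}_{01} - \bar{\mathbf{x}}_1)}.$$
For the case of simple linear regression, this reduces to
$$\hat{\beta}_0 + \hat{\beta}_1 x_0 \pm t_{\alpha/2,n-2}\,s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}},$$
where $s^2 = SSE/(n-2) = \sum_{i=1}^n (y_i - \hat{y}_i)^2/(n-2)$.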
Try Example 8.6.5!
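The difference between the confidence interval for $E(y_0)$ (Section 8.6.4) and the prediction interval for $y_0$ (Section 8.6.5) is easiest to see in simple linear regression. The sketch below uses a small hypothetical data set (five made-up points, not data from these notes) and the table value $t_{0.025,3} = 3.182$; both are assumptions made only for illustration.

```python
import math

# Hypothetical data for a simple linear regression (n = 5).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx                      # slope estimate
b0 = y_bar - b1 * x_bar             # intercept estimate
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))        # s^2 = SSE / (n - 2)

t_crit = 3.182                      # t_{0.025,3}, standard table value (assumed)
x0 = 4.0                            # point at which we estimate / predict
y0_hat = b0 + b1 * x0

core = 1.0 / n + (x0 - x_bar) ** 2 / sxx
ci_half = t_crit * s * math.sqrt(core)          # half-width of the CI for E(y0)
pi_half = t_crit * s * math.sqrt(1.0 + core)    # half-width of the PI for y0

print(round(y0_hat, 3), round(ci_half, 3), round(pi_half, 3))
```

The prediction interval is always wider than the confidence interval at the same $x_0$ because of the extra 1 under the square root, which accounts for the variance of the new observation itself.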
8.6.6 Confidence Interval for $\sigma^2$
By theorem, $(n-k-1)s^2/\sigma^2$ is $\chi^2(n-k-1)$. Therefore,
$$P\left[\chi^2_{1-\alpha/2,n-k-1} \leq \frac{(n-k-1)s^2}{\sigma^2} \leq \chi^2_{\alpha/2,n-k-1}\right] = 1 - \alpha,$$
where $\chi^2_{\alpha/2,n-k-1}$ is the upper $\alpha/2$ percentage point of the chi-square distribution and $\chi^2_{1-\alpha/2,n-k-1}$ is the lower $\alpha/2$ percentage point. Solving the inequality for $\sigma^2$ yields the $100(1-\alpha)\%$ confidence interval
$$\frac{(n-k-1)s^2}{\chi^2_{\alpha/2,n-k-1}} \leq \sigma^2 \leq \frac{(n-k-1)s^2}{\chi^2_{1-\alpha/2,n-k-1}}.$$
A $100(1-\alpha)\%$ confidence interval for $\sigma$ is
$$\sqrt{\frac{(n-k-1)s^2}{\chi^2_{\alpha/2,n-k-1}}} \leq \sigma \leq \sqrt{\frac{(n-k-1)s^2}{\chi^2_{1-\alpha/2,n-k-1}}}.$$

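As an illustration of the interval for $\sigma^2$, the sketch below uses the error mean square of the first-order fit of the chemical data ($s^2 = 16.63080$ on $n-k-1 = 15$ degrees of freedom, from the PROC REG listing). The chi-square percentage points $\chi^2_{0.025,15} = 27.488$ and $\chi^2_{0.975,15} = 6.262$ are standard table values and are an assumption of the example, not output of the SAS run.

```python
import math

df = 15                 # n - k - 1 for the first-order model (19 - 3 - 1)
s2 = 16.63080           # error mean square from the PROC REG listing

chi2_upper = 27.488     # chi-square_{0.025,15}, upper 2.5% point (table value)
chi2_lower = 6.262      # chi-square_{0.975,15}, lower 2.5% point (table value)

var_lower = df * s2 / chi2_upper    # lower limit for sigma^2
var_upper = df * s2 / chi2_lower    # upper limit for sigma^2
sd_lower = math.sqrt(var_lower)     # corresponding limits for sigma
sd_upper = math.sqrt(var_upper)

print(round(var_lower, 3), round(var_upper, 3))
print(round(sd_lower, 3), round(sd_upper, 3))
```

The interval is not symmetric about $s^2$ because the chi-square distribution is skewed; the Root MSE of 4.07809 from the listing falls inside the interval for $\sigma$.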