8. Multiple Regression: Tests of Hypothesis & Confidence Intervals
Consider hypothesis tests and confidence intervals for the parameters $\beta_0, \beta_1, \ldots, \beta_k$ of $\boldsymbol\beta$ in the model $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon$. Assume throughout the chapter that $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol\beta, \sigma^2\mathbf{I})$, where $\mathbf{X}$ is $n \times (k+1)$ of rank $k+1 < n$. The $x$'s are fixed constants.
8.1 Test of Overall Regression
We wish to test $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$, where $\boldsymbol\beta_1 = (\beta_1, \beta_2, \ldots, \beta_k)'$. Note that we test $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$, not $H_0\colon \boldsymbol\beta = \mathbf{0}$, where

$$\boldsymbol\beta = \begin{pmatrix} \beta_0 \\ \boldsymbol\beta_1 \end{pmatrix}.$$

Use the centered model (7.28),

$$\mathbf{y} = \alpha\mathbf{j} + \mathbf{X}_c\boldsymbol\beta_1 + \boldsymbol\varepsilon,$$

where

$$\mathbf{X}_c = \left(\mathbf{I} - \tfrac{1}{n}\mathbf{J}\right)\mathbf{X}_1 = \begin{pmatrix} x_{11}-\bar{x}_1 & x_{12}-\bar{x}_2 & \cdots & x_{1k}-\bar{x}_k \\ x_{21}-\bar{x}_1 & x_{22}-\bar{x}_2 & \cdots & x_{2k}-\bar{x}_k \\ \vdots & \vdots & & \vdots \\ x_{n1}-\bar{x}_1 & x_{n2}-\bar{x}_2 & \cdots & x_{nk}-\bar{x}_k \end{pmatrix}$$

is the centered matrix and $\mathbf{X}_1$ contains all the columns of $\mathbf{X}$ except the first. The total sum of squares $\sum_{i=1}^n (y_i - \bar{y})^2$ can be partitioned as in (8.1):

$$\mathbf{y}'\left(\mathbf{I}-\tfrac{1}{n}\mathbf{J}\right)\mathbf{y} = \mathrm{SSR} + \mathrm{SSE}$$
$$= \mathbf{y}'\mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\mathbf{y} + \left[\mathbf{y}'\left(\mathbf{I}-\tfrac{1}{n}\mathbf{J}\right)\mathbf{y} - \mathbf{y}'\mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\mathbf{y}\right] \quad (8.2)$$
$$= \mathbf{y}'\mathbf{A}\mathbf{y} + \mathbf{y}'\left(\mathbf{I}-\tfrac{1}{n}\mathbf{J} - \mathbf{A}\right)\mathbf{y},$$

where $\mathbf{A} = \mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'$.
Properties of the matrices of the quadratic forms in (8.2):

Theorem 8.1A. The matrices $\mathbf{I}-\frac{1}{n}\mathbf{J}$, $\mathbf{A} = \mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'$, and $\mathbf{I}-\frac{1}{n}\mathbf{J}-\mathbf{A}$ have the following properties:

(i) $\mathbf{A}\left(\mathbf{I}-\frac{1}{n}\mathbf{J}\right) = \mathbf{A}$,
(ii) $\mathbf{A}$ is idempotent of rank $k$,
(iii) $\mathbf{I}-\frac{1}{n}\mathbf{J}-\mathbf{A}$ is idempotent of rank $n-k-1$,
(iv) $\mathbf{A}\left(\mathbf{I}-\frac{1}{n}\mathbf{J}-\mathbf{A}\right) = \mathbf{O}$.
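The properties in Theorem 8.1A are easy to check numerically. The sketch below (a minimal illustration on invented data, not from the text) builds $\mathbf{X}_c$ and $\mathbf{A}$ for a small design matrix and verifies idempotency and the rank statements, using the fact that the rank of an idempotent matrix equals its trace.

```python
import numpy as np

# Small illustrative dataset (hypothetical): n = 6 observations, k = 2 predictors.
rng = np.random.default_rng(0)
n, k = 6, 2
X1 = rng.standard_normal((n, k))           # predictor columns (no intercept column)

J = np.ones((n, n))                        # matrix of ones
C = np.eye(n) - J / n                      # centering matrix I - (1/n)J
Xc = C @ X1                                # centered predictor matrix
A = Xc @ np.linalg.solve(Xc.T @ Xc, Xc.T)  # A = Xc (Xc'Xc)^{-1} Xc'
B = C - A                                  # I - (1/n)J - A

# (i)  A(I - (1/n)J) = A
assert np.allclose(A @ C, A)
# (ii) A is idempotent of rank k
assert np.allclose(A @ A, A) and np.isclose(np.trace(A), k)
# (iii) I - (1/n)J - A is idempotent of rank n - k - 1
assert np.allclose(B @ B, B) and np.isclose(np.trace(B), n - k - 1)
# (iv) A(I - (1/n)J - A) = O
assert np.allclose(A @ B, np.zeros((n, n)))
print("Theorem 8.1A properties verified")
```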
Theorem 8.1B. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol\beta, \sigma^2\mathbf{I})$, then $\mathrm{SSR}/\sigma^2 = \hat{\boldsymbol\beta}_1'\mathbf{X}_c'\mathbf{X}_c\hat{\boldsymbol\beta}_1/\sigma^2$, and the distribution of

$$F = \frac{\mathrm{SSR}/k\sigma^2}{\mathrm{SSE}/(n-k-1)\sigma^2} = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)} \quad (8.3)$$

is as follows:

(i) If $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ is false, then $F$ is distributed as $F(k, n-k-1, \lambda_1)$, where $\lambda_1 = \boldsymbol\mu'\mathbf{A}\boldsymbol\mu/2\sigma^2 = \boldsymbol\beta_1'\mathbf{X}_c'\mathbf{X}_c\boldsymbol\beta_1/2\sigma^2$.
(ii) If $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ is true, then $\lambda_1 = 0$ and $F$ is distributed as $F(k, n-k-1)$.

The test of $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ is carried out as follows: reject $H_0$ if $F > F_{\alpha,k,n-k-1}$, where $F_{\alpha,k,n-k-1}$ is the upper $\alpha$ percentage point of the (central) $F$ distribution. Alternatively, a $p$-value can be used to carry out the test. A $p$-value is the tail area of the central $F$ distribution beyond the calculated $F$-value, that is, the probability of exceeding the calculated $F$-value, assuming $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ is true. A $p$-value less than $\alpha$ is equivalent to $F > F_{\alpha,k,n-k-1}$.
We summarize the results leading to the $F$-test in the analysis of variance in Table 8.1.

Table 8.1. Analysis of Variance for $F$-test of $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$

| Source of variation | d.f. | Sum of squares | Mean square | Expected mean square |
|---|---|---|---|---|
| Due to $\boldsymbol\beta_1$ | $k$ | $\mathrm{SSR} = \hat{\boldsymbol\beta}_1'\mathbf{X}_c'\mathbf{y}$ | $\mathrm{SSR}/k$ | $\sigma^2 + \frac{1}{k}\boldsymbol\beta_1'\mathbf{X}_c'\mathbf{X}_c\boldsymbol\beta_1$ |
| Error | $n-k-1$ | $\mathrm{SSE} = \sum_i (y_i-\bar{y})^2 - \hat{\boldsymbol\beta}_1'\mathbf{X}_c'\mathbf{y}$ | $\mathrm{SSE}/(n-k-1)$ | $\sigma^2$ |

Note:
1. The entries in the expected-mean-square column of Table 8.1 are simply $E(\mathrm{SSR}/k)$ and $E(\mathrm{SSE}/(n-k-1))$; if $v$ is distributed as $\chi^2(n, \lambda)$, then $E(v) = n + 2\lambda$; and we have shown that $E(s^2) = \sigma^2$.
2. If $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ is true, both expected mean squares in Table 8.1 are equal to $\sigma^2$, and we expect $F$ to be near 1. If $\boldsymbol\beta_1 \ne \mathbf{0}$, then $E(\mathrm{SSR}/k) > \sigma^2$ since $\mathbf{X}_c'\mathbf{X}_c$ is positive definite, and we expect $F$ to exceed 1. Therefore, $H_0$ is rejected for large values of $F$.
3. The test of $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ in Table 8.1 has been developed using the centered model (7.26).
4. SSR and SSE can also be expressed in terms of the noncentered model $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon$ in (7.3):

$$\mathrm{SSR} = \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - n\bar{y}^2, \qquad \mathrm{SSE} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y}. \quad (8.4)$$
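As a quick numerical check of (8.4) and of the partition in (8.2), the sketch below (data invented purely for illustration) computes SSR and SSE from the noncentered fit, then confirms that SSR equals the centered quadratic form $\mathbf{y}'\mathbf{A}\mathbf{y}$ and that $\mathrm{SSR} + \mathrm{SSE} = \sum_i (y_i - \bar{y})^2$.

```python
import numpy as np

# Hypothetical data (not from Example 7.2): n = 6 observations, k = 2 predictors.
X = np.array([[1., 0., 1.], [1., 1., 0.], [1., 2., 2.],
              [1., 3., 1.], [1., 4., 3.], [1., 5., 2.]])
y = np.array([1.0, 2.0, 2.5, 3.5, 5.0, 5.5])
n = len(y)

bhat = np.linalg.solve(X.T @ X, X.T @ y)        # beta-hat = (X'X)^{-1} X'y
SSR = bhat @ (X.T @ y) - n * y.mean() ** 2      # (8.4)
SSE = y @ y - bhat @ (X.T @ y)                  # (8.4)
SST = y @ y - n * y.mean() ** 2                 # corrected total SS

# Centered-model quadratic form: SSR = y'Ay with A = Xc (Xc'Xc)^{-1} Xc'
Xc = (np.eye(n) - np.ones((n, n)) / n) @ X[:, 1:]
A = Xc @ np.linalg.solve(Xc.T @ Xc, Xc.T)
assert np.isclose(y @ A @ y, SSR)               # (8.2) and (8.4) agree
assert np.isclose(SSR + SSE, SST)               # SST = SSR + SSE
print(round(SSR, 4), round(SSE, 4))
```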
Example 8.1. Using the data in Example 7.2, test $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$, where $\boldsymbol\beta_1 = (\beta_1, \beta_2)'$. From Example 7.2, $\mathbf{X}'\mathbf{y} = (90, 482, 872)'$ and $\hat{\boldsymbol\beta} = (5.3754, 3.0118, -1.2855)'$. Now, to find $\mathbf{y}'\mathbf{y}$, $\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y}$, and $n\bar{y}^2$:

$$\mathbf{y}'\mathbf{y} = \sum_{i=1}^{12} y_i^2 = 2^2 + 3^2 + \cdots + 14^2 = 840,$$
$$\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} = (5.3754, 3.0118, -1.2855)\begin{pmatrix} 90 \\ 482 \\ 872 \end{pmatrix} = 814.5410,$$
$$n\bar{y}^2 = n\left(\frac{\sum_i y_i}{n}\right)^2 = \frac{(90)^2}{12} = 675.$$

Thus, by (8.4), $\mathrm{SSR} = 139.5410$ and $\mathrm{SSE} = 25.4590$, and

$$\sum_{i=1}^n (y_i - \bar{y})^2 = \mathbf{y}'\mathbf{y} - n\bar{y}^2 = 165.$$
Table 8.2. Analysis of Variance for Overall Regression Test for the Data in Example 7.2

| Source of variation | d.f. | Sum of squares | Mean square | $F$ |
|---|---|---|---|---|
| Due to $\boldsymbol\beta_1$ | 2 | 139.5410 | 69.7705 | 24.665 |
| Error | 9 | 25.4590 | 2.8288 | |
| Total | 11 | 165.0000 | | |

The $F$-test is given in Table 8.2. Since $24.665 > F_{0.05,2,9} = 4.26$, we reject $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ and conclude that at least one of $\beta_1$ or $\beta_2$ is not zero. The $p$-value is 0.00023.
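The arithmetic in Example 8.1 and Table 8.2 can be reproduced directly from the summary statistics quoted in the text ($\mathbf{y}'\mathbf{y} = 840$, $\mathbf{X}'\mathbf{y}$, and $\hat{\boldsymbol\beta}$); a sketch is below. Because the printed $\hat{\boldsymbol\beta}$ is rounded to four decimals, the recomputed $F$ agrees with 24.665 only up to that rounding.

```python
import numpy as np
from scipy import stats

# Summary statistics from Examples 7.2 / 8.1.
n, k = 12, 2
Xty = np.array([90.0, 482.0, 872.0])            # X'y
bhat = np.array([5.3754, 3.0118, -1.2855])      # beta-hat (rounded as printed)
yty = 840.0                                     # y'y

SSR = bhat @ Xty - Xty[0] ** 2 / n              # b'X'y - n*ybar^2, since sum(y) = 90
SSE = yty - bhat @ Xty
F = (SSR / k) / (SSE / (n - k - 1))             # (8.3)
p = stats.f.sf(F, k, n - k - 1)                 # upper-tail p-value

print(round(SSR, 4), round(SSE, 4), round(F, 3), round(p, 5))
```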
8.2. Test on a Subset of the $\beta$'s

We wish to test the hypothesis that a subset of the $x$'s is not useful in predicting $y$. Without loss of generality, assume that the $\beta$'s to be tested have been arranged last in $\boldsymbol\beta$, with a corresponding arrangement of the columns of $\mathbf{X}$. Then $\boldsymbol\beta$ and $\mathbf{X}$ can be partitioned accordingly, and by (7.50), the model for all $n$ observations becomes

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon = (\mathbf{X}_1, \mathbf{X}_2)\begin{pmatrix}\boldsymbol\beta_1 \\ \boldsymbol\beta_2\end{pmatrix} + \boldsymbol\varepsilon = \mathbf{X}_1\boldsymbol\beta_1 + \mathbf{X}_2\boldsymbol\beta_2 + \boldsymbol\varepsilon, \quad (8.5)$$

where $\boldsymbol\beta_2$ contains the $\beta$'s to be tested, with the intercept $\beta_0$ included in $\boldsymbol\beta_1$.

The hypothesis of interest is $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$. If the number of parameters in $\boldsymbol\beta_2$ is $h$, then $\mathbf{X}_2$ is $n \times h$, $\boldsymbol\beta_1$ is $(k-h+1) \times 1$, and $\mathbf{X}_1$ is $n \times (k-h+1)$. Thus, $\boldsymbol\beta_1 = (\beta_0, \beta_1, \ldots, \beta_{k-h})'$ and $\boldsymbol\beta_2 = (\beta_{k-h+1}, \ldots, \beta_k)'$.

For example, consider the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$, and suppose we wish to test the hypothesis $H_0\colon \beta_3 = \beta_4 = \beta_5 = 0$. For this, we have $\boldsymbol\beta_1 = (\beta_0, \beta_1, \beta_2)'$ and $\boldsymbol\beta_2 = (\beta_3, \beta_4, \beta_5)'$.

Note that $\boldsymbol\beta_1$ in (8.5) is different from $\boldsymbol\beta_1$ in Section 8.1, in which $\boldsymbol\beta$ was partitioned as $\boldsymbol\beta = (\beta_0, \boldsymbol\beta_1')'$, so that $\boldsymbol\beta_1$ constituted all of $\boldsymbol\beta$ except $\beta_0$.
To test $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$ versus $H_1\colon \boldsymbol\beta_2 \ne \mathbf{0}$, we use a full-and-reduced-model approach. The full model is given by (8.5), and under $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$ the reduced model becomes

$$\mathbf{y} = \mathbf{X}_1\boldsymbol\beta_1^* + \boldsymbol\varepsilon^*. \quad (8.6)$$

To compare the fit of the full model (8.5) with the fit of the reduced model (8.6), add and subtract $\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y}$ and $\hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y}$ in the total sum of squares $\mathbf{y}'\mathbf{y}$ so as to obtain the partitioning

$$\mathbf{y}'\mathbf{y} = (\mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y}) + (\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y}) + \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y} \quad (8.7)$$
$$= \mathrm{SSE} + \mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1) + \mathrm{SS}(\boldsymbol\beta_1^*), \quad (8.8)$$

where the sum of squares $\mathrm{SS}(\boldsymbol\beta_1^*) = \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y}$ is from the reduced model (8.6) and

$$\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1) = \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y} = (\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - n\bar{y}^2) - (\hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y} - n\bar{y}^2) = \mathrm{SSR}(\text{full}) - \mathrm{SSR}(\text{reduced}),$$

which is the difference between the overall regression sum of squares for the full model and the overall regression sum of squares for the reduced model.

If $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$ is true, we would expect $\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1)$ to be small, so that $\mathbf{y}'\mathbf{y}$ in (8.8) is mostly composed of $\mathrm{SS}(\boldsymbol\beta_1^*)$ and SSE. If $\boldsymbol\beta_2 \ne \mathbf{0}$, we expect $\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1)$ to be larger and to account for more of $\mathbf{y}'\mathbf{y}$. Thus, we are testing $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$ in the full model, in which there are no restrictions on $\boldsymbol\beta_1$. We are not ignoring $\boldsymbol\beta_1$ (assuming $\boldsymbol\beta_1 = \mathbf{0}$) but are testing $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$ in the presence of $\boldsymbol\beta_1$.
To develop a test statistic based on $\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1)$, first write (8.7) in terms of quadratic forms in $\mathbf{y}$. Using $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ and $\hat{\boldsymbol\beta}_1^* = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y}$, (8.7) becomes

$$\mathbf{y}'\mathbf{y} = [\mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}] + [\mathbf{y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} - \mathbf{y}'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y}] + \mathbf{y}'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y} \quad (8.9)$$
$$= \mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y} + \mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y} + \mathbf{y}'\mathbf{A}_2\mathbf{y}, \quad (8.10)$$

where $\mathbf{A}_1 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ and $\mathbf{A}_2 = \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'$. The matrix $\mathbf{I} - \mathbf{A}_1$ is idempotent of rank $n-k-1$, and $\mathbf{A}_1 - \mathbf{A}_2$ is idempotent of rank $h$, where $h$ is the number of elements in $\boldsymbol\beta_2$.
Theorem 8.2B. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol\beta, \sigma^2\mathbf{I})$ and $\mathbf{A}_1$ and $\mathbf{A}_2$ are as defined in (8.9) and (8.10), then

(i) $\mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y}/\sigma^2$ is $\chi^2(n-k-1)$;
(ii) $\mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y}/\sigma^2$ is $\chi^2(h, \lambda)$, where $\lambda = \boldsymbol\beta_2'[\mathbf{X}_2'\mathbf{X}_2 - \mathbf{X}_2'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2]\boldsymbol\beta_2/2\sigma^2$;
(iii) $\mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y}$ and $\mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y}$ are independent.

Theorem 8.2C. Let $\mathbf{y}$ be $N_n(\mathbf{X}\boldsymbol\beta, \sigma^2\mathbf{I})$ and define an $F$-statistic as follows:

$$F = \frac{\mathbf{y}'(\mathbf{A}_1 - \mathbf{A}_2)\mathbf{y}/h}{\mathbf{y}'(\mathbf{I} - \mathbf{A}_1)\mathbf{y}/(n-k-1)} = \frac{\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1)/h}{\mathrm{SSE}/(n-k-1)} = \frac{(\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y})/h}{(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y})/(n-k-1)}, \quad (8.11)$$

where $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is from the full model $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon$ and $\hat{\boldsymbol\beta}_1^* = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y}$ is from the reduced model (8.6). If $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$ is true, $F$ is distributed as $F(h, n-k-1)$.
Theorem 8.2D. If the model is partitioned as in (8.5), then $\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1) = \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y}$ can be written as

$$\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1) = \hat{\boldsymbol\beta}_2'[\mathbf{X}_2'\mathbf{X}_2 - \mathbf{X}_2'\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2]\hat{\boldsymbol\beta}_2, \quad (8.13)$$

where $\hat{\boldsymbol\beta}_2$ is from a partitioning of $\hat{\boldsymbol\beta}$ in the full model:

$$\hat{\boldsymbol\beta} = \begin{pmatrix} \hat{\boldsymbol\beta}_1 \\ \hat{\boldsymbol\beta}_2 \end{pmatrix} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}. \quad (8.14)$$
Table 8.3. Analysis of Variance for $F$-test of $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$

| Source of variation | d.f. | Sum of squares | Mean square | $F$-statistic |
|---|---|---|---|---|
| Due to $\boldsymbol\beta_2$ adjusted for $\boldsymbol\beta_1$ | $h$ | $\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1) = \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y}$ | $\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1)/h$ | $\dfrac{\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1)/h}{\mathrm{SSE}/(n-k-1)}$ |
| Error | $n-k-1$ | $\mathrm{SSE} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y}$ | $\mathrm{SSE}/(n-k-1)$ | |
| Total | $n-1$ | $\mathrm{SST} = \mathbf{y}'\mathbf{y} - n\bar{y}^2$ | | |
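The full-and-reduced-model $F$-test of Table 8.3 can be sketched as follows (illustrative simulated data of my own, not from the text); the sketch also checks Theorem 8.2D's identity (8.13) for $\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k, h = 20, 4, 2                              # h = number of betas under test
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
y = X @ np.array([1.0, 2.0, -1.0, 0.0, 0.0]) + rng.standard_normal(n)

X1, X2 = X[:, :k - h + 1], X[:, k - h + 1:]     # full model partition (X1, X2)
bhat = np.linalg.solve(X.T @ X, X.T @ y)        # full-model estimate
b1s = np.linalg.solve(X1.T @ X1, X1.T @ y)      # reduced-model estimate

SS_b2_given_b1 = bhat @ (X.T @ y) - b1s @ (X1.T @ y)
SSE = y @ y - bhat @ (X.T @ y)
F = (SS_b2_given_b1 / h) / (SSE / (n - k - 1))  # (8.11)
p = stats.f.sf(F, h, n - k - 1)

# Theorem 8.2D: the same SS from the partitioned form (8.13).
b2 = bhat[k - h + 1:]
M = X2.T @ X2 - X2.T @ X1 @ np.linalg.solve(X1.T @ X1, X1.T @ X2)
assert np.isclose(b2 @ M @ b2, SS_b2_given_b1)
print(round(F, 3), round(p, 4))
```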
Example 8.2(a): worked in a separate handout (reproduced at the end of this chapter).
Example 8.2(b). The full-and-reduced-model test of $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$ in Table 8.3 can be used to test for the significance of a single $\beta_j$. Suppose we wish to test $H_0\colon \beta_k = 0$, where $\boldsymbol\beta$ is partitioned as

$$\boldsymbol\beta = \begin{pmatrix} \boldsymbol\beta_1 \\ \beta_k \end{pmatrix}, \qquad \boldsymbol\beta_1 = (\beta_0, \beta_1, \ldots, \beta_{k-1})',$$

and $\mathbf{X}$ is partitioned as $\mathbf{X} = (\mathbf{X}_1, \mathbf{x}_k)$, where $\mathbf{x}_k$ is the last column of $\mathbf{X}$ and $\mathbf{X}_1$ contains all columns except $\mathbf{x}_k$. The reduced model is $\mathbf{y} = \mathbf{X}_1\boldsymbol\beta_1^* + \boldsymbol\varepsilon^*$, and $\boldsymbol\beta_1^*$ is estimated as $\hat{\boldsymbol\beta}_1^* = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y}$. In this case $h = 1$.
8.3. $F$-Test in Terms of $R^2$

The $F$-statistics in Sections 8.1 and 8.2 can be expressed in terms of $R^2$ as defined in (7.46).

Theorem 8.3A. The $F$-statistics in (8.3) and (8.11) for testing $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ and $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$, respectively, can be written in terms of $R^2$ as

$$F = \frac{(\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - n\bar{y}^2)/k}{(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y})/(n-k-1)} \quad (8.16)$$
$$= \frac{R^2/k}{(1-R^2)/(n-k-1)} \quad (8.17)$$

and

$$F = \frac{(\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y})/h}{(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y})/(n-k-1)} \quad (8.18)$$
$$= \frac{(R^2 - R_r^2)/h}{(1-R^2)/(n-k-1)}, \quad (8.19)$$

where $R^2$ for the full model is given in (7.46) and $R_r^2$ for the reduced model $\mathbf{y} = \mathbf{X}_1\boldsymbol\beta_1^* + \boldsymbol\varepsilon^*$ in (8.6) is similarly defined as

$$R_r^2 = \frac{\hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y} - n\bar{y}^2}{\mathbf{y}'\mathbf{y} - n\bar{y}^2}. \quad (8.20)$$
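Theorem 8.3A is easy to verify numerically. The sketch below (invented data, for illustration only) computes $F$ both from the sums-of-squares forms (8.16)/(8.18) and from the $R^2$ forms (8.17)/(8.19) and checks that the two agree.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, h = 15, 3, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.standard_normal(n)
X1 = X[:, :k - h + 1]                            # reduced-model columns

bhat = np.linalg.solve(X.T @ X, X.T @ y)
b1s = np.linalg.solve(X1.T @ X1, X1.T @ y)
nybar2 = n * y.mean() ** 2
SST = y @ y - nybar2
R2 = (bhat @ (X.T @ y) - nybar2) / SST           # full-model R^2 (7.46)
Rr2 = (b1s @ (X1.T @ y) - nybar2) / SST          # reduced-model R^2 (8.20)

# (8.16) vs (8.17): overall regression test
F_ss = ((bhat @ (X.T @ y) - nybar2) / k) / ((y @ y - bhat @ (X.T @ y)) / (n - k - 1))
F_r2 = (R2 / k) / ((1 - R2) / (n - k - 1))
assert np.isclose(F_ss, F_r2)

# (8.18) vs (8.19): subset test
F_ss2 = ((bhat @ (X.T @ y) - b1s @ (X1.T @ y)) / h) / ((y @ y - bhat @ (X.T @ y)) / (n - k - 1))
F_r22 = ((R2 - Rr2) / h) / ((1 - R2) / (n - k - 1))
assert np.isclose(F_ss2, F_r22)
print("R^2 forms agree with SS forms")
```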
Example 8.3. For the dependent variable $y_2$ in the chemical reaction data in Table 8.2(a), a full model with nine $x$'s and a reduced model with three $x$'s were considered in Example 8.2(a). The values of $R^2$ for the full model and reduced model are 0.8485 and 0.3771, respectively. To test the significance of the increase in $R^2$ from 0.3771 to 0.8485, we use (8.19):

$$F = \frac{(R^2 - R_r^2)/h}{(1-R^2)/(n-k-1)} = \frac{(0.8485 - 0.3771)/6}{(1-0.8485)/9} = 4.6671,$$

which is the same as the value obtained for $F$ in Example 8.2(a).
8.4. The General Linear Hypothesis Test for $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$

The hypothesis $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$, where $\mathbf{C}$ is a $q \times (k+1)$ coefficient matrix of rank $q \le k+1$, is known as the general linear hypothesis. The hypothesis $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ in Section 8.1 can be expressed in the form $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$ as follows:

$$H_0\colon \mathbf{C}\boldsymbol\beta = (\mathbf{0}, \mathbf{I}_k)\begin{pmatrix}\beta_0 \\ \boldsymbol\beta_1\end{pmatrix} = \boldsymbol\beta_1 = \mathbf{0},$$

where $\mathbf{0}$ is a $k \times 1$ vector. Similarly, $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$ in Section 8.2 can be expressed in the form $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$ as follows:

$$H_0\colon \mathbf{C}\boldsymbol\beta = (\mathbf{O}, \mathbf{I}_h)\begin{pmatrix}\boldsymbol\beta_1 \\ \boldsymbol\beta_2\end{pmatrix} = \boldsymbol\beta_2 = \mathbf{0},$$

where the matrix $\mathbf{O}$ is $h \times (k-h+1)$ and the vector $\mathbf{0}$ is $h \times 1$.

A more general hypothesis such as

$$H_0\colon 2\beta_1 = \beta_2, \quad \beta_2 + 2\beta_3 = 3\beta_4, \quad \beta_1 = \beta_4$$

can be expressed as

$$H_0\colon \begin{pmatrix} 0 & 2 & -1 & 0 & 0 \\ 0 & 0 & 1 & 2 & -3 \\ 0 & 1 & 0 & 0 & -1 \end{pmatrix}\begin{pmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4\end{pmatrix} = \begin{pmatrix}0 \\ 0 \\ 0\end{pmatrix},$$

and $H_0\colon \beta_1 = \beta_2 = \beta_3 = \beta_4$ can be expressed in terms of three differences, $H_0\colon \beta_1 - \beta_2 = \beta_2 - \beta_3 = \beta_3 - \beta_4 = 0$, or equivalently, as $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$:

$$H_0\colon \begin{pmatrix} 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 & -1 \end{pmatrix}\begin{pmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4\end{pmatrix} = \begin{pmatrix}0 \\ 0 \\ 0\end{pmatrix}.$$
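A $\mathbf{C}$ matrix of this kind can be checked mechanically: any $\boldsymbol\beta$ that satisfies the stated constraints must give $\mathbf{C}\boldsymbol\beta = \mathbf{0}$, and any $\boldsymbol\beta$ that violates them must not. A sketch (the particular numeric vectors are invented for illustration):

```python
import numpy as np

# C for H0: 2*b1 = b2,  b2 + 2*b3 = 3*b4,  b1 = b4   (beta = (b0,...,b4)')
C1 = np.array([[0, 2, -1, 0, 0],
               [0, 0, 1, 2, -3],
               [0, 1, 0, 0, -1]], dtype=float)
# A beta satisfying the constraints: b1 = b4 = 1, b2 = 2, b3 = (3 - 2)/2 = 0.5
beta_ok = np.array([7.0, 1.0, 2.0, 0.5, 1.0])   # b0 is unconstrained
assert np.allclose(C1 @ beta_ok, 0)

# C for H0: b1 = b2 = b3 = b4, written as three differences
C2 = np.array([[0, 1, -1, 0, 0],
               [0, 0, 1, -1, 0],
               [0, 0, 0, 1, -1]], dtype=float)
beta_eq = np.array([-3.0, 4.0, 4.0, 4.0, 4.0])
assert np.allclose(C2 @ beta_eq, 0)

# A beta violating the constraints gives a nonzero C*beta.
beta_bad = np.array([0.0, 1.0, 1.0, 1.0, 2.0])
assert not np.allclose(C2 @ beta_bad, 0)
print("C matrices encode the stated hypotheses")
```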
The following theorem gives the sums of squares used in the test of $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$ versus $H_1\colon \mathbf{C}\boldsymbol\beta \ne \mathbf{0}$, along with their properties.
Theorem 8.4A. If $\mathbf{y}$ is distributed as $N_n(\mathbf{X}\boldsymbol\beta, \sigma^2\mathbf{I})$ and $\mathbf{C}$ is $q \times (k+1)$ of rank $q \le k+1$, then

(i) $\mathrm{SSH}/\sigma^2 = (\mathbf{C}\hat{\boldsymbol\beta})'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}']^{-1}\mathbf{C}\hat{\boldsymbol\beta}/\sigma^2$ is $\chi^2(q, \lambda)$, where $\lambda = (\mathbf{C}\boldsymbol\beta)'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}']^{-1}\mathbf{C}\boldsymbol\beta/2\sigma^2$;
(ii) $\mathrm{SSE}/\sigma^2 = \mathbf{y}'[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}']\mathbf{y}/\sigma^2$ is $\chi^2(n-k-1)$;
(iii) SSH and SSE are independent.

Note: The sum of squares due to $\mathbf{C}\boldsymbol\beta$ (due to the hypothesis) is denoted by SSH.

Thus, the $F$-test for $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$ versus $H_1\colon \mathbf{C}\boldsymbol\beta \ne \mathbf{0}$ is given in the following theorem.

Theorem 8.4B. Let $\mathbf{y}$ be $N_n(\mathbf{X}\boldsymbol\beta, \sigma^2\mathbf{I})$ and define the statistic

$$F = \frac{\mathrm{SSH}/q}{\mathrm{SSE}/(n-k-1)} = \frac{(\mathbf{C}\hat{\boldsymbol\beta})'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}']^{-1}\mathbf{C}\hat{\boldsymbol\beta}/q}{\mathrm{SSE}/(n-k-1)}, \quad (8.21)$$

where $\mathbf{C}$ is $q \times (k+1)$ of rank $q \le k+1$ and $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$. The distribution of $F$ in (8.21) is as follows:

(i) If $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$ is false, $F$ is distributed as $F(q, n-k-1, \lambda)$, with $\lambda$ as in Theorem 8.4A.
(ii) If $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$ is true, $\lambda = 0$ and $F$ is distributed as $F(q, n-k-1)$.
As an illustration, with $q = 2$, $\mathrm{SSH} = 28.6230$, and $s^2 = 5.3449$, the statistic (8.21) gives

$$F = \frac{\mathrm{SSH}/q}{s^2} = \frac{28.6230/2}{5.3449} = 2.6776.$$

Since $F < F_{0.05,2,9} = 4.256$, we cannot reject $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$.
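The computation in (8.21) can be sketched in general form as follows (simulated data of my own, purely illustrative). The sketch also confirms a useful special case: with $\mathbf{C} = (\mathbf{0}, \mathbf{I}_k)$, the general linear hypothesis statistic reduces exactly to the overall-regression $F$ of (8.16).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 25, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
y = X @ np.array([2.0, 1.0, 0.0, -1.0]) + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ y
SSE = y @ y - bhat @ (X.T @ y)
s2 = SSE / (n - k - 1)

def glh_test(C):
    """F-test of H0: C beta = 0, per (8.21)."""
    q = C.shape[0]
    Cb = C @ bhat
    SSH = Cb @ np.linalg.solve(C @ XtX_inv @ C.T, Cb)
    F = (SSH / q) / s2
    return F, stats.f.sf(F, q, n - k - 1)

# C = (0, I_k) gives H0: beta1 = 0, the overall regression test.
C_all = np.column_stack([np.zeros((k, 1)), np.eye(k)])
F, p = glh_test(C_all)
nybar2 = n * y.mean() ** 2
F_overall = ((bhat @ (X.T @ y) - nybar2) / k) / (SSE / (n - k - 1))
assert np.isclose(F, F_overall)
print(round(F, 3), round(p, 6))
```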
8.5. Testing One $\beta_j$ or One $\mathbf{a}'\boldsymbol\beta$

To test $H_0\colon \mathbf{a}'\boldsymbol\beta = 0$, where $\mathbf{a}$ is a vector of constants, use

$$F = \frac{(\mathbf{a}'\hat{\boldsymbol\beta})^2}{s^2\,\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}, \quad (8.24)$$

where $s^2 = \mathrm{SSE}/(n-k-1)$. The $F$-statistic in (8.24) is distributed as $F(1, n-k-1)$ if $H_0\colon \mathbf{a}'\boldsymbol\beta = 0$ is true.

To test $H_0\colon \beta_j = 0$ using the general linear hypothesis test statistic in (8.24), define $\mathbf{a} = (0, \ldots, 0, 1, 0, \ldots, 0)'$, where the 1 is in the $j$th position. This gives

$$F = \frac{\hat{\beta}_j^2}{s^2 g_{jj}}, \quad (8.25)$$

where $g_{jj}$ is the $j$th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. If $H_0\colon \beta_j = 0$ is true, $F$ in (8.25) is distributed as $F(1, n-k-1)$. Reject $H_0\colon \beta_j = 0$ if $F > F_{\alpha,1,n-k-1}$ or, equivalently, if $p\text{-value} < \alpha$.

Since the $F$-statistic in (8.25) has 1 and $n-k-1$ degrees of freedom, we can equivalently use the $t$-statistic

$$t_j = \frac{\hat{\beta}_j}{s\sqrt{g_{jj}}} \quad (8.26)$$

to test the effect of $\beta_j$ above and beyond the other $\beta$'s. We reject $H_0\colon \beta_j = 0$ if $|t_j| > t_{\alpha/2,n-k-1}$ or, equivalently, if $p\text{-value} < \alpha$.
The matrix of uncorrected sums of squares and cross products for the data in Example 7.2:

| Variable | Intercept | x1 | x2 | y |
|---|---|---|---|---|
| Intercept | 12 | 52 | 102 | 90 |
| x1 | 52 | 296 | 536 | 482 |
| x2 | 102 | 536 | 1004 | 872 |
| y | 90 | 482 | 872 | 840 |
8.6. Confidence Intervals and Prediction Intervals

Assume throughout that $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol\beta, \sigma^2\mathbf{I})$.

8.6.1. Confidence Region for $\boldsymbol\beta$

If $\mathbf{C} = \mathbf{I}$ in (8.21), then $q$ becomes $k+1$; subtracting $\boldsymbol\beta$ gives a central $F$ distribution, so we can make the probability statement

$$P[(\hat{\boldsymbol\beta} - \boldsymbol\beta)'\mathbf{X}'\mathbf{X}(\hat{\boldsymbol\beta} - \boldsymbol\beta) \le (k+1)s^2 F_{\alpha,k+1,n-k-1}] = 1 - \alpha,$$

where $s^2 = \mathrm{SSE}/(n-k-1)$. Thus, a $100(1-\alpha)\%$ joint confidence region for $\beta_0, \beta_1, \ldots, \beta_k$ in $\boldsymbol\beta$ is given by all vectors $\boldsymbol\beta$ that satisfy

$$(\hat{\boldsymbol\beta} - \boldsymbol\beta)'\mathbf{X}'\mathbf{X}(\hat{\boldsymbol\beta} - \boldsymbol\beta) \le (k+1)s^2 F_{\alpha,k+1,n-k-1}. \quad (8.27)$$
8.6.2. Confidence Interval for $\beta_j$

If $\beta_j \ne 0$, we can subtract $\beta_j$ in (8.26), so that $t_j = (\hat{\beta}_j - \beta_j)/(s\sqrt{g_{jj}})$ has the central $t$ distribution, where $g_{jj}$ is the $j$th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. Then

$$P\left(-t_{\alpha/2,n-k-1} \le \frac{\hat{\beta}_j - \beta_j}{s\sqrt{g_{jj}}} \le t_{\alpha/2,n-k-1}\right) = 1-\alpha.$$

Solving the inequality for $\beta_j$ gives

$$P(\hat{\beta}_j - t_{\alpha/2,n-k-1}\, s\sqrt{g_{jj}} \le \beta_j \le \hat{\beta}_j + t_{\alpha/2,n-k-1}\, s\sqrt{g_{jj}}) = 1-\alpha.$$

Before taking the sample, the probability that the random interval will contain $\beta_j$ is $1-\alpha$. After taking the sample, the $100(1-\alpha)\%$ confidence interval for $\beta_j$,

$$\hat{\beta}_j \pm t_{\alpha/2,n-k-1}\, s\sqrt{g_{jj}}, \quad (8.28)$$

is no longer random, and we say that we are $100(1-\alpha)\%$ confident that the interval contains $\beta_j$.
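Interval (8.28) can be sketched as follows (simulated data of my own, not from the text): $g_{jj}$ is read off the diagonal of $(\mathbf{X}'\mathbf{X})^{-1}$, and the same half-width is applied to every coefficient.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
y = X @ np.array([1.0, 0.5, -0.25]) + 0.3 * rng.standard_normal(n)

G = np.linalg.inv(X.T @ X)                      # g_jj = jth diagonal element of (X'X)^{-1}
bhat = G @ X.T @ y
s = np.sqrt((y @ y - bhat @ (X.T @ y)) / (n - k - 1))
tcrit = stats.t.ppf(0.975, n - k - 1)           # t_{alpha/2, n-k-1} for alpha = 0.05

# 95% CI for each beta_j, per (8.28): bhat_j +/- t * s * sqrt(g_jj)
half = tcrit * s * np.sqrt(np.diag(G))
lower, upper = bhat - half, bhat + half
for j in range(k + 1):
    print(f"beta_{j}: {bhat[j]:.4f}  [{lower[j]:.4f}, {upper[j]:.4f}]")
```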
Example 8.6.2. Compute a 95% confidence interval for each $\beta_j$ using $y_2$ in the chemical reaction data in Example 8.2(a). The matrix $(\mathbf{X}'\mathbf{X})^{-1}$ and the estimate $\hat{\boldsymbol\beta}$ have the following values:

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{pmatrix} 65.37550 & 0.33885 & 0.31252 & 0.02041 \\ 0.33885 & 0.00184 & 0.00127 & 0.00043 \\ 0.31252 & 0.00127 & 0.00408 & 0.00176 \\ 0.02041 & 0.00043 & 0.00176 & 0.02161 \end{pmatrix}, \qquad \hat{\boldsymbol\beta} = \begin{pmatrix} 26.0353 \\ 0.4046 \\ 0.2930 \\ 1.0338 \end{pmatrix}.$$

For $\beta_1$, we obtain, by (8.28), $\hat{\beta}_1 \pm t_{0.025,n-k-1}\, s\sqrt{g_{11}}$.
By a similar argument, the statistic

$$F = \frac{(\mathbf{a}'\hat{\boldsymbol\beta} - \mathbf{a}'\boldsymbol\beta)^2}{s^2\,\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}$$

is distributed as $F(1, n-k-1)$, which leads to the $100(1-\alpha)\%$ confidence interval for $\mathbf{a}'\boldsymbol\beta$

$$\mathbf{a}'\hat{\boldsymbol\beta} \pm t_{\alpha/2,n-k-1}\, s\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}. \quad (8.30)$$

Let $\mathbf{x}_0 = (1, x_{01}, x_{02}, \ldots, x_{0k})'$ denote a particular choice of $\mathbf{x} = (1, x_1, x_2, \ldots, x_k)'$. Note that $\mathbf{x}_0$ need not be one of the $\mathbf{x}$'s in the sample; that is, $\mathbf{x}_0'$ need not be a row of $\mathbf{X}$. If $\mathbf{x}_0$ is very far outside the area covered by the sample, however, prediction based on $\mathbf{x}_0$ may be poor.

Let $y_0$ be an observation corresponding to $\mathbf{x}_0$. Then

$$y_0 = \mathbf{x}_0'\boldsymbol\beta + \varepsilon_0, \quad \text{and} \quad E(y_0) = \mathbf{x}_0'\boldsymbol\beta. \quad (8.31)$$

We wish to find a confidence interval for $E(y_0)$, that is, for the mean of the distribution of $y$ values corresponding to $\mathbf{x}_0$. By Corollary 1 to Theorem 7.4D, the minimum variance unbiased estimator of $E(y_0)$ is given by

$$\hat{E}(y_0) = \mathbf{x}_0'\hat{\boldsymbol\beta}. \quad (8.32)$$

Since (8.31) and (8.32) are of the form $\mathbf{a}'\boldsymbol\beta$ and $\mathbf{a}'\hat{\boldsymbol\beta}$, respectively, we obtain a $100(1-\alpha)\%$ confidence interval for $E(y_0) = \mathbf{x}_0'\boldsymbol\beta$ from (8.30):

$$\mathbf{x}_0'\hat{\boldsymbol\beta} \pm t_{\alpha/2,n-k-1}\, s\sqrt{\mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}. \quad (8.33)$$
Note: The confidence coefficient $1-\alpha$ for the interval in (8.33) holds only for a single choice of the vector $\mathbf{x}_0$.

In terms of the centered model (7.28), with $\mathbf{x}_{01} = (x_{01}, \ldots, x_{0k})'$, confidence interval (8.33) becomes

$$\bar{y} + \hat{\boldsymbol\beta}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1) \pm t_{\alpha/2,n-k-1}\, s\sqrt{\frac{1}{n} + (\mathbf{x}_{01} - \bar{\mathbf{x}}_1)'(\mathbf{X}_c'\mathbf{X}_c)^{-1}(\mathbf{x}_{01} - \bar{\mathbf{x}}_1)}. \quad (8.36)$$

For the case of simple linear regression, (8.31), (8.32), and (8.36) reduce to

$$E(y_0) = \beta_0 + \beta_1 x_0, \quad (8.37)$$
$$\hat{E}(y_0) = \hat{\beta}_0 + \hat{\beta}_1 x_0, \quad (8.38)$$
$$\hat{\beta}_0 + \hat{\beta}_1 x_0 \pm t_{\alpha/2,n-2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}, \quad (8.39)$$

where

$$s^2 = \frac{\sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n-2} = \frac{\mathrm{SSE}}{n-2}.$$

Note: The width of the interval in (8.39) depends on how far $x_0$ is from $\bar{x}$.

Example 8.6.4. For the grades data in Example 6.1, find a 95% confidence interval for $E(y_0)$, where $x_0 = 80$. Using (8.39) with $\bar{x} = 58.056$, the interval is

$$\hat{\beta}_0 + \hat{\beta}_1(80) \pm t_{0.025,n-2}\, s\sqrt{\frac{1}{n} + \frac{(80 - 58.056)^2}{\sum_i (x_i - \bar{x})^2}}.$$
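Interval (8.39) can be sketched as follows. The data below are invented for illustration (they are not the grades data of Example 6.1); the sketch also checks the note above, namely that the interval is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away.

```python
import numpy as np
from scipy import stats

# Hypothetical simple-linear-regression data (not the grades data).
x = np.array([35., 41., 48., 52., 57., 61., 66., 70., 75., 80.])
y = np.array([40., 45., 50., 58., 60., 67., 68., 74., 80., 85.])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))   # s^2 = SSE/(n-2)
tcrit = stats.t.ppf(0.975, n - 2)

def ci_mean(x0):
    """95% CI for E(y0) at x0, per (8.39)."""
    half = tcrit * s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / Sxx)
    fit = b0 + b1 * x0
    return fit - half, fit + half

lo_center, hi_center = ci_mean(x.mean())    # narrowest interval, at x0 = xbar
lo_far, hi_far = ci_mean(80.0)
assert (hi_far - lo_far) > (hi_center - lo_center)  # widens away from xbar
print(ci_mean(80.0))
```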
8.6.5. Prediction Interval for a Future Observation

The random variables $y_0$ and $\hat{y}_0 = \mathbf{x}_0'\hat{\boldsymbol\beta}$ are independent, because $y_0$ is a future observation to be obtained independently of the $n$ observations used to compute $\hat{y}_0$. Therefore,

$$\mathrm{var}(y_0 - \hat{y}_0) = \mathrm{var}(y_0) + \mathrm{var}(\hat{y}_0) = \sigma^2 + \sigma^2\mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0 = \sigma^2[1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0], \quad (8.40)$$

which is estimated by $s^2[1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0]$. It can be shown that $E(y_0 - \hat{y}_0) = 0$ and that $s^2$ is independent of both $y_0$ and $\hat{y}_0 = \mathbf{x}_0'\hat{\boldsymbol\beta}$. Therefore, the $t$-statistic

$$t = \frac{y_0 - \hat{y}_0 - 0}{s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}} \quad (8.41)$$

is distributed as $t(n-k-1)$, and

$$P\left(-t_{\alpha/2,n-k-1} \le \frac{y_0 - \hat{y}_0}{s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}} \le t_{\alpha/2,n-k-1}\right) = 1-\alpha.$$

The inequality can be solved for $y_0$ to obtain the $100(1-\alpha)\%$ prediction interval

$$\hat{y}_0 - t_{\alpha/2,n-k-1}\, s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0} \le y_0 \le \hat{y}_0 + t_{\alpha/2,n-k-1}\, s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}. \quad (8.42)$$
In terms of the centered model, the prediction interval (8.42) becomes

$$\bar{y} + \hat{\boldsymbol\beta}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1) \pm t_{\alpha/2,n-k-1}\, s\sqrt{1 + \frac{1}{n} + (\mathbf{x}_{01} - \bar{\mathbf{x}}_1)'(\mathbf{X}_c'\mathbf{X}_c)^{-1}(\mathbf{x}_{01} - \bar{\mathbf{x}}_1)}. \quad (8.43)$$

For the case of simple linear regression, (8.42) and (8.43) reduce to

$$\hat{\beta}_0 + \hat{\beta}_1 x_0 \pm t_{\alpha/2,n-2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}, \quad (8.44)$$

where

$$s^2 = \frac{\sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n-2} = \frac{\mathrm{SSE}}{n-2}.$$

Example 8.6.5. Using the grades data in Example 6.1, find a 95% prediction interval for $y_0$ when $x_0 = 80$. Using (8.44) with $\bar{x} = 58.056$, the interval is

$$\hat{\beta}_0 + \hat{\beta}_1(80) \pm t_{0.025,n-2}\, s\sqrt{1 + \frac{1}{n} + \frac{(80 - 58.056)^2}{\sum_i (x_i - \bar{x})^2}}.$$
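Interval (8.44) differs from the confidence interval (8.39) only by the extra 1 under the square root. A sketch on the same kind of invented data (again, not the grades data) shows that the prediction interval is therefore always wider than the confidence interval for the mean at the same $x_0$:

```python
import numpy as np
from scipy import stats

# Hypothetical simple-linear-regression data (not the grades data).
x = np.array([35., 41., 48., 52., 57., 61., 66., 70., 75., 80.])
y = np.array([40., 45., 50., 58., 60., 67., 68., 74., 80., 85.])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
tcrit = stats.t.ppf(0.975, n - 2)

x0 = 80.0
fit = b0 + b1 * x0
half_ci = tcrit * s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / Sxx)        # (8.39)
half_pi = tcrit * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx)    # (8.44)

assert half_pi > half_ci       # prediction interval is strictly wider
print((fit - half_pi, fit + half_pi))
```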
8.6.6. Confidence Interval for $\sigma^2$

Since $(n-k-1)s^2/\sigma^2$ is $\chi^2(n-k-1)$, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$$\frac{(n-k-1)s^2}{\chi^2_{\alpha/2,n-k-1}} \le \sigma^2 \le \frac{(n-k-1)s^2}{\chi^2_{1-\alpha/2,n-k-1}}. \quad (8.46)$$
Example 8.2(a)

In an experiment to obtain maximum yield in a chemical reaction, the following variables were chosen: $x_1$ = temperature (degrees Celsius), $x_2$ = concentration of a reagent (%), $x_3$ = time of reaction (hours). Two different response variables were observed: $y_1$ = percent of unchanged starting material, $y_2$ = percent converted to the desired product.
Table 8.2(a): Chemical Reaction Data
Y1 Y2 X1 X2 X3
41.5 45.9 162 23 3
33.8 53.3 162 23 8
27.7 57.5 162 30 5
21.7 58.8 162 30 8
19.9 60.6 172 25 5
15 58 172 25 8
12.2 58.6 172 30 5
4.3 52.4 172 30 8
19.3 56.9 167 27.5 6.5
6.4 55.4 177 27.5 6.5
37.6 46.9 157 27.5 6.5
18 57.3 167 32.5 6.5
26.3 55 167 22.5 6.5
9.9 58.9 167 27.5 9.5
25 50.3 167 27.5 3.5
14.1 61.1 177 20 6.5
15.2 62.9 177 20 6.5
15.9 60 160 34 7.5
19.6 60.6 160 34 7.5
Consider the dependent variable $y_2$ in the chemical reaction data. To check the usefulness of the second-order terms in predicting $y_2$, use as a full model

$$y_2 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_3^2 + \beta_7 x_1 x_2 + \beta_8 x_1 x_3 + \beta_9 x_2 x_3 + \varepsilon,$$

and test $H_0\colon \beta_4 = \beta_5 = \cdots = \beta_9 = 0$.
data chemical;
input y1 y2 x1 x2 x3;
cards;
41.5 45.9 162 23 3
33.8 53.3 162 23 8
27.7 57.5 162 30 5
21.7 58.8 162 30 8
19.9 60.6 172 25 5
15 58 172 25 8
12.2 58.6 172 30 5
4.3 52.4 172 30 8
19.3 56.9 167 27.5 6.5
6.4 55.4 177 27.5 6.5
37.6 46.9 157 27.5 6.5
18 57.3 167 32.5 6.5
26.3 55 167 22.5 6.5
9.9 58.9 167 27.5 9.5
25 50.3 167 27.5 3.5
14.1 61.1 177 20 6.5
15.2 62.9 177 20 6.5
15.9 60 160 34 7.5
19.6 60.6 160 34 7.5
;
proc glm; /*unlike proc reg, proc glm allows polynomial in the model*/
/*statements */
model y2=x1 x2 x3 x1*x1 x2*x2 x3*x3 x1*x2 x1*x3 x2*x3;
Output:
The GLM Procedure
Number of Observations Read 19
Number of Observations Used 19
Dependent Variable: y2
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 9 339.7887457 37.7543051 5.60 0.0086
Error 9 60.6754648 6.7417183
Corrected Total 18 400.4642105
R‐Square Coeff Var Root MSE y2 Mean
0.848487 4.608852 2.596482 56.33684
Source DF Type I SS Mean Square F Value Pr > F
x1 1 65.34630867 65.34630867 9.69 0.0125
x2 1 36.18984581 36.18984581 5.37 0.0457
x3 1 49.46603624 49.46603624 7.34 0.0240
x1*x1 1 0.22191886 0.22191886 0.03 0.8600
x2*x2 1 83.13759745 83.13759745 12.33 0.0066
x3*x3 1 11.00115058 11.00115058 1.63 0.2334
x1*x2 1 73.87317137 73.87317137 10.96 0.0091
x1*x3 1 20.29147665 20.29147665 3.01 0.1168
x2*x3 1 0.26124008 0.26124008 0.04 0.8483
Source DF Type III SS Mean Square F Value Pr > F
x1 1 53.47793092 53.47793092 7.93 0.0202
x2 1 36.03765833 36.03765833 5.35 0.0461
x3 1 24.62826755 24.62826755 3.65 0.0883
x1*x1 1 37.33001918 37.33001918 5.54 0.0431
x2*x2 1 0.12012988 0.12012988 0.02 0.8967
x3*x3 1 0.74964520 0.74964520 0.11 0.7464
x1*x2 1 60.52113170 60.52113170 8.98 0.0150
x1*x3 1 18.09124540 18.09124540 2.68 0.1358
x2*x3 1 0.26124008 0.26124008 0.04 0.8483
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept ‐2282.928273 739.5889589 ‐3.09 0.0130
x1 22.859973 8.1165920 2.82 0.0202
x2 21.414751 9.2623257 2.31 0.0461
x3 33.609745 17.5846445 1.91 0.0883
x1*x1 ‐0.053803 0.0228645 ‐2.35 0.0431
x2*x2 ‐0.007749 0.0580525 ‐0.13 0.8967
x3*x3 ‐0.085891 0.2575754 ‐0.33 0.7464
x1*x2 ‐0.123372 0.0411765 ‐3.00 0.0150
x1*x3 ‐0.186260 0.1137027 ‐1.64 0.1358
x2*x3 -0.034100 0.1732286 -0.20 0.8483
Test the hypothesis $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$ versus $H_1\colon \boldsymbol\beta_2 \ne \mathbf{0}$. For the full model,

$$y_2 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_3^2 + \beta_7 x_1 x_2 + \beta_8 x_1 x_3 + \beta_9 x_2 x_3 + \varepsilon,$$

we obtain $\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - n\bar{y}^2 = 339.7888$, and for the reduced model,

$$y_2 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon,$$

we have $\hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y} - n\bar{y}^2 = 151.0022$. The difference is $\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y} = 188.7866$. The error sum of squares is $\mathrm{SSE} = 60.6755$ (from the full-model output), and the $F$-statistic is given by (8.11) as

$$F = \frac{(\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y})/h}{(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y})/(n-k-1)} = \frac{188.7866/6}{60.6755/9} = \frac{31.4644}{6.7417} = 4.6671.$$

From the statistical table, $F_{0.05,6,9} = 3.374$. Since $F > F_{0.05,6,9}$, we reject $H_0\colon \boldsymbol\beta_2 = \mathbf{0}$ and conclude that the second-order terms are useful in predicting $y_2$.

Note: The overall $F$ for the reduced model is 3.03 with $p$-value 0.0623, so $x_1$, $x_2$, and $x_3$ alone are inadequate for predicting $y_2$. The overall $F$ for the full model is 5.60 with $p$-value 0.0086.
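The arithmetic of this full-versus-reduced test can be checked directly from the sums of squares quoted above; a sketch:

```python
from scipy import stats

SSR_full, SSR_red = 339.7888, 151.0022     # regression SS from the SAS output above
SSE, h, df_error = 60.6755, 6, 9           # full-model error SS and degrees of freedom

F = ((SSR_full - SSR_red) / h) / (SSE / df_error)   # (8.11)
p = stats.f.sf(F, h, df_error)
Fcrit = stats.f.ppf(0.95, h, df_error)              # F_{0.05,6,9}

assert abs(F - 4.6671) < 1e-3
assert F > Fcrit                           # reject H0: the second-order terms are useful
print(round(F, 4), round(Fcrit, 3), round(p, 4))
```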
Chapter 8 summary. Consider the model $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon$ with the normality assumption, and partition $\boldsymbol\beta = (\beta_0, \boldsymbol\beta_1')'$. To construct an $F$-test involving SSR and SSE, we need to express the sums of squares in $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$ as quadratic forms in $\mathbf{y}$, so that we can use theorems to show that these sums of squares have chi-square distributions and are independent:

$$\mathbf{y}'\left(\mathbf{I}-\tfrac{1}{n}\mathbf{J}\right)\mathbf{y} = \mathrm{SSR} + \mathrm{SSE} = \mathbf{y}'\mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\mathbf{y} + \left[\mathbf{y}'\left(\mathbf{I}-\tfrac{1}{n}\mathbf{J}\right)\mathbf{y} - \mathbf{y}'\mathbf{X}_c(\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\mathbf{y}\right] \quad (8.2)$$
$$= \mathbf{y}'\mathbf{A}\mathbf{y} + \mathbf{y}'\left(\mathbf{I}-\tfrac{1}{n}\mathbf{J}-\mathbf{A}\right)\mathbf{y}.$$
These quadratic forms have the properties stated in Theorem 8.1A: $\mathbf{A}(\mathbf{I}-\frac{1}{n}\mathbf{J}) = \mathbf{A}$; $\mathbf{A}$ is idempotent of rank $k$; $\mathbf{I}-\frac{1}{n}\mathbf{J}-\mathbf{A}$ is idempotent of rank $n-k-1$; and $\mathbf{A}(\mathbf{I}-\frac{1}{n}\mathbf{J}-\mathbf{A}) = \mathbf{O}$. The test statistic is

$$F = \frac{\mathrm{SSR}/k\sigma^2}{\mathrm{SSE}/(n-k-1)\sigma^2} = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)}, \quad (8.3)$$

and if $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ is true, then $\lambda_1 = 0$ and $F$ is distributed as $F(k, n-k-1)$.
Remember that in terms of the noncentered model $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon$, $\mathrm{SSR} = \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - n\bar{y}^2$ and $\mathrm{SSE} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y}$. Try Example 8.1.

To test the hypothesis that a subset of the $x$'s is not useful in predicting $y$, partition the model as

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon = (\mathbf{X}_1, \mathbf{X}_2)\begin{pmatrix}\boldsymbol\beta_1 \\ \boldsymbol\beta_2\end{pmatrix} + \boldsymbol\varepsilon = \mathbf{X}_1\boldsymbol\beta_1 + \mathbf{X}_2\boldsymbol\beta_2 + \boldsymbol\varepsilon.$$

With the model written in full and reduced form,

$$\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1) = (\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - n\bar{y}^2) - (\hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y} - n\bar{y}^2) = \mathrm{SSR}(\text{full}) - \mathrm{SSR}(\text{reduced}),$$

the difference between the overall regression sum of squares for the full model and the overall regression sum of squares for the reduced model. To develop a test statistic based on $\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1)$, we use the quadratic-form partition

$$\mathbf{y}'\mathbf{y} = \mathbf{y}'(\mathbf{I}-\mathbf{A}_1)\mathbf{y} + \mathbf{y}'(\mathbf{A}_1-\mathbf{A}_2)\mathbf{y} + \mathbf{y}'\mathbf{A}_2\mathbf{y},$$

where $\mathbf{y}'(\mathbf{I}-\mathbf{A}_1)\mathbf{y}/\sigma^2$ is $\chi^2(n-k-1)$, and

$$F = \frac{\mathbf{y}'(\mathbf{A}_1-\mathbf{A}_2)\mathbf{y}/h}{\mathbf{y}'(\mathbf{I}-\mathbf{A}_1)\mathbf{y}/(n-k-1)} = \frac{\mathrm{SS}(\boldsymbol\beta_2 \mid \boldsymbol\beta_1)/h}{\mathrm{SSE}/(n-k-1)} = \frac{(\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y})/h}{(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y})/(n-k-1)}.$$
8.4 F-test for the general linear hypothesis

Consider a regression model with five predictors, $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_5 x_5 + \varepsilon$, with $E(\varepsilon_i) = 0$, $E(\varepsilon_i\varepsilon_j) = 0$ for $i \ne j$, and $\mathrm{var}(\varepsilon_i) = \sigma^2$. Suppose we want to test the following linear hypotheses:

(a) $H_0\colon \beta_2 = 0$ versus $H_1\colon \beta_2 \ne 0$;
(b) $H_0\colon \beta_2 = \beta_3$ versus $H_1\colon \beta_2 \ne \beta_3$;
(c) $H_0\colon \beta_1 = 5$ (or $\beta_1 - 5 = 0$) versus $H_1\colon \beta_1 \ne 5$ (or $\beta_1 - 5 \ne 0$);
(d) $H_0\colon \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ versus $H_1\colon$ at least one $\beta_i \ne 0$; this hypothesis can be expressed as $H_0\colon \boldsymbol\beta_1 = \mathbf{0}$ versus $H_1\colon \boldsymbol\beta_1 \ne \mathbf{0}$, where $\boldsymbol\beta_1 = (\beta_1, \beta_2, \beta_3, \beta_4, \beta_5)'$;
(e) $H_0\colon (\beta_2, \beta_5)' = \mathbf{0}$ versus $H_1\colon (\beta_2, \beta_5)' \ne \mathbf{0}$.

All of the hypotheses above can be expressed through the general linear hypothesis

$$H_0\colon \mathbf{C}\boldsymbol\beta - \boldsymbol\gamma = \mathbf{0} \quad \text{versus} \quad H_1\colon \mathbf{C}\boldsymbol\beta - \boldsymbol\gamma \ne \mathbf{0}.$$

Now, let us find the matrix $\mathbf{C}$ and the vector $\boldsymbol\gamma$ for each of the hypotheses (a)-(e):
(d) This is also called the test of overall significance of the model. The matrix $\mathbf{C}$ and the vector $\boldsymbol\gamma$ are defined as follows:

$$\mathbf{C}\boldsymbol\beta - \boldsymbol\gamma = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5\end{pmatrix} - \begin{pmatrix}0\\0\\0\\0\\0\end{pmatrix} = \mathbf{0}.$$

This gives us the hypothesis $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$.

(e) Here we are testing whether the two parameters $\beta_2$ and $\beta_5$ are simultaneously significant. The matrix $\mathbf{C}$ and the vector $\boldsymbol\gamma$ are defined as follows:

$$\mathbf{C}\boldsymbol\beta - \boldsymbol\gamma = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5\end{pmatrix} - \begin{pmatrix}0\\0\end{pmatrix} = \mathbf{0}.$$

This gives us the hypothesis $\beta_2 = \beta_5 = 0$.
So, to test the hypothesis $H_0\colon \mathbf{C}\boldsymbol\beta = \mathbf{0}$ versus $H_1\colon \mathbf{C}\boldsymbol\beta \ne \mathbf{0}$, we use the sums of squares and their properties from Theorem 8.4A:

$$\frac{\mathrm{SSH}}{\sigma^2} = \frac{(\mathbf{C}\hat{\boldsymbol\beta})'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}']^{-1}\mathbf{C}\hat{\boldsymbol\beta}}{\sigma^2} \;\text{ is }\; \chi^2(q, \lambda), \qquad \lambda = \frac{(\mathbf{C}\boldsymbol\beta)'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}']^{-1}\mathbf{C}\boldsymbol\beta}{2\sigma^2}.$$

Note: The sum of squares due to $\mathbf{C}\boldsymbol\beta$ (due to the hypothesis) is denoted by SSH.

The $F$-test is given by Theorem 8.4B: let $\mathbf{y}$ be $N_n(\mathbf{X}\boldsymbol\beta, \sigma^2\mathbf{I})$ and define the statistic

$$F = \frac{\mathrm{SSH}/q}{\mathrm{SSE}/(n-k-1)} = \frac{(\mathbf{C}\hat{\boldsymbol\beta})'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}']^{-1}\mathbf{C}\hat{\boldsymbol\beta}/q}{\mathrm{SSE}/(n-k-1)},$$

where $\mathbf{C}$ is $q \times (k+1)$ of rank $q \le k+1$ and $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$. The expected mean squares for this $F$-test are given by

$$E\left(\frac{\mathrm{SSH}}{q}\right) = \sigma^2 + \frac{1}{q}(\mathbf{C}\boldsymbol\beta)'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}']^{-1}\mathbf{C}\boldsymbol\beta, \qquad E\left(\frac{\mathrm{SSE}}{n-k-1}\right) = \sigma^2.$$

Now try Example 8.4(b). Use SAS to find all the relevant values for the $F$-test.
8.5 Testing One $\beta_k$ or One $\mathbf{a}'\boldsymbol\beta$

(i) The $F$-test statistic for $H_0\colon \beta_k = 0$ using a full and reduced model is

$$F = \frac{\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol\beta}_1^{*\prime}\mathbf{X}_1'\mathbf{y}}{\mathrm{SSE}/(n-k-1)},$$

where $\mathbf{X}_1$ contains all the columns of $\mathbf{X}$ except the last.

(ii) Using the general linear hypothesis approach, say we want to test $H_0\colon \beta_j = 0$. We define $\mathbf{a} = (0, \ldots, 0, 1, 0, \ldots, 0)'$, where the 1 is in the $j$th position. This gives

$$F = \frac{\hat{\beta}_j^2}{s^2 g_{jj}} \qquad \text{or, equivalently,} \qquad t_j = \frac{\hat{\beta}_j}{s\sqrt{g_{jj}}}.$$

The $F$-test for $H_0\colon \mathbf{a}'\boldsymbol\beta = 0$ is

$$F = \frac{(\mathbf{a}'\hat{\boldsymbol\beta})'[\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}]^{-1}\mathbf{a}'\hat{\boldsymbol\beta}/1}{\mathrm{SSE}/(n-k-1)} = \frac{(\mathbf{a}'\hat{\boldsymbol\beta})^2}{s^2\,\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}.$$

Try Example 8.5(a).
8.6 Confidence & Prediction Intervals

Setting $\mathbf{C} = \mathbf{I}$ in

$$F = \frac{\mathrm{SSH}/q}{\mathrm{SSE}/(n-k-1)} = \frac{(\mathbf{C}\hat{\boldsymbol\beta})'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}']^{-1}\mathbf{C}\hat{\boldsymbol\beta}/q}{\mathrm{SSE}/(n-k-1)}$$

gives

$$P[(\hat{\boldsymbol\beta} - \boldsymbol\beta)'\mathbf{X}'\mathbf{X}(\hat{\boldsymbol\beta} - \boldsymbol\beta) \le (k+1)s^2 F_{\alpha,k+1,n-k-1}] = 1-\alpha,$$

so the joint confidence region is the set of all $\boldsymbol\beta$ satisfying

$$(\hat{\boldsymbol\beta} - \boldsymbol\beta)'\mathbf{X}'\mathbf{X}(\hat{\boldsymbol\beta} - \boldsymbol\beta) \le (k+1)s^2 F_{\alpha,k+1,n-k-1}.$$

But it is not easy to interpret this region; therefore we are more interested in the confidence intervals for the individual $\beta_j$!

8.6.2 Confidence Interval for $\beta_j$

To test the hypothesis $H_0\colon \beta_j = 0$ we can use the $t$-statistic $t_j = \hat{\beta}_j/(s\sqrt{g_{jj}})$. If $\beta_j \ne 0$, we can subtract $\beta_j$ in the equation above, so that $t_j = (\hat{\beta}_j - \beta_j)/(s\sqrt{g_{jj}})$ has the central $t$ distribution, where $g_{jj}$ is the $j$th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$. Then

$$P\left(-t_{\alpha/2,n-k-1} \le \frac{\hat{\beta}_j - \beta_j}{s\sqrt{g_{jj}}} \le t_{\alpha/2,n-k-1}\right) = 1-\alpha.$$

Solving the inequality for $\beta_j$ gives the $100(1-\alpha)\%$ confidence interval

$$\hat{\beta}_j \pm t_{\alpha/2,n-k-1}\, s\sqrt{g_{jj}},$$

and we say that we are $100(1-\alpha)\%$ confident that the interval contains $\beta_j$. Now try Example 8.6.2!
For $\mathbf{a}'\boldsymbol\beta$, the statistic

$$F = \frac{(\mathbf{a}'\hat{\boldsymbol\beta} - \mathbf{a}'\boldsymbol\beta)^2}{s^2\,\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}$$

is distributed as $F(1, n-k-1)$. Then

$$t = \frac{\mathbf{a}'\hat{\boldsymbol\beta} - \mathbf{a}'\boldsymbol\beta}{s\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}}$$

is distributed as $t(n-k-1)$, which gives the confidence interval

$$\mathbf{a}'\hat{\boldsymbol\beta} \pm t_{\alpha/2,n-k-1}\, s\sqrt{\mathbf{a}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{a}}.$$

For a response at $\mathbf{x}_0$, $y_0 = \mathbf{x}_0'\boldsymbol\beta + \varepsilon_0$ and $E(y_0) = \mathbf{x}_0'\boldsymbol\beta$. To find a confidence interval for $E(y_0)$, that is, for the mean of the distribution of $y$ values corresponding to $\mathbf{x}_0$, use the minimum variance unbiased estimator $\hat{E}(y_0) = \mathbf{x}_0'\hat{\boldsymbol\beta}$. These $E(y_0)$ and $\hat{E}(y_0)$ are of the form $\mathbf{a}'\boldsymbol\beta$ and $\mathbf{a}'\hat{\boldsymbol\beta}$, respectively. Therefore, a $100(1-\alpha)\%$ confidence interval for $E(y_0) = \mathbf{x}_0'\boldsymbol\beta$ is

$$\mathbf{x}_0'\hat{\boldsymbol\beta} \pm t_{\alpha/2,n-k-1}\, s\sqrt{\mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0},$$

which holds only for a single choice of the vector $\mathbf{x}_0$.

In the centered model, $\hat{E}(y_0) = \bar{y} + \hat{\boldsymbol\beta}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1)$, with $100(1-\alpha)\%$ confidence interval

$$\bar{y} + \hat{\boldsymbol\beta}_1'(\mathbf{x}_{01} - \bar{\mathbf{x}}_1) \pm t_{\alpha/2,n-k-1}\, s\sqrt{\frac{1}{n} + (\mathbf{x}_{01} - \bar{\mathbf{x}}_1)'(\mathbf{X}_c'\mathbf{X}_c)^{-1}(\mathbf{x}_{01} - \bar{\mathbf{x}}_1)}.$$

If we consider simple linear regression, $E(y_0) = \beta_0 + \beta_1 x_0$ and the interval is

$$\hat{\beta}_0 + \hat{\beta}_1 x_0 \pm t_{\alpha/2,n-2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}},$$

where $s^2 = \sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2/(n-2) = \mathrm{SSE}/(n-2)$.

Note: The width of the interval depends on how far $x_0$ is from $\bar{x}$. Try Example 8.6.4!
8.6.5 Prediction Interval for a Future Observation

We now find an interval for a future observation $y_0$. The $t$-statistic for this future observation can be written as

$$t = \frac{y_0 - \hat{y}_0 - 0}{s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}},$$

which is distributed as $t(n-k-1)$, and

$$P\left(-t_{\alpha/2,n-k-1} \le \frac{y_0 - \hat{y}_0}{s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}} \le t_{\alpha/2,n-k-1}\right) = 1-\alpha.$$

This inequality can be solved for $y_0$ to obtain the $100(1-\alpha)\%$ prediction interval

$$\hat{y}_0 - t_{\alpha/2,n-k-1}\, s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0} \le y_0 \le \hat{y}_0 + t_{\alpha/2,n-k-1}\, s\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0}.$$

For simple linear regression (one centered $x$), the $100(1-\alpha)\%$ prediction interval can be written as

$$\hat{\beta}_0 + \hat{\beta}_1 x_0 \pm t_{\alpha/2,n-2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}},$$

where $s^2 = \sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2/(n-2) = \mathrm{SSE}/(n-2)$. Try Example 8.6.5!
8.6.6 Confidence Interval for $\sigma^2$

By theorem, $(n-k-1)s^2/\sigma^2$ is $\chi^2(n-k-1)$. Therefore,

$$P\left(\chi^2_{1-\alpha/2,n-k-1} \le \frac{(n-k-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2,n-k-1}\right) = 1-\alpha,$$

which can be rearranged as

$$P\left(\frac{(n-k-1)s^2}{\chi^2_{\alpha/2,n-k-1}} \le \sigma^2 \le \frac{(n-k-1)s^2}{\chi^2_{1-\alpha/2,n-k-1}}\right) = 1-\alpha.$$

A $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$$\frac{(n-k-1)s^2}{\chi^2_{\alpha/2,n-k-1}} \le \sigma^2 \le \frac{(n-k-1)s^2}{\chi^2_{1-\alpha/2,n-k-1}}.$$
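This interval can be sketched as follows. For concreteness the sketch uses $s^2 = 6.7417$ on $n-k-1 = 9$ degrees of freedom (the full-model mean squared error from the chemical reaction output earlier); any other values could be substituted.

```python
from scipy import stats

df, s2, alpha = 9, 6.7417, 0.05       # error df, MSE, and confidence level 1 - alpha

# 95% CI for sigma^2: (df*s^2)/chi2_{alpha/2} <= sigma^2 <= (df*s^2)/chi2_{1-alpha/2}
lower = df * s2 / stats.chi2.ppf(1 - alpha / 2, df)
upper = df * s2 / stats.chi2.ppf(alpha / 2, df)

assert lower < s2 < upper             # the point estimate lies inside the interval
print(round(lower, 3), round(upper, 3))
```

Note how asymmetric the interval is about $s^2$; this reflects the skewness of the chi-square distribution.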