The model $Y = X\beta + \varepsilon$ accounts for variation in the responses $Y$, measured by $Y^TY$, in 2 ways:
1. systematic variation explained by the regressor variables $x_1, \dots, x_k$ in $X$
2. random variation in $\varepsilon$

From the LS estimate, $Y = \hat Y + \hat\varepsilon$. Therefore
$$Y^TY = (\hat Y + \hat\varepsilon)^T(\hat Y + \hat\varepsilon) = \hat Y^T\hat Y + \hat\varepsilon^T\hat\varepsilon + 2\hat\varepsilon^T\hat Y$$
But $\hat\varepsilon \perp \hat Y$ (i.e. $\hat\varepsilon^T\hat Y = 0$), since $H$ is symmetric and idempotent:
$$\hat\varepsilon^T\hat Y = ((I - H)Y)^T HY = Y^T(H - H^2)Y = 0$$
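As a numerical check of this decomposition, here is a minimal sketch in Python (the design matrix, coefficients, and data below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# design matrix: intercept plus two invented regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X'X)^{-1} X'
Y_hat = H @ Y                           # fitted values HY
e_hat = Y - Y_hat                       # residuals (I - H)Y

print(e_hat @ Y_hat)                    # ~0: residuals orthogonal to fitted values
print(Y @ Y)                            # Y'Y ...
print(Y_hat @ Y_hat + e_hat @ e_hat)    # ... equals Yhat'Yhat + ehat'ehat
```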
ANOVA continued
Therefore
$$Y^TY = \hat Y^T\hat Y + \hat\varepsilon^T\hat\varepsilon = SS_{mod} + SS_E$$
where $SS_{mod} = \hat Y^T\hat Y = Y^THY$, the model sum of squares, and $SS_E = \hat\varepsilon^T\hat\varepsilon = Y^T(I - H)Y$, the residual sum of squares.
Note that $\hat Y^T\hat Y$ is the squared length of the vector $\hat Y$, & similarly for $\hat\varepsilon^T\hat\varepsilon$.
[Recall the hat matrix: $H = X(X^TX)^{-1}X^T$.]
Example 1 continued
$SS_{mod}$ for the model $E(y) = \beta_0$ again! For $n$ observations from the model $E(y) = \beta_0$ it is easy to check that $\hat\beta_0 = \bar y$, so $\hat Y = \bar y X$ (here $X = \mathbf{1}$, the $n \times 1$ column of ones).
Therefore
$$\hat Y^T\hat Y = \bar y^2 X^TX = n\bar y^2$$
as before.
Example 2

$E(y_{ij}) = \mu_i$ for $i = 1, \dots, t$; $j = 1, \dots, n_i$. It is not too difficult to show $H$ is an $n \times n$ block diagonal matrix consisting of blocks $\frac{1}{n_i}J_{n_i}$, where $J$ is a square matrix of 1s, and $n = \sum_{i=1}^t n_i$. Hence
$$SS_{mod} = \sum_{i=1}^t n_i \bar y_i^2$$
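A small numerical illustration of this block structure (invented group data; scipy's block_diag assembles $H$):

```python
import numpy as np
from scipy.linalg import block_diag

groups = [np.array([3.0, 4.0, 5.0]),   # group 1: n1 = 3 observations
          np.array([6.0, 8.0])]        # group 2: n2 = 2 observations
Y = np.concatenate(groups)

# H is block diagonal with blocks (1/n_i) J_{n_i}
H = block_diag(*[np.ones((g.size, g.size)) / g.size for g in groups])

print(Y @ H @ Y)                                    # SSmod = Y'HY
print(sum(g.size * g.mean() ** 2 for g in groups))  # equals sum_i n_i ybar_i^2
```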
Comparing Models
In Example 2 (two slides previous) the hypothesis $H_0: \mu_1 = \mu_2 = \dots = \mu_t$ is tested by comparing the fit of the model in Example 2 with that in Example 1. Model 1 is a special case of Model 2, and if it fits nearly as well as Model 2 then $H_0$ is not rejected.
Example 3
Fitting the model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$, we can ask: does the rate at which $E(y)$ changes with $x_1$ depend on the value of $x_2$? (i.e. does the simpler model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ adequately fit the data?) The answer to the question is to test
$$H_0: \beta_3 = 0 \quad \text{v} \quad H_a: \beta_3 \ne 0.$$
This could be done if we knew the distributional properties of $\hat\beta_3$. So far we know the first two moments of this distribution, but we need further assumptions to ensure the distribution is normal.
Example 4
For the model of Example 3, the question whether $E(y)$ depends on $x_2$ involves comparing this model to $E(y) = \beta_0 + \beta_1 x_1$, i.e. testing $H_0: \beta_2 = \beta_3 = 0$.
General Formulation
Assuming the model
$$Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I) \quad (1)$$
we wish to know if some simpler model
$$Y = X_0\gamma + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I), \quad C(X_0) \subset C(X) \quad (2)$$
gives an acceptable fit to the data. Models (1) & (2) can be expressed as
(1) $E(Y) \in C(X)$; (2) $E(Y) \in C(X_0)$
respectively. Often the subspace $C(X_0) \subset C(X)$ is described by a set of $r$ linear equations $L\beta = 0$, where $L$ is an $r \times p$ matrix of constants.
Examples
Example 3 (above): $H_0: \beta_3 = 0$, so $L = (0\ \ 0\ \ 0\ \ 1)$.

Example 4 (above): $H_0: \beta_2 = \beta_3 = 0$, so
$$L = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
1. The smaller $SS_E$ (equivalently, the larger $SS_{mod}$), the better the model, in that the regressors explain more of the variation in $Y$.
2. Comparing 2 models (a full and a reduced model), if the difference in $SS_E$ (equivalently $SS_{mod}$) is small then the reduced model may be considered adequate.
3. From 2 models we get two sets of estimates, $\hat Y_{full}$ & $\hat Y_{reduced}$, and the difference in $SS_{mod}$ is the squared distance between these 2 estimates.
For Examples 1 & 2 the difference in $SS_{mod}$ is
$$\sum_{i=1}^t n_i \bar y_i^2 - n\bar y^2 = \sum_{i=1}^t n_i(\bar y_i - \bar y)^2$$
the familiar treatment sum of squares in a one-way ANOVA.

Exercise: Prove the equality above.
We need to know whether the change in error SS is large relative to the error variance $\sigma^2$; the F tests below make this comparison precise.
Computer output

[Output listing not reproduced in these notes.]
R(· | ·) notation
To emphasize the role of the regressors in $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$ put:
$$R(\beta_0, \beta_1, \dots, \beta_k) = SS_{mod}$$
To see if the $x$'s affect $E(y)$, compare this model to the constant mean model $E(y) = \beta_0$, for which
$$R(\beta_0) = n\bar y^2 = \text{the correction for the mean (CM)}$$
The difference in $SS_{mod}$ for the 2 models is denoted by $R(\beta_1, \beta_2, \dots, \beta_k \mid \beta_0)$, i.e.
$$R(\beta_1, \beta_2, \dots, \beta_k \mid \beta_0) = R(\beta_0, \beta_1, \dots, \beta_k) - R(\beta_0)$$
R(· | ·) notation continued
When testing for a subset model obtained by omitting terms from a larger model, say testing
$$H_0: \beta_{r+1} = \beta_{r+2} = \dots = \beta_k = 0$$
where $E(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_r x_r + \beta_{r+1} x_{r+1} + \dots + \beta_k x_k$, the difference in model (error) sums of squares is
$$R(\beta_0, \beta_1, \dots, \beta_k) - R(\beta_0, \beta_1, \dots, \beta_r)$$
and is denoted by $R(\beta_{r+1}, \beta_{r+2}, \dots, \beta_k \mid \beta_0, \beta_1, \dots, \beta_r)$. i.e. for coefficients $\gamma$ and $\delta$,
$$R(\gamma \mid \delta) = R(\gamma, \delta) - R(\delta)$$
which measures the effect (decrease in $SS_E$) of adding $\gamma$ to a model already containing $\delta$.
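In practice $R(\gamma \mid \delta)$ is computed by fitting both models and differencing their model sums of squares. A sketch with invented data (the helper `ss_mod` is not from the notes):

```python
import numpy as np

def ss_mod(X, Y):
    """Model sum of squares Y'HY for design matrix X."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return Y @ H @ Y

rng = np.random.default_rng(1)
n = 30
x1, x2 = rng.normal(size=(2, n))
Y = 1 + 2 * x1 + 0.5 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])   # beta0, beta1, beta2
X_red  = np.column_stack([np.ones(n), x1])       # beta0, beta1

# R(beta2 | beta0, beta1) = R(beta0, beta1, beta2) - R(beta0, beta1)
print(ss_mod(X_full, Y) - ss_mod(X_red, Y))
```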
SSreg
$$Y^TY = SS_{mod} + SS_E = R(\beta_0, \beta_1, \beta_2, \dots, \beta_k) + SS_E = R(\beta_0) + R(\beta_1, \beta_2, \dots, \beta_k \mid \beta_0) + SS_E$$
i.e.
$$Y^TY = R(\beta_0) + SS_{reg} + SS_E$$
where
$$SS_{reg} = R(\beta_1, \beta_2, \dots, \beta_k \mid \beta_0)$$
$SS_{reg}$ is the amount by which the regressor variables reduce $SS_E$ compared to a constant mean model.
Since $R(\beta_0) = n\bar y^2 = \frac{1}{n}\left(\sum y_i\right)^2$ and $\sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2 = \sum (y_i - \bar y)^2$, we get a partition of the total corrected SS.
ANOVA tables
Source                SS                          df
Mean                  $Y^T\frac{1}{n}JY$          $1$
Regressors            $Y^T(H - \frac{1}{n}J)Y$    $p - 1$
Residual              $Y^T(I - H)Y$               $n - p$
Total (uncorrected)   $Y^TY$                      $n$
This is often presented as a decomposition of SSTo:

Source              SS                          df
Regressors          $Y^T(H - \frac{1}{n}J)Y$    $p - 1$
Residual            $Y^T(I - H)Y$               $n - p$
Total (corrected)   $Y^T(I - \frac{1}{n}J)Y$    $n - 1$
Mean squares (MS) and F values are given by, e.g.,
$$MS_{reg} = \frac{SS_{reg}}{p - 1}, \qquad MSE = \frac{SS_E}{n - p}, \qquad F = \frac{MS_{reg}}{MSE}$$
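The corrected-table entries can be computed directly from the projection matrices; a sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.8, -0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # projection onto C(X)
J = np.ones((n, n))

SSreg = Y @ (H - J / n) @ Y             # regressors row, df = p - 1
SSE   = Y @ (np.eye(n) - H) @ Y         # residual row,   df = n - p
MSreg, MSE = SSreg / (p - 1), SSE / (n - p)
print(MSreg / MSE)                      # F = MSreg / MSE
```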
E(SS)
A first step in evaluating the effect on $SS_E$ of enlarging the regression space from $C(X_0)$ to $C(X)$ is to calculate expected sums of squares such as $E(Y^T(H - H_0)Y)$. The following are obtained by applying the formula for $E(Y^TAY)$. In general:
$$E(Y^T(H - H_0)Y) = (p - p_0)\sigma^2 + \beta^TX^T(H - H_0)X\beta$$
so
(1) $E(SS_E) = (n - p)\sigma^2$
(2) $E(SS_{mod}) = p\sigma^2 + \beta^T(X^TX)\beta$
(3) $E(SS_{reg}) = (p - 1)\sigma^2 + \beta^TX^T(I - \frac{1}{n}J)X\beta$

Notes: (1) has been proved already. In (3), $\beta^TX^T(I - \frac{1}{n}J)X\beta$ does not involve the intercept and so is a quadratic form in the $k = p - 1$ regression coefficients $\beta_1, \dots, \beta_k$.
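As a reminder, all three results follow from the standard expectation formula for a quadratic form (with $E(Y) = X\beta$ and $\operatorname{Var}(Y) = \sigma^2 I$):
$$E(Y^TAY) = \sigma^2\operatorname{tr}(A) + \beta^TX^TAX\beta$$
For example, taking $A = I - H$: $\operatorname{tr}(I - H) = n - p$ and $(I - H)X = 0$, so $E(SS_E) = (n - p)\sigma^2$, which is (1).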
F tests
For the usual inferences we need to assume normal error terms, i.e. $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. $N(0, \sigma^2)$. If, e.g., $E(y)$ does not depend on the regressor variables, then $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ is true and $E(MS_{reg}) = \sigma^2$. If $H_0$ is false then $E(MS_{reg}) > \sigma^2$. Hence, to test $H_0$ use
$$F = \frac{MS_{reg}}{MSE}$$
rejecting $H_0$ if $F \gg 1$. A precise test is obtained knowing that the normal error assumption implies that, under $H_0$,
$$F \sim F(p - 1, n - p)$$
and the p-value for testing $H_0$ given an observed value $F_{obs}$ is
$$p = \Pr(F > F_{obs} \mid F \sim F(p - p_0, n - p))$$
Equivalently, at significance level $\alpha$, $H_0$ is rejected if $F_{obs} > F_{1-\alpha}(p - p_0, n - p)$, where $F_{1-\alpha}(\nu_1, \nu_2)$ is the $100(1-\alpha)$ percentage point of an F distribution with $(\nu_1, \nu_2)$ df.
Decision Rule: If $F \le F(1-\alpha;\, p - q,\, n - p)$, conclude $H_0$. If $F > F(1-\alpha;\, p - q,\, n - p)$, conclude $H_A$.

Example: Do we need a group of terms? For example, with $E(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_5 x_5$ we could test $H_0: \beta_3 = \beta_4 = \beta_5 = 0$. The F test has $(3, n - 6)$ df and
$$F = \frac{R(\beta_3, \beta_4, \beta_5 \mid \beta_0, \beta_1, \beta_2)/3}{MSE}$$
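A sketch of this group test in Python (data and true coefficients invented; scipy supplies the F reference distribution):

```python
import numpy as np
from scipy import stats

def ss_e(X, Y):
    """Residual sum of squares Y'(I - H)Y."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return Y @ (np.eye(len(Y)) - H) @ Y

rng = np.random.default_rng(3)
n = 40
x = rng.normal(size=(n, 5))                      # x1..x5
Y = 1 + x[:, 0] - x[:, 1] + rng.normal(size=n)   # x3, x4, x5 truly inert

X_full = np.column_stack([np.ones(n), x])         # all five regressors
X_red  = np.column_stack([np.ones(n), x[:, :2]])  # x1, x2 only

R = ss_e(X_red, Y) - ss_e(X_full, Y)   # R(b3, b4, b5 | b0, b1, b2)
F = (R / 3) / (ss_e(X_full, Y) / (n - 6))
print(F, stats.f.sf(F, 3, n - 6))      # F statistic and its p-value
```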
For the model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$ we can test
$$H_0: \beta_1 = \beta_2, \quad \beta_3 = 2\beta_4$$
The reduced model is
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_1 x_2 + \beta_3 x_3 + 0.5\beta_3 x_4 = \beta_0 + \beta_1(x_1 + x_2) + \beta_3(x_3 + 0.5x_4)$$
We can now perform a regression (with an intercept) on the variables $x_1 + x_2$ and $x_3 + 0.5x_4$.
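The construction of the transformed columns, as a sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
x1, x2, x3, x4 = rng.normal(size=(4, n))
Y = 2 + 1.5 * (x1 + x2) + 0.8 * (x3 + 0.5 * x4) + rng.normal(size=n)

# reduced model under H0: beta1 = beta2 and beta3 = 2*beta4,
# i.e. regress on the combined columns x1 + x2 and x3 + 0.5*x4
X_red = np.column_stack([np.ones(n), x1 + x2, x3 + 0.5 * x4])
beta_hat, *_ = np.linalg.lstsq(X_red, Y, rcond=None)
print(beta_hat)   # estimates of beta0, beta1 (= beta2), beta3 (beta4 = 0.5*beta3)
```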
Reduced models to consider

Two obvious hypotheses are:
1. Are the 2nd order terms required?
2. Is $x_2$ required? This is suggested by the fact that coefficients of terms involving $x_2$ are of borderline significance.

ANOVA tables for the full model and the 2 reduced models are obtained from the SAS procedure GLM.
Answering the questions

1. Comparing model $C(X_0)$ with $C(X)$ we can use any of:
- increase in $SS_{mod}$
- increase in $SS_{reg}$
- decrease in $SS_E$

Here the change is large, so there is strong evidence to include the 2nd order terms.

2. Terms in $x_2$:
$$F = \frac{(2643.34 - 2626.03)/(9 - 5)}{124.77/10} = \frac{4.3275}{12.477} = 0.347$$
$$\Pr(F \ge 0.347 \mid F \sim F(4, 10)) = 0.84$$
so there is no evidence that the terms involving $x_2$ are needed.
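The quoted F and p-value are easily checked (scipy):

```python
from scipy import stats

F_obs = (2643.34 - 2626.03) / (9 - 5) / (124.77 / 10)
print(F_obs)                     # ~0.347
print(stats.f.sf(F_obs, 4, 10))  # ~0.84
```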
[Figure omitted: the points marked are the experimental points in this designed experiment.]