Another reason we have not encouraged the use of the more limited Proc ANOVA is that it
cannot deal with random effects. For random and mixed models, one can use Proc GLM [or
the even more general Proc Mixed, which we will not cover in this class]. The expected mean squares
(EMS's) for the model effects are generated via the Random statement in Proc GLM. The Random
statement is used to designate those model effects that are random, and it must appear after the Model
statement. The generic syntax:
Random effects / options;
And an example:
Proc GLM;
Class A B C;
Model Response = A|B|C;
Random B A*B B*C / test;
With no specified options, the Random statement alone will produce a table of EMS's for each effect in
the model. When you include the "/ test" option, the Random statement tells SAS to determine the form
of the appropriate F test for each effect using an approach similar to the Satterthwaite approximation
(Topic 10.5). SAS will then generate F ratios (approximated, when necessary) and p-values for each test.
Very handy, needless to say.
It is now commonly accepted that an interaction should be treated as a random effect if any one of the
effects involved in the interaction is random. However, Proc GLM does not operate under this
presumption; therefore, it is your responsibility to explicitly designate main random effects and their
interactions as random using the Random statement.
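For example, if A is a fixed effect and B is a random effect, a Random statement that lists only B
(i.e. "Random B / test;") is incorrect
because it does not explicitly declare the A*B interaction to be random. The correct code in this case is: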
Proc GLM;
Class A B;
Model Response = A|B;
Random B A*B / test;
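To see why the declaration matters, here is a sketch of the expected mean squares for this two-way case. It assumes A is fixed with a levels, B is random, and each A*B combination has r replicates; r and the symbols below are notation introduced here for illustration, and the form shown is the unrestricted parameterization that Proc GLM is generally understood to use.

$$
\begin{aligned}
E[MSA] &= \sigma^2 + r\,\sigma^2_{AB} + Q(A) \\
E[MSB] &= \sigma^2 + r\,\sigma^2_{AB} + ra\,\sigma^2_{B} \\
E[MS(AB)] &= \sigma^2 + r\,\sigma^2_{AB} \\
E[MSE] &= \sigma^2
\end{aligned}
$$

Under these expectations, A and B are each tested against MS(AB) rather than MSE, since only that ratio isolates the term of interest. If A*B is left out of the Random statement, the interaction variance is no longer treated as a random term in the EMS for A, and the test of A falls back on MSE.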
In this example, we're using the data set from Lab 6 (Example 2), a three-way factorial CRD with one
replication per three-way combination of factors. As before, there are not enough degrees of freedom to
include the three-way interaction in the model. Unlike before, we are going to treat all three factors as
random effects. Recall the general linear model for this design:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \varepsilon_{ijk}$$
The table of expected mean squares (EMS) for this model is shown below:
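Source   EMS                                                                F
A        $\sigma^2 + c\sigma^2_{AB} + b\sigma^2_{AC} + bc\sigma^2_{A}$      $[MSA + MSE] / [MS(AB) + MS(AC)]$
B        $\sigma^2 + a\sigma^2_{BC} + c\sigma^2_{AB} + ac\sigma^2_{B}$      $[MSB + MSE] / [MS(AB) + MS(BC)]$
C        $\sigma^2 + b\sigma^2_{AC} + a\sigma^2_{BC} + ab\sigma^2_{C}$      $[MSC + MSE] / [MS(AC) + MS(BC)]$
AB       $\sigma^2 + c\sigma^2_{AB}$                                        $MS(AB) / MSE$
AC       $\sigma^2 + b\sigma^2_{AC}$                                        $MS(AC) / MSE$
BC       $\sigma^2 + a\sigma^2_{BC}$                                        $MS(BC) / MSE$
Error    $\sigma^2$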
With a = 3, b = 5, and c = 2.
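As a check on the F ratio for A, one can take expectations of its numerator and denominator. This is a sketch that uses only the EMS entries in the table above and takes the numerator as MSA + MSE and the denominator as MS(AB) + MS(AC):

$$
\begin{aligned}
E[MSA + MSE] &= 2\sigma^2 + c\,\sigma^2_{AB} + b\,\sigma^2_{AC} + bc\,\sigma^2_{A} \\
E[MS(AB) + MS(AC)] &= 2\sigma^2 + c\,\sigma^2_{AB} + b\,\sigma^2_{AC}
\end{aligned}
$$

The two expectations differ only by $bc\,\sigma^2_{A} = 10\,\sigma^2_{A}$ (with b = 5 and c = 2), so the ratio isolates the variance component for A. No single mean square in the table has the expectation $\sigma^2 + c\,\sigma^2_{AB} + b\,\sigma^2_{AC}$, which is why a combination of mean squares is needed for the main effects.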
Data Interac3;
Input A B C Response @@;
Cards;
1 1 1 61 2 1 1 38 3 1 1 81 1 1 2 31 2 1 2 27 3 1 2 113
1 2 1 39 2 2 1 61 3 2 1 49 1 2 2 68 2 2 2 103 3 2 2 143
1 3 1 121 2 3 1 82 3 3 1 41 1 3 2 78 2 3 2 57 3 3 2 63
1 4 1 79 2 4 1 68 3 4 1 59 1 4 2 122 2 4 2 127 3 4 2 167
1 5 1 91 2 5 1 31 3 5 1 61 1 5 2 92 2 5 2 43 3 5 2 128
;
Proc GLM Data = Interac3;
Class A B C;
Model Response = A|B|C@2;
Random A|B|C@2 / test;
* Performs approximate hypothesis tests (i.e. F tests) for each effect specified in
the model, using appropriate error terms as determined by the EMS's;
Run;
Quit;
The Output
                      Sum of
Source     DF        Squares     Mean Square    F Value    Pr > F
                                               !! WRONG !!
A           2    3599.266667     1799.633333     620.56    <.0001
B           4    6423.133333     1605.783333     553.72    <.0001
A*B         8    9675.066667     1209.383333     417.03    <.0001
C           1    5333.333333     5333.333333    1839.08    <.0001
A*C         2    5692.466667     2846.233333     981.46    <.0001
B*C         4    7987.000000     1996.750000     688.53    <.0001
NOTE: The above table incorrectly uses the MSE (2.9) as the denominator for all F tests!
This incorrect ANOVA table is immediately followed, however, by a table of EMS's that are needed to
construct the correct F tests. SAS does all this as a result of the Random statement in Proc GLM.
Notice how SAS uses the correct error terms to calculate the approximate F values in each case.
The specific error term is listed beneath each result and is based on the EMS table.
The non-integer error df's are approximated using Satterthwaite's method.
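To make the arithmetic concrete, the test for A can be sketched by hand in a Data step. This is only an illustration: the mean squares are copied from the ANOVA table above, MSE = 2.9 comes from the NOTE, the error df of 8 is inferred (29 total df minus 21 model df), and the data set and variable names are made up for this sketch. The ratio follows the F column of the EMS table; SAS may assemble its error term differently (for example, by subtracting MSE in the denominator rather than adding it to the numerator), so its printed F and df need not match these exactly.

```sas
* Sketch only: mean squares copied from the ANOVA table above;
* Error df = 8 is inferred: 29 total df minus 21 model df;
Data ApproxF;
MSA  = 1799.633333;  dfA  = 2;
MSAB = 1209.383333;  dfAB = 8;
MSAC = 2846.233333;  dfAC = 2;
MSE  = 2.9;          dfE  = 8;
Num = MSA + MSE;                * numerator from the EMS table;
Den = MSAB + MSAC;              * denominator from the EMS table;
F = Num / Den;
* Satterthwaite df for a sum of mean squares;
dfNum = Num**2 / (MSA**2/dfA + MSE**2/dfE);
dfDen = Den**2 / (MSAB**2/dfAB + MSAC**2/dfAC);
P = 1 - ProbF(F, dfNum, dfDen); * upper-tail p-value;
Run;
Proc Print Data = ApproxF;
Var F dfNum dfDen P;
Run;
```

With these inputs the ratio works out to roughly F = 0.44 on about 2.0 and 3.9 df, a very different picture from the F of 620.56 in the table that divides everything by MSE.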
In the following example, you will learn a method for generating a dataset in SAS [Disclaimer: We do not
recommend trying to get published using SAS-generated datasets, though it's probably happened]. This is
a nice little program for exploring how the EMS's change with different combinations of effects (random
and fixed).
Data EMSPlay;
Do A = 1 to 3;
Do B = 1 to 5;
Do C = 1 to 2;
Response = 4*RanNor(0)+15; * Picks RANdom numbers from a NORmal;
Output; * distribution with mean 15 and stdev 4;
End;
End;
End;
Proc Print;
Proc GLM Data = EMSPlay;
Class A B C;
Model Response = A|B|C@2;
Random A B A*B A*C B*C / test; * The model if C were a fixed effect;
Run;
Quit;
Now, since C is a fixed effect, we do not include it in the Random statement; but its interactions with
random factors are included (A*C and B*C). Notice how, in the EMS expressions, SAS designates the
fixed effect of C as "Q(C)" rather than as 15 Var(C). Don't be thrown by this. Since C is a fixed effect
(i.e. the levels of C were not randomly sampled from a normal population), it technically has no variance,
though the "fixed effect" of C has a computational form that is identical to that for variances, namely:
$$Q(C) = r \, \frac{\sum_{k=1}^{c} (\mu_{..k} - \mu_{...})^2}{c - 1}$$
This formula is analogous to all the MS formulae you've seen until now. You can determine the value of
the leading coefficient (r) in various ways:
The mean of each level of C is found by taking the average of 15 numbers (3 levels of A x 5 levels of B).
So, to express the variance on a per-observation basis, one must multiply the "variance" of C by 15:
$$r = a \cdot b = 3 \cdot 5 = 15$$
By the way, this is the same coefficient needed to calculate MSC by hand:
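$$MSC = a\,b\,\frac{\sum_{k=1}^{c} (\bar{Y}_{..k} - \bar{Y}_{...})^2}{c - 1} = 15 \cdot \frac{(64.1\overline{3} - 77.4\overline{6})^2 + (90.8 - 77.4\overline{6})^2}{2 - 1} = 5333.3\overline{3}$$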
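The level means and grand mean used in this hand calculation can be checked directly in SAS. The following is a minimal sketch, assuming the Interac3 data set from the earlier example is still available in the session:

```sas
* Sketch: level means of C and the grand mean for the Interac3 data set;
Proc Means Data = Interac3 N Mean;
Class C;              * one row per level of C;
Var Response;
Run;
Proc Means Data = Interac3 N Mean;
Var Response;         * no Class statement, so this is the grand mean;
Run;
```

Each level of C should show N = 15, and the means should reproduce the 64.13, 90.8, and 77.47 that enter the MSC formula.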
The following example demonstrates the effect an unbalanced design has on the computation of means
and sums of squares (i.e. we're finally going to look at the difference between Types I and III SS!):
Data SSBonanza;
Input A B Response @@;
Cards;
1 1 5 1 1 6
1 2 2 1 2 3 1 2 5 1 2 6 1 2 7
1 3 3
2 1 2 2 1 3
2 2 8 2 2 8 2 2 9
2 3 4 2 3 4 2 3 6 2 3 6 2 3 7
;
Proc GLM Data = SSBonanza;
Class A B;
Model Response = A|B / SS1 SS2 SS3 SS4; * Tells SAS to generate SS Types I-IV;
Means A B; * Generates normal arithmetic means;
LSMeans A B / pdiff lines; * Generates least-square means. The PDIFF option
requests p-values for all pairwise comparisons
with H0: LSi = LSj but no error control!;
LSMeans A B / pdiff Adjust = Tukey lines;* Controls error. Can also do Dunnett, etc;
Run;
Quit;
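Type I sums of squares are sequential: each term is adjusted only for the terms listed before it in the Model statement. Type III sums of squares adjust each term for all other terms in the model. With unbalanced data, one way to see the difference is to rerun the model with the main effects in the opposite order, as in this sketch (the reordered Model statement is an illustration, not part of the original lab code):

```sas
* Sketch: same model as above, but with B entered before A;
Proc GLM Data = SSBonanza;
Class A B;
Model Response = B A A*B / SS1 SS3;
Run;
Quit;
```

Comparing this output with the original run, the Type I SS for A and B should change with the order of entry, while the Type III SS should not.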
Output
Sum of
Source DF Squares Mean Square F Value Pr > F
Output of Means
Level of          -----------Response----------
A           N            Mean         Std Dev
1           8      4.62500000      1.76776695
2          10      5.70000000      2.35937845

Level of          -----------Response----------
B           N            Mean         Std Dev
1           4      4.00000000      1.82574186
2           8      6.00000000      2.50713268
3           6      5.00000000      1.54919334
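The arithmetic means above weight every observation equally, so cells with more observations pull the mean toward themselves. For this particular model (full factorial, all six A*B cells observed), the least-squares means reported by the LSMeans statement can be reproduced as unweighted averages of the cell means. A minimal sketch, in which the CellMeans data set and CellMean variable are names invented for illustration:

```sas
* Sketch: unweighted averages of the A*B cell means;
Proc Means Data = SSBonanza NoPrint NWay;
Class A B;
Var Response;
Output Out = CellMeans Mean = CellMean;   * one row per A*B cell;
Run;
Proc Means Data = CellMeans Mean;
Class A;
Var CellMean;         * averages the three cell means within each level of A;
Run;
```

For A = 1, the three cell means are 5.5, 4.6, and 3.0, so the unweighted average is (5.5 + 4.6 + 3.0)/3 = 4.3667, compared with the arithmetic mean of 37/8 = 4.625. Replacing "Class A" with "Class B" in the second step gives the corresponding averages for B.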
                          H0:LSMean1=LSMean2
A    Response LSMEAN          Pr > |t|
1        4.36666667            0.2227  NS
2        5.41111111

B    Response LSMEAN    LSMEAN Number
1        4.00000000          1
2        6.46666667          2
3        4.20000000          3

i/j         1             2             3
1                     0.0192 *      0.8579 NS
2        0.0192                     0.0376 *
3        0.8579       0.0376

B    4.2000000    3    3
B    4.0000000    1    1
NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons
should be used.
As highlighted above, SAS offers the good advice to use p-values for planned comparisons only, because
these particular pairwise comparisons are being made without any control for MEER. In other words, it
is best to perform as few of these comparisons as absolutely necessary in order to keep the EER to a
minimum.
Output of LSMeans and p-values for protected (Tukey-like) pairwise comparisons
Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer
                          H0:LSMean1=LSMean2
A    Response LSMEAN          Pr > |t|
1        4.36666667            0.2227
2        5.41111111

B    Response LSMEAN    LSMEAN Number
1        4.00000000          1
2        6.46666667          2
3        4.20000000          3

i/j         1             2             3
1                     0.0470 *      0.9817 NS
2        0.0470                     0.0887 NS
3        0.9817       0.0887
Using the more conservative Tukey-Kramer test, which controls MEER, we lose significance between
levels 2 and 3 of factor B (p = 0.0887, versus 0.0376 without adjustment).
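The comment in the program notes that other adjustments, such as Dunnett, are also available. A minimal sketch of what that could look like for factor B follows; treating level 1 as the control is an assumption made purely for illustration.

```sas
* Sketch: Dunnett comparisons of each level of B against level 1;
* Treating level 1 as the control is an assumption for illustration;
Proc GLM Data = SSBonanza;
Class A B;
Model Response = A|B;
LSMeans B / pdiff = control('1') Adjust = Dunnett;
Run;
Quit;
```

Because Dunnett's method only compares each level with the control, it makes fewer comparisons than the all-pairs Tukey-Kramer adjustment.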
For unbalanced mixed models with crossed factors, it is necessary to use a different SAS
procedure called Proc Mixed (ST&D 411) that will not be covered in this class. The syntax is
similar to Proc GLM, but the output is substantially more complex. Information about Proc
Mixed is available at:
https://jukebox.ucdavis.edu/slc/sasdocs/sashtml/stat/chap41/index.htm