
PLS205 Lab 7 February 20, 2014

Laboratory Topics 10 & 11

Random models (EMS tables by hand and in SAS)


Generating random datasets to explore EMS tables
Unbalanced designs and SS types I - IV

Random Models in SAS

Another reason we have not encouraged the use of the more limited Proc ANOVA is that it cannot
handle random effects. For random and mixed models, one can use Proc GLM [or
the even more general Proc Mixed, which we will not cover in this class]. The expected mean squares
(EMS's) for the model effects are generated via the Random statement in Proc GLM. The Random
statement is used to designate those model effects that are random, and it must appear after the Model
statement. The generic syntax:

Random effects / test;

And an example:
Proc GLM;
Class A B C;
Model Response = A|B|C;
Random B A*B B*C A*B*C / test; * Every interaction that involves the random factor B is listed;

With no specified options, the Random statement alone will produce a table of EMS's for each effect in
the model. When you include the "/ test" option, the Random statement tells SAS to determine the form
of the appropriate F test for each effect using an approach similar to the Satterthwaite approximation
(Topic 10.5). SAS will then generate F ratios (approximated, when necessary) and p-values for each test.
Very handy, needless to say.

It is now commonly accepted that an interaction should be treated as a random effect if any one of the
effects involved in the interaction is random. However, Proc GLM does not operate under this
presumption; therefore, it is your responsibility to explicitly designate main random effects and their
interactions as random using the Random statement.

In other words, if B is a random effect, the following code is incorrect:


Proc GLM;
Class A B;
Model Response = A|B;
Random B / test;

because it does not explicitly declare the A*B interaction to be random. The correct code in this case is:
Proc GLM;
Class A B;
Model Response = A|B;
Random B A*B / test;

PLS205 2014 1 Lab Topics 10-11


Example 7.1 Random effects model [Lab7ex1.sas]

In this example, we're using the data set from Lab 6 (Example 2), a three-way factorial CRD with one
replication per three-way combination of factors. As before, there are not enough degrees of freedom to
include the three-way interaction in the model. Not as before, we are going to treat all three factors as
random effects. Recall the general linear model for this design:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \varepsilon_{ijk}$$

The table of expected mean squares (EMS) for this model is shown below:

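Source   Expected Mean Squares                                                   F

A        $\sigma^2 + c\,\sigma^2_{AB} + b\,\sigma^2_{AC} + bc\,\sigma^2_{A}$     $(MSA + MSE) / (MS(AB) + MS(AC))$

B        $\sigma^2 + a\,\sigma^2_{BC} + c\,\sigma^2_{AB} + ac\,\sigma^2_{B}$     $(MSB + MSE) / (MS(AB) + MS(BC))$

C        $\sigma^2 + b\,\sigma^2_{AC} + a\,\sigma^2_{BC} + ab\,\sigma^2_{C}$     $(MSC + MSE) / (MS(AC) + MS(BC))$

AB       $\sigma^2 + c\,\sigma^2_{AB}$                                           $MS(AB) / MSE$

AC       $\sigma^2 + b\,\sigma^2_{AC}$                                           $MS(AC) / MSE$

BC       $\sigma^2 + a\,\sigma^2_{BC}$                                           $MS(BC) / MSE$

Error    $\sigma^2$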

With a = 3, b = 5, and c = 2.
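
To see why the main-effect tests need a combination of mean squares, it can help to add up the expectations on each side of the ratio. The lines below are a sketch for factor A only, using the same variance-component symbols as the EMS table; the other two main effects follow the same pattern.

```latex
\begin{aligned}
E[MSA] + E[MSE]       &= 2\sigma^2 + c\,\sigma^2_{AB} + b\,\sigma^2_{AC} + bc\,\sigma^2_{A} \\
E[MS(AB)] + E[MS(AC)] &= 2\sigma^2 + c\,\sigma^2_{AB} + b\,\sigma^2_{AC} \\[4pt]
\text{with } a = 3,\ b = 5,\ c = 2:\quad
E[MSA] &= \sigma^2 + 2\,\sigma^2_{AB} + 5\,\sigma^2_{AC} + 10\,\sigma^2_{A}
\end{aligned}
```

The two sums differ only by the term in $\sigma^2_{A}$, so their ratio is expected to be near 1 when $\sigma^2_{A} = 0$. SAS reaches the same goal by leaving MSA alone in the numerator and using MS(A*B) + MS(A*C) - MS(Error) as the denominator, as its output for this example shows.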



The corresponding SAS code

Data Interac3;
Input A B C Response @@;
Cards;
1 1 1 61 2 1 1 38 3 1 1 81 1 1 2 31 2 1 2 27 3 1 2 113
1 2 1 39 2 2 1 61 3 2 1 49 1 2 2 68 2 2 2 103 3 2 2 143
1 3 1 121 2 3 1 82 3 3 1 41 1 3 2 78 2 3 2 57 3 3 2 63
1 4 1 79 2 4 1 68 3 4 1 59 1 4 2 122 2 4 2 127 3 4 2 167
1 5 1 91 2 5 1 31 3 5 1 61 1 5 2 92 2 5 2 43 3 5 2 128
;
Proc GLM Data = Interac3;
Class A B C;
Model Response = A|B|C@2;
Random A|B|C@2 / test;
* Performs approximate hypothesis tests (i.e. F tests) for each effect specified in
the model, using appropriate error terms as determined by the expected mean squares;
Run;
Quit;

The Output
Sum of
Source DF Squares Mean Square F Value Pr > F

Model 21 38710.26667 1843.34603 635.64 <.0001


Error 8 23.20000 2.90000
Corrected Total 29 38733.46667

R-Square Coeff Var Root MSE Response Mean

0.999401 2.198286 1.702939 77.46667

Source DF Type III SS Mean Square F Value Pr > F

!! WRONG !!
A 2 3599.266667 1799.633333 620.56 <.0001
B 4 6423.133333 1605.783333 553.72 <.0001
A*B 8 9675.066667 1209.383333 417.03 <.0001
C 1 5333.333333 5333.333333 1839.08 <.0001
A*C 2 5692.466667 2846.233333 981.46 <.0001
B*C 4 7987.000000 1996.750000 688.53 <.0001

NOTE: The above table incorrectly uses the MSE (2.9) as the denominator for all F tests!

This incorrect ANOVA table is immediately followed, however, by a table of EMS's that are needed to
construct the correct F tests. SAS does all this as a result of the Random statement in Proc GLM.

Source Type III Expected Mean Square

A Var(Error) + 5 Var(A*C) + 2 Var(A*B) + 10 Var(A)


B Var(Error) + 3 Var(B*C) + 2 Var(A*B) + 6 Var(B)
A*B Var(Error) + 2 Var(A*B)
C Var(Error) + 3 Var(B*C) + 5 Var(A*C) + 15 Var(C)
A*C Var(Error) + 5 Var(A*C)
B*C Var(Error) + 3 Var(B*C)

SAS uses these EMS's to carry out approximate F tests:

Source DF Type III SS Mean Square F Value Pr > F


A 2 3599.266667 1799.633333 0.44 0.6704
Error 3.8798 15724 4052.716667
Error: MS(A*B) + MS(A*C) - MS(Error)



Source DF Type III SS Mean Square F Value Pr > F
B 4 6423.133333 1605.783333 0.50 0.7362
Error 8.6986 27864 3203.233333
Error: MS(A*B) + MS(B*C) - MS(Error)

Source DF Type III SS Mean Square F Value Pr > F


A*B 8 9675.066667 1209.383333 417.03 <.0001
A*C 2 5692.466667 2846.233333 981.46 <.0001
B*C 4 7987.000000 1996.750000 688.53 <.0001
Error: MS(Error) 8 23.200000 2.900000

Source DF Type III SS Mean Square F Value Pr > F


C 1 5333.333333 5333.333333 1.10 0.3454
Error 4.6414 22465 4840.083333
Error: MS(A*C) + MS(B*C) - MS(Error)

Notice how SAS uses the correct error terms to calculate the approximate F values in each case.
The specific error term is listed beneath each result and is based on the EMS table.
The non-integer error df's are approximated using Satterthwaite's method.
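
The error term and its non-integer df for factor A can be rebuilt by hand from the mean squares printed above. The Data step below is only a sketch of that arithmetic: the dataset name SatterA and the variable names are made up for illustration, and the formula is the usual Satterthwaite approximation for a linear combination of mean squares, which may differ in detail from what Proc GLM does internally.

```sas
* Sketch: rebuild the synthetic error term for A from the mean squares printed above;
Data SatterA;
   MSA  = 1799.633333;  dfA  = 2;
   MSAB = 1209.383333;  dfAB = 8;
   MSAC = 2846.233333;  dfAC = 2;
   MSE  = 2.9;          dfE  = 8;
   ErrMS = MSAB + MSAC - MSE;              * synthetic denominator mean square;
   ErrDF = ErrMS**2 /
           (MSAB**2/dfAB + MSAC**2/dfAC + MSE**2/dfE);   * Satterthwaite df;
   F = MSA / ErrMS;
   P = 1 - ProbF(F, dfA, ErrDF);           * upper-tail p-value;
Run;
Proc Print Data = SatterA;
   Var ErrMS ErrDF F P;
Run;
```

ErrMS and ErrDF should come out close to the 4052.72 and 3.88 reported for the A test, and F and P can be compared with the values in the same table.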

Generating Random Datasets in SAS

In the following example, you will learn a method for generating a dataset in SAS [Disclaimer: We do not
recommend trying to get published using SAS-generated datasets, though it's probably happened]. This is
a nice little program for exploring how the EMS's change with different combinations of effects (random
and fixed).

Example 7.2 [Lab7ex2.sas]

Data EMSPlay;
Do A = 1 to 3;
Do B = 1 to 5;
Do C = 1 to 2;
Response = 4*RanNor(0)+15; * Picks RANdom numbers from a NORmal;
Output; * distribution with mean 15 and stdev 4;
End;
End;
End;
Proc Print;
Proc GLM Data = EMSPlay;
Class A B C;
Model Response = A|B|C@2;
Random A B A*B A*C B*C / test; * The model if C were a fixed effect;
Run;
Quit;



The EMS table
Source Type III Expected Mean Square

A Var(Error) + 5 Var(A*C) + 2 Var(A*B) + 10 Var(A)


B Var(Error) + 3 Var(B*C) + 2 Var(A*B) + 6 Var(B)
A*B Var(Error) + 2 Var(A*B)
C Var(Error) + 3 Var(B*C) + 5 Var(A*C) + Q(C)
A*C Var(Error) + 5 Var(A*C)
B*C Var(Error) + 3 Var(B*C)

Now, since C is a fixed effect, we do not include it in the Random statement; but its interactions with
random factors are included (A*C and B*C). Notice how, in the EMS expressions, SAS designates the
fixed effect of C as "Q(C)" rather than as 15 Var(C). Don't be thrown by this. Since C is a fixed effect
(i.e. the levels of C were not randomly sampled from a normal population), it technically has no variance,
though the "fixed effect" of C has a computational form that is identical to that for variances, namely:

$$Q(C) = r\,\frac{\sum_{k=1}^{c} (\mu_{..k} - \mu_{...})^2}{c - 1}$$

This formula is analogous to all the MS formulae you've seen until now. You can determine the value of
the leading coefficient (r) in various ways:

1. Just thinking about it

The mean of each level of C is found by taking the average of 15 numbers (3 levels of A, 5 levels of B).
So, to express the variance on a per-observation basis, one must multiply the "variance" of C by 15.

2. By hand (referencing an EMS table, like the one in example 10.1)

$$r = a \cdot b = 15$$

By the way, this is the same coefficient needed to calculate MSC by hand:

$$MSC = a\,b\,\frac{\sum_{k=1}^{c} (\bar{Y}_{..k} - \bar{Y}_{...})^2}{c - 1} = 15\,\frac{(64.1\overline{3} - 77.4\overline{6})^2 + (90.8 - 77.4\overline{6})^2}{2 - 1} = 5333.\overline{3}$$

Notice this exactly matches MSC in the ANOVA table.
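
Returning to the simulation in Example 7.2: every value there is drawn from a single normal distribution, so no effect other than error has any real variance. A variant that builds in a genuine random effect can make the EMS table easier to relate to the F tests. The code below is a hypothetical sketch, not part of the original lab files: the dataset name EMSPlay2, the seed 205, and the standard deviation of 3 for the B effect are all arbitrary choices.

```sas
* Hypothetical variant of Example 7.2 with B simulated as a true random effect;
Data EMSPlay2;
   Call StreamInit(205);                  * arbitrary seed so reruns match;
   Do B = 1 to 5;
      EffB = Rand('Normal', 0, 3);        * one random deviation per level of B;
      Do A = 1 to 3;
         Do C = 1 to 2;
            Response = 15 + EffB + Rand('Normal', 0, 4);
            Output;
         End;
      End;
   End;
   Drop EffB;
Run;
Proc GLM Data = EMSPlay2;
   Class A B C;
   Model Response = A|B|C@2;
   Random B A*B B*C / test;               * A and C fixed, B random;
Run;
Quit;
```

Changing which effects appear in the Random statement, and rerunning, shows which terms switch between Var( ) and Q( ) in the EMS table.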



Unbalanced Designs

The following example demonstrates the effect an unbalanced design has on the computation of means
and sums of squares (i.e. we're finally going to look at the difference between Types I and III SS!):

Example 7.3 [Lab7ex3.sas]

Data SSBonanza;
Input A B Response @@;
Cards;
1 1 5 1 1 6
1 2 2 1 2 3 1 2 5 1 2 6 1 2 7
1 3 3

2 1 2 2 1 3
2 2 8 2 2 8 2 2 9
2 3 4 2 3 4 2 3 6 2 3 6 2 3 7
;
Proc GLM Data = SSBonanza;
Class A B;
Model Response = A|B / SS1 SS2 SS3 SS4; * Tells SAS to generate SS Types I-IV;
Means A B; * Generates normal arithmetic means;
LSMeans A B / pdiff lines; * Generates least-square means. The PDIFF option
requests p-values for all pairwise comparisons
with H0: LSi = LSj but no error control!;
LSMeans A B / pdiff Adjust = Tukey lines;* Controls error. Can also do Dunnett, etc;
Run;
Quit;

Output
Sum of
Source DF Squares Mean Square F Value Pr > F

Model 5 51.04444444 10.20888889 4.70 0.0131


Error 12 26.06666667 2.17222222
Corrected Total 17 77.11111111

R-Square Coeff Var Root MSE Response Mean

0.661960 28.22258 1.473846 5.222222

Source DF Type I SS Mean Square F Value Pr > F

A 1 5.13611111 5.13611111 2.36 0.1501


B 2 15.68286517 7.84143258 3.61 0.0592
A*B 2 30.22546816 15.11273408 6.96 0.0099

Source DF Type II SS Mean Square F Value Pr > F

A 1 9.70786517 9.70786517 4.47 0.0561


B 2 15.68286517 7.84143258 3.61 0.0592
A*B 2 30.22546816 15.11273408 6.96 0.0099

Source DF Type III SS Mean Square F Value Pr > F

A 1 3.59186992 3.59186992 1.65 0.2227


B 2 21.00074906 10.50037453 4.83 0.0289
A*B 2 30.22546816 15.11273408 6.96 0.0099

Source DF Type IV SS Mean Square F Value Pr > F

A 1 3.59186992 3.59186992 1.65 0.2227


B 2 21.00074906 10.50037453 4.83 0.0289
A*B 2 30.22546816 15.11273408 6.96 0.0099



Wow! Notice how the interaction SS is the same in the Type I and Type III analyses (in fact, it is the same
in all four). This is because the A*B interaction was included as the last term in the model, so the SS
assigned to it was whatever was left after the main effects of A and B were accounted for. In this case,
the Type III SS does a better job of minimizing the overlap due to the broken orthogonality of the
treatments by treating each effect (A, B, and their interaction) individually as the last effect in the model.
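
One way to keep the SS types straight is to write out what each effect is adjusted for. The block below is a sketch in the common "SS(effect | terms already in the model)" shorthand, which is notation assumed here rather than taken from the lab; the numbers are copied from the output above.

```latex
\begin{array}{lccc}
\text{Effect} & \text{Type I} & \text{Type II} & \text{Type III} \\
A   & SS(A)             & SS(A \mid B)      & SS(A \mid B, AB) \\
B   & SS(B \mid A)      & SS(B \mid A)      & SS(B \mid A, AB) \\
AB  & SS(AB \mid A, B)  & SS(AB \mid A, B)  & SS(AB \mid A, B)
\end{array}

\begin{aligned}
\text{Type I:}\quad   & 5.136 + 15.683 + 30.225 \approx 51.044 = SS_{\text{Model}} \\
\text{Type III:}\quad & 3.592 + 21.001 + 30.225 \approx 54.818 \neq SS_{\text{Model}}
\end{aligned}
```

Only the sequential (Type I) SS add up to the Model SS when the design is unbalanced. Types III and IV agree here because every A*B cell contains at least one observation.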

Output of Means
Level of -----------Response----------
A N Mean Std Dev

1 8 4.62500000 1.76776695
2 10 5.70000000 2.35937845

Level of -----------Response----------
B N Mean Std Dev

1 4 4.00000000 1.82574186
2 8 6.00000000 2.50713268
3 6 5.00000000 1.54919334
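
The arithmetic means above weight each A*B cell by its sample size, whereas the least-squares means reported next give every cell equal weight. For A = 1 the cell means are 5.5, 4.6, and 3.0, so the arithmetic mean is 37/8 = 4.625 while the unweighted average is (5.5 + 4.6 + 3.0)/3 = 4.3667. The code below is a sketch of that calculation, assuming the SSBonanza dataset from this example; CellMeans and CellMean are illustrative names. The shortcut of averaging cell means applies to this model (both factors plus their interaction, no empty cells) and is not a general definition of LS means.

```sas
* Sketch: LS means as unweighted averages of the A-by-B cell means;
Proc Means Data = SSBonanza NoPrint NWay;
   Class A B;
   Var Response;
   Output Out = CellMeans Mean = CellMean;   * one row per A*B cell;
Run;
Proc Means Data = CellMeans Mean;
   Class A;
   Var CellMean;                             * averages three cell means per level of A;
Run;
Proc Means Data = CellMeans Mean;
   Class B;
   Var CellMean;                             * averages two cell means per level of B;
Run;
```

The printed means should agree with the LSMEAN columns in the output that follows (4.3667 and 5.4111 for A; 4.0, 6.4667, and 4.2 for B).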

Output of LSMeans and p-values for unprotected (LSD-like) pair-wise comparisons


Least Squares Means

H0:LSMean1=
Response LSMean2
A LSMEAN Pr > |t|

1 4.36666667 0.2227 NS
2 5.41111111

Response LSMEAN
B LSMEAN Number

1 4.00000000 1
2 6.46666667 2
3 4.20000000 3

Least Squares Means for effect B


Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: Response

i/j 1 2 3

1 0.0192 * 0.8579 NS
2 0.0192 0.0376 *
3 0.8579 0.0376



T Comparison Lines for Least Squares Means of B
LS-means with the same
letter are not significantly
different.
Response LSMEAN B LSMEAN Number
A 6.4666667 2 2

B 4.2000000 3 3
B 4.0000000 1 1

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons
should be used.

As highlighted above, SAS offers the good advice to use p-values for planned comparisons only because
these particular pair-wise comparisons are being made without any control for MEER. In other words, it
is best to perform as few of these comparisons as absolutely necessary in order to keep the EER to a
minimum.

Output of LSMeans and p-values for protected (Tukey-like) pairwise comparisons

Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

H0:LSMean1=
Response LSMean2
A LSMEAN Pr > |t|

1 4.36666667 0.2227
2 5.41111111

Least Squares Means


Adjustment for Multiple Comparisons: Tukey-Kramer

Response LSMEAN
B LSMEAN Number

1 4.00000000 1
2 6.46666667 2
3 4.20000000 3

Least Squares Means for effect B


Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: Response

i/j 1 2 3

1 0.0470 * 0.9817 NS
2 0.0470 0.0887 NS
3 0.9817 0.0887



Tukey-Kramer Comparison Lines for Least Squares Means of B
LS-means with the same letter
are not significantly different.
Response LSMEAN B LSMEAN Number
A 6.4666667 2 2
B A 4.2000000 3 3
B 4.0000000 1 1

Using the more conservative Tukey-Kramer test which controls MEER, we lose significance between
levels 2 and 3 of factor B.

The Take Home Message


For unbalanced designs, use LS Means and Type III SS.

For unbalanced mixed models with crossed factors, it is necessary to use a different SAS
procedure called Proc Mixed (ST&D 411) that will not be covered in this class. The syntax is
similar to Proc GLM, but the output is substantially more complex. Information about Proc
Mixed is available at:
https://jukebox.ucdavis.edu/slc/sasdocs/sashtml/stat/chap41/index.htm

