
Analysis of Variance

Introduction
Analysis of Variance
The Analysis of Variance is abbreviated as
ANOVA
Used for hypothesis testing in
Simple Regression
Multiple Regression
Comparison of Means
Sources
There is variation any time the data
values are not all identical
This variation can come from different
sources such as the model or the factor
There is always left-over variation that
can't be explained by any of the other
sources. This source is called the error
Variation
Variation is the sum of squares of the
deviations of the values from the mean of
those values
As long as the values are not identical,
there will be variation
Abbreviated as SS for Sum of Squares
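As a minimal sketch of the definition above, the variation can be computed directly as a sum of squared deviations from the mean. The function name and the data values are hypothetical, chosen only for illustration.

```python
def sum_of_squares(values):
    """Return the variation (SS): the sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

# Hypothetical data, not taken from the slides.
data = [2, 4, 4, 6]
ss = sum_of_squares(data)  # mean is 4; deviations are -2, 0, 0, 2

# Identical values produce no variation at all.
ss_identical = sum_of_squares([5, 5, 5])

print(ss, ss_identical)
```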
Degrees of Freedom
The degrees of freedom are the number
of values that are free to vary once certain
parameters have been established
Usually, this is one less than the sample
size, but in general, it's the number of
values minus the number of parameters
being estimated
Abbreviated as df
Variance
The sample variance is the average
squared deviation from the mean
Found by dividing the variation by the
degrees of freedom
Variance = Variation / df
Abbreviated as MS for Mean Square
MS = SS / df
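A short sketch of MS = SS / df, assuming a single sample so that df is one less than the sample size. The data values are hypothetical; the standard library's statistics.variance is used only as a cross-check.

```python
import statistics

# Hypothetical data, not taken from the slides.
data = [2, 4, 4, 6]

mean = sum(data) / len(data)
ss = sum((x - mean) ** 2 for x in data)  # variation (SS)
df = len(data) - 1                       # one parameter (the mean) was estimated
ms = ss / df                             # variance (MS)

# The sample variance from the standard library should agree with MS.
print(ms, statistics.variance(data))
```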
F
F is the F test statistic
There will be an F test statistic for each
source except for the error and total
F is the ratio of two sample variances
The MS column contains variances
The F test statistic for each source is the
MS for that row divided by the MS of the
error row
F
F requires a pair of degrees of freedom,
one for the numerator and one for the
denominator
The numerator df is the df for the source
The denominator df is the df for the error
row
The F test is always a right-tailed test
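The ratio described above can be sketched as a small helper that returns F together with its pair of degrees of freedom. The function name and the numbers are hypothetical. Finding the right-tail p-value needs the F distribution, which a statistics library (for example scipy.stats.f.sf in SciPy) could supply; that step is not shown here.

```python
def f_statistic(ms_source, df_source, ms_error, df_error):
    """Return (F, numerator df, denominator df) for one source of variation."""
    f = ms_source / ms_error        # ratio of two sample variances
    return f, df_source, df_error   # numerator df first, then denominator df

# Hypothetical values, not taken from the slides.
f, df_num, df_den = f_statistic(ms_source=12.0, df_source=2, ms_error=4.0, df_error=10)
print(f, df_num, df_den)
```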
The ANOVA Table
The ANOVA table is composed of rows;
each row represents one source of
variation
For each source of variation
The variation is in the SS column
The degrees of freedom are in the df column
The variance is in the MS column
The MS value is found by dividing the SS by
the df
ANOVA Table
The complete ANOVA table can be
generated by most statistical packages
and spreadsheets
We'll concentrate on understanding how
the table works rather than the formulas
for the variations
The ANOVA Table
Source        SS             df    MS            F
              (variation)          (variance)

Explained*

Error

Total

The explained* variation has different names depending on the particular type
of ANOVA problem
Example 1

Source       SS       df    MS       F
Explained    18.9     3
Error        72.0     16
Total

The Sum of Squares and Degrees of Freedom are given. Complete the table.
Example 1 Find Totals

Source       SS       df    MS       F
Explained    18.9     3
Error        72.0     16
Total        90.9     19

Add the SS and df columns to get the totals.


Example 1 Find MS

Source       SS       df    MS                  F
Explained    18.9     3     18.9 / 3 = 6.30
Error        72.0     16    72.0 / 16 = 4.50
Total        90.9     19    90.9 / 19 = 4.78

Divide SS by df to get MS.


Example 1 Find F

Source       SS       df    MS       F
Explained    18.9     3     6.30     1.40
Error        72.0     16    4.50
Total        90.9     19    4.78

F = MS(Explained) / MS(Error) = 6.30 / 4.50 = 1.40
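The three steps of Example 1 (totals, MS, then F) can be sketched in a few lines. The function name and dictionary keys are hypothetical; the inputs are the SS and df values given in the example.

```python
def complete_anova_table(ss_explained, df_explained, ss_error, df_error):
    """Fill in the totals, the MS column, and F from the given SS and df values."""
    ss_total = ss_explained + ss_error       # add the SS column
    df_total = df_explained + df_error       # add the df column
    ms_explained = ss_explained / df_explained
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total,
        "df_total": df_total,
        "ms_explained": ms_explained,
        "ms_error": ms_error,
        "ms_total": ss_total / df_total,     # sample variance of the response
        "f": ms_explained / ms_error,        # MS of the source over MS of the error
    }

table = complete_anova_table(18.9, 3, 72.0, 16)
for name, value in table.items():
    print(f"{name}: {value:.2f}")
```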


Notes about the ANOVA
The MS(Total) isn't actually part of the
ANOVA table, but it represents the sample
variance of the response variable, so it's
useful to find
The total df is one less than the sample
size
You would need to find either a critical F
value or the p-value to finish the
hypothesis test
Example 2

Source       SS       df    MS       F
Explained    106.6          21.32    2.60
Error                 26
Total

Complete the table


Example 2 Step 1

Source       SS       df    MS       F
Explained    106.6    5     21.32    2.60
Error                 26    8.20
Total

SS / df = MS, so 106.6 / df = 21.32. Solving for df gives df = 5.


F = MS(Source) / MS(Error), so 2.60 = 21.32 / MS(Error). Solving gives MS(Error) = 8.20.
Example 2 Step 2

Source       SS       df    MS       F
Explained    106.6    5     21.32    2.60
Error        213.2    26    8.20
Total                 31

SS / df = MS, so SS / 26 = 8.20. Solving for SS gives SS = 213.2.


The total df is the sum of the other df, so 5 + 26 = 31.
Example 2 Step 3

Source       SS       df    MS       F
Explained    106.6    5     21.32    2.60
Error        213.2    26    8.20
Total        319.8    31

Find the total SS by adding the other SS values: 106.6 + 213.2 = 319.8


Example 2 Step 4

Source       SS       df    MS       F
Explained    106.6    5     21.32    2.60
Error        213.2    26    8.20
Total        319.8    31    10.32

Find the MS(Total) by dividing SS by df. 319.8 / 31 = 10.32
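Example 2 works backwards from the known cells, and the same chain of steps can be sketched in code. The variable names are hypothetical; the four starting values are the ones given in the example.

```python
# Known cells from Example 2.
ss_explained = 106.6
ms_explained = 21.32
f = 2.60
df_error = 26

# Step 1: SS / df = MS, so df = SS / MS; F = MS(Source) / MS(Error), so MS(Error) = MS(Source) / F.
df_explained = round(ss_explained / ms_explained)  # df is a whole number
ms_error = ms_explained / f

# Step 2: SS = MS * df for the error row; the total df is the sum of the other df.
ss_error = ms_error * df_error
df_total = df_explained + df_error

# Step 3: the total SS is the sum of the other SS values.
ss_total = ss_explained + ss_error

# Step 4: MS(Total) = SS(Total) / df(Total).
ms_total = ss_total / df_total

print(df_explained, round(ms_error, 2), round(ss_error, 1),
      df_total, round(ss_total, 1), round(ms_total, 2))
```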


Example 2 Notes
Since the total df is 31, the sample size was
32
Since the sample variance was 10.32 and
the standard deviation is the square root
of the variance, the sample standard
deviation is 3.21
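Both notes are one-line calculations, sketched here with hypothetical variable names and the Example 2 totals.

```python
import math

df_total = 31      # total df from Example 2
ms_total = 10.32   # MS(Total), the sample variance of the response

n = df_total + 1               # total df is one less than the sample size
std_dev = math.sqrt(ms_total)  # standard deviation is the square root of the variance

print(n, round(std_dev, 2))
```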
Example 3

Source       SS       df    MS       F
Explained    56.7
Error                 14    13.50
Total

The sample size is n = 20. Work this one out on your own!
Example 3 - Solution

Source       SS       df    MS       F
Explained    56.7     5     11.34    0.84
Error        189.0    14    13.50
Total        245.7    19    12.93

How did you do?
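One possible order of steps for reaching the Example 3 solution is sketched below, starting from the sample size. The variable names are hypothetical, and other orders of solving work equally well.

```python
# Known values from Example 3.
n = 20
ss_explained = 56.7
df_error = 14
ms_error = 13.50

df_total = n - 1                        # total df is one less than the sample size
df_explained = df_total - df_error      # the df column must add up to the total
ss_error = ms_error * df_error          # SS = MS * df
ms_explained = ss_explained / df_explained
f = ms_explained / ms_error
ss_total = ss_explained + ss_error
ms_total = ss_total / df_total

print(df_explained, round(ms_explained, 2), round(f, 2),
      round(ss_error, 1), round(ss_total, 1), df_total, round(ms_total, 2))
```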
