Coefficient of Determination
Unit 3
Coefficient of Determination, r²
Once we've decided it's appropriate to use
a line, we need to think about assessing the
accuracy of predictions.
Coefficient of Determination, r²
Suppose we wish to predict the price of homes in a
particular city. We take a random sample of 20
houses to get y = price and x = size (our housing
data).
Clearly, we are going to get some variability in the price,
since houses differ in price.
How much of this variability in price can be explained by the
fact that price is related to size and houses differ in size?
If a lot of the variation in price can be accounted for by
house size, a prediction of price based on house size will be
a big improvement over a prediction not based on house
size.
If we ignore house size, our best guess for a home's price
would be the average price of our sample (y-bar).
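A minimal sketch of the "best guess without size" idea. The numbers are made up for illustration and are not the 20-house sample; the names `sizes`, `prices`, and `ss_total` are assumptions introduced here. It computes the mean price and the total squared variation of the prices about that mean.

```python
# Made-up illustrative data (NOT the 20-house sample from the slides).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]    # x: house size, arbitrary units
prices = [2.0, 4.0, 5.0, 4.0, 5.0]   # y: house price, arbitrary units

# Best guess when size is ignored: the sample mean price (y-bar).
y_bar = sum(prices) / len(prices)

# Total variation in price around that single best guess.
ss_total = sum((y - y_bar) ** 2 for y in prices)

print("y-bar:", y_bar)
print("total variation about y-bar:", ss_total)
```

The question the slides ask next is how much of this total variation goes away once size is used to predict price.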
Coefficient of Determination, r²
The coefficient of determination, r², is
the proportion of variation in y that can
be attributed to (or explained by) the
approximate linear relationship between
x and y.
Coefficient of Determination, r²
r² is useful because:
it gives the proportion of the variance
(fluctuation) of one variable that is
predictable from the other variable
it tells us how much of the variability in
the y's can be explained by the fact that
they are related to x
Formula continued
We then find the Sum of Squared
Residuals (SSR)
$\mathrm{SSR} = \sum_i (y_i - \hat{y}_i)^2$
It is also called SSE, the sum of squared errors.
This is sometimes referred to as a measure of
the unexplained variation: the amount of
variation in y that cannot be attributed to the
linear relationship between x and y.
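The sum of squared residuals can be sketched as follows. This assumes made-up data (not the slides' housing sample) and fits the least-squares line with the standard slope and intercept formulas; the helper name `fit_line` is hypothetical.

```python
# Made-up illustrative data (NOT the 20-house sample from the slides).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]    # x
prices = [2.0, 4.0, 5.0, 4.0, 5.0]   # y

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = fit_line(sizes, prices)

# Residual = observed y minus predicted y-hat; SSR sums the squared residuals.
ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(sizes, prices))

print("SSR:", ssr)
```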
Formula continued
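This gives us
$r^2 = 1 - \mathrm{SSR} / \mathrm{SSTotal}$
where $\mathrm{SSTotal} = \sum_i (y_i - \bar{y})^2$ is the total variation of y about its mean.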
If I multiply by 100, I get the percentage of
y variation attributable to the approximate
linear relationship between x and y.
The book uses the formula:
$r^2 = (\mathrm{SSM} - \mathrm{SSE}) / \mathrm{SSM}$
(the same quantity, reading SSM as SSTotal and SSE as SSR)
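As a rough check that the two ways of writing the formula agree, here is a small sketch on made-up data (not the slides' sample). It assumes that the book's SSM is the total sum of squares about the mean and that SSE is the same as SSR; the variable names are illustrative.

```python
# Made-up illustrative data (NOT the 20-house sample from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Total variation about the mean, and unexplained variation about the line.
ss_total = sum((y - y_bar) ** 2 for y in ys)
ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r2 = 1 - ssr / ss_total                   # first form
r2_book = (ss_total - ssr) / ss_total     # book's form, with SSM = SSTotal, SSE = SSR

print(f"r^2 = {r2:.3f} ({100 * r2:.1f}% of y variation)")
```

Both expressions are algebraically identical, so any difference between `r2` and `r2_book` is floating-point rounding only.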
A couple of examples:
Example
Suppose from our strong example that
r = 0.9; then r² = 0.81.
This means that 81% of the variation in the y
variable is accounted for by the linear
relationship between x and y.
Suppose for the other model:
r = -0.4; then r² = 0.16.
This means that only 16% of the variation in the
y variable is accounted for by the linear
relationship.
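The arithmetic in these two examples can be sketched in a few lines. The function name `percent_explained` is hypothetical. Note that squaring discards the sign of r, so r = -0.4 and r = 0.4 give the same r².

```python
def percent_explained(r):
    """Percentage of y variation attributed to the linear relationship, given correlation r."""
    return 100 * r ** 2

# The two correlations used in the examples above.
for r in (0.9, -0.4):
    print(f"r = {r}: r^2 = {r ** 2:.2f}, "
          f"about {percent_explained(r):.0f}% of the y variation")
```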
Some points
Always interpret r² in context
Must interpret r² with our sentence. Do
not say:
"The regression equation can predict 81% of
the data points"
"81% of data points lie on the LSRL"
"The LSRL accounts for 81% of the data points"
Properties of r²
Properties to note:
r² ranges in value from 0 to 1.0
The magnitude of r² reflects the strength of the linear
relation between x and y: the closer to 1.0, the stronger
Where r² falls between 0 and 1.0 indicates how close
the linear relation is to no linear relation (0) or to
a perfect linear relation (1.0)
Examples
An r² value of 0.75 indicates that the linear relation
lies three-quarters of the distance between:
No linear relation between x and y (r² = 0)
A perfect linear relation between x and y (r² = 1.0)
$s_e = \sqrt{\mathrm{SSR} / (n - 2)}$
This measures the typical amount by
which an observation deviates from
the LSRL (analogous to sample
standard deviation)
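A minimal sketch of this calculation, again on made-up data rather than the slides' sample. It assumes the line is the least-squares fit and divides SSR by n - 2 before taking the square root; the variable names are illustrative.

```python
import math

# Made-up illustrative data (NOT the 20-house sample from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Sum of squared residuals about the fitted line.
ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Typical deviation of an observation from the line.
s_e = math.sqrt(ssr / (n - 2))

print(f"s_e = {s_e:.3f}")
```

The result is in the same units as y, which is why it reads like a standard deviation of the observations about the line.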
Example
Homework
Textbook pp. 190-196
# 15, 16, 31, 32, 47
Example
3.36.pdf
anova tables.pdf
anova answers.pdf