
WHAT DOES R2 MEASURE?

Steve Thomson, University of Kentucky Computing Center


R2 = (model sum of squares) / (total sum of squares).
The model sum of squares, SS(A,B), could also be
partitioned into sequential sums of squares. One
possible partition would be
SS(A,B) = SS(A) + SS(B|A),
where both terms on the right are squared distances, and
thus positive. The SS(B|A) is the sum of squares for B
"adjusted for", or "orthogonalized to", A. So, comparing
the R2 for the submodel with only A's as regressors to the
R2 for the complete model with A's and B's:
R2_A = SS(A)/SST  ≤  [SS(A) + SS(B|A)]/SST  =  R2.
Clearly R2 has to increase as variables are added to the
model (or at least not decrease). Also, note from the
geometry that 0 ≤ R2 ≤ 1.
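This monotonicity is easy to see numerically. The following is a minimal numpy sketch, not from the original paper; the data, seed, and helper r2 are made up for illustration.

```python
# Illustrative sketch, not from the original paper: adding a regressor block B
# to a model that already contains A can only leave R2 unchanged or increase it.
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus one regressor
B = rng.normal(size=(n, 1))                            # an extra regressor (pure noise here)
y = A @ np.array([1.0, 2.0]) + rng.normal(size=n)

def r2(X, y):
    """Corrected R2 = SS(model)/SST, with sums of squares taken about the mean."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    return np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

print(r2(A, y))                    # R2 for the submodel with A only
print(r2(np.hstack([A, B]), y))    # R2 for the full model: never smaller
```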

The coefficient of determination, or squared multiple
correlation coefficient, that is, R2, is perhaps the most
extensively used measure of goodness of fit for
regression models. In my consulting, many people have
asked questions like: "My R2 was .6. That's good, isn't
it?", or "My R2 was only .05. That's bad, isn't it?". Thus
many people seem to use high values of R2 as an index
of fit, concluding that if R2 is above some value, they
have "good" fit, but that small values indicate "bad" fit. In
either case above, probably the best response is
"Compared to what?".
One interpretation of R2 is as the limit of the
regression sum of squares divided by the limit of the total

sum of squares. Or equivalently, it is an estimate of the


amount of variation in observation means compared to
the variation in means plus the variation in residual error.
This can be small even for models with small error, or
large for models with large error.

If the model includes an intercept or constant term, it
is particularly easy to adjust, or orthogonalize, to this
constant: merely subtract the column mean from each
column. Most analyses are performed on such
"corrected" models. One last simple observation from the
geometry is that there are an uncountably infinite number
of subspaces with R2 = 1, namely, any subspace that
includes y.
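The centering step can be checked in one line; a minimal numpy sketch (mine, not the paper's), with an arbitrary regressor matrix:

```python
# Illustrative sketch, not from the original paper: subtracting each column mean
# makes every regressor column orthogonal to the constant (intercept) vector.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, size=(20, 3))   # arbitrary regressors, far from zero
Xc = X - X.mean(axis=0)                 # "correct" the model: subtract column means

print(np.ones(20) @ Xc)                 # essentially zero for every column
```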

I. Review:
The general linear model can be interpreted as
investigating how well an observed nx1 vector, y, can be

represented in some lower dimensional subspace


spanned by the columns of the matrix of regressors.
(Recall that the span of a set of vectors is the set of all
linear combinations of the vectors, i.e., a subspace.) The
coefficients of the vectors determining the points in the
subspace are the parameters. The least squares solution
for the parameters is the set of parameter values
corresponding to the point in the subspace, ŷ, closest to y.
The various sums of squares in ANOVA can be
represented as (squared) Euclidean distances between
subspaces spanned by different groups of columns in the
regressor matrix.

Other results on R2:

1. If true replicates are added to the data, so that more
than one y value occurs at each x (regressor) value,
then R2 will often decrease (e.g., Healy, 1984; Draper,
1984). In fact, when there are true replicates its
achievable upper bound will be less than 1.0. In
practice, the results for close replicates are often
similar.

2. For yi = β0 + β1 xi1 + β2 xi2 + ... + βp-1 xi,p-1 + εi, with
the εi independent, homogeneous normal random
variables with mean 0, when β1 = β2 = ... = βp-1 = 0, it
is easy to show that the expected value of R2, using
corrected sums of squares, is (p-1)/(n-1) (e.g.,
Crocker, 1972; Seber, 1977, pg. 115). A similar result
holds for R2 using uncorrected sums of squares (p/n).
In fact, under these assumptions, R2 follows a beta
distribution.
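The null-model expectation is easy to confirm by simulation. A minimal numpy sketch, not from the paper; n, p, and the seed are arbitrary choices.

```python
# Illustrative sketch, not from the original paper: with all slopes truly zero,
# the average corrected R2 over many simulated data sets is close to (p-1)/(n-1).
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 4                                         # p counts the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

r2s = []
for _ in range(5000):
    y = rng.normal(size=n)                           # intercept-only truth, slopes = 0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2s.append(1.0 - sse / sst)

print(np.mean(r2s), (p - 1) / (n - 1))               # both near 3/19 = 0.158
```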

It would seem that a good fit index for comparing


completely different regression equations should be
optimized by the least squares equation, should be free of
the measurement units, and should be large for models
with small residual error and small for models with a large
residual error. Finally, the regressor variables are usually
considered as fixed, so a fit index should be independent
of their actual values. R2 satisfies the first two criteria, but
fails on the last two.

Figure 1. Geometry of ANOVA
(Note that distances are square roots of the indicated sums of squares.)
The total sum of squares, SST, is the (squared) distance
from y to the origin. The error sum of squares, SSE, is
the (squared) distance from y to ŷ, the point in the span of A and B.
The model sum of squares, SS(A,B), is the (squared)
distance from ŷ to the origin. By the Pythagorean
theorem,
SST = SS(A,B) + SSE.
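This decomposition can be verified directly. A minimal numpy sketch, not part of the paper, with an arbitrary regressor matrix standing in for the columns of A and B:

```python
# Illustrative sketch, not from the original paper: the Pythagorean identity of
# Figure 1, SST = SS(model) + SSE, with uncorrected (about-the-origin) distances.
import numpy as np

rng = np.random.default_rng(3)
n = 30
X = rng.normal(size=(n, 3))                  # stands in for the columns of A and B
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta                              # projection of y onto the span of X

sst = y @ y                                  # squared distance from y to the origin
ssm = yhat @ yhat                            # squared distance from yhat to the origin
sse = (y - yhat) @ (y - yhat)                # squared distance from y to yhat
print(sst, ssm + sse)                        # equal up to rounding error
```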

So heuristically, a good model has a large SS(A,B)
compared to SST. This seems to justify the use of their
ratio as an index of fit, namely:
R2 = SS(A,B) / SST.

II. An Alternative Definition:

An alternative definition of R2 that has been


recommended (e.g. Draper & Smith, 1981) is


R2c = squared correlation between the regressand
(response), y, and the predicted value, ŷ.

1. Uncorrected models:
Since it is slightly simpler, first consider a model not
adjusted for the intercept. Then, since sums of squares
are uncorrected, write Px for the projection onto the
subspace spanned by the columns of X,
Px = X(X'X)^-1 X'.

It is well known that for a linear model "corrected" for the
intercept or constant term, estimated by ordinary least
squares (so corrected sums of squares are used in the
first definition), the two definitions are equivalent. For
uncorrected models, R2c is easily computed in SAS®
PROC REG or GLM by OUTPUTing the predicted values
and computing correlations in PROC CORR, and finally
squaring the result.
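The paper's recipe uses SAS procedures; purely as an illustration, the same computation can be sketched in numpy with hypothetical data (not the paper's code):

```python
# Illustrative sketch, not from the original paper: the correlation form of R2,
# computed by regressing, predicting, and squaring corr(y, yhat); for a model
# with an intercept it agrees with the corrected sums-of-squares ratio.
import numpy as np

rng = np.random.default_rng(4)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta

r2_corr = np.corrcoef(y, yhat)[0, 1] ** 2                                # second definition
r2_ratio = np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)  # first definition
print(r2_corr, r2_ratio)                                                 # identical here
```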

R2 = SS(model)/SST = y'Pxy / y'y
= [(β'X'Xβ + 2β'X'e + e'X(X'X)^-1X'e)/n] / [(β'X'Xβ + 2β'X'e + e'e)/n].

To illustrate a possible problem with the second
definition, suppose below that o's denote observed
values, and p's denote predicted values from the least
squares equation, from a simple regression without an
intercept. Then, using the second definition, R2c = 1.

So as n gets large,
plim R2 = β'Bβ / (β'Bβ + σ²).
If the X's were stochastic and regular, B would be the
second moment matrix about zero of the regressors. So
in a sense, R2 is a measure of variation in the X's, B,
weighted by the true coefficients, β. Another way to look
at this is: if μi is the mean of the i-th observation,
expressed as a function of the regressors, then
β'Bβ = lim(n→∞) (Σ μi²) / n,
i.e. the limiting average value of the individual squared
means. The expression on the right hand side has to be
interpreted carefully, however. The μ's are those
predicted from the regressors. This means, for example,
that for a model with no intercept, if a constant is added to
the response, so that observations are moved further
from zero, R2 will actually tend to decrease, because the
model is not adjusted for this new parameter.

Figure 2. (Plot of the observed values, o, and the least
squares predicted values, p, for the no-intercept example.)

Yet the predicted values, the p's, are most certainly not
doing a good job of predicting the observed values. Of
course, in this case, any linear function of the observed
x's will give an R2c = 1. It is slightly more complicated for
multiple regression problems, but generally, even if
R2c = 1, there is no requirement that the least squares
hyperplane must be "close" to y. That is, for any b and
nonzero a,
(Corr(ŷ, y))² = (Corr(a·ŷ + b, y))².
Since we could generate a hyperplane passing through
zero and any specified value, that suggests that, unless
the possible choices of coefficients are restricted in some
way, R2c is not explicitly a good measure of "fit".
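A minimal numpy sketch (not from the paper) of why the squared correlation cannot tell good predictions from rescaled or shifted ones:

```python
# Illustrative sketch, not from the original paper: squared correlation ignores
# the scale and location of the predictions, so it can be 1 even when the
# predictions are nowhere near the observations.
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(size=25)
good = y.copy()                           # predictions that are exactly right
bad = 100.0 * y + 500.0                   # wildly wrong predictions, same correlation

print(np.corrcoef(y, good)[0, 1] ** 2)    # 1.0
print(np.corrcoef(y, bad)[0, 1] ** 2)     # still 1.0
print(np.mean((y - bad) ** 2))            # yet the prediction error is enormous
```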

Further, note that, asymptotically at least, plim R2 is
close to 1 if σ² is small (i.e. good fit), or if β'Bβ is large
(i.e. large mean values). Also, the contribution of an
individual regressor variable to R2 will be small when both
the long run mean value of the squared regressor
variable is small and its associated "beta-weight" is
small.
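The limit is easy to check by simulation. The following minimal numpy sketch (not from the paper) picks an arbitrary B, β, and σ and compares the uncorrected R2 of a no-intercept fit with β'Bβ/(β'Bβ + σ²):

```python
# Illustrative sketch, not from the original paper: for a no-intercept model with
# X'X/n -> B, the uncorrected R2 settles near beta'B beta / (beta'B beta + sigma^2),
# whatever one might mean by "fit". B, beta, and sigma below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
beta = np.array([1.0, 0.5])
B = np.diag([4.0, 1.0])                              # limiting second moment matrix of the X's
sigma = 2.0

X = rng.normal(size=(n, 2)) * np.sqrt(np.diag(B))    # zero-mean columns with variances 4 and 1
y = X @ beta + sigma * rng.normal(size=n)

bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ bhat
r2_uncorrected = (yhat @ yhat) / (y @ y)

limit = beta @ B @ beta / (beta @ B @ beta + sigma ** 2)
print(r2_uncorrected, limit)                         # both near 4.25 / 8.25 = 0.515
```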

The definition of R2 used in SAS procedures seems
to make more sense,
R2 = SS(model) / SST,
by default using corrected sums of squares if the model
has an intercept, and uncorrected sums of squares if the
model has no intercept. But that particular ratio is exactly
what R2 is measuring, not necessarily the quality of fit.

The fact that R2 is not quite a measure of fit is not
surprising. It is just the squared correlation between the
response variable y and a linear combination of the
regressor variables. Correlation measures how much
variables vary (linearly) together. So R2 has to depend
on the regressors as much as on the response variable.

III. Limiting Behavior of R2:

Suppose a general linear model is appropriate, i.e.
the n×1 regressand vector, y, is appropriately modeled as
y = Xβ + e, where the ei's are stochastically independent
with mean 0 and variance σ². X is the n×p matrix of
values of the regressor variables. The usual (strong)
assumption for asymptotic regression results is that
X'X/n → B, a positive definite matrix. This implies, with
plim denoting limit in probability, that plim X'e/n = 0
(since Var(X'e/n) = σ²(X'X/n)/n → 0).

2. Corrected models:

For convenience, write the orthogonal projection
matrices Px = X(X'X)^-1 X' and, for j denoting the n×1
vector of 1's, Pj = jj'/n, an n×n matrix with all elements
1/n. Since j is in the span of the columns of X, there is an
a so that j = Xa. Also then PxPj = Pj = PjPx. Thus,
writing SSTc = y'y − y'Pjy for the corrected total sum of
squares,
R2 = (Pxy − Pjy)'(Pxy − Pjy) / SSTc
= (y'Pxy − y'Pjy) / (y'y − y'Pjy).
One can show that if X'X/n → B, a positive definite
matrix, then for any orthogonal projection P there is a
matrix BP so that X'PX/n → BP. By the law of large
numbers, plim j'e/n = 0, so, as n gets large, with
X'PjX/n → B0, say,
plim y'Pjy/n = plim(β'X'PjXβ/n + 2β'X'Pje/n + e'Pje/n)
= β'B0β.
Combining this with the earlier limit for y'Pxy/n,
plim R2 = (β'Bβ − β'B0β) / (β'Bβ − β'B0β + σ²)
= β'B*β / (β'B*β + σ²),
where, now, B* is the limiting, centered moment matrix of
the regressors (analogous to a covariance matrix of the
regressors). Thus R2 will be "large" if σ² is small (i.e.
good fit), or if β'B*β is large (i.e. large variation among
mean values).

Note that asymptotically, as n increases and the
number of regressors (including the intercept), p, remains
constant, R2 has the same limit as the adjusted R2 of
SAS PROC REG:
R2adj = 1 − (1 − R2)·n* / (n − p),
where n* = n−1 for a corrected model with an intercept,
and n for an uncorrected model. This is meant to adjust
for the fact that R2 always increases as regressors are
added to the model. R2adj only increases if the t-statistic
for the added variable is greater than one in absolute
magnitude. The same remarks would apply to the other
finitely corrected "adjusted" R2s, including, for example,
Amemiya's prediction criterion (Amemiya, 1985):

R2pc = 1 − (1 − R2)·(n + p) / (n − p),
designed to correct still further for the tendency of even
R2adj to choose models with many variables. But again,
these have the same asymptotic behavior as R2.

Or again, with correct specification, R2 can also be
interpreted in terms of
lim(n→∞) Σ (μi − μ̄)² / n,
basically, the limiting, corrected regression sum of
squares per observation.
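For reference, the two corrections can be written as small functions of R2, n, and p; this is a minimal sketch of my own (not the paper's), with r2_adjusted and r2_amemiya as hypothetical helper names:

```python
# Illustrative sketch, not from the original paper; r2_adjusted and r2_amemiya are
# hypothetical helper names for the two corrections written as functions of R2, n, p.
def r2_adjusted(r2, n, p, intercept=True):
    n_star = n - 1 if intercept else n      # n* = n-1 with an intercept, n without
    return 1.0 - (1.0 - r2) * n_star / (n - p)

def r2_amemiya(r2, n, p):
    # Amemiya's prediction-criterion form: a stronger penalty than the adjusted R2
    return 1.0 - (1.0 - r2) * (n + p) / (n - p)

print(r2_adjusted(0.60, 20, 4))   # 0.525
print(r2_amemiya(0.60, 20, 4))    # 0.400
```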

IV. So what does R2 measure?


For a correctly specified model, R2 tends toward the
inverse of 1 + σ²/(β'Bβ), where, again, B is analogous to a
sample covariance matrix of the regressors for the
intercept case, and a moment matrix for the no-intercept
case.

In either case, the β'Bβ is a reasonable "regression" sum
of squares, but is not much of an indicator of fit. If one
has perfect fit, σ² = 0, and R2 will be 1. Of course, R2 will
also tend to 1 as the variation in observation means gets
large.

V. So again, what does R2 measure?

Even in a correctly specified model, R2 is primarily
an estimate of the relative size of the variation in
observation means and the population variance.

Within a single set of possible regressors, it could be a
useful index of fit to compare various submodels. But
models with small error, and thus good fit, can have a
high R2 or a small R2 depending on the observation
means, i.e. on the values of the regressor variables. So if
R2 is large, it does not necessarily indicate "good" fit,
while a small R2 does not necessarily indicate "bad" fit.

In practice, this means that if two experimenters


choose equal sample sizes from the same process, the
one who chooses his regressor variables to have more
variation will tend to have a higher R2. For example, in
an economic context, suppose some process remains
true over the years and satisfies the standard regression
assumptions. Then an economist who uses monthly data
from that process will tend to have a lower R2 than one
who uses yearly data, since generally the yearly data will
have more variation. Similarly, suppose two nutritionists
are modeling weight gain as a function of calories and
prior weight. All else being equal, the nutritionist who
chooses his sample to have more variation in calories
and prior weight will tend to have a higher R2. In an
educational context, with a true model, a researcher who
samples across schools in a state could be expected to
have a larger R2 than a researcher who samples schools
in a certain county.
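A minimal numpy sketch (not from the paper) of the two-experimenter comparison, with an arbitrary true process y = 1 + 2x + e and σ = 1:

```python
# Illustrative sketch, not from the original paper: two samplers of the same true
# process y = 1 + 2x + e (sigma = 1); the design with more spread in x tends to
# report the larger R2, although the process and the fit quality are unchanged.
import numpy as np

rng = np.random.default_rng(7)

def corrected_r2(x):
    y = 1.0 + 2.0 * x + rng.normal(size=x.size)
    X = np.column_stack([np.ones(x.size), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    return np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

narrow = rng.uniform(0.0, 1.0, size=100)     # little variation in the regressor
wide = rng.uniform(0.0, 10.0, size=100)      # same process, ten times the spread
print(corrected_r2(narrow), corrected_r2(wide))
```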

Actually this is not surprising. Under the usual
assumptions, the variance of the ordinary least squares
estimates of the β's is σ²(X'X)^-1. So large variation in
the X's tends to reduce the variance of the estimates of
the β's. Thus more variability in the X's increases
efficiency. But R2 is apparently not usually interpreted as
a measure of efficiency, but rather fit. However, R2 is not
quite a measure of efficiency either. For any set of
regressors, R2 would also tend to increase as the β's get
large, a change which would have no effect on the
precision of the estimates.

SAS is the registered trademark of SAS Institute, Inc.,
Cary, NC, USA.

For further information contact:
Steve Thomson
University of Kentucky Computing Center
128 McVey Hall
Lexington, KY 40506-0045


REFERENCES
Amemiya, T. (1985), Advanced Econometrics. Cambridge, Massachusetts: Harvard University Press.
Barrett, J.P. (1974), "The Coefficient of Determination - Some Limitations", The American Statistician, 28, 19-20.
Chang, P.C. and Alift, A.C. (1987), "Goodness of Fit Statistics for General Linear Regression Equations in the Presence of Replicated Response", The American Statistician, 41, 195-199.
Crocker, D.C. (1972), "Some Interpretations of the Multiple Correlation Coefficient", The American Statistician, 26, 31-33.
Draper, N.R. (1984), "The Box-Wetz Criterion Versus R2", Journal of the Royal Statistical Society, Series A, 147, 101-103.
Draper, N.R. (1985), "Corrections: The Box-Wetz Criterion Versus R2", Journal of the Royal Statistical Society, Series A, 148, 357.
Draper, N.R. and Smith, H. (1981), Applied Regression Analysis, 2nd Ed. New York: Wiley.
Healy, M.J.R. (1984), "The Use of R2 as a Measure of Goodness of Fit", Journal of the Royal Statistical Society, Series A, 147, 608-609.
Kvalseth, T.O. (1985), "Cautionary Note about R2", The American Statistician, 39, 279-285.
Ranney, G.B. and Thigpen, C.C. (1981), "The Sample Coefficient of Determination in Simple Linear Regression", The American Statistician, 35, 152-153.
Seber, G.A.F. (1977), Linear Regression Analysis. New York: Wiley.
Theil, H. and Chung, C.-F. (1988), "Information-Theoretic Measures of Fit for Univariate and Multivariate Linear Regression", The American Statistician, 42, 249-252.
Willett, J.B. and Singer, J.B. (1988), "Another Cautionary Note about R2: Its Use in Weighted Regression Analysis", The American Statistician, 42, 236-238.

