
WHAT DOES R2 MEASURE?

Steve Thomson, University of Kentucky Computing Center


R2 = (model sum of squares) / (total sum of squares).
The model sum of squares, SS(A,B), could also be
partitioned into sequential sums of squares. One
possible partition would be
SS(A,B) = SS(A) + SS(B|A),
where both terms on the right are squared distances, and
thus positive. The SS(B|A) is the sum of squares for B
"adjusted for", or "orthogonalized to", A. So, comparing
the R2 for the submodel with only A's as regressors to the
R2 for the complete model with A's and B's:
R2_A = SS(A)/SST  ≤  [SS(A) + SS(B|A)]/SST  =  R2.
Clearly R2 has to increase as variables are added to the
model (or at least not decrease). Also, note from the
geometry that 0 ≤ R2 ≤ 1.
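This monotonicity is easy to see numerically. The following is a minimal numpy sketch, not from the original paper; the data, seed, and helper r2 are made up for illustration.

```python
# Illustrative sketch, not from the original paper: adding a regressor block B
# to a model that already contains A can only leave R2 unchanged or increase it.
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus one regressor
B = rng.normal(size=(n, 1))                            # an extra regressor (pure noise here)
y = A @ np.array([1.0, 2.0]) + rng.normal(size=n)

def r2(X, y):
    """Corrected R2 = SS(model)/SST, with sums of squares taken about the mean."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    return np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

print(r2(A, y))                    # R2 for the submodel with A only
print(r2(np.hstack([A, B]), y))    # R2 for the full model: never smaller
```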

The coefficient of determination, or squared multiple
correlation coefficient, that is, R2, is perhaps the most
extensively used measure of goodness of fit for
regression models. In my consulting, many people have
asked questions like: "My R2 was .6. That's good, isn't
it?", or "My R2 was only .05. That's bad, isn't it?". Thus
many people seem to use high values of R2 as an index
of fit, concluding that if R2 is above some value, they
have "good" fit, but that small values indicate "bad" fit. In
either case above, probably the best response is
"Compared to what?".
One interpretation of R2 is as the limit of the
regression sum of squares divided by the limit of the total

sum of squares. Or equivalently, it is an estimate of the


amount of variation in observation means compared to
the variation in means plus the variation in residual error.
This can be small even for models with small error, or
large for models with large error.

If the model includes an intercept or constant term, it
is particularly easy to adjust, or orthogonalize, to this
constant: merely subtract the column mean from each
column. Most analyses are performed on such
"corrected" models. One last simple observation from the
geometry is that there are an uncountably infinite number
of subspaces with R2 = 1, namely, any subspace that
includes y.
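The centering step can be checked in one line; a minimal numpy sketch (mine, not the paper's), with an arbitrary regressor matrix:

```python
# Illustrative sketch, not from the original paper: subtracting each column mean
# makes every regressor column orthogonal to the constant (intercept) vector.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, size=(20, 3))   # arbitrary regressors, far from zero
Xc = X - X.mean(axis=0)                 # "correct" the model: subtract column means

print(np.ones(20) @ Xc)                 # essentially zero for every column
```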

I. Review:
The general linear model can be interpreted as
investigating how well an observed nx1 vector, y, can be

represented in some lower dimensional subspace


spanned by the columns of the matrix of regressors.
(Recall that the span of a set of vectors is the set of all
linear combinations of the vectors, i.e., a subspace.) The
coefficients of the vectors determining the points in the
subspace are the parameters. The least squares solution
for the parameters is the set of parameter values
corresponding to the point in the subspace, ŷ, closest to y.
The various sums of squares in ANOVA can be
represented as (squared) Euclidean distances between
subspaces spanned by different groups of columns in the
regressor matrix.

Other results on R2:

1. If true replicates are added to the data, so that more
than one y value occurs at each x (regressor) value,
then R2 will often decrease (e.g., Healy, 1984; Draper,
1984). In fact, when there are true replicates its
achievable upper bound will be less than 1.0. In
practice, the results for close replicates are often
similar.

2. For yi = β0 + β1 xi1 + β2 xi2 + ... + βp-1 xi,p-1 + εi, with
the εi independent, homogeneous normal random
variables with mean 0, when β1 = β2 = ... = βp-1 = 0, it
is easy to show that the expected value of R2, using
corrected sums of squares, is (p-1)/(n-1) (e.g.,
Crocker, 1972; Seber, 1977, pg. 115). A similar result
holds for R2 using uncorrected sums of squares (p/n).
In fact, under these assumptions, R2 follows a beta
distribution.
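The null-model expectation is easy to confirm by simulation. A minimal numpy sketch, not from the paper; n, p, and the seed are arbitrary choices.

```python
# Illustrative sketch, not from the original paper: with all slopes truly zero,
# the average corrected R2 over many simulated data sets is close to (p-1)/(n-1).
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 4                                         # p counts the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

r2s = []
for _ in range(5000):
    y = rng.normal(size=n)                           # intercept-only truth, slopes = 0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2s.append(1.0 - sse / sst)

print(np.mean(r2s), (p - 1) / (n - 1))               # both near 3/19 = 0.158
```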

It would seem that a good fit index for comparing


completely different regression equations should be
optimized by the least squares equation, should be free of
the measurement units, and should be large for models
with small residual error and small for models with a large
residual error. Finally, the regressor variables are usually
considered as fixed, so a fit index should be independent
of their actual values. R2 satisfies the first two criteria, but
fails on the last two.

Figure 1. Geometry of ANOVA
(Note that distances are square roots of the indicated sums of squares.)
The total sum of squares, SST, is the (squared) distance
from y to the origin. The error sum of squares, SSE, is
the (squared) distance from y to ŷ, the point in the span of A and B.
The model sum of squares, SS(A,B), is the (squared)
distance from ŷ to the origin. By the Pythagorean
theorem,
SST = SS(A,B) + SSE.
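This decomposition can be verified directly. A minimal numpy sketch, not part of the paper, with an arbitrary regressor matrix standing in for the columns of A and B:

```python
# Illustrative sketch, not from the original paper: the Pythagorean identity of
# Figure 1, SST = SS(model) + SSE, with uncorrected (about-the-origin) distances.
import numpy as np

rng = np.random.default_rng(3)
n = 30
X = rng.normal(size=(n, 3))                  # stands in for the columns of A and B
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta                              # projection of y onto the span of X

sst = y @ y                                  # squared distance from y to the origin
ssm = yhat @ yhat                            # squared distance from yhat to the origin
sse = (y - yhat) @ (y - yhat)                # squared distance from y to yhat
print(sst, ssm + sse)                        # equal up to rounding error
```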

So heuristically, a good model has a large SS(A,B)
compared to SST. This seems to justify the use of their
ratio as an index of fit, namely:
R2 = SS(A,B) / SST.

II. An Alternative Definition:

An alternative definition of R2 that has been


recommended (e.g. Draper & Smith, 1981) is


R2c = squared correlation between the regressand
(response), y, and the predicted value, ŷ.

1. Uncorrected models:
Since it is slightly simpler, first consider a model not
adjusted for the intercept. Then, since sums of squares
are uncorrected, write Px for the projection onto the
subspace spanned by the columns of X,
Px = X(X'X)^-1 X'.

It is well known that for a linear model "corrected" for the
intercept or constant term, estimated by ordinary least
squares (so corrected sums of squares are used in the
first definition), the two definitions are equivalent. For
uncorrected models, R2c is easily computed in SAS®
PROC REG or GLM by OUTPUTing the predicted values
and computing correlations in PROC CORR, and finally
squaring the result.
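The paper's recipe uses SAS procedures; purely as an illustration, the same computation can be sketched in numpy with hypothetical data (not the paper's code):

```python
# Illustrative sketch, not from the original paper: the correlation form of R2,
# computed by regressing, predicting, and squaring corr(y, yhat); for a model
# with an intercept it agrees with the corrected sums-of-squares ratio.
import numpy as np

rng = np.random.default_rng(4)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta

r2_corr = np.corrcoef(y, yhat)[0, 1] ** 2                                # second definition
r2_ratio = np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)  # first definition
print(r2_corr, r2_ratio)                                                 # identical here
```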

R2 = SS(model)/SST = y'Pxy / y'y
= [(β'X'Xβ + 2β'X'e + e'X(X'X)^-1X'e)/n] / [(β'X'Xβ + 2β'X'e + e'e)/n].

To illustrate a possible problem with the second
definition, suppose below that o's denote observed
values, and p's denote predicted values from the least
squares equation, from a simple regression without an
intercept. Then, using the second definition, R2c = 1.

So as n gets large,
plim R2 = β'Bβ / (β'Bβ + σ²).
If the X's were stochastic and regular, B would be the
second moment matrix about zero of the regressors. So
in a sense, R2 is a measure of variation in the X's, B,
weighted by the true coefficients, β. Another way to look
at this is: if μi is the mean of the i-th observation,
expressed as a function of the regressors, then
β'Bβ = lim(n→∞) (Σ μi²) / n,
i.e. the limiting average value of the individual squared
means. The expression on the right hand side has to be
interpreted carefully, however. The μ's are those
predicted from the regressors. This means, for example,
that for a model with no intercept, if a constant is added to
the response, so that observations are moved further
from zero, R2 will actually tend to decrease, because the
model is not adjusted for this new parameter.

Figure 2. (Plot of the observed values, o, and the least
squares predicted values, p, for the no-intercept example.)

Yet the predicted values, the p's, are most certainly not
doing a good job of predicting the observed values. Of
course, in this case, any linear function of the observed
x's will give an R2c = 1. It is slightly more complicated for
multiple regression problems, but generally, even if
R2c = 1, there is no requirement that the least squares
hyperplane must be "close" to y. That is, for any b and
nonzero a,
(Corr(ŷ, y))² = (Corr(a·ŷ + b, y))².
Since we could generate a hyperplane passing through
zero and any specified value, that suggests that, unless
the possible choices of coefficients are restricted in some
way, R2c is not explicitly a good measure of "fit".
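A minimal numpy sketch (not from the paper) of why the squared correlation cannot tell good predictions from rescaled or shifted ones:

```python
# Illustrative sketch, not from the original paper: squared correlation ignores
# the scale and location of the predictions, so it can be 1 even when the
# predictions are nowhere near the observations.
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(size=25)
good = y.copy()                           # predictions that are exactly right
bad = 100.0 * y + 500.0                   # wildly wrong predictions, same correlation

print(np.corrcoef(y, good)[0, 1] ** 2)    # 1.0
print(np.corrcoef(y, bad)[0, 1] ** 2)     # still 1.0
print(np.mean((y - bad) ** 2))            # yet the prediction error is enormous
```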

Further, note that, asymptotically at least, plim R2 is
close to 1 if σ² is small (i.e. good fit), or if β'Bβ is large
(i.e. large mean values). Also, the contribution of an
individual regressor variable to R2 will be small when both
the long run mean value of the squared regressor
variable is small and its associated "beta-weight" is
small.
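The limit is easy to check by simulation. The following minimal numpy sketch (not from the paper) picks an arbitrary B, β, and σ and compares the uncorrected R2 of a no-intercept fit with β'Bβ/(β'Bβ + σ²):

```python
# Illustrative sketch, not from the original paper: for a no-intercept model with
# X'X/n -> B, the uncorrected R2 settles near beta'B beta / (beta'B beta + sigma^2),
# whatever one might mean by "fit". B, beta, and sigma below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
beta = np.array([1.0, 0.5])
B = np.diag([4.0, 1.0])                              # limiting second moment matrix of the X's
sigma = 2.0

X = rng.normal(size=(n, 2)) * np.sqrt(np.diag(B))    # zero-mean columns with variances 4 and 1
y = X @ beta + sigma * rng.normal(size=n)

bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ bhat
r2_uncorrected = (yhat @ yhat) / (y @ y)

limit = beta @ B @ beta / (beta @ B @ beta + sigma ** 2)
print(r2_uncorrected, limit)                         # both near 4.25 / 8.25 = 0.515
```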

The definition of R2 used in SAS procedures seems
to make more sense,
R2 = SS(model) / SST,
by default using corrected sums of squares if the model
has an intercept, and uncorrected sums of squares if the
model has no intercept. But that particular ratio is exactly
what R2 is measuring, not necessarily the quality of fit.

The fact that R2 is not quite a measure of fit is not
surprising. It is just the squared correlation between the
response variable y and a linear combination of the
regressor variables. Correlation measures how much
variables vary (linearly) together. So R2 has to depend
on the regressors as much as on the response variable.

III. Limiting Behavior of R2:

Suppose a general linear model is appropriate, i.e.
the n×1 regressand vector, y, is appropriately modeled as
y = Xβ + e, where the ei's are stochastically independent
with mean 0 and variance σ². X is the n×p matrix of
values of the regressor variables. The usual (strong)
assumption for asymptotic regression results is that
X'X/n → B, a positive definite matrix. This implies, with
plim denoting limit in probability, that plim X'e/n = 0
(since Var(X'e/n) = σ²(X'X/n)/n → 0).

2. Corrected models:

For convenience, write the orthogonal projection
matrices Px = X(X'X)^-1 X' and, for j denoting the n×1
vector of 1's, Pj = jj'/n, an n×n matrix with all elements
1/n. Since j is in the span of the columns of X, there is an
a so that j = Xa. Also then PxPj = Pj = PjPx. Thus,
writing SSTc = y'y − y'Pjy for the corrected total sum of
squares,
R2 = (Pxy − Pjy)'(Pxy − Pjy) / SSTc
= (y'Pxy − y'Pjy) / (y'y − y'Pjy).
One can show that if X'X/n → B, a positive definite
matrix, then for any orthogonal projection P there is a
matrix BP so that X'PX/n → BP. By the law of large
numbers, plim j'e/n = 0, so, as n gets large, with
X'PjX/n → B0, say,
plim y'Pjy/n = plim(β'X'PjXβ/n + 2β'X'Pje/n + e'Pje/n)
= β'B0β.
Combining this with the earlier limit for y'Pxy/n,
plim R2 = (β'Bβ − β'B0β) / (β'Bβ − β'B0β + σ²)
= β'B*β / (β'B*β + σ²),
where, now, B* is the limiting, centered moment matrix of
the regressors (analogous to a covariance matrix of the
regressors). Thus R2 will be "large" if σ² is small (i.e.
good fit), or if β'B*β is large (i.e. large variation among
mean values).

Note that asymptotically, as n increases and the
number of regressors (including the intercept), p, remains
constant, R2 has the same limit as the adjusted R2 of
SAS PROC REG:
R2adj = 1 − (1 − R2)·n* / (n − p),
where n* = n−1 for a corrected model with an intercept,
and n for an uncorrected model. This is meant to adjust
for the fact that R2 always increases as regressors are
added to the model. R2adj only increases if the t-statistic
for the added variable is greater than one in absolute
magnitude. The same remarks would apply to the other
finitely corrected "adjusted" R2s, including, for example,
Amemiya's prediction criterion (Amemiya, 1985):

R2pc = 1 − (1 − R2)·(n + p) / (n − p),
designed to correct still further for the tendency of even
R2adj to choose models with many variables. But again,
these have the same asymptotic behavior as R2.

Or again, with correct specification, R2 can also be
interpreted in terms of
lim(n→∞) Σ (μi − μ̄)² / n,
basically, the limiting, corrected regression sum of
squares per observation.
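For reference, the two corrections can be written as small functions of R2, n, and p; this is a minimal sketch of my own (not the paper's), with r2_adjusted and r2_amemiya as hypothetical helper names:

```python
# Illustrative sketch, not from the original paper; r2_adjusted and r2_amemiya are
# hypothetical helper names for the two corrections written as functions of R2, n, p.
def r2_adjusted(r2, n, p, intercept=True):
    n_star = n - 1 if intercept else n      # n* = n-1 with an intercept, n without
    return 1.0 - (1.0 - r2) * n_star / (n - p)

def r2_amemiya(r2, n, p):
    # Amemiya's prediction-criterion form: a stronger penalty than the adjusted R2
    return 1.0 - (1.0 - r2) * (n + p) / (n - p)

print(r2_adjusted(0.60, 20, 4))   # 0.525
print(r2_amemiya(0.60, 20, 4))    # 0.400
```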

IV. So what does R2 measure?


For a correctly specified model, R2 tends toward the
inverse of 1 + σ²/(β'Bβ), where, again, B is analogous to a
sample covariance matrix of the regressors for the
intercept case, and a moment matrix for the no-intercept
case.

In either case, the β'Bβ is a reasonable "regression" sum
of squares, but is not much of an indicator of fit. If one
has perfect fit, σ² = 0, and R2 will be 1. Of course, R2 will
also tend to 1 as the variation in observation means gets
large.

V. So again, what does R2 measure?

Even in a correctly specified model, R2 is primarily
an estimate of the relative size of the variation in
observation means and the population variance.

Within a single set of possible regressors, it could be a
useful index of fit to compare various submodels. But
models with small error, and thus good fit, can have a
high R2 or a small R2 depending on the observation
means, i.e. on the values of the regressor variables. So if
R2 is large, it does not necessarily indicate "good" fit,
while a small R2 does not necessarily indicate "bad" fit.

In practice, this means that if two experimenters


choose equal sample sizes from the same process, the
one who chooses his regressor variables to have more
variation will tend to have a higher R2. For example, in
an economic context, suppose some process remains
true over the years and satisfies the standard regression
assumptions. Then an economist who uses monthly data
from that process will tend to have a lower R2 than one
who uses yearly data, since generally the yearly data will
have more variation. Similarly, suppose two nutritionists
are modeling weight gain as a function of calories and
prior weight. All else being equal, the nutritionist who
chooses his sample to have more variation in calories
and prior weight will tend to have a higher R2. In an
educational context, with a true model, a researcher who
samples across schools in a state could be expected to
have a larger R2 than a researcher who samples schools
in a certain county.
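A minimal numpy sketch (not from the paper) of the two-experimenter comparison, with an arbitrary true process y = 1 + 2x + e and σ = 1:

```python
# Illustrative sketch, not from the original paper: two samplers of the same true
# process y = 1 + 2x + e (sigma = 1); the design with more spread in x tends to
# report the larger R2, although the process and the fit quality are unchanged.
import numpy as np

rng = np.random.default_rng(7)

def corrected_r2(x):
    y = 1.0 + 2.0 * x + rng.normal(size=x.size)
    X = np.column_stack([np.ones(x.size), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    return np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

narrow = rng.uniform(0.0, 1.0, size=100)     # little variation in the regressor
wide = rng.uniform(0.0, 10.0, size=100)      # same process, ten times the spread
print(corrected_r2(narrow), corrected_r2(wide))
```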

Actually this is not surprising. Under the usual
assumptions, the variance of the ordinary least squares
estimates of the β's is σ²(X'X)^-1. So large variation in
the X's tends to reduce the variance of the estimates of
the β's. Thus more variability in the X's increases
efficiency. But R2 is apparently not usually interpreted as
a measure of efficiency, but rather fit. However, R2 is not
quite a measure of efficiency either. For any set of
regressors, R2 would also tend to increase as the β's get
large, a change which would have no effect on the
precision of the estimates.

SAS is the registered trademark of SAS Institute, Inc.,
Cary, NC, USA.

For further information contact:
Steve Thomson
University of Kentucky Computing Center
128 McVey Hall
Lexington, KY 40506-0045


REFERENCES
Amemiya, T. (1985), Advanced Econometrics. Cambridge, Massachusetts: Harvard University Press.
Barrett, J.P. (1974), "The Coefficient of Determination - Some Limitations", The American Statistician, 28, 19-20.
Chang, P.C. and Alift, A.C. (1987), "Goodness of Fit Statistics for General Linear Regression Equations in the Presence of Replicated Response", The American Statistician, 41, 195-199.
Crocker, D.C. (1972), "Some Interpretations of the Multiple Correlation Coefficient", The American Statistician, 26, 31-33.
Draper, N.R. (1984), "The Box-Wetz Criterion Versus R2", Journal of the Royal Statistical Society, Series A, 147, 101-103.
Draper, N.R. (1985), "Corrections: The Box-Wetz Criterion Versus R2", Journal of the Royal Statistical Society, Series A, 148, 357.
Draper, N.R. and Smith, H. (1981), Applied Regression Analysis, 2nd Ed. New York: Wiley.
Healy, M.J.R. (1984), "The Use of R2 as a Measure of Goodness of Fit", Journal of the Royal Statistical Society, Series A, 147, 608-609.
Kvalseth, T.O. (1985), "Cautionary Note about R2", The American Statistician, 39, 279-285.
Ranney, G.B. and Thigpen, C.C. (1981), "The Sample Coefficient of Determination in Simple Linear Regression", The American Statistician, 35, 152-153.
Seber, G.A.F. (1977), Linear Regression Analysis. New York: Wiley.
Theil, H. and Chung, C.-F. (1988), "Information-Theoretic Measures of Fit for Univariate and Multivariate Linear Regression", The American Statistician, 42, 249-252.
Willett, J.B. and Singer, J.B. (1988), "Another Cautionary Note about R2: Its Use in Weighted Regression Analysis", The American Statistician, 42, 236-238.

