You are on page 1of 25

DIY fractional polynomials

Patrick Royston
MRC Clinical Trials Unit , London
10 September 2010

Overview

Introduction to fractional polynomials


Going off-piste: DIY fractional polynomials
Examples

Fractional polynomial models

A fractional polynomial of degree 1 with power p1 is defined


as FP1 = 1 X p1
A fractional polynomial of degree 2 with powers (p1,p2) is
defined as FP2 = 1 X p1 + 2 X p2
Powers (p1,p2) are taken from a predefined set
S = {2, 1, 0.5, 0, 0.5, 1, 2, 3}
where 0 means log X
Also, there are repeated powers FP2 models
Example: FP1 [power 0.5] = 1 X0.5
Example: FP2 [powers (0.5, 3)] = 1 X0.5 + 2 X3
Example: FP2 [powers (3, 3)] = 1 X3 + 2 X3lnX

Some examples of fractional


polynomial (FP2) curves
(-2, 1)

(-2, 2)

(-2, -2)

(-2, -1)

Royston P, Altman DG (1994) Applied Statistics 43: 429-467.

FP analysis for the prognostic


effect of age in breast cancer

FP function selection procedure


Simple functions are preferred. More complicated functions are
accepted only if the fit is much better
Effect of age significant at 5% level?
2

df

P-value

Any effect?
Best FP2 versus null

17.61

0.0015

Linear function suitable?


Best FP2 versus linear

17.03

0.0007

FP1 sufficient?
Best FP2 vs. best FP1

11.20

0.0037

Fractional polynomials in Stata

fracpoly command
Basic syntax:
. fracpoly [, fp_options]: regn_cmd [yvar] xvar1 [xvars]
xvar1 is a continuous predictor which may have a curved
relationship with yvar
xvars are other predictors, all modelled as linear
Can use the fp_option compare to compare the fit of
different FP models
uses the FP function selection procedure

Example (auto data)

fracpoly, compare: regress mpg displacement


Fractional polynomial model comparisons:
-------------------------------------------------------------------------displacement
df
Deviance
Res. SD
Dev. dif. P (*) Powers
-------------------------------------------------------------------------Not in model
0
468.789
5.7855
70.818
0.000
Linear
1
417.801
4.12779
19.830
0.000 1
m = 1
2
400.592
3.67467
2.621
0.284 -2
m = 2
4
397.971
3.6355
--- -2 3
-------------------------------------------------------------------------(*) P-value from deviance difference comparing reported model with m = 2
model

Show FP1 and FP2 models in Stata (+ fracplot)

But what if fracpoly cant fit my


model ?
fracpoly supports only some of Statas rich set of
regression-type commands
Provided we know what the command we want to fit looks
like with a transformed covariate, we can fit an FP model to
the data
We just create the necessary transformed covariate values,
fit the model using them, and assess the fit
A new, simple command fracpoly_powers helps by
generating strings (local macros) with the required powers:
. fracpoly_powers [, degree(#) s(list_of_powers) ]

Fitting an FP2 model in the auto


example
// Store FP2 powers in local macros
fracpoly_powers, degree(2)
local np = r(np)
forvalues j = 1 / `np' {
local p`j' `r(p`j')'
}
// Compute deviance for each model with covariate displacement
local x displacement
local y mpg
local devmin 1e30
quietly forvalues j = 1 / `np' {
fracgen `x' `p`j'', replace
regress `y' `r(names)'
local dev = -2 * e(ll)
if `dev' < `devmin' {
local pbest `p`j''
local devmin `dev'
}
}
di "Best model has powers `pbest', deviance = " `devmin'

A real example: modelling fetal


growth
Prospective longitudinal study of n = 50 pregnant women
There are about 6 repeated measurements on each fetus at
different gestational ages (gawks)
gawks = gestational age in weeks
Wish to model how y = log fetal abdominal circumference
changes with gestational age
There is considerable curvature!

4.5

Log AC
5

5.5

The raw data

10

20

30
Gestational age, wk

40

A mixed model for fetal growth

Multilevel (mixed) model to fit this relationship:


. xtmixed y FP(gawks) || id: FP(gawks),
covariance(unstructured)
But how do we implement FP(gawks) here?
We want the best-fitting FP function of gawks, with random
effects for the parameters (s) of the FP model

Fitting an FP2 mixed model to the


fetal AC data
[First run fracpoly_powers to create local macros with powers]
// Compute deviance for each FP model with covariate gawks
gen x = gawks
gen y = ln(ac)
local devmin 1e30
forvalues j = 1 / `np' {
qui fracgen x `p`j'', replace adjust(mean)
qui xtmixed y `r(names)' || id: `r(names)', ///
nostderr covariance(unstructured)
local dev = -2 * e(ll)
if `dev' < `devmin' {
local p `p`j''
local devmin `dev'
}
di "powers = `p`j''" _col(20) " deviance = " %9.3f `dev'
}
di _n "Best model has powers `p', deviance = " `devmin'

Plots of some results

Residuals at the individual level

-.2

4.5

Residuals
-.1
0
.1

Log AC
5
5.5

.2

Fitted curves at the individual level

10

20
30
Gestational age, wk

40

-.2

Residuals
-.1
0
.1

.2

Residuals and fitted residuals

10

20
30
Gestational age, wk

40

10

20
30
Gestational age, wk

40

An ignorant example!

I know almost nothing about seemingly unrelated


regression (Statas sureg command)
It fits a set of linear regression models which have
correlated error terms
The syntax therefore has a set of equations
. sureg (depvar1 varlist1) (depvar2 varlist2) ... (depvarN
varlistN)
There may be non-linearities lurking in these equations
How can we fit FP models to varlist1, varlist2, ?

Example: modelling learning


scores
Stata FAQ from UCLA
(http://www.ats.ucla.edu/stat/stata/faq/sureg.htm):
What is seemingly unrelated regression and how can I
perform it in Stata?
Example: High School and Beyond study

Example: modelling learning


scores
Contains data from hsb2.dta
obs:
200
highschool and beyond (200 cases)
vars:
11
5 Jul 2010 13:23
size:
9,600 (99.9% of memory free)
------------------------------------------------------------------------------storage display
value
variable name
type
format
label
variable label
------------------------------------------------------------------------------id
float %9.0g
female
float %9.0g
fl
race
float %12.0g
rl
ses
float %9.0g
sl
schtyp
float %9.0g
scl
type of school
prog
float %9.0g
sel
type of program
read
float %9.0g
reading score
write
float %9.0g
writing score
math
float %9.0g
math score
science
float %9.0g
science score
socst
float %9.0g
social studies score
-------------------------------------------------------------------------------

[It is unclear to me what ses (low, middle, high) is]

Example (ctd.)

As an example, suppose we wish to model 2 outcomes (read,


math) as predicted by socst female ses and science
female ses using sureg as follows:
. sureg (read socst female ses) (math science female ses)
Are there non-linearities in read as a function of socst?
In math as a function of science?
For simplicity here, will restrict ourselves to FP1 functions of
socst and science
not necessary in principle
We fit the 8 8 = 64 FP1 models and look for the best-fitting
combination

Stata
gen x1 = socst
gen x2 = science
gen y1 = read
gen y2 = math
local devmin 1e30
forvalues j = 1 / `np' {
qui fracgen x1 `p`j'', replace adjust(mean)
local x1vars `r(names)'
forvalues k = 1 / `np' {
qui fracgen x2 `p`k'', replace adjust(mean)
local x2vars `r(names)'
qui sureg (y1 `x1vars' female ses) (y2 `x2vars' female ses)
local dev = -2 * e(ll)
if `dev' < `devmin' {
local px1 `p`j''
local px2 `p`k''
local devmin `dev'
}
}
}

[Run fpexample3.do in Stata]

Comments

The results suggest that there is indeed curvature in both


relationships
Can reject the null hypothesis of linearity at the 1%
significance level
FP1 vs linear: 2 = 10.08 (2 d.f.), P = 0.0065
Shows the importance of considering non-linearity

read as a function of socst


(adjusted female ses)

30

Partial predictor+residual of read


40
50
60
70

80

Fractional Polynomial (3),


adjusted for covariates

30

40

50
social studies score

60

70

math as a function of science


(adjusted female ses)

30

Partial predictor+residual of math


40
50
60
70

80

Fractional Polynomial (2),


adjusted for covariates

20

40

science score

60

80

Conclusions

Fractional polynomial models are a simple yet very useful


extension of linear functions and ordinary polynomials
If you are willing to do some straightforward do-file
programming, you can apply them in a bespoke manner to
a wide range of Stata regression-type commands and get
useful results
For (much) more, see Royston & Sauerbrei (2008) book

You might also like