250 likes | 256 Views
DIY fractional polynomials Patrick Royston MRC Clinical Trials Unit , London 10 September 2010. Introduction to fractional polynomials Going off-piste: DIY fractional polynomials Examples. Overview. A fractional polynomial of degree 1 with power p 1 is defined as FP1 = β 1 X p 1
E N D
DIY fractional polynomials Patrick Royston MRC Clinical Trials Unit , London 10 September 2010
Introduction to fractional polynomials Going off-piste: DIY fractional polynomials Examples Overview
Afractional polynomial of degree 1 with power p1 is defined as FP1 = β1 X p1 Afractional polynomial of degree 2 with powers (p1,p2) is defined as FP2 = β1 X p1 + β2 X p2 Powers (p1,p2) are taken from a predefined set S = {2, 1, 0.5, 0, 0.5, 1, 2, 3} where 0 meanslog X Also, there are ‘repeated’ powers FP2 models Example: FP1 [power 0.5] = β1 X0.5 Example: FP2 [powers (0.5, 3)] = β1 X0.5 + β2 X3 Example: FP2 [powers (3, 3)] = β1 X3 + β2 X3lnX Fractional polynomial models
Some examples of fractional polynomial (FP2) curves Royston P, Altman DG (1994) Applied Statistics 43: 429-467.
FP analysis for the prognostic effect of age in breast cancer
Simple functions are preferred. More complicated functions are accepted only if the fit is much betterEffect of age significant at 5% level? FP function selection procedure χ2 df P-value Any effect? Best FP2 versus null 17.61 4 0.0015 Linear function suitable? Best FP2 versus linear 17.03 3 0.0007 FP1 sufficient? Best FP2 vs. best FP1 11.20 2 0.0037
fracpoly command Basic syntax: . fracpoly [, fp_options]: regn_cmd [yvar] xvar1 [xvars] … xvar1 is a continuous predictor which may have a curved relationship with yvar xvars are other predictors, all modelled as linear Can use the fp_option compare to compare the fit of different FP models uses the FP function selection procedure Fractional polynomials in Stata
fracpoly, compare: regress mpg displacement Fractional polynomial model comparisons: -------------------------------------------------------------------------- displacement df Deviance Res. SD Dev. dif. P (*) Powers -------------------------------------------------------------------------- Not in model 0 468.789 5.7855 70.818 0.000 Linear 1 417.801 4.12779 19.830 0.000 1 m = 1 2 400.592 3.67467 2.621 0.284 -2 m = 2 4 397.971 3.6355 -- -- -2 3 -------------------------------------------------------------------------- (*) P-value from deviance difference comparing reported model with m = 2 model Show FP1 and FP2 models in Stata (+ fracplot) Example (auto data)
fracpoly supports only some of Stata’s rich set of regression-type commands Provided we know what the command we want to fit looks like with a transformed covariate, we can fit an FP model to the data We just create the necessary transformed covariate values, fit the model using them, and assess the fit A new, simple command fracpoly_powers helps by generating strings (local macros) with the required powers: . fracpoly_powers [, degree(#) s(list_of_powers) ] But what if fracpoly can’t fit my model … ?
// Store FP2 powers in local macros fracpoly_powers, degree(2) local np = r(np) forvalues j = 1 / `np' { local p`j' `r(p`j')' } // Compute deviance for each model with covariate displacement local x displacement local y mpg local devmin 1e30 quietly forvalues j = 1 / `np' { fracgen `x' `p`j'', replace regress `y' `r(names)' local dev = -2 * e(ll) if `dev' < `devmin' { local pbest `p`j'' local devmin `dev' } } di "Best model has powers `pbest', deviance = " `devmin' Fitting an FP2 model in the auto example
Prospective longitudinal study of n = 50 pregnant women There are about 6 repeated measurements on each fetus at different gestational ages (gawks) gawks = gestational age in weeks Wish to model how y = log fetal abdominal circumference changes with gestational age There is considerable curvature! A real example: modelling fetal growth
Multilevel (mixed) model to fit this relationship: . xtmixed y FP(gawks) || id: FP(gawks), covariance(unstructured) But how do we implement “FP(gawks)” here? We want the best-fitting FP function of gawks, with random effects for the parameters (β’s) of the FP model A mixed model for fetal growth
[First run fracpoly_powers to create local macros with powers] // Compute deviance for each FP model with covariate gawks gen x = gawks gen y = ln(ac) local devmin 1e30 forvalues j = 1 / `np' { qui fracgen x `p`j'', replace adjust(mean) qui xtmixed y `r(names)' || id: `r(names)', /// nostderr covariance(unstructured) local dev = -2 * e(ll) if `dev' < `devmin' { local p `p`j'' local devmin `dev' } di "powers = `p`j''" _col(20) " deviance = " %9.3f `dev' } di _n "Best model has powers `p', deviance = " `devmin' Fitting an FP2 mixed model to the fetal AC data
I know almost nothing about “seemingly unrelated regression” (Stata’s sureg command) It fits a set of linear regression models which have correlated error terms The syntax therefore has a set of “equations” . sureg (depvar1 varlist1) (depvar2 varlist2) ... (depvarN varlistN) There may be non-linearities lurking in these “equations” How can we fit FP models to varlist1, varlist2, … ? An “ignorant” example!
Stata FAQ from UCLA (http://www.ats.ucla.edu/stat/stata/faq/sureg.htm): What is seemingly unrelated regression and how can I perform it in Stata? Example: High School and Beyond study Example: modelling learning scores
Contains data from hsb2.dta obs: 200 highschool and beyond (200 cases) vars: 11 5 Jul 2010 13:23 size: 9,600 (99.9% of memory free) ------------------------------------------------------------------------------- storage display value variable name type format label variable label ------------------------------------------------------------------------------- id float %9.0g female float %9.0g fl race float %12.0g rl ses float %9.0g sl schtyp float %9.0g scl type of school prog float %9.0g sel type of program read float %9.0g reading score write float %9.0g writing score math float %9.0g math score science float %9.0g science score socst float %9.0g social studies score ------------------------------------------------------------------------------- [It is unclear to me what “ses” (low, middle, high) is] Example: modelling learning scores
As an example, suppose we wish to model 2 outcomes (read, math) as predicted by “socstfemale ses” and “sciencefemaleses” using sureg as follows: . sureg (read socst female ses) (math science female ses) Are there non-linearities in read as a function of socst?In math as a function of science? For simplicity here, will restrict ourselves to FP1 functions of socst and science not necessary in principle We fit the 8 × 8 = 64 FP1 models and look for the best-fitting combination Example (ctd.)
gen x1 = socst gen x2 = science gen y1 = read gen y2 = math local devmin 1e30 forvalues j = 1 / `np' { qui fracgen x1 `p`j'', replace adjust(mean) local x1vars `r(names)' forvalues k = 1 / `np' { qui fracgen x2 `p`k'', replace adjust(mean) local x2vars `r(names)' qui sureg (y1 `x1vars' female ses) (y2 `x2vars' female ses) local dev = -2 * e(ll) if `dev' < `devmin' { local px1 `p`j'' local px2 `p`k'' local devmin `dev' } } } [Run fpexample3.do in Stata] Stata
The results suggest that there is indeed curvature in both relationships Can reject the null hypothesis of linearity at the 1% significance level FP1 vs linear: χ2 = 10.08 (2 d.f.), P = 0.0065 Shows the importance of considering non-linearity Comments
Fractional polynomial models are a simple yet very useful extension of linear functions and ordinary polynomials If you are willing to do some straightforward do-file programming, you can apply them in a bespoke manner to a wide range of Stata regression-type commands and get useful results For (much) more, see Royston & Sauerbrei (2008) book Conclusions