Application of Fractional Polynomials in Modeling Epidemiological Data

Use of fractional polynomials in medical research Philip Bassey

The selected Papers:

PRESENTATION OUTLINE:
1. Background / motivation for the subject
2. Overview of the selected papers
3. Overview of multinomials / polynomials
4. Overview of fractional polynomials
5. Overview of modeling in epidemiology
6. Application of fractional polynomials in model building in epidemiology / clinical practice
7. Summary

BACKGROUND & MOTIVATION FOR CHOOSING THIS TOPIC:
• In observational studies, we usually come across a mix of continuous, binary, ordered categorical and unordered categorical variables.
• We develop models to explain possible associations between the outcome Y and the covariates X; in predictive models, the interest is in how well the Xs predict Y.
• This usually involves identifying variables with a (strong) influence on the outcome (through their β coefficients); for continuous variables we also try to determine their functional forms.
• For most variables, subject-matter knowledge can guide the modeling; for some variables, however, a data-driven choice is inevitable.

Edmore Marinda, in the article “Use of fractional polynomials in medical research”, has argued that:
• Categorization of continuous variables (e.g. age groups: 20-24, 25-29, …) assumes homogeneity of the trait under consideration within each specified category.
• Categorization may result in overparameterized models, with loss of efficiency and misspecification of the true functional form of a predictor variable.
• FPs are preferable to step functions (categorization) because they control better for confounding.
• Evidence from studies has shown that the fewer the categories created for a continuous variable, the higher the resulting bias.

Edmore Marinda (cont’d):
• Medical knowledge may dictate that the relationship between an outcome and a predictor variable is monotonic, or that there is some leveling off (asymptote) at high or low values of the predictor.
• Therefore, when we “force” nonlinear predictors into linear models by categorizing them, we distort the true association (leading to bias).
• Categorizing confounding variables may result in residual confounding, where the bias due to confounding is not fully removed.
• THEREFORE: there is a need to “investigate functional forms of continuous predictor variables”.

Omer et al.
Aim of the article: to apply MFP methods in building a logistic regression model using an R package.
• The authors observed, rightly, that continuous variables are often encountered in medical studies and are often used to assess risk or prognosis or to select a therapy.
• In the paper, a breast cancer data set with 15 covariates was used to determine which covariates influence breast cancer.
• A statistical comparison was made between the standard logistic regression model and the MFP logistic regression model.
• Their results showed that MFP logistic regression yielded a more accurate model and improved classification accuracy.

Omer et al. cont’d - Some insight into how the FP modeling is done:
• Multivariable fractional polynomials (MFP) are identified as a useful extension of polynomial regression and as a sensible way to model relationships (Royston and Sauerbrei 2008).
• A suitable function selection procedure (FSP) provides a simple way to check whether a linear function (the default) is adequate or whether a non-linear FP function improves the fit substantially.
• The MFP method is facilitated by software that determines whether an explanatory variable is important for the model, and what its functional form should be.
• There are two components of the procedure: (I) backward elimination of covariates that are not statistically significant; and (II) iterative examination of the scale of all continuous covariates.

Omer et al. cont’d - Some insight into how the FP modeling is done:
• The method relies on two significance levels: α1, for the elimination and addition of covariates, and α2, for determining the significance of a fractional polynomial transformation of a continuous covariate.
• The first cycle starts by including all potential explanatory covariates in the model.
• Alternatively, only variables with P < 0.25 (or 0.2) in the univariate analysis are entered into the initial model.
• Dichotomous and categorical variables are not subject to fractional polynomial (FP) transformation and are modeled with one degree of freedom.
• They are tested for their contribution to the model at level α1 using a Wald test.

Omer et al. cont’d - Some insight into how the FP modeling is done:
• Continuous variables are modeled using a closed test procedure, which decides whether they should be retained or removed (using α1) and whether a transformation should be applied (using α2).
• The closed test starts by comparing the best-fitting second-degree fractional polynomial (FP2) with the null model. The term is dropped if the test is non-significant.
• Otherwise, the best-fitting FP2 is compared with the linear term. The linear term is adopted if the test is non-significant.
• Otherwise, the best-fitting FP2 is compared with the best-fitting FP1. If the test is significant, the best-fitting FP2 is accepted; otherwise the best-fitting FP1 is adopted.
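
The closed test for one continuous covariate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four deviances (−2 × log-likelihood) are taken as already computed by some fitting routine, the names `chi2_sf` and `select_function` are hypothetical, and the degrees of freedom (4, 3 and 2) follow the usual convention that an FP2 term counts 4 df, an FP1 term 2 df and a linear term 1 df.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed forms, df 1 to 4)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df == 2:
        return math.exp(-half)
    if df == 3:
        return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    if df == 4:
        return math.exp(-half) * (1.0 + half)
    raise ValueError("this sketch only supports df = 1, 2, 3 or 4")

def select_function(dev_null, dev_linear, dev_fp1, dev_fp2, alpha1=0.05, alpha2=0.05):
    """Closed test for one covariate; returns 'drop', 'linear', 'FP1' or 'FP2'."""
    # Step 1: best FP2 versus the null model (4 df), tested at level alpha1
    if chi2_sf(dev_null - dev_fp2, 4) >= alpha1:
        return "drop"
    # Step 2: best FP2 versus the linear term (3 df), tested at level alpha2
    if chi2_sf(dev_linear - dev_fp2, 3) >= alpha2:
        return "linear"
    # Step 3: best FP2 versus the best FP1 (2 df), tested at level alpha2
    if chi2_sf(dev_fp1 - dev_fp2, 2) >= alpha2:
        return "FP1"
    return "FP2"

# Hypothetical deviances, invented purely for illustration
print(select_function(100.0, 95.0, 90.0, 89.5))
```

With these made-up deviances, the FP2 term is significant against the null model but not against the linear term, so the linear function is kept.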

Omer et al. cont’d - Some insight into how the FP modeling is done:
• The second cycle of iterations begins with a fit of the model containing the significant covariates, either in their original or in their FP-transformed form.
• All covariates are then examined, in descending order of significance, for inclusion, exclusion and possible transformation.
• The procedure stops when two consecutive cycles contain the same covariates with the same FP transformations.

Model building with continuous variables:
• In epidemiological & clinical studies we cannot avoid continuous variables / predictors / risk factors: to model or to categorize?
THE TRADITIONAL APPROACH:
1. MODEL (assume a linear function)
2. CATEGORIZE [dichotomization]: determine the optimal cut-point

Newer approaches to dealing with continuous variables:
• Non-parametric (local influence) models
- Locally weighted (kernel) fits (e.g. lowess)
- Regression splines
- Smoothing splines
• Parametric (non-local influence) models
- Polynomials
- Non-linear curves
- Fractional polynomials [intermediate between polynomials & non-linear curves]

Review of the basics of regression models:

OVERVIEW OF MULTINOMIALS:
• Monomial expressions have only a single term.
• A multinomial is an algebraic expression having more than one term. For example: 5x + 9, 6y^2 + 2y − 5, 9x^3 + 2x^2 + 5.
• The operations involved in forming a multinomial are addition, subtraction and multiplication (+, −, ×); division by a variable is not permitted.
• A polynomial is an expression of finite length made up of constants, variables and non-negative integer exponents, combined by addition, subtraction and multiplication.
• A multinomial is also called a polynomial.

• 5x^2 − 2x^2 − 3x^2 has no degree; it is a zero polynomial.
• A polynomial of degree zero reduces to a single term, i.e. a non-zero constant. E.g. 3x^0 + 2x^0 = 5.
• 2x + y − z + 1 is a polynomial of degree one (a linear polynomial).
• The degree of a polynomial is the largest sum of exponents found in any single term: xyz + x + y + z is a polynomial of degree three.
• Exercise: what is the degree of the polynomial 5x^2 − 2y^2 + 9x^2y^2?
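
The degree rule can be checked mechanically. The sketch below is an illustration only: the representation of a polynomial as a list of (coefficient, {variable: exponent}) pairs and the function name `polynomial_degree` are assumptions made for this example. Like terms are combined first, so that a zero polynomial is reported as having no degree.

```python
from collections import defaultdict

def polynomial_degree(terms):
    """Degree of a polynomial given as (coefficient, {variable: exponent}) pairs.

    Returns None for the zero polynomial, which has no degree.
    """
    combined = defaultdict(float)
    for coefficient, exponents in terms:
        # Terms with the same variables and exponents are like terms
        key = tuple(sorted((v, e) for v, e in exponents.items() if e != 0))
        combined[key] += coefficient
    degrees = [sum(e for _, e in key) for key, c in combined.items() if c != 0]
    return max(degrees) if degrees else None

zero_poly = [(5, {"x": 2}), (-2, {"x": 2}), (-3, {"x": 2})]            # 5x^2 - 2x^2 - 3x^2
constant = [(3, {"x": 0}), (2, {"x": 0})]                              # 3x^0 + 2x^0
linear = [(2, {"x": 1}), (1, {"y": 1}), (-1, {"z": 1}), (1, {})]       # 2x + y - z + 1
cubic = [(1, {"x": 1, "y": 1, "z": 1}), (1, {"x": 1}), (1, {"y": 1}), (1, {"z": 1})]
exercise = [(5, {"x": 2}), (-2, {"y": 2}), (9, {"x": 2, "y": 2})]      # 5x^2 - 2y^2 + 9x^2y^2

for name, poly in [("zero", zero_poly), ("constant", constant), ("linear", linear),
                   ("cubic", cubic), ("exercise", exercise)]:
    print(name, polynomial_degree(poly))
```

For the exercise, the term 9x^2y^2 has exponents summing to 2 + 2 = 4, which is the largest of the three terms.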

• A polynomial all of whose terms have the same degree is said to be a homogeneous polynomial (e.g. 5x^2 + 2x^2 − 3y^2).
• Polynomials of the first, second and third degree are said to be linear, quadratic and cubic.
• Forms in two or three variables are called binary or ternary. For example:
- x^2 + y^2 is a binary quadratic form
- x^2 + y^2 + z^2 − xy − yz − xz is a ternary quadratic form

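Overview of Polynomials:
• y = 2x + z + 1 is a polynomial of degree one (a linear polynomial)
• Second order polynomial in one variable, general form: y = β0 + β1x + β2x^2 + ε
• Second order polynomial in two variables, general form: y = β0 + β1x1 + β2x2 + β11x1^2 + β22x2^2 + β12x1x2 + ε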

• A model is said to be hierarchical if it contains the terms x, x^2, x^3, and so on, in a hierarchy. For example, the model y = β0 + β1x + β2x^2 + β3x^3 + ε is hierarchical, because it contains every term up to the third order.
• An assumption in the usual multiple linear regression analysis is that the explanatory variables are independent of one another. This assumption is not satisfied in a polynomial regression model, because x, x^2, x^3, ... are correlated.
• This multicollinearity can be reduced by centering x, or removed by using orthogonal polynomials [which involve mathematical functions: summation / integration].
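
The effect of centering can be seen on a toy example. The data below (x = 1, ..., 10) and the helper name `pearson` are invented for illustration. Because these x values are symmetric about their mean, centering removes the correlation between the linear and quadratic terms entirely; with skewed data it would usually only reduce it.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

x = [float(i) for i in range(1, 11)]        # toy covariate, 1 to 10
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]

r_raw = pearson(x, [v ** 2 for v in x])
r_centered = pearson(x_centered, [v ** 2 for v in x_centered])

print(round(r_raw, 3))   # about 0.975: x and x^2 are almost collinear
print(abs(r_centered))   # zero up to rounding for this symmetric design
```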

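Polynomial regression models:
• The k-th order polynomial model in one variable is y = β0 + β1x + β2x^2 + ... + βkx^k + ε.
• If we set xj = x^j for j = 1, ..., k, then the model is a multiple linear regression model in the k explanatory variables x1, x2, ..., xk; so the linear model y = β0 + β1x1 + β2x2 + ... + βkxk + ε includes the polynomial regression model.
• Thus the techniques for fitting a linear regression model can be used for fitting the polynomial regression model.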

• The model y = β0 + β1x + β2x^2 + ε is a polynomial regression model in one variable and is called a second-order model or quadratic model.
• The coefficients β1 and β2 are called the linear effect parameter and the quadratic effect parameter, respectively.
• The parameter β0 is interpreted as β0 = E(y) when x = 0, which is meaningful provided the range of the data includes x = 0. If the range of the data does not include x = 0, then β0 has no interpretation.
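
To make the point concrete that a polynomial model is fitted with ordinary linear regression machinery, here is a minimal sketch using only the Python standard library. It builds the columns 1, x, x^2 and solves the least-squares normal equations by Gaussian elimination. The function names and the toy data are assumptions for illustration; in practice one would use a statistical package.

```python
def solve_linear_system(a, b):
    """Solve a·z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z

def fit_polynomial(xs, ys, order):
    """Least-squares estimates (β0, β1, ..., βk) for a k-th order polynomial."""
    design = [[x ** j for j in range(order + 1)] for x in xs]   # columns 1, x, x^2, ...
    k = order + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Toy data generated without noise from y = 1 + 2x + 3x^2
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
beta = fit_polynomial(xs, ys, order=2)
print([round(b, 6) for b in beta])
```

Since the toy data include x = 0, the first coefficient can be read directly as E(y) at x = 0.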

Fractional polynomial models:
• A generalization of the polynomial function, called fractional polynomials (FP for short), was proposed by Royston and Altman (1994) and developed further by Royston and Sauerbrei (2008).
Description for one covariate, X:
• A fractional polynomial of degree m for X with powers p1, …, pm is given by FPm(X) = β0 + β1X^p1 + … + βmX^pm
• The powers p1, …, pm are taken from the special set {−2, −1, −0.5, 0, 0.5, 1, 2, 3}

Fractional polynomial models:
• The general m-degree fractional polynomial function of a continuous variable X is given by FPm(X) = β0 + β1X^p1 + … + βmX^pm
• By convention, X^0 = log X.
• Where p1 = p2, the two FP terms are β1X^p1 + β2X^p1 log X [the so-called repeated-powers FP2 model].
• FPs differ from conventional polynomials in that the power p can be a non-integer (or negative) number.
• Usually m = 1 or m = 2 is sufficient for a good fit.
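
A small sketch of how the FP terms for a single value of X could be computed, including the two conventions mentioned here: power 0 means log X, and a repeated power multiplies the previous term by log X. The function name `fp_basis` is hypothetical, and X is assumed to be strictly positive.

```python
import math

def fp_basis(x, powers):
    """Return the FP terms of a positive value x for the given powers."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    terms = []
    previous_power = None
    for p in sorted(powers):
        if p == previous_power:
            # Repeated power: previous term multiplied by log(x)
            term = terms[-1] * math.log(x)
        else:
            # Power 0 stands for log(x); any other power is x ** p
            term = math.log(x) if p == 0 else x ** p
        terms.append(term)
        previous_power = p
    return terms

print(fp_basis(4.0, (0.5,)))     # FP1 with power 0.5: square root of x
print(fp_basis(4.0, (-1, 3)))    # FP2 with powers (-1, 3): 1/x and x^3
print(fp_basis(4.0, (2, 2)))     # repeated powers: x^2 and x^2 * log(x)
```

The linear predictor is then β0 plus the sum of each term multiplied by its coefficient.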

The FP1 and FP2 models:
• There are 8 possible FP1 models and 36 possible FP2 models (28 with two different powers plus 8 with repeated powers)
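
These counts can be verified by enumerating the candidate power combinations from the set {−2, −1, −0.5, 0, 0.5, 1, 2, 3}. The short sketch below uses only the standard library; the variable names are chosen for this illustration.

```python
from itertools import combinations_with_replacement

S = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

fp1_models = [(p,) for p in S]
# Unordered pairs of powers, allowing p1 == p2 (the repeated-powers models)
fp2_models = list(combinations_with_replacement(S, 2))
repeated = [pair for pair in fp2_models if pair[0] == pair[1]]

print(len(fp1_models), len(fp2_models), len(repeated))  # 8 36 8
```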

• FP curves are based on the set of powers S = {−2, −1, −0.5, 0, 0.5, 1, 2, 3}
• Usually, FP models are of first or second degree
• They include the linear, reciprocal, logarithmic, square root and square transformations of x
• If the values of the powers p1, …, pm are known, fitting a fractional polynomial model is similar to conducting a conventional linear regression analysis
• However, the powers of a fractional polynomial model are usually unknown and need to be estimated from the data
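
Estimating the power from the data amounts to a search over the set S. The sketch below illustrates this for an FP1 model with a continuous outcome, choosing the power with the smallest residual sum of squares; for a logistic model the deviance would take the place of the RSS. The function names and the noise-free toy data are assumptions made for this example.

```python
import math

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def transform(x, p):
    """FP1 transformation: log(x) for power 0, otherwise x ** p."""
    return math.log(x) if p == 0 else x ** p

def rss_simple_regression(t, y):
    """Residual sum of squares from regressing y on a single covariate t."""
    n = len(t)
    mean_t, mean_y = sum(t) / n, sum(y) / n
    sxx = sum((a - mean_t) ** 2 for a in t)
    sxy = sum((a - mean_t) * (b - mean_y) for a, b in zip(t, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(t, y))

def best_fp1_power(x, y):
    """Try every power in POWERS and return the best one with all RSS values."""
    rss = {p: rss_simple_regression([transform(v, p) for v in x], y) for p in POWERS}
    return min(rss, key=rss.get), rss

# Toy data: y depends on log(x), so power 0 should be selected
x = [float(i) for i in range(1, 11)]
y = [3.0 + 2.0 * math.log(v) for v in x]
best, rss = best_fp1_power(x, y)
print(best)  # 0
```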

• An additional extension is to models that involve repeated powers, such as (1, 1). In this case, the second term is multiplied by ln(X). For example, the model FP(2, 2) is: Y = β0 + β1X^2 + β2X^2 ln(X)
• Models that involve only two terms are usually adequate for FP analysis.

Model Selection for Fractional Polynomials:
• The MFP algorithm combines backward elimination with the FP function selection procedure
• With many continuous predictors, selecting the best FP for each becomes more difficult
• The best-fitting fractional polynomial model needs to be chosen using appropriate model selection criteria
• The MFP algorithm provides a standardized approach to variable and function selection
• Binary and categorical variables can be added to the multivariable model as well

Model Selection for Fractional Polynomials:
• Investigation of treatment-covariate and covariate-covariate interactions requires statistical tests (MFPI)
• A continuous-by-continuous interaction term (linear-by-linear product) is not sensible if a main effect (e.g. a prognostic effect) is non-linear

MFPI STEPS:
• Identify one continuous factor X of interest
• Use other prognostic factors to build an adjustment model, e.g. by MFP
• Combine backward elimination with a search for the best FP function
• Find the best FP2 transformation of X with the same powers in each treatment group
• Perform an LRT of equality of the regression coefficients
• Test against the main effects model (no interaction) based on χ2 with 2 df

Continuous by continuous interaction:
• Identify continuous variables Z1 and Z2, and confounders X
• Apply MFP to X, Z1 and Z2, forcing Z1 and Z2 into the model. FP functions f1(Z1) and f2(Z2) will be selected for Z1 and Z2
• Add the term f1(Z1) * f2(Z2) to the model chosen and use an LRT to test for interaction
• Often f1(Z1) and/or f2(Z2) are linear
• Check all pairs of continuous variables for an interaction
• Check (graphically) the interactions for artefacts
• Use forward stepwise selection if more than one interaction remains

Illustrated example: Whitehall 1 study
• The study included 17,370 male Civil Servants aged 40-64 years
• Measurements included: age, cigarette smoking, BP, cholesterol, height, weight, job grade
• Outcome of interest: all-cause mortality at 10 years ⇒ 1670 (9.7%) died
• Logistic regression was applied and compared with MFP analysis

Illustrated example : Whitehall 1 (Study )

Illustrated example: Exploring interactions
Consider, e.g., age and weight
• Main effects: age is linear; weight is FP2 (-1, 3)
• Interaction? LRT: χ2 = 5.27 (2 df, p = 0.07)
• ⇒ no (strong) interaction

Illustrated example: What if they had erroneously assumed that the effect of weight was linear?
Interaction?
• Include age*weight in the model
• LRT: χ2 = 8.74 (1 df, p = 0.003)
• ⇒ highly significant interaction
Model check:
• They categorized age into 4 equal-sized groups (quartiles)
• They computed a running line (smooth) of the binary outcome on weight in each group
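
The two p-values quoted for the likelihood ratio tests in this example (χ2 = 5.27 on 2 df and χ2 = 8.74 on 1 df) can be reproduced from the chi-square distribution. The sketch below uses the closed-form tail probabilities for 1 and 2 degrees of freedom, so it needs only the standard library; the function name `lrt_p_value` is hypothetical.

```python
import math

def lrt_p_value(chi2, df):
    """Upper-tail chi-square probability for df = 1 or df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only covers 1 or 2 degrees of freedom")

# Weight modeled as FP2 (-1, 3): interaction with age tested on 2 df
print(round(lrt_p_value(5.27, 2), 2))   # 0.07
# Weight wrongly modeled as linear: age*weight tested on 1 df
print(round(lrt_p_value(8.74, 1), 3))   # 0.003
```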

Illustrated example:

Illustrated example:
• The running line smooths are roughly parallel across age groups ⇒ no (strong) interaction
• The smoothed probabilities are roughly equally spaced ⇒ the effect of age is linear
• If they had erroneously assumed that the effect of weight was linear, the estimated slopes of weight within age groups would have indicated a strong qualitative interaction between age and weight

Software sources for MFP:
• The most comprehensive implementation is in Stata
- The command mfp is part of Stata
- MFPI: treatment/covariate interactions
- MFPIgen: interaction between two continuous variables
- MFPT: time-varying effects in survival data
• Versions for SAS and R are also available

Summary:
• A model is said to be linear when it is linear in its parameters
• Complex non-linear relationships can be approximated by polynomial models
• Fractional polynomials (FP) have been proposed in epidemiological studies to investigate functional forms of continuous predictor variables

COMMENTARY BY THITIYA AFTER WHICH WE WILL HAVE THE PRACTICE SESSION

MULTINOMIAL LOGITS & FRACTIONAL POLYNOMIALS
Our example is a study of the determinants of aftercare placement for psychiatrically hospitalized patients (from the book Applied Logistic Regression by Hosmer and Lemeshow).
PLEASE UPLOAD THE DATA SET AND DO-FILE PROVIDED ON THE CEB WEBSITE

PLACE3 =
• 0 = Outpatient or day treatment
• 1 = Intermediate residential
• 2 = Residential

RUN THE UNIVARIATE MLOGIT COMMANDS (from Age to viol): the p-value for selecting eligible covariates is 0.2

Now RUN the multivariate command (the p-value in this case is 0.05)
