Survival Analysis in SAS. UCLA IDRE Statistical Consulting Group Winter 2014. 1. Introduction. Survival analysis models factors that influence the time to an event . Or time to “failure” Shouldn’t use linear regression Non-normally distributed outcome Censoring.

Download Presentation

Survival Analysis in SAS

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

1. Introduction • Survival analysis models factors that influence the time to an event. • Or time to “failure” • Shouldn’t use linear regression • Non-normally distributed outcome • Censoring

1. proclifetest and procphreg • proclifetest • Non-parametric methods • procphreg • Cox proportional hazards regression • Still very popular survival analysis method

1.1. Sample dataset • Worcester Heart Attack Study • WHAS500 • Examines factors that affect time to death after heart attack • 500 subjects • Many observations are right-censored • Loss to follow up • Do not know when they died, but do know at least how long they lived

1.1. Sample dataset • Important variables • lenfol: length of followup, the outcome • fstat: the censoring variable, loss to followup=0, death=1 • age: age at hospitalization • bmi: body mass index • hr: initial heart rate • gender: males=0, females=1

2.1. Data preparation – Structure • Two ways SAS accepts survival data • One row per subject • proclifetest and procphreg both accept • One variable for time to event One censoring variable • Optional if everyone fails • Explanatory variables fixed

2.1. Data preparation – Structure • Multiple rows per subject • Only procphreg accepts • Follow up time partitioned into intervals • Start and stop variables define endpoints • One variable defines count of events in each interval • 0 or 1 in Cox model – same as censoring • Explanatory variables may vary within subject

3. Nonparametric survival analysis • Nonparametric methods used to get an idea of overall survival • Descriptive and exploratory • Make no assumptions about shape of survival function • But do not estimate magnitude of covariate effects either

3.1. The Kaplan Meier estimate of the survival function • Kaplan-Meier survival function estimates probability of survival up to each event time in data from the beginning of follow up time • Unconditional survival probability • Basically calculated by multiplying proportion of those at risk who survive each interval up to interval of interest • S(3rd interval) = prop_surv(1st) x prop_surv(2nd) x prop_surv(3rd)

3.1.2. K-M estimates from proclifetest • Requires use of time statement • Specify event time variable at minimum • Censoring variable also if any censoring • Put values representing censoring in () • atrisk option add Number at Risk to output table proclifetestdata=whas500 atrisk outs=outwhas500; timelenfol*fstat(0); run;

3.1.2. K-M survival, later times Survival estimate changes just alittle, even though many leave study – Censoring doesn’t change survival

3.1.3 Graphing K-M estimate • proclifetest produces graph of overall survival by default • We add plot=survival(cb) to get confidence bands proclifetestdata=whas500 atrisk plots=survival(cb) outs=outwhas500; timelenfol*fstat(0); run;

3.1.3 Graphing K-M estimate Step function drops when someone dies Last few observations are censored, Tick marks are censored observations

3.2. Nelson-Aalen estimator of cumulative hazard function • Calculated as sum of proportion of those at risk who died in each interval up to time t • H(3rd interval) = prop_died(1st) + prop_died(2nd) + prop_died(3rd) • Interpreted as expected number of events from beginning to time t • Has a simple relationship with survival • S(t) = exp(-H(t))

3.2. Nelson-Aalen from proclifetetst • If you add “nelson” to proclifetest statement, SAS will add Nelson-Aalen cumulative hazard to survival table proclifetestdata=whas500 atrisknelson; timelenfol*fstat(0); run;

3.3. Median and mean survival times from proclifetest • proclifetest will produce estimates of mean and median survival time by default • Because distribution is very skewed, median often better indicator of “middle”

3.4. Comparing survival functions using nonparametric methods • We can specify a covariate on the “strata” statement to compare survival between levels of the covariate • SAS will produce graph and tests of comparison proclifetestdata=whas500 atrisk plots=survival(atriskcb) outs=outwhas500; strata gender; timelenfol*fstat(0); run;

3.4. Gender appears to affect survival • 3 chi-squared based tests of equality • 2 differ in how much they weight each interval • Log rank = all intervals equal • Wilcoxon = earlier times more weight

4. Estimating the hazard function • With nonparametrics, we model the survival (or cumulative hazard) function • With regression methods, we look at the hazard function, h(t) • Hazard function gives instantaneous rate of failure at time t • Hazard function should be strictly nonnegative • Another reason we shouldn’t use linear regression

4.1. The Cox proportional hazards model - parameterization • The Cox model is typically parameterized like so: • where • = baseline hazard function of reference group • = predictors in the model • = regression coefficients • always evaluates to positive, which ensures hazard rate is always positive (or 0) • Nice!!

4.2. The baseline hazard rate • Notice that the baseline hazard rate cancels out of the hazard ratio formula: • This means we need not make any assumptions about the baseline hazard rate • The fact that we can leave it unspecified is one of the great strengths of the Cox model • Since the baseline hazard is not parameterized, the Cox model is semiparametric

5.1. Fitting a Cox regression model in procphreg • Model statement works similarly to proclifetest (provided one row per subject structure) • A variable for failure time • A variable for censoring (optional if everyone fails) • Numbers in parentheses represent censoring procphregdata = whas500; class gender; modellenfol*fstat(0) = gender age; run;

5.1. phreg output: model fit • Model fit statistics used to compare between models • Lower is generally better • SBC also known as BIC

5.1. phreg output: test vs null model • Tests overall fit of model vs model with no predictors • Null model assumes everyone has same hazard rate • Prefer likelihood ratio in small samples

5.2. Graphs of survival function after Cox regression • When “plots=survival” is specified on procphreg statement, SAS will produce survival curve for “reference” group • At reference group for all categorical covariates • At mean of all continuous covariates procphregdata = whas500 plots=survival; class gender; modellenfol*fstat(0) = gender age; run;

5.2.1. Use the baseline statement to generate survival plots by group • To get comparison curves, need to use baseline statement in procphreg • A little tricky to use at first • First create dataset of covariate values for which we would like survival curves • Specify this dataset on “covariates=“ option on baseline statement • Use group= option to get separate graphs by group • Use rowid = option to get separate lines by group • Additionally need to specify “plots(overlay=group)” for separate graphs (with possibly separate lines per graph), or at least “plots(overlay)” to get separate lines

5.2.1. Use the baseline statement to generate survival plots by group datacovs; format gender gender.; input gender age; datalines; 0 69.845947 1 69.845947 ; run; procphregdata = whas500 plots(overlay)=(survival); class gender; modellenfol*fstat(0) = gender age; baselinecovariates=covsout=base / rowid=gender; run;

5.3 Interactions in Cox model • Interactions allow the effect of one covariate to vary with the level of another • Perhaps effect of age differs by gender • Can model interactions between terms using “|” • Can also model quadratic (and higher order) effects this way • We suspect bmi has quadratic effect (because high and low ends tend to have worse health outcomes)

5.3 Interactions in Cox model procphregdata = whas500; class gender; modellenfol*fstat(0) = gender|agebmi|bmi hr; run;

5.3 Interactions in Cox model • Interactions are significant • Age effect depends on gender • Less “severe” for females (negative interaction term) • The effect of bmi depends on what the bmi score is • Begins negative (at bmi=0, an unrealistic value) and flattens as bmi rises

5.4. Use hazardratio statements and graphs to interpret effects • hazardratio statement is useful for interpreting interactions • Hazard ratios not printed in regression table for terms involved in interactions • Because hazard ratio changes with level of interacting covariate • Useful for interpreting effects of categorical and continuous predictors and their interactions

5.4. Use hazardratio statements and graphs to interpret effects • Syntax of hazardratio statement • After keyword hazardratio, you can apply an optional label to the hazard ratio • Then the variable whose levels we would like to compare in the hazard ratio • For interactions, after “/”, use option “at=“ to specify level of interacting covariate