### Flexible Survival Modeling in SAS

Presented to the Nova Scotia SAS Users Group meeting, Feb 22, 2013

By Ron Dewar – Dalhousie University and Cancer Care Nova Scotia

Beyond LIFETEST and PHREG

Flexible survival modeling: today's objectives

- Introduce time-to-event analysis
- Survival analysis in SAS
- Survival modeling: some recent developments
- Availability of software (Stata, R)
- Progress in converting Stata's stpm2, predict and rcsgen to SAS macros
- The road ahead

Time-to-Event

- Duration from start to event for each subject
- Status at time (experienced event or still at risk)
- Non-informative censoring: censoring does not change risk of eventually experiencing event
- Covariate patterns among subjects at risk (at the time of an event)
- Other aspects:
  - left truncation (delayed entry)
  - attained age as time scale
  - fixed or time-varying covariates
  - strata
  - informative censoring (competing risks)
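A minimal sketch of the data layout these bullets describe: one row per subject, a duration, and a status flag. The six records are made up for illustration only; the variable names `pid`, `years` and `_death_` follow the ones used later in the slides, with 0 taken as the censoring value.

```sas
/* Toy time-to-event data: values are invented for illustration */
data tte;
  input pid years _death_;   /* _death_: 1 = event, 0 = censored */
  datalines;
1 0.8 1
2 2.5 0
3 3.1 1
4 4.0 0
5 5.2 1
6 6.0 0
;
run;

/* Product-limit (Kaplan-Meier) estimate; 0 marks censored times */
proc lifetest data = tte;
  time years*_death_(0);
run;
```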

Lifetest, Phreg

- Lifetest
  - plotted output, step/smoothed (survival, hazard)
  - life table estimates
  - significance testing between strata
- Phreg (Cox regression)
  - left truncation
  - covariates (numeric and categorical), strata
  - Wald tests, LR tests, AIC, BIC
  - plotted output (survival, hazard as step functions)
  - regression diagnostics

Lifetest, Phreg (cont.)

- Covariate time-dependency is common and (maybe) more interesting
- Hazard Ratio (HR) is difficult to interpret at an individual level
- Interest in other survival functions – hazard, hazard differences, survival differences, relative survival, cumulative hazard, crude probability (of event)
- Parametric representation of survival functions and time-dependency relations (out of sample prediction)

Royston-Parmar survival model

Royston and Parmar (2002), Statistics in Medicine 21: 2175–2197

- Cumulative hazard scale: equivalent to the hazard scale if there are no time-dependencies in covariate effects
- Restricted cubic splines for the log cumulative hazard
- Implemented in Stata 11 as stpm2 (model fitting) and predict (post-estimation)
- Similar (?) implementation in R
- Also supports: 'cure' models, left truncation, covariate time-dependency, strata, relative survival (excess hazard), net survival, other scales
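In symbols, the proportional-hazards form of the model can be sketched as follows. The notation ($s$, $\gamma$, $\beta$) is chosen here for illustration and does not come from the slides.

$$
\ln H(t \mid x) \;=\; s(\ln t;\, \gamma) \;+\; x^{\top}\beta
$$

Here $H(t \mid x)$ is the cumulative hazard, $s(\cdot)$ is a restricted cubic spline in log time with coefficients $\gamma$, and $x^{\top}\beta$ holds the covariate effects. With a single spline term (no interior knots), $s$ is linear in $\ln t$ and the model reduces to a Weibull model.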

Development plan, resources

- Stata code and academic papers
- Email support from module author
- Replication of all features may not be feasible (limited programming resources, obscure Stata features)
- Basic functionality that replicates results
- SAS code that can be used, understood, modified, enhanced by collaborators

Proportional Hazards Model

Breast cancer survival with proportional hazards

ods pdf file = "&loc.\doc\breast2004\cox regression.pdf";

proc phreg data = _events_;

class sstage (ref = first);

model years*_death_(0) = sstage age1 age2 age3;

title 'Breast cancer survival, 2004 - 2010, Nova Scotia';

title2 'followed to end of 2011';

title3 'Proportional Hazards model with stage, age';

run;

ods pdf close;

…

Number of Observations Read 5475

Number of Observations Used 5475

| Total | Event | Censored | Percent Censored |
|---|---|---|---|
| 5475 | 830 | 4645 | 84.84 |

…

| Parameter | | DF | Parameter Estimate | Standard Error | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| sstage | II | 1 | 0.77412 | 0.10221 | 57.3652 | <.0001 |
| sstage | III | 1 | 1.72777 | 0.10591 | 266.1493 | <.0001 |
| sstage | IV | 1 | 3.23040 | 0.10948 | 870.6004 | <.0001 |
| age1 | | 1 | 0.18070 | 0.12381 | 2.1300 | 0.1444 |
| age2 | | 1 | 0.84798 | 0.11689 | 52.6287 | <.0001 |
| age3 | | 1 | 1.58877 | 0.11833 | 180.2785 | <.0001 |

Outline of analysis steps: R-P model

- Create ‘standard’ dataset: dataset name, key variable names are fixed (%sas_stset)
- Describe and fit model (%sas_stpm2)
- Estimate functions of fitted model parameters (%predict)
- Plot predicted functions (e.g., with PROC SGPLOT)
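The plotting step might look like the sketch below. The layout of the dataset written by `%predict` is not shown in the slides, so the dataset `pred`, its columns `t` and `surv`, and the curve itself (a made-up Weibull-type log cumulative hazard) are all stand-ins.

```sas
/* Stand-in for %predict output: survival on a grid of times.     */
/* The coefficients -2 and 1.2 are arbitrary illustration values. */
data pred;
  do t = 0.1 to 5 by 0.1;
    surv = exp(-exp(-2 + 1.2*log(t)));   /* S(t) = exp(-H(t)) */
    output;
  end;
run;

proc sgplot data = pred;
  series x = t y = surv;
  xaxis label = 'Years since diagnosis';
  yaxis label = 'Survival probability' min = 0 max = 1;
run;
```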

Fit above model using stpm2()

- 3 stage and 3 age binary variables, hazard scale, 5 df for baseline
- Stata command line

stpm2 st2 st3 st4 age1 age2 age3, scale(hazard) df(5)

- SAS macro call

%sas_stpm2( st2 st3 st4 age1 age2 age3, scale=hazard, df = 5);

Stpm2: baseline knots

- Log cumulative hazard is parameterised with restricted cubic spline functions:

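* m spline variables need m + 1 knots: 2 boundary + (m - 1) interior;

* &m., knot1, knot2, ... are illustrative names;

array z(*) rcs1 - rcs&m. ;

array k(*) knot1 - knot%eval(&m. + 1) ;

nk = dim(k) ;

z(1) = y ; * y = log time to event;

do j = 2 to dim(z) ;

phi = ( k(nk) - k(j) )/( k(nk) - k(1) );

z(j) = ( y > k(j) )*( y - k(j) )**3 - phi*( y > k(1) )*( y - k(1) )**3 - (1 - phi)*( y > k(nk) )*( y - k(nk) )**3;

end;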
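Written as an equation, with $y = \ln t$ and knots $k_1 < k_2 < \dots < k_K$ on the log-time scale (the symbols $K$ and $\phi_j$ are notation assumed here), the restricted cubic spline basis is:

$$
z_1 = y, \qquad
z_j = (y - k_j)_+^3 \;-\; \phi_j\,(y - k_1)_+^3 \;-\; (1 - \phi_j)\,(y - k_K)_+^3,
\qquad
\phi_j = \frac{k_K - k_j}{k_K - k_1}, \quad j = 2, \dots, K-1,
$$

where $(u)_+ = \max(u, 0)$. The weights $\phi_j$ make the cubic and quadratic terms cancel for $y > k_K$, so each $z_j$ is linear beyond the last knot. Taking $j = K$ would give $\phi_K = 0$ and a term that is identically zero, which is why $K$ knots yield $K - 1$ spline variables: 5 df for the baseline corresponds to 2 boundary knots and 4 interior knots.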

How many knots to use? Where to put them?

- Too few: unrealistic representation of hazard
- Too many: over-parameterisation. Unrealistic lumps and bumps
- AIC and BIC may be helpful. LR tests are not: models with different knots are not nested
- Some subject matter knowledge can be helpful
- Choice of position probably doesn't matter too much
- 'Standard' positions: centile points of the cumulative distribution of times of non-censored events
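A sketch of how centile-based knots could be computed, assuming knots are placed on the log-time scale using evenly spaced centiles (20, 40, 60, 80 for 5 df) of the uncensored event times. The simulated dataset `demo` and the output names (`k_lo`, `k_p20`, ..., `k_hi`) are hypothetical and are not what the macro necessarily uses.

```sas
/* Simulated stand-in data: exponential times, censored at 7 years */
data demo;
  call streaminit(2013);
  do pid = 1 to 500;
    t = 5 * rand('exponential');
    _death_ = (t <= 7);
    years = min(t, 7);
    output;
  end;
  drop t;
run;

/* Keep uncensored events only and move to the log-time scale */
data _evt_;
  set demo;
  where _death_ = 1;
  lnt = log(years);
run;

/* Boundary knots = min and max; interior knots = centiles */
proc univariate data = _evt_ noprint;
  var lnt;
  output out = knots min = k_lo max = k_hi
         pctlpts = 20 40 60 80 pctlpre = k_p;
run;

proc print data = knots noobs;
run;
```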

Programming in %sas_stpm2()

- Describe model in macro call
- Internal macro strings drive subsequent processing:
- compute spline functions
- 1st derivatives
- orthogonalisation
- Define linear predictor, log likelihood
- derive initial values for optimisation
- fit model (maximum likelihood with proc nlmixed)
- save results for later processing
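The fitting step can be sketched with the `general()` likelihood in PROC NLMIXED. This is not the macro's code. It is the simplest case (one spline term, so the model is a Weibull model) on simulated data, and the parameter names `cons`, `b_x` and `g1` are invented for the example. Data are simulated with H(t) = 0.1 t^1.3 exp(0.7x), so the estimates should land roughly near cons = log(0.1), b_x = 0.7 and g1 = 1.3.

```sas
/* Simulate Weibull survival times, censored at 5 years */
data demo;
  call streaminit(2013);
  do pid = 1 to 2000;
    x = rand('bernoulli', 0.5);
    t = ( -log(rand('uniform')) / (0.1*exp(0.7*x)) )**(1/1.3);
    _death_ = (t <= 5);
    years = min(t, 5);
    lnt = log(years);
    output;
  end;
  keep pid x years _death_ lnt;
run;

proc nlmixed data = demo;
  parms cons = -1 b_x = 0 g1 = 1;    /* starting values */
  bounds g1 > 0;                     /* keeps log(dxb) defined */
  xb  = cons + b_x*x + g1*lnt;       /* log cumulative hazard */
  dxb = g1;                          /* derivative of xb w.r.t. log time */
  ll  = _death_*(log(dxb) + xb) - exp(xb);
  model _death_ ~ general(ll);       /* maximise the sum of ll */
run;
```

With more spline terms, `xb` and `dxb` would grow to include the spline variables and their derivatives, which is what the macro strings in the slides generate.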

Key macro strings

- _null_ data step to build macro strings
- call symput('macro_var', string)
- Linear predictor:
  - independent variables
  - parameters to be estimated
- Log likelihood:
  - function to be maximised
  - linear predictor
  - 1st derivative of linear predictor
  - censor indicator

Linear predictor, Likelihood

%sas_stpm2( st2 st3 st4 age1 age2 age3, scale=hazard, df = 5);

Linear predictor: &xb.

cons*_cons + st2*_st2 + st3*_st3 + st4*_st4 + age1*_age1 + age2*_age2 + age3*_age3 + rcs1*_rcs1 + rcs2*_rcs2 + rcs3*_rcs3 + rcs4*_rcs4 + rcs5*_rcs5

1st derivative: &dxb.

rcs1*_drcs1 + rcs2*_drcs2 + rcs3*_drcs3 + rcs4*_drcs4 + rcs5*_drcs5

Log likelihood:

_death_*(log(&dxb.) + &xb.) - exp(&xb.)
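Where this form comes from, as a short derivation. The symbols $\eta$, $\eta'$ and $d$ are notation assumed here: $\eta$ is the linear predictor (`&xb.`), $\eta' = \partial\eta / \partial \ln t$ is its derivative with respect to log time (`&dxb.`), and $d$ is the death indicator. On the log cumulative hazard scale,

$$
H(t) = \exp(\eta), \qquad
h(t) = \frac{dH}{dt} = \frac{1}{t}\,\eta'\,\exp(\eta).
$$

A subject's log-likelihood contribution is $d \ln h(t) - H(t)$, so

$$
\ell = d\,\bigl[\ln \eta' + \eta - \ln t\bigr] - \exp(\eta).
$$

The term $-d \ln t$ does not involve any parameters and can be dropped, leaving $d\,(\ln \eta' + \eta) - \exp(\eta)$.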

Linear predictor

Covariates:

xb = ifc(&int., 'cons*_cons + ', ' ');

do _i_ = 1 to &n_cov.;

var = scan("&covar.", _i_);

xb = trim(xb)||' '||trim(var)||'*_'||trim(var)||' + ';

end;

Splines:

* put(_i_, 1.) assumes df < 10;

do _i_ = 1 to &df.;

xb = trim(xb)||' rcs'||put(_i_,1.)||'*'||'_rcs'||put(_i_,1.)||ifc(_i_ < &df., " + ", " ");

end;
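The derivative string `&dxb.` is not built on the slides. A self-contained sketch of how it could be assembled in the same spirit is given below, using `catx`/`cats` and `call symputx` rather than the macro's own string handling, which may well differ. The `%let` value is an assumed default for the example.

```sas
%let df = 5;   /* assumed number of baseline spline terms */

data _null_;
  length dxb $ 400;
  dxb = ' ';
  do _i_ = 1 to &df.;
    /* catx skips the blank initial value, so no leading ' + ' appears */
    dxb = catx(' + ', dxb, cats('rcs', _i_, '*_drcs', _i_));
  end;
  call symputx('dxb', dxb);   /* store the string as macro variable dxb */
run;

%put NOTE: dxb = &dxb.;
```

The resulting string should match the form of the 1st derivative shown on the 'Linear predictor, Likelihood' slide (`rcs1*_drcs1 + ... + rcs5*_drcs5`).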

Example: colon cancer

* data must be sorted by unique ID;

proc sort data = example;

by pid;

run;

* set up a standardised survival dataset;

%sas_stset(example, censor(0), years , pid ) ;

* fit model of interest;

%sas_stpm2(st1 st2 st3 nsex, scale=hazard, df=3);

* predicted hazard, survival functions for IIb cases;

%predict(haz, hazard, at = st2:1 zero);

%predict(surv, survival, at = st2:1 zero);
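For reference, the quantities requested here follow from the fitted linear predictor. The hat notation is assumed here, with $\hat\eta(t)$ evaluated at the covariate pattern given in `at =` (presumably st2 set to 1 and all other covariates set to zero, by analogy with Stata's `at()` and `zero` options):

$$
\hat H(t) = \exp\bigl(\hat\eta(t)\bigr), \qquad
\hat S(t) = \exp\bigl(-\hat H(t)\bigr), \qquad
\hat h(t) = \frac{1}{t}\,\frac{\partial \hat\eta(t)}{\partial \ln t}\,\exp\bigl(\hat\eta(t)\bigr).
$$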

Example: time-varying covariate

%sas_stpm2(st1 st2 st3 nsex, scale=hazard, df=3,

tvc= st2 st3 ,

dftvc= st2:2 1 );

* hazard prediction;

%predict(haz1, hazard, at = st2:1 zero);

* Hazard ratio prediction;

%predict(hr1, hratio, hrnum= st2:1 zero,

hrdenom= st2:0 zero);
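With time-varying effects the hazard ratio is a function of time. A sketch of why, in notation assumed here ($s_0$ for the baseline spline, $s_j$ for the spline attached to time-varying covariate $x_j$):

$$
\eta(t \mid x) = s_0(\ln t) + x^{\top}\beta + \sum_{j \in \text{tvc}} s_j(\ln t)\, x_j ,
\qquad
\mathrm{HR}(t) = \frac{h(t \mid x_{\text{num}})}{h(t \mid x_{\text{denom}})}
= \frac{\eta'(t \mid x_{\text{num}})\,\exp\{\eta(t \mid x_{\text{num}})\}}
       {\eta'(t \mid x_{\text{denom}})\,\exp\{\eta(t \mid x_{\text{denom}})\}},
$$

where $\eta' = \partial\eta/\partial \ln t$ and the $1/t$ factors cancel. Without the $s_j$ terms, $\eta'$ is the same in numerator and denominator and the ratio collapses to the constant $\exp\{(x_{\text{num}} - x_{\text{denom}})^{\top}\beta\}$.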

Example 2: time to initiation of chronic opioid use in new cancer patients

- t0: date of new cancer diagnosis
- t1: date of initiation of chronic opioid pain medication
- Censor: death, end of study period (or 365 days)
- Tier: cancer type grouped by 5-year survival probability
- Age: 10-year age groups
- Other covariates: year of diagnosis, urban/rural, sex

%sas_stpm2(t2 t3 a1 a2 a3 a4 a5 a6 nsex nurb y, scale=hazard, df=3, tvc= a2 a6, dftvc = 2);

Example 2

%macro int(row =, col =, sel = ); …

%predict(surv, survival, at = &sel. nsex:1 nurb:1 y:3 zero);

…

%mend;

And then, for each of the 21 combinations of 3 tiers x 7 age groups:

…

%int( row = t2, col = ag2, sel = a1:1 t2:1);

%int( row = t3, col = ag2, sel = a1:1 t3:1);

…

The road ahead

- Documentation (!no, really?)
- Consistency checking
- Confidence intervals for cumulative functions (survival, cumulative hazard)
- Out of sample estimation is inefficient
- Other survival scales (cumulative log odds, probit…)
- Cure models
- Stratified analysis
- Competing risks framework

The future

- Use of an optimisation routine that permits analytic 1st and 2nd derivatives (gradient & Hessian): more efficient prediction, out-of-sample prediction
- Re-code string modification in %predict()
