Survival Analysis. Statistical methods for analyzing longitudinal data on the occurrence of events.
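[Figure: Randomized Clinical Trial (RCT), outcome "Disease". A disease-free, at-risk cohort from the target population undergoes random assignment (one arm labelled Control) and is followed over TIME; subjects end as Disease or Disease-free.]
[Figure: Randomized Clinical Trial (RCT), outcome "Cured". A patient population from the target population undergoes random assignment (one arm labelled Control) and is followed over TIME; subjects end as Cured or Not cured.]
[Figure: Randomized Clinical Trial (RCT), outcome "Dead". A patient population from the target population undergoes random assignment (one arm labelled Control) and is followed over TIME; subjects end as Dead or Alive.]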
There are generally three reasons why censoring occurs:
[Figure: follow-up timelines for eight RRT patients (columns: Patients, Status), each running from "start of RRT" to an endpoint. Endpoints shown: death (event); recovery of renal function, loss to follow-up, and death due to competing cause (censored). Time axis marks: 01-01-1996, 31-12-2000, 31-12-2005.]
The incidence rate of death for renal replacement therapy (RRT) patients
Example – Survival time on RRT: events & censored observations
Incident RRT patients in the ERA-EDTA Registry were included in an analysis of patient survival on RRT. As in most survival studies, patients were recruited over a period of time (1996-2000, the inclusion period) and were observed up to a specific date (31 December 2005, the end of the follow-up period). During this period the event of interest was ‘death while on RRT’, whereas censoring took place at recovery of renal function, at loss to follow-up, and at 31 December 2005.
Survival times of eight patients at risk of death on RRT. The inclusion period was 1996-2000, whereas follow-up ended on 31 December 2005.
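The bookkeeping behind such a figure can be sketched in a few lines of Python. This is a minimal illustration, not the registry's actual procedure: the function name `survival_record`, the reason labels, and all patient dates are hypothetical. Only the rule "death on RRT is the event, everything else is censored, and observation stops on 31 December 2005" comes from the example above.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2005, 12, 31)  # end of the follow-up period in the example

def survival_record(start, end, reason):
    """Return (time_in_days, event) for one patient.

    event is 1 only for death while on RRT; any other endpoint is censored (0).
    Patients still under observation, or whose endpoint falls after the end
    of follow-up, are censored at END_OF_FOLLOW_UP.
    """
    if end is None or end > END_OF_FOLLOW_UP:
        end, reason = END_OF_FOLLOW_UP, "end of follow-up"
    event = 1 if reason == "death" else 0
    return (end - start).days, event

# Hypothetical patients: (start of RRT, endpoint date, endpoint reason)
patients = [
    (date(1996, 3, 1), date(1999, 3, 1), "recovery of renal function"),
    (date(1997, 6, 15), None, None),                     # still on RRT at end of study
    (date(1998, 1, 10), date(2001, 1, 10), "death"),
    (date(2000, 5, 20), date(2003, 5, 20), "loss to follow-up"),
]

records = [survival_record(*p) for p in patients]
for time, event in records:
    print(time, "event" if event else "censored")
```

Each record is the (time, status) pair that the Kaplan-Meier method and the Cox model take as input.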
Assumptions related to censoring
Example 2 – Survival probability in RRT patients with ESRD due to diabetes mellitus and other causes
In a sample of 50 RRT patients taken from a study on diabetes mellitus, survival time started running at the moment a patient was included in the study, in this case at the start of RRT. Patients were followed until death or censoring. The survival probability was calculated using the Kaplan-Meier method. Subsequently, the survival of patients with ESRD due to diabetes mellitus was compared to the survival of those with ESRD due to other causes.
P = 0.04
Example: Remission time of acute leukemia
6-MP (Group = 1)
6,6,6,6+,7,9+,10,10+,11+,17+,19+,20+,22,23,25+,32+,32+,34+,35+
Placebo (Group = 2)
1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17,22,23
In JMP, 1 is used to denote censored times and 0 non-censored times. In the lists above, a trailing + marks a censored time.
E.g. for Group 1, the first 8 observations are: 6, 6, 6, 6+, 7, 9+, 10, 10+
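As a rough sketch of what the software does with these values, the Python below converts the "+" notation into (time, censor) pairs using the same 1 = censored, 0 = non-censored coding, then computes the Kaplan-Meier (product-limit) estimate for Group 1. The function names `parse_times` and `kaplan_meier` are illustrative, not JMP functions, and the data are copied exactly as listed above.

```python
def parse_times(text):
    """Turn '6,6+,7' into [(6, 0), (6, 1), (7, 0)].

    The second element is the censor flag: 1 = censored ('+'), 0 = event observed.
    """
    records = []
    for token in text.split(","):
        token = token.strip()
        censored = token.endswith("+")
        records.append((int(token.rstrip("+")), 1 if censored else 0))
    return records

def kaplan_meier(records):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    survival, curve = 1.0, []
    event_times = sorted({t for t, censor in records if censor == 0})
    for t in event_times:
        at_risk = sum(1 for u, _ in records if u >= t)           # still under observation at t
        events = sum(1 for u, c in records if u == t and c == 0)  # events exactly at t
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

group1 = parse_times("6,6,6,6+,7,9+,10,10+,11+,17+,19+,20+,22,23,25+,32+,32+,34+,35+")
curve = kaplan_meier(group1)
for t, s in curve:
    print(f"t = {t:2d}  S(t) = {s:.3f}")
```

Censored patients never trigger a drop in the curve, but they do count in the at-risk denominator up to the time they are censored.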
[Figure: survival curves for Group 1 (6-MP) and Group 2 (Placebo).]
We can clearly see that the remission time (the “survival” time) is longer for the treatment (6-MP) group than for the control group. The log-rank and Wilcoxon tests for comparing the “survival” experience of both groups suggest that a statistically significant difference exists (p < .0001).
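For readers who want to see what the log-rank test computes, here is a minimal two-group sketch in plain Python. It is not JMP's implementation: the function name `logrank_test` is illustrative, the data are the two lists exactly as given in this deck (with censor = 1 for censored times), and the p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math

def logrank_test(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, censor) pairs, censor = 1 if censored, 0 if
    the event was observed. Returns (chi_square, p_value) on 1 degree of freedom.
    """
    pooled = group_a + group_b
    event_times = sorted({t for t, c in pooled if c == 0})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for u, _ in group_a if u >= t)            # at risk in group A
        n_b = sum(1 for u, _ in group_b if u >= t)            # at risk in group B
        d_a = sum(1 for u, c in group_a if u == t and c == 0)  # events in A at t
        d_b = sum(1 for u, c in group_b if u == t and c == 0)  # events in B at t
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n                    # observed minus expected in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))            # upper tail, chi-square 1 df
    return chi_square, p_value

six_mp = [(6, 0), (6, 0), (6, 0), (6, 1), (7, 0), (9, 1), (10, 0), (10, 1), (11, 1),
          (17, 1), (19, 1), (20, 1), (22, 0), (23, 0), (25, 1), (32, 1), (32, 1),
          (34, 1), (35, 1)]
placebo = [(t, 0) for t in
           [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]]

chi_square, p_value = logrank_test(six_mp, placebo)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.2g}")
```

At every event time the test compares the events observed in one group with the number expected if both groups shared the same hazard. A large accumulated difference, relative to its variance, points to different survival experiences.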
Retrospective cohort study. From the December 2003 BMJ: “Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study”
What the Kaplan-Meier method and the log-rank test can and cannot do…
The idea is this:
Assume that times-to-event for individuals in your dataset follow a continuous probability distribution (typically a right-skewed distribution, generally not normal!).
For all possible times Ti after baseline, there is a certain probability that an individual will have an event at exactly time Ti. For example, human beings have a certain probability of dying at ages 3, 25, 80, and 140: P(T=3), P(T=25), P(T=80), and P(T=140). These probabilities are obviously vastly different.
People have a high chance of dying in their 70’s and 80’s;
BUT they have a smaller chance of dying in their 90’s and 100’s, because few people make it long enough to die at these ages.
Probability density function: f(t)
In the case of human longevity, Ti is unlikely to follow a normal distribution, because the probability of death is not highest in middle age, but at the beginning and end of life. Hypothetical data:
Shows how failure times are distributed. If we had no censoring, a histogram of the survival times of, say, ESRD patients would give us an impression of what the probability density function, f(t), looks like.
The smoothed curve added to the histogram is a visualization of f(t) based upon a sample of patients with ESRD.
F(t) is the CDF of f(t), and is “more interesting” than f(t).
Survival function: 1 - F(t)
The goal of survival analysis is to estimate and compare survival experiences of different groups.
Survival experience is described by the cumulative survival function:
$S(t) = 1 - F(t) = P(T > t)$
Example: If t = 100 years, S(100) = S(t=100), which is the probability of surviving beyond 100 years.
Same hypothetical data, plotted as cumulative distribution rather than density:
Recall f(t)
Hazard rate is an instantaneous incidence rate.
Think of it like the rate of change of your chance of dying, like a speedometer on a car racing towards death.
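In words: the probability, per unit of time, that if you survive to t, you will succumb to the event in the next instant:
$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$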
This is subtle, but the idea is:
A possible set of probability density, failure, survival, and hazard functions.
f(t)=density function
F(t)=cumulative failure = P(T < t)
S(t)=cumulative survival
h(t)=hazard function
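The four functions are tied together by S(t) = 1 - F(t) and h(t) = f(t)/S(t). The Python sketch below evaluates them for one concrete choice, a Weibull distribution, which is an assumption made purely for illustration (the deck does not say which distribution its plots use). The shape and scale values are arbitrary.

```python
import math

def weibull_functions(t, shape, scale):
    """Return (f, F, S, h) at time t > 0 for a Weibull(shape, scale) time-to-event."""
    z = (t / scale) ** shape
    S = math.exp(-z)                                                # cumulative survival
    F = 1.0 - S                                                     # cumulative failure
    f = (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-z)  # density
    h = (shape / scale) * (t / scale) ** (shape - 1)                # hazard
    return f, F, S, h

shape, scale, t, dt = 1.5, 3.0, 2.0, 1e-6
f, F, S, h = weibull_functions(t, shape, scale)

# Hazard as a conditional failure rate over a tiny interval:
# P(t <= T < t + dt | T >= t) / dt, computed from F and S alone.
F_next = weibull_functions(t + dt, shape, scale)[1]
h_numeric = (F_next - F) / (dt * S)

print(f"f = {f:.4f}, F = {F:.4f}, S = {S:.4f}, h = {h:.4f}, numeric h = {h_numeric:.4f}")
```

With shape = 1 the Weibull reduces to the exponential distribution, whose hazard is constant at 1/scale. Shapes above 1 give a hazard that increases with time.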
In order to understand the distinction between OR’s and HR’s we need to discuss the difference between incidence rates and proportions.
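The Cox PH Model for individuals with k covariate values $x_1, \ldots, x_k$ says the hazard function for these individuals is given by:
$h(t \mid x_1, \ldots, x_k) = h_0(t)\, e^{\beta_1 x_1 + \cdots + \beta_k x_k}$
where $h_0(t)$ is the baseline hazard function, which is assumed to be the same for all individuals. The covariates then multiply the baseline hazard to give a covariate-specific hazard function.
Consider population i, which consists of all individuals with k covariate values $x_{i1}, \ldots, x_{ik}$,
and population j, which consists of all individuals with k covariate values $x_{j1}, \ldots, x_{jk}$; then the hazard ratio for comparing population i to population j is given by:
$HR = \frac{h(t \mid x_{i1}, \ldots, x_{ik})}{h(t \mid x_{j1}, \ldots, x_{jk})} = e^{\beta_1 (x_{i1} - x_{j1}) + \cdots + \beta_k (x_{ik} - x_{jk})}$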
Example 1: Suppose we are modeling the hazard function for developing lung cancer using smoking status ($x_1$, coefficient $\beta_1$) and age ($x_2$, coefficient $\beta_2$) as covariates. Find the HR for 60-year-old smokers (+1, 60) vs. 60-year-old nonsmokers (-1, 60).
$HR = e^{\beta_1 (1 - (-1)) + \beta_2 (60 - 60)} = e^{2\beta_1}$
Thus the HR associated with smoking for 60-year-old individuals is $e^{2\beta_1}$. Notice the similarity to the interpretation of coefficients in a logistic regression model. Note: The particular age is irrelevant as long as it is the same for both populations being compared.
Example 2: Suppose we are modeling the hazard function for developing lung cancer using smoking status and age as covariates. Find the HR for 70-year-old smokers (+1, 70) vs. 60-year-old smokers (+1, 60).
$HR = e^{\beta_1 (1 - 1) + \beta_2 (70 - 60)} = e^{10\beta_2}$
Thus the HR associated with a 10-year increase in age starting at age 60 is $e^{10\beta_2}$. Notice the similarity to the interpretation of coefficients for continuous variables in a logistic regression model. Note: This would be the same if we compared any two ages that are 10 years apart. Also, smoking status is irrelevant if it is the same for both populations we are considering.
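Both examples can be checked numerically with a small Python sketch. The coefficient values below are hypothetical (no fitted lung cancer model is given here), and the code assumes smoking is coded +1 for smokers and -1 for nonsmokers, with age in years. The function name `hazard_ratio` is illustrative.

```python
import math

def hazard_ratio(betas, x_i, x_j):
    """HR comparing population i to population j under a Cox PH model.

    The baseline hazard cancels, leaving exp(sum_k beta_k * (x_ik - x_jk)).
    """
    return math.exp(sum(b * (xi - xj) for b, xi, xj in zip(betas, x_i, x_j)))

# Hypothetical coefficients, for illustration only: (smoking, age)
betas = (0.40, 0.05)

hr_smoking = hazard_ratio(betas, (+1, 60), (-1, 60))  # smokers vs. nonsmokers at age 60
hr_age = hazard_ratio(betas, (+1, 70), (+1, 60))      # 10-year age difference among smokers

print(f"HR smoking = {hr_smoking:.3f}, HR 10 years of age = {hr_age:.3f}")
```

Changing the shared age in the first comparison, or the shared smoking status in the second, leaves the result unchanged, because covariates that are equal in both populations drop out of the exponent.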
Here we have two dichotomous covariates in a Cox PH model for remission time.
The hazard ratio (HR) for females is then .752, so females have less risk of relapse (the end of remission) than males. The hazard ratio (HR) for males is the reciprocal, 1/.752 = 1.33, so males have 1.33 times the risk of relapse. These are only point estimates, however, so we also need to consider CIs.
The hazard ratio (HR) for receiving the active treatment (6-MP) is .2005, and the hazard ratio (HR) for those receiving placebo is therefore 1/.2005 = 4.988; thus those receiving placebo have about 5 times the risk of relapse (the end of remission). Again we should examine CIs for these HRs.
Here we have one continuous covariate in the Cox PH model for remission time: log2(WBC), the base-2 log of the white blood cell count.
The estimated coefficient for log2(WBC) is 1.59. A unit increase in log2(WBC) corresponds to doubling the WBC, so if we compare two populations of patients, one with double the WBC of the other, the estimated HR is $e^{\hat{\beta}}$, reported as 4.92 (with the rounded coefficient, $e^{1.59} \approx 4.90$, so the reported value presumably uses the unrounded estimate). So the population with double the WBC has 4.92 times the risk of relapse (the end of remission).
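The same reasoning extends to any fold change in WBC, since multiplying WBC by a factor c adds log2(c) to the covariate. A minimal Python sketch, using the rounded coefficient 1.59 quoted above (the helper name `hr_for_fold_change` is illustrative):

```python
import math

def hr_for_fold_change(beta_log2, fold):
    """HR when the original-scale covariate is multiplied by `fold`,
    given the Cox coefficient for its base-2 logarithm."""
    return math.exp(beta_log2 * math.log2(fold))

beta = 1.59  # rounded coefficient for log2(WBC) from the text

hr_double = hr_for_fold_change(beta, 2)     # exp(1.59)
hr_quadruple = hr_for_fold_change(beta, 4)  # exp(2 * 1.59), the doubling HR squared

print(f"doubling: {hr_double:.2f}, quadrupling: {hr_quadruple:.2f}")
```

Hazard ratios compound multiplicatively: two successive doublings square the HR for one doubling.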
Next we consider a Cox PH model using treatment, sex, and log2(WBC) as covariates. We can see that both Treatment and log2(WBC) are statistically significant, while Sex of the patient is not. JMP can be used to calculate the Risk Ratios or Hazard Ratios (HR).
The estimated HR associated with not receiving the 6-MP therapy is 4.02 with a CI (1.698, 10.307) and the estimated HR associated with doubling the WBC is 4.92 with a CI (2.65, 9.73).
The estimated HR for males vs. females is 1.30; however, the CI includes 1, so we cannot say there is increased risk of recurrence for males. This is further supported by the p-value = .5596.
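Two routine manipulations of these results can be sketched in Python: switching the reference group (which inverts the HR and its CI) and checking whether a CI excludes 1. The helper names are illustrative, and only the treatment HR and CI quoted above are used as inputs.

```python
def invert_hr(hr, ci_low, ci_high):
    """Switch the reference group: HR becomes 1/HR and the CI endpoints
    are inverted and swapped so the lower bound stays below the upper bound."""
    return 1 / hr, 1 / ci_high, 1 / ci_low

def ci_excludes_one(ci_low, ci_high):
    """True when the interval lies entirely above or entirely below HR = 1."""
    return ci_low > 1 or ci_high < 1

# Placebo vs. 6-MP, as quoted in the text
placebo_hr, placebo_low, placebo_high = 4.02, 1.698, 10.307

# The same comparison expressed as 6-MP vs. placebo
mp_hr, mp_low, mp_high = invert_hr(placebo_hr, placebo_low, placebo_high)

print(f"6-MP vs. placebo: HR = {mp_hr:.3f}, CI = ({mp_low:.3f}, {mp_high:.3f})")
print("CI excludes 1:", ci_excludes_one(mp_low, mp_high))
```

Whichever group serves as the reference, the conclusion about significance is the same, because inverting an interval that excludes 1 yields another interval that excludes 1.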
Survival analysis involves making inferences about the time until an event occurs.
Due to the prospective nature of these studies there are frequently censored time observations.
The Kaplan-Meier Method allows us to describe both visually and numerically the survival experience of subjects in our study.
The log-rank test allows us to compare the survival experience of subjects across treatment groups.
The Cox Proportional Hazards Model allows us to examine the relationship between the survival experience of subjects and covariates that might be related to their survival; or to look at group/treatment differences adjusted for other covariates.