Explore survival analysis methods using real-life data to determine treatment efficacy and predict patient outcomes. From life table analysis to Kaplan-Meier curves and comparing survival curves, this guide covers key statistical techniques step-by-step.
First example of the day • Small cell lung cancer • Median survival time: 8-10 months • 2-year survival is 10% • New treatment showed median survival of 13.2 months
Progressively censored observations • Current life table • Completed dataset • Cohort life table • Analysis “on the fly”
Problem • Do patients survive longer after treatment 1 than after treatment 2? • Possible solutions: • ANOVA on mean survival time? • ANOVA on median survival time? • 100 person-years of observation: the summed time that all persons have been in the study, for example: • 10 persons being observed for 10 years • 100 persons being observed for 1 year
Life table analysis • A sub-set of 13 patients undergoing the same treatment
Life table analysis • Time interval chosen to be 3 months • ni number of patients starting a given period
Life table analysis • di number of terminal events (in this example: progression/response) • wi number of patients that have not yet been in the study long enough to finish this period
Life table analysis • Number exposed to risk: • ni – wi/2 • Assuming that patients withdraw in the middle of the period on average.
Life table analysis • qi = di/(ni – wi/2) • Proportion of patients terminating in the period
Life table analysis • pi = 1 - qi • Proportion of patients surviving
Life table analysis • Si = pi × pi-1 × … × p1 • Cumulative proportion surviving • Each pi is a conditional probability: surviving period i given survival up to its start
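The life table columns defined on the preceding slides can be computed step by step. The following is a minimal sketch that uses the slide notation (ni, di, wi, qi, pi, Si); the counts in the example call are hypothetical and are not the 13-patient subset from the slides.

```python
def life_table(n_start, events, withdrawals):
    """Actuarial life table.

    n_start        -- n_1, patients entering the first period
    events[i]      -- d_i, terminal events in period i
    withdrawals[i] -- w_i, patients withdrawn (censored) during period i
    """
    rows = []
    n_i = n_start
    s_cum = 1.0
    for d_i, w_i in zip(events, withdrawals):
        # Withdrawals are assumed to leave mid-period on average
        exposed = n_i - w_i / 2
        q_i = d_i / exposed if exposed > 0 else 0.0
        p_i = 1 - q_i
        s_cum *= p_i  # S_i = p_i * p_(i-1) * ... * p_1
        rows.append({"n": n_i, "d": d_i, "w": w_i,
                     "exposed": exposed, "q": q_i, "p": p_i, "S": s_cum})
        # Patients starting the next period
        n_i = n_i - d_i - w_i
    return rows


# Hypothetical counts: 13 patients followed over four 3-month periods
rows = life_table(13, events=[2, 3, 1, 2], withdrawals=[1, 0, 2, 1])
for r in rows:
    print(r["n"], r["d"], r["w"], r["exposed"], round(r["S"], 3))
```

Each row's S is the running product of the period survival proportions, which is why a single bad period lowers every later value of the curve.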
Survival curves • How long will a lung cancer patient keep having cancer on this particular treatment?
Kaplan-Meier • Simple example with only 2 ”terminal-events”.
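A minimal sketch of the Kaplan-Meier product-limit estimate, where the survival proportion is updated only at times with a terminal event. The five patients below are hypothetical and chosen to give exactly two terminal events; they are not the slide's data.

```python
def kaplan_meier(times, observed):
    """Return [(t, S(t))] for each time t with at least one terminal event.

    times[i]    -- follow-up time of patient i
    observed[i] -- True for a terminal event, False if censored
    """
    data = sorted(zip(times, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set
        at_risk -= leaving
    return curve


# Hypothetical data: terminal events at 4 and 9 months, the rest censored
times = [2, 4, 6, 9, 12]
observed = [False, True, False, True, False]
curve = kaplan_meier(times, observed)
print(curve)
```

Censored patients never lower the curve directly; they only shrink the number at risk, which makes later events count for more.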
Confidence interval of the Kaplan-Meier method • E.g. after 32 months
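The slide does not say which variance formula it uses. As an illustration only, the sketch below assumes Greenwood's formula, a common choice, and a plain normal-approximation interval clipped to the range 0 to 1. The (events, at risk) pairs are hypothetical.

```python
import math


def greenwood_ci(steps, z=1.96):
    """Kaplan-Meier estimate with an approximate 95% confidence interval.

    steps -- list of (d_i, n_i): terminal events and number at risk at each
             event time up to the time of interest (requires d_i < n_i)
    """
    survival = 1.0
    var_sum = 0.0
    for d, n in steps:
        survival *= 1 - d / n
        var_sum += d / (n * (n - d))  # Greenwood's sum
    se = survival * math.sqrt(var_sum)
    lower = max(0.0, survival - z * se)
    upper = min(1.0, survival + z * se)
    return survival, lower, upper


# Hypothetical: one event among 4 at risk, then one event among 2 at risk
s, lo, hi = greenwood_ci([(1, 4), (1, 2)])
print(round(s, 3), round(lo, 3), round(hi, 3))
```

With so few patients the interval is very wide, which is the practical point of plotting confidence bands next to a survival curve.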
Confidence interval of the Kaplan-Meier method • Survival plot for all data on treatment 1 • Are there differences between the treatments?
Comparing Two Survival Curves • One could use the confidence intervals… • But what if the confidence intervals fail to overlap at only some points? • Logrank statistic • Hazard ratio • Mantel-Haenszel methods
Comparing Two Survival Curves • The logrank statistic • Aka the Mantel logrank statistic • Aka the Cox-Mantel logrank statistic
Comparing Two Survival Curves • Five steps to the logrank statistics table • Divide the data into intervals (e.g. 10 months) • Count the number of patients at risk in the groups and in total • Count the number of terminal events in the groups and in total • Calculate the expected numbers of terminal events, e.g. for interval 31-40: 44 at risk in group 1 and 46 in group 2, 4 terminal events in total, so the expected terminal events are 4 × (44/90) and 4 × (46/90) • Calculate the totals
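The expected-events step can be sketched for a single interval as follows. The numbers are the ones from the slide's example for interval 31-40; the function name is only illustrative.

```python
def expected_events(at_risk_1, at_risk_2, events_total):
    """Split an interval's terminal events in proportion to the numbers at risk."""
    total = at_risk_1 + at_risk_2
    e1 = events_total * at_risk_1 / total
    e2 = events_total * at_risk_2 / total
    return e1, e2


# Interval 31-40: 44 at risk in group 1, 46 in group 2, 4 terminal events
e1, e2 = expected_events(44, 46, 4)
print(round(e1, 3), round(e2, 3))
```

The two expected values always add up to the observed number of events in the interval, so summing over intervals keeps total observed and total expected equal.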
Comparing Two Survival Curves • Smells like Chi-Square statistics
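The resemblance to a chi-square test can be made concrete: once observed (O) and expected (E) terminal events are totalled per group, a commonly used approximate logrank statistic is the sum of (O − E)²/E over the groups, compared with a chi-square distribution with 1 degree of freedom for two groups. This is a sketch of that approximation; the totals below are hypothetical.

```python
def logrank_chi2(observed, expected):
    """Approximate logrank statistic: sum of (O - E)^2 / E over the groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical totals for two treatment groups
observed_totals = [20, 30]
expected_totals = [25.0, 25.0]
chi2 = logrank_chi2(observed_totals, expected_totals)

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom
print(chi2, "significant" if chi2 > 3.841 else "not significant")
```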
Comparing Two Survival Curves • Hazard ratio
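The slide gives no formula in the transcript. One simple estimate often taught alongside the logrank table, assumed here, is the ratio of the observed-to-expected ratios of the two groups. The totals are hypothetical.

```python
def hazard_ratio(obs_1, exp_1, obs_2, exp_2):
    """Estimate the hazard ratio of group 1 relative to group 2 as (O1/E1) / (O2/E2)."""
    return (obs_1 / exp_1) / (obs_2 / exp_2)


# Hypothetical logrank totals: group 1 has fewer events than expected
hr = hazard_ratio(20, 25.0, 30, 25.0)
print(round(hr, 3))
```

A value below 1 means group 1 experiences terminal events at a lower rate than group 2; a value of 1 means no difference.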
Comparing Two Survival Curves • Mantel-Haenszel test • Is the OR significantly different from 1? • Look at cell (1,1) • Expected value, E(ai) • Variance, V(ai)
Comparing Two Survival Curves • Mantel Haenszel test • df = 1; p>0.05
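A minimal sketch of the Mantel-Haenszel test over a series of 2×2 tables, one per time interval. It uses the standard hypergeometric expectation and variance of cell (1,1) and omits the continuity correction that some texts include; whether the slides use the correction is not stated. The cell labels a, b, c, d and the example table are assumptions for illustration.

```python
def mantel_haenszel_chi2(tables):
    """Mantel-Haenszel chi-square (df = 1) without continuity correction.

    tables -- list of (a, b, c, d) per interval:
              a = events in group 1,  b = non-events in group 1,
              c = events in group 2,  d = non-events in group 2
    """
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # variance undefined for a single patient
        sum_a += a
        sum_e += (a + b) * (a + c) / n                                  # E(a_i)
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))  # V(a_i)
    return (sum_a - sum_e) ** 2 / sum_v


# Hypothetical interval: 44 and 46 at risk, 4 events split 2 and 2
chi2 = mantel_haenszel_chi2([(2, 42, 2, 44)])
print(round(chi2, 4), "p > 0.05" if chi2 < 3.841 else "p < 0.05")
```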
Hazard function • d is the number of terminal events • f is the sum of failure times • c is the sum of censored times
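The formula itself is missing from the transcript. Given the three quantities defined on the slide, the sketch below assumes the usual constant-hazard estimate h = d / (f + c), that is, terminal events per unit of total observation time. The follow-up times are hypothetical.

```python
def hazard_rate(failure_times, censored_times):
    """Assumed estimate h = d / (f + c): events per unit of observed time."""
    d = len(failure_times)     # number of terminal events
    f = sum(failure_times)     # sum of failure times
    c = sum(censored_times)    # sum of censored times
    return d / (f + c)


# Hypothetical follow-up in months
h = hazard_rate(failure_times=[4, 9], censored_times=[2, 6, 12])
print(round(h, 4), "events per patient-month")
```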
Logistic regression • Who survived the Titanic?
The sinking of Titanic • Titanic struck an iceberg on April 14th 1912 and sank early on April 15th with 2228 souls on board; 705 survived. • Records of 1309 passengers survive as a dataset. • Who survived?
The data • Sibsp is the number of siblings and/or spouses accompanying • Parch is the number of parents and/or children accompanying • Some values are missing • Can we predict who will survive Titanic II?
Analyzing the data in a (too) simple manner • Associations between factors without considering interactions
Could we use multiple linear regression to predict survival?
Logit transformation is modeled linearly • The logistic function
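As a small illustration of the two functions named on this slide: the logit maps a probability to the log-odds, which the model treats as linear in the predictors, and the logistic function maps the linear predictor back to a probability. The function names are only illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def logistic(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))


# The two functions undo each other
z = 1.3
print(logistic(0.0), round(logit(logistic(z)), 6))
```

Because the logistic output always stays between 0 and 1, it avoids the impossible predictions that a straight-line fit to a 0/1 outcome can produce.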
The sigmoidal curve • The intercept basically just shifts the curve along the input variable • Large regression coefficient → risk factor strongly influences the probability • Positive regression coefficient → risk factor increases the probability
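The roles of the intercept and the regression coefficient can be checked numerically. This is a minimal sketch with one input variable; the parameter values are hypothetical.

```python
import math


def probability(x, intercept, coef):
    """Logistic model with a single input: p = 1 / (1 + exp(-(intercept + coef * x)))."""
    return 1 / (1 + math.exp(-(intercept + coef * x)))


# The curve crosses p = 0.5 where intercept + coef * x = 0, i.e. at x = -intercept / coef,
# so changing the intercept moves the curve left or right.
midpoint = probability(2.0, intercept=-2.0, coef=1.0)

# One unit above the midpoint: a larger coefficient gives a larger change in probability
steep = probability(3.0, intercept=-4.0, coef=2.0)
shallow = probability(3.0, intercept=-1.0, coef=0.5)

# A negative coefficient makes the probability fall as x grows
falling = probability(3.0, intercept=2.0, coef=-1.0)

print(midpoint, round(steep, 3), round(shallow, 3), round(falling, 3))
```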
Logistic regression of the Titanic data – passenger class • Summary of data • Coding of the dependent variable • Coding of the categorical explanatory variable: • First class: 1 • Second class: 2 • Third class: reference
Logistic regression of the Titanic data – passenger class • A fit of the null-model, basically just the intercept. Usually not interesting • The overall probability of survival is 500/1309 = 0.382. The cutoff is 0.5, so all are classified as non-survivors. • Basically tests if the null-model is sufficient. It almost certainly is not. • Shows that survival is related to pclass (which is not in the null-model)
Logistic regression of the Titanic data – passenger class • Omnibus test: uses the likelihood ratio (LR) to test whether adding the pclass variable improves the model. It did, but only compared with the null-model, so no surprise. • Model summary: other measures of the goodness of fit. • Classification table: by including pclass, 67.7% of passengers were correctly categorized. • Variables in the equation: the first line repeats that pclass has a significant effect on survival. B is the fitted logistic parameter. Exp(B) is the odds ratio, so the odds of survival for first-class passengers are 4.7 (3.6-6.3) times those of passengers in third class (the reference class)
Logistic regression of the Titanic data – Adding age to the model • Oops… Some data points are missing • And the null model is poorer
Logistic regression of the Titanic data – Adding age to the model • Cox and Snell’s R-square increased from 0.093 to 0.141, indicating a better model • With this model we can correctly classify 69.1% of passengers; passenger class alone classified only 67.7%
Logistic regression of the Titanic data – Adding age to the model • Age has a significant influence on survival. • The odds ratio of age is 0.963 • So the odds of survival for a 31-year-old are 0.963 times the odds for a 30-year-old. • Or the odds for a 30-year-old to survive are 1/0.963 = 1.038 times larger than those of a 31-year-old
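Because the odds ratio applies per year of age, it compounds multiplicatively over larger age differences. A minimal sketch using the slide's value of 0.963, with passenger class held fixed; the helper name is hypothetical.

```python
ODDS_RATIO_PER_YEAR = 0.963  # Exp(B) for age, from the slide


def odds_multiplier(age_difference):
    """Factor by which the odds of survival change for a given age difference in years."""
    return ODDS_RATIO_PER_YEAR ** age_difference


one_year_older = odds_multiplier(1)      # 31 vs 30 years
one_year_younger = odds_multiplier(-1)   # 30 vs 31 years, equals 1 / 0.963
ten_years_older = odds_multiplier(10)    # 40 vs 30 years

print(round(one_year_older, 3), round(one_year_younger, 3), round(ten_years_older, 3))
```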
Logistic regression of the Titanic data – Age alone • The model is extremely poor • Consequently, age on its own appears to be insignificant in estimating survival.
Logistic regression of the Titanic data – Adding family and sex • The model is becoming better
Logistic regression of the Titanic data – Using the model to predict • What is the probability that a 25-year-old woman accompanied only by her husband, holding a second class ticket, would survive Titanic? • z = -2.703 • -0.041*25 • +2.552 • +1.718 • +0.925 • = 1.4670
Using the model to predict survival • What is the probability that a 25 year old woman accompanied only by her husband holding a second class ticket would survive Titanic? • z = -3.929 • -0.589*(-5)/14.41 • +1.718 • +2.552 • +0.926 = 1.4714
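The two slides stop at the linear predictor z; the probability itself follows from the logistic function. The sketch below reuses the slide's terms without labelling them, since the transcript does not say which coefficient belongs to which predictor. In the second version the age term looks like a standardized age, (25 − 30)/14.41, but that reading is an assumption.

```python
import math


def logistic(z):
    """Convert the linear predictor z into a probability."""
    return 1 / (1 + math.exp(-z))


# Terms as listed on the slide with age entered in years
z_raw = -2.703 - 0.041 * 25 + 2.552 + 1.718 + 0.925

# Terms as listed on the slide where the age term is -0.589 * (-5) / 14.41
z_scaled = -3.929 - 0.589 * (-5) / 14.41 + 1.718 + 2.552 + 0.926

p_raw = logistic(z_raw)
p_scaled = logistic(z_scaled)
print(round(z_raw, 4), round(p_raw, 2))
print(round(z_scaled, 4), round(p_scaled, 2))
```

Both versions give a probability of roughly 0.81, well above the 0.5 cutoff, so the model would classify this passenger as a survivor.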