Kaplan-Meier methods and Parametric Regression methods

Kristin Sainani, Ph.D.
http://www.stanford.edu/~kcobb
Stanford University
Department of Health Research and Policy

More on the Kaplan-Meier estimator of S(t) (“product-limit estimator” or “KM estimator”)
• When there are no censored data, the KM estimator is simple and intuitive:
  • Estimated S(t) = proportion of observations with failure times > t.
  • For example, if you are following 10 patients and 3 of them die by the end of the first year, then your best estimate of S(1 year) is 7/10 = 70%.
• When there are censored data, KM provides an estimate of S(t) that takes censoring into account (see last week’s lecture).
  • If the censored observation had actually been a failure: S(1 year) = 4/5 × 3/4 × 2/3 = 2/5 = 40%.
• The KM estimator is computed only at the times when events occur (it is empirically defined); between event times it stays flat.

KM (product-limit) estimator, formally

$$\hat{S}(t) = \prod_{j:\,t_j \le t} \left(1 - \frac{d_j}{n_j}\right)$$
• $t_j$ = the observed event times.
• $n_j$ = the risk set at time $t_j$: the original sample minus all those who have been censored or had the event before $t_j$.
• $d_j$ = the number of events at $t_j$. Typically $d_j$ = 1 person, unless the data are grouped in time intervals (e.g., everyone who had the event in the 3rd month).
• $d_j/n_j$ = proportion that failed at event time $t_j$; $1 - d_j/n_j$ = proportion surviving event time $t_j$.
• $\hat{S}(t)$ is the estimated survival probability at time t, P(T > t): multiply the probability of surviving event time $t_j$ by the probabilities of surviving all the previous event times.
This formula gives the product-limit estimate of survival at each time an event happens.
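The product-limit formula translates almost line for line into code. What follows is a minimal Python sketch (the lecture itself uses SAS); `kaplan_meier` is a hypothetical helper name, and the data lists are our transcription of the Example 1 raw data that follows (6 conceptions at month 1, 5 at month 2, and so on, plus 13 censored times). It assumes that a subject censored exactly at $t_j$ is still in the risk set at $t_j$, which is how these data were coded.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the time is censored
    Returns a list of (t_j, S(t_j)) pairs.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    estimates = []
    for t_j in sorted(deaths):
        # Risk set n_j: everyone whose time is >= t_j, so a subject censored
        # exactly at t_j is still counted (censoring at t is read as "t+").
        n_j = sum(1 for t in times if t >= t_j)
        d_j = deaths[t_j]            # number of events at t_j
        surv *= 1 - d_j / n_j        # multiply in the proportion surviving t_j
        estimates.append((t_j, surv))
    return estimates

# Time-to-conception data (months) from Example 1 (our transcription).
conceived = [1]*6 + [2]*5 + [3]*3 + [4]*3 + [6]*2 + [9]*3 + [10, 13, 16]
censored = [2, 3, 4, 7, 7, 8, 8, 9, 9, 9, 11, 24, 24]
times = conceived + censored
events = [1] * len(conceived) + [0] * len(censored)

for t_j, s in kaplan_meier(times, events):
    print(f"t = {t_j:>2} months   S(t) = {s:.4f}")
```

Run as is, it should print 0.8421, 0.7105, 0.6285, 0.5428, 0.4825, 0.3619, 0.3016, 0.2262 and 0.1508 at the nine event times, which agree with the SAS LIFETEST output shown later. With no censoring the same function reduces to the simple proportion surviving past t.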

Example 1: time-to-conception for subfertile women. “Failure” here is a good thing. 38 women (in 1982) were treated for infertility with laparoscopy and hydrotubation. All women were followed for up to 2 years to describe time-to-conception. The event is conception, and women “survived” until they conceived. Example from: BMJ, Dec 1998; 317: 1572-1580.

Raw data: time (months) to conception or censoring in 38 subfertile women after laparoscopy and hydrotubation (1982 study)

Conceived (event)   Did not conceive (censored)
1                   2
1                   3
1                   4
1                   7
1                   7
1                   8
2                   8
2                   9
2                   9
2                   9
2                   11
3                   24
3                   24
3
4
4
4
6
6
9
9
9
10
13
16

Data from: Luthra P, Bland JM, Stanton SL. Incidence of pregnancy after laparoscopy and hydrotubation. BMJ 1982; 284: 1013-1014

Corresponding Kaplan-Meier curve: S(t) is estimated at 9 event times (a step function).


Corresponding Kaplan-Meier curve: 6 women conceived in the 1st month (1st menstrual cycle). Therefore, 32/38 “survived” pregnancy-free past 1 month.

S(t=1) = 32/38 = 84.2%. S(t) represents the estimated survival probability P(T>t); here, P(T>1).

Raw data (same 38 women as above), with the censored time of 2 months read as 2.1.
Important detail of how the data were coded: censoring at t=2 indicates survival PAST the 2nd cycle (i.e., we know the woman “survived” her 2nd cycle pregnancy-free). Thus, for calculating the KM estimator at 2 months, this woman should still be included in the risk set. Think of it as 2+ months, e.g., 2.1 months.
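To see why this coding matters, here is a small sketch that counts the risk set at each event time under the “censored at t means t+” rule and, for contrast, under the incorrect reading that drops a woman censored exactly at $t_j$. The helper `risk_set` and its `strict_censoring` flag are hypothetical names, and the lists are our transcription of the raw-data table.

```python
# Conception data (months), transcribed from the raw-data table.
conceived = [1]*6 + [2]*5 + [3]*3 + [4]*3 + [6]*2 + [9]*3 + [10, 13, 16]
censored = [2, 3, 4, 7, 7, 8, 8, 9, 9, 9, 11, 24, 24]

def risk_set(t_j, strict_censoring=False):
    """Number of women at risk at event time t_j.

    Women who conceive at or after t_j are at risk. Women censored at or
    after t_j are at risk under the "t+" rule; with strict_censoring=True
    (the wrong reading) a woman censored exactly at t_j is dropped.
    """
    n_events = sum(1 for t in conceived if t >= t_j)
    if strict_censoring:
        n_cens = sum(1 for t in censored if t > t_j)
    else:
        n_cens = sum(1 for t in censored if t >= t_j)
    return n_events + n_cens

for t_j in sorted(set(conceived)):
    print(t_j, risk_set(t_j), risk_set(t_j, strict_censoring=True))
```

Under the t+ rule the risk sets are 38, 32, 26, 22, 18, 12, 6, 4 and 3; the strict reading would give 31 instead of 32 at 2 months and would slightly overstate the hazard there.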


S(t=2) = (84.2%) × (84.4%) = 71.1%. 5 women conceived in the 2nd month. The risk set at event time 2 included 32 women, so 27/32 = 84.4% “survived” event time 2 pregnancy-free.
We can also estimate the hazard rate here: h(t=2) = 5/32 = 15.6%. Given that you didn’t get pregnant in month 1, you have an estimated 5/32 chance of conceiving in the 2nd month.
And an estimate of the density (marginal probability of conceiving in month 2): f(2) = h(2) × S(1) = (.156)(.842) = 13.2%, i.e., 5/38. With monthly data, the hazard multiplies the probability of still being pregnancy-free at the start of month 2, P(T>1), not S(2).
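The hazard and density can be tabulated for every event time at once. This sketch assumes the monthly (discrete-time) reading used here: the hazard is $h(t_j) = d_j/n_j$, and the probability of conceiving at $t_j$ is the survival just before $t_j$ times $h(t_j)$, which equals $S(t_{j-1}) - S(t_j)$. Variable names are illustrative, and the data are our transcription of the raw-data table.

```python
from collections import Counter

# Conception data (months), transcribed from the raw-data table.
conceived = [1]*6 + [2]*5 + [3]*3 + [4]*3 + [6]*2 + [9]*3 + [10, 13, 16]
censored = [2, 3, 4, 7, 7, 8, 8, 9, 9, 9, 11, 24, 24]
times = conceived + censored
deaths = Counter(conceived)

s_before = 1.0   # estimated survival just before the current event time
rows = []
for t_j in sorted(deaths):
    n_j = sum(1 for t in times if t >= t_j)   # risk set at t_j
    h_j = deaths[t_j] / n_j                   # hazard: P(conceive at t_j | not yet)
    f_j = s_before * h_j                      # density: P(conceive at t_j)
    s_before *= 1 - h_j                       # now equals S(t_j)
    rows.append((t_j, n_j, h_j, f_j, s_before))
    print(f"t={t_j:>2}  n={n_j:>2}  h={h_j:.3f}  f={f_j:.3f}  S={s_before:.3f}")
```

For month 2 this gives h ≈ 0.156 and f = 5/38 ≈ 0.132; for month 4, f ≈ 0.086. Because the f values telescope, they plus the final S(16) add up to 1.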

Raw data (same 38 women as above), with the censored times 2 and 3 read as 2.1 and 3.1: the risk set at 3 months includes 26 women (the 32 at risk at 2 months, minus the 5 who conceived at 2 months and the 1 censored at 2+ months).

S(t=3) = (84.2%) × (84.4%) × (88.5%) = 62.9%. 3 women conceived in the 3rd month. The risk set at event time 3 included 26 women; 23/26 = 88.5% “survived” event time 3 pregnancy-free.

Raw data (same 38 women as above), with the censored time 3 read as 3.1: the risk set at 4 months includes 22 women (the 26 at risk at 3 months, minus the 3 who conceived at 3 months and the 1 censored at 3+ months).

S(t=4) = (84.2%) × (84.4%) × (88.5%) × (86.4%) = 54.3%. 3 women conceived in the 4th month, and 1 was censored between months 3 and 4. The risk set at event time 4 included 22 women; 19/22 = 86.4% “survived” event time 4 pregnancy-free. The hazard rates (conditional chances of conceiving, e.g., 100% − 84%) look similar over time.
And an estimate of the density (marginal probability of conceiving in month 4): f(4) = h(4) × S(3) = (.136)(.629) = 8.6%.

Raw data (same 38 women as above), with the censored time 4 read as 4.1: the risk set at 6 months includes 18 women (the 22 at risk at 4 months, minus the 3 who conceived at 4 months and the 1 censored at 4+ months).

S(t=6) = (54.3%) × (88.9%) = 48.3%. 2 women conceived in the 6th month of the study, and one was censored between months 4 and 6. The risk set at event time 5 (6 months) included 18 women; 16/18 = 88.9% “survived” event time 5 pregnancy-free.

Skipping ahead to the 9th and final event time (16 months)… From the curve, S(t=13) ≈ 22% (“eyeball” approximation).

Raw data (same 38 women as above): after the conception at 16 months (the 9th event time), 2 women remain; both were censored at 24 months.

S(t=16) = (22%) × (2/3) = 15%. The tail here just represents the final 2 women, who did not conceive (you cannot make many inferences from the end of a KM curve)!

Kaplan-Meier: SAS output
The LIFETEST Procedure
Product-Limit Survival Estimates

                              Survival
                              Standard   Number   Number
time      Survival  Failure   Error      Failed   Left
0.0000    1.0000    0         0          0        38
1.0000    .         .         .          1        37
1.0000    .         .         .          2        36
1.0000    .         .         .          3        35
1.0000    .         .         .          4        34
1.0000    .         .         .          5        33
1.0000    0.8421    0.1579    0.0592     6        32
2.0000    .         .         .          7        31
2.0000    .         .         .          8        30
2.0000    .         .         .          9        29
2.0000    .         .         .          10       28
2.0000    0.7105    0.2895    0.0736     11       27
2.0000*   .         .         .          11       26
3.0000    .         .         .          12       25
3.0000    .         .         .          13       24
3.0000    0.6285    0.3715    0.0789     14       23
3.0000*   .         .         .          14       22
4.0000    .         .         .          15       21
4.0000    .         .         .          16       20
4.0000    0.5428    0.4572    0.0822     17       19
4.0000*   .         .         .          17       18

Kaplan-Meier: SAS output (continued)

                              Survival
                              Standard   Number   Number
time      Survival  Failure   Error      Failed   Left
6.0000    .         .         .          18       17
6.0000    0.4825    0.5175    0.0834     19       16
7.0000*   .         .         .          19       15
7.0000*   .         .         .          19       14
8.0000*   .         .         .          19       13
8.0000*   .         .         .          19       12
9.0000    .         .         .          20       11
9.0000    .         .         .          21       10
9.0000    0.3619    0.6381    0.0869     22       9
9.0000*   .         .         .          22       8
9.0000*   .         .         .          22       7
9.0000*   .         .         .          22       6
10.0000   0.3016    0.6984    0.0910     23       5
11.0000*  .         .         .          23       4
13.0000   0.2262    0.7738    0.0944     24       3
16.0000   0.1508    0.8492    0.0880     25       2
24.0000*  .         .         .          25       1
24.0000*  .         .         .          25       0

NOTE: The marked survival times are censored observations.

It is not so easy to get a plot of the actual hazard function! In SAS, you need a complicated macro, and the result depends on assumptions. Here’s what I get from Paul Allison’s macro for these data…

At best, you can get the cumulative hazard function… A linear cumulative hazard function indicates a constant hazard. See lecture 1 if you want more math!

Cumulative Hazard Function, $H(t) = \int_0^t h(u)\,du$
• If the hazard function is constant, e.g. $h(t) = k$, then the cumulative hazard function will be linear, $H(t) = kt$ (and higher hazards will have steeper slopes).
• If the hazard function is increasing with time, e.g. $h(t) = kt$, then the cumulative hazard function will curve up; for example, $h(t) = kt$ gives a quadratic, $H(t) = kt^2/2$.
• If the hazard function is decreasing over time, e.g. $h(t) = k/t$, then the cumulative hazard function will curve down; for example, $h(t) = k/t$ gives a cumulative hazard that grows like $k \ln t$.
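Although the hazard itself is hard to plot, the cumulative hazard is easy to estimate from the same ingredients as the KM estimator. A minimal Python sketch, not the SAS or Allison-macro approach from the lecture: it computes the Nelson-Aalen estimate $\hat{H}(t) = \sum_{t_j \le t} d_j/n_j$ and the alternative $-\log \hat{S}(t)$ for the conception data (our transcription). If the hazard were roughly constant, either estimate would rise roughly linearly, so $H(t)/t$ would stay roughly flat.

```python
import math
from collections import Counter

# Conception data (months), transcribed from the raw-data table.
conceived = [1]*6 + [2]*5 + [3]*3 + [4]*3 + [6]*2 + [9]*3 + [10, 13, 16]
censored = [2, 3, 4, 7, 7, 8, 8, 9, 9, 9, 11, 24, 24]
times = conceived + censored
deaths = Counter(conceived)

H_na = 0.0   # Nelson-Aalen estimate: running sum of d_j / n_j
S_km = 1.0   # Kaplan-Meier estimate, used for the -log S(t) version
cumhaz = []
for t_j in sorted(deaths):
    n_j = sum(1 for t in times if t >= t_j)
    H_na += deaths[t_j] / n_j
    S_km *= 1 - deaths[t_j] / n_j
    cumhaz.append((t_j, H_na, -math.log(S_km)))
    # Under a constant hazard k, H(t) is about k*t, so H/t stays roughly flat.
    print(f"t={t_j:>2}  H_NA={H_na:.3f}  -log S={-math.log(S_km):.3f}  H_NA/t={H_na / t_j:.3f}")
```

The two estimates agree closely while the risk sets are large and drift apart in the tail, where $-\log \hat{S}(t)$ is always at least as large as the Nelson-Aalen sum.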

Kaplan-Meier: example 2. Researchers randomized 44 patients with chronic active hepatitis to receive prednisolone or no treatment (control), then compared survival curves. Example from: BMJ 1998;317:468-469 (15 August).

Survival times (months) of 44 patients with chronic active hepatitis randomised to receive prednisolone or no treatment (* = censored)

Prednisolone (n=22)   Control (n=22)
2                     2
6                     3
12                    4
54                    7
56*                   10
68                    22
89                    28
96                    29
96                    32
125*                  37
128*                  40
131*                  41
140*                  54
141*                  61
143                   63
145*                  71
146                   127*
148*                  140*
162*                  146*
168                   158*
173*                  167*
181*                  182*

Data from: BMJ 1998;317:468-469 (15 August)

Kaplan-Meier: example 2. Are these two curves different? The picture is misleading to the eye: the curves appear to converge by the end of the study. But this is due to 6 controls who survived fairly long, and 3 events in the treatment group when the sample size was small. Big drops at the end of a curve indicate that few patients are left; e.g., only 2/3 (67%) of those at risk survived this drop.

Control group:

                              Survival
                              Standard   Number   Number
time      Survival  Failure   Error      Failed   Left
0.000     1.0000    0         0          0        22
2.000     0.9545    0.0455    0.0444     1        21
3.000     0.9091    0.0909    0.0613     2        20
4.000     0.8636    0.1364    0.0732     3        19
7.000     0.8182    0.1818    0.0822     4        18
10.000    0.7727    0.2273    0.0893     5        17
22.000    0.7273    0.2727    0.0950     6        16
28.000    0.6818    0.3182    0.0993     7        15
29.000    0.6364    0.3636    0.1026     8        14
32.000    0.5909    0.4091    0.1048     9        13
37.000    0.5455    0.4545    0.1062     10       12
40.000    0.5000    0.5000    0.1066     11       11
41.000    0.4545    0.5455    0.1062     12       10
54.000    0.4091    0.5909    0.1048     13       9
61.000    0.3636    0.6364    0.1026     14       8
63.000    0.3182    0.6818    0.0993     15       7
71.000    0.2727    0.7273    0.0950     16       6
127.000*  .         .         .          16       5
140.000*  .         .         .          16       4
146.000*  .         .         .          16       3
158.000*  .         .         .          16       2
167.000*  .         .         .          16       1
182.000*  .         .         .          16       0

6 controls made it past 100 months.

5/6 of 55% rapidly drops the curve to 46%, and 2/3 of 46% rapidly drops it to 30% (the treated group's last two deaths, with only 6 and then 3 patients at risk).

Treated group:

                              Survival
                              Standard   Number   Number
time      Survival  Failure   Error      Failed   Left
0.000     1.0000    0         0          0        22
2.000     0.9545    0.0455    0.0444     1        21
6.000     0.9091    0.0909    0.0613     2        20
12.000    0.8636    0.1364    0.0732     3        19
54.000    0.8182    0.1818    0.0822     4        18
56.000*   .         .         .          4        17
68.000    0.7701    0.2299    0.0904     5        16
89.000    0.7219    0.2781    0.0967     6        15
96.000    .         .         .          7        14
96.000    0.6257    0.3743    0.1051     8        13
125.000*  .         .         .          8        12
128.000*  .         .         .          8        11
131.000*  .         .         .          8        10
140.000*  .         .         .          8        9
141.000*  .         .         .          8        8
143.000   0.5475    0.4525    0.1175     9        7
145.000*  .         .         .          9        6
146.000   0.4562    0.5438    0.1285     10       5
148.000*  .         .         .          10       4
162.000*  .         .         .          10       3
168.000   0.3041    0.6959    0.1509     11       2
173.000*  .         .         .          11       1
181.000*  .         .         .          11       0
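Both survival columns can be reproduced with a few lines of Python. This is a sketch with a hypothetical helper `km_steps`; the (time, status) pairs are our transcription of the BMJ table for example 2, and, as in example 1, a subject censored at $t_j$ is counted as at risk at $t_j$.

```python
from collections import Counter

# (time in months, 1 = died, 0 = censored), transcribed from the BMJ table.
prednisolone = [(2, 1), (6, 1), (12, 1), (54, 1), (56, 0), (68, 1), (89, 1),
                (96, 1), (96, 1), (125, 0), (128, 0), (131, 0), (140, 0),
                (141, 0), (143, 1), (145, 0), (146, 1), (148, 0), (162, 0),
                (168, 1), (173, 0), (181, 0)]
control = [(2, 1), (3, 1), (4, 1), (7, 1), (10, 1), (22, 1), (28, 1), (29, 1),
           (32, 1), (37, 1), (40, 1), (41, 1), (54, 1), (61, 1), (63, 1),
           (71, 1), (127, 0), (140, 0), (146, 0), (158, 0), (167, 0), (182, 0)]

def km_steps(data):
    """Return (t_j, at risk, deaths, S(t_j) rounded to 4 places) per event time."""
    deaths = Counter(t for t, e in data if e == 1)
    s, steps = 1.0, []
    for t_j in sorted(deaths):
        n_j = sum(1 for t, _ in data if t >= t_j)   # censored at t_j still at risk
        s *= 1 - deaths[t_j] / n_j
        steps.append((t_j, n_j, deaths[t_j], round(s, 4)))
    return steps

for name, data in [("control", control), ("prednisolone", prednisolone)]:
    print(name)
    for t_j, n_j, d_j, s in km_steps(data):
        print(f"  t={t_j:>3}  at risk={n_j:>2}  died={d_j}  S={s}")
```

The last prednisolone steps make the small risk sets behind the late drops explicit: 6 at risk at 146 months and 3 at 168 months.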

Point-wise confidence intervals. We will not worry about the mathematical formula for the confidence bands. The important point is that there is a confidence interval for each estimate of S(t). (SAS computes the standard errors with Greenwood’s formula.)
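For reference, Greenwood's formula estimates $\widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_j \le t} d_j / \big(n_j(n_j - d_j)\big)$. The sketch below applies it to the conception data (our transcription) and reproduces the standard errors in the SAS output, from 0.0592 at 1 month to 0.0880 at 16 months. The interval it prints is the simple linear form $\hat{S} \pm 1.96\,\mathrm{SE}$ clipped to [0, 1]; that choice is our assumption, and SAS may use a transformed interval by default, so its limits can differ.

```python
import math
from collections import Counter

# Conception data (months), transcribed from the raw-data table.
conceived = [1]*6 + [2]*5 + [3]*3 + [4]*3 + [6]*2 + [9]*3 + [10, 13, 16]
censored = [2, 3, 4, 7, 7, 8, 8, 9, 9, 9, 11, 24, 24]
times = conceived + censored
deaths = Counter(conceived)

s, greenwood_sum = 1.0, 0.0
results = []
for t_j in sorted(deaths):
    n_j = sum(1 for t in times if t >= t_j)
    d_j = deaths[t_j]
    s *= 1 - d_j / n_j
    # Greenwood running sum; undefined if n_j == d_j (everyone left fails),
    # which does not happen in these data.
    greenwood_sum += d_j / (n_j * (n_j - d_j))
    se = s * math.sqrt(greenwood_sum)
    lower, upper = max(0.0, s - 1.96 * se), min(1.0, s + 1.96 * se)
    results.append((t_j, s, se, lower, upper))
    print(f"t={t_j:>2}  S={s:.4f}  SE={se:.4f}  95% CI=({lower:.3f}, {upper:.3f})")
```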

Log-rank test

Test of Equality over Strata
Test         Chi-Square   DF   Pr > Chi-Square
Log-Rank     4.6599       1    0.0309
Wilcoxon     6.5435       1    0.0105
-2Log(LR)    5.4096       1    0.0200

Chi-square test (with 1 df) of the overall difference between the two groups. The groups appear significantly different.

Log-rank test
The log-rank test is just a Cochran-Mantel-Haenszel (CMH) chi-square test! Anyone remember (know) what this is?

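CMH test of conditional independence

               Event   No Event
Group 1        a       b
Group 2        c       d

There is one such 2×2 table for each stratum k = 1, …, K, where the K strata are the unique event times and $N_k$ is the total number at risk in stratum k:

$$\chi^2_{CMH} = \frac{\left[\sum_{k=1}^{K} \big(a_k - E(a_k)\big)\right]^2}{\sum_{k=1}^{K} \operatorname{Var}(a_k)}, \qquad E(a_k) = \frac{(a_k+b_k)(a_k+c_k)}{N_k}$$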

Why is this the expected value in each stratum? The variance is the variance of a hypergeometric distribution:

$$\operatorname{Var}(a_k) = \frac{(a_k+b_k)(c_k+d_k)(a_k+c_k)(b_k+d_k)}{N_k^2\,(N_k-1)}$$

How do you know that this is a chi-square with 1 df?

Event time 1 (2 months), control group: 22 at risk; 1st event at month 2.

Event time 1 (2 months), treated group: 22 at risk; 1st event at month 2.

Stratum 1 = event time 1 (2 months): 1 died in each group (22 at risk in each group).

               Event   No Event
Treated        1       21
Control        1       21

N_1 = 44

Event time 2 (3 months), control group: 21 at risk; next event at month 3.

Event time 2 (3 months), treated group: 21 at risk; no events at 3 months.

Stratum 2 = event time 2 (3 months): 1 died in the control group. At that time, 21 in each group were at risk.

               Event   No Event
Treated        0       21
Control        1       20

N_2 = 42
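Each stratum's contribution follows from its 2×2 margins. A minimal sketch (hypothetical helper `stratum_stats`) that computes $E(a_k)$ and the hypergeometric $\operatorname{Var}(a_k)$ for the first two strata, taking the treated group as "group 1".

```python
def stratum_stats(a, b, c, d):
    """Expected value and hypergeometric variance of cell a in one 2x2 stratum.

    Rows: group 1 (a events, b non-events), group 2 (c events, d non-events).
    Margins are treated as fixed, so under H0 cell a is hypergeometric.
    """
    n1, n2 = a + b, c + d      # number at risk in each group
    m1, m0 = a + c, b + d      # total events, total non-events
    N = n1 + n2
    expected = n1 * m1 / N
    variance = n1 * n2 * m1 * m0 / (N**2 * (N - 1))
    return expected, variance

for label, cells in [("stratum 1 (2 months)", (1, 21, 1, 21)),
                     ("stratum 2 (3 months)", (0, 21, 1, 20))]:
    e, v = stratum_stats(*cells)
    print(f"{label}: observed a={cells[0]}, E(a)={e:.3f}, Var(a)={v:.4f}")
```

Stratum 1 gives E = 1 and Var ≈ 0.488, so its observed a = 1 adds nothing to the numerator; stratum 2 gives E = 0.5 and Var = 0.25, so the treated group's 0 deaths there contribute −0.5.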

Event time 3 (4 months), control group: 20 at risk; 1 event at month 4.

Event time 3 (4 months), treated group: 21 at risk; no events at 4 months.
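Continuing this way over every distinct death time and pooling the strata gives the log-rank statistic. The sketch below is self-contained (hypothetical function `logrank`; data transcribed from the BMJ table) and uses the same at-risk convention as the KM tables. With these data it should give a chi-square of about 4.66 with p ≈ 0.031, in line with the SAS Log-Rank row. The p-value uses the fact that a 1-df chi-square is a squared standard normal, so $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

# (time in months, 1 = died, 0 = censored), transcribed from the BMJ table.
prednisolone = [(2, 1), (6, 1), (12, 1), (54, 1), (56, 0), (68, 1), (89, 1),
                (96, 1), (96, 1), (125, 0), (128, 0), (131, 0), (140, 0),
                (141, 0), (143, 1), (145, 0), (146, 1), (148, 0), (162, 0),
                (168, 1), (173, 0), (181, 0)]
control = [(2, 1), (3, 1), (4, 1), (7, 1), (10, 1), (22, 1), (28, 1), (29, 1),
           (32, 1), (37, 1), (40, 1), (41, 1), (54, 1), (61, 1), (63, 1),
           (71, 1), (127, 0), (140, 0), (146, 0), (158, 0), (167, 0), (182, 0)]

def logrank(group1, group2):
    """Log-rank (CMH) chi-square with 1 df, one 2x2 stratum per event time."""
    event_times = sorted({t for t, e in group1 + group2 if e == 1})
    o_minus_e, var = 0.0, 0.0
    for t_j in event_times:
        n1 = sum(1 for t, _ in group1 if t >= t_j)   # at risk in group 1
        n2 = sum(1 for t, _ in group2 if t >= t_j)   # at risk in group 2
        d1 = sum(1 for t, e in group1 if t == t_j and e == 1)
        d = d1 + sum(1 for t, e in group2 if t == t_j and e == 1)
        n = n1 + n2
        o_minus_e += d1 - d * n1 / n                 # a_k - E(a_k)
        if n > 1:                                    # hypergeometric variance
            var += n1 * n2 * d * (n - d) / (n**2 * (n - 1))
    chi2 = o_minus_e**2 / var
    p = math.erfc(math.sqrt(chi2 / 2))               # upper tail, 1 df
    return chi2, p

chi2, p = logrank(prednisolone, control)
print(f"log-rank chi-square = {chi2:.4f}, p = {p:.4f}")
```

Swapping the two groups flips the sign of O − E but leaves the chi-square unchanged.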
