Lecture 9: Hypothesis Testing

dexteri

Presentation Transcript


  1. Lecture 9: Hypothesis Testing • One-sample tests • Tests for ≥2 samples

  2. Hypothesis Testing for One-Sample • Standard set-up • What is θ? • Common approach • Assume distribution is exponential • Test that distribution is exponential with θ = θ0

  3. Pretty Stringent • Actually • As long as the hazard is specified for the range of t, tests can be performed

  4. General Form of Test When H0 is true When H0 is true; assuming large N Note: this is a one-sided test to test h(t) > h0(t)

  5. Log-Rank • W(ti) = Y(ti) (the most popular choice of weight function) • Y(ti) is the number of individuals in the risk set at time ti

  6. Accounting for Left-Truncation • Choice of weights is still W(t) = Y(t)
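A minimal base-R sketch of the one-sample log-rank statistic. It assumes the standard result that, with W(t) = Y(t), the statistic reduces to observed minus expected events, where the expected count is the sum over subjects of H0(Tj) - H0(Lj) (Lj is the entry time, 0 without left truncation) and the variance under H0 equals the expected count. The function name, its arguments, and the data are hypothetical.

```r
# One-sample log-rank test with weight W(t) = Y(t) (sketch under the assumptions above).
# time:  observed exit times; event: 1 = event, 0 = censored
# H0:    hypothesized cumulative hazard, as a vectorized function of time
# entry: left-truncation (entry) times; 0 when there is no truncation
one_sample_logrank <- function(time, event, H0, entry = rep(0, length(time))) {
  observed <- sum(event)
  # Cumulative null hazard accumulated over each subject's time at risk
  expected <- sum(H0(time) - H0(entry))
  # Under H0 the variance of (observed - expected) equals the expected count
  z <- (observed - expected) / sqrt(expected)
  list(observed = observed, expected = expected, z = z,
       p_upper = 1 - pnorm(z))  # one-sided test of h(t) > h0(t)
}

# Hypothetical data: four subjects, null hazard constant at 0.1
res <- one_sample_logrank(time = c(2, 5, 8, 10), event = c(1, 0, 1, 1),
                          H0 = function(t) 0.1 * t)
print(res)
```

Supplying non-zero entry times shortens each subject's contribution to the expected count, which is how left truncation enters while the weight stays W(t) = Y(t).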

  7. Other Options • Harrington and Fleming • WHF(t) = Y(t) * S0(t)^p * [1 - S0(t)]^q, where p, q ≥ 0 and S0(t) = exp(-H0(t)) • Allows user to have flexibility in weighting • Can choose early (p >> q) or late (p << q) departures or departures in the mid-range (p = q > 0) from the null hypothesis to be more influential • Special case: log-rank test, p = q = 0
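The effect of p and q on the Harrington and Fleming weight can be seen in a short base-R sketch. The function name, the risk-set sizes, and the null hazard are hypothetical values chosen only for illustration.

```r
# Harrington-Fleming weight for the one-sample test:
# W_HF(t) = Y(t) * S0(t)^p * (1 - S0(t))^q, with S0(t) = exp(-H0(t))
hf_weight <- function(Y, H0_t, p = 0, q = 0) {
  stopifnot(p >= 0, q >= 0)
  S0 <- exp(-H0_t)
  Y * S0^p * (1 - S0)^q
}

# Hypothetical risk-set sizes at four times, null cumulative hazard H0(t) = 0.1 * t
t <- c(1, 5, 10, 20)
Y <- c(40, 30, 18, 6)
w_logrank <- hf_weight(Y, 0.1 * t)         # p = q = 0 reduces to Y(t)
w_early   <- hf_weight(Y, 0.1 * t, p = 2)  # relative weight falls over time
w_late    <- hf_weight(Y, 0.1 * t, q = 2)  # relative weight rises over time
print(cbind(t, w_logrank, w_early, w_late))
```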

  8. Notes • An estimator of the variance, V, can be the empirical estimate rather than the hypothesized value • When the alternative, h(t) > h0(t) is true, this variance estimator is expected to be larger and the test less powerful • If h(t) < h0(t) then this variance will be smaller and the test more powerful

  9. Example: Rheumatoid Arthritis • 10 white males with RA followed for up to 18 years • Objective: • Determine if men with RA are at greater risk of mortality

  10. Test statistics =

  11. Bone Marrow Transplant for Leukemia (example 1.3 in the book) • Patients undergoing bone marrow transplant (BMT) for acute leukemia • Three types of leukemia • ALL • AML low risk • AML high risk • What if we are interested in the overall incidence rate (i.e. either relapse or death) across all three leukemia types?

  12. Estimated KM survival probability for all incidence (i.e. both death and TRM)

  13. BMT Example • Want to test whether or not survival in BMT patients follows an exponential distribution • What does this mean we are asking? • Can estimate λ from the data (recall the MLE for an exponential distribution)
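As a reminder of the exponential MLE with right-censored data, here is a small base-R sketch on hypothetical data (not the BMT data). It also shows one arithmetic consequence: when λ is estimated from the same sample, the summed null cumulative hazard equals the number of observed events, so a log-rank type comparison of observed and expected counts against that fitted exponential is zero by construction.

```r
# Hypothetical right-censored sample (1 = event, 0 = censored)
time  <- c(2, 5, 8, 10)
event <- c(1, 0, 1, 1)

# Exponential MLE: number of events divided by total follow-up time
lambda_hat <- sum(event) / sum(time)

# Hypothesized survival probability and cumulative hazard at each subject's time
S0 <- exp(-lambda_hat * time)
H0 <- -log(S0)  # equals lambda_hat * time

# Expected number of events under the fitted exponential
expected <- sum(H0)
print(c(lambda_hat = lambda_hat, observed = sum(event), expected = expected))
```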

  14. R Code

      ### BMT example
      library(survival)  # provides Surv() and survfit()
      data <- read.csv("H:\\public_html\\BMTRY722_Summer2019\\Data\\BMT_1_3.csv")
      # Failure time: TTR by default, replaced by TTD when the patient died and TTD <= TTR
      failtime <- ifelse(data$Relapse == 0 & data$Death == 0 | data$Relapse == 1, data$TTR, NA)
      failtime <- ifelse(data$Death == 1 & data$TTR >= data$TTD, data$TTD, failtime)
      event <- ifelse(data$Relapse == 1 | data$Death == 1, 1, 0)
      st <- Surv(failtime, event)
      fit <- survfit(st ~ 1)  # empirical survival function
      plot(fit, xlab = "Time", ylab = "S(t)", lwd = 2)
      # Calculating lambda hat for estimated hazard rate
      lambda.hat <- sum(event) / sum(failtime)

  15. “survdiff” Function • Description: Tests if there is a difference between two or more curves using the G-rho family of tests, or for a single curve against a known alternative • Usage: survdiff(formula, data, subset, na.action, rho=0) • Arguments: formula: a formula expression as for other survival models, of the form Surv(time, status) ~ predictors. For a one-sample test, the predictors must consist of a single offset(sp) term, where sp is a vector giving the survival probability for each subject

  16. “survdiff” Function • Method: This function implements the G-rho family of Harrington and Fleming (1982), with weights on each death of S(t)^rho, where S is the Kaplan-Meier estimate of survival. With rho=0 this is the log-rank or Mantel-Haenszel test, and with rho=1 it is equivalent to the Peto & Peto modification of the Gehan-Wilcoxon test. If the right hand side of the formula consists only of an offset term, then a one-sample test is done. To cause the missing values in the predictors to be treated as a separate group, rather than being omitted, use a factor function with its exclude argument.

  17. R code

      # Estimating lambda
      > lambda.hat <- sum(event) / sum(failtime)
      # Expected S(t) = exp(-lambda.hat * t)
      > S.exp <- exp(-lambda.hat * failtime)
      > one.sample.test1 <- survdiff(st ~ offset(S.exp))  # default rho is 0, i.e. log-rank test
      > one.sample.test1
      Observed Expected        Z        p
            83       83        0        1
      > one.sample.test2 <- survdiff(st ~ offset(S.exp), rho = 1)
      > one.sample.test2
      Observed Expected        Z        p
            83       83        0  0.00521
      # Comparing hypothesized distribution to empirical distribution
      > plot(fit, conf.int = FALSE, lwd = 2)
      > lines(sort(failtime), rev(sort(S.exp)), col = 2, lwd = 2, type = "s")

  18. R code

      # Estimating lambda for failure times < 800
      > fail2 <- failtime[which(failtime < 800)]
      > event2 <- event[which(failtime < 800)]
      > lambda.hat2 <- sum(event2) / sum(fail2)
      # Expected S(t) = exp(-.004*t)
      > S.exp2 <- exp(-lambda.hat2 * fail2)
      > st2 <- Surv(fail2, event2); fit2 <- survfit(st2 ~ 1)
      > one.sample.testa <- survdiff(st2 ~ offset(S.exp2))
      > one.sample.testa
      Observed Expected        Z        p
            80       80        0        1
      > one.sample.testb <- survdiff(st2 ~ offset(S.exp2), rho = 1)
      > one.sample.testb
      Observed Expected        Z        p
            80       80    0.000    0.477

  19. R code

      # Estimating lambda for failure times >= 800
      > fail3 <- failtime[which(failtime >= 800)]
      > event3 <- event[which(failtime >= 800)]
      > lambda.hat3 <- sum(event3) / sum(fail3)
      # Expected S(t) = exp(-.004*t)
      > S.exp3 <- exp(-lambda.hat3 * fail3)
      > st3 <- Surv(fail3, event3); fit3 <- survfit(st3 ~ 1)
      > one.sample.testc <- survdiff(st3 ~ offset(S.exp3))
      > one.sample.testc
      Observed Expected          Z        p
             3        3  -2.56e-16        1
      > one.sample.testd <- survdiff(st3 ~ offset(S.exp3), rho = 1)
      > one.sample.testd
      Observed Expected        Z        p
             3        3   -0.035   0.9730

  20. Conclusions • So what can we conclude about our original hypothesis?

  21. Relevance • Becoming more common • Phase II cancer studies with TTE outcomes instead of response • But • Often more interested in median or 1 year survival • Yet • Very important for sample size considerations • Most often assume study data will have exponential distribution for sample size

  22. On to something more interesting… comparing ≥2 samples

  23. Comparing two or more samples • Anova type approach • Where t is the largest time for which all groups have at least one subject at risk • Data can be right-censored (and left truncated) for the tests we will discuss

  24. Notation • Let t1 < t2 < … < tD be the distinct death times in all samples being compared • At time ti, let dij be the number of events in group j out of Yij individuals at risk (j = 1,2,…,K) • Define di = Σj dij and Yi = Σj Yij, the pooled number of events and number at risk at ti

  25. Rationale • Weighted comparisons of the estimated hazard of the jth population under the null hypothesis and alternative hypothesis • Based on Nelson-Aalen estimator • If the null is true, the pooled estimate of h(t) should be an estimator for hj(t)

  26. Applying the Test • Let Wj(t) be a positive weight function s.t. Wj(t) = 0 if Yij = 0 • If all Zj(t)’s are close to zero, then little evidence to reject the null

  27. Common Form for Weight Functions • All commonly used tests choose weight functions s.t. • Note that weight is common across all j • Can redefine Z:

  28. Test Statistic • Variance and covariance of Zj(t) (K&M p. 207) • Z1(t), Z2(t), ..., ZK(t) are linearly dependent because their sum is 0 • For test statistic, choose K – 1 components • Chi-square test with K – 1 d.f. where Σ^-1 is the inverse of the variance-covariance matrix Σ
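A base-R sketch of the K-sample statistic, assuming the usual construction in which each group's weight is Yij times a common W(ti): Zj = Σi W(ti) [dij - Yij di / Yi], with hypergeometric-type variances and covariances, and a quadratic form over K - 1 components. The function name, the restriction to weights that depend only on the event time, and the three-group data are assumptions made for illustration.

```r
# K-sample weighted log-rank test (sketch). W = 1 gives the log-rank test.
# time, event (1 = event, 0 = censored), group: one entry per subject
k_sample_test <- function(time, event, group,
                          W = function(t) rep(1, length(t))) {
  group <- as.factor(group)
  K  <- nlevels(group)
  stopifnot(K >= 2)
  ti <- sort(unique(time[event == 1]))  # distinct event times
  # Y[i, j]: number at risk in group j at ti[i]; d[i, j]: events in group j at ti[i]
  Y <- t(vapply(ti, function(s) as.numeric(table(group[time >= s])), numeric(K)))
  d <- t(vapply(ti, function(s) as.numeric(table(group[time == s & event == 1])),
                numeric(K)))
  Yi <- rowSums(Y)
  di <- rowSums(d)
  w  <- W(ti)
  # Z_j = sum_i W(t_i) * (d_ij - Y_ij * d_i / Y_i); the Z_j sum to zero
  Z <- colSums(w * (d - Y * di / Yi))
  # Common factor W^2 * d_i * (Y_i - d_i) / (Y_i - 1); zero when only one subject is at risk
  fac <- ifelse(Yi > 1, (Yi - di) / (Yi - 1), 0) * di * w^2
  P <- Y / Yi
  Sigma <- diag(colSums(fac * P), K) - t(P) %*% (fac * P)
  keep  <- seq_len(K - 1)  # drop one linearly dependent component
  chisq <- drop(t(Z[keep]) %*% solve(Sigma[keep, keep, drop = FALSE]) %*% Z[keep])
  list(Z = Z, Sigma = Sigma, chisq = chisq, df = K - 1,
       p = pchisq(chisq, df = K - 1, lower.tail = FALSE))
}

# Hypothetical three-group data, all events observed
res3 <- k_sample_test(time = 1:9, event = rep(1, 9), group = rep(1:3, 3))
print(res3$chisq)
```

Which component is dropped does not matter for the statistic, because the Zj sum to zero and the rows of the covariance matrix sum to zero as well.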

  29. Log-Rank Test for 2 Groups • For log-rank W(ti)=1 • Have 2 groups and want to test if survival is the same in the groups • We want to develop a nonparametric test of

  30. Log-Rank Test for 2 Groups • If and follow some parametric distribution and are in the same family, this is easy • For example assume • But need a test whose validity doesn’t depend on parametric assumptions

  31. Constructing the Log-Rank Test • Recall our notation • t1 < t2 < … < tD are D distinct ordered event times • Yij = # people in group j at risk at ti • Yi = # people at risk across groups at ti • dij = # of people in group j that fail at ti • di = # of people across groups that fail at ti

  32. Constructing the Log-Rank Test • We can summarize the information at time ti in a 2x2 table

  33. Constructing the Log-Rank Test

  34. Constructing the Log-Rank Test

  35. Constructing the Log-Rank Test

  36. Toy Example • Say we have the following data on two groups: • We want to test the hypothesis

  37. Toy Example

  38. Toy Example

  39. Same Test in R > time<-c(3,6,9,9,11,16,8,9,10,12,19,23) > cens<-c(1,0,1,1,0,1,1,1,0,0,1,0) > grp<-c(1,1,1,1,1,1,2,2,2,2,2,2) > grp<-as.factor(grp) > > sdat<-Surv(time, cens) > survdiff(sdat~grp) Call: survdiff(formula = sdat ~ grp) N Observed Expected (O-E)^2/E (O-E)^2/V grp=1 6 4 2.57 0.800 1.62 grp=2 6 3 4.43 0.463 1.62 Chisq= 1.6 on 1 degrees of freedom, p= 0.203

  40. Same Test in R > names(toy) [1] "n" "obs" "exp" "var" "chisq" "call" > toy$obs [1] 4 3 > toy$exp [1] 2.566667 4.433333 > toy$var [,1] [,2] [1,] 1.267778 -1.267778 [2,] -1.267778 1.267778 > toy$chisq [1] 1.620508
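The components printed above can be reproduced by hand from the 2x2 tables at each event time. This base-R sketch uses the twelve observations from the survdiff call and the usual hypergeometric mean and variance for the number of group 1 failures; the helper names (tab, O1, E1, V) are chosen here for illustration.

```r
time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

ti <- sort(unique(time[cens == 1]))  # distinct event times
tab <- data.frame(
  t  = ti,
  d1 = sapply(ti, function(s) sum(time == s & cens == 1 & grp == 1)),  # group 1 failures
  d  = sapply(ti, function(s) sum(time == s & cens == 1)),             # all failures
  Y1 = sapply(ti, function(s) sum(time >= s & grp == 1)),              # group 1 at risk
  Y  = sapply(ti, function(s) sum(time >= s))                          # all at risk
)
# Hypergeometric expectation and variance of d1 given the table margins
tab$E1 <- tab$d * tab$Y1 / tab$Y
tab$V1 <- with(tab, ifelse(Y > 1,
                           d * (Y1 / Y) * (1 - Y1 / Y) * (Y - d) / (Y - 1), 0))
print(tab)

O1 <- sum(tab$d1)
E1 <- sum(tab$E1)
V  <- sum(tab$V1)
chisq <- (O1 - E1)^2 / V
p <- pchisq(chisq, df = 1, lower.tail = FALSE)
print(c(O1 = O1, E1 = E1, V = V, chisq = chisq, p = p))
```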

  41. UMP Tests

  42. More general: 2 samples • We can change the weight function • For K = 2, can use Z-score or chi-square (χ^2) • Corrects for ties

  43. Choice for Weight Functions • W(t) = 1 • Log-rank test • Optimal power for detecting differences when hazards are proportional • Wi(t) = Yi • Gehan test • Generalization of 2-sample Mann-Whitney-Wilcoxon test

  44. Choices for Weight Functions • Fleming-Harrington • General case • Special cases • Log-rank: p = q = 0 • Mann-Whitney-Wilcoxon: p = 1, q = 0 • q = 0, p > 0: gives greater weight to early departures • p = 0, q > 0: gives greater weight to late departures • Allows specific choice of influence (for better or worse!)
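A short base-R sketch of how p and q shape the weights. It assumes the usual Fleming-Harrington form W(ti) = S(ti-1)^p [1 - S(ti-1)]^q, with S the pooled Kaplan-Meier estimate just before each event time. The function name is hypothetical and the data are the twelve toy observations from the survdiff example.

```r
# Fleming-Harrington weights from the pooled Kaplan-Meier estimate (sketch)
fh_weights <- function(time, event, p = 0, q = 0) {
  ti <- sort(unique(time[event == 1]))                      # distinct event times
  Y  <- sapply(ti, function(s) sum(time >= s))              # pooled number at risk
  d  <- sapply(ti, function(s) sum(time == s & event == 1)) # pooled number of events
  S  <- cumprod(1 - d / Y)                                  # pooled KM at each event time
  S_prev <- c(1, head(S, -1))                               # KM just before each event time
  data.frame(t = ti, weight = S_prev^p * (1 - S_prev)^q)
}

time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)

w_logrank <- fh_weights(time, cens)                # p = q = 0: all weights equal 1
w_early   <- fh_weights(time, cens, p = 1, q = 0)  # weights decrease over time
w_late    <- fh_weights(time, cens, p = 0, q = 1)  # weights increase over time
print(cbind(w_early, late = w_late$weight))
```

With q > 0 the first event time receives weight 0, because the pooled estimate equals 1 just before the first event.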

  45. Others? • Many • Not all available in all software (e.g. Gehan not in R) • Worth trying a few in each situation to compare inferences

  46. Caveat • Note we are interested in the average difference (consider log-rank specifically) • What if hazards cross? • Could have significant difference prior to some t, and another significant difference after t: but what if direction differs?

  47. Next time • More on different weight functions • Tests for trends
