Lecture 20


Comparing groups

Cox PHM

Comparing two or more samples
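• Anova type approach

$$H_0: h_1(t) = h_2(t) = \cdots = h_K(t) \text{ for all } t \le \tau \quad \text{vs.} \quad H_A: \text{at least one } h_j(t) \text{ differs for some } t \le \tau$$

where τ is the largest time for which all groups have at least one subject at risk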

• Data can be right-censored for the tests we will discuss
Notation
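• Let $t_1 < t_2 < \cdots < t_D$ be the distinct death times in all samples being compared
• At time $t_i$, let $d_{ij}$ be the number of deaths in group $j$ out of $Y_{ij}$ individuals at risk ($j = 1,\dots,K$)
• Define $d_i = \sum_{j=1}^{K} d_{ij}$ and $Y_i = \sum_{j=1}^{K} Y_{ij}$, the number of deaths and the number at risk in the combined sample at time $t_i$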
Log-Rank Test Rationale
• Compare the estimated hazard rate of the jth population under the null and alternative hypotheses
• If the null is true, the pooled estimate of h(t), $d_i/Y_i$, should be an estimator for $h_j(t)$
Applying the Test

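$$Z_j(\tau) = \sum_{i=1}^{D} \left[ d_{ij} - Y_{ij}\frac{d_i}{Y_i} \right], \quad j = 1,\dots,K$$

If all the $Z_j(\tau)$'s are close to zero, there is little evidence against the null. Because the $Z_j(\tau)$'s sum to zero, the test statistic is a quadratic form in any $K-1$ of them. It is compared with a chi-square distribution with $K-1$ degrees of freedom.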
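To see how the Zj(τ) statistics and the chi-square test fit together, here is a minimal base-R sketch of the K-sample log-rank test. It assumes `status` is coded 1 = death and 0 = right-censored. `logrank_k()` is a hypothetical helper name, not a library function. The covariance uses the usual hypergeometric form, which includes the correction for tied deaths.

```r
# Minimal sketch of the K-sample log-rank test (equal weight at every death time).
# logrank_k() is a hypothetical helper; status: 1 = death, 0 = right-censored.
logrank_k <- function(time, status, group) {
  group <- factor(group)
  K <- nlevels(group)
  stopifnot(K >= 2)
  death_times <- sort(unique(time[status == 1]))   # t_1 < ... < t_D
  Z <- numeric(K)                                  # Z_j(tau), j = 1..K
  V <- matrix(0, K, K)                             # covariance matrix of Z
  for (t in death_times) {
    Y_ij <- as.numeric(table(group[time >= t]))               # at risk by group
    d_ij <- as.numeric(table(group[time == t & status == 1])) # deaths by group
    Y_i <- sum(Y_ij)
    d_i <- sum(d_ij)
    Z <- Z + (d_ij - Y_ij * d_i / Y_i)   # observed minus expected deaths
    if (Y_i > 1) {                       # a single subject at risk adds no variance
      p <- Y_ij / Y_i
      V <- V + d_i * (Y_i - d_i) / (Y_i - 1) * (diag(p) - outer(p, p))
    }
  }
  # The Z_j sum to zero, so drop the last group before forming the quadratic form
  z  <- Z[-K]
  Vk <- V[-K, -K, drop = FALSE]
  chisq <- drop(crossprod(z, solve(Vk, z)))
  list(Z = setNames(Z, levels(group)), var = V, chisq = chisq, df = K - 1,
       p = pchisq(chisq, df = K - 1, lower.tail = FALSE))
}

# Usage on a tiny made-up data set (two groups, no censoring)
res <- logrank_k(time = c(1, 3, 2, 4), status = c(1, 1, 1, 1),
                 group = c("A", "A", "B", "B"))
res$Z      # A = 2/3, B = -2/3
res$chisq  # 8/13, about 0.615, on 1 df
```

On real data, the statistic should agree with `survdiff()` from the survival package (default `rho = 0`), which is what the lecture uses.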

Others?
• LOTS!
• Gehan test
• Fleming-Harrington
• Not all are available in every software package; it is worth trying a few in each situation and comparing the inferences
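These tests differ from the log-rank test in how they weight the observed-minus-expected contribution at each death time. The sketch below runs two tests through `survdiff()` from the survival package on its built-in `lung` data. The dataset choice is only an illustration. `survdiff()` exposes only the G-rho family through its `rho` argument. `rho = 0` gives the log-rank test and `rho = 1` is equivalent to the Peto & Peto modification of the Gehan-Wilcoxon test. Other weights, such as Gehan's original weight, are not available through this argument.

```r
library(survival)
# lung: time in days; status 1 = censored, 2 = dead (Surv() accepts this coding)
lr <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 0)  # log-rank
pp <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 1)  # Peto & Peto
# p-value from the chi-square statistic on (number of groups - 1) df
p_value <- function(sd) pchisq(sd$chisq, df = length(sd$n) - 1, lower.tail = FALSE)
round(c(logrank = p_value(lr), peto_peto = p_value(pp)), 4)
```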
2+ samples
• Let’s look at a prostate cancer dataset
• Prostate cancer clinical trial
• 3 trt groups (doce Q3, doce weekly, Q3 mitoxantrone)
• 5 PSA doubling time categories
• outcome: overall survival
R: survdiff

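```r
library(survival)
#################################
# test for differences by trt grp
# st: Surv object for overall survival; trt: treatment group (1, 2, 3)
plot(survfit(st ~ trt), mark.time = FALSE, col = c(1, 2, 3))
test1 <- survdiff(st ~ trt)                       # all three groups
test2 <- survdiff(st ~ factor(trt, exclude = 3))  # groups 1 vs 2: group 3 becomes NA and is dropped
test3 <- survdiff(st[trt < 3] ~ trt[trt < 3])     # same comparison by subsetting
```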

R: survdiff

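```r
# legend for a Kaplan-Meier plot with five curves labelled 0-4
# (presumably the five PSA doubling time categories)
legend(50, 1, as.character(0:4), lty = rep(1, 5), col = 1:5, lwd = rep(2, 5))
```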

Caveat
• Note that we are interested in the average difference over time (consider log-rank specifically)
• What if hazards 'cross'?
• Could have a significant difference prior to some t, and another significant difference after t: but what if the direction differs?
• Not much evidence of crossing here
• If there isn't overlap, then the tests will be somewhat consistent
• log-rank: most appropriate for 'proportional hazards'
Example
• Klein & Moeschberger (K&M), Section 1.4
• Kidney infection data
• Two groups:
• patients with percutaneous placement of catheters (N=76)
• patients with surgical placement of catheters (N=43)
Comparisons

• p-values from the different tests: 0.11, 0.96, 0.53, 0.24, 0.26, 0.002, 0.24, 0.002, 0.002, 0.004

Notice the differences!
• Situation of varying inferences
• Need to be sure that you are testing what you think you are testing
• Check:
• look at hazards?
• do they cross?
• Problem:
• estimating hazards is messy and imprecise
• recall: $h(t) = \frac{d}{dt}H(t)$, the derivative of the cumulative hazard $H(t)$
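Differentiating an estimated H(t) is messy because the usual estimate is a step function. The Nelson-Aalen estimator, for example, jumps by $d_i/Y_i$ at each death time. A minimal base-R sketch on made-up data shows how erratic those raw increments are. `nelson_aalen()` is a hypothetical helper name. In practice, hazard estimates come from smoothing these increments.

```r
# Minimal sketch: Nelson-Aalen estimate of H(t) and its raw increments d_i / Y_i.
# nelson_aalen() is a hypothetical helper; status: 1 = death, 0 = censored.
nelson_aalen <- function(time, status) {
  t_i <- sort(unique(time[status == 1]))                      # distinct death times
  d_i <- sapply(t_i, function(t) sum(time == t & status == 1)) # deaths at t_i
  Y_i <- sapply(t_i, function(t) sum(time >= t))               # at risk at t_i
  data.frame(time = t_i, increment = d_i / Y_i, H = cumsum(d_i / Y_i))
}

# Made-up data: increments are 1/6, 1/5, 1/3, 1 (the last from a single subject)
na <- nelson_aalen(time = c(2, 3, 3, 5, 8, 9), status = c(1, 0, 1, 1, 0, 1))
na
```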
Misconception
• Survival curves crossing tells you whether the log-rank test is appropriate
• Not true:
• whether the survival curves are seen to cross depends on censoring and study length
• what if they will cross but the range of t isn't long enough to see it?
• Consider:
• Survival curves cross ⇒ hazards cross
• Hazards cross ⇒ survival curves may or may not cross
• solution?
• test in regions of t
• prior to and after cross based on looking at hazards
• some tests allow for crossing (Yang and Prentice 2005)
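One way to test in regions of t is to split follow-up at a cut point. Before the cut, anyone still at risk at the cut is administratively censored. After the cut, only subjects still under observation beyond it are used. Their risk sets at each later death time are the same as in the full data. The sketch below uses survival's `lung` data. The cut of 365 days is an arbitrary illustration; in practice it would come from looking at the estimated hazards. Choosing the cut from the same data also affects the validity of the resulting p-values.

```r
library(survival)
cut_time <- 365                                  # hypothetical cut point (days)
d <- lung
d$dead <- as.numeric(d$status == 2)              # 1 = death, 0 = censored
# Before the cut: censor everyone still at risk at cut_time
d$time_before <- pmin(d$time, cut_time)
d$dead_before <- ifelse(d$time <= cut_time, d$dead, 0)
before <- survdiff(Surv(time_before, dead_before) ~ sex, data = d)
# After the cut: subjects followed beyond cut_time
after <- survdiff(Surv(time, dead) ~ sex, data = d, subset = time > cut_time)
before
after
```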
Cox Proportional Hazards Model
• Names
• Cox regression
• semi-parametric proportional hazards
• Proportional hazards model
• Multiplicative hazards model
• When?
• 1972
• Why?
• allows adjustment for covariates (continuous or categorical) in a survival setting
• allows prediction of survival based on a set of covariates
• Analogous to linear and logistic regression in many ways
Cox PHM Notation
• Data on n individuals:
• $T_j$: time on study for individual $j$
• $d_j$: event indicator for individual $j$
• $Z_j$: vector of covariates for individual $j$
• More complicated: $Z_j(t)$
• covariates are time dependent
• they may change with time/age
Basic Model

For a Cox model with just one covariate:

$$h(t \mid Z) = h_0(t)\exp(\beta Z)$$

• $h_0(t)$:
• arbitrary baseline hazard rate
• notice that it varies by t
• β:
• regression coefficient (a vector when there are several covariates)
• interpretation is a log hazard ratio
• Semi-parametric form
• non-parametric baseline hazard
• parametric form assumed only for covariate effects
Linear model formulation
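• Usual formulation: $h(t \mid Z) = h_0(t)\exp(\beta^\top Z) = h_0(t)\exp\left(\sum_{k=1}^{p}\beta_k Z_k\right)$, so $\log h(t \mid Z) = \log h_0(t) + \sum_{k=1}^{p}\beta_k Z_k$ is linear in the covariates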
• Coding of covariates similar to linear and logistic (and other generalized linear models)
Why “proportional”?
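• hazard ratio for two covariate values $Z$ and $Z^*$:

$$\frac{h(t \mid Z)}{h(t \mid Z^*)} = \frac{h_0(t)\exp(\beta^\top Z)}{h_0(t)\exp(\beta^\top Z^*)} = \exp\{\beta^\top (Z - Z^*)\}$$

• Does not depend on t (i.e., it is constant over time)
• That is, the hazards are proportional: one is a constant multiple of the other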
• Also referred to (sometimes) as the relative risk.
Simple example
• one covariate: z = 1 for new treatment, z=0 for standard treatment
• hazard ratio = exp(β)
• interpretation: exp(β) is the ratio of the hazard of the event in the new treatment group to the hazard in the standard treatment group
• Interpretation: at any point in time, the risk of the event in the new treatment group is exp(β) times the risk in the standard treatment group
Hazard Ratios
• Assumption: “Proportional hazards”
• The relative risk does not depend on time.
• That is, "relative risk is constant over time"
• But that is still vague...
• Hypothetical Example: Assume the hazard ratio is 0.5.
• Patients in the new therapy group are at half the risk of death of those in the standard treatment group, at any given point in time.
• Hazard function: the instantaneous rate of death at time t among those who survived to time t (informally, P(die at time t | survived to time t))
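A quick simulation can make the hazard ratio of 0.5 concrete. The sketch below generates event times with hazard $h(t \mid z) = 0.1\exp(\beta z)$, where $\beta = \log(0.5)$, and fits `coxph()` from the survival package. The constant baseline hazard of 0.1, the censoring rate and the sample size are illustrative assumptions. The Cox fit never uses the baseline hazard and should still recover a hazard ratio near 0.5.

```r
library(survival)
set.seed(20)
n    <- 2000
z    <- rbinom(n, 1, 0.5)                        # 1 = new treatment, 0 = standard
beta <- log(0.5)                                 # assumed true hazard ratio of 0.5
t_event  <- rexp(n, rate = 0.1 * exp(beta * z))  # h(t | z) = 0.1 * exp(beta * z)
t_censor <- rexp(n, rate = 0.05)                 # independent right-censoring
time   <- pmin(t_event, t_censor)
status <- as.numeric(t_event <= t_censor)        # 1 = event observed
fit <- coxph(Surv(time, status) ~ z)
exp(coef(fit))                                   # estimated hazard ratio, near 0.5
```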
Hazard Ratios
• Hazard Ratio:

$$\text{HR} = \frac{\text{hazard function for New}}{\text{hazard function for Std}}$$

• Makes the assumption that this ratio is constant over time.

Interpretation Again
• For any fixed point in time, individuals in the new treatment group are at half the risk of death of those in the standard treatment group.
Refresher of coding covariates
• This should be nothing new
• Two kinds of ‘independent’ variables
• quantitative
• qualitative
• Quantitative variables are continuous
• need to determine scale
• units
• transformation?
• Qualitative variables are generally categorical
• ordered
• nominal
• coding affects the interpretation
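Coding can change what is being estimated. Consider a three-level treatment code such as `trtgrp`. Entered as a number, it gives one coefficient per one-unit step, which assumes equally spaced effects. Entered as a factor, it gives separate indicators against a reference level. This base-R sketch on made-up codes shows the design matrices. A Cox model drops the intercept column because the intercept is absorbed into $h_0(t)$.

```r
# Hypothetical treatment codes 1, 2, 3 for six made-up subjects
trt <- c(1, 2, 3, 1, 2, 3)
# Numeric coding: a single column, so one coefficient for each one-unit step
model.matrix(~ trt)
# Nominal coding: indicators for levels 2 and 3, with level 1 as the reference
model.matrix(~ factor(trt))
```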
Tests of the model
• Testing $H_0: \beta_k = 0$ for all $k = 1,\dots,p$
• Three main tests
• Chi-square/Wald test
• Likelihood ratio test
• Score test
• Under the null hypothesis, all three have (asymptotically) a chi-square distribution with p degrees of freedom
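In R, `summary()` of a `coxph` fit reports all three global tests as named vectors (`test`, `df`, `pvalue`). The likelihood ratio statistic can also be computed by hand from the fitted model's `loglik` component, which holds the log partial likelihood at β = 0 and at the estimate. The sketch uses survival's built-in `lung` data purely for illustration.

```r
library(survival)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
s <- summary(fit)
# Global tests of H0: all beta_k = 0 (here p = 2 coefficients)
rbind(likelihood_ratio = s$logtest, wald = s$waldtest, score = s$sctest)
# Likelihood ratio statistic by hand: 2 * (log L(beta-hat) - log L(0))
lrt_by_hand <- 2 * (fit$loglik[2] - fit$loglik[1])
lrt_by_hand
```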
Example: TAX327
• Randomized clinical trial of men with hormone-refractory prostate cancer
• three treatment arms (Q3 docetaxel, weekly docetaxel, and Q3 mitoxantrone)
• other covariates of interest:
• PSA doubling time
• lymph node involvement
• liver metastases
• number of metastatic sites
• pain at baseline
• baseline PSA
• alkaline phosphatase
• hemoglobin
• performance status
Cox PHM approach

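```r
library(survival)
attach(data, pos = 2)               # attach first so survtime, died, trtgrp are visible
st <- Surv(survtime, died)          # died: 1 = death, 0 = censored
reg1 <- coxph(st ~ trtgrp)          # trtgrp as a numeric (linear) term
reg2 <- coxph(st ~ factor(trtgrp))  # trtgrp as categorical: two indicators vs group 1
summary(reg2)
attributes(reg2)
reg2$coefficients
summary(reg2)$coef
```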

Results

```
> summary(reg2)
Call:
coxph(formula = st ~ factor(trtgrp))

  n= 1006

                  coef exp(coef) se(coef)    z      p
factor(trtgrp)2  0.105      1.11   0.0882 1.19 0.2300
factor(trtgrp)3  0.245      1.28   0.0863 2.84 0.0045

                exp(coef) exp(-coef) lower .95 upper .95
factor(trtgrp)2      1.11      0.900     0.935      1.32
factor(trtgrp)3      1.28      0.783     1.079      1.51

Rsquare= 0.008   (max possible= 1 )
Likelihood ratio test= 8.12  on 2 df,   p=0.0173
Wald test            = 8.16  on 2 df,   p=0.0169
Score (logrank) test = 8.19  on 2 df,   p=0.0167
```
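Every derived column in the `summary(reg2)` output follows from `coef` and `se(coef)`. The hazard ratio is exp(coef), z is coef/se, and the 95% limits are exp(coef ± 1.96·se). The global p-values are chi-square tail areas on 2 df. This base-R sketch recomputes them from the rounded values printed in that output, so small differences in the last digit are expected.

```r
# Rounded coefficients and standard errors copied from the summary(reg2) output
b  <- c(trt2 = 0.105,  trt3 = 0.245)
se <- c(trt2 = 0.0882, trt3 = 0.0863)
z     <- b / se
hr    <- exp(b)                         # hazard ratio vs treatment group 1
lower <- exp(b - qnorm(0.975) * se)     # 95% confidence limits on the HR scale
upper <- exp(b + qnorm(0.975) * se)
p     <- 2 * pnorm(-abs(z))             # two-sided Wald p-value
round(cbind(hr, z, p, lower, upper), 4)
# Global tests: upper tail of a chi-square with 2 df
global_p <- pchisq(c(LR = 8.12, Wald = 8.16, Score = 8.19), df = 2, lower.tail = FALSE)
round(global_p, 4)
```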

Multiple regression
• In the published paper, the model included all of the covariates in the previous list
Fitting it in R

```r
reg3 <- coxph(st ~ factor(trtgrp) + liverny + numbersites + pain0c + pskar2c + proml +
                probs + highgrade + logpsa0 + logalkp0c + hemecenter + psadtmonthcat)
reg4 <- coxph(st ~ factor(trtgrp) + liverny + numbersites +
pain0c + pskar2c + proml + probs + highgrade + logpsa0 +
```

```
> reg3
Call:
coxph(formula = st ~ factor(trtgrp) + liverny + numbersites +
    pain0c + pskar2c + proml + probs + highgrade + logpsa0 +

                   coef exp(coef) se(coef)     z       p
factor(trtgrp)2  0.1230     1.131   0.1099  1.12 2.6e-01
factor(trtgrp)3  0.3784     1.460   0.1070  3.54 4.0e-04
liverny          0.4813     1.618   0.2168  2.22 2.6e-02
numbersites      0.4757     1.609   0.1430  3.33 8.8e-04
pain0c           0.3708     1.449   0.0925  4.01 6.1e-05
pskar2c          0.3167     1.373   0.1339  2.37 1.8e-02
proml            0.3132     1.368   0.1125  2.78 5.4e-03
probs            0.2568     1.293   0.0991  2.59 9.5e-03
highgrade        0.1703     1.186   0.0922  1.85 6.5e-02
logpsa0          0.1549     1.168   0.0312  4.96 7.0e-07
logalkp0c        0.2396     1.271   0.0483  4.96 7.0e-07
hemecenter      -0.1041     0.901   0.0351 -2.96 3.1e-03
psadtmonthcat   -0.0884     0.915   0.0430 -2.05 4.0e-02

Likelihood ratio test=205  on 13 df, p=0  n=641 (365 observations deleted due to missingness)
```

proportional?
• Recall that we are making the strong assumption of proportional hazards for each covariate
• We can investigate this to some extent via graphical displays
• But graphical checks are limited for quantitative variables
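Beyond graphical displays, the survival package provides `cox.zph()`. It tests proportional hazards for each term, plus a global test, using scaled Schoenfeld residuals. This is a complementary check that these slides do not cover. For a categorical covariate, roughly parallel log(-log S(t)) curves against log t are the classic graphical check. The sketch uses survival's built-in `lung` data purely for illustration.

```r
library(survival)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
zp <- cox.zph(fit)   # tests based on scaled Schoenfeld residuals
zp                   # one row per term plus a GLOBAL row
plot(zp, var = 2)    # smoothed beta(t) for sex; a roughly flat curve is consistent with PH
# Graphical check for a categorical covariate: roughly parallel curves suggest PH
plot(survfit(Surv(time, status) ~ sex, data = lung), fun = "cloglog", col = 1:2)
```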
“Local” Tests
• Testing individual coefficients
• But, more interestingly, testing sets of coefficients
• Example:
• testing the psa variables
• testing treatment group (3 categories)
• Same tests as before:
• Wald test
• Likelihood ratio test
• Score test
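A local test of a set of coefficients can be run two ways. The likelihood ratio version compares the log partial likelihoods of nested models. The Wald version uses the subset of estimates and its covariance block: $W = \hat\beta_S^\top \hat V_S^{-1} \hat\beta_S$. Both are compared with a chi-square on as many df as there are coefficients in the set. The sketch tests the three `celltype` indicators in survival's built-in `veteran` data, which is only an illustration. `anova()` on nested `coxph` fits gives the same likelihood ratio test.

```r
library(survival)
full <- coxph(Surv(time, status) ~ karno + celltype, data = veteran)
red  <- coxph(Surv(time, status) ~ karno, data = veteran)
# Likelihood ratio test for the set of celltype coefficients
lrt   <- 2 * (full$loglik[2] - red$loglik[2])
p_lrt <- pchisq(lrt, df = 3, lower.tail = FALSE)
# Wald test for the same set: beta_S' V_S^{-1} beta_S
idx  <- grep("^celltype", names(coef(full)))
b    <- coef(full)[idx]
V    <- vcov(full)[idx, idx]
wald <- drop(t(b) %*% solve(V) %*% b)
p_wald <- pchisq(wald, df = length(idx), lower.tail = FALSE)
c(lrt = lrt, p_lrt = p_lrt, wald = wald, p_wald = p_wald)
anova(red, full)   # likelihood ratio test computed by the survival package
```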
TAX327

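```r
# reduced model: the same covariates without treatment group
reg5 <- coxph(st ~ liverny + numbersites + pain0c + pskar2c + proml + probs +
                highgrade + logpsa0 + logalkp0c + hemecenter + factor(psadtmonthcat))
# likelihood ratio test for treatment group (2 df): reg4 vs reg5
lrt.trt <- 2 * (reg4$loglik[2] - reg5$loglik[2])
p.trt <- pchisq(lrt.trt, df = 2, lower.tail = FALSE)
# to compare, both models must be fit to the same dataset (same observations)
```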
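Missing covariate values are the usual reason two fits end up using different observations. In the `reg3` output above, 365 observations were deleted. A minimal sketch of one remedy is to build the complete-case data set for the larger model first and fit both models to it. It uses survival's `lung` data, where `meal.cal` has many missing values, purely as an illustration.

```r
library(survival)
vars <- c("time", "status", "age", "sex", "meal.cal")
cc   <- na.omit(lung[, vars])     # complete cases for the larger model
full <- coxph(Surv(time, status) ~ age + sex + meal.cal, data = cc)
red  <- coxph(Surv(time, status) ~ age + sex, data = cc)
c(full = full$n, reduced = red$n) # same patients in both fits
lrt <- 2 * (full$loglik[2] - red$loglik[2])
pchisq(lrt, df = 1, lower.tail = FALSE)
```

Fitting `red` to all of `lung` instead would use more patients than `full`, and the two log partial likelihoods would no longer be comparable.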