
IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION
Donald A. Pierce, Oregon Health Sciences Univ
Ruggero Bellio, Udine, Italy
These slides are at www.science.oregonstate.edu/~piercedo/

Presentation Transcript


  1. IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION
  • Donald A. Pierce, Oregon Health Sciences Univ
  • Ruggero Bellio, Udine, Italy
  • These slides are at www.science.oregonstate.edu/~piercedo/
  UW Winter 07

  2.
  • Nearly all survival analysis uses first-order asymptotics: limiting distributions of the MLE, LR, or scores; interest here is only in Cox regression and partial likelihood
  • Usually these approximations are quite good, but it is of interest to verify this or improve on them (Samuelsen, Lifetime Data Analysis, 2003)
  • We consider both higher-order asymptotics and more direct simulation of P-values
  • Primary issue: inference beyond first order requires more than the likelihood function
  • This may lead to unreasonable dependence of methods on the censoring model and baseline hazard
  • Our approach involves forms of conditioning on censoring

  3.
  • Consider direct simulation of P-values without this conditioning (same issues arise in higher-order asymptotics)
  • One must estimate the baseline hazard, sample failure times according to this, then apply the censoring model, which may involve estimating a censoring distribution
  • Quite unattractive in view of the essentially rank-based nature of Cox regression
  • With suitable conditioning, and some further conventions regarding the censoring model, this can be avoided
  • Aim is to maintain the rank-based nature of inference in the presence of censoring (simulation: sample failures from an exponential distribution, apply censoring to ranks)
  • We provide convenient Stata and R routines for carrying out both the simulation and the higher-order asymptotics

  4. COX REGRESSION
  • Hazards of form λ(t; x) = λ₀(t) exp(x′β), with the baseline hazard λ₀ unspecified
  • Interest parameter: a scalar function of β, with the remaining coordinates as nuisance parameters
  (figure: timeline of failures X and censorings O)
  • Risk set R(i): those alive and uncensored at failure time t(i)
  • Multinomial likelihood contribution exp(x_i′β) / Σ_{j ∈ R(i)} exp(x_j′β), the probability that it is individual (i) among these that fails
  • Partial likelihood: the product of these contributions over the observed failures
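The multinomial contribution on this slide is simple to compute directly. A minimal Python sketch (illustrative only; function name, single covariate, and no-ties assumption are ours, not from the talk):

```python
import math

def cox_partial_loglik(beta, times, status, x):
    """Log partial likelihood for a one-covariate Cox model.

    times: observation times; status: 1 = failure, 0 = censored;
    x: covariate values.  Assumes no tied failure times.
    """
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, status)):
        if d_i == 0:
            continue  # censored individuals contribute no factor
        # risk set R(i): everyone still under observation at t_i
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk)
        # probability that it is individual i who fails, among R(i)
        ll += beta * x[i] - math.log(denom)
    return ll
```

At beta = 0 each failure simply contributes minus the log of its risk-set size, which makes a handy check of any implementation.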

  5. USEFUL REFERENCE SETS for interpreting a given dataset:
  • (i) data-production frame of reference
  • (ii) conditional on "censoring configuration"
  • (iii) treating all risk sets as fixed
  • Using (i) involves the censoring model and estimation of the baseline hazard and censoring distribution (see Dawid 1991 JRSS-A regarding data-production and inferential reference sets)
  • That of (ii) requires some development/explanation. By "censoring configuration" we mean the numbers of censorings between successive ordered failures
  • Approach (iii) is not really "conditional", but many may feel this is the most appropriate reference set --- things are certainly simple from this viewpoint. Applies when risk sets arise in complicated ways, and to time-dependent covariables

  6. EXTREME* EXAMPLE TO SHOW HOW THINGS ARE WORKING
  • n = 40 with 30% random censoring; log(RR) interest parameter 1.0 with binary covariable; 5 nuisance parameters in the RR involving exponential covariables
  • Hypotheses where the one-sided Wald P-value is 0.05
  * 6 covariables with < 30 failures; results typical for such datasets

                                            Lower   Upper
    LR first order                          0.046   0.062
    Data production, exact (simulation)     0.090   0.020
    Conditional, exact (simulation)         0.103   0.024
    Conditional, 2nd-order asymptotics      0.096   0.025
    Fixed risk sets, exact (simulation)     0.054   0.051
    Fixed risk sets, 2nd-order asymptotics  0.052   0.052

  7.
  • With fewer failures and fewer nuisance parameters, adjustments are smaller and thus harder to summarize. However, the following for a typical dataset shows the essential nature of results.
  • This is for n = 20 with 25% censoring, interest parameter as before, and only 1 nuisance parameter.

                                            Lower   Upper
    LR first order                          0.042   0.065
    Data production, exact (simulation)     0.053   0.040
    Conditional, exact (simulation)         0.054   0.037
    Conditional, 2nd-order asymptotics      0.060   0.043
    Fixed risk sets, 2nd-order asymptotics  0.047   0.051

  • Samuelsen's conclusion, that in small samples the Wald and LR confidence intervals are conservative, does not seem to hold up with any useful generality

  8. CONDITIONING ON "CENSORING CONFIGURATION"
  • That is, on the vector c = (c_1, ..., c_k), where c_j is the number censored following the jth ordered failure
  • Seems easy to accept that this is "ancillary" information for inference about relative risk when using partial likelihood. It could be that "ancillary" is not the best term for this (comments please!!)
  • The further convention involved in making this useful pertains to which individuals are censored
  • Our convention for this: in martingale fashion, sample from the risk sets the c_j individuals to be censored, with probabilities possibly depending on covariables (comments please!!)
  • Unless these probabilities depend on covariables, a quite exceptional assumption, results of Kalbfleisch & Prentice (1973, Biometrika) apply: the partial likelihood is the likelihood of "reduced ranks"
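Under this convention the censoring mechanism acts only on ranks. A hypothetical sketch of the sampling step, in the special case of uniform censoring probabilities (no covariable dependence); the function name and interface are ours:

```python
import random

def apply_censoring_config(order, config, rng=random):
    """Apply a censoring configuration c = (c_1, ..., c_k) to a failure
    order.  order lists individuals by increasing failure time; after
    the jth observed failure, c_j individuals are sampled uniformly
    from the remaining risk set and censored (martingale fashion)."""
    at_risk = list(order)
    failures, censored = [], []
    for c_j in config:
        if not at_risk:
            break
        failures.append(at_risk.pop(0))  # earliest remaining failure
        for _ in range(min(c_j, len(at_risk))):
            censored.append(at_risk.pop(rng.randrange(len(at_risk))))
    return failures, censored
```

With all c_j = 0 this reproduces the uncensored failure order unchanged, which is a simple sanity check.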

  9.
  • Recall that a probability model for censoring is often (but with notable exceptions) sort of a "fiction" concocted by the statistician, with the following aims
  • A common model: for each individual there is a fixed, or random, latent censoring time, and what is observed is the minimum of the failure and censoring times
  • This leads to the usual likelihood function: a product over individuals of f(t_i; β)^{δ_i} S(t_i; β)^{1−δ_i}, with δ_i indicating failure
  • The use of censoring models is usually only to consider whether this likelihood is valid (censoring is "uninformative") --- the model is not used beyond this
  • But the usual models as above render the problem not one involving only ranks, whereas our conditioning and convention maintain the rank-based inferential structure

  10. "REDUCED RANKS", or the marginal distribution of ranks
  • Example: individual 3 is censored; the compatible full rank orderings are 2, 3, 4, 1;  2, 4, 3, 1;  2, 4, 1, 3
  (figure: timeline with failures x and a censoring O for individual 3)
  • For uncensored data the compatible ranks are the single "reduced ranks" outcome
  • The partial likelihood, as a function of the data, provides the distribution of these reduced ranks
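The compatible-orderings idea on this slide can be made concrete by brute force. A small illustrative sketch (function and argument names are ours, not from the talk):

```python
from itertools import permutations

def compatible_orderings(failures, censor_after):
    """Enumerate full failure orderings compatible with censored data.

    failures: individuals in their observed failure order.
    censor_after: maps each censored individual to the number of
    failures observed before it was censored (it must fail later).
    """
    everyone = list(failures) + list(censor_after)
    out = []
    for perm in permutations(everyone):
        # observed failures must keep their relative order
        if [i for i in perm if i in failures] != list(failures):
            continue
        # each censored individual must outlive the failures it saw
        if all(perm.index(c) > perm.index(failures[k - 1])
               for c, k in censor_after.items() if k > 0):
            out.append(perm)
    return out
```

For the slide's example (2, 4, 1 fail in that order; 3 censored after the first failure) this recovers exactly the three listed orderings.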

  11.
  • Thus with our conditioning and convention, and no direct dependence of censoring on covariates, the K&P result yields that the partial likelihood is the actual likelihood for the "reduced rank" data
  • Means that all the theory of higher-order likelihood inference applies to partial likelihood (subject to minor issues of discreteness) --- a more general argument exists for the data-production reference set
  • Higher-order asymptotics depend only on certain covariances of scores and loglikelihoods
  • Either exact or asymptotic results can in principle be computed from the K&P result, but simulation is both simpler and more computationally efficient
  • Simulation for asymptotics is considerably simpler than for exact results (no need to fit models for each trial), but many will prefer the latter when it is not problematic

  12. SIMULATION OF P-VALUES
  • With conditioning, one may: (i) simulate failure times using a constant baseline hazard, since only the ranks matter; (ii) apply the censoring process to the rank data; and (iii) fit the two models
  • Our primary aim is to lay out assumptions justifying (i) and (ii) (comments please!!)
  • Highly tractable, except that the null and alternative models must be fitted for each trial
  • Quite often one must allow for "infinite" MLEs, and even with this the fitting can be problematic for small samples
  • Primary advantage over asymptotics is transparency
  • Stata procedure uses the same syntax as the ordinary fitting routine; takes about a minute for 5,000 trials
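As a stripped-down illustration of steps (i) and (ii), the following sketch simulates null P-values for an uncensored sample. It deviates from the talk's recipe in one labeled respect: it uses the partial-likelihood score at β = 0 rather than the LR, to avoid refitting models in every trial, and it exploits the fact that under the null, unit-exponential failure times make the failure order a uniformly random permutation. Names are illustrative:

```python
import random

def score_at_zero(order, x):
    """Partial-likelihood score at beta = 0 for a given failure order
    (no censoring): each failure contributes x_i minus the mean of x
    over the current risk set."""
    s, at_risk = 0.0, list(order)
    for i in order:
        s += x[i] - sum(x[j] for j in at_risk) / len(at_risk)
        at_risk.remove(i)
    return s

def simulate_pvalue(x, observed_score, ntrials=2000, seed=1):
    """Monte Carlo two-sided P-value: under the null only ranks matter,
    so sampling exponential failure times is equivalent to drawing a
    uniformly random failure order."""
    rng = random.Random(seed)
    order = list(range(len(x)))
    hits = 0
    for _ in range(ntrials):
        rng.shuffle(order)
        if abs(score_at_zero(order, x)) >= abs(observed_score):
            hits += 1
    return hits / ntrials
```

The full procedure of the talk would replace the score with the signed-root LR, fitted per trial, and apply a censoring configuration to each simulated order.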

  13. SECOND-ORDER METHODS
  • This is for inference about scalar functions of the RR. It involves the quantity r* proposed by Barndorff-Nielsen, where r is the signed root of the maximum likelihood ratio statistic and the adjustment to it involves more than the likelihood function
  • Insight into limitations of first-order methods derives from decomposing this adjustment as r* = r + NP + INF, where NP allows for fitting nuisance parameters and INF basically allows for moving from likelihood to frequency inference
  • Generally, INF is only important for fairly small samples, but NP can be important for reasonable amounts of data when there are several nuisance parameters

  14. COMPUTATION OF THIS
  • Will not give the (fairly simple) formulas here, but they involve certain covariances of loglikelihoods and scores as the parameter varies, evaluated at the constrained and full MLEs (formulas: Pierce & Bellio, Biometrika 2006, 425)
  • These must be computed by simulation, raising the same issues about reference sets, but this is far easier than the simulation of likelihood ratios
  • The quantities above pertain to statistical curvature, and at least in our setting the magnitude and direction of the NP adjustment relate to the extent and direction of the curvature introduced by variation in the composition of risk sets
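The actual formulas are in Pierce & Bellio (2006) and are not reproduced here; the following only sketches the general idea of estimating moments of the score by simulating rank data from the model, using the multinomial characterization of slide 4. The setup and names are ours, and this is a sketch, not the authors' routine:

```python
import math, random

def sample_failure_order(beta, x, rng):
    """Sample a failure order under the Cox model: at each step the
    next failure is drawn from the risk set with probability
    proportional to exp(beta * x_j)."""
    at_risk = list(range(len(x)))
    order = []
    while at_risk:
        wts = [math.exp(beta * x[j]) for j in at_risk]
        u, cum = rng.random() * sum(wts), 0.0
        for j, w in zip(at_risk, wts):
            cum += w
            if u <= cum:
                order.append(j)
                at_risk.remove(j)
                break
    return order

def mc_score_variance(beta, x, ntrials=2000, seed=2):
    """Monte Carlo estimate of the variance of the partial-likelihood
    score, evaluated at the same beta used for sampling."""
    rng = random.Random(seed)
    def score(order):
        s, at_risk = 0.0, list(range(len(x)))
        for i in order:
            wts = [math.exp(beta * x[j]) for j in at_risk]
            tot = sum(wts)
            s += x[i] - sum(x[j] * w for j, w in zip(at_risk, wts)) / tot
            at_risk.remove(i)
        return s
    vals = [score(sample_failure_order(beta, x, rng)) for _ in range(ntrials)]
    m = sum(vals) / ntrials
    return sum((v - m) ** 2 for v in vals) / ntrials
```

Because each trial only samples and scores ranks, with no model fitting, this is far cheaper than simulating likelihood ratios, which is the point made on the slide.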

  15. RISK SETS AS FIXED
  • Things simplify considerably for the inferential reference set where the risk sets are taken as fixed (and the experiments on these as independent)
  • Use of this reference set often seems necessary when risk sets arise in complex ways; mainly useful for inference about relative risk beyond the analysis of simple response-time data
  • It is also quite adequate for all needs when the numbers at risk are large in relation to the number of failures (rare events)

  16. FORMULAS FOR FIXED RISK SETS
  • In this case the setting is one of independent multinomial experiments defined on the risk sets. The following is for a loglinear RR
  • Formulas of Pierce & Peters (1992, JRSS-B) apply, yielding an NP adjustment in terms of w, the Wald statistic, and the ratio of determinants of the nuisance-parameter information at the full and constrained MLEs
  • May be useful in exploring in what settings the NP adjustment is important: the nuisance-parameter information must "vary rapidly" with the value of the interest parameter
  • However, these adjustments are smaller than for our other reference sets

  17. SAME AS FIRST EXAMPLE (5 nuisance parameters) BUT WITH:
  • n = 500 with 97% random censoring (fewer failures than before, namely about 15) --- rare-disease case
  • Remainder of the model specification as in the first example; results when the Wald P-value is 0.05, typical results for a single dataset, lower limits

    LR first order                          0.057
    Data-production refset                  0.059
    Conditional, exact (direct simulation)  0.054
    Conditional, second-order               0.054
    Fixed risk sets, exact (simulation)     0.055
    Fixed risk sets, 2nd-order              0.052

  18. OVERALL RECOMMENDATIONS
  • Seems that adjustments will usually be small, but it is worthwhile to verify that in many instances, when convenient enough
  • Will provide routines in Stata and R. The Stata one largely uses the same syntax as the basic fitting command
  • When failures are a substantial fraction of those at risk, use conditional simulation of P-values unless problems with fitting are encountered
  • If those problems are likely or encountered, then use the 2nd-order methods. These also provide more insight
  • When failures are a small fraction of those at risk, or when risk sets arise in some special way, use the asymptotic fixed-risk-set calculations
