Cervical Cancer Case Study

Presented by:

University of Guelph

Baktiar Hasan

Mark Kane

Melanie Laframboise

Michael Maschio

Andy Quigley

Objectives

  • To determine an appropriate model for the prediction of recurrence of cervical cancer

  • To classify future patients according to their risk of recurrence of cervical cancer

Cervical Cancer Data Set

The original data set included 905 cases.

Patients were removed from the data set if they met ANY of the following criteria:

  • Not free of the disease after surgery

845 cases remain (60 removed).

Modeling Methods

  • Mixture Model with Accelerated Failure Time

    • Peng, Dear, and Denham (1998)

  • Cox Proportional Hazards Model

  • Latent Variable Model

  • Bayesian Survival Analysis

    • Seltman, Greenhouse, and Wasserman (2001)

    • Chen, Ibrahim, and Sinha (1999)

Mixture model

  • The model we chose for modeling time to recurrence is a mixture model of the form:

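    $S(t) = p\,S_u(t) + (1 - p)$

    where $p$ is the probability that a patient is not cured (susceptible to recurrence) and $S_u(t)$ is the survival function of the uncured patients

  • Allows for a cure rate, $1 - p$

  • Covariates can be incorporated into the survival time [$S_u(t)$] and/or the cure rate [$1 - p$]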
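A minimal Python sketch of the mixture form above, assuming a Weibull survival function for the uncured patients (the distribution chosen later in the talk). The function names, the `shape`/`scale` parameterization, and any numeric values are illustrative assumptions, not fitted estimates. It shows the defining property of the model: as t grows, S(t) levels off at the cure rate 1 - p instead of falling to zero.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Assumed Weibull survival for uncured patients: S_u(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def mixture_survival(t: float, p: float, shape: float, scale: float) -> float:
    """Population survival S(t) = p * S_u(t) + (1 - p).

    p is the probability of being uncured; 1 - p is the cure rate.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p * weibull_survival(t, shape, scale) + (1.0 - p)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show S(t) approaching 1 - p = 0.8.
    for t in (0.0, 1.0, 5.0, 50.0):
        print(t, round(mixture_survival(t, p=0.2, shape=1.5, scale=2.0), 4))
```

Because the cured fraction never experiences recurrence, no choice of $S_u(t)$ can pull S(t) below 1 - p; that plateau is what distinguishes a cure model from an ordinary survival model.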

Mixture Model (Con’t)

  • The model can be fit using an S-PLUS library (GFCURE) written by Peng.

  • Further details about the library and the model can be found in Peng et al. (1998) and Maller and Zhou (1996).

  • Note that we found an error in Peng's S-PLUS library: the function pred.gfcure contains a small bug that can cause the program to crash or produce incorrect predicted values in some situations.

“Immunes” and Sufficient Follow-Up

  • Maller and Zhou (1996) suggest tests to examine the hypotheses of:

    • Presence of “immunes” in the data set

    • Sufficient follow-up time

  • In this data set, we found that immunes were present and that there was no strong evidence that the follow-up time was insufficient

Missing Covariates

  • It was noticed that a large proportion of the cases (≈40%) had at least one covariate with a missing value

  • Various methods to handle this situation include:

    • Ignoring cases with missing covariate data

    • Maximum likelihood methods (Chen and Ibrahim, 2001)

Missing Covariates (Con’t)

  • We chose to perform variable selection using only the cases with no missing covariates (n = 534).

  • Was bias introduced by dropping the incomplete cases?

  • Check: compare the distributions of the covariates in the “full” and “reduced” data sets

  • The distributions did not differ significantly, so no significant bias was introduced
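The slides do not say which comparison was used for the bias check. One common way to compare a covariate's distribution between the full and complete-case data sets is the standardized difference in means; the Python sketch below assumes that approach (a swapped-in technique, not necessarily the authors'). The helper names are hypothetical, covariates are assumed numeric, and missing values are assumed to be stored as `None`.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]  # assumed layout: covariate name -> value or None


def complete_cases(rows: Sequence[Row], covariates: Sequence[str]) -> List[Row]:
    """Keep only patients with no missing value in any of the listed covariates."""
    return [r for r in rows if all(r.get(c) is not None for c in covariates)]


def standardized_difference(full: Sequence[float], reduced: Sequence[float]) -> float:
    """(mean_reduced - mean_full) / pooled SD; values near 0 indicate similar distributions."""
    pooled_sd = math.sqrt((variance(full) + variance(reduced)) / 2.0)
    if pooled_sd == 0.0:
        raise ValueError("both samples are constant; the standardized difference is undefined")
    return (mean(reduced) - mean(full)) / pooled_sd


def compare_covariate(rows: Sequence[Row], covariates: Sequence[str], name: str) -> float:
    """Compare one covariate (one of `covariates`) between all patients observed on it
    and the complete cases."""
    full = [r[name] for r in rows if r.get(name) is not None]
    reduced = [r[name] for r in complete_cases(rows, covariates)]
    return standardized_difference(full, reduced)
```

A common rule of thumb treats an absolute standardized difference below about 0.1 as negligible. Because the reduced set is a subset of the full set, this is a descriptive comparison rather than a test between independent samples.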


  • A variety of distributions were considered for modeling recurrence time, including the Weibull, gamma, lognormal, log-logistic, extended generalized gamma, and generalized F.

  • Comparing these models by AIC showed little improvement from fitting a three- or four-parameter distribution instead of a two-parameter one.

  • Of the two-parameter distributions considered, the Weibull fit best, both in terms of likelihood and in prediction of the cure rate.
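The AIC comparison can be sketched as follows, assuming each candidate fit is summarized by its maximized log-likelihood and its number of estimated parameters (distribution parameters plus regression coefficients). AIC = 2k - 2 log L, and smaller is better. The function names and input format are assumptions for illustration; no fitted values from the study are reproduced.

```python
from typing import Dict, List, Tuple


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * loglik


def rank_by_aic(fits: Dict[str, Tuple[float, int]]) -> List[Tuple[str, float]]:
    """Sort candidate distributions from best (lowest AIC) to worst.

    fits maps a distribution name to (maximized log-likelihood, parameter count).
    """
    scored = [(name, aic(ll, k)) for name, (ll, k) in fits.items()]
    return sorted(scored, key=lambda item: item[1])
```

Each extra parameter costs 2 AIC units, so a three- or four-parameter distribution only wins if it raises the log-likelihood by more than 1 per additional parameter; "little improvement" on the slide means the richer families did not clear that bar by much.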

Variable Selection

  • Stepwise variable selection was performed using the 534 patients previously mentioned; AIC was used as the entering criterion.

  • Variables were allowed to enter both the cure rate portion of the model and survival time portion of the model.

  • The final model uses pelvic lymph node involvement (PELLYMPH) and tumor size (SIZE) to model the survival time of uncured patients, and capillary lymphatic spaces (CLS) and tumor depth (MAXDEPTH) to predict the cure rate.
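A simplified Python sketch of AIC-driven selection in which each covariate may enter either the cure-rate part or the survival-time part of the model. It is forward-only (the slides say "stepwise", which usually also allows removals), it assumes each added term contributes exactly one parameter, and `fit_loglik` is a hypothetical callback standing in for fitting the mixture model and returning its maximized log-likelihood.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Term = Tuple[str, str]  # (covariate name, "cure" or "survival")


def forward_select(
    candidates: Iterable[Term],
    fit_loglik: Callable[[FrozenSet[Term]], float],
    base_params: int,
) -> Tuple[FrozenSet[Term], float]:
    """Greedy forward selection: add the term that lowers AIC most; stop when none does."""
    candidates = list(candidates)

    def score(terms: FrozenSet[Term]) -> float:
        # AIC = 2k - 2 log L, assuming one parameter per added term.
        return 2 * (base_params + len(terms)) - 2 * fit_loglik(terms)

    selected: FrozenSet[Term] = frozenset()
    best = score(selected)
    while True:
        trials = [(score(selected | {t}), t) for t in candidates if t not in selected]
        if not trials:
            break
        trial_score, trial_term = min(trials)
        if trial_score >= best:
            break
        selected, best = selected | {trial_term}, trial_score
    return selected, best
```

Treating (covariate, component) pairs as the unit of selection is what lets the same covariate be considered separately for the cure rate and for the survival time, matching the previous bullet.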

Variable Selection (Con’t)

  • CLS was modeled as a continuous rather than a discrete variable: twice the difference in log-likelihoods between the discrete and continuous codings was only 0.017, so the extra parameters of the discrete coding did not improve the fit and the simpler continuous coding was kept.

  • Interactions of the significant covariates in the chosen model were also considered, but were found to be non-significant.
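To see why a statistic of 0.017 is negligible, it can be referred to a chi-square distribution whose degrees of freedom equal the difference in parameter counts between the discrete and continuous codings of CLS. The slides do not state how many CLS categories there are, so the degrees of freedom are an assumption here. The standard-library Python sketch below (function name hypothetical) computes the chi-square upper-tail probability for any integer degrees of freedom from its closed form.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail is erfc(sqrt(x/2)); each extra pair of df adds one term.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(x)
    series = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        series += term
    # Add 2 * phi(sqrt(x)) * series, where phi is the standard normal density.
    return total + 2.0 * math.exp(-half) / math.sqrt(2.0 * math.pi) * series
```

With one degree of freedom, for example, `chi2_sf(0.017, 1)` is about 0.90, and larger degrees of freedom only make the p-value larger, so there is no evidence that the discrete coding fits better.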

Interpretation of the Model

  • The negative coefficient of PELLYMPH indicates that uncured patients who are positive for pelvic lymph node involvement have a shorter time to recurrence than those who are negative.

  • The coefficient of SIZE is also negative: among uncured patients, larger tumors correspond to quicker recurrence of cancer.

  • The positive coefficient of CLS in the cure-rate portion of the model indicates that patients with higher CLS values have a higher probability of recurrence.

  • The coefficient of MAXDEPTH is also positive, indicating that patients with greater tumor depth have a higher probability of recurrence.
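The sign interpretations above can be checked with a small Python sketch of the final model's two linear predictors. It assumes a logistic link for the probability p of being uncured and a Weibull accelerated-failure-time form for uncured patients, log T = mu + b_pel*PELLYMPH + b_size*SIZE + sigma*W with W standard extreme-value. The function names, argument names, and any coefficient values are hypothetical, since the fitted estimates are not given here.

```python
import math


def uncured_probability(cls: float, maxdepth: float,
                        b0: float, b_cls: float, b_depth: float) -> float:
    """p = P(uncured) under an assumed logistic link; the cure rate is 1 - p."""
    eta = b0 + b_cls * cls + b_depth * maxdepth
    return 1.0 / (1.0 + math.exp(-eta))


def median_recurrence_time(pellymph: float, size: float, mu: float,
                           b_pel: float, b_size: float, sigma: float) -> float:
    """Median time to recurrence for an uncured patient under an assumed Weibull AFT model.

    S_u(t) = exp(-(t / lam) ** (1 / sigma)), lam = exp(mu + b_pel*PELLYMPH + b_size*SIZE),
    so solving S_u(t) = 0.5 gives t = lam * (ln 2) ** sigma.
    """
    lam = math.exp(mu + b_pel * pellymph + b_size * size)
    return lam * math.log(2.0) ** sigma
```

With a negative `b_pel`, a node-positive patient (PELLYMPH = 1) gets a smaller `lam`, and so a shorter median recurrence time, than a node-negative one; the same holds for larger SIZE when `b_size` is negative. With a positive `b_cls` or `b_depth`, p rises (and the cure rate falls) as CLS or MAXDEPTH increases.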

Model Validation

  • To determine how well the chosen model will predict future patients, the data were randomly split into two subsets.

  • Since it is not known whether a patient who did not relapse was cured or censored, it is not possible to compare the predicted probability of recurrence with the actual probability of recurrence.

  • A graphical method was therefore used to assess how well the predicted probabilities performed.

Model Validation (Con’t)

  • The graphical method involves predicting $F(t_i)$, the probability of recurrence before time $t_i$, for a number of chosen times $t_i$.

  • This prediction is smoothed against a recurrence indicator that equals 1 if recurrence occurred before time $t_i$ and 0 otherwise.

  • One criticism of this method is that a patient censored before $t_i$ without a recurrence may still have relapsed between their censoring time and $t_i$, in which case they should have been coded 1 rather than 0.
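The graphical check can be sketched in Python as follows. For one chosen time t_i it builds the 0/1 recurrence indicator described above and, because the slides do not name the smoother, substitutes a simple binned average: sort patients by predicted F(t_i) = 1 - S(t_i) from the fitted mixture model, split them into groups, and compare the mean prediction with the observed recurrence rate in each group. Points close to the 45-degree line suggest good calibration. The helper names are hypothetical, and the sketch inherits the censoring limitation noted in the last bullet, since a patient censored before t_i is coded 0.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def recurrence_indicator(times: Sequence[float], recurred: Sequence[bool], t_i: float) -> List[int]:
    """1 if recurrence was observed before t_i, else 0 (censored patients count as 0)."""
    return [1 if r and t < t_i else 0 for t, r in zip(times, recurred)]


def binned_calibration(predicted: Sequence[float], observed: Sequence[int],
                       n_bins: int) -> List[Tuple[float, float]]:
    """(mean predicted F(t_i), observed recurrence rate) within groups of similar predictions.

    Binning is a stand-in for the unspecified smoother used in the original analysis.
    """
    if n_bins < 1 or not predicted:
        raise ValueError("need at least one prediction and one bin")
    pairs = sorted(zip(predicted, observed))
    size = math.ceil(len(pairs) / n_bins)
    bins = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        bins.append((mean(p for p, _ in chunk), mean(o for _, o in chunk)))
    return bins
```

Repeating this for several values of t_i, using the held-out subset from the random split, gives one calibration plot per chosen time.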


  • The second objective is to classify patients into 3 groups: Low relapse, Moderate relapse, and High relapse.

  • We classified patients based on their estimated cure rate from the final model previously mentioned.

  • Low relapse: estimated cure rate ≥ 94%

  • Moderate relapse: 84% < estimated cure rate < 94%

  • High relapse: estimated cure rate ≤ 84%
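The three-group rule amounts to two cut-offs on the estimated cure rate (1 - p). A minimal Python sketch, with a hypothetical function name, applying exactly the thresholds listed above, including which group each boundary value falls into:

```python
def relapse_group(cure_rate: float) -> str:
    """Assign a relapse-risk group from the estimated cure rate (1 - p)."""
    if not 0.0 <= cure_rate <= 1.0:
        raise ValueError("cure_rate must be between 0 and 1")
    if cure_rate >= 0.94:   # Low relapse: cure rate >= 94%
        return "Low relapse"
    if cure_rate <= 0.84:   # High relapse: cure rate <= 84%
        return "High relapse"
    return "Moderate relapse"  # 84% < cure rate < 94%
```

Because both boundaries are inclusive on the outer groups, a patient at exactly 94% is Low relapse and one at exactly 84% is High relapse.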


  • We found that capillary lymphatic spaces (CLS) and tumor depth (MAXDEPTH) are important for predicting the probability of relapse, while pelvic lymph node involvement (PELLYMPH) and tumor size (SIZE) are important for predicting the survival time of uncured patients.

  • We used these attributes in a Weibull mixture model to classify patients according to their risk of recurrence.

References

  • Chen, M., and Ibrahim, J. (2001), “Maximum likelihood methods for cure rate models with missing covariates,” Biometrics, 57, 43-52.

  • Chen, M., Ibrahim, J., and Sinha, D. (1999), “A new Bayesian model for survival data with a surviving fraction,” Journal of the American Statistical Association, 94, 909-919.

  • Maller, R., and Zhou, X. (1996), Survival Analysis with Long-Term Survivors. Toronto: John Wiley & Sons.

  • Peng, Y., Dear, K., and Denham, J. (1998), “A generalized F mixture model for cure rate estimation,” Statistics in Medicine, 17, 813-830.

  • Seltman, H., Greenhouse, J., and Wasserman, L. (2001), “Bayesian model selection: analysis of a survival model with a surviving fraction,” Statistics in Medicine, 20, 1681-1691.