This study delves into the complexities of analyzing competing risks data with missing or misclassified causes. Various methods and results are discussed, highlighting challenges and solutions in this area.
Issues in analyzing competing-risks data with missing or misclassified causes
Dr. Ronny Westerman
Institute of Medical Sociology and Social Medicine, Medical School & University Hospital
July 27, 2012
Introduction Background Data and Methods Results Discussion Reference
Outline
• Introduction/Background
• Data
• Methods
• Discussion
• Perspectives
Introduction
• Limited Failure Models (Immortal)
• Competing Risks
• Missing and Misclassification of Causes (Masked Causes)
Limited Failure Model (Cure Survival Models)
• Examples: infant mortality
• Curability of cancer and decreasing mortality risk since diagnosis of cancer
• Non-defective units are not expected to fail from the risk
The Competing Risk Problem
Each subject is exposed to many competing risks, but only one of them will cause the failure.
A subject is still right-censored if it does not fail within the follow-up duration.
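The data structure described on this slide can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: each risk has an independent latent exponential failure time, the rates and the follow-up length are hypothetical, and cause 0 is used as the code for right-censoring.

```python
import random

def simulate_competing_risks(n, rates, follow_up, seed=0):
    """Return a list of (time, cause) pairs; cause 0 means right-censored."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # One latent exponential failure time per risk (hypothetical rates,
        # independence between risks is an assumption of this sketch).
        latent = [rng.expovariate(rate) for rate in rates]
        time = min(latent)              # only the earliest risk causes the failure
        cause = latent.index(time) + 1  # causes are numbered 1, 2, ...
        if time > follow_up:
            # No failure within follow-up: the subject is right-censored.
            time, cause = follow_up, 0
        data.append((time, cause))
    return data

sample = simulate_competing_risks(n=1000, rates=[0.10, 0.05], follow_up=5.0)
```

Each record keeps only the observed time and the single cause that produced the failure, which is exactly the information available in a competing-risks dataset.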
Competing Risks
• Non-parametric, semi-parametric and fully parametric models
• Cause-specific hazard function
• Problem: assumption of independence between causes often violated?
• Assumption: the failure time under a given risk is the same whether all risks are operative or all risks except the one under consideration have been removed
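For reference, the cause-specific hazard and the quantities built from it can be written as follows. The notation is assumed here and does not come from the slides: T is the failure time, C the failure cause, and J the number of competing causes.

```latex
\begin{aligned}
\lambda_j(t) &= \lim_{\Delta t \to 0}
  \frac{P(t \le T < t + \Delta t,\; C = j \mid T \ge t)}{\Delta t},
  \qquad j = 1, \dots, J \\
S(t) &= \exp\!\left(-\int_0^t \sum_{j=1}^{J} \lambda_j(u)\, du\right) \\
F_j(t) &= P(T \le t,\; C = j) = \int_0^t \lambda_j(u)\, S(u)\, du
\end{aligned}
```

The cumulative incidence F_j depends on all cause-specific hazards through S, which is why it can be estimated without the independence assumption, whereas interpreting a single hazard "with all other risks removed" cannot.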
Misclassification and Missing Causes
• The cause of the event is not exactly identified or recorded for some units or individuals
• Partial masking: the cause is narrowed down but not exactly identified
• Reasons for misclassification:
• the documentation containing the information needed to attribute the cause of failure may not have been collected, or the cause of disease may be difficult to determine for some patients
Difficulties in determination (aetiological problems)
• Example: cardioembolic stroke (Leary and Caplan, 2008)
• Cardioembolic stroke (CS) occurs when the heart pumps unwanted materials into the brain circulation, resulting in the occlusion of a brain blood vessel and damage to the brain tissue.
• CS is diagnosed in 3-8% of stroke patients, but in various current stroke registries approximately 10-20% of patients with CS do not have maximal symptoms at the onset of their stroke
Exclusion
Misclassification
• Example: breast cancer
• TNM staging vs. I-IV staging
• Stage migration: improved detection of illness leads to movement of people from the set of healthy people to the set of unhealthy people
• Will Rogers phenomenon: "When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states"
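The arithmetic behind the Will Rogers phenomenon is easy to check with a toy example. The survival times below are hypothetical and chosen only so that the reclassified patient is the worst case in the better group and still better than the average of the worse group.

```python
from statistics import mean

def migrate(low_group, high_group, moved):
    """Move the values in `moved` out of low_group and into high_group."""
    remaining = list(low_group)
    for value in moved:
        remaining.remove(value)
    return remaining, list(high_group) + list(moved)

# Hypothetical survival times in months for two stages.
early_stage = [40, 50, 60, 70, 80]
late_stage = [10, 20, 30]

# Improved detection reclassifies the worst early-stage patient as late stage.
new_early, new_late = migrate(early_stage, late_stage, moved=[40])

early_improves = mean(new_early) > mean(early_stage)
late_improves = mean(new_late) > mean(late_stage)
overall_unchanged = mean(new_early + new_late) == mean(early_stage + late_stage)
```

Both stage-specific averages rise although no individual patient lives longer, so stage-specific survival comparisons across periods with different diagnostic intensity can be misleading.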
Methods for treating masked-cause data
• 1) Multiple imputation
Should be used when the baseline hazards are not proportional
Works well when the cause is missing at random
• Problem: high-mortality risks, multiple-specific and high-potential risks are often not missing at random
False classification or misinterpretation of cause-specific mortality
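To make the multiple-imputation idea concrete, here is a deliberately simplified sketch. It assumes only two causes, no covariates, and causes that are missing at random; the target quantity (the share of failures due to cause 1), the uniform prior, and all counts are illustrative choices rather than anything taken from the slides. The results are pooled with Rubin's rules.

```python
import random
from statistics import mean, variance

def impute_and_combine(causes, n_imputations=20, seed=1):
    """Estimate P(cause == 1) among failures; None marks a missing cause."""
    rng = random.Random(seed)
    observed = [c for c in causes if c is not None]
    n_missing = len(causes) - len(observed)
    k, n = sum(1 for c in observed if c == 1), len(causes)
    estimates, variances = [], []
    for _ in range(n_imputations):
        # Draw the proportion from its Beta posterior (uniform prior),
        # then impute every missing cause from that draw.
        p_draw = rng.betavariate(1 + k, 1 + len(observed) - k)
        imputed_ones = sum(rng.random() < p_draw for _ in range(n_missing))
        p_hat = (k + imputed_ones) / n
        estimates.append(p_hat)
        variances.append(p_hat * (1 - p_hat) / n)
    # Rubin's rules: pooled estimate, within- and between-imputation variance.
    q_bar = mean(estimates)
    within = mean(variances)
    between = variance(estimates)
    total = within + (1 + 1 / n_imputations) * between
    return q_bar, total, within

# Hypothetical data: 60 failures from cause 1, 30 from cause 2, 10 unknown.
causes = [1] * 60 + [2] * 30 + [None] * 10
estimate, total_var, within_var = impute_and_combine(causes)
```

If the causes are not missing at random, as the slide warns, the imputation model above is misspecified and the pooled estimate inherits that bias.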
Methods for treating masked-cause data
• 2) Second-stage analysis
• Models with non-proportional cause-specific hazards
• 3) EM for grouped survival data; Bayesian methods
• Assumptions for masked causes:
• Right censoring if the cause is not exactly identified
• Masking probability is constant over time
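A minimal EM sketch shows how masked causes can enter the likelihood. It is not the grouped-data EM named on the slide: it assumes ungrouped times, two causes with constant (exponential) cause-specific hazards, completely masked causes, and a masking probability that depends neither on time nor on the true cause. The data values are hypothetical.

```python
def em_masked_exponential(data, n_iter=200):
    """data: (time, cause) pairs; cause 0 = censored, 1 or 2 = known, None = masked."""
    exposure = sum(t for t, _ in data)
    d1 = sum(1 for _, c in data if c == 1)
    d2 = sum(1 for _, c in data if c == 2)
    masked = sum(1 for _, c in data if c is None)
    lam1 = lam2 = (d1 + d2 + masked) / (2 * exposure)  # symmetric starting values
    for _ in range(n_iter):
        # E-step: probability that a masked failure is due to cause 1.
        w1 = lam1 / (lam1 + lam2)
        # M-step: expected cause-specific failure counts over total exposure.
        lam1 = (d1 + masked * w1) / exposure
        lam2 = (d2 + masked * (1 - w1)) / exposure
    return lam1, lam2

# Hypothetical records: two masked failures and two censored subjects.
data = [(2.0, 1), (3.5, 2), (1.2, None), (4.0, 0),
        (0.8, 1), (2.7, None), (5.0, 0), (1.9, 1)]
rate1, rate2 = em_masked_exponential(data)
```

Under these assumptions the masked failures are simply split between the causes in proportion to the known failures, so the EM result can be checked against a closed form; with covariates, partial masking, or non-constant hazards no such shortcut exists.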
SEER Cancer Statistics Database
National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch (released April 2012)
• Incidence by race, gender and age (different periods of time)
• Cause-specific mortality including all specific cancers
• SEER public-use dataset on survival of breast cancer patients from 1992-2009 (n=69,990 in situ)
Leading Causes of Death in the U.S., 1975 vs. 2009
Source: US Mortality Files, National Center for Health Statistics, Centers for Disease Control and Prevention
US Death Rates, 1975-2009: Heart Disease compared to Neoplasms, by age at death
Source: US Mortality Files, National Center for Health Statistics, Centers for Disease Control and Prevention
Rates are per 100,000 and age-adjusted to the 2000 US Std Population (19 age groups - Census P25-1103).
Source: US Mortality Files, National Center for Health Statistics, Centers for Disease Control and Prevention
5-year Conditional Relative Survival for Cancer of the Female Breast (SEER, 2012)
And now, what's the problem?
Preliminary analysis with SEER data (Sen et al. 2010)
• Over-sampling the masked cases
• 46% of the women died during follow-up
• Specific mortality related to breast cancer, other cancer or non-cancer-related causes
• for 56% the exact cause of death was known
• for 35% partial information was available
• 30% with missing cause of death: false classification (breast cancer to other or multiple cancers)
• 65% of the missing causes were completely masked
How to deal with masked causes?
• Motivation to use a two-component model for masked causes
• Risks are latent: no specific information about the cause of the component failure
• Only some individuals may be susceptible to the event of interest (curability or the recessive risk for the disease)
Two-Component Model for masked causes (Maller and Zhou, 1996)
• First component: failure and survival times of individuals susceptible to a certain event (in-risk individuals, IR)
• Second component: failure and survival times of non-susceptible individuals (out-of-risk individuals, OR)
• Population survival function
• We can use Stata commands for cure survival models
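A common way to write the population survival function of such a two-component model is the mixture form below. The notation is assumed rather than taken from the slides: p is the proportion of out-of-risk (non-susceptible) individuals and S_IR is the survival function of the in-risk individuals.

```latex
S_{\mathrm{pop}}(t) = p + (1 - p)\, S_{\mathrm{IR}}(t),
\qquad \lim_{t \to \infty} S_{\mathrm{pop}}(t) = p
```

Because out-of-risk individuals never experience the event, the population curve levels off at p instead of falling to zero.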
Useful Stata commands for cure models: lncure, spsurv, and cureregr (Lambert, 2007)
• Advantages of cureregr: fits both mixture and non-mixture cure models; available parametric distributions: exponential, Weibull, lognormal, and gamma
• Optional: strsmix, allowing more flexible parametric distributions
Data Analysis with SEER Breast Cancer Data
• Survival of breast cancer patients from 1992-2009 (n=69,990 in situ)
• Cause of death: breast cancer and other causes; other causes as competing risks
• We used a non-mixture cure fraction model with Weibull and exponential specifications
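The non-mixture cure fraction model is usually written as S(t) = p^F(t), where p is the long-term (cure) fraction and F is a proper distribution function. The sketch below assumes the Weibull parameterization F(t) = 1 - exp(-(t/scale)^shape); the slides do not state which parameterization was fitted, and the numeric values are hypothetical.

```python
import math

def weibull_cdf(t, scale, shape):
    """Weibull distribution function F(t) = 1 - exp(-(t / scale) ** shape)."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def nonmixture_cure_survival(t, cure_fraction, scale, shape=1.0):
    """Population survival S(t) = cure_fraction ** F(t).

    shape=1.0 reduces the Weibull to the exponential specification.
    """
    return cure_fraction ** weibull_cdf(t, scale, shape)

# Hypothetical parameter values, evaluated on a yearly grid.
years = [0, 1, 5, 10, 20]
weibull_curve = [nonmixture_cure_survival(t, 0.8, scale=6.0, shape=1.5) for t in years]
exponential_curve = [nonmixture_cure_survival(t, 0.8, scale=6.0) for t in years]
```

Both curves start at 1 and flatten towards the cure fraction, so the long-term parameter can be read directly off the plateau of the fitted survival function.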
Results from Data Analysis (Estimates for the Long-Term Survival Function)
Λ: scale parameter, φ: shape parameter, p: long-term parameter
Results
• No evidence that the Weibull provides a better fit than the exponential for the SEER breast cancer data at the 5% significance level
• The fit corroborates the empirical Kaplan-Meier survival estimates
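One standard way to compare the two specifications is a likelihood-ratio test, since the exponential is the Weibull with the shape parameter fixed at 1. The slides do not say which test was used, and the log-likelihood values below are hypothetical placeholders, not results from the SEER analysis.

```python
import math

def likelihood_ratio_test(loglik_restricted, loglik_full):
    """LR statistic and p-value for one extra parameter (chi-square, 1 df)."""
    statistic = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical maximized log-likelihoods: exponential (restricted), Weibull (full).
statistic, p_value = likelihood_ratio_test(-1000.0, -999.2)
weibull_fits_better = p_value < 0.05
```

A p-value above 0.05 means the extra shape parameter is not supported by the data, which is the situation the slide reports.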
Thrills and Tears with Cure Survival Models
Thrills: fewer assumptions and minor computational problems
Tears: having to overcome the naïve assumption of infinite failure time for the non-susceptible units
Limitations of parametric hazard functions
The complexity of the baseline hazard function (Crowther and Lambert, 2011)
• goes beyond standard, and sometimes biologically implausible, shapes
• a turning point in the hazard function is observed
• 2-component mixture distribution, e.g. Weibull-Weibull distribution; other distribution families also available
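A two-component mixture can produce the turning point that a single Weibull hazard cannot. The sketch below assumes the parameterization S(t) = p exp(-lam1 t^gamma1) + (1 - p) exp(-lam2 t^gamma2), with hypothetical parameter values chosen so that a fast-failing component is mixed with a slow exponential one.

```python
import math

def weibull_parts(t, lam, gamma):
    """Survival and density for S(t) = exp(-lam * t ** gamma)."""
    survival = math.exp(-lam * t ** gamma)
    density = lam * gamma * t ** (gamma - 1) * survival
    return survival, density

def mixture_hazard(t, p, lam1, gamma1, lam2, gamma2):
    """Hazard h(t) = f(t) / S(t) of a 2-component Weibull mixture."""
    s1, f1 = weibull_parts(t, lam1, gamma1)
    s2, f2 = weibull_parts(t, lam2, gamma2)
    return (p * f1 + (1 - p) * f2) / (p * s1 + (1 - p) * s2)

# Hypothetical parameters; the second component (gamma2 = 1) is exponential.
times = [0.25 * k for k in range(1, 13)]
hazards = [mixture_hazard(t, 0.5, 1.0, 3.0, 0.1, 1.0) for t in times]
peak_time = times[hazards.index(max(hazards))]
```

The hazard first rises while the fast component dominates and then falls back towards the rate of the slow component, a shape that single exponential or Weibull models cannot reproduce.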
Options in Stata
• STPM2: Stata module to estimate flexible parametric survival models (Royston-Parmar models) (updated by Lambert, 2012)
• STPM2 can also be used with single- or multiple-record data (more generalized)
• STMIX: 2-component parametric mixture survival models (Crowther and Lambert, 2011); distribution choices include Weibull-Weibull or Weibull-exponential
• STMIX can be used with single- or multiple-record data
References
• Craiu and Lee (2005): Model Selection for the Competing-Risks Model with and without Masking. Technometrics, Vol. 25, No. 4, 457-467
• Leary MC, Caplan LR (2008): Cardioembolic stroke: An update on etiology, diagnosis and management. Annals of Indian Academy of Neurology, 11, 52-63
• Lu and Liang (2008): Analysis of competing risks data with missing cause of failure under additive hazards model. Statistica Sinica 18, 219-234
• Crowther and Lambert (2011): Simulating complex survival data. Stata Nordic and Baltic Users' Group Meeting
• National Cancer Institute, DCCPS, Surveillance Research Programme, Surveillance, Epidemiology and End Results (SEER) Programme (www.cancer.seer.gov): Research Data (1973-2009) (released April 2012)
• Louzada et al. (2012): The Long-Term Bivariate Survival FGM Copula Model: An Application to a Brazilian HIV Data. Journal of Data Science 10, 515-535
• Roman et al. (2012): A New Long-Term Survival Distribution for Cancer Data. Journal of Data Science 10, 242-258
• Sen et al. (2010): A Bayesian approach to competing risks analysis with masked cause of death. Statistics in Medicine, 29, 1681-1695