Detecting Spatial Clustering in Matched Case-Control Studies

Andrea Cook, MS
Collaboration with: Dr. Yi Li
November 4, 2004

Outline
• Motivation
  • Petrochemical exposure in relation to childhood brain and leukemia cancers
• Cumulative Geographic Residuals
  • Unconditional
  • Conditional
• Simulation Results
  • Type I error
  • Power Calculations
• Application
  • Childhood Leukemia
  • Childhood Brain Cancer
• Software
• Discussion
  • Limitations
  • Future Research

Taiwan Petrochemical Study
Matched Case-Control Study
• 3 controls per case
• Matched on age and gender
• Each participant resided in one of 26 of the 38 administrative districts of Kaohsiung County, Taiwan
• Controls selected using national identity numbers (not dependent on location)

Study Population
Due to dropout, approximately 50% of the matched sets had 3-to-1 matching, 40% had 2-to-1 matching, and 10% had 1-to-1 matching.

Map of Kaohsiung

Cumulative Residuals
• Unconditional (Independence)
  • Model definition using logistic regression
  • Extension to cluster detection
• Conditional (Matched Design)
  • Model definition using conditional logistic regression
  • Extension to cluster detection

Logistic Model
Assume the logistic model with the logit link function. The likelihood score function for the regression parameter, and the corresponding information matrix, then follow from this model.
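The formulas on this slide did not survive in the transcript. For orientation, the textbook form of a logistic model with logit link, its score, and its information matrix is given here. The symbols beta, p_i, U, and I are assumed notation, not recovered from the slide.

```latex
\begin{align*}
\operatorname{logit} P(Y_i = 1 \mid X_i) &= \beta^\top X_i,
\qquad
p_i(\beta) = \frac{\exp(\beta^\top X_i)}{1 + \exp(\beta^\top X_i)},\\
U(\beta) &= \sum_{i=1}^{n} X_i \,\{ Y_i - p_i(\beta) \},\\
\mathcal{I}(\beta) &= \sum_{i=1}^{n} p_i(\beta)\,\{ 1 - p_i(\beta) \}\, X_i X_i^\top .
\end{align*}
```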

Residual Formulation
Define a residual for each participant as the observed outcome minus its fitted probability, evaluated at the estimate that solves the score equation. If the model is correctly specified, the residuals show no pattern.
=> Use residuals to test for misspecification.
Cumulative Residuals for Model Checking; Lin, Wei, Ying 2002
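A minimal Python sketch of the residual step, assuming the residual is the observed outcome minus the fitted logistic probability. The function names `fit_logistic` and `logistic_residuals` and the toy data are hypothetical and are not the authors' software.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - p)                          # score U(beta)
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # information I(beta)
        beta = beta + np.linalg.solve(info, score)
    return beta

def logistic_residuals(X, y, beta):
    """Residual e_i = y_i - fitted probability (assumed form)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return y - p

# Toy data (illustrative only): intercept plus one covariate.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_p = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0]))))
y = rng.binomial(1, true_p).astype(float)

beta_hat = fit_logistic(X, y)
e = logistic_residuals(X, y, beta_hat)
```

At the fitted estimate the score equation holds, so `X.T @ e` is numerically zero. Any spatial pattern in `e` therefore has to come from something the covariates do not explain.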

Hypothesis Test
Hypothesis of interest: geographic location (r_i, t_i) is independent of the outcome Y_i given X_i, which implies that the Cumulative Geographic Residual Moving Block Process is patternless.

Unconditional Cluster Detection
Define the Cumulative Geographic Residual Moving Block Process as a sum of the residuals of participants located within a block that moves across the study region.
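The slide's definition of the process was lost in extraction. The sketch below assumes one plausible form: the residuals of all participants inside a square block of half-width `b` are summed and scaled by the square root of n, with the block evaluated at a set of centre points. The block shape, the scaling, and the names `block_indicators` and `moving_block_process` are assumptions.

```python
import numpy as np

def block_indicators(r, t, centers, b):
    """K x n matrix of 0/1: row k flags participants inside block k."""
    in_r = np.abs(r[None, :] - centers[:, [0]]) <= b
    in_t = np.abs(t[None, :] - centers[:, [1]]) <= b
    return (in_r & in_t).astype(float)

def moving_block_process(r, t, resid, centers, b):
    """Assumed form: W_k = n^(-1/2) * sum of residuals inside block k."""
    B = block_indicators(r, t, centers, b)
    return B @ resid / np.sqrt(len(resid))

# Toy example with four participants and three block centres.
r = np.array([1.0, 1.2, 5.0, 9.0])
t = np.array([1.0, 0.8, 5.0, 9.0])
resid = np.array([0.6, 0.7, -0.4, -0.9])
centers = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
W = moving_block_process(r, t, resid, centers, b=0.5)
```

A large positive value of `W` at a centre means more cases than the model predicts near that location, which is what a hotspot would look like.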

Asymptotic Distribution
The asymptotic null distribution of the process is difficult to simulate directly. It has been shown to be equivalent to the distribution, conditional on the observed data, of a perturbed process in which the residual terms are multiplied by independent random variables G_i.

Significance Test
Testing the null:
• Simulate N realizations of the perturbed process by repeatedly generating the random variables G_i while fixing the data at their observed values.
• Calculate the p-value.
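The two steps above can be sketched as follows. This is a reconstruction under stated assumptions, not the slide's formula. It uses standard normal multipliers, a supremum-type statistic over blocks, and the usual first-order correction for having estimated the regression parameter in the style of Lin, Wei, and Ying. The p-value is taken as the share of simulated suprema at least as large as the observed one. The name `multiplier_pvalue` and its arguments are hypothetical.

```python
import numpy as np

def multiplier_pvalue(B, X, p_hat, resid, n_sim=1000, seed=0):
    """Monte Carlo p-value for sup_k |W_k| under the null.

    B      : K x n block-membership matrix (0/1)
    X      : n x p design matrix used in the logistic fit
    p_hat  : fitted probabilities, resid : y - p_hat
    """
    n = len(resid)
    rng = np.random.default_rng(seed)
    w = p_hat * (1.0 - p_hat)
    info = X.T @ (X * w[:, None])               # p x p information matrix
    eta = B @ (X * w[:, None])                  # K x p derivative term
    A = B - eta @ np.linalg.solve(info, X.T)    # K x n corrected weights
    observed = np.max(np.abs(B @ resid)) / np.sqrt(n)
    G = rng.standard_normal((n_sim, n))         # multipliers; data held fixed
    sims = np.max(np.abs((G * resid) @ A.T), axis=1) / np.sqrt(n)
    return float(np.mean(sims >= observed))

# Toy null data with an intercept-only model (illustrative only).
rng = np.random.default_rng(1)
n = 150
r, t = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y = rng.binomial(1, 0.3, n).astype(float)
X = np.ones((n, 1))
p_hat = np.full(n, y.mean())
resid = y - p_hat
grid = np.arange(1.0, 10.0, 2.0)
centers = np.array([(a, c) for a in grid for c in grid])
B = ((np.abs(r[None, :] - centers[:, [0]]) <= 1.0)
     & (np.abs(t[None, :] - centers[:, [1]]) <= 1.0)).astype(float)
p_value = multiplier_pvalue(B, X, p_hat, resid)
```

Only the multipliers are redrawn in each replicate. The locations, covariates, and residuals stay at their observed values, which is why the procedure is cheap compared with refitting the model on resampled data.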

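Conditional Logistic Model
Type of matching: 1 case to M_s controls.
Data structure: outcomes and covariates are observed for every member of each matched set (stratum) s.
Assume that, conditional on the covariates and an unobserved stratum-specific intercept, the outcome follows a model with the logit link. The conditional likelihood is obtained by conditioning on the number of cases in each stratum, which removes the stratum-specific intercepts.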

Score and Information
The conditional likelihood yields a conditional score function and an information matrix for the regression parameter.
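The slide formulas for the conditional model are missing from the transcript. The textbook conditional logistic likelihood for 1-to-M_s matching, with its score and information, is shown below as a reference point. The indexing (j = 0 for the case, j = 1, ..., M_s for the controls) and the symbols alpha_s, pi_sj, U_c, and I_c are assumed notation.

```latex
\begin{align*}
\operatorname{logit} P(Y_{sj} = 1 \mid X_{sj}, \alpha_s) &= \alpha_s + \beta^\top X_{sj},\\
L_c(\beta) &= \prod_{s=1}^{S}
  \frac{\exp(\beta^\top X_{s0})}{\sum_{j=0}^{M_s} \exp(\beta^\top X_{sj})},
\qquad
\pi_{sj}(\beta) = \frac{\exp(\beta^\top X_{sj})}{\sum_{k=0}^{M_s} \exp(\beta^\top X_{sk})},\\
U_c(\beta) &= \sum_{s=1}^{S} \sum_{j=0}^{M_s} X_{sj}\,\{ Y_{sj} - \pi_{sj}(\beta) \},\\
\mathcal{I}_c(\beta) &= \sum_{s=1}^{S} \Bigl[ \sum_{j=0}^{M_s} \pi_{sj}(\beta)\, X_{sj} X_{sj}^\top
  - \Bigl( \sum_{j=0}^{M_s} \pi_{sj}(\beta)\, X_{sj} \Bigr)
    \Bigl( \sum_{j=0}^{M_s} \pi_{sj}(\beta)\, X_{sj} \Bigr)^{\!\top} \Bigr].
\end{align*}
```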

Conditional Residual
Define a residual for each participant, evaluated at the estimate that solves the conditional score equation.
=> Use these correlated residuals to test for patterns based on location.
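One way to picture the conditional residual, assuming it takes the form suggested by the conditional score: the outcome minus the within-stratum probability of being the case. The function name `conditional_residuals` and the toy strata are hypothetical.

```python
import numpy as np

def conditional_residuals(X, y, stratum, beta):
    """Assumed form: e_sj = y_sj - exp(b'x_sj) / sum_k exp(b'x_sk) within stratum s."""
    lin = X @ beta
    resid = np.empty(len(y))
    for s in np.unique(stratum):
        idx = np.flatnonzero(stratum == s)
        w = np.exp(lin[idx] - lin[idx].max())   # shift for numerical stability
        resid[idx] = y[idx] - w / w.sum()
    return resid

# Toy data: one 1-to-3 matched set and one 1-to-2 matched set.
X = np.array([[0.2], [1.1], [0.5], [0.9], [0.1], [0.4], [1.5]])
y = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0])
stratum = np.array([1, 1, 1, 1, 2, 2, 2])

e_cond = conditional_residuals(X, y, stratum, np.array([0.8]))
e_null = conditional_residuals(X, y, stratum, np.zeros(1))
```

The residuals sum to zero within every matched set, so members of the same set are negatively correlated. This is why the slide calls them correlated residuals.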

Conditional Cumulative Residual
Define the Conditional Cumulative Residual Moving Block Process analogously, using the conditional residuals. It has been shown to be asymptotically equivalent to a perturbed process built from random variables that are independent of the observed data.

Significance Test
Testing the null:
• Simulate N realizations of the perturbed process by repeatedly generating the random variables while fixing the data at their observed values.
• Calculate the p-value.

Simulation
• Choice of G_i or G_is
  • Unconditional: Normal, Discrete
  • Conditional: Normal, Discrete; 1 to 1, 2 to 1, 3 to 1
• Type I error
• Power Calculations

Type I error
• Unconditional
  • Generate N x_i and y_i from Unif(0,10)
  • Type I error is the percentage of simulated data sets in which a significant cluster is found.
• Conditional
  • Generate N x_is and y_is from Unif(0,10)
  • Type I error is the percentage of simulated data sets in which a significant cluster is found.
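A rough sketch of how the unconditional type I error could be estimated by simulation. It assumes that x_i and y_i are participant coordinates, that outcomes are Bernoulli(0.5) and independent of location, that the model is intercept-only, and that the test uses a square moving block with normal multipliers. None of these settings are confirmed by the slides, and the function names are hypothetical.

```python
import numpy as np

def one_null_pvalue(rng, n=100, b=1.0, n_sim=200):
    """Simulate one null data set and return its Monte Carlo p-value."""
    r, t = rng.uniform(0, 10, n), rng.uniform(0, 10, n)   # locations
    y = rng.binomial(1, 0.5, n).astype(float)             # outcome ignores location
    resid = y - y.mean()                                  # intercept-only residuals
    grid = np.arange(1.0, 10.0, 2.0)
    centers = np.array([(a, c) for a in grid for c in grid])
    B = ((np.abs(r[None, :] - centers[:, [0]]) <= b)
         & (np.abs(t[None, :] - centers[:, [1]]) <= b)).astype(float)
    # With an intercept-only model the estimation correction reduces to
    # subtracting each block's share of the sample.
    A = B - B.mean(axis=1, keepdims=True)
    observed = np.max(np.abs(B @ resid))       # common n^(-1/2) factor omitted
    G = rng.standard_normal((n_sim, n))
    sims = np.max(np.abs((G * resid) @ A.T), axis=1)
    return float(np.mean(sims >= observed))

def type_one_error(n_rep=40, alpha=0.05, seed=0):
    """Share of null data sets in which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    pvals = np.array([one_null_pvalue(rng) for _ in range(n_rep)])
    return float(np.mean(pvals < alpha))

rate = type_one_error()
```

With so few replicates the estimate is noisy. A real study would use many more data sets and simulated processes before comparing the rejection rate with the nominal level.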

Type I error Unconditional Conditional

Power Calculations • Two Power Calculations

Power Calculations • Single Hotspot

Power Calculations • Multiple Hotspots

Power Calculations • Unconditional • Conditional

Application
• Study: Kaohsiung, Taiwan Matched Case-Control Study
• Method: Conditional Cumulative Geographic Residual Test (Normal and Mixed Discrete)

Results
Odds Ratio (p-values)
Marginally significant clustering for both outcomes without adjusting for smoking history.

Childhood Leukemia

Childhood Brain Cancer

Software
• R macro to handle both unconditional and conditional data
• Dataset:
  • X and Y coordinates of each participant
  • Case/control variable
  • Covariate matrix
  • Stratum variable for conditional data
• Takes just a few minutes to run!

Discussion
Cumulative Geographic Residuals
• Unconditional and conditional methods for binary outcomes
• Can find multiple significant hotspots while holding type I error at appropriate levels
• Not computationally intensive compared with other cluster detection methods
Taiwan Study
• Found a possible relationship between petrochemical exposure and childhood leukemia, but not between petrochemical exposure and childhood brain cancer.

Discussion
Future Research
• Failure Time Data
• Recurrent Events
• Relocation of Study Participants
• Surveillance
