Geospatial Analysis in Public Health: Spatial Cluster Detection. M.J. College, Jalgaon, India, September 22-26, 2008. Glen D. Johnson, New York State Department of Health and The University at Albany School of Public Health, Department of Environmental Health Sciences
Acknowledgement: Some of the following graphics on cluster detection are courtesy of Tom Talbot, MSPH, of the New York State Department of Health, who co-teaches “GIS in Public Health” with Glen Johnson and Frank Boscoe at the University at Albany, S.U.N.Y.
Cluster • A number of similar things grouped closely together (Webster’s Dictionary) • Concentrations of health events in space and/or time (public health definition)
Clustering of health outcomes may be caused by a number of community-level factors: • Occupation mix • Demographic mix (e.g. race, age, sex) • Socioeconomic status • Cultural/behavioral factors • Environmental exposure (always a big question) • Time and/or space (captures unexplained factors that co-vary with the outcome)
Cluster detection is influenced by scaling and zoning effects, which must be considered for all spatial statistics and mapping/visualization: the Modifiable Areal Unit Problem (MAUP)
Different scale of observational units: Coarser aggregation
Different zonation: Grid shift
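The scale and zoning effects named on the preceding slides can be illustrated with a small sketch. The point coordinates, the square grid, and the helper name `grid_counts` are all hypothetical choices made for illustration; the same five events are simply counted under three different zonings.

```python
import math
from collections import Counter

def grid_counts(points, cell, origin=(0.0, 0.0)):
    """Count events per square grid cell of side `cell`, with the grid anchored at `origin`."""
    return Counter(
        (math.floor((x - origin[0]) / cell), math.floor((y - origin[1]) / cell))
        for x, y in points
    )

# Hypothetical event locations: four events straddle a cell corner, one lies apart.
points = [(0.9, 0.9), (1.1, 0.9), (0.9, 1.1), (1.1, 1.1), (3.5, 3.5)]

fine = grid_counts(points, cell=1.0)                        # original zoning
shifted = grid_counts(points, cell=1.0, origin=(0.5, 0.5))  # grid shift
coarse = grid_counts(points, cell=2.0)                      # coarser aggregation

print(max(fine.values()), max(shifted.values()), max(coarse.values()))
```

In this toy layout the four neighboring events are split across four cells under the original grid but fall into a single cell after either a grid shift or a coarser aggregation, so the apparent "hot" cell depends on the zoning rather than on the events themselves.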
Cluster Questions • Does a disease cluster in space? • Does a disease cluster in both time and space? • Where is the most likely cluster?
More Cluster Questions • At what geographic or population scale do clusters appear? • Are cases of disease clustered in areas of high exposure? - or more generally, “Can the cluster be explained as being associated with something other than chance?”
Nearest Neighbor Analysis: Cuzick & Edwards Method • Count the number of cases whose nearest neighbors are cases and not controls. • When cases are clustered, the nearest neighbor to a case will tend to be another case, and the test statistic will be large.
Advantages • Accounts for the geographic variation in population density • Accounts for confounders through judicious selection of controls • Can detect clustering with many small clusters
Disadvantages • Must have spatial locations of cases & controls • Doesn’t show location of the clusters
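The nearest-neighbor count described for the Cuzick & Edwards method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, distances are plain Euclidean, ties are broken by input order, and significance is assessed by randomly relabeling cases and controls, which is one common way to obtain a reference distribution.

```python
import math
import random

def cuzick_edwards(points, is_case, k=1):
    """Sum over cases of how many of each case's k nearest neighbors are also cases."""
    total = 0
    for i, p in enumerate(points):
        if not is_case[i]:
            continue
        # Rank every other point by distance to case i (stable sort: ties keep input order).
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        total += sum(1 for j in others[:k] if is_case[j])
    return total

def cuzick_edwards_pvalue(points, is_case, k=1, n_sim=999, seed=0):
    """Permutation p-value: shuffle case/control labels over the fixed locations."""
    rng = random.Random(seed)
    observed = cuzick_edwards(points, is_case, k)
    labels = list(is_case)
    rank = 1  # the observed statistic is counted in the reference set
    for _ in range(n_sim):
        rng.shuffle(labels)
        if cuzick_edwards(points, labels, k) >= observed:
            rank += 1
    return observed, rank / (n_sim + 1)

# Hypothetical locations: two tight pairs far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(cuzick_edwards(points, [True, True, False, False]))  # cases form one pair
print(cuzick_edwards(points, [True, False, True, False]))  # cases are split across pairs
```

Because controls are drawn from the same population as the cases, the relabeling step automatically reflects uneven population density, which is the first advantage listed above.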
Knox Method: test for space-time interaction • When space-time interaction is present, cases near in space will also be near in time, and the test statistic will be large. • Test statistic: the number of pairs of cases that are near in both time and space. • The p-value is calculated through random simulations of the time values of the cases. • Critical space and time distances must be defined, i.e. what counts as “near”?
Advantages • Do not have to map controls • Determines if there is a space-time interaction. • Can detect space-time clustering even when the overall disease rate has remained the same over time
Disadvantages • Computationally time-consuming with a large number of cases. • Does not determine the areas or time periods in which clusters occur.
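The Knox statistic and its simulation-based p-value, as described above, can be sketched in a few lines. The function names, the Euclidean distance, and the example coordinates and dates are assumptions made for illustration; the critical distances `space_crit` and `time_crit` are the user-defined notion of "near".

```python
import math
import random
from itertools import combinations

def knox_statistic(points, times, space_crit, time_crit):
    """Number of case pairs that are close in both space and time."""
    return sum(
        1
        for i, j in combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= space_crit
        and abs(times[i] - times[j]) <= time_crit
    )

def knox_pvalue(points, times, space_crit, time_crit, n_sim=999, seed=0):
    """Monte Carlo p-value: shuffle the time values over the fixed case locations."""
    rng = random.Random(seed)
    observed = knox_statistic(points, times, space_crit, time_crit)
    shuffled = list(times)
    rank = 1  # the observed statistic is counted in the reference set
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if knox_statistic(points, shuffled, space_crit, time_crit) >= observed:
            rank += 1
    return observed, rank / (n_sim + 1)

# Hypothetical cases: two spatial pairs, each pair also close in time (days).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
times = [1, 2, 50, 51]
print(knox_pvalue(points, times, space_crit=2.0, time_crit=5, n_sim=99))
```

Every simulation revisits all pairs of cases, so the cost grows with the square of the number of cases times the number of simulations, which is the computational burden noted in the disadvantages.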
Spatial Scan Statistic (Martin Kulldorff, http://www.satscan.org/) • Determines locations with elevated rates that are statistically significant. • Adjusts for multiple testing of the many possible locations and area sizes of clusters. • Hypothesis testing is based on Monte Carlo simulations of the null, completely random, spatial distribution.
Following is an example of how the scan statistic algorithm delineates all possible circular clusters, based on census blocks in the city of Albany …
A likelihood ratio is then computed for every circular window, where each window represents a potential spatial cluster. For example, assuming a Poisson distribution of counts, the likelihood ratio is proportional to … for observed cases c and expected cases E[c] inside the search window, and C total observed cases throughout the region, including within the search window.
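For reference, the form in which this ratio is usually written for Kulldorff's Poisson model, using the symbols c, E[c] and C defined above, is sketched below. It is taken from the general scan-statistic literature rather than from this slide; the indicator term I(·), which restricts attention to windows with more cases than expected, is included on the assumption that only elevated rates are of interest.

```latex
\[
LR \;\propto\;
\left(\frac{c}{E[c]}\right)^{c}
\left(\frac{C - c}{C - E[c]}\right)^{C - c}
\, I\bigl(c > E[c]\bigr)
\]
```

The first factor compares observed with expected cases inside the window and the second does the same outside it, so the ratio grows as cases concentrate within the window.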
The circle with the maximum likelihood ratio is then identified as the most likely cluster, and all others are rank-ordered below the maximum. A null distribution of maximum likelihood ratios is obtained by repeating the analysis on a randomized version of the data, obtaining the maximum likelihood ratio, and repeating this exercise, say, 999 times. A p-value is obtained for each circle by comparing its likelihood ratio to the simulated null distribution. So, for a likelihood ratio whose rank is R among the simulated null values (ranked from largest to smallest, with the observed value included), the p-value = R/(# simulations + 1).
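The procedure just described can be sketched end to end on a toy data set. This is a simplified illustration under stated assumptions, not SaTScan's implementation: areas are represented by centroids, windows grow one nearest area at a time up to a hypothetical cap of 50% of the population, the log likelihood ratio uses the commonly published Poisson form, null data are generated by scattering the C cases in proportion to population, and only the most likely cluster is tested. All function names and data are invented for the example.

```python
import math
import random

def log_lr(c, e, C):
    """Poisson log likelihood ratio for a window; zero unless cases exceed expectation."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (C - c) * math.log((C - c) / (C - e)) if C > c else 0.0
    return inside + outside

def candidate_windows(centroids, pops, max_frac=0.5):
    """Circles centered on each centroid, grown one nearest area at a time."""
    N = sum(pops)
    windows = []
    for centre in centroids:
        order = sorted(range(len(centroids)),
                       key=lambda j: math.dist(centre, centroids[j]))
        members, n = [], 0
        for j in order:
            if n + pops[j] > max_frac * N:
                break
            members.append(j)
            n += pops[j]
            windows.append((tuple(members), n))
    return windows

def most_likely_cluster(cases, windows, N):
    """Return (max log LR, member areas) over all candidate windows."""
    C = sum(cases)
    best = (0.0, ())
    for members, n in windows:
        c = sum(cases[j] for j in members)
        llr = log_lr(c, n * C / N, C)
        if llr > best[0]:
            best = (llr, members)
    return best

def scan_pvalue(centroids, cases, pops, n_sim=999, seed=0):
    """Monte Carlo p-value of the most likely cluster: rank / (simulations + 1)."""
    rng = random.Random(seed)
    N, C = sum(pops), sum(cases)
    windows = candidate_windows(centroids, pops)
    obs_llr, members = most_likely_cluster(cases, windows, N)
    rank = 1  # the observed maximum is counted in the reference set
    for _ in range(n_sim):
        sim = [0] * len(pops)
        # Null model: each case lands in an area with probability proportional to population.
        for j in rng.choices(range(len(pops)), weights=pops, k=C):
            sim[j] += 1
        if most_likely_cluster(sim, windows, N)[0] >= obs_llr:
            rank += 1
    return members, obs_llr, rank / (n_sim + 1)

# Hypothetical data: five equal-population areas in a row, with an excess in the last one.
centroids = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
pops = [100, 100, 100, 100, 100]
cases = [1, 1, 1, 1, 20]
print(scan_pvalue(centroids, cases, pops, n_sim=99))
```

Comparing each observed ratio with the simulated distribution of the maximum, rather than with a per-window distribution, is what adjusts for the many overlapping windows examined.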
Note that E[c] = n*C/N for population n in the circle, where C and N are the total number of cases and the total population, respectively. Alternatively, E[c] may be computed within each covariate category i and summed over categories (an “indirect standardization”), or E[c] may even be predicted from a regression model.
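The two ways of computing E[c] mentioned above can be compared with a small numeric sketch. The stratified version uses the standard indirect-standardization sum of n_i*C_i/N_i over categories i; the function names and the two-stratum (younger/older) figures are hypothetical.

```python
def expected_crude(n, C, N):
    """E[c] = n*C/N: the window's share of all cases, by population alone."""
    return n * C / N

def expected_indirect(n_by_cat, C_by_cat, N_by_cat):
    """Indirect standardization: apply each category's regional rate C_i/N_i
    to the window's population n_i in that category, then sum."""
    return sum(n_i * C_i / N_i for n_i, C_i, N_i in zip(n_by_cat, C_by_cat, N_by_cat))

# Hypothetical region: 800 younger and 200 older residents, with 10 and 40 cases.
C_by_cat, N_by_cat = [10, 40], [800, 200]

# Two windows of 100 people each, with different age structures.
typical_window = [80, 20]  # same age mix as the region
older_window = [20, 80]    # mostly older residents

print(expected_crude(100, sum(C_by_cat), sum(N_by_cat)))
print(expected_indirect(typical_window, C_by_cat, N_by_cat))
print(expected_indirect(older_window, C_by_cat, N_by_cat))
```

The crude calculation gives both windows the same expectation, while the standardized one expects more cases in the older window, so an excess there is less likely to be flagged as a cluster merely because of its age structure.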
Recent advancements in the spatial scan statistic are aimed at overcoming the rather arbitrary restriction to circular clusters: • Patil GP, Taillie C. Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environ Ecol Stat 2004;183-197. • Duczmal L, Assuncao RM. A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. Comp Stat Data Anal 2004; 45:269-286. • Tango T, Takahashi K. A flexibly shaped spatial scan statistic for detecting clusters. Int J Health Geographics 2005; 4:11.
Regression Analysis • Control for known risk factors before analyzing for spatial clustering. • Analyze for unexplained clusters. • Follow up areas with large regression residuals using traditional case-control or cohort studies. • Obtain additional risk factor data to account for the large residuals.
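One simple way to pick out "large regression residuals" is sketched below, assuming the expected counts have already been predicted by some regression model of the known risk factors. The choice of Pearson residuals, the cutoff of 2, and the function names are illustrative assumptions rather than part of the method described above.

```python
import math

def pearson_residuals(observed, expected):
    """(observed - expected) / sqrt(expected), a common residual for count data."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

def flag_areas(observed, expected, threshold=2.0):
    """Indices of areas whose residual exceeds the (hypothetical) threshold."""
    return [i for i, r in enumerate(pearson_residuals(observed, expected))
            if r > threshold]

# Hypothetical observed counts and model-predicted counts for three areas.
observed = [10, 4, 6]
expected = [4.0, 4.0, 9.0]
print(pearson_residuals(observed, expected))
print(flag_areas(observed, expected))
```

Areas flagged this way are candidates for follow-up study or for collecting additional risk factor data, since the model of known risk factors does not account for their excess.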