Empirical/Asymptotic P-values for Monte Carlo-Based Hypothesis Testing: an Application to Cluster Detection Using the Scan Statistic

Allyson Abrams, Martin Kulldorff, Ken Kleinman

Department of Ambulatory Care and Prevention,

Harvard Medical School and Harvard Pilgrim Health Care

Presented at EVA, August 15, 2005

This work was funded by the United States National Cancer Institute, grant number RO1-CA95979.


Background: Scan Statistics

  • Spatial scan statistic – used to identify geographic clusters

  • Use moving circular window on map

    • Any point on map can be the center of a cluster

    • Each circle includes a different set of points

    • If the centroid of a region is included in the circle, the whole region is included
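The window construction described above can be sketched as follows. This is a minimal illustration, assuming each region is represented only by its centroid and that circles are centered on the centroids themselves (the slide allows any point on the map as a center, so this is a simplification). The function name `circular_windows` and the toy coordinates are hypothetical.

```python
import math

def circular_windows(centroids):
    """Return the distinct sets of region indices covered by circles
    centered on each centroid.

    A region belongs to a circle when its centroid lies inside it, so
    growing the radius around a center adds regions in order of distance.
    """
    windows = set()
    for cx, cy in centroids:
        # Sort all regions by the distance of their centroid from this center.
        # Regions at exactly equal distance are ordered arbitrarily here.
        order = sorted(
            range(len(centroids)),
            key=lambda i: math.hypot(centroids[i][0] - cx, centroids[i][1] - cy),
        )
        # Each prefix of the ordering is the content of one circle.
        # The circle covering the whole map is skipped.
        for k in range(1, len(order)):
            windows.add(frozenset(order[:k]))
    return windows

# Hypothetical centroids for four regions.
toy_centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
toy_windows = circular_windows(toy_centroids)
```

Storing each window as a `frozenset` means circles with different centers or radii that cover the same regions are counted once, which matches the idea of "distinct" windows.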


Background: Scan Statistics

For each distinct window, calculate the likelihood, proportional to:

$$\left(\frac{n}{\mu}\right)^{n}\left(\frac{N-n}{N-\mu}\right)^{N-n}$$

n = number of cases inside circle

N = total number of cases

μ = expected number of cases inside circle
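As a rough sketch, the quantity for one window can be computed on the log scale. This assumes the Poisson-model form commonly used for the spatial scan statistic, (n/μ)^n ((N−n)/(N−μ))^(N−n), with μ the expected number of cases inside the circle. Treating windows without an excess of cases (n ≤ μ) as contributing zero is also an assumption of this example, as is the function name.

```python
import math

def log_likelihood_ratio(n, mu, N):
    """Log of (n/mu)^n * ((N-n)/(N-mu))^(N-n), taken as 0 unless n > mu.

    n  : observed cases inside the window
    mu : expected cases inside the window (must satisfy 0 < mu < N)
    N  : total number of cases on the map
    """
    if n <= mu:
        return 0.0  # no excess of cases inside the window
    inside = n * math.log(n / mu)
    # When every case falls inside the window, the outside term is 0.
    outside = (N - n) * math.log((N - n) / (N - mu)) if n < N else 0.0
    return inside + outside

# Hypothetical window: 20 cases observed where 10 were expected, 600 in total.
example_llr = log_likelihood_ratio(20, 10.0, 600)
```

Working on the log scale avoids overflow from the large exponents and does not change which window attains the maximum.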


Background: Scan Statistics

  • The scan statistic is the maximum likelihood over all possible circles

    • Identifies the most unusual cluster

  • To find p-value, use Monte Carlo hypothesis testing

    • Redistribute cases randomly and recalculate the scan statistic many times

    • The p-value is the proportion of Monte Carlo replicates whose scan statistic is greater than or equal to the scan statistic for the observed data
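The Monte Carlo procedure above can be sketched in a few lines. This is an illustration under stated assumptions, not SaTScan's implementation: windows are given as sets of region indices, cases are redistributed in proportion to the expected counts, and the per-window quantity is the Poisson-model log-likelihood ratio. All function names and the toy data are hypothetical.

```python
import math
import random

def llr(n, mu, N):
    """Log-likelihood ratio for one window; 0 when there is no excess."""
    if n <= mu:
        return 0.0
    outside = (N - n) * math.log((N - n) / (N - mu)) if n < N else 0.0
    return n * math.log(n / mu) + outside

def scan_statistic(cases, expected, windows):
    """Maximum log-likelihood ratio over all candidate windows."""
    N = sum(cases)
    return max(
        llr(sum(cases[i] for i in w), sum(expected[i] for i in w), N)
        for w in windows
    )

def monte_carlo_p_value(cases, expected, windows, replicates=999, seed=0):
    """Proportion of replicates whose scan statistic is >= the observed one."""
    rng = random.Random(seed)
    N = sum(cases)
    observed = scan_statistic(cases, expected, windows)
    regions = range(len(cases))
    exceed = 0
    for _ in range(replicates):
        # Redistribute the N cases at random, in proportion to expected counts.
        simulated = [0] * len(cases)
        for i in rng.choices(regions, weights=expected, k=N):
            simulated[i] += 1
        if scan_statistic(simulated, expected, windows) >= observed:
            exceed += 1
    return exceed / replicates

# Hypothetical map: four regions, 40 cases, an apparent excess in region 0.
toy_windows = [{0}, {1}, {2}, {3}, {0, 1}, {2, 3}]
toy_p = monte_carlo_p_value([25, 5, 5, 5], [10, 10, 10, 10], toy_windows, replicates=199)
```

A common variant counts the observed value as one of the draws and returns `(exceed + 1) / (replicates + 1)`, which can never be exactly zero.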


Background: Scan Statistics

  • That discussion only considered spatial clustering

  • To extend to clustering in space and time, use cylinders instead of circles

    • The height of the cylinder represents time

  • The rest of the process is unchanged

  • SaTScan is freely available software that uses the scan statistic to detect clusters in space, time, or space-time (www.satscan.org)
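To make the cylinder idea concrete, the sketch below counts the cases that fall inside one cylinder: the circular base is a set of regions and the height is a time interval. The record layout (region index, day) and the function name are assumptions made for this example.

```python
def cases_in_cylinder(case_records, regions_in_circle, t_start, t_end):
    """Count cases whose region lies in the circular base and whose time
    falls within the closed interval [t_start, t_end]."""
    return sum(
        1
        for region, t in case_records
        if region in regions_in_circle and t_start <= t <= t_end
    )

# Hypothetical records: (region index, day of onset).
records = [(0, 1), (0, 5), (1, 5), (2, 6), (1, 9)]
count = cases_in_cylinder(records, {0, 1}, 4, 7)
```

This count plays the role of n in the likelihood, and the rest of the procedure proceeds as in the purely spatial case.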


Background: SaTScan

  • Main drawback of Monte Carlo hypothesis testing: more precise p-values can be obtained only by greatly increasing the number of Monte Carlo replicates

    • A big problem for small p-values

  • SaTScan can take anywhere from seconds to hours to run, depending on the data, the type of analysis, and the number of Monte Carlo replicates
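A back-of-the-envelope calculation shows why small p-values are costly. Assuming the Monte Carlo p-value behaves like a binomial proportion over independent replicates, its standard error is sqrt(p(1 − p)/R) for R replicates. The function names and the 10% relative-error target below are illustrative choices.

```python
import math

def mc_standard_error(p, replicates):
    """Standard error of a Monte Carlo p-value, treated as a binomial proportion."""
    return math.sqrt(p * (1.0 - p) / replicates)

def replicates_needed(p, relative_error):
    """Replicates required so that the standard error is relative_error * p."""
    return math.ceil((1.0 - p) / (p * relative_error ** 2))

# Replicates needed for a 10% relative standard error at several p-values.
needed = {p: replicates_needed(p, 0.10) for p in (0.01, 0.001, 0.0001, 0.00001)}
```

Under this assumption the required number of replicates grows roughly in proportion to 1/p: about ten thousand for p = 0.01 and about ten million for p = 0.00001.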


Background

  • We use SaTScan for 2 main reasons

    • Daily surveillance for disease outbreaks

    • Evaluating systems that use SaTScan for surveillance

  • In both cases, we need to limit the amount of time it takes to generate each p-value while still retaining enough precision in the p-value to determine how unusual a cluster is


Goal

  • Estimate distribution of the scan statistic using fewer Monte Carlo replicates

    • See how the p-values obtained from the fitted distributional parameters compare with the true p-value


Methods

  • Sample map – 245 counties in the northeast United States with 600 cases

  • Ran SaTScan on the sample map using 100,000,000 Monte Carlo replicates to find the 'true' log-likelihood needed to obtain p-values of 0.01, 0.001, 0.0001, 0.00001

    • These correspond to the following order statistics, counted from the largest of the 100,000,000 Monte Carlo replicates: 1,000,000; 100,000; 10,000; 1,000
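The link between a target p-value and an order statistic is simple arithmetic: with R replicates, the critical value for p-value p is the (p × R)-th largest replicate, for example 0.01 × 100,000,000 = 1,000,000. A small sketch, with a hypothetical function name and toy data standing in for real replicate values:

```python
def critical_value(replicate_stats, p):
    """Return the k-th largest replicate value, where k = p * number of replicates."""
    k = round(p * len(replicate_stats))
    if k < 1:
        raise ValueError("too few replicates to resolve this p-value")
    return sorted(replicate_stats, reverse=True)[k - 1]

# Toy stand-in for replicate scan statistics: the integers 1 to 1000.
toy_stats = list(range(1, 1001))
toy_critical = critical_value(toy_stats, 0.01)  # 10th largest value
```

The `ValueError` branch reflects the limitation the deck is concerned with: a set of replicates cannot resolve p-values smaller than about one over its size.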


Methods

  • Ran SaTScan 1000 times on the same map, each time generating 999 Monte Carlo replicates

  • For each of the 1000 SaTScan runs:

    • Found maximum likelihood estimates of the parameters for each distribution based on the 999 Monte Carlo replicates

      • Distributions used: Normal, Lognormal, Gamma, Gumbel
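For the Gumbel case, the maximum likelihood fit can be sketched with the standard library alone. The scale estimate β solves β = mean(x) − Σ x·exp(−x/β) / Σ exp(−x/β), and the location then follows in closed form. Solving by bisection is a choice made for this example, not necessarily how the authors fitted the distributions. The function names are hypothetical and the sample is simulated rather than taken from SaTScan output.

```python
import math
import random

def fit_gumbel_mle(xs, tol=1e-10):
    """Maximum likelihood estimates (mu, beta) for a Gumbel (maximum) distribution."""
    n = len(xs)
    mean = sum(xs) / n
    x_min, x_max = min(xs), max(xs)

    def g(beta):
        # Weights exp(-x/beta), shifted by x_min to avoid underflow of the sum.
        ws = [math.exp(-(x - x_min) / beta) for x in xs]
        weighted_mean = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
        # The MLE of beta is the root of g, and g is increasing in beta.
        return beta - mean + weighted_mean

    # g is negative near zero and non-negative at the sample range.
    lo, hi = 1e-8 * (x_max - x_min), x_max - x_min
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    shifted = sum(math.exp(-(x - x_min) / beta) for x in xs) / n
    mu = x_min - beta * math.log(shifted)
    return mu, beta

def sample_gumbel(rng, mu, beta, n):
    """Draw n Gumbel(mu, beta) values by inverting the distribution function."""
    return [mu - beta * math.log(-math.log(rng.random())) for _ in range(n)]

# Simulated stand-in for the 999 replicate log-likelihoods of one run.
rng = random.Random(2005)
replicates = sample_gumbel(rng, 6.0, 1.5, 999)
mu_hat, beta_hat = fit_gumbel_mle(replicates)
```

The fit uses every replicate, not only those in the upper tail, which is one reason a fitted tail area can vary less than a count of exceedances.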


Methods

  • The empirical/asymptotic p-value for a given distribution is the area under the fitted density to the right of the observed log-likelihood

  • For each distribution, we generated:

    • empirical/asymptotic p-values based on the 'true' log-likelihood value

    • the log-likelihoods that would have been required to generate p-values of 0.01, 0.001, 0.0001, 0.00001

    • the usual Monte Carlo-based p-value reported by SaTScan
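Both quantities in the list above have closed forms when the fitted distribution is Gumbel. The sketch below assumes location and scale estimates are already in hand; the parameter values are hypothetical and the function names are illustrative.

```python
import math

def gumbel_p_value(x, mu, beta):
    """Area to the right of x under a Gumbel(mu, beta) density."""
    z = (x - mu) / beta
    if z < -700.0:
        return 1.0  # far in the lower tail; avoids overflow in exp(-z)
    # 1 - exp(-exp(-z)), using expm1 to keep precision for tiny p-values.
    return -math.expm1(-math.exp(-z))

def gumbel_critical_value(p, mu, beta):
    """Log-likelihood whose right-tail area under Gumbel(mu, beta) equals p."""
    # Invert 1 - exp(-exp(-z)) = p for z, then map back to the original scale.
    return mu - beta * math.log(-math.log1p(-p))

# Hypothetical fitted parameters for one run.
mu_hat, beta_hat = 6.0, 1.5
critical = {p: gumbel_critical_value(p, mu_hat, beta_hat)
            for p in (0.01, 0.001, 0.0001, 0.00001)}
```

A Monte Carlo p-value from 999 replicates can only move in steps of about 0.001, whereas the fitted tail area is not restricted in this way.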


Methods

  • Repeated the entire process using 60 and 6000 cases

    • Results were almost identical

  • Using 600 cases, repeated entire process with 99 and 9999 Monte Carlo replicates in each of the 1000 simulations

    • Again, very similar results


Results

[Figures not reproduced. The slides presented results separately for true p-values of 0.01, 0.001, 0.0001, and 0.00001.]

Results

  • The empirical/asymptotic p-values from the Gumbel distribution appear only slightly conservatively biased

  • Other tested distributions all resulted in anti-conservatively biased p-values

  • The ordinary Monte Carlo p-values reported from SaTScan had greater variance than the Gumbel-based p-values


Conclusions

  • Empirical/asymptotic p-values based on the Gumbel distribution can be preferable to true Monte Carlo p-values

  • Empirical/asymptotic p-values can accurately estimate p-values smaller than the smallest value attainable with Monte Carlo p-values for a given number of replicates

  • We suggest empirical/asymptotic p-values as a hybrid method to accurately obtain small p-values with a relatively small number of Monte Carlo replicates


Future work

  • Results shown today are based on purely spatial analyses – we will also look at space-time analyses

  • An option will be added in SaTScan to allow the user to request the Gumbel-based p-value