
Project Mimic: Simulation for Syndromic Surveillance

Thomas Lotze (Applied Mathematics and Scientific Computation, University of Maryland), Galit Shmueli and Inbal Yahav (RH Smith School of Business, University of Maryland), with Howard Burkom and Sean Murphy (JHU Applied Physics Lab).


Presentation Transcript


  1. Project Mimic: Simulation for Syndromic Surveillance • Thomas Lotze, Applied Mathematics and Scientific Computation, University of Maryland • Galit Shmueli and Inbal Yahav, RH Smith School of Business, University of Maryland • with Howard Burkom and Sean Murphy, JHU Applied Physics Lab • This work was partially supported by NIH grant RFA-PH-05-126.

  2. Outline • The Biosurveillance Problem • Motivation: Reasons for simulation • Simulation Methodology • Options/Generation • Mimicking a dataset • Analysis • Is this a good mimic? • Results

  3. The Biosurveillance Problem

  4. The Biosurveillance Problem, cont. • Given time series (usually pre-diagnostic daily data) • Detect disease outbreaks • Early • With few false alerts

  5. Difficulties with Biosurveillance Data • Teams work on different authentic datasets • Each team has their own private data • Cannot compare results • Researchers with no data cannot join the effort • Data are unlabeled • We don’t know exactly when there are outbreaks • Challenges evaluation of algorithm performance • Hinders comparison of different algorithms

  6. Project Mimic • Q: What if there were a way to • generate pseudo-authentic data • similar in statistical structure to real data • AND insert simulated outbreak signatures into it? • A: We’d have new, labeled pseudo-real data!

  7. Project Mimic: Dataset Mimicker • “Mimics” statistical structure of background data • Levels of counts of different series • Day-of-week patterns • Seasonal patterns • Holidays • Within-series autocorrelation • Cross-series cross-correlation • Extracts features from the authentic dataset • Output: dataset that “looks” like real dataset
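As a rough illustration of the feature-extraction step listed on this slide, the sketch below computes a few of the named features (level, spread, lag-1 autocorrelation, and day-of-week factors) for a single series. It is not the authors' R implementation; the function names, the dictionary layout, and the definition of a day-of-week factor as "weekday mean divided by overall mean" are all assumptions made for the example.

```python
from statistics import mean, pstdev

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a numeric sequence."""
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[t] - m) * (series[t - 1] - m) for t in range(1, len(series)))
    return num / denom

def day_of_week_factors(series, first_weekday=0):
    """Ratio of each weekday's mean count to the overall mean (assumed definition)."""
    overall = mean(series)
    factors = []
    for dow in range(7):
        values = [x for t, x in enumerate(series) if (t + first_weekday) % 7 == dow]
        factors.append(mean(values) / overall if values and overall else 1.0)
    return factors

def extract_features(series, first_weekday=0):
    """Collect the per-series features a mimicker could reuse (hypothetical layout)."""
    return {
        "mean": mean(series),
        "sd": pstdev(series),
        "acf1": lag1_autocorrelation(series),
        "dow": day_of_week_factors(series, first_weekday),
    }

# Toy two-week series: busy weekdays, quiet weekends.
toy = [50, 52, 48, 51, 49, 20, 18] * 2
features = extract_features(toy)
```

Seasonal patterns, holidays, and cross-series correlation are left out here to keep the sketch short.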

  8. Set of 6 series from one city [Figure: original vs. mimicked series; labeled panels include Resp and GI]

  9. 3 series from one city, zoomed in

  10. Mimic Methodology • Our method(s): • Create random autocorrelated multivariate data • Normal or Poisson • Uses the mean, standard deviation, reduced cross-correlation, and lag-1 (1-day) autocorrelation of the original data • Holiday factor • Seasonal factor • Day-of-week factor • Details at www.projectmimic.com • Mimicking implicitly uses a generative model • What is the right model?
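One possible reading of the generation step, for two series and the Normal option only, might look like the sketch below. It assumes a shared AR(1) coefficient `phi` for both series, a single cross-correlation `rho`, and a multiplicative day-of-week factor; the holiday and seasonal factors and the Poisson option are omitted. All names and parameter values are hypothetical, and the actual method is the one documented at www.projectmimic.com.

```python
import math
import random

def correlated_ar1(n_days, rho, phi, rng):
    """Two unit-variance AR(1) series with cross-correlation rho and
    lag-1 autocorrelation phi (same phi for both: a simplifying assumption)."""
    scale = math.sqrt(1.0 - phi ** 2)  # keeps the marginal variance at 1
    a = b = 0.0
    out = []
    for t in range(n_days):
        e1 = rng.gauss(0.0, 1.0)
        # Mix e1 with fresh noise so the two innovations have correlation rho.
        e2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if t == 0:
            a, b = e1, e2  # start in the stationary distribution
        else:
            a = phi * a + scale * e1
            b = phi * b + scale * e2
        out.append((a, b))
    return out

def mimic_background(n_days, means, sds, dow, rho, phi, seed=0):
    """Turn the latent series into non-negative daily counts."""
    rng = random.Random(seed)
    rows = []
    for t, zs in enumerate(correlated_ar1(n_days, rho, phi, rng)):
        factor = dow[t % 7]  # multiplicative day-of-week effect
        rows.append([max(0, round(factor * (m + s * z)))
                     for m, s, z in zip(means, sds, zs)])
    return rows

# Illustrative parameters only; real values would come from the original data.
background = mimic_background(
    n_days=28, means=[40, 25], sds=[6, 4],
    dow=[1.2, 1.1, 1.0, 1.0, 1.0, 0.9, 0.8], rho=0.5, phi=0.3, seed=1,
)
```

Fixing the seed makes a mimic reproducible, which matters if the same pseudo-dataset is to be shared as a benchmark.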

  11. Evaluating Mimics • Test: could the original data have been generated from the mimicker? • Compare different generative models • If the model were simple, we could use AIC • Instead, we use chi-squared goodness-of-fit tests

  12. Chi-squared Goodness-of-fit Tests • By series • By day of week • Separate values into bins • Chi-squared Test on counts
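The binning-and-testing procedure on this slide could be sketched as follows for one series and one day of the week. The sketch treats the mimic's bin proportions as the expected distribution and the original's bin counts as observed, which is one assumed reading of the slide; the bin edges and function names are hypothetical, and no p-value is computed because that would need a chi-squared distribution function from outside the standard library.

```python
from bisect import bisect_right

def bin_counts(values, edges):
    """Count values per bin; edges are interior cut points, giving len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts

def chi_squared_statistic(observed, expected):
    """Pearson statistic: sum over bins of (O - E)^2 / E."""
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive; merge sparse bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def gof_by_weekday(original, mimic, edges, weekday, first_weekday=0):
    """Chi-squared statistic and degrees of freedom for one weekday of one series."""
    def pick(series):
        return [x for t, x in enumerate(series) if (t + first_weekday) % 7 == weekday]
    orig_vals, mimic_vals = pick(original), pick(mimic)
    observed = bin_counts(orig_vals, edges)
    mimic_counts = bin_counts(mimic_vals, edges)
    # Expected counts: mimic bin proportions scaled to the original sample size.
    expected = [c / len(mimic_vals) * len(orig_vals) for c in mimic_counts]
    return chi_squared_statistic(observed, expected), len(observed) - 1

# Hand-checkable example: (8-10)^2/10 + (12-10)^2/10 + 0 + 0 = 0.8
stat = chi_squared_statistic([8, 12, 20, 10], [10, 10, 20, 10])
```

A large statistic relative to the chi-squared distribution with the returned degrees of freedom would suggest that the original values for that weekday are unlikely to have come from the mimicker.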

  13. Example of Disparity

  14. Project Mimic: Outbreak signature simulator • Generates multivariate outbreak-signatures • Options: • Number of outbreak-signatures in series? • Magnitude of outbreak? • How many (and which) series will include outbreak-signatures? • Stochastic/fixed? • Include effects such as DOW, holidays, etc.? (like background data) • Output: matrix of outbreak-signatures to be inserted in the background data
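To make the listed options concrete, here is a minimal sketch of an outbreak-signature generator that returns a days-by-series matrix of extra counts plus a per-day label. The triangular outbreak shape, the 7-day default duration, and the uniform noise used for the stochastic option are assumptions chosen for illustration; the slide does not specify them, and day-of-week or holiday effects are not applied here.

```python
import random

def outbreak_signature_matrix(n_days, n_series, starts, affected, magnitude,
                              duration=7, stochastic=False, seed=0):
    """Return (matrix, labels): extra counts per day and series, and outbreak flags."""
    rng = random.Random(seed)
    matrix = [[0] * n_series for _ in range(n_days)]
    labels = [False] * n_days
    peak = (duration - 1) / 2
    for start in starts:  # one entry per outbreak
        for d in range(duration):
            day = start + d
            if day >= n_days:
                break
            # Assumed triangular shape: rises to the peak, then declines.
            weight = 1.0 - abs(d - peak) / (peak + 1)
            labels[day] = True
            for s in affected:  # only the chosen series receive the signature
                extra = magnitude * weight
                if stochastic:
                    extra *= rng.uniform(0.5, 1.5)  # assumed noise model
                matrix[day][s] += round(extra)
    return matrix, labels

# One fixed outbreak starting on day 10, hitting series 0 and 2 of 3.
signatures, labels = outbreak_signature_matrix(
    n_days=30, n_series=3, starts=[10], affected=[0, 2], magnitude=20,
)
```

Returning the labels alongside the matrix is what later lets the combined dataset be treated as labeled.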

  15. Outbreak labels

  16. Project Mimic • Combining the background matrix + outbreak-signature matrix yields labeled data • Two final products • Mimicker: Data and outbreak-signature simulators (in freeware R) • Can be used by data owners to disseminate pseudo-data • Can be used by research teams to evaluate robustness of methods • Mimics: Datasets that mimic DARPA BioALIRT data • Benchmark datasets for comparison across groups • Can be used to apply optimization methods for improved detection • Available at www.projectmimic.com • Example: BioALIRT data on 3 series (Resp from civilian/military/prescriptions)
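The combination step, and the kind of evaluation that labels make possible, might be sketched like this. The element-wise addition follows the slide; the scoring function is an assumed example of how a research team could measure false alerts and detection delay for a single outbreak, and the toy numbers and names are hypothetical.

```python
def combine(background, signatures):
    """Element-wise sum of the background and outbreak-signature matrices."""
    if len(background) != len(signatures):
        raise ValueError("matrices must cover the same number of days")
    return [[b + s for b, s in zip(b_row, s_row)]
            for b_row, s_row in zip(background, signatures)]

def score_alerts(alerts, labels):
    """Return (false alert rate on non-outbreak days,
    number of outbreak days elapsed before the first alert, or None if missed).
    Assumes the labels describe a single outbreak."""
    quiet = [a for a, lab in zip(alerts, labels) if not lab]
    false_alert_rate = sum(quiet) / len(quiet) if quiet else 0.0
    delay = None
    elapsed = 0
    for a, lab in zip(alerts, labels):
        if lab:
            if a:
                delay = elapsed
                break
            elapsed += 1
    return false_alert_rate, delay

# Toy data: 4 days, 2 series, outbreak on the last two days in series 0.
background = [[10, 5], [12, 6], [11, 4], [9, 5]]
signatures = [[0, 0], [0, 0], [3, 0], [6, 0]]
labels = [False, False, True, True]

labeled_counts = combine(background, signatures)
rate, delay = score_alerts([False, True, False, True], labels)
```

Because every group would score against the same labels, results from different detection algorithms become directly comparable.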

  17. Mimicked data + outbreak-signature

  18. Conclusions • Mimic opens the door to: • new techniques • new researchers • First data sets of their kind • Open methodology • Publicly available • Realistic • www.projectmimic.com
