
Early Statistical Detection of Bio-Terrorism Attacks by Tracking OTC Medication Sales

Galit Shmueli, Dept. of Statistics and CALD, Carnegie Mellon University. With Stephen Fienberg (Statistics), Anna Goldenberg & Rich Caruana (CS).


Presentation Transcript


  1. Early Statistical Detection of Bio-Terrorism Attacks by Tracking OTC Medication Sales • Galit Shmueli, Dept. of Statistics and CALD, Carnegie Mellon University • With Stephen Fienberg (Statistics), Anna Goldenberg & Rich Caruana (CS)

  2. Overview • Current bio-surveillance systems • Monitoring traditional data • Using simple SPC methods • Early detection • Use of non-traditional data • Building a flexible, automated detection system • Evaluating the system • Results and enhancements

  3. Traditional Data Sources • Public health sources • School absence records • Sentinel practices • Laboratory data • Medical sources • Patient visits at urgent care, outpatient clinics, emergency rooms • Speed of detection: weeks after the actual occurrence • Rate of data arrival

  4. Why is detection slow? • Data arrives late • Projects using electronic reporting systems: • Influenza surveillance system (U of Utah) • Tracking ICD9 codes (U of Pittsburgh) • Future: increasing availability of electronic means for gathering surveillance data • Data available on weekly or monthly scale • Data are nation-wide • Signature of outbreak in data is late!

  5. Non-Traditional Data • Data that indirectly measure symptoms • Over-the-counter medication and grocery sales • Web browsing at medical websites • Automatic body tracking devices • Different levels of availability • Regional, localized data • Confidentiality issues

  6. Manifestation of Flu in Traditional and Non-Traditional Data [Figure: timeline in weeks; series labels: Lab, Flu, WebMD, School, Cough & Cold, Throat, Resp, Viral, Death]

  7. OTC Medication and Grocery Sales • Benefits • Manifestation of outbreak is very early • Timeliness in collection and reporting (daily) • Extremely detailed (basket-level) • Drawbacks • No info about epidemic manifestation in sales data • Requires knowledge about marketing efforts (sales, discounts) • If outbreak replicates sales patterns – hard to detect (Holidays are a big challenge) • Hard to model!

  8. Prior Uses of Non-Traditional Data • Diarrheal Disease Surveillance: data from 38 drug stores in NY (Mikol et al., 2000) • Monitoring near-real-time satellite vegetation and climate data for predicting emerging Rift Valley Fever epidemics in East Africa (DoD and NASA, 2001)

  9. Description of Our Data • Daily sales of several OTC medication groups for 541 days, from Aug 8, 1999 to Jan 31, 2001 • Concentrated on cough & cold medication (inhalational symptoms): • Cough medication • Tabs & Caps • Nasal medication

  10. Hypothetical Scenario of an Inhalational Anthrax Attack • Symptoms: almost all typical of flu! • fever • fatigue • cough • mild chest discomfort • but no runny nose (!) • Death may occur within 24-36 hours

  11. Sales of Four Sub-Categories

  12. Overview • Current bio-surveillance systems • Non-traditional data • The detection system • An evaluation method • Results and Conclusions • Future work

  13. The Detection System • Take into account special features of OTC and grocery sales data • Time series • Seasonality • Weekday/Weekend effect • Stores closed on certain days • Influence of total sales patterns • Very noisy, non-stationary • Create automated system

  14. Layers of the Detection System • Preprocessing → De-noising → Forecasting next-day sales → Creating a threshold • New day's sales are compared with the threshold: real-time sales > threshold? • NO: no warning • YES: WARNING! Possible beginning of an epidemic/attack

  15. Pre-Processing

  16. De-Noising • Target: obtain main features of data, reduce noise to improve predictability • Selected method: Discrete Cosine Transform with horizontal filtering • How much to de-noise? • Retain minimal coefficient set that • Maximizes accuracy • Optimizes predictability • Use cross-validation and MSE-based criteria
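
The de-noising step above can be sketched roughly as follows. This is a minimal illustration only: the slides do not define "horizontal filtering", so the sketch assumes it means keeping the lowest-frequency DCT coefficients and zeroing the rest. The function names `dct`, `idct`, and `denoise`, and the parameter `keep`, are hypothetical and not taken from the presentation.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of floats (O(n^2), fine for short series)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out

def denoise(x, keep):
    """Assumed filter: zero all but the first `keep` coefficients, then invert."""
    c = dct(x)
    c = [v if k < keep else 0.0 for k, v in enumerate(c)]
    return idct(c)
```

In this sketch, `keep` plays the role of the "minimal coefficient set": one way to choose it would be to try several values and compare held-out mean squared error, in the spirit of the cross-validation and MSE-based criteria the slide mentions.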

  17. De-Noising: DCT with Horizontal Filtering [Figure: two panels, "de-noised set 1" and "de-noised set 2"]

  18. Forecasting • Target: Predict next day sales • Use pre-processed, de-noised data • Problem: non-stationary (ARIMA doesn’t work) • Method: 1) decompose with wavelets 2) predict each wavelet resolution 3) sum to obtain overall prediction
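
The three-step method above (decompose, predict each resolution, sum) might look something like the sketch below. The slides name neither the wavelet nor the per-resolution predictor, so both choices here are assumptions: a causal additive Haar ("à trous") decomposition, and a least-squares AR(1) fit for each component. All function names are hypothetical.

```python
def atrous_haar(x, levels):
    """Split x into `levels` detail series plus one smooth series.

    The decomposition is additive: the returned components sum back to x.
    """
    smooth = list(x)
    parts = []
    for j in range(levels):
        lag = 2 ** j
        # Causal Haar smoothing: average each point with the one `lag` steps
        # back (the index is clamped at the start of the series).
        nxt = [0.5 * (smooth[t] + smooth[max(t - lag, 0)])
               for t in range(len(smooth))]
        parts.append([a - b for a, b in zip(smooth, nxt)])  # detail at level j
        smooth = nxt
    parts.append(smooth)  # coarsest smooth component
    return parts

def ar1_next(y):
    """One-step forecast from a least-squares fit of y[t+1] = a + b * y[t]."""
    prev, nxt = y[:-1], y[1:]
    n = len(prev)
    if n < 2:
        return y[-1]
    mean_prev = sum(prev) / n
    mean_next = sum(nxt) / n
    var = sum((p - mean_prev) ** 2 for p in prev)
    if var < 1e-12:
        return y[-1]  # constant component: carry the last value forward
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(prev, nxt))
    b = cov / var
    return mean_next + b * (y[-1] - mean_prev)

def forecast_next(x, levels=3):
    """Forecast each resolution separately, then sum the forecasts."""
    return sum(ar1_next(part) for part in atrous_haar(x, levels))
```

Because the decomposition is additive, summing the per-component forecasts gives a forecast on the original sales scale without any inverse transform.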

  19. Prediction Using Wavelets

  20. Threshold Selection: SPC • Based on the empirical distribution of residuals (real values - predictions), we fit a “3σ” limit
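
A minimal sketch of a "3σ" limit on forecast residuals, and of the daily comparison against it, is shown below. It assumes a one-sided upper limit (mean plus three sample standard deviations), on the reasoning that an outbreak would raise sales; the slides do not say whether the limit is one- or two-sided. The names `upper_limit` and `is_alarm` are hypothetical.

```python
import statistics

def upper_limit(actual, predicted, k=3.0):
    """Upper control limit on residuals: mean + k * sample std deviation."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return statistics.mean(residuals) + k * statistics.stdev(residuals)

def is_alarm(new_sales, forecast, limit):
    """Flag a day whose residual (sales minus forecast) exceeds the limit."""
    return (new_sales - forecast) > limit

# Toy history: residuals are [-1, 1, -1, 1, 0].
history_actual = [10, 12, 11, 13, 12]
history_forecast = [11, 11, 12, 12, 12]
limit = upper_limit(history_actual, history_forecast)
print(is_alarm(16, 12, limit), is_alarm(14, 12, limit))
```

If the residuals are far from normal, an empirical quantile of the residual distribution could be substituted for the mean-plus-3σ rule; that would be a design choice beyond what the slide states.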

  21. Comparing Next-Day Sales to the Threshold

  22. Overview • Current bio-surveillance systems • Non-traditional data • The detection system • An evaluation method • Results and Conclusions • Ongoing work (basket-level data) • Future work

  23. Evaluating the System • How fast does it detect an anthrax footprint? • Problems: • Data do not include an outbreak signature • We don’t know what the signature looks like in such data • Solution: simulated signature [Figure: inhalational anthrax signature; axis labels: base, spike, days 1, 2, 3]

  24. Constructing the Signature • Sverdlovsk outbreak, 1979 • Based on data from Meselson et al., Science (1994)

  25. Anthrax Signature in OTC Sales • Add signature at each data point sequentially, and look at rate of detection • Try different slopes, heights • Compare different configurations of system for different signatures • Example (slope = 1/3): detects 100% of spikes within 3 days for height = 1.3 × (data range)
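
The evaluation procedure above can be illustrated with the sketch below. Several things are assumed rather than taken from the slides: the signature is modeled as a linear ramp that rises by `slope * height` per day and is capped at `height`; the forecasts are treated as fixed, so the injected sales do not feed back into later predictions; and the alarm rule is a residual exceeding a given limit. The function names are hypothetical.

```python
def ramp_signature(height, slope, length):
    """Extra sales per day: rises by slope * height each day, capped at height."""
    return [min(height, slope * height * (d + 1)) for d in range(length)]

def detection_rate(sales, forecasts, limit, signature, max_delay):
    """Fraction of start days at which the injected signature raises an
    alarm within `max_delay` days of its first day."""
    starts = range(len(sales) - len(signature) + 1)
    hits = 0
    for s in starts:
        for d in range(min(max_delay, len(signature))):
            # Residual on day s + d after adding the simulated extra sales.
            if sales[s + d] + signature[d] - forecasts[s + d] > limit:
                hits += 1
                break
    return hits / len(starts)

# Toy data: flat sales, perfect forecasts, alarm limit of 5 units.
sales = [100.0] * 10
forecasts = [100.0] * 10
sig = ramp_signature(height=9.0, slope=1 / 3, length=3)
print(detection_rate(sales, forecasts, 5.0, sig, max_delay=2))
```

Repeating the call over a grid of `height` and `slope` values would give the kind of comparison across signatures that the slide describes.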

  26. Results and Conclusions • The detection system • works with grocery data • detects simulated footprint quickly • has low false alarm rate • The system is flexible (tools are interchangeable) • Almost fully automated, efficient computation • The “perfect bio-attack” is on a holiday

  27. Future Work • Combine with traditional medical and public health data sources • Aggregated data: Track several series simultaneously • Basket data: Utilize other features of grocery data such as spatial factor, customer information
