Data for Biosurveillance: A tutorial

Main Point

The range of data being collected for public health surveillance has expanded considerably.

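Biosurveillance Data Space

[Figure: the biosurveillance data space, spanning early detection to later detection. Regions: Animals, Human Behaviors, Non-traditional Uses, Clinical Data. Data sources shown: OTC Pharm, Test Results, Poison Centers, Web Queries, Nurse Calls, Transport (bus), Wind Speed/Direction, Cloud Cover, ER Visits, Radiograph Reports. Legend: Limited Utility, Some Potential.]
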
Clinical data are highly relevant to public health surveillance. Clinicians and health systems are a primary point of data collection about the sick, including data about demographics, risk factors, symptoms, signs, special testing, and diagnoses.
Where are the Clinical Data?
  • Paper charts
  • HL7 Message Routers
  • Registration, Scheduling, and Billing Systems
  • Clinical Laboratory Systems
  • Radiology Systems
  • Pathology
  • Dictation
  • Pharmacy
  • Orders
  • Data Warehouses
  • Clinical Event Monitors
  • Point-of-Care Systems
  • Patient Web Portals and Call Centers
Laboratory Results and Electronic Lab Reporting (ELR)
  • There is no need to prove the value of laboratory results for public health surveillance
  • The main issue is getting them
  • Studies of ELR of culture-proven notifiable diseases
    • Hawaii (JAMA 1999)
    • Pittsburgh (Panackal et al., EID 2002)
  • Findings (and methods) similar
    • Quicker
    • More complete reporting
Chief Complaints
  • Chief complaints entered by a triage nurse upon admission to an emergency facility are available electronically from hospitals in the United States and other countries.
  • ICD-9 coded versus free text
  • How to group into syndromic categories is a major question
  • There exist several categorizations (CDC consensus, RODS, WRAIR)
Detection Performance from ICD-9 coded Chief Complaints
  • Respiratory Case Detection (Espino 2001)
    • Sensitivity 0.43, specificity >95%
  • Respiratory Outbreak Detection (Tsui 2001)
    • Small sample: 1/1 detected, 1 false alarm
  • Diarrhea Case Detection (Ivanov 2002)
    • Similar results to Espino

Using Free-Text Chief Complaints (and Natural Language Processing)

Also needed for Web or call center queries, and radiographs





CoCo Naive Bayesian Parser

Maps a free-text chief complaint to one of seven prodrome categories (or an eighth category, none)

[Figure: a chief complaint is passed to the CoCo naive Bayes classifier, which outputs a probability for each category. Example for the chief complaint "NVD", classified as GI prodrome:]

  • P(Respiratory|NVD) = .05
  • P(Botulinic|NVD) = .001
  • P(Constitutional|NVD) = .01
  • P(GI|NVD) = .9
  • P(Hemorrhagic|NVD) = .001
  • P(Neurologic|NVD) = .001
  • P(Rash|NVD) = .001
  • P(None|NVD) = .036
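
A minimal sketch of how a naive Bayes classifier of this kind can turn a free-text chief complaint into one probability per category. The training strings, the whitespace tokenization, and the add-one smoothing are illustrative assumptions; this is not the CoCo implementation or its training data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples (chief complaint, category); not real CoCo data.
TRAINING = [
    ("nausea vomiting diarrhea", "GI"),
    ("vomiting", "GI"),
    ("abdominal pain diarrhea", "GI"),
    ("cough fever", "Respiratory"),
    ("shortness of breath", "Respiratory"),
    ("rash", "Rash"),
    ("ankle injury", "None"),
]


def train(examples):
    """Count category frequencies and per-category word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def posterior(text, class_counts, word_counts, vocab):
    """Return P(category | text) for every category, using add-one smoothing."""
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    # Normalize in log space so the probabilities sum to 1.
    peak = max(log_scores.values())
    exp_scores = {k: math.exp(v - peak) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


model = train(TRAINING)
probs = posterior("vomiting diarrhea", *model)
best = max(probs, key=probs.get)
```

As in the slide's example, the output is a full probability distribution over categories rather than a single label, so a downstream system can apply its own threshold.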

Accuracy of Case Detection: CoCo Naïve Bayes vs. UDOH Manual ED Log Review

[Table: sensitivity, specificity, and likelihood ratio positive (LR+) measurements for the CoCo classifier using the Utah Department of Health (UDOH) emergency department gold standard. Columns include CoCo Syndrome and UDOH Syndrome; values not shown.]

UDOH syndromes listed:
  • Respiratory infection with fever*
  • Gastroenteritis without blood
  • Meningitis / encephalitis
  • Febrile illness with rash*
  • Botulism-like syndrome

*Required documentation of fever in the patient record.

Courtesy Per Gesteland, MD
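
For reference, the three measures named above can be computed from a 2x2 table of classifier output against a gold standard. The sketch below uses hypothetical counts chosen only to show the arithmetic; they are not results from the UDOH comparison.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return sensitivity, specificity, and LR+ from a 2x2 table of counts."""
    sensitivity = tp / (tp + fn)          # fraction of true cases detected
    specificity = tn / (tn + fp)          # fraction of non-cases correctly rejected
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    # LR+ is unbounded when there are no false positives.
    if false_positive_rate > 0:
        lr_positive = sensitivity / false_positive_rate
    else:
        lr_positive = float("inf")
    return sensitivity, specificity, lr_positive


# Hypothetical counts, for illustration only.
sens, spec, lr_pos = diagnostic_accuracy(tp=40, fp=50, fn=60, tn=950)
```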

Accuracy of Outbreak Detection: Respiratory Outbreaks Detected from Respiratory Chief Complaints

[Figure: time series over 7 years comparing hospital P&I diagnoses with respiratory chief complaints; y-axis: SDs from mean. Source: Ivanov and Gesteland.]
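
Plotting two series as "SDs from mean" puts counts of different magnitude on a common scale. A minimal sketch of that standardization follows, with a simple threshold rule added as an assumption; the counts, the baseline window, and the threshold of 2 are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def sds_from_mean(baseline, observed):
    """Express each observed count as standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    return [(x - mu) / sigma for x in observed]


def flag_days(z_scores, threshold=2.0):
    """Return indexes of days whose z-score exceeds the threshold."""
    return [i for i, z in enumerate(z_scores) if z > threshold]


# Hypothetical daily counts of respiratory chief complaints.
baseline_counts = [20, 22, 19, 21, 18, 20, 23, 19, 21, 17]
recent_counts = [21, 24, 31, 35]

z = sds_from_mean(baseline_counts, recent_counts)
alerts = flag_days(z)
```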


Detecting Respiratory Outbreaks in Children by Monitoring Chief Complaints

Detection from CCs precedes that from admissions by 9 days (95% CI -5 to 23)

[Figure: kids respiratory (lower respiratory infections)]





Detecting GI Outbreaks in Children by Monitoring Chief Complaints

Detection from CCs precedes that from admissions by 23 days (95% CI 12 to 33)



WHICH IS BETTER, ICD-9 or FREE TEXT? At detecting Cases of Acute Infectious GI

(Ivanov, Wagner, Chapman)


WHICH IS BETTER, ICD-9 or FREE TEXT? At detecting Acute Lower Respiratory Illness from Chief Complaints

(Espino, Wagner, Dowling, Chapman)

Chest Radiograph Reports
  • Radiologists dictate a report for most chest radiographs performed in the United States.
  • Reports are transcribed after dictation and are available electronically with a latency of twelve to twenty-four hours.
  • The reports describe specific findings important for detection of infectious diseases of the lower respiratory tract such as SARS, plague, tularemia, and inhalational anthrax.
  • The granularity of the information is quite specific and allows for detection of different patterns of pneumonia, pleural effusions, and mediastinal widening.
  • The data are identified at the level of the individual patient and can therefore be pinpointed to home location and correlated with other patients to detect clusters of cases.
Detecting Febrile Illness
  • Coded temperature (Possibly best, but rarely recorded electronically and may be normal)
  • From NLP of chief complaints
  • By NLP of Emergency Department (ED) dictation
    • Sensitivity = 0.98
    • Specificity = 0.89
    • ~1 day delay
Lab Test Ordering
  • Motivation: What if you saw a large number of blood culture orders for people with home addresses in one zip code?
  • Availability from national laboratory companies (maybe 10-20% coverage of all tests done, perhaps less for infectious disease testing which is done in hospitals)
  • Demonstrated value: no published studies!
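
The motivating question can be sketched as a count of orders per home ZIP code compared with an expected count. Everything below is hypothetical: the order records, the expected counts, and the "3 times expected" rule are assumptions for illustration, not a validated detection method.

```python
from collections import Counter

# Hypothetical order records: (test name, patient home ZIP code).
orders = [
    ("blood culture", "15213"),
    ("blood culture", "15213"),
    ("urinalysis", "15213"),
    ("blood culture", "15217"),
    ("blood culture", "15213"),
    ("blood culture", "15213"),
]

# Hypothetical expected daily counts per ZIP code, e.g. from historical data.
expected = {"15213": 1.0, "15217": 1.0}


def unusual_zips(records, test_name, expected_counts, ratio=3.0):
    """Return ZIP codes whose count of `test_name` orders is at least `ratio` times expected."""
    counts = Counter(zip_code for name, zip_code in records if name == test_name)
    return sorted(
        z for z, n in counts.items()
        if n >= ratio * expected_counts.get(z, 1.0)
    )


flagged = unusual_zips(orders, "blood culture", expected)
```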

[Table: Summary of clinical data by clinical systems and market penetration (estimated). Legend: ED, emergency department; LTCF, long term care facility; -, not applicable; ?, unknown]

Take Home Message: OTC
  • Availability is better than for any other data type
  • Value is also better understood than for any other unconventional type of data because of research, although there is still more to do
National Retail Data Monitor: How it Works
  • OTC products are UPC bar-coded
  • Retail stores scan purchases
  • Nine chains (18,000 stores) send daily sales data
  • NRDM groups the UPC-level sales data into categories like “cough syrup, pediatric liquid”
  • NRDM makes data available to health departments via
    • Web interface: 500+ accounts/46 States
    • Aggregated data feeds to state health depts
  • A BioSense data source
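
The grouping step can be sketched as a lookup from UPC to category followed by a sum. The UPC values, the sales rows, and the choice to aggregate by date and store ZIP code are assumptions for illustration; the actual NRDM mapping and feed format are not shown in these slides.

```python
from collections import defaultdict

# Hypothetical UPC-to-category lookup; these are placeholder UPCs.
UPC_CATEGORY = {
    "000000000001": "Cough Syrup Pediatric Liquid",
    "000000000002": "Cough Syrup Pediatric Liquid",
    "000000000003": "Diarrhea Remedies",
}

# Hypothetical daily sales rows: (date, store ZIP code, UPC, units sold).
sales = [
    ("2003-01-06", "15213", "000000000001", 4),
    ("2003-01-06", "15213", "000000000002", 1),
    ("2003-01-06", "15213", "000000000003", 2),
    ("2003-01-06", "15217", "000000000003", 5),
    ("2003-01-06", "15217", "999999999999", 7),  # not in the lookup, so skipped
]


def aggregate(rows, upc_category):
    """Sum units by (date, ZIP code, category), dropping unmapped UPCs."""
    totals = defaultdict(int)
    for date, zip_code, upc, units in rows:
        category = upc_category.get(upc)
        if category is not None:
            totals[(date, zip_code, category)] += units
    return dict(totals)


daily_totals = aggregate(sales, UPC_CATEGORY)
```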


OTC Product Categories
  • There are approximately 7500 products (UPC codes) used for self-treatment of infectious diseases
  • We group them into 18 analytic classes at present (“categories”)
  • Antifever Pediatric (274)
  • Antifever Adult (1340)
  • Bronchial Remedies (43)
  • Chest Rubs (78)
  • Diarrhea Remedies (165)
  • Electrolytes Pediatric (75)
  • Hydrocortisones (185)
  • Thermometer Pediatric (125)
  • Thermometer Adult (313)
  • Cold Relief Adult Liquid (709)
  • Cold Relief Adult Tablet (2467)
  • Cold Relief Pediatric Liquid (323)
  • Cold Relief Pediatric Tablet (74)
  • Cough Syrup Adult Liquid (592)
  • Cough Syrup Adult Tablet (32)
  • Cough Syrup Pediatric Liquid (24)
  • Nasal Product Internal (371)
  • Throat Lozenges (364)

Numbers in parentheses are the number of UPC codes in the category

Detecting Cryptosporidium from Sales of OTC Diarrhea Remedies
  • Diarrhea remedies = {Kaopectate, Imodium, Pepto}
  • North Battleford Outbreak 2001*
    • Large, waterborne outbreak of Cryptosporidium in late March/April 2001
    • Convenience sample of three pharmacies in North Battleford, Saskatchewan
    • Approximately 5-fold increase in all three pharmacies (relative to baseline established from Jan 2001 to early March 2001)
    • Two pharmacies provided March/April 2000 data and those data showed no similar increase
    • Sales peaked weeks before precautionary drinking water advisory and days to weeks before peak onset of diarrhea

*Stirling R, Aramini J, Ellis A, et al. Waterborne cryptosporidiosis outbreak, North Battleford, Saskatchewan, Spring 2001. Can Commun Dis Rep. Nov 15 2001;27(22):185-192.
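
A fold increase relative to baseline, as reported above, is the ratio of mean sales during the outbreak period to mean sales during the baseline period. The weekly figures below are hypothetical and chosen only to show the calculation; they are not the North Battleford data.

```python
from statistics import mean


def fold_increase(baseline_sales, outbreak_sales):
    """Ratio of mean sales in the outbreak period to mean sales in the baseline period."""
    return mean(outbreak_sales) / mean(baseline_sales)


# Hypothetical weekly unit sales for one pharmacy.
baseline_weeks = [10, 12, 8, 10]
outbreak_weeks = [45, 55, 50]

ratio = fold_increase(baseline_weeks, outbreak_weeks)
```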

2001 Crypto in North Battleford

… Precautionary water advisory issued on 4/26

Detectable peak on 4/2 in sales of over-the-counter antidiarrheals

Detecting Crypto from Sales of OTC Antidiarrheal (cont)
  • Collingwood, Ontario*
    • Cryptosporidium outbreak in Collingwood, Ontario, Feb/March 1996
      • 3 of the 12 pharmacies asked provided data
      • Pharmacy 1: 26-fold increase in sales in Feb 1996 compared with Feb 1995
      • Pharmacy 2: 1Q 1996 sales were 3-fold those of 1Q 1995
      • Pharmacy 3: reported no change in sales
      • Outbreak detected 3/5
    • Yet another Cryptosporidium outbreak in Canada (Kelowna and Cranbrook, British Columbia)
      • All pharmacists interviewed (10-12 in each city) acknowledged increased sales (but no data were available for study)

*Rodman JS et al. Pharmaceutical sales: A method of disease surveillance. Journal of Environmental Health, Nov 1997:8-14.

**Proctor et al. Surveillance data for waterborne illness detection: an assessment following a massive waterborne outbreak of Cryptosporidium infection. Epidemiol Infect. 1998;120(1):43-54.

Detecting Crypto from Sales of OTC Antidiarrheal (cont)
  • Milwaukee Cryptosporidium outbreak (Proctor et al)**
    • Studied the famous 1993 Milwaukee Cryptosporidium outbreak
    • One pharmacy in outbreak area provided monthly counts of unit sales
    • Sales for month of March showed three-fold increase over baseline (March 1994/March 1995)
    • Public health awareness of outbreak – April 5
    • Identified need for knowledge of geographic distribution of water supply to improve outbreak detection (North Milwaukee vs. South Milwaukee)
Cryptosporidium Outbreak: Collingwood, Ontario

26-fold increase in sales in Feb

Outbreak detected March 5

Rodman JS et al. Pharmaceutical sales: A method of disease surveillance. Journal of Environmental Health, Nov 1997:8-14.

Cryptosporidium Outbreak: Milwaukee

3X increase in sales in March 1993

Public health awareness April 5, 1993

Proctor et al. Surveillance data for waterborne illness detection: an assessment following a massive waterborne outbreak of Cryptosporidium infection. Epidemiol Infect. 1998;120(1):43-54.

More Evidence that Crypto May Drive OTC Sales
  • Corso et al
    • Reviewed 2000 medical records of patients admitted to Milwaukee EDs
    • Identified 378 persons who had a moderate or severe case of Cryptosporidium during the 1993 outbreak
    • Self-treatment with OTCs prior to the ED visit was documented in the medical record in 30%
Open Question: How Small of a Cryptosporidium Outbreak Is Detectable?
  • North Battleford outbreak affected half the population
  • The Milwaukee outbreak was similarly large (estimated 400,000 infected)
  • Other outbreaks: no estimates of size available
  • Only one to three drug stores were studied
  • Bottom line: How small of a Cryptosporidium outbreak can be detected is very hard to know from observational studies
Detecting Pediatric Diarrheal and Respiratory Outbreak from Sales of Pediatric Electrolytes
  • Pediatric Electrolytes = {Pedialyte, competitors}
  • Hogan et al*
    • 18 Wintertime outbreaks (1998-2001, six cities)
    • Strong correlation (>0.9) between hospital diagnoses of respiratory and diarrheal illness in children < 5 and sales of pediatric electrolytes
    • The uptick in sales usually preceded the uptick in hospital diagnoses, by 2 weeks on average
    • Variation in time lag from year to year and city to city suggests need for additional studies

Hogan et al. Detection of Pediatric Respiratory and Diarrheal Outbreaks from Sales of Over-the-counter Electrolyte Products. J Am Med Inform Assoc, 10(6) November 2003
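
One way to examine both the strength of the correlation and the lead time is to correlate sales with diagnoses shifted by a range of lags and pick the lag with the highest correlation. This is a generic sketch under that assumption, not the method used by Hogan et al.; the weekly series are hypothetical and constructed so that diagnoses follow sales by two weeks.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_lag(sales, diagnoses, max_lag):
    """Find the lag (in weeks) by which sales lead diagnoses with the highest correlation."""
    results = {}
    for lag in range(max_lag + 1):
        # Pair sales in week t with diagnoses in week t + lag.
        xs = sales[: len(sales) - lag] if lag else sales
        ys = diagnoses[lag:]
        results[lag] = pearson(xs, ys)
    return max(results, key=results.get), results


# Hypothetical weekly series: diagnoses repeat the sales pattern two weeks later.
weekly_sales = [5, 6, 9, 15, 22, 18, 11, 7, 5, 5, 4, 4]
weekly_diagnoses = [3, 3, 5, 6, 9, 15, 22, 18, 11, 7, 5, 5]

lag, correlations = best_lag(weekly_sales, weekly_diagnoses, max_lag=4)
```

With real data the best lag would be expected to vary by year and city, which is the variation the last bullet above refers to.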

Detecting Pediatric Diarrheal and Respiratory Outbreak from Sales of Pediatric Electrolytes

Data courtesy IRI, Utah DOH, Indianapolis Network for Patient Care, and PA HC4 Council

Detectability of Anthrax? Detecting Influenza from OTC Cold Remedies
  • Welliver et al*
    • Studied 1976-1977 Influenza B outbreak in Los Angeles
    • Data from one distribution center of Ralphs Grocery Company in Los Angeles
    • OTC cold remedy sales peaked 3 weeks prior to peak in positive Influenza cultures
    • No association between aspirin (antipyretic) sales and influenza

Welliver RC, Cherry JD, Boyer KM, et al. Sales of nonprescription cold remedies: a unique method of influenza surveillance. Pediatr Res. Sep 1979;13(9):1015-1017.

Detecting Influenza from OTC Cold Remedies (cont)
  • Correlation of Cough/Cold/Flu OTC Categories With Hospital Diagnoses of Pneumonia, Influenza, Bronchitis, and Bronchiolitis
From Permissive Environments
  • A permissive environment is one that allows types of biosurveillance data to be collected that cannot be otherwise collected.
  • Survey data
    • 3-4 days earlier
  • Telephone calls to medical offices
    • 3-4 days earlier than doctor visits during influenza
  • Web queries to medical sites
  • Direct measures: attendance systems e.g., school attendance
  • Indirect through other measures of a person’s physical presence at a location.
  • Affected by weekends, holidays, and vacations or recess periods (especially school absenteeism!)
Spatial Info
  • Census tract vs. zip code
    • ZIP: HIPAA and availability
  • Street address: 60% automatic recognition problem
  • Substreet address (floor of building, office), especially in vertical cities like HK, NY
  • Longitude, height, and latitude: maximum flexibility
Time Stamping
  • It is sort of obvious but worth discussing
  • Time zones can cause confusion
  • The meaning of time stamps can cause confusion
    • Is it the time of the order or the result?
    • Is it the time of patient registration or of admission to the hospital?
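
A minimal sketch of one way to reduce both kinds of confusion: store what each time stamp means alongside a timezone-aware value, and convert to UTC before comparing. The `Event` record and the fixed UTC offsets are assumptions for illustration; a production feed would need proper daylight-saving rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets used for illustration (standard time only).
EASTERN = timezone(timedelta(hours=-5))
MOUNTAIN = timezone(timedelta(hours=-7))


@dataclass
class Event:
    kind: str             # what the time stamp means, e.g. "registration" or "admission"
    local_time: datetime  # timezone-aware local time


def to_utc(event):
    """Return the event's time stamp converted to UTC."""
    if event.local_time.tzinfo is None:
        raise ValueError("time stamp has no time zone attached")
    return event.local_time.astimezone(timezone.utc)


# Two hypothetical ED registrations at hospitals in different time zones.
visit_a = Event("registration", datetime(2003, 1, 6, 23, 30, tzinfo=MOUNTAIN))
visit_b = Event("registration", datetime(2003, 1, 7, 1, 0, tzinfo=EASTERN))

# By wall-clock time visit_a looks earlier; in UTC visit_b happened first.
a_utc = to_utc(visit_a)
b_utc = to_utc(visit_b)
```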
Data Related to Route of Transmission
  • Water supply data
  • Airplane passenger lists
  • Food distribution systems
  • HVAC systems
  • Employer
  • Home address
  • Weather and climate data currently used in epidemiological analysis include temperature, wind direction and speed (for bioaerosol related analyses), and precipitation.
  • In the United States, weather data are already highly available.
    • Temperature
    • Wind speed
    • Wind direction
    • Precipitation
    • Up-to-the-minute information for the entire nation is available because data are collected in real-time, in standard formats, and integrated in a central location that is publicly available without any technical or administrative barriers (
  • However, they are not stored, so if you need historical data you must capture and store them daily
Overly Simple Answer
  • Exactly the data public health departments collect at present.
  • We just need to speed up the collection and processing of the data, for example,
    • ELR
    • Electronic reporting of notifiable diseases

Yes, but they sometimes miss outbreaks or detect them late

Yes, but maybe the data are inherently late. Plus there is still the problem of undetected outbreaks

Also, there may just be a better way. There may be highly useful data that they just do not have the infrastructure to collect

Examples of what they could collect
  • Chief complaint, age, gender, occupation, and home zip code of all patients presenting to EDs
    • Being done in scores of jurisdictions
  • Chief complaint, age, gender, occupation, and postal codes of all patients with temperatures presenting to EDs
    • Being done in Taiwan
  • Sales of OTC
    • Being done in 46 states and Puerto Rico
  • Detailed information about every fever patient in a city
    • Being done in China

How To Come Up with Ideas for New Types of Data to Study
  • From data actually collected routinely by public health (e.g., notifiable diseases, CDC case definitions), search for surrogate data
  • First principles analysis of a specific detection problem (e.g., large-scale bioaerosol release)
  • Analysis of data actually found useful in recognition and characterization of recent outbreaks
  • Review of the literature on health psychology, especially the subliterature relevant to behaviors of ill individuals between the onset of symptoms and presentation (if ever) for medical care
Data Currently Collected by Public Health Surveillance Systems (conventional surveillance data)
  • Reportable diseases
  • Sentinel physicians
  • Reports from astute clinicians
  • Results of enhanced surveillance or contact testing
Data Used During Outbreak Investigations
  • Data mentioned in CDC Case Definitions
  • Data items from case investigation forms
  • Data items mentioned in MMWR and other published reports as being pivotal in the initial detection
  • Easier to know
  • Methods: phone interviews with industries (hospitals, 911 services, pharmacies, schools …)
Value and Relative Importance
  • Much, much harder to know
  • Methods
    • Observational studies of real outbreaks
    • Studies of what individuals do when sick with different diseases (what they buy, who they call …)
Prioritizing Research: Prefer Data that are Inherently Earlier
  • Pre-outbreak data:
    • data obtained during the period prior to the release of a biologic agent.
    • E.g., intelligence or host factors such as vaccinations that determine susceptibility.
  • Attack, release, or exposure data
    • obtained at or very near the time of release.
    • E.g., biosensor arrays, police reports of observed explosions, unauthorized airplane flights
  • Pre-symptomatic data (incubation period data)
    • between the time of release of an agent until the recognition of first symptoms in people.
    • E.g., serology or cultures from pre-symptomatic individuals from enhanced screening
  • Early symptom data
    • period between the onset of symptoms and when the illness becomes more fully developed
    • E.g., diarrheal or upper respiratory symptoms, sales of over-the-counter cold medicines
  • Specific syndrome data
    • data that either singly or in combination strongly suggests a specific agent.
    • E.g., specific symptoms, vital signs, physical findings, laboratory results, radiology results
  • Definitive data
    • data that are sufficient on their own to conclude that a patient has a disease.
    • E.g., microbiology culture or autopsy reports.
  • To AHRQ about data availability (Wagner, Aryel, Dato. 2001 188 pages available at
  • To DARPA about data value (Wagner, Pavlin, Brillman, Stetson) expected completion date December 2003.
  • Self-treatment/health seeking literature
  • Published studies

See bibliography that will be on Web at