Practical aspects of alerting algorithms in biosurveillance

Howard S. Burkom

The Johns Hopkins University Applied Physics Laboratory

National Security Technology Department

Biosurveillance Information Exchange Working Group

DIMACS Program/Rutgers University

Piscataway, NJ February 22, 2006


Outline

What information do temporal alerting algorithms give the health monitor?

How can typical data issues introduce bias or other misinformation?

How do spatial scan statistics and other spatiotemporal methods give the monitor a different look at the data?

What data issues are important for the quality of this information?


Conceptual approaches to Aberration Detection

What does ‘aberration’ mean? Different approaches for a single data source:

  • Process control-based: “The underlying data distribution has changed” – many measures

  • Model-based: “The data do not fit an analytical model based on a historical baseline” – many models

  • Can combine these approaches

  • Spatiotemporal Approach: “The relationship of local data to neighboring data differs from expectations based on model or recent history”
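The model-based view can be illustrated with a deliberately simple model. This sketch assumes the "analytical model" is a day-of-week mean fitted to the historical baseline; `dow_residual_z` is a hypothetical helper, not a method from the presentation. It scores today's count by its residual from the model, in units of the spread of the baseline residuals.

```python
from statistics import mean, pstdev

def dow_residual_z(baseline, baseline_start_dow, today_count, today_dow):
    """Standardized residual of today's count under a day-of-week mean model.

    Hypothetical helper. baseline: historical daily counts, oldest first.
    baseline_start_dow: day of week (0-6) of baseline[0].
    Assumes the baseline covers at least one full week.
    """
    # Fit the (assumed) model: mean count for each day of the week.
    by_dow = {d: [] for d in range(7)}
    for i, count in enumerate(baseline):
        by_dow[(baseline_start_dow + i) % 7].append(count)
    dow_mean = {d: mean(v) for d, v in by_dow.items()}

    # Residuals of the baseline itself estimate the model's noise level.
    residuals = [c - dow_mean[(baseline_start_dow + i) % 7]
                 for i, c in enumerate(baseline)]
    sigma = max(pstdev(residuals), 1.0)  # floor guards against a zero spread

    # "The data do not fit the model": large positive values suggest an aberration.
    return (today_count - dow_mean[today_dow]) / sigma
```

Because the expectation depends on the day of week, the same raw count can be ordinary on a weekday and aberrant on a weekend, which a model with a single constant mean would miss.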


Comparing Alerting Algorithms

Criteria:

  • Sensitivity

    • Probability of detecting an outbreak signal

    • Depends on the effect of the outbreak on the data

  • Specificity (1 – false alert rate)

    • Probability(no alert | no outbreak)

    • May be difficult to prove no outbreak exists

  • Timeliness

    • Once the effects of an outbreak appear in the data, how soon is an alert expected?
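When outbreak cases are injected into historical data, the outbreak days are known, so these criteria can be estimated empirically. A minimal sketch for one simulated run, assuming alerts and outbreak effects are recorded as day indices; `evaluate_alerts` is a hypothetical name. Averaging `detected` over many runs estimates sensitivity, and averaging `delay` over the detected runs estimates timeliness.

```python
def evaluate_alerts(alert_days, outbreak_days, n_days):
    """Empirical detection, specificity, and delay for one simulated run.

    Hypothetical helper.
    alert_days: set of day indices on which the algorithm alerted.
    outbreak_days: sorted list of day indices on which injected outbreak
        effects are present in the data.
    n_days: total number of days tested.
    """
    outbreak = set(outbreak_days)
    non_outbreak = [d for d in range(n_days) if d not in outbreak]

    # Sensitivity (per run): was any alert raised while the signal was present?
    detected = bool(alert_days & outbreak)

    # Specificity: fraction of outbreak-free days without an alert.
    quiet = sum(1 for d in non_outbreak if d not in alert_days)
    specificity = quiet / len(non_outbreak) if non_outbreak else float("nan")

    # Timeliness: days from the first outbreak day to the first alert during it.
    delay = None
    if detected:
        first_alert = min(d for d in alert_days if d in outbreak)
        delay = first_alert - outbreak_days[0]
    return detected, specificity, delay
```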


Aggregating Data in Time

Data stream(s) to monitor in time:

  • Test interval

    • Counts to be tested for anomaly

    • Nominally 1 day

    • Longer to reduce noise, test for epicurve shape

    • Will shorten as data acquisition improves

  • Guardband

    • Avoids contamination of baseline with outbreak signal

  • Baseline interval

    • Used to get some estimate of normal data behavior

    • Mean, variance

    • Regression coefficients

    • Expected covariate distribution: spatial, age category, % of claims/syndrome
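The three intervals reduce to index arithmetic on a daily series. The interval lengths below (1-day test, 2-day guardband, 28-day baseline) are illustrative defaults rather than values from the presentation, and `split_windows` is a hypothetical helper.

```python
def split_windows(counts, t, test_len=1, guard_len=2, base_len=28):
    """Return (baseline, guardband, test) slices for the test interval ending at day t.

    Hypothetical helper with illustrative default lengths.
    The test interval is the test_len days ending at t, the guardband is the
    guard_len days just before it, and the baseline is the base_len days
    before the guardband. Returns None if the history is too short.
    """
    test_start = t - test_len + 1
    guard_start = test_start - guard_len
    base_start = guard_start - base_len
    if base_start < 0:
        return None  # warm-up: not enough history for a full baseline
    return (counts[base_start:guard_start],
            counts[guard_start:test_start],
            counts[test_start:t + 1])
```

Lengthening `test_len` trades timeliness for noise reduction, while the guardband keeps the first days of an outbreak out of the baseline statistics.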


Elements of an Alerting Algorithm

  • Values to be tested: raw data, or residuals from a model?

  • Baseline period

    • Historical data used to determine expected data behavior

    • Fixed or a sliding window?

    • Outlier removal: to avoid training on unrepresentative data

    • What does the algorithm do when the baseline data are all zeros or missing?

    • Is a warmup period of data history required?

  • Buffer period (or guardband)

    • Separation between the baseline period and interval to be tested

  • Test period

    • Interval of current data to be tested

  • Reset criterion

    • to prevent flooding by persistent alerts caused by extreme values

  • Test statistic: value computed to make alerting decisions

  • Threshold: alert issued if test statistic exceeds this value
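Several of these elements fit in a short loop. This is a sketch under stated assumptions, not a specific published algorithm: the values tested are raw daily counts, the test statistic is a z-score against a sliding baseline that ends before a guardband, outliers beyond `outlier_z` baseline standard deviations are dropped before training, and `min_sigma` (a hypothetical floor) keeps the statistic finite when the baseline is all zeros. A reset criterion applies mainly to statistics that carry memory across days, such as EWMA or CUSUM, so it is omitted here.

```python
from statistics import mean, stdev

def alert_series(counts, base_len=28, guard_len=2, threshold=3.0,
                 min_sigma=0.5, outlier_z=4.0):
    """Flag days whose count exceeds a sliding-baseline z-score threshold.

    Hypothetical sketch; all parameter defaults are illustrative.
    Returns one boolean per day (False during the warm-up period).
    """
    alerts = [False] * len(counts)
    # Warm-up: the first base_len + guard_len days cannot be tested.
    for t in range(base_len + guard_len, len(counts)):
        # Sliding baseline that ends guard_len days before the test day.
        baseline = counts[t - guard_len - base_len:t - guard_len]

        # Outlier removal, to avoid training on unrepresentative data.
        m, s = mean(baseline), stdev(baseline)
        if s > 0:
            baseline = [x for x in baseline if abs(x - m) / s <= outlier_z]
        if len(baseline) < 2:
            continue  # too little usable history to estimate variance

        m = mean(baseline)
        # All-zero or constant baseline: floor sigma so the statistic stays finite.
        s = max(stdev(baseline), min_sigma)

        z = (counts[t] - m) / s   # test statistic for the 1-day test period
        alerts[t] = z > threshold  # alert if the statistic exceeds the threshold
    return alerts
```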


Rash Syndrome Grouping of Diagnosis Codes

www.bt.cdc.gov/surveillance/syndromedef/word/syndromedefinitions.doc


Example: Daily Counts with Injected Cases

Injected Cases Presumed Attributable to Outbreak Event
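Injected-signal testing can be sketched as adding synthetic cases to real background counts, so that every added case is known to be attributable to the simulated event. The triangular epicurve shape and the name `inject_outbreak` are assumptions for illustration; the presentation does not specify the signal shape used.

```python
def inject_outbreak(background, start, total_cases, duration):
    """Add a hypothetical triangular outbreak signal to background daily counts.

    Cases rise linearly to a peak at mid-outbreak and then fall; rounding is
    handled so that exactly total_cases are added.
    Returns (combined counts, injected counts per outbreak day).
    """
    if start < 0 or start + duration > len(background):
        raise ValueError("outbreak must fit inside the background series")

    # Triangular weights, e.g. duration 5 -> 1, 2, 3, 2, 1 (assumed shape).
    weights = [min(i + 1, duration - i) for i in range(duration)]
    wsum = sum(weights)
    raw = [total_cases * w / wsum for w in weights]

    # Largest-remainder rounding keeps the injected total exact.
    injected = [int(r) for r in raw]
    shortfall = total_cases - sum(injected)
    by_remainder = sorted(range(duration), key=lambda i: raw[i] - injected[i],
                          reverse=True)
    for i in by_remainder[:shortfall]:
        injected[i] += 1

    combined = list(background)
    for i, extra in enumerate(injected):
        combined[start + i] += extra
    return combined, injected
```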


Example: Algorithm Alerts Indicated

(Figure: alerts marked on days where the test statistic exceeds the chosen threshold.)


EWMA Monitoring

  • Exponentially Weighted Moving Average

  • Average with most weight on recent $X_k$:

    $S_k = w\,S_{k-1} + (1-w)\,X_k$, where $0 < w < 1$

  • Test statistic:

    $S_k$ compared to expectation from sliding baseline

    Basic idea: monitor

    $(S_k - \mu_k) / \sigma_k$, where $\mu_k$ and $\sigma_k$ are the mean and standard deviation from the sliding baseline

  • Added sensitivity for gradual events

  • Larger w means more smoothing, since w weights the previous average $S_{k-1}$
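The EWMA recursion and its standardized deviation can be sketched directly. Assumptions not stated on the slide: the average starts at the first observation, $\mu_k$ and $\sigma_k$ come from a sliding baseline of raw counts ending `guard_len` days before day k, $\sigma_k$ is scaled by $\sqrt{(1-w)/(1+w)}$ (the long-run standard deviation of an EWMA of independent values with this weighting), and `min_sigma` is a hypothetical floor.

```python
from math import sqrt
from statistics import mean, stdev

def ewma_statistics(counts, w=0.6, base_len=28, guard_len=2, min_sigma=0.5):
    """EWMA S_k = w*S_{k-1} + (1-w)*X_k and the statistic (S_k - mu_k)/sigma_k.

    Hypothetical sketch; defaults are illustrative.
    Returns a list of (S_k, statistic) pairs; statistic is None during warm-up.
    """
    out = []
    s_prev = None
    for k, x in enumerate(counts):
        # Initialize with the first observation, then apply the recursion.
        s_k = x if s_prev is None else w * s_prev + (1 - w) * x
        s_prev = s_k
        stat = None
        if k >= base_len + guard_len:
            baseline = counts[k - guard_len - base_len:k - guard_len]
            mu = mean(baseline)
            sigma = max(stdev(baseline), min_sigma)
            # Assumed scaling: an EWMA fluctuates less than the raw counts.
            stat = (s_k - mu) / (sigma * sqrt((1 - w) / (1 + w)))
        out.append((s_k, stat))
    return out
```

Under the weighting above, w near 1 makes the average change slowly (heavy smoothing), while w near 0 makes it track the latest counts.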


Example with Detection Statistic Plot

(Figure: detection statistic plotted with its threshold, marking where the statistic exceeds the threshold.)



Effects of Data Problems

(Figure: data problems produce additional flags and a missed event.)


Importance of spatial data for biosurveillance

  • Purely temporal methods can find anomalies, IF you know which case counts to monitor

    • Location of outbreak?

    • Extent?

  • Advantages of spatial clustering

    • Tracking progression of outbreak

    • Identifying population at risk


Evaluating Candidate Clusters

(Figure: case locations, marked x, in the surveillance region, with a candidate cluster.)

The scan statistic gives a measure of:

“how unlikely is the number of cases inside relative to the number outside, given the expected spatial distribution of cases”

(Thus, a populous region won’t necessarily flag.)
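One common way to quantify "how unlikely" is Kulldorff's Poisson log-likelihood ratio. The slide does not say which model the presenter used, so treat this as an assumption; `poisson_llr` is a hypothetical name. The expected count already reflects the spatial distribution of cases, which is why a populous region with a proportionally large count scores zero.

```python
from math import log

def poisson_llr(cases_in, expected_in, total_cases):
    """Poisson log-likelihood ratio for one candidate cluster (assumed model).

    cases_in: observed cases inside the candidate cluster.
    expected_in: expected cases inside, scaled so the expectation over the
        whole region equals total_cases.
    Only excesses are scored: returns 0 when cases_in <= expected_in.
    """
    c, e, n = cases_in, expected_in, total_cases
    if c <= e:
        return 0.0
    # Inside term plus outside term; the outside term is 0 when all cases are inside.
    outside = (n - c) * log((n - c) / (n - e)) if n > c else 0.0
    return c * log(c / e) + outside
```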


Selecting Candidate Clusters

(Figure: case locations, marked x, with candidate clusters.)


Searching for Spatial Clustering

(Figure: centroids of data collection regions, marked x, in region A.)

  • form cylinders: bases are circles about each centroid in region A, height is time

  • calculate statistic for event count in each cylinder relative to entire region, within space & time limits

  • most significant cluster: the regions whose centroids form the base of the cylinder with the maximum statistic

  • but how unusual is it? Repeat the procedure on Monte Carlo replicates generated under the null hypothesis of no clustering, and compare the observed maximum statistic to the maximum from each replicate
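The cylinder search and Monte Carlo test can be sketched end to end. Assumptions, none taken from the presentation: each cylinder base is a centroid plus its nearest neighbors (up to `max_regions` regions), heights are the most recent 1 to `max_days` days, expected counts are proportional to population and uniform over days, cylinders are scored with the Poisson log-likelihood ratio, and each replicate redistributes the observed total of cases under that null. Populations must be positive. All function names are hypothetical.

```python
import random
from math import dist, log

def llr(c, e, n):
    # Poisson log-likelihood ratio, scoring only excesses (c > e).
    if c <= e:
        return 0.0
    outside = (n - c) * log((n - c) / (n - e)) if n > c else 0.0
    return c * log(c / e) + outside

def cylinders(centroids, max_regions, max_days):
    # Base: a centroid and its nearest neighbors; height: the most recent days.
    found = set()
    for center in centroids:
        nearest = sorted(range(len(centroids)),
                         key=lambda j: dist(center, centroids[j]))
        for size in range(1, max_regions + 1):
            for days in range(1, max_days + 1):
                found.add((frozenset(nearest[:size]), days))
    return list(found)

def max_llr(cases, pop, cyls):
    # cases[r][d]: count for region r on day d (last index = most recent day).
    n_days = len(cases[0])
    total = sum(map(sum, cases))
    pop_total = sum(pop)
    best, best_cyl = 0.0, None
    for regions, days in cyls:
        c = sum(cases[r][d] for r in regions for d in range(n_days - days, n_days))
        e = total * (sum(pop[r] for r in regions) / pop_total) * (days / n_days)
        score = llr(c, e, total)
        if score > best:
            best, best_cyl = score, (regions, days)
    return best, best_cyl

def scan_test(cases, pop, centroids, max_regions=3, max_days=3, n_reps=99, seed=1):
    cyls = cylinders(centroids, max_regions, max_days)
    observed, cluster = max_llr(cases, pop, cyls)

    # Null model: each case falls in a region-day cell with probability
    # proportional to the region's population (uniform over days).
    n_regions, n_days = len(cases), len(cases[0])
    total = sum(map(sum, cases))
    cells = [(r, d) for r in range(n_regions) for d in range(n_days)]
    weights = [pop[r] for r, _ in cells]
    rng = random.Random(seed)

    exceed = 0
    for _ in range(n_reps):
        sim = [[0] * n_days for _ in range(n_regions)]
        for r, d in rng.choices(cells, weights=weights, k=total):
            sim[r][d] += 1
        if max_llr(sim, pop, cyls)[0] >= observed:
            exceed += 1
    # Monte Carlo p-value: rank of the observed maximum among all maxima.
    return cluster, observed, (exceed + 1) / (n_reps + 1)
```

With 99 replicates the smallest attainable p-value is 0.01, so the number of replicates bounds how small a reported p-value can be.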



Scan Statistics: Advantages

  • Gives monitor guidance for cluster size, location, significance

  • Avoids preselection bias regarding cluster size or location

  • Significance testing controls for multiple testing

  • Can tailor problem design by data, objective:

    • Location (zipcode, hospital/provider site, patient/customer residence, school/store address)

    • Time windows used (cases, history, guardband)

    • Background estimation method: model, history, population, eligible customers


Surveillance Application: OTC Anti-flu Sales, Dates: 15-24 Apr 2002

Total sales as of 25 Apr: 1804

Potential cluster:

  • Center at 22311

  • 63 sales, 39 expected from recent data

  • Relative risk = 1.6

  • p = 0.041
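The reported relative risk can be checked against the slide's own numbers. Both definitions below are assumptions about how the figure was computed: the simple observed-to-expected ratio inside the cluster, and a SaTScan-style ratio comparing the rate inside the cluster to the rate outside it. Both round to the reported 1.6.

```python
def relative_risks(observed_in, expected_in, total):
    """Return (O/E inside, inside-vs-outside relative risk); both are assumed definitions."""
    ratio_inside = observed_in / expected_in
    ratio_outside = (total - observed_in) / (total - expected_in)
    return ratio_inside, ratio_inside / ratio_outside

# Numbers from the slide: 63 sales observed, 39 expected, 1804 total.
oe, rr = relative_risks(63, 39, 1804)
print(f"O/E = {oe:.2f}, inside/outside RR = {rr:.2f}")
```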



Effect of Data Discontinuities on OTC Cough/Cold Clusters

(Figure: cluster groups by day and by ZIP code, ordered south to north.)

  • Before removing problem ZIP codes, cluster groups are dominated by ZIP codes that “turn on” after sustained periods of zero or abnormally low counts.

  • After editing, more interesting cluster groups emerge.
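A simple screen for such discontinuities can be sketched as a run-length check. The rule and thresholds here are hypothetical: a ZIP code is flagged when at least `min_quiet_days` consecutive days at or below `low` are immediately followed by a day above `low`. Real data sources would need tuned values.

```python
def turn_on_zips(series_by_zip, min_quiet_days=14, low=0):
    """Flag ZIP codes whose counts 'turn on' after a sustained quiet period.

    Hypothetical screen. series_by_zip: dict mapping ZIP code -> daily counts.
    """
    flagged = set()
    for zip_code, counts in series_by_zip.items():
        run = 0  # length of the current run of zero or abnormally low days
        for c in counts:
            if c <= low:
                run += 1
            else:
                if run >= min_quiet_days:
                    flagged.add(zip_code)  # long quiet run ended abruptly
                    break
                run = 0
    return flagged
```

Flagged ZIP codes can then be reviewed or excluded before the cluster search is rerun.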



Cluster Investigation by Record Inspection

Records Corresponding to a Respiratory Cluster



Cumulative Summation Approach (CUSUM)

  • Widely adapted to disease surveillance

  • Devised for prompt detection of small shifts

  • Look for changes of 2k standard deviations from the mean μ (often k = 0.5)

  • Take normalized deviation: often $Z_t = (x_t - \mu) / \sigma$

  • Compare lower, upper sums to threshold h:

    $S_{H,t} = \max(0,\ (Z_t - k) + S_{H,t-1})$

    $S_{L,t} = \max(0,\ (-Z_t - k) + S_{L,t-1})$

  • Phase I sets μ, σ, h, k

Upper Sum: Keep adding differences between today’s count and k std deviations above mean.

Alert when the sum exceeds threshold h.
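The two recursions translate directly into code. This sketch takes already-standardized values $Z_t$; k = 0.5 follows the slide, while the threshold h = 4 and resetting a sum to zero after it alerts are assumptions (the reset acts as a reset criterion against floods of alerts). `cusum` is a hypothetical name.

```python
def cusum(z_scores, k=0.5, h=4.0, reset=True):
    """Two-sided CUSUM on standardized deviations Z_t = (x_t - mu) / sigma.

    S_H,t = max(0, Z_t - k + S_H,t-1) accumulates upward shifts;
    S_L,t = max(0, -Z_t - k + S_L,t-1) accumulates downward shifts.
    h and the reset behavior are illustrative assumptions.
    Returns (S_H, S_L, upper_alert, lower_alert) per day, before any reset.
    """
    s_hi = s_lo = 0.0
    out = []
    for z in z_scores:
        s_hi = max(0.0, z - k + s_hi)
        s_lo = max(0.0, -z - k + s_lo)
        up, down = s_hi > h, s_lo > h
        out.append((s_hi, s_lo, up, down))
        # Reset after an alert so one shift does not keep re-alerting.
        if reset and up:
            s_hi = 0.0
        if reset and down:
            s_lo = 0.0
    return out
```

A sustained shift of one standard deviation (Z = 1.5 with k = 0.5) adds 1.0 per day to the upper sum, so with h = 4 it alerts on the fifth day of the shift.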


CuSum Example: CDC EARS Methods C1-C3

(Figure: timeline from Day-9 to Day 0, with the current count on Day 0.)

  • Three adaptive methods chosen by the National Center for Infectious Diseases after 9/1/2001 as most consistent

  • Look for aberrations representing increases, not decreases

  • Fixed mean, variance replaced by values from a sliding baseline (usually 7 days)

Baseline for C1-MILD (-1 to -7 days)

Baseline for C2-MEDIUM (-3 to -9 days)

Baseline for C3-ULTRA (-3 to -9 days)


Calculation for C1-C3:

(Figure: timeline from Day-3 to Day 0.)

Individual day statistic for day j with lag n:

$S_{j,n} = \max\{0,\ (\mathrm{Count}_j - [\mu_n + \sigma_n]) / \sigma_n\}$, where

$\mu_n$ is the 7-day average with an n-day lag (so $\mu_3$ is the mean of the counts on days $j-9$ through $j-3$), and

$\sigma_n$ is the standard deviation of the same 7-day window

C1 statistic for day k is $S_{k,1}$ (baseline days $k-7$ to $k-1$; no guardband)

C2 statistic for day k is $S_{k,3}$ (baseline days $k-9$ to $k-3$; 2-day guardband)

C3 statistic for day k is $S_{k,3} + S_{k-1,3} + S_{k-2,3}$, where $S_{k-1,3}$ and $S_{k-2,3}$ are added only if they do not exceed the threshold

Upper bound threshold of 2: equivalent to 3 standard deviations above the mean, since $S_{j,n} > 2$ exactly when $\mathrm{Count}_j > \mu_n + 3\sigma_n$
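The C1-C3 definitions can be checked with a direct implementation. `ears_s` and `ears_c1_c2_c3` are hypothetical names; the use of the sample standard deviation and the `min_sigma` floor (which avoids division by zero on constant baselines) are assumptions not stated on the slide.

```python
from statistics import mean, stdev

def ears_s(counts, j, lag, min_sigma=0.2):
    """S_{j,n} = max(0, (Count_j - (mu_n + sigma_n)) / sigma_n) with n = lag.

    mu_n and sigma_n are the mean and sample standard deviation of the 7 counts
    on days j-lag-6 .. j-lag (so lag=3 uses days j-9 .. j-3).
    min_sigma is a hypothetical floor.
    """
    start = j - lag - 6
    if start < 0:
        raise ValueError("not enough history for a 7-day baseline")
    window = counts[start:j - lag + 1]
    mu = mean(window)
    sigma = max(stdev(window), min_sigma)
    return max(0.0, (counts[j] - (mu + sigma)) / sigma)

def ears_c1_c2_c3(counts, k, threshold=2.0):
    """Return (C1, C2, C3) for day k; each signals when it exceeds threshold."""
    c1 = ears_s(counts, k, lag=1)  # baseline days k-7 .. k-1
    c2 = ears_s(counts, k, lag=3)  # baseline days k-9 .. k-3 (2-day guardband)
    c3 = c2
    for prev in (k - 1, k - 2):
        s_prev = ears_s(counts, prev, lag=3)
        # Earlier days are added only if they did not exceed the threshold.
        if s_prev <= threshold:
            c3 += s_prev
    return c1, c2, c3
```

Day k needs at least 11 earlier days of history, because the C3 term for day k-2 uses a baseline reaching back to day k-11.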


Detailed Example, I

(Figure: timeline from Day-3 to Day 0, annotated “Fewer alerts AND more sensitive: why?”)


Detailed Example, II

(Figure: timeline from Day-3 to Day 0, annotated “Signal detected only with 28-day baseline”)


Detailed Example, III: “the rest of the story”

(Figure: timeline from Day-3 to Day 0.)

