
Population-Wide Anomaly Detection

Weng-Keen Wong¹, Gregory Cooper², Denver Dash³, John Levander², John Dowling², Bill Hogan², Michael Wagner²

¹School of Electrical Engineering and Computer Science, Oregon State University; ²Realtime Outbreak and Disease Surveillance Laboratory, University of Pittsburgh; ³Intel Research, Santa Clara

Motivation
  • Suppose you monitor Emergency Department (ED) data that arrives in real time
  • Can you specifically detect a large-scale anthrax attack?
Model non-outbreak conditions and notice deviations

Traditional univariate methods, e.g. control charts, CUSUM, EWMA, time series models

Spatial methods, e.g. the spatial scan statistic

These are non-specific methods: they look for anything unusual in the data, but not specifically for the onset of an anthrax attack.

Multivariate methods, e.g. WSARE

2. Sat 2001-03-13: SCORE = -0.00000464 PVALUE = 0.00000000

12.42% ( 58/467) of today's cases have 20 ≤ Age < 30 AND Respiratory Syndrome = True

6.53% (653/10000) of baseline have 20 ≤ Age < 30 AND Respiratory Syndrome = True
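
To make the comparison in the WSARE output above concrete, here is a minimal sketch that checks whether today's proportion (58/467) is unusually high relative to the baseline (653/10000). It uses a plain two-proportion z-test as a stand-in; this is an assumption for illustration, not WSARE's own scoring or p-value procedure, and the function name is hypothetical.

```python
import math

def two_proportion_z(hits_today, n_today, hits_base, n_base):
    """Return (z, one-sided p-value) for today's rate exceeding the baseline rate."""
    p_today = hits_today / n_today
    p_base = hits_base / n_base
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (hits_today + hits_base) / (n_today + n_base)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_today + 1 / n_base))
    z = (p_today - p_base) / se
    # Upper tail of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Counts taken from the example rule above.
z, p_value = two_proportion_z(58, 467, 653, 10000)
print(f"today {58 / 467:.2%}, baseline {653 / 10000:.2%}, z = {z:.2f}, p = {p_value:.2e}")
```

A test like this flags that something is unusual in the 20 to 30 age group, but, like the methods listed above, it says nothing about whether anthrax is the cause.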

Population-wide ANomaly Detection and Assessment (PANDA)
  • A detector specifically for a large-scale outdoor release of inhalational anthrax
  • Uses a massive causal Bayesian network
  • Population-wide approach: each person in the population is represented as a subnetwork in the overall model
Population-Wide Approach
  • Note the conditional independence assumptions
  • Anthrax is infectious but non-contagious
  • Structure designed by expert judgment
  • Parameters obtained from census data, training data, and expert assessments informed by literature and experience

[Figure: Bayesian network diagram. Labels: Anthrax Release, Location of Release, Time of Release (global nodes and interface nodes); one Person Model subnetwork for each person in the population.]
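
The conditional independence assumptions can be illustrated with a toy calculation. The sketch below assumes that person models are independent of each other given the release state (taken here to be just the release location), so the posterior is proportional to P(release) multiplied by the product of P(evidence for person i | release) over all people. All probabilities, zip codes, and function names are hypothetical placeholders, not PANDA's actual parameters.

```python
import math

# Hypothetical prior over the release state: no release, or a release in one zip code.
PRIOR = {"none": 0.998, "15213": 0.001, "15146": 0.001}

def p_respiratory_visit(home_zip, release):
    """Hypothetical P(person visits the ED with a respiratory complaint | release)."""
    if release == home_zip:
        return 0.05  # elevated risk when the release is in the person's zip code
    return 0.01      # background rate

def posterior_over_release(people):
    """people: list of (home_zip, visited) pairs, one per person in the population."""
    log_joint = {}
    for release, prior in PRIOR.items():
        total = math.log(prior)
        # Given the release state, each person contributes an independent factor.
        for home_zip, visited in people:
            p = p_respiratory_visit(home_zip, release)
            total += math.log(p if visited else 1.0 - p)
        log_joint[release] = total
    # Normalize in log space to avoid underflow with large populations.
    top = max(log_joint.values())
    weights = {r: math.exp(v - top) for r, v in log_joint.items()}
    norm = sum(weights.values())
    return {r: w / norm for r, w in weights.items()}

people = ([("15213", True)] * 8 + [("15213", False)] * 92
          + [("15146", True)] * 1 + [("15146", False)] * 99)
post = posterior_over_release(people)
print(max(post, key=post.get), post)
```

Because anthrax is non-contagious, one person's infection does not raise another's risk directly, which is what makes this per-person factorization plausible.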

Person Model (Initial Prototype)

[Figure: two example person models sharing the nodes Anthrax Release, Time of Release, and Location of Release. Nodes in each person model: Gender, Age Decile, Home Zip, Other ED Disease, Anthrax Infection, Respiratory from Anthrax, Respiratory CC From Other, Respiratory CC, ED Admit from Anthrax, ED Admit from Other, Respiratory CC When Admitted, ED Admission. Example values shown for the two people: Gender = Female / Male; Age Decile = 20-30 / 50-60; Home Zip = 15213 / 15146; Respiratory CC When Admitted = Unknown / False; ED Admission = Yesterday / never.]

Prototype is Computationally Feasible

Aside from caching tricks, there are two main optimizations:

  • Incremental Updating
  • Equivalence Classes

Performance:

On a 3.0 GHz Pentium 4 machine with 2 GB RAM: 45 seconds of initialization time, then 3 seconds for each hour's worth of ED data

See Cooper G.F., Dash D.H., Levander J.D., Wong W-K., Hogan W.R., Wagner M.M. Bayesian Biosurveillance of Disease Outbreaks. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI). Banff, Canada: AUAI Press; 2004. pp. 94-103.
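
The slide does not spell out the two optimizations, so the following is only one plausible reading, sketched under stated assumptions: people with identical attributes and evidence form an equivalence class whose likelihood is evaluated once and weighted by class size, and when new ED data moves a person between classes, the total is adjusted rather than recomputed. The class `ClassCounts`, the function `class_log_lik`, and all probabilities are hypothetical.

```python
import math
from collections import Counter

def class_log_lik(attrs, release):
    """Hypothetical log P(evidence for one person in this class | release)."""
    home_zip, visited = attrs
    p = 0.05 if release == home_zip else 0.01
    return math.log(p if visited else 1.0 - p)

class ClassCounts:
    """Track equivalence-class sizes and the population log-likelihood."""

    def __init__(self, people, release):
        self.release = release
        self.counts = Counter(people)
        # Equivalence classes: one likelihood evaluation per class, weighted by size.
        self.log_lik = sum(n * class_log_lik(c, release)
                           for c, n in self.counts.items())

    def move(self, old, new):
        """Incremental update: one person leaves class `old` and joins class `new`."""
        self.counts[old] -= 1
        self.counts[new] += 1
        self.log_lik += (class_log_lik(new, self.release)
                         - class_log_lik(old, self.release))

people = [("15213", False)] * 100000 + [("15146", False)] * 50000
state = ClassCounts(people, release="15213")
state.move(("15213", False), ("15213", True))  # a new ED record arrives
print(len(state.counts), state.log_lik)
```

In this sketch the cost of an update depends on the number of new records rather than on the population size, which is consistent with a few seconds per hour of data being feasible.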

What do you gain with a population-wide approach?

Coherent framework for:

  • Incorporating background knowledge
  • Incorporating different types of evidence
  • Data fusion
  • Explanation
1. Incorporating Background Knowledge
  • Limited data from actual anthrax attacks available:
    • Postal attacks 2001 (only 11 people affected; not representative of a large-scale attack)
    • Sverdlovsk 1979
  • But literature contains studies on the characteristics of inhalational anthrax
1. Incorporating Background Knowledge
  • Can coherently incorporate different types of background knowledge, e.g. for inhalational anthrax:
    • Progression of symptoms
    • Incubation period
    • Spatial dispersion pattern

At an individual level

Can represent this by the effects over individuals

2. Incorporating Evidence
  • Easily incorporate different types of evidence, e.g. spatial, temporal, demographic, symptomatic
  • Easily incorporate new evidence that distinguishes an individual (or individuals) from others
    • Modify the appropriate person model
3. Data Fusion
  • ED data: individual patient records, usually available in real time
  • OTC data: aggregated over zip code and available daily
  • No data is available from an actual anthrax attack that captures the correlation between these two data sources.
  • By modeling the actions of individuals and incorporating background knowledge, we can construct a plausible model of the effects of an attack on these two data sources.
  • By representing the population at the finest granularity (i.e. each individual), we can easily deal with the different spatial and temporal granularities of the data sources.

See Wong W-K., Cooper G.F., Dash D.H., Dowling J.N., Levander J.D., Hogan W.R., Wagner M.M. Bayesian Biosurveillance Using Multiple Data Streams. In Proceedings of the 3rd National Syndromic Surveillance Conference, 2004.
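
As a rough illustration of how individual-level modeling can bridge the two granularities, the sketch below sums hypothetical per-person purchase probabilities into an expected daily OTC count per zip code and scores observed zip-level counts against it. The Poisson approximation, the purchase probabilities, the `infected` field (treated as known for simplicity), and all names are assumptions for illustration; the cited paper should be consulted for the actual model.

```python
import math
from collections import defaultdict

def expected_otc_counts(people, p_buy):
    """Sum per-person purchase probabilities into an expected daily count per zip code."""
    expected = defaultdict(float)
    for person in people:
        expected[person["home_zip"]] += p_buy(person)
    return dict(expected)

def poisson_log_lik(observed, expected):
    """Log-likelihood of observed zip-level counts under a Poisson approximation."""
    total = 0.0
    for zip_code, lam in expected.items():
        k = observed.get(zip_code, 0)
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total

def p_buy_attack(person):
    # Hypothetical: infected people are far more likely to buy OTC medication.
    return 0.30 if person["infected"] else 0.02

def p_buy_baseline(person):
    return 0.02

people = ([{"home_zip": "15213", "infected": False}] * 950
          + [{"home_zip": "15213", "infected": True}] * 50
          + [{"home_zip": "15146", "infected": False}] * 1000)

observed = {"15213": 36, "15146": 19}  # hypothetical daily zip-level OTC counts
attack_expected = expected_otc_counts(people, p_buy_attack)
baseline_expected = expected_otc_counts(people, p_buy_baseline)
attack_ll = poisson_log_lik(observed, attack_expected)
baseline_ll = poisson_log_lik(observed, baseline_expected)
print(attack_expected, attack_ll - baseline_ll)
```

The same per-person representation could feed the ED likelihood at the level of individual records, so the two streams share one underlying model even though they are observed at different resolutions.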

4. Explanation

Important to know why the model believes an anthrax attack is occurring

Can find the subset of evidence E* that most influences such a belief

In PANDA, E* would correspond to a group of individuals

Identify the individuals that most contribute to the hypothesis of an attack

Currently, we identify the top equivalence classes that contribute the most to the hypothesis that an attack is occurring

Can also use the Bayesian network to calculate the most likely location of release and time of release
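
One way to picture both ideas is sketched below, under assumptions: each equivalence class is scored by its contribution to the log-likelihood ratio of attack versus no attack, and the most likely release is read off as the highest-probability entry of a posterior over (location, time) pairs. The scoring rule, class labels, and all numbers are hypothetical; the slide does not specify how PANDA ranks classes.

```python
import math

def class_contributions(class_counts, p_attack, p_none):
    """Rank equivalence classes by contribution to the attack log-likelihood ratio."""
    scores = {
        cls: n * (math.log(p_attack[cls]) - math.log(p_none[cls]))
        for cls, n in class_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def most_likely_release(posterior):
    """Return the (location, time) pair with the highest posterior probability."""
    return max(posterior, key=posterior.get)

# Hypothetical classes: (home zip, evidence), with class sizes.
class_counts = {
    ("15213", "resp visit"): 8,
    ("15213", "no visit"): 92,
    ("15146", "resp visit"): 1,
    ("15146", "no visit"): 99,
}
# Hypothetical per-person evidence probabilities under each hypothesis.
p_attack = {("15213", "resp visit"): 0.05, ("15213", "no visit"): 0.95,
            ("15146", "resp visit"): 0.01, ("15146", "no visit"): 0.99}
p_none = {("15213", "resp visit"): 0.01, ("15213", "no visit"): 0.99,
          ("15146", "resp visit"): 0.01, ("15146", "no visit"): 0.99}

ranked = class_contributions(class_counts, p_attack, p_none)

# Hypothetical posterior over (location, time) of release.
posterior = {("15213", "yesterday"): 0.6, ("15213", "2 days ago"): 0.3,
             ("15146", "yesterday"): 0.1}
best = most_likely_release(posterior)
print(ranked[0], best)
```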

Future Work
  • More sophisticated person models
  • Improved explanation capabilities
  • Validation of data fusion model
  • More disease models apart from anthrax
  • Contagious disease models
  • Combining outputs from multiple Bayesian detectors
Thank You!

RODS Laboratory: http://rods.health.pitt.edu

Bayesian Biosurveillance:

http://www.cbmi.pitt.edu/panda/

This research was supported by grants IIS-0325581 from the National Science Foundation, F30602-01-2-0550 from the Department of Homeland Security, and ME-01-737 from the Pennsylvania Department of Health.