
Modeling Crypto Occurrence, Using Lab-Specific Matrix Spike Recovery Data

Michael Messner, Ph.D.

Mathematical Statistician

EPA Office of Ground Water and Drinking Water

Standards and Risk Management Division

Messner.Michael@epa.gov


Outline

  • Disclaimer

  • Data Used

  • Uncertainty in Crypto Numbers Spiked

  • Model Building

  • Preferred Model (Model 5)

  • Results of Recovery Modeling

  • Informing the Crypto Occurrence Model


Disclaimer

  • Views expressed in this presentation are the author’s and are not necessarily those of the USEPA.


Data Used

  • Results were obtained from analyses of 1263 source water samples that were spiked with Cryptosporidium (matrix spike samples).

    • Dates range from February 2004 to May 2008.

  • For each matrix spike sample, the data include:

    • Organization (Lab ID)

    • Sample volume filtered

    • Sample volume spiked

    • Number of Crypto measured

    • Number of Crypto spiked

  • The fraction of the spiked volume that was filtered is found by dividing “Sample volume filtered” by “Sample volume spiked”.
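As a rough illustration of how these fields combine, the sketch below computes the number of oocysts expected in the filtered portion and the simple "count/expected" recovery for one sample. The function names and the example numbers are hypothetical, not taken from the data set.

```python
def expected_spiked_in_filtered(n_spiked, vol_filtered, vol_spiked):
    """Oocysts expected in the filtered portion if recovery were 100%."""
    if vol_spiked <= 0:
        raise ValueError("spiked volume must be positive")
    fraction = vol_filtered / vol_spiked  # fraction of the spiked volume filtered
    return n_spiked * fraction


def observed_recovery(n_counted, n_spiked, vol_filtered, vol_spiked):
    """Simple recovery estimate: count divided by expected count."""
    expected = expected_spiked_in_filtered(n_spiked, vol_filtered, vol_spiked)
    return n_counted / expected


# Hypothetical sample: 100 oocysts spiked into 10 L, all 10 L filtered, 42 counted.
example_recovery = observed_recovery(42, 100, 10.0, 10.0)
print(example_recovery)
```

If only half of the spiked volume is filtered, the expected count halves, so the same observed count implies twice the recovery.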


Uncertainty in Crypto Numbers Spiked

  • Spiking suspensions (“tubes”), provided by two vendors, were prepared using flow cytometry.

  • Both vendors checked hundreds of their tubes by carefully counting the tubes’ oocysts.

  • Based on data provided by one lab, a pooled estimate of relative standard deviation (RSD) is 1.35%.

  • The other lab provided a histogram, rather than statistical summaries. The next slide shows that their precision appears to match that of the first lab.


Histogram of Lab 2 and Normal Density Function (mu = 100, s = 1.35)
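To give a feel for what a normal density with mu = 100 and s = 1.35 implies, the sketch below evaluates the density and the probability that a tube falls within 3 units of the nominal value. It assumes a scale on which the nominal oocyst count is 100; the helper names are illustrative only.

```python
import math


def normal_pdf(x, mu=100.0, s=1.35):
    """Normal density with the slide's mu and s as defaults."""
    z = (x - mu) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=100.0, s=1.35):
    """Normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (s * math.sqrt(2.0))))


# Probability that a tube holds between 97 and 103 (nominal = 100).
p = normal_cdf(103) - normal_cdf(97)
print(f"P(97 <= N <= 103) is roughly {p:.3f}")
```

Under this assumption the spike count is known to within a few oocysts, which is small compared with the variation in recovery itself.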


Model Building

  • All models assume that the number of oocysts counted is Binomial with parameters N (exact number of oocysts in the spiked sample) and r, the probability that an oocyst in the sample will be observed and counted.

  • All the models account for uncertainty in N, based on 1.35% RSD.

  • The basic modeling approach was to start simple with 2-parameter models and use the log likelihood to gauge model quality.
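The shared likelihood can be sketched as a binomial probability averaged over the uncertain spike count N. This is a minimal Python illustration, not the WinBUGS code used in the presentation; treating N as a discretised normal around the nominal count with 1.35% RSD is an assumption about how the uncertainty was encoded, and the function names are hypothetical.

```python
import math


def log_binom_pmf(k, n, r):
    """log P(K = k) for K ~ Binomial(n, r), with the r = 0 and r = 1 edge cases."""
    if k < 0 or k > n:
        return float("-inf")
    if r <= 0.0:
        return 0.0 if k == 0 else float("-inf")
    if r >= 1.0:
        return 0.0 if k == n else float("-inf")
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(r) + (n - k) * math.log(1.0 - r))


def n_weights(nominal, rsd=0.0135, width=5):
    """Discretised normal weights for the true count N (assumed form, nominal > 0)."""
    sd = nominal * rsd
    lo = max(0, int(round(nominal - width * sd)))
    hi = int(round(nominal + width * sd))
    ws = {n: math.exp(-0.5 * ((n - nominal) / sd) ** 2) for n in range(lo, hi + 1)}
    total = sum(ws.values())
    return {n: w / total for n, w in ws.items()}


def log_lik(k, nominal, r):
    """log P(K = k | r), averaging the binomial over the uncertain N."""
    total = sum(w * math.exp(log_binom_pmf(k, n, r))
                for n, w in n_weights(nominal).items())
    return math.log(total) if total > 0 else float("-inf")


# Hypothetical assay: 50 counted from a nominal spike of 100.
print(log_lik(50, 100, 0.5), log_lik(50, 100, 0.2))
```

Summing such log likelihoods over all assays gives the quantity used to compare candidate models.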


Models

  • Model 1: r varies from assay to assay (both within and between labs) as a beta random variable.

  • Model 2: ln(r/(1-r)) = logit(r) varies from assay to assay as a normal random variable.

  • Model 3: With probability z, r varies as a Beta random variable, but the rest of the time (1-z), r is exactly zero.

  • Model 4: With probability z, logit(r) varies as a normal random variable, but the rest of the time (1-z), r is exactly zero.

  • Model 5: Both the probability of zero recovery and expected value of logit(r) vary from lab to lab as a bivariate normal random variable. Covariance allows these two features to be related.
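To make the zero-inflated idea concrete, here is a small simulation in the spirit of Model 4: with probability z the recovery is logit-normal, otherwise it is exactly zero. The parameter values (z = 0.9, mu = 0, sigma = 1) are made up for illustration and are not estimates from the presentation.

```python
import math
import random


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_model4(n_spiked, z, mu, sigma, rng):
    """One simulated (recovery, count) pair under zero-inflated logit-normal recovery."""
    if rng.random() >= z:
        r = 0.0  # with probability 1 - z, recovery is exactly zero
    else:
        r = inv_logit(rng.gauss(mu, sigma))  # logit(r) ~ Normal(mu, sigma)
    k = sum(1 for _ in range(n_spiked) if rng.random() < r)  # K ~ Binomial(N, r)
    return r, k


rng = random.Random(1)
draws = [simulate_model4(100, z=0.9, mu=0.0, sigma=1.0, rng=rng) for _ in range(2000)]
frac_zero = sum(1 for r, _ in draws if r == 0.0) / len(draws)
print(f"fraction of assays with zero recovery: {frac_zero:.3f}")
```

Swapping the logit-normal draw for a beta draw (for example `rng.betavariate(a, b)`) would give the analogous sketch for Model 3.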


Model 5 Hierarchy

  • High Level:

    • Grand means (mu0 and mu1) of lab-specific parameters logit(r) & pr{r=0}

    • Precision matrix R (R^-1 = variance-covariance matrix)

    • Within-lab precision parameter phi0

  • Medium Level:

    • Lab-specific averages of logit(r)

    • Lab-specific pr{r=0}

  • Low Level:

    • Sample-specific recoveries (product of nonzero recovery and an indicator of zero recovery)

    • Data (not shown in the figure).

      • K ~ dbinom(N,r)

      • Number spiked (Sp)

      • Number counted (K)
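The three levels can be read as a generative recipe, sketched below in Python. This is not the presentation's WinBUGS code. It assumes that Pr{r=0} enters the bivariate normal on the logit scale, that phi0 is the precision of logit(r) within a lab, and that the spike count is normal around Sp with 1.35% RSD; all parameter names and values are hypothetical.

```python
import math
import random


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def draw_lab(mu_r, mu_p0, sd_r, sd_p0, corr, rng):
    """High -> medium level: lab-specific mean logit(r) and logit Pr{r=0}."""
    e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
    lab_mean_logit_r = mu_r + sd_r * e1
    # Correlated second component (Cholesky factor of a 2x2 covariance matrix).
    lab_logit_p0 = mu_p0 + sd_p0 * (corr * e1 + math.sqrt(1 - corr ** 2) * e2)
    return lab_mean_logit_r, lab_logit_p0


def draw_sample(lab_mean_logit_r, lab_logit_p0, phi0, spiked, rng, rsd=0.0135):
    """Medium -> low level: one matrix spike result (r, N, K) for a lab."""
    nonzero_r = inv_logit(rng.gauss(lab_mean_logit_r, 1 / math.sqrt(phi0)))
    is_zero = rng.random() < inv_logit(lab_logit_p0)
    r = 0.0 if is_zero else nonzero_r
    n = max(0, round(rng.gauss(spiked, rsd * spiked)))   # uncertain true count N
    k = sum(1 for _ in range(n) if rng.random() < r)     # K ~ Binomial(N, r)
    return r, n, k


rng = random.Random(0)
lab = draw_lab(mu_r=-0.5, mu_p0=-3.0, sd_r=0.5, sd_p0=1.0, corr=0.0, rng=rng)
samples = [draw_sample(*lab, phi0=4.0, spiked=100, rng=rng) for _ in range(50)]
print(lab, samples[:3])
```

Fitting the model runs this recipe in reverse: given Sp and K for every assay, MCMC samples the lab-level and high-level parameters that plausibly produced them.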


WinBUGS Code


Results

  • WinBUGS generates statistics about the model parameters and a Markov Chain Monte Carlo (MCMC) or “uncertainty” sample.

  • Generating an MCMC sample of size 10,000 takes about 4 minutes.


Results (continued)

0 not in interval for logit(r) and logit(z) → reject hypothesis that median probabilities for these are 0.5.

0 in interval → covariance is not significant, so can’t reject notion that Pr{zero} is distributed independently of median recovery (when not zero).

Can’t say that Labs with poor recovery don’t also have high probability of totally missing spiked oocysts.
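The "is 0 in the interval?" check can be sketched directly on an MCMC sample. The code below assumes a central 95% posterior interval, which the slide does not state, and uses simulated stand-in draws rather than the actual WinBUGS output.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolated percentile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def interval_excludes_zero(draws, level=0.95):
    """Return the central interval and whether 0 lies outside it."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    lower, upper = percentile(s, tail), percentile(s, 1.0 - tail)
    return (lower, upper), not (lower <= 0.0 <= upper)


rng = random.Random(0)
# Stand-in draws: one parameter clearly below 0, one centered near 0.
mean_draws = [rng.gauss(-0.5, 0.1) for _ in range(10000)]
cov_draws = [rng.gauss(0.05, 1.0) for _ in range(10000)]
mean_interval, mean_excludes = interval_excludes_zero(mean_draws)
cov_interval, cov_excludes = interval_excludes_zero(cov_draws)
print(mean_interval, mean_excludes)
print(cov_interval, cov_excludes)
```

On the logit scale, 0 corresponds to a probability of 0.5, which is why an interval that excludes 0 argues against a median probability of 0.5.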


Labs Differ w.r.t. Mean Logit(r)

Central Value

Logit scale reference values:

  • Logit(0.881) = 2

  • Logit(0.731) = 1

  • Logit(0.5) = 0

  • Logit(0.269) = -1

  • Logit(0.119) = -2

Labs highlighted in the figure:

  • Posterior median for this lab is -1.019 → median r = 26.5%. Average Recovery* = 24.2%

  • Posterior median for this lab is 0.2353 → median r = 55.9%. Average Recovery* = 62.4%

  • Posterior median for this lab is 0.5883 → median r = 64.3%. Average Recovery* = 65.3%

* (count/expected), averaged across samples
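The logit values on this slide convert to recoveries through the inverse logit, r = 1 / (1 + exp(-x)). The short check below reproduces the quoted medians; the third lab's value is taken as positive 0.5883, since that is the sign that yields 64.3%.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Probability corresponding to a logit value x."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-1.019, 0.2353, 0.5883):
    print(f"logit(r) = {x:+.4f} -> median r = {100 * inv_logit(x):.1f}%")

for p in (0.881, 0.731, 0.5, 0.269, 0.119):
    print(f"logit({p}) = {logit(p):.2f}")
```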


Labs Differ w.r.t. Pr{r=0}

Lab found Crypto in all 60 spikes

Lab found no Crypto in 5 of 76 spikes

Lab found no Crypto in 17 of 223 spikes

Lab found no Crypto in 4 of 22 spikes
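For comparison with the model's lab-specific Pr{r=0}, the raw fraction of spikes with no Crypto found can be computed from the counts quoted above. These are naive proportions, not the posterior estimates from Model 5, and a zero count can also arise by chance when recovery is low but nonzero.

```python
# (spikes with no Crypto found, total spikes) for the four labs quoted on the slide.
zero_counts = [(0, 60), (5, 76), (17, 223), (4, 22)]

raw_fractions = [zeros / spikes for zeros, spikes in zero_counts]

for (zeros, spikes), frac in zip(zero_counts, raw_fractions):
    print(f"{zeros:3d} of {spikes:3d} spikes with no Crypto -> {100 * frac:.1f}%")
```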


Informing the Occurrence Model

  • Okay, so what good is all this?

  • We can use the MCMC sample to inform our upcoming estimate of the benefit of the Long Term 2 Enhanced Surface Water Treatment Rule (LT2).

    • Public water systems are monitoring their source waters for Crypto.

    • The new Crypto data, together with a model that accounts for lab-specific recovery, will produce better estimates of actual occurrence.

    • Better occurrence estimates → better risk analyses → improved estimate of the benefit of treatment changes that result from LT2 implementation.
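One way recovery draws could feed an occurrence estimate is sketched below. It assumes a Poisson count with mean concentration × volume × recovery and averages the likelihood over the lab's recovery draws; this is an illustrative simplification, not the occurrence model described in the presentation, and all numbers are hypothetical.

```python
import math


def poisson_log_pmf(k, lam):
    """log P(K = k) for K ~ Poisson(lam), with the lam = 0 edge case."""
    if lam <= 0.0:
        return 0.0 if k == 0 else float("-inf")
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def marginal_likelihood(conc, k, volume_l, recovery_draws):
    """P(count = k | concentration), averaged over recovery draws."""
    return sum(math.exp(poisson_log_pmf(k, conc * volume_l * r))
               for r in recovery_draws) / len(recovery_draws)


def best_concentration(k, volume_l, recovery_draws, grid):
    """Grid value of concentration (oocysts/L) with the highest likelihood."""
    return max(grid, key=lambda c: marginal_likelihood(c, k, volume_l, recovery_draws))


# Hypothetical field sample: 4 oocysts counted in 10 L.
# Hypothetical recovery draws for the lab: 40% recovery, with a 10% chance of zero.
recovery_draws = [0.4] * 9 + [0.0]
grid = [i / 100 for i in range(1, 301)]
naive = 4 / 10.0
adjusted = best_concentration(4, 10.0, recovery_draws, grid)
print(f"naive: {naive} oocysts/L, recovery-adjusted: {adjusted} oocysts/L")
```

Ignoring recovery understates the concentration here by the factor 1/0.4, which is the kind of bias that lab-specific recovery information is meant to remove.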


The funny thing about hierarchical models…

…is that, once you’ve tried one (and succeeded), you’ll see hierarchical models everywhere…

…which makes you wonder if you’re like that fellow with a hammer, to whom every problem looks like a nail.

Hierarchical modeling: Try it, you’ll like it.