
How Many Samples do I Need? Part 1


DQO Training Course

Day 1

Module 4

How Many Samples do I Need? Part 1

Presenter: Sebastian Tindall

60 minutes

(15 minute 1st Afternoon Break)

- How many samples, based on:
  - Census
  - Sampling
- Types of decision error
- Definitions of common statistical terms

Quick & Dirty Method

n = 5

Budget Method

n = (total $) / ($ per sample)
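The budget method amounts to dividing the money available by the cost of one sample and rounding down. A minimal sketch, assuming a hypothetical helper name `budget_sample_count` and a made-up total budget; only the idea of "total $ divided by $ per sample" comes from the slide:

```python
import math


def budget_sample_count(total_budget: float, cost_per_sample: float) -> int:
    """Whole number of samples the budget can pay for: n = total $ / $ per sample."""
    if cost_per_sample <= 0:
        raise ValueError("cost_per_sample must be positive")
    return math.floor(total_budget / cost_per_sample)


# Hypothetical figures, chosen only to illustrate the arithmetic.
n = budget_sample_count(total_budget=30_000, cost_per_sample=3_000)
print(n)
```

Note that this n depends only on money. It says nothing about the decision, the tolerance for mistakes, or the variation in the material being sampled.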

How will the data be used?

It depends!

What is the decision?

What is the tolerance for mistakes?

What is the underlying variation in the material being sampled?

(The Real Answer)

Just Enough!

REMEMBER: HETEROGENEITY IS THE RULE!

- Requires knowing the “true condition” of the population in question
  - Perform a census
    - Collect and analyze every possible member of the population in question
- Population
  - Universe of items (elements) within the spatial boundary
    - All the possible soil samples in the Smiths’ backyard
    - All the people in the U.S.A.
- Translation: you have to count/measure (sample) EVERY single member of the population

[Figure: a One-Acre field and a Football Field]

How many surface soil samples can I take from a one-acre field?

A one-acre field measures 272.25 feet by 160 feet.

If one surface soil sample = 2.5” x 2.5” x 6” deep, then there are approximately 1,000,000 possible surface soil samples in a one-acre field.
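The count of possible samples follows from dividing the field's surface area by the footprint of one sample. A short sketch of that arithmetic, using the dimensions given on the slide; the function name `possible_surface_samples` is only illustrative:

```python
FIELD_LENGTH_FT = 272.25   # from the slide
FIELD_WIDTH_FT = 160.0     # from the slide
SAMPLE_SIDE_IN = 2.5       # each sample covers 2.5 in x 2.5 in of surface


def possible_surface_samples(length_ft, width_ft, sample_side_in):
    """Field surface area divided by the surface footprint of one sample."""
    area_sq_in = (length_ft * 12) * (width_ft * 12)   # convert feet to inches
    return area_sq_in / (sample_side_in ** 2)


count = possible_surface_samples(FIELD_LENGTH_FT, FIELD_WIDTH_FT, SAMPLE_SIDE_IN)
print(f"{count:,.0f}")   # a little over one million
```

The 6-inch depth does not enter the count because only one layer of surface samples is considered.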

How much would it cost to know the true condition of the one-acre field?

If it costs $3000 to test one surface soil sample, it would cost $3,000,000,000 to test all possible population units.
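The census cost is simply the number of population units multiplied by the cost of testing one unit. A tiny sketch with the slide's rounded figures; the function name `census_cost` is hypothetical:

```python
POPULATION_UNITS = 1_000_000   # possible surface soil samples, rounded as on the slide
COST_PER_SAMPLE = 3_000        # dollars to test one sample, from the slide


def census_cost(population_units: int, cost_per_sample: int) -> int:
    """Cost of testing every possible population unit."""
    return population_units * cost_per_sample


total = census_cost(POPULATION_UNITS, COST_PER_SAMPLE)
print(f"${total:,}")
```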

CENSUS

- Testing all possible population units (samples) is the ONLY way to know the true condition of the site with absolute certainty
- However, time and money considerations usually prevent us from doing this

- Perform a census: totally impractical

- Therefore, we can never make a decision with absolute certainty
- So what’s left to do?

ESTIMATION

- Estimates of the true condition of the site are usually made from a few (representative) samples
- Taking a few samples (making a few measurements) and using them to represent the site
- Make inferences (even sweeping claims) about the population of interest based on these few samples

- An estimate is just an educated guess based on incomplete information
- Educated guesses will be wrong, to some degree
- In other words, the process of estimation contains inherent errors

Estimation Errors

- Are unavoidable!
- Are NOT mistakes. They do not suggest that anything was done improperly
- Are an inherent part of the process of estimation
- Are simply deviations from the true condition of the site
- Introduce uncertainty into the decision-making process

Decision Errors

- Decision errors are true mistakes
- Examples:
- Walking away from a dirty site
- Cleaning up a clean site

- Decision errors can be managed

- Decision errors are acceptable or tolerable, within limits
- We set tolerable limits on the percentage of time we are willing to:
- Walk away from a dirty site
- Clean up a clean site

[Figure: Planning, Sampling, Analysis; Data vs. Decision]

Population

Everyone or everything of interest

Example: All the people in this class

Sample

Some subset of the population

Example: Five people randomly chosen from the class

Population Parameter

The true value of the population characteristic (e.g., age) that can only be known if all possible samples are measured

Example: true mean age of all the people in the class, calculated using data from every member of the population

Sample Statistic

The estimated value of the population characteristic that is calculated from sample data

Example: estimate of the true mean age of all people in the class, calculated using data from a subset (sample) of the population

Population Parameter

Represents “true condition” of the population

Decisions can be made with 100% certainty (0% uncertainty)

Sample Statistic

Represents “estimated condition” of the population

Decision cannot be made with 100% certainty

What is the true mean age in this class?

What is the estimated mean age in this class? Randomly select 5 ages.

What is the 2nd estimated mean age in this class? Randomly select 15 ages.

(See Computer Age Demo)
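The class-age exercise can be imitated with a few lines of code. This is only a sketch, not the course's Computer Age Demo: the class size, the age range, and the random seed are all made-up assumptions.

```python
import random
import statistics

rng = random.Random(4)   # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population: the ages of 30 people in a class.
class_ages = [rng.randint(22, 65) for _ in range(30)]

# Population parameter: uses every member of the population (a census).
true_mean = statistics.mean(class_ages)

# Sample statistics: estimates from randomly chosen subsets of the population.
sample_5 = rng.sample(class_ages, 5)
sample_15 = rng.sample(class_ages, 15)
estimate_5 = statistics.mean(sample_5)
estimate_15 = statistics.mean(sample_15)

print(f"true mean age:        {true_mean:.1f}")
print(f"estimate from n = 5:  {estimate_5:.1f}")
print(f"estimate from n = 15: {estimate_15:.1f}")
```

Each estimate will usually differ from the true mean. That difference is an estimation error in the sense used in this module, not a mistake.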

- In this case - where we are only interested in measuring a small group of people who are all in the same room at the same time - it is not too difficult to determine the true mean age with 100% certainty. But:
- What if some people failed to respond?
- What if some people “fudged” a little?
- What if some of the response forms got lost?

- Before we can talk about acceptable limits for making decision errors, we must first understand what correct decisions and decision errors look like and define some terms
- There are two types of correct decisions and two types of decision errors that can be made

[Figure: Graph of Perfect Decision Making. Curve: Ideal Decision Rule. Vertical axis: Chance of Deciding Site is Dirty (0.0 to 1.0). Horizontal axis: True Mean Ra-226 Concentration (Low to High), with the Action Level at 6 pCi/g.]

[Figure: Graph of Typical Decision Making. Curve: Typical Curve. Vertical axis: Chance of Deciding Site is Dirty (0.0 to 1.0). Horizontal axis: True Mean Ra-226 Concentration (Low to High), with the Action Level at 6 pCi/g.]

Null Hypothesis: The Site is dirty.

[Figure: Decision Performance Goal Diagram. Typical Curve with The Gray Region marked. Vertical axis: Probability of deciding that the site is dirty (0.0 to 1.0). Horizontal axis: True mean COPC Concentration, with the Lower Bound of Gray Region at 75 and the Action Level at 100. True State of Site: Site is clean / Site is dirty.]

Decision-Making Procedure: Apply Decision Rule

PSQ: Is Site clean? Is Site dirty?

[Figure: 95% UCL COPC Concentration scale from DL to ∞. Plotted labels: Action Level, X A, UCL 1A, UCL 1B. Plotted values: 75, 95, 100, 110. Alternative Actions: Walk away from site / Clean up site.]

Decision-Making Procedure: Apply Decision Rule

PSQ: Is Site clean? Is Site dirty?

[Figure: 95% UCL COPC Concentration scale from DL to ∞. Plotted labels: Action Level, X B, UCL B. Plotted values: 100, 110, 120. Alternative Actions: Walk away from site / Clean up site.]

Decision-Making Procedure: Apply Decision Rule

PSQ: Is Site clean? Is Site dirty?

Conclusion: Site is dirty.

Action: Clean up a dirty site. A correct decision.

[Figure: 95% UCL COPC Concentration scale from DL to ∞ with the Action Level at 100, marking the True Mean, the Sample Mean UCL, and the Deviation. Alternative Actions: Walk away from site / Clean up site.]

Decision-Making Procedure: Apply Decision Rule

PSQ: Is Site clean? Is Site dirty?

Conclusion: Site is clean.

Action: Walk away from a dirty site. An incorrect decision.

[Figure: 95% UCL COPC Concentration scale from DL to ∞ with the Action Level at 100, marking the True Mean, the Sample Mean UCL, and the Deviation. Alternative Actions: Walk away from site / Clean up site.]

Decision-Making Procedure: Apply Decision Rule

PSQ: Is Site clean? Is Site dirty?

Conclusion: Site is clean.

Action: Walk away from a clean site. A correct decision.

[Figure: 95% UCL COPC Concentration scale from DL to ∞ with the Action Level at 100, marking the True Mean, the Sample Mean UCL, and the Deviation. Alternative Actions: Walk away from site / Clean up site.]

Decision-Making Procedure: Apply Decision Rule

PSQ: Is Site clean? Is Site dirty?

Conclusion: Site is dirty.

Action: Clean up a clean site. An incorrect decision.

[Figure: 95% UCL COPC Concentration scale from DL to ∞ with the Action Level at 100, marking the True Mean, the Sample Mean UCL, and the Deviation. Alternative Actions: Walk away from site / Clean up site.]
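The four outcomes above (two correct decisions, two decision errors) can be sketched in code. This is an illustration under stated assumptions, not the course's procedure: the UCL uses a normal approximation (z = 1.645) rather than a t-based limit, the site is treated as dirty when the true mean is at or above the Action Level, and the measurements and the function names `ucl95` and `classify` are hypothetical.

```python
import math
import statistics

Z_95_ONE_SIDED = 1.645  # assumption: normal approximation for a one-sided 95% limit


def ucl95(measurements):
    """Approximate one-sided 95% upper confidence limit (UCL) on the mean."""
    std_err = statistics.stdev(measurements) / math.sqrt(len(measurements))
    return statistics.mean(measurements) + Z_95_ONE_SIDED * std_err


def classify(true_mean, sample_ucl, action_level):
    """Label the outcome of applying the decision rule to one site."""
    site_is_dirty = true_mean >= action_level   # true state, never known in practice
    decide_dirty = sample_ucl >= action_level   # conclusion drawn from the sample
    if decide_dirty and site_is_dirty:
        return "Clean up a dirty site: correct decision"
    if not decide_dirty and site_is_dirty:
        return "Walk away from a dirty site: incorrect decision"
    if not decide_dirty and not site_is_dirty:
        return "Walk away from a clean site: correct decision"
    return "Clean up a clean site: incorrect decision"


ACTION_LEVEL = 100              # value used on the slides
measurements = [90, 100, 110]   # hypothetical COPC concentrations
ucl = ucl95(measurements)
print(round(ucl, 1))
print(classify(true_mean=80, sample_ucl=ucl, action_level=ACTION_LEVEL))
```

In practice only the sample UCL is available. The true mean is passed in here purely to show which of the four outcomes a given combination produces.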

The Gray Region

Null Hypothesis: The Site is dirty.

When the True Mean is well above the Action Level, then there should be a high probability that the Sample Mean UCL will also be above the Action Level, and it is highly likely that we will correctly decide to clean up a dirty site.

[Figure: Vertical axis: Probability of deciding that the True Mean is greater than or equal to the Action Level (0.0 to 1.0). Horizontal axis: True mean COPC Concentration, with the Lower Bound of Gray Region at 75 and the Action Level at 100. True State of Site: Site is clean / Site is dirty. Markers: True Mean, Sample Mean UCL, Deviation. Alternative Actions: Walk away from site / Clean up site.]

The Gray Region

Null Hypothesis: The Site is dirty.

If the True Mean is well below the Lower Bound of the Gray Region, then there should be a very low probability that the Sample Mean UCL will be above the Action Level, and it is highly unlikely that we will incorrectly decide to clean up a clean site.

[Figure: Vertical axis: Probability of deciding that the site is dirty (0.0 to 1.0). Horizontal axis: True mean COPC Concentration, with the Lower Bound of Gray Region at 75 and the Action Level at 100. True State of Site: Site is clean / Site is dirty. Markers: True Mean, Sample Mean UCL, Deviation. Alternative Actions: Walk away from site / Clean up site.]

The Gray Region

Null Hypothesis: The Site is dirty.

When the True Mean is IN the gray region, then there is an increased probability that the Sample Mean UCL will be above the Action Level, and we agree to accept that we may incorrectly decide to clean up a clean site.

[Figure: Vertical axis: Probability of deciding that the site is dirty (0.0 to 1.0). Horizontal axis: True mean COPC Concentration, with the Lower Bound of Gray Region at 75 and the Action Level at 100. True State of Site: Site is clean / Site is dirty. Markers: True Mean, Sample Mean UCL, Deviation. Alternative Actions: Walk away from site / Clean up site.]

Null Hypothesis: The Site is dirty.

[Figure: Decision Performance Goal Diagram. Typical Curve with The Gray Region marked. Vertical axis: Probability of deciding that the site is dirty (0.0 to 1.0). Horizontal axis: True mean COPC Concentration, with the Lower Bound of Gray Region at 75 and the Action Level at 100. True State of Site: Site is clean / Site is dirty. Alternative Actions: Walk away from site / Clean up site.]
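The shape of such a curve can be sketched numerically. The following is a simplified illustration, not the method used in the course: it assumes normally distributed measurements with a known standard deviation, a decision rule of "decide dirty when the 95% UCL of the sample mean is at or above the Action Level", and made-up values for the standard deviation and number of samples. Only the Action Level (100) and the Lower Bound of the Gray Region (75) come from the slides.

```python
import math

Z_95 = 1.645   # one-sided 95% normal quantile (normal approximation)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_decide_dirty(true_mean, action_level, sigma, n):
    """P(sample mean + Z_95 * standard error >= action level) for a given true mean."""
    std_err = sigma / math.sqrt(n)
    return normal_cdf((true_mean - action_level) / std_err + Z_95)


ACTION_LEVEL = 100   # from the slides
SIGMA = 30           # hypothetical standard deviation of the measurements
N_SAMPLES = 9        # hypothetical number of samples

for true_mean in (50, 75, 90, 100, 125):
    p = prob_decide_dirty(true_mean, ACTION_LEVEL, SIGMA, N_SAMPLES)
    print(f"true mean {true_mean:>3}: P(decide dirty) = {p:.3f}")
```

Under these assumptions the probability is near 0 well below the gray region, near 1 well above the Action Level, and in between inside the gray region. Raising `N_SAMPLES` steepens the curve, which is one way of seeing why the number of samples depends on the tolerance for decision errors.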

[Figure: Managing Uncertainty is a Balancing Act. Alternative Actions: Walk away from site / Clean up site. Labels: Unnecessary Disposal and/or Cleanup Cost; Threat to Public Health and Environment; Sampling and Analyses Cost (shown twice); PRP 1 Focus; Regulatory 1 Focus.]

- We will never know the true condition of the site - time and money prevent this
- Therefore we must estimate the true condition through sampling
- Estimates based on samples are not factual statements about the site. They are educated guesses
- Estimates must be in error - because they use incomplete information

- Errors are not mistakes - just deviations from the truth
- Errors (deviations) introduce uncertainty into the decision-making process
- Errors and uncertainty can be managed so that you can still get the job done and prove that you did it

- The DQO Process is designed to help you manage uncertainty and:
- Get the job done efficiently
- Prove that you did it defensibly

Managing uncertainty through systematic planning.

“FAILING TO PLAN... IS PLANNING TO FAIL”

REMEMBER: HETEROGENEITY IS THE RULE!

End of Module 4

Thank you

Summary of Parts 1, 2, 3 will be at the end of Module 6

Questions?

We will now take a 15 minute break.

Please be back in 15 minutes.