


### Bayesian Statistics: Asking the “Right” Questions

Michael L. Raymer, Ph.D.

Statistical Games

“The defendant’s DNA is consistent with the evidentiary sample, and the defendant’s DNA type occurs with a frequency of one in 10,000,000,000.”

“Only about 0.1% of wife batterers actually murder their wives. Therefore, evidence of abuse and battering should not be admissible in a murder trial.”

M. Raymer – WSU, FBS

• “Given the evidentiary DNA type and the defendant’s DNA type, what is the probability that the evidence sample contains the defendant’s DNA?”

• Information available:

• How common is each allele in a particular population?

• CPI, RMP etc.


• Suppose the rate of breast cancer is 1%

• Mammograms detect breast cancer in 80% of cases where it is present

• 10% of the time, mammograms will indicate breast cancer in a healthy patient

• If a woman has a positive mammogram result, what is the probability that she has breast cancer?


• 75% -- 3

• 50% -- 1

• 25% -- 2

• <10% -- a lot


• Counting all possible outcomes

• If you flip a coin 4 times, what is the probability that you will get heads twice?

• TTTT THTT HTTT HHTT

• TTTH THTH HTTH HHTH

• TTHT THHT HTHT HHHT

• TTHH THHH HTHH HHHH

• P(2 heads) = 6/16 = 0.375
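The counting argument above can be checked by brute force. This is a minimal sketch that enumerates all 16 sequences of four flips and counts those with exactly two heads; the variable names are illustrative.

```python
from itertools import product

# Enumerate every sequence of 4 flips: 2**4 = 16 equally likely outcomes.
outcomes = ["".join(flips) for flips in product("HT", repeat=4)]

# Keep only the sequences with exactly two heads.
two_heads = [o for o in outcomes if o.count("H") == 2]

p_two_heads = len(two_heads) / len(outcomes)
print(len(two_heads), len(outcomes), p_two_heads)  # prints: 6 16 0.375
```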


• Frequency and Probability

• We can guess at probabilities by counting frequencies:

• The law of large numbers: the more samples we take, the closer the observed frequency gets to the true probability (0.5 for a fair coin).
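A small simulation can illustrate this. The sketch below assumes a fair coin and uses a fixed seed (an arbitrary choice) so that runs are repeatable; `heads_frequency` is a hypothetical helper name.

```python
import random

def heads_frequency(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the fraction that came up heads."""
    rng = random.Random(seed)  # fixed seed: an assumption for repeatability
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The observed frequency tends to settle near 0.5 as the sample grows.
for n in (10, 100, 10_000, 100_000):
    print(n, heads_frequency(n))
```

Small samples can land far from 0.5; large samples rarely do.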


• Counting frequencies gives us distributions

• Gaussian distribution (continuous)

• Binomial distribution (discrete)


• Parametric

• Assume a Gaussian (e.g.) distribution.

• Estimate the parameters (μ, σ).

• Non-parametric

• Histogram sampling

• Bin size is critical

• Gaussian smoothing can help
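The two approaches can be compared on the same data. The sketch below uses a made-up sample (500 values drawn from a Gaussian with mean 3.0 and standard deviation 0.5, both hypothetical) and a hypothetical `histogram` helper; it is meant only to show how the bin width changes the non-parametric estimate.

```python
import random
import statistics
from collections import Counter

rng = random.Random(1)
# Hypothetical sample: 500 measurements from a Gaussian (mean 3.0, sd 0.5).
sample = [rng.gauss(3.0, 0.5) for _ in range(500)]

# Parametric: assume a Gaussian and estimate its two parameters.
mu = statistics.mean(sample)
sigma = statistics.stdev(sample)

# Non-parametric: a histogram whose shape depends on the chosen bin width.
def histogram(data, bin_width):
    """Return {bin lower edge: fraction of samples falling in that bin}."""
    counts = Counter(int(x // bin_width) for x in data)
    return {round(b * bin_width, 10): c / len(data) for b, c in sorted(counts.items())}

coarse = histogram(sample, 1.0)   # few wide bins: smooth but blurry
fine = histogram(sample, 0.1)     # many narrow bins: detailed but noisy
print(round(mu, 2), round(sigma, 2), len(coarse), len(fine))
```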


• Non-overlapping outcomes: P(A or B) = P(A) + P(B)

• Possible overlap: P(A or B) = P(A) + P(B) − P(A & B)

• Independent events: P(A & B) = P(A) × P(B)

The Product Rule


• P(Engine > 200 H.P.) = 0.2

• P(Color = red) = 0.3

• Assuming independence:

• P(Red & Fast) = 0.2 × 0.3 = 0.06

• 1/4 × 1/10 × 1/6 × 1/8 × 1/5 ≈ 1/10,000
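The product rule arithmetic above can be reproduced exactly with fractions. This is a minimal sketch; it is only valid under the stated independence assumption, and the variable names are illustrative.

```python
from fractions import Fraction
from math import prod

# Red & fast example: 0.2 * 0.3, assuming independence.
p_red_and_fast = Fraction(2, 10) * Fraction(3, 10)

# Five independent features with the frequencies listed above.
feature_freqs = [Fraction(1, 4), Fraction(1, 10), Fraction(1, 6),
                 Fraction(1, 8), Fraction(1, 5)]
p_all = prod(feature_freqs)

print(p_red_and_fast, p_all)  # prints: 3/50 1/9600
```

The exact product is 1/9,600, which rounds to the 1/10,000 quoted above.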


• One variable:

A ring was found at the scene of the crime. The ring is size 11. The defendant’s ring size is also 11. If a random ring were left at the crime scene, what is the probability that it would have been size 11?


• Assume independence:

• Note what happens to significant digits!

The ring is size 11, and also made of platinum.


• If a fruit has a diameter of 4”, how likely is it to be an apple?

[Figure: the set of 4” fruit and the set of apples.]


Given an apple, what is the probability that it will have a diameter of 4”?

Given a 4” diameter fruit, what is the probability that it is an apple?
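The two questions share a numerator but divide by different totals. The sketch below uses entirely hypothetical counts for 1,000 pieces of fruit to show that the answers need not agree.

```python
# Hypothetical counts for 1,000 pieces of fruit (illustrative only).
counts = {
    ("apple", "4in"): 20,
    ("apple", "other"): 180,
    ("not apple", "4in"): 80,
    ("not apple", "other"): 720,
}

apples = sum(n for (kind, _), n in counts.items() if kind == "apple")
four_inch = sum(n for (_, size), n in counts.items() if size == "4in")
both = counts[("apple", "4in")]

p_4in_given_apple = both / apples      # of all apples, how many are 4"?
p_apple_given_4in = both / four_inch   # of all 4" fruit, how many are apples?
print(p_4in_given_apple, p_apple_given_4in)
```

With these made-up counts the first probability is 20/200 and the second is 20/100, so swapping the condition doubles the answer.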


• Given alleles (17, 17), (19, 21), (14, 15.1), what is the probability that a DNA sample belongs to Bob?

• Of all (17, 17), (19, 21), (14, 15.1) individuals, how many are Bob?

• How common are 17, 19, 21, 14, and 15.1 in “the population”?


• For related events, we can express probability conditionally: P(A | B), the probability of A given B

• Statistical independence: P(A | B) = P(A)


• Terminology

• We have an object, and we want to decide if it belongs to a class

• Is this fruit a type of apple?

• Does this DNA come from a Caucasian American?

• Is this car a sports car?

• We measure features of the object (evidence):

• Size, weight, color

• Alleles at various loci


• Feature/Evidence Vector:

• Classes & Posterior Probability:


• You are given a fruit with a diameter of 4” – is it a pear or an apple?

• To begin, we need to know the distributions of diameters for pears and apples.


Class-Conditional Distributions

[Figure: class-conditional distributions P(x) of fruit diameter; horizontal axis from 1” to 6”.]


• We based this decision on P(diameter | fruit type)

(class conditional)

• What we really want to use is P(fruit type | diameter)

(posterior probability)

• What if we found the fruit in a pear orchard?

• We need to know the prior probability of finding an apple or a pear!


• Prior probability + Evidence → Posterior probability

• Without evidence, what is the “prior probability” that a fruit is an apple?

• What is the prior probability that a DNA sample comes from the defendant?


• Bayes Rule: P(class | evidence) = P(evidence | class) × P(class) / P(evidence)

or, in words: posterior = (class-conditional × prior) / evidence

• Is it an ordinary apple or an uncommon pear?
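One way to explore this question is to fix the class-conditional distributions and vary the prior. The sketch below assumes Gaussian diameter distributions with hypothetical parameters (apples: mean 3.0”, sd 0.5”; pears: mean 4.0”, sd 0.5”); none of these numbers come from the slides.

```python
from statistics import NormalDist

# Hypothetical class-conditional diameter distributions (inches).
apple_diam = NormalDist(mu=3.0, sigma=0.5)
pear_diam = NormalDist(mu=4.0, sigma=0.5)

def posterior_apple(diameter, prior_apple):
    """P(apple | diameter) for a two-class problem, via Bayes rule."""
    apple_score = apple_diam.pdf(diameter) * prior_apple
    pear_score = pear_diam.pdf(diameter) * (1 - prior_apple)
    return apple_score / (apple_score + pear_score)

# Same 4" fruit, three different settings.
for label, prior in [("apple orchard", 0.9), ("no preference", 0.5), ("pear orchard", 0.1)]:
    print(label, round(posterior_apple(4.0, prior), 3))
```

Under these assumptions a 4” fruit looks like a pear on the class-conditional evidence alone, yet a strong enough prior for apples tips the posterior the other way.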


• What are the classes?

• What is the evidence?

• What is the prior probability?

• What is the class-conditional probability?


• Suppose the rate of breast cancer is 1%

• Mammograms detect breast cancer in 80% of cases where it is present

• 10% of the time, mammograms will indicate breast cancer in a healthy patient

• If a woman has a positive mammogram result, what is the probability that she has breast cancer?


• Classes: healthy, cancer

• Evidence: positive mammogram (pos), negative mammogram (neg)

• If a woman has a positive mammogram result, what is the probability that she has breast cancer?


• Suppose we have 1000 women

• 10 will have breast cancer

• 8 of these will have a positive mammogram

• 990 will not have breast cancer

• 99 of these will have a positive mammogram

• Of the 107 women with a positive mammogram, 8 have breast cancer

• 8/107 ≈ 0.075 = 7.5%
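The same result follows from Bayes rule applied directly to the stated rates (1% prevalence, 80% detection, 10% false positives). The sketch below is one way to write it; `posterior` is a hypothetical helper for a two-class problem.

```python
def posterior(prior, p_evidence_given_class, p_evidence_given_other):
    """Bayes rule for two classes: P(class | evidence)."""
    numerator = p_evidence_given_class * prior
    evidence = numerator + p_evidence_given_other * (1 - prior)
    return numerator / evidence

p_cancer_given_pos = posterior(prior=0.01,
                               p_evidence_given_class=0.80,
                               p_evidence_given_other=0.10)

# Head count from the list above: 8 true positives among 8 + 99 positives.
p_from_counts = 8 / (8 + 99)

print(round(p_cancer_given_pos, 3), round(p_from_counts, 3))  # prints: 0.075 0.075
```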


• Suppose the chance of a randomly chosen person being guilty is .001

• When a person is guilty, a DNA sample will match that individual 99% of the time.

• 0.0001 of the time, a DNA sample will falsely match an innocent individual

• If a DNA test demonstrates a match, what is the probability of guilt?
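Plugging the three stated numbers into Bayes rule gives one possible worked answer. The derivation below treats "guilty" and "innocent" as the only two classes, which is an assumption of this sketch.

```latex
\begin{align*}
P(\text{guilty} \mid \text{match})
  &= \frac{P(\text{match} \mid \text{guilty})\,P(\text{guilty})}
          {P(\text{match} \mid \text{guilty})\,P(\text{guilty})
           + P(\text{match} \mid \text{innocent})\,P(\text{innocent})} \\
  &= \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.0001 \times 0.999} \\
  &= \frac{0.00099}{0.0010899} \approx 0.908
\end{align*}
```

Even with a false-match rate of one in 10,000, the posterior is about 91%, not 99.99%, because the prior probability of guilt is small.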


• Assuming independent features: P(x1, x2, …, xn | class) = P(x1 | class) × P(x2 | class) × … × P(xn | class)

• If we assume independence and use Bayes rule, we have a Naïve Bayes decision maker (classifier).


• Provably optimal when the features (evidence) are independent and follow Gaussian distributions.
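A Gaussian Naïve Bayes classifier can be sketched in a few lines. The example below is illustrative only: the two features (diameter in inches, weight in ounces), the class priors, and every distribution parameter are hypothetical.

```python
from math import prod
from statistics import NormalDist

# Hypothetical per-class models: a prior plus one Gaussian per feature
# (diameter in inches, weight in ounces).
MODELS = {
    "apple": {"prior": 0.5, "features": [NormalDist(3.0, 0.5), NormalDist(6.0, 1.0)]},
    "pear":  {"prior": 0.5, "features": [NormalDist(3.5, 0.5), NormalDist(7.0, 1.0)]},
}

def posteriors(x):
    """Return P(class | x) for each class, treating the features as independent."""
    scores = {
        name: m["prior"] * prod(dist.pdf(v) for dist, v in zip(m["features"], x))
        for name, m in MODELS.items()
    }
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

def classify(x):
    """Pick the class with the largest posterior probability."""
    post = posteriors(x)
    return max(post, key=post.get)

print(classify((3.0, 6.0)), classify((3.6, 7.2)))
```

The independence assumption shows up in the product over per-feature densities; each feature contributes its own factor.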


• Classes: DNA from defendant, DNA not from defendant

• Evidence: Allele matches at various loci

• Assumption of independence

• Prior Probabilities?

• Assumed equal (0.5)

• What is the true prior probability that an evidence sample came from a particular individual?
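The effect of the prior can be seen by holding the evidence fixed and changing only the prior. The sketch below reuses the match rates from the earlier DNA example (0.99 for a true source, 0.0001 for a false match) as assumptions; the three priors are hypothetical.

```python
def posterior_match(prior, p_match_if_source=0.99, p_match_if_not=0.0001):
    """P(sample came from this person | DNA match), for a given prior."""
    numerator = p_match_if_source * prior
    return numerator / (numerator + p_match_if_not * (1 - prior))

# Equal priors, one suspect in a thousand, one suspect in a million.
for prior in (0.5, 0.001, 1e-6):
    print(prior, round(posterior_match(prior), 4))
```

With a prior of 0.5 the posterior is nearly 1; with a prior of one in a million the same match evidence yields a posterior of about 1%.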


• When deciding between two possibilities, we don’t need the exact probabilities. We only need to know which one is greater.

• The denominator, P(evidence), is the same for every class.

• Can be eliminated

• Useful when there are many possible classes
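The point about the shared denominator can be checked numerically. In the sketch below the three classes, their priors, and their class-conditional probabilities are all hypothetical; the comparison is between ranking by numerator alone and ranking by the full posterior.

```python
# Hypothetical priors and class-conditional probabilities for one observation.
priors = {"apple": 0.6, "pear": 0.3, "quince": 0.1}
likelihoods = {"apple": 0.05, "pear": 0.20, "quince": 0.10}

# Numerators of Bayes rule: prior times class-conditional probability.
scores = {c: priors[c] * likelihoods[c] for c in priors}

# The shared denominator only rescales the scores; it cannot reorder them.
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

best_by_score = max(scores, key=scores.get)
best_by_posterior = max(posteriors, key=posteriors.get)
print(best_by_score, best_by_posterior)  # prints: pear pear
```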


• It is relatively easy to find the allele frequencies in the population

• Marginal probability distributions

• Independence assumption

• Class conditional probabilities

• Equal prior probabilities

• Bayesian posterior probability estimate


• The oldest citation:

T. Bayes. “An essay towards solving a problem in the doctrine of chances.” Phil. Trans. Roy. Soc., 53, 1763.
