Bayesian Statistics: Asking the “Right” Questions

Michael L. Raymer, Ph.D.

Statistical Games

“The defendant’s DNA is consistent with the evidentiary sample, and the defendant’s DNA type occurs with a frequency of one in 10,000,000,000.”

“Only about 0.1% of wife batterers actually murder their wives. Therefore, evidence of abuse and battering should not be admissible in a murder trial.”

M. Raymer – WSU, FBS

The Question
• “Given the evidentiary DNA type and the defendant’s DNA type, what is the probability that the evidence sample contains the defendant’s DNA?”
• Information available:
• How common is each allele in a particular population?
• CPI, RMP etc.

An Example Problem
• Suppose the rate of breast cancer is 1%
• Mammograms detect breast cancer in 80% of cases where it is present
• 10% of the time, mammograms will indicate breast cancer in a healthy patient
• If a woman has a positive mammogram result, what is the probability that she has breast cancer?

Results
• 75% -- 3
• 50% -- 1
• 25% -- 2
• <10% -- a lot

Determining Probabilities
• Counting all possible outcomes
• If you flip a coin 4 times, what is the probability that you will get heads twice?
• TTTT THTT HTTT HHTT
• TTTH THTH HTTH HHTH
• TTHT THHT HTHT HHHT
• TTHH THHH HTHH HHHH
• P(2 heads) = 6/16 = 0.375
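
The counting argument above can be checked by enumerating every outcome. This is a minimal sketch; the function name `probability_of_heads` is illustrative and not part of the original slides.

```python
from itertools import product

def probability_of_heads(num_flips: int, num_heads: int) -> float:
    """Count the outcomes with exactly num_heads heads among all 2**num_flips outcomes."""
    # Every possible sequence of H/T of the given length, e.g. ('H', 'T', 'T', 'H').
    outcomes = list(product("HT", repeat=num_flips))
    favorable = [o for o in outcomes if o.count("H") == num_heads]
    return len(favorable) / len(outcomes)

print(probability_of_heads(4, 2))  # 6 of 16 outcomes -> 0.375
```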

Statistical Preliminaries
• Frequency and Probability
• We can guess at probabilities by counting frequencies:
• The law of large numbers: the more coin flips we sample, the closer the observed frequency of heads will get to 0.5.
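
A small simulation can illustrate the law of large numbers for a fair coin. This is only a sketch: the function name, the seed, and the sample sizes are assumptions chosen for illustration.

```python
import random

def heads_frequency(num_flips: int, seed: int = 0) -> float:
    """Simulate num_flips fair coin flips and return the observed fraction of heads."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# The observed frequency tends to settle near 0.5 as the sample grows.
for n in (10, 100, 10_000, 100_000):
    print(n, heads_frequency(n))
```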

Distributions
• Counting frequencies gives us distributions
• Gaussian distribution (continuous)
• Binomial distribution (discrete)

Density Estimation
• Parametric
• Assume a Gaussian (e.g.) distribution.
• Estimate the parameters (μ, σ).
• Non-parametric
• Histogram sampling
• Bin size is critical
• Gaussian smoothing can help
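
The two approaches can be sketched side by side. The data here are synthetic (drawn from a Gaussian with made-up parameters), and the helper `histogram_density` is a hypothetical name; the point is only to show parameter estimation versus binning, and how the bin width enters the histogram estimate.

```python
import random
import statistics
from collections import Counter

# Synthetic sample: 1000 values from a Gaussian with mean 3.0 and sd 0.5 (made-up numbers).
rng = random.Random(1)
samples = [rng.gauss(3.0, 0.5) for _ in range(1000)]

# Parametric: assume a Gaussian and estimate its two parameters from the data.
mu = statistics.mean(samples)
sigma = statistics.stdev(samples)

def histogram_density(values, bin_width):
    """Non-parametric estimate: fraction of samples per bin, divided by the bin width."""
    counts = Counter(int(v // bin_width) for v in values)
    n = len(values)
    # Map each bin's left edge to its estimated density.
    return {b * bin_width: c / (n * bin_width) for b, c in sorted(counts.items())}

coarse = histogram_density(samples, 1.0)   # few wide bins: smooth but blurry
fine = histogram_density(samples, 0.05)    # many narrow bins: detailed but noisy
print(mu, sigma, len(coarse), len(fine))
```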

Combining Probabilities
• Non-overlapping outcomes: P(A or B) = P(A) + P(B)
• Possible overlap: P(A or B) = P(A) + P(B) − P(A and B)
• Independent events (the product rule): P(A and B) = P(A) × P(B)

Product Rule Example
• P(Engine > 200 H.P.) = 0.2
• P(Color = red) = 0.3
• Assuming independence:
• P(Red & Fast) = 0.2 × 0.3 = 0.06
• 1/4 × 1/10 × 1/6 × 1/8 × 1/5 ≈ 1/10,000
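
The arithmetic on this slide can be reproduced directly. This sketch uses exact fractions for the five-factor product so that the rounding to 1/10,000 is visible; the variable names are illustrative.

```python
from fractions import Fraction
from math import prod

# Two independent events: multiply their probabilities.
p_fast = 0.2
p_red = 0.3
p_red_and_fast = p_fast * p_red  # approximately 0.06

# Five independent frequencies from the slide, kept as exact fractions.
frequencies = [Fraction(1, 4), Fraction(1, 10), Fraction(1, 6), Fraction(1, 8), Fraction(1, 5)]
combined = prod(frequencies)

print(p_red_and_fast)
print(combined)  # 1/9600, which the slide rounds to about 1/10,000
```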

Statistical Decision Making
• One variable:

A ring was found at the scene of the crime. The ring is size 11. The defendant’s ring size is also 11. If a random ring were left at the crime scene, what is the probability that it would have been size 11?

Multiple Variables
• Assume independence:
• Note what happens to significant digits!

The ring is size 11, and also made of platinum.

Which Question?
• If a fruit has a diameter of 4”, how likely is it to be an apple?

[Figure labels: 4” Fruit, Apples]

“Inverting” the question

Given an apple, what is the probability that it will have a diameter of 4”?

Given a 4” diameter fruit, what is the probability that it is an apple?

Forensic DNA Evidence
• Given alleles (17, 17), (19, 21), (14, 15.1), what is the probability that a DNA sample belongs to Bob?
• Find all (17, 17), (19, 21), (14, 15.1) individuals: how many of them are Bob?
• How common are 17, 19, 21, 14, and 15.1 in “the population”?

Conditional Probabilities
• For related events, we can express probability conditionally: P(A | B) = P(A and B) / P(B)
• Statistical independence: P(A | B) = P(A)

Bayesian Decision Making
• Terminology
• We have an object, and we want to decide if it belongs to a class
• Is this fruit a type of apple?
• Does this DNA come from a Caucasian American?
• Is this car a sports car?
• We measure features of the object (evidence):
• Size, weight, color
• Alleles at various loci

Bayesian Notation
• Feature/Evidence Vector:
• Classes & Posterior Probability:

A Simple Example
• You are given a fruit with a diameter of 4”. Is it a pear or an apple?
• To begin, we need to know the distributions of diameters for pears and apples.

Maximum Likelihood

[Figure: class-conditional distributions P(x) over fruit diameters from 1” to 6”]

A Key Problem
• We based this decision on P(x | class) (class conditional)
• What we really want to use is P(class | x) (posterior probability)
• What if we found the fruit in a pear orchard?
• We need to know the prior probability of finding an apple or a pear!

Prior Probabilities
• Prior probability + Evidence → Posterior Probability
• Without evidence, what is the “prior probability” that a fruit is an apple?
• What is the prior probability that a DNA sample comes from the defendant?

The heart of it all
• Bayes Rule: P(class | evidence) = P(evidence | class) × P(class) / P(evidence)

Bayes Rule

or

Example Revisited
• Is it an ordinary apple or an uncommon pear?

Bayes Rule Example

Bayes Rule Example

Posing the question
• What are the classes?
• What is the evidence?
• What is the prior probability?
• What is the class-conditional probability?

An Example Problem
• Suppose the rate of breast cancer is 1%
• Mammograms detect breast cancer in 80% of cases where it is present
• 10% of the time, mammograms will indicate breast cancer in a healthy patient
• If a woman has a positive mammogram result, what is the probability that she has breast cancer?

Practice Problem Revisited
• Classes: healthy, cancer
• Evidence: positive mammogram (pos), negative mammogram (neg)
• If a woman has a positive mammogram result, what is the probability that she has breast cancer?

A Counting Argument
• Suppose we have 1000 women
• 10 will have breast cancer
• 8 of these will have a positive mammogram
• 990 will not have breast cancer
• 99 of these will have a positive mammogram
• Of the 107 women with a positive mammogram, 8 have breast cancer
• 8/107 ≈ 0.075 = 7.5%
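
The counting argument and Bayes rule should agree on this problem. The sketch below computes both from the numbers on the slides; the function name `posterior` and its parameter names are illustrative, and the two-class form of the denominator is the standard expansion rather than something shown in the transcript.

```python
def posterior(prior, p_evidence_given_class, p_evidence_given_other):
    """P(class | evidence) for a two-class problem, using Bayes rule."""
    numerator = p_evidence_given_class * prior
    # Total probability of the evidence across both classes.
    evidence = numerator + p_evidence_given_other * (1.0 - prior)
    return numerator / evidence

# Counting argument with 1000 women, as on the slide.
women = 1000
with_cancer = round(women * 0.01)                        # 10
true_positives = round(with_cancer * 0.80)               # 8
false_positives = round((women - with_cancer) * 0.10)    # 99
by_counting = true_positives / (true_positives + false_positives)

# Same question via Bayes rule: prior 1%, detection rate 80%, false-positive rate 10%.
by_bayes = posterior(0.01, 0.80, 0.10)

print(by_counting, by_bayes)  # both are 8/107, about 0.075
```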

Solution

An Example Problem
• Suppose the chance of a randomly chosen person being guilty is 0.001
• When a person is guilty, a DNA sample will match that individual 99% of the time.
• A DNA test will show a false match for an innocent individual 0.0001 of the time
• If a DNA test demonstrates a match, what is the probability of guilt?
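
Plugging the three numbers above into Bayes rule gives the probability of guilt given a match. This is a sketch of that calculation only; the function and variable names are illustrative.

```python
def posterior_guilt(prior_guilt, p_match_given_guilty, p_match_given_innocent):
    """P(guilty | match) from the prior and the two match probabilities."""
    guilty_and_match = p_match_given_guilty * prior_guilt
    innocent_and_match = p_match_given_innocent * (1.0 - prior_guilt)
    return guilty_and_match / (guilty_and_match + innocent_and_match)

p = posterior_guilt(prior_guilt=0.001,
                    p_match_given_guilty=0.99,
                    p_match_given_innocent=0.0001)
print(f"{p:.3f}")  # 0.908
```

Even with a false-match rate of 1 in 10,000, the posterior is about 91% rather than 99.99%, because the prior of 0.001 is small.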

Solution

Marginal Distributions

Combining Marginals
• Assuming independent features:
• If we assume independence and use Bayes rule, we have a Naïve Bayes decision maker (classifier).
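
A Naïve Bayes decision maker can be sketched in a few lines. Everything numeric here is hypothetical: the two features (diameter and weight), the Gaussian parameters, and the priors are made up for illustration, and modeling each marginal as a Gaussian is an assumption, not something the slides specify.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Hypothetical classes: a prior plus one (mu, sigma) pair per feature.
# Features are (diameter in inches, weight in ounces); all values are invented.
CLASSES = {
    "apple": {"prior": 0.7, "features": [(3.0, 0.5), (6.0, 1.0)]},
    "pear": {"prior": 0.3, "features": [(3.5, 0.6), (7.0, 1.2)]},
}

def naive_bayes_posteriors(x):
    """Return P(class | x) for each class, assuming the features are independent."""
    joint = {}
    for name, params in CLASSES.items():
        likelihood = 1.0
        # Independence assumption: multiply the per-feature marginal densities.
        for value, (mu, sigma) in zip(x, params["features"]):
            likelihood *= gaussian_pdf(value, mu, sigma)
        joint[name] = params["prior"] * likelihood
    evidence = sum(joint.values())  # shared denominator of Bayes rule
    return {name: j / evidence for name, j in joint.items()}

posteriors = naive_bayes_posteriors((4.0, 7.5))
print(max(posteriors, key=posteriors.get), posteriors)
```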

Bayes Decision Rule
• Provably optimum when the features (evidence) follow Gaussian distributions, and are independent.

Forensic DNA
• Classes: DNA from defendant, DNA not from defendant
• Evidence: Allele matches at various loci
• Assumption of independence
• Prior Probabilities?
• Assumed equal (0.5)
• What is the true prior probability that an evidence sample came from a particular individual?
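
To see how much the choice of prior matters, one can hold the evidence fixed and vary only the prior. In this sketch the match probabilities (0.99 and 0.0001) are borrowed from the earlier guilt example as an assumption, and the three priors are illustrative choices, not values from the slides.

```python
def posterior_from_prior(prior, p_match_given_source=0.99, p_match_given_other=0.0001):
    """P(sample came from defendant | match) as a function of the prior."""
    from_source = p_match_given_source * prior
    from_other = p_match_given_other * (1.0 - prior)
    return from_source / (from_source + from_other)

# Same evidence, three different priors (illustrative values).
results = {prior: posterior_from_prior(prior) for prior in (0.5, 0.001, 1e-6)}
for prior, post in results.items():
    print(f"prior={prior:g}  posterior={post:.4f}")
```

With an assumed equal prior of 0.5 the posterior is above 0.999, while with a prior of one in a million the same match yields a posterior below 0.01.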

The Importance of Priors

Likelihood Ratios
• When deciding between two possibilities, we don’t need the exact probabilities. We only need to know which one is greater.
• The denominator for all the classes is always equal.
• Can be eliminated
• Useful when there are many possible classes
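
Because the denominator is shared, comparing classes only needs prior × class-conditional probability. The sketch below uses invented numbers for a common apple versus an uncommon pear; the function name and all values are hypothetical.

```python
def decide(scores):
    """Pick the class with the largest prior * likelihood; no denominator needed."""
    return max(scores, key=scores.get)

# Invented numbers: a 4" fruit is more typical of pears, but apples are far more common.
prior = {"apple": 0.9, "pear": 0.1}
likelihood = {"apple": 0.1, "pear": 0.4}  # P(diameter = 4" | class), made up

scores = {name: prior[name] * likelihood[name] for name in prior}
ratio = scores["apple"] / scores["pear"]  # greater than 1 favors apple

print(decide(scores), ratio)
```

Here the ratio is about 2.25, so the decision is "apple" even though the class-conditional probability alone favors "pear".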

Likelihood Ratio Example

Likelihood Ratio Example

From alleles to identity:
• It is relatively easy to find the allele frequencies in the population
• Marginal probability distributions
• Independence assumption
• Class conditional probabilities
• Equal prior probabilities
• Bayesian posterior probability estimate

Thank you.
