Bayesian Statistics: Asking the “Right” Questions

Michael L. Raymer, Ph.D.


Statistical Games

“The defendant’s DNA is consistent with the evidentiary sample, and the defendant’s DNA type occurs with a frequency of one in 10,000,000,000.”

“Only about 0.1% of wife batterers actually murder their wives. Therefore, evidence of abuse and battering should not be admissible in a murder trial.”

M. Raymer – WSU, FBS


The Question

  • “Given the evidentiary DNA type and the defendant’s DNA type, what is the probability that the evidence sample contains the defendant’s DNA?”

  • Information available:

    • How common is each allele in a particular population?

    • CPI, RMP etc.

An Example Problem

  • Suppose the rate of breast cancer is 1%

  • Mammograms detect breast cancer in 80% of cases where it is present

  • 10% of the time, mammograms will indicate breast cancer in a healthy patient

  • If a woman has a positive mammogram result, what is the probability that she has breast cancer?

Results

  • 75% -- 3

  • 50% -- 1

  • 25% -- 2

  • <10% -- a lot

Determining Probabilities

  • Counting all possible outcomes

  • If you flip a coin 4 times, what is the probability that you will get heads twice?

    • TTTT THTT HTTT HHTT

    • TTTH THTH HTTH HHTH

    • TTHT THHT HTHT HHHT

    • TTHH THHH HTHH HHHH

  • P(2 heads) = 6/16 = 0.375
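
The counting above can be checked by brute force. The following is a minimal sketch that enumerates all 16 sequences of four flips and counts those with exactly two heads; the variable names are illustrative only.

```python
from itertools import product

# Enumerate every sequence of 4 flips; for a fair coin each is equally likely.
outcomes = ["".join(flips) for flips in product("HT", repeat=4)]
two_heads = [seq for seq in outcomes if seq.count("H") == 2]

p_two_heads = len(two_heads) / len(outcomes)
print(len(two_heads), len(outcomes), p_two_heads)  # 6 16 0.375
```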

Statistical Preliminaries

  • Frequency and Probability

    • We can guess at probabilities by counting frequencies:

      • P(heads) = 0.5

    • The law of large numbers: the more samples we take the closer we will get to 0.5.
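
A small simulation can illustrate the law of large numbers. This is a sketch only: the seed and the sample sizes are arbitrary choices, and `heads_frequency` is a hypothetical helper name.

```python
import random

def heads_frequency(n_flips, rng):
    """Fraction of heads observed in n_flips simulated fair-coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
for n in (10, 100, 10_000, 100_000):
    print(n, heads_frequency(n, rng))
```

The observed frequency tends to wander for small samples and settle near 0.5 as the number of flips grows.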

Distributions

  • Counting frequencies gives us distributions

Gaussian Distribution

(Continuous)

Binomial Distribution

(Discrete)

Density Estimation

  • Parametric

    • Assume a Gaussian (e.g.) distribution.

    • Estimate the parameters (,).

  • Non-parametric

    • Histogram sampling

    • Bin size is critical

    • Gaussian smoothing can help
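
The two approaches might be sketched as follows. The data here are synthetic (hypothetical apple diameters drawn from a Gaussian), and `histogram` is an illustrative helper, not part of any library.

```python
import random
import statistics
from collections import Counter

rng = random.Random(0)
# Hypothetical data: 1,000 apple diameters (inches), invented for illustration.
diameters = [rng.gauss(3.0, 0.5) for _ in range(1000)]

# Parametric: assume a Gaussian and estimate its two parameters.
mu = statistics.fmean(diameters)
sigma = statistics.stdev(diameters)
fitted = statistics.NormalDist(mu, sigma)
print(f"mu={mu:.2f} sigma={sigma:.2f} density at 4in={fitted.pdf(4.0):.3f}")

# Non-parametric: histogram counts; the bin width changes the picture.
def histogram(data, bin_width):
    """Map each bin's left edge to the fraction of samples in that bin."""
    counts = Counter((x // bin_width) * bin_width for x in data)
    return {edge: n / len(data) for edge, n in sorted(counts.items())}

coarse = histogram(diameters, 1.0)
fine = histogram(diameters, 0.1)
print(len(coarse), "coarse bins;", len(fine), "fine bins")
```

Wide bins hide the shape of the distribution, while narrow bins leave few samples per bin and give a noisy estimate.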

Combining Probabilities

  • Non-overlapping outcomes: P(A or B) = P(A) + P(B)

  • Possible Overlap: P(A or B) = P(A) + P(B) − P(A & B)

  • Independent Events: P(A & B) = P(A) × P(B)

The Product Rule
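
The standard rules for combining probabilities (add for non-overlapping outcomes, subtract the overlap when outcomes can coincide, multiply for independent events) can be checked on dice. This is a sketch; the events chosen and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Probability of an event (a predicate) over equally likely outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

die = list(range(1, 7))
two_dice = list(product(die, die))

# Non-overlapping outcomes: the probabilities add.
p_one_or_two = prob(lambda d: d in (1, 2), die)

# Possible overlap: subtract the part that would be counted twice.
p_even = prob(lambda d: d % 2 == 0, die)
p_high = prob(lambda d: d > 4, die)
p_even_and_high = prob(lambda d: d % 2 == 0 and d > 4, die)
p_even_or_high = p_even + p_high - p_even_and_high

# Independent events (two separate dice): the probabilities multiply.
p_both_six = prob(lambda roll: roll == (6, 6), two_dice)

print(p_one_or_two, p_even_or_high, p_both_six)  # 1/3 2/3 1/36
```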

Product Rule Example

  • P(Engine > 200 H.P.) = 0.2

  • P(Color = red) = 0.3

  • Assuming independence:

    • P(Red & Fast) = 0.2 × 0.3 = 0.06

  • 1/4 × 1/10 × 1/6 × 1/8 × 1/5 = 1/9,600 ≈ 1/10,000
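
Both products on this slide can be reproduced in a few lines. This sketch uses exact fractions for the five-factor product so that rounding to "about 1 in 10,000" is visible as a separate step.

```python
from fractions import Fraction
from math import prod

# Red & fast car, assuming the two features are independent.
p_fast, p_red = 0.2, 0.3
p_red_and_fast = p_fast * p_red
print(round(p_red_and_fast, 2))  # 0.06

# Five independent features with the frequencies listed on the slide.
frequencies = [Fraction(1, 4), Fraction(1, 10), Fraction(1, 6),
               Fraction(1, 8), Fraction(1, 5)]
combined = prod(frequencies)
print(combined)  # 1/9600
```

The product rule holds only under the independence assumption; correlated features would make the combined frequency larger than the product suggests.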

Statistical Decision Making

  • One variable:

A ring was found at the scene of the crime. The ring is size 11. The defendant’s ring size is also 11. If a random ring were left at the crime scene, what is the probability that it would have been size 11?

Multiple Variables

  • Assume independence:

    • Note what happens to significant digits!

The ring is size 11, and also made of platinum.

Which Question?

  • If a fruit has a diameter of 4”, how likely is it to be an apple?

4” Fruit

Apples

“Inverting” the question

Given an apple, what is the probability that it will have a diameter of 4”?

Given a 4” diameter fruit, what is the probability that it is an apple?
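
The two questions generally have different answers. A sketch with made-up counts shows why; the fruit types and numbers below are hypothetical and chosen only to make the arithmetic easy.

```python
# Hypothetical fruit counts, invented for illustration only.
counts = {
    ("apple", "4in"): 10,
    ("apple", "other"): 90,
    ("grapefruit", "4in"): 40,
    ("grapefruit", "other"): 10,
}

apples = sum(n for (fruit, _), n in counts.items() if fruit == "apple")
four_inch = sum(n for (_, size), n in counts.items() if size == "4in")
four_inch_apples = counts[("apple", "4in")]

p_4in_given_apple = four_inch_apples / apples      # 10 / 100
p_apple_given_4in = four_inch_apples / four_inch   # 10 / 50
print(p_4in_given_apple, p_apple_given_4in)  # 0.1 0.2
```

Both probabilities share the same numerator (4” apples) but divide by different totals: all apples in one case, all 4” fruit in the other.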

Forensic DNA Evidence

  • Given alleles (17, 17), (19, 21), (14, 15.1), what is the probability that a DNA sample belongs to Bob?

    • Find all (17,17), (19,21), (14,15.1) individuals, how many of them are Bob?

    • How common are 17, 19, 21, 14, and 15.1 in “the population”?

Conditional Probabilities

  • For related events, we can express probability conditionally: P(A | B) = P(A & B) / P(B)

  • Statistical Independence: P(A | B) = P(A)

Bayesian Decision Making

  • Terminology

    • We have an object, and we want to decide if it belongs to a class

      • Is this fruit a type of apple?

      • Does this DNA come from a Caucasian American?

      • Is this car a sports car?

    • We measure features of the object (evidence):

      • Size, weight, color

      • Alleles at various loci

Bayesian Notation

  • Feature/Evidence Vector:

  • Classes & Posterior Probability:

A Simple Example

  • You are given a fruit with a diameter of 4” – is it a pear or an apple?

  • To begin, we need to know the distributions of diameters for pears and apples.

Maximum Likelihood

Class-Conditional Distributions

[Figure: P(x) plotted against diameter, with axis ticks from 1” to 6”]
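
A maximum-likelihood decision picks the class whose class-conditional density is highest at the measured value and ignores priors. The sketch below assumes Gaussian diameter distributions with made-up parameters; the means and standard deviations are hypothetical, not taken from the slides.

```python
from statistics import NormalDist

# Hypothetical class-conditional diameter distributions (inches).
class_conditional = {
    "apple": NormalDist(mu=3.0, sigma=0.5),
    "pear": NormalDist(mu=4.0, sigma=0.4),
}

def max_likelihood_class(diameter):
    """Pick the class whose density is highest at this diameter (no priors)."""
    return max(class_conditional,
               key=lambda c: class_conditional[c].pdf(diameter))

print(max_likelihood_class(4.0))  # pear
print(max_likelihood_class(2.5))  # apple
```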

A Key Problem

  • We based this decision on P(evidence | class)

    (class conditional)

  • What we really want to use is P(class | evidence)

    (posterior probability)

  • What if we found the fruit in a pear orchard?

  • We need to know the prior probability of finding an apple or a pear!

Prior Probabilities

  • Prior probability + Evidence Posterior Probability

  • Without evidence, what is the “prior probability” that a fruit is an apple?

  • What is the prior probability that a DNA sample comes from the defendant?

The heart of it all

  • Bayes Rule: P(class | evidence) = P(evidence | class) × P(class) / P(evidence)

Bayes Rule

or

Example Revisited

  • Is it an ordinary apple or an uncommon pear?
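
Priors can overturn a decision based on class-conditional densities alone. In this sketch every number is hypothetical: Gaussian diameter distributions for each fruit, and a prior under which apples are far more common than pears.

```python
from statistics import NormalDist

# Hypothetical class-conditional distributions (inches) and priors.
likelihood = {"apple": NormalDist(3.0, 0.5), "pear": NormalDist(4.0, 0.4)}
prior = {"apple": 0.95, "pear": 0.05}

def posterior(diameter):
    """Bayes rule: P(class | x) = P(x | class) P(class) / P(x)."""
    joint = {c: likelihood[c].pdf(diameter) * prior[c] for c in prior}
    evidence = sum(joint.values())  # P(x), summed over all classes
    return {c: joint[c] / evidence for c in joint}

post = posterior(4.0)
print({c: round(p, 3) for c, p in post.items()})
```

With these assumed numbers, a 4” fruit sits at the peak of the pear distribution, yet the posterior favors apple (roughly 0.67) because apples are so much more common.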

Bayes Rule Example

Posing the question

  • What are the classes?

  • What is the evidence?

  • What is the prior probability?

  • What is the class-conditional probability?

An Example Problem

  • Suppose the rate of breast cancer is 1%

  • Mammograms detect breast cancer in 80% of cases where it is present

  • 10% of the time, mammograms will indicate breast cancer in a healthy patient

  • If a woman has a positive mammogram result, what is the probability that she has breast cancer?

Practice Problem Revisited

  • Classes: healthy, cancer

  • Evidence: positive mammogram (pos), negative mammogram (neg)

  • If a woman has a positive mammogram result, what is the probability that she has breast cancer?

A Counting Argument

  • Suppose we have 1000 women

    • 10 will have breast cancer

      • 8 of these will have a positive mammogram

    • 990 will not have breast cancer

      • 99 of these will have a positive mammogram

    • Of the 107 women with a positive mammogram, 8 have breast cancer

      • 8/107 0.075 = 7.5%

Solution

An Example Problem

  • Suppose the chance of a randomly chosen person being guilty is 0.001

  • When a person is guilty, a DNA sample will match that individual 99% of the time.

  • 0.0001 of the time, a DNA test will exhibit a false match for an innocent individual

  • If a DNA test demonstrates a match, what is the probability of guilt?
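
One way to work this problem is to apply Bayes rule to the three numbers given. The sketch below uses only the values stated in the problem; the variable names are illustrative.

```python
prior_guilty = 0.001
p_match_given_guilty = 0.99
p_match_given_innocent = 0.0001

# Total probability of a match, over guilty and innocent individuals.
p_match = (p_match_given_guilty * prior_guilty
           + p_match_given_innocent * (1 - prior_guilty))

p_guilty_given_match = p_match_given_guilty * prior_guilty / p_match
print(f"{p_guilty_given_match:.3f}")  # 0.908
```

Even with a false-match rate of 1 in 10,000, the probability of guilt given a match is about 91% under this prior, not 99.99%.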

Solution

Marginal Distributions

Combining Marginals

  • Assuming independent features:

  • If we assume independence and use Bayes rule, we have a Naïve Bayes decision maker (classifier).
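
A Naïve Bayes classifier can be sketched in a few lines: multiply the per-feature marginals for each class, weight by the prior, and normalize. All probabilities below are hypothetical, chosen to echo the sports-car example, and `naive_bayes_posterior` is an illustrative name.

```python
from math import prod

# Hypothetical per-feature (marginal) probabilities P(feature value | class).
marginals = {
    "sports car": {"color": {"red": 0.5, "other": 0.5},
                   "engine": {"over 200 hp": 0.8, "under 200 hp": 0.2}},
    "family car": {"color": {"red": 0.2, "other": 0.8},
                   "engine": {"over 200 hp": 0.1, "under 200 hp": 0.9}},
}
priors = {"sports car": 0.1, "family car": 0.9}  # also hypothetical

def naive_bayes_posterior(evidence):
    """Posterior over classes, multiplying marginals as if independent."""
    joint = {
        c: priors[c] * prod(marginals[c][f][v] for f, v in evidence.items())
        for c in priors
    }
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}

result = naive_bayes_posterior({"color": "red", "engine": "over 200 hp"})
print(result)
```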

Bayes Decision Rule

  • Provably optimal when the features (evidence) follow Gaussian distributions and are independent.

Forensic DNA

  • Classes: DNA from defendant, DNA not from defendant

  • Evidence: Allele matches at various loci

    • Assumption of independence

  • Prior Probabilities?

    • Assumed equal (0.5)

    • What is the true prior probability that an evidence sample came from a particular individual?
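
The effect of the assumed prior can be seen by holding the match probabilities fixed and varying only the prior. This sketch borrows the 0.99 and 0.0001 rates from the earlier example problem; the list of priors is an arbitrary choice for illustration.

```python
def posterior(prior, p_match_given_source=0.99, p_match_given_other=0.0001):
    """P(sample came from the defendant | match), for a given prior."""
    numerator = p_match_given_source * prior
    return numerator / (numerator + p_match_given_other * (1 - prior))

for prior in (0.5, 0.001, 1e-6):
    print(f"prior={prior:g}  posterior={posterior(prior):.4f}")
```

With an equal prior of 0.5 the posterior is nearly 1, while a prior of one in a million gives a posterior below 1% for the same evidence.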

The Importance of Priors

Likelihood Ratios

  • When deciding between two possibilities, we don’t need the exact probabilities. We only need to know which one is greater.

  • The denominator, P(evidence), is the same for every class.

    • Can be eliminated

    • Useful when there are many possible classes
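
Because the denominator cancels, a two-class decision needs only the ratio of likelihood times prior. The sketch below reuses the mammogram numbers from the example problem; `posterior_odds` is a hypothetical helper name.

```python
def posterior_odds(prior_a, prior_b, likelihood_a, likelihood_b):
    """Ratio P(A | x) / P(B | x); the shared denominator P(x) cancels."""
    return (likelihood_a * prior_a) / (likelihood_b * prior_b)

# Cancer vs. healthy, given a positive mammogram.
odds = posterior_odds(prior_a=0.01, prior_b=0.99,
                      likelihood_a=0.80, likelihood_b=0.10)
decision = "cancer" if odds > 1 else "healthy"
print(round(odds, 4), decision)
```

A ratio below 1 favors the second class. If the exact probability is needed, it can be recovered as odds / (1 + odds).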

Likelihood Ratio Example

From alleles to identity:

  • It is relatively easy to find the allele frequencies in the population

    • Marginal probability distributions

  • Independence assumption

    • Class conditional probabilities

  • Equal prior probabilities

    • Bayesian posterior probability estimate


Thank you.

A Key Advantage

  • The oldest citation:

T. Bayes. “An essay towards solving a problem in the doctrine of chances.” Phil. Trans. Roy. Soc., 53, 1763.
