## Bayesian Statistics: Asking the “Right” Questions

Michael L. Raymer, Ph.D.

**Statistical Games**

- “The defendant’s DNA is consistent with the evidentiary sample, and the defendant’s DNA type occurs with a frequency of one in 10,000,000,000.”
- “Only about 0.1% of wife batterers actually murder their wives. Therefore, evidence of abuse and battering should not be admissible in a murder trial.”

M. Raymer – WSU, FBS

**The Question**

- “Given the evidentiary DNA type and the defendant’s DNA type, what is the probability that the evidence sample contains the defendant’s DNA?”
- Information available:
  - How common is each allele in a particular population?
  - CPI, RMP, etc.

**An Example Problem**

- Suppose the rate of breast cancer is 1%.
- Mammograms detect breast cancer in 80% of cases where it is present.
- 10% of the time, mammograms will indicate breast cancer in a healthy patient.
- If a woman has a positive mammogram result, what is the probability that she has breast cancer?

**Results**

- 75%: 3
- 50%: 1
- 25%: 2
- <10%: a lot

**Determining Probabilities**

- Counting all possible outcomes
- If you flip a coin 4 times, what is the probability that you will get heads exactly twice?
  - TTTT THTT HTTT HHTT
  - TTTH THTH HTTH HHTH
  - TTHT THHT HTHT HHHT
  - TTHH THHH HTHH HHHH
- P(2 heads) = 6/16 = 0.375

**Statistical Preliminaries**

- Frequency and probability
- We can guess at probabilities by counting frequencies:
  - P(heads) = 0.5
- The law of large numbers: the more samples we take, the closer the observed frequency will get to 0.5.

**Distributions**

- Counting frequencies gives us distributions.

[Figure panels: Gaussian distribution (continuous); binomial distribution (discrete)]

**Density Estimation**

- Parametric
  - Assume a distribution (e.g., Gaussian).
  - Estimate the parameters (μ, σ).
- Non-parametric
  - Histogram sampling
  - Bin size is critical
  - Gaussian smoothing can help
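The coin-flip count in Determining Probabilities can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original slides; the function name `prob_exact_heads` is illustrative, and a fair coin is assumed so that all sequences are equally likely.

```python
from fractions import Fraction
from itertools import product


def prob_exact_heads(n_flips: int, n_heads: int) -> Fraction:
    """Probability of exactly n_heads heads in n_flips fair coin flips,
    found by enumerating every equally likely outcome."""
    outcomes = list(product("HT", repeat=n_flips))
    favorable = [o for o in outcomes if o.count("H") == n_heads]
    return Fraction(len(favorable), len(outcomes))


p = prob_exact_heads(4, 2)
print(p, float(p))  # 3/8 0.375
```

Enumeration only scales to small numbers of flips (the outcome count doubles with each flip), which is one reason the later slides move from counting to distributions.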
**Combining Probabilities**

- Non-overlapping outcomes: P(A or B) = P(A) + P(B)
- Possible overlap: P(A or B) = P(A) + P(B) − P(A & B)
- Independent events (the product rule): P(A & B) = P(A) × P(B)

**Product Rule Example**

- P(Engine > 200 H.P.) = 0.2
- P(Color = red) = 0.3
- Assuming independence:
  - P(Red & Fast) = 0.2 × 0.3 = 0.06
- 1/4 × 1/10 × 1/6 × 1/8 × 1/5 ≈ 1/10,000

**Statistical Decision Making**

- One variable: A ring was found at the scene of the crime. The ring is size 11. The defendant’s ring size is also 11.
- If a random ring were left at the crime scene, what is the probability that it would have been size 11?

**Multiple Variables**

- The ring is size 11, and also made of platinum.
- Assume independence and multiply the individual probabilities.
- Note what happens to significant digits!

**Which Question?**

- If a fruit has a diameter of 4”, how likely is it to be an apple?

[Figure labels: 4” Fruit; Apples]

**“Inverting” the question**

- Given an apple, what is the probability that it will have a diameter of 4”?
- Given a 4” diameter fruit, what is the probability that it is an apple?

**Forensic DNA Evidence**

- Given alleles (17, 17), (19, 21), (14, 15.1), what is the probability that a DNA sample belongs to Bob?
- Find all (17, 17), (19, 21), (14, 15.1) individuals. How many of them are Bob?
- How common are 17, 19, 21, 14, and 15.1 in “the population”?

**Conditional Probabilities**

- For related events, we can express probability conditionally: P(A | B) = P(A & B) / P(B)
- Statistical independence: P(A | B) = P(A)

**Bayesian Decision Making**

- Terminology
- We have an object, and we want to decide if it belongs to a class:
  - Is this fruit a type of apple?
  - Does this DNA come from a Caucasian American?
  - Is this car a sports car?
- We measure features of the object (evidence):
  - Size, weight, color
  - Alleles at various loci

**Bayesian Notation**

- Feature/evidence vector
- Classes and posterior probability
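The arithmetic in the Product Rule Example can be reproduced with exact fractions. This is a hedged sketch rather than anything from the slides: the helper name `joint_probability` is hypothetical, and the result is only meaningful under the independence assumption the slide states.

```python
from fractions import Fraction
from math import prod


def joint_probability(marginals):
    """Product rule: multiply marginal probabilities.

    Only valid under the assumption that the events are independent.
    """
    return prod(marginals, start=Fraction(1))


# Red and fast car, using the slide's numbers (0.2 and 0.3).
red_and_fast = joint_probability([Fraction(2, 10), Fraction(3, 10)])
print(float(red_and_fast))  # 0.06

# Five independent features with the frequencies from the slide.
five_features = joint_probability(
    [Fraction(1, 4), Fraction(1, 10), Fraction(1, 6), Fraction(1, 8), Fraction(1, 5)]
)
print(five_features)  # 1/9600, which the slide rounds to about 1/10,000
```

Exact fractions make the rounding visible: the product is 1/9600, so quoting it as 1/10,000 keeps only one significant digit, which is the point the Multiple Variables slide raises.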
**A Simple Example**

- You are given a fruit with a diameter of 4”. Is it a pear or an apple?
- To begin, we need to know the distributions of diameters for pears and apples.

**Maximum Likelihood**

[Figure: class-conditional distributions, P(x) plotted against diameter from 1” to 6”]

**A Key Problem**

- We based this decision on the class-conditional probability.
- What we really want to use is the posterior probability.
- What if we found the fruit in a pear orchard?
- We need to know the prior probability of finding an apple or a pear!

**Prior Probabilities**

- Prior probability + evidence → posterior probability
- Without evidence, what is the “prior probability” that a fruit is an apple?
- What is the prior probability that a DNA sample comes from the defendant?

**The heart of it all**

- Bayes Rule

**Bayes Rule**

P(class | evidence) = P(evidence | class) × P(class) / P(evidence)

or

posterior = class-conditional probability × prior / P(evidence)

**Example Revisited**

- Is it an ordinary apple or an uncommon pear?

**Bayes Rule Example**

**Posing the question**

- What are the classes?
- What is the evidence?
- What is the prior probability?
- What is the class-conditional probability?

**An Example Problem**

- Suppose the rate of breast cancer is 1%.
- Mammograms detect breast cancer in 80% of cases where it is present.
- 10% of the time, mammograms will indicate breast cancer in a healthy patient.
- If a woman has a positive mammogram result, what is the probability that she has breast cancer?

**Practice Problem Revisited**

- Classes: healthy, cancer
- Evidence: positive mammogram (pos), negative mammogram (neg)
- If a woman has a positive mammogram result, what is the probability that she has breast cancer?
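Plugging the mammogram numbers into Bayes rule answers the practice problem directly. The sketch below is an illustration under the problem's stated rates, not a reproduction of the original solution slide; the function name `posterior` and its parameter names are assumptions made here, and it handles only the two-class case.

```python
from fractions import Fraction


def posterior(prior, p_evidence_given_class, p_evidence_given_other):
    """Bayes rule for two classes:
    P(class | evidence) = P(evidence | class) * P(class) / P(evidence),
    where P(evidence) sums over both classes.
    """
    numerator = p_evidence_given_class * prior
    evidence = numerator + p_evidence_given_other * (1 - prior)
    return numerator / evidence


p_cancer_given_pos = posterior(
    prior=Fraction(1, 100),                    # P(cancer): 1% base rate
    p_evidence_given_class=Fraction(80, 100),  # P(pos | cancer)
    p_evidence_given_other=Fraction(10, 100),  # P(pos | healthy)
)
print(p_cancer_given_pos, round(float(p_cancer_given_pos), 3))  # 8/107 0.075
```

The low prior dominates: even with an 80% detection rate, false positives among the 99% of healthy patients outnumber true positives by more than ten to one.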
**A Counting Argument**

- Suppose we have 1000 women
  - 10 will have breast cancer
    - 8 of these will have a positive mammogram
  - 990 will not have breast cancer
    - 99 of these will have a positive mammogram
- Of the 107 women with a positive mammogram, 8 have breast cancer
- 8/107 ≈ 0.075 = 7.5%

**Solution**

**An Example Problem**

- Suppose the chance of a randomly chosen person being guilty is 0.001.
- When a person is guilty, a DNA sample will match that individual 99% of the time.
- 0.0001 of the time, a DNA sample will exhibit a false match for an innocent individual.
- If a DNA test demonstrates a match, what is the probability of guilt?

**Solution**

**Marginal Distributions**

**Combining Marginals**

- Assuming independent features, the class-conditional probability of the full evidence vector is the product of the per-feature marginal probabilities.
- If we assume independence and use Bayes rule, we have a Naïve Bayes decision maker (classifier).

**Bayes Decision Rule**

- Provably optimal when the features (evidence) follow Gaussian distributions and are independent.

**Forensic DNA**

- Classes: DNA from defendant, DNA not from defendant
- Evidence: allele matches at various loci
  - Assumption of independence
- Prior probabilities?
  - Assumed equal (0.5)
  - What is the true prior probability that an evidence sample came from a particular individual?

**The Importance of Priors**

**Likelihood Ratios**

- When deciding between two possibilities, we don’t need the exact probabilities. We only need to know which one is greater.
- The denominator for all the classes is always equal.
  - Can be eliminated
  - Useful when there are many possible classes

**Likelihood Ratio Example**
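The DNA example, the likelihood-ratio idea, and the question about priors can be tied together numerically. This is a sketch under the problem's stated numbers, not the content of the original solution slides; the function name `posterior_from_odds` is hypothetical, and it uses the odds form of Bayes rule (posterior odds = prior odds × likelihood ratio), in which the shared denominator P(evidence) cancels.

```python
from fractions import Fraction


def posterior_from_odds(prior, likelihood_ratio):
    """Posterior probability via posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


p_match_given_guilty = Fraction(99, 100)
p_match_given_innocent = Fraction(1, 10000)
likelihood_ratio = p_match_given_guilty / p_match_given_innocent

# Prior from the example problem (0.001) versus the "assumed equal" prior (0.5).
with_stated_prior = posterior_from_odds(Fraction(1, 1000), likelihood_ratio)
with_equal_priors = posterior_from_odds(Fraction(1, 2), likelihood_ratio)

print(likelihood_ratio)                    # 9900
print(round(float(with_stated_prior), 4))  # 0.9083
print(round(float(with_equal_priors), 4))  # 0.9999
```

The same likelihood ratio of 9900 yields roughly a 91% posterior under the 0.001 prior and 99.99% under equal priors, which illustrates why the choice of prior matters in the forensic setting.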
**From alleles to identity:**

- It is relatively easy to find the allele frequencies in the population
- Marginal probability distributions
- Independence assumption
- Class conditional probabilities
- Equal prior probabilities
- Bayesian posterior probability estimate

**Thank you.**

**A Key Advantage**

- The oldest citation: T. Bayes. “An essay towards solving a problem in the doctrine of chances.” Phil. Trans. Roy. Soc., 53, 1763.