Sequential Learning

Presentation Transcript

1. Sequential Learning. James B. Elsner and Thomas H. Jagger, Department of Geography, Florida State University. Some material is based on notes from a one-day short course on Bayesian modeling and prediction given by David Draper (http://www.ams.ucsc.edu/~draper).

2. HIV Screening. As we discussed, the three definitions of probability are: classical, frequentist, and Bayesian. Consider the problem of screening for HIV. Widespread screening for HIV has been proposed by some people in some countries (e.g., the U.S.). Recall: Two blood tests that screen for HIV are widely available. ELISA: relatively inexpensive (roughly \$20) and fairly accurate. Western Blot (WB): considerably more accurate, but costs quite a bit more (about \$100).

3. A new patient comes to You, a physician, with symptoms that suggest he may be HIV positive (You = a generic person making assessments of uncertainty). Questions: • Is it appropriate to use the language of probability to quantify Your uncertainty about the proposition A = {This patient is HIV positive}? • If so, what kinds of probability are appropriate, and how would You assess P(A) in each case? • What strategy (e.g., ELISA, WB, both?) should You employ to decrease Your uncertainty about A? If You decide to run a screening test, how should Your uncertainty be updated in light of the test results?

4. Let’s say that, with this patient’s values of relevant demographic variables, the prevalence of HIV estimated from the medical literature, P(A) = P(he’s HIV positive), in his recognizable subpopulation is about 1/100 = 0.01 (1%). To improve this estimate by gathering data specific to this patient, You decide to take some blood and get a result from ELISA. Suppose the test comes back positive. What is Your updated P(A)? Bayesian probability has that name because of the simple updating rule attributed to Thomas Bayes, who was one of the first to define conditional probability and make calculations with it.

5. The conditional probability of an event B in relation to an event A is the probability that B occurs given that A has already occurred. P(B|A): conditional probability of B given A. P(A and B): probability of events A and B both occurring. P(B|A) = P(A and B)/P(A). [Figure: diagram labeled P(A), P(B), and P(A and B)]

6. In New England, 84% of the houses have a garage and 65% of the houses have a garage and a backyard. What is the probability that a house has a backyard given it has a garage? Answer: P(backyard | garage) = P(garage and backyard)/P(garage) = 0.65/0.84 = 0.77 or 77%. Simple. Let's try another. In a certain 3-box game, each box contains 2 chips. The first has 2 red chips, the second has 2 green chips, and the third has 1 green and 1 red chip. We do not know which box contains which chips. We take one chip out of one box without looking inside. It is a green chip. What are the chances that the second chip in that box is also green? Answers?

7. Let Event A = {second chip in the box is green} and Event B = {first chip in the box is green}. Then P(A|B) = P(A and B)/P(B). P(A and B) = 1/3, since only 1 box in 3 has two green chips. P(B) = 3/6 = 1/2, since there are 3 green chips out of a total of 6. So P(A|B) = 1/3 divided by 1/2 = 2/3. Another? Monty Hall Game Simulator http://www.shodor.org/interactivate/activities/monty3/index.html
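The 2/3 answer for the chip game can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original slides; the names `boxes`, `outcomes`, and `prob` are illustrative choices, and it assumes the box and the chip within it are both picked uniformly at random.

```python
from fractions import Fraction

# Each box holds two chips; "G" = green, "R" = red.
boxes = [("R", "R"), ("G", "G"), ("G", "R")]

# Enumerate every equally likely outcome: pick a box, then pick which of
# its two chips comes out first. That gives 3 x 2 = 6 outcomes.
outcomes = []
for box in boxes:
    for i in (0, 1):
        first = box[i]        # chip drawn
        second = box[1 - i]   # chip left in the same box
        outcomes.append((first, second))


def prob(event):
    """Probability of an event (a predicate on outcomes) under equal weights."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))


p_b = prob(lambda o: o[0] == "G")                         # P(B): first chip green
p_a_and_b = prob(lambda o: o[0] == "G" and o[1] == "G")   # P(A and B): both green
p_a_given_b = p_a_and_b / p_b                             # P(A|B) by definition

print(p_b, p_a_and_b, p_a_given_b)  # prints: 1/2 1/3 2/3
```

Using exact fractions rather than floats keeps the result directly comparable with the hand calculation.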

8. Problems involving conditional probability often lead to results that are unexpected. In fact, many people have a hard time accepting results for these problems, as the results may seem counterintuitive. That may be one reason for the reluctance to embrace Bayesian thinking. Bayes’ theorem is derived from the definition of conditional probability. As we’ve seen: P(A|B) = P(A and B)/P(B) (1). But we can also write: P(B|A) = P(B and A)/P(A). Multiplying both sides by P(A) yields: P(A) P(B|A) = P(B and A) (2). Since P(A and B) = P(B and A), we can write eq (2) as: P(A) P(B|A) = P(A and B) (3). Then substituting eq (3) into eq (1) yields P(A|B) = P(A) P(B|A)/P(B).

9. Bayes’ Theorem for propositions: P(A|D) = P(A) P(D|A) / P(D). In the usual application, A is an unknown quantity (such as the truth of some proposition) and D stands for some data relevant to Your uncertainty about A: P(unknown|data) = P(unknown) P(data|unknown)/c, where c = P(data) is the normalizing constant. In words: posterior = prior × likelihood / c. The terms prior and posterior emphasize the sequential nature of the learning process: P(unknown) was Your uncertainty assessment before the data arrived (Your prior), and P(unknown|data) is Your assessment after seeing the data (Your posterior).
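As a small illustration of the normalizing constant, the three-box chip game from the earlier slides can be recast as a Bayes update, where the unknown is which box was picked and the data is "a green chip was drawn". This is a sketch under that framing only; the function name `bayes_update` and the box labels are illustrative, not from the slides.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Discrete Bayes update.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(data | hypothesis)
    Returns (posterior dict, c) where c = P(data) is the normalizing constant.
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    c = sum(unnormalized.values())
    posterior = {h: u / c for h, u in unnormalized.items()}
    return posterior, c


# Unknown: which box was picked (each equally likely a priori).
prior = {"RR": Fraction(1, 3), "GG": Fraction(1, 3), "GR": Fraction(1, 3)}
# Data: the chip drawn is green. P(green | box) for each box.
likelihood = {"RR": Fraction(0), "GG": Fraction(1), "GR": Fraction(1, 2)}

posterior, c = bayes_update(prior, likelihood)
print(c)                # prints: 1/2
print(posterior["GG"])  # prints: 2/3
```

The posterior for the two-green box matches the 2/3 obtained from the conditional-probability definition, and c equals P(first chip is green) = 1/2.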

10. Your prior is updated multiplicatively on the probability scale by the likelihood P(data|unknown), and renormalized so that the total probability remains 1 (100%). Writing Bayes’ Theorem both for A and (not A) and combining gives a (perhaps even more) useful version, Bayes’ Theorem in odds form: P(A|data) / P(not A|data) = [P(A) / P(not A)] × [P(data|A) / P(data|not A)], that is, posterior odds = prior odds × Bayes’ factor. Other names for Bayes’ factor are the data odds and the likelihood ratio, since this factor measures the relative plausibility of the data given A and (not A).
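The odds form can be sketched in a few lines of Python and cross-checked against the probability form. The helper names and the input numbers below (a prior of 1/5 and likelihoods of 9/10 and 3/10) are purely hypothetical values chosen for illustration; they do not come from the slides.

```python
from fractions import Fraction


def prob_to_odds(p):
    """Odds in favor of A, given P(A) = p."""
    return p / (1 - p)


def odds_to_prob(odds):
    """P(A), given odds in favor of A."""
    return odds / (1 + odds)


def posterior_odds(prior_prob, p_data_given_a, p_data_given_not_a):
    """Posterior odds in favor of A = prior odds x Bayes factor."""
    bayes_factor = p_data_given_a / p_data_given_not_a
    return prob_to_odds(prior_prob) * bayes_factor


# Hypothetical inputs, for illustration only.
prior = Fraction(1, 5)
like_a = Fraction(9, 10)      # P(data | A)
like_not_a = Fraction(3, 10)  # P(data | not A)

odds = posterior_odds(prior, like_a, like_not_a)
via_odds = odds_to_prob(odds)

# Same posterior from the probability form, with c = P(data).
c = prior * like_a + (1 - prior) * like_not_a
via_prob = prior * like_a / c

print(odds, via_odds, via_prob)  # prints: 3/4 3/7 3/7
```

The two routes agree, which is the point of the odds form: the normalizing constant never has to be computed explicitly.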

11. Applying Bayes’ Theorem to the HIV example requires additional information about ELISA obtained by screening the blood of people with known HIV status. sensitivity = P(ELISA positive|HIV positive) specificity = P(ELISA negative|HIV negative) These are called ELISA’s operating characteristics. They are rather good—sensitivity of about 0.95, and specificity of about 0.98. Thus, you might well expect that P(this patient HIV positive|ELISA positive) would be close to 1.

12. Here the updating produces a surprising result. The Bayes factor comes out as B = sensitivity/(1 – specificity) = 0.95/0.02 = 47.5, which sounds like strong evidence that this patient is HIV positive. But the prior odds are quite a bit stronger the other way: prior odds = P(A)/[1 – P(A)] = 0.01/0.99 = 1/99, that is, 99 to 1 against HIV. Multiplying the prior odds by the Bayes factor gives posterior odds of 47.5/99 = 0.48 in favor of HIV, or equivalently 99/47.5 = 2.08 to 1 against. To turn odds against into a probability we write P(HIV positive|data) = 1/(1 + odds against) = 1/3.08 = 0.32 (32%). Bayesian calculator.
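The arithmetic on this slide can be reproduced with a short Python sketch using the prevalence, sensitivity, and specificity quoted in the slides. The variable names are illustrative, and floating-point values are rounded only for display.

```python
prevalence = 0.01    # prior P(A) from the slides
sensitivity = 0.95   # P(ELISA positive | HIV positive)
specificity = 0.98   # P(ELISA negative | HIV negative)

# Bayes factor for a positive ELISA result.
bayes_factor = sensitivity / (1 - specificity)

# Prior odds in favor of HIV: 1/99, i.e. 99 to 1 against.
prior_odds_for = prevalence / (1 - prevalence)

# Posterior odds = prior odds x Bayes factor.
posterior_odds_for = prior_odds_for * bayes_factor
posterior_odds_against = 1 / posterior_odds_for

# Convert odds back to a probability.
posterior_prob = posterior_odds_for / (1 + posterior_odds_for)

print(f"Bayes factor:           {bayes_factor:.1f}")
print(f"posterior odds against: {posterior_odds_against:.2f}")
print(f"P(HIV+ | ELISA+):       {posterior_prob:.2f}")
```

With these inputs the script reports a Bayes factor of 47.5, posterior odds of about 2.08 to 1 against, and a posterior probability of about 0.32.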

13. The reason Your posterior probability that Your patient is HIV positive is only 32%, in spite of the strong evidence from ELISA, is that ELISA was designed to have a vastly better false negative rate than false positive rate. Here both rates are conditioned on the test result. Applying the 1% prevalence, 0.95 sensitivity, and 0.98 specificity to 10,000 people gives 100 who are HIV positive (95 test positive, 5 test negative) and 9,900 who are HIV negative (198 test positive, 9,702 test negative). The false negative rate, P(HIV positive|ELISA negative), is then 5/9707 = 0.00052, or 1 in 1941, while the false positive rate, P(HIV negative|ELISA positive), is 198/293 = 0.68, or about 2 in 3. This in turn is because ELISA’s developers judged it far worse to tell someone who’s HIV positive that they’re not than the other way around (reasonable when ELISA is used for blood bank screening, for instance).
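The fractions 5/9707 and 198/293 can be reproduced by building the two-by-two table of HIV status against ELISA result. The sketch below assumes a cohort of 10,000 people, a size chosen here because it reproduces the quoted counts; the variable names are illustrative.

```python
n = 10_000           # assumed cohort size (reproduces the quoted counts)
prevalence = 0.01
sensitivity = 0.95
specificity = 0.98

# Split the cohort by true HIV status.
hiv_pos = round(n * prevalence)            # 100
hiv_neg = n - hiv_pos                      # 9900

# Split each group by ELISA result.
true_pos = round(hiv_pos * sensitivity)    # 95
false_neg = hiv_pos - true_pos             # 5
true_neg = round(hiv_neg * specificity)    # 9702
false_pos = hiv_neg - true_neg             # 198

elisa_pos = true_pos + false_pos           # 293
elisa_neg = false_neg + true_neg           # 9707

# Rates conditioned on the test result, as on the slide.
p_pos_given_elisa_neg = false_neg / elisa_neg   # 5/9707
p_neg_given_elisa_pos = false_pos / elisa_pos   # 198/293
p_pos_given_elisa_pos = true_pos / elisa_pos    # 95/293

print(f"P(HIV+ | ELISA-) = {false_neg}/{elisa_neg} = {p_pos_given_elisa_neg:.5f}")
print(f"P(HIV- | ELISA+) = {false_pos}/{elisa_pos} = {p_neg_given_elisa_pos:.2f}")
print(f"P(HIV+ | ELISA+) = {true_pos}/{elisa_pos} = {p_pos_given_elisa_pos:.2f}")
```

The last line recovers the 32% posterior from the odds calculation, showing that the counting argument and Bayes' theorem give the same answer.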

14. The false positive rate would make widespread screening for HIV based only on ELISA a truly bad idea. Formalizing the consequences of the two types of error in diagnostic screening would require quantifying misclassification costs, which shifts the focus from (scientific) inference (the acquisition of knowledge for its own sake: Is this patient really HIV positive?) to decision making (putting that knowledge to work to answer a public policy or business question, e.g.: What use of ELISA and Western Blot would yield the optimal screening strategy?) Bayesian calculator.
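One way to explore the strategy question is to treat each posterior as the prior for the next test. The sketch below is only an illustration, not a recommended screening protocol: it assumes a second test with the same operating characteristics as ELISA whose result is conditionally independent of the first given the patient's true HIV status. Neither assumption comes from the slides, and the function name `update` is hypothetical.

```python
def update(prob, positive, sensitivity, specificity):
    """One sequential Bayes update of P(HIV positive) after a test result."""
    if positive:
        bayes_factor = sensitivity / (1 - specificity)
    else:
        bayes_factor = (1 - sensitivity) / specificity
    odds = prob / (1 - prob) * bayes_factor   # posterior odds in favor
    return odds / (1 + odds)                  # back to a probability


p = 0.01         # prior prevalence from the slides
history = [p]

# Two positive results in a row (assumed conditionally independent).
for result in (True, True):
    p = update(p, result, sensitivity=0.95, specificity=0.98)
    history.append(p)

for step, value in enumerate(history):
    print(f"after {step} test(s): P(HIV+) = {value:.3f}")
```

Under these assumptions the probability moves from 0.01 to roughly 0.32 after one positive result and to roughly 0.96 after two, which shows how yesterday's posterior serves as today's prior.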