Tutorial on Bayesian Networks

Tutorial on Bayesian Networks. Jack Breese, Microsoft Research, breese@microsoft.com. Daphne Koller, Stanford University, koller@cs.stanford.edu. First given as an AAAI’97 tutorial.


Presentation Transcript


  1. Tutorial on Bayesian Networks • Jack Breese, Microsoft Research, breese@microsoft.com • Daphne Koller, Stanford University, koller@cs.stanford.edu • First given as an AAAI’97 tutorial.

  2. Probabilities • Probability distribution P(X|ξ) • X is a random variable • Discrete • Continuous • ξ is the background state of information

  3. Discrete Random Variables • Finite set of possible outcomes • X binary: P(x) + P(¬x) = 1

  4. Continuous Random Variable • Probability distribution (density function) over continuous values • [Figure: density curve with the values 5 and 7 marked on the axis]

  5. Bayesian networks • Basics • Structured representation • Conditional independence • Naïve Bayes model • Independence facts

  6. Bayesian Networks • Smoking → Cancer

     P(S=no)      0.80
     P(S=light)   0.15
     P(S=heavy)   0.05

     Smoking =      no      light   heavy
     P(C=none)      0.96    0.88    0.60
     P(C=benign)    0.03    0.08    0.25
     P(C=malig)     0.01    0.04    0.15

  7. Product Rule • P(C,S) = P(C|S) P(S)

  8. Marginalization P(Smoke) P(Cancer)

  9. Bayes Rule Revisited

     Cancer =       none    benign  malignant
     P(S=no)        0.821   0.522   0.421
     P(S=light)     0.141   0.261   0.316
     P(S=heavy)     0.037   0.217   0.263
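The product rule, marginalization and Bayes rule from slides 7 to 9 can be traced in a few lines of Python. This is only a sketch: the numbers are the slide 6 tables as they appear in this transcript, and the variable names are made up for illustration. With these inputs the computed posteriors do not exactly reproduce the slide 9 table, which may have been produced from a different version of the heavy-smoker column.

```python
# Prior and CPT as transcribed on slide 6.
p_s = {"no": 0.80, "light": 0.15, "heavy": 0.05}
p_c_given_s = {
    "no":    {"none": 0.96, "benign": 0.03, "malig": 0.01},
    "light": {"none": 0.88, "benign": 0.08, "malig": 0.04},
    "heavy": {"none": 0.60, "benign": 0.25, "malig": 0.15},
}

# Product rule: P(C, S) = P(C | S) P(S)
joint = {(c, s): p_c_given_s[s][c] * p_s[s]
         for s in p_s for c in p_c_given_s[s]}

# Marginalization: P(C) = sum over S of P(C, S)
p_c = {}
for (c, s), p in joint.items():
    p_c[c] = p_c.get(c, 0.0) + p

# Bayes rule: P(S | C) = P(C, S) / P(C)
p_s_given_c = {(s, c): joint[(c, s)] / p_c[c] for (c, s) in joint}

for c in ("none", "benign", "malig"):
    row = {s: round(p_s_given_c[(s, c)], 3) for s in p_s}
    print(c, row)
```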

  10. A Bayesian Network Age Gender Exposure to Toxics Smoking Cancer Serum Calcium Lung Tumor

  11. Independence • Age and Gender are independent: P(A,G) = P(G) P(A) • P(A|G) = P(A), written A ⊥ G • P(G|A) = P(G), written G ⊥ A • P(A,G) = P(G|A) P(A) = P(G) P(A) • P(A,G) = P(A|G) P(G) = P(A) P(G)

  12. Conditional Independence • Nodes: Age, Gender, Smoking, Cancer • Cancer is independent of Age and Gender given Smoking: P(C|A,G,S) = P(C|S), written C ⊥ A,G | S

  13. More Conditional Independence: Naïve Bayes • Nodes: Cancer, Serum Calcium, Lung Tumor • Serum Calcium and Lung Tumor are dependent • Serum Calcium is independent of Lung Tumor, given Cancer: P(L|SC,C) = P(L|C)

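  14. Naïve Bayes in general • Hypothesis node H with evidence nodes E1, E2, E3, …, En • 2n + 1 parameters (binary variables): P(h), and P(ei | h), P(ei | ¬h) for i = 1, …, n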

  15. More Conditional Independence: Explaining Away • Nodes: Exposure to Toxics, Smoking, Cancer • Exposure to Toxics and Smoking are independent: E ⊥ S • Exposure to Toxics is dependent on Smoking, given Cancer: P(E = heavy | C = malignant) > P(E = heavy | C = malignant, S = heavy)

  16. Put it all together • Nodes: Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, Lung Tumor

  17. General Product (Chain) Rule for Bayesian Networks • P(X1, …, Xn) = ∏_{i=1..n} P(Xi | Pai) • Pai = parents(Xi)
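One consequence of the chain rule is that each variable contributes a single conditional table given its parents, so the parameter count is driven by parent-set sizes rather than by the total number of variables. The sketch below counts parameters for the seven-node example. The parent sets are read off the factors listed on slide 38 plus the Serum Calcium link on slide 13, and treating every variable as binary is an assumption made only for this illustration.

```python
# Parent sets for the seven-node example (structure inferred from the
# factors on slide 38; an assumption of this sketch).
parents = {
    "Age": [],
    "Gender": [],
    "Smoking": ["Age", "Gender"],
    "Exposure": ["Age"],
    "Cancer": ["Exposure", "Smoking"],
    "SerumCalcium": ["Cancer"],
    "LungTumor": ["Cancer"],
}

# Assumption: every variable is binary.
card = {v: 2 for v in parents}


def cpt_params(v):
    """Free parameters in P(v | parents(v))."""
    rows = 1
    for p in parents[v]:
        rows *= card[p]
    return rows * (card[v] - 1)


factored = sum(cpt_params(v) for v in parents)

full = 1
for v in card:
    full *= card[v]
full_joint = full - 1  # one entry is fixed by normalization

print("factored:", factored, "full joint:", full_joint)
```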

  18. Conditional Independence • A variable (node) is conditionally independent of its non-descendants given its parents. • Non-descendants: Age, Gender • Parents: Exposure to Toxics, Smoking • Descendants: Serum Calcium, Lung Tumor • Cancer is independent of Age and Gender given Exposure to Toxics and Smoking.

  19. Another non-descendant • Nodes: Age, Gender, Exposure to Toxics, Smoking, Diet, Cancer, Serum Calcium, Lung Tumor • Cancer is independent of Diet given Exposure to Toxics and Smoking.

  20. Independence and Graph Separation • Given a set of observations, is one set of variables dependent on another set? • Observing effects can induce dependencies. • d-separation (Pearl 1988) allows us to check conditional independence graphically.
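A graphical independence check of this kind can be sketched in Python. The version below uses the ancestral moral graph test, which is equivalent to d-separation but is a swapped-in formulation rather than the path-blocking procedure itself. The function name `d_separated` and the parent sets (inferred from slides 13 and 38) are assumptions for illustration.

```python
def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG `parents`."""
    xs, ys, zs = set(xs), set(ys), set(zs)

    # 1. Keep only xs, ys, zs and their ancestors.
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents[v])

    # 2. Moralize: connect each node to its parents and co-parents to each other.
    nbrs = {v: set() for v in relevant}
    for v in relevant:
        ps = list(parents[v])
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                nbrs[ps[i]].add(ps[j])
                nbrs[ps[j]].add(ps[i])

    # 3. Delete the observed nodes and test undirected reachability.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(nbrs[v] - zs)
    return not (seen & ys)


# Seven-node example; structure is an assumption inferred from the slides.
parents = {
    "Age": [], "Gender": [],
    "Smoking": ["Age", "Gender"],
    "Exposure": ["Age"],
    "Cancer": ["Exposure", "Smoking"],
    "SerumCalcium": ["Cancer"],
    "LungTumor": ["Cancer"],
}

print(d_separated(parents, {"Cancer"}, {"Age", "Gender"}, {"Exposure", "Smoking"}))
print(d_separated(parents, {"Age"}, {"Gender"}, {"Smoking"}))
```

The second call illustrates the point that observing an effect can induce a dependency: Age and Gender are separated with no evidence, but not once their common child Smoking is observed.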

  21. CPCS Network

  22. Structuring • Nodes: Age, Gender, Exposure to Toxic, Smoking, Genetic Damage, Cancer, Lung Tumor • Network structure corresponding to “causality” is usually good. • Extending the conversation.

  23. Local Structure • Causal independence: from 2^n to n+1 parameters • Asymmetric assessment: similar savings in practice. • Typical savings (#params): • 145 to 55 for a small hardware network; • 133,931,430 to 8254 for CPCS !!
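One common causal-independence model is the noisy-OR with a leak term, which needs one link probability per cause plus one leak probability, that is n+1 numbers instead of a 2^n-row table. The slide does not name the model, so using noisy-OR here is an assumption, and all numbers below are hypothetical.

```python
from itertools import product


def noisy_or(link_probs, leak, active):
    """P(effect = true | causes) under a noisy-OR model with a leak term."""
    p_off = 1.0 - leak
    for p, on in zip(link_probs, active):
        if on:
            p_off *= 1.0 - p  # each active cause independently fails to fire
    return 1.0 - p_off


# Hypothetical parameters: n = 3 link probabilities plus one leak (n + 1 numbers).
link_probs = [0.8, 0.6, 0.3]
leak = 0.05

# The full CPT with 2^n rows is generated from the n + 1 parameters.
cpt = {cfg: noisy_or(link_probs, leak, cfg)
       for cfg in product([False, True], repeat=len(link_probs))}

for cfg, p in cpt.items():
    print(cfg, round(p, 4))
```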

  24. Course Contents • Concepts in Probability • Bayesian Networks • Inference • Decision making • Learning networks from data • Reasoning over time • Applications

  25. Inference • Patterns of reasoning • Basic inference • Exact inference • Exploiting structure • Approximate inference

  26. Predictive Inference • How likely are elderly males to get malignant cancer? • P(C=malignant | Age>60, Gender=male) • Nodes: Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, Lung Tumor

  27. Combined • How likely is an elderly male patient with high Serum Calcium to have malignant cancer? • P(C=malignant | Age>60, Gender=male, Serum Calcium=high) • Nodes: Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, Lung Tumor

  28. Explaining away • If we see a lung tumor, the probability of heavy smoking and of exposure to toxics both go up. • If we then observe heavy smoking, the probability of exposure to toxics goes back down. • Nodes: Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, Lung Tumor

  29. Inference in Belief Networks • Find P(Q=q | E=e) • Q is the query variable • E is the set of evidence variables • P(q | e) = P(q, e) / P(e) • X1, …, Xn are the network variables except Q, E • P(q, e) = Σ_{x1,…,xn} P(q, e, x1, …, xn)

  30. Basic Inference • Network: S → C • P(c) = ? • P(C,S) = P(C|S) P(S)

  31. Basic Inference • Chain: A → B → C • P(b) = Σ_a P(a, b) = Σ_a P(b | a) P(a) • P(c) = Σ_b P(c | b) P(b) • P(c) = Σ_{b,a} P(a, b, c) = Σ_{b,a} P(c | b) P(b | a) P(a) = Σ_b P(c | b) Σ_a P(b | a) P(a), where the inner sum is P(b)
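The chain computation on this slide can be checked numerically: summing the full joint over a and b gives the same P(c) as first computing P(b) and then P(c). The CPT values below are hypothetical and chosen only for illustration.

```python
# Hypothetical CPTs for a binary chain A -> B -> C (index 0 = false, 1 = true).
p_a = [0.7, 0.3]
p_b_given_a = [[0.9, 0.1], [0.2, 0.8]]  # p_b_given_a[a][b]
p_c_given_b = [[0.6, 0.4], [0.1, 0.9]]  # p_c_given_b[b][c]

# Brute force: P(c) = sum over a, b of P(c | b) P(b | a) P(a)
p_c_brute = [
    sum(p_c_given_b[b][c] * p_b_given_a[a][b] * p_a[a]
        for a in range(2) for b in range(2))
    for c in range(2)
]

# Sums pushed inward: first P(b) = sum_a P(b | a) P(a), then P(c) = sum_b P(c | b) P(b)
p_b = [sum(p_b_given_a[a][b] * p_a[a] for a in range(2)) for b in range(2)]
p_c = [sum(p_c_given_b[b][c] * p_b[b] for b in range(2)) for c in range(2)]

print(p_b, p_c, p_c_brute)
```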

  32. Inference in trees • Network: Y1 → X ← Y2 • P(x) = Σ_{y1,y2} P(x | y1, y2) P(y1, y2) = Σ_{y1,y2} P(x | y1, y2) P(y1) P(y2), because of independence of Y1, Y2

  33. Polytrees • A network is singly connected (a polytree) if it contains no undirected loops. • Theorem: Inference in a singly connected network can be done in linear time*. • Main idea: in variable elimination, need only maintain distributions over single nodes. • * in network size including table sizes.

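  34. The problem with loops • Network: Cloudy → Rain, Cloudy → Sprinkler; Rain and Sprinkler → Grass-wet (deterministic or) • P(c) = 0.5 • P(s) over the two Cloudy columns: 0.01, 0.99 • P(r) over the two Cloudy columns: 0.01, 0.99 • The grass is dry only if no rain and no sprinklers. • P(¬g) = P(¬r, ¬s) ~ 0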

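  35. The problem with loops contd. • P(¬g) = P(¬g | r, s) P(r, s) + P(¬g | r, ¬s) P(r, ¬s) + P(¬g | ¬r, s) P(¬r, s) + P(¬g | ¬r, ¬s) P(¬r, ¬s), where the four conditional terms are 0, 0, 0 and 1 • = P(¬r, ¬s) ~ 0 • Problem: treating Rain and Sprinkler as independent gives P(¬r, ¬s) = P(¬r) P(¬s) ~ 0.5 · 0.5 = 0.25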
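The gap between the two answers can be reproduced with a short calculation. The transcript does not preserve which CPT column belongs to cloudy and which to not cloudy, so the orientation below (rain likely when cloudy, sprinkler likely when not cloudy) is an assumption, chosen because it is the one consistent with P(¬r, ¬s) being close to 0.

```python
# Assumed CPT orientation (column labels are not recoverable from the transcript).
p_c = 0.5
p_r = {True: 0.99, False: 0.01}  # P(rain | cloudy), P(rain | not cloudy)
p_s = {True: 0.01, False: 0.99}  # P(sprinkler | cloudy), P(sprinkler | not cloudy)

# Exact: P(~r, ~s) = sum over c of P(c) P(~r | c) P(~s | c)
exact = sum(pc * (1 - p_r[c]) * (1 - p_s[c])
            for c, pc in ((True, p_c), (False, 1 - p_c)))

# Treating Rain and Sprinkler as independent, as a single-node scheme would.
marg_r = p_c * p_r[True] + (1 - p_c) * p_r[False]
marg_s = p_c * p_s[True] + (1 - p_c) * p_s[False]
naive = (1 - marg_r) * (1 - marg_s)

print("exact P(dry):", exact)
print("naive P(dry):", naive)
```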

  36. Variable elimination • Chain: A → B → C • P(c) = Σ_b P(c | b) Σ_a P(b | a) P(a), where the inner sum is P(b) • P(A) × P(B | A) gives P(B, A); summing out A gives P(B) • P(B) × P(C | B) gives P(C, B); summing out B gives P(C)

  37. Inference as variable elimination • A factor over X is a function from val(X) to numbers in [0,1]: • A CPT is a factor • A joint distribution is also a factor • BN inference: • factors are multiplied to give new ones • variables in factors summed out • A variable can be summed out as soon as all factors mentioning it have been multiplied.
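The two factor operations described above, multiplying factors and summing a variable out, are enough for a minimal variable-elimination sketch. The representation below (a tuple of variable names plus a dictionary keyed by 0/1 assignments) and the function names `multiply`, `sum_out` and `eliminate` are illustrative choices, and the CPT numbers are hypothetical. Only binary variables are handled.

```python
from itertools import product


def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for vals in product([0, 1], repeat=len(out_vars)):
        asg = dict(zip(out_vars, vals))
        table[vals] = (ft[tuple(asg[v] for v in fv)]
                       * gt[tuple(asg[v] for v in gv)])
    return out_vars, table


def sum_out(f, var):
    """Sum a variable out of a factor."""
    fv, ft = f
    i = fv.index(var)
    table = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return fv[:i] + fv[i + 1:], table


def eliminate(factors, order):
    """Sum out each variable once all factors mentioning it are multiplied."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        factors = [f for f in factors if var not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


# Hypothetical CPTs for the chain A -> B -> C.
f_a = (("A",), {(0,): 0.7, (1,): 0.3})
f_ba = (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
f_cb = (("B", "C"), {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.1, (1, 1): 0.9})

vars_c, table_c = eliminate([f_a, f_ba, f_cb], ["A", "B"])
print(vars_c, table_c)
```

Eliminating A first and then B keeps every intermediate factor at two variables or fewer for this chain; a different order can produce larger intermediate factors on other networks.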

  38. Variable Elimination with loops • Nodes: Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, Lung Tumor • P(A) × P(G) × P(S | A,G) gives P(A,G,S); summing out G gives P(A,S) • P(A,S) × P(E | A) gives P(A,E,S); summing out A gives P(E,S) • P(E,S) × P(C | E,S) gives P(E,S,C); summing out E,S gives P(C) • P(C) × P(L | C) gives P(C,L); summing out C gives P(L) • Complexity is exponential in the size of the factors

  39. Join trees* • A join tree is a partially precompiled factorization • Nodes: Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, Lung Tumor • Cliques: {A, G, S}, {A, E, S}, {E, S, C}, {C, S-C}, {C, L} • P(A) × P(G) × P(S | A,G) × P(A,S) • * aka junction trees, Lauritzen-Spiegelhalter, Hugin alg., …

  40. Computational complexity • Theorem: Inference in a multi-connected Bayesian network is NP-hard. • Boolean 3CNF formula f = (u ∨ v ∨ w) ∧ (u ∨ w ∨ y) • Nodes U, V, W, Y, each with prior probability 1/2, feed two “or” nodes, which feed an “and” node • Probability(“and” node is true) = 1/2^n · # satisfying assignments of f

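  41. Stochastic simulation • Network: Burglary → Alarm ← Earthquake; Alarm → Call; Earthquake → Newscast • Evidence: Call = c • P(b) = 0.03, P(e) = 0.001 • P(a) over the four (Burglary, Earthquake) columns: 0.98, 0.7, 0.4, 0.01 • P(c) over the two Alarm columns: 0.05, 0.8 • P(n) over the two Earthquake columns: 0.3, 0.001 • Samples are drawn over B, E, A, C, N; sample probabilities shown: 0.001, 0.03, 0.4, 0.3, 0.8 • P(b|c) ~ (# of live samples with B=b) / (total # of live samples)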
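The live-sample estimate on this slide can be sketched as rejection sampling: draw complete samples in topological order, keep only those that agree with the evidence Call = c, and report the fraction with B = b. The CPT values are taken from the slide, but the assignment of values to parent configurations (for example P(c | a) = 0.8 and P(c | ¬a) = 0.05) is an assumption, since the column labels are not preserved in the transcript.

```python
import random

# CPT values from the slide; which value goes with which parent
# configuration is an assumption of this sketch.
P_B, P_E = 0.03, 0.001
P_A = {(True, True): 0.98, (True, False): 0.7,
       (False, True): 0.4, (False, False): 0.01}   # P(a | b, e)
P_C = {True: 0.8, False: 0.05}                      # P(c | a)
P_N = {True: 0.3, False: 0.001}                     # P(n | e)


def sample_once(rng):
    """Draw one complete sample (b, e, a, c, n) in topological order."""
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    c = rng.random() < P_C[a]
    n = rng.random() < P_N[e]
    return b, e, a, c, n


def estimate_b_given_c(num_samples, seed=0):
    """Fraction of live samples (those with Call = true) that have B = true."""
    rng = random.Random(seed)
    live = hits = 0
    for _ in range(num_samples):
        b, _e, _a, c, _n = sample_once(rng)
        if c:
            live += 1
            hits += b
    return hits / live if live else float("nan")


print(estimate_b_given_c(200_000))
```

Most samples are discarded because the evidence is fairly unlikely, which is the inefficiency that weighting schemes try to avoid.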

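  42. Likelihood weighting • Network: Burglary → Alarm ← Earthquake; Alarm → Call; Earthquake → Newscast • Evidence: Call = c • P(c) over the two Alarm columns: 0.05, 0.8 • P(¬c) over the two Alarm columns: 0.95, 0.2 • Samples are drawn over B, E, A, C, N with Call fixed to c; example sample weights shown: 0.8, 0.05 • P(b|c) = (weight of samples with B=b) / (total weight of samples)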
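A minimal likelihood-weighting sketch for the same query follows. The evidence variable Call is never sampled; each sample is instead weighted by the probability of the evidence given its sampled parent. As before, the CPT values come from the slides, while their assignment to parent configurations is an assumption, and the function names are illustrative.

```python
import random

# CPT values from the slides; the mapping to parent configurations is assumed.
P_B, P_E = 0.03, 0.001
P_A = {(True, True): 0.98, (True, False): 0.7,
       (False, True): 0.4, (False, False): 0.01}   # P(a | b, e)
P_C = {True: 0.8, False: 0.05}                      # P(c | a)
P_N = {True: 0.3, False: 0.001}                     # P(n | e)


def weighted_sample(rng):
    """Sample the non-evidence variables; weight by P(Call = true | a)."""
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    _n = rng.random() < P_N[e]
    weight = P_C[a]  # evidence is fixed to c, not sampled
    return b, weight


def lw_estimate_b_given_c(num_samples, seed=0):
    """Weight of samples with B = true divided by total weight."""
    rng = random.Random(seed)
    total = with_b = 0.0
    for _ in range(num_samples):
        b, w = weighted_sample(rng)
        total += w
        if b:
            with_b += w
    return with_b / total


print(lw_estimate_b_given_c(200_000))
```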

  43. Markov Chain Monte Carlo

  44. MCMC with Gibbs Sampling • Fix the values of observed variables • Set the values of all non-observed variables randomly • Perform a random walk through the space of complete variable assignments. On each move: • Pick a variable X • Calculate Pr(X=true | all other variables) • Set X to true with that probability • Repeat many times. The frequency with which any variable X is true is its posterior probability. • Converges to true posterior when frequencies stop changing significantly • stable distribution, mixing

  45. Markov Blanket Sampling • How to calculate Pr(X=true | all other variables)? • Recall: a variable is independent of all others given its Markov Blanket • parents • children • other parents of children • So the problem becomes calculating Pr(X=true | MB(X)) • We solve this sub-problem exactly • Fortunately, it is easy to solve

  46. Example A C X B

  47. Example • Nodes: Smoking, Heart disease, Lung disease, Shortness of breath

  48. Example • Evidence: s, b • Nodes: Smoking, Heart disease, Lung disease, Shortness of breath

  49. Example • Evidence: s, b • Randomly set: h, g • Nodes: Smoking, Heart disease, Lung disease, Shortness of breath

  50. Example • Evidence: s, b • Randomly set: h, g • Sample H using P(H|s,g,b) • Nodes: Smoking, Heart disease, Lung disease, Shortness of breath
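The example can be turned into a small Gibbs sampler. The sketch assumes the structure Smoking → Heart disease, Smoking → Lung disease, and both diseases → Shortness of breath, which is consistent with sampling H from P(H | s, g, b) but is not stated in the transcript. All CPT numbers are hypothetical. Instead of picking a variable at random on each move, the loop resamples H and G in turn, a common variant.

```python
import random

# Hypothetical CPTs (not from the slides).
P_H = {True: 0.6, False: 0.1}   # P(h | smoking)
P_G = {True: 0.7, False: 0.1}   # P(g | smoking)
P_B = {(True, True): 0.9, (True, False): 0.7,
       (False, True): 0.8, (False, False): 0.1}   # P(b | h, g)


def cond_h(s, g, b):
    """P(H = true | Markov blanket): normalized P(h | s) * P(b | h, g)."""
    w = {}
    for h in (True, False):
        prior = P_H[s] if h else 1 - P_H[s]
        like = P_B[(h, g)] if b else 1 - P_B[(h, g)]
        w[h] = prior * like
    return w[True] / (w[True] + w[False])


def cond_g(s, h, b):
    """P(G = true | Markov blanket): normalized P(g | s) * P(b | h, g)."""
    w = {}
    for g in (True, False):
        prior = P_G[s] if g else 1 - P_G[s]
        like = P_B[(h, g)] if b else 1 - P_B[(h, g)]
        w[g] = prior * like
    return w[True] / (w[True] + w[False])


def gibbs(num_steps, s=True, b=True, seed=0, burn_in=1000):
    """Estimate P(H = true | s, b) as the frequency of h after burn-in."""
    rng = random.Random(seed)
    h, g = rng.random() < 0.5, rng.random() < 0.5  # random initial state
    count_h = 0
    for step in range(num_steps + burn_in):
        h = rng.random() < cond_h(s, g, b)
        g = rng.random() < cond_g(s, h, b)
        if step >= burn_in:
            count_h += h
    return count_h / num_steps


print(gibbs(50_000))
```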
