
Inference in Bayesian Networks



  1. Inference in Bayesian Networks

  2. Agenda • Reading off independence assumptions • Efficient inference in Bayesian Networks • Top-down inference • Variable elimination • Monte-Carlo methods

  3. Some Applications of BN • Medical diagnosis • Troubleshooting of hardware/software systems • Fraud/uncollectible debt detection • Data mining • Analysis of genetic sequences • Data interpretation, computer vision, image understanding

  4. More Complicated Singly-Connected Belief Net (figure: a belief net with nodes Battery, Gas, Radio, SparkPlugs, Starts, Moves)

  5. Image-understanding example (figure: regions R1..R4 related by Above; each Region variable has domain {Sky, Tree, Grass, Rock})

  6. BN to evaluate insurance risks

  7. BN from Last Lecture (figure: Burglary and Earthquake are causes of Alarm; JohnCalls and MaryCalls are its effects) • Intuitive meaning of an arc from x to y: “x has direct influence on y” • The structure is a directed acyclic graph

  8. Arcs do not necessarily encode causality! (figure: two BNs over A, B, and C that can encode the same joint probability distribution)

  9. Reading off independence relationships (figure: a three-node network over A, B, C) • Given B, does the value of A affect the probability of C? • No: P(C|B,A) = P(C|B) • C’s parent (B) is given, so C is independent of its non-descendants (A) • Independence is symmetric: C ⊥ A | B ⇒ A ⊥ C | B
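
A quick numeric check of this claim; a minimal sketch assuming the figure is the chain A → B → C and using hypothetical CPT numbers (both are my assumptions, not from the deck). It shows that conditioning on A changes nothing once B is given:

```python
# Hypothetical CPT numbers for the chain A -> B -> C (illustration only).
p_a = 0.4                                 # P(A=true)
p_b_given_a = {True: 0.7, False: 0.2}     # P(B=true | a)
p_c_given_b = {True: 0.9, False: 0.3}     # P(C=true | b)

def p_c_given_b_and_a(a, b=True):
    """P(C=true | B=b, A=a), computed from the chain's joint."""
    pa = p_a if a else 1 - p_a
    joint_abc = pa * p_b_given_a[a] * p_c_given_b[b]   # P(a, b, C=true)
    joint_ab = pa * p_b_given_a[a]                     # P(a, b)
    return joint_abc / joint_ab                        # the A terms cancel

# Same value whatever A is: A tells us nothing about C once B is known.
print(p_c_given_b_and_a(True), p_c_given_b_and_a(False))  # 0.9 0.9
```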

  10. What does the BN encode? (the alarm network: Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls) • Burglary ⊥ Earthquake • JohnCalls ⊥ MaryCalls | Alarm • JohnCalls ⊥ Burglary | Alarm • JohnCalls ⊥ Earthquake | Alarm • MaryCalls ⊥ Burglary | Alarm • MaryCalls ⊥ Earthquake | Alarm • A node is independent of its non-descendants, given its parents

  11. Reading off independence relationships • How about Burglary ⊥ Earthquake | Alarm? • No! Why?

  12. Reading off independence relationships • How about Burglary ⊥ Earthquake | Alarm? • No! Why? • P(B,E|A) = P(A|B,E)P(B)P(E)/P(A) ≈ 0.00075 (using P(B,E) = P(B)P(E), since B ⊥ E) • but P(B|A)P(E|A) ≈ 0.086, so B and E are dependent given A
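
A numeric check of this explaining-away effect; a minimal sketch using the CPT values that appear on the later top-down inference slides (P(B)=0.001, P(E)=0.002, and the P(A|B,E) table):

```python
# Alarm-network CPTs, as used elsewhere in these slides.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,    # (b, e) -> P(A=true|b,e)
       (False, True): 0.29, (False, False): 0.001}

def p(b):  # P(B=b)
    return P_B if b else 1 - P_B

def q(e):  # P(E=e)
    return P_E if e else 1 - P_E

# P(A) by summing over the parents.
pA = sum(P_A[(b, e)] * p(b) * q(e)
         for b in (True, False) for e in (True, False))

# Joint posterior vs. product of marginals, given Alarm = true.
pBE_given_A = P_A[(True, True)] * P_B * P_E / pA                       # ~0.00075
pB_given_A = sum(P_A[(True, e)] * q(e) for e in (True, False)) * P_B / pA
pE_given_A = sum(P_A[(b, True)] * p(b) for b in (True, False)) * P_E / pA
print(pBE_given_A, pB_given_A * pE_given_A)   # ~0.00075 vs ~0.086: not equal
```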

  13. Reading off independence relationships • How about Burglary ⊥ Earthquake | JohnCalls? • No! Why? • Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent

  14. Independence relationships • Rough intuition (this holds for tree-like graphs, i.e., polytrees): • Evidence on the (directed) road between two variables makes them independent • Evidence on an “A” node makes its descendants independent • Evidence on a “V” node, or below the V, makes the ancestors of the variables dependent (otherwise they are independent) • Formal property in the general case: d-separation ⇒ independence (see R&N)

  15. Benefits of Sparse Models • Modeling • Fewer relationships need to be encoded (either through understanding or statistics) • Large networks can be built up from smaller ones • Intuition • Dependencies/independencies between variables can be inferred through network structures • Tractable inference

  16. Top-Down inference • Suppose we want to compute P(Alarm)

  17. Top-Down inference • Suppose we want to compute P(Alarm) • P(Alarm) = Σb,e P(A,b,e) • P(Alarm) = Σb,e P(A|b,e)P(b)P(e)

  18. Top-Down inference • Suppose we want to compute P(Alarm) • P(A) = Σb,e P(A,b,e) • P(A) = Σb,e P(A|b,e)P(b)P(e) • P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)

  19. Top-Down inference • Suppose we want to compute P(Alarm) • P(A) = Σb,e P(A,b,e) • P(A) = Σb,e P(A|b,e)P(b)P(e) • P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E) • P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 + 0.29*0.999*0.002 + 0.001*0.999*0.998 = 0.00252
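
The same sum written as a short loop; a minimal sketch using exactly the CPT numbers on this slide:

```python
# P(A=true | b, e) table and priors, as given on the slide.
cpt_alarm = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}
p_burglary, p_earthquake = 0.001, 0.002

p_alarm = 0.0
for b in (True, False):
    for e in (True, False):
        pb = p_burglary if b else 1 - p_burglary
        pe = p_earthquake if e else 1 - p_earthquake
        p_alarm += cpt_alarm[(b, e)] * pb * pe   # P(A|b,e) P(b) P(e)

print(p_alarm)  # 0.002516... ~= 0.00252
```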

  20. Top-Down inference • Now, suppose we want to compute P(MaryCalls)

  21. Top-Down inference • Now, suppose we want to compute P(MaryCalls) • P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)

  22. Top-Down inference • Now, suppose we want to compute P(MaryCalls) • P(M) = P(M|A)P(A) + P(M|¬A)P(¬A) • P(M) = 0.70*0.00252 + 0.01*(1-0.00252) = 0.0117
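
Continuing the sketch: once P(Alarm) is known, P(MaryCalls) needs only Alarm's distribution, not the full joint. P(M|A)=0.70 and P(M|¬A)=0.01 are the values used on this slide:

```python
p_alarm = 0.00252                        # from the previous slide
p_m_given_a = {True: 0.70, False: 0.01}  # P(M=true | a)

p_mary = (p_m_given_a[True] * p_alarm
          + p_m_given_a[False] * (1 - p_alarm))
print(p_mary)  # ~0.0117
```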

  23. Top-Down inference with Evidence • Suppose we want to compute P(Alarm|Earthquake)

  24. Top-Down inference with Evidence • Suppose we want to compute P(A|e) • P(A|e) = Σb P(A,b|e) • P(A|e) = Σb P(A|b,e)P(b) (using P(b|e) = P(b), since B ⊥ E)

  25. Top-Down inference with Evidence • Suppose we want to compute P(A|e) • P(A|e) = Σb P(A,b|e) • P(A|e) = Σb P(A|b,e)P(b) • P(A|e) = 0.95*0.001 + 0.29*0.999 = 0.29066
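
With evidence Earthquake = true, the sum runs only over the remaining parent B; a minimal sketch with the same CPT:

```python
cpt_alarm = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}
p_burglary = 0.001

# P(A | E=true): sum over Burglary only; P(b|e) = P(b) since B and E
# are independent a priori.
p_a_given_e = sum(cpt_alarm[(b, True)] * (p_burglary if b else 1 - p_burglary)
                  for b in (True, False))
print(p_a_given_e)  # 0.29066
```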

  26. Top-Down inference • Only works if the graph of ancestors of a variable is a polytree • Evidence must be given on ancestor(s) of the query variable • Efficient: • O(d·2^k) time, where d is the number of ancestors of the variable and k is a bound on the number of parents • Evidence on an ancestor cuts off the influence of the portion of the graph above the evidence node

  27. Querying the BN (figure: Cavity → Toothache) • The BN gives P(T|C) • What about P(C|T)?

  28. Bayes’ Rule • P(A,B) = P(A|B) P(B) = P(B|A) P(A) • So… P(A|B) = P(B|A) P(A) / P(B)

  29. Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(B)?

  30. Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(B)? • P(B) = Σa P(B,A=a) [marginalization] • P(B,A=a) = P(B|A=a)P(A=a) [conditional probability] • So, P(B) = Σa P(B|A=a) P(A=a)

  31. Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(A|B)?

  32. Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(A|B)? • P(A|B) = P(B|A)P(A)/P(B) [Bayes’ rule] • P(B) = Σa P(B|A=a) P(A=a) [last slide] • So, P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)]

  33. How do we read this? • P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)] • [An equation that holds for all values A can take on, and all values B can take on] • P(A=a|B=b) = ?

  34. How do we read this? • P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)] • [An equation that holds for all values A can take on, and all values B can take on] • P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa P(B=b|A=a) P(A=a)] • Are these the same a?

  35. How do we read this? • P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)] • [An equation that holds for all values A can take on, and all values B can take on] • P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa P(B=b|A=a) P(A=a)] • Are these the same a? NO!

  36. How do we read this? • P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)] • [An equation that holds for all values A can take on, and all values B can take on] • P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa’ P(B=b|A=a’) P(A=a’)] • Be careful about indices!
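
As a sanity check on the indexing point, here is the formula as a small function; a minimal sketch in which the summation variable is a distinct loop variable, and the toy numbers are hypothetical (chosen to echo the cavity/toothache example):

```python
def posterior(prior, likelihood, a, b):
    """P(A=a | B=b) from P(A) and P(B|A) via Bayes' rule.

    prior:      dict a -> P(A=a)
    likelihood: dict (b, a) -> P(B=b | A=a)
    """
    numerator = likelihood[(b, a)] * prior[a]
    # The denominator sums over a *different* dummy variable a_prime;
    # reusing `a` here would shadow the query value.
    denominator = sum(likelihood[(b, a_prime)] * prior[a_prime]
                      for a_prime in prior)
    return numerator / denominator

# Hypothetical numbers, for illustration only:
prior = {"cavity": 0.1, "no_cavity": 0.9}
likelihood = {("toothache", "cavity"): 0.8, ("toothache", "no_cavity"): 0.05}
print(posterior(prior, likelihood, "cavity", "toothache"))  # 0.64
```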

  37. Querying the BN (figure: Cavity → Toothache) • The BN gives P(T|C) • What about P(C|T)? • P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache) [Bayes’ rule] • Querying a BN is just applying Bayes’ rule on a larger scale… • The denominator is computed by summing the numerator over Cavity and ¬Cavity

  38. Performing Inference • Variables X • We have evidence set E=e and query variable Q • We want to compute the posterior probability distribution over Q, given E=e • Let the non-evidence variables be Y (= X \ E) • Straightforward method: • Compute the joint P(Y, E=e) • Marginalize to get P(Q, E=e) • Divide by P(E=e) to get P(Q|E=e)
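
A minimal sketch of this straightforward (enumeration) method over binary variables, assuming the network is given as a parent list and a CPT per variable; the function and argument names are mine, not from the deck:

```python
from itertools import product

def enumerate_query(variables, parents, cpt, query, evidence):
    """P(query | evidence) by brute-force enumeration of the joint.

    variables: ordered list of names (parents before children)
    parents:   name -> tuple of parent names
    cpt:       name -> dict mapping (parent values...) -> P(var=True | parents)
    evidence:  name -> bool (must not contain the query variable)
    Returns {True: p, False: 1 - p} over the query variable.
    """
    weights = {True: 0.0, False: 0.0}
    hidden = [v for v in variables if v not in evidence]
    for values in product((True, False), repeat=len(hidden)):
        world = dict(evidence)
        world.update(zip(hidden, values))
        joint = 1.0
        for v in variables:  # chain rule over the network
            p_true = cpt[v][tuple(world[p] for p in parents[v])]
            joint *= p_true if world[v] else 1 - p_true
        weights[world[query]] += joint            # marginalize onto Q
    total = weights[True] + weights[False]        # = P(E=e)
    return {q: w / total for q, w in weights.items()}

# Tiny smoke test: one root node "R" with P(R=true)=0.3, no evidence.
print(enumerate_query(["R"], {"R": ()}, {"R": {(): 0.3}}, "R", {}))
```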

  39. Inference in the Alarm Example • P(J|M) = ?? • The evidence E=e is MaryCalls; the query Q is JohnCalls

  40. Inference in the Alarm Example • P(J|MaryCalls) = ?? • P(x1,x2,…,xn) = Πi=1,…,n P(xi|parents(Xi)) ← the full joint distribution table • 1. P(J,A,B,E,MaryCalls) = P(J|A)P(MaryCalls|A)P(A|B,E)P(B)P(E) ← 2^4 entries

  41. Inference in the Alarm Example • P(J|MaryCalls) = ?? • 1. P(J,A,B,E,MaryCalls) = P(J|A)P(MaryCalls|A)P(A|B,E)P(B)P(E) • 2. P(J,MaryCalls) = Σa,b,e P(J,A=a,B=b,E=e,MaryCalls) ← 2 entries: one for JohnCalls, the other for ¬JohnCalls

  42. Inference in the Alarm Example • P(J|MaryCalls) = ?? • 1. P(J,A,B,E,MaryCalls) = P(J|A)P(MaryCalls|A)P(A|B,E)P(B)P(E) • 2. P(J,MaryCalls) = Σa,b,e P(J,A=a,B=b,E=e,MaryCalls) • 3. P(J|MaryCalls) = P(J,MaryCalls)/P(MaryCalls) = P(J,MaryCalls)/(Σj P(j,MaryCalls))
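
Putting the three steps together for the alarm network; a minimal sketch. Most CPT values appear on earlier slides; P(J|A)=0.90 and P(J|¬A)=0.05 do not appear above and are assumed here (they are the standard values for this textbook network):

```python
from itertools import product

# CPTs: priors and P(child=true | parents).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # assumed standard values, see lead-in
P_M = {True: 0.70, False: 0.01}

def pr(p_true, value):
    return p_true if value else 1 - p_true

# Steps 1-2: sum the factored joint over a, b, e, with MaryCalls = true.
weights = {True: 0.0, False: 0.0}  # indexed by the value of J
for j, a, b, e in product((True, False), repeat=4):
    joint = (pr(P_J[a], j) * P_M[a] * pr(P_A[(b, e)], a)
             * pr(P_B, b) * pr(P_E, e))
    weights[j] += joint

# Step 3: normalize by P(MaryCalls) = sum over j.
p_j_given_m = weights[True] / (weights[True] + weights[False])
print(p_j_given_m)  # ~0.18: Mary's call raises P(Alarm), which raises P(J)
```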

  43. How expensive? • P(X) = P(x1,x2,…,xn) = Πi=1,…,n P(xi|parents(Xi)) • Straightforward method: • 1. Use the above to compute P(Y, E=e) • 2. P(Q, E=e) = Σy1 … Σyk P(Y, E=e) • 3. P(E=e) = Σq P(Q, E=e) • Step 1 generates O(2^(n-|E|)) entries! • The normalization factor is no big deal once we have P(Q, E=e) • Can we do better?

  44. Variable Elimination • Consider the linear network X1 → X2 → X3 • P(X) = P(X1) P(X2|X1) P(X3|X2) • P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2)

  45. Variable Elimination • Consider the linear network X1 → X2 → X3 • P(X) = P(X1) P(X2|X1) P(X3|X2) • P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2) = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1) ← rearrange the equation…

  46. Variable Elimination • Consider the linear network X1 → X2 → X3 • P(X) = P(X1) P(X2|X1) P(X3|X2) • P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2) = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1) = Σx2 P(X3|x2) P(x2) • The inner sum P(x2) is computed once for each value of X2, then cached and reused for both values of X3!

  47. Variable Elimination • Consider the linear network X1 → X2 → X3 • P(X) = P(X1) P(X2|X1) P(X3|X2) • P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2) = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1) = Σx2 P(X3|x2) P(x2) • How many * and + are saved? • *: 2*4*2 = 16 vs 4+4 = 8 • +: 2*3 = 6 vs 2+1 = 3 • This can lead to huge gains in larger networks
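
A minimal sketch of this elimination on the chain, with hypothetical CPT numbers just to make it runnable; the point is that the inner message over X2 is built once and reused for both values of X3:

```python
# Hypothetical CPTs for a binary chain X1 -> X2 -> X3 (illustration only).
p_x1 = {True: 0.3, False: 0.7}
p_x2_given_x1 = {(True, True): 0.8, (True, False): 0.1,   # (x2, x1) -> P
                 (False, True): 0.2, (False, False): 0.9}
p_x3_given_x2 = {(True, True): 0.6, (True, False): 0.4,   # (x3, x2) -> P
                 (False, True): 0.4, (False, False): 0.6}

# Eliminate X1: message m(x2) = sum_x1 P(x1) P(x2|x1).  Computed once;
# this is exactly P(x2).
m_x2 = {x2: sum(p_x1[x1] * p_x2_given_x1[(x2, x1)] for x1 in (True, False))
        for x2 in (True, False)}

# Eliminate X2, reusing the cached message for both values of X3.
p_x3 = {x3: sum(p_x3_given_x2[(x3, x2)] * m_x2[x2] for x2 in (True, False))
        for x3 in (True, False)}
print(p_x3)  # a proper distribution: the two values sum to 1
```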

  48. VE in Alarm Example • P(E|j,m)=P(E,j,m)/P(j,m) • P(E,j,m) = ΣaΣb P(E) P(b) P(a|E,b) P(j|a) P(m|a)

  49. VE in Alarm Example • P(E|j,m)=P(E,j,m)/P(j,m) • P(E,j,m) = ΣaΣb P(E) P(b) P(a|E,b) P(j|a) P(m|a) = P(E) Σb P(b) Σa P(a|E,b) P(j|a) P(m|a)

  50. VE in Alarm Example • P(E|j,m) = P(E,j,m)/P(j,m) • P(E,j,m) = Σa Σb P(E) P(b) P(a|E,b) P(j|a) P(m|a) = P(E) Σb P(b) Σa P(a|E,b) P(j|a) P(m|a) = P(E) Σb P(b) P(j,m|E,b) • The inner factor P(j,m|E,b) is computed for all values of E and b
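
A minimal sketch of this elimination order on the alarm network, with the same assumption as before that P(J|A)=0.90 and P(J|¬A)=0.05 (the standard textbook values; the rest are from earlier slides):

```python
# CPTs; the P(J|A) values are assumed, see lead-in.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls=true | a)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls=true | a)

def pr(p_true, v):
    return p_true if v else 1 - p_true

# Eliminate A first: g(e, b) = sum_a P(a|e,b) P(j|a) P(m|a) = P(j,m|e,b).
g = {(e, b): sum(pr(P_A[(b, e)], a) * P_J[a] * P_M[a] for a in (True, False))
     for e in (True, False) for b in (True, False)}

# Then eliminate B: P(E=e, j, m) = P(e) * sum_b P(b) g(e, b).
unnorm = {e: pr(P_E, e) * sum(pr(P_B, b) * g[(e, b)] for b in (True, False))
          for e in (True, False)}

# Normalize by P(j, m) to get the posterior over Earthquake.
z = unnorm[True] + unnorm[False]
print({e: v / z for e, v in unnorm.items()})  # P(E=true | j, m) ~ 0.18
```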
