
Belief Networks

This presentation explains probabilistic belief networks and how they can be used to update belief strengths when new evidence is observed.



Presentation Transcript


  1. Belief Networks Russell and Norvig: Chapter 15 CS121 – Winter 2002

  2. Other Names • Bayesian networks • Probabilistic networks • Causal networks

  3. Probabilistic Agent
  [Diagram: agent connected to the environment through sensors and actuators, with a question mark for its uncertain internal state]
  "I believe that the sun will still exist tomorrow with probability 0.999999 and that it will be sunny with probability 0.6"

  4. Probabilistic Belief
  • There are several possible worlds that are indistinguishable to an agent given some prior evidence.
  • The agent believes that a logic sentence B is True with probability p and False with probability 1-p. B is called a belief.
  • In the frequency interpretation of probabilities, this means that the agent believes that the fraction of possible worlds that satisfy B is p.
  • The distribution (p, 1-p) is the strength of B.

  5. Problem
  • At a certain time t, the KB of an agent is some collection of beliefs
  • At time t the agent's sensors make an observation that changes the strength of one of its beliefs
  • How should the agent update the strength of its other beliefs?

  6. Toothache Example
  • A certain dentist is only interested in two things about any patient: whether he has a toothache and whether he has a cavity
  • Over years of practice, she has constructed the following joint distribution:
  [Table image not preserved in the transcript; the numbers used on the following slides are consistent with P(T∧C) = 0.04, P(T∧¬C) = 0.01, P(¬T∧C) = 0.06, P(¬T∧¬C) = 0.89]

  7. Toothache Example
  • Using the joint distribution, the dentist can compute the strength of any logic sentence built with the propositions Toothache and Cavity
  • In particular, this distribution implies that the prior probability of Toothache is 0.05:
    P(T) = P((T∧C) ∨ (T∧¬C)) = P(T∧C) + P(T∧¬C)

  8. New Evidence
  • She now makes an observation E indicating that a specific patient x has a high probability (0.8) of having a toothache; E is not directly related to whether he has a cavity

  9. Adjusting the Joint Distribution
  • She now makes an observation E indicating that a specific patient x has a high probability (0.8) of having a toothache; E is not directly related to whether he has a cavity
  • She can use this additional information to create a joint distribution (specific to x) conditional on E, by keeping the same probability ratios between Cavity and ¬Cavity
  • The probability of Cavity, which was 0.10, is now (knowing E) 0.6526

  10. Corresponding Calculus
  • P(C|T) = P(C∧T)/P(T) = 0.04/0.05

  11. Corresponding Calculus
  • P(C|T) = P(C∧T)/P(T) = 0.04/0.05
  • P(C∧T|E) = P(C|T,E) P(T|E) = P(C|T) P(T|E)
  (C and E are independent given T)

  12. Corresponding Calculus
  • P(C|T) = P(C∧T)/P(T) = 0.04/0.05
  • P(C∧T|E) = P(C|T,E) P(T|E) = P(C|T) P(T|E) = (0.04/0.05) × 0.8 = 0.64
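
  Only P(T∧C) = 0.04 and P(T) = 0.05 appear explicitly on these slides; here is a minimal Python sketch of the update, with the remaining joint-distribution cells filled in as assumptions consistent with the 0.6526 result above:

    # Joint distribution over (Toothache, Cavity); the 0.06 and 0.89 cells are
    # assumptions consistent with the slides' numbers, not given explicitly.
    joint = {(True, True): 0.04, (True, False): 0.01,
             (False, True): 0.06, (False, False): 0.89}

    p_T = joint[(True, True)] + joint[(True, False)]       # P(T) = 0.05
    p_C_given_T = joint[(True, True)] / p_T                # P(C|T) = 0.8
    p_C_given_notT = joint[(False, True)] / (1 - p_T)      # P(C|~T)

    # New evidence E: P(T|E) = 0.8, and E is independent of C given T,
    # so condition on T and sum out both of its values.
    p_T_given_E = 0.8
    p_C_given_E = (p_C_given_T * p_T_given_E
                   + p_C_given_notT * (1 - p_T_given_E))
    print(round(p_C_given_E, 4))   # 0.6526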

  13. Generalization
  • n beliefs X1,…,Xn
  • The joint distribution can be used to update probabilities when new evidence arrives
  But:
  • The joint distribution contains 2^n probabilities
  • Useful independence is not made explicit
  Example: if the dentist knows that the patient has a cavity (C), the probability of a probe catching in the tooth (K) does not depend on the presence of a toothache (T). More formally:
    P(T|C,K) = P(T|C) and P(K|C,T) = P(K|C)
  [Diagram: Cavity as parent of both Toothache and Catch]

  14. Purpose of Belief Networks
  • Facilitate the description of a collection of beliefs by making explicit causality relations and conditional independence among beliefs
  • Provide a more efficient way (than by using joint distribution tables) to update belief strengths when new evidence is observed

  15. Alarm Example
  Five beliefs:
  • A: Alarm
  • B: Burglary
  • E: Earthquake
  • J: JohnCalls
  • M: MaryCalls

  16. A Simple Belief Network
  [Diagram: Burglary and Earthquake (causes) → Alarm → JohnCalls and MaryCalls (effects)]
  • Directed acyclic graph (DAG)
  • Nodes are beliefs
  • Intuitive meaning of an arrow from x to y: "x has direct influence on y"

  17. Assigning Probabilities to Roots
  [Diagram: the alarm network with prior probabilities attached to the root nodes; from the calculation on slide 21, P(B) = 0.001 and P(E) = 0.002]

  18. Conditional Probability Tables
  [Diagram: the alarm network with a CPT attached to Alarm]
  Size of the CPT for a node with k parents: 2^k

  19. Conditional Probability Tables
  [Diagram: the alarm network with its remaining conditional probability tables]

  20. What the BN Means
  P(x1,x2,…,xn) = Πi=1,…,n P(xi|Parents(Xi))

  21. Calculation of Joint Probability
  P(J∧M∧A∧¬B∧¬E)
    = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
    = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
    = 0.00062
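
  A short Python sketch of this chain-rule evaluation. Only the five factors shown on the slide are given there; the remaining CPT entries below are the standard Russell & Norvig alarm-network values, assumed here for completeness:

    # Priors and CPTs; entries not used on slide 21 are assumed (standard
    # Russell & Norvig values), not taken from the slides.
    P_B, P_E = 0.001, 0.002
    P_A = {(True, True): 0.95, (True, False): 0.94,    # P(Alarm | B, E)
           (False, True): 0.29, (False, False): 0.001}
    P_J = {True: 0.90, False: 0.05}                    # P(JohnCalls | Alarm)
    P_M = {True: 0.70, False: 0.01}                    # P(MaryCalls | Alarm)

    def joint(j, m, a, b, e):
        """P(J=j, M=m, A=a, B=b, E=e) as the product of one CPT entry per node."""
        def pick(p_true, x):   # P(X=x) from P(X=True)
            return p_true if x else 1 - p_true
        return (pick(P_J[a], j) * pick(P_M[a], m) *
                pick(P_A[(b, e)], a) * pick(P_B, b) * pick(P_E, e))

    print(joint(True, True, True, False, False))   # ≈ 0.00062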

  22. What the BN Encodes
  [Diagram: the alarm network]
  • Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm
  • The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm
  For example, John does not observe any burglaries directly

  23. What the BN Encodes
  [Diagram: the alarm network]
  • Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm
  • The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm
  For instance, the reasons why John and Mary may not call if there is an alarm are unrelated
  Note that these reasons could be other beliefs in the network. The probabilities summarize these non-explicit beliefs

  24. Structure of BN
  • The relation P(x1,x2,…,xn) = Πi=1,…,n P(xi|Parents(Xi)) means that each belief is independent of its predecessors in the BN given its parents
  • Said otherwise, the parents of a belief Xi are all the beliefs that "directly influence" Xi
  • Usually (but not always) the parents of Xi are its causes and Xi is the effect of these causes
  E.g., JohnCalls is influenced by Burglary, but not directly. JohnCalls is directly influenced by Alarm

  25. Construction of BN
  • Choose the relevant sentences (random variables) that describe the domain
  • Select an ordering X1,…,Xn so that all the beliefs that directly influence Xi come before Xi
  • For j=1,…,n do:
    • Add a node in the network labeled by Xj
    • Connect the nodes of its parents to Xj
    • Define the CPT of Xj
  • The ordering guarantees that the BN will have no cycles
  • The CPT guarantees that exactly the correct number of probabilities will be defined: none missing, none extra
  • Use canonical distributions, e.g., noisy-OR, to fill CPTs (see the sketch below)
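
  As referenced in the last bullet, a minimal sketch of the noisy-OR shortcut: specify one inhibition probability per parent and derive the full CPT from k numbers instead of 2^k. The parent names and probabilities below are illustrative, not from the slides:

    from itertools import product

    # Noisy-OR: each parent, when True, independently fails to cause the
    # effect with probability q[parent]; the effect is False only if every
    # True parent's influence is inhibited.
    q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}   # inhibition probabilities

    def noisy_or_cpt(q):
        """Full CPT P(Effect=True | parent assignment) from k numbers."""
        names = sorted(q)
        cpt = {}
        for assignment in product([False, True], repeat=len(names)):
            p_false = 1.0
            for parent, value in zip(names, assignment):
                if value:
                    p_false *= q[parent]
            cpt[assignment] = 1 - p_false
        return cpt

    for row, p in noisy_or_cpt(q).items():
        print(row, round(p, 3))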

  26. Locally Structured Domain
  • Size of CPT: 2^k, where k is the number of parents
  • In a locally structured domain, each belief is directly influenced by relatively few other beliefs and k is small
  • BN are better suited for locally structured domains

  27. Inference in BN
  • Set E of evidence variables that are observed with a new probability distribution, e.g., {JohnCalls, MaryCalls}
  • Query variable X, e.g., Burglary, for which we would like to know the posterior probability distribution P(X|E)
  P(X|obs) = Σe P(X|e) P(e|obs), where e is an assignment of values to the evidence variables

  J M | P(B|…)
  T T |   ?
  T F |   ?
  F T |   ?
  F F |   ?
  Note the compactness of the notation: this is the distribution conditional on the observations made

  28. Inference Patterns
  [Diagram: four copies of the alarm network illustrating causal, diagnostic, mixed, and intercausal inference]
  • Basic use of a BN: given new observations, compute the new strengths of some (or all) beliefs
  • Other use: given the strength of a belief, which observation should we gather to make the greatest change in this belief's strength?

  29. Singly Connected BN
  • A BN is singly connected if there is at most one undirected path between any two nodes
  [Diagrams: the alarm network is singly connected; a network with two undirected paths between some pair of nodes is not]

  30. Types of Nodes on a Path
  [Diagram: car network — Battery → Radio, Battery → SparkPlugs, Gas → Starts, SparkPlugs → Starts, Starts → Moves — with examples of the three node types: on the path Gas–Starts–SparkPlugs–Battery–Radio, SparkPlugs is linear, Battery is diverging, and Starts is converging]

  31. Independence Relations in BN
  [Diagram: the car network above]
  Given a set E of evidence nodes, two beliefs connected by an undirected path are independent if one of the following three conditions holds:
  1. A node on the path is linear and in E
  2. A node on the path is diverging and in E
  3. A node on the path is converging and neither this node nor any of its descendants is in E

  32. Independence Relations in BN
  [Diagram: the car network above]
  Given a set E of evidence nodes, two beliefs connected by an undirected path are independent if one of the following three conditions holds:
  1. A node on the path is linear and in E
  2. A node on the path is diverging and in E
  3. A node on the path is converging and neither this node nor any of its descendants is in E
  Gas and Radio are independent given evidence on SparkPlugs (condition 1: SparkPlugs is linear and in E)

  33. Independence Relations in BN
  [Diagram: the car network above]
  Given a set E of evidence nodes, two beliefs connected by an undirected path are independent if one of the following three conditions holds:
  1. A node on the path is linear and in E
  2. A node on the path is diverging and in E
  3. A node on the path is converging and neither this node nor any of its descendants is in E
  Gas and Radio are independent given evidence on Battery (condition 2: Battery is diverging and in E)

  34. Independence Relations in BN
  [Diagram: the car network above]
  Given a set E of evidence nodes, two beliefs connected by an undirected path are independent if one of the following three conditions holds:
  1. A node on the path is linear and in E
  2. A node on the path is diverging and in E
  3. A node on the path is converging and neither this node nor any of its descendants is in E
  Gas and Radio are independent given no evidence (condition 3: Starts is converging and not in E), but they are dependent given evidence on Starts or Moves
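
  A small Python sketch of the three blocking conditions, applied to the single undirected path between Gas and Radio in the car network. The encoding and function names are ours; full d-separation would check every such path, but this network has only one:

    # Car network from slide 30, encoded as a parents map (our encoding).
    parents = {
        "Battery": [], "Gas": [],
        "Radio": ["Battery"], "SparkPlugs": ["Battery"],
        "Starts": ["SparkPlugs", "Gas"], "Moves": ["Starts"],
    }

    def descendants(node):
        """All strict descendants of node (children, grandchildren, ...)."""
        kids = [c for c, ps in parents.items() if node in ps]
        out = set(kids)
        for k in kids:
            out |= descendants(k)
        return out

    def independent(path, evidence):
        """True if the endpoints of this undirected path are independent
        given the evidence set, i.e., some interior node on the path
        satisfies one of the three conditions on slide 31."""
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            converging = prev in parents[node] and nxt in parents[node]
            if converging:
                if node not in evidence and not descendants(node) & evidence:
                    return True      # condition 3
            elif node in evidence:
                return True          # condition 1 (linear) or 2 (diverging)
        return False

    path = ["Gas", "Starts", "SparkPlugs", "Battery", "Radio"]
    print(independent(path, {"SparkPlugs"}))  # True  (slide 32)
    print(independent(path, {"Battery"}))     # True  (slide 33)
    print(independent(path, set()))           # True  (slide 34)
    print(independent(path, {"Starts"}))      # False (slide 34: dependent)
    print(independent(path, {"Moves"}))       # False (slide 34: dependent)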

  35. Answering Query P(X|E)
  [Diagram: node X with parents U1,…,Um and children Y1,…,Yp]

  36. Computing P(X|E+)
  [Diagram: node X with parents U1,…,Um; the recursion descends into the sub-network above X — the same problem on a subset of the BN — and one factor is given by the CPT of node X]
  • Recursive back-chaining algorithm
  • Recursion ends when reaching an evidence node, a root, or a leaf node

  37. Example: Sonia's Office
  O: Sonia is in her office
  L: Lights are on in Sonia's office
  C: Sonia is logged on to her computer
  [Network: O → L, O → C]
  P(O) = 0.4
  O | P(L|O) | P(C|O)
  T |  0.6   |  0.8
  F |  0.1   |  0.3
  We observe L=True
  What is the probability of C given this observation? → Compute P(C|L=T)

  38. Example: Sonia's Office
  [Same network and CPTs as slide 37]
  P(C|L=T)

  39. Example: Sonia's Office
  P(C|L=T) = P(C|O=T) P(O=T|L=T) + P(C|O=F) P(O=F|L=T)

  40. Example: Sonia's Office
  P(C|L=T) = P(C|O=T) P(O=T|L=T) + P(C|O=F) P(O=F|L=T)
  P(O|L) = P(O∧L) / P(L) = P(L|O) P(O) / P(L)

  41. Example: Sonia's Office
  P(C|L=T) = P(C|O=T) P(O=T|L=T) + P(C|O=F) P(O=F|L=T)
  P(O|L) = P(O∧L) / P(L) = P(L|O) P(O) / P(L)
  P(O=T|L=T) = 0.24/P(L)
  P(O=F|L=T) = 0.06/P(L)

  42. Example: Sonia's Office
  P(C|L=T) = P(C|O=T) P(O=T|L=T) + P(C|O=F) P(O=F|L=T)
  P(O|L) = P(O∧L) / P(L) = P(L|O) P(O) / P(L)
  P(O=T|L=T) = 0.24/P(L) = 0.8
  P(O=F|L=T) = 0.06/P(L) = 0.2

  43. Example: Sonia's Office
  P(C|L=T) = P(C|O=T) P(O=T|L=T) + P(C|O=F) P(O=F|L=T)
  P(O|L) = P(O∧L) / P(L) = P(L|O) P(O) / P(L)
  P(O=T|L=T) = 0.24/P(L) = 0.8
  P(O=F|L=T) = 0.06/P(L) = 0.2
  P(C|L=T) = 0.8 × 0.8 + 0.3 × 0.2
  P(C|L=T) = 0.7
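
  The same derivation as straight-line Python, using the CPT values from slide 37 (variable names are ours):

    # CPTs from slide 37: P(O)=0.4; P(L|O): T->0.6, F->0.1; P(C|O): T->0.8, F->0.3
    p_O = 0.4
    p_L_given_O = {True: 0.6, False: 0.1}
    p_C_given_O = {True: 0.8, False: 0.3}

    # Bayes' rule for P(O|L=T): proportional to P(L=T|O) P(O)
    unnorm = {o: p_L_given_O[o] * (p_O if o else 1 - p_O) for o in (True, False)}
    p_L = sum(unnorm.values())                         # P(L=T) = 0.24 + 0.06
    p_O_given_L = {o: unnorm[o] / p_L for o in unnorm} # {True: 0.8, False: 0.2}

    # Condition on O and sum it out: P(C|L=T) = sum_o P(C|o) P(o|L=T)
    p_C_given_L = sum(p_C_given_O[o] * p_O_given_L[o] for o in (True, False))
    print(round(p_C_given_L, 4))   # 0.7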

  44. Complexity
  • The back-chaining algorithm considers each node at most once
  • It takes linear time in the number of beliefs
  • But it computes P(X|E) for only one X
  • Repeating the computation for every belief takes quadratic time
  • By forward-chaining from E and clever bookkeeping, P(X|E) can be computed for all X in linear time
  • If successive observations are made over time, the forward-chaining update is invoked after every new observation

  45. Multiply-Connected BN
  [Diagram: a multiply-connected network A → B, A → C, B → D, C → D; the "solution" replaces B and C with a single merged node B,C]
  • To update the probability of D given some evidence E, we need to know how both B and C depend on E
  • "Solution": merge B and C into a single node
  • But this solution takes exponential time in the worst case
  • In fact, inference with multiply-connected BN is NP-hard

  46. Stochastic Simulation
  [Network: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass]
  P(WetGrass|Cloudy)?
  P(WetGrass|Cloudy) = P(WetGrass ∧ Cloudy) / P(Cloudy)
  1. Repeat N times:
     1.1. Guess Cloudy at random
     1.2. For each guess of Cloudy, guess Sprinkler and Rain, then WetGrass
  2. Compute the ratio of the # of runs where WetGrass and Cloudy are True over the # of runs where Cloudy is True
  (see the sketch below)
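
  A runnable sketch of this simulation. The network structure is from the slide, but the CPT values are not given there; the numbers below are the standard textbook values for this example, assumed for illustration:

    import random

    # Assumed CPTs (standard Russell & Norvig sprinkler numbers, not from slide)
    P_C = 0.5
    P_S = {True: 0.1, False: 0.5}             # P(Sprinkler | Cloudy)
    P_R = {True: 0.8, False: 0.2}             # P(Rain | Cloudy)
    P_W = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.0}   # P(WetGrass | S, R)

    def sample():
        """Guess each variable in topological order, as in steps 1.1-1.2."""
        c = random.random() < P_C
        s = random.random() < P_S[c]
        r = random.random() < P_R[c]
        w = random.random() < P_W[(s, r)]
        return c, w

    def estimate(n=100_000):
        """Step 2: ratio of runs with WetGrass and Cloudy over runs with Cloudy."""
        both = cloudy = 0
        for _ in range(n):
            c, w = sample()
            cloudy += c
            both += c and w
        return both / cloudy   # ≈ P(WetGrass | Cloudy)

    print(estimate())   # ≈ 0.745 with these assumed CPTs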

  47. Applications
  • http://excalibur.brc.uconn.edu/~baynet/researchApps.html
  • Medical diagnosis, e.g., lymph-node diseases
  • Fraud/uncollectible debt detection
  • Troubleshooting of hardware/software systems

  48. Summary
  • Belief update
  • Role of conditional independence
  • Belief networks
  • Causality ordering
  • Inference in BN
  • Back-chaining for singly-connected BN
  • Stochastic simulation
