
Probabilistic Representation and Reasoning


Presentation Transcript


  1. Probabilistic Representation and Reasoning • Given a set of facts/beliefs/rules/evidence • Evaluate a given statement • Determine the probability of a statement • Find a statement that optimizes a set of constraints • Most probable explanation (MPE) (Setting of hidden variables that best explains observations.) (c) 2003 Thomas G. Dietterich

  2. Probability Theory • Random Variables • Boolean: W1,2 (just like propositional logic). Two possible values {true, false} • Discrete: Weather ∈ {sunny, cloudy, rainy, snow} • Continuous: Temperature ∈ ℝ • Propositions • W1,2 = true, Weather = sunny, Temperature = 65 • These can be combined as in propositional logic (c) 2003 Thomas G. Dietterich

  3. Example • Consider a car described by 3 random variables: • Gas ∈ {true, false}: There is gas in the tank • Meter ∈ {empty, full}: The gas gauge shows the tank is empty or full • Starts ∈ {yes, no}: The car starts when you turn the key in the ignition (c) 2003 Thomas G. Dietterich

  4. Joint Probability Distribution • Each row is called a “primitive event” • Rows are mutually exclusive and exhaustive • Corresponds to an “8-sided coin” with the indicated probabilities (c) 2003 Thomas G. Dietterich

  5. Any Query Can Be Answered from the Joint Distribution • P(Gas = false ∧ Meter = full ∧ Starts = yes) = 0.0006 • P(Gas = false) = 0.2, this is the sum of all cells where Gas = false • In general: To compute P(Q), for any proposition Q, add up the probability in all cells where Q is true (c) 2003 Thomas G. Dietterich
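
A minimal Python sketch of this idea, representing the joint distribution as a dictionary keyed by primitive events. Only P(Gas=false) = 0.2 and P(Gas=false ∧ Meter=full ∧ Starts=yes) = 0.0006 are taken from the slides; the remaining cell values are made up for illustration.

```python
# Illustrative joint distribution over (Gas, Meter, Starts).
# Each key is one primitive event; the eight probabilities sum to 1.
joint = {
    (True,  "full",  "yes"): 0.576,  (True,  "full",  "no"): 0.064,
    (True,  "empty", "yes"): 0.024,  (True,  "empty", "no"): 0.136,
    (False, "full",  "yes"): 0.0006, (False, "full",  "no"): 0.0294,
    (False, "empty", "yes"): 0.001,  (False, "empty", "no"): 0.169,
}

def prob(query):
    """P(Q): add up the probability of every primitive event where Q holds."""
    return sum(p for event, p in joint.items() if query(*event))

print(prob(lambda gas, meter, starts: not gas))   # P(Gas = false) ≈ 0.2
print(prob(lambda gas, meter, starts:
           not gas and meter == "full" and starts == "yes"))   # 0.0006
```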

  6. Notations • P(G,M,S) denotes the entire joint distribution (In the book, P is boldface). It is a table or function that maps from G, M, and S to a probability. • P(true,empty,no) denotes a single probability value: P(Gas=true ∧ Meter=empty ∧ Starts=no) (c) 2003 Thomas G. Dietterich

  7. Operations on Probability Tables (1) • Marginalization (“summing away”) • Σ_{M,S} P(G,M,S) = P(G) • P(G) is called a “marginal probability” distribution. It consists of two probabilities: P(Gas=true) and P(Gas=false). (c) 2003 Thomas G. Dietterich
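
Continuing with the `joint` dictionary from the sketch after slide 5, marginalization is just summing together the rows that agree on the variables we keep. A hedged sketch, not the slides' own code:

```python
from collections import defaultdict

def marginalize(table, keep):
    """Sum away every variable except those at the positions listed in `keep`."""
    out = defaultdict(float)
    for event, p in table.items():
        out[tuple(event[i] for i in keep)] += p
    return dict(out)

# P(G) = sum over M and S of P(G,M,S): keep position 0 (Gas) of the joint above
print(marginalize(joint, keep=[0]))   # roughly {(True,): 0.8, (False,): 0.2}
```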

  8. Conditional Probability • Suppose we observe that M=full. What is the probability that the car will start? • P(S=yes | M=full) • Definition: P(A|B) = P(A ∧ B) / P(B) (c) 2003 Thomas G. Dietterich

  9. Conditional Probability • Select cells that match the condition (M=full) • Delete remaining cells and M column • Renormalize the table to obtain P(S,G | M=full) • Sum away Gas: Σ_G P(S,G | M=full) = P(S | M=full) • Read answer from P(S=yes | M=full) cell (c) 2003 Thomas G. Dietterich
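
The same recipe in Python, again using the illustrative `joint` dictionary from the earlier sketch (so the final number will not match the slides' table):

```python
# Step 1-2: select the M=full cells of the joint and drop the rest
selected = {(g, m, s): p for (g, m, s), p in joint.items() if m == "full"}

# Step 3: renormalize to obtain P(S, G | M=full)
Z = sum(selected.values())                                  # this is P(M=full)
P_SG_given_full = {k: p / Z for k, p in selected.items()}

# Step 4: sum away Gas to obtain P(S | M=full)
P_S_given_full = {}
for (g, m, s), p in P_SG_given_full.items():
    P_S_given_full[s] = P_S_given_full.get(s, 0.0) + p

# Step 5: read the answer from the S=yes cell
print(P_S_given_full["yes"])
```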

  10. Operations on Probability Tables (2): Conditionalizing • Construct P(G,S | M) by normalizing the subtable corresponding to M=full and normalizing the subtable corresponding to M=empty (c) 2003 Thomas G. Dietterich

  11. Chain Rule of Probability • P(A,B,C) = P(A|B,C) · P(B|C) · P(C) • Proof: P(A|B,C) · P(B|C) · P(C) = [P(A,B,C) / P(B,C)] · [P(B,C) / P(C)] · P(C) = P(A,B,C), by the definition of conditional probability. (c) 2003 Thomas G. Dietterich

  12. Chain Rule (2) • Holds for distributions too: P(A,B,C) = P(A | B,C) · P(B | C) · P(C) This means that for each setting of A, B, and C, we can substitute into the equation, and it is true. (c) 2003 Thomas G. Dietterich

  13. Belief Networks (1): Independence • Defn: Two random variables X and Y are independent iff P(X,Y) = P(X) · P(Y) • Example: • X is a coin with P(X=heads) = 0.4 • Y is a coin with P(Y=heads) = 0.8 • Joint distribution: (c) 2003 Thomas G. Dietterich
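
The joint table for the two coins follows directly from the definition; a quick check in Python, using exactly the marginals given on the slide:

```python
# Independent coins: the joint distribution is the product of the marginals.
P_X = {"heads": 0.4, "tails": 0.6}
P_Y = {"heads": 0.8, "tails": 0.2}

P_XY = {(x, y): P_X[x] * P_Y[y] for x in P_X for y in P_Y}
print({k: round(v, 2) for k, v in P_XY.items()})
# {('heads', 'heads'): 0.32, ('heads', 'tails'): 0.08,
#  ('tails', 'heads'): 0.48, ('tails', 'tails'): 0.12}
```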

  14. Belief Networks (2): Conditional Independence • Defn: Two random variables X and Y are conditionally independent given Z iff P(X,Y | Z) = P(X|Z) · P(Y|Z) • Example: P(S,M | G) = P(S | G) · P(M | G) Intuition: G independently causes S and M (c) 2003 Thomas G. Dietterich

  15. Operations on Probability Tables (3): Conformal Product • Allocate space for the resulting table and then fill in each cell with the product of the corresponding cells: P(S,M | G) = P(S | G) · P(M | G) (c) 2003 Thomas G. Dietterich
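
A sketch of the conformal product P(S,M | G) = P(S | G) · P(M | G) in Python. The CPT numbers here are hypothetical (the slides' tables are not reproduced); the point is that every cell of the result is the product of the matching cells of the operands.

```python
# Hypothetical conditional tables, keyed by (child value, parent value)
P_S_given_G = {("yes", True): 0.95, ("no", True): 0.05,
               ("yes", False): 0.01, ("no", False): 0.99}
P_M_given_G = {("full", True): 0.9, ("empty", True): 0.1,
               ("full", False): 0.05, ("empty", False): 0.95}

# Conformal product: one cell for every joint setting of S, M, and G,
# filled with the product of the corresponding operand cells.
P_SM_given_G = {(s, m, g): P_S_given_G[(s, g)] * P_M_given_G[(m, g)]
                for s in ("yes", "no")
                for m in ("full", "empty")
                for g in (True, False)}
print(P_SM_given_G[("yes", "full", True)])   # 0.95 * 0.9 ≈ 0.855
```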

  16. Properties of Conformal Products • Commutative • Associative • Work on normalized or unnormalized tables • Work on joint or conditional tables (c) 2003 Thomas G. Dietterich

  17. Conditional Independence Allows Us to Simplify the Joint Distribution P(G,M,S) = P(M,S | G) · P(G) [chain rule] = P(M | G) · P(S | G) · P(G) [CI] (c) 2003 Thomas G. Dietterich

  18. Bayesian Networks • One node for each random variable • Each node stores a probability distribution P(node | parents(node)) • Only direct dependencies are shown • Joint distribution is conformal product of node distributions: P(G,M,S) = P(G) · P(M | G) · P(S | G) • (Diagram: Gas is the parent of both Meter and Starts)

  19. Inference in Bayesian Networks • Suppose we observe that M=full. What is the probability that the car will start? • P(S=yes | M=full) • Before, we handled this by the following steps: • Remove all rows corresponding to M=empty • Normalize remaining rows to get P(S,G | M=full) • Sum over G: Σ_G P(S,G | M=full) = P(S | M=full) • Read answer from the S=yes entry in the table • We want to get the same result, but without constructing the joint distribution first. (c) 2003 Thomas G. Dietterich

  20. Inference in Bayesian Networks (2) • Remove all rows corresponding to M=empty from all nodes • P(G) – unchanged • P(M | G) becomes P[G] • P(S | G) – unchanged • Sum over G: Σ_G P(G) · P[G] · P(S | G) • Normalize to get P(S | M=full) • Read answer from the S=yes entry in the table (c) 2003 Thomas G. Dietterich
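
The same procedure in Python with hypothetical CPT numbers (so the result will not be the slides' 0.6469): restrict P(M | G) to the evidence, multiply with P(G) and P(S | G), sum away G, and normalize.

```python
# Illustrative CPTs for the Gas / Meter / Starts network (not the slides' numbers)
P_G = {True: 0.8, False: 0.2}                        # P(Gas)
P_M_given_G = {True: {"full": 0.9, "empty": 0.1},    # P(Meter | Gas)
               False: {"full": 0.05, "empty": 0.95}}
P_S_given_G = {True: {"yes": 0.95, "no": 0.05},      # P(Starts | Gas)
               False: {"yes": 0.01, "no": 0.99}}

# Step 1: apply evidence M=full, turning P(M | G) into a table over G only ("P[G]")
P_evidence = {g: P_M_given_G[g]["full"] for g in P_G}

# Step 2: sum over G of P(G) * P[G] * P(S | G), leaving an unnormalized table over S
unnorm = {s: sum(P_G[g] * P_evidence[g] * P_S_given_G[g][s] for g in P_G)
          for s in ("yes", "no")}

# Step 3: normalize to get P(S | M=full) and read off the S=yes entry
Z = sum(unnorm.values())
print(unnorm["yes"] / Z)
```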

  21. Inference with Tables (c) 2003 Thomas G. Dietterich

  22. Inference with Tables Step 1: Delete M=empty rows from all tables (c) 2003 Thomas G. Dietterich

  23. Inference with Tables Step 1: Delete M=empty rows from all tables Step 2: Perform algebra to push summation inwards (no-op in this case) (c) 2003 Thomas G. Dietterich

  24. Inference with Tables Step 1: Delete M=empty rows from all tables Step 2: Perform algebra to push summation inwards (no-op in this case) Step 3: Form conformal product (c) 2003 Thomas G. Dietterich

  25. Inference with Tables Step 1: Delete M=empty rows from all tables Step 2: Perform algebra to push summation inwards (no-op in this case) Step 3: Form conformal product Step 4: Sum away G (c) 2003 Thomas G. Dietterich

  26. Inference with Tables Step 1: Delete M=empty rows from all tables Step 2: Perform algebra to push summation inwards (no-op in this case) Step 3: Form conformal product Step 4: Sum away G Step 5: Normalize (c) 2003 Thomas G. Dietterich

  27. Inference with Tables Step 1: Delete M=empty rows from all tables Step 2: Perform algebra to push summation inwards (no-op in this case) Step 3: Form conformal product Step 4: Sum away G Step 5: Normalize Step 6: Read answer from table: 0.6469 (c) 2003 Thomas G. Dietterich

  28. Notes • We never created the joint distribution • Deleting the M=empty rows from the individual tables followed by the conformal product has the same effect as performing the conformal product first and then deleting the M=empty rows • Normalization can be postponed to the end (c) 2003 Thomas G. Dietterich

  29. Another Example: “Asia” (all variables Boolean) • Nodes: Cold, Allergy, Cat, Sneeze, Scratch (Cold and Allergy are the parents of Sneeze; Cat is the parent of Allergy and of Scratch) • Suppose we observe Sneeze • What is P(Cold | Sneeze) = P(Co | Sn)? (c) 2003 Thomas G. Dietterich

  30. Answering the query • Joint distribution (with the unobserved variables summed out): Σ_{A,Ca,Sc} P(Co) · P(Sn | Co,A) · P(Ca) · P(A | Ca) · P(Sc | Ca) • Apply evidence sn (Sneeze = true): Σ_{A,Ca,Sc} P(Co) · P[Co,A] · P(Ca) · P(A | Ca) · P(Sc | Ca) • Push summations in as far as possible: P(Co) · Σ_A P[Co,A] · Σ_Ca P(A | Ca) · P(Ca) · Σ_Sc P(Sc | Ca) • Evaluate: P(Co) · Σ_A P[Co,A] · Σ_Ca P(A | Ca) · P(Ca) · P[Ca] = P(Co) · Σ_A P[Co,A] · P[A] = P(Co) · P[Co] • Normalize and extract answer (c) 2003 Thomas G. Dietterich
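
A sketch of this elimination sequence in Python. The CPT numbers are hypothetical (the slides do not give them); the point is the order of operations: Scratch sums to 1 and drops out, Cat is summed away into P[A], Allergy into P[Co], and the product with the prior P(Co) is normalized.

```python
# Hypothetical CPTs for the Cold / Allergy / Cat / Sneeze network
P_Co = {True: 0.1, False: 0.9}                       # P(Cold)
P_Ca = {True: 0.3, False: 0.7}                       # P(Cat)
P_A_given_Ca = {True: {True: 0.6, False: 0.4},       # P(Allergy | Cat)
                False: {True: 0.1, False: 0.9}}
P_Sn_true_given_Co_A = {                             # P(Sneeze=true | Cold, Allergy)
    (True, True): 0.95, (True, False): 0.8,
    (False, True): 0.7, (False, False): 0.05}

B = (True, False)

# Evidence Sn=true turns P(Sn | Co, A) into a table over (Co, A): "P[Co,A]"
P_CoA = {(co, a): P_Sn_true_given_Co_A[(co, a)] for co in B for a in B}

# Sum over Sc is 1, so Scratch drops out.  Sum over Ca: P[A] = Σ_Ca P(A|Ca) P(Ca)
P_A = {a: sum(P_A_given_Ca[ca][a] * P_Ca[ca] for ca in B) for a in B}

# Sum over A: P[Co] = Σ_A P[Co,A] P[A]
P_Co_msg = {co: sum(P_CoA[(co, a)] * P_A[a] for a in B) for co in B}

# Multiply by the prior and normalize: P(Co | Sn=true)
unnorm = {co: P_Co[co] * P_Co_msg[co] for co in B}
Z = sum(unnorm.values())
print({co: p / Z for co, p in unnorm.items()})
```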

  31. Pruning Leaves • Leaf nodes not involved in the evidence or the query can be pruned. • Example: Scratch • (Diagram: the Cold/Allergy/Cat/Sneeze/Scratch network, with Cold marked as the query and Sneeze as the evidence) (c) 2003 Thomas G. Dietterich

  32. Greedy algorithm for choosing the elimination order nodes = set of tables (after evidence); V = variables to sum over; while |nodes| > 1 do: generate all pairs of tables in nodes that share at least one variable; compute the size of the table that would result from the conformal product of each pair (summing over as many variables in V as possible); let (T1,T2) be the pair with the smallest resulting size; delete T1 and T2 from nodes; add the conformal product Σ_V T1 · T2 to nodes; end (c) 2003 Thomas G. Dietterich
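
A rough Python reading of this greedy loop. It tracks only each table's variable set and scores candidate products by the size of the resulting table; it is an illustrative sketch of the slide's pseudocode, not a full variable-elimination implementation.

```python
from itertools import combinations

def greedy_elimination(factors, sum_vars, domain_size):
    """factors: list of sets of variable names, one per table.
    Repeatedly combine the pair of tables sharing a variable whose conformal
    product (with any now-private sum variables summed away) is smallest."""
    factors = [set(f) for f in factors]
    remaining = set(sum_vars)
    plan = []
    while len(factors) > 1:
        best = None
        for i, j in combinations(range(len(factors)), 2):
            if not (factors[i] & factors[j]):
                continue                      # only pairs that share a variable
            merged = factors[i] | factors[j]
            others = set()                    # variables used by the other tables
            for k in range(len(factors)):
                if k not in (i, j):
                    others |= factors[k]
            # a sum variable can be summed away if no other table mentions it
            eliminable = {v for v in merged if v in remaining and v not in others}
            result = merged - eliminable
            size = 1
            for v in result:
                size *= domain_size[v]
            if best is None or size < best[0]:
                best = (size, i, j, result, eliminable)
        if best is None:
            break                             # no remaining pair shares a variable
        _, i, j, result, eliminable = best
        plan.append((factors[i], factors[j], eliminable))
        factors = [f for k, f in enumerate(factors) if k not in (i, j)] + [result]
        remaining -= eliminable
    return plan

# The slides' example: tables P(Co), P[Co,A], P(A|Ca), P(Ca); sum over A and Ca
print(greedy_elimination([{"Co"}, {"Co", "A"}, {"A", "Ca"}, {"Ca"}],
                         sum_vars={"A", "Ca"},
                         domain_size={"Co": 2, "A": 2, "Ca": 2}))
```

On this input the plan matches slides 33-35: first combine P(A|Ca) with P(Ca) and sum away Ca, then combine with P[Co,A] and sum away A, then multiply by P(Co).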

  33. Example of Greedy Algorithm • Given tables P(Co), P[Co,A], P(A|Ca), P(Ca) • Variables to sum: A, Ca • Choose: P[A] = Σ_Ca P(A|Ca) · P(Ca) (c) 2003 Thomas G. Dietterich

  34. Example of Greedy Algorithm (2) • Given tables P(Co), P[Co,A], P[A] • Variables to sum: A • Choose: P[Co] = Σ_A P[Co,A] · P[A] (c) 2003 Thomas G. Dietterich

  35. Example of Greedy Algorithm (3) • Given tables P(Co), P[Co] • Variables to sum: none • Choose: P2[Co] = P(Co) · P[Co] • Normalize and extract answer (c) 2003 Thomas G. Dietterich

  36. Bayesian Network for WUMPUS • One pit variable Pi,j and one breeze variable Bi,j for each square • Joint distribution: P(P1,1, P1,2, …, P4,4, B1,1, B1,2, …, B4,4) (c) 2003 Thomas G. Dietterich

  37. Probabilistic Inference in WUMPUS • Suppose we have observed • No breeze in 1,1 • Breeze in 1,2 and 2,1 • No pit in 1,1, 1,2, and 2,1 • What is the probability of a pit in 1,3? P(P1,3 | ¬B1,1, B1,2, B2,1, ¬P1,1, ¬P1,2, ¬P2,1) (c) 2003 Thomas G. Dietterich

  38. What is P(P1,3 | ¬B1,1, B1,2, B2,1, ¬P1,1, ¬P1,2, ¬P2,1)? • (Diagram: the full pit/breeze network with the evidence filled in: P1,1 = P1,2 = P2,1 = false, B1,1 = false, B1,2 = B2,1 = true, and P1,3 marked as the query) (c) 2003 Thomas G. Dietterich

  39. Prune Leaves Not Involved in Query or Evidence • (Diagram: the breeze nodes other than B1,1, B1,2, and B2,1 are removed) (c) 2003 Thomas G. Dietterich

  40. Prune Independent Nodes • (Diagram: pit nodes independent of the query and the evidence, such as P1,4 and the rest of the board, are removed) (c) 2003 Thomas G. Dietterich

  41. Solve Remaining Network • Remaining nodes: P1,1, P1,2, P2,1, P2,2, P1,3, P3,1 and B1,1, B1,2, B2,1 (with the observed values substituted) • Σ_{P2,2,P3,1} P(B1,1 | P1,1, P1,2, P2,1) · P(B1,2 | P1,1, P1,2, P1,3) · P(B2,1 | P1,1, P2,1, P2,2, P3,1) · P(P1,1) · P(P1,2) · P(P2,1) · P(P2,2) · P(P1,3) · P(P3,1) (c) 2003 Thomas G. Dietterich

  42. Performing the Inference NORM{ Σ_{P2,2,P3,1} P(B1,1 | P1,1, P1,2, P2,1) · P(B1,2 | P1,1, P1,2, P1,3) · P(B2,1 | P1,1, P2,1, P2,2, P3,1) · P(P1,1) · P(P1,2) · P(P2,1) · P(P2,2) · P(P1,3) · P(P3,1) } = NORM{ Σ_{P2,2,P3,1} P[P1,3] · P[P2,2,P3,1] · P(P2,2) · P(P1,3) · P(P3,1) } = NORM{ P[P1,3] · P(P1,3) · Σ_{P2,2} P(P2,2) · Σ_{P3,1} P[P2,2,P3,1] · P(P3,1) } P(P1,3 | evidence) = ⟨0.69, 0.31⟩: a 31% chance of a pit! We have reduced the inference to a simple computation over 2x2 tables. (c) 2003 Thomas G. Dietterich
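
A self-contained sketch that reproduces the 0.31 by brute-force enumeration over the query pit and the two frontier pits, assuming the standard Wumpus model: every square contains a pit with prior probability 0.2, independently, and a square is breezy exactly when an orthogonally adjacent square has a pit. These modeling assumptions are not spelled out on the slides, but they yield the same ⟨0.69, 0.31⟩.

```python
from itertools import product

PIT_PRIOR = 0.2   # assumed prior probability of a pit in any square

def breezy(square, pits):
    """True iff some orthogonally adjacent square contains a pit."""
    x, y = square
    neighbors = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return any(pits.get(n, False) for n in neighbors)

unnorm = {True: 0.0, False: 0.0}
for p13, p22, p31 in product([True, False], repeat=3):
    pits = {(1, 1): False, (1, 2): False, (2, 1): False,   # evidence: no pits here
            (1, 3): p13, (2, 2): p22, (3, 1): p31}
    # evidence: no breeze in [1,1], breeze in [1,2] and [2,1]
    if breezy((1, 1), pits) or not breezy((1, 2), pits) or not breezy((2, 1), pits):
        continue
    prior = 1.0
    for pit in (p13, p22, p31):
        prior *= PIT_PRIOR if pit else 1 - PIT_PRIOR
    unnorm[p13] += prior

Z = unnorm[True] + unnorm[False]
print(unnorm[False] / Z, unnorm[True] / Z)   # approx. 0.69, 0.31
```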

  43. Summary • The Joint Distribution is analogous to the truth table for propositional logic. It is exponentially large, but any query can be answered from it • Conditional independence allows us to factor the joint distribution using conformal products • Conditional independence relationships are conveniently visualized and encoded in a belief network DAG • Given evidence, we can reason efficiently by algebraic manipulation of the factored representation (c) 2003 Thomas G. Dietterich
