Probabilistic Representation and Reasoning • Given a set of facts/beliefs/rules/evidence • Evaluate a given statement • Determine the probability of a statement • Find a statement that optimizes a set of constraints • Most probable explanation (MPE): the setting of hidden variables that best explains the observations
Probability Theory • Random Variables • Boolean: W1,2 (just like propositional logic). Two possible values {true, false} • Discrete: Weather ∈ {sunny, cloudy, rainy, snow} • Continuous: Temperature ∈ ℝ • Propositions • W1,2 = true, Weather = sunny, Temperature = 65 • These can be combined as in propositional logic
Example • Consider a car described by 3 random variables: • Gas ∈ {true, false}: there is gas in the tank • Meter ∈ {empty, full}: the gas gauge shows the tank is empty or full • Starts ∈ {yes, no}: the car starts when you turn the key in the ignition
Joint Probability Distribution • The joint distribution over (Gas, Meter, Starts) is a table with one row per combination of values • Each row is called a "primitive event" • Rows are mutually exclusive and exhaustive • Corresponds to an "8-sided coin" with the indicated probabilities
Any Query Can Be Answered from the Joint Distribution • P(Gas = false ∧ Meter = full ∧ Starts = yes) = 0.0006 • P(Gas = false) = 0.2; this is the sum of all cells where Gas = false • In general: to compute P(Q) for any proposition Q, add up the probability in all cells where Q is true
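A minimal sketch in Python of answering queries by summing cells of the joint. Only two of the numbers below come from the slides (the 0.0006 cell and the Gas=false total of 0.2); the rest are illustrative placeholders, since the slides' full table is not reproduced in this transcript.

```python
# The joint distribution over (Gas, Meter, Starts) as a dictionary from
# primitive events to probabilities. Only P(false, full, yes) = 0.0006 and
# the Gas=false total of 0.2 are from the slides; the rest are ILLUSTRATIVE
# placeholders that sum to 1.
joint = {
    (True,  "full",  "yes"): 0.72,   (True,  "full",  "no"): 0.03,
    (True,  "empty", "yes"): 0.02,   (True,  "empty", "no"): 0.03,
    (False, "full",  "yes"): 0.0006, (False, "full",  "no"): 0.0014,
    (False, "empty", "yes"): 0.001,  (False, "empty", "no"): 0.197,
}

def prob(query):
    """P(Q): add up the probability in all cells where Q is true."""
    return sum(p for event, p in joint.items() if query(*event))

print(prob(lambda g, m, s: not g and m == "full" and s == "yes"))  # 0.0006
print(prob(lambda g, m, s: not g))                                 # 0.2
```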
Notations • P(G,M,S) denotes the entire joint distribution (in the book, P is boldface). It is a table or function that maps from G, M, and S to a probability. • P(true, empty, no) denotes a single probability value: P(Gas=true ∧ Meter=empty ∧ Starts=no)
Operations on Probability Tables (1) • Marginalization ("summing away") • Σ_{M,S} P(G,M,S) = P(G) • P(G) is called a "marginal probability" distribution. It consists of two probabilities: P(G=true) = 0.8 and P(G=false) = 0.2
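A sketch of marginalization over the illustrative table from the earlier example: summing away Meter and Starts leaves the marginal P(G).

```python
from collections import defaultdict

# Same illustrative joint over (Gas, Meter, Starts) as in the earlier sketch.
joint = {
    (True,  "full",  "yes"): 0.72,   (True,  "full",  "no"): 0.03,
    (True,  "empty", "yes"): 0.02,   (True,  "empty", "no"): 0.03,
    (False, "full",  "yes"): 0.0006, (False, "full",  "no"): 0.0014,
    (False, "empty", "yes"): 0.001,  (False, "empty", "no"): 0.197,
}

def marginalize(table, keep):
    """Sum away every variable position not listed in `keep`."""
    out = defaultdict(float)
    for event, p in table.items():
        out[tuple(event[i] for i in keep)] += p
    return dict(out)

# Sigma_{M,S} P(G,M,S) = P(G): keep only position 0 (Gas).
print(marginalize(joint, keep=(0,)))  # {(True,): 0.8, (False,): 0.2}
```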
Conditional Probability • Suppose we observe that M=full. What is the probability that the car will start? • P(S=yes | M=full) • Definition: P(A|B) = P(A ∧ B) / P(B)
Conditional Probability • Select cells that match the condition (M=full) • Delete the remaining cells and the M column • Renormalize the table to obtain P(S,G | M=full) • Sum away Gas: Σ_G P(S,G | M=full) = P(S | M=full) • Read the answer from the P(S=yes | M=full) cell
Operations on Probability Tables (2): Conditionalizing • Construct P(G,S | M) by normalizing the subtable corresponding to M=full and normalizing the subtable corresponding to M=empty
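A sketch of the conditioning steps on the same illustrative table: select the M=full cells, renormalize, then sum away Gas to get P(S | M=full).

```python
from collections import defaultdict

# Same illustrative joint over (Gas, Meter, Starts) as in the earlier sketches.
joint = {
    (True,  "full",  "yes"): 0.72,   (True,  "full",  "no"): 0.03,
    (True,  "empty", "yes"): 0.02,   (True,  "empty", "no"): 0.03,
    (False, "full",  "yes"): 0.0006, (False, "full",  "no"): 0.0014,
    (False, "empty", "yes"): 0.001,  (False, "empty", "no"): 0.197,
}

# Select the M=full cells and drop the Meter column.
selected = {(g, s): p for (g, m, s), p in joint.items() if m == "full"}

# Renormalize to obtain P(S,G | M=full).
z = sum(selected.values())                   # this is P(M=full)
p_gs = {event: p / z for event, p in selected.items()}

# Sum away Gas: Sigma_G P(S,G | M=full) = P(S | M=full).
p_s = defaultdict(float)
for (g, s), p in p_gs.items():
    p_s[s] += p

# Read the answer from the S=yes cell. (With these made-up numbers the value
# differs from the 0.6469 obtained later from the course's actual table.)
print(p_s["yes"])
```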
Chain Rule of Probability • P(A,B,C) = P(A | B,C) · P(B | C) · P(C) • Proof: by the definition of conditional probability, P(A,B,C) = P(A | B,C) · P(B,C), and P(B,C) = P(B | C) · P(C); substituting gives the result
Chain Rule (2) • Holds for distributions too: P(A,B,C) = P(A | B,C) · P(B | C) · P(C). This means that for each setting of A, B, and C, we can substitute into the equation, and it is true.
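Iterating the same argument extends the rule to any number of variables (a standard identity, stated here for completeness): P(X1, …, Xn) = P(X1) · P(X2 | X1) · P(X3 | X1,X2) · … · P(Xn | X1, …, Xn−1)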
Belief Networks (1): Independence • Defn: Two random variables X and Y are independent iff P(X,Y) = P(X) · P(Y) • Example: • X is a coin with P(X=heads) = 0.4 • Y is a coin with P(Y=heads) = 0.8 • Joint distribution: P(heads,heads) = 0.32, P(heads,tails) = 0.08, P(tails,heads) = 0.48, P(tails,tails) = 0.12
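A quick sketch verifying the definition on the two-coin example: every cell of the joint equals the product of the marginals.

```python
# Marginals for the two coins from the slide.
p_x = {"heads": 0.4, "tails": 0.6}
p_y = {"heads": 0.8, "tails": 0.2}

# Joint distribution for the two independent coins.
joint = {("heads", "heads"): 0.32, ("heads", "tails"): 0.08,
         ("tails", "heads"): 0.48, ("tails", "tails"): 0.12}

# Independence holds iff every cell factors into the product of the marginals.
independent = all(abs(p - p_x[x] * p_y[y]) < 1e-12
                  for (x, y), p in joint.items())
print(independent)   # True
```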
Belief Networks (2): Conditional Independence • Defn: Two random variables X and Y are conditionally independent given Z iff P(X,Y | Z) = P(X | Z) · P(Y | Z) • Example: P(S,M | G) = P(S | G) · P(M | G). Intuition: G independently causes S and M
Operations on Probability Tables (3): Conformal Product • Allocate space for the resulting table, then fill in each cell with the product of the corresponding cells: P(S,M | G) = P(S | G) · P(M | G)
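A sketch of a conformal (pointwise) product of two tables that share a variable, here computing P(S,M | G) = P(S | G) · P(M | G). The CPT numbers are illustrative; the slide's actual tables are not in this transcript.

```python
from itertools import product

# Illustrative conditional tables. Keys are (value of child, value of G).
p_s_given_g = {("yes", True): 0.95, ("no", True): 0.05,
               ("yes", False): 0.01, ("no", False): 0.99}
p_m_given_g = {("full", True): 0.97, ("empty", True): 0.03,
               ("full", False): 0.03, ("empty", False): 0.97}

def conformal_product(t1, vars1, t2, vars2):
    """Pointwise product: cells that agree on shared variables are multiplied;
    the result ranges over the union of the two tables' variables."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    domains = {}
    for vars_, t in ((vars1, t1), (vars2, t2)):
        for event in t:
            for var, val in zip(vars_, event):
                domains.setdefault(var, set()).add(val)
    out = {}
    for event in product(*(sorted(domains[v], key=str) for v in out_vars)):
        assign = dict(zip(out_vars, event))
        out[event] = (t1[tuple(assign[v] for v in vars1)] *
                      t2[tuple(assign[v] for v in vars2)])
    return out_vars, out

# P(S,M | G) = P(S | G) · P(M | G)
vars_, table = conformal_product(p_s_given_g, ["S", "G"], p_m_given_g, ["M", "G"])
print(vars_)                          # ['S', 'G', 'M']
print(table[("yes", True, "full")])   # 0.95 * 0.97
```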
Properties of Conformal Products • Commutative • Associative • Work on normalized or unnormalized tables • Work on joint or conditional tables
Conditional Independence Allows Us to Simplify the Joint Distribution • P(G,M,S) = P(M,S | G) · P(G) [chain rule] = P(M | G) · P(S | G) · P(G) [conditional independence]
Bayesian Networks • Network: Gas → Meter, Gas → Starts • One node for each random variable • Each node stores a probability distribution P(node | parents(node)) • Only direct dependencies are shown • Joint distribution is the conformal product of the node distributions: P(G,M,S) = P(G) · P(M | G) · P(S | G)
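A sketch of this three-node network as data: each node carries a CPT over itself given its parents, and the joint is the conformal product of the node tables. The CPT numbers are illustrative except P(G), which matches the marginal quoted earlier.

```python
# Bayesian network for the car example: Gas -> Meter, Gas -> Starts.
# Each entry: node -> (parents, CPT mapping (node_value, *parent_values) -> prob).
# All CPT numbers are ILLUSTRATIVE except P(Gas), which matches the slides.
network = {
    "G": ([], {(True,): 0.8, (False,): 0.2}),
    "M": (["G"], {("full", True): 0.97, ("empty", True): 0.03,
                  ("full", False): 0.03, ("empty", False): 0.97}),
    "S": (["G"], {("yes", True): 0.95, ("no", True): 0.05,
                  ("yes", False): 0.01, ("no", False): 0.99}),
}

def joint_prob(g, m, s):
    """P(G,M,S) = P(G) * P(M|G) * P(S|G): the product of the node distributions."""
    _, p_g = network["G"]
    _, p_m = network["M"]
    _, p_s = network["S"]
    return p_g[(g,)] * p_m[(m, g)] * p_s[(s, g)]

total = sum(joint_prob(g, m, s) for g in (True, False)
            for m in ("full", "empty") for s in ("yes", "no"))
print(total)   # 1.0 -- the conformal product of the CPTs is a valid joint
```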
Inference in Bayesian Networks • Suppose we observe that M=full. What is the probability that the car will start? • P(S=yes | M=full) • Before, we handled this by the following steps: • Remove all rows corresponding to M=empty • Normalize the remaining rows to get P(S,G | M=full) • Sum over G: Σ_G P(S,G | M=full) = P(S | M=full) • Read the answer from the S=yes entry in the table • We want to get the same result, but without constructing the joint distribution first.
Inference in Bayesian Networks (2) • Remove all rows corresponding to M=empty from all nodes • P(G) – unchanged • P(M | G) becomes P[G] • P(S | G) – unchanged • Sum over G: Σ_G P(G) · P[G] · P(S | G) • Normalize to get P(S | M=full) • Read the answer from the S=yes entry in the table
Inference with Tables • Step 1: Delete the M=empty rows from all tables • Step 2: Perform algebra to push summation inwards (a no-op in this case) • Step 3: Form the conformal product • Step 4: Sum away G • Step 5: Normalize • Step 6: Read the answer from the table: 0.6469
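A sketch of these six steps in code, using the illustrative CPTs from above, so the final number will not be the slides' 0.6469 (which came from the course's actual tables).

```python
# Answer P(S=yes | M=full) without ever building the full joint.
# CPTs are the same ILLUSTRATIVE numbers used in the earlier sketches.
p_g = {True: 0.8, False: 0.2}
p_m_given_g = {("full", True): 0.97, ("empty", True): 0.03,
               ("full", False): 0.03, ("empty", False): 0.97}
p_s_given_g = {("yes", True): 0.95, ("no", True): 0.05,
               ("yes", False): 0.01, ("no", False): 0.99}

# Step 1: apply the evidence M=full -- P(M | G) collapses to a table P[G].
evidence_g = {g: p_m_given_g[("full", g)] for g in (True, False)}

# Steps 2-4: push the sum over G inwards, form the conformal product,
# and sum away G, leaving an unnormalized table over S.
unnorm = {s: sum(p_g[g] * evidence_g[g] * p_s_given_g[(s, g)]
                 for g in (True, False))
          for s in ("yes", "no")}

# Step 5: normalize.
z = sum(unnorm.values())
p_s_given_full = {s: p / z for s, p in unnorm.items()}

# Step 6: read the answer from the table.
print(p_s_given_full["yes"])
```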
Notes • We never created the joint distribution • Deleting the M=empty rows from the individual tables followed by the conformal product has the same effect as performing the conformal product first and then deleting the M=empty rows • Normalization can be postponed to the end
Another Example: "Asia" (all variables Boolean) • Network: Cold → Sneeze ← Allergy, Cat → Allergy, Cat → Scratch • Suppose we observe Sneeze • What is P(Cold | Sneeze) = P(Co | S)?
Answering the query • Joint distribution: Σ_{A,Ca,Sc} P(Co) · P(Ca) · P(A | Ca) · P(Sn | Co,A) · P(Sc | Ca) • Apply evidence: sn (Sneeze = true): Σ_{A,Ca,Sc} P(Co) · P(Ca) · P(A | Ca) · P[Co,A] · P(Sc | Ca) • Push summations in as far as possible: P(Co) · Σ_A P[Co,A] · Σ_Ca P(A | Ca) · P(Ca) · Σ_Sc P(Sc | Ca) • Evaluate: Σ_Sc P(Sc | Ca) is a table of 1s, so this becomes P(Co) · Σ_A P[Co,A] · Σ_Ca P(A | Ca) · P(Ca) = P(Co) · Σ_A P[Co,A] · P[A] = P(Co) · P[Co], an unnormalized table over Co • Normalize and extract the answer
Pruning Leaves • Leaf nodes not involved in the evidence or the query can be pruned • Example: Scratch is a leaf and is neither the query (Cold) nor the evidence (Sneeze), so it can be pruned
Greedy algorithm for choosing the elimination order
nodes = set of tables (after applying evidence)
V = variables to sum over
while |nodes| > 1 do
  generate all pairs of tables in nodes that share at least one variable
  compute the size of the table that would result from the conformal product of each pair (summing over as many variables in V as possible)
  let (T1, T2) be the pair with the smallest resulting size
  delete T1 and T2 from nodes
  add the conformal product Σ_V T1 · T2 to nodes
end
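A sketch of this greedy loop in Python, representing each table only by its set of variables, which is all we need to score candidate products (assuming Boolean domains, so a table's size is 2^|vars|).

```python
# Greedy elimination-order sketch. Tables are represented by their variable
# sets. Assumes Boolean variables (table size = 2 ** number of variables)
# and that some pair of tables always shares a variable.

def greedy_order(tables, sum_vars):
    """tables: list of sets of variable names; sum_vars: variables to sum away."""
    tables = [frozenset(t) for t in tables]
    steps = []
    while len(tables) > 1:
        best = None
        for i in range(len(tables)):
            for j in range(i + 1, len(tables)):
                if not (tables[i] & tables[j]):
                    continue  # only pair tables sharing at least one variable
                merged = tables[i] | tables[j]
                others = set().union(*(t for k, t in enumerate(tables)
                                       if k not in (i, j)))
                # a variable can be summed away once no other table mentions it
                summable = {v for v in merged if v in sum_vars and v not in others}
                result = merged - summable
                size = 2 ** len(result)
                if best is None or size < best[0]:
                    best = (size, i, j, result, summable)
        size, i, j, result, summable = best
        steps.append((tables[i], tables[j], summable, result))
        tables = [t for k, t in enumerate(tables) if k not in (i, j)]
        tables.append(frozenset(result))
        sum_vars = sum_vars - summable
    return steps

# The "Asia" example after evidence: P(Co), P[Co,A], P(A|Ca), P(Ca)
tables = [{"Co"}, {"Co", "A"}, {"A", "Ca"}, {"Ca"}]
for t1, t2, summed, res in greedy_order(tables, {"A", "Ca"}):
    print(sorted(t1), "x", sorted(t2), "sum away", sorted(summed), "->", sorted(res))
```

Run on the "Asia" tables, this reproduces the three choices on the next slides: first P[A], then P[Co], then the final product over Co.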
Example of Greedy Algorithm • Given tables: P(Co), P[Co,A], P(A | Ca), P(Ca) • Variables to sum: A, Ca • Choose: P[A] = Σ_Ca P(A | Ca) · P(Ca)
Example of Greedy Algorithm (2) • Given tables: P(Co), P[Co,A], P[A] • Variables to sum: A • Choose: P[Co] = Σ_A P[Co,A] · P[A]
Example of Greedy Algorithm (3) • Given tables: P(Co), P[Co] • Variables to sum: none • Choose: P2[Co] = P(Co) · P[Co] • Normalize and extract the answer
Bayesian Network for WUMPUS • Nodes: pit variables P1,1, …, P4,4 and breeze variables B1,1, …, B4,4; each breeze node's parents are the pit nodes of nearby squares • P(P1,1, P1,2, …, P4,4, B1,1, B1,2, …, B4,4)
Probabilistic Inference in WUMPUS • Suppose we have observed: • No breeze in 1,1 • Breeze in 1,2 and 2,1 • No pit in 1,1, 1,2, and 2,1 • What is the probability of a pit in 1,3? P(P1,3 | ¬B1,1, B1,2, B2,1, ¬P1,1, ¬P1,2, ¬P2,1)
What is P(P1,3 | ¬B1,1, B1,2, B2,1, ¬P1,1, ¬P1,2, ¬P2,1)? • In the network diagram: P1,1, P1,2, P2,1 are fixed to false; B1,1 is false; B1,2 and B2,1 are true; P1,3 is the query node
Prune Leaves Not Involved in Query or Evidence • All breeze nodes other than B1,1, B1,2, and B2,1 are unobserved leaves and can be pruned
Prune Independent Nodes • Pit nodes that are not parents of the remaining breeze nodes (P1,4, P4,1, …) are independent of the query given the evidence and can also be pruned
Solve Remaining Network • Remaining nodes: P1,1, P1,2, P2,1 (false), P1,3 (query), P2,2, P3,1, and B1,1 (false), B1,2, B2,1 (true) • Σ_{P2,2,P3,1} P(B1,1 | P1,1,P1,2,P2,1) · P(B1,2 | P1,1,P1,2,P1,3) · P(B2,1 | P1,1,P2,1,P2,2,P3,1) · P(P1,1) · P(P1,2) · P(P2,1) · P(P2,2) · P(P1,3) · P(P3,1)
Performing the Inference • NORM{ Σ_{P2,2,P3,1} P(B1,1 | P1,1,P1,2,P2,1) · P(B1,2 | P1,1,P1,2,P1,3) · P(B2,1 | P1,1,P2,1,P2,2,P3,1) · P(P1,1) · P(P1,2) · P(P2,1) · P(P2,2) · P(P1,3) · P(P3,1) } • = NORM{ Σ_{P2,2,P3,1} P[P1,3] · P[P2,2,P3,1] · P(P2,2) · P(P1,3) · P(P3,1) } • = NORM{ P[P1,3] · P(P1,3) · Σ_{P2,2} P(P2,2) · Σ_{P3,1} P[P2,2,P3,1] · P(P3,1) } • P(P1,3 | evidence) = ⟨0.69, 0.31⟩: a 31% chance of a pit! • We have reduced the inference to a simple computation over 2×2 tables.
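A sketch reproducing this result by brute-force enumeration over the two unknown pits, under the standard Wumpus-world assumptions from the textbook (a pit prior of 0.2 per square and a breeze that is a deterministic OR of pits in adjacent squares). These modeling details are assumptions here, not stated on the slide.

```python
from itertools import product

PIT_PRIOR = 0.2   # assumed per-square pit prior (textbook value; not on the slide)

def breeze(*neighbor_pits):
    """Assumed deterministic breeze model: breeze iff an adjacent square has a pit."""
    return any(neighbor_pits)

# Evidence: pits at (1,1), (1,2), (2,1) are false; B1,1 false; B1,2 and B2,1 true.
posterior = {True: 0.0, False: 0.0}   # unnormalized table over the query P1,3
for p13, p22, p31 in product((True, False), repeat=3):
    consistent = (breeze(False, False, False) is False   # B1,1 = false
                  and breeze(False, p13, p22) is True    # B1,2 = true: pit in 1,3 or 2,2
                  and breeze(False, p22, p31) is True)   # B2,1 = true: pit in 2,2 or 3,1
    if not consistent:
        continue
    weight = 1.0
    for pit in (p13, p22, p31):
        weight *= PIT_PRIOR if pit else 1 - PIT_PRIOR
    posterior[p13] += weight

z = posterior[True] + posterior[False]
print({value: p / z for value, p in posterior.items()})
# -> roughly {True: 0.31, False: 0.69}: a 31% chance of a pit in (1,3)
```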
Summary • The joint distribution is analogous to the truth table for propositional logic. It is exponentially large, but any query can be answered from it • Conditional independence allows us to factor the joint distribution using conformal products • Conditional independence relationships are conveniently visualized and encoded in a belief network DAG • Given evidence, we can reason efficiently by algebraic manipulation of the factored representation (c) 2003 Thomas G. Dietterich