Bayes Nets and Probabilities


Presentation Transcript


  1. Bayes Nets and Probabilities Oliver Schulte Machine Learning 726

  2. Bayes Nets: General Points • Represent domain knowledge. • Allow for uncertainty. • Complete representation of probabilistic knowledge. • Represent causal relations. • Fast answers to types of queries: • Probabilistic: What is the probability that a patient has strep throat given that they have fever? • Relevance: Is fever relevant to having strep throat?

  3. Bayes Net Links • Judea Pearl's Turing Award • See UBC’s AISpace

  4. Probability Reasoning (With Bayes Nets)

  5. Random Variables • A random variable has a probability associated with each of its values. • A basic statement assigns a value to a random variable.

  6. Probability for Sentences • A sentence or query is formed by using “and”, “or”, “not” recursively with basic statements. • Sentences also have probabilities assigned to them.

  7. Probability Notation • Often probability theorists write A, B instead of A ∧ B (like Prolog). • If the intended random variables are known, they are often not mentioned.

  8. Axioms of probability Sentences are considered as sets of complete assignments. For any sentences A, B: • 0 ≤ P(A) ≤ 1 • P(true) = 1 and P(false) = 0 • P(A ∨ B) = P(A) + P(B) − P(A ∧ B) • P(A) = P(B) if A and B are logically equivalent.

  9. Rule 1: Logical Equivalence

  10. The Logical Equivalence Pattern Rule 1: Logically equivalent expressions have the same probability.

  11. Rule 2: Marginalization

  12. The Marginalization Pattern

  13. Prove the Pattern: Marginalization • Theorem. P(A) = P(A, B) + P(A, not B) • Proof. • 1. A is logically equivalent to [(A and B) ∨ (A and not B)]. • 2. P(A) = P((A and B) ∨ (A and not B)) = P(A and B) + P(A and not B) − P((A and B) ∧ (A and not B)). Disjunction Rule. • 3. (A and B) ∧ (A and not B) is logically equivalent to false, so P((A and B) ∧ (A and not B)) = 0. • 4. So 2. and 3. imply P(A) = P(A and B) + P(A and not B).

  14. Completeness of Bayes Nets • A probabilistic query system is complete if it can compute a probability for every sentence. • Proposition: A Bayes net is complete. Proof has two steps. • Any system that encodes the joint distribution is complete. • A Bayes net encodes the joint distribution.

  15. The Joint Distribution

  16. Assigning Probabilities to Sentences • A complete assignment is a conjunctive sentence that assigns a value to each random variable. • The joint probability distribution specifies a probability for each complete assignment. • A joint distribution determines a probability for every sentence. • How? Spot the pattern.

  17. Probabilities for Sentences: Spot the Pattern

  18. Inference by enumeration

  19. Inference by enumeration • Marginalization: For any sentence A, sum the joint probabilities for the complete assignments where A is true. • P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2.
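
A minimal Python sketch of inference by enumeration. The four entries with toothache = true are the ones quoted on the slide; the remaining four joint values are assumed from the standard textbook cavity example and are only illustrative.

```python
# Full joint distribution over (toothache, cavity, catch).
# The toothache=True entries are quoted on the slide; the
# toothache=False entries are assumed from the textbook example.
joint = {
    (True,  True,  True):  0.108,
    (True,  True,  False): 0.012,
    (True,  False, True):  0.016,
    (True,  False, False): 0.064,
    (False, True,  True):  0.072,   # assumed
    (False, True,  False): 0.008,   # assumed
    (False, False, True):  0.144,   # assumed
    (False, False, False): 0.576,   # assumed
}

def prob(sentence):
    """Marginalization: sum the joint probabilities of all complete
    assignments (toothache, cavity, catch) where the sentence is true."""
    return sum(p for assignment, p in joint.items() if sentence(*assignment))

print(prob(lambda toothache, cavity, catch: toothache))  # 0.2, as on the slide
```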

  20. Completeness Proof for Joint Distribution • Theorem [from propositional logic]: Every sentence is logically equivalent to a disjunction of the form A1 or A2 or ... or Ak, where the Ai are complete assignments. • All of the Ai are mutually exclusive (joint probability 0). Why? • So if S is equivalent to A1 or A2 or ... or Ak, then P(S) = Σi P(Ai), where each P(Ai) is given by the joint distribution.

  21. Bayes Nets and The Joint Distribution

  22. Example: Complete Bayesian Network

  23. The Story • You have a new burglar alarm installed at home. • It’s reliable at detecting burglary but also responds to earthquakes. • You have two neighbors that promise to call you at work when they hear the alarm. • John always calls when he hears the alarm, but sometimes confuses alarm with telephone ringing. • Mary listens to loud music and sometimes misses the alarm.

  24. Computing The Joint Distribution • A Bayes net provides a compact factored representation of a joint distribution. • In words, the joint probability is computed as follows. • For each node Xi: • Find the assigned value xi. • Find the values y1,..,yk assigned to the parents of Xi. • Look up the conditional probability P(xi|y1,..,yk) in the Bayes net. • Multiply together these conditional probabilities.

  25. Product Formula Example: Burglary • Query: What is the joint probability that all variables are true? • P(M, J, A, E, B) = P(M|A) P(J|A) P(A|E,B) P(E) P(B) = .7 × .9 × .95 × .002 × .001 ≈ 1.2 × 10⁻⁶
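
A sketch of the product formula as code for the burglary network. The CPT entries used in the all-true query match the slide; the remaining entries (e.g. P(A | B, ¬E) or P(J | ¬A)) are assumed from the standard textbook version of this network.

```python
# Each table gives P(node = true | parent values); entries other than
# those on the slide are assumed from the textbook burglary network.
p_B = 0.001
p_E = 0.002
p_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
p_J = {True: 0.90, False: 0.05}                      # P(J=true | A)
p_M = {True: 0.70, False: 0.01}                      # P(M=true | A)

def given(p_true, value):
    """P(X = value) for a binary variable with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint_prob(b, e, a, j, m):
    """Product formula: one conditional-probability factor per node."""
    return (given(p_B, b) * given(p_E, e) * given(p_A[(b, e)], a) *
            given(p_J[a], j) * given(p_M[a], m))

print(joint_prob(True, True, True, True, True))  # ≈ 1.2e-06, as on the slide
```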

  26. Compactness of Bayesian Networks • Consider n binary variables. • An unconstrained joint distribution requires O(2^n) probabilities. • If we have a Bayesian network with a maximum of k parents for any node, then we need only O(n · 2^k) probabilities. • Example • Full unconstrained joint distribution: n = 30 needs 2^30 (about 10^9) probabilities. • Bayesian network: n = 30, k = 4 needs 30 × 2^4 = 480 probabilities.
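
A quick check of the parameter counts, assuming each binary node with k binary parents stores one probability per parent configuration (2^k CPT entries):

```python
n, k = 30, 4
print(2 ** n)       # 1073741824 entries for the full unconstrained joint
print(n * 2 ** k)   # 480 entries for the Bayes net with at most k parents
```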

  27. Summary: Why are Bayes nets useful? • Graph structure supports: • Modular representation of knowledge • Local, distributed algorithms for inference and learning • Intuitive (possibly causal) interpretation • Factored representation may have exponentially fewer parameters than the full joint P(X1,…,Xn) => • lower sample complexity (less data for learning) • lower time complexity (less time for inference)

  28. Is it Magic? • How can the Bayes net reduce parameters? By exploiting conditional independencies. • Why does the product formula work? • The Bayes net topological or graphical semantics. • The graph by itself entails conditional independencies. • The Chain Rule.

  29. Conditional Probabilities and Independence

  30. Conditional Probabilities: Intro • Given (A) that a die comes up with an odd number, what is the probability that (B) the number is • a 2? • a 3? • Answer: the number of cases that satisfy both A and B, out of the number of cases that satisfy A. • Examples: • #faces with (odd and 2) / #faces with odd = 0/3 = 0. • #faces with (odd and 3) / #faces with odd = 1/3.

  31. Conditional Probs ctd. • Suppose that 50 students are taking 310 and 30 are women. Given (A) that a student is taking 310, what is the probability that (B) they are a woman? • Answer: #students who take 310 and are women / #students in 310 = 30/50 = 3/5. • Notation: P(B|A).

  32. Conditional Ratios: Spot the Pattern • Spot the Pattern

  33. Conditional Probs: The Ratio Pattern • Spot the Pattern: P(A|B) = P(A and B) / P(B). Important!
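
As a usage example (the specific query is an illustration, not from the slides), the ratio pattern applied to the joint table from the enumeration sketch above gives P(cavity | toothache):

```python
# P(cavity | toothache) = P(cavity and toothache) / P(toothache),
# reusing joint and prob() from the inference-by-enumeration sketch.
p_toothache = prob(lambda toothache, cavity, catch: toothache)         # 0.2
p_both = prob(lambda toothache, cavity, catch: toothache and cavity)   # 0.12
print(p_both / p_toothache)  # 0.6
```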

  34. Conditional Probabilities: Motivation • Much knowledge can be represented as implications B1,..,Bk => A. • Conditional probabilities are a probabilistic version of reasoning about what follows from conditions. • Cognitive Science: Our minds store implicational knowledge.

  35. The Product Rule: Spot the Pattern

  36. Independence • A and B are independent iff P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A) P(B) • Suppose that Weather is independent of the cavity scenario. Then the joint distribution decomposes: P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather) • Absolute independence is powerful but rare.

  37. Exercise • Prove that the three definitions of independence are equivalent (assuming all positive probabilities). • A and B are independent iff • P(A|B) = P(A) • or P(B|A) = P(B) • or P(A, B) = P(A) P(B)
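
A proof sketch for this exercise (one possible cycle of implications), assuming P(A) > 0 and P(B) > 0:

```latex
\begin{align*}
(1)\Rightarrow(3):\;& P(A\mid B)=P(A) \;\Rightarrow\; P(A,B)=P(A\mid B)\,P(B)=P(A)\,P(B)
   && \text{(product rule)}\\
(3)\Rightarrow(2):\;& P(B\mid A)=\frac{P(A,B)}{P(A)}=\frac{P(A)\,P(B)}{P(A)}=P(B)\\
(2)\Rightarrow(1):\;& P(A\mid B)=\frac{P(A,B)}{P(B)}=\frac{P(B\mid A)\,P(A)}{P(B)}=\frac{P(B)\,P(A)}{P(B)}=P(A)
\end{align*}
```

Since (1) ⇒ (3) ⇒ (2) ⇒ (1), the three conditions are equivalent.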

  38. Conditional independence • If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: (1) P(catch | toothache, cavity) = P(catch | cavity) • The same independence holds if I haven't got a cavity: (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity) • Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity) • The equivalences for independence also hold for conditional independence, e.g.: P(Toothache | Catch, Cavity) = P(Toothache | Cavity) P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
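
A numeric check of (1), reusing the joint table from the enumeration sketch (half of whose entries were assumed rather than quoted on the slides):

```python
def cond(event, given_):
    """P(event | given_) from the joint table via the ratio pattern."""
    return prob(lambda *a: event(*a) and given_(*a)) / prob(given_)

print(cond(lambda t, cav, cat: cat, lambda t, cav, cat: cav))        # P(catch | cavity) = 0.9
print(cond(lambda t, cav, cat: cat, lambda t, cav, cat: t and cav))  # P(catch | toothache, cavity) = 0.9
```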

  39. Bayes Nets Graphical Semantics

  40. Common Causes: Spot the Pattern • [Graph: Cavity is a common cause, with edges Cavity → Catch and Cavity → Toothache.] • Catch is independent of Toothache given Cavity.

  41. Burglary Example • JohnCalls and MaryCalls are conditionally independent given Alarm.

  42. Spot the Pattern: Chain Scenario • MaryCalls is independent of Burglary given Alarm. • JohnCalls is independent of Earthquake given Alarm.

  43. The Markov Condition • A Bayes net is constructed so that each variable is conditionally independent of its nondescendants given its parents. • The graph alone (without specified probabilities) entails these conditional independencies. • Causal Interpretation: each parent is a direct cause.

  44. Derivation of the Product Formula

  45. The Chain Rule • We can always write P(a, b, c, ..., z) = P(a | b, c, ..., z) P(b, c, ..., z) (Product Rule). • Repeatedly applying this idea, we obtain P(a, b, c, ..., z) = P(a | b, c, ..., z) P(b | c, ..., z) P(c | ..., z) ... P(z). • Order the variables such that children come before parents. • Then, given its parents, each node is independent of its other ancestors by the topological independence. • So P(a, b, c, ..., z) = Πx P(x | parents(x)).

  46. Example in Burglary Network • P(M, J, A, E, B) = P(M | J, A, E, B) P(J, A, E, B) = P(M|A) P(J, A, E, B) = P(M|A) P(J | A, E, B) P(A, E, B) = P(M|A) P(J|A) P(A, E, B) = P(M|A) P(J|A) P(A | E, B) P(E, B) = P(M|A) P(J|A) P(A|E,B) P(E) P(B) • Colours (in the original slide) show applications of the Bayes net topological independence.

  47. Explaining Away

  48. Common Effects: Spot the Pattern • Influenza and Smokes are independent. • Given Bronchitis, they become dependent. [Graph: Influenza → Bronchitis ← Smokes] • Battery Age and Charging System OK are independent. • Given Battery Voltage, they become dependent. [Graph: Battery Age → Battery Voltage ← Charging System OK]

  49. Conditioning on Children [Graph: A → C ← B] • Independent Causes: A and B are independent. • "Explaining away" effect: given C, observing A makes B less likely. • E.g. Bronchitis in UBC "Simple Diagnostic Problem". • A and B are (marginally) independent, but become dependent once C is known.
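
A small numeric demonstration of explaining away. The collider structure A → C ← B matches the slide; the CPT numbers are made up purely for illustration:

```python
from itertools import product

p_A, p_B = 0.1, 0.1                                   # priors (illustrative)
p_C = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.01}      # P(C=true | A, B), illustrative

def given(p_true, value):
    return p_true if value else 1.0 - p_true

def query(target, value, evidence):
    """P(target = value | evidence) by enumerating the tiny joint."""
    num = den = 0.0
    for a, b, c in product([True, False], repeat=3):
        world = {'A': a, 'B': b, 'C': c}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = given(p_A, a) * given(p_B, b) * given(p_C[(a, b)], c)
        den += p
        if world[target] == value:
            num += p
    return num / den

print(query('B', True, {'C': True}))              # ≈ 0.51: seeing C raises belief in B
print(query('B', True, {'C': True, 'A': True}))   # ≈ 0.11: observing A explains C away
```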

  50. D-separation • A, B, and C are non-intersecting subsets of nodes in a directed graph. • A path from A to B is blocked if it contains a node such that either • the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or • the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C. • If all paths from A to B are blocked, A is said to be d-separated from B by C. • If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥ B | C, i.e. A is conditionally independent of B given C.
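
A sketch of the d-separation test as code, following the blocking definition above. For simplicity it tests two single nodes rather than node sets, and the graph encoding (a dict mapping each node to its set of parents) is just an assumed convention for this example. It enumerates all simple paths, so it is only meant for small teaching examples.

```python
def descendants(graph, node):
    """All nodes reachable from `node` by following child edges."""
    children = {n: {c for c in graph if n in graph[c]} for n in graph}
    out, frontier = set(), [node]
    while frontier:
        for c in children[frontier.pop()]:
            if c not in out:
                out.add(c)
                frontier.append(c)
    return out

def all_paths(graph, x, y, path=None):
    """All simple undirected paths from x to y in the graph's skeleton."""
    path = (path or []) + [x]
    if x == y:
        yield path
        return
    neighbours = graph[x] | {n for n in graph if x in graph[n]}
    for n in neighbours:
        if n not in path:
            yield from all_paths(graph, n, y, path)

def blocked(graph, path, z):
    """A path is blocked if some inner node is a non-collider in z, or a
    collider (head-to-head) with neither itself nor any descendant in z."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        collider = prev in graph[node] and nxt in graph[node]
        if collider:
            if node not in z and not (descendants(graph, node) & z):
                return True
        elif node in z:
            return True
    return False

def d_separated(graph, x, y, z):
    """x and y are d-separated by z iff every path between them is blocked."""
    return all(blocked(graph, p, z) for p in all_paths(graph, x, y))

# Burglary network, each node mapped to its parents.
g = {'B': set(), 'E': set(), 'A': {'B', 'E'}, 'J': {'A'}, 'M': {'A'}}
print(d_separated(g, 'J', 'M', {'A'}))   # True: JohnCalls ⊥ MaryCalls | Alarm
print(d_separated(g, 'B', 'E', set()))   # True: Burglary ⊥ Earthquake marginally
print(d_separated(g, 'B', 'E', {'A'}))   # False: conditioning on Alarm connects them
```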
