1 / 38

Cooperating Intelligent Systems

Cooperating Intelligent Systems. Bayesian networks Chapter 14, AIMA. Inference. Inference in the statistical setting means computing probabilities for different outcomes to be true given the information

orla-spence
Download Presentation

Cooperating Intelligent Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Cooperating Intelligent Systems Bayesian networks Chapter 14, AIMA

  2. Inference • Inference in the statistical setting means computing probabilities for different outcomes to be true given the information • We need an efficient method for doing this, which is more powerful than the naïve Bayes model.

  3. Bayesian networks A Bayesian network is a directed graph in which each node is annotated with quantitative probability information: • A set of random variables, {X1,X2,X3,...}, makes up the nodes of the network. • A set of directed links connect pairs of nodes, parent  child • Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) . • The graph is a directed acyclic graph (DAG).

  4. The dentist network Weather Cavity Toothache Catch

  5. The alarm network Burglar alarm responds to both earthquakes and burglars. Two neighbors: John and Mary,who have promised to call youwhen the alarm goes off. John always calls when there’san alarm, and sometimes whenthere’s not an alarm. Mary sometimes misses the alarms (she likes loud music). Burglary Earthquake Alarm JohnCalls MaryCalls

  6. Age Gender Toxics Smoking Cancer GeneticDamage LungTumour SerumCalcium The cancer network From Breese and Coller 1997

  7. The cancer network P(A,G) = P(A)P(G) Age Gender Toxics Smoking P(C|S,T,A,G) = P(C|S,T) Cancer GeneticDamage LungTumour SerumCalcium P(A,G,T,S,C,SC,LT,GD) = P(A)P(G)P(T|A)P(S|A,G)P(C|T,S)P(GD)P(SC|C)P(LT|C,GD) P(SC,C,LT,GD) = P(SC|C)P(LT|C,GD)P(C) P(GD) From Breese and Coller 1997

  8. The product (chain) rule (This is for Bayesian networks, the general case comeslater in this lecture)

  9. 0.7 1.9 Bayes network node is a function A B b ¬a C P(C|¬a,b) = U[0.7,1.9]

  10. Bayes network node is a function A B C • A BN node is a conditionaldistribution function • Inputs = Parent values • Output = distribution over values • Any type of function from valuesto distributions.

  11. Burglary Earthquake Alarm JohnCalls MaryCalls Example: The alarm network Note: Each number in the tables represents aboolean distribution. Hence there is adistribution output forevery input.

  12. Burglary Earthquake Alarm JohnCalls MaryCalls Example: The alarm network Probability distribution for”no earthquake, no burglary,but alarm, and both Mary andJohn make the call”

  13. The general chain rule (always true): The Bayesian network chain rule: Meaning of Bayesian network The BN is a correct representation of the domain iff each node is conditionally independent of its predecessors, given its parents.

  14. Burglary Burglary Earthquake Earthquake Alarm Alarm JohnCalls JohnCalls MaryCalls MaryCalls The alarm network The fully correct alarm network might look something like the figure. The Bayesian network (red) assumes that some of the variables are independent (or that the dependecies can be neglected since they are very weak). The correctness of the Bayesian network of course depends on the validity of these assumptions. It is this sparse connection structure that makes the BN approachfeasible (~linear growth in complexity rather than exponential)

  15. How construct a BN? • Add nodes in causal order (”causal” determined from expertize). • Determine conditional independence using either (or all) of the following semantics: • Blocking/d-separation rule • Non-descendant rule • Markov blanket rule • Experience/your beliefs

  16. Cancer GeneticDamage LungTumour SerumCalcium Path blocking & d-separation Intuitively, knowledge about Serum Calcium influences our belief about Cancer, if we don’t know the value of Cancer, which in turn influences our belief about Lung Tumour, etc. However, if we are given the value of Cancer (i.e. C= true or false), then knowledge of Serum Calcium will not tell us anything about Lung Tumour that we don’t already know. We say that Cancer d-separates (direction-dependent separation) Serum Calcium and Lung Tumour.

  17. W Xi Xk2 Xk2 Xk1 Xk1 Xk3 Xk3 Xj Path blocking & d-separation XiandXjare d-separatedif all paths betweeen them are blocked Two nodes Xi and Xj are conditionally independent given a set W= {X1,X2,X3,...} of nodes if for every undirected path in the BN between Xi and Xj there is some node Xk on the path having one of the following three properties: • Xk∈W, and both arcs on the path lead out of Xk. • Xk∈W, and one arc on the path leads into Xk and one arc leads out. • Neither Xk nor any descendant of Xk is in W, and both arcs on the path lead into Xk. Xkblocks the path between Xi and Xj

  18. A node is conditionally independent of its non-descendants (Zij), given its parents. Non-descendants

  19. A node is conditionally independent of all other nodes in the network, given its parents, children, and children’s parents These constitute the nodes Markov blanket. Markov blanket X2 X3 X1 X4 X5 X6 Xk

  20. Boolean  Boolean Boolean  Discrete Boolean  Continuous Discrete  Boolean Discrete  Discrete Discrete  Continuous Continuous  Boolean Continuous  Discrete Continuous  Continuous Efficient representation of PDs A C P(C|a,b) ? B

  21. Efficient representation of PDs Boolean  Boolean:Noisy-OR, Noisy-AND Boolean/Discrete  Discrete:Noisy-MAX Bool./Discr./Cont.  Continuous:Parametric distribution (e.g. Gaussian) Continuous  Boolean:Logit/Probit

  22. Noisy-OR exampleBoolean → Boolean The effect (E) is off (false) when none of the causes are true. The probability for the effect increases with the number of true causes. (for this example) Example from L.E. Sucar

  23. Cn C2 C1 qn q1 q2 PROD P(E|C1,...) Noisy-OR general caseBoolean → Boolean Example on previous slide usedqi = 0.1 for all i. Image adapted from Laskey & Mahoney 1999

  24. Cn C2 C1 en e1 e2 MAX Observed effect Noisy-MAXBoolean → Discrete Effect takes on the max value from different causes Restrictions: • Each cause must have an off state, which does not contribute to effect • Effect is off when all causes are off • Effect must have consecutive escalating values: e.g., absent, mild, moderate, severe. Image adapted from Laskey & Mahoney 1999

  25. Parametric probability densitiesBoolean/Discr./Continuous → Continuous Use parametric probability densities, e.g., the normal distribution Gaussian networks (a = input to the node)

  26. Probit & LogitDiscrete → Boolean If the input is continuous but output is boolean, use probit or logit P(A|x) x

  27. Age: {1-10, 11-20,...} Gender: {M, F} Toxics: {Low, Medium, High} Smoking: {No, Light, Heavy} Cancer: {No, Benign, Malignant} Serum Calcium: Level Lung Tumour: {Yes, No} Age Gender Toxics Smoking Cancer LungTumour SerumCalcium The cancer network Discrete Discrete/boolean Discrete Discrete Discrete Continuous Discrete/boolean

  28. Inference in BN Inference means computing P(X|e), where X is a query (variable) and e is a set of evidence variables (for which we know the values). Examples: P(Burglary | john_calls, mary_calls) P(Cancer | age, gender, smoking, serum_calcium) P(Cavity | toothache, catch)

  29. Exact inference in BN ”Doable” for boolean variables: Look up entries in conditional probability tables (CPTs).

  30. Burglary Earthquake Alarm JohnCalls MaryCalls Example: The alarm network What is the probability for a burglary if both John and Mary call? Evidence variables = {J,M} Query variable = B

  31. Burglary Earthquake Alarm JohnCalls MaryCalls Example: The alarm network What is the probability for a burglary if both John and Mary call? = 0.001 = 10-3

  32. Burglary Earthquake Alarm JohnCalls MaryCalls Example: The alarm network What is the probability for a burglary if both John and Mary call?

  33. Burglary Earthquake Alarm JohnCalls MaryCalls Example: The alarm network What is the probability for a burglary if both John and Mary call?

  34. Burglary Earthquake Alarm JohnCalls MaryCalls Example: The alarm network What is the probability for a burglary if both John and Mary call? Answer: 28%

  35. Use depth-first search A lot of unneccesary repeated computation...

  36. Complexity of exact inference • By eliminating repeated calculation & uninteresting paths we can speed up the inference a lot. • Linear time complexity for singly connected networks (polytrees). • Exponential for multiply connected networks. • Clustering can improve this

  37. Approximate inference in BN • Exact inference is intractable in large multiply connected BNs ⇒use approximate inference: Monte Carlo methods (random sampling). • Direct sampling • Rejection sampling • Likelihood weighting • Markov chain Monte Carlo

  38. Markov chain Monte Carlo • Fix the evidence variables (E1, E2, ...) at their given values. • Initialize the network with values for all other variables, including the query variable. • Repeat the following many, many, many times: • Pick a non-evidence variable at random (query Xi or hidden Yj) • Select a new value for this variable, conditioned on the current values in the variable’s Markov blanket. Monitor the values of the query variables.

More Related