
Cooperating Intelligent Systems

Bayesian networks

Chapter 14, AIMA

Inference
  • Inference in the statistical setting means computing the probabilities of different outcomes, given the available information.
  • We need an efficient method for doing this, one that is more powerful than the naïve Bayes model.
Bayesian networks

A Bayesian network is a directed graph in which each node is annotated with quantitative probability information:

  • A set of random variables, {X1,X2,X3,...}, makes up the nodes of the network.
  • A set of directed links connects pairs of nodes, parent → child.
  • Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) .
  • The graph is a directed acyclic graph (DAG).
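
To make the definition concrete, here is a minimal sketch (in Python, with illustrative numbers, anticipating the dentist network on the next slide) of how such a network can be held in code: nodes, parent links, and one CPT per node.

```python
# Minimal sketch of the definition above: random variables as nodes,
# directed parent -> child links, and one conditional distribution
# P(Xi | Parents(Xi)) per node. Numbers are illustrative only.
network = {
    'Cavity':    {'parents': [],         'cpt': {(): 0.2}},
    'Toothache': {'parents': ['Cavity'], 'cpt': {(True,): 0.6, (False,): 0.1}},
    'Catch':     {'parents': ['Cavity'], 'cpt': {(True,): 0.9, (False,): 0.2}},
}
# Each CPT maps a tuple of parent values to P(X = true | parents).
# The graph is a DAG because links only run from parents to children.
```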
The dentist network

[Figure: Weather (independent of the rest); Cavity → Toothache, Cavity → Catch]

The alarm network

The burglar alarm responds to both earthquakes and burglars.

Two neighbors, John and Mary, have promised to call you when the alarm goes off.

John nearly always calls when there's an alarm, and sometimes when there's not an alarm.

Mary sometimes misses the alarm (she likes loud music).

[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls]

The cancer network

[Figure: Age → Toxics; Age, Gender → Smoking; Toxics, Smoking → Cancer; Cancer → Serum Calcium; Cancer, Genetic Damage → Lung Tumour]

From Breese and Koller 1997

The cancer network

[Figure: the same network, with the independence assumptions marked]

P(A,G) = P(A)P(G)

P(C|S,T,A,G) = P(C|S,T)

P(A,G,T,S,C,SC,LT,GD) = P(A)P(G)P(T|A)P(S|A,G)P(C|T,S)P(GD)P(SC|C)P(LT|C,GD)

P(SC,C,LT,GD) = P(SC|C)P(LT|C,GD)P(C)P(GD)

From Breese and Koller 1997

The product (chain) rule

P(X1, X2, ..., Xn) = ∏i P(Xi | Parents(Xi))

(This is the form for Bayesian networks; the general case comes later in this lecture.)

Bayes network node is a function

[Figure: node C with parents A and B; for the parent values (¬a, b) the node outputs the distribution P(C|¬a,b) = U[0.7, 1.9], i.e. uniform on the interval [0.7, 1.9]]

Bayes network node is a function

[Figure: node C with parents A and B]

  • A BN node is a conditional distribution function
  • Inputs = parent values
  • Output = distribution over values
  • Any type of function from values to distributions is allowed.
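
A toy sketch of this view, with hypothetical values: the node C below is literally a function from its parents' values to a distribution over its own values.

```python
# Hypothetical sketch: a BN node as a function from parent values (A, B)
# to a distribution over the node's own values.
def node_C(a: bool, b: bool) -> dict:
    table = {
        (True,  True):  {'low': 0.1, 'high': 0.9},
        (True,  False): {'low': 0.5, 'high': 0.5},
        (False, True):  {'low': 0.7, 'high': 0.3},  # cf. P(C | not a, b)
        (False, False): {'low': 0.9, 'high': 0.1},
    }
    return table[(a, b)]

print(node_C(False, True))  # the distribution output for the input (not a, b)
```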
Example: The alarm network

[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, MaryCalls; with a conditional probability table (CPT) for each node]

Note: Each number in the tables represents a boolean distribution. Hence there is a distribution output for every input.

Example: The alarm network

[Figure: the same network and CPTs]

Probability distribution for "no earthquake, no burglary, but alarm, and both Mary and John make the call":

P(¬e, ¬b, a, j, m) = P(¬e)P(¬b)P(a|¬b,¬e)P(j|a)P(m|a)
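
A quick numeric check of this factorization, assuming the standard CPT values from AIMA chapter 14 (the tables themselves are images on the original slides):

```python
# Joint probability P(not e, not b, a, j, m), assuming the AIMA ch. 14 CPTs.
p_not_e     = 1 - 0.002   # P(no earthquake)
p_not_b     = 1 - 0.001   # P(no burglary)
p_a         = 0.001       # P(alarm | no burglary, no earthquake)
p_j_given_a = 0.90        # P(JohnCalls | alarm)
p_m_given_a = 0.70        # P(MaryCalls | alarm)

p = p_not_e * p_not_b * p_a * p_j_given_a * p_m_given_a
print(f"{p:.6f}")         # ~0.000628
```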

Meaning of Bayesian network

The general chain rule (always true):

P(X1, X2, ..., Xn) = P(X1)P(X2|X1)P(X3|X2,X1) ··· = ∏i P(Xi | Xi−1, ..., X1)

The Bayesian network chain rule:

P(X1, X2, ..., Xn) = ∏i P(Xi | Parents(Xi))

The BN is a correct representation of the domain iff each node is conditionally independent of its predecessors, given its parents.

The alarm network

[Figure: two versions of the network over Burglary, Earthquake, Alarm, JohnCalls, MaryCalls: a densely connected "fully correct" graph, and the sparse BN (red)]

The fully correct alarm network might look something like the figure.

The Bayesian network (red) assumes that some of the variables are independent (or that the dependencies can be neglected since they are very weak).

The correctness of the Bayesian network of course depends on the validity of these assumptions.

It is this sparse connection structure that makes the BN approach feasible: complexity grows roughly linearly with the number of nodes rather than exponentially. With at most k parents per node, n boolean nodes need about n·2^k conditional probabilities instead of the 2^n − 1 entries of the full joint.
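
A quick sanity check of this claim (the numbers are chosen for illustration):

```python
# Parameter count: n boolean nodes with at most k parents each, versus
# the full joint distribution over the same n variables.
n, k = 30, 5
print(n * 2**k)   # 960 conditional probabilities for the sparse BN
print(2**n - 1)   # 1073741823 entries for the full joint table
```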

How to construct a BN?
  • Add nodes in causal order ("causal" determined from expertise).
  • Determine conditional independence using either (or all) of the following semantics:
    • Blocking/d-separation rule
    • Non-descendant rule
    • Markov blanket rule
    • Experience/your beliefs
Path blocking & d-separation

[Figure: Cancer → Serum Calcium; Cancer → Lung Tumour ← Genetic Damage]

Intuitively, if we don't know the value of Cancer, knowledge about Serum Calcium influences our belief about Cancer, which in turn influences our belief about Lung Tumour, etc.

However, if we are given the value of Cancer (i.e. C= true or false), then knowledge of Serum Calcium will not tell us anything about Lung Tumour that we don’t already know.

We say that Cancer d-separates (direction-dependent separation) Serum Calcium and Lung Tumour.

Some definitions of BN (from Wikipedia)
  • X is a Bayesian network with respect to G if its joint probability density function can be written as a product of the individual density functions, conditional on their parent variables:

P(X1, ..., XN) = ∏v P(Xv | Parents(Xv))

X = {X1, X2, ..., XN} is a set of random variables

G = (V,E) is a directed acyclic graph (DAG) of vertices (V) and edges (E)

Some definitions of BN (from Wikipedia)
  • X is a Bayesian network with respect to G if it satisfies the local Markov property: each variable is conditionally independent of its non-descendants given its parent variables:

Xv ⊥ X(non-descendants of v) | X(parents of v)

Note: the set of parents is a subset of the set of non-descendants, because the graph is acyclic.

X = {X1, X2, ..., XN} is a set of random variables

G = (V,E) is a directed acyclic graph (DAG) of vertices (V) and edges (E)

Some definitions of BN (from Wikipedia)
  • X is a Bayesian network with respect to G if every node is conditionally independent of all other nodes in the network, given its Markov blanket. The Markov blanket of a node is its parents, children and children's parents.

X = {X1, X2, ..., XN} is a set of random variables

G = (V,E) is a directed acyclic graph (DAG) of vertices (V) and edges (E)

Markov blanket

A node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents.

These constitute the node's Markov blanket.

[Figure: node Xk with Markov blanket X1, ..., X6 – its parents, children, and children's parents]

Path blocking & d-separation

[Figure: paths between Xi and Xj passing through nodes Xk1, Xk2, Xk3; evidence set W]

Xi and Xj are d-separated if all paths between them are blocked.

Two nodes Xi and Xj are conditionally independent given a set W = {X1, X2, X3, ...} of nodes if for every undirected path in the BN between Xi and Xj there is some node Xk on the path having one of the following three properties:

  • Xk∈W, and both arcs on the path lead out of Xk.
  • Xk∈W, and one arc on the path leads into Xk and one arc leads out.
  • Neither Xk nor any descendant of Xk is in W, and both arcs on the path lead into Xk.
Such a node Xk blocks the path between Xi and Xj.
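
A minimal sketch of this check for a single path, with the graph given as a parent → children map (the function and variable names below are mine, not from the slides):

```python
# Check whether one undirected path is blocked by an evidence set W,
# using the three properties listed above.
def descendants(children, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def path_is_blocked(children, path, W):
    """True if some interior node Xk on `path` blocks it given W."""
    W = set(W)
    for prev, xk, nxt in zip(path, path[1:], path[2:]):
        arc_in_from_prev = xk in children.get(prev, [])  # prev -> xk ?
        arc_in_from_next = xk in children.get(nxt, [])   # nxt -> xk ?
        if arc_in_from_prev and arc_in_from_next:        # both arcs lead into xk
            # Property 3: a collider blocks unless xk or a descendant is in W.
            if xk not in W and not (descendants(children, xk) & W):
                return True
        elif xk in W:                                    # properties 1 and 2
            return True
    return False

# The cancer example: given Cancer, Serum Calcium is d-separated from Lung Tumour.
children = {'Cancer': ['SerumCalcium', 'LungTumour'],
            'GeneticDamage': ['LungTumour']}
print(path_is_blocked(children,
                      ['SerumCalcium', 'Cancer', 'LungTumour'], {'Cancer'}))  # True
```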
Some definitions of BN (from Wikipedia)
  • X is a Bayesian network with respect to G if, for any two nodes i, j:

Xi ⊥ Xj | X(d-separating set(i,j))

The d-separating set(i,j) is the set of nodes that d-separates nodes i and j.

The Markov blanket of node i is the minimal set of nodes that d-separates node i from all other nodes.

X = {X1, X2, ..., XN} is a set of random variables

G = (V,E) is a directed acyclic graph (DAG) of vertices (V) and edges (E)

Causal networks

[Figure: the two chains A → B → C and A ← B ← C]

  • Bayesian networks are usually used to represent causal relationships. This is, however, not strictly necessary: a directed edge from node i to node j does not require that Xj be causally dependent on Xi.
    • This is demonstrated by the fact that Bayesian networks on the two graphs in the figure are equivalent: they impose the same conditional independence requirements.

A causal network is a Bayesian network with an explicit requirement that the relationships be causal.

Causal networks

[Figure: the two chains A → B → C and A ← B ← C]

The equivalence is proved with Bayes' theorem:

P(A)P(B|A)P(C|B) = P(A|B)P(B)P(C|B) = P(A|B)P(B|C)P(C)

i.e., the factorization of the chain A → B → C equals that of A ← B ← C.

Exercise 14.12* (a) in AIMA
  • Two astronomers in different parts of the world make measurements M1 and M2 of the number of stars N in some small region of the sky, using their telescopes. Normally there is a small possibility e of error up to one star in each direction. Each telescope can also (with a much smaller probability f) be badly out of focus (events F1 and F2), in which case the scientist will undercount by three or more stars (or, if N is less than 3, fail to detect any stars at all). Consider the three networks in Figure 14.22*.
    • (a) Which of these Bayesian networks are correct (but not necessarily efficient) representations of the preceding information?

*In the 2nd edition, this is Exercise 14.3 and the figure is Figure 14.19.

How to construct an efficient BN?
  • Add nodes in causal order ("causal" determined from expertise).
  • Determine conditional independence using either (or all) of the following semantics:
    • Blocking/d-separation rule
    • Non-descendant rule
    • Markov blanket rule
    • Experience/your beliefs
Exercise 14.12 (a) in AIMA

[Figure: the three candidate networks (i), (ii), (iii) over the nodes F1, F2, N, M1, M2]

Exercise 14.12 (a) in AIMA
  • (i) must be incorrect – N is d-separated from F1 and F2 given M, i.e. knowing the focus states F would not affect our belief about N if we know M. This cannot be correct, since an out-of-focus telescope changes how a measurement should be interpreted.

[Figure: network (i) marked wrong]


Exercise 14.12 (a) in AIMA
  • (ii) is correct – it describes the causal relationships. It is a causal network.

[Figure: network (i) marked wrong, network (ii) marked ok]


Exercise 14.12 (a) in AIMA
  • (iii) is also ok – a fully connected graph would be correct (but not efficient). (iii) has all connections except Mi–Fj and Fi–Fj; it is correct, but neither causal nor efficient.

[Figure: (i) marked wrong, (ii) marked ok, (iii) marked ok – but not good]

Exercise 14.12 (a) in AIMA

(ii) says: P(F1,F2,N,M1,M2) = P(N)P(F1)P(F2)P(M1|F1,N)P(M2|F2,N)

(iii) says: P(F1,F2,N,M1,M2) = P(M1)P(M2|M1)P(N|M1,M2)P(F1|N,M1)P(F2|N,M2)

Exercise 14.12 (a) in AIMA

The full correct expression (one version, using the ordering N, F1, F2, M1, M2) is:

P(F1,F2,N,M1,M2) = P(N)P(F1|N)P(F2|F1,N)P(M1|F1,F2,N)P(M2|M1,F1,F2,N)

(ii) simplifies exactly the factors that can safely be simplified, so (ii) is ok.

Exercise 14.12 (a) in AIMA

The full correct expression (another version, using the ordering M1, M2, N, F1, F2) is:

P(F1,F2,N,M1,M2) = P(M1)P(M2|M1)P(N|M1,M2)P(F1|N,M1,M2)P(F2|F1,N,M1,M2)

(iii) simplifies this correctly, but it is not as efficient as (ii): it requires more conditional probabilities.

Exercise 14.12 (a) in AIMA

(i) says: P(F1,F2,N,M1,M2) = P(F1)P(F2)P(M1|F1)P(M2|F2)P(N|M1,M2)

The full correct expression (a third version, using the ordering F1, F2, M1, M2, N) is:

P(F1,F2,N,M1,M2) = P(F1)P(F2|F1)P(M1|F1,F2)P(M2|M1,F1,F2)P(N|M1,M2,F1,F2)

(i) replaces the last factor P(N|M1,M2,F1,F2) by P(N|M1,M2). This is an unreasonable approximation. The rest is ok.

Efficient representation of PDs

[Figure: node C with parents A and B; how do we represent P(C|a,b)?]

Boolean → Boolean
Boolean → Discrete
Boolean → Continuous
Discrete → Boolean
Discrete → Discrete
Discrete → Continuous
Continuous → Boolean
Continuous → Discrete
Continuous → Continuous

Efficient representation of PDs

Boolean → Boolean: Noisy-OR, Noisy-AND

Boolean/Discrete → Discrete: Noisy-MAX

Bool./Discr./Cont. → Continuous: Parametric distribution (e.g. Gaussian)

Continuous → Boolean: Logit/Probit

Noisy-OR example: Boolean → Boolean

The effect (E) is off (false) when none of the causes are true. The probability for the effect increases with the number of true causes:

P(E | C1, ..., Cn) = 1 − ∏(i : Ci = true) qi

(for this example, qi = 0.1 for all i)

Example from L.E. Sucar

Noisy-OR, general case: Boolean → Boolean

[Figure: causes C1, C2, ..., Cn with inhibitor probabilities q1, q2, ..., qn feeding a product node that outputs P(E|C1,...,Cn)]

The example on the previous slide used qi = 0.1 for all i.

Needs only n parameters, not 2^n parameters.

Image adapted from Laskey & Mahoney 1999

Noisy-OR example (II)
  • Fever is true if and only if Cold, Flu or Malaria is true.
  • Each cause has an independent chance of causing the effect:
    • all possible causes are listed
    • inhibitors are independent

[Figure: Cold, Flu, Malaria → Fever, with inhibitor probabilities q1, q2, q3]

Noisy-OR example (II)
  • P(Fever | Cold) = 0.4 ⇒ q1 = 0.6
  • P(Fever | Flu) = 0.8 ⇒ q2 = 0.2
  • P(Fever | Malaria) = 0.9 ⇒ q3 = 0.1

[Figure: Cold, Flu, Malaria → Fever with q1 = 0.6, q2 = 0.2, q3 = 0.1]
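
The full CPT for Fever follows mechanically from the three qi values; a small sketch that tabulates it:

```python
from itertools import product

# Noisy-OR CPT for the Fever example: P(Fever | causes) is one minus the
# product of the inhibitor probabilities qi of the causes that are true.
q = {'Cold': 0.6, 'Flu': 0.2, 'Malaria': 0.1}

def p_fever(true_causes):
    p_no_fever = 1.0
    for cause in true_causes:
        p_no_fever *= q[cause]
    return 1.0 - p_no_fever

for bits in product([False, True], repeat=3):
    causes = [c for c, b in zip(q, bits) if b]
    print(f"P(Fever | {', '.join(causes) or 'no causes'}) = {p_fever(causes):.3f}")
```

For instance, P(Fever | Cold, Flu) = 1 − 0.6·0.2 = 0.88, and with all three causes true, 1 − 0.6·0.2·0.1 = 0.988.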


Parametric probability densities: Boolean/Discr./Continuous → Continuous

Use parametric probability densities, e.g., the normal distribution:

P(x) = N(μ, σ²) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))

Gaussian networks (a = input to the node):

P(x|a) = N(μ(a), σ(a)²), e.g. with the mean μ(a) a linear function of a
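
A sketch of one such parametric node, assuming a linear-Gaussian form (the weights are illustrative, not from the slides):

```python
import math

# Linear-Gaussian node: the input a shifts the mean of a normal density
# over the continuous child value x.
def p_x_given_a(x, a, w=0.5, b=1.0, sigma=0.3):
    mu = w * a + b                     # mean is a linear function of the input
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

print(p_x_given_a(x=1.5, a=1.0))       # density of x given parent value a = 1.0
```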

Probit & Logit: Continuous → Boolean

If the input is continuous but the output is boolean, use probit or logit:

Probit: P(A|x) = Φ((x − μ)/σ)

Logit: P(A|x) = 1 / (1 + exp(−2(x − μ)/σ))

[Figure: P(A|x) as a soft threshold function of x]
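
Both soft thresholds are one-liners; a sketch in the parameterization above (μ = threshold location, σ = width):

```python
import math

def probit(x, mu, sigma):
    """P(A|x) as the standard normal CDF of (x - mu) / sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def logit(x, mu, sigma):
    """P(A|x) as a logistic sigmoid; the factor 2 gives a slope at mu
    comparable to the probit's."""
    return 1.0 / (1.0 + math.exp(-2.0 * (x - mu) / sigma))

print(probit(6.0, mu=5.0, sigma=1.0), logit(6.0, mu=5.0, sigma=1.0))
```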

The cancer network

Age: {1-10, 11-20, ...} – discrete
Gender: {M, F} – discrete/boolean
Toxics: {Low, Medium, High} – discrete
Smoking: {No, Light, Heavy} – discrete
Cancer: {No, Benign, Malignant} – discrete
Serum Calcium: level – continuous
Lung Tumour: {Yes, No} – discrete/boolean

[Figure: the cancer network annotated with these node types]

Inference in BN

Inference means computing P(X|e), where X is a query variable and e is a set of evidence variables (whose values we know).

Examples:

P(Burglary | john_calls, mary_calls)

P(Cancer | age, gender, smoking, serum_calcium)

P(Cavity | toothache, catch)

Exact inference in BN

"Doable" for boolean variables: look up entries in the conditional probability tables (CPTs) and sum out the hidden variables Y:

P(X|e) = α P(X,e) = α Σy P(X,e,y)

Example: The alarm network

[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, MaryCalls]

What is the probability for a burglary if both John and Mary call?

Evidence variables = {J, M}

Query variable = B

Example: The alarm network

What is the probability for a burglary if both John and Mary call? Start from the prior:

P(b) = 0.001 = 10^-3

Example: The alarm network

What is the probability for a burglary if both John and Mary call? Condition on the evidence and sum out the hidden variables E and A:

P(B|j,m) = α P(B,j,m) = α Σe Σa P(B)P(e)P(a|B,e)P(j|a)P(m|a)

Example: The alarm network

What is the probability for a burglary if both John and Mary call?

P(B|j,m) = α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩

Answer: 28%
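
The 28% can be reproduced by brute-force enumeration, again assuming the standard AIMA chapter 14 CPT values:

```python
from itertools import product

# Inference by enumeration: P(B | j, m) on the alarm network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls = true | Alarm)

def joint(b, e, a, j, m):
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Sum out the hidden variables E and A with the evidence J = M = true.
unnorm = {b: sum(joint(b, e, a, True, True)
                 for e, a in product([True, False], repeat=2))
          for b in (True, False)}
alpha = 1.0 / sum(unnorm.values())
print(f"P(b | j, m) = {alpha * unnorm[True]:.3f}")   # 0.284
```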

Use depth-first search

Evaluating the nested sums by depth-first search involves a lot of unnecessary repeated computation...

Complexity of exact inference
  • By eliminating repeated calculations and uninteresting paths (as in variable elimination) we can speed up the inference a lot.
  • Linear time complexity for singly connected networks (polytrees).
  • Exponential for multiply connected networks.
    • Clustering can improve this
Approximate inference in BN
  • Exact inference is intractable in large multiply connected BNs ⇒ use approximate inference: Monte Carlo methods (random sampling).
    • Direct sampling
    • Rejection sampling
    • Likelihood weighting
    • Markov chain Monte Carlo
Markov chain Monte Carlo
  • Fix the evidence variables (E1, E2, ...) at their given values.
  • Initialize the network with values for all other variables, including the query variable.
  • Repeat the following many, many, many times:
    • Pick a non-evidence variable at random (query Xi or hidden Yj)
    • Select a new value for this variable, conditioned on the current values in the variable’s Markov blanket.

Monitor the values of the query variables.
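
A minimal sketch of this loop for the alarm query P(B | j, m), with the same assumed AIMA CPT values as in the enumeration example; sampling a variable conditioned on its Markov blanket is implemented here, equivalently, as sampling proportional to the full joint with all other variables held fixed:

```python
import random

P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a):
    """P(b, e, a, j, m) with the evidence J = M = true fixed."""
    p = (0.001 if b else 0.999) * (0.002 if e else 0.998)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return p * (0.90 if a else 0.05) * (0.70 if a else 0.01)

def gibbs(n_samples=200_000, seed=0):
    rng = random.Random(seed)
    state = {'B': False, 'E': False, 'A': False}   # arbitrary initial values
    hits = 0
    for _ in range(n_samples):
        var = rng.choice(['B', 'E', 'A'])          # pick a non-evidence variable
        p = {}
        for val in (True, False):
            s = {**state, var: val}
            p[val] = joint(s['B'], s['E'], s['A'])
        state[var] = rng.random() < p[True] / (p[True] + p[False])
        hits += state['B']                         # monitor the query variable
    return hits / n_samples

print(gibbs())   # converges towards ~0.28
```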