Inference in Bayesian Networks

# Inference in Bayesian Networks - PowerPoint PPT Presentation

Inference in Bayesian Networks. Agenda. Reading off independence assumptions Efficient inference in Bayesian Networks Top-down inference Variable elimination Monte-Carlo methods. Some Applications of BN. Medical diagnosis Troubleshooting of hardware/software systems

## PowerPoint Slideshow about 'Inference in Bayesian Networks' - teague

Presentation Transcript

### Inference in Bayesian Networks

Agenda
• Efficient inference in Bayesian Networks
• Top-down inference
• Variable elimination
• Monte-Carlo methods
Some Applications of BN
• Medical diagnosis
• Troubleshooting of hardware/software systems
• Fraud/uncollectible debt detection
• Data mining
• Analysis of genetic sequences
• Data interpretation, computer vision, image understanding

BN from Last Lecture

[Figure: the alarm network. Burglary → Alarm ← Earthquake (causes); Alarm → JohnCalls and Alarm → MaryCalls (effects)]

Intuitive meaning of arc from x to y: “x has direct influence on y”

Directed acyclic graph

Arcs do not necessarily encode causality!

[Figure: two different networks over the same variables A, B, C]

2 BNs that can encode the same joint probability distribution

• Given B, does the value of A affect the probability of C?
• P(C|B,A) = P(C|B)?
• No!
• C’s parent (B) is given, and so C is independent of its non-descendants (A)
• Independence is symmetric: C ⊥ A | B ⇒ A ⊥ C | B

[Figure: network over A, B, C in which B is the parent of C]

What does the BN encode?

[Figure: the alarm network]

• Burglary ⊥ Earthquake
• JohnCalls ⊥ MaryCalls | Alarm
• JohnCalls ⊥ Burglary | Alarm
• JohnCalls ⊥ Earthquake | Alarm
• MaryCalls ⊥ Burglary | Alarm
• MaryCalls ⊥ Earthquake | Alarm

A node is independent of its non-descendants, given its parents

[Figure: the alarm network]

• How about Burglary ⊥ Earthquake | Alarm?
• No! Why?
• P(B∧E|A) = P(A|B,E)P(B∧E)/P(A) = 0.00075
• P(B|A)P(E|A) = 0.086

• How about Burglary ⊥ Earthquake | JohnCalls?
• No! Why?
• Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent
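
The comparison of P(B∧E|A) with P(B|A)P(E|A) can be checked numerically. The sketch below is only an illustration: it uses the CPT values that appear in the top-down inference slides, and the helper names (`prior`, `joint_bea`) are made up for this example.

```python
from itertools import product

# CPT values as they appear on the slides (True = the event occurs).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def prior(p_true, value):
    """P(X = value) for a Boolean variable with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint_bea(b, e, a):
    """P(B=b, E=e, A=a) = P(b) P(e) P(a | b, e)."""
    return prior(P_B, b) * prior(P_E, e) * prior(P_A[(b, e)], a)

# P(Alarm), then the two quantities compared on the slide.
p_alarm = sum(joint_bea(b, e, True) for b, e in product([True, False], repeat=2))
p_b_and_e_given_a = joint_bea(True, True, True) / p_alarm
p_b_given_a = sum(joint_bea(True, e, True) for e in [True, False]) / p_alarm
p_e_given_a = sum(joint_bea(b, True, True) for b in [True, False]) / p_alarm

print(p_b_and_e_given_a)            # roughly 0.00075
print(p_b_given_a * p_e_given_a)    # roughly 0.086
```

The two numbers differ by about two orders of magnitude, which is why Burglary and Earthquake are not independent once Alarm is observed.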
Independence relationships
• Rough intuition (this holds for tree-like graphs, polytrees):
• Evidence on the (directed) road between two variables makes them independent
• Evidence on an “A” node makes descendants independent
• Evidence on a “V” node, or below the V, makes the ancestors of the variables dependent (otherwise they are independent)
• Formal property in general case: D-separation ⇒ independence (see R&N)
Benefits of Sparse Models
• Modeling
• Fewer relationships need to be encoded (either through understanding or statistics)
• Large networks can be built up from smaller ones
• Intuition
• Dependencies/independencies between variables can be inferred through network structures
• Tractable inference

Top-Down inference

[Figure: the alarm network]

• Suppose we want to compute P(Alarm)
• P(A) = Σ_{b,e} P(A,b,e)
• P(A) = Σ_{b,e} P(A|b,e)P(b)P(e)
• P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)
• P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 + 0.29*0.999*0.002 + 0.001*0.999*0.998 = 0.00252
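
The four-term sum for P(Alarm) can be written as a loop over the parent assignments. This is a minimal sketch using the numbers from the slide; the names `P_A_GIVEN` and `p_alarm` are illustrative.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
# P(Alarm = true | Burglary, Earthquake), keyed by (burglary, earthquake).
P_A_GIVEN = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}

def p_alarm():
    """P(A) = sum over b, e of P(A | b, e) P(b) P(e)."""
    total = 0.0
    for b, e in product([True, False], repeat=2):
        p_b = P_B if b else 1.0 - P_B
        p_e = P_E if e else 1.0 - P_E
        total += P_A_GIVEN[(b, e)] * p_b * p_e
    return total

print(p_alarm())  # about 0.00252
```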

Top-Down inference

[Figure: the alarm network]

• Now, suppose we want to compute P(MaryCalls)
• P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)
• P(M) = 0.70*0.00252 + 0.01*(1-0.00252) = 0.0117
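
Once P(Alarm) is known, the same step pushes the marginal one level down to MaryCalls. A small sketch using the slide's numbers (the function name `marginal_of_child` is an assumption made for illustration):

```python
def marginal_of_child(p_child_given_parent, p_child_given_not_parent, p_parent):
    """P(child) = P(child | parent) P(parent) + P(child | not parent) P(not parent)."""
    return (p_child_given_parent * p_parent
            + p_child_given_not_parent * (1.0 - p_parent))

p_a = 0.00252                       # P(Alarm) from the previous slide
p_m = marginal_of_child(0.70, 0.01, p_a)
print(p_m)                          # about 0.0117
```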

Top-Down inference with Evidence

[Figure: the alarm network]

• Suppose we want to compute P(Alarm|Earthquake), written P(A|e)
• P(A|e) = Σ_b P(A,b|e)
• P(A|e) = Σ_b P(A|b,e)P(b) (since Burglary ⊥ Earthquake, P(b|e) = P(b))
• P(A|e) = 0.95*0.001 + 0.29*0.999 = 0.29066
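
With evidence on a parent, the sum only runs over the remaining parent. The following sketch reuses the slide's CPT values; `p_alarm_given_earthquake` is a hypothetical helper name.

```python
P_B = 0.001
# P(Alarm = true | Burglary, Earthquake), keyed by (burglary, earthquake).
P_A_GIVEN = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}

def p_alarm_given_earthquake(e):
    """P(A | E=e) = sum over b of P(A | b, e) P(b); P(b | e) = P(b) because B and E are independent."""
    return sum(P_A_GIVEN[(b, e)] * (P_B if b else 1.0 - P_B)
               for b in [True, False])

print(p_alarm_given_earthquake(True))   # about 0.29066
print(p_alarm_given_earthquake(False))  # about 0.00194
```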
Top-Down inference
• Only works if the graph of ancestors of a variable is a polytree
• Evidence given on ancestor(s) of the query variable
• Efficient:
• O(d 2^k) time, where d is the number of ancestors of a variable, with k a bound on # of parents
• Evidence on an ancestor cuts off influence of portion of graph above evidence node

[Figure: Cavity → Toothache]

Querying the BN
• The BN gives P(T|C)
Bayes’ Rule
• P(AB) = P(A|B) P(B) = P(B|A) P(A)
• So… P(A|B) = P(B|A) P(A) / P(B)
Applying Bayes’ Rule
• Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables)
• What’s P(B)?
• P(B) = Σ_a P(B,A=a) [marginalization]
• P(B,A=a) = P(B|A=a)P(A=a) [conditional probability]
• So, P(B) = Σ_a P(B|A=a) P(A=a)
Applying Bayes’ Rule
• Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables)
• What’s P(A|B)?
• P(A|B) = P(B|A)P(A)/P(B) [Bayes’ rule]
• P(B) = Σ_a P(B|A=a) P(A=a) [last slide]
• So, P(A|B) = P(B|A)P(A) / [Σ_a P(B|A=a) P(A=a)]
• [An equation that holds for all values A can take on, and all values B can take on]
• P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σ_a P(B=b|A=a) P(A=a)]

Are these the same a?

NO!

• P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σ_{a’} P(B=b|A=a’) P(A=a’)]

[Figure: Cavity → Toothache]

Querying the BN
• The BN gives P(T|C)
• P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache) [Bayes’ rule]
• Denominator computed by summing out the numerator over Cavity and ¬Cavity
• Querying a BN is just applying Bayes’ rule on a larger scale…
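
Bayes’ rule with the denominator obtained by summing the numerator over all values of the cause can be sketched as below. The slides give no numbers for Cavity and Toothache, so the example reuses the Alarm and MaryCalls values from the earlier slides; `bayes_posterior` and the dictionary keys are illustrative names.

```python
def bayes_posterior(prior, likelihood):
    """Return P(A=a | B=b) for every a.

    prior[a]      = P(A=a)
    likelihood[a] = P(B=b | A=a) for the observed value b
    """
    numerators = {a: likelihood[a] * prior[a] for a in prior}
    # Denominator P(B=b): sum the numerator over every value a' of A.
    evidence = sum(numerators.values())
    return {a: num / evidence for a, num in numerators.items()}

# P(Alarm | MaryCalls) from P(A) = 0.00252, P(M|A) = 0.70, P(M|not A) = 0.01.
posterior = bayes_posterior(
    prior={"alarm": 0.00252, "no_alarm": 1.0 - 0.00252},
    likelihood={"alarm": 0.70, "no_alarm": 0.01},
)
print(posterior["alarm"])  # about 0.15
```

Note that the loop variable inside the sum plays the role of a’ in the formula: it ranges over all values, independently of the value a being queried.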

Performing Inference
• Variables X
• Have evidence set E=e, query variable Q
• Want to compute the posterior probability distribution over Q, given E=e
• Let the non-evidence variables be Y (= X \ E)
• Straightforward method:
• Compute joint P(Y, E=e)
• Marginalize to get P(Q,E=e)
• Divide by P(E=e) to get P(Q|E=e)

Inference in the Alarm Example

[Figure: the alarm network]

P(J|M) = ?? (query Q: JohnCalls; evidence E=e: MaryCalls)

P(x1∧x2∧…∧xn) = Π_{i=1,…,n} P(xi|parents(Xi)) → full joint distribution table

1. P(J,A,B,E,MaryCalls) = P(J|A)P(MaryCalls|A)P(A|B,E)P(B)P(E) [2^4 entries]

2. P(J,MaryCalls) = Σ_{a,b,e} P(J,A=a,B=b,E=e,MaryCalls) [2 entries: one for JohnCalls, the other for ¬JohnCalls]

3. P(J|MaryCalls) = P(J,MaryCalls)/P(MaryCalls) = P(J,MaryCalls)/(Σ_j P(j,MaryCalls))
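
The three steps (build the joint, marginalize, normalize) can be sketched directly. One caveat: the slides never state P(JohnCalls|Alarm), so the values 0.90 and 0.05 below are an assumption taken from the usual textbook version of this network; the other numbers are from the slides.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # ASSUMED values, not given on the slides
P_M = {True: 0.70, False: 0.01}

def bern(p_true, value):
    """P(X = value) for a Boolean variable with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Step 1: one entry of the full joint, as a product of CPT entries."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def p_john_given_mary(m=True):
    # Step 2: sum out A, B, E to get P(J, M=m) for both values of J.
    p_jm = {j: sum(joint(b, e, a, j, m)
                   for b, e, a in product([True, False], repeat=3))
            for j in [True, False]}
    # Step 3: normalize by P(M=m) = sum over j of P(j, M=m).
    p_m = sum(p_jm.values())
    return {j: p / p_m for j, p in p_jm.items()}

print(p_john_given_mary()[True])  # about 0.178 under the assumed P(J|A)
```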

How expensive?
• P(X) = P(x1x2…xn) = Pi=1,…,n P(xi|parents(Xi))

Straightforward method:

• Use above to compute P(Y,E=e)
• P(Q,E=e) = Sy1 … Syk P(Y,E=e)
• P(E=e) = Sq P(Q,E=e)
• Step 1: O( 2n-|E| ) entries!

Normalization factor – no big deal once we have P(Q,E=e)

Can we do better?

Variable Elimination
• Consider linear network X1 → X2 → X3
• P(X) = P(X1) P(X2|X1) P(X3|X2)
• P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)
• Rearrange equation: P(X3) = Σ_{x2} P(X3|x2) Σ_{x1} P(x1) P(x2|x1)
• The inner sum is P(x2), computed for each value of X2: P(X3) = Σ_{x2} P(X3|x2) P(x2)
• Cache P(x2) for both values of X3!
• How many * and + saved?
• *: 2*4*2=16 vs 4+4=8
• +: 2*3=6 vs 2+1=3
• Can lead to huge gains in larger networks
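
The rearrangement can be checked on a small chain. The CPT numbers below are hypothetical (the slides give none for X1, X2, X3); the point of the sketch is only that the naive double sum and the version that caches P(x2) agree.

```python
from itertools import product

# Hypothetical CPTs for a binary chain X1 -> X2 -> X3 (values 0 and 1).
P_X1 = [0.6, 0.4]                          # P(X1 = x1)
P_X2_GIVEN_X1 = [[0.7, 0.3], [0.2, 0.8]]   # indexed [x1][x2]
P_X3_GIVEN_X2 = [[0.9, 0.1], [0.5, 0.5]]   # indexed [x2][x3]

def p_x3_naive():
    """Sum the full product over every (x1, x2) pair, separately for each x3."""
    return [sum(P_X1[x1] * P_X2_GIVEN_X1[x1][x2] * P_X3_GIVEN_X2[x2][x3]
                for x1, x2 in product(range(2), repeat=2))
            for x3 in range(2)]

def p_x3_eliminated():
    """Eliminate X1 first, cache P(x2), then reuse it for both values of X3."""
    p_x2 = [sum(P_X1[x1] * P_X2_GIVEN_X1[x1][x2] for x1 in range(2))
            for x2 in range(2)]
    return [sum(P_X3_GIVEN_X2[x2][x3] * p_x2[x2] for x2 in range(2))
            for x3 in range(2)]

print(p_x3_naive())
print(p_x3_eliminated())
```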

VE in Alarm Example
• P(E|j,m) = P(E,j,m)/P(j,m)
• P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)
• = P(E) Σ_b P(b) Σ_a P(a|E,b) P(j|a) P(m|a)
• = P(E) Σ_b P(b) P(j,m|E,b) [compute for all values of E,b]
• = P(E) P(j,m|E) [compute for all values of E]
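
The nested sums map directly onto nested loops, innermost sum first. As before, P(j|a) is not stated on the slides, so 0.90 and 0.05 are assumed textbook values; everything else is taken from the slides, and the function name is illustrative.

```python
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (b, e)
P_J = {True: 0.90, False: 0.05}   # ASSUMED values, not given on the slides
P_M = {True: 0.70, False: 0.01}

def bern(p_true, value):
    """P(X = value) for a Boolean variable with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def p_earthquake_given_j_and_m():
    """P(E | j, m) with j = m = true, eliminating A first, then B."""
    unnormalized = {}
    for e in [True, False]:
        sum_over_b = 0.0
        for b in [True, False]:
            # Inner factor P(j, m | e, b) = sum over a of P(a | e, b) P(j | a) P(m | a).
            p_jm_given_eb = sum(bern(P_A[(b, e)], a) * P_J[a] * P_M[a]
                                for a in [True, False])
            sum_over_b += bern(P_B, b) * p_jm_given_eb
        # P(e, j, m) = P(e) * P(j, m | e)
        unnormalized[e] = bern(P_E, e) * sum_over_b
    p_jm = sum(unnormalized.values())
    return {e: p / p_jm for e, p in unnormalized.items()}

print(p_earthquake_given_j_and_m()[True])  # about 0.176 under the assumed P(J|A)
```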

What order to perform VE?
• For tree-like BNs (polytrees), order so parents come before children
• # of entries in each intermediate probability table is 2^(# of parents of a node)
• If the number of parents of a node is bounded, then VE is linear time!
• Other networks: intermediate factors may become large
Non-polytree networks

[Figure: A → B, A → C, B → D, C → D]

• P(D) = Σ_a Σ_b Σ_c P(a)P(b|a)P(c|a)P(D|b,c) = Σ_b Σ_c P(D|b,c) Σ_a P(a)P(b|a)P(c|a)
• No more simplifications…

Approximate Inference Techniques
• Based on the idea of Monte Carlo simulation
• Basic idea:
• To estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed
• Conditional simulation:
• To estimate the probability P(H) that a coin picked out of bucket B flips heads, I can:
• Pick a coin C out of B (occurs with probability P(C))
• Flip C and observe whether it flips heads (occurs with probability P(H|C))
• Put C back and repeat from step 1 many times
• Return the fraction of heads observed (estimate of P(H))
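
The coin-from-a-bucket procedure can be simulated in a few lines. The bucket contents below are hypothetical, chosen only so that the exact answer (0.5*0.5 + 0.5*0.9 = 0.7) is easy to compare against.

```python
import random

# Hypothetical bucket: (coin name, P(C) of drawing it, P(H | C)).
BUCKET = [("fair", 0.5, 0.5), ("biased", 0.5, 0.9)]

def estimate_p_heads(n_trials, rng):
    """Estimate P(H) by repeatedly drawing a coin and flipping it once."""
    heads = 0
    draw_weights = [p_coin for _, p_coin, _ in BUCKET]
    for _ in range(n_trials):
        # Pick a coin C out of the bucket (probability P(C)).
        _, _, p_heads = rng.choices(BUCKET, weights=draw_weights)[0]
        # Flip it and record whether it lands heads (probability P(H | C)).
        if rng.random() < p_heads:
            heads += 1
        # The coin goes back implicitly: the bucket is never modified.
    return heads / n_trials

estimate = estimate_p_heads(100_000, random.Random(0))
print(estimate)  # close to the exact value 0.7
```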

Approximate Inference: Monte-Carlo Simulation

[Figure: the alarm network]

• Sample from the joint distribution, e.g. B=0, E=0, A=0, J=1, M=0
• As more samples are generated, the distribution of the samples approaches the joint distribution!

Samples:
• B=0, E=0, A=0, J=1, M=0
• B=0, E=0, A=0, J=0, M=0
• B=0, E=0, A=0, J=0, M=0
• B=1, E=0, A=1, J=1, M=0

• Inference: given evidence E=e (e.g., J=1)
• Remove the samples that conflict (here, the two samples with J=0)
• Distribution of remaining samples approximates the conditional distribution!
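
Sampling the network top-down and discarding samples that conflict with the evidence is usually called rejection sampling. The sketch below is one way to write it; P(JohnCalls|Alarm) is again an assumed textbook value since the slides do not state it, and the function names are illustrative.

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (b, e)
P_J = {True: 0.90, False: 0.05}   # ASSUMED values, not given on the slides
P_M = {True: 0.70, False: 0.01}

def sample_network(rng):
    """Draw one joint sample, sampling parents before children."""
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    j = rng.random() < P_J[a]
    m = rng.random() < P_M[a]
    return {"B": b, "E": e, "A": a, "J": j, "M": m}

def rejection_estimate(query, evidence, n_samples, rng):
    """Estimate P(query = true | evidence) from the samples that match the evidence."""
    kept = hits = 0
    for _ in range(n_samples):
        sample = sample_network(rng)
        if all(sample[var] == val for var, val in evidence.items()):
            kept += 1
            hits += sample[query]
    return (hits / kept if kept else float("nan")), kept

estimate, kept = rejection_estimate("A", {"J": True}, 100_000, random.Random(0))
print(estimate, kept)  # only a few percent of the samples survive the filter
```

Most samples are thrown away here because JohnCalls is itself uncommon, which previews the rare event problem discussed next.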

How many samples?
• Error of estimate, for n samples, is on average proportional to 1/√n
• Variance-reduction techniques
Rare Event Problem:
• What if some events are really rare (e.g., burglary & earthquake)?
• # of samples must be huge to get a reasonable estimate
• Solution: likelihood weighting
• Enforce that each sample agrees with evidence
• While generating a sample, keep track of the ratio of
• (how likely the sampled value is to occur in the real world) / (how likely you were to generate the sampled value)

Likelihood weighting

[Figure: the alarm network]

• Suppose evidence Alarm & MaryCalls
• Sample B,E with P=0.5
• Each sample starts with weight w=1
• Sample 1: B=0, E=1 gives w=0.008; A=1 gives w=0.0023; M=1, J=1 gives w=0.0016
• A=1 is enforced, and the weight updated to reflect the likelihood that this occurs
• Sample 2: B=0, E=0 gives w=3.988; A=1 gives w=0.004; M=1, J=1 gives w=0.0028
• Sample 3: B=1, E=0, A=1 gives w=0.00375; M=1, J=1 gives w=0.0026
• Sample 4: B=1, E=1, A=1, M=1, J=1 gives w=5e-6

Likelihood weighting
• Suppose evidence Alarm & MaryCalls
• Sample B,E with P=0.5
• N=4 gives P(B|A,M) ~= (0.0026 + 0)/(0.0016 + 0.0028 + 0.0026 + 0) = 0.371
• Exact inference gives P(B|A,M) = 0.375

Samples and weights:
• B=0, E=1, A=1, M=1, J=1: w=0.0016
• B=0, E=0, A=1, M=1, J=1: w=0.0028
• B=1, E=0, A=1, M=1, J=1: w=0.0026
• B=1, E=1, A=1, M=1, J=1: w~=0
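
The weight bookkeeping in this walk-through can be sketched as follows. B and E are proposed with probability 0.5 each, as on the slides, and the evidence A=1, M=1 is enforced. JohnCalls is left out because it does not affect the weight or this query; the function names are illustrative, and the CPT values are the ones used earlier in the lecture.

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (b, e)
P_M_GIVEN_A = 0.70
PROPOSAL = 0.5   # B and E are each sampled as true with probability 0.5

def bern(p_true, value):
    """P(X = value) for a Boolean variable with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def sample_weight(b, e):
    """Weight of a sample with these B, E once A=1 and M=1 are enforced."""
    w = 1.0
    w *= bern(P_B, b) / PROPOSAL   # true probability / proposal probability
    w *= bern(P_E, e) / PROPOSAL
    w *= P_A[(b, e)]               # evidence A=1 is enforced
    w *= P_M_GIVEN_A               # evidence M=1 is enforced, given A=1
    return w

def estimate_p_burglary(n_samples, rng):
    """Weighted fraction of samples with B=1: an estimate of P(B | A, M)."""
    weighted_b = total = 0.0
    for _ in range(n_samples):
        b = rng.random() < PROPOSAL
        e = rng.random() < PROPOSAL
        w = sample_weight(b, e)
        total += w
        if b:
            weighted_b += w
    return weighted_b / total

print(sample_weight(False, True))                     # about 0.0016
print(estimate_p_burglary(20_000, random.Random(0)))  # about 0.37
```

Every sample agrees with the evidence by construction, so none are discarded; the rare combinations simply receive small weights.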

Recap
• Efficient inference in BNs
• Variable elimination
• Approximate methods: Monte-Carlo sampling
Next Lecture
• Statistical learning: from data to distributions
• R&N 20.1-2