## PowerPoint Slideshow about 'Inference in Bayesian Networks' - teague


Agenda

- Reading off independence assumptions
- Efficient inference in Bayesian Networks
- Top-down inference
- Variable elimination
- Monte-Carlo methods

Some Applications of BN

- Medical diagnosis
- Troubleshooting of hardware/software systems
- Fraud/uncollectible debt detection
- Data mining
- Analysis of genetic sequences
- Data interpretation, computer vision, image understanding

BN from Last Lecture

[Figure: Burglary and Earthquake (causes) → Alarm → JohnCalls and MaryCalls (effects)]

- Intuitive meaning of arc from x to y: “x has direct influence on y”
- Directed acyclic graph

Arcs do not necessarily encode causality!

[Figure: two BNs over the variables A, B, C, with arcs in different directions]

- 2 BNs that can encode the same joint probability distribution

Reading off independence relationships

[Figure: chain A → B → C]

- Given B, does the value of A affect the probability of C?
- No! $P(C \mid B, A) = P(C \mid B)$
- C's parent (B) is given, and so C is independent of its non-descendants (A)
- Independence is symmetric: $C \perp A \mid B \Rightarrow A \perp C \mid B$

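What does the BN encode?

- $\text{Burglary} \perp \text{Earthquake}$
- $\text{JohnCalls} \perp \text{MaryCalls} \mid \text{Alarm}$
- $\text{JohnCalls} \perp \text{Burglary} \mid \text{Alarm}$
- $\text{JohnCalls} \perp \text{Earthquake} \mid \text{Alarm}$
- $\text{MaryCalls} \perp \text{Burglary} \mid \text{Alarm}$
- $\text{MaryCalls} \perp \text{Earthquake} \mid \text{Alarm}$
- A node is independent of its non-descendants, given its parents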

Reading off independence relationships

- How about $\text{Burglary} \perp \text{Earthquake} \mid \text{Alarm}$?
- No! Why?
- $P(B \wedge E \mid A) = P(A \mid B, E)\,P(B \wedge E) / P(A) = 0.00075$
- $P(B \mid A)\,P(E \mid A) = 0.086$
- The two values differ, so B and E are not independent given A
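The comparison above can be checked numerically. The following is a minimal sketch, assuming the CPT values that appear in the lecture's later arithmetic (P(B)=0.001, P(E)=0.002, and the four P(A|B,E) entries); the helper names `prior` and `joint_bea` are made up for illustration.

```python
# CPT values taken from the arithmetic on the later "Top-Down inference" slides.
P_B = 0.001  # P(Burglary)
P_E = 0.002  # P(Earthquake)
P_A = {      # P(Alarm=true | B, E), keyed by (b, e)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

def prior(p_true, value):
    """Probability that a Boolean variable with P(true)=p_true takes `value`."""
    return p_true if value else 1.0 - p_true

def joint_bea(b, e):
    """P(B=b, E=e, Alarm=true), using the fact that B and E are a priori independent."""
    return prior(P_B, b) * prior(P_E, e) * P_A[(b, e)]

values = (True, False)
p_alarm = sum(joint_bea(b, e) for b in values for e in values)

p_be_given_a = joint_bea(True, True) / p_alarm
p_b_given_a = sum(joint_bea(True, e) for e in values) / p_alarm
p_e_given_a = sum(joint_bea(b, True) for b in values) / p_alarm

print(f"P(B,E|A)      = {p_be_given_a:.5f}")
print(f"P(B|A)P(E|A)  = {p_b_given_a * p_e_given_a:.5f}")
```

The two printed numbers differ by roughly two orders of magnitude, which is the "explaining away" effect: once the alarm is known to have sounded, learning of a burglary makes an earthquake much less likely.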

Reading off independence relationships

- How about $\text{Burglary} \perp \text{Earthquake} \mid \text{JohnCalls}$?
- No! Why?
- Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent

Independence relationships

- Rough intuition (this holds for tree-like graphs, polytrees):
- Evidence on the (directed) road between two variables makes them independent
- Evidence on an “A” node makes descendants independent
- Evidence on a “V” node, or below the V, makes the ancestors of the variables dependent (otherwise they are independent)
- Formal property in the general case: d-separation ⇒ independence (see R&N)

Benefits of Sparse Models

- Modeling
- Fewer relationships need to be encoded (either through understanding or statistics)
- Large networks can be built up from smaller ones
- Intuition
- Dependencies/independencies between variables can be inferred through network structures
- Tractable inference

Top-Down inference

- Suppose we want to compute P(Alarm)
- $P(A) = \sum_{b,e} P(A, b, e)$
- $P(A) = \sum_{b,e} P(A \mid b, e)\,P(b)\,P(e)$
- $P(A) = P(A \mid B, E)\,P(B)\,P(E) + P(A \mid B, \neg E)\,P(B)\,P(\neg E) + P(A \mid \neg B, E)\,P(\neg B)\,P(E) + P(A \mid \neg B, \neg E)\,P(\neg B)\,P(\neg E)$
- $P(A) = 0.95 \cdot 0.001 \cdot 0.002 + 0.94 \cdot 0.001 \cdot 0.998 + 0.29 \cdot 0.999 \cdot 0.002 + 0.001 \cdot 0.999 \cdot 0.998 = 0.00252$
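The sum over the parents' values can be written as a short loop. This is only a sketch using the numbers from the slide; the dictionary layout and the helper `prior` are illustrative choices, not part of the lecture.

```python
from itertools import product

P_B = 0.001  # P(Burglary)
P_E = 0.002  # P(Earthquake)
P_A_GIVEN = {  # P(Alarm=true | B, E), keyed by (b, e)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

def prior(p_true, value):
    """Probability that a Boolean variable with P(true)=p_true takes `value`."""
    return p_true if value else 1.0 - p_true

# P(A) = sum over b, e of P(A | b, e) P(b) P(e)
p_alarm = sum(
    P_A_GIVEN[(b, e)] * prior(P_B, b) * prior(P_E, e)
    for b, e in product((True, False), repeat=2)
)
print(f"P(Alarm) = {p_alarm:.5f}")
```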

Top-Down inference

- Now, suppose we want to compute P(MaryCalls)
- $P(M) = P(M \mid A)\,P(A) + P(M \mid \neg A)\,P(\neg A)$
- $P(M) = 0.70 \cdot 0.00252 + 0.01 \cdot (1 - 0.00252) = 0.0117$
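Going one level further down reuses the already computed P(A) instead of re-summing over Burglary and Earthquake. A minimal sketch with the slide's numbers (the variable names are illustrative):

```python
p_alarm = 0.00252                         # P(A), rounded value from the previous step
P_M_GIVEN_A = {True: 0.70, False: 0.01}   # P(MaryCalls=true | Alarm)

# P(M) = P(M | A) P(A) + P(M | not A) P(not A)
p_mary = P_M_GIVEN_A[True] * p_alarm + P_M_GIVEN_A[False] * (1.0 - p_alarm)
print(f"P(MaryCalls) = {p_mary:.4f}")
```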

Top-Down inference with Evidence

- Suppose we want to compute P(Alarm|Earthquake), written $P(A \mid e)$
- $P(A \mid e) = \sum_b P(A, b \mid e)$
- $P(A \mid e) = \sum_b P(A \mid b, e)\,P(b)$
- $P(A \mid e) = 0.95 \cdot 0.001 + 0.29 \cdot 0.999 = 0.29066$
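With evidence on a parent, that parent is simply fixed and only the remaining parent is summed out. A small sketch using the slide's numbers; the function name `alarm_given_earthquake` is hypothetical.

```python
P_B = 0.001  # P(Burglary)
P_A_GIVEN = {  # P(Alarm=true | B, E), keyed by (b, e)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

def alarm_given_earthquake(e):
    """P(Alarm=true | Earthquake=e) = sum over b of P(A | b, e) P(b)."""
    return sum(
        P_A_GIVEN[(b, e)] * (P_B if b else 1.0 - P_B)
        for b in (True, False)
    )

p_alarm_given_e = alarm_given_earthquake(True)
print(f"P(A | e) = {p_alarm_given_e:.5f}")
```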

Top-Down inference

- Only works if the graph of ancestors of a variable is a polytree
- Evidence given on ancestor(s) of the query variable
- Efficient:
- $O(d \cdot 2^k)$ time, where d is the number of ancestors of a variable, with k a bound on # of parents
- Evidence on an ancestor cuts off influence of portion of graph above evidence node

Bayes’ Rule

- $P(A \wedge B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
- So… $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$

Applying Bayes’ Rule

- Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables)
- What’s P(B)?
- $P(B) = \sum_a P(B, A=a)$ [marginalization]
- $P(B, A=a) = P(B \mid A=a)\,P(A=a)$ [conditional probability]
- So, $P(B) = \sum_a P(B \mid A=a)\,P(A=a)$
- What’s P(A|B)?
- $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$ [Bayes’ rule]
- So, $P(A \mid B) = P(B \mid A)\,P(A) / [\sum_a P(B \mid A=a)\,P(A=a)]$

How do we read this?

- $P(A \mid B) = P(B \mid A)\,P(A) / [\sum_a P(B \mid A=a)\,P(A=a)]$
- [An equation that holds for all values A can take on, and all values B can take on]
- $P(A=a \mid B=b) = P(B=b \mid A=a)\,P(A=a) / [\sum_a P(B=b \mid A=a)\,P(A=a)]$
- Are these the same a? NO! The a inside the sum is a separate index that ranges over all values of A
- $P(A=a \mid B=b) = P(B=b \mid A=a)\,P(A=a) / [\sum_{a'} P(B=b \mid A=a')\,P(A=a')]$
- Be careful about indices!
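The index issue becomes concrete in code: the denominator loops over its own variable and is computed once, independently of which value of A is being queried. This is a sketch only; the function `posterior` and its argument layout are assumptions made for illustration, and the numbers reuse P(A)=0.00252 and P(M|A) from the alarm example with Alarm as the cause and MaryCalls as the effect.

```python
def posterior(prior, likelihood, b):
    """Return {a: P(A=a | B=b)}.

    prior:      {a: P(A=a)}
    likelihood: {a: {b: P(B=b | A=a)}}
    """
    # Denominator: sum over a' of P(B=b | A=a') P(A=a').
    # a_prime is a bound index, unrelated to the queried value a below.
    evidence = sum(likelihood[a_prime][b] * prior[a_prime] for a_prime in prior)
    return {a: likelihood[a][b] * prior[a] / evidence for a in prior}

alarm_prior = {True: 0.00252, False: 1.0 - 0.00252}
mary_given_alarm = {
    True: {True: 0.70, False: 0.30},
    False: {True: 0.01, False: 0.99},
}

post = posterior(alarm_prior, mary_given_alarm, b=True)
print(f"P(Alarm | MaryCalls) = {post[True]:.4f}")
```

Because every entry shares the same denominator, the returned values sum to 1.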

Querying the BN

[Figure: Cavity → Toothache]

- The BN gives $P(T \mid C)$
- What about $P(C \mid T)$?
- $P(\text{Cavity} \mid \text{Toothache}) = P(\text{Toothache} \mid \text{Cavity})\,P(\text{Cavity}) / P(\text{Toothache})$ [Bayes’ rule]
- Denominator computed by summing out the numerator over Cavity and ¬Cavity
- Querying a BN is just applying Bayes’ rule on a larger scale…

Performing Inference

- Variables X
- Have evidence set E=e, query variable Q
- Want to compute the posterior probability distribution over Q, given E=e
- Let the non-evidence variables be Y (= X \ E)
- Straightforward method:
- Compute joint $P(Y, E=e)$
- Marginalize to get $P(Q, E=e)$
- Divide by $P(E=e)$ to get $P(Q \mid E=e)$

Inference in the Alarm Example

- $P(J \mid \text{MaryCalls}) = ??$
- Full joint distribution table: $P(x_1 \wedge x_2 \wedge \dots \wedge x_n) = \prod_{i=1,\dots,n} P(x_i \mid \text{parents}(X_i))$

1. $P(J, A, B, E, \text{MaryCalls}) = P(J \mid A)\,P(\text{MaryCalls} \mid A)\,P(A \mid B, E)\,P(B)\,P(E)$ ($2^4$ entries)

2. $P(J, \text{MaryCalls}) = \sum_{a,b,e} P(J, A=a, B=b, E=e, \text{MaryCalls})$ (2 entries: one for JohnCalls, the other for ¬JohnCalls)

3. $P(J \mid \text{MaryCalls}) = P(J, \text{MaryCalls}) / P(\text{MaryCalls}) = P(J, \text{MaryCalls}) / \sum_j P(j, \text{MaryCalls})$
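The three steps can be followed literally in code. This is a sketch under one loud assumption: the transcript never states P(JohnCalls|Alarm), so the values 0.90 and 0.05 below are the usual R&N textbook numbers, not something taken from these slides. All other CPT entries appear in the lecture's arithmetic; the helper names are illustrative.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | b, e)
P_J = {True: 0.90, False: 0.05}  # ASSUMED: P(J=true | a), not given in the transcript
P_M = {True: 0.70, False: 0.01}  # P(M=true | a)

def bern(p_true, value):
    """Probability that a Boolean variable with P(true)=p_true takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Step 1: one entry of the full joint, as a product of CPT entries."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

values = (True, False)

# Step 2: sum out A, B, E with MaryCalls fixed to true (2 entries, one per j).
p_j_and_m = {
    j: sum(joint(b, e, a, j, True) for b, e, a in product(values, repeat=3))
    for j in values
}

# Step 3: normalize by P(MaryCalls) = sum over j of P(j, MaryCalls).
p_m = sum(p_j_and_m.values())
p_j_given_m = {j: p / p_m for j, p in p_j_and_m.items()}

print(f"P(MaryCalls)             = {p_m:.4f}")
print(f"P(JohnCalls | MaryCalls) = {p_j_given_m[True]:.4f}")
```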

How expensive?

- $P(X) = P(x_1 \wedge x_2 \wedge \dots \wedge x_n) = \prod_{i=1,\dots,n} P(x_i \mid \text{parents}(X_i))$

Straightforward method:

- Use above to compute $P(Y, E=e)$
- $P(Q, E=e) = \sum_{y_1} \dots \sum_{y_k} P(Y, E=e)$
- $P(E=e) = \sum_q P(Q=q, E=e)$ (normalization factor, no big deal once we have $P(Q, E=e)$)
- Step 1: $O(2^{n-|E|})$ entries!

Can we do better?

Variable Elimination

- Consider linear network $X_1 \rightarrow X_2 \rightarrow X_3$
- $P(X) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_2)$
- $P(X_3) = \sum_{x_1} \sum_{x_2} P(x_1)\,P(x_2 \mid x_1)\,P(X_3 \mid x_2)$
- Rearrange equation: $P(X_3) = \sum_{x_2} P(X_3 \mid x_2) \sum_{x_1} P(x_1)\,P(x_2 \mid x_1)$
- The inner sum is $P(x_2)$, computed for each value of $X_2$: $P(X_3) = \sum_{x_2} P(X_3 \mid x_2)\,P(x_2)$
- Cache $P(x_2)$ for both values of $X_3$!
- How many multiplications and additions are saved?
- Multiplications: $2 \cdot 4 \cdot 2 = 16$ vs $4 + 4 = 8$
- Additions: $2 \cdot 3 = 6$ vs $2 + 1 = 3$
- Can lead to huge gains in larger networks
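The rearrangement can be sketched for the three-node chain as follows. The CPT numbers are made up purely for illustration (the lecture gives none for this network); the point is that the cached table `p_x2` is computed once and reused for both values of X3.

```python
# Hypothetical CPTs for the chain X1 -> X2 -> X3 (values invented for this sketch).
P_X1 = {True: 0.3, False: 0.7}                 # P(X1)
P_X2 = {True: {True: 0.9, False: 0.1},         # P_X2[x1][x2] = P(X2=x2 | X1=x1)
        False: {True: 0.2, False: 0.8}}
P_X3 = {True: {True: 0.6, False: 0.4},         # P_X3[x2][x3] = P(X3=x3 | X2=x2)
        False: {True: 0.1, False: 0.9}}

values = (True, False)

def naive(x3):
    """P(X3=x3) by summing the full product over x1 and x2."""
    return sum(P_X1[x1] * P_X2[x1][x2] * P_X3[x2][x3]
               for x1 in values for x2 in values)

# Eliminate X1 once: p_x2[x2] = sum over x1 of P(x1) P(x2 | x1).
p_x2 = {x2: sum(P_X1[x1] * P_X2[x1][x2] for x1 in values) for x2 in values}

def eliminated(x3):
    """P(X3=x3) = sum over x2 of P(x3 | x2) P(x2), reusing the cached p_x2."""
    return sum(P_X3[x2][x3] * p_x2[x2] for x2 in values)

for x3 in values:
    print(x3, naive(x3), eliminated(x3))
```

Both functions return the same distribution; the second touches each CPT entry fewer times, and the gap grows with the length of the chain.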

VE in Alarm Example

- $P(E \mid j, m) = P(E, j, m) / P(j, m)$
- $P(E, j, m) = \sum_a \sum_b P(E)\,P(b)\,P(a \mid E, b)\,P(j \mid a)\,P(m \mid a)$
- $= P(E) \sum_b P(b) \sum_a P(a \mid E, b)\,P(j \mid a)\,P(m \mid a)$
- $= P(E) \sum_b P(b)\,P(j, m \mid E, b)$ (compute for all values of E, b)
- $= P(E)\,P(j, m \mid E)$ (compute for all values of E)
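The same elimination order (first a, then b) can be written as two small intermediate tables. A sketch only: P(J|A) is again an assumption (the standard R&N values 0.90 and 0.05, which the transcript does not state), and the table names `f_eb` and `g_e` are invented here.

```python
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | b, e), keyed by (b, e)
P_J = {True: 0.90, False: 0.05}  # ASSUMED: P(j | a), not given in the transcript
P_M = {True: 0.70, False: 0.01}  # P(m | a)

def bern(p_true, value):
    """Probability that a Boolean variable with P(true)=p_true takes `value`."""
    return p_true if value else 1.0 - p_true

values = (True, False)

# Eliminate A: f_eb[(e, b)] = sum over a of P(a | e, b) P(j | a) P(m | a) = P(j, m | e, b)
f_eb = {
    (e, b): sum(bern(P_A[(b, e)], a) * P_J[a] * P_M[a] for a in values)
    for e in values for b in values
}

# Eliminate B: g_e[e] = sum over b of P(b) f_eb[(e, b)] = P(j, m | e)
g_e = {e: sum(bern(P_B, b) * f_eb[(e, b)] for b in values) for e in values}

# Multiply in P(E) and normalize by P(j, m).
unnormalized = {e: bern(P_E, e) * g_e[e] for e in values}
p_jm = sum(unnormalized.values())
p_e_given_jm = {e: v / p_jm for e, v in unnormalized.items()}

print(f"P(Earthquake | j, m) = {p_e_given_jm[True]:.4f}")
```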

What order to perform VE?

- For tree-like BNs (polytrees), order so parents come before children
- # of entries in each intermediate probability table is 2^(# of parents of a node)
- If the number of parents of a node is bounded, then VE is linear time!
- Other networks: intermediate factors may become large

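Non-polytree networks

[Figure: A → B, A → C, B → D, C → D]

- $P(D) = \sum_a \sum_b \sum_c P(a)\,P(b \mid a)\,P(c \mid a)\,P(D \mid b, c) = \sum_b \sum_c P(D \mid b, c) \sum_a P(a)\,P(b \mid a)\,P(c \mid a)$
- No more simplifications…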

Approximate Inference Techniques

- Based on the idea of Monte Carlo simulation
- Basic idea:
- To estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed
- Conditional simulation:
- To estimate the probability P(H) that a coin picked out of bucket B flips heads, I can:
- Pick a coin C out of B (occurs with probability P(C))
- Flip C and observe whether it flips heads (occurs with probability P(H|C))
- Put C back and repeat from step 1 many times
- Return the fraction of heads observed (estimate of P(H))
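The conditional simulation above can be sketched in a few lines. The bucket contents are hypothetical (three coins with invented heads probabilities, each equally likely to be picked), and the seed is fixed only so the run is repeatable.

```python
import random

# Hypothetical bucket: each entry is one coin's probability of flipping heads.
BUCKET = [0.5, 0.9, 0.3]

def estimate_heads(n, seed=0):
    """Estimate P(H) by repeatedly picking a coin, flipping it, and putting it back."""
    rng = random.Random(seed)
    heads = 0
    for _ in range(n):
        p_heads = rng.choice(BUCKET)        # pick a coin C (uniform P(C) here)
        heads += rng.random() < p_heads     # flip C: heads with probability P(H | C)
    return heads / n                        # fraction of heads observed

exact = sum(BUCKET) / len(BUCKET)           # sum over C of P(H | C) P(C)
estimate = estimate_heads(100_000)
print(f"exact P(H) = {exact:.4f}, estimate = {estimate:.4f}")
```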

Approximate Inference: Monte-Carlo Simulation

- Sample from the joint distribution
- As more samples are generated, the distribution of the samples approaches the joint distribution!
- Sample 1: B=0, E=0, A=0, J=1, M=0
- Sample 2: B=0, E=0, A=0, J=0, M=0
- Sample 3: B=0, E=0, A=0, J=0, M=0
- Sample 4: B=1, E=0, A=1, J=1, M=0

Approximate Inference: Monte-Carlo Simulation

- Inference: given evidence E=e (e.g., J=1)
- Remove the samples that conflict
- Sample 1: B=0, E=0, A=0, J=1, M=0
- Sample 2: B=0, E=0, A=0, J=0, M=0 (conflicts with J=1, removed)
- Sample 3: B=0, E=0, A=0, J=0, M=0 (conflicts with J=1, removed)
- Sample 4: B=1, E=0, A=1, J=1, M=0
- Distribution of remaining samples approximates the conditional distribution!
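Sampling from the joint and discarding conflicting samples (often called rejection sampling) might look like the sketch below. It samples parents before children and keeps only samples with J=1. P(J|A) is an assumption here (the R&N textbook values 0.90 and 0.05, which the transcript does not give); the function names and the fixed seed are illustrative.

```python
import random

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | b, e)
P_J = {True: 0.90, False: 0.05}  # ASSUMED: P(J=true | a), not given in the transcript
P_M = {True: 0.70, False: 0.01}  # P(M=true | a)

def sample_joint(rng):
    """Draw one sample (b, e, a, j, m), sampling each node after its parents."""
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    j = rng.random() < P_J[a]
    m = rng.random() < P_M[a]
    return b, e, a, j, m

def estimate_alarm_given_john(n, seed=0):
    """Estimate P(A=1 | J=1) from the samples that agree with the evidence."""
    rng = random.Random(seed)
    kept = 0
    alarm_count = 0
    for _ in range(n):
        b, e, a, j, m = sample_joint(rng)
        if not j:            # conflicts with the evidence J=1: discard
            continue
        kept += 1
        alarm_count += a
    return alarm_count / kept, kept

estimate, kept = estimate_alarm_given_john(200_000)
print(f"kept {kept} of 200000 samples, P(A | J=1) ~= {estimate:.4f}")
```

Only a few percent of the samples survive, which previews the rare-event problem: most of the sampling effort is thrown away.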

How many samples?

- Error of estimate, for n samples, is on average proportional to $1/\sqrt{n}$
- Variance-reduction techniques

Rare Event Problem:

- What if some events are really rare (e.g., burglary & earthquake ?)
- # of samples must be huge to get a reasonable estimate
- Solution: likelihood weighting
- Enforce that each sample agrees with evidence
- While generating a sample, keep track of the ratio (how likely the sampled value is to occur in the real world) / (how likely you were to generate the sampled value)

Likelihood weighting

- Suppose evidence Alarm & MaryCalls
- Sample B,E with P=0.5
- Each sample starts with w=1
- A=1 is enforced, and the weight updated to reflect the likelihood that this occurs (likewise for M=1)
- Sample 1: B=0, E=1 gives w=0.008; enforcing A=1 gives w=0.0023; enforcing M=1 gives w=0.0016; J=1 is sampled
- Sample 2: B=0, E=0 gives w=3.988; enforcing A=1 gives w=0.004; enforcing M=1 gives w=0.0028; J=1 is sampled
- Sample 3: B=1, E=0; enforcing A=1 gives w=0.00375; enforcing M=1 gives w=0.0026; J=1 is sampled
- Sample 4: B=1, E=1, A=1, M=1, J=1 gives w≈5e-6

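Likelihood weighting

- Suppose evidence Alarm & MaryCalls
- Sample B,E with P=0.5
- Sample 1: B=0, E=1, A=1, M=1, J=1 with w=0.0016
- Sample 2: B=0, E=0, A=1, M=1, J=1 with w=0.0028
- Sample 3: B=1, E=0, A=1, M=1, J=1 with w=0.0026
- Sample 4: B=1, E=1, A=1, M=1, J=1 with w~=0
- N=4 gives P(B|A,M) ~= (0.0026 + 0) / (0.0016 + 0.0028 + 0.0026 + 0) ~= 0.371
- Exact inference gives P(B|A,M) = 0.375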
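The weighting scheme from these slides can be sketched as follows: B and E are drawn from a uniform proposal (P=0.5), the evidence A=1 and M=1 is enforced rather than sampled, and the weight accumulates the ratio of true probability to proposal probability. JohnCalls is left out of this sketch because it is neither evidence nor the query, so it does not change the weights. Function names and the fixed seed are illustrative.

```python
import random

P_B, P_E = 0.001, 0.002
P_A_TRUE = {(True, True): 0.95, (True, False): 0.94,
            (False, True): 0.29, (False, False): 0.001}  # P(A=true | b, e)
P_M_GIVEN_ALARM = 0.70   # P(M=true | A=true)
PROPOSAL = 0.5           # probability used to sample B and E

def bern(p_true, value):
    """Probability that a Boolean variable with P(true)=p_true takes `value`."""
    return p_true if value else 1.0 - p_true

def weighted_sample(rng):
    """Return (b, e, w) for one sample that agrees with A=1, M=1 by construction."""
    b = rng.random() < PROPOSAL
    e = rng.random() < PROPOSAL
    w = 1.0
    w *= bern(P_B, b) / PROPOSAL   # real-world likelihood / proposal likelihood
    w *= bern(P_E, e) / PROPOSAL
    w *= P_A_TRUE[(b, e)]          # A=1 enforced: weight by P(A=1 | b, e)
    w *= P_M_GIVEN_ALARM           # M=1 enforced: weight by P(M=1 | A=1)
    return b, e, w

def estimate_burglary(n, seed=0):
    """Estimate P(B=1 | A=1, M=1) as a ratio of summed weights."""
    rng = random.Random(seed)
    total = 0.0
    with_burglary = 0.0
    for _ in range(n):
        b, _e, w = weighted_sample(rng)
        total += w
        if b:
            with_burglary += w
    return with_burglary / total

print(f"P(B | A, M) ~= {estimate_burglary(10_000):.3f}")
```

Every sample contributes to the estimate, in contrast to discarding samples that conflict with the evidence.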

Recap

- Efficient inference in BNs
- Variable elimination
- Approximate methods: Monte-Carlo sampling

Next Lecture

- Statistical learning: from data to distributions
- R&N 20.1-2
