Inference in Bayesian Networks
Agenda
  • Reading off independence assumptions
  • Efficient inference in Bayesian Networks
    • Top-down inference
    • Variable elimination
    • Monte-Carlo methods
Some Applications of BNs
  • Medical diagnosis
  • Troubleshooting of hardware/software systems
  • Fraud/uncollectible debt detection
  • Data mining
  • Analysis of genetic sequences
  • Data interpretation, computer vision, image understanding
BN from Last Lecture

[Diagram: the alarm network — causes Burglary and Earthquake → Alarm → effects JohnCalls and MaryCalls]

Intuitive meaning of an arc from x to y: “x has direct influence on y”

Directed acyclic graph

Arcs do not necessarily encode causality!

[Diagram: two three-node networks over A, B, C with their arcs reversed]

Two BNs that can encode the same joint probability distribution

Reading off Independence Relationships

[Diagram: chain A → B → C]

  • Given B, does the value of A affect the probability of C? That is, is P(C|B,A) = P(C|B)?
  • No, it does not: C’s parent (B) is given, so C is independent of its non-descendants (A)
  • Independence is symmetric: C ⊥ A | B ⇒ A ⊥ C | B

What does the BN encode?

[Alarm network diagram]

  • Burglary ⊥ Earthquake
  • JohnCalls ⊥ MaryCalls | Alarm
  • JohnCalls ⊥ Burglary | Alarm
  • JohnCalls ⊥ Earthquake | Alarm
  • MaryCalls ⊥ Burglary | Alarm
  • MaryCalls ⊥ Earthquake | Alarm
  • A node is independent of its non-descendants, given its parents

Reading off Independence Relationships

[Alarm network diagram]

  • How about Burglary ⊥ Earthquake | Alarm?
  • No! Why?
  • P(B,E|A) = P(A|B,E)P(B)P(E)/P(A) = 0.00075
  • P(B|A)P(E|A) = 0.086
  • These differ, so Burglary and Earthquake are dependent given Alarm
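A minimal numeric check (sketch) of this claim in Python. The CPT values are the ones from the slides; P(A) = 0.00252 is derived later, on the top-down inference slides.

```python
# Check that P(B,E|A) != P(B|A)P(E|A) in the alarm network.
P_B, P_E, P_A = 0.001, 0.002, 0.00252
P_A_given = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|b,e)

# P(B,E|A) = P(A|B,E) P(B) P(E) / P(A)
P_BE_given_A = P_A_given[1, 1] * P_B * P_E / P_A

# P(B|A) and P(E|A), summing the other parent out of the numerator
P_B_given_A = sum(P_A_given[1, e] * P_B * (P_E if e else 1 - P_E)
                  for e in (0, 1)) / P_A
P_E_given_A = sum(P_A_given[b, 1] * (P_B if b else 1 - P_B) * P_E
                  for b in (0, 1)) / P_A

print(P_BE_given_A)               # ~0.00075
print(P_B_given_A * P_E_given_A)  # ~0.086: not equal, so B and E are dependent given A
```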
Reading off Independence Relationships

[Alarm network diagram]

  • How about Burglary ⊥ Earthquake | JohnCalls?
  • No! Why?
  • Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent
Independence Relationships
  • Rough intuition (this holds for tree-like graphs, polytrees):
    • Evidence on the (directed) road between two variables makes them independent
    • Evidence on an “A” node makes descendants independent
    • Evidence on a “V” node, or below the V, makes the ancestors of the variables dependent (otherwise they are independent)
  • Formal property in the general case: d-separation ⇒ independence (see R&N)
Benefits of Sparse Models
  • Modeling
    • Fewer relationships need to be encoded (either through understanding or statistics)
    • Large networks can be built up from smaller ones
  • Intuition
    • Dependencies/independencies between variables can be inferred through network structures
  • Tractable inference
Top-Down Inference

[Alarm network diagram]

  • Suppose we want to compute P(Alarm)
  • P(A) = Σb,e P(A,b,e)
  • P(A) = Σb,e P(A|b,e) P(b) P(e)
  • P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)
  • P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 + 0.29*0.999*0.002 + 0.001*0.999*0.998 = 0.00252
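The same computation as a small Python sketch; the CPT numbers are taken from the slides.

```python
# Top-down computation of P(Alarm): sum the CPT over all parent combinations.
P_B, P_E = 0.001, 0.002
P_A_given = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|b,e)

P_A = sum(P_A_given[b, e]
          * (P_B if b else 1 - P_B)
          * (P_E if e else 1 - P_E)
          for b in (0, 1) for e in (0, 1))
print(P_A)  # ~0.00252
```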
Top-Down Inference (continued)
  • Now, suppose we want to compute P(MaryCalls)
  • P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)
  • P(M) = 0.70*0.00252 + 0.01*(1 − 0.00252) ≈ 0.0117
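Continuing the sketch above, P(MaryCalls) needs only P(A) and the MaryCalls CPT from the slides:

```python
# P(MaryCalls) by conditioning on Alarm.
P_A = 0.00252                     # computed above
P_M_given_A = {1: 0.70, 0: 0.01}  # P(M=1|A=1), P(M=1|A=0)

P_M = P_M_given_A[1] * P_A + P_M_given_A[0] * (1 - P_A)
print(P_M)  # ~0.0117
```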

Top-Down Inference with Evidence

[Alarm network diagram]

  • Suppose we want to compute P(Alarm|Earthquake) = P(A|e)
  • P(A|e) = Σb P(A,b|e)
  • P(A|e) = Σb P(A|b,e) P(b)
  • P(A|e) = 0.95*0.001 + 0.29*0.999 = 0.29066
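In the sketch, evidence on a parent is a one-line change: fix E=1 in the CPT lookup and sum out B only.

```python
# P(Alarm | Earthquake=1): condition the CPT on e=1 and sum out Burglary.
P_B = 0.001
P_A_given = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|b,e)

P_A_given_e = sum(P_A_given[b, 1] * (P_B if b else 1 - P_B) for b in (0, 1))
print(P_A_given_e)  # ~0.29066
```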
Top-Down Inference
  • Only works if the graph of ancestors of the query variable is a polytree
  • Evidence must be given on ancestor(s) of the query variable
  • Efficient:
    • O(d·2^k) time, where d is the number of ancestors of the variable and k is a bound on the number of parents
    • Evidence on an ancestor cuts off the influence of the portion of the graph above the evidence node
Querying the BN

[Diagram: Cavity → Toothache]

  • The BN gives P(T|C)
  • What about P(C|T)?
Bayes’ Rule
  • P(A,B) = P(A|B) P(B) = P(B|A) P(A)
  • So… P(A|B) = P(B|A) P(A) / P(B)
Applying Bayes’ Rule
  • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (the conditional probability tables)
  • What’s P(B)?
    • P(B) = Σa P(B,A=a) [marginalization]
    • P(B,A=a) = P(B|A=a) P(A=a) [conditional probability]
    • So, P(B) = Σa P(B|A=a) P(A=a)
  • What’s P(A|B)?
    • P(A|B) = P(B|A)P(A)/P(B) [Bayes’ rule]
    • P(B) = Σa P(B|A=a) P(A=a) [from above]
    • So, P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)]
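A minimal sketch of both steps in Python; the prior and likelihood numbers here are illustrative placeholders, not values from the slides.

```python
# Bayes' rule from the two given tables P(B|A) and P(A), for binary A and B.
P_A = {0: 0.9, 1: 0.1}           # prior over the cause A (placeholder numbers)
P_B1_given_A = {0: 0.2, 1: 0.8}  # P(B=1 | A=a) (placeholder numbers)

P_B1 = sum(P_B1_given_A[a] * P_A[a] for a in P_A)                 # marginalization
P_A_given_B1 = {a: P_B1_given_A[a] * P_A[a] / P_B1 for a in P_A}  # Bayes' rule
print(P_B1, P_A_given_B1)
```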
How do we read this?
  • P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)]
  • This is an equation that holds for all values A can take on, and all values B can take on:
  • P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σa P(B=b|A=a) P(A=a)]
  • Are these the same a? NO!
  • P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σa’ P(B=b|A=a’) P(A=a’)]
  • Be careful about indices!

Querying the BN

[Diagram: Cavity → Toothache]

  • The BN gives P(T|C)
  • What about P(C|T)?
  • P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache) [Bayes’ rule]
  • The denominator is computed by summing out the numerator over Cavity and ¬Cavity
  • Querying a BN is just applying Bayes’ rule on a larger scale…

Performing Inference
  • Variables X
  • Evidence set E=e, query variable Q
  • Want to compute the posterior probability distribution over Q, given E=e
  • Let the non-evidence variables be Y (= X \ E)
  • Straightforward method:
    • Compute the joint P(Y, E=e)
    • Marginalize out everything but Q to get P(Q, E=e)
    • Divide by P(E=e) to get P(Q|E=e)
Inference in the Alarm Example

[Alarm network diagram]

P(J | MaryCalls) = ??  The evidence is MaryCalls; the query Q is JohnCalls.

Using P(x1,x2,…,xn) = Πi=1,…,n P(xi | parents(Xi)), i.e., the full joint distribution table:

1. P(J, A, B, E, MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)   [2⁴ entries]
2. P(J, MaryCalls) = Σa,b,e P(J, A=a, B=b, E=e, MaryCalls)   [2 entries: one for JohnCalls, the other for ¬JohnCalls]
3. P(J | MaryCalls) = P(J, MaryCalls) / P(MaryCalls) = P(J, MaryCalls) / (Σj P(j, MaryCalls))
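A brute-force sketch of steps 1–3. The Burglary, Earthquake, Alarm, and MaryCalls CPTs are from the slides; the JohnCalls CPT (P(J|A)=0.90, P(J|¬A)=0.05) is not given in the deck and is assumed from the standard R&N alarm network.

```python
from itertools import product

# P(J | M=1) by enumerating the factored joint and summing out A, B, E.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|b,e)
P_J = {1: 0.90, 0: 0.05}  # P(J=1|A=a), assumed from R&N
P_M = {1: 0.70, 0: 0.01}  # P(M=1|A=a)

def pr(p, x):  # P(X=x) for a binary variable with P(X=1)=p
    return p if x else 1 - p

# Steps 1+2: P(J=j, M=1) = sum over a,b,e of the chain-rule product
P_JM = {j: sum(pr(P_J[a], j) * P_M[a] * pr(P_A[b, e], a) * pr(P_B, b) * pr(P_E, e)
               for a, b, e in product((0, 1), repeat=3))
        for j in (0, 1)}

# Step 3: normalize by P(M=1) (which comes out ~0.0117, as on the earlier slide)
P_J_given_M = {j: P_JM[j] / sum(P_JM.values()) for j in (0, 1)}
print(P_J_given_M)
```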

How expensive?
  • P(X) = P(x1,x2,…,xn) = Πi=1,…,n P(xi | parents(Xi))
  • Straightforward method:
    • Use the above to compute P(Y, E=e)
    • P(Q, E=e) = Σy1 … Σyk P(Y, E=e)
    • P(E=e) = Σq P(Q, E=e) — a normalization factor, no big deal once we have P(Q, E=e)
    • Step 1: O(2^(n−|E|)) entries!
  • Can we do better?

Variable Elimination
  • Consider the linear network X1 → X2 → X3
  • P(X) = P(X1) P(X2|X1) P(X3|X2)
  • P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2)
  • Rearrange the equation: = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1) = Σx2 P(X3|x2) P(x2)
  • The inner sum P(x2) is computed once for each value of X2 and cached, then reused for both values of X3
  • How many * and + saved?
    • *: 2*4*2 = 16 vs. 4+4 = 8
    • +: 2*3 = 6 vs. 2+1 = 3
  • Can lead to huge gains in larger networks
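A sketch of this elimination on a binary chain X1 → X2 → X3; all CPT numbers are illustrative placeholders. Each elimination builds one small factor, which is exactly the cached P(x2) on the slide.

```python
# Variable elimination on a chain of binary variables.
P_X1 = {0: 0.6, 1: 0.4}
P_X2_given_X1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key (x2, x1)
P_X3_given_X2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key (x3, x2)

# Eliminate X1 first: the inner sum gives a cached factor P(x2)
P_X2 = {x2: sum(P_X2_given_X1[x2, x1] * P_X1[x1] for x1 in (0, 1))
        for x2 in (0, 1)}

# Then eliminate X2, reusing P(x2) for every value of X3
P_X3 = {x3: sum(P_X3_given_X2[x3, x2] * P_X2[x2] for x2 in (0, 1))
        for x3 in (0, 1)}
print(P_X3)  # a distribution over X3 (sums to 1)
```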

VE in the Alarm Example
  • P(E|j,m) = P(E,j,m) / P(j,m)
  • P(E,j,m) = Σa Σb P(E) P(b) P(a|E,b) P(j|a) P(m|a)
  •          = P(E) Σb P(b) Σa P(a|E,b) P(j|a) P(m|a)
  •          = P(E) Σb P(b) P(j,m|E,b)   [computed for all values of E, b]
  •          = P(E) P(j,m|E)   [computed for all values of E]

What order to perform VE?
  • For tree-like BNs (polytrees), order the variables so parents come before children
    • Each intermediate probability table then has 2^(# of parents of a node) entries
  • If the number of parents of a node is bounded, then VE runs in linear time!
  • In other networks, intermediate factors may become large
Non-polytree Networks

[Diagram: A → B, A → C, B → D, C → D]

  • P(D) = Σa Σb Σc P(a) P(b|a) P(c|a) P(D|b,c) = Σb Σc P(D|b,c) Σa P(a) P(b|a) P(c|a)
  • No more simplifications: the inner sum over a produces a single factor over both b and c, so the remaining sums cannot be split

Approximate Inference Techniques
  • Based on the idea of Monte-Carlo simulation
  • Basic idea: to estimate the probability of a coin flipping heads, flip it a huge number of times and count the fraction of heads observed
  • Conditional simulation: to estimate the probability P(H) that a coin picked out of bucket B flips heads:
    1. Pick a coin C out of B (occurs with probability P(C))
    2. Flip C and observe whether it comes up heads (occurs with probability P(H|C))
    3. Put C back and repeat from step 1 many times
    4. Return the fraction of heads observed (an estimate of P(H))
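A sketch of the bucket example, assuming the coin is picked uniformly; the bucket contents are made-up numbers.

```python
import random

# Conditional simulation: estimate P(H) for a coin drawn from a bucket.
bucket = [0.5, 0.6, 0.9]  # each coin's probability of heads (placeholders)
trials, heads = 100_000, 0
for _ in range(trials):
    coin = random.choice(bucket)     # pick C out of B (here uniformly)
    heads += random.random() < coin  # flip C: heads with probability P(H|C)
print(heads / trials)                # ~ (0.5 + 0.6 + 0.9) / 3 ≈ 0.667
```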
Approximate Inference: Monte-Carlo Simulation

[Alarm network diagram]

  • Sample from the joint distribution
  • Example sample: (B=0, E=0, A=0, J=1, M=0)
  • As more samples are generated, the distribution of the samples approaches the joint distribution!
  • Samples so far: (B=0, E=0, A=0, J=1, M=0), (B=0, E=0, A=0, J=0, M=0), (B=0, E=0, A=0, J=0, M=0), (B=1, E=0, A=1, J=1, M=0)

  • Inference: given evidence E=e (e.g., J=1), remove the samples that conflict with it
  • Here the two samples with J=0 are discarded, leaving (B=0, E=0, A=0, J=1, M=0) and (B=1, E=0, A=1, J=1, M=0)
  • The distribution of the remaining samples approximates the conditional distribution!
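A sketch of this rejection-sampling scheme on the alarm network. As before, the JohnCalls CPT values are assumed from the standard R&N network, not from the slides.

```python
import random

# Rejection sampling: sample the joint top-down, keep samples matching J=1.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.90, 0: 0.05}  # assumed from R&N
P_M = {1: 0.70, 0: 0.01}

def sample_joint():
    b = random.random() < P_B
    e = random.random() < P_E
    a = random.random() < P_A[b, e]
    j = random.random() < P_J[a]
    m = random.random() < P_M[a]
    return b, e, a, j, m

kept = [s for s in (sample_joint() for _ in range(200_000)) if s[3]]  # J=1
print(sum(s[2] for s in kept) / len(kept))  # estimate of P(A | J=1)
```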

How many samples?
  • The error of the estimate, for n samples, is on average O(1/√n)
  • Variance-reduction techniques
Rare Event Problem
  • What if some events are really rare (e.g., burglary & earthquake)?
  • The # of samples must be huge to get a reasonable estimate
  • Solution: likelihood weighting
    • Enforce that each sample agrees with the evidence
    • While generating a sample, keep track of the ratio
      w = (how likely the sampled value is to occur in the real world) / (how likely you were to generate the sampled value)
Likelihood Weighting

[Alarm network diagram]

  • Suppose the evidence is Alarm & MaryCalls
  • Sample B and E with P=0.5 each; each sample starts with weight w=1
  • Evidence values (A=1, M=1) are enforced rather than sampled, and the weight is updated to reflect the likelihood that the enforced value occurs
  • Sample 1: draw B=0, E=1 → w = (0.999*0.002)/(0.5*0.5) = 0.008; enforce A=1 → w = 0.008*0.29 ≈ 0.0023; enforce M=1 → w ≈ 0.0023*0.70 ≈ 0.0016; sample J=1
  • Sample 2: draw B=0, E=0 → w = (0.999*0.998)/0.25 ≈ 3.988; enforce A=1 → w ≈ 3.988*0.001 ≈ 0.004; enforce M=1 → w ≈ 0.0028; sample J=1
  • Sample 3: draw B=1, E=0 → w = (0.001*0.998)/0.25 ≈ 0.004; enforce A=1 → w ≈ 0.004*0.94 ≈ 0.00375; enforce M=1 → w ≈ 0.0026; sample J=1
  • Sample 4: draw B=1, E=1 → w = (0.001*0.002)/0.25 = 8e-6; enforce A=1 and M=1 → w ≈ 5e-6 (≈ 0); sample J=1
  • These N=4 samples give P(B|A,M) ≈ (0.0026 + 0)/(0.0016 + 0.0028 + 0.0026 + 0) ≈ 0.371
  • Exact inference gives P(B|A,M) = 0.375
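A sketch of the procedure above: the roots B and E are sampled with P=0.5 as on the slides, the evidence A=1 and M=1 is enforced with weight updates, and the weighted samples estimate P(B|A,M).

```python
import random

# Likelihood weighting for P(B | A=1, M=1) in the alarm network.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_M = {1: 0.70, 0: 0.01}

def weighted_sample():
    w = 1.0
    b = random.random() < 0.5
    w *= (P_B if b else 1 - P_B) / 0.5  # real-world prob / sampling prob
    e = random.random() < 0.5
    w *= (P_E if e else 1 - P_E) / 0.5
    w *= P_A[b, e]  # enforce evidence A=1
    w *= P_M[1]     # enforce evidence M=1 (a constant factor here)
    # J would be sampled from P(J|A) but does not affect w or this query,
    # so it is omitted in this sketch.
    return b, w

samples = [weighted_sample() for _ in range(10_000)]
num = sum(w for b, w in samples if b)
den = sum(w for b, w in samples)
print(num / den)  # ~0.375 = P(B | A=1, M=1)
```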

Recap
  • Efficient inference in BNs
  • Variable elimination
  • Approximate methods: Monte-Carlo sampling
Next Lecture
  • Statistical learning: from data to distributions
  • R&N 20.1-2