
Bayesian Networks

Russell and Norvig: Chapter 14

CS440 Fall 2005


Probabilistic Agent

[Figure: an agent interacting with its environment through sensors and actuators, with a "?" marking its internal model]

"I believe that the sun will still exist tomorrow with probability 0.999999, and that it will be sunny with probability 0.6."



Problem

  • At a certain time t, the KB of an agent is some collection of beliefs

  • At time t the agent’s sensors make an observation that changes the strength of one of its beliefs

  • How should the agent update the strength of its other beliefs?



Purpose of Bayesian Networks

  • Facilitate the description of a collection of beliefs by making explicit causality relations and conditional independence among beliefs

  • Provide a more efficient way (than by using joint distribution tables) to update belief strengths when new evidence is observed



Other Names

  • Belief networks

  • Probabilistic networks

  • Causal networks



Bayesian Networks

  • A simple, graphical notation for conditional independence assertions resulting in a compact representation for the full joint distribution

  • Syntax:

    • a set of nodes, one per variable

    • a directed, acyclic graph (a link means "directly influences")

    • a conditional distribution for each node given its parents: P(Xi|Parents(Xi))



Example

Topology of the network encodes conditional independence assertions:

[Figure: Weather is an isolated node; Cavity is the parent of Toothache and Catch]

Weather is independent of other variables

Toothache and Catch are independent given Cavity



Example

I'm at work, and my neighbor John calls to say my alarm is ringing, but my neighbor Mary doesn't call. Sometimes the alarm is set off by a minor earthquake. Is there a burglar?

Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:
  • A burglar can set the alarm off
  • An earthquake can set the alarm off
  • The alarm can cause Mary to call
  • The alarm can cause John to call



[Figure: Burglary and Earthquake (the causes) are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls (the effects)]

A Simple Belief Network

Intuitive meaning of an arrow from x to y: "x has direct influence on y"

Directed acyclic graph (DAG)

Nodes are random variables


Assigning Probabilities to Roots

[Figure: the burglary network with prior probability tables attached to the root nodes Burglary and Earthquake]


Conditional Probability Tables

[Figure: the burglary network with conditional probability tables attached to Alarm, JohnCalls, and MaryCalls]

Size of the CPT for a node with k parents: ?




What the BN Means

[Figure: the burglary network with its CPTs]

P(x1,x2,…,xn) = ∏i=1,…,n P(xi | Parents(Xi))


Calculation of Joint Probability

[Figure: the burglary network with its CPTs]

P(JMABE)= P(J|A)P(M|A)P(A|B,E)P(B)P(E)= 0.9 x 0.7 x 0.001 x 0.999 x 0.998= 0.00062


What The BN Encodes

Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm

The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm

[Figure: the burglary network; for example, John does not observe any burglaries directly]


What The BN Encodes

Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm

The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm

For instance, the reasons why John and Mary may not call if there is an alarm are unrelated. Note that these reasons could be other beliefs in the network; the probabilities summarize these non-explicit beliefs.

[Figure: the burglary network]



Structure of BN

  • The relation P(x1,x2,…,xn) = ∏i=1,…,n P(xi | Parents(Xi)) means that each belief is independent of its predecessors in the BN given its parents

  • Said otherwise, the parents of a belief Xi are all the beliefs that "directly influence" Xi

  • Usually (but not always) the parents of Xi are its causes and Xi is the effect of these causes

E.g., JohnCalls is influenced by Burglary, but not directly. JohnCalls is directly influenced by Alarm



Construction of BN

  • Choose the relevant sentences (random variables) that describe the domain

  • Select an ordering X1,…,Xn, so that all the beliefs that directly influence Xi are before Xi

  • For j=1,…,n do:

    • Add a node in the network labeled by Xj

    • Connect the nodes of Xj's parents to Xj

    • Define the CPT of Xj

  • The ordering guarantees that the BN will have no cycles


Cond. Independence Relations

[Figure: a node X with its parents (e.g., Y1, Y2), an ancestor, descendants, and non-descendants]

  • 1. Each random variable X is conditionally independent of its non-descendants, given its parents Pa(X)

  • Formally, I(X; NonDesc(X) | Pa(X))

  • 2. Each random variable is conditionally independent of all the other nodes in the graph, given its Markov blanket (its parents, children, and children's other parents)


Inference In BN

[Figure: the query table: given observations J = T and M = T, what is P(B | …)? That is, the distribution of Burglary conditioned on the observations made]

  • Set E of evidence variables that are observed, e.g., {JohnCalls,MaryCalls}

  • Query variable X, e.g., Burglary, for which we would like to know the posterior probability distribution P(X|E)


Inference Patterns

[Figure: four copies of the burglary network illustrating the diagnostic, causal, intercausal, and mixed inference patterns]

  • Basic use of a BN: Given new observations, compute the new strengths of some (or all) beliefs

  • Other use: Given the strength of a belief, which observation should we gather to make the greatest change in this belief's strength?


Types Of Nodes On A Path

[Figure: the car network (Battery → Radio, Battery → SparkPlugs, SparkPlugs → Starts, Gas → Starts, Starts → Moves); on the path from Gas to Radio, SparkPlugs is a linear node, Battery a diverging node, and Starts a converging node]


Independence Relations In BN

[Figure: the car network]

Given a set E of evidence nodes, two beliefs connected by an undirected path are independent if one of the following three conditions holds:

1. A node on the path is linear and in E

2. A node on the path is diverging and in E

3. A node on the path is converging and neither this node nor any of its descendants is in E
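
These three conditions can be turned into a small path-blocking test. The sketch below is illustrative (the graph encoding and function names are not from the slides); it checks one given path, and full d-separation additionally requires every undirected path between the two nodes to be blocked.

```python
# Test whether a single undirected path is blocked by evidence set E, using the
# three conditions above (linear/diverging node in E, or converging node with
# neither itself nor any descendant in E).

def descendants(node, children):
    """All descendants of a node, given a child-adjacency dict."""
    out, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def path_blocked(path, parents, children, evidence):
    """True if the path between its two endpoints is blocked given `evidence`."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        incoming = sum(1 for x in (prev, nxt) if x in parents.get(node, []))
        if incoming == 2:                       # converging node on the path
            if node not in evidence and not (descendants(node, children) & evidence):
                return True
        else:                                   # linear or diverging node on the path
            if node in evidence:
                return True
    return False

# The car network from the slides: Battery -> Radio, Battery -> SparkPlugs,
# SparkPlugs -> Starts, Gas -> Starts, Starts -> Moves.
parents = {'Radio': ['Battery'], 'SparkPlugs': ['Battery'],
           'Starts': ['SparkPlugs', 'Gas'], 'Moves': ['Starts']}
children = {'Battery': ['Radio', 'SparkPlugs'], 'SparkPlugs': ['Starts'],
            'Gas': ['Starts'], 'Starts': ['Moves']}
path = ['Gas', 'Starts', 'SparkPlugs', 'Battery', 'Radio']
print(path_blocked(path, parents, children, evidence=set()))       # True: independent
print(path_blocked(path, parents, children, evidence={'Moves'}))   # False: dependent
```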


Independence Relations In BN

Gas and Radio are independent given evidence on SparkPlugs


Independence Relations In BN

Gas and Radio are independent given evidence on Battery


Independence Relations In BN

Gas and Radio are independent given no evidence, but they are dependent given evidence on Starts or Moves


BN Inference

  • Simplest Case: a chain A → B → C

P(B) = P(a)P(B|a) + P(~a)P(B|~a)

P(C) = ???
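
A minimal sketch of this simplest case, with made-up numbers: P(B) is obtained by summing out A, and P(C), the slide's question, by then summing out B.

```python
# Inference in the chain A -> B -> C by marginalizing one variable at a time.
# All probabilities below are illustrative.

P_a = 0.3                                  # P(A = true)
P_b_given_a = {True: 0.8, False: 0.1}      # P(B = true | A)
P_c_given_b = {True: 0.6, False: 0.2}      # P(C = true | B)

# P(B) = P(a) P(B|a) + P(~a) P(B|~a)
P_b = P_a * P_b_given_a[True] + (1 - P_a) * P_b_given_a[False]

# P(C) = P(b) P(C|b) + P(~b) P(C|~b)  -- the answer to "P(C) = ???"
P_c = P_b * P_c_given_b[True] + (1 - P_b) * P_c_given_b[False]
print(P_b, P_c)
```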



BN Inference

  • Chain:

X1 → X2 → … → Xn

What is the time complexity to compute P(Xn)?

What is the time complexity if we computed the full joint?


Inference Ex. 2

[Figure: the sprinkler network: Cloudy is the parent of Sprinkler and Rain, which are both parents of WetGrass]

The algorithm computes not individual probabilities, but entire tables

  • Two ideas crucial to avoiding exponential blowup:

    • Because of the structure of the BN, some subexpressions in the joint depend only on a small number of variables

    • By computing them once and caching the results, we can avoid generating them exponentially many times



Variable Elimination

General idea:

  • Write the query as a sum, over the hidden variables, of products of CPT factors

  • Iteratively

    • Move all irrelevant terms outside of innermost sum

    • Perform innermost sum, getting a new term

    • Insert the new term into the product
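
As a concrete illustration of these rewriting steps, here is a compact Python sketch of variable elimination over Boolean variables, with factors stored as dictionaries. The Factor class and function names are illustrative, not from the slides.

```python
# Variable elimination with factors over Boolean variables.
from itertools import product

class Factor:
    def __init__(self, vars, table):
        self.vars = tuple(vars)      # ordered variable names
        self.table = dict(table)     # {assignment tuple (in self.vars order): value}

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    vars = list(f.vars) + [v for v in g.vars if v not in f.vars]
    table = {}
    for assign in product([False, True], repeat=len(vars)):
        env = dict(zip(vars, assign))
        table[assign] = (f.table[tuple(env[v] for v in f.vars)] *
                         g.table[tuple(env[v] for v in g.vars)])
    return Factor(vars, table)

def sum_out(f, var):
    """Sum a variable out of a factor, producing a smaller factor."""
    keep = [v for v in f.vars if v != var]
    idx = f.vars.index(var)
    table = {}
    for assign, val in f.table.items():
        key = tuple(x for i, x in enumerate(assign) if i != idx)
        table[key] = table.get(key, 0.0) + val
    return Factor(keep, table)

def eliminate(factors, order):
    """Sum out the variables in `order`, multiplying only the factors that mention each one."""
    factors = list(factors)
    for var in order:
        related = [f for f in factors if var in f.vars]
        if not related:
            continue
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f)
        factors = [f for f in factors if var not in f.vars] + [sum_out(prod, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result
```

With a network's CPTs encoded as such factors, eliminating the hidden variables in some order leaves a factor over the query variable, which can then be normalized.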


A More Complex Example

  • The "Asia" network:

[Figure: Visit to Asia (V) → Tuberculosis (T); Smoking (S) → Lung Cancer (L) and Bronchitis (B); T and L → Abnormality in Chest (A); A → X-Ray (X); A and B → Dyspnea (D)]



  • We want to compute P(d)

  • Need to eliminate: v,s,x,t,l,a,b

    Initial factors: P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)



  • We want to compute P(d)

  • Need to eliminate: v,s,x,t,l,a,b

    Initial factors

Eliminate: v

Note: fv(t) = Σv P(v) P(t|v) = P(t)

In general, result of elimination is not necessarily a probability term



  • We want to compute P(d)

  • Need to eliminate: s,x,t,l,a,b

  • Initial factors

Eliminate: s

Summing on s results in a factor with two arguments fs(b,l)

In general, result of elimination may be a function of several variables



  • We want to compute P(d)

  • Need to eliminate: x,t,l,a,b

  • Initial factors

Eliminate: x

Note: fx(a) = Σx P(x|a) = 1 for all values of a!



  • We want to compute P(d)

  • Need to eliminate: t,l,a,b

  • Initial factors

Eliminate: t



  • We want to compute P(d)

  • Need to eliminate: l,a,b

  • Initial factors

Eliminate: l



  • We want to compute P(d)

  • Need to eliminate: b

  • Initial factors

Eliminate: a,b



Variable Elimination

  • We now understand variable elimination as a sequence of rewriting operations

  • Actual computation is done in the elimination steps

  • Computation depends on order of elimination


Dealing with Evidence

[Figure: the Asia network]

  • How do we deal with evidence?

  • Suppose we get evidence V = t, S = f, D = t

  • We want to compute P(L, V = t, S = f, D = t)



  • We start by writing the factors:

  • Since we know that V = t, we don’t need to eliminate V

  • Instead, we can replace the factors P(V) and P(T|V) with

  • These “select” the appropriate parts of the original factors given the evidence

  • Note that fp(V) is a constant, and thus does not appear in elimination of other variables










Dealing with Evidence

  • Given evidence V = t, S = f, D = t

  • Compute P(L, V = t, S = f, D = t )

  • Initial factors, after setting evidence:

  • Eliminating x, we get

  • Eliminating t, we get

  • Eliminating a, we get

  • Eliminating b, we get



Variable Elimination Algorithm

  • Let X1,…, Xm be an ordering on the non-query variables

  • For i = m, …, 1

    • Leave in the summation for Xi only factors mentioning Xi

    • Multiply the factors, getting a factor that contains a number for each value of the variables mentioned, including Xi

    • Sum out Xi, getting a factor f that contains a number for each value of the variables mentioned, not including Xi

    • Replace the multiplied factor in the summation



Complexity of variable elimination

  • Suppose in one elimination step we compute fx(y1,…,yk) = Σx f1(x, y1,…,yk) · … · fm(x, y1,…,yk)

    This requires:

    • multiplications: for each value of x, y1, …, yk, we do m multiplications

    • additions: for each value of y1, …, yk, we do |Val(X)| additions

      Complexity is exponential in the number of variables in the intermediate factor!



Understanding Variable Elimination

  • We want to select “good” elimination orderings that reduce complexity

  • This can be done by examining a graph-theoretic property of the "induced" graph; we will not cover this in class.

  • This reduces the problem of finding a good ordering to a graph-theoretic operation that is well understood; unfortunately, computing it is NP-hard!


Exercise: Variable elimination

[Figure: a network over the variables smart, study, prepared, fair, and pass, with p(smart)=.8, p(study)=.6, p(fair)=.9]

Query: What is the probability that a student is smart, given that they pass the exam?



Approaches to inference

  • Exact inference

    • Inference in Simple Chains

    • Variable elimination

    • Clustering / join tree algorithms

  • Approximate inference

    • Stochastic simulation / sampling methods

    • Markov chain Monte Carlo methods



Stochastic simulation - direct

  • Suppose you are given values for some subset of the variables, G, and want to infer values for unknown variables, U

  • Randomly generate a very large number of instantiations from the BN

    • Generate instantiations for all variables – start at root variables and work your way “forward”

  • Rejection Sampling: keep those instantiations that are consistent with the values for G

  • Use the frequency of values for U to get estimated probabilities

  • Accuracy of the results depends on the size of the sample (asymptotically approaches exact results)


Direct Stochastic Simulation

[Figure: the sprinkler network]

P(WetGrass|Cloudy)?

P(WetGrass|Cloudy) = P(WetGrass ∧ Cloudy) / P(Cloudy)

1. Repeat N times:
   1.1. Guess Cloudy at random
   1.2. For each guess of Cloudy, guess Sprinkler and Rain, then WetGrass

2. Compute the ratio of the # runs where WetGrass and Cloudy are True over the # runs where Cloudy is True
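
A sketch of this procedure in Python for the sprinkler network. The CPT values are the usual textbook ones and are an assumption here, since the slide shows them only graphically.

```python
# Rejection sampling (direct stochastic simulation) for P(WetGrass=true | Cloudy=true).
import random

def sample_network():
    """Sample all variables in topological order from assumed textbook CPTs."""
    c = random.random() < 0.5                      # P(Cloudy)
    s = random.random() < (0.1 if c else 0.5)      # P(Sprinkler | Cloudy)
    r = random.random() < (0.8 if c else 0.2)      # P(Rain | Cloudy)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.0}[(s, r)]
    w = random.random() < p_w                      # P(WetGrass | Sprinkler, Rain)
    return c, s, r, w

def estimate(n=100_000):
    kept = wet_and_cloudy = 0
    for _ in range(n):
        c, s, r, w = sample_network()
        if c:                       # keep only runs consistent with the evidence Cloudy=true
            kept += 1
            wet_and_cloudy += w
    return wet_and_cloudy / kept    # fraction of kept runs in which WetGrass is true

print(estimate())                   # about 0.745 with these assumed CPTs
```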


Exercise: Direct sampling

[Figure: the same network over smart, study, prepared, fair, and pass, with p(smart)=.8, p(study)=.6, p(fair)=.9]

Topological order = …?

Random number generator: .35, .76, .51, .44, .08, .28, .03, .92, .02, .42



Likelihood weighting

  • Idea: Don’t generate samples that need to be rejected in the first place!

  • Sample only from the unknown variables Z

  • Weight each sample according to the likelihood that it would occur, given the evidence E
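
A sketch of likelihood weighting on the same sprinkler network, estimating P(Rain | Cloudy = true, WetGrass = true). The query and the CPT values are illustrative assumptions.

```python
# Likelihood weighting: evidence variables are clamped rather than sampled, and
# each sample is weighted by the probability of the evidence given its sampled
# parents. CPT values are assumed textbook numbers.
import random

P_S = {True: 0.1, False: 0.5}      # P(Sprinkler = true | Cloudy)
P_R = {True: 0.8, False: 0.2}      # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}    # P(WetGrass = true | S, R)

def weighted_sample(cloudy=True, wet=True):
    """One sample of the non-evidence variables, plus its likelihood weight."""
    weight = 0.5                                        # P(Cloudy = true), clamped evidence
    s = random.random() < P_S[cloudy]                   # sample Sprinkler
    r = random.random() < P_R[cloudy]                   # sample Rain
    weight *= P_W[(s, r)] if wet else 1 - P_W[(s, r)]   # weight by the WetGrass evidence
    return r, weight

def estimate(n=100_000):
    num = den = 0.0
    for _ in range(n):
        r, w = weighted_sample()
        den += w
        num += w * r
    return num / den       # estimate of P(Rain = true | Cloudy = true, WetGrass = true)

print(estimate())          # roughly 0.98 with these assumed CPTs
```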



Learning Bayesian Networks


Learning Bayesian Networks

[Figure: data together with prior information are fed to an Inducer, which outputs a network over E, B, R, A, C and its CPTs, e.g., P(A | E, B)]


Known Structure -- Complete Data

[Figure: an Inducer takes the fixed network structure (E, B → A) together with complete data and fills in the unknown entries of the CPT P(A | E, B)]

Sample data (E, B, A):
<Y,N,N>
<Y,Y,Y>
<N,N,Y>
<N,Y,Y>
…
<N,Y,Y>

  • Network structure is specified
    • Inducer needs to estimate parameters
  • Data does not contain missing values


Unknown Structure -- Complete Data

[Figure: an Inducer takes complete data and must both select the arcs among E, B, A and estimate the CPT P(A | E, B)]

Sample data (E, B, A):
<Y,N,N>
<Y,Y,Y>
<N,N,Y>
<N,Y,Y>
…
<N,Y,Y>

  • Network structure is not specified
    • Inducer needs to select arcs & estimate parameters
  • Data does not contain missing values


Known Structure -- Incomplete Data

[Figure: an Inducer takes the fixed network structure (E, B → A) together with data containing missing values and estimates the CPT P(A | E, B)]

Sample data (E, B, A):
<Y,N,N>
<Y,?,Y>
<N,N,Y>
<N,Y,?>
…
<?,Y,Y>

  • Network structure is specified
  • Data contains missing values
    • We consider assignments to missing values



Known Structure / Complete Data

  • Given a network structure G

    • And choice of parametric family for P(Xi|Pai)

  • Learn parameters for network

    Goal

  • Construct a network that is "closest" to the probability distribution that generated the data


Learning Parameters for a Bayesian Network

[Figure: a network with nodes B, E, A, C]

  • Training data has the form:
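
The data table itself is not reproduced in this transcript. As an illustration of the idea, here is a minimal sketch of relative-frequency (maximum-likelihood) estimation of one CPT, P(A | E, B), from complete records; the toy records mirror the E, B, A examples shown earlier.

```python
# Maximum-likelihood estimation of P(A | E, B) from complete data by counting.
from collections import Counter

# Complete records as (E, B, A) triples of booleans -- toy data for illustration.
data = [(True, False, False), (True, True, True), (False, False, True),
        (False, True, True), (False, True, True)]

counts = Counter()            # counts of full (e, b, a) configurations
parent_counts = Counter()     # counts of parent configurations (e, b)
for e, b, a in data:
    counts[(e, b, a)] += 1
    parent_counts[(e, b)] += 1

def p_a_given(e, b):
    """Relative-frequency estimate of P(A = true | E = e, B = b)."""
    if parent_counts[(e, b)] == 0:
        return None           # no data for this parent configuration
    return counts[(e, b, True)] / parent_counts[(e, b)]

print(p_a_given(False, True))   # 2/2 = 1.0 on this toy data
```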





Benefits of Learning Structure

  • Discover structural properties of the domain

    • Ordering of events

    • Relevance

  • Identifying independencies → faster inference

  • Predict effect of actions

    • Involves learning causal relationship among variables


Why Struggle for Accurate Structure?

[Figure: the true network (Earthquake, Burglary → Alarm Set → Sound) compared with a network that adds an arc and one that misses an arc]

  • Adding an arc
    • Increases the number of parameters to be fitted
    • Wrong assumptions about causality and domain structure
  • Missing an arc
    • Cannot be compensated by accurate fitting of parameters
    • Also misses causality and domain structure



Approaches to Learning Structure

  • Constraint based

    • Perform tests of conditional independence

    • Search for a network that is consistent with the observed dependencies and independencies

  • Pros & Cons

    • Intuitive, follows closely the construction of BNs

    • Separates structure learning from the form of the independence tests

    • Sensitive to errors in individual tests



Approaches to Learning Structure

  • Score based

    • Define a score that evaluates how well the (in)dependencies in a structure match the observations

    • Search for a structure that maximizes the score

  • Pros & Cons

    • Statistically motivated

    • Can make compromises

    • Takes the structure of conditional probabilities into account

    • Computationally hard



Heuristic Search

  • Define a search space:

    • nodes are possible structures

    • edges denote adjacency of structures

  • Traverse this space looking for high-scoring structures

    Search techniques:

    • Greedy hill-climbing

    • Best first search

    • Simulated Annealing

    • ...


Heuristic Search (cont.)

[Figure: four candidate structures over the nodes S, C, E, D, each one edge-change away from another]

  • Typical operations:

Add C → D

Reverse C → E

Delete C → E


Exploiting Decomposability in Local Search

[Figure: candidate structures over S, C, E, D that differ by a single edge]

  • Caching: To update the score after a local change, we only need to re-score the families that were changed in the last move



Greedy Hill-Climbing

  • Simplest heuristic local search

    • Start with a given network

      • empty network

      • best tree

      • a random network

    • At each iteration

      • Evaluate all possible changes

      • Apply change that leads to best improvement in score

      • Reiterate

    • Stop when no modification improves score

  • Each step requires evaluating approximately n new changes
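
A skeleton of this search loop in Python. The scoring function (e.g., BIC or a Bayesian score computed from the data) is passed in and not implemented here; the graph representation and helper names are illustrative.

```python
# Greedy hill-climbing over structures: at each step try every single-edge
# addition, deletion, or reversal, keep the acyclic candidate with the best
# score, and stop when no change improves the score.
from itertools import permutations

def is_acyclic(nodes, edges):
    """Depth-first check that the directed graph (set of (x, y) edges) has no cycle."""
    children = {n: [y for (x, y) in edges if x == n] for n in nodes}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    def dfs(n):
        color[n] = GREY
        for c in children[n]:
            if color[c] == GREY or (color[c] == WHITE and not dfs(c)):
                return False
        color[n] = BLACK
        return True
    return all(dfs(n) for n in nodes if color[n] == WHITE)

def neighbors(nodes, edges):
    """All edge sets reachable by one add, delete, or reverse operation."""
    for x, y in permutations(nodes, 2):
        if (x, y) in edges:
            yield edges - {(x, y)}                    # delete x -> y
            yield (edges - {(x, y)}) | {(y, x)}       # reverse x -> y
        elif (y, x) not in edges:
            yield edges | {(x, y)}                    # add x -> y

def hill_climb(nodes, score, start=frozenset()):
    current, current_score = start, score(start)
    while True:
        best, best_score = None, current_score
        for cand in neighbors(nodes, current):
            if is_acyclic(nodes, cand):
                s = score(cand)
                if s > best_score:
                    best, best_score = frozenset(cand), s
        if best is None:           # local maximum or plateau: no change improves the score
            return current, current_score
        current, current_score = best, best_score

# Toy usage (a score that simply prefers more edges, for illustration only):
# dag, s = hill_climb(['A', 'B', 'C'], score=lambda edges: len(edges))
```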



Greedy Hill-Climbing: Possible Pitfalls

  • Greedy Hill-Climbing can get stuck in:

    • Local Maxima:

      • All one-edge changes reduce the score

    • Plateaus:

      • Some one-edge changes leave the score unchanged

      • Happens because equivalent networks receive the same score and are neighbors in the search space

  • Both occur during structure search

  • Standard heuristics can escape both

    • Random restarts

    • TABU search



Summary

  • Belief update

  • Role of conditional independence

  • Belief networks

  • Causality ordering

  • Inference in BN

  • Stochastic Simulation

  • Learning BNs


A Bayesian Network

The "ICU alarm" network

[Figure: the full ICU alarm network of 37 nodes, including MINVOLSET, KINKEDTUBE, PULMEMBOLUS, INTUBATION, VENTMACH, DISCONNECT, …, HREKG, HRSAT, HRBP, BP]

37 variables, 509 parameters (instead of 2^37)

