
# Learning Bayesian Networks


### Dimensions of Learning

| X1    | X2 | X3   | ... |
|-------|----|------|-----|
| true  | 1  | 0.7  | ... |
| false | 5  | -1.6 | ... |
| false | 3  | 5.9  | ... |
| true  | 2  | 6.3  | ... |

### Learning Bayes nets from data

[Diagram: data (cases over X1, …, X9) + prior/expert information → Bayes-net learner → Bayes net(s)]

[Diagram: thumbtack parameter Θ with tosses X1, X2, …, XN (toss 1, toss 2, …, toss N)]

### From thumbtacks to Bayes nets

The thumbtack problem can be viewed as learning the probability for a very simple BN:

[Diagram: a single node X with values "heads"/"tails"; parameter ΘX with tosses X1, X2, …, XN]

[Diagram: two nodes X and Y, each with its own parameter (ΘX, ΘY) and its own cases 1 … N; no arc between the parameter nodes ("parameter independence")]

### The next simplest Bayes net

"parameter

independence"

QY

case 1

Y1

ß

case 2

Y2

two separate

thumbtack-like

learning problems

YN

case N

X

Y

### A bit more difficult...

Three probabilities to learn:

• ΘX

• ΘY|X=heads

• ΘY|X=tails

[Diagram: the network X → Y with parameter nodes ΘX, ΘY|X=heads, and ΘY|X=tails attached to cases 1, 2, …]

### A bit more difficult...

[Diagram: each parameter (ΘX, ΘY|X=heads, ΘY|X=tails) is tied only to its own slice of the cases]

Learning them amounts to 3 separate thumbtack-like problems.
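To make the decomposition concrete, here is a minimal sketch in plain Python (the five cases are made up for illustration): each parameter is estimated by counting within its own slice of the data.

```python
# Complete data: each case is an (x, y) pair with values "heads"/"tails".
cases = [("heads", "tails"), ("tails", "heads"),
         ("tails", "tails"), ("heads", "heads"),
         ("heads", "tails")]

# Thumbtack-like problem 1: theta_X = P(X = heads).
theta_x = sum(1 for x, _ in cases if x == "heads") / len(cases)

# Thumbtack-like problems 2 and 3: theta_{Y|X=heads}, theta_{Y|X=tails}.
# Each uses only the slice of cases with the matching value of X.
theta_y_given_x = {}
for x_val in ("heads", "tails"):
    ys = [y for x, y in cases if x == x_val]
    theta_y_given_x[x_val] = sum(1 for y in ys if y == "heads") / len(ys)

print(theta_x, theta_y_given_x)  # 0.6 {'heads': 0.333..., 'tails': 0.5}
```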

### In general …

Learning probabilities in a Bayes net is straightforward if:
• Complete data

• Local distributions from the exponential family (binomial, Poisson, gamma, ...)

• Parameter independence

• Conjugate priors

[Diagram: the X → Y example with independent parameters ΘX and ΘY|X=tails and complete cases 1, 2, …]

### Solution: Use EM

• Initialize parameters ignoring missing data

• E step: Infer missing values using current parameters

• M step: Estimate parameters using completed data

• Can also use gradient descent
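A minimal sketch of this EM loop for the one-parameter thumbtack model with some tosses unrecorded (plain Python; the data are hypothetical). With a Bernoulli likelihood, the E step fills in each missing toss with its expected value under the current θ:

```python
# EM for the thumbtack: estimate theta = P(heads) when some tosses are missing.
# data: 1 = heads, 0 = tails, None = missing (hypothetical observations).
data = [1, 0, None, 1, 1, None, 0, 1]

observed = [x for x in data if x is not None]
n_missing = sum(1 for x in data if x is None)

# Initialize ignoring the missing entries.
theta = sum(observed) / len(observed)

for _ in range(50):
    # E step: each missing toss contributes an expected count of theta heads.
    expected_heads = sum(observed) + theta * n_missing
    # M step: re-estimate theta from the completed (expected) counts.
    theta = expected_heads / len(data)

print(theta)  # converges to the observed fraction of heads
```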

### Learning Bayes-net structure

Given data, which model is correct?

model 1: X Y (no arc; X and Y independent)

model 2: X → Y

### Bayesian approach

Given data, which model is correct? Which is more likely? Compute the posterior over models: p(m | d) ∝ p(m) p(d | m).

model 1: X Y (independent)

model 2: X → Y

### Bayesian approach: Model averaging

Given data, which model is correct? Which is more likely? Average the predictions of all models, weighted by their posterior probabilities: p(x | d) = Σm p(x | m, d) p(m | d).

model 1: X Y (independent)

model 2: X → Y
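Numerically, averaging just weights each model's prediction by its posterior probability; a tiny sketch with made-up numbers:

```python
# Hypothetical posteriors and per-model predictions for the next case.
p_m = {"model 1": 0.3, "model 2": 0.7}    # p(m | d)
p_x = {"model 1": 0.50, "model 2": 0.75}  # p(X=heads | m, d)

p_avg = sum(p_m[m] * p_x[m] for m in p_m) # p(X=heads | d)
print(p_avg)  # 0.3*0.50 + 0.7*0.75 = 0.675
```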

### Bayesian approach: Model selection

Given data, which model is correct? Which is more likely?

model 1: X Y (independent)

model 2: X → Y

Keep the best model, for:

- Explanation

- Understanding

- Tractability

Given data d, score each model by

$$\text{score}(m) = \log p(m) + \log p(d \mid m)$$

where the "marginal likelihood" is the likelihood averaged over the conjugate prior:

$$p(d \mid m) = \int p(d \mid \theta_m, m)\, p(\theta_m \mid m)\, d\theta_m$$
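For binary variables with Beta priors, this integral has a closed form. Below is a minimal sketch (plain Python; uniform Beta(1,1) priors and an eight-case dataset invented for illustration) that scores model 1 (X and Y independent) against model 2 (X → Y). With parameter independence, each model's marginal likelihood factors into one thumbtack-like term per parameter:

```python
from math import lgamma

def log_marg_bernoulli(n_heads, n_tails, a=1.0, b=1.0):
    """Closed-form log marginal likelihood of binary counts under a Beta(a, b) prior."""
    def log_beta(x, y):
        return lgamma(x) + lgamma(y) - lgamma(x + y)
    return log_beta(a + n_heads, b + n_tails) - log_beta(a, b)

# Hypothetical complete data over two binary variables (1 = heads, 0 = tails).
cases = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (1, 1), (0, 0)]
nx1 = sum(x for x, _ in cases)

# Model 1: X and Y independent -> two unrelated thumbtack problems.
ny1 = sum(y for _, y in cases)
score1 = (log_marg_bernoulli(nx1, len(cases) - nx1)
          + log_marg_bernoulli(ny1, len(cases) - ny1))

# Model 2: X -> Y. Parameter independence splits p(d|m) into three
# thumbtack problems: theta_X, theta_{Y|X=1}, theta_{Y|X=0}.
ny1_x1 = sum(y for x, y in cases if x == 1)
ny1_x0 = sum(y for x, y in cases if x == 0)
score2 = (log_marg_bernoulli(nx1, len(cases) - nx1)
          + log_marg_bernoulli(ny1_x1, nx1 - ny1_x1)
          + log_marg_bernoulli(ny1_x0, (len(cases) - nx1) - ny1_x0))

print(score1, score2)  # higher log score -> more probable structure (equal priors)
```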

### More complicated graphs

3 separate thumbtack-like learning problems

[Diagram: a larger graph whose parameters (e.g. ΘX, ΘY|X=tails) again separate into independent learning problems]

### Computation of marginal likelihood

Efficient closed form if

• Local distributions from the exponential family (binomial, Poisson, gamma, ...)

• Parameter independence

• Conjugate priors

• No missing data (including no hidden variables)

### Structure search

• Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)

• Heuristic methods

• Greedy (see the sketch below)

• Greedy with restarts

• MCMC methods

[Flowchart: initialize structure; score all possible single changes; perform the best change; if any change improved the score, repeat; otherwise return the saved structure]
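A minimal sketch of the greedy loop in the flowchart (plain Python; the initial structure, neighbor generator, and score function are placeholders to be supplied, e.g. the log marginal likelihood plus log structure prior):

```python
def greedy_search(initial_structure, neighbors, score):
    """Greedy hill-climbing over structures.

    `neighbors(s)` yields all structures one change away (add / delete /
    reverse an arc); `score(s)` is the model score. Both are assumed here,
    not defined by the presentation.
    """
    current = initial_structure
    current_score = score(current)
    while True:
        # Score all possible single changes and keep the best one.
        best, best_score = current, current_score
        for candidate in neighbors(current):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        if best_score <= current_score:  # no change is better:
            return current               # return the saved structure
        current, current_score = best, best_score
```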

### Structure priors

1. All possible structures equally likely

2. Partial ordering, required / prohibited arcs

3. Prior(m) ∝ Similarity(m, prior BN)

### Parameter priors

• All uniform: Beta(1,1)

• Use a prior Bayes net

### Parameter priors

Recall the intuition behind the Beta prior for the thumbtack:

• The hyperparameters α_h and α_t can be thought of as imaginary counts from our prior experience, starting from "pure ignorance"

• Equivalent sample size = α_h + α_t

• The larger the equivalent sample size, the more confident we are about the long-run fraction
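A minimal sketch of the conjugate update behind this intuition (plain Python; the counts are made up), showing how a larger equivalent sample size keeps the posterior mean closer to the prior mean:

```python
# Beta(a_h, a_t) prior for the thumbtack parameter theta = P(heads).
# Observing n_h heads and n_t tails gives the posterior Beta(a_h + n_h, a_t + n_t).
def posterior_mean(a_h, a_t, n_h, n_t):
    return (a_h + n_h) / (a_h + a_t + n_h + n_t)

n_h, n_t = 3, 7  # hypothetical data: 3 heads, 7 tails

# Same prior mean (0.5), different equivalent sample sizes a_h + a_t:
print(posterior_mean(1, 1, n_h, n_t))    # ESS 2   -> 4/12   = 0.33 (data dominates)
print(posterior_mean(50, 50, n_h, n_t))  # ESS 100 -> 53/110 = 0.48 (prior dominates)
```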

[Diagram: a prior Bayes net over variables x1, …, x9]

### Parameter priors

imaginary count for any variable configuration (set by the equivalent sample size)

+ parameter modularity

⇒ parameter priors for any Bayes net structure for X1…Xn

[Diagram: the prior network and candidate structures over x1, …, x9, alongside a table of cases:]

| x1    | x2    | x3    | ... |
|-------|-------|-------|-----|
| true  | false | true  | ... |
| false | false | true  | ... |
| false | false | false | ... |
| true  | true  | false | ... |

### Combining knowledge & data

prior network + equivalent sample size, combined with data → improved network(s)
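One way to realize this combination, consistent with the imaginary-count slides above (a hedged sketch: the BDe-style construction and the toy `prior_prob` function are illustrative assumptions, not code from the presentation), is to derive Dirichlet hyperparameters for any candidate structure by querying the prior network and scaling by the equivalent sample size:

```python
from itertools import product

def bde_counts(ess, prior_prob, var, parents, values=(0, 1)):
    """Imaginary counts alpha[u][x] = ESS * p(var=x, parents=u | prior network).

    `prior_prob` is a stand-in for inference in the prior Bayes net: it maps
    a partial assignment {name: value} to its marginal probability.
    """
    alpha = {}
    for u in product(values, repeat=len(parents)):
        alpha[u] = {
            x: ess * prior_prob({**dict(zip(parents, u)), var: x})
            for x in values
        }
    return alpha

# Toy prior network: X and Y independent, P(X=1)=0.7, P(Y=1)=0.4.
def prior_prob(assignment):
    p = 1.0
    if "X" in assignment:
        p *= 0.7 if assignment["X"] == 1 else 0.3
    if "Y" in assignment:
        p *= 0.4 if assignment["Y"] == 1 else 0.6
    return p

# Priors for theta_{Y|X} in the candidate structure X -> Y, with ESS = 10.
print(bde_counts(10, prior_prob, "Y", ["X"]))
# approximately {(0,): {0: 1.8, 1: 1.2}, (1,): {0: 4.2, 1: 2.8}}
```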