Learning Bayesian Networks


Learning Bayesian Networks


Dimensions of Learning


Learning Bayes nets from data

[Figure: data plus prior/expert information feed into a Bayes-net learner, which outputs Bayes net(s) over variables X1 ... X9.]

Example data (one row per case):

  X1      X2    X3
  true    1     0.7
  false   5     -1.6
  false   3     5.9
  true    2     6.3
  ...     ...   ...


From thumbtacks to Bayes nets

The thumbtack problem can be viewed as learning the probability for a very simple BN: a single node X with values heads/tails.

[Figure: parameter node Θ with arcs to X1, X2, ..., XN (toss 1, toss 2, ..., toss N).]
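As a concrete illustration (not from the slides), here is a minimal sketch of that thumbtack computation, assuming a conjugate Beta(αh, αt) prior over θ = P(X = heads); the prior and tosses are made up:

```python
# Minimal sketch: Bayesian parameter learning for the one-node thumbtack BN.
# Assumes a conjugate Beta(alpha_h, alpha_t) prior over theta = P(X = heads);
# the prior and the tosses below are illustrative, not from the slides.

def beta_posterior(alpha_h, alpha_t, tosses):
    """Return the Beta posterior hyperparameters after observing the tosses."""
    n_heads = sum(1 for t in tosses if t == "heads")
    n_tails = len(tosses) - n_heads
    return alpha_h + n_heads, alpha_t + n_tails

tosses = ["heads", "tails", "heads", "heads"]         # X1 ... XN
a_h, a_t = beta_posterior(1, 1, tosses)               # Beta(1,1): uniform prior
print("posterior mean of theta:", a_h / (a_h + a_t))  # (1+3)/(2+4) ~ 0.667
```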


The next simplest Bayes net

[Figure: two nodes X and Y, each with values heads/tails ("heads", "tails"), and no arc between them.]

[Figure: parameter node ΘX with arcs to X1, X2, ..., XN and parameter node ΘY with arcs to Y1, Y2, ..., YN; each case i contributes one Xi and one Yi. Is there an arc between ΘX and ΘY?]

"Parameter independence" (no arc between ΘX and ΘY) ⇒ two separate thumbtack-like learning problems.
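Under parameter independence, each posterior is computed exactly as in the thumbtack sketch above; the cases and Beta(1,1) priors below are illustrative:

```python
# With parameter independence, Theta_X and Theta_Y are learned independently,
# each exactly like the thumbtack. Cases and Beta(1,1) priors are illustrative.

cases = [("heads", "tails"), ("tails", "tails"), ("heads", "heads")]  # (Xi, Yi)

def posterior_mean(outcomes, alpha_h=1, alpha_t=1):
    """Posterior mean of a Beta-Bernoulli 'thumbtack' parameter."""
    n_h = sum(1 for o in outcomes if o == "heads")
    return (alpha_h + n_h) / (alpha_h + alpha_t + len(outcomes))

print("E[Theta_X | d]:", posterior_mean([x for x, _ in cases]))  # (1+2)/5 = 0.6
print("E[Theta_Y | d]:", posterior_mean([y for _, y in cases]))  # (1+1)/5 = 0.4
```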


A bit more difficult...

[Figure: X → Y, both heads/tails.]

Three probabilities to learn:

  • θX=heads

  • θY=heads|X=heads

  • θY=heads|X=tails


[Figure: parameter nodes ΘX, ΘY|X=heads, ΘY|X=tails, with arcs to the corresponding Xi and Yi in each case (case 1 with X1 = heads uses ΘY|X=heads; case 2 with X2 = tails uses ΘY|X=tails). Are there arcs among the parameter nodes?]

With parameter independence: 3 separate thumbtack-like problems
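A small sketch of those three decomposed estimates, assuming complete data; the cases and Beta(1,1) priors are made up for illustration:

```python
# Learning the X -> Y net from complete data: three separate Beta updates,
# one for Theta_X and one for Theta_Y|X=x for each value x of X.
# Cases and Beta(1,1) priors are illustrative, not from the slides.

cases = [("heads", "heads"), ("heads", "heads"),
         ("tails", "tails"), ("heads", "tails")]      # (Xi, Yi)

def posterior_mean(outcomes, alpha_h=1, alpha_t=1):
    n_h = sum(1 for o in outcomes if o == "heads")
    return (alpha_h + n_h) / (alpha_h + alpha_t + len(outcomes))

theta_x   = posterior_mean([x for x, _ in cases])                    # 4/6 ~ 0.667
theta_y_h = posterior_mean([y for x, y in cases if x == "heads"])    # 3/5 = 0.6
theta_y_t = posterior_mean([y for x, y in cases if x == "tails"])    # 1/3 ~ 0.333
print(theta_x, theta_y_h, theta_y_t)
```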


In general …

Learning probabilities in a Bayes net is straightforward if

  • Complete data

  • Local distributions from the exponential family (binomial, Poisson, gamma, ...)

  • Parameter independence

  • Conjugate priors


Incomplete data makes parameters dependent

[Figure: the X → Y net with parameter nodes ΘX, ΘY|X=heads, ΘY|X=tails over cases 1, 2, ...; when some observations are missing, the parameters are no longer independent given the data.]


Solution: Use EM

  • Initialize parameters ignoring missing data

  • E step: Infer missing values using current parameters

  • M step: Estimate parameters using completed data

  • Can also use gradient descent
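A minimal EM sketch for this two-node net, assuming some Yi are missing (represented as None); the data and initialization are illustrative, not from the slides:

```python
# Minimal EM sketch for the X -> Y net when some Yi are missing (None).
# X is fully observed here, so Theta_X can be estimated directly; EM is only
# needed for Theta_Y|X. Data and initialization are illustrative.

cases = [("heads", "heads"), ("heads", None),
         ("tails", "tails"), ("tails", None), ("heads", "heads")]

theta_y = {"heads": 0.5, "tails": 0.5}   # P(Y = heads | X = x), initial guess

for _ in range(20):
    # E step: expected counts of Y = heads for each value of X, filling in
    # each missing Yi with its probability under the current parameters
    exp_heads = {"heads": 0.0, "tails": 0.0}
    totals = {"heads": 0.0, "tails": 0.0}
    for x, y in cases:
        totals[x] += 1
        if y == "heads":
            exp_heads[x] += 1
        elif y is None:
            exp_heads[x] += theta_y[x]
    # M step: maximum-likelihood re-estimate from the expected counts
    theta_y = {x: exp_heads[x] / totals[x] for x in theta_y}

print(theta_y)   # converges toward {'heads': 1.0, 'tails': 0.0}
```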


Learning Bayes-net structure

Given data, which model is correct?

  model 1:  X   Y   (no arc)
  model 2:  X → Y


Bayesian approach

Given data d, which model is correct? More likely?

  model 1:  X   Y   (no arc)
  model 2:  X → Y


Bayesian approach: Model averaging

Given data d, which model is more likely? Average the predictions of the candidate models, weighted by how likely each model is.
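As an illustrative sketch (not from the slides), model averaging over the two candidate structures might look like this; the posterior and per-model predictions are placeholders:

```python
# Model-averaging sketch: weight each model's prediction by its posterior.
# The posterior and per-model predictive probabilities below are placeholders;
# in practice they come from the marginal-likelihood computations that follow.

posterior  = {"model 1": 0.3, "model 2": 0.7}    # p(m | d), assumed
prediction = {"model 1": 0.50, "model 2": 0.65}  # p(next Y = heads | d, m)

p_avg = sum(posterior[m] * prediction[m] for m in posterior)
print("model-averaged prediction:", p_avg)       # 0.3*0.50 + 0.7*0.65 = 0.605
```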


Bayesian approach: Model selection

Given data d, which model is more likely? Keep the best model, for:

  - Explanation

  - Understanding

  - Tractability


To score a model, use Bayes’ theorem

Given data d, the model score is

  p(m | d) ∝ p(m) p(d | m)

where p(d | m) is the "marginal likelihood":

  p(d | m) = ∫ p(d | θ, m) p(θ | m) dθ

i.e., the likelihood p(d | θ, m) averaged over the parameter prior.


Thumbtack example

X: heads/tails. With a conjugate Beta(αh, αt) prior, the marginal likelihood has a closed form:

  p(d) = [Γ(αh + αt) / Γ(αh + αt + N)] · [Γ(αh + Nh) / Γ(αh)] · [Γ(αt + Nt) / Γ(αt)]

where Nh and Nt are the observed numbers of heads and tails, and N = Nh + Nt.
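A small sketch of that computation (hyperparameters and counts are illustrative), using log-gamma for numerical stability:

```python
# Log marginal likelihood for the thumbtack under a Beta(a_h, a_t) prior,
# using log-gamma for numerical stability. Counts below are illustrative.
from math import lgamma

def log_marginal_likelihood(a_h, a_t, n_h, n_t):
    n = n_h + n_t
    return (lgamma(a_h + a_t) - lgamma(a_h + a_t + n)
            + lgamma(a_h + n_h) - lgamma(a_h)
            + lgamma(a_t + n_t) - lgamma(a_t))

print(log_marginal_likelihood(1, 1, n_h=7, n_t=3))  # log p(d) for 7 heads, 3 tails
```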


More complicated graphs

[Figure: X → Y, both heads/tails.]

3 separate thumbtack-like learning problems, so the marginal likelihood factors into terms for X, Y|X=heads, and Y|X=tails.


Model score for a discrete Bayes net
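The score formula on this slide did not survive extraction. For reference, under the assumptions listed on the next slide, the standard closed form (the Bayesian-Dirichlet marginal likelihood) is:

```latex
% Bayesian-Dirichlet marginal likelihood (standard closed form; the slide's
% own formula was lost in extraction). i indexes variables, j the parent
% configurations of X_i, k the values of X_i; N_{ijk} are data counts and
% \alpha_{ijk} Dirichlet hyperparameters, with N_{ij} = \sum_k N_{ijk} and
% \alpha_{ij} = \sum_k \alpha_{ijk}.
p(d \mid m) = \prod_{i=1}^{n} \prod_{j=1}^{q_i}
    \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
    \prod_{k=1}^{r_i}
    \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}
```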


Computation of marginal likelihood

Efficient closed form if

  • Local distributions from the exponential family (binomial, Poisson, gamma, ...)

  • Parameter independence

  • Conjugate priors

  • No missing data (including no hidden variables)
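As an illustrative sketch (not from the slides), the closed form above for a single discrete variable with discrete parents might be coded as follows; the function name and the uniform Dirichlet hyperparameter are assumptions:

```python
# Sketch: log marginal likelihood (Bayesian-Dirichlet score) contribution of
# one discrete variable given complete data, with a Dirichlet hyperparameter
# `alpha` for every (parent configuration, value) cell. Names and the uniform
# alpha are illustrative choices, not from the slides.
from collections import Counter
from math import lgamma

def family_log_score(data, child, parents, child_values, alpha=1.0):
    """data: list of dicts mapping variable name -> value."""
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    r = len(child_values)                        # number of child values
    score = 0.0
    for pc in {pc for pc, _ in counts}:          # observed parent configurations
        n_ij = sum(counts[(pc, v)] for v in child_values)
        score += lgamma(alpha * r) - lgamma(alpha * r + n_ij)
        for v in child_values:
            score += lgamma(alpha + counts[(pc, v)]) - lgamma(alpha)
    return score

# The full model score is the sum of family_log_score over all variables,
# plus the log structure prior.
```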


Structure search

[Flowchart: initialize structure → score all possible single changes → perform best change → any changes better? yes: loop; no: return saved structure.]

  • Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)

  • Heuristic methods

    • Greedy

    • Greedy with restarts

    • MCMC methods
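A greedy hill-climbing sketch matching the flowchart above; the score function and move generation are left abstract, and the names are illustrative:

```python
# Greedy structure search sketch following the flowchart above. The functions
# `score(structure)` and `single_changes(structure)` (arc additions, deletions,
# reversals that keep the graph acyclic) are assumed given; names illustrative.

def greedy_search(initial_structure, score, single_changes):
    current, current_score = initial_structure, score(initial_structure)
    while True:
        # score all possible single changes
        neighbors = [(score(s), s) for s in single_changes(current)]
        if not neighbors:
            return current
        best_score, best = max(neighbors, key=lambda pair: pair[0])
        if best_score <= current_score:              # no change is better: stop
            return current
        current, current_score = best, best_score    # perform best change
```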


Structure priors

1. All possible structures equally likely

2. Partial ordering, required / prohibited arcs

3. Prior(m) ∝ Similarity(m, prior BN)


Parameter priors

  • All uniform: Beta(1,1)

  • Use a prior Bayes net


Parameter priors

Recall the intuition behind the Beta prior for the thumbtack:

  • The hyperparameters αh and αt can be thought of as imaginary counts from our prior experience, starting from "pure ignorance"

  • Equivalent sample size = αh + αt

  • The larger the equivalent sample size, the more confident we are about the long-run fraction
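A quick illustration of the equivalent-sample-size intuition (the counts are made up):

```python
# Same data, different equivalent sample sizes (ESS = alpha_h + alpha_t):
# the larger the ESS, the less the posterior mean moves away from the
# prior mean of 0.5. Counts are illustrative.
n_heads, n_tails = 3, 7
for ess in (2, 20, 200):
    a_h = ess / 2                                   # prior mean 0.5
    post_mean = (a_h + n_heads) / (ess + n_heads + n_tails)
    print(f"ESS={ess:3d}: posterior mean = {post_mean:.3f}")
# ESS=  2: 0.333   ESS= 20: 0.433   ESS=200: 0.490
```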


Parameter priors

[Figure: a prior Bayes net over x1 ... x9 gives an imaginary count for any variable configuration; together with an equivalent sample size and parameter modularity, this yields parameter priors for any Bayes net structure for X1…Xn.]


Combining knowledge & data

[Figure: a prior network over x1 ... x9 plus an equivalent sample size, combined with data, yield improved network(s).]

Example data (one row per case):

  x1      x2      x3
  true    false   true
  false   false   true
  false   false   false
  true    true    false
  ...     ...     ...
