Learning Bayesian Networks


Presentation Transcript


Learning Bayesian Networks


Dimensions of Learning


Learning Bayes nets from data

Example data, one row per case:

X1      X2    X3
true     1    0.7
false    5    -1.6
false    3    5.9
true     2    6.3
...

[Diagram: data + prior/expert information → Bayes-net learner → Bayes net(s) over X1, ..., X9]


From thumbtacks to Bayes nets

The thumbtack problem can be viewed as learning the probability for a very simple BN:

[Diagram: parameter Θ → X1, X2, ..., XN (toss 1, toss 2, ..., toss N); X: heads/tails]


The next simplest Bayes net

[Diagram: two nodes X and Y, each taking values heads/tails]


The next simplest Bayes net

[Diagram: parameters ΘX → X1, X2, ..., XN and ΘY → Y1, Y2, ..., YN over cases 1, ..., N; a "?" asks whether ΘX and ΘY depend on each other]


The next simplest Bayes net

[Diagram: ΘX → X1, ..., XN and ΘY → Y1, ..., YN, with no arc between ΘX and ΘY: "parameter independence"]


The next simplest Bayes net

[Diagram: ΘX → X1, ..., XN and ΘY → Y1, ..., YN]

"Parameter independence" ⇒ two separate thumbtack-like learning problems.


A bit more difficult...

[Diagram: X → Y, each heads/tails]

Three probabilities to learn:

  • θX=heads
  • θY=heads|X=heads
  • θY=heads|X=tails
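With complete data, each of these three probabilities is just a relative frequency. A minimal sketch (the function name and example cases are illustrative, not from the slides):

```python
# Maximum-likelihood estimates of the three parameters of the X -> Y net
# from complete data: each is a relative frequency over the relevant cases.

def mle_three_params(cases):
    """cases: list of (x, y) pairs, each value 'heads' or 'tails'."""
    n = len(cases)
    n_xh = sum(x == "heads" for x, _ in cases)
    n_yh_xh = sum(x == "heads" and y == "heads" for x, y in cases)
    n_yh_xt = sum(x == "tails" and y == "heads" for x, y in cases)
    return {
        "theta_X=heads": n_xh / n,
        "theta_Y=heads|X=heads": n_yh_xh / n_xh,
        "theta_Y=heads|X=tails": n_yh_xt / (n - n_xh),
    }

cases = [("heads", "heads"), ("heads", "tails"),
         ("tails", "heads"), ("tails", "heads")]
params = mle_three_params(cases)
```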


A bit more difficult...

[Diagram: parameter nodes ΘX, ΘY|X=heads, and ΘY|X=tails over the cases; case 1: X1 = heads, Y1; case 2: X2 = tails, Y2]


A bit more difficult...

[Diagram: ΘX → X1, X2; ΘY|X=heads and ΘY|X=tails → Y1, Y2; cases 1 and 2]


A bit more difficult...

[Diagram: as above, with "?" marking possible dependencies among ΘX, ΘY|X=heads, and ΘY|X=tails]


A bit more difficult...

[Diagram: ΘX, ΘY|X=heads, and ΘY|X=tails shown independent]

3 separate thumbtack-like problems


In general …

Learning probabilities in a Bayes net is straightforward if we have:

  • Complete data
  • Local distributions from the exponential family (binomial, Poisson, gamma, ...)
  • Parameter independence
  • Conjugate priors


Incomplete data makes parameters dependent

[Diagram: ΘX, ΘY|X=heads, and ΘY|X=tails over case 1 (X1, Y1) and case 2 (X2, Y2); with some values unobserved, the parameters are no longer independent given the data]


Solution: Use EM

  • Initialize parameters ignoring missing data
  • E step: Infer missing values using current parameters
  • M step: Estimate parameters using completed data
  • Can also use gradient descent
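As a concrete illustration, the E and M steps can be sketched for the two-node net X → Y when X is hidden, so Y follows a two-component Bernoulli mixture. The data, starting values, and names below are illustrative assumptions, not taken from the slides:

```python
import math

# EM sketch for X -> Y with X hidden: alternate inferring X (E step)
# and re-estimating parameters from expected counts (M step).

def e_step(data, p_x, p_yh, p_yt):
    """E step: responsibility P(X=heads | Y=y) for each case."""
    resp = []
    for y in data:
        lh = p_x * (p_yh if y else 1 - p_yh)          # X = heads branch
        lt = (1 - p_x) * (p_yt if y else 1 - p_yt)    # X = tails branch
        resp.append(lh / (lh + lt))
    return resp

def m_step(data, resp):
    """M step: re-estimate parameters from expected counts."""
    p_x = sum(resp) / len(data)
    p_yh = sum(r for r, y in zip(resp, data) if y) / sum(resp)
    p_yt = sum(1 - r for r, y in zip(resp, data) if y) / sum(1 - r for r in resp)
    return p_x, p_yh, p_yt

def log_lik(data, p_x, p_yh, p_yt):
    return sum(math.log(p_x * (p_yh if y else 1 - p_yh)
                        + (1 - p_x) * (p_yt if y else 1 - p_yt))
               for y in data)

data = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]    # observed Y values (1 = heads)
params = (0.6, 0.7, 0.2)                 # asymmetric start to break symmetry
lls = [log_lik(data, *params)]
for _ in range(20):
    params = m_step(data, e_step(data, *params))
    lls.append(log_lik(data, *params))
```

Each EM iteration is guaranteed not to decrease the log-likelihood, which makes monotonicity a handy sanity check on the implementation.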


Learning Bayes-net structure

Given data, which model is correct?

[Diagram: model 1: X, Y (no arc); model 2: X → Y]


Bayesian approach

Given data d, which model is correct? More likely?

[Diagram: data d scored under model 1 (X, Y) and model 2 (X → Y)]


Bayesian approach: Model averaging

Given data d, which model is more likely?

[Diagram: data d scored under model 1 (X, Y) and model 2 (X → Y); average the models' predictions]


Bayesian approach: Model selection

Given data d, which model is more likely?

[Diagram: data d scored under model 1 (X, Y) and model 2 (X → Y)]

Keep the best model:

  - Explanation
  - Understanding
  - Tractability


To score a model, use Bayes' theorem

Given data d, the model score is the posterior

  P(m | d) ∝ P(m) P(d | m)

where the "marginal likelihood" P(d | m) = ∫ P(d | θ, m) p(θ | m) dθ averages the likelihood P(d | θ, m) over the parameter prior.
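The scoring rule can be sketched as a one-line application of Bayes' theorem over the candidate models; the prior and marginal-likelihood values below are illustrative placeholders:

```python
# Posterior model probabilities: P(m | d) is proportional to P(m) * P(d | m),
# normalized over all candidate models.

def model_posteriors(priors, marginal_likelihoods):
    joint = [p * ml for p, ml in zip(priors, marginal_likelihoods)]
    z = sum(joint)
    return [j / z for j in joint]

# Two models, equal priors, model 1 twice as good a fit (placeholder values):
post = model_posteriors([0.5, 0.5], [0.2, 0.1])
```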


Thumbtack example

[Diagram: X: heads/tails, with a conjugate (Beta) prior on the probability of heads]
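Under a conjugate Beta(αh, αt) prior, the marginal likelihood of a toss sequence has a closed form in Gamma functions (this is the standard Beta-Binomial result; the function and its Beta(1,1) defaults are an illustrative sketch):

```python
import math

# Closed-form marginal likelihood for the thumbtack under a Beta prior:
#   P(d) = G(a)/G(a+N) * G(a_h+n_h)/G(a_h) * G(a_t+n_t)/G(a_t),
# where G is the Gamma function, a = a_h + a_t and N = n_h + n_t.
# Computed in log space with lgamma for numerical stability.

def thumbtack_marginal_likelihood(n_h, n_t, a_h=1.0, a_t=1.0):
    a, n = a_h + a_t, n_h + n_t
    log_p = (math.lgamma(a) - math.lgamma(a + n)
             + math.lgamma(a_h + n_h) - math.lgamma(a_h)
             + math.lgamma(a_t + n_t) - math.lgamma(a_t))
    return math.exp(log_p)
```

For example, under the uniform Beta(1,1) prior, a particular sequence of one head and one tail has marginal likelihood 1/6, and two heads has 1/3.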


More complicated graphs

[Diagram: X → Y, each heads/tails]

3 separate thumbtack-like learning problems:

  • X
  • Y | X=heads
  • Y | X=tails


Model score for a discrete Bayes net


Computation of marginal likelihood

Efficient closed form if

  • Local distributions from the exponential family (binomial, Poisson, gamma, ...)
  • Parameter independence
  • Conjugate priors
  • No missing data (including no hidden variables)
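Under these conditions the score of a discrete Bayes net factors into one Dirichlet term per node and parent configuration. A sketch of one node's contribution, in the Cooper-Herskovits form (the count and hyperparameter layouts here are illustrative assumptions):

```python
import math

# Log marginal likelihood contribution of one node (family) under
# independent Dirichlet priors, computed in log space with lgamma.

def family_log_score(counts, alpha):
    """counts[j][k]: cases with parent configuration j and child state k;
    alpha[j][k]: matching Dirichlet imaginary counts."""
    log_p = 0.0
    for n_jk, a_jk in zip(counts, alpha):
        a_j, n_j = sum(a_jk), sum(n_jk)
        log_p += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for n, a in zip(n_jk, a_jk):
            log_p += math.lgamma(a + n) - math.lgamma(a)
    return log_p

# Score of the X -> Y net = sum of the nodes' contributions:
# X has a single (empty) parent configuration, Y has one per value of X.
score = (family_log_score([[2, 2]], [[1.0, 1.0]])    # X: 2 heads, 2 tails
         + family_log_score([[1, 1], [2, 0]],        # Y given X
                            [[1.0, 1.0], [1.0, 1.0]]))
```

With a single empty parent configuration this reduces exactly to the thumbtack marginal likelihood, which makes a convenient consistency check.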


Structure search

[Flowchart: initialize structure → score all possible single changes → perform best change → any changes better? yes: loop; no: return saved structure]

  • Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)
  • Heuristic methods
    • Greedy
    • Greedy with restarts
    • MCMC methods
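The flowchart above can be sketched as greedy hill-climbing over edge sets, where a "single change" is one edge addition, deletion, or reversal. The toy score below is an illustrative stand-in for a real decomposable score:

```python
import itertools

def is_acyclic(n, edges):
    # Kahn's algorithm: acyclic iff every node can be topologically ordered.
    indeg = {v: 0 for v in range(n)}
    for _, v in edges:
        indeg[v] += 1
    frontier = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while frontier:
        u = frontier.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == n

def single_changes(n, edges):
    # All edge sets one addition, deletion, or reversal away.
    for u, v in itertools.permutations(range(n), 2):
        e = (u, v)
        if e in edges:
            yield edges - {e}                  # delete
            yield (edges - {e}) | {(v, u)}     # reverse
        else:
            yield edges | {e}                  # add

def greedy_search(n, score, edges=frozenset()):
    best, best_score = edges, score(edges)
    while True:
        cands = [frozenset(c) for c in single_changes(n, best)
                 if is_acyclic(n, c)]
        top = max(cands, key=score, default=None)
        if top is None or score(top) <= best_score:
            return best, best_score            # no single change improves
        best, best_score = top, score(top)

# Toy score: reward edges (0,1) and (1,2), penalize extra edges.
best, best_score = greedy_search(
    3, lambda e: len(e & {(0, 1), (1, 2)}) - 0.1 * len(e))
```

Because the score has local optima, practical systems wrap this loop in random restarts or replace it with MCMC, as the slide notes.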


Structure priors

1. All possible structures equally likely
2. Partial ordering, required / prohibited arcs
3. Prior(m) ∝ Similarity(m, prior BN)


Parameter priors

  • All uniform: Beta(1,1)
  • Use a prior Bayes net


Parameter priors

Recall the intuition behind the Beta prior for the thumbtack:

  • The hyperparameters αh and αt can be thought of as imaginary counts from our prior experience, starting from "pure ignorance"
  • Equivalent sample size = αh + αt
  • The larger the equivalent sample size, the more confident we are about the long-run fraction
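The imaginary-count reading can be sketched as conjugate updating (the prior and the observed counts below are illustrative):

```python
# Conjugate Beta updating: observed counts simply add to the imaginary ones.

def beta_update(a_h, a_t, n_h, n_t):
    """Posterior hyperparameters and posterior mean after seeing
    n_h heads and n_t tails under a Beta(a_h, a_t) prior."""
    a_h, a_t = a_h + n_h, a_t + n_t
    return a_h, a_t, a_h / (a_h + a_t)

# "Pure ignorance" Beta(1,1), then 3 heads and 1 tail: the equivalent
# sample size grows from 2 to 6, and the mean moves from 1/2 to 2/3.
a_h, a_t, mean = beta_update(1, 1, 3, 1)
```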


Parameter priors

[Diagram: a prior Bayes net over x1, ..., x9 supplies an imaginary count for any variable configuration]

equivalent sample size + parameter modularity ⇒ parameter priors for any Bayes net structure for X1…Xn


Combining knowledge & data

Example data:

x1      x2      x3
true    false   true
false   false   true
false   false   false
true    true    false
...

[Diagram: prior network + equivalent sample size, combined with data, yields improved network(s) over x1, ..., x9]

