Learning Bayesian Networks

## PowerPoint Slideshow about 'Learning Bayesian Networks' - octavius-roberts


Presentation Transcript

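Learning Bayes nets from data

| X1 | X2 | X3 | ... |
|---|---|---|---|
| true | 1 | 0.7 | ... |
| false | 5 | -1.6 | ... |
| false | 3 | 5.9 | ... |
| true | 2 | 6.3 | ... |
| ... | ... | ... | ... |

[Figure: data + prior/expert information are fed to a Bayes-net learner, which outputs Bayes net(s) over X1, ..., X9]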

From thumbtacks to Bayes nets

The thumbtack problem can be viewed as learning the probability for a very simple BN:

[Figure: a single node X (heads/tails), with copies X1, X2, ..., XN for toss 1, toss 2, ..., toss N]

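The next simplest Bayes net?

[Figure: two nodes X and Y (each heads/tails) with no arc between them. Parameter $\Theta_X$ points to X1, X2, ..., XN and parameter $\Theta_Y$ points to Y1, Y2, ..., YN for case 1, case 2, ..., case N]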

The next simplest Bayes net

[Figure: as above, with $\Theta_X$ and $\Theta_Y$ marked "parameter independence"]

"Parameter independence" implies two separate thumbtack-like learning problems.
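The step from parameter independence to two separate problems can be written out as a short derivation. This is a sketch under the assumptions that X and Y share no arc and that the data $d$ consist of complete cases $(x_i, y_i)$; the lowercase symbols $\theta_X$ and $\theta_Y$ denote values of the parameters drawn as $\Theta_X$ and $\Theta_Y$ on the slide.

```latex
\begin{align*}
p(\theta_X, \theta_Y \mid d)
  &\propto p(\theta_X, \theta_Y)\,\prod_{i=1}^{N} p(x_i \mid \theta_X)\, p(y_i \mid \theta_Y) \\
  &= \Big[ p(\theta_X) \prod_{i=1}^{N} p(x_i \mid \theta_X) \Big]
     \Big[ p(\theta_Y) \prod_{i=1}^{N} p(y_i \mid \theta_Y) \Big] \\
  &\propto p(\theta_X \mid x_1, \dots, x_N)\; p(\theta_Y \mid y_1, \dots, y_N)
\end{align*}
```

The second line uses parameter independence, $p(\theta_X, \theta_Y) = p(\theta_X)\,p(\theta_Y)$, so each factor is an ordinary thumbtack posterior.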

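A bit more difficult...

[Figure: X and Y, each heads/tails, with an arc from X to Y]

Three probabilities to learn:

- $\theta_{X=\mathrm{heads}}$
- $\theta_{Y=\mathrm{heads}|X=\mathrm{heads}}$
- $\theta_{Y=\mathrm{heads}|X=\mathrm{tails}}$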

A bit more difficult...

[Figure: parameters $\Theta_X$, $\Theta_{Y|X=\mathrm{heads}}$ and $\Theta_{Y|X=\mathrm{tails}}$ with cases (X1, Y1) and (X2, Y2); in the example, X1 = heads in case 1 and X2 = tails in case 2]

3 separate thumbtack-like problems
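A minimal sketch of the three separate problems, assuming complete data given as `(x, y)` pairs and an illustrative Beta(1, 1) prior on each parameter. The function name `estimate_parameters` and the string labels are hypothetical and not taken from the slides.

```python
def estimate_parameters(cases, alpha_h=1.0, alpha_t=1.0):
    """Posterior-mean estimates for the net X -> Y from complete (x, y) cases.

    Each value is "heads" or "tails". A Beta(alpha_h, alpha_t) prior is
    assumed for each of the three parameters (an illustrative choice).
    """
    def beta_mean(outcomes):
        # One thumbtack problem: prior counts plus observed counts.
        heads = sum(1 for o in outcomes if o == "heads")
        return (alpha_h + heads) / (alpha_h + alpha_t + len(outcomes))

    return {
        # Problem 1: all X values.
        "X=heads": beta_mean([x for x, _ in cases]),
        # Problem 2: Y values from the cases where X = heads.
        "Y=heads|X=heads": beta_mean([y for x, y in cases if x == "heads"]),
        # Problem 3: Y values from the cases where X = tails.
        "Y=heads|X=tails": beta_mean([y for x, y in cases if x == "tails"]),
    }
```

Each entry uses only its own slice of the data, which is what makes the three problems separate.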

In general …

Learning probabilities in a Bayes net is straightforward if

- Complete data
- Local distributions from the exponential family (binomial, Poisson, gamma, ...)
- Parameter independence
- Conjugate priors

Incomplete data makes parameters dependent

[Figure: parameters $\Theta_X$, $\Theta_{Y|X=\mathrm{heads}}$ and $\Theta_{Y|X=\mathrm{tails}}$ with cases (X1, Y1) and (X2, Y2)]

Solution: Use EM

- Initialize parameters ignoring missing data
- E step: Infer missing values using current parameters
- M step: Estimate parameters using completed data
- Can also use gradient descent
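The EM steps above can be sketched for the two-node net X to Y. This is only an illustration under several assumptions not stated on the slide: values are coded 1 = heads and 0 = tails, only X may be missing (`None`), the E step uses expected (soft) values rather than hard imputations, and the initial estimates add one imaginary count per outcome. The name `em_two_node` is hypothetical.

```python
def em_two_node(cases, n_iter=50):
    """EM for the net X -> Y with binary values (1 = heads, 0 = tails).

    `cases` is a list of (x, y) pairs; x may be None (missing), y is observed.
    Returns (p_x, p_y_given_x1, p_y_given_x0).
    """
    # Initialization: use the complete cases only, with one imaginary count
    # per outcome so that no estimate starts at exactly 0 or 1 (assumption).
    complete = [(x, y) for x, y in cases if x is not None]
    n1 = sum(x for x, _ in complete)
    n0 = len(complete) - n1
    p_x = (n1 + 1) / (len(complete) + 2)
    p_y1 = (sum(y for x, y in complete if x == 1) + 1) / (n1 + 2)
    p_y0 = (sum(y for x, y in complete if x == 0) + 1) / (n0 + 2)

    for _ in range(n_iter):
        # E step: w = P(X = 1 | y) for each case under the current parameters.
        weights = []
        for x, y in cases:
            if x is not None:
                weights.append(float(x))  # observed X needs no inference
                continue
            like1 = p_y1 if y == 1 else 1.0 - p_y1
            like0 = p_y0 if y == 1 else 1.0 - p_y0
            num = p_x * like1
            weights.append(num / (num + (1.0 - p_x) * like0))

        # M step: maximum-likelihood estimates from the expected counts.
        w1 = sum(weights)
        w0 = len(cases) - w1
        p_x = w1 / len(cases)
        if w1 > 0:
            p_y1 = sum(w * y for w, (_, y) in zip(weights, cases)) / w1
        if w0 > 0:
            p_y0 = sum((1.0 - w) * y for w, (_, y) in zip(weights, cases)) / w0
    return p_x, p_y1, p_y0
```

With no missing values the E step changes nothing and the first M step already returns the usual frequency estimates.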

Bayesian approach: Model averaging

Given data, which model is correct? Or rather, which is more likely?

[Figure: data d; model 1 and model 2, each a structure over X and Y]

Average predictions
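In symbols, model averaging weights each model's prediction by its posterior probability. The following is the standard Bayesian formulation, written as a sketch; the notation ($m$ for a model, $d$ for the data, $x_{N+1}$ for the next case) is assumed here rather than taken from the slide.

```latex
\begin{align*}
p(m \mid d) &= \frac{p(m)\, p(d \mid m)}{\sum_{m'} p(m')\, p(d \mid m')} \\
p(x_{N+1} \mid d) &= \sum_{m} p(m \mid d)\; p(x_{N+1} \mid d, m)
\end{align*}
```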

Bayesian approach: Model selection

Given data, which model is correct? Or rather, which is more likely?

[Figure: data d; model 1 and model 2, each a structure over X and Y]

Keep the best model:

- Explanation

- Understanding

- Tractability

More complicated graphs

[Figure: X and Y, each heads/tails, with an arc from X to Y]

3 separate thumbtack-like learning problems: X, Y|X=heads, Y|X=tails

Model score for a discrete Bayes net

Computation of marginal likelihood

Efficient closed form if

- Local distributions from the exponential family (binomial, Poisson, gamma, ...)
- Parameter independence
- Conjugate priors
- No missing data (including no hidden variables)
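The formula on the original slide is not preserved in this transcript. As a stand-in, here is a sketch of one common closed form for discrete variables, the Bayesian-Dirichlet marginal likelihood with a uniform Dirichlet prior. The function name, the data layout, and the default `alpha=1.0` are assumptions made for illustration.

```python
from collections import Counter
from math import lgamma

def log_marginal_likelihood(data, structure, arity, alpha=1.0):
    """log p(data | structure) for a discrete Bayes net with complete data.

    data: list of dicts mapping variable name -> state index (0..arity-1)
    structure: dict mapping variable name -> tuple of parent names
    arity: dict mapping variable name -> number of states
    alpha: Dirichlet hyperparameter (imaginary count) per state; an assumption
    """
    total = 0.0
    for var, parents in structure.items():
        r = arity[var]
        # Count each (parent configuration, state) pair for this variable.
        counts = Counter((tuple(row[p] for p in parents), row[var]) for row in data)
        # Unobserved parent configurations contribute a factor of 1 (log 0).
        for cfg in {cfg for cfg, _ in counts}:
            n_j = sum(counts[(cfg, k)] for k in range(r))
            total += lgamma(r * alpha) - lgamma(r * alpha + n_j)
            for k in range(r):
                total += lgamma(alpha + counts[(cfg, k)]) - lgamma(alpha)
    return total
```

The score is a sum of independent terms, one per variable and parent configuration, which is why the conditions listed above give an efficient closed form.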

Structure search

- Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)
- Heuristic methods
  - Greedy
  - Greedy with restarts
  - MCMC methods

[Figure: greedy search flowchart. Given a structure, score all possible single changes; if any changes are better, perform the best change and repeat; if not, return the saved structure]
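The greedy loop can be sketched as follows. Two things here are assumptions rather than content from the slides: a "single change" is taken to be one arc addition, deletion, or reversal, and `score` is a hypothetical caller-supplied function that maps a set of arcs to a number (for example a log marginal likelihood plus a log structure prior).

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    frontier = [n for n in nodes if indegree[n] == 0]
    seen = 0
    while frontier:
        node = frontier.pop()
        seen += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    frontier.append(child)
    return seen == len(nodes)

def neighbors(nodes, edges):
    """Yield every acyclic structure one arc change away from `edges`."""
    for a, b in permutations(nodes, 2):
        if (a, b) in edges:
            yield edges - {(a, b)}                      # delete arc
            flipped = (edges - {(a, b)}) | {(b, a)}     # reverse arc
            if is_acyclic(nodes, flipped):
                yield flipped
        elif (b, a) not in edges:
            added = edges | {(a, b)}                    # add arc
            if is_acyclic(nodes, added):
                yield added

def greedy_search(nodes, score, start=frozenset()):
    """Hill-climb from `start`; arcs are (parent, child) pairs."""
    current = frozenset(start)
    current_score = score(current)
    while True:
        best, best_score = None, current_score
        for candidate in neighbors(nodes, current):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        if best is None:            # no single change is better
            return current, current_score
        current, current_score = best, best_score
```

Because the score must strictly increase at every step, the loop terminates, but it may stop at a local maximum; restarts from different initial structures are one way to mitigate that.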

Structure priors

1. All possible structures equally likely

2. Partial ordering, required / prohibited arcs

3. Prior(m) ∝ Similarity(m, prior BN)

Parameter priors

- All uniform: Beta(1,1)
- Use a prior Bayes net

Parameter priors

Recall the intuition behind the Beta prior for the thumbtack:

- The hyperparameters $\alpha_h$ and $\alpha_t$ can be thought of as imaginary counts from our prior experience, starting from "pure ignorance"
- Equivalent sample size = $\alpha_h + \alpha_t$
- The larger the equivalent sample size, the more confident we are about the long-run fraction
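A small numeric sketch of the imaginary-count intuition, using the standard posterior mean of a Beta prior after observing heads and tails. The function name `posterior_heads` is hypothetical.

```python
def posterior_heads(alpha_h, alpha_t, heads, tails):
    """Posterior mean of P(heads) under a Beta(alpha_h, alpha_t) prior."""
    # Imaginary counts and observed counts are simply added together.
    return (alpha_h + heads) / (alpha_h + alpha_t + heads + tails)

# Same data (3 heads, 1 tail), two equivalent sample sizes:
weak = posterior_heads(1, 1, heads=3, tails=1)      # equivalent sample size 2
strong = posterior_heads(50, 50, heads=3, tails=1)  # equivalent sample size 100
```

With the small equivalent sample size the estimate moves most of the way toward the observed fraction of 0.75; with the large one it stays close to the prior value of 0.5.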

Parameter priors

[Figure: prior network over x1, ..., x9]

- Prior network + equivalent sample size give an imaginary count for any variable configuration
- Imaginary counts + parameter modularity give parameter priors for any Bayes net structure for X1…Xn
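One way to read "imaginary count for any variable configuration" is the construction used for BDe-style priors: multiply the equivalent sample size by the probability that the prior network assigns to the configuration. The sketch below assumes that reading. The function name is hypothetical, and the explicit joint table is a stand-in for the prior network, which is workable only for tiny examples.

```python
from collections import defaultdict

def imaginary_counts(prior_joint, ess, child, parents):
    """Imaginary counts for `child` given `parents`: ess * P(child, parents).

    prior_joint: list of (configuration dict, probability) pairs summing to 1,
                 standing in for the joint distribution of a prior network.
    ess: equivalent sample size.
    Returns a dict keyed by (parent configuration tuple, child value).
    """
    counts = defaultdict(float)
    for config, prob in prior_joint:
        key = (tuple(config[p] for p in parents), config[child])
        counts[key] += ess * prob  # marginalizes out all other variables
    return dict(counts)
```

Because the counts are computed from one joint distribution, the same prior network and equivalent sample size yield priors for a variable under any choice of parents.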

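Combining knowledge & data

Prior network + equivalent sample size + data give improved network(s)

| x1 | x2 | x3 | ... |
|---|---|---|---|
| true | false | true | ... |
| false | false | true | ... |
| false | false | false | ... |
| true | true | false | ... |
| ... | ... | ... | ... |

[Figure: prior network and improved network(s), each over x1, ..., x9]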
