Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks
Presentation Transcript

Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks

~ Thesis Defense ~

Stefan Niculescu

Carnegie Mellon University, July 2005

Thesis Committee:

Tom Mitchell (Chair)

John Lafferty

Andrew Moore

Bharat Rao (Siemens Medical Solutions)


Domain Knowledge

  • In the real world, data is often too sparse to allow building an accurate model

  • Domain knowledge can help alleviate this problem

  • Several types of domain knowledge:

    • Relevance of variables (feature selection)

    • Conditional Independences among variables

    • Parameter Domain Knowledge


Parameter Domain Knowledge

  • In a Bayesian Network for a real-world domain:

    • there can be a huge number of parameters

    • there is often not enough data to estimate them accurately

  • Parameter Domain Knowledge constraints:

    • reduce the space of feasible parameters

    • reduce the variance of parameter estimates


Parameter Domain Knowledge

Examples:

  • DK: “If a person has a Family History of Heart Attack, Race and Pollution are not significant factors for the probability of getting a Heart Attack.”

  • DK: “Two voxels in the brain may exhibit the same activation patterns during a cognitive task, but with different amplitudes.”

  • DK: “Two countries may have different Heart Disease rates, but the relative proportion of Heart Attack to CHF is the same.”

  • DK: “The aggregate probability of Adverbs in English is less than the aggregate probability of Verbs”.


Thesis

Standard methods for performing parameter estimation in Bayesian Networks can be naturally extended to take advantage of parameter domain knowledge that can be provided by a domain expert. These new learning algorithms perform better (in terms of probability density estimation) than existing ones.


Outline

  • Motivation

  • Parameter Domain Knowledge Framework

  • Simple Parameter Sharing

  • Parameter Sharing in Hidden Process Models

  • Types of Parameter Domain Knowledge

  • Related Work

  • Summary / Future Work


Parameter Domain Knowledge Framework~ Domain Knowledge Constraints ~


Parameter Domain Knowledge Framework~ Frequentist Approach, Complete Data ~


Parameter Domain Knowledge Framework~ Frequentist Approach, Complete Data ~


Parameter Domain Knowledge Framework~ Frequentist Approach, Incomplete Data ~

EM Algorithm. Repeat until convergence:


Parameter Domain Knowledge Framework~ Frequentist Approach, Incomplete Data ~~ Discrete Variables ~

EM Algorithm. Repeat until convergence:
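In compact form, the EM iteration under domain-knowledge constraints (with C denoting the feasible set of parameter vectors defined by the constraints) is:

```latex
\theta^{(t+1)} = \arg\max_{\theta \in C} \; \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\!\left[ \log P(X, Z \mid \theta) \right]
```

For discrete variables, the E-step computes expected counts under the current parameters θ(t), and the M-step maximizes the expected complete-data log-likelihood subject to the constraints.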




Parameter Domain Knowledge Framework~ Computing the Normalization Constant ~


Parameter Domain Knowledge Framework~ Computing the Normalization Constant ~

[Slide figure: worked example of computing the normalization constant; in H7, ε = 0.5]


Outline

  • Motivation

  • Parameter Domain Knowledge Framework

  • Simple Parameter Sharing

  • Parameter Sharing in Hidden Process Models

  • Types of Parameter Domain Knowledge

  • Related Work

  • Summary / Future Work


Simple Parameter Sharing~ Maximum Likelihood Estimators ~

Cubical die, cut symmetrically at each corner: the k1 = 6 square faces share one parameter and the k2 = 8 corner faces share another (each shared parameter θi appears in ki places).

Theorem. The Maximum Likelihood parameters are given by:

θ̂i = Ni / (ki Σj Nj)

where Ni is the aggregate observed count over the ki places sharing θi (and Σj Nj is the total count).
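The theorem can be checked numerically. A minimal sketch, with hypothetical roll counts (300 aggregate square-face rolls, 100 aggregate corner rolls):

```python
# ML estimation under simple parameter sharing: parameter theta_i is shared
# across k_i outcomes; its ML estimate is the aggregate count of those
# outcomes divided by (k_i * total count).
def shared_mle(counts, k):
    """counts[i]: aggregate observed count for group i (summed over its
    k[i] shared outcomes); k[i]: number of outcomes sharing parameter i."""
    n = sum(counts)
    return [c / (ki * n) for c, ki in zip(counts, k)]

# Die cut symmetrically at each corner: 6 square faces share one parameter,
# 8 corner faces share another (counts here are hypothetical).
theta = shared_mle([300, 100], [6, 8])
# each square face: 300/(6*400) = 0.125; each corner face: 100/(8*400) = 0.03125
# sanity check: 6*0.125 + 8*0.03125 = 1.0
```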


Simple Parameter Sharing~ Dependent Dirichlet Priors ~


Simple Parameter Sharing~ Variance Reduction in Parameter Estimates ~


Simple Parameter Sharing~ Experiments – Learning a Probability Distribution ~

  • Synthetic Dataset:

    • Probability distribution over 50 values

    • 50 randomly generated parameters:

      • 6 parameters each shared between 2 and 5 times, so that shared parameters account for about half of the values

      • The rest “not shared” (shared exactly once)

    • 1000 examples sampled from this distribution

    • Purpose:

      • Domain Knowledge readily available

      • To be able to study the effect of training set size (up to 1000)

      • To be able to compare our estimated distribution to the true distribution

  • Models:

    • STBN (Standard Bayesian Network)

    • PDKBN (Bayesian Network with PDK)
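The comparison can be sketched on a much smaller hypothetical distribution than the 50-value one above: pooling counts across shared parameters reduces the variance of the estimate when data is scarce.

```python
import math
import random

def kl(p, q):
    """KL divergence KL(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mle_standard(samples, m):
    """STBN-style estimate: plain empirical frequencies (lightly smoothed
    so the KL divergence stays finite)."""
    counts = [samples.count(i) + 1e-6 for i in range(m)]
    n = sum(counts)
    return [c / n for c in counts]

def mle_shared(samples, m, groups):
    """PDKBN-style estimate: counts pooled within each group of shared
    parameters, then split evenly across the group's members."""
    counts = [samples.count(i) + 1e-6 for i in range(m)]
    n = sum(counts)
    est = [0.0] * m
    for g in groups:
        shared = sum(counts[i] for i in g) / (len(g) * n)
        for i in g:
            est[i] = shared
    return est

random.seed(0)
true_p = [0.15, 0.15, 0.15, 0.15, 0.4]   # hypothetical: outcomes 0-3 share one parameter
samples = random.choices(range(5), weights=true_p, k=30)
kl_std = kl(true_p, mle_standard(samples, 5))
kl_shr = kl(true_p, mle_shared(samples, 5, [[0, 1, 2, 3], [4]]))
# on a small sample, the shared estimator typically lands closer to true_p in KL
```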


Experimental Results

  • PDKBN performs better than STBN

    • Largest difference: 0.05 (at 30 examples)

  • On average, STBN needs 1.86 times more examples to catch up in KL divergence!

    • 40 (PDKBN) ~ 103 (STBN)

    • 200 (PDKBN) ~ 516 (STBN)

    • 650 (PDKBN) ~ >1000 (STBN)

  • The difference between PDKBN and STBN shrinks as the training set grows, but PDKBN is much better when training data is scarce.


Outline

  • Motivation

  • Parameter Domain Knowledge Framework

  • Simple Parameter Sharing

  • Parameter Sharing in Hidden Process Models

  • Types of Parameter Domain Knowledge

  • Related Work

  • Summary / Future Work


Hidden Process Models

One observation (trial):

N different trials:

All trials and all Processes have equal length T
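A minimal generative sketch of one trial (the process signatures, start times, and amplitude coefficients below are hypothetical): the observed series is the sum of each process's response, shifted to its start time and scaled by a per-process coefficient, plus Gaussian noise.

```python
import random

def hpm_generate(processes, starts, coeffs, T, sigma, rng):
    """One trial of a Hidden Process Model: sum the process response
    signatures, each shifted to its start time and scaled by its
    coefficient, then add Gaussian noise.

    processes: list of response signatures (lists of floats)
    starts: start time of each process within the trial
    coeffs: amplitude coefficient of each process
    """
    y = [0.0] * T
    for sig, s, c in zip(processes, starts, coeffs):
        for t, val in enumerate(sig):
            if s + t < T:
                y[s + t] += c * val
    return [v + rng.gauss(0.0, sigma) for v in y]

rng = random.Random(0)
# hypothetical "Sentence" and "Picture" processes over T = 8 time slices
trial = hpm_generate([[1.0, 2.0, 1.0], [0.5, 0.5]], [0, 4], [1.0, 2.0], 8, 0.0, rng)
# with sigma = 0 the noiseless trial is [1.0, 2.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
```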


Parameter Sharing in HPMs

  • similar shape activity

  • different amplitudes



Parameter Sharing in HPMs~ Maximum Likelihood Estimation ~

  • l′(P, C) is quadratic in (P, C) jointly, but:

    • linear in P for fixed C!

    • linear in C for fixed P!
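Because each conditional problem is linear, it has a closed-form least-squares solution, which suggests alternating updates. A toy sketch with a shared signature P and per-trial amplitudes C (all data hypothetical and noiseless; only the products c_i·p_t are identified, so the amplitude ratio is the meaningful quantity):

```python
def solve_P(Y, C):
    # least squares for the shared signature P with amplitudes C held fixed
    T = len(Y[0])
    denom = sum(c * c for c in C)
    return [sum(C[i] * Y[i][t] for i in range(len(Y))) / denom for t in range(T)]

def solve_C(Y, P):
    # least squares for each trial's amplitude with the signature P held fixed
    pp = sum(p * p for p in P)
    return [sum(y_t * p_t for y_t, p_t in zip(y, P)) / pp for y in Y]

def alternating_fit(Y, C0, iters=20):
    """Alternate the two closed-form updates; each step cannot increase
    the squared error, so the procedure converges."""
    C = C0
    for _ in range(iters):
        P = solve_P(Y, C)
        C = solve_C(Y, P)
    return P, C

# noiseless toy: two trials of the same signature, at amplitudes 1 and 2
Y = [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0]]
P, C = alternating_fit(Y, [1.0, 1.0])
# C[1]/C[0] recovers the amplitude ratio 2, and C[i]*P reconstructs each trial
```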


Parameter Sharing in HPMs~ Maximum Likelihood Estimation ~


StarPlus Dataset

Trial:

  • read sentence

  • view picture

  • answer whether the sentence describes the picture

40 trials, 32 time slices each (2 per second)

  • picture presented first in half of the trials, sentence first in the other half

Three possible objects: star, dollar, plus

Collected by Just et al.

IDEA: model using HPMs with two processes: “Sentence” and “Picture”

We assume a process starts when its stimulus is presented

Will use shared HPMs where possible




[Slide figure: example stimulus objects shown to subjects – plus, dollar, star]


Parameter Sharing in HPMs~ Hierarchical Partitioning Algorithm ~


Parameter Sharing in HPMs~ Experiments ~

  • We compare three models, based on average (per-trial) likelihood:

    • StHPM – Standard, per-voxel HPM

    • ShHPM – One HPM for all voxels in an ROI (24 total)

    • HieHPM – Hierarchical HPM

  • Effect of training set size (6 to 40) in CALC:

    • ShHPM biased here

      • Better than StHPM at small sample size

      • Worse at 40 examples

    • HieHPM – the best

      • It can represent both models

      • e^106 times better data likelihood than StHPM at 40 examples

      • StHPM needs 2.9 times more examples to catch up


Parameter Sharing in HPMs~ Experiments ~

Performance over whole brain (40 examples):

  • HieHPM – the best

    • e^1792 times better data likelihood than StHPM

    • Better than StHPM in 23/24 ROIs

    • Better than ShHPM in 12/24 ROIs, equal in 11/24

  • ShHPM – second best

    • e^464 times better data likelihood than StHPM

    • Better than StHPM in 18/24 ROIs

    • It is biased, but it makes sense to share whole ROIs that are not involved in the cognitive task


Learned Voxel Clusters

  • In the whole brain:

    • ~ 300 clusters

    • ~ 15 voxels / cluster

  • In CALC:

    • ~ 60 clusters

    • ~ 5 voxels / cluster



Outline

  • Motivation

  • Parameter Domain Knowledge Framework

  • Simple Parameter Sharing

  • Parameter Sharing in Hidden Process Models

  • Types of Parameter Domain Knowledge

  • Related Work

  • Summary / Future Work


Parameter Domain Knowledge Types

  • DISCRETE:

    • Known Parameter Values

    • Parameter Sharing and Proportionality Constants – One Distribution

    • Sum Sharing and Ratio Sharing – One Distribution

    • Parameter Sharing and Hierarchical Sharing – Multiple Distributions

    • Sum Sharing and Ratio Sharing – Multiple Distributions

  • CONTINUOUS (Gaussian Distributions):

    • Parameter Sharing and Proportionality Constants – One Distribution

    • Parameter Sharing in Hidden Process Models

  • INEQUALITY CONSTRAINTS:

    • Between Sums of Parameters – One Distribution

    • Upper Bounds on Sums of Parameters – One Distribution


Probability Ratio Sharing

  • Want to model P(Word|Language)

  • Two languages: English, Spanish

  • Different sets of words

  • Domain Knowledge:

    • Word groups:

      • About computers: computer, keyboard, monitor, etc.

    • Relative frequency of “computer” to “keyboard” is the same in both languages

      • Aggregate mass can be different

T1: Computer Words

T2: Business Words


Probability Ratio Sharing

DK: Parameters of a given color preserve their relative ratios across all distributions!
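A sketch of the corresponding ML estimate under this constraint (my reading of it, with hypothetical counts): within each group, the outcome ratios come from counts pooled across distributions, while each distribution keeps its own aggregate group mass.

```python
def ratio_sharing_mle(counts, groups):
    """ML sketch under probability ratio sharing: within each group, outcome
    ratios are shared across all distributions, while each distribution
    keeps its own aggregate group mass.

    counts: list of per-distribution count lists (same outcome indexing)
    groups: partition of outcome indices into ratio-sharing groups
    """
    m = len(counts[0])
    est = [[0.0] * m for _ in counts]
    pooled = [sum(c[i] for c in counts) for i in range(m)]  # counts pooled over distributions
    for d, c in enumerate(counts):
        n_d = sum(c)
        for g in groups:
            mass = sum(c[i] for i in g) / n_d            # this distribution's group mass
            pooled_g = sum(pooled[i] for i in g)
            for i in g:
                est[d][i] = mass * pooled[i] / pooled_g  # shared within-group ratio
    return est

# hypothetical counts for [computer, keyboard, other] in English and Spanish;
# outcomes 0 and 1 form the ratio-sharing group
est = ratio_sharing_mle([[30, 10, 60], [4, 4, 92]], [[0, 1], [2]])
# the within-group ratio (34:14, pooled) is shared across both distributions,
# while the group masses (0.4 vs 0.08) stay distribution-specific
```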




Inequalities between Sums of Parameters

  • In spoken language:

    • Each Adverb comes along with a Verb

    • Each Adjective comes with a Noun or Pronoun

  • Therefore it is reasonable to expect that:

    • The frequency of Adverbs is less than that of Verbs

    • The frequency of Adjectives is less than that of Nouns and Pronouns

  • In general, within the same distribution:


Outline

  • Motivation

  • Parameter Domain Knowledge Framework

  • Simple Parameter Sharing

  • Parameter Sharing in Hidden Process Models

  • Types of Parameter Domain Knowledge

  • Related Work

  • Summary / Future Work


Dirichlet Priors in a Bayes Net

[Slide figure: a Dirichlet prior, summarized by a prior belief and a spread]

  • The Domain Expert specifies an assignment of parameters.

    • The prior leaves room for some error (variance).

  • Several types:

    • Standard

    • Dirichlet Tree Priors

    • Dependent Dirichlet


Markov Models

[Slide figure: a Markov chain, in which the same transition parameters are reused at every time step]


Module Networks

  • In a Module:

  • Same parents

  • Same CPTs

Image from “Learning Module Networks” by Eran Segal and Daphne Koller


Context Specific Independence

[Slide figure: context-specific independence example over Burglary, Set, and Alarm]


Limitations of Current Models

  • Dirichlet priors

    • When the number of parameters is huge, specifying a useful prior is difficult

    • Unable to enforce even simple constraints:

      • Need additional hyperparameters to enforce basic parameter sharing, but no closed-form MAP estimates can be computed!

    • Dependent Dirichlet Priors are not conjugate priors

      • Our priors are dependent and also conjugate!

  • Markov Models, Module Networks and CSI

    • Particular cases of our Parameter Sharing DK

    • Do not allow sharing at parameter level of granularity


Outline

  • Motivation

  • Parameter Domain Knowledge Framework

  • Simple Parameter Sharing

  • Parameter Sharing in Hidden Process Models

  • Types of Parameter Domain Knowledge

  • Related Work

  • Summary / Future Work


Summary

  • Parameter Related Domain Knowledge is needed when data is scarce

    • Reduces the number of free parameters

    • Reduces the variance in parameter estimates (illustrated on Simple Parameter Sharing)

  • Developed unified Parameter Domain Knowledge Framework

    • From both a frequentist and Bayesian point of view

    • For both complete and incomplete data

  • Developed efficient learning algorithms for several types of PDK:

    • Closed form solutions for most of these types

    • For both discrete and continuous variables

    • For both equality and inequality constraints

    • Particular cases of our parameter sharing framework:

      • Markov Models, Module Nets, Context Specific Independence

  • Developed method of automatically learning the domain knowledge (illustrated on HPMs)

  • Experiments show the superiority of models using PDK


Future Work

  • Interactions among different types of Parameter Domain Knowledge

  • Incorporate Parameter Domain Knowledge in Structure Learning

  • Hard vs. Soft constraints

  • Parameter Domain Knowledge for learning Undirected Graphical Models