Introduction to Bayesian Belief Nets

Russ Greiner

Dep’t of Computing Science

Alberta Ingenuity Centre for Machine Learning

University of Alberta

http://www.cs.ualberta.ca/~greiner/bn.html

(Timeline slide: 1980 … 1990 … 1996 …)

Motivation
  • Gates says [LATimes, 28/Oct/96]:

Microsoft’s competitive advantage is its expertise in “Bayesian networks”

  • Current Products
    • Microsoft Pregnancy and Child Care (MSN)
    • Answer Wizard (Office, …)
    • Print Troubleshooter
    • Excel Workbook Troubleshooter
    • Office 95 Setup Media Troubleshooter
    • Windows NT 4.0 Video Troubleshooter
    • Word Mail Merge Troubleshooter

Motivation (II)
  • US Army: SAIP (Battalion Detection from SAR, IR … Gulf War)
  • NASA: Vista (DSS for Space Shuttle)
  • GE: Gems (real-time monitor for utility generators)
  • Intel: (infer possible processing problems from end-of-line tests on semiconductor chips)
  • KIC:
    • medical: sleep disorders, pathology, trauma care, hand and wrist evaluations, dermatology, home-based health evaluations
    • DSS for capital equipment: locomotives, gas-turbine engines, office equipment
Motivation (III)
  • Lymph-node pathology diagnosis
  • Manufacturing control
  • Software diagnosis
  • Information retrieval
  • Types of tasks
    • Classification/Regression
    • Sensor Fusion
    • Prediction/Forecasting
Outline
  • Existing uses of Belief Nets (BNs)
  • How to reason with BNs
  • Specific Examples of BNs
  • Contrast with Rules, Neural Nets, …
  • Possible applications of BNs
  • Challenges
    • How to reason efficiently
    • How to learn BNs
(Diagram — a patient visit: Symptoms (“blah blah ouch yak …”) → chief complaint, history, …; Signs → physical exam, test results, …; Plan → diagnosis, treatment, …)

Objectives: Decision Support System
  • Determine
    • which tests to perform
    • which repair to suggest

based on costs, sensitivity/specificity, …

  • Use all sources of information
    • symbolic (discrete observations, history, …)
    • signal(from sensors)
  • Handle partial information
  • Adapt to track fault distribution
Underlying Task
  • Situation: Given observations {O1=v1, …, Ok=vk}

(symptoms, history, test results, …)

what is the best DIAGNOSIS Dxi for the patient?

  • Seldom Completely Certain
  • Approach 1: Use a set of “obs1 & … & obsm → Dxi” rules

but… need a rule for each situation:

  • for each diagnosis Dxi
  • for each set of possible values vj for Oj
  • for each subset of observations {Ox1, Ox2, …} ⊆ {Oj}
    • Can’t use

If Temp>100 & BP = High & Cough = Yes → DiseaseX

if we only know Temp and BP
Underlying Task, II
  • Situation: Given observations {O1=v1, …, Ok=vk}

(symptoms, history, test results, …)

what is the best DIAGNOSIS Dxi for the patient?

  • Approach 2: Compute the probability of each Dxi, given the observations { obsj }:

P( Dx = u | O1= v1, …, Ok= vk )

  • Challenge: How to express these Probabilities?
How to deal with Probabilities

  • Sufficient: the “atomic events”, for all 2^(1+N) values u ∈ {T, F}, vj ∈ {T, F}:

P( Dx=T, O1=T, O2=T, …, ON=T ) = 0.03
P( Dx=T, O1=T, O2=T, …, ON=F ) = 0.4
…
P( Dx=T, O1=F, O2=F, …, ON=T ) = 0
…
P( Dx=F, O1=F, O2=F, …, ON=F ) = 0.01

  • Then Marginalize:

P( Dx=u, O1=v1, …, Ok=vk ) = Σ_{v(k+1), …, vN} P( Dx=u, O1=v1, …, Ok=vk, …, ON=vN )

and Conditionalize:

P( Dx=u | O1=v1, …, Ok=vk ) = P( Dx=u, O1=v1, …, Ok=vk ) / P( O1=v1, …, Ok=vk )

  • But… even with a binary Dx and 20 binary observations, that is >2,097,000 numbers!
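The Marginalize/Conditionalize recipe above can be sketched on a tiny joint table (one binary Dx, two binary observations; the probabilities below are illustrative, not taken from the slides):

```python
# Full joint over (Dx, O1, O2): 2^(1+2) = 8 "atomic events".
# (Illustrative numbers; they sum to 1.)
joint = {
    (1, 1, 1): 0.06, (1, 1, 0): 0.14, (1, 0, 1): 0.02, (1, 0, 0): 0.08,
    (0, 1, 1): 0.04, (0, 1, 0): 0.16, (0, 0, 1): 0.10, (0, 0, 0): 0.40,
}

def marginal(dx, o1):
    """Marginalize: P(Dx=dx, O1=o1) = sum over the unobserved O2."""
    return sum(joint[(dx, o1, o2)] for o2 in (0, 1))

def conditional(dx, o1):
    """Conditionalize: P(Dx=dx | O1=o1) = P(Dx=dx, O1=o1) / P(O1=o1)."""
    return marginal(dx, o1) / sum(marginal(d, o1) for d in (0, 1))

print(conditional(1, 1))  # P(Dx=1 | O1=1) = 0.20 / 0.40 = 0.5
```

With N observations the table needs 2^(1+N) entries, which is exactly the blow-up the slide complains about.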
Problems with “Atomic Events”

  • Too many numbers – O(2^N)
    • Hard to store
    • Hard to use
      • [Must add 2^r values to marginalize r variables]
    • Hard to learn
      • [Takes O(2^N) samples to learn 2^N parameters]

 ⇒ Include only necessary “connections”

  • Representation is not intuitive

 ⇒ Should make “connections” explicit, using “local information”:

P(Jaundice | Hepatitis), P(LightDim | BadBattery), …

 ⇒ Belief Nets
(Diagram — “Hepatitis?”: the patient is not Jaundiced but has a positive BloodTest)
Hepatitis Example
  • (Boolean) Variables:
    • H  Hepatitis
    • J  Jaundice
    • B  (positive) Blood test
  • Want P( H=1 | J=0, B=1 )
  • …, P(H=1 | B=1, J=1), P(H=1 | B=0, J=0), …

J  B  H   P(J, B, H)
0  0  0   0.03395
0  0  1   0.0095
0  1  0   0.0003
0  1  1   0.1805
1  0  0   0.01455
1  0  1   0.038
1  1  0   0.00045
1  1  1   0.722

Option 1:

… Marginalize/Conditionalize, to get P( H=1 | J=0, B=1 ) …

  • Alternatively…
Encoding Causal Links

P(H=1)

P(H=0)

H

0.05

0.95

h

P(B=1 | H=h)

P(B=0 | H=h)

B

1

0.95

0.05

0

0.03

0.97

h

b

P(J=1|h,b)

P(J=0|h,b)

J

1

1

0.8

0.2

1

0

0.8

0.2

0

1

0.3

0.7

0

0

0.3

0.7

  • Simple Belief Net:
  • Node ~ Variable

Link ~ “Causal dependency”

  • “CPTable” ~ P(child | parents)
Encoding Causal Links

  • P( J | H, B=0 ) = P( J | H, B=1 ) ∀ J, H ⇒ P( J | H, B ) = P( J | H )
  • J is INDEPENDENT of B, once we know H
  • Don’t need the B → J arc!
Sufficient Belief Net

(H → B, H → J)

Hence: P( H=1 | B=1, J=0 ) ∝ P(H=1) · P(B=1 | H=1) · P(J=0 | B=1, H=1) = P(H=1) · P(B=1 | H=1) · P(J=0 | H=1)

  • Requires:
    • P(H=1) known
    • P(J=1 | H=1), P(J=1 | H=0) known
    • P(B=1 | H=1), P(B=1 | H=0) known
    • (Only 5 parameters, not 7)
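A minimal sketch of this computation in code, plugging in the CPT values from the earlier Encoding Causal Links tables (P(H=1)=0.05; P(B=1|H=1)=0.95, P(B=1|H=0)=0.03; P(J=1|H=1)=0.8, P(J=1|H=0)=0.3):

```python
# The factored joint: P(H, B, J) = P(H) * P(B | H) * P(J | H).
p_H = {1: 0.05, 0: 0.95}    # P(H=h)
p_B1 = {1: 0.95, 0: 0.03}   # P(B=1 | H=h)
p_J1 = {1: 0.80, 0: 0.30}   # P(J=1 | H=h); B is irrelevant once H is known

def joint(h, b, j):
    pb = p_B1[h] if b == 1 else 1 - p_B1[h]
    pj = p_J1[h] if j == 1 else 1 - p_J1[h]
    return p_H[h] * pb * pj

# P(H=1 | B=1, J=0): normalize the factored joint over the two values of H.
num = joint(1, 1, 0)
den = joint(1, 1, 0) + joint(0, 1, 0)
print(num / den)  # ~ 0.32
```

Five numbers suffice, exactly as the slide claims; the full joint is never materialized.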
“Factoring”

(H → B, H → J)

  • B does depend on J:
    • If J=1, then it is likely that H=1, and hence that B=1
  • but… ONLY THROUGH H:
    • If we know H=1, then it is likely that B=1
    • … and it doesn’t matter whether J=1 or J=0!

P( J=0 | B=1, H=1 ) = P( J=0 | H=1 )

N.b., B and J ARE correlated a priori: P( J | B ) ≠ P( J )

GIVEN H, they become uncorrelated: P( J | B, H ) = P( J | H )

Factored Distribution

H  Hepatitis
J  Jaundice
B  (positive) Blood test

  • Symptoms independent, given Disease:

P( B | J ) ≠ P( B )  but
P( B | J, H ) = P( B | H )

(Age → ShoeSize, Age → Reading)

  • ReadingAbility and ShoeSize are dependent:

P( ReadAbility | ShoeSize ) ≠ P( ReadAbility )

  • but become independent, given Age:
  • P( ReadAbility | ShoeSize, Age ) = P( ReadAbility | Age )
“Naïve Bayes”

(H → O1, O2, …, On)

  • Given

P( H = hi )
P( Oj = vj | H = hi )

Independent: P( Oj | H, Ok, … ) = P( Oj | H )

  • Classification Task:

Given { O1 = v1, …, On = vn },
find the hi that maximizes P( H = hi | O1 = v1, …, On = vn )

  • I.e., find argmax_{hi} P( H = hi | O1 = v1, …, On = vn )
Naïve Bayes (con’t)
  • If k Dx’s and n Oj’s,
      • requires only k priors, n·k pairwise conditionals
    • (Not 2^(n+k) … relatively easy to learn)

(H → O1, O2, …, On)

n    1 + 2n   2^(n+1) – 1
10   21       2,047
30   61       2,147,483,647

  • Normalizing term

(No need to compute, as it is the same for all hi)

  • Easy to use for Classification
  • Can use even if some vj’s are not specified
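A hedged sketch of the Naïve Bayes classifier just described; the hypothesis names, observation names, and probabilities are invented for illustration. Note that unspecified vj’s are simply skipped, and the normalizing term is never computed:

```python
priors = {"flu": 0.3, "cold": 0.7}   # k priors, P(H = hi)
cond = {                             # n*k pairwise conditionals, P(Oj=1 | H=hi)
    "fever": {"flu": 0.9, "cold": 0.2},
    "cough": {"flu": 0.7, "cold": 0.6},
}

def classify(obs):
    """argmax_hi P(hi) * prod_j P(Oj = vj | hi), over the OBSERVED Oj only."""
    def score(h):
        s = priors[h]
        for o, v in obs.items():
            s *= cond[o][h] if v == 1 else 1 - cond[o][h]
        return s
    return max(priors, key=score)

print(classify({"fever": 1}))              # partial evidence is fine
print(classify({"fever": 0, "cough": 1}))
```

With these numbers the first query returns "flu" (0.3·0.9 beats 0.7·0.2) and the second returns "cold".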
Bigger Networks

(GeneticPH → Hepatitis; LiverTrauma → Hepatitis; Hepatitis → Jaundice; Hepatitis → Bloodtest)

  • If GeneticPH, then expect Jaundice:

GeneticPH → Hepatitis → Jaundice

  • But only via Hepatitis: GeneticPH without Hepatitis does not lead to Jaundice
    • P( J | G ) ≠ P( J )  but
    • P( J | G, H ) = P( J | H )
  • Intuition: Show CAUSAL connections:

GeneticPH CAUSES Hepatitis; Hepatitis CAUSES Jaundice
Belief Nets

(Example DAG: D → H, I → H, H → J, H → B)

  • DAG structure
    • Each node ~ a Variable v
    • v depends (only) on its parents

+ conditional probabilities: P( vi | parentsi = 0, 1, … )

  • v is INDEPENDENT of its non-descendants, given assignments to its parents
  • E.g., given H = 1:
    • D has no influence on J
    • J has no influence on B
    • etc.

Less Trivial Situations

(FHD → MS; FHD → D; MS → D)

P(FHD=1)
0.001

f   P(MS=1 | FHD=f)
1   0.10
0   0.03

f   m   P(D=1 | FHD=f, MS=m)
1   1   0.97
1   0   0.90
0   1   0.08
0   0   0.04

  • N.b., obs1 is not always independent of obs2 given H
  • E.g., FamilyHistoryDepression ‘causes’ MotherSuicide and Depression;

MotherSuicide causes Depression (with or without F.H.Depression)

  • Here, P( D | MS, FHD ) ≠ P( D | FHD )!
  • Can be done using a Belief Network, but need to specify:
    • P( FHD )           1 value
    • P( MS | FHD )      2 values
    • P( D | MS, FHD )   4 values
ALARM
  • A Logical Alarm Reduction Mechanism
    • 8 diagnoses, 16 findings, …
Extensions
  • Find best values (posterior distribution) for SEVERAL (>1) “output” variables
  • Partial specification of “input” values
    • only a subset of the variables
    • only a “distribution” over each input variable
  • General Variables
    • Discrete, but domain size > 2
    • Continuous (Gaussian: x = Σi bi yi for parents {Yi})
  • Decision Theory ⇒ Decision Nets (Influence Diagrams): making decisions, not just assigning probabilities
  • Storing P( v | p1, p2, …, pk ): general “CP Tables” are O(2^k) ⇒ Noisy-Or, Noisy-And, Noisy-Max, “Decision Trees”
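As a sketch of the Noisy-Or idea just mentioned: instead of a full 2^k CP-table for P(v=1 | p1, …, pk), store one inhibition probability per parent (the q values below are illustrative):

```python
def noisy_or(q, parents):
    """P(v=1 | parents): v stays off only if every active parent's
    influence is independently inhibited (with probability q[i])."""
    p_all_inhibited = 1.0
    for qi, active in zip(q, parents):
        if active:
            p_all_inhibited *= qi
    return 1.0 - p_all_inhibited

q = [0.2, 0.5, 0.4]            # k numbers instead of 2^k table entries
print(noisy_or(q, [1, 0, 1]))  # 1 - 0.2*0.4 = 0.92
print(noisy_or(q, [0, 0, 0]))  # no active causes -> 0.0
```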
Outline
  • Existing uses of Belief Nets (BNs)
  • How to reason with BNs
  • Specific Examples of BNs
  • Contrast with Rules, Neural Nets, …
  • Possible applications of BNs
  • Challenges
    • How to reason efficiently
    • How to learn BNs
Belief Nets vs Rules
  • Often the same nodes (representing Propositions), but

BN: Cause → Effect:  “Hep → Jaundice”, P( J | H )

Rule: Effect → Cause:  “Jaundice → Hep”

  • Both have “Locality”:

specific clusters (rules / connected nodes)

WHY? Easier for people to reason CAUSALLY, even if the use is DIAGNOSTIC

  • BNs provide an OPTIMAL way to deal with
    • + Uncertainty
    • + Vagueness (variable not given, or only a distribution)
    • + Error
    • … Signals meeting Symbols …
  • BNs permit different “direction”s of inference
Belief Nets vs Neural Nets
  • Both have “graph structure”, but

BN: nodes have SEMANTICS; combination rules: sound probability

NN: nodes are arbitrary; combination rules: arbitrary

  • So it is harder to
      • Initialize a NN
      • Explain a NN

(But perhaps easier to learn a NN from examples only?)

  • BNs can deal with
      • Partial Information
      • Different “direction”s of inference
Belief Nets vs Markov Nets
  • Technical: BNs use DIRECTED arcs
    • ⇒ allow “induced dependencies” (e.g., A → C ← B):
    • I(A, {}, B): “A independent of B, given {}”
    • ¬I(A, C, B): “A dependent on B, given C”
  • MNs use UNDIRECTED arcs
    • ⇒ allow other independencies (e.g., the cycle A–B–D–C–A):
    • I(A, BC, D): A independent of D, given B, C
    • I(B, AD, C): B independent of C, given A, D
  • Each uses its “graph structure” to FACTOR a distribution

… explicitly specifying dependencies, implicitly independencies …

  • but subtle differences…
    • BNs capture “causality”, “hierarchies”
    • MNs capture “temporality”
Uses of Belief Nets #1
  • Medical Diagnosis: “Assist/Critique” MD
    • identify diseases not ruled-out
    • specify additional tests to perform
    • suggest appropriate/cost-effective treatments
    • react to MD’s proposed treatment
  • Decision Support: Find/repair faults in complex machines

[Device, or Manufacturing Plant, or …]

… based on sensors, recorded info, history,…

  • Preventative Maintenance:

Anticipate problems in complex machines

[Device, or Manufacturing Plant, or …]

…based on sensors, statistics, recorded info, device history,…

Uses (con’t)
  • Logistics Support: Stock warehouses appropriately…based on (estimated) freq. of needs, costs,
  • Diagnose Software:Find most probable bugs, given program behavior, core dump, source code, …
  • Part Inspection/Classification:… based on multiple sensors, background, model of production,…
  • Information Retrieval: Combine information from various sources, based on info from various “agents”,…

General: Partial Information, Sensor Fusion
  - Classification    - Interpretation
  - Prediction        - …

Challenge #1: Computational Efficiency
  • For a given BN, the general problem is:
      • Given O1 = v1, …, On = vn,
      • Compute P( H | O1 = v1, …, On = vn )

(Example DAG: D → H, I → H, H → J, H → B)

  • + If the BN is a “poly tree” ⇒ efficient algorithm
  • – If the BN is a general DAG (>1 path from X to Y):
    • NP-hard in theory
    • slow in practice
  • Tricks: get an approximate answer (quickly)
    • + Use an abstraction of the BN
    • + Use an “abstraction” of the query (a range)

#2a: Obtaining Accurate BN
  • A BN encodes a distribution over n variables
  • Not O(2^n) values, but “only” Σi 2^{ki}
  • (node ni binary, with ki parents)
  • Still lots of values! … structure …
  • Qualitative Information
      • Structure: “What depends on what?”
    • Easy for people (background knowledge) ⇒ knowledge acquisition from human experts
    • But NP-hard to learn from samples…
  • Quantitative Information
    • The actual CP-tables
    • Easy to learn, given lots of examples ⇒ simple learning algorithm
    • But people have a hard time…
Notes on Learning

  • Just a Learning Algorithm: algorithms that learn both structure and values from samples
  • Mixed Sources: person provides the structure;

algorithm fills in the numbers.

  • Just a Human Expert: people produce the CP-tables, as well as the structure
    • Relatively few values really required
      • Esp. if Noisy-Or, Noisy-And, NaiveBayes, …
    • Actual values not that important
      • … sensitivity studies
My Current Work
  • Learning Belief Nets
    • Model selection:

Challenging the myth that MDL is the appropriate criterion

    • Learning “performance system”, not model
  • Validating Belief Nets

“Error bars” around answers

  • Adaptive User Interfaces
  • Efficient Vision Systems
  • Foundations of Learnability
    • Learning Active Classifiers
    • Sequential learners
  • Condition Based maintenance, Bio-signal interpretation, …
#2b: Maintaining Accurate BN
  • The world changes. Information in a BN may be
      • perfect at time t
      • sub-optimal at time t + 20
      • worthless at time t + 200
  • Need to MAINTAIN a BN over time, using an on-going human consultant
  • Adaptive BN
    • Dirichlet distributions (over variables)
    • Priors over BNs
Conclusions
  • Belief Nets are PROVEN TECHNOLOGY
    • Medical Diagnosis
    • DSS for complex machines
    • Forecasting, Modeling, InfoRetrieval…
  • Provide effective way to
    • Represent complicated, inter-related events
    • Reason about such situations
      • Diagnosis, Explanation, ValueOfInfo
      • Explain conclusions
      • Mix Symbolic and Numeric observations
  • Challenges
    • Efficient ways to use BNs
    • Ways to create BNs
    • Ways to maintain BNs
    • Reason about time
Extra Slides
  • AI Seminar
    • Friday, noon, CSC3-33
    • Free PIZZA!
    • http://www.cs.ualberta.ca/~ai/ai-seminar.html
  • References

http://www.cs.ualberta.ca/~greiner/bn.html

  • Crusher Controller
  • Formal Framework
  • Decision Nets
  • Developing the Model
  • Why Reasoning is Hard
  • Learning Accurate Belief Nets
References
  • http://www.cs.ualberta.ca/~greiner/bn.html
  • Overview textbooks:
    • Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
    • Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 1995. (See esp Ch 14, 15, 19.)
  • General info re BayesNets
    • http://www.afit.af.mil:80/Schools/EN/ENG/LABS/AI/BayesianNetworks
    • Proceedings: http://www.sis.pitt.edu/~dsl/uai.html
    • Assoc for Uncertainty in AI http://www.auai.org/
  • Learning:
    • David Heckerman, A tutorial on learning with Bayesian networks, 1995,
    • http://www.research.microsoft.com/research/dtg/heckerma/TR-95-06.htm
  • Software:
    • General: http://bayes.stat.washington.edu/almond/belief.html
    • JavaBayes http://www.cs.cmu.edu/~fgcozman/Research/JavaBaye
    • Norsys http://www.norsys.com/
Utility: Decision Nets
  • Given c( action, state ) → ℝ (cost function):

Cp(a) = Es[ c(a, s) ] = Σ_{s ∈ S} p( s | obs ) · c(a, s)

  • Best (immediate) action: a* = argmin_{a ∈ A} { Cp(a) }
  • Decision Net (like a Belief Net), but…
    • 3 types of nodes
      • chance (as in a Belief Net)
      • action – repair, sensing
      • cost/utility
    • Links for “dependency”
    • Given observations obs, computes the best action a*
  • Sequence of Actions: MDPs, POMDPs, …
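The decision rule above, Cp(a) = Σs p(s | obs)·c(a, s) with a* = argmin Cp(a), can be sketched in a few lines; the states, actions, and costs below are invented for illustration, not from the slides:

```python
p_state = {"good_flow": 0.7, "tooth_entering": 0.3}   # p(s | obs)
cost = {  # c(action, state)
    ("continue", "good_flow"): 0.0,  ("continue", "tooth_entering"): 100.0,
    ("stop",     "good_flow"): 20.0, ("stop",     "tooth_entering"): 5.0,
}

def expected_cost(a):
    """Cp(a) = sum_s p(s | obs) * c(a, s)."""
    return sum(p_state[s] * cost[(a, s)] for s in p_state)

best = min(("continue", "stop"), key=expected_cost)
print(best, expected_cost(best))  # "stop": 0.7*20 + 0.3*5 = 15.5 < 30.0
```

Even though "continue" is free in the likely state, its expected cost (30.0) exceeds that of "stop" (15.5), so the net recommends stopping.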


Formal Framework
  • Always true:
      • P(x1, …, xn) = P(x1) P(x2 | x1) P(x3 | x2, x1) … P(xn | xn-1, …, x1)
  • Given independencies:
      • P(xk | x1, …, xk-1) = P(xk | pa_k) for some pa_k ⊆ {x1, …, xk-1}
  • Hence: P(x1, …, xn) = Πk P(xk | pa_k)

So just connect each y ∈ pa_i to xi … ⇒ DAG structure

Note:
- Size of the BN is Σi 2^{|pa_i|}, so better to use small pa_i.
- pa_i = {x1, …, x(i-1)} is never incorrect … but seldom minimal …
(so hard to store, learn, reason with, …)
- The order of the variables can make a HUGE difference:
can have |pa_i| = 1 for one ordering, |pa_i| = i – 1 for another
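The size claim and the ordering claim can be checked with a little arithmetic, assuming binary variables and a hypothetical chain-structured model X1 → X2 → … → Xn:

```python
def bn_size(parent_sets):
    """Number of CP-table entries: sum_i 2^|pa_i| (binary variables)."""
    return sum(2 ** len(pa) for pa in parent_sets)

n = 10
# Causal ordering of a chain X1 -> X2 -> ... -> Xn: |pa_i| <= 1.
chain = [set() if i == 0 else {i - 1} for i in range(n)]
# An unlucky ordering may force pa_i = all earlier variables: |pa_i| = i - 1.
worst = [set(range(i)) for i in range(n)]

print(bn_size(chain))  # 1 + 2*(n-1) = 19
print(bn_size(worst))  # 2^n - 1 = 1023
```

The same distribution costs 19 parameters under one ordering and 1023 under another, which is the slide's point.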


Developing the Model
  • Source of information
  • + (Human) Expert (s)
  • + Data from earlier Runs
  • + Simulator
  • Typical Process
  • 1. Develop / Refine Initial Prototype
  • 2. Test Prototype ↦ Accurate System
  • 3. Deploy System
  • 4. Update / Maintain System
Develop/Refine Prototype
  • Requires an expert;

useful to have data

  • Initial Interview(s):

To establish “what relates to what”

Expert time: ≈ ½ day

  • Iterative process: (gradual refinement)

To refine qualitative connections
To establish correct operations

Expert presents “Good Performance”
KE implements the Expert’s claims
KE tests on examples (real data or expert), and reports to the Expert

Expert time: ≈ 1–2 hours / week for ?? weeks

(Depends on complexity of the device, and accuracy of the model)


Why Reasoning is Hard

(Net: Z → A, Z → B; A, B → C, with C = A ∧ B)

  • BN reasoning may look easy:

Just “propagate” information from node to node

  • Challenge: What is P( C = t )?

A = Z = ¬B, so P( A = t ) = P( B = f ) = ½

So…? P( C = t ) = P( A = t, B = t ) = P( A = t ) · P( B = t ) = ½ · ½ = ¼

Wrong: P( C = t ) = 0!

  • Need to maintain the dependencies! P( A = t, B = t ) = P( A = t ) · P( B = t | A = t )
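The counterexample can be checked by brute-force enumeration over the root Z (A = Z, B = ¬Z, C = A ∧ B are deterministic CPTs, as on the slide):

```python
p_c = 0.0          # true P(C = t), computed from the joint
p_a = p_b = 0.0    # marginals P(A = t), P(B = t)
for z in (0, 1):
    p_z = 0.5
    a, b = z, 1 - z       # A = Z,  B = not Z
    c = a & b             # C = A and B
    p_c += p_z * c
    p_a += p_z * a
    p_b += p_z * b

print(p_c)        # 0.0 : A and B are never both true
print(p_a * p_b)  # 0.25: the naive "independent propagation" answer
```

Multiplying the marginals gives ¼; enumerating the joint gives 0, because the dependency through Z was discarded.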


Crusher Controller
  • Given observations
    • History, sensor readings, schedule, …
  • Specify best action for crusher
    • “stop immediately”, “increase roller speed by ”
  • Best == minimize expected cost …
  • Initially: just recommendation to human operator
  • Later: Directly implement (some) actions
      • ?Request values of other sensors?
Approach
  • For each state s

(“Good flow”, “tooth about to enter”, …)

and for each action a

(“Stop immediately”, “Change p7 += 0.32”, …)

determine the utility of performing a in s

(Cost of lost production if stopped;
… of reduced production efficiency if continued; …)

  • Use observations to estimate a (distribution over) current states
  • Infer the EXPECTED UTILITY of each action, based on that distribution
  • Return the action with the highest Expected Utility
Details

  • Inputs
    • Sensor Readings (history): camera, microphone, power-draw
    • Parameter settings
    • Log files, Maintenance records
    • Schedule (maintenance, anticipated load, …)
  • Outputs
    • Continue as is
    • Adjust parameters: GapSize, ApronFeederSpeed, 1J_ConveyorSpeed
    • Shut down immediately
    • Stop adding new material
    • Tell operator to look
  • State “CrusherEnvironment”
    • #UncrushableThingsNowInCrusher
    • #TeethMissing
    • NextUncrushableEntry
    • Control Parameters
Benefits
  • Increase Crusher Effectiveness
    • Find best settings for parameters
    • To maximize production of well-sized chunks
  • Reduce Down Time
    • Know when maintain/repair is critical
  • Reduce Damage to Crusher
  • Usable Model of Crusher
    • Easy to modify when needed
    • Training
    • Design of next generation
  • Prototype for design of {control, diagnostician} of other machines


My Background
  • PhD, Stanford (Computer Science)
    • Representational issues, Analogical Inference
    • … everything in Logic
  • PostDoc at UofToronto (CS)
    • Foundations of learnability, logical inference, DB, control theory, …
    • … everything in Logic
  • Industrial research (Siemens Corporate Research)
    • Need to solve REAL problems
      • Theory Revision, Navigational systems, …
    • … logic is not the be-all-and-end-all!
  • Prof at UofAlberta (CS)
    • Industrial problems (Siemens, BioTools, Syncrude)
    • Foundations of learnability, probabilistic inference, …