
PatReco: Bayesian Networks

Explore the concepts of Bayesian Networks including conditional independence, probability computation, training/parameter estimation, and inference/testing. Applications include medical diagnosis and computer problem diagnosis.


Presentation Transcript


  1. PatReco: Bayesian Networks
  Alexandros Potamianos, Dept. of ECE, Tech. Univ. of Crete, Fall 2009-2010

  2. Definitions
  • Bayesian networks consist of nodes and (usually directed) arcs
  • Nodes (or states) represent a classification class or, more generally, an event, and are described by a pdf
  • Arcs represent relations between nodes, e.g., cause and effect, or time sequence
  • Two nodes that are connected only through a third node are conditionally independent given that node (except in the collider case shown in slide 5)

  3. When to use Bayesian nets
  • Bayesian networks (or inference networks) are statistical models used for classification (or, in general, pattern recognition) problems where there are dependencies among the classes, e.g., time dependencies or cause-and-effect dependencies

  4. Conditional Independence
  • Full independence of A and B:
  P(A|B) = P(A), or equivalently P(A,B) = P(A) P(B)
  • Conditional independence of A and B given C:
  P(A|B,C) = P(A|C), or equivalently P(A,B|C) = P(A|C) P(B|C)
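
These identities are easy to check numerically. Below is a minimal sketch (not from the slides; the CPT values are made up) that builds a small joint distribution over three binary variables, constructed so that A and B are conditionally independent given C, and verifies the defining identity by brute-force summation:

```python
import itertools

# Hypothetical CPT values, chosen so that the joint factors as
# P(A,B,C) = P(C) P(A|C) P(B|C); A ⊥ B | C then holds by construction.
p_c = {0: 0.6, 1: 0.4}
p_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_a[c][a] = P(A=a|C=c)
p_b = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # p_b[c][b] = P(B=b|C=c)

joint = {(a, b, c): p_c[c] * p_a[c][a] * p_b[c][b]
         for a, b, c in itertools.product([0, 1], repeat=3)}

def p(a=None, b=None, c=None):
    """Marginal probability of a partial assignment, by brute-force summation."""
    return sum(prob for (aa, bb, cc), prob in joint.items()
               if (a is None or aa == a)
               and (b is None or bb == b)
               and (c is None or cc == c))

# Check P(A,B|C) = P(A|C) P(B|C) for every value combination
for a, b, c in itertools.product([0, 1], repeat=3):
    lhs = p(a, b, c) / p(c=c)
    rhs = (p(a=a, c=c) / p(c=c)) * (p(b=b, c=c) / p(c=c))
    assert abs(lhs - rhs) < 1e-12
print("P(A,B|C) = P(A|C) P(B|C) holds for all values")
```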

  5. Conditional Independence (three basic topologies)
  • A → B → C (chain): A and C are independent given B: P(C|B,A) = P(C|B)
  • B ← A → C (common cause): B and C are independent given A: P(B,C|A) = P(B|A) P(C|A)
  • A → B ← C (common effect, "collider"): A and C are dependent given B: P(A,C|B) cannot be reduced!
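
The third (collider) case is the counterintuitive one. A small numeric sketch (hypothetical CPTs, same brute-force style as above) shows A and C independent a priori but dependent once B is observed, the "explaining away" effect:

```python
import itertools

# Hypothetical collider A -> B <- C: A and C each uniform and independent;
# B is (roughly) the OR of A and C.
p_b = {(a, c): 0.1 if (a, c) == (0, 0) else 0.9   # p_b[(a,c)] = P(B=1|A=a,C=c)
       for a in (0, 1) for c in (0, 1)}

joint = {(a, b, c): 0.5 * 0.5 * (p_b[(a, c)] if b == 1 else 1 - p_b[(a, c)])
         for a, b, c in itertools.product([0, 1], repeat=3)}

def p(a=None, b=None, c=None):
    return sum(prob for (aa, bb, cc), prob in joint.items()
               if (a is None or aa == a)
               and (b is None or bb == b)
               and (c is None or cc == c))

print(p(a=1, c=1), "==", p(a=1) * p(c=1))                # independent a priori
lhs = p(a=1, b=1, c=1) / p(b=1)                          # P(A=1,C=1|B=1)
rhs = (p(a=1, b=1) / p(b=1)) * (p(c=1, b=1) / p(b=1))    # P(A=1|B=1) P(C=1|B=1)
print(lhs, "!=", rhs)                                    # dependent given B
```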

  6. Three problems
  • Probability computation (exploit independence)
  • Training/parameter estimation
    – Maximum likelihood (ML) if everything is observable
    – Expectation maximization (EM) if data is missing
  • Inference (testing)
    – Diagnosis: P(cause|effect), bottom-up
    – Prediction: P(effect|cause), top-down

  7. Probability Computation
  For a Bayesian network that consists of N nodes:
  • Compute P(n1, n2, …, nN) using the chain rule, starting from the "last/bottom" node and working your way up:
  P(n1, n2, …, nN) = P(nN | n1, n2, …, nN-1) P(nN-1 | n1, n2, …, nN-2) … P(n2 | n1) P(n1)
  • Identify conditional independence conditions from the Bayesian network topology
  • Simplify the conditional probabilities using the independence conditions
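
The chain rule itself needs no independence assumptions. A short sketch (made-up joint, N = 3) verifies the decomposition before any simplification is applied:

```python
import itertools
import random

# A randomly generated joint over three binary nodes n1, n2, n3;
# the chain rule must hold for *any* joint distribution.
random.seed(0)
vals = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in vals]
total = sum(weights)
joint = {v: w / total for v, w in zip(vals, weights)}

def p(n1=None, n2=None, n3=None):
    fix = (n1, n2, n3)
    return sum(prob for v, prob in joint.items()
               if all(f is None or f == x for f, x in zip(fix, v)))

for n1, n2, n3 in vals:
    chain = (p(n1=n1)
             * p(n1=n1, n2=n2) / p(n1=n1)                   # P(n2|n1)
             * p(n1=n1, n2=n2, n3=n3) / p(n1=n1, n2=n2))    # P(n3|n1,n2)
    assert abs(chain - joint[(n1, n2, n3)]) < 1e-12
print("P(n1,n2,n3) = P(n3|n1,n2) P(n2|n1) P(n1) verified")
```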

  8. Probability Computation (example)
  Topology: C is the parent of S and R; S and R are the parents of W
  Chain rule: P(C,S,R,W) = P(W|C,S,R) P(S|C,R) P(R|C) P(C)
  Independent: (W,C) given S,R and (S,R) given C
  Dependent: (S,R) given W
  Simplified: P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C)
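
In code, the simplified factorization is just a product of small tables. A sketch follows, with the topology from the slide but hypothetical CPT values chosen only for illustration:

```python
# Topology from the slide: C -> S, C -> R, S -> W, R -> W (CPT values made up).
P_C = {1: 0.5, 0: 0.5}
P_S_given_C = {1: {1: 0.1, 0: 0.9}, 0: {1: 0.5, 0: 0.5}}   # P(S=s|C=c)
P_R_given_C = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.2, 0: 0.8}}   # P(R=r|C=c)
P_W_given_SR = {(s, r): {1: w, 0: 1.0 - w}                 # P(W=w|S=s,R=r)
                for (s, r), w in {(0, 0): 0.0, (0, 1): 0.9,
                                  (1, 0): 0.9, (1, 1): 0.99}.items()}

def joint(c, s, r, w):
    """P(C,S,R,W) via the simplified factorization P(W|S,R) P(S|C) P(R|C) P(C)."""
    return P_W_given_SR[(s, r)][w] * P_S_given_C[c][s] * P_R_given_C[c][r] * P_C[c]

# e.g. P(C=1, S=0, R=1, W=1) = 0.9 * 0.9 * 0.8 * 0.5 = 0.324
print(joint(c=1, s=0, r=1, w=1))
```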

  9. Probability Computation
  • There are general algorithms for identifying cliques in the Bayesian net
  • Cliques are islands of conditional dependence, i.e., terms in the probability computation that cannot be further reduced
  • For the example above the cliques are (S,C), (W,S,R), (R,C)
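
A tiny sketch of where those terms come from: each irreducible factor is a node together with its parents (its "family"), which for this network reproduces the cliques listed on the slide, plus the root C on its own:

```python
# Families {node} ∪ parents(node) for the C/S/R/W network above.
parents = {'C': [], 'S': ['C'], 'R': ['C'], 'W': ['S', 'R']}

families = [sorted([node] + pars) for node, pars in parents.items()]
print(families)   # [['C'], ['C', 'S'], ['C', 'R'], ['R', 'S', 'W']]
```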

  10. Training/Parameter Estimation
  • Instead of estimating the joint pdf of the whole network, the joint pdf of each of the cliques is estimated
  • For example, if the network joint pdf is P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C), then instead of estimating P(C,S,R,W) directly we estimate each of P(W|S,R), P(S|C), P(R|C), P(C) for all possible values of W, S, R, C (much simpler)

  11. Training/Parameter Estimation
  • For fully observable data and discrete probabilities, compute maximum likelihood estimates of the parameters, e.g., for discrete probabilities:
  P(W=1|S=1,R=0)ML = counts(W=1,S=1,R=0) / counts(W=*,S=1,R=0)

  12. Training/Parameter Estimation
  • Example: the following observations are given for (W,C,S,R):
  (1,0,1,0), (0,0,1,0), (1,1,1,0), (0,1,1,0), (1,0,1,0), (0,1,0,0), (1,0,0,1), (0,1,1,1), (1,1,1,0)
  • Using maximum likelihood estimation:
  P(W=1|S=1,R=0)ML = #(1,*,1,0) / #(*,*,1,0) = 4/6 ≈ 0.67
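
The counting is mechanical; a short sketch confirms the estimate from the nine observations above:

```python
# Count-based ML estimate of P(W=1|S=1,R=0) from the data listed on the slide.
data = [(1, 0, 1, 0), (0, 0, 1, 0), (1, 1, 1, 0), (0, 1, 1, 0), (1, 0, 1, 0),
        (0, 1, 0, 0), (1, 0, 0, 1), (0, 1, 1, 1), (1, 1, 1, 0)]  # (W, C, S, R)

num = sum(1 for w, c, s, r in data if (w, s, r) == (1, 1, 0))  # #(1,*,1,0)
den = sum(1 for w, c, s, r in data if (s, r) == (1, 0))        # #(*,*,1,0)
print(f"P_ML(W=1|S=1,R=0) = {num}/{den} = {num/den:.2f}")      # 4/6 = 0.67
```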

  13. Training/Parameter Estimation
  • When data is unobservable or missing, the EM algorithm is employed
  • There are efficient implementations of the EM algorithm for Bayesian nets that operate on the clique network
  • When the topology of the Bayesian network is not known, structural EM can be used

  14. Inference
  • There are two types of inference (testing)
    – Diagnosis: P(cause|effect), bottom-up
    – Prediction: P(effect|cause), top-down
  • Once the parameters of the network are estimated, the joint network pdf can be computed for ALL possible network values
  • Inference is then simply probability computation using the network pdf

  15. Inference
  • For example: P(W=1|C=1) = P(W=1,C=1) / P(C=1), where
  P(W=1,C=1) = Σ_R Σ_S P(W=1,C=1,R,S)
  P(C=1) = Σ_W Σ_R Σ_S P(W,C=1,R,S)

  16. Inference
  • Efficient algorithms exist for performing inference in large networks; they operate on the clique network
  • Inference is often posed as a probability maximization problem, e.g., what is the most probable cause or effect? argmax_W P(W|C=1)
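
Putting slides 15 and 16 together: a sketch of P(W=1|C=1) by brute-force marginalization, reusing the same topology and hypothetical CPT values as the earlier factorization sketch (for binary W, the argmax reduces to a comparison against 1/2):

```python
import itertools

# Same hypothetical C/S/R/W network as before (CPT values made up).
P_C = {1: 0.5, 0: 0.5}
P_S_given_C = {1: {1: 0.1, 0: 0.9}, 0: {1: 0.5, 0: 0.5}}
P_R_given_C = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.2, 0: 0.8}}
P_W_given_SR = {(0, 0): {1: 0.0, 0: 1.0}, (0, 1): {1: 0.9, 0: 0.1},
                (1, 0): {1: 0.9, 0: 0.1}, (1, 1): {1: 0.99, 0: 0.01}}

def joint(c, s, r, w):
    return P_W_given_SR[(s, r)][w] * P_S_given_C[c][s] * P_R_given_C[c][r] * P_C[c]

# P(W=1,C=1) = Σ_R Σ_S P(W=1,C=1,R,S)
num = sum(joint(1, s, r, 1) for s, r in itertools.product([0, 1], repeat=2))
# P(C=1) = Σ_W Σ_R Σ_S P(W,C=1,R,S)
den = sum(joint(1, s, r, w) for s, r, w in itertools.product([0, 1], repeat=3))
print("P(W=1|C=1) =", num / den)                     # ≈ 0.745 with these CPTs

# argmax_W P(W|C=1): with binary W, just compare against 1/2
print("most probable W given C=1:", int(num / den > 0.5))
```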

  17. Continuous Case
  • In our examples the network nodes represented discrete events (states or classes)
  • Network nodes often hold continuous variables (observations), e.g., length or energy
  • For the continuous case, parametric pdfs are introduced and their parameters are estimated using ML (observable data) or EM (hidden data)
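
A minimal sketch of the continuous case with made-up observations: one node holds a continuous value modeled by a Gaussian whose parameters are fit by ML (sample mean and biased sample variance):

```python
import math

samples = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8]   # hypothetical "length" observations

mu = sum(samples) / len(samples)                           # ML estimate of the mean
var = sum((x - mu) ** 2 for x in samples) / len(samples)   # ML (biased) variance

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

print(f"mu={mu:.3f}  var={var:.4f}  p(2.0)={gaussian_pdf(2.0, mu, var):.3f}")
```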

  18. Some Applications
  • Medical diagnosis
  • Computer problem diagnosis (MS)
  • Markov chains
  • Hidden Markov models (HMMs)

  19. Conclusions
  • Bayesian networks are used to represent dependencies between classes
  • The network topology defines conditional independence conditions that simplify the modeling and computation of the network pdf
  • Three problems: probability computation, estimation/training, and inference/testing
