
A Tutorial on Learning with Bayesian Networks


Presentation Transcript


  1. A Tutorial on Learning with Bayesian Networks

    David Heckerman
  2. What is a Bayesian Network? “a graphical model for probabilistic relationships among a set of variables.”
  3. Why use Bayesian Networks? Don’t need complete data set Can learn causal relationships Combines domain knowledge and data Avoids overfitting – don’t need test data
  4. Probability There are two types: Bayesian and classical
  5. Bayesian Probability ‘Personal’ probability Degree of belief Property of person who assigns it Observations are fixed, imagine all possible values of parameters from which they could have come “I think the coin will land on heads 50% of the time”
  6. Classical Probability Property of environment ‘Physical’ probability Imagine all data sets of size N that could be generated by sampling from the distribution determined by parameters. Each data set occurs with some probability and produces an estimate “The probability of getting heads on this particular coin is 50%”
  7. Notation Variable: X State of X = x Set of variables: Y Assignment of variables (configuration): y Probability that X = x for a person with state of information ξ: p(x|ξ) Uncertain variable: Θ Parameter: θ Outcome of the lth trial: Xl Observations: D = {X1 = x1, ..., XN = xN}
  8. Example Thumbtack problem: will it land on the point (heads) or the flat bit (tails)? Flip it N times What will it do on the N+1th time? How to compute p(xN+1|D, ξ) from p(θ|ξ)?
  9. Step 1 Use Bayes’ rule to get the probability distribution for Θ given D and ξ: p(θ|D,ξ) = p(θ|ξ) p(D|θ,ξ) / p(D|ξ), where p(D|ξ) = ∫ p(D|θ,ξ) p(θ|ξ) dθ
  10. Step 2 Expand p(D|θ,ξ) – the likelihood function for binomial sampling The observations in D are mutually independent given θ – the probability of heads is θ and of tails is 1 - θ, so p(D|θ,ξ) = θ^h (1 - θ)^t, where h and t are the numbers of heads and tails in D Substituting into the previous equation gives p(θ|D,ξ) = θ^h (1 - θ)^t p(θ|ξ) / p(D|ξ)
  11. Step 3 Average over the possible values of Θ to determine the probability that the next toss is heads: p(xN+1 = heads | D, ξ) = ∫ θ p(θ|D,ξ) dθ = Ep(θ|D,ξ)(θ), the expectation of θ w.r.t. the distribution p(θ|D,ξ)
  12. Prior Distribution The prior is taken from a beta distribution: p(θ|ξ) = Beta(θ|αh, αt) ∝ θ^(αh-1) (1 - θ)^(αt-1) αh, αt are called hyperparameters, to distinguish them from the parameter θ A beta prior is conjugate: the posterior is beta too, p(θ|D,ξ) = Beta(θ|αh + h, αt + t), where h and t are the sufficient statistics of D
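A minimal Python sketch of slides 8–12, assuming made-up prior counts and a made-up toss sequence; it is not code from the tutorial, just the conjugate Beta–binomial update and the Step 3 prediction:

```python
# Minimal sketch of the thumbtack update: Beta prior, binomial likelihood,
# posterior predictive by averaging over theta. Prior counts are illustrative.
alpha_h, alpha_t = 2.0, 2.0          # hyperparameters (assumed values)
data = [1, 0, 1, 1, 0, 1]            # 1 = heads, 0 = tails (made-up sample D)

h = sum(data)                        # sufficient statistics: number of heads
t = len(data) - h                    # and number of tails

# The Beta prior is conjugate, so the posterior is Beta(alpha_h + h, alpha_t + t).
post_h, post_t = alpha_h + h, alpha_t + t

# Step 3: p(X_{N+1} = heads | D) = E[theta | D] = post_h / (post_h + post_t)
p_next_heads = post_h / (post_h + post_t)
print(f"p(heads on toss N+1 | D) = {p_next_heads:.3f}")
```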
  13. Assessing the prior Imagined future data: Assess probability in first toss of thumbtack Imagine you’ve seen outcomes of k flips Reassess probability Equivalent samples Start with Beta(0,0) prior, observe αh, αt heads and tails – posterior will be Beta(αh, αt) Beta (0,0) is state of minimum information Assess αh, αt by determining number of observations of heads and tails equivalent to our current knowledge
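A tiny illustration of the equivalent-samples assessment; the equivalent sample size and first-toss probability below are hypothetical numbers, not values from the tutorial:

```python
# Sketch of the "equivalent samples" assessment (illustrative numbers).
# Suppose our current knowledge feels equivalent to having already seen
# 10 tosses, with heads judged 40% likely on the first toss.
equivalent_sample_size = 10.0   # assumed
p_heads_first_toss = 0.4        # assumed

alpha_h = equivalent_sample_size * p_heads_first_toss        # "imagined" heads
alpha_t = equivalent_sample_size * (1 - p_heads_first_toss)  # "imagined" tails
print(f"prior = Beta({alpha_h:.1f}, {alpha_t:.1f})")
```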
  14. Can’t always use Beta prior What if you bought the thumbtack in a magic shop? It could be biased. Need a mixture of Betas – introduces hidden variable H
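A hedged sketch of how a mixture-of-Betas prior could be updated: each component updates conjugately, and its weight is rescaled by that component's marginal likelihood of the data. The weights, shape parameters, and counts below are invented for illustration:

```python
import numpy as np
from scipy.special import betaln   # log of the Beta function

# Sketch: prior is a mixture of Betas (component weights/shapes are assumed),
# e.g. "roughly fair", "biased toward heads", "biased toward tails".
weights = np.array([0.6, 0.2, 0.2])
a = np.array([20.0, 8.0, 2.0])     # alpha_h for each component
b = np.array([20.0, 2.0, 8.0])     # alpha_t for each component

h, t = 9, 1                        # made-up observed counts of heads and tails

# Each component updates conjugately; its weight is rescaled by the
# marginal likelihood B(a + h, b + t) / B(a, b) of the data under it.
log_w = np.log(weights) + betaln(a + h, b + t) - betaln(a, b)
post_weights = np.exp(log_w - log_w.max())
post_weights /= post_weights.sum()

# Posterior predictive: weighted average of each component's posterior mean.
p_heads = np.sum(post_weights * (a + h) / (a + b + h + t))
print(post_weights.round(3), round(float(p_heads), 3))
```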
  15. Distributions We’ve only been talking about binomials so far Observations could come from any physical probability distribution We can still use Bayesian methods. Same as before: Define variables for unknown parameters Assign priors to variables Use Bayes’ rule to update beliefs Average over possible values of Θ to predict things
  16. Exponential Family For distributions in the exponential family, calculations can be done efficiently and in closed form E.g. binomial, multinomial, normal, Gamma, Poisson...
  17. Exponential Family Bernardo and Smith (1994) compiled important quantities and Bayesian computations for commonly used members of the family Paper focuses on multinomial sampling
  18. Multinomial sampling X is discrete – r possible states x1 ... xr Likelihood function: p(X = xi | θ, ξ) = θi, so p(D|θ,ξ) = ∏i θi^Ni Same number of parameters as states Parameters = physical probabilities Sufficient statistics for D = {X1 = x1, ... XN = xN}: {N1, ..., Nr}, where Ni is the number of times X = xi in D
  19. Multinomial Sampling The prior used is Dirichlet: p(θ|ξ) = Dir(θ|α1, ..., αr) The posterior is Dirichlet too: p(θ|D,ξ) = Dir(θ|α1+N1, ..., αr+Nr) Can be assessed the same way as the Beta distribution
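A short sketch of the Dirichlet–multinomial update, with assumed hyperparameters and counts (not values from the tutorial):

```python
import numpy as np

# Sketch of multinomial sampling with a Dirichlet prior (assumed values).
# X has r = 3 states; alpha_k are hyperparameters, N_k the observed counts.
alpha = np.array([1.0, 1.0, 1.0])        # Dir(theta | alpha_1, ..., alpha_r)
counts = np.array([5, 2, 3])             # sufficient statistics N_1, ..., N_r

posterior_alpha = alpha + counts          # posterior is Dir(alpha_k + N_k)

# Predictive probability of each state on the next observation:
# p(X_{N+1} = x^k | D) = (alpha_k + N_k) / (alpha + N), with alpha = sum_k alpha_k
p_next = posterior_alpha / posterior_alpha.sum()
print(p_next.round(3))
```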
  20. Bayesian Network Network structure of a BN: a directed acyclic graph (DAG) Each node of the graph represents a variable Each arc asserts a dependence relationship between a pair of variables A probability table associates each node with its immediate parent nodes
  21. Bayesian Network (cont’d) A Bayesian network for detecting credit-card fraud Direction of arcs: from parent to descendant node Parents of node Xi: Pai Pa(Jewelry) = {Fraud, Age, Sex}
  22. Bayesian Network (cont’d) Network structure: S Set of variables: X = {X1, ..., Xn} Parents of Xi: Pai Joint distribution of X: p(x) = ∏i p(xi | pai) Markov condition: each Xi is independent of its nondescendants ND(Xi) given its parents Pai, where ND(Xi) = nondescendant nodes of Xi
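A rough Python sketch of the factorization p(x) = ∏i p(xi | pai) for the fraud network. The parents of Jewelry follow the slide; treating Gas as a child of Fraud and all the CPT numbers are assumptions made purely for illustration:

```python
# Sketch of the factored joint p(x) = prod_i p(x_i | pa_i) for the fraud
# network. Pa(Jewelry) = {Fraud, Age, Sex} as on the slide; Gas as a child
# of Fraud and every number below are made up. All variables binary here.

def p_fraud(f):   return 0.001 if f else 0.999
def p_age(a):     return 0.4 if a else 0.6            # a = True means "old"
def p_sex(s):     return 0.5                          # s = True means "male"
def p_gas(g, f):  return (0.2 if g else 0.8) if f else (0.01 if g else 0.99)
def p_jewelry(j, f, a, s):
    base = 0.05 if f else (0.002 if (a and not s) else 0.001)
    return base if j else 1.0 - base

def joint(f, a, s, g, j):
    # Chain-rule factorization given the DAG (Markov condition):
    # each variable depends only on its parents.
    return (p_fraud(f) * p_age(a) * p_sex(s)
            * p_gas(g, f) * p_jewelry(j, f, a, s))

print(joint(f=False, a=True, s=False, g=False, j=False))
```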
  23. Constructing BN Given the set X = {X1, ..., Xn}, the chain rule of probability gives p(x) = ∏i p(xi | x1, ..., xi-1) Now, for every Xi, choose parents Pai ⊆ {X1, ..., Xi-1} such that Xi and {X1, ..., Xi-1} \ Pai are conditionally independent given Pai, so that p(xi | x1, ..., xi-1) = p(xi | pai)
  24. Constructing BN (cont’d) Using the ordering (F,A,S,G,J) recovers the sparse fraud structure shown earlier, but by using the ordering (J,G,S,A,F) we obtain a fully connected structure Use some prior assumptions about the causal relationships among variables to choose a good ordering (a parameter-count comparison is sketched below)
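One way to see why the ordering matters is to count free parameters under each resulting structure. The sketch below assumes all five variables are binary and that the good ordering yields Pa(J) = {F, A, S} as on the slide, with Pa(G) = {F} as an additional assumption:

```python
# Sketch comparing the two variable orderings from the slide by counting
# free parameters. All variables are assumed binary for illustration.
def n_free_params(parents_of, cardinality=2):
    """Sum over nodes of (r_i - 1) * (number of parent configurations)."""
    return sum((cardinality - 1) * cardinality ** len(parents)
               for parents in parents_of.values())

# Ordering (F, A, S, G, J) with causal knowledge -> sparse structure.
sparse = {"F": [], "A": [], "S": [], "G": ["F"], "J": ["F", "A", "S"]}

# Ordering (J, G, S, A, F) -> no independencies found, fully connected.
dense = {"J": [], "G": ["J"], "S": ["J", "G"],
         "A": ["J", "G", "S"], "F": ["J", "G", "S", "A"]}

print(n_free_params(sparse), n_free_params(dense))   # 13 vs 31
```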
  25. Inference in BN The goal is to compute any probability of interest (probabilistic inference) Inference (even approximate) in an arbitrary BN for discrete variables is NP-hard (Cooper, 1990; Dagum and Luby, 1993) Most commonly used algorithms: Lauritzen & Spiegelhalter (1988), Jensen et al. (1990) and Dawid (1992) Basic idea: transform the BN into a tree and exploit mathematical properties of that tree (a brute-force alternative for tiny networks is sketched below)
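The cited algorithms (junction-tree style) are beyond a quick sketch; the toy below instead does brute-force inference by enumeration on a hypothetical two-node network with invented CPTs, just to show what "computing a probability of interest" means:

```python
from itertools import product

# Brute-force inference by enumeration on a toy network (Fraud -> Gas,
# made-up CPTs). This is NOT the junction-tree method cited on the slide;
# it only illustrates marginalizing the joint to answer a query.
def joint(f, g):
    p_f = 0.01 if f else 0.99
    p_g_given_f = (0.2 if g else 0.8) if f else (0.05 if g else 0.95)
    return p_f * p_g_given_f

# p(Fraud = True | Gas = True) by summing the joint over consistent states.
num = sum(joint(f, g) for f, g in product([True, False], repeat=2) if f and g)
den = sum(joint(f, g) for f, g in product([True, False], repeat=2) if g)
print(num / den)
```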
  26. Inference in BN (cont’d)
  27. Learning in BN Learning the parameters from data Learning the structure from data Learning the parameters: known structure, data is fully observable
  28. Learning parameters in BN Recall the thumbtack problem: Step 1: use Bayes’ rule to obtain p(θ|D,ξ) Step 2: expand p(D|θ,ξ), the likelihood Step 3: average over the possible values of Θ to determine the predictive probability p(xN+1|D,ξ)
  29. Learning parameters in BN (cont’d) Joint probability distribution: p(x|θs, S^h) = ∏i p(xi | pai, θi, S^h) S^h: hypothesis that the structure is S θi: vector of parameters for the local distribution p(xi | pai, θi, S^h) θs: vector of {θ1, θ2, ..., θn} D = {x1, x2, ..., xN}: a random sample of N cases Goal is to calculate the posterior distribution p(θs | D, S^h)
  30. Learning parameters in BN (cont’d) Illustration with the multinomial distribution: Each Xi is discrete, with values from xi^1, ..., xi^ri The local distribution is a collection of multinomial distributions, one for each configuration pai^j of Pai (j = 1, ..., qi): p(xi^k | pai^j, θi, S^h) = θijk The parameter vectors θij = (θij1, ..., θijri) for the different configurations of Pai are assumed mutually independent
  31. Learning parameters in BN (cont’d) Parameter independence: p(θs | S^h) = ∏i ∏j p(θij | S^h) Therefore: p(θs | D, S^h) = ∏i ∏j p(θij | D, S^h) We can update each vector θij independently Assume that the prior distribution of θij is Dir(θij | αij1, ..., αijri) Thus, the posterior distribution of θij is Dir(θij | αij1 + Nij1, ..., αijri + Nijri), where Nijk is the number of cases in D in which Xi = xi^k and Pai = pai^j
  32. Learning parameters in BN (cont’d) To compute p(xN+1 | D, S^h), we have to average over the possible configurations of θs: p(xN+1 | D, S^h) = ∫ p(xN+1 | θs, S^h) p(θs | D, S^h) dθs Using parameter independence, we obtain: p(xN+1 | D, S^h) = ∏i (αijk + Nijk) / (αij + Nij), where αij = ∑k αijk, Nij = ∑k Nijk, and for each i the indices j, k match the configuration of Pai and the value of Xi in xN+1 (a small worked sketch follows below)
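A small worked sketch of slides 31–32 for a single node with one binary parent, using a made-up data set and a uniform Dirichlet prior (all values are assumptions for illustration):

```python
import numpy as np

# Sketch of the Dirichlet parameter update for one node X_i with a single
# binary parent. Rows of `data` are (parent value j, child value k), both 0/1.
data = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (0, 0), (1, 0)]

r_i, q_i = 2, 2                       # child states, parent configurations
alpha = np.ones((q_i, r_i))           # alpha_ijk: a uniform Dirichlet prior

# N_ijk: number of cases with Pa_i in configuration j and X_i = x_i^k.
N = np.zeros((q_i, r_i))
for j, k in data:
    N[j, k] += 1

# The posterior for each theta_ij is Dir(alpha_ijk + N_ijk), and the
# predictive probability is (alpha_ijk + N_ijk) / (alpha_ij + N_ij).
predictive = (alpha + N) / (alpha + N).sum(axis=1, keepdims=True)
print(predictive.round(3))
```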