A Tutorial on Inference and Learning in Bayesian Networks


1. A Tutorial on Inference and Learning in Bayesian Networks. Irina Rish and Moninder Singh, IBM T.J. Watson Research Center, {rish,moninder}@us.ibm.com

2. Road map
   - Introduction: Bayesian networks
     - What are BNs: representation, types, etc.
     - Why use BNs: applications (classes) of BNs
     - Information sources, software, etc.
   - Probabilistic inference
     - Exact inference
     - Approximate inference
   - Learning Bayesian Networks
     - Learning parameters
     - Learning graph structure
   - Summary

3. Bayesian Networks

5. Example: Printer Troubleshooting (Microsoft Windows 95)

8. Independence Assumptions

9. Independence Assumptions

18. What are BNs useful for? Diagnosis: P(cause | symptom) = ?

19. What are BNs useful for?

20. What are BNs useful for?

21. Why use BNs?

22. Application Examples

23. Application Examples

24. Application Examples

25. Application Examples

26. Application Examples

27. Application Examples

28. Application Examples

29. Online/print resources on BNs

30. Online/Print Resources on BNs

31. Publicly available software for BNs

32. Road map
   - Introduction: Bayesian networks
     - What are BNs: representation, types, etc.
     - Why use BNs: applications (classes) of BNs
     - Information sources, software, etc.
   - Probabilistic inference
     - Exact inference
     - Approximate inference
   - Learning Bayesian Networks
     - Learning parameters
     - Learning graph structure
   - Summary

34. Belief Updating

35. Belief updating: P(X|evidence)=?

36. Bucket elimination: algorithm elim-bel (Dechter, 1996)

37. Finding the MPE: algorithm elim-mpe (Dechter, 1996)

38. Generating the MPE-tuple

39. Complexity of inference

41. Relationship with join-tree clustering

42. Relationship with Pearl's belief propagation in poly-trees

43. Road map
   - Introduction: Bayesian networks
   - Probabilistic inference
     - Exact inference
     - Approximate inference
   - Learning Bayesian Networks
     - Learning parameters
     - Learning graph structure
   - Summary

44. Inference is NP-hard => approximations

45. Local Inference Idea

46. Bucket-elimination approximation: mini-buckets
   - Local inference idea: bound the size of recorded dependencies.
   - Computation in a bucket is time and space exponential in the number of variables involved.
   - Therefore, partition the functions in a bucket into mini-buckets over smaller numbers of variables.
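The partitioning step can be sketched in a few lines of Python (a hypothetical greedy scheme for illustration only; `scopes` is a list of the variable sets of the functions in the current bucket, and `i` is the bound on mini-bucket size):

```python
def partition_into_minibuckets(scopes, i):
    """Greedily place each function scope (a set of variables) into the
    first mini-bucket whose combined scope stays within i variables."""
    minibuckets = []  # each entry: [combined_scope, list_of_member_scopes]
    for s in scopes:
        for mb in minibuckets:
            if len(mb[0] | s) <= i:   # fits without exceeding the bound
                mb[0] |= s
                mb[1].append(s)
                break
        else:
            minibuckets.append([set(s), [s]])  # open a new mini-bucket
    return [mb[1] for mb in minibuckets]
```

Each mini-bucket is then eliminated independently, which is what turns the exact result into a bound.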

48. Approx-mpe(i)
   - Input: i, the maximum number of variables allowed in a mini-bucket.
   - Output: [lower bound (probability of a sub-optimal solution), upper bound].

49. Properties of approx-mpe(i)
   - Complexity: O(exp(2i)) time and O(exp(i)) space.
   - Accuracy: determined by the upper/lower (U/L) bound ratio.
   - As i increases, both accuracy and complexity increase.
   - Possible uses of mini-bucket approximations:
     - As anytime algorithms (Dechter and Rish, 1997)
     - As heuristics in best-first search (Kask and Dechter, 1999)
   - Other tasks: similar mini-bucket approximations for belief updating, MAP, and MEU (Dechter and Rish, 1997).

50. Anytime Approximation

51. Empirical Evaluation (Dechter and Rish, 1997; Rish, 1999)
   - Randomly generated networks
     - Uniform random probabilities
     - Random noisy-OR
   - CPCS networks
   - Probabilistic decoding
   - Comparing approx-mpe and anytime-mpe versus elim-mpe

52. Random networks
   - Uniform random: 60 nodes, 90 edges (200 instances).
   - In 80% of cases, a 10-100x speed-up while U/L < 2.
   - Noisy-OR: even better results.
   - Exact elim-mpe was infeasible; approx-mpe took 0.1 to 80 sec.

53. CPCS networks: medical diagnosis (noisy-OR model)

54. Effect of evidence

55. Probabilistic decoding

56. approx-mpe vs. IBP

57. Mini-buckets: summary

58. Road map
   - Introduction: Bayesian networks
   - Probabilistic inference
     - Exact inference
     - Approximate inference
       - Local inference
       - Stochastic simulations
       - Variational approximations
   - Learning Bayesian Networks
   - Summary

59. Approximation via Sampling

60. Forward Sampling (logic sampling (Henrion, 1988))

61. Forward sampling (example)
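As a concrete illustration (not taken from the slides), here is a minimal forward-sampling sketch for a toy two-node network A -> B with made-up CPTs. Variables are sampled in topological order, and evidence is handled by rejection, which is why plain logic sampling degrades when the evidence is unlikely:

```python
import random

# Toy network A -> B with hypothetical CPTs (illustration only).
P_A_TRUE = 0.3                               # P(A = True)
P_B_TRUE_GIVEN_A = {True: 0.9, False: 0.2}   # P(B = True | A)

def forward_sample():
    """Sample all variables in topological order (A before B)."""
    a = random.random() < P_A_TRUE
    b = random.random() < P_B_TRUE_GIVEN_A[a]
    return a, b

def estimate(query_b=True, evidence_a=None, n=50000):
    """Estimate P(B = query_b | A = evidence_a) by rejection sampling."""
    hits = total = 0
    for _ in range(n):
        a, b = forward_sample()
        if evidence_a is not None and a != evidence_a:
            continue  # reject samples that contradict the evidence
        total += 1
        hits += (b == query_b)
    return hits / total
```

With evidence A = True, roughly 70% of the samples are rejected here; likelihood weighting (next slide) avoids that waste by fixing evidence nodes and weighting each sample instead.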

62. Likelihood Weighting (Fung and Chang, 1990; Shachter and Peot, 1990)

63. Gibbs Sampling (Geman and Geman, 1984)

64. Gibbs Sampling (contd) (Pearl, 1988)

65. Road map
   - Introduction: Bayesian networks
   - Probabilistic inference
     - Exact inference
     - Approximate inference
       - Local inference
       - Stochastic simulations
       - Variational approximations
   - Learning Bayesian Networks
   - Summary

66. Variational Approximations
   - Idea: a variational transformation of the CPDs simplifies inference.
   - Advantages:
     - Computes upper and lower bounds on P(Y).
     - Usually faster than sampling techniques.
   - Disadvantages: more complex and less general; must be derived for each particular form of CPD function.

67. Variational bounds: example

68. Convex duality (Jaakkola and Jordan, 1997)
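To make the duality idea concrete (a standard textbook example, not taken from the slides): a concave function such as the logarithm can be written as a pointwise minimum over linear functions,

```latex
\log x \;=\; \min_{\lambda > 0}\,\bigl\{\lambda x - \log\lambda - 1\bigr\},
\qquad\text{hence}\qquad
\log x \;\le\; \lambda x - \log\lambda - 1 \;\;\text{for any fixed }\lambda > 0 .
```

Replacing the logarithm of a CPD by such a linear upper bound decouples the variables it ties together; the variational parameter \lambda is then optimized afterwards to make the bound as tight as possible.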

69. Example: QMR-DT network (Quick Medical Reference Decision-Theoretic (Shwe et al., 1991))

70. Inference in QMR-DT

71. Variational approach to QMR-DT (Jaakkola and Jordan, 1997)

72. Variational approximations
   - Bounds on the local CPDs yield a bound on the posterior.
   - Two approaches: sequential and block.
     - Sequential: applies the variational transformation to (a subset of) nodes sequentially during inference, using a heuristic node ordering; then optimizes over the variational parameters.
     - Block: selects in advance the nodes to be transformed, then selects the variational parameters that minimize the KL-distance between the true and approximate posteriors.

73. Block approach

74. Inference in BN: summary
   - Exact inference is often intractable => need approximations.
   - Approximation principles:
     - Approximating elimination: local inference, bounding the size of dependencies among variables (cliques in the problem's graph). Mini-buckets, IBP.
     - Other approximations: stochastic simulations, variational techniques, etc.
   - Further research:
     - Combining orthogonal approximation approaches.
     - Better understanding of what works well where: which approximation suits which problem structure.
     - Other approximation paradigms (e.g., other ways of approximating probabilities, constraints, cost functions).

75. Road map
   - Introduction: Bayesian networks
   - Probabilistic inference
     - Exact inference
     - Approximate inference
   - Learning Bayesian Networks
     - Learning parameters
     - Learning graph structure
   - Summary

76. Why learn Bayesian networks?

77. Learning Bayesian Networks

78. Learning Parameters: complete data (ML estimate)
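With complete data, the ML estimate of a CPT entry is just a normalized count, theta_ijk = N_ijk / N_ij. A minimal sketch (function and variable names are my own, for illustration; `data` is a list of dicts mapping variable names to values):

```python
from collections import Counter

def ml_cpt(data, child, parents):
    """Maximum-likelihood CPT estimate from complete data:
    theta_{ijk} = N_{ijk} / N_{ij}, where j indexes parent
    configurations and k indexes values of the child."""
    joint = Counter()   # N_ijk: counts of (parent config, child value)
    marg = Counter()    # N_ij:  counts of parent configs
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint[(pa, row[child])] += 1
        marg[pa] += 1
    return {(pa, x): n / marg[pa] for (pa, x), n in joint.items()}
```

For example, three cases with A=0 of which two have B=1 give theta for (A=0, B=1) of 2/3.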

79. Learning graph structure

80. Learning BNs: incomplete data

81. Learning Parameters: incomplete data

82. Learning Parameters: incomplete data
   - Complete-data log-likelihood: log P(Y | theta) = sum_ijk N_ijk log(theta_ijk)
   - E step: compute E(N_ijk | Y_obs, theta)
   - M step: theta_ijk <- E(N_ijk | Y_obs, theta) / E(N_ij | Y_obs, theta)
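To illustrate the E/M alternation on the simplest possible case (a toy example of my own, not from the slides): estimating theta = P(X = 1) for a single binary variable when some cases are unobserved. The E step fills in theta as the expected count per missing case; the M step re-normalizes:

```python
def em_bernoulli(n1, n0, n_missing, theta=0.5, iters=200):
    """EM for theta = P(X=1) given n1 observed ones, n0 observed
    zeros, and n_missing completely unobserved cases."""
    n = n1 + n0 + n_missing
    for _ in range(iters):
        e_n1 = n1 + theta * n_missing   # E step: E[N_1 | data, theta]
        theta = e_n1 / n                # M step: re-estimate theta
    return theta
```

In this case EM converges to n1 / (n1 + n0), i.e., the missing-at-random cases carry no information; the interesting behavior appears when variables are only partially observed jointly.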

83. Learning structure: incomplete data

84. Learning structure: incomplete data

85. Learning structure: incomplete data

86. Scoring functions: Minimum Description Length (MDL)
   - Learning <=> data compression
   - Other scores: MDL = -BIC (Bayesian Information Criterion)
   - Bayesian score (BDe): asymptotically equivalent to MDL
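Spelled out (standard form, not from the slides; N is the number of data points and dim(S) the number of free parameters of structure S):

```latex
\mathrm{MDL}(S, D)
\;=\; \frac{\log N}{2}\,\dim(S)\;-\;\log P(D \mid \hat\theta_S, S)
\;=\; -\,\mathrm{BIC}(S, D),
```

so minimizing description length is the same as maximizing the BIC score: the first term charges for encoding the model, the second for encoding the data given the model.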

87. Learning Structure plus Parameters

88. Model Selection

89. One Reasonable Score: Posterior Probability of a Structure

90. Global and Local Predictive Scores

91. Local Predictive Score Spiegelhalter et al. (1993)

92. Exact computation of p(D | S^h)
   - No missing data
   - Cases are independent, given the model
   - Uniform priors on parameters
   - Discrete variables

93. Bayesian Dirichlet Score Cooper and Herskovits (1991)
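For reference, the Bayesian Dirichlet marginal likelihood has the standard closed form (alpha_ijk are the Dirichlet hyperparameters, N_ijk the data counts, alpha_ij = sum_k alpha_ijk, N_ij = sum_k N_ijk):

```latex
P(D \mid S^h)
\;=\; \prod_{i}\prod_{j}
\frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
\prod_{k}\frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}.
```

The Cooper-Herskovits K2 score is the special case with all alpha_ijk = 1 (the uniform parameter priors assumed on the previous slide).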

94. Learning BNs without specifying an ordering
   - There are n! orderings, and the ordering greatly affects the quality of the learned network.
   - Use conditional independence tests and d-separation to obtain an ordering.

95. Learning BNs via the MDL principle
   - Idea: the best model is the one that gives the most compact representation of the data.
   - So, encode the data using the model, plus encode the model itself; minimize the total description length.

96. Learning BNs: summary
   - Bayesian networks: graphical probabilistic models
     - Efficient representation and inference
     - Expert knowledge + learning from data
   - Learning:
     - parameters (parameter estimation, EM)
     - structure (optimization with score functions, e.g., MDL)
   - Applications/systems: collaborative filtering (MSBN), fraud detection (AT&T), classification (AutoClass (NASA), TAN-BLT (SRI))
   - Future directions: causality, time, model evaluation criteria, approximate inference/learning, on-line learning, etc.
