
Graphical Models - Learning -




  1. Graphical Models - Learning - Parameter Estimation. Wolfram Burgard, Luc De Raedt, Kristian Kersting, Bernhard Nebel, Albert-Ludwigs University Freiburg, Germany. Advanced IWS 06/07. [Figure: the ALARM Bayesian network with nodes MINVOLSET, KINKEDTUBE, PULMEMBOLUS, INTUBATION, VENTMACH, DISCONNECT, PAP, SHUNT, VENTLUNG, VENTTUBE, PRESS, MINVOL, FIO2, VENTALV, PVSAT, ANAPHYLAXIS, ARTCO2, EXPCO2, SAO2, TPR, INSUFFANESTH, HYPOVOLEMIA, LVFAILURE, CATECHOL, LVEDVOLUME, STROKEVOLUME, ERRCAUTER, HR, ERRLOWOUTPUT, HISTORY, CO, CVP, PCWP, HREKG, HRSAT, HRBP, BP] Based on: J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR-97-021, U.C. Berkeley, April 1998; G. J. McLachlan, T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, Inc., 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; and N. Friedman & D. Koller's NIPS'99 tutorial.

  2. Outline • Introduction • Reminder: Probability theory • Basics of Bayesian Networks • Modeling Bayesian networks • Inference (VE, Junction tree) • [Excursus: Markov Networks] • Learning Bayesian networks • Relational Models

  3. What is Learning? • Agents are said to learn if they improve their performance over time based on experience. "The problem of understanding intelligence is said to be the greatest problem in science today and 'the' problem for this century – as deciphering the genetic code was for the second half of the last one … the problem of learning represents a gateway to understanding intelligence in man and machines." -- Tomaso Poggio and Steve Smale, 2003. - Learning

  4. Why bother with learning? • Bottleneck of knowledge acquisition • Expensive, difficult • Normally, no expert is around • Data is cheap! • Huge amounts of data available, e.g. • Clinical tests • Web mining, e.g. log files • .... - Learning

  5. Why Learning Bayesian Networks? • Conditional independencies & graphical language capture structure of many real-world distributions • Graph structure provides much insight into domain • Allows “knowledge discovery” • Learned model can be used for many tasks • Supports all the features of probabilistic learning • Model selection criteria • Dealing with missing data & hidden variables

  6. Learning With Bayesian Networks: Data + Prior Info. [Figure: Bayesian network with nodes E, B, A (B → A ← E) and the CPT P(A | E,B); the four parent configurations carry the entries .9/.1, .7/.3, .8/.2, .99/.01] - Learning

  7. Learning With Bayesian Networks: Data + Prior Info → Learning algorithm → [Figure: the network B → A ← E with its CPT P(A | E,B) filled in (.9/.1, .7/.3, .8/.2, .99/.01)] - Learning

  8. What does the data look like? A complete data set: a table whose columns are the attributes/variables X1, X2, ..., XM and whose rows are the data cases - Learning

  9. What does the data look like? An incomplete data set. • Real-world data: the states of some random variables are missing • E.g. medical diagnosis: not all patients are subject to all tests • Parameter reduction, e.g. clustering, ... - Learning

  10. What does the data look like? An incomplete data set (a missing value is highlighted). • Real-world data: the states of some random variables are missing • E.g. medical diagnosis: not all patients are subject to all tests • Parameter reduction, e.g. clustering, ... - Learning

  11. What does the data look like? An incomplete data set with a hidden/latent variable (and a missing value highlighted). • Real-world data: the states of some random variables are missing • E.g. medical diagnosis: not all patients are subject to all tests • Parameter reduction, e.g. clustering, ... - Learning

  12. Hidden variable – Examples • Parameter reduction: [Figure: the same network over X1, X2, X3 and Y1, Y2, Y3 shown without and with a hidden variable H between the two layers; the annotated table sizes (2, 4, 16, 32, 64, ...) give parameter totals of 118 vs. 34, illustrating the reduction obtained by introducing H] - Learning
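
As a rough illustration of the same effect (assuming, for simplicity, that all six variables and H are binary, which need not match the exact counts on the slide): without H each Y_i needs a CPT over all three X's, whereas with H only H needs such a CPT:

```latex
% Without H: each Y_i has parents X_1, X_2, X_3.
\underbrace{3 \cdot 1}_{P(X_i)} + \underbrace{3 \cdot 2^3}_{P(Y_i \mid X_1, X_2, X_3)} = 27 \text{ free parameters}
% With H: X_1, X_2, X_3 \to H and H \to Y_i.
\underbrace{3 \cdot 1}_{P(X_i)} + \underbrace{2^3}_{P(H \mid X_1, X_2, X_3)} + \underbrace{3 \cdot 2}_{P(Y_i \mid H)} = 17 \text{ free parameters}
```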

  13. Hidden variable – Examples • Clustering: hidden variables also appear in clustering • Autoclass model: the hidden variable assigns class labels, and the observed attributes are independent given the class. [Figure: hidden Cluster node (the assignment to a cluster) with observed attribute children X1, X2, ..., Xn] - Learning
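
In formulas, the Autoclass model is a naive Bayes mixture; the following is a standard reconstruction, since the slide's own formula is not in the transcript:

```latex
P(X_1 = x_1, \ldots, X_n = x_n) \;=\; \sum_{c} P(\mathrm{Cluster} = c) \prod_{i=1}^{n} P(X_i = x_i \mid \mathrm{Cluster} = c)
```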

  14. [Slides due to Eamonn Keogh] - Learning

  15. [Slides due to Eamonn Keogh] Iteration 1 The cluster means are randomly assigned - Learning

  16. [Slides due to Eamonn Keogh] Iteration 2 - Learning

  17. [Slides due to Eamonn Keogh] Iteration 5 - Learning

  18. [Slides due to Eamonn Keogh] Iteration 25 - Learning
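
The iterations shown above follow the usual k-means loop: assign each point to its nearest cluster mean, recompute the means, and repeat. A minimal sketch under that reading; the 2-D synthetic data and k = 2 are purely illustrative assumptions, not the lecture's own code:

```python
import numpy as np

def kmeans(points, k, iterations=25, seed=0):
    """Minimal k-means: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # "Iteration 1": the cluster means are randomly assigned (here: random data points).
    means = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest mean.
        dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each mean moves to the centroid of its assigned points.
        means = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                          else means[j] for j in range(k)])
    return means, labels

# Usage on synthetic 2-D data (illustrative assumption only):
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centres, labels = kmeans(pts, k=2)
```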

  19. [Slides due to Eamonn Keogh] What is a natural grouping among these objects?

  20. [Slides due to Eamonn Keogh] What is a natural grouping among these objects? Clustering is subjective: the same objects can be grouped as Simpson's family, school employees, females, or males.

  21. Learning With Bayesian Networks [Figure: three learning settings sketched with example nodes A and B: unknown parameters, unknown structure, and a hidden variable H, each marked with '?'] - Learning

  22. Parameter Estimation • Let D = {d_1, ..., d_N} be a set of data over m RVs; each d_i is called a data case • iid assumption: all data cases are independently sampled from identical distributions. Find: the parameters of the CPDs which match the data best - Learning
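
Under the iid assumption the likelihood of the parameters factorizes over the data cases; in the notation d_1, ..., d_N used above (a reconstruction, since the slide's formulas are not in the transcript):

```latex
L(\theta : D) \;=\; P(D \mid \theta) \;\overset{\text{iid}}{=}\; \prod_{i=1}^{N} P(d_i \mid \theta)
```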

  23. Maximum Likelihood - Parameter Estimation • What does "best matching" mean? - Learning

  24. Maximum Likelihood - Parameter Estimation • What does "best matching" mean? Find parameters which most likely produced the data - Learning

  25. Maximum Likelihood - Parameter Estimation • What does "best matching" mean? • MAP parameters - Learning

  26. Maximum Likelihood - Parameter Estimation • What does "best matching" mean? • MAP parameters • Data is equally likely for all parameters. - Learning

  27. Maximum Likelihood - Parameter Estimation • What does "best matching" mean? • MAP parameters • Data is equally likely for all parameters • All parameters are a priori equally likely - Learning
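
Written out, the chain of assumptions on slides 25-27 reads as follows (a standard reconstruction of the formulas the slides presumably showed):

```latex
\theta_{\mathrm{MAP}} = \arg\max_{\theta} P(\theta \mid D)
                      = \arg\max_{\theta} \frac{P(D \mid \theta)\,P(\theta)}{P(D)}
% Drop P(D): the data is equally likely for all parameters.
% Drop P(\theta): all parameters are a priori equally likely (uniform prior).
\theta_{\mathrm{ML}} = \arg\max_{\theta} P(D \mid \theta)
```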

  28. Maximum Likelihood - Parameter Estimation • What does "best matching" mean? Find: ML parameters - Learning

  29. Maximum Likelihood - Parameter Estimation • What does "best matching" mean? Find: ML parameters. Likelihood of the parameters given the data - Learning

  30. Maximum Likelihood - Parameter Estimation • What does "best matching" mean? Find: ML parameters. Likelihood of the parameters given the data. Log-Likelihood of the parameters given the data - Learning
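
Since the logarithm is monotone, the log-likelihood has the same maximizer as the likelihood and turns the product over data cases into a sum (again a reconstruction in the notation introduced above):

```latex
LL(\theta : D) \;=\; \log P(D \mid \theta) \;=\; \sum_{i=1}^{N} \log P(d_i \mid \theta),
\qquad
\arg\max_{\theta} LL(\theta : D) \;=\; \arg\max_{\theta} L(\theta : D)
```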

  31. Maximum Likelihood • This is one of the most commonly used estimators in statistics • Intuitively appealing • Consistent: the estimate converges to the best possible value as the number of examples grows • Asymptotic efficiency: the estimate is as close to the true value as possible given a particular training set - Learning

  32. Learning With Bayesian Networks [Figure: the three learning settings over example nodes A and B again: unknown parameters, unknown structure, and a hidden variable H, each marked with '?'] - Learning

  33. Known Structure, Complete Data • Network structure is specified: the learner needs to estimate the parameters • Data does not contain missing values. Data cases (E, B, A): <Y,N,N>, <Y,N,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>. [Figure: the learning algorithm maps the data and the fixed structure B → A ← E to the CPT P(A | E,B), filling the '?' entries with the estimates .9/.1, .7/.3, .8/.2, .99/.01] - Learning
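
For complete data the ML estimates are just normalized counts, as the following slides derive. A minimal sketch for the E, B, A example above; the toy data cases are taken from the slide, while the dictionary-based representation and the function name are illustrative assumptions:

```python
from collections import Counter

# Complete data cases over (E, B, A), as on the slide.
data = [("Y", "N", "N"), ("Y", "N", "Y"), ("N", "N", "Y"),
        ("N", "Y", "Y"), ("N", "Y", "Y")]

# Count joint occurrences of (E, B, A) and of the parent configuration (E, B).
joint = Counter((e, b, a) for e, b, a in data)
parents = Counter((e, b) for e, b, _ in data)

def p_a_given_eb(a, e, b):
    """ML estimate: P(A = a | E = e, B = b) = N(e, b, a) / N(e, b)."""
    if parents[(e, b)] == 0:
        return None  # parent configuration never observed in the data
    return joint[(e, b, a)] / parents[(e, b)]

print(p_a_given_eb("Y", "N", "Y"))  # P(A=Y | E=N, B=Y) = 2/2 = 1.0 for this toy data
```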

  34. ML Parameter Estimation - Learning

  35. ML Parameter Estimation (iid) - Learning

  36. ML Parameter Estimation (iid) - Learning

  37. ML Parameter Estimation (iid) (BN semantics) - Learning

  38. ML Parameter Estimation (iid) (BN semantics) - Learning

  39. ML Parameter Estimation (iid) (BN semantics) - Learning

  40. ML Parameter Estimation (iid) (BN semantics) Only local parameters of family of Aj involved - Learning

  41. ML Parameter Estimation (iid) (BN semantics) Only local parameters of family of Aj involved - Learning Each factor individually !!

  42. ML Parameter Estimation (iid) (BN semantics) • Decomposability of the likelihood: only the local parameters of the family of Aj are involved, so each factor can be maximized individually! - Learning

  43. Decomposability of Likelihood If the data set is complete (no question marks), • we can maximize each local likelihood function independently, and • then combine the solutions to get an MLE solution. This decomposition of the global problem into independent, local sub-problems allows efficient solutions to the MLE problem.
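
Writing a_j[i] for the value of variable A_j in data case d_i and pa_j[i] for the values of its parents (this notation is an assumption, since the slide formulas are not in the transcript), the derivation built up on slides 34-42 and summarized here is:

```latex
L(\theta : D) \;=\; \prod_{i=1}^{N} P(d_i \mid \theta)                                        % iid
           \;=\; \prod_{i=1}^{N} \prod_{j=1}^{m} P\bigl(a_j[i] \mid pa_j[i], \theta_j\bigr)   % BN semantics
           \;=\; \prod_{j=1}^{m} \underbrace{\prod_{i=1}^{N} P\bigl(a_j[i] \mid pa_j[i], \theta_j\bigr)}_{\text{local likelihood of the family of } A_j}
```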

  44. Likelihood for Multinomials • Random variable V with values 1, ..., K • The likelihood is the product of the parameters raised to their counts (written out below), where Nk is the count of state k in the data • The sum-to-one constraint on the parameters implies that the choice of θi influences the choice of θj (i ≠ j) - Learning
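
The multinomial likelihood referred to above, reconstructed in standard notation:

```latex
L(\theta : D) \;=\; \prod_{k=1}^{K} \theta_k^{\,N_k},
\qquad \theta_k = P(V = k),
\qquad \sum_{k=1}^{K} \theta_k = 1
```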

  45. Likelihood for Binomials (2 states only) • Constraint: θ1 + θ2 = 1 • Compute the partial derivative • Set the partial derivative to zero ⇒ the MLE is θ1 = N1 / (N1 + N2) - Learning

  46. Likelihood for Binomials (2 states only) • Constraint: θ1 + θ2 = 1 • Compute the partial derivative • Set the partial derivative to zero ⇒ the MLE is θ1 = N1 / (N1 + N2) • In general, for multinomials (> 2 states), the MLE is θk = Nk / Σl Nl - Learning
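
A short reconstruction of the derivation sketched on this slide, using the log-likelihood of the binomial:

```latex
\log L(\theta) \;=\; N_1 \log \theta_1 + N_2 \log (1 - \theta_1)
\quad\Rightarrow\quad
\frac{\partial \log L}{\partial \theta_1} \;=\; \frac{N_1}{\theta_1} - \frac{N_2}{1 - \theta_1} \;=\; 0
\quad\Rightarrow\quad
\hat\theta_1 \;=\; \frac{N_1}{N_1 + N_2}
```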

  47. Likelihood for Conditional Multinomials • One multinomial for each joint state pa of the parents of V • MLE: normalized counts per parent configuration (written out below) - Learning
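
In the same notation, the conditional case decomposes into one multinomial per parent configuration; a standard reconstruction of the missing formulas, with N_{k,pa} counting the cases in which V = k and the parents take configuration pa:

```latex
L(\theta_V : D) \;=\; \prod_{pa} \prod_{k=1}^{K} \theta_{k \mid pa}^{\,N_{k,pa}},
\qquad
\hat\theta_{k \mid pa} \;=\; \frac{N_{k,pa}}{\sum_{l=1}^{K} N_{l,pa}}
```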

  48. Learning With Bayesian Networks [Figure: the three learning settings over example nodes A and B once more: unknown parameters, unknown structure, and a hidden variable H, each marked with '?'] - Learning

  49. Known Structure, Incomplete Data • Network structure is specified • Data contains missing values • Need to consider assignments to missing values. Data cases (E, B, A): <Y,?,N>, <Y,N,?>, <N,N,Y>, <N,Y,Y>, ..., <?,Y,Y>. [Figure: the learning algorithm again maps the data and the fixed structure B → A ← E to estimates for the '?' entries of the CPT P(A | E,B)] - Learning

  50. EM Idea • In the case of complete data, ML parameter estimation is easy: simply counting (one iteration) • Incomplete data? • Complete the data (imputation): most probable?, average?, ... value • Count • Iterate - Learning
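
A minimal, hedged sketch of this complete-count-iterate loop for the Autoclass-style model of slide 13 (a hidden class with binary attributes that are independent given the class); the function name, variable names, and synthetic data are illustrative assumptions, not the lecture's own code:

```python
import numpy as np

def em_naive_bayes_mixture(X, n_classes=2, iterations=50, seed=0):
    """EM for a naive-Bayes mixture: hidden class C, binary attributes X_1..X_n
    independent given C. E-step: complete the data with the posterior over C.
    M-step: re-estimate the parameters from the expected counts."""
    rng = np.random.default_rng(seed)
    n_cases, n_attrs = X.shape
    prior = np.full(n_classes, 1.0 / n_classes)             # P(C = c)
    theta = rng.uniform(0.25, 0.75, (n_classes, n_attrs))   # P(X_j = 1 | C = c)

    for _ in range(iterations):
        # E-step: responsibility P(C = c | x[i]) for every data case.
        log_lik = (X[:, None, :] * np.log(theta)[None]
                   + (1 - X[:, None, :]) * np.log(1 - theta)[None]).sum(axis=2)
        log_post = np.log(prior)[None] + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: maximize using expected counts (weighted counting).
        Nc = resp.sum(axis=0)                 # expected class counts
        prior = Nc / n_cases
        theta = (resp.T @ X) / Nc[:, None]
        theta = np.clip(theta, 1e-6, 1 - 1e-6)  # avoid log(0) in the next E-step

    return prior, theta

# Usage on synthetic binary data (illustrative assumption only):
rng = np.random.default_rng(1)
X = np.vstack([rng.binomial(1, 0.9, (100, 4)), rng.binomial(1, 0.1, (100, 4))])
prior, theta = em_naive_bayes_mixture(X)
```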
