
CCF Academic Salon on the Application and Development of Bayesian Networks in China


Presentation Transcript


  1. CCF Academic Salon on the Application and Development of Bayesian Networks in China. BN Theory Research and Applications at The Hong Kong University of Science and Technology. 2012-05-22

  2. Overview
  • Early Work (1992-2002): Inference: Variable Elimination; Inference: Local Structures; Others: Learning, Decision Making, Book
  • Latent Tree Models (2000 - ): Theory and Algorithms; Applications (Multidimensional Clustering, Density Estimation, Latent Structure; Survey Data, Documents, Business Data; Traditional Chinese Medicine (TCM)); Extensions

  3. Bayesian Networks

  4. Variable Elimination: Idea • Papers: • N. L. Zhang and D. Poole (1994), A simple approach to Bayesian network computations, in Proc. of the 10th Canadian Conference on Artificial Intelligence, Banff, Alberta, Canada, May 16-22. • N. L. Zhang and D. Poole (1996), Exploiting causal independence in Bayesian network inference, Journal of Artificial Intelligence Research, 5: 301-328.

  5. Variable Elimination

  6. Variable Elimination
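To make the idea concrete, here is a minimal sketch of variable elimination (my own illustration, not the implementation from the papers above), assuming binary variables and factors stored as dictionaries from value tuples to probabilities:

    from itertools import product

    # A factor is a dict mapping assignments (tuples of 0/1 values) to numbers;
    # the factor's variables are kept in a separate list.
    def multiply(f1, vars1, f2, vars2):
        """Pointwise product of two factors; returns (factor, variable list)."""
        out_vars = list(dict.fromkeys(vars1 + vars2))
        out = {}
        for assign in product([0, 1], repeat=len(out_vars)):
            a = dict(zip(out_vars, assign))
            out[assign] = f1[tuple(a[v] for v in vars1)] * f2[tuple(a[v] for v in vars2)]
        return out, out_vars

    def sum_out(f, vars_, var):
        """Eliminate `var` from factor `f` by summing it out."""
        keep = [v for v in vars_ if v != var]
        out = {}
        for assign, p in f.items():
            key = tuple(v for v, name in zip(assign, vars_) if name != var)
            out[key] = out.get(key, 0.0) + p
        return out, keep

    # Tiny example: P(B) = sum_A P(A) P(B|A)
    pA = {(0,): 0.6, (1,): 0.4}                      # P(A)
    pBA = {(0, 0): 0.9, (0, 1): 0.1,                 # P(B|A), indexed as (A, B)
           (1, 0): 0.2, (1, 1): 0.8}
    joint, jvars = multiply(pA, ["A"], pBA, ["A", "B"])
    pB, _ = sum_out(joint, jvars, "A")
    print(pB)                                        # ~ {(0,): 0.62, (1,): 0.38}

A full query is answered by repeating these two steps, multiplying the factors that mention a variable and then summing that variable out, for each non-query variable in some elimination order.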

  7. Variable Elimination: first BN inference algorithm • Russell & Norvig wrote on page 529: • “The algorithm we describe is closest to that developed by Zhang and Poole (1994, 1996)” • Koller and Friedman wrote: • “… the variable elimination algorithm, as presented here, first described by Zhang and Poole (1994), …” • The K&F book cites 7 of our papers

  8. Local Structure

  9. Local Structures: Causal Independence • Papers: • N. L. Zhang and D. Poole (1996), Exploiting causal independence in Bayesian network inference, Journal of Artificial Intelligence Research, 5: 301-328. • N. L. Zhang and D. Poole (1994), Intercausal independence and heterogeneous factorization, in Proc. of the 10th Conference on Uncertainty in Artificial Intelligence, Seattle, USA, July 29-31.

  10. Local Structures: Causal Independence
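The slides do not reproduce the details here; the standard textbook example of causal independence is the noisy-OR gate. With causes C1, …, Cn of an effect E, λi the probability that Ci alone turns E on, and λ0 a leak probability, the conditional distribution is

  P(E = 0 | C1, …, Cn) = (1 - λ0) ∏_{i: Ci = 1} (1 - λi),

so a table that would need 2^n entries is specified by n + 1 parameters and decomposes into per-cause contributions, the kind of structure the papers above exploit during inference.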

  11. Local Structure: Context-Specific Independence • Papers: • N. L. Zhang and D. Poole (1999), On the role of context-specific independence in probabilistic reasoning, IJCAI-99, 1288-1293. • D. Poole and N. L. Zhang (2003), Exploiting contextual independence in probabilistic inference, Journal of Artificial Intelligence Research, 18: 263-313.

  12. Other Works • Parameter Learning • N. L. Zhang (1996), Irrelevance and parameter learning in Bayesian networks, Artificial Intelligence, An International Journal, 88: 359-373. • Decision Making • N. L. Zhang (1998), Probabilistic inference in influence diagrams, Computational Intelligence, 14(4): 475-497. • N. L. Zhang, R. Qi and D. Poole (1994), A computational theory of decision networks, International Journal of Approximate Reasoning, 11(2): 83-158. • PhD Thesis

  13. Other Works

  14. Overview
  • Early Work (1992-2002): Inference: Variable Elimination; Inference: Local Structures; Others: Learning, Decision Making
  • Latent Tree Models (2000 - ): Theory and Algorithms; Applications (Multidimensional Clustering, Density Estimation, Latent Structure; Survey Data, Documents, Business Data; Traditional Chinese Medicine (TCM)); Extensions

  15. Latent Tree Models: Overview • Concept first mentioned by Pearl (1988). We were the first to conduct systematic research on LTMs. • N. L. Zhang (2002). Hierarchical latent class models for cluster analysis. AAAI-02, 230-237. • N. L. Zhang (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5(6): 697-723. • Early followers: Aalborg University (Denmark), Norwegian University of Science and Technology • Recent papers from: MIT, CMU, USC, Georgia Tech, Edinburgh

  16. Latent Tree Models • Recent survey by a French researcher:

  17. Latent Tree Models (LTM) • Bayesian networks with: • Rooted tree structure • Discrete random variables • Leaves observed (manifest variables) • Internal nodes latent (latent variables) • Also known as hierarchical latent class (HLC) models • Parameters: P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), …
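Spelling out the parameterization listed above: as in any Bayesian network, the joint distribution of an LTM is the product of one conditional distribution per node given its parent (a marginal for the root), and the distribution over the manifest variables is obtained by summing out the latent ones. For a structure with root Y1, latent child Y2 and leaves X1, X2, …:

  P(Y1, Y2, X1, X2, …) = P(Y1) P(Y2|Y1) P(X1|Y2) P(X2|Y2) …
  P(X1, X2, …) = Σ_{Y1, Y2} P(Y1) P(Y2|Y1) P(X1|Y2) P(X2|Y2) …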

  18. Example • Manifest variables • Math Grade, Science Grade, Literature Grade, History Grade • Latent variables • Analytic Skill, Literal Skill, Intelligence

  19. Theory: Root Walking and Model Equivalence • M1: root walks to X2; M2: root walks to X3 • Root walking leads to equivalent models on manifest variables • Implications: • Cannot determine edge orientation from data • Can only learn unrooted models

  20. Regularity • Regular latent tree models: For any latent node Z with neighbors X1, X2, …, • Can focus on regular models only • Irregular models can be made regular • Regularized models better than irregular models • The set of all such models is finite.
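The slide leaves the condition itself implicit. As recalled from Zhang (2004) (treat the exact form as my reconstruction), regularity bounds the cardinality of each latent node Z with neighbors X1, …, Xk:

  |Z| ≤ (|X1| · |X2| · … · |Xk|) / max_i |Xi|,

with the inequality strict when Z has only two neighbors. Any irregular model can be transformed into a regular one that is at least as good (the slide's "regularized models better than irregular models"), which is why the search can be restricted to the finite set of regular models.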

  21. Effective Dimension • Standard dimension: number of free parameters • Effective dimension: • X1, X2, …, Xn: observed variables • P(X1, X2, …, Xn) is a point in a high-dimensional space for each value of the parameters • Spans a manifold as the parameter value varies • Effective dimension: dimension of the manifold • Parsimonious model: standard dimension = effective dimension • Open question: how to test parsimony?
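One standard characterization, not spelled out on the slide but recalled from the asymptotic model-selection literature: view the model as a map from the parameter vector θ to the joint distribution P(X1, …, Xn; θ); the effective dimension is the maximal rank of the Jacobian of this map,

  de = max_θ rank( ∂P(X1, …, Xn; θ) / ∂θ ).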

  22. Effective Dimension • Paper: N. L. Zhang and Tomas Kocka (2004). Effective dimensions of hierarchical latent class models. Journal of Artificial Intelligence Research, 21: 1-17. • Open question: effective dimension of an LTM with a single latent variable

  23. Learning Latent Tree Models Determine • Number of latent variables • Cardinality of each latent variable • Model Structure • Conditional probability distributions

  24. Search-Based Learning: Model Selection • Bayesian score: posterior probability P(m|D) • P(m|D) = P(m) ∫ P(D|m, θ) dθ / P(D) • BIC score: large-sample approximation BIC(m|D) = log P(D|m, θ*) – d log N / 2 • BICe score: BICe(m|D) = log P(D|m, θ*) – de log N / 2, where de is the effective dimension • Effective dimensions are difficult to compute • BICe is therefore not realistic in practice
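A trivial sketch of the BIC computation above (the function name and arguments are mine; the maximized log-likelihood log P(D|m, θ*) would come from fitting the candidate latent tree model, e.g. with EM, which is not shown here):

    import math

    def bic_score(loglik, num_free_params, sample_size):
        """BIC(m|D) = log P(D | m, theta*) - d * log(N) / 2, as on the slide."""
        return loglik - num_free_params * math.log(sample_size) / 2.0

Model selection then picks the candidate structure with the largest score; BICe would be the same formula with the effective dimension de in place of d.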

  25. Search Algorithms • Papers: • T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence,  176(1), 2246-2269. • N. L. Zhang and T. Kocka (2004). Efficient Learning of Hierarchical Latent Class Models. ICTAI-2004 • Double hill climbing (DHC), 2002 • 7 manifest variables. • Single hill climbing (SHC), 2004 • 12 manifest variables • Heuristic SHC (HSHC), 2004 • 50 manifest variables • EAST, 2011 • 100+ manifest variables • Recent fast algorithm for specific applications.

  26. Illustration of the search process

  27. Algorithms by Others • Variable clustering method • S. Harmeling and C. K. I. Williams (2011). Greedy learning of binary latent trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(6), 1087-1097. • Raphaël Mourad, Christine Sinoquet, Philippe Leray (2011). A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies. BMC Bioinformatics, 12:16, doi:10.1186/1471-2105-12-16. • Fast, but model quality may be poor • Adaptation of evolution tree algorithms • Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, and Alan S. Willsky (2011). Learning latent tree graphical models. Journal of Machine Learning Research, 12 (2011). • Fast, has a consistency proof, but for special LTMs only

  28. Overview
  • Early Work (1992-2002): Inference: Variable Elimination; Inference: Local Structures; Others: Learning, Decision Making
  • Latent Tree Models (2000 - ): Theory and Algorithms; Applications (Multidimensional Clustering, Density Estimation, Latent Structure; Survey Data, Documents, Business Data; Traditional Chinese Medicine (TCM)); Extensions

  29. Density Estimation • Characteristics of LTMs • Are computationally very simple to work with. • Can represent complex relationships among manifest variables. • Useful tool for density estimation.

  30. Density Estimation • New approximate inference algorithm for Bayesian networks (Wang, Zhang and Chen, AAAI-08, Exceptional Paper) • [Figure: Sample vs. LTAB algorithm on sparse and dense networks]

  31. Multidimensional Clustering • Paper: T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1), 2246-2269. • Cluster analysis: grouping of objects into clusters so that objects in the same cluster are similar in some sense

  32. How to Cluster Those?

  33. How to Cluster Those? • Style of picture

  34. How to Cluster Those? • Type of object in picture

  35. How to Cluster Those? • Multidimensional clustering / multi-clustering • How to partition data in multiple ways? • Latent tree models

  36. Latent Tree Models & Multidimensional Clustering • Model relationship between • Observed / Manifest variables • Math Grade, Science Grade, Literature Grade, History Grade • Latent variables • Analytic Skill, Literal Skill, Intelligence • Each latent variable gives a partition • Intelligence: Low, medium, high • Analytic skill: Low, medium, high

  37. ICAC Data // 31 variables, 1200 samples
  C_City: s0 s1 s2 s3 // very common, quite common, uncommon, ...
  C_Gov: s0 s1 s2 s3
  C_Bus: s0 s1 s2 s3
  Tolerance_C_Gov: s0 s1 s2 s3 // totally intolerable, intolerable, tolerable, ...
  Tolerance_C_Bus: s0 s1 s2 s3
  WillingReport_C: s0 s1 s2 // yes, no, depends
  LeaveContactInfo: s0 s1 // yes, no
  I_EncourageReport: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
  I_Effectiveness: s0 s1 s2 s3 s4 // very effective, effective, average, ineffective, very ineffective
  I_Deterrence: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
  …
  -1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
  -1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
  -1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
  …
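A hedged reading of the data block above (my assumptions, not stated on the slide: -1 marks a missing answer and the trailing 1.0 is a case weight):

    def parse_icac_row(line, num_vars=31):
        """Split one row into 31 answers (None = missing) and a case weight."""
        tokens = line.split()
        values = [None if t == "-1" else int(t) for t in tokens[:num_vars]]
        weight = float(tokens[num_vars])
        return values, weight

    values, weight = parse_icac_row(
        "-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0")

Rows with missing answers remain usable, since EM-based parameter estimation handles missing values naturally.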

  38. Latent Structure Discovery Y2: Demographic info; Y3: Tolerance toward corruption Y4: ICAC performance; Y7: ICAC accountability Y5: Change in level of corruption; Y6: Level of corruption

  39. Multidimensional Clustering Y2=s0: Low income youngsters; Y2=s1: Women with no/low income Y2=s2: people with good education and good income; Y2=s3: people with poor education and average income

  40. Multidimensional Clustering • Y3=s0: people who find corruption totally intolerable; 57% • Y3=s1: people who find corruption intolerable; 27% • Y3=s2: people who find corruption tolerable; 15% • Interesting finding: • Y3=s2: 29+19=48% find C-Gov totally intolerable or intolerable; 5% for C-Bus • Y3=s1: 54% find C-Gov totally intolerable; 2% for C-Bus • Y3=s0: same attitude toward C-Gov and C-Bus • People who are tough on corruption are equally tough toward C-Gov and C-Bus. People who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.

  41. Multidimensional Clustering • Interesting finding: relationship between background and tolerance toward corruption • Y2=s2 (good education and good income): the least tolerant; 4% tolerable • Y2=s3 (poor education and average income): the most tolerant; 32% tolerable • The other two classes are in between.

  42. Marketing Data

  43. Latent Tree Analysis of Text Data • The WebKB data set • 1041 web pages collected from 4 CS departments in 1997 • 336 words

  44. Latent Tree Model for WebKB Data by the BI Algorithm • 89 latent variables

  45. Latent Tree Models for WebKB Data

  46.

  47.

  48. LTM for Topic Detection • Topic: • A latent state • A collection of documents • A document can belong to multiple topics 100%

  49. LTM vs LDA for Topic Detection • LTM • Topic: • A latent state • A collection of documents • A document can belong to multiple topics 100% • LDA • Topic: • Distribution over the entire vocabulary • The probabilities of the words add to one • Document: • Distribution over topics • If a document contains more of one topic, then it contains less of other topics

  50. Latent Tree Analysis: Summary • Finds meaningful facets of data • Identifies natural clusters along each facet • Gives a clear picture of what is in the data
