
CCF Academic Salon on the Application and Development of Bayesian Networks in China


Presentation Transcript


  1. CCF Academic Salon on the Application and Development of Bayesian Networks in China. BN Theory Research and Applications at The Hong Kong University of Science and Technology. 2012-05-22

  2. Overview
  • Early Work (1992-2002): Inference: Variable Elimination; Inference: Local Structures; Others: Learning, Decision Making, Book
  • Latent Tree Models (2000 - ): Theory and Algorithms; Applications (Multidimensional Clustering, Density Estimation, Latent Structure; Survey Data, Documents, Business Data; Traditional Chinese Medicine (TCM)); Extensions

  3. Bayesian Networks

  4. Variable Elimination: Idea • Papers: • N. L. Zhang and D. Poole (1994), A simple approach to Bayesian network computations, in Proc. of the 10th Canadian Conference on Artificial Intelligence, Banff, Alberta, Canada, May 16-22. • N. L. Zhang and D. Poole (1996), Exploiting causal independence in Bayesian network inference, Journal of Artificial Intelligence Research, 5: 301-328.

  5. Variable Elimination

  6. Variable Elimination
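To make the idea concrete, here is a minimal sketch of variable elimination (my own illustration, not the implementation from the papers above), assuming binary variables and factors stored as dictionaries from value tuples to probabilities:

    from itertools import product

    # A factor is a dict mapping assignments (tuples of 0/1 values) to numbers;
    # the factor's variables are kept in a separate list.
    def multiply(f1, vars1, f2, vars2):
        """Pointwise product of two factors; returns (factor, variable list)."""
        out_vars = list(dict.fromkeys(vars1 + vars2))
        out = {}
        for assign in product([0, 1], repeat=len(out_vars)):
            a = dict(zip(out_vars, assign))
            out[assign] = f1[tuple(a[v] for v in vars1)] * f2[tuple(a[v] for v in vars2)]
        return out, out_vars

    def sum_out(f, vars_, var):
        """Eliminate `var` from factor `f` by summing it out."""
        keep = [v for v in vars_ if v != var]
        out = {}
        for assign, p in f.items():
            key = tuple(v for v, name in zip(assign, vars_) if name != var)
            out[key] = out.get(key, 0.0) + p
        return out, keep

    # Tiny example: P(B) = sum_A P(A) P(B|A)
    pA = {(0,): 0.6, (1,): 0.4}                      # P(A)
    pBA = {(0, 0): 0.9, (0, 1): 0.1,                 # P(B|A), indexed as (A, B)
           (1, 0): 0.2, (1, 1): 0.8}
    joint, jvars = multiply(pA, ["A"], pBA, ["A", "B"])
    pB, _ = sum_out(joint, jvars, "A")
    print(pB)                                        # ~ {(0,): 0.62, (1,): 0.38}

A full query is answered by repeating these two steps, multiplying the factors that mention a variable and then summing that variable out, for each non-query variable in some elimination order.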

  7. Variable Elimination: first BN inference algorithm • Russell & Norvig wrote on page 529: • “The algorithm we describe is closest to that developed by Zhang and Poole (1994, 1996)” • Koller and Friedman wrote: • “… the variable elimination algorithm, as presented here, first described by Zhang and Poole (1994), …” • The K&F book cites 7 of our papers

  8. Local Structure

  9. Local Structures: Causal Independence • Papers: • N. L. Zhang and D. Poole (1996), Exploiting causal independence in Bayesian network inference, Journal of Artificial Intelligence Research, 5: 301-328. • N. L. Zhang and D. Poole (1994), Intercausal independence and heterogeneous factorization, in Proc. of the 10th Conference on Uncertainty in Artificial Intelligence, Seattle, USA, July 29-31.

  10. Local Structures: Causal Independence
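The slides do not reproduce the details here; the standard textbook example of causal independence is the noisy-OR gate. With causes C1, …, Cn of an effect E, λi the probability that Ci alone turns E on, and λ0 a leak probability, the conditional distribution is

  P(E = 0 | C1, …, Cn) = (1 - λ0) ∏_{i: Ci = 1} (1 - λi),

so a table that would need 2^n entries is specified by n + 1 parameters and decomposes into per-cause contributions, the kind of structure the papers above exploit during inference.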

  11. Local Structure: Context-Specific Independence • Papers: • N. L. Zhang and D. Poole (1999), On the role of context-specific independence in probabilistic reasoning, IJCAI-99, 1288-1293. • D. Poole and N. L. Zhang (2003), Exploiting contextual independence in probabilistic inference, Journal of Artificial Intelligence Research, 18: 263-313.

  12. Other Works • Parameter Learning • N. L. Zhang (1996), Irrelevance and parameter learning in Bayesian networks, Artificial Intelligence, An International Journal, 88: 359-373. • Decision Making • N. L. Zhang (1998), Probabilistic inference in influence diagrams, Computational Intelligence, 14(4): 475-497. • N. L. Zhang, R. Qi and D. Poole (1994), A computational theory of decision networks, International Journal of Approximate Reasoning, 11(2): 83-158. • PhD Thesis

  13. Other Works

  14. Overview
  • Early Work (1992-2002): Inference: Variable Elimination; Inference: Local Structures; Others: Learning, Decision Making
  • Latent Tree Models (2000 - ): Theory and Algorithms; Applications (Multidimensional Clustering, Density Estimation, Latent Structure; Survey Data, Documents, Business Data; Traditional Chinese Medicine (TCM)); Extensions

  15. Latent Tree Models: Overview • Concept first mentioned by Pearl (1988). We were the first to conduct systematic research on LTMs. • N. L. Zhang (2002). Hierarchical latent class models for cluster analysis. AAAI-02, 230-237. • N. L. Zhang (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5(6): 697-723. • Early followers: Aalborg University (Denmark), Norwegian University of Science and Technology • Recent papers from: MIT, CMU, USC, Georgia Tech, Edinburgh

  16. Latent Tree Models • Recent survey by a French researcher:

  17. Latent Tree Models (LTM) • Bayesian networks with: • Rooted tree structure • Discrete random variables • Leaves observed (manifest variables) • Internal nodes latent (latent variables) • Also known as hierarchical latent class (HLC) models • Parameters: P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), …
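Spelling out the parameterization listed above: as in any Bayesian network, the joint distribution of an LTM is the product of one conditional distribution per node given its parent (a marginal for the root), and the distribution over the manifest variables is obtained by summing out the latent ones. For a structure with root Y1, latent child Y2 and leaves X1, X2, …:

  P(Y1, Y2, X1, X2, …) = P(Y1) P(Y2|Y1) P(X1|Y2) P(X2|Y2) …
  P(X1, X2, …) = Σ_{Y1, Y2} P(Y1) P(Y2|Y1) P(X1|Y2) P(X2|Y2) …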

  18. Example • Manifest variables • Math Grade, Science Grade, Literature Grade, History Grade • Latent variables • Analytic Skill, Literal Skill, Intelligence

  19. Theory: Root Walking and Model Equivalence • M1: root walks to X2; M2: root walks to X3 • Root walking leads to equivalent models on manifest variables • Implications: • Cannot determine edge orientation from data • Can only learn unrooted models

  20. Regularity • Regular latent tree models: For any latent node Z with neighbors X1, X2, …, • Can focus on regular models only • Irregular models can be made regular • Regularized models better than irregular models • The set of all such models is finite.
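The slide leaves the condition itself implicit. As recalled from Zhang (2004) (treat the exact form as my reconstruction), regularity bounds the cardinality of each latent node Z with neighbors X1, …, Xk:

  |Z| ≤ (|X1| · |X2| · … · |Xk|) / max_i |Xi|,

with the inequality strict when Z has only two neighbors. Any irregular model can be transformed into a regular one that is at least as good (the slide's "regularized models better than irregular models"), which is why the search can be restricted to the finite set of regular models.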

  21. Effective Dimension • Standard dimension: number of free parameters • Effective dimension: • X1, X2, …, Xn: observed variables • P(X1, X2, …, Xn) is a point in a high-dimensional space for each value of the parameters • Spans a manifold as the parameter value varies • Effective dimension: dimension of the manifold • Parsimonious model: standard dimension = effective dimension • Open question: how to test parsimony?
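One standard characterization, not spelled out on the slide but recalled from the asymptotic model-selection literature: view the model as a map from the parameter vector θ to the joint distribution P(X1, …, Xn; θ); the effective dimension is the maximal rank of the Jacobian of this map,

  de = max_θ rank( ∂P(X1, …, Xn; θ) / ∂θ ).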

  22. Effective Dimension • Paper: N. L. Zhang and Tomas Kocka (2004). Effective dimensions of hierarchical latent class models. Journal of Artificial Intelligence Research, 21: 1-17. • Open question: effective dimension of an LTM with a single latent variable

  23. Learning Latent Tree Models Determine • Number of latent variables • Cardinality of each latent variable • Model Structure • Conditional probability distributions

  24. Search-Based Learning: Model Selection • Bayesian score: posterior probability P(m|D) • P(m|D) = P(m) ∫ P(D|m, θ) dθ / P(D) • BIC score: large-sample approximation BIC(m|D) = log P(D|m, θ*) – d log N / 2 • BICe score: BICe(m|D) = log P(D|m, θ*) – de log N / 2, where de is the effective dimension • Effective dimensions are difficult to compute • BICe is therefore not realistic in practice
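A trivial sketch of the BIC computation above (the function name and arguments are mine; the maximized log-likelihood log P(D|m, θ*) would come from fitting the candidate latent tree model, e.g. with EM, which is not shown here):

    import math

    def bic_score(loglik, num_free_params, sample_size):
        """BIC(m|D) = log P(D | m, theta*) - d * log(N) / 2, as on the slide."""
        return loglik - num_free_params * math.log(sample_size) / 2.0

Model selection then picks the candidate structure with the largest score; BICe would be the same formula with the effective dimension de in place of d.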

  25. Search Algorithms • Papers: • T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence,  176(1), 2246-2269. • N. L. Zhang and T. Kocka (2004). Efficient Learning of Hierarchical Latent Class Models. ICTAI-2004 • Double hill climbing (DHC), 2002 • 7 manifest variables. • Single hill climbing (SHC), 2004 • 12 manifest variables • Heuristic SHC (HSHC), 2004 • 50 manifest variables • EAST, 2011 • 100+ manifest variables • Recent fast algorithm for specific applications.

  26. Illustration of the search process

  27. Algorithms by Others • Variable clustering method • S. Harmeling and C. K. I. Williams (2011). Greedy learning of binary latent trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(6), 1087-1097. • Raphaël Mourad, Christine Sinoquet, Philippe Leray (2011). A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies. BMC Bioinformatics, 12:16, doi:10.1186/1471-2105-12-16. • Fast, but model quality may be poor • Adaptation of evolution tree algorithms • Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, and Alan S. Willsky (2011). Learning latent tree graphical models. Journal of Machine Learning Research, 12 (2011). • Fast, has a consistency proof, but for special LTMs only

  28. Overview
  • Early Work (1992-2002): Inference: Variable Elimination; Inference: Local Structures; Others: Learning, Decision Making
  • Latent Tree Models (2000 - ): Theory and Algorithms; Applications (Multidimensional Clustering, Density Estimation, Latent Structure; Survey Data, Documents, Business Data; Traditional Chinese Medicine (TCM)); Extensions

  29. Density Estimation • Characteristics of LTMs • Are computationally very simple to work with. • Can represent complex relationships among manifest variables. • Useful tool for density estimation.

  30. Density Estimation • New approximate inference algorithm for Bayesian networks (Wang, Zhang and Chen, AAAI-08, Exceptional Paper) • [Figure: Sample vs. LTAB algorithm on sparse and dense networks]

  31. Multidimensional Clustering • Paper: T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1), 2246-2269. • Cluster analysis: grouping of objects into clusters so that objects in the same cluster are similar in some sense

  32. How to Cluster Those?

  33. How to Cluster Those? • Style of picture

  34. How to Cluster Those? • Type of object in picture

  35. How to Cluster Those? • Multidimensional clustering / multi-clustering • How to partition data in multiple ways? • Latent tree models

  36. Latent Tree Models & Multidimensional Clustering • Model relationship between • Observed / Manifest variables • Math Grade, Science Grade, Literature Grade, History Grade • Latent variables • Analytic Skill, Literal Skill, Intelligence • Each latent variable gives a partition • Intelligence: Low, medium, high • Analytic skill: Low, medium, high

  37. ICAC Data // 31 variables, 1200 samples
  C_City: s0 s1 s2 s3 // very common, quite common, uncommon, ...
  C_Gov: s0 s1 s2 s3
  C_Bus: s0 s1 s2 s3
  Tolerance_C_Gov: s0 s1 s2 s3 // totally intolerable, intolerable, tolerable, ...
  Tolerance_C_Bus: s0 s1 s2 s3
  WillingReport_C: s0 s1 s2 // yes, no, depends
  LeaveContactInfo: s0 s1 // yes, no
  I_EncourageReport: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
  I_Effectiveness: s0 s1 s2 s3 s4 // very effective, effective, average, ineffective, very ineffective
  I_Deterrence: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
  …
  -1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
  -1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
  -1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
  …
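A hedged reading of the data block above (my assumptions, not stated on the slide: -1 marks a missing answer and the trailing 1.0 is a case weight):

    def parse_icac_row(line, num_vars=31):
        """Split one row into 31 answers (None = missing) and a case weight."""
        tokens = line.split()
        values = [None if t == "-1" else int(t) for t in tokens[:num_vars]]
        weight = float(tokens[num_vars])
        return values, weight

    values, weight = parse_icac_row(
        "-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0")

Rows with missing answers remain usable, since EM-based parameter estimation handles missing values naturally.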

  38. Latent Structure Discovery Y2: Demographic info; Y3: Tolerance toward corruption Y4: ICAC performance; Y7: ICAC accountability Y5: Change in level of corruption; Y6: Level of corruption

  39. Multidimensional Clustering Y2=s0: Low income youngsters; Y2=s1: Women with no/low income Y2=s2: people with good education and good income; Y2=s3: people with poor education and average income

  40. Multidimensional Clustering • Y3=s0: people who find corruption totally intolerable; 57% • Y3=s1: people who find corruption intolerable; 27% • Y3=s2: people who find corruption tolerable; 15% • Interesting finding: • Y3=s2: 29+19=48% find C-Gov totally intolerable or intolerable; 5% for C-Bus • Y3=s1: 54% find C-Gov totally intolerable; 2% for C-Bus • Y3=s0: same attitude toward C-Gov and C-Bus • People who are tough on corruption are equally tough toward C-Gov and C-Bus. People who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.

  41. Multidimensional Clustering • Interesting finding: relationship between background and tolerance toward corruption • Y2=s2 (good education and good income): the least tolerant; 4% tolerable • Y2=s3 (poor education and average income): the most tolerant; 32% tolerable • The other two classes are in between.

  42. Marketing Data

  43. Latent Tree Analysis of Text Data • The WebKB data set • 1041 web pages collected from 4 CS departments in 1997 • 336 words

  44. Latent Tree Model for WebKB Data by the BI Algorithm • 89 latent variables

  45. Latent Tree Models for WebKB Data

  46.

  47.

  48. LTM for Topic Detection • Topic: • A latent state • A collection of documents • A document can belong to multiple topics 100%

  49. LTM vs LDA for Topic Detection • LTM • Topic: • A latent state • A collection of documents • A document can belong to multiple topics 100% • LDA • Topic: • Distribution over the entire vocabulary • The probabilities of the words add to one • Document: • Distribution over topics • If a document contains more of one topic, then it contains less of other topics

  50. Latent Tree Analysis: Summary • Finds meaningful facets of data • Identifies natural clusters along each facet • Gives a clear picture of what is in the data
