
# L09: Probabilistic Models (PMs) for Classification and Clustering

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY
CSIT 600N: Reasoning and Decision under Uncertainty, Summer 2010
Nevin L. Zhang, Room 3504, phone: 2358-7015, Email: lzhang@cs.ust.hk

Outline:

- PMs for Classification
- PMs for Clustering: Continuous data
- PMs for Clustering: Discrete data


Classification

Given data, find a mapping

(A1, A2, …, An) ↦ C

Possible solutions:

- Artificial neural networks (ANNs)
- Decision trees (Quinlan)
- SVMs (for continuous data)

- The Naïve Bayes model often performs well in practice.
- Drawback of Naïve Bayes: it assumes the attributes are mutually independent given the class variable.
  - This assumption is often violated, leading to double counting of evidence.
- Fixes:
  - General BN classifiers
  - Tree-augmented Naïve Bayes (TAN) models
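As a concrete illustration of the independence assumption, a minimal Naïve Bayes classifier can be sketched as follows. This is an illustrative sketch (frequency-count estimates, no smoothing), not code from the course; the data and function names are made up:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate P(C) and P(A_i | C) by raw frequency counts."""
    prior = Counter(labels)                  # class counts
    cond = defaultdict(Counter)              # cond[(i, c)][v] = count of A_i = v when C = c
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, c)][v] += 1
    return prior, cond, len(labels)

def classify_nb(row, prior, cond, n):
    """arg max_c P(c) * prod_i P(a_i | c): attributes are treated as
    mutually independent given the class (the naive Bayes assumption)."""
    best, best_score = None, -1.0
    for c, cnt in prior.items():
        score = cnt / n
        for i, v in enumerate(row):
            score *= cond[(i, c)][v] / cnt
        if score > best_score:
            best, best_score = c, score
    return best
```

With no smoothing, an unseen attribute value zeroes out a class's score; in practice one would add Laplace smoothing to the counts.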

- General BN classifier:
  - Treat the class variable as just another variable.
  - Learn a BN.
  - Classify a new instance based on the values of the variables in the Markov blanket of the class variable.
  - Often performs poorly: classification relies only on the Markov blanket, so information from variables outside it is ignored.
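The Markov blanket used in that last step is the class variable's parents, its children, and its children's other parents. A small helper (illustrative; DAG given as parent sets) extracts it:

```python
def markov_blanket(parents, target):
    """parents: dict mapping each node to the set of its parents in the DAG.
    Returns MB(target) = parents(target) ∪ children(target) ∪
    the other parents of target's children ("spouses")."""
    children = {x for x, ps in parents.items() if target in ps}
    mb = set(parents.get(target, set())) | children
    for ch in children:
        mb |= parents[ch]          # spouses come in here
    mb.discard(target)
    return mb
```

Given the Markov blanket, P(C | all other variables) = P(C | MB(C)), which is why the classifier above looks at nothing else.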

- Tree-Augmented Naïve Bayes (TAN) model:
  - Captures dependence among attributes using a tree structure.
  - Learning:
    - First learn a tree among the attributes using the Chow-Liu algorithm.
    - This is a special structure-learning problem that is easy to solve.
    - Then add the class variable and estimate the parameters.
  - Classification: compute arg max_c P(C=c | A1=a1, …, An=an) by BN inference.
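The Chow-Liu step can be sketched as: estimate pairwise mutual information from the data, then take a maximum-weight spanning tree over the attributes. A pure-Python sketch (Prim's algorithm; function names are illustrative):

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical MI between attributes i and j (data: list of tuples)."""
    n = len(data)
    pij = Counter((r[i], r[j]) for r in data)
    pi = Counter(r[i] for r in data)
    pj = Counter(r[j] for r in data)
    # sum_{a,b} p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu_tree(data):
    """Maximum-weight spanning tree over attributes, weighted by
    pairwise mutual information (Chow & Liu, 1968)."""
    m = len(data[0])
    w = {(i, j): mutual_information(data, i, j)
         for i, j in combinations(range(m), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < m:
        # pick the heaviest edge crossing the cut (Prim's algorithm)
        best = max(((i, j) for (i, j) in w
                    if (i in in_tree) != (j in in_tree)),
                   key=lambda e: w[e])
        edges.append(best)
        in_tree |= set(best)
    return edges
```

For TAN, one would compute class-conditional mutual information instead of plain MI, then hang the class variable over every attribute; the tree-finding step is the same.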

PMs for Clustering: Continuous data

Outline:

- Gaussian distributions
- Parameter estimation for Gaussian distributions
- Gaussian mixtures
- Learning Gaussian mixtures

Bivariate Gaussian Distribution

Interactive demo: http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html
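For reference, the density surface the applet plots can be evaluated directly. A sketch in the standard parameterisation (means, standard deviations, correlation ρ):

```python
import math

def bivariate_normal_pdf(x, y, mx, my, sx, sy, rho):
    """Bivariate Gaussian density with means (mx, my), standard
    deviations (sx, sy), and correlation rho in (-1, 1)."""
    z = ((x - mx) ** 2 / sx ** 2
         - 2 * rho * (x - mx) * (y - my) / (sx * sy)
         + (y - my) ** 2 / sy ** 2)
    norm = 2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2)
    return math.exp(-z / (2 * (1 - rho ** 2))) / norm
```

When ρ = 0 the density factorises into the product of two univariate normals, which is the "spherical contours" case in the demo.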


Parameter Estimation

Example: given data, estimate the mean vector and the covariance matrix.
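The maximum-likelihood estimates are the sample mean vector and the 1/N-normalised sample covariance matrix. A pure-Python sketch (illustrative, no libraries):

```python
def mle_gaussian(data):
    """MLE of mean vector and covariance matrix for d-dimensional samples.
    data: list of length-d tuples. Uses the 1/N normaliser (the MLE),
    not the unbiased 1/(N-1)."""
    n, d = len(data), len(data[0])
    mean = [sum(x[k] for x in data) / n for k in range(d)]
    cov = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in data) / n
            for b in range(d)]
           for a in range(d)]
    return mean, cov
```

In practice one would use numpy (`np.mean`, `np.cov`), but the explicit sums show exactly which statistics the MLE needs.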


Learning Gaussian Mixture Models

MLE. Interactive demo: http://www.socr.ucla.edu/Applets.dir/MixtureEM.html
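The standard way to compute the mixture MLE is the EM algorithm, as in the demo above. A minimal 1-D sketch (deterministic initialisation, pure Python; details like the initialisation scheme are illustrative choices):

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(xs, k=2, iters=100):
    """EM for a 1-D Gaussian mixture: alternate soft assignments (E-step)
    with weighted MLE updates of weights, means, variances (M-step)."""
    lo, hi = min(xs), max(xs)
    mus = [lo + (hi - lo) * (j + 1) / (k + 1) for j in range(k)]  # spread means
    vars_ = [1.0] * k
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility resp[i][j] = P(component j | x_i)
        resp = []
        for x in xs:
            p = [ws[j] * normal_pdf(x, mus[j], vars_[j]) for j in range(k)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: weighted maximum-likelihood re-estimates
        for j in range(k):
            nj = sum(r[j] for r in resp)
            ws[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            vars_[j] = max(vars_[j], 1e-6)  # guard against variance collapse
    return ws, mus, vars_
```

Each iteration is guaranteed not to decrease the log-likelihood, but EM only finds a local maximum; the variance floor is a common practical guard.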

PMs for Clustering: Discrete data

Latent Tree Models

- LC (latent class) models make the local independence assumption: attributes are independent given the latent class variable.
  - This assumption is often not true.
- LT models generalize LC models by relaxing the independence assumption.
- Each latent variable gives a way to partition the data: multidimensional clustering.
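Structurally, an LC model is a naive Bayes model whose class variable is latent. Given its parameters, the soft partition of a data case is the posterior over latent classes, by Bayes' rule. A sketch (the parameters below are hand-picked illustration values, not learned from any data):

```python
def lc_posterior(attrs, prior, cond):
    """P(latent class | observed attributes) in a latent class model.
    prior: {class: P(class)};
    cond:  {(i, class): {value: P(A_i = value | class)}}."""
    joint = {}
    for c, p in prior.items():
        for i, v in enumerate(attrs):
            p *= cond[(i, c)][v]   # local independence: factors multiply
        joint[c] = p
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}
```

In an LT model each latent variable has its own such posterior, computed over the attributes it is connected to, which is what yields several simultaneous partitions of the same data.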

ICAC Data

```
// 31 variables, 1200 samples
C_City:            s0 s1 s2 s3     // very common, quite common, uncommon, ...
C_Gov:             s0 s1 s2 s3
C_Bus:             s0 s1 s2 s3
Tolerance_C_Gov:   s0 s1 s2 s3     // totally intolerable, intolerable, tolerable, ...
Tolerance_C_Bus:   s0 s1 s2 s3
WillingReport_C:   s0 s1 s2        // yes, no, depends
LeaveContactInfo:  s0 s1           // yes, no
I_EncourageReport: s0 s1 s2 s3 s4  // very sufficient, sufficient, average, ...
I_Effectiveness:   s0 s1 s2 s3 s4  // very effective, effective, average, ineffective, very ineffective
I_Deterrence:      s0 s1 s2 s3 s4  // very sufficient, sufficient, average, ...
...

-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
...
```

Latent Structure Discovery

- Y2: demographic info; Y3: tolerance toward corruption
- Y4: ICAC performance; Y7: ICAC accountability
- Y5: change in level of corruption; Y6: level of corruption

Interpreting Partitions

- Information curves:
  - The partition given by Y2 is based on Income, Age, Education, and Sex.
- Interpretation:
  - Y2 represents a partition of the population based on demographic information.
  - Y3 represents a partition based on tolerance toward corruption.

Interpreting Clusters

- Y2=s0: low-income youngsters
- Y2=s1: women with no or low income
- Y2=s2: people with good education and good income
- Y2=s3: people with poor education and average income

Interpreting Clustering

- Y3=s0: people who find corruption totally intolerable (57%)
- Y3=s1: people who find corruption intolerable (27%)
- Y3=s2: people who find corruption tolerable (15%)

Interesting finding:

- Y3=s2: 29+19 = 48% find C-Gov totally intolerable or intolerable, but only 5% for C-Bus.
- Y3=s1: 54% find C-Gov totally intolerable, but only 2% for C-Bus.
- Y3=s0: same attitude toward C-Gov and C-Bus.
- People who are tough on corruption are equally tough toward C-Gov and C-Bus.
- People who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.

Relationship Between Dimensions

Interesting finding: the relationship between background and tolerance toward corruption.

- Y2=s2 (good education and good income): the least tolerant; 4% find corruption tolerable.
- Y2=s3 (poor education and average income): the most tolerant; 32% find corruption tolerable.
- The other two classes are in between.

Result of LCA

- The partition found by LCA is not meaningful.
- Reason: the local independence assumption does not hold.
- Another way to look at it:
  - LCA assumes that all the manifest variables jointly define a meaningful way to cluster the data.
  - This is obviously not true for the ICAC data.
  - Instead, one should look for subsets of variables that do define meaningful partitions and perform cluster analysis on them.
  - This is what we do with LTA (latent tree analysis).