THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY
CSIT 600N: Reasoning and Decision under Uncertainty
Summer 2010

Nevin L. Zhang
Room 3504, phone: 2358-7015
Email: lzhang@cs.ust.hk
Home page


L09: Probabilistic Models (PMs) for Classification and Clustering

PMs for Classification

PMs for Clustering: Continuous data

PMs for Clustering: Discrete data


Classification

The problem:

Given data:

Find a mapping

(A1, A2, …, An) ↦ C

Possible solutions:

  • Artificial neural networks (ANNs)

  • Decision trees (Quinlan)

  • (SVMs: continuous data)





Bayesian Networks for Classification

  • Naïve Bayes model often has good performance in practice

  • Drawbacks of Naïve Bayes:

    • Assumes attributes are mutually independent given the class variable

    • This assumption is often violated, leading to double counting of evidence.

  • Fixes:

    • General BN classifiers

    • Tree augmented Naïve Bayes (TAN) models
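
The Naïve Bayes computation behind these slides, P(C | A1, …, An) ∝ P(C) ∏i P(Ai | C), can be sketched in a few lines. The toy data below is made up for illustration, and the add-one smoothing is an assumption not stated on the slide:

```python
import numpy as np

# Toy binary dataset (illustrative): columns are attributes A1, A2;
# the last column is the class C.
data = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
])
X, y = data[:, :-1], data[:, -1]

def naive_bayes_posterior(X, y, x_new):
    """P(C=c | x_new) for each class c under Naive Bayes, using
    add-one (Laplace) smoothing for the conditional counts."""
    classes = np.unique(y)
    log_post = []
    for c in classes:
        Xc = X[y == c]
        log_p = np.log(len(Xc) / len(X))              # log prior P(C=c)
        for j, v in enumerate(x_new):                 # attributes independent given C
            count = np.sum(Xc[:, j] == v)
            log_p += np.log((count + 1) / (len(Xc) + 2))  # P(Aj=v | C=c), binary Aj
        log_post.append(log_p)
    post = np.exp(log_post)
    return classes, post / post.sum()

classes, post = naive_bayes_posterior(X, y, np.array([1, 1]))
print(classes, post)   # class 1 gets most of the posterior mass here
```

Working in log space avoids underflow when the number of attributes is large.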


Bayesian Networks for Classification

  • General BN classifier

    • Treat class variable just as another variable

    • Learn a BN.

    • Classify the next instance based on values of variables in the Markov blanket of the class variable.

    • Can perform poorly: the learned Markov blanket may exclude attributes that carry useful information about the class.


Bayesian Networks for Classification

  • Tree-Augmented Naïve Bayes (TAN) model

    • Capture dependence among attributes using a tree structure.

    • During learning,

      • First learn a tree among attributes: use Chow-Liu algorithm

        • A special structure-learning problem that can be solved efficiently

      • Add class variable and estimate parameters

    • Classification

      • arg max_c P(C=c|A1=a1, …, An=an)

      • BN inference
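
The Chow-Liu step above can be sketched as follows: compute the empirical mutual information between each pair of attributes and take a maximum-weight spanning tree. The synthetic data and the Prim-style tree construction below are illustrative choices, not code from the slides:

```python
import numpy as np
from itertools import combinations

def mutual_information(a, b):
    """Empirical mutual information between two discrete columns."""
    mi = 0.0
    for va in np.unique(a):
        for vb in np.unique(b):
            p_ab = np.mean((a == va) & (b == vb))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(a == va) * np.mean(b == vb)))
    return mi

def chow_liu_tree(X):
    """Maximum-weight spanning tree over the attributes, with edges
    weighted by mutual information (Prim's algorithm)."""
    n_attrs = X.shape[1]
    weights = {(i, j): mutual_information(X[:, i], X[:, j])
               for i, j in combinations(range(n_attrs), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < n_attrs:
        # Pick the heaviest edge crossing the current tree boundary.
        best = max(((i, j) for (i, j) in weights
                    if (i in in_tree) ^ (j in in_tree)),
                   key=lambda e: weights[e])
        edges.append(best)
        in_tree.update(best)
    return edges

# Illustrative data: A2 copies A1 (strong dependence), A3 is independent.
rng = np.random.default_rng(0)
a1 = rng.integers(0, 2, 200)
X = np.column_stack([a1, a1.copy(), rng.integers(0, 2, 200)])
print(chow_liu_tree(X))   # the edge (0, 1) should appear in the tree
```

In the full TAN procedure the tree edges are then directed away from a root, the class variable is added as a parent of every attribute, and the parameters are estimated from counts.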


Outline

PMs for Classification

PMs for Clustering: Continuous data

  • Gaussian distributions

  • Parameter estimation for Gaussian distributions

  • Gaussian mixtures

  • Learning Gaussian mixtures

PMs for Clustering: Discrete data




Bivariate Gaussian Distribution

Demo: http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html



Outline

PMs for Classification

PMs for Clustering: Continuous data

  • Gaussian distributions

  • Parameter estimation for Gaussian distributions

  • Gaussian mixtures

  • Learning Gaussian mixtures

PMs for Clustering: Discrete data


Example

Data:

Mean vector

Covariance Matrix
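
The maximum-likelihood estimates named above (mean vector and covariance matrix) can be computed directly from a sample. The 2-D data below is made up for illustration; it is not the data from the slide:

```python
import numpy as np

# Illustrative 2-D sample, one row per observation.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])

mu = X.mean(axis=0)               # mean vector
diff = X - mu
sigma = diff.T @ diff / len(X)    # MLE covariance matrix (divide by N, not N-1)

print(mu)      # [2.5 2.5]
print(sigma)   # [[1.25 0.75]
               #  [0.75 1.25]]
```

Note that the MLE divides by N; the unbiased sample covariance (as computed by `np.cov` with its default settings) divides by N-1 instead.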


Outline

PMs for Classification

PMs for Clustering: Continuous data

  • Gaussian distributions

  • Parameter estimation for Gaussian distributions

  • Gaussian mixtures

  • Learning Gaussian mixtures

PMs for Clustering: Discrete data




Learning Gaussian Mixture Models


MLE


Demo: http://www.socr.ucla.edu/Applets.dir/MixtureEM.html
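
A minimal EM loop for a two-component 1-D Gaussian mixture, in the spirit of the applet above. The initialisation scheme and the synthetic data are assumptions for illustration, not the course's reference implementation:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (a minimal sketch).
    Returns mixing weights, component means, and component variances."""
    # Crude initialisation from the data range (an illustrative choice).
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibilities r[k, i] = P(component k | x_i).
        dens = np.array([w[k] / np.sqrt(2 * np.pi * var[k])
                         * np.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                         for k in range(2)])
        r = dens / dens.sum(axis=0)
        # M-step: re-estimate weights, means, and variances.
        nk = r.sum(axis=1)
        w = nk / len(x)
        mu = (r * x).sum(axis=1) / nk
        var = (r * (x - mu[:, None]) ** 2).sum(axis=1) / nk
    return w, mu, var

# Synthetic data: two well-separated components.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(6, 1, 300)])
w, mu, var = em_gmm_1d(x)
print(sorted(mu))   # means near 0 and 6
```

Each iteration provably does not decrease the log-likelihood, which is why EM is the standard fitting procedure for mixtures.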


L09: Probabilistic Models (PMs) for Classification and Clustering

PMs for Classification

PMs for Clustering: Continuous data

PMs for Clustering: Discrete data


L09: Probabilistic Models (PMs) for Classification and Clustering

PMs for Classification

PMs for Clustering: Continuous data

PMs for Clustering: Discrete data

  • A generalization


Latent Tree Models

  • Latent class (LC) models

    • Make the local independence assumption

    • Often not true in practice

  • Latent tree (LT) models generalize LC models

    • Relax the local independence assumption

    • Each latent variable gives a way to partition the data: multidimensional clustering


ICAC Data

// 31 variables, 1200 samples

C_City: s0 s1 s2 s3 // very common, quite common, uncommon, ..

C_Gov: s0 s1 s2 s3

C_Bus: s0 s1 s2 s3

Tolerance_C_Gov: s0 s1 s2 s3 //totally intolerable, intolerable, tolerable,...

Tolerance_C_Bus: s0 s1 s2 s3

WillingReport_C: s0 s1 s2 // yes, no, depends

LeaveContactInfo: s0 s1 // yes, no

I_EncourageReport:s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...

I_Effectiveness: s0 s1 s2 s3 s4 //very e, e, a, in-e, very in-e

I_Deterrence: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...

…..

-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0

-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0

-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0

….
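
A record in this format might be parsed as follows. The reading of -1 as a missing value and of the final field as a case weight is inferred from the excerpt above, not confirmed by the slide:

```python
# Parse ICAC-style records: space-separated integer codes for the 31
# variables, with -1 assumed to mark a missing value, followed by what
# appears to be a case weight (both are assumptions about the format).
raw = """-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0"""

records = []
for line in raw.splitlines():
    fields = line.split()
    values = [int(v) for v in fields[:-1]]   # 31 variable codes
    weight = float(fields[-1])               # assumed case weight
    records.append((values, weight))

missing = sum(v == -1 for values, _ in records for v in values)
print(len(records), missing)   # 2 records, 23 missing entries
```

With this much missingness, likelihood-based methods such as EM (which handle missing values naturally) are a better fit than methods that require complete cases.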


Latent Structure Discovery

Y2: Demographic info; Y3: Tolerance toward corruption

Y4: ICAC performance; Y7: ICAC accountability

Y5: Change in level of corruption; Y6: Level of corruption


Interpreting Partition

  • Information curves:

    • Partition of Y2 is based on Income, Age, Education, Sex

    • Interpretation: Y2 represents a partition of the population based on demographic information

    • Y3 represents a partition based on tolerance toward corruption


Interpreting Clusters

Y2=s0: low-income youngsters; Y2=s1: women with no/low income

Y2=s2: people with good education and good income

Y2=s3: people with poor education and average income


Interpreting Clustering

Y3=s0: people who find corruption totally intolerable; 57%

Y3=s1: people who find corruption intolerable; 27%

Y3=s2: people who find corruption tolerable; 15%

Interesting finding:

Y3=s2: 29+19=48% find C-Gov totally intolerable or intolerable; 5% for C-Bus

Y3=s1: 54% find C-Gov totally intolerable; 2% for C-Bus

Y3=s0: Same attitude toward C-Gov and C-Bus

People who are tough on corruption are equally tough toward C-Gov and C-Bus.

People who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.


Relationship Between Dimensions

Interesting finding: relationship between background and tolerance toward corruption

Y2=s2 (good education and good income): the least tolerant; 4% find corruption tolerable

Y2=s3 (poor education and average income): the most tolerant; 32% find corruption tolerable

The other two classes are in between.


Result of LCA

  • Partition not meaningful

  • Reason:

    • Local Independence not true

  • Another way to look at it

    • LCA assumes that all the manifest variables jointly define a meaningful way to cluster the data

    • This is obviously not true for the ICAC data

    • Instead, one should look for subsets of variables that do define meaningful partitions and perform cluster analysis on them

    • This is what we do with LTA

