THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGYCSIT 600N:  Reasoning and Decision under Uncertainty Summer 2010

Nevin L. ZhangRoom 3504, phone: 2358-7015,

Email: [email protected] page



PMs for Classification

PMs for Clustering: Continuous data

PMs for Clustering: Discrete data

L09: Probabilistic Models (PMs) for Classification and Clustering



The problem:

Given data:

Find mapping

(A1, A2, …, An) → C

Possible solutions

ANN

Decision tree (Quinlan)

(SVM: Continuous data)

Classification





Bayesian Networks for Classification

  • Naïve Bayes model often has good performance in practice

  • Drawbacks of Naïve Bayes:

    • Assumes attributes are mutually independent given the class variable

    • This assumption is often violated in practice, leading to double counting of evidence.

  • Fixes:

    • General BN classifiers

    • Tree augmented Naïve Bayes (TAN) models
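
The Naïve Bayes model above classifies by arg max_c P(c) ∏_i P(a_i | c), with the conditional probabilities estimated by counting. A minimal sketch for categorical attributes (generic illustration, not the course's code; the add-one smoothing is an assumption to avoid zero probabilities):

```python
from collections import Counter, defaultdict

def train_naive_bayes(data):
    """Estimate P(C) and P(Ai | C) from (attributes, class) pairs by counting."""
    class_counts = Counter(c for _, c in data)
    # attr_counts[(i, c)][a] = number of instances of class c with Ai = a
    attr_counts = defaultdict(Counter)
    for attrs, c in data:
        for i, a in enumerate(attrs):
            attr_counts[(i, c)][a] += 1
    return class_counts, attr_counts

def classify(attrs, class_counts, attr_counts):
    """Return arg max_c P(c) * prod_i P(ai | c)."""
    n = sum(class_counts.values())
    best_c, best_p = None, -1.0
    for c, nc in class_counts.items():
        p = nc / n
        for i, a in enumerate(attrs):
            # add-one (Laplace) smoothing so unseen values do not zero out p
            p *= (attr_counts[(i, c)][a] + 1) / (nc + len(attr_counts[(i, c)]) + 1)
        if p > best_p:
            best_c, best_p = c, p
    return best_c
```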


Bayesian Networks for Classification

  • General BN classifier

    • Treat class variable just as another variable

    • Learn a BN.

    • Classify the next instance based on values of variables in the Markov blanket of the class variable.

    • Often performs poorly: because of the Markov boundary, classification effectively uses only the variables in the class variable's Markov blanket, so it does not exploit all the available information.


Bayesian Networks for Classification

  • Tree-Augmented Naïve Bayes (TAN) model

    • Capture dependence among attributes using a tree structure.

    • During learning,

      • First learn a tree among attributes: use Chow-Liu algorithm

        • Special structure learning problem, easy

      • Add class variable and estimate parameters

    • Classification

      • arg max_c P(C=c|A1=a1, …, An=an)

      • BN inference
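
The Chow-Liu step above computes pairwise empirical mutual information between attributes and takes a maximum-weight spanning tree. A minimal sketch using Prim's algorithm (the function names are illustrative, not from the course):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # (c/n) * log( (c/n) / ((px/n)(py/n)) ) simplifies to the term below
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def chow_liu_tree(columns):
    """Maximum-weight spanning tree over attributes, weighted by mutual
    information (Prim's algorithm). columns[i] holds the data for attribute i."""
    m = len(columns)
    in_tree = {0}
    edges = []
    while len(in_tree) < m:
        _, i, j = max((mutual_information(columns[i], columns[j]), i, j)
                      for i in in_tree for j in range(m) if j not in in_tree)
        edges.append((i, j))
        in_tree.add(j)
    return edges
```

After the tree is built, the class variable is added as a parent of every attribute and the parameters are estimated by counting, as the slide describes.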



PMs for Classification

PMs for Clustering: Continuous data

  • Gaussian distributions

  • Parameter estimation for Gaussian distributions

  • Gaussian mixtures

  • Learning Gaussian mixtures

    PMs for Clustering: Discrete data

Outline




Bivariate Gaussian Distribution

Demo: http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html
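
For reference, the bivariate density N(x | mu, Sigma) = exp(-(1/2)(x-mu)^T Sigma^{-1} (x-mu)) / (2*pi*sqrt(|Sigma|)) can be evaluated directly. A small sketch (the helper name is illustrative):

```python
import numpy as np

def bivariate_gaussian_pdf(x, mu, sigma):
    """Density N(x | mu, sigma) for a 2-D Gaussian with mean vector mu
    and 2x2 covariance matrix sigma."""
    d = x - mu
    inv = np.linalg.inv(sigma)
    det = np.linalg.det(sigma)
    return np.exp(-0.5 * d @ inv @ d) / (2 * np.pi * np.sqrt(det))

# standard bivariate normal at the origin: density is 1 / (2*pi)
p = bivariate_gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2))
```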


Bivariate Gaussian Distribution





Data:

Example

Mean vector

Covariance Matrix
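
The maximum-likelihood estimate of the mean vector is the sample average, and the MLE of the covariance matrix divides by N (rather than the unbiased N-1). A minimal NumPy sketch with made-up data (the values are illustrative, not the slide's example):

```python
import numpy as np

# toy bivariate data, one row per sample (illustrative values)
X = np.array([[1.0, 2.0],
              [2.0, 4.1],
              [3.0, 5.9],
              [4.0, 8.2]])

mu = X.mean(axis=0)                         # mean vector: [2.5, 5.05]
# maximum-likelihood covariance divides by N (np.cov defaults to N-1)
sigma = np.cov(X, rowvar=False, bias=True)  # 2x2 covariance matrix
```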






Learning Gaussian Mixture Models


MLE
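
The transcript keeps only the slide title here; as a hedged illustration, maximum-likelihood fitting of a Gaussian mixture is typically done with the EM algorithm, alternating soft assignment of points to components (E-step) with weighted MLE updates of the weights, means, and variances (M-step). A one-dimensional sketch (the deterministic quantile initialisation is an assumption for illustration, and assumes k >= 2):

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(data, k=2, iters=100):
    """EM for a 1-D Gaussian mixture: alternate soft assignments (E-step)
    and weighted maximum-likelihood updates (M-step)."""
    srt = sorted(data)
    # deterministic quantile initialisation of the means (assumes k >= 2)
    mus = [srt[int(i * (len(data) - 1) / (k - 1))] for i in range(k)]
    vars_ = [1.0] * k
    ws = [1.0 / k] * k
    n = len(data)
    for _ in range(iters):
        # E-step: responsibilities r[i][j] proportional to w_j * N(x_i | mu_j, var_j)
        r = []
        for x in data:
            ps = [ws[j] * gaussian_pdf(x, mus[j], vars_[j]) for j in range(k)]
            s = sum(ps)
            r.append([p / s for p in ps])
        # M-step: weighted maximum-likelihood re-estimates
        for j in range(k):
            nj = sum(r[i][j] for i in range(n))
            ws[j] = nj / n
            mus[j] = sum(r[i][j] * data[i] for i in range(n)) / nj
            # small floor keeps a variance from collapsing to zero
            vars_[j] = sum(r[i][j] * (data[i] - mus[j]) ** 2
                           for i in range(n)) / nj + 1e-6
    return ws, mus, vars_
```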



Demo: http://www.socr.ucla.edu/Applets.dir/MixtureEM.html





PMs for Classification

PMs for Clustering: Continuous data

PMs for Clustering: Discrete data

  • A generalization

L09: Probabilistic Models (PMs) for Classification and Clustering


Latent Tree Models

  • Latent class (LC) models

    • local independence assumption

    • often not true

  • Latent tree (LT) models generalize LC models

    • Relax the independence assumption

    • Each latent variable gives a way to partition data… multidimensional clustering


ICAC Data

// 31 variables, 1200 samples

C_City: s0 s1 s2 s3 // very common, quite common, uncommon, ...

C_Gov: s0 s1 s2 s3

C_Bus: s0 s1 s2 s3

Tolerance_C_Gov: s0 s1 s2 s3 //totally intolerable, intolerable, tolerable,...

Tolerance_C_Bus: s0 s1 s2 s3

WillingReport_C: s0 s1 s2 // yes, no, depends

LeaveContactInfo: s0 s1 // yes, no

I_EncourageReport:s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...

I_Effectiveness: s0 s1 s2 s3 s4 //very e, e, a, in-e, very in-e

I_Deterrence: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...

…..

-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0

-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0

-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0

….


Latent Structure Discovery

Y2: Demographic info; Y3: Tolerance toward corruption

Y4: ICAC performance; Y7: ICAC accountability

Y5: Change in level of corruption; Y6: Level of corruption


Interpreting Partition

  • Information curves:

    • Partition of Y2 is based on Income, Age, Education, Sex

    • Interpretation: Y2 --- Represents a partition of the population based on demographic information

    • Y3 --- Represents a partition based on Tolerance toward Corruption


Interpreting Clusters

Y2=s0: Low income youngsters; Y2=s1: Women with no/low income

Y2=s2: people with good education and good income;

Y2=s3: people with poor education and average income


Interpreting Clustering

Y3=s0: people who find corruption totally intolerable; 57%

Y3=s1: people who find corruption intolerable; 27%

Y3=s2: people who find corruption tolerable; 15%

Interesting finding:

Y3=s2: 29+19=48% find C-Gov totally intolerable or intolerable; 5% for C-Bus

Y3=s1: 54% find C-Gov totally intolerable; 2% for C-Bus

Y3=s0: Same attitude toward C-Gov and C-Bus

People who are tough on corruption are equally tough toward C-Gov and C-Bus.

People who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.


Relationship Between Dimensions

Interesting finding: relationship between background and tolerance toward corruption

Y2=s2: (good education and good income) the least tolerant. 4% tolerable

Y2=s3: (poor education and average income) the most tolerant. 32% tolerable

The other two classes are in between.


Result of LCA

  • Partition not meaningful

  • Reason:

    • Local Independence not true

  • Another way to look at it

    • LCA assumes that all the manifest variables jointly define a meaningful way to cluster the data

    • Obviously not true for ICAC data

    • Instead, one should look for subsets of variables that do define meaningful partitions and perform cluster analysis on them

    • This is what we do with LTA

