

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY
CSIT 600N: Reasoning and Decision under Uncertainty, Summer 2010

Nevin L. Zhang
Room 3504, phone: 2358-7015, Email: lzhang@cs.ust.hk
Home page


L09: Probabilistic Models (PMs) for Classification and Clustering

PMs for Classification

PMs for Clustering: Continuous data

PMs for Clustering: Discrete data


Classification

The problem: given data, find a mapping

(A1, A2, …, An) ↦ C

Possible solutions:

  • Artificial neural networks (ANN)

  • Decision trees (Quinlan)

  • SVMs (continuous data)


Probabilistic Approach to Classification


Will Boss Play Tennis?


Bayesian Networks for Classification

  • The Naïve Bayes model often performs well in practice (a minimal sketch follows this list).

  • Drawback of Naïve Bayes:

    • It assumes the attributes are mutually independent given the class variable.

    • This assumption is often violated, leading to double counting of evidence.

  • Fixes:

    • General BN classifiers

    • Tree-augmented Naïve Bayes (TAN) models
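A minimal sketch of Naïve Bayes training and classification for discrete attributes. The Laplace smoothing constant, function names, and toy data are illustrative assumptions, not from the lecture.

from collections import Counter, defaultdict

def train_naive_bayes(X, y, alpha=1.0):
    """Estimate P(C) and P(A_i = a | C = c) with Laplace smoothing."""
    n = len(y)
    class_counts = Counter(y)
    attr_counts = Counter()            # (attribute index, value, class) -> count
    values = defaultdict(set)          # attribute index -> observed values
    for row, c in zip(X, y):
        for i, a in enumerate(row):
            attr_counts[(i, a, c)] += 1
            values[i].add(a)
    priors = {c: class_counts[c] / n for c in class_counts}

    def cond_prob(i, a, c):
        return (attr_counts[(i, a, c)] + alpha) / (class_counts[c] + alpha * len(values[i]))

    return priors, cond_prob

def classify(row, priors, cond_prob):
    """Return arg max_c P(C = c) * prod_i P(A_i = a_i | C = c)."""
    def score(c):
        p = priors[c]
        for i, a in enumerate(row):
            p *= cond_prob(i, a, c)
        return p
    return max(priors, key=score)

# Toy usage with two binary attributes.
priors, cond = train_naive_bayes([(0, 1), (1, 1), (1, 0)], ["yes", "yes", "no"])
print(classify((1, 1), priors, cond))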


Bayesian Networks for Classification

  • General BN classifier

    • Treat the class variable just like any other variable.

    • Learn a BN.

    • Classify a new instance based on the values of the variables in the Markov blanket of the class variable.

    • Often performs poorly: attributes outside the learned Markov blanket are ignored, so available information goes unused.


Bayesian Networks for Classification

  • Tree-Augmented Naïve Bayes (TAN) model

    • Captures dependence among attributes using a tree structure.

    • During learning:

      • First learn a tree over the attributes using the Chow-Liu algorithm

        • A special, tractable structure-learning problem (see the sketch after this list)

      • Then add the class variable and estimate the parameters

    • Classification:

      • arg max_c P(C = c | A1 = a1, …, An = an)

      • computed by BN inference
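The Chow-Liu step can be sketched as: score every attribute pair by empirical mutual information, then keep a maximum-weight spanning tree. The sketch below uses plain (unconditional) MI for brevity, while the TAN construction proper scores pairs by mutual information conditioned on the class; all names are illustrative.

import itertools
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def chow_liu_tree(columns):
    """Maximum spanning tree over pairwise MI (Kruskal with union-find)."""
    m = len(columns)
    edges = sorted(((mutual_information(columns[i], columns[j]), i, j)
                    for i, j in itertools.combinations(range(m), 2)),
                   reverse=True)
    parent = list(range(m))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path compression
            u = parent[u]
        return u

    tree = []
    for mi, i, j in edges:                  # heaviest edges first
        ri, rj = find(i), find(j)
        if ri != rj:                        # joins two components: keep it
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy usage: three attributes, four cases; columns 0 and 1 are identical.
print(chow_liu_tree([[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]))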


Outline

PMs for Classification

PMs for Clustering: Continuous data

  • Gaussian distributions

  • Parameter estimation for Gaussian distributions

  • Gaussian mixtures

  • Learning Gaussian mixtures

PMs for Clustering: Discrete data


Gaussian Distributions

  • Interactive density demo: http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html

  • Real-world examples of normal distributions?


Bivariate Gaussian Distribution
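For concreteness, a small sketch (not from the slides) that evaluates the bivariate Gaussian density N(x; mu, Sigma); the particular mean and covariance values are made up.

import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a d-dimensional Gaussian at point x."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])   # any symmetric positive-definite matrix works
print(gaussian_pdf(np.array([1.0, -1.0]), mu, sigma))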




Example

Given data, estimate the mean vector and the covariance matrix (a sketch follows).
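A minimal sketch of the computation: the maximum-likelihood estimates are the sample mean and the 1/N-normalized sample covariance. The data matrix below is made up for illustration.

import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])              # N rows, one observation per row

mu = X.mean(axis=0)                     # mean vector
centered = X - mu
cov = centered.T @ centered / len(X)    # MLE divides by N, not N - 1
print(mu)
print(cov)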






Learning Gaussian Mixture Models

  • Maximum likelihood estimation (MLE), computed with the EM algorithm (a sketch follows)

  • Demo applet: http://www.socr.ucla.edu/Applets.dir/MixtureEM.html
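In the spirit of the EM applet above, a minimal EM sketch for a two-component univariate Gaussian mixture; the initialization and the synthetic data are arbitrary assumptions.

import numpy as np

def em_gmm(x, iters=50):
    """EM for a two-component univariate Gaussian mixture."""
    w = np.array([0.5, 0.5])                   # mixing weights
    mu = np.array([x.min(), x.max()])          # crude initial means
    var = np.array([x.var(), x.var()])
    for _ in range(iters):
        # E-step: responsibilities r[n, k] = P(component k | x_n)
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from the responsibilities
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
print(em_gmm(x))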




L09: Probabilistic Models (PMs) for Classification and Clustering

PMs for Classification

PMs for Clustering: Continuous data

PMs for Clustering: Discrete data

  • A generalization


Latent Tree Models

  • Latent class (LC) models

    • make the local independence assumption,

    • which is often not true.

  • Latent tree (LT) models generalize LC models:

    • They relax the independence assumption.

    • Each latent variable gives a way to partition the data: multidimensional clustering.


ICAC Data

// 31 variables, 1200 samples

C_City: s0 s1 s2 s3 // very common, quite common, uncommon, ..

C_Gov: s0 s1 s2 s3

C_Bus: s0 s1 s2 s3

Tolerance_C_Gov: s0 s1 s2 s3 //totally intolerable, intolerable, tolerable,...

Tolerance_C_Bus: s0 s1 s2 s3

WillingReport_C: s0 s1 s2 // yes, no, depends

LeaveContactInfo: s0 s1 // yes, no

I_EncourageReport:s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...

I_Effectiveness: s0 s1 s2 s3 s4 // very effective, effective, average, ineffective, very ineffective

I_Deterrence:s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...

…..

-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0

-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0

-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0

….
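A hedged parsing sketch for case rows like the ones above. It assumes, though the slide does not say so, that -1 marks a missing answer and that the trailing 1.0 is a case weight; both are guesses.

def parse_case(line):
    """Split one case row into (answers, weight); None marks a missing answer."""
    fields = line.split()
    weight = float(fields[-1])                 # assumed: last field is a case weight
    answers = [None if f == "-1" else int(f)   # assumed: -1 means "not asked/answered"
               for f in fields[:-1]]
    return answers, weight

row = "-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0"
answers, weight = parse_case(row)
print(len(answers), "variables;", sum(a is None for a in answers), "missing; weight", weight)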


Latent Structure Discovery

Y2: Demographic info; Y3: Tolerance toward corruption

Y4: ICAC performance; Y7: ICAC accountability

Y5: Change in level of corruption; Y6: Level of corruption


Interpreting Partition

  • Information curves:

    • The partition given by Y2 is based on Income, Age, Education, and Sex.

    • Interpretation: Y2 represents a partition of the population based on demographic information.

    • Y3 represents a partition based on tolerance toward corruption.


Interpreting Clusters

Y2=s0: low-income youngsters; Y2=s1: women with no/low income

Y2=s2: people with good education and good income

Y2=s3: people with poor education and average income


Interpreting Clusters

Y3=s0: people who find corruption totally intolerable (57%)

Y3=s1: people who find corruption intolerable (27%)

Y3=s2: people who find corruption tolerable (15%)

Interesting finding:

Y3=s2: 29+19 = 48% find C-Gov totally intolerable or intolerable, but only 5% say the same of C-Bus.

Y3=s1: 54% find C-Gov totally intolerable, but only 2% say the same of C-Bus.

Y3=s0: same attitude toward C-Gov and C-Bus.

People who are tough on corruption are equally tough toward C-Gov and C-Bus; people who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.


Relationship Between Dimensions

Interesting finding: the relationship between background and tolerance toward corruption.

Y2=s2 (good education and good income): the least tolerant; 4% find corruption tolerable.

Y2=s3 (poor education and average income): the most tolerant; 32% find corruption tolerable.

The other two classes are in between.


Result of LCA

  • The partition is not meaningful.

  • Reason:

    • The local independence assumption does not hold.

  • Another way to look at it:

    • LCA assumes that all the manifest variables jointly define a meaningful way to cluster the data.

    • This is obviously not true for the ICAC data.

    • Instead, one should look for subsets of variables that do define meaningful partitions and perform cluster analysis on them.

    • This is what we do with LTA.

