
### Classification & Clustering

-- Parametric and Nonparametric Methods

魏志達 Jyh-Da Wei

Introduction to Machine Learning (Chap. 4, 5, 7, 8), E. Alpaydin

Classes vs. Clusters
• Classification: supervised learning
• Pattern Recognition, K-Nearest Neighbor, Multilayer Perceptron
• Clustering: unsupervised learning
• K-Means, Expectation Maximization, Self-Organizing Map
Bayes’ Rule

$$\underbrace{P(C \mid x)}_{\text{posterior}} \;=\; \frac{\overbrace{P(C)}^{\text{prior}}\;\overbrace{p(x \mid C)}^{\text{likelihood}}}{\underbrace{p(x)}_{\text{evidence}}}$$

Bayes’ Rule: K > 2 Classes

$$P(C_i \mid x) = \frac{P(C_i)\, p(x \mid C_i)}{p(x)} = \frac{P(C_i)\, p(x \mid C_i)}{\sum_{k=1}^{K} P(C_k)\, p(x \mid C_k)}$$

Choose C_i if P(C_i | x) = max_k P(C_k | x).
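As a minimal numeric sketch of this rule (the priors and likelihood values below are made up for illustration, not from the slides):

```python
import numpy as np

# Hypothetical priors P(C_i) and likelihoods p(x | C_i) for K = 3 classes.
priors = np.array([0.5, 0.3, 0.2])
likelihoods = np.array([0.10, 0.40, 0.05])  # p(x | C_i) at some fixed x

evidence = np.sum(priors * likelihoods)       # p(x) = sum_k P(C_k) p(x | C_k)
posteriors = priors * likelihoods / evidence  # P(C_i | x)

print(posteriors, posteriors.sum())           # posteriors sum to 1
print("choose class", np.argmax(posteriors))  # pick the class with max posterior
```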

Gaussian (Normal) Distribution
• p(x) ~ N(μ, σ²):

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

• Estimate μ and σ² from the sample:

$$m = \frac{\sum_t x^t}{N}, \qquad s^2 = \frac{\sum_t (x^t - m)^2}{N}$$
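A short sketch of these estimates in Python, assuming a toy sample:

```python
import numpy as np

x = np.array([2.1, 1.9, 2.4, 2.0, 1.6])  # toy sample (assumed data)

m = x.mean()                   # m  = sum_t x^t / N
s2 = ((x - m) ** 2).mean()     # s^2 = sum_t (x^t - m)^2 / N  (ML estimate, divides by N)

# Plug the estimates into the Gaussian density N(m, s2):
p = np.exp(-(x - m) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
print(m, s2, p)
```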

[Figure: likelihoods and posteriors for two classes with P(C1) = P(C2). With equal variances there is a single decision boundary halfway between the means; with different variances there are two boundaries.]

Multivariate Normal Distribution

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,\lvert\boldsymbol{\Sigma}\rvert^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$$

• Mahalanobis distance: $(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$ measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations).

• Bivariate case (d = 2):

[Figure: bivariate likelihoods p(x|C1) and p(x|C2), the posterior for C1, and the discriminant P(C1 | x) = 0.5.]
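A small sketch of computing the Mahalanobis distance from sample estimates (toy data assumed; `mahalanobis_sq` is a name chosen here, not from the slides):

```python
import numpy as np

# Toy bivariate data (assumed); mu and Sigma are estimated from the sample.
X = np.array([[1.0, 2.0], [1.5, 2.4], [0.8, 1.7], [1.2, 2.1], [1.6, 2.6]])
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return d @ np.linalg.solve(Sigma, d)  # avoids forming Sigma^{-1} explicitly

print(mahalanobis_sq(np.array([1.1, 2.0]), mu, Sigma))
```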

Classes vs. Clusters
• Classification: supervised learning
• Pattern Recognition, K-Nearest Neighbor, Multilayer Perceptron
• Clustering: unsupervised learning
• K-Means, Expectation Maximization, Self-Organizing Map
Parametric vs. Nonparametric
• Parametric Methods
• Advantage: the parametric assumption reduces the problem of estimating a probability density function (pdf), discriminant, or regression function to estimating the values of a small number of parameters.
• Disadvantage: this assumption does not always hold, and we may incur a large error when it does not.
• Nonparametric Methods
• Keep the training data; “let the data speak for itself”
• Given x, find a small number of closest training instances and interpolate from these
• Nonparametric methods are also called memory-based or instance-based learning algorithms.
Density Estimation

• Given the training set X = {x^t}_t drawn iid (independent and identically distributed) from p(x)
• Divide the data into bins of size h
• Histogram estimator (figure on next page):

$$\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$$

Extreme case: if the whole sample falls into a single bin of width h, the estimate is p̂(x) = 1/h there, so the estimator simply reads the density off the sample.
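A minimal sketch of the histogram estimator, assuming a synthetic Gaussian sample:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=200)  # sample drawn iid from p(x)

h = 0.5                                       # bin width
bins = np.arange(x.min(), x.max() + h, h)     # bins of size h
counts, edges = np.histogram(x, bins=bins)

p_hat = counts / (len(x) * h)                 # #{x^t in bin} / (N h)
print(edges[:-1], p_hat)                      # left bin edges and estimated density
```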

Density Estimation
• Given the training set X = {x^t}_t drawn iid from p(x)
• x is always at the center of a bin of size 2h
• Naive estimator (figure on next page):

$$\hat{p}(x) = \frac{\#\{x - h < x^t \le x + h\}}{2Nh}$$

or, equivalently,

$$\hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right), \qquad w(u) = \begin{cases} 1/2 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$$

(every x^t casts a vote; w(u) weights each vote by how close x^t is to x — the kernel-estimator sketch further below also implements this boxcar weight)

Kernel Estimator
• Kernel function, e.g., the Gaussian kernel:

$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{u^2}{2}\right]$$

• Kernel estimator (Parzen windows; figure on next page):

$$\hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)$$

• If K is Gaussian, then p̂(x) will be smooth, having derivatives of all orders.

K(u) scores each x^t by its proximity to x and integrates to 1 over the real line.
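A minimal sketch of the kernel estimator on synthetic data; the boxcar branch reproduces the naive estimator’s w(u) from the previous slide, and `kde` is a name chosen here:

```python
import numpy as np

def kde(x, data, h, kernel="gaussian"):
    """Kernel density estimate p_hat(x) = (1/Nh) * sum_t K((x - x^t)/h)."""
    u = (x - data[:, None]) / h                # shape (N, len(x))
    if kernel == "gaussian":
        K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    else:                                      # boxcar w(u): the naive estimator
        K = np.where(np.abs(u) < 1, 0.5, 0.0)
    return K.sum(axis=0) / (len(data) * h)

rng = np.random.default_rng(1)
data = rng.normal(size=100)
grid = np.linspace(-3, 3, 7)
print(kde(grid, data, h=0.5))                  # smooth Parzen-window estimate
print(kde(grid, data, h=0.5, kernel="naive"))  # step-shaped naive estimate
```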

Generalization to Multivariate Data
• Kernel density estimator:

$$\hat{p}(\mathbf{x}) = \frac{1}{Nh^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right)$$

with the requirement that

$$\int_{\mathbb{R}^d} K(\mathbf{u})\, d\mathbf{u} = 1$$

• Multivariate Gaussian kernel:
• spheric: $K(\mathbf{u}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{\!d} \exp\!\left[-\frac{\lVert\mathbf{u}\rVert^2}{2}\right]$
• ellipsoid: $K(\mathbf{u}) = \frac{1}{(2\pi)^{d/2}\,\lvert\mathbf{S}\rvert^{1/2}} \exp\!\left[-\frac{1}{2}\mathbf{u}^{T}\mathbf{S}^{-1}\mathbf{u}\right]$, where S is the sample covariance matrix
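For the multivariate case, SciPy’s `scipy.stats.gaussian_kde` provides a Parzen-window estimate with an ellipsoid Gaussian kernel shaped by the sample covariance; a short sketch with synthetic data:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]], size=300)

kde = gaussian_kde(X.T)         # expects the data as shape (d, N)
points = np.array([[0.0, 1.0],  # d rows, one column per query point
                   [0.0, 1.0]])
print(kde(points))              # estimated density at (0, 0) and (1, 1)
```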

k-Nearest Neighbor Estimator
• Instead of fixing the bin width h and counting how many instances fall inside, fix the number of instances (neighbors) k and adapt the bin width:

$$\hat{p}(x) = \frac{k}{2N\, d_k(x)}$$

where d_k(x) is the distance from x to its kth closest instance.
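A minimal one-dimensional sketch of this estimator (synthetic data; `knn_density` is a name chosen here):

```python
import numpy as np

def knn_density(x, data, k):
    """p_hat(x) = k / (2 N d_k(x)); d_k(x) is the distance to the k-th nearest sample."""
    dists = np.sort(np.abs(data - x))
    dk = dists[k - 1]            # distance to the k-th closest instance
    return k / (2 * len(data) * dk)

rng = np.random.default_rng(3)
data = rng.normal(size=100)
print(knn_density(0.0, data, k=5))
```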

Nonparametric Classification (kernel estimator)
• Estimate the class-conditional densities and priors from the labeled sample:

$$\hat{p}(\mathbf{x} \mid C_i) = \frac{1}{N_i h^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t, \qquad \hat{P}(C_i) = \frac{N_i}{N}$$

where r_i^t is 0/1 depending on whether x^t belongs to C_i.

Nonparametric Classification: k-NN Estimator (1)
• For the special case of the k-NN estimator,

$$\hat{p}(\mathbf{x} \mid C_i) = \frac{k_i}{N_i\, V^k(\mathbf{x})}$$

where

k_i: the number of neighbors out of the k nearest that belong to C_i

V^k(x): the volume of the d-dimensional hypersphere centered at x with radius d_k(x), i.e., V^k(x) = (d_k(x))^d c_d

c_d: the volume of the unit sphere in d dimensions; for example, c_1 = 2, c_2 = π, c_3 = 4π/3

Nonparametric Classification: k-NN Estimator (2)
• From $\hat{p}(\mathbf{x} \mid C_i) = \frac{k_i}{N_i V^k(\mathbf{x})}$, $\hat{P}(C_i) = \frac{N_i}{N}$, and $\hat{p}(\mathbf{x}) = \frac{k}{N V^k(\mathbf{x})}$
• Then

$$\hat{P}(C_i \mid \mathbf{x}) = \frac{\hat{p}(\mathbf{x} \mid C_i)\, \hat{P}(C_i)}{\hat{p}(\mathbf{x})} = \frac{k_i}{k}$$

i.e., the k-NN classifier assigns x to the class with the most examples among its k nearest neighbors.
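A small sketch of the resulting classifier, returning k_i / k for each class (toy two-class data assumed; `knn_posteriors` is a name chosen here):

```python
import numpy as np
from collections import Counter

def knn_posteriors(x, X, y, k):
    """P_hat(C_i | x) = k_i / k: the fraction of the k nearest neighbors in class i."""
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    votes = Counter(y[idx])
    return {c: votes.get(c, 0) / k for c in np.unique(y)}

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(knn_posteriors(np.array([1.5, 1.5]), X, y, k=7))
```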

Classes vs. Clusters
• Classification: supervised learning
• Pattern Recognition, K-Nearest Neighbor, Multilayer Perceptron
• Clustering: unsupervised learning
• K-Means, Expectation Maximization, Self-Organizing Map
Classes vs. Clusters

Supervised: X = {x^t, r^t}_t
• Classes C_i, i = 1, ..., K
• where p(x | C_i) ~ N(μ_i, Σ_i)
• Φ = {P(C_i), μ_i, Σ_i}, i = 1, ..., K

Unsupervised: X = {x^t}_t
• Clusters G_i, i = 1, ..., k
• where p(x | G_i) ~ N(μ_i, Σ_i)
• Φ = {P(G_i), μ_i, Σ_i}, i = 1, ..., k
• Labels r_i^t?
k-Means Clustering
• Find k reference vectors (prototypes / codebook vectors / codewords) that best represent the data
• Reference vectors m_j, j = 1, ..., k
• Use the nearest (most similar) reference:

$$b_i^t = \begin{cases} 1 & \text{if } \lVert\mathbf{x}^t - \mathbf{m}_i\rVert = \min_j \lVert\mathbf{x}^t - \mathbf{m}_j\rVert \\ 0 & \text{otherwise} \end{cases}$$

• Reconstruction error:

$$E\big(\{\mathbf{m}_i\}_{i=1}^{k} \mid X\big) = \sum_t \sum_i b_i^t\, \lVert\mathbf{x}^t - \mathbf{m}_i\rVert^2$$

k-means Clustering

1. Winner takes all: each x^t is assigned only to its single nearest reference vector.

2. No step-by-step correction; instead, each reference vector is reset in one shot to the mean of its group: $\mathbf{m}_i = \frac{\sum_t b_i^t \mathbf{x}^t}{\sum_t b_i^t}$.

3. A worked example follows on the next page; a counterexample will be given in class (points on the front line “defecting” to a neighboring cluster).
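A minimal batch k-means sketch along these lines, assuming synthetic two-cluster data (`k_means` is a name chosen here, not a library routine):

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Batch k-means: assign each x^t to its nearest m_i (winner takes all),
    then reset each m_i to the mean of its group, until assignments settle."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)]  # init from the data
    for _ in range(n_iter):
        b = np.argmin(np.linalg.norm(X[:, None] - m[None], axis=2), axis=1)
        new_m = np.array([X[b == i].mean(axis=0) if np.any(b == i) else m[i]
                          for i in range(k)])         # keep empty groups in place
        if np.allclose(new_m, m):
            break
        m = new_m
    return m, b

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
m, b = k_means(X, k=2)
print(m)
```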

EM in Gaussian Mixtures
• z_i^t = 1 if x^t belongs to G_i, 0 otherwise (the analogue of the labels r_i^t in supervised learning); assume p(x | G_i) ~ N(μ_i, Σ_i)
• E-step:

$$h_i^t \equiv E\big[z_i^t \mid X, \Phi^{(l)}\big] = P\big(G_i \mid \mathbf{x}^t, \Phi^{(l)}\big) = \frac{P(G_i)\, p(\mathbf{x}^t \mid G_i, \Phi^{(l)})}{\sum_j P(G_j)\, p(\mathbf{x}^t \mid G_j, \Phi^{(l)})}$$

• M-step:

$$P(G_i) = \frac{\sum_t h_i^t}{N}, \qquad \mathbf{m}_i = \frac{\sum_t h_i^t \mathbf{x}^t}{\sum_t h_i^t}, \qquad \mathbf{S}_i = \frac{\sum_t h_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t h_i^t}$$

Use the estimated labels h_i^t in place of the unknown labels.
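A compact EM sketch for a Gaussian mixture following these update rules (synthetic data; `em_gmm` is a name chosen here, and SciPy is used only for the Gaussian density):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    """EM for a Gaussian mixture: the E-step computes h_i^t = P(G_i | x^t);
    the M-step re-estimates pi_i, m_i, S_i with h_i^t in place of the labels."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1 / k)
    m = X[rng.choice(N, size=k, replace=False)]
    S = np.array([np.cov(X, rowvar=False) for _ in range(k)])
    for _ in range(n_iter):
        # E-step: posterior "soft labels" h, shape (N, k)
        h = np.column_stack([pi[i] * multivariate_normal.pdf(X, m[i], S[i])
                             for i in range(k)])
        h /= h.sum(axis=1, keepdims=True)
        # M-step: h-weighted re-estimates of priors, means, covariances
        Ni = h.sum(axis=0)
        pi = Ni / N
        m = (h.T @ X) / Ni[:, None]
        for i in range(k):
            D = X - m[i]
            S[i] = (h[:, i, None] * D).T @ D / Ni[i]
    return pi, m, S

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.5, (60, 2)), rng.normal(3, 0.5, (40, 2))])
pi, m, S = em_gmm(X, k=2)
print(pi, m)
```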

Classes vs. Clusters
• Classification: supervised learning
• Pattern Recognition, K-Nearest Neighbor, Multilayer Perceptron
• Clustering: unsupervised learning
• K-Means, Expectation Maximization, Self-Organizing Map
Agglomerative Clustering
• Start with N groups, each containing one instance, and merge the two closest groups at each iteration
• Distance between two groups G_i and G_j, e.g.:
• single-link: $d(G_i, G_j) = \min_{\mathbf{x}^r \in G_i,\, \mathbf{x}^s \in G_j} d(\mathbf{x}^r, \mathbf{x}^s)$
• complete-link: $d(G_i, G_j) = \max_{\mathbf{x}^r \in G_i,\, \mathbf{x}^s \in G_j} d(\mathbf{x}^r, \mathbf{x}^s)$
• The sequence of merges is visualized as a dendrogram.
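As a usage sketch, SciPy’s hierarchical-clustering routines implement this bottom-up merging: `linkage` builds the merge tree (the dendrogram) and `fcluster` cuts it at a chosen number of clusters (synthetic data assumed):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(2, 0.3, (10, 2))])

# Each row of Z records one merge: the two groups joined and their distance.
Z = linkage(X, method="single")                    # single-link: min pairwise distance
labels = fcluster(Z, t=2, criterion="maxclust")    # cut the dendrogram into 2 clusters
print(labels)
```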