Classification & Clustering - PowerPoint PPT Presentation

Presentation Transcript

Classification & Clustering

-- Parametric and Nonparametric Methods

魏志達 Jyh-Da Wei

Introduction to Machine Learning (Chapters 4, 5, 7, 8), E. Alpaydin

Classes vs. Clusters
  • Classification: supervised learning
    • Pattern Recognition, k-Nearest Neighbor, Multilayer Perceptron
  • Clustering: unsupervised learning
    • K-Means, Expectation Maximization, Self-Organizing Map
Bayes' Rule

$$P(C \mid x) = \frac{p(x \mid C)\,P(C)}{p(x)}$$

posterior = (likelihood × prior) / evidence

(For a given x, p(x) is the same for every class, so it can be dropped when comparing posteriors.)

Bayes' Rule: K > 2 Classes

$$P(C_i \mid x) = \frac{p(x \mid C_i)\,P(C_i)}{p(x)} = \frac{p(x \mid C_i)\,P(C_i)}{\sum_{k=1}^{K} p(x \mid C_k)\,P(C_k)}$$

Choose the class C_i with the largest posterior (a small numeric sketch follows). For a given x, p(x) is the same for every class, so it can be dropped when comparing posteriors.
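To make the comparison concrete, here is a minimal NumPy sketch; the priors and likelihood values below are made-up illustrative numbers, not taken from the slides:

```python
import numpy as np

# Hypothetical numbers for K = 3 classes at some fixed input x.
priors = np.array([0.5, 0.3, 0.2])          # P(C_k)
likelihoods = np.array([0.05, 0.10, 0.40])  # p(x | C_k)

evidence = np.sum(likelihoods * priors)     # p(x) = sum_k p(x|C_k) P(C_k)
posteriors = likelihoods * priors / evidence

print(posteriors)           # sums to 1
print(posteriors.argmax())  # choose the class with the largest posterior
```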

Gaussian (Normal) Distribution
  • $p(x) = \mathcal{N}(\mu, \sigma^2)$, i.e. $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
  • Estimate $\mu$ and $\sigma^2$ from the sample (a quick check follows below):

    $$m = \frac{1}{N}\sum_t x^t, \qquad s^2 = \frac{1}{N}\sum_t (x^t - m)^2$$
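A NumPy sketch of these estimators on synthetic data; the distribution parameters are arbitrary assumptions of this example:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # draw from N(2, 1.5^2)

m = x.mean()                 # m = (1/N) sum_t x^t
s2 = ((x - m) ** 2).mean()   # s^2 = (1/N) sum_t (x^t - m)^2

print(m, s2)                 # should be close to 2 and 2.25
```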

[Figure: two class likelihoods with P(C1) = P(C2) and equal variances; there is a single decision boundary, halfway between the means.]

[Figure: P(C1) = P(C2) but the variances are different; the decision rule yields two boundaries.]

Multivariate Normal Distribution
  • Mahalanobis distance: $(x - \mu)^T \Sigma^{-1} (x - \mu)$

    measures the distance from x to μ in terms of Σ (normalizes for differences in variances and correlations); a small sketch follows

  • Bivariate case: d = 2
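A sketch of the squared Mahalanobis form above; the mean, covariance, and query point are illustrative assumptions:

```python
import numpy as np

def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return d @ np.linalg.solve(sigma, d)  # solve() avoids forming Sigma^{-1}

mu = np.array([0.0, 0.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(mahalanobis_sq(np.array([1.0, 1.0]), mu, sigma))
```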
[Figure: the class likelihoods and the posterior for C1. With only two classes, the discriminant is P(C1 | x) = 0.5, so the decision boundary lies exactly where the posterior crosses 0.5.]

Classes vs. Clusters
  • Classification: supervised learning
    • Pattern Recognition, k-Nearest Neighbor, Multilayer Perceptron
  • Clustering: unsupervised learning
    • K-Means, Expectation Maximization, Self-Organizing Map
Parametric vs. Nonparametric
  • Parametric Methods
    • Advantage: they reduce the problem of estimating a probability density function (pdf), discriminant, or regression function to estimating the values of a small number of parameters.
    • Disadvantage: the assumed model form does not always hold, and we may incur a large error if it does not.
  • Nonparametric Methods
    • Keep the training data; “let the data speak for itself”
    • Given x, find a small number of the closest training instances and interpolate from these
    • Nonparametric methods are also called memory-based or instance-based learning algorithms.
Density Estimation
  • Given the training set X = {x^t}_t drawn i.i.d. (independent and identically distributed) from p(x); here x^t is the t-th element of the set
  • Divide the data into bins of size h
  • Histogram estimator (figure on the next page; a code sketch follows):

    $$\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$$

  • Extreme case: if the bin covers the whole sample, $\hat{p}(x) = 1/h$
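A minimal sketch of the histogram estimator; the bin origin, bin width, and synthetic data are assumptions of this example:

```python
import numpy as np

def histogram_estimate(x, data, h, origin=0.0):
    """Histogram estimator: p_hat(x) = #{x^t in the same bin as x} / (N h)."""
    bin_index = lambda v: np.floor((v - origin) / h)
    return np.sum(bin_index(data) == bin_index(x)) / (len(data) * h)

data = np.random.default_rng(1).normal(size=500)
print(histogram_estimate(0.3, data, h=0.5))
```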

Density Estimation
  • Given the training set X = {x^t}_t drawn i.i.d. from p(x)
  • x is always at the center of a bin of size 2h
  • Naive estimator (figure on the next page):

    $$\hat{p}(x) = \frac{\#\{\,|x - x^t| < h\,\}}{2Nh}$$

    or, equivalently (letting each x^t cast a vote),

    $$\hat{p}(x) = \frac{1}{Nh}\sum_t w\!\left(\frac{x - x^t}{h}\right), \qquad w(u) = \begin{cases}1/2 & \text{if } |u| \le 1\\ 0 & \text{otherwise}\end{cases}$$

  • w(u) votes by proximity: each supporting vote counts 1/2, so the integral over [−1, 1] is 1

Kernel Estimator
  • Kernel function, e.g. the Gaussian kernel: $K(u) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2}\right)$
  • Kernel estimator (Parzen windows; figure on the next page, sketch below):

    $$\hat{p}(x) = \frac{1}{Nh}\sum_t K\!\left(\frac{x - x^t}{h}\right)$$

  • If K is Gaussian, then $\hat{p}(x)$ is smooth and has derivatives of all orders
  • K(u) assigns a score by proximity; its integral over the real line is 1
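A sketch of the Parzen-window estimate under the formulas above; passing kernel="box" uses w(u) = 1/2 on [−1, 1] and so recovers the naive estimator of the previous slide:

```python
import numpy as np

def kernel_estimate(x, data, h, kernel="gaussian"):
    """Parzen windows: p_hat(x) = (1/(N h)) * sum_t K((x - x^t) / h)."""
    u = (x - data) / h
    if kernel == "gaussian":
        scores = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    else:  # box kernel w(u) = 1/2 on [-1, 1]: recovers the naive estimator
        scores = 0.5 * (np.abs(u) <= 1.0)
    return scores.sum() / (len(data) * h)

data = np.random.default_rng(2).normal(size=500)
print(kernel_estimate(0.0, data, h=0.3))                 # Gaussian kernel
print(kernel_estimate(0.0, data, h=0.3, kernel="box"))   # naive estimator
```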

Generalization to Multivariate Data
  • Kernel density estimator:

    $$\hat{p}(x) = \frac{1}{Nh^d}\sum_t K\!\left(\frac{x - x^t}{h}\right)$$

    with the requirement that $\int_{\mathbb{R}^d} K(x)\,dx = 1$

  • Multivariate Gaussian kernel
    • Spheric: $K(u) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\!\left(-\frac{\|u\|^2}{2}\right)$
    • Ellipsoid: $K(u) = \frac{1}{(2\pi)^{d/2}\,|S|^{1/2}} \exp\!\left(-\frac{1}{2}\,u^T S^{-1} u\right)$, where S is a sample covariance matrix (a SciPy sketch follows)
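For multivariate data, SciPy's gaussian_kde is one ready-made Parzen estimator with a full-covariance (ellipsoidal) Gaussian kernel and an automatic bandwidth (Scott's rule); the synthetic 2-D data below are an assumption of this sketch:

```python
import numpy as np
from scipy.stats import gaussian_kde

# gaussian_kde expects the data as an array of shape (d, N)
rng = np.random.default_rng(4)
data = rng.multivariate_normal([0.0, 0.0],
                               [[2.0, 0.5], [0.5, 1.0]], size=500).T

kde = gaussian_kde(data)               # bandwidth set by Scott's rule
print(kde(np.array([[0.0], [0.0]])))   # density estimate at the origin
```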

k-Nearest Neighbor Estimator
  • Instead of fixing the bin width h and counting the number of instances, fix the number of instances (neighbors) k and compute the bin width (sketch below):

    $$\hat{p}(x) = \frac{k}{2N\,d_k(x)}$$

    where d_k(x) is the distance from x to its k-th closest instance
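A one-dimensional sketch of this estimator; the data and the choice of k are illustrative:

```python
import numpy as np

def knn_density(x, data, k):
    """k-NN estimate: p_hat(x) = k / (2 N d_k(x)) in one dimension."""
    d_k = np.sort(np.abs(data - x))[k - 1]  # distance to k-th closest sample
    return k / (2.0 * len(data) * d_k)

data = np.random.default_rng(3).normal(size=500)
print(knn_density(0.0, data, k=10))
```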

Nonparametric Classification (Kernel Estimator)
  • Kernel estimates of the class-conditional density and the prior:

    $$\hat{p}(x \mid C_i) = \frac{1}{N_i h^d}\sum_t K\!\left(\frac{x - x^t}{h}\right) r_i^t, \qquad \hat{P}(C_i) = \frac{N_i}{N}$$

    where $r_i^t$ is 0/1 according to whether $x^t$ belongs to $C_i$

  • Discriminant:

    $$g_i(x) = \hat{p}(x \mid C_i)\,\hat{P}(C_i) = \frac{1}{N h^d}\sum_t K\!\left(\frac{x - x^t}{h}\right) r_i^t$$

  • We can ignore the common coefficient and look only at the sum: it accumulates the votes of the "committee members", positive real scores assigned by proximity
  • Originally we would compare the values of $p(C_i \mid x) = p(x, C_i)/p(x)$; since p(x) is the same for a given x, everyone omits it here, which gives a cleaner expression (see the sketch below)
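A sketch of the kernel discriminant in one dimension, dropping the shared coefficient exactly as the slide suggests; the two synthetic classes are assumptions of the example:

```python
import numpy as np

def kernel_discriminant(x, data, labels, n_classes, h):
    """g_i(x) = sum over x^t in C_i of K((x - x^t) / h); predict argmax_i.
    The shared 1/(N h^d) coefficient is dropped: it never changes the argmax."""
    u = (x - data) / h
    scores = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # committee scores
    g = np.array([scores[labels == i].sum() for i in range(n_classes)])
    return int(g.argmax())

rng = np.random.default_rng(5)
data = np.concatenate([rng.normal(-2, 1, 50), rng.normal(+2, 1, 50)])
labels = np.array([0] * 50 + [1] * 50)
print(kernel_discriminant(1.5, data, labels, n_classes=2, h=0.5))
```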

Nonparametric Classification: k-NN Estimator (1)
  • For the special case of the k-nn estimator:

    $$\hat{p}(x \mid C_i) = \frac{k_i}{N_i\,V^k(x)}$$

    where
    • k_i: the number of neighbors out of the k nearest that belong to C_i
    • V^k(x): the volume of the d-dimensional hypersphere centered at x, with radius $r = \|x - x_{(k)}\|$, so that $V^k = r^d c_d$
    • c_d: the volume of the unit sphere in d dimensions; for example, $c_1 = 2$, $c_2 = \pi$, $c_3 = 4\pi/3$
Nonparametric Classification: k-NN Estimator (2)
  • From

    $$\hat{p}(x \mid C_i) = \frac{k_i}{N_i\,V^k(x)}, \qquad \hat{P}(C_i) = \frac{N_i}{N}, \qquad \hat{p}(x) = \frac{k}{N\,V^k(x)}$$

  • Then

    $$\hat{P}(C_i \mid x) = \frac{\hat{p}(x \mid C_i)\,\hat{P}(C_i)}{\hat{p}(x)} = \frac{k_i}{k}$$

  • Meaning: once k samples have been gathered, see which class has the most members present (a sketch follows)
  • We compare the values of $p(C_i \mid x) = p(x, C_i)/p(x)$; although p(x) is the same for a given x, here everyone writes it out, and the derived expression is cleaner
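A minimal sketch of the resulting k-NN rule, returning both the majority class and the posteriors k_i / k; the synthetic two-class data are illustrative:

```python
import numpy as np

def knn_classify(x, data, labels, k, n_classes):
    """P_hat(C_i | x) = k_i / k: a majority vote among the k nearest."""
    nearest = np.argsort(np.linalg.norm(data - x, axis=1))[:k]
    k_i = np.bincount(labels[nearest], minlength=n_classes)
    return int(k_i.argmax()), k_i / k   # predicted class and posteriors

rng = np.random.default_rng(7)
data = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(+2, 1, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
print(knn_classify(np.array([1.0, 1.0]), data, labels, k=5, n_classes=2))
```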

Classes vs. Clusters
  • Classification: supervised learning
    • Pattern Recognition, k-Nearest Neighbor, Multilayer Perceptron
  • Clustering: unsupervised learning
    • K-Means, Expectation Maximization, Self-Organizing Map
Classes vs. Clusters
  • Supervised: X = {x^t, r^t}_t
    • Classes C_i, i = 1, ..., K, where p(x | C_i) ~ N(μ_i, Σ_i)
    • Φ = {P(C_i), μ_i, Σ_i}, i = 1, ..., K
  • Unsupervised: X = {x^t}_t
    • Clusters G_i, i = 1, ..., k, where p(x | G_i) ~ N(μ_i, Σ_i)
    • Φ = {P(G_i), μ_i, Σ_i}, i = 1, ..., k
    • Labels r_i^t? (unknown)
k-Means Clustering
  • Find k reference vectors (prototypes / codebook vectors / codewords) which best represent the data
  • Reference vectors: m_j, j = 1, ..., k
  • Use the nearest (most similar) reference:

    $$b_i^t = \begin{cases}1 & \text{if } \|x^t - m_i\| = \min_j \|x^t - m_j\|\\ 0 & \text{otherwise}\end{cases}$$

  • Reconstruction error (we want the cluster centers that minimize the total deviation):

    $$E(\{m_i\}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t\,\|x^t - m_i\|^2$$

k-Means Clustering
  1. Winner takes all
  2. No incremental corrections: take the group mean in one step (see the sketch below)
  3. An example follows on the next page; a counterexample will be given in class (the "front-line soldiers defecting" problem)
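A compact batch k-means sketch along these lines (winner-takes-all assignment, group means in one step); initialization by sampling k points and the empty-cluster handling are assumptions of this example:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Batch k-means: winner-takes-all assignment, then recompute each
    prototype as the mean of its group in one step."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)]  # initial prototypes
    for _ in range(n_iter):
        b = np.argmin(np.linalg.norm(X[:, None] - m[None], axis=2), axis=1)
        new_m = np.array([X[b == j].mean(axis=0) if np.any(b == j) else m[j]
                          for j in range(k)])  # keep a prototype with no members
        if np.allclose(new_m, m):
            break
        m = new_m
    return m, b

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(+3, 1, (100, 2))])
centers, assignment = k_means(X, k=2)
print(centers)
```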

EM in Gaussian Mixtures
  • z_i^t = 1 if x^t belongs to G_i, 0 otherwise (the analogue of the labels r_i^t in supervised learning); assume p(x | G_i) ~ N(μ_i, Σ_i)
  • E-step:

    $$h_i^t = E[z_i^t \mid X, \Phi^l] = \frac{p(x^t \mid G_i, \Phi^l)\,P(G_i)}{\sum_j p(x^t \mid G_j, \Phi^l)\,P(G_j)}$$

  • M-step:

    $$P(G_i) = \frac{\sum_t h_i^t}{N}, \qquad m_i = \frac{\sum_t h_i^t\,x^t}{\sum_t h_i^t}, \qquad S_i = \frac{\sum_t h_i^t\,(x^t - m_i)(x^t - m_i)^T}{\sum_t h_i^t}$$

  • Use the estimated soft labels h_i^t in place of the unknown labels
  • With P(G_i) as backup, we need not fear the "soldiers defecting" problem (a sketch follows)
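A minimal, non-hardened EM sketch following these update equations; the ridge term added to S_i for numerical stability is an assumption of this example, not from the slides:

```python
import numpy as np

def em_gmm(X, k, n_iter=50, seed=0):
    """EM for a Gaussian mixture, following the E-/M-steps above."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    P = np.full(k, 1.0 / k)                       # priors P(G_i)
    mu = X[rng.choice(N, size=k, replace=False)]  # means m_i
    S = np.stack([np.cov(X, rowvar=False) + 1e-6 * np.eye(d)] * k)

    for _ in range(n_iter):
        # E-step: h^t_i proportional to p(x^t | G_i) P(G_i)
        h = np.empty((N, k))
        for i in range(k):
            diff = X - mu[i]
            quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(S[i]), diff)
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(S[i]))
            h[:, i] = P[i] * np.exp(-0.5 * quad) / norm
        h /= h.sum(axis=1, keepdims=True)

        # M-step: reestimate P(G_i), m_i, S_i using h as soft labels
        Ni = h.sum(axis=0)
        P = Ni / N
        mu = (h.T @ X) / Ni[:, None]
        for i in range(k):
            diff = X - mu[i]
            S[i] = (h[:, i, None] * diff).T @ diff / Ni[i] + 1e-6 * np.eye(d)
    return P, mu, S, h

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(+3, 1, (100, 2))])
P, mu, S, h = em_gmm(X, k=2)
print(P, mu)
```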

Classes vs. Clusters
  • Classification: supervised learning
    • Pattern Recognition, k-Nearest Neighbor, Multilayer Perceptron
  • Clustering: unsupervised learning
    • K-Means, Expectation Maximization, Self-Organizing Map
Agglomerative Clustering
  • Start with N groups, each containing one instance, and merge the two closest groups at each iteration
  • Distance between two groups G_i and G_j:
    • Single-link: $d(G_i, G_j) = \min_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$
    • Complete-link: $d(G_i, G_j) = \max_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$
    • Average-link: the mean of the pairwise distances; centroid: the distance between the group means (a single-link sketch follows)
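A naive O(N^3) single-link sketch of this procedure; it returns flat groups rather than a dendrogram, and the sample points are illustrative:

```python
import numpy as np

def single_link(X, n_clusters):
    """Agglomerative clustering with the single-link group distance
    d(Gi, Gj) = min over cross-group pairs; fine for small N."""
    groups = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None] - X[None], axis=2)  # pairwise distances
    while len(groups) > n_clusters:
        best, pair = np.inf, (0, 1)
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = D[np.ix_(groups[a], groups[b])].min()  # single link
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        groups[a] += groups.pop(b)  # merge the two closest groups
    return groups

X = np.array([[0.0], [0.2], [1.0], [5.0], [5.1], [9.0]])
print(single_link(X, n_clusters=3))
```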
Example: Single-Link Clustering

[Figure: dendrogram from single-link clustering of primate species: Human, Bonobo (pygmy chimpanzee), Chimpanzee, Gorilla, Macaque, Gibbon. Cutting the dendrogram at different heights yields different clusterings, so the grouping can be chosen dynamically.]