
Classification & Clustering



  1. Classification & Clustering -- Parametric and Nonparametric Methods. 魏志達 Jyh-Da Wei. Introduction to Machine Learning (Chap. 4, 5, 7, 8), E. Alpaydin

  2. Classes vs. Clusters • Classification: supervised learning • Pattern Recognition, K Nearest Neighbor, Multilayer Perceptron • Clustering: unsupervised learning • K-Means, Expectation Maximization, Self-Organizing Map

  3. Classes vs. Clusters • Classification: supervised learning • Pattern Recognition, K Nearest Neighbor, Multilayer Perceptron • Clustering: unsupervised learning • K-Means, Expectation Maximization, Self-Organizing Map

  4. Bayes’ Rule • posterior = prior × likelihood / evidence: P(C|x) = P(C) p(x|C) / p(x) • Since the value of x is given, p(x) is the same for every class (it only normalizes the posterior)

  5. Bayes’ Rule: K > 2 Classes • P(Ci|x) = p(x|Ci) P(Ci) / p(x), where p(x) = Σk p(x|Ck) P(Ck) • Since the value of x is given, p(x) is the same for every class, so we choose the Ci with the largest p(x|Ci) P(Ci)
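  As a quick numerical illustration of the rule above, a minimal Python sketch follows; the priors and likelihood values are made-up numbers chosen only for the example:

      import numpy as np

      # Hypothetical priors P(Ci) and likelihoods p(x|Ci) evaluated at one observation x.
      priors = np.array([0.5, 0.3, 0.2])
      likelihoods = np.array([0.02, 0.10, 0.05])

      joint = priors * likelihoods      # p(x, Ci) = p(x|Ci) P(Ci)
      evidence = joint.sum()            # p(x) = sum_k p(x|Ck) P(Ck)
      posteriors = joint / evidence     # P(Ci|x), sums to 1

      # Since p(x) is shared by all classes, the argmax of the joint already
      # picks the same class as the argmax of the posterior.
      print(posteriors, posteriors.argmax())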

  6. Gaussian (Normal) Distribution • p(x) = N(μ, σ²), i.e., p(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)) • Estimate μ and σ² from the sample: m = Σt xt / N, s² = Σt (xt − m)² / N
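  A minimal sketch of estimating μ and σ² from a sample and evaluating the resulting density; the sample here is synthetic, generated only for the example:

      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.normal(loc=2.0, scale=1.5, size=500)   # synthetic sample

      m = x.mean()                   # m   = sum_t x^t / N
      s2 = ((x - m) ** 2).mean()     # s^2 = sum_t (x^t - m)^2 / N  (ML estimate)

      def normal_pdf(v, mu, var):
          # Density of N(mu, var) evaluated at v.
          return np.exp(-(v - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

      print(m, s2, normal_pdf(2.0, m, s2))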

  7. P(C1)=P(C2) Equal variances Single boundary at halfway between means

  8. P(C1)=P(C2) Variances are different Two boundaries

  9. Multivariate Normal Distribution

  10. Multivariate Normal Distribution • Mahalanobis distance: (x − μ)ᵀ Σ⁻¹ (x − μ) measures the distance from x to μ in terms of Σ (normalizes for differences in variances and correlations) • Bivariate: d = 2
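  A minimal sketch of the Mahalanobis distance above; the mean vector and covariance matrix are made-up illustrative values:

      import numpy as np

      mu = np.array([0.0, 0.0])               # mean vector (illustrative)
      Sigma = np.array([[2.0, 0.5],
                        [0.5, 1.0]])          # covariance matrix (illustrative)
      x = np.array([1.0, 2.0])

      diff = x - mu
      d2 = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^-1 (x - mu)
      print(np.sqrt(d2))                         # Mahalanobis distance from x to mu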

  11. Bivariate Normal

  12. Estimation of Parameters

  13. With only two classes, the decision boundary falls exactly at 0.5 • discriminant: P(C1|x) = 0.5 (Figure: class likelihoods and the posterior for C1)

  14. break

  15. Classes vs. Clusters • Classification: supervised learning • Pattern Recognition, K Nearest Neighbor, Multilayer Perceptron • Clustering: unsupervised learning • K-Means, Expectation Maximization, Self-Organizing Map

  16. Parametric vs. Nonparametric • Parametric Methods • Advantage: it reduces the problem of estimating a probability density function (pdf), discriminant, or regression function to estimating the values of a small number of parameters. • Disadvantage: this assumption does not always hold, and we may incur a large error if it does not. • Nonparametric Methods • Keep the training data; “let the data speak for itself” • Given x, find a small number of closest training instances and interpolate from these • Nonparametric methods are also called memory-based or instance-based learning algorithms.

  17. Density Estimation (xt denotes the t-th element of the training set) • Given the training set X = {xt}t drawn iid (independent and identically distributed) from p(x) • Divide the data into bins of size h • Histogram estimator: p̂(x) = #{xt in the same bin as x} / (Nh) (Figure – next page) • Extreme case: p̂(x) = 1/h, i.e., consulting the sample space exactly

  18. (Figure: histogram density estimates)
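  A minimal sketch of the histogram estimator from slide 17; the sample, bin width, and bin origin are assumptions made only for the example:

      import numpy as np

      rng = np.random.default_rng(1)
      X = rng.normal(size=200)          # synthetic training sample
      N, h, origin = len(X), 0.5, X.min()

      def hist_estimate(x):
          # p_hat(x) = (# of x^t in the same bin as x) / (N h)
          b = np.floor((x - origin) / h)                  # index of the bin containing x
          count = np.sum(np.floor((X - origin) / h) == b)
          return count / (N * h)

      print(hist_estimate(0.0))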

  19. Density Estimation • Given the training set X = {xt}t drawn iid from p(x) • x is always at the center of a bin of size 2h • Naive estimator: p̂(x) = #{x − h < xt ≤ x + h} / (2Nh), or equivalently p̂(x) = (1/(Nh)) Σt w((x − xt)/h), where w(u) = 1/2 if |u| ≤ 1 and 0 otherwise (Figure – next page) • Each xt casts a vote: w(u) votes by proximity, a supporting vote counts 1/2, and w integrates to 1 over [−1, 1]

  20. (Figure: naïve estimator with h = 1, h = 0.5, h = 0.25)
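  A minimal sketch of the naive estimator from slide 19, reusing the same synthetic-sample idea; the bandwidth is an illustrative choice:

      import numpy as np

      rng = np.random.default_rng(1)
      X = rng.normal(size=200)          # synthetic training sample
      N, h = len(X), 0.5

      def naive_estimate(x):
          # p_hat(x) = #{x - h < x^t <= x + h} / (2 N h): each x^t within h of x "votes".
          return np.sum((X > x - h) & (X <= x + h)) / (2.0 * N * h)

      print(naive_estimate(0.0))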

  21. Kernel Estimator • Kernel function, e.g., the Gaussian kernel: K(u) = (1/√(2π)) exp(−u²/2) • Kernel estimator (Parzen windows): p̂(x) = (1/(Nh)) Σt K((x − xt)/h) (Figure – next page) • If K is Gaussian, then p̂ will be smooth, having all derivatives • K(u) scores each xt by proximity; it integrates to 1 over the real line
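  A minimal sketch of the Parzen-window estimator with a Gaussian kernel; the sample and bandwidth are illustrative assumptions:

      import numpy as np

      rng = np.random.default_rng(1)
      X = rng.normal(size=200)          # synthetic training sample
      N, h = len(X), 0.5

      def gaussian_kernel(u):
          return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

      def kernel_estimate(x):
          # p_hat(x) = (1 / (N h)) * sum_t K((x - x^t) / h)
          return gaussian_kernel((x - X) / h).sum() / (N * h)

      print(kernel_estimate(0.0))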

  22. Generalization to Multivariate Data • Kernel density estimator: p̂(x) = (1/(N h^d)) Σt K((x − xt)/h), with the requirement that ∫ K(x) dx = 1 • Multivariate Gaussian kernel: spheric, K(u) = (1/(2π)^(d/2)) exp(−‖u‖²/2); or ellipsoid, K(u) = (1/((2π)^(d/2) |S|^(1/2))) exp(−(1/2) uᵀ S⁻¹ u)

  23. k-Nearest Neighbor Estimator • Instead of fixing the bin width h and counting the number of instances, fix the number of instances (neighbors) k and check the bin width: p̂(x) = k / (2N d_k(x)), where d_k(x) is the distance from x to its k-th closest instance

  24. Grow the bin outward in both directions at once and see how far you must go to capture k samples
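  A minimal sketch of the k-nn density estimate described on slides 23-24; the sample and the value of k are illustrative assumptions:

      import numpy as np

      rng = np.random.default_rng(1)
      X = rng.normal(size=200)          # synthetic training sample (1-d)
      N, k = len(X), 10

      def knn_estimate(x):
          # d_k(x): distance to the k-th closest training instance,
          # found by growing the bin in both directions until k samples are inside.
          d_k = np.sort(np.abs(X - x))[k - 1]
          return k / (2.0 * N * d_k)    # p_hat(x) = k / (2 N d_k(x))

      print(knn_estimate(0.0))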

  25. Nonparametric Classification (kernel estimator) • Kernel estimate of the class-conditional density: p̂(x|Ci) = (1/(Ni h^d)) Σt K((x − xt)/h) r_i^t, where r_i^t is 0/1 depending on whether xt belongs to Ci • Discriminant: gi(x) = p̂(x|Ci) P̂(Ci) = (1/(N h^d)) Σt K((x − xt)/h) r_i^t • We can ignore the common coefficient and look only at the sum: it accumulates the scores of the committee members, positive real values determined by proximity • Originally we would compare the values of p(Ci|x) = p(x, Ci)/p(x), but since x is given, p(x) is the same for every class, so everyone drops it here and the expression is cleaner
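  A minimal sketch of the kernel-based discriminant above, labeling each training point with r_i^t and summing proximity scores; the toy two-class data are invented only for the example:

      import numpy as np

      rng = np.random.default_rng(2)
      # Toy 2-class, 2-d data: class 0 around (0, 0), class 1 around (2, 2).
      X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
      r = np.hstack([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
      h = 0.5

      def kernel_discriminants(x):
          # g_i(x) proportional to sum_t K((x - x^t)/h) * r_i^t: each x^t scores its
          # own class by proximity; constant factors shared by both classes are dropped.
          u = np.linalg.norm((x - X) / h, axis=1)
          scores = np.exp(-0.5 * u ** 2)
          return np.array([scores[r == 0].sum(), scores[r == 1].sum()])

      print(kernel_discriminants(np.array([1.8, 2.1])).argmax())   # expect class 1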

  26. Nonparametric Classification: k-nn estimator (1) • For the special case of the k-nn estimator: p̂(x|Ci) = ki / (Ni Vk(x)), where ki is the number of neighbors out of the k nearest that belong to Ci, and Vk(x) is the volume of the d-dimensional hypersphere centered at x with radius d_k(x): Vk(x) = (d_k(x))^d · cd, with cd the volume of the unit sphere in d dimensions • For example, c1 = 2, c2 = π, c3 = 4π/3

  27. Nonparametric Classification: k-nn estimator (2) • From p̂(x|Ci) = ki / (Ni Vk(x)), P̂(Ci) = Ni / N, and p̂(x) = k / (N Vk(x)) • Then P̂(Ci|x) = p̂(x|Ci) P̂(Ci) / p̂(x) = ki / k • The meaning: by the time we have collected k samples, we ask which class has the largest attendance • We compare the values of p(Ci|x) = p(x, Ci)/p(x); although p(x) is the same for every class once x is given, here everyone writes it out, because the derived expression is nicer
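  A minimal sketch of the k-nn classifier implied by P̂(Ci|x) = ki/k: take the k nearest training points and count class attendance. The toy data and the value of k are illustrative assumptions:

      import numpy as np

      rng = np.random.default_rng(2)
      X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
      r = np.hstack([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
      k = 5

      def knn_classify(x):
          # P_hat(Ci|x) = k_i / k: among the k nearest neighbors of x,
          # choose the class with the largest attendance.
          nearest = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
          return np.bincount(r[nearest]).argmax()

      print(knn_classify(np.array([1.8, 2.1])))   # expect class 1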

  28. break

  29. Classes vs. Clusters • Classification: supervised learning • Pattern Recognition, K Nearest Neighbor, Multilayer Perceptron • Clustering: unsupervised learning • K-Means, Expectation Maximization, Self-Organizing Map

  30. Classes vs. Clusters • Supervised: X = {xt, rt}t; classes Ci, i = 1, ..., K, where p(x|Ci) ~ N(μi, Σi); Φ = {P(Ci), μi, Σi}, i = 1, ..., K • Unsupervised: X = {xt}t; clusters Gi, i = 1, ..., k, where p(x|Gi) ~ N(μi, Σi); Φ = {P(Gi), μi, Σi}, i = 1, ..., k; the labels r_i^t are unknown

  31. k-Means Clustering • Find k reference vectors (prototypes / codebook vectors / codewords) which best represent the data • Reference vectors mj, j = 1, ..., k • Assign each xt to its nearest (most similar) reference: b_i^t = 1 if ‖xt − mi‖ = minj ‖xt − mj‖, and 0 otherwise • Reconstruction error: E = Σt Σi b_i^t ‖xt − mi‖² • We want the total deviation from the cluster centers to be as small as possible

  32. Encoding/Decoding

  33. k-means Clustering • 1. Winner takes all • 2. No step-by-step correction: take the cluster mean in one shot • 3. An example is on the next page; a counterexample (the "frontline troops defecting" case) will be given in class
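  A minimal sketch of the batch k-means loop described above (winner takes all, then recompute each mean in one step); the data, the number of clusters, and the initialization are illustrative assumptions:

      import numpy as np

      rng = np.random.default_rng(3)
      # Toy data: three blobs in 2-d.
      X = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in (0.0, 2.0, 4.0)])
      k = 3
      m = X[rng.choice(len(X), k, replace=False)]      # initial reference vectors

      for _ in range(20):
          # Winner takes all: assign every x^t to its nearest reference vector.
          labels = np.argmin(np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2), axis=1)
          # Batch update: each m_i jumps directly to the mean of its cluster.
          new_m = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else m[i]
                            for i in range(k)])
          if np.allclose(new_m, m):
              break
          m = new_m

      print(m)    # cluster centers minimizing the reconstruction error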

  34. EM in Gaussian Mixtures • z_i^t = 1 if xt belongs to Gi, 0 otherwise (the counterpart of the labels r_i^t in supervised learning); assume p(x|Gi) ~ N(μi, Σi) • E-step: h_i^t = E[z_i^t | X, Φ] = P(Gi) p(xt|Gi, Φ) / Σj P(Gj) p(xt|Gj, Φ) • M-step: use the estimated labels h_i^t in place of the unknown labels to re-estimate P(Gi), μi, and Σi • With P(Gi) as a backup, we no longer fear the troops defecting
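  A minimal sketch of the E- and M-steps for a Gaussian mixture, kept one-dimensional for brevity; the data, the number of components, and the initialization are illustrative assumptions:

      import numpy as np

      rng = np.random.default_rng(4)
      # Toy 1-d data drawn from two bumps.
      x = np.hstack([rng.normal(0.0, 1.0, 100), rng.normal(4.0, 1.0, 100)])
      N, K = len(x), 2

      P = np.full(K, 1.0 / K)             # P(G_i)
      mu = np.array([x.min(), x.max()])   # initial means
      var = np.full(K, x.var())           # initial variances

      def normal_pdf(v, m, s2):
          return np.exp(-(v - m) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

      for _ in range(50):
          # E-step: h_i^t = P(G_i) p(x^t|G_i) / sum_j P(G_j) p(x^t|G_j)
          lik = np.array([P[i] * normal_pdf(x, mu[i], var[i]) for i in range(K)])  # (K, N)
          h = lik / lik.sum(axis=0, keepdims=True)
          # M-step: re-estimate the parameters with h in place of the unknown labels.
          Ni = h.sum(axis=1)
          P = Ni / N
          mu = (h * x).sum(axis=1) / Ni
          var = (h * (x - mu[:, None]) ** 2).sum(axis=1) / Ni

      print(P, mu, var)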

  35. P(G1|x)=h1=0.5

  36. Classes vs. Clusters • Classification: supervised learning • Pattern Recognition, K Nearest Neighbor, Multilayer Perceptron • Clustering: unsupervised learning • K-Means, Expectation Maximization, Self-Organizing Map

  37. Agglomerative Clustering • Start with N groups, each containing one instance, and merge the two closest groups at each iteration • Distance between two groups Gi and Gj: • Single-link: d(Gi, Gj) = min over xr ∈ Gi, xs ∈ Gj of d(xr, xs) • Complete-link: d(Gi, Gj) = max over xr ∈ Gi, xs ∈ Gj of d(xr, xs) • Average-link, centroid
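  A minimal sketch of agglomerative clustering with the single-link distance; the points and the stopping criterion (merging down to two groups) are illustrative assumptions:

      import numpy as np

      rng = np.random.default_rng(5)
      X = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(3, 0.3, (5, 2))])

      # Start with N groups, each holding one instance.
      groups = [[i] for i in range(len(X))]

      def single_link(gi, gj):
          # d(Gi, Gj) = min over pairs (x^r in Gi, x^s in Gj) of ||x^r - x^s||
          return min(np.linalg.norm(X[r] - X[s]) for r in gi for s in gj)

      while len(groups) > 2:
          # Find and merge the two closest groups.
          pairs = [(single_link(groups[a], groups[b]), a, b)
                   for a in range(len(groups)) for b in range(a + 1, len(groups))]
          _, a, b = min(pairs)
          groups[a] = groups[a] + groups[b]
          del groups[b]

      print(groups)    # expect the two original blobs to be recovered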

  38. Example: Single-Link Clustering (Figure: dendrogram over human, bonobo, gorilla, macaque, chimpanzee, gibbon) • A dendrogram allows the groups to be formed dynamically, by cutting it at different levels
