
Dimensionality Reduction with Linear Transformations project update



Presentation Transcript


  1. Dimensionality Reduction with Linear Transformations: project update. By Mingyue Tan, March 17, 2004

  2. Domain and Task: data are labeled. Questions to answer:
  - What's the shape of the clusters?
  - Which clusters are dense/heterogeneous?
  - Which data coordinates account for the decomposition into clusters?
  - Which data points are outliers?

  3. Solution - Dimensionality Reduction
  1. Project the high-dimensional points into a low-dimensional space while preserving the "essence" of the data, i.e. distances are preserved as well as possible.
  2. Solve the problems in low dimensions.

  4. Principal Component Analysis • Intuition: find the axis that shows the greatest variation, and project all points onto this axis. [Figure: data in original axes f1, f2 with principal axes e1, e2]
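
The intuition on slide 4 can be sketched in a few lines of NumPy (illustrative only; the project itself reimplements PCA in Java, and all names here are mine, not the project's code):

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the k directions of greatest variance.

    X is an (n, d) data matrix, one point per row.
    """
    Xc = X - X.mean(axis=0)            # center the data
    cov = Xc.T @ Xc / (len(X) - 1)     # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # top-k principal axes
    return Xc @ top                    # coordinates in the principal axes

# Example: points stretched strongly along the first coordinate
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * [5.0, 1.0, 0.2]
Y = pca_project(X, k=2)
print(Y.shape)  # (200, 2)
```

The first projected coordinate captures the direction of greatest variation, which is exactly the axis the slide's figure illustrates.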

  5. Problem with PCA • Not robust - sensitive to outliers • Usually does not show clustering structure

  6. New Approach
  - PCA seeks a projection that maximizes the sum of squared projected distances.
  - Weighted PCA seeks a projection that maximizes a weighted sum, which adds flexibility: the bigger w_ij, the more important it is to place points i and j far apart.
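
A minimal sketch of the weighted objective, assuming the standard linear-transformation reduction: maximizing the weighted sum of squared projected distances amounts to taking top eigenvectors of X^T L X, where L is the graph Laplacian of the weight matrix. The NumPy code and names are mine, not the project's Java implementation:

```python
import numpy as np

def weighted_pca(X, W, k=2):
    """Find a k-dim projection maximizing sum_ij W[i,j] * ||p_i - p_j||^2
    over the projected points p_i.

    X is (n, d), W is a symmetric (n, n) matrix of pairwise weights.
    """
    W = (W + W.T) / 2                   # ensure symmetry
    L = np.diag(W.sum(axis=1)) - W      # graph Laplacian of the weights
    M = X.T @ L @ X                     # d x d matrix to diagonalize
    vals, vecs = np.linalg.eigh(M)
    axes = vecs[:, np.argsort(vals)[::-1][:k]]
    return X @ axes

# With uniform weights, this reduces to ordinary PCA (up to scaling)
X = np.random.default_rng(1).normal(size=(50, 4))
W = np.ones((50, 50))
Y = weighted_pca(X, W, k=2)
print(Y.shape)  # (50, 2)
```

Setting all w_ij equal recovers plain PCA, which is why the weighted form is strictly more flexible.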

  7. Weighted PCA Varying w_ij gives: • Weights specified by the user • Normalized PCA - robust towards outliers • Supervised PCA - shows cluster structure: if i and j belong to the same cluster, set w_ij = 0, so the projection maximizes inter-cluster scatter.
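
The supervised-PCA weighting rule on slide 7 can be built directly from the labels; `supervised_weights` is a hypothetical helper name of mine, not from the paper:

```python
import numpy as np

def supervised_weights(labels):
    """Supervised-PCA weight matrix from cluster labels:
    w_ij = 0 when i and j share a label (same-cluster pairs are not
    pushed apart), 1 otherwise, so only inter-cluster scatter counts.
    """
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)

# Three clusters: pairs within a cluster get weight 0, across clusters 1
W = supervised_weights([0, 0, 1, 1, 2])
print(W)
```

This W can be passed straight into the weighted objective above, turning the class labels into the "put these apart" signal.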

  8. Comparison – with outliers - PCA: Outliers typically govern the projection direction

  9. Comparison – cluster structure - Projections that maximize scatter ≠ Projections that separate clusters

  10. Summary

  11. Interface

  12. Interface - File

  13. Interface - task

  14. Interface - method

  15. Interface

  16. Milestones • Dataset assembled - the same dataset used in the paper [1] • Got familiar with NetBeans - implemented a preliminary interface (no functionality yet) • Rewrite PCA in Java (from an existing Matlab implementation) - partially done • Implement the four new methods

  17. Reference • [1] Y. Koren and L. Carmel, "Visualization of Labeled Data Using Linear Transformations", Proc. IEEE Symposium on Information Visualization (InfoVis'03), IEEE, pp. 121-128, 2003.
