
Visualizing Data using t-SNE


Presentation Transcript


  1. Visualizing Data using t-SNE Presenter: Wei-Hao Huang Authors: Laurens van der Maaten and Geoffrey Hinton, JMLR 2008

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation • Visualization of high-dimensional data is an important problem that deals with data of widely varying dimensionality. • Linear vs. nonlinear dimensionality reduction techniques. • Many techniques perform strongly on artificial data sets but are much less successful at visualizing real high-dimensional data.

  4. Objectives • To convert a high-dimensional data set into a matrix of pairwise similarities. • To introduce a new technique called “t-SNE” for visualizing the resulting similarity data.

  5. Methodology • Stochastic Neighbor Embedding • t-Distributed Stochastic Neighbor Embedding • Symmetric SNE • Mismatched Tails can Compensate for Mismatched Dimensionalities

  6. Stochastic Neighbor Embedding • Data space • Map space • Cost function • Perplexity • Gradient descent method (see the definitions below)
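For reference, the quantities named on this slide are, in the paper's SNE formulation, roughly as follows: p and q are conditional pairwise similarities in the data space and map space, C is the cost minimized by gradient descent, and the perplexity fixes each bandwidth sigma_i.

```latex
% Data space: conditional similarity of x_j given x_i, with bandwidth sigma_i
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}

% Map space: conditional similarity of the low-dimensional points y
q_{j|i} = \frac{\exp\!\left(-\lVert y_i - y_j\rVert^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert y_i - y_k\rVert^2\right)}

% Cost function: sum of Kullback-Leibler divergences, minimized by gradient descent
C = \sum_i \mathrm{KL}(P_i \,\|\, Q_i)
  = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}

% Perplexity: each sigma_i is chosen so that Perp(P_i) equals a user-set value
\mathrm{Perp}(P_i) = 2^{H(P_i)}, \qquad
H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}

% Gradient used by the gradient descent method
\frac{\partial C}{\partial y_i}
  = 2 \sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)\left(y_i - y_j\right)
```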

  7. Symmetric SNE (t-SNE) • The SNE cost function is difficult to optimize, so it is symmetrized (joint rather than conditional similarities). • The crowding problem is addressed with a heavy-tailed Student-t distribution in the map space, which improves performance. • Data space, map space, cost function, and gradient descent method (see below).
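The symmetrized and heavy-tailed versions referred to above replace the conditional similarities with joint ones and give the map space a Student-t distribution with a single degree of freedom:

```latex
% Symmetric SNE / t-SNE: joint similarities in the data space (n data points)
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% t-SNE map space: Student-t distribution with one degree of freedom (heavy tails)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

% Cost function: a single Kullback-Leibler divergence between the joint distributions
C = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}
```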

  8. Mismatched Tails can Compensate for Mismatched Dimensionalities (t-SNE) • Map space • Gradient descent method (see the gradient below)
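The gradient used by the gradient descent method on this slide is then:

```latex
\frac{\partial C}{\partial y_i}
  = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)
    \left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}
```

The (1 + ||y_i - y_j||^2)^{-1} factor comes from the heavy-tailed map distribution: pairs that are moderately far apart in the data space can be placed much farther apart in the map at little cost, which is how the mismatched tails compensate for the mismatch in dimensionality between the two spaces.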

  9. t-SNE Algorithm
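A minimal Python sketch of the optimization loop behind the algorithm on this slide is given below. It assumes the joint probabilities P have already been computed with the perplexity-based search described earlier, uses plain gradient descent with momentum plus the paper's early-exaggeration trick, and omits refinements such as the adaptive learning-rate scheme and the momentum switch; the function name fit_tsne and the default parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def fit_tsne(P, n_points, n_dims=2, n_iter=1000, learning_rate=200.0, momentum=0.8):
    """Minimal t-SNE optimizer. P is a symmetric (n_points x n_points) joint
    probability matrix with zero diagonal that sums to 1."""
    rng = np.random.default_rng(0)
    Y = rng.normal(scale=1e-4, size=(n_points, n_dims))   # small random initial map
    update = np.zeros_like(Y)

    P = np.maximum(P, 1e-12)
    P_exaggerated = P * 4.0        # early exaggeration, as described in the paper

    for it in range(n_iter):
        P_eff = P_exaggerated if it < 50 else P

        # Student-t (one degree of freedom) similarities in the map space
        dists = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
        num = 1.0 / (1.0 + dists)
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)

        # Gradient: 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1
        PQ = (P_eff - Q) * num
        grad = 4.0 * ((np.diag(PQ.sum(axis=1)) - PQ) @ Y)

        # Gradient descent step with momentum
        update = momentum * update - learning_rate * grad
        Y += update

    return Y
```

With a precomputed P, calling fit_tsne(P, P.shape[0]) returns a two-dimensional map of the points that can be scatter-plotted.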

  10. Experiments • Data Sets • MNIST data set, Olivetti faces data set, COIL-20 data set, word-features data set, and Netflix data set. • Experimental Setup • PCA is first used to reduce the dimensionality (a sketch of this setup follows). • Cost function parameter settings.
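As a rough sketch of that setup (using scikit-learn rather than the authors' original implementation), the data are first projected onto a moderate number of PCA dimensions and then embedded in 2-D with t-SNE; the particular values pca_dims=30 and perplexity=40 below are illustrative of the kind of settings the paper reports, not a prescription.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def embed_for_visualization(X, pca_dims=30, perplexity=40, random_state=0):
    """Reduce dimensionality with PCA, then compute a 2-D t-SNE map."""
    X_reduced = PCA(n_components=pca_dims, random_state=random_state).fit_transform(X)
    return TSNE(n_components=2, perplexity=perplexity,
                random_state=random_state).fit_transform(X_reduced)

# Example: a random stand-in for a high-dimensional data set
X = np.random.default_rng(0).normal(size=(500, 784))
Y = embed_for_visualization(X)   # Y has shape (500, 2)
```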

  11. Visualizations of 6,000 handwritten digits from the MNIST data set, comparing t-SNE, Sammon mapping, Isomap, and LLE.

  12. Visualizations of the Olivetti faces data set, comparing t-SNE, Sammon mapping, Isomap, and LLE.

  13. Visualizations of the COIL-20 data set, comparing t-SNE, Sammon mapping, Isomap, and LLE.

  14. Applying t-SNE to Large Data Sets • Random-walk version of t-SNE built on a neighborhood graph with K = 20 nearest neighbors (a sketch of the graph construction follows).
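Below is a small sketch of the first step of that approach: building the K-nearest-neighbor graph on which the random walks are run. Scikit-learn's kneighbors_graph stands in for the paper's own graph construction, and the landmark selection and random-walk affinity computation themselves are not shown.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_neighborhood_graph(X, k=20):
    """Sparse symmetric K-nearest-neighbor graph over all data points."""
    graph = kneighbors_graph(X, n_neighbors=k, mode='connectivity', include_self=False)
    # Symmetrize: keep an edge if either point lists the other as a neighbor
    return graph.maximum(graph.T)

X = np.random.default_rng(0).normal(size=(1000, 50))   # stand-in data
G = build_neighborhood_graph(X, k=20)                  # scipy sparse matrix, shape (1000, 1000)
```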

  15. Weaknesses • It is unclear how t-SNE performs on dimensionality reduction for purposes other than visualization. • The curse of intrinsic dimensionality. • The non-convexity of the t-SNE cost function.

  16. Conclusions • t-SNE is capable of retaining the local structure of the data while also revealing some important global structure. • The paper also presents a landmark approach that makes it possible to successfully visualize large real-world data sets.

  17. Comments • Advantages • Visualizes high-dimensional data very well. • An open-source implementation is available. • Applications • Visual exploration of high-dimensional data.
