
Proceedings of the 2007 SIAM International Conference on Data Mining

Abstract. The paper studies semi-supervised dimensionality reduction. Besides unlabeled samples, must-link and cannot-link constraints are incorporated as domain knowledge.


Presentation Transcript


  1. Proceedings of the 2007 SIAM International Conference on Data Mining

  2. Abstract • The paper studies semi-supervised dimensionality reduction. • Besides unlabeled samples, must-link and cannot-link constraints are incorporated as domain knowledge. • The SSDR algorithm preserves both the structure of the data and the constraints in the projected low-dimensional space.

  3. Introduction • There exist supervised and unsupervised dimensionality reduction methods • FLD (Fisher Linear Discriminant): extracts discriminant vectors when class labels are available • cFLD (Constrained FLD): dimensionality reduction from equivalence constraints • PCA (Principal Component Analysis): preserves the global covariance structure of data when class labels are not available

  4. Introduction (cont) • SSDR: • Must-link constraints: pairs of instances belonging to the same class • Cannot-link constraints: pairs of instances belonging to different classes • Structure of data • SSDR: simultaneously preserves the structure of data and pairwise constraints specified by users

  5. SSDR Algorithm • Find the projection vector w that maximizes the objective function, subject to w^T w = 1.
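The objective function itself did not survive in this transcript. One form that is consistent with the unit-norm constraint on this slide and with the weights α and β listed under Experiments is sketched below. It is an assumption, not a quotation from the slide: n (number of samples), C and M (the cannot-link and must-link sets, with each pair counted in both orders), and n_C, n_M (their sizes as normalizers) are symbols introduced here for illustration.

```latex
\max_{w}\; J(w) =
    \frac{1}{2n^{2}} \sum_{i,j} \left( w^{T}x_{i} - w^{T}x_{j} \right)^{2}
  + \frac{\alpha}{2n_{C}} \sum_{(x_{i},x_{j}) \in C} \left( w^{T}x_{i} - w^{T}x_{j} \right)^{2}
  - \frac{\beta}{2n_{M}} \sum_{(x_{i},x_{j}) \in M} \left( w^{T}x_{i} - w^{T}x_{j} \right)^{2}
\qquad \text{s.t. } w^{T}w = 1
```

Read this way, the first term spreads all projected samples apart (the PCA-like structure term), the second pushes cannot-link pairs further apart, and the third pulls must-link pairs together.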

  6. SSDR Algorithm (cont) • The objective function is extended and rewritten in a final form, Eq. (2.5). • Eq. (2.5) is a typical eigenproblem, which can be solved by computing the eigenvectors of XLX^T corresponding to the largest eigenvalues.
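The eigenproblem on this slide can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes X holds one sample per column, that L = D - S is the Laplacian of a symmetric weight matrix S, and that S starts at 1/n^2 for every pair, gains alpha/n_C for cannot-link pairs, and loses beta/n_M for must-link pairs. The function name `ssdr` and its arguments are hypothetical.

```python
import numpy as np

def ssdr(X, must_link, cannot_link, d, alpha=1.0, beta=20.0):
    """Sketch of an SSDR-style projection (assumed formulation).

    X           : (D, n) array, one sample per column
    must_link   : list of (i, j) index pairs assumed to share a class
    cannot_link : list of (i, j) index pairs assumed to differ in class
    d           : number of projection vectors to return
    Returns W   : (D, d) array with orthonormal columns
    """
    n = X.shape[1]
    # Assumed weight matrix: uniform 1/n^2 term for the global structure.
    S = np.full((n, n), 1.0 / n**2)
    n_c = max(len(cannot_link), 1)  # avoid division by zero
    n_m = max(len(must_link), 1)
    for i, j in cannot_link:        # push these pairs apart
        S[i, j] += alpha / n_c
        S[j, i] += alpha / n_c
    for i, j in must_link:          # pull these pairs together
        S[i, j] -= beta / n_m
        S[j, i] -= beta / n_m
    # Laplacian L = D - S, where D is the diagonal matrix of row sums of S.
    L = np.diag(S.sum(axis=1)) - S
    A = X @ L @ X.T
    A = (A + A.T) / 2.0             # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]    # indices of the d largest
    return eigvecs[:, top]
```

Under these assumptions, calling `ssdr(X, [], [], d)` with no constraints reduces XLX^T to the sample covariance matrix, so the result matches the PCA directions (up to sign). Low-dimensional coordinates are then obtained as `W.T @ X`.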

  7. Experiments • Data sets: 6 UCI data sets, the YaleB facial image data set, and 20-Newsgroup. • Results are averaged over 100 runs, each with a different generated set of constraints. • Parameters: α = 1, β = 20.
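The slides do not say how the constraints were generated for each run. One common protocol, assumed here purely for illustration, is to draw random pairs of samples and label each pair must-link or cannot-link according to the ground-truth class labels. The helper name `sample_constraints` and its parameters are hypothetical.

```python
import random

def sample_constraints(labels, n_pairs, seed=None):
    """Draw distinct random index pairs and split them by label agreement.

    labels  : sequence of class labels (needs at least 2 entries)
    n_pairs : number of constraints requested
    seed    : optional seed so that each run is reproducible
    Returns (must_link, cannot_link) as lists of (i, j) with i < j.
    """
    rng = random.Random(seed)
    n = len(labels)
    # Cannot draw more distinct pairs than exist.
    target = min(n_pairs, n * (n - 1) // 2)
    seen = set()
    must_link, cannot_link = [], []
    while len(seen) < target:
        i, j = rng.sample(range(n), 2)   # two distinct indices
        pair = (min(i, j), max(i, j))
        if pair in seen:
            continue                     # skip duplicates
        seen.add(pair)
        if labels[i] == labels[j]:
            must_link.append(pair)
        else:
            cannot_link.append(pair)
    return must_link, cannot_link
```

Calling the helper with a different `seed` per run would give the kind of run-to-run variation over which results are averaged.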

  8. Results on UCI Data Sets

  9. Results on UCI Data Sets (cont)
