
Effective Dimension Reduction with Prior Knowledge


Presentation Transcript


  1. Effective Dimension Reduction with Prior Knowledge. Haesun Park, Division of Computational Science and Eng., College of Computing, Georgia Institute of Technology, Atlanta, GA. Joint work w/ Barry Drake, Peg Howland, Hyunsoo Kim, and Cheonghee Park. DIMACS, May 2007

  2. Dimension Reduction
  • Dimension Reduction for Clustered Data: Linear Discriminant Analysis (LDA), Generalized LDA (LDA/GSVD, regularized LDA), Orthogonal Centroid Method (OCM)
  • Dimension Reduction for Nonnegative Data: Nonnegative Matrix Factorization (NMF)
  • Applications: Text classification, Face recognition, Fingerprint classification, Gene clustering in Microarray Analysis, ...

  3. 2D Representation: Utilize Cluster Structure if Known
  Figure: 2D representation of 150x1000 data with 7 clusters, LDA vs. SVD

  4. Dimension Reduction for Clustered Data: Measure for Cluster Quality
  A = [a1 ... an] : m x n, clustered data
  Ni = set of items in class i, |Ni| = ni, total r classes; ci = centroid of class i, c = global centroid
  • Sw = ∑_{1≤i≤r} ∑_{j∈Ni} (aj - ci)(aj - ci)^T
  • Sb = ∑_{1≤i≤r} ∑_{j∈Ni} (ci - c)(ci - c)^T
  • St = ∑_{1≤i≤n} (ai - c)(ai - c)^T,  with Sw + Sb = St
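
As a concrete illustration of these definitions, the sketch below forms Sw, Sb, and St with NumPy from a data matrix whose columns are the items, plus a vector of class labels. The function name and arguments are hypothetical, not from the talk.

```python
import numpy as np

def scatter_matrices(A, labels):
    """Within-, between-, and total-scatter matrices for the columns of A.

    A      : (m, n) array, each column one data item
    labels : length-n integer array of class labels
    """
    m, n = A.shape
    c = A.mean(axis=1, keepdims=True)            # global centroid
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for k in np.unique(labels):
        Ak = A[:, labels == k]                   # items in class k
        ck = Ak.mean(axis=1, keepdims=True)      # class centroid
        D = Ak - ck
        Sw += D @ D.T                            # within-class scatter of class k
        Sb += Ak.shape[1] * (ck - c) @ (ck - c).T
    D = A - c
    St = D @ D.T                                 # total scatter; St = Sw + Sb
    return Sw, Sb, St
```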

  5. Optimal Dimension Reducing Transformation
  y : m x 1,  G^T : q x m,  G^T y : q x 1,  q << m
  High quality clusters have small trace(Sw) and large trace(Sb)
  Want: G s.t. min trace(G^T Sw G) and max trace(G^T Sb G)
  • max trace((G^T Sw G)^{-1} (G^T Sb G))               LDA (Fisher '36, Rao '48)
  • max trace(G^T Sb G),  G^T G = I                     Orthogonal Centroid (Park et al. '03)
  • max trace(G^T (Sw + Sb) G),  G^T G = I              PCA (Pearson 1901, Hotelling '33)
  • max trace(G^T A A^T G),  G^T G = I                  LSI (Deerwester et al. '90)

  6. Classical LDA (Fisher '36, Rao '48)
  max trace((G^T Sw G)^{-1} (G^T Sb G))
  • G : leading (r-1) eigenvectors of Sw^{-1} Sb
  • Fails when m > n (undersampled), since Sw is singular
  • Sw = Hw Hw^T,  Hw = [a1 - c1, a2 - c1, ..., an - cr] : m x n
  • Sb = Hb Hb^T,  Hb = [√n1 (c1 - c), ..., √nr (cr - c)] : m x r
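
When Sw is nonsingular (roughly, more items than dimensions), the classical criterion reduces to an ordinary symmetric generalized eigenproblem. A minimal sketch, reusing the hypothetical scatter_matrices helper from the previous snippet:

```python
import numpy as np
from scipy.linalg import eigh

def classical_lda(A, labels, q):
    """Leading q discriminant directions from Sb x = lambda Sw x (Sw nonsingular)."""
    Sw, Sb, _ = scatter_matrices(A, labels)      # hypothetical helper defined above
    evals, evecs = eigh(Sb, Sw)                  # generalized eigenproblem, ascending order
    return evecs[:, ::-1][:, :q]                 # top q (q <= r-1) eigenvectors of Sw^{-1} Sb

# Usage sketch: G = classical_lda(A, labels, q);  reduced data = G.T @ A
```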

  7. LDA based on GSVD (LDA/GSVD) (Howland, Jeon, Park, SIMAX '03; Howland and Park, IEEE TPAMI '04)
  Sw^{-1} Sb x = λ x,  i.e., Sb x = λ Sw x,  i.e., β² Hb Hb^T x = α² Hw Hw^T x with λ = α²/β²
  GSVD of the pair (Hb^T, Hw^T):  U^T Hb^T X = (Σb  0),  V^T Hw^T X = (Σw  0)
  so that  X^T Sb X = X^T Hb Hb^T X  and  X^T Sw X = X^T Hw Hw^T X  are both diagonal
  Classical LDA is a special case of LDA/GSVD
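
NumPy/SciPy have no built-in GSVD, but the LDA/GSVD construction can be followed with two ordinary SVDs: one of the stacked matrix K = [Hb^T; Hw^T] and one of an r x t subblock of its left factor. The sketch below is my own simplified rendering of that outline, with hypothetical names and an arbitrary rank tolerance; it is not the authors' implementation.

```python
import numpy as np

def lda_gsvd(A, labels, tol=1e-10):
    """Rough LDA/GSVD sketch; applies even when Sw is singular (m >> n)."""
    m, n = A.shape
    classes = np.unique(labels)
    r = len(classes)
    c = A.mean(axis=1, keepdims=True)
    Hb_cols, Hw_cols = [], []
    for k in classes:
        Ak = A[:, labels == k]
        ck = Ak.mean(axis=1, keepdims=True)
        Hb_cols.append(np.sqrt(Ak.shape[1]) * (ck - c))
        Hw_cols.append(Ak - ck)
    Hb = np.hstack(Hb_cols)                       # m x r
    Hw = np.hstack(Hw_cols)                       # m x n
    K = np.vstack([Hb.T, Hw.T])                   # (r+n) x m stacked matrix
    P, s, Qt = np.linalg.svd(K, full_matrices=False)
    t = int(np.sum(s > tol * s[0]))               # numerical rank of K
    _, _, Wt = np.linalg.svd(P[:r, :t])           # SVD of the top-left r x t block
    X = Qt.T[:, :t] @ (Wt.T / s[:t][:, None])     # Q_t * diag(1/s) * W
    return X[:, :r - 1]                           # G : m x (r-1)
```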

  8. Generalization of LDA for Undersampled Problems
  • Regularized LDA (Friedman '89, Zhao et al. '99, ...)
  • LDA/GSVD: solution G = [X1 X2] (Howland, Jeon, Park '03)
  • Solutions based on Null(Sw) and Range(Sb), ... (Chen et al. '00, Yu & Yang '01, Park & Park '03, ...)
  • Two-stage methods:
    Face Recognition: PCA + LDA (Swets & Weng '96, Zhao et al. '99)
    Information Retrieval: LSI + LDA (Torkkola '01)
  • Mathematical equivalence (Howland and Park '03): PCA + LDA/GSVD = LDA/GSVD, LSI + LDA/GSVD = LDA/GSVD; more efficient: QRD + LDA/GSVD

  9. QRD Preprocessing in Dim. Reduction (Distance Preserving Dim. Reduction)
  For undersampled data A : m x n, m >> n
  A = [Q1  Q2] [R; 0] = Q1 R,  Q1 : orthonormal basis for span(A)
  Dimension reduction of A by Q1^T:  Q1^T A = R : n x n
  Q1^T preserves distance in the L2 norm:  ||ai||_2 = ||Q1^T ai||_2,  ||ai - aj||_2 = ||Q1^T (ai - aj)||_2
  In cosine distance:  cos(ai, aj) = cos(Q1^T ai, Q1^T aj)
  Applicable to PCA, LDA, LDA/GSVD, Isomap, LTSA, LLE, ...
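
The distance-preservation claims are easy to check numerically. A minimal sketch with random undersampled data (all names and sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10000, 150                               # undersampled: m >> n
A = rng.standard_normal((m, n))

Q1, R = np.linalg.qr(A, mode='reduced')         # A = Q1 R, Q1 : m x n, R : n x n
B = Q1.T @ A                                    # reduced data, equals R (n x n)

i, j = 3, 17
print(np.allclose(np.linalg.norm(A[:, i]), np.linalg.norm(B[:, i])))          # L2 norm preserved
print(np.allclose(np.linalg.norm(A[:, i] - A[:, j]),
                  np.linalg.norm(B[:, i] - B[:, j])))                         # pairwise distance preserved
cos = lambda x, y: x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.allclose(cos(A[:, i], A[:, j]), cos(B[:, i], B[:, j])))              # cosine preserved
```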

  10. Speed Up with QRD Preprocessing
  Figure: computation time with QRD preprocessing

  11. Text Classification with Dim. Reduction (Kim, Howland, Park, JMLR '03)
  Figure: classification accuracy (%); similarity measures: L2 norm and cosine

  12. Face Recognition on Yale Data (C. Park and H. Park, ICDM '04)
  Prediction accuracy in %, leave-one-out (and, in parentheses, average of 100 random splits), kNN classifier:

  Dim. Red. Method                        Dim    k=1         k=5    k=9
  Full Space                              8586   79.4        76.4   72.1
  LDA/GSVD                                14     98.8 (90)   98.8   98.8
  Regularized LDA (λ=1)                   14     97.6 (85)   97.6   97.6
  Proj. to null(Sw) (Chen et al., '00)    14     97.6 (84)   97.6   97.6
  Transf. to range(Sb) (Yu & Yang, '01)   14     89.7 (82)   94.6   91.5

  Yale Face Database: 243 x 320 pixels = full dimension of 77760; 11 images/person x 15 people = 165 images
  After preprocessing (3x3 averaging): 8586 x 165

  13. Fingerprint Classification Results on NIST Fingerprint Database 4 (C. Park and H. Park, Pattern Recognition, 2005)
  KDA/GSVD: nonlinear extension of LDA/GSVD based on kernel functions
  Accuracy (%) at each rejection rate:

  Method                          0%     1.8%   8.5%
  KDA/GSVD                        90.7   91.3   92.8
  kNN & NN (Jain et al., '99)     -      90.0   91.2
  SVM (Yao et al., '03)           -      90.0   92.2

  4000 fingerprint images of size 512x512; by KDA/GSVD, dimension reduced from 105x105 to 4

  14. Nonnegativity Preserving Dim. Reduction: Nonnegative Matrix Factorization
  (Paatero & Tapper '94, Lee & Seung, Nature '99, Pauca et al., SIAM DM '04, Hoyer '04, Lin '05, Berry '06, Kim and Park '06, ...)
  Given A : m x n with A >= 0 and k << min(m, n), find W : m x k and H : k x n with W >= 0 and H >= 0 s.t.
  min || A - WH ||_F,  i.e., A ≈ WH
  • NMF/ANLS: two-block coordinate descent method in bound-constrained optimization
  • Iterate the following ANLS (Kim and Park, Bioinformatics, to appear):
    fixing W, solve min_{H>=0} || WH - A ||_F
    fixing H, solve min_{W>=0} || H^T W^T - A^T ||_F
  • Any limit point is a stationary point (Grippo and Sciandrone '00)
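
A simple way to prototype the ANLS iteration is to solve each nonnegative least squares subproblem column by column with scipy.optimize.nnls. This is far slower than the block solvers used in the cited work, but it follows the same two-block scheme; the function name and iteration count below are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def nmf_anls(A, k, n_iter=50, seed=0):
    """Alternating nonnegative least squares for A ~= W H with A, W, H >= 0."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, k))
    H = np.zeros((k, n))
    for _ in range(n_iter):
        # Fix W: min_{H>=0} ||W H - A||_F, solved one column of A at a time
        for j in range(n):
            H[:, j], _ = nnls(W, A[:, j])
        # Fix H: min_{W>=0} ||H^T W^T - A^T||_F, solved one row of A at a time
        for i in range(m):
            W[i, :], _ = nnls(H.T, A[i, :])
    return W, H
```

Monitoring the relative residual ||A - WH||_F / ||A||_F across iterations gives curves of the kind shown on the performance slide below.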

  15. Nonnegativity Constraints?
  • Better approximation vs. better representation/interpretation
  • Given A : m x n and k < min(m, n)
  • SVD: best approximation  min || A - WH ||_F;  A = U Σ V^T,  A ≈ Uk Σk Vk^T
  • NMF: better representation/interpretation?  min || A - WH ||_F,  W >= 0, H >= 0
  • Nonnegativity constraints are physically meaningful: pixels in digital images, molecule concentrations in bioinformatics, signal intensities, visualization, ...
  • Interpretation of analysis results: nonsubtractive combinations of nonnegative basis vectors
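
For contrast, the best rank-k approximation in the Frobenius norm comes from the truncated SVD; a minimal NumPy sketch (function name hypothetical). Unlike NMF factors, Uk and Vk generally contain negative entries, so the combination is subtractive rather than parts-based.

```python
import numpy as np

def truncated_svd_approx(A, k):
    """Best rank-k approximation A ~= Uk Sk Vk^T in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```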

  16. Performance of NMF Algorithms
  Figure: relative residuals vs. number of iterations for NMF/ANLS, NMF/MUR, and NMF/ALS on a zero-residual artificial problem, A : 200x50

  17. Recovery of Factors by SVD and NMF
  Figure: recovery of the factors W and H by SVD and NMF/ANLS, where A = W*H, A : 2500x28, W : 2500x3, H : 3x28

  18. Summary
  • Effective algorithms for dimension reduction and matrix decompositions that exploit prior knowledge
  • Design of new algorithms, e.g., for undersampled data
  • Take advantage of prior knowledge for physically more meaningful modeling
  • Storage and efficiency issues for massive scale data
  • Adaptive algorithms
  • Applicable to a wide range of problems (text classification, facial recognition, fingerprint classification, gene class discovery in microarray data, protein secondary structure prediction, ...)
  Thank you!
