
Unsupervised Learning by Convex and Conic Coding



Presentation Transcript


  1. Unsupervised Learning by Convex and Conic Coding D. D. Lee and H. S. Seung NIPS’1997

  2. Introduction • Learning algorithms based on convex and conic encoders are introduced. • Less constrained than VQ but more constrained than PCA. • VQ • Encodes each input as the index of the closest prototype. • Captures nonlinear structure • Highly localized • PCA • Encodes each input as the coefficients of a linear superposition of a set of basis vectors. • Distributed representation • Can only model linear structure.

  3. Convex and conic coding can produce sparse distributed representations. • The learning algorithms can be understood as approximate matrix factorization.

  4. Affine, Convex, Conic, and Point Coding • Definition • Given a set of basis vectors {w_a}, a = 1, ..., r, a linear combination Σ_a v_a w_a lies in the • Affine hull if Σ_a v_a = 1 • Convex hull if v_a ≥ 0 and Σ_a v_a = 1 • Conic hull if v_a ≥ 0
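
  The short Python sketch below is an illustrative check (not from the paper) of the coefficient constraints that distinguish the three hulls; the function name and tolerance are hypothetical choices made here.

  import numpy as np

  def combination_type(v, tol=1e-8):
      """Classify the coefficient vector v of the combination sum_a v[a] * w_a."""
      nonneg = np.all(v >= -tol)             # required for conic and convex hulls
      sums_to_one = abs(v.sum() - 1) < tol   # required for affine and convex hulls
      if nonneg and sums_to_one:
          return "convex"
      if nonneg:
          return "conic"
      if sums_to_one:
          return "affine"
      return "general linear"

  print(combination_type(np.array([0.3, 0.7])))   # convex
  print(combination_type(np.array([0.3, 0.9])))   # conic
  print(combination_type(np.array([1.5, -0.5])))  # affine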

  5. Goal of encoding • Find the nearest point to the input in the respective hull, i.e. minimize the reconstruction error. • Convex and conic encodings • Sparse encodings • They contain coefficients that vanish exactly, due to the nonnegativity constraints in the optimization.
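
  As one concrete way to compute such an encoding (an illustrative sketch, not the authors' implementation), conic encoding of a single input reduces to nonnegative least squares, e.g. via scipy.optimize.nnls:

  import numpy as np
  from scipy.optimize import nnls

  def conic_encode(W, x):
      """Project x onto the conic hull of W's columns: find v >= 0 minimizing ||W v - x||."""
      v, err = nnls(W, x)   # nonnegative least squares
      return v, err         # many entries of v are exactly zero, so the code is sparse

  # Toy usage with random data
  rng = np.random.default_rng(0)
  W = rng.random((256, 25))        # 25 basis "images" on a 16 x 16 grid, flattened
  x = rng.random(256)              # one input image
  v, err = conic_encode(W, x)
  print(np.count_nonzero(v), "active coefficients, reconstruction error", err)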

  6. Learning • Objective function: E(W, V) = ‖X − WV‖² = Σ_ij ( X_ij − Σ_a W_ia V_aj )² • X : n × m matrix of the training set • n : dimension, m : number of data points • W : n × r matrix (basis vectors) • V : r × m matrix (encodings) • Description • Approximate factorization of the data matrix X into a matrix W of basis vectors and a matrix V of code vectors.

  7. Constraints • If the input vectors in X have been scaled to the range [0, 1], the constraints on the optimizations are • 0 ≤ W_ia ≤ 1 on the basis vectors • V_aj ≥ 0 for conic coding, with the additional constraint Σ_a V_aj = 1 for convex coding • The nonnegativity constraints prevent cancellations from occurring in the linear combinations.
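
  A minimal sketch of the conic learning step under these constraints, using alternating projected gradient descent on W and V (the step size, iteration count, and the way W is clipped to [0, 1] are assumptions made here for illustration, not the paper's exact procedure):

  import numpy as np

  def conic_factorize(X, r, n_iter=500, lr=1e-3, seed=0):
      """Approximately factorize X (n x m) as W @ V with 0 <= W <= 1 and V >= 0,
      minimizing the squared reconstruction error ||X - W V||^2."""
      rng = np.random.default_rng(seed)
      n, m = X.shape
      W = rng.random((n, r))
      V = rng.random((r, m))
      for _ in range(n_iter):
          R = W @ V - X                                   # residual
          V = np.maximum(V - lr * (W.T @ R), 0.0)         # gradient step on V, project onto V >= 0
          R = W @ V - X
          W = np.clip(W - lr * (R @ V.T), 0.0, 1.0)       # gradient step on W, project into [0, 1]
      return W, V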

  8. Example: modeling handwritten digits • Experimental Setup • Affine, Convex, Conic, and VQ learning were applied to the USPS database. • Handwritten digits segmented from actual zip codes. • 7291 training and 2007 test images were normalized to a 16 × 16 grid with pixel intensities in the range [0, 1]. • Training examples were segregated by digit class. • Separate basis vectors were trained for each of the classes using the four encodings.

  9. VQ • The k-means algorithm was used • Restarted with various initial conditions and the best solution was chosen. • Affine • Determines the affine space that best models the input data. • No obvious interpretation of the basis vectors. • Convex • Finds the r basis vectors whose convex hull best fits the input data. • Alternates between projected gradient steps on W and V. • The basis vectors are interpretable as templates and are less blurred than those found by VQ. • Many invariant transformations are eliminated, because they would violate the nonnegativity constraints.

  10. Conic • Finds basis vectors whose conic hull best models the input images. • The representation allows combinations of basis vectors. • The basis vectors found are features rather than templates.

  11. Classification • Each test image is reconstructed separately with each digit model. • The image is associated with the model having the smallest reconstruction error.
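
  A sketch of this rule, assuming one basis matrix has been learned per digit class and an encoder such as the conic_encode helper above (both names are illustrative):

  def classify(x, class_bases, encode):
      """Return the label of the class whose model reconstructs x with the smallest error.

      class_bases : dict mapping class label -> basis matrix W_c (one per digit class)
      encode      : function (W, x) -> (code, reconstruction_error)
      """
      errors = {label: encode(W, x)[1] for label, W in class_bases.items()}
      return min(errors, key=errors.get)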

  12. Results • With r = 25 patterns per digit class • Convex: error rate = 113/2007 = 5.6% • With r = 100 patterns: 89/2007 = 4.4% • Conic: 138/2007 = 6.9% • With r > 50, performance worsens as the features shrink to small spots. • Non-trivial correlations still remain in the encodings and also need to be taken into account.

  13. Discussion • Convex coding is similar to other locally linear models. • Conic coding is similar to the noisy-OR and harmonium models • Conic coding uses continuous variables rather than binary variables. • This makes the encoding computationally tractable and allows interpolation between basis vectors. • Convex and conic coding can be viewed as probabilistic latent variable models. • No explicit model P(v_a) for the hidden variables was used. • This limits the quality of the conic models • Building hierarchical representations is needed.
