Matrix Factorization: Presentation Transcript
    1. Matrix Factorization

    2. Recovering latent factors in a matrix. V is an n users x m movies matrix; V[i,j] = user i's rating of movie j.

    3. Recovering latent factors in a matrix. The n users x m movies matrix V is approximated by a lower-rank matrix of the same shape; V[i,j] = user i's rating of movie j.

    4. KDD 2011 talk pilfered from …

    5. Recovering latent factors in a matrix: V ~ WH, where V is n users x m movies, W is the n x r matrix of user factors, H is the r x m matrix of movie factors, and r is the rank. V[i,j] = user i's rating of movie j.
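    (Not part of the slides: a minimal NumPy sketch of the shapes involved; the sizes below are made-up assumptions.)

        import numpy as np

        n, m, r = 1000, 500, 10    # n users, m movies, rank r (made-up sizes)
        W = np.random.rand(n, r)   # user factors
        H = np.random.rand(r, m)   # movie factors
        V_hat = W @ H              # rank-r reconstruction of the n x m ratings matrix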

    6. ... for image denoising [figure on slide]

    7. Matrix factorization as SGD [update equations on slide; the highlighted term is the step size]
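    (The slide's equations did not survive transcription; below is a minimal sketch of the usual squared-loss SGD update, with assumed variable names, not necessarily the slide's exact formulation.)

        import numpy as np

        def sgd_step(V, W, H, i, j, step):
            """One SGD update on squared loss for a single observed rating V[i, j]."""
            err = V[i, j] - W[i, :] @ H[:, j]   # prediction error for this cell
            dW = -2 * err * H[:, j]             # gradient of err**2 w.r.t. W[i, :]
            dH = -2 * err * W[i, :]             # gradient of err**2 w.r.t. H[:, j]
            W[i, :] -= step * dW                # 'step' is the step size from the slide
            H[:, j] -= step * dH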

    8. Matrix factorization as SGD: why does this work? [equation on slide, step size highlighted]

    9. Matrix factorization as SGD: why does this work? Here's the key claim: [claim shown on slide]

    10. Checking the claim
    • Think of SGD for logistic regression
    • LR loss compares y and ŷ = dot(w, x)
    • This is similar, but now we update both w (the user weights) and x (the movie weights)

    11. What loss functions are possible?
    • N1, N2: diagonal matrices, sort of like IDF factors for the users/movies
    • "generalized" KL-divergence
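    (For reference, not from the slides: the generalized KL-divergence is commonly written D(V || WH) = sum_ij [ V_ij log(V_ij / (WH)_ij) - V_ij + (WH)_ij ]; a small sketch follows, with an assumed epsilon for numerical safety.)

        import numpy as np

        def generalized_kl(V, WH, eps=1e-12):
            """Generalized KL-divergence D(V || WH); eps guards against log(0)."""
            return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)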

    12. What loss functions are possible?

    13. What loss functions are possible?

    14. ALS = alternating least squares
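    (Not from the slides: a minimal dense ALS sketch. It treats every cell of V as observed, whereas a recommender would restrict the least-squares fits to observed ratings; the regularizer lam is an assumption.)

        import numpy as np

        def als(V, r, n_iters=20, lam=0.1):
            """Dense alternating least squares for V ~ W @ H."""
            n, m = V.shape
            W = np.random.rand(n, r)
            H = np.random.rand(r, m)
            I = np.eye(r)
            for _ in range(n_iters):
                # Fix H, solve the ridge normal equations for W (one LS problem per user)
                W = np.linalg.solve(H @ H.T + lam * I, H @ V.T).T
                # Fix W, solve for H (one LS problem per movie)
                H = np.linalg.solve(W.T @ W + lam * I, W.T @ V)
            return W, H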

    15. KDD 2011 talk pilfered from …

    16. Similar to McDonnell et al. with perceptron learning

    17. Slow convergence…

    18. More detail…
    • Randomly permute rows/cols of the matrix
    • Chop V, W, H into blocks of size d x d
      • m/d blocks in W, n/d blocks in H
    • Group the data:
      • Pick a set of blocks with no overlapping rows or columns (a stratum); see the sketch after this list
      • Repeat until all blocks in V are covered
    • Train the SGD:
      • Process strata in series
      • Process blocks within a stratum in parallel
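    (A minimal sketch of one way to build such strata, assuming a square b x b grid of blocks; the diagonal-shift construction is an illustration, not necessarily the talk's exact scheduling.)

        def strata(b):
            """Yield b strata for a b x b grid of blocks. Stratum s pairs row-block i
            with column-block (i + s) % b, so no two blocks share rows or columns."""
            for s in range(b):
                yield [(i, (i + s) % b) for i in range(b)]

        for stratum in strata(3):
            print(stratum)   # blocks within a stratum can run in parallel
        # [(0, 0), (1, 1), (2, 2)]
        # [(0, 1), (1, 2), (2, 0)]
        # [(0, 2), (1, 0), (2, 1)]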

    19. More detail… [figure on slide; its Z is the matrix called V above]

    20. More detail… [definition of M shown on slide]
    • Initialize W, H randomly
      • not at zero 
    • Choose a random ordering (random sort) of the points in a stratum in each "sub-epoch"
    • Pick the strata sequence by permuting rows and columns of M, and using M'[k,i] as the column index of row i in sub-epoch k
    • Use "bold driver" to set the step size (see the sketch after this list):
      • increase the step size when the loss decreases (in an epoch)
      • decrease the step size when the loss increases
    • Implemented in Hadoop and R/Snowfall
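    (A minimal sketch of the bold-driver heuristic; the grow/shrink factors are assumptions, not values from the talk.)

        def bold_driver(step, prev_loss, loss, grow=1.05, shrink=0.5):
            """Grow the step size while the epoch loss keeps falling;
            cut it back sharply as soon as the loss rises."""
            return step * grow if loss < prev_loss else step * shrink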

    21. Wall clock time (8 nodes, 64 cores, R/snow) [plot on slide]

    22. Number of epochs [plot on slide]

    23. Varying rank (100 epochs for all) [plot on slide]

    24. Hadoop scalability: Hadoop process setup time starts to dominate

    25. Hadoop scalability [plot on slide]