
Music recommendations at Spotify


Presentation Transcript


  1. Music recommendations at Spotify • Erik Bernhardsson • erikbern@spotify.com

  2. Spotify • Launched in 2009 • Available in 17 countries • 20M active users, 5M paying subscribers • Peak at 5k tracks/s, 1M logged in users • 20M tracks

  3. Some applications

  4. Recommendation stuff at Spotify • Related artists:

  5. Recommendation stuff at Spotify, cont…

  6. More!

  7. How can we find music?

  8. Recommendations • Manual classification • Feature extraction • Social media analysis, web scraping, metadata based • Collaborative filtering

  9. Pandora & Music Genome Project • Classifies tracks in terms of 400 attributes • Each track takes 20-30 minutes to classify • A distance function finds similar tracks • “Subtle use of strings” • “Epic buildup” • “Acid Jazz roots” • “Beats made for dancing” • “Trippy soundscapes” • “Great trombone solo” • …
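
A distance function over those attributes could in principle be as simple as this sketch: tracks as vectors of hand-labeled attribute scores, similarity as Euclidean distance. The attribute names below are invented, and Pandora's real ~400 attributes and its actual metric are proprietary.

```python
import math

# Invented attribute names; Pandora's real ~400 attributes are proprietary.
ATTRIBUTES = ["epic_buildup", "acid_jazz_roots", "danceable_beats", "trippy_soundscapes"]

def distance(track_a, track_b):
    """Euclidean distance over the hand-labeled attribute space."""
    return math.sqrt(sum((track_a[k] - track_b[k]) ** 2 for k in ATTRIBUTES))

a = {"epic_buildup": 0.9, "acid_jazz_roots": 0.1, "danceable_beats": 0.7, "trippy_soundscapes": 0.3}
b = {"epic_buildup": 0.8, "acid_jazz_roots": 0.2, "danceable_beats": 0.6, "trippy_soundscapes": 0.4}
print(distance(a, b))  # smaller distance = more similar tracks
```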

  10. Scraping the web is another approach

  11. Feature extraction

  12. Collaborative filtering • Idea: • If two movies x, y get similar ratings then they are probably similar • If a lot of users all listen to tracks x, y, z, then those tracks are probably similar

  13. Collaborative filtering

  14. Get data

  15. … lots of data

  16. Aggregate data • Throw away temporal information and just look at the number of times each user played each track

  17. OK, so now we have a big matrix

  18. … very big matrix • Throw out all the temporal data: what remains is a users × tracks matrix of play counts
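
Concretely, dropping timestamps and counting plays produces a sparse users × tracks matrix. A minimal sketch with scipy, using a made-up play log:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical play log: one (user_id, track_id) pair per play, timestamps dropped.
plays = [(0, 0), (0, 0), (0, 2), (1, 1), (1, 2), (2, 0), (2, 2), (2, 2)]

users, tracks = zip(*plays)
counts = np.ones(len(plays))

# coo_matrix sums duplicate (user, track) entries, yielding play counts.
matrix = coo_matrix((counts, (users, tracks)), shape=(3, 3)).tocsr()
print(matrix.toarray())
# [[2. 0. 1.]
#  [0. 1. 1.]
#  [1. 0. 2.]]
```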

  19. Supervised collaborative filtering is pretty much matrix completion

  20. Supervised learning: Matrix completion

  21. Supervised: evaluating rec quality

  22. Unsupervised learning • Trying to estimate the density • i.e. predict probability of future events

  23. Try to predict the future given the past

  24. How can we find similar items?

  25. We can calculate a correlation coefficient as an item similarity • Use something like Pearson, Jaccard, …
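
A minimal sketch of both measures over columns of a toy play-count matrix: Pearson over raw counts, Jaccard over the sets of listeners.

```python
import numpy as np

# Toy play-count matrix: rows = users, columns = tracks (made-up data).
plays = np.array([[2, 0, 1],
                  [0, 1, 1],
                  [1, 0, 2]])

def pearson(i, j):
    """Pearson correlation between the play counts of two tracks."""
    return np.corrcoef(plays[:, i], plays[:, j])[0, 1]

def jaccard(i, j):
    """Jaccard similarity over the sets of users who played each track."""
    a, b = plays[:, i] > 0, plays[:, j] > 0
    return (a & b).sum() / (a | b).sum()

print(pearson(0, 2), jaccard(0, 2))
```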

  26. Amazon did this for “customers who bought this also bought” • US patent 7113917

  27. Parallelization is hard though

  28. Can speed this up using various LSH tricks • Twitter: Dimension Independent Similarity Computation (DISCO)
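
One classic LSH trick for cosine similarity is random-hyperplane hashing: items that land in the same bucket are likely similar, so candidate pairs only need to be compared within buckets. (DISCO itself is a different, sampling-based MapReduce scheme; this sketch just illustrates the general hashing idea.)

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
n_items, f, n_bits = 1000, 40, 16

items = rng.normal(size=(n_items, f))   # toy item vectors
planes = rng.normal(size=(n_bits, f))   # random hyperplanes

# Each item's signature: which side of each hyperplane it falls on.
signatures = items @ planes.T > 0

buckets = defaultdict(list)
for idx, sig in enumerate(signatures):
    buckets[sig.tobytes()].append(idx)

# Compare only within buckets: far fewer pairs than all n_items**2.
n_candidates = sum(len(b) * (len(b) - 1) // 2 for b in buckets.values())
print(n_candidates, "candidate pairs instead of", n_items * (n_items - 1) // 2)
```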

  29. Are there other approaches?

  30. Natural Language Processing has a lot of similar problems • …matrix factorization is one idea

  31. Matrix factorization

  32. Matrix factorization • Want to get user vectors and item vectors • Assume f latent factors (dimensions) for each user/item
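
In symbols: with f latent factors, user u gets a vector x_u and item i a vector y_i, and the big play-count matrix is approximated by a low-rank product:

```latex
% m users, n items, f latent factors
M \approx X Y^{\top}, \qquad \hat{r}_{ui} = x_u^{\top} y_i,
\qquad x_u,\, y_i \in \mathbb{R}^{f},\quad
X \in \mathbb{R}^{m \times f},\quad Y \in \mathbb{R}^{n \times f}
```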

  33. Probabilistic Latent Semantic Analysis (PLSA) • Hofmann, 1999 • Also called PLSI

  34. PLSA, cont. • + a bunch of constraints (see the formulation below)

  35. PLSA, cont. • Optimization problem: maximize log-likelihood
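
For reference, the standard PLSA formulation from Hofmann's paper, including the constraints slide 34 mentions: each play is explained by a mixture over latent classes z, and all parameters are probability distributions.

```latex
P(i \mid u) = \sum_{z} P(i \mid z)\, P(z \mid u)

% Constraints: everything is a probability distribution
P(i \mid z) \ge 0, \quad \sum_i P(i \mid z) = 1, \qquad
P(z \mid u) \ge 0, \quad \sum_z P(z \mid u) = 1

% Maximize the log-likelihood over observed play counts c_{ui}
\mathcal{L} = \sum_{u,i} c_{ui} \log P(i \mid u)
```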

  36. PLSA, cont.

  37. “Collaborative Filtering for Implicit Feedback Datasets” • Hu, Koren, Volinsky (2008)

  38. “Collaborative Filtering for Implicit Feedback Datasets”, cont.
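
The core of that paper: play counts r_ui become a binary preference p_ui with a confidence c_ui that grows with the count, and the fit is a weighted, regularized least squares (typically solved with alternating least squares):

```latex
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad c_{ui} = 1 + \alpha\, r_{ui}

\min_{x_*,\, y_*} \; \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```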

  39. Here is another method we use

  40. What happens each iteration • Assign all latent vectors small random values • Perform gradient ascent to optimize log-likelihood

  41. Calculate the derivative and do gradient ascent • Each iteration: compute the gradient of the log-likelihood and take a step in that direction
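
A generic sketch of those two steps, random initialization followed by repeated gradient steps. The slides don't spell out the exact log-likelihood used at Spotify, so a squared-error objective stands in for it here:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.poisson(1.0, size=(50, 80)).astype(float)  # toy play-count matrix
f, lr = 10, 0.001

# Step 1: assign all latent vectors small random values.
X = rng.normal(scale=0.1, size=(50, f))   # user vectors
Y = rng.normal(scale=0.1, size=(80, f))   # item vectors

# Step 2: repeated gradient steps on the stand-in objective.
for _ in range(100):
    E = M - X @ Y.T        # residual matrix
    X += lr * (E @ Y)      # step against the squared-error gradient (-2 E Y)
    Y += lr * (E.T @ X)    # likewise for the item vectors

print(np.mean((M - X @ Y.T) ** 2))  # reconstruction error after training
```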

  42. 2D iteration example

  43. Vectors are pretty nice because things are now super fast • User-item score is a dot product: score(u, i) = xu · yi • Item-item similarity is a cosine: sim(i, j) = yi · yj / (‖yi‖ ‖yj‖) • Both are O(f) in the number of factors f

  44. Example: item similarity as a cosine of vectors
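
Both formulas from slide 43 come down to a couple of numpy lines; toy vectors for illustration:

```python
import numpy as np

x_u = np.array([0.3, -1.2, 0.7])   # user vector (toy values)
y_i = np.array([0.5,  0.1, 0.9])   # item vector (toy values)
y_j = np.array([0.4,  0.2, 1.1])   # another item vector

score = x_u @ y_i                  # user-item score: dot product, O(f)
cosine = (y_i @ y_j) / (np.linalg.norm(y_i) * np.linalg.norm(y_j))  # item-item cosine
print(score, cosine)
```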

  45. Two dimensional example for tracks

  46. We can rank all tracks by the user’s vector
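
Ranking is then one matrix-vector product plus a sort; a minimal sketch at toy scale:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tracks, f = 100_000, 40           # toy scale; the real catalog is ~20M tracks
Y = rng.normal(size=(n_tracks, f))  # item vectors
x_u = rng.normal(size=f)            # one user's latent vector

scores = Y @ x_u                    # dot-product score against every track
top10 = np.argsort(-scores)[:10]    # indices of the user's 10 highest-scoring tracks
print(top10)
```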
