
Overview of KDDCUP 2011



  1. Overview of KDDCUP 2011 Nathan Liu nliu@cse.ust.hk

  2. KDDCUP 2011 Music Recommendation • KDDCUP is the most prominent data mining competition. • In recent years, there have been a number of contests related to movie recommendation: • Netflix 2006: predict future ratings • KDDCUP 2007: how many ratings and who rated what • CAMRA 2010: context-aware movie recommendation • KDDCUP 2011 is organized by Yahoo! and provides the first and largest music ratings dataset.

  3. Yahoo Music

  4. KDDCUP 2011 • There are three types of items: songs, artists, and albums. • Songs and albums are annotated with genres. • You are given the date, time, and score of each user’s ratings of these different items. • Challenges: • Scale: the biggest public rating dataset ever released: 1 million users, 0.6 million items, 300 million ratings • Hierarchical item relations: songs belong to albums, albums belong to artists, and all of them are annotated with genre tags. • Rich metadata: over 900 genres • Fine temporal resolution: no previous challenge provided the time of day in addition to the date. • For the project, you will be provided with a small subset of the data and we will hold a mini internal competition to determine which group obtains the best results.
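
  To make the data layout concrete, here is a minimal Python sketch of one way to hold the ratings and the song-to-album-to-artist hierarchy in memory. The file names, field order, and tab-separated format are hypothetical placeholders; consult the official data description for the real formats.

      from collections import namedtuple

      # Hypothetical record layout: ids are integers, score is numeric,
      # date and time are kept as strings until they are needed.
      Rating = namedtuple("Rating", ["user", "item", "score", "date", "time"])

      def load_ratings(path):
          """One rating per line: user<TAB>item<TAB>score<TAB>date<TAB>time."""
          with open(path) as f:
              return [Rating(int(u), int(i), float(s), d, t)
                      for u, i, s, d, t in
                      (line.rstrip("\n").split("\t") for line in f)]

      def load_hierarchy(path):
          """One song per line: song<TAB>album<TAB>artist. Returns a dict
          mapping each song id to its (album, artist) parent ids."""
          parent = {}
          with open(path) as f:
              for line in f:
                  song, album, artist = line.rstrip("\n").split("\t")
                  parent[int(song)] = (int(album), int(artist))
          return parent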

  5. KDDCUP 2011: Task 1 • The test set consists of held-out ratings from users in the training set. Each rating is timestamped. • In the test set, you are given who rated which items at what time. • You are asked to predict the rating scores. • Closely related to the Netflix competition, but may additionally require modeling time-of-day effects. • References: • Koren. Matrix Factorization Techniques for Recommender Systems (IEEE Computer 2009) • Koren. Collaborative Filtering with Temporal Dynamics (KDD’09) • Xiong. Time-Evolving Collaborative Filtering (SDM’10) • Liu. Online Evolutionary Collaborative Filtering (RECSYS’10)
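
  The backbone of most approaches to this task is the biased matrix factorization model from the Koren (IEEE Computer 2009) reference above. A minimal SGD training sketch, assuming ratings are plain (user, item, score) triples with contiguous integer ids; the temporal terms are deferred to the sketch under Things to Try (3).

      import numpy as np

      def train_mf(ratings, n_users, n_items, k=20, lr=0.005, reg=0.02, epochs=10):
          """Biased matrix factorization trained with SGD:
              r_hat(u, i) = mu + b_u + b_i + p_u . q_i
          """
          mu = sum(s for _, _, s in ratings) / len(ratings)  # global mean
          b_u, b_i = np.zeros(n_users), np.zeros(n_items)    # bias terms
          P = 0.1 * np.random.randn(n_users, k)              # user factors
          Q = 0.1 * np.random.randn(n_items, k)              # item factors
          for _ in range(epochs):
              for u, i, s in ratings:
                  err = s - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
                  b_u[u] += lr * (err - reg * b_u[u])
                  b_i[i] += lr * (err - reg * b_i[i])
                  pu = P[u].copy()                 # keep pre-update copy
                  P[u] += lr * (err * Q[i] - reg * P[u])
                  Q[i] += lr * (err * pu - reg * Q[i])
          return mu, b_u, b_i, P, Q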

  6. KDDCUP 2011: Task 2 • The test set consists of held-out ratings from users in the training set. Time has been removed. • In the test set, you are given 6 items for each user. • You are asked to predict which 3 of the 6 were actually rated by the user. • Closely related to KDDCUP 2007 “who rated what” and the CAMRA 2010 weekly recommendation track. • References: • Hu. Collaborative Filtering for Implicit Feedback Datasets (ICDM’08) • Rendle. Bayesian Personalized Ranking from Implicit Feedback (UAI’09) • Cremonesi. Performance of Recommender Algorithms on Top-N Recommendation Tasks (RECSYS’10) • Steck. Training and Testing of Recommender Systems on Data Missing Not at Random (KDD’10)
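
  Because Task 2 asks for a ranking rather than a score, pairwise methods such as BPR from the Rendle (UAI’09) reference fit it more directly than pointwise rating prediction. A minimal sketch, assuming factor matrices P and Q as in the Task 1 example and pre-sampled (user, rated item, unrated item) triples.

      import numpy as np

      def bpr_epoch(P, Q, triples, lr=0.01, reg=0.01):
          """One SGD epoch of BPR. Each triple (u, i, j) records that user u
          rated item i but not item j; the update pushes i's score above j's."""
          for u, i, j in triples:
              pu, qi, qj = P[u].copy(), Q[i].copy(), Q[j].copy()
              g = 1.0 / (1.0 + np.exp(pu @ (qi - qj)))  # grad of -log sigmoid
              P[u] += lr * (g * (qi - qj) - reg * pu)
              Q[i] += lr * (g * pu - reg * qi)
              Q[j] += lr * (-g * pu - reg * qj)

      def pick_three(P, Q, u, candidates):
          """Task 2 decision rule: rank the six candidates by model score
          and label the top three as 'rated'."""
          return sorted(candidates, key=lambda i: P[u] @ Q[i], reverse=True)[:3]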

  7. For The Project • We will extract a subset for you to work on. • We will provide some basic algorithms. • You can choose to work on one of the two tasks. • The minimum requirement is that you run thorough experiments with the provided algorithms and write a report on your findings about the different algorithms. • There are also new things to try…

  8. Things to Try (1): Ensemble • Same algorithm, different parameter settings • Different algorithms • Stacking: • What meta-learner? Gradient-boosted decision trees, linear regression • Any meta-features? Tail vs. head segmentation strategy • References: • Bao et al. Stacking Recommendation Engines with Additional Meta-Features (RECSYS’09) • Jahrer et al. Combining Predictions for Accurate Recommender Systems (KDD’10)
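
  As a concrete starting point, the simplest meta-learner above is linear regression over the base models’ predictions on a held-out probe set. A minimal sketch; the probe/test variable names in the usage comment are hypothetical.

      import numpy as np

      def fit_blend(preds, targets):
          """Learn blending weights by least squares on held-out ratings.
          `preds` has one column of predicted scores per base model;
          `targets` holds the true scores."""
          w, *_ = np.linalg.lstsq(preds, targets, rcond=None)
          return w

      # Hypothetical usage: blend two base models on a probe set, then
      # apply the learned weights to their test-set predictions.
      # w = fit_blend(np.column_stack([mf_probe, knn_probe]), probe_scores)
      # test_pred = np.column_stack([mf_test, knn_test]) @ w

  One way to use the tail-vs-head meta-feature is to fit a separate weight vector for heavy and light raters and dispatch on the user’s segment at prediction time.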

  9. Things to Try (2): Exploiting Item Relations and Genres • From social networks of users to networks of items. • Combining collaborative filtering with genre-based prediction to alleviate sparsity. • References: • Ma. Recommender Systems with Social Regularization (WSDM’11) • Agarwal. Regression-based Latent Factor Models (KDD’09) • Popescul. Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments (UAI’01) • Gunawardana. Tied Boltzmann Machines for Cold Start Recommendations (RecSys’08)
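
  One way to act on the first bullet is to transplant the social regularization idea from the Ma (WSDM’11) reference onto the item hierarchy: shrink each song’s latent factors toward those of its album and artist. Mapping a user graph onto the item tree is an assumption of this sketch, not the paper’s setting, and it presumes albums and artists have their own rows in Q.

      def hierarchy_grad(Q, parent, i, beta=0.1):
          """Extra gradient term that pulls song i's factor vector toward
          its album's and artist's factors (item-tree analogue of social
          regularization; albums/artists are treated as items in Q)."""
          album, artist = parent[i]
          return beta * ((Q[i] - Q[album]) + (Q[i] - Q[artist]))

      # Inside the SGD loop of the factorization model one would add:
      #     Q[i] -= lr * hierarchy_grad(Q, parent, i)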

  10. Things to Try (3): Temporal Dynamics • Various possible types of temporal dynamics: • Long-term effect: people getting pickier over time • Short-term effect: festival mood • Time-of-day effect: daytime vs. nighttime preferences • Periodicity: every Friday night is party time • References: • Koren. Collaborative Filtering with Temporal Dynamics (KDD’09)
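
  The time-of-day effect is the easiest of these to prototype, since this dataset uniquely provides rating times. A minimal sketch in the spirit of the time-aware biases in Koren (KDD’09): bucket the hour into coarse bins and give each item a per-bin bias offset b_it, learned with the same SGD rule as the other biases. The bin count and predictor shape here are assumptions of this sketch.

      def time_bin(hour, n_bins=4):
          """Coarse time-of-day bucket (night/morning/afternoon/evening)."""
          return hour * n_bins // 24

      def predict_temporal(mu, b_u, b_i, b_it, P, Q, u, i, hour):
          """Time-aware predictor: the item bias gets a per-bin offset,
          r_hat(u, i, t) = mu + b_u + b_i + b_it[i, bin(t)] + p_u . q_i,
          where b_it is an (n_items, n_bins) array of learned offsets."""
          return mu + b_u[u] + b_i[i] + b_it[i, time_bin(hour)] + P[u] @ Q[i]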
