
TiVo Suggestions: Predicting Viewer Affinity Using Collaborative Filtering

Kamal Ali (TiVo, Yahoo) and Wijnand van Stam (TiVo).


Presentation Transcript


  1. TiVo Suggestions: Predicting Viewer Affinity Using Collaborative Filtering Kamal Ali – TiVo, Yahoo Wijnand van Stam, TiVo

  2. Outline • What is “TiVo” ? • Why Suggestions? • Collaborative filtering background • TiVo collaborative filtering data cycle • Server-side learning • Previous Work • Contributions

  3. Contributions • Large fielded system • Large number (3M) of users • Long-lived interaction with user: >90 thumbs/user • 10^8 ratings over 300K shows • Very large in user-hours • Distributed architecture • Server: Throttle-able • Clients do bulk of work • Privacy-preservation • Privacy and distributed goals aligned • No persistent memory of user on server

  4. What is “TiVo” ? • TiVo = set-top TV box + program-guide service • Pause & rewind live TV • Linux OS • Viewers can rate shows • Suggestions • Q4 1999

  5. Why Suggestions? • Connect users to shows they’ll like • Predict degree to which viewer will like TV show • Produces ranked list of upcoming shows • Records shows if disk space is available

  6. Filtering Background • Recommendation systems • Content-based: use “intrinsic” features such as genre, cast, director, writers, age, channel type, … • Collaborative filtering: use other people’s ratings • Combined, cascaded

  7. Content isn’t sufficient • Genres are few • Text length is small

  8. Data cycle • 1. Collecting feedback: thumbs up/down, recordings; Thumbs profile kept on TiVo client box • 2. TiVo calls server, uploads entire anonymized profile; random ID generated for profile and stored on server • 3. Server-side learning: correlation pairs <s1,s2,r> on server • 4. Download pairs during some client-initiated calls; correlation pairs on client • 5. Use correlations and Thumbs profile to rate shows; rated shows in sorted order

  9. Collaborative Filtering Model • k Nearest Neighbor over other rated correlated shows • Use Pair-wise Pearson correlation • Adjusted correlation for low support • Use weighted linear combination
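The model on this slide can be sketched in a few lines. This is a minimal illustration, not TiVo's implementation: the function name `predict_thumbs`, the default `k=3`, and the choice to normalize by the sum of absolute weights are all assumptions. The `correlations` dictionary is assumed to already hold the support-adjusted correlation for each show pair.

```python
from typing import Dict, Optional, Tuple


def predict_thumbs(target: str,
                   profile: Dict[str, int],
                   correlations: Dict[Tuple[str, str], float],
                   k: int = 3) -> Optional[float]:
    """Predict a thumbs value in [-3, +3] for `target` from rated neighbors.

    profile:      show -> thumbs the viewer gave (-3..+3)
    correlations: (show_a, show_b) -> adjusted correlation (either key order)
    """
    neighbors = []
    for show, thumbs in profile.items():
        if show == target:
            continue
        r = correlations.get((target, show))
        if r is None:
            r = correlations.get((show, target))
        if r is not None:
            neighbors.append((abs(r), r, thumbs))

    if not neighbors:
        return None  # no correlated rated shows: collaborative filtering cannot answer

    # Keep the k neighbors with the strongest (absolute) correlation.
    neighbors.sort(reverse=True)
    top = neighbors[:k]

    denom = sum(weight for weight, _, _ in top)
    if denom == 0:
        return None
    # Weighted linear combination of the viewer's own thumbs.
    return sum(r * thumbs for _, r, thumbs in top) / denom
```

For example, a viewer who gave +3 to a show correlated at 0.8 with the target and -2 to a show correlated at 0.2 gets a prediction of about +2.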

  10. 1. Collecting Feedback • Explicit: • Thumbs up, down: -3 ... +3 • Implicit: • User-initiated recording ... +1 thumbs

  11. 2. Privacy and Data Upload • TiVo calls server daily • Entire profile uploaded and given temp id • Server deletes old profiles: sliding window
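One way to picture the upload step is a store that keys each uploaded profile by a fresh random ID and forgets anything older than the window. This is only a sketch: the class name `ProfileStore`, the 7-day default window, and issuing a new ID on every upload are assumptions, since the slide gives none of these details.

```python
import secrets
from typing import Dict, Tuple


class ProfileStore:
    """Hypothetical server-side store: anonymous profiles in a sliding window."""

    def __init__(self, window_days: int = 7) -> None:  # window length is assumed
        self.window_days = window_days
        # temp_id -> (day uploaded, thumbs profile)
        self.profiles: Dict[str, Tuple[int, Dict[str, int]]] = {}

    def upload(self, profile: Dict[str, int], today: int) -> str:
        """Store the whole profile under a random temporary ID."""
        temp_id = secrets.token_hex(8)  # random, not derived from the box
        self.profiles[temp_id] = (today, dict(profile))
        return temp_id

    def expire(self, today: int) -> None:
        """Drop every profile that has fallen out of the window."""
        cutoff = today - self.window_days
        self.profiles = {
            temp_id: entry
            for temp_id, entry in self.profiles.items()
            if entry[0] > cutoff
        }
```

Because nothing links one upload to the next, the server holds no persistent memory of any individual viewer, which matches the privacy goal stated in the Contributions slide.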

  12. 3.1 Server-side scaling • 300,000 unique shows/week • 10^11 pairs of shows • 3M users • Average of 90 thumbs/user: >10^8 thumbs (ratings) • Ratings are sparse in the pair space • Don’t need to predict for very unpopular pairs
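The scaling figures on this slide can be checked with a few lines of arithmetic. The variable names are illustrative; the only inputs are the numbers quoted on the slide.

```python
shows = 300_000          # unique shows per week
users = 3_000_000        # TiVo boxes
thumbs_per_user = 90     # average ratings per user

ordered_pairs = shows * shows                  # 9.0e10, the slide's ~10^11
unordered_pairs = shows * (shows - 1) // 2     # ~4.5e10 distinct pairs
total_thumbs = users * thumbs_per_user         # 2.7e8, i.e. > 10^8

# Ratings per possible ordered pair: far below 1, so most pairs never occur.
density = total_thumbs / ordered_pairs

print(f"{ordered_pairs:.1e} ordered pairs, {total_thumbs:.1e} thumbs, "
      f"{density:.4f} thumbs per pair")
```

With roughly 0.003 thumbs per possible pair, the pair space is very sparse, which is why the server can skip unpopular pairs.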

  13. 3.2 Server-side Learning • Building pair-wise item/item correlations on server • Use simple Pearson pair-wise correlation • 7 rating levels per show [-3 … +3] • Only need to maintain 7 * 7 array of counts per pair • Efficient: CPU, memory • Compute r-to-z transform to compute confidence interval • Support-penalized degree of correlation: lower bound of confidence interval • Distinguishes r = 0.8 for S=10 versus S=1000
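The idea that a 7 × 7 array of counts is enough to compute Pearson's r, and that the Fisher transform then penalizes low support, can be sketched as follows. The function names are hypothetical, and the 95% confidence level (`z_crit=1.96`) is an assumption; the slides do not state which level was used.

```python
import math
from typing import List, Optional, Tuple

LEVELS = range(-3, 4)  # the seven thumbs levels, -3 .. +3


def new_counts() -> List[List[int]]:
    """One 7x7 table per show pair: counts[a+3][b+3] = users rating (a, b)."""
    return [[0] * 7 for _ in range(7)]


def add_rating_pair(counts: List[List[int]], a: int, b: int) -> None:
    counts[a + 3][b + 3] += 1


def pearson_from_counts(counts: List[List[int]]) -> Tuple[Optional[float], int]:
    """Return (r, support); r is None when it is undefined."""
    n = sx = sy = sxx = syy = sxy = 0
    for i, a in enumerate(LEVELS):
        for j, b in enumerate(LEVELS):
            c = counts[i][j]
            n += c
            sx += c * a
            sy += c * b
            sxx += c * a * a
            syy += c * b * b
            sxy += c * a * b
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if n < 2 or var_x <= 0 or var_y <= 0:
        return None, n
    return (n * sxy - sx * sy) / math.sqrt(var_x * var_y), n


def lower_bound(r: float, support: int, z_crit: float = 1.96) -> float:
    """Lower end of the confidence interval for r via Fisher's r-to-z."""
    if support <= 3:
        return -1.0  # interval is unbounded below with so little data
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r)
    return math.tanh(z - z_crit / math.sqrt(support - 3))
```

Under these assumptions, `lower_bound(0.8, 10)` is roughly 0.34 while `lower_bound(0.8, 1000)` is roughly 0.78, so the same raw correlation counts for much less when only ten viewers rated both shows.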

  14. 3.3 Throttled Server-side Architecture • Log Collectors 1..m: Collector 1 handles boxes 1..100K; Collector m handles boxes 100K(m-1)..100K×m • By-series Counters 1..n: Counter 1 handles series 0..30K; Counter n handles series 30K(n-1)..30K×n • By-series-pair Counter and Correlations Calc. 1..P • Transmit correlation pairs to TiVo clients
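The partitioning in this diagram amounts to range-based sharding. The sketch below assumes 0-based IDs and half-open ranges, which the slide does not specify, and the way a show pair is assigned to one of the P pair calculators is purely illustrative; all function names are hypothetical.

```python
BOXES_PER_COLLECTOR = 100_000   # from the slide: 100K boxes per log collector
SERIES_PER_COUNTER = 30_000     # from the slide: 30K series per by-series counter


def log_collector_for(box_id: int) -> int:
    """1-based index of the log collector that receives this box's upload."""
    return box_id // BOXES_PER_COLLECTOR + 1


def series_counter_for(series_id: int) -> int:
    """1-based index of the by-series counter responsible for this series."""
    return series_id // SERIES_PER_COUNTER + 1


def pair_calculator_for(series_a: int, series_b: int, num_calculators: int) -> int:
    """1-based index of the pair counter; assumed scheme, order-independent."""
    low, high = sorted((series_a, series_b))
    return (low * 1_000_003 + high) % num_calculators + 1
```

Adding hardware then means raising m, n, or P; no stage needs to see every box or every series.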

  15. 3.4 Server-side throttling • min_single (150) • min_pair (100) • Throttle-able: • More HW available • Increasing TiVo population • Go deeper into distribution tail
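A plausible reading of the two thresholds is that a show must have at least `min_single` ratings before it is considered at all, and a pair must have at least `min_pair` viewers who rated both shows before a correlation is computed. The slide does not define them, so this interpretation and the function name `eligible_pairs` are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def eligible_pairs(profiles: Iterable[Dict[str, int]],
                   min_single: int = 150,
                   min_pair: int = 100) -> Set[Tuple[str, str]]:
    """Return the show pairs that clear both support thresholds."""
    profiles = list(profiles)

    # Pass 1: how many viewers rated each show.
    single = Counter()
    for profile in profiles:
        single.update(profile.keys())
    popular = {show for show, count in single.items() if count >= min_single}

    # Pass 2: co-rating counts, restricted to shows that survived pass 1.
    pair = Counter()
    for profile in profiles:
        shows = sorted(show for show in profile if show in popular)
        pair.update(combinations(shows, 2))

    return {p for p, count in pair.items() if count >= min_pair}
```

Lowering either threshold admits more of the distribution tail at the cost of more counting work, which is the throttle the slide describes.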

  16. Details • Pearson r • Weighted average • r-to-z transform (Fisher) • Standard: lower bound of confidence interval
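The formulas themselves did not survive in this transcript. For reference, the textbook forms of the quantities named on the slide are given below; they are not necessarily the exact expressions shown in the talk. Here t_{ua} is the thumbs value user u gave show a, S is the number of users who rated both shows, z_α is the critical value for the chosen confidence level, and N_k(s) is the set of the k most correlated shows the viewer has rated. The normalization of the weighted average is an assumption.

```latex
% Pearson correlation between shows a and b over the S users who rated both
\[
r_{ab} = \frac{\sum_{u} (t_{ua} - \bar{t}_a)(t_{ub} - \bar{t}_b)}
              {\sqrt{\sum_{u} (t_{ua} - \bar{t}_a)^2}\,
               \sqrt{\sum_{u} (t_{ub} - \bar{t}_b)^2}}
\]

% Fisher r-to-z transform and its approximate standard error
\[
z = \tfrac{1}{2} \ln\frac{1 + r}{1 - r} = \operatorname{artanh}(r),
\qquad
\sigma_z \approx \frac{1}{\sqrt{S - 3}}
\]

% Lower bound of the confidence interval, mapped back to the r scale
\[
r_{\mathrm{low}} = \tanh\!\left( z - \frac{z_{\alpha}}{\sqrt{S - 3}} \right)
\]

% Weighted average over the k nearest rated neighbors of show s
\[
\hat{t}_{s} = \frac{\sum_{j \in N_k(s)} w_{sj}\, t_{j}}
                   {\sum_{j \in N_k(s)} \lvert w_{sj} \rvert}
\]
```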

  17. 4. Download to clients • 28K pairs sent to client (320kb) • Correlations between old shows don’t change fast • New shows: want to do it faster

  18. 5. Client-side processing • Ratings must not cause video glitching! • 2am: TiVo re-rates all shows • Collab: k-nearest neighbor • Content-based: Naïve Bayes

  19. Previous Work • User-user or item-item - Sarwar et al • Form of model • k-nearest neighbor, • Bayes nets (Breese et al.), • Factor Analysis (Canny) • Similarity/distance function • Pearson (subsumes cosine) • TFIDF corrections (Salton et al.) • User amplification • Combination functions: k-NN, Bayes nets.. • Evaluation Criteria: MAE, Spearman rank correl.

  20. Contributions • Large fielded system • Large number (3M) of users • Long-lived interaction with user: >90 thumbs/user • 10^8 ratings over 300K shows • Very large in user-hours • Distributed architecture • Server: Throttle-able • Clients do actual suggestion calculations • Privacy-preservation • Privacy and distributed goals aligned • No persistent memory of user on server
