Research Updates

He Xiangnan (PhD student), 11/2/2012. General topic: leveraging user-generated content (UGC) in Web 2.0 to improve IR-related tasks. Current task: leveraging user comments to enable popularity-aware ranking of items in Web 2.0.


Presentation Transcript


  1. Research Updates He Xiangnan (PhD student) 11/2/2012

  2. Research Topic • General topic: Leveraging user-generated content (UGC) in Web 2.0 to improve IR-related tasks • Current task: Leveraging user comments to enable popularity-aware ranking of items in Web 2.0

  3. Popularity-aware ranking • Rank items, based on their current state, so that the ranking reflects their future popularity. • Motivation: • Popularity is unequally distributed across items • Popularity has large temporal dynamics • A ranking that forecasts future popularity will improve the user experience, especially for time-sensitive queries. Examples follow.

  4. Examples (I) • Searched “nba” on YouTube on the night of 6/22/2012 (the NBA Finals game had been played that morning) • None of the top results were about the Miami Heat’s championship

  5. Examples (II) • Searched “nobel prize China” on 10/12/2012 with Google domain search (restricted to YouTube) • None of the top results were about Mo Yan’s Nobel Prize in Literature

  6. Challenges • Intuitive approach: • Treat each item’s visit history as a time series and predict future visits from it • Difficulties: • Visit histories are expensive to obtain and maintain • Traditional time-series prediction approaches tend to fail when items are experiencing bursts • My proposal: • Leverage user comments

  7. Observation in YouTube • Observation: the comment history is highly correlated with the view history

  8. Pre-Analysis (I) • YouTube dataset (14,509 videos from ten queries). • Pearson correlation between comment history and view history: more than 80% of the videos have a correlation above 0.5. • Conclusion: the comment history is highly correlated with the view history!
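The per-video correlation described on this slide can be sketched as follows. This is a minimal illustration, not the analysis code behind the slide: the function name `pearson` and the daily counts are made up for the example, and treating a constant series as correlation 0 is an assumption.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; 0.0 is an assumption.
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical daily view and comment counts for a single video.
views = [120, 300, 900, 450, 200]
comments = [3, 8, 25, 11, 6]
r = pearson(views, comments)
print(f"correlation between comment and view history: {r:.3f}")
```

Repeating this for every video and counting how many have `r` above 0.5 would give the kind of percentage quoted on the slide.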

  9. Pre-Analysis (II) • The previous slide showed the tight correlation between comments and views • A natural question: how well do past comments reflect future comments? • Autocorrelation of a series: • Measures the correlation of a time series with itself at different distances apart (lags) • Autocorrelation at lag k is the correlation between the series {x_1, ..., x_(n-k)} and {x_(k+1), ..., x_n}
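The lag-k definition above translates directly into code: correlate the series with a copy of itself shifted by k positions. The sketch below follows that definition literally; the names `pearson` and `autocorrelation` and the sample series are illustrative assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; 0.0 is an assumption
    return cov / sqrt(var_x * var_y)

def autocorrelation(series, lag):
    """Correlation between {x_1, ..., x_(n-k)} and {x_(k+1), ..., x_n} for k = lag."""
    if lag < 1 or lag > len(series) - 2:
        raise ValueError("lag must leave at least two overlapping points")
    return pearson(series[:-lag], series[lag:])

# Hypothetical daily comment counts for one item.
daily_comments = [5, 7, 9, 30, 28, 12, 8, 6, 5, 4]
for k in (1, 2, 3):
    print(k, round(autocorrelation(daily_comments, k), 3))
```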

  10. Results of Autocorrelation of Comment Series • The series exhibit short-term correlation (r_1 is large and r_k decreases very quickly as k grows) • Conclusion: the recent comments reflect most of the future, and their predictive ability decreases with time.

  11. Intuitions • The more comments an item has, the more popular it is. • Each comment contributes to the item’s popularity (or importance) • Different users’ commenting behavior has different influence on an item’s popularity. • Social interfaces in Web 2.0 systems. • The more active a user is, the more influential the user is. • The more popular the commented items are, the more influential the user is.

  12. Method Overview

  13. User-Item Temporal Bipartite Graph Model • The edge weight (decay function with time): • The weight matrix of the graph:
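The formulas from this slide are not present in the transcript, so the following is only a sketch of what a time-decayed user-item weight could look like. It assumes an exponential decay in the age of a comment and sums the decayed contributions of all comments a user left on an item; the decay form, the `decay_rate` value, and the names `edge_weight` and `build_weights` are all assumptions rather than the author's definitions.

```python
from math import exp
from collections import defaultdict

def edge_weight(age_days, decay_rate=0.5):
    """Assumed exponential decay: a comment made now weighs 1, older ones less."""
    return exp(-decay_rate * age_days)

def build_weights(comments, now, decay_rate=0.5):
    """Build a sparse weight matrix {(user, item): weight}.

    comments: iterable of (user, item, day) tuples, with day as a number.
    Comments dated after `now` are ignored.
    """
    weights = defaultdict(float)
    for user, item, day in comments:
        if day > now:
            continue
        weights[(user, item)] += edge_weight(now - day, decay_rate)
    return dict(weights)

# Hypothetical comment log: (user, item, day index).
log = [("u1", "a", 10), ("u1", "a", 9), ("u2", "a", 4), ("u3", "b", 10)]
W = build_weights(log, now=10)
print(W)
```

A dictionary keyed by (user, item) is used here because such graphs are typically sparse; a dense matrix would work equally well for small data.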

  14. Random Walk Process (I) • Transition matrix: • The natural iterative process (HITS): • Problem: if the graph is sparse and disconnected, the process gets trapped in local optima.

  15. Random Walk Process (II) • Add smoothing to avoid the local-optima case: • The process on the bipartite graph can be converted into a random walk on a homogeneous-node graph, and it converges (proof omitted).
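Since the update equations are not preserved in the transcript, the sketch below shows one common way to smooth a random walk on a bipartite graph: mixing each propagation step with a uniform distribution, as in PageRank-style teleportation. This is a stand-in technique under that assumption, not the formulation from the slide; `alpha`, the function name, and the toy graph are hypothetical.

```python
def smoothed_bipartite_walk(weights, alpha=0.15, tol=1e-10, max_iter=1000):
    """Alternate score propagation between users and items with uniform smoothing.

    weights: {(user, item): positive weight}.
    Returns (item_scores, user_scores); each set of scores sums to 1.
    """
    users = sorted({u for u, _ in weights})
    items = sorted({i for _, i in weights})
    user_total = {u: 0.0 for u in users}
    item_total = {i: 0.0 for i in items}
    for (u, i), w in weights.items():
        user_total[u] += w
        item_total[i] += w

    item_score = {i: 1.0 / len(items) for i in items}
    user_score = {u: 1.0 / len(users) for u in users}
    for _ in range(max_iter):
        # Items pass their score to users, plus a uniform smoothing term.
        new_user = {u: alpha / len(users) for u in users}
        for (u, i), w in weights.items():
            new_user[u] += (1 - alpha) * item_score[i] * w / item_total[i]
        # Users pass their score back to items, again with smoothing.
        new_item = {i: alpha / len(items) for i in items}
        for (u, i), w in weights.items():
            new_item[i] += (1 - alpha) * new_user[u] * w / user_total[u]
        change = sum(abs(new_item[i] - item_score[i]) for i in items)
        item_score, user_score = new_item, new_user
        if change < tol:
            break
    return item_score, user_score

# Toy disconnected graph: item "a" has two commenters, item "b" has one.
toy = {("u1", "a"): 1.0, ("u2", "a"): 1.0, ("u3", "b"): 1.0}
item_scores, user_scores = smoothed_bipartite_walk(toy)
print(item_scores)
```

Without the smoothing term, the score mass in this toy graph would stay inside whichever component it started in; the uniform term lets scores be compared across disconnected components.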

  16. Experiments • Crawled 3 datasets (20k in size) to give a comprehensive evaluation of performance in general Web 2.0 systems. • The full experiments are not finished yet; an experimental result on Last.fm is shown.

  17. Preparation • 2 time points: • 2012.10.19 (t0) • 2012.10.22 (t1) • Ground truth is the number of views in (t1-t0) • Compared methods: • Comm_Oracle: number of comments in the future days (t1-t0). • VC: view count on day t0 • CCP: comment count in the past 3 days before t0 • Sum_Tscore: the sum of all comments’ contributions (all users have the same weight) • TPR: my approach
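As an illustration of the simplest comment-based baseline listed above, CCP can be sketched as a windowed count per item. Whether the window includes day t0 itself is not stated on the slide; the version below assumes it covers t0 and the two days before it. The function name and the sample log are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def comment_count_past(comments, t0, window_days=3):
    """Count comments per item in the `window_days` days ending on t0 (inclusive).

    comments: iterable of (item, day) pairs, with day as a datetime.date.
    """
    start = t0 - timedelta(days=window_days)  # exclusive lower bound
    counts = Counter()
    for item, day in comments:
        if start < day <= t0:
            counts[item] += 1
    return counts

t0 = date(2012, 10, 19)
# Hypothetical comment log.
log = [
    ("a", date(2012, 10, 19)),
    ("a", date(2012, 10, 17)),
    ("a", date(2012, 10, 16)),  # outside the assumed window
    ("b", date(2012, 10, 20)),  # after t0, ignored
    ("b", date(2012, 10, 18)),
]
ccp = comment_count_past(log, t0)
ranking = [item for item, _ in ccp.most_common()]
print(ccp, ranking)
```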

  18. Overall Performance

  19. Split by popularity • Preparation: • Sort all items by view count (largest to smallest) • Split the 17,000 items into 5 folds of equal size • Evaluate each fold • Report the average performance over all folds
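The sorting and splitting steps above can be sketched as below. The evaluation metric is not named in the transcript, so the sketch stops at producing the folds; the function name and the choice to put any remainder in the last fold are assumptions (17,000 items divide evenly into 5 folds, so no remainder arises in that case).

```python
def split_into_folds(items, n_folds=5):
    """Sort (item_id, view_count) pairs by views, descending, and split into folds.

    Fold 0 holds the most viewed items. If the number of items is not a
    multiple of n_folds, the leftover items go into the last fold.
    """
    if len(items) < n_folds:
        raise ValueError("need at least one item per fold")
    ranked = sorted(items, key=lambda pair: pair[1], reverse=True)
    fold_size = len(ranked) // n_folds
    folds = [ranked[k * fold_size:(k + 1) * fold_size] for k in range(n_folds)]
    folds[-1].extend(ranked[n_folds * fold_size:])
    return folds

# Hypothetical items with made-up view counts.
sample = [("item%d" % k, k * 100) for k in range(10)]
folds = split_into_folds(sample)
for index, fold in enumerate(folds, start=1):
    print(index, fold)
```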

  20. Average Performance of Split Folds • Observation: for the 1st fold, VC is the best; for folds 2-5, TPR is the best. • Possible reason: in the Last.fm dataset, extremely popular artists from the past, such as The Beatles and Muse, still attract many visits without attracting many new comments.

  21. To do... • Finish the experiments on the other datasets. • Refine the approach for different types of data, such as: • For extremely popular but old items, use a personalized vector to introduce a bias.

  22. Questions and suggestions? Thanks!