
Thesis Proposal: Prediction of popular social annotations




  1. Thesis Proposal: Prediction of popular social annotations. Abon

  2. Outline • Background • Related Work • Problem Definition • Possible Solution • Experiment Plan • Evaluation Plan

  3. Background • Prevalence of social web services, e.g. MY WEBSITE • What do they have in common? Tags and user-generated content

  4. Background: What are tags for? • According to the del.icio.us founder: "Tags are one-word descriptors that you can assign to your bookmarks on del.icio.us to help you organize and remember them. Tags are a little bit like keywords, but they're chosen by you, and they do not form a hierarchy. You can assign as many tags to a bookmark as you like and rename or delete the tags later. So, tagging can be a lot easier and more flexible than fitting your information into preconceived categories or folders."

  5. Background: What are tags for? • According to the del.icio.us founder: "Tags are one-word descriptors that you can assign to your bookmarks on del.icio.us to help you organize and remember them. Tags are a little bit like keywords, but they're chosen by you, and they do not form a hierarchy. You can assign as many tags to a bookmark as you like and rename or delete the tags later. So, tagging can be a lot easier and more flexible than fitting your information into preconceived categories or folders."

  6. Background: A usage example

  7. Why TAGs are useful • In the information retrieval field, expanding a query is a common technique for retrieving more related data. • Tags are like human-expanded index terms.

  8. Query expansion here

  9. Why TAGs are useful • Traditional term expansion schemes rely on term-document relations, and each tag's importance to a document is often determined by tf-idf. • Each tag a user applies is like a vote for which tag should go with a document. Thus the term-document relations can be measured by tag applications.
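
The contrast on this slide can be sketched in a few lines of Python. This is only an illustration: `tf_idf` uses one common tf-idf variant (relative term frequency times log inverse document frequency), and `tag_weights` treats each user's tag application as one vote. Both function names and the sample data are hypothetical, not taken from the proposal.

```python
import math
from collections import Counter

def tf_idf(term_count, doc_length, n_docs, doc_freq):
    """One common tf-idf variant: relative term frequency times log(N / df)."""
    return (term_count / doc_length) * math.log(n_docs / doc_freq)

def tag_weights(tag_applications):
    """Turn the per-user tag sets for one URL into normalized vote shares."""
    votes = Counter(tag for user_tags in tag_applications for tag in set(user_tags))
    total = sum(votes.values())
    return {tag: count / total for tag, count in votes.items()}

# Hypothetical tag applications by three users for the same URL.
applications = [{"python", "tutorial"}, {"python", "programming"}, {"python"}]
weights = tag_weights(applications)
```

Here "python" collects 3 of the 5 votes, so its weight is 0.6, while the other two tags get 0.2 each.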

  10. Why TAGs are useful • Tags are a human-expanded query set, which enables more complete concept mapping. • As more and more people apply tags, the popularity of tags settles into a stable pattern, and the top tags could be used as weighting parameters for search optimization.

  11. Related Work • Usage patterns of collaborative tagging systems, by Golder SA, Huberman BA. J. Inf. Sci., Vol. 32, No. 2 (April 2006), pp. 198-208. • With 100+ users, a stable pattern appears • Urn model

  12. Stable pattern: the top 7 tags remain the same for more than one year

  13. Related Work • Collaborative Tagging and Semiotic Dynamics, by Cattuto C, Loreto V, Pietronero L. • Long-term memory version of the classic Yule–Simon process • Memory model based on a cognitive model

  14. Yule–Simon process: $Q_t(x) = a(t) / (x + \tau)$, where $a(t)$ is a normalizing factor and $\tau$ is the memory parameter
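
A minimal simulation sketch of a Yule–Simon process with this memory kernel is given below. Several details are assumptions rather than facts from the slide: $x$ is read as the number of steps back in the tag stream, a new tag is invented with a fixed probability `p_new`, and the defaults for `p_new` and `tau` are arbitrary. The normalizing factor $a(t)$ is handled implicitly by `random.choices`, which normalizes the weights.

```python
import random

def simulate_tag_stream(steps, p_new=0.1, tau=5.0, seed=0):
    """Generate a stream of tag ids with a 1/(x + tau) memory kernel."""
    rng = random.Random(seed)
    stream = [0]      # start with a single tag, labelled 0
    next_tag = 1
    for _ in range(steps - 1):
        if rng.random() < p_new:
            # Invent a brand-new tag.
            stream.append(next_tag)
            next_tag += 1
        else:
            # Copy the tag found x steps back, with weight 1 / (x + tau).
            t = len(stream)
            offsets = range(1, t + 1)          # x = 1 is the most recent tag
            weights = [1.0 / (x + tau) for x in offsets]
            x = rng.choices(offsets, weights=weights)[0]
            stream.append(stream[-x])
    return stream

stream = simulate_tag_stream(200)
```

Counting tag frequencies in `stream` (for example with `collections.Counter`) gives a quick look at how skewed the resulting popularity distribution is for different values of `tau`.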

  15. Related Work • The Complex Dynamics of Collaborative Tagging • H. Halpin, V. Robu, H. Shepherd, in Proceedings of WWW 2007

  16. Empirical Results for Power Law Regression for Popular Sites

  17. $P(x)$: the tag probability distribution at each time point • $Q(x)$: the final tag probability distribution
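
The slide does not say how $P$ and $Q$ are compared. A standard measure for two such distributions is the Kullback-Leibler divergence, and the sketch below assumes that choice. The function name and the sample distributions are hypothetical, and the code assumes $Q$ assigns non-zero probability to every tag that appears in $P$.

```python
import math

def kl_divergence(p, q):
    """KL divergence D(P || Q) for two tag -> probability dicts.

    Assumes q[tag] > 0 for every tag with p[tag] > 0.
    """
    return sum(px * math.log(px / q[tag]) for tag, px in p.items() if px > 0)

# Hypothetical distributions: an early snapshot and the final one.
p_early = {"python": 0.5, "code": 0.5}
q_final = {"python": 0.25, "code": 0.75}
divergence = kl_divergence(p_early, q_final)
```

A divergence that shrinks towards zero over time would indicate that the tag distribution of a URL is converging to its final shape.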

  18. Problem definition • In its initial stage, a URL has not yet been sufficiently annotated by people, so it is hard to retrieve at that time. • For an immature URL, predicting its future popular tags could provide a better retrieval experience. • Mature URL: a definition borrowed from [Halpin]'s empirical results on tag dynamics. Mature URLs are defined as URLs with 3 or more years of history on del.icio.us.

  19. Expanding tag set • $T_i$: the tag set applied by the $i$-th user for a URL • $ET_i$: the expanded tag set after the $i$-th user • $T_0$: the tag set suggested by tf-idf term extraction • $ST_0 = T_0$ • $ET_i = ET_{i-1} \cup \mathrm{relevant}_n(T_i)$ • $\mathrm{relevant}_n(T_i)$: the $n$ tags with the highest mutual information to each tag in $T_i$ • Mutual information: $f(t_i, t_j) / (f(t_i) \cdot f(t_j))$
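
The expansion step can be sketched as follows. This is an illustration under stated assumptions, not the proposal's implementation: $f(t)$ is taken to be the number of tag sets containing $t$, $f(t_i, t_j)$ the number containing both, candidates with zero mutual information are skipped, and ties are broken alphabetically. All function names and the tiny corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(tag_sets):
    """Count f(t) and f(ti, tj) over a corpus of tag sets."""
    single, pair = Counter(), Counter()
    for tags in tag_sets:
        single.update(tags)
        pair.update(frozenset(c) for c in combinations(sorted(tags), 2))
    return single, pair

def mutual_information(ti, tj, single, pair):
    """The slide's measure: f(ti, tj) / (f(ti) * f(tj))."""
    denom = single[ti] * single[tj]
    return pair[frozenset((ti, tj))] / denom if denom else 0.0

def relevant(tags, n, single, pair):
    """For each tag, collect its n highest-scoring co-occurring tags."""
    result = set()
    for t in tags:
        scored = [(mutual_information(t, c, single, pair), c) for c in single if c != t]
        scored = [(s, c) for s, c in scored if s > 0]
        scored.sort(key=lambda sc: (-sc[0], sc[1]))
        result.update(c for _, c in scored[:n])
    return result

def expand(previous_expanded, user_tags, n, single, pair):
    """ET_i = ET_{i-1} union relevant_n(T_i)."""
    return set(previous_expanded) | relevant(user_tags, n, single, pair)

corpus = [{"python", "programming"}, {"python", "tutorial"},
          {"python", "programming", "code"}, {"css", "design"}]
single, pair = cooccurrence_counts(corpus)
expanded = expand({"web"}, {"css"}, 1, single, pair)
```

With this corpus, `expanded` is `{"web", "design"}`, because "design" is the only tag that co-occurs with "css". Note that this ratio favours rare tags: for "programming", the single occurrence of "code" scores 0.5, above the more frequent "python" at 1/3.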

  20. Cohesivity • Each tag in $ET_i$ has a score that indicates its cohesivity to $ET_i$ • Cohesivity of $t_j$ to $ET_i$: $\sum_{t_k \in ET_i} f(t_k, t_j) / (f(t_j) \cdot f(t_k))$
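
A minimal sketch of the cohesivity score, assuming the counts $f(t)$ and $f(t_k, t_j)$ are already available as dictionaries. The function name and sample counts are hypothetical. Skipping the term $t_k = t_j$ is an assumption, since the slide does not say how a tag's pairing with itself is handled.

```python
def cohesivity(tj, expanded, single, pair):
    """Sum of f(tk, tj) / (f(tj) * f(tk)) over all tk in the expanded set."""
    total = 0.0
    for tk in expanded:
        if tk == tj:
            continue  # assumption: a tag is not paired with itself
        denom = single.get(tj, 0) * single.get(tk, 0)
        if denom:
            total += pair.get(frozenset((tk, tj)), 0) / denom
    return total

# Hypothetical counts: f(t) per tag and f(tk, tj) per unordered pair.
single_counts = {"python": 3, "programming": 2, "code": 1, "css": 1}
pair_counts = {
    frozenset(("python", "programming")): 2,
    frozenset(("python", "code")): 1,
    frozenset(("programming", "code")): 1,
}
expanded_set = {"python", "programming", "code", "css"}
scores = {t: cohesivity(t, expanded_set, single_counts, pair_counts) for t in expanded_set}
```

In this example "css" never co-occurs with the other tags, so its score is 0, which marks it as the outlier in the expanded set.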

  21. Pruning $ET_i$ • Sort the tags in $ET_i$ by popularity and take the top 7 as the suggested tag set $ST_i$ • Alternatively, sort the tags in $ET_i$ by popularity × cohesivity and take the top 7 as the suggested tag set $ST_i$
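
Both pruning variants can be expressed with one small function. This is a sketch with hypothetical names and data: `popularity` and `cohesivity` are assumed to be precomputed tag-to-score dictionaries, and ties are broken alphabetically, which the slide does not specify.

```python
def prune(expanded, popularity, cohesivity=None, k=7):
    """Return the top-k tags by popularity, or by popularity * cohesivity."""
    def score(tag):
        s = popularity.get(tag, 0)
        if cohesivity is not None:
            s *= cohesivity.get(tag, 0.0)
        return s
    return sorted(expanded, key=lambda tag: (-score(tag), tag))[:k]

# Hypothetical scores, pruned to the top 2 to keep the example small.
tags = {"python", "code", "css"}
popularity = {"python": 10, "code": 4, "css": 6}
cohesion = {"python": 0.5, "code": 1.0, "css": 0.0}
by_popularity = prune(tags, popularity, k=2)
by_both = prune(tags, popularity, cohesion, k=2)
```

Pruning by popularity alone keeps "python" and "css", while weighting by cohesivity drops "css" in favour of "code".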

  22. Experiment Plan • Dataset collected from the del.icio.us RSS API, Mar 28 to April 19: 30,000 URLs, 234,982 tag applications, 8,392 users. 1. del.icio.us/rss/popular every 30 min and del.icio.us/rss/recent every 2 min. 2. del.icio.us/rss/url?url=xxx.com • Suggest tags at every step from no user to the 10th user.

  23. Evaluation Plan • For each URL, we have the mature tags and the suggested tags at each iteration. • Recall and precision can be calculated for each variant: pruning with cohesivity, and expanding with relevant tags.
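
The two measures can be computed per URL and per iteration as sketched below. The function name and sample tag sets are hypothetical, and returning 0.0 for an empty set is a convention chosen here, not something the proposal specifies.

```python
def precision_recall(suggested, mature):
    """Precision and recall of the suggested tags against the mature tags."""
    suggested, mature = set(suggested), set(mature)
    hits = len(suggested & mature)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(mature) if mature else 0.0
    return precision, recall

# Hypothetical suggestion for one URL at one iteration.
suggested_tags = ["python", "code", "css"]
mature_tags = ["python", "code", "programming", "tutorial"]
precision, recall = precision_recall(suggested_tags, mature_tags)
```

Two of the three suggested tags are mature tags, so precision is 2/3, and two of the four mature tags were found, so recall is 0.5.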
