Thesis Proposal: Prediction of popular social annotations

Thesis Proposal: Prediction of popular social annotations Abon

Outline • Background • Related Work • Problem Definition • Possible Solution • Experiment Plan • Evaluation Plan

Background • Prevalence of social web services, e.g. MY WEBSITE • What do they have in common? Tags and user-generated content

Background: What are tags for? • According to the del.icio.us founder: Tags are one-word descriptors that you can assign to your bookmarks on del.icio.us to help you organize and remember them. Tags are a little bit like keywords, but they're chosen by you, and they do not form a hierarchy. You can assign as many tags to a bookmark as you like and rename or delete the tags later. So, tagging can be a lot easier and more flexible than fitting your information into preconceived categories or folders.


Background: A usage example

Why tags are useful • In information retrieval, expanding the query is a common technique for retrieving more related data. • Tags act like human-expanded index terms.

Query expansion here

Why tags are useful • Traditional term expansion schemes rely on term-document relations, and each term's importance to a document is often determined by tf-idf. • Each tag a user applies acts like a vote for which tag should go with a document, so term-document relations can be measured by tag applications.
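The voting idea above can be sketched as follows. This is only an illustration: the `(user, tag)` input format and the choice to normalize votes into shares are assumptions, not something the slides specify.

```python
from collections import Counter


def tag_weights(applications):
    """Turn tag applications for one document into vote shares.

    `applications` is a list of (user, tag) pairs (hypothetical format).
    Each application counts as one vote for that tag.
    """
    votes = Counter(tag for _user, tag in applications)
    total = sum(votes.values())
    return {tag: count / total for tag, count in votes.items()}


# Toy data: three users tagging one URL.
apps = [("u1", "python"), ("u2", "python"), ("u2", "tutorial"), ("u3", "python")]
weights = tag_weights(apps)
```

Here the vote share plays the role that a tf-idf weight plays in a traditional term-document matrix.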

Why tags are useful • Tags are a human-expanded query set that enables more complete concept mapping. • As more and more people apply tags, tag popularity reaches a stable pattern, and the top tags could be used as weighting parameters for search optimization.

Related Work • Golder SA, Huberman BA. Usage patterns of collaborative tagging systems. J. Inf. Sci., Vol. 32, No. 2 (April 2006), pp. 198-208. • With 100+ users, a stable pattern appears • Urn model

Stable pattern: the top 7 tags remain the same for a year or more

Related Work • Cattuto C, Loreto V, Pietronero L. Collaborative Tagging and Semiotic Dynamics. • Long-term memory version of the classic Yule–Simon process • Memory model based on a cognitive model

Yule–Simon process • $Q_t(x) = \frac{a(t)}{x + \tau}$ • $a(t)$ is a normalizing factor • $\tau$ is the memory parameter
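A minimal sketch of this memory kernel, assuming that $x$ counts how many steps back in the tag stream a past tag was added (so $x$ runs from 1 to $t$) and that $a(t)$ is chosen so the probabilities sum to 1. The function name and parameter values are illustrative only.

```python
def memory_kernel(t, tau):
    """Return [Q_t(1), ..., Q_t(t)] with Q_t(x) = a(t) / (x + tau).

    Assumption: x is the distance (in steps) into the past, 1 <= x <= t.
    """
    raw = [1.0 / (x + tau) for x in range(1, t + 1)]
    a_t = 1.0 / sum(raw)  # normalizing factor a(t)
    return [a_t * r for r in raw]


# Illustrative values: 5 past steps, memory parameter tau = 2.
probs = memory_kernel(t=5, tau=2.0)
```

Under this reading, recent tags (small $x$) are more likely to be reused than old ones, and a larger $\tau$ flattens that preference.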

Related Work • H. Halpin, V. Robu, H. Shepherd. The Complex Dynamics of Collaborative Tagging. In Proceedings of WWW 2007.

Empirical Results for Power Law Regression for Popular Sites

P(x): the tag probability distribution at each time point • Q(x): the final tag probability distribution

Problem Definition • In its initial stage, a URL has not yet been sufficiently annotated by people, so it is hard to retrieve at that time. • For an immature URL, predicting its future popular tags could provide a better retrieval experience. • Mature URL: borrowed from [Halpin]'s empirical results for tag dynamics. Mature URLs are defined as URLs with 3 or more years of history on del.icio.us.

Expanding tag set
• $T_i$: the tag set applied by the $i$-th user for a URL.
• $ET_i$: the expanded tag set after the $i$-th user.
• $T_0$: the tag set suggested by tf-idf term extraction.
• Before any user has tagged: $ST_0 = T_0$
• $ET_i = ET_{i-1} \cup \mathrm{relevant}_n(T_i)$
• $\mathrm{relevant}_n(T_i)$: the $n$ tags with the highest mutual information to each tag in $T_i$
• Mutual information: $\frac{f(t_i, t_j)}{f(t_i)\, f(t_j)}$
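The expansion step can be sketched roughly as below. Several details are assumptions rather than part of the proposal: $f(t)$ is taken as the number of posts containing tag $t$, $f(t_i, t_j)$ as the number of posts containing both, the recursion starts from $ET_0 = T_0$, and tags with a zero score are never added. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(posts):
    """Count f(t) and f(ti, tj) over posts (each post is a list of tags)."""
    tag_freq, pair_freq = Counter(), Counter()
    for tags in posts:
        unique = sorted(set(tags))
        tag_freq.update(unique)
        pair_freq.update(frozenset(p) for p in combinations(unique, 2))
    return tag_freq, pair_freq


def relatedness(ti, tj, tag_freq, pair_freq):
    """The slide's score: f(ti, tj) / (f(ti) * f(tj))."""
    denom = tag_freq[ti] * tag_freq[tj]
    return pair_freq[frozenset((ti, tj))] / denom if denom else 0.0


def relevant_n(tags, n, tag_freq, pair_freq):
    """For each tag in `tags`, collect its n highest-scoring other tags."""
    selected = set()
    for ti in tags:
        scored = [(relatedness(ti, tj, tag_freq, pair_freq), tj)
                  for tj in tag_freq if tj != ti]
        scored = [(s, tj) for s, tj in scored if s > 0]
        scored.sort(key=lambda item: (-item[0], item[1]))  # ties broken by name
        selected.update(tj for _s, tj in scored[:n])
    return selected


def expand(t0, user_tag_sets, n, tag_freq, pair_freq):
    """Return [ET_0, ET_1, ...] with ET_i = ET_{i-1} | relevant_n(T_i)."""
    et = set(t0)  # assumption: ET_0 = T_0
    history = [set(et)]
    for ti in user_tag_sets:
        et |= relevant_n(ti, n, tag_freq, pair_freq)
        history.append(set(et))
    return history


# Toy corpus of posts, used only to estimate the frequencies.
posts = [["python", "programming"], ["python", "programming", "tutorial"],
         ["python", "snake"], ["java", "programming"]]
tag_freq, pair_freq = cooccurrence_counts(posts)
history = expand({"python"}, [{"python"}], 2, tag_freq, pair_freq)
```

On this toy corpus the score prefers the rare tags "snake" and "tutorial" over the frequent "programming", because dividing by both frequencies penalizes common tags.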

Cohesivity • Each tag in $ET_i$ has a score that indicates its cohesivity to $ET_i$. • Cohesivity of $t_j$ to $ET_i$: $\sum_{t_k \in ET_i} \frac{f(t_k, t_j)}{f(t_j)\, f(t_k)}$
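A small sketch of the cohesivity score, under stated assumptions: the frequencies are given as plain dictionaries (hypothetical toy values), and the term $t_k = t_j$ is skipped because the slides do not define $f(t_j, t_j)$.

```python
def cohesivity(tj, expanded, tag_freq, pair_freq):
    """Sum of f(tk, tj) / (f(tj) * f(tk)) over tk in the expanded set.

    Assumption: the tag itself (tk == tj) is left out of the sum.
    """
    total = 0.0
    for tk in expanded:
        if tk == tj:
            continue
        denom = tag_freq.get(tj, 0) * tag_freq.get(tk, 0)
        if denom:
            total += pair_freq.get(frozenset((tk, tj)), 0) / denom
    return total


# Toy frequencies: f(t) per tag and f(ti, tj) per unordered pair.
tag_freq = {"python": 3, "programming": 3, "tutorial": 1, "snake": 1}
pair_freq = {
    frozenset(("programming", "python")): 2,
    frozenset(("programming", "tutorial")): 1,
    frozenset(("python", "tutorial")): 1,
    frozenset(("python", "snake")): 1,
}
expanded = {"python", "programming", "tutorial", "snake"}
scores = {t: cohesivity(t, expanded, tag_freq, pair_freq) for t in expanded}
```

In this example "snake" co-occurs with only one other member of the set, so it gets a lower score than "tutorial", which co-occurs with two.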

Pruning $ET_i$ • Sort the tags in $ET_i$ by popularity and take the top 7 as the suggested tag set $ST_i$ • Sort the tags in $ET_i$ by popularity × cohesivity and take the top 7 as the suggested tag set $ST_i$
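Both pruning rules can be expressed with one helper, sketched below. It assumes popularity is a per-tag count and cohesivity a per-tag score, both supplied as dictionaries; the function name, the tie-breaking by tag name, and the toy numbers are illustrative only.

```python
def prune(popularity, cohesivity=None, k=7):
    """Return the top-k tags as the suggested tag set ST_i.

    Without `cohesivity`, rank by popularity alone; with it, rank by
    popularity * cohesivity. Ties are broken alphabetically (assumption).
    """
    def score(tag):
        weight = 1.0 if cohesivity is None else cohesivity.get(tag, 0.0)
        return popularity[tag] * weight

    ranked = sorted(popularity, key=lambda tag: (-score(tag), tag))
    return ranked[:k]


# Toy values; k=2 keeps the example small (the slides use the top 7).
popularity = {"python": 10, "programming": 8, "snake": 9}
cohesion = {"python": 0.9, "programming": 0.8, "snake": 0.1}
by_popularity = prune(popularity, k=2)
by_both = prune(popularity, cohesion, k=2)
```

With these numbers, ranking by popularity alone keeps "snake", while weighting by cohesivity replaces it with "programming".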

Experiment Plan
• Dataset from the del.icio.us RSS API, Mar 28 to April 19: 30,000 URLs, 234,982 taggings, 8,392 users
1. del.icio.us/rss/popular every 30 min; del.icio.us/rss/recent every 2 min
2. del.icio.us/rss/url?url= xxx.com
• Suggest tags at every step, from no user up to the 10th user.

Evaluation Plan • For each URL, we have the mature tags and the suggested tags at each iteration. • Recall and precision can then be calculated. • Pruning with cohesivity • Expanding with relevant tags
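The two measures can be computed per URL and per iteration as in the sketch below, which treats the mature tags as the ground truth. The function name and the convention of returning 0.0 for an empty set are assumptions.

```python
def precision_recall(suggested, mature):
    """Compare a suggested tag set against the mature tag set of a URL."""
    suggested, mature = set(suggested), set(mature)
    hits = len(suggested & mature)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(mature) if mature else 0.0
    return precision, recall


# Toy example: 2 of the 3 suggested tags appear among the 4 mature tags.
suggested = {"python", "tutorial", "snake"}
mature = {"python", "programming", "tutorial", "reference"}
precision, recall = precision_recall(suggested, mature)
```

Repeating this after each of the first 10 users gives a precision and recall curve per URL, which can then be averaged over the dataset.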
