
Modeling User Interactions in Social Media



Presentation Transcript


  1. Modeling User Interactions in Social Media Eugene Agichtein Emory University

  2. Outline User-generated content Community Question Answering Contributor authority Content quality Asker satisfaction Open problems

  4. Trends in search and social media Search in the East: Heavily influenced by social media: Naver, Baidu Knows, TaskCn, .. Search in the West: Social media mostly indexed/integrated in search repositories Two opposite trends in social media search: Moving towards point relevance (answers, knowledge search) Moving towards browsing experience, subscription/push model How to integrate “active” engagement and contribution with “passive” viewing of content?

  5. Social Media Today Published: 4GB/day; social media: 10GB/day; page views: 180-200GB/day. Technorati + Blogpulse: ~120M blogs, ~2M posts/day. Twitter (since 11/07): ~2M users, ~3M msgs/day. Facebook/MySpace: 200-300M users, average 19 min/day. Yahoo! Answers: 90M users, ~20M questions, ~400M answers. [From Andrew Tomkins/Yahoo!, SSM 2008 Keynote]

  6. People Helping People • Naver: popularity reportedly exceeds web search • Yahoo! Answers: some users answer thousands of questions daily • And get a t-shirt • Open, “quirky”, information shared, not “sold” • Unlike Wikipedia: • Chatty threads: opinions, support, validation • No core group of moderators to enforce “quality”

  7. Where is the nearest car rental to Carnegie Mellon University?

  8. Successful Search • Give up on “magic” • Look up the CMU address/zipcode • Google Maps • Query: “car rental near:5000 Forbes Avenue Pittsburgh, PA 15213”

  9. Total time: 7-10 minutes, active “work”

  10. Someone must know this…

  11. +0 minutes : 11pm

  12. +1 minute

  13. +36 minutes

  14. +7 hours: perfect answer

  15. Why would one wait hours? Rational thinking: effective use of time Unique information need Subjective/normative question Complex Human contact/community Multiple viewpoints

  16. http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO

  17. Challenges in ____ing Social Media • Estimating contributor expertise • Estimating content quality • Inferring user intent • Predicting satisfaction: general, personalized • Matching askers with answerers • Searching archives • Detecting spam

  18. Thanks: work done in collaboration with Yandong Liu, Abulimiti Aji, Qi Guo, Pawel Jurczyk, Jiang Bian, Prof. Hongyuan Zha; Yahoo! Research: Carlos Castillo, Gilad Mishne, Aris Gionis, Debora Donato, Ravi Kumar

  19. Related Work • Adamic et al., WWW 2007, WWW 2008: expertise sharing, network structure • Kumar et al.: information diffusion in blogspace • Harper et al., CHI 2008: answer quality • Leskovec et al.: cascades, preferential attachment models • Glance & Hurst: blogging • Kraut et al.: community participation and retention • SSM 2008 Workshop (Searching Social Media) • Elsas et al., blog search, ICWSM 2008

  20. Estimating Contributor Authority P. Jurczyk and E. Agichtein, Discovering Authorities in Question Answer Communities Using Link Analysis (poster), CIKM 2007 [Diagram: Users 1-6 linked through Questions 1-3 and Answers 1-6; askers act as hubs, answerers as authorities.]
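The asker/answerer graph of slide 20 can be scored with standard HITS power iteration: askers accumulate hub scores, answerers accumulate authority scores. A minimal sketch; the edge list below is illustrative, not the actual Yahoo! Answers data:

```python
# Minimal HITS sketch over an asker -> answerer graph (cf. Jurczyk &
# Agichtein, CIKM 2007). Edge (u, v) means user u's question was
# answered by user v. The edge list is illustrative only.
edges = [("u1", "u3"), ("u1", "u4"), ("u1", "u5"),
         ("u2", "u5"), ("u2", "u6")]
nodes = sorted({n for e in edges for n in e})

hub = {n: 1.0 for n in nodes}    # askers: good hubs ask well-answered questions
auth = {n: 1.0 for n in nodes}   # answerers: good authorities answer good askers

for _ in range(50):              # power iteration until (approximate) convergence
    auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
    hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
    for vec in (hub, auth):      # L2-normalize so scores stay bounded
        norm = sum(x * x for x in vec.values()) ** 0.5 or 1.0
        for k in vec:
            vec[k] /= norm

best_answerer = max(auth, key=auth.get)
```

Here `u5` ends up as the top authority because it is the only answerer serving both askers.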

  21. Finding Authorities: Results

  22. Qualitative Observations HITS effective   HITS ineffective

  23. Trolls

  24. Estimating Content Quality E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne, Finding High Quality Content in Social Media, WSDM 2008

  29. Community


  33. Question quality features (from all feature subsets), as follows: • UQV: Average number of “stars” to questions by the same asker • The punctuation density in the question’s subject • The question’s category (assigned by the asker) • “Normalized clickthrough”: the number of clicks on the question thread, normalized by the average number of clicks for all questions in its category • UAV: Average number of “thumbs up” received by answers written by the asker of the current question • Number of words per sentence • UA: Average number of answers with references (URLs) given by the asker of the current question • UQ: Fraction of questions asked by the asker in which he opens the question’s answers to voting (instead of picking the best answer by hand) • UQ: Average length of the questions by the asker • UAV: The number of “best answers” authored by the user • U: The number of days the user was active in the system • UAV: “Thumbs up” received by the answers written by the asker of the current question, minus “thumbs down”, divided by the total number of “thumbs” received • “Clicks over views”: the number of clicks on a question thread divided by the number of times the question thread was retrieved as a search result (see [2]) • The KL-divergence between the question’s language model and a model estimated from a collection of questions answered by the Yahoo editorial team (available at http://ask.yahoo.com)
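Two of the question features above are simple enough to sketch directly: punctuation density and normalized clickthrough. Function names and the category-average normalization below are illustrative, not code from the WSDM 2008 implementation:

```python
import re

def punctuation_density(subject):
    """Fraction of characters in the question's subject that are punctuation."""
    if not subject:
        return 0.0
    punct = len(re.findall(r"[^\w\s]", subject))
    return punct / len(subject)

def normalized_clickthrough(clicks, category_clicks):
    """Clicks on this question thread divided by the average number of
    clicks over all questions in its category (illustrative helper)."""
    avg = sum(category_clicks) / len(category_clicks)
    return clicks / avg if avg else 0.0
```

For example, a subject like "Hi???" has density 0.6, a typical low-quality signal, while a thread clicked at exactly the category average scores 1.0.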


  35. Answer quality features: • Answer length • The number of words in the answer with a corpus frequency larger than c • UAV: The number of “thumbs up” minus “thumbs down” received by the answerer, divided by the total number of “thumbs” s/he has received • The entropy of the trigram character-level model of the answer • UAV: The fraction of answers of the answerer that have been picked as best answers (either by the askers of such questions, or by a community voting) • The unique number of words in the answer • U: Average number of abuse reports received by the answerer over all his/her questions and answers • UAV: Average number of abuse reports received by the answerer over his/her answers • The non-stopword word overlap between the question and the answer • The Kincaid [21] score of the answer • QUA: The average number of answers received by the questions asked by the asker of this answer • The ratio between the length of the question and the length of the answer • UAV: The number of “thumbs up” minus “thumbs down” received by the answerer • QUAV: The average number of “thumbs” received by the answers to other questions asked by the asker of this answer
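Two of the textual answer features above, the character-trigram entropy and the Kincaid readability grade, can be sketched as follows. Syllables are approximated by vowel groups, so the exact values the paper's implementation produced may differ:

```python
import math
import re
from collections import Counter

def trigram_entropy(text):
    """Entropy (bits) of the character trigram distribution of an answer.
    Low entropy flags repetitive, low-effort text."""
    grams = [text[i:i + 3] for i in range(len(text) - 2)]
    counts = Counter(grams)
    total = len(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentences)
    + 11.8*(syllables/words) - 15.59; syllables approximated
    by counting vowel groups per word."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w)))
                    for w in words)
    n = max(1, len(words))
    return 0.39 * n / sentences + 11.8 * syllables / n - 15.59
```

A fully repetitive string like "aaaa" has trigram entropy 0, while short monosyllabic sentences score near (or below) grade 0 on the Kincaid scale.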

  36. Rating Dynamics

  37. Editorial Quality != Popularity != Usefulness

  38. Yahoo! Answers: Time to Fulfillment Time to close a question (hours) for sample question categories: 1. 2006 FIFA World Cup 2. Optical 3. Poetry 4. Football (American) 5. Scottish Football (Soccer) 6. Medicine 7. Winter Sports 8. Special Education 9. General Health Care 10. Outdoor Recreation

  39. Predicting Asker Satisfaction Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community. “Satisfied”: the asker has closed the question AND selected the best answer AND rated the best answer >= 3 “stars”. Else, “Unsatisfied”. Y. Liu, J. Bian, and E. Agichtein, Predicting Information Seeker Satisfaction in Community Question Answering, SIGIR 2008

  40. Motivation • Save time: don’t bother to post • Suggest a good forum for information need • Notify user when satisfactory answer contributed • From “relevance” to information need fulfillment • Explicit ratings from asker & community

  41. ASP: Asker Satisfaction Prediction Answer Answerer History Text Question Asker History Category asker is satisfied asker is not satisfied Wikipedia Classifier News

  42. Datasets Crawled from Yahoo! Answers in early 2008 (Thanks, Yahoo!) Available at http://ir.mathcs.emory.edu/shared

  43. Dataset Statistics • Asker satisfaction varies by category • #Q, #A, Time to close… -> Asker Satisfaction

  44. Satisfaction Prediction: Human Judges Truth: asker’s rating. A random sample of 130 questions. Researchers: agreement 0.82, F1 0.45. Amazon Mechanical Turk: five workers per question; agreement 0.9, F1 0.61 (best when at least 4 out of 5 raters agree)
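The F1 figures above score binary "satisfied" judgments against the asker's own rating as ground truth. A minimal sketch of F1 for the satisfied class:

```python
def f1_score(truth, pred):
    """F1 for the positive ('satisfied') class: harmonic mean of
    precision and recall over parallel boolean label lists."""
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    fn = sum(t and (not p) for t, p in zip(truth, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because F1 ignores true negatives, a judge who labels most questions "unsatisfied" can have high raw agreement yet a low F1, which is how human F1 can fall below a naive always-satisfied baseline.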

  45. ASP vs. Humans (F1) • ASP is significantly more effective than humans • Human F1 is lower than the naïve baseline!
