Modeling Information Seeking Behavior in Social Media

Presentation Transcript


  1. Modeling Information Seeking Behavior in Social Media. Eugene Agichtein, Intelligent Information Access Lab (IRLab)

  2. Intelligent Information Access Lab (IRLab) • Modeling information seeking behavior • Web search and social media search • Text and data mining for medical informatics and public health. Yandong Liu (2nd-year PhD), Ablimit Aji (2nd-year PhD), Qi Guo (3rd-year PhD). 1st-year graduate students: Julia Kiseleva, Dmitry Lagun, Qiaoling Liu, Wang Yu. In collaboration with: Beth Buffalo (Neurology), Charlie Clarke (Waterloo), Ernie Garcia (Radiology), Phil Wolff (Psychology), Hongyuan Zha (GaTech). Eugene Agichtein, Emory University, IR Lab

  3. Online Behavior and Interactions • Information sharing: blogs, forums, discussions • Search logs: queries, clicks • Client-side behavior: gaze tracking, mouse movement, scrolling

  4. Research Overview: Discover Models of Behavior (machine learning/data mining) • Intelligent search • Information sharing • Cognitive Diagnostics • Health Informatics

  5. Key Challenges for Web Search • Query interpretation (infer intent) • Ranking (high dimensionality) • Evaluation (system improvement) • Result presentation (information visualization)

  6. Contextualized Intent Inference • SERP text • Mouse trajectory, hovering/dynamics • Scrolling • Clicks

  7. Research Intent

  8. Purchase Intent

  9. Relationship between behavior and intent? • Search intent is contextualized within a search session • Implication 1: model session-level state • Implication 2: improve detection based on client-side interactions

  10. Model: Linear Chain CRF
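
For reference, the textbook form of a linear-chain CRF is sketched below. The slide gives only the model name, so the reading of the symbols is an assumption: y_t stands for the (hidden) intent state at step t of a search session, x for the observed interactions (SERP text, mouse movement, scrolling, clicks), f_k for feature functions, and lambda_k for their learned weights. This is the generic definition, not necessarily the exact parameterization used in the talk.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```

The dependence of each factor on both y_{t-1} and y_t is what lets the model carry session-level state from one search action to the next.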

  11. Results: Ad Click Prediction • 200%+ precision improvement (within mission)

  12. Research Overview: Discover Models of Behavior (machine learning/data mining) • Intelligent search • Information sharing • Cognitive Diagnostics • Health Informatics

  13. Finding Information Online (Revisited). Next generation of search: algorithmically mediated information exchange. CQA (collaborative question answering): • Realistic information exchange • Searching archives • Train NLP, IR, QA systems • Study of social behavior, norms. Content quality, asker satisfaction. Current and future work

  14. Goal: Hybrid Human-Powered Search

  15. Talk Outline • Overview of the Emory IR Lab • Intent-centric Web Search • Classifying intent of a query • Contextualized search intent detection

  16. [no text extracted]

  17. (Text) Social Media Today • Published: 4Gb/day • Social media: 10Gb/day • Technorati + BlogPulse: 120M blogs, 2M posts/day • Twitter (since 11/07): 2M users, 3M msgs/day • Facebook/MySpace: 200-300M users, avg 19 m/day • Yahoo Answers: 90M users, 20M questions, 400M answers • "Yes, we could read your blog. Or, you could tell us about your day." [Data from Andrew Tomkins, SSM2008 Keynote]

  18. Total time: 7-10 minutes, active “work”

  19. Someone must know this…

  20. +1 minute

  21. +7 hours: perfect answer

  22. Update (2/15/2009)

  23. http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO

  24. Finding Information Online (Revisited). Next generation of search: algorithmically mediated information exchange. CQA (collaborative question answering): • Realistic information exchange • Searching archives • Train NLP, IR, QA systems • Study of social behavior, norms. Content quality, asker satisfaction. Current and future work

  25. (Some) Related Work • Adamic et al., WWW 2007, WWW 2008: expertise sharing, network structure • Elsas et al., SIGIR 2008: blog search • Glance et al.: BlogPulse, popularity, information sharing • Harper et al., CHI 2008, 2009: answer quality across multiple CQA sites • Kraut et al.: community participation • Kumar et al., WWW 2004, KDD 2008, …: information diffusion in blogspace, network evolution • SIGIR 2009 Workshop on Searching Social Media: http://ir.mathcs.emory.edu/SSM2009/

  26. Finding High Quality Content in SM (E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, "Finding High Quality Content in Social Media," WSDM 2008) • Well-written • Interesting • Relevant (answer) • Factually correct • Popular? • Provocative? • Useful? As judged by professional editors

  27. Social Media Content Quality (E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, "Finding High Quality Content in Social Media," WSDM 2008)

  28. [no text extracted]

  29. How do Question and Answer Quality relate?

  30. [no text extracted]

  31. [no text extracted]

  32. [no text extracted]

  33. [no text extracted]

  34. Community

  35. Link Analysis for Authority Estimation [Diagram: graph connecting Users 1-6 with Questions 1-3 and Answers 1-6] Hub (asker), Authority (answerer)
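
The hub/authority labeling on this slide corresponds to the HITS algorithm named on the next slide. A minimal sketch follows, assuming a directed edge from each asker to each user who answered one of that asker's questions. The edge list and user names are hypothetical toy data, not the graph drawn on the slide, and the function names `hits` and `normalize` are illustrative.

```python
from math import sqrt


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Run HITS on a list of (asker, answerer) edges.

    Returns (hub, authority): dictionaries mapping each user to a score.
    Askers act as hubs, answerers as authorities.
    """
    nodes = {user for edge in edges for user in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: add up the hub scores of the askers this user answered.
        new_auth = {n: 0.0 for n in nodes}
        for asker, answerer in edges:
            new_auth[answerer] += hub[asker]
        # Hub update: add up the new authority scores of the users who answered this asker.
        new_hub = {n: 0.0 for n in nodes}
        for asker, answerer in edges:
            new_hub[asker] += new_auth[answerer]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical toy graph: (asker, answerer) pairs.
toy_edges = [
    ("user1", "user3"),
    ("user1", "user4"),
    ("user2", "user4"),
    ("user2", "user5"),
    ("user6", "user4"),
]
hub_scores, authority_scores = hits(toy_edges)
top_answerer = max(authority_scores, key=authority_scores.get)
```

In this toy graph the user who answers the most askers ends up with the highest authority score, which is the intuition behind using link analysis to estimate answerer expertise.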

  36. Qualitative Observations [Panels: HITS effective; HITS ineffective]

  37. Random forest classifier

  38. Result 1: Identifying High Quality Questions

  39. Top Features for Question Classification • Asker popularity (“stars”) • Punctuation density • Question category • Page views • KL Divergence from reference LM
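
Two of the features listed above can be computed directly from the question text. The sketch below is one plausible way to do so, not the authors' implementation: it assumes whitespace tokenization, a unigram reference language model given as word counts, and add-alpha smoothing of the reference model so unseen words do not produce a division by zero. The function names and the `alpha` parameter are illustrative.

```python
import math
import string
from collections import Counter


def punctuation_density(text):
    """Fraction of characters in the text that are ASCII punctuation."""
    if not text:
        return 0.0
    return sum(ch in string.punctuation for ch in text) / len(text)


def kl_divergence_from_reference(text, reference_counts, alpha=1.0):
    """KL(P_text || P_ref) over unigrams.

    P_text is the maximum-likelihood unigram distribution of the text.
    P_ref is the reference distribution with add-alpha smoothing over the
    union of both vocabularies (an assumed smoothing choice).
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    text_counts = Counter(tokens)
    vocabulary = set(text_counts) | set(reference_counts)
    ref_total = sum(reference_counts.values()) + alpha * len(vocabulary)
    divergence = 0.0
    for word, count in text_counts.items():
        p = count / len(tokens)
        q = (reference_counts.get(word, 0) + alpha) / ref_total
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical reference corpus counts.
reference = Counter({"how": 5, "do": 5, "i": 5, "cook": 2, "rice": 2})
typical = kl_divergence_from_reference("how do i cook rice", reference)
unusual = kl_divergence_from_reference("zzz zzz zzz", reference)
```

A question whose wording diverges strongly from the reference model (the second call) receives a larger value than one that uses common vocabulary.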

  40. Identifying High Quality Answers

  41. Top Features for Answer Classification • Answer length • Community ratings • Answerer reputation • Word overlap • Kincaid readability score
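
The text-based answer features above can be approximated with a few lines of code. This sketch makes several assumptions that the slides do not specify: word overlap is measured as Jaccard similarity between question and answer vocabularies, the "Kincaid readability score" is taken to be the Flesch-Kincaid grade level, and syllables are counted with a crude vowel-group heuristic. All function names are illustrative.

```python
import re


def tokenize(text):
    """Lowercase the text and extract alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_overlap(question, answer):
    """Jaccard similarity between question and answer vocabularies (assumed definition)."""
    q_words, a_words = set(tokenize(question)), set(tokenize(answer))
    if not q_words or not a_words:
        return 0.0
    return len(q_words & a_words) / len(q_words | a_words)


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def answer_features(question, answer):
    """Collect the text-only features; ratings and reputation need site metadata."""
    return {
        "answer_length": len(tokenize(answer)),
        "word_overlap": word_overlap(question, answer),
        "kincaid_grade": flesch_kincaid_grade(answer),
    }


features = answer_features("How to cook rice?", "Cook rice in water. Stir it once.")
```

Community ratings and answerer reputation are omitted because they come from site metadata rather than from the answer text.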

  42. Finding Information Online (Revisited) • Next generation of search: • human-machine-human • CQA: a case study in complex IR • Content quality • Asker satisfaction • Understanding the interactions

  43. Dimensions of “Quality” • Well-written • Interesting • Relevant (answer) • Factually correct • Popular? • Timely? • Provocative? • Useful? As judged by the asker (or community)

  44. Are Editor Labels “Meaningful” for CGC? • Information seeking process: the user wants to find useful information about a topic with incomplete knowledge • N. Belkin: “Anomalous states of knowledge” • Want to model directly whether the user found satisfactory information • Specific (amenable) case: CQA

  45. Yahoo! Answers: The Good News • Active community of millions of users in many countries and languages • Effective for subjective information needs • Great forum for socialization/chat • Can be invaluable for hard-to-find information not available on the web

  46. Yahoo! Answers: The Bad News • May have to wait a long time to get a satisfactory answer • May never obtain a satisfying answer [Chart: time to close a question (hours), by category: 1. FIFA World Cup, 2. Optical, 3. Poetry, 4. Football (American), 5. Soccer, 6. Medicine, 7. Winter Sports, 8. Special Education, 9. General Health Care, 10. Outdoor Recreation]

  47. Predicting Asker Satisfaction (Y. Liu, J. Bian, and E. Agichtein, SIGIR 2008). Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community. • “Satisfied”: the asker has closed the question AND selected the best answer AND rated the best answer >= 3 “stars” (# not important) • Else, “Unsatisfied”. Jiang Bian, Yandong Liu
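
The labeling rule on this slide is a conjunction of three conditions, which can be written down directly. The sketch below assumes a hypothetical `QuestionThread` record; the field names are illustrative and do not correspond to any actual Yahoo! Answers data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionThread:
    """Hypothetical record of a CQA question's final state."""
    closed_by_asker: bool
    best_answer_selected: bool
    best_answer_rating: Optional[int]  # stars given by the asker; None if unrated


def is_satisfied(thread, min_stars=3):
    """Apply the slide's rule: closed AND best answer selected AND rating >= 3 stars."""
    return bool(
        thread.closed_by_asker
        and thread.best_answer_selected
        and thread.best_answer_rating is not None
        and thread.best_answer_rating >= min_stars
    )


def satisfaction_label(thread):
    """Map a thread to the binary target used for prediction."""
    return "Satisfied" if is_satisfied(thread) else "Unsatisfied"


labels = [
    satisfaction_label(QuestionThread(True, True, 4)),
    satisfaction_label(QuestionThread(True, True, 2)),
    satisfaction_label(QuestionThread(False, False, None)),
]
```

Any thread that fails one of the three conditions, including one that was never closed by the asker, falls into the "Unsatisfied" class under this rule.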
