
Modeling User Interactions in Web Search and Social Media



Presentation Transcript


  1. Modeling User Interactions in Web Search and Social Media • Eugene Agichtein, Intelligent Information Access Lab, Emory University

  2. Intelligent Information Access Lab http://ir.mathcs.emory.edu/ • Research areas: • Information retrieval & extraction, text mining, and information integration • User behavior modeling, social networks and interactions, social media • People: colleagues at Yahoo! Research, Microsoft Research, Emory Libraries, Psychology, Emory School of Medicine, Neuroscience, and Georgia Tech College of Computing • Support: Walter Askew, EC '09; Qi Guo, 2nd-year Ph.D.; Yandong Liu, 2nd-year Ph.D.; Alvin Grissom, 2nd-year M.S.; Ryan Kelly, Emory '10; Abulimiti Aji, 1st-year Ph.D.

  3. User Interactions: The 3rd Dimension of the Web • Amount exceeds web content and structure • Published content: 4 GB/day; social media: 10 GB/day; page views: 100 GB/day [Andrew Tomkins, Yahoo! Search, 2007]

  4. Talk Outline • Web Search Interactions • Click modeling • Browsing • Social media • Content quality • User satisfaction • Ranking and Filtering

  5. Interpreting User Interactions • Clickthrough and subsequent browsing behavior of individual users is influenced by many factors: • Relevance of a result to a query • Visual appearance and layout • Result presentation order • Context, history, etc. • General idea: • Aggregate interactions across all users and queries • Compute “expected” behavior for any query/page • Recover the relevance signal for a given query

  6. Case Study: Clickthrough • clickthrough(query q, document d, result position p) = expected(p) + relevance(q, d) [Figure: clickthrough frequency by result position for all queries in the sample]

  7. Clickthrough for Queries with Known Position of Top Relevant Result • Higher clickthrough at a top non-relevant document than at a top relevant document [Figure: relative clickthrough for queries with the known relevant result in position 1 and in position 3, respectively]

  8. Model Deviation from “Expected” Behavior • Relevance component is the deviation from “expected” behavior: relevance(q, d) = observed(q, d, p) − expected(p)
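A minimal sketch of this deviation model in Python (my illustration; the talk does not give an implementation, and the log format and function names here are assumptions):

```python
from collections import defaultdict

def expected_clickthrough(click_log):
    """Estimate the expected click frequency for each result position,
    aggregated over all users and queries (the position-bias baseline).
    click_log: iterable of (query, doc, position, clicked) tuples."""
    clicks, views = defaultdict(int), defaultdict(int)
    for query, doc, position, clicked in click_log:
        views[position] += 1
        clicks[position] += int(clicked)
    return {p: clicks[p] / views[p] for p in views}

def relevance_signal(click_log, query, doc, position, expected):
    """relevance(q, d) = observed clickthrough - expected(p)."""
    obs = [c for q, d, p, c in click_log if q == query and d == doc]
    observed = sum(obs) / len(obs) if obs else 0.0
    return observed - expected.get(position, 0.0)
```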

  9. Predicting Result Preferences • Task: predict pairwise preferences • A user will prefer result A > result B • Models for preference prediction: • Current search engine ranking • Clickthrough • Full user behavior model

  10. Predicting Result Preferences: Granka et al., SIGIR 2005 • SA+N: “Skip Above” and “Skip Next” • Adapted from Joachims et al. [SIGIR’05] • Motivated by gaze tracking • Example: • Click on results 2 and 4 • Skip Above: 4 > (1, 3), 2 > 1 • Skip Next: 4 > 5, 2 > 3 [Diagram: results ranked 1–8, with clicks on positions 2 and 4]

  11. Our Extension: Use Click Distribution • CD: distributional model, extends SA+N • A click is counted only if its frequency exceeds the expected frequency by more than ε • Click on result 2 was likely “by chance” • 4 > (1, 2, 3, 5), but not 2 > (1, 3) [Diagram: results ranked 1–8, with clicks on positions 2 and 4]
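Both preference-extraction strategies can be sketched in a few lines of Python (a simplified reading of the slides, not the authors' code; positions are 1-based and eps is a hypothetical threshold):

```python
def skip_above_next(clicked_positions, num_results):
    """Joachims-style SA+N: a clicked result is preferred over every
    unclicked result ranked above it, and the unclicked one just below."""
    clicked = set(clicked_positions)
    prefs = []
    for c in sorted(clicked):
        prefs += [(c, u) for u in range(1, c) if u not in clicked]  # Skip Above
        if c + 1 <= num_results and c + 1 not in clicked:           # Skip Next
            prefs.append((c, c + 1))
    return prefs  # list of (preferred, non-preferred) positions

def click_distribution_prefs(click_freq, expected, num_results, eps=0.1):
    """CD extension: keep a click only when its observed frequency exceeds
    the expected (position-bias) frequency by more than eps."""
    confident = {p for p, f in click_freq.items()
                 if f - expected.get(p, 0.0) > eps}
    return skip_above_next(confident, num_results)

# Slide 10 example: clicks on results 2 and 4 out of 8.
print(skip_above_next({2, 4}, 8))  # [(2, 1), (2, 3), (4, 1), (4, 3), (4, 5)]
```

On the slide 11 example, if only the click on result 4 survives the eps test, the pairs reduce to 4 > (1, 2, 3, 5), dropping the chance-click preferences 2 > (1, 3).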

  12. Results: Click Deviation vs. Skip Above+Next

  13. Problem: Users click based on result summaries (“captions”/“snippets”) • Effect of Caption Features on Clickthrough Inversions, C. Clarke, E. Agichtein, S. Dumais, R. White, SIGIR 2007

  14. Clickthrough Inversions

  15. Relevance is Not the Dominant Factor!

  16. Snippet Features Studied

  17. Feature Importance

  18. Important Words in Snippet

  19. Summary • Clickthrough inversions are a powerful tool for assessing the influence of caption features. • Relatively simple caption features can significantly influence user behavior. • Accounting for summary bias can help predict relevance from clickthrough more accurately.

  20. Idea: go beyond clickthrough/download counts

  21. User Behavior Model • Full set of interaction features • Presentation, clickthrough, browsing • Train the model with explicit judgments • Input: behavior feature vectors for each query-page pair in rated results • Use RankNet (Burges et al., [ICML 2005]) to discover model weights • Output: a neural net that can assign a “relevance” score to a behavior feature vector

  22. RankNet for User Behavior • RankNet: general, scalable, robust neural net training algorithm and implementation • Optimized for ranking: predicting an ordering of items, not a score for each • Trains on pairs (where the first point should be ranked higher than or equal to the second) • Extremely efficient • Uses a cross-entropy cost (probabilistic model) • Uses gradient descent to set weights • Restarts to escape local minima

  23. RankNet [Burges et al. 2005] • For query results 1 and 2, present a pair of vectors and labels, with label(1) > label(2) [Diagram: feature vector 1 and label 1 are fed through the net, producing NN output 1]

  24. RankNet [Burges et al. 2005] • For query results 1 and 2, present a pair of vectors and labels, with label(1) > label(2) [Diagram: feature vector 2 and label 2 are fed through the same net, producing NN output 2]

  25. RankNet [Burges et al. 2005] • The error is a function of both outputs (we desire output 1 > output 2) [Diagram: NN outputs 1 and 2 feed into the pairwise error]

  26. RankNet [Burges et al. 2005] • Update feature weights: • Cost function: f(o1 − o2) – details in the Burges et al. paper • Modified back-propagation [Diagram: the pairwise error, a function of NN outputs 1 and 2, is back-propagated]

  27. Predicting with RankNet • Present an individual feature vector and get a score [Diagram: a single feature vector is fed through the net, producing one NN output]
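Slides 23–27 compress the training loop; the following runnable sketch shows the idea (my illustration, not the Burges et al. implementation: a linear scorer stands in for the neural net, and lr is a hypothetical learning rate). With target "item 1 outranks item 2", the pairwise cross-entropy cost reduces to C = log(1 + exp(-(o1 - o2))).

```python
import numpy as np

def ranknet_step(w, x1, x2, lr=0.01):
    """One pairwise update on (x1, x2) where x1 should outrank x2.
    Gradient descent on C = log(1 + exp(-(o1 - o2)))."""
    o = w @ x1 - w @ x2                 # score difference o1 - o2
    grad_o = -1.0 / (1.0 + np.exp(o))   # dC/do: pushes o1 above o2
    return w - lr * grad_o * (x1 - x2)  # chain rule: do/dw = x1 - x2

def score(w, x):
    """Prediction (slide 27): present one feature vector, get a score."""
    return float(w @ x)

# Usage: train on labeled preference pairs, then rank results by score.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
x_pref, x_other = rng.normal(size=5), rng.normal(size=5)
for _ in range(200):
    w = ranknet_step(w, x_pref, x_other)
assert score(w, x_pref) > score(w, x_other)
```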

  28. Example results: Predicting User Preferences • Baseline < SA+N < CD << UserBehavior • Rich user behavior features result in dramatic improvement

  29. How to Use Behavior Models for Ranking? • Use interactions from previous instances of query • General-purpose (not personalized) • Only for the queries with past user interactions • Models: • Rerank, clickthrough only: reorder results by number of clicks • Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences • Integrate directly into ranker: incorporate user interactions as features for the ranker

  30. Enhance Ranker Features with User Behavior Features • For a given query • Merge original feature set with user behavior features when available • User behavior features computed from previous interactions with same query • Train RankNet [Burges et al., ICML’05] on the enhanced feature set

  31. Feature Merging: Details • Value scaling: binning vs. log-linear vs. linear (e.g., μ=0, σ=1) • Missing values: 0? (but what does 0 mean for features normalized so that μ=0?) • Runtime: significant plumbing problems [Screenshot: query “SIGIR” with fake results and fake feature values]
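A minimal sketch of one way to do this merge in Python (my illustration under stated assumptions: linear μ=0, σ=1 scaling, a zero fill-in for missing values, and hypothetical names such as N_BEHAVIOR):

```python
import numpy as np

N_BEHAVIOR = 4  # hypothetical number of behavior features

def standardize_columns(matrix):
    """Linear scaling of each feature column to mean 0, std 1
    (one of the scaling options listed on the slide)."""
    mu, sigma = matrix.mean(axis=0), matrix.std(axis=0)
    return (matrix - mu) / np.where(sigma == 0, 1.0, sigma)

def merge_features(content_feats, behavior_feats):
    """Append behavior features to content features per query-page pair.
    Pairs with no interaction history get a zero fill-in, which after
    mu=0 scaling reads as 'average behavior' -- exactly the ambiguity
    the slide flags for normalized features."""
    keys = list(behavior_feats)
    scaled = standardize_columns(np.array([behavior_feats[k] for k in keys]))
    scaled = dict(zip(keys, scaled))
    return {k: np.concatenate([x, scaled.get(k, np.zeros(N_BEHAVIOR))])
            for k, x in content_feats.items()}
```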

  32. Evaluation Metrics • Precision at K: fraction of relevant in top K • NDCG at K: norm. discounted cumulative gain • Top-ranked results most important • MAP: mean average precision • Average precision for each query: mean of the precision at K values computed after each relevant document was retrieved
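For concreteness, here is a small Python sketch of the three metrics (one common formulation; exact gain and discount conventions for NDCG vary, so treat this as illustrative):

```python
import math

def precision_at_k(rels, k):
    """Fraction of relevant results in the top k (rels: 0/1 list in rank order)."""
    return sum(rels[:k]) / k

def ndcg_at_k(gains, k):
    """Normalized discounted cumulative gain at k (gains in rank order);
    the log discount weights top-ranked results most heavily."""
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = sum(g / math.log2(i + 2)
                for i, g in enumerate(sorted(gains, reverse=True)[:k]))
    return dcg / ideal if ideal > 0 else 0.0

def average_precision(rels):
    """Mean of the precision@K values computed at each relevant position;
    MAP is this quantity averaged over all queries."""
    precs = [precision_at_k(rels, i + 1) for i, r in enumerate(rels) if r]
    return sum(precs) / len(precs) if precs else 0.0
```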

  33. Content, User Behavior: NDCG • BM25 < Rerank-CT < Rerank-All < +All

  34. Full Search Engine, User Behavior: NDCG, MAP

  35. User Behavior Complements Content and Web Topology

  36. Which Queries Benefit Most? • Most gains are for queries with poor original ranking

  37. Result Summary • Incorporating user behavior into web search ranking dramatically improves relevance • Providing rich user interaction features to the ranker is the most effective strategy • Large improvements shown for up to 50% of test queries

  38. User Generated Content

  39. Some goals of mining social media • Find high-quality content • Find relevant and high-quality content • Use millions of interactions to: • Understand complex information needs • Model subjective information seeking • Understand cultural dynamics

  40. http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO

  41. Lifecycle of a Question in CQA • User: chooses a category → composes the question → opens the question → examines answers as they arrive • Found the answer? • Yes: the user closes the question, chooses the best answer, and gives ratings • No: the question is closed by the system, and the best answer is chosen by voters [Flowchart: question lifecycle, with answers receiving positive and negative ratings]
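The flowchart reduces to a small state machine; a minimal Python sketch (state and event names are my paraphrase of the slide, not an API):

```python
# Transition table for the question lifecycle (paraphrased from the slide).
LIFECYCLE = {
    ("open", "answer_received"): "open",         # answers accumulate
    ("open", "asker_chose_best"): "closed",      # asker closes and rates answers
    ("open", "expired"): "closed_by_system",     # voters choose the best answer
}

def step(state, event):
    """Advance the question's state; unknown events leave it unchanged."""
    return LIFECYCLE.get((state, event), state)

state = "open"
for event in ("answer_received", "answer_received", "expired"):
    state = step(state, event)
print(state)  # -> closed_by_system
```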

  42. Community
