Modeling User Interactions in Web Search and Social Media
    Presentation Transcript
    1. Modeling User Interactions in Web Search and Social Media. Eugene Agichtein, Intelligent Information Access Lab, Emory University

    2. Intelligent Information Access Lab (http://ir.mathcs.emory.edu/) • Research areas: information retrieval & extraction, text mining, and information integration; user behavior modeling, social networks and interactions, social media • People and colleagues at Yahoo! Research, Microsoft Research, Emory Libraries, Psychology, Emory School of Medicine, Neuroscience, and the Georgia Tech College of Computing • Support: Walter Askew (EC '09), Qi Guo (2nd-year Ph.D.), Yandong Liu (2nd-year Ph.D.), Alvin Grissom (2nd-year M.S.), Ryan Kelly (Emory '10), Abulimiti Aji (1st-year Ph.D.)

    3. User Interactions: The 3rd Dimension of the Web • Amount exceeds web content and structure • Published: 4 GB/day; Social media: 10 GB/day; Page views: 100 GB/day [Andrew Tomkins, Yahoo! Search, 2007]

    4. Talk Outline • Web Search Interactions • Click modeling • Browsing • Social media • Content quality • User satisfaction • Ranking and Filtering

    5. Interpreting User Interactions • Clickthrough and subsequent browsing behavior of individual users influenced by many factors • Relevance of a result to a query • Visual appearance and layout • Result presentation order • Context, history, etc. • General idea: • Aggregate interactions across all users and queries • Compute “expected” behavior for any query/page • Recover relevance signal for a given query

    6. Case Study: Clickthrough • Clickthrough frequency for all queries in sample • Model: Clickthrough(query q, document d, result position p) = expected(p) + relevance(q, d)

    7. Clickthrough for Queries with Known Position of Top Relevant Result • Higher clickthrough at top non-relevant than at top relevant document • Relative clickthrough for queries with known relevant results in positions 1 and 3, respectively

    8. Model Deviation from "Expected" Behavior • Relevance component: deviation from "expected": Relevance(q, d) = observed - expected(p)
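The deviation model on this slide, Relevance(q, d) = observed - expected(p), can be sketched in a few lines. The log format (a list of (query, doc, position, clicked) tuples) and both function names are illustrative assumptions, not the actual implementation; the sketch also assumes each (query, doc) pair is shown at a single position.

```python
from collections import defaultdict

def position_priors(click_log):
    """Estimate the expected clickthrough rate per result position,
    aggregated over all queries (the 'expected' background behavior)."""
    clicks, views = defaultdict(int), defaultdict(int)
    for query, doc, position, clicked in click_log:
        views[position] += 1
        clicks[position] += int(clicked)
    return {p: clicks[p] / views[p] for p in views}

def relevance_deviation(click_log):
    """Relevance(q, d) = observed clickthrough - expected(position).
    Assumes each (query, doc) pair appears at one position."""
    expected = position_priors(click_log)
    obs_clicks, obs_views, pos_of = defaultdict(int), defaultdict(int), {}
    for query, doc, position, clicked in click_log:
        key = (query, doc)
        obs_views[key] += 1
        obs_clicks[key] += int(clicked)
        pos_of[key] = position
    return {key: obs_clicks[key] / obs_views[key] - expected[pos_of[key]]
            for key in obs_views}
```

A positive deviation means the result attracted more clicks than its position alone would predict, which the model reads as a relevance signal.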

    9. Predicting Result Preferences • Task: predict pairwise preferences • A user will prefer Result A > Result B • Models for preference prediction • Current search engine ranking • Clickthrough • Full user behavior model

    10. Predicting Result Preferences: Granka et al., SIGIR 2005 • SA+N: "Skip Above" and "Skip Next" • Adapted from Joachims et al. [SIGIR'05] • Motivated by gaze tracking • Example: clicks on results 2 and 4 • Skip Above: 4 > (1, 3), 2 > 1 • Skip Next: 4 > 5, 2 > 3 [Diagram: result list, positions 1-8]
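The Skip Above / Skip Next rules on this slide can be sketched directly; the function name and the list-of-click-positions input are assumptions for illustration.

```python
def skip_above_next(clicked_positions, num_results):
    """Extract pairwise preferences from clicks (1-based positions).
    Skip Above: a clicked result is preferred over every unclicked
    result ranked above it.
    Skip Next: a clicked result is preferred over the unclicked
    result immediately below it."""
    clicked = set(clicked_positions)
    prefs = set()
    for c in clicked:
        for above in range(1, c):
            if above not in clicked:
                prefs.add((c, above))   # Skip Above: c > above
        nxt = c + 1
        if nxt <= num_results and nxt not in clicked:
            prefs.add((c, nxt))         # Skip Next: c > next
    return prefs
```

On the slide's example (clicks on 2 and 4 out of 8 results) this yields exactly the listed preferences: 4 > (1, 3), 2 > 1, 4 > 5, and 2 > 3.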

    11. Our Extension: Use Click Distribution • CD: distributional model, extends SA+N • A clickthrough is considered only if its frequency exceeds the expected frequency by more than ε • Click on result 2 likely "by chance" • 4 > (1, 2, 3, 5), but not 2 > (1, 3) [Diagram: result list, positions 1-8]
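One way to read the CD model is: keep only clicks whose observed frequency exceeds the expected (position-prior) frequency by more than ε, then generate the same skip-style preferences from those "significant" clicks. The sketch below is my loose reading of the slide, with assumed function name and input dictionaries, not the paper's exact formulation.

```python
def cd_preferences(ctr_observed, ctr_expected, epsilon, num_results):
    """Distributional extension of SA+N: a click counts as evidence only
    if observed CTR at that position exceeds expected CTR by > epsilon.
    Significant clicks are preferred over non-significant results ranked
    above them, and over the result immediately below."""
    significant = [p for p in range(1, num_results + 1)
                   if ctr_observed.get(p, 0.0) - ctr_expected.get(p, 0.0) > epsilon]
    prefs = set()
    for c in significant:
        for other in range(1, num_results + 1):
            if (other != c and other not in significant
                    and (other < c or other == c + 1)):
                prefs.add((c, other))
    return prefs
```

With the slide's example numbers, result 2's clicks match the expected rate (a click "by chance"), so only result 4 generates preferences: 4 > (1, 2, 3, 5), and 2 > (1, 3) is not emitted.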

    12. Results: Click Deviation vs. Skip Above+Next

    13. Problem: Users click based on result summaries ("captions" or "snippets") • Effect of Caption Features on Clickthrough Inversions, C. Clarke, E. Agichtein, S. Dumais, R. White, SIGIR 2007

    14. Clickthrough Inversions

    15. Relevance is Not the Dominant Factor!

    16. Snippet Features Studied

    17. Feature Importance

    18. Important Words in Snippet

    19. Summary • Clickthrough inversions are a powerful tool for assessing the influence of caption features • Relatively simple caption features can significantly influence user behavior • Accounting for summary bias can help predict relevance from clickthrough more accurately

    20. Idea: go beyond clickthrough/download counts

    21. User Behavior Model • Full set of interaction features • Presentation, clickthrough, browsing • Train the model with explicit judgments • Input: behavior feature vectors for each query-page pair in rated results • Use RankNet (Burges et al., [ICML 2005]) to discover model weights • Output: a neural net that can assign a “relevance” score to a behavior feature vector

    22. RankNet for User Behavior • RankNet: general, scalable, robust neural net training algorithm and implementation • Optimized for ranking: predicting an ordering of items, not a score for each • Trains on pairs (where the first point is to be ranked higher than or equal to the second) • Extremely efficient • Uses cross-entropy cost (probabilistic model) • Uses gradient descent to set weights • Restarts to escape local minima

    23. RankNet [Burges et al. 2005] • For query results 1 and 2, present a pair of vectors and labels, label(1) > label(2) [Diagram: feature vector 1 and label 1 fed to the net, producing NN output 1]

    24. RankNet [Burges et al. 2005] • For query results 1 and 2, present a pair of vectors and labels, label(1) > label(2) [Diagram: feature vector 2 and label 2 fed to the net, producing NN output 2]

    25. RankNet [Burges et al. 2005] • For query results 1 and 2, present a pair of vectors and labels, label(1) > label(2) • Error is a function of both outputs (desire output 1 > output 2) [Diagram: NN outputs 1 and 2 feed the pairwise error]

    26. RankNet [Burges et al. 2005] • Update feature weights • Cost function: f(o1 - o2), details in the Burges et al. paper • Modified back-propagation • Error is a function of both outputs (desire output 1 > output 2)

    27. Predicting with RankNet • Present an individual feature vector and get a score [Diagram: feature vector 1 to NN output]
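The pairwise training loop described across these slides can be sketched with a linear scorer standing in for the neural net of Burges et al.: score the two vectors, form the cross-entropy cost on the score difference, and update the weights by gradient descent. Everything here (function names, the linear model, the toy hyperparameters) is an illustrative assumption.

```python
import numpy as np

def train_ranknet_linear(pairs, num_features, lr=0.1, epochs=100, seed=0):
    """Minimal RankNet-style trainer with a linear scoring function.
    Each pair (x_pref, x_other) says x_pref should rank above x_other.
    Cost is cross-entropy on sigmoid(o1 - o2); weights updated by
    gradient descent, as in the pairwise scheme on the slides."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=num_features)
    for _ in range(epochs):
        for x_pref, x_other in pairs:
            o1, o2 = w @ x_pref, w @ x_other
            p = 1.0 / (1.0 + np.exp(-(o1 - o2)))  # P(pref ranked above other)
            # gradient of -log(p) w.r.t. w is -(1 - p) * (x_pref - x_other)
            w += lr * (1.0 - p) * (x_pref - x_other)
    return w

def score(w, x):
    """Prediction step: score a single behavior-feature vector."""
    return float(w @ x)
```

At prediction time only single vectors are scored, matching slide 27: the pairwise machinery is used for training, while ranking just sorts by the learned score.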

    28. Example results: Predicting User Preferences • Baseline < SA+N < CD << UserBehavior • Rich user behavior features result in dramatic improvement

    29. How to Use Behavior Models for Ranking? • Use interactions from previous instances of query • General-purpose (not personalized) • Only for the queries with past user interactions • Models: • Rerank, clickthrough only: reorder results by number of clicks • Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences • Integrate directly into ranker: incorporate user interactions as features for the ranker
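The simplest of the three strategies above, "rerank, clickthrough only", is just a stable reorder of the engine's results by past click counts for the same query; results with no interaction history keep their original engine order. A minimal sketch (function name and inputs assumed):

```python
def rerank_by_clicks(results, click_counts):
    """Reorder a ranked result list by the number of past clicks for
    this query. Python's sort is stable, so results with equal (or no)
    click history keep the original engine ordering."""
    return sorted(results, key=lambda doc: -click_counts.get(doc, 0))
```

The other two strategies replace the sort key with predicted pairwise preferences, or feed the behavior features directly into the ranker instead of reranking at all.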

    30. Enhance Ranker Features with User Behavior Features • For a given query • Merge original feature set with user behavior features when available • User behavior features computed from previous interactions with same query • Train RankNet [Burges et al., ICML’05] on the enhanced feature set

    31. Feature Merging: Details • Value scaling: binning vs. log-linear vs. linear (e.g., μ=0, σ=1) • Missing values: fill with 0? (ambiguous for features normalized to μ=0) • Runtime: significant plumbing problems • Example: query "SIGIR", fake results with fake feature values
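The two issues on this slide, linear scaling to μ=0, σ=1 and zero-filled missing behavior features, can be sketched together. Function names and the dict-of-feature-lists layout are assumptions; note how the zero fill becomes ambiguous once features are centered at zero, which is exactly the caveat the slide raises.

```python
import math

def zscore_columns(rows):
    """Linear scaling: normalize each feature column to mean 0, std 1
    (one of the slide's options: binning vs. log-linear vs. linear)."""
    stats = []
    for col in zip(*rows):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mu, sd))
    return [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)] for row in rows]

def merge_features(content_feats, behavior_feats, num_behavior):
    """Merge the ranker's content features with behavior features when
    available; missing behavior features are filled with 0, which after
    mu=0 normalization is indistinguishable from 'average behavior'."""
    return {key: cf + behavior_feats.get(key, [0.0] * num_behavior)
            for key, cf in content_feats.items()}
```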

    32. Evaluation Metrics • Precision at K: fraction of relevant in top K • NDCG at K: norm. discounted cumulative gain • Top-ranked results most important • MAP: mean average precision • Average precision for each query: mean of the precision at K values computed after each relevant document was retrieved
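The three metrics above are standard and easy to make concrete; this sketch uses relevance labels in rank order (binary for precision and AP, graded gains for NDCG) and a log2 discount for NDCG, which is one common convention.

```python
import math

def precision_at_k(rels, k):
    """Fraction of relevant results in the top k (rels: 1/0 in rank order)."""
    return sum(rels[:k]) / k

def ndcg_at_k(gains, k):
    """Normalized discounted cumulative gain at k: DCG with a log2 rank
    discount, normalized by the DCG of the ideal (sorted) ordering."""
    def dcg(gs):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gs[:k]))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def average_precision(rels):
    """Per-query average precision: the mean of precision@K computed at
    the rank of each relevant document. MAP averages this over queries."""
    hits, total = 0, 0.0
    for rank, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0
```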

    33. Content, User Behavior: NDCG BM25 < Rerank-CT < Rerank-All < +All

    34. Full Search Engine, User Behavior: NDCG, MAP

    35. User Behavior Complements Content and Web Topology

    36. Which Queries Benefit Most? • Most gains are for queries with poor original ranking

    37. Result Summary • Incorporating user behavior into web search ranking dramatically improves relevance • Providing rich user interaction features to ranker is the most effective strategy • Large improvement shown for up to 50% of test queries

    38. User Generated Content

    39. Some goals of mining social media • Find high-quality content • Find relevant and high-quality content • Use millions of interactions to: understand complex information needs, model subjective information seeking, understand cultural dynamics

    40. http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO

    41. Lifecycle of a Question in CQA [Flowchart] A user chooses a category, composes the question, and opens it; answerers post answers while the asker examines them; the asker closes the question, chooses best answers, and gives ratings. If the asker does not find the answer, the question is closed by the system and the best answer is chosen by voters.

    42. Community