
Evaluating Novelty and Diversity


Presentation Transcript


  1. two talks in one! Evaluating Novelty and Diversity Charles Clarke School of Computer Science University of Waterloo

  2. Goals for Evaluation Measures • meaningful • tractable • reusable

  3. Evaluation Framework We examine a framework for evaluation. Specific measures covered by the framework include: Clarke et al. (SIGIR ’08) Agrawal et al. (WSDM ’09) Clarke et al. (ICTIR ’09)

  4. Talk #1: Evaluating Diversity Charles Clarke School of Computer Science University of Waterloo

  5. Query: “windows” • Microsoft Windows • When will Windows 7 be released? • What’s the Windows update URL? • I want to download Windows Live Essentials • House windows • Where can I buy replacement windows? • What brands are available? • Aluminum or vinyl? • Windows Restaurant, Las Vegas

  6. Nuggets • Nugget = any binary property of a document • Provides address of a Pella dealer. • Discusses history of the Windows OS. • Is the Windows update page. • (factual, topical and navigational) • Problem: potentially thousands per query.

  7. Evaluation • Model user information needs using nuggets. Different users will be interested in different combinations of nuggets. • Express judgments in terms of nuggets. Judgments may be automatic or manual. Judgments are binary: Does this document contain this nugget? • Nuggets link users and documents.
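A minimal sketch of how such binary nugget judgments J(d, i) might be represented (hypothetical; the nugget descriptions, document names, and judgment values are illustrative, not from the talk):

```python
# Hypothetical sketch: binary nugget judgments for the "windows" query.
# Nugget descriptions, document names, and values are illustrative.
NUGGETS = [
    "provides address of a Pella dealer",
    "discusses history of the Windows OS",
    "is the Windows update page",
]

# J[d][i] == 1 iff document d is judged to contain nugget i.
J = {
    "doc1": [0, 1, 0],
    "doc2": [1, 0, 0],
    "doc3": [0, 1, 1],
}

def judged(d: str, i: int) -> int:
    """Binary judgment: does document d contain nugget i?"""
    return J[d][i]
```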

  8. Interdependencies Problem: Complex interdependencies between nuggets. Three possible simplifying assumptions: • User interested in nugget A will always be interested in nugget B. • User interested in nugget A will never be interested in nugget B. • Nuggets A and B are independent.

  9. Possible Assumption #1 If a user interested in nugget A will always be interested in nugget B, then A and B can be treated as the same nugget.

  10. Possible Assumption #2 A user interested in nugget A will never be interested in nugget B (and vice versa). Under this assumption, a user’s interest in nugget A is determined by their interest in nugget B. Nugget A and nugget B may be viewed as representing different interpretations of the query.

  11. Query: “windows” • Microsoft Windows • When will Windows 7 be released? • What’s the Windows update URL? • I want to download Windows Live Essentials • House windows • Where can I buy replacement windows? • What brands are available? • Aluminum or vinyl? • Windows Restaurant, Las Vegas

  12. Query Interpretations • Assume M interpretations • Compute any effectiveness measure with respect to each interpretation S_j • Compute the weighted average, where p_j is the probability of interpretation j • Agrawal et al., 2009
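In the style of Agrawal et al.’s intent-aware measures, the weighted average takes roughly this form (a reconstruction; the slide’s original formula is not preserved in the transcript, and the symbol M for the measure is an assumption):

```latex
\mathrm{M\mbox{-}IA}(S) = \sum_{j=1}^{M} p_j \, \mathrm{M}(S \mid j)
```

where M(S | j) is the effectiveness measure computed over ranking S as if interpretation j were the only one, and p_j is the probability of interpretation j.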

  13. Possible Assumption #3 A user’s interest in nugget A is independent of their interest in nugget B. The probability that the user is interested in nugget A is a constant (pA). The probability that the user is interested in nugget B is a constant (pB).

  14. Query: “windows” • Microsoft Windows • When will Windows 7 be released? • What’s the Windows update URL? • I want to download Windows Live Essentials • House windows • Where can I buy replacement windows? • What brands are available? • Aluminum or vinyl? • Windows Restaurant, Las Vegas

  15. Relevance framework A document is relevant if it contains any relevant information (assuming N nuggets in total).

  16. Relevance • Assume constant user probabilities • Assume constant document probabilities • J(d, i) = 1 iff document d is judged to contain nugget i • To estimate the probability of relevance, count the nuggets
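The “count the nuggets” step corresponds to a formula along these lines (a reconstruction consistent with Clarke et al., SIGIR ’08; the symbols γ for the constant user probability and α for the constant document probability are assumptions):

```latex
P(R = 1 \mid u, d)
  = 1 - \prod_{i=1}^{N} \bigl(1 - \gamma \alpha \, J(d, i)\bigr)
  = 1 - (1 - \gamma\alpha)^{m(d)},
\quad \text{where } m(d) = \sum_{i=1}^{N} J(d, i)
```

Under constant probabilities the estimate depends only on m(d), the number of nuggets document d is judged to contain.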

  17. Probability of Relevance The estimated probability of relevance replaces relevance in standard evaluation measures, including nDCG, MAP, and rank-biased precision. Assumptions #2 and #3 can then be combined. Other estimation methods are possible.
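A minimal sketch, assuming the constant-probability estimate above, of plugging the probability of relevance into nDCG as a graded gain (the value 0.5 and the helper names are illustrative):

```python
import math

def prob_relevance(m: int, gamma_alpha: float = 0.5) -> float:
    """P(R=1|d) = 1 - (1 - gamma*alpha)^m, with m nuggets judged present."""
    return 1.0 - (1.0 - gamma_alpha) ** m

def ndcg(nugget_counts, ideal_counts):
    """nDCG with the estimated probability of relevance as the gain."""
    def dcg(counts):
        return sum(prob_relevance(m) / math.log2(k + 2)
                   for k, m in enumerate(counts))
    return dcg(nugget_counts) / dcg(ideal_counts)

# A ranking whose documents match 2, 0, and 1 nuggets;
# the ideal ordering is 2, 1, 0.
print(ndcg([2, 0, 1], [2, 1, 0]))  # ~0.94
```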

  18. Research Issues (talk #1) • Identifying nuggets automatically • Clustering • Co-clicks • Query refinement • Automatic judging • Patterns • Classification • How many nuggets are enough? • Estimating probability of relevance

  19. Conclusions (talk #1) • Evaluating diversity requires us tomodel and represent the diversity. • Nuggets represent one possible solution. • Simple user model; simple assumptions; simple judging.

  20. Questions? Talk #1: Evaluating Diversity Charles Clarke School of Computer Science University of Waterloo

  21. Intermission The TREC 2009 Web Track • traditional ad hoc task • novelty and diversity task • ClueWeb09 dataset (one billion pages) • explore effectiveness measures • http://plg.uwaterloo.ca/~trecweb

  22. Intermission: Free sample topic
  <topic number=0>
    <query> physical therapist </query>
    <description>
      The user requires information regarding the profession and the services it provides.
    </description>
    <subtopic number=1> What does a physical therapist do? </subtopic>
    <subtopic number=2> Where can I find a physical therapist? </subtopic>
    <subtopic number=3> How much does physical therapy cost per hour? </subtopic>
    …

  23. Talk #2: Evaluating Novelty Charles Clarke School of Computer Science University of Waterloo

  24. Novelty • Novelty depends on diversity. • Previous talk considered probability of relevance in isolation (e.g., for the top-ranked document). • In this talk we will examine how user context impacts the probability of relevance.

  25. User context

  26. Simplest context model • Ranked list • User scans result 1, 2, 3, 4, 5, … in order. • Novelty of result k considered in light of the first k-1 results.
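Under this model, a nugget’s contribution is discounted by how many times it has already been seen; a reconstruction in the style of the novelty-biased gain of Clarke et al. (SIGIR ’08):

```latex
G[k] = \sum_{i=1}^{N} J(d_k, i) \, \alpha \, (1 - \alpha)^{r_{i,k-1}},
\quad \text{where } r_{i,k-1} = \sum_{l=1}^{k-1} J(d_l, i)
```

Here r_{i,k-1} counts how many of the first k-1 documents were judged to contain nugget i, so a repeated nugget contributes geometrically less.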

  27. Relevance framework

  28. Relevance: assuming constant probabilities.
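A minimal end-to-end sketch under constant probabilities, in the α-nDCG style (the α value and the judgment matrix are illustrative; normalizing the DCG against an ideal reordering would yield α-nDCG):

```python
import math

def novelty_gains(judgments, alpha=0.5):
    """Gain of each ranked document, discounting nuggets already seen.

    judgments[k][i] == 1 iff the document at rank k contains nugget i.
    """
    n = len(judgments[0])
    seen = [0] * n                      # times nugget i has appeared so far
    gains = []
    for row in judgments:
        gains.append(sum(row[i] * alpha * (1 - alpha) ** seen[i]
                         for i in range(n)))
        for i in range(n):
            seen[i] += row[i]
    return gains

def dcg(gains):
    return sum(g / math.log2(k + 2) for k, g in enumerate(gains))

# Two nuggets; the second document repeats nugget 0, so it earns less for it.
ranked = [[1, 0], [1, 1], [0, 1]]
print(novelty_gains(ranked))            # [0.5, 0.75, 0.25]
print(round(dcg(novelty_gains(ranked)), 3))
```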

  29. Beyond the ranked list

  30. Research issues (talk #2) • Better user models • Prior browsing context, local context, etc. • Evaluating impact of result presentation methods • Better captions • Query suggestions • Instant answers (stock quotes, weather, product prices, definitions)

  31. Conclusions (talk #2) • Modeling and representing diversity allows us to consider novelty. • User models should be simple enough to be tractable. • User models should be complex enough to be meaningful.

  32. Questions? Talk #2: Evaluating Novelty Charles Clarke School of Computer Science University of Waterloo
