Information Retrieval and Search Engines


Presentation Transcript


  1. Information Retrieval and Search Engines Lecture 6: Evaluation, Relevance Feedback and Query Expansion Prof. Michael R. Lyu

  2. Outline (Ch. 8, 9 of IR Book) • Recap • Motivation • Evaluation in Information Retrieval • Relevance Feedback • Query Expansion

  3. Outline (Ch. 8, 9 of IR Book) • Recap • Motivation • Evaluation in Information Retrieval • Relevance Feedback • Query Expansion

  4. Importance of Ranking: Summary • Viewing abstracts: Users are a lot more likely to read the abstracts of the top-ranked pages (1, 2, 3, 4) than the abstracts of the lower ranked pages (7, 8, 9, 10). • Clicking: Distribution is even more skewed for clicking • In 1 out of 2 cases, users click on the top-ranked page • Even if the top-ranked page is not relevant, 30% of users will click on it • Getting the ranking right is very important • Getting the top-ranked page right is most important

  5. Predicted and True Probability of Relevance (figure) • Source: Lillian Lee

  6. Pivot Normalization • Cosine normalization produces weights that are too large for short documents and too small for long documents (on average) • Adjust cosine normalization by linear adjustment: “turning” the average normalization on the pivot • Effect: Similarities of short documents with query decrease; similarities of long documents with query increase • This removes the unfair advantage that short documents have
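
To make the linear adjustment concrete, here is a minimal sketch of pivoted length normalization in Python. The choice of the average document length as the pivot and the slope value 0.75 are illustrative assumptions, not values given in the lecture:

```python
def pivoted_norm(doc_length: float, pivot: float, slope: float = 0.75) -> float:
    """Pivoted normalization factor: a linear blend of the document's own
    length and the pivot (here, an assumed average length). Dividing term
    weights by this instead of the plain length penalizes short documents
    slightly more and long documents slightly less."""
    return (1.0 - slope) * pivot + slope * doc_length

# Illustrative check with an assumed average length of 100 terms:
avg_len = 100.0
for length in (20, 100, 500):
    print(length, pivoted_norm(length, pivot=avg_len))
# 20  -> 40.0  (divisor grows: short-document similarities decrease)
# 100 -> 100.0 (unchanged at the pivot)
# 500 -> 400.0 (divisor shrinks: long-document similarities increase)
```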

  7. Sec. 7.1.6 Cluster Pruning: Query Processing • Process a query as follows: • Given query Q, find its nearest leader L • Seek K nearest docs from among L’s followers

  8. Sec. 7.1.6 Visualization (figure: query, leaders, and their followers)
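
A minimal sketch of the two query-processing steps from slide 7, under assumptions not spelled out there: the leader/follower assignment was built in preprocessing, all vectors are unit-normalized so a dot product is cosine similarity, and the dict-of-vectors layout is my own choice:

```python
import numpy as np

def cluster_pruning_search(query, leaders, followers, k):
    """Cluster pruning query processing (Sec. 7.1.6).
    leaders:   {leader_id: unit vector}
    followers: {leader_id: {doc_id: unit vector}} (each leader's followers)
    """
    # Step 1: find the leader L nearest to the query.
    best = max(leaders, key=lambda lid: float(np.dot(query, leaders[lid])))
    # Step 2: rank only L's followers and return the K nearest docs.
    docs = followers[best]
    ranked = sorted(docs, key=lambda did: float(np.dot(query, docs[did])), reverse=True)
    return ranked[:k]
```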

  9. Tiered Indexes • Basic idea: • Create several tiers of indexes, corresponding to importance of indexing terms • During query processing, start with highest-tier index • If highest-tier index returns at least k (e.g., k = 100) results: stop and return results to user • If we’ve only found < k hits: repeat for next index in tier cascade

  10. Tiered Indexes • Example • Two-tier system • Tier 1: Index of all titles • Tier 2: Index of the rest of the documents • Pages containing the search words in the title are better hits than pages containing the search words in the body of the text
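
A minimal sketch of the tier cascade from slides 9–10. The `search(query)` method on each tier is a hypothetical interface, and appending lower-tier hits after higher-tier hits (rather than re-ranking everything) is a simplifying choice of this sketch:

```python
def tiered_search(query, tiers, k=100):
    """Query tiers in decreasing order of importance (e.g., tier 1 = titles,
    tier 2 = document bodies) and stop once at least k hits are collected."""
    results, seen = [], set()
    for tier in tiers:
        for doc_id in tier.search(query):   # hypothetical per-tier search API
            if doc_id not in seen:
                seen.add(doc_id)
                results.append(doc_id)
        if len(results) >= k:
            break   # the higher tiers already produced enough hits
    return results
```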

  11. Outline (Ch. 8, 9 of IR Book) • Recap • Motivation • Evaluation in Information Retrieval • Relevance Feedback • Query Expansion

  12. Evaluation • How do we measure which search engine returns a better result? • How do we measure users’ happiness?

  13. How Can We Improve Recall in Search? • Two ways of improving recall: relevance feedback and query expansion • As an example, consider query q: [aircraft] and document d containing “plane”, but not containing “aircraft” • A simple IR system will not return d for q • Even if d is the most relevant document for q! • We want to change this: • Return relevant documents even if there is no term match with the (original) query

  14. Options for Improving Recall • Local: Do a “local”, on-demand analysis for a user query • Main local method: relevance feedback • Global: Do a global analysis once (e.g., of the collection) to produce a thesaurus • Use the thesaurus for query expansion

  15. Outline (Ch. 8, 9 of IR Book) • Recap • Motivation • Evaluation in Information Retrieval • Relevance Feedback • Query Expansion

  16. Measures for a Search Engine • How fast does it index? • E.g., number of bytes per hour • How fast does it search? • E.g., latency as a function of queries per second • What is the cost per query? • In dollars

  17. Measures for a Search Engine • All of the preceding criteria are measurable: we can quantify speed/size/money • However, the key measure for a search engine is user happiness • What is user happiness? • Factors include: • Speed of response • Size of index • Uncluttered UI • Note that none of these is sufficient: blindingly fast, but useless answers won’t make a user happy • Most important: relevance • How can we quantify user happiness?

  18. Who Is the User? • Who is the user we are trying to make happy? • Web search engine • Searcher • Success: searcher finds what he/she was looking for • Measure: rate of return to this search engine • Advertiser • Success: searcher clicks on ad • Measure: clickthrough rate • Clickthrough rate (CTR) = # of clicks on an ad / # of times the ad is shown

  19. Who Is the User? • Enterprise • CEO • Success: employees are more productive (because of effective search) • Measure: profit of the company

  20. Who Is the User? • Ecommerce • Buyer • Success: buyer buys something • Measures: time to purchase, fraction of “conversions” of searchers to buyers • Seller • Success: seller sells something • Measure: profit per item sold

  21. Most Common Definition of User Happiness: Relevance • User happiness is equated with the relevance of search results to the query • But how do you measure relevance? • Standard methodology in information retrieval consists of three elements • A benchmark document collection • A benchmark suite of queries • An assessment of the relevance of each query-document pair

  22. Relevance: Query vs. Information Need • Relevance to what? • First take: relevance to the query • “Relevance to the query” is very problematic • Information need i: “I am looking for information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.” • Query q: [red wine white wine heart attack] • Consider document d′: “At heart of his speech was an attack on the wine industry lobby for downplaying the role of red and white wine in drunk driving.”

  23. Relevance: Query vs. Information Need • Relevance to what? • First take: relevance to the query • “Relevance to the query” is very problematic • Information need i: “I am looking for information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.” • Query q: [red wine white wine heart attack] • Consider document d′: “At heart of his speech was an attack on the wine industry lobby for downplaying the role of red and white wine in drunk driving.” • d′ is an excellent match for query q • d′ is not relevant to the information need i

  24. Relevance: Query vs. Information Need • User happiness can only be measured by relevance to an information need, not by relevance to queries • We talk about query-document relevance judgments even though we mean information-need-document relevance judgments

  25. Precision and Recall • Precision (P) is the fraction of retrieved documents that are relevant: P = #(relevant items retrieved) / #(retrieved items)

  26. Precision and Recall • Recall (R) is the fraction of relevant documents that are retrieved: R = #(relevant items retrieved) / #(relevant items)

  27. Precision at K • Precision at K documents (P@K) is the fraction of relevant results among the first K results

  28. Precision and Recall • P = TP / ( TP + FP ) • R = TP / ( TP + FN )
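
A small sketch tying the definitions together; the counts in the example call are the ones used in Example 1 a few slides below (TP = 20, FP = 40, FN = 60), and the ranking in the P@K call is purely illustrative:

```python
def precision_recall(tp: int, fp: int, fn: int):
    """P = TP/(TP+FP), R = TP/(TP+FN), guarding against empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def precision_at_k(ranked_relevance, k: int) -> float:
    """P@K: the fraction of the top K retrieved documents that are relevant.
    ranked_relevance is a list of 0/1 judgments in ranked order."""
    return sum(ranked_relevance[:k]) / k

print(precision_recall(tp=20, fp=40, fn=60))   # (0.333..., 0.25)
print(precision_at_k([1, 0, 1, 1, 0], k=5))    # 0.6
```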

  29. Precision/Recall Tradeoff • You can increase recall by returning more docs • Recall is a non-decreasing function of the number of docs retrieved • A system that returns all docs has 100% recall! • The converse is also true (usually): It’s easy to get high precision for very low recall • Suppose the document with the largest score is relevant. How can we maximize precision?

  30. A Combined Measure: F • F allows us to trade off precision against recall: F = 1 / (α/P + (1 − α)/R) = (β² + 1)PR / (β²P + R) • where α ∈ [0, 1] and thus β² = (1 − α)/α ∈ [0, ∞] • Most frequently used: balanced F (F1) with β = 1 or α = 0.5 • This is the harmonic mean of P and R: F1 = 2PR / (P + R) • Values of β < 1 emphasize precision • Values of β > 1 emphasize recall
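
A direct transcription of the formula above into code; the example call plugs in the precision and recall of Example 1 below:

```python
def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean: F = (beta^2 + 1)PR / (beta^2 * P + R).
    beta < 1 emphasizes precision, beta > 1 emphasizes recall,
    beta = 1 gives the balanced F1 = 2PR / (P + R)."""
    if p == 0.0 and r == 0.0:
        return 0.0
    return (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)

print(f_measure(1 / 3, 1 / 4))           # F1 = 2/7 ≈ 0.286
print(f_measure(1 / 3, 1 / 4, beta=2))   # recall-heavy F2 ≈ 0.263
```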

  31. Example 1 • What are Precision, Recall and F1? • (Contingency counts used in the answer on the next slide: 20 relevant documents retrieved, 40 non-relevant documents retrieved, 60 relevant documents not retrieved.)

  32. Answer Example 1 • P = 20/(20 + 40) = 1/3 • R = 20/(20 + 60) = 1/4 • F1 = 2PR/(P + R) = 2 · (1/3) · (1/4) / (1/3 + 1/4) = 2/7 ≈ 0.29

  33. Example 2 • An IR system returns 8 relevant documents, and 10 non-relevant documents. There are a total of 20 relevant documents in the collection. What is the precision of the system on this search, and what is its recall?

  34. Answer Example 2 • An IR system returns 8 relevant documents, and 10 non-relevant documents. There are a total of 20 relevant documents in the collection. What is the precision of the system on this search, and what is its recall? • Answer: P = 8/(8 + 10) = 4/9 ≈ 0.44; R = 8/20 = 0.4

  35. Accuracy • Why do we use complex measures like precision, recall, and F? • Why not something simple like accuracy? • Accuracy is the fraction of decisions (relevant/non-relevant) that are correct • In terms of the contingency table: • accuracy = (TP + TN)/(TP + FP + FN + TN) • Why is accuracy not a useful measure for Web information retrieval?

  36. Exercise • Compute precision, recall and F1 for this result set: • The snoogle search engine always returns 0 results (“0 matching results found”), regardless of the query. Why does snoogle demonstrate that accuracy is not a useful measure in IR?

  37. Why Accuracy is a Useless Measure in IR • Simple trick to maximize accuracy in IR: always say no and return nothing • You then get 99.99% accuracy on most queries • Searchers on the Web (and in IR in general) want to find something and have a certain tolerance for junk • It’s better to return some bad hits as long as you return something • →We use precision, recall, and F for evaluation, not accuracy
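
The 99.99% claim is easy to reproduce with a quick calculation; the collection sizes here are illustrative assumptions:

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    return (tp + tn) / (tp + fp + fn + tn)

# A "return nothing" engine on an assumed collection with 100 relevant and
# 1,000,000 non-relevant documents: it retrieves nothing, so TP = FP = 0.
print(accuracy(tp=0, fp=0, fn=100, tn=1_000_000))   # ≈ 0.9999
# Accuracy looks near-perfect, yet recall is 0 and precision is undefined
# (nothing was retrieved) -- which is why IR uses P, R, and F instead.
```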

  38. F: Why Harmonic Mean? • Why don’t we use a different mean of P and R as a measure? • E.g., the arithmetic mean • The simple (arithmetic) mean is 50% for a “return-everything” search engine, which is too high • Punish really bad performance on either precision or recall • Taking the minimum achieves this • But minimum is not smooth and hard to weight • F (harmonic mean) is a kind of smooth minimum

  39. (Figure: the maximum, arithmetic mean, geometric mean, harmonic mean, and minimum of precision and recall, compared as combined measures.)
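
A quick numerical illustration of why the harmonic mean acts as a smooth minimum; the 1-in-10,000 relevance rate assumed for the “return everything” engine is an illustrative figure:

```python
# "Return everything" engine: recall is 1.0, but precision equals the
# (assumed) fraction of relevant documents in the collection.
p, r = 0.0001, 1.0

arithmetic = (p + r) / 2           # ≈ 0.50005: looks respectable
harmonic = 2 * p * r / (p + r)     # ≈ 0.0002:  close to min(p, r)
print(arithmetic, harmonic)
# The harmonic mean (F1) punishes the terrible precision, as the slide argues.
```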

  40. Outline (Ch. 8, 9 of IR Book) • Recap • Motivation • Evaluation in Information Retrieval • Relevance Feedback • Query Expansion

  41. Relevance Feedback: Basic Idea • Basic Idea • It may be difficult to formulate a good query when you don’t know the collection well or cannot express your information need, but you can judge the relevance of a result. So iterate … • The user issues a (short, simple) query • The search engine returns a set of documents • User marks some docs as relevant, some as non-relevant • Search engine computes a new representation of the information need. Hope: better than the initial query • Search engine runs new query and returns new results • New results have (hopefully) better recall

  42. Relevance Feedback • We can iterate this: several rounds of relevance feedback • We will use the term ad hoc retrieval to refer to regular retrieval without relevance feedback • We will now look at three different examples of relevance feedback that highlight different aspects of the process
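
Slides 41–42 describe the feedback loop abstractly. One standard way to “compute a new representation of the information need” in the vector space model is Rocchio’s formula; the sketch below is only an illustration (the excerpt above does not specify the method), and the α, β, γ values are the commonly cited defaults, assumed here:

```python
import numpy as np

def rocchio(q0, relevant_docs, nonrelevant_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback: move the query vector toward the centroid
    of the documents marked relevant and away from the centroid of those
    marked non-relevant. Inputs are lists of term-weight vectors."""
    q_new = alpha * np.asarray(q0, dtype=float)
    if relevant_docs:
        q_new += beta * np.mean(np.asarray(relevant_docs, dtype=float), axis=0)
    if nonrelevant_docs:
        q_new -= gamma * np.mean(np.asarray(nonrelevant_docs, dtype=float), axis=0)
    return np.maximum(q_new, 0.0)   # negative term weights are usually clipped to 0
```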

  43. Relevance Feedback: Example 1

  44. Results for Initial Query

  45. User Feedback: Select What is Relevant

  46. Results after Relevance Feedback

  47. Vector Space Example: Query “canine” (1) • Source: Fernando Díaz

  48. Similarity of Docs to Query “canine” • Source: Fernando Díaz

  49. User Feedback: Select Relevant Documents • Source: Fernando Díaz

  50. Results after Relevance Feedback • Source: Fernando Díaz
