
Information Retrieval Techniques - Lecture 25: Benchmarks for Evaluation of IR Systems

This lecture discusses evaluation measures such as precision, recall, accuracy, mean average precision, and more for assessing the performance of information retrieval systems.


Presentation Transcript


  1. INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID Lecture # 25 BENCHMARKS FOR THE EVALUATION OF IR SYSTEMS

  2. ACKNOWLEDGEMENTS The presentation of this lecture has been taken from the following sources: • “Introduction to Information Retrieval” by Prabhakar Raghavan, Christopher D. Manning, and Hinrich Schütze • “Managing Gigabytes” by Ian H. Witten, Alistair Moffat, and Timothy C. Bell • “Modern Information Retrieval” by Ricardo Baeza-Yates and Berthier Ribeiro-Neto • “Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon, and Marco Brambilla

  3. Outline • Evaluation Measures • Precision and Recall • Unranked retrieval evaluation • Trade-off between Recall and Precision • Computing Recall/Precision Points

  4. Evaluation Measures • Precision • Recall • Accuracy • Mean Average Precision • F-Measure/E-Measure • Non-Binary Relevance • Discounted Cumulative Gain • Normalized Discounted Cumulative Gain

  5. Precision and Recall [Diagram: the entire document collection partitioned into four regions: retrieved & relevant, retrieved & irrelevant, not retrieved but relevant, and not retrieved & irrelevant.]

  6. Unranked retrieval evaluation: Precision and Recall • Precision: fraction of retrieved docs that are relevant = P(relevant|retrieved) • Recall: fraction of relevant docs that are retrieved = P(retrieved|relevant) • Precision P = tp/(tp + fp) • Recall R = tp/(tp + fn)
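As a minimal sketch of these two formulas (the variable names tp, fp, fn and the example counts are illustrative, not from the slides), the Python below computes precision and recall from raw contingency counts:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = tp/(tp+fp), Recall = tp/(tp+fn), guarding against empty denominators."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical query: 30 relevant docs retrieved, 10 irrelevant retrieved,
# 20 relevant docs missed.
p, r = precision_recall(tp=30, fp=10, fn=20)
print(f"P = {p:.2f}, R = {r:.2f}")  # P = 0.75, R = 0.60
```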

  7. Should we instead use the accuracy measure for evaluation? • Given a query, an engine classifies each doc as “Relevant” or “Nonrelevant” • The accuracy of an engine: the fraction of these classifications that are correct • ACCURACY = (tp + tn) / ( tp + fp + fn + tn) • Accuracy is a commonly used evaluation measure in machine learning classification work • Why is this not a very useful evaluation measure in IR?
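To see why accuracy is a poor fit for IR, here is a small hedged illustration (the counts are invented for this note, not taken from the lecture): relevant documents are usually a tiny fraction of the collection, so an engine that retrieves nothing at all still scores near-perfect accuracy while being useless to the searcher.

```python
# Invented counts: 1,000,000 documents in the collection, only 100 relevant.
# A "do nothing" engine that retrieves no documents at all:
tp, fp, fn, tn = 0, 0, 100, 999_900

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
print(f"Accuracy = {accuracy:.4f}")  # 0.9999 -- looks excellent
print(f"Recall   = {recall:.4f}")    # 0.0000 -- finds nothing relevant
```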

  8. Precision and Recall • Precision • The ability to retrieve top-ranked documents that are mostly relevant. • Recall • The ability of the search to find all of the relevant items in the corpus.

  9. Determining Recall is Difficult • The total number of relevant items is sometimes not available; it can be estimated in two ways: • Sample across the database and perform relevance judgments on the sampled items. • Apply different retrieval algorithms to the same database for the same query; the aggregate of relevant items found is taken as the total relevant set.

  10. Trade-off between Recall and Precision [Plot: precision (y-axis, 0 to 1) versus recall (x-axis, 0 to 1). The ideal system sits at the top-right corner. A system at the high-precision end returns relevant documents but misses many useful ones; a system at the high-recall end returns most of the relevant documents but includes lots of junk.]

  11. Computing Recall/Precision Points • For a given query, produce the ranked list of retrievals. • Adjusting a threshold on this ranked list produces different sets of retrieved documents, and therefore different recall/precision measures. • Mark each document in the ranked list that is relevant according to the gold standard. • Compute a recall/precision pair for each position in the ranked list that contains a relevant document.
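A sketch of this procedure in Python (the ranked list, its relevance judgments, and the function name are placeholders chosen for illustration): walk down the ranking and, at every position holding a relevant document, record the recall and precision at that depth.

```python
def recall_precision_points(ranked_relevance: list[bool], total_relevant: int):
    """Return a (recall, precision) pair for each rank that holds a relevant document."""
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    return points

# Gold-standard judgments for a hypothetical ranked list, top to bottom.
ranking = [True, False, True, True, False, False, True, False]
for recall, precision in recall_precision_points(ranking, total_relevant=5):
    print(f"R = {recall:.3f}, P = {precision:.3f}")
```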

  12. Computing Recall/Precision Points: Example 1 Let total # of relevant docs = 6. Check each new recall point: R=1/6=0.167, P=1/1=1.0; R=2/6=0.333, P=2/2=1.0; R=3/6=0.5, P=3/4=0.75; R=4/6=0.667, P=4/6=0.667; R=5/6=0.833, P=5/13=0.38. One relevant document is never retrieved, so recall never reaches 100%.
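The numbers on this slide can be reproduced with the same kind of loop. Reading the precision values backwards (P = 1/1, 2/2, 3/4, 4/6, 5/13), the relevant documents must appear at ranks 1, 2, 4, 6, and 13, with a sixth relevant document never retrieved; the short script below is a sketch under that reading.

```python
# Ranks of the relevant documents as implied by the slide's precision values;
# the 6th relevant document never appears in the ranked list.
relevant_ranks = [1, 2, 4, 6, 13]
total_relevant = 6

for hits, rank in enumerate(relevant_ranks, start=1):
    print(f"R = {hits}/{total_relevant} = {hits / total_relevant:.3f}; "
          f"P = {hits}/{rank} = {hits / rank:.3f}")
# Recall never reaches 6/6 = 1.0 because one relevant document is missing.
```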
