INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID Lecture # 27 • Mean Average Precision • Non-Binary Relevance • DCG • NDCG
ACKNOWLEDGEMENTS The presentation of this lecture has been taken from the following sources • “Introduction to Information Retrieval” by Prabhakar Raghavan, Christopher D. Manning, and Hinrich Schütze • “Managing Gigabytes” by Ian H. Witten, Alistair Moffat, and Timothy C. Bell • “Modern Information Retrieval” by Ricardo Baeza-Yates • “Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon, Marco Brambilla
Outline • Mean Average Precision • Mean Reciprocal Rank • Cumulative Gain • Discounted Cumulative Gain • Normalized Discounted Cumulative Gain
Mean Average Precision (MAP) • Average Precision: the average of the precision values at the ranks at which each relevant document is retrieved. • Ex1: (1 + 1 + 0.75 + 0.667 + 0.38 + 0)/6 = 0.633 • Ex2: (1 + 0.667 + 0.6 + 0.5 + 0.556 + 0.429)/6 = 0.625 • Mean Average Precision: the mean of the average precision values over a set of queries.
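Average precision and MAP can be sketched in a few lines (a minimal illustration; the function names are mine, and the ranking used below is simply one that reproduces Ex1's precision values, with relevant documents at ranks 1, 2, 4, 6, and 13 and a sixth relevant document that is never retrieved):

```python
def average_precision(relevances, total_relevant):
    """relevances: binary relevance (1/0) of each retrieved document,
    in rank order. total_relevant: number of relevant documents in the
    collection; relevant documents never retrieved contribute zero."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant

def mean_average_precision(runs):
    """MAP: one (relevances, total_relevant) pair per query."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)

# A ranking consistent with Ex1: relevant docs at ranks 1, 2, 4, 6, 13,
# plus a sixth relevant doc that is never retrieved.
ex1 = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(average_precision(ex1, 6), 2))  # -> 0.63, matching Ex1 up to rounding
```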
Mean average precision • If a relevant document is never retrieved, the precision corresponding to that relevant doc is taken to be zero • MAP is macro-averaging: each query counts equally • Now perhaps the most commonly used measure in research papers • Good for web search? • MAP assumes the user is interested in finding many relevant documents for each query • MAP requires many relevance judgments in the text collection
Mean Reciprocal Rank • Consider the rank position, K, of the first relevant document • Could be the only clicked doc • Reciprocal Rank score: RR = 1/K • MRR is the mean RR across multiple queries
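A minimal sketch of RR and MRR, assuming binary relevance lists in rank order (function names are illustrative):

```python
def reciprocal_rank(relevances):
    """RR = 1/K, where K is the rank of the first relevant document;
    0 if no relevant document is retrieved."""
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            return 1 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """MRR: mean RR across queries; runs is one relevance list per query."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs)

# First query: relevant doc at rank 1 (RR = 1); second: at rank 2 (RR = 0.5)
print(mean_reciprocal_rank([[1, 0], [0, 1]]))  # -> 0.75
```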
Non-Binary Relevance • Documents are rarely entirely relevant or non-relevant to a query • Many sources of graded relevance judgments • Relevance judgments on a 5-point scale • Multiple judges • Click distribution and deviation from expected levels (but click-through != relevance judgments)
Cumulative Gain • With graded relevance judgments, we can compute the gain at each rank. • Cumulative Gain at rank n: CGn = rel1 + rel2 + … + reln (where reli is the graded relevance of the document at position i)
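Cumulative gain is just a truncated sum of the graded judgments; a short sketch (the 0–3 grades below are made up for illustration):

```python
def cumulative_gain(rels, n):
    """CG_n: sum of the graded relevance scores of the top-n results."""
    return sum(rels[:n])

grades = [3, 2, 3, 0, 1, 2]        # hypothetical graded judgments, ranks 1..6
print(cumulative_gain(grades, 6))  # -> 11
```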
Discounted Cumulative Gain • Uses graded relevance as a measure of usefulness, or gain, from examining a document • Gain is accumulated starting at the top of the ranking and may be reduced, or discounted, at lower ranks • Typical discount is 1/log (rank) • With base 2, the discount at rank 4 is 1/2, and at rank 8 it is 1/3
Discounting Based on Position • Users care more about highly-ranked documents, so we discount results by 1/log2(rank) • Discounted Cumulative Gain: DCGn = rel1 + rel2/log2(2) + rel3/log2(3) + … + reln/log2(n)
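The DCG formula above can be sketched directly (the rank-1 term is not discounted, since log2(1) = 0; the grades are again hypothetical):

```python
import math

def dcg(rels, n):
    """DCG_n = rel_1 + sum over i = 2..n of rel_i / log2(i).
    Assumes len(rels) >= n; rank 1 is never discounted."""
    return rels[0] + sum(rels[i - 1] / math.log2(i) for i in range(2, n + 1))

grades = [3, 2, 3, 0, 1, 2]      # hypothetical graded judgments, ranks 1..6
print(round(dcg(grades, 6), 2))  # -> 8.1
```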
Normalized Discounted Cumulative Gain (NDCG) • To compare DCGs across queries, normalize values so that an ideal ranking has a Normalized DCG of 1.0 • Ideal ranking: the same judged documents sorted in decreasing order of graded relevance
Normalized Discounted Cumulative Gain (NDCG) • Normalize by the DCG of the ideal ranking: NDCGn = DCGn / IDCGn • NDCG ≤ 1 at all ranks • NDCG is comparable across different queries
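Putting the pieces together, NDCG divides a ranking's DCG by the DCG of the ideal reordering of the same judgments (a self-contained sketch; grades are hypothetical):

```python
import math

def dcg(rels, n):
    """DCG_n = rel_1 + sum over i = 2..n of rel_i / log2(i)."""
    return rels[0] + sum(rels[i - 1] / math.log2(i) for i in range(2, n + 1))

def ndcg(rels, n):
    """NDCG_n = DCG_n / IDCG_n, where the ideal ranking sorts the same
    judgments in decreasing order of graded relevance."""
    ideal = sorted(rels, reverse=True)
    idcg = dcg(ideal, n)
    return dcg(rels, n) / idcg if idcg > 0 else 0.0

grades = [3, 2, 3, 0, 1, 2]                   # hypothetical graded judgments
print(round(ndcg(grades, 6), 3))              # -> 0.932
print(ndcg(sorted(grades, reverse=True), 6))  # ideal ranking -> 1.0
```

Note the `idcg > 0` guard: if no retrieved document has a positive grade, the ideal DCG is zero and the ratio is taken as 0 by convention.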