“Common information retrieval evaluation metrics”
Rank-Based Measures:
- Binary relevance
  - Precision@K (P@K)
  - Mean Average Precision (MAP)
  - Mean Reciprocal Rank (MRR)
- Multiple levels of relevance
  - Normalized Discounted Cumulative Gain (NDCG)
Precision@K
- Set a rank threshold K
- Compute the percentage of relevant documents in the top K
- Ignores documents ranked lower than K
- Example:
- Prec@3 of 2/3
- Prec@4 of 2/4
- Prec@5 of 3/5
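
The computation above can be sketched in a few lines of Python. The ranking used here is an assumption: the original ranked list is not shown, so the sketch uses a hypothetical ranking with relevant documents at ranks 1, 3 and 5, which is consistent with the three values in the example. The function name `precision_at_k` is illustrative only.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant.

    `relevance` is a list of binary labels in rank order
    (1 = relevant, 0 = not relevant).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Documents ranked below k are ignored entirely.
    return sum(relevance[:k]) / k


# Hypothetical ranking (assumed, not taken from the source):
# relevant documents sit at ranks 1, 3 and 5.
ranking = [1, 0, 1, 0, 1]

for k in (3, 4, 5):
    print(f"Prec@{k} = {precision_at_k(ranking, k):.2f}")
```

Note that the denominator is always K, so a relevant document at rank K+1 has no effect on the score.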

Mean Average Precision
- Consider rank position of each relevant doc \(K_{1}\), \(K_{2}\),…, \(K_{R}\).
- Compute Precision@K for each \(K_{1}\), \(K_{2}\),…, \(K_{R}\)
- Average precision = average of these P@K values
- Example: \(\frac{1}{3} \cdot\left(\frac{1}{1}+\frac{2}{3}+\frac{3}{5}\right) \approx 0.76\)

- MAP is the mean of Average Precision across multiple queries/rankings
- MAP is macro-averaging: each query counts equally
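
A minimal sketch of these steps, assuming binary relevance labels in rank order. The function names are illustrative. One assumption to flag: this version averages over the relevant documents that actually appear in the ranking, as in the example above; some formulations divide by the total number of relevant documents for the query instead, which penalizes relevant documents that were never retrieved.

```python
def average_precision(relevance):
    """Average of P@K taken at the rank of each relevant document."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # P@K at this relevant document's rank K
            precisions.append(hits / rank)
    # Assumption: a ranking with no relevant documents scores 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings):
    """Macro-average: every query contributes equally."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical rankings for two queries (1 = relevant).
query_a = [1, 0, 1, 0, 1]   # relevant at ranks 1, 3, 5
query_b = [0, 1]            # relevant at rank 2

print(round(average_precision(query_a), 2))
print(round(mean_average_precision([query_a, query_b]), 2))
```

Because MAP is a plain mean over queries, a query with one relevant document weighs as much as a query with fifty.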
When There’s Only One Relevant Document
- Scenarios:
- known-item search
- navigational queries
- looking for a fact
- Search Length = Rank of the answer
- measures a user’s effort
Mean Reciprocal Rank
- Consider rank position, K, of first relevant doc
- Reciprocal Rank score \(=\frac{1}{K}\)
- MRR is the mean RR across multiple queries
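
The definition above can be sketched as follows. The function names are illustrative, and scoring a query as 0 when no relevant document is retrieved is an assumption of this sketch rather than something the source specifies.

```python
def reciprocal_rank(relevance):
    """1/K, where K is the rank of the first relevant document."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    # Assumption: no relevant document retrieved counts as 0.
    return 0.0


def mean_reciprocal_rank(rankings):
    """Mean of the reciprocal ranks over all queries."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


# Hypothetical rankings for three queries (1 = relevant).
rankings = [
    [0, 0, 1],  # first relevant at rank 3 -> 1/3
    [1, 0, 0],  # first relevant at rank 1 -> 1
    [0, 1, 0],  # first relevant at rank 2 -> 1/2
]
print(round(mean_reciprocal_rank(rankings), 2))
```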
Discounted Cumulative Gain
- Popular measure for evaluating web search and related tasks
- Two assumptions:
- Highly relevant documents are more useful than marginally relevant documents
- The lower the ranked position of a relevant document, the less useful it is for the user, since it is less likely to be examined
- Uses graded relevance as a measure of usefulness, or gain, from examining a document
- Gain is accumulated starting at the top of the ranking and may be reduced, or discounted, at lower ranks
- Typical discount is \(1/\log_{2}(\text{rank})\); with base 2, the discount at rank 4 is 1/2 and at rank 8 is 1/3
- DCG is the total gain accumulated at a particular rank p: \(DCG_{p}=rel_{1}+\sum_{i=2}^{p} \frac{rel_{i}}{\log_{2} i}\)
DCG Examples:
- 10 ranked documents judged on 0-3 relevance scale: 3, 2, 3, 0, 0, 1, 2, 2, 3, 0
- discounted gain: 3, 2/1, 3/1.59, 0, 0, 1/2.59, 2/2.81, 2/3, 3/3.17, 0 = 3, 2, 1.89, 0, 0, 0.39, 0.71, 0.67, 0.95, 0
- DCG: 3, 5, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61
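
The running totals above can be checked with a short sketch of the DCG formula, using the same ten relevance judgments. The function name `dcg_at_p` is illustrative. The gain at rank 1 is left undiscounted, following the formula, since \(\log_{2} 1 = 0\).

```python
import math


def dcg_at_p(gains, p):
    """DCG_p = rel_1 + sum over i = 2..p of rel_i / log2(i)."""
    total = 0.0
    for i, rel in enumerate(gains[:p], start=1):
        # Rank 1 is not discounted; lower ranks are divided by log2(rank).
        total += rel if i == 1 else rel / math.log2(i)
    return total


# Relevance judgments from the example (0-3 scale, in rank order).
gains = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]

print([round(dcg_at_p(gains, p), 2) for p in range(1, len(gains) + 1)])
```

Printing the value at every rank makes the effect of the discount visible: the highly relevant document at rank 9 adds less than 1 to the total, while the same grade at rank 1 adds 3.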
NDCG:
- Normalized Discounted Cumulative Gain (NDCG) at rank n
- Normalize DCG at rank n by the DCG value at rank n of the ideal ranking
- The ideal ranking would first return the documents with the highest relevance level, then the next highest relevance level, and so on
- Compute the precision at each rank where a (new) relevant document is retrieved: p(1), …, p(k), if we have k relevant documents
- An Example:
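
As a hedged illustration of the normalization step, the sketch below reuses the ten relevance judgments from the DCG example. It assumes the ideal ranking is obtained by sorting those same ten judgments in descending order; in practice the ideal ranking is usually built from all judged documents for the query, which may include documents that were not retrieved. The function names are illustrative.

```python
import math


def dcg(gains):
    """DCG with no discount at rank 1 and a 1/log2(rank) discount after."""
    return sum(
        rel if i == 1 else rel / math.log2(i)
        for i, rel in enumerate(gains, start=1)
    )


def ndcg_at_n(gains, n):
    """DCG at rank n divided by the DCG at rank n of the ideal ranking."""
    # Assumption: the ideal ranking reorders the same judged documents.
    ideal = sorted(gains, reverse=True)
    ideal_dcg = dcg(ideal[:n])
    # Assumption: with no relevant documents at all, report 0.
    return dcg(gains[:n]) / ideal_dcg if ideal_dcg > 0 else 0.0


gains = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
print(round(ndcg_at_n(gains, 10), 2))
```

Under this assumption NDCG always falls between 0 and 1, which makes scores comparable across queries with different numbers of relevant documents.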
