Discounted Cumulative Gain
Discounted Cumulative Gain (DCG) is a metric for measuring ranking quality. It is mostly used in information retrieval problems, such as measuring the effectiveness of a search engine by scoring the documents it returns according to their relevance to the search query.
Consider a search engine that returns 5 documents (D1, D2, D3, D4, D5), in that order. We define a relevance scale from 0 to 3, where:
- 0 : not relevant
- 1-2 : somewhat relevant
- 3 : completely relevant
Suppose these documents have relevance scores:
- D1 : 3
- D2 : 2
- D3 : 0
- D4 : 0
- D5 : 1
The Cumulative Gain (CG) is the sum of these relevance scores and can be calculated as:

CG_p = \sum_{i=1}^{p} rel_i

For the above example, CG = 3 + 2 + 0 + 0 + 1 = 6. Note that CG ignores the position of a document in the result list.
The Discounted Cumulative Gain (DCG) weights each relevance score by a logarithmic discount of its position, so that relevant documents appearing lower in the ranking contribute less:

DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i + 1)}
Therefore the discounted cumulative gain of the above example is:

DCG_5 = 3/log2(2) + 2/log2(3) + 0/log2(4) + 0/log2(5) + 1/log2(6) ≈ 3 + 1.262 + 0 + 0 + 0.387 ≈ 4.649
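To make the discounting concrete, here is a minimal sketch that computes DCG directly from the formula above using only the standard library; the dcg helper is an illustrative name, not a library function.

Python3

from math import log2

def dcg(relevances):
    # sum of rel_i / log2(i + 1), with positions i starting at 1
    return sum(rel / log2(i + 1) for i, rel in enumerate(relevances, start=1))

# relevance scores of D1..D5 in the order the search engine returned them
print(dcg([3, 2, 0, 0, 1]))  # ~4.649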
To get the Ideal Discounted Cumulative Gain (IDCG), we sort the documents in descending order of relevance and calculate the DCG of that ideal ordering (3, 2, 1, 0, 0):

IDCG_5 = 3/log2(2) + 2/log2(3) + 1/log2(4) + 0/log2(5) + 0/log2(6) ≈ 3 + 1.262 + 0.5 + 0 + 0 ≈ 4.762
Now we calculate the Normalized DCG (nDCG) using the following formula:

nDCG_p = \frac{DCG_p}{IDCG_p}

For the example above, nDCG_5 ≈ 4.649 / 4.762 ≈ 0.976.
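Continuing the sketch above (it reuses the illustrative dcg helper defined there), the IDCG and nDCG of the example can be computed as:

Python3

ranking = [3, 2, 0, 0, 1]               # relevance scores in returned order
ideal = sorted(ranking, reverse=True)   # [3, 2, 1, 0, 0], the ideal ordering

idcg = dcg(ideal)                # ~4.762
ndcg = dcg(ranking) / idcg       # ~0.976
print("IDCG:", idcg)
print("nDCG:", ndcg)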
Code : Python program for Normalized Discounted Cumulative Gain
Python3
# import required packages
from sklearn.metrics import ndcg_score, dcg_score
import numpy as np

# Relevance scores in ideal order
true_relevance = np.asarray([[3, 2, 1, 0, 0]])

# Relevance scores in output order
relevance_score = np.asarray([[3, 2, 0, 0, 1]])

# DCG score
dcg = dcg_score(true_relevance, relevance_score)
print("DCG score : ", dcg)

# IDCG score
idcg = dcg_score(true_relevance, true_relevance)
print("IDCG score : ", idcg)

# Normalized DCG score
ndcg = dcg / idcg
print("nDCG score : ", ndcg)

# or we can use the scikit-learn ndcg_score function directly
print("nDCG score (from function) : ", ndcg_score(true_relevance, relevance_score))
Output:
DCG score :  4.670624189796882
IDCG score :  4.761859507142915
nDCG score :  0.980840401274087
nDCG score (from function) :  0.980840401274087

Note that dcg_score and ndcg_score average over documents with tied prediction scores by default (here D3 and D4 both have a score of 0), which is why the reported DCG of about 4.67 is slightly higher than the hand-computed 4.649; the IDCG has no ties and matches the manual value.
Limitations of Normalized Discounted Cumulative Gain (NDCG):
- NDCG does not penalize irrelevant documents in the output. For example, the rankings [3] and [3, 0, 0] receive the same NDCG even though the second one contains two irrelevant documents (see the sketch after this list).
- Because no standard is defined for the number of output documents, NDCG also does not penalize missing relevant documents. For example, the two outputs [3, 3, 3] and [3, 3, 3, 3] for the same query are judged equally good: DCG3 is calculated for the first and DCG4 for the second, and each is normalized by its own ideal DCG.
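Both limitations can be seen with a small self-contained sketch (dcg and ndcg below are hand-rolled illustrative helpers, not library functions): appending irrelevant documents, or dropping a relevant one, leaves the score unchanged.

Python3

from math import log2

def dcg(relevances):
    return sum(rel / log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    ideal = sorted(relevances, reverse=True)
    return dcg(relevances) / dcg(ideal)

print(ndcg([3]))           # 1.0
print(ndcg([3, 0, 0]))     # 1.0 -- two extra irrelevant documents, same score
print(ndcg([3, 3, 3]))     # 1.0
print(ndcg([3, 3, 3, 3]))  # 1.0 -- a fourth relevant document adds nothing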
References:
- DCG Wikipedia article
- Jarvelin, K., & Kekalainen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-446.