In this article, we will learn how to use scikit-learn to visualize different beta-divergence loss functions. We will first understand what beta-divergence loss functions are and then look at their implementation in Python using the _beta_divergence function from the sklearn.decomposition._nmf module of the scikit-learn library.
Beta Divergence Loss Function
Beta-divergence loss functions are a family of loss functions used to measure the difference between two probability distributions. The divergence is measured by comparing corresponding points from the two distributions, and the resulting distance can be used to optimize how well one distribution matches another and to determine the best-fitting model. This is useful for tasks such as unsupervised learning, where probability distributions are used to represent data. Beta-divergence is related to other loss functions such as the Kullback-Leibler divergence and the total variation distance.
Beta-divergence loss functions are commonly used in non-negative matrix factorization (NMF). In NMF, the goal is to factorize a non-negative matrix into two lower-rank non-negative matrices, typically referred to as the basis matrix and the coefficient matrix. The beta-divergence loss function serves as the objective function: the factorization is optimized by minimizing the discrepancy between the target matrix and the reconstructed matrix. These functions are preferred because they are more flexible than a single fixed loss for NMF; by tuning β, the same framework can accommodate data with different noise characteristics and a wide range of values. A short sketch of how this choice is exposed in scikit-learn's NMF estimator is shown below.
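As a hedged illustration (this snippet is our own sketch, not part of the scikit-learn documentation and not the code used later in this article), the choice of beta-divergence enters scikit-learn's NMF estimator through its beta_loss parameter; the toy matrix and hyperparameter values below are arbitrary.
Python3
# A sketch: selecting a beta-divergence loss in scikit-learn's NMF
import numpy as np
from sklearn.decomposition import NMF

# Toy non-negative matrix to factorize (values chosen arbitrarily)
X = np.array([[1.0, 0.5, 0.0],
              [2.0, 1.0, 0.5],
              [0.0, 1.5, 3.0]])

# beta_loss can be 'frobenius' (beta = 2), 'kullback-leibler' (beta = 1),
# 'itakura-saito' (beta = 0) or a float; losses other than 'frobenius'
# require the multiplicative-update solver ('mu')
model = NMF(n_components=2, beta_loss='kullback-leibler', solver='mu',
            init='random', max_iter=500, random_state=0)

W = model.fit_transform(X)   # basis matrix
H = model.components_        # coefficient matrix

print("Reconstruction W @ H:\n", W @ H)
print("Reconstruction error reported by the model:", model.reconstruction_err_)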
Some of the use cases of these loss functions include clustering, anomaly detection, image segmentation, natural language processing, time series analysis, and recommendation systems.
The formula for beta-divergence loss can be expressed as:
$$d_\beta(X, Y) = \sum_{i,j} \frac{x_{ij}^{\beta} + (\beta - 1)\, y_{ij}^{\beta} - \beta\, x_{ij}\, y_{ij}^{\beta - 1}}{\beta(\beta - 1)}, \qquad \beta \neq 0, 1$$
where,
- X and Y are the matrices,
- $x_{ij}$ and $y_{ij}$ are the elements of X and Y,
- $\beta$ is the hyperparameter that controls the degree of divergence (the limits β → 0 and β → 1 give the Itakura-Saito and Kullback-Leibler divergences discussed below).
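To make the formula above concrete, here is a minimal NumPy sketch (the beta_divergence helper below is our own, not a scikit-learn function) that evaluates the element-wise sum for a value of β other than 0 or 1 and checks it against scikit-learn's internal _beta_divergence helper.
Python3
import numpy as np
from sklearn.decomposition._nmf import _beta_divergence

def beta_divergence(X, Y, beta):
    """Sum of element-wise beta divergences for beta not in {0, 1}."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    d = (X ** beta + (beta - 1) * Y ** beta
         - beta * X * Y ** (beta - 1)) / (beta * (beta - 1))
    return d.sum()

X = np.array([[1.0, 2.0], [3.0, 0.5]])
Y = np.array([[1.5, 1.0], [2.5, 1.0]])

# scikit-learn's helper measures the divergence between X and W @ H,
# so passing W = Y and H = identity compares X against Y directly
print(beta_divergence(X, Y, beta=1.5))
print(_beta_divergence(X, Y, np.eye(2), 1.5))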
Beta-divergence is a generalization of other divergence measures such as the Kullback-Leibler (KL) divergence and the squared Euclidean (Frobenius) distance. The specific form of the beta-divergence depends on the choice of a parameter, typically denoted as β: β = 1 corresponds to the KL divergence, β = 2 to the Frobenius norm, and β = 0 to the Itakura-Saito divergence.
Based on different values of β, we can get different beta-divergence loss functions (a short numerical check of these special cases follows the list). Some of the popular beta-divergence loss functions are:
- Itakura-Saito divergence: This is the divergence function with β = 0. It is defined as $d_{IS}(X, Y) = \sum_{i,j} \left( \frac{x_{ij}}{y_{ij}} - \log \frac{x_{ij}}{y_{ij}} - 1 \right)$.
- Kullback-Leibler divergence: This is the divergence function with β = 1. This divergence is used to measure the difference between two probability distributions. It is defined as $d_{KL}(X, Y) = \sum_{i,j} \left( x_{ij} \log \frac{x_{ij}}{y_{ij}} - x_{ij} + y_{ij} \right)$.
- Frobenius norm: This is the divergence function with β = 2. It calculates the squared Euclidean distance between two matrices (a mean-squared-error-style loss). It is defined as $d_{F}(X, Y) = \frac{1}{2} \lVert X - Y \rVert_F^2 = \frac{1}{2} \sum_{i,j} (x_{ij} - y_{ij})^2$.
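As an illustrative check (our own snippet, with arbitrarily chosen positive scalars), the closed forms listed above can be compared with the values that _beta_divergence returns at β = 0, 1 and 2.
Python3
import numpy as np
from sklearn.decomposition._nmf import _beta_divergence

x, y = 2.0, 0.5  # two positive scalars to compare

# Closed-form special cases of the beta divergence
itakura_saito = x / y - np.log(x / y) - 1        # beta = 0
kullback_leibler = x * np.log(x / y) - x + y     # beta = 1
frobenius = 0.5 * (x - y) ** 2                   # beta = 2

for beta, closed_form in [(0.0, itakura_saito),
                          (1.0, kullback_leibler),
                          (2.0, frobenius)]:
    # _beta_divergence(X, W, H, beta) measures D(X, W @ H); with H = 1
    # it reduces to D(x, y) for scalars, matching the closed forms
    print(f"beta = {beta}: closed form = {closed_form:.4f}, "
          f"sklearn = {_beta_divergence(x, y, 1, beta):.4f}")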
The beta-divergence loss function that is best for a particular application depends on the characteristics of the data. In general, the Frobenius norm is a good default for data with roughly symmetric, Gaussian-like noise, the Kullback-Leibler divergence is a good choice for count-like or probability-like data, and the Itakura-Saito divergence is a good choice for data that spans a wide range of magnitudes (such as audio spectra), since it is scale-invariant.
Code Implementation
To implement this, we will first import the required libraries.
Python3
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition._nmf import _beta_divergence
Now, we will plot and compare different Beta-divergence loss functions.
Python3
# Plotting and comparing the beta divergence for different values of beta

# Declaring the x and y variables
# (start slightly above 0 so that the Itakura-Saito and Kullback-Leibler
# divergences, which blow up at x = 0, do not cause division-by-zero warnings)
x = np.linspace(0.001, 5, 100)
y = np.zeros(x.shape)

# beta = 0: Itakura-Saito divergence
# beta = 1: Kullback-Leibler divergence
# beta = 2: Frobenius norm (Euclidean distance)
beta_loss = ['Itakura-Saito', 'Kullback-Leibler', 'Frobenius norm']
betas = [0.0, 1.0, 2.0]

# Plotting the graph
for j, beta in enumerate(betas):
    for i, xi in enumerate(x):
        # Computing the beta divergence D(1, xi)
        y[i] = _beta_divergence(1, xi, 1, beta)
    # Setting the beta loss name with the corresponding value of beta
    name = f'beta = {beta}: {beta_loss[j]}'
    # Plotting the curve for this value of beta
    plt.plot(x, y, label=name)

# Setting the graph parameters
plt.xlabel("x")
plt.ylabel("D(1, x)")
plt.title("Beta-Divergence(1, x)")
plt.legend(loc='upper center')
plt.grid(True)
plt.axis([0, 4, 0, 3])

# Displaying the graph
plt.show()
Output: