In this article, we will understand PyTorch activation functions.
What is an activation function and why use one?
Activation functions are core building blocks of neural networks built with PyTorch. Artificial neural networks are loosely modeled on the neurons of the human brain: an input layer receives the data in some numeric format, one or more hidden layers perform intermediate calculations and identify features, and an output layer produces the result. The whole structure is a network of artificial neurons connected to one another, and activation functions determine how strongly each neuron fires.
An activation function takes the value computed by a neuron and transforms it into an output that may act as input for the neurons in the next layer. A useful activation function is non-linear, so that stacked layers can model non-linear relationships instead of collapsing into a single linear map, and differentiable (at least almost everywhere), so that gradients can be computed to reduce the error and adjust the weights accordingly. The activation functions covered here are available as modules in torch.nn.
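The need for non-linearity can be checked directly. The following is a minimal sketch, not part of the original article: the layer sizes, the seed, and the names lin1, lin2, merged_weight and merged_bias are illustrative assumptions. It shows that two linear layers with no activation between them behave like one linear layer, and that inserting ReLU breaks that equivalence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Two linear layers stacked with no activation in between (sizes are arbitrary)
lin1 = nn.Linear(4, 8)
lin2 = nn.Linear(8, 2)

# The same stack written as one linear map: W = W2 @ W1, b = W2 @ b1 + b2
merged_weight = lin2.weight @ lin1.weight
merged_bias = lin2.weight @ lin1.bias + lin2.bias

x = torch.randn(5, 4)
stacked_out = lin2(lin1(x))
merged_out = x @ merged_weight.T + merged_bias

# Without an activation, the two-layer stack equals a single linear layer
collapses = torch.allclose(stacked_out, merged_out, atol=1e-5)
print(collapses)

# Inserting ReLU between the layers breaks that equivalence
relu_out = lin2(torch.relu(lin1(x)))
differs = not torch.allclose(relu_out, merged_out, atol=1e-5)
print(differs)
```

However many purely linear layers are stacked, the result can still be rewritten as a single weight matrix and bias, which is why a non-linear activation is placed between layers.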
Types of PyTorch Activation Functions
Let us look at the different PyTorch activation functions:
- ReLU Activation Function
- Leaky ReLU Activation Function
- Sigmoid Activation Function
- Tanh Activation Function
- Softmax Activation Function
ReLU Activation Function:
ReLU stands for Rectified Linear Unit. It is a non-linear function defined as f(x) = max(0, x): positive inputs pass through unchanged and negative inputs are mapped to zero.
ReLU is a popular activation function because it is non-linear, cheap to compute, and differentiable everywhere except at zero. For negative inputs its derivative is zero, so a neuron whose input stays negative receives no gradient and stops learning; this is known as the ‘dying’ of neurons. Let us illustrate the use of ReLU with the help of a Python program.
Python3
import torch
import torch.nn as nn

# Define the ReLU module
relu = nn.ReLU()

# Create a float tensor from a list
x = torch.tensor([1.0, -2.0, 3.0, -5.0])

# Pass the tensor through ReLU
output = relu(x)
print(output)
Output:
tensor([1., 0., 3., 0.])
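The zero derivative for negative inputs can be observed with autograd. This is a small sketch rather than part of the original program; the variable name relu_grad is an assumption, and torch.relu is the functional counterpart of nn.ReLU.

```python
import torch

# Same values as above, but tracked by autograd
x = torch.tensor([1.0, -2.0, 3.0, -5.0], requires_grad=True)

# Sum the outputs so that backward() yields d(ReLU(x_i))/d(x_i) for each element
torch.relu(x).sum().backward()

relu_grad = x.grad
print(relu_grad)  # tensor([1., 0., 1., 0.])
```

The gradient is 1 for the positive inputs and 0 for the negative ones, so no learning signal flows back through the negative entries.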
Leaky ReLU Activation Function:
Leaky ReLU (LReLU) is similar to ReLU but addresses the problem of ‘dying’ neurons. Instead of mapping negative inputs to zero, it multiplies them by a small positive slope a, so f(x) = x for x >= 0 and f(x) = a * x for x < 0.
This function is useful because the derivative for negative inputs is a rather than zero. Hence neurons with negative inputs keep receiving a gradient and learning doesn’t stop. Let us illustrate the use of LReLU with the help of a Python program.
Python3
import torch
import torch.nn as nn

# Define Leaky ReLU; the argument 0.2 sets the negative slope (a = 0.2)
leaky_relu = nn.LeakyReLU(0.2)

# Create a float tensor from a list
x = torch.tensor([1.0, -2.0, 3.0, -5.0])

# Pass the tensor through Leaky ReLU
output = leaky_relu(x)
print(output)
Output:
tensor([ 1.0000, -0.4000, 3.0000, -1.0000])
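As a hedged illustration of why learning does not stop, the sketch below reads the gradient of Leaky ReLU from autograd. The name leaky_grad is an assumption; the slope 0.2 matches the program above (PyTorch's default negative_slope is 0.01).

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, -2.0, 3.0, -5.0], requires_grad=True)

# Functional form of nn.LeakyReLU with the same negative slope as above
F.leaky_relu(x, negative_slope=0.2).sum().backward()

# Expect a slope of 1 for positive inputs and 0.2 for negative inputs
leaky_grad = x.grad
print(leaky_grad)
```

Unlike ReLU, the entries for the negative inputs are non-zero, so the corresponding weights continue to be updated.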
Sigmoid Activation Function:
The sigmoid function is a non-linear and differentiable activation function defined as f(x) = 1 / (1 + e^(-x)). It is an S-shaped curve that does not pass through the origin; it crosses the y-axis at 0.5. It produces an output that lies strictly between 0 and 1, so the output is often treated as a probability and used for binary classification. It is more expensive to compute than ReLU because it involves an exponential.
The sigmoid activation function suffers from the “vanishing gradient” problem. Its derivative is at most 0.25 and approaches zero when the input is large in magnitude. During backpropagation these small derivatives are multiplied across layers, so as the number of hidden layers increases, the gradient reaching the early layers becomes close to zero and those layers learn very slowly.
Let us illustrate the use of the sigmoid function with the help of a Python program.
Python3
import torch
import torch.nn as nn

# Define the sigmoid module
sig = nn.Sigmoid()

# Create a float tensor from a list
x = torch.tensor([1.0, -2.0, 3.0, -5.0])

# Apply sigmoid to the tensor
output = sig(x)
print(output)
Output:
tensor([0.7311, 0.1192, 0.9526, 0.0067])
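The vanishing gradient behavior can be sketched numerically. The example below is illustrative only: the input values and the ten-layer depth are assumptions chosen to make the effect visible, and sigmoid_grad, analytic_grad and best_case_product are hypothetical names. It compares the autograd derivative with the closed form s * (1 - s) and shows how quickly a product of such derivatives shrinks.

```python
import torch

x = torch.tensor([0.0, 2.0, 5.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
sigmoid_grad = x.grad

# Closed form of the derivative: s * (1 - s), where s = sigmoid(x)
s = torch.sigmoid(x.detach())
analytic_grad = s * (1 - s)

# The derivative peaks at 0.25 (at x = 0) and shrinks as |x| grows
print(sigmoid_grad)

# Best case for a chain of ten sigmoid derivatives: 0.25 ** 10
best_case_product = 0.25 ** 10
print(best_case_product)
```

Even in the best case, where every layer sits at the steepest point of the curve, the factor contributed by ten sigmoid layers is below one in a million.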
Tanh Activation Function:
The tanh (hyperbolic tangent) function is a non-linear and differentiable function similar to the sigmoid function, but its output values range from -1 to +1. It is an S-shaped curve that passes through the origin, so its outputs are zero-centered.
Like sigmoid, tanh involves exponentials, so it is more expensive to compute than ReLU, and it saturates for inputs of large magnitude, so the vanishing gradient problem persists. Let us illustrate the use of the tanh function with the help of a Python program.
Python3
import torch
import torch.nn as nn

# Define the tanh module
tanh = nn.Tanh()

# Create a float tensor from a list
x = torch.tensor([1.0, -2.0, 3.0, -5.0])

# Apply tanh to the tensor
output = tanh(x)
print(output)
Output:
tensor([ 0.7616, -0.9640,  0.9951, -0.9999])
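The similarity between tanh and sigmoid can be made precise: tanh(x) = 2 * sigmoid(2x) - 1, which is a rescaled and shifted sigmoid. The sketch below checks this identity numerically; the names tanh_out, rescaled_sigmoid and matches are assumptions made for illustration.

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0, -5.0])

tanh_out = torch.tanh(x)

# Sigmoid stretched to the range (-1, 1) and evaluated at 2x
rescaled_sigmoid = 2 * torch.sigmoid(2 * x) - 1

# The two agree up to floating-point error
matches = torch.allclose(tanh_out, rescaled_sigmoid, atol=1e-6)
print(matches)
```

Because tanh is a rescaled sigmoid, it saturates for inputs of large magnitude in the same way, which is why it shares the vanishing gradient problem.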
Softmax Activation Function:
The softmax function differs from the other activation functions in that it is usually placed at the last layer to normalize the output. It operates on a whole vector rather than on each element independently: each output is e^(x_i) divided by the sum of e^(x_j) over all elements. Hidden layers can use other activation functions, with softmax applied at the end to produce the output in probabilistic form. It is used in multiclass classification and generates probabilities that each lie between 0 and 1 and sum to 1.
Let us illustrate with the help of a Python program:
Python3
import torch
import torch.nn as nn

# Define softmax over dim=0, the first (and here the only)
# dimension of the tensor; dimensions are numbered from 0
sm = nn.Softmax(dim=0)

# Create a float tensor from a list
x = torch.tensor([1.0, -2.0, 3.0, -5.0])

# Apply softmax to the tensor
output = sm(x)
print(output)
Output (rounded to four decimal places; PyTorch may display these values in scientific notation):
tensor([0.1185, 0.0059, 0.8753, 0.0003])
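The softmax computation can also be written out by hand. The following sketch assumes a small batch of two rows (the second row is made up for illustration) and compares torch.softmax along dim=1 with the exponentiate-and-normalize formula; the names logits, probs, manual and row_sums are assumptions.

```python
import torch

# A batch of two score vectors; softmax is applied to each row (dim=1)
logits = torch.tensor([[1.0, -2.0, 3.0, -5.0],
                       [0.5, 0.5, 0.5, 0.5]])

probs = torch.softmax(logits, dim=1)

# Manual version: exponentiate, then divide by the sum of each row
exp_logits = torch.exp(logits)
manual = exp_logits / exp_logits.sum(dim=1, keepdim=True)

# Every row of probabilities sums to 1
row_sums = probs.sum(dim=1)
print(probs)
print(row_sums)
```

For batched inputs the dim argument matters: with dim=1 each row is normalized separately, whereas dim=0 would normalize down each column across the batch. The second row, whose scores are all equal, comes out as a uniform distribution.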