In this article, we introduce a variation of the neural network known as the Recurrent Neural Network (RNN), which works better than a simple neural network when the data is sequential, such as time-series data and text data.
What is Recurrent Neural Network (RNN)?
A Recurrent Neural Network (RNN) is a type of neural network where the output from the previous step is fed as input to the current step. In traditional neural networks, all inputs and outputs are independent of each other; but when we need to predict the next word of a sentence, the previous words are required, and hence there is a need to remember them. Thus the RNN came into existence, solving this issue with the help of a hidden state. The hidden state is the main and most important feature of an RNN: it remembers information about the sequence, and it is also referred to as the memory state since it retains the previous inputs to the network. An RNN uses the same parameters for each input, as it performs the same task on all inputs and hidden states to produce the output. This reduces parameter complexity, unlike other neural networks.
Architecture Of Recurrent Neural Network
RNNs have the same input and output architecture as any other deep neural architecture. However, differences arise in the way information flows from input to output. Unlike deep neural networks, where each dense layer has its own weight matrix, in an RNN the weights remain the same across the network. The RNN calculates a hidden state h_i for every input x_i using the following formulas:
h_t = σ(U·x_t + W·h_(t-1) + b)

y_t = O(V·h_t + c)

Hence

Y = f(X, h, W, U, V, b, c)
Here h_i is the state of the network at time step i, and the parameters W, U, V, b and c are shared across all time steps.
How RNN works
The Recurrent Neural Network consists of multiple fixed activation function units, one for each time step. Each unit has an internal state, called the hidden state of the unit, which signifies the past knowledge the network holds at a given time step. This hidden state is updated at every time step to reflect the change in the network's knowledge about the past, using the following recurrence relation:
The formula for calculating the current state:

h_t = f(h_(t-1), x_t)

where:

- h_t -> current state
- h_(t-1) -> previous state
- x_t -> input state

The formula for applying the activation function (tanh):

h_t = tanh(W_hh·h_(t-1) + W_xh·x_t)

where:

- W_hh -> weight at the recurrent neuron
- W_xh -> weight at the input neuron

The formula for calculating the output:

y_t = W_hy·h_t

where:

- y_t -> output
- W_hy -> weight at the output layer
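To make these formulas concrete, here is a minimal NumPy sketch of the forward pass (the sizes, random weights and input sequence are hypothetical placeholders, not values from the article):

```python
import numpy as np

# Toy sizes and random weights -- placeholders for illustration only.
rng = np.random.default_rng(0)
n_in, n_h, n_out = 3, 5, 2
W_xh = rng.normal(size=(n_h, n_in)) * 0.1   # weight at the input neuron
W_hh = rng.normal(size=(n_h, n_h)) * 0.1    # weight at the recurrent neuron
W_hy = rng.normal(size=(n_out, n_h)) * 0.1  # weight at the output layer

xs = [rng.normal(size=n_in) for _ in range(4)]  # a 4-step input sequence
h = np.zeros(n_h)                               # initial hidden state

for x_t in xs:
    h = np.tanh(W_hh @ h + W_xh @ x_t)  # h_t = tanh(W_hh*h_(t-1) + W_xh*x_t)
    y_t = W_hy @ h                      # y_t = W_hy*h_t
    print(y_t)
```

Note that the same three weight matrices are reused at every step of the loop; this is exactly the parameter sharing described above.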
These parameters are updated using backpropagation. However, since an RNN works on sequential data, we use an adapted form of backpropagation known as backpropagation through time.
Backpropagation Through Time (BPTT)
An RNN is an ordered network: each variable is computed one at a time, in a specified order (first h_1, then h_2, then h_3, and so on). Hence we apply backpropagation through all of these hidden states sequentially.

The loss function L(θ) depends on h_3; h_3 in turn depends on h_2 and W; h_2 in turn depends on h_1 and W; and h_1 in turn depends on h_0 and W, where h_0 is a constant starting state.
For simplicity, we will apply backpropagation to only one term of the gradient, ∂L(θ)/∂W = (∂L(θ)/∂h_3)·(∂h_3/∂W). We already know how to compute the first factor, ∂L(θ)/∂h_3, as it is the same as in any simple deep neural network. However, we will see how to compute ∂h_3/∂W.
As we know, h_3 = σ(W·h_2 + b).

In such an ordered network, we cannot compute ∂h_3/∂W by simply treating h_2 as a constant, because h_2 also depends on W. The total derivative has two parts:

- Explicit: ∂⁺h_3/∂W, treating all other inputs as constant
- Implicit: summing over all indirect paths from h_3 to W
Let us see how to do this:

∂h_3/∂W = ∂⁺h_3/∂W + (∂h_3/∂h_2)·(∂h_2/∂W)

For simplicity, we will short-circuit some of the paths:

∂h_2/∂W = ∂⁺h_2/∂W + (∂h_2/∂h_1)·(∂⁺h_1/∂W)

Finally, we have:

∂h_3/∂W = Σₖ (∂h_3/∂h_k)·(∂⁺h_k/∂W), summing over k = 1, 2, 3

where ∂⁺h_k/∂W is the explicit derivative of h_k with respect to W, treating h_(k-1) as constant. Hence,

∂L(θ)/∂W = (∂L(θ)/∂h_3) · Σₖ (∂h_3/∂h_k)·(∂⁺h_k/∂W)
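To check the derivation numerically, here is a small NumPy sketch using a scalar toy RNN with made-up values (not the article's network): it computes ∂L/∂W by the summation formula above and compares it with a finite-difference approximation:

```python
import numpy as np

# Scalar toy RNN: h_t = tanh(w*h_(t-1) + u*x_t), loss L = 0.5*(h_3 - target)^2.
# All values below are hypothetical, chosen only to exercise the formula.
w, u, target = 0.5, 0.3, 0.2
x = [1.0, -1.0, 0.5]

def forward(w_):
    h = [0.1]  # constant starting state h_0 (nonzero so every term contributes)
    for t in range(3):
        h.append(np.tanh(w_ * h[-1] + u * x[t]))
    return h   # [h_0, h_1, h_2, h_3]

h = forward(w)
dL_dh3 = h[3] - target

# BPTT sum: dL/dW = dL/dh_3 * sum_k (dh_3/dh_k) * (explicit dh_k/dW)
grad = 0.0
for k in (1, 2, 3):
    path = 1.0
    for t in range(k + 1, 4):
        path *= w * (1 - h[t] ** 2)          # dh_t/dh_(t-1) = w*(1 - h_t^2)
    explicit = (1 - h[k] ** 2) * h[k - 1]    # treats h_(k-1) as constant
    grad += dL_dh3 * path * explicit

# Finite-difference check of the same derivative
eps = 1e-6
loss = lambda w_: 0.5 * (forward(w_)[3] - target) ** 2
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(grad, numeric)  # the two values should agree closely
```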
This algorithm is called backpropagation through time (BPTT), as we backpropagate over all previous time steps.
Training through RNN
- A single time step of the input is provided to the network.
- The current state is then calculated from the current input and the previous state.
- The current h_t becomes h_(t-1) for the next time step.
- One can go through as many time steps as the problem requires and join the information from all the previous states.
- Once all the time steps are completed, the final current state is used to calculate the output.
- The output is then compared to the actual output, i.e. the target output, and the error is generated.
- The error is then back-propagated through the network to update the weights, and hence the network (RNN) is trained using backpropagation through time, as in the sketch below.
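Putting these steps together, a minimal PyTorch training sketch might look as follows (the sine-wave task, layer sizes and hyperparameters are illustrative assumptions, not the article's code); calling .backward() on the loss performs backpropagation through time automatically:

```python
import torch
import torch.nn as nn

# Toy many-to-one task: predict the next value of a sine wave from the
# previous 20 values. Data, sizes and hyperparameters are placeholders.
torch.manual_seed(0)
series = torch.sin(torch.linspace(0, 20, 400))
X = torch.stack([series[i:i + 20] for i in range(380)]).unsqueeze(-1)  # (380, 20, 1)
y = series[20:].unsqueeze(-1)                                          # (380, 1)

class TinyRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
        self.out = nn.Linear(16, 1)

    def forward(self, x):
        _, h_last = self.rnn(x)       # h_last: final hidden state after all steps
        return self.out(h_last[-1])   # output computed from the final state

model = TinyRNN()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(X), y)  # compare prediction with the target output
    loss.backward()              # backpropagation through time
    opt.step()
print(f"final loss: {loss.item():.4f}")
```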
Advantages of Recurrent Neural Network
- An RNN carries information forward through time, which is precisely what makes it useful for time-series prediction: previous inputs influence the current output. Variants such as Long Short-Term Memory (LSTM) extend this memory over longer ranges.
- Recurrent neural networks are even used with convolutional layers to extend the effective pixel neighborhood.
Disadvantages of Recurrent Neural Network
- Gradient vanishing and exploding problems (illustrated in the sketch after this list).
- Training an RNN is a very difficult task.
- It cannot process very long sequences when using tanh or ReLU as the activation function.
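A quick numerical illustration of the vanishing-gradient problem (the values of w and h are hypothetical): the gradient flowing back T steps through a tanh RNN is scaled by a product of T-1 local factors w·(1 - h²), which shrinks geometrically once each factor is below 1:

```python
# Each backward step through tanh multiplies the gradient by w*(1 - h_t^2).
w, h = 0.5, 0.8                  # hypothetical weight and hidden activation
factor = w * (1 - h ** 2)        # one step's contribution, here 0.18
for T in (5, 20, 50):
    print(T, factor ** (T - 1))  # shrinks toward 0 as the sequence grows
```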
Applications of Recurrent Neural Network
- Language Modelling and Generating Text
- Speech Recognition
- Machine Translation
- Image Recognition and Face Detection
- Time-Series Forecasting
Types Of RNN
There are four types of RNNs based on the number of inputs and outputs in the network.
- One to One
- One to Many
- Many to One
- Many to Many
One to One
This type of RNN behaves the same as any simple neural network and is also known as a Vanilla Neural Network. In this network, there is only one input and one output.
One To Many
In this type of RNN, there is one input and many outputs associated with it. One of the most common examples of this network is image captioning, where, given an image, we predict a sentence consisting of multiple words.
Many to One
In this type of network, many inputs are fed to the network at several states, generating only one output. This type of network is used for problems like sentiment analysis, where we give multiple words as input and predict only the sentiment of the sentence as output.
Many to Many
In this type of neural network, there are multiple inputs and multiple outputs corresponding to a problem. One example of this is language translation, where we provide multiple words from one language as input and predict multiple words in the second language as output.
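As an illustrative PyTorch sketch (the sizes are assumptions), the same recurrent core can serve the Many to One and Many to Many patterns simply by reading out different hidden states:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)   # batch of 4 sequences, 10 time steps, 8 features

h_seq, h_last = rnn(x)      # h_seq: (4, 10, 16); h_last: (1, 4, 16)

# Many to One (e.g. sentiment analysis): read only the final hidden state
many_to_one = nn.Linear(16, 2)(h_last[-1])  # (4, 2) -- one output per sequence
# Many to Many (e.g. translation-style outputs): read every hidden state
many_to_many = nn.Linear(16, 5)(h_seq)      # (4, 10, 5) -- one output per step
print(many_to_one.shape, many_to_many.shape)
```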
Variation Of Recurrent Neural Network (RNN)
To overcome problems like vanishing and exploding gradients, several advanced variants of the RNN have been developed, including:
- Bidirectional Neural Network (BiNN)
- Long Short-Term Memory (LSTM)
Bidirectional Neural Network (BiNN)
A BiNN is a variation of the Recurrent Neural Network in which the input information flows in both directions, and the outputs of the two directions are then combined to produce the final output. A BiNN is useful in situations where the context of the input is important, such as NLP tasks and time-series analysis problems.
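A minimal PyTorch sketch of this idea (the sizes are assumptions): with bidirectional=True, the features of the forward and backward passes are concatenated at each time step:

```python
import torch
import torch.nn as nn

birnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(4, 10, 8)  # batch of 4 sequences, 10 time steps, 8 features
out, _ = birnn(x)
print(out.shape)  # torch.Size([4, 10, 32]) -- 16 forward + 16 backward features
```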
Long Short-Term Memory (LSTM)
Long Short-Term Memory works on a read-write-and-forget principle: given the input, the network reads and writes the most useful information from the data and forgets the information that is not important for predicting the output. To do this, three new gates are introduced into the RNN. In this way, only the selected information is passed through the network.
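A NumPy sketch of a single LSTM step (random placeholder weights; biases omitted for brevity) showing the three gates at work:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_in, n_h = 4, 8
rng = np.random.default_rng(0)
# One placeholder weight matrix per gate, plus one for the candidate memory.
Wf, Wi, Wo, Wc = (rng.normal(size=(n_h, n_in + n_h)) * 0.1 for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    f = sigmoid(Wf @ z)                    # forget gate: what to erase
    i = sigmoid(Wi @ z)                    # input gate: what to write
    o = sigmoid(Wo @ z)                    # output gate: what to expose
    c = f * c_prev + i * np.tanh(Wc @ z)   # updated cell (memory) state
    h = o * np.tanh(c)                     # new hidden state
    return h, c

h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h))
print(h.shape, c.shape)  # (8,) (8,)
```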
Difference between RNN and Simple Neural Network
An RNN is considered the better choice over a deep neural network when the data is sequential. The significant differences between the two are listed below:
| Recurrent Neural Network | Deep Neural Network |
| --- | --- |
| Weights are the same across all time steps of the network. | Weights are different for each layer of the network. |
| Used when the data is sequential and the number of inputs is not predefined. | Has no special mechanism for sequential data, and the number of inputs is fixed. |
| Has fewer parameters, since the same weights are shared across all time steps. | Has more parameters, since each layer carries its own weights. |
| Exploding and vanishing gradients are the major drawback. | These problems can also occur, but they are not the major problem with a DNN. |
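A small PyTorch check of the parameter-sharing row above (the layer counts and sizes are arbitrary choices for illustration): an RNN cell reused over any number of time steps holds far fewer weights than a stack of distinct dense layers:

```python
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=10)                  # one shared cell
mlp = nn.Sequential(*[nn.Linear(10, 10) for _ in range(5)])  # 5 separate layers
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(rnn), count(mlp))  # 220 vs 550: shared weights mean fewer parameters
```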