There are numerous machine learning problems that depend on time. For example, in financial fraud detection, we can't just look at the present transaction; we should also consider previous transactions so that we can model the discrepancy between them. Using machine learning to solve such problems is called sequence learning, or sequence modeling. We need to model this sequential data in a different way. In this article, we will talk about the mechanism and types of the model used for modern sequence learning – Recurrent Neural Networks (RNNs).
This article is an excerpt from the book Python Machine Learning by Example, Third Edition by Yuxi (Hayden) Liu – a comprehensive guide to getting up to speed with the latest developments in practical machine learning with Python and upgrading your understanding of machine learning (ML) algorithms and techniques.
Learning RNN architecture
As you can imagine, recurrent neural networks stand out because of their recurrent mechanism. We will start with a detailed explanation of this in the next section. After that, we will talk about different types of RNNs, along with some typical applications.
Recurrent mechanism
In feedforward networks (such as vanilla neural networks and CNNs), data moves one way, from the input layer to the output layer. In recurrent neural networks, the recurrent architecture allows data to circle back, so data is not limited to a feedforward direction. Specifically, in a hidden layer of an RNN, the output from the previous time step becomes part of the input for the current time step. The following diagram illustrates how data flows in an RNN in general:
Figure 1: The general form of an RNN
Such a recurrent architecture makes RNNs work well with sequential data, including time series (such as daily temperatures, daily product sales, and clinical EEG recordings) and general consecutive data with order (such as words in a sentence, DNA sequences, and so on). Take a financial fraud detector as an example; the output features from the previous transaction go into the training for the current transaction. In the end, the prediction for one transaction depends on all of its previous transactions. Let me explain the recurrent mechanism in a mathematical and visual way.
Suppose we have some inputs, x_t. Here, t represents a time step or a sequential order. In a feedforward neural network, we simply assume that inputs at different time steps are independent of each other. We denote the output of a hidden layer at time step t as h_t = f(x_t), where f abstracts the hidden layer's computation.
This is depicted in the following diagram:
Figure 2: General form of a feedforward neural network
In contrast, the feedback loop in a recurrent neural network feeds the information of the previous state into the current state. The output of a hidden layer of an RNN at time step t can be expressed as h_t = f(h_{t-1}, x_t). This is depicted in the following diagram:
Figure 3: Unfolded recurrent layer over time steps
The same function, f, is applied to each element of the sequence, and the output, h_t, depends on the output generated from previous computations, h_{t-1}. This chain-like architecture captures the "memory" that has been calculated so far, which is what makes RNNs so successful at dealing with sequential data.
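To make the recurrence concrete, here is a minimal NumPy sketch of a single recurrent layer unfolded over a few time steps. The tanh activation, the weight shapes, and the toy dimensions are illustrative assumptions rather than details from the text:

```python
import numpy as np

# Toy dimensions chosen for illustration only
input_size, hidden_size = 4, 3

rng = np.random.default_rng(42)
W_xh = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_size)

def recurrent_step(x_t, h_prev):
    """One step of h_t = f(h_{t-1}, x_t), here with a tanh activation."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Unfold the recurrence over a short random input sequence
sequence = rng.normal(size=(5, input_size))  # 5 time steps, each with 4 features
h_t = np.zeros(hidden_size)                  # initial hidden state
for x_t in sequence:
    h_t = recurrent_step(x_t, h_t)           # h_t carries the "memory" forward
print(h_t)
```

The key point is that the same weights are reused at every time step, and the hidden state h_t is the only thing passed forward between steps.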
Moreover, thanks to the recurrent architecture, RNNs also have great flexibility in dealing with different combinations of input sequences and/or output sequences. In the next section, we will talk about different categories of RNNs based on input and output, including the following:
- Many-to-one
- One-to-many
- Many-to-many (synced)
- Many-to-many (unsynced)
We will start by looking at many-to-one RNNs.
Many-to-one RNNs
The most intuitive type of RNN is probably many-to-one. A many-to-one RNN can have input sequences with as many time steps as you want, but it only produces one output after going through the entire sequence. The following diagram depicts the general structure of a many-to-one RNN:
Figure 4: General form of a many-to-one RNN
Here, f represents one or more recurrent hidden layers, where an individual layer takes in its own output from the previous time step. Here is an example of three hidden layers stacking up:
Figure 5: Example of three recurrent layers stacking up
Many-to-one RNNs are widely used for classifying sequential data. Sentiment analysis is a good example: the RNN reads an entire customer review, for instance, and assigns a sentiment score (positive, neutral, or negative). Similarly, we can use this kind of RNN for the topic classification of news articles. Identifying the genre of a song is another application, as the model can read through the entire audio stream. We can also use many-to-one RNNs to determine whether a patient is having a seizure based on an EEG trace.
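As a rough illustration, a many-to-one RNN for binary sentiment classification might look like the following Keras sketch. The vocabulary size, layer widths, and choice of SimpleRNN are assumptions made for the example; the book itself may build the model differently:

```python
import tensorflow as tf

# A minimal many-to-one sketch for binary sentiment classification.
# Vocabulary size, layer sizes, and the SimpleRNN layer are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None,)),                 # a review as a sequence of word IDs
    tf.keras.layers.Embedding(input_dim=10000, output_dim=32),
    tf.keras.layers.SimpleRNN(64),                        # returns only the final hidden state
    tf.keras.layers.Dense(1, activation='sigmoid')        # one sentiment score per review
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```

Because the recurrent layer returns only its final hidden state, the whole input sequence collapses into a single prediction, which is exactly the many-to-one pattern.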
One-to-many RNNs
One-to-many RNNs are the exact opposite of many-to-one RNNs. They take in only one input (not a sequence) and generate a sequence of outputs. A typical one-to-many RNN is presented in the following diagram:
Figure 6: General form of a one-to-many RNN
Again, f represents one or more recurrent hidden layers.
Note that “one” here doesn’t mean that there is only one input feature. It means the input is from one time step, or it is time-independent.
One-to-many RNNs are commonly used as sequence generators. For example, we can generate a piece of music given a starting note and/or a genre. Similarly, we can use a one-to-many RNN to write a movie script like a professional screenwriter, starting from a word we specify. Image captioning is another interesting application: the RNN takes in an image and outputs a description (a sentence) of the image.
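One way to sketch a one-to-many model in Keras is to repeat the single input across the desired number of output steps and let a recurrent layer emit an output at each step. The RepeatVector trick, the output length, and the vocabulary size below are assumptions for illustration; generators are also often built autoregressively, feeding each output back in as the next input:

```python
import tensorflow as tf

# A minimal one-to-many sketch: one time-independent input vector expands
# into a sequence of outputs (e.g., notes or words). All sizes are assumptions.
output_steps, vocab_size = 20, 100

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),                       # a single input, e.g., an encoded starting note
    tf.keras.layers.RepeatVector(output_steps),               # copy it across the output time steps
    tf.keras.layers.SimpleRNN(64, return_sequences=True),     # emit a hidden state at every step
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(vocab_size, activation='softmax'))  # one token distribution per step
])
model.summary()
```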
Many-to-many (synced) RNNs
The third type of RNN, many-to-many (synced), allows each element in the input sequence to have an output. Let us look at how data flows in the following many-to-many (synced) RNN:
Figure 7: General form of a many-to-many (synced) RNN
As you can see, each output is calculated based on its corresponding input and all the previous outputs.
One common use case for this type of RNN is time series forecasting, where we want to perform rolling prediction at every time step based on the current and previously observed data. Here are some examples of time series forecasting where we can leverage synced many-to-many RNNs:
- Product sales each day for a store
- Daily closing price of a stock
- Power consumption of a factory each hour
They are also widely used in solving NLP problems, including part-of-speech (PoS) tagging, named entity recognition, and real-time speech recognition.
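A synced many-to-many model can be sketched in Keras by setting return_sequences=True so the recurrent layer yields an output at every input time step. The window length, feature count, and single-unit output head below are assumptions chosen to match a rolling forecasting setup such as the daily sales example above:

```python
import tensorflow as tf

# A minimal synced many-to-many sketch: one prediction per input time step,
# as in rolling time series forecasting. Shapes are illustrative assumptions.
time_steps, n_features = 30, 1

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(time_steps, n_features)),     # e.g., 30 days of daily sales
    tf.keras.layers.SimpleRNN(32, return_sequences=True),      # keep an output at every step
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))  # a forecast for each step
])
model.compile(optimizer='adam', loss='mse')
model.summary()
```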
Many-to-many (unsynced) RNNs
Sometimes, we only want to generate the output sequence after we’ve processed the entire input sequence. This is the unsynced version of many-to-many RNN.
Refer to the following diagram for the general structure of a many-to-many (unsynced) RNN:
Figure 8: General form of a many-to-many (unsynced) RNN
Note that the length of the output sequence (Ty in the preceding diagram) can be different from that of the input sequence (Tx in the preceding diagram). This provides us with some flexibility.
This type of RNN is a go-to model for machine translation. In French-English translation, for example, the model first reads a complete sentence in French and then produces a translated sentence in English. Multi-step ahead forecasting is another popular example: sometimes, we are asked to predict sales for multiple days in the future when given data from the past month.
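An unsynced many-to-many model is commonly realized as an encoder-decoder pair: one RNN reads the whole input sequence and passes its final state to a second RNN that generates the output sequence. The following Keras sketch assumes LSTM layers, teacher-forced decoder inputs, and made-up vocabulary sizes and sequence lengths (Tx, Ty); real translation systems typically add attention and beam search on top:

```python
import tensorflow as tf

# A minimal unsynced many-to-many (encoder-decoder) sketch in the spirit of
# machine translation. All sizes below are illustrative assumptions.
Tx, Ty, src_vocab, tgt_vocab = 12, 15, 8000, 9000

# Encoder: read the whole source sequence, keep only its final states
encoder_inputs = tf.keras.layers.Input(shape=(Tx,))
enc_emb = tf.keras.layers.Embedding(src_vocab, 64)(encoder_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(128, return_state=True)(enc_emb)

# Decoder: generate the target sequence conditioned on the encoder states
decoder_inputs = tf.keras.layers.Input(shape=(Ty,))
dec_emb = tf.keras.layers.Embedding(tgt_vocab, 64)(decoder_inputs)
dec_out = tf.keras.layers.LSTM(128, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
outputs = tf.keras.layers.Dense(tgt_vocab, activation='softmax')(dec_out)

model = tf.keras.Model([encoder_inputs, decoder_inputs], outputs)
model.summary()
```

Note that the encoder and decoder lengths (Tx and Ty) are independent, which is exactly the flexibility mentioned above.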
Wait, what about one-to-one RNNs? There is no such thing. One-to-one is just a regular feedforward model.
Summary of Recurrent Neural Networks
In this article, you have learned about four types of RNNs based on the model's input and output. In the book Python Machine Learning by Example, Third Edition by Yuxi (Hayden) Liu, you will learn how to apply some of these types of RNNs to projects such as sentiment analysis and word generation, along with many other key concepts in Python machine learning.
About the Author
Yuxi (Hayden) Liu is a machine learning software engineer at Google. Previously he worked as a machine learning scientist in a variety of data-driven domains and applied his machine learning expertise in computational advertising, marketing, and cybersecurity. Hayden is the author of a series of machine learning books and an education enthusiast.