A typical use of a Neural Network is supervised learning: the training data contains an output label, and the network learns the mapping from the given input to that label. But what if the output label is replaced by the input vector itself? The network then tries to find the mapping from the input to itself, which would be the identity function, a trivial mapping. If the network is not allowed to simply copy the input, however, it is forced to capture only the salient features of the data. This constraint opens up a different field of applications for Neural Networks; the primary applications are dimensionality reduction and data-specific compression. The network is first trained on the given input, then tries to reconstruct that input from the features it picked up, producing an approximation of the input as its output. The training step involves computing the reconstruction error and backpropagating it. The typical architecture of an Auto-encoder resembles a bottleneck. The schematic structure of an autoencoder is as follows:
The encoder part of the network is used for encoding and sometimes even for data compression, although it is not very effective compared to other general compression techniques such as JPEG. Encoding is achieved by the encoder part of the network, which has a decreasing number of hidden units in each layer; this part is therefore forced to pick up only the most significant and representative features of the data. The second half of the network performs the decoding function: it has an increasing number of hidden units in each layer and tries to reconstruct the original input from the encoded data. Because no output labels are required, Auto-encoders are an unsupervised learning technique. A minimal sketch of this contracting-then-expanding structure is given below.
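The following is a minimal sketch of such a bottleneck architecture in Keras. The 784-dimensional input (for example, flattened 28x28 images) and the layer sizes are illustrative assumptions, not values from the text.
Python3
# Minimal sketch of the bottleneck structure described above.
# The input dimension (784) and layer sizes are illustrative assumptions.
from tensorflow import keras

encoder = keras.models.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=[784]),  # contracting
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),                      # bottleneck
])
decoder = keras.models.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=[32]),    # expanding
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(784, activation="sigmoid"),                  # reconstruction
])
autoencoder = keras.models.Sequential([encoder, decoder])
autoencoder.compile(loss="mse", optimizer="adam")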
Example: In the code below, the autoencoder's training data is fitted to itself. That is why, instead of fitting X_train to Y_train, we use X_train in both places.
Python3
autoencoder.fit(X_train, X_train, epochs=200)
Training of an Auto-encoder for data compression: For a data compression procedure, the most important aspect is the reliability with which the compressed data can be reconstructed. This requirement dictates the bottleneck structure of the Auto-encoder. Step 1: Encoding the input data. The Auto-encoder first tries to encode the data using the initialized weights and biases.
Step 2: Decoding the input data. The Auto-encoder tries to reconstruct the original input from the encoded data to test the reliability of the encoding.
Step 3: Backpropagating the error. After the reconstruction, the loss function is computed to determine the reliability of the encoding, and the resulting error is backpropagated.
The above-described training process is reiterated several times until an acceptable level of reconstruction is reached.
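The three steps above can also be written out as an explicit training loop. The following is a minimal sketch using TensorFlow's GradientTape; the encoder and decoder models, the learning rate, the number of epochs and the training data X_train are illustrative assumptions.
Python3
import tensorflow as tf
from tensorflow import keras

# encoder, decoder and X_train are assumed to be defined as above.
optimizer = keras.optimizers.SGD(learning_rate=0.1)
loss_fn = keras.losses.MeanSquaredError()

for epoch in range(200):
    with tf.GradientTape() as tape:
        codings = encoder(X_train)                 # Step 1: encode the input
        reconstructions = decoder(codings)         # Step 2: decode / reconstruct
        loss = loss_fn(X_train, reconstructions)   # compute the reconstruction error
    # Step 3: backpropagate the error and update the weights
    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))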
After the training process, only the encoder part of the Auto-encoder is retained, and it is used to encode data of the same type as that used during training. The different ways to constrain the network are:
- Keep small Hidden Layers: If the size of each hidden layer is kept as small as possible, the network is forced to pick up only the representative features of the data, thus encoding it.
- Regularization: In this method, a loss term is added to the cost function which encourages the network to train in ways other than copying the input (a minimal sketch follows this list).
- Denoising: Another way of constraining the network is to add noise to the input and teach the network how to remove the noise from the data.
- Tuning the Activation Functions: This method involves changing the activation functions of various nodes so that a majority of the nodes are dormant thus effectively reducing the size of the hidden layers.
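As a concrete illustration of the regularization approach mentioned in the list above, Keras layers accept regularizer arguments that add a penalty term to the training loss. The sketch below uses a simple L2 weight penalty; the input dimension, bottleneck size and penalty weight are illustrative assumptions, and sparsity penalties on activations (shown later for sparse auto-encoders) follow the same pattern.
Python3
from tensorflow import keras

# An L2 penalty on the encoder weights is added to the reconstruction loss
# as an extra term, constraining how the network can fit the data.
reg_encoder = keras.models.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=[784],
                       kernel_regularizer=keras.regularizers.l2(1e-4)),
])
reg_decoder = keras.models.Sequential([
    keras.layers.Dense(784, activation="sigmoid", input_shape=[32]),
])
reg_autoencoder = keras.models.Sequential([reg_encoder, reg_decoder])
reg_autoencoder.compile(loss="mse", optimizer="adam")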
The different variations of Auto-encoders, along with the advantages and disadvantages of using them, are:
- Denoising Auto-encoder: This type of auto-encoder works on a partially corrupted input and trains to recover the original, undistorted input. As mentioned above, this is an effective way to constrain the network from simply copying the input, forcing it to learn the underlying structure and important features of the data (a minimal training sketch follows the advantages and disadvantages below).
Advantages:
- This type of autoencoder can extract important features and reduce the noise or the useless features.
- Denoising autoencoders can be used as a form of data augmentation: the restored images can serve as augmented data, generating additional training samples.
Disadvantages:
- Selecting the right type and level of noise to introduce can be challenging and may require domain knowledge.
- The denoising process can result in the loss of some information that is needed from the original input. This loss can impact the accuracy of the output.
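A minimal sketch of the denoising setup described above: Gaussian noise is added to the input, and the network is trained to map the noisy version back to the clean one. The noise level, layer sizes and the assumption that X_train is a NumPy array of flattened inputs are illustrative.
Python3
import numpy as np
from tensorflow import keras

# Corrupt the input with Gaussian noise (noise level chosen for illustration).
X_train_noisy = X_train + 0.1 * np.random.normal(size=X_train.shape)

denoising_autoencoder = keras.models.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=[784]),   # encoder
    keras.layers.Dense(784, activation="sigmoid"),                  # decoder
])
denoising_autoencoder.compile(loss="mse", optimizer="adam")

# The noisy input is mapped to the clean input, not to itself.
denoising_autoencoder.fit(X_train_noisy, X_train, epochs=20)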
- Sparse Auto-encoder: This type of auto-encoder typically contains more hidden units than the input, but only a few are allowed to be active at once. This property is called the sparsity of the network. The sparsity can be controlled by manually zeroing the required hidden units, by tuning the activation functions, or by adding a loss term to the cost function (see the sketch after the advantages and disadvantages below).
Advantages:
- The sparsity constraint in sparse autoencoders helps in filtering out noise and irrelevant features during the encoding process.
- These auto-encoders often learn important and meaningful features due to their emphasis on sparse activations.
Disadvantages:
- The choice of hyperparameters plays a significant role in the performance of this autoencoder. Different inputs should result in the activation of different nodes of the network.
- The application of sparsity constraint increases computational complexity.
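A minimal sketch of the sparsity constraint described above, implemented as an L1 penalty on the hidden activations; the hidden layer is deliberately wider than the input, and the layer sizes and penalty weight are illustrative assumptions.
Python3
from tensorflow import keras

sparse_encoder = keras.models.Sequential([
    # More hidden units than inputs, but the L1 activity penalty keeps
    # only a few of them active for any given example.
    keras.layers.Dense(1024, activation="relu", input_shape=[784],
                       activity_regularizer=keras.regularizers.l1(1e-4)),
])
sparse_decoder = keras.models.Sequential([
    keras.layers.Dense(784, activation="sigmoid", input_shape=[1024]),
])
sparse_autoencoder = keras.models.Sequential([sparse_encoder, sparse_decoder])
sparse_autoencoder.compile(loss="mse", optimizer="adam")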
- Variational Auto-encoder: This type of auto-encoder makes strong assumptions about the distribution of the latent variables and uses the Stochastic Gradient Variational Bayes estimator in the training process. It assumes that the data is generated by a directed graphical model p_θ(x|z) and that the encoder learns an approximation q_φ(z|x) to the posterior distribution p_θ(z|x), where φ and θ are the parameters of the encoder and the decoder respectively (a minimal sketch of the sampling step follows the advantages and disadvantages below).
Advantages:
- Variational Auto-encoders can be used to generate new data points that resemble the original training data. These samples are drawn from the learned latent space.
- Variational Auto-encoder is a probabilistic framework used to learn a compressed representation of the data that captures its underlying structure and variations, so it is useful for detecting anomalies and for data exploration.
Disadvantages:
- Variational Auto-encoders use approximations to estimate the true distribution of the latent variables. This approximation introduces some error, which can affect the quality of the generated samples.
- The generated samples may only cover a limited subset of the true data distribution. This can result in a lack of diversity in generated samples.
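The central step in a Variational Auto-encoder's encoder is sampling the latent code from the learned distribution q_φ(z|x) via the reparameterization trick. The sketch below shows only this sampling step, not a complete VAE (which would also add a KL-divergence term to the loss); the layer sizes and latent dimension are illustrative assumptions.
Python3
import tensorflow as tf
from tensorflow import keras

class Sampling(keras.layers.Layer):
    """Draws z ~ N(mean, exp(log_var)) using the reparameterization trick."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

inputs = keras.Input(shape=(784,))
h = keras.layers.Dense(128, activation="relu")(inputs)
z_mean = keras.layers.Dense(2)(h)       # mean of q_phi(z|x)
z_log_var = keras.layers.Dense(2)(h)    # log-variance of q_phi(z|x)
z = Sampling()([z_mean, z_log_var])     # sampled latent code
vae_encoder = keras.Model(inputs, [z_mean, z_log_var, z])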
- Convolutional Auto-encoder: Convolutional auto-encoders are a type of autoencoder that use convolutional neural networks (CNNs) as their building blocks. The encoder consists of multiple layers that take an image or a grid as input and pass it through convolution layers, forming a compressed representation of the input. The decoder is the mirror image of the encoder: it deconvolves the compressed representation and tries to reconstruct the original image (a minimal sketch follows the advantages and disadvantages below).
Advantages:
- Convolutional auto-encoders can compress high-dimensional image data into a lower-dimensional representation. This improves the storage efficiency and transmission of image data.
- Convolutional auto-encoders can reconstruct missing parts of an image. They can also handle images with slight variations in object position or orientation.
Disadvantages:
- These auto-encoders are prone to overfitting. Proper regularization techniques should be used to tackle this issue.
- Compression of the data can cause information loss, which can result in the reconstruction of a lower-quality image.
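A minimal sketch of the convolutional structure described above, with Conv2D layers in the encoder and Conv2DTranspose layers in the decoder; the 28x28 single-channel input shape and filter counts are illustrative assumptions.
Python3
from tensorflow import keras

conv_encoder = keras.models.Sequential([
    keras.layers.Conv2D(16, 3, strides=2, padding="same", activation="relu",
                        input_shape=[28, 28, 1]),   # 28x28 -> 14x14
    keras.layers.Conv2D(32, 3, strides=2, padding="same",
                        activation="relu"),         # 14x14 -> 7x7
])
conv_decoder = keras.models.Sequential([
    keras.layers.Conv2DTranspose(16, 3, strides=2, padding="same",
                                 activation="relu",
                                 input_shape=[7, 7, 32]),   # 7x7 -> 14x14
    keras.layers.Conv2DTranspose(1, 3, strides=2, padding="same",
                                 activation="sigmoid"),     # 14x14 -> 28x28
])
conv_autoencoder = keras.models.Sequential([conv_encoder, conv_decoder])
conv_autoencoder.compile(loss="mse", optimizer="adam")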
Below is a basic, intuitive example of how to build the autoencoder model and fit X_train to itself.
Python3
# Build the simple encoder-decoder model.
# Notice the number of neurons in each Dense layer:
# the model contracts in the encoder and then expands in the decoder.
from tensorflow import keras

encoder = keras.models.Sequential([keras.layers.Dense(2, input_shape=[3])])
decoder = keras.models.Sequential([keras.layers.Dense(3, input_shape=[2])])
autoencoder = keras.models.Sequential([encoder, decoder])

# Compile the model
autoencoder.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=0.1))

# Train the model
history = autoencoder.fit(X_train, X_train, epochs=200)

# Encode the data
codings = encoder.predict(X_train)

# Decode the encoder output
decodings = decoder.predict(codings)