Autoencoders are a type of neural network architecture used for unsupervised learning tasks such as data compression, dimensionality reduction, and data denoising. The architecture consists of two main components: an encoder and a decoder. The encoder portion of the network compresses the input data into a lower-dimensional representation, while the decoder portion of the network reconstructs the original input data from this lower-dimensional representation.
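To make the encoder/decoder split concrete, here is a minimal sketch of a fully connected autoencoder in PyTorch. The class name `DenseAutoencoder`, the input size of 784, the hidden size of 128, and the latent size of 32 are illustrative assumptions, not values taken from this article.

```python
import torch
import torch.nn as nn


class DenseAutoencoder(nn.Module):
    """Illustrative fully connected autoencoder (all sizes are assumptions)."""

    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compress the input down to a small latent vector
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: expand the latent vector back to the input size
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid(),  # keeps outputs in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = DenseAutoencoder()
x = torch.rand(16, 784)   # a batch of 16 random stand-in inputs
code = model.encoder(x)   # lower-dimensional representation
x_hat = model(x)          # reconstruction
print(code.shape)         # torch.Size([16, 32])
print(x_hat.shape)        # torch.Size([16, 784])
```

The bottleneck (`latent_dim` smaller than `input_dim`) is what forces the network to learn a compressed representation rather than simply copying its input.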
A Convolutional Autoencoder (CAE) is a type of autoencoder, a deep learning architecture commonly used for unsupervised learning tasks such as image compression and denoising. It extends the traditional autoencoder by incorporating convolutional layers into both the encoder and the decoder.
Like the traditional autoencoder, the Convolutional Autoencoder consists of two main components: an encoder and a decoder. The encoder processes the input image using convolutional layers and pooling operations to produce a lower-dimensional feature representation of the image. The decoder takes this representation and upsamples it back to the original input image size using transposed convolutional layers (often called deconvolutional layers). The final output of the network is a reconstructed image that should be as close as possible to the original input image.
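The downsampling and upsampling can be traced by pushing a dummy image through an encoder and decoder and inspecting the tensor shapes. The sketch below uses the same layer settings as the implementation later in this article and assumes a 64×64 RGB input; the random tensor `x` is only a stand-in for a real image. For a transposed convolution with dilation 1, the output height is (H_in − 1) × stride − 2 × padding + kernel_size + output_padding, so a 16-pixel side becomes (16 − 1) × 2 − 2 + 3 + 1 = 32.

```python
import torch
import torch.nn as nn

# Encoder: each MaxPool2d halves the spatial size
encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(16, 8, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# Decoder: each stride-2 transposed convolution doubles the spatial size
decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 16, kernel_size=3, stride=2, padding=1, output_padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(16, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
    nn.Sigmoid(),
)

x = torch.rand(1, 3, 64, 64)  # dummy 64x64 RGB image (assumption)
with torch.no_grad():
    z = encoder(x)
    x_hat = decoder(z)

print(x.shape)      # torch.Size([1, 3, 64, 64])
print(z.shape)      # torch.Size([1, 8, 16, 16])
print(x_hat.shape)  # torch.Size([1, 3, 64, 64])
```

With these settings the latent tensor holds 8 × 16 × 16 = 2,048 values against 3 × 64 × 64 = 12,288 in the input, a sixfold reduction.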
The training process for a Convolutional Autoencoder is similar to that of a traditional autoencoder. The network is trained to minimize the difference between the original input image and the reconstructed output image using a loss function such as mean squared error (MSE) or binary cross-entropy (BCE). Once trained, the encoder portion of the network can be used for feature extraction, and the decoder portion of the network can be used for image generation or reconstruction.
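As a rough illustration of the two loss functions, the sketch below evaluates both on random tensors that stand in for an input batch and its reconstruction (the tensors and their sizes are assumptions for demonstration only). Note that `nn.BCELoss` expects both the reconstruction and the target to lie in [0, 1], which is one reason a sigmoid output layer and pixel values scaled to [0, 1] are commonly paired with it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for a batch of input images and their reconstructions, both in [0, 1]
target = torch.rand(4, 3, 64, 64)
recon = torch.rand(4, 3, 64, 64)

mse = nn.MSELoss()(recon, target)  # mean of squared pixel differences
bce = nn.BCELoss()(recon, target)  # per-pixel binary cross-entropy, averaged

# MSE written out by hand, to show what the loss module computes
manual_mse = ((recon - target) ** 2).mean()

print(f"MSE: {mse.item():.4f}, BCE: {bce.item():.4f}")
print(torch.isclose(mse, manual_mse))  # tensor(True)
```

Both losses reduce to a single scalar by default, so either one can be passed directly to `backward()` in a training loop.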
Convolutional Autoencoders are used in a variety of computer vision tasks, including image compression, denoising, and feature extraction, and in applications such as image retrieval, object recognition, and anomaly detection.
Implementation in PyTorch:
Algorithm
- Load the dataset using torchvision’s Flowers102 dataset class and define a dataloader.
- Define the Convolutional Autoencoder architecture by creating an Autoencoder class that contains an encoder built from convolutional and pooling layers and a decoder built from transposed convolutional layers.
- Initialize the autoencoder model and move it to the GPU if available using the to() method.
- Define the loss function and optimizer to use during training. Typically, mean squared error (MSE) loss is used, and the Adam optimizer is a popular choice for deep learning tasks.
- Set the number of epochs to train for and begin the training loop.
- In each epoch, iterate through the batches of the dataloader, move the data to the GPU, and perform forward propagation to obtain the autoencoder’s output.
- Calculate the loss between the output and the input using the loss function.
- Perform backward propagation to calculate the gradients of the model parameters with respect to the loss.
- Use the optimizer to update the model parameters based on the calculated gradients.
- Print the loss every five epochs to monitor the training progress.
- Save the trained model to a file using the state_dict() method.
Code:
Python
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision.datasets as datasets
    import torchvision.transforms as transforms


    # Define the autoencoder architecture
    class Autoencoder(nn.Module):
        def __init__(self):
            super(Autoencoder, self).__init__()
            # Encoder: 3x64x64 -> 16x32x32 -> 8x16x16
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
                nn.Conv2d(16, 8, kernel_size=3, stride=1, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
            # Decoder: 8x16x16 -> 16x32x32 -> 3x64x64
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(8, 16, kernel_size=3, stride=2,
                                   padding=1, output_padding=1),
                nn.ReLU(),
                nn.ConvTranspose2d(16, 3, kernel_size=3, stride=2,
                                   padding=1, output_padding=1),
                nn.Sigmoid()  # outputs in [0, 1], matching ToTensor() inputs
            )

        def forward(self, x):
            x = self.encoder(x)
            x = self.decoder(x)
            return x


    # Initialize the autoencoder
    model = Autoencoder()

    # Define transform: resize to 64x64 and convert to a tensor in [0, 1]
    transform = transforms.Compose([
        transforms.Resize((64, 64)),
        transforms.ToTensor(),
    ])

    # Load dataset
    train_dataset = datasets.Flowers102(root='flowers', split='train',
                                        transform=transform, download=True)
    test_dataset = datasets.Flowers102(root='flowers', split='test',
                                       transform=transform)

    # Define the dataloader
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                               batch_size=128, shuffle=True)
    test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                              batch_size=128)

    # Move the model to GPU if one is available
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(device)
    model.to(device)

    # Define the loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # Train the autoencoder
    num_epochs = 50
    for epoch in range(num_epochs):
        for data in train_loader:
            img, _ = data  # labels are not needed for reconstruction
            img = img.to(device)
            optimizer.zero_grad()
            output = model(img)
            loss = criterion(output, img)  # the input is its own target
            loss.backward()
            optimizer.step()
        # Report the loss of the last batch every five epochs
        if epoch % 5 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(
                epoch + 1, num_epochs, loss.item()))

    # Save the model
    torch.save(model.state_dict(), 'conv_autoencoder.pth')
Output:
    cuda
    Epoch [1/50], Loss: 0.0919
    Epoch [6/50], Loss: 0.0746
    Epoch [11/50], Loss: 0.0362
    Epoch [16/50], Loss: 0.0239
    Epoch [21/50], Loss: 0.0178
    Epoch [26/50], Loss: 0.0154
    Epoch [31/50], Loss: 0.0144
    Epoch [36/50], Loss: 0.0124
    Epoch [41/50], Loss: 0.0127
    Epoch [46/50], Loss: 0.0101
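Once the weights are saved, the model can be reloaded and its encoder used on its own for feature extraction. The sketch below is one possible way to do this, under some assumptions: it redefines the same Autoencoder architecture so that it runs by itself, writes an untrained model's `state_dict` to a temporary file as a stand-in for `conv_autoencoder.pth`, and uses random tensors in place of real images.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Autoencoder(nn.Module):
    """Same architecture as in the training code."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 8, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in checkpoint (assumption): in practice, point this at the
    # 'conv_autoencoder.pth' file written by the training script.
    path = os.path.join(tmp_dir, "conv_autoencoder.pth")
    torch.save(Autoencoder().state_dict(), path)

    # Rebuild the architecture, then load the saved weights into it
    model = Autoencoder()
    model.load_state_dict(torch.load(path, map_location="cpu"))

model.eval()

images = torch.rand(5, 3, 64, 64)  # stand-in for a batch of real images
with torch.no_grad():
    # Run only the encoder and flatten each 8x16x16 map into one vector
    features = model.encoder(images).flatten(start_dim=1)

print(features.shape)  # torch.Size([5, 2048])
```

The resulting 2,048-dimensional vectors could then feed a downstream step such as nearest-neighbour image retrieval, although how useful they are depends on how well the autoencoder was trained.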
Plot the original images alongside the decoded images
Python3
    import matplotlib.pyplot as plt

    # Reconstruct one batch of test images without tracking gradients
    with torch.no_grad():
        for data, _ in test_loader:
            data = data.to(device)
            recon = model(data)
            break

    # Top row: original images; bottom row: reconstructions
    fig, ax = plt.subplots(2, 7, figsize=(15, 4))
    for i in range(7):
        # Convert each tensor from (C, H, W) to (H, W, C) for imshow
        ax[0, i].imshow(data[i].cpu().numpy().transpose((1, 2, 0)))
        ax[1, i].imshow(recon[i].cpu().numpy().transpose((1, 2, 0)))
        ax[0, i].axis('off')
        ax[1, i].axis('off')
    plt.show()
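Output: a 2 × 7 grid of images, with seven original test images in the top row and their reconstructions in the bottom row.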