Introduction
In the age of artificial intelligence, a remarkable phenomenon is unfolding: Generative Adversarial Networks (GANs) are ingeniously crafting artificial celebrity identities. This fusion of technology and creativity has given rise to an entirely new breed of digital celebrities. Join us as we delve into the world of GANs and uncover the magic behind the artificial celebrity personas captivating the virtual realm. How do GANs make this possible? Let’s explore the secrets behind this digital artistry.
Learning Objectives
In this article, we will:
- Understand the concept of Generative Adversarial Networks (GANs)
- Learn how the generator and discriminator are trained
- Follow the step-by-step process of implementing a GAN model
- Gain insights into how GANs improve over time through adversarial training
This article was published as a part of the Data Science Blogathon.
Generative Adversarial Network (GAN)
A Generative Adversarial Network (GAN) is a deep learning model introduced by Ian Goodfellow and his colleagues. As the name suggests, GANs are used for generation: they produce synthetic data, including images, text, audio, and more, that resembles real-world data. A GAN contains two neural networks, known as the generator and the discriminator. During training, the two networks compete with each other, and both improve as a result.
What is a Generator?
The generator is the neural network responsible for generating data. To produce an output, it needs an input, and that input is random noise. The generator takes this noise and tries to produce output that resembles the real data. Each time it receives feedback from the discriminator, it adjusts itself and generates better data on the next attempt. Take image generation as an example: the generator starts from random noise and, as training progresses, gradually refines its output into increasingly realistic images. At first it may produce outputs that barely resemble the original data, or that do not look like images at all, but as training continues the generated data becomes more and more accurate.
What is a Discriminator?
The discriminator is the neural network responsible for evaluation; you can think of it as a detective. It receives both real data and the fake data produced by the generator, and its job is to distinguish the fake from the real. In simple terms, it classifies inputs as real or fake. Like the generator, it improves as training continues. It may not perform well on the first attempt, but over the course of training it gets better and better, until it can detect most of the fake data, just as a detective would.
Adversarial Training
Training the generator and discriminator together is called adversarial training. As mentioned, the two networks compete: the generator produces fake data that looks like real data, and the discriminator tries to detect it. In the next step of training, the generator improves its fakes to fool the discriminator, and the discriminator in turn learns to detect them again. In this way, both networks get better at their respective tasks as training proceeds. The process continues until the generator produces data so realistic that the discriminator can no longer reliably tell it apart from real data. At that point the GAN has reached a kind of equilibrium, and the generated data closely resembles the real data.
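To make this back-and-forth concrete, below is a minimal, illustrative sketch of a single adversarial update using the standard binary cross-entropy GAN formulation. It assumes a discriminator that ends in a sigmoid and outputs one probability per image; the function name adversarial_step and the latent shape are our own choices, and the full implementation later in this article uses a different, reconstruction-based loss.

import torch
import torch.nn as nn

bce = nn.BCELoss()

def adversarial_step(netG, netD, optG, optD, real, nz, device):
    """One illustrative generator/discriminator update (standard BCE GAN sketch)."""
    b = real.size(0)
    real_labels = torch.ones(b, device=device)
    fake_labels = torch.zeros(b, device=device)

    # 1) Discriminator step: push real images towards 1 and fake images towards 0
    optD.zero_grad()
    z = torch.randn(b, nz, device=device)   # hypothetical latent shape for this sketch
    fake = netG(z)
    d_loss = bce(netD(real).view(-1), real_labels) + \
             bce(netD(fake.detach()).view(-1), fake_labels)
    d_loss.backward()
    optD.step()

    # 2) Generator step: try to make the updated discriminator output 1 for fakes
    optG.zero_grad()
    g_loss = bce(netD(fake).view(-1), real_labels)
    g_loss.backward()
    optG.step()
    return d_loss.item(), g_loss.item()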
Implementation
Let’s begin by importing all the necessary libraries. These are mostly torch and torchvision modules; we will use matplotlib for visualizations.
from __future__ import print_function
%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML
Dataset
For this project, we will use the CelebFaces Attributes (CelebA) dataset, which is available on Kaggle. You can download it from the link below.
Dataset link: https://www.kaggle.com/datasets/jessicali9530/celeba-dataset
This dataset contains 202,599 face images of different celebrities. It is widely used for training and testing face detection and attribute-recognition models, especially those that recognize specific facial features.
Initial Configuration and Setup
Now let’s set some parameters and configurations before starting the actual implementation. First, provide the path where the dataset is stored. Then define the number of workers, which is the number of data-loading processes the data loader uses; these workers load batches in parallel and speed up data loading during training. We also define the batch size, image size, number of channels in the training images, size of the latent vector, size of the feature maps in the generator and discriminator, number of epochs, learning rate, and the beta1 hyperparameter for the Adam optimizers, as well as the number of GPUs available for training. If no GPU is available, the code falls back to the CPU. Finally, the code prints the selected device (CPU or GPU) and the number of GPUs used. These settings are the main knobs you will tune when training your GAN model.
# Root directory of the dataset
dataroot = r"C:\Users\Admin\Downloads\faces\img_align_celeba"
# Number of data-loading workers for the DataLoader
workers = 1
# Batch size during training
batch_size = 16
# Spatial size of the training images
image_size = 64
# Number of channels in the training images (3 for colour images)
nc = 3
# Size of z latent vector
nz = 64
# Size of feature maps in generator
ngf = 64
# Size of feature maps in discriminator
ndf = 64
# Number of training epochs
num_epochs = 5
# Learning rate for the Adam optimizers
lr = 0.0001
# Beta1 hyperparameter for the Adam optimizers
beta1 = 0.2
# Number of GPUs available (0 means CPU mode)
ngpu = 1
# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")
torch.backends.cudnn.benchmark = True
print(device)
print(ngpu)
Data Loader
Let’s create a PyTorch dataset and dataloader for our training data. The dataset variable is created with the dset.ImageFolder class, a torchvision dataset class for loading data organized into folders. It requires two essential arguments: the root directory and a transform, a series of image transformations applied to each image in the dataset.
The dataloader variable is created with torch.utils.data.DataLoader, which is responsible for loading the data in batches. It takes the dataset we just defined and the batch size, along with the number of worker processes for data loading (defined earlier) and whether or not to shuffle the data. Shuffling ensures the model does not see the images in the same order in every epoch.
Then we plot some training images to understand what our training data looks like.
# Create the dataset
dataset = dset.ImageFolder(root=dataroot,
transform=transforms.Compose([
transforms.Resize(image_size),
transforms.CenterCrop(image_size),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
]))
# Create the dataloader
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
shuffle=True, num_workers=workers)
# Plot some training images
real_batch = next(iter(dataloader))
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Training Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64],
padding=2, normalize=True).cpu(),(1,2,0)))
Noise Generation
Now let’s create some noise, which serves as the input to the generator in a GAN. The noise() function generates this random noise, so let’s define it. It takes no arguments and creates a random tensor of shape (nz, ngf) filled with values sampled from a uniform distribution between 0 and 1, then rescales it to the range -1 to 1 by multiplying by 2 and subtracting 1. This noise tensor is fed into the generator to produce images (and, in conditional setups, it could be concatenated with other data such as labels).
def noise():
return 2*torch.rand(nz, ngf, device = device) - 1
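As a quick illustrative check (not part of the original walkthrough), you can confirm the shape and value range of the tensor this helper returns:

# Sanity check on the noise helper (illustrative)
z = noise()
print(z.shape)                         # torch.Size([64, 64]) with the defaults above (nz=64, ngf=64)
print(z.min().item(), z.max().item())  # both values lie within the range [-1, 1]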
Generator
It’s time to define the generator, so we create a Generator class. Its constructor accepts ngpu, the number of GPUs available for training; this parameter is intended for multi-GPU handling, although it is not actually used inside this class. The generator architecture is defined inside the constructor.
Sequential blocks apply a series of convolutions, batch normalization, and ELU activations to the input x. We then define the final convolutional layer that produces the output image, self.toImage, which uses the feature maps from the previous layers to create the desired 3-channel image.
Next, we define the forward method, which specifies the generator’s forward pass. It takes an input x, the latent (noise) vector, and passes it through the layers defined in the constructor, producing the generated image. The input is first transformed by the latent (linear) layer and then processed by the convolutional blocks. After the first two blocks, the feature maps are concatenated with the (appropriately upsampled) feature maps from the initial projection, a skip connection that helps the generator preserve both high-level features and spatial detail. The output of the final convolutional layer is the generated image, returned as the result of the forward pass.
# GeneratorCode
class Generator(nn.Module):
def __init__(self, ngpu):
super(Generator, self).__init__()
self.ngpu = ngpu
self.fromLatent = nn.Linear(ngf, (image_size*image_size//16)*ndf)
self.dlayer1 = nn.Sequential(
nn.Conv2d(ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True),
nn.Conv2d(ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True)
)
self.dlayer2 = nn.Sequential(
nn.Upsample(scale_factor=2),
nn.Conv2d(2 * ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True),
nn.Conv2d(ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True)
)
self.dlayer3 = nn.Sequential(
nn.Upsample(scale_factor=2),
nn.Conv2d(2 * ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True),
nn.Conv2d(ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True)
)
self.toImage = nn.Conv2d(ndf, 3, 3, padding = 1)
def forward(self, x):
x = self.fromLatent(x)
x = x.view(x.size(0), ndf, image_size//4, image_size//4)
h0 = x
x = self.dlayer1(x)
x = torch.cat([x, h0], dim=1)
x = self.dlayer2(x)
h0 = nn.functional.interpolate(h0, scale_factor=2, mode='nearest')
x = torch.cat([x, h0], dim=1)
x = self.dlayer3(x)
x = self.toImage(x)
return x
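A small sanity check (illustrative; netG_test is a hypothetical name and not used elsewhere) can confirm that a noise batch flows through the generator and comes out as 3-channel images of the configured size:

# Shape check for the generator (illustrative)
netG_test = Generator(ngpu).to(device)
out = netG_test(noise())
print(out.shape)   # expected: torch.Size([64, 3, 64, 64]) with the defaults above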
Discriminator
To tell real images from fake ones, we construct a discriminator network that, in this design, is built much like the generator defined earlier. We define a Discriminator class whose constructor, just like the generator’s, accepts ngpu, the number of GPUs available for training.
The elayer1 block contains three convolutional layers, each followed by batch normalization and the ELU activation function. self.toLatent is a fully connected layer that maps the output of the convolutional encoder to a latent vector, and self.fromLatent is another fully connected layer that maps this latent vector back to a feature map.
The dlayer blocks mirror the ones in the generator and act as a decoder, that is, they reconstruct an image. self.toImage is the final convolutional layer that produces the output: a 3-channel image built from the feature maps of the earlier layers. The forward method then specifies the discriminator’s forward pass: it takes an image, real or generated, runs it through the encoder and decoder defined in the constructor, and returns an image-like reconstruction. In other words, this discriminator is an autoencoder (as used in BEGAN-style GANs), and the quality of its reconstruction is what separates real images from fake ones.
Code Implementation
class Discriminator(nn.Module):
def __init__(self, ngpu):
super(Discriminator, self).__init__()
self.ngpu = ngpu
self.elayer1 = nn.Sequential(
nn.Conv2d(nc, ndf, 3, padding = 1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True),
nn.Conv2d(ndf, ndf, 3, padding = 1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True),
nn.Conv2d(ndf, 2 * ndf, 3, padding = 1),
nn.BatchNorm2d(2 * ndf),
nn.ELU(inplace=True),
)
self.elayer2 = nn.Sequential(
nn.MaxPool2d(2),
nn.Conv2d(2 * ndf, 2 * ndf, 3, padding = 1),
nn.BatchNorm2d(2 * ndf),
nn.ELU(inplace=True),
nn.Conv2d(2 * ndf, 3 * ndf, 3, padding = 1),
nn.BatchNorm2d(3 * ndf),
nn.ELU(inplace=True)
)
self.elayer3 = nn.Sequential(
nn.MaxPool2d(2),
nn.Conv2d(3 * ndf, 3 * ndf, 3, padding = 1),
nn.BatchNorm2d(3 * ndf),
nn.ELU(inplace=True),
nn.Conv2d(3 * ndf, 3 * ndf, 3, padding = 1),
nn.BatchNorm2d(3 * ndf),
nn.ELU(inplace=True)
)
self.toLatent = nn.Linear((image_size*image_size//16)*3*ndf, ngf)
self.fromLatent = nn.Linear(ngf, (image_size*image_size//16)*ndf)
self.dlayer1 = nn.Sequential(
nn.Conv2d(ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True),
nn.Conv2d(ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True)
)
self.dlayer2 = nn.Sequential(
nn.Upsample(scale_factor=2),
nn.Conv2d(2*ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True),
nn.Conv2d(ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True)
)
self.dlayer3 = nn.Sequential(
nn.Upsample(scale_factor=2),
nn.Conv2d(2*ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True),
            nn.Conv2d(ndf, ndf, 3, padding=1),
nn.BatchNorm2d(ndf),
nn.ELU(inplace=True)
)
self.toImage = nn.Conv2d(ndf, 3, 3, padding = 1)
def forward(self, x):
x = self.elayer1(x)
x = self.elayer2(x)
x = self.elayer3(x)
x = x.view(x.size(0), -1)
x = self.toLatent(x)
x = self.fromLatent(x)
x = x.view(x.size(0), ndf, image_size//4, image_size//4)
h0 = x
x = self.dlayer1(x)
x = torch.cat([x, h0], dim=1)
x = self.dlayer2(x)
h0 = torch.nn.functional.interpolate(h0, scale_factor=2, mode='nearest')
x = torch.cat([x, h0], dim=1)
x = self.dlayer3(x)
x = self.toImage(x)
return x
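Because this discriminator is built as an autoencoder, it maps images back to images rather than to a single probability. A brief sanity check (illustrative; netD_test and the random batch are hypothetical) makes this visible:

# Shape check for the discriminator (illustrative)
netD_test = Discriminator(ngpu).to(device)
imgs = torch.randn(4, nc, image_size, image_size, device=device)
print(netD_test(imgs).shape)   # expected: torch.Size([4, 3, 64, 64]), a reconstruction of the input batch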
Now, instantiate the Generator class, passing ngpu, to build the generator network; if you have more GPUs available, you can use them to make training more efficient. The Discriminator network is created in the same way, by instantiating the Discriminator class. In the next step, initialize the loss function.
Next, we create a batch of latent vectors called fixed_noise. Latent vectors are random noise vectors; many GAN implementations draw them from a normal distribution, but here they come from the uniform noise() helper defined above. They are used to generate fake images from the generator at fixed points during training so that progress can be compared. We then set up optimizers for both the generator and the discriminator, using Adam, a popular optimization algorithm, for both.
# Create the generator
netG = Generator(ngpu).to(device)
# To handle multi-gpu
if (device.type == 'cuda') and (ngpu > 1):
netG = nn.DataParallel(netG, list(range(ngpu)))
# Create the Discriminator
netD = Discriminator(ngpu).to(device)
# To handle multi-gpu
if (device.type == 'cuda') and (ngpu > 1):
netD = nn.DataParallel(netD, list(range(ngpu)))
# Initiate loss function (L1 reconstruction loss; the discriminator here reconstructs its input)
criterion = nn.L1Loss()
fixed_noise = noise()
# Setup Adam optimizers for both Generator and Discriminator
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))
Training
It’s time to train our model. Before that, we create a small Timer class to measure the training time at each step, along with some empty lists and parameters for storing data during training. The training loop iterates over the chosen number of epochs and, within each epoch, over the batches from the data loader. The discriminator is trained on real images and on the fake images produced by the generator, and both networks are updated throughout training. The variables k, gamma, and lambda_k implement a BEGAN-style balancing term: k controls how strongly the fake-image term counts in the discriminator loss, and it is updated at every step to keep the generator and discriminator in equilibrium. At regular intervals the loop prints the training statistics, and it collects the losses and generated images for analysis and visualization during and after training.
import time
class Timer():
def __init__(self):
self.startTime = time.time()
self.lastTime = time.time()
def timeElapsed(self):
auxTime = self.lastTime
self.lastTime = time.time()
return self.lastTime - auxTime
def timeSinceStart(self):
self.lastTime = time.time()
return self.lastTime - self.startTime
# Training Loop
k = 0
gamma = 0.4
lambda_k = 0.005
img_list = []
G_losses = []
D_losses = []
iters = 0
print("Training Loop Started...")
timer = Timer()
for epoch in range(num_epochs):
# For each batch in the dataloader
for i, data in enumerate(dataloader, 0):
netD.zero_grad()
# Format batch
real_cpu = data[0].to(device)
b_size = real_cpu.size(0)
# Forward pass real batch through D
output = netD(real_cpu).view(-1)
# Calculate loss on all-real batch
errD_real = criterion(output, real_cpu.view(-1))
# Calculate gradients for D in backward pass
D_x = output.mean().item()
fake = netG(noise())
# Classify all fake batch with D
output = netD(fake.detach()).view(-1)
# Calculate D's loss on the all-fake batch
errD_fake = criterion(output, fake.view(-1))
# Calculate the gradients for this batch
D_loss = errD_real - k * errD_fake
D_loss.backward()
D_G_z1 = output.mean().item()
# Add the gradients from the all-real and all-fake batches
optimizerD.step()
netG.zero_grad()
fake = netG(noise())
output = netD(fake).view(-1)
# Calculate G's loss based on this output
errG = criterion(output, fake.view(-1))
# Calculate gradients for G
errG.backward()
D_G_z2 = output.mean().item()
# Update G
optimizerG.step()
delta = (gamma*errD_real - errG).data
k = max(min(k + lambda_k*delta, 1.0), 0.0)
# Output training stats
if i % 50 == 0:
print(
'[%.4f] [%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
% (timer.timeElapsed(), epoch, num_epochs, i, len(dataloader),
D_loss.item(), errG.item(), D_x, D_G_z1, D_G_z2))
# Save Losses for plotting later
G_losses.append(errG.item())
D_losses.append(D_loss.item())
if (iters % 1000 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
with torch.no_grad():
fake = netG(fixed_noise).detach().cpu()
img_list.append(vutils.make_grid(fake, padding=2, normalize=True))
iters += 1
Visualizations
Now let’s generate a plot to visualize the generator and discriminator loss during the training of a GAN.
plt.figure(figsize=(10,5))
plt.title("Generator and Discriminator Loss During Training")
plt.plot(G_losses,label="G")
plt.plot(D_losses,label="D")
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend()
plt.show()
Similarly, let’s generate a plot comparing real images with the fake images produced by the GAN. For this, we grab a batch of real images from the dataloader and plot them next to the fake images generated by the GAN, which lets you visually compare the quality of the generated images against real data.
# Grab a batch of real images from the dataloader
real_batch = next(iter(dataloader))
# Plot the real images
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.axis("off")
plt.title("Real Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64],
padding=5, normalize=True).cpu(),(1,2,0)))
# Plot the fake images from the last epoch
plt.subplot(1,2,2)
plt.axis("off")
plt.title("Fake Images")
plt.imshow(np.transpose(img_list[-1],(1,2,0)))
plt.savefig('fake_image.png')
plt.show()
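As an optional extra (not part of the original walkthrough), here is a short sketch of how you might save the trained generator and sample a fresh batch of faces from new noise; the file name generator_celeba.pth is just an example.

# Save the trained generator weights (example file name)
torch.save(netG.state_dict(), "generator_celeba.pth")

# Sample new faces from fresh noise using the helper defined earlier
netG.eval()
with torch.no_grad():
    new_faces = netG(noise()).cpu()

plt.figure(figsize=(8, 8))
plt.axis("off")
plt.title("Newly Sampled Faces")
plt.imshow(np.transpose(vutils.make_grid(new_faces, padding=2, normalize=True), (1, 2, 0)))
plt.show()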
Conclusion
In this article, we used Generative Adversarial Networks (GANs) on the CelebFaces Attributes (CelebA) dataset to generate fake celebrity faces. GANs are a remarkable technological breakthrough: they can create synthetic data that looks very real, which is incredibly useful, especially when a project requires large amounts of data. The field is developing quickly and seeing wider adoption every year, and it remains one of the most interesting developments in artificial intelligence, with a promising future across many different applications.
Key Takeaways
- Generative Adversarial Networks (GANs) are a revolutionary AI technology capable of generating data that closely mimic real data.
- A GAN consists of two neural networks, a generator and a discriminator, which are trained against each other in adversarial training.
- GANs have applications in diverse fields, including image generation, super-resolution, style transfer, and data augmentation.
- There are many kinds of GANs, each with its own advantages, disadvantages, and applications.
- GANs may raise ethical concerns related to deepfakes, fake content generation, and privacy violations.
Frequently Asked Questions
Q. What is a Generative Adversarial Network (GAN)?
A. A Generative Adversarial Network (GAN) is a deep learning model introduced by Ian Goodfellow. GANs are used for generating synthetic data, such as images, text, and audio, that is similar to real-world data. A GAN contains two neural networks, called a generator and a discriminator. During training, these two networks compete with each other and both get better.
Q. How do GANs generate images?
A. In image generation with GANs, the generator starts from random noise and progressively refines its output to create images that mimic the real data. As training continues, the generator produces more realistic images.
Q. How do GANs differ from other generative models?
A. GANs differ from other generative models, such as Variational Autoencoders, in that they use a competitive learning approach between the generator and the discriminator, which can lead to higher-quality, more realistic data.
Q. What does the future hold for GANs?
A. Generative Adversarial Networks (GANs) have the potential to make a big impact in areas such as entertainment, healthcare, and synthetic data generation. They are expected to play a crucial role in generating lifelike synthetic data and in pushing artificial intelligence forward in the near future.
Q. Are there ethical concerns with GANs?
A. Yes. GANs have raised ethical concerns because of their ability to generate deepfakes and fake content. Misuse of GAN technology for deceptive or malicious purposes is a significant concern.