Neural networks are a biologically inspired programming paradigm around which deep learning is built. Python offers several libraries for creating and training neural networks on given data, and PyTorch is one of them, providing utilities that make building and training networks easy. With neural networks it is essential to choose a good architecture and good hyperparameters. During training, the training loss usually keeps decreasing as long as the learning rate is reasonable. But it’s important that our network performs well not only on the data it was trained on but also on data it has never seen. One way to measure this is to hold out a validation set and track the network’s performance on it. In this article we’ll see how to keep track of the validation loss after each training epoch and save the model weights that give the best validation loss.
Installing PyTorch
Installing PyTorch is much like installing any other Python library. We can use pip or conda to install PyTorch:-
pip install torch torchvision
This command will install PyTorch along with torchvision which provides various datasets, models, and transforms for computer vision. To install using conda you can use the following command:-
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
Loading Data
For this tutorial, we are going to use the MNIST dataset that’s provided in the torchvision library. In deep learning we usually train neural networks on batches of a fixed size. DataLoader is a data loading utility in PyTorch that creates an iterable over these batches of the dataset. Let’s start by loading our data:-
    from torchvision import datasets, transforms
    from torch.utils.data import DataLoader, random_split

    # Convert each raw image to a tensor
    transform = transforms.Compose([
        transforms.ToTensor()
    ])

In the above code, we declared a variable called transform, which converts the raw data into the format we need. Here it simply takes each raw image and converts it to a Tensor, scaling the pixel values to the range [0, 1]. A Tensor is a fancy way of saying an n-dimensional array.

    # Download the MNIST training set and apply the transform to every image
    train = datasets.MNIST('', train=True, transform=transform, download=True)
    # Hold out 10,000 of the 60,000 images for validation
    train, valid = random_split(train, [50000, 10000])
Here we download the raw data and apply the transform to convert it to Tensors. The train argument tells whether the training or the testing portion of the data is loaded. Finally, we split the 60,000 training images into two datasets of 50,000 and 10,000 samples, which become our train and valid sets.
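As a small illustration of the split, the sketch below uses a synthetic `TensorDataset` of 60 random "images" in place of MNIST (an assumption made only to avoid the download) and passes a seeded `torch.Generator` so that the split is repeatable. The sizes and the seed are arbitrary choices, not part of the tutorial.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in for MNIST: 60 fake 1x28x28 images with labels 0-9.
images = torch.rand(60, 1, 28, 28)
labels = torch.randint(0, 10, (60,))
dataset = TensorDataset(images, labels)

# A seeded generator makes the random split repeatable across runs.
generator = torch.Generator().manual_seed(0)
train_subset, valid_subset = random_split(dataset, [50, 10], generator=generator)

print(len(train_subset), len(valid_subset))

# The two subsets never share a sample index.
overlap = set(train_subset.indices) & set(valid_subset.indices)
print(len(overlap))
```

Without a seeded generator the split differs on every run, so validation losses from separate runs are measured on different held-out samples.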
    trainloader = DataLoader(train, batch_size=32)
    validloader = DataLoader(valid, batch_size=32)
We have now created DataLoaders over the two datasets with a batch size of 32. Now that we have the data, let’s create our neural network.
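To see what a DataLoader actually yields, here is a minimal sketch on random tensors standing in for the training split (the 100-sample size is an arbitrary assumption). It also passes `shuffle=True`, which the loaders above do not use but which is a common choice for training data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the training split: 100 fake 1x28x28 images.
fake_train = TensorDataset(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,)))

# shuffle=True reorders the samples at the start of every epoch.
loader = DataLoader(fake_train, batch_size=32, shuffle=True)

batch_shapes = [tuple(data.shape) for data, _ in loader]
print(len(loader))       # number of batches per epoch
print(batch_shapes[0])   # shape of a full batch
print(batch_shapes[-1])  # the last batch holds the remainder
```

Because 100 is not a multiple of 32, the last batch is smaller than the rest, which is why the model must not assume a fixed batch size.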
Building our Model
There are two ways to create neural networks in PyTorch: using nn.Sequential() or writing a class that inherits from nn.Module (the class method). We’ll use the class method to create our neural network since it gives more control over data flow. The format to create a neural network using the class method is as follows:-
    from torch import nn

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            # Define Model Here

        def forward(self, x):
            # Define Forward Pass Here
            return x  # placeholder so the skeleton is valid Python
So in the __init__() method we define our layers and other variables and in the forward() method we define our forward pass i.e. how data flows through the layers.
    import torch
    from torch import nn
    import torch.nn.functional as F

    class Network(nn.Module):
        def __init__(self):
            super(Network, self).__init__()
            self.fc1 = nn.Linear(28*28, 256)
            self.fc2 = nn.Linear(256, 128)
            self.fc3 = nn.Linear(128, 10)

        def forward(self, x):
            # Flatten each image in the batch: (batch, 1, 28, 28) -> (batch, 784)
            x = x.view(x.shape[0], -1)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)  # raw scores, no activation on the output layer
            return x

    model = Network()

    # Move the model to the GPU if one is available
    if torch.cuda.is_available():
        model = model.cuda()
In the above code, we defined a neural network with the following architecture:-
- Input Layer: 784 nodes. MNIST images are 28*28 pixels, so each flattened image becomes a vector of 784 values, one per input node.
- Hidden Layer 1: 256 nodes
- Hidden Layer 2: 128 nodes
- Output Layer: 10 nodes, for 10 classes i.e. numbers 0-9
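As a quick sanity check on these layer sizes, the sketch below counts the trainable parameters in two ways: with the formula for a fully connected layer and by asking PyTorch directly. The standalone `layers` list is only an illustration and is not part of the tutorial's model.

```python
from torch import nn

# The three fully connected layers from the architecture above.
layers = [nn.Linear(28 * 28, 256), nn.Linear(256, 128), nn.Linear(128, 10)]

# Each nn.Linear holds in_features * out_features weights plus out_features biases.
by_formula = sum(l.in_features * l.out_features + l.out_features for l in layers)
by_counting = sum(p.numel() for l in layers for p in l.parameters())

print(by_formula, by_counting)
```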
nn.Linear() or Linear Layer is used to apply a linear transformation to the incoming data. If you are familiar with TensorFlow it’s pretty much like the Dense Layer.
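The linear transformation can be checked by hand. The following sketch, using arbitrary small sizes chosen for illustration, compares the layer's output with the explicit expression `x @ W.T + b`.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)   # 4 input features, 3 output features
x = torch.rand(2, 4)      # a batch of 2 samples

# nn.Linear computes y = x @ W^T + b, with W of shape (out_features, in_features).
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual))
print(tuple(layer.weight.shape), tuple(layer.bias.shape))
```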
In the forward() method we start by flattening each image, then pass it through the layers, applying the ReLU activation after each hidden layer. The output layer has no activation because the loss we use later expects raw scores. After that, we create an instance of our network, and lastly we check whether the machine has a GPU. If it does, we move the model there for faster computation.
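For comparison, the Sequential() approach mentioned earlier could express the same layer sizes roughly as follows. This is a sketch rather than the tutorial's model: the name `sequential_model` and the random input batch are assumptions made for illustration.

```python
import torch
from torch import nn

# Same layer sizes as the class-based Network, expressed with nn.Sequential.
sequential_model = nn.Sequential(
    nn.Flatten(),              # (batch, 1, 28, 28) -> (batch, 784)
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),        # raw scores, one per digit
)

fake_batch = torch.rand(32, 1, 28, 28)  # random stand-in for a batch of MNIST images
logits = sequential_model(fake_batch)
print(tuple(logits.shape))
```

The class method becomes more useful once the forward pass is no longer a straight chain of layers, for example with branches or skip connections.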
Defining Criterion and Optimizer
Optimizers define how the weights of the neural network are updated. In this tutorial we’ll use the SGD (Stochastic Gradient Descent) optimizer. Optimizers take the model parameters and a learning rate as input arguments. There are various other optimizers you can try, such as Adam and Adagrad.
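To make the update rule concrete, here is a minimal sketch of a single plain SGD step (no momentum or weight decay) on a toy one-parameter problem. The toy loss and the learning rate of 0.1 are assumptions chosen so the arithmetic is easy to follow.

```python
import torch

# A single trainable weight and a toy loss: loss = (w - 3)^2.
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w - 3.0) ** 2
loss.backward()                        # d(loss)/dw = 2 * (w - 3) = -4
expected = w.detach() - 0.1 * w.grad   # plain SGD: w_new = w - lr * grad

optimizer.step()                       # updates w in place
print(w.item(), expected.item())
```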
The criterion is the loss that you want to minimize, which in this case is CrossEntropyLoss(), a combination of log_softmax() and NLLLoss().
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
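The relationship between CrossEntropyLoss(), log_softmax() and NLLLoss() can be verified numerically. The sketch below uses random scores and labels as stand-ins for real model outputs.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)           # fake model outputs: 8 samples, 10 classes
labels = torch.randint(0, 10, (8,))   # fake integer class labels

# CrossEntropyLoss applied directly to the raw scores...
ce = nn.CrossEntropyLoss()(logits, labels)
# ...matches log_softmax followed by NLLLoss.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(ce, nll))
```

This is why the network's output layer returns raw scores: applying softmax in the model as well would apply it twice.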
Training Neural Network with Validation
The training step in PyTorch looks almost identical every time you write it. But before implementing it, let’s learn about the 2 modes of the model object:-
- Training Mode: Set by model.train(), it tells your model that you are training it, so layers such as dropout and batch normalization, which behave differently during training and testing, can behave accordingly.
- Evaluation Mode: Set by model.eval(), it tells your model that you are testing the model.
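The difference between the two modes is easy to see with a standalone dropout layer. The following sketch is only an illustration, since the network in this tutorial contains no dropout.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()            # training mode: elements are zeroed at random,
train_out = dropout(x)     # and the survivors are scaled by 1 / (1 - p) = 2

dropout.eval()             # evaluation mode: dropout passes the input through unchanged
eval_out = dropout(x)

print(sorted(train_out.unique().tolist()))
print(torch.equal(eval_out, x))
```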
Our model has no such layers, so the modes make no difference here, but it’s still good to know about them. Now that we have that clear, let’s go through the training steps:-
- Move data to GPU (Optional)
- Clear the gradients using optimizer.zero_grad()
- Make a forward pass
- Calculate the loss
- Perform a backward pass using loss.backward() to calculate the gradients
- Take optimizer step using optimizer.step() to update the weights
The validation and testing steps are similar, but there you only make a forward pass and calculate the loss. A simple training loop without validation looks like this:-
    from tqdm import tqdm  # progress bar (pip install tqdm)

    epochs = 5

    for e in range(epochs):
        train_loss = 0.0
        for data, labels in tqdm(trainloader):
            # Transfer Data to GPU if available
            if torch.cuda.is_available():
                data, labels = data.cuda(), labels.cuda()

            # Clear the gradients
            optimizer.zero_grad()
            # Forward Pass
            target = model(data)
            # Find the Loss
            loss = criterion(target, labels)
            # Calculate gradients
            loss.backward()
            # Update Weights
            optimizer.step()
            # Accumulate the batch loss
            train_loss += loss.item()

        print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)}')
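The validation pass described above (forward pass and loss only) could be wrapped in a small helper. This is a sketch under stated assumptions: `evaluate`, `toy_model` and `toy_loader` are hypothetical names, and the random data stands in for MNIST. It also wraps the loop in `torch.no_grad()`, which skips gradient tracking that validation does not need.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    """Return the mean per-batch loss over loader without updating any weights."""
    model.eval()                       # switch layers such as dropout to eval behavior
    total = 0.0
    with torch.no_grad():              # gradients are not needed for validation
        for data, labels in loader:
            total += criterion(model(data), labels).item()
    return total / len(loader)

# Tiny stand-ins so the sketch runs without MNIST.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_loader = DataLoader(
    TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,))),
    batch_size=32,
)
valid_loss = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss())
print(valid_loss)
```

Since the helper leaves the model in evaluation mode, a training loop that calls it would switch back with model.train() before the next epoch.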
If you add the validation loop it’ll be the same but with forward pass and loss calculation only. But it may happen that your last iteration isn’t the one that gave you the least validation loss. To tackle this we can keep a minimum validation loss, initialized to np.inf, and whenever the current validation loss is lower we save the state dictionary of the model, which we can load later, like a checkpoint. state_dict is an OrderedDict object that maps each layer’s parameter names to their tensors.
    import numpy as np

    epochs = 5
    min_valid_loss = np.inf

    for e in range(epochs):
        train_loss = 0.0
        model.train()     # Optional when not using Model Specific layer
        for data, labels in trainloader:
            if torch.cuda.is_available():
                data, labels = data.cuda(), labels.cuda()

            optimizer.zero_grad()
            target = model(data)
            loss = criterion(target, labels)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        valid_loss = 0.0
        model.eval()     # Optional when not using Model Specific layer
        with torch.no_grad():     # Gradients are not needed for validation
            for data, labels in validloader:
                if torch.cuda.is_available():
                    data, labels = data.cuda(), labels.cuda()

                target = model(data)
                loss = criterion(target, labels)
                # Accumulate the batch loss (averaged over batches when printed)
                valid_loss += loss.item()

        print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)} \t\t Validation Loss: {valid_loss / len(validloader)}')
        if min_valid_loss > valid_loss:
            print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving The Model')
            min_valid_loss = valid_loss
            # Saving State Dict
            torch.save(model.state_dict(), 'saved_model.pth')
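Running the above code prints the training and validation loss for every epoch, along with a message each time the validation loss improves and the model is saved. Your exact loss values will vary.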
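A saved state dictionary can later be restored with load_state_dict(). The sketch below shows the round trip on a small stand-in model. The `make_model` helper and the temporary file path are hypothetical, and the tutorial's Network class and 'saved_model.pth' would be used in the same way.

```python
import os
import tempfile

import torch
from torch import nn

def make_model():
    """Small stand-in for the tutorial's Network class."""
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

trained = make_model()
keys = list(trained.state_dict().keys())
print(keys)  # parameter names that the state dict maps to tensors

# Save to a temporary path (hypothetical; the tutorial saves to 'saved_model.pth').
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(trained.state_dict(), path)

# Restoring: build the same architecture, then load the saved weights into it.
restored = make_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()

x = torch.rand(4, 1, 28, 28)
same_output = torch.allclose(trained(x), restored(x))
print(same_output)
```

Because only the parameters are stored, the class definition must be available when loading. The file does not describe the architecture itself.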
Code
    import torch
    from torch import nn
    import torch.nn.functional as F
    from torchvision import datasets, transforms
    from torch.utils.data import DataLoader, random_split
    import numpy as np

    # Declare transform to convert raw data to tensor
    transform = transforms.Compose([
        transforms.ToTensor()
    ])

    # Loading Data and splitting it into train and validation data
    train = datasets.MNIST('', train=True, transform=transform, download=True)
    train, valid = random_split(train, [50000, 10000])

    # Create Dataloaders of the above datasets with batch size = 32
    trainloader = DataLoader(train, batch_size=32)
    validloader = DataLoader(valid, batch_size=32)

    # Building Our Model
    class Network(nn.Module):
        # Declaring the Architecture
        def __init__(self):
            super(Network, self).__init__()
            self.fc1 = nn.Linear(28*28, 256)
            self.fc2 = nn.Linear(256, 128)
            self.fc3 = nn.Linear(128, 10)

        # Forward Pass
        def forward(self, x):
            x = x.view(x.shape[0], -1)    # Flatten the images
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x

    model = Network()
    if torch.cuda.is_available():
        model = model.cuda()

    # Declaring Criterion and Optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Training with Validation
    epochs = 5
    min_valid_loss = np.inf

    for e in range(epochs):
        train_loss = 0.0
        model.train()     # Optional when not using Model Specific layer
        for data, labels in trainloader:
            # Transfer Data to GPU if available
            if torch.cuda.is_available():
                data, labels = data.cuda(), labels.cuda()

            # Clear the gradients
            optimizer.zero_grad()
            # Forward Pass
            target = model(data)
            # Find the Loss
            loss = criterion(target, labels)
            # Calculate gradients
            loss.backward()
            # Update Weights
            optimizer.step()
            # Accumulate the batch loss
            train_loss += loss.item()

        valid_loss = 0.0
        model.eval()     # Optional when not using Model Specific layer
        with torch.no_grad():     # Gradients are not needed for validation
            for data, labels in validloader:
                # Transfer Data to GPU if available
                if torch.cuda.is_available():
                    data, labels = data.cuda(), labels.cuda()

                # Forward Pass
                target = model(data)
                # Find the Loss
                loss = criterion(target, labels)
                # Accumulate the batch loss
                valid_loss += loss.item()

        print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)} \t\t Validation Loss: {valid_loss / len(validloader)}')

        if min_valid_loss > valid_loss:
            print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving The Model')
            min_valid_loss = valid_loss

            # Saving State Dict
            torch.save(model.state_dict(), 'saved_model.pth')
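The loop above tracks validation loss. If accuracy is also of interest, it can be computed from the same model outputs by taking the highest-scoring class for each sample. The helper below is a hypothetical sketch on hand-made scores and is not part of the original code.

```python
import torch

def batch_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    predictions = logits.argmax(dim=1)
    return (predictions == labels).float().mean().item()

# Hand-made scores for 4 samples and 3 classes.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.2, 0.4],
                       [0.1, 0.3, 2.2]])
labels = torch.tensor([0, 1, 2, 2])

accuracy = batch_accuracy(logits, labels)
print(accuracy)  # 3 of the 4 predictions are correct -> 0.75
```

Inside the validation loop, one way to use this would be to accumulate the number of correct predictions per batch and divide by the size of the validation set at the end of the epoch.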