Deep learning models like convolutional neural networks (ConvNet) require large amounts of data to make accurate predictions. In general, sufficient sample size for a ConvNet application would involve tens of thousands of images. Often, only a few thousand labeled images are available for training, validation, and testing. Challenges associated with limited data can be overcome if you leverage pre-trained layers—components of models that have been already trained and published online. We’ll go through a few techniques of employing pre-trained layers using Keras in Python.
There are a number of open-source ConvNets available online complete with weights and biases pre-trained on vast image datasets. These models were developed for the ImageNet competition and built to predict one of a thousand diverse classes from say a giraffe to a microwave. Their usefulness lies in the fact that they were trained on millions of images, so the parameters associated with the convolution filters are well estimated. To predict a custom group of classes, at a minimum it is necessary to remove the final layer. We’ll go through how to use both full pre-trained models and single convolutional blocks using the pre-trained model “VGG16” with Keras.
[Related Article: Building an Image Search Service from Scratch]
Full Model
First we’ll want to import our dependencies, define the image size, and the number of classes we want to predict.
Next we’ll define an object “vgg” as the VGG16 model and specify how we want to customize it. First, “include_top=False” insures that the last layer of the model that makes predictions about the one thousand classes is omitted. The argument “pooling” we’ll define as “avg” to allow global average pooling for the last layer in the model. To download the weights, we need to define the “weights” argument as “imagenet”. Lastly, specify the image size for “input_shape”. The model we just assembled is incomplete, but we can investigate its structure with “.summary()”.
Figure 1: Architecture for VGG16 Model
To enable the model to make predictions, we’ll need to add one more layer. To stack layers, we’ll use “.Sequential()” from Keras and “.add” a softmax layer to the pre-trained model. This layer will have four activation values that will represent the relative probabilities of four classes we want to predict. It is important to insure we don’t train the 14 million parameters in the VGG16 model, so we freeze the parameters in the “vgg” object part of “my_model”.
Because we’re using the VGG16 model, we need to preprocess the images in a way that is compatible with the pretrained network. This can be accomplished by importing “preprocess_input”.
Single Block
Often pre-trained models have difficulty making accurate predictions for classes that are very dissimilar from the classes they were trained to predict. In this case, training new parameter values for convolutional blocks may be worthwhile, so the model learns the specific features associated with the unique classes. However, we can still employ layers from the first few convolutional blocks in pre-trained models that are adept at recognizing basic features like lines and curves. By including a single pre-trained convolutional block, we can reduce computation time and improve prediction accuracy with limited data.
Follow the same first steps detailed previously in setting up the object containing the VGG16 model.
By inspecting a summary of the model architecture (Figure 1), identify the name of the last layer we want to include. In this case, we’ll just take the first convolutional block ending with the max pooling layer “block1_pool”.
Now check out the model summary; we can see that only the first convolutional block is remaining.
We can stack our own layers onto this pre-trained layer using the same technique to add the softmax layer earlier. This time we’ll add additional convolutional blocks and two fully connected layers. Again, be sure to freeze the layers from the first block before training.
The final architecture for our custom model with one pre-trained convolutional block can be viewed with “model.summary()”
Conclusion on How to Leverage Pre-Trained Models
[Related Article: Learn How to Organize, Clean Up, and Process Medical Image Datasets for Computer Vision Training]
Pre-trained models offer a lot of value to ConvNet projects that suffer from limited data. While it is common to use the whole model, extracting single convolutional blocks can be helpful for networks that need to predict fairly unique classes. It is worthwhile to experiment with different architectures and pre-trained layers to see what works best for one’s particular task.