The TensorFlow flower dataset (tf_flowers) is a dataset of 3670 photos of flowers. In this article, we will see how to use TensorFlow to load the flower dataset and work with it.
Let us start by importing the necessary libraries. We will use the tensorflow_datasets library to load the dataset; it is a collection of public datasets ready to use with TensorFlow. If any of the libraries imported below are missing, you can install them with pip. For example, to install the tensorflow_datasets library, run:

```shell
pip install tensorflow-datasets
```
```python
# Importing libraries
import tensorflow as tf
import numpy as np
import pandas as pd
import tensorflow_datasets as tfds
```
To load the flower dataset, we use the tfds.load() method, which loads the dataset given by its name argument into a tf.data.Dataset. The name of the flower dataset is tf_flowers. The dataset comes with a single train split, so we use the split argument to slice it: the first 70% of the examples go to training_set and the remaining 30% go to test_set. Setting as_supervised=True makes each element an (image, label) tuple, and with_info=True additionally returns a DatasetInfo object describing the dataset.
```python
(training_set, test_set), info = tfds.load(
    'tf_flowers',
    split=['train[:70%]', 'train[70%:]'],
    with_info=True,
    as_supervised=True,
)
```
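As a quick check of what the percent slices mean, the sketch below computes how many examples each slice receives. It is a plain-Python illustration, not part of TFDS: split_sizes is a hypothetical helper that simply rounds each cut point to the nearest example index (TFDS has its own rounding rules, which only matter when a cut point is not a whole number). With the 3670 images of tf_flowers, the 70% cut point falls exactly on index 2569.

```python
def split_sizes(total, boundaries_pct):
    """Return the number of examples in each percent slice.

    boundaries_pct lists the cut points in percent, e.g. [0, 70, 100]
    for the slices 'train[:70%]' and 'train[70%:]'. Each cut point is
    rounded to the nearest example index (a simplification).
    """
    cuts = [round(total * p / 100) for p in boundaries_pct]
    # Each slice spans from one cut point to the next
    return [end - start for start, end in zip(cuts, cuts[1:])]


# tf_flowers has 3670 images in its single 'train' split
print(split_sizes(3670, [0, 70, 100]))  # → [2569, 1101]
```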
Printing the info object returned by tfds.load() displays the metadata that TensorFlow Datasets provides for the dataset, including its description, features, splits, and citation:
```python
print(info)
```
The flower dataset contains 3670 images, which are distributed between training_set and test_set as follows:
```python
print("Training Set Size: %d" % training_set.cardinality().numpy())
print("Test Set Size: %d" % test_set.cardinality().numpy())
```
The flower dataset consists of images of 5 different kinds of flowers.
```python
num_classes = info.features['label'].num_classes
print("Number of Classes: %d" % num_classes)
```
Let us now visualize some of the images. The following code displays the first 5 images of the training set along with their labels.
```python
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = [30, 15]
plt.rcParams["figure.autolayout"] = True

# Show the first 5 (image, label) pairs of the training set side by side
for ctr, (image, label) in enumerate(training_set.take(5)):
    plt.subplot(1, 5, ctr + 1)
    # .numpy() prints the integer label instead of the tensor's repr
    plt.title('Label {}'.format(label.numpy()))
    # RGB images are drawn in their own colors, so no colormap is needed
    plt.imshow(image.numpy())
plt.show()
```
If you look carefully, you will notice that the images do not all have the same size. We can verify this by printing the shapes of the images we just visualized:
```python
for i, example in enumerate(training_set.take(5)):
    shape = example[0].shape
    print("Image %d -> shape: (%d, %d) label: %d" % (i, shape[0], shape[1], example[1]))
```
As you can see, the images differ in height and width.
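The practical consequence of these varying shapes is that the images cannot be stacked into a batch as they are. The minimal sketch below reproduces this with two made-up images of different sizes; hypothetical_images and try_batching are illustrative names, not part of TensorFlow. The dataset's element spec reports the height and width as None, and building a batch fails with an InvalidArgumentError.

```python
import numpy as np
import tensorflow as tf


def hypothetical_images():
    # Two made-up RGB images whose heights and widths differ
    yield np.zeros((240, 320, 3), np.uint8), 0
    yield np.zeros((500, 375, 3), np.uint8), 1


ragged_ds = tf.data.Dataset.from_generator(
    hypothetical_images,
    output_signature=(
        tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),  # height and width unknown
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
)
print(ragged_ds.element_spec)


def try_batching(ds, batch_size=2):
    """Return True if the first batch can be built, False if the image shapes clash."""
    try:
        next(iter(ds.batch(batch_size)))
        return True
    except tf.errors.InvalidArgumentError as err:
        print("Batching failed:", err.message.splitlines()[0])
        return False


print(try_batching(ragged_ds))
```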
However, to feed this dataset into a machine learning model, all images need to have the same size. We therefore preprocess the images a little: we resize each image to a fixed size of 224x224 pixels and normalize it so that every pixel value lies in the range 0 to 1. The code below then shuffles the training set, groups the images into batches of 32, and prefetches batches while the current one is being processed.
```python
IMG_SIZE = 224

def format_image(image, label):
    # Resize every image to IMG_SIZE x IMG_SIZE (the result is float32)
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    # Normalisation: scale pixel values from [0, 255] to [0, 1]
    image = image / 255.0
    return image, label

batch_size = 32
training_set = training_set.shuffle(300).map(format_image).batch(batch_size).prefetch(1)
test_set = test_set.map(format_image).batch(batch_size).prefetch(1)
```
Printing both datasets confirms that each image has been resized to (224, 224, 3). Because the datasets are now batched, the image component has shape (None, 224, 224, 3), where None stands for the batch size.
```python
print(training_set)
print(test_set)
```
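To see why resizing solves the batching problem, the following sketch applies the same format_image preprocessing to two made-up images of different sizes (an assumption standing in for real tf_flowers photos) and batches them. The resulting element spec has a fixed image shape of (None, 224, 224, 3), and all pixel values lie between 0 and 1.

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 224


def format_image(image, label):
    # Same preprocessing as in the article: resize, then scale pixels to [0, 1]
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = image / 255.0
    return image, label


def hypothetical_images():
    # Two made-up RGB images of different sizes: one all white, one all black
    yield np.full((240, 320, 3), 255, np.uint8), 0
    yield np.zeros((500, 375, 3), np.uint8), 1


ds = tf.data.Dataset.from_generator(
    hypothetical_images,
    output_signature=(
        tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
)

batched = ds.map(format_image).batch(32)
print(batched.element_spec)

images, labels = next(iter(batched))
print(images.shape)  # both images now share the 224x224x3 shape
print(labels.numpy())
print(float(tf.reduce_min(images)), float(tf.reduce_max(images)))
```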
Now you can feed this dataset to any appropriate machine learning model.
For demonstration purposes, we will train a modified version of MobileNet on this dataset. The following code defines the model, optimizer, loss function, and metric, and then trains the model.
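```python
def getModel(image_shape):
    # MobileNet pretrained on ImageNet (weights are downloaded on first use)
    mobileNet = tf.keras.applications.mobilenet.MobileNet(image_shape)
    # Output of the second-to-last layer: the 1000 ImageNet class scores before the final softmax
    X = mobileNet.layers[-2].output
    # A single ReLU unit: the class label (0-4) is predicted as one number, like a regression target
    X_output = tf.keras.layers.Dense(1, activation='relu')(X)
    model = tf.keras.models.Model(inputs=mobileNet.input, outputs=X_output)
    return model

model = getModel((IMG_SIZE, IMG_SIZE, 3))
optimizer = tf.keras.optimizers.Adam()
loss = 'mean_squared_error'
# Newer Keras versions require metrics to be passed as a list
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

epochs = 5
model.fit(training_set, epochs=epochs, validation_data=test_set)
```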
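The model performs poorly on the dataset at this point. One reason is that its single ReLU output, trained with mean squared error, treats the class label as a number to regress rather than as one of five categories. To increase the accuracy, you can train the model for more epochs and one-hot encode the labels so that the model predicts one output per class.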
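One way to follow the one-hot encoding suggestion is sketched below, under a few assumptions: the labels are one-hot encoded with tf.one_hot, MobileNet is loaded without its ImageNet classification head (include_top=False with average pooling), and a five-unit softmax layer replaces the single ReLU output. one_hot_label and get_classifier are illustrative names, and this is one reasonable setup rather than the only correct one.

```python
import tensorflow as tf

NUM_CLASSES = 5  # tf_flowers has 5 flower classes


def one_hot_label(image, label):
    # Turn integer labels (0-4) into one-hot vectors; works on single labels and on batches
    return image, tf.one_hot(label, NUM_CLASSES)


def get_classifier(image_shape, num_classes=NUM_CLASSES, weights="imagenet"):
    # MobileNet without its 1000-class ImageNet head; average pooling yields one feature vector per image
    base = tf.keras.applications.MobileNet(
        input_shape=image_shape, include_top=False, weights=weights, pooling="avg")
    # One softmax output per class replaces the single ReLU unit
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    return tf.keras.Model(inputs=base.input, outputs=outputs)


# Usage with the batched pipelines built earlier in the article (not executed here):
# model = get_classifier((IMG_SIZE, IMG_SIZE, 3))
# model.compile(optimizer=tf.keras.optimizers.Adam(),
#               loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(training_set.map(one_hot_label), epochs=epochs,
#           validation_data=test_set.map(one_hot_label))
```

Because the labels become one-hot vectors, the matching loss is categorical cross-entropy. Alternatively, you can skip the one_hot_label mapping and keep the integer labels by compiling with loss='sparse_categorical_crossentropy'.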