Build the Model for Fashion MNIST dataset Using TensorFlow in Python

26 July 2024

The primary objective is to build a classification model that identifies the clothing categories in the Fashion MNIST dataset using TensorFlow and Keras.

To achieve this, we will build a convolutional neural network (CNN) and train it on the dataset. We use deep learning because the dataset consists of images, and CNNs are a standard choice for image classification tasks. We will use the Keras API bundled with TensorFlow to build and train the CNN.

The task is divided into three steps: data analysis, model training and prediction. Let us start with data analysis.

Data Analysis

Step 1: Importing the required libraries

We will first import all the required libraries. We will use Matplotlib to display images and NumPy for array manipulation. TensorFlow and Keras will be used to build and train the deep learning model.

Python3

# Fashion MNIST loader bundled with Keras
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
 
# Layers used to build the CNN
from tensorflow.keras.layers import (Conv2D, MaxPooling2D,
                                     Dense, Flatten)
 
# Adam optimizer (adapts the step size per parameter)
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
import numpy as np

The Fashion MNIST dataset ships with Keras in the keras.datasets module, so we simply import it from there.

The dataset consists of 70,000 images, of which 60,000 are for training and the remaining 10,000 are for testing. The images are grayscale, and each one is 28×28 pixels. There are 10 categories, so there are 10 labels available to us, and they are as follows:

T-shirt/top
Trouser
Pullover
Dress
Coat
Sandal
Shirt
Sneaker
Bag
Ankle boot
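
In the label arrays these categories are stored as the integers 0 to 9, in the order listed above. A minimal sketch of a lookup table follows; the names CLASS_NAMES and label_to_name are ours for illustration and are not part of Keras.

```python
# Class names in label order (index 0 = T-shirt/top, ..., 9 = Ankle boot).
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_name(label):
    """Translate an integer label (0-9) into its category name."""
    return CLASS_NAMES[int(label)]


print(label_to_name(0))
print(label_to_name(9))
```

A table like this is handy later, when the integer returned by the model has to be shown as a readable category.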

Step 2: Loading data and auto-splitting it into training and test

We will load our data using the load_data() function. It returns the training and testing splits described above.

Python3

# Split the data into training and testing
(trainX, trainy), (testX, testy) = fashion_mnist.load_data()
 
# Print the dimensions of the dataset
print('Train: X = ', trainX.shape)
print('Test: X = ', testX.shape)

The training set contains 60,000 images, and the test set contains 10,000 images.
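
The arrays returned by load_data() hold 8-bit pixel values from 0 to 255 and have no channel axis, while the CNN built later declares an input shape of (28, 28, 1). The tutorial passes the arrays to the model as they are. A common optional step, sketched here on a small random stand-in array so that it runs without downloading anything, is to scale the pixels to [0, 1] and add the channel axis explicitly. The helper name preprocess is hypothetical.

```python
import numpy as np


def preprocess(images):
    """Scale uint8 images to [0, 1] and append a channel axis."""
    scaled = images.astype(np.float32) / 255.0
    return scaled[..., np.newaxis]  # (N, 28, 28) -> (N, 28, 28, 1)


# Random stand-in with the same layout as trainX (assumption: not real data).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

batch = preprocess(fake_images)
print(batch.shape, batch.dtype, batch.min(), batch.max())
```

If you adopt a step like this, apply the same transformation to the images you later pass to the model for prediction.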

Step 3: Visualise the data

Now that the data is loaded, we will visualize some sample images from it. We loop over the first few training images and plot each one in a grid using Matplotlib.

Python3

for i in range(9):
   
    # Create a 3x3 grid and place the
    # image in the (i + 1)th position of the grid
    plt.subplot(3, 3, i + 1)
     
    # Draw the ith training image with the 'gray' color map
    plt.imshow(trainX[i], cmap='gray')
 
# Display the entire plot
plt.show()

With this, we have come to the end of the data analysis. Now we will move forward to model training.

Model training

Step 1: Creating a CNN architecture

We will create a basic CNN architecture from scratch to classify the images. It uses 3 convolution layers, each followed by a max-pooling layer, and then a fully connected layer. Finally, we add a softmax layer of 10 nodes, one for each of the 10 labels.

Python3

def model_arch():
    models = Sequential()
 
    # First convolution: learn 64
    # filters with a kernel size of 5x5
    models.add(Conv2D(64, (5, 5),
                      padding="same",
                      activation="relu",
                      input_shape=(28, 28, 1)))
 
    # Max pooling with a 2x2 window
    # halves the height and width
    models.add(MaxPooling2D(pool_size=(2, 2)))
    models.add(Conv2D(128, (5, 5), padding="same",
                      activation="relu"))
 
    models.add(MaxPooling2D(pool_size=(2, 2)))
    models.add(Conv2D(256, (5, 5), padding="same",
                      activation="relu"))
 
    models.add(MaxPooling2D(pool_size=(2, 2)))
 
    # Once the convolution and pooling
    # operations are done, the feature maps
    # are flattened and a fully connected
    # layer is added
    models.add(Flatten())
    models.add(Dense(256, activation="relu"))
 
    # Finally, as there are 10 classes in
    # total, a fully connected layer of
    # 10 units with a softmax activation
    # produces the class probabilities
    models.add(Dense(10, activation="softmax"))
    return models
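
To see how the image shrinks on its way through this stack, the shapes can be traced by hand. The sketch below assumes stride-1 convolutions with "same" padding (height and width are preserved) and non-overlapping 2x2 pooling (height and width are halved, rounding down). The function trace_shapes is a hypothetical helper, not a Keras call.

```python
def trace_shapes(size=28, channels=1, filters=(64, 128, 256), pool=2):
    """Follow the feature-map shape through the conv/pool stack.

    Assumes 'same' padding with stride 1 (a convolution keeps height and
    width) and non-overlapping pooling (floor division by the pool size).
    """
    shapes = [(size, size, channels)]
    for n_filters in filters:
        shapes.append((size, size, n_filters))   # after Conv2D
        size //= pool                            # MaxPooling2D halves H and W
        shapes.append((size, size, n_filters))   # after MaxPooling2D
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print(shape)

# Number of values handed to the first Dense layer after Flatten.
flat = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print("Flatten output:", flat)
```

The spatial size goes 28, 14, 7, 3, so the Flatten layer passes 3 x 3 x 256 = 2304 values to the dense layer.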

Now we will look at the model summary. To do that, we first compile the model, setting the loss to sparse categorical crossentropy and the metric to sparse categorical accuracy.

Python3

model = model_arch()
 
model.compile(optimizer=Adam(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
 
model.summary()
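
The "sparse" variants expect integer class labels rather than one-hot vectors. As a rough illustration of what these two quantities measure, here is a NumPy sketch that follows the textbook definitions; it is not Keras's implementation, and the probabilities are made up.

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs):
    """Mean negative log-probability assigned to the true class."""
    rows = np.arange(len(labels))
    return float(np.mean(-np.log(probs[rows, labels])))


def sparse_categorical_accuracy(labels, probs):
    """Fraction of rows whose most probable class equals the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))


# Made-up softmax outputs for two images and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 1])  # integer labels, not one-hot vectors

print(sparse_categorical_crossentropy(labels, probs))
print(sparse_categorical_accuracy(labels, probs))
```

The first image is classified correctly and the second is not, so the accuracy is 0.5, while the loss also reflects how much probability each true class received.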

Model summary
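
The parameter counts reported by the summary can be cross-checked by hand. This sketch assumes the usual formulas: a convolution layer has kernel x kernel x input channels x filters weights plus one bias per filter, and a dense layer has inputs x units weights plus one bias per unit. Pooling and Flatten layers have no parameters. The helper names are hypothetical.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per filter for a Conv2D layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weights plus one bias per unit for a Dense layer."""
    return n_in * n_out + n_out


layer_params = {
    "conv 5x5, 1 -> 64": conv_params(5, 1, 64),
    "conv 5x5, 64 -> 128": conv_params(5, 64, 128),
    "conv 5x5, 128 -> 256": conv_params(5, 128, 256),
    "dense 2304 -> 256": dense_params(3 * 3 * 256, 256),
    "dense 256 -> 10": dense_params(256, 10),
}
total = sum(layer_params.values())
for name, count in layer_params.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")
```

Under these assumptions the model has 1,618,698 trainable parameters, about half of them in the third convolution layer.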

Step 2: Train the data on the model

Now that the model is compiled, we can train it with the model.fit() function, setting the number of epochs to 10. We also set a validation split of 33%, which holds out about a third of the training data so that we can monitor the loss and accuracy on images the model is not trained on.

Python3

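history = model.fit(
    # Cast the uint8 images and labels to float32 before training
    trainX.astype(np.float32), trainy.astype(np.float32),
    epochs=10,
    # Run 100 batches per epoch instead of a full pass over the data
    steps_per_epoch=100,
    # Hold out the last 33% of the training data for validation
    validation_split=0.33
)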
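
The Keras documentation describes validation_split as taking the validation samples from the end of the arrays, before any shuffling. The sketch below shows that idea with plain index ranges; split_for_validation is a hypothetical helper, and the exact rounding Keras applies at the boundary may differ by a sample.

```python
def split_for_validation(n_samples, validation_split):
    """Return index ranges for the training and validation parts.

    Assumption: the validation block is the tail of the data, which is how
    the Keras documentation describes `validation_split`.
    """
    split_at = int(n_samples * (1.0 - validation_split))
    return range(0, split_at), range(split_at, n_samples)


train_idx, val_idx = split_for_validation(60_000, 0.33)
print(len(train_idx), len(val_idx))
```

With 60,000 images and a 33% split, roughly 40,200 images are used for fitting and roughly 19,800 for validation.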

Step 3: Save the model

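We will now save the model's trained weights in the HDF5 (.h5) format so they can be loaded again later, for example inside a web application. Note that save_weights() stores only the weights, so the architecture has to be rebuilt with model_arch() before loading them; model.save() stores the whole model instead. Recent Keras versions require the weights file name to end in .weights.h5.

Python3

model.save_weights('./model.weights.h5', overwrite=True)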

Step 4: Plotting the accuracy and loss curves

Accuracy and loss curves are important diagnostics in any ML project. They tell us how well the model performs after each epoch and how many epochs it takes to converge.

Python3

# Accuracy vs Epoch plot
plt.plot(history.history['sparse_categorical_accuracy'])
plt.plot(history.history['val_sparse_categorical_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

Output:

Python3

# Loss vs Epoch plot
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

Output:
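
One common way to read these curves is to look for the epoch at which the validation loss is lowest; if the validation loss rises afterwards while the training loss keeps falling, the model is likely starting to overfit. A minimal sketch follows, using hypothetical numbers shaped like history.history; they are not results from this model, and best_epoch is an illustrative helper.

```python
# Hypothetical values shaped like `history.history`; NOT results from this model.
example_history = {
    "loss": [0.90, 0.60, 0.45, 0.35, 0.28],
    "val_loss": [0.80, 0.58, 0.50, 0.52, 0.57],
}


def best_epoch(history_dict, key="val_loss"):
    """Return the 1-based epoch with the lowest value for `key`."""
    values = history_dict[key]
    return min(range(len(values)), key=values.__getitem__) + 1


print(best_epoch(example_history))
```

With a real run, history.history can be passed in place of example_history.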

Prediction

Now we will use model.predict() to get a prediction. For each input image it returns an array of size 10 containing the probability of each label. The label with the highest probability is the predicted class.

Python3

# There are 10 output labels for the 
# Fashion MNIST dataset
labels = ['t_shirt', 'trouser', 'pullover',
          'dress', 'coat', 'sandal', 'shirt',
          'sneaker', 'bag', 'ankle_boots']
 
# Make a prediction for the first test image,
# cast to float32 as during training
predictions = model.predict(testX[:1].astype(np.float32))
label = labels[np.argmax(predictions)]
 
print(label)
plt.imshow(testX[0], cmap='gray')
plt.show()

Output:
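
The argmax step can be tried without a trained model. The sketch below uses a made-up probability array with the same shape as the output of model.predict(testX[:1]), namely one row of 10 values. It passes axis=1 to np.argmax so that the same code would also work for a batch of several images.

```python
import numpy as np

labels = ['t_shirt', 'trouser', 'pullover', 'dress', 'coat',
          'sandal', 'shirt', 'sneaker', 'bag', 'ankle_boots']

# Made-up probabilities with the same shape as model.predict(testX[:1]).
fake_predictions = np.array([[0.01, 0.01, 0.02, 0.01, 0.02,
                              0.05, 0.01, 0.10, 0.02, 0.75]])

# Index of the largest probability in the first (and only) row.
class_index = int(np.argmax(fake_predictions, axis=1)[0])
confidence = float(fake_predictions[0, class_index])
print(labels[class_index], confidence)
```

Here the largest probability sits at index 9, so the predicted label is ankle_boots.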
