How can TensorFlow be used to standardize data using Python?

22 July 2024

In this article, we will see how to standardize data using TensorFlow in Python.

What is Data Standardization?

Data standardization is the process of converting datasets with different organizational structures into a single, standard data format. It covers the transformations applied to data after it has been collected from various sources and before it is loaded into target systems. The process can take a significant amount of time and iteration, but it makes the later integration and development work more accurate and efficient.

How can TensorFlow be used to standardize the data?

We use the flower dataset to show how TensorFlow can be used to standardize data in Python. The dataset contains several thousand labeled images of flowers, organized into five sub-directories, one for each class. It is downloaded with the ‘get_file’ method and then loaded into the environment.
Before downloading the dataset, we need to import some Python libraries. The code below was run in Google Colab (Google Colaboratory).

Import libraries

In the first step, we import the TensorFlow and Python libraries that are used in the rest of the article.

Python

import matplotlib.pyplot as plt 
import numpy as np 
import os 
import PIL 
import tensorflow as tf 
from tensorflow import keras 
from tensorflow.keras import layers 
from tensorflow.keras.models import Sequential 
import pathlib as pt 

Download the Dataset

We are using the flower dataset, which contains five sub-directories, one for each class. To use the dataset, we first need to download it with the get_file() method.

Python3

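# URL of the flower photos archive (split across two adjacent string literals)
dataset_url = ("https://storage.googleapis.com/"
               "download.tensorflow.org/example_images/flower_photos.tgz")
# download and extract the archive, then wrap the returned path in a Path object
data_dir = tf.keras.utils.get_file('flower_photos',
                                   origin=dataset_url,
                                   untar=True)
data_dir = pt.Path(data_dir)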

After the download you should have a local copy of the dataset, which contains 3,670 images in total. You can count the images in the dataset with the code below:

Python3

img_count = len(list(data_dir.glob('*/*.jpg'))) 
print(img_count)

Output:

3670

The dataset has five categories of flowers: roses, tulips, daisy, dandelion, and sunflowers. You can inspect the images of a category by its folder name, for example by opening the first rose image with the code below:

Python3

roses = list(data_dir.glob('roses/*')) 
PIL.Image.open(str(roses[0]))
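
The per-category lookup above can be extended to count the images in every class folder. The following is a minimal sketch that uses only the standard library. The helper name count_images_per_class and the temporary folder filled with empty placeholder files are assumptions made so that the example runs without downloading anything; with the real dataset you would pass data_dir instead.

```python
import pathlib
import tempfile


def count_images_per_class(root):
    """Return {class_name: number of .jpg files} for each sub-directory of root."""
    root = pathlib.Path(root)
    return {
        sub.name: len(list(sub.glob('*.jpg')))
        for sub in sorted(root.iterdir())
        if sub.is_dir()  # skip loose files that sit next to the class folders
    }


# Build a tiny stand-in directory tree (hypothetical data, not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_root = pathlib.Path(tmp)
    for class_name, n_images in [('roses', 3), ('tulips', 2)]:
        class_dir = demo_root / class_name
        class_dir.mkdir()
        for i in range(n_images):
            (class_dir / f'{i}.jpg').touch()

    demo_counts = count_images_per_class(demo_root)

print(demo_counts)
```

Summing the values of the returned dictionary gives the total number of images, which should agree with the count obtained from glob('*/*.jpg').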

Load the Dataset

To load the dataset, you need to define some parameters for the loader. We also split the dataset: with validation_split=0.4, 60% of the flower images are used for training and the remaining 40% are held out for validation.

Python3

batch_size = 32
img_height = 180
img_width = 180
  
train_ds = tf.keras.utils.image_dataset_from_directory( 
    data_dir, 
    validation_split=0.4, 
    subset="training", 
    seed=123, 
    image_size=(img_height, img_width), 
    batch_size=batch_size) 

Output:

Found 3670 files belonging to 5 classes.
Using 2202 files for training.
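
The file counts in this output follow from the validation_split value. As a rough sketch of the arithmetic, the 60/40 split of 3,670 files can be worked out as shown here. The helper split_counts is a hypothetical name, and truncating the validation count to a whole number is an assumption about how the loader rounds.

```python
def split_counts(total_files, validation_split):
    """Return (training_count, validation_count) for a given split fraction."""
    # Assumption: the validation share is truncated to a whole number of files.
    validation_count = int(total_files * validation_split)
    training_count = total_files - validation_count
    return training_count, validation_count


train_count, val_count = split_counts(3670, 0.4)
print("training files:", train_count)
print("validation files:", val_count)
```

The held-out files can be loaded by calling image_dataset_from_directory again with subset="validation" and the same seed.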

Standardize the dataset

The RGB channel values are in the range [0, 255]. This is not ideal for a neural network; in general, you should keep your input values small.

We can rescale the values to the range [0, 1] by using a rescaling layer (tensorflow.keras.layers.Rescaling).

Python3

# create the normalization layer, which multiplies every pixel value by 1/255
# (on TensorFlow versions before 2.6 the layer lives under
# layers.experimental.preprocessing)
nrmlztn_layer = layers.Rescaling(1./255)

print("The map function is used to apply this layer to the dataset.")
# apply the layer to every batch of images and leave the labels unchanged
nrmlztn_ds = train_ds.map(lambda x, y: (nrmlztn_layer(x), y))
image_batch, labels_batch = next(iter(nrmlztn_ds))

first_image = image_batch[0]

# pixel values are now in the range [0, 1]
print("minimum pixel value:", np.min(first_image),
      " maximum pixel value:", np.max(first_image))

Output:

The map function is used to apply this layer to the dataset.
minimum pixel value: 0.0  maximum pixel value: 0.87026095
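
To see what the rescaling step computes, the same arithmetic can be reproduced with NumPy. This is a sketch and not the layer's implementation: it assumes the layer multiplies each input by scale and then adds offset (0 by default). The function name rescale and the sample pixel values are made up for illustration.

```python
import numpy as np


def rescale(pixels, scale=1.0 / 255, offset=0.0):
    """Mimic a rescaling layer: output = pixels * scale + offset."""
    return np.asarray(pixels, dtype=np.float32) * scale + offset


# A made-up 2x2 single-channel "image" with values in [0, 255].
sample = np.array([[0, 51], [102, 255]], dtype=np.uint8)

unit_range = rescale(sample)                        # maps [0, 255] to [0, 1]
signed_range = rescale(sample, 1.0 / 127.5, -1.0)   # maps [0, 255] to [-1, 1]

print("range [0, 1]: ", unit_range.min(), unit_range.max())
print("range [-1, 1]:", signed_range.min(), signed_range.max())
```

In TensorFlow itself, the [-1, 1] variant corresponds to layers.Rescaling(1./127.5, offset=-1).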
