
Load CSV data in TensorFlow

This article looks at how to load CSV data in the Python programming language using TensorFlow.

The TensorFlow library provides the make_csv_dataset() function (in tf.data.experimental), which reads CSV data and makes it available as a tf.data.Dataset for use in our programs.

Loading a Single CSV File

To download a single CSV data file from a URL, we use the Keras get_file function. Here we will use the Titanic dataset.

To use this, we add the following lines to our code:

Python3

import tensorflow as tf
from tensorflow.keras import layers
import pandas as pd

# Download the Titanic training CSV and return its local path
data_path = tf.keras.utils.get_file(
    "data_train.csv",
    "https://storage.googleapis.com/tf-datasets/titanic/train.csv")

# Build a tf.data.Dataset that yields (features, label) batches from the CSV
data_train_tf = tf.data.experimental.make_csv_dataset(
    data_path,
    batch_size=10,
    label_name='survived',
    num_epochs=1,
    ignore_errors=True)


Because label_name was set, each element of the dataset is a (features, label) pair: the first item is a dict of the data columns and the second is the batch of labels. Within the features dict, each column/feature name acts as a key, and the batch of values from that column is its value.

Python3

# Take one batch and print each feature column with its batch of values
for batch, label in data_train_tf.take(1):
    for key, value in batch.items():
        print(f"{key:10s}: {value}")


Output:

[Output image: Loading CSV using Tensorflow]
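
Since label_name='survived' was passed to make_csv_dataset, every element also carries a batch of labels alongside the feature dict. Below is a minimal sketch, building on the data_train_tf pipeline above (the 'age' column is assumed from the Titanic CSV), that prints the label batch and looks up a single feature column by name:

Python3

# Each element is a (features_dict, label_batch) pair because label_name was set
for batch, label in data_train_tf.take(1):
    # Batch of 'survived' labels, one per row in the batch
    print(f"{'survived':10s}: {label}")
    # A single feature column can be looked up by its name, e.g. 'age'
    print(f"{'age':10s}: {batch['age']}")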

Loading Multiple CSV Files

The primary use of the make_csv_dataset method can be seen when we have to import multiple CSV files into one dataset. We will use the fonts dataset, which is spread across many CSV files covering different fonts.

Example: In this example, we use the Keras get_file utility to download the fonts archive to disk; cache_dir and cache_subdir define where it is stored and extracted.

Once we have the files saved, the file_pattern argument of make_csv_dataset lets us specify a glob pattern matching all the CSV files to be imported. Create a new file and execute the following code:

Python3

# Download and extract the fonts archive; cache_dir and cache_subdir control
# where it is stored (the URL is the UCI "Character Font Images" archive used
# by the TensorFlow CSV tutorial)
fonts = tf.keras.utils.get_file(
    'fonts.zip',
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00417/fonts.zip",
    cache_dir='.', cache_subdir='fonts',
    extract=True)

# Read every CSV file matching the glob pattern into a single dataset
fonts_data = tf.data.experimental.make_csv_dataset(
    file_pattern="fonts/*.csv",
    batch_size=10, num_epochs=1,
    num_parallel_reads=4,
    shuffle_buffer_size=10000)

# Print the first few feature columns of one batch, then the total column count
for features in fonts_data.take(1):
    for i, (name, value) in enumerate(features.items()):
        if i > 15:
            break
        print(f"{name:20s}: {value}")
    print(f"[total: {len(features)} features]")


We display the first few feature columns of the batch along with their values, and then print the total number of feature columns using the len() function. In this example, there are 412 features in total.

Output:

[Output image: Loading CSVs using Tensorflow]
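
Because each batch arrives as a dict with several hundred columns, it is often easier to work with the pixel values as a single tensor. The following is only a sketch along the lines of the official TensorFlow CSV tutorial, assuming the fonts_data pipeline above and that the pixel columns are named like 'r0c0', 'r0c1', and so on:

Python3

import re
import tensorflow as tf

def pack_pixels(features):
    # Collect the pixel columns (names such as 'r0c0' ... 'r19c19') into a list
    pixel_cols = [name for name in features if re.fullmatch(r"r\d+c\d+", name)]
    # Stack them along the last axis: shape becomes (batch_size, num_pixel_columns)
    return tf.stack([tf.cast(features[name], tf.float32) for name in pixel_cols],
                    axis=-1)

for features in fonts_data.take(1):
    pixels = pack_pixels(features)
    print(pixels.shape)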
