TensorFlow is a free, open-source machine learning and artificial intelligence library that is widely used for training and deploying neural networks. It is developed by the Google Brain team and supports a wide range of platforms. In this tutorial, we will learn to download, load, and explore the famous Iliad dataset.
The Iliad dataset consists of several different English translations of the same Homer's Iliad text. TensorFlow has preprocessed the documents so that each file focuses on the text of the translation itself. The dataset is available at the following URL.
https://storage.googleapis.com/download.tensorflow.org/data/illiad/
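If you want a quick look at the raw text before loading it with TensorFlow, you can fetch one of the files directly. Below is a minimal sketch using only the Python standard library; the file name cowper.txt comes from the example that follows.

Python3

import urllib.request

# Fetch one translation file and print the first few lines of raw text
url = ("https://storage.googleapis.com/download.tensorflow.org"
       "/data/illiad/cowper.txt")
with urllib.request.urlopen(url) as response:
    lines = response.read().decode("utf-8").splitlines()

for line in lines[:3]:
    print(line)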
Example: In the following example, we will take the works of three translators: William Cowper; Edward, Earl of Derby; and Samuel Butler. Then, with the help of TensorFlow, we will load their works and label each line of text with its translator.
Install the TensorFlow text package:
pip install "tensorflow-text==2.8.*"
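To confirm the package installed cleanly, you can import it alongside TensorFlow and print the versions. This is just a quick sanity check; the exact version strings will depend on your environment.

Python3

import tensorflow as tf
import tensorflow_text as tf_text

# The two packages should have matching minor versions, e.g. both 2.8.x
print(tf.__version__)
print(tf_text.__version__)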
Download and load the Iliad dataset
We need to label each dataset individually, so we use the Dataset.map function to attach a label to every line. This returns example-label pairs.
Python3
import pathlib

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import losses
from tensorflow.keras import utils
from tensorflow.keras.layers import TextVectorization

import tensorflow_datasets as tfds
import tensorflow_text as tf_text

print("Welcome to neveropen")
print("Loading the Illiad dataset")

DIRECTORY_URL = ('https://storage.googleapis.com/'
                 'download.tensorflow.org/data/illiad/')
FILE_NAMES = ['cowper.txt', 'derby.txt', 'butler.txt']

# Download each translation and cache it locally
for name in FILE_NAMES:
    text_dir = utils.get_file(name, origin=DIRECTORY_URL + name)

parent_dir = pathlib.Path(text_dir).parent

# Attach an integer label (the translator's index) to each line
def labeler(example, index):
    return example, tf.cast(index, tf.int64)

labeled_data_sets = []

for i, file_name in enumerate(FILE_NAMES):
    lines_dataset = tf.data.TextLineDataset(str(parent_dir / file_name))
    labeled_dataset = lines_dataset.map(lambda ex: labeler(ex, i))
    labeled_data_sets.append(labeled_dataset)

labeled_data_sets
Output:
[<MapDataset element_spec=(TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>,
<MapDataset element_spec=(TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>,
<MapDataset element_spec=(TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>]
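Each MapDataset above yields (text, label) pairs, as the element_spec shows. To see the concrete values behind that spec, you can pull a single element out of one of the datasets; the exact line printed depends on the file contents.

Python3

# Inspect one (example, label) pair from the first labeled dataset
for text, label in labeled_data_sets[0].take(1):
    print(text.numpy())   # a line from cowper.txt, as a byte string
    print(label.numpy())  # 0, the index of cowper.txt in FILE_NAMES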
Concatenate and shuffle the datasets. They are concatenated using the Dataset.concatenate function, and the shuffle function is used to shuffle the data. We then print out a few examples.
Python3
BUFFER_SIZE = 50000
BATCH_SIZE = 64
VALIDATION_SIZE = 5000

# Combine the three labeled datasets into one
all_labeled_data = labeled_data_sets[0]
for labeled_dataset in labeled_data_sets[1:]:
    all_labeled_data = all_labeled_data.concatenate(labeled_dataset)

# Shuffle once so lines from the three translators are mixed together
all_labeled_data = all_labeled_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)

# Print a few example-label pairs
for text, label in all_labeled_data.take(5):
    print("Sentence: ", text.numpy())
    print("Label:", label.numpy())
Output:
Sentence: b"Of brass, and color'd with a ring of gold."
Label: 0
Sentence: b'drove the horses in among the others.'
Label: 2
Sentence: b'Into the boundless ether. Reaching soon'
Label: 0
Sentence: b"Drive to the ships, for pain weigh'd down his soul."
Label: 1
Sentence: b"Not one is station'd to protect the camp."
Label: 1
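Note that BATCH_SIZE and VALIDATION_SIZE are defined above but not yet used. A minimal sketch of how they are typically applied follows: hold out a validation split and batch both sets. Because we shuffled with reshuffle_each_iteration=False, the split stays stable across epochs.

Python3

# Hold out the first VALIDATION_SIZE examples for validation,
# train on the rest; batch both splits.
train_data = all_labeled_data.skip(VALIDATION_SIZE).batch(BATCH_SIZE)
validation_data = all_labeled_data.take(VALIDATION_SIZE).batch(BATCH_SIZE)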