In this article, we will learn the difference between from_tensors and from_tensor_slices. Both functions convert data into a TensorFlow input pipeline (tf.data.Dataset), but they do so in different ways, and that is where the difference lies. Suppose we have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and we wish to convert it to the TensorFlow type tf.data.Dataset.
Difference between Dataset.from_tensors and Dataset.from_tensor_slices
Now we have two methods to do this: Dataset.from_tensors and Dataset.from_tensor_slices.
from_tensors – This method combines the entire input into a single dataset element, so it can be used to merge several smaller tensors into one larger element.
from_tensor_slices – This method is generally used while training machine learning models with a data input pipeline. It slices the input along its first dimension and helps us combine independent features and their targets into one dataset.
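The core difference shows up in how many elements each method produces from the same input. Here is a minimal sketch (the variable names are illustrative):

```python
import tensorflow as tf

data = tf.constant([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

# from_tensors wraps the entire tensor in a single dataset element.
whole = tf.data.Dataset.from_tensors(data)

# from_tensor_slices slices along the 0th dimension: one element per row.
rows = tf.data.Dataset.from_tensor_slices(data)

print(len(list(whole)))  # 1
print(len(list(rows)))   # 3
```

The same (3, 2) matrix becomes a one-element dataset in the first case and a three-element dataset in the second.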
We will try to understand these differences one by one using code examples. First of all, the main condition for using from_tensor_slices is that the input tensors must have the same dimension at the 0th rank.
Necessary Condition of Shapes in from_tensors and from_tensor_slices
There is a condition for using the from_tensor_slices function, but no such condition exists for from_tensors. The condition is that all input tensors must be the same size along their first axis, also referred to as having the same dimension at the 0th rank.
Python3
import tensorflow as tf

ds1 = (
    tf.data.Dataset
    .from_tensors((tf.random.uniform([10, 4]),
                   tf.random.uniform([9])))
)
ds1
Output:
<TensorDataset element_spec=(TensorSpec(shape=(10, 4), dtype=tf.float32, name=None), TensorSpec(shape=(9,), dtype=tf.float32, name=None))>
Now if we try to do the same using the from_tensor_slices method, we will get an incompatibility error.
Python3
ds2 = (
    tf.data.Dataset
    .from_tensor_slices((tf.random.uniform([10, 4]),
                         tf.random.uniform([9])))
)
ds2
Output:
#ERROR
The above code gives an error because the necessary condition of equal dimensions at the 0th rank is not met: the first tensor has 10 rows while the second has only 9.
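When the 0th dimensions do agree, from_tensor_slices pairs features with labels element by element, which is its typical training-pipeline use. A small sketch (shapes chosen for illustration):

```python
import tensorflow as tf

features = tf.random.uniform([10, 4])
labels = tf.random.uniform([10])  # 0th dimension matches features: 10 == 10

ds = tf.data.Dataset.from_tensor_slices((features, labels))
print(ds.cardinality().numpy())  # 10: one (feature, label) pair per row

for x, y in ds.take(1):
    print(x.shape, y.shape)  # (4,) ()
```

Each of the 10 elements is a (feature row, label) pair, which is exactly the structure a training loop consumes.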
Way of Combining Input Data in from_tensors & .from_tensor_slices
The from_tensors method combines smaller tensors into one larger dataset element, but from_tensor_slices does no such combining. Let's look at this using the implementation below.
Python3
dataset1 = tf.data.Dataset.from_tensors(
    [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])])
print(dataset1)
Output:
<TensorDataset element_spec=TensorSpec(shape=(2, 2, 3), dtype=tf.float32, name=None)>
From the shape of the above dataset, (2, 2, 3), we can see that it has stacked the two (2, 3) tensors into a single element.
Python3
dataset2 = tf.data.Dataset.from_tensor_slices(
    [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])])
print(dataset2)
Output:
<TensorSliceDataset element_spec=TensorSpec(shape=(2, 3), dtype=tf.float32, name=None)>
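Comparing the two element specs side by side makes this concrete: from_tensors stacks the list of (2, 3) tensors into one (2, 2, 3) element, while from_tensor_slices keeps two separate (2, 3) elements. A quick check (a sketch reusing the shapes above):

```python
import tensorflow as tf

tensors = [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])]

d1 = tf.data.Dataset.from_tensors(tensors)        # one element, shape (2, 2, 3)
d2 = tf.data.Dataset.from_tensor_slices(tensors)  # two elements, shape (2, 3) each

print(d1.element_spec.shape)  # (2, 2, 3)
print(d2.element_spec.shape)  # (2, 3)
print(len(list(d1)), len(list(d2)))  # 1 2
```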
Way of Interpreting Input Data in from_tensors and from_tensor_slices
The next difference lies in the way the input data is treated by these two functions.
Python3
t1 = tf.constant([[1, 2], [3, 4]])
ds1 = tf.data.Dataset.from_tensors(t1)
[x for x in ds1]
Output:
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy= array([[1, 2], [3, 4]], dtype=int32)>]
Now let's look at the from_tensor_slices output for the same input matrix.
Python3
t2 = tf.constant([[1, 2], [3, 4]])
ds2 = tf.data.Dataset.from_tensor_slices(t2)
[x for x in ds2]
Output:
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>, <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]
From the above outputs, we can say that from_tensors treats the whole input as a single entity, whereas from_tensor_slices slices the input row-wise along its first dimension. Even though the input matrix was identical, ds1 contains a single element of shape (2, 2), while ds2 contains two elements, each of shape (2,).
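In practice, from_tensor_slices is the usual entry point for a training pipeline, since the per-example slices can then be shuffled and batched. A minimal sketch (the data here is made up for illustration):

```python
import tensorflow as tf

features = tf.constant([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
labels = tf.constant([0, 1, 0, 1])

# Slice row-wise, shuffle the 4 examples, then group them into batches of 2.
ds = (tf.data.Dataset.from_tensor_slices((features, labels))
      .shuffle(4)
      .batch(2))

for x, y in ds:
    print(x.shape, y.shape)  # (2, 2) (2,)
```

Had we started from from_tensors instead, the dataset would hold one monolithic element and shuffle/batch would have nothing meaningful to operate on.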