TensorFlow is an open-source machine-learning framework that has become extremely popular in recent years. It is widely used for building and training deep neural networks, as well as for implementing other machine-learning algorithms. Estimators are a high-level TensorFlow API that simplifies the process of building, training, evaluating, and deploying machine-learning models. The API provides a uniform interface for working with pre-built models or defining custom ones, while abstracting away many of TensorFlow's low-level details.
The Iris dataset is a popular machine-learning dataset that contains measurements of various characteristics of iris flowers, such as the length and width of petals and sepals. The goal is to classify each iris into one of three species based on these measurements. The Iris dataset is often used as a reference dataset in machine-learning research and is an excellent dataset for exploring TensorFlow and Estimators.
Here we load the dataset directly from the UCI Machine Learning Repository, so an active internet connection is required to run this code.
Before we begin, make sure you have TensorFlow, scikit-learn, and pandas installed on your system. You can install them using pip:
pip install tensorflow
pip install scikit-learn
pip install pandas
Importing the necessary libraries
Python3
import tensorflow as tf
import pandas as pd
Load the iris dataset
Next, let’s load the iris dataset into a pandas DataFrame:
Python3
iris_data = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None)
iris_data.columns = ['sepal_length', 'sepal_width',
                     'petal_length', 'petal_width', 'species']
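If no internet connection is available, the same dataset ships with scikit-learn. A minimal sketch of loading it into an equivalent DataFrame (column names chosen to match the ones above):

Python3

from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
iris_data = pd.DataFrame(iris.data, columns=['sepal_length', 'sepal_width',
                                             'petal_length', 'petal_width'])
# load_iris() already encodes the species as the integers 0, 1, 2,
# so the string-to-integer mapping step below is not needed in this case
iris_data['species'] = iris.target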
Split the dataset into training and testing sets
The iris dataset contains 150 samples, 50 for each of the three species. Since DNNClassifier expects integer class labels, we first map the species strings to the integers 0, 1, and 2, and then split the dataset into training and testing sets using the train_test_split function from the scikit-learn library:
Python3
from sklearn.model_selection import train_test_split

# DNNClassifier expects integer class labels, so map the species strings first
label_map = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
iris_data['label'] = iris_data['species'].map(label_map)

train_data, test_data, train_labels, test_labels = train_test_split(
    iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']],
    iris_data['label'],
    test_size=0.2)
Define the feature columns

Now that the dataset is split into training and testing sets, let's define the feature columns using the tf.feature_column API. Feature columns map raw input data into a format that can be fed to a TensorFlow model. In this case, we define one numeric feature column for each of the four input features in the iris dataset:
Python3
feature_columns = [
    tf.feature_column.numeric_column('sepal_length'),
    tf.feature_column.numeric_column('sepal_width'),
    tf.feature_column.numeric_column('petal_length'),
    tf.feature_column.numeric_column('petal_width')
]
TensorFlow Estimator
Next, let's create an Estimator object using the DNNClassifier class. This gives us a deep neural network that can classify the iris flowers based on the input features:
Python3
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='model')
In this case, we are creating a neural network with two hidden layers, each with 10 nodes. The n_classes parameter is set to 3, since there are three possible classes in the iris dataset. The model_dir parameter specifies the directory where the TensorFlow model will be saved.
Define the input functions
Now, let’s define the input functions that will feed data into the Estimator. We will define two input functions, one for the training data and one for the testing data:
Python3
train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_data,
    y=train_labels,
    batch_size=32,
    shuffle=True)

test_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=test_data,
    y=test_labels,
    batch_size=32,
    shuffle=False)
The pandas_input_fn function creates input functions from pandas DataFrames; in TensorFlow 2.x it lives under tf.compat.v1, since Estimator input pipelines predate the 2.x API. The batch_size parameter specifies how many samples are fed to the model at once. The shuffle parameter is set to True for the training input function so that the training data is shuffled before being fed to the model.
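Under the hood, an input function simply returns batched (features, labels) pairs, and the same thing can be expressed with the native tf.data API. Here is a minimal sketch of a drop-in alternative (make_input_fn is a hypothetical helper, not part of the original tutorial):

Python3

def make_input_fn(features_df, labels, batch_size=32, shuffle=True):
    """Return an input_fn that yields (features, labels) batches via tf.data."""
    def input_fn():
        # A dict of column name -> values matches the feature columns above
        ds = tf.data.Dataset.from_tensor_slices((dict(features_df), labels))
        if shuffle:
            ds = ds.shuffle(buffer_size=len(features_df))
        return ds.batch(batch_size)
    return input_fn

train_input_fn = make_input_fn(train_data, train_labels, shuffle=True)
test_input_fn = make_input_fn(test_data, test_labels, shuffle=False)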
Train the Estimator
Now that we have defined the Estimator and the input functions, we can train the model using the train method:
Python3
estimator.train(input_fn=train_input_fn, steps=1000)
The train method trains the model using the given input function for the requested number of steps.
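Note that steps counts batches, not passes over the dataset. With 120 training samples and a batch size of 32, one epoch takes ceil(120 / 32) = 4 steps. A small hypothetical helper (not part of the original code) makes the conversion explicit:

Python3

import math

def epochs_to_steps(num_samples, batch_size, num_epochs):
    # One step processes one batch; an epoch needs ceil(samples / batch) steps
    return num_epochs * math.ceil(num_samples / batch_size)

# e.g. 250 epochs over the 120 training samples at batch size 32
print(epochs_to_steps(120, 32, 250))  # 1000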
Evaluation
Finally, we can evaluate the performance of the model on the testing data using the evaluate method:
Python3
eval_result = estimator.evaluate(input_fn=test_input_fn)
print(eval_result)
Output:
{'accuracy': 0.33333334, 'average_loss': 1.6798068, 'loss': 1.6798068, 'global_step': 8}
The evaluate method returns a dictionary of performance metrics, such as accuracy and average loss, which we can print to see how well the model performs on the testing data. Note that the accuracy here is at chance level (one in three): pandas_input_fn defaults to num_epochs=1, so the training input function is exhausted after a single pass over the 120 training samples (about 4 steps) and the steps=1000 limit is never reached, as the small global_step in the output suggests. Passing num_epochs=None to the training input function lets training run for the full 1,000 steps and yields a much higher accuracy.
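Once evaluated, the same Estimator can produce predictions for new flowers with its predict method. A minimal sketch (the sample measurements are made up for illustration):

Python3

# A single made-up flower measurement for illustration
sample = pd.DataFrame({'sepal_length': [5.1], 'sepal_width': [3.5],
                       'petal_length': [1.4], 'petal_width': [0.2]})

predict_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=sample, shuffle=False)

for pred in estimator.predict(input_fn=predict_input_fn):
    class_id = pred['class_ids'][0]                 # predicted integer label
    probability = pred['probabilities'][class_id]   # model confidence
    print(class_id, probability)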
Complete code:
Python3
import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the iris dataset
iris_data = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None)
iris_data.columns = ['sepal_length', 'sepal_width',
                     'petal_length', 'petal_width', 'species']

# Map string labels to integers
label_map = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
iris_data['label'] = iris_data['species'].map(label_map)

# Split the dataset into training and testing sets
train_data, test_data, train_labels, test_labels = train_test_split(
    iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']],
    iris_data['label'],
    test_size=0.2)

# Define the feature columns
feature_columns = [
    tf.feature_column.numeric_column('sepal_length'),
    tf.feature_column.numeric_column('sepal_width'),
    tf.feature_column.numeric_column('petal_length'),
    tf.feature_column.numeric_column('petal_width')
]

# Define the Estimator
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='model')

# Define the input functions
train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_data,
    y=train_labels,
    batch_size=32,
    shuffle=True)
test_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=test_data,
    y=test_labels,
    batch_size=32,
    shuffle=False)

# Train the model
estimator.train(input_fn=train_input_fn, steps=1000)

# Evaluate the model
eval_result = estimator.evaluate(input_fn=test_input_fn)
print(eval_result)
Output:
{'accuracy': 0.33333334, 'average_loss': 1.6798068, 'loss': 1.6798068, 'global_step': 8}