In this article, we will learn how to build a fuel efficiency prediction model using the TensorFlow API. The dataset we will be using contains features like the distance the engine has traveled, the number of cylinders in the car, and other relevant features.
Importing Libraries
- Pandas – This library helps to load the data frame in a 2D array format and has multiple functions to perform analysis tasks in one go.
- Numpy – Numpy arrays are very fast and can perform large computations in a very short time.
- Matplotlib – This library is used to draw visualizations.
- Sklearn – This module contains multiple libraries having pre-implemented functions to perform tasks from data preprocessing to model development and evaluation.
- Seaborn – This library builds on Matplotlib and provides high-level functions for drawing statistical visualizations such as heatmaps.
- Tensorflow – This is an open-source library that is used for Machine Learning and Artificial intelligence and provides a range of functions to achieve complex functionalities with single lines of code.
Python3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

import tensorflow as tf
from tensorflow import keras
from keras import layers

import warnings
warnings.filterwarnings('ignore')
The dataset can be downloaded from here.
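If the CSV file is not available locally, a very similar copy of this data ships with seaborn. This alternative is an assumption about your setup (it needs internet access on first use), and the seaborn copy uses slightly different column names (for example model_year and name instead of model year and car name):
Python3
# Alternative source (assumption: internet access on first call).
# Seaborn bundles the UCI Auto MPG data, but column names differ
# slightly and 'horsepower' arrives already parsed as numeric.
df_alt = sb.load_dataset('mpg')
df_alt.head()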
Python3
df = pd.read_csv('auto-mpg.csv')
df.head()
Output:
Let’s check the shape of the data.
Python3
df.shape
Output:
(398, 9)
Now, check the datatypes of the columns.
Python3
df.info()
Output:
Here we can observe one discrepancy: the horsepower column is stored as the object datatype, whereas it should be numeric.
Python3
df.describe()
Output:
Exploratory Data Analysis
As noted in the df.info() output above, we will first deal with the horsepower column and then move on to the analysis.
Python3
df['horsepower'].unique()
Output:
Here we can observe that the missing values have been replaced by the string '?'; because of this, the column was read in as the object datatype.
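Before dropping anything, it is worth knowing the alternative: the '?' entries can be coerced to NaN and imputed instead. This is only a sketch of that option; the article itself simply removes the affected rows in the next snippet.
Python3
# Alternative (not used below): coerce non-numeric entries like '?'
# to NaN, then impute them with the column median.
hp = pd.to_numeric(df['horsepower'], errors='coerce')
print(hp.isna().sum())            # the '?' entries show up as NaN
# df['horsepower'] = hp.fillna(hp.median())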
Python3
print(df.shape)
df = df[df['horsepower'] != '?']
print(df.shape)
Output:
(398, 9)
(392, 9)
So, there were 6 such rows with a question mark.
Python3
df['horsepower'] = df['horsepower'].astype(int)
df.isnull().sum()
Output:
mpg             0
cylinders       0
displacement    0
horsepower      0
weight          0
acceleration    0
model year      0
origin          0
car name        0
dtype: int64
Python3
df.nunique()
Output:
mpg             127
cylinders         5
displacement     81
horsepower       93
weight          346
acceleration     95
model year       13
origin            3
car name        301
dtype: int64
Python3
plt.subplots(figsize=(15, 5))
for i, col in enumerate(['cylinders', 'origin']):
    plt.subplot(1, 2, i + 1)
    # Average mpg within each category; selecting 'mpg' before .mean()
    # avoids errors from non-numeric columns like 'car name'.
    x = df.groupby(col)['mpg'].mean()
    x.plot.bar()
    plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
Output:
Here we can observe that the mpg values are highest for origin 3 (which corresponds to Japan in this dataset).
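To read the exact values behind the bars, the grouped means can also be printed directly (a quick check, not part of the original code):
Python3
# Average mpg per origin; origin 3 has the highest mean.
print(df.groupby('origin')['mpg'].mean())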
Python3
plt.figure(figsize=(8, 8))
# numeric_only=True skips the string 'car name' column, which would
# otherwise raise an error in recent versions of pandas.
sb.heatmap(df.corr(numeric_only=True) > 0.9, annot=True, cbar=False)
plt.show()
Output:
Removing the displacement feature will eliminate this problem of high collinearity.
Python3
df.drop('displacement', axis=1, inplace=True)
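As a sanity check, we can scan for any remaining highly correlated pair programmatically. This is a minimal sketch assuming a recent pandas version (for the numeric_only flag); after dropping displacement it should print nothing.
Python3
# Report any remaining feature pair with |correlation| > 0.9.
corr = df.corr(numeric_only=True)
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if abs(corr.loc[a, b]) > 0.9:
            print(a, b, round(corr.loc[a, b], 3))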
Data Input Pipeline
Python3
from sklearn.model_selection import train_test_split

features = df.drop(['mpg', 'car name'], axis=1)
target = df['mpg'].values

X_train, X_val, Y_train, Y_val = train_test_split(features, target,
                                                  test_size=0.2,
                                                  random_state=22)
X_train.shape, X_val.shape
Output:
((313, 6), (79, 6))
Python3
AUTO = tf.data.experimental.AUTOTUNE

train_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_train, Y_train))
    .batch(32)
    .prefetch(AUTO)
)

val_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_val, Y_val))
    .batch(32)
    .prefetch(AUTO)
)
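It can be reassuring to pull one batch from the pipeline and confirm the shapes before building the model (a quick sanity check, not part of the original article):
Python3
# Take a single batch and verify that features are (batch, 6)
# and targets are (batch,).
for x_batch, y_batch in train_ds.take(1):
    print(x_batch.shape, y_batch.shape)   # e.g. (32, 6) (32,)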
Model Architecture
We will implement a model using the Sequential API of Keras which will contain the following parts:
- We will have two fully connected layers.
- We have included BatchNormalization layers for stable and fast training, and a Dropout layer before the final layer to reduce the chance of overfitting.
- The final layer is the output layer.
Python3
model = keras.Sequential([
    layers.Dense(256, activation='relu', input_shape=[6]),
    layers.BatchNormalization(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.BatchNormalization(),
    layers.Dense(1, activation='relu')
])
While compiling a model we provide these three essential parameters:
- optimizer – This is the method that helps to optimize the cost function by using gradient descent.
- loss – The loss function by which we monitor whether the model is improving with training or not.
- metrics – Additional measures used to evaluate the model's predictions on the training and validation data.
Python3
model.compile(
    loss='mae',
    optimizer='adam',
    metrics=['mape']
)
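For intuition about these choices, MAE and MAPE are simple enough to compute by hand; the numbers below are purely illustrative:
Python3
# Hand computation of the compiled loss (MAE) and metric (MAPE)
# on dummy values, to make the definitions concrete.
y_true = np.array([15.0, 30.0, 22.0])
y_pred = np.array([14.0, 33.0, 20.0])

mae = np.mean(np.abs(y_true - y_pred))                   # 2.0
mape = 100 * np.mean(np.abs(y_true - y_pred) / y_true)   # ~8.59
print(mae, mape)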
Let’s print the summary of the model’s architecture:
Python3
model.summary()
Output:
Model Training
Now we will train our model using the training and validation pipeline.
Python3
history = model.fit(train_ds, epochs=50, validation_data=val_ds)
Output:
Epoch 45/50
10/10 [==============================] - 0s 14ms/step - loss: 2.8792 - mape: 12.5425 - val_loss: 5.3991 - val_mape: 28.6586
Epoch 46/50
10/10 [==============================] - 0s 8ms/step - loss: 2.9184 - mape: 12.7887 - val_loss: 4.1896 - val_mape: 21.4064
Epoch 47/50
10/10 [==============================] - 0s 9ms/step - loss: 2.8153 - mape: 12.3451 - val_loss: 4.3392 - val_mape: 22.3319
Epoch 48/50
10/10 [==============================] - 0s 9ms/step - loss: 2.7146 - mape: 11.7684 - val_loss: 3.6178 - val_mape: 17.7676
Epoch 49/50
10/10 [==============================] - 0s 10ms/step - loss: 2.7631 - mape: 12.1744 - val_loss: 6.4673 - val_mape: 33.2410
Epoch 50/50
10/10 [==============================] - 0s 10ms/step - loss: 2.6819 - mape: 11.8024 - val_loss: 6.0304 - val_mape: 31.6198
Python3
history_df = pd.DataFrame(history.history)
history_df.head()
Output:
Python3
history_df.loc[:, ['loss', 'val_loss']].plot()
history_df.loc[:, ['mape', 'val_mape']].plot()
plt.show()
Output:
The training error has gone down smoothly, but the validation error fluctuates from epoch to epoch, suggesting the model's performance on unseen data is less stable.
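One common way to tame a fluctuating validation loss, sketched here as a possible refinement rather than part of the run above, is to add an EarlyStopping callback that halts training once val_loss stops improving and restores the best weights:
Python3
# Possible refinement (not part of the training run shown above):
# stop once val_loss has not improved for 10 epochs and roll back
# to the best weights seen so far.
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True
)
# history = model.fit(train_ds, epochs=50,
#                     validation_data=val_ds,
#                     callbacks=[early_stop])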