This article was published as a part of the Data Science Blogathon
Introduction
The ongoing global pandemic called COVID-19 is caused by SARS-COV-2, which was first found and identified in Wuhan, China, in December 2019. The lockdown in Wuhan and its neighboring districts failed to contain this virus which led to one of the worst humanitarian crises in the modern world, affecting millions of people all around the globe.
The virus spread quickly and mutated which gave rise to several waves of it, mostly affecting the third world and developing countries. The number of affected people is rising steadily as the world’s governments try to control the spread.

In this article, we will use the CoronaHack- Chest X-Ray Dataset. It contains chest X-Ray images and we have to find the ones that are affected by the coronavirus.
The SARS-COV-2 which we talked previously about is the type of virus that majorly affects the respiratory system, so Chest X-ray is one of the important imaging methods we can use to identify an affected lung. Here’s a side by side comparison:

and Classification in Chest X-ray Images
So as you can see, how COVID-19 pneumonia can engulf the whole lungs and is more dangerous than both Bacterial as well as Viral types of pneumonia. I highly suggest you read the paper Transfer Learning for COVID-19 Pneumonia Detection
and Classification in Chest X-ray Images which I’ve linked in the above image.
In this article, we will use Deep learning and Transfer learning to classify and identify X-Ray Images of lungs affected with Covid-19.
Importing Libraries and Loading Data
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import numpy as np
import pandas as pd
sns.set()
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import Adam, SGD, RMSprop
from tensorflow.keras.applications import DenseNet121, VGG19, ResNet50
import PIL.Image
import matplotlib.pyplot as mpimg
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array
from tensorflow.keras.preprocessing import image
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")
from sklearn.utils import shuffle
train_df = pd.read_csv('../input/coronahack-chest-xraydataset/Chest_xray_Corona_Metadata.csv')
train_df.shape
> (5910, 6)
train_df.head(5)

train_df.info()

Dealing with Missing Values
missing_vals = train_df.isnull().sum() missing_vals.plot(kind = 'bar')

train_df.dropna(how = 'all') train_df.isnull().sum()

train_df.fillna('unknown', inplace=True)
train_df.isnull().sum()

train_data = train_df[train_df['Dataset_type'] == 'TRAIN']
test_data = train_df[train_df['Dataset_type'] == 'TEST']
assert train_data.shape[0] + test_data.shape[0] == train_df.shape[0]
print(f"Shape of train data : {train_data.shape}")
print(f"Shape of test data : {test_data.shape}")
test_data.sample(10)


We will fill the missing values with ‘unknown’.
print((train_df['Label_1_Virus_category']).value_counts())
print('--------------------------')
print((train_df['Label_2_Virus_category']).value_counts())

Thus the Label 2 category contains the COVID-19 cases!
Displaying Images
test_img_dir = '/kaggle/input/coronahack-chest-xraydataset/Coronahack-Chest-XRay-Dataset/Coronahack-Chest-XRay-Dataset/test' train_img_dir = '/kaggle/input/coronahack-chest-xraydataset/Coronahack-Chest-XRay-Dataset/Coronahack-Chest-XRay-Dataset/train' sample_train_images = list(os.walk(train_img_dir))[0][2][:8] sample_train_images = list(map(lambda x: os.path.join(train_img_dir, x), sample_train_images)) sample_test_images = list(os.walk(test_img_dir))[0][2][:8] sample_test_images = list(map(lambda x: os.path.join(test_img_dir, x), sample_test_images))
plt.figure(figsize = (10,10))
for iterator, filename in enumerate(sample_train_images):
image = PIL.Image.open(filename)
plt.subplot(4,2,iterator+1)
plt.imshow(image, cmap=plt.cm.bone)
plt.tight_layout()

Visualizations
plt.figure(figsize=(15,10)) sns.countplot(train_data['Label_2_Virus_category']);

For COVID-19 cases
fig, ax = plt.subplots(4, 2, figsize=(15, 10))
covid_path = train_data[train_data['Label_2_Virus_category']=='COVID-19']['X_ray_image_name'].values
sample_covid_path = covid_path[:4]
sample_covid_path = list(map(lambda x: os.path.join(train_img_dir, x), sample_covid_path))
for row, file in enumerate(sample_covid_path):
image = plt.imread(file)
ax[row, 0].imshow(image, cmap=plt.cm.bone)
ax[row, 1].hist(image.ravel(), 256, [0,256])
ax[row, 0].axis('off')
if row == 0:
ax[row, 0].set_title('Images')
ax[row, 1].set_title('Histograms')
fig.suptitle('Label 2 Virus Category = COVID-19', size=16)
plt.show()

For Normal cases
fig, ax = plt.subplots(4, 2, figsize=(15, 10))
normal_path = train_data[train_data['Label']=='Normal']['X_ray_image_name'].values
sample_normal_path = normal_path[:4]
sample_normal_path = list(map(lambda x: os.path.join(train_img_dir, x), sample_normal_path))
for row, file in enumerate(sample_normal_path):
image = plt.imread(file)
ax[row, 0].imshow(image, cmap=plt.cm.bone)
ax[row, 1].hist(image.ravel(), 256, [0,256])
ax[row, 0].axis('off')
if row == 0:
ax[row, 0].set_title('Images')
ax[row, 1].set_title('Histograms')
fig.suptitle('Label = NORMAL', size=16)
plt.show()

final_train_data = train_data[(train_data['Label'] == 'Normal') |
((train_data['Label'] == 'Pnemonia') &
(train_data['Label_2_Virus_category'] == 'COVID-19'))]
final_train_data['class'] = final_train_data.Label.apply(lambda x: 'negative' if x=='Normal' else 'positive') test_data['class'] = test_data.Label.apply(lambda x: 'negative' if x=='Normal' else 'positive') final_train_data['target'] = final_train_data.Label.apply(lambda x: 0 if x=='Normal' else 1) test_data['target'] = test_data.Label.apply(lambda x: 0 if x=='Normal' else 1)
final_train_data = final_train_data[['X_ray_image_name', 'class', 'target', 'Label_2_Virus_category']] final_test_data = test_data[['X_ray_image_name', 'class', 'target']]
test_data['Label'].value_counts()

Data Augmentation
datagen = ImageDataGenerator(
shear_range=0.2,
zoom_range=0.2,
)
def read_img(filename, size, path):
img = image.load_img(os.path.join(path, filename), target_size=size)
#convert image to array
img = image.img_to_array(img) / 255
return img
samp_img = read_img(final_train_data['X_ray_image_name'][0],
(255,255),
train_img_path)
plt.figure(figsize=(10,10))
plt.suptitle('Data Augmentation', fontsize=28)
i = 0
for batch in datagen.flow(tf.expand_dims(samp_img,0), batch_size=6):
plt.subplot(3, 3, i+1)
plt.grid(False)
plt.imshow(batch.reshape(255, 255, 3));
if i == 8:
break
i += 1
plt.show();

corona_df = final_train_data[final_train_data['Label_2_Virus_category'] == 'COVID-19']
with_corona_augmented = []
def augment(name):
img = read_img(name, (255,255), train_img_path)
i = 0
for batch in tqdm(datagen.flow(tf.expand_dims(img, 0), batch_size=32)):
with_corona_augmented.append(tf.squeeze(batch).numpy())
if i == 20:
break
i =i+1
corona_df['X_ray_image_name'].apply(augment)
Note: The output was too long to include in the article. Here’s a small part of it.

train_arrays = [] final_train_data['X_ray_image_name'].apply(lambda x: train_arrays.append(read_img(x, (255,255), train_img_dir))) test_arrays = [] final_test_data['X_ray_image_name'].apply(lambda x: test_arrays.append(read_img(x, (255,255), test_img_dir)))
print(len(train_arrays)) print(len(test_arrays))

y_train = np.concatenate((np.int64(final_train_data['target'].values), np.ones(len(with_corona_augmented), dtype=np.int64)))
Converting all data to Tensors
train_tensors = tf.convert_to_tensor(np.concatenate((np.array(train_arrays), np.array(with_corona_augmented)))) test_tensors = tf.convert_to_tensor(np.array(test_arrays)) y_train_tensor = tf.convert_to_tensor(y_train) y_test_tensor = tf.convert_to_tensor(final_test_data['target'].values) train_dataset = tf.data.Dataset.from_tensor_slices((train_tensors, y_train_tensor)) test_dataset = tf.data.Dataset.from_tensor_slices((test_tensors, y_test_tensor))
for i,l in train_dataset.take(1):
plt.imshow(i);

Generating Batches
BATCH_SIZE = 16
BUFFER = 1000
train_batches = train_dataset.shuffle(BUFFER).batch(BATCH_SIZE)
test_batches = test_dataset.batch(BATCH_SIZE)
for i,l in train_batches.take(1):
print('Train Shape per Batch: ',i.shape);
for i,l in test_batches.take(1):
print('Test Shape per Batch: ',i.shape);

Transfer Learning with ResNet50
INPUT_SHAPE = (255,255,3)
base_model = tf.keras.applications.ResNet50(input_shape= INPUT_SHAPE,
include_top=False,
weights='imagenet')
# We set it to False because we don't want to mess with the pretrained weights of the model.
base_model.trainable = False
Now our Transfer Learning is successful!!

for i,l in train_batches.take(1):
pass
base_model(i).shape
> TensorShape([16, 8, 8, 2048])
Adding a dense layer for Image Classification
model = Sequential() model.add(base_model) model.add(Layers.GlobalAveragePooling2D()) model.add(Layers.Dense(128)) model.add(Layers.Dropout(0.2)) model.add(Layers.Dense(1, activation = 'sigmoid')) model.summary()

callbacks = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)
model.compile(optimizer='adam',
loss = 'binary_crossentropy',
metrics=['accuracy'])
Prediction
model.fit(train_batches, epochs=10, validation_data=test_batches, callbacks=[callbacks])

pred = model.predict_classes(np.array(test_arrays))
# classification report from sklearn.metrics import classification_report, confusion_matrix print(classification_report(test_data['target'], pred.flatten()))

So as you can see the prediction is not half bad. We will plot a confusion matrix to visualize the performance of our model:
con_mat = confusion_matrix(test_data['target'], pred.flatten())
plt.figure(figsize = (10,10))
plt.title('CONFUSION MATRIX')
sns.heatmap(con_mat, cmap='cividis',
yticklabels=['Negative', 'Positive'],
xticklabels=['Negative', 'Positive'],
annot=True);

End Notes
This dataset was an interesting one, and the more I am learning Data Science and Machine Learning, the more I find this subject interesting. There’s soo many ways we can use data nowadays and using it can save countless lives. Thank you for reading this article. You can read more of my content at:
