Introduction
In today’s AI-driven landscape, we’ve grown to rely on data-driven choices, making the lives of working professionals more straightforward. From managing supply chains to approving loans, data is the linchpin. The magic of data science extends to healthcare, offering promising outcomes. This article unfolds the human side of ‘Building a Medication Prediction Model from Medical Data Analysis.’ It explores the transformative journey where data science meets modern medicine, unveiling patterns that have the potential to revolutionize healthcare. Join us as we delve into a realm where insights and innovation converge, shaping a future prioritizing personalized treatments and compassionate care.
Learning Objectives
In this article, we’ll explore how to analyze a medical dataset to create a model that predicts which medications a patient should take when faced with a specific diagnosis. It sounds intriguing, so let’s dive right in!
This article was published as a part of the Data Science Blogathon.
Table of contents
- Dataset
- Loading Libraries and Data set
- Basic Statistics
- Data Preprocessing
- Looking at Some Stats
- Univariate EDA Route and Frequency Column
- Univariate EDA Diagnosis Column
- Univariate EDA Indication Column
- Univariate EDA Name of Drug column
- Bivariate Analysis Indication vs Name of Drug
- Bivariate Analysis Indication vs Age
- Bivariate Analysis Name of Drug vs Age
- Modeling Approach
- Feature Engineering
- Dataset Creation
- Modelling
Dataset
We will be downloading and using the Open Source dataset from kaggle:
The dataset contains
- Age and gender of the patient
- Diagnosis of the patient
- Antibiotics used to treat patient
- Dosage of the antibiotics in grams
- Route of application of antibiotics
- Frequency of usage of antibiotics
- Duration of treatment using antibiotics in days
- Indiction of antibiotics
Loading Libraries and Data set
Here, we shall import relevant libraries for the required for our exercise. Then we shall load the dataset into a dataframe and view few rows.
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib.pyplot as plt
import collections
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix,classification_report
df = pd.read_csv("/kaggle/input/hospital-antibiotics-usage/Hopsital Dataset.csv")
print(df.shape)
df.head()
Output:
The column names are pretty intuitive and they do make sense. Let us examine by looking at a few statistics.
Basic Statistics
# "Let's look at some stats"
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 833 entries, 0 to 832
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 833 non-null object
1 Date of Data Entry 833 non-null object
2 Gender 833 non-null object
3 Diagnosis 833 non-null object
4 Name of Drug 833 non-null object
5 Dosage (gram) 833 non-null object
6 Route 833 non-null object
7 Frequency 833 non-null object
8 Duration (days) 833 non-null object
9 Indication 832 non-null object
dtypes: object(10)
memory usage: 65.2+ KB
Observations:
- Every column is a string column
- Let’s convert column Age, Dosage(gram), Duration(days) to numeric.
- Let’s also convert Date of Data Entry into date time.
- Indication has one null value, all other column don’t have null values.
Data Preprocessing
Let’s clean some columns. In the previous step, we saw that all columns are integers. So, for starters, we will convert Age, Dosage and Duration to numeric values. Similarly, we will convert the Date of Data Entry into datetime type. Instead of directly converting them, we will create new columns i.e. we will create a Age2 column that will be a numeric version of Age column and so on.
df['Age2'] = pd.to_numeric(df['Age'],errors='coerce')
df['Dosage (gram)2'] = pd.to_numeric(df['Dosage (gram)'],errors='coerce')
df['Duration (days)2'] = pd.to_numeric(df['Duration (days)'],errors='coerce')
df['Date of Data Entry2'] = pd.to_datetime(df['Date of Data Entry'],errors='coerce')
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 833 entries, 0 to 832
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 833 non-null object
1 Date of Data Entry 833 non-null object
2 Gender 833 non-null object
3 Diagnosis 833 non-null object
4 Name of Drug 833 non-null object
5 Dosage (gram) 833 non-null object
6 Route 833 non-null object
7 Frequency 833 non-null object
8 Duration (days) 833 non-null object
9 Indication 832 non-null object
10 Age2 832 non-null float64
11 Dosage (gram)2 831 non-null float64
12 Duration (days)2 831 non-null float64
13 Date of Data Entry2 831 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(3), object(10)
memory usage: 91.2+ KB
After converting, we see some null values. Let us look at that data.
df[(df['Dosage (gram)2'].isnull())
| (df['Duration (days)2'].isnull())
| (df['Age2'].isnull())
| (df['Date of Data Entry2'].isnull())
]#import csv
Output:
Seems like some garbage values in dataset. Let’s remove them.
Now, we will also substitute the values of new column in the older columns and drop the newly created columns
df = df[~((df['Dosage (gram)2'].isnull())
| (df['Duration (days)2'].isnull())
| (df['Age2'].isnull())
| (df['Date of Data Entry2'].isnull()))
]
df['Age'] = df['Age2'].astype('int')
df['Dosage (gram)'] = df['Dosage (gram)2']
df['Date of Data Entry'] = df['Date of Data Entry2']
df['Duration (days)'] = df['Duration (days)2'].astype('int')
df = df.drop(['Age2','Dosage (gram)2','Date of Data Entry2','Duration (days)2'],axis=1)
print(df.shape)
df.head()
Output:
Looking at Some Stats
Now, we shall look at some statistics across all columns. We will use the describe function and pass the include argument to get the values statistics across all columns. Let us examine that
df.describe(include='all')
Output:
Observations:
- We can see that the age is 1, and the max is 90.
- Similarly, looking at the Data Entry column – we know that it contains data for 19-Dec-2019 between 1 and 7 pm.
- Gender has 2 unique values.
- Dosage has a minimum value of 0.02 grams and a max weight of 960 grams (seems extreme).
- Duration has a value of 1 and a max value of 28 days.
- Diagnosis, Name of Drug, Route, Frequency, and Indication columns have different values. We will examine them.
Univariate EDA Route and Frequency Column
Here, we will try to examine the Route and Frequency column. We will use the value_counts() function.
display(df['Route'].value_counts())
print()
df['Frequency'].value_counts()
Output:
Route
IV 534
Oral 293
IM 4
Name: count, dtype: int64
Frequency
BD 430
TDS 283
OD 110
QID 8
Name: count, dtype: int64
Observations:
- There are 3 different routes of Drug treatment. The most used Route of Drug treatment is IV, and the least used is IM.
- There are 4 different Frequency of Drug treatment. BD is the most used Frequency of Drug treatment, and the least used is QID.
Univariate EDA Diagnosis Column
Here, we will examine the Diagnosis column. We will use the value_counts() function.
df['Diagnosis'].value_counts()
Output:
Upon running the above code, we see that there are 263 different values. Also, we can see that each patient can have multiple diagnosis (separated by comma). So, let us try to build a wordcloud to make more sense of this column. We will try to remove stopwords from the wordcloud to filter out unnecessary noise.
text = " ".join(diagnosis for diagnosis in adf.Diagnosis)
print ("There are {} words in the combination of all diagnosis.".format(len(text)))
stopwords = set(STOPWORDS)
# Generate a word cloud image
wordcloud = WordCloud(stopwords=stopwords, background_color="white").generate(text)
# Display the generated image:
# the matplotlib way:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
Output:
There are 35359 words in the combination of all diagnosis.
Observations:
- There are 35359 words in the combination of all diagnosis. A lot of them will be repetitive.
- Looking at the wordcloud, Chest infection and koch lung seem to be more frequent diagnosis.
- We can see other conditions like – diabetes, myeloma, etc.
Let’s look at the top and bottom 10 words/phrases. We will use the Counter function within the collections library.
A = collections.Counter([i.strip().lower()
for i in text.split(',') if (i.strip().lower()) not in stopwords ])
print('Top 10 words/phrases')
display(A.most_common(10))
print('\nBottom 10 words/phrases')
display(A.most_common()[-11:-1])
Output:
Top 10 words/phrases
[('col', 77),
('chest infection', 68),
('ihd', 55),
('copd', 40),
('hypertension', 38),
('ccf', 36),
('type 2 dm', 32),
("koch's lung", 28),
('ckd', 28),
('uraemic gastritis', 18)]
Bottom 10 words/phrases
[('poorly controlled dm complicated uti', 1),
('poorly controlled dm septic shock', 1),
('hypertension hepatitis', 1),
('uti hepatitis', 1),
('uti uti', 1),
('cff ppt by chest infection early col', 1),
('operated spinal cord', 1),
('hematoma and paraplegia he', 1),
('rvi with col glandular fever', 1),
('fever with confusion ccf', 1)]
Observations:
- Among the top 10 words/phrases, col and chest infections are more frequent. 77 patients are diagnosed with col, and 68 are diagnosed with chest infection.
- Among the bottom 10 words/phrases, we see that each of the listed ones occurs only once.
Univariate EDA Indication Column
Here, we will examine the Indication column. We will use the value_counts() function.
display(df['Indication'].value_counts())
print()
df['Indication'].value_counts(1)
Output:
Indication
chest infection 92
col 32
uti 30
type 2 dm 25
prevention of infection 22
..
pad(lt u.l) 1
old stroke 1
fainting attack 1
cheat infection 1
centepede bite 1
Name: count, Length: 220, dtype: int64
Indication
chest infection 0.110843
col 0.038554
uti 0.036145
type 2 dm 0.030120
prevention of infection 0.026506
...
pad(lt u.l) 0.001205
old stroke 0.001205
fainting attack 0.001205
cheat infection 0.001205
centepede bite 0.001205
Name: proportion, Length: 220, dtype: float64
We see that there are 220 values of this Indication column. Seems like some spelling errors are prevalent as well – chest vs cheat. We can clean the same. For current exercise – let us consider top 25% indications.
top_indications = df['Indication'].value_counts(1).reset_index()
top_indications['cum_proportion'] = top_indications['proportion'].cumsum()
top_indications = top_indications[top_indications['cum_proportion']<0.25]
top_indications
Output:
Indication | proportion | cum_proportion | |
---|---|---|---|
0 | chest infection | 0.110843 | 0.110843 |
1 | col | 0.038554 | 0.149398 |
2 | uti | 0.036145 | 0.185542 |
3 | type 2 dm | 0.030120 | 0.215663 |
4 | prevention of infection | 0.026506 | 0.242169 |
We will use this dataframe in the bivariate analysis and modeling exercise later.
Univariate EDA Name of Drug column
Here, we will examine this column. We will use the value_counts() function.
display(df['Name of Drug'].nunique())
display(df['Name of Drug'].value_counts())
Output:
55
Name of Drug
ceftriaxone 221
co-amoxiclav 162
metronidazole 59
cefixime 58
septrin 37
clarithromycin 32
levofloxacin 31
amoxicillin+flucloxacillin 29
ceftazidime 24
cefepime 14
cefipime 13
clindamycin 12
rifaximin 10
amikacin 9
cefoperazone 9
coamoxiclav 9
meropenem 8
ciprofloxacin 7
gentamicin 5
pen v 5
rifampicin 5
azithromycin 5
cifran 4
mirox 4
amoxicillin 4
streptomycin 4
ceftazidine 4
clarthromycin 4
amoxicillin+flucoxacillin 3
cefoparazone+sulbactam 3
linezolid 3
ofloxacin 3
norfloxacin 3
imipenem 2
flucloxacillin 2
ceftiaxone 2
cefaziclime 2
ceftriaxone+sulbactam 2
cefexime 2
pipercillin+tazobactam 1
amoxicillin+flucloxiacillin 1
pentoxifylline 1
menopem 1
levefloxacin 1
pentoxyfylline 1
doxycyclin 1
amoxicillin+flucoxiacillin 1
vancomycin 1
cefteiaxone 1
dazolic 1
amoxicillin+flucloaxcin 1
amoxiclav 1
doxycycline 1
cefoperazone+sulbactam 1
nitrofurantoin 1
Observations:
- There are 55 unique drugs.
- Again, we see spelling errors – ceftriaxone vs ceftiaxone. The unique drug count may be lower.
Let us consider the top 5 drugs here. We will create a column cum_proportion to store the cumulative proportion that each drug is contributing.
top_drugs = (df['Name of Drug'].value_counts(1).reset_index())
top_drugs['cum_proportion'] = top_drugs['proportion'].cumsum()
top_drugs = top_drugs.head()
top_drugs
Output:
Name of Drug | proportion | cum_proportion | |
---|---|---|---|
0 | ceftriaxone | 0.265945 | 0.265945 |
1 | co-amoxiclav | 0.194946 | 0.460890 |
2 | metronidazole | 0.070999 | 0.531889 |
3 | cefixime | 0.069795 | 0.601685 |
4 | septrin | 0.044525 | 0.646209 |
The top 5 drugs are given to 64.6% of the patients.
PS : If we correct the spelling errors – this could be more than 64.6%.
We will use this dataframe in the bivariate analysis and modelling exercise later.
Bivariate Analysis Indication vs Name of Drug
Here, we will consider the top_indications and the top_drugs dataframes we created and try to see the distribution across them i.e. we compare top 5 drug name vs top 25% Indication. We will use pivot_table() function.
(df[(df['Indication'].isin(top_indications['Indication']))
&(df['Name of Drug'].isin(top_drugs['Name of Drug']))]
.pivot_table(index='Indication',columns='Name of Drug',values='Age',aggfunc='count')
)
Output:
Name of Drug | cefixime | ceftriaxone | co-amoxiclav | metronidazole | septrin |
---|---|---|---|---|---|
Indication | |||||
chest infection | 7.0 | 22.0 | 27.0 | 3.0 | 1.0 |
col | 1.0 | 14.0 | 3.0 | 3.0 | 1.0 |
prevention of infection | 2.0 | 6.0 | 3.0 | 2.0 | 1.0 |
type 2 dm | 2.0 | 7.0 | 4.0 | 1.0 | NaN |
uti | 3.0 | 9.0 | 2.0 | NaN | NaN |
Observations:
- for a chest infection, co-amoxiclav is the most recommended medicine, followed by ceftriaxone.
- Similar observations can be drawn for other indications.
Note: These medicines are prescribed under several other factors – like the Age, Gender, Diagnosis, and Medical history of the patient.
Bivariate Analysis Indication vs Age
Here, we will try to understand if a few conditions appear for older patients. We will consider the top_indications and examine the mean and median values of age.
(df[df['Indication'].isin(top_indications['Indication'])]
.groupby('Indication')['Age']
.agg(['mean','median','count'])
)
Output:
mean | median | count | |
---|---|---|---|
Indication | |||
chest infection | 57.173913 | 61.5 | 92 |
col | 48.031250 | 48.0 | 32 |
prevention of infection | 42.136364 | 46.0 | 22 |
type 2 dm | 63.560000 | 62.0 | 25 |
uti | 50.233333 | 53.5 | 30 |
Observations:
- Indications – col and prevention of infection are more observed in younger patients.
- uti is observed in mid-age patients
- Indications: chest infection and type 2 dm are more observed in older patients.
Bivariate Analysis Name of Drug vs Age
Here, we will try to understand if specific drugs are used for older patients. We will consider the top_drugs and examine the mean and median values of age.
(df[df['Name of Drug'].isin(top_drugs['Name of Drug'])]
.groupby('Name of Drug')['Age']
.agg(['mean','median','count']).sort_values(by='median')
)
Output:
mean | median | count | |
---|---|---|---|
Name of Drug | |||
septrin | 44.513514 | 40.0 | 37 |
cefixime | 43.137931 | 42.0 | 58 |
ceftriaxone | 50.484163 | 49.0 | 221 |
metronidazole | 53.661017 | 54.0 | 59 |
co-amoxiclav | 56.518519 | 60.0 | 162 |
Observations:
- septrin and cefixime is prescribed to younger patients
- ceftriaxone is prescribed to mid-age patients
- metronidazole and co-amoxiclav is prescribed to older patients
Modeling Approach
Here, we will try to help the Pharmacist or the Prescribing Doctor. The problem statement is to identify which Drug will be given to the patient basis the Diagnosis, Age and Gender column.
Few considerations and assumptions:
- For this exercise – we will consider only the top 5 drugs and mark the rest as Others.
- So, for each value of (Diagnosis, Age, Gender) – the objective is to identify the drug that will be recommended. The baseline model is (1/6) = 16.67% accuracy.
Let us see if we can try to beat that. We will create a copy of the dataframe for the modelling exercise.
adf = df.copy()
adf['Output'] = np.where(df['Name of Drug'].isin(top_drugs['Name of Drug']),
df['Name of Drug'],'Other')
adf['Output'].value_counts()
Output:
Output
Other 294
ceftriaxone 221
co-amoxiclav 162
metronidazole 59
cefixime 58
septrin 37
Name: count, dtype: int64
We have seen this distribution earlier as well. It means that the classes are not evenly distributed. We will need to keep this in mind while initializing the model.
Feature Engineering
We will try to capture more frequent words/ngrams in the diagnosis column. We will use the Count Vectorizer module.
vectorizer = CountVectorizer(max_features=150,stop_words='english',
ngram_range=(1,3))
X = vectorizer.fit_transform(adf['Diagnosis'].str.lower())
vectorizer.get_feature_names_out()
Output:
array(['abscess', 'acute', 'acute ge', 'af', 'aki', 'aki ckd',
'aki ckd retro', 'alcoholic', 'anaemia', 'art', 'bite', 'bleeding',
'bone', 'ca', 'cap', 'ccf', 'ccf chest', 'ccf chest infection',
'ccf increased', 'ccf increased lft', 'ccf koch', 'ccf koch lung',
'cerebral', 'cerebral infarct', 'chest', 'chest infection',
'chest infection pre', 'chronic', 'ckd', 'ckd chest infection',
'ckd retro', 'col', 'col portal', 'col portal hypertension',
'copd', 'copd chest', 'copd chest infection', 'debility',
'debility excessive', 'debility excessive vomitting', 'diabetes',
'disease', 'disease renal', 'disease renal impairment', 'dm',
'dm ihd', 'edema', 'effusion', 'encephalopathy', 'excessive',
'excessive vomitting', 'excessive vomitting uraemic', 'failure',
'fever', 'gastritis', 'gastritis hcv', 'gastritis hcv aki', 'ge',
'general', 'general debility', 'general debility excessive',
'gi bleeding', 'hcv', 'hcv aki', 'hcv aki ckd', 'hcv col', 'heart',
'hepatic', 'hepatic encephalopathy', 'hepatitis', 'ht',
'ht disease', 'ht disease renal', 'hypertension', 'ihd',
'impairment', 'impairment koch', 'impairment koch lungs',
'increased', 'increased lft', 'infarct', 'infection',
'infection pre', 'infection pre diabetes', 'koch', 'koch lung',
'koch lung copd', 'koch lungs', 'koch lungs ccf', 'left',
'left sided', 'leg', 'lft', 'lung', 'lung copd', 'lung copd chest',
'lungs', 'lungs ccf', 'lungs ccf increased', 'marrow', 'multiple',
'multiple myeloma', 'multiple myeloma ckd', 'myeloma',
'myeloma ckd', 'newly', 'old', 'pleural', 'pleural effusion',
'pneumonia', 'portal', 'portal hypertension', 'pre',
'pre diabetes', 'pulmonary', 'pulmonary edema', 'renal',
'renal impairment', 'renal impairment koch', 'retro', 'right',
'rvi', 'rvi stage', 'rvi stage ht', 'septic', 'septic shock',
'severe', 'severe anaemia', 'severe anaemia multiple', 'shock',
'sided', 'snake', 'snake bite', 'stage', 'stage ht',
'stage ht disease', 'stroke', 'tb', 'type', 'type dm',
'type dm ihd', 'uraemic', 'uraemic gastritis',
'uraemic gastritis hcv', 'uti', 'uti type', 'uti type dm',
'vomitting', 'vomitting uraemic', 'vomitting uraemic gastritis'],
dtype=object)
The above list showcases the top 150 ngrams observed in the diagnosis column after removing the stopwords.
Dataset Creation
Here, we will create a single dataframe containing the features we just made and the age and gender column as input. We will use Label Encoder to convert the Drug Names into numeric values.
feature_df = pd.DataFrame(X.toarray(),columns=vectorizer.get_feature_names_out())
feature_df['Age'] = adf['Age'].fillna(0).astype('int')
feature_df['Gender_Male'] = np.where(adf['Gender']=='Male',1,0)
le = LabelEncoder()
feature_df['Output'] = le.fit_transform(adf['Output'])
Now, we will do a train test split. We will keep 20% of the data as the test set. We will use the random_state argument to ensure reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
feature_df.drop('Output',axis=1).fillna(-1),
feature_df['Output'],
test_size=0.2, random_state=42)
Modelling
Here, I have tried using the Random Forest model. You can try with other models as well. We will use the random_state argument to ensure reproducibility. We will use the class_weight parameter since we had seen earlier that the classes are not evenly distributed.
clf = RandomForestClassifier(max_depth=6, random_state=0,
class_weight='balanced')
clf.fit(X_train, y_train)
Let us see the accuracy and other metrics for the training dataset,
# accuracy on X_train data
final_accuracy = clf.score(X_train, y_train)
print("final_accuracy is : ",final_accuracy)
# creating a confusion matrix for determining and visualizing the accuracy score
clf_predict = clf.predict(X_train)
print(classification_report(y_train, clf_predict,
target_names=list(mapping_df['Actual_Name'])))
Output:
final_accuracy is : 0.411144578313253
precision recall f1-score support
Other 0.84 0.30 0.44 236
cefixime 0.16 0.86 0.27 49
ceftriaxone 0.74 0.26 0.39 176
co-amoxiclav 0.48 0.47 0.48 127
metronidazole 0.39 0.59 0.47 49
septrin 0.45 0.93 0.61 27
accuracy 0.41 664
macro avg 0.51 0.57 0.44 664
weighted avg 0.64 0.41 0.43 664
Similarly, let us see accuracy and other metrics for the test dataset.
# accuracy on X_test data
final_accuracy = clf.score(X_test, y_test)
print("final_accuracy is : ",final_accuracy)
# creating a confusion matrix for determining and visualizing the accuracy score
clf_predict = clf.predict(X_test)
print(classification_report(y_test, clf_predict,
target_names=list(mapping_df['Actual_Name'])))
Output:
final_accuracy is : 0.38323353293413176
precision recall f1-score support
Other 0.71 0.38 0.49 58
cefixime 0.08 0.56 0.14 9
ceftriaxone 0.36 0.09 0.14 45
co-amoxiclav 0.64 0.71 0.68 35
metronidazole 0.31 0.40 0.35 10
septrin 0.44 0.40 0.42 10
accuracy 0.38 167
macro avg 0.42 0.42 0.37 167
weighted avg 0.53 0.38 0.41 167
Key Observations:
- The model is 41.11% accurate on train data and 38.32% on test data.
- If we look at the f1-score, we see that for Other drug names – the training dataset had a value of 0.44, and the test dataset had a value of 0.49
- We can also see that the test f1-score is lower for cefixime and ceftriaxone. While cefixime has only 9 samples, ceftriaxone has 45 samples. So, we need to analyze these data points to understand the scope of improving our feature sets.
Conclusion
In this article, we analyzed a medical dataset end to end. Then, we cleaned the dataset. We saw basic statistics, distribution, and even a wordcloud to understand the columns. Then, we created a problem statement to help the Pharmacist or the Prescribing Doctor with the modern medicine. It was to identify which Drug would be given to the patient basis the Diagnosis, Age, and Gender column.
Key Takeaways
- We considered the top 150 words/bigrams/trigrams from the Diagnosis column for the modeling exercise. Let’s say we dissected the diagnosis column and created 150 features.
- We had the Age and Gender feature as well. So, in total, we had 152 features.
- Using these 152 features, we tried to determine which medicine/drug would be prescribed.
- The baseline/random model was 16.67% accurate. Our model was 41.11% accurate on train data and 38.32% on test data. That is a significant improvement over the baseline model.
Some things to try out to improve the model performance:
- Try TF-IFD or embeddings to extract features.
- Try different models and use grid search and cross-validation to optimize accuracy.
- Try predicting more medicines.
Thanks for reading my article. Feel free to connect with me on LinkedIn to discuss this.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.