Thursday, December 26, 2024
Google search engine
HomeLanguagesHow to Sort a Pandas DataFrame by Date?

How to Sort a Pandas DataFrame by Date?

In the real world, we can come across datasets of any form that may include the date inside them too. These datasets can be present in any file format like .CSV, .xlsx, .txt, etc. To load this data inside Python, we use a library named Pandas which provides us a plethora of functions and methods to play around with this data. Pandas read these datasets in the form of DataFrames.

Sometimes, there might be a situation where the dataset may contain attributes related to date, and we want to sort the records inside the dataframe as per the date values in a specific order.

In this article, we will learn about how can we sort Pandas DataFrame by the Date. I’ll be creating a custom dataframe object imitating a real-world problem and this method will work universally for any DataFrame.

Sorting by Single Column

To sort a DataFrame as per the column containing date we’ll be following a series of steps, so let’s learn along.

Step 1: Load or create dataframe having a date column

Python




# importing package
import pandas as pd
 
# Creating a dataframe that stores records of students taking admission in a college
data = pd.DataFrame({'AdmissionDate': ['2021-01-25','2021-01-22','2021-01-20',
                        '2021-01-18','2021-01-22','2021-01-17','2021-01-21'],
                     'StudentID': [7,5,3,2,6,1,4],
                     'Name': ['Ram','Shyam','Mohan','Sohan','Lucky','Abhinav','Danny'],
                     'Stream':['CSE','ECE','Civil','Mechanical','CSE','IT','EEE']
                   })
# Checking dataframe
print(data)


 
 

Output:

 

Here, it can be clearly seen that our DataFrame contains a column named ‘AdmissionDate’ which contains date values.

Step 2: Converting string data type into datetime object.

 When we read the dataset the values stored inside the  ‘AdmissionDate’  column are treated as string data type by default. So, we need to convert this string object to datetime object, for that we will be using the .to_datetime() method provided by Pandas that can be done as: 

Python




# checking datatype
print(type(data.AdmissionDate[0]))
 
# convert to date
data['AdmissionDate'] = pd.to_datetime(data['AdmissionDate'])
 
# verify datatype
print(type(data.AdmissionDate[0]))


Output:

<class 'str'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

 Step 3: Sorting the DataFrame as per date

We will be using the sort_values() method to sort our dataset and the attribute that we will pass inside the function is the column name using which we want to sort our DataFrame.

Python




data.sort_values(by='AdmissionDate')
print(data)


Output:

 

One thing to notice here is our DataFrame gets sorted in ascending order of dates, to sort the DataFrame in descending order we can pass an additional parameter inside the sort_values() function that will set ascending value to False and will return the DataFrame in descending order.

Python




data.sort_values(by='AdmissionDate',ascending=False)
print(data)


 
 Output:

Sorting by Multiple Columns as per date

We can further extend our understanding for sorting multiple datetime columns as well, in this, we maintain a priority order to sort our DataFrame. Let’s have a look.

 

Step 1: Load or create dataframe having multiple date columns 

Python




# importing package
import pandas as pd
 
# Creating a DataFrame
data_1 = pd.DataFrame({'Mfg. Date': ['2021-01-25','2021-01-22','2021-01-20','2021-01-18',
                       '2021-01-22','2021-01-17','2021-01-21'],
                       'ProductID': [7,5,3,2,6,1,4],
                       'Product Name': ['Paracetamol','Moov','Volini','Crocin',
                                        'Aciloc','Iodex','Combiflam'],
                       'Expiry Date':['2022-01-25','2023-01-22','2021-05-20','2022-03-18',
                                      '2022-01-22','2021-05-17','2022-01-30']
                       })
 
# Checking dataframe
print(data_1)


 
Output:

 

Here, it can be clearly seen that our DataFrame contains two columns having dates namely ‘Mfg. Date’ and ‘Expiry Date ‘.

Step 2: Converting string data type into datetime object.

Python




data_1[['Mfg. Date','Expiry Date']] = data_1[['Mfg. Date','Expiry Date']].apply(pd.to_datetime)


 
 Step 3: Sorting the DataFrame as per date 

Python




# sorting DataFrame by giving first priority to Expiry Date and then Mfg. Date
data_1.sort_values(by=['Expiry Date', 'Mfg. Date'])


 
 Output:

 Here, we got a sorted list in ascending order of Expiry Date as per our priority.

 

RELATED ARTICLES

Most Popular

Recent Comments