In this article, we will discuss how to extract only the valid dates from a specified column of a given DataFrame. The extracted dates should be in the form ‘mm/dd/yyyy’.
Approach:
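We use a regular expression to extract valid dates from the specified column of the DataFrame: \b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/([0-9]{4})\b. The first group matches a month from 01 to 12, the second a day from 01 to 31, and the third a four-digit year. The \b markers require each match to start and end at a word boundary. We apply the pattern to each value with the re.findall() method, which returns one (month, day, year) tuple per match. Note that the pattern checks only the numeric range of each field, so a string such as 02/31/2020 is still accepted. Now let us implement this in Python: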
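To see what re.findall() actually returns, the short sketch below applies the same pattern to a few individual strings. The first two strings come from the date_of_birth column used in this article. 02/31/2020 is a made-up value, added only to show that the pattern does not check whether a day exists in a given month. The name DATE_PATTERN is our own and is not part of any library.

```python
import re

# The pattern from this article: month 01-12, day 01-31, four-digit year
DATE_PATTERN = r'\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/([0-9]{4})\b'

samples = ['12/21/1998', '15/12/1998', '02/31/2020']

for text in samples:
    # findall returns one (month, day, year) tuple per match,
    # or an empty list when nothing matches
    print(text, '->', re.findall(DATE_PATTERN, text))
```

The pattern has three capture groups, so each match comes back as a tuple of the groups rather than as the full date string. '15/12/1998' gives an empty list because 15 is not a valid month. '02/31/2020' still matches, because only the ranges of the fields are checked.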
Step 1: Creating the DataFrame
Python3
# importing the pandas and re libraries
import pandas as pd
import re

# creating a DataFrame with the columns
# Name, date_of_birth and Age
df = pd.DataFrame({
    'Name': ['Akash', 'Shyam', 'Ayush', 'Diksha', 'Radhika'],
    'date_of_birth': ['12/21/1998', '15/12/1998', '06/11/2000',
                      '05/10/1998', '13/12/2010'],
    'Age': [21, 12, 20, 21, 10]})

# printing the original DataFrame
print("Printing the original dataframe")
print(df)
Output:
Step 2: Extracting valid dates in the format ‘mm/dd/yyyy’ from the DataFrame
Python3
# creating a function that returns the valid
# dates found in the given string
def checking_valid_dates(dt):
    # regular expression matching dates in the
    # format mm/dd/yyyy (month 01-12, day 01-31)
    result = re.findall(
        r'\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/([0-9]{4})\b', dt)
    # findall returns (month, day, year) tuples,
    # so join each one back into a date string
    return ['/'.join(groups) for groups in result]

# creating a new column with the valid dates of birth
df['valid_date_of_birth'] = df['date_of_birth'].apply(checking_valid_dates)
print("\nPrinting the DataFrame with valid dates in the format mm/dd/yyyy:")
print(df)
Output:
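Instead of calling apply() with a Python function, pandas can run the match in a vectorized way with Series.str.extract(). The sketch below shows one possible variant; it is not the method used above. It rewrites the pattern with non-capturing groups (?:...) so that a single capture group holds the whole date. It also anchors the pattern with ^...$ so that the entire cell must be a date, whereas \b would also accept a date embedded in longer text. An optional stricter step with pd.to_datetime(format='%m/%d/%Y', errors='coerce') then turns impossible calendar dates such as 02/31/2020 into NaT. The column name parsed_date is our own choice.

```python
import pandas as pd

# Sample data copied from Step 1
df = pd.DataFrame({
    'Name': ['Akash', 'Shyam', 'Ayush', 'Diksha', 'Radhika'],
    'date_of_birth': ['12/21/1998', '15/12/1998', '06/11/2000',
                      '05/10/1998', '13/12/2010'],
    'Age': [21, 12, 20, 21, 10]})

# Same month/day/year ranges as the article's pattern, but with
# non-capturing groups so the single outer group holds the full date;
# ^...$ requires the whole cell to be a date
pattern = r'^((?:1[0-2]|0[1-9])/(?:3[01]|[12][0-9]|0[1-9])/[0-9]{4})$'

# expand=False returns a Series; rows that do not match become NaN
df['valid_date_of_birth'] = df['date_of_birth'].str.extract(pattern, expand=False)

# Optional stricter check: parse with an explicit mm/dd/yyyy format;
# NaN stays missing and impossible dates (e.g. 02/31/2020) become NaT
df['parsed_date'] = pd.to_datetime(df['valid_date_of_birth'],
                                   format='%m/%d/%Y', errors='coerce')

print(df)
```

Rows without a valid date come back as NaN instead of an empty list, which makes them easy to drop with dropna().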