Pandas was created with regards to financial modeling, so as you may expect, it contains a genuinely ample number of tools for working with dates and times. Sometimes the given format of the date and time in our dataset cannot be directly used for analysis, so we pre-process these time values to obtain features like date, month, year, hours, minutes and seconds. Let’s discuss all the different ways to process date and time with Pandas dataframe. Divide date and time into multiple features: Create five dates and time using pd.date_range which generate sequences of fixed-frequency dates and time spans. Then we use pandas.Series.dt to extract the features.
Python3
# Load library import pandas as pd # calling DataFrame constructor df = pd.DataFrame() # Create 6 dates df[ 'time' ] = pd.date_range( '2/5/2019' , periods = 6 , freq = '2H' ) print (df[ 'time' ]) # print dataframe # Extract features - year, month, day, hour, and minute df[ 'year' ] = df[ 'time' ].dt.year df[ 'month' ] = df[ 'time' ].dt.month df[ 'day' ] = df[ 'time' ].dt.day df[ 'hour' ] = df[ 'time' ].dt.hour df[ 'minute' ] = df[ 'time' ].dt.minute # Show six rows df.head( 6 ) |
Output:
0 2019-02-05 00:00:00 1 2019-02-05 02:00:00 2 2019-02-05 04:00:00 3 2019-02-05 06:00:00 4 2019-02-05 08:00:00 5 2019-02-05 10:00:00 Name: time, dtype: datetime64[ns] time year month day hour minute 0 2019-02-05 00:00:00 2019 2 5 0 0 1 2019-02-05 02:00:00 2019 2 5 2 0 2 2019-02-05 04:00:00 2019 2 5 4 0 3 2019-02-05 06:00:00 2019 2 5 6 0 4 2019-02-05 08:00:00 2019 2 5 8 0 5 2019-02-05 10:00:00 2019 2 5 10 0
Convert strings to Timestamps: We convert the given strings to datetime format using pd.to_datetime and then we can extract different features from the datetime using first method.
Python3
# Load libraries import numpy as np import pandas as pd # Create time Strings dt_strings = np.array([ '04-03-2019 12:35 PM' , '22-06-2017 11:01 AM' , '05-09-2009 07:09 PM' ]) # Convert to datetime format timestamps = [pd.to_datetime(date, format = "%d-%m-%Y%I:%M %p" , errors = "coerce" ) for date in dt_strings] # Convert to datetimes timestamps = [pd.to_datetime(date, format = "%d-%m-%Y %I:%M %p" , errors = "coerce" ) for date in dt_strings] |
Output:
[Timestamp(‘2019-03-04 12:35:00’), Timestamp(‘2017-06-22 11:01:00’), Timestamp(‘2009-09-05 19:09:00’)]
Change the pattern of Timestamps: We can change the pattern of Timestamp using strftime method.
Python3
# Load library import pandas as pd # calling DataFrame constructor df = pd.DataFrame() # Create 6 dates df[ 'time' ] = pd.date_range( '2/5/2019' , periods = 6 , freq = '2H' ) print ( "Old Pattern" ) print (df[ 'time' ]) # print ( "\nNew Pattern" ) df[ 'new_time' ] = df[ 'time' ].dt.strftime( "%d-%B" ) print (df[ 'new_time' ]) |
Output:
Old Pattern 0 2019-02-05 00:00:00 1 2019-02-05 02:00:00 2 2019-02-05 04:00:00 3 2019-02-05 06:00:00 4 2019-02-05 08:00:00 5 2019-02-05 10:00:00 Name: time, dtype: datetime64[ns] New Pattern 0 05-February 1 05-February 2 05-February 3 05-February 4 05-February 5 05-February Name: new_time, dtype: object
Note: strftime method converts the timestamp to object datatype.
Extract Days Of the Week from the given Date: We use Series.dt.weekday_name to find name of the day in a week from the given Date.
Python3
# Load library import pandas as pd df = pd.DataFrame() # Create 6 dates dates = pd.pd.Series(date_range( '2/5/2019' , periods = 6 , freq = 'M' )) print (dates) # Extract days of week and then print print (dates.dt.weekday_name) |
Output:
0 2019-02-28 1 2019-03-31 2 2019-04-30 3 2019-05-31 4 2019-06-30 5 2019-07-31 dtype: datetime64[ns] 0 Thursday 1 Sunday 2 Tuesday 3 Friday 4 Sunday 5 Wednesday dtype: object
Extract Data in Date and Time Ranges: We can obtain the rows that lie in particular time range from the given dataset. Method #1: If the dataset is not indexed with time.
Python3
# Load library import pandas as pd # Create data frame df = pd.DataFrame() # Create datetimes df[ 'date' ] = pd.date_range( '1/1/2012' , periods = 1000 , freq = 'H' ) print (df.head( 5 )) # Select observations between two datetimes x = df[(df[ 'date' ] > '2012-1-1 01:00:00' ) & (df[ 'date' ] < = '2012-1-1 11:00:00' )] print (x) |
Output:
date 0 2012-01-01 00:00:00 1 2012-01-01 01:00:00 // 5 rows of Timestamps out of 1000 2 2012-01-01 02:00:00 3 2012-01-01 03:00:00 4 2012-01-01 04:00:00 date 2 2012-01-01 02:00:00 3 2012-01-01 03:00:00 4 2012-01-01 04:00:00 5 2012-01-01 05:00:00 //Timestamps in the given range 6 2012-01-01 06:00:00 7 2012-01-01 07:00:00 8 2012-01-01 08:00:00 9 2012-01-01 09:00:00 10 2012-01-01 10:00:00 11 2012-01-01 11:00:00
Method #2: If the dataset is indexed with time
Python3
# Load library import pandas as pd # Create data frame df = pd.DataFrame() # Create datetimes df[ 'date' ] = pd.date_range( '1/1/2012' , periods = 1000 , freq = 'H' ) # Set index df = df.set_index(df[ 'date' ]) print (df.head( 5 )) # Select observations between two datetimes x = df.loc[ '2012-1-1 04:00:00' : '2012-1-1 12:00:00' ] print (x) |
Output:
date date 2012-01-01 00:00:00 2012-01-01 00:00:00 2012-01-01 01:00:00 2012-01-01 01:00:00 2012-01-01 02:00:00 2012-01-01 02:00:00 2012-01-01 03:00:00 2012-01-01 03:00:00 // 5 rows of Timestamps out of 1000 2012-01-01 04:00:00 2012-01-01 04:00:00 date date 2012-01-01 04:00:00 2012-01-01 04:00:00 2012-01-01 05:00:00 2012-01-01 05:00:00 2012-01-01 06:00:00 2012-01-01 06:00:00 2012-01-01 07:00:00 2012-01-01 07:00:00 2012-01-01 08:00:00 2012-01-01 08:00:00 2012-01-01 09:00:00 2012-01-01 09:00:00 //Timestamps in the given range 2012-01-01 10:00:00 2012-01-01 10:00:00 2012-01-01 11:00:00 2012-01-01 11:00:00 2012-01-01 12:00:00 2012-01-01 12:00:00