Companies that sell online or run a dedicated e-commerce website want to understand what their customers need so that they can increase sales. One practical question that transaction data can answer is which time of day has the highest user activity.
In this post, we will use pandas and Matplotlib in Python to analyze an order dataset. The Transaction Date column lets us find the busiest time (hour) of the day. You can access the entire dataset here.
Stepwise Implementation
Step 1:
First, we import the required libraries and then load the dataset into a DataFrame.
Python3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Order_Details = pd.read_csv('Order_details(masked).csv')
Step 2:
Convert the Transaction Date column to datetime values and store the result in a new column called Time. Datetime values are displayed in the pattern YYYY-MM-DD HH:MM:SS by default, and this can be customized if needed. Here we are more interested in the hour of each transaction, so we create an Hour column using the built-in .dt.hour accessor:
Python3
# Parse the Transaction Date column into datetime values
Order_Details['Time'] = pd.to_datetime(Order_Details['Transaction Date'])

# Extract the hour from each parsed timestamp
Order_Details['Hour'] = Order_Details['Time'].dt.hour
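If the Transaction Date column contains malformed entries, the conversion above raises an error. The following is a minimal sketch of a more defensive variant, assuming the timestamps follow the YYYY-MM-DD HH:MM:SS pattern; the sample rows are made up for illustration and the real column may use a different format.

```python
import pandas as pd

# Hypothetical sample rows; the real 'Transaction Date' format may differ.
sample = pd.DataFrame({
    "Transaction Date": [
        "2020-12-01 09:15:00",
        "2020-12-01 21:40:12",
        "not a date",
        "2020-12-02 21:05:59",
    ]
})

# errors="coerce" turns entries that do not match the format into NaT
# instead of raising an exception.
sample["Time"] = pd.to_datetime(
    sample["Transaction Date"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)
sample["Hour"] = sample["Time"].dt.hour

# Count the rows that failed to parse and list the hours that did parse.
n_bad = int(sample["Time"].isna().sum())
hours = sample["Hour"].dropna().astype(int).tolist()
print(n_bad, hours)
```

Checking the number of unparsed rows before continuing helps confirm that the assumed format really matches the data.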
Step 3:
Next, we need the "n" busiest hours. value_counts() returns the number of transactions for each hour, sorted from most to least frequent, so the first "n" entries are the "n" busiest hours. We keep two lists, each converted with tolist(): the hours themselves (the index) and their frequencies (the values).
Python3
# n = 24 in this case; change it to see the top n busiest hours
timemost1 = Order_Details['Hour'].value_counts().index.tolist()[:24]
timemost2 = Order_Details['Hour'].value_counts().values.tolist()[:24]
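To see how this works on a small scale, here is a sketch on made-up hour values (not taken from the dataset). It calls value_counts() once and uses head(n), which is equivalent to slicing the first n entries.

```python
import pandas as pd

# Made-up hours for illustration only.
hours = pd.Series([21, 9, 21, 14, 21, 9], name="Hour")

# value_counts() sorts by frequency, most frequent first.
counts = hours.value_counts()

# Keep the n busiest hours; n = 2 here.
top_n = counts.head(2)
top_hours = top_n.index.tolist()
top_counts = top_n.values.tolist()
print(top_hours, top_counts)
```

Computing the counts once and reusing them avoids scanning the column twice on a large dataset.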
Step 4:
Finally, we stack the indices (hour) and frequencies together to yield the final result.
Python3
tmost = np.column_stack((timemost1, timemost2))
print(" Hour Of Day" + "\t" + "Cumulative Number of Purchases \n")
print('\n'.join('\t\t'.join(map(str, row)) for row in tmost))
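As an alternative to stacking the lists with NumPy, the same table can be built directly in pandas. This is only a sketch on made-up hour values; the column labels are chosen for illustration.

```python
import pandas as pd

# Made-up hours for illustration only.
hours = pd.Series([21, 9, 21, 14, 21, 9], name="Hour")

# Turn the frequency Series into a two-column table.
table = (
    hours.value_counts()
    .rename_axis("Hour Of Day")
    .reset_index(name="Number of Purchases")
)

# to_string(index=False) prints the table without the row index.
print(table.to_string(index=False))
```

Keeping the result as a DataFrame makes it easy to sort, filter, or export later.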
Step 5:
Before we can create the visualization, we need the counts ordered by hour of day rather than by frequency. To do so, we build the list of hours 0 to 23 for the x-axis and align the hourly frequencies to it, so that both have the same length:
Python3
timemost = Order_Details['Hour'].value_counts()

# Hours 0..23 for the x-axis (range(0, 23) would stop at 22)
timemost1 = list(range(24))

# Counts in hour order; hours with no transactions get 0
timemost2 = timemost.reindex(timemost1, fill_value=0)
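Note that value_counts() omits any hour in which no transaction occurred, so the number of counts can be smaller than the number of hours on the x-axis. A small sketch on made-up data shows how aligning to the full 0 to 23 range fills those gaps with zeros:

```python
import pandas as pd

# Made-up hours for illustration only; most hours of the day are absent.
hours = pd.Series([21, 9, 21, 14, 21, 9], name="Hour")

all_hours = list(range(24))

# reindex() reorders the counts by hour and inserts 0 for missing hours.
per_hour = hours.value_counts().reindex(all_hours, fill_value=0)

print(len(per_hour), per_hour.loc[21], per_hour.loc[0])
```

With this alignment, the x and y values passed to the plotting function always have 24 entries each.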
Step 6:
For data visualization, we will use Matplotlib, as it is one of the most convenient and commonly used libraries. However, you are free to choose any other plotting library, such as ggplot or Seaborn.
The commands below put the hours on the x-axis and the number of transactions on the y-axis, and set other aspects of the line chart such as the color and font.
Python3
plt.figure(figsize=(20, 10))
plt.title('Sales Happening Per Hour (Spread Throughout The Week)',
          fontdict={'fontname': 'monospace', 'fontsize': 30}, y=1.05)
plt.ylabel("Number Of Purchases Made", fontsize=18, labelpad=20)
plt.xlabel("Hour", fontsize=18, labelpad=20)
plt.plot(timemost1, timemost2, color='m')
plt.grid()
plt.show()
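The peak hour can also be read off programmatically instead of by eye. The sketch below uses made-up hourly counts (not the dataset's results), draws a bar chart rather than a line chart, and saves it to a file; the file name is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up purchase counts for hours 0..23, for illustration only.
counts = [2, 1, 0, 0, 0, 1, 3, 5, 8, 10, 12, 14,
          15, 13, 12, 11, 13, 16, 18, 21, 24, 27, 19, 9]
per_hour = pd.Series(counts, index=range(24))

# idxmax() returns the index label (the hour) of the largest count.
peak_hour = int(per_hour.idxmax())

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(per_hour.index, per_hour.values, color="m")
ax.set_xticks(range(24))
ax.set_xlabel("Hour")
ax.set_ylabel("Number Of Purchases Made")
ax.set_title(f"Peak hour: {peak_hour}:00")
fig.savefig("purchases_per_hour.png")
plt.close(fig)
```

A bar chart is one reasonable choice here because the hours are discrete buckets, but the line chart above conveys the same information.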
The results indicate that sales peak prominently in the late evening hours. This insight can be incorporated into business decisions, for example by promoting a product specifically during that time.