How to group data by time intervals in Python Pandas?

27 July 2024

1

Prerequisites: Pandas

Grouping data by time intervals is very obvious when you come across Time-Series Analysis. A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time.

Pandas provide two very useful functions that we can use to group our data.

resample()— This function is primarily used for time series data. It is a Convenience method for frequency conversion and resampling of time series. Object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or pass datetime-like values to the on or level keyword. Resampling generates a unique sampling distribution on the basis of the actual data.

Syntax : DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention=’start’, kind=None, loffset=None, limit=None, base=0, on=None, level=None)

Parameters :

rule : the offset string or object representing target conversion

axis : int, optional, default 0

closed : {‘right’, ‘left’}

label : {‘right’, ‘left’}

convention : For PeriodIndex only, controls whether to use the start or end of rule

loffset : Adjust the resampled time labels

base : For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0.

on : For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.

level : For a MultiIndex, level (name or number) to use for resampling. Level must be datetime-like.

Example: quantity added each month, total amount added each year.

Grouper — Grouper allows the user to specify on what basis the user wants to analyze the data.

Syntax: dataframe.groupby(pd.Grouper(key, level, freq, axis, sort, label, convention, base, Ioffset, origin, offset))

Parameters:

key: selects the target column to be grouped

level: level of the target index

freq: groupby a specified frequency if a target column is a datetime-like object

axis: name or number of axis

sort: to enable sorting

label: interval boundary to be used for labeling, valid only when freq parameter is passed.

convention: If grouper is PeriodIndex and freq parameter is passed

base: works only when freq is passed

Ioffset: works only when freq is passed

origin: timestamp to adjust grouping on the basis of

offset: offset timedelta added to the origin

Approach

Import module
Load or create data
Resample the data as required
Grouping the data

Implementation using this approach is given below:

Dataframe in use: timeseries.csv

Link: here.

Program : Aggregating using resampling

Python3

import numpy as np 
import pandas as pd 
  
# loading dataset 
data = pd.read_csv('path of dataset') 
  
# setting the index for the data 
data = data.set_index(['created_at']) 
  
# converting index to datetime index 
data.index = pd.to_datetime(data.index) 
  
# Changing start time for each hour, by default start time is at 0th minute 
data.resample('W',  loffset='30Min30s').price.sum().head(2) 
data.resample('W', loffset='30Min30s').price.sum().head(2) 
  
# we can also aggregate it will show quantity added in each week 
# as well as the total amount added in each week 
data.resample('W', loffset='30Min30s').agg( 
    {'price': 'sum', 'quantity': 'sum'}).head(5) 

Output:

Program : Grouping the data based on different time intervals

In the first part we are grouping like the way we did in resampling (on the basis of days, months, etc.) then we group the data on the basis of store type over a month Then aggregating as we did in resample It will give the quantity added in each week as well as the total amount added in each week.

Python3

import numpy as np 
  
import pandas as pd 
  
# loading dataset 
  
data = pd.read_csv(r'path of dataset') 
# setting the index for the data 
data = data.set_index(['created_at']) 
# converting index to datetime index 
data.index = pd.to_datetime(data.index) 
  
# Changing start time for each hour, by default start time is at 0th minute 
data.resample('W',  loffset='30Min30s').price.sum().head(2) 
data.resample('W', loffset='30Min30s').price.sum().head(2) 
  
data.groupby([pd.Grouper(freq='M'), 'store_type']).agg(total_quantity=('quantity', 'sum'), 
                                                       total_amount=('price', 'sum')).head(5) 

Output:

How to group data by time intervals in Python Pandas?

Approach

Python3

Python3

Java Program for Longest Common Subsequence

Maximum height of Tree when any Node can be considered as Root

Print Fibonacci sequence using 2 variables

LEAVE A REPLY Cancel reply

Most Popular

How to factory reset the Google Pixel 8a

The 2024 YouTube Music Recap could be here any day now

How to install Proton VPN on a Fire TV Stick

Google Messages can now show your profile exactly how it’s supposed to be

Recent Comments

EDITOR PICKS

How to factory reset the Google Pixel 8a

The 2024 YouTube Music Recap could be here any day now

How to install Proton VPN on a Fire TV Stick

POPULAR POSTS

How to factory reset the Google Pixel 8a

The 2024 YouTube Music Recap could be here any day now

How to install Proton VPN on a Fire TV Stick

POPULAR CATEGORY

ABOUT US

FOLLOW US