When working with Python records (lists of tuples), we sometimes need to group elements by the equality of several keys at once and sum a particular field within each group. This kind of problem comes up often in data-processing applications. Let's discuss several ways to perform this task.
Input : test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best')]
        grp_indx = [1, 2] [ Indices to group ]
        sum_idx = [0] [ Index to sum ]
Output : [('M', 'Gfg', 12), ('H', 'Gfg', 23), ('M', 'Best', 13)]

Input : test_list = [(12, 'M', 'Gfg'), (23, 'M', 'Gfg'), (13, 'M', 'Best')]
        grp_indx = [1, 2] [ Indices to group ]
        sum_idx = [0] [ Index to sum ]
Output : [('M', 'Gfg', 35), ('M', 'Best', 13)]
Method 1: Using loop + defaultdict() + list comprehension
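The combination of the above functionalities can be used to solve this problem. Here, the loop performs both the grouping and the summation by accumulating totals in a defaultdict keyed by the grouping fields, and the list comprehension then builds the result tuples from the accumulated key-total pairs.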
Approach:
- List of tuples test_list is initialized with some values.
- grp_indx is a list of grouping indices, indicating the positions of elements in each tuple that will be used for grouping.
- sum_idx is a list of summation indices, indicating the positions of elements in each tuple that will be used for summation.
- A defaultdict named temp is initialized to store the results.
- A loop iterates through each tuple in test_list.
- For each tuple, the elements at positions grp_indx[0] and grp_indx[1] are used to form a key for temp.
- The value at position sum_idx[0] in the tuple is added to the corresponding value in temp.
- Once all tuples have been processed, a list comprehension is used to create a new list res by iterating through each key-value pair in temp and creating a new tuple by concatenating the key and value.
- Finally, the grouped summation is printed.
Below is the implementation of the above idea:
Python3
# Python3 code to demonstrate working of
# Multiple Keys Grouped Summation
# Using loop + defaultdict() + list comprehension
from collections import defaultdict

# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# printing original list
print("The original list is : " + str(test_list))

# initializing grouping indices
grp_indx = [1, 2]

# initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using loop + defaultdict() + list comprehension
temp = defaultdict(int)
for sub in test_list:
    temp[(sub[grp_indx[0]], sub[grp_indx[1]])] += sub[sum_idx[0]]
res = [key + (val, ) for key, val in temp.items()]

# printing result
print("The grouped summation : " + str(res))
The original list is : [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'), (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
The grouped summation : [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
Time complexity: O(n), where n is the length of the input list.
Auxiliary space: O(m), where m is the number of distinct combinations of grouping indices.
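Method 1 hard-codes exactly two grouping indices and one summation index. As a minimal sketch of how the same idea could be generalized to any number of indices, the helper below (the name grouped_sum is hypothetical, not part of the original code) builds the key with a generator expression and keeps one running total per summed field:

```python
from collections import defaultdict


def grouped_sum(records, grp_indx, sum_idx):
    """Group records by the fields at grp_indx and sum the fields at sum_idx.

    Hypothetical helper: a generalization of Method 1, not the original code.
    """
    # One running total per summed field, keyed by the tuple of grouping fields.
    totals = defaultdict(lambda: [0] * len(sum_idx))
    for rec in records:
        key = tuple(rec[i] for i in grp_indx)
        for pos, i in enumerate(sum_idx):
            totals[key][pos] += rec[i]
    # Each result tuple is the grouping fields followed by the totals.
    return [key + tuple(vals) for key, vals in totals.items()]


test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

res = grouped_sum(test_list, [1, 2], [0])
print("The grouped summation : " + str(res))
```

With grp_indx = [1, 2] and sum_idx = [0] this should give the same result as Method 1, and passing a single grouping index such as [1] groups on that field alone.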
Method 2: Using itertools.groupby() and a lambda function for Multiple Keys Grouped Summation
In this method, we first sort the input list using the sorted() function with a lambda function that extracts the values at the grouping indices. We then use itertools.groupby() to group the sorted list by the same key. Finally, a list comprehension iterates over each group, sums the values at the sum_idx index for the elements in the group, and creates a new tuple containing the grouping values followed by the sum. Because of the sort, the result is ordered by key rather than by first appearance.
Python3
from itertools import groupby

# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# Printing original list
print("The original list is : " + str(test_list))

# Initializing grouping indices
grp_indx = [1, 2]

# Initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using itertools.groupby() and a lambda function
# (the summed field is read through sum_idx instead of a hard-coded 0)
res = [(key[0], key[1], sum(sub[sum_idx[0]] for sub in group))
       for key, group in groupby(
           sorted(test_list, key=lambda x: (x[grp_indx[0]], x[grp_indx[1]])),
           key=lambda x: (x[grp_indx[0]], x[grp_indx[1]]))]

# Printing result
print("The grouped summation : " + str(res))
The original list is : [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'), (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) because of the sorting operation. The groupby function itself has a time complexity of O(n).
Auxiliary space: O(n).
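The sort in Method 2 is not optional: itertools.groupby() only merges consecutive elements with equal keys. The short sketch below (the helper name key is an assumption made for illustration) compares the keys produced with and without sorting on the same input:

```python
from itertools import groupby

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]


def key(rec):
    # Grouping key: the fields at indices 1 and 2.
    return (rec[1], rec[2])


# Without sorting, equal keys that are not adjacent end up in separate groups.
unsorted_keys = [k for k, _ in groupby(test_list, key=key)]

# After sorting by the same key, each key appears exactly once.
sorted_keys = [k for k, _ in groupby(sorted(test_list, key=key), key=key)]

print(len(unsorted_keys), unsorted_keys)
print(len(sorted_keys), sorted_keys)
```

On this input no two neighbouring tuples share a key, so the unsorted call yields six one-element groups, while the sorted call yields the three expected groups.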
Method 3: Using pandas library
Pandas is a powerful library in Python for data manipulation and analysis. It has a groupby function that can be used to group data by one or more keys and perform operations on the grouped data.
Python3
import pandas as pd

# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# creating a pandas DataFrame from the list
df = pd.DataFrame(test_list, columns=['value', 'key1', 'key2'])

# grouping by key1 and key2 and summing the values
grouped = df.groupby(['key1', 'key2'])['value'].sum()

# converting the result back to a list of tuples
# (int() turns the NumPy integer into a plain Python int for clean printing)
res = [(key[0], key[1], int(value)) for key, value in grouped.items()]

# printing result
print("The grouped summation : " + str(res))
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) in the worst case, because groupby() sorts the group keys by default (sort=True), which costs up to O(n log n) when most key combinations are distinct.
Auxiliary space: O(n) because pandas needs to create a DataFrame object to store the input data and perform the grouping operation.
Method 4: Using itertools.groupby() and operator.itemgetter()
Use the itertools.groupby() function and the operator.itemgetter() function to group the elements by their keys and sum the values.
Python3
import itertools
import operator

# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# Initializing grouping indices
grp_indx = [1, 2]

# Initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using itertools.groupby() and operator.itemgetter()
# (note: sort() reorders test_list in place)
test_list.sort(key=operator.itemgetter(*grp_indx))
res = []
for k, g in itertools.groupby(test_list, key=operator.itemgetter(*grp_indx)):
    vals = [sub[sum_idx[0]] for sub in g]
    res.append(k + (sum(vals),))

# Printing result
print("The grouped summation : " + str(res))
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) due to the in-place sorting of the input list with list.sort().
Auxiliary space: O(n) because the result list res and the temporary list vals both have a maximum size of n, where n is the number of elements in the input list.
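One detail of Method 4 is worth a sketch: operator.itemgetter() returns a tuple only when it is given two or more indices. With a single grouping index it returns the bare element, so `k + (sum(vals),)` would fail or misbehave. The helper below (make_key is a hypothetical name, not part of the original code) wraps the single-index case so the key is always a tuple, and uses sorted() so the input list is left unchanged:

```python
import itertools
import operator


def make_key(grp_indx):
    """Return a key function that always yields a tuple (hypothetical helper)."""
    getter = operator.itemgetter(*grp_indx)
    if len(grp_indx) == 1:
        # itemgetter with one index returns the bare element, so wrap it.
        return lambda rec: (getter(rec),)
    return getter


test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
grp_indx = [1]   # a single grouping index this time
sum_idx = [0]

key = make_key(grp_indx)
res = []
# sorted() returns a new list, so test_list keeps its original order.
for k, g in itertools.groupby(sorted(test_list, key=key), key=key):
    res.append(k + (sum(sub[sum_idx[0]] for sub in g),))

print("The grouped summation : " + str(res))
```

Grouping on index 1 alone collapses the data into one total for 'H' and one for 'M'.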
Method 5: Using dictionary comprehension
- Initialize the input list, grouping indices, and sum index.
- Use a dictionary comprehension to initialize a dictionary whose keys are tuples of the values found at the grouping indices and whose values are all 0.
- Traverse each tuple in the input list and update the corresponding entry in the dictionary by adding the value at the sum index to the existing total.
- Convert the dictionary to a list of tuples where each tuple contains the grouping values followed by the sum.
- Print the result.
Python3
# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# Initializing grouping indices
grp_indx = [1, 2]

# Initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using dictionary comprehension
temp = {(sub[grp_indx[0]], sub[grp_indx[1]]): 0 for sub in test_list}
for sub in test_list:
    temp[(sub[grp_indx[0]], sub[grp_indx[1]])] += sub[sum_idx[0]]
res = [key + (val,) for key, val in temp.items()]

# Printing result
print("The grouped summation: " + str(res))
The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
Time complexity: O(n), where n is the length of the input list (the list is traversed twice: once to initialize the keys and once to accumulate the sums).
Auxiliary Space: O(m), where m is the number of unique combinations of grouping indices.
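Method 5 walks the input twice, once to create the zeroed keys and once to add the values. As a possible variation (a swapped-in technique using dict.get(), not the dictionary-comprehension approach itself), the two passes can be folded into one by supplying the default at lookup time:

```python
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
grp_indx = [1, 2]
sum_idx = [0]

temp = {}
for sub in test_list:
    key = (sub[grp_indx[0]], sub[grp_indx[1]])
    # get() returns 0 the first time a key is seen, so no pre-initialization
    # pass over the list is needed.
    temp[key] = temp.get(key, 0) + sub[sum_idx[0]]

res = [key + (val,) for key, val in temp.items()]
print("The grouped summation: " + str(res))
```

The result keeps first-appearance order, as in Method 5, because dictionaries preserve insertion order.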
Method 6: Using the built-in function reduce() from the functools module
reduce() is a function from the functools module in Python that applies a function of two arguments cumulatively on a sequence of elements, in this case, our list of tuples.
Approach:
- Import reduce from the functools module
- Initialize grp_indx and sum_idx variables as before
- Define a lambda function that takes the accumulated dictionary and the next tuple as arguments and returns a new dictionary in which the total stored under the tuple's grouping key has been increased by the tuple's value at the sum index. This function will be used by reduce() to perform the grouped summation.
- Use reduce() to apply the lambda function on the list of tuples. The initial value passed to reduce() is an empty dictionary.
- Convert the resulting dictionary to a list of tuples, where each tuple has the same first two elements as the keys of the dictionary and the third element is the value of the corresponding key.
- Print the result.
Below is the implementation of the above approach:
Python3
from functools import reduce

# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# Initializing grouping indices
grp_indx = [1, 2]

# Initializing sum index
sum_idx = [0]

# Using reduce() for Multiple Keys Grouped Summation
res_dict = reduce(
    lambda d, t: {**d,
                  (t[grp_indx[0]], t[grp_indx[1]]):
                  d.get((t[grp_indx[0]], t[grp_indx[1]]), 0) + t[sum_idx[0]]},
    test_list, {})

# Converting the dictionary to a list of tuples
res = [(k[0], k[1], v) for k, v in res_dict.items()]

# printing result
print("The grouped summation: " + str(res))
The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
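Time Complexity: O(n * m), where n is the length of the input list and m is the number of distinct key combinations. reduce() calls the lambda n times, and each call copies the accumulated dictionary with {**d, ...}, which costs O(m); the dictionary lookup itself is O(1) on average. In the worst case, when all keys are distinct, this is O(n^2).
Auxiliary Space: O(m) for the dictionary that stores the running totals.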
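The lambda in Method 6 rebuilds the dictionary with {**d, ...} on every step, which copies all entries accumulated so far. A minimal sketch of an alternative, assuming it is acceptable to mutate the accumulator, replaces the lambda with a named function (add_record is a hypothetical name) that updates the dictionary in place and returns it:

```python
from functools import reduce

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
grp_indx = [1, 2]
sum_idx = [0]


def add_record(acc, rec):
    # Update the accumulator in place instead of copying it on every step.
    key = (rec[grp_indx[0]], rec[grp_indx[1]])
    acc[key] = acc.get(key, 0) + rec[sum_idx[0]]
    return acc


# The fresh {} passed as the initial value is the only dictionary created.
res_dict = reduce(add_record, test_list, {})
res = [(k[0], k[1], v) for k, v in res_dict.items()]
print("The grouped summation: " + str(res))
```

Each step now does a constant amount of work on average, so the whole reduction is linear in the length of the input list. The trade-off is that the function is no longer pure, which is harmless here because the initial dictionary is not shared with any other code.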
Method 7: Using NumPy
Steps:
- First, we import the NumPy library.
- We initialize the input list (test_list) and the grouping and sum indices (grp_indx and sum_idx, respectively).
- We convert the input list to a NumPy array using np.array(). Because the tuples mix integers and strings, every element of the resulting array is stored as a string.
- We extract the grouping and sum columns as separate arrays using array slicing (arr[:, grp_indx] and arr[:, sum_idx], respectively).
- We convert the sum_arr to a numeric data type (such as int) using the astype() method, so that we can perform summation on it later.
- We use the np.unique() function to find the unique combinations of the grouping indices (grp_arr) and store them in unique_groups.
- We iterate over the unique combinations of grouping indices using a for loop.
- For each unique combination, we calculate the grouped summation by using the np.all() function to compare the grp_arr with the current group, and then summing the corresponding values in sum_arr.
- We append the results as tuples to a list called result.
- Finally, we print the result.
Python3
import numpy as np

# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# initializing grouping indices
grp_indx = [1, 2]

# initializing sum index
sum_idx = [0]

# convert the list to a NumPy array (mixed tuples become an array of strings)
arr = np.array(test_list)

# extract the grouping and sum columns as separate arrays
grp_arr = arr[:, grp_indx]
sum_arr = arr[:, sum_idx].astype(int)  # convert to int for numeric summation

# use np.unique() to find the unique combinations of the grouping values
unique_groups = np.unique(grp_arr, axis=0)

# iterate over the unique combinations and calculate the grouped summation
result = []
for group in unique_groups:
    group_sum = np.sum(sum_arr[np.all(grp_arr == group, axis=1)])
    # convert NumPy scalars to plain Python values so the tuples print cleanly
    result.append((str(group[0]), str(group[1]), int(group_sum)))

# printing result
print("The grouped summation: " + str(result))
The grouped summation: [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
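Time complexity: O(N log N) for np.unique(), where N is the number of elements in test_list, plus O(N * M) for the for loop, where M is the number of unique groups, because each iteration compares the current group against all N rows.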
Auxiliary Space: O(N) for the NumPy arrays and O(N) for the result list.