Sometimes, while working with Python records, we need to group elements by equality on multiple keys and then sum a particular field within each group. This kind of problem comes up in data-processing applications. Let's discuss several ways in which this task can be performed.
Input :
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best')]
grp_indx = [1, 2] [ Indices to group ]
sum_idx = [0] [ Index to sum ]
Output : [('M', 'Gfg', 12), ('H', 'Gfg', 23), ('M', 'Best', 13)]
Input :
test_list = [(12, 'M', 'Gfg'), (23, 'M', 'Gfg'), (13, 'M', 'Best')]
grp_indx = [1, 2] [ Indices to group ]
sum_idx = [0] [ Index to sum ]
Output : [('M', 'Gfg', 35), ('M', 'Best', 13)]
Method 1: Using loop + defaultdict() + list comprehension
The combination of the above functionalities can be used to solve this problem. Here, both the grouping and the summation happen inside a loop that accumulates totals in a defaultdict keyed by the grouping fields. A list comprehension then converts each key and its total into a result tuple.
Approach:
- List of tuples test_list is initialized with some values.
- grp_indx is a list of grouping indices, indicating the positions of elements in each tuple that will be used for grouping.
- sum_idx is a list of summation indices, indicating the positions of elements in each tuple that will be used for summation.
- A defaultdict named temp is initialized to store the results.
- A loop iterates through each tuple in test_list.
- For each tuple, the elements at positions grp_indx[0] and grp_indx[1] are used to form a key for temp.
- The value at position sum_idx[0] in the tuple is added to the corresponding value in temp.
- Once all tuples have been processed, a list comprehension is used to create a new list res by iterating through each key-value pair in temp and creating a new tuple by concatenating the key and value.
- Finally, the grouped summation is printed.
Below is the implementation of the above approach:
Python3
# Python3 code to demonstrate working of
# Multiple Keys Grouped Summation
# Using loop + defaultdict() + list comprehension
from collections import defaultdict

# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# printing original list
print("The original list is : " + str(test_list))

# initializing grouping indices
grp_indx = [1, 2]

# initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using loop + defaultdict() + list comprehension
temp = defaultdict(int)
for sub in test_list:
    temp[(sub[grp_indx[0]], sub[grp_indx[1]])] += sub[sum_idx[0]]
res = [key + (val, ) for key, val in temp.items()]

# printing result
print("The grouped summation : " + str(res))
The original list is : [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'), (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
The grouped summation : [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
Time complexity: O(n), where n is the length of the input list.
Auxiliary space: O(m), where m is the number of distinct combinations of grouping indices.
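The loop above hardcodes two grouping indices and one sum index. As a minimal sketch of how the same idea could be generalized, the hypothetical helper grouped_sum() below (its name and signature are assumptions, not part of the original code) builds the key from every index in grp_indx and keeps one running total per index in sum_idx. It is checked against the two Input/Output examples given at the top.

```python
from collections import defaultdict


def grouped_sum(records, grp_indx, sum_idx):
    """Group records by the fields at grp_indx and sum the fields at sum_idx."""
    # One list of running totals per key, one slot per summed index.
    totals = defaultdict(lambda: [0] * len(sum_idx))
    for rec in records:
        key = tuple(rec[i] for i in grp_indx)
        for pos, i in enumerate(sum_idx):
            totals[key][pos] += rec[i]
    # Each result tuple is the key fields followed by the totals.
    return [key + tuple(sums) for key, sums in totals.items()]


example_1 = grouped_sum(
    [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best')], [1, 2], [0])
example_2 = grouped_sum(
    [(12, 'M', 'Gfg'), (23, 'M', 'Gfg'), (13, 'M', 'Best')], [1, 2], [0])

print(example_1)  # [('M', 'Gfg', 12), ('H', 'Gfg', 23), ('M', 'Best', 13)]
print(example_2)  # [('M', 'Gfg', 35), ('M', 'Best', 13)]
```

Because dictionaries preserve insertion order, the groups appear in the order in which each key is first seen in the input.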
Method 2: Using itertools.groupby() and a lambda function for Multiple Keys Grouped Summation
In this method, we first sort the input list using the sorted() function with a lambda function that extracts the elements at the grouping indices. We then use itertools.groupby() with the same key to group the sorted list. Sorting first is required because groupby() only groups consecutive elements that share a key. Finally, a list comprehension iterates over each group, sums the values at the sum_idx index, and creates a new tuple containing the grouping key elements followed by the summed value.
Python3
from itertools import groupby

# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# Printing original list
print("The original list is : " + str(test_list))

# Initializing grouping indices
grp_indx = [1, 2]

# Initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using itertools.groupby() and a lambda function
# (the summed field is read through sum_idx rather than a hardcoded 0)
res = [(key[0], key[1], sum(sub[sum_idx[0]] for sub in group))
       for key, group in groupby(
           sorted(test_list, key=lambda x: (x[grp_indx[0]], x[grp_indx[1]])),
           key=lambda x: (x[grp_indx[0]], x[grp_indx[1]]))]

# Printing result
print("The grouped summation : " + str(res))
The original list is : [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'), (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) because of the sorting operation. The groupby function itself has a time complexity of O(n).
Auxiliary space: O(n).
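To illustrate why the sort matters in this method, the following sketch runs groupby() on a small list both without and with sorting. The helper names key_of and sum_runs are hypothetical and exist only for this illustration.

```python
from itertools import groupby

records = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (18, 'M', 'Gfg')]


def key_of(rec):
    # Hypothetical helper: the grouping key is made of fields 1 and 2.
    return (rec[1], rec[2])


def sum_runs(recs):
    # groupby() yields one group per run of consecutive equal keys.
    return [key + (sum(r[0] for r in run),)
            for key, run in groupby(recs, key=key_of)]


unsorted_result = sum_runs(records)
sorted_result = sum_runs(sorted(records, key=key_of))

print(unsorted_result)  # [('M', 'Gfg', 12), ('H', 'Gfg', 23), ('M', 'Gfg', 18)]
print(sorted_result)    # [('H', 'Gfg', 23), ('M', 'Gfg', 30)]
```

Without sorting, the key ('M', 'Gfg') appears twice because its two records are not adjacent in the input.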
Method 3: Using pandas library
Pandas is a powerful Python library for data manipulation and analysis. Its DataFrame.groupby() method can group data by one or more keys and apply operations, such as a sum, to each group.
Python3
import pandas as pd

# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# creating a pandas DataFrame from the list
df = pd.DataFrame(test_list, columns=['value', 'key1', 'key2'])

# grouping by key1 and key2 and summing the values
grouped = df.groupby(['key1', 'key2'])['value'].sum()

# converting the result back to a list of tuples
# (int() turns the NumPy integer into a plain Python int for clean printing)
res = [(key[0], key[1], int(value)) for key, value in grouped.items()]

# printing result
print("The grouped summation : " + str(res))
Output:
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) as a conservative upper bound. Pandas groups the rows using hashing and, with the default sort=True, sorts the resulting group keys.
Auxiliary space: O(n) because pandas needs to create a DataFrame object to store the input data and perform the grouping operation.
Method 4: Using itertools.groupby() and operator.itemgetter()
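This method sorts the list in place with list.sort(), using operator.itemgetter() as the key, and then uses itertools.groupby() with the same key to group the elements and sum the values. Note that, unlike Method 2, this changes the order of test_list itself.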
Python3
import itertools
import operator

# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# Initializing grouping indices
grp_indx = [1, 2]

# Initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using itertools.groupby() and operator.itemgetter()
test_list.sort(key=operator.itemgetter(*grp_indx))
res = []
for k, g in itertools.groupby(test_list, key=operator.itemgetter(*grp_indx)):
    vals = [sub[sum_idx[0]] for sub in g]
    res.append(k + (sum(vals),))

# Printing result
print("The grouped summation : " + str(res))
The grouped summation : [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
Time complexity: O(n log n) due to sorting the input list in place with list.sort().
Auxiliary space: O(n) because the result list res and the temporary list vals both have a maximum size of n, where n is the number of elements in the input list.
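One caveat worth noting for this method: operator.itemgetter() returns a tuple only when it is given two or more indices. With a single grouping index it returns the bare element, so an expression like k + (sum(vals),) would fail or misbehave. The sketch below shows the difference, together with a hypothetical key_getter() helper (an assumption, not part of the original code) that always returns a tuple.

```python
from operator import itemgetter

record = (12, 'M', 'Gfg')

# Two or more indices: itemgetter returns a tuple of the selected fields.
two_keys = itemgetter(1, 2)(record)

# A single index: itemgetter returns the field itself, not a 1-tuple.
one_key = itemgetter(1)(record)


def key_getter(indices):
    """Hypothetical helper: build a key function that always returns a tuple."""
    def get(rec):
        return tuple(rec[i] for i in indices)
    return get


print(two_keys)                 # ('M', 'Gfg')
print(one_key)                  # M
print(key_getter([1])(record))  # ('M',)
```

Using a key function like key_getter(grp_indx) for both the sort and groupby() keeps the tuple concatenation valid for any number of grouping indices.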
Method 5: Using dictionary comprehension
- Initialize the input list, grouping indices, and sum index.
- Create a dictionary comprehension to initialize a dictionary with keys as tuples of grouping indices and values as 0.
- Traverse each tuple in the input list, and update the corresponding key's value in the dictionary by adding the element at the sum index to the existing value.
- Convert the dictionary to a list of tuples where each tuple contains the grouping indices followed by the sum.
- Print the result.
Python3
# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# Initializing grouping indices
grp_indx = [1, 2]

# Initializing sum index
sum_idx = [0]

# Multiple Keys Grouped Summation
# Using dictionary comprehension
temp = {(sub[grp_indx[0]], sub[grp_indx[1]]): 0 for sub in test_list}
for sub in test_list:
    temp[(sub[grp_indx[0]], sub[grp_indx[1]])] += sub[sum_idx[0]]
res = [key + (val,) for key, val in temp.items()]

# Printing result
print("The grouped summation: " + str(res))
The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
Time complexity: O(n), where n is the length of the input list. The list is traversed twice: once to initialize the dictionary and once to accumulate the sums.
Auxiliary Space: O(m), where m is the number of unique combinations of grouping indices.
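The dictionary comprehension above needs one pass to create the keys and a second pass to add the values. As a possible alternative, the sketch below does both in a single pass using dict.get() with a default of 0. The variable name totals is chosen here for illustration only.

```python
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
grp_indx = [1, 2]
sum_idx = [0]

totals = {}
for sub in test_list:
    key = tuple(sub[i] for i in grp_indx)
    # get() returns 0 the first time a key is seen, so no pre-initialization
    # pass over the list is needed.
    totals[key] = totals.get(key, 0) + sub[sum_idx[0]]

res = [key + (val,) for key, val in totals.items()]
print(res)  # [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
```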
Method 6: Using the built-in function reduce() from the functools module
reduce() is a function from the functools module in Python that applies a function of two arguments cumulatively on a sequence of elements, in this case, our list of tuples.
Approach:
- Import reduce() from the functools module.
- Initialize the grp_indx and sum_idx variables as before.
- Define a lambda function that takes the accumulated dictionary and the next tuple, and returns a new dictionary in which the value for that tuple's grouping key is increased by the element at the sum index. This function is used by reduce() to perform the grouped summation.
- Use reduce() to apply the lambda function over the list of tuples. The initial value passed to reduce() is an empty dictionary.
- Convert the resulting dictionary to a list of tuples, where the first two elements of each tuple come from the key and the third element is the summed value for that key.
- Print the result.
Below is the implementation of the above approach:
Python3
from functools import reduce

# Initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# Initializing grouping indices
grp_indx = [1, 2]

# Initializing sum index
sum_idx = [0]

# Using reduce() for Multiple Keys Grouped Summation
res_dict = reduce(
    lambda d, t: {**d,
                  (t[grp_indx[0]], t[grp_indx[1]]):
                      d.get((t[grp_indx[0]], t[grp_indx[1]]), 0) + t[sum_idx[0]]},
    test_list, {})

# Converting the dictionary to a list of tuples
res = [(k[0], k[1], v) for k, v in res_dict.items()]

# printing result
print("The grouped summation: " + str(res))
The grouped summation: [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
Time Complexity: O(n * m), where n is the length of the input list and m is the number of distinct keys. reduce() calls the lambda n times, and each call copies the accumulated dictionary with {**d, ...}, which costs O(m).
Auxiliary Space: O(n) because of the use of a dictionary to store intermediate results.
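The lambda in this method rebuilds the dictionary with {**d, ...} on every step. As a sketch of one way to avoid that copying, the named function below updates the accumulator in place and returns it, so each step is constant time on average. The function name add_record is hypothetical, and mutating the accumulator is a deliberate departure from the copy-on-each-step style of the lambda.

```python
from functools import reduce

test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]
grp_indx = [1, 2]
sum_idx = [0]


def add_record(acc, rec):
    """Hypothetical reducer: add one record's value to its group's total."""
    key = (rec[grp_indx[0]], rec[grp_indx[1]])
    acc[key] = acc.get(key, 0) + rec[sum_idx[0]]
    return acc  # reduce() passes the same dict to the next step


res_dict = reduce(add_record, test_list, {})
res = [(k[0], k[1], v) for k, v in res_dict.items()]
print(res)  # [('M', 'Gfg', 30), ('H', 'Gfg', 25), ('M', 'Best', 36)]
```

This is safe here because the initial dictionary is created fresh in the reduce() call and is not shared with any other code.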
Method 7: Using NumPy
Steps:
- First, we import the NumPy library.
- We initialize the input list (test_list) and the grouping and sum indices (grp_indx and sum_idx, respectively).
- We convert the input list to a NumPy array using np.array().
- We extract the grouping and sum columns as separate arrays, grp_arr and sum_arr, using array indexing (arr[:, grp_indx] and arr[:, sum_idx], respectively).
- We convert the sum_arr to a numeric data type (such as int) using the astype() method, so that we can perform summation on it later.
- We use the np.unique() function to find the unique combinations of the grouping indices (grp_arr) and store them in unique_groups.
- We iterate over the unique combinations of grouping indices using a for loop.
- For each unique combination, we calculate the grouped summation by using the np.all() function to compare the grp_arr with the current group, and then summing the corresponding values in sum_arr.
- We append the results as tuples to a list called result.
- Finally, we print the result.
Python3
import numpy as np

# initializing list
test_list = [(12, 'M', 'Gfg'), (23, 'H', 'Gfg'), (13, 'M', 'Best'),
             (18, 'M', 'Gfg'), (2, 'H', 'Gfg'), (23, 'M', 'Best')]

# initializing grouping indices
grp_indx = [1, 2]

# initializing sum index
sum_idx = [0]

# convert the list to a NumPy array (mixed types become strings)
arr = np.array(test_list)

# extract the grouping and sum columns as separate arrays
grp_arr = arr[:, grp_indx]
sum_arr = arr[:, sum_idx].astype(int)  # convert to int for numeric summation

# use np.unique() to find the unique combinations of the grouping columns
unique_groups = np.unique(grp_arr, axis=0)

# iterate over the unique combinations and calculate the grouped summation
result = []
for group in unique_groups:
    group_sum = np.sum(sum_arr[np.all(grp_arr == group, axis=1)])
    # convert NumPy scalars to plain Python types for clean printing
    result.append((str(group[0]), str(group[1]), int(group_sum)))

# printing result
print("The grouped summation: " + str(result))
Output:
The grouped summation: [('H', 'Gfg', 25), ('M', 'Best', 36), ('M', 'Gfg', 30)]
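Time complexity: O(N log N) for np.unique(), plus O(N * M) for the for loop, where N is the number of elements in test_list and M is the number of unique groups, since each iteration compares every row against the current group.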
Auxiliary Space: O(N) for the NumPy arrays and O(N) for the result list.
