When working with data, we often receive flat records as a list of dictionaries and need to group them into categories by an identifying key. In the examples here, a record without a 'tag' key names a category, and a record with a 'tag' key belongs to the category named by that tag. This task comes up in domains such as web development and data science. Let's discuss several ways in which it can be performed.
Method 1: Using defaultdict() + setdefault() + loop
The combination of the above features can be used to perform this task. This is the brute-force approach. We initialize a defaultdict() whose default values are dictionaries, so that nested records can be formed, and then populate the data using setdefault() and a check for whether each record has a 'tag' key.
Python3
# Python3 code to demonstrate working of
# Value nested grouping on List
# Using loop + setdefault() + defaultdict()
from collections import defaultdict

# initializing list
test_list = [{'value': 'Fruit'}, {'tag': 'Fruit', 'value': 'mango'},
             {'value': 'Car'}, {'tag': 'Car', 'value': 'maruti'},
             {'tag': 'Fruit', 'value': 'orange'}, {'tag': 'Car', 'value': 'city'}]

# printing original list
print("The original list is : " + str(test_list))

# Value nested grouping on List
# Using loop + setdefault() + defaultdict()
temp = defaultdict(dict)
res = {}
for sub in test_list:
    name = sub['value']
    if 'tag' in sub:
        # tagged record: nest it under its category
        tag = sub['tag']
        temp[tag].setdefault(name, temp[name])
    else:
        # untagged record: expose the category at the top level
        res[name] = temp[name]

# printing result
print("The dictionary after grouping : " + str(res))
Output:
The original list is : [{'value': 'Fruit'}, {'tag': 'Fruit', 'value': 'mango'}, {'value': 'Car'}, {'tag': 'Car', 'value': 'maruti'}, {'tag': 'Fruit', 'value': 'orange'}, {'tag': 'Car', 'value': 'city'}]
The dictionary after grouping : {'Fruit': {'mango': {}, 'orange': {}}, 'Car': {'maruti': {}, 'city': {}}}
Time complexity: O(n), where n is the number of elements in the input list.
Auxiliary space: O(n), where n is the number of elements in the input list
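The loop in Method 1 relies on `temp` and `res` holding references to the same dictionary objects rather than copies, which is why values added after a category is exposed still show up in the result. A minimal sketch of that aliasing, using illustrative names (`registry`, `view`) that are not part of the method itself:

```python
from collections import defaultdict

registry = defaultdict(dict)   # every missing key gets a fresh empty dict
view = {}

# Expose the 'Fruit' dict through a second mapping (no copy is made)
view['Fruit'] = registry['Fruit']

# Mutating through `registry` is visible through `view`
registry['Fruit'].setdefault('mango', registry['mango'])

print(view)                                # {'Fruit': {'mango': {}}}
print(view['Fruit'] is registry['Fruit'])  # True
```

Because the objects are shared, the order in which category records and tagged records arrive should not matter for this approach.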
Method 2: Using itertools.groupby()
Use the groupby() function to group the dictionaries by their 'tag' key. Records without a 'tag' key name a category and are added to the result as keys with an empty set. Records with a 'tag' key are first sorted by tag, because groupby() only merges adjacent items, and the 'value' entries of each group are then collected into the set for that tag.
Python3
from itertools import groupby
from operator import itemgetter

# initializing list
test_list = [{'value': 'Fruit'}, {'tag': 'Fruit', 'value': 'mango'},
             {'value': 'Car'}, {'tag': 'Car', 'value': 'maruti'},
             {'tag': 'Fruit', 'value': 'orange'}, {'tag': 'Car', 'value': 'city'}]

# printing original list
print("The original list is : " + str(test_list))

# Value nested grouping on List
# untagged records name the categories; each starts with an empty set
res = {d['value']: set() for d in test_list if 'tag' not in d}

# groupby() only merges adjacent items, so sort the tagged records by tag first
tagged = sorted((d for d in test_list if 'tag' in d), key=itemgetter('tag'))
for k, g in groupby(tagged, key=itemgetter('tag')):
    res.setdefault(k, set()).update(i['value'] for i in g)

# printing result
print("The dictionary after grouping : " + str(res))
Output:
The original list is : [{'value': 'Fruit'}, {'tag': 'Fruit', 'value': 'mango'}, {'value': 'Car'}, {'tag': 'Car', 'value': 'maruti'}, {'tag': 'Fruit', 'value': 'orange'}, {'tag': 'Car', 'value': 'city'}]
The dictionary after grouping : {'Fruit': {'mango', 'orange'}, 'Car': {'city', 'maruti'}}
Time complexity: O(n log n), where n is the length of the input list.
Auxiliary space: O(n), where n is the length of the input list.
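One thing to keep in mind with itertools.groupby() is that it only merges consecutive items with equal keys, so input that is not ordered by the grouping key yields the same key more than once. A small sketch of the difference, using a plain list of tag names chosen only for illustration:

```python
from itertools import groupby

tags = ['Fruit', 'Car', 'Fruit', 'Car']

# Unsorted: every run of equal keys becomes its own group
unsorted_groups = [(k, len(list(g))) for k, g in groupby(tags)]

# Sorted: equal keys are adjacent, so each key appears exactly once
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(tags))]

print(unsorted_groups)  # [('Fruit', 1), ('Car', 1), ('Fruit', 1), ('Car', 1)]
print(sorted_groups)    # [('Car', 2), ('Fruit', 2)]
```

The sort is also where the O(n log n) term in the time complexity comes from.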
Method 3: Using nested setdefault() calls while iterating over the list
Python3
test_list = [{'value': 'Fruit'}, {'tag': 'Fruit', 'value': 'mango'},
             {'value': 'Car'}, {'tag': 'Car', 'value': 'maruti'},
             {'tag': 'Fruit', 'value': 'orange'}, {'tag': 'Car', 'value': 'city'}]

# Grouping the list based on 'tag' and 'value' keys
res = {}
for d in test_list:
    if 'tag' in d:
        # tagged record: add its value to the set stored under the tag
        res.setdefault(d['tag'], {}).setdefault('value', set()).add(d['value'])
    else:
        # untagged record: make sure the category key exists
        res.setdefault(d['value'], {})

# printing result
print("The dictionary after grouping : " + str(res))
The dictionary after grouping : {'Fruit': {'value': {'mango', 'orange'}}, 'Car': {'value': {'maruti', 'city'}}}
Note: In this method, the grouped values are stored one level deeper, in a set under the 'value' key of each category. If the 'tag' key is not present in a record, its 'value' is added to the result as a key with an empty dictionary, unless that key already exists.
Time complexity: O(n), where n is the length of the input list.
Auxiliary space: O(m), where m is the number of unique ‘tag’ and ‘value’ keys in the input list.
Method 4: Using pandas library
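Pandas is a popular library for data manipulation and analysis, which includes powerful tools for grouping and aggregating data. Records without a 'tag' get NaN in that column, and groupby() drops NaN keys by default, so only the tagged records are grouped.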
Python3
import pandas as pd

# initializing list
test_list = [{'value': 'Fruit'}, {'tag': 'Fruit', 'value': 'mango'},
             {'value': 'Car'}, {'tag': 'Car', 'value': 'maruti'},
             {'tag': 'Fruit', 'value': 'orange'}, {'tag': 'Car', 'value': 'city'}]

# creating a DataFrame from the list
df = pd.DataFrame(test_list)

# grouping and aggregating data
res = df.groupby('tag')['value'].apply(set).to_dict()

# printing result
print("The dictionary after grouping : " + str(res))
Output:
The dictionary after grouping : {'Car': {'city', 'maruti'}, 'Fruit': {'orange', 'mango'}}
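Time complexity: O(n log n), where n is the length of the input list.
Auxiliary space: O(n + m), where n is the length of the input list and m is the number of distinct tags.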
Method 5: Using the collections module's defaultdict() and a for loop to group the values based on the 'tag' key.
The program groups the values in the input list based on the ‘tag’ key using a defaultdict with set as the default value. It then converts the defaultdict to a regular dictionary and prints the result.
Python3
from collections import defaultdict

# initializing list
test_list = [{'value': 'Fruit'}, {'tag': 'Fruit', 'value': 'mango'},
             {'value': 'Car'}, {'tag': 'Car', 'value': 'maruti'},
             {'tag': 'Fruit', 'value': 'orange'}, {'tag': 'Car', 'value': 'city'}]

# creating a defaultdict with set as the default value
res = defaultdict(set)

# iterating over the list and grouping the values
for item in test_list:
    if 'tag' in item:
        res[item['tag']].add(item['value'])

# converting the defaultdict to a regular dictionary
res = dict(res)

# printing result
print("The dictionary after grouping : " + str(res))
The dictionary after grouping : {'Fruit': {'mango', 'orange'}, 'Car': {'city', 'maruti'}}
Time complexity: O(n), where n is the length of the input list.
Auxiliary space: O(n), where n is the length of the input list.
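The loop above skips records that have no 'tag', so a category with no members would not appear in the result at all. If empty categories should be kept, one possible variation is sketched below. The function name `group_by_tag` and the sample data are illustrative assumptions, not part of the original program.

```python
from collections import defaultdict

def group_by_tag(records):
    """Group 'value' entries by 'tag'; untagged records become (possibly empty) groups."""
    groups = defaultdict(set)
    for item in records:
        if 'tag' in item:
            groups[item['tag']].add(item['value'])
        else:
            groups[item['value']]  # touching the key creates an empty set
    return dict(groups)

# 'Bike' is a category with no tagged members
sample = [{'value': 'Fruit'}, {'tag': 'Fruit', 'value': 'mango'}, {'value': 'Bike'}]
result = group_by_tag(sample)
print(result)
```

Here 'Bike' is kept with an empty set, while 'Fruit' contains 'mango'.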
Method 6: Using nested comprehensions
- Use a list comprehension to build, for every record that has a 'tag' key, a tuple containing the 'tag' value and a list of the 'value' entries of all records with the same tag.
- Use a dictionary comprehension to convert these tuples into a dictionary, where the keys are the 'tag' values and each value is a dictionary holding the set of corresponding 'value' entries under the 'value' key.
- Loop over the input list once more and add every 'value' that is not yet a key in the result, with an empty dictionary as its value.
Python3
test_list = [{'value': 'Fruit'}, {'tag': 'Fruit', 'value': 'mango'},
             {'value': 'Car'}, {'tag': 'Car', 'value': 'maruti'},
             {'tag': 'Fruit', 'value': 'orange'}, {'tag': 'Car', 'value': 'city'}]

# build {tag: {'value': set of values}} for every tagged record
res = {k: {'value': set(v)}
       for k, v in [(d['tag'], [i['value'] for i in test_list
                                if i.get('tag') == d['tag']])
                    for d in test_list if 'tag' in d]}

# add every value that is not already a key
for item in test_list:
    if 'value' in item and item['value'] not in res:
        res[item['value']] = {}

print("The dictionary after grouping: " + str(res))
The dictionary after grouping: {'Fruit': {'value': {'mango', 'orange'}}, 'Car': {'value': {'city', 'maruti'}}, 'mango': {}, 'maruti': {}, 'orange': {}, 'city': {}}
Time complexity: O(n^2) in the worst case, since the inner list comprehension scans the whole input list once for every tagged record.
Auxiliary space: O(n^2) in the worst case for the intermediate list of (tag, values) tuples, and O(n) for the resulting dictionary.
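The inner comprehension in Method 6 rescans the whole list for each tagged record. As a point of comparison, here is a linear-time sketch that should build the same dictionary with two plain loops and setdefault(). It is an alternative formulation, not the method shown above.

```python
test_list = [{'value': 'Fruit'}, {'tag': 'Fruit', 'value': 'mango'},
             {'value': 'Car'}, {'tag': 'Car', 'value': 'maruti'},
             {'tag': 'Fruit', 'value': 'orange'}, {'tag': 'Car', 'value': 'city'}]

res = {}

# First loop: collect tagged values under {'value': set()}
for d in test_list:
    if 'tag' in d:
        res.setdefault(d['tag'], {'value': set()})['value'].add(d['value'])

# Second loop: add every value that is not already a key
for d in test_list:
    res.setdefault(d['value'], {})

print("The dictionary after grouping: " + str(res))
```

Each record is visited twice, so the running time grows linearly with the length of the list.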