Python Pandas – Flatten nested JSON

28 July 2024

1

Most of the data extracted from the web through scraping are in the form of JSON datatype because JSON is a preferred datatype for transmitting data in web applications. The reason JSON is preferred is that it’s extremely lightweight to send back and forth in HTTP requests and responses due to the small file size.

Such a JSON file can sometimes be clumsy having different levels and hierarchy. Such raw formats of data cannot be used for further processing. It is general practice to convert the JSON data structure to a Pandas Dataframe as it can help to manipulate and visualize the data more conveniently. In this article, let us consider different nested JSON data structures and flatten them using inbuilt and custom-defined functions.

Pandas have a nice inbuilt function called json_normalize() to flatten the simple to moderately semi-structured nested JSON structures to flat tables.

Syntax: pandas.json_normalize(data, errors=’raise’, sep=’.’, max_level=None)

Parameters:

data – dict or list of dicts

errors – {‘raise’, ‘ignore’}, default ‘raise’

sep – str, default ‘.’ nested records will generate names separated by a specified separator.

max_level – int, default None. Max number of levels(depth of dict) to normalize.

Example 1:

Consider a list of nested dictionaries that contains details about the students and their marks as shown. Use pandas json_normalize on this JSON data structure to flatten it to a flat table as shown

Python3

import pandas as pd
  
data = [
    {"Roll no": 1,
     "student": {"first_name": "Ram", "last_name": "kumar"}
     },
    {"student": {"English": "95", "Math": "88"}
     },
    {"Roll no": 2,
     "student": {"first_name": "Joseph", "English": "90", "Science": "82"}
     },
    {"Roll no": 3,
     "student": {"first_name": "abinaya", "last_name": "devi"},
     "student": {"English": "91", "Math": "98"}
     },
]
  
df = pd.json_normalize(data)
print(df)

Output:

Example 2:

Now let us make use of the max_level option to flatten a slightly complicated JSON structure to a flat table. For this example, we have considered the max_level of 0, which means flattening only the first level of JSON and can experiment with the results.

Here, we have considered an example of the health records of different individuals in JSON format.

Python3

import pandas as pd
  
data = [
    {
        "id": 1,
        "candidate": "Roberto mathews",
        "health_index": {"bmi": 22, "blood_pressure": 130},
    },
    {"candidate": "Shane wade", "health_index": {"bmi": 28, "blood_pressure": 160}},
    {
        "id": 2,
        "candidate": "Bruce tommy",
        "health_index": {"bmi": 31, "blood_pressure": 190},
    },
]
pd.json_normalize(data, max_level=0)

Output:

As we have used only 1 level of flattening, The second level is retained as a key-value pair as shown

Example 3:

Now let us use the same JSON data structure as above, with max_level of 1, which means flattening the first two levels of JSON and can experiment with the results.

Python3

import pandas as pd
  
data = [
    {
        "id": 1,
        "candidate": "Roberto mathews",
        "health_index": {"bmi": 22, "blood_pressure": 130},
    },
    {"candidate": "Shane wade", "health_index": {"bmi": 28, "blood_pressure": 160}},
    {
        "id": 2,
        "candidate": "Bruce tommy",
        "health_index": {"bmi": 31, "blood_pressure": 190},
    },
]
pd.json_normalize(data, max_level=1)

Output:

Here, unlike the before example we flatten the first two rows of the JSON which gives a complete structure for the flat table.

Example 4:

Finally, let us consider a deeply nested JSON structure that can be converted to a flat table by passing the meta arguments to the json_normalize function as shown below.

Here, in the below code, we have passed the chronological order in which the JSON has to be parsed to a flat table. In the below code, we first suggest the department key to be parsed followed by the company and tagline, then, we pass the management and CEO key as a nested list indicating that they have to be parsed as a single field.

Python3

import pandas as pd
data = [
    {
        "company": "Google",
        "tagline": "Dont be evil",
        "management": {"CEO": "Sundar Pichai"},
        "department": [
            {"name": "Gmail", "revenue (bn)": 123},
            {"name": "GCP", "revenue (bn)": 400},
            {"name": "Google drive", "revenue (bn)": 600},
        ],
    },
    {
        "company": "Microsoft",
        "tagline": "Be What's Next",
        "management": {"CEO": "Satya Nadella"},
        "department": [
            {"name": "Onedrive", "revenue (bn)": 13},
            {"name": "Azure", "revenue (bn)": 300},
            {"name": "Microsoft 365", "revenue (bn)": 300},
        ],
    },
  
]
result = pd.json_normalize(
    data, "department", ["company", "tagline", ["management", "CEO"]]
)
result

Output:

Python Pandas – Flatten nested JSON

Python3

Python3

Python3

Python3

Java Program for Longest Common Subsequence

Maximum height of Tree when any Node can be considered as Root

Print Fibonacci sequence using 2 variables

LEAVE A REPLY Cancel reply

Most Popular

Sticky Password vs. LastPass 2024: Which Is Better? by Katarina Glamoslija

Galaxy S25 on-device AI capability expands, reducing reliance on the cloud

OnePlus 13R launches with a huge battery upgrade, starting in China

This is my surprise phone of the year [Video]

Recent Comments

EDITOR PICKS

Sticky Password vs. LastPass 2024: Which Is Better? by Katarina Glamoslija

Galaxy S25 on-device AI capability expands, reducing reliance on the cloud

OnePlus 13R launches with a huge battery upgrade, starting in China

POPULAR POSTS

Sticky Password vs. LastPass 2024: Which Is Better? by Katarina Glamoslija

Galaxy S25 on-device AI capability expands, reducing reliance on the cloud

OnePlus 13R launches with a huge battery upgrade, starting in China

POPULAR CATEGORY

ABOUT US

FOLLOW US