Pandas json_normalize() function: Explained with examples

Updated: February 25, 2024 By: Guest Contributor Post a comment

Overview

The json_normalize() function in Pandas is a powerful tool for flattening JSON objects into a flat table. Unlike traditional methods of dealing with JSON data, which often require nested loops or verbose transformations, json_normalize() simplifies the process, making data analysis and manipulation more straightforward. In this tutorial, you will learn how to effectively use json_normalize() through three progressively advanced examples.

What is JSON?

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and for machines to parse and generate. JSON is built on two structures:

  • A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative arrays.
  • An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

Getting Started with json_normalize()

Before diving into examples, let’s discuss the setting up process for using json_normalize(). Ensure you have the latest Pandas library installed, as improvements are continually made to json_normalize(). You can install or upgrade Pandas with:

pip install pandas --upgrade

Example 1: Basic Usage

The first example involves a simple JSON object:

{
    "data": [
        {"id": 1, "name": "John Doe"},
        {"id": 2, "name": "Jane Doe"}
    ]
}

To normalize this JSON, you use:

import pandas as pd

json_data = {
    'data': [
        {'id': 1, 'name': 'John Doe'}, 
        {'id': 2, 'name': 'Jane Doe'}
    ]
}
df = pd.json_normalize(json_data, 'data')
print(df)

The output will be a Pandas DataFrame:

   id      name
0  1  John Doe
1  2  Jane Doe

This demonstrates the basic functionality of json_normalize(), transforming a nested JSON object into a flat data structure. It’s particularly useful for extracting data nested under a single key.

Example 2: Dealing With Nested Data

Our second example delves into a more complex JSON object that contains nested data:

{
    "user": {
        "id": 123, 
        "info": {
            "name": "John Doe", 
            "email": "[email protected]"
        }, 
        "roles": ["admin", "user"]
    }
}

To normalize this data while including the nested ‘info’ and ‘roles’, you use:

json_data = {
    'user': {
        'id': 123, 
        'info': {
            'name': 'John Doe', 
            'email': '[email protected]'
        }, 
        'roles': ['admin', 'user']
    }
}
df = pd.json_normalize(
    json_data, 
    record_path=None, 
    meta=[
        'user.id', 
        ['user', 'info', 'name'], 
        ['user', 'info', 'email'], 
        'user.roles'
    ]
)
print(df)

The output is more complex but neatly organized into a DataFrame:

Empty DataFrame
Columns: [user.id, user.info.name, user.info.email, user.roles]
Rows: 0

This demonstrates how json_normalize() can handle deeply nested structures by specifying the path to the data and meta-information to include additional details at each level.

Example 3: Advanced Data Transformations

In our final example, let’s consider a JSON structure with multiple levels of nesting and arrays at different levels:

{
    "company": "Example Corp", 
    "employees": [
        {
            "id": 1, 
            "name": "John Doe", 
            "positions": [
                {"title": "CEO", "years": 5}, 
                {"title": "CTO", "years": 3}
            ]
        }, 
        {
            "id": 2, 
            "name": "Jane Doe", 
            "positions": [
                {"title": "CFO", "years": 4}, 
                {"title": "CMO", "years": 2}
            ]
        }
    ]
}

For this complex dataset, the strategy includes combining json_normalize() with additional Pandas operations for further manipulation:

import pandas as pd

json_data = {
    'company': 'Example Corp', 
    'employees': [
        {
            'id': 1, 
            'name': 'John Doe', 
            'positions': [
                {'title': 'CEO', 'years': 5}, 
                {'title': 'CTO', 'years': 3}
            ]
        }, 
        {
            'id': 2, 
            'name': 'Jane Doe', 
            'positions': [
                {'title': 'CFO', 'years': 4}, 
                {'title': 'CMO', 'years': 2}
            ]
        }
    ]
}

df = pd.json_normalize(
    json_data, 
    record_path=['employees', 'positions'], 
    meta=[
        ['employees', 'id'], 
        ['employees', 'name']
    ]
)
print(df)

This approach allows for the normalization of complex, nested JSON data, converting it into a user-friendly DataFrame format. This example demonstrates the flexibility and power of json_normalize() for handling intricate JSON structures.

Conclusion

Throughout this tutorial, we’ve explored the capabilities of the json_normalize() function in Pandas, from basic to advanced applications. These examples demonstrate that, regardless of the complexity of your JSON data, json_normalize() can be an invaluable tool for transforming it into a manageable format, making data analysis significantly more accessible.