Overview
The json_normalize()
function in Pandas is a powerful tool for flattening JSON objects into a flat table. Unlike traditional methods of dealing with JSON data, which often require nested loops or verbose transformations, json_normalize()
simplifies the process, making data analysis and manipulation more straightforward. In this tutorial, you will learn how to effectively use json_normalize()
through three progressively advanced examples.
What is JSON?
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and for machines to parse and generate. JSON is built on two structures:
- A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative arrays.
- An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
Getting Started with json_normalize()
Before diving into examples, let’s discuss the setting up process for using json_normalize()
. Ensure you have the latest Pandas library installed, as improvements are continually made to json_normalize()
. You can install or upgrade Pandas with:
pip install pandas --upgrade
Example 1: Basic Usage
The first example involves a simple JSON object:
{
"data": [
{"id": 1, "name": "John Doe"},
{"id": 2, "name": "Jane Doe"}
]
}
To normalize this JSON, you use:
import pandas as pd
json_data = {
'data': [
{'id': 1, 'name': 'John Doe'},
{'id': 2, 'name': 'Jane Doe'}
]
}
df = pd.json_normalize(json_data, 'data')
print(df)
The output will be a Pandas DataFrame:
id name
0 1 John Doe
1 2 Jane Doe
This demonstrates the basic functionality of json_normalize()
, transforming a nested JSON object into a flat data structure. It’s particularly useful for extracting data nested under a single key.
Example 2: Dealing With Nested Data
Our second example delves into a more complex JSON object that contains nested data:
{
"user": {
"id": 123,
"info": {
"name": "John Doe",
"email": "[email protected]"
},
"roles": ["admin", "user"]
}
}
To normalize this data while including the nested ‘info’ and ‘roles’, you use:
json_data = {
'user': {
'id': 123,
'info': {
'name': 'John Doe',
'email': '[email protected]'
},
'roles': ['admin', 'user']
}
}
df = pd.json_normalize(
json_data,
record_path=None,
meta=[
'user.id',
['user', 'info', 'name'],
['user', 'info', 'email'],
'user.roles'
]
)
print(df)
The output is more complex but neatly organized into a DataFrame:
Empty DataFrame
Columns: [user.id, user.info.name, user.info.email, user.roles]
Rows: 0
This demonstrates how json_normalize()
can handle deeply nested structures by specifying the path to the data and meta-information to include additional details at each level.
Example 3: Advanced Data Transformations
In our final example, let’s consider a JSON structure with multiple levels of nesting and arrays at different levels:
{
"company": "Example Corp",
"employees": [
{
"id": 1,
"name": "John Doe",
"positions": [
{"title": "CEO", "years": 5},
{"title": "CTO", "years": 3}
]
},
{
"id": 2,
"name": "Jane Doe",
"positions": [
{"title": "CFO", "years": 4},
{"title": "CMO", "years": 2}
]
}
]
}
For this complex dataset, the strategy includes combining json_normalize()
with additional Pandas operations for further manipulation:
import pandas as pd
json_data = {
'company': 'Example Corp',
'employees': [
{
'id': 1,
'name': 'John Doe',
'positions': [
{'title': 'CEO', 'years': 5},
{'title': 'CTO', 'years': 3}
]
},
{
'id': 2,
'name': 'Jane Doe',
'positions': [
{'title': 'CFO', 'years': 4},
{'title': 'CMO', 'years': 2}
]
}
]
}
df = pd.json_normalize(
json_data,
record_path=['employees', 'positions'],
meta=[
['employees', 'id'],
['employees', 'name']
]
)
print(df)
This approach allows for the normalization of complex, nested JSON data, converting it into a user-friendly DataFrame format. This example demonstrates the flexibility and power of json_normalize()
for handling intricate JSON structures.
Conclusion
Throughout this tutorial, we’ve explored the capabilities of the json_normalize()
function in Pandas, from basic to advanced applications. These examples demonstrate that, regardless of the complexity of your JSON data, json_normalize()
can be an invaluable tool for transforming it into a manageable format, making data analysis significantly more accessible.