Pandas: How to save a DataFrame in JSON format (3 examples)

Updated: February 22, 2024 By: Guest Contributor

Introduction

Pandas is a versatile tool for data analysis in Python, enabling users to handle and manipulate large datasets efficiently. One of its many features is the ability to save DataFrames in various formats, including JSON. JSON (JavaScript Object Notation) is a lightweight data-interchange format that’s easy for humans to read and write, and easy for machines to parse and generate. It’s especially useful for storing data and for transporting it between a server and a web application. In this tutorial, we will explore three examples that show how to save a Pandas DataFrame in JSON format, ranging from basic to advanced use cases.

Basic Example: Convert DataFrame to JSON

Let’s start with the most straightforward example. We have a simple DataFrame and we want to save it as a JSON file.

import pandas as pd

# Create a simple DataFrame
data = {
  'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
  'Age': [28, 34, 24],
  'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Save the DataFrame as a JSON file
df.to_json('output.json')

In this example, the DataFrame is saved to a JSON file named ‘output.json’. When given a path, to_json() serializes the DataFrame to a JSON string and writes it to that file; without a path, it returns the string instead. The resulting file is organized by column, mapping each column name to its {index: value} pairs, which is the default JSON orientation (‘columns’) for a DataFrame in Pandas.
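To see what the default layout looks like without opening the file, here is a minimal sketch that calls to_json() with no path, so it returns the JSON text, and parses it with the standard json module. The variable names json_text and parsed are only illustrative.

```python
import json

import pandas as pd

df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
    'Age': [28, 34, 24],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# With no path argument, to_json() returns the JSON text instead of writing a file
json_text = df.to_json()

# Parse the text back into plain Python objects to inspect the structure
parsed = json.loads(json_text)
print(json.dumps(parsed, indent=2))
```

Each column name maps to a dictionary keyed by the index labels. Because JSON object keys are always strings, the integer index 0, 1, 2 appears as '0', '1', '2'.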

Specifying JSON Orientation

Pandas allows you to specify the orientation of the JSON output using the orient parameter. This can be particularly useful when working with different systems that may require a specific JSON structure. Here are the different orientations you can specify:

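  • split: Dictionary containing the index, columns, and data as separate entries.
  • records: List of dictionaries with each dictionary representing a row in the DataFrame (the index is not included).
  • index: Nested dictionaries containing {index:{column:value}}.
  • columns: Nested dictionaries containing {column:{index:value}}. This is the default for a DataFrame.
  • values: Just the values array, without column names or index.
  • table: Dictionary containing a schema describing the columns, plus the data.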
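To make the differences concrete, the following sketch serializes a small two-row DataFrame with several orientations and parses each result back into Python objects. The two-row frame and the outputs dictionary are assumptions chosen to keep the printed results short.

```python
import json

import pandas as pd

df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith'], 'Age': [28, 34]})

# Serialize the same DataFrame once per orientation and parse each result
outputs = {
    orient: json.loads(df.to_json(orient=orient))
    for orient in ['split', 'records', 'index', 'columns', 'values']
}

for orient, value in outputs.items():
    print(f'{orient}: {value}')
```

Comparing the printed structures side by side is usually the quickest way to pick the orientation a consuming system expects.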

Let’s see examples for ‘records’ and ‘split’ orientations.

import pandas as pd

# Create a simple DataFrame
data = {
  'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
  'Age': [28, 34, 24],
  'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Save the DataFrame as a JSON file using 'records' orientation
df.to_json('output_records.json', orient='records')

# Save the DataFrame as a JSON file using 'split' orientation
df.to_json('output_split.json', orient='split')

In ‘output_records.json’, each row of the DataFrame is a dictionary within an array; the index is not written. This format is convenient for applications that consume JSON data row by row. On the other hand, ‘output_split.json’ separates the data into three parts: columns, index, and data, which is useful for reconstructing the DataFrame on the receiving end with its column order and index labels intact.
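As a rough illustration of that round trip, the sketch below writes the DataFrame with the ‘split’ orientation and loads it again with pd.read_json(), passing the same orient value. Writing into a temporary directory is an assumption made here so the example leaves no files behind; the original examples write to the current directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
    'Age': [28, 34, 24],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'output_split.json'

    # Write with 'split' so columns, index, and data are stored separately
    df.to_json(path, orient='split')

    # Read it back, telling read_json which layout to expect
    restored = pd.read_json(path, orient='split')

print(restored)
```

The reader has to be told the orientation that was used for writing; JSON itself carries no dtype information, so it is worth checking the restored dtypes if they matter downstream.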

Advanced: Customizing JSON Output

For more complex requirements, Pandas offers further control over the JSON output, such as choosing a date format (date_format), writing one record per line (lines), or compressing the file (compression). to_json() has no option for excluding columns, so to leave some out you select the columns you want before calling it. Here’s an example that combines some of these features:

import pandas as pd

# Create a simple DataFrame
data = {
  'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
  'Age': [28, 34, 24],
  'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Only include the 'Name' and 'City' columns; the .gz extension tells
# compression='infer' to gzip the file
df[['Name', 'City']].to_json('output_compressed.json.gz', orient='records', lines=True, compression='infer')

This command saves only the ‘Name’ and ‘City’ columns, with each record on its own line, which makes the file compatible with the JSON Lines format (lines=True requires orient='records'). With compression='infer', which is also the default, Pandas chooses the compression method from the file extension, for example ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’. A plain ‘.json’ extension means no compression at all, which is why the file here is named ‘output_compressed.json.gz’. Compression is particularly handy when dealing with very large datasets.
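One way to check what was actually written is to open the file with the standard gzip module and then load it back with pd.read_json(). This is a minimal sketch, assuming a file name ending in ‘.json.gz’ (so that compression='infer' selects gzip) and a temporary directory as the output location; both are choices made for this illustration.

```python
import gzip
import json
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
    'Age': [28, 34, 24],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

with tempfile.TemporaryDirectory() as tmp:
    # The .gz suffix is what makes compression='infer' pick gzip
    path = Path(tmp) / 'output_compressed.json.gz'
    df[['Name', 'City']].to_json(path, orient='records', lines=True, compression='infer')

    # Decompress by hand to confirm the file holds one JSON object per line
    with gzip.open(path, 'rt', encoding='utf-8') as fh:
        lines = fh.read().splitlines()

    # Load it back into a DataFrame; lines=True is needed for JSON Lines input
    restored = pd.read_json(path, orient='records', lines=True, compression='infer')

first_record = json.loads(lines[0])
print(len(lines), first_record)
print(restored)
```

If gzip.open() fails with a "not a gzipped file" error, the file was written uncompressed, which usually means the extension did not match a known compression format.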

Conclusion

Saving a DataFrame as JSON in Pandas is a straightforward process that can be customized to fit a wide range of data storage and interchange needs. Whether you’re working with simple data structures or require more advanced configurations, Pandas provides the tools necessary to efficiently save your data in a format that’s both human-readable and machine-parseable. With the examples provided, you should now be equipped to export your DataFrames to JSON whenever the need arises.