Sling Academy

Pandas: How to save a DataFrame in JSON format (3 examples)

Last updated: February 22, 2024

Introduction

Pandas is a versatile tool for data analysis in Python that lets users handle and manipulate large datasets efficiently. Among its features is the ability to save DataFrames in various formats, including JSON. JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate. It is especially common for exchanging data between a server and a web application. In this tutorial, we will walk through three examples of saving a Pandas DataFrame in JSON format, ranging from basic to advanced use cases.

Basic Example: Convert DataFrame to JSON

Let’s start with the most straightforward example. We have a simple DataFrame and we want to save it as a JSON file.

import pandas as pd

# Create a simple DataFrame
data = {
  'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
  'Age': [28, 34, 24],
  'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Save the DataFrame as a JSON file
df.to_json('output.json')

In this example, the DataFrame is written to a file named ‘output.json’. Because no orient argument is passed, pandas uses the default orientation for a DataFrame, 'columns', so the file contains one JSON object per column that maps each index label to that row's value.
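When no path is passed, to_json() returns the JSON text as a string instead of writing a file, which is a convenient way to inspect the default 'columns' layout. A minimal sketch reusing the same data:

```python
import json

import pandas as pd

data = {
    'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
    'Age': [28, 34, 24],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
df = pd.DataFrame(data)

# With no path argument, to_json() returns a str instead of writing a file
json_text = df.to_json()
print(json_text)

# Parsing it back shows the default orient='columns' layout: {column: {index: value}}
parsed = json.loads(json_text)
print(parsed['Age'])  # {'0': 28, '1': 34, '2': 24}
```

Note that the integer index labels become string keys, since JSON object keys are always strings.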

Specifying JSON Orientation

Pandas allows you to specify the orientation of the JSON output using the orient parameter. This can be particularly useful when working with different systems that may require a specific JSON structure. Here are the different orientations you can specify:

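  • split: Dictionary with separate 'index', 'columns', and 'data' entries.
  • records: List of dictionaries with each dictionary representing a row in the DataFrame (the index is not written).
  • index: Nested dictionaries containing {index:{column:value}}.
  • columns: Nested dictionaries containing {column:{index:value}} (the default for DataFrames).
  • values: Only the values, as a nested array without index or column labels.
  • table: Dictionary with a 'schema' entry (JSON Table Schema) and a 'data' entry holding the rows.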
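To compare the layouts side by side, a quick sketch can serialize a small two-row DataFrame with each orient value accepted by DataFrame.to_json() (including 'table', which writes a JSON Table Schema alongside the data) and print the results. The two-row DataFrame is only an illustrative assumption to keep the output short:

```python
import json

import pandas as pd

df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith'], 'Age': [28, 34]})

# Serialize the same DataFrame once per orient value
outputs = {
    orient: df.to_json(orient=orient)
    for orient in ['split', 'records', 'index', 'columns', 'values', 'table']
}

for orient, text in outputs.items():
    print(f"{orient:>8}: {text}")

# For example, 'records' produces one object per row:
print(json.loads(outputs['records']))
```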

Let’s see examples for ‘records’ and ‘split’ orientations.

import pandas as pd

# Create a simple DataFrame
data = {
  'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
  'Age': [28, 34, 24],
  'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Save the DataFrame as a JSON file using 'records' orientation
df.to_json('output_records.json', orient='records')

# Save the DataFrame as a JSON file using 'split' orientation
df.to_json('output_split.json', orient='split')

In ‘output_records.json’, each row of the DataFrame becomes an object within a JSON array, and the index is not written. This format is convenient for applications that consume JSON data row by row. ‘output_split.json’, on the other hand, stores the data in three parts (columns, index, and data), which lets the receiving end rebuild the DataFrame with its original index and column labels.
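One way to see the practical difference is to read both files back with pd.read_json(). The sketch below assumes a non-default index ('r1', 'r2', 'r3', chosen only for illustration) and writes to a temporary directory so it can run anywhere:

```python
import os
import tempfile

import pandas as pd

data = {
    'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
    'Age': [28, 34, 24],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
# Illustrative non-default index, so it is visible whether the index survives
df = pd.DataFrame(data, index=['r1', 'r2', 'r3'])

with tempfile.TemporaryDirectory() as tmp:
    records_path = os.path.join(tmp, 'output_records.json')
    split_path = os.path.join(tmp, 'output_split.json')

    df.to_json(records_path, orient='records')
    df.to_json(split_path, orient='split')

    # read_json must be told which orient the file was written with
    from_records = pd.read_json(records_path, orient='records')
    from_split = pd.read_json(split_path, orient='split')

print(from_records.index.tolist())  # 'records' drops the index: [0, 1, 2]
print(from_split.index.tolist())    # 'split' keeps it: ['r1', 'r2', 'r3']
```

If column dtypes must survive the round trip as well, orient='table' stores a JSON Table Schema next to the data, which read_json(orient='table') can use when reading it back.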

Advanced: Customizing JSON Output

For more complex requirements, Pandas offers a good deal of control over the JSON it writes: you can select a subset of columns before exporting, choose how datetimes are written with the date_format parameter, or compress the output with the compression parameter. Here’s an example that showcases some of these features:

import pandas as pd

# Create a simple DataFrame
data = {
  'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
  'Age': [28, 34, 24],
  'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Keep only the 'Name' and 'City' columns, write JSON Lines, and gzip the file (inferred from '.gz')
df[['Name', 'City']].to_json('output_compressed.jsonl.gz', orient='records', lines=True, compression='infer')

This command saves only the ‘Name’ and ‘City’ columns, writing each record on its own line, which is the JSON Lines format (lines=True is meant to be used with orient='records'). With compression='infer', which is the default, pandas chooses the compression codec from the file extension ('.gz', '.bz2', '.zip', '.xz', among others), so the '.gz' suffix produces a gzip-compressed file; a plain '.json' name would be written uncompressed. Compression saves disk space and is particularly handy when dealing with very large datasets.
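This section also mentions custom date formats, which the example above does not exercise. The sketch below writes a gzip-compressed JSON Lines file (the '.gz' suffix is what lets compression='infer' pick gzip), uses date_format='iso' on a hypothetical 'Joined' datetime column added only for illustration, and then reads the file back to confirm the round trip:

```python
import gzip
import json
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Smith', 'Emily Jones'],
    'City': ['New York', 'Los Angeles', 'Chicago'],
    # Hypothetical datetime column, added only to illustrate date_format
    'Joined': pd.to_datetime(['2024-01-15', '2024-02-01', '2024-02-22']),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output_compressed.jsonl.gz')

    # The '.gz' suffix lets compression='infer' pick gzip;
    # date_format='iso' writes datetimes as ISO 8601 strings instead of epoch numbers
    df.to_json(path, orient='records', lines=True,
               date_format='iso', compression='infer')

    # Reading with gzip.open fails on an uncompressed file, so this also confirms compression
    with gzip.open(path, 'rt', encoding='utf-8') as fh:
        raw_lines = fh.read().splitlines()

    # read_json infers gzip from the extension too; lines=True parses one record per line
    restored = pd.read_json(path, lines=True, compression='infer')

first_record = json.loads(raw_lines[0])
print(first_record['Joined'])  # an ISO 8601 string such as 2024-01-15T00:00:00.000
print(restored.shape)          # (3, 3)
```

ISO strings are generally easier for other systems to consume than epoch milliseconds; the exact string suffix may vary slightly between pandas versions.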

Conclusion

Saving a DataFrame as JSON in Pandas is a straightforward process that can be customized to fit a wide range of data storage and interchange needs. Whether you’re working with simple data structures or require more advanced configurations, Pandas provides the tools necessary to efficiently save your data in a format that’s both human-readable and machine-parseable. With the examples provided, you should now be equipped to export your DataFrames to JSON whenever the need arises.
