Pandas: Turn a DataFrame to a list of dictionaries

Updated: February 19, 2024 By: Guest Contributor

Introduction

Pandas is an immensely popular Python library for data manipulation and analysis. One of its core data structures is the DataFrame, which efficiently stores and operates on tabular data. In certain cases, you may want to convert a DataFrame into a list of dictionaries, which can be more convenient for JSON serialization, or for passing data to systems that expect this format. This tutorial will guide you through this conversion process, covering basic to advanced scenarios.

Getting Started

Let’s start by importing pandas and creating a simple DataFrame for our examples:

import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
print(df)

Output:

    Name  Age      City
0   John   28  New York
1   Anna   34     Paris
2  Peter   29    Berlin
3  Linda   32    London

Basic Conversion

The simplest way to turn a DataFrame into a list of dictionaries is by using the .to_dict() method with the 'records' orientation, which converts each row to a dictionary, with column names as keys:

list_of_dicts = df.to_dict('records')
print(list_of_dicts)

Output:

[{'Name': 'John', 'Age': 28, 'City': 'New York'},
 {'Name': 'Anna', 'Age': 34, 'City': 'Paris'},
 {'Name': 'Peter', 'Age': 29, 'City': 'Berlin'},
 {'Name': 'Linda', 'Age': 32, 'City': 'London'}]
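Since JSON serialization is one of the main reasons to want this format, here is a minimal sketch of passing the result to the standard-library json module. The variable names records, json_text, and restored are illustrative, and the sketch assumes the DataFrame holds only JSON-friendly values (strings and plain numbers); columns with dates or other types would need extra handling.

```python
import json

import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Convert each row to a dictionary, then serialize the list to a JSON string
records = df.to_dict('records')
json_text = json.dumps(records, indent=2)
print(json_text)

# Parsing the string again should give back the same list of dictionaries
restored = json.loads(json_text)
```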

Customizing the Conversion

You may want to include or exclude certain columns from your list of dictionaries. The .to_dict() method itself does not take a column list, so select the columns you need on the DataFrame first and then call .to_dict() on the result:

# Including specific columns
list_of_dicts_partial = df[['Name', 'City']].to_dict('records')
print(list_of_dicts_partial)

Output:

[{'Name': 'John', 'City': 'New York'},
 {'Name': 'Anna', 'City': 'Paris'},
 {'Name': 'Peter', 'City': 'Berlin'},
 {'Name': 'Linda', 'City': 'London'}]
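The example above covers including columns. For the opposite case, excluding a column, one possible approach is to drop it before converting. This is a small sketch using DataFrame.drop(); the variable name records_without_age is illustrative.

```python
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# drop() returns a new DataFrame without the 'Age' column; df is unchanged
records_without_age = df.drop(columns=['Age']).to_dict('records')
print(records_without_age)
```

Dropping is usually more convenient than selecting when the DataFrame has many columns and only a few should be left out.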

Dealing with Missing Data

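When converting DataFrames with missing values to dictionaries, it’s important to decide how these values should be handled. By default, .to_dict() keeps them in the output: a missing entry in a numeric column appears as nan (and the column is stored as float), while a missing entry in a text column typically appears as None or nan, depending on the column's dtype. The simplest way to avoid them is to drop every row that contains a missing value before converting: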

# DataFrame with missing values
data_with_missing = {'Name': ['Tom', 'Sara', 'Chris'], 'Age': [25, None, 28], 'City': ['Rome', 'Madrid', None]}
df_missing = pd.DataFrame(data_with_missing)

# Drop every row that contains at least one missing value
list_of_dicts_no_none = df_missing.dropna().to_dict('records')
print(list_of_dicts_no_none)

Output:

[{'Name': 'Tom', 'Age': 25.0, 'City': 'Rome'}]
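Note that dropna() removes whole rows, so Sara and Chris disappear from the result. If the goal is to keep every row and leave out only the missing keys, one option is to filter each record after conversion. The following is a sketch under that assumption, using pd.notna() to test each value; the variable name records_without_missing is illustrative.

```python
import pandas as pd

# Same data with missing values as in the tutorial
df_missing = pd.DataFrame({
    'Name': ['Tom', 'Sara', 'Chris'],
    'Age': [25, None, 28],
    'City': ['Rome', 'Madrid', None],
})

# Keep every row, but skip any key whose value is missing (None or NaN)
records_without_missing = [
    {key: value for key, value in record.items() if pd.notna(value)}
    for record in df_missing.to_dict('records')
]
print(records_without_missing)
```

With this approach the dictionaries no longer share the same set of keys, so code that consumes them should use record.get('Age') or a similar lookup that tolerates a missing key.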

Advanced Conversion Techniques

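For applications that need more control or additional processing during conversion, you can build the dictionaries yourself. One way is a list comprehension over DataFrame.iterrows(), which yields each row as an (index, Series) pair. Keep in mind that iterrows() is much slower than .to_dict('records') on large DataFrames, so it is best reserved for cases that really need row-wise logic: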

# Advanced example with iterrows()
# Series.iteritems() was removed in pandas 2.0; use Series.items() instead
advanced_list_of_dicts = [{col: val for col, val in row.items()} for index, row in df.iterrows()]
print(advanced_list_of_dicts)

Output:

[{'Name': 'John', 'Age': 28, 'City': 'New York'},
 {'Name': 'Anna', 'Age': 34, 'City': 'Paris'},
 {'Name': 'Peter', 'Age': 29, 'City': 'Berlin'},
 {'Name': 'Linda', 'Age': 32, 'City': 'London'}]
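The comprehension above reproduces what .to_dict('records') already returns, so the row-wise form only pays off when each row is changed along the way. As a rough sketch of that idea, the loop below upper-cases the city and adds a derived field. The Over30 key and the processed variable are hypothetical names chosen for illustration, not part of any pandas API.

```python
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

processed = []
for _, row in df.iterrows():
    # Start from a plain dictionary of the row's column/value pairs
    record = dict(row.items())
    # Transform an existing field
    record['City'] = record['City'].upper()
    # Add a derived field (hypothetical key name)
    record['Over30'] = bool(record['Age'] > 30)
    processed.append(record)

print(processed)
```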

Conclusion

Converting a DataFrame to a list of dictionaries makes it easier to pass data from Pandas to other Python libraries or external systems. This tutorial has shown how to perform the conversion with .to_dict('records'), how to limit it to specific columns, how to handle missing values, and how to build the dictionaries row by row when you need extra processing.