Pandas: How to save a DataFrame to an Excel file

Updated: February 19, 2024 By: Guest Contributor Post a comment

Overview

Pandas, a powerful and versatile Python library, is synonymous with data manipulation and analysis for good reason. Its intuitive design and rich functionality make it an indispensable tool for data scientists and analysts. One of its many strengths is the seamless way it interacts with a variety of data formats, including Excel. In this tutorial, we will journey through the process of saving a DataFrame to an Excel file using Pandas, exploring from the basic to more advanced techniques.

Getting Started

Before diving into the code, ensure you have Pandas installed in your environment. If not, you can install it using pip:

pip install pandas

You also need to install the ‘openpyxl’ engine, as it’s Pandas’ default engine to write Excel files:

pip install openpyxl

Basic Example

Let’s start with the basics, creating a simple DataFrame and saving it to an Excel file.

import pandas as pd

# Create a simple DataFrame
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)

# Save the DataFrame to an Excel file
df.to_excel('people.xlsx', index=False)

This code block creates a DataFrame from a simple dictionary and saves it as ‘people.xlsx’, turning off the index to prevent it from appearing in the Excel file. The result is a neatly structured Excel file containing our DataFrame information.

Specifying Sheet Name

Excel workbooks often comprise multiple sheets for better data organization. Pandas allows you to specify which sheet to save your DataFrame to.

df.to_excel('people.xlsx', sheet_name='Employees', index=False)

This code snippet saves the DataFrame to a sheet named ‘Employees’. If the sheet does not exist, Pandas will create it.

Multiple DataFrames in a Single Excel File

Saving multiple DataFrames to a single Excel file, each in a different sheet, requires a slightly more advanced approach using the ExcelWriter object.

import pandas as pd

# Assuming 'df' is your DataFrame
with pd.ExcelWriter('people_multiple.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    df.to_excel(writer, sheet_name='Sheet2', index=False)

This method ensures that each DataFrame is saved to a different sheet within the same Excel file, a handy feature for organizing related datasets.

Styling Excel Files

Pandas also allows for basic styling in Excel files. This feature can be particularly useful for highlighting important data or making your Excel files more visually appealing.

Applying Conditional Formatting

You can apply conditional formatting to your Excel files to highlight data that meets certain conditions.

import pandas as pd
from openpyxl.styles import PatternFill

# Assuming 'df' is your DataFrame
with pd.ExcelWriter('styled.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, index=False)
    # Accessing the workbook and then the sheet to apply styles
    workbook = writer.book
    sheet = writer.sheets['Sheet1']
    
    for i, value in enumerate(df['Age']):
        if value > 30:
            # Apply yellow fill to cells in the 'Age' column where the value is greater than 30
            sheet.cell(row=i + 2, column=2).fill = PatternFill(start_color='FFFF00', end_color='FFFF00', fill_type='solid')

This code applies yellow fill to cells in the ‘Age’ column where the age is greater than 30, making them stand out.

Conclusion

Through this tutorial, we have explored a variety of ways to save Pandas DataFrames to Excel files, from simple one-liners to more complex techniques involving multiple sheets and basic styling. Mastering these techniques allows for efficient data preparation and presentation, a crucial skill in the data science toolkit. Happy analyzing!