Pandas: How to write a DataFrame to a PDF file

Updated: February 22, 2024 By: Guest Contributor

Overview

Exporting a Pandas DataFrame to a PDF file is useful when you want to share data in a widely readable format that preserves its layout. This tutorial walks through basic to advanced techniques for converting DataFrames to PDF using Pandas together with supporting libraries.

First, note that Pandas alone cannot export a DataFrame to PDF. The library has no built-in function for this task, so it needs the help of additional libraries such as Matplotlib and ReportLab. Throughout this guide, we will explore different methods, starting with a simple rendering of the DataFrame as a table and moving on to more complex, stylized reports.

Basic DataFrame to PDF using Matplotlib

First, install the necessary libraries:

pip install pandas matplotlib

Assuming you have a Pandas DataFrame ready, the first approach involves using Matplotlib for visualizing the DataFrame as a plot, and then saving that plot as a PDF file:

import pandas as pd
import matplotlib.pyplot as plt

# Sample DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

# Plotting the DataFrame
fig, ax = plt.subplots() # Create a figure and a set of subplots
ax.axis('tight')
ax.axis('off')
ax.table(cellText=df.values, colLabels=df.columns, loc='center')

# Save the plot as a PDF
plt.savefig('df_plot.pdf')

This snippet renders the DataFrame as a table and saves it to a PDF file named df_plot.pdf. The approach is quick, but it gives limited control over layout: the whole table is drawn on a single figure, so long DataFrames are not paginated automatically, which makes it a poor fit for complex reports.
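If the DataFrame is too long for one page, one possible workaround is to split it into chunks and write each chunk to its own page with Matplotlib's PdfPages. The sketch below is only an illustration: the helper name dataframe_to_pdf_pages, the rows_per_page default, the A4 figure size, and the output filename are assumptions, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def dataframe_to_pdf_pages(df, path, rows_per_page=25):
    """Write df to a multi-page PDF, one table per page, and return the page count.

    Hypothetical helper for illustration; rows_per_page is an assumed default.
    """
    pages = 0
    with PdfPages(path) as pdf:
        for start in range(0, len(df), rows_per_page):
            chunk = df.iloc[start:start + rows_per_page]
            fig, ax = plt.subplots(figsize=(8.27, 11.69))  # roughly A4 portrait, in inches
            ax.axis("off")
            table = ax.table(
                cellText=chunk.astype(str).values.tolist(),
                colLabels=[str(col) for col in chunk.columns],
                loc="upper center",
                cellLoc="center",
            )
            table.auto_set_font_size(False)
            table.set_fontsize(9)
            table.scale(1, 1.4)  # make the rows a little taller
            pdf.savefig(fig, bbox_inches="tight")
            plt.close(fig)  # free the figure before drawing the next page
            pages += 1
    return pages


# Usage: 60 rows at 25 rows per page
sample = pd.DataFrame({"id": range(60), "value": [i * 0.5 for i in range(60)]})
n_pages = dataframe_to_pdf_pages(sample, "df_multipage.pdf", rows_per_page=25)
```

Each page repeats the column labels because every chunk is drawn as its own table. Closing each figure after saving keeps memory use flat for large DataFrames.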

Exporting PDFs with ReportLab

For more comprehensive and customized PDFs, ReportLab offers a wide range of possibilities. First, you need to install ReportLab:

pip install reportlab

Once installed, you can start building PDFs with much greater control over the layout and appearance. Here’s how to create a basic PDF report with ReportLab:

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
import pandas as pd

# Creating a canvas
c = canvas.Canvas("dataframe_report.pdf", pagesize=letter)

# Sample DataFrame
# {... Same DataFrame as before ...}

c.drawString(100, 750, "DataFrame Report")
c.showPage()
c.save()

This snippet initiates a PDF file, writes a title, and saves the document. However, to include our DataFrame, we need to serialize it into a format that ReportLab can display. Since ReportLab doesn’t inherently understand Pandas DataFrames, we have to manually iterate over the DataFrame rows and columns to print each cell’s value onto the PDF:

# {... Initialize canvas and DataFrame as before ...}

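def add_dataframe_to_pdf(pdf_canvas, dataframe):
    # Start the text block below the title position used earlier
    textobject = pdf_canvas.beginText(40, 720)
    # Write the column names as a single header line
    textobject.textLine(' '.join(map(str, dataframe.columns)))
    # Write each row as one space-separated line
    for _, row in dataframe.iterrows():
        textobject.textLine(' '.join(map(str, row)))
    pdf_canvas.drawText(textobject)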

add_dataframe_to_pdf(c, df)
c.save()

This method provides basic DataFrame to PDF conversion, but it is limited: values are joined with single spaces, so columns do not line up, and there are no styling or layout options. For larger data sets or more sophisticated formatting needs, a more advanced approach is necessary.

Advanced Styling with ReportLab

ReportLab is powerful and supports intricate PDF designs, including table styles that better suit data representation. Expanding on the previous examples, we can implement table styling to enhance the visual appeal of our DataFrame in the PDF report:

from reportlab.platypus import SimpleDocTemplate, Table, TableStyle
from reportlab.lib import colors

# {... Initialize DataFrame ...}

doc = SimpleDocTemplate("styled_dataframe_report.pdf")
elements = []

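# Include the column names as the first row so the header styles apply to them
table_data = [df.columns.tolist()] + df.values.tolist()
table = Table(table_data, colWidths=100, rowHeights=30)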
table.setStyle(TableStyle([('BACKGROUND', (0,0), (-1,0), colors.grey),
                           ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
                           ('ALIGN',(0,0),(-1,-1),'CENTER'),
                           ('FONTNAME', (0,0), (-1,0), 'Helvetica-Bold'),
                           ('BOTTOMPADDING', (0,0), (-1,0), 12),
                           ('BACKGROUND',(0,1),(-1,-1),colors.beige),
                           ('GRID', (0,0), (-1,-1), 1, colors.black)]))
elements.append(table)

doc.build(elements)

This method produces a styled table rather than plain text. TableStyle lets us apply backgrounds, font styles, alignment, cell padding, and grid lines to ranges of cells, which makes the exported DataFrame much easier to read.

Conclusion

In summary, Pandas does not directly support exporting DataFrames to PDF, but libraries like Matplotlib and ReportLab fill the gap. From a quick table rendered with Matplotlib to styled ReportLab reports, this guide offers solutions suited to a variety of needs. With these techniques you can share and present your data sets in a widely accessible, professional-looking PDF format.