Pandas: How to Visualize a Time Series with Holidays

Last updated: February 22, 2024

Introduction

Pandas is a powerful tool for data analysis in Python, particularly for manipulating numerical tables and time series data. One common task in time series analysis is visualizing data to understand trends, patterns, and anomalies. When working with daily data, holidays can significantly impact trends, making it important to highlight them in visualizations. This tutorial will guide you through the process of visualizing a time series with holidays using Pandas and Matplotlib.

First, ensure you have the necessary packages installed:

pip install pandas matplotlib

Step-by-Step Instructions

Step 1: Import Libraries

import pandas as pd
import matplotlib.pyplot as plt
from pandas.tseries.holiday import USFederalHolidayCalendar

Step 2: Load Your Time Series Data

Assuming you have time series data, let’s load it into a Pandas DataFrame:

data = 'your_data.csv'
df = pd.read_csv(data, index_col='Date', parse_dates=True)

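Passing index_col='Date' and parse_dates=True makes the Date column the index and parses it into datetime values, so the DataFrame ends up with a DatetimeIndex. Replace 'Date' and 'Your_Column' with the column names used in your own file.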
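To check that the load step produced a proper datetime index, a quick sanity test along these lines may help. This is a minimal sketch: the CSV text is made up and read from memory as a stand-in for 'your_data.csv', and the column names simply mirror the placeholders used in this tutorial.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'your_data.csv'
csv_text = """Date,Your_Column
2024-01-01,120
2024-01-02,135
2024-01-03,128
"""

# Same arguments as in Step 2, reading from an in-memory buffer instead of a file
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# Checks worth running on real data before plotting
print(type(df.index).__name__)            # DatetimeIndex
print(df.index.is_monotonic_increasing)   # True
print(df.index.is_unique)                 # True
```

If the index turns out not to be sorted, calling df.sort_index() before plotting avoids a line that zigzags across the chart.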

Step 3: Identify Holidays

Now, we’ll identify US federal holidays within our dataset’s time span. If you’re dealing with data from another country or custom holiday dates, you might need to adjust accordingly.

calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start=df.index.min(), end=df.index.max())
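For other countries or company-specific dates, one option is to define your own calendar with the same pandas.tseries.holiday machinery. The sketch below is illustrative only: the CompanyCalendar class and both of its rules are hypothetical, not part of the original tutorial.

```python
import pandas as pd
from pandas.tseries.holiday import (
    AbstractHolidayCalendar,
    Holiday,
    nearest_workday,
)


class CompanyCalendar(AbstractHolidayCalendar):
    """Hypothetical calendar; the rules below are examples, not real policy."""

    rules = [
        # Moves to the nearest weekday when January 1 falls on a weekend
        Holiday("New Year's Day", month=1, day=1, observance=nearest_workday),
        # Fixed date with no observance rule, so it may land on a weekend
        Holiday("Company Anniversary", month=6, day=15),
    ]


custom_holidays = CompanyCalendar().holidays(start="2024-01-01", end="2024-12-31")
print(custom_holidays)
```

The result is a DatetimeIndex, so it can be used wherever the holidays variable appears in this tutorial. For a short, fixed list of dates, pd.to_datetime([...]) would also do.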

Step 4: Plot Time Series Data

With our data and holidays identified, let’s plot them:

# Keep only the holidays that are present in the index; looking up a missing date raises a KeyError
holidays_in_data = holidays.intersection(df.index)

plt.figure(figsize=(10, 6))
plt.plot(df.index, df['Your_Column'], label='Daily Data')
# zorder=3 draws the markers on top of the line
plt.scatter(holidays_in_data, df.loc[holidays_in_data, 'Your_Column'], color='red', zorder=3, label='Federal Holidays')
plt.title('Time Series with Holidays')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

This code does a few things. It drops any holiday that has no row in the DataFrame, creates a 10 x 6 inch figure for better visibility, plots the daily data, and then overlays red dots on the holidays using their corresponding values. The filtering step matters: if the data skips a holiday (market data often does), looking that date up with df.loc raises a KeyError.
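To see which holidays are absent from the data before plotting, a small check like the following may be useful. It is a sketch under stated assumptions: the January 2024 data is synthetic, and one date is removed on purpose to simulate a missing record.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Synthetic daily data for January 2024 with one day deliberately removed
dates = pd.date_range("2024-01-01", "2024-01-31", freq="D")
dates = dates[dates != pd.Timestamp("2024-01-15")]  # simulate a missing record
df = pd.DataFrame({"Your_Column": range(len(dates))}, index=dates)

holidays = USFederalHolidayCalendar().holidays(
    start=df.index.min(), end=df.index.max()
)

# Holidays that can be looked up safely, and holidays with no row in the data
present = holidays.intersection(df.index)
missing = holidays.difference(df.index)

print("present:", list(present.strftime("%Y-%m-%d")))
print("missing:", list(missing.strftime("%Y-%m-%d")))
```

An alternative is df['Your_Column'].reindex(holidays), which returns NaN for the missing dates instead of raising an error; Matplotlib skips NaN points when drawing the scatter.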

Step 5: Customizing the Plot

Matplotlib offers various customization options. Here are a few tips to enhance your plot:

  • Use a grid: Helps in interpreting the plot by adding a background grid.
  • Format the dates: Especially with longer time ranges, formatting the x-axis dates prevents overcrowding.
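  • Highlight weekends: This can provide additional context, particularly when analyzing financial time series data.

from matplotlib.dates import DateFormatter, WeekdayLocator, MO

# Keep only the holidays that are present in the index
holidays_in_data = holidays.intersection(df.index)

plt.figure(figsize=(10, 6))
plt.plot(df.index, df['Your_Column'], label='Daily Data')
plt.scatter(holidays_in_data, df.loc[holidays_in_data, 'Your_Column'], color='red', zorder=3, label='Federal Holidays')
plt.gca().xaxis.set_major_formatter(DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(WeekdayLocator(byweekday=MO))  # one major tick per Monday
plt.gcf().autofmt_xdate()  # rotate the date labels so they don't overlap
plt.grid(True)
plt.title('Time Series with Holidays Enhanced')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

Compared with the plot in Step 4, this version formats the x-axis labels as YYYY-MM-DD, places a major tick on every Monday, and adds gridlines for readability. Weekly ticks suit ranges of a few months; for multi-year data, a coarser locator such as MonthLocator keeps the axis readable.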
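The weekend-highlighting tip is not covered by the code above. One possible approach, sketched here with synthetic data, is to shade each weekend with axvspan. The data, the output filename, and the Agg backend (chosen so the sketch runs without a display) are all assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily data for January and February 2024
dates = pd.date_range("2024-01-01", "2024-02-29", freq="D")
df = pd.DataFrame({"Your_Column": np.linspace(100, 120, len(dates))}, index=dates)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(df.index, df["Your_Column"], label="Daily Data")

# Shade from Saturday 00:00 to Monday 00:00 for every Saturday in the data.
# A series that starts on a Sunday would leave that first Sunday unshaded.
saturdays = df.index[df.index.dayofweek == 5]
for i, saturday in enumerate(saturdays):
    ax.axvspan(
        saturday,
        saturday + pd.Timedelta(days=2),
        color="gray",
        alpha=0.2,
        label="Weekend" if i == 0 else None,  # a single legend entry is enough
    )

ax.grid(True)
ax.set_xlabel("Date")
ax.set_ylabel("Value")
ax.legend()
fig.savefig("weekends.png")  # hypothetical filename; plt.show() works interactively
```

Shading spans rather than marking individual points keeps the weekends visible even when the series has no rows for those days, which is common in financial data.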

Conclusion

Visualizing time series data with holidays highlighted can uncover insights that might not be obvious at first glance, including the impact of holidays on trends. Pandas, together with Matplotlib, offers a robust set of tools for such tasks. Although this tutorial focused on US Federal holidays, the approach can be easily adapted for other regions or specific business calendars. Experimenting with plot styles and examining data from various angles can lead to more comprehensive analyses and compelling data stories.
