Pandas: Perform rolling window calculations on a DataFrame (5 examples)

Overview
Preparation
Example 1: Basic Rolling Average
Example 2: Rolling Sum with a Fixed Window
Example 3: Applying Custom Functions
Example 4: Rolling Window with Minimum Number of Observations
Example 5: Expanding Window Calculation
Conclusion

Overview

Pandas is a Python library for data manipulation and analysis. One of its features is the rolling window calculation: an aggregation such as a mean or a sum is computed over a fixed-size window that slides across the rows of a Series or DataFrame. The technique is useful for time series analysis, for smoothing noisy data, and for computing moving averages and sums. This tutorial walks through five examples, from a basic rolling average to custom functions and expanding windows.

Preparation

Before we dive into the examples, ensure you have Pandas installed in your Python environment. If not, you can install it using pip:

pip install pandas

Additionally, for these examples, let’s assume we are working with a simple time series dataset of daily temperatures. You can create a DataFrame as follows:

import pandas as pd
import numpy as np

# Sample time series data
dates = pd.date_range('20230101', periods=6)
data = {'Temperature': [22, 24, 27, 21, 20, 19]}
df = pd.DataFrame(data, index=dates)
print(df)

Example 1: Basic Rolling Average

Our first example calculates a simple 3-day rolling average of the temperatures. This is done using the .rolling() method and specifying window=3, followed by .mean() to calculate the average.

# Calculate a 3-day rolling average
df['3_day_rolling_avg'] = df['Temperature'].rolling(window=3).mean()
print(df)
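
To see what the rolling average contains, it can help to check a few values by hand. The following is a minimal sketch that rebuilds the same sample data as a Series (the names temps, rolling_avg, n_missing and manual_third are chosen here for illustration) and compares the first complete window against a manual calculation:

```python
import pandas as pd

# Same sample data as in the Preparation section, held in a Series
dates = pd.date_range("20230101", periods=6)
temps = pd.Series([22, 24, 27, 21, 20, 19], index=dates, name="Temperature")

rolling_avg = temps.rolling(window=3).mean()

# The first window - 1 = 2 rows have no complete window, so they are NaN
n_missing = int(rolling_avg.isna().sum())

# The third row is the mean of the first three temperatures
manual_third = (22 + 24 + 27) / 3

print(rolling_avg)
print(n_missing, manual_third)
```

Each value from the third row onward is the mean of the current row and the two rows before it.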

Example 2: Rolling Sum with a Fixed Window

In this example, we’ll calculate a rolling sum over a 4-day period. Similar to the rolling average, we use the .rolling() method but this time specify window=4 and use .sum() for calculation.

# Calculate a 4-day rolling sum
df['4_day_rolling_sum'] = df['Temperature'].rolling(window=4).sum()
print(df)
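
As a sanity check, each complete 4-day window can be recomputed by slicing the data directly. This sketch rebuilds the sample data as a Series; the names temps, rolling_sum, manual and complete are illustrative and not part of the examples above:

```python
import pandas as pd

# Same sample data as in the Preparation section, held in a Series
dates = pd.date_range("20230101", periods=6)
temps = pd.Series([22, 24, 27, 21, 20, 19], index=dates, name="Temperature")

rolling_sum = temps.rolling(window=4).sum()

# Recompute every complete window by slicing the current row and the 3 before it
manual = [int(temps.iloc[i - 3 : i + 1].sum()) for i in range(3, len(temps))]

# Drop the leading NaN rows to keep only the complete windows
complete = rolling_sum.dropna()

print(complete.tolist())
print(manual)
```

With six rows and a window of four, only the last three rows have a complete window, so the first three results are NaN.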

Example 3: Applying Custom Functions

Pandas’ rolling windows also accept custom functions through .apply(), which is useful when no built-in aggregation fits the task. Here, a lambda function calculates the range (max - min) within a 3-day window.

# Apply a custom function to calculate rolling range
df['3_day_rolling_range'] = df['Temperature'].rolling(window=3).apply(lambda x: x.max() - x.min())
print(df)
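
Custom functions passed to .apply() are called once per window, which can be slow on large data. The sketch below shows two possible alternatives under the same sample data: passing raw=True so each window arrives as a NumPy array rather than a Series, and combining the built-in .max() and .min() aggregations. The variable names are illustrative.

```python
import pandas as pd

# Same sample data as in the Preparation section, held in a Series
dates = pd.date_range("20230101", periods=6)
temps = pd.Series([22, 24, 27, 21, 20, 19], index=dates, name="Temperature")

window = temps.rolling(window=3)

# raw=True hands each window to the function as a NumPy array
range_apply = window.apply(lambda a: a.max() - a.min(), raw=True)

# The same range, built from two built-in rolling aggregations
range_builtin = window.max() - window.min()

print(range_apply)
print(range_builtin)
```

Both approaches give the same values here. Whether either one is noticeably faster depends on the data size and the function, so it is worth measuring on your own data.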

Example 4: Rolling Window with Minimum Number of Observations

By default, a fixed-size window produces NaN until it contains as many non-NA values as the window size. The min_periods parameter lowers that threshold. This example computes a 3-day rolling average but requires only 2 observations within the window, so the second row already receives a value.

# 3-day rolling average with at least 2 observations
df['3_day_avg_min_2'] = df['Temperature'].rolling(window=3, min_periods=2).mean()
print(df)
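
The effect of min_periods is easiest to see side by side. This sketch, using the same sample data and illustrative column names, compares the default behaviour with min_periods=2 and min_periods=1 and counts the NaN values in each result:

```python
import pandas as pd

# Same sample data as in the Preparation section, held in a Series
dates = pd.date_range("20230101", periods=6)
temps = pd.Series([22, 24, 27, 21, 20, 19], index=dates, name="Temperature")

comparison = pd.DataFrame({
    "default": temps.rolling(window=3).mean(),
    "min_periods_2": temps.rolling(window=3, min_periods=2).mean(),
    "min_periods_1": temps.rolling(window=3, min_periods=1).mean(),
})

# Count how many leading rows are still NaN for each setting
nan_counts = comparison.isna().sum()

print(comparison)
print(nan_counts)
```

Lowering min_periods fills in the early rows, but those values are averages over fewer observations than the later ones, which is worth keeping in mind when comparing them.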

Example 5: Expanding Window Calculation

Expanding window calculations are a related form of analysis: instead of sliding, the window starts at the first row and grows to include every row up to the current one. This is akin to a cumulative function, but any of the window aggregations can be applied. Here, we use .expanding() followed by .mean() to calculate the expanding average of temperatures.

# Calculate an expanding average
df['expanding_avg'] = df['Temperature'].expanding().mean()
print(df)
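
The comparison with cumulative functions can be made concrete. As a minimal sketch on the same sample data (variable names are illustrative), the expanding mean should match the cumulative sum divided by the number of rows seen so far, and also a rolling window as long as the data with min_periods=1:

```python
import pandas as pd

# Same sample data as in the Preparation section, held in a Series
dates = pd.date_range("20230101", periods=6)
temps = pd.Series([22, 24, 27, 21, 20, 19], index=dates, name="Temperature")

expanding_avg = temps.expanding().mean()

# Cumulative sum divided by the running row count (1, 2, ..., 6)
running_count = pd.Series(range(1, len(temps) + 1), index=temps.index)
manual_avg = temps.cumsum() / running_count

# A rolling window covering the whole series, allowed to start with 1 row
rolling_equiv = temps.rolling(window=len(temps), min_periods=1).mean()

print(expanding_avg)
print((expanding_avg - manual_avg).abs().max())
print((expanding_avg - rolling_equiv).abs().max())
```

The two printed differences should be zero up to floating-point rounding.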

Conclusion

Rolling window calculations cover a range of tasks on temporal data: smoothing data points, calculating moving averages and sums, and applying custom functions. Pandas handles all of these through the .rolling() and .expanding() methods, with min_periods controlling how incomplete windows are treated. With practice, these examples can serve as a starting point for more complex data analysis projects.
