Pandas: How to get the cumulative min/max of a Series

Updated: February 18, 2024 By: Guest Contributor

Introduction

When analyzing time series data or sequences of numbers, it’s often useful to compute cumulative statistics, such as the smallest or largest value seen up to each point in a series. These running extremes help you identify trends, set benchmarks, and detect anomalies over time. This tutorial shows step by step how to calculate the cumulative minimum and maximum of a Pandas Series, with examples ranging from basic to advanced usage.

Getting Started

Before diving into cumulative calculations, ensure you have Pandas installed in your environment. Matplotlib is needed only for the plotting example at the end:

pip install pandas matplotlib

Now, import Pandas and create a simple Pandas Series to work with:

import pandas as pd

# Sample Series
data = [10, 4, 2, 8, 6, 7, 3, 5, 9, 1]
series = pd.Series(data)

With your data prepared, let’s explore how to compute the cumulative minimum and maximum.

Basic Cumulative Calculations

Cumulative Minimum

To calculate the cumulative minimum up until each point in a Series, use cummin():

# Calculate cumulative minimum
cum_min = series.cummin()
print(cum_min)

Output:

0    10
1     4
2     2
3     2
4     2
5     2
6     2
7     2
8     2
9     1
dtype: int64
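cummin() is a running minimum: each output value is the smaller of the current element and the previous result. As a sketch for intuition only (not how Pandas implements it internally), the same sequence can be produced in plain Python with itertools.accumulate, using the sample data from above:

```python
import itertools

import pandas as pd

data = [10, 4, 2, 8, 6, 7, 3, 5, 9, 1]
series = pd.Series(data)

# accumulate() applies min() pairwise, carrying the smallest value seen so far
running_min = list(itertools.accumulate(data, min))

# The plain-Python running minimum agrees with Pandas' cummin()
assert running_min == series.cummin().tolist()
print(running_min)  # [10, 4, 2, 2, 2, 2, 2, 2, 2, 1]
```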

Cumulative Maximum

Similarly, to compute the cumulative maximum, apply cummax():

# Calculate cumulative maximum
cum_max = series.cummax()
print(cum_max)

Output:

0    10
1    10
2    10
3    10
4    10
5    10
6    10
7    10
8    10
9    10
dtype: int64
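One way to use cummax() as a benchmark is to measure how far each value sits below the highest value observed so far (often called a drawdown). The following is a minimal sketch on the same sample data; the variable names are illustrative:

```python
import pandas as pd

data = [10, 4, 2, 8, 6, 7, 3, 5, 9, 1]
series = pd.Series(data)

# Highest value observed up to and including each position
running_max = series.cummax()

# Distance below the running maximum (0 means the value is at a new high)
drawdown = series - running_max
print(drawdown.tolist())  # [0, -6, -8, -2, -4, -3, -7, -5, -1, -9]
```

Because the first value (10) is also the largest in this sample, every later value is compared against 10.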

Handling Null Values

In real-world scenarios, data often contains missing values, so it’s essential to understand how cumulative functions handle them. By default (skipna=True), cummin() and cummax() skip NaN values: the NaN position stays NaN in the result, and the running calculation continues past it as if the missing value were absent. To illustrate, let’s add a NaN value to our series:

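import numpy as np

# Convert to float so the Series can hold NaN, then blank out one value
series_with_nan = series.astype(float)
series_with_nan[3] = np.nan

# Cumulative minimum with NaN
print(series_with_nan.cummin())

# NaN stays in place; the running minimum continues past it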

Output:

0    10.0
1     4.0
2     2.0
3     NaN
4     2.0
5     2.0
6     2.0
7     2.0
8     2.0
9     1.0
dtype: float64

This behavior ensures that a single NaN value doesn’t interrupt the cumulative calculation, which is helpful when working with incomplete or noisy data sets.
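If you would rather have a missing value invalidate everything after it, cummin() and cummax() accept skipna=False. The sketch below compares both settings on a small float Series built directly with a NaN at position 3:

```python
import numpy as np
import pandas as pd

series_with_nan = pd.Series([10, 4, 2, np.nan, 6, 7, 3, 5, 9, 1])

# Default (skipna=True): NaN stays NaN in place, later values skip over it
skipped = series_with_nan.cummin()

# skipna=False: once a NaN is encountered, every later result is NaN too
propagated = series_with_nan.cummin(skipna=False)

print(skipped.tolist())
print(propagated.tolist())
```

The default suits most exploratory work; skipna=False is a stricter choice when a gap means the running statistic can no longer be trusted.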

Advanced Scenarios

Time Series Data

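For time series data, the index usually holds datetime values. Note that cummin() and cummax() process values in the order they appear in the Series, not in index order, so sort an out-of-order series with sort_index() first. Let’s simulate a daily time series and calculate the cumulative min/max: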

# Build a time series: the same values indexed by consecutive days
ts = pd.Series(data, index=pd.date_range('2023-01-01', periods=10))

# Cumulative min/max work the same way; the result keeps the datetime index
print(ts.cummin())
print(ts.cummax())
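A related pattern is a running extreme that restarts at the beginning of each calendar period. One way to do this is to group by period and call cummax() on each group. The sketch below assumes weekly dates (every 7 days starting 2023-01-01) purely for illustration, so the data spans January, February, and March:

```python
import pandas as pd

data = [10, 4, 2, 8, 6, 7, 3, 5, 9, 1]

# Illustrative weekly dates (an assumption for this sketch) spanning Jan-Mar 2023
ts = pd.Series(data, index=pd.date_range("2023-01-01", periods=10, freq="7D"))

# Running maximum that restarts at the start of each calendar month
monthly_max = ts.groupby(ts.index.to_period("M")).cummax()
print(monthly_max.tolist())  # [10, 10, 10, 10, 10, 7, 7, 7, 9, 1]
```

Grouping by to_period("M") rather than by month number keeps, for example, January 2023 and January 2024 in separate groups.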

Using expanding() for More Flexibility

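If you need more control over the calculation, such as requiring a minimum number of observations before a cumulative statistic is reported, you can use expanding(). An expanding window starts at the first row and grows by one observation at each step, so it covers the same data that cummin() and cummax() use. The min_periods argument sets how many observations must be present before a result is produced; earlier positions are NaN. For a fixed-size moving window, use rolling() instead.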

exp_min = series.expanding(min_periods=3).min()
exp_max = series.expanding(min_periods=3).max()

print(exp_min)
print(exp_max)
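To see how the two approaches relate, the sketch below compares an expanding minimum with cummin() on the original sample data (with the default RangeIndex). Expanding aggregations return floats, so the cummin() values are converted before comparing:

```python
import pandas as pd

data = [10, 4, 2, 8, 6, 7, 3, 5, 9, 1]
series = pd.Series(data)

# With min_periods=1, the expanding minimum matches cummin() (as floats)
exp_min_all = series.expanding(min_periods=1).min()
print(exp_min_all.tolist() == [float(v) for v in series.cummin()])  # True

# With min_periods=3, the first two positions have too few observations
exp_min_3 = series.expanding(min_periods=3).min()
print(exp_min_3.tolist())  # [nan, nan, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0]
```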

Visualizing the Results

Visual analysis is a powerful way to understand the behavior of cumulative statistics. The Pandas plot() method uses Matplotlib under the hood, so you can plot your Series directly:

import matplotlib.pyplot as plt

cum_min.plot(label='Cumulative Min')
cum_max.plot(label='Cumulative Max', color='red')
plt.legend()
plt.show()
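It can also help to draw the original values together with both running extremes on one set of axes, shading the band between them. The following is a minimal sketch of one way to do that; the column names are illustrative, and the non-interactive "Agg" backend is an assumption so the code also runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to plot in a window
import matplotlib.pyplot as plt
import pandas as pd

data = [10, 4, 2, 8, 6, 7, 3, 5, 9, 1]
series = pd.Series(data)

# Collect the raw values and both running extremes as columns of one DataFrame
df = pd.DataFrame({
    "value": series,
    "cumulative min": series.cummin(),
    "cumulative max": series.cummax(),
})

fig, ax = plt.subplots()
df.plot(ax=ax)  # one line per column, with a legend

# Shade the band between the running minimum and maximum
ax.fill_between(df.index, df["cumulative min"], df["cumulative max"], alpha=0.2)
ax.set_xlabel("position")
ax.set_ylabel("value")
# Call plt.show() or fig.savefig(...) to display or save the figure
```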

Conclusion

This tutorial walked you through calculating the cumulative minimum and maximum of a Pandas Series. Starting from basic examples, we explored handling null values, working with time series data, and using the expanding() method for more control over the calculation. These techniques are a useful addition to your data analysis toolkit for tracking running extremes, setting benchmarks, and spotting anomalies, and with practice you’ll find them applicable to a wide range of analytical tasks.