Pandas DataFrame.cummax() method: Explained with examples

Introduction
Understanding cummax()
Syntax and Parameters
Basic Example
Handling Null Values
Column-wise and Row-wise Computation
Advanced Usage Examples
Conclusion

Introduction

Pandas is a powerhouse tool for data analysis in Python, offering an array of functions to manipulate and analyze large datasets efficiently. One such function is .cummax(), a method used to compute the cumulative maximum of a DataFrame or Series object. This tutorial delves into the .cummax() method, explaining its syntax, parameters, and utility through a series of progressively complex examples.

Understanding cummax()

The .cummax() method computes the cumulative (running) maximum along a specified axis: each element of the result is the largest value seen so far, up to and including that position. It is available on both DataFrame and Series objects and returns an object with the same shape and labels as the input.
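To make the definition concrete, here is a minimal sketch that compares .cummax() on a Series with a hand-written running maximum. The sample values and the variable names (s, running, manual, builtin) are made up for illustration.

```python
import pandas as pd

# Illustrative data, not taken from the article
s = pd.Series([3, 1, 4, 1, 5, 9, 2, 6])

# Hand-rolled running maximum: keep the largest value seen so far
running = []
current = None
for value in s:
    current = value if current is None else max(current, value)
    running.append(current)

manual = pd.Series(running, index=s.index)
builtin = s.cummax()

print(builtin.tolist())        # [3, 3, 4, 4, 5, 9, 9, 9]
print(manual.equals(builtin))  # True
```

Note that the result never decreases: once a value such as 9 has been seen, it stays the running maximum until something larger appears.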

Syntax and Parameters

The basic syntax of the .cummax() method is as follows:

DataFrame.cummax(axis=None, skipna=True)

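axis: Specifies the direction of computation. 0 or 'index' computes the running maximum down each column (column-wise); 1 or 'columns' computes it across each row (row-wise). If the argument is omitted, the computation runs down each column, as with axis=0.
skipna: Whether to exclude NA/null values. If True (the default), nulls are skipped in the computation and stay null in the result. If False, every value from the first null onward is null.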
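As a quick check of the axis parameter, the following sketch confirms that the string aliases behave the same as their integer forms and that calling .cummax() with no arguments matches axis=0. The DataFrame here is illustrative only.

```python
import pandas as pd

# Illustrative data, not taken from the article
df = pd.DataFrame({'A': [2, 1, 3], 'B': [1, 5, 4]})

down_columns = df.cummax(axis=0)   # running maximum down each column
across_rows = df.cummax(axis=1)    # running maximum across each row

# The string aliases are interchangeable with the integers
same_index = df.cummax(axis='index').equals(down_columns)
same_columns = df.cummax(axis='columns').equals(across_rows)

# Omitting axis gives the column-wise result
same_default = df.cummax().equals(down_columns)

print(same_index, same_columns, same_default)  # True True True
```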

Basic Example

Let’s start with a basic example to illustrate how .cummax() works with a simple DataFrame:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 4, 1],
    'B': [4, 2, 1, 3]
})

# Apply cummax()
cummax_df = df.cummax()
print(cummax_df)

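The output will be:

   A  B
0  1  4
1  2  4
2  4  4
3  4  4

Column A rises to 4 at the third row and stays there, while column B never exceeds its first value of 4.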

Handling Null Values

Let’s see how .cummax() handles null values in a DataFrame:

import pandas as pd

# DataFrame with null values
df = pd.DataFrame({
    'A': [1, None, 4, 1],
    'B': [None, 2, 1, 3]
})

# Apply cummax() with skipna=False
cummax_df = df.cummax(skipna=False)
print(cummax_df)

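With skipna=False, nulls are not skipped: once a NaN appears in a column, every later entry in that column is also NaN. Column B starts with a null, so it is NaN throughout:

     A   B
0  1.0 NaN
1  NaN NaN
2  NaN NaN
3  NaN NaN

With the default skipna=True, the null positions themselves remain NaN, but the running maximum carries on over the remaining values.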
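To see the effect of skipna directly, the two settings can be run side by side on the same data. This is a small sketch that reuses the DataFrame from this section; the names skipped and propagated are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, 4, 1],
    'B': [None, 2, 1, 3]
})

# Default behaviour: nulls are skipped, but their positions stay NaN
skipped = df.cummax()

# skipna=False: everything from the first null onward becomes NaN
propagated = df.cummax(skipna=False)

print("skipna=True:")
print(skipped)
print("skipna=False:")
print(propagated)
```

In the first result, column A reads 1.0, NaN, 4.0, 4.0 and column B reads NaN, 2.0, 2.0, 3.0. In the second, column A reads 1.0 followed by NaN values, and column B is NaN in every row.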

Column-wise and Row-wise Computation

Understanding how to switch between column-wise and row-wise computation is crucial for utilizing .cummax() effectively. This example demonstrates both approaches:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Column-wise cummax
col_cummax = df.cummax()
print("Column-wise cummax:\n", col_cummax)

# Row-wise cummax
row_cummax = df.cummax(axis=1)
print("Row-wise cummax:\n", row_cummax)

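Because the values in this DataFrame increase both down every column and across every row, each running maximum is simply the current value, so both results are identical to the original data:

Column-wise cummax:
    A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

Row-wise cummax:
    A  B  C
0  1  4  7
1  2  5  8
2  3  6  9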
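The difference between the two directions only becomes visible when the data is not already sorted. The sketch below uses a made-up DataFrame whose values rise and fall, so the column-wise and row-wise results diverge.

```python
import pandas as pd

# Illustrative, non-monotonic data (not taken from the article)
df = pd.DataFrame({
    'A': [3, 1, 2],
    'B': [1, 5, 4],
    'C': [2, 2, 6]
})

col_cummax = df.cummax()        # down each column
row_cummax = df.cummax(axis=1)  # across each row

print(col_cummax.values.tolist())  # [[3, 1, 2], [3, 5, 2], [3, 5, 6]]
print(row_cummax.values.tolist())  # [[3, 3, 3], [1, 5, 5], [2, 4, 6]]
```

Column-wise, A stays at 3 because its first value is never exceeded. Row-wise, the first row becomes 3, 3, 3 because the leading 3 is larger than everything to its right.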

Advanced Usage Examples

For a less tidy case, consider a DataFrame whose values do not increase steadily across each row:

import pandas as pd

# Creating a more complex DataFrame
df = pd.DataFrame({
    'A': [1, 2, 5, 3],
    'B': [1, 4, 2, 8],
    'C': [5, 3, 7, 6]
})

# Row-wise cummax; skipna=True is the default and is written out only for explicitness
advanced_cummax = df.cummax(axis=1, skipna=True)
print(advanced_cummax)

The output is:

   A  B  C
0  1  1  5
1  2  4  4
2  5  5  7
3  3  8  8

Each row now shows how the maximum progresses from left to right. In the second row, for example, C is 4 rather than 3 because the 4 in column B has already been seen.
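One common way a running maximum gets used in practice is to measure how far a series has fallen from its peak so far. The following is a minimal sketch with made-up prices; the names prices, running_peak, drawdown, and is_new_high are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Hypothetical price series, invented for this example
prices = pd.Series([100, 105, 98, 110, 104], name='price')

# Highest price observed up to and including each position
running_peak = prices.cummax()

# Distance below the running peak (0 means the price is at a peak)
drawdown = prices - running_peak

# True wherever the price matches the running peak
is_new_high = prices == running_peak

print(running_peak.tolist())  # [100, 105, 105, 110, 110]
print(drawdown.tolist())      # [0, 0, -7, 0, -6]
print(is_new_high.tolist())   # [True, True, False, True, False]
```

The same pattern applies column by column on a DataFrame, since df - df.cummax() aligns on both index and columns.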

Conclusion

This tutorial covered the basics of the .cummax() method in pandas: its syntax, how the skipna parameter treats null values, and how the axis parameter switches between column-wise and row-wise computation. Whenever you need the largest value seen so far at each position of a DataFrame or Series, .cummax() gives it to you in a single call.
