Pandas – Using DataFrame.cumsum() method (with examples)

Updated: February 22, 2024 By: Guest Contributor

Introduction

The DataFrame.cumsum() method in Pandas computes cumulative sums across a DataFrame, either down each column or across each row. It is particularly handy when analyzing sequential data or time series, or when computing running totals for financial data or inventories. In this tutorial, we’ll walk through the cumsum() method using 5 practical examples.

Before we begin, ensure that you have Pandas installed in your Python environment. You can install Pandas using pip:

pip install pandas

Basic Use of cumsum() in Pandas

Let’s start with a basic example to see how cumsum() works. Each value in the result is the sum of all values in that column up to and including the current row.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

cs_df = df.cumsum()
print(cs_df)

Output:

    A   B
0   1   5
1   3  11
2   6  18
3  10  26
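A quick way to sanity-check a running total is to compare its last row with the plain column totals. The sketch below does that for the same small frame; the variable names running, last_row, and totals are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

running = df.cumsum()

# The last row of a cumulative sum holds the grand total of each column,
# so it should match what df.sum() reports.
last_row = running.iloc[-1]
totals = df.sum()

print((last_row == totals).all())  # True
```

If the two ever disagree, the usual culprit is missing data, which cumsum() and sum() treat differently.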

Column-wise Cumulative Sum

Next, let’s focus on calculating column-wise cumulative sums. This is the default behavior of cumsum(), which we saw in our first example. However, you can explicitly specify this by passing axis=0 as an argument.

df.cumsum(axis=0)
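As a small check, the sketch below confirms that the default call, axis=0, and the string alias axis='index' all produce the same result. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

default_cs = df.cumsum()
explicit_cs = df.cumsum(axis=0)
alias_cs = df.cumsum(axis='index')  # 'index' is the string alias for axis=0

print(default_cs.equals(explicit_cs))  # True
print(default_cs.equals(alias_cs))     # True
```

Spelling out the axis costs nothing and can make the direction of accumulation obvious to whoever reads the code later.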

Row-wise Cumulative Sum

To calculate cumulative sums across rows, we change the axis parameter to axis=1.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

cs_row = df.cumsum(axis=1)
print(cs_row)

Output:

   A  B   C
0  1  5  12
1  2  7  15
2  3  9  18
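Row-wise accumulation tends to make the most sense when the columns have a natural order, such as months. The following sketch assumes a hypothetical sales table (the product names and numbers are made up for illustration) and builds a year-to-date total per product.

```python
import pandas as pd

# Hypothetical monthly sales per product (illustrative numbers only).
sales = pd.DataFrame(
    {'Jan': [10, 20], 'Feb': [15, 5], 'Mar': [5, 25]},
    index=['widget', 'gadget']
)

# axis='columns' is the string alias for axis=1: accumulate left to right.
year_to_date = sales.cumsum(axis='columns')
print(year_to_date)

# widget: 10, 25, 30
# gadget: 20, 25, 50
```

Because the sum runs left to right, the column order matters here; reorder the columns first if they are not already in sequence.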

Handling NA Values

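Pandas cumsum() handles NaN (Not a Number) values through its skipna parameter. By default (skipna=True), NaN values are skipped: the result is NaN at each missing position, but the running total continues from the last valid value. Setting skipna=False makes NaN propagate instead, so every value from the first NaN onward in a column becomes NaN.

import numpy as np
import pandas as pd

df_with_na = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, 5, 6],
    'C': [7, np.nan, 9]
})

# skipna=False: once a NaN appears in a column, the rest of that column is NaN
cs_no_skip = df_with_na.cumsum(skipna=False)
print(cs_no_skip)

Output:

     A   B    C
0  1.0 NaN  7.0
1  NaN NaN  NaN
2  NaN NaN  NaN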
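For comparison, here is a sketch of the default behavior (skipna=True) on the same data, followed by one common way to treat missing values as zero. Using fillna(0) is a choice made for this illustration and is not always appropriate for real data.

```python
import numpy as np
import pandas as pd

df_with_na = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, 5, 6],
    'C': [7, np.nan, 9]
})

# Default (skipna=True): NaN positions stay NaN, but the total carries on.
cs_default = df_with_na.cumsum()
print(cs_default)
# A: 1.0, NaN, 4.0   B: NaN, 5.0, 11.0   C: 7.0, NaN, 16.0

# If missing values should count as zero, fill them before accumulating.
cs_filled = df_with_na.fillna(0).cumsum()
print(cs_filled)
# A: 1.0, 1.0, 4.0   B: 0.0, 5.0, 11.0   C: 7.0, 7.0, 16.0
```

The difference shows up only at the missing positions: the default leaves a NaN there, while the filled version repeats the previous total.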

Using cumsum() with GroupBy

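For more advanced use cases, you can combine cumsum() with GroupBy operations to calculate cumulative sums within groups, so that the running total restarts for each group. This is particularly useful for data split into categories, such as per-customer or per-product totals. Note that the grouping column itself is not included in the result.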

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [1, 2, 3, 4]
})

df_grouped = df.groupby('Category').cumsum()
print(df_grouped)

Output:

   Value
0      1
1      3
2      3
3      7
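In practice it is often convenient to keep the running total next to the original rows. The sketch below selects the Value column before accumulating and assigns the result to a new column; the column name RunningTotal is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [1, 2, 3, 4]
})

# Select the column first, then accumulate within each group.
# The result keeps the original row index, so it can be assigned straight back.
df['RunningTotal'] = df.groupby('Category')['Value'].cumsum()
print(df)

# RunningTotal: 1, 3, 3, 7 (the total restarts when Category changes to 'B')
```

This works even when the rows of different groups are interleaved, since the assignment aligns on the index and does not depend on row position.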

Visualizing Cumulative Sums

The final example plots cumulative sums with Matplotlib, which must be installed separately (pip install matplotlib). A line chart of a running total makes it easy to see how quickly each column grows.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

cs_df = df.cumsum()

plt.plot(cs_df['A'], label='Column A')
plt.plot(cs_df['B'], label='Column B')
plt.legend()
plt.show()

Conclusion

The Pandas DataFrame.cumsum() method computes running totals down columns or across rows, handles missing values through the skipna parameter, and combines with GroupBy to produce per-group totals. With the examples above as a starting point, you can apply it to time series, financial data, inventories, and other sequential datasets.