Pandas – Using DataFrame.cumsum() method (with examples)

Last updated: February 22, 2024

Introduction

The DataFrame.cumsum() method in Pandas computes cumulative sums over a DataFrame, either down each column or across each row. It is well suited to sequential data such as time series, and to running totals in financial or inventory data. This tutorial walks through cumsum() with several practical examples.

Before we begin, ensure that you have Pandas installed in your Python environment. You can install Pandas using pip:

pip install pandas

Basic Use of cumsum() in Pandas

First, let’s start with a basic example to understand how to apply cumsum().

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

cs_df = df.cumsum()
print(cs_df)

Output:

    A   B
0   1   5
1   3  11
2   6  18
3  10  26

Column-wise Cumulative Sum

Next, let’s focus on calculating column-wise cumulative sums. This is the default behavior of cumsum(), which we saw in our first example. However, you can explicitly specify this by passing axis=0 as an argument.

df.cumsum(axis=0)
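
As a quick check that axis=0 is the default, the sketch below compares three spellings of the same call. The string alias 'index' is accepted by pandas wherever axis=0 is; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

default_cs = df.cumsum()            # no axis given
explicit_cs = df.cumsum(axis=0)     # running total down each column
named_cs = df.cumsum(axis='index')  # string alias for axis=0

# All three calls produce the same DataFrame
print(default_cs.equals(explicit_cs))  # True
print(default_cs.equals(named_cs))     # True
```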

Row-wise Cumulative Sum

To calculate cumulative sums across rows, we change the axis parameter to axis=1.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

cs_row = df.cumsum(axis=1)
print(cs_row)

Output:

   A  B   C
0  1  5  12
1  2  7  15
2  3  9  18
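
One way to sanity-check a row-wise cumulative sum is to compare its last column with the plain row totals, since the final running total in each row should equal that row's sum. The snippet below is a small sketch of that check; axis='columns' is the string alias for axis=1, and row_totals is a name chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

cs_row = df.cumsum(axis='columns')  # same as axis=1

# The last column of a row-wise cumulative sum holds each row's total
row_totals = df.sum(axis=1)
print((cs_row['C'] == row_totals).all())  # True
```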

Handling NA Values

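cumsum() handles NaN (Not a Number) values through its skipna parameter. By default (skipna=True), NaN values are skipped: each NaN position stays NaN in the result, but the running total carries on from the next valid value. Setting skipna=False makes NaN propagate instead, so every value from the first NaN onward in that column becomes NaN.

import numpy as np
import pandas as pd

df_with_na = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, 5, 6],
    'C': [7, np.nan, 9]
})

# skipna=False: a NaN turns the rest of the running total into NaN
cs_no_skip = df_with_na.cumsum(skipna=False)
print(cs_no_skip)

Output:

     A   B    C
0  1.0 NaN  7.0
1  NaN NaN  NaN
2  NaN NaN  NaN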
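
To make the effect of skipna concrete, here is a minimal side-by-side sketch on the same data. It also shows fillna(0) as one possible way to treat missing values as zero before summing; that is a choice made for this illustration, not something cumsum() does on its own.

```python
import numpy as np
import pandas as pd

df_with_na = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, 5, 6],
    'C': [7, np.nan, 9]
})

skipped = df_with_na.cumsum()                 # skipna=True (default)
propagated = df_with_na.cumsum(skipna=False)  # NaN carries through the rest of the column
as_zero = df_with_na.fillna(0).cumsum()       # treat missing values as 0

print(skipped['A'].tolist())     # [1.0, nan, 4.0]
print(propagated['A'].tolist())  # [1.0, nan, nan]
print(as_zero['A'].tolist())     # [1.0, 1.0, 4.0]
```

With the default, the NaN position stays NaN but the total resumes at 4.0; with skipna=False, nothing after the first NaN can be recovered.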

Using cumsum() with GroupBy

You can also combine cumsum() with groupby() to compute cumulative sums within each group, so the running total restarts whenever the group changes. The result keeps the original row index and omits the grouping column.

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [1, 2, 3, 4]
})

df_grouped = df.groupby('Category').cumsum()
print(df_grouped)

Output:

   Value
0      1
1      3
2      3
3      7
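
Because the grouped result shares the original index, it can be assigned straight back to the DataFrame. The sketch below stores the per-category running total in a new column; the column name RunningTotal is hypothetical and chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [1, 2, 3, 4]
})

# Select the column first so the result is a Series aligned to df's index
df['RunningTotal'] = df.groupby('Category')['Value'].cumsum()
print(df)
```

The new column holds 1, 3, 3, 7: the total restarts at the first row of category B.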

Visualizing Cumulative Sums

The final example plots cumulative sums with Matplotlib, which is a separate package (pip install matplotlib). A line chart makes it easy to see how each running total grows from row to row.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

cs_df = df.cumsum()

plt.plot(cs_df['A'], label='Column A')
plt.plot(cs_df['B'], label='Column B')
plt.legend()
plt.show()

Conclusion

DataFrame.cumsum() computes running totals down columns or across rows, controls how NaN values are treated through skipna, and works within groups when combined with groupby(). The examples above cover each of these cases and can serve as starting points for your own data.
