Introduction
The DataFrame.cumsum() method in Pandas computes cumulative sums across a DataFrame, either column-wise or row-wise. It is particularly useful when analyzing sequential data or time series, and when computing running totals in financial data or inventories. In this tutorial, we’ll dive deep into the cumsum() method, exploring its utility through 5 practical examples.
Before we begin, ensure that you have Pandas installed in your Python environment. You can install Pandas using pip:
pip install pandas
Basic Use of cumsum() in Pandas
First, let’s start with a basic example to understand how to apply cumsum().
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})
cs_df = df.cumsum()
print(cs_df)
Output:
A B
0 1 5
1 3 11
2 6 18
3 10 26
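One quick way to sanity-check a cumulative sum is to compare its final row with the plain column totals, since the last running total should equal the sum of the whole column. The sketch below reuses the same DataFrame; the variable names `running`, `last_row`, and `totals` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# Running totals down each column (the default direction).
running = df.cumsum()

# The last row of the running totals should match the column sums.
last_row = running.iloc[-1]
totals = df.sum()

print((last_row == totals).all())
```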
Column-wise Cumulative Sum
Next, let’s focus on calculating column-wise cumulative sums, where each column is accumulated from the top row down. This is the default behavior of cumsum(), which we saw in our first example. However, you can state it explicitly by passing axis=0 as an argument.
df.cumsum(axis=0)
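As a small check that the explicit form really matches the default, the sketch below compares three spellings of the same call. The string alias axis='index' is the named equivalent of axis=0; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

default_result = df.cumsum()               # axis not given
explicit_result = df.cumsum(axis=0)        # numeric axis
named_result = df.cumsum(axis='index')     # string alias for axis=0

# All three calls accumulate down each column.
print(default_result.equals(explicit_result))
print(default_result.equals(named_result))
```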
Row-wise Cumulative Sum
To calculate cumulative sums across rows, we change the axis parameter to axis=1. Each row is then accumulated from left to right across its columns.
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
cs_row = df.cumsum(axis=1)
print(cs_row)
Output:
A B C
0 1 5 12
1 2 7 15
2 3 9 18
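Row-wise accumulation is a natural fit when the columns represent consecutive periods. The following is a minimal sketch with made-up quarterly figures (the product names and numbers are hypothetical, not from any real dataset) that turns per-quarter values into year-to-date totals.

```python
import pandas as pd

# Hypothetical units sold per quarter for two products.
sales = pd.DataFrame(
    {'Q1': [10, 20], 'Q2': [15, 5], 'Q3': [5, 10], 'Q4': [20, 15]},
    index=['widgets', 'gadgets'],
)

# Accumulate left to right so each column holds the year-to-date total.
ytd = sales.cumsum(axis=1)
print(ytd)
```

The Q4 column of the result holds each product's full-year total.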
Handling NA Values
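Pandas cumsum() handles NaN (Not a Number) values through the skipna parameter. By default (skipna=True), NaN values are skipped: the running total carries on past them, while the NaN positions themselves remain NaN in the result. Setting skipna=False instead propagates NaN, so every value after the first NaN in a column is also NaN.
import numpy as np
import pandas as pd
df_with_na = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, 5, 6],
    'C': [7, np.nan, 9]
})
# skipna=False: once a NaN appears in a column, the rest of that column is NaN
cs_propagate_na = df_with_na.cumsum(skipna=False)
print(cs_propagate_na)
Output:
A B C
0 1.0 NaN 7.0
1 NaN NaN NaN
2 NaN NaN NaN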
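To make the effect of skipna concrete, the sketch below runs the same data three ways: with the default skipna=True, with skipna=False, and with NaN replaced by zero via fillna(0) before accumulating. The fillna(0) step is one possible choice for treating missing values as zero, not something cumsum() does on its own.

```python
import numpy as np
import pandas as pd

df_with_na = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, 5, 6],
    'C': [7, np.nan, 9]
})

# Default: NaN positions stay NaN, but the total continues past them.
skipped = df_with_na.cumsum()

# skipna=False: NaN propagates to every later row in the column.
propagated = df_with_na.cumsum(skipna=False)

# Treat missing values as zero so every row gets a running total.
as_zero = df_with_na.fillna(0).cumsum()

print(skipped)
print(propagated)
print(as_zero)
```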
Using cumsum() with GroupBy
For more advanced use cases, you can combine cumsum() with GroupBy operations to calculate cumulative sums within groups, so the running total restarts for each group. The result keeps the original row index and omits the grouping column.
import pandas as pd
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [1, 2, 3, 4]
})
df_grouped = df.groupby('Category').cumsum()
print(df_grouped)
Output:
Value
0 1
1 3
2 3
3 7
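Because the grouped result is aligned with the original index, it can be assigned straight back to the DataFrame as a new column. This sketch selects the Value column before accumulating; the column name RunningValue is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [1, 2, 3, 4]
})

# Per-category running total, stored alongside the original data.
df['RunningValue'] = df.groupby('Category')['Value'].cumsum()
print(df)
```

Keeping the grouping column next to the running total makes it easier to see where each group's total restarts.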
Visualizing Cumulative Sums
The final example plots cumulative sums with Matplotlib, which makes it easier to compare how running totals grow. Matplotlib is a separate package, so install it first if needed (pip install matplotlib).
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})
cs_df = df.cumsum()
plt.plot(cs_df['A'], label='Column A')
plt.plot(cs_df['B'], label='Column B')
plt.legend()
plt.show()
Conclusion
The Pandas DataFrame.cumsum() method computes cumulative sums along either axis, handles missing values through skipna, and combines with GroupBy to produce per-group running totals. Together with a simple plot, these examples cover the most common ways to build and inspect running totals in your own data.