Using DataFrame.sum() method in Pandas (5 examples)

Last updated: February 22, 2024

Introduction

In this tutorial, we’ll explore the DataFrame.sum() method in Pandas, a Python library for data manipulation and analysis. The method adds up the values of a DataFrame along either axis: down each column or across each row. We’ll begin with basic examples and then move on to missing values, mixed data types, and the skipna parameter. By the end of this guide, you’ll know how to apply sum() in each of these situations.

When to Use DataFrame.sum()?

Before diving into the examples, let’s briefly review what DataFrame.sum() does. In Pandas, a DataFrame is a two-dimensional labeled data structure with columns of potentially different types. The sum() method returns the sum of the values along the requested axis. With the default, axis=0, it adds the values down the index and returns one total per column. With axis=1, it adds the values across the columns and returns one total per row.

Basic Summation

import pandas as pd
import numpy as np

# Creating a simple DataFrame
df = pd.DataFrame({
  'A': [1, 2, 3],
  'B': [4, 5, 6],
  'C': [7, 8, 9]
})

# Basic column-wise summation
column_sum = df.sum()
print(column_sum)

This will output:

A     6
B    15
C    24
dtype: int64

This example demonstrates the simplest use of the sum() method, summing up the values of each column.
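If a single grand total is needed instead of one value per column, the per-column result can itself be summed. The following is a minimal sketch using the same values as above; the variable names grand_total and array_total are illustrative only.

```python
import pandas as pd

# Same values as the DataFrame used above
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# The first sum() returns per-column totals as a Series;
# the second sum() adds those totals together
grand_total = df.sum().sum()

# For an all-numeric DataFrame, summing the underlying NumPy array
# gives the same value
array_total = df.to_numpy().sum()

print(grand_total)  # 45
```

Chaining sum() keeps the NaN handling of Pandas, whereas the NumPy route only makes sense when every column is numeric and free of missing values.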

Row-wise Summation

import pandas as pd

# Assuming the same DataFrame (df)

# Summing up values row-wise
row_sum = df.sum(axis=1)
print(row_sum)

This will output:

0    12
1    15
2    18
dtype: int64

By setting axis=1, we change the direction of summation to be across the rows, yielding the total for each row.
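A common follow-up to row-wise summation is storing the totals next to the original data. The sketch below assumes the same three-column DataFrame as above; the column name row_total is a hypothetical choice for illustration.

```python
import pandas as pd

# Same values as the DataFrame used above
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# axis=1 and axis='columns' are equivalent spellings
row_total = df.sum(axis='columns')

# assign() returns a new DataFrame and leaves df unchanged;
# 'row_total' is an illustrative column name
with_totals = df.assign(row_total=row_total)
print(with_totals)
```

Using assign() instead of writing to df directly avoids accidentally including the totals column in a later call to df.sum(axis=1).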

Summation with NaN Handling

import pandas as pd
import numpy as np

# Creating a DataFrame with NaN values
df = pd.DataFrame({
  'A': [1, np.nan, 3],
  'B': [np.nan, 5, 6],
  'C': [7, 8, np.nan]
})

# Column-wise summation with NaN values excluded
column_sum_nan = df.sum()
print(column_sum_nan)

This will output:

A     4.0
B    11.0
C    15.0
dtype: float64

The sum() method automatically excludes NaN (missing) values from the calculation, ensuring a reliable summation process even in datasets that are not perfectly clean.
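One consequence of skipping NaN values is that a column with no valid values at all sums to 0.0, which can hide the fact that the column is empty. The min_count parameter of sum() sets how many valid values are required before a result is produced. The sketch below uses a small hypothetical DataFrame, df_nan, to show the difference.

```python
import numpy as np
import pandas as pd

# Column 'B' contains no valid values at all (hypothetical data)
df_nan = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, np.nan]
})

# Default behavior: NaN values are skipped, so an all-NaN column sums to 0.0
default_sum = df_nan.sum()

# min_count=1 requires at least one valid value; otherwise the result is NaN
strict_sum = df_nan.sum(min_count=1)

print(default_sum)
print(strict_sum)
```

Column A gives 4.0 in both cases, while column B gives 0.0 by default and NaN with min_count=1.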

Summing with Different Data Types

import pandas as pd

# DataFrame with different data types
df = pd.DataFrame({
  'A': [1, 2, 3],
  'B': [4.5, 5.5, 6.5],
  'C': ['a', 'b', 'c']
})

# Trying to sum up the entire DataFrame
total_sum = df.sum()
print(total_sum)

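Note that sum() does not skip non-numeric columns by default. In pandas 2.0 and later, the numeric_only parameter defaults to False, so sum() tries to reduce every column; for a column of strings this typically results in concatenation ('abc' for column C) and a result of dtype object. To restrict the calculation to numeric columns, pass numeric_only=True.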
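To make the treatment of non-numeric columns explicit instead of relying on defaults, which have changed between pandas versions, the numeric columns can be selected before summing. The following is a minimal sketch with the same values as above; df_mixed, numeric_sum, and selected_sum are illustrative names.

```python
import pandas as pd

# Same values as the mixed-type DataFrame above
df_mixed = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.5, 5.5, 6.5],
    'C': ['a', 'b', 'c']
})

# Option 1: ask sum() to consider numeric columns only
numeric_sum = df_mixed.sum(numeric_only=True)

# Option 2: select the numeric columns first, then sum them
selected_sum = df_mixed.select_dtypes(include='number').sum()

print(numeric_sum)
```

Both options return totals for A and B only, and column C does not appear in the result.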

Using skipna to Control NaN Handling

import pandas as pd
import numpy as np

# DataFrame with NaN values
df = pd.DataFrame({
  'A': [1, np.nan, 3],
  'B': [4, 5, 6],
  'C': [np.nan, 8, 9]
})

# Column-wise summation without excluding NaN
column_sum_skipna = df.sum(skipna=False)
print(column_sum_skipna)

This will output:

A     NaN
B    15.0
C     NaN
dtype: float64

By setting skipna=False, we include NaN values in the summation, resulting in NaN for any column that contains at least one missing value. This might be useful in scenarios where identifying columns with missing data is important.
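As a sketch of that use case, the NaN results can be turned into a list of column names. The example assumes the same data as above; the names sums and incomplete are illustrative. A more direct check, df.isna().any(), is shown alongside for comparison.

```python
import numpy as np
import pandas as pd

# Same values as the DataFrame above
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [4, 5, 6],
    'C': [np.nan, 8, 9]
})

# With skipna=False, any column containing NaN sums to NaN
sums = df.sum(skipna=False)

# Keep only the labels whose sum is NaN
incomplete = sums[sums.isna()].index.tolist()
print(incomplete)  # ['A', 'C']

# More direct alternative: test for missing values without summing
has_missing = df.isna().any()
```

The isna().any() form is usually the clearer choice when the sums themselves are not needed.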

Conclusion

The DataFrame.sum() method is a powerful yet straightforward tool for performing summation operations across a DataFrame. Through this tutorial, we’ve explored various aspects and functionalities of the method, from basic summation to handling missing values and different data types. The examples provided here should serve as a solid foundation for applying these techniques to more complex data analysis tasks.
