# Using the DataFrame.sum() method in Pandas (5 examples)

## Introduction

In this tutorial, we'll explore the `DataFrame.sum()` method in Pandas, an incredibly versatile and powerful Python library used for data manipulation and analysis. This method is essential for performing sum operations across different axes of a DataFrame, offering both simplicity and flexibility in handling numeric data. We'll begin with basic examples and gradually progress to more advanced use cases, demonstrating the full potential of the `sum()` method. By the end of this guide, you'll have a solid understanding of how to apply this function within various contexts.

## When to Use DataFrame.sum()?

Before diving into the examples, let's briefly review what `DataFrame.sum()` is and why it's important. In Pandas, a DataFrame is a two-dimensional labeled data structure with columns of potentially different types. The `sum()` method returns the sum of the values along the requested axis. The default is `axis=0` (the index), which adds values down the rows and returns one total per column. Setting `axis=1` instead adds values across the columns and returns one total per row.
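As a quick orientation, the sketch below shows that the `axis` parameter also accepts the string aliases `"index"` and `"columns"`, which some readers find easier to remember than `0` and `1`. The sample data is hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# axis=0 (alias "index") collapses the rows: one total per column
per_column = df.sum(axis="index")

# axis=1 (alias "columns") collapses the columns: one total per row
per_row = df.sum(axis="columns")

print(per_column.equals(df.sum(axis=0)))  # True
print(per_row.equals(df.sum(axis=1)))     # True
```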

## Basic Summation

```python
import pandas as pd

# Creating a simple DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Basic column-wise summation
column_sum = df.sum()
print(column_sum)
```

This will output:

```
A     6
B    15
C    24
dtype: int64
```

This example demonstrates the simplest use of the `sum()` method, summing up the values of each column.
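Two small variations on the basic case come up often: summing only some of the columns, and reducing the whole DataFrame to a single grand total. The following is a minimal sketch using the same values as above. Chaining `.sum().sum()` is one common way to get the grand total, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Sum only a subset of columns by selecting them first
subset_sum = df[['A', 'C']].sum()
print(subset_sum)

# Grand total: sum the per-column totals (6 + 15 + 24)
grand_total = df.sum().sum()
print(grand_total)  # 45
```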

## Row-wise Summation

```python
import pandas as pd

# The same DataFrame as in the previous example
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Summing up values row-wise
row_sum = df.sum(axis=1)
print(row_sum)
```

This will output:

```
0    12
1    15
2    18
dtype: int64
```

By setting `axis=1`, we change the direction of summation to be across the rows, yielding the total for each row.
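Row totals are often stored next to the data they summarize. The sketch below attaches them as a new column with `DataFrame.assign()`. The column name `row_total` is a hypothetical choice for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# assign() returns a new DataFrame and leaves df unchanged.
# 'row_total' is a hypothetical column name chosen for this example.
with_total = df.assign(row_total=df.sum(axis=1))
print(with_total)
```

Computing the totals before adding the column matters here: calling `sum(axis=1)` again on `with_total` would include `row_total` itself in the result.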

## Summation with NaN Handling

```python
import pandas as pd
import numpy as np

# Creating a DataFrame with NaN values
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, 5, 6],
    'C': [7, 8, np.nan]
})

# Column-wise summation with NaN values excluded
column_sum_nan = df.sum()
print(column_sum_nan)
```

This will output:

```
A     4.0
B    11.0
C    15.0
dtype: float64
```

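By default (`skipna=True`), the `sum()` method excludes `NaN` (missing) values from the calculation, so it still returns usable totals for datasets that are not perfectly clean. Because `NaN` is a floating-point value, these columns are stored as `float64`, and so are the resulting sums.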
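One consequence of skipping `NaN` is that a column with no valid values at all sums to `0`, which can hide the fact that the column is empty. The `min_count` parameter of `sum()` sets how many non-missing values are required before a total is reported. The sketch below uses a hypothetical all-`NaN` column `D` to show the difference.

```python
import numpy as np
import pandas as pd

# 'D' is a hypothetical column that contains no valid values
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'D': [np.nan, np.nan, np.nan]
})

# Default (min_count=0): the all-NaN column sums to 0.0
default_sum = df.sum()
print(default_sum)

# min_count=1: require at least one non-missing value, otherwise return NaN
strict_sum = df.sum(min_count=1)
print(strict_sum)
```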

## Summing with Different Data Types

```python
import pandas as pd

# DataFrame with different data types
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.5, 5.5, 6.5],
    'C': ['a', 'b', 'c']
})

# Trying to sum up the entire DataFrame
total_sum = df.sum()
print(total_sum)
```

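This demonstrates that `sum()` does not skip non-numeric columns on its own. The numeric columns are added as expected (`A` gives 6 and `B` gives 16.5), while the string column `C` is concatenated into `'abc'`, because adding Python strings joins them. To restrict the calculation to numeric columns, pass `numeric_only=True`. Note that a column whose values cannot be added together at all (for example, a mix of strings and numbers) raises a `TypeError` in pandas 2.x.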
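To limit the result to numeric columns explicitly, one option is the `numeric_only` parameter of `sum()`. The following is a minimal sketch with the same data. Selecting the columns first with `select_dtypes` is shown as an equivalent alternative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.5, 5.5, 6.5],
    'C': ['a', 'b', 'c']
})

# Only numeric columns take part; 'C' is left out of the result
numeric_sum = df.sum(numeric_only=True)
print(numeric_sum)

# Alternative: pick the numeric columns first, then sum
selected_sum = df.select_dtypes(include='number').sum()
print(selected_sum)
```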

## Using skipna to Control NaN Handling

```python
import pandas as pd
import numpy as np

# DataFrame with NaN values
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [4, 5, 6],
    'C': [np.nan, 8, 9]
})

# Column-wise summation without excluding NaN
column_sum_skipna = df.sum(skipna=False)
print(column_sum_skipna)
```

This will output:

```
A     NaN
B    15.0
C     NaN
dtype: float64
```

With `skipna=False`, `NaN` values are no longer skipped, so any column that contains at least one missing value sums to `NaN`. This is useful when a total computed from incomplete data would be misleading, or when you want to identify columns with missing data.
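As a sketch of that last use, the `NaN` totals can be turned into a list of affected column names. The variable names here are hypothetical, and `df.isna().any()` is shown alongside as a more direct check that gives the same answer for this data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [4, 5, 6],
    'C': [np.nan, 8, 9]
})

# Columns whose strict sum is NaN contain at least one missing value here
strict_sum = df.sum(skipna=False)
cols_with_missing = strict_sum[strict_sum.isna()].index.tolist()
print(cols_with_missing)  # ['A', 'C']

# A more direct check that does not rely on the sum at all
direct_check = df.columns[df.isna().any()].tolist()
print(direct_check)  # ['A', 'C']
```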

## Conclusion

The `DataFrame.sum()` method is a powerful yet straightforward tool for performing summation operations across a DataFrame. Through this tutorial, we've explored various aspects and functionalities of the method, from basic summation to handling missing values and different data types. The examples provided here should serve as a solid foundation for applying these techniques to more complex data analysis tasks.
