Pandas: Using DataFrame.agg() method (4 examples)

Updated: February 19, 2024 By: Guest Contributor

Introduction

Pandas is a powerful Python library for data manipulation and analysis, particularly useful for working with structured data. The DataFrame.agg() method (an alias of DataFrame.aggregate()) lets you aggregate data using one or more operations over a specified axis. This article explores DataFrame.agg() through four progressive examples, from basic to more advanced use cases.

The Use of DataFrame.agg()

The DataFrame.agg() method applies one or more aggregation functions along an axis of the DataFrame. By default (axis=0) each column is aggregated; with axis=1 each row is aggregated instead. The argument passed to agg() can be a single function or function name (such as 'sum'), a list of functions, or a dictionary mapping column labels to functions, which makes it versatile for different aggregation operations. Whether you’re summarizing data, performing statistical analysis, or consolidating records, agg() offers a straightforward approach.
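To make the role of the axis concrete, here is a minimal sketch. The DataFrame df_axis and its values are made up purely for illustration and are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the axis argument
df_axis = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# Default axis=0: the function is applied to each column
col_totals = df_axis.agg('sum')

# axis=1: the function is applied to each row
row_totals = df_axis.agg('sum', axis=1)

print(col_totals)
print(row_totals)
```

The column-wise result is indexed by column label (A, B), while the row-wise result is indexed by the original row labels (0, 1, 2).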

Example 1: Basic Aggregation

Let’s start with a basic example. We’ll create a simple DataFrame and apply a single aggregation function to it.

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)

# Use `agg` to find the sum of each column
df_sum = df.agg('sum')
print(df_sum)

Output:

A    10
B    26
C    42
dtype: int64
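When a function is given by name, agg() calls the DataFrame method with that name, so the result should match calling the method directly. A small sketch that checks this on the same sample data (the variable names df_check, via_agg, and via_method are introduced here for illustration):

```python
import pandas as pd

# Same sample data as in Example 1
df_check = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

via_agg = df_check.agg('sum')   # aggregation requested by name
via_method = df_check.sum()     # the equivalent direct method call

# Both are Series indexed by column label with identical values
print(via_agg.equals(via_method))
```

For a single aggregation the direct method is usually simpler; agg() becomes more useful once several functions or per-column rules are involved.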

Example 2: Multiple Aggregation Functions

Now, let’s apply multiple aggregation functions to each column. This showcases the flexibility of agg().

import pandas as pd

# Reuse the first example's DataFrame
df_multiple_agg = df.agg(['sum', 'mean', 'std'])
print(df_multiple_agg)

Output:

              A          B          C
sum   10.000000  26.000000  42.000000
mean   2.500000   6.500000  10.500000
std    1.290994   1.290994   1.290994
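Because a list of functions produces a DataFrame indexed by function name, individual statistics can be read back with ordinary label-based indexing. A minimal sketch, rebuilding the sample data so the block runs on its own (df_demo, summary, and the other variable names are illustrative):

```python
import pandas as pd

# Same sample data as in Example 1
df_demo = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

summary = df_demo.agg(['sum', 'mean', 'std'])

# A single statistic for a single column
mean_of_a = summary.loc['mean', 'A']

# One statistic for every column (returns a Series)
all_sums = summary.loc['sum']

# Round only for display; the underlying values keep full precision
rounded = summary.round(2)

print(mean_of_a)
print(all_sums)
print(rounded)
```

Note that the integer sums appear as floats here, since each column of the result also holds the fractional mean and standard deviation.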

Example 3: Aggregating with a Custom Function

The power of agg() also lies in its ability to work with custom functions. This example demonstrates how a custom function can be used for aggregation.

import pandas as pd

# Define a custom function
def custom_range(x):
    return x.max() - x.min()

# Apply the custom function using `agg`
df_custom_fun = df.agg(custom_range)
print(df_custom_fun)

Output:

A    3
B    3
C    3
dtype: int64
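agg() also forwards extra keyword arguments to the function it calls, which lets a custom function take parameters. A minimal sketch, assuming a hypothetical helper scaled_range with a factor parameter (both names are invented here and do not appear in the example above):

```python
import pandas as pd

# Same sample data as in Example 1
df_kw = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

# Hypothetical custom function: the column range multiplied by a factor
def scaled_range(x, factor=1):
    return (x.max() - x.min()) * factor

# Keyword arguments after the function are passed through to it
scaled = df_kw.agg(scaled_range, factor=10)
print(scaled)
```

Each column has a range of 3, so with factor=10 every entry of the result is 30.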

Example 4: Column-specific Aggregation

For our final example, let’s see how to apply different aggregation functions to specific columns. This is particularly useful for datasets where different columns may require different types of summarization.

import pandas as pd

# Specify column-wise aggregation
agg_funcs = {
    'A': 'mean',
    'B': ['sum', 'max'],
    'C': custom_range  # Using the custom function from Example 3
}
df_column_spec_agg = df.agg(agg_funcs)
print(df_column_spec_agg)

Output:

                A     B    C
mean          2.5   NaN  NaN
sum           NaN  26.0  NaN
max           NaN   8.0  NaN
custom_range  NaN   NaN  3.0
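The NaN entries above appear because each function is only applied to the columns that requested it. Two ways to get a more compact result are sketched below, under the assumption that one value per column is enough: a dictionary with exactly one function per column, and named aggregation with (column, function) tuples. The variable names and the row labels a_mean and b_total are chosen here for illustration.

```python
import pandas as pd

# Same sample data as in Example 1
df_cols = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

# One function per column: the result is a Series, with no NaN padding
per_column = df_cols.agg({'A': 'mean', 'B': 'sum'})
print(per_column)

# Named aggregation: each keyword becomes a row label in the result
named = df_cols.agg(a_mean=('A', 'mean'), b_total=('B', 'sum'))
print(named)
```

The first form returns a Series indexed by column label. The second returns a DataFrame whose rows carry the names you chose, which can make the output easier to read than the raw function names.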

Conclusion

The DataFrame.agg() method in Pandas offers a flexible way to aggregate data along either axis of a DataFrame. Whether you pass a built-in function name, a list of functions, a custom function, or a per-column dictionary, agg() can address a wide range of data summarization needs. The examples above show only a fraction of what’s possible, so try adapting them to your own data tasks.