Pandas DataFrame: Finding min/max value in each group

Updated: February 21, 2024 By: Guest Contributor Post a comment

Introduction

Pandas is a powerful Python data analysis toolkit, and its DataFrame structure provides numerous functionalities for manipulating and analyzing tabular data. One common operation is grouping data and then finding the minimum or maximum values within each group. This tutorial explores how to accomplish this task with multiple examples ranging from basic to advanced usage.

Getting Started

First, let’s ensure you have Pandas installed. If not, you can install it using pip:

pip install pandas

Then, import Pandas and create a simple DataFrame to work with:

import pandas as pd
data = {'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'C'],
        'Values': [10, 15, 5, 20, 22, 3, 27, 25]}
df = pd.DataFrame(data)
print(df)

The output will display our initial DataFrame:

  Category  Values
0        A      10
1        B      15
2        A       5
3        C      20
4        B      22
5        A       3
6        C      27
7        C      25

Basic Grouping and Finding Min/Max

The simplest way to find the minimum or maximum value in each group is by using the groupby approach together with min or max functions:

min_values = df.groupby('Category')['Values'].min()
print(min_values)

This will give us the minimum values:

Category
A     3
B    15
C    20
Name: Values, dtype: int64
max_values = df.groupby('Category')['Values'].max()
print(max_values)

And similarly, the maximum values:

Category
A    10
B    22
C    27
Name: Values, dtype: int64

Creating Custom Aggregations

For more advanced usage, you might want to perform custom aggregations after grouping. Pandas allows us to do this using the agg() method. Here, we can pass in a dictionary specifying the operations to perform on each column:

custom_agg = df.groupby('Category').agg({
    'Values': ['min', 'max']
})
print(custom_agg)

This code returns both the minimum and maximum values for each category under the ‘Values’ column:

          Values    
              min max
Category            
A               3  10
B              15  22
C              20  27

Utilizing Custom Functions for Groupwise Analysis

To further customize our data analysis, we can apply our functions to groups. This allows for more flexibility and can be accomplished with the apply() method:

def min_max_difference(group):
    return group.max() - group.min()
differences = df.groupby('Category')['Values'].apply(min_max_difference)
print(differences)

Output:

Category
A     7
B     7
C     7
Name: Values, dtype: int64

Here, we defined a function min_max_difference that calculates the difference between the maximum and minimum values of each group and applied it to our DataFrame groups.

Advanced Scenario: Conditional Aggregations

In more complex scenarios, you might want to perform aggregations based on certain conditions. For example, calculating the maximum value in each group but only if it exceeds a specified threshold. This is where the mask and where functions from Pandas can be useful:

filtered_max = df.groupby('Category')['Values'].apply(lambda x: x.where(x > 5).max())
print(filtered_max)

This code snippet filters the values before calculating the maximum, ensuring that only values greater than 5 are considered:

Category
A    10
B    22
C    27
Name: Values, dtype: int64

Conclusion

Finding the minimum or maximum values in each group of a DataFrame is a fundamental task when analyzing data with Pandas. Through examples in this tutorial, we explored the versatility of Pandas’ grouping, aggregation, and custom function capabilities. Whether you’re performing basic analysis or dealing with complex custom logic, Pandas provides the tools necessary to derive meaningful insights from your data.