Pandas DataFrame: Get the rank of values within each group (4 examples)

Updated: February 24, 2024 By: Guest Contributor

Introduction

One of Pandas’ most powerful features is its ability to perform group operations efficiently. Among these, ranking values within groups based on certain criteria stands out as highly useful for data analysis. This tutorial will show you how to get the rank of values within each group in a Pandas DataFrame through four progressively complex examples.

Prerequisites

Before diving into the examples, ensure that you have Python and Pandas installed. You can install Pandas using pip:

pip install pandas

Import Pandas in your Python script to get started:

import pandas as pd

Example 1: Basic Ranking

The first example demonstrates how to rank numeric values within groups in a DataFrame. Consider the following dataset:

import pandas as pd

data = {
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [1, 2, 2, 3, 1, 5]
}
df = pd.DataFrame(data)
print(df)

This will output:

  Group  Value
0     A      1
1     A      2
2     B      2
3     B      3
4     C      1
5     C      5

To rank these values within each group, we can use the groupby() function along with rank():

df['Rank'] = df.groupby('Group')['Value'].rank() 
print(df)

This will result in:

  Group  Value  Rank
0     A      1   1.0
1     A      2   2.0
2     B      2   1.0
3     B      3   2.0
4     C      1   1.0
5     C      5   2.0
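
To see what the groupby() step contributes, the minimal sketch below compares the grouped ranks with a plain rank over the whole column. The column names Global_Rank and Group_Rank are illustrative choices, not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [1, 2, 2, 3, 1, 5],
})

# Rank across the whole column, ignoring the groups
# (Global_Rank is a hypothetical column name used for illustration)
df['Global_Rank'] = df['Value'].rank()

# Rank separately inside each group; the ranks restart in every group
df['Group_Rank'] = df.groupby('Group')['Value'].rank()

print(df)
```

The grouped ranks restart at 1.0 in every group, while the global ranks span all six rows and give averaged ranks to the repeated values 1 and 2.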

Example 2: Ranking with Ties

Next, we handle scenarios where values within groups are tied. Given the modified dataset:

import pandas as pd

data = {
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [2, 2, 3, 3, 1, 5]
}
df = pd.DataFrame(data)
print(df)

Applying the same grouping and ranking call handles ties with the default method='average', which gives each tied value the mean of the ranks the tied values would otherwise occupy:

df['Rank'] = df.groupby('Group')['Value'].rank() 
print(df)

The output now indicates how Pandas handles ties:

  Group  Value  Rank
0     A      2   1.5
1     A      2   1.5
2     B      3   1.5
3     B      3   1.5
4     C      1   1.0
5     C      5   2.0
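
Averaging is only one way to resolve ties. As a rough sketch, the method parameter of rank() also accepts 'min', 'max', 'first' and 'dense'. The loop below adds one column per method so they can be compared side by side; the Rank_<method> column names are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [2, 2, 3, 3, 1, 5],
})

# One column per tie-breaking method (column names are illustrative)
for method in ['average', 'min', 'max', 'first', 'dense']:
    df[f'Rank_{method}'] = df.groupby('Group')['Value'].rank(method=method)

print(df)
```

With 'min' both tied rows in a group get rank 1, with 'max' both get rank 2, and with 'first' the tie is broken by the order in which the rows appear. 'dense' behaves like 'min' but never leaves gaps between consecutive ranks.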

Example 3: Ranking in Descending Order

Often, you may want to rank items in descending order. For instance, if higher values denote higher importance, passing ascending=False gives the largest value in each group rank 1:

df['Rank_Desc'] = df.groupby('Group')['Value'].rank(ascending=False) 
print(df)

This will produce:

  Group  Value  Rank  Rank_Desc
0     A      2   1.5        1.5
1     A      2   1.5        1.5
2     B      3   1.5        1.5
3     B      3   1.5        1.5
4     C      1   1.0        2.0
5     C      5   2.0        1.0
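
A common use of descending ranks is picking the top row of each group. The sketch below is one possible way to do that, assuming ties should be broken by row order (method='first') so that exactly one row per group survives. The variable names rank_desc and top_per_group are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [2, 2, 3, 3, 1, 5],
})

# method='first' breaks ties by order of appearance, so ranks within
# a group are unique and exactly one row per group has rank 1
rank_desc = df.groupby('Group')['Value'].rank(ascending=False, method='first')

# Keep only the row ranked first in each group
top_per_group = df[rank_desc == 1]
print(top_per_group)
```

Changing the comparison to rank_desc <= n would keep the top n rows per group instead.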

Example 4: Custom Ranking

The final example addresses more complex ranking criteria, such as ranking by multiple columns or using custom functions. Suppose our dataset now includes two metrics:

import pandas as pd

data = {
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value1': [2, 2, 3, 1, 1, 5],
    'Value2': [5, 4, 3, 6, 7, 2]
}
df = pd.DataFrame(data)
print(df)

To compute, for each row, the sum of its within-group ranks across Value1 and Value2:

# rank Value1 and Value2 within each group (highest value gets rank 1)
ranks = df.groupby('Group')[['Value1', 'Value2']].rank(ascending=False, method='average')

# add the two ranks together for each row
df['Rank_Sum'] = ranks.sum(axis=1)

print(df)

The snippet above adds a new column 'Rank_Sum' to the DataFrame df, where each row’s value is the sum of its ranks within its group across Value1 and Value2. Selecting only the two value columns keeps the Group labels out of the ranking. Because rank() on a grouped selection returns a result that carries the original DataFrame’s index, it can be assigned back without resetting the index. In group A, for example, the two rows get 2.5 and 3.5.
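
Summing ranks is not the same as ranking by a combined score. If the goal is to rank rows by the sum of Value1 and Value2 themselves, one possible sketch is to build the combined column first and then rank it within each group. The column names Total and Total_Rank are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value1': [2, 2, 3, 1, 1, 5],
    'Value2': [5, 4, 3, 6, 7, 2],
})

# Combine the two metrics into a single score per row
# (Total is a hypothetical helper column)
df['Total'] = df['Value1'] + df['Value2']

# Rank the combined score within each group, highest total first
df['Total_Rank'] = df.groupby('Group')['Total'].rank(ascending=False)

print(df)
```

Here every group contains two distinct totals, so each group ends up with ranks 1.0 and 2.0 and no ties.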

Conclusion

In this tutorial, we’ve covered how to get the rank of values within groups in a Pandas DataFrame through a series of examples, ranging from the most basic scenarios to more complex ones involving custom ranking criteria. By mastering these techniques, you can uncover meaningful insights from your data, facilitating better-informed decision-making.