Computing data ranks in Pandas DataFrame (5 examples)

Updated: February 20, 2024 By: Guest Contributor

Introduction

Working with data often requires ordering and ranking based on certain criteria. Pandas, a widely used Python library for data manipulation, provides a built-in way to rank data within DataFrames. Ranks help you spot top and bottom performers, compare values on a common scale, and handle ties consistently. This tutorial walks through five examples of computing data ranks in Pandas DataFrames, from a basic single-column rank to a weighted, custom ranking.

Ranking in Pandas

Before diving into examples, it’s crucial to understand how ranking in Pandas works. The .rank() method computes numerical ranks (1 through n) along an axis. Ranking is ascending by default, so the smallest value receives rank 1; pass ascending=False to reverse this. By default, tied values are assigned the average of the ranks they would otherwise occupy, which is why ranks are returned as floats. This behavior can be customized using the method parameter.

Available ranking methods include:

  • average: Default. Assigns the average rank to tied values.
  • min: Assigns the minimum rank to tied values.
  • max: Assigns the maximum rank to tied values.
  • first: Breaks ties by order of appearance in the data, so every value gets a distinct rank.
  • dense: Like min, but the rank increases by exactly 1 from one group of tied values to the next, leaving no gaps.

Different data types and structures may require different approaches to ranking, which we will explore in the examples below.
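The differences between the five methods are easiest to see side by side. The following is a minimal sketch that applies each method to the same scores used in Example 1; the variable names (scores, methods, comparison) are illustrative only.

```python
import pandas as pd

scores = pd.Series([90, 85, 90, 75, 85], name="Scores")

# Build one column per tie-breaking method, all computed on the same data.
methods = ["average", "min", "max", "first", "dense"]
comparison = pd.DataFrame({m: scores.rank(method=m) for m in methods})
comparison.insert(0, "Scores", scores)

print(comparison)
#    Scores  average  min  max  first  dense
# 0      90      4.5  4.0  5.0    4.0    3.0
# 1      85      2.5  2.0  3.0    2.0    2.0
# 2      90      4.5  4.0  5.0    5.0    3.0
# 3      75      1.0  1.0  1.0    1.0    1.0
# 4      85      2.5  2.0  3.0    3.0    2.0
```

Only first gives every row a distinct rank, and only dense avoids gaps in the rank sequence.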

Example 1: Basic Ranking

This example demonstrates the most straightforward ranking in a single DataFrame column.

import pandas as pd

df = pd.DataFrame({
    'Scores': [90, 85, 90, 75, 85]
})

df['Rank'] = df['Scores'].rank()

print(df)

Output:

   Scores  Rank
0      90   4.5
1      85   2.5
2      90   4.5
3      75   1.0
4      85   2.5

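In this example, the two 90s occupy positions 4 and 5 in sorted order, so each receives the average of those positions, 4.5. Likewise, the two 85s occupy positions 2 and 3 and each receive 2.5. This is the default average ranking method. Because ranks are assigned in ascending order, the lowest score (75) gets rank 1.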
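For scores, you often want the highest value to be ranked first. A small sketch of this, using the ascending parameter of .rank() on the same data (the column name Rank_desc is just an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame({"Scores": [90, 85, 90, 75, 85]})

# ascending=False ranks the largest value first,
# so the top score gets the lowest rank number.
df["Rank_desc"] = df["Scores"].rank(ascending=False)

print(df)
#    Scores  Rank_desc
# 0      90        1.5
# 1      85        3.5
# 2      90        1.5
# 3      75        5.0
# 4      85        3.5
```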

Example 2: Custom Ranking Method

Here, we apply a different ranking method to handle ties differently.

df['Rank_min'] = df['Scores'].rank(method='min')

print(df)

Output:

   Scores  Rank  Rank_min
0      90   4.5       4.0
1      85   2.5       2.0
2      90   4.5       4.0
3      75   1.0       1.0
4      85   2.5       2.0

This time, using the min method, each tied value receives the lowest rank in its group (4 instead of 4.5 for the 90s, 2 instead of 2.5 for the 85s), illustrating how the choice of ranking method affects the output.

Example 3: Ranking with Missing Values

Handling missing values is an essential aspect of data manipulation. Here, we show how Pandas deals with NaN values in ranking.

df = pd.DataFrame({
    'Scores': [90, None, 85, 90, None, 85]
})

df['Rank'] = df['Scores'].rank()

print(df)

Output:

   Scores  Rank
0    90.0   3.5
1     NaN   NaN
2    85.0   1.5
3    90.0   3.5
4     NaN   NaN
5    85.0   1.5

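NaN values are excluded from the ranking and keep a rank of NaN, so the remaining four scores are ranked 1 through 4. If every row needs a rank, clean or impute the missing values first, or use the na_option parameter to place them at the top or bottom of the ranking.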
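The na_option parameter of .rank() controls where missing values end up. The sketch below compares its three settings on the same data; the column names keep, top, and bottom are chosen here only to mirror the option values.

```python
import pandas as pd

df = pd.DataFrame({"Scores": [90, None, 85, 90, None, 85]})

# 'keep' (the default) leaves NaN unranked.
df["keep"] = df["Scores"].rank(na_option="keep")
# 'top' gives NaN the smallest ranks, pushing real values down.
df["top"] = df["Scores"].rank(na_option="top")
# 'bottom' gives NaN the largest ranks, after all real values.
df["bottom"] = df["Scores"].rank(na_option="bottom")

print(df)
```

With na_option='bottom', the non-missing scores keep the same ranks as in the output above, and the missing rows are ranked after them.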

Example 4: Ranking Across Multiple Columns

Advanced use cases may involve ranking data across multiple columns. This example demonstrates ranking students by multiple performance metrics.

df = pd.DataFrame({
    'Math': [90, 100, 85, 95],
    'Science': [85, 90, 88, 100],
    'English': [95, 80, 90, 85]
})

df['Overall Rank'] = df.mean(axis=1).rank(method='min')

print(df)

Output:

   Math  Science  English  Overall Rank
0    90       85       95           2.0
1   100       90       80           2.0
2    85       88       90           1.0
3    95      100       85           4.0

This method calculates an average score for each row (student) and then ranks them, offering a way to compare multidimensional data. Students 0 and 1 both average 90, so with method='min' they share rank 2 and rank 3 is skipped.
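To check a ranking like this, it helps to keep the intermediate averages in the DataFrame. The sketch below does that and also ranks in descending order so the best average gets rank 1. The column names Mean and Rank_desc and the subjects list are illustrative additions, not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Math": [90, 100, 85, 95],
    "Science": [85, 90, 88, 100],
    "English": [95, 80, 90, 85],
})

# Name the score columns explicitly so that helper columns added later
# (means, ranks) are never averaged in by accident.
subjects = ["Math", "Science", "English"]
df["Mean"] = df[subjects].mean(axis=1)

# Highest mean gets rank 1; tied means share the lowest rank in their group.
df["Rank_desc"] = df["Mean"].rank(method="min", ascending=False)

print(df.round(2))
#    Math  Science  English   Mean  Rank_desc
# 0    90       85       95  90.00        2.0
# 1   100       90       80  90.00        2.0
# 2    85       88       90  87.67        4.0
# 3    95      100       85  93.33        1.0
```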

Example 5: Ranking with Custom Functions

For more complex scenarios, you can compute a custom score first and then rank it. Here, each subject is given a weight, and students are ranked by their weighted average.

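def weighted_rank(df):
    # Relative importance of each subject; the weights sum to 1.
    weights = {'Math': 0.5, 'Science': 0.3, 'English': 0.2}
    # Multiply each column by its weight, then sum across columns for one score per row.
    weighted_scores = df[['Math', 'Science', 'English']].mul(weights).sum(axis=1)
    # Note: this adds the new column to the DataFrame that was passed in.
    df['Weighted Rank'] = weighted_scores.rank(method='min')
    return df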

df = weighted_rank(df)

print(df)

Output:

   Math  Science  English  Overall Rank  Weighted Rank
0    90       85       95           2.0           2.0
1   100       90       80           2.0           3.0
2    85       88       90           1.0           1.0
3    95      100       85           4.0           4.0

Combining Python’s flexibility with Pandas ranking capabilities allows for tailored ranking methods, such as the weighted ranking shown above.
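One possible variation is to pass the weights in as an argument and return a new DataFrame instead of modifying the original. The sketch below assumes the same columns as above; the function name add_weighted_rank and its parameters are hypothetical, chosen only for this illustration.

```python
import pandas as pd

def add_weighted_rank(frame, weights, column="Weighted Rank"):
    """Return a copy of frame with a rank column based on weighted scores.

    Hypothetical helper: `weights` maps column names to their weights.
    """
    w = pd.Series(weights)
    # Align weights to columns by name, then sum each row's weighted values.
    weighted_scores = frame[w.index].mul(w, axis="columns").sum(axis=1)
    # assign() returns a new DataFrame and leaves `frame` untouched.
    return frame.assign(**{column: weighted_scores.rank(method="min")})

df = pd.DataFrame({
    "Math": [90, 100, 85, 95],
    "Science": [85, 90, 88, 100],
    "English": [95, 80, 90, 85],
})

ranked = add_weighted_rank(df, {"Math": 0.5, "Science": 0.3, "English": 0.2})
print(ranked)
#    Math  Science  English  Weighted Rank
# 0    90       85       95            2.0
# 1   100       90       80            3.0
# 2    85       88       90            1.0
# 3    95      100       85            4.0
```

Keeping the weights outside the function makes it easy to try different weightings without editing the function body.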

Conclusion

Ranking in Pandas comes down to a few choices: the direction of the ranking, how ties are broken, how missing values are treated, and which score is being ranked. The examples in this tutorial cover each of these, from a basic single-column rank to a customized, weighted ranking. With these tools, you can order your data in the way your analysis requires and interpret the resulting ranks with confidence.