Pandas DataFrame nlargest() and nsmallest() methods (5 examples)

Updated: February 20, 2024 By: Guest Contributor

Introduction

When working with large datasets in Python, you often need only the handful of rows with the highest or lowest values, and sorting the whole table to get them is wasteful. Pandas offers two DataFrame methods for exactly this: nlargest() and nsmallest(), which return the n rows with the largest or smallest values in one or more columns. This tutorial introduces both methods through five examples that range from basic to more advanced.

Example 1: Basics of nlargest()

import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Score': [82, 93, 88, 75, 95]
})
# Get the top 3 scores
print(df.nlargest(3, 'Score'))

Output:

      Name  Score
4      Eve     95
1      Bob     93
2  Charlie     88
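
When several rows share the same value, nlargest() has to decide which of them to return. The sketch below illustrates the keep parameter, which accepts 'first' (the default), 'last', or 'all'. The scores DataFrame and its tied values are made up for this illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical data with a tie: Alice and Charlie both scored 88
scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [88, 93, 88, 75],
})

# Default keep='first': among tied rows, the one that appears first wins
top2_first = scores.nlargest(2, 'Score')

# keep='all': every row tied for the last place is kept,
# so the result can contain more than n rows
top2_all = scores.nlargest(2, 'Score', keep='all')

print(top2_first)
print(top2_all)
```

With keep='first' the result holds Bob and Alice; with keep='all' Charlie is included as well, so three rows come back even though n is 2.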

Example 2: Using nlargest() with multiple columns

import pandas as pd
# Additional column for our DataFrame
df['Age'] = [25, 20, 30, 35, 29]
# Get the top 2 by Score; Age is only used to break ties in Score
print(df.nlargest(2, ['Score', 'Age']))

Output:

  Name  Score  Age
4  Eve     95   29
1  Bob     93   20
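
In the data above no two scores are equal, so the Age column never comes into play. The following minimal sketch uses a small made-up DataFrame (people, an assumption for this illustration) in which two scores tie, to show how the second column acts as a tie-breaker.

```python
import pandas as pd

# Hypothetical data: Alice and Bob tie on Score
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [90, 90, 85],
    'Age': [25, 31, 40],
})

# Score alone: the tie is resolved by order of appearance (keep='first')
by_score = people.nlargest(1, 'Score')

# Score, then Age: among the tied scores, the larger Age wins
by_score_age = people.nlargest(1, ['Score', 'Age'])

print(by_score)
print(by_score_age)
```

The first call returns Alice and the second returns Bob. Charlie has the largest Age but is never considered, because his Score is lower.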

Example 3: nsmallest() Basics

import pandas as pd
# Using the same DataFrame
# Get the bottom 3 ages
print(df.nsmallest(3, 'Age'))

Output:

    Name  Score  Age
1    Bob     93   20
0  Alice     82   25
4    Eve     95   29
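
The pandas documentation describes nsmallest(n, columns) as equivalent to sort_values(columns, ascending=True).head(n), only more performant. The sketch below checks that equivalence on a standalone copy of the Name and Age columns (the df_ages name is introduced here purely for illustration).

```python
import pandas as pd

# Standalone copy of the tutorial's names and ages
df_ages = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 20, 30, 35, 29],
})

# Select the three youngest people in two ways
via_nsmallest = df_ages.nsmallest(3, 'Age')
via_sort = df_ages.sort_values('Age', ascending=True).head(3)

# Both approaches pick the same rows in the same order
print(via_nsmallest.index.tolist() == via_sort.index.tolist())
```

Because all ages here are distinct, both approaches select Bob, Alice, and Eve in that order. The same relationship holds for nlargest() with ascending=False.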

Example 4: Applying Conditions with nlargest()

import pandas as pd
# Let's add a 'Passed' column
df['Passed'] = [True, True, True, False, True]
# Get top 2 scores among those who passed
# (a boolean column can be used directly as a mask)
passed_df = df[df['Passed']]
print(passed_df.nlargest(2, 'Score'))

Output:

  Name  Score  Age  Passed
4  Eve     95   29    True
1  Bob     93   20    True

Example 5: Custom Sorting with Lambda and nlargest()

import pandas as pd
# Composite ranking key: Score plus one tenth of Age
sort_key = lambda x: x['Score'] + x['Age'] / 10
# apply() with axis=1 returns a Series of keys, one per row;
# Series.nlargest(1) then returns the single highest key
print(df.apply(sort_key, axis=1).nlargest(1))

Output:

4    97.9
dtype: float64
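
The result above is a Series holding only the composite value, not the full row. As a hedged alternative, the sketch below computes the same composite key with column arithmetic instead of apply(), then uses the resulting index label to look up the matching row. The df_demo, composite, and top_rows names are introduced here for illustration only.

```python
import pandas as pd

# Standalone copy of the tutorial's data
df_demo = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Score': [82, 93, 88, 75, 95],
    'Age': [25, 20, 30, 35, 29],
})

# Vectorized composite key: Score plus one tenth of Age
composite = df_demo['Score'] + df_demo['Age'] / 10

# Series.nlargest keeps the original index labels...
top = composite.nlargest(1)

# ...so they can be used to fetch the full rows from the DataFrame
top_rows = df_demo.loc[top.index]
print(top_rows)
```

Column arithmetic avoids calling a Python function once per row, which usually matters on large DataFrames, and the .loc lookup recovers Eve's complete record.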

Conclusion

Through the examples above, we have seen how the nlargest() and nsmallest() methods in Pandas find the top or bottom records based on one or more columns, how they combine with boolean filtering, and how the Series version of nlargest() can rank rows by a computed key. Reaching for these methods instead of sorting an entire DataFrame keeps this kind of selection short and readable.