Pandas: How to get N smallest elements of a Series

Updated: February 18, 2024 By: Guest Contributor

Overview

In data analysis, you often need only a small slice of your data. With large datasets in particular, you may care only about the smallest values, whether for comparison, summary statistics, or ranking. Pandas, a widely used Python library for data manipulation and analysis, provides the nsmallest() method to retrieve the N smallest elements of a Series. This tutorial walks through several scenarios and options for using it.

Preparation

Before diving into extracting elements, it’s important to understand the basics of Pandas Series. A Series is a one-dimensional labeled array capable of holding any data type. It’s one of the core data structures in Pandas. You can create a Series from a list, array, or a Python dictionary. Understanding Series is fundamental for effectively using Pandas for data manipulation.

Creating a Simple Series

import pandas as pd

# Creating a simple Series from a list
s = pd.Series([20, 35, 10, 15, 30, 45])
print(s)

The output will look something like this:

0    20
1    35
2    10
3    15
4    30
5    45
dtype: int64

Retrieving N Smallest Elements

Now, let’s learn how to get the N smallest elements from a Series. Pandas offers a straightforward method called nsmallest(). This method returns the smallest N elements from the Series, sorted in ascending order.

Basic Example

print(s.nsmallest(3))

The output will be:

2    10
3    15
0    20
dtype: int64
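As a small sketch of the same idea on labeled data, the example below builds a Series from a dictionary (the fruit names and prices are made up for illustration) and compares nsmallest() with a full sort followed by head(). Both give the same rows here; nsmallest() is generally the better fit when N is small relative to the Series.

```python
import pandas as pd

# Hypothetical labeled data, used only for illustration
prices = pd.Series({"apple": 20, "pear": 35, "plum": 10, "fig": 15, "kiwi": 30})

# nsmallest keeps the original labels and returns the values in ascending order
smallest = prices.nsmallest(3)
print(smallest)

# The same rows obtained by sorting everything and taking the first three
via_sort = prices.sort_values().head(3)
print(smallest.equals(via_sort))
```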

Understanding the nsmallest() Method

The nsmallest() method takes two parameters. The ‘n’ parameter specifies how many of the smallest elements to retrieve and defaults to 5. The ‘keep’ parameter decides how ties are handled: ‘first’ (the default) returns the first occurrences in order of appearance, ‘last’ returns the last occurrences in reverse order of appearance, and ‘all’ keeps every value tied at the cutoff, which can return more than N elements. Let us explore how this works in different scenarios.
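Before looking at ties, here is a brief sketch of how ‘n’ behaves at its edges, reusing the simple Series from earlier. It assumes the documented default of n=5 and shows what happens when you ask for more elements than the Series holds.

```python
import pandas as pd

s = pd.Series([20, 35, 10, 15, 30, 45])

# With no arguments, n falls back to its default of 5
default_n = s.nsmallest()
print(default_n)

# Asking for more elements than exist returns the whole Series, sorted ascending
too_many = s.nsmallest(10)
print(len(too_many))
```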

Handling Ties with the keep Parameter

import pandas as pd

# Consider a Series with ties
s = pd.Series([20, 10, 10, 15, 30, 45, 10])

# Using keep='first' (default)
print(s.nsmallest(3))

# Using keep='last'
print(s.nsmallest(3, keep='last'))

# Using keep='all'
print(s.nsmallest(3, keep='all'))

The above examples will respectively output:

1    10
2    10
6    10
dtype: int64

6    10
2    10
1    10
dtype: int64

1    10
2    10
6    10
dtype: int64
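In the Series above, exactly three values tie for the minimum, so with n=3 the keep='all' result matches keep='first'. The difference shows up when N is smaller than the number of tied values. A minimal sketch with n=2 on the same data:

```python
import pandas as pd

s = pd.Series([20, 10, 10, 15, 30, 45, 10])

# Three values tie for the smallest, but only two are requested
first_two = s.nsmallest(2)              # keep='first' is the default
all_ties = s.nsmallest(2, keep='all')   # may return more than n elements

print(list(first_two.index))
print(list(all_ties.index))
```

With keep='all', every 10 is returned even though only two elements were requested, which is useful when dropping a tied value arbitrarily would be misleading.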

Advanced Usage

For more advanced data manipulation needs, you might want to explore using the nsmallest() method on DataFrames or applying it after filtering or transforming your Series. Here, the concepts of indexing, boolean masking, and function chaining come into play, allowing for more sophisticated data analyses.
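As a rough sketch of filtering and transforming before selecting, the example below reuses the simple Series from earlier. The threshold of 15 and the target value of 30 are arbitrary choices made for illustration.

```python
import pandas as pd

s = pd.Series([20, 35, 10, 15, 30, 45])

# Boolean mask: keep only values of at least 15, then take the two smallest
filtered = s[s >= 15].nsmallest(2)
print(filtered)

# Transform first (distance from 30), then take the two smallest distances
closest = (s - 30).abs().nsmallest(2)

# Use the resulting index labels to look up the original values
print(s.loc[closest.index])
```

Because nsmallest() preserves index labels, the result of a transformed Series can be mapped back to the original data with loc.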

Applying nsmallest() to a DataFrame

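import pandas as pd

# Creating a DataFrame (illustrative values; the original data is not shown)
df = pd.DataFrame({'A': [20, 35, 10, 15], 'B': [4, 1, 3, 2]})

# Getting the N smallest values from a specific column
# (returns the full rows with the two smallest values in column 'A')
print(df.nsmallest(2, 'A'))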
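As a further sketch of DataFrame.nsmallest(), the example below uses a hypothetical product table (all names and numbers are invented for illustration). It contrasts selecting whole rows with selecting from a single column as a Series, and shows that passing a list of columns uses the later columns to order tied rows.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [20, 10, 10, 15],
    "stock": [5, 8, 2, 7],
})

# DataFrame.nsmallest returns whole rows, ordered by the given column
cheapest_rows = df.nsmallest(2, "price")
print(cheapest_rows)

# Series.nsmallest on one column returns only that column's values
cheapest_prices = df["price"].nsmallest(2)
print(cheapest_prices)

# With a list of columns, rows with equal prices are ordered by stock
cheapest_low_stock = df.nsmallest(2, ["price", "stock"])
print(cheapest_low_stock)
```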

Conclusion

Retrieving the N smallest elements from a Series in Pandas is straightforward with the nsmallest() method. The ‘n’ parameter controls how many values are returned, and the ‘keep’ parameter controls how ties are resolved. The same method is available on DataFrames for selecting rows by the smallest values in a column. Mastering basic yet essential tasks like this one is a solid foundation for more complex data analysis and manipulation.