NumPy: Getting indices of N maximum values in an array (4 examples)

Updated: March 1, 2024 By: Guest Contributor

Introduction

NumPy is a fundamental package for scientific computing in Python. It offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more. A common operation when working with data is finding the indices of the N largest values in an array, which comes up in tasks such as feature selection, outlier detection, and ranking results. In this tutorial, we walk through four examples, from basic to advanced, of how to get the indices of the N maximum values in a NumPy array.

Example 1: Basic use of argpartition for finding maximum values

import numpy as np

# Create a numpy array
a = np.array([3, 1, 2, 5, 4])

# Get the indices of the two maximum values (in no particular order)
indices = np.argpartition(a, -2)[-2:]

# Order the selected indices by their values, smallest to largest
indices = indices[np.argsort(a[indices])]

# Output: [4 3]
print(f'Indices of the top 2 maximum values are: {indices}')

Output:

Indices of the top 2 maximum values are: [4 3]

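np.argpartition(a, kth) performs an indirect partition: it returns an array of indices, with the same shape as the input, arranged so that the element at position kth sits where it would be in a fully sorted array, every element before it is less than or equal to it, and every element after it is greater than or equal to it. With kth=-2, the last two returned indices therefore point to the two largest values, but in no guaranteed order, which is why the second step sorts them. The partition runs in linear time on average (the default introselect algorithm, selected through the kind keyword), so finding the top N elements costs O(n) plus O(N log N) to sort only those N indices, instead of O(n log n) for a full np.argsort.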
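The two steps above (partition, then sort only the selected indices) can be wrapped in a small reusable helper. This is a minimal sketch: the name `top_n_indices` is hypothetical, and it assumes a 1D array with 1 <= n <= len(a). It returns indices ordered from largest to smallest value and cross-checks the result against a full `np.argsort` on data with distinct values.

```python
import numpy as np

def top_n_indices(a, n):
    """Return indices of the n largest values of 1D array `a`, largest first.

    Hypothetical helper: assumes 1 <= n <= len(a).
    """
    # Average O(len(a)): afterwards the last n positions hold the indices
    # of the n largest values, in no particular order
    candidates = np.argpartition(a, -n)[-n:]
    # O(n log n): sort only the n candidates by value, then reverse
    return candidates[np.argsort(a[candidates])][::-1]

a = np.array([3, 1, 2, 5, 4])
print(top_n_indices(a, 2))  # [3 4]

# Cross-check against a full sort on random data with distinct values
rng = np.random.default_rng(0)
b = rng.permutation(1000)
assert np.array_equal(top_n_indices(b, 10), np.argsort(b)[::-1][:10])
```

With distinct values both approaches agree; the helper simply avoids sorting the whole array.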

Example 2: Using flip for more intuitive sorting

import numpy as np

a = np.array([10, 65, 22, 11, 3])

# Get the indices of the four maximum values
indices = np.argpartition(a, -4)[-4:]
# Sort the candidates by value (ascending), then reverse so the largest comes first
indices = np.flip(indices[np.argsort(a[indices])])

# Output: [1 2 3 0]
print(f'Indices of the top 4 maximum values are: {indices}')

Output:

Indices of the top 4 maximum values are: [1 2 3 0]

np.argsort orders the selected indices from smallest to largest value. Reversing them with np.flip (equivalent to slicing with [::-1]) puts the index of the largest value first, which is usually the order you want when reporting a top-N list.
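The partition-and-flip approach does not guarantee any particular order among tied values. When the array is small, or when ties must be broken predictably, a full sort can be simpler to reason about. In this sketch (the helper name `top_n_stable` is hypothetical), the values are negated and sorted with `kind='stable'`, so equal values keep their original order and the earlier index comes first. It assumes a signed integer or float dtype, because negating an unsigned integer wraps around, and it costs O(n log n) rather than the partition's average O(n).

```python
import numpy as np

def top_n_stable(a, n):
    # Negate so an ascending sort puts the largest values first;
    # 'stable' keeps equal values in their original index order.
    # Assumes a signed integer or float dtype (negation wraps for unsigned).
    return np.argsort(-a, kind='stable')[:n]

a = np.array([10, 65, 22, 11, 3])
print(top_n_stable(a, 4))  # [1 2 3 0]

ties = np.array([7, 9, 7, 9, 1])
print(top_n_stable(ties, 3))  # [1 3 0]
```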

Example 3: Finding the top N values in multidimensional arrays

import numpy as np

# Creating a 2D array
array = np.array([[1, 3, 5], [2, 4, 6]])

# Flattening the 2D array to a 1D array
flattened_array = array.flatten()

# Finding indices of the top 2 maximum values in the flattened array
indices = np.argpartition(flattened_array, -2)[-2:]
indices = np.flip(indices[np.argsort(flattened_array[indices])])

# Converting flat indices back to 2D indices
row_indices, col_indices = np.unravel_index(indices, array.shape)

# Output: Row indices: [1 0], Column indices: [2 2]
print(f'Row indices: {row_indices}, Column indices: {col_indices}')

Output:

Row indices: [1 0], Column indices: [2 2]

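For a multidimensional array, flatten it, find the top N flat indices exactly as in the 1D case, and then convert them back to coordinates with np.unravel_index, which maps each flat index to a (row, column) pair for the given shape. Here the flat indices [5 2] become the positions (1, 2) and (0, 2), which hold the values 6 and 5. Note that flatten() always returns a copy; ravel() avoids the copy when it can, and np.argpartition also accepts axis=None to partition the flattened array directly. Either way, the cost stays linear in the number of elements plus a small sort of the N selected indices.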
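Flattening finds the N largest values of the whole array. If you instead want the N largest values within each row, np.argpartition can work along axis=1 directly, and np.take_along_axis gathers the matching values. A minimal sketch, assuming a 2D array and 1 <= n <= number of columns; the helper name `top_n_per_row` is hypothetical.

```python
import numpy as np

def top_n_per_row(arr, n):
    # Partition each row so the column indices of its n largest
    # entries end up in the last n positions (unordered)
    part = np.argpartition(arr, -n, axis=1)[:, -n:]
    # Gather those values and compute a descending order within each row
    vals = np.take_along_axis(arr, part, axis=1)
    order = np.argsort(vals, axis=1)[:, ::-1]
    # Reorder the column indices so the largest value comes first
    return np.take_along_axis(part, order, axis=1)

array = np.array([[1, 3, 5], [2, 4, 6]])
print(top_n_per_row(array, 2).tolist())  # [[2, 1], [2, 1]]
```

Each output row lists column indices for the corresponding input row, so no np.unravel_index step is needed.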

Example 4: Utilizing heapq for large arrays

import numpy as np
import heapq

# A large array of random integers (contents differ on every run)
large_array = np.random.randint(0, 1000, size=10000)

# Find the indices of the 5 largest values; the key function
# large_array.take returns the value stored at each index
indices = np.array(heapq.nlargest(5, range(len(large_array)), large_array.take))

# Prints the indices of the 5 largest values, largest value first
print(f'Indices of the top 5 maximum values are: {indices}')

Output (varies because the input array is random):

Indices of the top 5 maximum values are: [1165 1587 2034 2240 7263]

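NumPy's vectorized functions are usually the fastest option for data already stored in an array, but the heapq module from Python's standard library offers a different trade-off. heapq.nlargest maintains a min-heap of only N items while it scans the input, so it needs O(N) extra memory and about O(n log N) time, and it accepts any iterable, including data that arrives in pieces. Because it loops in pure Python, however, it is typically much slower than np.argpartition on an array that is already in memory, so it pays off mainly when memory rather than speed is the constraint and N is much smaller than the number of elements.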
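Because heapq.nlargest accepts any iterable, it can also find the top N of data that arrives in chunks without holding the whole dataset in memory at once. This is a sketch under stated assumptions: `chunked_source` is a hypothetical stand-in for a real data source (files, a network stream), and `top_n_stream` is a hypothetical helper that returns global positions, largest value first.

```python
import heapq
import itertools
import numpy as np

def chunked_source(rng, n_chunks, chunk_size):
    # Hypothetical data source: yields one chunk of random integers at a time
    for _ in range(n_chunks):
        yield rng.integers(0, 1000, size=chunk_size)

def top_n_stream(chunks, n):
    # Lazily flatten the chunks into Python ints, one chunk in memory at a time
    values = itertools.chain.from_iterable(chunk.tolist() for chunk in chunks)
    # enumerate supplies the global position of each value;
    # nlargest keeps only n (position, value) pairs in its heap
    best = heapq.nlargest(n, enumerate(values), key=lambda pair: pair[1])
    return [position for position, _ in best]

rng = np.random.default_rng(0)
print(top_n_stream(chunked_source(rng, n_chunks=100, chunk_size=1000), 5))
```

Only the current chunk and the n best pairs are held at any time, which is where the memory saving comes from.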

Conclusion

Finding the indices of the top N maximum values in a NumPy array is a fundamental task with several efficient solutions. For arrays already in memory, np.argpartition followed by a sort of only the N selected indices is fast and simple, and np.unravel_index extends it to multidimensional arrays. When memory is tight or the data is only available as an iterable, heapq.nlargest offers a low-memory alternative. Together, these four examples balance simplicity and performance.