Using ndarray.searchsorted() method in NumPy (5 examples)

Updated: February 26, 2024 By: Guest Contributor

Introduction

NumPy is a core library for numerical computing in Python, widely used by scientists and engineers. One of its handy methods, ndarray.searchsorted(), uses binary search to find the indices at which elements could be inserted into a sorted array while keeping the array’s order intact. It only returns those indices; the array itself is not modified. This tutorial walks through the searchsorted() method with five progressively more complex examples.

Prerequisites: You should have Python and NumPy installed on your machine. Basic knowledge of Python and familiarity with NumPy arrays is also expected.

Basic Usage of searchsorted()

The searchsorted() method finds indices where elements should be inserted to maintain order. Let’s start with a simple sorted array:

import numpy as np

arr = np.array([1, 3, 5, 7])
indexes = arr.searchsorted([2, 4, 6])
print(indexes)

Output:

[1 2 3]

This output shows that to insert the numbers 2, 4, and 6 into our array and maintain the sorted order, they should be placed at indices 1, 2, and 3 respectively.
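One way to check that these indices really preserve the order is to pass them to np.insert(). The following is a minimal sketch; the names values and merged are illustrative and not part of the original example.

```python
import numpy as np

arr = np.array([1, 3, 5, 7])
values = np.array([2, 4, 6])

# Indices at which each value would be inserted into the original array
indexes = arr.searchsorted(values)

# np.insert interprets the indices relative to the original array,
# so the combined result stays sorted
merged = np.insert(arr, indexes, values)
print(merged)  # [1 2 3 4 5 6 7]

# A scalar query returns a single integer rather than an array
print(arr.searchsorted(4))  # 2
```

Note that searchsorted() itself never changes arr; the insertion is a separate step that returns a new array.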

Specifying the Side Parameter

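Next, let’s see how the side parameter affects the result when the searched value already exists in the array. By default (side='left'), searchsorted() returns the index of the first equal element, so the new value would go before any existing equal elements. Passing side='right' returns the index just past the last equal element instead: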

import numpy as np

arr = np.array([1, 2, 2, 3])
left_side = arr.searchsorted([2], side='left')
right_side = arr.searchsorted([2], side='right')
print("Left side insertion index:", left_side)
print("Right side insertion index:", right_side)

Output:

Left side insertion index: [1]
Right side insertion index: [3]

As we can see, side='left' returns the index of the first equal value, while side='right' returns the index just past the last equal value.
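A common use of the two sides together is counting how often a value occurs in a sorted array, since the difference between the two indices is the number of equal elements. This is a small sketch of that idea, with illustrative variable names:

```python
import numpy as np

arr = np.array([1, 2, 2, 3])

left = arr.searchsorted(2, side='left')    # first index i with arr[i] >= 2
right = arr.searchsorted(2, side='right')  # first index i with arr[i] > 2

count = right - left
print(count)  # 2

# A value that is absent gives the same index for both sides
missing = arr.searchsorted(5, side='right') - arr.searchsorted(5, side='left')
print(missing)  # 0
```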

Searching in Multidimensional Arrays

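ndarray.searchsorted() expects a one-dimensional sorted array and takes no axis argument, so it cannot search along the rows of a 2D array in a single call. If every row is sorted, you can apply the method to each row in turn. Observe:

import numpy as np

arr = np.array([[10, 15, 20, 25], [30, 35, 40, 45]])
# searchsorted() has no axis parameter, so search each sorted row separately
index_2d = np.array([row.searchsorted(22) for row in arr])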
print("Indices to maintain order in a 2D array:", index_2d)

Output:

Indices to maintain order in a 2D array: [3 0]

This tells us that to preserve order, the value 22 should be inserted at index 3 in the first row and index 0 in the second row.
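If a per-row search is needed in several places, it can be wrapped in a small helper. The sketch below assumes every row is already sorted; searchsorted_rows is a hypothetical name, not a NumPy function, and it relies on np.apply_along_axis() to call np.searchsorted() on each row.

```python
import numpy as np

def searchsorted_rows(arr2d, value, side='left'):
    """Return the insertion index of `value` in each (sorted) row of arr2d."""
    return np.apply_along_axis(
        lambda row: np.searchsorted(row, value, side=side), 1, arr2d
    )

arr = np.array([[10, 15, 20, 25], [30, 35, 40, 45]])
print(searchsorted_rows(arr, 22))  # [3 0]

# For a scalar query on numeric rows, counting the smaller elements gives
# the same answer as side='left' without a Python-level loop
print((arr < 22).sum(axis=1))  # [3 0]
```

The counting variant scans every element rather than using binary search, so it trades the logarithmic search for a fully vectorized operation.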

Sorting and Searching with Duplicate Values

Handling arrays with duplicate values is a common scenario. To demonstrate how searchsorted() deals with duplicates, let’s first sort an array:

import numpy as np

arr = np.array([2, 1, 3, 2, 4, 2])
sorted_arr = np.sort(arr)
indexes = sorted_arr.searchsorted([2], side='left')
print("Duplicated values sorting and searching:", indexes)

Output:

Duplicated values sorting and searching: [1]

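The sorted array is [1 2 2 2 3 4], so side='left' returns 1, the position of the first 2; side='right' would return 4, the position just past the last 2. When the value appears several times in the sorted array, the side parameter controls which end of that run the returned index points to.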
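Sorting a copy is not the only option. searchsorted() also accepts a sorter argument, an array of indices that would sort the data, typically produced by np.argsort(). The following sketch uses it to locate every occurrence of a value in the unsorted array; the variable names are illustrative.

```python
import numpy as np

arr = np.array([2, 1, 3, 2, 4, 2])

# argsort gives the permutation that would sort arr; kind='stable' keeps
# equal elements in their original relative order
order = np.argsort(arr, kind='stable')

# Positions are reported relative to the sorted order, not to arr itself
left = arr.searchsorted(2, side='left', sorter=order)
right = arr.searchsorted(2, side='right', sorter=order)
print(left, right)  # 1 4

# Original positions of every 2 in the unsorted array
print(order[left:right])  # [0 3 5]
```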

Custom Comparator for Searching

NumPy does not directly support custom comparators for searchsorted(). However, one can achieve similar functionality by transforming array elements before the search operation. Let’s say we want to insert a list of durations into an array of timedeltas:

import numpy as np
from datetime import timedelta

# Assuming arr is an array of timedeltas, and durations is a list of seconds
arr = np.array([timedelta(seconds=s) for s in [10, 20, 30]])
durations = [15, 25]
# Convert durations to timedeltas
new_durations = np.array([timedelta(seconds=s) for s in durations])
# Perform search
indexes = arr.searchsorted(new_durations)
print("Insert indices for durations:", indexes)

Output:

Insert indices for durations: [1 2]

Here we converted the seconds to timedelta instances before searching, so that the queries and the array elements are directly comparable. This illustrates how preprocessing the data can stand in for custom comparison logic.
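The same transform-then-search idea applies to other orderings. As a sketch, a descending array can be searched by negating both the array and the queries, and durations can also be stored with NumPy's native timedelta64 dtype instead of Python timedelta objects. The variable names below are illustrative.

```python
import numpy as np

# searchsorted assumes ascending order, so a descending array is searched
# through its negation, which is ascending
desc = np.array([30, 20, 10])
queries = np.array([25, 15])
indexes = (-desc).searchsorted(-queries)
print(indexes)  # [1 2]

# Durations stored as timedelta64 values (in seconds) are compared natively
td_arr = np.array([10, 20, 30], dtype='timedelta64[s]')
td_queries = np.array([15, 25], dtype='timedelta64[s]')
td_indexes = td_arr.searchsorted(td_queries)
print(td_indexes)  # [1 2]
```

In the descending case, inserting 25 at index 1 and 15 at index 2 of the original array keeps it in descending order.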

Conclusion

The ndarray.searchsorted() method in NumPy is a simple and efficient tool for finding where new values belong in a sorted array. From basic usage to handling duplicates, searching row by row in 2D arrays, and preparing data so that it can be compared directly, knowing how to use this method can make data handling and processing more efficient. As the examples above show, it adapts to a wide range of scenarios.