Using ndarray.partition() method in NumPy (4 examples)

Updated: March 1, 2024 By: Guest Contributor

Introduction

NumPy is the standard Python library for working with numerical arrays. One of its useful methods is ndarray.partition(), which partially sorts an array: it puts the kth smallest element in the position it would occupy in a sorted array, moves all smaller elements to its left, and moves all equal or greater elements to its right, without sorting either side. This tutorial walks through ndarray.partition() with practical examples, from basic to advanced use cases.

What is ndarray.partition() Used for?

The partition() method of an ndarray rearranges the array in place so that the element at index kth is the one that would occupy that position in a sorted array. All smaller elements are moved before it, and all equal or greater elements are placed after it; the order of the elements within each of those two groups is undefined. The array is thus split into a lesser and a greater segment around the selected value. The function form, np.partition(), performs the same operation on a copy and returns it, leaving the input unchanged.
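
The guarantee described above can be checked directly. The following is a minimal sketch; the sample values and the names `data`, `k`, and `part` are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np

# Illustrative data; any 1-D numeric array would do
data = np.array([7, 2, 9, 4, 1, 8, 3])
k = 3

# Partition a copy so the original stays available for comparison
part = data.copy()
part.partition(k)

# The element at index k is the one a full sort would put there
kth_value = part[k]
is_sorted_position = bool(kth_value == np.sort(data)[k])

# Everything before index k is <= it, everything from index k on is >= it
left_ok = bool(np.all(part[:k] <= kth_value))
right_ok = bool(np.all(part[k:] >= kth_value))

print(kth_value, is_sorted_position, left_ok, right_ok)
```

All three checks should hold for any input, while the exact arrangement of the other elements is left to the implementation.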

Syntax:

ndarray.partition(kth, axis=-1, kind='introselect', order=None)

Where:

  • kth: int or sequence of ints. The index (indices) of the element(s) to partition around.
  • axis: int, optional. The axis along which to partition. The default is -1, which partitions along the last axis. (The function form np.partition() also accepts axis=None, which flattens the array before partitioning.)
  • kind: {‘introselect’}, optional. Selection algorithm. Default is ‘introselect’.
  • order: str or list of str, optional. If the array contains fields, the order of fields to consider when partitioning.
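
To make the calling conventions concrete, here is a small sketch contrasting the in-place method with the np.partition() function and showing the default axis on a 2D array. The arrays and variable names are made up for illustration.

```python
import numpy as np

original = np.array([5, 3, 8, 1, 9, 2])

# Function form: returns a partitioned copy, the input is left unchanged
copy_result = np.partition(original, 2)
input_unchanged = np.array_equal(original, [5, 3, 8, 1, 9, 2])

# Method form: partitions in place and returns None
work = original.copy()
return_value = work.partition(2)

# With the default axis=-1, each row of a 2D array is partitioned independently
grid = np.array([[6, 4, 5], [3, 1, 2]])
grid.partition(0)  # same as grid.partition(0, axis=-1)
row_minima = grid[:, 0]

print(return_value, input_unchanged, copy_result[2], work[2], row_minima)
```

Because the method returns None, writing `arr = arr.partition(k)` would discard the array; call the method on its own line, or use np.partition() when a new array is wanted.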

Basic Example

Let’s start with a basic example to understand the usage of ndarray.partition():

import numpy as np

# Create an unsorted NumPy array
arr = np.array([3, 1, 2, 4, 5])

# Partition the array in place around index 2 (the third smallest element).
# The method returns None, so print the array itself.
arr.partition(2)
print(arr)

Output:

[1 2 3 4 5]

The array has been rearranged so that the third smallest element (3 in this case) is now at index 2, with all smaller elements to its left and all larger elements to its right. Only that much is guaranteed: the order within each side is unspecified, so the fully sorted result shown here is a side effect of the small input and should not be relied on.

Partitioning Along Multiple Indices

Next, let’s look at how to partition an array around more than one index:

import numpy as np

# Again, start with an unsorted array
multi_arr = np.array([3, 4, 2, 1, 5, 7, 6])

# Partition in place so that indices 0, 2, and 4 hold the
# first, third, and fifth smallest elements
multi_arr.partition([0, 2, 4])
print(multi_arr)

Output:

[1 2 3 4 5 7 6]

This shows that ndarray.partition() accepts several partition indices at once. The elements at indices 0, 2, and 4 are the ones a full sort would place there, and every other element lies on the correct side of each of them. Note that the tail (7, 6) is left unsorted, because no index beyond 4 was requested.
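
The multi-index behavior can be verified against a full sort. This is a sketch using the same input values as the example above; the names `values`, `kth`, and `part` are assumptions introduced for illustration.

```python
import numpy as np

values = np.array([3, 4, 2, 1, 5, 7, 6])
kth = [0, 2, 4]

part = np.partition(values, kth)
fully_sorted = np.sort(values)

# Each requested index holds the same value a full sort would place there
matches = [bool(part[i] == fully_sorted[i]) for i in kth]

# Elements between two requested indices are bounded by the values at those indices
between_ok = bool(np.all((part[2:5] >= part[2]) & (part[2:5] <= part[4])))

print(matches, between_ok)
```

Requesting several indices in one call is a way to obtain a few order statistics (for example, quartile positions) from a single pass over the data rather than one call per index.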

Partitioning Multidimensional Arrays

NumPy’s ndarray.partition() isn’t limited to one-dimensional arrays. Let’s explore its application on a two-dimensional array:

import numpy as np

# Create a 2D array
matrix = np.array([[9, 8, 7], [1, 3, 2], [4, 6, 5]])

# Partition each row in place around its second smallest value
matrix.partition(1, axis=1)
print(matrix)

Output:

[[7 8 9]
 [1 2 3]
 [4 5 6]]

Here, specifying axis=1 directs ndarray.partition() to operate along rows, arranging each row so that its second smallest element is at index 1. For a 2D array this is the same as the default axis=-1; passing axis=0 partitions each column instead. As before, the fully sorted rows are incidental, and only the element at index 1 of each row is guaranteed to be in its sorted position.
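
For comparison, the following sketch partitions the same matrix down its columns with axis=0. It uses the function form np.partition() so the input matrix is left untouched; the names `by_column` and `column_medians` are illustrative.

```python
import numpy as np

matrix = np.array([[9, 8, 7], [1, 3, 2], [4, 6, 5]])

# axis=0 partitions down each column instead of across each row
by_column = np.partition(matrix, 1, axis=0)

# Row 1 now holds the second smallest value of every column,
# which for three rows is also each column's median
column_medians = by_column[1]

print(column_medians)  # [4 6 5]
```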

Advanced Use Case: Searching With Partition

For a more advanced scenario, suppose you need the median of a large dataset. Partitioning combined with indexing finds it without a full sort:

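import numpy as np

# Large random array of integers in [1, 100)
large_arr = np.random.randint(1, 100, size=1000000)

# The length is even, so the median is the mean of the two middle values.
# np.partition returns a partitioned copy and leaves large_arr unchanged.
mid = len(large_arr) // 2
part = np.partition(large_arr, [mid - 1, mid])
median = (part[mid - 1] + part[mid]) / 2
print(median)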

Output: the median value, which varies from run to run because the array is random.

Partitioning around the middle of the array and reading the value there avoids a full sort. The introselect algorithm runs in O(n) time, whereas sorting takes O(n log n), so the gap widens as the array grows.
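
The performance claim is easy to test on your own machine. The sketch below compares a partition-based median with a sort-based one and with np.median(); the helper names `median_by_partition` and `median_by_sort`, the fixed seed, and the repeat count are assumptions chosen for illustration, and no particular timings are implied.

```python
import timeit

import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed is an arbitrary choice for repeatability
data = rng.integers(1, 100, size=1_000_000)
mid = data.size // 2


def median_by_partition(a):
    # Only the two middle positions need to be in sorted order (even length)
    part = np.partition(a, [mid - 1, mid])
    return (part[mid - 1] + part[mid]) / 2


def median_by_sort(a):
    # Full sort, then read the same two middle positions
    ordered = np.sort(a)
    return (ordered[mid - 1] + ordered[mid]) / 2


same_result = bool(
    median_by_partition(data) == median_by_sort(data) == np.median(data)
)

# Timings depend on hardware and NumPy version, so none are claimed here
t_partition = timeit.timeit(lambda: median_by_partition(data), number=3)
t_sort = timeit.timeit(lambda: median_by_sort(data), number=3)

print(same_result, f"partition: {t_partition:.3f}s", f"sort: {t_sort:.3f}s")
```

All three approaches should agree on the median; how much faster the partition-based version runs will vary with the array size and the environment.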

Conclusion

The ndarray.partition() method in NumPy offers a simple mechanism for performing partial sorts on arrays. As these examples show, it can place one or more elements in their sorted positions, work along a chosen axis of a multidimensional array, and speed up lookups such as finding the median, all without the cost of a full sort. That makes it a useful tool in your Python data science toolkit.