NumPy: How to Filter an Array by a Condition

Updated: January 22, 2024 By: Guest Contributor

Introduction

NumPy is a foundational package for scientific computing in Python. It provides mathematical functions, random number generators, linear algebra routines, and more. Its N-dimensional array object, ndarray, is widely used in data analysis, machine learning, and engineering. In this tutorial, we’ll explore how to filter NumPy arrays with boolean indexing, np.where, fancy indexing, and custom functions so that you can select the elements that satisfy a condition.

Basic Filtering with Comparison Operators

At its simplest, filtering can be done with comparison operators. A comparison on an array is evaluated element by element and produces a boolean array (a mask) of the same shape. Indexing the original array with that mask returns only the elements where the mask is True.

import numpy as np

# Create a NumPy array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Condition for filtering
condition = arr > 5

# Filtered array
filtered_arr = arr[condition]
print(filtered_arr)

Output:

[ 6  7  8  9 10]
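
To make the intermediate step visible, here is a minimal sketch that inspects the mask before it is used as an index. The names mask, n_matches, and selected are illustrative and do not appear in the example above.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# The comparison is evaluated element-wise and yields a boolean array
mask = arr > 5
print(mask.dtype)               # bool
print(mask.shape == arr.shape)  # True

# Counting the True values tells you how many elements will be selected
n_matches = np.count_nonzero(mask)
print(n_matches)                # 5

# Boolean indexing returns a copy, so editing the result
# leaves the original array untouched
selected = arr[mask]
selected[0] = 99
print(arr[5])                   # 6
```

Because the result is a copy, it is safe to modify the filtered array without affecting the source data.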

Combining Conditions

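You can combine multiple conditions using the element-wise operators & (and), | (or), and ~ (not). Wrap each comparison in parentheses, because these operators bind more tightly than comparisons such as > and <. Python's own and, or, and not keywords do not work element-wise; they raise a ValueError when applied to an array with more than one element.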

import numpy as np

# Combining conditions (arr is the array created in the first example)
condition = (arr > 5) & (arr < 8)

# Apply the conditions to filter the array
filtered_arr = arr[condition]
print(filtered_arr)

Output:

[6 7]
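
The example above covers only &. As a short sketch of the other two operators, the snippet below uses | and ~ on the same data, and shows np.logical_and as a function-call alternative to &. The variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# OR: keep values below 3 or above 8
low_or_high = arr[(arr < 3) | (arr > 8)]
print(low_or_high)  # [ 1  2  9 10]

# NOT: invert an "is even" mask to keep the odd numbers
odd = arr[~(arr % 2 == 0)]
print(odd)          # [1 3 5 7 9]

# np.logical_and gives the same result as & on boolean masks
between = arr[np.logical_and(arr > 5, arr < 8)]
print(between)      # [6 7]
```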

Using np.where

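The np.where function is a versatile choice for filtering. Called with only a condition, it returns the indices of the elements that satisfy it. Called with three arguments, it returns a new array that takes values from the second argument where the condition is True and from the third argument where it is False.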

Returning indices:

indices = np.where(arr > 5)
print(indices)

# Use indices to filter the array
filtered_arr = arr[indices]
print(filtered_arr)

Output:

(array([5, 6, 7, 8, 9]),)
[ 6  7  8  9 10]
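
np.where returns a tuple because it produces one index array per dimension. The following sketch, which assumes a small 2-D array named grid that is not part of the original example, shows how the tuple unpacks into row and column indices.

```python
import numpy as np

# Hypothetical 2-D data used only for this illustration
grid = np.array([[1, 8, 3],
                 [9, 2, 7]])

# One index array per axis: rows first, then columns
rows, cols = np.where(grid > 5)
print(rows)              # [0 1 1]
print(cols)              # [1 0 2]

# Using both index arrays together selects the matching elements
print(grid[rows, cols])  # [8 9 7]

# np.argwhere groups the same coordinates as one (row, col) pair per match
coords = np.argwhere(grid > 5)
print(coords.tolist())   # [[0, 1], [1, 0], [1, 2]]
```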

Replacing elements based on a condition:

replaced_arr = np.where(arr > 5, arr, -1)
print(replaced_arr)

Output:

[-1 -1 -1 -1 -1  6  7  8  9 10]
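
Unlike boolean indexing, the three-argument form keeps the original shape. If you would rather overwrite values in place, assigning through a boolean mask is a common alternative. The sketch below compares the two approaches; capped and clipped are illustrative names.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# np.where builds a new array and leaves arr unchanged
capped = np.where(arr > 5, 5, arr)
print(capped)   # [1 2 3 4 5 5 5 5 5 5]
print(arr[-1])  # 10

# Mask assignment modifies an array in place, so work on a copy
# if the original values are still needed
clipped = arr.copy()
clipped[clipped > 5] = 5
print(clipped)  # [1 2 3 4 5 5 5 5 5 5]
```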

Fancy Indexing

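Fancy indexing (also called integer array indexing) selects elements by position rather than by condition: you pass a list or array of the indices you want, and NumPy returns a new array containing those elements.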

indices = [1, 3, 5]
filtered_arr = arr[indices]
print(filtered_arr)

Output:

[2 4 6]
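
Index lists are more flexible than slices. As a brief sketch, the snippet below shows repeated and negative indices, and uses np.flatnonzero to convert a boolean mask into the equivalent integer indices. The names picked and idx are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Indices may repeat, appear in any order, and be negative
picked = arr[[-1, 0, 0, 3]]
print(picked)    # [10  1  1  4]

# np.flatnonzero turns a boolean mask into integer indices,
# which links condition-based filtering to fancy indexing
idx = np.flatnonzero(arr % 3 == 0)
print(idx)       # [2 5 8]
print(arr[idx])  # [3 6 9]
```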

Filtering with a Function

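Sometimes the condition is too complex for a single comparison and you want a custom function to decide which elements to keep. In such cases, you can use the np.vectorize utility to apply the function to each element of the array. Note that np.vectorize is a convenience wrapper, not a performance tool: it still calls the Python function once per element, much like a for loop.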

def is_prime(x):
    if x < 2:
        return False
    for n in range(2, int(x**0.5) + 1):
        if x % n == 0:
            return False
    return True

# Vectorize the custom function
vectorized_is_prime = np.vectorize(is_prime)

# Apply the vectorized function as a filter
filtered_arr = arr[vectorized_is_prime(arr)]
print(filtered_arr)

Output:

[2 3 5 7]
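
Two practical details are worth a short sketch: passing otypes to np.vectorize so the wrapper also accepts empty arrays, and building the same mask with np.fromiter. The names is_prime_vec, empty, and mask are illustrative, and the input range 1 to 20 is chosen only for this demonstration.

```python
import numpy as np

def is_prime(x):
    """Return True if x is a prime number."""
    if x < 2:
        return False
    for n in range(2, int(x**0.5) + 1):
        if x % n == 0:
            return False
    return True

arr = np.arange(1, 21)

# otypes fixes the output dtype; without it, np.vectorize cannot
# infer a dtype for empty input and raises a ValueError
is_prime_vec = np.vectorize(is_prime, otypes=[bool])
primes = arr[is_prime_vec(arr)]
print(primes)  # [ 2  3  5  7 11 13 17 19]

empty = np.array([], dtype=int)
print(is_prime_vec(empty).shape)  # (0,)

# np.fromiter builds the same boolean mask from a generator
mask = np.fromiter((is_prime(x) for x in arr), dtype=bool, count=arr.size)
print(np.array_equal(arr[mask], primes))  # True
```

Both variants still run the Python function once per element, so for large arrays a purely array-based formulation of the condition is usually faster when one exists.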

Advanced Example: Structured Arrays

NumPy can handle record-like data through structured arrays, whose elements have named fields. You can filter them in the same way: build a boolean mask from a condition on one or more fields, then index the array with it.

structured_arr = np.array([('Alice', 25, 55.0),
                           ('Bob', 32, 75.5),
                           ('Catherine', 19, 62.1)],
                          dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Condition: Select people older than 20
condition = structured_arr['age'] > 20

# Apply condition
filtered_structured_arr = structured_arr[condition]
print(filtered_structured_arr)

Output:

[('Alice', 25, 55. ) ('Bob', 32, 75.5)]
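
Conditions on several fields combine with the same element-wise operators used for plain arrays. The sketch below rebuilds the array under the illustrative name people and filters on age and weight together. The thresholds are arbitrary values chosen for the demonstration.

```python
import numpy as np

people = np.array(
    [('Alice', 25, 55.0), ('Bob', 32, 75.5), ('Catherine', 19, 62.1)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')],
)

# Combine conditions on two different fields
mask = (people['age'] > 20) & (people['weight'] < 60)
adult_light = people[mask]['name']
print(adult_light)  # ['Alice']

# Select a single field from the filtered records
under_65 = people[people['weight'] < 65]['name']
print(under_65)     # ['Alice' 'Catherine']
```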

Conclusion

Filtering arrays based on conditions is a frequent operation in data analysis and scientific computing. NumPy offers several ways to do it: boolean masks for most cases, np.where when you need indices or replacement values, fancy indexing when you already know the positions, np.vectorize for custom element-wise tests, and field-based masks for structured arrays. Knowing when to use each one makes everyday data manipulation in Python simpler and less error-prone.