NumPy: How to Filter an Array Based on an Array of Booleans

Updated: January 22, 2024 By: Guest Contributor

Introduction

NumPy, standing for Numerical Python, is a foundational package for numerical computations in Python. It provides support for arrays (vectors, matrices, etc.) along with a collection of mathematical functions to operate on these arrays. A common operation when dealing with NumPy arrays is filtering based on certain conditions. This tutorial will cover how to filter an array using an array of booleans in NumPy, from the basics to more advanced topics.

Prerequisites

To implement the examples provided in this tutorial, you should have the following:

  • Python installed on your system
  • NumPy installed. You can install it using pip install numpy

Basic Array Filtering by an Array of Booleans

First, let’s start by creating a simple NumPy array:

import numpy as np

# Sample array
x = np.array([1, 2, 3, 4, 5])

You can create a boolean array where each element specifies whether or not a condition is met.

# Create boolean array
bools = x > 2
print(bools)
# Output: [False False  True  True  True]

To filter the array, you can use this boolean array as an index.

# Filter the array
filtered_x = x[bools]
print(filtered_x)
# Output: [3 4 5]

This technique is commonly referred to as boolean indexing or boolean array indexing.
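The boolean array does not have to be computed from the array being filtered; it only needs one entry per element. The following is a minimal sketch of that case, where the names scores and passed and their values are illustrative assumptions rather than part of the examples above:

```python
import numpy as np

# Illustrative data: names and values are assumptions for this sketch
scores = np.array([55, 82, 91, 47, 68])
passed = np.array([False, True, True, False, True])  # one flag per element

# The mask must have the same length as the array it indexes
selected = scores[passed]
print(selected)  # [82 91 68]

# Boolean indexing returns a copy, so modifying the result
# leaves the original array unchanged
selected[0] = -1
print(scores)  # [55 82 91 47 68]
```

Because the result is a copy, it is safe to post-process the filtered values without affecting the source array.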

Filtering with Conditions

Filtering directly using a condition can simplify the process:

# Directly using condition
filtered_x = x[x > 2]
print(filtered_x)
# Output: [3 4 5]

This is equivalent to the previous example but done in one step. Furthermore, you can combine conditions using the element-wise operators & (and), | (or), and ~ (not). Python's own and, or, and not keywords do not work element-wise on arrays.

# Combined conditions
filtered_x = x[(x > 2) & (x < 5)]
print(filtered_x)
# Output: [3 4]

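Note that each condition must be enclosed in parentheses. The & and | operators bind more tightly than comparison operators such as > and <, so without parentheses x > 2 & x < 5 is parsed as x > (2 & x) < 5 rather than as two separate conditions.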
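The | and ~ operators follow the same pattern as &. As a small sketch using the same sample array (the variable names outer and not_greater_than_2 are chosen here for illustration):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# | (or): keep elements that are below 2 or above 4
outer = x[(x < 2) | (x > 4)]
print(outer)  # [1 5]

# ~ (not): invert a mask to keep everything the condition rejects
not_greater_than_2 = x[~(x > 2)]
print(not_greater_than_2)  # [1 2]
```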

Advanced Filtering

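Let’s explore more advanced techniques using NumPy functions such as np.where and np.select. Unlike boolean indexing, which drops the elements that fail the condition, these functions return an array with the same shape as the input and choose a value for every position.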

Using np.where

The np.where function returns elements chosen from two arrays or values depending on the condition.

# Using np.where to select elements
result = np.where(x > 3, x, x * 10)
print(result)
# Output: [10 20 30  4  5]

Where the condition is True, np.where takes the element from x; where it is False, it takes the corresponding element from x * 10. Both x and x * 10 are computed in full before the selection is made.
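np.where can also be called with only a condition. In that form it returns a tuple of index arrays, one per dimension, marking where the condition is True. A brief sketch on the same sample array:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# With only a condition, np.where returns a tuple of index arrays
# (a single array here, because x is one-dimensional)
(indices,) = np.where(x > 3)
print(indices)     # [3 4]

# Indexing with those positions gives the same result as x[x > 3]
print(x[indices])  # [4 5]
```

This form is useful when you need the positions of the matching elements and not only their values.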

Using np.select

np.select lets you apply multiple conditions and choices for each condition:

# Using np.select for multiple conditions
conditions = [x < 2, x > 4]
choices = [x**2, x * 10]
result = np.select(conditions, choices)
print(result)
# Output: [ 1  0  0  0 50]

np.select squares the elements less than 2 and multiplies the elements greater than 4 by 10. Where no condition is True, it returns the default value, which is 0 unless you pass the default argument.
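The fallback value can be changed with the default argument of np.select. The sketch below reuses the same conditions and choices; the value -1 is an arbitrary marker chosen for illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

conditions = [x < 2, x > 4]
choices = [x**2, x * 10]

# default=-1 marks the positions where no condition is True
result = np.select(conditions, choices, default=-1)
print(result)  # [ 1 -1 -1 -1 50]
```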

Filtering with np.vectorize

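While boolean indexing is efficient for straightforward conditions, sometimes the logic is easier to express as an ordinary Python function. np.vectorize wraps such a function so that it can be called on an array and applied to each element. It is provided mainly for convenience rather than performance, since the function is still called once per element.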

def custom_filter(value):
    return value if value % 2 == 0 else 0

vect_filter = np.vectorize(custom_filter)
filtered_x = vect_filter(x)
print(filtered_x)
# Output: [0 2 0 4 0]

The wrapped function replaces odd numbers with zeros. The result has the same length as x; the odd elements are masked rather than removed.
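For a condition this simple, the same result can be produced without np.vectorize. The following sketch shows one possible alternative using a boolean mask (the name is_even is introduced here for illustration), along with how to actually drop the odd values:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

is_even = x % 2 == 0

# Same result as the np.vectorize example, computed on the whole array at once
masked = np.where(is_even, x, 0)
print(masked)  # [0 2 0 4 0]

# To drop the odd numbers instead of zeroing them, index with the mask
evens = x[is_even]
print(evens)  # [2 4]
```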

Filtering Two-Dimensional Arrays

Filtering becomes a bit more intricate with 2D arrays. Let’s look at a simple example:

matrix = np.array([[1, 2], [3, 4], [5, 6]])

# Boolean indexing with a 2D mask returns a flat 1D array
flat_filtered = matrix[matrix % 2 == 0]
print(flat_filtered)
# Output: [2 4 6]

Note that the filter flattens the 2D array. To maintain the original shape, we can use np.where:

# Filter and keep 2D shape with np.where
filtered_matrix = np.where(matrix % 2 == 0, matrix, 0)
print(filtered_matrix)
# Output: [[0 2]
#  [0 4]
#  [0 6]]

Using np.where, the structure of the matrix is preserved, and only the values that do not meet the condition are altered.
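Another way to keep the shape is to assign through a boolean mask. This is a sketch of that approach; it works on a copy (named zeroed here for illustration) so that the original matrix is left untouched:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4], [5, 6]])

# Work on a copy so the original matrix is not modified
zeroed = matrix.copy()

# Assigning through the mask overwrites only the odd values;
# the shape of the array does not change
zeroed[zeroed % 2 != 0] = 0
print(zeroed)
```

Unlike np.where, this modifies an existing array in place, which can be convenient when only a few values need to change.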

Fancy Indexing with Boolean Arrays

Fancy indexing is another advanced feature of NumPy. It uses arrays of indices to select elements. A boolean array can be used in the same way: a 1D boolean array with one entry per row selects whole rows of a 2D array.

row_condition = np.array([True, False, True])
filtered_matrix = matrix[row_condition]
print(filtered_matrix)
# Output: [[1 2]
#  [5 6]]

This selects only the rows where row_condition is True, here the first and third rows.
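In practice the row mask is often computed from the data instead of being written by hand. The sketch below assumes a rule of "keep rows whose values are all greater than 2", which is chosen purely for illustration, and also shows a mask applied along the second axis:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4], [5, 6]])

# Build the row mask from the data: True for rows whose values all exceed 2
row_mask = (matrix > 2).all(axis=1)
print(row_mask)          # [False  True  True]
print(matrix[row_mask])  # keeps the rows [3 4] and [5 6]

# A mask placed in the second index position selects columns instead
col_mask = np.array([False, True])
second_column = matrix[:, col_mask]
print(second_column.shape)  # (3, 1)
```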

Conclusion

In this tutorial, we thoroughly explored various ways to filter a NumPy array using boolean arrays. We learned the basic boolean indexing and moved on to advanced examples using np.where, np.select, and np.vectorize. Remember that mastering these techniques can make array manipulation more efficient and expressive. Happy coding!