How to Use Boolean Indexing in NumPy

Updated: January 22, 2024 By: Guest Contributor

Introduction

NumPy is a popular Python library used for numerical computing. It introduces powerful capabilities like arrays and matrices, along with a suite of functions that can operate efficiently on these data structures. One of NumPy’s handy features is ‘Boolean indexing’ – a form of indexing that allows for filtering complex datasets in a concise way. In this tutorial, we’ll delve into the basics of Boolean indexing and explore various examples, escalating from simple to more complex applications.

What is Boolean Indexing?

Boolean indexing is a method where an array or a matrix is indexed by another array of Boolean values (True or False), often called a mask. In practice, it is primarily used for filtering data according to some condition, extracting subsets of an array, or modifying parts of an array based on some logical criteria. When a NumPy array is indexed with a Boolean array of matching shape, the result contains only the elements at positions where the Boolean array is True.
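As a minimal sketch of that definition, the snippet below indexes a small array with a hand-written Boolean mask. The names `values`, `mask`, and `selected` are illustrative and are not used elsewhere in this tutorial.

```python
import numpy as np

values = np.array([10, 20, 30, 40])
# A hand-written mask with one Boolean per element of values
mask = np.array([True, False, True, False])

selected = values[mask]
print(selected)    # Outputs: [10 30]

# Boolean indexing returns a copy, so changing the result leaves values untouched
selected[0] = -1
print(values)      # Outputs: [10 20 30 40]

# True counts as 1, so summing a mask counts the selected elements
print(mask.sum())  # Outputs: 2
```

In everyday code the mask is rarely typed by hand; it usually comes from a comparison such as `values > 15`.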

Basic Boolean Indexing

Let us start with a simple case. Suppose we have an array, and we want to keep only the elements that are greater than 5:

import numpy as np

# Create a NumPy array
arr = np.array([1, 5, 7, 8, 3, 2])
# Create a Boolean array that tells us which elements of arr are greater than 5
condition = arr > 5
print(condition)  # Outputs: [False False  True  True False False]

# Use Boolean indexing to create a filtered array
filtered_arr = arr[condition]
print(filtered_arr)  # Outputs: [7 8]

In the above code, we first created a condition that tests which elements of ‘arr’ are greater than 5, resulting in a Boolean array. Then, we used this Boolean array to index into ‘arr’ and obtained a new array containing only the numbers 7 and 8.

Combining Conditions

You can combine multiple conditions using logical operators to create more complex filters. Let’s say we want to extract all even numbers that are also greater than 5:

import numpy as np

# Continue using the array from the previous example
arr = np.array([1, 5, 7, 8, 3, 2])
# Combine two conditions using a logical AND (&)
condition = (arr > 5) & (arr % 2 == 0)
filtered_arr = arr[condition]
print(filtered_arr)  # Outputs: [8]

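Be sure to put parentheses around each condition. They are required, not just stylistic: ‘&’ binds more tightly than comparison operators such as ‘>’ and ‘==’, so leaving them out changes the meaning of the expression and typically raises an error. The element-wise AND operator ‘&’ is used here so that both conditions must be met; Python’s ‘and’ keyword does not work on arrays.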
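The other element-wise logical operators follow the same pattern. The sketch below uses `|` (OR) and `~` (NOT) on the same sample array, plus `np.logical_and` as a function form of `&`. The variable names `either`, `not_big`, and `both` are illustrative.

```python
import numpy as np

arr = np.array([1, 5, 7, 8, 3, 2])

# OR (|): keep elements that are less than 3 or greater than 7
either = arr[(arr < 3) | (arr > 7)]
print(either)   # Outputs: [1 8 2]

# NOT (~): invert a mask to keep elements that are not greater than 5
not_big = arr[~(arr > 5)]
print(not_big)  # Outputs: [1 5 3 2]

# np.logical_and is the function form of &, so no extra parentheses are needed
both = arr[np.logical_and(arr > 5, arr % 2 == 0)]
print(both)     # Outputs: [8]
```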

Modifying Elements with Boolean Indexing

Boolean indexing isn’t just for extraction; it’s also a powerful tool for modifying subsets of an array in place. For example, suppose you want to cap all elements in the array at a maximum value of 5:

import numpy as np

# Continue using the array from the previous examples
arr = np.array([1, 5, 7, 8, 3, 2])
# Using Boolean indexing to cap values
arr[arr > 5] = 5
print(arr)  # Outputs: [1 5 5 5 3 2]

This operation modifies the original array ‘arr’ by setting any value greater than 5 to 5.
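If the original values still need to be available afterwards, one possible alternative is to build a new array instead of assigning in place. The sketch below uses `np.where` and `np.minimum` for that purpose; the names `capped` and `capped_alt` are illustrative.

```python
import numpy as np

arr = np.array([1, 5, 7, 8, 3, 2])

# np.where(condition, x, y) takes x where the condition is True and y elsewhere
capped = np.where(arr > 5, 5, arr)
print(capped)      # Outputs: [1 5 5 5 3 2]

# The original array is left unchanged
print(arr)         # Outputs: [1 5 7 8 3 2]

# For a simple upper bound, np.minimum gives the same result
capped_alt = np.minimum(arr, 5)
print(capped_alt)  # Outputs: [1 5 5 5 3 2]
```

Which form to use is a design choice: in-place assignment avoids allocating a second array, while `np.where` keeps the source data intact.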

Advanced Usage: Indexing with a Condition on a Different Array

It is also possible to index an array based on conditions applied to a different array, which can be quite useful for aligned datasets. Imagine you have one array representing the ages of individuals and another array of their income, and you would like to focus solely on the income of individuals who are at least 35 years old:

import numpy as np

# Simulate some data
ages = np.array([25, 35, 45, 30, 40])
income = np.array([50000, 60000, 75000, 40000, 65000])

# Boolean index with respect to another array
filtered_income = income[ages >= 35]
print(filtered_income)  # Outputs: [60000 75000 65000]

Here we filtered the ‘income’ array by creating a Boolean condition on the ‘ages’ array.
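Because a mask is just an array, it can be stored once and reused across every aligned array, and it can mix conditions from several of them. The following sketch assumes the same sample data; the income threshold of 60000 and the name `mask` are made up for illustration.

```python
import numpy as np

ages = np.array([25, 35, 45, 30, 40])
income = np.array([50000, 60000, 75000, 40000, 65000])

# Build one mask from conditions on both arrays
mask = (ages >= 35) & (income > 60000)

# Apply the same mask to each aligned array
print(ages[mask])           # Outputs: [45 40]
print(income[mask])         # Outputs: [75000 65000]

# Aggregate over the selected subset
print(income[mask].mean())  # Outputs: 70000.0
```

This only works when the arrays have the same length and the same ordering, so that position i refers to the same individual in each of them.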

Using Boolean Indexing with Multidimensional Arrays

Boolean indexing also extends to multidimensional arrays. Suppose you have a 2D array (matrix) representing some data, and you need to select rows based on a condition:

import numpy as np

# Create a 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Condition that keeps rows where the first element is greater than 3
condition = matrix[:,0] > 3

# Apply Boolean indexing
filtered_matrix = matrix[condition]
print(filtered_matrix)
# Output:
# [[4 5 6]
#  [7 8 9]]

This example extracts rows from ‘matrix’ where the first column’s element is greater than 3. Note the comma syntax in ‘matrix[:,0]’, which is used to select the entire first column.
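Two related variations are sketched below, assuming the same 3x3 matrix: a mask with the full shape of the matrix, which selects individual elements and returns a flattened 1-D array, and a mask applied to the second axis, which selects columns instead of rows. The name `col_mask` is illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A mask with the same shape as the matrix picks individual elements;
# the result is always one-dimensional
print(matrix[matrix > 5])  # Outputs: [6 7 8 9]

# A 1-D mask on the second axis selects columns:
# keep the columns whose first-row value is odd
col_mask = matrix[0] % 2 == 1
print(matrix[:, col_mask])
# Output:
# [[1 3]
#  [4 6]
#  [7 9]]
```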

Conclusion

Boolean indexing is a powerful technique in NumPy for filtering and modifying data according to specific, potentially complex, conditions. By mastering it, you can replace explicit loops with short, vectorized expressions and write more efficient and readable data processing code for your scientific and mathematical computations.