NumPy: Checking if an array contains another subarray (3 examples)

Updated: March 1, 2024 By: Guest Contributor Post a comment

Introduction

Numerical computing is a core aspect of data science and machine learning, with Python’s NumPy library standing at the forefront. This article delves into how to ascertain the presence of a subarray within a larger array using NumPy, accompanied by practical code illustrations.

In data analysis and manipulation, being able to determine whether a larger dataset contains a specific subset of data is invaluable. NumPy, a cornerstone library for numerical computations in Python, offers efficient ways to perform this check, which we will explore through incremental examples.

Example 1: Basic Comparison

Starting with the simplest case, let’s see how to check if an array includes a certain subarray using the np.array_equal function when both arrays are exactly identical.

import numpy as np

# Define the main array and a subarray
dataset = np.array([1, 2, 3, 4, 5])
target_subarray = np.array([3, 4])

# Function to check for subarray
def contains_subarray(dataset, subarray):
    for i in range(len(dataset) - len(subarray) + 1):
        if np.array_equal(dataset[i:i+len(subarray)], subarray):
            return True
    return False

# Check if dataset contains the subarray
result = contains_subarray(dataset, target_subarray)
print(f"Array contains subarray: {result}")

Output:

Array contains subarray: True

This code snippet checks each segment of the dataset of the same length as the subarray for an exact match. It returns True if the subarray is found, indicating the presence of the subarray within the larger array.

Example 2: Using Strides for Efficient Searching

Next, we’ll leverage NumPy’s powerful handling of array strides to search for the subarray in a more efficient manner. This method is particularly useful for large datasets.

import numpy as np

from numpy.lib.stride_tricks import sliding_window_view

# Define the main array and a subarray
dataset = np.array([1, 2, 3, 4, 5])
target_subarray = np.array([3, 4])

# Repeat the dataset and target subarray definition

# Use sliding_window_view to create all possible windows of the same size as the subarray
windows = sliding_window_view(dataset, window_shape=len(target_subarray))

# Check if any window is equal to the subarray
result = np.any(np.all(windows == target_subarray, axis=1))
print(f"Subarray found: {result}")

Output:

Subarray found: True

This approach creates a sliding window over the dataset, each of the same size as the subarray, and checks all of them for a match with the subarray simultaneously. It’s much faster for larger arrays, making efficient use of NumPy’s capabilities.

Example 3: Advanced Matching with Broadcasting and Convolution

For even more advanced scenarios, where you might be looking for a pattern rather than an exact subarray, you can use broadcasting and convolution operations to find subarrays that match a given pattern within a tolerance level.

from scipy.signal import convolve
import numpy as np

# Define the main array
dataset = np.array([1, 2, 3, 4, 5])

# Assume the same dataset and define a pattern with a tolerance
pattern = np.array([3, 0])  # 0 as a placeholder for tolerance
tolerance = 1

# Function that uses convolution to find matches


def find_pattern(dataset, pattern, tolerance):
    conv_result = convolve(dataset, pattern[::-1], mode='valid')
    possible_matches = np.where((conv_result >= pattern.sum() - (len(pattern) * tolerance)) &
                                (conv_result <= pattern.sum() + (len(pattern) * tolerance)))[0]
    return len(possible_matches) > 0


result = find_pattern(dataset, pattern, tolerance)
print(f"Pattern matches found: {result}")

Output:

Pattern matches found: True

This example demonstrates a sophisticated method using convolution to identify regions in the dataset that approximately match the pattern, considering the specified tolerance. This is particularly useful for fuzzy matches in data analysis and signal processing.

Conclusion

Through these examples, we’ve explored various efficient methods for checking the presence of a subarray within a larger dataset using NumPy. From basic comparisons to advanced pattern matching, NumPy offers powerful tools for array manipulation and analysis, enabling sophisticated data processing pipelines. Whether working with small datasets or large-scale data, mastering these techniques is crucial for efficient data analysis in Python.