Working with structured arrays in NumPy (with examples)

Updated: February 19, 2024 By: Guest Contributor

Overview

NumPy, a cornerstone library for numerical computing in Python, lets scientists, engineers, data analysts, and hobbyists run fast, vectorized mathematical operations on large arrays. Among its capabilities is support for structured arrays, which store heterogeneous records (for example, a name, an integer, and a float per row) in a single typed array. This tutorial explores structured arrays in NumPy through seven examples, from basic creation to conversion into pandas DataFrames.

Creating a Structured Array

Structured arrays are ndarrays whose data type is a compound (structured) dtype: each element is a record made up of named fields, and each field has its own scalar type. In the dtype below, 'U10' is a Unicode string of up to 10 characters, 'i4' a 32-bit integer, and 'f4' a 32-bit float. Here is how to create a basic structured array:

import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = [('Alice', 29, 55.0), ('Bob', 45, 85.5), ('Cathy', 37, 68.0)]

structured_array = np.array(data, dtype=dtype)
print(structured_array)

Output:

[('Alice', 29, 55.) ('Bob', 45, 85.5) ('Cathy', 37, 68.)]
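A structured dtype can be inspected like any other dtype. The sketch below, which needs only NumPy, checks that the list-of-tuples form used above matches an equivalent dict form, prints the field names, the per-record size in bytes, and one field's byte offset, and shows that a string longer than the declared width is silently truncated. The variable names and the sample name 'Maximilianus' are illustrative, not part of the original example.

```python
import numpy as np

# Same record layout as above, in the list-of-tuples form
dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Equivalent dict form: parallel lists of field names and formats
dtype_from_dict = np.dtype({'names': ['name', 'age', 'weight'],
                            'formats': ['U10', 'i4', 'f4']})
print(dtype == dtype_from_dict)  # True

print(dtype.names)     # ('name', 'age', 'weight')
print(dtype.itemsize)  # 48: 10 chars * 4 bytes + 4 + 4, no padding by default

# dtype.fields maps each name to (field dtype, byte offset)
print(dtype.fields['age'])  # (dtype('int32'), 40)

# Strings longer than the declared width are silently truncated
truncated = np.array([('Maximilianus', 50, 70.0)], dtype=dtype)
print(truncated['name'][0])  # Maximilian
```

Because NumPy packs fields without padding by default, the offsets follow directly from the field sizes; passing align=True to np.dtype would instead insert C-style padding where the field types require it.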

Accessing Elements

Indexing a structured array works a little differently from indexing an ordinary ndarray. A field name such as 'age' selects that field across every record, while an integer index selects one whole record. Here's how you can access specific fields and records:

print(structured_array['age'])  # Access all ages
print(structured_array[0])     # Access the first record
print(structured_array[-1]['name'])  # Access the name of the last record

Output:

[29 45 37]
('Alice', 29, 55.)
Cathy
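A few more access patterns are worth knowing. This minimal sketch rebuilds the array from the first example and shows that field access returns a view of the same memory, that integer indexing yields a numpy.void record, and that a list of field names selects several fields at once (returned as a view in NumPy 1.16 and later). The names ages, record, and subset are illustrative.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = [('Alice', 29, 55.0), ('Bob', 45, 85.5), ('Cathy', 37, 68.0)]
structured_array = np.array(data, dtype=dtype)

# Field access returns a view that shares memory with the original array
ages = structured_array['age']
print(np.shares_memory(ages, structured_array))  # True

# Integer indexing returns a single record as a numpy.void scalar
record = structured_array[0]
print(type(record))                      # <class 'numpy.void'>
print(record['name'], record['weight'])  # Alice 55.0

# A list of field names selects several fields at once
subset = structured_array[['name', 'age']]
print(subset.dtype.names)  # ('name', 'age')
```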

Modifying Structured Arrays

You can modify a structured array one value at a time or one field at a time. Indexing a field and then a position changes a single value, and an in-place operation on a field updates that field in every record:

structured_array['age'][0] = 30
print(structured_array[0])

structured_array['weight'] += 5
print(structured_array['weight'])

Output:

('Alice', 30, 55.)
[60.  90.5 73. ]
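Whole records can be replaced too, and NumPy's view-versus-copy rules decide whether a write reaches the original array. The sketch below, rebuilt from the first example's values, assigns a tuple to a full record, writes through a record obtained by integer indexing, and shows a write through a fancy-indexed copy being lost. The variable name cathy is illustrative.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = [('Alice', 29, 55.0), ('Bob', 45, 85.5), ('Cathy', 37, 68.0)]
structured_array = np.array(data, dtype=dtype)

# Replace a whole record with a tuple; values go to fields by position
structured_array[1] = ('Bob', 46, 86.0)

# A record from integer indexing acts as a view, so writing to it changes the array
cathy = structured_array[2]
cathy['age'] = 38

# Fancy indexing (a list of positions) returns a copy, so this write is lost
structured_array[[0, 2]]['age'] = 99

print(structured_array['age'])  # [29 46 38]
```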

Structured Array Operations

Arithmetic operators and most ufuncs are not defined for a structured array as a whole, because each element is a record rather than a number. Each field, however, is an ordinary numeric ndarray, so you can compute on fields individually. Here is how to calculate the average weight (after the +5 update above):

average_weight = np.mean(structured_array['weight'])
print(average_weight)

Output:

74.5
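The sketch below shows where the line falls between whole-array and field-wise operations: adding a number to the structured array raises a TypeError, while sorting with np.sort's order argument, masked reductions on a single field, and numpy.lib.recfunctions.structured_to_unstructured (which packs selected numeric fields into an ordinary 2-D array) all work. It rebuilds the array with the original weights, so its numbers differ from the output above; by_age and numeric are illustrative names.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = [('Alice', 29, 55.0), ('Bob', 45, 85.5), ('Cathy', 37, 68.0)]
structured_array = np.array(data, dtype=dtype)

# Arithmetic is not defined for the record dtype as a whole
try:
    structured_array + 1
except TypeError as exc:
    print('whole-array arithmetic fails:', type(exc).__name__)

# Sorting can use a field as the key
by_age = np.sort(structured_array, order='age')
print(by_age['name'])  # ['Alice' 'Cathy' 'Bob']

# Reductions and masks work on any single field
print(structured_array['weight'][structured_array['age'] > 30].mean())  # 76.75

# Numeric fields can be packed into a plain 2-D array for vectorized math
numeric = rfn.structured_to_unstructured(structured_array[['age', 'weight']])
print(numeric.shape)  # (3, 2)
```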

Indexing and Slicing

Structured arrays support the same integer indexing, slicing, and boolean masking as ordinary ndarrays. A comparison on one field produces a boolean mask that selects whole records, which makes filtering straightforward:

print(structured_array[structured_array['age'] > 35])

Output:

[('Bob', 45, 90.5) ('Cathy', 37, 73.)]
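Slicing and compound conditions follow ordinary NumPy rules. This sketch rebuilds the array with the weights from after the +5 update, so its mask results line up with the output above. It combines conditions on two fields with &, and shows that slices are views while boolean-mask results are copies. The names first_two, mask, and filtered are illustrative.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = [('Alice', 29, 60.0), ('Bob', 45, 90.5), ('Cathy', 37, 73.0)]
structured_array = np.array(data, dtype=dtype)

# Ordinary slices work on records and return views
first_two = structured_array[:2]
print(first_two['name'])  # ['Alice' 'Bob']

# Combine conditions on different fields with &, parenthesizing each comparison
mask = (structured_array['age'] > 30) & (structured_array['weight'] < 80)
print(structured_array[mask]['name'])  # ['Cathy']

# Boolean indexing returns a copy, so changing it leaves the original intact
filtered = structured_array[mask]
filtered['age'] = 0
print(structured_array['age'])  # [29 45 37]

# np.nonzero gives the positions of matching records
print(np.nonzero(structured_array['age'] > 35)[0])  # [1 2]
```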

Multidimensional Structured Arrays

Structured arrays are not limited to one dimension, and a field can itself hold a fixed-size subarray. Here's how to create a 3-by-3 structured array in which the score field stores two float32 values per student:

dtype = [('name', 'U10'), ('score', 'f4', (2,))]
students = np.zeros((3,3), dtype=dtype)
students['name'] = [['Alice', 'Bob', 'Charlie'], ['David', 'Eve', 'Frank'], ['Grace', 'Helen', 'Ivan']]
students['score'] = np.random.random((3,3,2))
print(students)

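Assigning to a field name fills that field for every element at once: students['name'] takes a 3-by-3 nested list, and students['score'] takes an array of shape (3, 3, 2), because the subarray field adds a trailing dimension of length 2. The printed scores differ from run to run because they come from np.random.random.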
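To access elements of the two-dimensional array, index the grid position first and the field name second, or the other way round. The sketch below rebuilds the students array with a seeded np.random.default_rng generator (an assumption made so results are repeatable; the original uses np.random.random) and shows the extra trailing dimension contributed by the score subarray field. The names rng and mean_scores are illustrative.

```python
import numpy as np

dtype = [('name', 'U10'), ('score', 'f4', (2,))]
students = np.zeros((3, 3), dtype=dtype)
students['name'] = [['Alice', 'Bob', 'Charlie'],
                    ['David', 'Eve', 'Frank'],
                    ['Grace', 'Helen', 'Ivan']]
rng = np.random.default_rng(seed=0)  # seeded so runs are repeatable
students['score'] = rng.random((3, 3, 2))

# Index with (row, column) first, then by field name
print(students[1, 2]['name'])  # Frank

# The subarray field adds its own trailing dimension
print(students['score'].shape)        # (3, 3, 2)
print(students[0, 0]['score'].shape)  # (2,)

# Field first, then a slice: the first column of names
print(students['name'][:, 0])  # ['Alice' 'David' 'Grace']

# Reduce over the trailing axis: mean of each student's two scores
mean_scores = students['score'].mean(axis=-1)
print(mean_scores.shape)  # (3, 3)
```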

Using pandas with Structured Arrays

If you prefer the pandas DataFrame interface, a one-dimensional structured array with scalar fields converts directly: pass it to pd.DataFrame and each field becomes a column:

import pandas as pd

structured_array_df = pd.DataFrame(structured_array)
print(structured_array_df)

This conversion gives you pandas' extensive data manipulation and analysis tools, such as grouping, joining, and file I/O, while DataFrame.to_records converts a DataFrame back into a NumPy record array when you need NumPy's compact representation again.
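The conversion also runs in the other direction. A minimal sketch, assuming a recent pandas version: filter rows in the DataFrame, then call DataFrame.to_records with index=False to drop the index column and column_dtypes={'name': 'U10'} to turn the string column, which pandas holds as object or string data, back into a fixed-width NumPy string field. The names df, older, and back are illustrative.

```python
import numpy as np
import pandas as pd

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = [('Alice', 29, 55.0), ('Bob', 45, 85.5), ('Cathy', 37, 68.0)]
structured_array = np.array(data, dtype=dtype)

# Each field becomes a column
df = pd.DataFrame(structured_array)
print(list(df.columns))  # ['name', 'age', 'weight']

# Work in pandas, e.g. filter rows
older = df[df['age'] > 35]

# Back to NumPy: index=False drops the index column, and column_dtypes
# restores a fixed-width string type for the name field
back = older.to_records(index=False, column_dtypes={'name': 'U10'})
print(back.dtype.names)  # ('name', 'age', 'weight')
print(back['name'])      # ['Bob' 'Cathy']
```

Without column_dtypes, the name field typically comes back with object dtype, which stores references to Python objects rather than fixed-width text.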

Conclusion

Structured arrays in NumPy provide an efficient way to work with heterogeneous, tabular data: every record has a fixed layout of named, typed fields, so you keep NumPy's compact memory representation and fast field-wise operations. The examples above cover creating, accessing, modifying, filtering, and converting structured arrays, giving a foundation for using them in data analysis and engineering projects.