NumPy is a fundamental package for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. Aggregate functions reduce an array, or one axis of it, to summary values such as a total, a minimum, or an average, which makes them a core tool for efficient data analysis. In this tutorial, you will learn how to use NumPy’s aggregate functions sum, min, max, and mean to analyze numerical data.

Prerequisites

A basic understanding of the Python programming language.
An environment to run Python code (Jupyter Notebook, Google Colab, or any Python IDE).
The NumPy library installed. You can install it with pip: `pip install numpy`.

Setting Up NumPy

Let’s start by importing the NumPy library and creating a sample array:

import numpy as np
sample_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(sample_array)

Using Aggregate Functions

Next, you’ll learn how to apply aggregate functions to NumPy arrays.

`sum` Function

The sum function adds up the elements of an array, either all of them or along a given axis. Below are a few examples:

# Sum of all elements
print(np.sum(sample_array))
# Sum along axis 0 (down the rows): one total per column
print(np.sum(sample_array, axis=0))
# Sum along axis 1 (across the columns): one total per row
print(np.sum(sample_array, axis=1))
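
The axis argument names the dimension that is collapsed, which is easy to misread. The sketch below reuses the 3x3 array from above and prints each result; the variable names (total, column_totals, row_totals, row_totals_2d) are illustrative only and not part of NumPy.

```python
import numpy as np

sample_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# No axis: every element is added into a single scalar.
total = np.sum(sample_array)
print(total)  # 45

# axis=0 collapses the rows, leaving one total per column.
column_totals = np.sum(sample_array, axis=0)
print(column_totals)  # [12 15 18]

# axis=1 collapses the columns, leaving one total per row.
row_totals = np.sum(sample_array, axis=1)
print(row_totals)  # [ 6 15 24]

# keepdims=True keeps the collapsed axis with length 1,
# which is handy when the result must broadcast against the input.
row_totals_2d = np.sum(sample_array, axis=1, keepdims=True)
print(row_totals_2d.shape)  # (3, 1)
```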

`min` Function

The min function returns the smallest value in the array, or the smallest value along a given axis:

# Minimum of all elements
print(np.min(sample_array))
# Minimum along axis 0 (down the rows): one minimum per column
print(np.min(sample_array, axis=0))
# Minimum along axis 1 (across the columns): one minimum per row
print(np.min(sample_array, axis=1))

`max` Function

The max function returns the largest value in the array:

# Maximum of all elements
print(np.max(sample_array))
# Maximum along axis 0 (down the rows): one maximum per column
print(np.max(sample_array, axis=0))
# Maximum along axis 1 (across the columns): one maximum per row
print(np.max(sample_array, axis=1))
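
Because min and max accept the same axis argument, their results have matching shapes and can be combined directly. As a small illustration (the names column_range and row_range are chosen here only for the example), the spread of each column and row can be computed like this:

```python
import numpy as np

sample_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Spread of each column: column maximum minus column minimum.
column_range = np.max(sample_array, axis=0) - np.min(sample_array, axis=0)
print(column_range)  # [6 6 6]

# Spread of each row: row maximum minus row minimum.
row_range = np.max(sample_array, axis=1) - np.min(sample_array, axis=1)
print(row_range)  # [2 2 2]
```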

`mean` Function

The mean function calculates the arithmetic mean of elements:

# Mean of all elements
print(np.mean(sample_array))
# Mean along axis 0 (down the rows): one mean per column
print(np.mean(sample_array, axis=0))
# Mean along axis 1 (across the columns): one mean per row
print(np.mean(sample_array, axis=1))
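
One detail worth knowing is that mean returns floating-point values even when the input holds integers. The following sketch checks this and confirms that the mean along an axis equals the sum along that axis divided by the number of elements collapsed; the names column_means and manual_means are illustrative.

```python
import numpy as np

sample_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

column_means = np.mean(sample_array, axis=0)
print(column_means)        # [4. 5. 6.]
print(column_means.dtype)  # float64, although the input is an integer array

# The same result by hand: column totals divided by the number of rows.
manual_means = np.sum(sample_array, axis=0) / sample_array.shape[0]
print(np.array_equal(column_means, manual_means))  # True
```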

Additional Examples and Tips

This section covers a few more involved cases: aggregating over a filtered subset, working with arrays of more than two dimensions, and the amin and amax functions.

Conditional Aggregation

You can combine Boolean indexing with aggregation to compute statistics on subsets of your data:

# Sum of elements greater than 5
print(np.sum(sample_array[sample_array > 5]))
# Mean of even elements
print(np.mean(sample_array[sample_array % 2 == 0]))
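
To see what happens at each step, the sketch below splits the first example into its parts: the Boolean mask, the selected elements, and the aggregate. It also shows the where argument of np.sum as an alternative that skips the intermediate array; this assumes a reasonably recent NumPy release, since older versions do not accept where. The names mask and selected are illustrative.

```python
import numpy as np

sample_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Step 1: the comparison yields a Boolean array of the same shape.
mask = sample_array > 5
print(np.count_nonzero(mask))  # 4 elements satisfy the condition

# Step 2: indexing with the mask returns the matching elements as a 1D array.
selected = sample_array[mask]
print(selected)  # [6 7 8 9]

# Step 3: aggregate the selection.
print(np.sum(selected))  # 30

# Alternative: let np.sum apply the mask itself (assumes a recent NumPy).
print(np.sum(sample_array, where=mask))  # 30
```

Note that Boolean indexing always flattens the selection, so axis-wise aggregation is no longer possible afterwards; the where form keeps the original shape and can still be combined with axis.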

Dealing with Multidimensional Arrays

The same axis argument works on arrays with more than two dimensions. The axis you name is the one removed from the result:

# Creating a 3D array
array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# Sum along axis 0 (depth): adds the two 2D blocks element by element
print(np.sum(array_3d, axis=0))
# Mean along axis 2 (across the columns of each row): one mean per row of each block
print(np.mean(array_3d, axis=2))
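
A quick way to keep track of which axis disappears is to compare shapes before and after. The sketch below does that for the 3D array above and also passes a tuple of axes to collapse two dimensions in one call; the names sum_over_depth and per_block_total are illustrative.

```python
import numpy as np

array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(array_3d.shape)  # (2, 2, 2)

# Collapsing axis 0 adds the two 2x2 blocks together.
sum_over_depth = np.sum(array_3d, axis=0)
print(sum_over_depth.shape)     # (2, 2)
print(sum_over_depth.tolist())  # [[6, 8], [10, 12]]

# A tuple of axes collapses several dimensions at once:
# here each 2x2 block is reduced to a single total.
per_block_total = np.sum(array_3d, axis=(1, 2))
print(per_block_total)  # [10 26]
```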

Using `np.amin` and `np.amax`

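NumPy also provides the amin and amax functions for minimum and maximum. They are aliases of min and max, taking the same arguments and returning the same results, and appear mostly in older code. Like min and max, they are reductions along an axis, not element-wise operations (element-wise comparison of two arrays is done by np.minimum and np.maximum):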

# Minimum of each row
print(np.amin(sample_array, axis=1))
# Maximum of each column
print(np.amax(sample_array, axis=0))
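
The difference between reducing an array and comparing element by element is easy to blur, so here is a short sketch contrasting the two. np.min (and its alias np.amin) shrinks the array along an axis, while np.minimum compares two inputs position by position and keeps the original shape. The cap value 5 and the names row_minimums and capped are chosen only for illustration.

```python
import numpy as np

sample_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Reduction: one value per row, so the result is shorter than the input.
row_minimums = np.min(sample_array, axis=1)
print(row_minimums)  # [1 4 7]

# np.amin gives the same answer as np.min.
print(np.array_equal(np.amin(sample_array, axis=1), row_minimums))  # True

# Element-wise: each element is compared with 5, and the shape is unchanged.
capped = np.minimum(sample_array, 5)
print(capped.tolist())  # [[1, 2, 3], [4, 5, 5], [5, 5, 5]]
```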

Conclusion

In this tutorial, you’ve learned how to use NumPy’s aggregate functions to compute summary statistics of datasets represented as multidimensional arrays, both over a whole array and along a chosen axis. These tools are essential for data preprocessing, analysis, and efficient numerical computation. With an understanding of sum, min, max, and mean, you’re well-prepared to tackle a wide range of data analysis tasks.
