NumPy – Using random Generator.shuffle() method (4 examples)

Updated: March 1, 2024 By: Guest Contributor

Overview

In NumPy, the random.Generator.shuffle() method randomly rearranges the elements of an array along one axis. Unlike Generator.permutation(), which returns a shuffled copy, shuffle() modifies the array in place. Avoiding the copy helps keep memory usage down, especially with large arrays.
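To make the difference concrete, here is a minimal sketch that contrasts the two methods on the same data. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng()
original = np.arange(10)

# permutation() leaves its input untouched and returns a shuffled copy.
shuffled_copy = rng.permutation(original)
print(original)       # still 0..9 in order
print(shuffled_copy)  # same values, randomly ordered

# shuffle() reorders the array it is given and returns None.
in_place = original.copy()
result = rng.shuffle(in_place)
print(result)    # None
print(in_place)  # same values, randomly ordered
```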

Syntax:

generator.shuffle(x, axis=0)

Parameters:

  • x: ndarray or MutableSequence. The array, list, or other mutable sequence to be shuffled.
  • axis: int, optional. The axis along which x is shuffled. The default is 0. Other axes are supported only when x is a NumPy array.
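As a quick illustration of the axis parameter, the sketch below shuffles the columns of a small 2D array instead of its rows. The array shape is chosen arbitrarily for the example.

```python
import numpy as np

rng = np.random.default_rng()
grid = np.arange(12).reshape(3, 4)

# axis=1 reorders the columns; every row receives the same column order,
# so each column still holds the values it started with.
rng.shuffle(grid, axis=1)
print(grid)
```

Each column moves as a unit, so values that started in the same column stay together.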

Returns:

  • This method does not return a value; it shuffles the array x in-place.
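Because nothing is returned, assigning the result of shuffle() back to a variable is a common slip. The short sketch below shows this with a plain Python list, which is also accepted as a mutable sequence. The list contents are made up for the example.

```python
import numpy as np

rng = np.random.default_rng()

# A plain Python list is a mutable sequence, so it can be shuffled too.
names = ["ann", "bob", "cho", "dee"]
returned = rng.shuffle(names)

print(returned)  # None: the result lives in `names`, not in the return value
print(names)
```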

Example 1: Basic Usage

First, we’ll see how to shuffle a simple one-dimensional array.

import numpy as np

rng = np.random.default_rng()
arr = np.array([1, 2, 3, 4, 5])
rng.shuffle(arr)
print(arr)

Output will vary, e.g., [5 2 4 1 3]. The order of the shuffled elements will differ every time you run the code.

Example 2: Shuffling Multidimensional Arrays

Next, let’s shuffle a 2D array. By default, shuffle() reorders the array only along the first axis, so whole rows are moved.

import numpy as np

rng = np.random.default_rng()
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
rng.shuffle(arr)
print(arr)

Output might look like:

[[5 6]
 [1 2]
 [7 8]
 [3 4]]

This demonstrates how the rows are shuffled but elements within rows remain in their original order.
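One way to convince yourself of this is to compare the set of rows before and after shuffling. The check below is a small sketch, not part of the NumPy API.

```python
import numpy as np

rng = np.random.default_rng()
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
before = arr.copy()

rng.shuffle(arr)

# Every shuffled row should still appear, unchanged, in the original array.
rows_before = {tuple(row) for row in before.tolist()}
rows_after = {tuple(row) for row in arr.tolist()}
print(rows_before == rows_after)  # True
```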

Example 3: Shuffling with Seeds

Using seeds can ensure reproducibility in experiments. When the generator is created with a fixed seed, the same sequence of shuffle() calls produces the same orders on every run.

import numpy as np

seed = 42
rng = np.random.default_rng(seed)
arr = np.array([1, 2, 3, 4, 5])
rng.shuffle(arr)
print(arr)

Creating the generator with seed 42 and running this block should produce the same shuffled array on every run with the same NumPy version, for example:

[5 3 4 2 1]
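Rather than relying on a specific printed order, reproducibility can be checked directly by shuffling two copies of the same data with two generators built from the same seed. This is a minimal sketch of that check.

```python
import numpy as np

first = np.array([1, 2, 3, 4, 5])
second = first.copy()

# Two generators created from the same seed draw the same random stream,
# so they shuffle identical inputs into identical orders.
np.random.default_rng(42).shuffle(first)
np.random.default_rng(42).shuffle(second)

print(np.array_equal(first, second))  # True
```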

Example 4: Shuffling in Applications

Finally, let’s explore a real-world application: shuffling a dataset before splitting it into training and test sets.

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng()
data = np.arange(100).reshape((50, 2))
rng.shuffle(data)
train, test = train_test_split(data, test_size=0.2)
print(f'Training Shape: {train.shape}, Test Shape: {test.shape}')

Output:

Training Shape: (40, 2), Test Shape: (10, 2)

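This shuffles the dataset and then splits it. Note that train_test_split() already shuffles by default (shuffle=True), so the explicit rng.shuffle() call mainly matters when you split by slicing or pass shuffle=False. Shuffling before a split keeps any ordering in the original data, such as rows sorted by class or date, from skewing the training and test sets.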
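When features and labels live in separate arrays, shuffling each one independently would break the pairing between them. A common workaround, sketched here with a made-up toy dataset and an 80/20 split done by slicing, is to shuffle an index array once and apply it to both.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: 50 samples, 2 features, one label per sample.
features = np.arange(100).reshape(50, 2)
labels = np.arange(50)

# Shuffle an index array once, then apply it to both arrays so that
# each row of `features` stays paired with its label.
indices = np.arange(len(features))
rng.shuffle(indices)
features, labels = features[indices], labels[indices]

split = int(0.8 * len(features))  # 80/20 split
train_X, test_X = features[:split], features[split:]
train_y, test_y = labels[:split], labels[split:]
print(train_X.shape, test_X.shape)  # (40, 2) (10, 2)
```

Indexing with the shuffled indices creates copies, so this variant trades the in-place memory benefit for keeping the two arrays aligned.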

Conclusion

Today, we covered the fundamentals and some practical uses of the Generator.shuffle() method from NumPy’s random module. Starting from simple array shuffling and moving on to preparing datasets for machine learning, we saw how shuffle() reorders one-dimensional and multi-dimensional arrays in place and how a seeded generator makes the result reproducible. Used before a split, it helps keep the original ordering of a dataset from biasing model evaluation.