NumPy Random Seed: Explained with examples

Updated: January 23, 2024 By: Guest Contributor

Introduction

The NumPy library is an essential tool in the Python data science stack, providing support for arrays, matrices, and high-level mathematical functions. Randomness in computations is used for many tasks, including simulations, randomized algorithms, and random sampling of data. However, experiments and tests often need to be reproduced exactly, which calls for a predictable form of randomness. This apparent paradox is resolved by using random seeds. In this tutorial, we will explore the concept of a random seed and how to work with it through the NumPy library.

Understanding Randomness and Seeds

Randomness in programming usually comes from pseudo-random number generators (PRNGs): algorithms that produce sequences of numbers that look random but are entirely determined by the generator's initial internal state. A seed sets that starting state. By using the same seed, you ensure that the generator outputs the same sequence of "random" numbers every time.
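A minimal sketch of this idea with NumPy's legacy global generator (the seed 123 is an arbitrary choice for illustration): drawing again continues the sequence, while re-seeding rewinds the generator to the same starting point.

```python
import numpy as np

# Seed the legacy global generator and draw three numbers
np.random.seed(123)
first_run = np.random.random(3)

# Drawing again continues the sequence, so these values differ
continued = np.random.random(3)

# Re-seeding with the same value restarts the sequence from the same state
np.random.seed(123)
second_run = np.random.random(3)

print(np.array_equal(first_run, second_run))  # True
print(np.array_equal(first_run, continued))   # False
```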

Example 1: Basic Random Seed Usage

import numpy as np

# Set the random seed
np.random.seed(0)

# Generate five random numbers
random_numbers = np.random.random(5)
print(random_numbers)

Output:

 [0.5488135  0.71518937 0.60276338 0.54488318 0.4236548]

Using a seed value of 0 consistently reproduces the same array every time the code is executed.
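NumPy's documentation recommends the Generator API, created with np.random.default_rng, for new code. The sketch below is a Generator-based counterpart to Example 1. Its values will not match the output above, because a Generator uses the PCG64 bit generator by default while the legacy np.random functions use MT19937; what carries over is that the same seed gives the same values.

```python
import numpy as np

# Create an independent Generator seeded with 0
rng = np.random.default_rng(0)
values = rng.random(5)

# A second Generator built from the same seed reproduces the same values
rng_again = np.random.default_rng(0)
values_again = rng_again.random(5)

print(values)
print(np.array_equal(values, values_again))  # True
```

Because each Generator object holds its own state, it can be passed explicitly to the functions that need randomness instead of relying on global state.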

Example 2: Seeding and Data Shuffling

import numpy as np

# Set the random seed
np.random.seed(42)

# Create an array from 0 to 9
data = np.arange(10)
print('Original data:', data)

# Shuffle the data
np.random.shuffle(data)
print('Shuffled data:', data)

Output:

 Original data: [0 1 2 3 4 5 6 7 8 9]
 Shuffled data: [8 1 5 0 7 2 9 4 3 6]

Even when shuffling the data in the array, the output remains consistent across runs when seeded with the same value.
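Note that np.random.shuffle reorders the array in place. When the original order must be kept, np.random.permutation returns a shuffled copy instead. A short sketch, reusing seed 42 for illustration:

```python
import numpy as np

data = np.arange(10)

np.random.seed(42)
shuffled_copy = np.random.permutation(data)  # returns a new array

print('Original data:', data)  # unchanged
print('Shuffled copy:', shuffled_copy)

# The same seed yields the same permutation
np.random.seed(42)
print(np.array_equal(shuffled_copy, np.random.permutation(data)))  # True
```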

Random Sampling and Distributions

NumPy offers various functions to generate random samples according to different statistical distributions. Seeding can be particularly useful here to ensure reproducible research or simulations.
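As a sketch of seeding several distributions at once, the hypothetical helper draw_samples below creates one Generator (via np.random.default_rng) and draws from four distributions it provides. The seed 2024 and the distribution parameters are arbitrary choices for illustration.

```python
import numpy as np

def draw_samples(seed):
    """Hypothetical helper: draw from several distributions with one seeded Generator."""
    rng = np.random.default_rng(seed)
    return {
        'uniform': rng.uniform(low=0.0, high=1.0, size=3),
        'normal': rng.normal(loc=0.0, scale=1.0, size=3),
        'poisson': rng.poisson(lam=4.0, size=3),
        'binomial': rng.binomial(n=10, p=0.5, size=3),
    }

run_a = draw_samples(2024)
run_b = draw_samples(2024)

# Every distribution is reproduced because the draws happen in the same order
for name in run_a:
    print(name, np.array_equal(run_a[name], run_b[name]))
```

All four draws consume the same underlying stream, so reproducibility depends on the order of calls as well as the seed: reordering the draws inside draw_samples changes which numbers each distribution receives.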

Example 3: Random Sampling from a Normal Distribution

import numpy as np

# Set the seed
np.random.seed(7)

# Generate random samples from a normal distribution
samples = np.random.normal(loc=0.0, scale=1.0, size=1000)

# Check the first five samples
print(samples[:5])

Output:

 [0.07630829 -1.7813084 -0.35666642 1.77293985 -0.23832635]

Because the seed is fixed, this code draws the same 1,000 normally distributed samples every time it runs.

Reproducibility Across Sessions and Systems

Random seeds also keep results consistent beyond a single run: across separate computing sessions and across different systems.
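Recording the seed is the simplest way to reproduce a run later. Another option is to capture the generator's full internal state with np.random.get_state and restore it with np.random.set_state. The sketch below assumes the state is carried between sessions by pickling it; here the pickle round trip happens in memory to keep the example self-contained.

```python
import pickle

import numpy as np

np.random.seed(11)
np.random.random(4)  # advance the generator a little

# Capture the full internal state of the legacy global generator
saved_state = np.random.get_state()
expected = np.random.random(3)

# Serialize the state, e.g. to write it to disk and reload it in a later session
state_bytes = pickle.dumps(saved_state)

# Simulate unrelated random work happening in the meantime
np.random.random(100)

# Restore the saved state: the next draws repeat exactly
np.random.set_state(pickle.loads(state_bytes))
replayed = np.random.random(3)
print(np.array_equal(expected, replayed))  # True
```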

Example 4: Multi-dimensional Array Generation

import numpy as np

# Set the seed
np.random.seed(11)

# Generate a 3x3 matrix of random integers from 0 to 9 (the upper bound 10 is exclusive)
matrix = np.random.randint(0, 10, (3, 3))
print(matrix)

Output:

 [[9 0 1]
 [8 8 3]
 [9 8 7]]

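This example should produce the same 3×3 matrix in any Python session and on any machine. NumPy's legacy RandomState-based functions, which np.random.randint uses, come with a compatibility guarantee: the same seed and the same sequence of calls produce the same results across NumPy versions, up to roundoff. This keeps an experiment's conditions and results reproducible.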
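Because randint treats its upper bound as exclusive, randint(0, 10) never returns 10. A quick check is shown below, together with the Generator alternative (np.random.default_rng), whose integers method accepts endpoint=True to make the upper bound inclusive. Seed 11 is reused only for illustration.

```python
import numpy as np

np.random.seed(11)
# The high argument of randint is exclusive: values fall in 0..9
matrix = np.random.randint(0, 10, (3, 3))
print(matrix.min() >= 0, matrix.max() <= 9)  # True True

# With a Generator, endpoint=True makes the upper bound inclusive: values fall in 0..10
rng = np.random.default_rng(11)
inclusive = rng.integers(0, 10, size=(3, 3), endpoint=True)
print(inclusive.min() >= 0, inclusive.max() <= 10)  # True True
```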

Example 5: Using Random Seed with a Random State Object

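NumPy also lets you create a separate random number generator through its RandomState class. Because a RandomState object carries its own internal state, code that uses it neither changes nor depends on the global generator. This is handy when several parts of a program, such as separate threads or processes, each need their own reproducible stream.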

import numpy as np

# Create a new RandomState object with a given seed
rng = np.random.RandomState(29)

# Generate random numbers using the RandomState object
values = rng.standard_normal(10)
print(values)

Output:

 [ 0.4274952   0.17499346 -0.91231898 -0.43256247 -1.12280684 0.42007918
   0.57192801 -0.41124656  0.6670449  -1.49266338]

The RandomState object keeps its own state, so it neither affects nor is affected by the global generator seeded with np.random.seed. Its sequence is reproducible from its own seed alone.
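To see this isolation directly, the sketch below advances two RandomState objects built from the same seed by the same amount, then makes heavy use of the global generator in between. The two objects still agree afterwards.

```python
import numpy as np

# Two RandomState objects built from the same seed
rng = np.random.RandomState(29)
reference = np.random.RandomState(29)

first = rng.standard_normal(3)
reference.standard_normal(3)  # advance the reference by the same amount

# Heavy use of the global generator in between...
np.random.seed(0)
np.random.random(1000)

# ...does not disturb either object's private state
next_from_rng = rng.standard_normal(3)
next_from_reference = reference.standard_normal(3)
print(np.array_equal(next_from_rng, next_from_reference))  # True
```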

Advanced Usage and Best Practices

When writing complex programs or conducting research, keep the following best practices for random seeds in mind.

Seeding in Parallel Computations

Parallel computing complicates random number generation. If every worker process starts from the same seed, all workers draw identical sequences, and results that are supposed to be independent become correlated, which can bias the outcome. Seeds must therefore be managed so that each process gets its own. NumPy's RandomState class can be used to assign a different seed to each process.
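NumPy also provides np.random.SeedSequence, whose spawn method derives independent child seeds from one root seed. The sketch below passes each child to np.random.default_rng. The worker function worker_draws is hypothetical, and the workers run sequentially here for simplicity; in real code, each child seed would be handed to a separate process, for example through a process pool.

```python
import numpy as np

def worker_draws(child_seed, n=5):
    """Hypothetical worker: builds its own Generator from a spawned seed."""
    rng = np.random.default_rng(child_seed)
    return rng.random(n)

# One root seed for the whole experiment (record it for reproducibility)
root = np.random.SeedSequence(12345)

# Derive independent child seeds, one per worker
child_seeds = root.spawn(4)
results = [worker_draws(s) for s in child_seeds]

# Each worker gets a different stream...
print(all(not np.array_equal(results[0], r) for r in results[1:]))  # True

# ...and spawning again from the same root seed reproduces all of them
again = [worker_draws(s) for s in np.random.SeedSequence(12345).spawn(4)]
print(all(np.array_equal(a, b) for a, b in zip(results, again)))  # True
```

Only the single root seed needs to be recorded; the child seeds are derived from it deterministically.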

Changing Seeds

It is sometimes necessary to change the random seed within a program to explore the effects of randomness in analysis. This can be achieved simply by calling np.random.seed with a different value at the desired point in the code.

Example 6: Changing Seeds Mid-Execution

import numpy as np

# Set the seed
np.random.seed(33)

# Generate initial random numbers
initial_values = np.random.rand(2)
print('Initial values:', initial_values)

# Set a new seed
np.random.seed(44)

# Generate new random numbers post re-seeding
new_values = np.random.rand(2)
print('New values:', new_values)

Output:

 Initial values: [0.24851013 0.44997542]
 New values: [0.28730852 0.17342911]

Re-seeding restarts the generator from a new but still reproducible state, which makes it possible to compare results under different random sequences.
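Instead of re-seeding the global generator by hand, one way to compare results across sequences is to loop over a list of seeds and give each run its own Generator from np.random.default_rng. In the sketch below, the experiment (estimating the mean of a standard normal sample) and the seed values are hypothetical choices for illustration.

```python
import numpy as np

def run_experiment(seed, n=1000):
    """Hypothetical experiment: estimate the mean of a standard normal sample."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=1.0, size=n).mean()

seeds = [33, 44, 55]
estimates = {seed: run_experiment(seed) for seed in seeds}

for seed, est in estimates.items():
    print(f'seed={seed}: mean estimate {est:.4f}')

# Each individual seed remains reproducible
print(run_experiment(33) == estimates[33])  # True
```

The spread of the estimates across seeds gives a rough sense of how much the result depends on the particular random sequence.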

Conclusion

In scientific and data-driven fields, the ability to fix the seed of a randomized computation is invaluable: it makes seemingly stochastic computations deterministically reproducible. Understanding how seeding works in NumPy's random module paves the way for reliable, reproducible research and easier debugging of algorithms.