Sling Academy
Home/NumPy/NumPy: Get random samples from a multivariate normal distribution (3 examples)

NumPy: Get random samples from a multivariate normal distribution (3 examples)

Last updated: March 01, 2024

Introduction

NumPy, a cornerstone library for numerical computing in Python, offers extensive functionality for random sampling, including the ability to generate samples from a multivariate normal distribution. Understanding this feature is essential for simulations, statistical modeling, and machine learning tasks where normality assumes a pivotal role.

Welcome to an exciting exploration of generating random samples from a multivariate normal distribution using NumPy. This journey not only enhances your data science toolkit but also dives into practical examples ranging from simple to complex applications.

What is a Multivariate Normal Distribution?

Before diving into code examples, let’s briefly understand the concept. A multivariate normal distribution is an extension of the univariate normal distribution to multiple dimensions, where each dimension represents a variable. It is characterized by mean vector and covariance matrix, dictating the distribution’s center and the correlation between variables, respectively.

Example 1: Basic Example

Let’s start with a basic example of generating random samples:

import numpy as np

# Parameters: mean vector and covariance matrix
mean = [0, 0]
cov = [[1, 0], [0, 1]]  # diagonal covariance, for independent variables

# Generate samples
samples = np.random.multivariate_normal(mean, cov, 5000)

print(samples)

This code generates 5000 samples from a two-dimensional normal distribution with mean [0,0] and an identity covariance matrix, implying independence and equal variance.

A possible output:

[[ 1.66083004  0.76686959]
 [-1.08087341  0.39214399]
 [-0.03251931  1.17903401]
 ...
 [ 0.39979638 -0.30943035]
 [-1.1714748   0.26815543]
 [-0.43391382  0.97492261]]

Example 2: Correlated Variables

Expanding on the first example, let’s see how to model correlated variables:

import numpy as np

# Parameters adjusted for correlation
mean = [0, 0]
cov = [[1, 0.5], [0.5, 1]]  # now there's correlation

# Generate samples
samples = np.random.multivariate_normal(mean, cov, 5000)

print("Samples with correlated variables:")
print(samples)

This snippet generates samples where the two variables have a positive correlation, as indicated by off-diagonal elements of the covariance matrix.

Possible output:

Samples with correlated variables:
[[-0.85891156  0.92035279]
 [-0.80370562  0.12568221]
 [ 0.75384196 -0.74140922]
 ...
 [-0.61681461 -0.91852092]
 [ 0.12062102 -0.34523382]
 [-0.90990227  0.27054913]]

Example 3: Higher Dimensions and Visualizing

Moving forward to more complex scenarios involves dealing with higher dimensions. Additionally, we’ll explore visualizing these samples:

import numpy as np
import matplotlib.pyplot as plt

# Define a higher-dimension mean and covariance
mean = [0, 0, 0]
cov = [[1, 0.5, 0.2], [0.5, 1, 0.3], [0.2, 0.3, 1]]  # higher-dimensional correlation

# Generate samples
samples = np.random.multivariate_normal(mean, cov, 5000)

# Visualization
fig = plt.figure(figsize=(10,7))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(samples[:,0], samples[:,1], samples[:,2], c='blue', marker='o')
ax.set_title('3D Visualization of Multivariate Normal Samples')
plt.show()

Output (vary):

This example generates and visualizes samples from a three-dimensional multivariate normal distribution, showcasing how to handle more complex scenarios and the importance of visualization in understanding data distributions.

Conclusion

Sampling from a multivariate normal distribution in NumPy is a straightforward yet powerful tool for simulations, statistical analysis, and advanced modeling. Through the examples demonstrated, you should now have a good understanding of basic operations, dealing with correlations, and extending to higher dimensions. Embrace these techniques to enhance your data science and machine learning projects.

Next Article: NumPy: Get samples from a negative binomial distribution (4 examples)

Previous Article: NumPy: Generate variates from a multivariate hypergeometric distribution (3 examples)

Series: NumPy Basic Tutorials

NumPy

You May Also Like

  • SciPy linalg.solve_triangular() function (3 examples)
  • SciPy linalg.solve_circulant() function (4 examples)
  • SciPy – Using linalg.solveh_banded() function
  • SciPy: Using linalg.solve_banded() function (3 examples)
  • SciPy linalg.solve() function (4 examples)
  • SciPy – Using linalg.inv() function (4 examples)
  • SciPy io.arff.loadarff() function (4 examples)
  • SciPy io.wavfile.read() function (4 examples)
  • SciPy io.hb_write() function (4 examples)
  • Using SciPy’s io.hb_read() function (3 examples)
  • SciPy io.mmwrite() function (4 examples)
  • SciPy – Exploring io.mmread() function (4 examples)
  • Using io.mminfo() function in SciPy (3 examples)
  • SciPy – Working with io.readsav() function (4 examples)
  • Understanding io.whosmat() function in SciPy (3 examples)
  • SciPy: Using io.savemat() function (4 examples)
  • Using io.loadmat() function in SciPy (4 examples)
  • SciPy: Using interpolate.bisplev() function (3 examples)
  • SciPy: Using interpolate.insert() function (4 examples)