NumPy: Getting samples from a multinomial distribution (4 examples)

Updated: February 28, 2024 By: Guest Contributor

Introduction

Understanding and utilizing the multinomial distribution is a fundamental aspect of data analysis, especially in scenarios involving multiple outcomes or categories. NumPy, a cornerstone library for numerical computing in Python, offers versatile functions to work with various statistical distributions, including the multinomial distribution. In this tutorial, we will explore how to generate samples from a multinomial distribution using NumPy, accompanied by practical examples ranging from basic to advanced implementations.

Understanding Multinomial Distribution

The multinomial distribution is a generalization of the binomial distribution. The binomial distribution counts successes in n independent trials that each have two possible outcomes (success or failure). The multinomial distribution instead counts how often each of k possible outcomes (k ≥ 2) occurs in n independent trials, where every trial produces exactly one outcome. It is defined by two parameters: the number of trials n and the vector of outcome probabilities p = (p_1, ..., p_k), whose entries must sum to 1.
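For reference, here are the probability mass function and the moments of each category count, written with the same n and p as above. They are standard results and make the simulations that follow easier to sanity-check:

```latex
P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k},
\qquad \sum_{i=1}^{k} x_i = n, \quad \sum_{i=1}^{k} p_i = 1

\mathbb{E}[X_i] = n p_i, \qquad \operatorname{Var}(X_i) = n p_i (1 - p_i), \qquad \operatorname{Cov}(X_i, X_j) = -n p_i p_j \quad (i \neq j)
```

For example, with n = 10 and p = (0.2, 0.5, 0.3), the expected counts are (2, 5, 3).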

Let’s dive into how we can generate samples from this distribution using NumPy’s numpy.random.multinomial function through a series of examples.
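The examples below use the legacy module-level function. Since NumPy 1.17, NumPy has also provided a Generator object, created with np.random.default_rng, whose multinomial method accepts the same n, pvals and size arguments. The following minimal sketch uses it with an explicit seed so that runs are reproducible. The seed value 42 is arbitrary:

```python
import numpy as np

# Create a Generator; the seed (42) is arbitrary and only makes runs reproducible
rng = np.random.default_rng(42)

n = 10
pvals = [0.2, 0.5, 0.3]

# Generator.multinomial takes the same n, pvals and size arguments
sample = rng.multinomial(n, pvals)
print(sample)

# Re-creating a Generator with the same seed reproduces the same draw
rng_again = np.random.default_rng(42)
print(np.array_equal(sample, rng_again.multinomial(n, pvals)))  # True
```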

Example 1: Basic Usage

import numpy as np

# Number of trials and the probability of each of the 3 possible outcomes
n = 10
pvals = [0.2, 0.5, 0.3]

# Generating one sample
sample = np.random.multinomial(n, pvals)

print(sample)

Output (varies with each execution):

[2 5 3]

This output is the count of each outcome across the 10 trials of a single experiment, so the three numbers always sum to n = 10.
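It can help to picture what one draw represents by generating the n individual trial outcomes and tallying them. The sketch below is only an illustration and does not show how NumPy implements multinomial. It uses Generator.choice and np.bincount rather than np.random.multinomial. The resulting counts follow the same multinomial distribution, but they will not match the values of a multinomial call draw for draw. The seed 0 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 10
pvals = [0.2, 0.5, 0.3]

# Draw n individual outcomes, each a category index 0, 1 or 2 ...
trials = rng.choice(len(pvals), size=n, p=pvals)

# ... then count how many times each category occurred
counts = np.bincount(trials, minlength=len(pvals))

print(trials)
print(counts)  # a vector of 3 counts that sums to n, like a multinomial sample
```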

Example 2: Generating Multiple Samples

import numpy as np

# Multiple samples, let's generate 5
n = 10
pvals = [0.2, 0.5, 0.3]
size = 5

samples = np.random.multinomial(n, pvals, size=size)

print(samples)

Output (varies with each execution):

[[3 4 3]
 [2 5 3]
 [1 6 3]
 [2 7 1]
 [4 3 3]]

Each row is a separate experiment of 10 trials, and the rows show how the counts vary from one experiment to the next.
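You can confirm this layout by inspecting the shape and the per-row totals. The size argument also accepts a tuple, in which case the outcome axis is appended last. The (2, 4) value below is only an illustration:

```python
import numpy as np

n = 10
pvals = [0.2, 0.5, 0.3]

samples = np.random.multinomial(n, pvals, size=5)
print(samples.shape)        # (5, 3): one row per experiment, one column per outcome
print(samples.sum(axis=1))  # [10 10 10 10 10]: each row distributes exactly n trials

# With a tuple size, the result has shape size + (len(pvals),)
grid = np.random.multinomial(n, pvals, size=(2, 4))
print(grid.shape)           # (2, 4, 3)
```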

Example 3: Using Multinomial Distribution for Simulation

import numpy as np

# Simulating dice throws
n = 1  # one roll of a die per sample
pvals = [1/6] * 6  # equal probability for each face
size = 10000  # number of throws

results = np.random.multinomial(n, pvals, size=size)

# Getting the frequency of each outcome
outcomes = np.sum(results, axis=0)
print(outcomes)

Output (varies with each execution):

[1634 1697 1630 1652 1654 1733]

In this example, we simulate 10,000 rolls of a fair six-sided die. Because n = 1, each row contains a single 1 marking the face that came up, so summing the rows along axis 0 gives the number of times each face appeared.
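Summing 10,000 one-trial samples produces counts that follow a multinomial distribution with n = 10,000. The per-face counts can therefore be drawn directly in a single call, which avoids building a 10,000-by-6 intermediate array. A minimal sketch:

```python
import numpy as np

pvals = [1/6] * 6  # fair six-sided die
rolls = 10000

# One draw with n = rolls has the same distribution as summing `rolls` one-trial samples
counts = np.random.multinomial(rolls, pvals)
print(counts)

# Empirical frequency of each face; each should be close to 1/6 (about 0.1667)
print(counts / rolls)
```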

Example 4: Complex Data Simulation

import numpy as np

# Complex distribution example, representing different probabilities for events
n = 1000  # total events
pvals = [0.1, 0.2, 0.25, 0.15, 0.3]  # probabilities for each event
size = 100  # simulations

complex_samples = np.random.multinomial(n, pvals, size=size)

# Using statistical measures to analyze the simulation
means = np.mean(complex_samples, axis=0)
stds = np.std(complex_samples, axis=0, ddof=1)

print("Mean occurrences:", means)
print("Standard deviations:", stds)

Output (varies with each execution):

Mean occurrences: [100.29 199.64 249.23 150.1  300.74]
Standard deviations: [ 9.15886169 12.93067764 15.07480003  9.5774356  13.08682737]

This example draws 100 independent samples, each distributing 1,000 events across five categories. It then summarizes each category with its mean count and its sample standard deviation. Setting ddof=1 divides by size - 1 rather than size.
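You can check these statistics against the theoretical values for a multinomial count: E[X_i] = n p_i and SD(X_i) = sqrt(n p_i (1 - p_i)). The minimal sketch below reuses the parameters above. With only 100 simulations, expect the simulated values to differ somewhat from the theoretical ones:

```python
import numpy as np

n = 1000
pvals = np.array([0.1, 0.2, 0.25, 0.15, 0.3])
size = 100

complex_samples = np.random.multinomial(n, pvals, size=size)

# Theoretical moments of each category count
expected_means = n * pvals                         # E[X_i] = n * p_i
expected_stds = np.sqrt(n * pvals * (1 - pvals))   # SD(X_i) = sqrt(n * p_i * (1 - p_i))

print("Expected means:", expected_means)
print("Expected standard deviations:", expected_stds.round(2))
print("Simulated means:", complex_samples.mean(axis=0))
print("Simulated standard deviations:", complex_samples.std(axis=0, ddof=1).round(2))
```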