NumPy: How to get samples from a Hypergeometric distribution (3 examples)

Updated: February 28, 2024 By: Guest Contributor

In this guide, we look at the hypergeometric distribution and how to generate samples from it with NumPy, Python's core library for numerical computing. Sampling from a hypergeometric distribution is a common need in statistical analyses that model the outcomes of draws without replacement from a finite population.

What is a Hypergeometric Distribution?

The hypergeometric distribution models the probability of drawing a specific number of successes from a finite population without replacement. It is characterized by three parameters: the population size (N), the number of successes in the population (K), and the sample size (n). NumPy's hypergeometric functions take these in a slightly different form: ngood = K, nbad = N - K, and nsample = n. This distribution is particularly useful in scenarios where the "without replacement" aspect changes the probabilities of subsequent draws, such as card games, lot sampling in quality control, and election forecasts.
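The probability of drawing exactly k successes is P(X = k) = C(K, k) · C(N - K, n - k) / C(N, n): choose k of the K successes and n - k of the N - K failures, out of all C(N, n) possible samples. As a minimal sketch using only the standard library's math.comb (Python 3.8+), the function below evaluates this formula; hypergeom_pmf is a hypothetical helper name of our own, not a NumPy function.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for a hypergeometric distribution (hypothetical helper).

    N: population size, K: successes in the population, n: sample size.
    """
    # k is only possible if it needs at most K successes
    # and at most N - K failures (n - k of them)
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Lot-sampling numbers: 100 items, 20 defective, 10 drawn
N, K, n = 100, 20, 10
probs = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(probs))

print(f"sum of probabilities: {sum(probs):.6f}")
print(f"mean: {mean:.4f} (theory: n*K/N = {n * K / N})")
```

The probabilities over all feasible k sum to 1, and the mean works out to n·K/N, which is 2 for these parameters.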

Getting Started with NumPy

Before diving into generating samples, ensure you have NumPy installed in your Python environment. If not, you can install it using pip:

pip install numpy

Once installed, you can import NumPy in your Python script:

import numpy as np
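The examples below call np.random.hypergeometric, which belongs to NumPy's legacy RandomState interface. NumPy's documentation recommends the Generator interface for new code. A minimal sketch of the equivalent call, with an arbitrarily chosen seed of 42:

```python
import numpy as np

# Create a seeded Generator (the seed value 42 is arbitrary)
rng = np.random.default_rng(42)

# 20 "good" (defective) items, 80 "bad" (non-defective) items,
# 10 items drawn per sample, 10 independent samples
samples = rng.hypergeometric(ngood=20, nbad=80, nsample=10, size=10)
print(samples)
```

A Generator and the legacy np.random.seed produce different random streams, so the numbers will not match the legacy examples even with the same seed.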

Example 1: Basic Sampling

Let's start with a simple example. Suppose a production lot (batch) of 100 items contains 20 defective ones. If you randomly select 10 items without replacement, how is X, the number of defective items in your selection, distributed?

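import numpy as np

# Parameters: N=100 items, K=20 defective, n=10 items drawn per sample
# NumPy's signature is hypergeometric(ngood, nbad, nsample, size):
# ngood=K=20, nbad=N-K=80, nsample=n=10, size=10 independent samples
samples = np.random.hypergeometric(20, 80, 10, 10)
print(samples)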

The code above generates 10 samples from the hypergeometric distribution; each element of samples is the number of defective items found in one draw of 10 items. The exact values vary from run to run, but they should cluster around 2, the expected value n·K/N = 10 · 20 / 100.

A possible output looks like this:

[2 3 2 1 0 1 2 2 1 1]
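Ten samples are too few to pin down the center reliably. As a sketch, drawing many more samples and comparing the sample mean and variance with the theoretical values E[X] = nK/N and Var[X] = n(K/N)(1 - K/N)(N - n)/(N - 1) makes the expected behavior concrete. The sample size of 100,000 and the seed are arbitrary choices.

```python
import numpy as np

N, K, n = 100, 20, 10           # population, defectives, items drawn
rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
samples = rng.hypergeometric(ngood=K, nbad=N - K, nsample=n, size=100_000)

# Theoretical moments of the hypergeometric distribution
p = K / N
theory_mean = n * p
theory_var = n * p * (1 - p) * (N - n) / (N - 1)  # finite-population correction

print(f"sample mean {samples.mean():.3f} vs theory {theory_mean:.3f}")
print(f"sample var  {samples.var():.3f} vs theory {theory_var:.3f}")
```

The factor (N - n)/(N - 1) is what separates this from a binomial distribution with p = K/N: sampling without replacement reduces the spread of the outcomes.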

Example 2: Visualizing the Distribution

Visualizing a larger set of samples makes the shape of the distribution much easier to see. The following example uses Matplotlib, a Python plotting library (pip install matplotlib), to draw a histogram of 1000 samples.

import matplotlib.pyplot as plt
import numpy as np

# Generating 1000 samples
data = np.random.hypergeometric(20, 80, 10, 1000)

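# Plot the results: one bar per integer outcome. The bin edges run up to
# max + 1 so the largest value gets its own bin instead of sharing the last one.
plt.hist(data, bins=range(data.min(), data.max() + 2), align='left')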
plt.title('Hypergeometric Distribution')
plt.xlabel('Number of Defective Items')
plt.ylabel('Frequency')
plt.show()

The histogram approximates the distribution of defective items over 1000 trials: the bars should peak at 2 defective items, the most likely outcome, and fall off on both sides.
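A histogram is a visual check; a numeric one compares the relative frequency of each outcome with the exact probabilities from the formula C(K, k) · C(N - K, n - k) / C(N, n). The sketch below does this with np.bincount and the standard library's math.comb; the 100,000 samples and the seed are arbitrary choices.

```python
from math import comb

import numpy as np

N, K, n = 100, 20, 10
size = 100_000
rng = np.random.default_rng(1)  # arbitrary seed
data = rng.hypergeometric(ngood=K, nbad=N - K, nsample=n, size=size)

# Relative frequency of each outcome 0..n (bincount counts each integer value)
empirical = np.bincount(data, minlength=n + 1) / size

# Exact probabilities P(X = k) = C(K, k) C(N-K, n-k) / C(N, n)
exact = np.array(
    [comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(n + 1)]
)

for k in range(n + 1):
    print(f"{k:2d}  empirical {empirical[k]:.4f}  exact {exact[k]:.4f}")
```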

Example 3: Advanced Applications

The hypergeometric distribution is also useful for simulating games of chance. Suppose you are simulating a card game where drawing specific cards has strategic implications. Because cards are dealt without replacement, the number of cards of a given kind in a hand follows a hypergeometric distribution, so you can estimate the probability of a particular hand by sampling.

import numpy as np

# Set the random seed for reproducibility
np.random.seed(0)

# In a standard 52-card deck there are 4 aces.
# If you draw 5 cards, what is the probability that exactly 3 are aces?

# Parameters: ngood=4 aces, nbad=48 other cards, nsample=5 cards per hand
n_trials = 10000
aces_drawn = np.random.hypergeometric(4, 48, 5, n_trials)

# Count how many simulated hands contain exactly 3 aces
counts = np.count_nonzero(aces_drawn == 3)

print(f'Probability of drawing exactly 3 aces in 5 cards: {counts / n_trials:.4f}')

Output (with a seed of 0):

Probability of drawing exactly 3 aces in 5 cards: 0.0018
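The exact probability can be computed directly: choose 3 of the 4 aces and 2 of the 48 other cards, out of all C(52, 5) five-card hands, giving 4 · 1128 / 2,598,960 = 4512 / 2,598,960 ≈ 0.00174. The sketch below computes this with math.comb and compares it with a larger simulation using NumPy's Generator API; the 1,000,000 trials and the seed are arbitrary choices.

```python
from math import comb

import numpy as np

# Exact: 3 of the 4 aces and 2 of the 48 non-aces, over all 5-card hands
exact = comb(4, 3) * comb(48, 2) / comb(52, 5)

# Monte Carlo estimate with many more trials than the example above
rng = np.random.default_rng(0)  # arbitrary seed
aces = rng.hypergeometric(ngood=4, nbad=48, nsample=5, size=1_000_000)
estimate = np.mean(aces == 3)

print(f"exact:    {exact:.5f}")
print(f"estimate: {estimate:.5f}")
```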

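This example shows how simulation can estimate probabilities that inform strategy, such as how likely a particular hand is. With 10,000 trials the estimate is rough: only about 17 hands in 10,000 are expected to contain exactly 3 aces, so increasing the number of trials noticeably reduces the sampling error.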
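Hands that combine more than one kind of card (for example, aces and kings) involve more than two categories, which the two-category hypergeometric does not cover directly. NumPy's Generator also provides multivariate_hypergeometric (added in NumPy 1.18). As a sketch, assuming we want the probability that a 5-card hand contains at least one ace and at least one king, the simulation below is checked against an exact value obtained by inclusion-exclusion; the seed and sample count are arbitrary.

```python
from math import comb

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Deck split into three categories: 4 aces, 4 kings, 44 other cards
colors = [4, 4, 44]
hands = rng.multivariate_hypergeometric(colors, 5, size=100_000)  # shape (100000, 3)

# Fraction of hands with at least one ace (column 0) and one king (column 1)
estimate = np.mean((hands[:, 0] >= 1) & (hands[:, 1] >= 1))

# Exact value by inclusion-exclusion: 1 - P(no ace) - P(no king) + P(neither)
total = comb(52, 5)
exact = 1 - 2 * comb(48, 5) / total + comb(44, 5) / total

print(f"estimate {estimate:.4f}, exact {exact:.4f}")
```

Each row of hands holds the counts of aces, kings, and other cards in one simulated hand, so every row sums to 5.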

Conclusion

Understanding the hypergeometric distribution and sampling from it with NumPy gives you a practical way to model scenarios involving finite populations and draws without replacement. From quality control in manufacturing to strategic gaming, the same few lines of code apply; only the parameters change. While the principles remain consistent, the applications can be as varied as the problems we seek to understand.