NumPy: How to draw samples from an exponential distribution (4 examples)

Introduction
Prerequisites
Example 1: Basic Sampling
Example 2: Generating Multiple Datasets
Example 3: Analyzing Sample Properties
Example 4: Simulating Real-World Processes
Conclusion

Introduction

Drawing samples from a probability distribution is a foundational skill in data science, particularly for work involving statistical models or simulations. The exponential distribution models the time between events in a Poisson process and appears in fields ranging from queuing theory to reliability engineering. This tutorial shows how to generate samples from an exponential distribution using NumPy, a core library for numerical and scientific computing in Python. Four graded examples progress from basic techniques to more advanced applications, with code snippets at each step.

Prerequisites

Before we start, ensure that you have a functioning Python environment with NumPy installed. You can install NumPy using pip if you haven’t already:

pip install numpy

Basic familiarity with probability distributions and programming in Python will be helpful to follow along with the examples presented.

Example 1: Basic Sampling

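The exponential distribution is defined by a single parameter λ (lambda), the rate parameter. The mean, or expected time between events, is 1/λ. To generate samples, we use the numpy.random.exponential function. Note that this function takes the scale parameter (the mean, 1/λ) rather than the rate λ itself, which is why the code below passes 1/lambda_param.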

import numpy as np

# Rate parameter lambda (events per unit time)
lambda_param = 1.5

# Generate 10 samples; NumPy expects the scale (1/lambda), not the rate
samples = np.random.exponential(1/lambda_param, 10)

# Print the samples
print(samples)

This prints an array of 10 non-negative floats, each a simulated time between events for an exponential distribution with λ = 1.5. The values differ on every run because no random seed is set.
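The call above relies on NumPy's legacy global random state. As a minimal sketch of an alternative, the same draw can be made with the newer Generator interface, which takes an explicit seed and so returns the same samples on every run. The seed value 42 is an arbitrary choice for illustration, not something the example requires.

```python
import numpy as np

# Seeded generator so the same samples come back on every run
# (the seed 42 is an arbitrary, illustrative choice)
rng = np.random.default_rng(42)

lambda_param = 1.5           # rate: events per unit time
scale = 1 / lambda_param     # NumPy expects the scale (mean), not the rate

# Draw 10 samples using keyword arguments for clarity
samples = rng.exponential(scale=scale, size=10)
print(samples)
```

Naming the scale and size arguments explicitly also makes it harder to pass the rate by mistake.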

Example 2: Generating Multiple Datasets

Often in experiments or simulations, we need to generate multiple sets of samples, for example for Monte Carlo simulations or bootstrap estimates. Let’s scale our sampling to generate 100 datasets, each containing 50 samples, by passing a shape tuple as the size argument.

samples_set = np.random.exponential(1/lambda_param, (100, 50))
print(samples_set.shape)

The output, (100, 50), confirms that we have created a two-dimensional array with 100 rows (each representing a dataset) and 50 columns (each representing a sample within those datasets).
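The datasets above all share one rate. If each dataset should instead have its own rate, one possible approach is to pass an array of scales and let NumPy broadcast it across the rows. The sketch below assumes three illustrative rates (0.5, 1.5 and 3.0) and 1000 samples per row; neither value comes from the example above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Hypothetical rates, one per dataset
rates = np.array([0.5, 1.5, 3.0])
scales = 1 / rates

# A (3, 1) column of scales broadcasts across the (3, 1000) output,
# so every row is drawn with its own scale
samples_by_rate = rng.exponential(scale=scales[:, np.newaxis], size=(3, 1000))

print(samples_by_rate.shape)         # (3, 1000)
print(samples_by_rate.mean(axis=1))  # each row mean should be near 1/rate
```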

Example 3: Analyzing Sample Properties

After generating samples, a common next step is to analyze them to understand their properties. This might include calculating their mean and standard deviation, or plotting their distribution to visually assess the fit. For an exponential distribution, both the mean and the standard deviation equal 1/λ, so with λ = 1.5 each statistic should come out near 0.667. Given our datasets, we can compute these statistics.

# Calculate and print the mean and standard deviation for each dataset
means = np.mean(samples_set, axis=1)
std_devs = np.std(samples_set, axis=1)

print("Mean of datasets:\n", means)
print("Standard deviation of datasets:\n", std_devs)
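To make the comparison with theory explicit, here is a small self-contained sketch that regenerates the 100 × 50 datasets with a seeded generator and compares the averaged statistics with the theoretical value 1/λ. The seed is an arbitrary choice, and using ddof=1 (the sample standard deviation) is our assumption; the code above uses NumPy's default of ddof=0.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility
lambda_param = 1.5
samples_set = rng.exponential(scale=1 / lambda_param, size=(100, 50))

# For an exponential distribution, mean and standard deviation are both 1/lambda
theoretical = 1 / lambda_param

means = samples_set.mean(axis=1)
std_devs = samples_set.std(axis=1, ddof=1)  # sample standard deviation per dataset

print("Theoretical mean and std:", theoretical)
print("Average of dataset means:", means.mean())
print("Average of dataset stds: ", std_devs.mean())
```

Individual datasets of 50 samples can stray noticeably from 1/λ; averaging across the 100 datasets brings the estimates much closer.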

A histogram gives a visual check of the distribution fit:

import matplotlib.pyplot as plt

# Plot the histogram of a single dataset
plt.hist(samples_set[0], bins=20, density=True)
plt.title('Histogram of Exponential Samples')
plt.show()

This histogram should roughly follow the decaying shape of the theoretical exponential density, although with only 50 samples in a single dataset it will be noisy.
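The same check can be made numerically, without a plot. The sketch below bins a larger sample with np.histogram and compares the empirical density with the theoretical density f(x) = λ·exp(−λx) at the bin centers. The sample size of 100,000 and the bin range [0, 4] are assumptions chosen to keep the noise low, and evaluating the density at bin centers is only an approximation of the per-bin average.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed for reproducibility
lambda_param = 1.5
samples = rng.exponential(scale=1 / lambda_param, size=100_000)

# Empirical density over 20 equal-width bins on [0, 4];
# values above 4 are rare at this rate and are ignored
density, edges = np.histogram(samples, bins=20, range=(0, 4), density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Theoretical density evaluated at the bin centers
pdf = lambda_param * np.exp(-lambda_param * centers)

max_abs_diff = np.max(np.abs(density - pdf))
print("Largest gap between histogram and theoretical density:", max_abs_diff)
```

A small gap relative to the peak density of λ = 1.5 suggests the samples follow the intended distribution.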

Example 4: Simulating Real-World Processes

Finally, let’s apply our knowledge to simulate a real-world process. Suppose we are interested in modeling the time between successive customer arrivals at a service center. Assume an average rate of customer arrival (λ) of 1.5 per minute. We can simulate the arrival times for 1000 customers and analyze the distribution.

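# Simulate inter-arrival times (in minutes) for 1000 customers
inter_arrival_times = np.random.exponential(1/lambda_param, 1000)

# Cumulative sums give each customer's arrival time
arrival_times = np.cumsum(inter_arrival_times)

# Plot the inter-arrival times (np.diff(arrival_times) would drop the first gap)
plt.plot(inter_arrival_times)
plt.title('Inter-Arrival Times of Customers')
plt.xlabel('Customer')
plt.ylabel('Inter-Arrival Time (minutes)')
plt.show()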

The plot shows many short gaps punctuated by occasional long ones, which is characteristic of exponentially distributed inter-arrival times and illustrates why the distribution is widely used in stochastic modeling.
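Since exponential inter-arrival times correspond to a Poisson process, the number of arrivals in each one-minute window should behave like a Poisson variable with mean and variance both close to λ. The sketch below checks this on a simulated run; the seed and the choice of one-minute windows are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed for reproducibility
lambda_param = 1.5              # arrivals per minute

inter_arrival_times = rng.exponential(scale=1 / lambda_param, size=1000)
arrival_times = np.cumsum(inter_arrival_times)

# Count arrivals in each complete one-minute window;
# the final, partially observed minute is dropped
n_minutes = int(arrival_times[-1])
minute_index = arrival_times.astype(int)  # floor of each arrival time
counts = np.bincount(minute_index, minlength=n_minutes)[:n_minutes]

print("Mean arrivals per minute:    ", counts.mean())
print("Variance of arrivals/minute: ", counts.var())
```

Both printed values should land near 1.5, which is a quick sanity check that the simulated arrivals form a Poisson process at the intended rate.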

Conclusion

In this tutorial, we explored how to sample from the exponential distribution using NumPy through four practical examples. We started with basic sampling, moved on to generating and analyzing multiple datasets, and finished by simulating a real-world arrival process. Whether you’re new to data science or expanding your skill set, sampling from distributions like the exponential is a useful tool for stochastic simulation.
