How to Perform Cross-correlation and Autocorrelation with NumPy

Introduction

Cross-correlation and autocorrelation are two fundamental tools in signal processing, time-series analysis, and related fields. Python’s NumPy library computes both with a single function, np.correlate(). In this tutorial, we’ll look at how to perform cross-correlation and autocorrelation with NumPy, starting with basic examples and working up to normalized cross-correlation.

The Basics of Correlation

Before diving into the code, it helps to understand what correlation measures. Autocorrelation measures the similarity between a signal and a delayed copy of itself as a function of the delay (the lag), which helps identify repeating patterns or periodic signals. Cross-correlation, on the other hand, measures the similarity between two different signals as a function of lag, which is useful for finding the delay at which they match best.
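As a minimal sketch of what "similarity at a lag" means, the snippet below computes the sum of products over the overlapping samples by hand and compares it with np.correlate(). The helper name correlation_at_lag and the two small arrays are assumptions introduced for illustration; they are not part of NumPy.

```python
import numpy as np

def correlation_at_lag(a, v, k):
    """Hypothetical helper: sum of a[n + k] * v[n] over the overlapping samples.

    Assumes a and v have the same length.
    """
    n = len(a)
    if k >= 0:
        return float(np.dot(a[k:], v[:n - k]))
    return float(np.dot(a[:n + k], v[-k:]))

a = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.5])

# mode='full' covers every lag from -(len(v) - 1) to len(a) - 1
lags = np.arange(-(len(v) - 1), len(a))
full = np.correlate(a, v, mode='full')
manual = np.array([correlation_at_lag(a, v, k) for k in lags])

print(lags)
print(full)
print(manual)
```

The two printed correlation arrays should agree, which shows that each output element of np.correlate() is a dot product of one array with a shifted copy of the other.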

Setting Up the Environment

import numpy as np
import matplotlib.pyplot as plt

Performing Autocorrelation

The simplest way to compute autocorrelation is to pass the same array twice to np.correlate() with mode='full', which returns the correlation at every possible lag. Here’s an example:

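# White noise: 1000 samples from a standard normal distribution
data = np.random.randn(1000)
# mode='full' returns 2*N - 1 values, covering lags -(N-1) to N-1
auto_corr = np.correlate(data, data, mode='full')
# Zero lag sits at index N-1, so this keeps lags 0 to N-1
auto_corr = auto_corr[auto_corr.size // 2:]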

plt.plot(auto_corr)
plt.title('Autocorrelation of White Noise')
plt.show()

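Note that we keep only the second half of the resulting array: the autocorrelation is symmetric around zero lag, so the negative lags add no information. For white noise, the plot should show a single large spike at zero lag and comparatively small values at every other lag.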
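To illustrate how autocorrelation reveals periodicity, here is a small sketch that estimates the period of a noise-free sine wave from the first autocorrelation peak after zero lag. The signal length, the 50-sample period, and the simple peak search are all assumptions chosen for the example; noisy real-world data usually needs a more robust peak finder.

```python
import numpy as np

# Assumed test signal: a sine wave with a period of 50 samples, 500 samples long
n = np.arange(500)
period = 50
signal = np.sin(2 * np.pi * n / period)

# Autocorrelation for lags 0 .. N-1, scaled so that the zero-lag value is 1
acf = np.correlate(signal, signal, mode='full')[signal.size - 1:]
acf = acf / acf[0]

# Skip the main lobe around zero lag: start searching at the first negative value
start = int(np.argmax(acf < 0))
estimated_period = start + int(np.argmax(acf[start:]))

print(estimated_period)
```

The estimate should land at or very near the true period of 50 samples. Because fewer samples overlap at larger lags, the unscaled peaks shrink as the lag grows, which is why the first peak after zero lag is also the largest one here.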

Understanding Cross-correlation

To perform cross-correlation, we use the same np.correlate() function, this time with two different arrays. See this example:

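# Two signals sampled on the same 200-point grid
t = np.linspace(0, 10, 200)
signal_1 = np.sin(t)
signal_2 = np.cos(t)
cross_corr = np.correlate(signal_1, signal_2, mode='full')
# Keep lags >= 0 only. Unlike autocorrelation, cross-correlation is generally
# not symmetric around zero lag, so this discards the negative lags.
cross_corr = cross_corr[cross_corr.size // 2:]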

plt.plot(cross_corr)
plt.title('Cross-correlation of Sin and Cos')
plt.show()
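A common use of cross-correlation is estimating the delay between two signals. The sketch below keeps the full output (both negative and positive lags), builds a matching lag axis, and reads the delay off the position of the maximum. The random test signal, the 25-sample delay, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(500)

# Assumed scenario: y is x delayed by 25 samples (zero-padded at the start)
true_delay = 25
y = np.zeros_like(x)
y[true_delay:] = x[:-true_delay]

# With np.correlate(y, x), a positive lag means y lags behind x
cross_corr = np.correlate(y, x, mode='full')
lags = np.arange(-(x.size - 1), y.size)

estimated_delay = int(lags[np.argmax(cross_corr)])
print(estimated_delay)
```

Note that the argument order matters: swapping the two arrays flips the sign of the estimated lag.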

Advanced Cross-correlation Techniques

When performing cross-correlation on real-world data, normalizing the result makes it possible to compare correlations across signals of different amplitude. NumPy doesn’t have a built-in function for normalized cross-correlation, but it is easy to compute manually by dividing the raw result by the product of the two signals’ Euclidean norms. Here is an example:

def normalize_cross_correlation(x, y):
    norm = np.sqrt(np.sum(x ** 2)) * np.sqrt(np.sum(y ** 2))
    return np.correlate(x, y, 'full') / norm

normalized_corr = normalize_cross_correlation(signal_1, signal_2)
normalized_corr = normalized_corr[normalized_corr.size // 2:]

plt.plot(normalized_corr)
plt.title('Normalized Cross-correlation')
plt.show()

By the Cauchy-Schwarz inequality, this normalization bounds the output between -1 and 1. Values near 1 indicate a strong match at that lag, and values near -1 indicate a strong inverse match.
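The normalization above does not subtract the mean of each signal, so a constant offset can inflate the values. A variant that removes the means first is sketched below; at zero lag it should agree with the Pearson coefficient returned by np.corrcoef(). The function name normalized_cross_correlation_zero_mean and the offset test signals are hypothetical, introduced only for this example.

```python
import numpy as np

def normalized_cross_correlation_zero_mean(x, y):
    """Hypothetical helper: remove each mean, then scale by the Euclidean norms."""
    xc = x - np.mean(x)
    yc = y - np.mean(y)
    norm = np.linalg.norm(xc) * np.linalg.norm(yc)
    return np.correlate(xc, yc, mode='full') / norm

t = np.linspace(0, 10, 200)
a = np.sin(t) + 5.0   # assumed signal with a large constant offset
b = np.cos(t) - 3.0   # assumed signal with a different offset

ncc = normalized_cross_correlation_zero_mean(a, b)

# In the 'full' output, zero lag sits at index len(second argument) - 1
zero_lag = ncc[b.size - 1]
pearson = np.corrcoef(a, b)[0, 1]

print(zero_lag, pearson)
```

Whether to remove the mean depends on the data: for signals that are already centered around zero the two variants give nearly the same result.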

Conclusion

In this guide, we explored how to use NumPy to perform cross-correlation and autocorrelation. Starting from basic implementations with np.correlate(), we worked our way up to normalized cross-correlation, which makes results comparable across signals of different scale. In both time-series analysis and signal processing, these techniques help uncover periodicity and lagged relationships in your data.
