NumPy: How to calculate Euclidean and Manhattan distances

Updated: January 23, 2024 By: Guest Contributor

Introduction

The distance between two points is a fundamental concept in mathematics, with numerous applications in fields like machine learning, data analysis, and physics. NumPy, a powerful Python library for numerical computing, offers efficient ways to compute these distances. In this tutorial, we’ll explore how to calculate both Euclidean and Manhattan distances using NumPy.

Explaining Distance Metrics

The Euclidean distance is the ‘straight-line’ distance between two points in Euclidean space, whether that is a plane, three-dimensional space, or a space with any number of dimensions. The Manhattan distance, also known as the taxicab or city block distance, is the sum of the absolute differences between the points’ coordinates. Distance measures like these underpin many algorithms, such as k-nearest neighbors (k-NN) and k-means clustering.
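In symbols, for two points p = (p_1, ..., p_n) and q = (q_1, ..., q_n), the two metrics are defined as follows. The worked line underneath uses the example points (1, 2, 3) and (4, 5, 6) that appear in the code in this tutorial, where every coordinate difference is -3.

$$
d_E(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}, \qquad
d_M(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} \lvert p_i - q_i \rvert
$$

$$
d_E = \sqrt{9 + 9 + 9} = \sqrt{27} \approx 5.196, \qquad
d_M = 3 + 3 + 3 = 9
$$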

Setting up the Environment

Make sure you have Python and NumPy installed. You can install NumPy using pip:

pip install numpy

Calculating Euclidean Distance

To calculate the Euclidean distance between two points, you can use NumPy’s np.linalg.norm function. Here is an example:

import numpy as np

point1 = np.array((1, 2, 3))
point2 = np.array((4, 5, 6))

euclidean_distance = np.linalg.norm(point1 - point2)
print('Euclidean Distance:', euclidean_distance)

By default, np.linalg.norm computes the L2 (Euclidean) norm of a vector, that is, its length. Subtracting point2 from point1 gives the difference vector between the two points, so the length of that vector is the Euclidean distance between them.
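To see that np.linalg.norm is simply applying the Euclidean formula, the short sketch below computes the same distance three ways: with the explicit square root of summed squared differences, with np.linalg.norm, and with the standard library’s math.dist (available in Python 3.8 and later). All three should agree up to floating-point rounding.

```python
import math

import numpy as np

point1 = np.array((1, 2, 3))
point2 = np.array((4, 5, 6))

# Explicit formula: square each coordinate difference, sum them, take the square root
manual = np.sqrt(np.sum((point1 - point2) ** 2))

# Library call: for a 1-D array, the default norm is the L2 (Euclidean) norm
via_norm = np.linalg.norm(point1 - point2)

# Standard-library equivalent (Python 3.8+); plain lists are passed for clarity
via_math = math.dist(point1.tolist(), point2.tolist())

print(manual, via_norm, via_math)
```

The explicit version is handy when you want to see or modify each step (for example, to weight coordinates), while np.linalg.norm is shorter and also works on whole arrays of points.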

Manhattan Distance

Now, let’s look at how we can calculate the Manhattan distance. We need to compute the sum of absolute differences:

import numpy as np

point1 = np.array([1, 2, 3])
point2 = np.array([4, 5, 6])

manhattan_distance = np.sum(np.abs(point1 - point2))
print('Manhattan Distance:', manhattan_distance)

In this code, np.abs computes the absolute value of each coordinate difference, and np.sum aggregates these values to find the total Manhattan distance.
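The Manhattan distance is also the L1 norm of the difference vector, so np.linalg.norm can compute it directly when you pass ord=1. The sketch below checks that this matches the explicit sum of absolute differences. Note that np.linalg.norm converts integer input to floating point and returns a float, whereas np.sum on an integer array returns an integer.

```python
import numpy as np

point1 = np.array([1, 2, 3])
point2 = np.array([4, 5, 6])

diff = point1 - point2  # every coordinate difference is -3

# Explicit definition: sum of absolute coordinate differences (integer result here)
manual = np.sum(np.abs(diff))

# The same quantity expressed as the L1 norm of the difference vector (float result)
via_norm = np.linalg.norm(diff, ord=1)

print(manual, via_norm)  # 9 9.0
```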

NumPy Optimizations

When you’re working with many points, vectorization lets you compute all of the distances in a single array operation instead of looping over the points in Python. Because NumPy carries out these bulk operations in compiled code, this is typically much faster than an explicit Python loop.

Consider the following example, where we calculate the Euclidean distance for multiple pairs of points:

import numpy as np

points1 = np.array([[1, 2], [3, 4]])
points2 = np.array([[5, 6], [7, 8]])

distances = np.linalg.norm(points1 - points2, axis=1)
print('Euclidean Distances:', distances)

The axis=1 argument tells np.linalg.norm to compute the norm of each row of points1 - points2 separately, so the result contains one distance for each pair of corresponding points.
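The same axis argument works for the Manhattan distance. When you need the distance from every point in one set to every point in another, rather than only between corresponding rows, broadcasting can build the full distance matrix without a Python loop. In this sketch, indexing with None inserts a new axis so that an (m, d) array and an (n, d) array broadcast to an (m, n, d) array of differences. Keep in mind that this intermediate array grows with m × n × d, so for very large inputs you may need to process the points in chunks.

```python
import numpy as np

points1 = np.array([[1, 2], [3, 4]])
points2 = np.array([[5, 6], [7, 8]])

diff = points1 - points2

# Row-wise distances between corresponding points
euclidean_pairs = np.linalg.norm(diff, axis=1)
manhattan_pairs = np.sum(np.abs(diff), axis=1)
print('Manhattan Distances:', manhattan_pairs)  # [8 8]

# All-pairs distances: (m, 1, d) - (1, n, d) broadcasts to (m, n, d)
all_diff = points1[:, None, :] - points2[None, :, :]
euclidean_matrix = np.linalg.norm(all_diff, axis=-1)  # shape (m, n)
manhattan_matrix = np.sum(np.abs(all_diff), axis=-1)  # shape (m, n)

# Entry [i, j] is the distance from points1[i] to points2[j]
print(manhattan_matrix)  # [[ 8 12]
                         #  [ 4  8]]
```

The diagonal of each matrix reproduces the row-wise results, since entry [i, i] compares the i-th point of each array.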

Conclusion

Calculating Euclidean and Manhattan distances is a basic but important operation in data science. NumPy provides a simple and efficient way to perform these calculations. Once you know how to implement them with NumPy, you can apply them to tasks such as measuring the similarity between data points or clustering points in high-dimensional spaces.

This tutorial demonstrated the basic approach to calculating these distances, providing you with a solid foundation for more complex analyses in your projects.