SciPy cluster.hierarchy.is_monotonic() function (3 examples)

Updated: March 4, 2024 By: Guest Contributor

Introduction

SciPy’s cluster.hierarchy.is_monotonic() function checks whether a linkage array is monotonic, that is, whether the merge distances it records never decrease from one merge to the next. A monotonic linkage yields a dendrogram without inversions.

Understanding Monotonicity and Hierarchical Clustering

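Before we dive into the is_monotonic() function, it’s crucial to understand what monotonicity means in the context of hierarchical clustering. Agglomerative (bottom-up) clustering starts with every observation in its own cluster and merges the two closest clusters at each step. SciPy records each merge as one row of the linkage array: the indices of the two merged clusters, the distance between them, and the number of observations in the new cluster. The linkage array is monotonic when these merge distances never decrease from one row to the next, that is, when no merge happens at a smaller distance than an earlier one.

A non-monotonic linkage produces inversions in the dendrogram, the tree diagram of the merge sequence: a parent node sits lower than one of its children, which makes the plot hard to read and makes cutting the tree at a fixed height ambiguous.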
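The idea can be checked by hand. The following is a minimal sketch, not SciPy’s own implementation: it relies on the documented linkage format, in which the third column holds the merge distance, and compares consecutive distances with NumPy. The names points, merge_heights and manual_check are made up for this illustration.

```python
import numpy as np
from scipy.cluster import hierarchy

# Small fixed dataset so the result is reproducible.
points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 3.0], [10.0, 10.0]])
z = hierarchy.linkage(points, 'average')

# Column 2 of the linkage array holds the distance at which each merge happened.
merge_heights = z[:, 2]

# Monotonic: no merge happens at a smaller distance than the one before it.
manual_check = bool(np.all(np.diff(merge_heights) >= 0))

print(merge_heights)
print(manual_check, hierarchy.is_monotonic(z))
```

Both values printed on the last line should agree, since the manual comparison and is_monotonic() test the same property.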

Installation and Basic Usage

First, ensure that SciPy is installed in your environment. Use pip:

pip install scipy

After installation, we can start with a simple example using randomly generated data:

from scipy.cluster import hierarchy
import numpy as np

data = np.random.rand(10, 2)
z = hierarchy.linkage(data, 'single')
print(hierarchy.is_monotonic(z))

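This snippet creates random data, applies single-linkage clustering, and checks whether the resulting linkage array is monotonic. is_monotonic() returns a boolean. Here it prints True: single linkage always produces non-decreasing merge distances, as do the complete, average, weighted, and Ward methods. Only the centroid and median methods can produce inversions.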
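To see is_monotonic() return False, we need a linkage method that can produce inversions. The sketch below uses centroid linkage on three hand-picked points forming a nearly equilateral triangle (the coordinates and the variable names are chosen only for this illustration). The two bottom points merge first at distance 1.0, but their centroid (0.5, 0) lies only 0.9 away from the third point, so the second merge happens at a smaller distance than the first.

```python
import numpy as np
from scipy.cluster import hierarchy

# Three points forming a nearly equilateral triangle.
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.9]])

z_single = hierarchy.linkage(triangle, 'single')
z_centroid = hierarchy.linkage(triangle, 'centroid')

# Print the merge distances and the monotonicity check for each method.
print('single:  ', z_single[:, 2], hierarchy.is_monotonic(z_single))
print('centroid:', z_centroid[:, 2], hierarchy.is_monotonic(z_centroid))
```

On the same three points, single linkage stays monotonic while centroid linkage does not.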

Validating Monotonicity in Real-World Data

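Let’s consider a real-world dataset to illustrate the use of this function. We’ll use the Iris dataset, a popular choice in machine learning for demonstrating algorithms. This example loads the data through scikit-learn, so install it first if needed (pip install scikit-learn):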

from scipy.cluster import hierarchy
from sklearn.datasets import load_iris

# Load the 150 x 4 feature matrix of the Iris dataset
iris = load_iris()
data = iris.data

# Build the linkage array with Ward's method and check it
z = hierarchy.linkage(data, 'ward')
print('Monotonic:', hierarchy.is_monotonic(z))

Here, we used the Ward method to create the linkage array. Ward linkage always yields non-decreasing merge distances, so is_monotonic() returns True. Note that this only confirms the dendrogram has no inversions; it says nothing about how well the clusters match the structure of the data.

Output:

Monotonic: True
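When it is not yet decided which linkage method to use, the same check can be run across all of them. The sketch below is an illustration under stated assumptions: it uses seeded random data as a stand-in for a real dataset (150 observations with 4 features, the same shape as Iris) so that it runs without scikit-learn, and the results dictionary is just a convenient way to collect the outcomes.

```python
import numpy as np
from scipy.cluster import hierarchy

# Synthetic stand-in for a real dataset: 150 observations, 4 features.
rng = np.random.default_rng(0)
data = rng.random((150, 4))

methods = ['single', 'complete', 'average', 'weighted', 'ward', 'centroid', 'median']

# Map each linkage method to the outcome of the monotonicity check.
results = {}
for method in methods:
    z = hierarchy.linkage(data, method)
    results[method] = bool(hierarchy.is_monotonic(z))

for method, ok in results.items():
    print(f'{method:>8}: {ok}')
```

The first five methods always report True. The centroid and median results depend on the data, which is why the check is most useful for those two.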

Advanced Usage: Custom Data and Visualization

For those interested in a deeper analysis, it helps to view the dendrogram alongside the result of is_monotonic(). This next example combines hand-written data, complete linkage, and a dendrogram plot (it requires matplotlib):

from scipy.cluster import hierarchy
import matplotlib.pyplot as plt
import numpy as np

data = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 45],
                 [85, 70], [71, 80], [60, 78], [70, 55], [80, 91]])
z = hierarchy.linkage(data, 'complete')
monotonic = hierarchy.is_monotonic(z)

plt.title(f'Hierarchy Monotonic: {monotonic}')
hierarchy.dendrogram(z)
plt.show()

Output: a dendrogram of the ten points, titled “Hierarchy Monotonic: True”.

This code takes ten hand-written data points, applies complete-linkage clustering, and plots the resulting dendrogram. The title of the plot shows the result of is_monotonic(); because complete linkage never produces inversions, it reads True here. Such visual checks help in seeing how the theory translates into practice.
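Beyond plotting, the check can guard later steps that depend on merge heights. The sketch below wraps SciPy’s fcluster() in a hypothetical helper, cut_if_monotonic(), which is not part of SciPy; it refuses to cut the tree at a distance threshold when the linkage has inversions. The threshold of 40 is an arbitrary value picked for this data.

```python
import numpy as np
from scipy.cluster import hierarchy


def cut_if_monotonic(z, threshold):
    """Hypothetical helper: cut the tree at `threshold` only if z is monotonic."""
    if not hierarchy.is_monotonic(z):
        raise ValueError('linkage has inversions; a distance cut is ambiguous')
    # Flat cluster labels: observations whose merges stay below the threshold
    # end up in the same cluster.
    return hierarchy.fcluster(z, t=threshold, criterion='distance')


data = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 45],
                 [85, 70], [71, 80], [60, 78], [70, 55], [80, 91]])
z = hierarchy.linkage(data, 'complete')

labels = cut_if_monotonic(z, threshold=40)
print(labels)
```

Raising an error is one possible design; depending on the workflow, logging a warning or switching to another linkage method may be more appropriate.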

Conclusion

The is_monotonic() function of SciPy’s clustering hierarchy module is a quick check for inversions in a linkage array. Through three examples, we’ve seen how it fits into hierarchical clustering workflows: compute the linkage, confirm that merge distances never decrease, then plot or cut the tree. The check matters most with the centroid and median methods, which can produce inversions; for the other built-in methods it serves as a sanity check.