How to Create Customized NumPy Builds for Specific Needs

Updated: January 23, 2024 By: Guest Contributor

Introduction

NumPy is an essential library for scientific computing in Python, offering a powerful array object and tools for integrating C, C++, and Fortran code. However, as you delve into more specialized areas, the generic built-in functionality may not fully address your needs. There may be scenarios where you want to optimize for specific hardware, strip away unnecessary parts to reduce size, or add custom functionality. This tutorial will guide you through creating your own customized NumPy builds.

Prerequisites

Before we start, make sure you have the following:

  • A working installation of Python 3.
  • Pip – Python’s package installer.
  • A C compiler compatible with your platform, like gcc on Linux or MSVC on Windows.
  • Knowledge of how to navigate and execute commands in your shell of choice.
  • Basic knowledge about NumPy and Python.

If you’re missing any of the above, be sure to install them and familiarize yourself with the basics before proceeding.
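The checklist above can be verified with a short script. This is a minimal sketch using only the standard library; the function name check_prerequisites and the list of compiler executables it looks for are assumptions for illustration, not part of NumPy.

```python
import shutil
import sys
from importlib.util import find_spec


def check_prerequisites(compilers=("gcc", "cc", "clang", "cl")):
    """Return a dict describing which build prerequisites were found."""
    # Pick the first candidate compiler executable that is on PATH, if any.
    found_compiler = next((c for c in compilers if shutil.which(c)), None)
    return {
        "python3": sys.version_info.major == 3,
        # find_spec reports whether pip is importable without importing it.
        "pip": find_spec("pip") is not None,
        "compiler": found_compiler,  # None if no candidate is on PATH
    }


report = check_prerequisites()
for name, value in report.items():
    print(f"{name}: {value}")
```

A compiler value of None only means that none of the listed executable names is on PATH; a compiler installed under another name would need to be added to the tuple.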

Forking and Cloning the NumPy Repository

The first step in creating a customized build is to fork and clone the NumPy repository:

git clone https://github.com/numpy/numpy.git
cd numpy

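Forking the repository on GitHub gives you your own copy, allowing you to make changes without affecting the original codebase. The command above clones the upstream repository; to work from your fork, substitute your fork's URL.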

Understanding the NumPy Build System

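NumPy releases before 1.26 use a configuration system based on Python scripts and numpy.distutils, an extension of the standard distutils package, to facilitate the build process. It allows for customization and extension of build options. NumPy 1.26 and later are built with Meson instead, so the setup.py-based steps in this tutorial apply to the older releases.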

Core components of NumPy’s build system:

  1. Distutils: NumPy’s build system relies heavily on distutils. This module offers support for building and installing additional modules into a Python installation. In the case of NumPy, distutils is used to compile the core C and Fortran code that NumPy relies on for high-performance numerical processing.
  2. Python Scripts: The build process is orchestrated by Python scripts. These scripts are responsible for defining the build configuration, including specifying compiler flags, linking libraries, and other build parameters. This approach leverages Python’s scripting capabilities to create a flexible and extendable build environment.
  3. Configuration Files: NumPy utilizes configuration files (usually written in Python) that define various aspects of the build process. These files allow for the customization of compile-time options, enabling NumPy to be tailored to different system environments and requirements.
  4. Extension Modules: NumPy extends Python’s capabilities by including extension modules written in C and Fortran. These modules are compiled during the build process and are integral for performance-critical operations. The build system ensures that these modules are properly compiled and linked to the necessary libraries.
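To make these components concrete, here is a minimal sketch of the kind of configuration function that numpy.distutils-based setup.py files define. The package name my_package, the subpackage helpers, and the extension _fast are hypothetical names chosen for illustration.

```python
def configuration(parent_package="", top_path=None):
    # Imported inside the function so that merely importing this file
    # does not require numpy.distutils to be available.
    from numpy.distutils.misc_util import Configuration

    config = Configuration("my_package", parent_package, top_path)
    # Hypothetical pure-Python subpackage and C extension module.
    config.add_subpackage("helpers")
    config.add_extension("_fast", sources=["_fast.c"])
    return config


# In a real setup.py this would be followed by:
#     from numpy.distutils.core import setup
#     setup(configuration=configuration)
```

The function only describes what to build. The compiler invocation, flags, and linking are handled by the build commands when setup() runs.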

Customizing the Build Configuration

Here’s how you can customize the build configuration:

  • Use the file site.cfg to specify paths to libraries and include files if you’re linking against optimized versions of BLAS or LAPACK.
  • Modify numpy/distutils/system_info.py to adjust the discovery process of various libraries and system features.
  • Add or remove functionality by modifying corresponding Python files and C source files accordingly.
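As an illustration of the first option, the sketch below generates the text of a site.cfg section for OpenBLAS. The key names follow the site.cfg.example file shipped in the repository; the install prefix /opt/OpenBLAS is an assumption and should be replaced with the location on your system.

```python
import configparser
import io

# Hypothetical install prefix for an optimized OpenBLAS; adjust to your system.
OPENBLAS_PREFIX = "/opt/OpenBLAS"

config = configparser.ConfigParser()
config["openblas"] = {
    "libraries": "openblas",
    "library_dirs": f"{OPENBLAS_PREFIX}/lib",
    "include_dirs": f"{OPENBLAS_PREFIX}/include",
    "runtime_library_dirs": f"{OPENBLAS_PREFIX}/lib",
}

# Render to a string here; save it as site.cfg in the repository root to use it.
buffer = io.StringIO()
config.write(buffer)
site_cfg_text = buffer.getvalue()
print(site_cfg_text)
```

Writing the file by hand works just as well. Generating it is mainly useful when the same build is scripted for several machines with different library locations.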

Example: Disabling Unused Modules

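Save resources by disabling modules you don’t need. Comment out the corresponding lines in numpy/setup.py, where the subpackages are registered. Note that numpy/__init__.py imports these subpackages at startup and other parts of NumPy may depend on them, so the matching imports have to be removed as well or import numpy will fail: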

# config.add_subpackage('random')
# config.add_subpackage('fft')
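After a trimmed build is installed, it can help to confirm which subpackages are actually absent. The helper below is a sketch using the standard library; the name missing_subpackages is hypothetical.

```python
from importlib.util import find_spec


def missing_subpackages(package, names):
    """Return the names in `names` that cannot be found under `package`."""
    missing = []
    for name in names:
        try:
            # find_spec imports the parent package, then looks for the child.
            spec = find_spec(f"{package}.{name}")
        except ImportError:
            # Raised when the parent package itself cannot be imported.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# The standard library's xml package is used so the sketch runs anywhere.
print(missing_subpackages("xml", ["dom", "no_such_module"]))
```

For a custom build the call would be missing_subpackages("numpy", ["fft", "random"]). If numpy itself fails to import, every name is reported as missing, which usually points to a leftover import in numpy/__init__.py.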

Optimizing for Specific Hardware

Customization can also mean optimizing for specific hardware. Here’s how to target certain CPU features:

CFLAGS="-march=native" python setup.py build_ext

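This tells the compiler to optimize for the CPU of the build machine. If you are building for a different machine, replace native with the architecture name of your target platform (for example, -march=haswell). The -march option applies to GCC and Clang; MSVC uses /arch instead.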
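When the build is driven from a script, the compiler flags can be passed through the environment. The following is a sketch under the assumption that the build honors the CFLAGS environment variable; make_build_env is a hypothetical helper, not a NumPy function.

```python
import os


def make_build_env(march="native", base_env=None):
    """Return a copy of the environment with -march appended to CFLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("CFLAGS", "")
    # Keep any flags that were already set and append the architecture flag.
    env["CFLAGS"] = f"{existing} -march={march}".strip()
    return env


env = make_build_env("native", base_env={})
print(env["CFLAGS"])
# To run the build with it (not executed here):
#     subprocess.run([sys.executable, "setup.py", "build_ext"], env=env, check=True)
```

Returning a copy instead of modifying os.environ keeps the flag from leaking into unrelated commands run later by the same script.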

Adding Custom Modules to NumPy

To add a custom module to NumPy, create a new directory inside the numpy folder:

mkdir numpy/my_module
touch numpy/my_module/__init__.py

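Place your new module’s code inside my_module, ensuring any C extensions are properly set up in a setup.py file within the same directory. Then register the package by adding config.add_subpackage('my_module') to numpy/setup.py so that the build includes it.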

Example: A Simple Custom Module

Create a new file numpy/my_module/my_function.py:

def custom_add(a, b):
    return a + b

Update numpy/my_module/__init__.py:

from .my_function import custom_add

__all__ = ['custom_add']

Now you can access this function within NumPy:

import numpy as np
import numpy.my_module  # not imported by numpy/__init__.py, so import it explicitly

print(np.my_module.custom_add(2, 3))

Output:

5
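A new module should come with tests of its own. The sketch below uses the standard library's unittest so that it runs without a NumPy build; custom_add is redefined here as a stand-in, and in the source tree the test would import it from numpy.my_module instead (NumPy's own tests are written for pytest).

```python
import unittest


def custom_add(a, b):
    # Stand-in copy of the function from numpy/my_module/my_function.py.
    return a + b


class TestCustomAdd(unittest.TestCase):
    def test_integers(self):
        self.assertEqual(custom_add(2, 3), 5)

    def test_lists_concatenate(self):
        # "+" on Python lists concatenates; it does not add element-wise.
        self.assertEqual(custom_add([1, 2], [3]), [1, 2, 3])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCustomAdd)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test documents a behavior that is easy to overlook: because the function simply applies +, its result depends on the operand types.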

Testing Your Custom Build

Make sure your customized build works as expected. Run NumPy’s extensive test suite to catch any issues:

python runtests.py -v

Look for failed tests and address the underlying issues before you consider your custom build stable.
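Before trusting test results, it is worth confirming that the module under test is your custom build and not a copy installed elsewhere. This is a small sketch; describe_module is a hypothetical helper, and the standard library's json module stands in for numpy so the snippet runs anywhere.

```python
import json  # stands in for numpy so the sketch runs without a NumPy build


def describe_module(module):
    """Return the name, version (if any), and file location of a module."""
    return {
        "name": module.__name__,
        "version": getattr(module, "__version__", None),
        "file": getattr(module, "__file__", None),
    }


info = describe_module(json)
print(info)
```

With a custom build, describe_module(numpy) should report a file path inside your build or install directory. A path in another site-packages directory suggests that a different NumPy is being picked up.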

Building NumPy Wheels

Finally, to distribute your customized build, package it into wheel files:

python setup.py bdist_wheel

These wheel files can be installed using pip on compatible systems.
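Whether a system is compatible is encoded in the wheel's file name, which follows the PEP 427 naming scheme. The parser below is a simplified sketch, and the example file name is hypothetical rather than the output of an actual build.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    # The optional build tag, when present, sits between version and python tag.
    build = parts[2] if len(parts) == 6 else None
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


# Hypothetical file name of the kind bdist_wheel writes to dist/.
tags = parse_wheel_filename("numpy-1.25.2-cp311-cp311-linux_x86_64.whl")
print(tags["python"], tags["abi"], tags["platform"])
```

A wheel built with flags such as -march=native carries the same tags as a generic build, so the file name alone does not show that the binary needs a particular CPU. Keep track of that separately when distributing it.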

In complex scenarios, consider tools like Docker or other containerization technologies to distribute your NumPy build within a controlled environment, which helps keep behavior consistent across different systems.

Conclusion

Creating a customized NumPy build allows you to tailor the library to your specific needs. Whether you are optimizing performance, reducing package size, or adding custom functionality, the steps above give you a starting point for modifying and building NumPy to meet your requirements.