How to Combine, Stack, and Split Arrays in NumPy

Updated: January 23, 2024 By: Guest Contributor

Introduction

NumPy is a fundamental package for scientific computing in Python. It provides large, multi-dimensional array and matrix data structures, along with a collection of high-level mathematical functions that operate on them. This tutorial covers several techniques for combining, stacking, and splitting arrays with NumPy, each with a code example and its output. These operations come up constantly in data manipulation, statistical analysis, and the preprocessing steps of machine learning tasks.

Prerequisites

  • Basic knowledge of Python programming
  • An installed version of Python and NumPy

Combining Arrays

Combining arrays involves joining multiple arrays into one. There are different ways to combine arrays based on the desired dimensionality of the output.

Concatenation

The numpy.concatenate() function joins two or more arrays along an existing axis. The default is axis=0.

import numpy as np

# Create two arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Concatenate arrays
result = np.concatenate((a, b))
print(result)

Output: [1 2 3 4 5 6]

This operation can be extended to two-dimensional arrays as well.

# Create two 2D arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# Concatenate along the first axis
result = np.concatenate((a, b), axis=0)
print(result)

Output:
[[1 2]
 [3 4]
 [5 6]]
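The same call can join along the second axis instead. The following is a minimal sketch with illustrative variable names (left, right, joined). For axis=1 the arrays must have the same number of rows, so the second array here is given shape (2, 1).

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])   # shape (2, 2)
right = np.array([[5], [6]])        # shape (2, 1): same row count as left

# Join column-wise: shapes must match on every axis except axis 1
joined = np.concatenate((left, right), axis=1)
print(joined.shape)  # (2, 3)
print(joined)
```

Each row of right is appended to the matching row of left, so the result has rows [1 2 5] and [3 4 6].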

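numpy.concatenate() requires all inputs to have the same number of dimensions and matching shapes on every axis except the one being joined. numpy.vstack() and numpy.hstack() are convenience functions for vertical (row-wise) and horizontal (column-wise) joining respectively. vstack() also accepts 1D arrays by treating each one as a single row, but the shapes must still be compatible.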
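To make the shape rules concrete, the sketch below (variable names are illustrative) joins a 2D array with a 1D array. np.concatenate() rejects the pair with a ValueError, while np.vstack() accepts it because it first promotes the 1D input to a single row.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])  # shape (2, 2)
row = np.array([5, 6])               # shape (2,), one dimension fewer

# concatenate needs every input to have the same number of dimensions
try:
    np.concatenate((matrix, row), axis=0)
    failed = False
except ValueError:
    failed = True
print(failed)  # True

# vstack treats the 1D array as a single row of shape (1, 2)
stacked = np.vstack((matrix, row))
print(stacked.shape)  # (3, 2)
```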

Stacking

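Stacking is closely related to concatenation. numpy.vstack() joins arrays row-wise and numpy.hstack() joins them column-wise. With 1D inputs, vstack() first turns each array into a row, so the result gains a dimension, while hstack() simply joins them end to end. To join arrays along a genuinely new axis, use numpy.stack().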

Vertical stacking with numpy.vstack():

# Create two arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vertically stack the arrays
result = np.vstack((a, b))
print(result)

Output:
[[1 2 3]
 [4 5 6]]

Horizontal stacking with numpy.hstack():

# Horizontally stack the arrays
result = np.hstack((a, b))
print(result)

Output: [1 2 3 4 5 6]
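For joining along a new axis, NumPy also provides numpy.stack(). The sketch below reuses the same two 1D arrays. The names rows and cols are illustrative, and the axis argument selects where the new dimension is inserted.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# axis=0: each input becomes a row of the result
rows = np.stack((a, b), axis=0)
print(rows.shape)  # (2, 3)

# axis=1: each input becomes a column of the result
cols = np.stack((a, b), axis=1)
print(cols.shape)  # (3, 2)
print(cols)
```

Unlike concatenate(), stack() requires every input to have exactly the same shape, because the inputs are laid side by side along the new axis.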

Splitting Arrays

Splitting arrays is the reverse operation of combining. NumPy offers several functions for splitting arrays into multiple sub-arrays.

Simple Splitting

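Using numpy.split(), we can divide an array into multiple sub-arrays of equal size. If the array cannot be divided evenly into the requested number of parts, numpy.split() raises a ValueError. For near-equal parts, use numpy.array_split() instead.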

# Create an array
a = np.arange(9)

# Split the array into three equal parts
result = np.split(a, 3)
print(result)

Output: [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

You can also specify the indices at which you want the split to occur.

# Split the array at specified indices
result = np.split(a, [3, 5])
print(result)

Output: [array([0, 1, 2]), array([3, 4]), array([5, 6, 7, 8])]
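When the length is not divisible by the number of parts, numpy.array_split() can be used in place of numpy.split(). This is a minimal sketch with a 10-element array chosen for illustration, and the variable names are not from the original examples.

```python
import numpy as np

a = np.arange(10)

# np.split needs an even division; 10 elements into 3 parts fails
try:
    np.split(a, 3)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)  # True

# array_split allows uneven parts; the leading parts get the extra elements
parts = np.array_split(a, 3)
sizes = [len(p) for p in parts]
print(sizes)  # [4, 3, 3]
```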

Horizontal and Vertical Splitting

For multi-dimensional arrays, use numpy.hsplit() to split column-wise (along axis 1) and numpy.vsplit() to split row-wise (along axis 0).

# Create a 2D array
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Horizontal splitting
result = np.hsplit(a, 2)
print(result)

# Vertical splitting
result = np.vsplit(a, 3)
print(result)

Output (hsplit): [array([[ 1, 2], [ 5, 6], [ 9, 10]]), array([[ 3, 4], [ 7, 8], [11, 12]])]

Output (vsplit): [array([[1, 2, 3, 4]]), array([[5, 6, 7, 8]]), array([[ 9, 10, 11, 12]])]
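Both helpers are shorthand for numpy.split() with an explicit axis argument. The sketch below, using illustrative names, splits the same 2D array along axis 1 and then checks that concatenating the pieces along that axis recovers the original.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Equivalent to np.hsplit(a, 2): two blocks of two columns each
left, right = np.split(a, 2, axis=1)
print(left.shape, right.shape)  # (3, 2) (3, 2)

# Joining the pieces along the same axis restores the original array
restored = np.concatenate((left, right), axis=1)
print(np.array_equal(restored, a))  # True
```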

Advanced Array Manipulation

Now let’s explore some advanced techniques for reshaping and manipulating arrays in NumPy.

Adding New Axes

numpy.newaxis inserts a new axis of length 1 when used inside an index expression, which increases the number of dimensions of the array by one.

# Create a 1D array
a = np.array([1, 2, 3])

# Convert to a 2D column vector
result = a[:, np.newaxis]
print(result)

Output:
[[1]
 [2]
 [3]]
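The position of np.newaxis in the index decides where the new axis goes, and numpy.expand_dims() does the same thing through a function call. The sketch below uses illustrative names (row, col, table) and also shows one common reason to add an axis: letting a column and a row broadcast against each other.

```python
import numpy as np

a = np.array([1, 2, 3])

row = a[np.newaxis, :]           # shape (1, 3)
col = np.expand_dims(a, axis=1)  # shape (3, 1), same as a[:, np.newaxis]
print(row.shape, col.shape)  # (1, 3) (3, 1)

# Broadcasting a (3, 1) column against a (1, 3) row gives a 3x3 table of sums
table = col + row
print(table)
```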

Repeat and Tile Arrays

Arrays can be replicated with numpy.repeat() and numpy.tile(). repeat() duplicates each element in place, while tile() repeats the array as a whole.

# Repeat elements of an array
result = np.repeat(a, 3)
print(result)

# Tile an array
result = np.tile(a, 3)
print(result)

Output (repeat): [1 1 1 2 2 2 3 3 3]

Output (tile): [1 2 3 1 2 3 1 2 3]
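Both functions also work on 2D arrays. The sketch below uses a small matrix m and illustrative result names. repeat() takes an axis argument, and tile() takes a tuple giving the number of copies along each axis.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

# repeat with axis=0 duplicates each row in place
repeated = np.repeat(m, 2, axis=0)
print(repeated.tolist())  # [[1, 2], [1, 2], [3, 4], [3, 4]]

# tile with reps=(2, 1) lays out two copies of the whole array vertically
tiled = np.tile(m, (2, 1))
print(tiled.tolist())  # [[1, 2], [3, 4], [1, 2], [3, 4]]
```

Without an axis argument, repeat() flattens the input first, so passing axis explicitly is usually what you want for multi-dimensional data.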

Conclusion

In this tutorial, we have explored how to combine, stack, and split arrays in NumPy, showcasing a range of functions suited to various data manipulation needs. The ability to reshape and adjust the structure of data sets is a powerful skill in data science and programming, making NumPy an indispensable tool in the programmer’s toolkit.