NumPy: Using char.upper() and char.lower() functions (4 examples)

Updated: February 29, 2024 By: Guest Contributor

Introduction

NumPy, a fundamental package for numerical computing in Python, offers a wide range of functions for manipulating arrays efficiently. For text data, its numpy.char module provides two convenient functions that change the case of every string in an array: char.upper() and char.lower(). These functions are essential for text preprocessing, especially when a dataset needs a uniform case format. This tutorial explores how to use them through four incremental examples.

The Fundamentals

Syntax of numpy.char.upper():

numpy.char.upper(a)

Here, a is an array_like of str or unicode: the input array of strings to convert to uppercase. The function returns an array of the same shape as a, with each element converted to uppercase (it calls Python's str.upper on each element).
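Because a only needs to be array_like, a plain Python list works as input, and byte strings are accepted as well as Unicode strings. The short sketch below illustrates both cases; the variable names are illustrative only.

```python
import numpy as np

# A plain Python list is array_like, so it can be passed directly
words = ["alpha", "Beta", "GAMMA"]
upper_words = np.char.upper(words)
print(upper_words)        # ['ALPHA' 'BETA' 'GAMMA']
print(type(upper_words))  # <class 'numpy.ndarray'>

# Byte strings (dtype 'S') are converted too
raw = np.array([b"abc", b"Xyz"])
print(np.char.upper(raw))  # [b'ABC' b'XYZ']
```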

Syntax of numpy.char.lower():

numpy.char.lower(a)

Here, a is an array_like of str or unicode: the input array of strings to convert to lowercase. The function returns an array of the same shape as a, with each element converted to lowercase (it calls Python's str.lower on each element).
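Since char.lower() applies str.lower to each element, its result should match a plain Python list comprehension over the same strings. The quick check below confirms this for a small, made-up array and also confirms that the shape is preserved.

```python
import numpy as np

names = np.array(["Alice", "BOB", "ChArLiE"])
lowered = np.char.lower(names)

# Same result as calling Python's str.lower on each element in turn
expected = [s.lower() for s in names.tolist()]
print(lowered.tolist() == expected)  # True
print(lowered.shape == names.shape)  # True
```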

Example 1: Basic Usage of char.upper() and char.lower()

Starting with the basics, let’s apply char.upper() and char.lower() to a simple array of strings:

import numpy as np

arr = np.array(['numpy', 'Python', 'DATA', 'science'])

# Convert to uppercase
upper_arr = np.char.upper(arr)
print(upper_arr)

# Convert to lowercase
lower_arr = np.char.lower(arr)
print(lower_arr)

The output will be:

['NUMPY' 'PYTHON' 'DATA' 'SCIENCE']
['numpy' 'python' 'data' 'science']

This example illustrates how easily you can convert the case of every string in an array at once, which is useful for tasks such as case-insensitive comparison of string data.
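As a sketch of that comparison use case, suppose two arrays of answers (hypothetical data) should be matched regardless of case. Lowercasing both sides before comparing with == yields an element-wise boolean array.

```python
import numpy as np

submitted = np.array(["Yes", "NO", "maybe", "yes"])
reference = np.array(["yes", "no", "Maybe", "no"])

# Lowercase both arrays so the element-wise comparison ignores case
matches = np.char.lower(submitted) == np.char.lower(reference)
print(matches)  # [ True  True  True False]
```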

Example 2: Applying Case Conversion on 2D Arrays

Next, we demonstrate the application of these functions on 2-dimensional arrays:

import numpy as np

arr_2d = np.array([['Hello', 'WORLD'], ['NumPy', 'Python']])

# Convert to uppercase
upper_2d = np.char.upper(arr_2d)
print(upper_2d)

# Convert to lowercase
lower_2d = np.char.lower(arr_2d)
print(lower_2d)

The output:

[['HELLO' 'WORLD']
 ['NUMPY' 'PYTHON']]
[['hello' 'world']
 ['numpy' 'python']]

This second example shows that NumPy’s char functions work element-wise on arrays of any dimension: the result has the same shape as the input, so there is no need to flatten the array or loop over its rows.

Example 3: Combining Case Conversion with Other Functions

Moving towards a more advanced example, let’s combine case conversion with another common string operation in NumPy – char.strip().

import numpy as np

arr_mixed = np.array(['   NumPy ', ' PYTHON  ', 'data '])

# Strip leading/trailing whitespace, then convert to uppercase
strip_then_upper = np.char.upper(np.char.strip(arr_mixed))
print(strip_then_upper)

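The output will be:

['NUMPY' 'PYTHON' 'DATA']

Because np.char.strip() is the inner call, it runs first: it removes the leading and trailing spaces from each string, and np.char.upper() then converts the cleaned strings to uppercase. Nesting calls like this is a simple way to chain string operations in NumPy.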
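char.strip() also accepts an optional chars argument listing the characters to remove from both ends. As a sketch with made-up input, the snippet below strips spaces, periods, and exclamation marks before lowercasing.

```python
import numpy as np

raw = np.array(["  Hello! ", "...World.", "NumPy!!"])

# Remove spaces, periods, and exclamation marks from both ends, then lowercase
cleaned = np.char.lower(np.char.strip(raw, " .!"))
print(cleaned)  # ['hello' 'world' 'numpy']
```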

Example 4: Using Case Conversion in Data Preprocessing

Finally, we illustrate a practical use case: preprocessing text data for machine learning. Assume we have a dataset with textual features whose casing needs to be made consistent:

import numpy as np

text_data = np.array(['Python Machine learning', 'NUMpy DATA analysis', 'Data SCIEnce'])

# Standardize text data to lowercase
standardized_text = np.char.lower(text_data)
print(standardized_text)
# ['python machine learning' 'numpy data analysis' 'data science']

# Further analysis or preprocessing...

This example shows how char.lower() (or, equally, char.upper()) brings text into a single consistent case before further analysis or before it is fed into a machine learning model.
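One common consequence of inconsistent casing is that the same category is counted more than once. The sketch below uses hypothetical label data: lowercasing before np.unique() merges variants such as "Cat" and "cat" into one category and counts them together.

```python
import numpy as np

labels = np.array(["Cat", "cat", "DOG", "Dog", "bird"])

# Normalize case first so "Cat"/"cat" and "DOG"/"Dog" are treated as the same label
categories, counts = np.unique(np.char.lower(labels), return_counts=True)
print(categories)  # ['bird' 'cat' 'dog']
print(counts)      # [1 2 2]
```

Without the lowercasing step, np.unique() would report five distinct labels instead of three.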

Conclusion

Converting text to a uniform case with char.upper() and char.lower() is a routine but important step in many data processing workflows. Through these examples, ranging from basic to more practical scenarios, we’ve seen how NumPy applies these transformations element-wise across arrays of any dimension and in combination with other string operations such as char.strip(). Such techniques help ensure data quality and consistency in analytical and machine learning projects.