NumPy – Using char.islower() and char.isupper() functions (6 examples)

Updated: February 29, 2024 By: Guest Contributor

Introduction

NumPy, a fundamental package for scientific computing in Python, offers an efficient interface for working with large multi-dimensional arrays and matrices. Among its various capabilities, NumPy provides support for processing arrays of strings via its vectorized string operations. This tutorial explores how to use two such operations, char.islower() and char.isupper(), through six progressively advanced examples.

When to Use char.islower() and char.isupper()?

Before diving into the examples, it’s essential to understand what char.islower() and char.isupper() do. Simply put, these functions test each string in an array, element by element, and return an array of Boolean values of the same shape. An element yields True when all of its cased characters are lowercase (or uppercase, respectively) and it contains at least one cased character, mirroring Python’s str.islower() and str.isupper(). These functions are especially useful in data preprocessing, where text normalization might be necessary.
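To make the rule about cased characters concrete, here is a minimal sketch. The sample strings are made up for illustration; the point is that digits and punctuation are ignored, while strings with no letters at all fail both checks.

```python
import numpy as np

# Illustrative samples: letters only, letters plus digits,
# letters plus punctuation, digits only, and an empty string
samples = np.array(['abc', 'abc123', 'ABC!', '123', ''])

lower = np.char.islower(samples)
upper = np.char.isupper(samples)

for s, lo, up in zip(samples, lower, upper):
    print(repr(str(s)), bool(lo), bool(up))
```

Note that '123' and the empty string return False for both functions, so a False from islower() does not imply that a string is uppercase.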

Basic Setup

To start using these functions, first import NumPy, and then create a simple string array:

import numpy as np

# Sample string array
text_array = np.array(['NumPy', 'PYTHON', 'coding', 'DATA'])

Example 1: Checking Lowercase Characters

Our first example demonstrates the basic usage of np.char.islower(). This function tests each element of the array and returns True when all of its cased characters are lowercase:

lowercase_check = np.char.islower(text_array)
print(lowercase_check)

Output:

[False False  True False]

Here, only ‘coding’ is entirely lowercase, as indicated by the True result.
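Because the result is a Boolean array, it can double as a mask. The following sketch, which reuses the sample array from the setup, shows one way to pull out, count, and locate the lowercase entries; the variable names are illustrative only.

```python
import numpy as np

text_array = np.array(['NumPy', 'PYTHON', 'coding', 'DATA'])

# Boolean mask: True where the element is entirely lowercase
mask = np.char.islower(text_array)

lowercase_items = text_array[mask]          # elements that passed the check
lowercase_count = int(mask.sum())           # how many passed
lowercase_positions = np.flatnonzero(mask)  # their indices in the array

print(lowercase_items, lowercase_count, lowercase_positions)
```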

Example 2: Checking Uppercase Characters

Similarly, np.char.isupper() checks whether all cased characters in each element are uppercase:

uppercase_check = np.char.isupper(text_array)
print(uppercase_check)

Output:

[False  True False  True]

‘PYTHON’ and ‘DATA’ are entirely uppercase, reflected by the True indications.
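With both checks available, each string can be given a simple case label. The sketch below uses np.select for this; the label strings 'lower', 'upper', and 'other' are arbitrary choices made for the illustration.

```python
import numpy as np

text_array = np.array(['NumPy', 'PYTHON', 'coding', 'DATA'])

is_lower = np.char.islower(text_array)
is_upper = np.char.isupper(text_array)

# Pick a label per element; anything that is neither fully lowercase
# nor fully uppercase falls through to the default label
labels = np.select([is_lower, is_upper], ['lower', 'upper'], default='other')

print(labels)
```

A labeling step like this can help decide which entries need normalization before further processing.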

Example 3: Applying Checks to Substrings

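Both functions can also be applied to parts of each string. NumPy string arrays do not support per-string slicing directly, so the substring has to be extracted first. For instance, to check the case of the first character, cast the array to a one-character string type, which truncates each element to its first character:

# Keep only the first character of each string, then test it
first_chars = text_array.astype('U1')
first_char_lowercase = np.char.islower(first_chars)
print(first_char_lowercase)

Output:

[False False  True False]

This example extracts the first character of each string and checks if it’s lowercase.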
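When more than the first character is of interest, one possible approach is to reinterpret the fixed-width string buffer as a matrix of single characters. This is a sketch that assumes a contiguous Unicode array such as the sample array; shorter strings appear padded with empty strings, which fail both case checks.

```python
import numpy as np

text_array = np.array(['NumPy', 'PYTHON', 'coding', 'DATA'])

# View the fixed-width Unicode buffer as single characters:
# one row per string, one column per character position
chars = text_array.view('U1').reshape(text_array.size, -1)

# Case check on the first column only (first character of each string)
first_char_lower = np.char.islower(chars[:, 0])

# Case check on every character position at once
per_char_lower = np.char.islower(chars)

print(first_char_lower)
print(per_char_lower[0])  # flags for 'N', 'u', 'm', 'P', 'y' and one padding slot
```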

Example 4: Mixed Case Detection

What if you want to know whether a string mixes uppercase and lowercase characters? NumPy doesn’t provide a direct function for this, but you can combine islower() and isupper() to achieve it:

# Define a mixed case array
mixed_array = np.array(['NumPy', 'python', 'CODING', 'DaTa'])

# Check for mixed case
mixed_case_detected = ~(np.char.islower(mixed_array) | np.char.isupper(mixed_array))
print(mixed_case_detected)

Output:

[ True False False  True]

‘NumPy’ and ‘DaTa’ show a mix of cases, as highlighted by the True outcomes.
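One caveat of the negation-based check is that a string with no cased characters, such as '1234', also fails both tests and is therefore reported as mixed. The sketch below shows one possible stricter variant that requires at least one lowercase and one uppercase character; it compares each string against its upper- and lowercased forms, which is an approach chosen for this illustration rather than a built-in NumPy feature.

```python
import numpy as np

# Same samples as above plus a digits-only string (added for illustration)
arr = np.array(['NumPy', 'python', 'CODING', 'DaTa', '1234'])

# Negation-based check: also flags strings with no letters at all
naive_mixed = ~(np.char.islower(arr) | np.char.isupper(arr))

# A string changes when uppercased only if it contains a lowercase
# character, and changes when lowercased only if it contains an
# uppercase character
has_lower = arr != np.char.upper(arr)
has_upper = arr != np.char.lower(arr)
strict_mixed = has_lower & has_upper

print(naive_mixed)
print(strict_mixed)
```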

Example 5: Array of Words

You can also work with arrays of words within sentences, although it requires a bit more preprocessing to separate words into an array. Here’s how you can combine np.char.split() with case checking:

# Sentence array
sentences = np.array(['NumPy is fantastic!', 'PYTHON IS GREAT', 'We love coding.'])

# Split sentences into words (returns an object array holding one list per sentence)
words = np.char.split(sentences)

# Flatten the array of arrays
text_items = np.concatenate(words)

# Apply case check
mixed_case = ~(np.char.islower(text_items) | np.char.isupper(text_items))
print(mixed_case)

Output (truncated for clarity):

[ True False ...]

This approach identifies words with mixed casing in sentence arrays.
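Flattening the words loses track of which sentence each word came from. If that link matters, one option is to run the check per sentence instead. The sketch below counts mixed-case words in each sentence; count_mixed is a hypothetical helper written for this illustration, and it assumes every sentence contains at least one word.

```python
import numpy as np

sentences = np.array(['NumPy is fantastic!', 'PYTHON IS GREAT', 'We love coding.'])

# One list of words per sentence
word_lists = np.char.split(sentences)

def count_mixed(words):
    """Hypothetical helper: number of mixed-case words in a list of words."""
    arr = np.array(words)
    mixed = ~(np.char.islower(arr) | np.char.isupper(arr))
    return int(mixed.sum())

mixed_per_sentence = [count_mixed(ws) for ws in word_lists]
print(mixed_per_sentence)
```

Here 'NumPy' and 'We' are the only mixed-case words, so the first and third sentences each contribute one.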

Example 6: Working with Data Frames

Finally, these NumPy functions also work well alongside pandas DataFrames, which are widely used for tabular data in data science projects. Suppose you have a DataFrame with a column of text entries:

import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ['NumPy', 'PYTHON', 'coding', 'DATA']})

# Convert DataFrame column to NumPy array and explicitly cast to string
array_from_df = df['text'].to_numpy().astype(str)

# Apply islower check
lower_case_check = np.char.islower(array_from_df)
print("Lower case check:", lower_case_check)

# Apply isupper check
upper_case_check = np.char.isupper(array_from_df)
print("Upper case check:", upper_case_check)

# Filtering the DataFrame based on lower case check
lower_case_df = df[lower_case_check]
print("\nDataFrame with only lower case 'text':")
print(lower_case_df)

# Filtering the DataFrame based on upper case check
upper_case_df = df[upper_case_check]
print("\nDataFrame with only upper case 'text':")
print(upper_case_df)

Output:

Lower case check: [False False  True False]
Upper case check: [False  True False  True]

DataFrame with only lower case 'text':
     text
2  coding

DataFrame with only upper case 'text':
     text
1  PYTHON
3    DATA

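The conversion step right before the check matters: pandas text columns are often stored with the generic object dtype, while the np.char functions expect a string-typed array, so astype(str) bridges the two. The resulting Boolean arrays can then be used directly to filter the DataFrame.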
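One thing to watch for with astype(str) is missing data: a NaN entry is converted to the string 'nan', which passes the lowercase check. The sketch below shows one possible way to guard against that by masking out missing rows; the DataFrame contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative column with one missing entry
df = pd.DataFrame({'text': ['NumPy', np.nan, 'coding', 'DATA']})

# Direct cast: the missing value becomes the string 'nan'
naive = np.char.islower(df['text'].to_numpy().astype(str))

# Guarded version: remember which rows hold real text,
# replace missing values with '', then combine with the mask
present = df['text'].notna().to_numpy()
as_text = df['text'].fillna('').to_numpy().astype(str)
is_lower = np.char.islower(as_text) & present

print(naive)
print(is_lower)
print(df[is_lower])
```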

Conclusion

Through these six examples, we’ve seen how np.char.islower() and np.char.isupper() can help in analyzing and preprocessing text data. Whether used individually or combined for more complex checks, these functions offer a compact, vectorized way to inspect letter case across entire arrays of strings.