Pandas: Check if each Series element starts/ends with a substring

Updated: February 24, 2024 By: Guest Contributor

Introduction

Pandas is a powerful Python library for data manipulation and analysis, particularly well-suited for handling structured data. One common task when working with text data is to determine whether each string in a Series begins or ends with a certain substring. In this tutorial, we’ll explore how to accomplish this using the str.startswith() and str.endswith() methods provided by Pandas, guiding you from basic examples to more advanced usage.

Getting Started

Before diving into the code examples, ensure that Pandas is installed in your Python environment. If not, you can install it using pip:

pip install pandas

With Pandas installed, import it into your script:

import pandas as pd

Basic Examples

Checking Starts With

Create a Pandas Series containing strings, and then check which of these strings start with a specific substring. Here’s how:

data = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])
result = data.str.startswith('a')
print(result)

This code will produce a Series of boolean values indicating whether each string starts with ‘a’:

0     True
1    False
2    False
3    False
4    False
dtype: bool
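If you need to test for several possible prefixes at once, a tuple of strings can be passed instead of a single string. The following is a minimal sketch that assumes pandas 1.5 or newer, where tuple arguments are accepted; the variable names are illustrative only:

```python
import pandas as pd

fruits = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])

# A tuple of prefixes matches when the string starts with ANY of them
# (assumes pandas >= 1.5; older versions accept only a single string).
starts_a_or_b = fruits.str.startswith(('a', 'b'))
print(starts_a_or_b.tolist())  # [True, True, False, False, False]
```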

Checking Ends With

Similarly, to check if strings in a Series end with a particular substring, use the str.endswith() method:

data = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])
result = data.str.endswith('e')
print(result)

This code outputs:

0     True
1    False
2    False
3     True
4    False
dtype: bool
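Real-world data often contains missing values, and how they appear in the result can depend on the pandas version and the dtype of the Series. As a small sketch (the sample data here is made up for illustration), the na parameter lets you state explicitly what a missing entry should produce, which keeps the result usable as a boolean mask:

```python
import pandas as pd

# Hypothetical sample with one missing entry.
names = pd.Series(['apple', None, 'date'])

# na=False reports missing entries as False instead of leaving them missing.
ends_with_e = names.str.endswith('e', na=False)
print(ends_with_e.tolist())  # [True, False, True]
```

The same na parameter is available on str.startswith().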

Advanced Examples

Case Sensitivity

Both str.startswith() and str.endswith() are case-sensitive by default. To perform a case-insensitive check, normalize the case first with str.lower() or str.upper(), then compare against a substring written in the same case:

data = pd.Series(['Apple', 'banana', 'Cherry', 'date', 'Elderberry'])
result = data.str.lower().str.startswith('a')
print(result)

Note that str.lower() returns a lowercased copy of the Series, so the check ignores case while the original data is left unchanged. Here only 'Apple' matches, so the first value is True and the rest are False.
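An alternative that avoids creating a lowercased copy is str.match() with case=False, which anchors the pattern at the start of each string. This is only a sketch of one possible approach; the prefix variable is illustrative, and re.escape() is used so the prefix is treated as literal text rather than as a regular expression:

```python
import re
import pandas as pd

data = pd.Series(['Apple', 'banana', 'Cherry', 'date', 'Elderberry'])

prefix = 'a'  # illustrative prefix to look for

# str.match() tests from the start of each string; case=False ignores case.
# re.escape() guards against prefixes containing regex metacharacters.
result = data.str.match(re.escape(prefix), case=False)
print(result.tolist())  # [True, False, False, False, False]
```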

Using Regular Expressions

For more flexibility, you can use the str.match() method, which tests a regular expression against the beginning of each string. This is particularly useful when the pattern you are looking for is more complex than a fixed substring. For example, to check for strings that start with a vowel:

data = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])
result = data.str.match('^[aeiou]')
print(result)

This yields a Series of boolean values, similar to the previous examples but based on the presence of a pattern rather than a fixed substring:

0     True
1    False
2    False
3    False
4     True
dtype: bool
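Because str.match() only tests the beginning of a string, an "ends with" pattern needs a different method. One option, sketched below, is str.contains() with a pattern anchored by $; the variable name is illustrative:

```python
import pandas as pd

data = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])

# str.contains() searches anywhere in the string; the trailing $ anchors
# the character class to the end, so this tests for a final vowel.
ends_with_vowel = data.str.contains(r'[aeiou]$')
print(ends_with_vowel.tolist())  # [True, True, False, True, False]
```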

Combining Conditions

You can also combine multiple conditions to perform more complex checks. For instance, finding strings that start with ‘a’ and end with ‘e’:

data = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])
combined = data.str.startswith('a') & data.str.endswith('e')
print(combined)

This demonstrates how to apply logical operators to combine the results of separate checks, yielding:

0     True
1    False
2    False
3    False
4    False
dtype: bool
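The boolean Series produced by these checks can also be used as a mask to select the matching values. Here is a short sketch using the | (or) and ~ (not) operators; the variable names are illustrative:

```python
import pandas as pd

data = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])

# Keep strings that start with 'a' OR end with 'y'.
mask = data.str.startswith('a') | data.str.endswith('y')
selected = data[mask]
print(selected.tolist())  # ['apple', 'cherry', 'elderberry']

# Keep strings that do NOT end with 'e'.
not_ending_in_e = data[~data.str.endswith('e')]
print(not_ending_in_e.tolist())  # ['banana', 'cherry', 'elderberry']
```

Each condition yields its own boolean Series, so wrap any comparison in parentheses when mixing it with & or |, since those operators bind more tightly than comparisons in Python.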

Conclusion

The str.startswith() and str.endswith() methods are essential tools for working with text data in Pandas. In this tutorial, we’ve seen how they check whether each string in a Series begins or ends with a given substring, how to make those checks case-insensitive, how str.match() handles patterns that a fixed substring cannot express, and how to combine several checks with logical operators.