Pandas DataFrame: How to search rows that match a string keyword

Updated: February 20, 2024 By: Guest Contributor Post a comment

Introduction

In data analysis, searching and filtering data is one of the most common tasks. Pandas, a powerful and flexible data manipulation library in Python, provides several ways to search for specific pieces of information within your dataset. In this tutorial, you will learn how to search rows in a DataFrame that match a string keyword.

Preparation

Let’s start by importing Pandas and preparing a simple DataFrame to work with:

import pandas as pd

# Sample data
data = {'Name': ['John Doe', 'Jane Doe', 'Alice Smith', 'Bob Brown'],
        'Age': [28, 32, 24, 45],
        'Occupation': ['Engineer', 'Doctor', 'Artist', 'Engineer']}
df = pd.DataFrame(data)
print(df)

The DataFrame df looks like this: Name Age Occupation John Doe 28 Engineer Jane Doe 32 Doctor Alice Smith 24 Artist Bob Brown 45 Engineer

Basic Searching

One basic way to search for rows that contain a specific string in a DataFrame is by using the str.contains() method. This method returns a Boolean series indicating whether each element in the Series or DataFrame contains the string.

contains_engineer = df['Occupation'].str.contains('Engineer')
print(df[contains_engineer])

The output: Name Age Occupation John Doe 28 Engineer Bob Brown 45 Engineer

Case Sensitivity

By default, the str.contains() method is case sensitive. However, you can make it case-insensitive using the case parameter:

contains_engineer = df['Occupation'].str.contains('engineer', case=False)
print(df[contains_engineer])

Using Regular Expressions

The str.contains() method also supports regular expressions, allowing more complex searches. For instance, if you want to find anyone whose occupation starts with ‘E’:

starts_with_e = df['Occupation'].str.contains('^E', regex=True)
print(df[starts_with_e])

Searching Across Multiple Columns

For more advanced searching, you might want to look across multiple columns. This requires a bit more code but is still very manageable with Pandas. One way to do this is by applying a function across the rows with the apply method. The function will check if the keyword exists in any of the specified columns:

def keyword_search(row, keyword):
    return any(keyword in row[col] for col in ['Name', 'Occupation'])

contains_doe = df.apply(keyword_search, axis=1, args=('Doe',))
print(df[contains_doe])

The output will show rows where either ‘Name’ or ‘Occupation’ contains ‘Doe’: Name Age Occupation John Doe 28 Engineer Jane Doe 32 Doctor

Combining Conditions

You can also combine conditions to fine-tune your search. For example, to find engineers named John Doe:

is_john_engineer = (df['Name'] == 'John Doe') & (df['Occupation'].str.contains('Engineer'))
print(df[is_john_engineer])

The output: Name Age Occupation John Doe 28 Engineer

Conclusion

This tutorial covered several ways to search for rows in a Pandas DataFrame that match a string keyword, from basic searching to more advanced techniques involving case sensitivity, regular expressions, and searching across multiple columns. With these tools, you can efficiently sift through your dataset to find the information you need.