Sling Academy
Home/Pandas/Pandas DataFrame: How to search rows that match a string keyword

Pandas DataFrame: How to search rows that match a string keyword

Last updated: February 20, 2024

Introduction

In data analysis, searching and filtering data is one of the most common tasks. Pandas, a powerful and flexible data manipulation library in Python, provides several ways to search for specific pieces of information within your dataset. In this tutorial, you will learn how to search rows in a DataFrame that match a string keyword.

Preparation

Let’s start by importing Pandas and preparing a simple DataFrame to work with:

import pandas as pd

# Sample data
data = {'Name': ['John Doe', 'Jane Doe', 'Alice Smith', 'Bob Brown'],
        'Age': [28, 32, 24, 45],
        'Occupation': ['Engineer', 'Doctor', 'Artist', 'Engineer']}
df = pd.DataFrame(data)
print(df)

The DataFrame df looks like this: Name Age Occupation John Doe 28 Engineer Jane Doe 32 Doctor Alice Smith 24 Artist Bob Brown 45 Engineer

Basic Searching

One basic way to search for rows that contain a specific string in a DataFrame is by using the str.contains() method. This method returns a Boolean series indicating whether each element in the Series or DataFrame contains the string.

contains_engineer = df['Occupation'].str.contains('Engineer')
print(df[contains_engineer])

The output: Name Age Occupation John Doe 28 Engineer Bob Brown 45 Engineer

Case Sensitivity

By default, the str.contains() method is case sensitive. However, you can make it case-insensitive using the case parameter:

contains_engineer = df['Occupation'].str.contains('engineer', case=False)
print(df[contains_engineer])

Using Regular Expressions

The str.contains() method also supports regular expressions, allowing more complex searches. For instance, if you want to find anyone whose occupation starts with ‘E’:

starts_with_e = df['Occupation'].str.contains('^E', regex=True)
print(df[starts_with_e])

Searching Across Multiple Columns

For more advanced searching, you might want to look across multiple columns. This requires a bit more code but is still very manageable with Pandas. One way to do this is by applying a function across the rows with the apply method. The function will check if the keyword exists in any of the specified columns:

def keyword_search(row, keyword):
    return any(keyword in row[col] for col in ['Name', 'Occupation'])

contains_doe = df.apply(keyword_search, axis=1, args=('Doe',))
print(df[contains_doe])

The output will show rows where either ‘Name’ or ‘Occupation’ contains ‘Doe’: Name Age Occupation John Doe 28 Engineer Jane Doe 32 Doctor

Combining Conditions

You can also combine conditions to fine-tune your search. For example, to find engineers named John Doe:

is_john_engineer = (df['Name'] == 'John Doe') & (df['Occupation'].str.contains('Engineer'))
print(df[is_john_engineer])

The output: Name Age Occupation John Doe 28 Engineer

Conclusion

This tutorial covered several ways to search for rows in a Pandas DataFrame that match a string keyword, from basic searching to more advanced techniques involving case sensitivity, regular expressions, and searching across multiple columns. With these tools, you can efficiently sift through your dataset to find the information you need.

Next Article: Pandas: Sorting rows by multiple columns in a DataFrame

Previous Article: Pandas DataFrame: How to change data type of a column

Series: DateFrames in Pandas

Pandas

You May Also Like

  • How to Use Pandas Profiling for Data Analysis (4 examples)
  • How to Handle Large Datasets with Pandas and Dask (4 examples)
  • Pandas – Using DataFrame.pivot() method (3 examples)
  • Pandas: How to ‘FULL JOIN’ 2 DataFrames (3 examples)
  • Pandas: Select columns whose names start/end with a specific string (4 examples)
  • 3 ways to turn off future warnings in Pandas
  • How to Integrate Pandas with Apache Spark
  • How to Use Pandas for Web Scraping and Saving Data (2 examples)
  • How to Clean and Preprocess Text Data with Pandas (3 examples)
  • Pandas – Using Series.replace() method (3 examples)
  • Pandas json_normalize() function: Explained with examples
  • Pandas: Reading CSV and Excel files from AWS S3 (4 examples)
  • Using pandas.Series.rank() method (4 examples)
  • Pandas: Dropping columns whose names contain a specific string (4 examples)
  • Pandas: How to print a DataFrame without index (3 ways)
  • Fixing Pandas NameError: name ‘df’ is not defined
  • Pandas – Using DataFrame idxmax() and idxmin() methods (4 examples)
  • Pandas FutureWarning: ‘M’ is deprecated and will be removed in a future version, please use ‘ME’ instead
  • Pandas: Checking equality of 2 DataFrames (element-wise)