Exploring pandas.DataFrame.isin() method (with examples)

Updated: February 19, 2024 By: Guest Contributor

Overview

The pandas.DataFrame.isin() method checks whether each element of a DataFrame is contained in a given set of values and returns a boolean DataFrame of the same shape. Together with its Series counterpart, Series.isin(), it gives you a concise way to build masks that select rows with certain values in one or more columns. This tutorial covers six practical examples, progressing from basic usage to more advanced scenarios.

Basic Usage of isin()

Let’s start with the basics. Suppose you have a DataFrame and want to filter rows based on one column’s values. Here’s a simple example:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45]
})

# Filter rows where the name is either Alice or Bob
filtered_df = df[df['Name'].isin(['Alice', 'Bob'])]
print(filtered_df)

Output:

    Name  Age
0  Alice   25
1    Bob   30

Unless an example defines its own DataFrame, the coming examples use the same test DataFrame as above.

Filtering Multiple Columns

Next, imagine you want to filter based on values in multiple columns. Here’s how:

conditions = {'Name': ['Alice', 'Eve'], 'Age': [25, 45]}

filtered_df = df.isin(conditions).all(axis=1)
print(df[filtered_df])

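In this example, isin() checks each column against its own list of values, and all(axis=1) keeps only the rows where every column matched. Because the columns are checked independently, a row with the name Alice and the age 45 would also pass. With the test DataFrame, rows 0 (Alice, 25) and 4 (Eve, 45) are returned.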
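One caveat with the dict form: columns that are not listed in the dict come back as all False, so calling all(axis=1) on the whole frame matches nothing once the DataFrame has extra columns. The sketch below illustrates this with a hypothetical City column (not part of the original data) and one possible workaround, restricting the check to the columns named in the dict.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['Oslo', 'Lima', 'Rome', 'Cairo', 'Perth'],  # hypothetical extra column
})
conditions = {'Name': ['Alice', 'Eve'], 'Age': [25, 45]}

# 'City' is not a key in the dict, so its column in the result is all False
naive_mask = df.isin(conditions).all(axis=1)
print(naive_mask.any())  # False

# Restrict the check to the columns that the dict actually names
cols = list(conditions)
mask = df[cols].isin(conditions).all(axis=1)
print(df[mask]['Name'].tolist())  # ['Alice', 'Eve']
```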

Combining isin() with Other Methods

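Combining isin() with other pandas methods can produce more expressive filters. Here’s an example that uses isin() together with query(). It works on a copy so that the test DataFrame stays unchanged for the later examples:

# assign() returns a new DataFrame with the extra column; df itself is not modified
result = df.assign(Selected=df['Name'].isin(['Alice', 'David'])).query('Selected == True and Age > 30')
print(result)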

Output:

     Name  Age  Selected
3  David   40     True
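As a possible alternative, query() has its own in operator, which can express the same filter without a helper column. This is a sketch under the assumption that the list lives in a local variable called names (an illustrative name); the @ prefix tells query() to look it up in the surrounding Python scope.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45]
})
names = ['Alice', 'David']

# "in" inside query() behaves like isin(); @names refers to the local list
result = df.query('Name in @names and Age > 30')
print(result)
```

Only the row for David (age 40) satisfies both conditions, and the result keeps just the Name and Age columns.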

Using isin() with a Series

You can also pass a pandas Series as the set of values to match against. For instance:

ages = pd.Series([25, 35, 45])
filtered_df = df[df['Age'].isin(ages)]
print(filtered_df)

Output:

      Name  Age
0    Alice   25
2  Charlie   35
4      Eve   45
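How the index of the passed Series is treated depends on which isin() is called. The sketch below contrasts the two cases; the names by_value, by_label and positional_ages are illustrative. Series.isin() looks only at the values, while DataFrame.isin() given a Series aligns on index labels.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45]
})

# The index labels of ages are deliberately unrelated to df's index
ages = pd.Series([45, 25], index=['x', 'y'])

# Series.isin() compares values only; the index of ages is ignored
by_value = df['Age'].isin(ages)
print(by_value.tolist())  # [True, False, False, False, True]

# DataFrame.isin() with a Series aligns on the index instead:
# a cell matches only if it equals the Series value at the same label
positional_ages = pd.Series([25, 35, 45])  # labels 0, 1, 2
by_label = df[['Age']].isin(positional_ages)
print(by_label['Age'].tolist())  # [True, False, False, False, False]
```

For plain "is this value in the set" filtering, calling isin() on the column, as in the example above, is usually what is wanted.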

Dynamic Filtering with isin()

Filters don’t have to be static. You can dynamically generate the list of values to use with isin(). For example:

students_df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Mike', 'Lily'],
    'Favorite Subject': ['Science', 'Art', 'Math', 'History']
})

# Values that could come from another DataFrame or Series; duplicates are fine here
subjects = pd.Series(['Math', 'Science', 'Math'])

# Generate the filter values dynamically by de-duplicating that Series
favorite_subjects = subjects.unique()
filtered_df = students_df[students_df['Favorite Subject'].isin(favorite_subjects)]
print(filtered_df)
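A common variant is to derive the filter values from a second DataFrame at runtime. The sketch below assumes a hypothetical clubs_df table with Subject and Open columns; neither the table nor its contents come from the original example.

```python
import pandas as pd

students_df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Mike', 'Lily'],
    'Favorite Subject': ['Science', 'Art', 'Math', 'History']
})

# Hypothetical second table: which subjects currently have an open club
clubs_df = pd.DataFrame({
    'Subject': ['Math', 'Science', 'Music', 'Math'],
    'Open': [True, True, False, True]
})

# Build the filter values from the rows of clubs_df that meet a condition
open_subjects = clubs_df.loc[clubs_df['Open'], 'Subject'].unique()

# Keep only students whose favorite subject appears in that derived set
filtered_df = students_df[students_df['Favorite Subject'].isin(open_subjects)]
print(filtered_df['Name'].tolist())  # ['John', 'Mike']
```

Because open_subjects is recomputed from clubs_df each time, the filter follows changes in that table without any edits to the filtering line.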

Complex Filters Using isin() and Lambda Functions

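For more intricate scenarios, a row-wise lambda passed to apply() can express conditions that are awkward to write otherwise. Inside the lambda each cell is a plain Python scalar, so membership is tested with the in operator rather than isin(), which exists only on Series and DataFrame objects. Here’s how you might filter rows on multiple conditions:

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Category': ['A', 'B', 'C', 'D', 'E'],
    'Value': [100, 150, 200, 250, 300]
})

# Row-wise lambda: x is a single row, so x['ID'] and x['Category'] are scalars
filtered_df = df[df.apply(lambda x: x['ID'] in [1, 2, 3] and x['Category'] in ['A', 'B'], axis=1)]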
print(filtered_df)
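When every condition is a simple membership test, the same filter can be written without apply() by combining two isin() masks. This is a sketch of that vectorized alternative, not a replacement for cases where the lambda needs genuinely row-wise logic; the variable name mask is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Category': ['A', 'B', 'C', 'D', 'E'],
    'Value': [100, 150, 200, 250, 300]
})

# Each isin() call yields a boolean Series; & combines them element-wise
mask = df['ID'].isin([1, 2, 3]) & df['Category'].isin(['A', 'B'])
filtered_df = df[mask]
print(filtered_df)
```

This keeps the rows with ID 1 and 2, and on large DataFrames it typically runs faster than a Python-level lambda applied row by row.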

Conclusion

The pandas.DataFrame.isin() method is a versatile tool for data filtering. Through the examples provided, we’ve explored its basic use on a single column, dict-based checks across several columns, and combinations with query(), Series values, dynamically generated value lists, and lambda functions. With these patterns you can express most membership-based filters in a line or two of pandas code.