Sling Academy
Home/Pandas/Pandas DataFrame: How to select rows based on column values

Pandas DataFrame: How to select rows based on column values

Last updated: February 20, 2024

Introduction

Pandas is a powerful library in Python used for data manipulation and analysis, which provides DataFrame as its primary data structure. In this tutorial, we will delve into how to select rows based on specific criteria from column values in a Pandas DataFrame. This skill is crucial for data analysis as it allows us to filter and analyze subsets of data efficiently. We will start with basic examples and progressively move to more advanced techniques.

Getting Started

Before starting, make sure you have Pandas installed in your environment:

pip install pandas

Import Pandas and create a sample DataFrame to work with:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 40],
        'Occupation': ['Engineer', 'Doctor', 'Artist', 'Engineer']}
df = pd.DataFrame(data)
print(df)

This will output:

    Name  Age Occupation
0   John   28   Engineer
1   Anna   34     Doctor
2  Peter   29     Artist
3  Linda   40   Engineer

Basic Selection

One of the most basic forms of selection is using the == operator to filter rows based on a single column’s value. For instance, to select all engineers:

engineers = df[df['Occupation'] == 'Engineer']
print(engineers)

This will display:

    Name  Age Occupation
0   John   28   Engineer
3  Linda   40   Engineer

Using .loc[] for Conditional Selection

Another way to select rows is using the .loc[] method, which allows for more complex conditions. For example, selecting all engineers older than 30:

older_engineers = df.loc[(df['Occupation'] == 'Engineer') & (df['Age'] > 30)]
print(older_engineers)

This outputs:

    Name  Age Occupation
3  Linda   40   Engineer

Multiple Conditions and Columns

For filtering based on multiple conditions across different columns, combine the conditions using & (AND) or | (OR). For example, to find Artists or Doctors who are under 35:

young_professionals = df.loc[(df['Occupation'] == 'Artist') | (df['Occupation'] == 'Doctor') & (df['Age'] < 35)]
print(young_professionals)

This results in:

    Name  Age Occupation
1   Anna   34     Doctor
2  Peter   29     Artist

Query Method

The query() method offers a more readable syntax for complex selections. Here’s how you can use it:

eligible_for_discount = df.query('Age < 30 and Occupation != "Engineer"')
print(eligible_for_discount)

Which outputs:

    Name  Age Occupation
2  Peter   29     Artist

Selecting Based on String Patterns

To filter rows based on string patterns, use methods like str.contains(). For instance, selecting names that start with ‘J’:

j_names = df[df['Name'].str.contains('^J')]
print(j_names)

This would output:

   Name  Age Occupation
0  John   28   Engineer

Advanced: Using Functions with apply()

For more advanced row selection, you can use the apply() method to apply a custom function across rows. For example, selecting rows where the sum of the numerical columns is greater than 65:

def filter_row(row):
    return row['Age'] + len(row['Name']) > 65

filtered_rows = df.apply(filter_row, axis=1)
print(df[filtered_rows])

This yields:

    Name  Age Occupation
1   Anna   34     Doctor
3  Linda   40   Engineer

Conclusion

Selecting rows based on column values is a fundamental operation in data analysis with Pandas. Outlined techniques, ranging from basic filtering to advanced custom functions, facilitate complex data manipulations. Mastering these methods provides a strong foundation for performing sophisticated data analysis tasks in Python.

Next Article: Pandas: How to select multiple columns from a DataFrame

Previous Article: Is it possible to use async/await in Pandas?

Series: DateFrames in Pandas

Pandas

You May Also Like

  • How to Use Pandas Profiling for Data Analysis (4 examples)
  • How to Handle Large Datasets with Pandas and Dask (4 examples)
  • Pandas – Using DataFrame.pivot() method (3 examples)
  • Pandas: How to ‘FULL JOIN’ 2 DataFrames (3 examples)
  • Pandas: Select columns whose names start/end with a specific string (4 examples)
  • 3 ways to turn off future warnings in Pandas
  • How to Integrate Pandas with Apache Spark
  • How to Use Pandas for Web Scraping and Saving Data (2 examples)
  • How to Clean and Preprocess Text Data with Pandas (3 examples)
  • Pandas – Using Series.replace() method (3 examples)
  • Pandas json_normalize() function: Explained with examples
  • Pandas: Reading CSV and Excel files from AWS S3 (4 examples)
  • Using pandas.Series.rank() method (4 examples)
  • Pandas: Dropping columns whose names contain a specific string (4 examples)
  • Pandas: How to print a DataFrame without index (3 ways)
  • Fixing Pandas NameError: name ‘df’ is not defined
  • Pandas – Using DataFrame idxmax() and idxmin() methods (4 examples)
  • Pandas FutureWarning: ‘M’ is deprecated and will be removed in a future version, please use ‘ME’ instead
  • Pandas: Checking equality of 2 DataFrames (element-wise)