Using pandas.Series.case_when() method (with examples)

Introduction
Working with pandas.Series.case_when()
Conclusion

Introduction

In this tutorial, we’ll delve into the pandas.Series.case_when() method, introduced in pandas 2.2, a tool for conditionally replacing values within a Series. It streamlines logic that used to require chained Series.where()/Series.mask() calls or NumPy’s np.select function, making such transformations simpler and more readable.

Working with pandas.Series.case_when()

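The case_when method takes a single argument, a list of (condition, replacement) tuples, where each condition is a boolean mask (or a callable that returns one) and each replacement is the value to use where that condition is true. Conditions are checked in order and the first match wins; rows that match no condition keep their original value from the Series. It’s akin to SQL’s CASE WHEN statement or Python’s if-elif-else logic, but vectorized over a pandas Series.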

Basic Usage

Let’s start with a basic example to get familiar with the syntax and behavior of case_when.

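import pandas as pd

df = pd.DataFrame({'Age': [25, 35, 45, 55]})
# case_when() accepts only the list of cases, so the last condition is the catch-all
df['Age Group'] = df['Age'].case_when([(df['Age'] < 30, 'Youth'),
                                       (df['Age'] < 40, 'Young Adult'),
                                       (df['Age'] < 60, 'Adult'),
                                       (df['Age'] >= 60, 'Senior')])
print(df)

This will output:

   Age    Age Group
0   25        Youth
1   35  Young Adult
2   45        Adult
3   55        Adult

In the above example, we define conditions for assigning age groups to individuals based on their age. Note that case_when() has no default argument: a row that matches no condition keeps its original value (here, the numeric age). The final condition, df['Age'] >= 60, therefore acts as the catch-all that labels everyone else ‘Senior’.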

Handling Null Values and Applying Multiple Conditions

Handling null values can be complex in data manipulation tasks. The case_when method simplifies this by allowing conditions specifically for nulls.

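df['Employment Status'] = pd.Series([None, 'Employed', 'Unemployed', None])
# Missing values are caught first; the final notnull() case is a catch-all
# for any other status, since case_when() has no default argument.
df['Status'] = df['Employment Status'].case_when([(df['Employment Status'].isnull(), 'Unknown'),
                                                    (df['Employment Status'] == 'Employed', 'Working'),
                                                    (df['Employment Status'] == 'Unemployed', 'Seeking Job'),
                                                    (df['Employment Status'].notnull(), 'Retired')])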
print(df)

This will output:

   Age    Age Group Employment Status       Status
0   25        Youth              None      Unknown
1   35  Young Adult          Employed      Working
2   45        Adult        Unemployed  Seeking Job
3   55        Adult              None      Unknown

This example demonstrates how easily case_when can handle different data scenarios, including missing values, without requiring tedious data preprocessing steps.

More Complex Decision Structures

As we become more comfortable with case_when, we can explore its potential to implement more complex decision structures.

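scores = pd.Series([85, 92, 78, 65, 87])
# The final condition covers every remaining score, acting as the fallback grade
grade = scores.case_when([(scores > 90, 'A'),
                          (scores > 80, 'B'),
                          (scores > 70, 'C'),
                          (scores > 60, 'D'),
                          (scores <= 60, 'F')])
print(grade)

With pandas 2.2, this will output:

0    B
1    A
2    C
3    D
4    B
dtype: object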

Here, we’re applying a grading system that illustrates the capability to chain conditions and outcomes in a way that’s clear and concise.

Combining Conditions

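Conditions do not have to come from the Series that case_when is called on: any boolean mask with a matching index works, so rules based on several columns can be combined in one call. Here’s how:

customers = pd.DataFrame({'purchase_amount': [250, 75, 150, 300],
                          'country': ['US', 'US', 'Canada', 'Canada']})
# Start from a Series of zeros so that unmatched rows get a discount of 0.0
no_discount = pd.Series(0.0, index=customers.index)
customers['discount'] = no_discount.case_when([(customers['purchase_amount'] > 200, 0.2),
                                               (customers['country'] == 'US', 0.1)])
print(customers)

Note that case_when has no argument for resolving overlapping conditions: when several are true for a row, the first one in the list wins. Listing the larger discount first therefore gives each customer the best discount they qualify for. Calling case_when on a Series of zeros, rather than on purchase_amount, ensures that customers who match no rule get 0.0 instead of their purchase amount.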

Conclusion

In this tutorial, we’ve journeyed through the basics to more advanced uses of the pandas.Series.case_when() method. This powerful tool can simplify and enhance your data manipulation tasks, providing a clear and concise way to implement conditional logic. Embrace case_when to streamline your data wrangling workflows.
