Introduction
In this tutorial, we’ll look at the pandas.Series.case_when() method, introduced in pandas 2.2, for conditionally transforming data within Series objects. It streamlines logic that used to require multiple conditional statements or NumPy’s np.select function, making data manipulation tasks both simpler and more readable.
Working with pandas.Series.case_when()
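The case_when method takes a single argument, a list of (condition, replacement) tuples. Each condition is a boolean array-like (or a callable that returns one), and each replacement is a scalar, an array-like, or a callable. Conditions are checked in order and the first one that is true for a row determines that row’s value; rows that match no condition keep their original value. It’s akin to SQL’s CASE WHEN statement or Python’s if-elif-else logic, but vectorized over a pandas Series.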
Basic Usage
Let’s start with a basic example to get familiar with the syntax and behavior of case_when.
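import pandas as pd
df = pd.DataFrame({'Age': [25, 35, 45, 55]})
# case_when() takes only a caselist and has no default parameter, so we call
# it on a Series pre-filled with the fallback value 'Senior'.
df['Age Group'] = pd.Series('Senior', index=df.index).case_when([
    (df['Age'] < 30, 'Youth'),        # the first matching condition wins
    (df['Age'] < 40, 'Young Adult'),
    (df['Age'] < 60, 'Adult'),
])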
print(df)
This will output:
   Age    Age Group
0   25        Youth
1   35  Young Adult
2   45        Adult
3   55        Adult
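In the above example, we define conditions for assigning age groups to individuals based on their age. The conditions are checked in order and the first match wins, so a 25-year-old is labelled ‘Youth’ even though 25 is also below 40 and 60. ‘Senior’ is the fallback for rows where no condition is met; none of the four ages here is 60 or above, so it does not appear in the output.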
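For comparison, the same age grouping can be sketched with np.select, the NumPy function mentioned in the introduction, which does accept a default argument. The extra age of 70 is added here only to exercise the fallback and is not part of the original data.

```python
import numpy as np
import pandas as pd

# Same ages as above, plus one extra value (70) to show the fallback.
df = pd.DataFrame({'Age': [25, 35, 45, 55, 70]})

# np.select takes conditions and choices as two parallel lists;
# where several conditions are true, the first one in the list is used.
conditions = [df['Age'] < 30, df['Age'] < 40, df['Age'] < 60]
choices = ['Youth', 'Young Adult', 'Adult']

df['Age Group'] = np.select(conditions, choices, default='Senior')
print(df['Age Group'].tolist())
# ['Youth', 'Young Adult', 'Adult', 'Adult', 'Senior']
```

The main difference in reading the code is that np.select keeps conditions and values in separate lists, while case_when keeps each condition next to its replacement.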
Handling Null Values and Applying Multiple Conditions
Null values often need their own branch in conditional logic. Because a case_when condition can be any boolean Series, a null check such as isnull() can sit alongside ordinary comparisons in the same list of cases.
df['Employment Status'] = pd.Series([None, 'Employed', 'Unemployed', None])
status = df['Employment Status']
# Start from the fallback 'Retired'; rows that match no condition keep it.
df['Status'] = pd.Series('Retired', index=df.index).case_when([
    (status.isnull(), 'Unknown'),
    (status == 'Employed', 'Working'),
    (status == 'Unemployed', 'Seeking Job'),
])
print(df)
This will output:
   Age    Age Group  Employment Status       Status
0   25        Youth               None      Unknown
1   35  Young Adult           Employed      Working
2   45        Adult         Unemployed  Seeking Job
3   55        Adult               None      Unknown
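This example demonstrates how case_when can handle missing values in the same pass as the other conditions, without a separate preprocessing step to fill them first. Note that ‘Retired’ never appears in the output, because every row matches one of the three conditions.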
More Complex Decision Structures
As we become more comfortable with case_when, we can explore its potential to implement more complex decision structures.
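scores = pd.Series([85, 92, 78, 65, 87])
# case_when() has no default parameter, so pre-fill a Series with 'F';
# scores of 60 or below match no condition and keep that failing grade.
grade = pd.Series('F', index=scores.index).case_when([
    (scores > 90, 'A'),
    (scores > 80, 'B'),
    (scores > 70, 'C'),
    (scores > 60, 'D'),
])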
print(grade)
This will output:
0    B
1    A
2    C
3    D
4    B
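Here, we’re applying a grading system that chains conditions and outcomes in a way that’s clear and concise. Because the first matching condition wins, the thresholds are listed from highest to lowest: a score of 92 satisfies all four conditions but receives ‘A’ because that case comes first.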
Combining Conditions
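Conditions in a list of cases may overlap, so a single row can satisfy more than one of them. case_when has no argument for merging the results of overlapping conditions; the first matching case wins, so the order of the list expresses priority. Here’s how:
customers = pd.DataFrame({'purchase_amount': [250, 75, 150, 300],
                          'country': ['US', 'US', 'Canada', 'Canada']})
# The base value 0.0 means "no discount" for rows that match neither condition.
customers['discount'] = pd.Series(0.0, index=customers.index).case_when([
    (customers['purchase_amount'] > 200, 0.2),  # listed first, so it takes priority
    (customers['country'] == 'US', 0.1),
])
print(customers)
The first customer is in the US and spent more than 200, so both conditions are true; because the 0.2 case is listed first, that customer gets 0.2 rather than 0.1. Ordering the cases from the largest discount to the smallest therefore gives each row the highest discount it qualifies for. The Canadian customer who spent 150 matches neither condition and keeps the base value of 0.0.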
Conclusion
In this tutorial, we’ve worked through basic and more advanced uses of the pandas.Series.case_when() method. It offers a clear and concise way to implement conditional logic, provided you keep two rules in mind: the first matching condition wins, and rows that match no condition keep the value of the Series the method was called on. Use case_when to streamline your data wrangling workflows.