Pandas: How to create new column using multiple if-else conditions (4 examples)

Updated: February 24, 2024 By: Guest Contributor

Introduction

When working with data in Python, the Pandas library stands out for its powerful data manipulation capabilities. One frequent need is to create new columns based on conditions applied to existing ones. In this tutorial, we’ll explore four examples of how to use multiple if-else conditions to create new columns in a Pandas DataFrame, ranging from basic to more advanced scenarios. These techniques are essential for data preprocessing, feature engineering, and data analysis tasks.

Setup: Import Pandas and Create a Sample DataFrame

First, let’s import the Pandas library and create a sample DataFrame to work with:

import pandas as pd

df = pd.DataFrame({
    'Age': [25, 38, 15, 22, 45, 33],
    'Salary': [50000, 80000, 0, 32000, 120000, 95000],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Male', 'Male']
})

Example 1: Basic If-Else Condition

Let’s start with a simple scenario where we create a new column, ‘Adult’, to indicate whether each person is an adult (18 or over) or not:

df['Adult'] = ['Yes' if x >= 18 else 'No' for x in df['Age']]
print(df)

The output should show our DataFrame with the new column:

   Age  Salary  Gender Adult
0   25   50000  Female   Yes
1   38   80000    Male   Yes
2   15       0  Female    No
3   22   32000  Female   Yes
4   45  120000    Male   Yes
5   33   95000    Male   Yes
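
When a rule grows past a single test, the same idea can be written as a small function with an if/elif/else chain and applied to the column. The sketch below is only an illustration: the helper name age_label, the ‘Age Label’ column, and the cut-off of 40 are assumptions and not part of the example above.

```python
import pandas as pd

# Same 'Age' values as the sample DataFrame in the setup section.
df = pd.DataFrame({'Age': [25, 38, 15, 22, 45, 33]})

def age_label(age):
    """Hypothetical helper: a plain if/elif/else chain over one value."""
    if age < 18:
        return 'Minor'
    elif age < 40:  # assumed cut-off, chosen only for illustration
        return 'Adult'
    else:
        return 'Older adult'

# Series.apply calls the function once for each element of the column.
df['Age Label'] = df['Age'].apply(age_label)
print(df)
```

This form is easy to read and extend, but it runs one Python function call per row, so vectorized approaches are usually faster on large DataFrames.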

Example 2: Advanced If-Else with Multiple Conditions

Next, let’s create a new column, ‘Financial Status’, whose value depends on both the ‘Salary’ and ‘Age’ columns. We set a default value first, then overwrite it with df.loc wherever a boolean condition matches:

# Default value; overwritten for every row that matches a condition below
df['Financial Status'] = 'NA'
df.loc[df['Salary'] > 50000, 'Financial Status'] = 'Well-off'
df.loc[(df['Salary'] <= 50000) & (df['Age'] < 30), 'Financial Status'] = 'Starting Out'
df.loc[(df['Salary'] <= 50000) & (df['Age'] >= 30), 'Financial Status'] = 'Experienced, but modest'
print(df)

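The output looks like this. Every row in the sample data matches one of the three conditions, so the ‘NA’ default is overwritten everywhere, and no row ends up as ‘Experienced, but modest’ because nobody aged 30 or over earns 50000 or less:

   Age  Salary  Gender Adult Financial Status
0   25   50000  Female   Yes     Starting Out
1   38   80000    Male   Yes         Well-off
2   15       0  Female    No     Starting Out
3   22   32000  Female   Yes     Starting Out
4   45  120000    Male   Yes         Well-off
5   33   95000    Male   Yes         Well-off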
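
The same set of rules can also be written with np.select, which takes a list of conditions and a matching list of values. This is an alternative sketch and not the method used above; it rebuilds only the ‘Age’ and ‘Salary’ columns so that it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 38, 15, 22, 45, 33],
    'Salary': [50000, 80000, 0, 32000, 120000, 95000],
})

# Conditions are evaluated in order; for each row the first True one wins.
conditions = [
    df['Salary'] > 50000,
    (df['Salary'] <= 50000) & (df['Age'] < 30),
    (df['Salary'] <= 50000) & (df['Age'] >= 30),
]
choices = ['Well-off', 'Starting Out', 'Experienced, but modest']

# Rows that match none of the conditions receive the default value.
df['Financial Status'] = np.select(conditions, choices, default='NA')
print(df)
```

Keeping the conditions and their labels in two parallel lists makes it easier to add or reorder branches than editing a series of separate df.loc assignments.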

Example 3: Using np.where

Now, for a more concise way to implement conditional logic, we turn to np.where from the NumPy library. Here, we’ll use it to add a ‘Student’ column, indicating whether the individual is likely a student.

import numpy as np

df['Student'] = np.where(df['Age'] < 22, 'Yes', 'No')
print(df)

The resulting DataFrame:

   Age  Salary  Gender Adult Financial Status Student
0   25   50000  Female   Yes     Starting Out      No
1   38   80000    Male   Yes         Well-off      No
2   15       0  Female    No     Starting Out     Yes
3   22   32000  Female   Yes     Starting Out      No
4   45  120000    Male   Yes         Well-off      No
5   33   95000    Male   Yes         Well-off      No
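
np.where covers a two-way choice, but calls can be nested to express an if / elif / else chain. The following is a minimal sketch under assumed rules: the ‘Student Status’ column name, its labels, and the thresholds of 18 and 26 are hypothetical and chosen only to show the nesting.

```python
import numpy as np
import pandas as pd

# Same 'Age' values as the sample DataFrame in the setup section.
df = pd.DataFrame({'Age': [25, 38, 15, 22, 45, 33]})

# The outer np.where handles the first test; its "else" branch is a second
# np.where, which plays the role of elif / else.
df['Student Status'] = np.where(
    df['Age'] < 18, 'School age',                                  # if
    np.where(df['Age'] < 26, 'Possibly a student', 'Unlikely'),    # elif / else
)
print(df)
```

Nesting stays readable for two or three branches; beyond that, a list-based approach or a helper function tends to be easier to maintain.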

Example 4: Using pd.cut for Categorical Variables

For our final example, we’ll categorize the ‘Age’ column into bins to create a new ‘Age Group’ column. This is particularly useful when working with continuous data that you’d like to analyze categorically.

df['Age Group'] = pd.cut(df['Age'], bins=[0, 20, 40, 60], labels=['Youth', 'Adult', 'Senior'])
print(df)

The updated DataFrame would look like:

   Age  Salary  Gender Adult Financial Status Student Age Group
0   25   50000  Female   Yes     Starting Out      No     Adult
1   38   80000    Male   Yes         Well-off      No     Adult
2   15       0  Female    No     Starting Out     Yes     Youth
3   22   32000  Female   Yes     Starting Out      No     Adult
4   45  120000    Male   Yes         Well-off      No    Senior
5   33   95000    Male   Yes         Well-off      No     Adult
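
One detail worth checking with pd.cut is which side of each bin is closed. By default the right edge is included, so an age of exactly 20 falls into ‘Youth’; passing right=False flips that. The sketch below uses a separate Series of boundary ages, picked only for illustration, to show the difference.

```python
import pandas as pd

# Boundary values chosen for illustration; they are not in the sample data.
ages = pd.Series([20, 21, 40, 41, 60])

bins = [0, 20, 40, 60]
labels = ['Youth', 'Adult', 'Senior']

# Default right=True: the intervals are (0, 20], (20, 40], (40, 60].
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: the intervals become [0, 20), [20, 40), [40, 60).
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({
    'Age': ages,
    'right=True': right_closed,
    'right=False': left_closed,
}))
```

With right=False the age 60 lies outside every interval, so pd.cut returns NaN for it. Values outside the bin range always come back as NaN, which is worth checking for after binning.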

Conclusion

Creating new columns based on multiple if-else conditions is a fundamental technique in data manipulation with Pandas. Through these examples, we’ve covered four approaches, from basic to more advanced: a list comprehension, boolean conditions with df.loc, np.where, and pd.cut. A list comprehension suits a quick one-column check, df.loc handles rules that span several columns, np.where expresses a two-way choice in a single vectorized call, and pd.cut sorts a numeric column into labeled ranges.