Pandas – Using DataFrame.assign() method (5 examples)

Updated: February 22, 2024 By: Guest Contributor

Introduction

The assign() method in Pandas adds new columns to a DataFrame, or replaces existing ones, and returns the result as a new DataFrame. Because it returns a DataFrame instead of modifying one in place, it fits naturally into method chains, which makes it useful in data preprocessing, feature engineering, and exploratory data analysis. In this tutorial, we will explore the assign() method through five examples, ranging from basic to more advanced use cases.

Syntax & Parameters

Before diving into the examples, it helps to understand the syntax of assign(). The method does not modify the DataFrame it is called on; it returns a new DataFrame that contains all the original columns plus the assigned ones:

DataFrame.assign(**kwargs)

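Where **kwargs are keyword arguments in the form column=value. Here, ‘column’ is the name of the new or existing column (an existing column with the same name is overwritten), and ‘value’ can be a scalar, an array-like, or a callable. A callable is called with the DataFrame itself and must return something that can be assigned as a column.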
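As a minimal sketch of the three value types, the snippet below passes a scalar, a list, and a callable in a single call. The column names (‘source’, ‘B’, ‘C’) and the data are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the three value types
df = pd.DataFrame({'A': [1, 2, 3]})

result = df.assign(
    source='demo',           # scalar: repeated on every row
    B=[10, 20, 30],          # array-like: length must match the number of rows
    C=lambda x: x['A'] * 2,  # callable: receives the DataFrame, returns a column
)

print(result)
```

A list whose length differs from the number of rows raises an error, so array-like values are best used when the length is known to match.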

Example 1: Basic Usage

Let’s begin with a basic example by creating a DataFrame and adding a new column:

import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': ['A', 'B', 'C', 'D']})
df = df.assign(C=df['A']*2)
print(df)

Output:

   A  B  C
0  1  A  2
1  2  B  4
2  3  C  6
3  4  D  8

This example demonstrates how to add a new column ‘C’ that is twice the value of column ‘A’.
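The same call pattern also replaces a column when the keyword matches an existing name. The sketch below (the variable name ‘scaled’ is an assumption chosen for this illustration) overwrites ‘A’ in the returned DataFrame and shows that the original DataFrame keeps its values.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': ['A', 'B', 'C', 'D']})

# Reusing the name 'A' replaces that column in the returned DataFrame
scaled = df.assign(A=df['A'] * 10)

print(scaled)
print(df['A'].tolist())  # the original DataFrame is left unchanged
```

This is why the examples in this tutorial write df = df.assign(...): without the reassignment, the result would be discarded.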

Example 2: Using Callables

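The assign() method also accepts a callable as the value. The callable receives the DataFrame that assign() is called on and returns the new column, so the code does not need to refer to the variable df by name. Continuing with the DataFrame from Example 1: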

df = df.assign(D=lambda x: x['A'] + x['C'])
print(df)

Output:

   A  B  C   D
0  1  A  2   3
1  2  B  4   6
2  3  C  6   9
3  4  D  8  12

This illustrates adding a new column ‘D’ by applying a lambda function that sums columns ‘A’ and ‘C’.
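A callable matters most when the DataFrame being extended is an intermediate result that has no variable name. In the sketch below, the column name ‘share’ and the filter condition are assumptions for illustration; the lambda operates on the filtered rows, not on the original df.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': ['A', 'B', 'C', 'D']})

# The lambda receives the filtered frame (rows where A > 2), not the original df
filtered = (
    df[df['A'] > 2]
    .assign(share=lambda x: x['A'] / x['A'].sum())
)

print(filtered)
```

Writing share=df['A'] / df['A'].sum() here instead would compute the ratio over all four rows, which is usually not what is intended after a filter.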

Example 3: Chaining Assignments

assign() is most useful in a method chain, where several columns are added in a single expression. Because each call returns a new DataFrame, the lambda in the second call can see the column created by the first:

df = pd.DataFrame({'A': range(1, 5), 'B': ['A', 'B', 'C', 'D']})

df = df.assign(C=lambda x: x['A']*2).assign(D=lambda x: x['A'] + x['C'])
print(df)

Output:

   A  B  C   D
0  1  A  2   3
1  2  B  4   6
2  3  C  6   9
3  4  D  8  12

This compact syntax adds columns ‘C’ and ‘D’ in sequence without any intermediate variables.
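The two chained calls can also be collapsed into one, since assign() evaluates its keyword arguments in order and later ones may refer to columns created by earlier ones. This sketch assumes pandas 0.23 or newer, where that ordering is guaranteed; the variable name ‘out’ is chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': ['A', 'B', 'C', 'D']})

out = df.assign(
    C=lambda x: x['A'] * 2,
    D=lambda x: x['A'] + x['C'],  # 'C' already exists when this lambda runs
)

print(out)
```

Note that ‘D’ must be a callable here: an expression such as df['A'] + df['C'] would be evaluated before assign() runs and would fail because df has no column ‘C’.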

Example 4: Conditional Column Creation

Now, let’s see how to add a new column based on conditions:

df = df.assign(E=lambda x: ['High' if a > 2 else 'Low' for a in x['A']])
print(df)

Output:

   A  B  C   D     E
0  1  A  2   3   Low
1  2  B  4   6   Low
2  3  C  6   9  High
3  4  D  8  12  High

This demonstrates dynamically creating a new column ‘E’ that categorizes values from column ‘A’ into ‘High’ and ‘Low’ based on a condition.
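The list comprehension works, but it loops over the values in Python. A vectorized alternative, not used in the example above, is numpy.where, sketched here on a fresh DataFrame with the same ‘A’ column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(1, 5)})

# np.where picks 'High' where the condition is True and 'Low' elsewhere
df = df.assign(E=lambda x: np.where(x['A'] > 2, 'High', 'Low'))

print(df)
```

For a two-way split like this one, both approaches give the same labels; the vectorized form tends to pay off on larger DataFrames.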

Example 5: Using External Functions

Finally, let’s utilize an external function within assign() for more complex operations:

def calculate(df):
    return df['A'] * df['D']

df = df.assign(F=calculate)
print(df)

Output:

   A  B  C   D     E   F
0  1  A  2   3   Low   3
1  2  B  4   6   Low  12
2  3  C  6   9  High  27
3  4  D  8  12  High  48

This example shows how to integrate an external function to create a new column ‘F’, further demonstrating the method’s adaptability.
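assign() calls the function with the DataFrame as its only argument, so a function that needs extra parameters has to be wrapped first. One way to do that, shown as a sketch, is functools.partial; the helper ‘scale’ and the column name ‘G’ are hypothetical names introduced for this illustration.

```python
from functools import partial

import pandas as pd


def scale(frame, column, factor):
    """Hypothetical helper: return frame[column] multiplied by factor."""
    return frame[column] * factor


df = pd.DataFrame({'A': range(1, 5)})

# partial fixes column and factor; assign() supplies the DataFrame as 'frame'
df = df.assign(G=partial(scale, column='A', factor=10))

print(df)
```

A lambda such as lambda x: scale(x, 'A', 10) achieves the same thing; partial is mainly a matter of taste when the same helper is reused with different settings.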

Conclusion

This tutorial walked through the assign() method in Pandas using five practical examples: direct values, callables, chaining, conditional columns, and external functions. Because assign() returns a new DataFrame rather than modifying the original, it keeps DataFrame transformations concise and easy to chain.