Pandas: Using DataFrame.replace() method (7 examples)

Updated: February 20, 2024 By: Guest Contributor

Introduction

Pandas is an open-source Python package widely used for data analysis, data science, and machine learning tasks. Among its many tools for manipulating DataFrames, the replace() method is one of the most versatile. This tutorial walks through seven examples of DataFrame.replace(), ranging from basic to advanced usage.

When to Use DataFrame.replace()?

The replace() method in Pandas replaces values in a DataFrame. The values to match can be given as a string, regex, list, dictionary, Series, or number, and the replacement can apply to a single column or to the entire DataFrame. It helps with data cleaning, for example by replacing NaN values or placeholder numbers, and with reshaping values to fit the needs of your analysis.

Preparing a Sample DataFrame

Throughout this tutorial, we’ll use this sample DataFrame for practice:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['a', 'b', 'c']})

Example 1: Basic Replacement

df.replace(1, 100)

Output:

     A  B  C
0  100  4  a
1    2  5  b
2    3  6  c

In the example above, all instances of the number 1 in the DataFrame were replaced with 100.
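One detail worth checking for yourself: replace() returns a new DataFrame and leaves the original untouched unless you assign the result. A minimal sketch using the sample DataFrame (the name `replaced` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['a', 'b', 'c']})

# replace() returns a new DataFrame; df itself is not modified
replaced = df.replace(1, 100)

print(df['A'].tolist())        # values in the original DataFrame
print(replaced['A'].tolist())  # values in the returned DataFrame
```

To keep the change, assign the result back, for example `df = df.replace(1, 100)`.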

Example 2: Replacing Multiple Values at Once

df.replace([1, 3], [100, 300])

Output:

     A  B  C
0  100  4  a
1    2  5  b
2  300  6  c

This example demonstrates how to replace multiple values at once. The first list contains the values to be replaced, and the second list their respective replacements.
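The same mapping can also be written as a single dictionary, which some readers find easier to keep in sync than two parallel lists. The sketch below also shows a list of values mapped to one scalar; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['a', 'b', 'c']})

# Two parallel lists: the value at each position maps to the same position
by_lists = df.replace([1, 3], [100, 300])

# One dictionary: {old_value: new_value}
by_dict = df.replace({1: 100, 3: 300})

# A list of values replaced by a single scalar
to_zero = df.replace([1, 3], 0)

print(by_dict)
print(to_zero)
```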

Example 3: Replacing Values in Specified Columns

df.replace({'A': 1, 'B': 5}, 100)

Output:

     A    B  C
0  100    4  a
1    2  100  b
2    3    6  c

This example replaces values in specified columns only. The dictionary keys name the columns, and each dictionary value is the value to look for in that column. Every match is replaced with the second argument, 100.
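If each column needs its own replacement value rather than a shared one, a nested dictionary of the form {column: {old: new}} is one way to express it. A short sketch on the sample DataFrame, with illustrative values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['a', 'b', 'c']})

# Nested dict: the outer keys are column names,
# the inner dicts map old values to new values within that column
result = df.replace({'A': {1: 100}, 'B': {5: 500}})

print(result)
```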

Example 4: Using Regex for Replacement

df = pd.DataFrame({'A': ['1x', '2y', '3z'], 'B': ['4x', '5y', '6z'], 'C': ['ax', 'by', 'cz']})

df.replace(to_replace=r'\d', value='Digit', regex=True)

Output:

        A       B   C
0  Digitx  Digitx  ax
1  Digity  Digity  by
2  Digitz  Digitz  cz

Regular expressions (Regex) provide a powerful way to identify and replace patterns in the data, not just exact matches. In this example, we used regex to replace all numeric characters with the word ‘Digit’.
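Two follow-up points can be sketched here. First, without regex=True the pattern is treated as a literal value to match against whole cells, so nothing changes. Second, the replacement string can refer to capture groups with backreferences such as \1, since the substitution follows Python's re syntax. The variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['1x', '2y', '3z'], 'B': ['4x', '5y', '6z'], 'C': ['ax', 'by', 'cz']})

# Without regex=True, r'\d' is compared literally against each whole cell
literal = df.replace(r'\d', 'Digit')

# With regex=True, capture groups can be reused in the replacement:
# swap the leading digit and the trailing character
swapped = df.replace(to_replace=r'^(\d)(\w)$', value=r'\2\1', regex=True)

print(literal)
print(swapped)
```

Column C is left alone in the second call because its values do not start with a digit.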

Example 5: Replacing NaN Values

import numpy as np

df = pd.DataFrame({'A': [np.nan, 2, np.nan], 'B': [1, np.nan, 3]})

# Replace NaN values with -1
df.replace(np.nan, -1)

Output:

     A    B
0 -1.0  1.0
1  2.0 -1.0
2 -1.0  3.0

NaN values often represent missing data. Using replace(), you can easily replace these with a more appropriate value for further analysis, such as -1 in this case.
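For the specific case of missing values, Pandas also offers fillna(), which many codebases prefer because it states the intent directly. A small sketch comparing the two calls on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2, np.nan], 'B': [1, np.nan, 3]})

# Both calls fill missing values with -1
via_replace = df.replace(np.nan, -1)
via_fillna = df.fillna(-1)

print(via_replace)
print(via_fillna)
```

replace() remains the more general tool when NaN is only one of several values that need to be swapped.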

Example 6: Replacing with a Dictionary of Columns

df.replace({'A': np.nan, 'B': 1}, -1)

Output:

     A    B
0 -1.0 -1.0
1  2.0  NaN
2 -1.0  3.0

This example uses a dictionary where the keys are columns and the values are the items to replace in each column. NaN is replaced only in column A and 1 only in column B, which is why the NaN in column B is left untouched.
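The replacement can be per-column as well. One way to sketch this is to pass a second dictionary as the value, keyed by the same column names; the fill values 0 and -1 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2, np.nan], 'B': [1, np.nan, 3]})

# First dict: what to look for in each column
# Second dict: what to put in its place, per column
result = df.replace({'A': np.nan, 'B': np.nan}, {'A': 0, 'B': -1})

print(result)
```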

Example 7: Advanced Replacement with a Lambda Function

df = pd.DataFrame({'A': ['apple', 'banana', 'cherry'], 'B': ['d', 'e', 'f']})

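df['A'] = df['A'].str.replace(r'^a.*', lambda m: m.group(0).upper(), regex=True)
df

Output:

        A  B
0   APPLE  d
1  banana  e
2  cherry  f

The value argument of DataFrame.replace() is documented as accepting scalars, dicts, lists, strings, and regexes, not functions, so a dynamic replacement like this goes through Series.str.replace() instead, which accepts a callable when regex=True. The callable receives each regex match object and returns the replacement string. Here, any value in column ‘A’ that starts with ‘a’ is replaced by its uppercase version. This example showcases the power of combining regex and lambda functions for dynamic replacements.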
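To run a callable-based replacement over several string columns at once, one option is to wrap Series.str.replace(), which accepts a callable when regex=True, in a helper and pass it to DataFrame.apply(). The helper name `upper_if_starts_with_a` and the extra value 'avocado' are made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'banana', 'cherry'], 'B': ['avocado', 'e', 'f']})

def upper_if_starts_with_a(col):
    # The callable receives a regex match object and returns the replacement text
    return col.str.replace(r'^a.*', lambda m: m.group(0).upper(), regex=True)

# apply() calls the helper once per column
result = df.apply(upper_if_starts_with_a)

print(result)
```

This assumes every column holds strings; numeric columns would need to be excluded before calling the helper.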

Conclusion

The replace() method in Pandas is a highly versatile tool for data preprocessing and cleaning. Throughout this tutorial, we’ve covered multiple ways to replace values, from simple scalar replacements to regex-based pattern matching and callable-driven replacements. Working through these examples should make everyday data cleaning faster and your analysis workflows more reliable.