Pandas: Using DataFrame.replace() method (7 examples)

Last updated: February 20, 2024

Introduction

Pandas is an open-source Python package widely used for data analysis, data science, and machine learning tasks. Among its many tools for manipulating DataFrames, the replace() method is one of the most versatile. This tutorial walks through seven examples of DataFrame.replace(), ranging from basic to advanced usage.

When to Use DataFrame.replace()?

The replace() method in Pandas replaces values in a DataFrame. The values to match can be given as a string, regex, list, dictionary, Series, or number, and the replacement can apply to a single column or to the entire DataFrame. It helps with data cleaning by replacing NaN values or arbitrary numbers, and it’s also useful for reshaping data to better fit the needs of your analysis.

Preparing a Sample DataFrame

Throughout this tutorial, we’ll start from this sample DataFrame (later examples redefine df where they need different data):

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['a', 'b', 'c']})

Example 1: Basic Replacement

df.replace(1, 100)

Output:

     A  B  C
0  100  4  a
1    2  5  b
2    3  6  c

In the example above, all instances of the number 1 in the DataFrame were replaced with 100.
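By default, replace() returns a new DataFrame rather than modifying the one it is called on. The short sketch below, which reuses the sample DataFrame and introduces a hypothetical variable name (replaced), illustrates that the original is left as it was:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['a', 'b', 'c']})

# replace() hands back a new DataFrame; df itself is not modified
replaced = df.replace(1, 100)

print(df.loc[0, 'A'])        # value in the original frame
print(replaced.loc[0, 'A'])  # value in the returned frame
```

To keep the change, assign the result back, for example df = df.replace(1, 100).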

Example 2: Replacing Multiple Values at Once

df.replace([1, 3], [100, 300])

Output:

     A  B  C
0  100  4  a
1    2  5  b
2  300  6  c

This example demonstrates how to replace multiple values at once. The first list contains the values to be replaced, and the second list contains their respective replacements, matched by position. The two lists must have the same length.
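The same pairing can also be written as a single dictionary that maps each old value to its new value, which some readers find easier to scan than two parallel lists. A minimal sketch, assuming the sample DataFrame from above (the variable names with_lists and with_dict are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['a', 'b', 'c']})

# Two parallel lists: values to find, then their replacements
with_lists = df.replace([1, 3], [100, 300])

# One dictionary: {old_value: new_value}
with_dict = df.replace({1: 100, 3: 300})

# Check whether both spellings produce the same frame
print(with_lists.equals(with_dict))
```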

Example 3: Replacing Values in Specified Columns

df.replace({'A': 1, 'B': 5}, 100)

Output:

     A    B  C
0  100    4  a
1    2  100  b
2    3    6  c

This example replaces values only in specified columns. The dictionary keys name the columns, the dictionary values are the values to look for in those columns, and every match is replaced with 100.
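When each column needs its own replacement value, a nested dictionary of the form {column: {old: new}} is one option. The sketch below assumes the sample DataFrame and uses illustrative replacement values (100 and 500) that are not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['a', 'b', 'c']})

# In column A replace 1 with 100; in column B replace 5 with 500
result = df.replace({'A': {1: 100}, 'B': {5: 500}})
print(result)
```

Columns that are not named in the outer dictionary, such as C here, are left untouched.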

Example 4: Using Regex for Replacement

df = pd.DataFrame({'A': ['1x', '2y', '3z'], 'B': ['4x', '5y', '6z'], 'C': ['ax', 'by', 'cz']})

df.replace(to_replace=r'\d', value='Digit', regex=True)

Output:

        A       B   C
0  Digitx  Digitx  ax
1  Digity  Digity  by
2  Digitz  Digitz  cz

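Regular expressions (regex) provide a powerful way to identify and replace patterns in the data, not just exact matches. In this example, we used regex to replace each digit with the word ‘Digit’. Only the matched part of a string is replaced, so the trailing letter is kept, and column C is unchanged because it contains no digits.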
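Because string replacements are applied in the style of Python's re.sub, the replacement string can refer back to capture groups in the pattern. The following sketch is an illustration rather than part of the original example: it swaps a leading digit with the letter that follows it, and the name swapped is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['1x', '2y', '3z'],
                   'B': ['4x', '5y', '6z'],
                   'C': ['ax', 'by', 'cz']})

# Group 1 captures the digit, group 2 the character after it;
# the replacement writes them back in reverse order
swapped = df.replace(to_replace=r'^(\d)(\w)$', value=r'\2\1', regex=True)
print(swapped)
```

Values in column C do not start with a digit, so the pattern does not match and they pass through as they are.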

Example 5: Replacing NaN Values

import numpy as np

df = pd.DataFrame({'A': [np.nan, 2, np.nan], 'B': [1, np.nan, 3]})

# Replace NaN values with -1
df.replace(np.nan, -1)

Output:

     A    B
0 -1.0  1.0
1  2.0 -1.0
2 -1.0  3.0

NaN values often represent missing data. Using replace(), you can easily replace these with a more appropriate value for further analysis, such as -1 in this case.
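For the specific case of filling missing values with a single constant, DataFrame.fillna() is a common alternative. The sketch below compares the two calls on the same data; the variable names via_replace and via_fillna are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2, np.nan], 'B': [1, np.nan, 3]})

via_replace = df.replace(np.nan, -1)
via_fillna = df.fillna(-1)

# Check whether both approaches give the same frame
print(via_replace.equals(via_fillna))
```

replace() remains the more general tool, since it can target any value and not only missing ones.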

Example 6: Replacing with a Dictionary of Columns

df.replace({'A': np.nan, 'B': 1}, -1)

Output:

     A    B
0 -1.0 -1.0
1  2.0  NaN
2 -1.0  3.0

This example uses a dictionary whose keys are columns and whose values are the items to replace in each column. Only NaN is replaced in column A and only 1 in column B, which is why the NaN in column B is left as is. It’s a useful method for replacing specific values across multiple columns.

Example 7: Advanced Replacement with a Lambda Function

df = pd.DataFrame({'A': ['apple', 'banana', 'cherry'], 'B': ['d', 'e', 'f']})

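# DataFrame.replace() does not call functions given as replacement values,
# so use Series.str.replace(), which accepts a callable when regex=True
df.assign(A=df['A'].str.replace(r'^a.*', lambda m: m.group(0).upper(), regex=True))

Output:

        A  B
0   APPLE  d
1  banana  e
2  cherry  f

In the most advanced example, we use a lambda function to compute the replacement. DataFrame.replace() treats a function passed as the replacement like any other object and does not call it, so the code applies Series.str.replace() to column ‘A’ instead; that method accepts a callable, which receives each regex match object and returns the replacement string. Here, any value in column ‘A’ that starts with ‘a’ is replaced by its uppercase version. This example showcases the power of combining regex and lambda functions for dynamic replacements.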
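One way to run a function on every regex match across several string columns is to apply Series.str.replace() column by column, since that method accepts a callable replacement when regex=True. This is a sketch under assumptions: the helper name shout and the value 'avocado' in column B are hypothetical and were added only so that a second column has a match.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'banana', 'cherry'],
                   'B': ['avocado', 'e', 'f']})

def shout(match):
    """Return the matched text in uppercase."""
    return match.group(0).upper()

# Apply the callable-based replacement to each column in turn
result = df.apply(lambda col: col.str.replace(r'^a.*', shout, regex=True))
print(result)
```

This only makes sense when every column holds strings; select the relevant columns first if the frame mixes types.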

Conclusion

The replace() method in Pandas is a highly versatile tool for data preprocessing and cleaning. Throughout this tutorial, we’ve covered multiple ways it can be used, from simple value replacements to complex pattern matching with regex and lambda functions. Understanding these examples will significantly enhance your data manipulation skills and contribute to more effective data analysis workflows.
