Pandas: How to rename a column in a DataFrame

Updated: February 20, 2024 By: Guest Contributor

Introduction

Data manipulation and analysis are crucial components of the data science workflow, and Pandas is a library in Python that simplifies these tasks. A common need in data preprocessing is renaming columns in a DataFrame. In this tutorial, you will learn various ways to rename columns effectively using Pandas, from basic to advanced examples.

Prerequisites

  • Basic understanding of Python
  • Pandas installed (pip install pandas)

Creating a Sample DataFrame

Before we dive into renaming columns, let’s create a simple DataFrame to work with:

import pandas as pd

# Sample DataFrame
data = {'Name': ['Anne', 'Bob', 'Charles'], 'Age': [22, 35, 30], 'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df)

This code snippet will give us the following DataFrame:

      Name  Age         City
0     Anne   22     New York
1      Bob   35  Los Angeles
2  Charles   30      Chicago

Renaming Columns

Using the rename Method

One of the simplest ways to rename columns is by using the rename method:

new_df = df.rename(columns={'Name': 'First Name', 'City': 'Home City'})
print(new_df)

Output:

  First Name  Age    Home City
0       Anne   22     New York
1        Bob   35  Los Angeles
2    Charles   30      Chicago

This method allows for renaming specific columns by passing a dictionary where keys are current column names and values are new column names.
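As a quick check of the behavior described above, the sketch below confirms that rename returns a new DataFrame and leaves the original untouched. It also shows the errors='raise' argument, which makes Pandas raise a KeyError when a key in the dictionary is not an existing column (by default such keys are silently ignored). The 'Country' key is a deliberately missing column used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anne', 'Bob'], 'Age': [22, 35], 'City': ['New York', 'Chicago']})

# rename returns a new DataFrame; df itself keeps its original labels
new_df = df.rename(columns={'Name': 'First Name', 'City': 'Home City'})
print(list(df.columns))      # ['Name', 'Age', 'City']
print(list(new_df.columns))  # ['First Name', 'Age', 'Home City']

# By default, a key that matches no column is silently ignored
unchanged = df.rename(columns={'Country': 'Nation'})
print(list(unchanged.columns))  # ['Name', 'Age', 'City']

# errors='raise' turns that silent no-op into a KeyError
try:
    df.rename(columns={'Country': 'Nation'}, errors='raise')
    missing_key_detected = False
except KeyError:
    missing_key_detected = True
print(missing_key_detected)  # True
```

Using errors='raise' can help catch typos in column names early, at the cost of failing when a column is legitimately absent.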

In-place Renaming

If you wish to modify the original DataFrame directly, you can use the inplace=True parameter:

df.rename(columns={'Name': 'Full Name', 'City': 'City of Residence'}, inplace=True)
print(df)

Output:

  Full Name  Age City of Residence
0      Anne   22          New York
1       Bob   35       Los Angeles
2   Charles   30           Chicago
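One detail worth knowing: with inplace=True, rename returns None, so its result should not be assigned back to a variable. The sketch below illustrates this and shows reassignment as an equivalent alternative. The small DataFrame and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anne', 'Bob'], 'City': ['New York', 'Chicago']})

# With inplace=True the DataFrame is modified and the call returns None
result = df.rename(columns={'Name': 'Full Name'}, inplace=True)
print(result)            # None
print(list(df.columns))  # ['Full Name', 'City']

# Equivalent alternative: reassign the DataFrame that rename returns
df = df.rename(columns={'City': 'City of Residence'})
print(list(df.columns))  # ['Full Name', 'City of Residence']
```

Writing df = df.rename(..., inplace=True) is a common slip: it would replace df with None.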

Renaming Columns During File Read

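You can also rename columns while reading a file by passing the new names through the names parameter and setting header=0, which tells Pandas that the first row of the file is the existing header and should be replaced: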

df = pd.read_csv('sample_data.csv', names=['First Name', 'Age', 'Home Town'], header=0)
print(df)

This method is particularly useful when a file comes with undesired column names. If the file also has extra rows above the header, skip them with the skiprows parameter.
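To see why header=0 matters, here is a minimal sketch that reads the same CSV text with and without it. The CSV contents are hypothetical and held in memory with io.StringIO as a stand-in for sample_data.csv, so the example runs without a file on disk.

```python
import io
import pandas as pd

# Stand-in for sample_data.csv (hypothetical contents)
csv_text = "name,age,town\nAnne,22,New York\nBob,35,Chicago\n"
new_names = ['First Name', 'Age', 'Home Town']

# header=0: the first line is treated as the old header and replaced by names
replaced = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)
print(replaced.shape)  # (2, 3)

# Without header=0, the old header line is read as an ordinary data row
kept = pd.read_csv(io.StringIO(csv_text), names=new_names)
print(kept.shape)             # (3, 3)
print(kept.iloc[0].tolist())  # ['name', 'age', 'town']
```

In the second call the stray header row also forces the Age column to hold strings, which is usually a sign that header=0 was forgotten.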

Using a Dynamic Approach

For more dynamic renaming, especially when dealing with large datasets with numerous columns, you can generate the new names programmatically, for example with a list comprehension:

df.columns = [' '.join(col.split('_')).title() for col in df.columns]

This snippet will transform all column names by replacing underscores with spaces and capitalizing each word (for instance, ‘first_name’ becomes ‘First Name’).
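The same transformation can also be written with the string methods available on df.columns through the .str accessor. The sketch below compares both forms on a small DataFrame whose snake_case column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Anne'], 'home_city': ['New York'], 'age': [22]})

# List comprehension: split on underscores, join with spaces, title-case
by_comprehension = [' '.join(col.split('_')).title() for col in df.columns]

# Same transformation using the .str accessor of the column Index
by_accessor = df.columns.str.replace('_', ' ', regex=False).str.title()

print(by_comprehension)   # ['First Name', 'Home City', 'Age']
print(list(by_accessor))  # ['First Name', 'Home City', 'Age']

# Assigning to df.columns relabels the DataFrame in place
df.columns = by_accessor
```

Both forms replace every column label at once, so the new list must have exactly as many entries as the DataFrame has columns.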

Advanced Techniques

Renaming Columns Using Regular Expressions

For advanced cases, such as renaming columns based on pattern matching, you can pass rename a function that applies a regular expression (from Python's re module) to each column name:

import re
df.rename(columns=lambda x: re.sub(r'^([A-Z])', r'_\1', x), inplace=True)

This code will prepend an underscore to every column name starting with a capital letter.
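A self-contained sketch of this pattern-based renaming is shown below. The pattern ^([A-Z]) captures a leading capital letter, and the replacement r'_\1' writes an underscore followed by that captured letter. The sample columns, including the lowercase 'age', are chosen only to show which names match.

```python
import re
import pandas as pd

df = pd.DataFrame({'Name': ['Anne'], 'age': [22], 'City': ['New York']})

# rename calls the lambda once per column label;
# labels that do not start with a capital letter are returned unchanged
renamed = df.rename(columns=lambda x: re.sub(r'^([A-Z])', r'_\1', x))
print(list(renamed.columns))  # ['_Name', 'age', '_City']
```

Note that the order inside the replacement string matters: r'\1_' would place the underscore after the first letter instead of before it.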

Renaming Columns with a Mapping Function

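If you have complex renaming rules, applying a mapping function to df.columns might be the right approach. The example below assumes df still has the original Name, Age, and City columns from the sample DataFrame: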

def custom_rename(column):
    if column == 'Name':
        return 'Participant Name'
    elif column == 'City':
        return 'Location'
    else:
        return column

df.columns = [custom_rename(col) for col in df.columns]
print(df)

This allows for highly customized renaming schemes based on individual column names.
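Because rename accepts a function as well as a dictionary, the same mapping function can also be passed to it directly. The sketch below assumes the original sample column names and rebuilds a one-row DataFrame so that it runs on its own.

```python
import pandas as pd

def custom_rename(column):
    # Rename two known columns and leave every other label unchanged
    if column == 'Name':
        return 'Participant Name'
    elif column == 'City':
        return 'Location'
    else:
        return column

df = pd.DataFrame({'Name': ['Anne'], 'Age': [22], 'City': ['New York']})

# rename applies custom_rename to each column label and returns a new DataFrame
renamed = df.rename(columns=custom_rename)
print(list(renamed.columns))  # ['Participant Name', 'Age', 'Location']
print(list(df.columns))       # ['Name', 'Age', 'City']
```

Compared with assigning to df.columns, this form leaves the original DataFrame unchanged unless you reassign the result.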

Conclusion

Renaming columns in Pandas is a fundamental task that can greatly improve the readability and interpretability of your data. Whether you are renaming a few columns for clarity or standardizing an entire dataset, Pandas provides flexible methods to achieve your goals. Remember, clean data is the cornerstone of accurate analysis and modeling.