Pandas: How to rename a column in a DataFrame

Introduction
Prerequisites
Creating a Sample DataFrame
Renaming Columns
Advanced Techniques
1. Renaming Columns Using Regular Expressions
2. Renaming Columns with a Mapping Function
Conclusion

Introduction

Data manipulation and analysis are crucial components of the data science workflow, and Pandas is a library in Python that simplifies these tasks. A common need in data preprocessing is renaming columns in a DataFrame. In this tutorial, you will learn various ways to rename columns effectively using Pandas, from basic to advanced examples.

Prerequisites

Basic understanding of Python
Installation of Pandas library (pip install pandas)

Creating a Sample DataFrame

Before we dive into renaming columns, let’s create a simple DataFrame to work with:

import pandas as pd

# Sample DataFrame
data = {'Name': ['Anne', 'Bob', 'Charles'], 'Age': [22, 35, 30], 'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df)

This code snippet will give us the following DataFrame:

      Name  Age         City
0     Anne   22     New York
1      Bob   35  Los Angeles
2  Charles   30      Chicago

Renaming Columns

Using the `rename` Method

One of the simplest ways to rename columns is by using the rename method:

new_df = df.rename(columns={'Name': 'First Name', 'City': 'Home City'})
print(new_df)

Output:

   First Name  Age    Home City
0        Anne   22     New York
1         Bob   35  Los Angeles
2     Charles   30      Chicago

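This method renames specific columns by taking a dictionary whose keys are the current column names and whose values are the new ones. Columns not listed in the dictionary keep their names. By default, rename returns a new DataFrame and leaves the original df unchanged.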
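Two behaviours of rename are easy to miss, so here is a minimal sketch of them, rebuilt on the sample data so it runs on its own. It shows that the original DataFrame keeps its names, and that a dictionary key matching no column is silently ignored unless you pass errors='raise'. The 'Country' column is a hypothetical missing column used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anne', 'Bob', 'Charles'],
    'Age': [22, 35, 30],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# rename returns a new DataFrame; df keeps its original column names
renamed = df.rename(columns={'Name': 'First Name'})
print(list(df.columns))
print(list(renamed.columns))

# 'Country' is a hypothetical column that does not exist in df.
# By default the unmatched key is ignored and nothing changes.
unchanged = df.rename(columns={'Country': 'Nation'})
print(list(unchanged.columns))

# With errors='raise', an unmatched key raises KeyError instead
try:
    df.rename(columns={'Country': 'Nation'}, errors='raise')
    raised = False
except KeyError:
    raised = True
print(raised)
```

Passing errors='raise' is a reasonable default in scripts where a misspelled column name should fail loudly rather than pass unnoticed.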

In-place Renaming

If you wish to modify the original DataFrame directly, you can use the inplace=True parameter:

df.rename(columns={'Name': 'Full Name', 'City': 'City of Residence'}, inplace=True)
print(df)

Output:

   Full Name  Age  City of Residence
0       Anne   22           New York
1        Bob   35        Los Angeles
2    Charles   30            Chicago
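One detail worth knowing: with inplace=True, rename returns None, so assigning its result to a variable does not give you a DataFrame. The short sketch below, using a fresh two-row copy of the sample data, shows this and the equivalent reassignment form.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anne', 'Bob'], 'City': ['New York', 'Los Angeles']})

# With inplace=True the DataFrame is modified and the call returns None
result = df.rename(columns={'Name': 'Full Name'}, inplace=True)
print(result)
print(list(df.columns))

# Equivalent form without inplace: reassign the returned DataFrame
df = df.rename(columns={'City': 'City of Residence'})
print(list(df.columns))
```

Both forms end with the same column names; the reassignment form is often easier to chain with other method calls.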

Renaming Columns During File Read

Another efficient approach is to rename columns while reading a file using the names parameter and setting header=0:

df = pd.read_csv('sample_data.csv', names=['First Name', 'Age', 'Home Town'], header=0)
print(df)

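This method is particularly useful when the file's own header row contains undesired column names. Setting header=0 tells Pandas that the first row is a header, which is then discarded and replaced by the names list. Without header=0, that original header row would be read as the first row of data. Note that header=0 only handles a single header row; files with extra rows above the data need additional options such as skiprows.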
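To try this without a file on disk, the sketch below reads from an in-memory string that stands in for sample_data.csv. The file contents are made up for illustration. It also shows what happens when header=0 is left out.

```python
import io
import pandas as pd

# Stand-in for sample_data.csv; the contents are made up for illustration
csv_text = "name,age,city\nAnne,22,New York\nBob,35,Los Angeles\n"
new_names = ['First Name', 'Age', 'Home Town']

# header=0 marks the first row as a header to discard; names replaces it
df = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)
print(list(df.columns))
print(len(df))

# Without header=0, the original header row is read as the first data row
df_no_header = pd.read_csv(io.StringIO(csv_text), names=new_names)
print(len(df_no_header))
print(df_no_header.iloc[0].tolist())
```

In the second call the DataFrame has one extra row holding the old header values, which is usually a sign that header=0 was forgotten.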

Using a Dynamic Approach

For more dynamic renaming, especially when dealing with large datasets with numerous columns, you can build the new names programmatically and assign them to df.columns, for example with a list comprehension:

df.columns = [' '.join(col.split('_')).title() for col in df.columns]

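This snippet transforms every column name by replacing underscores with spaces and capitalizing each word (for instance, 'first_name' becomes 'First Name'). Be aware that title() also lowercases the remaining letters of each word, so 'user_ID' becomes 'User Id'.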
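The sample DataFrame has no underscores in its column names, so here is a small sketch with hypothetical snake_case columns that shows the transformation in action. It also shows the same result using the string methods available on the column Index.

```python
import pandas as pd

# Hypothetical snake_case column names, used only for illustration
df = pd.DataFrame({'first_name': ['Anne'], 'home_city': ['New York'], 'age': [22]})

# List comprehension: split on underscores, join with spaces, title-case
df.columns = [' '.join(col.split('_')).title() for col in df.columns]
print(list(df.columns))

# The same transformation using string methods on the column Index
df2 = pd.DataFrame({'first_name': ['Anne'], 'home_city': ['New York'], 'age': [22]})
df2.columns = df2.columns.str.replace('_', ' ', regex=False).str.title()
print(list(df2.columns))
```

Both versions produce the same names; which one reads better is largely a matter of taste.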

Advanced Techniques

Renaming Columns Using Regular Expressions

For advanced cases, such as renaming columns based on pattern matching, you can pass a function to rename and use Python's built-in re module inside it. The function is called once for each column name:

import re
df.rename(columns=lambda x: re.sub(r'^([A-Z])', r'_\1', x), inplace=True)

This code will prepend an underscore to every column name starting with a capital letter.
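A self-contained sketch of this kind of pattern-based renaming follows. The column names are hypothetical and chosen so that one of them does not match the pattern.

```python
import re
import pandas as pd

# Hypothetical column names: two start with a capital letter, one does not
df = pd.DataFrame({'Name': ['Anne'], 'age': [22], 'City': ['New York']})

# rename calls the function once per column label; re.sub leaves
# labels that do not match the pattern unchanged
df = df.rename(columns=lambda x: re.sub(r'^([A-Z])', r'_\1', x))
print(list(df.columns))
```

Here 'Name' and 'City' gain a leading underscore while 'age' is left as it is, because the pattern only matches a capital letter at the start of the name.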

Renaming Columns with a Mapping Function

If you have complex renaming rules, applying a mapping function to df.columns might be the right approach:

def custom_rename(column):
    if column == 'Name':
        return 'Participant Name'
    elif column == 'City':
        return 'Location'
    else:
        return column

df.columns = [custom_rename(col) for col in df.columns]
print(df)

This allows for highly customized renaming schemes based on individual column names.
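As a variation, rename also accepts a function directly, which avoids reassigning df.columns by hand. The sketch below rebuilds a one-row version of the sample data and expresses the same rules as a dictionary lookup with a fallback; this restructuring is one possible style, not the only one.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anne'], 'Age': [22], 'City': ['New York']})

def custom_rename(column):
    # Same rules as above, expressed as a lookup that falls back
    # to the original name when no rule applies
    mapping = {'Name': 'Participant Name', 'City': 'Location'}
    return mapping.get(column, column)

# rename applies the function to every column label and returns a new DataFrame
new_df = df.rename(columns=custom_rename)
print(list(new_df.columns))
```

Because rename returns a new DataFrame here, the original df keeps its column names.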

Conclusion

Renaming columns in Pandas is a fundamental task that can greatly improve the readability and interpretability of your data. Whether you are renaming a few columns for clarity or standardizing an entire dataset, Pandas provides flexible methods to achieve your goals. Remember, clean data is the cornerstone of accurate analysis and modeling.
