Pandas: How to drop a column from a DataFrame

Updated: February 19, 2024 By: Guest Contributor

Introduction

When working with data in Python, Pandas is a crucial library that offers various functions for data manipulation and analysis. At times, you might need to remove unnecessary or redundant columns from your DataFrame. This tutorial guides you through different methods to drop a column from a DataFrame in Pandas, from the basic to the advanced, along with code examples and expected outputs.

Before diving into the examples, ensure you have Pandas installed in your environment. You can install Pandas using pip: pip install pandas.

Basic Usage of drop()

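At its simplest, the drop() method removes a column from a DataFrame. Pass the column name along with axis=1, which tells Pandas that the label refers to a column rather than a row. By default, drop() returns a new DataFrame and leaves the original unchanged. Here is an example: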

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

# Dropping the 'City' column
new_df = df.drop('City', axis=1)

print(new_df)

This should give you the following output, showing the DataFrame without the ‘City’ column:

    Name  Age
0   John   28
1   Anna   34
2  Peter   29
3  Linda   32
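The same drop can also be written with the columns= keyword, which some readers find clearer than axis=1. The following is a minimal sketch that rebuilds the sample data; the variable name without_city is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# columns= states the axis implicitly, so axis=1 is not needed
without_city = df.drop(columns='City')

print(list(without_city.columns))  # ['Name', 'Age']
print(list(df.columns))            # ['Name', 'Age', 'City'] (original is unchanged)
```

Both spellings behave the same way; which one to use is mostly a matter of readability.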

Using inplace=True

If you want to modify the original DataFrame without creating a new one, you can set inplace=True in the drop() method:

df.drop('City', axis=1, inplace=True)
print(df)

Now df itself is updated. With inplace=True, drop() returns None rather than a new DataFrame, so do not assign the result back to df.
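A common mistake with inplace=True is assigning the return value. The sketch below, using a small made-up DataFrame, shows what the call returns and the equivalent pattern without inplace; the names result and df2 are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'City': ['New York', 'Paris']})

# With inplace=True the method returns None and modifies df directly
result = df.drop('City', axis=1, inplace=True)
print(result)            # None
print(list(df.columns))  # ['Name']

# Equivalent without inplace: rebind the name to the returned DataFrame
df2 = pd.DataFrame({'Name': ['John', 'Anna'], 'City': ['New York', 'Paris']})
df2 = df2.drop('City', axis=1)
print(list(df2.columns))  # ['Name']
```

Reassigning the result tends to be easier to chain with other method calls, so it may be the safer default when in doubt.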

Dropping Multiple Columns

Sometimes, you may need to remove more than one column at a time. You can pass a list of column names to the drop() method:

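# Recreate the sample DataFrame: 'City' was already dropped in place above,
# and dropping a missing column raises a KeyError
df = pd.DataFrame(data)
df.drop(['Age', 'City'], axis=1, inplace=True)
print(df)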

This results in the DataFrame only containing the ‘Name’ column.
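If some of the listed columns might not exist, drop() raises a KeyError by default. The errors parameter of drop() controls this. Here is a small sketch on a made-up DataFrame that deliberately has no 'City' column; the name trimmed is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 34]})

# 'City' does not exist here; the default errors='raise' raises KeyError
try:
    df.drop(['Age', 'City'], axis=1)
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore' drops the labels that exist and skips the rest
trimmed = df.drop(['Age', 'City'], axis=1, errors='ignore')
print(list(trimmed.columns))  # ['Name']
```

Note that errors='ignore' can also hide typos in column names, so it is best reserved for cases where a missing column is genuinely expected.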

Advanced Usage

Conditionally Dropping Columns

In some scenarios, you might want to drop columns based on a condition, such as columns having more than a certain number of missing values. This can be achieved with a combination of methods:

# Assuming 'df' has missing values
missing_threshold = 2 # Threshold of missing values to drop a column
columns_to_drop = df.columns[df.isnull().sum() > missing_threshold]
df.drop(columns_to_drop, axis=1, inplace=True)
print(df)

This code checks for columns that have more than 2 missing values and drops them.
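To see the threshold logic on concrete data, here is a self-contained sketch. The 'Score' column and its values are invented for illustration, and the result is stored in a new variable (cleaned) instead of using inplace=True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, np.nan, 29, 32],             # 1 missing value
    'Score': [np.nan, np.nan, np.nan, 7.5],  # 3 missing values
})

missing_threshold = 2
missing_counts = df.isnull().sum()  # number of missing values per column
columns_to_drop = df.columns[missing_counts > missing_threshold]

cleaned = df.drop(columns=columns_to_drop)
print(list(columns_to_drop))  # ['Score']
print(list(cleaned.columns))  # ['Name', 'Age']
```

'Age' survives because its single missing value does not exceed the threshold of 2.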

Using select_dtypes() to Drop Columns of Specific Data Types

If you want to drop columns based on their data type, for example, to remove all columns of object type, you can use the select_dtypes() method along with drop():

df.drop(df.select_dtypes(['object']).columns, axis=1, inplace=True)
print(df)

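This removes every column whose dtype is object. String columns are commonly stored as object, but that depends on your Pandas version and settings, so check df.dtypes before relying on it.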
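The same pattern works for other dtype selections. As a sketch, the example below drops all numeric columns from a small made-up DataFrame using include='number', and shows that the exclude= parameter of select_dtypes() can reach the same result without drop(). The 'Height' column and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna'],
    'Age': [28, 34],
    'Height': [1.80, 1.65],
})

# Select every numeric column (ints and floats), then drop them by name
numeric_columns = df.select_dtypes(include='number').columns
non_numeric = df.drop(columns=numeric_columns)
print(list(non_numeric.columns))  # ['Name']

# exclude= keeps everything except the given dtypes, with no drop() call
same_result = df.select_dtypes(exclude='number')
print(list(same_result.columns))  # ['Name']
```

Using exclude= directly is shorter, while the drop() form may read more naturally when the intent is "remove these columns".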

Conclusion

Throughout this tutorial, you’ve learned several methods to drop a column or multiple columns from a DataFrame using Pandas. Whether you’re performing basic data cleaning or more advanced data preprocessing, these techniques are essential for efficient data manipulation. With practice, you’ll find dropping columns to be a straightforward task in your data processing workflows.