Pandas: How to drop a column from a DataFrame

Last updated: February 19, 2024

Introduction

When working with data in Python, Pandas is a crucial library that offers various functions for data manipulation and analysis. At times, you might need to remove unnecessary or redundant columns from your DataFrame. This tutorial guides you through different methods to drop a column from a DataFrame in Pandas, from the basic to the advanced, along with code examples and expected outputs.

Before diving into the examples, ensure you have Pandas installed in your environment. You can install Pandas using pip: pip install pandas.

Basic Usage of drop()

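At its simplest, drop() removes a column when you pass it the column name together with axis=1 (axis=1 refers to columns; the default, axis=0, refers to rows). By default it returns a new DataFrame and leaves the original unchanged. Here is an example: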

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

# Dropping the 'City' column
new_df = df.drop('City', axis=1)

print(new_df)

This should give you the following output, showing the DataFrame without the ‘City’ column:

    Name  Age
0   John   28
1   Anna   34
2  Peter   29
3  Linda   32
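As a small variation on the example above, drop() also accepts a columns= keyword, which some readers find clearer than axis=1. The sketch below rebuilds the same sample data; the variable name without_city is only illustrative.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# columns= names the axis explicitly, so axis=1 is not needed
without_city = df.drop(columns='City')

print(list(without_city.columns))  # ['Name', 'Age']
print(list(df.columns))            # the original still has all three columns
```

Both spellings should produce the same result; which one to use is a matter of style.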

Using inplace=True

If you want to modify the original DataFrame without creating a new one, you can set inplace=True in the drop() method:

df.drop('City', axis=1, inplace=True)
print(df)

Now df itself is updated. With inplace=True, drop() returns None rather than a DataFrame, so do not assign its result back to df.
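A common slip with inplace=True is assigning the return value back to the variable. The short sketch below, using a reduced two-row version of the sample data, shows what the call returns; the name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'City': ['New York', 'Paris']})

# With inplace=True, drop() mutates df and returns None
result = df.drop('City', axis=1, inplace=True)

print(result)            # None
print(list(df.columns))  # ['Name']
```

Writing df = df.drop('City', axis=1, inplace=True) would therefore replace df with None.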

Dropping Multiple Columns

Sometimes, you may need to remove more than one column at a time. You can pass a list of column names to the drop() method:

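# Rebuild the sample DataFrame first: 'City' was already dropped in place above,
# so dropping it again would raise a KeyError
df = pd.DataFrame(data)
df.drop(['Age', 'City'], axis=1, inplace=True)
print(df)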

This results in the DataFrame only containing the ‘Name’ column.
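When a list of names may include columns that are not present, drop() raises a KeyError by default. A minimal sketch of the errors='ignore' option follows, assuming a small DataFrame that has no 'City' column; the names raised and trimmed are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 34]})

# 'City' is not a column here, so the default errors='raise' fails
try:
    df.drop(['Age', 'City'], axis=1)
    raised = False
except KeyError:
    raised = True

# errors='ignore' drops the labels that exist and skips the rest
trimmed = df.drop(['Age', 'City'], axis=1, errors='ignore')

print(raised)                 # True
print(list(trimmed.columns))  # ['Name']
```

Ignoring missing labels is convenient in reusable cleaning code, but it can also hide a misspelled column name, so the default is often the safer choice.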

Advanced Usage

Conditionally Dropping Columns

In some scenarios, you might want to drop columns based on a condition, such as columns having more than a certain number of missing values. This can be achieved with a combination of methods:

# Assuming 'df' has missing values
missing_threshold = 2 # Threshold of missing values to drop a column
columns_to_drop = df.columns[df.isnull().sum() > missing_threshold]
df.drop(columns_to_drop, axis=1, inplace=True)
print(df)

This code checks for columns that have more than 2 missing values and drops them.
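To see the threshold logic run end to end, here is a self-contained sketch with made-up data in which 'Score' has three missing values and 'Age' has one. The column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: 'Score' has 3 missing values, 'Age' has 1
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 29, 32],
    'Score': [None, None, None, 7.5],
})

missing_threshold = 2

# Count missing values per column, then keep the names above the threshold
missing_counts = df.isnull().sum()
columns_to_drop = df.columns[missing_counts > missing_threshold]

cleaned = df.drop(columns=columns_to_drop)

print(list(columns_to_drop))  # ['Score']
print(list(cleaned.columns))  # ['Name', 'Age']
```

'Age' survives because one missing value is not more than the threshold of 2.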

Using select_dtypes() to Drop Columns of Specific Data Types

If you want to drop columns based on their data type, for example, to remove all columns of object type, you can use the select_dtypes() method along with drop():

# Collect the names of all object-typed columns, then drop them in place
df.drop(df.select_dtypes(include=['object']).columns, axis=1, inplace=True)
print(df)

This will remove all columns in the DataFrame that are of object type.
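The following sketch shows the same idea on a small DataFrame. The text columns are created with dtype=object explicitly, on the assumption that how string columns are inferred may differ between pandas versions; the names object_columns and numeric_only are only illustrative.

```python
import pandas as pd

# Text columns are given dtype=object explicitly so the example does not
# depend on how a particular pandas version infers string columns
df = pd.DataFrame({
    'Name': pd.Series(['John', 'Anna'], dtype=object),
    'Age': [28, 34],
    'City': pd.Series(['New York', 'Paris'], dtype=object),
})

# Names of every object-typed column
object_columns = df.select_dtypes(include=['object']).columns

# Drop them, returning a new DataFrame with the remaining columns
numeric_only = df.drop(columns=object_columns)

print(list(object_columns))        # ['Name', 'City']
print(list(numeric_only.columns))  # ['Age']
```

If the goal is simply to keep numeric columns, df.select_dtypes(include='number') may be a more direct route than dropping by type.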

Conclusion

Throughout this tutorial, you’ve learned several methods to drop a column or multiple columns from a DataFrame using Pandas. Whether you’re performing basic data cleaning or more advanced data preprocessing, these techniques are essential for efficient data manipulation. With practice, you’ll find dropping columns to be a straightforward task in your data processing workflows.
