Pandas: Selecting all columns except some from a DataFrame (4 ways)

Updated: February 22, 2024 By: Guest Contributor

Table Of Contents

1 Introduction

2 Getting Started

3 Method 1: Using drop Method

4 Method 2: Using column selection

5 Method 3: Using loc Property

6 Method 4: Using filter Function

7 Conclusion

Introduction

Pandas is a powerful and flexible open-source data analysis and manipulation tool, built on top of the Python programming language. Among its numerous functionalities, Pandas allows for sophisticated data selection operations in DataFrames, which are two-dimensional, size-mutable, and potentially heterogeneous tabular data structures with labeled axes (rows and columns).

In this tutorial, we will specifically explore how to select all columns from a DataFrame except for a specific few. This can be particularly useful when you have a large number of columns, and you’re only interested in excluding a small number from your analysis or visualizations, rather than manually specifying all the columns you want to include.

Getting Started

Before diving into the various methods for excluding columns, let’s set up a basic DataFrame to work with throughout this tutorial. If you haven’t already, you will need to install pandas. You can do this using pip:

pip install pandas

Once installed, let’s create a simple DataFrame:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C': [True, False, True],
    'D': [10.5, 20.5, 30.5]
})

print(df)

Output:

   A  B      C     D
0  1  a   True  10.5
1  2  b  False  20.5
2  3  c   True  30.5

Method 1: Using `drop` Method

One straightforward way to exclude columns is by using the drop method of the DataFrame. Here’s an example:

df.drop(columns=['B', 'D'], inplace=True)

print(df)

Output:

   A      C
0  1   True
1  2  False
2  3   True

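This method is very direct. Because inplace=True is passed here, drop modifies df itself and returns None. By default (inplace=False), drop leaves the original DataFrame unchanged and returns a new one, so you can assign the result to a variable instead. That is also why the following examples recreate df before using it.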
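A minimal sketch of the non-mutating form follows. It leaves inplace at its default and assigns the result instead. It also passes errors='ignore', a real drop parameter that skips labels not present in the DataFrame (here the made-up column name 'Z') instead of raising a KeyError.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C': [True, False, True],
    'D': [10.5, 20.5, 30.5]
})

# Default inplace=False: drop returns a new DataFrame and df is untouched
filtered_df = df.drop(columns=['B', 'D'])

# errors='ignore' skips labels that don't exist ('Z' is hypothetical) instead of raising KeyError
safe_df = df.drop(columns=['B', 'D', 'Z'], errors='ignore')

print(list(filtered_df.columns))  # ['A', 'C']
print(list(safe_df.columns))      # ['A', 'C']
print(list(df.columns))           # ['A', 'B', 'C', 'D']
```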

Method 2: Using column selection

Another approach is to build the list of columns to keep by leaving out the ones you don’t want. One way to do this is with a Python list comprehension over the DataFrame’s columns attribute:

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C': [True, False, True],
    'D': [10.5, 20.5, 30.5]
})

selected_columns = [col for col in df.columns if col not in ['B', 'D']]
filtered_df = df[selected_columns]

print(filtered_df)

Output:

   A      C
0  1   True
1  2  False
2  3   True

This method does not modify the original DataFrame but rather creates a new one. It is particularly useful when you want to retain the original DataFrame for other operations.
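If you exclude columns often, the list comprehension can be wrapped in a small reusable function. The name all_columns_except below is hypothetical, chosen only for illustration. This sketch converts the exclusion list to a set for fast membership checks and keeps the DataFrame's original column order.

```python
import pandas as pd


def all_columns_except(frame, excluded):
    """Hypothetical helper: return a new DataFrame without the columns in `excluded`.

    Column order of `frame` is preserved; names not in `frame` are ignored.
    """
    excluded = set(excluded)  # set membership checks are O(1) on average
    return frame[[col for col in frame.columns if col not in excluded]]


df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C': [True, False, True],
    'D': [10.5, 20.5, 30.5]
})

result = all_columns_except(df, ['B', 'D'])
print(list(result.columns))  # ['A', 'C']
```

Note that, unlike drop (which raises a KeyError for unknown labels by default), this approach silently ignores names that are not columns of the DataFrame, which can hide typos.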

Method 3: Using `loc` Property

The loc indexer selects rows and columns by label. To exclude columns, pass ‘:’ to select all rows, followed by the labels of the columns to keep. Here they are computed with Index.difference, which removes the unwanted names from df.columns:

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C': [True, False, True],
    'D': [10.5, 20.5, 30.5]
})

filtered_df = df.loc[:, df.columns.difference(['B', 'D'])]

print(filtered_df)

Output:

   A      C
0  1   True
1  2  False
2  3   True

This method is concise and readable, especially for those already familiar with loc. Because loc also takes a row selector, it is convenient when you need to filter rows and exclude columns in the same expression.
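One behavior worth checking is that Index.difference sorts the remaining labels by default. In the example above this has no visible effect, because A, B, C and D are already alphabetical. The sketch below uses a deliberately unsorted, made-up column order to compare difference with a negated Index.isin mask, which keeps the DataFrame's own order.

```python
import pandas as pd

# Columns intentionally in reverse alphabetical order (illustrative data)
df = pd.DataFrame({'D': [1], 'C': [2], 'B': [3], 'A': [4]})

# difference() returns the remaining labels sorted: A, C, D
sorted_cols = df.loc[:, df.columns.difference(['B'])].columns

# A negated isin() mask keeps the original left-to-right order: D, C, A
ordered_cols = df.loc[:, ~df.columns.isin(['B'])].columns

print(list(sorted_cols))   # ['A', 'C', 'D']
print(list(ordered_cols))  # ['D', 'C', 'A']
```

Passing sort=False to difference should also retain the original order. The isin mask has the advantage of making the intent explicit.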

Method 4: Using `filter` Function

Last but not least, pandas offers the DataFrame.filter method, which can also be used to exclude columns. Rather than naming the columns to exclude, you pass a regular expression (via the regex parameter) that matches the columns you want to keep. Here’s how:

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C': [True, False, True],
    'D': [10.5, 20.5, 30.5],
    'E': [100, 200, 300]
})

# Keep only columns whose names start with an uppercase letter from C to Z
filtered_df = df.filter(regex='^[C-Z].*')

print(filtered_df)

Output:

       C     D    E
0   True  10.5  100
1  False  20.5  200
2   True  30.5  300

This method is highly customizable and allows for complex selection criteria based on the column names. However, it requires familiarity with regex for effective use.
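Although filter matches the columns to keep, a negative lookahead lets you write the pattern as an exclusion list. The sketch below assumes plain string column names. re.escape guards against names that contain regex metacharacters. Because filter keeps labels for which re.search(regex, label) succeeds, the leading ^ is what pins the check to the start of each name.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C': [True, False, True],
    'D': [10.5, 20.5, 30.5],
    'E': [100, 200, 300]
})

excluded = ['B', 'D']
# Builds '^(?!(?:B|D)$)': matches any name that is NOT exactly one of the excluded names
pattern = '^(?!(?:' + '|'.join(re.escape(name) for name in excluded) + ')$)'

filtered_df = df.filter(regex=pattern)
print(list(filtered_df.columns))  # ['A', 'C', 'E']
```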

Conclusion

Throughout this tutorial, we’ve seen several ways to select all columns except some from a DataFrame in pandas. Whether you prefer a straightforward drop, a list comprehension, the flexibility of the loc indexer, or regular expressions with the filter method, pandas has a tool for each scenario. Choose the method that best suits your specific context, keeping code readability and efficiency in mind.
