Pandas DataFrame: How to drop labels from rows/columns

Overview
Getting Started
Basic Row and Column Removal
Using Label Lists
Advanced Usage
Conclusion

Overview

Working with data in Python often involves Pandas DataFrames, two-dimensional labeled data structures whose columns can hold different data types. A common task when manipulating DataFrames is removing unwanted rows or columns, which the drop() method handles. This tutorial walks through examples of dropping labels from rows or columns, from basic usage to more advanced applications.

Getting Started

Before diving into the examples, ensure you have Pandas installed in your environment:

pip install pandas

Once installed, import pandas in your script:

import pandas as pd

Basic Row and Column Removal

Let’s start with a simple DataFrame:

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

To remove a row, specify the index label in the drop() method:

df.drop(1, axis=0, inplace=True)

This code removes the row whose index label is 1. With the default integer index, which starts at 0, that is the second row. The axis argument determines whether you’re operating on rows (axis=0) or columns (axis=1). Setting inplace=True modifies the original DataFrame directly instead of returning a new one. Here’s how it looks now:

   A  B  C
0  1  4  7
2  3  6  9
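Note that the remaining index labels are 0 and 2: drop() works on labels, not positions, and it does not renumber the rows. The sketch below rebuilds the same DataFrame and shows the non-inplace form; the variable names (trimmed, renumbered) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Without inplace=True, drop() returns a new DataFrame and leaves df untouched.
trimmed = df.drop(1, axis=0)

# drop() matches index labels, not positions: after removing label 1,
# the remaining labels are 0 and 2.
remaining_labels = trimmed.index.tolist()

# Optionally renumber the rows from 0 again.
renumbered = trimmed.reset_index(drop=True)
```

Assigning the result to a new name is often easier to follow than inplace=True, because the original data stays available for later steps.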

To drop a column, you’d change the axis and specify the column label:

df.drop('C', axis=1, inplace=True)

After dropping column ‘C’, your DataFrame will be:

   A  B
0  1  4
2  3  6
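As an alternative to the axis argument, drop() also accepts index= and columns= keywords, which some readers find easier to scan. A minimal sketch on a fresh copy of the example DataFrame (the result names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# columns= is equivalent to passing the label with axis=1.
without_c = df.drop(columns='C')

# index= is equivalent to passing the label with axis=0.
without_row_1 = df.drop(index=1)

# Both keywords can be combined in a single call.
both = df.drop(index=1, columns='C')
```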

Using Label Lists

Sometimes, you might want to remove multiple rows or columns at once. Pandas allows this by passing a list of labels to the drop() method:

labels_to_drop = ['A', 'B']
df.drop(labels=labels_to_drop, axis=1, inplace=True)

After dropping columns ‘A’ and ‘B’, the DataFrame has no columns left (these were its only two remaining columns), although its row index (labels 0 and 2) is kept.
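The same list form works for rows. One detail worth knowing is how drop() reacts to a label that does not exist: by default it raises KeyError, while errors='ignore' skips the missing label. The sketch below uses a fresh DataFrame; 'Z' is a deliberately nonexistent column chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Drop several rows at once by passing a list of index labels.
rows_removed = df.drop([0, 2], axis=0)

# By default, a label that does not exist raises KeyError.
try:
    df.drop(['A', 'Z'], axis=1)
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

# errors='ignore' drops the labels that exist and skips the rest.
cols_removed = df.drop(['A', 'Z'], axis=1, errors='ignore')
```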

Advanced Usage

Moving to more advanced examples, you can also combine drop() with boolean indexing for more dynamic row or column removal. For instance, to drop rows based on a condition:

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NY', 'LA', 'Chicago']
})

df.drop(df[df.Age < 30].index, inplace=True)

This code drops rows where the age is less than 30: Alice’s row is removed, and the rows for Bob and Charlie are kept. The boolean condition selects the matching rows, and their index labels are then passed to drop().
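For comparison, the same result can usually be reached by keeping the rows that do not match the condition, without calling drop() at all. A short sketch on the same data (dropped and kept are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NY', 'LA', 'Chicago']
})

# Approach from the text: look up the labels of matching rows, then drop them.
dropped = df.drop(df[df['Age'] < 30].index)

# Alternative: keep the rows that do NOT match the condition.
kept = df[df['Age'] >= 30]
```

The two forms agree here because the index labels are unique. If several rows share a label, drop() removes every row with that label, so the mask-based form is the safer choice in that case.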

Analogously, to remove columns based on certain conditions (for example, dropping columns that contain any missing values), you could do:

df.drop(columns=df.columns[df.isnull().any()], inplace=True)

Here, df.isnull().any() flags each column that contains at least one missing value, and the flagged column labels are passed to drop(). The example DataFrame above has no missing values, so in this case no columns are removed.
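To see the column filter actually remove something, the sketch below uses a variant of the example data with one missing age. The missing value is an assumption added for illustration and is not part of the original dataset. It also shows dropna(axis=1), which gives the same result for this particular condition.

```python
import pandas as pd

# Illustrative data: Bob's age is deliberately missing.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, None, 35],
    'City': ['NY', 'LA', 'Chicago']
})

# Boolean Series indexed by column name: True where a column has any missing value.
has_missing = df.isnull().any()

# Pass the flagged column labels to drop().
cleaned = df.drop(columns=df.columns[has_missing])

# For this specific condition, dropna(axis=1) is an equivalent shortcut.
same_result = df.dropna(axis=1)
```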

Conclusion

Dropping rows and columns in Pandas DataFrames is a straightforward but essential task in data manipulation and cleaning. By using the options provided by the drop() method, such as single labels, label lists, and labels selected by a condition, you can efficiently refine your dataset and get it ready for analysis or modeling.
