Sling Academy
Home/Pandas/Pandas: Select Columns by Data Type in a DataFrame

Pandas: Select Columns by Data Type in a DataFrame

Last updated: March 10, 2023

This concise and straightforward article shows you how to select columns by data type in a DataFrame in Pandas.

What is the point?

In Pandas, we can select columns based on their data types using the select_dtypes() method. This method returns a new DataFrame containing only columns of the specified data type. We can specify the data type we want to select using the include or exclude parameters.

If you want to select only columns with numerical data types, you can use:

df.select_dtypes(include=['number'])

In case you want to select only columns with string data types, you can use:

df.select_dtypes(include=['object'])

You can also combine multiple data types using a list or a tuple. For example, if you want to retrieve columns with integer or boolean data types, use this:

df.select_dtypes(include=['int', 'bool'])

The exclude parameter is helpful when you need to specify which data types to exclude from the selection. For example, if you want to select all columns except those with datetime data types, you can use:

df.select_dtypes(exclude=['datetime'])

For a better and deeper understanding, see the real-world examples below.

Examples

Example 1

Suppose we have a DataFrame containing columns of different data types:

import pandas as pd

data = {'name': ['John', 'Jane', 'Bob'],
        'age': [25, 30, 35],
        'net_worth': [1000, 20000, 300000],
        'is_investor': [False, False, True]}

df = pd.DataFrame(data)

To select only the columns with numeric data types, we call the select_dtypes() method with the include parameter set to number like so:

numeric_cols = df.select_dtypes(include='number')
print(numeric_cols)

Output:

   age  net_worth
0   25       1000
1   30      20000
2   35     300000

To select only the non-numeric columns (object, bool, category), we can use the select_dtypes() method with the exclude parameter set to number:

non_numeric_cols = df.select_dtypes(exclude='number')
print(non_numeric_cols)

Output:

   name  is_investor
0  John        False
1  Jane        False
2   Bob         True

Example 2

Let’s say we have a dataset of customer transactions containing columns of different data types:

import pandas as pd

data = {
    'transaction_id': [101, 102, 103, 104, 105],
    'date': ['2023-01-01', '2024-01-02', '2024-01-03', '2023-01-04', '2023-01-05'],
    'customer_id': ['C001', 'C002', 'C003', 'C004', 'C005'],
    'product_name': ['A', 'B', 'C', 'A', 'B'],
    'price': [10.5, 20.0, 5.5, 7.0, 18.5],
    'quantity': [2, 1, 3, 4, 2],
    'is_returned': [False, False, True, False, True]
}

df = pd.DataFrame(data)

# Convert the date column to datetime type
df['date'] = pd.to_datetime(df['date'])

Suppose we are only interested in the dates of transactions and whether those transactions are from old customers or not, we will do the following:

numeric_cols = df.select_dtypes(include=['datetime64', 'bool'])
print(numeric_cols)

Output:

        date  is_returned
0 2023-01-01        False
1 2024-01-02        False
2 2024-01-03         True
3 2023-01-04        False
4 2023-01-05         True

What’s next?

You can practice more about selecting columns by data type in Pandas DataFrames with larger and more complex datasets provided by Sling Academy:

Good luck!

Next Article: Pandas: How to save a DataFrame in JSON format (3 examples)

Previous Article: Pandas: How to save a DataFrame to an Excel file

Series: DateFrames in Pandas

Pandas

You May Also Like

  • How to Use Pandas Profiling for Data Analysis (4 examples)
  • How to Handle Large Datasets with Pandas and Dask (4 examples)
  • Pandas – Using DataFrame.pivot() method (3 examples)
  • Pandas: How to ‘FULL JOIN’ 2 DataFrames (3 examples)
  • Pandas: Select columns whose names start/end with a specific string (4 examples)
  • 3 ways to turn off future warnings in Pandas
  • How to Integrate Pandas with Apache Spark
  • How to Use Pandas for Web Scraping and Saving Data (2 examples)
  • How to Clean and Preprocess Text Data with Pandas (3 examples)
  • Pandas – Using Series.replace() method (3 examples)
  • Pandas json_normalize() function: Explained with examples
  • Pandas: Reading CSV and Excel files from AWS S3 (4 examples)
  • Using pandas.Series.rank() method (4 examples)
  • Pandas: Dropping columns whose names contain a specific string (4 examples)
  • Pandas: How to print a DataFrame without index (3 ways)
  • Fixing Pandas NameError: name ‘df’ is not defined
  • Pandas – Using DataFrame idxmax() and idxmin() methods (4 examples)
  • Pandas FutureWarning: ‘M’ is deprecated and will be removed in a future version, please use ‘ME’ instead
  • Pandas: Checking equality of 2 DataFrames (element-wise)