Pandas: Count the number of rows and columns in a DataFrame

Updated: February 19, 2024 By: Guest Contributor

Introduction

In data analysis, understanding the structure of your dataset is crucial before moving on to more complex manipulation. One of the most basic facts about a DataFrame is its size: the number of rows and the number of columns. This article walks through several ways to count the rows and columns of a DataFrame using Pandas, a cornerstone Python library for data manipulation and analysis. We’ll start with basic techniques and gradually move to more advanced ones, such as counting rows that match a condition and counting columns of a given data type.

Preparation

Before counting anything, make sure Pandas is installed in your Python environment. You can install it with pip:

pip install pandas

Once it is installed, import Pandas to start working with DataFrames:

import pandas as pd

Creating a Sample DataFrame

For the purpose of this tutorial, let’s create a simple DataFrame to work with:

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
print(df)

This code snippet creates a DataFrame with names, ages, and cities of four individuals. Here’s our starting point.

Basic Row and Column Count

The simplest way to count the number of rows and columns is to use the shape attribute of a DataFrame:

print(df.shape)

This will output a tuple where the first element represents the number of rows, and the second element the number of columns:

(4, 3)

Hence, our DataFrame has 4 rows and 3 columns.
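
Because shape is an ordinary tuple, it can be unpacked into two variables. The sketch below rebuilds the sample DataFrame so that it runs on its own; the names n_rows and n_cols are chosen here for illustration only. It also shows the size attribute, which gives the total number of cells (rows times columns).

```python
import pandas as pd

# Rebuild the sample DataFrame from this tutorial
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# shape is a (rows, columns) tuple, so it unpacks directly
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")  # 4 rows, 3 columns

# size is the total number of cells: rows * columns
print(df.size)  # 12
```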

Counting Rows

To count just the number of rows, you can use the len() function along with the index attribute:

print(len(df.index))

This outputs:

4

indicating that there are four rows.

Counting Columns

For counting columns, you’ll use the columns attribute combined with len():

print(len(df.columns))

This outputs:

3

showing there are three columns.
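
The row and column counts above can also be read from shape or, for rows, from len() applied to the DataFrame itself. The following sketch, which recreates the sample data so it is self-contained, checks that these spellings agree; which one to use is largely a matter of taste.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Three equivalent ways to get the row count
rows_a = len(df.index)
rows_b = len(df)        # len() of a DataFrame is its number of rows
rows_c = df.shape[0]

# Two equivalent ways to get the column count
cols_a = len(df.columns)
cols_b = df.shape[1]

print(rows_a, rows_b, rows_c)  # 4 4 4
print(cols_a, cols_b)          # 3 3
```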

Using Count Method for Non-NA Values

If you’re interested in how many rows actually hold a value in a specific column, you can use the count() method. It counts only non-NA (non-null) entries, so its result can be smaller than the total number of rows:

print(df['Age'].count())

This outputs:

4

indicating that the ‘Age’ column has four non-NA entries. The sample data has no missing values, so this matches the row count.
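
The difference between count() and the row count only shows up when values are missing. As a minimal sketch, the example below uses a hypothetical variant of the sample data (df_missing, not part of the original tutorial) in which one age is NaN.

```python
import pandas as pd

# Hypothetical variant of the sample data with one missing age
df_missing = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, float('nan'), 29, 32],
})

total_rows = len(df_missing)             # counts every row
non_na_ages = df_missing['Age'].count()  # skips the NaN entry
per_column = df_missing.count()          # non-NA count for each column

print(total_rows)   # 4
print(non_na_ages)  # 3
print(per_column)
```

Calling count() on the whole DataFrame returns a Series with one non-NA count per column, which is a quick way to spot columns with missing data.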

Advanced Techniques

Counting Rows with Conditions

Sometimes you’ll want to count the rows that meet certain criteria. For instance, to count the number of people older than 30:

print(len(df[df['Age'] > 30]))

This code filters the DataFrame for rows where the age is greater than 30, and then uses len() to count the filtered rows, outputting:

2
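
An alternative to filtering and then calling len() is to sum the boolean mask directly, since each True counts as 1. The sketch below recreates the sample data and compares both approaches; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Boolean Series: True where the condition holds
mask = df['Age'] > 30

# Summing the mask counts the True values without building a filtered DataFrame
count_by_sum = int(mask.sum())

# Same result via filtering, as in the tutorial
count_by_filter = len(df[mask])

print(count_by_sum, count_by_filter)  # 2 2
```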

Counting Columns Based on Type

To count the number of columns of a certain data type, you can use the following approach:

print(len(df.select_dtypes(include=['object']).columns))

This will output the number of columns that are of type object (the dtype that Pandas versions before 3.0 use for strings by default), which for our DataFrame is:

2
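
The same pattern works for other type selectors. As a sketch, the example below counts numeric columns using the 'number' selector accepted by select_dtypes; the sample data is recreated so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 34, 29, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# 'number' selects all numeric dtypes (integers and floats)
numeric_cols = df.select_dtypes(include='number').columns
n_numeric = len(numeric_cols)

print(list(numeric_cols))  # ['Age']
print(n_numeric)           # 1

# Inspect the dtype of every column to see what there is to count
print(df.dtypes)
```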

Dynamic Row and Column Counting

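In real-world scenarios, the number of rows or columns changes as you work, for example after dropping missing values or adding new columns. The shape, len(df.index), and len(df.columns) values always reflect the current state of the DataFrame they are read from, so read them again after each such operation. Keep in mind that methods such as dropna() return a new DataFrame by default, so check the counts on the returned object rather than on the original.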
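
As a rough illustration of tracking counts across operations, the sketch below starts from a hypothetical DataFrame with one missing age, drops the incomplete row, and then adds a column. The 'Senior' column and the variable names are invented for this example.

```python
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, float('nan'), 29, 32],
})
before = df.shape
print(before)  # (4, 2)

# dropna() returns a new DataFrame without the row containing NaN
df = df.dropna()
after_drop = df.shape
print(after_drop)  # (3, 2)

# assign() returns a new DataFrame with an extra column
df = df.assign(Senior=df['Age'] > 30)
after_add = df.shape
print(after_add)  # (3, 3)
```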

Conclusion

Knowing how to count the number of rows and columns in a DataFrame is an essential skill in data analysis with Pandas. This tutorial explored various methods, from the basic shape attribute, to counting non-NA values and applying conditions for more insightful counts. As you become more comfortable with these techniques, you’ll be able to quickly assess and understand the structure of your datasets, paving the way for deeper data exploration and analysis.