Pandas DataFrame: Counting rows in each group (4 examples)

Updated: February 23, 2024 By: Guest Contributor Post a comment

Introduction

Pandas is a fast, powerful, flexible and easy-to-use open-source data manipulation and analysis library for Python. One of its many features includes grouping rows of data and applying a function to each group, such as counting the number of rows per group. In this tutorial, we’re going to explore four different examples of how to count rows in each group using Pandas DataFrame.

Setup

Before we start, make sure you have Pandas installed in your environment:

pip install pandas

And then, import Pandas in your script:

import pandas as pd

Example 1: Basic GroupBy and Count

Here’s the simplest way to group rows and count them. Imagine you have the following dataframe:

data = {
  'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'A'],
  'Value': [1, 2, 3, 4, 5, 6, 7, 8]
}
df = pd.DataFrame(data)

Then, to count the number of rows for each category, you use the .groupby() method followed by .size():

result = df.groupby('Category').size()
print(result)

This would yield the following output:

A    4
B    2
C    2
dtype: int64

Example 2: Using .value_counts() for Simple Counts

An even more straightforward method for simple categories is to use the .value_counts() method:

result = df['Category'].value_counts()
print(result)

This method automatically sort the counts in descending order, providing a quick overview of which category is most frequent:

A    4
B    2
C    2
Name: Category, dtype: int64

Example 3: Counting with Aggregations

For more sophisticated analysis, you can use .groupby() followed by .agg() to perform multiple aggregation functions, including counts. Let’s add another layer of detail to our dataframe:

data.update({'Subcategory': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X']})
df = pd.DataFrame(data)

Now, we group by both ‘Category’ and ‘Subcategory’ and apply multiple aggregations:

result = df.groupby(['Category', 'Subcategory'])
            .agg({'Category': 'count', 'Value': 'mean'})
            .rename(columns={'Category': 'Count', 'Value': 'Average Value'})
print(result)

This approach allows you to get counts and also perform other calculations, like averaging, within the same operation. The output would look something like this:

                      Count  Average Value
Category Subcategory                        
A        X                  2            4.5
         Y                  2            4.5
B        X                  2            3.5
C        Y                  2            5.5

Example 4: Custom Functions with .apply()

Final example of this tutorial takes it a step further by applying custom functions to groups. This approach grants the utmost flexibility. Suppose you want to count rows but also label those with counts higher than a threshold as ‘High activity’ and the rest as ‘Low activity’:

def custom_count(group):
    count = len(group)
    if count > 3:
        return 'High activity'
    else:
        return 'Low activity'

result = df.groupby('Category').apply(custom_count)
print(result)

This would produce:

Category
A    High activity
B    Low activity
C    Low activity
dtype: object

Conclusion

Counting rows within groups is a fundamental operation for data analysis, providing insights into the structure and frequency of data categories. Pandas offers several intuitive methods to accomplish this, from basic .groupby() and .size() operations to more complex custom functions with .apply(). Having these tools at your disposal empowers you to perform deep data analysis and draw meaningful conclusions about your datasets.