Pandas DataFrame.value_counts() method: Explained with examples

Updated: February 22, 2024 By: Guest Contributor

Introduction

Pandas is an open-source data manipulation and analysis library for Python, offering data structures and operations for manipulating numerical tables and time series. Among its versatile functions, DataFrame.value_counts() is a crucial method for data analysis, enabling users to count the frequency of unique values in a DataFrame or Series. This tutorial delves into the value_counts() method, demonstrating its applications through progressively complex examples.

The Fundamentals

Before we dive into examples, it’s essential to understand what value_counts() does. The method returns a Series containing the counts of unique values, sorted in descending order by default. This makes it immensely helpful in exploratory data analysis, allowing us to quickly inspect frequency distributions.

To start, let’s create a basic pandas DataFrame:

import pandas as pd

# Sample DataFrame
data = {'color': ['blue', 'green', 'red', 'blue', 'green']}
dataframe = pd.DataFrame(data)

Now, let’s call the value_counts() method on the ‘color’ column:

print(dataframe['color'].value_counts())

The output will be:

blue     2
green    2
red      1
Name: color, dtype: int64

Here, you can see the frequency of each color: ‘blue’ and ‘green’ appear twice, while ‘red’ appears only once.
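Because the returned object is an ordinary pandas Series, you can index it by label or apply any Series method to it. A quick sketch (the variable name counts is our own):

counts = dataframe['color'].value_counts()

print(counts['blue'])   # 2 - the count for a specific value
print(counts.idxmax())  # label with the highest count (ties resolved by position)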

Customizing value_counts()

The value_counts() method offers several parameters to customize its output, such as sort, ascending, and normalize. Let’s see how we can apply these to our data:

print(dataframe['color'].value_counts(sort=True, ascending=True))

Now, the output will list the colors in ascending order based on their count:

red      1
blue     2
green    2
Name: color, dtype: int64
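If you would rather order the result by the labels themselves (for example, alphabetically) instead of by count, one common pattern is to chain sort_index() onto the result:

# Sort by the index (the color names) instead of by frequency
print(dataframe['color'].value_counts().sort_index())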

By setting normalize=True, we can also get the relative frequencies:

print(dataframe['color'].value_counts(normalize=True))

The output here shows the proportion of each color:

blue     0.4
green    0.4
red      0.2
Name: color, dtype: float64
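Another parameter worth knowing about is dropna. By default, value_counts() ignores missing values; passing dropna=False counts them as their own category. A small sketch using a hypothetical Series that contains a missing entry:

# Hypothetical Series with a missing entry
s = pd.Series(['blue', 'green', None, 'blue'])

print(s.value_counts())              # the missing value is excluded
print(s.value_counts(dropna=False))  # the missing value gets its own row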

Using value_counts() on multiple columns

Calling value_counts() directly on a DataFrame (supported since pandas 1.1.0) counts unique rows, that is, combinations of values across the columns, rather than the individual values themselves. If what you want instead is a single frequency table of every value regardless of which column it came from, you first need to reshape the columns of interest into one long Series. Let’s explore a simple way to do this using melt(); an example of the row-combination behaviour follows right after it:

# Assuming the same DataFrame 'dataframe'
dataframe['number'] = [1, 2, 1, 1, 3]  # Add a new column

# melt() stacks every column into a long format with 'variable' and 'value' columns
melted_df = pd.melt(dataframe)
print(melted_df['value'].value_counts())

melt() reshapes the DataFrame into a long format with a single ‘value’ column, making value_counts() applicable to all of the original columns at once. The output is a combined count of the values from every column.
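For comparison, calling value_counts() directly on the DataFrame counts unique rows rather than individual values, returning a Series indexed by each (‘color’, ‘number’) combination:

# Counts unique rows, i.e. ('color', 'number') combinations
print(dataframe.value_counts())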

Advanced example: Grouping with value_counts()

Another powerful feature of pandas is grouping data using the groupby() method, which can be combined with value_counts() for more intricate analysis. Let’s look at a group-wise value count example:

# More complex DataFrame
complex_data = {'color': ['blue', 'green', 'red', 'blue', 'green', 'green'],
                'shape': ['circle', 'triangle', 'circle', 'square', 'square', 'circle']}
complex_df = pd.DataFrame(complex_data)

gb = complex_df.groupby('color')['shape'].value_counts()
print(gb)

This operation groups the DataFrame by the ‘color’ column, then applies value_counts() on the ‘shape’ column within each group. The result is a multi-index Series showing the count of shapes for each color.
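Since the result carries a MultiIndex, it is often handy to pivot it into a table with colors as rows and shapes as columns. One straightforward way is unstack():

# Pivot the inner index level (shape) into columns; absent combinations become 0
print(gb.unstack(fill_value=0))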

Conclusion

The value_counts() method in pandas is a versatile tool for counting unique values within a Series or across the rows of a DataFrame. Throughout this tutorial, we’ve explored different ways to utilize this method, from simple frequency counts to more advanced applications involving grouping and reshaping. By mastering value_counts(), you can enrich your data analysis process, gaining deeper insights into your datasets.