Pandas Series: Counting NaN and Non-NaN Values

Updated: February 17, 2024 By: Guest Contributor

Introduction

Pandas is a powerful data manipulation and analysis library for Python. Its performance-critical internals are written in Cython and C, with the rest in Python. One of the basic yet crucial tasks when working with datasets is handling missing or null values. In this tutorial, we will count NaN (Not a Number) and non-NaN values in a Pandas Series, with practical examples ranging from basic to advanced.

Getting Started

Before diving into the examples, ensure you have Pandas installed. Install it using pip if you haven’t:

pip install pandas

Import Pandas in your script to get started:

import pandas as pd

Understanding NaN Values

NaN stands for Not a Number. It is a special floating-point value that Pandas uses to mark missing or null entries in a dataset. Recognizing and handling NaN values accurately is vital for proper data analysis and preprocessing.
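One property of NaN is easiest to see in code: it never compares equal to anything, including itself, so an equality test cannot find it. The short sketch below uses illustrative variable names (they are not part of the tutorial's running example) to show why a dedicated check such as pd.isna() is used instead. It also shows that pd.isna() treats None as missing.

```python
import numpy as np
import pandas as pd

value = np.nan  # illustrative missing value

# NaN is not equal to anything, including itself
equal_to_itself = (value == value)

# pd.isna() is the reliable check; it also flags None as missing
nan_detected = pd.isna(value)
none_detected = pd.isna(None)

print(equal_to_itself, nan_detected, none_detected)
```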

Basic Example: Creating a Series with NaN Values

import numpy as np

# Creating a Pandas Series with NaN values
seriesData = pd.Series([1, np.nan, 3, 4, np.nan])
print(seriesData)

Output:

0    1.0
1    NaN
2    3.0
3    4.0
4    NaN
dtype: float64

This series contains both NaN and non-NaN values. Let’s learn how to count them.

Counting NaN Values

To count NaN values in a Pandas Series, call isna() and then sum(). isna() returns a Boolean Series that is True wherever a value is missing, and sum() counts the True entries.

nan_count = seriesData.isna().sum()
print(f'Number of NaN values: {nan_count}')

Output:

Number of NaN values: 2
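The intermediate Boolean mask can be useful in its own right. The sketch below rebuilds the same data under the illustrative name s so that it runs on its own, prints the mask, and takes mean() of the mask to get the share of missing values rather than the count.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3, 4, np.nan])  # same data as seriesData

mask = s.isna()                  # True where the value is missing
nan_count = int(mask.sum())      # each True counts as 1
nan_share = float(mask.mean())   # fraction of entries that are missing

print(mask.tolist())
print(nan_count, nan_share)
```

Here two of the five entries are missing, so the share is 0.4.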

Counting Non-NaN Values

To count non-NaN values, use notna() followed by sum() in the same way.

non_nan_count = seriesData.notna().sum()
print(f'Number of non-NaN values: {non_nan_count}')

Output:

Number of non-NaN values: 3
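As an alternative, Series.count() returns the number of non-missing values directly, while the size attribute includes them. A minimal sketch, again using the illustrative name s for a standalone copy of the data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3, 4, np.nan])  # same data as seriesData

non_nan_count = int(s.count())     # count() ignores missing values
total = s.size                     # size counts every entry
nan_count = total - non_nan_count  # the difference is the NaN count

print(non_nan_count, total, nan_count)
```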

Advanced Techniques

Using value_counts()

The value_counts() method of a Pandas Series counts how often each distinct value occurs. By default, it skips NaN values. To include them in the count, pass dropna=False.

# Counting with value_counts(), including NaN
valueCountSeries = seriesData.value_counts(dropna=False)
print(valueCountSeries)

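Output (pandas 2.x; older versions omit the "Name: count" part and may order tied values differently):

NaN    2
1.0    1
3.0    1
4.0    1
Name: count, dtype: int64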
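If only the split between missing and present values matters, value_counts() can be applied to the Boolean mask instead of the raw values. The following is a sketch under that assumption; the names s, summary, and counts are illustrative, and the result is converted to a plain dictionary so the lookups do not depend on index behavior.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3, 4, np.nan])  # same data as seriesData

# Count the True/False entries of the mask in one call
summary = s.isna().value_counts()
counts = summary.to_dict()  # keys are False (present) and True (missing)

nan_count = int(counts.get(True, 0))       # 0 if nothing is missing
non_nan_count = int(counts.get(False, 0))  # 0 if everything is missing

print(nan_count, non_nan_count)
```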

Combining Conditions

You can combine conditions using logical operators to filter and count specific scenarios in your data.

# Example: Counting values greater than 2 and not NaN
filtered_count = seriesData[(seriesData > 2) & seriesData.notna()].count()
print(f'Filtered non-NaN count: {filtered_count}')

Output:

Filtered non-NaN count: 2
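Note that any comparison with NaN evaluates to False, so the comparison alone already excludes missing values; the explicit notna() mainly documents the intent. The sketch below checks this on a standalone copy of the data (the name s is illustrative) by summing the Boolean masks directly.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3, 4, np.nan])  # same data as seriesData

# Summing the combined mask counts the rows that satisfy both conditions
with_notna = int(((s > 2) & s.notna()).sum())

# NaN > 2 is False, so the comparison alone gives the same count
without_notna = int((s > 2).sum())

print(with_notna, without_notna)
```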

Working with Larger Datasets

These methods carry over to larger datasets as well. On a DataFrame with multiple columns, isna().sum() and count() return one result per column, and summing those results gives a count for the entire DataFrame.
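A small sketch of the column-wise and whole-DataFrame counts follows. The DataFrame and its column names (a, b, c) are hypothetical and stand in for a real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only
df = pd.DataFrame({
    "a": [1, np.nan, 3],
    "b": [np.nan, np.nan, 6],
    "c": ["x", None, "z"],
})

nan_per_column = df.isna().sum()        # missing values per column
non_nan_per_column = df.count()         # non-missing values per column
total_nan = int(df.isna().sum().sum())  # missing values in the whole DataFrame

print(nan_per_column.to_dict())
print(non_nan_per_column.to_dict())
print(total_nan)
```

The None in the object column c is counted as missing alongside the NaN values in the numeric columns.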

Conclusion

Understanding how to count NaN and non-NaN values is fundamental for data preprocessing and cleaning, which are critical steps in the data analysis process. The examples provided in this tutorial should equip you with the necessary tools to handle missing values effectively in your datasets. Whether you are working with small Series objects or large DataFrames, Pandas provides efficient and flexible methods to manage and analyze your data.