Mastering pandas.Series.fillna() method (6 examples)

Updated: February 18, 2024 By: Guest Contributor

Overview

Pandas is an indispensable part of the Python data science ecosystem, providing robust, flexible, and efficient tools for data manipulation and analysis. Among its features, the fillna() method is a convenient way to handle missing values in DataFrames and Series. In this tutorial, we will master the use of the Series.fillna() method and its companion methods with six examples, moving from basic applications to more advanced use cases.

Working with pandas.Series.fillna()

Missing data is a common issue in data analysis and can significantly impact the results. The pandas.Series.fillna() method provides a convenient way to handle these missing values by replacing them with a specified value. Before diving into examples, ensure you have pandas installed:

pip install pandas
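Before the examples, it helps to see which entries fillna() treats as missing: anything that isna() flags, which for a float Series includes both None and NaN. The sketch below (the sample values are made up for illustration) also shows that fillna() returns a new Series by default and leaves the original untouched.

```python
import pandas as pd

# None and float("nan") are both stored as NaN in a float64 Series
s = pd.Series([1.5, None, float("nan"), 4.0])
print(s.isna().tolist())   # [False, True, True, False]

# fillna() returns a new Series; s itself is not modified
filled = s.fillna(0.0)
print(filled.tolist())     # [1.5, 0.0, 0.0, 4.0]
print(s.isna().sum())      # 2
```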

Example 1: Basic Usage

Let’s start with the simplest example, replacing NaN (Not a Number) with a specified value.

import pandas as pd

# Create a pandas Series with missing values (None is stored as NaN)
s = pd.Series([1, None, 3, None])

# Fill missing values with 0
filled_s = s.fillna(0)
print(filled_s)

Output:

0    1.0
1    0.0
2    3.0
3    0.0
dtype: float64
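A constant such as 0 can distort numeric data. A common variant, shown here as a sketch on illustrative values, fills the gaps with a statistic computed from the Series itself. Series.mean() skips NaN by default, so the mean of 1 and 3 is 2.0.

```python
import pandas as pd

s = pd.Series([1, None, 3, None])

# mean() ignores NaN by default (skipna=True), so this is (1 + 3) / 2 = 2.0
mean_value = s.mean()
filled_s = s.fillna(mean_value)
print(filled_s.tolist())  # [1.0, 2.0, 3.0, 2.0]
```

Swapping in s.median() works the same way and is less sensitive to outliers.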

Example 2: Filling with a String

Numeric fill values are not always suitable. For text data, a placeholder string usually makes more sense, so let's use one to fill in the missing entries.

import pandas as pd

# Create a Series of strings with missing values (None and NaN both count as missing)
s = pd.Series(["apple", None, "banana", float("nan"), "cherry"])

# Fill missing values with 'unknown'
filled_s = s.fillna('unknown')
print(filled_s)

Output:

0      apple
1    unknown
2     banana
3    unknown
4     cherry
dtype: object
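The type of the fill value matters. Filling a numeric Series with a string, as in this sketch (the prices are illustrative), does not raise an error; pandas instead upcasts the result to object dtype, so the data no longer behaves as a float column. String placeholders are best kept for text data like the example above.

```python
import pandas as pd

prices = pd.Series([9.99, None, 4.5])
print(prices.dtype)       # float64

# A string cannot be stored in a float64 Series, so pandas upcasts to object
labelled = prices.fillna("missing")
print(labelled.dtype)     # object
print(labelled.tolist())  # [9.99, 'missing', 4.5]
```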

Example 3: Forward Fill

Replacing missing data with the same value may not always be desired. A more nuanced approach is the forward fill, where the last known value propagates forward until another non-missing value is found.

import pandas as pd

# Create a Series
s = pd.Series([1, None, 2, None, None, 3])

# Forward fill with Series.ffill(); the older s.fillna(method='ffill')
# form is deprecated since pandas 2.1
filled_s = s.ffill()
print(filled_s)

Output:

0    1.0
1    1.0
2    2.0
3    2.0
4    2.0
5    3.0
dtype: float64
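To make the propagation concrete, here is a plain-Python sketch of the same idea. forward_fill() is a hypothetical helper written only for illustration, not part of pandas; it carries the most recent non-missing value forward and leaves any leading gap empty, which matches pandas' forward fill on this input.

```python
import math
import pandas as pd

def forward_fill(values):
    """Hypothetical helper: carry the last seen non-missing value forward."""
    result = []
    last = None  # no valid value seen yet
    for v in values:
        is_missing = v is None or (isinstance(v, float) and math.isnan(v))
        if is_missing:
            result.append(last)  # stays None until the first valid value
        else:
            last = v
            result.append(v)
    return result

data = [1, None, 2, None, None, 3]
manual = forward_fill(data)
via_pandas = pd.Series(data).ffill().tolist()
print(manual)      # [1, 1, 2, 2, 2, 3]
print(via_pandas)  # [1.0, 1.0, 2.0, 2.0, 2.0, 3.0]
```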

Example 4: Backward Fill

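Backward fill works like forward fill in the opposite direction: each missing value takes the next valid value after it. This is useful when the next observation is a reasonable stand-in for the gap, for example in time-series data where later values are already known.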

import pandas as pd

# Create a Series
s = pd.Series([None, 2, None, 4, None, 6])

# Backward fill with Series.bfill(); the older s.fillna(method='bfill')
# form is deprecated since pandas 2.1
filled_s = s.bfill()
print(filled_s)

Output:

0    2.0
1    2.0
2    4.0
3    4.0
4    6.0
5    6.0
dtype: float64
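Neither direction can fill every gap on its own: forward fill leaves leading NaNs because nothing precedes them, and backward fill leaves trailing NaNs because nothing follows them. A common pattern, sketched here on made-up values, chains the two so that interior and trailing gaps are forward-filled first and any leading gap is then back-filled.

```python
import pandas as pd

s = pd.Series([None, 2, None, 4, None])

forward = s.ffill()    # leading NaN stays: nothing before it to copy
backward = s.bfill()   # trailing NaN stays: nothing after it to copy
both = s.ffill().bfill()

print(forward.tolist())   # [nan, 2.0, 2.0, 4.0, 4.0]
print(backward.tolist())  # [2.0, 2.0, 4.0, 4.0, nan]
print(both.tolist())      # [2.0, 2.0, 2.0, 4.0, 4.0]
```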

Example 5: Using a Limit

Both forward and backward fills accept a limit argument that caps how many consecutive NaNs are filled in each gap. This keeps a single value from being stretched across a long run of missing data; any NaNs beyond the limit are left in place.

import pandas as pd

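# Create a Series
s = pd.Series([1, None, None, 2, None, None, 3])

# Forward fill at most one consecutive NaN per gap (replaces the older
# s.fillna(method='ffill', limit=1), whose method argument is deprecated)
filled_s = s.ffill(limit=1)
print(filled_s)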

Output:

0    1.0
1    1.0
2    NaN
3    2.0
4    2.0
5    NaN
6    3.0
dtype: float64
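The limit works the same way for backward fill, except that it counts back from the next valid value. In this sketch on the same Series, only the NaN directly before each valid value is filled.

```python
import pandas as pd

s = pd.Series([1, None, None, 2, None, None, 3])

# Fill at most one NaN per gap, using the value that follows the gap
filled_s = s.bfill(limit=1)
print(filled_s.tolist())  # [1.0, nan, 2.0, 2.0, nan, 3.0, 3.0]
```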

Example 6: Using a Series for Replacement

Instead of a single constant, you can pass another pandas Series as the fill value, so each missing entry gets its own replacement.

import pandas as pd

# Create two Series
s1 = pd.Series([1, None, 3])
s2 = pd.Series(["a", "b", "c"])

# Use s2 to fill missing values in s1
filled_s1 = s1.fillna(s2)
print(filled_s1)

Output:

0    1.0
1      b
2    3.0
dtype: object

Because s1 and s2 share the default index (0, 1, 2), the missing value at label 1 is replaced with s2[1], which is "b". fillna() accepts a scalar, a dict, or a Series as its value; a dict or Series is aligned to the original by index label rather than by position, and any label without a matching fill value stays NaN. Since s2 holds strings, the filled Series is upcast to object dtype.
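The label matching is easiest to see with an explicit index. In this sketch (labels and values are illustrative), the fill Series contains an extra label "w" that does not exist in s1 and is ignored, while label "z" has no fill value and stays NaN. A plain dict keyed by label is aligned the same way.

```python
import pandas as pd

s1 = pd.Series([1.0, None, None], index=["x", "y", "z"])

# Values are matched by index label, not by position; "w" has no counterpart in s1
fill_values = pd.Series({"y": 20.0, "w": 99.0})
from_series = s1.fillna(fill_values)
print(from_series.tolist())  # [1.0, 20.0, nan]

# A dict keyed by label is aligned the same way
from_dict = s1.fillna({"z": 30.0})
print(from_dict.tolist())    # [1.0, nan, 30.0]
```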

Conclusion

The pandas.Series.fillna() method, together with its companions Series.ffill() and Series.bfill(), is a versatile tool for dealing with missing data. By choosing the right approach (a constant, forward fill, backward fill, a limit on consecutive fills, or values taken from another Series), you can control how gaps in a pandas Series are filled and limit their impact on your analysis.