Sling Academy

Mastering pandas.Series.fillna() method (6 examples)

Last updated: February 18, 2024

Overview

Pandas is an indispensable part of the Python data science ecosystem, providing robust, flexible, and efficient tools for data manipulation and analysis. Among its features, the fillna() method offers a concise way to handle missing values in DataFrames and Series. In this tutorial, we will work through six examples of the Series.fillna() method, moving from basic applications to more advanced use cases.

Working with pandas.Series.fillna()

Missing data is a common issue in data analysis and can significantly skew the results if left unhandled. The pandas.Series.fillna() method provides a convenient way to handle these missing values, either by replacing them with a specified value or by propagating neighboring values into the gaps. Before diving into the examples, make sure you have pandas installed:

pip install pandas

Example 1: Basic Usage

Let’s start with the simplest example, replacing NaN (Not a Number) with a specified value.

import pandas as pd

# Create a pandas Series with missing values
# (None is stored as NaN, so the Series has dtype float64)
s = pd.Series([1, None, 3, None])

# Fill missing values with 0
filled_s = s.fillna(0)
print(filled_s)

Output:

0    1.0
1    0.0
2    3.0
3    0.0
dtype: float64
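Two details are worth checking after a basic fill. First, fillna() returns a new Series and leaves the original untouched unless you pass inplace=True. Second, because NaN is a floating-point value, a numeric Series with missing values is stored as float64, so the filled result is float too. The short sketch below (variable names are illustrative) verifies the first point and casts the result back to integers once no gaps remain.

```python
import pandas as pd

# None becomes NaN, so this Series is float64
s = pd.Series([1, None, 3, None])

# fillna returns a new Series by default; the original keeps its NaNs
filled_s = s.fillna(0)
print(s.isna().sum())  # 2

# With the gaps filled, casting back to integers is safe
int_s = filled_s.astype("int64")
print(int_s.tolist())  # [1, 0, 3, 0]
print(int_s.dtype)     # int64
```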

Example 2: Filling with a String

Numeric fill values do not make sense for text data. For a Series of strings (stored with dtype object), you can fill the gaps with a placeholder string instead.

import pandas as pd

# Create a Series with missing values
s = pd.Series(["apple", None, "banana", pd.NaT, "cherry"])

# Fill missing values with 'unknown'
filled_s = s.fillna('unknown')
print(filled_s)

Output:

0      apple
1    unknown
2     banana
3    unknown
4     cherry
dtype: object
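fillna() treats None, np.nan, and pd.NaT alike as missing, which is why both None and pd.NaT in the example above were replaced. A quick way to preview which entries will be filled is isna(). This sketch adds np.nan to the same fruit list so all three markers appear side by side.

```python
import numpy as np
import pandas as pd

# None, pd.NaT and np.nan are all recognized as missing
s = pd.Series(["apple", None, "banana", pd.NaT, np.nan, "cherry"])

# isna() flags exactly the entries that fillna() will replace
print(s.isna().tolist())  # [False, True, False, True, True, False]

filled_s = s.fillna("unknown")
print(filled_s.tolist())
```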

Example 3: Forward Fill

Replacing missing data with the same value may not always be desired. A more nuanced approach is the forward fill, where the last known value propagates forward until another non-missing value is found.

import pandas as pd

# Create a Series
s = pd.Series([1, None, 2, None, None, 3])

# Use forward fill (ffill() replaces fillna(method='ffill'),
# which is deprecated since pandas 2.1)
filled_s = s.ffill()
print(filled_s)

Output:

0    1.0
1    1.0
2    2.0
3    2.0
4    2.0
5    3.0
dtype: float64
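Forward fill can only copy a value that appears earlier in the Series, so NaNs at the very start stay missing. The sketch below uses s.ffill(), the dedicated method that recent pandas versions recommend over fillna(method='ffill'), and then fills the leftover leading gap with a constant. Using 0 for that constant is only an assumption for illustration; pick whatever default suits your data.

```python
import pandas as pd

# Two leading gaps, followed by values with a gap in between
s = pd.Series([None, None, 1, None, 2])

# ffill() copies the last valid value forward; the leading NaNs have none
forward = s.ffill()
print(forward.isna().tolist())  # [True, True, False, False, False]

# Illustrative choice: fill whatever ffill() could not reach with a constant
complete = s.ffill().fillna(0)
print(complete.tolist())  # [0.0, 0.0, 1.0, 1.0, 2.0]
```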

Example 4: Backward Fill

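Backward fill works like forward fill in the opposite direction: each missing value takes the next valid value that follows it. In time-series data this can be useful when a later observation is a reasonable stand-in for an earlier gap, but keep in mind that it carries future information backward in time.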

import pandas as pd

# Create a Series
s = pd.Series([None, 2, None, 4, None, 6])

# Use backward fill (bfill() replaces fillna(method='bfill'),
# which is deprecated since pandas 2.1)
filled_s = s.bfill()
print(filled_s)

Output:

0    2.0
1    2.0
2    4.0
3    4.0
4    6.0
5    6.0
dtype: float64
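The mirror-image limitation applies to backward fill: a NaN at the end of the Series has no later value to copy. One common option, sketched here with s.bfill() (the dedicated equivalent of fillna(method='bfill')), is to chain a forward fill so the trailing gap takes the last known value. Whether that is appropriate depends on your data.

```python
import pandas as pd

s = pd.Series([1, None, 2, None])

# bfill() copies the next valid value backward; the last NaN has none
backward = s.bfill()
print(backward.isna().tolist())  # [False, False, False, True]

# Chain a forward fill so the trailing gap takes the last known value
both = s.bfill().ffill()
print(both.tolist())  # [1.0, 2.0, 2.0, 2.0]
```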

Example 5: Using a Limit

Both forward and backward fills accept a limit argument that caps how many consecutive NaNs are filled in each gap. Any NaNs beyond the limit stay missing, so long gaps are not silently covered with stale values.

import pandas as pd

# Create a Series
s = pd.Series([1, None, None, 2, None, None, 3])

# Forward fill at most one consecutive NaN per gap
filled_s = s.ffill(limit=1)
print(filled_s)

Output:

0    1.0
1    1.0
2    NaN
3    2.0
4    2.0
5    NaN
6    3.0
dtype: float64
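The limit argument works the same way for backward fill. In this sketch, which reuses the Series from the example above, only the NaN directly before each valid value is filled; the one further back in each gap stays missing.

```python
import pandas as pd

s = pd.Series([1, None, None, 2, None, None, 3])

# Backward fill at most one consecutive NaN per gap
filled_s = s.bfill(limit=1)
print(filled_s)
```

Positions 1 and 4 remain NaN, while positions 2 and 5 take the values 2.0 and 3.0 from the entries that follow them.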

Example 6: Using a Series for Replacement

fillna() also accepts another Series as the fill value. Each missing entry is replaced by the value at the same index label in the other Series, which lets the replacement vary from row to row instead of being a single constant.

import pandas as pd

# Create two Series
s1 = pd.Series([1, None, 3])
s2 = pd.Series(["a", "b", "c"])

# Use s2 to fill missing values in s1
filled_s1 = s1.fillna(s2)
print(filled_s1)

Output:

0    1.0
1      b
2    3.0
dtype: object

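Passing a Series to fillna() does not raise an error: Series.fillna() accepts a scalar, a dict, or a Series as the fill value. With a dict or Series, the fill values are aligned on index labels, so here the missing entry at index 1 of s1 receives "b" from index 1 of s2. Because a string is mixed into a float Series, the result is upcast to dtype object. For more complex rules, such as looking up replacements in another table by a key column, you may still need a merge or map step to build a suitable fill Series first.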
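Alignment by index label is what makes a Series fill value useful with labeled data. In the sketch below (labels and values are made up for illustration), the fill Series covers only some labels and includes one, "z", that s does not have: matching labels are filled, unmatched missing entries stay NaN, and extra labels are ignored. A dict keyed by index label behaves the same way.

```python
import pandas as pd

s = pd.Series([1.0, None, None, None], index=["a", "b", "c", "d"])

# Fill values are matched to missing entries by index label, not position
fill_values = pd.Series({"b": 20.0, "d": 40.0, "z": 99.0})
filled = s.fillna(fill_values)
print(filled)  # "c" has no fill value, so it stays NaN; "z" is ignored

# A dict is aligned the same way: keys are index labels
filled_dict = s.fillna({"c": 30.0})
print(filled_dict)
```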

Conclusion

The pandas.Series.fillna() method is a versatile tool for dealing with missing data. Whether you fill gaps with a constant, propagate values with forward or backward fill, cap the fill with limit, or take replacements from another Series, choosing the approach that matches your data lets you handle missing values in your pandas Series deliberately rather than by accident.
