Pandas DataFrame.fillna() method (5 examples)

Updated: February 22, 2024 By: Guest Contributor

Introduction

Working with data in Python often means dealing with missing values in datasets. The pandas library, a powerhouse for data manipulation and analysis, provides the versatile fillna() method to handle missing data in DataFrames. This tutorial walks through five practical examples of filling missing values, progressing from basic applications to more advanced uses.

What does fillna() do?

The pandas.DataFrame.fillna() method fills in missing values (NaN or None) in a DataFrame. The fill value can be a scalar constant, a dictionary, a Series, or another DataFrame. By default the method returns a new DataFrame with the gaps filled and leaves the original unchanged; passing inplace=True modifies the original instead.
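As a minimal sketch of the copy-versus-in-place behavior described above, the snippet below counts missing cells before and after each kind of call. The DataFrame and its column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical numeric data with two gaps
df = pd.DataFrame({"A": [1.0, None, 3.0], "B": [None, 5.0, 6.0]})

# isna() marks the cells that fillna() will replace
missing_before = int(df.isna().sum().sum())

# By default, fillna() returns a new DataFrame and leaves df unchanged
filled = df.fillna(0)
missing_after_copy = int(df.isna().sum().sum())

# With inplace=True, df itself is modified
df.fillna(0, inplace=True)
missing_after_inplace = int(df.isna().sum().sum())

print(missing_before, missing_after_copy, missing_after_inplace)
```

Assigning the returned DataFrame to a new name, as in the first call, is usually easier to follow than in-place modification.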

Example 1: Filling with a Constant Value

import pandas as pd

# Creating a sample DataFrame with missing values
data = {"Name": ["John", "Jane", "Anna"], "Age": [28, None, 22], "City": [None, "New York", "London"]}
df = pd.DataFrame(data)

# Filling missing values with a constant
filled_df = df.fillna("Unknown")
print(filled_df)

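This example fills every missing value in the DataFrame with the constant string “Unknown”. Because the ‘Age’ column now mixes numbers and text, pandas stores it as a generic object column, and the remaining ages print as floats (the missing value had already forced the column to float). The resulting DataFrame has no missing values:

   Name      Age      City
0  John     28.0   Unknown
1  Jane  Unknown  New York
2  Anna     22.0    London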
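A string constant applied to the whole DataFrame also lands in numeric columns. The sketch below, reusing the sample data from this example, compares the resulting dtypes with a variant that restricts the fill to the ‘City’ column by passing a dictionary; the variable name city_only is an assumption made for illustration.

```python
import pandas as pd

data = {"Name": ["John", "Jane", "Anna"], "Age": [28, None, 22], "City": [None, "New York", "London"]}
df = pd.DataFrame(data)

# Filling everything with a string makes 'Age' a non-numeric column
filled_df = df.fillna("Unknown")
print(filled_df.dtypes)

# Restricting the fill to 'City' keeps 'Age' numeric (its gap stays NaN)
city_only = df.fillna({"City": "Unknown"})
print(city_only.dtypes)
```

Keeping ‘Age’ numeric matters if later steps compute statistics such as a mean on that column.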

Example 2: Filling with Column-Specific Values

import pandas as pd

# Again, starting with our sample DataFrame
data = {"Name": ["John", "Jane", "Anna"], "Age": [28, None, 22], "City": [None, "New York", "London"]}
df = pd.DataFrame(data)

# Filling missing values using a dictionary to specify different fill values for each column
fill_values = {"Age": df["Age"].mean(), "City": "Not Provided"}
df.fillna(fill_values, inplace=True)
print(df)

In this example, missing values in the ‘Age’ column are filled with the column’s mean value, and those in the ‘City’ column with the string “Not Provided”. This method allows for more meaningful data imputation:

   Name   Age          City
0  John  28.0  Not Provided
1  Jane  25.0      New York
2  Anna  22.0        London
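When several numeric columns need their own mean, writing the dictionary by hand becomes tedious. One possible shortcut, sketched here with a hypothetical ‘Score’ column added to the sample data, is to pass the Series returned by df.mean(numeric_only=True), which is indexed by column name and is applied per column in the same way as a dictionary.

```python
import pandas as pd

# Sample data with an extra, hypothetical numeric column
df = pd.DataFrame({
    "Age": [28, None, 22],
    "Score": [None, 80.0, 90.0],
    "City": [None, "New York", "London"],
})

# Series indexed by column name: one mean per numeric column
col_means = df.mean(numeric_only=True)

# Each numeric column is filled with its own mean; 'City' is not in
# col_means, so its missing value is left as-is
filled = df.fillna(col_means)
print(filled)
```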

Example 3: Forward Fill and Backward Fill (‘ffill’ and ‘bfill’)

import pandas as pd

# Yet again, our starting point DataFrame
data = {"Name": ["John", "Jane", "Anna"], "Age": [28, None, 22], "City": [None, "New York", "London"]}
df = pd.DataFrame(data)

# Forward fill: propagate the last valid value downward.
# fillna(method='ffill') is deprecated since pandas 2.1; use ffill() instead.
ffilled_df = df.ffill()

# Backward fill: propagate the next valid value upward
# (bfill() likewise replaces fillna(method='bfill'))
bfilled_df = df.bfill()
print(bfilled_df)

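Forward fill (‘ffill’) copies the last valid value from an earlier row into each gap, while backward fill (‘bfill’) uses the next valid value from a later row. A gap with no valid value on the relevant side (for example, a missing first row under forward fill) is left unfilled. This approach suits time series and other ordered data. The printed DataFrame is the backward-filled one:

   Name   Age      City
0  John  28.0  New York
1  Jane  22.0  New York
2  Anna  22.0    London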
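Two related options are worth a quick sketch: the limit parameter, which caps how many consecutive gaps are filled, and chaining a backward fill after a forward fill to cover a missing first row. The ‘Reading’ column below is hypothetical data made up for illustration.

```python
import pandas as pd

# Hypothetical ordered readings with a leading gap and a two-row gap
readings = pd.DataFrame({"Reading": [None, 1.0, None, None, 5.0]})

# limit=1 fills at most one consecutive gap after each valid value
limited = readings.ffill(limit=1)
print(limited)

# A forward fill leaves the leading gap; a follow-up backward fill covers it
complete = readings.ffill().bfill()
print(complete)
```

Capping the fill with limit can help avoid stretching one stale value across a long run of missing rows.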

Example 4: Filling with a Series

import pandas as pd

# Sample DataFrame
data = {"Name": ["John", "Jane", "Anna"], "Sales": [None, 150, None]}
df = pd.DataFrame(data)

# Creating a Series to use for filling missing values
fill_series = pd.Series([100, 110, 120])

# Filling missing values in the 'Sales' column with the Series values
df['Sales'] = df['Sales'].fillna(fill_series)
print(df)

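Here, we fill missing values in the ‘Sales’ column using a Series. pandas matches the two by index label, so only the rows where ‘Sales’ is missing take the Series value; the 110 at index 1 is ignored because that row already holds 150. The column stays float because it originally contained missing values:

   Name  Sales
0  John  100.0
1  Jane  150.0
2  Anna  120.0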
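The alignment is by label, not by position, which is easier to see with a non-default index. In the sketch below the index labels and the fallback values are hypothetical; the fallback Series is deliberately in a different order and contains a label that does not exist in the target.

```python
import pandas as pd

# Target Series with string labels and two gaps
sales = pd.Series([None, 150.0, None], index=["john", "jane", "anna"])

# Fallback values in a different order, plus an unrelated label
fallback = pd.Series({"anna": 120.0, "john": 100.0, "zoe": 999.0})

# Values are matched by label; 'zoe' is ignored
filled = sales.fillna(fallback)
print(filled)

# A gap whose label is absent from the fill Series stays missing
partial = sales.fillna(pd.Series({"john": 100.0}))
print(partial)
```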

Example 5: Filling Using a Function

import pandas as pd

# DataFrame setup
data = {"Name": ["John", "Jane", "Anna"], "Performance": [None, "Good", None]}
df = pd.DataFrame(data)

# Defining a custom function to fill missing values based on other column values or conditions
def fill_performance(row):
    if row['Name'] == 'John':
        return 'Excellent'
    else:
        return 'Satisfactory'

df['Performance'] = df.apply(lambda row: fill_performance(row) if pd.isna(row['Performance']) else row['Performance'], axis=1)
print(df)

In this more advanced example, a custom function fills missing values based on other values in the same row. Note that the code relies on DataFrame.apply() and pd.isna() rather than on fillna() itself, which is useful when the fill value depends on per-row logic:

   Name   Performance
0  John     Excellent
1  Jane          Good
2  Anna  Satisfactory
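The same row-dependent fill can also be expressed with fillna() itself by first building a Series of fallback values. The sketch below is one possible vectorized variant; the names rating_by_name and defaults are assumptions introduced for illustration.

```python
import pandas as pd

data = {"Name": ["John", "Jane", "Anna"], "Performance": [None, "Good", None]}
df = pd.DataFrame(data)

# Hypothetical lookup: names with a specific fallback rating
rating_by_name = {"John": "Excellent"}

# One fallback per row: the mapped rating where the name is listed,
# otherwise 'Satisfactory'
defaults = df["Name"].map(rating_by_name).fillna("Satisfactory")

# Only the missing cells take the fallback; 'Good' is kept
df["Performance"] = df["Performance"].fillna(defaults)
print(df)
```

This avoids calling a Python function once per row, which tends to matter on larger DataFrames.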

Conclusion

Throughout this tutorial, we explored five different strategies for using the pandas.DataFrame.fillna() method, ranging from simple substitutions to more nuanced and conditional methods of data imputation. By understanding these techniques, you can tackle missing data in your datasets more effectively and maintain the integrity of your analysis.