Pandas – Using DataFrame.update() method

Updated: February 20, 2024 By: Guest Contributor

Introduction

The Pandas library in Python is a powerful tool for data manipulation and analysis. Among its many methods, DataFrame.update() is particularly useful for updating one DataFrame with data from another DataFrame, Series, or dictionary. This tutorial will guide you through various examples of using the DataFrame.update() method, from basic usage to more advanced applications.

Working with DataFrame.update()

The update() method in Pandas modifies a DataFrame in place, using the non-NA values of another DataFrame or of any object that can be coerced into one, such as a named Series or a dictionary of columns. Values are aligned on both index and column labels, and only labels that already exist in the calling DataFrame are touched. The method returns None rather than a new DataFrame. Let’s dive into how this method works with several examples.
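Before the main examples, here is a minimal sketch of those in-place semantics. The frames `scores` and `patch` are made-up names for illustration only. It shows that update() returns None and that a NaN in the other frame does not overwrite an existing value.

```python
import numpy as np
import pandas as pd

# Hypothetical frames, used only for this illustration.
scores = pd.DataFrame({"math": [70.0, 80.0, 90.0], "art": [1.0, 2.0, 3.0]})
patch = pd.DataFrame({"math": [75.0, np.nan, 95.0]})

result = scores.update(patch)  # modifies `scores` in place

print(result)  # None: update() does not return a new DataFrame
print(scores)  # the NaN in `patch` leaves scores.loc[1, "math"] at 80.0
```

Because nothing is returned, writing `scores = scores.update(patch)` would replace the DataFrame with None, which is an easy mistake to make.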

Basic Example

import pandas as pd

# Opt in to pandas' future no-silent-downcasting behavior (the option exists in pandas 2.2+).
pd.set_option('future.no_silent_downcasting', True)

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [400, 500, 600]})
df2 = pd.DataFrame({'A': [4, 5], 'B': [700, 800]}, index=[1, 2])

df1.update(df2)
print(df1)

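This code block demonstrates the basic use of update(). We have two DataFrames, df1 and df2. df2 carries rows labeled 1 and 2, so df1.update(df2) overwrites columns A and B in those two rows of df1 and leaves row 0 untouched. On recent Pandas versions the integer dtypes are preserved (older releases may display the updated columns as floats). The output would be:

   A    B
0  1  400
1  4  700
2  5  800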
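To make the label alignment more concrete, the sketch below compares update() with an explicit label-based assignment through .loc. The two are only equivalent under the assumptions that hold here: df2 contains no NaN values and all of its row and column labels exist in df1. The names `manual` and `updated` are introduced just for this comparison.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [400, 500, 600]})
df2 = pd.DataFrame({"A": [4, 5], "B": [700, 800]}, index=[1, 2])

# Label-based assignment: write df2's cells into the matching rows and columns.
manual = df1.copy()
manual.loc[df2.index, df2.columns] = df2

# The same change expressed with update().
updated = df1.copy()
updated.update(df2)

# Element-wise comparison of the two results.
print((manual == updated).all().all())
```

Unlike the .loc version, update() silently skips labels that are missing from the calling DataFrame, which is what makes it convenient for partial patches.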

Updating with a Series

import pandas as pd

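# update() aligns a Series on the row index; its name selects the column.
ser = pd.Series([100, 101], index=[0, 2], name="A")
df = pd.DataFrame({"A": [1, 2, 3], "B": [400, 500, 600], "C": [700, 800, 900]})

df.update(ser)
print(df)

This example shows how you can update a DataFrame with a Series object. The Series must have a name, because update() uses the name to pick the target column, and its index is matched against the DataFrame’s row labels, not its column labels. An unnamed Series indexed by column labels would match nothing and leave the DataFrame unchanged. After the update, only rows 0 and 2 of column ‘A’ are modified. Output:

     A    B    C
0  100  400  700
1    2  500  800
2  101  600  900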
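The introduction mentions that a dictionary can also be passed. As a small sketch, assuming the dictionary maps column names to equal-length lists (so that it can be coerced the same way pd.DataFrame({...}) would be), it looks like this:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [400, 500, 600]})

# The dict is coerced into a DataFrame with the default 0..n-1 index,
# so its rows line up with df's rows 0, 1 and 2.
df.update({"B": [4, 5, 6]})

print(df)
```

Column ‘A’ is untouched because the dictionary has no key for it. If the calling DataFrame used a non-default index, the dictionary’s implicit 0..n-1 index would not match, so building an explicitly indexed DataFrame or Series first is usually the safer choice.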

Advanced: Using Non-Matching Indexes

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
df2 = pd.DataFrame({'B': [4, 5, 6]}, index=['y', 'z', 'a'])

df1.update(df2, join='left')
print(df1)

Output:

   A
x  1
y  2
z  3

In this advanced example, we pass join='left', which is the default and currently the only supported value. With left join semantics, only the index labels present in df1 (‘x’, ‘y’, ‘z’) are considered, so ‘a’ is ignored. Rows ‘y’ and ‘z’ do match, but update() only writes into columns that exist in both DataFrames. Since df2 has no column ‘A’ and df1 has no column ‘B’, there is nothing to update, and column ‘B’ is not added. df1 is left unchanged, which shows that update() never adds new rows or columns to the calling DataFrame.
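The sketch below explores the join parameter a little further. It assumes a pandas version where 'left' is the only implemented join, which is the case in the releases we are aware of; here df2 is changed to share column ‘A’ with df1 so that the default join has something to update.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"A": [20, 30, 40]}, index=["y", "z", "a"])

# Any join other than "left" is expected to be rejected.
try:
    df1.update(df2, join="outer")
    outcome = "updated"
except NotImplementedError as exc:
    outcome = f"rejected: {exc}"

print(outcome)

# With the default join="left", rows 'y' and 'z' are updated
# and the extra row 'a' from df2 is ignored.
df1.update(df2)
print(df1)
```

If you need the extra rows or columns from the second DataFrame as well, update() is not the right tool on its own, because it only ever writes into labels the calling DataFrame already has.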

Using overwrite Parameter

import pandas as pd
import numpy as np

df1 = pd.DataFrame({"A": [1, np.nan, 3]}, dtype=object)
df2 = pd.DataFrame({"A": ["x", "y", "z"]}, index=[1, 2, 3])

df1.update(df2, overwrite=False)
print(df1)

Output:

   A
0  1
1  x
2  3

This snippet illustrates the use of the overwrite parameter. By setting overwrite=False, update() only fills entries that are null (NaN) in the calling DataFrame and leaves existing values alone. df2 offers values for index labels 1 and 2 of df1, while its label 3 does not exist in df1 and is ignored. The entry at index 1 is NaN, so it becomes ‘x’. The entry at index 2 already holds 3, so ‘y’ is discarded. Note that to represent a missing value in Pandas, you can use np.nan (for floats) or pd.NA.
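To see the difference more directly, here is a small side-by-side sketch of overwrite=False and the default overwrite=True on the same data. The names `base`, `other`, `filled` and `replaced` are made up for this illustration.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({"A": [1.0, np.nan, 3.0]})
other = pd.DataFrame({"A": [10.0, 20.0, 30.0]})

filled = base.copy()
filled.update(other, overwrite=False)  # only the NaN at index 1 is filled

replaced = base.copy()
replaced.update(other)                 # overwrite=True is the default

print(filled["A"].tolist())    # [1.0, 20.0, 3.0]
print(replaced["A"].tolist())  # [10.0, 20.0, 30.0]
```

In other words, overwrite=False turns update() into a gap-filling operation, while the default replaces every value for which the other DataFrame has a non-NA entry.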

Conclusion

Through these examples, we’ve seen how the DataFrame.update() method can be used to modify a DataFrame based on data from another DataFrame, Series, or dictionary. From updating selected rows and columns to understanding label alignment and controlling updates with the overwrite parameter, update() offers flexibility in data manipulation. Its ability to update in place makes it a handy tool for quickly modifying datasets without the need to create new DataFrame instances.