Pandas: How to subtract one DataFrame from another (element-wise)

Updated: February 19, 2024 By: Guest Contributor

Introduction

Pandas, a powerful and widely used Python library, offers an extensive set of functionalities for data manipulation and analysis. Among its many features, the ability to perform arithmetic operations on DataFrames is incredibly useful. In this tutorial, we’ll explore how to subtract one DataFrame from another, element-wise. We’ll start with the basics and gradually move to more advanced concepts, using several code examples.

Getting Started

First, ensure you have Pandas installed in your environment:

pip install pandas

Then, import Pandas in your script:

import pandas as pd

Basic Example

Let’s start with a basic example where we have two DataFrames of the same shape:

df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

df2 = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60]
})

result = df1 - df2
print(result)

This subtracts each element of df2 from its corresponding element in df1, resulting in:

    A   B
0  -9 -36
1 -18 -45
2 -27 -54
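
"Corresponding" here means matching by row and column label, not by position. The sketch below reuses the values of df1 and df2 from above and subtracts a reversed copy of df2; the names df2_reversed, by_label and by_position are introduced purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})

# Reverse the row order of df2; each row keeps its original index label.
df2_reversed = df2.iloc[::-1]

# Subtraction pairs rows by label, so the reversed order does not change
# which values are subtracted from which.
by_label = df1 - df2_reversed
print(by_label.sort_index())

# Converting to NumPy arrays discards the labels and subtracts by position.
by_position = df1.to_numpy() - df2_reversed.to_numpy()
print(by_position)
```

The label-based result matches the output shown above, while the position-based result pairs the first row of df1 with the last row of df2.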

Handling Different Shapes

But what if the DataFrames have different shapes? Pandas aligns the two DataFrames on their index and column labels and fills every position that exists in only one of them with NaN:

df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

df2 = pd.DataFrame({
    'A': [10, 20],
    'B': [40, 50],
    'C': [70, 80]
}, index=[1, 2])

result = df1 - df2
print(result)

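The result contains NaN wherever a row or column label exists in only one of the DataFrames (row 0 and column C here). Because NaN is a floating-point value, the remaining numbers are shown as floats:

      A     B   C
0   NaN   NaN NaN
1  -8.0 -35.0 NaN
2 -17.0 -44.0 NaN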
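
If the NaN rows and columns are not wanted, one option is to restrict both DataFrames to the labels they share before subtracting. This is a minimal sketch, assuming that dropping non-overlapping labels is acceptable for your analysis; common_rows, common_cols and overlap are names chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [10, 20], 'B': [40, 50], 'C': [70, 80]}, index=[1, 2])

# Labels present in both DataFrames.
common_rows = df1.index.intersection(df2.index)
common_cols = df1.columns.intersection(df2.columns)

# Subtract only the overlapping block, so no NaN is introduced.
overlap = df1.loc[common_rows, common_cols] - df2.loc[common_rows, common_cols]
print(overlap)
```

Only rows 1 and 2 and columns A and B survive, and the values stay integers because nothing is missing.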

In the coming examples, we’ll use df1 and df2 again.

Operating on Specific Columns

Sometimes you might want to subtract data from specific columns only. You can do this by subtracting Series:

result = df1['A'] - df2['A']
print(result)
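
Series follow the same label alignment as DataFrames. The sketch below uses df1 and df2 as defined in the previous section, where df2 has no row 0; diff_a is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [10, 20], 'B': [40, 50], 'C': [70, 80]}, index=[1, 2])

# Row 0 has no partner in df2['A'], so its result is NaN.
diff_a = df1['A'] - df2['A']
print(diff_a)

# Keep only the rows that are present in both Series.
print(diff_a.dropna())
```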

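Or, to write the result back into the original DataFrame (this overwrites column A of df1, so re-create df1 before running the later examples):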

df1['A'] = df1['A'] - df2['A']
print(df1)
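
If df1 should stay intact, DataFrame.assign returns a new DataFrame instead of modifying the original. A short sketch, again with df1 and df2 from the previous section; updated is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [10, 20], 'B': [40, 50], 'C': [70, 80]}, index=[1, 2])

# assign() builds a copy with column A replaced; df1 itself is not changed.
updated = df1.assign(A=df1['A'] - df2['A'])

print(updated)
print(df1)  # still holds the original values
```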

Subtracting with Functions: sub()

For more control, you can use the sub() method, which performs the same subtraction as the - operator but accepts additional parameters such as fill_value:

result = df1.sub(df2, fill_value=0)
print(result)

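Here, a value that is missing from only one of the DataFrames is treated as 0 before subtracting, so column C becomes 0 - 70 and 0 - 80 in rows 1 and 2. A position that is missing from both DataFrames, such as row 0 of column C, still results in NaN.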
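
The effect of fill_value can be checked on the two DataFrames from the "Handling Different Shapes" section. This is a small sketch; filled is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [10, 20], 'B': [40, 50], 'C': [70, 80]}, index=[1, 2])

# fill_value=0 stands in for a value that is missing from exactly one frame.
filled = df1.sub(df2, fill_value=0)
print(filled)

# Row 0 of df1 had no partner in df2, so 0 was subtracted from it.
print(filled.loc[0, 'A'])

# Column C exists only in df2, so 0 stood in for the df1 value.
print(filled.loc[1, 'C'])

# Row 0, column C exists in neither frame, so it remains NaN.
print(filled.loc[0, 'C'])
```

If the remaining NaN values should also become 0, calling fillna(0) on the result is one option.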

Advanced Example: Conditional Subtraction

Let’s consider a more complex situation where you want to subtract one DataFrame from another based on a condition. Assume we only want to subtract if the value in df2 is greater than 15:

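# Align df2 to df1's labels; positions that df2 lacks become NaN
df2_aligned = df2.reindex_like(df1)
# NaN > 15 evaluates to False, so those positions are left untouched
cond = df2_aligned > 15
result = df1.mask(cond, df1 - df2_aligned)
print(result)

This subtracts df2 from df1 only where the value in df2 is greater than 15 and keeps the value from df1 everywhere else, including positions that have no counterpart in df2.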

Dealing with Non-Numeric Data

When a DataFrame contains non-numeric data, subtraction is not directly applicable: subtracting string columns, for example, typically raises a TypeError. Make sure you operate only on numeric columns, or pre-process the DataFrames to convert or remove non-numeric data first.
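
One way to restrict the operation to numeric columns is select_dtypes. The sketch below uses two small made-up DataFrames (left and right, with a text column name) that are not part of the earlier examples, and assumes both share the same rows and columns.

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns.
left = pd.DataFrame({'name': ['a', 'b'], 'x': [1.0, 2.0], 'y': [3, 4]})
right = pd.DataFrame({'name': ['a', 'b'], 'x': [0.5, 0.5], 'y': [1, 1]})

# Pick out the numeric column labels only.
num_cols = left.select_dtypes(include='number').columns

# Subtract the numeric columns and leave the text column as it is.
result = left.copy()
result[num_cols] = left[num_cols] - right[num_cols]
print(result)
```

The name column is carried over unchanged, while x and y hold the differences.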

Conclusion

We’ve covered multiple approaches to subtracting one DataFrame from another, ranging from simple element-wise subtraction to more complex conditional operations. Understanding these methods enables more nuanced data manipulation, crucial for effective data analysis with Pandas. As always, the key is to experiment with these operations in the context of your specific data and analysis requirements.