Pandas: Checking equality of 2 DataFrames (element-wise)

Updated: February 25, 2024 By: Guest Contributor

Introduction

Pandas is an essential tool in the Python data science ecosystem, known for its robust features that enable data manipulation and analysis. Among its capabilities, comparing DataFrames element-wise is a critical operation for data scientists and analysts to understand the similarities and differences in their datasets. This tutorial will walk you through five examples, ranging from basic to advanced, demonstrating how to check equality between two DataFrames element-wise.

Basic Comparison with equals()

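Let’s start with the simplest approach: the equals() method. It returns a single boolean that is True only if the two DataFrames have the same shape, the same row and column labels, and the same elements in the same order, with corresponding columns of the same dtype. NaNs in the same location are treated as equal. Because it collapses the result into one answer, it is not strictly an element-wise comparison, but it makes a good starting point.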

import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df1.equals(df2))

Output: True
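To make that strictness concrete, here is a small sketch (the frame names are illustrative). Frames that hold the same numbers but differ in dtype or row order are not considered equal, while NaNs in matching positions are, which is not the case with the == operator.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Same values, but column 'A' stored as float64 instead of int64.
as_float = base.astype({'A': 'float64'})

# Same rows in reverse order (the index labels are reversed too).
reordered = base.iloc[::-1]

# NaN in the same position in both frames.
with_nan_1 = pd.DataFrame({'A': [1.0, np.nan]})
with_nan_2 = pd.DataFrame({'A': [1.0, np.nan]})

print(base.equals(as_float))          # dtypes differ
print(base.equals(reordered))         # row order differs
print(with_nan_1.equals(with_nan_2))  # NaNs in the same place count as equal
# The == operator follows IEEE rules, so NaN != NaN and this is not all True.
print((with_nan_1 == with_nan_2).all().all())
```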

Element-wise comparison with DataFrame.eq()

For a more granular check, you can use the eq() method. This compares two DataFrames element-wise and returns a new DataFrame of booleans showing where values are equal.

import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 6]})
result = df1.eq(df2)
print(result)

Output:

       A      B
0   True   True
1   True   True
2  False   True
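The boolean DataFrame returned by eq() is usually reduced further. A minimal sketch with the same data: collapse it to a single answer, count matches per column, and list the row and column labels of the cells that differ (the variable names are illustrative).

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 6]})

result = df1.eq(df2)

# Reduce the boolean grid to one answer: True only if every cell matched.
all_equal = bool(result.all().all())

# Number of matching cells in each column (True counts as 1).
matches_per_column = result.sum()

# Positions of the False cells, translated back to index and column labels.
rows, cols = (~result).to_numpy().nonzero()
mismatches = [(result.index[r], result.columns[c]) for r, c in zip(rows, cols)]

print(all_equal)
print(matches_per_column)
print(mismatches)
```

Unlike the == operator, which raises an error for frames whose labels differ, eq() aligns both frames on their labels first.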

Comparison with a Tolerance: pd.testing.assert_frame_equal()

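When dealing with floating-point numbers, exact matches are often too strict because of rounding errors. Pandas offers pd.testing.assert_frame_equal(), which, with check_exact=False, compares values within a relative (rtol) and absolute (atol) tolerance. Note that it does not return a boolean: it raises an AssertionError describing the mismatch, so it is typically wrapped in try/except or used inside tests.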

import pandas as pd
df1 = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})
df2 = pd.DataFrame({'A': [1.0, 2.001, 3.0], 'B': [4.0, 5.002, 6.0]})
try:
    pd.testing.assert_frame_equal(df1, df2, check_exact=False, atol=0.01)
    print('DataFrames are considered equal.')
except AssertionError as e:
    print('DataFrames are not equal:', e)

Output: DataFrames are considered equal.
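Because assert_frame_equal() only passes or fails, it does not tell you which cells are within tolerance. One way to get an element-wise answer is NumPy’s isclose(), which tests |a - b| <= atol + rtol * |b| for each cell. This sketch assumes both frames have the same shape, index, and column order, since to_numpy() drops the labels; rtol=0.0 is passed so that only the absolute tolerance applies.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})
df2 = pd.DataFrame({'A': [1.0, 2.001, 3.0], 'B': [4.0, 5.002, 6.0]})

def close_mask(left, right, atol):
    """Boolean DataFrame: True where |left - right| <= atol (labels taken from left)."""
    flags = np.isclose(left.to_numpy(), right.to_numpy(), atol=atol, rtol=0.0)
    return pd.DataFrame(flags, index=left.index, columns=left.columns)

close = close_mask(df1, df2, atol=0.01)     # every cell is within 0.01
strict = close_mask(df1, df2, atol=0.0015)  # 5.002 vs 5.0 now fails

print(close)
print(strict)
```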

Comparing Selected Columns

Often, you only want to compare certain columns between DataFrames. This can be handled by first selecting the columns you wish to compare and then using the eq() method.

import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [10, 11, 12]})
columns_to_compare = ['A', 'B']
result = df1[columns_to_compare].eq(df2[columns_to_compare])
print(result)

Output:

      A     B
0  True  True
1  True  True
2  True  True
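Selecting columns by hand works when you know them in advance. If the frames have different column sets, a sketch along these lines compares only the shared columns and summarizes each with all() (the D column is an illustrative change to the example). For contrast, calling eq() on the full frames aligns them on the union of their column labels, so a column present in only one frame compares as False in every row.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'D': [10, 11, 12]})

# Columns present in both frames, in df1's order.
common = df1.columns.intersection(df2.columns)

# One boolean per shared column: True if every row matches.
shared_summary = df1[common].eq(df2[common]).all()

# Without selecting, eq() aligns on all labels (A, B, C, D);
# C and D exist in only one frame, so they are filled with NaN and compare as False.
full_summary = df1.eq(df2).all()

print(shared_summary)
print(full_summary)
```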

Advanced Comparison Using np.where()

For a more detailed analysis, you may want to know not just whether elements are equal, but also see the differing values side by side. NumPy’s where() function is useful here: it takes one value where a condition is True and another where it is False.

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})
# Build "old -> new" strings for every cell
changes = df1.astype(str) + ' -> ' + df2.astype(str)
# Keep 'Equal' where the cells match, otherwise show the change
diff = np.where(df1 == df2, 'Equal', changes)
diff_df = pd.DataFrame(diff, columns=df1.columns, index=df1.index)
print(diff_df)

Output:

        A       B
0   Equal   Equal
1   Equal   Equal
2  3 -> 4  6 -> 7
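Pandas also has a built-in for this side-by-side view: DataFrame.compare() (available since pandas 1.1) keeps only the rows and columns that differ and shows each value under 'self' and 'other' sub-columns. It requires both frames to be identically labeled. A short sketch with the same data:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# Rows and columns where every value matches are dropped by default;
# the remaining values are masked with NaN, so integers come back as floats.
diff = df1.compare(df2)
print(diff)
```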

Conclusion

Comparing two DataFrames element-wise is a powerful technique for data analysis and integrity checks. This tutorial covered methods from basic to advanced, giving you tools for a range of comparison scenarios. Whether you need exact matches, a tolerance for floating-point differences, a comparison restricted to specific columns, or a detailed view of what changed, Pandas and NumPy offer robust solutions.