Sling Academy
Home/Pandas/Pandas DataFrame: How to compare 2 columns (row-wise)

Pandas DataFrame: How to compare 2 columns (row-wise)

Last updated: February 21, 2024

Introduction

Comparing two columns in a Pandas DataFrame is a common operation that you might need to perform for various data analysis tasks. Whether you’re looking to identify mismatches, find matches, or simply evaluate the differences between columns, Pandas provides versatile tools to accomplish these tasks effectively. This tutorial will guide you through several methods, from basic to advanced, including real-life code examples and their outputs.

Basic Comparison with Equality Operator

The simplest way to compare two columns in a DataFrame is by using the equality operator (==). This method allows you to check if the values in one column are equal to those in another column on a row-wise basis.

import pandas as pd
data = {'A': [1, 2, 3, 4], 'B': [1, 2, 4, 4]}
df = pd.DataFrame(data)
df['A_equals_B'] = df['A'] == df['B']
print(df)

Output:

   A  B  A_equals_B
0  1  1        True
1  2  2        True
2  3  4       False
3  4  4        True

This basic comparison gives you a new column indicating whether the values in columns A and B match on a row-wise basis.

Using the numpy.where Method

For more flexibility in comparison, such as custom output for matches and mismatches, numpy.where is an excellent choice. It allows you to specify the output for both conditions.

import numpy as np
import pandas as pd
data = {'A': [1, 2, 3, 4], 'B': [1, 2, 0, 4]}
df = pd.DataFrame(data)
df['Comparison'] = np.where(df['A'] == df['B'], 'Match', 'Mismatch')
print(df)

Output:

   A  B Comparison
0  1  1      Match
1  2  2      Match
2  3  0   Mismatch
3  4  4      Match

With numpy.where, you easily customize the output for visual distinction between matches and mismatches.

Column-wise Comparison Using apply with a Custom Function

If you require a more complex comparison, for instance, allowing a tolerance level in numeric differences, apply combined with a custom lambda function could be the answer. This approach gives you the flexibility to define any logic for comparison.

import pandas as pd
data = {'A': [100, 200, 300, 400], 'B': [100, 210, 295, 405]}
df = pd.DataFrame(data)
comparison_fn = lambda row: abs(row['A'] - row['B']) \<= 10
df['Close_Enough'] = df.apply(comparison_fn, axis=1)
print(df)

Output:

     A    B  Close_Enough
0  100  100          True
1  200  210          True
2  300  295          True
3  400  405          True

This method showcases how to apply a custom comparison that checks if the values in two columns are within a specified tolerance level.

Advanced: Using pandas.merge for Complex Comparisons

For comparing two columns from different DataFrames or conducting more sophisticated comparisons, pandas.merge can serve your needs. This method allows you to join two DataFrames based on the matching of column values.

For this example, let’s assume you have two DataFrames and you want to find common rows based on columns A and B.

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [2, 3, 4], 'B': [5, 6, 7]})
common_rows = pd.merge(df1, df2, on=['A', 'B'])
print(common_rows)

Output:

   A  B
0  2  5
1  3  6

This shows how pandas.merge can be used to find rows that match across two DataFrames based on specified columns.

Conclusion

From simple equality checks to complex comparisons involving custom logic or different DataFrames, Pandas provides a robust set of tools for comparing columns row-wise. By understanding and utilizing these methods, you can efficiently conduct various data comparison tasks to aid your data analysis process.

Next Article: Pandas: Select columns whose names start/end with a specific string (4 examples)

Previous Article: Pandas: Insert a row to a specific position in a DataFrame (3 ways)

Series: DateFrames in Pandas

Pandas

You May Also Like

  • How to Use Pandas Profiling for Data Analysis (4 examples)
  • How to Handle Large Datasets with Pandas and Dask (4 examples)
  • Pandas – Using DataFrame.pivot() method (3 examples)
  • Pandas: How to ‘FULL JOIN’ 2 DataFrames (3 examples)
  • Pandas: Select columns whose names start/end with a specific string (4 examples)
  • 3 ways to turn off future warnings in Pandas
  • How to Integrate Pandas with Apache Spark
  • How to Use Pandas for Web Scraping and Saving Data (2 examples)
  • How to Clean and Preprocess Text Data with Pandas (3 examples)
  • Pandas – Using Series.replace() method (3 examples)
  • Pandas json_normalize() function: Explained with examples
  • Pandas: Reading CSV and Excel files from AWS S3 (4 examples)
  • Using pandas.Series.rank() method (4 examples)
  • Pandas: Dropping columns whose names contain a specific string (4 examples)
  • Pandas: How to print a DataFrame without index (3 ways)
  • Fixing Pandas NameError: name ‘df’ is not defined
  • Pandas – Using DataFrame idxmax() and idxmin() methods (4 examples)
  • Pandas FutureWarning: ‘M’ is deprecated and will be removed in a future version, please use ‘ME’ instead
  • Pandas: Checking equality of 2 DataFrames (element-wise)