Pandas: How to ‘CROSS JOIN’ 2 DataFrames (5 examples)

Updated: February 23, 2024 By: Guest Contributor

Introduction

Cross join is a term borrowed from SQL that represents a Cartesian product between two tables: each row from the first table is joined to every row in the second table, producing all possible row combinations. Pandas, the go-to data manipulation library in Python, doesn’t have a function named ‘cross join’. Since version 1.2, merge() accepts how='cross', and on any version you can get the same result by merging on a temporary constant key. This tutorial uses the temporary-key technique to illustrate a cross join between two DataFrames through five examples, progressing from basic to more advanced scenarios.

Preparation

Before diving into the examples, let’s take care of two things. First, make sure pandas is installed in your environment:

pip install pandas

Second, understand the foundational DataFrame structures we’ll be working with:

import pandas as pd

# Sample DataFrame A
A = pd.DataFrame({
    'A_id': [1, 2],
    'A_val': ['A1', 'A2']
})

# Sample DataFrame B
B = pd.DataFrame({
    'B_id': [1, 2, 3],
    'B_val': ['B1', 'B2', 'B3']
})
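Because a cross join pairs every row of A with every row of B, the size of the result can be worked out before any join is run. The following is a small sketch of that arithmetic using the sample data above; the variable names expected_rows and expected_cols are only illustrative.

```python
import pandas as pd

# Same sample data as defined above
A = pd.DataFrame({'A_id': [1, 2], 'A_val': ['A1', 'A2']})
B = pd.DataFrame({'B_id': [1, 2, 3], 'B_val': ['B1', 'B2', 'B3']})

# A Cartesian product has one row per (row of A, row of B) pair
expected_rows = len(A) * len(B)
# ...and carries the columns of both inputs side by side
expected_cols = A.shape[1] + B.shape[1]

print(expected_rows, expected_cols)  # 6 4
```

Running this check first is a cheap way to spot a cross join that would be far larger than intended.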

Example 1: Basic Cross Join

To perform the most straightforward cross join, we give both DataFrames a temporary key column holding the same constant value, merge them on this key, and then drop the key from the result.

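# Add a constant temporary key so every row of A matches every row of B
A['key'] = 1
B['key'] = 1
# Merge on the key, then drop it from the result
result = pd.merge(A, B, on='key').drop(columns='key')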

print(result)

The output will be a new DataFrame that is a Cartesian product of the original DataFrames:

   A_id A_val  B_id B_val
0     1    A1     1    B1
1     1    A1     2    B2
2     1    A1     3    B3
3     2    A2     1    B1
4     2    A2     2    B2
5     2    A2     3    B3
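If your environment has pandas 1.2 or newer, the same Cartesian product can be requested directly with how='cross', without adding a helper column. The snippet below is a minimal sketch of that alternative on the same sample data; the variable name cross is just for illustration.

```python
import pandas as pd

# Same sample data as in the Preparation section
A = pd.DataFrame({'A_id': [1, 2], 'A_val': ['A1', 'A2']})
B = pd.DataFrame({'B_id': [1, 2, 3], 'B_val': ['B1', 'B2', 'B3']})

# how='cross' (pandas >= 1.2) builds the Cartesian product directly.
# No on/left_on/right_on arguments are passed, and A and B are not modified.
cross = A.merge(B, how='cross')

print(cross.shape)  # (6, 4)
```

Unlike the temporary-key approach, this leaves no extra 'key' column behind on A or B.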

Example 2: Adding Conditions

Sometimes, you might want to control the cross join further by adding conditions after the fact. This doesn’t change the join process but allows filtering the result. Here’s how:

result_filtered = result[result['A_id'] <= result['B_id']]

print(result_filtered)

The filtered output keeps only the rows where A_id is less than or equal to B_id. Note that the original index labels are preserved, so label 3 is missing:

   A_id A_val  B_id B_val
0     1    A1     1    B1
1     1    A1     2    B2
2     1    A1     3    B3
4     2    A2     2    B2
5     2    A2     3    B3
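Filtering a cross join leaves gaps in the index wherever rows were removed. As a sketch of one way to handle that, the snippet below expresses the same condition with query() and renumbers the rows with reset_index(drop=True). It rebuilds the cross join with assign() so that it runs on its own; the names pairs and filtered are illustrative.

```python
import pandas as pd

A = pd.DataFrame({'A_id': [1, 2], 'A_val': ['A1', 'A2']})
B = pd.DataFrame({'B_id': [1, 2, 3], 'B_val': ['B1', 'B2', 'B3']})

# Temporary-key cross join; assign() returns copies, so A and B stay unchanged
pairs = A.assign(key=1).merge(B.assign(key=1), on='key').drop(columns='key')

# query() takes the condition as a string;
# reset_index(drop=True) renumbers the surviving rows from 0
filtered = pairs.query('A_id <= B_id').reset_index(drop=True)

print(filtered.index.tolist())  # [0, 1, 2, 3, 4]
```

Whether to reset the index depends on whether later steps need to trace rows back to the unfiltered result.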

Example 3: Using a Multi-level Cross Join

This example builds on the basic cross join by recording which pair of source rows produced each result row. We still use the temporary key, then add a label column that combines the two ID columns so the hierarchy (A row first, B row second) stays visible in the resulting DataFrame:

A['key'] = 1
B['key'] = 1
result = pd.merge(A, B, on='key').drop(columns='key')

# Adding hierarchy
result['hierarchy'] = result['A_id'].astype(str) + '-' + result['B_id'].astype(str)

print(result)

The output will have an added column to differentiate each row combination clearly:

   A_id A_val  B_id B_val hierarchy
0     1    A1     1    B1       1-1
1     1    A1     2    B2       1-2
2     1    A1     3    B3       1-3
3     2    A2     1    B1       2-1
4     2    A2     2    B2       2-2
5     2    A2     3    B3       2-3
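A string label is easy to read, but the two IDs can also be kept as a genuine two-level index. The following is a sketch of that variation, assuming a MultiIndex suits your downstream code; it swaps the label column for set_index() and is not part of the original example. The names pairs and indexed are illustrative.

```python
import pandas as pd

A = pd.DataFrame({'A_id': [1, 2], 'A_val': ['A1', 'A2']})
B = pd.DataFrame({'B_id': [1, 2, 3], 'B_val': ['B1', 'B2', 'B3']})

# Temporary-key cross join on copies of A and B
pairs = A.assign(key=1).merge(B.assign(key=1), on='key').drop(columns='key')

# Promote both ID columns to a two-level (A_id, B_id) index
indexed = pairs.set_index(['A_id', 'B_id'])

# Look up a single pair by its two IDs
print(indexed.loc[(2, 3), 'B_val'])  # B3

# Select every row for one A_id; the remaining index is B_id
print(indexed.loc[1])
```

With this layout each row combination is addressed by its pair of IDs rather than by a positional row number.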

Example 4: Integrating External Data

Next, let’s enrich a cross join with external data. Suppose the rows of DataFrame A are related to information held elsewhere, such as a database table or another DataFrame. After cross joining A and B, we can attach that information with an ordinary key-based merge on A_id:

# Assuming external DataFrame C is present
C = pd.DataFrame({
    'C_id': [1, 2],
    'info': ['Info1', 'Info2']
})

# Cross joining A and B first
A['key'] = 1
B['key'] = 1
AB = pd.merge(A, B, on='key').drop(columns='key')

# Merging with C: a regular keyed merge matching A_id to C_id
AB = AB.merge(C, left_on='A_id', right_on='C_id').drop(columns='C_id')

print(AB)

The resulting DataFrame is the cross join of A and B, with each row supplemented by the matching information from C. Because the second merge matches A_id to C_id, the row count stays at 6:

   A_id A_val  B_id B_val   info
0     1    A1     1    B1  Info1
1     1    A1     2    B2  Info1
2     1    A1     3    B3  Info1
3     2    A2     1    B1  Info2
4     2    A2     2    B2  Info2
5     2    A2     3    B3  Info2
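It is worth being clear about the difference between looking up C by key and cross joining with C, since the two give different row counts. The sketch below runs both on the sample data. The validate='many_to_one' argument is an optional safeguard added here for illustration, and the names looked_up and all_combos are hypothetical.

```python
import pandas as pd

A = pd.DataFrame({'A_id': [1, 2], 'A_val': ['A1', 'A2']})
B = pd.DataFrame({'B_id': [1, 2, 3], 'B_val': ['B1', 'B2', 'B3']})
C = pd.DataFrame({'C_id': [1, 2], 'info': ['Info1', 'Info2']})

# Cross join A and B with a temporary key (2 * 3 = 6 rows)
AB = A.assign(key=1).merge(B.assign(key=1), on='key').drop(columns='key')

# Keyed lookup: each AB row matches at most one C row, so 6 rows remain.
# validate='many_to_one' raises MergeError if C_id contains duplicates.
looked_up = AB.merge(
    C, left_on='A_id', right_on='C_id', validate='many_to_one'
).drop(columns='C_id')

# Cross join with C: every AB row is paired with every C row (6 * 2 = 12 rows)
all_combos = AB.assign(key=1).merge(C.assign(key=1), on='key').drop(columns='key')

print(len(looked_up), len(all_combos))  # 6 12
```

The keyed lookup is the right choice when C describes the rows of A; the second form only makes sense when every combination of A, B, and C is wanted.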

Example 5: Leveraging Pandas for Complex Operations

In our last example, we apply a data transformation after performing the cross join, building a new column from values that come from both sides of each row pair:

# Cross join A and B
A['key'] = 1
B['key'] = 1
result = pd.merge(A, B, on='key').drop(columns='key')

# Example transformation
result['combined'] = result.apply(lambda x: f'{x.A_val}+{x.B_val}', axis=1)

print(result)

The DataFrame now includes a new column with combined values from A and B, exemplifying a post-join transformation:

   A_id A_val  B_id B_val combined
0     1    A1     1    B1    A1+B1
1     1    A1     2    B2    A1+B2
2     1    A1     3    B3    A1+B3
3     2    A2     1    B1    A2+B1
4     2    A2     2    B2    A2+B2
5     2    A2     3    B3    A2+B3
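Row-wise apply() is convenient but processes one row at a time. As a sketch of an alternative, the snippet below builds the same combined column with vectorized string concatenation and adds a simple conditional column using numpy.where. The 'relation' column and its labels are invented for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({'A_id': [1, 2], 'A_val': ['A1', 'A2']})
B = pd.DataFrame({'B_id': [1, 2, 3], 'B_val': ['B1', 'B2', 'B3']})

# Temporary-key cross join on copies of A and B
pairs = A.assign(key=1).merge(B.assign(key=1), on='key').drop(columns='key')

# Vectorized string concatenation; same values as the row-wise apply
pairs['combined'] = pairs['A_val'] + '+' + pairs['B_val']

# Conditional logic: label each pair by whether the two IDs are equal
pairs['relation'] = np.where(pairs['A_id'] == pairs['B_id'], 'same id', 'different id')

print(pairs[['combined', 'relation']])
```

Because a cross join multiplies row counts, column-level operations like these tend to matter more here than on the original inputs.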

Conclusion

Through these examples, we’ve seen how to perform a cross join in Pandas with a temporary key and how to filter, label, enrich, and transform the result. Keep in mind that a cross join produces as many rows as the product of the two inputs’ row counts, so the output grows quickly as the inputs get larger.