Pandas: How to ‘FULL JOIN’ 2 DataFrames (3 examples)

Last updated: March 01, 2024

Overview

In this tutorial, we will explore how to perform a full join, often called a full outer join, between two DataFrames using Pandas in Python. A full join combines rows from two tables where they match on one or more common columns, and it also keeps every unmatched row from either side. The result is a DataFrame that has all records from both tables, with NaN (the standard marker for missing values in Pandas) in the columns where a row has no match. In Pandas, this is done with pd.merge() and how='outer'.

Before diving in, ensure you have the latest version of Pandas installed in your Python environment. If not, you can install or upgrade Pandas using pip:

pip install pandas --upgrade
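
To confirm which version is installed after upgrading, a quick check along these lines should work (a minimal sketch; the printed version will differ from machine to machine):

```python
import pandas as pd

# Read the version string of the installed Pandas package
installed_version = pd.__version__
print(installed_version)
```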

Example 1: Basic Full Join

Let’s start with the basics. Imagine you have two simple DataFrames, each listing the players of one team, and you want to join them to see a complete roster of players.

import pandas as pd

# Create the first DataFrame
team_a = {'PlayerID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']}
df_team_a = pd.DataFrame(team_a)

# Create the second DataFrame
team_b = {'PlayerID': [4, 5, 3], 'Name': ['Diana', 'Evan', 'Charlie']}
df_team_b = pd.DataFrame(team_b)

# Perform a full join
df_full_join = pd.merge(df_team_a, df_team_b, on='PlayerID', how='outer', suffixes=('_a', '_b'))
print(df_full_join)

The output lists every player from both teams. Charlie (PlayerID 3) appears in both teams, so his row has a value in both name columns, while every other row has NaN on the side where the player is absent:

   PlayerID   Name_a   Name_b
0         1    Alice      NaN
1         2      Bob      NaN
2         3  Charlie  Charlie
3         4      NaN    Diana
4         5      NaN     Evan
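
To see which side each row of a full join came from, pd.merge() accepts an indicator=True argument that adds a _merge column. The sketch below rebuilds the two team DataFrames from this example so that it runs on its own; the names df_tagged and origin_counts are illustrative:

```python
import pandas as pd

df_team_a = pd.DataFrame({'PlayerID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df_team_b = pd.DataFrame({'PlayerID': [4, 5, 3], 'Name': ['Diana', 'Evan', 'Charlie']})

# indicator=True adds a '_merge' column holding 'left_only', 'right_only' or 'both'
df_tagged = pd.merge(df_team_a, df_team_b, on='PlayerID', how='outer',
                     suffixes=('_a', '_b'), indicator=True)
print(df_tagged)

# Count how many rows fall into each category
origin_counts = df_tagged['_merge'].value_counts().to_dict()
print(origin_counts)
```

With this data, two rows are left_only, two are right_only, and Charlie's row is both. Filtering on _merge is a convenient way to pull out only the unmatched rows.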

Example 2: Adding Missing Keys

Next, consider joining on more than one column when one of the key columns may contain missing values. Rows are paired only when every key column agrees, so a missing value on one side will not pair with a real value on the other:

import pandas as pd

# Assuming df_team_a and df_team_b from Example 1

# Adding a new column 'Position' with potential missing values
df_team_a['Position'] = ['Forward', 'Midfield', None]
df_team_b['Position'] = [None, 'Defender', 'Forward']

# Performing a full join allowing for missing 'Position' keys
df_full_join_with_missing_keys = pd.merge(df_team_a, df_team_b, on=['PlayerID', 'Position'], how='outer', suffixes=('_a', '_b'))
print(df_full_join_with_missing_keys)

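The output has six rows rather than five. Charlie (PlayerID 3) has no Position in team A but 'Forward' in team B, so his two key pairs differ and he appears once per team. The exact row order, and whether a missing Position prints as None or NaN, can vary between Pandas versions:

   PlayerID   Name_a  Position   Name_b
0         1    Alice   Forward      NaN
1         2      Bob  Midfield      NaN
2         3  Charlie       NaN      NaN
3         3      NaN   Forward  Charlie
4         4      NaN       NaN    Diana
5         5      NaN  Defender     Evan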
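
One detail worth knowing when key columns contain missing values: unlike SQL, where NULL never equals NULL, Pandas pairs missing keys on the left with missing keys on the right. In the example above this does not come into play, because the rows with a missing Position have different PlayerID values. A small sketch with made-up data shows the behavior on a single key (the names key, left_val, and right_val are illustrative only):

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'key': [1.0, np.nan], 'left_val': ['a', 'b']})
right = pd.DataFrame({'key': [np.nan, 2.0], 'right_val': ['x', 'y']})

# The NaN key on the left is paired with the NaN key on the right
joined = pd.merge(left, right, on='key', how='outer')
print(joined)

# Pull out the row whose key is missing to inspect the pairing
nan_row = joined[joined['key'].isna()]
print(nan_row)
```

The result has three rows, not four, because the two missing keys are treated as a match. If that pairing is not wanted, one option is to drop or fill the rows with missing keys before merging.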

Example 3: Advanced Full Join with Multiple DataFrames

Now let’s push the boundaries by joining more than two DataFrames and managing columns with different names but identical meanings using the left_on and right_on parameters. Imagine another team’s DataFrame with players’ data.

import pandas as pd

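# Assuming df_team_a and df_team_b as created in Example 1
# (before the 'Position' column was added in Example 2)

# Another DataFrame for team_c; its key column is named 'ID'
team_c = {'ID': [6, 3, 7], 'Name': ['Frank', 'Charlie', 'Georgia']}
df_team_c = pd.DataFrame(team_c)

# Full join of df_team_a and df_team_b on the shared 'PlayerID' column
df_ab = pd.merge(df_team_a, df_team_b, on='PlayerID', how='outer', suffixes=('_a', '_b'))

# Full join of that result with df_team_c, matching 'PlayerID' against 'ID'
df_full_join_multi = pd.merge(df_ab, df_team_c, left_on='PlayerID', right_on='ID', how='outer')
print(df_full_join_multi)

The output includes all players from the three teams. Because the key columns have different names, both PlayerID and ID are kept, and each is NaN for rows that exist only on the other side, which also turns the integer keys into floats:

   PlayerID   Name_a   Name_b   ID     Name
0       1.0    Alice      NaN  NaN      NaN
1       2.0      Bob      NaN  NaN      NaN
2       3.0  Charlie  Charlie  3.0  Charlie
3       4.0      NaN    Diana  NaN      NaN
4       5.0      NaN     Evan  NaN      NaN
5       NaN      NaN      NaN  6.0    Frank
6       NaN      NaN      NaN  7.0  Georgia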
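
When several DataFrames need to be joined, chaining merges by hand gets verbose. One possible alternative, assuming you first rename the columns so that the key is called PlayerID everywhere and each team's name column is distinct, is to fold the list of frames with functools.reduce. The names frames and roster below are illustrative, and the DataFrames are rebuilt here so the sketch runs on its own:

```python
from functools import reduce

import pandas as pd

df_team_a = pd.DataFrame({'PlayerID': [1, 2, 3], 'Name_a': ['Alice', 'Bob', 'Charlie']})
df_team_b = pd.DataFrame({'PlayerID': [4, 5, 3], 'Name_b': ['Diana', 'Evan', 'Charlie']})
# Team C's key is called 'ID' in the example above; rename it to match the others
df_team_c = pd.DataFrame(
    {'ID': [6, 3, 7], 'Name_c': ['Frank', 'Charlie', 'Georgia']}
).rename(columns={'ID': 'PlayerID'})

frames = [df_team_a, df_team_b, df_team_c]

# Full join the frames pairwise, left to right, on the shared key
roster = reduce(
    lambda left, right: pd.merge(left, right, on='PlayerID', how='outer'),
    frames,
)

# Collapse the per-team name columns into one, taking the first non-missing value
roster['Name'] = (
    roster['Name_a']
    .combine_first(roster['Name_b'])
    .combine_first(roster['Name_c'])
)

# Sort explicitly so the result does not depend on the merge's row order
roster = roster.sort_values('PlayerID').reset_index(drop=True)
print(roster[['PlayerID', 'Name']])
```

Because all three frames now share one key column, the result has a single PlayerID column with no missing values, and the combined Name column gives one name per player.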

Conclusion

Through these examples, we’ve seen the power and flexibility of Pandas for data merging, especially for performing a full join. Starting with a simple join and then gradually incorporating nuances such as missing keys and joining multiple frames shows Pandas’ utility in data analysis and preprocessing tasks. Feel free to experiment further with your DataFrames to unlock even more insights.
