Sling Academy
Home/Pandas/Understanding DataFrame.join() method in Pandas (5 examples)

Understanding DataFrame.join() method in Pandas (5 examples)

Last updated: February 20, 2024

Overview

The join() method in Pandandas handles merging datasets horizontally by aligning rows based on their indices. It’s a critical tool for data manipulation and is versatile enough to cater to a variety of data join scenarios. This tutorial aims to elucidate the join() method through five examples of increasing complexity.

Prerequisites: To follow this tutorial, you should have a basic understanding of Python and pandas. Ensure pandas is installed in your Python environment: pip install pandas.

Example #1 – Basic Join Operation

Scenario: You have two DataFrames, df1 and df2, containing user data. You wish to combine them horizontally based on their indices.

import pandas as pd

df1 = pd.DataFrame({"Name": ["John", "Anna", "Mike"], "Age": [28, 34, 45]})
df2 = pd.DataFrame(
    {"Email": ["[email protected]", "[email protected]", "[email protected]"]},
    index=[0, 1, 2],
)

result = df1.join(df2)
print(result)

Output:

   Name  Age                  Email
0  John   28     [email protected]
1  Anna   34     [email protected]
2  Mike   45  [email protected]

Example #2 – Joining with Different Indexes

Scenario: Now, let’s assume df2 has a different index alignment. You still want to join df1 and df2 based on df1‘s index.

import pandas as pd

df1 = pd.DataFrame({"Name": ["John", "Anna", "Mike"], "Age": [28, 34, 45]})
df2 = pd.DataFrame({"Email": ["[email protected]", "[email protected]"]}, index=[1, 2])
result = df1.join(df2, how="left")
print(result)

Output:

   Name  Age                Email
0  John   28                  NaN
1  Anna   34     [email protected]
2  Mike   45  [email protected]

Example #3 – Joining with Overlapping Columns

Scenario: What happens when the DataFrames to be joined have overlapping column names? The join() method provides the lsuffix and rsuffix parameters to handle this.

import pandas as pd

df1 = pd.DataFrame(
    {
        "Name": ["John", "Anna", "Mike"],
        "Email": ["[email protected]", "[email protected]", "[email protected]"],
    }
)
df2 = pd.DataFrame(
    {
        "Email": ["[email protected]", "[email protected]", "[email protected]"],
        "Location": ["NY", "LA", "Chicago"],
    },
    index=[0, 1, 2],
)
result = df1.join(df2, lsuffix="_left", rsuffix="_right")
print(result)

Output:

   Name            Email_left           Email_right  Location
0  John    [email protected]     [email protected]        NY
1  Anna    [email protected]     [email protected]      LA
2  Mike    [email protected]  [email protected]  Chicago

Example #4 – Using the on parameter

Scenario: Sometimes, you might want to join on a specific column rather than the index. In this case, you can use the on parameter. Note that the column specified in on must be present in both DataFrames.

import pandas as pd

df1 = pd.DataFrame(
    {
        "Name": ["John", "Anna", "Mike"],
        "Email": ["[email protected]", "[email protected]", "[email protected]"],
    }
)
df2 = pd.DataFrame(
    {
        "Name": ["John", "Anna", "Mike"],
        "Location": ["NY", "LA", "Chicago"],
    },
    index=[0, 1, 2],
)

df1.set_index("Name", inplace=True)
df2.set_index("Name", inplace=True)
result = df1.join(df2, on="Name")
print(result)

Output:

                   Email Location
Name                             
John  [email protected]       NY
Anna   [email protected]       LA
Mike    [email protected]  Chicago

Example #5 – Complex Join with Multiple DataFrames

Scenario: For our final example, let’s combine multiple join operations into a single line of code, showcasing the versatility of the join() method.

import pandas as pd

df1 = pd.DataFrame({"Name": ["John", "Anna", "Mike"], "Age": [28, 34, 45]})
df2 = pd.DataFrame(
    {
        "Email": ["[email protected]", "[email protected]", "[email protected]"],
        "Location": ["NY", "LA", "Chicago"],
    },
    index=[0, 1, 2],
)

df3 = pd.DataFrame({"Salary": [70000, 120000, 90000]}, index=["John", "Anna", "Mike"])

result = df1.join([df2, df3])
print(result)

Output:

   Name   Age                   Email Location  Salary
0  John  28.0        [email protected]       NY     NaN
1  Anna  34.0     [email protected]       LA     NaN
2  Mike  45.0  [email protected]  Chicago     NaN

Conclusion

The join() method in pandas is a powerful function for horizontally combining DataFrames. As we’ve explored through five examples, it adapts to various data alignment and merging scenarios, making your data manipulation tasks more efficient and streamlined.

Next Article: Pandas: How to merge 2 DataFrames

Previous Article: Mastering DataFrame.compare() method in Pandas (5 examples)

Series: DateFrames in Pandas

Pandas

You May Also Like

  • How to Use Pandas Profiling for Data Analysis (4 examples)
  • How to Handle Large Datasets with Pandas and Dask (4 examples)
  • Pandas – Using DataFrame.pivot() method (3 examples)
  • Pandas: How to ‘FULL JOIN’ 2 DataFrames (3 examples)
  • Pandas: Select columns whose names start/end with a specific string (4 examples)
  • 3 ways to turn off future warnings in Pandas
  • How to Integrate Pandas with Apache Spark
  • How to Use Pandas for Web Scraping and Saving Data (2 examples)
  • How to Clean and Preprocess Text Data with Pandas (3 examples)
  • Pandas – Using Series.replace() method (3 examples)
  • Pandas json_normalize() function: Explained with examples
  • Pandas: Reading CSV and Excel files from AWS S3 (4 examples)
  • Using pandas.Series.rank() method (4 examples)
  • Pandas: Dropping columns whose names contain a specific string (4 examples)
  • Pandas: How to print a DataFrame without index (3 ways)
  • Fixing Pandas NameError: name ‘df’ is not defined
  • Pandas – Using DataFrame idxmax() and idxmin() methods (4 examples)
  • Pandas FutureWarning: ‘M’ is deprecated and will be removed in a future version, please use ‘ME’ instead
  • Pandas: Checking equality of 2 DataFrames (element-wise)