Using FULL JOIN in PostgreSQL (basic and advanced examples)

Updated: January 5, 2024 By: Guest Contributor Post a comment

Introduction

FULL JOIN in PostgreSQL is a powerful SQL operation that unites result sets from two or more tables. It is invaluable for a complete analysis of data with disparate but related datasets, providing a comprehensive combination of matched and unmatched records.

Understanding FULL JOIN

A FULL JOIN, also known as FULL OUTER JOIN, in PostgreSQL is a clause used in SQL to return all records when there is a match in either left or right table records. This type of join combines the results of both LEFT JOIN and RIGHT JOIN, including all rows from the joined tables, and filling in NULL values in columns from the table that lacks a matching row. FULL JOIN can be particularly useful when you want to understand all the relationships between the data in two tables, including those records that do not have a corresponding match on either side.

Basic FULL JOIN Example

SELECT *
FROM table1
FULL JOIN table2
ON table1.id = table2.id;

This basic query will show all records from both table1 and table2. Where there are matching ids, it will show row data from both tables. Where there is no match, you will see NULL values for the missing side.

Handling Unmatched Rows with COALESCE

Let’s now use the COALESCE function to handle unmatched rows, by replacing NULL with a default value. This function takes two or more arguments and returns the first non-null value.

SELECT COALESCE(table1.id, table2.id) as id,
       COALESCE(table1.description, 'Not in Table1') as description1,
       COALESCE(table2.description, 'Not in Table2') as description2
FROM table1
FULL JOIN table2
ON table1.id = table2.id;

With COALESCE, this query assigns a default text to distinguish which table the NULL values are coming from.

Aggregate Functions with FULL JOIN

In this more advanced example, we will use aggregate functions like SUM and COUNT, to summarize data while retaining the FULL JOIN characteristic of not losing any rows.

SELECT COALESCE(table1.category, table2.category) as category,
       SUM(table1.amount) as total_amount1,
       SUM(table2.amount) as total_amount2,
       COUNT(table1.*) astotal_count1,
       COUNT(table2.*) as total_count2
FROM table1
FULL JOIN table2
ON table1.category = table2.category
GROUP BY category;

This will provide a summary of amounts and counts per category across both tables, including categories that are exclusive to one table.

Advanced FULL JOIN with Subqueries and Filtering

In concert with WHERE clauses and subqueries, FULL JOINs can become an even more powerful tool. Here’s how to filter your FULL JOIN results and using subqueries for dynamic comparisons.

SELECT *
FROM (
    SELECT id, COUNT(*) as total_sales
    FROM table1
    GROUP BY id
) as sales1
FULL JOIN (
    SELECT id, COUNT(*) as total_purchases
    FROM table2
    GROUP BY id
) as sales2
ON sales1.id = sales2.id
WHERE COALESCE(sales1.total_sales,0) > 5
   OR COALESCE(sales2.total_purchases,0) > 5;

This advanced example filters out records by the number of sales or purchases above a certain number.

When to Use FULL JOIN

Use FULL JOIN in scenarios where you need a comprehensive view of the dataset. This may include analyzing gaps between two sets of data, reconciling disparate data sources, or when you’re not sure if there some records exist in one table and not the other. Exercise caution though; FULL JOIN can produce very large result sets and should be used judiciously especially with large tables.

Pitfalls to Avoid

While FULL JOIN is a powerful tool, it can also lead to misinterpretations if not used properly. Be vigilant against: creating overly large result sets that are hard to decipher, neglecting NULL values that could distort results, and utilizing inefficient queries that could slow down database performance. Also, always verify that your ON clause makes sense to prevent incorrect associations between tables.

Conclusion

In conclusion, FULL JOIN in PostgreSQL is a versatile and essential command for any SQL user dealing with relational datasets. Whether for data analysis or integrity checks, FULL JOIN can illuminate the relationships in your data you might otherwise miss. As with any powerful tool, it requires careful handling to avoid common pitfalls and deliver meaningful results.