Self Joins in PostgreSQL: How to Join a Table to Itself

Updated: January 5, 2024 By: Guest Contributor Post a comment

Table Of Contents

1 Introduction

2 Understanding Self Joins

3 Simple Self Join Example

4 Advanced Self Join with Aggregation

5 Recursive Self Joins

6 Self Joins with Multiple Conditions

7 Performance Considerations

8 Conclusion

Introduction

A self-join in PostgreSQL is a common technique used when you want to join a table to itself to compare rows within the same table. This tutorial will walk you through the concept of self-joins and how to utilize them in various scenarios using PostgreSQL.

Understanding Self Joins

A self join is a case of a regular join, but the table is joined with itself. This can happen for a number of reasons, such as finding related records within the same table. To perform a self join, you must use table aliases to not confuse the SQL engine as you are referencing the same table multiple times.

In PostgreSQL, self joins are usually executed with an INNER JOIN, LEFT JOIN, or other join types by referring to the same table with different aliases. Here’s the generic structure of a self join query:

SELECT A.column_name, B.column_name...
FROM table_name AS A
JOIN table_name AS B
ON A.common_field = B.common_field
WHERE some_condition;

Simple Self Join Example

Imagine a table named ’employees’ with the following columns: id, name, and manager_id, where ‘manager_id’ refers to the ‘id’ of another employee who is the manager.

SELECT
    E1.name AS Employee,
    E2.name AS Manager
FROM
    employees AS E1
INNER JOIN employees AS E2
ON E1.manager_id = E2.id;

This will give you a list of employees and their respective managers.

Advanced Self Join with Aggregation

Suppose you want to count the number of employees each manager has. To achieve this, you would use a self join along with an aggregate function like COUNT().

SELECT
    E2.name AS Manager,
    COUNT(E1.id) AS EmployeeCount
FROM
    employees E1
JOIN employees E2
ON E1.manager_id = E2.id
GROUP BY E2.name;

This query lists managers and the number of employees they manage.

Recursive Self Joins

In some cases, you might need to create chains of records which normally require recursive self joins, such as representation of a hierarchy tree in an organization. This is a complex use case which often makes use of PostgreSQL’s WITH RECURSIVE clause.

WITH RECURSIVE subordinates AS (
    SELECT
        id,
        name,
        manager_id
    FROM
        employees
    WHERE
        name = 'John Doe' -- Starting point
    UNION ALL
    SELECT
        e.id,
        e.name,
        e.manager_id
    FROM
        employees e
    INNER JOIN subordinates s
    ON e.manager_id = s.id
)
SELECT *
FROM subordinates;

This query retrieves all the subordinates under ‘John Doe’ in hierarchical order.

Self Joins with Multiple Conditions

You can also perform more complex self joins involving multiple conditions or comparisons.

SELECT
    A.name,
    B.name
FROM
    employees A
INNER JOIN employees B
ON A.manager_id = B.manager_id
AND A.id <> B.id
AND CONDITION;

This would return pairs of employees who share the same manager.

Performance Considerations

Self joins may impact performance, especially on large tables. It’s important to ensure that the columns used for joining have proper indexes in place. Using EXPLAIN ANALYZE can help you understand the performance of your queries and optimize them.

Conclusion

Self joins are a powerful but sometimes underutilized feature of SQL. They can be particularly useful for comparing and grouping data within the same table. In PostgreSQL, writing efficient self joins involves understanding how to alias tables and optimizing queries for performance. Hopefully, this tutorial has given you a good grasp of how to implement self joins in your PostgreSQL databases.

Next Article: How to set up PostgreSQL and pgAdmin on Mac

Previous Article: Joining more than two tables in PostgreSQL

Series: PostgreSQL Tutorials: From Basic to Advanced

PostgreSQL