PostgreSQL: Using Subqueries with JOINs

Introduction
Basic Concepts
1. Simple Subquery with INNER JOIN
2. Subquery in a WHERE Clause
Advanced Uses of Subqueries and JOINs
Performance Considerations
Conclusion

Introduction

Diving into relational databases often entails complex queries, especially when merging data from multiple tables. This tutorial will focus on employing subqueries alongside JOINs in PostgreSQL to create powerful and efficient SQL statements.

Basic Concepts

Before we jump into the queries, let’s clarify some key concepts. A subquery is a SQL query nested inside a larger query. A JOIN operation in SQL is used to combine rows from two or more tables, based on a related column between them.

Now, let’s look at some examples using sample tables to illustrate how subqueries can be used with joins.

Simple Subquery with INNER JOIN

SELECT e.name, d.department_name
FROM employees e
INNER JOIN (
 SELECT department_id, department_name
 FROM departments
 WHERE location_id = 'NYC'
) d ON e.department_id = d.department_id;

In this example, we are joining the employees table with a subquery that returns departments located in New York City. This can be helpful when the departments table is very large and we want to limit the number of rows joined.

Subquery in a WHERE Clause

SELECT name, salary
FROM employees
WHERE department_id IN (
 SELECT department_id
 FROM departments
 WHERE location_id = 'SF'
);

This query will return all employees who are in departments located in San Francisco. The subquery identifies which department IDs correspond to the ‘SF’ location.

Advanced Uses of Subqueries and JOINs

As we delve deeper, let’s explore more complex scenarios.

Correlated Subqueries

SELECT e.name, e.salary, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE EXISTS (
 SELECT 1
 FROM project_assignments pa
 WHERE pa.employee_id = e.employee_id
 AND pa.project_id = 'P123'
);

This query leverages EXISTS with a correlated subquery. It returns details for employees assigned to a specific project (‘P123’) within their respective departments.

JOIN with Aggregate Functions

SELECT d.department_name, AVG(e.salary) AS avg_salary
FROM departments d
LEFT JOIN (
 SELECT department_id, salary
 FROM employees
) e ON d.department_id = e.department_id
GROUP BY d.department_name;

Here we’re doing a LEFT JOIN that includes an aggregated calculation. The subquery creates a temporary table with department IDs and salaries from which we can calculate the average salary for each department.

Subquery As a JOIN Condition

SELECT e1.name AS employee_name, e2.name AS manager_name
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.employee_id
AND e2.salary > ALL (
 SELECT AVG(salary)
 FROM employees
 GROUP BY department_id
);

This query compares salaries to ensure that we only join employees (e1) with their managers (e2), where the managers have salaries higher than the average salaries of all departments, using subqueries for comparison.

Performance Considerations

Subqueries can be power tools, but they come with caveats. The excessive use of subqueries can slow down query performance, especially when dealing with large tables or performing subqueries that are not optimized. It is often a best practice to test subqueries separately, check their execution plan using EXPLAIN, and ensure that indexes are used properly.

Conclusion

By now, you should have a solid grasp on using subqueries together with JOIN operations within PostgreSQL. Remember, while subqueries can add power and flexibility to your queries, they have to be used judiciously for maintaining optimized query performance. With practice, you’ll find the perfect balance in using these features to their fullest.

Next Article: Checking Index Size in PostgreSQL

Previous Article: PostgreSQL: Order entries in UNION without using ORDER BY

Series: PostgreSQL Tutorials: From Basic to Advanced

PostgreSQL