Sling Academy
Home/PostgreSQL/Using HAVING clause with GROUP BY in PostgreSQL

Using HAVING clause with GROUP BY in PostgreSQL

Last updated: January 06, 2024

Introduction

The HAVING clause in PostgreSQL is used to filter grouped records that result from a GROUP BY clause, similar to how the WHERE clause filters rows before grouping. This tutorial explores how to use the HAVING clause to filter aggregated data, featuring a progression of complexity through various examples.

Understanding GROUP BY

The GROUP BY clause in PostgreSQL, and in SQL generally, organizes identical data into groups. Typically, it’s used in conjunction with aggregate functions like COUNT(), MAX(), MIN(), SUM(), and AVG(). Here’s a simple example of how it can be used:

SELECT category, COUNT(*)
FROM products
GROUP BY category;

This query groups the products by their categories and tells us how many products there are in each category.

Introducing the HAVING Clause

While the WHERE clause limits rows before the aggregation takes place, the HAVING clause applies its conditions after the GROUP BY has been executed. It’s often necessary because aggregate functions cannot be used with WHERE. Below is the basic usage of HAVING:

SELECT category, COUNT(*)
FROM products
GROUP BY category
HAVING COUNT(*) > 10;

This query will only include categories having more than 10 products after the grouping is performed.

Using HAVING with Multiple Conditions

You can combine multiple conditions in the HAVING clause using AND/OR logical operators. Consider this example, where we want to filter categories based on a count of products and the average price:

SELECT category, COUNT(*), AVG(price)
FROM products
GROUP BY category
HAVING COUNT(*) > 10 AND AVG(price) > 100;

Here, only categories with more than 10 products and an average price over $100 will be selected.

HAVING with JOIN Operations

It’s also common to combine HAVING clause with JOIN operations. The below query joins two tables and filters the results using HAVING:

SELECT p.category, COUNT(o.order_id)
FROM products p
JOIN orders o ON p.product_id = o.product_id
GROUP BY p.category
HAVING COUNT(o.order_id) > 50;

With this, we see which product categories result in more than 50 orders.

Advanced Filtering with HAVING by Using Subqueries

Subqueries can be used with the HAVING clause for advanced filtering scenarios. This can be helpful when you want to compare group data against a specific metric obtained from a separate query:

SELECT category
FROM products
GROUP BY category
HAVING AVG(price) > (SELECT AVG(price) FROM products);

This example shows categories where the average price is higher than the overall average price in the products table.

Dynamic HAVING Clauses

PostgreSQL also allows HAVING clauses to reference column aliases defined in the SELECT clause. The following query filters the categories based on a dynamically calculated discount value:

SELECT category, AVG(price) * 0.9 AS discounted_average
FROM products
GROUP BY category
HAVING discounted_average > 300;

This shows product categories where the discounted average price exceeds $300.

Using WINDOW Functions with HAVING

In more complex situations, WINDOW functions can be used in conjunction with the GROUP BY and HAVING clauses. While a detailed discussion of WINDOW functions is outside the scope of this tutorial, it’s worth mentioning that they enable another layer of data analysis and manipulation.

Performance Considerations

Keep in mind that using the HAVING clause can affect the query’s performance, especially with large datasets. It’s essential to analyze the query plan and potentially use indexes on grouped columns to optimize performance.

Conclusion

The HAVING clause is a powerful SQL feature that, when used with GROUP BY, allows for sophisticated filtering of aggregated data in PostgreSQL. It enables database users to execute complex analyses and extract meaningful insights from grouped datasets. When utilizing the HAVING clause, always consider the performance impacts and structure your queries wisely to ensure efficient data retrieval.

Next Article: PostgreSQL: Select rows between two dates/timestamps

Previous Article: PostgreSQL Aggregation in Group By: SUM, AVG, MIN, MAX, and COUNT

Series: PostgreSQL Tutorials: From Basic to Advanced

PostgreSQL

You May Also Like

  • PostgreSQL with TimescaleDB: Querying Time-Series Data with SQL
  • PostgreSQL Full-Text Search with Boolean Operators
  • Filtering Stop Words in PostgreSQL Full-Text Search
  • PostgreSQL command-line cheat sheet
  • How to Perform Efficient Rolling Aggregations with TimescaleDB
  • PostgreSQL with TimescaleDB: Migrating from Traditional Relational Models
  • Best Practices for Maintaining PostgreSQL and TimescaleDB Databases
  • PostgreSQL with TimescaleDB: Building a High-Performance Analytics Engine
  • Integrating PostgreSQL and TimescaleDB with Machine Learning Models
  • PostgreSQL with TimescaleDB: Implementing Temporal Data Analysis
  • Combining PostgreSQL, TimescaleDB, and Airflow for Data Workflows
  • PostgreSQL with TimescaleDB: Visualizing Real-Time Data with Superset
  • Using PostgreSQL with TimescaleDB for Energy Consumption Analysis
  • PostgreSQL with TimescaleDB: How to Query Massive Datasets Efficiently
  • Best Practices for Writing Time-Series Queries in PostgreSQL with TimescaleDB
  • PostgreSQL with TimescaleDB: Implementing Batch Data Processing
  • Using PostgreSQL with TimescaleDB for Network Traffic Analysis
  • PostgreSQL with TimescaleDB: Troubleshooting Common Performance Issues
  • Building an IoT Data Pipeline with PostgreSQL and TimescaleDB