PostgreSQL GROUP BY: How to Group Rows by One or Multiple Columns

Introduction
Using GROUP BY with a Single Column
Grouping by Multiple Columns
Aggregate Functions
Joining Tables and Using GROUP BY
HAVING Clause with GROUP BY
Grouping Sets, Rollup, and Cube
Group by Time Intervals
Dynamic Group By in Stored Procedures
Performance Considerations
Summary

Introduction

Understanding how to effectively group rows of data in a relational database is a crucial skill for any SQL user. The PostgreSQL GROUP BY clause makes it possible, allowing for aggregation of query results into summarized data. This tutorial will guide you through the basics to more advanced uses of the GROUP BY clause, complete with clear examples.

Using GROUP BY with a Single Column

The most straightforward use of the GROUP BY clause is to summarize data by one column. Here’s a simple example:

SELECT category, COUNT(*) FROM products GROUP BY category;

This query counts the number of products within each category. Remember that each column in your SELECT statement that is not an aggregate function must be included in the GROUP BY clause.

Grouping by Multiple Columns

SELECT category, brand, COUNT(*) FROM products GROUP BY category, brand;

By listing multiple columns, you’re asking PostgreSQL to group rows that have the same values in both columns. The result set will present a count of products for each unique combination of category and brand.

Aggregate Functions

Aggregate functions, such as COUNT(), SUM(), AVG(), MAX(), and MIN(), are often used with GROUP BY to provide summary information. Here’s a query that calculates the average price in each category:

SELECT category, AVG(price) FROM products GROUP BY category;

Joining Tables and Using GROUP BY

Grouping can be especially powerful when combined with joins. You can join multiple tables and perform an aggregate function over the joined result set:

SELECT p.category, COUNT(s.order_id)
FROM products p
JOIN sales s ON p.product_id = s.product_id
GROUP BY p.category;

This query would provide you with the number of sales for each product category.

HAVING Clause with GROUP BY

The HAVING clause is used to filter groups after they have been aggregated.

SELECT category, SUM(sales) FROM products GROUP BY category HAVING SUM(sales) > 10000;

This will show categories where the total sales exceed 10,000.

Grouping Sets, Rollup, and Cube

For complex reporting needs, PostgreSQL offers GROUPING SETS, ROLLUP, and CUBE to create multiple grouping sets in a single query. This is advanced usage, but can be incredibly powerful for analytics.

SELECT category, brand, SUM(sales)
FROM products
GROUP BY GROUPING SETS ((category), (brand), (category, brand));

This query provides the sum of sales for all categories, brands, and each combination of category and brand.

Group by Time Intervals

Grouping by time intervals is common in time series data analysis. Using the DATE_TRUNC function allows you to easily group data by day, month, year, etc.:

SELECT DATE_TRUNC('month', sale_date) as sale_month, SUM(sales)
FROM sales
GROUP BY sale_month;

This query will return the total sales for each month.

Dynamic Group By in Stored Procedures

In stored procedures, you might encounter scenarios where the grouping columns are dynamic. This requires PL/pgSQL to craft a flexible GROUP BY clause:

CREATE OR REPLACE FUNCTION dynamic_grouping(field_name text)
RETURNS TABLE (category text, product_count bigint) AS $
BEGIN
RETURN QUERY EXECUTE format('SELECT %I, COUNT(*) FROM products GROUP BY %I', field_name, field_name);
END;
$ LANGUAGE plpgsql;

This function allows a caller to group products by any column.

Performance Considerations

GROUP BY operations can be resource-intensive, especially on large datasets. Indexing the columns you’re grouping by can result in significant performance gains.

Summary

Throughout this tutorial, we’ve explored various aspects of the PostgreSQL GROUP BY clause. Starting with grouping by single and multiple columns, to using aggregate functions, and employing advanced techniques such as grouping sets and time intervals. Remember that careful attention to performance and the use of indexing can make your queries much more efficient. By now, you should feel confident using GROUP BY in your PostgreSQL queries to yield concise and meaningful aggregated data.

Next Article: PostgreSQL: Group rows by day, week, month, and year

Previous Article: PostgreSQL: Using date_part and date_trunc Functions

Series: PostgreSQL Tutorials: From Basic to Advanced

PostgreSQL