PostgreSQL: Ranking rows with RANK, DENSE_RANK, and ROW_NUMBER

Updated: January 5, 2024 By: Guest Contributor Post a comment

Introduction

When working with data, it’s often necessary to order and rank rows based on specific criteria. PostgreSQL provides powerful ranking functions like RANK, DENSE_RANK, and ROW_NUMBER to make these tasks easier. In this tutorial, you’ll learn how to use these functions with progressively complex examples.

Understanding Ranking Functions

In PostgreSQL, ranking functions are a subset of window functions that assign a unique rank value to each row within a partition of a result set. Before diving into examples, let’s define each function:

  • RANK: Assigns a unique rank to each row, with gaps for ties.
  • DENSE_RANK: Similar to RANK, but without gaps in ranking numbers for tied values.
  • ROW_NUMBER: Assigns a unique number to each row, irrespective of ties, starting from 1.

Basic Usage

Let’s start with a simple example where we rank sales data:

SELECT salesperson_id,
       sales_amount,
       RANK() OVER (ORDER BY sales_amount DESC) AS sales_rank
FROM sales;

This query assigns a rank to each salesperson based on their sales_amount in descending order.

Ranking With Ties

Now let’s see how RANK handles ties:

SELECT salesperson_id,
       sales_amount,
       RANK() OVER (ORDER BY sales_amount DESC) AS sales_rank
FROM sales;

-- If two salespeople have the same sales_amount, they will receive the same rank, and the subsequent rank will be skipped.

DENSE_RANK, on the other hand, does not skip ranks:

SELECT salesperson_id,
       sales_amount,
       DENSE_RANK() OVER (ORDER BY sales_amount DESC) AS dense_sales_rank
FROM sales;

Partitioning Data

Partitioning allows ranking within subsets of data. Here we partition by region:

SELECT region,
       salesperson_id,
       sales_amount,
       RANK() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS regional_rank
FROM sales;

This will rank salespeople within each region separately.

ROW_NUMBER and its uniqueness

ROW_NUMBER assigns a distinct number to each row, which can be useful for pagination or where a strict sequence is required:

SELECT salesperson_id,
       sales_amount,
       ROW_NUMBER() OVER (ORDER BY sales_amount DESC) AS row_number
FROM sales;

Each salesperson will have a unique row number, regardless of ties in sales_amount.

Advanced Use Cases

Let’s combine these functions with other SQL features for more advanced analyses:

WITH ranked_sales AS (
    SELECT region,
           salesperson_id,
           sales_amount,
           RANK() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS rank,
           DENSE_RANK() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS dense_rank,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS row_number
    FROM sales
)
SELECT *
FROM ranked_sales
WHERE rank <= 3;

This CTE (Common Table Expression) allows us to filter and retrieve only the top 3 ranked sales within each region.

Combining with JOINs

You can also join ranked results with other tables to enrich your data:

WITH ranked_products AS (
    SELECT product_id,
           sales_amount,
           RANK() OVER (ORDER BY sales_amount DESC) AS rank
    FROM sales
)
SELECT p.name, rp.sales_amount, rp.rank
FROM ranked_products rp
JOIN products p ON rp.product_id = p.id
WHERE rp.rank <= 5;

Here, we’re joining the top 5 ranked products with the products table to get product names alongside their sales rank.

Conclusion

RANK, DENSE_RANK, and ROW_NUMBER are incredibly versatile functions in PostgreSQL that help in ranking and distributing rows across various criteria. By incorporating these functions into your SQL queries, you can solve complex data problems with ease and deliver insightful business intelligence data.