4 Ways to Select Distinct Rows in PostgreSQL

Updated: January 4, 2024 By: Guest Contributor

Introduction

Eliminating duplicate rows from result sets is a common task in SQL programming, and PostgreSQL offers multiple methods for selecting distinct rows. This article explores various strategies for achieving this goal and outlines the advantages and potential drawbacks of each approach.

Using DISTINCT Clause

The DISTINCT clause is the most direct way to select unique rows in a PostgreSQL query. It compares rows across all columns in the SELECT list and keeps one row from each set of duplicates; NULL values are treated as equal for this comparison.

Steps:

  1. Write a SELECT statement specifying the columns you want to retrieve.
  2. Insert the DISTINCT keyword right after SELECT.
  3. Execute the query to get the distinct results.

Example:

SELECT DISTINCT column1, column2 FROM table_name;
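To make the behavior concrete, here is a minimal sketch that can be pasted into psql. The table distinct_demo and its columns are hypothetical names chosen for illustration; they are not part of any real schema.

```sql
-- Hypothetical sample data; table and column names are illustrative only.
CREATE TEMP TABLE distinct_demo (
    customer text,
    city     text
);

INSERT INTO distinct_demo (customer, city) VALUES
    ('alice', 'Paris'),
    ('alice', 'Paris'),   -- exact duplicate of the first row
    ('alice', 'Berlin'),  -- same customer, different city: not a duplicate
    ('bob',   NULL),
    ('bob',   NULL);      -- NULLs are treated as equal by DISTINCT

-- Returns three rows: (alice, Berlin), (alice, Paris), (bob, NULL).
-- ORDER BY is added only to make the output order deterministic.
SELECT DISTINCT customer, city
FROM distinct_demo
ORDER BY customer, city;
```

Note that ('alice', 'Berlin') survives because DISTINCT looks at the whole selected row, not at the customer column alone.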

Pros: Easy to use; Great for simple queries; No need for additional constructs.

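Cons: Can be slow with large datasets, since the whole result set must be sorted or hashed to find duplicates; less practical when many columns are selected, because every one of them takes part in the comparison.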
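If performance is a concern, one way to check how PostgreSQL actually executes a DISTINCT query is EXPLAIN. The sketch below builds a hypothetical table, explain_demo, with generated data; the plan you see (for example a hash-based or sort-based strategy) depends on your PostgreSQL version, settings, and data, so no output is shown here.

```sql
-- Hypothetical table with 100,000 generated rows; names are illustrative only.
CREATE TEMP TABLE explain_demo AS
SELECT (n % 100) AS column1,
       (n % 7)   AS column2
FROM generate_series(1, 100000) AS n;

-- Refresh planner statistics so the estimates reflect the new data.
ANALYZE explain_demo;

-- Show the chosen plan together with actual run times.
-- The query itself yields 700 distinct (column1, column2) combinations.
EXPLAIN (ANALYZE)
SELECT DISTINCT column1, column2
FROM explain_demo;
```

Running the same EXPLAIN against the GROUP BY form of the query is a simple way to compare the two approaches on your own data.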

Using GROUP BY Clause

The GROUP BY clause groups rows that share the same values into summary rows. When used without aggregate functions, it returns one row per distinct combination of the grouped columns.

Steps:

  1. Write a SELECT statement indicating the columns that should be grouped.
  2. Include the GROUP BY clause followed by the columns you used in your SELECT.
  3. Run the query to obtain rows with distinct combinations of the grouped columns.

Example:

SELECT column1, column2 FROM table_name GROUP BY column1, column2;
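As an illustration of what GROUP BY adds over DISTINCT, the sketch below also counts how many source rows fall into each distinct combination. The table group_by_demo and its columns are hypothetical names used only for this example.

```sql
-- Hypothetical sample data; table and column names are illustrative only.
CREATE TEMP TABLE group_by_demo (
    customer text,
    city     text
);

INSERT INTO group_by_demo (customer, city) VALUES
    ('alice', 'Paris'),
    ('alice', 'Paris'),
    ('alice', 'Berlin'),
    ('bob',   NULL),
    ('bob',   NULL);      -- NULLs land in the same group

-- One row per distinct (customer, city) pair, plus a duplicate count.
-- Returns (alice, Berlin, 1), (alice, Paris, 2), (bob, NULL, 2).
SELECT customer, city, COUNT(*) AS occurrences
FROM group_by_demo
GROUP BY customer, city
ORDER BY customer, city;

-- Only the combinations that occur more than once.
-- Returns (alice, Paris, 2) and (bob, NULL, 2).
SELECT customer, city, COUNT(*) AS occurrences
FROM group_by_demo
GROUP BY customer, city
HAVING COUNT(*) > 1
ORDER BY customer, city;
```

The HAVING variant is a common way to locate duplicates before deciding how to remove them.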

Pros: Useful for complex queries; Can be combined with aggregate functions.

Cons: Might be less intuitive than DISTINCT; Similar performance trade-offs with large datasets.

Using DISTINCT ON Expression

PostgreSQL’s DISTINCT ON clause, a PostgreSQL extension to standard SQL, is more flexible than plain DISTINCT. It keeps only the first row of each set of rows in which the given expression(s) are equal, so uniqueness can be based on specific column(s) rather than on all output columns.

Steps:

  1. Use SELECT DISTINCT ON and specify the column(s) for uniqueness.
  2. List the rest of the columns you need in your SELECT.
  3. Add an ORDER BY clause that starts with the DISTINCT ON column(s), so that the intended row is kept from each group of duplicates.
  4. Execute your complete query.

Example:

SELECT DISTINCT ON (column1) column1, column2, column3 FROM table_name ORDER BY column1, column2;
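A typical use of DISTINCT ON is "latest row per group". The following sketch assumes a hypothetical table, distinct_on_demo, holding one row per order; the names and values are illustrative only.

```sql
-- Hypothetical sample data; table and column names are illustrative only.
CREATE TEMP TABLE distinct_on_demo (
    customer   text,
    amount     numeric,
    ordered_at date
);

INSERT INTO distinct_on_demo (customer, amount, ordered_at) VALUES
    ('alice', 10.00, DATE '2024-01-01'),
    ('alice', 25.00, DATE '2024-01-03'),
    ('bob',    7.50, DATE '2024-01-02');

-- One row per customer: the most recent order.
-- The ORDER BY starts with the DISTINCT ON expression (customer);
-- the next sort key decides which row in each group comes first and is kept.
-- Returns (alice, 25.00, 2024-01-03) and (bob, 7.50, 2024-01-02).
SELECT DISTINCT ON (customer) customer, amount, ordered_at
FROM distinct_on_demo
ORDER BY customer, ordered_at DESC;
```

Changing the second sort key to ordered_at ASC would keep each customer's earliest order instead.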

Pros: More control over which duplicates to remove; Works well when you only need uniqueness based on specific column(s).

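Cons: Without an ORDER BY clause, the row kept from each group is unpredictable, and the ORDER BY must begin with the DISTINCT ON expression(s); Specific to PostgreSQL, so less portable; Can be slow on large datasets.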

Window Functions

Window functions compute values across partitions of rows. By numbering the rows within each partition and keeping only the first, you can select one row per distinct combination of columns.

Steps:

  1. Start by selecting all columns you want from the table.
  2. Use a window function like ROW_NUMBER() over a partition of the distinct columns, adding ORDER BY inside OVER if it matters which row of each group is kept.
  3. Wrap your selection within a subquery.
  4. Filter on the row number being 1.

Example:

SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2) AS rn FROM table_name
) subquery WHERE rn = 1;
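The query above keeps an arbitrary row from each partition, because the window has no ORDER BY. The sketch below is one possible variation, using a hypothetical table window_demo, that orders each partition so the most recent row is kept, and lists the output columns explicitly so the helper column rn does not appear in the result.

```sql
-- Hypothetical sample data; table and column names are illustrative only.
CREATE TEMP TABLE window_demo (
    customer   text,
    amount     numeric,
    ordered_at date
);

INSERT INTO window_demo (customer, amount, ordered_at) VALUES
    ('alice', 10.00, DATE '2024-01-01'),
    ('alice', 25.00, DATE '2024-01-03'),
    ('bob',    7.50, DATE '2024-01-02');

-- Number the rows of each customer from newest to oldest, then keep row 1.
-- Returns (alice, 25.00, 2024-01-03) and (bob, 7.50, 2024-01-02).
SELECT customer, amount, ordered_at
FROM (
    SELECT customer, amount, ordered_at,
           ROW_NUMBER() OVER (
               PARTITION BY customer
               ORDER BY ordered_at DESC
           ) AS rn
    FROM window_demo
) AS ranked
WHERE rn = 1
ORDER BY customer;
```

Filtering on rn <= 2 instead of rn = 1 would return the two most recent rows per customer, which DISTINCT ON cannot express directly.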

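Pros: Offers fine-grained control over which row is kept from each group; Uses standard SQL window syntax; An index matching the partition columns may help the planner avoid a sort.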

Cons: More complex syntax; With SELECT *, the helper column rn is included in the output; Might be overkill for simple uniqueness requirements.

Conclusion

PostgreSQL provides several options for selecting distinct rows, each with its own use cases and performance considerations. For simple scenarios, DISTINCT or GROUP BY is usually sufficient and easy to implement. When uniqueness should be based on specific columns only, DISTINCT ON is a concise choice. Window functions offer the most flexibility and suit more complex queries, at the cost of more verbose syntax. Your choice will depend on the dataset size, table structure, and query complexity.