PostgreSQL: How to select distinct on multiple columns

Introduction
Understanding SELECT DISTINCT
Select Distinct on Multiple Columns
Using DISTINCT ON Expression
Working with Complex Queries
Advanced Use Cases
Advanced Filtering and Sorting
Joining Tables with Distinct Values
Performance Considerations
Practical Examples
Conclusion

Introduction

Fetching distinct records is a common task in SQL. In PostgreSQL, SELECT DISTINCT removes duplicate rows from a result set. This tutorial shows how to apply it to multiple columns, and when DISTINCT ON or GROUP BY is the better tool.

Understanding SELECT DISTINCT

Before diving into multiple columns, let's cover the basics. The DISTINCT clause removes duplicate rows from a query result, so each value is returned only once. A column often contains repeated values, and sometimes you only want the list of different values. For this purpose NULLs are treated as equal to each other, so several NULLs collapse into a single row.

SELECT DISTINCT column_name FROM table_name;

This command retrieves all unique values from the column_name column of the table_name table.
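
As a minimal sketch of the single-column case, the following uses a hypothetical temporary table, demo_cities, invented for illustration. It also shows how repeated NULLs collapse into one row.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE demo_cities (city text);

INSERT INTO demo_cities (city) VALUES
    ('Paris'), ('Oslo'), ('Paris'), (NULL), (NULL);

-- Returns three rows: 'Oslo', 'Paris', and a single NULL.
-- DISTINCT treats NULLs as duplicates of each other.
SELECT DISTINCT city
FROM demo_cities
ORDER BY city;
```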

Select Distinct on Multiple Columns

Now, let’s extend this idea to multiple columns. Here’s how to select distinct values from more than one column:

SELECT DISTINCT column1, column2 FROM table_name;

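This query returns each unique combination of column1 and column2. DISTINCT applies to the whole select list, not just the first column: two rows count as duplicates only if they match in every selected column.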
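
A small worked example may make the "combination" behavior concrete. The table demo_orders and its rows are hypothetical; the point is that ('alice', 'pen') appears twice in the data but only once in the result.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE demo_orders (customer text, product text);

INSERT INTO demo_orders (customer, product) VALUES
    ('alice', 'pen'),
    ('alice', 'pen'),   -- exact duplicate of the row above
    ('alice', 'ink'),
    ('bob',   'pen');

-- Rows are duplicates only if BOTH columns match.
SELECT DISTINCT customer, product
FROM demo_orders
ORDER BY customer, product;
-- alice | ink
-- alice | pen
-- bob   | pen

-- Counting the distinct combinations with a subquery returns 3.
SELECT COUNT(*) AS distinct_pairs
FROM (SELECT DISTINCT customer, product FROM demo_orders) AS pairs;
```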

Using DISTINCT ON Expression

PostgreSQL also provides the DISTINCT ON syntax, an extension to standard SQL, which keeps a single row for each distinct value of the specified columns while still letting you select other columns.

SELECT DISTINCT ON (column1) column1, column2, ... FROM table_name ORDER BY column1, column2, ...;

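The query above returns one row for each distinct value of column1: the first row of each group in ORDER BY order. The DISTINCT ON expressions must match the leftmost ORDER BY expressions, and without an ORDER BY the row kept from each group is unpredictable. To make rows distinct on several columns, list them all in the parentheses, for example DISTINCT ON (column1, column2).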
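
Since the topic here is multiple columns, the sketch below applies DISTINCT ON to two columns at once. The table demo_purchases and its data are hypothetical. It keeps the earliest purchase for each (customer, product) pair.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE demo_purchases (
    customer     text,
    product      text,
    purchased_on date,
    amount       numeric
);

INSERT INTO demo_purchases (customer, product, purchased_on, amount) VALUES
    ('alice', 'pen', '2024-01-05', 3.00),
    ('alice', 'pen', '2024-02-10', 3.50),
    ('alice', 'ink', '2024-01-20', 8.00),
    ('bob',   'pen', '2024-03-01', 3.25);

-- One row per (customer, product); the third ORDER BY key decides
-- which row of each group is kept (here: the earliest purchase).
SELECT DISTINCT ON (customer, product)
       customer, product, purchased_on, amount
FROM demo_purchases
ORDER BY customer, product, purchased_on;
-- alice | ink | 2024-01-20 | 8.00
-- alice | pen | 2024-01-05 | 3.00
-- bob   | pen | 2024-03-01 | 3.25
```

Unlike plain DISTINCT, the columns outside the parentheses (purchased_on, amount) do not take part in the duplicate check.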

Working with Complex Queries

What if you need to select distinct combinations along with some aggregate functions? You can use the GROUP BY clause to group the data and perform aggregates.

SELECT column1, column2, COUNT(column3) FROM table_name GROUP BY column1, column2;

This lists each unique combination of column1 and column2, along with the number of rows in that group where column3 is not NULL. Use COUNT(*) instead to count every row in the group.
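
The difference between COUNT(column) and COUNT(*) is easy to miss, so here is a short sketch on a hypothetical demo_tickets table, where one ticket has no assignee.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE demo_tickets (team text, priority text, assignee text);

INSERT INTO demo_tickets (team, priority, assignee) VALUES
    ('web', 'high', 'ann'),
    ('web', 'high', NULL),   -- unassigned ticket
    ('web', 'low',  'raj'),
    ('api', 'high', 'ann');

-- GROUP BY yields one row per (team, priority) combination.
-- COUNT(*) counts all rows; COUNT(assignee) skips NULL assignees.
SELECT team,
       priority,
       COUNT(*)        AS total_rows,
       COUNT(assignee) AS assigned_rows
FROM demo_tickets
GROUP BY team, priority
ORDER BY team, priority;
-- api | high | 1 | 1
-- web | high | 2 | 1
-- web | low  | 1 | 1
```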

Advanced Use Cases

To select distinct combinations only from rows that meet a condition, add a WHERE clause. The filter is applied before duplicates are removed.

SELECT DISTINCT column1, column2 FROM table_name WHERE condition;
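
A brief illustration of filtering before de-duplication, using a hypothetical demo_shipments table: only delivered shipments contribute to the distinct (country, carrier) pairs.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE demo_shipments (country text, carrier text, status text);

INSERT INTO demo_shipments (country, carrier, status) VALUES
    ('NO', 'ups', 'delivered'),
    ('NO', 'ups', 'delivered'),
    ('NO', 'dhl', 'lost'),
    ('SE', 'dhl', 'delivered');

-- WHERE removes the 'lost' row first; DISTINCT then collapses duplicates.
SELECT DISTINCT country, carrier
FROM demo_shipments
WHERE status = 'delivered'
ORDER BY country, carrier;
-- NO | ups
-- SE | dhl
```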

It’s also common to combine DISTINCT with window functions or subqueries to achieve more complex filtering:

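SELECT DISTINCT ON (column1) column1, column2, FIRST_VALUE(column3) OVER (PARTITION BY column1 ORDER BY column2) AS first_column3 FROM table_name ORDER BY column1, column2;

This returns one row per distinct column1 value, together with the column3 value from the row with the lowest column2 in that partition. Window functions are evaluated before DISTINCT ON discards rows, and the ORDER BY determines which row of each group is kept.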
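
The following sketch combines DISTINCT ON with two window functions on a hypothetical demo_staff table. Because window functions run before DISTINCT ON removes rows, the per-department headcount still reflects every row in the partition.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE demo_staff (dept text, hired_on date, full_name text);

INSERT INTO demo_staff (dept, hired_on, full_name) VALUES
    ('ops', '2020-03-01', 'Kim'),
    ('ops', '2021-07-15', 'Lee'),
    ('dev', '2019-11-30', 'Ada'),
    ('dev', '2022-01-10', 'Max');

-- One row per department: the earliest hire, plus the department size.
SELECT DISTINCT ON (dept)
       dept,
       hired_on,
       FIRST_VALUE(full_name) OVER (PARTITION BY dept ORDER BY hired_on) AS first_hire,
       COUNT(*) OVER (PARTITION BY dept) AS headcount
FROM demo_staff
ORDER BY dept, hired_on;
-- dev | 2019-11-30 | Ada | 2
-- ops | 2020-03-01 | Kim | 2
```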

Advanced Filtering and Sorting

Having distinct values is one thing, but what if you need distinct rows sorted in a particular order? This is where the ORDER BY comes into effect:

SELECT DISTINCT ON (column1) column1, column2 FROM table_name ORDER BY column1, column2 DESC;

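For each column1 value, this query keeps the row with the highest column2, and it returns the results sorted by column1 in ascending order. Because the ORDER BY must begin with the DISTINCT ON expressions, it controls both which row is kept and how the output is sorted.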
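
One consequence is that the final output cannot be sorted by a different column in the same query level. A common workaround, sketched here on a hypothetical demo_scores table, is to wrap the DISTINCT ON query in a subquery and sort the outer query.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE demo_scores (player text, score int);

INSERT INTO demo_scores (player, score) VALUES
    ('ann', 70), ('ann', 95),
    ('bo',  88), ('bo',  60),
    ('cy',  99);

-- Inner query: best score per player (ORDER BY must start with player).
-- Outer query: re-sort those rows by score, highest first.
SELECT player, score
FROM (
    SELECT DISTINCT ON (player) player, score
    FROM demo_scores
    ORDER BY player, score DESC
) AS best
ORDER BY score DESC;
-- cy  | 99
-- ann | 95
-- bo  | 88
```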

Joining Tables with Distinct Values

Often, you will need to join tables and still get distinct values. A join can repeat a row once for every match on the other side, and DISTINCT removes the resulting duplicates:

SELECT DISTINCT a.column1, b.column2 FROM table1 a JOIN table2 b ON a.common_column = b.common_column;

This query returns the unique (column1, column2) pairs produced by joining table1 and table2 on common_column.
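
To see why a join produces duplicates in the first place, consider this sketch with two hypothetical tables, demo_authors and demo_books. One author has two books in the same genre, so the plain join would return that (name, genre) pair twice.

```sql
-- Hypothetical tables and data, for illustration only.
CREATE TEMP TABLE demo_authors (author_id int PRIMARY KEY, name text);
CREATE TEMP TABLE demo_books (book_id int PRIMARY KEY, author_id int, genre text);

INSERT INTO demo_authors (author_id, name) VALUES
    (1, 'Le Guin'), (2, 'Borges');

INSERT INTO demo_books (book_id, author_id, genre) VALUES
    (10, 1, 'sci-fi'),
    (11, 1, 'sci-fi'),   -- second sci-fi book by the same author
    (12, 1, 'fantasy'),
    (13, 2, 'fiction');

-- Without DISTINCT this join returns 4 rows; with it, 3.
SELECT DISTINCT a.name, b.genre
FROM demo_authors a
JOIN demo_books b ON a.author_id = b.author_id
ORDER BY a.name, b.genre;
-- Borges  | fiction
-- Le Guin | fantasy
-- Le Guin | sci-fi
```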

Performance Considerations

SELECT DISTINCT can be resource-intensive on large datasets, because PostgreSQL has to sort or hash the rows to find the duplicates. An index that matches the DISTINCT ON and ORDER BY columns can let the planner avoid a separate sort step, and EXPLAIN shows which strategy was actually chosen.
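
As a rough sketch of how one might check this, the statements below build a hypothetical demo_events table, add an index matching the DISTINCT ON query's sort order, and ask for the plan. All names are invented for illustration, and the plan PostgreSQL picks depends on table size, statistics, and version, so no particular output is shown.

```sql
-- Hypothetical table, data, and index, for illustration only.
CREATE TEMP TABLE demo_events (
    account_id int,
    created_at timestamptz,
    payload    text
);

-- 10,000 synthetic rows spread over 100 accounts.
INSERT INTO demo_events (account_id, created_at, payload)
SELECT g % 100,
       now() - g * interval '1 minute',
       'x'
FROM generate_series(1, 10000) AS g;

-- The index column order mirrors the query's ORDER BY.
CREATE INDEX demo_events_account_created_idx
    ON demo_events (account_id, created_at DESC);

-- Refresh planner statistics for the new table.
ANALYZE demo_events;

-- Shows the chosen plan; look for a Sort node versus an index scan.
EXPLAIN
SELECT DISTINCT ON (account_id) account_id, created_at, payload
FROM demo_events
ORDER BY account_id, created_at DESC;
```

Comparing the plan with and without the index is usually more informative than any general rule.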

Practical Examples

To solidify understanding, let’s go through some practical examples with fictional database tables.

/* Example 1: the highest salary in each department */
SELECT DISTINCT ON (employee.department) employee.department, employee.salary FROM employee ORDER BY employee.department, employee.salary DESC;

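/* Example 2: how often each (column1, column2) combination occurs, most frequent first.
   Group the base table directly: after SELECT DISTINCT every combination appears exactly once, so a count over it would always be 1. */
SELECT column1, column2, COUNT(*) AS occurrence FROM table_name GROUP BY column1, column2 ORDER BY occurrence DESC;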

Conclusion

SELECT DISTINCT removes duplicate rows whether you select one column or several, and DISTINCT ON lets you keep one chosen row per group. Understanding the difference between them, and when GROUP BY is the better fit, helps you write correct analysis and reporting queries. Good table design and indexing remain important for keeping these queries fast.
