MySQL: How to Select DISTINCT Rows

Updated: January 25, 2024 By: Guest Contributor

Introduction

In MySQL, retrieving distinct rows from a table is essential for many analytical tasks, such as counting the unique visitors to a website or removing redundant rows from a result set. The DISTINCT keyword handles this directly. In this tutorial, we will use DISTINCT to select unique records step by step, from basic to advanced queries, with code examples along the way.

Understanding the DISTINCT Clause

Before delving into examples, it’s crucial to grasp what the DISTINCT clause in MySQL does. When included in a SELECT statement, it instructs MySQL to remove duplicate rows from the result set, so that each combination of selected values appears only once.

SELECT DISTINCT column1, column2 FROM table_name;

In this syntax, column1 and column2 are the columns whose values you want to retrieve. If multiple columns are specified, DISTINCT applies to the combination of their values, not to each column individually: a row is dropped only when every selected column matches another row.

Simple Usage of DISTINCT

Let’s start with a basic example using the DISTINCT clause. Imagine we have a table users which has a column called country. We want to know the unique countries from which users come.

SELECT DISTINCT country FROM users;

This query will return a list of all unique countries present in the users table.
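To make the result concrete, here is a minimal sketch with a small sample table, best run in a scratch database. The column names follow the ones used in this tutorial (user_id, name, country, city); the column types and the rows themselves are invented purely for illustration.

```sql
-- Hypothetical sample table; types and data are assumptions for illustration.
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    name    VARCHAR(100),
    country VARCHAR(100),
    city    VARCHAR(100)
);

INSERT INTO users (user_id, name, country, city) VALUES
    (1, 'Alice', 'USA',    'Boston'),
    (2, 'Bob',   'USA',    'Chicago'),
    (3, 'Chloe', 'France', 'Paris'),
    (4, 'David', 'France', 'Paris'),
    (5, 'Emi',   'Japan',  'Tokyo');

-- Five rows go in, but only three distinct countries come out:
-- USA, France, Japan (row order is not guaranteed without ORDER BY).
SELECT DISTINCT country FROM users;
```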

Selecting Distinct Rows with Multiple Columns

Now, if we want to get a list of unique combinations of countries and cities from the users table, we would write:

SELECT DISTINCT country, city FROM users;

This query will return each unique country and city pair from the users table.
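A common misunderstanding is that DISTINCT deduplicates each column separately. The sketch below uses a few invented rows in an inline derived table (so it runs without any existing table) to show that only whole-row duplicates are removed.

```sql
-- Inline sample rows (invented) so the query runs without any table.
SELECT DISTINCT country, city
FROM (
    SELECT 'USA' AS country, 'Boston' AS city
    UNION ALL SELECT 'USA', 'Chicago'
    UNION ALL SELECT 'France', 'Paris'
    UNION ALL SELECT 'France', 'Paris'   -- exact duplicate pair, removed
) AS sample_users;
-- Three rows remain: (USA, Boston), (USA, Chicago), (France, Paris).
-- 'USA' still appears twice because each row pairs it with a different city.
```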

Using DISTINCT with Aggregate Functions

DISTINCT can also be used inside aggregate functions such as COUNT(), SUM(), and AVG(), so that each distinct value is aggregated only once. For instance, to count the number of unique countries represented by our users:

SELECT COUNT(DISTINCT country) FROM users;

This query returns the number of distinct non-NULL countries in the users table.
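It can help to see the different COUNT variants side by side. The following sketch assumes the users table from this tutorial; the column aliases are our own. Note that COUNT(DISTINCT ...) with more than one expression is a MySQL extension, so other databases may need a subquery instead.

```sql
SELECT
    COUNT(*)                      AS total_rows,        -- every row
    COUNT(country)                AS rows_with_country, -- rows where country is not NULL
    COUNT(DISTINCT country)       AS unique_countries,  -- distinct non-NULL countries
    COUNT(DISTINCT country, city) AS unique_pairs       -- distinct pairs with no NULL in either column
FROM users;
```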

Advanced Usage of DISTINCT

DISTINCT in Joined Tables

DISTINCT also works on the result of a JOIN. Suppose we have another table, orders, with a user_id column that references users. Joining the two tables produces one row per order, so a user with several orders appears several times, and DISTINCT collapses those repeats. To get the distinct users who have made an order, we would write:

SELECT DISTINCT users.user_id, users.name FROM users JOIN orders ON users.user_id = orders.user_id;
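An alternative worth knowing is to avoid producing the duplicates in the first place. The sketch below uses EXISTS, which only checks whether at least one matching order exists, so each user is returned at most once without DISTINCT. It assumes user_id is unique in users (for example, a primary key), which the tutorial implies but does not state; the aliases u and o are our own.

```sql
SELECT u.user_id, u.name
FROM users AS u
WHERE EXISTS (
    SELECT 1
    FROM orders AS o
    WHERE o.user_id = u.user_id   -- true as soon as one order matches
);
```

Which form runs faster depends on the data and the MySQL version, so it is worth comparing both with EXPLAIN on your own tables.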

Using DISTINCT in Subqueries

DISTINCT can also appear inside a subquery, although with IN it does not change the result, because IN only tests whether a value is present. For example, you might need a list of countries that have users who’ve made an order:

SELECT country FROM users WHERE user_id IN (SELECT DISTINCT user_id FROM orders);
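Note that the query above returns one row per matching user, so a country with several ordering users appears several times. If the goal is to list each country once, DISTINCT belongs on the outer SELECT. A sketch of that variant, using the same tables:

```sql
SELECT DISTINCT country
FROM users
WHERE user_id IN (SELECT user_id FROM orders);
```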

ORDER BY and DISTINCT

Combining ORDER BY with DISTINCT can help get unique rows in a specific sort order:

SELECT DISTINCT country FROM users ORDER BY country;

This will return a sorted list of unique countries.
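One caveat: sorting by a column that is not in the select list is ambiguous with DISTINCT, and to our knowledge MySQL rejects such queries when the ONLY_FULL_GROUP_BY SQL mode is enabled. The sketch below shows a descending sort, followed by a GROUP BY rewrite that makes the ordering explicit. The created_at column is hypothetical and not part of the tutorial's users table.

```sql
-- Reverse alphabetical order.
SELECT DISTINCT country FROM users ORDER BY country DESC;

-- Ordering by a column outside the select list is ambiguous with DISTINCT:
-- which row's value should represent each country? Grouping and picking an
-- aggregate makes the choice explicit. created_at is a hypothetical column.
SELECT country
FROM users
GROUP BY country
ORDER BY MAX(created_at) DESC;
```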

Performance Considerations

While DISTINCT is powerful, it can be costly on large datasets: MySQL may have to scan the whole table and sort the rows or build a temporary table to eliminate duplicates. Use it judiciously and consider indexing the columns used with DISTINCT.

Moreover, if you’re using DISTINCT on a column that has a very high number of unique values (such as timestamps), it’s even more important to monitor and optimize the query’s performance.
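A practical way to check the cost is to look at the query plan before and after adding an index. In the sketch below, idx_users_country is a hypothetical index name. With a suitable index, MySQL can often read the distinct values from the index itself; in the Extra column of EXPLAIN you may see values such as "Using index" or "Using index for group-by" instead of "Using temporary", though the exact plan depends on your MySQL version and data.

```sql
-- idx_users_country is a hypothetical index name.
CREATE INDEX idx_users_country ON users (country);

-- Inspect the plan; compare the Extra column before and after adding the index.
EXPLAIN SELECT DISTINCT country FROM users;
```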

Dealing with NULL Values

In SQL, NULL represents an undefined or missing value. Although NULL = NULL does not evaluate to true in an ordinary comparison, DISTINCT treats all NULLs as duplicates of one another, so they collapse into a single NULL row in the result. If you would rather show a readable label than NULL, you can substitute a value in the query:

SELECT DISTINCT country, IF(city IS NULL, 'Unknown', city) FROM users;

This returns unique country and city pairs, with NULL cities shown as ‘Unknown’.
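To see how DISTINCT handles NULLs in practice, the sketch below runs it over a few invented rows in an inline derived table, then shows COALESCE as a more compact way to substitute a label. The alias sample_cities and the sample values are our own.

```sql
-- Inline sample rows (invented): one city and two NULLs.
SELECT DISTINCT city
FROM (
    SELECT 'Paris' AS city
    UNION ALL SELECT NULL
    UNION ALL SELECT NULL
) AS sample_cities;
-- Two rows come back: 'Paris' and a single NULL.

-- COALESCE is a shorter, standard-SQL way to write the IF(...) expression above.
SELECT DISTINCT country, COALESCE(city, 'Unknown') AS city
FROM users;
```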

Conclusion

In this tutorial, we’ve explored the various ways to use the MySQL DISTINCT clause to select unique rows, from basic queries to more complex ones involving multiple tables and aggregate functions. Always remember to take into account the performance implications of using DISTINCT and to treat NULL values appropriately for your specific use case.