How to Select Distinct Rows in SQLAlchemy

Updated: January 3, 2024 By: Guest Contributor Post a comment

Introduction

Removing duplicates from query results is a common necessity in data manipulation and analysis. SQLAlchemy, a widely used ORM for Python, provides an elegant way to select distinct rows from a database. In this tutorial, we’ll cover how to use SQLAlchemy to write queries that return unique records.

Understanding Distinct in SQL

In SQL, the DISTINCT keyword is used to return unique records from a table across specified columns. It eliminates duplicate rows in the result set. A basic SQL distinct query looks like:

SELECT DISTINCT column_name FROM table_name;

This translates to SQLAlchemy in several ways, as we’ll explore.

Basic Distinct Query in SQLAlchemy

Let’s start with a simple example. Assume you have a model User with a column name. To select distinct names:

from sqlalchemy import create_engine, distinct
from sqlalchemy.orm import sessionmaker
from yourapplication.database.models import User

engine = create_engine('sqlite:///yourdatabase.db')
Session = sessionmaker(bind=engine)
session = Session()

unique_names = session.query(distinct(User.name)).all()
for name in unique_names:
    print(name)

Select Distinct Across Multiple Columns

You can also select distinct records across multiple columns. Consider you want to select distinct combinations of name and email:

unique_name_email = session.query(distinct(User.name, User.email)).all()
for record in unique_name_email:
    print(record)

Using Distinct with Filters

Applying filters to queries with distinct is straightforward. Here, we get distinct user names with a specific role:

unique_admins = session.query(distinct(User.name)).filter(User.role == 'admin').all()
for admin in unique_admins:
    print(admin)

Chaining Distinct with Order By

You can combine distinct with order_by to sort the unique results:

sorted_unique_names = session.query(distinct(User.name)).order_by(User.name).all()
for name in sorted_unique_names:
    print(name)

Distinct with Joins

Joining tables while selecting distinct records is a bit more complex:

from yourapplication.database.models import Order

unique_user_orders = session.query(distinct(User.name)).join(Order).filter(Order.status == 'completed').all()
for name_order in unique_user_orders:
    print(name_order)

Using Func for Distinct Counts

Sometimes, you’d like to get a count of distinct records:

from sqlalchemy import func

count_distinct_names = session.query(func.count(distinct(User.name))).scalar()
print(f'There are {count_distinct_names} distinct user names.')

Advanced Usage: Distinct on Subqueries

For complex queries, you might need to get distinct results from a subquery:

subquery = session.query(User.name).filter(User.role == 'customer').subquery()
unique_customer_names = session.query(distinct(subquery.c.name)).all()
for name in unique_customer_names:
    print(name)

Hybrid Approaches

Combining distinct with group by can be useful:

from sqlalchemy import and_

unique_name_type = session.query(distinct(User.name), User.account_type).group_by(User.name, User.account_type).having(func.count(User.id) > 1).all()
for result in unique_name_type:
    print(result)

Understanding Query Outputs

When you execute a distinct query, SQLAlchemy returns a list of tuples, with each tuple representing a unique combination of values from the selected columns. It’s important to parse these tuples appropriately in your application.

Error Handling

It’s good practice to handle potential exceptions that may arise from database operations:

try:
    unique_names = session.query(distinct(User.name)).all()
except Exception as e:
    print(f'An error occurred: {e}')
finally:
    session.close()

Conclusion

In this guide, we’ve seen how to leverage SQLAlchemy’s distinct capabilities to select unique rows from a database. You should now be comfortable writing basic to advanced distinct queries using SQLAlchemy. Selecting distinct records is vital for accurate data analysis and with the power of SQLAlchemy, you can write concise and efficient queries to get these results.