SQLAlchemy Aggregation: Min, Max, Average, and Sum

Updated: January 3, 2024 By: Guest Contributor Post a comment

Introduction

SQLAlchemy offers a robust toolkit for working with aggregates in databases. In this tutorial, we’ll explore how to apply aggregation functions such as Min, Max, Average, and Sum in SQLAlchemy, using various examples to illustrate the process.

Setting Up the Environment

Before we can perform any aggregation, we need to set up an environment with SQLAlchemy. Below we walk through setting up an engine, session, and defining a basic model for use in our examples.

from sqlalchemy import create_engine, Column, Integer, Float, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///:memory:')
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()

# Define a simple Product model
class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    price = Column(Float)
Base.metadata.create_all(engine)

Basic Aggregations

Let’s begin by looking at some simple aggregate operations, first starting with the minimum price of all products.

min_price = session.query(func.min(Product.price)).scalar()
print('Minimum product price:', min_price)

Similarly, for finding the maximum product price:

max_price = session.query(func.max(Product.price)).scalar()
print('Maximum product price:', max_price)

Calculating the average price:

avg_price = session.query(func.avg(Product.price)).scalar()
print('Average product price:', avg_price)

And fetching the sum of all product prices:

total_price = session.query(func.sum(Product.price)).scalar()
print('Total price of all products:', total_price)

Grouping and Aggregating

Aggregates are often used with group by statements to calculate aggregates in defined groups. Below, we calculate the total number of products in each price bracket.

brackets = session.query(Product.price, func.count(Product.id)).group_by(Product.price).all()
for bracket in brackets:
    print(f'Price: {bracket[0]}, Count: {bracket[1]}')

Joining, Filtering, and Aggregating

As our queries become more complex, we may need to join tables, filter results, and still perform aggregations. Say we have another table Order and we wish to calculate the total sales per product:

class Order(Base):
    __tablename__ = 'orders'
    id = Column(Integer, primary_key=True)
    product_id = Column(Integer, ForeignKey('products.id'))
    quantity = Column(Integer)

# Now assume we have data populated in both the Product and Order tables.

# Let's find the total sales per product:
total_sales = session.query(
    Product.name,
    func.sum(Product.price * Order.quantity).label('total_sales')
).join(Order, Order.product_id == Product.id)
.group_by(Product.name).all()

for product in total_sales:
    print(f'Product: {product.name}, Total Sales: {product.total_sales}')

Advanced Aggregations: Hybrid Properties and Expressions

In advanced usage, you may find it beneficial to define aggregates on the model class itself. We’ll use a hybrid property to calculate the average cost of goods sold.

from sqlalchemy.ext.hybrid import hybrid_property, hybrid_method

class Product(Base):
    # ... existing model fields
    
    @hybrid_property
    def avg_cost(self):
        return self.price * SOME_FIXED_FACTOR

# Query using the hybrid property:
avg_cost_query = session.query(Product.name, Product.avg_cost.label('average_cost')).all()

for product in avg_cost_query:
    print(f'Product: {product.name}, Average Cost: {product.average_cost}')

Combining Aggregates with Case Statements

SQLAlchemy also allows for the use of case statements within aggregates, providing conditional aggregation. Here’s an example of how to use a case statement to categorize sales as either ‘high’ or ‘low.’

from sqlalchemy import case

sales_categories = session.query(
    Product.name,
    func.sum(case([(Order.quantity > 10, Order.quantity)], else_=0)).label('high_sales'),
    func.sum(case([(Order.quantity <= 10, Order.quantity)], else_=0)).label('low_sales')
).join(Order, Order.product_id == Product.id)
.group_by(Product.name).all()

for product in sales_categories:
    print(f"Product: {product.name}, High Sales: {product.high_sales}, Low Sales: {product.low_sales}")

Conclusion

We’ve explored a variety of aggregation functions in SQLAlchemy, from basics like min and max to more complex operations involving grouping, joining, and hybrid properties. Mastering these techniques can provide powerful insights into your dataset and greatly enhance your data manipulation capabilities within a SQLAlchemy context.