SQLAlchemy: Group by day, week, month, year etc.

Updated: January 3, 2024 By: Guest Contributor Post a comment

Introduction

SQLAlchemy as an ORM (Object Relational Mapper) facilitates the interaction between Python programs and databases. One of its core functionalities is the ability to group rows by specific criteria and then perform some level of aggregate computation. In the context of temporal data, being able to group by various time increments (day, week, month, year etc.) is particularly useful for conducting time-series analysis and generating reports.

Grouping data is essential when you need to aggregate records across different time frames. Understanding how to group by day, week, month, or year in SQLAlchemy can help manage and analyze temporal data more efficiently.

Let’s explore through a selection of examples, how to implement this in SQLAlchemy with increasing complexity.

Examples

Basic Group by Day

from sqlalchemy import create_engine, func
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import extract
from myapp.models import MyTable

engine = create_engine('sqlite:///mydatabase.db')
session = sessionmaker(bind=engine)()

# Basic group by day
query = session.query(extract('day', MyTable.date_column), func.count('*'))
                .group_by(extract('day', MyTable.date_column))

for day, count in query:
    print(f'Day: {day}, Count: {count}')

Group by Week

from datetime import datetime, timedelta

# Helper function to get the start of the week
def start_of_week(date):
    return date - timedelta(days=date.weekday())

# Group by week
query = session.query(func.min(MyTable.date_column).label('week_start'), func.count('*'))
                .group_by(func.strftime('%Y-%W', MyTable.date_column))

for week_start, count in query:
    print(f'Week starting on: {week_start.strftime('%Y-%m-%d')}, Count: {count}')

Group by Month

# Group by month
query = session.query(extract('month', MyTable.date_column), func.count('*'))
                .group_by(extract('month', MyTable.date_column))

for month, count in query:
    print(f'Month: {month}, Count: {count}')

Group by Year

# Group by year
query = session.query(extract('year', MyTable.date_column), func.count('*'))
                .group_by(extract('year', MyTable.date_column))

for year, count in query:
    print(f'Year: {year}, Count: {count}')

Advanced Time Grouping Techniques

Now that we’ve covered the basics, let’s address more complex scenarios, like how to group by custom periods or how to combine multiple time frames.

Custom Time Grouping

# Custom time period grouping
query = session.query(
    func.strftime('%Y-%m', func.date(MyTable.date_column, '-15 days')), func.count('*')
).group_by(
    func.strftime('%Y-%m', func.date(MyTable.date_column, '-15 days'))
)

for period, count in query:
    print(f'Period: {period}, Count: {count}')

Combining Group By Levels

# Combining year and month
query = session.query(
    func.strftime('%Y-%m', MyTable.date_column), func.count('*')
).group_by(
    extract('year', MyTable.date_column),
    extract('month', MyTable.date_column)
)

for year_month, count in query:
    print(f'Year-Month: {year_month}, Count: {count}')

Handling Timezones

Dealing with timezones can have a significant impact on how dates are grouped. SQLAlchemy allows some sophistication in this area too.

# Assuming MyTable.date_column has timezone information
from sqlalchemy import DateTime

# This resolves the date to a specific timezone
query = session.query(
    func.timezone('UTC', func.timezone('America/New_York', MyTable.date_column)),
    func.count('*')
).group_by(
    func.date_trunc('day', func.timezone('UTC', func.timezone('America/New_York', MyTable.date_column)))
)

for day, count in query:
    print(f'Day: {day}, Count: {count}')

Conclusion

In this tutorial, we have seen how essential it is to understand grouping mechanisms in SQLAlchemy, especially when dealing with temporal data. By progressing from simple day-based grouping to more complex structures, including custom time frames and combining dimensions, you now have a robust foundation for performing any temporal data analysis required in your projects.