Sling Academy
Home/Python/PyMongo: Grouping documents by day/month/year

PyMongo: Grouping documents by day/month/year

Last updated: February 08, 2024

Overview

Understanding how to effectively group documents by day, month, or year with PyMongo can significantly enhance your ability to analyze time-series data or perform aggregations based on temporal attributes. In MongoDB, the aggregation framework provides a powerful way to process data and perform complex transformations. This tutorial will guide you through various examples, moving from basic techniques to more advanced approaches.

Prerequisites

Before diving into the examples, ensure you have:

  • MongoDB installed and running on your machine or server.
  • The PyMongo driver installed in your Python environment. You can install it using pip install pymongo.
  • A basic understanding of MongoDB and Python.

Basic Document Grouping by Day

To begin grouping documents by their dates, consider a collection named events that contains documents with an eventDate field. Here’s how to aggregate documents by the day:

from pymongo import MongoClient
from datetime import datetime

# Establish connection to the MongoDB server
client = MongoClient('mongodb://localhost:27017/')
db = client['your_db_name']
collection = db['events']

# Construct the pipeline
pipeline = [
    {
        '$match': {
            'eventDate': {
                '$gte': datetime(2023, 1, 1),
                '$lte': datetime(2023, 12, 31)
            }
        }
    },
    {
        '$group': {
            '_id': {
                '$dayOfMonth': '$eventDate'
            },
            'count': {
                '$sum': 1
            }
        }
    },
    {
        '$sort': {
            '_id': 1
        }
    }
]

result = collection.aggregate(pipeline)

for doc in result:
    print(doc)

This simple aggregation pipeline filters documents by eventDate within a specific range and groups them by day of the month, counting the number of documents for each day. The results are then sorted by the day.

Grouping by Month

To group documents by month, you adjust the _id field in the $group stage to use $month instead:

pipeline = [
    {
        '$match': {
            'eventDate': {
                '$gte': datetime(2023, 1, 1),
                '$lte': datetime(2023, 12, 31)
            }
        }
    },
    {
        '$group': {
            '_id': {
                '$month': '$eventDate'
            },
            'count': {
                '$sum': 1
            }
        }
    },
    {
        '$sort': {
            '_id': 1
        }
    }
]

result = collection.aggregate(pipeline)

for doc in result:
    print(doc)

This modification groups the documents by month, still using the eventDate field to determine the grouping criterion.

Grouping by Year

In this example, the events are grouped by year only, ignoring the month and day. This will provide a count of events for each year within the specified date range.

pipeline = [
    {
        '$match': {
            'eventDate': {
                '$gte': datetime.now().replace(year=datetime.now().year - 1),
                '$lte': datetime.now()
            }
        }
    },
    {
        '$group': {
            '_id': {
                'year': {'$year': '$eventDate'}
            },
            'count': {'$sum': 1}
        }
    },
    {
        '$sort': {
            '_id.year': 1
        }
    }
]

result = collection.aggregate(pipeline)

for doc in result:
    print(doc)

Advanced Grouping: Year, Month, and Day

For more detailed analysis, you might want to group documents not just by day or month, but by both, alongside their year. To accomplish this, you combine multiple date operators in the _id field during the $group stage:

pipeline = [
    {
        '$match': {
            'eventDate': {
                '$gte': datetime.now().replace(year=datetime.now().year - 1),
                '$lte': datetime.now()
            }
        }
    },
    {
        '$group': {
            '_id': {
                'year': {'$year': '$eventDate'},
                'month': {'$month': '$eventDate'},
                'day': {'$dayOfMonth': '$eventDate'}
            },
            'count': {'$sum': 1}
        }
    },
    {
        '$sort': {
            '_id.year': 1,
            '_id.month': 1,
            '_id.day': 1
        }
    }
]

result = collection.aggregate(pipeline)

for doc in result:
    print(doc)

This pipeline filters documents within the last year from the current date and groups them by year, month, and day. It provides a fine-grained view of the document distribution over time.

Conclusion

Grouping documents by day, month, or year with PyMongo is a powerful technique to summarize and analyze time-based data. By carefully constructing aggregation pipelines, you can derive meaningful insights from your MongoDB collections. Always consider your specific data schema and analysis needs when building these pipelines.

Next Article: PyMongo: Saving and Querying Location Data (Latitude, Longitude)

Previous Article: PyMongo: How to save datetime with timezone

Series: Data Persistence in Python – Tutorials & Examples

Python

You May Also Like

  • Introduction to yfinance: Fetching Historical Stock Data in Python
  • Monitoring Volatility and Daily Averages Using cryptocompare
  • Advanced DOM Interactions: XPath and CSS Selectors in Playwright (Python)
  • Automating Strategy Updates and Version Control in freqtrade
  • Setting Up a freqtrade Dashboard for Real-Time Monitoring
  • Deploying freqtrade on a Cloud Server or Docker Environment
  • Optimizing Strategy Parameters with freqtrade’s Hyperopt
  • Risk Management: Setting Stop Loss, Trailing Stops, and ROI in freqtrade
  • Integrating freqtrade with TA-Lib and pandas-ta Indicators
  • Handling Multiple Pairs and Portfolios with freqtrade
  • Using freqtrade’s Backtesting and Hyperopt Modules
  • Developing Custom Trading Strategies for freqtrade
  • Debugging Common freqtrade Errors: Exchange Connectivity and More
  • Configuring freqtrade Bot Settings and Strategy Parameters
  • Installing freqtrade for Automated Crypto Trading in Python
  • Scaling cryptofeed for High-Frequency Trading Environments
  • Building a Real-Time Market Dashboard Using cryptofeed in Python
  • Customizing cryptofeed Callbacks for Advanced Market Insights
  • Integrating cryptofeed into Automated Trading Bots