MongoDB: Grouping documents by date (day, month, year)

Updated: February 3, 2024 By: Guest Contributor Post a comment

Introduction

MongoDB, the popular NoSQL database, offers robust features for aggregating and grouping documents. In this tutorial, we’ll focus on how to use the aggregation framework to group documents by date intervals – day, month, and year. Whether you’re generating reports or summarizing data, understanding these techniques is essential for effectively querying date-based information in MongoDB.

Basics of Date Grouping in MongoDB

To start grouping documents by date, you’ll need a collection with documents that contain a date field. Let’s say we have a ‘transactions’ collection where each document has a ‘timestamp’ field:

{
  "_id": ObjectId("xyz123"),
  "amount": 250,
  "timestamp": ISODate("2023-03-01T08:00:00Z")
}

Our goal here is to group these transactions by their timestamp. Aggregation in MongoDB can be broken down into pipelines that transform the data step by step. Here’s the most basic aggregation pipeline for grouping by day:

db.transactions.aggregate([
  {
    $group: {
      _id: {
        day: { $dayOfMonth: "$timestamp" },
        month: { $month: "$timestamp" },
        year: { $year: "$timestamp" }
      },
      totalAmount: { $sum: "$amount" }
    }
  }
]);

Output:

[{ “_id”: { “day”: 1, “month”: 3, “year”: 2023 }, “totalAmount”: 250 }]

This aggregation pipeline groups all transactions that occurred on the same day and calculates the total amount of money exchanged on each day.

Advanced Grouping Features

While grouping documents by exact dates can be useful, sometimes you may need more generalized groupings like all transactions in a specific month regardless of year, or during a particular year. The following examples demonstrate advanced grouping:

Grouping by Month Regardless of Year

db.transactions.aggregate([
  {
    $group: {
      _id: { $month: "$timestamp" },
      count: { $sum: 1 }
    }
  },
  {
    $sort: { "_id": 1 }
  }
]);

If you want to group documents by the year, you’d use the ‘$year’ operator instead of ‘$month’.

Grouping by Year

db.transactions.aggregate([
  {
    $group: {
      _id: { $year: "$timestamp" },
      count: { $sum: 1 }
    }
  },
  {
    $sort: { "_id": 1 }
  }
]);

Complex Grouping

For more complex use cases, you could group by week numbers, fiscal quarters, or any arbitrary date range. Additionally, you can project additional fields before the grouping to add further context:

Grouping by Financial Quarters

db.transactions.aggregate([
  {
    $project: {
      quarter: {
        $let: {
          vars: {
            month: { $month: "$timestamp" }
          },
          in: {
            $switch: {
              branches: [
                { case: { $lte: ["$month", 3] }, then: 1 },
                { case: { $and: [{ $gte: ["$month", 4] }, { $lte: ["$month", 6] }] }, then: 2 },
                { case: { $and: [{ $gte: ["$month", 7] }, { $lte: ["$month", 9] }] }, then: 3 },
                { case: { $gte: ["$month", 10] }, then: 4 }
              ],
              default: 0
            }
          }
        }
      },
      amount: 1
    }
  },
  {
    $group: {
      _id: "$quarter",
      totalAmount: { $sum: "$amount" }
    }
  }
]);

Using the projections and case statements, transactions are grouped by financial quarters with a total sum for each quarter.

Use Case: Grouping Sales Data

Let’s look at a realistic scenario where a retail company may want to analyze its sales data over time. It could be interested in grouping sales by day or comparing month-over-month performance throughout the year:

Grouping Sales by Day

// Grouping sales per day for the current month
var currentMonth = new Date().getMonth() + 1;
db.sales.aggregate([
  {
    $match: {
      $expr: {
        $eq: [{ $month: "$date" }, currentMonth]
      }
    }
  },
  {
    $group: {
      _id: { $dayOfMonth: "$date" },
      totalSales: { $sum: "$amount" }
    }
  },
  {
    $sort: { "_id": 1 }
  }
]);

Handling Time Zones

One common issue when grouping by dates is dealing with time zones. Data may be stored in UTC, but users may need insights according to their local time zone. Below are techniques for addressing time zone differences:

Accounting for Time Zone

// Assuming an offset of -5 hours from UTC for EST
var timezoneOffset = -5 * 60;
db.transactions.aggregate([
  {
    $addFields: {
      "localTime": {
        $toDate: {
          $subtract: [
            { $toLong: "$timestamp" },
            timezoneOffset * 60 * 1000
          ]
        }
      }
    }
  },
  ...
  // Rest of the aggregation pipeline using 'localTime' instead of 'timestamp'
]);

Performance Considerations

Aggregating large datasets can be demanding on resources. Where possible, use indexes to speed up operations, and try to reduce the working set of documents with ‘$match’ stages early in the pipeline. Additionally, considering using ‘$facet’ for parallelizing parts of the computation when aggregating diverse datasets.

Conclusion

Grouping documents by date in MongoDB is a powerful technique that can be customized to fit various use cases. By understanding and utilizing the aggregation framework effectively, you can manipulate and interpret your time-series data for a wide range of analytical purposes.