MongoDB: Grouping documents by multiple fields (with examples)

Updated: February 3, 2024 By: Guest Contributor

Introduction

MongoDB is a document-oriented NoSQL database built for flexible, scalable application data storage. A common need when working with data is categorizing or grouping it. MongoDB’s aggregation framework lets you filter, transform, and summarize the documents in a collection through a pipeline of stages, and one common task within it is grouping documents by multiple fields.

In this tutorial, we will dive deep into how to group documents in MongoDB by more than one field using the $group stage of the aggregation pipeline. Along the way, we’ll cover various examples ranging from basic to advanced usage scenarios.

Setting Up the Dataset

To follow along with the examples in this tutorial, assume a collection named transactions whose documents look like this:

{
  "_id": ObjectId("..."),
  "customerID": "xyz123",
  "amount": 500,
  "transactionDate": ISODate("2023-01-01T00:00:00Z"),
  "productCode": "A101"
}

We will use this sample transactions collection to demonstrate how to group documents.
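If you want data to experiment with, the following is a minimal sketch of a sample dataset. The specific documents (customer def789, product A103, the extra dates, and the store IDs) are assumptions made up for illustration; they are chosen so that the totals line up with the example outputs shown in this tutorial. The details.storeId field is only needed for the nested-field example.

```javascript
// Hypothetical sample documents; the values are illustrative, not real data.
const sampleTransactions = [
  { "customerID": "xyz123", "amount": 500, "transactionDate": new Date("2023-01-01T00:00:00Z"), "productCode": "A101", "details": { "storeId": "S001" } },
  { "customerID": "abc456", "amount": 800, "transactionDate": new Date("2023-03-15T00:00:00Z"), "productCode": "A102", "details": { "storeId": "S002" } },
  { "customerID": "abc456", "amount": 400, "transactionDate": new Date("2023-07-20T00:00:00Z"), "productCode": "A103", "details": { "storeId": "S002" } },
  { "customerID": "def789", "amount": 1000, "transactionDate": new Date("2022-11-05T00:00:00Z"), "productCode": "A101", "details": { "storeId": "S001" } }
];

// In mongosh, load them with:
//   db.transactions.insertMany(sampleTransactions)
```

MongoDB generates the _id field automatically on insert, so it is left out here.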

Basic Grouping by a Single Field

Before we delve into grouping by multiple fields, let’s start with the basic group by a single field query. If we want to calculate the total amount of transactions per product code, our aggregation could look like this:

db.transactions.aggregate([
  {
    "$group": {
      "_id": "$productCode",
      "totalAmount": { "$sum": "$amount" }
    }
  }
]);

This query groups all documents that share the same productCode and sums their amount values. The output contains one document per product code with its total amount ($group does not guarantee any particular order of results):

[
  { "_id": "A101", "totalAmount": 1500 },
  { "_id": "A102", "totalAmount": 800 },
  // More grouped results...
]
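Because $group makes no promise about result order, it is common to follow it with a $sort stage. The sketch below sorts products by total amount, largest first; the variable name totalsByProduct is just an illustrative choice.

```javascript
// Sketch: total amount per product code, largest totals first.
const totalsByProduct = [
  {
    "$group": {
      "_id": "$productCode",
      "totalAmount": { "$sum": "$amount" }
    }
  },
  // Sort on the computed field produced by the $group stage.
  { "$sort": { "totalAmount": -1 } }
];

// In mongosh, run it with:
//   db.transactions.aggregate(totalsByProduct)
```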

Grouping Documents by Multiple Fields

The real power of the $group stage shows when you group by more than one field. Let’s say you want to find the total amount of transactions per customer per year. This requires grouping by both customerID and the year of transactionDate, which the $year operator extracts (in UTC by default). To do so, pass a document of key expressions as the _id:

db.transactions.aggregate([
  {
    "$group": {
      "_id": {
        "customerID": "$customerID",
        "year": { "$year": "$transactionDate" }
      },
      "totalAmount": { "$sum": "$amount" }
    }
  }
]);

The output would then display the total amount spent per customer per year:

[
  { "_id": { "customerID": "xyz123", "year": 2023 }, "totalAmount": 500 },
  { "_id": { "customerID": "abc456", "year": 2023 }, "totalAmount": 1200 },
  // More grouped results...
]
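The compound key lives inside _id, which can be awkward for downstream code. One possible follow-up, sketched here, is a $project stage that lifts the key fields to the top level, plus a $sort for a stable order. The variable name totalsByCustomerYear is an illustrative assumption.

```javascript
// Sketch: group by customer and year, then flatten the compound _id.
const totalsByCustomerYear = [
  {
    "$group": {
      "_id": {
        "customerID": "$customerID",
        "year": { "$year": "$transactionDate" }
      },
      "totalAmount": { "$sum": "$amount" }
    }
  },
  {
    // Promote the grouping keys out of _id and drop _id itself.
    "$project": {
      "_id": 0,
      "customerID": "$_id.customerID",
      "year": "$_id.year",
      "totalAmount": 1
    }
  },
  { "$sort": { "customerID": 1, "year": 1 } }
];

// In mongosh, run it with:
//   db.transactions.aggregate(totalsByCustomerYear)
```

Each result would then have the shape { "customerID": ..., "year": ..., "totalAmount": ... } rather than a nested _id.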

Advanced Grouping with Accumulators

MongoDB’s aggregation framework provides various accumulator operators for use in $group. They allow more complex calculations, such as averaging values or pushing elements into an array. For instance, you can group by customer ID and year and, with $addToSet, also collect the distinct product codes that the customer purchased that year:

db.transactions.aggregate([
  {
    "$group": {
      "_id": {
        "customerID": "$customerID",
        "year": { "$year": "$transactionDate" }
      },
      "totalAmount": { "$sum": "$amount" },
      "productsPurchased": { "$addToSet": "$productCode" }
    }
  }
]);

The output might look like this (the order of elements within an $addToSet array is unspecified):

[
  { 
    "_id": { "customerID": "xyz123", "year": 2023 }, 
    "totalAmount": 500,
    "productsPurchased": ["A101"] 
  },
  { 
    "_id": { "customerID": "abc456", "year": 2023 }, 
    "totalAmount": 1200,
    "productsPurchased": ["A102", "A103"] 
  },
  // More grouped results...
]
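Other accumulators can be combined in the same $group stage. The following sketch adds a transaction count, an average, and a maximum per customer and year; the output field names (transactionCount, averageAmount, largestAmount) and the variable name are illustrative assumptions.

```javascript
// Sketch: several accumulators computed over the same compound key.
const statsByCustomerYear = [
  {
    "$group": {
      "_id": {
        "customerID": "$customerID",
        "year": { "$year": "$transactionDate" }
      },
      // Adding 1 per document yields the number of transactions in the group.
      "transactionCount": { "$sum": 1 },
      "averageAmount": { "$avg": "$amount" },
      "largestAmount": { "$max": "$amount" }
    }
  }
];

// In mongosh, run it with:
//   db.transactions.aggregate(statsByCustomerYear)
```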

Complex Grouping by Nested Fields

The same approach extends to fields nested inside subdocuments. Consider a scenario where our transactions documents also have a nested field named details that contains a storeId field. To group by customerID, year, and storeId, reference the nested field with dot notation ("$details.storeId"):

db.transactions.aggregate([
  {
    "$group": {
      "_id": {
        "customerID": "$customerID",
        "year": { "$year": "$transactionDate" },
        "storeId": "$details.storeId"
      },
      "totalAmount": { "$sum": "$amount" }
    }
  }
]);

The output might look like this:

[
  { "_id": { "customerID": "xyz123", "year": 2023, "storeId": "S001" }, "totalAmount": 500 },
  { "_id": { "customerID": "abc456", "year": 2023, "storeId": "S002" }, "totalAmount": 1200 },
  // Further grouped results...
]
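Not every document necessarily has a details subdocument. One way to handle that, sketched below, is to wrap the nested reference in $ifNull so that documents with a missing or null details.storeId fall into an explicit placeholder group. The placeholder value "UNKNOWN" and the variable name are assumptions for illustration.

```javascript
// Sketch: group by a nested field, with a fallback for missing values.
const totalsByCustomerYearStore = [
  {
    "$group": {
      "_id": {
        "customerID": "$customerID",
        "year": { "$year": "$transactionDate" },
        // Use "UNKNOWN" (a hypothetical label) when details.storeId is missing or null.
        "storeId": { "$ifNull": ["$details.storeId", "UNKNOWN"] }
      },
      "totalAmount": { "$sum": "$amount" }
    }
  }
];

// In mongosh, run it with:
//   db.transactions.aggregate(totalsByCustomerYearStore)
```

If you would rather exclude such documents altogether, a $match stage placed before $group is an alternative.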

Conclusion

MongoDB’s aggregation framework, and the $group stage in particular, makes it straightforward to group documents by multiple fields. We’ve walked through examples that group by a single field, by multiple fields, and by nested fields, and used accumulator operators to compute richer summaries. These techniques can surface useful insights from your data and support informed decision-making.