Using $group aggregation stage in MongoDB (with examples)

Updated: February 3, 2024 By: Guest Contributor

Introduction

MongoDB is a NoSQL document database that offers flexibility and scalability for managing large amounts of semi-structured data. One of its most powerful features is the aggregation framework, which lets you process and analyze data by defining a pipeline of stages. At the heart of this framework lies the $group stage, which groups input documents by a key expression and applies accumulators, such as sums and averages, to each group.

This tutorial dives into the $group aggregation stage in MongoDB. We will begin with basic concepts and gradually move to more complex operations, with a code example for each.

Understanding the $group Stage

The $group stage is akin to the GROUP BY clause in SQL, used for grouping documents according to one or more fields. The grouped documents can then be processed using accumulator operators like $sum, $avg, $max, and more. The syntax of the $group stage is:

{
    $group: {
        _id: <expression>, // Field to group by
        <field1>: { <accumulator1> : <expression1> },
        ...
        <fieldN>: { <accumulatorN> : <expressionN> }
    }
}

Here, _id is mandatory and specifies the group key; setting it to null (or any other constant) places all input documents in a single group. Each field after _id applies an accumulator to the set of documents that share the same group key.

Basic Usage of $group

Let’s start with a basic example of using the $group stage to group documents by a single field and count the number of documents in each group.

// Group by category and count products in each category
db.products.aggregate([
    {
        $group : {
            _id : '$category', // Group by 'category' field
            count: { $sum: 1 } // Count documents in each category
        }
    }
]);

In the example above, the documents in the products collection are grouped by the category field. The accumulator $sum is used to add 1 for each document encountered, effectively counting the number of documents in each category.
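To make the result shape concrete, here is a minimal sketch in plain JavaScript that emulates the counting pipeline in memory. It is not how MongoDB implements $group; the sample documents and the countByCategory helper are made up for illustration.

```javascript
// Hypothetical sample documents; the values are made up for illustration.
const products = [
    { name: 'Laptop', category: 'electronics', price: 1200 },
    { name: 'Phone', category: 'electronics', price: 800 },
    { name: 'Novel', category: 'books', price: 15 },
];

// Emulate { $group: { _id: '$category', count: { $sum: 1 } } } in memory.
function countByCategory(docs) {
    const counts = new Map();
    for (const doc of docs) {
        // Documents without the field end up in the null group, as in MongoDB.
        const key = doc.category ?? null;
        counts.set(key, (counts.get(key) ?? 0) + 1);
    }
    // One output document per distinct key, with the key stored in _id.
    return [...counts].map(([_id, count]) => ({ _id, count }));
}

const categoryCounts = countByCategory(products);
console.log(categoryCounts);
```

Each output document carries the group key in _id and the accumulated value in count. MongoDB does not guarantee the order of the groups, so add a $sort stage after $group if the order matters.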

Grouping by Multiple Fields

You can also group documents by multiple fields by using compound keys in the _id field as shown in the example below:

// Group by category and tags
db.products.aggregate([
    {
        $group : {
            _id : { category: '$category', tags: '$tags' },
            count: { $sum: 1 }
        }
    }
]);

In this query, documents are grouped by both category and tags, producing a count for every unique combination of the two values.
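One detail worth checking is what happens when tags holds an array, which is an assumption about the data, since the schema is not shown. In that case the whole array is the key value, so two products fall into the same group only if their tag arrays match exactly. The plain JavaScript sketch below emulates this with made-up documents; countByCategoryAndTags is a hypothetical helper, and JSON text stands in for MongoDB's by-value comparison of the compound key.

```javascript
// Hypothetical sample documents; 'tags' is assumed to be an array field.
const products = [
    { name: 'Laptop', category: 'electronics', tags: ['work', 'portable'] },
    { name: 'Tablet', category: 'electronics', tags: ['work', 'portable'] },
    { name: 'Phone', category: 'electronics', tags: ['portable'] },
];

// Emulate { $group: { _id: { category: '$category', tags: '$tags' }, count: { $sum: 1 } } }.
function countByCategoryAndTags(docs) {
    const groups = new Map();
    for (const doc of docs) {
        const id = { category: doc.category ?? null, tags: doc.tags ?? null };
        // Serialize the compound key so that equal values map to the same group.
        const key = JSON.stringify(id);
        const group = groups.get(key) ?? { _id: id, count: 0 };
        group.count += 1;
        groups.set(key, group);
    }
    return [...groups.values()];
}

const tagGroups = countByCategoryAndTags(products);
console.log(JSON.stringify(tagGroups, null, 2));
```

The phone is counted separately from the laptop and tablet even though all three share the 'portable' tag. To count products per individual tag, place an $unwind stage on tags before the $group stage.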

Using Accumulators in $group

Accumulators are operations that process data (e.g., sums, averages) for documents within a group. Here’s an example where we calculate the average price of products in each category:

// Calculate average price for each category
db.products.aggregate([
    {
        $group: {
            _id: '$category',
            averagePrice: { $avg: '$price'}
        }
    }
]);

The $avg accumulator calculates the average of all the price fields within each group identified by category.
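$avg ignores non-numeric and missing values, and it returns null for a group that has no numeric values at all. The following plain JavaScript sketch emulates that behavior in memory; the sample documents and the averagePriceByCategory helper are hypothetical, and this is not MongoDB's own implementation.

```javascript
// Hypothetical sample documents, including values that $avg would skip.
const products = [
    { category: 'books', price: 10 },
    { category: 'books', price: 20 },
    { category: 'books', price: 'N/A' }, // non-numeric: skipped
    { category: 'toys' },                // missing price: skipped
];

// Emulate { $group: { _id: '$category', averagePrice: { $avg: '$price' } } }.
function averagePriceByCategory(docs) {
    const totals = new Map();
    for (const doc of docs) {
        const key = doc.category ?? null;
        const entry = totals.get(key) ?? { sum: 0, n: 0 };
        // Only numeric values contribute to the average.
        if (typeof doc.price === 'number' && !Number.isNaN(doc.price)) {
            entry.sum += doc.price;
            entry.n += 1;
        }
        totals.set(key, entry);
    }
    return [...totals].map(([_id, { sum, n }]) => ({
        _id,
        // A group with no numeric values averages to null.
        averagePrice: n > 0 ? sum / n : null,
    }));
}

const averages = averagePriceByCategory(products);
console.log(averages);
```

With this data, the books group averages 15 (only the two numeric prices count), while the toys group yields null because it has no numeric price.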

Combining Multiple Accumulators

You can also combine multiple accumulators within the same $group stage to produce multiple calculations. For example, you can calculate the average, maximum, and minimum price for each category:

// Combine multiple accumulators
db.products.aggregate([
    {
        $group: {
            _id: '$category',
            averagePrice: { $avg: '$price'},
            maxPrice: { $max: '$price'},
            minPrice: { $min: '$price'}
        }
    }
]);

This outputs one document per category, with the category value in _id and the average, maximum, and minimum prices in averagePrice, maxPrice, and minPrice.

Advanced Grouping with Arbitrary Expressions

Besides grouping by direct field references, $group can use expressions that transform the input values. For instance, you could create groups based on the length of a string field:

// Group by string length of the 'name' field
db.products.aggregate([
    {
        $group: {
            _id: { length: { $strLenCP: '$name' } }, // Computed key: name length in code points
            count: { $sum: 1 }
        }
    }
]);

This aggregation groups the documents based on the computed length of the names of the products and counts how many products have names of the same length.
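$strLenCP measures length in Unicode code points, which can differ from JavaScript's String.length (UTF-16 code units) for characters such as emoji. The plain JavaScript sketch below emulates the grouping in memory to show the difference; the product names and both helper functions are made up for illustration.

```javascript
// Count Unicode code points, which is the unit $strLenCP measures.
function codePointLength(str) {
    // Spreading a string iterates by code point, not by UTF-16 code unit.
    return [...str].length;
}

// Emulate { $group: { _id: { length: { $strLenCP: '$name' } }, count: { $sum: 1 } } }.
function countByNameLength(docs) {
    const counts = new Map();
    for (const doc of docs) {
        const length = codePointLength(doc.name);
        counts.set(length, (counts.get(length) ?? 0) + 1);
    }
    return [...counts].map(([length, count]) => ({ _id: { length }, count }));
}

// Hypothetical sample documents.
const products = [
    { name: 'Mug' },
    { name: 'Cup' },
    { name: 'Caf\u00e9' },     // 4 code points
    { name: 'Tea \u{1F375}' }, // 5 code points, but 6 UTF-16 code units
];

const lengthGroups = countByNameLength(products);
console.log(lengthGroups);
```

Here 'Mug' and 'Cup' share the group with length 3, while the name ending in an emoji lands in the group with length 5 rather than 6.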

Chaining $group Stages

In some cases, you might need to perform multiple grouping stages consecutively. This can be useful for performing sub-aggregations within groups. Here’s an example:

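// Chain two $group stages
db.products.aggregate([
    {
        $group: {
            _id: '$category',
            totalSales: { $sum: '$quantitySold' } // Units sold per category
        }
    },
    {
        $group: {
            _id: null, // A constant key puts every document in one group
            grandTotalSales: { $sum: '$totalSales' } // Units sold across all categories
        }
    }
]);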

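The first $group stage computes the total quantity sold for each category. The second $group stage sets _id to null, which places every document produced by the first stage into a single group, so its $sum adds the per-category totals into one overall figure. Note that _id cannot be omitted; to aggregate everything into one group, set it to null or another constant.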
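The data flow between the two stages can be sketched in plain JavaScript, where the output of the first function becomes the input of the second. This is an in-memory emulation under assumed sample data, not MongoDB's implementation; the helper names and the grandTotalSales output field are chosen for illustration.

```javascript
// Hypothetical sample documents.
const products = [
    { category: 'books', quantitySold: 5 },
    { category: 'books', quantitySold: 7 },
    { category: 'toys', quantitySold: 3 },
];

// Stage 1: emulate { $group: { _id: '$category', totalSales: { $sum: '$quantitySold' } } }.
function sumByCategory(docs) {
    const totals = new Map();
    for (const doc of docs) {
        const key = doc.category ?? null;
        totals.set(key, (totals.get(key) ?? 0) + doc.quantitySold);
    }
    return [...totals].map(([_id, totalSales]) => ({ _id, totalSales }));
}

// Stage 2: a null _id puts every incoming document into a single group.
function sumAll(stageOneOutput) {
    // No input documents means no groups, hence no output documents.
    if (stageOneOutput.length === 0) return [];
    const total = stageOneOutput.reduce((acc, doc) => acc + doc.totalSales, 0);
    return [{ _id: null, grandTotalSales: total }];
}

const perCategory = sumByCategory(products);
const overall = sumAll(perCategory);
console.log(perCategory, overall);
```

The second stage only sees the documents emitted by the first one, so it reads totalSales rather than the original quantitySold field.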

Conclusion

MongoDB’s $group aggregation stage is a versatile tool for summarizing the data in your collections. Whether you’re counting documents, calculating averages, or chaining several grouping stages, $group provides a convenient and powerful way to accomplish a wide variety of tasks. The examples here cover only a fraction of what’s possible, but they give you a foundation to build on for more complex data analysis and manipulation.