Using variables in MongoDB aggregation pipeline (with examples)

Introduction
Getting Started with Aggregation Pipeline
1. Basic Syntax of Aggregation Pipeline
Using Variables with $let
Advanced Variable Use Cases
Conditional Logic using Variables
Looping with $map and Variables
Application of Variables in Complex Pipelines
Conclusion

Introduction

Understanding the use of variables within a MongoDB aggregation pipeline is crucial for performing complex data operations and transformations. This tutorial will walk you through the core concepts and provide practical examples to help you leverage variables effectively in your data processing workflows.

Getting Started with Aggregation Pipeline

The aggregation pipeline is a powerful framework in MongoDB that enables you to perform data aggregation tasks. It’s analogous to a manufacturing assembly line, where your documents (data) pass through a series of stages, each applying an operation that transforms, filters, or aggregates the data.

Basic Syntax of Aggregation Pipeline

db.collection.aggregate([
    { <stage1> },
    { <stage2> },
    ...
])

In the aggregation pipeline, operators are key elements that define the stages. The documents are passed from one stage to the next unless filtered out. Without further ado, let’s dive into utilizing variables in these pipelines.

Using Variables with $let

The $let operator allows you to define variables for use within a specific expression or stage. This is akin to using variables in traditional programming to temporarily store data. Here’s a basic example of the $let operator in use:

db.collection.aggregate([
    {
        $project: {
            total: {
                $let: {
                    vars: {
                        quantity: '$quantity',
                        unitPrice: '$price'
                    },
                    in: { $multiply: ['$quantity', '$unitPrice'] }
                }
            }
        }
    }
])

This pipeline stage calculates the total price by multiplying quantity with unit price for each document, storing these in variables for easier reference.

Advanced Variable Use Cases

Moving on to more advanced use cases, let’s look at the utilization of variables across multiple stages:

db.collection.aggregate([
    {
        $project: {
            _id: 0,
            title: 1,
            totalQuantity: {
                $sum: '$items.quantity'
            }
        }
    },
    {
        $match: {
            totalQuantity: { $gt: 100 }
        }
    },
    {
        $addFields: {
            totalValue: {
                $let: {
                    vars: { totalQuantity: '$totalQuantity' },
                    in: { $multiply: ['$totalQuantity', 10] } // Assuming a fixed price of 10
                }
            }
        }
    }
])

This pipeline first calculates and thresholds by the totalQuantity of items and then calculates a total value in a separate stage, demonstrating how variables can span multiple stages in more complex pipelines.

Conditional Logic using Variables

Conditional logic can be implemented using variables that represent the conditional outcomes:

db.collection.aggregate([
    {
        $project: {
            item: 1,
            discountPrice: {
                $let: {
                    vars: {
                        price: '$price',
                        discountFactor: { $cond: { if: { $gte: ['$quantity', 100] }, then: 0.9, else: 1 } }
                    },
                    in: { $multiply: ['$price', '$discountFactor'] }
                }
            }
        }
    }
])

The above pipeline applies a discount only if the quantity is equal to or exceeds 100, using the variable discountFactor to determine the final price.

Looping with $map and Variables

Variables are also useful within array manipulations. The $map operator can be paired with the $let for processing elements of an array:

db.collection.aggregate([
    { 
        $addFields: {
            calculatedItems: {
                $map: {
                    input: '$items',
                    as: 'item',
                    in: {
                        $let: {
                            vars: {
                                total: { $multiply: ['$item.price', '$item.quantity'] }
                            },
                            in: {
                                name: '$item.name',
                                total: '$total'
                            }
                        }
                    }
                }
            }
        }
    }
])

In this example, we calculate a total for each item using variables before outputting a new array with transformed objects that include the name and calculated total.

Application of Variables in Complex Pipelines

In real-world scenarios, pipelines can get complex. Variables are a particularly powerful tool in these situations. Here’s a showcase of using variables where multiple operations are dependent on a computed value:

// Complex pipeline example
db.collection.aggregate([
    // other stages ...
    {
        $addFields: {
            totalSales: {
                $let: {
                    vars: {
                        prices: '$items.price',
                        quantities: '$items.quantity'
                    },
                    in: { $sum: {
                            $map: {
                                input: { $zip: { inputs: ['$prices', '$quantities'] } },
                                as: 'pair',
                                in: { $multiply: [{ $arrayElemAt: ['$pair', 0] }, { $arrayElemAt: ['$pair', 1] }] }
                            }
                        }
                    }
                }
            }
        }
    }
    // other stages ...
])

Here, we first delineate the prices and quantities arrays into variables, zip them together, then compute the total sales for each paired price and quantity, ultimately summing them up.

Conclusion

As we’ve seen throughout various examples, using variables in the MongoDB aggregation pipeline adds a layer of dynamism and readability to your queries. By understanding and applying these concepts, developers can write more maintainable and versatile data processing pipelines.

Next Article: Define and use variables in MongoDB Shell (mongosh)

Previous Article: MongoDB: Select/count distinct values in multiple fields

Series: MongoDB Tutorials

MongoDB