MongoDB: Using $pipeline in $lookup (for Complex Joins)

Updated: February 3, 2024 By: Guest Contributor Post a comment

Introduction

In the world of NoSQL databases, MongoDB offers an impressive set of features that enable developers to work with data in a flexible and expressive way. One such feature is the $lookup operator, which provides a mechanism for performing join operations analogous to those in relational databases. With MongoDB 3.6 and higher, the power of $lookup has been significantly expanded by the addition of sub-pipelines within $lookup stages, which allows for the implementation of complex join conditions and transformations. In this tutorial, we will delve into how to utilize the $pipeline field of the $lookup operator to execute dynamic lookups and perform complex joins in MongoDB.

Understanding the Basics of $lookup

Before jumping into the complexities, let’s understand the basic form of $lookup. In its simplest format, $lookup can perform an equality match between a field from the documents in the collection on which you’re aggregating (the local field) and a field from the documents in the collection you’re joining with (the foreign field). The result is a new array of matched document(s) from the foreign collection inside each of the original documents.

db.orders.aggregate([
    {
        $lookup: {
            from: "items",
            localField: "itemId",
            foreignField: "_id",
            as: "itemDetails"
        }
    }
]);

This query would match an ‘itemId’ in the ‘orders’ collection with the ‘_id’ in the ‘items’ collection, embedding the matching ‘items’ documents inside the ‘order’ documents under the field ‘itemDetails’. It’s the MongoDB equivalent of a SQL LEFT OUTER JOIN.

Creating More Complex Joins Using $pipeline

Now, let’s enter the $pipeline option within $lookup. This feature allows you to go beyond simple equality matches and specify an aggregation pipeline that executes on the foreign collection. This pipeline defines more complex filtering, projection, and joining conditions, making $lookup a much more versatile operator.

db.orders.aggregate([
    {
        $lookup: {
            from: "items",
            let: { orderItemId: "$itemId", orderQty: "$quantity" },
            pipeline: [
                { $match: {
                    $expr: {
                        $and: [
                            { $eq: ["$_id",  "$orderItemId"] },
                            { $gte: ["$stockQty", "$orderQty"] }
                        ]
                    }
                }},
                { $project: { itemName: 1, price: 1 } }
            ],
            as: "itemDetails"
        }
    }
]);

This aggregation performs a more complex join that not only matches ‘itemId’ but also ensures that the quantity of the item in stock (‘stockQty’) is enough to satisfy the order quantity amount. The results include only the ‘itemName’ and ‘price’ fields for matching items. The $let operator allows us to declare variables that represent fields in the input documents, which we can then use in our pipeline stages that execute against the foreign collection.

Hands-on Examples of Using $pipeline in $lookup

Example 1: Filtering Joined Documents

db.orders.aggregate([
    {
        $lookup: {
            from: "customers",
            let: { orderId: "$_id", customerAge: 25 },
            pipeline: [
                { $match: {
                    $expr: {
                        $and: [
                            { $eq: ["$orderId", "$orders_id"] },
                            { $gt: ["$age", "$customerAge"] }
                        ]
                    }
                }},
                { $project: { _id: 0, name: 1, age: 1 } }
            ],
            as: "customerDetails"
        }
    }
]);

In this first example, we are performing a lookup from the orders collection to the customers collection to match orders with customers older than 25. We are projecting only the relevant fields of name and age into the resulting customerDetails array, excluding the _id field.

Example 2: Chains of Lookups

db.orders.aggregate([
    {
        $lookup: {
            from: "items",
            let: { itemId: "$itemId" },
            pipeline: [
                { $match: { $expr: { $eq: ["$_id",  "$itemId"] } } },
                { $lookup: {
                    from: "suppliers",
                    let: { supplierId: "$supplierId" },
                    pipeline: [
                        { $match: { $expr: { $eq: ["$_id", "$supplierId"] } } },
                        { $project: { _id: 0, name: 1, rating: 1 } }
                    ],
                    as: "supplierDetails"
                }},
                { $project: { _id: 0, itemName: 1, price: 1, supplierDetails: 1 } }
            ],
            as: "itemDetails"
        }
    }
]);

In this sophisticated example, we incorporated a second lookup within the pipeline of the first. This allows us to not only join the items and orders but to further join the supplier details, effectively creating a chain of lookups. This is powerful for modeling relational data structures in a non-relational database like MongoDB.

Example 3: Grouping and Summarizing

db.orders.aggregate([
    {
        $lookup: {
            from: "items",
            let: { orderItemId: "$itemId" },
            pipeline: [
                { $match: { $expr: { $eq: ["$_id", "$orderItemId"] } } },
                { $group: {
                    _id: "$categoryId",
                    totalSales: { $sum: { $multiply: ["$price", "$quantityOrdered"] } }
                }}
            ],
                    as: "itemSummary"
        }
    }
]);

Here we see an example of aggregation within the lookup by categorizing items and calculating total sales for each category. The $group stage is applied after the lookup match, which allows us to summarize data at a level that is more granular than the order level.

Final Notes on Performance and Use Cases

The flexibility offered by the $lookup operator with $pipeline support significantly boosts the ability to answer complex queries in MongoDB. However, use this feature judiciously as each lookup incurs a performance cost, especially as the complexity of the pipelines and the size of the data increase. Ensure that you create appropriate indexes to support your queries.

Conclusion

This tutorial has provided a deep dive into MongoDB’s $lookup operator with a particular focus on the $pipeline sub-feature. Through step-by-step examples, we examined how to craft basic to advanced joins and aggregate data efficiently. Effective use of $lookup enhances the overall flexibility of MongoDB, illustrating its capabilities as a robust option for handling complex queries and relational data patterns.