MongoDB: How to perform JOIN operations using $lookup

Updated: February 3, 2024 By: Guest Contributor Post a comment

Introduction

MongoDB, a renowned NoSQL database allowing high-performance, agile development practices through its non-relational datastore. Unlike SQL databases, MongoDB uses collections and documents instead of tables and rows. Despite its non-relational nature, MongoDB offers the $lookup aggregation stage, which bears a resemblance to an SQL JOIN operation, and enables you to combine documents from different collections. In this article, we will delve into how to perform JOIN operations in MongoDB using the $lookup operator, covering basic to advanced examples.

Getting Started with $lookup

In MongoDB, $lookup is an aggregation pipeline stage that allows you to specify which collection you want to join with the current collection. This is one way that MongoDB offers to reconcile the non-relational nature of its database with the relational concept of JOIN operations. The $lookup operator performs an equality match between a field from the documents of the collection being aggregated, often referred to as the “local” field, and a field from the documents of the collection being joined, or the “foreign” field.

Basic Syntax

{ $lookup: {
    from: "foreignCollection",
    localField: "localFieldName",
    foreignField: "foreignFieldName",
    as: "outputArrayName"
}}

Here’s a simple example:

// Assuming two collections: orders and customers
// orders document example: { _id: ObjectId("1"), customerId: "C1", total: 100 }
// customers document example: { _id: "C1", name: "John Doe" }
db.orders.aggregate([
    {
        $lookup: {
            from:"customers",
            localField:"customerId",
            foreignField:"_id",
            as: "customerDetails"
        }
    }
]);

The example above simulates a JOIN between the orders and customers collections, where the customerId field from an order is matched against the _id of a customer. The resulting documents would contain the original order data, along with a new array field named customerDetails that includes the matching customer documents. The output might look like this:

{
    _id: ObjectId("1"), 
    customerId: "C1",
    total: 100,
    customerDetails: [
        { _id: "C1", name: "John Doe" }
    ]
}

Multiplying Results

One thing to note is that similarly to SQL JOINs, using $lookup could multiply the number of result documents when multiple matches are found in the foreign collection. For each matching document in the foreign collection, a new document will be added to the output array. If there are no matches, the output array will be empty.

An example could be if a customer has multiple phone numbers stored in a separate collection. The following aggregation would result in a document that features a customer with an array of all of their phone numbers:

// Assuming a separate collection: phone_numbers
// phone_numbers document example: { _id: ObjectId("2"), customerId: "C1", number: "+123456789" }
db.customers.aggregate([
    {
        $lookup: {
            from: "phone_numbers",
            localField: "_id",
            foreignField: "customerId",
            as: "phoneNumbers"
        }
    }
]);

Here, we would have each customer document followed by a phoneNumbers array that includes all of their phone numbers.

Unwinding the Results

After performing $lookup, you might end up with an array of joined documents for each input document. This can be unwound using the $unwind stage, which is often helpful for flattening this array structure:

// Continuing from the previous examples, to flatten the customerDetails array:
db.orders.aggregate([
    {
        $lookup: {
            from: "customers",
            localField: "customerId",
            foreignField: "_id",
            as: "customerDetails"
        }
    },
    {
        $unwind: "$customerDetails"
    }
]);

The $unwind stage will generate a new document for each element in the specified array, copying all the original fields from the input document to each unwound document.

Join Conditions and Sub-pipelines

MongoDB’s $lookup capability has evolved to support more complex join conditions and the use of sub-pipelines, beyond a simple equality match. The syntax for using a $lookup with a sub-pipeline is as such:

{
    $lookup: {
        from: "foreignCollection",
        let: { localVariable: "$localField" },
        pipeline: [{
            $match: {
                $expr: { $eq: ["$foreignField", "$localVariable"] }
            }
        },
        // Other aggregation stages can go here as needed
        ],
        as: "outputArrayName"
    }
}

This form allows you to specify a pipeline that the foreign collection’s documents must pass in order to be included in the output array. The example starts by defining localVariable that uses the let option to capture a local field’s value, which is then used in the sub-pipeline’s match expression.

Handling Null and Missing Fields

When a local or foreign field is null or does not exist, MongoDB will not output a joined result. Use $lookup in combination with other aggregation stages, such as $match or $ifNull, to control how these scenarios are handled.

Consider an example where you want to include orders even if the customer is missing:

// Utilize $ifNull to provide a default value for potentially null fields
db.orders.aggregate([
    {
        $lookup: {
            from: "customers",
            localField: "customerId",
            foreignField: "_id",
            as: "customerDetails"
        }
    },
    {
        $project: {
            _id: 1,
            customerId: 1,
            total: 1,
            customerDetails: { $ifNull: ["$customerDetails", []] }
        }
    }
]);

In this aggregation, the $project stage ensures that customerDetails is an empty array when no customer information is available, ensuring the data’s consistency.

Advantages and Limitations

Using $lookup can greatly enhance the power of MongoDB queries by allowing relational operations, but it also comes with certain limitations. There might be performance considerations when using $lookup on large datasets, as MongoDB has to execute additional processing to join the data. It’s also important to remember the differences in a non-relational database context, such as the potential for duplicated data and the lack of enforcement for referential integrity that is common in SQL databases.

Conclusion

In this article, we explored the $lookup stage in MongoDB’s aggregation framework, starting with simple equality-based joins and moving on to more complex examples involving sub-pipelines and handling null values. As seen through various examples, $lookup provides versatility and power to MongoDB queries reminiscent of JOIN operations in relational databases.