MongoDB: Using $setUnion to Combine Multiple Arrays into One

Updated: February 3, 2024 By: Guest Contributor

Introduction

MongoDB provides a rich set of operators for manipulating and querying documents within collections. One particularly useful operator for arrays is $setUnion, an aggregation expression that merges two or more arrays, removes duplicate elements, and returns a single array of the distinct elements. It is helpful when you work with complex data structures and need to combine array data from different fields or sources.

This tutorial provides an in-depth understanding of the $setUnion operator, including its syntax, usage, and practical applications with code examples ranging from basic to advanced scenarios.

Understanding $setUnion

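The $setUnion operator takes two or more arrays and returns an array containing the elements that appear in any of them. The returned array has no duplicate entries, even if a value appears multiple times in the input arrays, and the order of its elements is not specified. Nested arrays are treated as single elements rather than flattened, and if any argument is null or refers to a missing field, the result is null.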

Here is the basic syntax of $setUnion:

{
    $setUnion: [ <array1>, <array2>, ... ]
}
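To make the behavior concrete, the following is a plain JavaScript model of what $setUnion computes. It is only an illustration and not MongoDB code: setUnionModel is a hypothetical helper, values are compared by their JSON form as a simplification, and the first-seen ordering is a convenience of the sketch, since MongoDB does not specify the order of the result.

```javascript
// Illustrative model of $setUnion semantics (hypothetical helper, not a MongoDB API).
function setUnionModel(...arrays) {
  // A null or missing argument makes the whole result null.
  if (arrays.some((a) => a === null || a === undefined)) return null;

  const seen = new Map();
  for (const arr of arrays) {
    for (const item of arr) {
      // Nested arrays are compared as whole values; they are not flattened.
      const key = JSON.stringify(item);
      if (!seen.has(key)) seen.set(key, item);
    }
  }
  return [...seen.values()];
}

console.log(setUnionModel(["apple", "banana"], ["banana", ["cherry", "date"]]));
```

In this sketch the duplicate "banana" is dropped, while the nested array survives as a single element.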

Now, let’s start with some basic examples and gradually move towards more complex scenarios.

Basic Usage of $setUnion

Example 1: Merging two arrays with distinct values.

db.collection.aggregate([
   {
      $project: {
         combinedArray: {
            $setUnion: [ ["apple", "banana"], ["cherry", "date"] ]
         }
      }
   }
]);

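Output (one such document for each document in the collection, with _id omitted here; element order is not guaranteed):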

{
   "combinedArray": ["apple", "banana", "cherry", "date"]
}

Example 2: Merging two arrays with overlapping values.

db.collection.aggregate([
   {
      $project: {
         combinedArray: {
            $setUnion: [ ["apple", "banana"], ["banana", "cherry"] ]
         }
      }
   }
]);

Output (element order is not guaranteed):

{
   "combinedArray": ["apple", "banana", "cherry"]
}
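Because $setUnion does not guarantee element order, a result like the one above may come back in a different sequence. If a stable order matters, one option is to wrap the union in $sortArray, which is available in MongoDB 5.2 and later. The sketch below assumes that version; the variable name sortedUnionPipeline is only illustrative.

```javascript
// Sketch: sort the merged array so the output order is predictable.
// Assumes MongoDB 5.2+ for $sortArray.
const sortedUnionPipeline = [
  {
    $project: {
      _id: 0,
      combinedArray: {
        $sortArray: {
          input: { $setUnion: [["apple", "banana"], ["banana", "cherry"]] },
          sortBy: 1 // ascending order for an array of scalar values
        }
      }
    }
  }
];

// In mongosh:
// db.collection.aggregate(sortedUnionPipeline);
```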

Combining Arrays from Documents

In real-world scenarios, you often need to combine arrays that are stored in the documents of a collection. Let’s take a look at an example where we merge arrays from documents.

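Example 3: Given a collection of book documents, each containing an array of authors, merge the author arrays into a unique set of all authors appearing in the collection. With a list of arrays as its argument, $setUnion is an expression that works within a single document, so it cannot merge arrays across documents inside $group on its own. One approach is to collect every authors array with $push and then fold the arrays together with $reduce and $setUnion.

db.books.aggregate([
   // Collect every document's authors array into one array of arrays.
   { $group: { _id: null, authorLists: { $push: "$authors" } } },
   {
      $project: {
         _id: 0,
         // Fold the arrays together, dropping duplicates at each step.
         allAuthors: {
            $reduce: {
               input: "$authorLists",
               initialValue: [],
               in: { $setUnion: [ "$$value", "$$this" ] }
            }
         }
      }
   }
]);

Output: A single document whose allAuthors array holds every unique author in the books collection.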
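For comparison, a common alternative for gathering distinct values across documents does not use $setUnion at all: it unwinds each authors array and collects the values with the $addToSet accumulator. The sketch below assumes the same books collection; the variable name allAuthorsPipeline is only illustrative.

```javascript
// Alternative sketch: $unwind plus $addToSet instead of $setUnion.
const allAuthorsPipeline = [
  // Emit one document per author entry.
  { $unwind: "$authors" },
  // Collect each distinct author once across the whole collection.
  { $group: { _id: null, allAuthors: { $addToSet: "$authors" } } },
  // Drop the placeholder _id from the result.
  { $project: { _id: 0, allAuthors: 1 } }
];

// In mongosh, assuming the books collection from Example 3:
// db.books.aggregate(allAuthorsPipeline);
```

As with $setUnion, the order of the resulting array is not guaranteed.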

Advanced Usage of $setUnion

As the complexity of your data increases, $setUnion can be combined with other aggregation framework stages and operators to perform more sophisticated operations.

Example 4: For each post, deduplicate its tags array and exclude any tags that appear in a specified exclude list.

db.posts.aggregate([
   {
      $project: {
         relevantTags: {
            $setDifference: [
               {
                  $setUnion: [ "$tags", [] ]
               },
               ["exclude_tag1", "exclude_tag2"]
            ]
         }
      }
   }
]);

Output: For each post, an array of its unique tags with the excluded tags removed.
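According to the MongoDB documentation, $setUnion returns null when an argument is null or refers to a missing field, so a post without a tags array would produce null rather than an empty list. A minimal sketch of one way to guard against that, assuming an empty array is an acceptable default, wraps the field in $ifNull. The variable name safeTagsPipeline is only illustrative.

```javascript
// Sketch: default a missing or null tags field to an empty array.
const safeTagsPipeline = [
  {
    $project: {
      relevantTags: {
        $setDifference: [
          // $ifNull substitutes [] when "$tags" is null or missing.
          { $setUnion: [{ $ifNull: ["$tags", []] }, []] },
          ["exclude_tag1", "exclude_tag2"]
        ]
      }
    }
  }
];

// In mongosh, assuming the posts collection from Example 4:
// db.posts.aggregate(safeTagsPipeline);
```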

Using $setUnion with Multiple Collections

In some cases, you might be working across multiple collections and need to combine their respective arrays into a unified set. Here is how you can achieve this in MongoDB using aggregation.

Example 5: Combine unique user roles from two different collections, employees and managers.

// First, aggregate roles from the employees collection.
const employeeRoles = db.employees.aggregate([
   {
      $group: {
         _id: null,
         roles: {
            $addToSet: "$role"
         }
      }
   }
]).toArray();

// Aggregate roles from the managers collection.
const managerRoles = db.managers.aggregate([
   {
      $group: {
         _id: null,
         roles: {
            $addToSet: "$role"
         }
      }
   }
]).toArray();

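// Combine the roles using $setUnion. The two arrays are inlined as literals,
// so $limit keeps a single input document (db.collection must not be empty,
// and each earlier aggregation must have returned a result).
db.collection.aggregate([
   { $limit: 1 },
   {
      $project: {
         _id: 0,
         allRoles: {
            $setUnion: [ employeeRoles[0].roles, managerRoles[0].roles ]
         }
      }
   }
]);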

Output: A unified array of distinct employee and manager roles across both collections.
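The same result can often be obtained in a single round trip. The sketch below is an alternative that does not use $setUnion: it assumes MongoDB 4.4 or later for the $unionWith stage and collects distinct roles with $addToSet. The variable name unionRolesPipeline is only illustrative.

```javascript
// Alternative sketch: merge both collections server-side with $unionWith.
// Assumes MongoDB 4.4+ and a role field in both collections.
const unionRolesPipeline = [
  // Keep only the role field from employees.
  { $project: { _id: 0, role: 1 } },
  // Append the role field of every managers document.
  {
    $unionWith: {
      coll: "managers",
      pipeline: [{ $project: { _id: 0, role: 1 } }]
    }
  },
  // Collect each distinct role once.
  { $group: { _id: null, allRoles: { $addToSet: "$role" } } },
  { $project: { _id: 0, allRoles: 1 } }
];

// In mongosh:
// db.employees.aggregate(unionRolesPipeline);
```

This avoids pulling intermediate arrays into the client and does not depend on a separate non-empty collection.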

Indexing and Performance Considerations

When using $setUnion, be aware of its performance implications, especially on large datasets: the work it does grows with the size of the arrays it has to compare. The operator itself cannot use an index, but indexes still help indirectly by making earlier stages such as $match more efficient, which reduces the number of documents that reach the stage where $setUnion runs.
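As a sketch of that idea, the pipeline below filters documents before any array work is done. The status field, its "published" value, the categories field, and the index on status are hypothetical and serve only to illustrate stage ordering.

```javascript
// Hypothetical index that the $match stage could use:
// db.posts.createIndex({ status: 1 });

const filteredTagsPipeline = [
  // Filter first so that an index can narrow the input.
  { $match: { status: "published" } },
  // Only the matching documents reach the $setUnion expression.
  {
    $project: {
      _id: 0,
      // Merge two (hypothetical) array fields of each remaining post.
      labels: { $setUnion: ["$tags", "$categories"] }
    }
  }
];

// In mongosh:
// db.posts.aggregate(filteredTagsPipeline);
```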

Conclusion

MongoDB’s $setUnion is a versatile operator for array manipulation within the aggregation framework. Through the examples discussed in this tutorial, we have seen how $setUnion can help combine data from literal arrays, from array fields across documents, and even from multiple collections. Using it effectively, together with stages such as $group and operators such as $reduce and $setDifference, makes complex array data easier to manage.