MongoDB: Filter documents based on array length (with examples)

Updated: February 3, 2024 By: Guest Contributor Post a comment

Introduction

MongoDB, the popular NoSQL database known for its flexibility with unstructured data, often stores documents that contain arrays as one of the field types. Sometimes, we need to perform operations that involve filtering documents based on the length of these arrays, which could be a list of tags, comments, or other elements. This tutorial will guide you on various methods to filter documents considering the length of an array field. We will go from basic examples to more advanced concepts and showcase the outputs wherever applicable.

Basic Filtering Based on Array Length

Starting with the basics, let’s assume we have a collection called posts that contains fields such as title, content, and tags (which is an array). Your task is to find all posts that have exactly three tags:

db.posts.find({
    "tags": { "$size": 3 }
})

The $size operator matches any array with the number of elements specified by the argument. Here’s how you might see this query’s output:

// Output:
{
    "_id": ObjectId("5f650b"),
    "title": "MongoDB Array Filtering",
    "tags": ["NoSQL", "Database", "Filter"]
}

Combining $size with Other Query Operators

You can combine $size with other query criteria. For instance, suppose you want to find posts with exactly two tags, and one of them must be ‘Tutorial’. Here’s how you would write this query:

db.posts.find({
    "tags": { "$size": 2 },
    "tags": "Tutorial"
})

However, in this syntax, the tags field will be overridden by the last criteria. To properly execute this query, we need to use the $elemMatch operator:

db.posts.find({
    "tags": { "$elemMatch": { "$eq": "Tutorial" } },
    "tags": { "$size": 2 }
})

Now, MongoDB understands that you want to match documents where one of the two elements of the tags array is ‘Tutorial’.

Using Aggregation to Filter by Array Length

The $size operator is straightforward but has limitations: it does not allow comparison against a range (for example, more than two elements). When you need to compare array lengths using >, <, >=, <=, and !=, you will need to utilize MongoDB’s aggregation framework. Consider the following pipeline:

db.posts.aggregate([
    { "$project": {
        "title": 1,
        "tags": 1,
        "numberOfTags": { "$size": "$tags" }
    }},
    { "$match": {
        "numberOfTags": { "$gt": 2 }
    }}
])

In this aggregation pipeline, the $project stage creates a new field called numberOfTags containing the length of each document’s tags array. The $match stage then filters documents to only include those with more than two tags.

Performing Complex Filtering

Sometimes you might need to filter based on more complex criteria. For instance, what if you need to find posts that have at least one tag longer than a certain length? For such operations, you can use a combination of aggregation stages like the below example:

db.posts.aggregate([
    { "$match": {
        "tags": { "$elemMatch": { "$regex": /^.{5,}$/ } }
    }},
    { "$project": {
        "title": 1,
        "tags": 1
    }}
])

This pipeline filters for documents where at least one tag is at least five characters long. First, we use $match with $elemMatch to filter, and then we project the results with $project.

Concluding Example: Filtering with a Range and Specific Element

Let’s carry out a more sophisticated query by finding documents where the array length is within a range and contains a specific element. Here’s how we could build such an aggregation pipeline:

db.posts.aggregate([
    { "$project": {
        "title": 1,
        "tags": 1,
        "numberOfTags": { "$size": "$tags" }
    }},
    { "$match": {
        "numberOfTags": { "$gte": 2, "$lte": 4 },
        "tags": "Tutorial"
    }}
])

The above query adds both range ($gte and $lte) and element (‘Tutorial’) filters using $match. The results will reflect documents with a tags array of length 2 to 4, inclusive, and containing the element ‘Tutorial’.

Conclusion

Filtering MongoDB documents by array length involves an understanding of the $size operator in basic queries and the aggregation framework for advanced operations. Utilizing these tools effectively can yield very specific subsets of your data, tailored to your application’s needs.