Sling Academy

MongoDB: How to select distinct values from a collection (with examples)

Last updated: February 03, 2024

Introduction

Working with databases frequently involves handling large datasets that contain duplicate values, and MongoDB, a NoSQL database popular for its flexibility and scalability, is no exception. A common requirement is to retrieve the unique values of a particular field across your records (or documents, in MongoDB parlance). In this tutorial, we will explore several approaches to selecting distinct values from a collection in MongoDB.

Let’s say we have a MongoDB collection named ‘products’ in which many documents share the same values for some fields, such as the category. Our goal is to get each unique value of a field exactly once, without the duplicates. For this, MongoDB offers the distinct command (called in the shell as db.collection.distinct()), which returns the unique values of a specified field across a single collection.

Basic Usage of Distinct

To start with a simple example, suppose we want to retrieve all the unique ‘category’ values from our ‘products’ collection. The basic distinct call looks like this:

db.products.distinct("category")

This command returns an array of the unique categories in the products collection, with each value appearing once no matter how many documents share it. Here is a sample output you might expect:

["Electronics", "Clothing", "Kitchenware"]
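As a rough mental model, the following plain JavaScript sketch (runnable with Node.js, no database needed) mimics what distinct returns for a top-level field. The sample documents and the `distinctValues` helper are hypothetical, and this is not how MongoDB implements the command. It does mirror one documented behavior: when the field holds an array, each element is treated as a separate value. It only handles primitive values such as strings and numbers, because a JavaScript Set compares objects by reference.

```javascript
// Hypothetical sample documents standing in for the 'products' collection.
const products = [
  { name: "Laptop", category: "Electronics", price: 1200 },
  { name: "T-shirt", category: "Clothing", price: 20 },
  { name: "Blender", category: "Kitchenware", price: 150 },
  { name: "Headphones", category: "Electronics", price: 80 },
  { name: "Jacket", category: "Clothing", price: 90 },
];

// Collect the unique values of a top-level field, roughly like
// db.collection.distinct(field). Array values are flattened so that each
// element counts as a separate value. Documents without the field are skipped.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (!(field in doc)) continue;
    const value = doc[field];
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) seen.add(v);
  }
  return [...seen];
}

console.log(distinctValues(products, "category"));
```

MongoDB does not document any particular order for distinct results, so sort the returned array yourself if order matters.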

Using Distinct with a Query

Now, let’s take it a step further. What if you want distinct values only from the documents that match a certain condition? The distinct method accepts an optional query document as its second argument, which filters the documents before the distinct values are collected. Here’s an example:

db.products.distinct("category", { price: { $gt: 99 } })

In this case, we are asking MongoDB for the distinct categories of products priced over $99. Categories with no product above that price, such as Clothing in this sample, no longer appear in the output:

["Electronics", "Kitchenware"]
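The same in-memory model extends to the query form: filter first, then collect the unique values. In this sketch the `distinctWhere` helper and the sample documents are hypothetical, and a JavaScript predicate stands in for the query document { price: { $gt: 99 } }. It assumes every price is a number.

```javascript
// Hypothetical sample documents standing in for the 'products' collection.
const products = [
  { name: "Laptop", category: "Electronics", price: 1200 },
  { name: "T-shirt", category: "Clothing", price: 20 },
  { name: "Blender", category: "Kitchenware", price: 150 },
  { name: "Headphones", category: "Electronics", price: 80 },
  { name: "Jacket", category: "Clothing", price: 90 },
];

// Rough in-memory analogue of db.products.distinct(field, query):
// keep only the documents that satisfy the predicate, then collect the
// unique values of the field from those documents.
function distinctWhere(docs, field, predicate) {
  const seen = new Set();
  for (const doc of docs.filter(predicate)) {
    if (field in doc) seen.add(doc[field]);
  }
  return [...seen];
}

// Stand-in for the query document { price: { $gt: 99 } }.
const result = distinctWhere(products, "category", (doc) => doc.price > 99);
console.log(result);
```

With this sample data only the Laptop and the Blender pass the filter, so Clothing drops out just as in the output above.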

Advanced Usage with Aggregation Pipeline

The distinct command is a quick and easy way to extract unique values, but it has limitations: it returns only the values themselves (no counts or other fields), and its result must fit within MongoDB’s 16 MB maximum BSON document size. For more advanced scenarios, MongoDB’s aggregation pipeline lets you chain a series of stages to process your data.

To illustrate this, imagine you want the distinct categories and also the number of products in each. You can use a $group stage that groups documents by category, together with the $sum accumulator to count them:

db.products.aggregate([
    {
        $group: {
            _id: "$category",
            count: { $sum: 1 }
        }
    }
])

The output now includes each unique category alongside the count of corresponding products:

[
    { "_id": "Electronics", "count": 15 },
    { "_id": "Clothing", "count": 25 },
    { "_id": "Kitchenware", "count": 8 }
]
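The $group stage can be pictured as building a map from each group key to an accumulator. The plain JavaScript sketch below (hypothetical sample data and a hypothetical `groupCount` helper, runnable with Node.js) mimics { $group: { _id: "$category", count: { $sum: 1 } } } on an in-memory array. It illustrates the idea only and is not how MongoDB implements the stage.

```javascript
// Hypothetical sample documents standing in for the 'products' collection.
const products = [
  { name: "Laptop", category: "Electronics", price: 1200 },
  { name: "T-shirt", category: "Clothing", price: 20 },
  { name: "Blender", category: "Kitchenware", price: 150 },
  { name: "Headphones", category: "Electronics", price: 80 },
  { name: "Jacket", category: "Clothing", price: 90 },
];

// In-memory analogue of { $group: { _id: "$" + field, count: { $sum: 1 } } }.
// Each distinct key becomes one output document, and every input document
// adds 1 to its group's counter. A missing field is grouped under null.
function groupCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field] ?? null;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

console.log(groupCount(products, "category"));
```

With this sample data the result counts 2 for Electronics, 2 for Clothing, and 1 for Kitchenware. MongoDB does not guarantee the order of documents coming out of $group, so add a $sort stage whenever order matters.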

Combining Filters with Aggregation

If we extend our example to include a filtering step before grouping the data, we can add a $match stage to our pipeline:

db.products.aggregate([
    {
        $match: { price: { $gt: 99 } }
    },
    {
        $group: {
            _id: "$category",
            count: { $sum: 1 }
        }
    }
])

This returns, for each category, the number of products priced above $99; categories with no such products do not appear at all. The output might look like this:

[
    { "_id": "Electronics", "count": 10 },
    { "_id": "Kitchenware", "count": 5 }
]
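Because a pipeline is just an array of stage documents, it can be assembled in code. The sketch below uses a hypothetical helper, `buildCategoryCountPipeline`, that returns the same $match and $group stages shown above with the price threshold passed in as a parameter. Building the array needs no database connection; the result can then be passed to db.products.aggregate().

```javascript
// Hypothetical helper: builds the $match + $group pipeline from this
// section, with the minimum price supplied by the caller.
function buildCategoryCountPipeline(minPrice) {
  if (typeof minPrice !== "number") {
    throw new TypeError("minPrice must be a number");
  }
  return [
    // Keep only products priced strictly above minPrice.
    { $match: { price: { $gt: minPrice } } },
    // One output document per category, with a product count.
    { $group: { _id: "$category", count: { $sum: 1 } } },
  ];
}

const pipeline = buildCategoryCountPipeline(99);
console.log(JSON.stringify(pipeline));
// In mongosh, the array would be used as: db.products.aggregate(pipeline)
```

Keeping $match as the first stage also helps performance: it reduces the number of documents $group has to process, and a leading $match can use an index on price.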

Incorporating Sort and Project

You may also want to sort these categories by their product count, or include, exclude, or rename fields in the results. To do so, add a $sort stage and a $project stage to the pipeline:

db.products.aggregate([
    {
        $match: { price: { $gt: 99 } }
    },
    {
        $group: {
            _id: "$category",
            count: { $sum: 1 }
        }
    },
    {
        $sort: { count: -1 }
    },
    {
        $project: { category: "$_id", _id: 0, itemCount: "$count" }
    }
])

The categories now appear in descending order of product count (counting only products above the specified price), and $project has renamed _id to category and count to itemCount for readability:

[
    { "category": "Electronics", "itemCount": 10 },
    { "category": "Kitchenware", "itemCount": 5 }
]
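The last two stages only reorder and reshape documents that $group has already produced. This in-memory JavaScript sketch applies the equivalent of { $sort: { count: -1 } } and the $project stage to hypothetical grouped results that mirror the counts from the previous section; `sortAndProject` is a hypothetical helper, not a MongoDB API.

```javascript
// Hypothetical grouped results, shaped like the output of the
// $match + $group stages above (in no particular order).
const grouped = [
  { _id: "Kitchenware", count: 5 },
  { _id: "Electronics", count: 10 },
];

// In-memory analogue of:
//   { $sort: { count: -1 } }
//   { $project: { category: "$_id", _id: 0, itemCount: "$count" } }
function sortAndProject(docs) {
  return [...docs] // copy so the input array is not mutated
    .sort((a, b) => b.count - a.count) // descending by count
    .map(({ _id, count }) => ({ category: _id, itemCount: count })); // rename fields, drop _id
}

console.log(sortAndProject(grouped));
```

If two categories ever share the same count, $sort does not guarantee their relative order; a tiebreaker such as { $sort: { count: -1, _id: 1 } } makes the output deterministic.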

Distinct Values from Arrays of Sub-Documents

For nested structures such as arrays of sub-documents, you can combine $unwind, $group, and $project: $unwind flattens the array so that each element becomes its own document, $group collapses duplicate values, and $project reshapes the result.

Suppose every product has a reviews array of sub-documents, and we want the distinct usernames of all reviewers. The pipeline looks like this:

db.products.aggregate([
    {
        $unwind: "$reviews"
    },
    {
        $group: {
            _id: "$reviews.username"
        }
    },
    {
        $project: { username: "$_id", _id: 0 }
    }
])

The output would be something like this:

[
    { "username": "johndoe" },
    { "username": "janedoe" }
]
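To see what $unwind contributes, here is an in-memory JavaScript sketch of the same three steps. The sample products and the `unwind` and `distinctReviewers` helpers are hypothetical; `unwind` follows $unwind’s default rules (a missing, null, or empty array produces no output, and a non-array value is treated as a one-element array).

```javascript
// Hypothetical products with an embedded array of review sub-documents.
const products = [
  {
    name: "Laptop",
    reviews: [
      { username: "johndoe", rating: 5 },
      { username: "janedoe", rating: 4 },
    ],
  },
  { name: "Blender", reviews: [{ username: "johndoe", rating: 3 }] },
  { name: "Mug", reviews: [] },
];

// In-memory analogue of { $unwind: "$" + field }: one output document
// per array element, with the array replaced by that element.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return [];
    const items = Array.isArray(value) ? value : [value];
    return items.map((item) => ({ ...doc, [field]: item }));
  });
}

// Analogue of the $group on "$reviews.username" followed by the $project
// that renames _id to username. A missing username is grouped under null.
function distinctReviewers(docs) {
  const names = new Set();
  for (const doc of unwind(docs, "reviews")) {
    names.add(doc.reviews.username ?? null);
  }
  return [...names].map((username) => ({ username }));
}

console.log(distinctReviewers(products));
```

For a plain list of names, db.products.distinct("reviews.username") should return the same values in a single call, since distinct follows dot notation into arrays of embedded documents. The pipeline form pays off when you also want per-user counts or further stages.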

Conclusion

Throughout this tutorial, we have explored several strategies for selecting distinct values from a MongoDB collection. Whether you need a simple list of the unique values of a field, or unique values combined with counts, filtering, sorting, and reshaping, MongoDB offers flexible tools for the job. We progressed from basic distinct queries to aggregation pipeline techniques, with code examples and sample outputs to guide you through your own use cases.

Next Article: MongoDB: Counting distinct values in each group

Previous Article: MongoDB: How to retry on read/write failure (with examples)

Series: MongoDB Tutorials

MongoDB

You May Also Like

  • MongoDB: How to combine data from 2 collections into one
  • Hashed Indexes in MongoDB: A Practical Guide
  • Partitioning and Sharding in MongoDB: A Practical Guide (with Examples)
  • Geospatial Indexes in MongoDB: How to Speed Up Geospatial Queries
  • Understanding Partial Indexes in MongoDB
  • Exploring Sparse Indexes in MongoDB (with Examples)
  • Using Wildcard Indexes in MongoDB: An In-Depth Guide
  • Matching binary values in MongoDB: A practical guide (with examples)
  • Understanding $slice operator in MongoDB (with examples)
  • Caching in MongoDB: A practical guide (with examples)
  • CannotReuseObject Error: Attempted illegal reuse of a Mongo object in the same process space
  • How to perform cascade deletion in MongoDB (with examples)
  • MongoDB: Using $not and $nor operators to negate a query
  • MongoDB: Find SUM/MIN/MAX/AVG of each group in a collection
  • References (Manual Linking) in MongoDB: A Developer’s Guide (with Examples)
  • MongoDB: How to see all fields in a collection (with examples)
  • Type checking in MongoDB: A practical guide (with examples)
  • How to query an array of subdocuments in MongoDB (with examples)
  • MongoDB: How to compare 2 documents (with examples)