MongoDB: Select/count distinct values in multiple fields

Updated: February 3, 2024 By: Guest Contributor Post a comment

Introduction

MongoDB, as a leading NoSQL database, offers you the flexibility to work with data in documents that resemble JSON objects. One common task when dealing with databases is selecting and counting distinct values. This tutorial delves into how to perform these operations in MongoDB, focusing on distinct values in multiple fields, a task which can have varied levels of complexity depending on the structure of your data.

We’ll start with the basics and gradually move to more advanced queries, complete with code examples and potential outputs. Let’s dive in!

Prerequisites

  • A running MongoDB instance
  • Basic familiarity with the MongoDB shell or Compass UI
  • A dataset in your MongoDB where you can test the queries

Using distinct function

The distinct function in MongoDB is straightforward for querying distinct values in a single field. It’s equivalent to the SELECT DISTINCT statement in SQL.

db.collection.distinct('fieldName')

Let’s say we have a collection products with the following documents:

[
  { "_id" : 1, "category": "electronics", "manufacturer": "ABC Corp" },
  { "_id" : 2, "category": "home appliances", "manufacturer": "XYZ Inc" },
  { "_id" : 3, "category": "electronics", "manufacturer": "ABC Corp" }
]

To get distinct categories, we’d use:

db.products.distinct('category')

The output would be:

[
  "electronics",
  "home appliances"
]

Combining $group and $addToSet

For combining distinct values across multiple fields, we can rely on the aggregation framework provided by MongoDB.
An aggregation pipeline transforms data into aggregated results using stages such as $group, $project, $match, and more.

Here’s a basic example to combine distinct manufacturers and categories:

db.products.aggregate([
  {
    $group: {
      _id: null,
      distinctiveCategories: { $addToSet: "$category" },
      distinctiveManufacturers: { $addToSet: "$manufacturer" }
    }
  }
])

This pipeline would produce:

{
  "_id" : null,
  "distinctiveCategories" : [ "electronics", "home appliances" ],
  "distinctiveManufacturers" : [ "ABC Corp", "XYZ Inc" ]
}

Advanced Aggregation

If we need to perform a more complex operation, such as counting the distinct values, we can extend our aggregation pipeline with additional stages like $project or $unwind.

Here is an example that counts the number of distinct categories and manufacturers:

db.products.aggregate([
  {
    $group: {
      _id: null,
      distinctiveCategories: { $addToSet: "$category" },
      distinctiveManufacturers: { $addToSet: "$manufacturer" }
    }
  },
  {
    $project: {
      _id: 0,
      numberOfDistinctCategories: { $size: "$distinctiveCategories" },
      numberOfDistinctManufacturers: { $size: "$distinctiveManufacturers" }
    }
  }
])

This query will yield:

{
  "numberOfDistinctCategories" : 2,
  "numberOfDistinctManufacturers" : 2
}

The previous examples cover a general case. However, your dataset might require more complex querying. MongoDB allows the combination of various operators in different stages to create powerful aggregations.

Working with Nested Documents

Sometimes, your value might be nested within another document or array. In such cases, you need to use the $unwind operator to flatten the array before grouping the entries.

For example, suppose we have the following documents:

[
  { "_id" : 1, "products": [ { "category": "electronics" }, { "category": "fitness" } ] },
  { "_id" : 2, "products": [ { "category": "home appliances" } ] }
  { "_id" : 3, "products": [ { "category": "gaming" } ] }
]

To count the distinct categories present in the nested products array, we would use an aggregation like this:

db.orders.aggregate([
  { $unwind: "$products" },
  { $group: { _id: null, distinctCategories: { $addToSet: "$products.category" } } },
  { $project: { _id: 0, count: { $size: "$distinctCategories" } } }
])

This would return:

{ "count" : 4 }

Conclusion

Throughout this guide, we’ve explored selecting and counting distinct values in MongoDB. Starting with the simple distinct method, advancing through aggregation with $group and $addToSet, and considering more complex scenarios involving nested documents and arrays. Whether you are dealing with flat or hierarchical data structures, MongoDB’s flexible aggregation framework can satisfy a wide range of querying needs.

In summary, MongoDB provides a robust set of tools that enable developers to efficiently handle unique value extraction and aggregation tasks across various complexities. These capabilities are crucial for data analysis, ensuring data integrity, and generating insightful reports.