MongoEngine: How to query distinct values

Updated: February 10, 2024 By: Guest Contributor

Overview

MongoEngine is an Object-Document Mapper (ODM) for working with MongoDB from Python. It maps Python classes to MongoDB documents and back, giving you a high-level interface to the database. A common task in data-driven applications is retrieving the distinct values of a field, whether to check that values are unique or to summarize a dataset. MongoEngine exposes MongoDB's distinct operation directly on its querysets. In this tutorial, we will query distinct values with MongoEngine, starting with a simple query and moving on to filtered queries, aggregation pipelines, and embedded documents.

Getting Started

Before diving into querying distinct values, ensure MongoEngine is installed. If not, you can install it using pip:

pip install mongoengine

Next, connect to your MongoDB database:

from mongoengine import connect
# Replace the database name, host, and port with your own; 27017 is MongoDB's default port.
connect('your_database_name', host='localhost', port=27017)

Once connected, define your document models. For this tutorial, we’re using a simple User model as an example:

from mongoengine import Document, StringField

class User(Document):
    name = StringField(max_length=50)
    email = StringField(required=True)

Querying Basic Distinct Values

Let’s begin with the most straightforward case – querying for distinct names in the database:

User.objects.distinct('name')

This returns a Python list in which each name stored in the User collection appears once, no matter how many documents share it. The exact result depends on your data, and MongoDB does not guarantee the order of the values, but expect something like:

["John", "Doe", "Jane"]
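To make the semantics concrete without a running server, here is a plain-Python model of what distinct('name') computes. It is a simplified sketch, not MongoEngine's implementation: the sample records are ordinary dicts standing in for stored User documents, and the helper distinct_values is hypothetical. Because MongoDB does not promise an order for the returned values, the sketch sorts them only to make the printed result predictable.

```python
from typing import Any, Iterable


def distinct_values(docs: Iterable[dict], field: str) -> list[Any]:
    """Hypothetical helper: collect each value of `field` exactly once.

    In this model, documents that lack the field contribute nothing,
    and repeated values are collapsed into a single entry.
    """
    seen: list[Any] = []
    for doc in docs:
        # A list (rather than a set) keeps unhashable values such as dicts usable.
        if field in doc and doc[field] not in seen:
            seen.append(doc[field])
    return seen


# Hypothetical sample records standing in for documents in the User collection.
users = [
    {"name": "John", "email": "john@example.com"},
    {"name": "Jane", "email": "jane@example.org"},
    {"name": "John", "email": "john.smith@example.com"},
    {"name": "Doe", "email": "doe@example.com"},
]

print(sorted(distinct_values(users, "name")))  # ['Doe', 'Jane', 'John']
```

The list-based membership check is slower than a set for large inputs, but it mirrors the fact that distinct values in MongoDB need not be hashable in Python terms (for example, embedded documents).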

Querying Distinct Values with Conditions

Often you need distinct values only from documents that meet certain conditions. MongoEngine lets you filter the queryset first and then call distinct on the result. For instance, to get the distinct names of users whose email addresses belong to a specific domain:

User.objects(email__endswith='@example.com').distinct('name')

The result has the same shape as in the basic query, but it includes only names from users whose email address ends with ‘@example.com’.
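The order of operations matters here: the filter narrows the set of documents first, and only the matching documents feed the distinct step. The plain-Python sketch below models that with hypothetical sample data; the helper distinct_names_for_domain is illustrative only, and Python's str.endswith stands in for MongoEngine's endswith operator, which is likewise case-sensitive.

```python
def distinct_names_for_domain(docs: list[dict], domain: str) -> list[str]:
    """Hypothetical helper: filter first, then collect unique names."""
    # Step 1: keep only documents whose email ends with the domain,
    # modeling User.objects(email__endswith=domain).
    matching = [d for d in docs if d.get("email", "").endswith(domain)]

    # Step 2: collapse duplicate names among the matching documents,
    # modeling .distinct('name') on the filtered queryset.
    names: list[str] = []
    for d in matching:
        if "name" in d and d["name"] not in names:
            names.append(d["name"])
    return names


# Hypothetical sample records standing in for User documents.
users = [
    {"name": "John", "email": "john@example.com"},
    {"name": "Jane", "email": "jane@example.org"},
    {"name": "John", "email": "john.smith@example.com"},
    {"name": "Doe", "email": "doe@example.com"},
]

print(sorted(distinct_names_for_domain(users, "@example.com")))  # ['Doe', 'John']
```

Jane is excluded because her address ends with "@example.org", and John appears once even though two of his documents match.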

Advanced Query: Aggregation Framework

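For more complex scenarios, MongoEngine also exposes MongoDB’s aggregation framework through the queryset’s aggregate() method. A $group stage keyed on a field produces one output document per distinct value of that field, and later stages can filter and sort those groups. The following pipeline uses this to find names that occur more than once: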

# Pass the pipeline as a single list; recent MongoEngine versions deprecate unpacking it with *.
User.objects.aggregate([{'$group': {'_id': '$name', 'total': {'$sum': 1}}}, {'$match': {'total': {'$gt': 1}}}, {'$sort': {'total': -1}}])

This pipeline groups users by name and counts the documents in each group, keeps only the names that appear more than once, and sorts the result by count in descending order. aggregate() returns a cursor that yields plain dictionaries such as:

{'_id': 'John', 'total': 2}
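Each stage of the pipeline can be mirrored in plain Python, which helps when reasoning about what the server returns. The sketch below is a model under stated assumptions, not how MongoDB executes the pipeline: the records are hypothetical dicts, and group_by_name and duplicate_names are illustrative helpers. Dropping the $match step leaves one entry per distinct name, which is why $group alone works as a distinct query.

```python
from collections import Counter


def group_by_name(docs: list[dict]) -> list[dict]:
    """Model of {'$group': {'_id': '$name', 'total': {'$sum': 1}}}.

    Each distinct name becomes one output document. A missing name is
    grouped under None, matching $group's null _id for missing fields.
    """
    counts = Counter(doc.get("name") for doc in docs)
    return [{"_id": name, "total": n} for name, n in counts.items()]


def duplicate_names(docs: list[dict]) -> list[dict]:
    """Model of the full $group -> $match -> $sort pipeline."""
    grouped = group_by_name(docs)
    # $match: {'total': {'$gt': 1}} keeps names that occur more than once.
    matched = [g for g in grouped if g["total"] > 1]
    # $sort: {'total': -1} orders by count, highest first
    # (MongoDB does not guarantee the order of ties).
    return sorted(matched, key=lambda g: g["total"], reverse=True)


# Hypothetical sample records standing in for User documents.
users = [
    {"name": "John", "email": "john@example.com"},
    {"name": "Jane", "email": "jane@example.org"},
    {"name": "John", "email": "john.smith@example.com"},
    {"name": "Doe", "email": "doe@example.com"},
]

print(duplicate_names(users))  # [{'_id': 'John', 'total': 2}]
print(sorted(g["_id"] for g in group_by_name(users)))  # ['Doe', 'Jane', 'John']
```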

Working with Embedded Documents

In some cases, you might have to query distinct values from fields within embedded documents. Let’s say we have an embedded document for a user’s address:

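from mongoengine import Document, EmbeddedDocument, EmbeddedDocumentField, StringField

# Address is stored inside each User document, so it subclasses EmbeddedDocument, not Document.
class Address(EmbeddedDocument):
    city = StringField(required=True)
    country = StringField(required=True)

# This definition replaces the earlier User model and adds an embedded address.
class User(Document):
    name = StringField(max_length=50)
    email = StringField(required=True)
    address = EmbeddedDocumentField(Address)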

To query distinct cities from the user addresses:

User.objects.distinct('address.city')

The dotted path reaches into the embedded document, so the result is a list containing each distinct city found in the users’ addresses.
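A dotted path such as 'address.city' can be modeled in plain Python by walking one key at a time through nested dicts. This is a simplified sketch under stated assumptions: get_path and distinct_path are hypothetical helpers, the embedded Address is represented as a nested dict, and the sketch handles only a single embedded document per user. It does not model lists of embedded documents, which MongoDB’s distinct flattens element by element.

```python
from typing import Any

_MISSING = object()  # marker meaning "this path does not exist in the document"


def get_path(doc: dict, path: str) -> Any:
    """Hypothetical helper: follow a dotted path such as 'address.city'."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def distinct_path(docs: list[dict], path: str) -> list[Any]:
    """Collect each value found at `path` once, skipping documents without it."""
    values: list[Any] = []
    for doc in docs:
        value = get_path(doc, path)
        if value is not _MISSING and value not in values:
            values.append(value)
    return values


# Hypothetical sample users; the embedded address is modeled as a nested dict.
users = [
    {"name": "John", "address": {"city": "Paris", "country": "France"}},
    {"name": "Jane", "address": {"city": "Berlin", "country": "Germany"}},
    {"name": "Doe", "address": {"city": "Paris", "country": "France"}},
    {"name": "Ann"},  # no address, so this user contributes no city
]

print(sorted(distinct_path(users, "address.city")))  # ['Berlin', 'Paris']
```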

Conclusion

Querying distinct values is a common operation when analyzing data, checking uniqueness, or cleaning datasets. This tutorial showed how MongoEngine handles it, from a plain distinct() call on a field, through filtered querysets and dotted paths into embedded documents, to aggregation pipelines that group, count, and filter values. These patterns cover most situations where you need the unique values stored in a MongoDB collection.