MongoEngine: How to query distinct values

Last updated: February 10, 2024

Overview

MongoEngine is an Object-Document Mapper (ODM) for working with MongoDB from Python. It maps Python classes to MongoDB documents and back, providing a high-level abstraction over the database. A common task in data-driven applications is querying the distinct values of a field, whether to check uniqueness or to support analysis. MongoEngine supports this directly, building on MongoDB’s own distinct capability. In this tutorial, we will explore how to query distinct values using MongoEngine through a series of examples progressing from basic to advanced.

Getting Started

Before diving into querying distinct values, ensure MongoEngine is installed. If not, you can install it using pip:

pip install mongoengine

Next, connect to your MongoDB database:

from mongoengine import connect
# 27017 is MongoDB's default port; replace the name, host and port with your own
connect('your_database_name', host='your_database_host', port=27017)

Once connected, define your document models. For this tutorial, we’re using a simple User model as an example:

from mongoengine import Document, StringField

class User(Document):
    name = StringField(max_length=50)
    email = StringField(required=True)

Querying Basic Distinct Values

Let’s begin with the most straightforward case – querying for distinct names in the database:

User.objects.distinct('name')

This will return a list of distinct names from the User collection. The output will depend on the documents available in your dataset but expect something along the lines of:

["John", "Doe", "Jane"]
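
To make the semantics concrete, the following sketch models what a distinct query computes, using plain Python dictionaries in place of stored documents. It does not call MongoEngine or MongoDB; the `distinct_values` helper and the sample data are hypothetical and exist only for illustration. The sketch keeps first-seen order for readability, whereas the order of a real result should not be relied on.

```python
def distinct_values(documents, field):
    """Return each value of `field` once, skipping documents that lack it."""
    seen = []
    for doc in documents:
        if field in doc and doc[field] not in seen:
            seen.append(doc[field])
    return seen


# Hypothetical sample data standing in for the User collection.
users = [
    {"name": "John", "email": "john@example.com"},
    {"name": "Doe", "email": "doe@example.org"},
    {"name": "John", "email": "john.smith@example.com"},
    {"name": "Jane", "email": "jane@example.com"},
]

names = distinct_values(users, "name")
print(names)  # ['John', 'Doe', 'Jane']
```

Although two documents share the name John, it appears only once in the result.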

Querying Distinct Values with Conditions

Often you need distinct values only for documents that match certain conditions. MongoEngine lets you apply filters before calling distinct. For instance, to consider only users whose email address belongs to a specific domain:

User.objects(email__endswith='@example.com').distinct('name')

The output has the same shape as in the basic query, but it contains only the names of users whose email address ends with ‘@example.com’.
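
The order of operations matters here: the filter narrows the set of documents first, and only then are the distinct values collected. The sketch below mirrors that order in plain Python, without MongoEngine or a database; the sample data is hypothetical, and the result is sorted only to make the output predictable.

```python
# Hypothetical sample data standing in for the User collection.
users = [
    {"name": "John", "email": "john@example.com"},
    {"name": "Doe", "email": "doe@example.org"},
    {"name": "Jane", "email": "jane@example.com"},
    {"name": "John", "email": "john.smith@example.com"},
]

# Step 1: keep only the documents that satisfy the condition.
matching = [u for u in users if u["email"].endswith("@example.com")]

# Step 2: collect each remaining name once.
filtered_names = sorted({u["name"] for u in matching})
print(filtered_names)  # ['Jane', 'John']
```

Doe is absent from the result because none of the documents with that name passed the filter.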

Advanced Query: Aggregation Framework

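For more complex scenarios, MongoEngine integrates with MongoDB’s aggregation framework, enabling sophisticated queries and data transformations. Grouping by a field yields one result per distinct value, and unlike distinct it can also report how often each value occurs. Here is a pipeline that returns the names shared by more than one user: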

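# Recent MongoEngine releases expect the pipeline as a single list of stages
User.objects.aggregate([
    {'$group': {'_id': '$name', 'total': {'$sum': 1}}},  # count documents per name
    {'$match': {'total': {'$gt': 1}}},                   # keep names used more than once
    {'$sort': {'total': -1}},                            # most frequent first
])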

This pipeline groups users by name, counts the documents in each group, keeps only the names that appear more than once, and sorts them in descending order by count. The result is an iterator yielding documents like:

{'_id': 'John', 'total': 2}
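
As a rough illustration of what each stage contributes, the sketch below reproduces the same group, match and sort steps in plain Python with `collections.Counter`. It runs on hypothetical in-memory data rather than through MongoDB, so it models only the logic, not how the server executes the pipeline.

```python
from collections import Counter

# Hypothetical sample data standing in for the User collection.
users = [
    {"name": "John"}, {"name": "Jane"}, {"name": "John"},
    {"name": "Doe"}, {"name": "Jane"}, {"name": "John"},
]

# $group: one bucket per name, counting the documents in each.
counts = Counter(u["name"] for u in users)

# $match: keep only the names that occur more than once.
repeated = {name: total for name, total in counts.items() if total > 1}

# $sort: order by count, highest first.
results = [
    {"_id": name, "total": total}
    for name, total in sorted(repeated.items(), key=lambda item: item[1], reverse=True)
]
print(results)  # [{'_id': 'John', 'total': 3}, {'_id': 'Jane', 'total': 2}]
```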

Working with Embedded Documents

In some cases, you might have to query distinct values from fields within embedded documents. Let’s say we have an embedded document for a user’s address:

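from mongoengine import Document, EmbeddedDocument, EmbeddedDocumentField, StringField

# Embedded documents must subclass EmbeddedDocument, not Document
class Address(EmbeddedDocument):
    city = StringField(required=True)
    country = StringField(required=True)

class User(Document):
    name = StringField(max_length=50)
    email = StringField(required=True)
    address = EmbeddedDocumentField(Address)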

To query distinct cities from the user addresses:

User.objects.distinct('address.city')

The output will show a list of distinct cities from all the user addresses.
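
The dotted path names a field inside the nested document. The sketch below shows, in plain Python, how such a path can be followed through nested dictionaries and how documents without the field drop out of the result. The `get_path` helper and the sample data are hypothetical and do not reflect how MongoEngine or MongoDB resolve paths internally.

```python
def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.city' through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # the path is absent in this document
        current = current[key]
    return current


# Hypothetical sample data standing in for the User collection.
users = [
    {"name": "John", "address": {"city": "Paris", "country": "France"}},
    {"name": "Jane", "address": {"city": "Lyon", "country": "France"}},
    {"name": "Doe", "address": {"city": "Paris", "country": "France"}},
    {"name": "Ann"},  # no address at all
]

found = (get_path(u, "address.city") for u in users)
cities = sorted({city for city in found if city is not None})
print(cities)  # ['Lyon', 'Paris']
```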

Conclusion

Querying distinct values is a vital operation for data analysis, ensuring uniqueness, and cleaning data sets. This tutorial demonstrated the utility and flexibility of MongoEngine in querying distinct values from a MongoDB database, ranging from straightforward queries to complex aggregations with conditions. With these examples, developers can leverage MongoEngine effectively in their projects to manage data in MongoDB.
