MongoDB: 3 Ways to Select N Random Documents

Updated: February 3, 2024 By: Guest Contributor

Introduction

Selecting random documents from a MongoDB collection is a common requirement: sampling a dataset for analysis, showing a random selection of items, or simply shuffling data. In this guide, we walk through three methods, covering how each one is implemented and what it costs in performance.

Approach 1: $sample Aggregation Stage

This method uses the aggregation pipeline’s $sample stage to randomly select documents.

  1. Start an aggregation pipeline with the $sample stage.
  2. Specify the size of the sample with the size option.

Code example:

// N is the number of documents to return, for example 5
db.collection.aggregate([
  { $sample: { size: N } }
])

Output: Returns N randomly selected documents from the collection, or every document in random order if the collection holds fewer than N.

Notes:

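  • Efficient for large collections when $sample is the first stage of the pipeline, N is less than 5% of the documents, and the collection holds more than 100 documents. Otherwise MongoDB scans the whole collection and sorts it randomly, which is subject to sort memory limits.
  • May return the same document more than once, so deduplicate on _id if you need unique results.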
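In application code the pipeline is often assembled by a small function, for example to filter before sampling, and the results are cleaned up afterwards. The sketch below only builds the pipeline array and deduplicates results, so it runs in plain JavaScript without a server; the helper names buildSamplePipeline and dedupeById are hypothetical. Note that putting a $match in front of $sample means $sample is no longer the first stage, so according to the $sample documentation the server falls back to a scan plus random sort instead of the faster pseudo-random cursor.

```javascript
// Hypothetical helper: build a $sample pipeline, optionally filtering first.
// A $match placed before $sample means $sample is no longer the first stage,
// so the server cannot use its pseudo-random cursor optimization.
function buildSamplePipeline(n, filter) {
  if (!Number.isInteger(n) || n <= 0) {
    throw new RangeError("sample size must be a positive integer");
  }
  const pipeline = [];
  // Only add $match when a non-empty filter is supplied.
  if (filter && Object.keys(filter).length > 0) {
    pipeline.push({ $match: filter });
  }
  pipeline.push({ $sample: { size: n } });
  return pipeline;
}

// Hypothetical helper: drop repeated documents, keyed by _id,
// since $sample can occasionally return the same document twice.
function dedupeById(docs) {
  const seen = new Set();
  return docs.filter((doc) => {
    const key = String(doc._id); // ObjectId and numbers both stringify cleanly
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

In mongosh a call might look like db.collection.aggregate(buildSamplePipeline(5, { status: "active" })).toArray(), with dedupeById applied to the resulting array; the status field is only an example. After deduplication you may end up with fewer than N documents.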

Approach 2: Random Field and Sort

Add a random field to each document and sort on it. This is less efficient than $sample and not recommended for large datasets.

  1. Add a field with a random value to each document.
  2. Perform a sort on this random field.
  3. Limit the number of documents to N.

Code example:

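// Step 1: write a random value in [0, 1) into every document.
// This issues one update per document; on MongoDB 4.4.2+ a single call does the same:
//   db.collection.updateMany({}, [{ $set: { random: { $rand: {} } } }]);
db.collection.find({}, { _id: 1 }).forEach(function (doc) {
    db.collection.updateOne(
        { _id: doc._id },
        { $set: { random: Math.random() } }
    );
});

// Steps 2 and 3: sort on the random field and keep the first N documents.
db.collection.find().sort({ random: 1 }).limit(N);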

Notes:

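  • Resource-intensive: every document is rewritten before the query can run.
  • Leaves a random field in every document until you remove it with $unset.
  • Returns the same N documents on every run until the random values are reassigned.
  • Not efficient for large datasets or frequent operations.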
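If writing a field into every document is undesirable, the same random-key-and-sort idea can run as a read-only aggregation pipeline. The sketch below assumes MongoDB 4.4.2 or later for the $rand operator; the helper name buildRandomSortPipeline and the default field name _randomKey are hypothetical. It still reads the whole collection, although a $sort followed directly by a $limit lets the server keep only the top N documents in memory.

```javascript
// Hypothetical helper: random-field-and-sort as a read-only pipeline.
// $rand (MongoDB 4.4.2+) gives each document a float in [0, 1) at query time,
// so nothing is written back to the collection.
function buildRandomSortPipeline(n, fieldName = "_randomKey") {
  if (!Number.isInteger(n) || n <= 0) {
    throw new RangeError("n must be a positive integer");
  }
  return [
    { $addFields: { [fieldName]: { $rand: {} } } }, // attach a fresh random key
    { $sort: { [fieldName]: 1 } },                  // order documents by that key
    { $limit: n },                                  // keep the first n
    { $unset: fieldName },                          // strip the helper field from output
  ];
}
```

The array would be passed straight to db.collection.aggregate(...). Because each run draws fresh random keys, repeated calls return different documents, unlike a persisted random field.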

Approach 3: Random Skip

Another method is to skip a random number of documents and read the next N. Like the random-field approach, it performs poorly on large collections.

  1. Count the documents in the collection and pick a random skip value between 0 and the count minus N.
  2. Call find() with skip() and limit() to fetch N documents starting at that offset.

Code example:

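const totalDocs = db.collection.countDocuments(); // count() is deprecated
// Random start offset in [0, totalDocs - N]; Math.max keeps it at 0 when totalDocs <= N
const randomSkip = Math.floor(Math.random() * Math.max(totalDocs - N + 1, 1));
// Returns up to N consecutive documents starting at that offset
db.collection.find().skip(randomSkip).limit(N);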

Notes:

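  • The server still walks past every skipped document, so the cost grows with the skip value and large collections suffer most.
  • Only the starting point is random: the N documents are neighbors in natural order, not independently chosen.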
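The skip approach above returns N neighboring documents. If each of the N documents should be chosen independently, one option is to draw N distinct random offsets and fetch each one separately. The sketch below picks the offsets with Robert Floyd's sampling algorithm, which needs no array the size of the collection; the function name distinctRandomOffsets and its rng parameter are hypothetical, and rng exists only so a deterministic generator can be passed in for testing.

```javascript
// Hypothetical helper: choose n distinct offsets in [0, total) using
// Robert Floyd's sampling algorithm, without allocating a total-sized array.
function distinctRandomOffsets(total, n, rng = Math.random) {
  const k = Math.min(n, total); // cannot pick more offsets than documents
  const chosen = new Set();
  for (let j = total - k; j < total; j++) {
    const t = Math.floor(rng() * (j + 1)); // uniform integer in [0, j]
    // If t was already picked, take j instead; j cannot be in the set yet
    // because every earlier pick is at most j - 1.
    chosen.add(chosen.has(t) ? j : t);
  }
  return [...chosen];
}
```

In mongosh, each offset k would then be read with db.collection.find().sort({ _id: 1 }).skip(k).limit(1), where the sort gives the offsets a stable order to refer to. This costs N queries, each still walking past k documents, so it is only practical for small N on modest collections.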

Conclusion

When selecting N random documents from a MongoDB collection, the $sample stage is generally the most performant and straightforward method, especially for large datasets. Alternatives, such as adding a random field or skipping documents, can be implemented but come with several limitations, notably performance overhead and poor scaling. Knowing the size of your data and how often you need to sample it will guide you to the most appropriate technique.