MongoDB: 3 Ways to Select N Random Documents

Last updated: February 03, 2024

Introduction

Selecting random documents from a MongoDB collection is a common requirement for various applications such as sampling datasets, implementing random page views, or simply shuffling data. In this guide, we explore several methods to achieve this, considering their implementation and performance implications.

Approach 1: $sample Aggregation Stage

This method uses the aggregation pipeline’s $sample stage to randomly select documents.

  1. Start an aggregation pipeline with the $sample stage.
  2. Specify the size of the sample with the size option.

Code example:

db.collection.aggregate([
 { $sample: { size: N } }
])

Output: Returns N random documents from the collection.

Notes:

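  • Efficient for large collections when $sample is the first pipeline stage, N is less than 5% of the total documents, and the collection holds more than 100 documents. Otherwise MongoDB scans the input and performs a random sort.
  • Does not guarantee uniqueness: the same document can appear more than once in the result set, whether or not the cluster is sharded.
  • Returns fewer than N documents if the collection holds fewer than N.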
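
In practice $sample is often combined with a filter. The following is a minimal sketch of one way to assemble such a pipeline. `buildSamplePipeline` is a hypothetical helper name, not part of MongoDB, and the `status` field in the usage comment is assumed purely for illustration. When a $match stage precedes $sample, MongoDB samples from the matched documents by way of a random sort.

```javascript
// Hypothetical helper: builds an aggregation pipeline that samples `size`
// documents, optionally restricted to those matching `filter`.
function buildSamplePipeline(size, filter = null) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const pipeline = [];
  if (filter && Object.keys(filter).length > 0) {
    pipeline.push({ $match: filter }); // narrow the candidates first
  }
  pipeline.push({ $sample: { size: size } });
  return pipeline;
}

// Usage in mongosh (collection and field names are placeholders):
//   db.collection.aggregate(buildSamplePipeline(5, { status: "active" }))
```

Keeping the pipeline construction in a plain function makes the size check explicit and lets the same stages be reused from the shell or from a driver.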

Approach 2: Random Field and Sort

Add a random field to each document that you can later sort on. This is less efficient and not recommended for large datasets.

  1. Add a field with a random value to each document.
  2. Perform a sort on this random field.
  3. Limit the number of documents to N.

Code example:

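// Step 1: store a random value in [0, 1) on every document.
// updateOne replaces the deprecated update() helper.
db.collection.find({}, { _id: 1 }).forEach(function (doc) {
    db.collection.updateOne(
        { _id: doc._id },
        { $set: { random: Math.random() } }
    );
});

// Steps 2 and 3: sort on the random field and keep the first N documents.
db.collection.find().sort({ random: 1 }).limit(N);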

Notes:

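  • Resource-intensive: every document is rewritten once to store the random value.
  • Requires an extra field in each document, and an index on that field if the sort is to avoid scanning the whole collection.
  • Returns the same N documents on every query until the random values are regenerated.
  • Not efficient for large datasets or frequent operations.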
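
To see what sorting on a random field actually does, here is a small in-memory sketch in plain JavaScript. It does not talk to MongoDB. `tagWithRandom` and `firstNByRandom` are hypothetical names, and the `rng` parameter exists only so the behaviour can be reproduced with a fixed sequence.

```javascript
// Step 1: give every document a random sort key (mirrors the $set above).
function tagWithRandom(docs, rng = Math.random) {
  return docs.map((doc) => ({ ...doc, random: rng() }));
}

// Steps 2 and 3: sort ascending on the key, then keep the first n.
function firstNByRandom(taggedDocs, n) {
  return [...taggedDocs].sort((a, b) => a.random - b.random).slice(0, n);
}

const docs = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }];
const tagged = tagWithRandom(docs);

// The same tagged data always yields the same pick;
// a fresh selection requires tagging the documents again.
const pickA = firstNByRandom(tagged, 2);
const pickB = firstNByRandom(tagged, 2);
console.log(pickA.map((d) => d._id), pickB.map((d) => d._id));
```

The two logged arrays are identical, which is why this approach only produces a new selection after the random values have been rewritten.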

Approach 3: Random Skip

Another method is to skip a random number of documents and read the next N. This is also a poor fit for large collections, because the server still has to walk past every skipped document.

  1. Count the documents in the collection and derive a random skip value from that count.
  2. Use skip and limit with a find operation to fetch N documents.

Code example:

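// count() is deprecated; countDocuments({}) returns an accurate count.
const totalDocs = db.collection.countDocuments({});
// Pick a start offset in [0, totalDocs - N]; Math.max keeps it at 0
// when the collection holds fewer than N documents.
const randomSkip = Math.floor(Math.random() * Math.max(totalDocs - N + 1, 1));
db.collection.find().skip(randomSkip).limit(N);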

Notes:

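  • The N documents are consecutive in the query's order; only the starting point is random.
  • The cost of skip grows with the offset, because the server must pass over each skipped document.
  • Not suitable for large collections.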
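
Because a single skip returns N neighbouring documents, one possible variation is to draw N distinct offsets and fetch one document per offset. The sketch below only computes the offsets. `distinctRandomOffsets` is a hypothetical helper, and issuing one skip query per offset multiplies the cost, so treat it as an illustration and not as a recommendation.

```javascript
// Hypothetical helper: returns up to n distinct offsets in [0, totalDocs),
// sorted ascending.
function distinctRandomOffsets(totalDocs, n, rng = Math.random) {
  const wanted = Math.min(n, Math.max(totalDocs, 0));
  const offsets = new Set();
  while (offsets.size < wanted) {
    offsets.add(Math.floor(rng() * totalDocs)); // repeats are ignored by the Set
  }
  return [...offsets].sort((a, b) => a - b);
}

// Usage in mongosh (one query per offset, so keep N small):
//   const total = db.collection.countDocuments({});
//   const picks = distinctRandomOffsets(total, N).map(
//     (offset) => db.collection.find().skip(offset).limit(1).toArray()[0]
//   );
```

If documents are inserted or deleted between the count and the reads, an offset may point past the end of the collection, in which case the corresponding entry in `picks` is undefined.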

Conclusion

When selecting N random documents from a MongoDB collection, the $sample stage is generally the most performant and straightforward method, especially for large datasets. Alternatives, such as adding a random field or skipping documents, can be implemented but come with several limitations, notably performance overhead and scaling issues. Knowing the size of your data and how often you need a random selection will guide you in choosing the most appropriate technique.
