Mongoose: How to get N random documents

Introduction
Basic Random Sampling
Getting Random Documents with a Query
Cursor Based Random Sampling
Advanced Techniques
Performance Considerations
Conclusion

Introduction

Retrieving random rows is usually a one-line query in SQL, but it takes more thought in MongoDB. Mongoose, an object modeling tool for MongoDB and Node.js, has no dedicated method for fetching a random sample. However, by combining native MongoDB features with Mongoose’s query API, you can still get the job done. This guide covers several ways to retrieve N random documents from a collection using Mongoose.

Basic Random Sampling

The simplest way to get a random document is to use MongoDB’s aggregation framework with the $sample stage. Below is a basic example that fetches a single random document:

const mongoose = require('mongoose');
const { Schema } = mongoose;

const yourSchema = new Schema({ /* Your schema definition */ });

const YourModel = mongoose.model('YourModel', yourSchema);

const getRandomDocument = async () => {
    // aggregate() resolves to an array, even when size is 1
    const [randomDoc] = await YourModel.aggregate([
        { $sample: { size: 1 } }
    ]);
    return randomDoc; // undefined if the collection is empty
};

getRandomDocument().then(doc => console.log(doc)).catch(err => console.error(err));

To get N random documents, you can modify the size property in the $sample stage:

const N = 5; // Number of random documents you want

const getRandomDocuments = async (numberOfDocs) => {
    const randomDocs = await YourModel.aggregate([
        { $sample: { size: numberOfDocs } }
    ]);
    return randomDocs;
};

getRandomDocuments(N).then(docs => console.log(docs)).catch(err => console.error(err));
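
Two details of $sample are worth guarding against in application code: the size must be a positive integer, and MongoDB’s documentation notes that the stage may return the same document more than once. The following is a minimal sketch of both guards. buildSamplePipeline and dedupeById are hypothetical helper names, not part of Mongoose or MongoDB, and the pipeline they produce is meant to be passed to YourModel.aggregate():

```javascript
// Hypothetical helpers (not part of Mongoose or MongoDB).

// Build a one-stage pipeline, rejecting sizes that $sample would not accept.
function buildSamplePipeline(size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError(`size must be a positive integer, got ${size}`);
  }
  return [{ $sample: { size } }];
}

// Drop repeated documents, comparing _id values by their string form.
function dedupeById(docs) {
  const seen = new Set();
  return docs.filter((doc) => {
    const key = String(doc._id);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

For example, dedupeById(await YourModel.aggregate(buildSamplePipeline(N))) may return fewer than N documents if duplicates were dropped.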

Getting Random Documents with a Query

If you need random documents that meet certain criteria, add a $match stage before $sample so that only matching documents are sampled:

const getRandomDocumentsWithQuery = async (query, numberOfDocs) => {
    const randomDocs = await YourModel.aggregate([
        { $match: query },
        { $sample: { size: numberOfDocs } }
    ]);
    return randomDocs;
};

// Example usage
const query = { active: true };
getRandomDocumentsWithQuery(query, N).then(docs => console.log(docs)).catch(err => console.error(err));
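
aggregate() resolves to plain JavaScript objects rather than Mongoose documents, so virtuals, getters, and instance methods are not available on the results. If those are needed, one option is to convert each result with Model.hydrate(). The sketch below assumes that is what you want; sampleAsDocuments is a hypothetical name, and the model is passed in as a parameter so the function is not tied to one schema. Mongoose also does not cast values inside pipeline stages, so the filter must already use the stored types (for example, an ObjectId rather than its string form).

```javascript
// Hypothetical helper: sample matching documents and return them as
// Mongoose documents instead of plain objects.
// Model is any Mongoose model; filter is a $match expression.
async function sampleAsDocuments(Model, filter, size) {
  const plainObjects = await Model.aggregate([
    { $match: filter },
    { $sample: { size } },
  ]);
  // hydrate() wraps a plain object in a document without querying again.
  return plainObjects.map((obj) => Model.hydrate(obj));
}
```

Usage would look like sampleAsDocuments(YourModel, { active: true }, N). If plain objects are enough, skipping the hydrate step avoids the extra work.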

Cursor Based Random Sampling

Another option is to skip to a random position in the collection and read N documents from there. Note that this returns a run of consecutive documents starting at a random offset, not N independently chosen ones:

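const getRandomDocumentsCursorBased = async (numberOfDocs) => {
    const count = await YourModel.countDocuments();
    // Keep the offset low enough that numberOfDocs documents remain after it
    const maxOffset = Math.max(count - numberOfDocs, 0);
    const random = Math.floor(Math.random() * (maxOffset + 1));
    const randomDocs = await YourModel.find().skip(random).limit(numberOfDocs);
    return randomDocs;
};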

getRandomDocumentsCursorBased(N).then(docs => console.log(docs)).catch(err => console.error(err));

Keep in mind that this approach slows down on large collections: skip() has to walk past every skipped document before it returns anything, so larger offsets take longer.
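
If N independently chosen documents are needed rather than one contiguous run, a variation on the skip approach is to draw N distinct offsets and fetch one document at each. This is a sketch under assumptions: pickDistinctOffsets and getRandomDocumentsByOffsets are hypothetical names, the model is passed in as a parameter, and the collection is assumed not to change between the count and the reads. It issues one query per document, so it only suits small values of N.

```javascript
// Hypothetical helper: choose up to n distinct offsets in [0, count).
// rng is injectable so the function can be tested deterministically.
function pickDistinctOffsets(count, n, rng = Math.random) {
  const wanted = Math.min(n, count);
  const offsets = new Set();
  while (offsets.size < wanted) {
    offsets.add(Math.floor(rng() * count));
  }
  return [...offsets];
}

// Fetch one document per offset. Model is any Mongoose model.
async function getRandomDocumentsByOffsets(Model, n) {
  const count = await Model.countDocuments();
  const offsets = pickDistinctOffsets(count, n);
  // Sorting by _id keeps the ordering stable across the separate queries.
  const docs = await Promise.all(
    offsets.map((offset) => Model.findOne().sort({ _id: 1 }).skip(offset))
  );
  // findOne() resolves to null if a document disappeared in the meantime.
  return docs.filter((doc) => doc !== null);
}
```

Each query still pays the cost of skip(), so this trades speed for independence between the selected documents.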

Advanced Techniques

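For very large collections, or when you need more control over the randomness, more advanced methods become useful. One approach is to store a precalculated random value on each document at creation time and query against that field. Sorting by the field alone would return the same lowest-valued documents on every call, so the query below starts from a random threshold and wraps around if too few documents lie above it.

const YourAdvancedSchema = new Schema({
  // your schema fields
  random: { type: Number, default: () => Math.random(), index: true }
});
const YourAdvancedModel = mongoose.model('YourAdvancedModel', YourAdvancedSchema);

// When fetching:
const getRandomDocumentsAdvanced = async (numberOfDocs) => {
  // Start from a random point so each call can return different documents
  const threshold = Math.random();
  const randomDocs = await YourAdvancedModel.find({ random: { $gte: threshold } })
    .sort('random')
    .limit(numberOfDocs);
  // Wrap around to the lowest values if too few documents lie above the threshold
  if (randomDocs.length < numberOfDocs) {
    const rest = await YourAdvancedModel.find({ random: { $lt: threshold } })
      .sort('random')
      .limit(numberOfDocs - randomDocs.length);
    randomDocs.push(...rest);
  }
  return randomDocs;
};

getRandomDocumentsAdvanced(N).then(docs => console.log(docs)).catch(err => console.error(err));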
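
The selection logic for a precalculated random field can be tried out in memory before it is written as a query. The sketch below is a simulation only, with no database involved: selectByRandomField is a hypothetical helper that takes documents at or above a threshold in ascending order of random, and wraps around to the lowest values when too few remain. The sample data is made up for illustration.

```javascript
// Hypothetical in-memory stand-in for a threshold query on a random field.
function selectByRandomField(docs, n, threshold) {
  const sorted = [...docs].sort((a, b) => a.random - b.random);
  const atOrAbove = sorted.filter((d) => d.random >= threshold);
  const below = sorted.filter((d) => d.random < threshold);
  // Documents at or above the threshold come first; the rest wrap around.
  return [...atOrAbove, ...below].slice(0, n);
}

// Illustrative data only.
const sampleDocs = [
  { name: 'a', random: 0.12 },
  { name: 'b', random: 0.47 },
  { name: 'c', random: 0.81 },
  { name: 'd', random: 0.93 },
];

// Threshold 0.5 starts at 'c'; threshold 0.9 starts at 'd' and wraps to 'a'.
const fromMiddle = selectByRandomField(sampleDocs, 2, 0.5).map((d) => d.name);
const wrapped = selectByRandomField(sampleDocs, 2, 0.9).map((d) => d.name);
console.log(fromMiddle, wrapped);
```

Sorting without a threshold corresponds to a threshold of 0, which selects the same lowest-valued documents every time.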

Whichever advanced method you choose, weigh the trade-offs in performance, quality of randomness, and the upkeep of extra fields or indexes.

Performance Considerations

The different approaches to fetching random documents vary considerably in performance. MongoDB’s native $sample stage is usually the fastest option because the selection happens inside the database engine. It can still slow down as the collection grows or when the sample is drawn from the results of a $match stage, since the server then has more documents to process before it can pick a sample.
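
According to MongoDB’s documentation for $sample, the server uses a fast pseudo-random cursor only when three conditions hold: $sample is the first pipeline stage, N is less than 5% of the documents in the collection, and the collection holds more than 100 documents. Otherwise it falls back to a collection scan followed by a random sort. The sketch below encodes that rule as a hypothetical helper, usesPseudoRandomCursor, which could be used to log a warning before running a query; the thresholds are taken from the documentation and may differ between server versions.

```javascript
// Hypothetical helper: estimate whether $sample can take its fast path.
// isFirstStage: true if $sample is the first stage of the pipeline.
// sampleSize:   the N passed to $sample.
// totalDocs:    the number of documents in the collection.
function usesPseudoRandomCursor({ isFirstStage, sampleSize, totalDocs }) {
  return isFirstStage && totalDocs > 100 && sampleSize < totalDocs * 0.05;
}

// 5 of 1000 documents is under 5%, so the fast path applies.
console.log(usesPseudoRandomCursor({ isFirstStage: true, sampleSize: 5, totalDocs: 1000 }));
// A preceding $match stage rules the fast path out.
console.log(usesPseudoRandomCursor({ isFirstStage: false, sampleSize: 5, totalDocs: 1000 }));
```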

Skip-based approaches are easy to reason about, but they can become prohibitively slow on large datasets because MongoDB has to walk through all the skipped documents before returning results.

Adding a random field and sorting by it also has a cost, especially under frequent writes: the extra index must be updated on every insert, and both the field and the index take up additional storage.

Conclusion

Random document retrieval in MongoDB with Mongoose requires understanding several methods and their trade-offs. From the simple $sample stage to more resource-intensive methods like skip-based sampling, the right choice depends on your use case, particularly the size of your collection and your performance requirements. Test each candidate with realistic data volumes before deciding which method to implement.
