Mongoose: How to get N random documents

Last updated: December 30, 2023

Introduction

Fetching random rows is a one-liner in most SQL databases, but in MongoDB it takes a little more thought. Mongoose, an object modeling tool for MongoDB and Node.js, has no dedicated method for fetching a random sample. However, by combining native MongoDB features with Mongoose’s query API, you can get the same result. This guide covers several ways to retrieve N random documents from a collection using Mongoose.

Basic Random Sampling

The simplest way to get a random document is to use MongoDB’s aggregation framework with the $sample stage. Below is a basic example that fetches a single random document:

const mongoose = require('mongoose');
const { Schema } = mongoose;

const yourSchema = new Schema({ /* Your schema definition */ });

const YourModel = mongoose.model('YourModel', yourSchema);

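const getRandomDocument = async () => {
    // aggregate() always resolves to an array, even when size is 1
    const [randomDoc] = await YourModel.aggregate([
        { $sample: { size: 1 } }
    ]);
    // null when the collection is empty
    return randomDoc ?? null;
};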

getRandomDocument().then(doc => console.log(doc)).catch(err => console.error(err));

To get N random documents, you can modify the size property in the $sample stage:

const N = 5; // Number of random documents you want

const getRandomDocuments = async (numberOfDocs) => {
    const randomDocs = await YourModel.aggregate([
        { $sample: { size: numberOfDocs } }
    ]);
    return randomDocs;
};

getRandomDocuments(N).then(docs => console.log(docs)).catch(err => console.error(err));
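
Two details are easy to miss with this approach: aggregate() returns plain JavaScript objects rather than Mongoose documents, and the MongoDB documentation notes that $sample may output the same document more than once. The sketch below is one way to handle both. It assumes a hypothetical helper name, sampleDocuments, and takes the model as a parameter; it removes duplicates by _id and converts each result with Model.hydrate().

```javascript
// Hypothetical helper (not part of Mongoose): sample, de-duplicate, hydrate.
// `Model` is any Mongoose model, e.g. YourModel from the examples above.
const sampleDocuments = async (Model, numberOfDocs) => {
    if (!Number.isInteger(numberOfDocs) || numberOfDocs < 1) {
        throw new RangeError('numberOfDocs must be a positive integer');
    }

    // aggregate() resolves to an array of plain objects
    const rawDocs = await Model.aggregate([
        { $sample: { size: numberOfDocs } }
    ]);

    // Keep only the first occurrence of each _id
    const seen = new Set();
    const uniqueDocs = rawDocs.filter((doc) => {
        const key = String(doc._id);
        if (seen.has(key)) return false;
        seen.add(key);
        return true;
    });

    // Turn plain objects into Mongoose documents (virtuals, getters, methods)
    return uniqueDocs.map((doc) => Model.hydrate(doc));
};
```

Because duplicates are dropped, the result can contain fewer than numberOfDocs items.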

Getting Random Documents with a Query

If you need to select random documents that meet certain criteria, add a $match stage before the $sample stage so that the sample is drawn only from matching documents:

const getRandomDocumentsWithQuery = async (query, numberOfDocs) => {
    const randomDocs = await YourModel.aggregate([
        { $match: query },
        { $sample: { size: numberOfDocs } }
    ]);
    return randomDocs;
};

// Example usage
const query = { active: true };
getRandomDocumentsWithQuery(query, N).then(docs => console.log(docs)).catch(err => console.error(err));
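
The order of the stages matters: if $sample ran before $match, the sample would be drawn from the whole collection and filtered afterwards, which can leave fewer than N results. Mongoose also does not cast values inside aggregation pipelines to schema types, so the query must already contain correctly typed values. Here is a minimal sketch of a pipeline builder that keeps the order fixed; buildRandomPipeline and its fields parameter are illustrative names, not Mongoose APIs.

```javascript
// Illustrative helper: builds the pipeline as plain data so it can be inspected or reused.
const buildRandomPipeline = (query, numberOfDocs, fields = []) => {
    if (!Number.isInteger(numberOfDocs) || numberOfDocs < 1) {
        throw new RangeError('numberOfDocs must be a positive integer');
    }

    const pipeline = [
        { $match: query },                   // filter first
        { $sample: { size: numberOfDocs } }  // then sample from the matches
    ];

    if (fields.length > 0) {
        // Optional: return only the listed fields (_id is kept by default)
        const projection = Object.fromEntries(fields.map((name) => [name, 1]));
        pipeline.push({ $project: projection });
    }

    return pipeline;
};

// Example usage with the model from above ('name' is a placeholder field):
// const docs = await YourModel.aggregate(buildRandomPipeline({ active: true }, 5, ['name']));
```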

Cursor-Based Random Sampling

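Another option is to skip to a random offset and read documents from there. The code below picks a random starting position and returns up to N consecutive documents from it, so the results are neighbors in the query’s result order rather than independent random picks: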

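const getRandomDocumentsCursorBased = async (numberOfDocs) => {
    const count = await YourModel.countDocuments();
    // Keep the offset low enough that numberOfDocs documents remain after it
    const maxOffset = Math.max(count - numberOfDocs, 0);
    const random = Math.floor(Math.random() * (maxOffset + 1));
    const randomDocs = await YourModel.find().skip(random).limit(numberOfDocs);
    return randomDocs;
};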

getRandomDocumentsCursorBased(N).then(docs => console.log(docs)).catch(err => console.error(err));

Note that this approach suffers from a performance penalty on large collections: skip() still has to walk past every skipped document, so the cost grows with the offset.
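
If independent picks matter more than a single round trip, one option is to draw N distinct offsets and fetch one document per offset, at the cost of N queries. The following is a sketch under that assumption; pickDistinctOffsets and getRandomDocumentsBySkip are hypothetical helper names, and the model is passed in as a parameter.

```javascript
// Draw up to n distinct integers in [0, count). rng is injectable for testing.
const pickDistinctOffsets = (count, n, rng = Math.random) => {
    const wanted = Math.min(n, count);
    const offsets = new Set();
    while (offsets.size < wanted) {
        offsets.add(Math.floor(rng() * count));
    }
    return [...offsets];
};

const getRandomDocumentsBySkip = async (Model, numberOfDocs) => {
    const count = await Model.countDocuments();
    const offsets = pickDistinctOffsets(count, numberOfDocs);

    // One query per offset; a fixed sort keeps offsets consistent across queries
    const docs = await Promise.all(
        offsets.map((offset) => Model.findOne().sort({ _id: 1 }).skip(offset).exec())
    );

    // A document may have been deleted between the count and the fetch
    return docs.filter((doc) => doc !== null);
};
```

The retry loop in pickDistinctOffsets is fine when N is small relative to the collection. When N approaches the collection size it needs many retries, and shuffling the full range of offsets would be a better fit.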

Advanced Techniques

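For very large collections, or when you need more control over the randomness, one approach is to store a precomputed random value on each document at creation time and query against it. Sorting by that field alone would return the same documents on every call, so the example below starts from a random pivot and wraps around when too few documents lie above it:

const YourAdvancedSchema = new Schema({
  // your schema fields
  // Indexed so range queries and sorts on `random` avoid a collection scan
  random: { type: Number, default: () => Math.random(), index: true }
});
const YourAdvancedModel = mongoose.model('YourAdvancedModel', YourAdvancedSchema);

// When fetching: start at a random pivot instead of always at the lowest values
const getRandomDocumentsAdvanced = async (numberOfDocs) => {
  const pivot = Math.random();
  const docs = await YourAdvancedModel.find({ random: { $gte: pivot } })
    .sort('random').limit(numberOfDocs);
  if (docs.length >= numberOfDocs) return docs;
  // Wrap around to the low end when too few documents lie above the pivot
  const rest = await YourAdvancedModel.find({ random: { $lt: pivot } })
    .sort('random').limit(numberOfDocs - docs.length);
  return docs.concat(rest);
};

getRandomDocumentsAdvanced(N).then(docs => console.log(docs)).catch(err => console.error(err));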

Whichever advanced method you choose, weigh the trade-offs in performance, quality of randomness, and the upkeep of additional fields or indexes. Documents created before the random field was introduced also need to be backfilled before they can be selected.

Performance Considerations

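Different approaches to fetching random documents have different performance characteristics. MongoDB’s native $sample stage is typically the fastest because the database engine optimizes it. According to the MongoDB documentation, when $sample is the first stage of the pipeline, the collection holds more than 100 documents, and N is less than 5% of the collection, the server selects documents with a pseudo-random cursor. Otherwise, for example when a $match stage precedes it, the server reads all incoming documents and performs a random sort, so the cost grows with the number of documents reaching the stage.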

Cursor-based approaches are simple to reason about, but skip() has to step over every skipped document, so query time grows with the offset and can become prohibitive on large datasets.

Adding a random field also comes at a cost, especially under frequent writes: every insert has to update the additional index, and both the field and the index take up storage.
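
Since the relative cost of these approaches depends on collection size and indexes, it helps to measure them against your own data. Below is a minimal sketch of a timing helper; timeAsync is a hypothetical name, and the mean of a handful of runs is only a rough indicator, not a rigorous benchmark.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical helper: runs an async function several times and reports the mean duration.
const timeAsync = async (label, fn, runs = 5) => {
    if (!Number.isInteger(runs) || runs < 1) {
        throw new RangeError('runs must be a positive integer');
    }

    const durations = [];
    for (let i = 0; i < runs; i++) {
        const start = performance.now();
        await fn();
        durations.push(performance.now() - start);
    }

    const meanMs = durations.reduce((sum, ms) => sum + ms, 0) / runs;
    return { label, runs, meanMs };
};

// Example usage with the functions defined earlier (inside an async function):
// console.log(await timeAsync('$sample', () => getRandomDocuments(N)));
// console.log(await timeAsync('skip/limit', () => getRandomDocumentsCursorBased(N)));
```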

Conclusion

Random document retrieval in MongoDB with Mongoose requires understanding the available methods and their trade-offs. From the $sample aggregation stage to more resource-intensive methods like cursor-based sampling, the right choice depends on your use case, in particular the size of your collection and your performance requirements. Test each option with realistic data volumes before deciding which one to implement.
