The maximum size of a document in MongoDB

Updated: February 2, 2024 By: Guest Contributor

Introduction

MongoDB, a leading NoSQL database, is widely known for its flexibility in handling large datasets and varied document structures. It stores data in the BSON (Binary JSON) format, which allows for the efficient storage and retrieval of complex documents. However, like any database, MongoDB has its limits, and one of them is the maximum size of a single document. In this article, we will explore MongoDB’s document size limit and best practices for managing large documents, supported by relevant code examples.

Understanding MongoDB’s Document Size Limit

MongoDB imposes a size limit of 16MB on a single document. This limitation ensures that a single document cannot grow beyond a manageable size, allowing for consistent performance even as your dataset grows. Let’s start with basic operations in MongoDB then move on to handling larger documents. We assume that you have MongoDB installed and a database called ‘testdb’.
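
Before relying on server-side errors, you can sanity-check a document’s size client-side. The sketch below approximates size via JSON serialization, which only roughly tracks the true BSON size (use Object.bsonsize in the shell, or the driver’s BSON utilities, for an exact figure); approxDocSize and fitsInOneDocument are illustrative helper names, not part of any API:

```javascript
// Rough sketch: approximate a document's serialized size and compare it
// to MongoDB's 16MB per-document limit. JSON length only approximates
// BSON size; for an exact figure use the driver's BSON utilities.
const MAX_DOC_BYTES = 16 * 1024 * 1024; // 16MB BSON document limit

function approxDocSize(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

function fitsInOneDocument(doc) {
  return approxDocSize(doc) <= MAX_DOC_BYTES;
}

const small = { name: 'Tutorial', description: 'Learn about MongoDB document sizes.' };
console.log(approxDocSize(small), fitsInOneDocument(small));
```

A check like this is a cheap early-warning signal in application code before an insert or update is attempted.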

To begin with, let’s insert a simple document into a collection called ‘simpledocs’. We’ll use the MongoDB shell:

use testdb
db.simpledocs.insertOne({name: 'Tutorial', description: 'Learn about MongoDB document sizes.'})

Let’s check the size of our inserted document.

db.simpledocs.stats().avgObjSize

This will return the average size of the objects within the collection in bytes.

Approaching the Limit

As your application evolves, your documents might grow in size. It’s essential to keep track of their sizes to prevent hitting the limit unexpectedly. Here’s how to find the largest documents in your collection:

db.simpledocs.aggregate([
  { $project: { size: { $bsonSize: '$$ROOT' } } },
  { $sort: { size: -1 } },
  { $limit: 1 }
]);

This pipeline computes each document’s BSON size with the $bsonSize operator (available since MongoDB 4.4), sorts the results in descending order, and returns the _id and size of the largest document. Note that sorting by $natural order would only return the most recently inserted document, not the largest one.

Working with Large Documents

If your use case requires handling large documents, you will want to be aware of their growth. Here’s a sample operation that could lead to a growth in a document’s size:

db.largeDocs.updateOne(
  { _id: 'largeDocId' },
  { $push: { largeArrayField: 'A lot of content that could contribute to document size growth' } }
);

It’s also important to frequently check the size of such documents:

var docSize = Object.bsonsize(db.largeDocs.findOne({ _id: 'largeDocId' }));
print('Document size in bytes: ' + docSize);

If you find that a document’s size is approaching 16MB, consider the Bucket pattern, which segments the data across multiple documents, or GridFS, a MongoDB specification for storing and retrieving files that exceed the BSON document size limit.
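
The Bucket pattern can be sketched in plain JavaScript. The document layout below (parentId, bucketIndex, entries) is an illustrative assumption, not a fixed schema:

```javascript
// Hypothetical sketch of the Bucket pattern: instead of pushing every
// entry into one ever-growing array field, split the entries across
// several documents, each holding at most `bucketSize` entries.
function toBuckets(parentId, entries, bucketSize) {
  const buckets = [];
  for (let i = 0; i < entries.length; i += bucketSize) {
    buckets.push({
      parentId,
      bucketIndex: buckets.length,
      entries: entries.slice(i, i + bucketSize),
    });
  }
  return buckets;
}

const buckets = toBuckets('largeDocId', ['a', 'b', 'c', 'd', 'e'], 2);
// Each bucket could then be inserted with db.buckets.insertMany(buckets).
```

Capping the entries per bucket keeps every individual document comfortably below the 16MB ceiling no matter how much data accumulates overall.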

GridFS and Large Files

When working with files larger than 16MB, MongoDB supports GridFS. The following snippet demonstrates how to store a large file using GridFS in a Node.js application.

const { MongoClient, GridFSBucket } = require('mongodb');
const fs = require('fs');

async function uploadLargeFile() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    const db = client.db('testdb');
    const bucket = new GridFSBucket(db, { bucketName: 'largeFiles' });

    // Pipe the file into GridFS and wait for the upload to finish.
    await new Promise((resolve, reject) => {
      fs.createReadStream('/path/to/large/file')
        .pipe(bucket.openUploadStream('largeFile'))
        .on('error', reject)
        .on('finish', resolve);
    });
    console.log('File uploaded successfully.');
  } finally {
    await client.close();
  }
}

uploadLargeFile().catch(console.error);

This connects to the ‘testdb’ database, creates a GridFS bucket, and uses a Node.js stream to read a file from the system and pipe it into MongoDB under the provided file name.
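
For intuition, the chunking step that GridFS performs under the hood can be sketched in plain JavaScript. This is an illustrative model only: the real driver also writes a metadata document to the files collection and uses a configurable chunkSizeBytes (255 KiB by default):

```javascript
// Illustrative sketch of how GridFS splits a file: the payload is stored
// as chunk documents of a fixed size, each well under the 16MB limit.
const DEFAULT_CHUNK_SIZE = 255 * 1024; // GridFS default chunk size in bytes

function chunkBuffer(buf, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0; offset < buf.length; offset += chunkSize) {
    chunks.push({ n: chunks.length, data: buf.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

const chunks = chunkBuffer(Buffer.alloc(600 * 1024)); // a 600 KiB payload
console.log(chunks.length); // 3 chunks: 255 + 255 + 90 KiB
```

Because each chunk is an ordinary document of bounded size, GridFS sidesteps the per-document limit entirely.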

Advanced Document Patterns

For complex applications, schema design patterns may be suitable: the Outlier pattern, which moves the overflow from unusually large documents into separate overflow documents, or the Extended Reference pattern, which keeps frequently accessed fields together while referencing the rest. These patterns can maximize the efficiency of your data model while keeping every document within the size limit. When using multiple documents to represent what would otherwise be one larger document, you employ references to link them:

db.extendedRef.insertMany([
  { _id: 'parentDoc', otherFields: '...'},
  { parentId: 'parentDoc', largeField: 'Content that necessitated extending into another document.' }
]);

You would use the ‘parentId’ field to join data when necessary.
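
As a sketch, the parent/child reference above could be resolved client-side like this (in the shell, an aggregation with $lookup achieves the same; the joinByParentId helper below is hypothetical, not a driver API):

```javascript
// Sketch: join two already-fetched result sets on the illustrative
// parentId field, attaching each child document to its parent.
function joinByParentId(parents, children) {
  return parents.map(parent => ({
    ...parent,
    extensions: children.filter(c => c.parentId === parent._id),
  }));
}

const parents = [{ _id: 'parentDoc', otherFields: '...' }];
const children = [{ parentId: 'parentDoc', largeField: 'Extended content' }];
const joined = joinByParentId(parents, children);
```

An index on the parentId field keeps the server-side equivalent of this lookup efficient.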

Monitoring Document Growth

Continuous monitoring is crucial when working near MongoDB’s document limit. Writing scripts or employing monitoring tools can assist in this regard. The earlier you detect an approaching limit, the easier it is to refactor the structure or apply patterns.
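
A minimal monitoring check might look like the following sketch; the 80% threshold is an arbitrary example for illustration, not a MongoDB recommendation:

```javascript
// Sketch of a size monitor: flag documents whose size has crossed a
// fraction of the 16MB limit, so they can be refactored before inserts
// or updates start failing.
const MAX_DOC_BYTES = 16 * 1024 * 1024;

function isNearLimit(sizeInBytes, threshold = 0.8) {
  return sizeInBytes >= MAX_DOC_BYTES * threshold;
}

console.log(isNearLimit(15 * 1024 * 1024)); // true — 15MB is past 80% of 16MB
console.log(isNearLimit(1024));             // false
```

Feeding this check with sizes from Object.bsonsize (shell) or $bsonSize (aggregation) turns it into a simple scheduled audit script.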

Best Practices Summarized

  • Regularly check the size of your documents and approach data architecture with the 16MB limit in mind.
  • Consider schema design patterns that break up large documents when possible.
  • Use GridFS for files that naturally exceed 16MB.
  • Implement monitoring to proactively handle document growth.

Conclusion

The 16MB document size limit in MongoDB is a guideline that fosters good data modeling and promotes consistency in database performance. By understanding this limit and implementing strategies to deal with large documents, you can ensure that your MongoDB deployments remain scalable and efficient.