Understanding MongoDB capped collections (fixed-size collections)

Updated: February 2, 2024 By: Guest Contributor

Introduction

MongoDB, a popular NoSQL database, provides a powerful feature called capped collections. These are special types of collections that maintain insertion order and have a fixed size. Once the space is filled, older documents are overwritten by new ones. This behavior makes capped collections ideal for scenarios such as logging or caching where you only need to store the most recent entries. In this article, we’ll explore capped collections in MongoDB and learn how to work with them effectively through several code examples.
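To make the overwrite behavior concrete, here is a toy in-memory model of size-based, oldest-first eviction. It is only an illustration: the class name ToyCappedCollection is made up for this sketch, document size is approximated by the length of the JSON text, and none of this reflects how MongoDB stores data internally.

```javascript
// Toy model of a capped collection (hypothetical, for illustration only).
class ToyCappedCollection {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.docs = [];      // kept in insertion order
    this.usedBytes = 0;
  }

  // Assumption: approximate a document's size by its JSON text length.
  static sizeOf(doc) {
    return JSON.stringify(doc).length;
  }

  insert(doc) {
    const bytes = ToyCappedCollection.sizeOf(doc);
    if (bytes > this.maxBytes) {
      throw new Error("document is larger than the whole collection");
    }
    // Evict the oldest documents until the new one fits.
    while (this.usedBytes + bytes > this.maxBytes) {
      const oldest = this.docs.shift();
      this.usedBytes -= ToyCappedCollection.sizeOf(oldest);
    }
    this.docs.push(doc);
    this.usedBytes += bytes;
  }
}

// Each document below is 25 characters of JSON, so a 60-byte cap holds two.
const toy = new ToyCappedCollection(60);
for (let i = 1; i <= 5; i++) {
  toy.insert({ n: i, msg: "log entry" });
}
console.log(toy.docs.map(d => d.n)); // only the two newest documents remain
```

After five inserts, only documents 4 and 5 are left; the three oldest were evicted in insertion order.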

Creating Capped Collections

To start, you’ll need to have MongoDB installed and running on your local machine or server. Open the MongoDB shell (mongosh), and let’s create a new capped collection using the createCollection method with the capped option set to true:

db.createCollection("logs", { capped: true, size: 1024*1024 })

This example creates a capped collection named ‘logs’ with a maximum size of 1 MiB; the size option is given in bytes. Once the size limit is reached, MongoDB removes the oldest documents to make room for new ones, in first-in, first-out (FIFO) order.
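In addition to size, createCollection accepts an optional max option that limits the number of documents; size is still required, and the size limit takes precedence if it is reached first. The sketch below builds such an options document. The helper name cappedOptions and the limit of 5000 documents are assumptions chosen for illustration.

```javascript
// Hypothetical helper (not part of MongoDB): builds the options document
// for db.createCollection(). `size` is in bytes and is always required for
// a capped collection; `max` optionally limits the number of documents.
function cappedOptions(sizeBytes, maxDocs) {
  if (!Number.isInteger(sizeBytes) || sizeBytes <= 0) {
    throw new Error("size must be a positive integer number of bytes");
  }
  const options = { capped: true, size: sizeBytes };
  if (maxDocs !== undefined) {
    options.max = maxDocs;
  }
  return options;
}

const logOptions = cappedOptions(1024 * 1024, 5000);
// In mongosh, with a server connection:
//   db.createCollection("logs", logOptions);
```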

Inserting Documents

Now that we have our capped collection, inserting documents is similar to how we would insert into any other collection:

db.logs.insertOne({
  level: "info",
  message: "This is an info log.",
  timestamp: new Date()
});

Here, we add a single document containing a log level, a message, and a timestamp. To insert multiple documents, you can use insertMany.
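As a sketch of the insertMany case, the snippet below prepares a small batch of log documents with the same fields as above. The helper makeLogEntry and the sample messages are made up for illustration; the actual insert is shown as a comment because it needs a server connection.

```javascript
// Hypothetical helper: builds a log document with the fields used above.
function makeLogEntry(level, message, timestamp = new Date()) {
  return { level: level, message: message, timestamp: timestamp };
}

const batch = [
  makeLogEntry("info", "Service started."),
  makeLogEntry("warn", "Disk usage above 80%."),
  makeLogEntry("error", "Upstream request timed out.")
];

// In mongosh, insert the whole batch with one call:
//   db.logs.insertMany(batch);
```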

Querying Capped Collections

Querying documents in a capped collection doesn’t differ from querying normal collections. However, when you do not specify a sort order, documents in a capped collection are returned in insertion order:

db.logs.find()

Your output will show all logs in the order they were entered, displaying them with the newest log last by default.
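To read the newest entries first, you can sort on $natural in descending order. The function below is a minimal sketch: the name newestLogs is an assumption, and database stands for the mongosh db object.

```javascript
// Sketch: return the `limit` most recently inserted documents, newest first.
// `database` is expected to be the mongosh `db` object (or a stand-in).
function newestLogs(database, limit) {
  return database
    .getCollection("logs")
    .find()
    .sort({ $natural: -1 }) // reverse insertion order
    .limit(limit)
    .toArray();
}

// In mongosh:
//   newestLogs(db, 5).forEach(doc => printjson(doc));
```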

Watching Real-time Changes

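Capped collections lend themselves to real-time processing because they preserve insertion order. One way to follow new entries is a change stream, opened with the watch method. Change streams are not specific to capped collections, and they require a replica set (or a sharded cluster) rather than a standalone server. In mongosh, you poll the cursor that watch returns:

const changeStream = db.logs.watch();
while (!changeStream.isClosed()) {
    // tryNext() returns null when no change event is available yet
    const next = changeStream.tryNext();
    if (next !== null) {
        printjson(next);
    } else {
        sleep(1000);
    }
}

This snippet opens a change stream on the logs collection and prints each change event, such as an insert or an update, as it arrives. In applications such as dashboards or live feeds, this functionality is invaluable.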
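The watch method also accepts an aggregation pipeline, which lets you narrow the stream to the events you care about. As a sketch, the pipeline below keeps only insert events whose document has level "error"; the function name errorInsertPipeline is made up for this example.

```javascript
// Sketch: a change stream pipeline that keeps only inserts of error-level logs.
// For insert events, the new document is available under `fullDocument`.
function errorInsertPipeline() {
  return [
    {
      $match: {
        operationType: "insert",
        "fullDocument.level": "error"
      }
    }
  ];
}

// In mongosh, connected to a replica set:
//   const errorStream = db.logs.watch(errorInsertPipeline());
```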

Auto-Expire Documents Based on Time

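A capped collection evicts documents by size, not by age. When you need time-based expiry instead, the usual tool is a TTL (Time To Live) index. The MongoDB manual has long listed TTL indexes as incompatible with capped collections, so the example below assumes a regular (non-capped) collection, hypothetically named timedLogs; check the documentation for your server version before combining the two:

db.timedLogs.createIndex(
  { "timestamp": 1 },
  { expireAfterSeconds: 3600 }
);

This creates an index on the timestamp field and removes documents roughly one hour after their timestamp value. Removal is done by a background task that runs periodically, so expired documents can linger briefly. Note that the indexed field must hold a BSON date (or an array of dates); documents where the field is missing or has another type never expire.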
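The expiry rule behind a TTL index can be sketched as a small predicate: a document becomes eligible once the indexed date plus expireAfterSeconds has passed. The function below is only a model of that rule for a single date value (it ignores arrays of dates), and its name is an assumption.

```javascript
// Sketch of the TTL rule: eligible when fieldValue + expireAfterSeconds <= now.
// Simplification: only handles a single Date value, not an array of dates.
function isEligibleForTtlRemoval(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  if (!(value instanceof Date)) {
    return false; // missing or non-date values never expire
  }
  return value.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const checkTime = new Date("2024-02-02T12:00:00Z");
const oldEntry = { timestamp: new Date("2024-02-02T10:30:00Z") };
const recentEntry = { timestamp: new Date("2024-02-02T11:30:00Z") };

console.log(isEligibleForTtlRemoval(oldEntry, "timestamp", 3600, checkTime));    // true
console.log(isEligibleForTtlRemoval(recentEntry, "timestamp", 3600, checkTime)); // false
```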

Converting Collections to Capped

If you have an existing collection that you would like to convert to a capped collection, MongoDB provides the convertToCapped command:

db.runCommand({
    convertToCapped: 'myCollection',
    size: 100000
})

This command converts an existing collection named myCollection into a capped collection with a maximum size of 100,000 bytes. Be careful with this command: if the existing data exceeds the new size limit, MongoDB discards the oldest documents, in insertion order, during the conversion.
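Before converting, it can help to compare the collection's current data size with the target size. The function below is a rough pre-check under stated assumptions: the name conversionCheck is made up, database stands for the mongosh db object, and the comparison uses dataSize(), which only approximates how the capped limit is accounted for.

```javascript
// Sketch: estimate whether convertToCapped would discard documents.
// `database` is expected to be the mongosh `db` object (or a stand-in).
function conversionCheck(database, name, targetBytes) {
  // dataSize() reports the uncompressed size of the documents in bytes.
  const currentBytes = database.getCollection(name).dataSize();
  return {
    currentBytes: currentBytes,
    targetBytes: targetBytes,
    wouldDiscardDocuments: currentBytes > targetBytes
  };
}

// In mongosh:
//   printjson(conversionCheck(db, "myCollection", 100000));
```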

Tailing Cursors

For more advanced use-cases, you can create a tailable cursor that allows you to tail the contents of the capped collection, similar to the Unix tail -f:

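// awaitData asks the server to wait briefly for new documents before replying
const cursor = db.logs.find().tailable({ awaitData: true });
while (!cursor.isClosed()) {
    // tryNext() returns null when no new document is available yet
    const doc = cursor.tryNext();
    if (doc === null) {
        sleep(1000);
    } else {
        printjson(doc);
    }
}

A tailable cursor stays open after it returns the last document, so you can use it to watch continuously for new documents, which is useful for logging or real-time analysis systems. It can still become invalid, for example if the initial query matches no documents or if the document it is positioned on is removed, in which case you need to run the query again.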
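If a tailing loop has to be restarted, for example after the application itself restarts, one common approach is to remember the last _id you processed and reopen the cursor from there. The sketch below assumes that _id values increase with insertion order, which default ObjectIds only approximate when several clients write at once; the function name openTail is hypothetical.

```javascript
// Sketch: open a tailable cursor on "logs", optionally resuming after a known _id.
// Assumption: _id values grow with insertion order (roughly true for ObjectIds).
// `database` is expected to be the mongosh `db` object (or a stand-in).
function openTail(database, lastSeenId) {
  const filter = lastSeenId === undefined ? {} : { _id: { $gt: lastSeenId } };
  return database
    .getCollection("logs")
    .find(filter)
    .tailable({ awaitData: true });
}

// In mongosh:
//   let cursor = openTail(db);            // start from the beginning
//   cursor = openTail(db, lastDoc._id);   // resume after the last document seen
```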

Managing Capped Collections

Capped collections are not without their limitations. For example, depending on your server version, you may not be able to delete individual documents from a capped collection (older releases reject such deletes outright). To remove all documents, you can drop the collection instead:

db.logs.drop()

This command completely removes the capped collection. After that, you would have to recreate it if needed.
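Dropping and recreating can be wrapped in one step that reuses the collection's existing options. The function below is a sketch, not a built-in: the name resetCappedCollection is an assumption, database stands for the mongosh db object, and any secondary indexes are lost and would need to be recreated separately.

```javascript
// Sketch: empty a capped collection by dropping it and recreating it with
// the options it had before (for example { capped: true, size: 1048576 }).
// Note: secondary indexes are dropped with the collection and not restored.
function resetCappedCollection(database, name) {
  const [info] = database.getCollectionInfos({ name: name });
  if (!info || !info.options || !info.options.capped) {
    throw new Error(name + " is not an existing capped collection");
  }
  const options = info.options;
  database.getCollection(name).drop();
  database.createCollection(name, options);
  return options;
}

// In mongosh:
//   resetCappedCollection(db, "logs");
```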

Best Practices

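  • Use an acknowledged write concern so that failed inserts are reported back to the application. Keep in mind that with concurrent writers, MongoDB does not guarantee that documents are returned in insertion order.
  • Capped collections suit insert-heavy workloads such as logging, where documents are appended and rarely changed. Measure throughput under your own load, since writes to a capped collection are serialized and many concurrent writers may not scale as well as on a regular collection.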
  • Monitor the size of your capped collections to ensure they are tuned correctly for the use case.
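One way to monitor a capped collection is to compare the size and maxSize fields reported by the collection statistics. The helper below is a minimal sketch that works on a stats document; the name cappedUsage is made up, and the sample numbers are illustrative, not measurements.

```javascript
// Sketch: fraction of a capped collection's byte limit currently in use.
// Expects a stats document with `capped`, `size` and `maxSize` fields,
// such as the one returned by db.logs.stats() in mongosh.
function cappedUsage(stats) {
  if (!stats.capped) {
    throw new Error("collection is not capped");
  }
  return stats.size / stats.maxSize;
}

// Illustrative stats document (made-up numbers):
const sampleStats = { capped: true, size: 786432, maxSize: 1048576 };
console.log(cappedUsage(sampleStats)); // 0.75

// In mongosh:
//   cappedUsage(db.logs.stats());
```

A value that sits at or near 1 is expected once the collection has filled up; what matters is whether the documents it still holds cover a long enough time window for your use case.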

Advanced Considerations

The straightforward nature of capped collections in MongoDB belies some important constraints. Capped collections cannot be sharded, so each one lives on a single replica set or shard and must be sized with that in mind. Replication itself relies on the same mechanism: the replica set oplog is a capped collection. You can also add secondary indexes, including compound indexes, to support queries that do not follow insertion order, at the cost of extra work on every insert.

Conclusion

Capped collections offer a versatile and efficient means to manage fixed-size and high-throughput data in MongoDB. With careful planning and understanding of their behaviors, developers can leverage capped collections to craft superior applications with built-in data eviction policies for logging, caching, and real-time streaming analytics.