How to view the disk space used by MongoDB

Updated: February 1, 2024 By: Guest Contributor

Introduction

Understanding the amount of disk space utilized by MongoDB is crucial for maintaining the performance and scalability of applications. Whether you’re a system administrator or a developer, knowing how to check and manage disk usage can help you make informed decisions when it comes to resource allocation, monitoring, and optimization. This tutorial will guide you through various approaches to view and analyze MongoDB’s disk space consumption.

Basic MongoDB Disk Usage Information

Let’s start with the simplest methods to check disk usage for MongoDB.

Using the ‘db.stats()’ Method

// Connect to your MongoDB shell and select the database.
use myDatabase

// Retrieve the statistics for the current database.
db.stats()
{
  "db" : "myDatabase",
  "collections" : 3,
  "views" : 0,
  ...,
  "fileSize" : NumberLong(207775744),
  "storageSize" : NumberLong(204472320),
  ...,
  "dataSize" : NumberLong(27876710),
  ...
}

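Here, dataSize is the total size of the uncompressed document data, and storageSize is the space allocated on disk to store the documents of all collections (excluding indexes). With WiredTiger compression, storageSize can be smaller than dataSize. The fileSize field appears only with the legacy MMAPv1 storage engine, where it reports the total size of the database’s data files; WiredTiger deployments do not return it. To estimate the on-disk footprint there, add indexSize to storageSize.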

Keep in mind that ‘db.stats()’ reports sizes in bytes by default. Pass a scale factor, for example ‘db.stats(1024 * 1024)’, to get the values in mebibytes instead.
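Raw byte counts are hard to read at a glance. The sketch below is one way to turn the size fields of a db.stats() result into readable units. The helper names formatBytes and summarizeDbStats are illustrative and not part of MongoDB, and the code assumes the fields convert cleanly with Number() and that indexSize is present in the result.

```javascript
// Illustrative helper (not part of MongoDB): convert a byte count to readable units.
function formatBytes(bytes) {
  const units = ["B", "KiB", "MiB", "GiB", "TiB"];
  let value = Number(bytes); // Assumption: NumberLong/Long values convert via Number()
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return value.toFixed(2) + " " + units[i];
}

// Summarize the size fields of a db.stats() result document.
function summarizeDbStats(stats) {
  const storage = Number(stats.storageSize);
  const index = Number(stats.indexSize || 0); // Assumption: indexSize is reported
  return {
    db: stats.db,
    dataSize: formatBytes(stats.dataSize),
    storageSize: formatBytes(storage),
    indexSize: formatBytes(index),
    storagePlusIndexes: formatBytes(storage + index)
  };
}

// In the shell: printjson(summarizeDbStats(db.stats()));
```

Applied to the sample output above, formatBytes would render the storageSize and dataSize values in MiB rather than as nine- and eight-digit byte counts.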

Checking Collection-level Statistics

// Connect to your MongoDB shell and select the database.
use myDatabase

// Retrieve stats for a particular collection.
db.myCollection.stats()
{
  "ns" : "myDatabase.myCollection",
  ...
  "size" : 16384, // Size of the collection’s data in bytes
  "count" : 4,    // Number of documents in the collection
  "storageSize" : 32768, // Allocated storage space in bytes
  ...
}

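Keep an eye on storageSize for an estimate of the physical space the collection’s documents occupy on disk. It excludes indexes (see totalIndexSize) and, with WiredTiger, reflects compressed data plus any free space inside the data file that has not been returned to the operating system. Preallocated space and padding apply only to the legacy MMAPv1 engine.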
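To find which collections take the most space, the per-collection statistics can be gathered and sorted. The following is a minimal sketch: rankCollectionsByFootprint is a hypothetical helper name, and it assumes each input document carries the ns, storageSize, and totalIndexSize fields that collection stats report.

```javascript
// Illustrative helper: rank collection stats documents by on-disk footprint
// (document storage plus indexes), largest first.
function rankCollectionsByFootprint(statsList) {
  return statsList
    .map(function (s) {
      const storage = Number(s.storageSize);
      const indexes = Number(s.totalIndexSize || 0);
      return {
        ns: s.ns,
        storageSize: storage,
        totalIndexSize: indexes,
        footprint: storage + indexes
      };
    })
    .sort(function (a, b) { return b.footprint - a.footprint; });
}

// In the shell, one way to build the input (assumes the database has no views,
// since stats() is not available on a view):
// var all = db.getCollectionNames().map(function (n) { return db.getCollection(n).stats(); });
// rankCollectionsByFootprint(all).forEach(function (c) { print(c.ns + ": " + c.footprint); });
```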

Advanced Disk Usage Commands

Moving to more advanced operations, you can utilize additional shell methods and tools to monitor the disk space in MongoDB.

Aggregating Data Size Across Databases

db.adminCommand({ listDatabases: 1 }).databases.forEach(function(database) {
  var dbStats = db.getSiblingDB(database.name).stats();
  print(database.name + ': ' + tojson(dbStats.dataSize));
});
admin: 24576
test: 8192
myDatabase: 27876710
...

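This script prints the data size of each database in a MongoDB instance, which is useful for comparing usage across databases. Note that dataSize is the uncompressed size of the documents, so it can differ from the space the database actually occupies on disk.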
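The listDatabases result also includes a sizeOnDisk value for each database, so on-disk shares can be computed without calling stats() per database. A small sketch, assuming the result has the usual databases array with name and sizeOnDisk fields; databaseDiskShares is an illustrative name:

```javascript
// Illustrative helper: compute each database's share of the total on-disk size
// from a listDatabases result, largest first.
function databaseDiskShares(listDatabasesResult) {
  const dbs = listDatabasesResult.databases || [];
  const total = dbs.reduce(function (sum, d) {
    return sum + Number(d.sizeOnDisk);
  }, 0);
  return dbs
    .map(function (d) {
      const size = Number(d.sizeOnDisk);
      return {
        name: d.name,
        sizeOnDisk: size,
        percent: total > 0 ? (size / total) * 100 : 0
      };
    })
    .sort(function (a, b) { return b.sizeOnDisk - a.sizeOnDisk; });
}

// In the shell: printjson(databaseDiskShares(db.adminCommand({ listDatabases: 1 })));
```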

Using File System Tools

In addition to MongoDB methods, file system tools such as du (Disk Usage) can be employed to measure the size of MongoDB’s storage directory directly in the file system:

du -sh /var/lib/mongodb
2.3G    /var/lib/mongodb

The command reports the total size of the data directory (substitute your own dbPath if it differs). Adding the -a flag lists every file, which shows how much space individual files take:

du -ah /var/lib/mongodb
...

MongoDB’s WiredTiger Storage Engine

WiredTiger, MongoDB’s default storage engine since version 3.2, offers additional considerations and metrics for disk usage analysis:

Viewing WiredTiger Metrics

db.serverStatus().wiredTiger.cache
{
  "bytes currently in the cache": 536870912,
  "maximum bytes configured": 1073741824,
  ...
  "tracked dirty bytes in the cache": 358400,
  ...
}

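This output is a snapshot of WiredTiger’s in-memory cache. It does not measure disk usage, but cache pressure directly affects performance, and the dirty bytes show how much modified data is still held in memory.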
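The three values shown are easier to interpret as percentages of the configured maximum. The helper below is a sketch under the assumption that the cache document contains exactly the keys shown in the sample output; cacheUtilization is a hypothetical name.

```javascript
// Illustrative helper: express WiredTiger cache usage as percentages of the
// configured maximum, using the keys from db.serverStatus().wiredTiger.cache.
function cacheUtilization(cache) {
  const max = Number(cache["maximum bytes configured"]);
  const used = Number(cache["bytes currently in the cache"]);
  const dirty = Number(cache["tracked dirty bytes in the cache"]);
  return {
    usedPercent: (used / max) * 100,
    dirtyPercent: (dirty / max) * 100
  };
}

// In the shell: printjson(cacheUtilization(db.serverStatus().wiredTiger.cache));
```

With the sample values above, the cache is half full (536870912 of 1073741824 bytes).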

Analyzing Storage Efficiency

Assessing storage efficiency within WiredTiger involves reviewing compression ratios and other statistics:

db.collection.stats().wiredTiger
{
  "block-manager": {
    "file allocation unit size": 4096,
    "blocks allocated": 256,
    ...
  },
  ...
  "compression": {
    "compressed pages read": 3466,
    "compressed pages written": 19923,
    "page written failed to compress": 0,
    "page read failed to decompress": 0,
    "compressed bytes written": 16450,
    "uncompressed bytes read": 8210,
    ...
  },
  ...
}

Monitoring compression metrics can help determine how effectively data is being compressed, while block-manager statistics can provide insight into how disk space is being managed at a lower level.
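A rough way to gauge storage efficiency is to compare a collection’s uncompressed size with its storageSize, and to check how much space inside the data file is free for reuse. The sketch below assumes the stats document has size and storageSize fields and, when WiredTiger details are present, a "file bytes available for reuse" entry under block-manager. The name storageEfficiency is illustrative, and the ratio is only an estimate because storageSize also includes that reusable space.

```javascript
// Illustrative helper: estimate how compactly a collection is stored.
// dataToStorageRatio > 1 suggests compression is saving space; < 1 suggests
// overhead or free space dominates (common for very small collections).
function storageEfficiency(stats) {
  const size = Number(stats.size);
  const storage = Number(stats.storageSize);
  const blockManager = (stats.wiredTiger && stats.wiredTiger["block-manager"]) || {};
  // Assumption: this key is present in the block-manager section.
  const reusable = Number(blockManager["file bytes available for reuse"] || 0);
  return {
    dataToStorageRatio: storage > 0 ? size / storage : null,
    reusableBytes: reusable
  };
}

// In the shell: printjson(storageEfficiency(db.myCollection.stats()));
```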

Tuning Storage Settings

Once disk usage and statistics are understood, MongoDB provides ways to tune how data is stored and managed, potentially leading to more efficient disk usage:

Adjusting the WiredTiger Cache

You may decide to adjust your WiredTiger cache settings based on the reported usage:

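// Set the cache size to 1 GB at runtime. The change does not survive a restart;
// to persist it, set storage.wiredTiger.engineConfig.cacheSizeGB in the config file.
db.adminCommand({ "setParameter": 1, "wiredTigerEngineRuntimeConfig": "cache_size=1G" });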

Configure these settings with caution, and always monitor the impact of such changes.

Sharding and Disk Usage

For highly scalable MongoDB deployments, sharding partitions large datasets across multiple machines, influencing disk space requirements:

db.printShardingStatus()
{
  "shardedDBs" : {
    "myShardedDB" : {
        "collections": 2,
        "shards": {
            "shard0000" : {"size" : "912MiB"},
            "shard0001" : {"size" : "890MiB"}
        },
        ...
    },
    ...
  }
}

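The output above is a simplified illustration of how data can be spread across shards. The actual output of db.printShardingStatus() is a text report listing the shards, the sharded databases, and the chunk distribution rather than per-shard sizes. To see how much data a sharded collection holds on each shard, run db.myCollection.getShardDistribution().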
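Per-shard figures can also be pulled out of collection statistics. The sketch below assumes that stats() for a sharded collection, run through mongos, returns a shards sub-document keyed by shard name, each entry holding size and storageSize. Treat that shape as an assumption to verify on your version; perShardStorage is a hypothetical helper name.

```javascript
// Illustrative helper: extract per-shard data and storage sizes from the stats
// document of a sharded collection.
function perShardStorage(collStats) {
  const shards = collStats.shards || {}; // Assumption: keyed by shard name
  return Object.keys(shards).map(function (name) {
    return {
      shard: name,
      size: Number(shards[name].size),
      storageSize: Number(shards[name].storageSize)
    };
  });
}

// In a shell connected to mongos: printjson(perShardStorage(db.myCollection.stats()));
```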

Conclusion

Monitoring disk space usage in MongoDB is essential for maintaining performance and scalability. MongoDB shell commands and file system tools together give a detailed picture of database and collection storage. As your database grows, check disk usage routinely so that you do not reach storage capacity unexpectedly.