Python: Calculating total size of a folder and its contents

Updated: January 13, 2024 By: Guest Contributor

Introduction

Calculating the total size of a folder and its contents is a common task in many computing scenarios. Whether you’re monitoring disk usage, performing housekeeping on your file system, or simply curious about the amount of data a directory contains, Python offers efficient and easy-to-understand methods to get the job done. In this tutorial, we will explore how to calculate the size of a directory using Python’s built-in libraries.

The os module

Python’s built-in os module is the cornerstone of our folder size calculation operation. It provides a portable way of using operating system-dependent functionality such as reading, writing, and managing files and directories.

To calculate the size of a directory, we need to:

  1. Iterate over each file in the directory and its subdirectories.
  2. Get the size of each file.
  3. Add up the sizes for a total sum.

Walking the directory tree

To begin, we must navigate the directory tree. This can be accomplished using os.walk(), which visits the folder and every subfolder beneath it. For each directory visited, it yields a tuple of three values:

  • dirpath – the path to the directory
  • dirnames – a list of subdirectory names in dirpath
  • filenames – a list of the names of the non-directory files in dirpath

Example:

import os

folder_path = '/path/to/your/folder'
for dirpath, dirnames, filenames in os.walk(folder_path):
    print(f'Found directory: {dirpath}')
    for file in filenames:
        print(file)

The above code prints every directory under the specified folder path, each followed by the names of the files it contains.
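
Because os.walk() visits directories top-down by default, removing entries from dirnames in place tells it not to descend into them. The sketch below uses this to skip hidden directories. The helper name list_visible_files and the rule that "hidden" means a name starting with a dot are assumptions made for illustration.

```python
import os


def list_visible_files(folder_path):
    """Return paths of files under folder_path, skipping hidden directories.

    Hypothetical helper: "hidden" is assumed to mean a name starting with '.'.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(folder_path):
        # Slice assignment edits the very list os.walk() holds, so the
        # removed subdirectories are never visited (top-down mode only).
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found
```

Rebinding the name instead (dirnames = [...]) would have no effect, since os.walk() would still see the original list.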

Calculating the size of files

The next step is to obtain the size of each file. We can do this with os.path.getsize(), which returns the size in bytes of a given file path.

Example:

import os

folder_path = '/path/to/your/folder'
total_size = 0

for dirpath, dirnames, filenames in os.walk(folder_path):
    for file in filenames:
        file_path = os.path.join(dirpath, file)
        total_size += os.path.getsize(file_path)

print(f'Total size of the folder {folder_path} is: {total_size} bytes')

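This code snippet adds up the sizes of all files under the given folder and prints the total in bytes. The result is the sum of the file sizes, which can differ from the space the folder actually occupies on disk.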
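
For reuse, the same loop can be wrapped in a function. This is a minimal sketch: the name get_folder_size is an assumption, and the temporary directory exists only to give the demonstration a known result.

```python
import os
import tempfile


def get_folder_size(folder_path):
    """Return the total size in bytes of all files under folder_path."""
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(folder_path):
        for name in filenames:
            total_size += os.path.getsize(os.path.join(dirpath, name))
    return total_size


# Demonstration on a throwaway directory holding a known amount of data.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    with open(os.path.join(tmp, 'a.bin'), 'wb') as f:
        f.write(b'x' * 100)
    with open(os.path.join(tmp, 'sub', 'b.bin'), 'wb') as f:
        f.write(b'y' * 250)
    demo_size = get_folder_size(tmp)

print(demo_size)  # 350
```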

Handling Exceptions

When working with files and folders, it’s possible to encounter errors such as FileNotFoundError or PermissionError. It’s good practice to handle these exceptions so that our script doesn’t crash during execution.

Example code with exception handling:

import os

folder_path = '/path/to/your/folder'
total_size = 0

for dirpath, dirnames, filenames in os.walk(folder_path):
    for file in filenames:
        file_path = os.path.join(dirpath, file)
        if os.path.islink(file_path):  # Skip symbolic links
            continue
        try:
            total_size += os.path.getsize(file_path)
        except OSError as e:  # Includes FileNotFoundError and PermissionError
            # Report the problem file and keep going instead of aborting the scan
            print(f'Skipping {file_path}: {e}')

print(f'Total size of the folder {folder_path} is: {total_size} bytes')

This additional if statement skips symbolic links. os.path.getsize() follows a link and returns the size of the file it points to, so counting links could add the same file twice or include data stored outside the folder, and a broken link would raise an error.
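
Note that os.walk() ignores errors raised while listing a directory unless a callback is passed through its onerror parameter, so an unreadable subfolder is otherwise skipped without any message. The following sketch collects those errors instead. The function name folder_size_with_errors and the choice to return the errors as a list are assumptions made for illustration.

```python
import os


def folder_size_with_errors(folder_path):
    """Return (total_bytes, errors) for folder_path.

    Hypothetical helper: errors holds every OSError met while listing
    directories or reading file sizes, so nothing is skipped silently.
    """
    total_size = 0
    errors = []

    # os.walk() calls onerror with the OSError instance when it cannot
    # list a directory; errors.append simply records it.
    for dirpath, dirnames, filenames in os.walk(folder_path, onerror=errors.append):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if os.path.islink(file_path):
                continue  # skip symbolic links, as in the example above
            try:
                total_size += os.path.getsize(file_path)
            except OSError as exc:
                errors.append(exc)  # e.g. the file was removed mid-scan

    return total_size, errors
```

Called on a path that does not exist, this returns a total of 0 together with one recorded FileNotFoundError, which makes the empty result explainable.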

Formatting the Output

While the size in bytes is exact, it isn’t always the most user-friendly way to understand size. To make our output more readable, let’s format the size to a more conventional measurement like kilobytes (KB), megabytes (MB), or gigabytes (GB).

Example code with formatted output:

import os


def format_size(size_bytes, unit='MB'):
    """Format a size in bytes as a string in the selected unit."""
    # Binary multiples: each unit is 1024 times the previous one
    units = {
        'KB': 1024,
        'MB': 1024**2,
        'GB': 1024**3,
        'TB': 1024**4
    }
    if unit not in units:
        raise ValueError('Unsupported unit. Available units: KB, MB, GB, TB')

    formatted_size = size_bytes / units[unit]
    return f'{formatted_size:.2f} {unit}'

folder_path = '/path/to/your/folder'
total_size = 0

for dirpath, dirnames, filenames in os.walk(folder_path):
    for file in filenames:
        file_path = os.path.join(dirpath, file)
        if not os.path.islink(file_path):
            total_size += os.path.getsize(file_path)

formatted_total_size = format_size(total_size, 'MB')
print(f'Total size of the folder {folder_path} is: {formatted_total_size}')

We created a function, format_size, that takes the number of bytes and a unit, divides by the matching power of 1024, and returns the result as a string with two decimal places. The final print statement now outputs the total size in megabytes.
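
When the magnitude is not known in advance, the unit can be chosen automatically. The sketch below is one way to do that with the same 1024-based steps. The function name format_size_auto is an assumption and not part of the code above.

```python
def format_size_auto(size_bytes):
    """Format a byte count, picking the unit automatically.

    Hypothetical helper; each unit is 1024 times the previous one.
    """
    units = ['bytes', 'KB', 'MB', 'GB', 'TB']
    size = float(size_bytes)
    for unit in units[:-1]:
        if size < 1024:
            return f'{size:.2f} {unit}'
        size /= 1024
    # Anything that survives the loop is expressed in the last unit.
    return f'{size:.2f} {units[-1]}'


print(format_size_auto(512))          # 512.00 bytes
print(format_size_auto(1536))         # 1.50 KB
print(format_size_auto(5 * 1024**3))  # 5.00 GB
```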

Conclusion

In this guide, we’ve learned how to use Python’s os module to walk through a directory tree and calculate the total size of its contents. We covered handling exceptions to ensure stable script execution and formatting the output to different size units for better readability. Whether you’re a system administrator, developer, or just someone interested in file system management, you now have a valuable tool to assess folder sizes easily with Python.

Remember to adapt the technique to the file systems you work with, for example by deciding how symbolic links and unreadable files should be treated, so that your scripts stay robust and efficient.