Python: How to iterate over all files in a directory

Updated: January 13, 2024 By: Guest Contributor

Overview

Working with filesystems is a common task for many Python developers. Whether you’re building a script to automate some tasks, creating a server application, or processing a batch of files, you may need to access and read through the contents of directories. Python’s standard library makes it straightforward to iterate over all the files in a directory.

In this tutorial, we’ll explore various methods to iterate over all files in a directory in Python. We’ll cover the os module (os.listdir(), os.walk(), and os.scandir()), the more modern pathlib module added in Python 3.4, and a couple of third-party libraries aimed at the same task.

Prerequisites

  • Python 3.6 or later (the examples use f-strings and os.scandir() as a context manager, both of which require 3.6)
  • Basic knowledge of Python syntax and the file system
  • A directory with some files to practice with

Using the os Module

Let’s begin with the standard os module, which provides functions for directory and file operations. The os.listdir() and os.walk() functions are the most commonly used for iterating through files in a directory.

Using os.listdir()

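The os.listdir() function returns a list of the names (not full paths) of the entries in the directory given by path, in arbitrary order. The list includes subdirectories as well as files, so check each entry before treating it as a file:

import os

directory = '/path/to/directory'

for filename in os.listdir(directory):
    # os.listdir() returns bare names, so join them with the directory
    filepath = os.path.join(directory, filename)
    # Skip subdirectories and keep only .txt files
    if os.path.isfile(filepath) and filename.endswith('.txt'):
        print(filepath)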
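Since os.listdir() makes no guarantee about ordering, a common refinement is to collect the matching paths and sort them. The sketch below wraps the loop above in a hypothetical helper, list_files_with_suffix(); the name and its suffix parameter are illustrative, not part of the os module:

```python
import os


def list_files_with_suffix(directory, suffix='.txt'):
    """Return sorted full paths of regular files in `directory` ending in `suffix`.

    Hypothetical helper for illustration; not part of the os module.
    """
    paths = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # os.path.isfile() follows symlinks and returns False for directories
        if os.path.isfile(path) and name.endswith(suffix):
            paths.append(path)
    # os.listdir() returns names in arbitrary order, so sort for stable output
    return sorted(paths)
```

For example, list_files_with_suffix('/path/to/directory', '.csv') would return the CSV files in that directory in sorted order.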

Using os.walk()

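The os.walk() function is a more powerful tool for directory traversal. It generates the file names in a directory tree by walking the tree either top-down (the default) or bottom-up, yielding a (dirpath, dirnames, filenames) tuple for each directory it visits: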

import os

for subdir, dirs, files in os.walk('/path/to/directory'):
    for file in files:
        # Do something with the file
        filepath = os.path.join(subdir, file)
        print(filepath)
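Because os.walk() descends top-down by default, you can prune the traversal by editing the dirnames list in place; os.walk() then skips any subdirectory you removed. The sketch below uses that to skip hidden directories (names starting with a dot, an assumption that follows Unix conventions). walk_txt_files() and skip_hidden are hypothetical names chosen for this example:

```python
import os


def walk_txt_files(top, skip_hidden=True):
    """Yield paths of .txt files under `top`, optionally skipping hidden dirs.

    Hypothetical helper for illustration.
    """
    # topdown=True (the default) lets us edit dirnames to prune the walk
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        if skip_hidden:
            # Slice assignment mutates the list os.walk() will descend into
            dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        for name in filenames:
            if name.endswith('.txt'):
                yield os.path.join(dirpath, name)
```

Assigning to dirnames[:] rather than rebinding the name is what makes the pruning work. With topdown=False the edit has no effect, because in bottom-up mode the subdirectories are visited before their parent.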

Using the pathlib Module

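In newer versions of Python, the pathlib module offers an object-oriented alternative to os.path and is often preferred in modern code. Below is a simple example of iteration with Path.iterdir(), which yields a Path object for each entry in the directory (files and subdirectories alike):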

from pathlib import Path

dir_path = Path('/path/to/directory')

for file_path in dir_path.iterdir():
    if file_path.is_file() and file_path.suffix == '.txt':
        print(file_path)
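Path.suffix returns only the last extension (for example, '.gz' for 'archive.tar.gz'), and comparing it with == is case-sensitive. A sketch that accepts several extensions and ignores case, using a hypothetical helper files_with_suffixes() whose default extensions are arbitrary:

```python
from pathlib import Path


def files_with_suffixes(dir_path, suffixes=('.txt', '.md')):
    """Return sorted regular files in `dir_path` whose suffix is in `suffixes`.

    Hypothetical helper; the comparison is case-insensitive, so 'NOTES.TXT' matches.
    """
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(dir_path).iterdir()
        # is_file() excludes subdirectories such as one named 'folder.txt'
        if p.is_file() and p.suffix.lower() in wanted
    )
```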

Using Glob Patterns with pathlib

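When paired with glob patterns, pathlib becomes even more powerful. Path.glob() yields the entries that match a pattern relative to the directory, so no manual suffix check is needed. A pattern can also match directories, so keep an is_file() check when that matters. Continuing with dir_path from the previous example:

for file_path in dir_path.glob('*.txt'):
    if file_path.is_file():
        print(file_path)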
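Path.glob('*.txt') only looks in dir_path itself. For a recursive search, Path.rglob(pattern) behaves like glob() with '**/' added in front of the pattern. A minimal sketch, with find_txt_recursive() as a hypothetical helper name:

```python
from pathlib import Path


def find_txt_recursive(root):
    """Return sorted .txt files anywhere under `root` (hypothetical helper)."""
    # rglob('*.txt') is equivalent to glob('**/*.txt'): it matches in `root`
    # itself and in every subdirectory below it
    return sorted(p for p in Path(root).rglob('*.txt') if p.is_file())
```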

Using os.scandir() with a with Statement

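Python 3.5 introduced os.scandir(), which returns an iterator of os.DirEntry objects instead of a list of names. Each entry carries file-type information the operating system provided while reading the directory, so calls such as entry.is_file() often need no extra system call. This makes it more efficient than os.listdir() plus os.path.isfile() when you’re working with large directories. Using it in a with statement (supported since Python 3.6) closes the iterator and frees its resources when the block exits: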

import os

with os.scandir('/path/to/directory') as entries:
    for entry in entries:
        if entry.is_file() and entry.name.endswith('.txt'):
            print(entry.path)
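os.scandir() only lists a single directory. The sketch below shows one way to recurse with it while still using DirEntry’s cached type information. scan_tree() is a hypothetical helper; it deliberately does not follow symlinked directories (is_dir(follow_symlinks=False)) to avoid cycles, and it keeps one scandir iterator open per level of nesting, which is fine for typical directory depths:

```python
import os


def scan_tree(path):
    """Recursively yield os.DirEntry objects for files under `path`.

    Hypothetical helper: a minimal scandir-based walk that does not follow
    symlinked directories, similar in spirit to os.walk(followlinks=False).
    """
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Recurse into real subdirectories only
                yield from scan_tree(entry.path)
            elif entry.is_file():
                yield entry
```

For example, sum(entry.stat().st_size for entry in scan_tree('/path/to/directory')) totals file sizes without building path strings by hand.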

Error Handling

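When iterating through directories, you may encounter permission errors, a directory that does not exist, or a path that is not a directory. It’s important to handle these exceptions. (Broken symbolic links do not raise here: entry.is_file() simply returns False for them.)

import os

try:
    with os.scandir('/path/to/directory') as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith('.txt'):
                print(entry.path)
except FileNotFoundError as e:
    print(f'Directory not found: {e}')
except PermissionError as e:
    print(f'Permission denied: {e}')
except OSError as e:
    # Any other filesystem error, e.g. the path is not a directory
    print(f'Could not read directory: {e}')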
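os.walk() behaves differently from os.scandir() here: by default it silently ignores errors raised while listing a directory. Passing an onerror callable, which os.walk() calls with the OSError instance, lets you record or report them. A sketch using a hypothetical helper walk_collecting_errors():

```python
import os


def walk_collecting_errors(top):
    """Walk `top`, returning (file_paths, errors) instead of dropping errors.

    Hypothetical helper. os.walk() ignores errors from listing a directory
    unless an `onerror` callable is given; here we collect them in a list.
    """
    errors = []
    paths = []
    # onerror receives the OSError raised while listing a directory
    for dirpath, _dirnames, filenames in os.walk(top, onerror=errors.append):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths, errors
```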

Advanced Directory Traversal With Third-Party Libraries

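While the standard library provides decent capabilities for directory traversal, there are third-party libraries such as scandir and glob2 that offer extra functionality or simpler syntax. Both are now largely superseded: scandir is the PyPI backport of os.scandir(), which has been in the standard library since Python 3.5, and the recursive ** matching that glob2 is commonly used for has been available in the standard glob module (recursive=True) since Python 3.5.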
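The standard glob module supports recursive ** patterns since Python 3.5 when called with recursive=True, covering a common reason for reaching for glob2. A sketch with a hypothetical helper glob_txt_recursive(); glob.escape() guards against glob metacharacters in the directory name itself:

```python
import glob
import os


def glob_txt_recursive(root):
    """Return sorted .txt paths under `root` using the standard glob module.

    Hypothetical helper. With recursive=True, '**' matches zero or more
    directories, so files directly in `root` are included too.
    """
    # Escape the root so characters like '[' in its name are taken literally
    pattern = os.path.join(glob.escape(root), '**', '*.txt')
    return sorted(glob.glob(pattern, recursive=True))
```

Like shell globbing, glob.glob() skips names that start with a dot by default.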

Conclusion

In conclusion, Python provides several methods to iterate over all the files in a directory. Your choice will depend on your exact requirements: for simple, single-directory tasks, os.listdir() or Path.iterdir() may be adequate; for walking a directory tree, os.walk() or pathlib.Path with a recursive glob pattern such as '**/*.txt' gives you a powerful toolset; and os.scandir() helps when performance on large directories matters. Remember to handle potential errors to make your scripts robust and reliable.

No matter which method you choose, the code samples provided here offer a jumping-off point for iterating over files in a directory, and the same patterns apply to a wide range of file-processing tasks in your next Python project.