Python asyncio: How to download a list of files sequentially

Updated: February 11, 2024 By: Guest Contributor

In this tutorial, we'll use Python's asyncio library to download a list of files sequentially. Although asyncio is usually associated with concurrent operations, knowing how to control it so that tasks run one after another is useful in a broad range of applications.

Getting Started

Before we dive into the specifics of downloading files, it’s important to cover some basics of the asyncio library. asyncio was introduced in Python 3.4 as a library for writing concurrent code, and since Python 3.5 it is used with the async/await syntax.

Concurrency does not mean that operations run simultaneously. It is about structuring your program so that tasks can make progress independently, overlapping while they wait on I/O, or, as in our case, running in a deliberately controlled sequence. Here’s how to set up a simple asyncio environment:

import asyncio

async def main():
    print('Hello')
    await asyncio.sleep(1)
    print('world')

asyncio.run(main())

Sequential File Download with AsyncIO

To download files sequentially using asyncio, we first need to integrate asyncio with network operations. We’ll be using aiohttp for async HTTP requests. You’ll have to install it via pip:

pip install aiohttp

Once installed, we can start by setting up our asynchronous environment specifically for downloading files:

import asyncio
import aiohttp

async def download_file(session, url):
    async with session.get(url) as response:
        # Derive a local filename from the last segment of the URL
        filename = url.split('/')[-1]
        with open(filename, 'wb') as f:
            # Stream the response body in 1 KB chunks until it is exhausted
            while True:
                chunk = await response.content.read(1024)
                if not chunk:
                    break
                f.write(chunk)
        print(f'Downloaded {filename}')

async def main(urls):
    async with aiohttp.ClientSession() as session:
        for url in urls:
            await download_file(session, url)

urls = ['http://example.com/file1.jpg', 'http://example.com/file2.jpg']
asyncio.run(main(urls))

In this example, the main function takes a list of URLs and iterates through them, passing each URL to the download_file function alongside the active aiohttp.ClientSession. This sequential behavior is facilitated by the await keyword before download_file, ensuring each download process completes before moving to the next URL in the list.
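
For contrast, if we wanted the files to download concurrently instead, we could hand all of the coroutines to asyncio.gather at once. The sketch below only illustrates the difference; the main_concurrent name is made up for this example, and it reuses the download_file function defined above:

async def main_concurrent(urls):
    async with aiohttp.ClientSession() as session:
        # Schedule every download at once; gather waits until all of them finish
        await asyncio.gather(*(download_file(session, url) for url in urls))

# asyncio.run(main_concurrent(urls))

With gather, the downloads overlap and may finish in any order, which is exactly the behavior our sequential loop avoids.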

Understanding Asyncio’s Sequential Nature

It may seem counterintuitive to use an asynchronous library like asyncio for sequential processing. However, the key lies in how the await keyword is employed. await essentially yields control back to the event loop, allowing it to execute something else while waiting for an awaited operation to complete. In the case of sequential downloads, it simply waits because there’s nothing else to execute concurrently.
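
To see that the event loop really does stay available while a download is awaited, you can schedule an unrelated background task alongside the sequential loop. Here is a rough sketch; the heartbeat coroutine and the main_with_heartbeat name are invented purely for illustration:

async def heartbeat():
    # Runs on the same event loop, getting a turn whenever a download is waiting on I/O
    while True:
        print('...event loop is still responsive...')
        await asyncio.sleep(0.5)

async def main_with_heartbeat(urls):
    ticker = asyncio.create_task(heartbeat())
    async with aiohttp.ClientSession() as session:
        for url in urls:
            await download_file(session, url)  # files are still fetched one at a time
    ticker.cancel()

The heartbeat keeps printing while each file downloads, yet the downloads themselves still complete strictly one after another.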

Handling Exceptions

When dealing with network operations, it’s important to handle potential exceptions, such as connection errors or timeouts. Here’s how you could update the download_file function to manage exceptions:

async def download_file(session, url):
    try:
        async with session.get(url) as response:
            # Same chunked download logic as before, condensed
            filename = url.split('/')[-1]
            with open(filename, 'wb') as f:
                while chunk := await response.content.read(1024):
                    f.write(chunk)
            print(f'Downloaded {filename}')
    except (aiohttp.ClientError, asyncio.TimeoutError) as e:
        print(f'Failed to download {url}: {e}')

With error handling in place, your downloader will be more robust and able to continue the sequential download process even if a particular file cannot be downloaded.
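
Timeouts deserve a brief mention as well. One way to guard against a download hanging indefinitely is to give the session a ClientTimeout; the 60-second value below is an arbitrary choice for illustration:

async def main(urls):
    # Give up on any single download after 60 seconds (arbitrary value)
    timeout = aiohttp.ClientTimeout(total=60)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        for url in urls:
            await download_file(session, url)

When the timeout expires, aiohttp raises asyncio.TimeoutError, which is why the updated download_file above catches it alongside aiohttp.ClientError.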

Conclusion

Through this tutorial, we’ve seen how to leverage Python’s asyncio library for sequential file downloads, a task that might initially seem paradoxical given asyncio’s concurrency-centric design. By carefully using the await keyword to manage how tasks are executed, we gain fine-grained control over the sequential flow of operations, enabling an efficient and effective download process. This approach demonstrates the versatility and power of asynchronous programming, even beyond the typical use cases of concurrent execution.