Python asyncio: How to download a large file and show progress (percentage)

Getting Started
Setting Up Your Download Function
Implementing Progress Tracking
Putting It All Together
Handling Larger Files and Rate Limiting
Conclusion

Getting Started

Before diving into the code, make sure you have Python 3.7 or higher installed. The async/await syntax used here was introduced in Python 3.5, and asyncio.run(), the simplest way to start a coroutine, was added in Python 3.7. You’ll also need the aiohttp library, which you can install with pip:

pip install aiohttp

Setting Up Your Download Function

First, let’s import the required libraries and define a simple asynchronous function for downloading files. Here’s a skeletal structure:

import asyncio
import aiohttp

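async def download_file(url, file_name):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            with open(file_name, 'wb') as fd:
                while True:
                    # Read at most 1024 bytes at a time so the whole file never sits in memory
                    chunk = await resp.content.read(1024)
                    if not chunk:
                        break  # an empty chunk means the body is finished
                    fd.write(chunk)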

This function opens an aiohttp session, requests the file, and writes it to disk in chunks of up to 1024 bytes. Because only one chunk is held in memory at a time, memory use stays flat no matter how large the file is.

Implementing Progress Tracking

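To show a percentage, we need to know the total file size. This can usually be read from the Content-Length response header, although servers are not required to send it (responses using chunked transfer encoding, for example, omit it). When the header is present, read it like this: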

progress = 0
file_size = int(resp.headers['Content-Length'])
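Since the header may be missing or malformed, a more defensive lookup can be sketched as follows. The helper name parse_total_size is hypothetical, and the sketch assumes a mapping with a .get() method; aiohttp's headers object matches names case-insensitively, while the plain dict used for illustration here does not.

```python
from typing import Mapping, Optional


def parse_total_size(headers: Mapping[str, str]) -> Optional[int]:
    """Return the declared body size in bytes, or None if it is unusable.

    Hypothetical helper: `headers` is any mapping of header names to values.
    """
    raw = headers.get("Content-Length")
    if raw is None:
        return None  # header absent, e.g. chunked transfer encoding
    try:
        size = int(raw)
    except ValueError:
        return None  # header present but not a number
    # A zero or negative size cannot be used as a divisor for a percentage.
    return size if size > 0 else None
```

A None result signals that a percentage cannot be computed and the caller should fall back to reporting raw byte counts.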

Then, update the progress within the file writing loop:

progress += len(chunk)
percentage = (progress / file_size) * 100
print(f"Download progress: {percentage:.2f}%")

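Each iteration adds the size of the chunk just written to the running total, divides by the file size to get a percentage, and prints it, giving users real-time feedback. Keep in mind that with 1024-byte chunks this prints at least one line per kilobyte, which gets noisy for large files.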
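One way to keep the output readable is to print only when the whole-number percentage changes. The following is a minimal sketch of that idea; the names progress_line and ProgressReporter are hypothetical, and the once-per-mebibyte fallback for an unknown size is an arbitrary choice for illustration.

```python
from typing import Callable, Optional


def progress_line(done: int, total: Optional[int]) -> str:
    """Format one progress message; fall back to a byte count if total is unknown."""
    if not total:
        return f"Downloaded {done} bytes"
    percentage = min(done / total, 1.0) * 100  # clamp in case the server sends extra bytes
    return f"Download progress: {percentage:.2f}%"


class ProgressReporter:
    """Emit a message only when the whole-number percentage changes."""

    def __init__(self, total: Optional[int], emit: Callable[[str], None] = print):
        self.total = total
        self.emit = emit
        self.done = 0
        self._last_step = -1

    def update(self, chunk_len: int) -> None:
        self.done += chunk_len
        if self.total:
            step = min(self.done * 100 // self.total, 100)
        else:
            step = self.done // (1024 * 1024)  # unknown size: one line per MiB
        if step != self._last_step:
            self._last_step = step
            self.emit(progress_line(self.done, self.total))
```

Inside the download loop, the print call would be replaced by reporter.update(len(chunk)), with the reporter created once before the loop starts.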

Putting It All Together

Let’s combine everything into a full download function with progress reporting:

import asyncio
import aiohttp

async def download_file(url, file_name):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            if resp.status == 200:
                # Content-Length can be missing; fall back to 0 and report bytes instead
                file_size = int(resp.headers.get('Content-Length', 0))
                with open(file_name, 'wb') as fd:
                    progress = 0
                    while True:
                        chunk = await resp.content.read(1024)
                        if not chunk:
                            break
                        fd.write(chunk)
                        progress += len(chunk)
                        if file_size:
                            percentage = (progress / file_size) * 100
                            print(f"Download Progress: {percentage:.2f}%")
                        else:
                            print(f"Downloaded {progress} bytes")
            else:
                print(f"Failed to download the file (HTTP status {resp.status}).")

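This function downloads a file and prints the progress to the terminal, but it does nothing until it is called. Add a call such as asyncio.run(download_file(url, file_name)) at the bottom, with url and file_name set to the file you want and the path to save it to. Then save the code in a file (e.g., async_download.py) and execute it using: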

python async_download.py
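If you want to try the write-and-report loop without touching the network, one option is to separate it from the HTTP request so it accepts any asynchronous source of chunks. The sketch below does this with hypothetical names (save_chunks, fake_chunks, demo) and an in-memory payload standing in for a real response. With aiohttp, the assumption is that something like resp.content.iter_chunked(1024) could be passed in place of the fake source.

```python
import asyncio
import io
from typing import AsyncIterator, BinaryIO, Optional


async def save_chunks(chunks: AsyncIterator[bytes], fd: BinaryIO,
                      total: Optional[int] = None) -> int:
    """Write every chunk to fd, printing progress; return the bytes written."""
    done = 0
    async for chunk in chunks:
        fd.write(chunk)
        done += len(chunk)
        if total:
            print(f"Download Progress: {done / total * 100:.2f}%")
    return done


async def fake_chunks(payload: bytes, size: int) -> AsyncIterator[bytes]:
    """Stand-in for a network body: yield the payload in pieces of `size` bytes."""
    for start in range(0, len(payload), size):
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield payload[start:start + size]


def demo() -> bytes:
    """Run the loop against 2500 in-memory bytes and return what was written."""
    payload = b"x" * 2500
    buffer = io.BytesIO()
    written = asyncio.run(save_chunks(fake_chunks(payload, 1024), buffer, len(payload)))
    assert written == len(payload)
    return buffer.getvalue()
```

Calling demo() prints three progress lines, one per chunk, and returns the reassembled payload.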

Handling Larger Files and Rate Limiting

For exceptionally large files, or to avoid saturating the server’s bandwidth or your own, it can help to limit how fast chunks are read. Note that a single download is one HTTP request, so this controls the transfer rate, not the number of requests. A simple way to do it is to insert an asyncio.sleep(x) within the loop, where x is the number of seconds to wait between chunks.

progress += len(chunk)
await asyncio.sleep(0.1)  # Pause for 100 ms before reading the next chunk

Keep the arithmetic in mind: with 1024-byte chunks, a 100 ms pause caps the download at roughly 10 KB per second, so adjust the chunk size or the delay to get the rate you actually want.
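A fixed pause ties the transfer rate to the chunk size. An alternative is to compute the pause from a target rate. The following is a minimal sketch of that approach; RateLimiter and its parameters are hypothetical names, and the injectable clock is there only to make the behaviour easy to check.

```python
import time
from typing import Callable


class RateLimiter:
    """Compute how long to pause so the average rate stays under a cap."""

    def __init__(self, max_bytes_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_rate = max_bytes_per_sec
        self.clock = clock
        self.start = clock()
        self.done = 0

    def delay_after(self, chunk_len: int) -> float:
        """Record a chunk and return the seconds to sleep before the next read."""
        self.done += chunk_len
        earliest = self.done / self.max_rate  # time this many bytes should have taken
        elapsed = self.clock() - self.start
        return max(0.0, earliest - elapsed)   # no pause if already slower than the cap
```

In the download loop, the fixed sleep would become await asyncio.sleep(limiter.delay_after(len(chunk))), with limiter created once before the loop, for example RateLimiter(256 * 1024) for a cap of about 256 KiB per second.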

Conclusion

Python’s asyncio and aiohttp offer an effective way to handle asynchronous file downloads, making your application more efficient and user-friendly by showing the download progress. They represent a killer combination for network-bound tasks. Remember that understanding how asynchronous operations work is key to taking full advantage of these features.

By grappling with these concepts and implementing the code from this tutorial, you’re well on your way to adding robust file download capabilities to your Python applications, with clear progress indication that keeps users informed throughout the process.
