Python & aiohttp: How to download files using streams

Overview

aiohttp is an asynchronous HTTP client and server library for Python, built on top of asyncio. Streaming means reading the response body in chunks instead of loading the whole file into memory at once. This keeps memory usage low when downloading large files, and because the reads are asynchronous, several downloads can run concurrently.

In general, you can download files (especially large files of several hundred MB or more) with aiohttp streams by following the steps listed below:

1. Create an aiohttp.ClientSession object, which manages a pool of connections for making HTTP requests.
2. Call session.get() to send a GET request to the file URL. It gives you an aiohttp.ClientResponse object, which represents the response from the server.
3. Access response.content, an aiohttp.StreamReader object that reads the response body as a stream.
4. Read the stream in chunks, either by calling stream.read(n) or stream.readany() in a loop or by iterating over stream.iter_chunked(n), and write each chunk to a local file object. Avoid calling stream.read() with no argument, since that reads the entire body into memory.
5. Close the response and the session when you are done (the async with statement does this automatically).
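
The steps above can be sketched as one small coroutine. This is a minimal sketch rather than the article's own code: the name fetch_to_file, the chunk_size parameter and its 64 KB default are assumptions made for illustration. The session argument is expected to be an aiohttp.ClientSession that the caller has already created with async with, and the body is read with StreamReader.iter_chunked() instead of a manual read loop.

```python
# Hypothetical helper; `session` is assumed to be an aiohttp.ClientSession
# created by the caller (async with aiohttp.ClientSession() as session: ...).
async def fetch_to_file(session, url, path, chunk_size=64 * 1024):
    """Stream the body of `url` into `path` and return the bytes written."""
    written = 0
    # Send the GET request; the response is released when the block exits.
    async with session.get(url) as response:
        # Raise an exception for 4xx/5xx status codes.
        response.raise_for_status()
        with open(path, "wb") as f:
            # response.content is the StreamReader; iter_chunked() yields
            # pieces of at most chunk_size bytes until the body is exhausted.
            async for chunk in response.content.iter_chunked(chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

Because the session is passed in, several calls can share one connection pool: open a single ClientSession and run the fetch_to_file() calls inside it, for example with asyncio.gather().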

These steps are easier to follow in code, so let’s walk through a complete, runnable example.

Complete Example

We’re going to download two files concurrently: a CSV file and a PDF file. Here’s the URL of the CSV file:

https://api.slingacademy.com/v1/sample-data/files/student-scores.csv

And this is the URL of the PDF file:

https://api.slingacademy.com/v1/sample-data/files/text-and-table.pdf

Before writing any code, install aiohttp:

pip install aiohttp

The complete code (with explanations):

# SlingAcademy.com
# This code uses Python 3.11.4

import asyncio
import aiohttp

# This function downloads a file from a URL and saves it to a local file
# The function is asynchronous and can handle large files because it uses aiohttp streams
async def download_file(url, filename):
    async with aiohttp.ClientSession() as session:
        print(f"Starting download file from {url}")
        async with session.get(url) as response:
            # Raise on 4xx/5xx responses (an assert would be skipped under python -O)
            response.raise_for_status()
            with open(filename, "wb") as f:
                while True:
                    chunk = await response.content.readany()
                    if not chunk:
                        break
                    f.write(chunk)
            print(f"Downloaded {filename} from {url}")


# This function downloads two files at the same time
async def main():
    await asyncio.gather(
        # download a CSV file
        download_file(
            "https://api.slingacademy.com/v1/sample-data/files/student-scores.csv",
            "test.csv",
        ),

        # download a PDF file
        download_file(
            "https://api.slingacademy.com/v1/sample-data/files/text-and-table.pdf",
            "test.pdf",
        ),
    )

# Run the main function
asyncio.run(main())

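After running the code, you’ll see output similar to this (the order of the “Downloaded” lines can vary from run to run, because the two downloads run concurrently):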

Starting download file from https://api.slingacademy.com/v1/sample-data/files/student-scores.csv
Starting download file from https://api.slingacademy.com/v1/sample-data/files/text-and-table.pdf
Downloaded test.pdf from https://api.slingacademy.com/v1/sample-data/files/text-and-table.pdf
Downloaded test.csv from https://api.slingacademy.com/v1/sample-data/files/student-scores.csv
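
In the output above, the PDF finished before the CSV even though the CSV download was listed first. The “Downloaded” lines appear in completion order, while asyncio.gather() still returns its results in the order the coroutines were passed in. Here is a minimal sketch of that behavior, with asyncio.sleep() standing in for the network transfer; fake_download and the delays are hypothetical and not part of the example above.

```python
import asyncio


async def fake_download(name, seconds, log):
    """Stand-in for download_file: sleeps instead of doing network I/O."""
    log.append(f"Starting {name}")
    await asyncio.sleep(seconds)
    log.append(f"Downloaded {name}")
    return name


async def demo():
    log = []
    results = await asyncio.gather(
        fake_download("test.csv", 0.2, log),  # listed first, finishes last
        fake_download("test.pdf", 0.1, log),  # listed second, finishes first
    )
    return log, results


log, results = asyncio.run(demo())
print(log)      # log entries in the order they happened
print(results)  # return values in the order passed to gather()
```

The log shows both “Starting” entries, then the PDF completing before the CSV, while results keeps the CSV first.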

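The downloaded files, test.csv and test.pdf, are saved in the current working directory (the directory you ran the script from, which is usually the one containing your Python script), as shown in the screenshot below: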