Python & aiohttp: How to download files using streams

Last updated: August 20, 2023

Overview

aiohttp is an asynchronous HTTP client and server library for Python, built on top of asyncio. Streams let you handle a response body in chunks instead of loading the whole file into memory at once. This is useful when downloading large files or handling many requests concurrently.

In general, you can download files (especially large files of several hundred MB or more) with aiohttp streams by following the steps listed below:

  1. Create an aiohttp.ClientSession object, which represents a connection pool for making HTTP requests.
  2. Use the session.get() method to send a GET request to the file URL and get an aiohttp.ClientResponse object, which represents the response from the server.
  3. Use the response.content attribute to access an aiohttp.StreamReader object, which is a stream for reading the response body.
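  4. Use stream.read(n), stream.readany(), or stream.iter_chunked(n) to read chunks of data from the stream and write them to a file object on your computer. Avoid calling stream.read() with no argument here: it reads until the end of the stream and so loads the whole body into memory.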
  5. Close the response and the session objects when you are done (this can be done automatically by using the async with statement).
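
The five steps can be sketched roughly as follows. This is a minimal illustration rather than the article's own code: the names copy_stream and fetch_to_file and the 64 KiB chunk size are assumptions chosen for the example.

```python
import asyncio
import aiohttp

CHUNK_SIZE = 64 * 1024  # assumed chunk size; not a value from the article


async def copy_stream(stream, file_obj, chunk_size=CHUNK_SIZE):
    """Step 4: read up to chunk_size bytes at a time until the stream ends."""
    total = 0
    while True:
        chunk = await stream.read(chunk_size)
        if not chunk:  # empty bytes means the stream is exhausted
            break
        file_obj.write(chunk)
        total += len(chunk)
    return total  # number of bytes written


async def fetch_to_file(url, path):
    async with aiohttp.ClientSession() as session:   # step 1
        async with session.get(url) as response:     # step 2
            response.raise_for_status()
            with open(path, "wb") as f:
                # step 3: response.content is the StreamReader
                return await copy_stream(response.content, f)
    # step 5: leaving the async with blocks closes the response and the session
```

Keeping the read loop in its own function means it works with any object that offers an awaitable read(n), which also makes it easy to try out without a network connection.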

These steps are easier to follow in code, so let’s walk through a complete example.

Complete Example

We will download two files concurrently: a CSV file and a PDF file. Here’s the URL of the CSV file:

https://api.slingacademy.com/v1/sample-data/files/student-scores.csv

And this is the URL of the PDF file:

https://api.slingacademy.com/v1/sample-data/files/text-and-table.pdf

Before writing the code, install aiohttp:

pip install aiohttp

The complete code (with explanations):

# SlingAcademy.com
# This code uses Python 3.11.4

import asyncio
import aiohttp

# This function downloads a file from a URL and saves it to a local file
# The function is asynchronous and can handle large files because it uses aiohttp streams
async def download_file(url, filename):
    async with aiohttp.ClientSession() as session:
        print(f"Starting download file from {url}")
        async with session.get(url) as response:
            # Raise an exception on 4xx/5xx responses
            # (unlike assert, this check still runs under python -O)
            response.raise_for_status()
            with open(filename, "wb") as f:
                while True:
                    chunk = await response.content.readany()
                    if not chunk:
                        break
                    f.write(chunk)
            print(f"Downloaded {filename} from {url}")


# This function downloads two files at the same time
async def main():
    await asyncio.gather(
        # download a CSV file
        download_file(
            "https://api.slingacademy.com/v1/sample-data/files/student-scores.csv",
            "test.csv",
        ),

        # download a PDF file
        download_file(
            "https://api.slingacademy.com/v1/sample-data/files/text-and-table.pdf",
            "test.pdf",
        ),
    )

# Run the main function
asyncio.run(main())
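
The example above opens a new ClientSession inside every download_file call. The aiohttp documentation advises against creating a session per request, so a variant that shares one session (and its connection pool) across all downloads might look like the sketch below. The names download_with_session and download_all and the 64 KiB chunk size are assumptions made for this illustration.

```python
import asyncio
import aiohttp

CHUNK_SIZE = 64 * 1024  # assumed value; not from the article


async def download_with_session(session, url, filename):
    # Reuse the caller's session so all downloads share one connection pool
    async with session.get(url) as response:
        response.raise_for_status()
        with open(filename, "wb") as f:
            # iter_chunked yields pieces of at most CHUNK_SIZE bytes
            async for chunk in response.content.iter_chunked(CHUNK_SIZE):
                f.write(chunk)
    return filename


async def download_all(jobs):
    # jobs is a list of (url, filename) pairs
    async with aiohttp.ClientSession() as session:
        tasks = [download_with_session(session, url, name) for url, name in jobs]
        return await asyncio.gather(*tasks)
```

With this variant, the two files from the example could be fetched with asyncio.run(download_all([(csv_url, "test.csv"), (pdf_url, "test.pdf")])), where csv_url and pdf_url stand for the two URLs given earlier.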

After running the code, you’ll see output similar to the following:

Starting download file from https://api.slingacademy.com/v1/sample-data/files/student-scores.csv
Starting download file from https://api.slingacademy.com/v1/sample-data/files/text-and-table.pdf
Downloaded test.pdf from https://api.slingacademy.com/v1/sample-data/files/text-and-table.pdf
Downloaded test.csv from https://api.slingacademy.com/v1/sample-data/files/student-scores.csv
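
In this run the PDF finished before the CSV even though the CSV download was started first. Completion order depends on file size and network timing, so it can differ between runs. asyncio.gather, however, returns its results in the order the awaitables were passed in. The sketch below shows this with asyncio.sleep standing in for the network transfer; fake_download and the delays are made up for the illustration.

```python
import asyncio

finished = []  # records the order in which the simulated downloads complete


async def fake_download(name, delay):
    await asyncio.sleep(delay)  # stand-in for network transfer time
    finished.append(name)
    return name


async def main():
    # The CSV is listed first but takes longer than the PDF in this simulation
    return await asyncio.gather(
        fake_download("test.csv", 0.2),
        fake_download("test.pdf", 0.05),
    )


results = asyncio.run(main())
print(finished)  # completion order: ['test.pdf', 'test.csv']
print(results)   # argument order:   ['test.csv', 'test.pdf']
```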

The downloaded files are saved as test.csv and test.pdf. Because these are relative paths, they end up in the current working directory, which is the script's directory if you run the script from there.
