Python requests module: How to download files from URLs

Overview
Getting Started
Simple File Download
Streaming Large Files
Error Handling
Downloading Files with Progress Indicator
Handling Redirects
Session Objects for Efficiency
Setting Custom Headers
Handling Cookies
Conclusion

Overview

In this tutorial, we’ll explore how to use the Python requests module to download files from the Internet. We’ll cover everything from basic file downloads to handling large files and error checking.

Getting Started

Before we can start downloading files, we need to install the requests module. If you haven’t already, you can install it using pip:

pip install requests

Now that we have requests installed, let’s start with a basic example.

Simple File Download

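import requests

url = 'http://example.com/some_file.pdf'
r = requests.get(url, timeout=30)  # without a timeout, a stalled server can block this call indefinitely
r.raise_for_status()  # raise on 4xx/5xx instead of saving an error page as the PDF

with open('some_file.pdf', 'wb') as f:
    f.write(r.content)  # r.content is the whole response body as bytes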

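This code downloads a PDF file and saves it to the local file system. The 'wb' mode passed to open() stands for 'write binary'. It is required because r.content is a bytes object, which a text-mode file will not accept, and binary mode writes those bytes exactly as received, without any newline translation that could corrupt a PDF. Note that r.content holds the entire file in memory, which is fine for small files.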

Streaming Large Files

To download large files without consuming too much memory, you can stream the file and write it in chunks:

import requests

url = 'http://example.com/big_file.zip'

with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open('big_file.zip', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

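By setting stream=True, you tell requests to fetch only the response headers up front and to leave the body unread until you ask for it. iter_content() then reads the body in pieces of about chunk_size bytes (8192 here), so only one chunk at a time sits in memory. Using the response in a with block releases the connection when the download finishes, and raise_for_status() stops before an error page is written to disk.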
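One possible refinement, sketched below under our own assumptions: the helper name stream_download, the '.part' suffix and the 30-second timeout are illustrative choices, not part of requests. It streams into a temporary file and renames it to the final name only after the last chunk has been written, so an interrupted download does not leave a truncated big_file.zip behind. It returns the number of bytes written.

```python
import os

import requests


def stream_download(url, dest, chunk_size=8192, timeout=30):
    """Stream url to dest chunk by chunk and return the number of bytes written.

    The data goes to dest + '.part' first; only a complete download is
    renamed to dest, so a failure never leaves a truncated file at dest.
    """
    tmp_path = dest + '.part'
    written = 0
    try:
        with requests.get(url, stream=True, timeout=timeout) as r:
            r.raise_for_status()
            with open(tmp_path, 'wb') as f:
                for chunk in r.iter_content(chunk_size=chunk_size):
                    written += f.write(chunk)
        os.replace(tmp_path, dest)  # rename; replaces dest if it already exists
    except BaseException:
        # Clean up the partial file, then re-raise the original error
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written
```

A call such as stream_download('http://example.com/big_file.zip', 'big_file.zip') behaves like the loop above, but either produces the complete file or raises.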

Error Handling

When downloading files, it’s crucial to handle errors gracefully. Here’s how you can do that:

import requests
from requests.exceptions import HTTPError

url = 'http://example.com/some_file.pdf'
try:
    r = requests.get(url)
    r.raise_for_status()
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')
except Exception as err:
    print(f'An error occurred: {err}')

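raise_for_status() turns a 4xx or 5xx status into an HTTPError, which the first handler reports; the second handler catches anything else, such as a failed connection or a timeout. Either way, the script prints an error message instead of crashing your application.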
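A variation that catches narrower exception classes from requests.exceptions instead of a bare Exception, so genuine programming errors (a typo, a wrong argument) still surface. This is a sketch: try_download, its messages and the 30-second timeout are hypothetical choices. Timeout and ConnectionError cover network failures, and RequestException is the base class of the errors requests raises. The file is written only in the else branch, that is, when no exception occurred.

```python
import requests


def try_download(url, dest, timeout=30):
    """Download url to dest; return True on success, False after printing why it failed."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
    except requests.exceptions.HTTPError as err:
        # The server answered, but with a 4xx/5xx status
        print(f'HTTP error {err.response.status_code}: {url}')
    except requests.exceptions.Timeout:
        print(f'Timed out after {timeout}s: {url}')
    except requests.exceptions.ConnectionError as err:
        # DNS failure, refused connection, and similar network problems
        print(f'Could not connect: {err}')
    except requests.exceptions.RequestException as err:
        # Anything else requests raises, e.g. an invalid URL or too many redirects
        print(f'Request failed: {err}')
    else:
        with open(dest, 'wb') as f:
            f.write(r.content)
        return True
    return False
```

The order of the except clauses matters only where classes overlap: requests' ConnectTimeout is both a ConnectionError and a Timeout, and here it is reported by the Timeout branch.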

Downloading Files with Progress Indicator

For a better user experience, especially when dealing with large file downloads, you can include a progress indicator:

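import requests
from tqdm import tqdm  # third-party package: pip install tqdm

url = 'http://example.com/large_file.avi'

with requests.get(url, stream=True, timeout=30) as response:
    response.raise_for_status()
    # Content-Length may be missing; total=None makes tqdm show a counter without a percentage
    total = int(response.headers.get('content-length', 0)) or None

    with open('large_file.avi', 'wb') as file, tqdm(
            desc='large_file.avi',
            total=total,
            unit='iB',
            unit_scale=True,
            unit_divisor=1024,
    ) as bar:
        for data in response.iter_content(chunk_size=1024):
            size = file.write(data)  # number of bytes actually written
            bar.update(size)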

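In this code, the third-party tqdm package draws the progress bar: each chunk written advances it by the number of bytes written, and unit_scale=True with unit_divisor=1024 makes it display sizes with 1024-based prefixes. Adjust chunk_size if you want larger or smaller updates.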
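If you would rather not depend on tqdm, a plain callback can do the same job. In this sketch, download_with_progress and print_percent are hypothetical names: the callback receives the bytes written so far and the total from the Content-Length header, or None when the server does not send one. One caveat: if the server compresses the response (Content-Encoding: gzip), Content-Length counts compressed bytes while iter_content() yields decompressed ones, so the running count can exceed the total.

```python
import requests


def download_with_progress(url, dest, on_progress, chunk_size=64 * 1024, timeout=30):
    """Stream url to dest, calling on_progress(done, total) after each chunk.

    total is None when the server sends no Content-Length header.
    Returns the number of bytes written.
    """
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        length = r.headers.get('Content-Length')
        total = int(length) if length is not None else None
        done = 0
        with open(dest, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                done += f.write(chunk)
                on_progress(done, total)
    return done


def print_percent(done, total):
    """Example callback: rewrite a single terminal line with the progress so far."""
    if total:
        print(f'\r{done * 100 // total:3d}%', end='', flush=True)
    else:
        print(f'\r{done} bytes', end='', flush=True)
```

For example, download_with_progress('http://example.com/large_file.avi', 'large_file.avi', print_percent) prints a percentage that updates in place; call print() afterwards to end the line.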

Handling Redirects

Sometimes the URL you request redirects to another URL. By default, requests.get() follows redirects automatically, which is the same as passing allow_redirects=True explicitly:

response = requests.get(url, allow_redirects=True)

To handle redirects manually, pass allow_redirects=False: requests then returns the first response it receives (for example a 301 or 302) instead of following its Location header.
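A sketch of both behaviours; describe_redirects and next_location are hypothetical helper names. With redirects enabled, response.url is the final URL and response.history lists the intermediate redirect responses. With allow_redirects=False, response.is_redirect tells you whether the server answered with a redirect, and the Location header says where to go next. It is resolved here with urljoin because a server may send a relative location such as /new.

```python
from urllib.parse import urljoin

import requests


def describe_redirects(url, timeout=30):
    """Follow redirects; return (final_url, [status code of each redirect hop])."""
    r = requests.get(url, timeout=timeout)  # allow_redirects=True is the default for GET
    return r.url, [hop.status_code for hop in r.history]


def next_location(url, timeout=30):
    """Stop at the first response; return the absolute redirect target, or None."""
    r = requests.get(url, allow_redirects=False, timeout=timeout)
    if r.is_redirect:
        return urljoin(r.url, r.headers['Location'])
    return None
```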

Session Objects for Efficiency

When downloading several files from the same host, use a requests.Session. It keeps a pool of connections, so later requests to that host can reuse an already-open TCP (and TLS) connection instead of opening a new one:

import requests

urls = ['http://example.com/file1.pdf', 'http://example.com/file2.pdf']

with requests.Session() as session:
    for url in urls:
        # Requests to the same host reuse a pooled connection
        with session.get(url, timeout=30) as response:
            response.raise_for_status()
            filename = url.split('/')[-1]  # e.g. 'file1.pdf'
            with open(filename, 'wb') as f:
                f.write(response.content)

Skipping the repeated connection setup (and, over HTTPS, the TLS handshake) usually makes this faster than calling requests.get() separately for each file.
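One weakness of url.split('/')[-1] is that it keeps any query string (a URL ending in file1.pdf?token=abc yields the name 'file1.pdf?token=abc') and does not decode percent-escapes such as %20. A small standard-library helper, with the hypothetical name filename_from_url, could replace it: it takes only the path component of the URL, decodes it, and keeps the last segment, falling back to a default name when there is none.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default='download'):
    """Derive a local file name from the URL path, ignoring query and fragment."""
    path = unquote(urlsplit(url).path)  # '/docs/my%20report.pdf' -> '/docs/my report.pdf'
    # Treat backslashes as separators too, then keep only the last segment,
    # so encoded ../ or ..\ sequences cannot point outside the target folder
    name = posixpath.basename(path.replace('\\', '/'))
    if name in ('', '.', '..'):
        return default  # e.g. 'http://example.com/' has no file name
    return name
```

In the session loop you would then write filename = filename_from_url(url).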

Setting Custom Headers

In some cases, you may need to send custom headers with your request:

import requests

url = 'http://example.com/some_file.pdf'
headers = {
    'User-Agent': 'MyApp/1.0',
    'Authorization': 'Bearer ',  # append your access token after 'Bearer '
}

response = requests.get(url, headers=headers)

Custom headers are used for various purposes, such as authentication or simulating a particular browser.
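When every request to a site needs the same headers, you can set them once on a Session instead of passing headers= on each call. This is a sketch: make_session is a hypothetical helper, and 'YOUR_TOKEN' is a placeholder for whatever credential your server expects. Session.prepare_request() builds the request without sending it, which is a convenient way to check how session headers and per-request headers are merged.

```python
from typing import Optional

import requests


def make_session(user_agent: str, token: Optional[str] = None) -> requests.Session:
    """Return a Session that sends these headers with every request."""
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    if token:
        session.headers['Authorization'] = f'Bearer {token}'
    return session


session = make_session('MyApp/1.0', token='YOUR_TOKEN')  # placeholder token
# Per-request headers are merged with, and override, the session's headers
request = requests.Request('GET', 'http://example.com/some_file.pdf',
                           headers={'Accept': 'application/pdf'})
prepared = session.prepare_request(request)  # nothing is sent over the network
print(prepared.headers['User-Agent'])  # MyApp/1.0
print(prepared.headers['Accept'])      # application/pdf
```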

Handling Cookies

Cookies are often required when you need to maintain state or a session with the server:

import requests

url = 'http://example.com/file'
cookies = {'session_token': '123456789'}

response = requests.get(url, cookies=cookies)

The cookies parameter accepts a plain dict (or a cookie jar) and sends those cookies with this single request only; they are not remembered for later calls.
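To keep cookies across several downloads, a Session maintains a cookie jar: cookies the server sets with Set-Cookie are stored in session.cookies automatically, and you can add your own with session.cookies.set(). The sketch below reuses the example's session_token value and inspects the prepared request rather than contacting example.com. Without a domain= argument the cookie is sent to every host; session.cookies.set() also accepts domain= and path= to restrict it.

```python
import requests

with requests.Session() as session:
    # Stored in the session's cookie jar and sent with every later request
    session.cookies.set('session_token', '123456789')
    prepared = session.prepare_request(requests.Request('GET', 'http://example.com/file'))
    print(prepared.headers['Cookie'])  # session_token=123456789
```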

Conclusion

This tutorial covered how to download files in Python using the requests module, including basic file downloads, streaming large files, error handling, and additional features like progress indicators and sessions. You now have a solid foundation to incorporate file downloads into your Python programs with ease and efficiency.
