Sling Academy

Python requests module: How to download files from URLs

Last updated: January 02, 2024

Overview

In this tutorial, we’ll explore how to use the Python requests module to download files from the Internet. We’ll cover everything from basic file downloads to handling large files and error checking.

Getting Started

Before we can start downloading files, we need to install the requests module. If you haven’t already, you can install it using pip:

pip install requests

Now that we have requests installed, let’s start with a basic example.

Simple File Download

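import requests

url = 'http://example.com/some_file.pdf'
r = requests.get(url, timeout=30)  # fail instead of hanging forever
r.raise_for_status()  # stop on 4xx/5xx so an error page isn't saved as the PDF

with open('some_file.pdf', 'wb') as f:
    f.write(r.content)  # r.content holds the whole body as bytes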

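This code downloads a PDF file and saves it to the local file system. The ‘wb’ mode passed to open() stands for ‘write binary’. It is required here because r.content is a bytes object, the right representation for PDFs and other non-text files. Note that r.content loads the entire response into memory, which is fine for small files.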
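To see why the mode matters, here is a small standard-library sketch that needs no network access. The hypothetical payload bytes stand in for r.content, and the helper names save_binary and save_text are illustrative only. Binary mode writes the bytes unchanged, while text mode (‘w’) accepts only str and raises a TypeError.

```python
import os
import tempfile

# Hypothetical stand-in for r.content: requests exposes the body as bytes.
payload = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def save_binary(path: str, data: bytes) -> int:
    """Write bytes unchanged in binary mode and return the byte count."""
    with open(path, "wb") as f:
        return f.write(data)


def save_text(path: str, data: bytes) -> str:
    """Try text mode; return the TypeError message it raises for bytes."""
    try:
        with open(path, "w") as f:
            f.write(data)  # text-mode files accept str only
    except TypeError as exc:
        return str(exc)
    return ""


folder = tempfile.mkdtemp()
binary_path = os.path.join(folder, "some_file.pdf")
text_path = os.path.join(folder, "some_file_text.pdf")

written = save_binary(binary_path, payload)
error = save_text(text_path, payload)
print(written)  # number of bytes written
print(error)    # the TypeError message raised by text mode
```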

Streaming Large Files

To download large files without consuming too much memory, you can stream the file and write it in chunks:

import requests

url = 'http://example.com/big_file.zip'

with requests.get(url, stream=True, timeout=30) as r:
    r.raise_for_status()
    with open('big_file.zip', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

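With stream=True, requests returns as soon as the response headers arrive and reads the body only when you ask for it, so the whole file is never held in memory at once. The iter_content method then yields the body in pieces of roughly chunk_size bytes (8 KiB here), and each piece is written to disk as it arrives. Using the response in a with statement releases the connection when the download finishes.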
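The chunked loop can be pulled into a small reusable helper. In this sketch the name save_chunks is hypothetical; it accepts any iterable of byte chunks, so you can pass r.iter_content(chunk_size=8192) straight in, and it only ever holds one chunk at a time. The empty-chunk check is a defensive habit seen in many examples rather than something requests documents as necessary.

```python
import os
import tempfile
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to path one at a time; return the total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if not chunk:  # defensively skip empty chunks
                continue
            total += f.write(chunk)
    return total


# With requests (needs network access):
#     with requests.get(url, stream=True, timeout=30) as r:
#         r.raise_for_status()
#         save_chunks(r.iter_content(chunk_size=8192), 'big_file.zip')

# Offline demo: a list of byte strings stands in for r.iter_content(...).
demo_path = os.path.join(tempfile.mkdtemp(), "big_file.zip")
demo_total = save_chunks([b"abc", b"", b"defg"], demo_path)
print(demo_total)  # 7
```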

Error Handling

When downloading files, it’s crucial to handle errors gracefully. Here’s how you can do that:

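import requests
from requests.exceptions import HTTPError, RequestException

url = 'http://example.com/some_file.pdf'
try:
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')
except RequestException as err:
    # connection errors, timeouts, invalid URLs, etc.
    print(f'An error occurred: {err}')
else:
    with open('some_file.pdf', 'wb') as f:
        f.write(r.content)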

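This script prints an error message instead of crashing your application, whether the server returns an error status (4xx or 5xx) or the request itself fails, for example because of a connection problem. Since HTTPError is a subclass of the broader exception caught in the second clause, the HTTPError clause must come first.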
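To see which message the HTTPError branch would print, you can build a requests.Response by hand and call raise_for_status() on it, with no network request at all. Setting status_code, reason, and url directly is only a testing convenience, not something real download code does, and the helper names check_status and fake_response are hypothetical.

```python
import requests
from requests.exceptions import HTTPError


def check_status(response: requests.Response) -> str:
    """Return the HTTPError message for 4xx/5xx responses, or 'ok'."""
    try:
        response.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
    except HTTPError as http_err:
        return f"HTTP error occurred: {http_err}"
    return "ok"


def fake_response(status: int, reason: str) -> requests.Response:
    """Build a Response object offline, purely for illustration."""
    r = requests.Response()
    r.status_code = status
    r.reason = reason
    r.url = "http://example.com/some_file.pdf"
    return r


print(check_status(fake_response(200, "OK")))         # ok
print(check_status(fake_response(404, "Not Found")))  # HTTP error occurred: 404 Client Error: ...
```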

Downloading Files with Progress Indicator

For a better user experience, especially when dealing with large file downloads, you can include a progress indicator:

import requests
from tqdm import tqdm  # third-party package: pip install tqdm

url = 'http://example.com/large_file.avi'

with requests.get(url, stream=True, timeout=30) as response:
    response.raise_for_status()
    # content-length may be absent, in which case total is 0
    total = int(response.headers.get('content-length', 0))

    with open('large_file.avi', 'wb') as file, tqdm(
            desc='large_file.avi',
            total=total,
            unit='iB',
            unit_scale=True,
            unit_divisor=1024,
    ) as bar:
        for data in response.iter_content(chunk_size=1024):
            size = file.write(data)
            bar.update(size)  # advance by the number of bytes written

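In this code, tqdm (a separate package, installed with pip install tqdm) draws the progress bar, using the content-length header as the total size. Each bar.update() call advances the bar by the number of bytes just written, so a larger chunk_size gives fewer, bigger updates and a smaller one gives finer-grained progress.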
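If you would rather not depend on tqdm, a bare-bones indicator only needs the running byte count and the content-length total. This sketch (the function name copy_with_progress is hypothetical) prints a percentage when the total is known and just the byte count otherwise, since servers are not required to send content-length.

```python
import io
import sys
from typing import BinaryIO, Iterable, TextIO


def copy_with_progress(chunks: Iterable[bytes], dest: BinaryIO, total: int,
                       out: TextIO = sys.stderr) -> int:
    """Write chunks to dest, printing progress to out; return bytes written."""
    done = 0
    for chunk in chunks:
        done += dest.write(chunk)
        if total:  # content-length known: show a percentage
            out.write(f"\r{done * 100 // total:3d}% ({done}/{total} bytes)")
        else:      # size unknown: show the running byte count only
            out.write(f"\r{done} bytes")
        out.flush()
    out.write("\n")
    return done


# Offline demo: in-memory buffers stand in for the file and the terminal.
# With requests you would pass response.iter_content(chunk_size=8192) and
# total=int(response.headers.get('content-length', 0)).
target = io.BytesIO()
log = io.StringIO()
copied = copy_with_progress([b"x" * 512, b"y" * 512], target, total=1024, out=log)
print(copied)  # 1024
```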

Handling Redirects

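Sometimes the URL you request redirects to another URL. By default, requests.get() follows redirects automatically, and response.history holds the intermediate redirect responses. To turn this off, pass allow_redirects=False:

import requests

url = 'http://example.com/some_file.pdf'
response = requests.get(url, allow_redirects=False)

With allow_redirects=False, requests stops at the first response, so a redirect comes back as a 3xx status code with the new address in its Location header, and handling it is up to you. Note that requests.head() works the other way round: it does not follow redirects unless you pass allow_redirects=True.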
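When you handle redirects yourself, each hop's target comes from the Location header, which may be a relative URL, so it should be resolved against the current URL with urllib.parse.urljoin. The sketch below is meant for plain GET downloads; the function follow_manually and its max_hops limit are hypothetical choices. It works with any object that has a requests-style get() method, such as a requests.Session, and relies on Response.is_redirect, which requests sets for redirect status codes that carry a Location header. The Fake classes are stand-ins so the demo runs offline.

```python
from urllib.parse import urljoin


def follow_manually(session, url, max_hops=5):
    """Follow redirects one hop at a time; return (final_response, visited_urls).

    session: anything with a requests-style get(), e.g. requests.Session().
    """
    visited = [url]
    response = session.get(url, allow_redirects=False)
    while response.is_redirect:
        if len(visited) > max_hops:
            raise RuntimeError(f"more than {max_hops} redirects")
        # Location may be relative ('/files/a.pdf'), so resolve it against url.
        url = urljoin(url, response.headers["Location"])
        visited.append(url)
        response = session.get(url, allow_redirects=False)
    return response, visited


# Offline demo: hypothetical stand-ins for requests.Session and Response.
class FakeResponse:
    def __init__(self, status_code, location=None):
        self.status_code = status_code
        self.headers = {"Location": location} if location else {}
        self.is_redirect = location is not None


class FakeSession:
    def __init__(self, routes):
        self.routes = routes

    def get(self, url, allow_redirects=True):
        return self.routes[url]


demo = FakeSession({
    "http://example.com/some_file.pdf": FakeResponse(301, "/files/some_file.pdf"),
    "http://example.com/files/some_file.pdf": FakeResponse(200),
})
final, hops = follow_manually(demo, "http://example.com/some_file.pdf")
print(final.status_code, hops)
```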

Session Objects for Efficiency

When downloading multiple files from the same website, it might be more efficient to use a session object, which reuses the underlying TCP connection:

import requests

urls = ['http://example.com/file1.pdf', 'http://example.com/file2.pdf']

with requests.Session() as session:
    for url in urls:
        with session.get(url, timeout=30) as response:
            response.raise_for_status()  # don't save error pages to disk
            filename = url.split('/')[-1]
            with open(filename, 'wb') as f:
                f.write(response.content)

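Reusing a connection skips the repeated TCP (and, for HTTPS, TLS) handshakes, so this is usually faster than calling requests.get() separately for each file, especially when downloading many small files from the same host.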
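The loop above names each file with url.split('/')[-1], which keeps any query string (file1.pdf?token=abc becomes the filename) and returns an empty string for a root URL such as http://example.com/. A slightly sturdier sketch parses the URL path first. The helper name filename_from_url and the download.bin fallback are hypothetical choices, and a server-supplied Content-Disposition header, when present, may be a better source for the name.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Return the last path segment of url, or default if there is none."""
    path = urlsplit(url).path                  # query string and fragment dropped
    name = PurePosixPath(unquote(path)).name   # e.g. 'my%20file.pdf' -> 'my file.pdf'
    if name in ("", ".", ".."):                # root URL or unusable segment
        return default
    return name


for url in ["http://example.com/file1.pdf?token=abc", "http://example.com/"]:
    print(filename_from_url(url))  # file1.pdf, then download.bin
```

In the session loop you would then write filename = filename_from_url(url).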

Setting Custom Headers

In some cases, you may need to send custom headers with your request:

import requests

url = 'http://example.com/some_file.pdf'
headers = {
    'User-Agent': 'MyApp/1.0',
    'Authorization': 'Bearer '  # the access token goes after 'Bearer '
}

response = requests.get(url, headers=headers)

Custom headers are used for various purposes, such as authentication or simulating a particular browser.
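If every request to a site needs the same headers, you can set them once on a Session; headers passed to an individual request are merged on top and win on conflicts. This sketch uses Session.prepare_request to inspect the merged headers without sending anything; the Accept: application/pdf value is only illustrative.

```python
import requests

session = requests.Session()
# Sent with every request made through this session.
session.headers.update({"User-Agent": "MyApp/1.0"})

# Per-request headers are merged with (and override) the session's headers.
request = requests.Request(
    "GET",
    "http://example.com/some_file.pdf",
    headers={"Accept": "application/pdf"},
)
prepared = session.prepare_request(request)  # builds the request, no network I/O

print(prepared.headers["User-Agent"])  # MyApp/1.0
print(prepared.headers["Accept"])      # application/pdf

# In real code you would simply call:
#     session.get(url, headers={"Accept": "application/pdf"}, timeout=30)
```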

Handling Cookies

Cookies are often required when you need to maintain state or a session with the server:

import requests

url = 'http://example.com/file'
cookies = {'session_token': '123456789'}

response = requests.get(url, cookies=cookies)

The cookies parameter allows you to send cookies with your HTTP request.
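A requests.Session keeps cookies in session.cookies and sends them on later requests, including cookies the server sets through Set-Cookie responses, so you don't have to pass them every time. The sketch below sets the cookie from the example by hand and uses prepare_request to show the resulting Cookie header without contacting a server.

```python
import requests

session = requests.Session()
# Cookies in the session's jar are attached to every matching request.
session.cookies.set("session_token", "123456789")

prepared = session.prepare_request(requests.Request("GET", "http://example.com/file"))
print(prepared.headers.get("Cookie"))  # session_token=123456789

# Cookies the server sets on a real response are stored in the same jar:
#     response = session.get('http://example.com/file', timeout=30)
#     print(session.cookies.get_dict())
```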

Conclusion

This tutorial covered how to download files in Python with the requests module: basic downloads, streaming large files, error handling, progress indicators, redirects, sessions, custom headers, and cookies. You now have a solid foundation for adding file downloads to your Python programs.
