Overview
In this tutorial, we’ll explore how to use the Python requests module to download files from the Internet. We’ll cover everything from basic file downloads to handling large files and error checking.
Getting Started
Before we can start downloading files, we need to install the requests module. If you haven’t already, you can install it using pip:
pip install requests
Now that we have requests installed, let’s start with a basic example.
Simple File Download
import requests

url = 'http://example.com/some_file.pdf'
r = requests.get(url)

# r.content holds the whole response body as bytes
with open('some_file.pdf', 'wb') as f:
    f.write(r.content)
This code downloads a PDF file and saves it to the local file system. The 'wb' in the open function stands for 'write binary', which is necessary for non-text files like PDFs.
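The example above hard-codes the output filename. As a minimal sketch, the name could instead be derived from the path of the URL. The helper name filename_from_url and the fallback name download.bin are assumptions made for this illustration and are not part of requests.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of `url`, or `default` if there is none.

    Hypothetical helper for illustration; it only looks at the URL itself.
    """
    path = urlsplit(url).path  # the path excludes the query string and fragment
    name = PurePosixPath(unquote(path)).name  # decode %20 etc., keep last segment
    # An empty name (bare domain) or '..' is not a usable filename
    return name if name not in ("", "..") else default
```

With this helper, open(filename_from_url(url), 'wb') could replace the hard-coded 'some_file.pdf'. Servers may also suggest a name through the Content-Disposition header, which this sketch ignores.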
Streaming Large Files
To download large files without consuming too much memory, you can stream the file and write it in chunks:
import requests

url = 'http://example.com/big_file.zip'

with requests.get(url, stream=True) as r:
    r.raise_for_status()  # fail early on a 4xx/5xx response
    with open('big_file.zip', 'wb') as f:
        # read and write the body 8192 bytes at a time
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
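By setting stream=True, you tell requests to defer downloading the response body until you read it, instead of loading the whole file into memory first. The iter_content method then yields the body in chunks of up to chunk_size bytes, which we write to disk one at a time.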
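The streaming pattern above can be wrapped in a reusable function. The following is a sketch under a few assumptions: the name download_file, its parameters, and the 30-second timeout are choices made for this example, not something requests prescribes.

```python
import requests


def download_file(url, dest_path, chunk_size=8192, timeout=30):
    """Stream `url` to `dest_path` and return the number of bytes written.

    Hypothetical helper; the timeout value is an arbitrary example.
    """
    written = 0
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()  # raise HTTPError on 4xx/5xx before creating the file
        with open(dest_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                written += f.write(chunk)  # write() returns the byte count
    return written
```

A call such as download_file('http://example.com/big_file.zip', 'big_file.zip') would then replace the inline loop. Returning the byte count makes it easy to compare against an expected size afterwards.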
Error Handling
When downloading files, it’s crucial to handle errors gracefully. Here’s how you can do that:
import requests
from requests.exceptions import HTTPError

url = 'http://example.com/some_file.pdf'

try:
    r = requests.get(url)
    r.raise_for_status()  # raises HTTPError for 4xx and 5xx responses
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')
except Exception as err:
    # connection failures, timeouts, and anything else
    print(f'An error occurred: {err}')
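This script prints an error message if something goes wrong instead of crashing your application. The raise_for_status call raises HTTPError for 4xx and 5xx status codes, and the second except clause catches everything else, such as connection failures.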
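Catching Exception hides what actually failed. A somewhat finer-grained sketch could distinguish the exception classes that requests defines; the function name fetch, its return shape, and the 10-second timeout are assumptions made for this example.

```python
import requests


def fetch(url, timeout=10):
    """Return (content, error_message); exactly one of the two is None.

    Hypothetical helper; the timeout value is an arbitrary example.
    """
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
    except requests.exceptions.HTTPError as err:
        return None, f"HTTP error: {err}"
    except requests.exceptions.Timeout:
        # Checked before ConnectionError because a connect timeout is both
        return None, f"Timed out after {timeout} seconds"
    except requests.exceptions.ConnectionError as err:
        return None, f"Connection failed: {err}"
    except requests.exceptions.RequestException as err:
        # Base class for the remaining requests errors (e.g. an invalid URL)
        return None, f"Request failed: {err}"
    return r.content, None
```

Returning the message instead of printing it leaves the caller free to log it, retry, or show it to the user.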
Downloading Files with Progress Indicator
For a better user experience, especially when dealing with large file downloads, you can include a progress indicator:
import requests
from tqdm import tqdm

url = 'http://example.com/large_file.avi'
response = requests.get(url, stream=True)
# Content-Length may be missing, in which case the total falls back to 0
total = int(response.headers.get('content-length', 0))

with open('large_file.avi', 'wb') as file, tqdm(
    desc='large_file.avi',
    total=total,
    unit='iB',
    unit_scale=True,
    unit_divisor=1024,
) as bar:
    for data in response.iter_content(chunk_size=1024):
        size = file.write(data)
        bar.update(size)  # advance the bar by the number of bytes written
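In this code, we use tqdm (a third-party package, installed with pip install tqdm) to create a progress bar. The total comes from the Content-Length header of the response. Adjust the chunk_size if you want larger or smaller updates.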
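If installing tqdm is not an option, a plain-text indicator can be built from the same two numbers, the bytes written so far and the total size. The sketch below uses only the standard library; the names format_progress and with_progress are hypothetical and exist only for this illustration.

```python
def format_progress(done, total):
    """Return a one-line status; a `total` of 0 means the size is unknown."""
    if total > 0:
        return f"{done / total:.1%} ({done}/{total} bytes)"
    return f"{done} bytes (total size unknown)"


def with_progress(chunks, total, report=print):
    """Yield each chunk unchanged, calling `report` with a status line first.

    `chunks` can be any iterable of bytes objects, such as the result of
    response.iter_content(chunk_size=1024).
    """
    done = 0
    for chunk in chunks:
        done += len(chunk)
        report(format_progress(done, total))
        yield chunk
```

In the download loop, iterating over with_progress(response.iter_content(chunk_size=1024), total) and writing each chunk to the file would print one status line per chunk.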
Handling Redirects
Sometimes the URL you request redirects to another URL. By default, requests follows redirects for GET requests. Passing allow_redirects=True, as below, only makes that default explicit:
response = requests.get(url, allow_redirects=True)
If you want to handle redirects manually, set allow_redirects=False. requests will then return the first response without following its Location header.
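To see what happened during a redirect, the response object can be inspected. The sketch below shows both modes; the function names redirect_chain and first_hop and the 10-second timeout are assumptions made for this example, while history, url, status_code, and headers are attributes of the requests response object.

```python
import requests


def redirect_chain(url, timeout=10):
    """Follow redirects and return [(status_code, url), ...] for every hop.

    The last entry describes the final response.
    """
    response = requests.get(url, timeout=timeout)  # redirects followed by default
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def first_hop(url, timeout=10):
    """Request `url` without following redirects.

    Returns (status_code, Location header or None).
    """
    response = requests.get(url, allow_redirects=False, timeout=timeout)
    return response.status_code, response.headers.get("Location")
```

A relative Location value has to be resolved against the original URL before it is requested manually; urllib.parse.urljoin can do that.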
Session Objects for Efficiency
When downloading multiple files from the same website, it might be more efficient to use a session object, which reuses the underlying TCP connection:
import requests

urls = ['http://example.com/file1.pdf', 'http://example.com/file2.pdf']

# One session reuses the same connection for requests to the same host
with requests.Session() as session:
    for url in urls:
        with session.get(url) as response:
            filename = url.split('/')[-1]  # e.g. 'file1.pdf'
            with open(filename, 'wb') as f:
                f.write(response.content)
This approach is generally faster and more efficient than creating a new connection for each file.
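The session example reads each file fully into memory through response.content. As a sketch, the session can be combined with the streaming technique from earlier so that large files are also handled; the function name download_all, the dest_dir parameter, and the timeout value are assumptions made for this example.

```python
import os

import requests


def download_all(urls, dest_dir=".", chunk_size=8192, timeout=30):
    """Download each URL over one shared session and return the saved paths.

    Hypothetical helper; filenames are taken from the last URL segment,
    as in the example above.
    """
    saved = []
    with requests.Session() as session:
        for url in urls:
            path = os.path.join(dest_dir, url.split("/")[-1])
            with session.get(url, stream=True, timeout=timeout) as response:
                response.raise_for_status()
                with open(path, "wb") as f:
                    for chunk in response.iter_content(chunk_size=chunk_size):
                        f.write(chunk)
            saved.append(path)
    return saved
```

Because the function stops at the first failed download, a caller that wants to continue past errors would need to catch the exception per URL.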
Setting Custom Headers
In some cases, you may need to send custom headers with your request:
import requests

url = 'http://example.com/some_file.pdf'
headers = {
    'User-Agent': 'MyApp/1.0',
    'Authorization': 'Bearer '
}
response = requests.get(url, headers=headers)
Custom headers are used for various purposes, such as authentication or simulating a particular browser.
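When the same headers are needed for many downloads, they can be set once on a session instead of being passed to every call. The sketch below assumes a hypothetical helper make_session and a hypothetical token argument; session.headers and prepare_request are part of requests.

```python
import requests


def make_session(user_agent, token=None):
    """Return a session that sends the given headers with every request.

    Hypothetical helper; `token` stands in for whatever credential the
    server expects and is not a real value.
    """
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    if token is not None:
        session.headers["Authorization"] = f"Bearer {token}"
    return session


def preview_headers(session, url):
    """Return the headers that would be sent for a GET, without sending it."""
    prepared = session.prepare_request(requests.Request("GET", url))
    return dict(prepared.headers)
```

Calling preview_headers before the real request is a cheap way to confirm that the headers are what the server expects.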
Handling Cookies
Cookies are often required when you need to maintain a state or session with the server:
import requests
url = 'http://example.com/file'
cookies = {'session_token': '123456789'}
response = requests.get(url, cookies=cookies)
The cookies parameter allows you to send cookies with your HTTP request.
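Cookies passed through the cookies parameter apply to that one call only. To keep them across several downloads, they can be stored in a session's cookie jar instead. The sketch below assumes a hypothetical helper session_with_cookies; the cookie name and value are the placeholder ones from the example above.

```python
import requests


def session_with_cookies(cookies):
    """Return a session whose cookie jar is pre-filled from a dict.

    Hypothetical helper. Cookies set this way carry no domain, so the
    session sends them to every host it contacts.
    """
    session = requests.Session()
    session.cookies.update(cookies)
    return session


session = session_with_cookies({"session_token": "123456789"})
# Build the request without sending it, to inspect the Cookie header
prepared = session.prepare_request(
    requests.Request("GET", "http://example.com/file")
)
cookie_header = prepared.headers.get("Cookie")
```

A session also stores any cookies the server sets in its responses, so later requests made through the same session send them back automatically.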
Conclusion
This tutorial covered how to download files in Python using the requests module, including basic file downloads, streaming large files, error handling, and additional features like progress indicators and sessions. You now have a solid foundation to incorporate file downloads into your Python programs with ease and efficiency.