Introduction
Handling cookies correctly is essential for web scraping and automation tasks. This guide covers cookie handling with Python’s requests module, from the basics to more advanced scenarios.
Setting Basic Cookies
First, let’s look at setting cookies on a simple GET request. The requests module lets us send HTTP/1.1 requests from Python, and its cookies parameter accepts a plain dictionary of cookie names and values. Session objects, which persist cookies across requests, are covered in the next section.
import requests
url = 'http://example.com'
cookies = {'sample_cookie': 'cookie_value'}
response = requests.get(url, cookies=cookies)
print(response.cookies)
This code sends a single cookie named ‘sample_cookie’ with a value of ‘cookie_value’ and prints the cookies the server set in its response.
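To see what the cookies argument puts on the wire, the request can be prepared without being sent. The following is a minimal sketch, assuming only that requests is installed; it makes no network call, and the variable names are illustrative.

```python
import requests

# Build the same request as above, but prepare it instead of sending it
request = requests.Request(
    'GET',
    'http://example.com',
    cookies={'sample_cookie': 'cookie_value'},
)
prepared = request.prepare()

# The cookies dictionary is serialized into a single Cookie header
cookie_header = prepared.headers.get('Cookie')
print(cookie_header)
```

Inspecting the prepared request this way is a convenient check when a server appears to ignore a cookie.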
Using Session Objects
Session objects persist certain parameters, including cookies, across requests.
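import requests

with requests.Session() as session:
    # Store a cookie on the session, scoped to example.com and all paths
    session.cookies.set('session_cookie', 'session_value', domain='example.com', path='/')
    response = session.get('http://example.com')
    # Includes the cookie set above plus any cookies the server returned
    print(session.cookies)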
In the example above, a session cookie is set explicitly for ‘example.com’ and printed out after making a GET request to the server.
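A session's cookies are combined with any cookies passed on an individual request, and the per-request ones are not stored on the session. The sketch below illustrates this without contacting a server; the '/profile' path and the 'one_off' cookie are hypothetical names chosen for the example.

```python
import requests

session = requests.Session()
session.cookies.set('session_cookie', 'session_value', domain='example.com', path='/')

# Per-request cookies are merged with the session's cookies for this request only
request = requests.Request(
    'GET',
    'http://example.com/profile',  # hypothetical path, for illustration
    cookies={'one_off': 'temporary'},
)
prepared = session.prepare_request(request)
sent_header = prepared.headers.get('Cookie', '')
print(sent_header)

# The session jar itself still holds only the cookie stored on it
stored_names = [cookie.name for cookie in session.cookies]
print(stored_names)
```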
Extracting and Setting Cookies from Responses
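You can also extract cookies from a server’s response and use them for subsequent requests. Within a session, the requests module handles this automatically: cookies set by a response are stored on the session and sent with later requests, so copying them by hand is unnecessary.
import requests

with requests.Session() as session:
    first_response = session.get('http://example.com/login')
    # Cookies set by this response; the session has already stored them
    for c in first_response.cookies:
        print(c.name, c.value, c.domain, c.path)
    # The stored cookies are sent automatically with this request
    second_response = session.get('http://example.com/dashboard')
    # The session jar holds cookies accumulated across both requests
    print(session.cookies)
This inspects the cookies received from a login request and relies on the session to send them when accessing another page.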
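When cookies need to move between separate sessions, for example to reuse them in a later run, they can be flattened into a dictionary and loaded again. This is a rough sketch that uses a manually populated session as a stand-in for one that has received cookies from a server; the 'auth_token' cookie is hypothetical, and the conversion keeps only names and values.

```python
import requests
from requests.cookies import cookiejar_from_dict
from requests.utils import dict_from_cookiejar

# Stand-in for a session that has already received cookies from a server
first_session = requests.Session()
first_session.cookies.set('auth_token', 'abc123', domain='example.com', path='/')

# Flatten the jar into a name -> value dictionary (domain and path are dropped)
saved = dict_from_cookiejar(first_session.cookies)
print(saved)

# Load the saved values into a fresh session
second_session = requests.Session()
second_session.cookies = cookiejar_from_dict(saved)
restored_value = second_session.cookies.get('auth_token')
print(restored_value)
```

Because the domain and path are lost, the restored cookies are no longer scoped to their original site, which may matter if the new session talks to more than one host.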
Handling Cookie Domains and Paths
When manually setting cookies, it’s important to consider the domain and path parameters, as they determine when the cookie should be sent to the server.
import requests
from http.cookies import SimpleCookie

# A raw cookie string, e.g. copied from a Set-Cookie header
rawdata = 'PHPSESSID=q2t7ib3folu7bdujc6ui1qe016; path=/; domain=.example.com'
cookie = SimpleCookie()
cookie.load(rawdata)

session = requests.Session()
for key, morsel in cookie.items():
    # Carry over the parsed domain and path so the cookie keeps its scope
    session.cookies.set(morsel.key, morsel.value, domain=morsel['domain'], path=morsel['path'])

response = session.get('http://example.com')
print(response.cookies)
This code snippet uses the http.cookies module to parse a raw cookie string, then it sets these cookies into a session.
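The effect of domain and path can be observed by preparing requests for different URLs and checking whether a Cookie header is attached. The following sketch makes no network call; the 'scoped' cookie, the helper cookie_header_for, and the URLs are all hypothetical.

```python
import requests

session = requests.Session()
# Hypothetical cookie scoped to one domain and one path prefix
session.cookies.set('scoped', 'yes', domain='example.com', path='/account')


def cookie_header_for(url):
    """Return the Cookie header the session would send to url, or None."""
    prepared = session.prepare_request(requests.Request('GET', url))
    return prepared.headers.get('Cookie')


# Matching domain and path prefix: the cookie is included
print(cookie_header_for('http://example.com/account/settings'))
# Same domain, different path: the cookie is withheld
print(cookie_header_for('http://example.com/blog'))
# Different domain: the cookie is withheld
print(cookie_header_for('http://other.example.org/account'))
```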
Advanced Usage: Custom CookieJar
For fine-tuned control over cookies, perhaps across different domains, using a custom CookieJar can be useful.
from requests import Request, Session
from requests.cookies import RequestsCookieJar

# Build a jar holding one cookie scoped to example.com/dashboard
jar = RequestsCookieJar()
jar.set('cookie_name', 'cookie_value', domain='example.com', path='/dashboard')

s = Session()
# Attach the jar to this request only, then prepare and send it via the session
req = Request('GET', 'http://example.com/dashboard', cookies=jar)
prepped = s.prepare_request(req)
response = s.send(prepped)
print(response.text)
Here, a custom RequestsCookieJar is populated with a cookie and then used in a prepared request within a session, giving you more control over which cookies are included in the request.
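A jar that spans several domains can hold cookies with the same name, which affects how they are looked up. The sketch below, using hypothetical hosts and a hypothetical 'token' cookie, shows lookups narrowed by domain and the error raised when a lookup is ambiguous.

```python
from requests.cookies import CookieConflictError, RequestsCookieJar

jar = RequestsCookieJar()
# Two cookies share a name but belong to different (hypothetical) hosts
jar.set('token', 'alpha', domain='api.example.com', path='/')
jar.set('token', 'beta', domain='www.example.com', path='/')

# Passing domain disambiguates the lookup
api_token = jar.get('token', domain='api.example.com')
www_cookies = jar.get_dict(domain='www.example.com')
domains = sorted(jar.list_domains())

# Dictionary-style access cannot choose between the two and raises
try:
    jar['token']
    conflict = False
except CookieConflictError:
    conflict = True

print(api_token, www_cookies, domains, conflict)
```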
Secure and HttpOnly Cookies
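For security-focused applications, it’s crucial to handle Secure and HttpOnly cookies properly. These attributes are normally set by the server in its Set-Cookie header. The requests module records them when it receives a cookie and honors Secure by sending such cookies only over HTTPS; HttpOnly restricts script access in browsers and has no effect on what requests sends.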
import requests

url = 'https://secure.example.com'
response = requests.get(url)
for cookie in response.cookies:
    if cookie.secure:
        print('Secure cookie:', cookie.name)
    # HttpOnly is kept as a nonstandard attribute, named as the server sent it
    if cookie.has_nonstandard_attr('HttpOnly') or cookie.has_nonstandard_attr('httponly'):
        print('HttpOnly cookie:', cookie.name)
This code will iterate over cookies from a secured server response and identify the Secure and HttpOnly cookies.
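How the cookie jar treats the Secure flag can be checked without a server by marking a locally created cookie as secure and preparing requests over both schemes. This is a sketch under the assumption that the secure keyword is passed through to the underlying cookie object; the 'sid' cookie and the helper cookie_header_for are hypothetical.

```python
import requests

session = requests.Session()
# Locally created cookie flagged as Secure (name and value are illustrative)
session.cookies.set('sid', 'secret', domain='example.com', path='/', secure=True)


def cookie_header_for(url):
    """Return the Cookie header the session would send to url, or None."""
    prepared = session.prepare_request(requests.Request('GET', url))
    return prepared.headers.get('Cookie')


# Plain HTTP: the secure cookie is withheld
over_http = cookie_header_for('http://example.com/')
# HTTPS: the secure cookie is included
over_https = cookie_header_for('https://example.com/')
print(over_http)
print(over_https)
```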
Conclusion
This guide provided a comprehensive dive into setting cookies while making HTTP requests using Python’s requests module. We touched upon the basics, session objects, advanced custom CookieJars, and finished up by considering security aspects of handling cookies. This knowledge lays the groundwork for tackling a wide range of web interaction tasks efficiently and securely.