Introduction
Python’s requests module is a powerful tool for making HTTP and HTTPS requests from Python. However, server owners might need to block these requests to prevent scraping, limit bot activity, or maintain server health. In this tutorial, we’ll explore ways to detect and block requests made by this popular module.
Identifying Requests
First off, we should understand how to identify requests coming from the requests module. Typically, this can be done by checking the User-Agent header that accompanies HTTP requests. The default User-Agent for requests is something like 'python-requests/x.y.z', where x.y.z is the installed version.
def is_python_requests(request):
    # The header may be absent entirely, so fall back to an empty string
    user_agent = request.headers.get('User-Agent') or ''
    return 'python-requests' in user_agent
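To see the check in action, here is a quick sketch with a minimal stand-in request object (MockRequest is hypothetical; in practice you would receive your web framework’s request object, which exposes headers the same way):

```python
class MockRequest:
    """Minimal stand-in for a framework request object."""
    def __init__(self, headers):
        self.headers = headers

def is_python_requests(request):
    # The header may be absent entirely, so fall back to an empty string
    user_agent = request.headers.get('User-Agent') or ''
    return 'python-requests' in user_agent

print(is_python_requests(MockRequest({'User-Agent': 'python-requests/2.31.0'})))  # True
print(is_python_requests(MockRequest({'User-Agent': 'Mozilla/5.0'})))             # False
print(is_python_requests(MockRequest({})))                                        # False
```

Note that this only catches clients using the default User-Agent; anything that spoofs the header will pass.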
Blocking by User-Agent
Once you’ve identified the requests, you can block them based on the User-Agent header. On the server side, configure your web server (e.g., Nginx or Apache) to deny access to User-Agents that include ‘python-requests’.
# Example configuration snippet for Nginx
map $http_user_agent $block_bad_agents {
    default 0;
    ~*python-requests 1;
}
server {
    if ($block_bad_agents) {
        return 403;
    }
    ...
}
In Apache, you could use the mod_rewrite module to achieve a similar effect.
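For instance, a minimal mod_rewrite rule along these lines (placed in a vhost or an .htaccess file, assuming AllowOverride permits it) would return 403 Forbidden to matching clients:

```apache
RewriteEngine On
# Case-insensitive match on the User-Agent header
RewriteCond %{HTTP_USER_AGENT} python-requests [NC]
# Serve 403 Forbidden and stop processing further rules
RewriteRule .* - [F,L]
```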
Rate Limiting
A more nuanced approach is to rate-limit requests from suspicious User-Agents. Tools like fail2ban or web application firewalls (WAFs) offer ways to automatically add rate limits or temporary bans on offending IP addresses.
# Example of a fail2ban filter (e.g. filter.d/python-requests.conf),
# assuming a standard combined-format access log; fail2ban needs a
# <HOST> capture to know which IP to act on
[Definition]
failregex = ^<HOST> .*"python-requests[^"]*"
...
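A filter only matches log lines; to actually act on matches, you also pair it with a jail. A sketch along these lines (the filter name, log path, and thresholds are assumptions to adapt to your setup) would ban an IP for an hour after ten matches within a minute:

```ini
# /etc/fail2ban/jail.local
[python-requests]
enabled  = yes
filter   = python-requests
logpath  = /var/log/nginx/access.log
maxretry = 10
findtime = 60
bantime  = 3600
```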
Advanced Techniques: Behavioral Analysis
If the client changes their User-Agent to avoid detection, you may need to analyze the behavior of requests over a period and look for patterns typical of automated scripts — consistent intervals between requests, for example, or a high number of requests in a short period.
class RequestBehavior:
    def __init__(self):
        self.request_times = []

    def log_request(self, time_of_request):
        self.request_times.append(time_of_request)

    def is_suspicious(self):
        # Implement analysis of self.request_times
        ...
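As one possible heuristic for such an analysis (the thresholds and the interval-variance approach here are illustrative assumptions, not the only option), near-constant spacing between requests can be flagged as bot-like:

```python
import statistics

class RequestBehavior:
    def __init__(self):
        self.request_times = []

    def log_request(self, time_of_request):
        self.request_times.append(time_of_request)

    def is_suspicious(self, min_requests=10, max_stdev=0.05):
        # Too few data points to judge either way
        if len(self.request_times) < min_requests:
            return False
        # Gaps between consecutive requests
        intervals = [b - a for a, b in
                     zip(self.request_times, self.request_times[1:])]
        # Near-constant spacing suggests an automated client
        return statistics.stdev(intervals) < max_stdev

# A bot hitting the server exactly once per second
bot = RequestBehavior()
for t in range(20):
    bot.log_request(float(t))
print(bot.is_suspicious())  # True

# A human with irregular gaps between requests
human = RequestBehavior()
for t in [0, 1, 5, 6, 12, 13, 20, 21, 30, 31]:
    human.log_request(float(t))
print(human.is_suspicious())  # False
```

In production you would likely key one such tracker per client IP and combine this signal with others (request volume, paths visited) rather than rely on it alone.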
IP-based Blocking
Another method of blocking unwanted requests is by IP address. Firewalls and server configurations can use IP blocklists to deny ranges of addresses known to be associated with scrapers or bots.
# IP-based blocking using iptables
sudo iptables -A INPUT -s 123.45.67.89 -j DROP
Using a CAPTCHA
Implementing a CAPTCHA on your site is a common and effective way of filtering out bots. This won’t block the requests per se, but it will prevent automated access to areas of your site by requiring user interaction.
<form method="POST">
<!-- Include CAPTCHA here -->
</form>
Conclusion
In this tutorial, we explored multiple methods to detect and block requests coming from Python’s requests module. These strategies help maintain server health against unwanted scraping or automated traffic. Remember to balance the strength of these measures against the legitimate need for bot activity, such as search engine indexing.