Apache: How to deny requests by user agent

Updated: January 20, 2024 By: Guest Contributor

Introduction

Denying requests based on user agent is an important technique for improving website security, managing server load, or simply directing web traffic more effectively. In this tutorial, we will explore how to deny user agents using the Apache HTTP Server. From using .htaccess rules to more advanced configurations within virtual host setups, we’ll cover methods suited to various levels of expertise.

Understanding .htaccess

The .htaccess file is a powerful mechanism that allows decentralized, per-directory management of Apache configuration. Before we jump into the code examples, it’s crucial to ensure that your Apache configuration allows .htaccess files to be used. The following snippet shows how to enable .htaccess files within your httpd.conf or apache2.conf:

<Directory /var/www/example.com>
    AllowOverride All
</Directory>
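Granting AllowOverride All is convenient but broader than necessary. A narrower setting, sketched below, would be enough for the directives used in this tutorial; this assumes your .htaccess files rely only on mod_rewrite, mod_setenvif, and the access-control directives shown here.

<Directory /var/www/example.com>
    # FileInfo enables mod_rewrite and SetEnvIf directives in .htaccess,
    # Limit enables Order/Allow/Deny, AuthConfig enables Require-based rules
    AllowOverride FileInfo Limit AuthConfig
</Directory>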

Deny User Agents with .htaccess

Let’s start with the most straightforward example: denying access to a particular user agent using the .htaccess file.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} badbot
RewriteRule ^ - [F,L]

This code checks for a user agent containing the string ‘badbot’ and denies access with a 403 Forbidden response.
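User agent strings vary in capitalization, so a bot might identify itself as ‘BadBot’ rather than ‘badbot’. A minimal variation that catches either form adds the [NC] (no-case) flag to the condition:

RewriteEngine On
# [NC] makes the pattern match case-insensitively
RewriteCond %{HTTP_USER_AGENT} badbot [NC]
RewriteRule ^ - [F,L]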

Using the ‘SetEnvIf’ Directive

The SetEnvIf directive allows you to set environment variables based on characteristics of the request, like the user agent. Here’s an example:

SetEnvIf User-Agent ^badbot block_user
Order allow,deny
Allow from all
Deny from env=block_user

In this configuration, we set the environment variable ‘block_user’ whenever the user agent starts with ‘badbot’, and then deny access to any request carrying that variable.
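Note that Order, Allow, and Deny are the Apache 2.2 access-control directives, available on 2.4 only through mod_access_compat. On Apache 2.4 with mod_authz_core, a roughly equivalent configuration might look like this sketch:

SetEnvIf User-Agent ^badbot block_user
<RequireAll>
    # Allow everyone except requests tagged with block_user
    Require all granted
    Require not env block_user
</RequireAll>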

Advanced Blocking Techniques

For advanced users, more complex blocking can be applied.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^badbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^evilbot
RewriteRule ^ - [F,L]

Here, we deny access to any requests coming from user agents beginning with either ‘badbot’ or ‘evilbot’.
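The same effect can usually be achieved with a single condition by using regex alternation; in this sketch ‘scrapybot’ is just a placeholder for a third agent you might want to block:

RewriteEngine On
# One pattern covering several user agents, matched case-insensitively
RewriteCond %{HTTP_USER_AGENT} ^(badbot|evilbot|scrapybot) [NC]
RewriteRule ^ - [F,L]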

Creating a User Agent Blacklist

A blacklist file lets you manage a list of denied user agents in one central location.

RewriteEngine On
# Note: RewriteMap is only valid in the server or virtual host configuration, not in .htaccess
RewriteMap badbots txt:/etc/apache2/badbots.txt
# Look up the user agent in the map; if an entry is found, deny the request
RewriteCond ${badbots:%{HTTP_USER_AGENT}|NOT_FOUND} !NOT_FOUND
RewriteRule ^ - [F,L]

This configuration uses a RewriteMap to look up the request’s user agent in ‘badbots.txt’; if the lookup returns anything other than the NOT_FOUND default, access is denied.
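The map file itself is plain text with one entry per line: a lookup key followed by a value, separated by whitespace (which also means a key cannot contain spaces). A hypothetical /etc/apache2/badbots.txt might look like this:

# key            value
badbot/1.0       deny
evilbot          deny
scraperbot/2.1   deny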

Deny New Rows

A technique for denying abusive clients or bots that repeatedly hit a particular set of URLs, in this case paths containing ‘/new-row’:

RewriteEngine On
# As above, RewriteMap must be declared in the server or virtual host configuration
RewriteMap leeches txt:/etc/apache2/newrowsleechees.txt
RewriteCond %{REQUEST_URI} /new-row
# Deny the request if the client IP appears in the map
RewriteCond ${leeches:%{REMOTE_ADDR}|NOT_FOUND} !NOT_FOUND
RewriteRule .* - [F]

This denies access when the client’s IP address is listed in ‘newrowsleechees.txt’ and the requested URL contains ‘/new-row’.
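As with the user agent blacklist, the IP map is a plain whitespace-separated key/value file. A hypothetical /etc/apache2/newrowsleechees.txt, using documentation-range addresses, might look like:

# client IP       value
203.0.113.15      deny
198.51.100.7      deny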

Handling False Positives

Blocking user agents can sometimes result in false positives, where legitimate users or crawlers are denied access. The usual remedy is a negated condition that exempts known good agents from the block:

RewriteEngine On
# Caution: virtually every modern browser sends "Mozilla/..." so narrow this pattern in production
RewriteCond %{HTTP_USER_AGENT} ^Mozilla
# The negated condition exempts Slurp and bingbot from the block
RewriteCond %{HTTP_USER_AGENT} !(Slurp|bingbot)
RewriteRule ^ - [F,L]

This rule denies access to user agents beginning with ‘Mozilla’ unless they also contain ‘Slurp’ or ‘bingbot’; the negated second condition acts as an exception list so those crawlers are never blocked. Since ‘^Mozilla’ also matches ordinary browsers, tighten the first pattern before using this in production.
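Another way to reduce false positives is to write the exception first and keep the blocking pattern narrow. The sketch below blocks agents matching some generic bot keywords while exempting a few well-known crawlers; the keyword list is purely illustrative and should be adapted to the traffic you actually see.

RewriteEngine On
# Exempt well-known crawlers from the block
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|bingbot|Slurp) [NC]
# Block generic bot-like identifiers (illustrative keywords only)
RewriteCond %{HTTP_USER_AGENT} (crawler|spider|scraper) [NC]
RewriteRule ^ - [F,L]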

Logging Blocks

To debug and monitor blocks, enabling logging can be helpful:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} badbot
# The E flag tags the denied request with the 'block_user' environment variable
RewriteRule ^ - [F,L,E=block_user:1]

# ErrorLog and CustomLog are only valid in the server or virtual host configuration, not .htaccess
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined env=block_user

Because the RewriteRule tags denied requests with ‘block_user’, the conditional CustomLog line writes only those requests to the access log.
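If you would rather keep denied requests out of the main access log entirely, a separate conditional log can be declared in the virtual host. This sketch assumes ‘block_user’ is set as shown above and that ‘blocked.log’ is simply a name of your choosing:

<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example.com

    # Only requests tagged with block_user are written to this log
    CustomLog ${APACHE_LOG_DIR}/blocked.log combined env=block_user
</VirtualHost>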

Conclusion

Controlling traffic based on user agent is an essential aspect of web server management, reducing unwanted load and potential threats. The strategies discussed here provide a toolbox; choose the approach that matches your needs, and test any change in a staging environment first to minimize disruption from incorrect blocking.