Node & Express: Block incoming requests by user agent

Updated: December 28, 2023 By: Guest Contributor

Overview

In this tutorial, we will look at how to block incoming HTTP requests in a Node.js application running on Express based on the User-Agent header. This is useful for turning away scrapers, bots, or unwanted clients before they reach your routes, which can also reduce needless load on your service. We’ll work through step-by-step examples, from a basic blocklist to a more advanced, dynamic blocking mechanism.

Basic Blocking Using Middleware

To begin with, let’s create a simple piece of middleware that checks the user agent of each request and blocks those matching a predefined list of disallowed agents.

const express = require('express');
const app = express();

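// User agents to reject (compared against the full header value)
const blockList = ['BadBot', 'InvalidUserAgent'];

app.use((req, res, next) => {
  // req.get() returns undefined when the header is missing
  const userAgent = req.get('User-Agent');
  if (blockList.includes(userAgent)) {
    return res.status(403).send('Access Denied');
  }
  next();
});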

app.get('/', (req, res) => {
  res.send('Hello, World!');
});

app.listen(3000, () => {
  console.log('Server started on port 3000');
});

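This middleware function runs for every incoming request because it is registered with app.use() before the routes. If the request’s User-Agent header is exactly equal to an entry in the block list, a 403 Forbidden response is sent. Note that Array.prototype.includes() compares whole strings, so a real-world header such as ‘Mozilla/5.0 (compatible; BadBot/1.0)’ would not match the entry ‘BadBot’.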
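Since real User-Agent headers usually carry extra details such as versions and platform information, a case-insensitive substring check tends to be more practical than comparing whole strings. The following is a minimal sketch of that idea, not a definitive implementation. The names isBlockedUserAgent, blockedTokens, and blockByUserAgent are hypothetical and not part of Express.

```javascript
// Hypothetical list of lowercase tokens to look for inside the header.
const blockedTokens = ['badbot', 'invaliduseragent'];

// Returns true when the User-Agent contains any blocked token.
function isBlockedUserAgent(userAgent, tokens = blockedTokens) {
  // A missing header arrives as undefined, so fall back to an empty string.
  const normalized = (userAgent || '').toLowerCase();
  return tokens.some((token) => normalized.includes(token));
}

// Express-style middleware built on the helper above.
function blockByUserAgent(req, res, next) {
  if (isBlockedUserAgent(req.get('User-Agent'))) {
    return res.status(403).send('Access Denied');
  }
  next();
}
```

It could be registered with app.use(blockByUserAgent) in place of the inline middleware. Keeping the check in its own function also means it can be exercised without starting a server.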

Regular Expression Based Blocking

If you want to block user agents following a pattern, you can use regular expressions to match against the user agent string.

app.use((req, res, next) => {
  const userAgent = req.get('User-Agent');
  if (/BadBot|Crawler|Spider/i.test(userAgent)) {
    return res.status(403).send('Not Allowed');
  }
  next();
});

This middleware inspects the incoming request’s user agent and blocks the request if it matches the regular expression, which in this case looks for ‘BadBot’, ‘Crawler’, or ‘Spider’ anywhere in the string, regardless of case (the i flag).
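If the names to block come from a list rather than a hand-written pattern, the regular expression can be assembled at startup. The sketch below assumes the entries are literal strings, so it escapes any characters that have special meaning in a pattern. The helpers escapeRegExp and buildBlockPattern, and the sample entry ‘Evil.Bot (v1)’, are hypothetical.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one case-insensitive pattern that matches any of the given names.
function buildBlockPattern(names) {
  if (names.length === 0) {
    // An empty alternation would match every string, so return a
    // pattern that never matches instead.
    return /(?!)/;
  }
  return new RegExp(names.map(escapeRegExp).join('|'), 'i');
}

const blockPattern = buildBlockPattern(['BadBot', 'Crawler', 'Spider', 'Evil.Bot (v1)']);

// Express-style middleware using the generated pattern.
function blockByPattern(req, res, next) {
  // Treat a missing header as an empty string rather than undefined.
  if (blockPattern.test(req.get('User-Agent') || '')) {
    return res.status(403).send('Not Allowed');
  }
  next();
}
```

Escaping matters here because an unescaped dot or parenthesis in a name would otherwise change what the pattern matches.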

Dynamic Blocking with a Function

For a more sophisticated approach, you might want to block user agents based on some dynamic criteria. Below is a function that decides whether to block a request based on the user agent and optionally other logic like request times or IP addresses.

function shouldBlockUserAgent(userAgent) {
  // Place your dynamic logic here
  // For demonstration, we'll just use a simple condition
  return /BadBot|Crawler|Spider/i.test(userAgent);
}

app.use((req, res, next) => {
  const userAgent = req.get('User-Agent');
  if (shouldBlockUserAgent(userAgent)) {
    return res.status(403).send('Forbidden');
  }
  next();
});

The shouldBlockUserAgent function encapsulates the logic for deciding whether to block the user agent, making the middleware cleaner and more testable.
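As an illustration of what such dynamic logic might look like, the sketch below combines the user agent with the client IP address. The policy itself is an assumption made for demonstration: trusted addresses always pass, requests without a User-Agent are rejected, and everything else is checked against the pattern. The names trustedIps, shouldBlockRequest, and userAgentGuard are hypothetical.

```javascript
// Hypothetical allowlist of addresses that are never blocked.
const trustedIps = new Set(['127.0.0.1', '::1']);
const blockedPattern = /BadBot|Crawler|Spider/i;

// Pure decision function: takes plain values, returns a boolean.
function shouldBlockRequest({ userAgent, ip }) {
  if (trustedIps.has(ip)) {
    return false;
  }
  if (!userAgent) {
    // Assumed policy: reject clients that send no User-Agent at all.
    return true;
  }
  return blockedPattern.test(userAgent);
}

// Express-style middleware that feeds request data into the decision.
function userAgentGuard(req, res, next) {
  const blocked = shouldBlockRequest({
    userAgent: req.get('User-Agent'),
    ip: req.ip,
  });
  if (blocked) {
    return res.status(403).send('Forbidden');
  }
  next();
}
```

Because shouldBlockRequest takes plain values instead of the request object, it can be unit tested with simple inputs, while userAgentGuard stays a thin adapter.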

Advanced: Integrating with a Database

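In a real-world scenario, you might want to store your blocklist in a database and update it frequently. This section outlines how you could manage the blocklist dynamically with a database such as MongoDB. The connection code is omitted, and getBlockListFromDatabase() is a stub that returns a hardcoded list in place of a real query.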

// MongoDB setup and connection

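app.use(async (req, res, next) => {
  try {
    const userAgent = req.get('User-Agent');
    const blockList = await getBlockListFromDatabase();
    if (blockList.includes(userAgent)) {
      return res.status(403).send('User Agent Blocked');
    }
    next();
  } catch (err) {
    // Express 4 does not catch rejected promises from async middleware,
    // so forward the error to the error handler explicitly.
    next(err);
  }
});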

async function getBlockListFromDatabase() {
  // Fetch the list of blocked user agents from your database
  // This function will vary depending on your database setup
  return ['BlockedBot', 'AnotherBadBot'];
}

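Make sure to handle database errors: in Express 4, a rejected promise inside async middleware is not caught automatically, so it must be passed to next(). Also avoid querying the database on every request, since that adds latency to every route. Keeping the list in memory for a short time is one common way to limit the cost.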
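One way to keep the lookup from slowing down every request is to cache the list in memory and refresh it only after a time-to-live has passed. The following is a sketch under stated assumptions rather than a production implementation. The helper createCachedBlockList and the 60-second default are hypothetical, and on a failed refresh it serves the previous list instead of rejecting the request, which is a policy choice you may not want.

```javascript
// Wraps an async fetch function with a simple in-memory TTL cache.
function createCachedBlockList(fetchBlockList, ttlMs = 60000) {
  let cached = [];
  let expiresAt = 0;
  let inFlight = null;

  return async function getBlockList() {
    if (Date.now() < expiresAt) {
      return cached;
    }
    // Share one refresh between concurrent requests.
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(() => fetchBlockList())
        .then((list) => {
          cached = list;
          expiresAt = Date.now() + ttlMs;
          return cached;
        })
        // Assumed policy: on failure, keep serving the last known list.
        .catch(() => cached)
        .finally(() => {
          inFlight = null;
        });
    }
    return inFlight;
  };
}
```

The returned function could replace the direct call in the middleware, for example const getBlockList = createCachedBlockList(getBlockListFromDatabase); followed by await getBlockList() inside the handler.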

Conclusion

We’ve covered several methods to block incoming requests by user agent in a Node.js application with Express. These methods range from simple hardcoded lists to regular expressions and dynamic database-driven solutions. By implementing these strategies, you can better protect your application from unwanted traffic and potentially harmful bots. Remember, however, that user agent strings can be spoofed, so this should not be your sole line of defense. Combining user agent blocking with other security measures, such as rate limiting and CAPTCHA, can provide more robust protection for your application.

Happy coding!