Health Checks in NGINX: The Complete Guide

Updated: January 19, 2024 By: Guest Contributor

Introduction

Health checks are an essential component of maintaining high availability and fault tolerance for web applications. NGINX, used as a reverse proxy and load balancer, supports them in several ways: open source NGINX performs passive health checks out of the box, while active health checks require either NGINX Plus or a third-party module. In this guide, we’ll explore how to configure both kinds so that traffic is routed only to backends that can serve it.

What are Health Checks?

Health checks are tests conducted by load balancers or reverse proxies to determine if a backend server is able to handle requests. Active health checks proactively test servers at regular intervals. In contrast, passive health checks monitor the ongoing communication and flag servers as unhealthy when errors reach a certain threshold.

Active Health Checks

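Open source NGINX has no built-in active health checks. The basic example below uses the ‘check’ directives from the third-party nginx_upstream_check_module, which has to be compiled into NGINX: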

http {
    upstream backend {
        server backend1.example.com;
        server backend2.example.com;
        check interval=3000 rise=2 fall=5 timeout=1000 type=http;
        check_http_send "HEAD /health HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
        check_http_expect_alive http_2xx http_3xx;
    }

    server {
        location / {
            proxy_pass http://backend;
        }
    }
}

In this example, NGINX sends a HEAD request to the /health endpoint of each server every 3 seconds (interval=3000) and waits up to 1 second for a reply (timeout=1000). A response with a 2xx or 3xx status code counts as a success. A server that is down is marked healthy again after 2 consecutive successes (rise=2), and a healthy server is marked unhealthy after 5 consecutive failures (fall=5).
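
To see what the checker has decided, the same third-party module provides a status handler. The following is a minimal sketch that assumes the nginx_upstream_check_module is compiled in; the /upstream_status path and the access rules are illustrative choices, not requirements:

server {
    location /upstream_status {
        # check_status comes from the third-party module, not stock NGINX
        check_status;
        access_log off;

        # Assumption: only local clients should see backend state
        allow 127.0.0.1;
        deny all;
    }
}

Requesting this location should list each upstream server along with its current up or down state and its rise and fall counts.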

Passive Health Checks

For passive health checks, NGINX relies on live traffic analysis. Here’s how to set it up:

http {
    upstream backend {
        server backend1.example.com max_fails=2 fail_timeout=30s;
        server backend2.example.com max_fails=2 fail_timeout=30s;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
            # HTTP status conditions are only valid in the http context
            proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
        }
    }
}

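With max_fails=2 and fail_timeout=30s, a server that has 2 failed attempts within 30 seconds is considered unavailable and receives no traffic for the next 30 seconds; fail_timeout sets both the counting window and the length of the time-out. The ‘proxy_next_upstream’ directive lists the conditions under which a request is retried on a different server. Connection errors and timeouts always count toward max_fails, while HTTP status codes such as 500 or 502 count as failures only when they are listed in this directive.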
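
Passive checks only notice a failure when a real request runs into it, so timeouts and retry limits determine how much delay a client sees. As a rough sketch, the location below caps connection time and the number of attempts; the specific values are assumptions to be tuned for your own backends:

location / {
    proxy_pass http://backend;

    # Give up on an unresponsive backend quickly (assumed values)
    proxy_connect_timeout 2s;
    proxy_read_timeout 10s;

    # Retry on another server for these failures,
    # with at most 2 tries in total and within 5 seconds
    proxy_next_upstream error timeout http_502 http_503 http_504;
    proxy_next_upstream_tries 2;
    proxy_next_upstream_timeout 5s;
}

Keep in mind that, by default, NGINX does not pass a non-idempotent request such as a POST to another server once it has been sent to a backend, unless non_idempotent is added to ‘proxy_next_upstream’.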

Advanced Health Check Configurations

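More complex scenarios might require advanced health check configurations. The following example uses the ‘health_check’ directive, which is available only in NGINX Plus, together with a shared memory zone and a custom ‘match’ block that defines what a healthy response looks like: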

http {
    upstream backend {
        # Shared memory zone, required for health_check
        zone backend 64k;
        server backend1.example.com;
        server backend2.example.com;
    }

    match healthy {
        status 200-399;
    }

    server {
        location / {
            proxy_pass http://backend;
            proxy_set_header Host $host;

            # health_check belongs in the location that proxies to the upstream;
            # interval is given in seconds
            health_check match=healthy interval=2s fails=3 passes=1 uri=/custom_check;
        }
    }
}

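In this example, the ‘health_check’ directive runs a check against /custom_check every 2 seconds, marks a server unhealthy after 3 consecutive failures (fails=3), and marks it healthy again after a single successful check (passes=1). The status codes that count as healthy are defined in a separate ‘match’ block named ‘healthy’. The ‘zone’ directive is required so that health state is shared among all worker processes.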
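
A ‘match’ block can test more than the status code. The sketch below assumes NGINX Plus and a hypothetical backend whose /custom_check endpoint answers with a plain-text body containing OK; the block name healthy_strict is made up for illustration:

match healthy_strict {
    # All conditions must hold for the check to pass
    status 200;
    header Content-Type ~ text/plain;
    body ~ "OK";
}

It would be referenced from the proxying location with health_check match=healthy_strict uri=/custom_check; in place of the simpler status-only match.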

Testing and Troubleshooting Health Checks

NGINX’s logging capabilities help you verify that requests are reaching your backends and investigate failures:

server {
    location / {
        proxy_pass http://backend;
        access_log /var/log/nginx/backend_access.log;
        error_log /var/log/nginx/backend_error.log;
    }

    location /health {
        access_log /var/log/nginx/health_access.log;
        proxy_pass http://backend;
    }
}

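Directing the access log for the /health location to a separate file lets you watch client requests to that endpoint and the status codes the backends return. Note that NGINX’s own health probes go directly to the backends and do not pass through these locations, so they do not appear in these access logs. For failed connections, timeouts, and other upstream problems, consult the error log.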
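
It also helps to record which backend handled each request. The following sketch defines a log format using NGINX’s built-in upstream variables; the format name upstream_debug and the log path are arbitrary choices, and backend is the upstream group defined earlier:

http {
    # $upstream_addr lists every server tried for a request
    log_format upstream_debug '$remote_addr [$time_local] "$request" $status '
                              'upstream=$upstream_addr '
                              'upstream_status=$upstream_status '
                              'upstream_time=$upstream_response_time';

    server {
        location / {
            proxy_pass http://backend;
            access_log /var/log/nginx/backend_upstream.log upstream_debug;
        }
    }
}

When a request is retried on another server, $upstream_addr and $upstream_status contain comma-separated values, one per attempt, which makes failovers easy to spot.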

Automating Health Checks with NGINX Plus

NGINX Plus users have access to built-in active health checks and a live activity monitoring dashboard. With default settings, the configuration is short:

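upstream backend {
    zone backend 64k;
    server backend1.example.com;
    server backend2.example.com;
}

server {
    # status_zone is set per server block, not at the top level
    status_zone backend;

    location / {
        proxy_pass http://backend;
        health_check;
    }
}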

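With no parameters, ‘health_check’ requests / from each server every 5 seconds and treats any 2xx or 3xx response as healthy. Once the NGINX Plus API and dashboard are enabled, you can use the live activity monitoring dashboard to keep an eye on server health in real time.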
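
The dashboard itself needs the NGINX Plus API to be exposed. As a minimal sketch, assuming NGINX Plus with its default file layout, a dedicated server on port 8080 (an arbitrary choice) can be added inside the http block:

server {
    listen 8080;

    location /api {
        # Read-only API access; restrict who can reach it
        api;
        allow 127.0.0.1;
        deny all;
    }

    location = /dashboard.html {
        # Assumed default location of the bundled dashboard page
        root /usr/share/nginx/html;
    }
}

With this in place, the dashboard should be reachable at /dashboard.html on that port from the allowed addresses.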

Conclusion

Configuring health checks in NGINX is a critical part of ensuring your services stay online and perform well. Active and passive health checks can be tailored to meet the specific needs of your environment. With the right setup, NGINX can help automate the process of managing server availability and contribute to a robust failover strategy. Always make sure to test your configurations thoroughly to ensure the desired outcomes.