How to Auto-Reboot an Ubuntu Server When Running Out of Resources

Updated: January 28, 2024 By: Guest Contributor

Overview

One essential task in server management is ensuring that your server remains operational and does not succumb to resource depletion. Automating the reboot of an Ubuntu server when it’s running low on resources can help maintain system stability and availability. In this tutorial, we will explore several methods to implement this self-recovery feature.

Understanding Resources

Before we automate the reboot process, it’s crucial to understand which resources could lead to server instability:

  • CPU Usage: High CPU usage can result in slow performance and unresponsiveness.
  • Memory (RAM) Usage: When your server runs low on memory, it starts swapping, which severely degrades performance. If memory runs out entirely, the kernel’s OOM killer begins terminating processes.
  • Disk Usage: A full disk can lead to a whole host of issues, including the inability to write logs or create temporary files.

Prerequisites

To follow along with this tutorial, you should have:

  • An Ubuntu server
  • Basic knowledge of Linux command line
  • Sudo privileges or access to the root user

Monitoring System Health

It’s important to first monitor and diagnose your server’s resources before triggering a reboot. You can use top or htop for a live view of CPU and memory usage, and free for a one-off snapshot of memory.

$ top
$ htop
$ free -h
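For scripting, a non-interactive snapshot is usually easier to work with than the live views above. The following is a minimal sketch, assuming a Linux system that exposes /proc/loadavg and /proc/meminfo; the function names are illustrative and not part of any tool.

```shell
#!/bin/bash
# Non-interactive resource snapshot; all function names are illustrative.

# 1-minute load average, read from /proc/loadavg.
load_1min() {
  cut -d ' ' -f1 /proc/loadavg
}

# Percentage of RAM in use, computed as (MemTotal - MemAvailable) / MemTotal.
mem_used_pct() {
  awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
       END { printf "%.0f\n", (t - a) / t * 100 }' /proc/meminfo
}

# Percentage of the root filesystem in use, without the trailing % sign.
disk_used_pct() {
  df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

echo "load(1m)=$(load_1min) mem=$(mem_used_pct)% disk=$(disk_used_pct)%"
```

Each function prints a single number, so the values can be compared against thresholds or written to a log.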

Scripting a Basic Reboot Trigger

A basic shell script can be written to check system resource usage using built-in commands and trigger a reboot if needed.

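#!/bin/bash

# Define the usage thresholds (percent)
max_cpu=90
max_mem=90

# Fetch the current CPU usage as 100 minus the idle percentage.
# top takes two samples one second apart and the last one is used, since the
# first sample may not reflect the current load.
current_cpu=$(LC_ALL=C top -bn2 -d 1 | awk -F'[ ,]+' '/^%Cpu\(s\)/ { for (i = 2; i <= NF; i++) if ($i == "id") cpu = 100 - $(i - 1) } END { print cpu }')
# Fetch the current memory usage as used/total from free
current_mem=$(LC_ALL=C free -m | awk '/^Mem:/ { print ($3 / $2) * 100 }')

# Compare with awk, which handles decimals without requiring bc
if awk -v cpu="$current_cpu" -v mem="$current_mem" -v max_cpu="$max_cpu" -v max_mem="$max_mem" \
  'BEGIN { exit !(cpu > max_cpu || mem > max_mem) }'
then
  # Full path because cron's default PATH may not include /sbin; requires root
  /sbin/shutdown -r now
fi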

Run this script as a cron job from root’s crontab so that it executes at regular intervals and has permission to reboot the machine.
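A single reading above a threshold may be nothing more than a brief spike, so it can be safer to reboot only after several consecutive breaches. The sketch below shows one way to keep that count between cron runs; record_breach, the state file, and the script path in the crontab comment are all hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Sketch: count consecutive threshold breaches across cron runs.
# record_breach and the state-file argument are illustrative names.

# Usage: record_breach STATE_FILE BREACHED(0|1) LIMIT
# Returns 0 (success) once LIMIT consecutive breaches have been recorded.
record_breach() {
  local state_file="$1"
  local breached="$2"
  local limit="$3"
  local count

  # Read the previous streak; a missing or empty file counts as 0.
  count=$(cat "$state_file" 2>/dev/null)
  count=${count:-0}

  if [ "$breached" -eq 1 ]; then
    count=$((count + 1))
  else
    count=0   # any healthy check resets the streak
  fi

  echo "$count" > "$state_file"
  [ "$count" -ge "$limit" ]
}

# Example entry for root's crontab (hypothetical script path),
# running the check every five minutes:
# */5 * * * * /usr/local/bin/resource-check.sh
```

With a limit of 3 and a five-minute schedule, the reboot would only be triggered after roughly fifteen minutes of sustained pressure.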

Advanced System Monitoring with Monit

Monit is a utility for managing and monitoring Unix systems. You can leverage Monit to watch for resource usage thresholds and perform automated reboots.

Install Monit:

sudo apt-get update
sudo apt-get install monit

Configure Monit to monitor system resources and perform a reboot:

sudo nano /etc/monit/monitrc

Add the following to the configuration file:

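check system $HOST
  # A system check has no restart program, so the reboot command is executed explicitly
  if loadavg (5min) > 4 for 5 cycles then exec "/sbin/shutdown -r now"
  if memory usage > 75% for 5 cycles then exec "/sbin/shutdown -r now"
  if cpu usage (user) > 70% for 5 cycles then exec "/sbin/shutdown -r now"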

set init

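The configuration above triggers a reboot when any of these conditions holds for five consecutive cycles, where a cycle is the polling interval defined by the set daemon statement in monitrc. You can check the file for syntax errors with sudo monit -t. Don’t forget to enable and start the Monit service: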

sudo systemctl enable monit
sudo systemctl start monit

Combining Log Analysis with Reboot

If you suspect that specific services are exhausting resources, you can create log-triggered actions. A tool like swatchdog can watch logs and execute commands when a pattern matches, while logwatch produces periodic summaries that help you identify the offending service.
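As a rough illustration of the pattern-matching idea, the sketch below counts kernel OOM-killer messages in a log file with plain grep. It is not how logwatch or swatchdog are configured; count_oom_events is a hypothetical helper, and the search string assumes the kernel's usual "Out of memory: Kill process" or "Out of memory: Killed process" wording.

```shell
#!/bin/bash
# Sketch: count kernel OOM-killer messages in a log file.
# count_oom_events is an illustrative helper, not part of logwatch or swatchdog.

count_oom_events() {
  # grep -c prints the number of matching lines; "|| true" keeps a zero
  # count (grep exit status 1) from being treated as a failure.
  grep -c 'Out of memory: Kill' "$1" || true
}
```

Assuming kernel messages are written to /var/log/kern.log, calling count_oom_events /var/log/kern.log returns the number of OOM kills recorded there, which a monitoring script could compare against a limit.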

Create a Swap File

In cases of insufficient memory, before rebooting, consider first adding a swap file:

sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Add the swap file to /etc/fstab to make it permanent:

/swapfile swap swap defaults 0 0
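If the fstab entry is added from a script, it helps to avoid appending the same line twice. A minimal sketch, using a hypothetical helper named ensure_line:

```shell
#!/bin/bash
# Sketch: append a line to a file only if that exact line is not already there.
# ensure_line is an illustrative helper name.

ensure_line() {
  local file="$1"
  local line="$2"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Intended use (as root):
# ensure_line /etc/fstab '/swapfile swap swap defaults 0 0'
```

Afterwards, swapon --show or free -h should list the swap file as active.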

Using Cloud Provider’s Auto-Scaling

If your server is cloud-based, providers such as AWS, Azure, or Google Cloud offer auto-scaling features that monitor load and automatically adjust capacity, typically by adding or removing server instances.

Testing the Reboot Strategy

Prior to a production deployment, test your monitoring and reboot strategies in a controlled environment. This ensures that the system reacts correctly and avoids unexpected behavior.
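One way to test safely is to route the reboot through a wrapper that only prints what it would do unless explicitly told otherwise. The sketch below assumes a DRY_RUN environment variable and a request_reboot function, both hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Sketch: a reboot wrapper that defaults to dry-run mode.
# DRY_RUN and request_reboot are illustrative names.

# Dry-run is the default; set DRY_RUN=0 to allow real reboots.
DRY_RUN=${DRY_RUN:-1}

request_reboot() {
  local reason="$1"
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "DRY RUN: would reboot ($reason)"
  else
    # Record the reason in the system log before rebooting (requires root).
    logger "resource monitor: rebooting ($reason)"
    /sbin/shutdown -r now
  fi
}
```

Calling request_reboot "memory above 90%" in place of a direct shutdown command lets you observe when the trigger would fire before enabling it with DRY_RUN=0.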

Conclusion

In this tutorial, we’ve discussed how to monitor your Ubuntu server and automatically respond to low-resource situations by rebooting it. A system that can recover on its own is more reliable and can reduce downtime.