Sling Academy

How to Auto-Reboot an Ubuntu Server When Running Out of Resources

Last updated: January 28, 2024

Overview

One essential task in server management is ensuring that your server remains operational and does not succumb to resource depletion. Automating the reboot of an Ubuntu server when it’s running low on resources can help maintain system stability and availability. In this tutorial, we will explore several methods to implement this self-recovery feature.

Understanding Resources

Before we automate the reboot process, it’s crucial to understand which resources could lead to server instability:

  • CPU Usage: High CPU usage can result in slow performance and unresponsiveness.
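  • Memory (RAM) Usage: As free memory runs low, the server starts swapping (if swap is configured), which severely degrades performance; once RAM and swap are both exhausted, the kernel’s OOM killer starts terminating processes.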
  • Disk Usage: A full disk can lead to a whole host of issues, including the inability to write logs or create temporary files.

Prerequisites

To follow along with this tutorial, you should have:

  • An Ubuntu server
  • Basic knowledge of the Linux command line
  • Sudo privileges or access to the root user

Monitoring System Health

It’s important to monitor and diagnose your server’s resources before triggering a reboot. Tools like top, htop, or free let you analyze your system’s health in real time.

$ top
$ htop
$ free -h
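top and htop are interactive, so they are awkward to use inside a script or cron job. The sketch below reads similar figures non-interactively from /proc and df. The function names are illustrative, and cpu_load_pct reports the 1-minute load average per core as a percentage (a rough proxy that can exceed 100), not the exact CPU utilization that top shows.

```shell
#!/bin/bash
# Non-interactive readings of CPU load, memory, and disk usage.
# Function names are illustrative; the data comes from /proc and df.

# 1-minute load average per CPU core, as a percentage (can exceed 100)
cpu_load_pct() {
  local load cores
  read -r load _ < /proc/loadavg
  cores=$(nproc)
  awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.0f\n", l / c * 100 }'
}

# Share of RAM in use, counting MemAvailable (which includes reclaimable cache) as free
mem_used_pct() {
  awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
       END { printf "%.0f\n", (t - a) / t * 100 }' /proc/meminfo
}

# Used space on the filesystem that holds PATH (default: /)
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

echo "cpu_load=$(cpu_load_pct)% mem_used=$(mem_used_pct)% disk_used=$(disk_used_pct /)%"
```

disk_used_pct also covers the disk case from the resource list, which a CPU-and-memory check alone would miss.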

Scripting a Basic Reboot Trigger

A basic shell script can be written to check system resource usage using built-in commands and trigger a reboot if needed.

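#!/bin/bash
# Reboot the server when CPU or memory usage exceeds a threshold.
# Run it from root's crontab, so no sudo is needed.

# Define the usage thresholds (percent)
max_cpu=90
max_mem=90

# CPU usage = 100 - idle, from vmstat's second sample (taken over 1 second;
# the first sample is an average since boot)
current_cpu=$(vmstat 1 2 | tail -1 | awk '{print 100 - $15}')

# Memory usage based on the "available" column, which counts reclaimable cache as free
current_mem=$(free -m | awk '/^Mem:/ { printf "%.0f", 100 - ($7 / $2) * 100 }')

# Check if current usage exceeds the thresholds (integer comparison, so bc is not needed)
if (( current_cpu > max_cpu || current_mem > max_mem )); then
  logger "auto-reboot: cpu=${current_cpu}% mem=${current_mem}%, rebooting"
  /sbin/shutdown -r now
fi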

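Save the script (for example as /usr/local/sbin/auto-reboot.sh), make it executable with chmod +x, and schedule it in root’s crontab (sudo crontab -e) so it runs at regular intervals; the entry */5 * * * * /usr/local/sbin/auto-reboot.sh runs it every five minutes.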
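Because cron runs the check on a schedule, a single short spike above the threshold is enough to reboot the server. One way to avoid that, shown here as a minimal sketch with a hypothetical helper and state-file path, is to count consecutive over-threshold checks and reboot only when the streak reaches a limit, similar in spirit to Monit’s "for 5 cycles".

```shell
#!/bin/bash
# Sketch: reboot only after several consecutive over-threshold checks.
# record_check is a hypothetical helper, not part of any package.
#
# Usage: record_check STATE_FILE BREACHED REQUIRED
#   BREACHED is 1 if this check exceeded a threshold, 0 otherwise.
# Prints the current streak and returns 0 once it reaches REQUIRED.
record_check() {
  local state_file=$1 breached=$2 required=$3 streak=0
  # Load the previous streak, ignoring a missing or corrupted file
  if [[ -r $state_file ]]; then
    streak=$(<"$state_file")
    [[ $streak =~ ^[0-9]+$ ]] || streak=0
  fi
  # Extend the streak on a breach, reset it otherwise
  if (( breached )); then
    streak=$(( streak + 1 ))
  else
    streak=0
  fi
  echo "$streak" > "$state_file"
  echo "$streak"
  (( streak >= required ))
}

# Example use at the end of the reboot script (state file under /run,
# which Ubuntu mounts as tmpfs, so the streak is cleared at boot):
#   breached=0   # set to 1 when a threshold test fires
#   if record_check /run/auto-reboot.streak "$breached" 3 >/dev/null; then
#     /sbin/shutdown -r now
#   fi
```

If cron runs the script every five minutes and REQUIRED is 3, the server reboots only after three consecutive failing checks, that is, at least 10 minutes of sustained overload.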

Advanced System Monitoring with Monit

Monit is a utility for managing and monitoring Unix systems. You can leverage Monit to watch for resource usage thresholds and perform automated reboots.

Install Monit:

sudo apt-get update
sudo apt-get install monit

Configure Monit to monitor system resources and perform a reboot:

sudo nano /etc/monit/monitrc

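Add the following to the configuration file. Monit’s restart action restarts a monitored service, not the machine, so use the exec action to run the reboot command explicitly:

check system $HOST
  if loadavg (5min) > 4 for 5 cycles then exec "/sbin/shutdown -r now"
  if memory usage > 75% for 5 cycles then exec "/sbin/shutdown -r now"
  if cpu usage (user) > 70% for 5 cycles then exec "/sbin/shutdown -r now"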

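With this configuration, Monit reboots the server when any one of these conditions holds for 5 consecutive cycles. A cycle is the polling interval set by the set daemon line in monitrc (120 seconds in Ubuntu’s packaged file), so 5 cycles is roughly 10 minutes; adjust the load-average threshold to your core count. Check the configuration for syntax errors, then enable the service and restart it so the new rules take effect:

sudo monit -t
sudo systemctl enable monit
sudo systemctl restart monit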

Combining Log Analysis with Reboot

If you suspect that specific services are exhausting resources, you can trigger actions from their log output instead. logwatch summarizes logs into periodic reports, while swatchdog watches log files in real time and can run a command when a line matches a pattern.
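swatchdog has its own configuration format, which is not covered here. The plain-bash stand-in below (not swatchdog itself) shows the same idea: read log lines and run a command once a line matches a pattern. The function name watch_log is hypothetical.

```shell
#!/bin/bash
# Minimal log-trigger sketch: reads log lines on stdin and runs the given
# command once a line matches the (extended regex) pattern.
# Usage: watch_log PATTERN COMMAND [ARGS...]
watch_log() {
  local pattern=$1
  shift
  local line
  while IFS= read -r line; do
    if [[ $line =~ $pattern ]]; then
      "$@"        # run the action, e.g. a reboot command
      return 0
    fi
  done
  return 1      # input ended without a match
}
```

For example, running journalctl -k -f | watch_log 'Out of memory' /sbin/shutdown -r now as root reboots once the kernel’s OOM killer logs an "Out of memory: Killed process ..." message. Check the exact wording in your own logs before relying on a pattern.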

Create a Swap File

If the server regularly runs short of memory, consider adding a swap file before resorting to reboots:

sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Add the swap file to /etc/fstab to make it permanent:

/swapfile swap swap defaults 0 0
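After running swapon, or after a reboot with the fstab entry in place, confirm that swap is active. swapon --show and free -h both report it; the sketch below reads the same totals from /proc/meminfo so a script can check them (swap_status is an illustrative name).

```shell
#!/bin/bash
# Report total and free swap in MiB from /proc/meminfo (values there are in kB).
swap_status() {
  awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 }
       END { printf "swap_total_mib=%d swap_free_mib=%d\n", t / 1024, f / 1024 }' /proc/meminfo
}

swap_status
```

A swap_total_mib of 0 means no swap is active yet.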

Using Cloud Provider’s Auto-Scaling

If your server runs in the cloud, consider the provider’s own tooling: AWS (EC2 Auto Scaling), Azure (Virtual Machine Scale Sets), and Google Cloud (managed instance groups) can add instances under load and automatically replace instances that fail health checks.

Testing the Reboot Strategy

Before deploying to production, test your monitoring and reboot strategy in a controlled environment, for example by temporarily lowering the thresholds, so you can confirm that the trigger fires when expected and does not cause a reboot loop.
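A simple way to exercise the trigger without actually rebooting is a dry-run switch around the reboot command. This sketch assumes a hypothetical DRY_RUN environment variable; it is not a convention of shutdown or cron.

```shell
#!/bin/bash
# Reboot, unless DRY_RUN=1 is set, in which case only report what would happen.
reboot_now() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "DRY RUN: would run /sbin/shutdown -r now"
  else
    /sbin/shutdown -r now
  fi
}
```

Call reboot_now in the script in place of the shutdown command, then run it once with lowered thresholds, e.g. DRY_RUN=1 ./auto-reboot.sh (a hypothetical filename). To push the real metrics over the limit, a load generator such as stress-ng (stress-ng --cpu 0 --timeout 60s) can help.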

Conclusion

In this tutorial, we’ve discussed how to monitor an Ubuntu server and automatically reboot it when resources run low. A system that can recover on its own is more reliable and can reduce downtime.
