The Ultimate Bash Script for Server Health Monitoring

In the world of DevOps and SRE, the worst kind of problem is the one that surprises you. An unexpected server crash or maxed-out disk can bring your applications down. The solution is to move from reactive fixes to proactive monitoring.

This guide provides the foundational tool for that strategy: a powerful, all-in-one Bash script that performs a comprehensive health check on any Linux-based server. It's lightweight, customizable, and the first step towards building a robust observability pipeline.

What Does This Script Do?

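This utility gathers key system telemetry into a single, easy-to-read report. It checks the vital signs of your server, including system information (hostname, operating system, kernel version, and uptime), CPU usage and load average, memory usage, disk filesystem usage, the primary IP address, and the process count.

Any resource that exceeds its configured threshold is flagged in red, so problems stand out immediately.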
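
The script as written applies a single threshold per resource. If you want distinct healthy, warning, and critical states, one possible approach is sketched below. The function names `classify_usage` and `colorize`, and the 90% critical level, are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original script.

# classify_usage VALUE WARN CRIT
# Prints OK, WARNING, or CRITICAL. awk does the comparison so that
# fractional values such as 82.5 work without needing bc.
classify_usage() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v + 0 >= c + 0)      print "CRITICAL"
        else if (v + 0 >= w + 0) print "WARNING"
        else                     print "OK"
    }'
}

# colorize STATE
# Wraps the state name in an ANSI color: red, yellow, or green.
colorize() {
    case "$1" in
        CRITICAL) printf '\033[0;31m%s\033[0m\n' "$1" ;;
        WARNING)  printf '\033[0;33m%s\033[0m\n' "$1" ;;
        *)        printf '\033[0;32m%s\033[0m\n' "$1" ;;
    esac
}

# Example: 82.5% memory use against an 80% warning level and an
# assumed 90% critical level prints WARNING in yellow.
colorize "$(classify_usage 82.5 80 90)"
```

Keeping the classification separate from the coloring means the plain state name can also be written to a log or passed to another tool.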

How to Use This Script

  1. Save the Script: Copy the script below and save it as `health_check.sh`.
  2. Make It Executable: In your terminal, run this command to grant permission:
    chmod +x health_check.sh
  3. Run the Health Check: Execute the script for an instant report:
    ./health_check.sh

Taking It to the Next Level: True Automation

Running the script manually is useful, but its real power comes from automation, such as running it on a schedule.
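
One way to automate the check is a small wrapper that cron can call. The sketch below is an assumption about how you might set this up, not part of the original script: the function name `run_health_check`, the log location, and the wrapper path in the crontab comment are all hypothetical. It appends a timestamped report to a log file and strips the ANSI color codes so the log stays readable.

```shell
#!/bin/bash
# Hypothetical wrapper, not part of the original script.

# run_health_check SCRIPT LOG_FILE
# Runs SCRIPT and appends its output, preceded by a timestamp line,
# to LOG_FILE with ANSI color sequences removed.
run_health_check() {
    local script="$1" log_file="$2"
    local esc
    esc=$(printf '\033')   # literal ESC character for the sed pattern
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        "$script" 2>&1 | sed "s/${esc}\[[0-9;]*m//g"
    } >> "$log_file"
}

# Manual run (paths are examples):
#   run_health_check ./health_check.sh "$HOME/health_check.log"
#
# Example crontab entry, every 15 minutes, assuming this wrapper is saved
# as /usr/local/bin/health_check_cron.sh and ends with the call above:
#   */15 * * * * /usr/local/bin/health_check_cron.sh
```

Logging plain text keeps the history easy to search with `grep`, for example to find every run that reported a CRITICAL line.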


#!/bin/bash

# =============================================================================
#           System Health Monitoring Script for Linux Servers
# Description: An all-in-one Bash script for a quick overview of system health.
# =============================================================================

CPU_WARN_THRESHOLD=75
MEM_WARN_THRESHOLD=80
DISK_WARN_THRESHOLD=85

RED='\033[0;31m'
NC='\033[0m'

print_header() {
    printf "\n%s\n" "================================================================="
    printf "    %-50s \n" "$1"
    printf "%s\n" "================================================================="
}

print_header "System Information"
printf "Hostname:         %s\n" "$(hostname)"
printf "Operating System: %s\n" "$(. /etc/os-release && echo "$PRETTY_NAME")"
printf "Kernel Version:   %s\n" "$(uname -r)"
printf "System Uptime:    %s\n" "$(uptime -p)"

print_header "CPU Usage & Load Average"
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk '{print 100 - $1}')
LOAD_AVERAGE=$(uptime | awk -F'load average:' '{ print $2 }')

printf "Current CPU Usage:  %.2f%%\n" "$CPU_USAGE"
printf "Load Average (1m, 5m, 15m):%s\n" "$LOAD_AVERAGE"

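# Compare with awk so fractional values work without requiring bc to be installed.
if awk -v usage="$CPU_USAGE" -v limit="$CPU_WARN_THRESHOLD" 'BEGIN { exit !(usage + 0 > limit + 0) }'; then
    printf "${RED}CRITICAL: CPU usage is above %s%%!${NC}\n" "$CPU_WARN_THRESHOLD"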
fi

print_header "Memory (RAM) Usage"
MEM_INFO=$(free -m | awk 'NR==2{printf "Total: %s MB | Used: %s MB | Free: %s MB", $2, $3, $4}')
MEM_PERCENTAGE=$(free -m | awk 'NR==2{printf "%.2f", $3*100/$2 }')

printf "Memory Stats:     %s\n" "$MEM_INFO"
printf "Memory Usage:     %s%%\n" "$MEM_PERCENTAGE"

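# Compare with awk so fractional values work without requiring bc to be installed.
if awk -v usage="$MEM_PERCENTAGE" -v limit="$MEM_WARN_THRESHOLD" 'BEGIN { exit !(usage + 0 > limit + 0) }'; then
    printf "${RED}CRITICAL: Memory usage is above %s%%!${NC}\n" "$MEM_WARN_THRESHOLD"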
fi

print_header "Disk Filesystem Usage"
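# Exclude pseudo filesystems by type with -x. Grepping the df line for
# "squashfs" does not work because the type is not part of the output.
df -h -x tmpfs -x devtmpfs -x squashfs --output=source,pcent,size,used,avail
# -P keeps each filesystem on a single line so the columns parse reliably.
df -P -x tmpfs -x devtmpfs -x squashfs | awk 'NR > 1 { print $5, $1 }' | while read -r usage filesystem; do
  usage=${usage%\%}
  # Skip rows without a numeric percentage (df prints "-" for some entries).
  case "$usage" in ''|*[!0-9]*) continue ;; esac
  if [ "$usage" -ge "$DISK_WARN_THRESHOLD" ]; then
    printf "${RED}CRITICAL: Filesystem '%s' usage is at %s%% (threshold: %s%%)!${NC}\n" "$filesystem" "$usage" "$DISK_WARN_THRESHOLD"
  fi
done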

print_header "Network & Process Information"
printf "Primary IP Address: %s\n" "$(hostname -I | awk '{print $1}')"
# --no-headers keeps the column header line out of the count.
printf "Total Processes: %d\n" "$(ps -e --no-headers | wc -l)"

printf "\n%s\n" "===================== Health Check Complete ====================="
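
The disk check in the script reads live `df` output, which makes it awkward to try out on a healthy machine. As a rough sketch, the same threshold logic can be put in a function that reads `df -P`-style lines from standard input, so it can be exercised with canned data. The function name `check_disk_lines` and the sample rows are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.

# check_disk_lines THRESHOLD
# Reads `df -P`-style rows on stdin and prints one line for every
# filesystem whose usage percentage is at or above THRESHOLD.
check_disk_lines() {
    local threshold="$1" fs blocks used avail pcent mount
    while read -r fs blocks used avail pcent mount; do
        pcent=${pcent%\%}
        # Skip the header row and any row without a numeric percentage.
        case "$pcent" in ''|*[!0-9]*) continue ;; esac
        if [ "$pcent" -ge "$threshold" ]; then
            printf "CRITICAL: '%s' is at %s%%\n" "$fs" "$pcent"
        fi
    done
}

# Live usage would be: df -P | check_disk_lines 85
# Dry run with made-up rows; only /dev/sda1 crosses the 85% threshold.
printf '%s\n' \
    'Filesystem 1024-blocks Used Available Capacity Mounted on' \
    '/dev/sda1 100 90 10 90% /' \
    '/dev/sdb1 100 20 80 20% /data' | check_disk_lines 85
```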