The Ultimate Bash Script for Server Health Monitoring
In the world of DevOps and SRE, the worst kind of problem is the one that surprises you. An unexpected server crash or maxed-out disk can bring your applications down. The solution is to move from reactive fixes to proactive monitoring.
This guide provides the foundational tool for that strategy: a powerful, all-in-one Bash script that performs a comprehensive health check on any Linux-based server. It's lightweight, customizable, and the first step towards building a robust observability pipeline.
What Does This Script Do?
This utility gathers critical system telemetry in a single, easy-to-read report. It checks the vital signs of your server, including:
- CPU Load: Identifies sustained high processing demand.
- Memory Usage: Reports total, used, and free RAM, flagging high usage.
- Disk Space: Scans all filesystems and alerts you if any disk is nearly full.
- Running Processes & Uptime: Provides a quick status overview.
The output is color-coded, making it instantly clear if a resource is in a healthy, warning, or critical state.
How to Use This Script
- Save the Script: Click "Copy Script" below and save the code as `health_check.sh`.
- Make It Executable: In your terminal, run this command to grant permission:
chmod +x health_check.sh - Run the Health Check: Execute the script for an instant report:
./health_check.sh
Taking It to the Next Level: True Automation
Running the script manually is useful, but its real power comes from automation:
- Automate with Cron: Schedule the script to run automatically. Open your crontab (`crontab -e`) and add a line to run it hourly:
0 * * * * /path/to/your/health_check.sh - Proactive Alerting: Modify the script to send an email or a Slack notification if any metric crosses a critical threshold.
- CI/CD Integration: Run a health check on a server *before* deploying new code to ensure the environment is stable.
#!/bin/bash
# =============================================================================
# System Health Monitoring Script for Linux Servers
# Description: An all-in-one Bash script for a quick overview of system health.
# =============================================================================
CPU_WARN_THRESHOLD=75
MEM_WARN_THRESHOLD=80
DISK_WARN_THRESHOLD=85
RED='\033[0;31m'
NC='\033[0m'
print_header() {
printf "\n%s\n" "================================================================="
printf " %-50s \n" "$1"
printf "%s\n" "================================================================="
}
print_header "System Information"
printf "Hostname: %s\n" "$(hostname)"
printf "Operating System: %s\n" "$(source /etc/os-release && echo $PRETTY_NAME)"
printf "Kernel Version: %s\n" "$(uname -r)"
printf "System Uptime: %s\n" "$(uptime -p)"
print_header "CPU Usage & Load Average"
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk '{print 100 - $1}')
LOAD_AVERAGE=$(uptime | awk -F'load average:' '{ print $2 }')
printf "Current CPU Usage: %.2f%%\n" "$CPU_USAGE"
printf "Load Average (1m, 5m, 15m):%s\n" "$LOAD_AVERAGE"
if (( $(echo "$CPU_USAGE > $CPU_WARN_THRESHOLD" | bc -l) )); then
printf "${RED}CRITICAL: CPU usage is above ${CPU_WARN_THRESHOLD}%%!${NC}\n"
fi
print_header "Memory (RAM) Usage"
MEM_INFO=$(free -m | awk 'NR==2{printf "Total: %s MB | Used: %s MB | Free: %s MB", $2, $3, $4}')
MEM_PERCENTAGE=$(free -m | awk 'NR==2{printf "%.2f", $3*100/$2 }')
printf "Memory Stats: %s\n" "$MEM_INFO"
printf "Memory Usage: %s%%\n" "$MEM_PERCENTAGE"
if (( $(echo "$MEM_PERCENTAGE > $MEM_WARN_THRESHOLD" | bc -l) )); then
printf "${RED}CRITICAL: Memory usage is above ${MEM_WARN_THRESHOLD}%%!${NC}\n"
fi
print_header "Disk Filesystem Usage"
df -h --output=source,pcent,size,used,avail | grep -vE 'tmpfs|devfs|squashfs'
df -H | grep -vE '^Filesystem|tmpfs|devtmpfs' | awk '{ print $5 " " $1 }' | while read output;
do
usage=$(echo $output | awk '{ print $1}' | sed 's/%//g')
filesystem=$(echo $output | awk '{ print $2 }')
if [ $usage -ge $DISK_WARN_THRESHOLD ]; then
printf "${RED}CRITICAL: Filesystem '%s' usage is at %s%% (threshold: %s%%)!${NC}\n" "$filesystem" "$usage" "$DISK_WARN_THRESHOLD"
fi
done
print_header "Network Information"
printf "Primary IP Address: %s\n" "$(hostname -I | awk '{print $1}')"
printf "Total Running Processes: %d\n" "$(ps aux | wc -l)"
printf "\n%s\n" "===================== Health Check Complete ====================="