Sharing shell scripts that monitor Linux server hardware status and send email alerts on faults

Time:2021-9-25

Monitor hardware health

The shell script monitors CPU usage, memory, and load average and records them in log files. When a value crosses its threshold, the script emails the administrator.
Principle:
1. Get the values of CPU usage, memory, and load average.
2. Check whether each value is outside the user-defined range, for example CPU usage > 90%, free memory < 10%, load average > 2.
3. If a value is outside the range, send an email to notify the administrator. Notifications are rate-limited: each type of alert is sent at most once per hour.
4. Write the values to the log.
5. Set crontab to run the script every 30 seconds. cron schedules at one-minute granularity, so the second run in each minute has to be delayed with sleep 30.
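
Steps 2 and 3 can be sketched in isolation as follows. This is a minimal illustration rather than the script itself: the function names exceeds and should_notify and their arguments are hypothetical. awk does the comparison because the shell's own -gt test only handles integers.

```shell
#!/bin/bash
# Hypothetical helpers illustrating steps 2 and 3; the names are assumptions.

# exceeds VALUE LIMIT: succeed when VALUE > LIMIT (decimals allowed).
exceeds() {
  awk -v v="$1" -v l="$2" 'BEGIN { exit !(v > l) }'
}

# should_notify STAMP_FILE INTERVAL NOW: succeed when no email was sent
# within the last INTERVAL seconds. STAMP_FILE holds the epoch time of
# the last email; a missing or empty file means nothing was sent yet.
should_notify() {
  local file=$1 interval=$2 now=$3 last
  [ -s "$file" ] || return 0
  last=$(cat "$file")
  [ $(( now - last )) -gt "$interval" ]
}
```

A caller would combine the two, for example: if exceeds "$load" 2 && should_notify "$stamp" 3600 "$(date +%s)"; then send the email and write the current time to "$stamp".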

ServerMonitor.sh

#!/bin/bash 
 
#The system monitors and records CPU, memory and load average. When the specified value is exceeded, email the administrator 
 
# *** config start *** 
 
#Current directory path 
ROOT=$(cd "$(dirname "$0")"; pwd) 
 
#Current server name 
HOST=$(hostname) 
 
#Log file path 
CPU_LOG="${ROOT}/logs/cpu.log" 
MEM_LOG="${ROOT}/logs/mem.log" 
LOAD_LOG="${ROOT}/logs/load.log" 
 
#Notification email address (placeholder, replace with the administrator's address)
NOTICE_EMAIL='admin@example.com'
 
#Files that record when the last CPU, memory, and load average notification emails were sent
CPU_REMARK='/tmp/servermonitor_cpu.remark' 
MEM_REMARK='/tmp/servermonitor_mem.remark' 
LOAD_REMARK='/tmp/servermonitor_loadaverage.remark' 
 
#Notification email interval 
REMARK_EXPIRE=3600 
NOW=$(date +%s) 
 
# *** config end *** 
 
 
# *** function start *** 
 
#Get CPU usage 
function GetCpu() { 
  cpufree=$(vmstat 1 5 |sed -n '3,$p' |awk '{x = x + $15} END {print x/5}' |awk -F. '{print $1}') 
  cpuused=$((100 - $cpufree)) 
  echo $cpuused 
 
  local remark 
  remark=$(GetRemark ${CPU_REMARK}) 
 
  #Check whether the CPU usage exceeds 90% 
  if [ "$remark" = "" ] && [ "$cpuused" -gt 90 ]; then 
    echo "Subject: ${HOST} CPU uses more than 90% $(date +%Y-%m-%d' '%H:%M:%S)" | sendmail ${NOTICE_EMAIL} 
    echo "$(date +%s)" > "$CPU_REMARK" 
  fi 
} 
 
#Get memory usage 
function GetMem() { 
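  #Read the "Mem:" line. Newer versions of free no longer print the
  #"-/+ buffers/cache" line, so line 3 would be the "Swap:" line there.
  mem=$(free -m | awk '/^Mem:/ {print $2, $3, $7}')
  total=$(echo $mem | awk '{print $1}')
  used=$(echo $mem | awk '{print $2}')
  #Column 7 is "available" in newer versions of free
  free=$(echo $mem | awk '{print $3}')
  limit=$(($total/10))
  echo "${total} ${used} ${free}"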
 
  local remark 
  remark=$(GetRemark ${MEM_REMARK}) 
 
  #Check whether the memory usage exceeds 90% 
  if [ "$remark" = "" ] && [ "$limit" -gt "$free" ]; then 
    echo "Subject: ${HOST} Memory uses more than 90% $(date +%Y-%m-%d' '%H:%M:%S)" | sendmail ${NOTICE_EMAIL} 
    echo "$(date +%s)" > "$MEM_REMARK" 
  fi 
} 
 
#Get load average 
function GetLoad() { 
  load=$(uptime | awk -F 'load average: ' '{print $2}') 
  m1=$(echo $load | awk -F ', ' '{print $1}') 
  m5=$(echo $load | awk -F ', ' '{print $2}') 
  m15=$(echo $load | awk -F ', ' '{print $3}') 
  echo "${m1} ${m5} ${m15}" 
 
  local remark
  remark=$(GetRemark ${LOAD_REMARK})

  #Check the load for pressure. awk compares the full value, because
  #comparing only the integer part would miss loads between 2 and 3
  if [ "$remark" = "" ] && awk -v l="$m1" 'BEGIN {exit !(l > 2)}'; then
    echo "Subject: ${HOST} Load Average more than 2 $(date +%Y-%m-%d' '%H:%M:%S)" | sendmail ${NOTICE_EMAIL} 
    echo "$(date +%s)" > "$LOAD_REMARK" 
  fi 
} 
 
#Get the time the last email was sent; the record is deleted once it has expired
function GetRemark() { 
  local remark 
 
  if [ -f "$1" ] && [ -s "$1" ]; then 
    remark=$(cat $1) 
 
    if [ $(( $NOW - $remark )) -gt "$REMARK_EXPIRE" ]; then 
      rm -f $1 
      remark="" 
    fi 
  else 
    remark="" 
  fi 
 
  echo $remark 
} 
 
 
# *** function end *** 
 
cpuinfo=$(GetCpu) 
meminfo=$(GetMem) 
loadinfo=$(GetLoad) 
 
#Create the log directory on the first run
mkdir -p "${ROOT}/logs"

echo "cpu: ${cpuinfo}" >> "${CPU_LOG}"
echo "mem: ${meminfo}" >> "${MEM_LOG}"
echo "load: ${loadinfo}" >> "${LOAD_LOG}"
 
exit 0
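
Step 5 asks for a 30-second interval, while cron's finest granularity is one minute. One possible workaround is a small wrapper that cron starts every minute and that runs the monitor twice. The sketch below assumes this approach; run_every and the path in the comment are hypothetical names, not part of the original script.

```shell
#!/bin/bash
# run_every INTERVAL REPEAT COMMAND...: run COMMAND REPEAT times,
# sleeping INTERVAL seconds between runs (but not after the last one).
run_every() {
  local interval=$1 repeat=$2 i=1
  shift 2
  while [ "$i" -le "$repeat" ]; do
    "$@"
    if [ "$i" -lt "$repeat" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example crontab entry (hypothetical path), started once per minute:
#   * * * * * /path/to/wrapper.sh
# where wrapper.sh ends with:
#   run_every 30 2 /path/to/ServerMonitor.sh
```

Because each run of ServerMonitor.sh samples vmstat for about five seconds, the second run starts slightly later than the 30-second mark.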

Monitor the website for anomalies
This shell script checks websites for anomalies and automatically emails the administrator when it finds one.
Process:
1. Check whether the http_code returned by the website equals 200. Any other code is treated as an anomaly.
2. Check the website's response time. If it exceeds MAXLOADTIME (10 seconds), it is treated as an anomaly.
3. After sending the notification email, record the sending time in /tmp/monitor_load.remark and do not send again within one hour. After one hour, delete /tmp/monitor_load.remark.
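
The two checks in steps 1 and 2 can be sketched with a single curl call that reports both the status code and the total time. This is an illustration under assumptions: classify and probe are hypothetical names, and the comparison treats only times strictly greater than the limit as slow.

```shell
#!/bin/bash
# classify HTTP_CODE TIME_TOTAL MAX_SECONDS
# Prints "bad_status", "slow", or "ok".
classify() {
  local code=$1 total=$2 max=$3
  if [ "$code" != "200" ]; then
    echo "bad_status"
  elif awk -v t="$total" -v m="$max" 'BEGIN { exit !(t > m) }'; then
    echo "slow"
  else
    echo "ok"
  fi
}

# probe URL MAX_SECONDS: fetch URL once and classify the result.
# curl writes "CODE SECONDS"; the code is 000 when no response arrives.
probe() {
  local result code total
  result=$(curl -o /dev/null -s -w '%{http_code} %{time_total}' "$1")
  code=${result%% *}
  total=${result#* }
  classify "$code" "$total" "$2"
}
```

Calling probe "http://web01.example.com" 10 would print one of the three labels, which the caller can map to an alert email.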

#!/bin/bash 
 
SITES=("http://web01.example.com" "http://web02.example.com") # sites to monitor
NOTICE_EMAIL='admin@example.com' # admin email (placeholder)
MAXLOADTIME=10 # access timeout in seconds
REMARKFILE='/tmp/monitor_load.remark' # records when the notification email was sent; no further email is sent within one hour
ISSEND=0 # whether an email was sent during this run
EXPIRE=3600 # minimum number of seconds between emails
NOW=$(date +%s) 
 
if [ -f "$REMARKFILE" ] && [ -s "$REMARKFILE" ]; then 
  REMARK=$(cat $REMARKFILE) 
   
  #Delete expired email sending time record file 
  if [ $(( $NOW - $REMARK )) -gt "$EXPIRE" ]; then 
    rm -f ${REMARKFILE} 
    REMARK="" 
  fi 
else 
  REMARK="" 
fi 
 
#Loop to judge each site 
for site in "${SITES[@]}"; do

  printf 'start to load %s\n' "${site}"
  site_load_time=$(curl -o /dev/null -s -w "time_connect: %{time_connect}\ntime_starttransfer: %{time_starttransfer}\ntime_total: %{time_total}" "${site}")
  site_access=$(curl -o /dev/null -s -w '%{http_code}' "${site}")
  #Strip everything up to the last ": " so that no leading space remains
  time_total=${site_load_time##*: }

  printf '%s\n' "$(date '+%Y-%m-%d %H:%M:%S')"
  printf 'site load time\n%s\n' "${site_load_time}"
  printf 'site access:%s\n\n' "${site_access}"
 
  # not send 
  if [ "$REMARK" = "" ]; then 
    # check access (curl reports http_code 000 when it gets no response)
    if [ "$site_access" != "200" ]; then
      echo "Subject: ${site} cannot be accessed $(date +%Y-%m-%d' '%H:%M:%S)" | sendmail ${NOTICE_EMAIL}
      ISSEND=1 
    else 
      # check load time 
      if [ "${time_total%%.*}" -ge ${MAXLOADTIME} ]; then 
        echo "Subject: ${site} load time total:${time_total} $(date +%Y-%m-%d' '%H:%M:%S)" | sendmail ${NOTICE_EMAIL} 
        ISSEND=1 
      fi 
    fi 
  fi 
 
done 
 
#Record the sending time after sending the email 
if [ "$ISSEND" = "1" ]; then 
  echo "$(date +%s)" > $REMARKFILE 
fi 
 
exit 0
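
Both scripts pipe a single Subject line to sendmail. If a recipient header and a message body are wanted as well, the message could be assembled as in the sketch below. build_mail is a hypothetical helper, and the address in the comment is a placeholder.

```shell
#!/bin/bash
# build_mail TO SUBJECT BODY: print a minimal message, headers first.
build_mail() {
  printf 'To: %s\n' "$1"
  printf 'Subject: %s\n' "$2"
  printf '\n'            # blank line separates headers from the body
  printf '%s\n' "$3"
}

# Example (not executed here): -t makes sendmail read the recipient
# from the To: header instead of the command line.
#   build_mail 'admin@example.com' "$(hostname) CPU above 90%" "cpu used: 95" | sendmail -t
```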
