Shortest monitoring script on earth

Right after I got my quite elaborate Nagios setup running on my local network, I thought to myself: “What if my network loses connectivity to the world and my GSM-SMS gateway fails? How am I going to be notified of such a grand disaster?”

Well, as usual, a shell one-liner comes to the rescue.

This command line goes into the crontab of any remote Linux machine.

*/5 * * * *     nc -zw1 corp-gw.123unix.com 443 || { l=$HOME/.last-notification-time; n=`date +\%s`; [ ! -e $l -o $((`cat $l`+7200)) -lt $n ] && echo $n | tee $l | mail -s REMOTE-LAN-PROBLEM admin@123unix.com; }

(Be sure to adjust the host, port, and e-mail address to match your local environment.)

Here is what it does:

  1. nc -zw1 corp-gw.123unix.com 443
    checks whether the TCP service on host corp-gw.123unix.com at port 443 is accepting connections;
    -z tells nc to just probe for a listening service on the given host and port without sending any data;
    -w1 sets the probe timeout to 1 second.
    Note that a more traditional ping -qc1 -w1 could also be used for connectivity testing. It is just that nc is more versatile and really helps when ICMP echo-reply is blocked on the host in question.
  2. shell variable l (short for “last”) is set to point to a file which will hold the time when the last notification was sent out.
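  3. shell variable n (short for “now”) is set to the current time in seconds since 1970-01-01 00:00:00 UTC.
  4. [ ! -e $l -o $((`cat $l`+7200)) -lt $n ] ensures that notifications are sent at most once every two hours (7200 seconds): the test passes only if the file does not exist yet or the time stored in it is more than 7200 seconds in the past.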
  5. echo $n | tee $l | mail -s REMOTE-LAN-PROBLEM admin@123unix.com updates the last-notification-time file and sends out the notification.
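
For readers who find the one-liner too dense, the same steps can be spelled out as a small script. This is only a sketch of the logic described above, not a replacement for the crontab entry: the variable names HOST, PORT, INTERVAL, STAMP and MAILTO and the function names notification_due and check_and_notify are made up for illustration, and it assumes nc, mail and a date that supports +%s are available.

```shell
#!/bin/sh
# Expanded sketch of the cron one-liner. All variable and function names
# below are illustrative assumptions; they do not appear in the original.
HOST=${HOST:-corp-gw.123unix.com}
PORT=${PORT:-443}
INTERVAL=${INTERVAL:-7200}                      # seconds between notifications
STAMP=${STAMP:-$HOME/.last-notification-time}   # file holding the last send time
MAILTO=${MAILTO:-admin@123unix.com}

# notification_due STAMP_FILE NOW INTERVAL
# Succeeds when a notification should go out: either the stamp file does not
# exist yet, or the time stored in it is more than INTERVAL seconds before NOW.
notification_due() {
    [ ! -e "$1" ] && return 0
    last=$(cat "$1")
    [ $((last + $3)) -lt "$2" ]
}

# Probe the service; on failure send a mail unless one went out recently.
check_and_notify() {
    nc -zw1 "$HOST" "$PORT" && return 0
    now=$(date +%s)
    if notification_due "$STAMP" "$now" "$INTERVAL"; then
        echo "$now" | tee "$STAMP" | mail -s REMOTE-LAN-PROBLEM "$MAILTO"
    fi
}

# To use this from cron, save it as a script and add a final line:
# check_and_notify
```

Two details differ from the crontab version. Inside a script file the percent sign in date +%s needs no backslash; the escape is only required in a crontab, where an unescaped % is treated as a newline. The existence check is also done before reading the file, so cat is never run on a missing file.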

For redundancy, this “script” may be installed on several remote servers, perhaps with different check and notification frequencies. That way the weakest link in the admin’s business becomes his mobile mail reader.