#summary HASTMON -- cluster monitoring daemon.

= Introduction. =

HASTMON is a monitoring daemon that allows a couple of hosts to run a
service providing automatic failover. Those machines will be called a
cluster and each machine is one cluster node. HASTMON is designed for
clusters that work in Primary-Secondary configuration, which means
that only one of the cluster nodes can be active at any given
time. Active node will be called Primary node. This is the node that
will run the service. Other nodes will be running as Secondary
ensuring that the service is not started there. There should be also
at least one node acting as a watchdog -- it checks periodically
status of all nodes and sends complaints to Secondary nodes if Primary
is not available. Secondary node makes decision to change its role to
Primary when two conditions are meat: there is no connection from
primary and there are complaints from watchdog.

Most of the HASTMON's code was taken from 
[http://wiki.freebsd.org/HAST FreeBSD HAST project]
and it was developed as a monitoring daemon for HAST cluster but can be
used for other setups.

This software is being developed and tested under FreeBSD. Since 3.0
it is supposed to work on NetBSD and Linux too.

= Installation. =

Since version 0.3.0 when support for several platforms was added
hastmon requires 
[http://sourceforge.net/projects/mk-configure/ mk-configure]
to be built and installed. mk-configure uses 
[http://www.crufty.net/help/sjg/bmake.html bmake] (NetBSD make).

mk-configure and bmake have been already packaged on some platforms
(FreeBSD, NetBSD, some Linux distros) but if it is not your case go to
[http://sourceforge.net/projects/mk-configure/ mk-configure page],
download the sources and read instructions how to install.

When mk-configure is installed hastmon can be built and installed
running the following commands:

{{{
cd <path to hastmon sources>
mkcmake
mkcmake install
}}}

= Configuration. =

There should be at least 3 nodes: two that run the service and acting
as Primary-Secondary and one is Watchdog node. Configuration for nodes
is stored in /etc/hastmon.conf file, which is designed in a way that
exactly the same file can be (and should be) used on all nodes.
HASTMON can monitor several resources. For every resource the script
should be provided that will be used to start/stop the resource and
check its status. See hastmon.conf(5) and Examples section below how
to write the configuration file and rc script.

After the nodes are started their role is set up using hastmonctl
utility. This utility is also used to check current status of the
cluster.

= Examples. =

In this example two resources will be set up -- one is some
application/daemon that may run only on one server and another is HAST
cluster that provides NFS storage.

The cluster is run on three nodes: lolek, bolek -- running the services
(resources) and acting as Primary-Secondary, and reksio -- acting as
Watchdog.

Configuration file /etc/hastmon.conf is the same on all nodes:
{{{
resource daemon {
        exec /etc/daemon.sh
        friends lolek bolek reksio

        on lolek {
                remote tcp4://bolek
                priority 0
        }
        on bolek {
                remote tcp4://lolek
                priority 1
        }
        on reksio {
                remote tcp4://lolek tcp4://bolek
        }
}

resource storage {
        exec /etc/storage.sh
        friends lolek bolek reksio

        on lolek {
                remote tcp4://bolek
                priority 0
        }
        on bolek {
                remote tcp4://lolek
                priority 1
        }
        on reksio {
                remote tcp4://lolek tcp4://bolek
        }
}

}}}
Exec scripts, which are stored on all three nodes:

/etc/daemon.sh:
{{{
#!/bin/sh

DAEMON=/etc/rc.d/lpd # :-)

case $1 in
    start)
        ${DAEMON} onestart
        ;;
    stop)
        ${DAEMON} onestop
        ;;
    status)
        ${DAEMON} onestatus
        ;;
    role|connect|disconnect|complain)
        exit 0
        ;;
    *)
        echo "usage: $0 stop|start|status|role|connect|disconnect|complain"
        exit 1
        ;;
esac
}}}

/etc/storage.sh is more complicated:
{{{
#!/bin/sh

PROV=storage
POOL=storage
IF=em0
IP=172.20.68.100
FS=storage/test
MOUNTPOINT=/storage/test
DEV="/dev/hast/${PROV}"

HAST=/etc/rc.d/hastd
HASTCTL=/sbin/hastctl
ZPOOL=/sbin/zpool
ZFS=/sbin/zfs
IFCONFIG=/sbin/ifconfig
MOUNTD=/etc/rc.d/mountd
NFSD=/etc/rc.d/nfsd
MOUNT=/sbin/mount

RUN=0
STOPPED=1
UNKNOWN=2

progname=$(basename $0)

start()
{
    logger -p local0.debug -t "${progname}[$$]" "Starting $PROV..."
    # Check if hastd is started and start if it is not.
    "${HAST}" onestatus || "${HAST}" onestart

    # If there is secondary worker process, it means that remote primary process is
    # still running. We have to wait for it to terminate.
    for i in `jot 30`; do
        pgrep -f "hastd: ${PROV} \(secondary\)" >/dev/null 2>&1 || break
        sleep 1
    done
    if pgrep -f "hastd: ${PROV} \(secondary\)" >/dev/null 2>&1; then
        logger -p local0.error -t "${progname}[$$]" \
	    "Secondary process for resource ${PROV} is still running after 30 seconds."
        exit 1
    fi
    logger -p local0.debug -t "${progname}[$$]" "Secondary process in not running."

    # Change role to primary for our resource.
    out=`${HASTCTL} role primary "${PROV}" 2>&1`
    if [ $? -ne 0 ]; then
        logger -p local0.error -t "${progname}[$$]" \
	    "Unable to change to role to primary for resource ${PROV}: ${out}."
        exit 1
    fi

    # Wait few seconds for provider to appear.
    for i in `jot 50`; do
        [ -c "${DEV}" ] && break
        sleep 0.1
    done
    if [ ! -c "${DEV}" ]; then
        logger -p local0.error -t "${progname}[$$]" "Device ${DEV} didn't appear."
        exit 1
    fi
    logger -p local0.debug -t "${progname}[$$]" "Role for resource ${prov} changed to primary."

    # Import ZFS pool. Do it forcibly as it remembers hostid of the
    # other cluster node. Before import we check current status of
    # zfs: it might be that the script is called second time by
    # hastmon (because not all operations were successful on the first
    # run) and zfs is already here.

    "${ZPOOL}" list | egrep -q "^${POOL} "
    if [ $? -ne 0 ]; then
	out=`"${ZPOOL}" import -f "${POOL}" 2>&1`
	if [ $? -ne 0 ]; then
            logger -p local0.error -t "${progname}[$$]" \
		"ZFS pool import for resource ${PROV} failed: ${out}."
            exit 1
	fi
	logger -p local0.debug -t "${progname}[$$]" "ZFS pool for resource ${PROV} imported."
    fi

    zfs mount | egrep -q "^${FS} "
    if [ $? -ne 0 ]; then
	out=`zfs mount "${FS}" 2>&1`
	if [ $? -ne 0 ]; then
            logger -p local0.error -t "${progname}[$$]" \
		"ZFS mount for ${FS} failed: ${out}."
            exit 1
	fi
	logger -p local0.debug -t "${progname}[$$]" "ZFS {$FS} mounted."
    fi
 
    "${IFCONFIG}" "${IF}" alias "${IP}" netmask 0xffffffff

    out=`"${MOUNTD}" onerestart 2>&1`
    if [ $? -ne 0 ]; then
        logger -p local0.error -t "${progname}[$$]" \
	    "Can't start mountd: ${out}."
        exit 1
    fi
    
    out=`"${NFSD}" onerestart 2>&1`
    if [ $? -ne 0 ]; then
        logger -p local0.error -t "${progname}[$$]" \
	    "Can't start nfsd: ${out}."
        exit 1
    fi

    logger -p local0.debug -t "${progname}[$$]" "NFS started."
}

stop()
{
    logger -p local0.debug -t "${progname}[$$]" "Stopping $PROV..."

    # Kill start script if it still runs in the background.
    sig="TERM"
    for i in `jot 30`; do
        pgid=`pgrep -f '/etc/storage.sh start' | head -1`
        [ -n "${pgid}" ] || break
        kill -${sig} -- -${pgid}
        sig="KILL"
        sleep 1
    done
    if [ -n "${pgid}" ]; then
        logger -p local0.error -t "${progname}[$$]" \
	    "'/etc/storage.sh start' process for resource ${PROV} is still running after 30 seconds."
        exit 1
    fi
    logger -p local0.debug -t "${progname}[$$]" "'/etc/storage.sh start' is not running."

    "${NFSD}" onestop
    "${MOUNTD}" onestop

    "${IFCONFIG}" "${IF}" -alias "${IP}" netmask 0xffffffff
    
    if "${HAST}" onestatus; then
	"${ZPOOL}" list | egrep -q "^${POOL} "
	if [ $? -eq 0 ]; then
            # Forcibly export file pool.
            out=`${ZPOOL} export -f "${POOL}" 2>&1`
            if [ $? -ne 0 ]; then
		logger -p local0.error -t "${progname}[$$]" \
		    "Unable to export pool for resource ${PROV}: ${out}."
		exit 1
            fi
	    logger -p local0.debug -t "${progname}[$$]" \
		"ZFS pool for resource ${PROV} exported."
	fi
    else
	"${HAST}" onestart
    fi

    # Change role to secondary for our resource.
    out=`${HASTCTL} role secondary "${PROV}" 2>&1`
    if [ $? -ne 0 ]; then
        logger -p local0.error -t "${progname}[$$]" \
	    "Unable to change to role to secondary for resource ${PROV}: ${out}."
        exit 1
    fi
    logger -p local0.debug -t "${progname}[$$]" \
	"Role for resource ${PROV} changed to secondary."

    logger -p local0.info -t "${progname}[$$]" \
	"Successfully switched to secondary for resource ${PROV}."
}

status()
{
    "${HASTCTL}" status "${PROV}" |
    grep -q '^ *role: *primary *$' && 
    "${ZFS}" list "${POOL}" > /dev/null 2>&1 &&
    "${MOUNT}" | grep -q "${MOUNTPOINT}" &&
    "${NFSD}" onestatus > /dev/null 2>&1 &&
    "${MOUNTD}" onestatus > /dev/null 2>&1 &&
    return ${RUN}
 
    "${HASTCTL}" status "${PROV}" |
    grep -q '^ *role: *secondary *$' && 
    return ${STOPPED}
    
    return ${UNKNOWN}
}

case $1 in
    start)
	start
	;;
    stop)
	stop
	;;
    status)
	status
	;;
    role|connect|disconnect|complain)
	exit 0
	;;
    *)
	echo "usage: $0 stop|start|status|role|connect|disconnect|complain"
	exit 1
	;;
esac
}}}

Start hastmon daemon and set up role on all hosts:
{{{
lolek# hastmon
lolek# hastmonctl role primary all

bolek# hastmon
bolek# hastmonctl role secondary all

reksio# hastmon
reksio# hastmonctl role watchdog all
}}}

Check nodes' status: 
{{{
lolek# hastmonctl status
daemon:
  role: primary
  exec: /etc/daemon.sh
  remoteaddr: tcp4://bolek(connected)
  state: run
  attempts: 0 from 5
  complaints: 0 for last 60 sec (threshold 3)
  heartbeat: 10 sec
storage:
  role: primary
  exec: /etc/storage.sh
  remoteaddr: tcp4://bolek(connected)
  state: run
  attempts: 0 from 5
  complaints: 0 for last 60 sec (threshold 3)
  heartbeat: 10 sec

bolek# hastmonctl status
daemon:
  role: secondary
  exec: /etc/daemon.sh
  remoteaddr: tcp4://lolek(connected)
  state: stopped
  attempts: 0 from 5
  complaints: 0 for last 60 sec (threshold 3)
  heartbeat: 10 sec
storage:
  role: secondary
  exec: /etc/storage.sh
  remoteaddr: tcp4://lolek(connected)
  state: stopped
  attempts: 0 from 5
  complaints: 0 for last 60 sec (threshold 3)
  heartbeat: 10 sec

reksio# hastmonctl status
daemon:
  role: watchdog
  exec: /etc/daemon.sh
  remoteaddr: tcp4://lolek(primary/run) tcp4://bolek(secondary/stopped)
  state: run
  attempts: 0 from 5
  complaints: 0 for last 60 sec (threshold 3)
  heartbeat: 10 sec
storage:
  role: watchdog
  exec: /etc/storage.sh
  remoteaddr: tcp4://lolek(primary/run) tcp4://bolek(secondary/stopped)
  state: run
  attempts: 0 from 5
  complaints: 0 for last 60 sec (threshold 3)
  heartbeat: 10 sec
}}}
