The clusterXL_monitor_process Script

Description

You can use the clusterXL_monitor_process script to monitor whether the specified user space processes are running, and to trigger a cluster failover if they are not. For this script to work, you must write the exact, case-sensitive names of the monitored processes in the $FWDIR/conf/cpha_proc_list file, one process name per line. This file does not support comments or spaces.
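
Because the file accepts neither comments nor spaces, a quick check before you start the script can catch formatting mistakes. The following is a minimal sketch and not part of the Check Point script: the function name check_proc_list is hypothetical, and the 15-character limit is an assumption taken from the comment in the script shown in the Example section.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ClusterXL): checks that a process
# list file has one name per line, with no empty lines, whitespace, or '#'.
# Usage: check_proc_list "$FWDIR/conf/cpha_proc_list"
check_proc_list() {
    file=$1
    rc=0
    lineno=0
    tab=$(printf '\t')
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]
    do
        lineno=$((lineno + 1))
        case $line in
            '')              echo "line $lineno: empty line"; rc=1 ;;
            *" "*|*"$tab"*)  echo "line $lineno: contains whitespace"; rc=1 ;;
            *'#'*)           echo "line $lineno: contains '#'"; rc=1 ;;
        esac
        # Length limit assumed from the comment inside the script itself.
        if [ "${#line}" -gt 15 ]
        then
            echo "line $lineno: longer than 15 characters"
            rc=1
        fi
    done < "$file"
    return $rc
}
```

The function returns a non-zero status if any line looks suspicious, so it can be used as a guard before the monitoring script is started.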

The script is located on each Cluster Member at:

$FWDIR/bin/clusterXL_monitor_process

Script Workflow

  1. The script registers a Critical Device (with the status "ok") for each process you specified in the $FWDIR/conf/cpha_proc_list file. Each Critical Device is named after its process.

  2. While the script detects that the specified process runs, it does not change the status of the corresponding Critical Device.

  3. If the script detects that the specified process no longer runs, it reports the state of the corresponding Critical Device as "problem".

    This gracefully changes the state of the Cluster Member to "DOWN".

    If the script detects that the specified process runs again, it changes the status of the corresponding Critical Device to "ok" again.
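
The workflow above can be sketched as a dry run that only prints the report commands instead of executing them. This is a simplified illustration under stated assumptions, not the shipped script: the function name plan_reports is hypothetical, and it reads the names of running processes from standard input so that it can be tried without a cluster. The "set_pnote ... report" syntax is the one used by the script in the Example section.

```shell
#!/bin/sh
# Hypothetical dry-run helper: for every name in the process list file ($1),
# print the report command that one monitoring pass would issue.
# Standard input must hold one running process name per line, for example:
#   ps -e -o comm= | plan_reports "$FWDIR/conf/cpha_proc_list"
plan_reports() {
    running=$(cat)            # snapshot of the running process names
    while IFS= read -r name || [ -n "$name" ]
    do
        [ -n "$name" ] || continue
        # Exact, whole-line match of the name against the snapshot.
        if printf '%s\n' "$running" | grep -qxF -- "$name"
        then
            state=ok          # step 2: the process runs
        else
            state=problem     # step 3: the process is missing
        fi
        echo "cphaconf set_pnote -d $name -s $state report"
    done < "$1"
}
```

Note that this sketch compares whole names, while the script in the Example section searches the full "ps -ef" output with grep.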

For more information, see sk92904.

Important - You must make these changes on all Cluster Members.

Example

#!/bin/sh
#
# This script monitors the existence of processes in the system. The process names should be written
# in the $FWDIR/conf/cpha_proc_list file, one per line.
#
# USAGE :
# clusterXL_monitor_process X silent
# where X is the number of seconds between process probings.
# If silent is set to 1, no messages will appear on the console.
#
#
# We initially register a pnote for each of the monitored processes
# (process name must be up to 15 characters) in the problem notification mechanism.
# When we detect that a process is missing, we report the pnote to be in "problem" state.
# When the process is up again, we report the pnote as OK.
 
# Default to verbose output (silent=0) when the second argument is missing.
if [ "${2:-0}" -le 1 ]
then
        silent=${2:-0}
else
        silent=0
fi
if [ -f "$FWDIR/conf/cpha_proc_list" ]
then
        procfile="$FWDIR/conf/cpha_proc_list"
else
        echo "No process file in $FWDIR/conf/cpha_proc_list"
        exit 0
fi
 
arch=`uname -s`
 
for process in `cat $procfile`
do
        $FWDIR/bin/cphaconf set_pnote -d $process -t 0 -s ok -p register > /dev/null 2>&1
done
 
while [ 1 ]
do

        result=1

        for process in `cat $procfile`
        do
                ps -ef | grep $process | grep -v grep > /dev/null 2>&1

                status=$?

                if [ $status = 0 ]
                then
                        if [ $silent = 0 ]
                        then
                                echo " $process is alive"
                        fi
#                       echo "3, $FWDIR/bin/cphaconf set_pnote -d $process -s ok report"
                        $FWDIR/bin/cphaconf set_pnote -d $process -s ok report
                else
                        if [ $silent = 0 ]
                        then
                                echo " $process is down"
                        fi

                        $FWDIR/bin/cphaconf set_pnote -d $process -s problem report
                        result=0
                fi

        done

        if [ $result = 0 ]
        then
                if [ $silent = 0 ]
                then
                        echo " One of the monitored processes is down!"
                fi
        else
                if [ $silent = 0 ]
                then
                        echo " All monitored processes are up "
                fi
        fi

        if [ "$silent" = 0 ]
        then
                echo "sleeping"
        fi

        sleep $1

done
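
The script passes its first argument straight to sleep. If the interval is missing or not a number, sleep fails and the loop repeats without pausing. A small guard could reject such values before the loop starts. The following is only a sketch: the function name valid_interval is hypothetical, and it assumes the interval is given as a whole number of seconds.

```shell
#!/bin/sh
# Hypothetical guard: succeeds only for a positive whole number of seconds.
valid_interval() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$1" -gt 0 ]
}

# Possible use near the top of the script (an assumption, not in the original):
# valid_interval "$1" || { echo "usage: $0 seconds [silent]" >&2; exit 1; }
```

With such a guard in place, a call without arguments would end with a usage message instead of entering the monitoring loop.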