
Complete CLI Guide


RSF-1 Configuration File

RSF-1 Configuration File Explained

In the example configuration file on the previous page, there are three main sections:

  1. Header with global variables
  2. The "Machines" section defining which nodes form the cluster and how heartbeats are configured
  3. The "Services" section which details the services that are managed by the cluster
The highlighted areas are those which should be amended for your environment.

Other variables and settings can also be modified but they are beyond the scope of this Quick Start guide.

The defaults given here are sufficient for general evaluation configuration and testing. Further information is available in the RSF-1 Administrator's Guide.

The first section contains some global variable settings:
The second section details the cluster nodes and heartbeats: For each node, the corresponding lines detail the heartbeat configurations with each other node in the cluster.
NOTE - For disk heartbeats, the offset ":518:512" component is reversed on the other node to ":512:518". These numbers specify the "read" and "write" disk blocks respectively, so the first node reads from block 518 and writes to block 512, while the other node reads from block 512 and writes to block 518.
The final section details the services to be managed by RSF-1:
In this case, we have defined two services: POOLA and POOLB. The syntax of the SERVICE line is as follows:
SERVICE <service-name> <VIP-address> / <netmask> "<Description>"
For the POOLA service, we have defined the VIP address resolving to sales_staff-public with a netmask of 255.255.255.0. The VIP will be plumbed into the IPDEVICE interface (in this case bge0) as a secondary IP addressing mechanism (alongside any IP address already defined on bge0).

We have also specified that the two ZFS pools A and C, together with their respective mount points, are to be part of this service.

For the POOLB service, the associated VIP is support_staff-public with a netmask of 255.255.255.0 and ZFS pool B.

Each service has two timeouts associated with it: INITIMEOUT and RUNTIMEOUT. If the service has not previously been running on any cluster node, the INITIMEOUT countdown must expire before the service is started when other cluster nodes are not contactable. If the service has been running elsewhere, the RUNTIMEOUT countdown must expire before the service is restarted.
NOTE - The ordering of the "SERVER" section at the end of each SERVICE section is important, as the first node in the list is the preferred node on which to start that service. If all services have the same SERVER order, all services will start on that preferred node. In this example, we want romulus to be the preferred starting node for the POOLA service and remus for POOLB.
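As a rough, hedged sketch only (not the definitive format: take the exact keywords, layout and the ZFS pool/mount point entries from the example configuration file shipped with RSF-1; the 20/8 timeout values are simply those shown in the rsfcli list output later in this guide), the POOLA service described above would be expressed along these lines:

SERVICE POOLA sales_staff-public/255.255.255.0 "Sales Staff Pools"
 IPDEVICE bge0
 INITIMEOUT 20
 RUNTIMEOUT 8
 (ZFS pool and mount point entries for pools A and C go here)
 SERVER romulus
 SERVER remus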

Verifying and Installing the RSF-1 Configuration File

This text file is located in /opt/HAC/RSF-1/etc/config and can be modified with an editor of your choice.

NOTE - The config file must be identical on all members of the cluster; checksumming is used to ensure this is the case. If the files differ in any way, including white space, RSF-1 will start but service startups will be explicitly disabled.
The recommended method of amending and testing RSF-1 configuration files is to take a copy of the existing file, make the changes and then test before installing.
NOTE - It is only necessary to create and edit the config file on one node and then use the RSF-1 config_dist distribution tool to install and update all other cluster nodes.
At this point, you may want to add /opt/HAC/RSF-1/bin to your PATH.
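For example, in a Bourne-compatible shell (also adding /opt/HAC/bin, which is used later for hac_hostid):

# PATH=$PATH:/opt/HAC/RSF-1/bin:/opt/HAC/bin; export PATH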
romulus# cp /opt/HAC/RSF-1/etc/config /tmp/config.test
romulus# vi /tmp/config.test
romulus# /opt/HAC/RSF-1/bin/rsfmon -c /tmp/config.test
[28609 Oct 2 16:17:13] ----------------- RSF-1 starting -----------------
[28609 Oct 2 16:17:13] RSF-1 monitor 3.9.8 p4 started on Thu Oct 2 16:17:13 2014
[28609 Oct 2 16:17:13] Compiled on 01 Oct 2014 10:41 for 19:solaris-x86
[28609 Oct 2 16:17:13] Copyright High-Availability.Com Ltd
[28609 Oct 2 16:17:14] Using machine ID 0x42029b314
[28609 Oct 2 16:17:14] Configuration file parsed OK, CRC is 0x692c
[28609 Oct 2 16:17:14] INFO: This copy of RSF-1 expires on 2014-12-22
[28609 Oct 2 16:17:14] INFO: This copy of RSF-1 is licenced for automatic service startup
romulus#
Once the verification process above has succeeded, the new config file can be distributed and installed on all nodes in the cluster as follows:
romulus# /opt/HAC/RSF-1/bin/config_dist /tmp/config.test romulus remus
Oct 2 16:22:42 romulus RSF-1[25130]: [ID 702911 local0.alert] RSF-1 cold restart: All services stopped.
romulus#
NOTE – The config_dist command sets the state of any newly added services described in the config file to auto on all nodes. Alternatively, if rsfmon is started manually without config_dist, newly added services are set to manual mode.

1: What is RSF-1 for ZFS?

RSF-1 for ZFS is an Enterprise proven High Availability Cluster product that manages the availability of critical ZFS storage pools.

Each RSF-1 Service contains one or more ZFS storage pools with associated file and block services. An RSF-1 ZFS Cluster consists of two or more servers that have any number of RSF-1 Services (pools) configured. RSF-1 provides high availability of ZFS pools by managing the start-up and failover of RSF-1 Services within an RSF-1 Cluster.

A typical Active/Active 2-node RSF-1 Cluster configuration consists of two RSF-1 Services, each of which have an independent ZFS Pool and a single associated VIP. Under normal operation, each node is responsible for providing services to one ZFS pool, and in the event of either node failing, the surviving node will run both.

When the failed node has been repaired and restarted, it will rejoin the cluster and the administrator can control when the ZFS pools are redistributed. To provide optimum uptime, RSF-1 does not automatically failback RSF-1 Services.

1.1: RSF-1 Communication

Each node in the Cluster communicates with all others via a number of Heartbeat mechanisms:
NOTE - On ZFS RSF-1 Clusters, it is most common for each node to have two independent network heartbeats (one private back-to-back connection, and a public network) and two or more disk heartbeats. If disk heartbeats are not desirable (for example in all-flash configurations), then a serial heartbeat is recommended in addition to network heartbeats.
RSF-1 detects system failure when no updates from a node have been seen across all heartbeat mechanisms for a given configurable time period.

1.2: RSF-1 Services

Each RSF-1 Service consists of:

1.3: RSF-1 Service States

RSF-1 Services are managed independently from one another and have a number of possible States per node of which the most important are:

1.4: RSF-1 Run Modes

Each RSF-1 Service also has an independent Run Mode per node:

Run Modes are controlled by other processes within the RSF-1 framework to prevent RSF-1 Service start-up if certain dependencies have not been met (for example, the requirement for a public network interface to be available). The usage of this facility is beyond the scope of this document but further information is available in the RSF-1 Administrator's Guide.

1.5: RSF-1 Switchover Modes

Each RSF-1 Service has an independent Switchover Mode associated per node::
NOTE - Under normal operating mode, each RSF-1 Service will be in Automatic mode on all cluster nodes. For maintenance purposes, the administrator can prevent automatic failovers or start-ups by changing the Switchover Mode to Manual

1.6: RSF-1 Startup and Shutdown Scripts

For each RSF-1 service, the following startup / shutdown scripts (located in /opt/HAC/RSF-1/etc/rc.appliance.c/) are executed in ascending order of script file name (in the same manner as init.d scripts):
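For illustration, listing that directory shows the ordered start scripts; the names below are only those that appear in the log excerpts later in this guide, and the exact set (including the corresponding shutdown scripts) will vary by release:

# ls /opt/HAC/RSF-1/etc/rc.appliance.c
S01announce  S02ApplianceStarting  S21res_drives  S98ApplianceStarted  S99announce  ...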

2: System Requirements

2.1: Operating Systems

The supported Operating Systems are as follows:

2.2: Hardware


RSF-1 can be deployed on real or virtual servers.

As RSF-1 is a very lightweight process, there are no minimum hardware requirements.

It is however expected that the servers to be used have a reasonable amount of memory and CPU power to provide ZFS storage services.

2.3: Network and Firewalls

It is recommended that two separate network ports be used for heartbeats: a private connection (using exclusive back-to-back Ethernet cable) and a public network.

As a minimum, each node in the cluster must be able to utilize port 1195 using both udp and tcp across all network heartbeats.

NOTE - If firewalls are deployed, a rule should be added to allow tcp and udp access via port 1195 between the cluster nodes
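As a minimal sketch for Solaris IP Filter (assuming /etc/ipf/ipf.conf is in use and that the peer node is remus at 192.168.33.92, the address used in the example later in this guide; adapt the syntax to whichever firewall you deploy):

# allow RSF-1 heartbeat and cluster control traffic from the other cluster node
pass in quick proto tcp from 192.168.33.92 to any port = 1195 keep state
pass in quick proto udp from 192.168.33.92 to any port = 1195

A matching rule is needed on the other node for traffic from 192.168.33.91, and the same applies to any private heartbeat addresses.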

2.4: Storage

Shared storage devices (real or virtual) must be visible to all nodes in the cluster that are to be configured to be capable of running the associated ZFS pools. There is however no requirement for device naming to be identical across all nodes.

Where virtual storage devices (other than iSCSI devices) are being used for testing, disk reservations using SCSI-2 or PGR3 are not supported and should be disabled.
NOTE - Disk heartbeats should generally not be used on SSD devices as this may reduce the lifespan of these devices

2.5: IP Addresses

Each RSF-1 cluster node requires at least one fixed IP address.

Except in Fibre Channel only configurations, at least one spare IP address is also required for each RSF-1 Service, per network on which ZFS services will be made available.
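As a sketch of how these addresses might be laid out in /etc/hosts on both nodes (the public and private node addresses and the sales_staff-public VIP are the values used elsewhere in this guide; the romulus_priv and support_staff-public addresses shown here are assumptions for illustration only):

192.168.33.91   romulus
192.168.33.92   remus
10.1.1.91       romulus_priv          # assumed value
10.1.1.92       remus_priv
192.168.33.105  sales_staff-public    # VIP for the POOLA service
192.168.33.106  support_staff-public  # assumed value; VIP for the POOLB service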

2.6: Required Packages

In addition to the core RSF-1 product set, if you are intending to use the COMSTAR framework, you will also need to install the COMSTAR stack, which can be done as follows:
# pkg install -v storage-server
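Depending on the release, the COMSTAR framework services may also need to be enabled after installation. This is a hedged example rather than a required step, and the service names may differ on your platform:

# svcadm enable stmf
# svcadm enable -r svc:/network/iscsi/target:default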

3: GETTING STARTED (Solaris and derivatives)

3.1: Getting the RSF-1 Package

You can download the latest RSF-1 package and documentation from the High-Availability.Com website, or via anonymous ftp from ftp://ftp.high-availability.com:
# ftp ftp.high-availability.com
Connected to ftp.high-availability.com (213.171.204.157).
220-=(<*>)=-.:. (( Welcome to PureFTPd 1.1.0 )) .:.-=(<*>)=-
220-You are user number 8 of 50 allowed
220-Local time is now 11:35 and the load is 0.05. Server port: 21.
220 You will be disconnected after 15 minutes of inactivity.
Name (ftp.high-availability.com:root): anonymous
230 Anonymous user logged in
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> ls
227 Entering Passive Mode (213,171,204,157,168,181).
150 Accepted data connection
drwxr-xr-x 4 0 0      4096 Oct  2 11:35 .
drwxr-xr-x 4 0 0      4096 Oct  2 11:35 ..
drwxrwxrwx 2 0 0      4096 Mar 28  2013 Docs
drwxrwxrwx 5 0 0      4096 Apr 19  2013 RSF-1 Gui
-rw-r--r-- 1 0 0   2180703 Sep 23 11:24 rsf-1-3.9.1-7-CentOS-6.4.x86_64.rpm
-rw-r--r-- 1 0 0  27474432 Dec 11 17:47 rsf-1-solaris-5.11-x86.3.9.10.2014-12-04.pkg
226-Options: -a -l
226 7 matches total
ftp> get rsf-1-solaris-5.11-x86.3.9.10.2014-12-04.pkg
local: rsf-1-solaris-5.11-x86.3.9.10.2014-12-04.pkg remote: rsf-1-solaris-5.11-x86.3.9.10.2014-12-04.pkg
229 Extended Passive mode OK (|||44589|)
150-Accepted data connection
150 26830.5 kbytes to download
100% |***********************************| 26830 KiB                    ETA
226-File successfully transferred
226 6.898 seconds (measured here), 3.80 Mbytes per second
27474432 bytes received in 00:06 (3.78 MiB/s)
ftp>
Online documentation is also available at:
  • http://www.high-availability.com/resources

3.2: Installing the RSF-1 Package

    Copy the downloaded package (actual package name will vary with future releases) onto both servers and, as root, install as follows:
    # pkgadd -d rsf-1-solaris-5.11-x86.3.9.10.2014-12-04.pkg
    Once the pkgadd has completed, the product will be installed in the directory hierarchy /opt/HAC and /opt/HAC/RSF-1, and the final output line from the pkgadd command will advise how to connect to the GUI:

    .........
    install_http_server: generating HTML for HTTP server (standalone)
    Server is running
    install_http_server: RSF-1 GUI now available on http://romulus:8020/

    Installation of <rsf-1> was successful.
    #

    You can now choose to complete the installation and configuration using the Graphical User Interface (GUI), described in section 9, or the Command Line Interface (CLI) following here.

    3.3: RSF-1 Licences

    The only difference between evaluation and production versions of RSF-1 is the licence string used to activate the product. Evaluation licenses normally have an expiry of 30 days from issue whereas permanent (purchased) licence keys are perpetual. Evaluation licenses can be upgraded to permanent ones without reconfiguration or service disruption.

    The licence string is generated from the node's hostid, which can be determined once the base package has been installed as follows:
    # /opt/HAC/bin/hac_hostid
    390ea5a5
    #
    NOTE - A unique licence is required for each RSF-1 node in the cluster, and therefore the hostid of each server is required.
    Evaluation licences may be obtained as follows:
    Using the CLI:
    Send an email to support@high-availability.com with the following body:
    Subject: RSF-1 Evaluation License Request

    nodea: <node1_hostid>
    nodeb: <node2_hostid>
    type: temp
    custref: <organisation name, your name>
    os: <Operating System and version>
    By return, you will receive an email with the license strings required to activate the product together with: install script, RSF-1 password files and End User Licence Agreement (EULA).

    By installing the licence keys, you are agreeing to the terms of the EULA.

    On both nodes, temporarily save the received attachments in /tmp and install with install_lic.sh
    # sh /tmp/install_lic.sh
    Installing licenses for cluster nodes with hostid 42029b314 and 49e5e6b0
    License successfully installed on this node.
    #
    To verify the licence strings, use rsfmon -v:
    # /opt/HAC/RSF-1/bin/rsfmon -v
    RSF-1 monitor release 3.9.9 (06 Oct 2014 15:43) for 19:solaris (built on 5.11)
    Copyright High-Availability.Com Ltd
    [29786 Oct 6 16:56:56] Using machine ID 0x42029b314
    This copy of RSF-1 is licensed for up to 128 services.
    The licence expires on 2014-12-22
    [29786 Oct 6 16:56:56] This host is licenced for automatic service startup
    #
    NOTE - If RPC services are not already enabled, rsfmon -v will warn as follows, and you should reconfigure and refresh the service as shown:
    RSF-1: Note: SMF property for network/rpc/bind/local_only is set to true.
    RSF-1: This means remote cluster RPC operations will fail!
    RSF-1: Resolve by issuing:
    RSF-1:
    RSF-1: svccfg -s svc:/network/rpc/bind setprop config/local_only = false
    RSF-1: svcadm refresh network/rpc/bind:default
    RSF-1:
    RSF-1: To allow remote RPC operations.
    # svccfg -s svc:/network/rpc/bind setprop config/local_only = false
    # svcadm refresh network/rpc/bind:default
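    The change can be confirmed (a simple verification step, not part of the RSF-1 tooling) with svcprop, which should now report false:
    # svcprop -p config/local_only network/rpc/bind
    false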

    3.4: Starting RSF-1

    Once the licenses have been verified, RSF-1 can now be started in bootstrap mode. Run the following on each node:
    # /opt/HAC/RSF-1/bin/rsfctl start
    RSF-1: (1) Registering ZFS sysevent watcher: /opt/HAC/RSF-1/bin/rsf-zfs-event

        ____  _____ ______   ___
       / __ \/ ___// ____/  <  /
      / /_/ /\__ \/ /_______/ /
     / _, _/___/ / __/_____/ /
    /_/ |_|/____/_/       /_/

    RSF-1 (c) Copyright High-Availability.Com Ltd, 1995-2014
    http://www.high-availability.com/
    (For support call 0845 736 1974)
    (or email support@high-availability.com)


    RSF-1: (1) Starting rsfmon
    RSF-1: (1) Starting RSF-1 cluster RPC services.
    RSF-1: (1) Starting cache file sync process...
    #
    RSF-1 is now ready for configuration.

    3.5: RSF-1 Processes

    At this stage, a number of RSF-1 processes should be running:
    # ps -ef | grep -i rsf
    root 1168    1  0 09:54:24 ?  0:00 /opt/HAC/RSF-1/bin/rsfpmon -v -w 1 -l /var/run/rsfpmon_rpchasvc /opt/HAC/RSF-1/
    root 1173 1168  0 09:54:24 ?  0:00 /opt/HAC/RSF-1/bin/rpchasvc
    root 1186 1185  0 09:54:24 ?  0:02 python /opt/HAC/RSF-1/bin/rpc_server.pyc
    root 1185    1  0 09:54:24 ?  0:00 /opt/HAC/RSF-1/bin/rsfpmon -v -w 1 -l /var/run/rsfpmon_rpc_py python /opt/HAC/R
    root 1207 1202  0 09:54:24 ?  0:01 rsfmon -i
    root 1202    1  0 09:54:24 ?  0:01 rsfmon -i
    root 1208 1202  0 09:54:24 ?  0:02 rsfmon -i
    root 1209 1202  0 09:54:24 ?  0:03 rsfmon -i
    root 3290    1  0 09:54:54 ?  0:00 /bin/sh /opt/HAC/RSF-1/bin/rsf-zpool-cache-sync pool=poola
    root 3291    1  0 09:54:54 ?  0:00 /bin/sh /opt/HAC/RSF-1/bin/rsf-zpool-cache-sync pool=poolb
    #

    3.6: RSF-1 Configuration and Management

    RSF-1 can be configured and administered via both the Graphical User Interface (rsfgui) and the Command Line Interface (rsfcli).

    Optional additional Application Programmable Interfaces (APIs) are also available for integration with other toolsets.

    3.7: RSF-1 User Authentication and Security

    RSF-1 uses special user names and passwords to administer the cluster.

    These are separate from those used by the underlying operating system, must be maintained separately, and are installed as part of the licence generation procedure described previously.

    Further information about managing RSF-1 usernames and passwords is in the RSF-1 Administration Guide
    NOTE - It is possible to bypass the password prompt each time you use the rsfcli command by specifying rsfcli -i0

    4: RSF-1 CONFIGURATION

    4.1: Example Topology

    In this example, we have two servers, romulus and remus, and three ZFS pools. The ZFS pools (POOLA, POOLB and POOLC) have all been created on top of the underlying shared storage, and both servers have the capability to import and export them.

    We are going to use two network heartbeats: one private (shown below as a red dotted line) and one public (shown in orange). We'll also use three independent disk heartbeats, one per ZFS pool, (shown as black dotted lines).

    Example Topology

    The system hostnames and identities on the public network are romulus and remus respectively, and romulus_priv and remus_priv respectively on the private network.

    Let's assume we want to set up an Active/Active cluster configuration with two RSF-1 Services, the first (which we'll call POOLA) consisting of ZFS pools A and C, and the second (called POOLB) just ZFS pool B.

    For service POOLA, we'll associate the VIP sales_staff-public and for service POOLB, we'll use support_staff-public. User access to ZFS pools A and C is therefore via the sales_staff-public VIP address, and access to ZFS pool B via the support_staff-public VIP address.
    NOTE - When the POOLA service is failed over between servers, both ZFS pools A and C and the VIP sales_staff-public will migrate as part of that service. When POOLB service is failed over, only pool B is moved together with the support_staff-public VIP.
    Before we start to configure RSF-1, we need to ensure the ZFS pools can be successfully imported and exported between the two servers.
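    A quick way to confirm this (while no clients are using the pools) is to export a pool on one node, import it on the other, and then move it back; if a pool was not cleanly exported you may need zpool import -f:
    romulus# zpool export POOLA
    remus# zpool import POOLA
    remus# zpool status POOLA
    remus# zpool export POOLA
    romulus# zpool import POOLA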

    4.2: RSF-1 Configuration Using RSFADM

    The command 'rsfadm' (/opt/HAC/RSF-1/bin/rsfadm) can be used to easily configure the cluster. It makes changes to the cluster using an HTTP API, storing the configuration parameters in a database before writing them to a config file. Because of this, the config file should not be edited manually, since that would cause it to hold different information to the database.
    NOTE - Ensure that all ZFS pools are imported on one node. In our example, POOLA and POOLC are imported on romulus and POOLB is imported on remus. Also ensure the VIPs are not already in use elsewhere before proceeding with RSF-1 configuration
    Before the cluster can be initialised, rsfmon should be running in bootstrap mode. This can be checked using the 'show' subcommand - it should show a cluster of one node, with the cluster name 'Ready_For_Cluster_Configuration':
    root@romulus:~# rsfadm show
    Global information:
      Cluster name : Ready_For_Cluster_Configuration
      Poll time    : 2
      Config CRC   : 7f14
    Nodes:
      0 : romulus (romulus) available
          RSF-1 release 3.11.0p9, built on 16-Sep-2015-16:13
    Services:
      (none)
    Heartbeats:
      (none)
    root@romulus:~#
    To make sure RSF-1 can communicate with RSF-1 on the remote node, the discover-nodes subcommand can be used:
    root@romulus:~# rsfadm discover-nodes
    Node 0: remus
    root@romulus:~#
    Now, to create the cluster, the hostnames of the two nodes will need to be provided to the init subcommand:
    root@romulus:~# rsfadm init romulus remus
    Oct 22 15:23:07 romulus RSF-1[907]: [ID 702911 local0.alert] RSF-1 hot restart: services may be running.
    root@romulus:~#
    Now that the cluster is initialised, the 'show' subcommand should show a 2 node cluster with a single network heartbeat between the nodes:
    root@romulus:~# rsfadm show
    Global information:
      Cluster name : HA-Cluster
      Poll time    : 1
      Config CRC   : 9e82
    Nodes:
      0 : remus (remus) available
          RSF-1 release 3.11.0p9, built on 16-Sep-2015-16:13
      1 : romulus (romulus) available
          RSF-1 release 3.11.0p9, built on 16-Sep-2015-16:13
    Services:
      (none)
    Heartbeats:
      0: NET remus --> romulus VIA romulus: Up - last heartbeat #6 (updated Thu Oct 22 15:26:44)
      1: NET romulus --> remus VIA remus: Up - last heartbeat #5 (updated Thu Oct 22 15:26:44)
    2 heartbeats configured, 2 up, 0 down
    root@romulus:~#
    A second network heartbeat can be added (optionally) to increase resiliency. In this example, romulus_priv and remus_priv will be used for the network addresses:

    root@romulus:~# \
    rsfadm create-hb -t net -d romulus:romulus_priv,remus:remus_priv
    Machine: romulus, Interface: romulus_priv
    Machine: remus, Interface: remus_priv
    Oct 22 15:29:12 romulus RSF-1[29192]: [ID 702911 local0.alert] RSF-1 hot restart: services may be running.
    root@romulus:~#
    root@romulus:~# rsfadm show heartbeats
    0: NET remus --> romulus VIA romulus_priv: Up - last heartbeat #12 (updated Thu Oct 22 15:29:25)
    1: NET remus --> romulus VIA romulus: Up - last heartbeat #12 (updated Thu Oct 22 15:29:24)
    2: NET romulus --> remus VIA remus_priv: Up - last heartbeat #12 (updated Thu Oct 22 15:29:26)
    3: NET romulus --> remus VIA remus: Up - last heartbeat #12 (updated Thu Oct 22 15:29:26)
    4 heartbeats configured, 4 up, 0 down
    root@romulus:~#
    The cluster is now set up with two network heartbeat channels, so we can now add a service to control the zpool POOLA (currently imported on romulus). As in the example in the previous section, the service will use the floating IP address 'sales_staff-public' (defined in /etc/hosts on both nodes). Also, the primary server for this service will be set to romulus, and the vip will be plumbed into net1:
    root@romulus:~# \
    rsfadm create-svc POOLA -v sales_staff-public -i net1 -p romulus
    Oct 22 16:53:58 romulus RSF-1[29880]: [ID 702911 local0.alert] RSF-1 hot restart: services may be running.
    root@romulus:~#
    root@romulus:~#
    root@romulus:~#
    root@romulus:~# rsfadm show
    Global information:
      Cluster name : HA-Cluster
      Poll time    : 1
      Config CRC   : 9fbb
    Nodes:
      0 : remus (remus) available
          RSF-1 release 3.11.0p9, built on 16-Sep-2015-16:13
      1 : romulus (romulus) available
          RSF-1 release 3.11.0p9, built on 16-Sep-2015-16:13
    Services:
      0 : POOLA, IP address sales_staff-public, "RSF-1 cluster service"
          stopped automatic unblocked on remus
          running automatic unblocked on romulus
    Heartbeats:
      0: NET remus --> romulus VIA romulus_priv: Up - last heartbeat #203 (updated Thu Oct 22 16:57:22)
      1: NET remus --> romulus VIA romulus: Up - last heartbeat #203 (updated Thu Oct 22 16:57:23)
      2: DISC remus --> romulus VIA id1,sd@n60018400000055659e5a0001/a,raw:512,id1,sd@n60018400000055659e5a0001/a,raw:518: Up - last heartbeat #203 (updated Thu Oct 22 16:57:23)
      3: DISC remus --> romulus VIA id1,sd@n60018400000055659e5b0002/a,raw:512,id1,sd@n60018400000055659e5b0002/a,raw:518: Up - last heartbeat #203 (updated Thu Oct 22 16:57:23)
      4: NET romulus --> remus VIA remus_priv: Up - last heartbeat #205 (updated Thu Oct 22 16:57:22)
      5: NET romulus --> remus VIA remus: Up - last heartbeat #204 (updated Thu Oct 22 16:57:22)
      6: DISC romulus --> remus VIA id1,sd@n60018400000055659e5a0001/a,raw:518,id1,sd@n60018400000055659e5a0001/a,raw:512: Up - last heartbeat #204 (updated Thu Oct 22 16:57:22)
      7: DISC romulus --> remus VIA id1,sd@n60018400000055659e5b0002/a,raw:518,id1,sd@n60018400000055659e5b0002/a,raw:512: Up - last heartbeat #204 (updated Thu Oct 22 16:57:22)
    8 heartbeats configured, 8 up, 0 down
    root@romulus:~#
    Next is to add the second service - POOLB, which will run on remus by default and use floating IP hostname 'support_staff-public', which will also be plumbed into net1:
    root@romulus:~# zpool list
    NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
    POOLA  2.95G   237K  2.95G   0%  1.00x  ONLINE  -
    POOLB  2.95G   184K  2.95G   0%  1.00x  ONLINE  -
    rpool  9.69G  6.02G  3.67G  62%  1.00x  ONLINE  -
    root@romulus:~# \
    rsfadm -v create-svc POOLB -v support_staff-public -i net1 -p remus
    Oct 22 17:13:09 romulus RSF-1[23839]: [ID 702911 local0.alert] RSF-1 hot restart: services may be running.
    root@romulus:~# rsfadm show
    Global information:
      Cluster name : HA-Cluster
      Poll time    : 1
      Config CRC   : f200
    Nodes:
      0 : remus (remus) available
          RSF-1 release 3.11.0p9, built on 16-Sep-2015-16:13
      1 : romulus (romulus) available
          RSF-1 release 3.11.0p9, built on 16-Sep-2015-16:13
    Services:
      0 : POOLA, IP address sales_staff-public, "RSF-1 cluster service"
          stopped automatic unblocked on remus
          running automatic unblocked on romulus
      1 : POOLB, IP address support_staff-public, "RSF-1 cluster service"
          starting automatic unblocked on remus
          stopped automatic unblocked on romulus
    Heartbeats:
      0: NET remus --> romulus VIA romulus_priv: Up - last heartbeat #18 (updated Thu Oct 22 17:13:27)
      1: NET remus --> romulus VIA romulus: Up - last heartbeat #18 (updated Thu Oct 22 17:13:28)
      2: DISC remus --> romulus VIA id1,sd@n60018400000055659e5a0001/a,raw:512,id1,sd@n60018400000055659e5a0001/a,raw:518: Up - last heartbeat #18 (updated Thu Oct 22 17:13:28)
      3: DISC remus --> romulus VIA id1,sd@n60018400000055659e5b0002/a,raw:512,id1,sd@n60018400000055659e5b0002/a,raw:518: Up - last heartbeat #18 (updated Thu Oct 22 17:13:28)
      4: DISC remus --> romulus VIA id1,sd@n60018400000055659e5b0004/a,raw:512,id1,sd@n60018400000055659e5b0004/a,raw:518: Up - last heartbeat #18 (updated Thu Oct 22 17:13:28)
      5: DISC remus --> romulus VIA id1,sd@n60018400000055659e5b0005/a,raw:512,id1,sd@n60018400000055659e5b0005/a,raw:518: Up - last heartbeat #18 (updated Thu Oct 22 17:13:28)
      6: NET romulus --> remus VIA remus_priv: Up - last heartbeat #20 (updated Thu Oct 22 17:13:30)
      7: NET romulus --> remus VIA remus: Up - last heartbeat #19 (updated Thu Oct 22 17:13:30)
      8: DISC romulus --> remus VIA id1,sd@n60018400000055659e5a0001/a,raw:518,id1,sd@n60018400000055659e5a0001/a,raw:512: Up - last heartbeat #19 (updated Thu Oct 22 17:13:30)
      9: DISC romulus --> remus VIA id1,sd@n60018400000055659e5b0002/a,raw:518,id1,sd@n60018400000055659e5b0002/a,raw:512: Up - last heartbeat #19 (updated Thu Oct 22 17:13:30)
      10: DISC romulus --> remus VIA id1,sd@n60018400000055659e5b0004/a,raw:518,id1,sd@n60018400000055659e5b0004/a,raw:512: Up - last heartbeat #19 (updated Thu Oct 22 17:13:30)
      11: DISC romulus --> remus VIA id1,sd@n60018400000055659e5b0005/a,raw:518,id1,sd@n60018400000055659e5b0005/a,raw:512: Up - last heartbeat #19 (updated Thu Oct 22 17:13:30)
    12 heartbeats configured, 12 up, 0 down
    root@romulus:~#
    From the show command, you can see that when each service is added, rsfadm chooses two disks from the corresponding zpool to use for heartbeats, so now that we have two services, there are 4 disk heartbeats. Further rsfadm commands can be used to add or remove disk heartbeats if necessary.
    Finally, the third pool - POOLC - needs to be added to the first service - POOLA - so that POOLA and POOLC fail over together:
    root@romulus:~/rsf-1/rsfadm# rsfadm modify-svc POOLA create-pool POOLC
    Oct 22 17:44:23 romulus RSF-1[2090]: [ID 702911 local0.alert] RSF-1 hot restart: services may be running.
    root@romulus:~/rsf-1/rsfadm# rsfadm show
    Global information:
      Cluster name : HA-Cluster
      Poll time    : 1
      Config CRC   : 1c47
    Nodes:
      0 : remus (remus) available
          RSF-1 release 3.11.0p9, built on 16-Sep-2015-16:13
      1 : romulus (romulus) available
          RSF-1 release 3.11.0p9, built on 16-Sep-2015-16:13
    Services:
      0 : POOLA, IP address sales_staff-public, "RSF-1 cluster service"
          stopped automatic unblocked on remus
          running automatic unblocked on romulus
      1 : POOLB, IP address support_staff-public, "RSF-1 cluster service"
          running automatic unblocked on remus
          stopped automatic unblocked on romulus
    Heartbeats:
      0: NET remus --> romulus VIA romulus_priv: Up - last heartbeat #53 (updated Thu Oct 22 17:45:16)
      1: NET remus --> romulus VIA romulus: Up - last heartbeat #53 (updated Thu Oct 22 17:45:17)
      2: DISC remus --> romulus VIA id1,sd@n60018400000055659e5b0007/a,raw:518,id1,sd@n60018400000055659e5b0007/a,raw:512: Up - last heartbeat #53 (updated Thu Oct 22 17:45:17)
      3: DISC remus --> romulus VIA id1,sd@n60018400000055659e5a0001/a,raw:512,id1,sd@n60018400000055659e5a0001/a,raw:518: Up - last heartbeat #53 (updated Thu Oct 22 17:45:17)
      4: DISC remus --> romulus VIA id1,sd@n60018400000055659e5b0002/a,raw:512,id1,sd@n60018400000055659e5b0002/a,raw:518: Up - last heartbeat #53 (updated Thu Oct 22 17:45:17)
      5: DISC remus --> romulus VIA id1,sd@n60018400000055659e5b0004/a,raw:512,id1,sd@n60018400000055659e5b0004/a,raw:518: Up - last heartbeat #53 (updated Thu Oct 22 17:45:17)
      6: DISC remus --> romulus VIA id1,sd@n60018400000055659e5b0005/a,raw:512,id1,sd@n60018400000055659e5b0005/a,raw:518: Up - last heartbeat #53 (updated Thu Oct 22 17:45:17)
      7: DISC remus --> romulus VIA id1,sd@n60018400000055659e5b0008/a,raw:518,id1,sd@n60018400000055659e5b0008/a,raw:512: Up - last heartbeat #53 (updated Thu Oct 22 17:45:17)
      8: NET romulus --> remus VIA remus_priv: Up - last heartbeat #53 (updated Thu Oct 22 17:45:16)
      9: NET romulus --> remus VIA remus: Up - last heartbeat #53 (updated Thu Oct 22 17:45:16)
      10: DISC romulus --> remus VIA id1,sd@n60018400000055659e5b0007/a,raw:512,id1,sd@n60018400000055659e5b0007/a,raw:518: Up - last heartbeat #53 (updated Thu Oct 22 17:45:16)
      11: DISC romulus --> remus VIA id1,sd@n60018400000055659e5a0001/a,raw:518,id1,sd@n60018400000055659e5a0001/a,raw:512: Up - last heartbeat #53 (updated Thu Oct 22 17:45:16)
      12: DISC romulus --> remus VIA id1,sd@n60018400000055659e5b0002/a,raw:518,id1,sd@n60018400000055659e5b0002/a,raw:512: Up - last heartbeat #53 (updated Thu Oct 22 17:45:16)
      13: DISC romulus --> remus VIA id1,sd@n60018400000055659e5b0004/a,raw:518,id1,sd@n60018400000055659e5b0004/a,raw:512: Up - last heartbeat #53 (updated Thu Oct 22 17:45:16)
      14: DISC romulus --> remus VIA id1,sd@n60018400000055659e5b0005/a,raw:518,id1,sd@n60018400000055659e5b0005/a,raw:512: Up - last heartbeat #53 (updated Thu Oct 22 17:45:16)
      15: DISC romulus --> remus VIA id1,sd@n60018400000055659e5b0008/a,raw:512,id1,sd@n60018400000055659e5b0008/a,raw:518: Up - last heartbeat #53 (updated Thu Oct 22 17:45:16)
    16 heartbeats configured, 16 up, 0 down
    root@romulus:~/rsf-1/rsfadm#
    Again from the show command, two more disk heartbeats have been added.

    The full list of rsfadm commands can be found by running 'rsfadm -h':
    root@romulus:~# rsfadm -h
    CLI for administration of an RSF-1 cluster
    Usage:
      rsfadm [OPTIONS] [command options]
    Options:
      -v|--verbose   increase output debug level
      -V|--version   print version
    Subcommands:
      discover-nodes
      init
      destroy
      reset
      show
      create-hb
      delete-hb
      create-svc
      delete-svc
      modify-svc
      create-vip
      delete-vip
      create-pool
      delete-pool
      control-svc
      help
    root@romulus:~#
    The 'help' subcommand can be used to get a summary of all commands, or more information about one command:
    root@romulus:~# rsfadm help destroy
    destroy
        Destroy cluster
    Usage:
      rsfadm destroy [OPTIONS]
    Options:
      -f|--force   Force destroy the cluster
    root@romulus:~#

    4.3: RSF-1 Administration Commands

    When config_dist has been used to distribute the config file, all newly configured services will be started and set to auto mode on all nodes. You can use rsfcli to see the various service statuses:
    romulus# /opt/HAC/RSF-1/bin/rsfcli -v list
    romulus:
    POOLA running auto unblocked sales_staff-public bge0 20 8
    POOLB running auto unblocked support_staff-public bge0 20 8
    remus:
    POOLA running auto unblocked sales_staff-public bge0 20 8
    POOLB running auto unblocked support_staff-public bge0 20 8
    romulus#
    Heartbeat status can also be checked as follows:
    romulus# /opt/HAC/RSF-1/bin/rsfcli -v heartbeats
    remus : net=2 disc=4 serial=0
    romulus net remus
    romulus net remus_priv
    romulus disc /dev/rdsk/c3t20000011C6CBCAD2d0s0:518,
    /dev/rdsk/c5t20000011C6CBCAD2d0s0:512
    romulus disc /dev/rdsk/c3t40000012F5E33CA0Dd0s0:518,
    /dev/rdsk/c5t40000012F5E33CA0Dd0s0:512
    romulus disc /dev/rdsk/c3t40000012F5E38D6EEd0s0:518,
    /dev/rdsk/c5t40000012F5E38D6EEd0s0:512
    romulus disc /dev/rdsk/c3t60000016D55EAD32Fd0s0:518,
    /dev/rdsk/c5t60000016D55EAD32Fd0s0:512
    romulus : net=2 disc=4 serial=0
    remus net romulus
    remus net romulus_priv
    remus disc /dev/rdsk/c5t20000011C6CBCAD2d0s0:512,
    /dev/rdsk/c3t20000011C6CBCAD2d0s0:518
    remus disc /dev/rdsk/c5t40000012F5E33CA0Dd0s0:512,
    /dev/rdsk/c3t40000012F5E33CA0Dd0s0:518
    remus disc /dev/rdsk/c5t40000012F5E38D6EEd0s0:512,
    /dev/rdsk/c3t40000012F5E38D6EEd0s0:518
    remus disc /dev/rdsk/c5t60000016D55EAD32Fd0s0:512,
    /dev/rdsk/c3t60000016D55EAD32Fd0s0:518
    To see the full service situation including heartbeat status and time-stamps:
    romulus# /opt/HAC/RSF-1/bin/rsfcli stat
    Contacted 127.0.0.1 in cluster "HA-Cluster", CRC = 0x692c, ID = <none>

    Host romulus (192.168.33.91) UP, service startups enabled,
    RSF-1 release 3.9.8, built on 01-Oct-2014-10:41 "3.9.8".

    Host remus (192.168.33.92) UP, service startups enabled,
    RSF-1 release 3.9.8, built on 01-Oct-2014-10:41 "3.9.8".

    2 nodes configured, 2 online.
    0 Service POOLA, IP Address sales_staff-public, "Sales Staff Pools":
    stopped auto unblocked on romulus
    running auto unblocked on remus

    1 Service POOLB, IP Address support_staff-public, "Support Staff Pools":
    running auto unblocked on romulus
    stopped auto unblocked on remus


    2 services configured
    1 service instances stopped
    1 service instances running

    Heartbeats:

    00 net romulus -> remus [192.168.33.92]: Up, last heartbeat #2135 Thu 2014-10-02 16:58:18 BST
    01 net romulus -> remus_priv [10.1.1.92]: Up, last heartbeat #2135 Thu 2014-10-02 16:58:19 BST
    02 disc romulus -> remus (via /dev/rdsk/c3t20000011C6CBCAD2d0s0:518, /dev/rdsk/c5t20000011C6CBCAD2d0s0:512) [(20]: Up, last heartbeat #2135 Thu 2014-10-02 16:58:18 BST
    03 disc romulus -> remus (via /dev/rdsk/c3t40000012F5E33CA0Dd0s0:518, /dev/rdsk/c5t40000012F5E33CA0Dd0s0:512) [(20]: Up, last heartbeat #2135 Thu 2014-10-02 16:58:18 BST
    04 disc romulus -> remus (via /dev/rdsk/c3t40000012F5E38D6EEd0s0:518, /dev/rdsk/c5t40000012F5E38D6EEd0s0:512) [(20]: Up, last heartbeat #2135 Thu 2014-10-02 16:58:18 BST
    05 disc romulus -> remus (via /dev/rdsk/c3t60000016D55EAD32Fd0s0:518, /dev/rdsk/c5t60000016D55EAD32Fd0s0:512) [(20]: Up, last heartbeat #2135 Thu 2014-10-02 16:58:18 BST
    06 net remus -> romulus: Up, last heartbeat #2138 Thu 2014-10-02 16:58:18 BST
    07 net remus -> romulus_priv: Up, last heartbeat #2138 Thu 2014-10-02 16:58:19 BST
    08 disc remus -> romulus (via /dev/rdsk/c5t20000011C6CBCAD2d0s0:512, /dev/rdsk/c3t20000011C6CBCAD2d0s0:518) [(20]: Up, last heartbeat #2138 Thu 2014-10-02 16:58:18 BST
    09 disc remus -> romulus (via /dev/rdsk/c5t40000012F5E33CA0Dd0s0:512, /dev/rdsk/c3t40000012F5E33CA0Dd0s0:518) [(20]: Up, last heartbeat #2138 Thu 2014-10-02 16:58:18 BST
    10 disc remus -> romulus (via /dev/rdsk/c5t40000012F5E38D6EEd0s0:512, /dev/rdsk/c3t40000012F5E38D6EEd0s0:518) [(20]: Up, last heartbeat #2138 Thu 2014-10-02 16:58:18 BST
    11 disc remus -> romulus (via /dev/rdsk/c5t60000016D55EAD32Fd0s0:512, /dev/rdsk/c3t60000016D55EAD32Fd0s0:518) [(20]: Up, last heartbeat #2138 Thu 2014-10-02 16:58:18 BST
    12 heartbeats configured, 12 up, 0 down

    Errors:
    No errors detected
    Assuming there are no configuration issues, all newly added services should be running. If however there are any errors reported, you can view the RSF-1 text log file at /opt/HAC/RSF-1/log/rsfmon.log for further information, errors or warnings.
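    For example, a quick way to pull recent warnings and errors out of the current log (the exact message wording varies between releases, so treat the pattern as a starting point only):
    romulus# egrep -i 'warn|error|broken' /opt/HAC/RSF-1/log/rsfmon.log | tail -20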

    5: Starting RSF-1 Services

    If services are set to stopped and manual on the cluster nodes, they can be started by simply putting the service into Automatic mode on each node.
    NOTE - It is advisable to set each service to Automatic mode on its primary node before doing so on the other node; for example, put the POOLA service in Automatic mode on romulus, and the POOLB service in Automatic mode on remus
    romulus# /opt/HAC/RSF-1/bin/rsfcli -i0 auto POOLA
    remus# /opt/HAC/RSF-1/bin/rsfcli -i0 auto POOLB
    rsfcli can again be used to verify the services are now running:
    romulus# /opt/HAC/RSF-1/bin/rsfcli -v list
    romulus:
    POOLA running auto unblocked sales_staff-public bge0 20 8
    POOLB stopped manual unblocked support_staff-public bge0 20 8
    remus:
    POOLA stopped manual unblocked sales_staff-public bge0 20 8
    POOLB running auto unblocked support_staff-public bge0 20 8
    romulus#
    It is now safe to put all services in automatic mode across the cluster:
    romulus# /opt/HAC/RSF-1/bin/rsfcli -i0 auto POOLB
    remus# /opt/HAC/RSF-1/bin/rsfcli -i0 auto POOLA
    romulus# /opt/HAC/RSF-1/bin/rsfcli -v list
    romulus:
    POOLA running auto unblocked sales_staff-public bge0 20 8
    POOLB stopped auto unblocked support_staff-public bge0 20 8
    remus:
    POOLA stopped auto unblocked sales_staff-public bge0 20 8
    POOLB running auto unblocked support_staff-public bge0 20 8
    romulus#
    If either of the services are not shown as running, view the RSF-1 log files to look for reasons. If a service is shown as broken_safe, it can be repaired and reset to automatic as follows:
    romulus# /opt/HAC/RSF-1/bin/rsfcli -i0 repair POOLA
    romulus# /opt/HAC/RSF-1/bin/rsfcli -i0 auto POOLA
    If however a service is showing as broken_unsafe, further investigation is required before a retry should be attempted. Please refer to the RSF-1 Administrator's Guide for further information and detail.

    If the services are successfully running, we are now ready to run some failure and failover tests.

    6: RSF-1 Log Files


    All RSF-1 log files are located in /opt/HAC/RSF-1/log, and are readable text files written sequentially in ascending date order. A new log file is created when RSF-1 is started, and will remain current until RSF-1 is stopped.

    For troubleshooting purposes, examining the log files on both cluster nodes is required.

    Example log file:
    [28430 Oct 6 16:46:24] ------------------- RSF-1 starting -------------------
    [28430 Oct 6 16:46:24] RSF-1 monitor 3.9.9 p1 started on Mon Oct 6 16:46:24 2014
    [28430 Oct 6 16:46:24] Compiled on 06 Oct 2014 15:43 for 19:solaris-x86
    [28430 Oct 6 16:46:24] Copyright High-Availability.Com Ltd
    [28430 Oct 6 16:46:24] Using machine ID 0x42029b314
    [28430 Oct 6 16:46:24] INFO: This copy of RSF-1 expires on 2014-12-22
    [28430 Oct 6 16:46:24] INFO: This copy of RSF-1 is licenced for automatic service startup
    [28430 Oct 6 16:46:24] Configuration file parsed OK, CRC is 0xe10d
    [28430 Oct 6 16:46:24] Using syslog facility code 128: LOG_LOCAL0
    [28430 Oct 6 16:46:24] Running as machine nextest1 on host nextest1 (5436EHA9D)
    [28541 Oct 6 16:46:24] Running at realtime RR scheduling priority 1
    [28541 Oct 6 16:46:24] NOTICE: Service tank1 is now stopped on nextest1 (was unknown)
    [28541 Oct 6 16:46:24] Socket state change. UDP port 1195 was down, is up
    [28541 Oct 6 16:46:24] Socket state change. TCP port 1195 was down, is up
    [28541 Oct 6 16:46:24] Process mlocked in memory
    [28541 Oct 6 16:46:24] Starting main heartbeat loop
    [28541 Oct 6 16:46:24] network interface monitor process (re)started, pid = 28542
    [28541 Oct 6 16:46:24] event notification process (re)started, pid = 28543
    [28542 Oct 6 16:46:24] Process mlocked in memory
    [28541 Oct 6 16:46:24] disc heartbeat process (re)started, pid = 28544
    [28543 Oct 6 16:46:24] Process mlocked in memory
    [28544 Oct 6 16:46:24] Process mlocked in memory
    [28545 Oct 6 16:46:24] Process mlocked in memory
    [28541 Oct 6 16:46:25] INFO: event-notify: LOG_INFO RSF_DAEMON machine=nextest1 state=start
    [28541 Oct 6 16:46:25] INFO: event-notify: LOG_INFO RSF_SERVICE service=tank1 state=stopped mode=manual block=unblocked
    [28541 Oct 6 16:46:25] INFO: event-notify: LOG_INFO RSF_HEARTBEAT heartbeat=3 type=net from=nextest2 state=Unavailable
    [28541 Oct 6 16:46:25] NOTICE: net heartbeat (3, seq 37) from nextest2 OK
    [28541 Oct 6 16:46:25] CRIT: Established contact with nextest2
    [28541 Oct 6 16:46:25] nextest2.tank1 unknown/manual/unblocked -> stopped/manual/unblocked
    [28541 Oct 6 16:46:25] NOTICE: Service tank1 not running, is manual/unblocked, not starting it.
    [28541 Oct 6 16:46:26] INFO: event-notify: LOG_NOTICE RSF_HEARTBEAT heartbeat=3 type=net from=nextest2 state=Up latest=1412610385
    [28541 Oct 6 16:46:26] INFO: event-notify: LOG_INFO RSF_MACHINE remote=nextest2 state=Up
    [28541 Oct 6 16:46:26] bge0 network interface state change: unknown -> running
    [28541 Oct 6 16:46:26] NOTICE: Service tank1 block state change: unknown -> unblocked
    [28541 Oct 6 16:46:27] INFO: event-notify: LOG_NOTICE RSF_NET_DEVICE netdevice=bge0 state=OK
    [28541 Oct 6 16:46:29] NOTICE: disc heartbeat (4, seq 40) from nextest2 OK
    [28541 Oct 6 16:46:29] NOTICE: disc heartbeat (5, seq 40) from nextest2 OK
    [28541 Oct 6 16:46:55] Full date and timestamp: Mon Oct 6 16:46:55 2014 BST
    [28541 Oct 6 16:46:55] INFO: RSF-1 version 3.9.9 OK on nextest1, (hostname nextest1, 5436EHA9D)
    [28541 Oct 6 17:00:01] Full date and timestamp: Mon Oct 6 17:00:01 2014 BST
    [28541 Oct 6 17:00:01] INFO: RSF-1 version 3.9.9 OK on nextest1, (hostname nextest1, 5436EHA9D)

    Whilst the above shows a typical startup sequence, log-file analysis is more helpful when diagnosing or debugging specific service startup and shutdown sequences. The following is the initial startup of a service tank1:
    ...
    [28334 Nov 18 15:16:53] NOTICE: Service tank1 not running, start in 6 seconds
    [28334 Nov 18 15:16:59] Time to start tank1
    [28334 Nov 18 15:16:59] Running start scripts for nextest2.tank1, pid = 16528
    [28334 Nov 18 15:16:59] NOTICE: Service tank1 is now starting on nextest2 (was stopped)
    [16530 Nov 18 15:16:59] Service tank1 start log capture file /opt/HAC/RSF-1/log/nextest2.tank1.start.log created.
    [28334 Nov 18 15:17:00] INFO: event-notify: LOG_INFO RSF_SERVICE service=tank1 state=starting mode=automatic block=unblocked
    [28334 Nov 18 15:17:00] Service tank1 is now starting on nextest2 (was stopped on nextest1)
    [16915 Nov 18 15:17:01] User _rsfadmin setting service tank1 on nextest2 to manual
    [16528 Nov 18 15:17:02] [tank1 rsfexec] Running /opt/HAC/RSF-1/etc/rc.appliance.c/S01announce start 1

    [16931 Nov 18 15:17:02] [tank1 S01announce] Startup of service:tank1 started - attempt:1
    [28334 Nov 18 15:17:02] NOTICE: Service tank1 is now manual on nextest2 (was automatic)
    [16528 Nov 18 15:17:02] [tank1 rsfexec] Running /opt/HAC/RSF-1/etc/rc.appliance.c/S02ApplianceStarting start 1
    [16960 Nov 18 15:17:02] [tank1 S02ApplianceStarting] ========= ifconfig before interface/vip plumbing =========

    [16528 Nov 18 15:17:09] [tank1 rsfexec] ========= ifconfig/netstat complete =========
    [16528 Nov 18 15:17:09] [tank1 rsfexec] Total run time for service start: 7 seconds
    [17577 Nov 18 15:17:09] [tank1 S21res_drives] Service running on this node - refreshing /opt/HAC/RSF-1/etc/.res_drives.tank1...
    [28334 Nov 18 15:17:10] Script child 16528 exited with code 0
    [28334 Nov 18 15:17:10] Service tank1 start scripts succeeded (pid 16528 exit 0)
    [28334 Nov 18 15:17:10] NOTICE: Service tank1 is now running on nextest2 (was starting)
    [Nov 18 15:17:12] [tank1 disk_list] Created /opt/HAC/RSF-1/etc/.res_drives.tank1.new with 9 drives
    [28334 Nov 18 15:17:14] INFO: event-notify: ZFS_INFO IMPORT_COMPLETE machine=nextest2 service=tank1
    [28334 Nov 18 15:17:14] INFO: event-notify: LOG_INFO RSF_EXEC_SCRIPT machine=nextest2 script=/opt/HAC/RSF-1/etc/rc.appliance.c/S21res_drives state=start
    [28334 Nov 18 15:17:14] INFO: event-notify: LOG_INFO RSF_EXEC_SCRIPT machine=nextest2 script=/opt/HAC/RSF-1/etc/rc.appliance.c/S98ApplianceStarted state
    [28334 Nov 18 15:17:14] INFO: event-notify: LOG_INFO RSF_EXEC_SCRIPT machine=nextest2 script=/opt/HAC/RSF-1/etc/rc.appliance.c/S99announce state=start a
    [28334 Nov 18 15:17:14] INFO: event-notify: LOG_INFO RSF_NETIF_UP machine=nextest2 ipface=bge0 vip=vip01
    [28334 Nov 18 15:17:14] INFO: event-notify: LOG_INFO RSF_SERVICE_STARTUP_COMPLETE machine=nextest2 service=tank1
    [28334 Nov 18 15:17:14] INFO: event-notify: LOG_INFO RSF_SERVICE service=tank1 state=running mode=manual block=unblocked


    The following log-file segment shows an example startup error where the service goes broken_safe during startup due to a disk reservation issue:
    ...
    [Nov 11 19:06:15] [cvol disk_list] Unknown disk
    [2376 Nov 11 19:06:15] [cvol S14res_drives] All reservation drives are either missing or return I/O error!
    [2376 Nov 11 19:06:15] [cvol S14res_drives] Aborting because of PROP_ABORT_UNFENCED_IMPORT: true
    [1427 Nov 11 19:06:15] [cvol rsfexec] *** /opt/HAC/RSF-1/etc/rc.appliance.c/S14res_drives failed BROKEN_SAFE! ***
    [1288 Nov 11 19:06:16] Script child 1427 exited with code 4
    [1288 Nov 11 19:06:16] Service cvol start scripts set broken safe state (pid 1427 exit 4)
    [1288 Nov 11 19:06:16] NOTICE: Service cvol is now broken_safe on nextest2 (was starting)

    7: Pool Services


    ZFS datasets and zvols can be used to create shares using NFS, CIFS/SMB, iSCSI or Fibre Channel.

    The following shows how to create a simple share with both NFS and SMB:
    # zfs create POOLA/vol1
    # zfs set sharesmb=on POOLA/vol1
    # zfs set sharenfs=on POOLA/vol1
    As the ZFS properties are being used to create the shares, nothing else needs to be done to enable failover of the shares.
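    The share properties can be checked with zfs get; because they are stored in the pool, they follow the service on failover:
    # zfs get sharenfs,sharesmb POOLA/vol1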

    The following shows how to create a COMSTAR LU and expose it to clients using iSCSI. Note that RSF-1 handles failover of the LU and views, but any host groups, target groups, target portal groups and targets must be created identically on both nodes (the groups are optional depending on your needs, but if used they must match). In the following, assume the POOLA service is running on romulus and stopped on remus:
    NOTE – The following commands need to be executed on both nodes (as shown).
    Create TPG for VIP (sales_staff-public = 192.168.33.105):
    root@romulus:~# itadm create-tpg poola-tpg 192.168.33.105
    root@remus:~# itadm create-tpg poola-tpg 192.168.33.105
    Create target using TPG:
    root@romulus:~# itadm create-target -t poola-tpg
    Target iqn.1986-03.com.sun:02:af94b9da-6905-e22a-f5ed-981dbd97c5c6 successfully created
    root@remus:~# itadm create-target -t poola-tpg -n iqn.1986-03.com.sun:02:af94b9da-6905-e22a-f5ed-981dbd97c5c6
    Target iqn.1986-03.com.sun:02:af94b9da-6905-e22a-f5ed-981dbd97c5c6 successfully created
    NOTE – The following commands manipulate the ZFS pools and so only need to be executed on the node that has the pool imported.
    Create ZFS volume to be shared:
    root@romulus:~# zfs create -V 1G POOLA/vol01
    Create LU:
    root@romulus:~# stmfadm create-lu /dev/zvol/rdsk/POOLA/vol01
    Logical unit created: 600144F08C1BC4000000546F44110001
    Create view:
    root@romulus:~# stmfadm add-view 600144F08C1BC4000000546F44110001
    root@romulus:~#
    root@romulus:~# stmfadm list-view -l 600144F08C1BC4000000546F44110001
    View Entry: 0
    Host group : All
    Target group : All
    LUN : 0
    Back up view information for RSF-1 failover:
    root@romulus:~# stmfha backup POOLA
    root@romulus:~# ls -l /POOLA/.mapping/
    total 3
    -rw------- 1 root root 44 2014-11-21 13:59 @@RAN475204875@@POOLA@-vol01
    -rw-r--r-- 1 root root 11 2014-11-21 13:59 tidyUpFile495739573966393
    -rw-r--r-- 1 root root 0 2014-11-21 13:59 timeOfCreation68366902028957
    root@romulus:~# cat /POOLA/.mapping/@@RAN475204875@@POOLA@-vol01
    BeginMap
    index=0
    TG=All
    HG=All
    LUN=0
    EndMap
    NOTE - Any clients should always connect to the VIP address for the HA service, rather than the fixed IP address of either of the servers
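    As a hedged illustration of that note, a Solaris iSCSI initiator on a client machine (the 'client#' host is hypothetical) would be pointed at the sales_staff-public VIP (192.168.33.105 in this example) so that sessions follow the service when it fails over:
    client# iscsiadm add discovery-address 192.168.33.105
    client# iscsiadm modify discovery --sendtargets enable
    client# devfsadm -i iscsi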

    8: Testing RSF-1

    The following tests show how RSF-1 can be used to manually move services around the cluster and perform automatic failovers in the event of failures. The user may find it useful to open two terminal sessions per node whilst undertaking these tests; one for monitoring the RSF-1 log files, and the other to issue the various commands.
    NOTE - When connecting to cluster nodes, use the fixed IP addresses, not the service VIPs, as sessions to the VIPs will hang or exit during failover tests
    The recommended way to monitor log files is as follows (on each cluster node):
    romulus# tail -f /opt/HAC/RSF-1/log/rsfmon.log
    [9555 Oct 3 09:15:02] Process mlocked in memory
    [9556 Oct 3 09:15:02] Process mlocked in memory
    [9552 Oct 3 09:15:03] INFO: event-notify: LOG_INFO RSF_DAEMON machine=romulus state=start
    [9552 Oct 3 09:15:03] INFO: event-notify: LOG_INFO RSF_SERVICE service=POOLA state=stopped mode=manual block=unblocked
    [9552 Oct 3 09:15:03] INFO: event-notify: LOG_INFO RSF_SERVICE service=POOLB state=stopped mode=manual block=unblocked
    [9552 Oct 3 09:15:03] NOTICE: net heartbeat (7, seq 3) from remus OK
    [9552 Oct 3 09:15:03] CRIT: Established contact with remus
    [9552 Oct 3 09:15:03] romulus.POOLA unknown/manual/unblocked -> stopped/manual/unblocked
    [9552 Oct 3 09:15:03] romulus.POOLB unknown/manual/unblocked -> stopped/manual/unblocked
    [9552 Oct 3 09:15:03] NOTICE: Service POOLA not running, is manual/unblocked, not starting it.
    [9552 Oct 3 09:15:03] NOTICE: Service POOLB not running, is manual/unblocked, not starting it.

    8.1: Manual Failover Testing

    To move a service from one node to another, the rsfcli command can be used in two ways:

    Using rsfcli move:
    romulus# /opt/HAC/RSF-1/bin/rsfcli -i0 move POOLA remus
    This causes romulus to stop the POOLA service and, once it has stopped, remus to start it.
    Or, using rsfcli stop:
    romulus# /opt/HAC/RSF-1/bin/rsfcli -i0 stop POOLA
    This alternative method simply tells romulus to stop the service. If remus has its switchover mode for POOLA set to automatic, as soon as it sees the service is not running anywhere (and cannot run anywhere else), it will start the service after its RUNTIMEOUT countdown timer has expired.

    Once a service has been moved elsewhere and is running successfully, the switchover mode of the previous node should be reset to automatic:
    romulus# /opt/HAC/RSF-1/bin/rsfcli -i0 auto POOLA
    The above procedures can be used to manually move RSF-1 services around the cluster. Other than for initial testing purposes, these procedures can be used for planned maintenance situations, load redistribution etc.

    Automatic Failover Testing

    Automatic failover testing is best achieved by faking a system crash of one of the nodes that is running one or more services using one of the following methods:
    NOTE - The reboot command is not a recommended method for faking a system crash, as the shutdown is not guaranteed to cleanly take down any VIPs
    As soon as communication via all heartbeat channels ceases, the surviving node, assuming switchover modes are set to automatic, will initiate countdown and failover of those services lost by the crash.

    8.2: Faking Heartbeat Failures

    In order to trigger an automatic failover situation, all heartbeats must fail. Individual heartbeat mechanisms can be tested by temporarily disconnecting cables (e.g. serial and back-to-back network), or by temporarily unplumbing network interfaces.
    NOTE – Disconnecting SAS cables is not recommended for faking disk heartbeat issues as hardware damage can result.
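    For example, assuming the private network heartbeat runs over a dedicated interface (the interface name e1000g1 and the netmask below are assumptions for illustration; substitute whichever interface carries romulus_priv on your system), a single network heartbeat can be failed and later restored as follows. Remember that every heartbeat must be down before a failover is triggered:
    romulus# ifconfig e1000g1 unplumb
    Watch rsfmon.log on remus report that heartbeat going down, then restore it:
    romulus# ifconfig e1000g1 plumb
    romulus# ifconfig e1000g1 romulus_priv netmask 255.255.255.0 up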

    8.3: Testing for Split-Brain Scenarios

    A split-brain situation occurs when both nodes believe the other has gone away, and both try to start and manage the storage service. RSF-1 uses disk ring-fencing mechanisms to prevent this and force one or both servers to panic in order to protect the pool and prevent data corruption.

    Assuming romulus is currently running the POOLA service, and remus has its switchover mode for the POOLA service set to automatic, the easiest way to test this is to hang romulus using the mdb command. This causes the Operating System to lock up until resumed by the operator.
    romulus# mdb -K
    remus will detect this failure and will initiate failover of the POOLA service. Once this has completed, the hung romulus can be reawakened:
    ::quit -u
    romulus#
    As romulus resumes, it still believes it is running the POOLA service; its attempts to access the underlying disks (now reserved by remus) cause the ring-fencing mechanism to kick in, immediately making romulus panic and safely reboot.
    root@romulus:~#
    panic[cpu1]/thread=fffffffc80b5ac20: Reservation Conflict
    Disk: /ethdrv/sd@1,0

    fffffffc80b5aa80 fffffffff797526d ()
    fffffffc80b5aae0 sd:sd_mhd_watch_cb+b6 ()
    fffffffc80b5ab30 scsi:scsi_watch_request_intr+144 ()
    fffffffc80b5ab60 ethdrv:ethdrv_retire+c5 ()
    fffffffc80b5ac00 genunix:taskq_thread+22e ()
    fffffffc80b5ac10 unix:thread_start+8 ()

    syncing file systems... done
    dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
    When romulus restarts, it will rejoin the cluster, see that all services are running on remus, and remain in stopped/automatic mode for all services.