8.3 - Testing for Split-Brain Scenarios
A split-brain situation occurs when both nodes believe the other has gone away, and both try to start and manage the storage service. RSF-1 uses disk ring-fencing mechanisms to prevent this and force one or both servers to panic in order to protect the pool and prevent data corruption.
Assuming romulus is currently running the POOLA service, and remus has its switchover mode for the POOLA service set to automatic, the easiest way to test this is to hang romulus using the mdb command. This drops romulus into the kernel debugger, which locks up the operating system until the operator resumes it.
romulus# mdb -K
remus will detect this failure and initiate failover of the POOLA service. Once the failover has completed, the hung romulus can be reawakened from the kernel debugger prompt (in kmdb, the :c command continues execution).
When romulus resumes, it assumes it is still running the POOLA service. Its attempts to access the underlying disks (now reserved by remus) trigger the ring-fencing mechanism, which immediately causes romulus to panic and safely reboot:
panic[cpu1]/thread=fffffffc80b5ac20: Reservation Conflict
fffffffc80b5aa80 fffffffff797526d ()
fffffffc80b5aae0 sd:sd_mhd_watch_cb+b6 ()
fffffffc80b5ab30 scsi:scsi_watch_request_intr+144 ()
fffffffc80b5ab60 ethdrv:ethdrv_retire+c5 ()
fffffffc80b5ac00 genunix:taskq_thread+22e ()
fffffffc80b5ac10 unix:thread_start+8 ()
syncing file systems... done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
When romulus restarts, it will rejoin the cluster, see that all services are running on remus, and remain stopped, in automatic mode, for all services.
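
The sequence of events in this test can be illustrated with a small toy model. This is only a sketch of the general idea of reservation-based fencing, not RSF-1's implementation and not a real SCSI interface: the SharedDisk and Node classes, the ReservationConflict exception, and the run_test function are all hypothetical names introduced for illustration. The model assumes that a takeover simply replaces the previous reservation, and that a node which hits a conflict panics rather than continuing to write.

```python
"""Toy model of reservation-based fencing. Illustrative only, not RSF-1 code."""


class ReservationConflict(Exception):
    """Raised when a node touches a disk reserved by another node."""


class SharedDisk:
    def __init__(self):
        self.holder = None  # name of the node currently holding the reservation

    def reserve(self, node_name):
        # Assumption: a takeover replaces any earlier reservation.
        self.holder = node_name

    def access(self, node_name):
        if self.holder != node_name:
            raise ReservationConflict(
                f"{node_name}: disk is reserved by {self.holder}"
            )


class Node:
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.believes_running = False  # the node's own view of the service
        self.panicked = False

    def start_service(self):
        self.disk.reserve(self.name)
        self.believes_running = True

    def do_io(self):
        # A node only touches the disks if it thinks it owns the service.
        if not self.believes_running:
            return
        try:
            self.disk.access(self.name)
        except ReservationConflict:
            # Fencing: stop immediately instead of writing to the pool.
            self.panicked = True
            self.believes_running = False

    def reboot(self):
        # After the reboot the node rejoins with the service stopped.
        self.panicked = False
        self.believes_running = False


def run_test():
    disk = SharedDisk()
    romulus = Node("romulus", disk)
    remus = Node("remus", disk)
    events = []

    romulus.start_service()
    events.append("romulus runs POOLA")

    # romulus hangs here; its state is frozen while remus takes over.
    remus.start_service()
    events.append("remus takes over POOLA")

    # romulus resumes, still believing it runs the service.
    romulus.do_io()
    if romulus.panicked:
        events.append("romulus panics: Reservation Conflict")

    romulus.reboot()
    events.append("romulus rejoins with POOLA stopped")
    return events, disk.holder, romulus.believes_running, remus.believes_running


if __name__ == "__main__":
    for line in run_test()[0]:
        print(line)
```

The point of the model is that the resumed node's stale belief is never trusted: ownership is decided by the reservation on the disks themselves, so the first I/O attempt from the stale node fails before any data can be written.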