2011-06-01 10:31:23
DRBD and Heartbeat
I spent a considerable amount of time over the last couple of days working with DRBD and Heartbeat.
Below are the links I used to get things running:
http://wiki.centos.org/HowTos/Ha-Drbd
http://www.howtoforge.com/vm_replication_failover_vmware_debian_etch_p3
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/s-intro-pacemaker.html
http://www.drbd.org/users-guide/s-heartbeat-r1.html
http://www.drbd.org/users-guide/s-heartbeat-config.html
http://www.drbd.org/users-guide/s-heartbeat-crm.html
Part of my problem was not understanding the difference between R1- and CRM-style clusters and their accompanying daemons: Heartbeat, Pacemaker, and the different protocol versions. Pacemaker is a more advanced cluster resource manager that can work with either Corosync or Heartbeat as its messaging layer. Heartbeat uses its own older communication protocol, whereas Pacemaker can run on top of OpenAIS/Corosync, which makes it compatible with Red Hat cluster services.
Regardless, here are my notes for configuration. For completeness: these notes are a mix of doing this first on VMware and then on a Xen cluster, so any inconsistencies are a result of doing this multiple times in different environments. The errors are mine, and I would recommend reading the documentation linked above.
The basic idea behind the setup is that DRBD replicates data between two servers: DRBD is a network block device that mirrors the data. The heartbeat daemon keeps track of the shared IP and of the daemons under HA, and runs the appropriate init scripts.
DRBD Initialization
Format the disk:
fdisk /dev/xvdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 10443.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p

Disk /dev/xvdb: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-10443, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-10443, default 10443):
Using default value 10443

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 83

Command (m for help): p

Disk /dev/xvdb: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/xvdb1               1       10443    83883366   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
Make sure that the names are consistent throughout all of these configuration files. This may mean ensuring they are correct in both DNS and /etc/hosts.
Locally configure name for this server:
uname -n
drbd01.chainringcircus.org

uname -n
drbd02.chainringcircus.org
DNS name for this server:
dig +short drbd01.chainringcircus.org
192.168.1.191
dig +short drbd02.chainringcircus.org
192.168.1.192
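Since name mismatches were part of my problem, a quick sanity check helps catch them early. This is just a sketch: `check_name` is a helper name I made up, and `dig` errors are suppressed for machines without bind-utils installed.

```shell
# Sketch: verify that the name DRBD will match against the "on <host>"
# sections of drbd.conf agrees with what the local system reports.
# check_name is a made-up helper, not part of DRBD or Heartbeat.

check_name() {
    # succeed only if the two names are identical
    [ "$1" = "$2" ]
}

node_name=$(uname -n)
resolved_ip=$(dig +short "$node_name" 2>/dev/null)

if check_name "$node_name" "$(hostname 2>/dev/null)"; then
    echo "uname -n and hostname agree: $node_name ${resolved_ip:+($resolved_ip)}"
else
    echo "WARNING: uname -n ($node_name) disagrees with hostname; fix DNS or /etc/hosts" >&2
fi
```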
The /etc/drbd.conf file was designed to allow a verbatim copy on both nodes of the cluster.
cat /etc/drbd.conf
#
# please have a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf
#
global {
  usage-count no;
}
common {
  protocol C;
  handlers {
    pri-on-incon-degr "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
    #pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    # pri-on-incon-degr: this handler is called if the node is primary, degraded
    # and the local copy of the data is inconsistent. It broadcasts an error,
    # sleeps for 60 seconds and then halts.
  }
  startup {
    wfc-timeout 10;       # Wait for connection timeout. The init script blocks the
                          # boot process until the DRBD resources are connected.
                          # We wait for 10 seconds.
    degr-wfc-timeout 30;  # Wait for connection timeout if this node was a degraded cluster.
  }
  disk {
    on-io-error detach;   # or panic, ...
  }
  net {
    cram-hmac-alg "sha1";
    shared-secret "CHANGEME";  # Don't forget to choose a secret for auth
    max-buffers 20000;         # Play with this setting to achieve highest possible performance
    unplug-watermark 12000;    # Play with this setting to achieve highest possible performance
    max-epoch-size 20000;      # Should be the same as max-buffers
  }
  syncer {
    rate 100M;
  }
}
resource sites {
  device /dev/drbd0;
  disk /dev/sdb;
  meta-disk internal;  # Internal means that the last part of the backing device
                       # is used to store the metadata.
  on drbd01.chainringcircus.org {  # "on" hostname as seen in uname -n and the DNS lookup.
    address 192.168.1.191:7788;
  }
  on drbd02.chainringcircus.org {
    address 192.168.1.192:7788;
  }
}
Copy the configuration file:
scp /etc/drbd.conf root@drbd02.chainringcircus.org:/etc/
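Since DRBD expects a verbatim copy on both nodes, it is worth confirming the files really are identical after the scp. A sketch of how I would check: `same_conf` is a helper name I invented, and the ssh line is commented out because it depends on the remote node being reachable.

```shell
# Sketch: confirm both nodes have an identical drbd.conf by comparing checksums.
# same_conf is a made-up helper name.
same_conf() {
    [ "$1" = "$2" ]
}

local_sum=$(md5sum /etc/drbd.conf 2>/dev/null | cut -d' ' -f1)
# remote_sum=$(ssh root@drbd02.chainringcircus.org md5sum /etc/drbd.conf | cut -d" " -f1)
# same_conf "$local_sum" "$remote_sum" || echo "drbd.conf differs between nodes" >&2
```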
Tried to start DRBD but got an error:
service drbd start
Starting DRBD resources: [
sites no suitable meta data found :(
Command '/sbin/drbdmeta 0 v08 /dev/sdb internal check-resize' terminated with exit code 255
drbdadm check-resize sites: exited with code 255
d(sites) 0: Failure: (119) No valid meta-data signature found.
==> Use 'drbdadm create-md res' to initialize meta-data area. <==
[sites] cmd /sbin/drbdsetup 0 disk /dev/sdb /dev/sdb internal --set-defaults --create-device --on-io-error=detach failed - continuing!
s(sites) n(sites) ]..........

/etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
m:res    cs            ro                 ds                 p  mounted  fstype
0:sites  WFConnection  Secondary/Unknown  Diskless/DUnknown  C

/etc/init.d/drbd stop
Stopping all DRBD resources: .
I did not initialize the meta data storage and this needs to be done before a DRBD resource can be brought online. The DRBD resource needs to be down or detached from its backing storage.
drbdadm create-md sites
md_offset 1073737728
al_offset 1073704960
bm_offset 1073672192

Found some data
==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.

service drbd start
Starting DRBD resources: [
sites
Found valid meta data in the expected location, 1073737728 bytes into /dev/sdb.
d(sites) s(sites) n(sites) ]..........
Check the status:
cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1048508
Make it primary:
drbdadm -- --overwrite-data-of-peer primary sites

cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:67584 nr:0 dw:0 dr:67584 al:0 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:980924
    [>...................] sync'ed:  6.7% (980924/1048508)K delay_probe: 10
    finish: 0:01:27 speed: 11,264 (11,264) K/sec

[root@localhost etc]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:1019904 nr:0 dw:0 dr:1019904 al:0 bm:62 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:28604
    [==================>.] sync'ed: 97.7% (28604/1048508)K delay_probe: 195
    finish: 0:00:02 speed: 11,132 (10,404) K/sec

[root@localhost etc]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1048508 nr:0 dw:0 dr:1048508 al:0 bm:64 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
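Rather than re-running cat /proc/drbd by hand until the state flips to Connected, the connection state can be pulled out with a little sed. This is a sketch: `drbd_cstate` is a helper name I made up, and the field layout matches the DRBD 8.3 output shown above.

```shell
# Sketch: extract the cs: field for device 0 from a /proc/drbd-style file.
# drbd_cstate is a made-up helper name; not part of drbd-utils.
drbd_cstate() {
    sed -n 's/^ *0: cs:\([A-Za-z]*\).*/\1/p' "$1"
}

# Wait (up to ~60s) for the initial sync to finish before making a filesystem.
if [ -r /proc/drbd ]; then
    for i in $(seq 1 60); do
        [ "$(drbd_cstate /proc/drbd)" = "Connected" ] && break
        sleep 1
    done
fi
```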
Make a file system:
mkfs.ext3 /dev/drbd0
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131072 inodes, 262127 blocks
13106 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 24 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
Testing the filesystem:
mount /dev/drbd0 /sites
mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda5 on /home type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
.host:/ on /mnt/hgfs type vmhgfs (rw,ttl=1)
none on /proc/fs/vmblock/mountPoint type vmblock (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/drbd0 on /sites type ext3 (rw)

touch /sites/test.txt
ls /sites
lost+found  test.txt

umount /sites
drbdadm secondary sites
On the second server:
drbdadm primary sites
mount /dev/drbd0 /sites/
mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda5 on /home type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
.host:/ on /mnt/hgfs type vmhgfs (rw,ttl=1)
none on /proc/fs/vmblock/mountPoint type vmblock (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/drbd0 on /sites type ext3 (rw)

ls /sites
lost+found  test.txt
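The switchover dance above (unmount and demote on one node, promote and mount on the other) is easy to get out of order, so it helps me to see it as one function per side. This is purely a sketch with function names I invented; Heartbeat automates exactly this sequence via haresources, so nothing here needs to run in the final setup.

```shell
# Sketch of the manual switchover tested above. The function names
# (release_sites, claim_sites) are my own, not DRBD commands.

release_sites() {
    # on the node giving up the resource
    umount /sites
    drbdadm secondary sites
}

claim_sites() {
    # on the node taking over
    drbdadm primary sites
    mount /dev/drbd0 /sites
}
```

The ordering matters: a node must demote to secondary before the peer can promote, and DRBD will refuse to demote while the filesystem is still mounted.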
Heartbeat R1-style
Heartbeat in an R1-style configuration uses three files that must be configured:
/etc/ha.d/ha.cf
/etc/ha.d/haresources
/etc/ha.d/authkeys
cat /etc/ha.d/authkeys
auth 1
# A numerical identifier between 1 and 15 inclusive;
# must be unique within the file.
1 sha1 CHANGEME
# Methods can be md5, sha1 or crc.
# The password is just a string.
chmod 600 /etc/ha.d/authkeys
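Rather than leaving CHANGEME in place, a random secret can be generated for authkeys (and for the shared-secret in drbd.conf). This snippet is a sketch: it only prints the lines to paste into the file rather than editing anything in place.

```shell
# Sketch: generate a 32-hex-character shared secret suitable for
# /etc/ha.d/authkeys (and the shared-secret line in drbd.conf).
secret=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "auth 1"
echo "1 sha1 $secret"
```

Remember the same secret has to land in the authkeys file on both nodes.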
Before we take care of the ha.cf file we need to set up the ha_logd configuration file.
cp /usr/share/doc/heartbeat-2.1.3/logd.cf /etc/
And make changes to the logd.cf file accordingly. Be sure to copy /etc/logd.cf to both servers. Also note that I had to completely stop and then restart the heartbeat daemon for my logging changes to take effect.
cat /etc/logd.cf
# File to write debug messages to
# Default: /var/log/ha-debug
debugfile /var/log/ha-debug.log
#
# File to write other messages to
# Default: /var/log/ha-log
logfile /var/log/ha.log
#
# Facility to use for syslog()/logger
# Default: daemon
#logfacility daemon
#
# Entity to be shown at beginning of a message for logging daemon
# Default: "logd"
entity logd
#
# Do we register to apphbd
# Default: no
#useapphbd no
#
# There are two processes running for the logging daemon:
# 1. the parent process, which reads messages from all client channels
#    and writes them to the child process
# 2. the child process, which reads messages from the parent process
#    through IPC and writes them to syslog/disk
#
# set the send queue length from the parent process to the child process
#sendqlen 256
#
# set the recv queue length in the child process
#recvqlen 256
cat /etc/ha.d/ha.cf
# The recommendation is to use logd.
use_logd yes
# Default option is 0, values are 0-255 with 1-3 being the most useful.
debug 0
# Timing according to the FAQ at www.linux-ha.org/wiki/FAQ
# warntime should be at least 2 * keepalive
# warntime should be 1/2 to 1/4 deadtime
#
# The interval between heartbeat packets.
keepalive 1
# How quickly Heartbeat should issue a "late heartbeat" warning. Warntime is
# important for tuning deadtime.
warntime 5
# How long to wait before declaring a cluster node dead. Too low will falsely
# declare a death and too high will hinder takeover during a failure.
# Can be specified as a floating point number followed by a units-specifier.
# If units are omitted it defaults to seconds.
# deadtime 1
# deadtime 100ms    100 milliseconds
# deadtime 1000us   1000 microseconds
deadtime 10
# 694 is the default but can be changed if multiple clusters are in use.
udpport 694
# Which interfaces send UDP broadcast traffic; more than one can be specified.
bcast eth0
# auto_failback can be "on", "off" or "legacy".
auto_failback off
# Set the nodes in the cluster.
node in1.eamc.org
node in2.eamc.org
# Make sure this IP address is pingable from the bcast network above.
ping 192.168.1.1
respawn hacluster /usr/lib/heartbeat/ipfail
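The timing rules from the linux-ha FAQ (warntime at least 2 × keepalive, and deadtime 2 to 4 × warntime) are easy to violate when tweaking values, so I would sanity-check them. A sketch, with `timings_ok` being a helper name I made up:

```shell
# Sketch: check the ha.cf timing relationships recommended by the linux-ha FAQ:
# warntime >= 2 * keepalive, and deadtime >= 2 * warntime.
# timings_ok is a made-up helper name; values are whole seconds.
timings_ok() {
    keepalive=$1; warntime=$2; deadtime=$3
    [ "$warntime" -ge $((2 * keepalive)) ] && [ "$deadtime" -ge $((2 * warntime)) ]
}

# The values used in the ha.cf above: keepalive 1, warntime 5, deadtime 10.
timings_ok 1 5 10 && echo "timings OK"
```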
cat /etc/ha.d/haresources
drbd01 192.168.1.190 drbddisk::sites Filesystem::/dev/drbd0::/sites::ext3 httpd
# Explanation:
# primary server name --> virtual IP address to be used --> DRBD resource as
# configured in /etc/drbd.conf --> where to mount the DRBD resource and the
# filesystem type --> service to start/stop in case of failover
Cluster Management
To take over cluster management from a primary server:
/usr/lib/heartbeat/hb_takeover
Relinquishing cluster management to a secondary server:
/usr/lib/heartbeat/hb_standby
/etc/init.d/heartbeat stop
The order of operations as set by the init scripts:
ls -al /etc/rc3.d/ | egrep "hear|drb"
lrwxrwxrwx 1 root root 14 Apr  1 11:40 S70drbd -> ../init.d/drbd
lrwxrwxrwx 1 root root 19 Jun  1 08:58 S75heartbeat -> ../init.d/heartbeat
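Those S-numbers are what guarantee DRBD comes up before Heartbeat tries to mount anything. A tiny helper makes the ordering scriptable; `start_prio` is a name I made up for this sketch.

```shell
# Sketch: pull the two-digit start priority out of an rc symlink name
# (e.g. S70drbd -> 70) so the DRBD-before-Heartbeat ordering can be
# verified from a script. start_prio is a made-up helper name.
start_prio() {
    expr "$1" : 'S\([0-9][0-9]\)'
}

# DRBD (S70) must carry a lower start priority than Heartbeat (S75),
# so DRBD's init script runs first at boot.
```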
Note for Xen users:
# cat /etc/modprobe.d/drbd.conf
options drbd disable_sendpage=1