DRBD and Heartbeat

2011-06-01 10:31:23
I spent a considerable amount of time over the last couple of days working with DRBD and Heartbeat.

Below are the links I used to get things running:
http://wiki.centos.org/HowTos/Ha-Drbd
http://www.howtoforge.com/vm_replication_failover_vmware_debian_etch_p3
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/s-intro-pacemaker.html
http://www.drbd.org/users-guide/s-heartbeat-r1.html
http://www.drbd.org/users-guide/s-heartbeat-config.html
http://www.drbd.org/users-guide/s-heartbeat-crm.html

Part of my problem was not understanding the difference between R1-style and CRM-style clusters and their accompanying daemons (Heartbeat and Pacemaker), along with the different protocol versions. Pacemaker is a more advanced cluster resource manager that can work with both Corosync and Heartbeat. Heartbeat uses an older protocol, whereas Pacemaker uses OpenAIS to be compatible with Red Hat cluster services.

Here are my configuration notes. For completeness: they are a mix of doing this first on VMware and then on a Xen cluster, so any inconsistencies (device names, for example) come from repeating the process in different environments. The errors are mine, and I recommend reading the documentation linked above.

The basic idea is that DRBD, a network block device, mirrors data between two servers. The Heartbeat daemon keeps track of the shared IP address and the daemons that are under HA, and runs their init scripts as needed.

DRBD Initialization

Partition the disk:

fdisk /dev/xvdb 
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.


The number of cylinders for this disk is set to 10443.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p
Disk /dev/xvdb: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-10443, default 1): 
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-10443, default 10443): 
Using default value 10443

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 83

Command (m for help): p
Disk /dev/xvdb: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot      Start         End      Blocks   Id  System
/dev/xvdb1               1       10443    83883366   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
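
The interactive session above can also be scripted. This is only a sketch: fdisk_answers is a helper name I made up, and the sequence assumes an fdisk that asks the same questions in the same order as the version shown here (newer versions may prompt differently).

```shell
#!/bin/sh
# Keystrokes for the fdisk session above, one answer per line:
# n (new), p (primary), 1 (partition number), two empty lines to accept
# the default first and last cylinder, t (type), 83 (Linux), w (write).
# With a single partition, "t" selects partition 1 on its own.
fdisk_answers() {
    printf '%s\n' n p 1 '' '' t 83 w
}

# Usage (destructive: this rewrites the partition table of the device):
#   fdisk_answers | fdisk /dev/xvdb
```

Piping answers into fdisk is fragile, so check the result with p before trusting it on a disk that matters.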

Make sure that the host names are consistent throughout all of these configuration files. This may mean ensuring they are correct in DNS and /etc/hosts.

Locally configured name on each server:

uname -n
drbd01.chainringcircus.org

uname -n
drbd02.chainringcircus.org

DNS name for each server:

dig +short drbd01.chainringcircus.org
192.168.1.191
dig +short drbd02.chainringcircus.org
192.168.1.192

The /etc/drbd.conf file was designed to allow a verbatim copy on both nodes of the cluster.

cat /etc/drbd.conf
#
# please have a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf
#

global { 
        usage-count no; 
}

common {
        protocol C;
        handlers {
                pri-on-incon-degr "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
                #pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
                #pri-on-incon-degr This handler is called if the node is primary, degraded and the local
                #copy of the data is inconsistent.  It broadcasts an error, sleeps for 60 seconds and then halts.
        }

        startup { 
                wfc-timeout 10;         # Wait-for-connection timeout. The init script blocks the boot process
                                        # until the DRBD resources are connected. We wait for 10 seconds.
                degr-wfc-timeout 30;    # Wait-for-connection timeout if this node was part of a degraded cluster.
        }

        disk { 
                on-io-error detach; 
        } # or panic, ...

        net {  
                cram-hmac-alg "sha1"; 
                shared-secret "CHANGEME";        # Don't forget to choose a secret for auth
                max-buffers   20000;                  # Play with this setting to achieve highest possible performance
                unplug-watermark   12000;         # Play with this setting to achieve highest possible performance
                max-epoch-size 20000;               # Should be the same as max-buffers
        } 
        syncer { 
                rate 100M; 
        }
}

resource sites {
        device /dev/drbd0;
        disk /dev/sdb;
        meta-disk internal;     # Internal means that the last part of the backing device is used to store the metadata.
        on drbd01.chainringcircus.org {       #on hostname as seen in uname -n and the DNS lookup.
                address 192.168.1.191:7788;
        }
        on drbd02.chainringcircus.org {
                address 192.168.1.192:7788;
        }
}
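
Since the "on" sections have to match uname -n, a quick check before starting DRBD can save some head scratching. A minimal sketch, assuming the simple one-line "on <host> {" layout used in the file above; the function names are my own and not part of DRBD.

```shell
#!/bin/sh
# Print the host name of every "on <host> {" section found on stdin.
drbd_conf_hosts() {
    awk '$1 == "on" && substr($3, 1, 1) == "{" { print $2 }'
}

# Succeed if the given name (default: uname -n) has an "on" section.
# The configuration is read from stdin.
host_in_drbd_conf() {
    name=${1:-$(uname -n)}
    drbd_conf_hosts | grep -Fqx -- "$name"
}

# Usage:
#   host_in_drbd_conf < /etc/drbd.conf || echo "no 'on' section for $(uname -n)"
```

Run it on both nodes after copying the file; each node should find its own name.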

Copy the configuration file:

scp /etc/drbd.conf root@drbd02.chainringcircus.org:/etc/

I tried to start DRBD but got an error:

service drbd start
Starting DRBD resources: [ 
sites
no suitable meta data found :(
Command '/sbin/drbdmeta 0 v08 /dev/sdb internal check-resize' terminated with exit code 255
drbdadm check-resize sites: exited with code 255
d(sites) 0: Failure: (119) No valid meta-data signature found.

        ==> Use 'drbdadm create-md res' to initialize meta-data area. <==


[sites] cmd /sbin/drbdsetup 0 disk /dev/sdb /dev/sdb internal --set-defaults --create-device --on-io-error=detach  failed - continuing!
 
s(sites) n(sites) ]..........
/etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
m:res    cs            ro                 ds                 p  mounted  fstype
0:sites  WFConnection  Secondary/Unknown  Diskless/DUnknown  C


/etc/init.d/drbd stop
Stopping all DRBD resources: .

I had not initialized the metadata storage, which must be done before a DRBD resource can be brought online. The resource needs to be down, or detached from its backing storage, when the metadata is created.

drbdadm create-md sites
md_offset 1073737728
al_offset 1073704960
bm_offset 1073672192

Found some data

 ==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.

service drbd start
Starting DRBD resources: [ 
sites
Found valid meta data in the expected location, 1073737728 bytes into /dev/sdb.
d(sites) s(sites) n(sites) ]..........

Check the status:

cat /proc/drbd 
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1048508

Make this node primary (run this on one node only; it starts the initial sync and overwrites the peer's data):

drbdadm -- --overwrite-data-of-peer primary sites
cat /proc/drbd 
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:67584 nr:0 dw:0 dr:67584 al:0 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:980924
        [>...................] sync'ed:  6.7% (980924/1048508)K delay_probe: 10
        finish: 0:01:27 speed: 11,264 (11,264) K/sec
[root@localhost etc]# cat /proc/drbd 
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:1019904 nr:0 dw:0 dr:1019904 al:0 bm:62 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:28604
        [==================>.] sync'ed: 97.7% (28604/1048508)K delay_probe: 195
        finish: 0:00:02 speed: 11,132 (10,404) K/sec
[root@localhost etc]# cat /proc/drbd 
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1048508 nr:0 dw:0 dr:1048508 al:0 bm:64 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
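
Rather than re-running cat /proc/drbd by hand, the status line can be parsed. This is a sketch that assumes the 8.3-style /proc/drbd layout shown above and a single resource; drbd_field and wait_for_sync are names I picked for illustration.

```shell
#!/bin/sh
# Print the value of one status field (cs, ro or ds) for every DRBD minor
# found in /proc/drbd-style text on stdin.
drbd_field() {
    awk -v key="$1" '
        $1 ~ /^[0-9]+:$/ {
            for (i = 2; i <= NF; i++)
                if (index($i, key ":") == 1)
                    print substr($i, length(key) + 2)
        }'
}

# Block until the (single) resource reports UpToDate on both sides.
wait_for_sync() {
    until [ "$(drbd_field ds < /proc/drbd)" = "UpToDate/UpToDate" ]; do
        sleep 5
    done
}

# Usage:
#   drbd_field cs < /proc/drbd
#   wait_for_sync && mkfs.ext3 /dev/drbd0
```

With more than one resource drbd_field prints one line per minor, so wait_for_sync would need to be adapted.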

Make a file system:

mkfs.ext3 /dev/drbd0
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131072 inodes, 262127 blocks
13106 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376

Writing inode tables: done                            
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 24 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Testing the filesystem:

mount /dev/drbd0 /sites

mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda5 on /home type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
.host:/ on /mnt/hgfs type vmhgfs (rw,ttl=1)
none on /proc/fs/vmblock/mountPoint type vmblock (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/drbd0 on /sites type ext3 (rw)

touch /sites/test.txt

ls /sites
lost+found  test.txt

umount /sites

drbdadm secondary sites

On the second server:

drbdadm primary sites

mount /dev/drbd0 /sites/

mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda5 on /home type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
.host:/ on /mnt/hgfs type vmhgfs (rw,ttl=1)
none on /proc/fs/vmblock/mountPoint type vmblock (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/drbd0 on /sites type ext3 (rw)

ls /sites
lost+found  test.txt
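
The manual switchover above boils down to two steps on each side, and the order matters: unmount before demoting, promote before mounting. A rough sketch with a dry-run switch; run, release_resource and acquire_resource are hypothetical helper names, not DRBD tools.

```shell
#!/bin/sh
# Echo commands instead of running them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# On the current primary: unmount, then demote.
release_resource() {      # args: resource mountpoint
    run umount "$2" && run drbdadm secondary "$1"
}

# On the peer: promote, then mount.
acquire_resource() {      # args: resource device mountpoint
    run drbdadm primary "$1" && run mount "$2" "$3"
}

# Usage:
#   first node:   release_resource sites /sites
#   second node:  acquire_resource sites /dev/drbd0 /sites
```

This is exactly what Heartbeat automates later through the drbddisk and Filesystem resources.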

Heartbeat R1-style

Heartbeat in an R1-style configuration uses three files that must be configured if you are using the Heartbeat protocol:
/etc/ha.d/ha.cf
/etc/ha.d/haresources
/etc/ha.d/authkeys

cat /etc/ha.d/authkeys
auth 1            # A numerical identifier between 1 and 15 inclusive;
                  # must be unique within the file.
1 sha1 CHANGEME   # Methods can be md5, sha1 or crc.
                  # The password is just a string.

chmod 600 /etc/ha.d/authkeys
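
Instead of inventing a password, the key can be generated. A minimal sketch, assuming /dev/urandom, head -c and od are available (they are on the Linux systems I have used); gen_secret and render_authkeys are names I made up.

```shell
#!/bin/sh
# 32 random bytes printed as 64 hex characters.
gen_secret() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Print an authkeys file body that uses key index 1 with sha1.
render_authkeys() {
    printf 'auth 1\n1 sha1 %s\n' "$1"
}

# Usage (as root; the umask makes the file mode 600 from the start):
#   (umask 077; render_authkeys "$(gen_secret)" > /etc/ha.d/authkeys)
#   scp -p /etc/ha.d/authkeys root@drbd02.chainringcircus.org:/etc/ha.d/
```

Both nodes need the identical file, so generate it once and copy it rather than running the generator on each node.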

Before we take care of the ha.cf file we need to set up the ha_logd configuration file.

cp /usr/share/doc/heartbeat-2.1.3/logd.cf /etc/

Then make changes to the logd.cf file as needed. Be sure to copy /etc/logd.cf to both servers. Also note that I had to completely stop and then restart the heartbeat daemon for my logging changes to take effect.

cat /etc/logd.cf 
#       File to write debug messages to
#       Default: /var/log/ha-debug
debugfile /var/log/ha-debug.log

#
#
#       File to write other messages to
#       Default: /var/log/ha-log
logfile /var/log/ha.log

#
#
#       Facility to use for syslog()/logger 
#       Default: daemon
#logfacility    daemon

#       Entity to be shown at beginning of a message
#       for logging daemon
#       Default: "logd"
entity logd

#       Do we register to apphbd
#       Default: no
#useapphbd no

#       There are two processes running for logging daemon
#               1. parent process which reads messages from all client channels 
#               and writes them to the child process 
#  
#               2. the child process which reads messages from the parent process through IPC
#               and writes them to syslog/disk

#       set the send queue length from the parent process to the child process
#
#sendqlen 256 

#       set the recv queue length in child process
#
#recvqlen 256

cat /etc/ha.d/ha.cf
# The recommendation is to use logd.
use_logd yes
# Default option is 0, values are 0-255 with 1-3 being the most useful.
debug 0
# Timing according to the FAQ at www.linux-ha.org/wiki/FAQ
# warntime should be at least 2 * keepalive 
# warntime should be 1/2 to 1/4 deadtime
# The interval between heartbeat packets.
keepalive 1
# How quickly Heartbeat should issue a "late heartbeat" warning.  Warntime is 
# important for tuning deadtime.
warntime 5
# How long to wait before deciding a cluster node is dead.  Too low will falsely declare
# a death and too high will hinder takeover during a failure.
# Can be specified as a floating point number followed by a units specifier.
# If units are omitted it defaults to seconds.
# deadtime 1
# deadtime 100ms 100 milliseconds
# deadtime 1000us 1000 microseconds
deadtime 10
# 694 is the default but can be changed if multiple clusters are in use.
udpport 694
# Which interfaces send UDP broadcast traffic, more than one can be specified.
bcast   eth0
# auto_failback can be "on" "off" or "legacy"
auto_failback off
# Set the nodes in the cluster.  Names must match uname -n on each node.
node    drbd01.chainringcircus.org
node    drbd02.chainringcircus.org
# Make sure this IP address is pingable from the bcast network above.
ping 192.168.1.1    
respawn hacluster /usr/lib/heartbeat/ipfail
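
The timing rules quoted in the comments above are easy to get wrong when tuning, so here is a small sanity check. It is a sketch that only handles whole seconds (no ms or us suffixes), and check_timing is my own name for it.

```shell
#!/bin/sh
# Check whole-second ha.cf timings against the rules of thumb above:
#   warntime >= 2 * keepalive
#   deadtime / 4 <= warntime <= deadtime / 2
check_timing() {          # args: keepalive warntime deadtime
    keepalive=$1 warntime=$2 deadtime=$3
    [ "$warntime" -ge $((2 * keepalive)) ] &&
    [ $((4 * warntime)) -ge "$deadtime" ] &&
    [ $((2 * warntime)) -le "$deadtime" ]
}

# Usage with the values from the file above:
#   check_timing 1 5 10 && echo "timings look sane"
```

The values I used (keepalive 1, warntime 5, deadtime 10) sit at the upper edge of the recommended range.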

cat /etc/ha.d/haresources
drbd01.chainringcircus.org 192.168.1.190 drbddisk::sites Filesystem::/dev/drbd0::/sites::ext3 httpd
# Explanation:
# Primary server name --> virtual IP address to be used --> DRBD resource as configured in /etc/drbd.conf
# --> where to mount the DRBD resource and the filesystem type --> resource to start/stop in case of failover
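
The haresources line is dense, so it can help to split it back into its parts. The sketch below only illustrates the format (the first field is the preferred node, later fields are resources, and "::" separates a resource script from its arguments); explain_haresources is a made-up name and the output format is my own.

```shell
#!/bin/sh
# Describe each haresources line read from stdin.
explain_haresources() {
    awk '
        /^[ \t]*(#|$)/ { next }      # skip comments and blank lines
        {
            print "node: " $1
            for (i = 2; i <= NF; i++) {
                n = split($i, part, "::")
                line = "resource: " part[1]
                for (j = 2; j <= n; j++)
                    line = line " arg=" part[j]
                print line
            }
        }'
}

# Usage:
#   explain_haresources < /etc/ha.d/haresources
```

A bare IP address shows up as a resource with no arguments; Heartbeat treats it as shorthand for the IPaddr resource script. Resources are started left to right and stopped in reverse, which is why the DRBD resource comes before the filesystem and the filesystem before httpd.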

Cluster Management

To take over cluster management from a primary server:

/usr/lib/heartbeat/hb_takeover

Relinquishing cluster management to a secondary server:

/usr/lib/heartbeat/hb_standby
/etc/init.d/heartbeat stop

The order of operations as set by the init scripts:

ls -al /etc/rc3.d/ | egrep "hear|drb"
lrwxrwxrwx  1 root root   14 Apr  1 11:40 S70drbd -> ../init.d/drbd
lrwxrwxrwx  1 root root   19 Jun  1 08:58 S75heartbeat -> ../init.d/heartbeat
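
DRBD has to be up before Heartbeat tries to promote and mount the resource, so the start order is worth checking after any chkconfig changes. A small sketch, assuming the usual SNNname link naming in the rc directories; rc_priority and starts_before are hypothetical helper names.

```shell
#!/bin/sh
# Print the two-digit start priority of a service, given rc directory
# link names (one per line) on stdin.
rc_priority() {
    sed -n "s/^S\([0-9][0-9]\)$1\$/\1/p"
}

# Succeed if the first service starts before the second one.
starts_before() {         # args: first second; link names on stdin
    links=$(cat)
    a=$(printf '%s\n' "$links" | rc_priority "$1")
    b=$(printf '%s\n' "$links" | rc_priority "$2")
    [ -n "$a" ] && [ -n "$b" ] && [ "$a" -lt "$b" ]
}

# Usage:
#   ls /etc/rc3.d | starts_before drbd heartbeat || echo "check the start order"
```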

Note for Xen users:

# cat /etc/modprobe.d/drbd.conf 
options drbd disable_sendpage=1