= Replicated Storage with DRBD =

== Background ==
Even if you're serving up static websites, having to manually synchronize
the contents of that website to all the machines in the cluster is not
ideal. For dynamic websites, such as a wiki, it's not even an option. Not
everyone care afford network-attached storage but somehow the data needs
to be kept in sync. Enter DRBD which can be thought of as network based
RAID-1. See http://www.drbd.org/ for more details.

== Install the DRBD Packages ==

Since its inclusion in the upstream 2.6.33 kernel, everything needed to
use DRBD ships with Fedora 13. All you need to do is install it:

[source,Bash]
----
# yum install -y drbd-pacemaker drbd-udev
Loaded plugins: presto, refresh-packagekit
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package drbd-pacemaker.x86_64 0:8.3.7-2.fc13 set to be updated
--> Processing Dependency: drbd-utils = 8.3.7-2.fc13 for package: drbd-pacemaker-8.3.7-2.fc13.x86_64
--> Running transaction check
---> Package drbd-utils.x86_64 0:8.3.7-2.fc13 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

=================================================================================
 Package                Arch           Version              Repository      Size
=================================================================================
Installing:
 drbd-pacemaker         x86_64         8.3.7-2.fc13         fedora          19 k
Installing for dependencies:
 drbd-utils             x86_64         8.3.7-2.fc13         fedora         165 k

Transaction Summary
=================================================================================
Install       2 Package(s)
Upgrade       0 Package(s)

Total download size: 184 k
Installed size: 427 k
Downloading Packages:
Setting up and reading Presto delta metadata
fedora/prestodelta                                        | 1.7 kB     00:00     
Processing delta metadata
Package(s) data still to download: 184 k
(1/2): drbd-pacemaker-8.3.7-2.fc13.x86_64.rpm             |  19 kB     00:01     
(2/2): drbd-utils-8.3.7-2.fc13.x86_64.rpm                 | 165 kB     00:02     
 ---------------------------------------------------------------------------------
Total                                             45 kB/s | 184 kB     00:04     
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : drbd-utils-8.3.7-2.fc13.x86_64                            1/2 
  Installing     : drbd-pacemaker-8.3.7-2.fc13.x86_64                        2/2 

Installed:
  drbd-pacemaker.x86_64 0:8.3.7-2.fc13                                           

Dependency Installed:
  drbd-utils.x86_64 0:8.3.7-2.fc13                                               

Complete!
----

== Configure DRBD ==

Before we configure DRBD, we need to set aside some disk for it to use.

=== Create A Partition for DRBD ===

If you have more than 1Gb free, feel free to use it. For this guide
however, 1Gb is plenty of space for a single html file and sufficient for
later holding the GFS2 metadata.

[source,Bash]
----
# lvcreate -n drbd-demo -L 1G VolGroup
Logical volume "drbd-demo" created
# lvs
LV    VG    Attr  LSize  Origin Snap% Move Log Copy% Convert
drbd-demo VolGroup -wi-a- 1.00G                   
lv_root  VolGroup -wi-ao  7.30G                   
lv_swap  VolGroup -wi-ao 500.00M
----

Repeat this on the second node, be sure to use the same size partition.

[source,Bash]
----
# ssh pcmk-2 -- lvs 
LV   VG    Attr  LSize  Origin Snap% Move Log Copy% Convert
lv_root VolGroup -wi-ao  7.30G                   
lv_swap VolGroup -wi-ao 500.00M                   
# ssh pcmk-2 -- lvcreate -n drbd-demo -L 1G VolGroup
Logical volume "drbd-demo" created
# ssh pcmk-2 -- lvs
LV    VG    Attr  LSize  Origin Snap% Move Log Copy% Convert
drbd-demo VolGroup -wi-a- 1.00G                   
lv_root  VolGroup -wi-ao  7.30G                   
lv_swap  VolGroup -wi-ao 500.00M
----

=== Write the DRBD Config ===

There is no series of commands for building a DRBD configuration, so simply
copy the configuration below to /etc/drbd.conf

Detailed information on the directives used in this configuration (and
other alternatives) is available from
http://www.drbd.org/users-guide/ch-configure.html

[WARNING]
=========

Be sure to use the names and addresses of your nodes if they differ from
the ones used in this guide.

=========

....
global { 
 usage-count yes; 
}
common {
 protocol C;
}
resource wwwdata {
 meta-disk internal;
 device  /dev/drbd1;
 syncer {
  verify-alg sha1;
 }
 net { 
  allow-two-primaries; 
 }
 on pcmk-1 {
  disk   /dev/mapper/VolGroup-drbd--demo;
  address  192.168.122.101:7789; 
 }
 on pcmk-2 {
  disk   /dev/mapper/VolGroup-drbd--demo;
  address  192.168.122.102:7789;      
 }
}
....

[NOTE]
=======

TODO: Explain the reason for the allow-two-primaries option

=======

=== Initialize and Load DRBD ===

With the configuration in place, we can now perform the DRBD
initialization

[source,Bash]
----
# drbdadm create-md wwwdata
md_offset 12578816
al_offset 12546048
bm_offset 12541952

Found some data 
==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
----

Now load the DRBD kernel module and confirm that everything is sane

[source,Bash]
----
# modprobe drbd# drbdadm up wwwdata# cat /proc/drbdversion: 8.3.6 (api:88/proto:86-90)
GIT-hash: f3606c47cc6fcf6b3f086e425cb34af8b7a81bbf build by root@pcmk-1, 2009-12-08 11:22:57
 1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
  ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:12248 
----

Repeat on the second node

[source,Bash]
----
# ssh pcmk-2 -- drbdadm --force create-md wwwdata
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
# ssh pcmk-2 -- modprobe drbd
WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.
# ssh pcmk-2 -- drbdadm up wwwdata
# ssh pcmk-2 -- cat /proc/drbd
version: 8.3.6 (api:88/proto:86-90)
GIT-hash: f3606c47cc6fcf6b3f086e425cb34af8b7a81bbf build by root@pcmk-1, 2009-12-08 11:22:57
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:12248
----

Now we need to tell DRBD which set of data to use. Since both sides
contain garbage, we can run the following on pcmk-1:

[source,Bash]
----
# drbdadm -- --overwrite-data-of-peer primary wwwdata
# cat /proc/drbd
version: 8.3.6 (api:88/proto:86-90)
GIT-hash: f3606c47cc6fcf6b3f086e425cb34af8b7a81bbf build by root@pcmk-1, 2009-12-08 11:22:57
1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
   ns:2184 nr:0 dw:0 dr:2472 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:10064
    [=====>..............] sync'ed: 33.4% (10064/12248)K
    finish: 0:00:37 speed: 240 (240) K/sec
# cat /proc/drbd
version: 8.3.6 (api:88/proto:86-90)
GIT-hash: f3606c47cc6fcf6b3f086e425cb34af8b7a81bbf build by root@pcmk-1, 2009-12-08 11:22:57
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
   ns:12248 nr:0 dw:0 dr:12536 al:0 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----

pcmk-1 is now in the Primary state which allows it to be written to.
Which means it's a good point at which to create a filesystem and populate
it with some data to serve up via our WebSite resource.


=== Populate DRBD with Data ===

[source,Bash]
----
# mkfs.ext4 /dev/drbd1
mke2fs 1.41.4 (27-Jan-2009)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
3072 inodes, 12248 blocks
612 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=12582912
2 block groups
8192 blocks per group, 8192 fragments per group
1536 inodes per group
Superblock backups stored on blocks: 
    8193

Writing inode tables: done              
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
----

Now mount the newly created filesystem so we can create our index file

[source,Bash]
----
# mount /dev/drbd1 /mnt/
# cat <<-END >/mnt/index.html
 <html>
  <body>My Test Site - drbd</body>
 </html>
 END
# umount /dev/drbd1
----

== Configure the Cluster for DRBD ==

One handy feature of the crm shell is that you can use it in
interactive mode to make several changes atomically.

First we launch the shell. The prompt will change to indicate you're
in interactive mode.

[source,Bash]
----
# crm cib
crm(live) #
----

Next we must create a working copy of the current configuration. This is
where all our changes will go. The cluster will not see any of them until
we say it's ok. Notice again how the prompt changes, this time to indicate
that we're no longer looking at the live cluster.

[source,Bash]
----
cib crm(live) # cib new drbd
INFO: drbd shadow CIB created
crm(drbd) #
----

Now we can create our DRBD clone and display the revised configuration.

[source,Bash]
----
crm(drbd) # configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata \
    op monitor interval=60s
crm(drbd) # configure ms WebDataClone WebData meta master-max=1 master-node-max=1 \
    clone-max=2 clone-node-max=1 notify=truecrm(drbd) # configure shownode pcmk-1
node pcmk-2primitive WebData ocf:linbit:drbd \
 params drbd_resource="wwwdata" \
 op monitor interval="60s"primitive WebSite ocf:heartbeat:apache \
    params configfile="/etc/httpd/conf/httpd.conf" \
    op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="192.168.122.101" cidr_netmask="32" \
    op monitor interval="30s"ms WebDataClone WebData \
 meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location prefer-pcmk-1 WebSite 50: pcmk-1
colocation website-with-ip inf: WebSite ClusterIP
order apache-after-ip inf: ClusterIP WebSite
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
----

Once we're happy with the changes, we can tell the cluster to start using
them and use crm_mon to check everything is functioning.

[source,Bash]
----
crm(drbd) # cib commit drbdINFO: commited 'drbd' shadow CIB to the cluster
crm(drbd) # quitbye
# crm_mon
============
Last updated: Tue Sep 1 09:37:13 2009
Stack: openais
Current DC: pcmk-1 - partition with quorum
Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ pcmk-1 pcmk-2 ]

ClusterIP    (ocf::heartbeat:IPaddr):    Started pcmk-1
WebSite (ocf::heartbeat:apache):    Started pcmk-1Master/Slave Set: WebDataClone Masters: [ pcmk-2 ] Slaves: [ pcmk-1 ]
----

[NOTE]
=======

Include details on adding a second DRBD resource

=======

Now that DRBD is functioning we can configure a Filesystem resource to
use it. In addition to the filesystem's definition, we also need to
tell the cluster where it can be located (only on the DRBD Primary)
and when it is allowed to start (after the Primary was promoted).

Once again we'll use the shell's interactive mode

[source,Bash]
----
# crm
crm(live) # cib new fs
INFO: fs shadow CIB created
crm(fs) # configure primitive WebFS ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ext4"
crm(fs) # configure colocation fs_on_drbd inf: WebFS WebDataClone:Master
crm(fs) # configure order WebFS-after-WebData inf: WebDataClone:promote WebFS:start

We also need to tell the cluster that Apache needs to run on the same
machine as the filesystem and that it must be active before Apache can
start.

crm(fs) # configure colocation WebSite-with-WebFS inf: WebSite WebFS
crm(fs) # configure order WebSite-after-WebFS inf: WebFS WebSite
----

Time to review the updated configuration:

[source,Bash]
----
crm(fs) # crm configure show
node pcmk-1
node pcmk-2
primitive WebData ocf:linbit:drbd \
    params drbd_resource="wwwdata" \
    op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ext4"
primitive WebSite ocf:heartbeat:apache \
    params configfile="/etc/httpd/conf/httpd.conf" \
    op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="192.168.122.101" cidr_netmask="32" \
    op monitor interval="30s"
ms WebDataClone WebData \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location prefer-pcmk-1 WebSite 50: pcmk-1
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebDataClone:Master
colocation website-with-ip inf: WebSite ClusterIP
order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip inf: ClusterIP WebSite
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
----

After reviewing the new configuration, we again upload it and watch the
cluster put it into effect.

[source,Bash]
----
crm(fs) # cib commit fs
INFO: commited 'fs' shadow CIB to the cluster
crm(fs) # quit
bye
# crm_mon
============
Last updated: Tue Sep 1 10:08:44 2009
Stack: openais
Current DC: pcmk-1 - partition with quorum
Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ pcmk-1 pcmk-2 ]

ClusterIP    (ocf::heartbeat:IPaddr):    Started pcmk-1
WebSite (ocf::heartbeat:apache): Started pcmk-1
Master/Slave Set: WebDataClone
    Masters: [ pcmk-1 ]
    Slaves: [ pcmk-2 ]
WebFS (ocf::heartbeat:Filesystem): Started pcmk-1
----

=== Testing Migration ===

We could shut down the active node again, but another way to safely
simulate recovery is to put the node into what is called "standby
mode".  Nodes in this state tell the cluster that they are not allowed
to run resources. Any resources found active there will be moved
elsewhere. This feature can be particularly useful when updating the
resources' packages.

Put the local node into standby mode and observe the cluster move all
the resources to the other node. Note also that the node's status will
change to indicate that it can no longer host resources.

[source,Bash]
----
# crm node standby
# crm_mon
============
Last updated: Tue Sep 1 10:09:57 2009
Stack: openais
Current DC: pcmk-1 - partition with quorum
Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Node pcmk-1: standbyOnline: [ pcmk-2 ]

ClusterIP    (ocf::heartbeat:IPaddr):    Started pcmk-2
WebSite (ocf::heartbeat:apache):    Started pcmk-2
Master/Slave Set: WebDataClone
    Masters: [ pcmk-2 ]    Stopped: [ WebData:1 ]
WebFS  (ocf::heartbeat:Filesystem):  Started pcmk-2
----

Once we've done everything we needed to on pcmk-1 (in this case nothing,
we just wanted to see the resources move), we can allow the node to be a
full cluster member again.

[source,Bash]
----
# crm node online
# crm_mon
============
Last updated: Tue Sep 1 10:13:25 2009
Stack: openais
Current DC: pcmk-1 - partition with quorum
Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ pcmk-1 pcmk-2 ]
ClusterIP    (ocf::heartbeat:IPaddr):    Started pcmk-2
WebSite (ocf::heartbeat:apache):    Started pcmk-2
Master/Slave Set: WebDataClone
    Masters: [ pcmk-2 ]
    Slaves: [ pcmk-1 ]
WebFS  (ocf::heartbeat:Filesystem):  Started pcmk-2
----

Notice that our resource stickiness settings prevent the services from
migrating back to pcmk-1.

