Quick & Dirty Disaster Recovery: eSATA + iSCSI + RAID-1 + BackupPC

With the purchase of 4x 1TB Western Digital Caviar (Black) drives, my trusty iSCSI RAID “array”, an eSATA external drive (hotplug) and a car, I’ve got myself the hardware infrastructure to enable offsite backups.

Two of the 1TB drives are mirrored in my iSCSI RAID. The iSCSI RAID array exports this volume to my backup master host, and I use this iSCSI volume as one half of an MD-based RAID-1 mirror that hosts the backup spool for my environment. Every morning, a script stops the backup software, syncs the drive and unmounts the ext4 file system hosting the spool, then splits the MD RAID-1 mirror and disconnects the eSATA-connected 3rd drive. I can then safely remove the drive, put it in protective packaging and bring it to my office, where I swap this 3rd drive with a 4th that was stored in my office overnight.

At night, the reverse process is scripted to connect the “office-drive” to the backup volume. Scanning the bus for the eSATA device triggers a udev action to join the office drive to the /dev/md device and thus start the resync of the two volumes. This is fully non-disruptive, if you ignore the performance ‘hit’, and the goal is that before the morning “split” job starts running, the mirror will be merged and a new fresh copy can go to the office.
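Whether the overnight resync has actually finished can be read from /proc/mdstat. Here is a minimal sketch, assuming the usual "[2/2] [UU]" notation the kernel prints for a healthy two-way mirror. The function name mirror_in_sync and its optional file argument are made up for illustration and are not part of the scripts in this post:

```shell
#!/bin/bash
# Sketch only: mirror_in_sync is a hypothetical helper.
# Returns 0 when the named md device shows all members up and no
# resync/recovery in an mdstat-formatted file (default: /proc/mdstat),
# 1 when it is degraded or still syncing, 2 when it is not listed.
mirror_in_sync() {
    local md_name=${1:-}
    local mdstat=${2:-/proc/mdstat}
    local section

    # Collect the lines for this device: from "mdNNN :" to the next blank line.
    section=$(awk -v md="${md_name}" '
        $1 == md && $2 == ":" { found = 1 }
        found && NF == 0      { exit }
        found                 { print }
    ' "${mdstat}")

    [ -n "${section}" ] || return 2
    echo "${section}" | grep -Eq 'resync|recovery' && return 1
    # An underscore inside the [UU]-style field marks a missing member.
    echo "${section}" | grep -q '\[U*_[U_]*\]' && return 1
    return 0
}
```

A morning job could call "mirror_in_sync md127" and postpone the split when it returns non-zero.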

I recognize the fact that the /usr/local/bin/{add,drop}-device scripts are very quick & dirty and can stand to be cleaned up quite a bit. For instance, the retry logic could be pulled out in bash functions and called in a loop with a configurable number of retries, etc, etc, etc. But, the point is it was “quick and dirty”.
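As a rough idea of what that cleanup might look like, here is a minimal sketch of a retry helper with a configurable attempt count and delay. The function name retry and its argument order are my own choices for illustration, not something the scripts below use:

```shell
#!/bin/bash
# Sketch only: "retry" is a hypothetical helper.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Runs the command until it succeeds or the attempt budget is used up.
retry() {
    local attempts=$1 delay=$2
    shift 2
    local try=1

    until "$@"; do
        if [ "${try}" -ge "${attempts}" ]; then
            echo "Giving up on '$*' after ${try} attempts" >&2
            return 1
        fi
        echo "Attempt ${try} of '$*' failed - retrying in ${delay}s" >&2
        try=$((try + 1))
        sleep "${delay}"
    done
    return 0
}
```

With this in place, a step such as failing the eSATA member could shrink to a single line along the lines of "retry 3 5 /sbin/mdadm /dev/md127 --fail /dev/sdb1 || exit 1", where /dev/sdb1 is only an example device name.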

In /etc/mdadm.conf:

MAILADDR root@sjolshagen.net
DEVICE /dev/raid-mirror0-p1
DEVICE /dev/raid-backup4-p1
ARRAY /dev/md/127 metadata=1.2 UUID=56a8ce1e:294930a4:72073552:3f6dd2a7 name=virt1-backup.sjolshagen.net:127

To start the host mirror (RAID-1) containing the iSCSI volume & the eSATA drive (/dev/md127 in my case), I’ve included the following in /etc/rc.local and disabled the BackupPC init service (# chkconfig backuppc off ) so that it only starts after the spool is mounted:

/sbin/mdadm /dev/md127 --assemble --force --auto=yes --bitmap=/var/lib/md-bitmaps/md127.bitmap /dev/raid-mirror0-p1
mount /dev/md127 /var/lib/BackupPC
res=$?
if [ ${res} -ne 0 ]
then
     echo "Mount of /dev/md127 to /var/lib/BackupPC failed!"
     exit 1
fi
service backuppc start

Additionally, I’ve created the following udev rules to add the eSATA drive to the mirror when the drive(s) are detected by the kernel:

# Saved in /etc/udev/rules.d/10-local.rules
#
KERNEL=="*[0-9]", IMPORT{parent}=="ID_*"

# ESATA drives (using their serial numbers/IDs)
KERNEL=="sd*", ENV{ID_SERIAL_SHORT}=="WD-WMATV3081234", SYMLINK+="raid-backup0-p%n", RUN+="/usr/local/bin/attach_raid.sh /dev/md127 /dev/raid-backup0-p%n"
KERNEL=="sd*", ENV{ID_SERIAL_SHORT}=="WD-WMATV3131170", SYMLINK+="raid-backup1-p%n", RUN+="/usr/local/bin/attach_raid.sh /dev/md127 /dev/raid-backup1-p%n"

# For the iSCSI volume
KERNEL=="sd*", ENV{ID_SERIAL_SHORT}=="WD-WCATR0362494", SYMLINK+="raid-iscsi-mirror-p%n", RUN+="/usr/local/bin/attach_raid.sh /dev/md127 /dev/raid-iscsi-mirror-p%n"
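
The attach_raid.sh helper that these rules call is not listed in this post. Here is a minimal sketch of what such a helper could look like, assuming it receives the md device and the member symlink as its two arguments, as in the RUN+= entries above. The function name attach_member and the MDADM override variable are hypothetical:

```shell
#!/bin/bash
# Hypothetical stand-in for /usr/local/bin/attach_raid.sh
# (the real script is not shown in this post).
MDADM=${MDADM:-/sbin/mdadm}

attach_member() {
    local md_dev=${1:-} member=${2:-}

    if [ -z "${md_dev}" ] || [ -z "${member}" ]; then
        echo "usage: attach_member <md-device> <member-device>" >&2
        return 2
    fi

    # Try --re-add first so md can use the write-intent bitmap;
    # fall back to a plain --add if md will not accept the re-add.
    if "${MDADM}" "${md_dev}" --re-add "${member}"; then
        echo "Re-added ${member} to ${md_dev}"
    elif "${MDADM}" "${md_dev}" --add "${member}"; then
        echo "Added ${member} to ${md_dev} (full resync)"
    else
        echo "Could not attach ${member} to ${md_dev}" >&2
        return 1
    fi
}

# As a standalone script, the last line would be:  attach_member "$@"
```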

The /usr/local/bin/add-device script that runs multiple times on weekday nights:

#!/bin/bash
#
DATE=$(date +%c)
echo "**************** ${DATE} *******************"
/bin/ls /dev/raid-backup* >>/dev/null 2>&1
res=$?

if [ ${res} -ne 0 ]
then
	echo "The drive is not connected yet"
	echo "- - -" > /sys/class/scsi_host/host6/scan
	echo "- - -" > /sys/class/scsi_host/host7/scan
	exit 0
else
	echo "The drive is already connected - no need to look (scan) for it"
	exit 0
fi
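
The host6 and host7 numbers above are specific to my controller and may change if adapters are added or reordered. A possible variation, sketched here under the assumption that rescanning every SCSI host is acceptable (it also touches adapters the script above leaves alone), loops over all scan files instead. The function name rescan_scsi_hosts and its optional sysfs-root argument are hypothetical:

```shell
#!/bin/bash
# Sketch only: rescan_scsi_hosts is a hypothetical helper.
# Writes "- - -" (all channels, targets and LUNs) to every host's scan file.
rescan_scsi_hosts() {
    local sysfs_root=${1:-/sys/class/scsi_host}
    local scan_file scanned=0

    for scan_file in "${sysfs_root}"/host*/scan; do
        [ -e "${scan_file}" ] || continue      # glob matched nothing
        echo "- - -" > "${scan_file}" && scanned=$((scanned + 1))
    done

    echo "Requested rescan on ${scanned} SCSI host(s)"
    [ "${scanned}" -gt 0 ]
}
```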

The /usr/local/bin/drop-device script that runs every weekday morning:

#!/bin/bash
#
DEV=$(ls -l /dev/raid-backup?-p1 | awk '{ print $11 }' | cut -c 1-3)
MD_DEV=$(cat /proc/mdstat | grep md |grep -v bitmap | awk '{ print $1 }')

DATE=$(date +%c)
MAIL_ADM="root@host.com"

# Add "header"
echo "*********** ${DATE} ***********"

/bin/ls /dev/raid-backup* >>/dev/null 2>&1
res=$?

if [ ${res} -ne 0 ]
then
     echo "Drive not connected - nothing to do"
     echo "Exiting"
     exit
fi

echo "---------------------------------------"
/bin/cat /proc/mdstat
echo "---------------------------------------"

RAID_STATE=$(/sbin/mdadm --misc -t --detail /dev/${MD_DEV} >/dev/null)
res=$?

if [ ${res} -ne 0 ]
then
        echo "The array is not clean yet (rebuild still in progress)"
        echo "Rescheduling and will try again in 15 minutes"
        /usr/bin/at -f /usr/local/bin/drop-device now + 15 minutes >> /var/log/drop-device.log 2>&1
        exit 1
fi

/sbin/service backuppc stop
res=$?

if [ ${res} -ne 0 ]
then
        echo "Unable to stop BackupPC service - Exiting"
        exit 1
fi

sync ; sync ; sync
umount /var/lib/BackupPC
res=$?

if [ ${res} -ne 0 ]
then
        echo "Failed to unmount /var/lib/BackupPC"
        echo "Restarting service and quitting"
        service backuppc start
        exit 1
fi

/sbin/mdadm /dev/${MD_DEV} --fail /dev/${DEV}1
res=$?

if [ ${res} -ne 0 ]
then
        echo "Failed to remove /dev/${DEV}1 from /dev/${MD_DEV}"
        sleep 5
        echo "Retrying...."
        /sbin/mdadm /dev/${MD_DEV} --fail /dev/${DEV}1
        res=$?

        if [ ${res} -ne 0 ]
        then
                echo "Removal of /dev/${DEV}1 from /dev/${MD_DEV} failed" | mail -s "Failed mirror operation" ${MAIL_ADM}
                exit 1
        fi
fi

sleep 15
/sbin/mdadm /dev/${MD_DEV} --remove /dev/${DEV}1
res=$?

if [ ${res} -ne 0 ]
then
        echo "Failed to hotplug /dev/${DEV}1 from /dev/${MD_DEV}"
        sleep 5
        echo "Retrying...."
        /sbin/mdadm /dev/${MD_DEV} --remove /dev/${DEV}1
        res=$?

        if [ ${res} -ne 0 ]
        then
                echo "Still no luck on the hotplug operation"
                echo "giving up and quitting"
                echo "Unplug of /dev/${DEV}1 from /dev/${MD_DEV} failed" | mail -s "Failed mirror operation" ${MAIL_ADM}
                exit 1
        fi
fi

/bin/mount /dev/${MD_DEV} /var/lib/BackupPC
res=$?

if [ ${res} -ne 0 ]
then
        echo "Unable to mount /dev/${MD_DEV} to /var/lib/BackupPC"
        exit 1
else
        /sbin/service backuppc start
fi

# Remove the block device from the system
echo 1 > /sys/block/${DEV}/device/delete
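
The DEV lookup at the top of this script depends on the column layout of "ls -l" and on three-letter disk names. A slightly more robust alternative, sketched here on the assumption that the disks use sdX-style names with a trailing partition number, resolves the udev symlink with readlink instead. The function name parent_disk is hypothetical:

```shell
#!/bin/bash
# Sketch only: parent_disk is a hypothetical helper.
# Resolves a udev symlink such as /dev/raid-backup0-p1 and prints the name
# of the whole disk behind it (e.g. "sdb" for a link that points at sdb1).
parent_disk() {
    local link=${1:-} target name

    target=$(readlink -f "${link}") || return 1
    [ -e "${target}" ] || return 1
    name=${target##*/}                                  # strip the directory part
    printf '%s\n' "${name}" | sed 's/[0-9][0-9]*$//'    # strip the partition number
}
```

In the script this could replace the first assignment with something like DEV=$(parent_disk /dev/raid-backup0-p1), using whichever symlink is currently present.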

The cron entry for the two scripts:

# sudo crontab -l
45 05 * * 1-5 /usr/local/bin/drop-device >> /var/log/drop-device.log 2>&1
0,15,30,45 17-23 * * 1-5 /usr/local/bin/add-device >> /var/log/add-device.log 2>&1

Trackback URL http://linux.sjolshagen.net/2010/09/23/quick-dirty-disaster-recovery-esata-iscsi-raid-1-backuppc/trackback/