With the purchase of 4x 1TB Western Digital Caviar (Black) drives, my trusty iSCSI RAID “array”, an eSATA external drive (hotplug) and a car, I’ve got myself the hardware infrastructure to enable offsite backups.
Two of the 1TB drives are mirrored in my iSCSI RAID. The iSCSI RAID array exports this volume to my backup master host, and I use this iSCSI volume as one half of an MD-based RAID-1 mirror that hosts the backup spool for my environment. Every morning, a script stops the backup software, syncs the drive and unmounts the ext4 file system hosting the spool before splitting the MD RAID-1 mirror and disconnecting the eSATA-connected 3rd drive. I can then safely remove the drive, put it in protective packaging and bring it to my office, where I swap this 3rd drive with a 4th that was stored in my office overnight.
At night, the reverse process is scripted to connect the “office-drive” to the backup volume. Scanning the bus for the eSATA device triggers a udev action to join the office drive to the /dev/md device and thus start the resync of the two volumes. This is fully non-disruptive, if you ignore the performance ‘hit’, and the goal is that before the morning “split” job starts running, the mirror will be merged and a new fresh copy can go to the office.
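Since the whole scheme depends on the resync finishing before the morning split, it can be handy to check the resync state directly. The following is only a minimal sketch, not part of my scripts: the function name `md_is_syncing` is made up for illustration, and it assumes the usual /proc/mdstat layout, where each array stanza starts with the md name and ends with a blank line.

```shell
# md_is_syncing MD_NAME [MDSTAT_FILE]   (hypothetical helper)
# Returns 0 if the stanza for MD_NAME in mdstat shows a resync or
# recovery (running, delayed or pending), 1 otherwise.
md_is_syncing() {
    local md=$1 mdstat=${2:-/proc/mdstat}
    awk -v md="${md}" '
        $1 == md         { in_md = 1; next }   # first line of the stanza we want
        in_md && NF == 0 { in_md = 0 }         # a blank line ends the stanza
        in_md && /(resync|recovery) *=/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${mdstat}"
}
```

For example, `md_is_syncing md127 && echo "still resyncing"` would report whether the mirror is still being rebuilt.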
I recognize that the /usr/local/bin/{add,drop}-device scripts are very quick & dirty and could stand quite a bit of cleanup. For instance, the retry logic could be pulled out into bash functions and called in a loop with a configurable number of retries. But the point is that it was “quick and dirty”.
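As a rough idea of what that cleanup could look like, here is a minimal sketch of a generic retry helper. The name `retry` and its argument order are my own invention for illustration; none of this is in the scripts shown in this post.

```shell
# retry MAX DELAY CMD [ARGS...]   (hypothetical helper)
# Runs CMD up to MAX times, sleeping DELAY seconds between attempts.
# Returns 0 on the first success, otherwise the exit status of the
# last failed attempt.
retry() {
    local max=$1 delay=$2
    shift 2
    local attempt=1 res=0
    while :; do
        "$@" && return 0
        res=$?
        if [ "${attempt}" -ge "${max}" ]; then
            echo "Giving up on '$*' after ${attempt} attempts" >&2
            return "${res}"
        fi
        echo "Attempt ${attempt} of '$*' failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
}
```

With something like this, each fail/remove step could shrink to a single line such as `retry 2 5 /sbin/mdadm /dev/md127 --fail /dev/sdc1`, followed by one check of the exit status.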
In /etc/mdadm.conf
MAILADDR root@sjolshagen.net
DEVICE /dev/raid-mirror0-p1
DEVICE /dev/raid-backup4-p1
ARRAY /dev/md/127 metadata=1.2 UUID=56a8ce1e:294930a4:72073552:3f6dd2a7 name=virt1-backup.sjolshagen.net:127
To start the host mirror (RAID-1) containing the iSCSI volume & the eSATA drive (/dev/md127 in my case), I’ve included the following in /etc/rc.local and disabled the BackupPC init service (# chkconfig backuppc off ):
/sbin/mdadm /dev/md127 --assemble --force --auto=yes --bitmap=/var/lib/md-bitmaps/md127.bitmap /dev/raid-mirror0-p1
mount /dev/md127 /var/lib/BackupPC
res=$?
if [ ${res} -ne 0 ]
then
echo "Mount of /dev/md127 to /var/lib/BackupPC failed!"
exit 1
fi
service backuppc start
Additionally, I’ve created the following udev rules to add the eSATA drive to the mirror when the drive(s) are detected by the kernel:
# Saved in /etc/udev/rules.d/10-local.rules
#
KERNEL=="*[0-9]", IMPORT{parent}=="ID_*"
# ESATA drives (using their serial numbers/IDs)
KERNEL=="sd*", ENV{ID_SERIAL_SHORT}=="WD-WMATV3081234", SYMLINK+="raid-backup0-p%n", RUN+="/usr/local/bin/attach_raid.sh /dev/md127 /dev/raid-backup0-p%n"
KERNEL=="sd*", ENV{ID_SERIAL_SHORT}=="WD-WMATV3131170", SYMLINK+="raid-backup1-p%n", RUN+="/usr/local/bin/attach_raid.sh /dev/md127 /dev/raid-backup1-p%n"
# For the iSCSI volume
KERNEL=="sd*", ENV{ID_SERIAL_SHORT}=="WD-WCATR0362494", SYMLINK+="raid-iscsi-mirror-p%n", RUN+="/usr/local/bin/attach_raid.sh /dev/md127 /dev/raid-iscsi-mirror-p%n"
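The attach_raid.sh helper called by these rules is not shown in this post. Purely as an illustration of the idea, and not the actual script, a helper like it might first try `--re-add` (so the write-intent bitmap can limit the resync to changed blocks) and fall back to a plain `--add`. The function name `attach_to_mirror` and the `MDADM` override variable are assumptions made up for this sketch.

```shell
# Hypothetical sketch of an attach helper; NOT the attach_raid.sh used above.
# MDADM can be overridden, e.g. for testing without touching a real array.
MDADM=${MDADM:-/sbin/mdadm}

# attach_to_mirror MD_DEVICE MEMBER_DEVICE
attach_to_mirror() {
    local md=$1 dev=$2
    [ -n "${md}" ] && [ -n "${dev}" ] || return 2
    # Try a re-add first so an existing bitmap can shorten the resync
    "${MDADM}" "${md}" --re-add "${dev}" && return 0
    # Fall back to a normal add, which triggers a full resync
    "${MDADM}" "${md}" --add "${dev}"
}
```

It would be called with the same two arguments the udev rules pass, for example `attach_to_mirror /dev/md127 /dev/raid-backup0-p1`.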
The /usr/local/bin/add-device script that runs multiple times on weekday nights:
#!/bin/bash
#
DATE=$(date +%c)
echo "**************** ${DATE} *******************"
/bin/ls /dev/raid-backup* >>/dev/null 2>&1
res=$?
if [ ${res} -ne 0 ]
then
echo "The drive is not connected yet"
echo "- - -" > /sys/class/scsi_host/host6/scan
echo "- - -" > /sys/class/scsi_host/host7/scan
exit 0
else
echo "The drive is already connected - no need to look (scan) for it"
exit 0
fi
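The host6/host7 numbers above are specific to my machine and can change if controllers are added or removed. A minimal sketch of a more generic rescan, assuming it is acceptable to rescan every SCSI host on the system (including the iSCSI ones), could look like this. The function name `rescan_scsi_hosts` and its optional directory argument are made up for illustration.

```shell
# rescan_scsi_hosts [SYSFS_DIR]   (hypothetical helper)
# Writes the wildcard "- - -" (all channels, targets and LUNs) to the
# scan file of every SCSI host found under SYSFS_DIR.
rescan_scsi_hosts() {
    local dir=${1:-/sys/class/scsi_host} scan
    for scan in "${dir}"/host*/scan; do
        [ -e "${scan}" ] || continue    # glob matched nothing
        echo "- - -" > "${scan}"
    done
    return 0
}
```

The directory argument only exists so the function can be tried against a scratch directory instead of the real sysfs tree.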
The /usr/local/bin/drop-device script that runs every weekday morning:
#!/bin/bash
#
DEV=$(ls -l /dev/raid-backup?-p1 | awk '{ print $11 }' | cut -c 1-3)
MD_DEV=$(cat /proc/mdstat | grep md |grep -v bitmap | awk '{ print $1 }')
DATE=$(date +%c)
MAIL_ADM="root@host.com"
# Add "header"
echo "*********** ${DATE} ***********"
/bin/ls /dev/raid-backup* >>/dev/null 2>&1
res=$?
if [ ${res} -ne 0 ]
then
echo "Drive not connected - nothing to do"
echo "Exiting"
exit
fi
echo "---------------------------------------"
/bin/cat /proc/mdstat
echo "---------------------------------------"
RAID_STATE=$(/sbin/mdadm --misc -t --detail /dev/${MD_DEV} >/dev/null)
res=$?
if [ ${res} -ne 0 ]
then
echo "The array is not yet clean (rebuild still in progress)"
echo "Rescheduling and will try again in 15 minutes"
/usr/bin/at -f /usr/local/bin/drop-device now + 15 minutes >> /var/log/drop-device.log 2>&1
exit 1
fi
/sbin/service backuppc stop
res=$?
if [ ${res} -ne 0 ]
then
echo "Unable to stop BackupPC service - Exiting"
exit 1
fi
sync ; sync ; sync
umount /var/lib/BackupPC
res=$?
if [ ${res} -ne 0 ]
then
echo "Failed to unmount /var/lib/BackupPC"
echo "Restarting service and quitting"
service backuppc start
exit 1
fi
/sbin/mdadm /dev/${MD_DEV} --fail /dev/${DEV}1
res=$?
if [ ${res} -ne 0 ]
then
echo "Failed to remove /dev/${DEV}1 from /dev/${MD_DEV}"
sleep 5
echo "Retrying...."
/sbin/mdadm /dev/${MD_DEV} --fail /dev/${DEV}1
res=$?
if [ ${res} -ne 0 ]
then
echo "Removal of /dev/${DEV}1 from /dev/${MD_DEV} failed" | mail -s "Failed mirror operation" ${MAIL_ADM}
exit 1
fi
fi
sleep 15
/sbin/mdadm /dev/${MD_DEV} --remove /dev/${DEV}1
res=$?
if [ ${res} -ne 0 ]
then
echo "Failed to hotplug /dev/${DEV}1 from /dev/${MD_DEV}"
sleep 5
echo "Retrying...."
/sbin/mdadm /dev/${MD_DEV} --remove /dev/${DEV}1
res=$?
if [ ${res} -ne 0 ]
then
echo "Still no luck on the hotplug operation"
echo "giving up and quitting"
echo "Unplug of /dev/${DEV}1 from /dev/${MD_DEV} failed" | mail -s "Failed mirror operation" ${MAIL_ADM}
exit 1
fi
fi
/bin/mount /dev/${MD_DEV} /var/lib/BackupPC
res=$?
if [ ${res} -ne 0 ]
then
echo "Unable to mount /dev/${MD_DEV} to /var/lib/BackupPC"
exit 1
else
/sbin/service backuppc start
fi
# Remove the block device from the system
echo 1 > /sys/block/${DEV}/device/delete
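The `ls -l | awk | cut` pipeline that sets DEV at the top of this script depends on the column layout of `ls` and on three-letter device names. A slightly more robust sketch, assuming classic sdX naming where the whole-disk name contains no digits, resolves the symlink with `readlink -f` instead. The function name `parent_disk` is made up for illustration.

```shell
# parent_disk LINK   (hypothetical helper)
# Prints the whole-disk kernel name (e.g. sdc) behind a partition
# symlink such as /dev/raid-backup0-p1 -> sdc1.
parent_disk() {
    local target name
    target=$(readlink -f "$1") || return 1
    name=${target##*/}           # strip the directory part: sdc1
    echo "${name%%[0-9]*}"       # drop everything from the first digit: sdc
}
```

In the script this would replace the DEV assignment with something like `DEV=$(parent_disk /dev/raid-backup?-p1)`, which still assumes only one backup drive is connected at a time.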
The cron entry for the two scripts:
# sudo crontab -l
45 05 * * 1-5 /usr/local/bin/drop-device >> /var/log/drop-device.log 2>&1
0,15,30,45 17-23 * * 1-5 /usr/local/bin/add-device >> /var/log/add-device.log 2>&1