Crazy Times with zxfer

I’ve recently started using zxfer, which @AllanJude pointed me to. It does a nice job. My main difficulty was getting it to work efficiently over the 10Mbps that’s my effective DSL speed.

First, I made a copy of zxfer (zmxfer) that incorporates mbuffer. This is a crude hack, but helps me ensure that I’m getting around the mysterious hanging transmits I have previously seen sending zfs to zfs. Mbuffer seems to smooth this out well.

$LZFS send -i "$copyprev" "$copysrc" \
| /usr/local/bin/mbuffer -q -s 128k -m 128M \
| /usr/local/bin/mbuffer -q -s 128k -m 128M \
| $RZFS receive $option_F "$copydest" \
|| { echo "Error when zfs send/receiving."; beep; exit 1; }

My off-site transfer script ssh’s to the primary backup server, queries a list of zfs filesystems to replicate and copies that back:

rm -f /tmp/zxfer_cmds*
# marker files /tmp/xfer-N exist while a chunk is still running
if [ "$(ls /tmp/xfer-* 2>/dev/null | wc -l)" -gt 0 ] ; then
   echo "Previous transfer in progress, bye."
   exit 1
fi
# braces, not parentheses, so that exit ends the script itself
ssh -i "$SK" juno ./ || \
   { echo "Crap, didn't generate file-system list, bye."; exit 1; }
scp -i "$SK" juno:/tmp/vol_list /tmp || \
   { echo "Crap, didn't copy file-system list, bye."; exit 1; }
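
One shell detail matters for bail-out guards like these: an `exit` inside parentheses only leaves the subshell, so the calling script carries on, while an `exit` inside braces ends the script itself. A minimal sketch of the difference, using made-up function names and a child `sh` so the demo survives its own exit:

```shell
# Hypothetical demo functions, not part of the transfer script.
guard_with_parens() {
    # ( ... ) is a subshell: exit 1 ends only the subshell
    sh -c 'false || ( echo "bye"; exit 1 ); echo "still running"'
}

guard_with_braces() {
    # { ...; } runs in the current shell: exit 1 ends the script
    sh -c 'false || { echo "bye"; exit 1; }; echo "still running"'
}

guard_with_parens
guard_with_braces || echo "script stopped with status $?"
```

The first call prints both messages because the script kept going; the second stops after "bye".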

We need to turn that list of filesystems into actual transfer commands. I create a file full of the commands to execute later:

while read FS ; do
   [ -z "$FS" ] && continue
   PFS=$(dirname "$FS")
   if [ "$PFS" = "." ] ; then
      :  # top-level filesystem (no parent); the special-case handling was lost from this listing
   fi
   echo "[ ! -f /tmp/stop-xfer ] && sudo zmxfer -dFPsv \
 -O \"-i .ssh/backup_dsa ctbu@juno sudo \" \
 -N tank/$FS $PFS"
done < /tmp/vol_list > "$CMDLIST"
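
The test against `.` is there because `dirname` treats a dataset name like a path: a nested name yields its parent, and a name with no slash yields `.`. A quick check, with dataset names invented purely for illustration:

```shell
# Dataset names below are made up; substitute real entries from vol_list.
for fs in home/alice/mail home/alice home ; do
    printf '%s -> %s\n' "$fs" "$(dirname "$fs")"
done
```

The last name has no parent dataset, which is the case the loop has to treat specially.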

You might think, “what a lot of sudo!” It’s good practice. I have dedicated a backup user to do this instead of root. I’ve configured the necessary sudoers file entries to make this work.

TIP: disable requiretty in sudoers [S.O.]

We want to increase the parallelism of these zfs transfers as much as possible. The time it takes to transfer zero-length snapshots in serial is prohibitive.

L=$(wc -l < "$CMDLIST")
# lines per chunk, sized so that split produces at most eight files
Q=$(( (L + 8) / 8 ))
split -l "$Q" "$CMDLIST" "$XFPRE"
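
The chunk-size arithmetic is worth a sanity check: dividing (lines + 8) by 8 with integer division always gives a chunk size just large enough that split produces no more than eight files. A small sketch, where `chunk_size` and `chunk_count` are hypothetical helpers and not part of zxfer:

```shell
# Lines per chunk so that split never produces more than $2 files.
chunk_size() {
    echo $(( ($1 + $2) / $2 ))
}

# Number of files split would produce for $1 lines at $2 lines per file.
chunk_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

for lines in 1 7 8 20 100 ; do
    q=$(chunk_size "$lines" 8)
    echo "$lines commands: $q per chunk, $(chunk_count "$lines" "$q") chunks"
done
```

For 20 commands this gives 3 per chunk and 7 chunks; for 100 commands, 13 per chunk and 8 chunks.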

Now we run these in screen, partly because ssh, sudo and mbuffer all tend to get a bit grouchy if they can’t agree on whether they really need a tty…and mostly because I want to keep tabs on where any transfer hangups are. This keeps script output collated. First we test for a detached screen and fire one up as necessary:

screen -ls xfer | fgrep -q '.xfer' || screen -dmS xfer
sleep 1

And then we fill the screen with some commands. (We need to have a .screenrc that defines eight windows, one per chunk.)

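i=0   # screen windows are numbered from 0
for x in $XFPRE* ; do
   # each chunk removes its own marker file when it finishes
   echo "rm /tmp/xfer-$i" >> $x
   cmd="touch /tmp/xfer-$i"
   screen -S xfer -p$i -X stuff $"$cmd\n"
   screen -S xfer -p$i -X stuff $"time bash -x $x\n"
   i=$(( i + 1 ))
done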
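
The .screenrc mentioned above needs one window per chunk, which can be written as one `screen` line per window number. A small sketch that prints such lines; the `gen_screenrc` name and the `xfer0`..`xfer7` window titles are assumptions for illustration, not what the original setup necessarily used:

```shell
# Print screenrc lines that open windows 0..(n-1), each with a title.
# gen_screenrc is a hypothetical helper.
gen_screenrc() {
    n=$1
    w=0
    while [ "$w" -lt "$n" ] ; do
        echo "screen -t xfer$w $w"
        w=$(( w + 1 ))
    done
}

gen_screenrc 8
```

The output could be appended to ~/.screenrc, or kept in a separate file and passed to screen with `-c` so that other screen sessions are not affected.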

Once this script of mine is run, you can connect to the screen using:

screen -S xfer -x

And watch the scripts do their stuff. (That stuff command is actually a true screen directive: stuff $crap into terminal $p.)

I’ve been able to get my transfer time down from 140 minutes to about 14 minutes. I also reduced the scope of many of the backups I had started transferring by stopping hourly snapshots on file systems that don’t require them.