Author: jedreynolds

About jedreynolds

A programmer who has adopted an all-weather bicycle-commuting lifestyle.

Two drives fail at once? Oh yeah…

Just as I thought I was all cool for having a sixteen-drive NAS, today I opened it up to try a new network card (it did not fit), and the next power-up greeted me with bad news.

 > dmesg | grep ata | grep error:
[   23.223221] ata13.00: error: { ABRT }
[   23.234448] ata13.00: error: { ABRT }
[   31.262674] ata13.00: error: { ABRT }
[   31.275241] ata13.00: error: { ABRT }
[   31.288012] ata13.00: error: { ABRT }
[   39.073802] ata13.00: error: { ABRT }
[   50.815339] ata13.00: error: { ABRT }
[   50.827082] ata13.00: error: { ABRT }
[   57.606645] ata13.00: error: { ABRT }
[   69.616356] ata7.00: error: { ABRT }
[   69.616451] ata13.00: error: { ABRT }
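
For anyone following along: the /sys/block symlinks encode which ATA port each sdX device hangs off of, so a quick check like the one below (illustrative, not from my original notes) tells you which physical disks ata7 and ata13 actually are.

 > ls -l /sys/block | grep -E 'ata(7|13)/'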

That’s two drives failing. TWO at the same time! …and look at this:

 > zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 4h43m with 0 errors on Sat Sep  6 00:13:22 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G9PDPC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G9SBBC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G6GMGC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G95REC  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G9LH9C  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G95JPC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G6LUDC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G5PXYC  ONLINE       0     0     0
          raidz1-2                                      ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_X3EJSVUOS            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_X3EJSVUNS            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT11T            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT17T            ONLINE       0     0     0
          raidz1-3                                      ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT12T            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT13T            ONLINE       0     0     2
            ata-TOSHIBA_MQ01ABD050_933PTT14T            ONLINE       0     0     2
            ata-TOSHIBA_MQ01ABD050_933PTT0ZT            ONLINE       0     0     0
        logs
          ata-OCZ-AGILITY4_OCZ-77Z13FI634825PNW-part5   ONLINE       0     0     0
        cache
          ata-OCZ-AGILITY4_OCZ-77Z13FI634825PNW-part6   ONLINE       0     0     0

errors: No known data errors

Two drives with checksum errors in the same raidz1 vdev. That’s going to be a very tricky replacement. I think I’m going to either replace one disk at a time and hope for the best resilvering odds, or maybe…add a PCI controller back in there, hang a new set of drives off it, and migrate the data over? That’ll be a wild trick.
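
If I go the one-disk-at-a-time route, the dance is roughly the standard one. This is just a sketch, and the replacement disk name below is a placeholder, not a drive I actually have on hand:

 > zpool clear tank                 # reset the error counters
 > zpool scrub tank                 # see whether the CKSUM counts climb again
 > zpool replace tank ata-TOSHIBA_MQ01ABD050_933PTT13T <new-disk-id>   # <new-disk-id> is a placeholder
 > zpool status tank                # wait for the resilver to finish before touching the second drive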

It will frack up my backups for a while, that’s for sure. Oh, and those Toshiba drives? That’s three Toshiba failures so far, and zero Hitachi failures.

ZFS on Linux machine

Beavertail Cactus [Wikipedia]

Here is my ZFS on Linux story, and some of you might have seen these pictures when I started this project last year: I recycled an old Athlon system from work and put a 35-watt AMD A2 processor with 8GB of 1600MHz RAM on an Asus mobo in it. I name my home systems after cacti, and after I installed Ubuntu 12.04 on it, I named this one Beavertail.bitratchet.net.

My previous experiences with storage systems involved them dying from heat, so I decided to avoid full-sized drives, stick with laptop drives, and boot off an SSD. The SSD is partitioned with /boot, root, and two more partitions for the ZIL and L2ARC. The bulk of the storage is a mix of 750GB Hitachi and 500GB Toshiba laptop hard drives, 16 in total. I have lost two drives in this system, which I would label “normal drive attrition.” The boot drive is a 128GB OCZ Vertex 2.
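
For reference, hanging the ZIL and L2ARC partitions off the pool is a one-liner each; something like this would produce the logs and cache sections you see in the zpool status output below (a sketch, assuming the pool already exists as tank):

 > zpool add tank log ata-OCZ-AGILITY4_OCZ-77Z13FI634825PNW-part5
 > zpool add tank cache ata-OCZ-AGILITY4_OCZ-77Z13FI634825PNW-part6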


Half the drives are on the bottom, and half are on top. At work I have access to gobs of full-height card brackets and this is what I built drive cages out of.

To get all the drives wired up, I started with a bunch of 1x and 2x PCIe SATA expanders and used up all my mobo SATA ports, but by the time I got to about 12 drives I only had a PCI slot left, so I had to use that. Looking at my disk utilization in iostat -Nx and dstat --disk-util, it was plain that I had a swath of drives underperforming, and they were all connected to the slow PCI controller.
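
The telltale is in the extended stats; something like the two commands below (the five-second interval is arbitrary) shows per-drive %util and await, and the drives stuck behind a slow controller stand out with much higher numbers than the rest:

 > iostat -Nx 5
 > dstat --disk-util 5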

Supermicro HBA 8-port SAS controllers

I saved up and remedied that by purchasing two SuperMicro SAS HBAs with Marvell chipsets. They are only 3G SATA (equivalent), but they each control eight drives, and they do so consistently. They each take eight PCIe lanes, which is a great use for the two 16x PCIe slots on the mobo.

02:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev c3)
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

It took me a while to track down my network bandwidth issues. The problem was my motherboard: it has an onboard Realtek chipset. It would max out at 500Mbps download and 250Mbps upload…and very often wedge the system. I got a PCIe 1x Intel card, and I got a good clean 955Mbps both ways out of that with one iperf stream, and 985+Mbps with two iperf streams. To actually achieve this, I needed to put an Intel NIC in my workstation as well. (My switch is a 16-port unmanaged Zyxel.)
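
The numbers above came from plain iperf runs between the two machines; a minimal sketch of the test, with the NAS hostname shortened to beavertail:

 > iperf -s                        # on beavertail
 > iperf -c beavertail -P 2        # on the workstation, two parallel streams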

Eight drives on top

I am able to push close to full network capacity to Beavertail, and the results speak for themselves: the screenie below shows iftop displaying better than 880Mbps, and I saw it grab 910Mbps during this backup. Part of the success is having a Samsung 840 EVO in my laptop, but having a stripe of four raidz1 vdevs clearly leaves plenty of I/O headroom.

910Mbps transfer from laptop to NAS (iftop screen capture).
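
If you want to see how that load spreads across the four raidz1 vdevs while a transfer is running, zpool iostat gives a per-vdev breakdown (again, the five-second interval is arbitrary):

 > zpool iostat -v tank 5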

 

Here are some other nerdy stats, mostly on how my drives are arranged:

 > zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 4h43m with 0 errors on Sat Sep  6 00:13:22 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G9PDPC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G9SBBC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G6GMGC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G95REC  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G9LH9C  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G95JPC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G6LUDC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G5PXYC  ONLINE       0     0     0
          raidz1-2                                      ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_X3EJSVUOS            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_X3EJSVUNS            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT11T            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT17T            ONLINE       0     0     0
          raidz1-3                                      ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT12T            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT13T            ONLINE       0     0     2
            ata-TOSHIBA_MQ01ABD050_933PTT14T            ONLINE       0     0     2
            ata-TOSHIBA_MQ01ABD050_933PTT0ZT            ONLINE       0     0     0
        logs
          ata-OCZ-AGILITY4_OCZ-77Z13FI634825PNW-part5   ONLINE       0     0     0
        cache
          ata-OCZ-AGILITY4_OCZ-77Z13FI634825PNW-part6   ONLINE       0     0     0

errors: No known data errors

And to finish up, this system has withstood a series of in-place Ubuntu upgrades and is now running 14.04. My advice on this, and on Linux kernels, is:

  • Do not rush to install new mainline kernels; you have to wait for the DKMS and SPL packages to sync up with mainline and for updates to go out through the ubuntu-zfs PPA.
  • If you do a dist-upgrade and reboot, and your zpool does not come back, this is easily fixed by reinstalling ubuntu-zfs: apt-get install --reinstall ubuntu-zfs. That re-links your kernel modules and you should be good to go (see the sketch just after this list).
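
For completeness, the recovery sequence looks roughly like this (a sketch, assuming the pool is named tank as above):

 > apt-get install --reinstall ubuntu-zfs
 > modprobe zfs
 > lsmod | grep zfs                # confirm the module actually loaded
 > zpool import tank               # only needed if the pool did not auto-import
 > zpool status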

Like I said, this has been working across four releases of Ubuntu for me, along with replacing controllers and drives. My only complaint is that doing sequences of small-file operations on it tends to bring the speed way down (or it has in the past; I have not reproduced it on 14.04 yet). But for streaming large files, I get massive throughput…which is great for my large photo collection!

I have to go to Florida to get to Ferndale

From Bellingham, is there really no peering point between Qwest’s backbone and Comcast? It seems preposterous that it takes me 102ms to ping something 11 miles away. I wish I could brag about biking that fast. But at least I do not have to bike to Florida to get to Ferndale:

traceroute to firewall.candelatech.com (70.89.124.249), 30 hops max, 60 byte packets
1  gateway (192.168.45.1)  0.951 ms  0.703 ms  0.590 ms
2  tukw-dsl-gw66.tukw.qwest.net (63.231.10.66)  23.762 ms  23.282 ms  22.912 ms
3  tukw-agw1.inet.qwest.net (71.217.186.9)  22.525 ms  22.756 ms  22.992 ms
4  nap-edge-04.inet.qwest.net (67.14.29.166)  104.071 ms  104.336 ms  104.139 ms
5  65.122.166.78 (65.122.166.78)  105.126 ms  105.112 ms  106.764 ms
6  be-10-cr01.miami.fl.ibone.comcast.net (68.86.82.114)  106.921 ms be-13-cr01.miami.fl.ibone.comcast.net (68.86.82.126)  105.262 ms  104.893 ms
7  be-15-cr01.ashburn.va.ibone.comcast.net (68.86.84.221)  106.538 ms  107.032 ms  106.481 ms
8  he-0-12-0-0-cr01.losangeles.ca.ibone.comcast.net (68.86.86.117)  111.232 ms  110.836 ms  110.693 ms
9  he-2-8-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.86.97)  111.653 ms  109.401 ms  109.044 ms
10  68.86.93.30 (68.86.93.30)  106.729 ms  106.942 ms  107.967 ms
11  be-41-sur02.ferndale.wa.seattle.comcast.net (69.139.164.30)  110.153 ms  110.399 ms  110.225 ms
12  te-1-0-0-ten01.ferndale.wa.seattle.comcast.net (68.87.206.242)  116.914 ms  117.572 ms  117.213 ms
13  c-50-135-136-13.hsd1.wa.comcast.net (50.135.136.13)  127.260 ms  154.153 ms  157.281 ms
14  * * *

 

Pedal cleats


These cleats have been with me for over a year, and they got worn smooth from walking on them. The previous pair I left in for two years, and I had to drill one of them out.

Advice for cleats: use some white lithium grease on the bolts when you install them, and use a long-handled hex wrench or ratchet to tighten them. When removing them, drip on some light oil like TriFlow to work into the seams, and wait at least ten minutes for the oil to soak in. Take something sharp like an awl, a pocket knife, or the tip of a new drywall screw to dig all the crap out of the bolt head. Even after that prep, you might not be able to fit your hex bit in; next, try a Torx bit of the same size. The wear might have chewed up the inside of the bolt head, but if you can mallet a Torx bit in there, it should grip long enough to back the bolt out with a ratchet. Otherwise you will want to go to the screw-extractor bit in your drill.

Lesson: use that white lithium grease first when installing new bolts!