Two drives fail at once? Oh yeah…

Just when I thought I was all cool for having a sixteen-drive NAS, I opened it up today to try a new network card (it did not fit), and the next power-up brought bad news.

 > dmesg | grep ata | grep error:
[   23.223221] ata13.00: error: { ABRT }
[   23.234448] ata13.00: error: { ABRT }
[   31.262674] ata13.00: error: { ABRT }
[   31.275241] ata13.00: error: { ABRT }
[   31.288012] ata13.00: error: { ABRT }
[   39.073802] ata13.00: error: { ABRT }
[   50.815339] ata13.00: error: { ABRT }
[   50.827082] ata13.00: error: { ABRT }
[   57.606645] ata13.00: error: { ABRT }
[   69.616356] ata7.00: error: { ABRT }
[   69.616451] ata13.00: error: { ABRT }
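
A quick way to see which ports are complaining, and how often, is to tally the error lines per ATA port. The sketch below is only an illustration: the function name `count_ata_errors` is made up for this post, and it assumes the kernel log lines look like the ones above (`ataN.NN: error:`).

```shell
# Hypothetical helper: count ATA error lines per port.
# Reads kernel log text on stdin; assumes lines shaped like
#   [   23.223221] ata13.00: error: { ABRT }
count_ata_errors() {
    grep -E 'ata[0-9]+\.[0-9]+: error:' \
        | sed -E 's/.*(ata[0-9]+)\.[0-9]+: error:.*/\1/' \
        | sort | uniq -c | sort -rn
}

# Usage:
#   dmesg | count_ata_errors
```

Run against the log above, this would report ten errors for ata13 and one for ata7.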

That’s two drives failing. TWO at the same time! And look at this:

 > zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 4h43m with 0 errors on Sat Sep  6 00:13:22 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G9PDPC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G9SBBC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G6GMGC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G95REC  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G9LH9C  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G95JPC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G6LUDC  ONLINE       0     0     0
            ata-Hitachi_HTS547575A9E384_J2190059G5PXYC  ONLINE       0     0     0
          raidz1-2                                      ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_X3EJSVUOS            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_X3EJSVUNS            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT11T            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT17T            ONLINE       0     0     0
          raidz1-3                                      ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT12T            ONLINE       0     0     0
            ata-TOSHIBA_MQ01ABD050_933PTT13T            ONLINE       0     0     2
            ata-TOSHIBA_MQ01ABD050_933PTT14T            ONLINE       0     0     2
            ata-TOSHIBA_MQ01ABD050_933PTT0ZT            ONLINE       0     0     0
        logs
          ata-OCZ-AGILITY4_OCZ-77Z13FI634825PNW-part5   ONLINE       0     0     0
        cache
          ata-OCZ-AGILITY4_OCZ-77Z13FI634825PNW-part6   ONLINE       0     0     0

errors: No known data errors
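
With sixteen drives it is easy to miss the nonzero counters in that wall of zeros. Here is a small sketch that pulls out only the devices with errors. The function name `zpool_error_devices` is hypothetical, and it assumes the five-column layout shown above with plain integer counters; `zpool status` can abbreviate large counts (for example with a `K` suffix), which this simple filter would skip.

```shell
# Hypothetical helper: list devices whose READ/WRITE/CKSUM counters are nonzero.
# Reads `zpool status` text on stdin; assumes rows of the form
#   NAME STATE READ WRITE CKSUM   with plain integer counters.
zpool_error_devices() {
    awk 'NF == 5 &&
         $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ &&
         ($3 + $4 + $5) > 0 { print $1, $3, $4, $5 }'
}

# Usage:
#   zpool status -v | zpool_error_devices
```

On the output above, this would print just the two Toshiba drives in raidz1-3, each with a checksum count of 2.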

Two checksum errors in the same raidz1 vdev, which is ZFS’s single-parity take on RAID 5. That’s going to be a very tricky replacement, because a raidz1 vdev can only survive losing one disk. I think I’m going to either replace one disk at a time and hope each resilver completes, or maybe add a PCI controller back in there, build a second pool, and migrate the data from one pool to the other? That’ll be a wild trick.

It will frack up my backups for a while, that’s for sure. Oh, and those Toshiba drives? That’s three Toshiba failures, zero Hitachi failures.
