December 8, 2013 by Thomas Jansson
Introduction
I have a RAID5 with 4 disks (see Rebuilding and updating my Linux NAS and HTPC server), and from the daily digest emails from the system I discovered that one of my disks had issues. I found the following in dmesg:
[ 8347.726688] ata6.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x0
[ 8347.726694] ata6.00: irq_stat 0x40000008
[ 8347.726698] ata6.00: failed command: READ FPDMA QUEUED
[ 8347.726705] ata6.00: cmd 60/08:38:78:10:00/00:00:17:00:00/40 tag 7 ncq 4096 in
[ 8347.726705]          res 41/40:00:78:10:00/00:00:17:00:00/40 Emask 0x409 (media error) <F>
[ 8347.726709] ata6.00: status: { DRDY ERR }
[ 8347.726711] ata6.00: error: { UNC }
[ 8347.731152] ata6.00: configured for UDMA/133
[ 8347.731180] sd 5:0:0:0: [sde] Unhandled sense code
[ 8347.731183] sd 5:0:0:0: [sde]
[ 8347.731185] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8347.731188] sd 5:0:0:0: [sde]
[ 8347.731190] Sense Key : Medium Error [current] [descriptor]
[ 8347.731194] Descriptor sense data with sense descriptors (in hex):
[ 8347.731195]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 8347.731204]         17 00 10 78
[ 8347.731208] sd 5:0:0:0: [sde]
[ 8347.731211] Add. Sense: Unrecovered read error - auto reallocate failed
[ 8347.731214] sd 5:0:0:0: [sde] CDB:
[ 8347.731216] Read(10): 28 00 17 00 10 78 00 00 08 00
[ 8347.731224] end_request: I/O error, dev sde, sector 385880184
[ 8347.731227] end_request: I/O error, dev sde, sector 385880184
[ 8347.731241] ata6: EH complete
[ 8348.531767] raid5_end_read_request: 2 callbacks suppressed
[ 8348.531779] md/raid:md0: read error corrected (8 sectors at 385878128 on sde1)
[ 8348.531785] md/raid:md0: read error corrected (8 sectors at 385878136 on sde1)
[ 8348.534558] md/raid:md0: read error corrected (8 sectors at 385878080 on sde1)
[ 8348.534560] md/raid:md0: read error corrected (8 sectors at 385878088 on sde1)
[ 8348.534562] md/raid:md0: read error corrected (8 sectors at 385878096 on sde1)
[ 8348.534563] md/raid:md0: read error corrected (8 sectors at 385878104 on sde1)
[ 8348.534564] md/raid:md0: read error corrected (8 sectors at 385878112 on sde1)
[20132.633534] md: md0: data-check done.
Investigating the bad drive
To further investigate the disk in question (/dev/sde), I looked at the S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) status of the sick drive:
# smartctl -i /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10-3-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD10EARS-003BB1
Serial Number:    WD-WCAV5K430328
LU WWN Device Id: 5 0014ee 2afe6f748
Firmware Version: 80.00A80
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Dec 2 22:09:37 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
This didn’t really tell me anything, so I started a “long” self-test with the following command. The long self-test takes about 2 hours; alternatively, there is a short but less thorough self-test that takes around 2 minutes:
smartctl -t long /dev/sde
The output of a self-test can be found with the following command. In my case it was clear that the drive was indeed in trouble.
# smartctl -l selftest /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10-3-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      23574        267040872
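Besides the self-test log, the SMART attribute table can give a hint of how bad things are. A quick way to check the usual suspects (the exact attribute names and the set of attributes reported vary from drive to drive):

# smartctl -A /dev/sde | grep -Ei 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'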
I ordered a 3TB WD Red disk (made especially for NAS operation) to replace it. It is much larger, and initially I will not be able to utilize the full 3TB, but once all the old 1TB disks eventually fail and have been replaced with 3TB disks, I can grow the RAID.
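For future reference, growing the array after the last 1TB disk has been swapped would go roughly like this. This is only a sketch: it assumes each 3TB disk gets a full-size partition and that the filesystem sits directly on /dev/md0 (ext4 here).

# Let md0 use the full size of the now larger member partitions
mdadm --grow /dev/md0 --size=max
# Then grow the filesystem on top of the array (assuming ext4 directly on md0)
resize2fs /dev/md0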
Removing the faulty disk
An important part of a RAID setup is the ability to cope with the failure of a disk. The enclosure I have does not support hot-swap and has no separate light for each disk, so I needed a way to find out which of the disks to replace. Finding the serial number of the disk is fairly easy:
# hdparm -i /dev/sde | grep SerialNo
 Model=WDC WD10EARS-003BB1, FwRev=80.00A80, SerialNo=WD-WCAV5K430328
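Since all the members look identical from the outside, it can also help to list the serial of every member in one go (sdb through sde here, matching the members of md0):

# Print the serial number of each RAID member to match against the stickers
for d in /dev/sd[b-e]; do
    printf '%s: ' "$d"
    hdparm -i "$d" | grep SerialNo
done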
Luckily, the Western Digital disks I have come with a small sticker showing the serial number on the disk itself. Now that I knew the serial number of the faulty disk, I marked it as failed in mdadm and removed it from the RAID before shutting down and replacing the disk:
mdadm --manage /dev/md0 --fail /dev/sde1
mdadm --manage /dev/md0 --remove /dev/sde1
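Before powering off, it does not hurt to verify that sde1 really is gone and that the array is running degraded on the remaining three disks:

# The array should now report itself as degraded (e.g. [4/3] [_UUU] in /proc/mdstat)
mdadm --detail /dev/md0
cat /proc/mdstat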
Adding the new drive
Having replaced the faulty disk with the new one, I found the serial number on the back of the new disk and compared it to the serial of /dev/sde to make sure I was about to format the right disk:
# hdparm -i /dev/sde | grep SerialNo
 Model=WDC WD30EFRX-68EUZN0, FwRev=80.00A80, SerialNo=WD-WMC4N1096166
Partitioning disks over 2TB does not work with an MSDOS partition table, so I needed to use parted (instead of fdisk) to partition the disk correctly. The “-a optimal” option makes parted use the optimum alignment as given by the disk topology information, i.e. it aligns partitions to a multiple of the physical block size in a way that guarantees optimal performance.
# parted -a optimal /dev/sde
(parted) mklabel gpt
(parted) mkpart primary 1 -1
(parted) set 1 raid on
(parted) print
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  3001GB  3001GB               primary  raid

(parted) quit
Information: You may need to update /etc/fstab.
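For reference, the same partitioning can also be done non-interactively; something along these lines should be equivalent (a sketch, not taken from the session above):

# Script-mode equivalent of the interactive parted session
parted -s -a optimal /dev/sde mklabel gpt
parted -s -a optimal /dev/sde mkpart primary 1MiB 100%
parted -s /dev/sde set 1 raid on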
Now the disk was ready for inclusion in the raid:
mdadm --manage /dev/md0 --add /dev/sde1
Over the next 3 hours I could monitor the rebuild using the following command:
[root@kelvin ~][20:43]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde1[5] sdc1[1] sdb1[3] sdd1[4]
      2930280960 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.5% (4893636/976760320) finish=176.9min speed=91536K/sec
      bitmap: 4/8 pages [16KB], 65536KB chunk

unused devices: <none>
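Rather than re-running cat by hand, the rebuild can be followed continuously, or queried through mdadm itself:

# Refresh the rebuild status every 30 seconds
watch -n 30 cat /proc/mdstat
# Or ask mdadm directly; look for the "Rebuild Status" line
mdadm --detail /dev/md0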
Monitoring health of the raid
I have several systems in place to monitor the health of my raid (among other things):
- logwatch – monitors my /var/log/messages for anything out of the ordinary and mails me the output on a daily basis.
- mdadm – mdadm will mail me if a disk has completely failed or the RAID fails for some other reason. A complete resync is done every week (example configuration lines for mdadm and smartd are shown after this list).
- smartd – I have smartd running “short” tests every night and long tests every second week. Reports are mailed to me.
- munin – graphical and historical monitoring of performance and all stats of the server.
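For reference, the mail notifications and the scheduled SMART tests boil down to a couple of configuration lines, roughly like the following. The mail address is a placeholder, and the long-test schedule (1st and 15th of each month) only approximates “every second week”:

# /etc/mdadm/mdadm.conf - where mdadm --monitor sends its alerts
MAILADDR root@localhost

# /etc/smartd.conf - short self-test every night at 02:00,
# long self-test on the 1st and 15th of each month at 03:00
/dev/sde -a -m root@localhost -s (S/../.././02|L/../(01|15)/./03)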