Replacing a failed disk in a mdadm RAID 5

December 8, 2013 Thomas Jansson 16 Comments

Introduction

I have a RAID5 with 4 disks, see Rebuilding and updating my Linux NAS and HTPC server, and from my daily digest emails of the system I discovered that one of my disk had issues. I found the following in dmesg:

[ 8347.726688] ata6.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x0
[ 8347.726694] ata6.00: irq_stat 0x40000008
[ 8347.726698] ata6.00: failed command: READ FPDMA QUEUED
[ 8347.726705] ata6.00: cmd 60/08:38:78:10:00/00:00:17:00:00/40 tag 7 ncq 4096 in
[ 8347.726705]          res 41/40:00:78:10:00/00:00:17:00:00/40 Emask 0x409 (media error) <F>
[ 8347.726709] ata6.00: status: { DRDY ERR }
[ 8347.726711] ata6.00: error: { UNC }
[ 8347.731152] ata6.00: configured for UDMA/133
[ 8347.731180] sd 5:0:0:0: [sde] Unhandled sense code
[ 8347.731183] sd 5:0:0:0: [sde]  
[ 8347.731185] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8347.731188] sd 5:0:0:0: [sde]  
[ 8347.731190] Sense Key : Medium Error [current] [descriptor]
[ 8347.731194] Descriptor sense data with sense descriptors (in hex):
[ 8347.731195]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[ 8347.731204]         17 00 10 78 
[ 8347.731208] sd 5:0:0:0: [sde]  
[ 8347.731211] Add. Sense: Unrecovered read error - auto reallocate failed
[ 8347.731214] sd 5:0:0:0: [sde] CDB: 
[ 8347.731216] Read(10): 28 00 17 00 10 78 00 00 08 00
[ 8347.731224] end_request: I/O error, dev sde, sector 385880184
[ 8347.731227] end_request: I/O error, dev sde, sector 385880184
[ 8347.731241] ata6: EH complete
[ 8348.531767] raid5_end_read_request: 2 callbacks suppressed
[ 8348.531779] md/raid:md0: read error corrected (8 sectors at 385878128 on sde1)
[ 8348.531785] md/raid:md0: read error corrected (8 sectors at 385878136 on sde1)
[ 8348.534558] md/raid:md0: read error corrected (8 sectors at 385878080 on sde1)
[ 8348.534560] md/raid:md0: read error corrected (8 sectors at 385878088 on sde1)
[ 8348.534562] md/raid:md0: read error corrected (8 sectors at 385878096 on sde1)
[ 8348.534563] md/raid:md0: read error corrected (8 sectors at 385878104 on sde1)
[ 8348.534564] md/raid:md0: read error corrected (8 sectors at 385878112 on sde1)
[20132.633534] md: md0: data-check done.

Investigating the bad drive

To further investigate the disk in question (/dev/sde) I looked into the S.M.A.R.T (Self-Monitoring, Analysis and Reporting Technology) status of the sick drive:

# smartctl -i /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10-3-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD10EARS-003BB1
Serial Number:    WD-WCAV5K430328
LU WWN Device Id: 5 0014ee 2afe6f748
Firmware Version: 80.00A80
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Dec  2 22:09:37 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

This didn’t really tell me anything, so I started a “long” self-test with the following command. The long self-test takes about 2 hours – alternatively there is a short, but less thorough self-test that takes around 2 minutes:

smartctl -t long /dev/sde

The output of a self-test can be found with the following command. In my case it was clear the the drive indeed was in trouble.

# smartctl -l selftest /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10-3-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     23574         267040872

I ordered a 3TB WD RED disk (especially made for NAS operations) to replace it. It is much larger and initially I will not be able to utilize the 3TB, but once all the old 1TB disks eventually fails and I have replaced them all with 3TB disks, I can grow the raid.

Removing the faulty disk

A important part of a RAID setup is the ability to cope with the failure of a faulty disk. The enclosure I have does not support hot-swap and the disk have no separate lights for each disk, so I need a way to find out which of the disks to replace. Finding the serial number of the disk is fairly easy:

# hdparm -i /dev/sde | grep SerialNo
 Model=WDC WD10EARS-003BB1, FwRev=80.00A80, SerialNo=WD-WCAV5K430328

and luckily the Western Digital disks I have came with a small sticker which shows the serial on the disk. So now I know the serial number of the faulty disk, so before shutting down and replacing the disk I marked as failed in madam and removed from the raid:

mdadm --manage /dev/md0 --fail /dev/sde1
mdadm --manage /dev/md0 --remove /dev/sde1

Adding the new drive

Having replaced the faulty disk and inserted the new disk I found the serial on the back and compared it to the serial of /dev/sde to make sure I was about to format the right disk:

# hdparm -i /dev/sde | grep SerialNo
Model=WDC WD30EFRX-68EUZN0, FwRev=80.00A80, SerialNo=WD-WMC4N1096166

Partitioning disk over 2TB does not work with MSDOS filetable so I needed to use parted (instead of fdisk to partition the disk correctly). The “-a optimal” makes parted use the optimum alignment as given by the disk topology information. This aligns to a multiple of the physical block size in a way that guarantees optimal performance.

# parted -a optimal /dev/sde 
(parted) mklabel gpt
(parted) mkpart primary 1 -1
(parted) set 1 raid on                                                    
(parted) print                                                                
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
 
Number  Start   End     Size    File system  Name     Flags
 1      1049kB  3001GB  3001GB               primary  raid
 
(parted) quit                                                             
Information: You may need to update /etc/fstab.

Now the disk was ready for inclusion in the raid:

mdadm --manage /dev/md0 --add /dev/sde1

Over the next 3 hours I could monitor the rebuild using the following command:

[root@kelvin ~][20:43]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[5] sdc1[1] sdb1[3] sdd1[4]
      2930280960 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.5% (4893636/976760320) finish=176.9min speed=91536K/sec
      bitmap: 4/8 pages [16KB], 65536KB chunk
 
unused devices: <none>

Monitoring health of the raid

I have several systems in place to monitor the health of my raid (among other things):

logwatch – monitors my /var/log/messages for anything out of the ordinary and mails me the output on a daily basis.
mdadm – mdadm will mail me if a disk has completely failed or the raid for some other reason fails. A complete resync is done every week.
smartd – I have smartd running “short” tests every night and long tests every second week. Reports are mailed to me.
munin – graphical and historical monitoring of performance and all stats of the server.