Tuesday, 9 April 2013

Replacing a failed Software RAID-1 disk on Ubuntu 12.04

Preparation:

Figure out which disk is broken

In my case this was easy, as the server is hosted, and the hosting company ran a hardware check for me after a string of random failures where the server suddenly died. They reported the following:
The requested check is finished now. We found problems with the drive with SN Z1F0WA1P so please confirm the complete loss of data on it and tell us when we may shutdown the server to change it.

Nice!
Luckily I had set up RAID-1, so the "complete loss of data" is not really a big problem.
I found the device ID of the failed disk like this:
$ ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0V6DG -> ../../sda
lrwxrwxrwx 1 root root 10 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0V6DG-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0V6DG-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0V6DG-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0V6DG-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0V6DG-part5 -> ../../sda5
lrwxrwxrwx 1 root root  9 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0WA1P -> ../../sdb
lrwxrwxrwx 1 root root 10 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0WA1P-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0WA1P-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0WA1P-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0WA1P-part4 -> ../../sdb4
lrwxrwxrwx 1 root root 10 Apr  7 09:55 ata-ST3000DM001-9YN166_Z1F0WA1P-part5 -> ../../sdb5
lrwxrwxrwx 1 root root  9 Apr  7 09:55 md-name-rescue:0 -> ../../md0
lrwxrwxrwx 1 root root  9 Apr  7 09:55 md-name-rescue:1 -> ../../md1
lrwxrwxrwx 1 root root  9 Apr  7 09:55 md-name-rescue:2 -> ../../md2
lrwxrwxrwx 1 root root  9 Apr  7 09:55 md-name-rescue:3 -> ../../md3
lrwxrwxrwx 1 root root  9 Apr  7 09:55 md-uuid-3422630c:f91897fa:b2465508:b6c1fad3 -> ../../md2
lrwxrwxrwx 1 root root  9 Apr  7 09:55 md-uuid-4e8d9f09:0ab6c8ff:977c0a64:12f6f62f -> ../../md0
lrwxrwxrwx 1 root root  9 Apr  7 09:55 md-uuid-be237c26:441c6f5a:e57904b4:32859c0e -> ../../md3
lrwxrwxrwx 1 root root  9 Apr  7 09:55 md-uuid-e3fa8ae1:b83c78d6:2653a380:509bce21 -> ../../md1
lrwxrwxrwx 1 root root  9 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0V6DG -> ../../sda
lrwxrwxrwx 1 root root 10 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0V6DG-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0V6DG-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0V6DG-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0V6DG-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0V6DG-part5 -> ../../sda5
lrwxrwxrwx 1 root root  9 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0WA1P -> ../../sdb
lrwxrwxrwx 1 root root 10 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0WA1P-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0WA1P-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0WA1P-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0WA1P-part4 -> ../../sdb4
lrwxrwxrwx 1 root root 10 Apr  7 09:55 scsi-SATA_ST3000DM001-9YN_Z1F0WA1P-part5 -> ../../sdb5
lrwxrwxrwx 1 root root  9 Apr  7 09:55 wwn-0x5000c5004de65feb -> ../../sda
lrwxrwxrwx 1 root root 10 Apr  7 09:55 wwn-0x5000c5004de65feb-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Apr  7 09:55 wwn-0x5000c5004de65feb-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Apr  7 09:55 wwn-0x5000c5004de65feb-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Apr  7 09:55 wwn-0x5000c5004de65feb-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Apr  7 09:55 wwn-0x5000c5004de65feb-part5 -> ../../sda5
lrwxrwxrwx 1 root root  9 Apr  7 09:55 wwn-0x5000c5004df84d19 -> ../../sdb
lrwxrwxrwx 1 root root 10 Apr  7 09:55 wwn-0x5000c5004df84d19-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Apr  7 09:55 wwn-0x5000c5004df84d19-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Apr  7 09:55 wwn-0x5000c5004df84d19-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Apr  7 09:55 wwn-0x5000c5004df84d19-part4 -> ../../sdb4
lrwxrwxrwx 1 root root 10 Apr  7 09:55 wwn-0x5000c5004df84d19-part5 -> ../../sdb5

The serial number from the hosting company's report (Z1F0WA1P) shows up in the listing above, so now we know it's sdb that has failed.
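If the listing is long, you can narrow it down by grepping for the reported serial number. You can also cross-check the serial against what the drive itself reports, assuming the smartmontools package is installed and the drive still responds:

$ ls -l /dev/disk/by-id | grep Z1F0WA1P
$ sudo smartctl -i /dev/sdb | grep -i 'serial number'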

Determining what tools we can use

A lot of guides suggest using sfdisk to set up the blank replacement disk, specifically to copy the partition table so the two disks end up identical:
sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
However, this doesn't work when the disk uses a GPT partition table, which is unavoidable for disks larger than 2 TB. You can check like this:


$ sudo fdisk -l /dev/sdb

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes
256 heads, 63 sectors/track, 363376 cylinders, total 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System

/dev/sdb1               1  4294967295  2147483647+  ee  GPT
Partition 1 does not start on physical sector boundary. 

So in this case we apparently have to use Parted instead. Recreating the partitions manually isn't particularly attractive, though, and after a bit of research I found that sgdisk can do the copying for us (see below for the exact procedure).
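If you just want to confirm the partition table type without wading through fdisk's warning, parted will report it directly; look for a "Partition Table: gpt" line in its output (checking the healthy disk here):

$ sudo parted /dev/sda print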

Checking the current array status

$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
md3 : active raid1 sda4[0] sdb4[1]
      1822442815 blocks super 1.2 [2/2] [UU]
     
md0 : active raid1 sda1[0] sdb1[1]
      33553336 blocks super 1.2 [2/2] [UU]
     
md2 : active raid1 sda3[0] sdb3[1]
      1073740664 blocks super 1.2 [2/2] [UU]
     
md1 : active raid1 sda2[0] sdb2[1]
      524276 blocks super 1.2 [2/2] [UU]


unused devices: <none>

Notice that the arrays appear to be functioning correctly (the UU status means both members are in sync). We know one of the underlying disks has failed, so we need to change this status ourselves. This output is also useful for mapping the various md devices to the corresponding sd devices, such as md0 consisting of sda1 and sdb1.
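If /proc/mdstat feels too terse, mdadm can show the same membership per array, along with state, UUID and event counters:

$ sudo mdadm --detail /dev/md0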

Disabling failed disk in RAID Array

Marking defective disk as failed

$ sudo mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
$ sudo mdadm --manage /dev/md1 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md1

$ sudo mdadm --manage /dev/md2 --fail /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md2
$ sudo mdadm --manage /dev/md3 --fail /dev/sdb4
mdadm: set /dev/sdb4 faulty in /dev/md3
$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
md3 : active raid1 sda4[0] sdb4[1](F)
      1822442815 blocks super 1.2 [2/1] [U_]
     
md0 : active raid1 sda1[0] sdb1[1](F)
      33553336 blocks super 1.2 [2/1] [U_]
     
md2 : active raid1 sda3[0] sdb3[1](F)
      1073740664 blocks super 1.2 [2/1] [U_]
     
md1 : active raid1 sda2[0] sdb2[1](F)
      524276 blocks super 1.2 [2/1] [U_]
     
unused devices: <none>


Now we can see that all the sdb partitions have been marked as failed (F), so we go on to "disconnect" them from their arrays:

Removing failed disk from the RAID-1 array

$ sudo mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1 from /dev/md0
$ sudo mdadm --manage /dev/md1 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md1

$ sudo mdadm --manage /dev/md2 --remove /dev/sdb3
mdadm: hot removed /dev/sdb3 from /dev/md2
$ sudo mdadm --manage /dev/md3 --remove /dev/sdb4
mdadm: hot removed /dev/sdb4 from /dev/md3
$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
md3 : active raid1 sda4[0]
      1822442815 blocks super 1.2 [2/1] [U_]
     
md0 : active raid1 sda1[0]
      33553336 blocks super 1.2 [2/1] [U_]
     
md2 : active raid1 sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]
     
md1 : active raid1 sda2[0]
      524276 blocks super 1.2 [2/1] [U_]
     
unused devices: <none>


Notice that no sdb partitions are listed as array members any more. The failed disk is now removed from all the arrays, awaiting a fresh disk to replace it. The system is of course still running as if nothing had happened.
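With four arrays there's a fair bit of typing involved, so the same fail-and-remove sequence can be expressed as a small loop. This is just a sketch assuming the md-to-partition mapping shown above - adjust the pairs to your own layout before running it:

# Sketch only: fail and remove each sdb member from its array.
# The md/partition pairs below match this particular server's layout.
for pair in md0:sdb1 md1:sdb2 md2:sdb3 md3:sdb4; do
    md=${pair%%:*}
    part=${pair##*:}
    sudo mdadm --manage /dev/$md --fail   /dev/$part
    sudo mdadm --manage /dev/$md --remove /dev/$part
done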

Replace Physical Disk

Shut down the server, replace the defective disk with the new one, and start it up again. The new disk needs to be at least as large as the one it replaces, otherwise we can't rebuild the array onto it.

Reassemble the RAID Array

Determine that the new disk is detected

$ ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Apr  9 00:18 /dev/sda
brw-rw---- 1 root disk 8,  1 Apr  9 00:18 /dev/sda1
brw-rw---- 1 root disk 8,  2 Apr  9 00:18 /dev/sda2
brw-rw---- 1 root disk 8,  3 Apr  9 00:18 /dev/sda3
brw-rw---- 1 root disk 8,  4 Apr  9 00:18 /dev/sda4
brw-rw---- 1 root disk 8,  5 Apr  9 00:18 /dev/sda5
brw-rw---- 1 root disk 8, 16 Apr  9 00:18 /dev/sdb


Yep, it's there.
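lsblk gives a slightly friendlier view of the same thing, showing sizes and any partitions in a tree (assuming lsblk is available; it ships with util-linux on recent Ubuntu releases):

$ lsblk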

Determine that the new disk is the same size as the existing one

$ sudo sgdisk -p /dev/sda
Disk /dev/sda: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 4E334AAB-CB03-40A8-B37D-34071D0CF623
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            4096        67112959   32.0 GiB    FD00
   2        67112960        68161535   512.0 MiB   FD00
   3        68161536      2215645183   1024.0 GiB  FD00
   4      2215645184      5860533134   1.7 TiB     FD00
   5            2048            4095   1024.0 KiB  EF02

$ sudo sgdisk -p /dev/sdb
Creating new GPT entries.
Disk /dev/sdb: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): A035888B-9E42-4470-8316-ECE9796D1245
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 5860533101 sectors (2.7 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name


Perfect!
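As an alternative quick check of the raw capacities, blockdev can print the size of each disk in bytes - both should report 3000592982016 here:

$ sudo blockdev --getsize64 /dev/sda
$ sudo blockdev --getsize64 /dev/sdb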

Prepare new disk

Copy partition table from the existing disk in the array

As mentioned above, most guides I found on the net suggested using
sfdisk -d /dev/sda | sfdisk /dev/sdb
...but because these disks are 3TB and use GUID Partition Table (GPT), we have to use sgdisk instead.
The following command copies the partition table from sda to sdb. Make sure you don't get the two mixed up! To be on the safe side, you can back up the partition table first (sgdisk-partition-table-sda is the file name it is backed up to):
$ sudo sgdisk -b sgdisk-partition-table-sda /dev/sda
The operation has completed successfully.
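If the copy ever goes in the wrong direction, that backup can be written back with sgdisk's load-backup option (shown here restoring sda's saved table to sda):

$ sudo sgdisk --load-backup=sgdisk-partition-table-sda /dev/sda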


Now, copy the partition table:
$ sudo sgdisk --replicate=/dev/sdb /dev/sda
The operation has completed successfully.

Let's check sdb now:
$ sudo sgdisk -p /dev/sdb
Disk /dev/sdb: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 4E334AAB-CB03-40A8-B37D-34071D0CF623
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            4096        67112959   32.0 GiB    FD00 
   2        67112960        68161535   512.0 MiB   FD00 
   3        68161536      2215645183   1024.0 GiB  FD00 
   4      2215645184      5860533134   1.7 TiB     FD00 
   5            2048            4095   1024.0 KiB  EF02 

Neat! A perfect copy - even the GUID is the same...

Randomize the GUIDs

Because we copied the partition table of the existing disk (sda), the new disk (sdb) is now identical, including the supposedly unique GUIDs (compare the disk identifier GUID in the output above with the one of sda higher up on the page). To make the new disk usable in the same system as the other disk, we need to generate new GUIDs.
$ sudo sgdisk -G /dev/sdb
The operation has completed successfully.


$ sudo sgdisk -p /dev/sdb
Disk /dev/sdb: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 3B456525-A844-4227-876D-4B6367C2724F
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            4096        67112959   32.0 GiB    FD00
   2        67112960        68161535   512.0 MiB   FD00
   3        68161536      2215645183   1024.0 GiB  FD00
   4      2215645184      5860533134   1.7 TiB     FD00
   5            2048            4095   1024.0 KiB  EF02 
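The -G option randomizes the individual partition GUIDs as well as the disk GUID; to verify a single partition, you can print its details with sgdisk's info option and compare the "Partition unique GUID" line against sda's:

$ sudo sgdisk -i 1 /dev/sdb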

Add the new disk to the RAID array

First, make sure you know which partition to add to which RAID array:
$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
md2 : active raid1 sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]
    
md0 : active raid1 sda1[0]
      33553336 blocks super 1.2 [2/1] [U_]
    
md1 : active raid1 sda2[0]
      524276 blocks super 1.2 [2/1] [U_]
    
md3 : active raid1 sda4[0]
      1822442815 blocks super 1.2 [2/1] [U_]


unused devices: <none>

Then simply add the sdb partitions back:
$ sudo mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
$ sudo mdadm --manage /dev/md1 --add /dev/sdb2
mdadm: added /dev/sdb2
$ sudo mdadm --manage /dev/md2 --add /dev/sdb3
mdadm: added /dev/sdb3
$ sudo mdadm --manage /dev/md3 --add /dev/sdb4
mdadm: added /dev/sdb4

Check the progress of synchronizing the disks

$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
md2 : active raid1 sdb3[2] sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]
          resync=DELAYED
     
md0 : active raid1 sdb1[2] sda1[0]
      33553336 blocks super 1.2 [2/1] [U_]
      [===>.................]  recovery = 15.4% (5172992/33553336) finish=4.2min speed=110776K/sec
     
md1 : active raid1 sdb2[2] sda2[0]
      524276 blocks super 1.2 [2/1] [U_]
          resync=DELAYED
     
md3 : active raid1 sdb4[2] sda4[0]
      1822442815 blocks super 1.2 [2/1] [U_]
          resync=DELAYED


unused devices: <none>

Here we can see that the first array (md0) is already rebuilding, while the others are waiting their turn (resync=DELAYED). I have a feeling this will take a while!
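Rather than re-running cat by hand, you can leave a watch running to follow the rebuild (Ctrl-C stops it):

$ watch -n 10 cat /proc/mdstat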
This is what it looks like when it's done (after quite a few hours!):
$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
md2 : active raid1 sdb3[2] sda3[0]
      1073740664 blocks super 1.2 [2/2] [UU]
     
md0 : active raid1 sdb1[2] sda1[0]
      33553336 blocks super 1.2 [2/2] [UU]
     
md1 : active raid1 sdb2[2] sda2[0]
      524276 blocks super 1.2 [2/2] [UU]
     
md3 : active raid1 sdb4[2] sda4[0]
      1822442815 blocks super 1.2 [2/2] [UU]
     
unused devices: <none>

That's it!

6 comments:

  1. Many thanks Ronny!
    Great info, which just saved me lots of time googling.

  2. No problem, Martin, and thanks for the comment. It's always nice to know that I saved someone time by writing this down - that's the main reason for writing it in the first place.

  3. Many thanks Ronny!
    It is good to have engineering scripts like yours.
    Our production server is 'up and running' again - good to see how strong Linux can be!

  4. Thanks for your write-up, but I need more help with my problem: I have a new disk and managed to copy and synchronize it with the old disk, but when I put only the new disk in my server, it doesn't boot. How can I fix this?

  5. Hello Ronny

    sudo sgdisk -R=/dev/sdb /dev/sda won't work at all... but

    sudo sgdisk -R/dev/sdb /dev/sda works... just without the = sign

    Thanks

    Replies
    1. Thanks for alerting me about the error, Vladan. I've changed it to the long form --replicate=/dev/sdb as it makes it a bit more obvious which disk you're replicating.
