Monday, March 5, 2012

Debian Squeeze upgrade problem with mdadm

If you follow chapter 4 of the Debian release notes, everything should go fine until you upgrade the kernel and udev.  Because of big changes in both, you will get a warning while the initramfs is generated: your mdadm devices do not have entries in mdadm.conf, and you should compare the output of /usr/share/mdadm/mkconf to /etc/mdadm/mdadm.conf.  The problem is that the mdadm.conf with the wrong entries is the copy made for the initramfs, not the one in /etc/mdadm/, so comparing the two as instructed will show the same UUIDs.  I don't know what creates the config file for the initramfs, but it appears to use the newer style of generating the last half of the UUID by hashing the hostname rather than scanning the superblocks for the actual UUID.

So, if you ignore the warning because the output of mkconf matches the contents of /etc/mdadm/mdadm.conf and then reboot, you will find yourself at a busybox prompt.  The kernel can't find the root filesystem because the initramfs could not assemble the md array that contains the LVM partition holding the root filesystem.
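Since the misleading part is that the two files the warning names agree with each other, it can help to compare the UUIDs in /etc/mdadm/mdadm.conf against the copy inside the initramfs instead.  This is only a rough sketch: the function names are mine, and it assumes you have already unpacked the initramfs somewhere (for example with zcat and cpio -id into a scratch directory) and that its copy lives at etc/mdadm/mdadm.conf inside it.

```shell
#!/bin/sh
# Print the UUID of every ARRAY line in an mdadm.conf-style file,
# lowercased and sorted so two files can be compared directly.
md_uuids() {
    awk '$1 == "ARRAY" {
        for (i = 2; i <= NF; i++)
            if (tolower($i) ~ /^uuid=/) print tolower(substr($i, 6))
    }' "$1" | sort
}

# Succeed only when both files name the same set of array UUIDs.
same_md_uuids() {
    [ "$(md_uuids "$1")" = "$(md_uuids "$2")" ]
}
```

With the initramfs unpacked under a hypothetical /tmp/initrd, running same_md_uuids /etc/mdadm/mdadm.conf /tmp/initrd/etc/mdadm/mdadm.conf || echo "initramfs copy differs" would flag the mismatch before you reboot.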

I don't know how to avoid this, but I do know how to fix it.  I had written down the md devices and their UUIDs in case I ran into trouble, so that helped.  I did not run `script` during the upgrade process, but its output would only have been useful for failure analysis.  Anyway, at the busybox prompt, edit /etc/mdadm/mdadm.conf so that it has the correct UUID entries for your devices.  Save the file and then run:

mdadm --assemble --scan

Check /dev to see that it is populated with your md devices, e.g., /dev/md0, /dev/md1, etc.  Then activate the LVM volume groups:

vgchange -a y

Now you should have access to the root filesystem and can type 'exit' at the prompt to continue the boot process.
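The recovery above only works if you know the real UUIDs.  Here is a minimal sketch for capturing them before the upgrade, assuming you feed it the ARRAY lines that mdadm --detail --scan prints.  The function name and the output file in the usage note are made up for illustration.

```shell
#!/bin/sh
# Turn "ARRAY /dev/md0 ... UUID=..." lines on stdin into
# "device uuid" pairs, one array per line.
md_uuid_table() {
    awk '$1 == "ARRAY" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^UUID=/) print $2, substr($i, 6)
    }'
}
```

Something like mdadm --detail --scan | md_uuid_table > /root/md-uuids.txt (hypothetical path), plus a copy on paper or another machine, gives you what you need at the busybox prompt.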

Once the system is up and running again, continue the upgrade process from the release notes.  I had to run apt-get upgrade twice because I got dpkg warnings about some packages not being installed/configured due to errors in post-installation scripts, loops between services, and the like.  I've seen this before and usually it will clean itself up once you run apt-get upgrade a second time.

Now if you check /etc/mdadm/mdadm.conf, you'll see that it lists the UUIDs that the initramfs was trying to use.  Set these to the correct UUIDs and regenerate the initramfs so that it can boot correctly in the future:

update-initramfs -u
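If you'd rather not hand-edit the file, the UUID fix can be scripted.  This is just a sketch under my own assumptions: set_md_uuid is a made-up helper, it expects the usual one-line "ARRAY device ... UUID=..." entries, and it keeps a .bak copy next to the file in case it mangles something.

```shell
#!/bin/sh
# Rewrite the UUID on the ARRAY line for one device, keeping a .bak copy.
# Usage: set_md_uuid CONF DEVICE UUID
set_md_uuid() {
    conf=$1 dev=$2 uuid=$3
    cp "$conf" "$conf.bak" &&
    awk -v dev="$dev" -v uuid="$uuid" '
        # Only touch the ARRAY line that names the requested device.
        $1 == "ARRAY" && $2 == dev {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^UUID=/) $i = "UUID=" uuid
        }
        # Every line, changed or not, is written back out.
        { print }
    ' "$conf.bak" > "$conf"
}
```

Call it once per array, e.g. set_md_uuid /etc/mdadm/mdadm.conf /dev/md0 followed by the UUID you wrote down, and then run update-initramfs -u as above.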

You'll notice that section 4.5 of the release notes doesn't list the mdadm gotcha.  If you google for it, you'll see similar issues going back to when Squeeze was still in Testing.
Some mdadm/busybox and lvm recovery info.

Even without knowing which script gets the UUIDs wrong, it may be possible to unpack the initramfs and correct its copy of mdadm.conf before the first reboot, avoiding the busybox business altogether.
