Multipath mess

Hristofor Pamyatnih
2 min read · Sep 9, 2020

A week ago we started tests of our product on SLES 12 SP5. Everything went perfectly until we decided to set up SLEHA with SAN storage. I followed the instructions, rebooted the OS, and then the OS refused to boot. The message

a start job is running for root.device

appeared on the boot screen, followed by the emergency mode prompt.

In the previous iterations of testing we used virtual machines and everything went just fine. So I tried to reproduce the issue on a VM and was quite surprised when everything passed as expected. On the affected machine it was also impossible to mount the device manually.

Let's dig.

At the beginning of my research I hadn't yet established the connection between multipath and the failure. I started by investigating why the boot and swap partitions were not being mounted.

I was really surprised when I understood how filesystem mounts are processed by systemd. In short, fstab is translated into systemd mount units by one of the systemd generators, and filesystems are mounted when these units are started. If SYSTEMD_READY=0 is set for the device in the udev database, you can't mount it. I also found another annoying quirk: when you execute mount /dev/sda1 /boot, nothing happens. The exit code is 0, no message is printed, but the partition is not mounted. Why? Because when this property is set to zero, systemd unmounts the partition immediately after the mount.
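A minimal way to see this for yourself, assuming /dev/sda1 is the boot partition as in my case:

# check what udev reports for the device; SYSTEMD_READY=0 means
# systemd will immediately unmount anything mounted from it
udevadm info --query=property --name=/dev/sda1 | grep SYSTEMD_READY

# the /boot entry from fstab becomes a boot.mount unit,
# which can be inspected like any other systemd unit
systemctl status boot.mount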

So what happened? A damn corner case. Due to some defect, the multipath daemon was setting SYSTEMD_READY=0 for all disk drives. Since we are not using LVM and are not accessing the disks through device mapper, we can't mount them. The solution: create /etc/multipath/multipath.conf and blacklist the device by WWID or by name. In my case it was:

blacklist {
    devnode "^sda[0-9]*"
}
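Blacklisting by WWID works as well. A sketch, with the ID read via scsi_id (the path to the binary may differ between distributions, and the value in the config is a placeholder to be replaced with the printed ID):

# print the WWID of the whole disk
/usr/lib/udev/scsi_id --whitelisted --device=/dev/sda

# /etc/multipath/multipath.conf
blacklist {
    wwid "<wwid printed above>"
}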

After I found the reason, I also found a SUSE KB article on the subject. Sometimes Google's result ranking plays nice tricks on you.
