Some nodes failed to boot after kernel batch upgrade

Categories:Linux

Today a friend upgraded the kernel on a batch of nodes, and a small number of them failed to boot after the upgrade.

He checked the affected systems, reinstalled the newly installed kernel, and rebooted them; after that we could get into the systems without any issue.

I recalled that I had resolved similar issues during the last round of maintenance. At that time I did not figure out what caused them, but this time I would not let it go.

So, what is the difference between the upgrade run by our script and the manual upgrade?

The messages log files are always our best friend in such situations, and I extracted the relevant parts of both here:

Upgrade by our script:

Mar  9 07:00:54 wxxxxx dracut: Executing: /sbin/dracut -f /boot/initramfs-3.10.0-1062.12.1.el7.x86_64.img 3.10.0-1062.12.1.el7.x86_64
Mar  9 07:00:55 wxxxxx dracut: dracut module 'busybox' will not be installed, because command 'busybox' could not be found!
Mar  9 07:00:55 wxxxxx dracut: dracut module 'plymouth' will not be installed, because command 'plymouthd' could not be found!
Mar  9 07:00:55 wxxxxx dracut: dracut module 'plymouth' will not be installed, because command 'plymouth' could not be found!
Mar  9 07:00:55 wxxxxx dracut: dracut module 'btrfs' will not be installed, because command 'btrfs' could not be found!
Mar  9 07:00:55 wxxxxx dracut: dracut module 'dmsquash-live-ntfs' will not be installed, because command 'ntfs-3g' could not be found!
Mar  9 07:00:55 wxxxxx dracut: dracut module 'lvm' will not be installed, because command 'lvm' could not be found!
Mar  9 07:00:55 wxxxxx dracut: dracut module 'multipath' will not be installed, because command 'multipath' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'cifs' will not be installed, because command 'mount.cifs' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'iscsi' will not be installed, because command 'iscsistart' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'iscsi' will not be installed, because command 'iscsi-iname' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'biosdevname' will not be installed, because command 'biosdevname' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'busybox' will not be installed, because command 'busybox' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'btrfs' will not be installed, because command 'btrfs' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'dmsquash-live-ntfs' will not be installed, because command 'ntfs-3g' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'lvm' will not be installed, because command 'lvm' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'multipath' will not be installed, because command 'multipath' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'cifs' will not be installed, because command 'mount.cifs' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'iscsi' will not be installed, because command 'iscsistart' could not be found!
Mar  9 07:00:56 wxxxxx dracut: dracut module 'iscsi' will not be installed, because command 'iscsi-iname' could not be found!
Mar  9 07:00:56 wxxxxx dracut: *** Including module: bash ***
Mar  9 07:00:56 wxxxxx dracut: *** Including module: nss-softokn ***
Mar  9 07:00:56 wxxxxx dracut: *** Including module: i18n ***
Mar  9 07:00:58 wxxxxx dracut: *** Including module: network ***
Mar  9 07:00:59 wxxxxx dracut: *** Including module: ifcfg ***
Mar  9 07:00:59 wxxxxx dracut: *** Including module: kernel-modules ***
Mar  9 07:01:01 wxxxxx systemd: Started Session 47237 of user root.
Mar  9 07:01:06 wxxxxx dracut: *** Including module: resume ***
Mar  9 07:01:06 wxxxxx dracut: *** Including module: rootfs-block ***
Mar  9 07:01:06 wxxxxx dracut: *** Including module: terminfo ***
Mar  9 07:01:06 wxxxxx dracut: *** Including module: udev-rules ***
Mar  9 07:01:06 wxxxxx dracut: Skipping udev rule: 40-redhat-cpu-hotplug.rules
Mar  9 07:01:06 wxxxxx dracut: Skipping udev rule: 91-permissions.rules
Mar  9 07:01:06 wxxxxx dracut: *** Including module: systemd ***
Mar  9 07:01:07 wxxxxx dracut: *** Including module: usrmount ***
Mar  9 07:01:07 wxxxxx dracut: *** Including module: base ***
Mar  9 07:01:07 wxxxxx dracut: *** Including module: fs-lib ***
Mar  9 07:01:07 wxxxxx dracut: *** Including module: microcode_ctl-fw_dir_override ***
Mar  9 07:01:07 wxxxxx dracut:  microcode_ctl module: mangling fw_dir
Mar  9 07:01:07 wxxxxx dracut:    microcode_ctl: reset fw_dir to "/lib/firmware/updates /lib/firmware"
Mar  9 07:01:07 wxxxxx dracut:    microcode_ctl: processing data directory  "/usr/share/microcode_ctl/ucode_with_caveats/intel"...
Mar  9 07:01:07 wxxxxx dracut:    microcode_ctl: intel: Host-Only mode is enabled and ucode name does not match the expected one, skipping caveat ("06-25-01" not in " intel-ucode/*")
Mar  9 07:01:07 wxxxxx dracut:    microcode_ctl: processing data directory  "/usr/share/microcode_ctl/ucode_with_caveats/intel-06-2d-07"...
Mar  9 07:01:07 wxxxxx dracut:    microcode_ctl: kernel version "3.10.0-1062.12.1.el7.x86_64" failed early load check for "intel-06-2d-07", skipping
Mar  9 07:01:07 wxxxxx dracut:    microcode_ctl: processing data directory  "/usr/share/microcode_ctl/ucode_with_caveats/intel-06-4f-01"...
Mar  9 07:01:07 wxxxxx dracut:    microcode_ctl: kernel version "3.10.0-1062.12.1.el7.x86_64" failed early load check for "intel-06-4f-01", skipping
Mar  9 07:01:07 wxxxxx dracut:    microcode_ctl: processing data directory  "/usr/share/microcode_ctl/ucode_with_caveats/intel-06-55-04"...
Mar  9 07:01:07 wxxxxx dracut:    microcode_ctl: kernel version "3.10.0-1062.12.1.el7.x86_64" failed early load check for "intel-06-55-04", skipping
Mar  9 07:01:07 wxxxxx dracut:    microcode_ctl: final fw_dir: "/lib/firmware/updates /lib/firmware"
Mar  9 07:01:07 wxxxxx dracut: *** Including module: shutdown ***
Mar  9 07:01:07 wxxxxx dracut: *** Including modules done ***
Mar  9 07:01:07 wxxxxx dracut: *** Installing kernel module dependencies and firmware ***
Mar  9 07:01:07 wxxxxx dracut: *** Installing kernel module dependencies and firmware done ***
Mar  9 07:01:07 wxxxxx dracut: *** Resolving executable dependencies ***
Mar  9 07:01:09 wxxxxx dracut: *** Resolving executable dependencies done***
Mar  9 07:01:09 wxxxxx dracut: *** Hardlinking files ***
Mar  9 07:01:09 wxxxxx dracut: *** Hardlinking files done ***
Mar  9 07:01:09 wxxxxx dracut: *** Stripping files ***
Mar  9 07:01:09 wxxxxx dracut: *** Stripping files done ***
Mar  9 07:01:09 wxxxxx dracut: *** Generating early-microcode cpio image contents ***
Mar  9 07:01:09 wxxxxx dracut: *** Constructing GenuineIntel.bin ****
Mar  9 07:01:09 wxxxxx systemd: Started Session 47238 of user root.
Mar  9 07:01:09 wxxxxx systemd-logind: New session 47238 of user root.
Mar  9 07:01:09 wxxxxx dracut: *** No early-microcode cpio image needed ***
Mar  9 07:01:09 wxxxxx dracut: *** Store current command line parameters ***
Mar  9 07:01:09 wxxxxx dracut: *** Creating image file ***
Mar  9 07:01:10 wxxxxx systemd: Stopping Authorization Manager...
Mar  9 07:01:10 wxxxxx systemd: Stopping Session 47232 of user root.
Mar  9 07:01:10 wxxxxx systemd: Stopped target IBM Tivoli Monitoring for "/opt/IBM/ITM".
Mar  9 07:01:10 wxxxxx systemd: Stopped target Multi-User System.
Mar  9 07:01:10 wxxxxx systemd: Stopping Job spooling tools...
Mar  9 07:01:10 wxxxxx systemd: Stopping Metricbeat service...
Mar  9 07:01:10 wxxxxx systemd: Stopped ei firstboot service.

Upgrade manually:

Mar  9 10:18:49 wxxxxx dracut: Executing: /sbin/dracut -f /boot/initramfs-3.10.0-1062.12.1.el7.x86_64.img 3.10.0-1062.12.1.el7.x86_64
Mar  9 10:18:50 wxxxxx dracut: dracut module 'busybox' will not be installed, because command 'busybox' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'plymouth' will not be installed, because command 'plymouthd' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'plymouth' will not be installed, because command 'plymouth' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'btrfs' will not be installed, because command 'btrfs' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'dmsquash-live-ntfs' will not be installed, because command 'ntfs-3g' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'lvm' will not be installed, because command 'lvm' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'multipath' will not be installed, because command 'multipath' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'cifs' will not be installed, because command 'mount.cifs' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'iscsi' will not be installed, because command 'iscsistart' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'iscsi' will not be installed, because command 'iscsi-iname' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'biosdevname' will not be installed, because command 'biosdevname' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'busybox' will not be installed, because command 'busybox' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'btrfs' will not be installed, because command 'btrfs' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'dmsquash-live-ntfs' will not be installed, because command 'ntfs-3g' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'lvm' will not be installed, because command 'lvm' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'multipath' will not be installed, because command 'multipath' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'cifs' will not be installed, because command 'mount.cifs' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'iscsi' will not be installed, because command 'iscsistart' could not be found!
Mar  9 10:18:50 wxxxxx dracut: dracut module 'iscsi' will not be installed, because command 'iscsi-iname' could not be found!
Mar  9 10:18:50 wxxxxx dracut: *** Including module: bash ***
Mar  9 10:18:50 wxxxxx dracut: *** Including module: nss-softokn ***
Mar  9 10:18:50 wxxxxx dracut: *** Including module: i18n ***
Mar  9 10:18:51 wxxxxx dracut: *** Including module: network ***
Mar  9 10:18:52 wxxxxx dracut: *** Including module: ifcfg ***
Mar  9 10:18:52 wxxxxx dracut: *** Including module: kernel-modules ***
Mar  9 10:18:58 wxxxxx dracut: *** Including module: resume ***
Mar  9 10:18:58 wxxxxx dracut: *** Including module: rootfs-block ***
Mar  9 10:18:58 wxxxxx dracut: *** Including module: terminfo ***
Mar  9 10:18:58 wxxxxx dracut: *** Including module: udev-rules ***
Mar  9 10:18:58 wxxxxx dracut: Skipping udev rule: 40-redhat-cpu-hotplug.rules
Mar  9 10:18:58 wxxxxx dracut: Skipping udev rule: 91-permissions.rules
Mar  9 10:18:58 wxxxxx dracut: *** Including module: systemd ***
Mar  9 10:18:59 wxxxxx dracut: *** Including module: usrmount ***
Mar  9 10:18:59 wxxxxx dracut: *** Including module: base ***
Mar  9 10:18:59 wxxxxx dracut: *** Including module: fs-lib ***
Mar  9 10:18:59 wxxxxx dracut: *** Including module: microcode_ctl-fw_dir_override ***
Mar  9 10:18:59 wxxxxx dracut:  microcode_ctl module: mangling fw_dir
Mar  9 10:18:59 wxxxxx dracut:    microcode_ctl: reset fw_dir to "/lib/firmware/updates /lib/firmware"
Mar  9 10:18:59 wxxxxx dracut:    microcode_ctl: processing data directory  "/usr/share/microcode_ctl/ucode_with_caveats/intel"...
Mar  9 10:18:59 wxxxxx dracut:    microcode_ctl: intel: Host-Only mode is enabled and ucode name does not match the expected one, skipping caveat ("06-25-01" not in " intel-ucode/*")
Mar  9 10:18:59 wxxxxx dracut:    microcode_ctl: processing data directory  "/usr/share/microcode_ctl/ucode_with_caveats/intel-06-2d-07"...
Mar  9 10:18:59 wxxxxx dracut:    microcode_ctl: kernel version "3.10.0-1062.12.1.el7.x86_64" failed early load check for "intel-06-2d-07", skipping
Mar  9 10:18:59 wxxxxx dracut:    microcode_ctl: processing data directory  "/usr/share/microcode_ctl/ucode_with_caveats/intel-06-4f-01"...
Mar  9 10:18:59 wxxxxx dracut:    microcode_ctl: kernel version "3.10.0-1062.12.1.el7.x86_64" failed early load check for "intel-06-4f-01", skipping
Mar  9 10:18:59 wxxxxx dracut:    microcode_ctl: processing data directory  "/usr/share/microcode_ctl/ucode_with_caveats/intel-06-55-04"...
Mar  9 10:18:59 wxxxxx dracut:    microcode_ctl: kernel version "3.10.0-1062.12.1.el7.x86_64" failed early load check for "intel-06-55-04", skipping
Mar  9 10:18:59 wxxxxx dracut:    microcode_ctl: final fw_dir: "/lib/firmware/updates /lib/firmware"
Mar  9 10:18:59 wxxxxx dracut: *** Including module: shutdown ***
Mar  9 10:18:59 wxxxxx dracut: *** Including modules done ***
Mar  9 10:18:59 wxxxxx dracut: *** Installing kernel module dependencies and firmware ***
Mar  9 10:18:59 wxxxxx dracut: *** Installing kernel module dependencies and firmware done ***
Mar  9 10:18:59 wxxxxx dracut: *** Resolving executable dependencies ***
Mar  9 10:19:01 wxxxxx dracut: *** Resolving executable dependencies done***
Mar  9 10:19:01 wxxxxx dracut: *** Hardlinking files ***
Mar  9 10:19:01 wxxxxx dracut: *** Hardlinking files done ***
Mar  9 10:19:01 wxxxxx dracut: *** Stripping files ***
Mar  9 10:19:01 wxxxxx dracut: *** Stripping files done ***
Mar  9 10:19:01 wxxxxx dracut: *** Generating early-microcode cpio image contents ***
Mar  9 10:19:01 wxxxxx dracut: *** Constructing GenuineIntel.bin ****
Mar  9 10:19:01 wxxxxx dracut: *** No early-microcode cpio image needed ***
Mar  9 10:19:01 wxxxxx dracut: *** Store current command line parameters ***
Mar  9 10:19:01 wxxxxx dracut: *** Creating image file ***
Mar  9 10:19:08 wxxxxx dracut: *** Creating image file done ***
Mar  9 10:19:14 wxxxxx dracut: *** Creating initramfs image file '/boot/initramfs-3.10.0-1062.12.1.el7.x86_64.img' done ***
Mar  9 10:19:54 wxxxxx systemd: Stopped target IBM Tivoli Monitoring for "/opt/IBM/ITM".
Mar  9 10:19:54 wxxxxx systemd: Stopped target rpc_pipefs.target.
Mar  9 10:19:54 wxxxxx systemd: Stopped target RPC Port Mapper.
Mar  9 10:19:54 wxxxxx systemd: Stopping Authorization Manager...
Mar  9 10:19:54 wxxxxx systemd: Stopped Dump dmesg to /var/log/dmesg.
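
With logs this long, comparing them by eye is error-prone. Here is a minimal sketch of one way to do the comparison, assuming both excerpts are available as lists of lines in the syslog format shown above. The function names are my own, not part of any tool: the code strips the timestamp and host name, keeps only the dracut messages, and lists the messages that the manual run logged but the scripted run never did.

```python
import re

# Syslog prefix as it appears in the excerpts: "Mar  9 07:00:54 wxxxxx dracut: ..."
# Group 1 is the program name, group 2 is the message text.
SYSLOG_RE = re.compile(r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+([^:\s]+):\s?(.*)$")


def dracut_messages(lines):
    """Return the dracut messages in order, without timestamp and host name."""
    messages = []
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match and match.group(1) == "dracut":
            messages.append(match.group(2).strip())
    return messages


def missing_from(scripted_lines, manual_lines):
    """Messages that the manual run logged but the scripted run never logged."""
    seen = set(dracut_messages(scripted_lines))
    return [m for m in dracut_messages(manual_lines) if m not in seen]


# Shortened excerpts taken from the two logs above.
scripted = [
    "Mar  9 07:01:09 wxxxxx dracut: *** Creating image file ***",
    "Mar  9 07:01:10 wxxxxx systemd: Stopping Authorization Manager...",
]
manual = [
    "Mar  9 10:19:01 wxxxxx dracut: *** Creating image file ***",
    "Mar  9 10:19:08 wxxxxx dracut: *** Creating image file done ***",
    "Mar  9 10:19:14 wxxxxx dracut: *** Creating initramfs image file "
    "'/boot/initramfs-3.10.0-1062.12.1.el7.x86_64.img' done ***",
]

for message in missing_from(scripted, manual):
    print(message)
```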

So it is clear that the log from the scripted upgrade is missing the following line:

*** Creating initramfs image file '/boot/initramfs-3.10.0-1062.12.1.el7.x86_64.img' done ***
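
This marker can also be checked programmatically. The following is only a sketch, assuming the messages log has already been read into a list of lines and that dracut logs in the format shown in the excerpts; the function name is hypothetical. It reports success only if the most recent dracut run for a given image logged its final "done" line.

```python
def initramfs_build_finished(log_lines, image_path):
    """Return True if the latest dracut run for image_path logged its final 'done' line."""
    done_marker = f"*** Creating initramfs image file '{image_path}' done ***"
    seen_start = False
    finished = False
    for line in log_lines:
        if "dracut: Executing:" in line and image_path in line:
            # A new run started, so any earlier 'done' line no longer counts.
            seen_start = True
            finished = False
        elif seen_start and done_marker in line:
            finished = True
    return finished


image = "/boot/initramfs-3.10.0-1062.12.1.el7.x86_64.img"

# Shortened versions of the two excerpts above.
scripted_log = [
    f"Mar  9 07:00:54 wxxxxx dracut: Executing: /sbin/dracut -f {image} 3.10.0-1062.12.1.el7.x86_64",
    "Mar  9 07:01:09 wxxxxx dracut: *** Creating image file ***",
    "Mar  9 07:01:10 wxxxxx systemd: Stopping Authorization Manager...",
]
manual_log = [
    f"Mar  9 10:18:49 wxxxxx dracut: Executing: /sbin/dracut -f {image} 3.10.0-1062.12.1.el7.x86_64",
    "Mar  9 10:19:01 wxxxxx dracut: *** Creating image file ***",
    f"Mar  9 10:19:14 wxxxxx dracut: *** Creating initramfs image file '{image}' done ***",
]

print(initramfs_build_finished(scripted_log, image))  # False
print(initramfs_build_finished(manual_log, image))    # True
```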

The reason is that we rebooted the nodes too quickly, before dracut had finished writing the initramfs images. In the scripted run, systemd began stopping services at 07:01:10, one second after dracut logged "Creating image file"; in the manual run, that same step was not done until 10:19:14, about 13 seconds after it started.

We upgrade the managed nodes with scripts so that we can upgrade hundreds of nodes at the same time, but we must confirm that the script has finished and that the new kernel's initramfs image has been generated before we reboot them.

The case itself is really simple. What I want to share is that we should not just run something and assume it worked; we have to check the results. This simple habit will improve the stability of our scripts dramatically with acceptable effort.
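
As a rough illustration of such a check, here is a sketch of a guard that an upgrade script could call before rebooting. It is not the script we actually use. It assumes that `pgrep` is available on the node and that the dracut process is named exactly `dracut`; the function names, the timeout and the polling interval are all hypothetical. Note that a non-empty file is only a necessary condition, not proof of a complete image, so it makes sense to combine this with a check of the messages log for the final "done" line.

```python
import os
import subprocess
import time


def dracut_running():
    """Return True if a process named exactly 'dracut' is running (assumes pgrep exists)."""
    # pgrep exits with status 0 when at least one process matches.
    result = subprocess.run(["pgrep", "-x", "dracut"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def wait_until_safe_to_reboot(image_path, timeout=600.0, poll=5.0, is_busy=dracut_running):
    """Wait until dracut is idle and image_path is a non-empty file.

    Returns True when both conditions hold, or False if they still do not
    hold once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if not is_busy() and os.path.isfile(image_path) and os.path.getsize(image_path) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Example use in an upgrade script (not executed here):
#
#   image = "/boot/initramfs-3.10.0-1062.12.1.el7.x86_64.img"
#   if not wait_until_safe_to_reboot(image):
#       raise SystemExit("initramfs image not ready, refusing to reboot")
```

The `is_busy` parameter exists so that the process check can be replaced, for example on systems without `pgrep` or when testing the function.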
