When Secure Boot is enabled, the OS is stuck in an infinite boot loop

Hey Guys

We wanted to test the Secure Boot feature on balenaOS (v3.0.15) using a Maxtang EHL-35 motherboard with AMI BIOS v2.22.1282. It has an Intel J6412 CPU with a built-in TPM 2.0 chip (firmware version 600.15).

  • We reset the BIOS and entered Secure Boot setup mode
  • We inserted the USB drive, booted from it, and waited a minute or two (watching the cloud dashboard) for the system to copy all files to the SSD
  • The installer correctly shut down the system (all LEDs off)
  • We restarted the machine, set the boot device to SSD UEFI, and it is stuck in the “Post Provisioning” state

It keeps rebooting after the “Welcome to GRUB” text. It looks like Secure Boot itself is working, but there might be a problem mounting the LUKS root partition. With Secure Boot enabled in the BIOS, the boot process successfully gets to GRUB, so the signatures are probably okay: when we tried resetting the keys in the BIOS, it correctly threw an incorrect-signature error on boot.

We followed this guide:

Here are things we have tried:

  • Without Secure Boot (--secureBoot), the OS image works perfectly
  • We tried it with Prod and Dev images as well
  • We tried the first boot in the BIOS with Secure Boot enabled and disabled
  • The boot order in the BIOS is clean: all boot options are disabled except the first, which is set to USB UEFI; after the shutdown we set it to SSD UEFI.

One interesting thing we noticed: on the first boot the installer creates a device in the fleet, then something happens, the machine reboots, the installer starts again and creates another device (the one that actually gets installed). This all happens by itself. The system then shuts down for first boot. Only the development image does this; the production image creates only one device.

Is there any way to get more verbose error messages to help further the investigation?

Thank you.

UPDATE: I tried everything with v3.1.3 OS version, with the same results.

UPDATE2: Here are a few screenshots from the BIOS.

Hi Peter,

Thanks for all of the detail. I have installed balenaOS with Secure Boot & Full Disk Encryption (SB & FDE) on many different types of x86 hardware. Most of the challenges we have seen so far have been with finding the right BIOS settings for provisioning, and these do vary from one type of device to another.

But in this case, I agree with your synopsis that the provisioning seems to have gone well, except for the creation of the extra zombie device. I don’t have a suggestion for you yet. But we will look at it further and keep you posted.

BTW, smart move to try resetting the keys in the BIOS and finding that it correctly threw the incorrect signature error. That’s informative, and rules out some potential causes.

thanks for the reply @rosswesleyporter

is there any way to get some debug messages from the OS? is there any way to at least figure out what goes wrong?

Hi,

Thanks for your interest in the Secure Boot feature. You seem to be doing everything correctly, and your suspicion that LUKS fails to unlock also seems correct. Though uncommon, we have seen similar behavior on devices where the PCR values change between reboots even though nothing else has changed on the device. We can quite easily test whether this is the case with your device as well: the procedure is to provision an unencrypted OS without Secure Boot on the device and reboot it a few times, watching whether the PCR values change in between.

If you want us to perform the test, all we need from you is to provision the device and share support access. If you prefer running the test yourself, I can put together a testing application that you can deploy to the device. Let us know which of the two you prefer.
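
For a rough idea of what such a PCR check could look like, here is a minimal Python sketch. It is not the actual balena testing application, only an illustration under a few assumptions: that the kernel exposes the PCR banks through sysfs at /sys/class/tpm/tpm0/pcr-sha256/ (recent Linux kernels do, older ones may not), and that the snapshot is stored somewhere that survives reboots. The function names and the snapshot file are made up for this example.

```python
import json
from pathlib import Path

# Assumption: the kernel exposes SHA-256 PCR values here (one file per PCR index).
PCR_DIR = Path("/sys/class/tpm/tpm0/pcr-sha256")


def read_pcrs(pcr_dir=PCR_DIR, indices=range(24)):
    """Return {index: hex digest} for every PCR file found in pcr_dir."""
    values = {}
    for index in indices:
        pcr_file = Path(pcr_dir) / str(index)
        if pcr_file.is_file():
            values[index] = pcr_file.read_text().strip().upper()
    return values


def diff_pcrs(previous, current):
    """Return {index: (old, new)} for every PCR whose value differs."""
    changed = {}
    for index in sorted(set(previous) | set(current)):
        if previous.get(index) != current.get(index):
            changed[index] = (previous.get(index), current.get(index))
    return changed


def check_against_snapshot(snapshot_path, pcr_dir=PCR_DIR):
    """Compare current PCRs with the saved snapshot, then overwrite the snapshot.

    Returns an empty dict on the first run (no snapshot yet) or when nothing changed.
    """
    snapshot_path = Path(snapshot_path)
    current = read_pcrs(pcr_dir)
    changed = {}
    if snapshot_path.exists():
        # JSON object keys are strings, so convert them back to integers.
        stored = json.loads(snapshot_path.read_text())
        previous = {int(key): value for key, value in stored.items()}
        changed = diff_pcrs(previous, current)
    snapshot_path.write_text(json.dumps(current))
    return changed
```

Calling check_against_snapshot() with a path on persistent storage once after every reboot, and printing whatever it returns, would show which PCR indices (if any) differ from the previous boot. An empty result across several reboots would suggest the PCR values are stable on this board.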