When Secure Boot is enabled, OS is stuck in an infinite boot loop

Hey Guys

We wanted to test the Secure Boot feature on balenaOS (v3.0.15) using a Maxtang EHL-35 motherboard with AMI BIOS v2.22.1282. It has an Intel J6412 CPU with a built-in TPM 2.0 chip (firmware version 600.15).

  • We reset the BIOS and entered Secure Boot setup mode
  • We inserted the USB drive, booted from it, and waited a minute or two (watching the cloud dashboard) for the system to copy all files to the SSD
  • The installer correctly shut down the system (all LEDs were off)
  • We restarted the machine and set the boot device to SSD UEFI, and now it is stuck in a “Post Provisioning state”

It keeps rebooting after the “Welcome to GRUB” text. It looks like the Secure Boot feature is working, but there might be a problem mounting the LUKS root partition. With Secure Boot enabled in the BIOS, the boot process successfully gets to GRUB, so the signatures are probably okay: when we tried resetting the keys in the BIOS, it correctly threw an incorrect signature error on boot.

We followed this guide:

Here are things we have tried:

  • Without Secure Boot (--secureBoot), the OS image works perfectly
  • We tried it with Prod and Dev images as well
  • We tried the first boot in the BIOS with Secure Boot enabled and disabled
  • In the BIOS the boot order is clean: all boot options are disabled except the first one, which is set to USB UEFI. After the shutdown we set it to SSD UEFI.

An interesting thing we noticed: on the first boot the installer creates a device in the fleet, something happens, the system reboots, and the installer starts again and creates another device (the one that will actually be installed). It does all of this by itself. Then the system shuts down for the first boot. Only the development image does this; the production image creates only one device.

Is there any way to get more verbose error messages to help further the investigation?

Thank you.

UPDATE: I tried everything again with OS version v3.1.3, with the same results.

UPDATE2: Here are a few screenshots from the BIOS.

Hi Peter,

Thanks for all of the detail. I have installed balenaOS with Secure Boot & Full Disk Encryption (SB & FDE) on many different types of x86 hardware. Most of the challenges we have seen so far have been with finding the right BIOS settings for provisioning, and these do vary from one type of device to another.

But in this case, I agree with your synopsis that the provisioning seems to have gone well, except for the creation of the extra zombie device. I don’t have a suggestion for you yet. But we will look at it further and keep you posted.

BTW, smart move to try resetting the keys in the BIOS and finding that it correctly threw the incorrect signature error. That’s informative, and rules out some potential causes.

thanks for the reply @rosswesleyporter

is there any way to get some debug messages from the OS? is there any way to at least figure out what goes wrong?

Hi,

thanks for your interest in the secure boot feature. You seem to be doing everything correctly, and your thought about failing to unlock LUKS also seems correct. Though uncommon, we have seen similar behavior on devices where the PCR values change between reboots even though nothing else has changed on the device. We can quite easily test whether this is the case with your device as well. The procedure is to provision an unencrypted, non-secure-boot OS on the device and reboot it a few times, watching whether the PCR values change in between.

If you want us to perform the test, all we need from you is to provision the device and share support access. If you prefer running the test yourself, I can put together a testing application that you can deploy to the device. Let us know which of the two you prefer.
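For anyone who wants to do the comparison by hand, here is a minimal sketch of the check described above. It assumes you have saved a text dump of the PCR values after each reboot, with one `index : 0xHEX` line per register (roughly the layout printed by `tpm2_pcrread` from tpm2-tools) and a single hash bank per dump. The function names are made up for illustration; this is not the testing application mentioned above.

```python
import re

# Matches lines such as "    1 : 0x3D45..." (assumed tpm2_pcrread-style layout).
# Header lines like "sha256:" do not match and are skipped.
PCR_LINE = re.compile(r"^\s*(\d+)\s*:\s*(?:0x)?([0-9A-Fa-f]+)\s*$")


def parse_pcr_dump(text):
    """Return {pcr_index: hex_digest} for every 'index : value' line in text."""
    pcrs = {}
    for line in text.splitlines():
        match = PCR_LINE.match(line)
        if match:
            pcrs[int(match.group(1))] = match.group(2).lower()
    return pcrs


def changed_pcrs(before, after):
    """Return the sorted PCR indices whose values differ between two dumps."""
    a, b = parse_pcr_dump(before), parse_pcr_dump(after)
    # Only compare registers that are present in both dumps.
    return sorted(i for i in a.keys() & b.keys() if a[i] != b[i])
```

Feed it the contents of two dumps taken on consecutive boots. If the returned list is non-empty even though nothing was changed on the device, the device is likely affected by the unstable-PCR behavior.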

Hi,

Is there a workaround for setting up secure boot if the device does change PCR values on reboot? I’m looking to set up a new PC, but haven’t bought the device yet. It’s likely to be a 12th or 13th gen i7 processor. I just want to confirm I’ll be able to set up secure boot if the device turns out to have this issue.

Or better still, is there any way to identify from their specs what devices might have this issue?

Thanks

Hi,

unfortunately there is no simple way to guess this from just the specs. The good news is that we have been able to make this work with most of the 12th and 13th gen hardware available to us (the only exception being a device that came with Secure Boot enabled and provided no option to enter Setup Mode or replace the default keys). Some devices do need additional steps depending on what gets measured into PCR1 (we have seen e.g. dynamic memory voltage or CPU temperature at boot), but these are mostly set-once-and-forget-forever BIOS settings.

We realize this is not ideal and are already working on moving away from PCR1, as it is quite unreliable. If you are interested, you can follow the related GitHub PR: Seal LUKS passphrase with PCR7 by jakogut · Pull Request #3259 · balena-os/meta-balena · GitHub. This is still a work in progress, though, and we have no ETA for when it will be finished.
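To illustrate why a single varying measurement is enough to break unlocking, here is a rough sketch of how a PCR is extended. A PCR is never written directly; each new measurement is hashed into the previous value, so the final value depends on every measurement made during boot. The event strings below are hypothetical stand-ins (real firmware measures binary configuration data), and this only models the hash chain, not balenaOS or any TPM library.

```python
import hashlib


def extend(pcr, measurement):
    """TPM-style extend: new PCR = SHA-256(old PCR || SHA-256(measurement))."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()


def replay(measurements):
    """Start from an all-zero PCR and extend it with each measurement in order."""
    pcr = bytes(32)
    for m in measurements:
        pcr = extend(pcr, m)
    return pcr


# Hypothetical event data, for illustration only.
boot_a = [b"smbios-tables", b"memory-voltage=1.20V", b"boot-order"]
boot_b = [b"smbios-tables", b"memory-voltage=1.21V", b"boot-order"]

print(replay(boot_a) == replay(boot_a))  # True: identical measurements, identical PCR
print(replay(boot_a) == replay(boot_b))  # False: one differing measurement changes the PCR
```

Since the LUKS passphrase is sealed against the expected PCR value, any boot that ends up with a different value cannot unseal it, which matches the reboot loop seen here.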

In the meantime, could you maybe elaborate on your use case and why you need to have Secure Boot enabled?

Thanks, that’s very promising.

The use case is a remote deployment of a PC to act as an edge server. Because it’s remote I’m keen to ensure any proprietary data is protected.

Thanks again

Hi,

Just some feedback on the current PCR1 based secure boot option for anyone following this thread. I set up a few NUC11 devices with secure boot a few months ago. That went well at the time, but starting them up this morning, all of them went into endless reboots.

I can see there’s been good progress on a PCR7 based solution and look forward to trying it when it’s released.

Thanks