BalenaOS 2.94.4 on RPI CM4 does not boot when NVMe drive is plugged in the PCIe slot

Hi,

I’m working on a project where BalenaOS runs on a Raspberry Pi CM4 mounted on a CM4IO board.
The compute module is equipped with 32GB eMMC.
The BalenaOS version we are using is 2.94.4+rev1-v12.11.36.
We experience this problem with our own BalenaOS image and with the standard image downloaded from the website.

The goal is to have BalenaOS boot from the internal eMMC of the CM4 and have a PCIe NVMe drive for data storage.

When running Raspberry Pi OS, the OS boots nicely from the eMMC and discovers the NVMe drive automatically.

When running BalenaOS, the system only boots when there is no NVMe drive in the PCIe slot. If we install the NVMe drive, the system does not even attempt to boot: the eMMC LED does not blink, the UART console does not print any messages, and the connected HDMI screen remains black.

The CM4 is running the latest firmware, 6.1 (updated using rpi-update when Raspberry Pi OS was installed).

We have boards in our fleet running BalenaOS 2.88.4-v12.11.0 and these are not suffering from this problem. However this version of BalenaOS does not boot on the latest CM4 firmware.

Does anyone have an idea what the problem is and how to troubleshoot or resolve it?

Any help is appreciated!

Thanks!

@vanmanda

Which image type are you testing, production or development?

I am successfully booting from CM4 with eMMC with an NVMe on both BalenaOS 2.88.4 and now with BalenaOS 2.94.4+rev1. Is it possible you changed the boot order on the EEPROM on the CM4 you are testing with?

You mention the CM4 is running on the newest firmware, but I don’t think you can update the CM4 with rpi-update. My understanding is that you have to use usbboot from GitHub.

This is successfully working on both the CM4 IO board and our custom PCBA. I did have an issue with another version of BalenaOS that did something like this. It would boot but wouldn’t allow a reboot.

Hi,

Thank you for taking the time to answer my request.
I was testing with a production image.
I haven’t touched the boot order, but that could be the solution.
I’ll check into usbboot (new for me).
Can you share with me what the boot order should look like?
I’ll let you know the result.

Regards

Hi,

I set the boot order to BOOT_ORDER=0xf25641, which should be the default. Read from the rightmost digit, it corresponds to: SD, USB mass storage, NVMe, USB 2.0, and network. So the eMMC should start first.
I also set BOOT_UART=1 so that I can see some output while the system boots.
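
For anyone who wants to double-check a BOOT_ORDER value, here is a minimal Python sketch that decodes it. It assumes the boot-mode codes as I understand them from the Raspberry Pi bootloader documentation (the bootloader tries the hex digits from right to left); the function name `decode_boot_order` and the label strings are made up for illustration.

```python
# Boot-mode codes as I understand them from the Raspberry Pi bootloader
# documentation. The label strings are my own wording (assumption).
BOOT_MODES = {
    0x0: "SD card detect",
    0x1: "SD card / eMMC",
    0x2: "Network",
    0x3: "RPIBOOT",
    0x4: "USB mass storage",
    0x5: "BCM-USB-MSD (USB 2.0)",
    0x6: "NVMe",
    0x7: "HTTP",
    0xE: "Stop",
    0xF: "Restart",
}


def decode_boot_order(value: int) -> list[str]:
    """Return the boot modes in the order the bootloader tries them.

    The least significant hex digit is tried first, so the value is
    consumed one nibble at a time from the right.
    """
    modes = []
    while value:
        nibble = value & 0xF
        modes.append(BOOT_MODES.get(nibble, f"unknown (0x{nibble:x})"))
        value >>= 4
    return modes


if __name__ == "__main__":
    print(" -> ".join(decode_boot_order(0xF25641)))
```

On a CM4 with eMMC, the "SD" mode is what reads the eMMC, which is why the value above should try the eMMC first.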

Setting the boot order did not resolve the problem and this is the output I get on the serial console:

RPi: BOOTLOADER release VERSION:8ba17717 DATE: 2023/01/11 TIME: 17:40:52
BOOTMODE: 0x06 partition 0 build-ts BUILD_TIMESTAMP=1673458852 serial 69af86a7 boardrev c03141 stc 476303
PM_RSTS: 0x00001000
part 00000000 reset_info 00000000
uSD voltage 3.3V
Initialising SDRAM ‘Samsung’ 16Gb x2 total-size: 32 Gbit 3200
DDR 3200 1 0 32 152

Boot mode: SD (01) order f2564
SD HOST: 200000000 CTL0: 0x00800000 BUS: 400000 Hz actual: 390625 HZ div: 512 (256) status: 0x1fff0000 delay: 276
SD HOST: 200000000 CTL0: 0x00800f00 BUS: 400000 Hz actual: 390625 HZ div: 512 (256) status: 0x1fff0000 delay: 276
EMMC
SD retry 1 oc 0
SD HOST: 200000000 CTL0: 0x00800000 BUS: 400000 Hz actual: 390625 HZ div: 512 (2 56) status: 0x1fff0000 delay: 276
OCR c0ff8080 [0]
CID: 00150100424a54443452034425ba4429
SD HOST: 200000000 CTL0: 0x00800f04 BUS: 25000000 Hz actual: 25000000 HZ div: 8 (4) status: 0x1fff0000 delay: 4
SD HOST: 200000000 CTL0: 0x00800f04 BUS: 50000000 Hz actual: 50000000 HZ div: 4 (2) status: 0x1fff0000 delay: 2
MBR: 0x00002000, 81920 type: 0x0c
MBR: 0x00016000, 655360 type: 0x83
MBR: 0x000b6000, 655360 type: 0x83
MBR: 0x00156000, 458752 type: 0x0f
Trying partition: 0
type: 32 lba: 8192 oem: ‘mkfs.fat’ volume: ’ resin-boot ’
rsc 32 fat-sectors 630 c-count 80628 c-size 1
root dir cluster 2 sectors 0 entries 0
FAT32 clusters 80628
Trying partition: 0
type: 32 lba: 8192 oem: ‘mkfs.fat’ volume: ’ resin-boot ’
rsc 32 fat-sectors 630 c-count 80628 c-size 1
root dir cluster 2 sectors 0 entries 0
FAT32 clusters 80628
Read config.txt bytes 36287 hnd 0x156
Read start4cd.elf bytes 800028 hnd 0x1c22
Read fixup4cd.dat bytes 3145 hnd 0x1ae
0x00c03141 0x00000000 0x00000fff
MEM GPU: 16 ARM: 998 TOTAL: 1014
Firmware: bd88f66f8952d34e4e0613a85c7a6d3da49e13e2 Jan 20 2022 13:57:04
Starting start4cd.elf @ 0xff000200 partition 0
+

When I start the system without NVMe drive, I see the exact same output, but the system continues and displays the BalenaOS logo. With NVMe drive the system still displays nothing. It looks like the boot order isn’t causing the problem.

Any idea?

@WestCoastDaz

Do you have any other suggestion?
I’m still struggling with this. I’ve tried several new CM4IO boards and CM4 modules and I have this problem with each of them. Raspberry Pi OS boots out of the box on a CM4 with 8GB RAM and 32GB eMMC plus a 250GB NVMe drive. BalenaOS will not boot and does not post error messages. It just stops after “Starting start4cd.elf @ 0xff000200 partition 0”.

Below you’ll find the output when Raspberry Pi OS boots:
BOOTMODE: 0x06 partition 0 build-ts BUILD_TIMESTAMP=1673458852 serial 69af86a7 boardrev c03141 stc 476313
PM_RSTS: 0x00000020
part 00000000 reset_info 00000000
uSD voltage 3.3V
Initialising SDRAM ‘Samsung’ 16Gb x2 total-size: 32 Gbit 3200
DDR 3200 1 0 32 152

Boot mode: SD (01) order f2564
SD HOST: 200000000 CTL0: 0x00800000 BUS: 400000 Hz actual: 390625 HZ div: 512 (256) status: 0x1fff0000 delay: 276
SD HOST: 200000000 CTL0: 0x00800f00 BUS: 400000 Hz actual: 390625 HZ div: 512 (256) status: 0x1fff0000 delay: 276
EMMC
SD retry 1 oc 0
SD HOST: 200000000 CTL0: 0x00800000 BUS: 400000 Hz actual: 390625 HZ div: 512 (256) status: 0x1fff0000 delay: 276
OCR c0ff8080 [0]
CID: 00150100424a54443452034425ba4429
SD HOST: 200000000 CTL0: 0x00800f04 BUS: 25000000 Hz actual: 25000000 HZ div: 8 (4) status: 0x1fff0000 delay: 4
SD HOST: 200000000 CTL0: 0x00800f04 BUS: 50000000 Hz actual: 50000000 HZ div: 4 (2) status: 0x1fff0000 delay: 2
MBR: 0x00002000, 524288 type: 0x0c
MBR: 0x00082000,60538880 type: 0x83
MBR: 0x00000000, 0 type: 0x00
MBR: 0x00000000, 0 type: 0x00
Trying partition: 0
type: 32 lba: 8192 oem: ‘mkfs.fat’ volume: ’ boot ’
rsc 32 fat-sectors 1020 c-count 130554 c-size 4
root dir cluster 2 sectors 0 entries 0
FAT32 clusters 130554
Trying partition: 0
type: 32 lba: 8192 oem: ‘mkfs.fat’ volume: ’ boot ’
rsc 32 fat-sectors 1020 c-count 130554 c-size 4
root dir cluster 2 sectors 0 entries 0
FAT32 clusters 130554
Read config.txt bytes 2122 hnd 0xef
Read start4.elf bytes 2250848 hnd 0x6748
Read fixup4.dat bytes 5398 hnd 0x79fd
0x00c03141 0x00000000 0x00001fff
MEM GPU: 76 ARM: 948 TOTAL: 1024
Firmware: 8ba17717fbcedd4c3b6d4bce7e50c7af4155cba9 Jan 5 2023 10:46:54
Starting start4.elf @ 0xfec00200 partition 0

You see that there is a much more recent firmware version embedded in Raspberry Pi OS.

Could that be the cause? That would mean a newer build of BalenaOS needs to be created using the latest firmware.
I’m out of inspiration and need this to work; otherwise we need to switch to Raspberry Pi OS instead of BalenaOS and miss out on all the management features of BalenaOS.
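
To make the firmware comparison between the two logs concrete, here is a small Python sketch that pulls the `Firmware:` line out of each UART log and compares the build dates. The two sample strings are copied from the logs above; the helper name `firmware_info` is hypothetical, and the date parsing assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Firmware: <40 hex chars> Jan 20 2022 13:57:04
FIRMWARE_RE = re.compile(
    r"^Firmware:\s+([0-9a-f]{40})\s+"
    r"(\w{3}\s+\d{1,2}\s+\d{4}\s+\d{2}:\d{2}:\d{2})",
    re.MULTILINE,
)


def firmware_info(log: str):
    """Return (commit_hash, build_datetime) from a UART log, or None."""
    match = FIRMWARE_RE.search(log)
    if match is None:
        return None
    commit, stamp = match.groups()
    # Collapse repeated spaces so "Jan  5 2023" parses the same as "Jan 5 2023".
    # %b relies on an English locale (assumption).
    built = datetime.strptime(" ".join(stamp.split()), "%b %d %Y %H:%M:%S")
    return commit, built


# Firmware lines copied from the two boot logs in this thread.
balena_log = "Firmware: bd88f66f8952d34e4e0613a85c7a6d3da49e13e2 Jan 20 2022 13:57:04"
rpios_log = "Firmware: 8ba17717fbcedd4c3b6d4bce7e50c7af4155cba9 Jan 5 2023 10:46:54"

balena_fw = firmware_info(balena_log)
rpios_fw = firmware_info(rpios_log)

if __name__ == "__main__":
    for name, (commit, built) in (("BalenaOS", balena_fw), ("Raspberry Pi OS", rpios_fw)):
        print(f"{name}: {commit[:8]} built {built:%Y-%m-%d}")
    print("Raspberry Pi OS firmware is newer:", rpios_fw[1] > balena_fw[1])
```

This only confirms that the two images ship different firmware builds; it does not by itself show that the older build is the cause of the hang.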

Thank you

@vanmanda

I know there is a new version of the CM4 (see this post) that requires you to run v2.94.1 at minimum. Are you testing with one of these CM4s? If not, then you could try BalenaOS v2.88 as a test.

Also, something else I just thought of: how is the NVMe partitioned? Mine are GPT, ext4.

I’m not sure what would be causing this.

@WestCoastDaz

Thanks for your reply.

I tried BalenaOS v2.88, but then I get:
Raspberry Pi Compute Module 4 - 4GB
bootloader: 8ba17717 2023/01/11
update-ts: 1676586798

This board requires newer software
Get the latest software from Raspberry Pi OS – Raspberry Pi

Regarding the NVMe: I tried it with an unformatted drive and with an ext4 filesystem. No luck in either case.

@vanmanda

I can confirm this also happens on the Rev1.1 version of the CM4. I thought I had tested this version with an NVMe, but it turns out I was using Rev1.0 CM4s. If you have Rev1.0 CM4s, I can confirm that booting from the eMMC with an NVMe installed works well.

Also this was the post I meant to send in the previous message.

I will send a message to Balena support.

Great find,


@WestCoastDaz

Thanks, it is a relief to know the problem is not … me :slight_smile:
As I need to get this working somehow, I used usbboot to change the boot order and can confirm that I can boot the CM4 Rev1.1 from NVMe. So the problem only occurs when you try to boot from eMMC while an NVMe drive is installed.

Hopefully the people at Balena support can find a solution.

Hi all, just released OS version v2.112.12 since the device type was lagging behind a little bit with the releases.
On my side, I only have the 1.0 so cannot test the problem since I cannot reproduce it on the v1.0 (it boots fine).
Can you build the image from scratch with development mode enabled, then use a serial console to check what the actual failure is?

git clone --recursive git@github.com:balena-os/balena-raspberrypi.git

cd balena-raspberrypi && ./balena-yocto-scripts/build/barys -d -m raspberrypicm4-ioboard

@floion

I’ll get started on it and will keep you informed.

@vanmanda, just a quick note to let you know that we have run a number of tests with a CM4 v1.1 on a CM4 IO Board. We have replicated what you observed. To remedy it, we have tried two additional balenaOS releases, but that has not yet resolved the issue, so we are continuing to work on it. We are also interested to hear what you found in building the image from scratch.

In addition to that effort, we may also have a short-term workaround, but it needs a bit more testing.

@rosswesleyporter thank you very much for taking this issue to heart.
My colleague had some trouble building the image from scratch, so we were not successful yet on that front, but we’ll try again shortly and keep you up to date.

In the meantime, we implemented a workaround by booting from the NVMe drive directly. This allows us to continue our work on the software, but eventually we hope to build the system as intended. So we are very much looking forward to testing a new BalenaOS release.

@vanmanda, I’m glad to hear that you have a temporary workaround. I have a follow-up question about that. Could you please describe the changes that you made to get NVMe booting on a CM4 v1.1? You probably changed BOOT_ORDER in the bootloader. But did you have to change anything else? If you did, that may be a clue to resolving the general issue. Thanks.

@rosswesleyporter

Hi, I only changed these two values in the boot.conf:
BOOT_UART=1 → not relevant as this is only to enable the serial console
BOOT_ORDER=0xf25416 → as you guessed correctly

That’s it.
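
For reference, the two changes can be sketched as a small Python helper that rewrites `KEY=value` lines in a boot.conf-style text. This is only an illustration: the starting config contents are hypothetical, the helper name `set_option` is made up, and writing the resulting file to the EEPROM still has to go through the usbboot tooling, which is not shown here.

```python
def set_option(conf_text: str, key: str, value: str) -> str:
    """Replace an existing KEY=... line, or append KEY=value if absent.

    Commented-out lines (e.g. "#BOOT_ORDER=...") are left untouched.
    """
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith(f"{key}="):
            lines[i] = f"{key}={value}"
            break
    else:
        # No existing line for this key: add one at the end.
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical starting config; a real boot.conf may contain more options.
original = "[all]\nBOOT_UART=0\nBOOT_ORDER=0xf25641\n"

# The two values mentioned in this post.
updated = set_option(original, "BOOT_UART", "1")
updated = set_option(updated, "BOOT_ORDER", "0xf25416")

if __name__ == "__main__":
    print(updated, end="")
```

Reading 0xf25416 from the rightmost digit, NVMe (6) now comes before SD/eMMC (1), which is the whole workaround.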

@vanmanda, that is the workaround that I was going to suggest. I tried that on Friday. I had a bootable eMMC on CM4 v1.1 and a bootable NVMe drive in place. When I put NVMe first in BOOT_ORDER, it tried to boot from NVMe but then switched and successfully booted from eMMC. It’s that curious switch that made me want to test it again before suggesting the workaround. It sounds like you got a sensible result - booting directly from NVMe. And I certainly hope that is the case. But if you wouldn’t mind double-checking when you get a moment, that would be helpful.

@rosswesleyporter, in order to make sure the system is not accidentally booting from eMMC, I initialized the eMMC before changing the boot order and before deploying the boot image on the NVMe drive. That’s why I’m sure the system is booting from NVMe: there isn’t a file system on the eMMC (no resin-data).
Is that an answer to your question?

@vanmanda, thank you - that is good info. We’ll keep you posted. We are working on it, but I’m not yet sure when we will have more information for you.

Hello @vanmanda and @WestCoastDaz

This is a quick update. The working hypothesis is a U-Boot issue. We will keep you posted.

@vanmanda, is the temporary workaround still working for you?