Issues with CM3 images

Hi !

We created our own image and use Balena Etcher for flashing. We use CM3 modules and the CM3 IO board for this. RPI Boot is installed properly.

It works on fresh/clean CM3. It does NOT always work on CM3 modules with an (other) image on it.
It mostly works on the second or third attempt. When it fails, the image does not boot. Does Balena change settings in the tables?
Used versions 1.5.50 and 1.5.59

There are other users here reporting failed images on other devices. Perhaps this is the same effect.
We know that there are side effects with the flash and FAT/FAT32 partitions (the Raspberry page about RPIBOOT mentions images not booting properly, at the bottom). We fixed this so that it is always FAT32 (both the table and a dump of the partition now show FAT32), but this is not the cause of the failure. Rufus always works with our image.

We presently use Rufus instead for this reason.

There also is a difference in speed:
Use rpiboot to detect CM3 -> speed in Rufus only 5MB/s+
Use Balena Etcher to detect CM3 -> speed in Rufus >15MB/s+.

I would like to stick with balena, because balena shows 21MB/s, but the image does not always work.

Also, Eject after Successful does not work here (Win7/10 64-bit), because the image is not mounted at all after flashing (FAT32 partition). Rufus, by contrast, opens this partition after flashing as a readable drive.

Best regards,
HaJo

Hello,

Does Balena change settings in the tables?

Etcher does not modify the image you provide, it just copies it to the destination device.

It does NOT always work on CM3 modules with an (other) image on it.

How does it “NOT always work”? Is there an error? Does the flash complete?

There also is a difference in speed:

Rpiboot and Etcher use the same method: they send firmware to the device that turns it into a USB mass storage device (like a USB stick).
The only difference is the firmware they send. The one sent by etcher allows faster write speeds.

O.K.

  1. Etcher does not change the image, of course…
  2. No error message. Not even a verify error. The CM3 simply does not boot.
  3. The speed of Etcher is very fast, this is great.

I expected to see the image mounted after flashing, because EJECT is disabled. This does not happen, even with a good CM3.

Are you sure your image is bootable?
Maybe it works in rufus because rufus makes it bootable?

Hi!

Yes, it is bootable. It is created as a bootable image.

Tested with dd (Linux) and other windows tools (Win32DiskImager).

It nearly always works with Etcher on a new CM3 out of the box. But it fails with CM3 where data is present or other partitions are present. Rufus seems to delete the partitions first, as one can see in the status line. Perhaps this makes the difference? That would be, as you put it, making it bootable by default.

Could you please share the disk image?

No. This is not possible…
Perhaps I can share parts of it (partition tables or other required information).

Hi,

Thanks, sharing as much detail as possible for the partition here would be useful. Unfortunately without an image artefact, it becomes very difficult for us to determine why your custom image doesn’t work.

Best regards,

Heds

O.K.
Do you need a specific memory area (dump)?

Hi,

A dump of the partition map (say, using fdisk) would be useful, as well as any other detail and information you can share, please.

Best regards,

Heds

Hello,
We would need a disk image that reproduces this behaviour.

What do you mean by “But it fails with CM3 where data is present or other partitions are present.” ?
Do the write and verification succeed?
If yes, does it make the compute module boot?
If yes, can you describe what you see on the screen?

If no, is the first partition on the disk bootable? (check with fdisk).
What are the differences on the disk between a bootable CM3 and a non-bootable CM3?

We are able to reproduce the problem, and a comparison shows it is always at the beginning of the CM3 flash.
It is always the first 1k area, which is filled with 0x00 in a “damaged” CM3.
fdisk shows nothing in this case, because it is unable to find the 4 partitions.

I will upload the first part of the good image below. I am not familiar with the structure of images, but a diff shows that these entries (the non-zero bytes starting at offsets 0x0000 and 0x001B…) are always “missing” in the flash. It all reads 0x00 instead of this:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000 FA B8 00 10 8E D0 BC 00 B0 B8 00 00 8E D8 8E C0 ú¸…ŽÐ¼.°¸…ŽØŽÀ
00000010 FB BE 00 7C BF 00 06 B9 00 02 F3 A4 EA 21 06 00 û¾.|¿…¹…ó¤ê!..
00000020 00 BE BE 07 38 04 75 0B 83 C6 10 81 FE FE 07 75 .¾¾.8.u.ƒÆ…þþ.u
00000030 F3 EB 16 B4 02 B0 01 BB 00 7C B2 80 8A 74 01 8B óë.´.°.».|²€Št.‹
00000040 4C 02 CD 13 EA 00 7C 00 00 EB FE 00 00 00 00 00 L.Í.ê.|…ëþ…
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000000A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000000B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000000C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000000D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000000F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000001A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000001B0 00 00 00 00 00 00 00 00 98 65 96 1C 00 00 80 00 …˜e–…€.
000001C0 01 C0 0C 03 E0 3F 00 60 00 00 00 40 01 00 00 00 .À…à?....@....
000001D0 C1 40 83 03 E0 FF 00 A0 01 00 00 C0 30 00 00 03 Á@ƒ.àÿ. ...À0...
000001E0 E0 FF 83 03 E0 FF 00 60 32 00 00 C0 30 00 00 03 àÿƒ.àÿ.2…À0…
000001F0 E0 FF 83 03 E0 FF 00 20 63 00 00 00 10 00 55 AA àÿƒ.àÿ. c…Uª
00000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000260 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000002A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000002B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000002C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000002D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000002E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
000002F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …

Is this the partition table?
There are a lot of zeroes after that (which should be zero) but it seems that the first block is NOT WRITTEN correctly. I do not know if Etcher simply starts with offset zero or finalizes something at the end of the process. We can reproduce this with empty CM3 and with other CM3 with former working images written by other tools. Sometimes Etcher sets this area to 0x00 and the image does not boot. Perhaps always the first 512 or 1024 bytes. The rest is O.K. And perhaps this could also be a problem of other guys with non-working images on USB sticks or…
We do not use the verify option at present, to speed up the process. But we tested with verify enabled and still found at least one CM3 with a damaged start area WITHOUT a verify error!
I had no time to do further testing, but with verify enabled I expected a failure message as early as 1% in this case. Does Etcher report verify errors only after reading 100%? That takes too much time.
Does “Eject after success” have any influence on the last bytes written?
Perhaps the first block should be written with a delay or something else or deleted and rewritten.
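
For anyone who wants to check a flashed module without waiting for a full verify pass, here is a minimal sketch of an early-exit comparison. It is not how Etcher verifies; the function name first_mismatch and its parameters are made up for this example. It reads the image and the device in chunks and stops at the first differing byte, so a zeroed first block would be reported at offset 0 almost immediately.

```python
def first_mismatch(image_path, device_path, limit=None, chunk_size=1024 * 1024):
    """Return the offset of the first differing byte, or None if none is found.

    Only the first `limit` bytes are compared when `limit` is given.
    """
    offset = 0
    with open(image_path, "rb") as image, open(device_path, "rb") as device:
        while limit is None or offset < limit:
            want = chunk_size if limit is None else min(chunk_size, limit - offset)
            expected = image.read(want)
            if not expected:
                break  # end of image reached without finding a difference
            actual = device.read(len(expected))
            if actual != expected:
                # Locate the exact byte inside the differing chunk.
                for i, (a, b) in enumerate(zip(expected, actual)):
                    if a != b:
                        return offset + i
                return offset + len(actual)  # device ended before the image did
            offset += len(expected)
    return None
```

On Linux this could be called as first_mismatch("image.img", "/dev/sda", limit=1024) to look only at the first 1k area (the file name is a placeholder, and reading the raw device usually needs root).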

Tomorrow I will send the fdisk output (4 partitions).

Best regards,
HaJo

Hi !

My complete answer is stuck in your filter…

ALL “damaged” images show 0x00 in the first bytes (at least 512) in CM3 flash. Fdisk is not able to resolve the 4 partition entries in this case and it is not bootable. The rest of the data is o.k. It did not matter if CM3 was “clean” or already flashed before, but already flashed CM3 show this failure more often than “fresh” CM3. In our image there is some data starting in offset 0x0000 and 0x01b0.
I will send the fdisk output of a working image.
We tested on different machines, Win7/10 32/64bit with 1.5.50 and 59.

Presently we test again with verify activated, expecting failure at 1%. We found one CM3 with this failure but without verify error. Is there a write condition after finishing, killing the first block?

Best regards,
HaJo

Disk /dev/sda: 3,7 GiB, 3909091328 bytes, 7634944 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x1c966598

Device     Boot   Start     End Sectors  Size Id Type
/dev/sda1  *      24576  106495   81920   40M  c W95 FAT32 (LBA)
/dev/sda2        106496 3301375 3194880  1,5G 83 Linux
/dev/sda3       3301376 6496255 3194880  1,5G 83 Linux
/dev/sda4       6496256 7544831 1048576  512M 83 Linux
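
To connect the fdisk output with the hex dump posted earlier: the bytes from 0x1B0 to 0x1FF are the tail of a classic MBR (disk identifier at 0x1B8, four 16-byte partition entries from 0x1BE, signature 55 AA at 0x1FE). The sketch below decodes them; the bytes are copied from the dump in this thread, while the helper name parse_mbr is only an illustration. A sector filled with 0x00, as on the damaged modules, has no signature and yields nothing, which fits fdisk finding no partitions.

```python
import struct

# Bytes 0x1B0-0x1FF of the working image, copied from the hex dump in this thread.
MBR_TAIL = bytes.fromhex(
    "00 00 00 00 00 00 00 00 98 65 96 1c 00 00 80 00 "
    "01 c0 0c 03 e0 3f 00 60 00 00 00 40 01 00 00 00 "
    "c1 40 83 03 e0 ff 00 a0 01 00 00 c0 30 00 00 03 "
    "e0 ff 83 03 e0 ff 00 60 32 00 00 c0 30 00 00 03 "
    "e0 ff 83 03 e0 ff 00 20 63 00 00 00 10 00 55 aa "
)


def parse_mbr(sector):
    """Return (disk_id, partitions) for a 512-byte MBR sector, or None if invalid.

    Each partition is a tuple (bootable, type_id, start_lba, sector_count).
    """
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")
    if sector[510:512] != b"\x55\xaa":
        return None  # no boot signature, e.g. a zeroed first block
    disk_id = struct.unpack_from("<I", sector, 0x1B8)[0]
    partitions = []
    for index in range(4):
        # status, 3 CHS bytes (skipped), type, 3 CHS bytes (skipped), LBA start, count
        status, type_id, start, count = struct.unpack_from(
            "<B3xB3xII", sector, 0x1BE + 16 * index
        )
        if type_id != 0:
            partitions.append((status == 0x80, type_id, start, count))
    return disk_id, partitions


# Rebuild the full first sector: the boot code area is irrelevant for the table.
good_sector = bytes(0x1B0) + MBR_TAIL
disk_id, partitions = parse_mbr(good_sector)
print(hex(disk_id))
for entry in partitions:
    print(entry)
print(parse_mbr(bytes(512)))  # a zeroed sector
```

The decoded start sectors and sector counts can be compared line by line with the Start and Sectors columns above.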

Diff:
1c1,13
< 000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
---
> 000000 fa b8 00 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e c0
> 000010 fb be 00 7c bf 00 06 b9 00 02 f3 a4 ea 21 06 00
> 000020 00 be be 07 38 04 75 0b 83 c6 10 81 fe fe 07 75
> 000030 f3 eb 16 b4 02 b0 01 bb 00 7c b2 80 8a 74 01 8b
> 000040 4c 02 cd 13 ea 00 7c 00 00 eb fe 00 00 00 00 00
> 000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> *
> 0001b0 00 00 00 00 00 00 00 00 98 65 96 1c 00 00 80 00
> 0001c0 01 c0 0c 03 e0 3f 00 60 00 00 00 40 01 00 00 00
> 0001d0 c1 40 83 03 e0 ff 00 a0 01 00 00 c0 30 00 00 03
> 0001e0 e0 ff 83 03 e0 ff 00 60 32 00 00 c0 30 00 00 03
> 0001f0 e0 ff 83 03 e0 ff 00 20 63 00 00 00 10 00 55 aa
> 000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
31365723c31365735
< e9000000
---
> e6400000

Hey there!

Sorry for the delay. We are digging deep into this issue, as we believe it might be hitting other users as well. Our current hypothesis is the following:

Etcher flashes images by zeroing out the partition tables, omitting the first block, writing the rest, and then coming back to the first block at the end. This is a workaround to the fact that we can’t easily get exclusive access over drives on Windows (see https://www.balena.io/blog/the-perils-of-writing-disk-images-on-windows/ for more details).

What we believe is happening is that, for some reason, for your image or your CM3 module, we are failing to write the first block. We are going through potential reasons for this, but in the meantime, can you share the exact byte size of the image you are trying to flash? We believe that some of our block size arithmetic might be going wrong for that image in particular.
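
To make the hypothesis concrete, here is a minimal sketch of the write order described above, run against an ordinary file rather than a real drive. It is not Etcher's code: the function name, the 512-byte block size and the stop_early flag are assumptions for illustration. The flag simulates the final write being lost, which leaves exactly the symptom reported in this thread: a zeroed first block with the rest of the data intact.

```python
import os


def flash_first_block_last(image, dest_path, block_size=512, stop_early=False):
    """Write `image` (bytes) to an existing file, deferring the first block.

    stop_early=True is a hypothetical switch that skips the final step.
    """
    with open(dest_path, "r+b") as dest:
        # Step 1: zero out the first block (where the partition table lives).
        dest.write(bytes(block_size))
        dest.flush()
        os.fsync(dest.fileno())

        # Step 2: write everything after the first block.
        dest.seek(block_size)
        dest.write(image[block_size:])

        if stop_early:
            return  # simulate the last write never reaching the device

        # Step 3: come back and write the real first block.
        dest.seek(0)
        dest.write(image[:block_size])

        # Push the data out of the OS cache before the drive is ejected.
        dest.flush()
        os.fsync(dest.fileno())
```

If the last step fails silently, or its data is still cached when the drive is removed, the result would look like the damaged modules.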

Hi !

The size is 3.862.953.984 bytes.
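
As a quick sanity check on that number, the arithmetic below relates it to the fdisk output and to the end offsets in the diff posted earlier. All values are taken from this thread; only the variable names are made up.

```python
IMAGE_BYTES = 3_862_953_984   # image size reported above
DISK_BYTES = 3_909_091_328    # CM3 flash size from the fdisk output
SECTOR_BYTES = 512            # sector size from the fdisk output
LAST_PARTITION_END = 7_544_831  # End column of /dev/sda4

sectors, remainder = divmod(IMAGE_BYTES, SECTOR_BYTES)

print(sectors, remainder)                # image length in whole sectors
print(sectors - 1 == LAST_PARTITION_END)  # image ends with the last partition
print(hex(IMAGE_BYTES), hex(DISK_BYTES))  # compare with the diff's end offsets
print(IMAGE_BYTES // (1024 * 1024))      # size in MiB
```

The image is a whole number of 512-byte sectors (7544832), ends exactly where /dev/sda4 ends, and its size in hex is e6400000, the second end offset in the diff; the first one, e9000000, is the size of the CM3 flash.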

I am not 100% sure, but I think using the verify option keeps this block in good shape. Without verify, we had a lot of damaged images. With verify (stopped after 1%), we had no failures.
Perhaps the one with verify we found had another issue.

I do not know if eject (without verify) also has an effect on the last block. Perhaps a caching problem (last data not written before …).

Best regards,
HaJo

Hey there,

Thanks for the detailed info. You have a good point about ejecting without verification and caching. We’ll try to reproduce that case ourselves and see what we find.

Hi !

Is it possible to get the alternative Raspi startup code - which allows faster speeds?
The RPIBoot default is slow (in a Windows environment) and yours is a lot faster. Is it possible to use this via the command line? Is it provided somewhere in the installation?

Our image is finished in 3:42min by Etcher (without verify). With RPIBoot before starting Etcher it takes 12min+!

Hey there! The custom firmware we send in Etcher is open source, and you can find it here: https://github.com/balena-io-modules/node-raspberrypi-usbboot/tree/master/blobs. I believe you can point the original rpiboot tool to a custom firmware directory (check the options it supports), but we can’t ensure this will work out of the box!