We’ve been working towards a production version of our app. Today we flashed our first couple of devices, and all seemed well. We booted them so they registered with our openBalena service, and a provisioning sequence of our own registered each device with our app. A few hours later, I grabbed one device out of the box and started it. Nothing interesting happened. So I SSH’ed into the device to check the logs, and one file that was supposed to contain data was empty. We print stickers and create an Access Point based on that data, so the file must have contained it after the first boot, when the device provisioned itself to our app. Weird, but okay: I added the info to the file manually and checked whether any more problems occurred.
Checked the logs of our container, all seemed fine. Checked the logs of the balena supervisor, and that wasn’t fine at all. A few seconds later, balena ps showed that our container had stopped, but it hadn’t stopped at all (it was still running somewhere, because I could still reach the webserver of that container). Some time later, the supervisor started the container again, but it failed because the port was busy (because the ‘invisible’ container was still running).
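For anyone wanting to reproduce the "port is busy although no container is listed" observation, a check along these lines might help. It is only a minimal sketch: the function name `port_in_use` is made up for illustration, and the host and port you pass in are placeholders, not our real values.

```python
import socket


def port_in_use(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration. It only tells you that
    something accepts connections on that port, not which process
    or container owns it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return s.connect_ex((host, port)) == 0
```

If this returns True while balena ps lists no running container for that port, something outside the engine's view is still holding it.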
So I kept checking the logs, and I got the following error messages (the balena-engine.sock ones; the TLS errors are because of our firewall):
The device was behaving really strangely.
BalenaOS 2.48.0+rev1 (latest because of openBalena)
Raspberry Pi 4 - 2GB
I’ve googled this problem, and I saw that a faulty SD card could be the cause. However, this SD card is brand new and is supposed to be a robust model (SanDisk High Endurance - 32GB). So it’d be really weird if it were corrupt after its first flash. But it can always happen, of course.
So my questions:
- What can cause these problems?
- If it’s the SD card that’s faulty, how can I prevent this in the future (by checking something?) and how can I be absolutely sure that this SD card is faulty (by running some tests on my Mac or on the Pi itself)?
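To make the second question more concrete, this is roughly the kind of test I have in mind: write known data to the card, read it back, and compare. It is only a rough sketch under assumptions, not something we have validated. The function name `write_and_verify`, the test file name, and the default sizes are all made up for illustration, and `directory` is assumed to be a writable path that lives on the SD card.

```python
import hashlib
import os


def write_and_verify(directory, total_mb=64, chunk_mb=4, seed=b"sdcard-check"):
    """Write deterministic chunks to a test file, read them back, and
    return the indexes of chunks whose content did not match.

    Hypothetical helper for illustration; an empty list means every
    chunk read back exactly as written.
    """
    path = os.path.join(directory, "sdcard_check.bin")
    chunk_size = chunk_mb * 1024 * 1024
    chunk_count = total_mb // chunk_mb

    def make_chunk(index):
        # Data derived from the seed and the chunk index, so it can be
        # regenerated for comparison instead of being kept in memory
        block = hashlib.sha256(seed + index.to_bytes(8, "big")).digest()
        return block * (chunk_size // len(block))

    try:
        with open(path, "wb") as f:
            for i in range(chunk_count):
                f.write(make_chunk(i))
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the card

        bad = []
        with open(path, "rb") as f:
            for i in range(chunk_count):
                if f.read(chunk_size) != make_chunk(i):
                    bad.append(i)
        return bad
    finally:
        # Always clean up the test file, even if the check fails midway
        if os.path.exists(path):
            os.remove(path)
```

One caveat I am aware of: right after writing, the read may be served from the operating system's page cache rather than from the card itself, so a clean result here would not prove the card is healthy. Writing more data than the device has RAM, or re-running the read after a reboot, would presumably make the check more meaningful.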
It’s a big problem for us to see this happening on the first production batch of our app. So we’d like to know as much as possible about this problem and how to prevent it.
Thanks in advance.