balenaOS container & socket issues


We’ve been working towards a production version of our app. Today we flashed our first couple of devices and all seemed well. We booted them so they registered with our openBalena service, and a provisioning sequence of our own registered each device with our app. A few hours later, I grabbed one device out of the box and started it. Nothing interesting happened. So I SSH’ed into the device to check the logs, and one file that was supposed to contain data was empty. We print stickers from that data and use it to create an Access Point, so the file must have contained the data after the first boot, when the device provisioned itself with our app. Weird, but okay: I added the info to the file manually and checked whether any other problems occurred.

I checked the logs of our container, and all seemed fine. I checked the logs of the balena supervisor, and those weren’t fine at all. A few seconds later, balena ps showed that our container had stopped, but it hadn’t stopped at all (it was still running somewhere, because I could still reach that container’s webserver). Some time later, the supervisor tried to start the container again, but that failed because the port was busy (because the ‘invisible’ container was still running).
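One way to confirm that something is still holding the port while balena ps lists no container is to look at the kernel's socket table directly. The sketch below is only an illustration and not part of the original report: the function name port_is_listening and the port number in the usage line are assumptions. It reads /proc/net/tcp and /proc/net/tcp6, so it needs no extra tools on the host OS, but it will not see a port that is forwarded purely by firewall rules without a listening process.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if any TCP socket is in LISTEN state
# on the given port. Reads the kernel tables in /proc/net by default; other
# files in the same format can be passed as extra arguments.
port_is_listening() {
    port="$1"
    shift
    # With no files given, fall back to the live IPv4 and IPv6 tables.
    [ "$#" -gt 0 ] || set -- /proc/net/tcp /proc/net/tcp6
    port_hex=$(printf ':%04X' "$port")
    # Field 2 is local_address (HEXADDR:HEXPORT), field 4 is the socket
    # state; 0A means LISTEN.
    { cat "$@" 2>/dev/null || true; } |
        awk -v p="$port_hex" '
            $4 == "0A" && substr($2, length($2) - 4) == p { found = 1 }
            END { exit !found }
        '
}
```

For example, port_is_listening 80 && echo "port 80 is still held" (port 80 is a placeholder). Comparing that result with the output of balena ps -a would show whether the engine has lost track of a container that is still running.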

So I kept checking the logs, and I got the following error messages (the balena-engine.sock ones are the interesting part; the TLS errors are caused by our firewall):

The device was behaving really strangely.
Device information:
BalenaOS 2.48.0+rev1 (latest because of openBalena)
Raspberry Pi 4 - 2GB
Supervisor v10.8.0

I’ve googled this problem, and I saw that a faulty SD card could be the cause. However, this SD card is brand new and is supposed to be robust (SanDisk High Endurance - 32GB), so it’d be really weird if it were corrupt after its first flash. But it can always happen, of course.

So my questions:

  • What can cause these problems?
  • If it’s the SD card that’s faulty, how can I prevent this in the future (by checking something?) and how can I be absolutely sure that this SD card is faulty (by running some tests on my Mac or on the Pi itself)?
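A crude way to exercise the card on the Pi itself is to write a file of random data, flush it, and read it back to compare checksums. The sketch below is only an illustration under assumptions: the function name readback_check is made up, the target directory must sit on the SD card (on balenaOS the data partition is usually mounted at /mnt/data, which is an assumption here), and dropping the page cache needs root. A pass does not prove the card is healthy; a dedicated tool such as badblocks or f3, run from another Linux machine, tests far more of the card.

```shell
#!/bin/sh
# Hypothetical helper: write SIZE_MB of random data into DIR, read it back,
# and compare SHA-256 checksums. Exit 0 on match, non-zero on mismatch.
readback_check() {
    dir="$1"
    size_mb="${2:-64}"
    file="$dir/readback-test.$$"

    # Hash the data stream while it is being written, so the expected
    # checksum comes from the source data rather than from the card.
    expected=$(dd if=/dev/urandom bs=1M count="$size_mb" 2>/dev/null |
        tee "$file" | sha256sum | cut -d' ' -f1)

    # Flush to the card, then try to drop the page cache so the read-back
    # hits the card instead of RAM (needs root; skipped silently otherwise).
    sync
    { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true

    actual=$(sha256sum < "$file" | cut -d' ' -f1)
    rm -f "$file"
    [ "$expected" = "$actual" ]
}
```

For example, readback_check /mnt/data 256 writes and verifies 256 MB. Kernel I/O errors for the card, if there are any, would also tend to show up in dmesg as mmc or mmcblk messages.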

It’s a big problem for us to see this happening on the first production batch of our app. So we’d like to know as much as possible about this problem and how to prevent it.

Thanks in advance.

Hi Bart, thank you for reaching out!
Could you please send us the output of the following three commands? That will help us get a better picture of what is going on.

systemctl status balena-engine.service

systemctl status balena-engine.sock

journalctl --no-pager -eu balena-engine

thank you!
best regards

Hi Juan,

I’ve taken the device from our client’s office to our office, so it’s not in that state anymore.
I’ve booted the device in our office, and I’m happy to give you the output of those commands, but I don’t know if it’s helpful…

systemctl status balena-engine.service

systemctl status balena-engine.sock

Unit balena-engine.sock.service could not be found.

journalctl --no-pager -eu balena-engine

-- Logs begin at Fri 2020-01-31 09:20:36 UTC, end at Thu 2020-07-23 12:44:57 UTC. --
-- No entries --

It looks like it’s not really helpful :sweat_smile:

Hi Bart,

Hmm, yeah, there’s not much to say at this point. Please let us know when the device goes back into that state. One small correction: the command should be systemctl status balena-engine.socket (the unit ends in .socket, not .sock).

Is there nothing that could tell us what caused this behaviour?
I’ve never seen it before, but we’re looking for answers, because we’ve flashed multiple cards, booted them, and delivered them to our client.

Without more logs I can’t really tell where the error could have come from. I doubt it was SD card corruption though.
I have some suspicions regarding the balena-engine.sock error, but that could go in a few directions…

If you reach out in this thread once you can reproduce it, we can have a look.
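Since the device has to be caught while it is in the bad state, it may help to keep a small script ready that saves the output of the three commands requested earlier in this thread (using the corrected .socket unit name) to a file. This is a sketch, not an official balena tool: the function name collect_engine_diagnostics, the default output directory /tmp, and the file naming are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: run the three diagnostic commands from this thread and
# append their output (and any non-zero exit status) to one log file.
# Prints the path of the log file when done.
collect_engine_diagnostics() {
    out_dir="${1:-/tmp}"
    out_file="$out_dir/engine-diagnostics-$(date -u +%Y%m%dT%H%M%SZ).log"
    for cmd in \
        "systemctl status balena-engine.service" \
        "systemctl status balena-engine.socket" \
        "journalctl --no-pager -eu balena-engine"
    do
        {
            printf '### %s\n' "$cmd"
            # Word splitting of $cmd is intentional here. systemctl status
            # exits non-zero for inactive or missing units, so record the
            # status instead of aborting.
            $cmd 2>&1 || printf '(exit status %s)\n' "$?"
            printf '\n'
        } >> "$out_file"
    done
    printf '%s\n' "$out_file"
}
```

Running collect_engine_diagnostics on the host OS as soon as the problem reappears, and copying the resulting file off the device before rebooting (/tmp does not survive a reboot), should give enough to continue here.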