balenaOS 2.98.33 on Raspberry Pi 3B (32-bit)

Hi,

We had some issues with 2.47 over the last two weeks, so we decided to upgrade to the latest version. The upgrade succeeded, but after two days of running the device showed VPN only.

journalctl showed ext4 corruption. We followed the steps described in "SD Corrupted - Can Access Terminal - Can anything be done?", and they completed successfully.

The journalctl log shows this:

Jul 25 10:04:03 neonlink balenad[10700]: time="2022-07-25T10:04:03.551074404Z" level=error msg="failed to load container" container=166502a7abc46afb161306c6c3541b43b1aa1be4e77203122e5b6e244838f683 error="open /var/lib/docker/containers/166502a7abc46afb161306c6c3541b43b1aa1be4e77203122e5b6e244838f683/config.v2.json: no such file or directory"
Jul 25 10:04:03 neonlink balenad[10700]: time="2022-07-25T10:04:03.556193760Z" level=error msg="failed to load container" container=31af5b098ff67e9635ec111c9ef967344885f4b1a145dc1938e4203958e09116 error="invalid character 'L' after object key"
Jul 25 10:04:04 neonlink balenad[10700]: time="2022-07-25T10:04:04.049928457Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
Jul 25 10:04:04 neonlink balenad[10700]: time="2022-07-25T10:04:04.055346093Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
Jul 25 10:04:04 neonlink balenad[10700]: time="2022-07-25T10:04:04.055579947Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
Jul 25 10:04:04 neonlink extract-balena-ca[10749]: [extract-balena-ca][INFO] The config.json file does not contain custom CA
Jul 25 10:04:05 neonlink balenad[10700]: failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to get bridge network configurations from store: invalid character '\r' in string literal
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Main process exited, code=exited, status=1/FAILURE
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Failed with result 'exit-code'.
Jul 25 10:04:05 neonlink systemd[1]: Failed to start Balena Application Container Engine.
Jul 25 10:04:05 neonlink balena-supervisor[10767]: Cannot connect to the balenaEngine daemon at unix:///var/run/balena-engine.sock. Is the balenaEngine daemon running?
Jul 25 10:04:05 neonlink balenad[10768]: Cannot connect to the balenaEngine daemon at unix:///var/run/balena-engine.sock. Is the balenaEngine daemon running?
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 1706 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 1707 (balena-healthch) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 1708 (balena) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 6534 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 6615 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 6692 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 6761 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 6830 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 6901 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 6989 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 7061 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 7129 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 7198 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 7287 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 7360 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 7434 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 7520 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 7593 (exe) in control group while starting unit. Ignoring.
Jul 25 10:04:05 neonlink systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 7661 (exe) in control group while starting unit. Ignoring.
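For anyone triaging a log like the one above, here is a minimal Python sketch that pulls out the two recurring patterns: the "failed to load container" errors and the "Found left-over process" notices. The function name `summarize` and both regular expressions are my own assumptions based only on the lines pasted here; they are not part of any balena tooling, and lines truncated by a pager (ending in `>`) will not match the first pattern.

```python
import re

# Assumed shape of the balenad error lines, taken from the log in this post:
#   ... msg="failed to load container" container=<hex id> error="<text>"
FAILED_LOAD = re.compile(
    r'msg="failed to load container"\s+container=(?P<id>[0-9a-f]+)\s+'
    r'error="(?P<error>.*)"\s*$'
)
# Assumed shape of the systemd notices about processes left in the cgroup.
LEFTOVER = re.compile(r"Found left-over process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def summarize(lines):
    """Return ({container_id: error_text}, [(pid, process_name), ...])."""
    failed = {}
    leftover = []
    for line in lines:
        match = FAILED_LOAD.search(line)
        if match:
            failed[match.group("id")] = match.group("error")
            continue
        match = LEFTOVER.search(line)
        if match:
            leftover.append((int(match.group("pid")), match.group("name")))
    return failed, leftover


# Two lines copied from the log above, used as sample input.
SAMPLE = """\
Jul 25 10:04:03 neonlink balenad[10700]: time="2022-07-25T10:04:03.556193760Z" level=error msg="failed to load container" container=31af5b098ff67e9635ec111c9ef967344885f4b1a145dc1938e4203958e09116 error="invalid character 'L' after object key"
Jul 25 10:04:05 neonlink systemd[1]: balena.service: Found left-over process 1706 (exe) in control group while starting unit. Ignoring.
"""

if __name__ == "__main__":
    failed, leftover = summarize(SAMPLE.splitlines())
    for container_id, error in failed.items():
        print(f"container {container_id[:12]}: {error}")
    print(f"{len(leftover)} left-over process(es) reported")
```

Feeding it the full saved journal output instead of `SAMPLE` gives a short list of the affected container IDs, which is easier to share than the raw log.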

The diag output shows this:

--- prefixing commands with 'date --utc --rfc-3339=ns ; /usr/bin/time -o /dev/stdout timeout --preserve-status --kill-after=20 -v 10 bash -c' ---

Cannot connect to the balenaEngine daemon at unix:///var/run/balena-engine.sock. Is the balenaEngine daemon running?
Cannot connect to the balenaEngine daemon at unix:///var/run/balena-engine.sock. Is the balenaEngine daemon running?

I have a few questions:

  • Can I refresh the entire OS somehow?
  • In the future, is it possible to switch to an external drive as a backup?
  • Any more tips to prevent this?

I have already granted support access so that you can take a look.

Thanks in advance.

Hi @neonlink,

For DB backup, you can look at mounting an external drive, as described here: Communicate outside the container - Balena Documentation

Regarding the balenaEngine issue, let's check the service status via systemctl status balena-engine. Also, please share the device UUID so we can take a look.

Regards,
Nitish

3d1ea2930d3314992e73e63da3d348a1

systemctl status balena-engine
● balena.service - Balena Application Container Engine
     Loaded: loaded (/lib/systemd/system/balena.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/balena.service.d
             └─storagemigration.conf
     Active: activating (start) since Mon 2022-07-25 13:28:02 UTC; 1s ago
TriggeredBy: ● balena-engine.socket
       Docs: https://www.balena.io/docs/getting-started
   Main PID: 4977 (balenad)
      Tasks: 99 (limit: 2037)
     Memory: 83.2M
     CGroup: /system.slice/balena.service
             ├─ 473 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 472
             ├─ 564 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 562
             ├─ 635 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 634
             ├─ 704 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 703
             ├─ 783 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 782
             ├─ 879 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 873
             ├─ 956 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 955
             ├─1027 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1026
             ├─1103 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1102
             ├─1199 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1193
             ├─1272 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1271
             ├─1341 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1340
             ├─1411 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1410
             ├─1499 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1498
             ├─1579 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1578
             ├─1659 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1658
             ├─1706 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1696
             ├─1707 /bin/sh /usr/lib/balena/balena-healthcheck
             ├─1708 balena image inspect balena-healthcheck-image
             ├─1745 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1744
             ├─1834 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1832
             ├─1912 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1911
             ├─1983 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1982
             ├─2052 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2051
             ├─2140 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2135
             ├─2210 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2208
             ├─2281 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2280
             ├─2353 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2351
             ├─2441 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2434
             ├─2515 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2511
             ├─2598 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2596
             ├─2678 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2677
             ├─2747 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2746
             ├─2818 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2817
             ├─2887 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2886
             ├─2974 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 2973
             ├─3043 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3042
             ├─3111 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3110
             ├─3180 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3179
             ├─3268 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3261
             ├─3341 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3340
             ├─3411 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3410
             ├─3479 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3478
             ├─3566 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3561
             ├─3637 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3636
             ├─3706 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3705
             ├─3776 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3775
             ├─3848 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3847
             ├─3938 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 3936
             ├─4008 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4007
             ├─4079 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4078
             ├─4150 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4148
             ├─4238 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4235
             ├─4310 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4309
             ├─4379 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4378
             ├─4450 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4449
             ├─4525 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4523
             ├─4611 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4610
             ├─4682 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4681
             ├─4751 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4750
             ├─4832 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4829
             ├─4910 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4909
             ├─4977 /usr/bin/balenad --experimental --log-driver=journald --storage-driver=overlay2 -H fd:// -H unix:///var/run/balena.sock -H unix:///var/run/balena-engine.sock --dns=10.114.102.1 --bip=10.114.101.1/24 --fixed-ci>
             ├─4978 /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 4977
             ├─4979 /bin/sh /usr/lib/balena/balena-healthcheck
             ├─4985 balena image inspect balena-healthcheck-image
             └─4993 balena-engine-containerd --config /var/run/balena-engine/containerd/containerd.toml --log-level info

Jul 25 13:28:03 neonlink balenad[4977]: time="2022-07-25T13:28:03.338329440Z" level=warning msg="Your kernel does not support CPU realtime scheduler"
Jul 25 13:28:03 neonlink balenad[4977]: time="2022-07-25T13:28:03.338456523Z" level=warning msg="Your kernel does not support cgroup blkio weight"
Jul 25 13:28:03 neonlink balenad[4977]: time="2022-07-25T13:28:03.338522877Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Jul 25 13:28:03 neonlink balenad[4977]: time="2022-07-25T13:28:03.338591679Z" level=warning msg="Your kernel does not support cgroup blkio throttle.read_bps_device"
Jul 25 13:28:03 neonlink balenad[4977]: time="2022-07-25T13:28:03.338658085Z" level=warning msg="Your kernel does not support cgroup blkio throttle.write_bps_device"
Jul 25 13:28:03 neonlink balenad[4977]: time="2022-07-25T13:28:03.338747043Z" level=warning msg="Your kernel does not support cgroup blkio throttle.read_iops_device"
Jul 25 13:28:03 neonlink balenad[4977]: time="2022-07-25T13:28:03.338825740Z" level=warning msg="Your kernel does not support cgroup blkio throttle.write_iops_device"
Jul 25 13:28:03 neonlink balenad[4977]: time="2022-07-25T13:28:03.339985163Z" level=info msg="Loading containers: start."
Jul 25 13:28:03 neonlink balenad[4977]: time="2022-07-25T13:28:03.341010732Z" level=error msg="failed to load container" container=166502a7abc46afb161306c6c3541b43b1aa1be4e77203122e5b6e244838f683 error="open /var/lib/docker/conta>
Jul 25 13:28:03 neonlink balenad[4977]: time="2022-07-25T13:28:03.346716336Z" level=error msg="failed to load container" container=31af5b098ff67e9635ec111c9ef967344885f4b1a145dc1938e4203958e09116 error="invalid character 'L' afte>

We also reproduced these errors with the 64-bit Raspberry Pi 4 images of version 2.98.*, immediately after flashing the images on 3 devices. As far as I can tell, these balenaOS images aren't working correctly. We reverted to the raspberrypi4-64 v2.95.8 images, and they have worked fine so far. I am guessing that the software versions in the v2.98 images are somehow not fully compatible.

Hi Bernhard, it's unlikely that there is something fundamentally wrong with the balenaOS 2.98 image series, as these images are actively being used across thousands of Raspberry Pi devices in balenaCloud fleets.
Have you tried creating an empty fleet and downloading the images from there? Those will not run any application and will prove that the images are fine. You can then try pushing an example app to confirm that they can run applications. After that, you can move the devices to your final fleet so we can debug the specific problem.
Messages of the type invalid character 'L' have been associated in the past with errors in the compose file and/or corruption in downloaded images.
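As a rough way to check for that kind of corruption, the sketch below walks a directory tree and lists JSON files that no longer parse. It is only an illustration under assumptions: the helper name `find_unparseable_json` is hypothetical, the directory layout (`/var/lib/docker/containers/<id>/config.v2.json`) is taken from the log earlier in this thread, and it assumes Python is available wherever the files can be read, for example on a workstation with a copy of the data.

```python
import json
from pathlib import Path


def find_unparseable_json(root):
    """Return (path, reason) pairs for *.json files under root that fail to parse."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing to scan; treat a missing directory as "no findings".
        return []
    problems = []
    for path in sorted(root_path.rglob("*.json")):
        try:
            json.loads(path.read_bytes().decode("utf-8"))
        except (OSError, ValueError) as exc:
            # ValueError covers both json.JSONDecodeError and UnicodeDecodeError.
            problems.append((str(path), f"{type(exc).__name__}: {exc}"))
    return problems
```

Calling `find_unparseable_json` on a copy of the containers directory returns one entry per damaged file. An empty result only means the JSON metadata parses; it says nothing about the image layers themselves.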

Martijn, your issue seems to have been caused by filesystem corruption. Even if you fix the errors, the engine storage is usually left in an inconsistent state. If you share the UUID and grant support access to the device for a long period, we can take a look.