After the upgrade to 2.41.0r3 the system has gone completely crackers, and the following happens every minute. Any idea? I don’t really know where to start on this one…
28.08.19 16:05:00 (+0200) wallboard Systemd init system enabled.
28.08.19 16:05:00 (+0200) wallboard systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
28.08.19 16:05:00 (+0200) wallboard Detected virtualization docker.
28.08.19 16:05:00 (+0200) wallboard Detected architecture arm.
28.08.19 16:05:00 (+0200) wallboard Set hostname to <b83d65ecdb5b>.
28.08.19 16:05:00 (+0200) wallboard Failed to bump fs.file-max, ignoring: Invalid argument
28.08.19 16:05:00 (+0200) wallboard Failed to attach 1 to compat systemd cgroup /docker/b83d65ecdb5bc27bd19d096c89942fb8b7376f30532ad899aaefac091d0a0803/init.scope: No such file or directory
28.08.19 16:05:00 (+0200) wallboard Failed to open pin file: No such file or directory
28.08.19 16:05:00 (+0200) wallboard Failed to allocate manager object: No such file or directory
28.08.19 16:05:00 (+0200) wallboard [!!!!!!] Failed to allocate manager object.
Hello, can you try modifying the entry.sh script so that systemd is not started in quiet mode? I see from your Dockerfile that you are copying your own entry.sh over the default one. Hopefully there is a line similar to https://github.com/balena-io-library/base-images/blob/master/examples/INITSYSTEM/systemd/systemd.v230/entry.sh#L82 in there. Can you try removing the quiet systemd.show_status=0 arguments from the SYSTEMD_LOG_LEVEL=info /sbin/init line, so we can see all the systemd logs?
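For reference, a minimal sketch of that edit, assuming the exec line in your entry.sh looks roughly like the one in the linked base-images example (the exact line in your copy may differ):

```shell
#!/bin/sh
# Work on a sample copy of the exec line; in practice you would edit
# the real entry.sh in your repo and rebuild the image.
echo 'exec env SYSTEMD_LOG_LEVEL=info /sbin/init quiet systemd.show_status=0' > /tmp/entry-line

# Drop only the flags that silence systemd's boot output, keeping
# /sbin/init as PID 1 so the container still boots normally.
sed -i 's| quiet systemd.show_status=0||' /tmp/entry-line

cat /tmp/entry-line
```

After rebuilding and pushing, systemd should print its unit-by-unit startup status instead of booting silently, which makes failures like the one above much easier to pin down.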
You have support access to 468439d25ffaf42298bb7caace3fb776 - it’s up and running at the moment but it probably won’t last long. I’ll update the thread if/when it does.
So far so good. Out of my five wallboards one is offline and I’m going to get it restarted shortly, and the other four are running happily. I’ll keep you posted…
@ajs1k I’ve tried to reproduce this issue locally using the Dockerfile you shared.
Unfortunately, I haven’t managed to reproduce it. (I had to trim out references to non-existent files, etc., to get it to build.)
I’ve tried to trace it by accessing the device as well. When I manually ran the image with balena run --rm IMAGE, it started just fine.
root@468439d:~# balena run --privileged --rm -i -t 18f9cc788806
Systemd init system enabled.
systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
Detected virtualization docker.
Detected architecture arm.
Welcome to Debian GNU/Linux 10 (buster)!
Set hostname to <f21409246a60>.
Failed to bump fs.file-max, ignoring: Invalid argument
File /lib/systemd/system/systemd-journald.service:12 configures an IP firewall (IPAddressDeny=any), but the local system does not support BPF/cgroup based firewalling.
Proceeding WITHOUT firewalling in effect! (This warning is only shown for the first loaded unit using IP firewalling.)
[ OK ] Reached target Remote File Systems.
[ OK ] Set up automount Arbitrary Executable File Formats File System Automount Point.
[ OK ] Listening on initctl Compatibility Named Pipe.
[ OK ] Reached target Slices.
[ OK ] Listening on udev Kernel Socket.
Which led me to believe something else is fishy here.
I stopped the supervisor, deleted the previous container, started the supervisor and the same image started in a new container just fine.
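For anyone hitting this later, that cycle was roughly the following (a sketch, not an official procedure; the supervisor unit name and the container ID are assumptions — on older balenaOS releases the unit may be called resin-supervisor, and you would substitute the ID shown by balena ps -a):

```shell
#!/bin/sh
# Hypothetical container ID -- use the one reported by `balena ps -a`.
CONTAINER=b83d65ecdb5b

if command -v balena >/dev/null 2>&1; then
    # Stop the supervisor so it doesn't immediately recreate the container.
    systemctl stop balena-supervisor
    # Remove the stuck container (force-remove in case it is still running).
    balena rm -f "$CONTAINER"
    # On restart, the supervisor starts a fresh container from the same image.
    systemctl start balena-supervisor
    RESULT="cycled"
else
    # Not on a balenaOS host; nothing to do.
    RESULT="skipped"
fi
echo "$RESULT"
```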
Strange. I wish there was an easy way to reproduce the issue as I’ve seen it on two forum threads now…
Would it be OK if I tried to reboot the device in an attempt to make it go back into the bad state? (This hunch comes from the other forum thread, where a reboot put the device into a bad state and a restart of the container fixed the issue: “Container reboot and ‘Failed to attach 1 to compat systemd cgroup’”.)
Feel free to reboot/restart/whatever you need - the device is unusable for us at the moment anyway.
The problem also doesn’t occur straight away: it can take up to about a day before it throws a fit and gets into this weird state.
For the rest of our fleet I might go back to 2.36 and see if that brings us stability. The problem is that, because of what we’re displaying on the wallboard, we have to log in manually every time it restarts, so it’s a fairly visible failure mode.
I’m seeing the issue again. This time I managed to spot something in the logs:
balena-engine crashed. Unfortunately, I didn’t see the initial part of the stack trace as the logs rotated. I’m going to try to reproduce and see if I can catch it in action.
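To keep the start of the next stack trace from being lost to rotation again, one option is to snapshot the engine’s journal to a file. This is only a sketch: the unit name here is an assumption — check the actual one on your device with systemctl list-units | grep -i balena, and use journalctl -f instead of -n to follow live:

```shell
#!/bin/sh
# Capture the last chunk of the engine's journal so a crash's stack
# trace survives log rotation. Unit name is assumed to be balena.service.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -u balena.service --no-pager -n 200 > /tmp/balena-engine.log 2>&1
    RESULT="captured"
else
    # journalctl not available in this environment.
    RESULT="skipped"
fi
echo "$RESULT"
```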