Memory issues with supervisor

Hi,

I’m testing balenaOS 2.60.1+rev1. I currently have memory issues that result in my container not running. I suspect this influences ModemManager too. I think this issue existed in my project before 2.60 but has only now become visible to us. I’m stuck analysing what the exact problem is. I think it may be a memory leak. The container cannot start because of the following error:

     Service exited 'neonlink_build sha256:9fa86c43551f19cfb5a56b05f207ac722bb0d27f49e6d8fe9766f1435c4f0f19'
Restarting service 'neonlink_build sha256:9fa86c43551f19cfb5a56b05f207ac722bb0d27f49e6d8fe9766f1435c4f0f19'
 neonlink_build  ./runall-dist.sh: line 26: 31126 Segmentation fault      (core dumped) ps "$PID"
 neonlink_build       31127                       (core dumped) | grep "$PID" > /dev/null
 neonlink_build  22 stopped

This means that it accesses memory that it doesn’t have access to. This problem came up when I ran Diagnostics on the device. Before that it ran fine, but still had high memory usage: more than 800 MB of 924 MB.

When stopping and starting the container, I would expect the memory usage to drop to 400/500 MB, but it’s still 787 MB.
When I run top on the host, I get this result:

    1388     1 root     S     976m 105%   0% /usr/bin/balenad --experimental --log-driver=journald -s aufs -H fd:// -H unix:///var/run/balena.sock -H unix:///var/run/balena-engine.sock --dns 10.114.102.1 --bip 10.114.101.1/24 --fixed-cidr=10.114.101.0/25 --max-download-attempts=10 --exec-opt native.cgroupdriver=systemd
 1444  1388 root     S     975m 105%   0% balena-engine-containerd --config /var/run/balena-engine/containerd/containerd.toml --log-level info
18067  1444 root     S     910m  98%   0% balena-engine-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/92c1c6a696d33b895eff4607a8bfc7adce5e9ca8305febb177c3962c112dd422 -address /var/run/balena-engine/containerd/balena-engine-containerd.sock -containerd-binary /u
18027 17975 root     S     902m  97%   0% balena run --privileged --name resin_supervisor --restart=always --net=host --cidenv=SUPERVISOR_CONTAINER_ID --mount type=bind,source=/var/run/balena-engine.sock,target=/var/run/balena-engine.sock --mount type=bind,source=/mnt/boot/config.json,target=/boot/config.json --mount type=bi
18086 18067 root     S     140m  15%   0% node /usr/src/app/dist/app.js
 1376     1 root     S    67464   7%   0% /usr/sbin/NetworkManager --no-daemon
 1278     1 root     S    51516   5%   0% /usr/sbin/ModemManager --log-journal
 1260     1 root     S    39640   4%   0% /usr/sbin/rngd -f -r /dev/hwrng
  872     1 root     S    26204   3%   0% /lib/systemd/systemd-journald
 1358     1 root     S    25980   3%   0% /usr/libexec/qmi-proxy
    1     0 root     S    25488   3%   0% {systemd} /sbin/init

 1270     1 root     S    12092   1%   0% /usr/sbin/chronyd -d
 1265     1 root     S     8956   1%   1% @sbin/plymouthd --tty=tty1 --mode=boot --pid-file=/run/plymouth/pid --attach-to-session --kernel-command-line=plymouth.ignore-serial-consoles splash
 1439     1 root     S     8492   1%   0% /usr/sbin/wpa_supplicant -u
12587     1 openvpn  S     5432   1%   0% /usr/sbin/openvpn --writepid /run/openvpn/openvpn.pid --cd /etc/openvpn/ --config /etc/openvpn/openvpn.conf --connect-retry 5 120
 1313     1 root     S     5120   1%   0% /lib/systemd/systemd-logind
 1432     1 root     S     5056   1%   0% /usr/libexec/bluetooth/bluetoothd --experimental
28884     1 root     S     4536   0%   0% sshd: root@notty
28722     1 root     S     4420   0%   0% sshd: root@pts/1
  904     1 root     S     4316   0%   0% /lib/systemd/systemd-udevd
 1366     1 avahi    S     4076   0%   0% avahi-daemon: running [fea1261.local]
 1367  1366 avahi    S     3688   0%   0% avahi-daemon: chroot helper
 1305     1 messageb S     3656   0%   0% /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
17977 17975 root     S     3468   0%   0% /proc/self/exe --healthcheck /usr/lib/resin-supervisor/resin-supervisor-healthcheck --pid 17975
 1391  1388 root     S     3468   0%   0% /proc/self/exe --healthcheck /usr/lib/balena/balena-healthcheck --pid 1388
32373 29128 root     S     3040   0%   0% nslookup api.balena-cloud.com 62.140.140.251
 1386     1 nobody   S     2948   0%   0% /usr/bin/dnsmasq -x /run/dnsmasq.pid -a 127.0.0.2,10.114.102.1 -7 /etc/dnsmasq.d/ -r /etc/resolv.dnsmasq -z --servers-file=/run/dnsmasq.servers -k --log-facility=-
28971 28969 root     S     2896   0%   0% jq -s add | {checks:.}
28895 28884 root     S     2624   0%   0% bash -s -- --balenaos-registry registry2.balena-cloud.com
29128 29127 root     S     2624   0%   0% bash -s -- --balenaos-registry registry2.balena-cloud.com
28969 28895 root     S     2624   0%   0% bash -s -- --balenaos-registry registry2.balena-cloud.com
28970 28969 root     S     2624   0%   0% bash -s -- --balenaos-registry registry2.balena-cloud.com
29127 28970 root     S     2624   0%   0% bash -s -- --balenaos-registry registry2.balena-cloud.com
17975     1 root     S     2448   0%   0% {start-resin-sup} /bin/sh /usr/bin/start-resin-supervisor
28732 28722 root     S     2448   0%   0% /bin/bash -l
28752 28732 root     R     2364   0%   0% top
 1428     1 root     S     1464   0%   0% /usr/bin/hciattach /dev/serial1 bcm43xx 460800 noflow - b8:27:eb:82:67:22
24139     2 root     IW       0   0%   0% [kworker/u8:3-br]
27876     2 root     IW       0   0%   0% [kworker/2:1-eve]
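A note on reading the listing above: a single process shown at 105% of memory cannot be resident memory, so the size column is most likely virtual size (VSZ). The sketch below parses rows like these, assuming the BusyBox `top` column order PID, PPID, USER, STAT, VSZ, %VSZ, %CPU, COMMAND (the header is not shown in the post, so this is an assumption). The function names are hypothetical, and the two sample rows are shortened from the listing.

```python
# Assumed BusyBox `top` column order: PID PPID USER STAT VSZ %VSZ %CPU COMMAND.
# Limitation: rows whose STAT field contains a space are not handled.

def parse_vsz_kib(field):
    """Convert a VSZ field such as '976m' or '67464' to KiB."""
    if field.endswith("m"):
        return int(field[:-1]) * 1024
    return int(field)


def parse_top_line(line):
    """Split one process row into a dict; the command keeps its spaces."""
    pid, ppid, user, stat, vsz, vsz_pct, cpu_pct, command = line.split(None, 7)
    return {
        "pid": int(pid),
        "ppid": int(ppid),
        "user": user,
        "stat": stat,
        "vsz_kib": parse_vsz_kib(vsz),
        "vsz_pct": int(vsz_pct.rstrip("%")),
        "cpu_pct": int(cpu_pct.rstrip("%")),
        "command": command.strip(),
    }


sample = [
    "1388     1 root     S     976m 105%   0% /usr/bin/balenad --experimental",
    "1376     1 root     S    67464   7%   0% /usr/sbin/NetworkManager --no-daemon",
]
rows = [parse_top_line(line) for line in sample]

# Anything above 100% of RAM can only be virtual address space, not resident pages.
over_ram = [row["pid"] for row in rows if row["vsz_pct"] > 100]
print(over_ram)
```

If that reading is right, the large values for balenad, containerd, and the shim say little about how much physical memory they actually hold.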

When executing journalctl -u resin-supervisor --no-pager:

Jan 04 14:08:40 fea1261 systemd[1]: resin-supervisor.service: Failed to run 'start-pre' task: Bad message
Jan 04 14:08:40 fea1261 systemd[1]: resin-supervisor.service: Failed with result 'resources'.
Jan 04 14:08:40 fea1261 systemd[1]: Failed to start Balena supervisor.
Jan 04 14:08:51 fea1261 systemd[1]: resin-supervisor.service: Failed to load environment files: Bad message
Jan 04 14:08:51 fea1261 systemd[1]: resin-supervisor.service: Failed to run 'start-pre' task: Bad message
Jan 04 14:08:51 fea1261 systemd[1]: resin-supervisor.service: Failed with result 'resources'.
Jan 04 14:08:51 fea1261 systemd[1]: Failed to start Balena supervisor.

Here is a global journal: https://pastebin.com/Tz2Q83HT

Update:

I’ve noticed that memory usage increases from ~500 MB to 747 MB after executing diagnostics. When I execute diagnostics a second time, wwan0 disappears. Here’s the log:
http://ix.io/2KPA

Hi @Martijn, I see a lot of errors related to failed attempts to interact with the balena-engine socket. What is the status of the balena-engine service? Is it running? Do commands from the host such as balena ps succeed in that scenario?

I think the segmentation fault is caused by a faulty SD card. But I still see high memory usage after restarting the service.

When I execute balena stats:

CONTAINER ID        NAME                             CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
e9a924c7107b        resin_supervisor                 1.01%               41.67MiB / 924MiB   4.51%               0B / 0B             32.8MB / 1.04MB     10
ec2c7a4fdbbd        neonlink_build_3125138_1649221   6.21%               79.96MiB / 924MiB   8.65%               0B / 0B             0B / 12.3kB         34

Memory goes to 793 MB/924 MB and peaks at 900 MB for a short while. On boot, memory usage is 600 MB.
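To put the balena stats output next to the overall figure, the container usage can be added up. This is a minimal sketch that assumes the table layout pasted above (columns separated by two or more spaces, sizes in binary units); the helper names `to_mib` and `container_memory` are made up for illustration.

```python
import re

# Conversion factors to MiB for the binary units seen in the MEM USAGE column.
TO_MIB = {"B": 1 / 1024**2, "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def to_mib(text):
    """Convert a size such as '41.67MiB' to a float number of MiB."""
    match = re.fullmatch(r"([\d.]+)\s*(B|KiB|MiB|GiB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * TO_MIB[match.group(2)]


def container_memory(stats_output):
    """Return {container name: (usage_mib, limit_mib)} from the stats table."""
    usage = {}
    for line in stats_output.strip().splitlines():
        cols = re.split(r"\s{2,}", line.strip())
        if cols[0] == "CONTAINER ID":
            continue  # header row
        used, limit = cols[3].split("/")
        usage[cols[1]] = (to_mib(used), to_mib(limit))
    return usage


STATS = """\
CONTAINER ID        NAME                             CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
e9a924c7107b        resin_supervisor                 1.01%               41.67MiB / 924MiB   4.51%               0B / 0B             32.8MB / 1.04MB     10
ec2c7a4fdbbd        neonlink_build_3125138_1649221   6.21%               79.96MiB / 924MiB   8.65%               0B / 0B             0B / 12.3kB         34
"""

usage = container_memory(STATS)
total_mib = sum(used for used, _ in usage.values())
print(f"containers: {total_mib:.2f} MiB of {usage['resin_supervisor'][1]:.0f} MiB")
```

With the numbers above, the two containers together account for roughly 122 MiB, so most of the reported ~793 MB would have to come from somewhere else, for example host processes or page cache, if the two measurements are comparable.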

Hey there,

memory goes to 793 MB/924 MB and peaks to 900 MB for a short while

Can you please identify which processes are consuming the majority of the memory after a reboot?
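One way to answer this is to rank processes by resident memory (the `VmRSS` field in `/proc/<pid>/status`) rather than by virtual size. The following is only a sketch: it assumes Python is available wherever it runs (for example in a container that can see the host's `/proc`), and `rss_by_process` is a hypothetical helper name.

```python
from pathlib import Path


def rss_by_process(proc_root="/proc"):
    """Return (rss_kib, pid, name) tuples, largest resident set first."""
    results = []
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            text = status.read_text()
        except OSError:
            continue  # the process exited while we were scanning
        fields = {}
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key] = value.strip()
        if "VmRSS" not in fields:
            continue  # kernel threads have no resident set
        rss_kib = int(fields["VmRSS"].split()[0])
        results.append((rss_kib, int(status.parent.name), fields.get("Name", "?")))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    for rss_kib, pid, name in rss_by_process()[:10]:
        print(f"{rss_kib / 1024:8.1f} MiB  {pid:>6}  {name}")
```

Keep in mind that pages shared between processes are counted in each process's RSS, so summing the column overstates the total.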

Perhaps you can also try the newer 2.65.0+rev1 version of balenaOS and let us know if you still see similar issues?

Also, can you please confirm whether you flashed a new SD card with version 2.60 or upgraded an existing device?

Hi, yes, I just tried it. Memory usage is the lowest ever, around 300 MB :slight_smile:

The memory usage stats are a bit off due to the delay, which kept me from fully understanding what the issue was. But the weird memory peaks have disappeared.
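For anyone who wants to sidestep the delayed stats, memory can be sampled directly on the device from `/proc/meminfo`. The sketch below is one possible way to do that, not something used in this thread; `parse_meminfo` and `memory_summary` are hypothetical names. It separates "used" (MemTotal minus MemAvailable) from page cache, since cache can make usage look high even after a container stops.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values mostly in KiB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_summary(info):
    """Summarise memory in MiB; 'used' excludes memory the kernel can reclaim."""
    cache_kib = info.get("Cached", 0) + info.get("Buffers", 0)
    return {
        "total_mib": info["MemTotal"] / 1024,
        "used_mib": (info["MemTotal"] - info["MemAvailable"]) / 1024,
        "free_mib": info["MemFree"] / 1024,
        "cache_mib": cache_kib / 1024,
    }
```

On the device this could be called as `memory_summary(parse_meminfo(open("/proc/meminfo").read()))`, repeated every few seconds to catch short peaks.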