I recently started with balena using the most recent jetson-xavier-nx development image, and I’m having some issues. It’s hard to pinpoint the problem, but here are some symptoms:
System is Sluggish
Overall, the system is very sluggish at performing any action. For example, balena push or any request to the Supervisor API is slow, and sometimes fails outright.
Supervisor Errors
I repeatedly get this error from the supervisor:
[error] Non-empty stderr stream from journalctl log fetching: Considering root directory '/run/log/journal'.
[error] Root directory /run/log/journal added.
[error] Considering directory '/run/log/journal/72df14572ac545eda3caf972ff8d6baa'.
[error] Directory /run/log/journal/72df14572ac545eda3caf972ff8d6baa added.
[error] Journal effective settings seal=no compress=no compress_threshold_bytes=8B
[error] File /run/log/journal/72df14572ac545eda3caf972ff8d6baa/system.journal added.
[error] File /run/log/journal/72df14572ac545eda3caf972ff8d6baa/system@a79d91c044364210896f3ba1b03e3b3a-000000000007bfaa-0005bd4e05aa067c.journal added.
[error] File /run/log/journal/72df14572ac545eda3caf972ff8d6baa/system@a79d91c044364210896f3ba1b03e3b3a-000000000007b02c-0005bd4e05a0129b.journal added.
[error] File /run/log/journal/72df14572ac545eda3caf972ff8d6baa/system@a79d91c044364210896f3ba1b03e3b3a-0000000000079abe-0005bd4e0597f396.journal added.
[error] File /run/log/journal/72df14572ac545eda3caf972ff8d6baa/system@a79d91c044364210896f3ba1b03e3b3a-00000000000785b0-0005bd4e03388dd4.journal added.
[error] File /run/log/journal/72df14572ac545eda3caf972ff8d6baa/system@a79d91c044364210896f3ba1b03e3b3a-0000000000076fbd-0005bd4e031fb6d9.journal added.
There are probably 100+ of these repeated messages after a minute. In addition, after a while this message also appears:
[error] Insufficient watch descriptors available. Reverting to -n.
When this happens, trying to grab the logs returns this error:
$ balena logs da55d9637fae -f
error from daemon in stream: Error grabbing logs: error getting journald fd: Too many open files
In addition, this debug message is output about 100 times in a row:
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[debug] Spawning journald with: chroot /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
And this one:
[error] Attempt to move to uninitialized object: 126232
[error] Failed to iterate through journal: Bad message
[error] Directory /run/log/journal/72df14572ac545eda3caf972ff8d6baa removed.
[error] Root directory /run/log/journal removed.
[error] mmap cache statistics: 28676 hit, 28 miss
[error]
Any ideas if these are related?
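To check whether the repeatedly spawned journalctl processes are piling up (which might explain the "Too many open files" error), something like the rough sketch below could help. It assumes Python 3 is available somewhere that can see the host's /proc, which may not be the case on balenaOS itself; the function name and the "journalctl" search string are just my own choices for illustration.

```python
import os


def count_matching_processes(proc_root="/proc", needle="journalctl"):
    """Return {pid: open_fd_count} for processes whose command line contains `needle`.

    Assumption: `proc_root` is a Linux procfs mount. Processes that exit
    mid-scan, or whose fd directory is not readable, are silently skipped.
    """
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                # cmdline arguments are NUL-separated
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            if needle not in cmdline:
                continue
            result[int(entry)] = len(os.listdir(os.path.join(base, "fd")))
        except OSError:
            continue
    return result


if __name__ == "__main__":
    matches = count_matching_processes()
    print(f"{len(matches)} matching processes")
    for pid, fds in sorted(matches.items()):
        print(f"pid {pid}: {fds} open file descriptors")
```

If the process count keeps growing between runs, that would suggest the follow processes are not being cleaned up before new ones are spawned.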
Long Response Time from Supervisor API
Additionally, any call to the Supervisor API takes around 30 seconds to complete. Is this normal?
For example:
http://192.168.50.10:48484/ping
Took 30 seconds to respond.
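To put numbers on this rather than eyeballing it, a small timing sketch along these lines could be used. It is only an illustration: the helper names, the 60-second timeout, and the five attempts are my own arbitrary choices, and the URL is the ping endpoint from above.

```python
import time
import urllib.request


def time_call(fetch):
    """Run `fetch()` and return (succeeded, elapsed_seconds).

    Network-level failures (timeouts, connection resets, URLError) are
    OSError subclasses and are reported as succeeded=False.
    """
    start = time.perf_counter()
    try:
        fetch()
        ok = True
    except OSError:
        ok = False
    return ok, time.perf_counter() - start


def ping(url="http://192.168.50.10:48484/ping", timeout=60):
    """Fetch the supervisor ping endpoint and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    for attempt in range(1, 6):
        ok, elapsed = time_call(ping)
        status = "ok" if ok else "failed"
        print(f"attempt {attempt}: {status} after {elapsed:.1f}s")
```

Running it a few times in a row should show whether the ~30 second delay is consistent or only happens intermittently.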
Running in Local Mode
balena push sometimes fails:
[Debug] Using build source directory: .
[Debug] Pushing to local device: fed25f4.local
[Debug] Checking we can access device
[Debug] Sending request to http://192.168.50.10:48484/ping
[Debug] Checking device supervisor version: 12.3.0
[Info] Starting build on device 192.168.50.10
[Debug] Loading project...
[Debug] Resolving project...
[Debug] docker-compose.yaml file found at "."
[Debug] Creating project...
[Debug] Tarring all non-ignored files...
[Debug] Sending request to http://192.168.50.10:48484/v2/local/device-info
ECONNRESET: read ECONNRESET
Error: read ECONNRESET
at TCP.onStreamRead (internal/stream_base_commons.js:205:27)
For further help or support, visit:
https://www.balena.io/docs/reference/balena-cli/#support-faq-and-troubleshooting
Conclusion(?)
Overall, it seems the system is unstable. Any suggestions would be greatly appreciated.
Thanks,