Host OS Version Stuck

Hi guys, my balenaOS device is operating as expected, but its Status is stuck on ‘Reboot in progress’. It’s been like this for a month. Please help.

Hey @07_Sam, I understand the device is operating as expected, but can you remember what action you performed before it got stuck on ‘Reboot in progress’? Was it literally a reboot initiated from the dashboard, or something else?

Hi Chris, I updated the Host OS version and when rebooting it got stuck in this loop. I am now unable to change the Host OS because it says the device is currently configuring.

image

Got it - is the host OS version displayed (5.1.8+rev1) the one you updated to, or is it the one you are trying to update from?

Either way, next steps would be to run the device diagnostics and health checks and share the results here, and we can troubleshoot further :slight_smile:
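
For anyone who wants to script a quick check alongside the dashboard diagnostics: the diagnostics output in this thread includes a request to the supervisor's local health endpoint (`curl --max-time 5 localhost:48484/v1/healthy`). Below is a minimal Python sketch of the same request. The function name `supervisor_healthy` and its `opener` parameter are made up for illustration, and it assumes Python is available somewhere that can reach that port (for example a container on the host network), which is not a given on the host OS itself.

```python
import urllib.error
import urllib.request


def supervisor_healthy(
    url="http://localhost:48484/v1/healthy",  # endpoint as it appears in the diagnostics
    timeout=5.0,                               # mirrors curl's --max-time 5
    opener=urllib.request.urlopen,             # hypothetical hook so the call can be faked in tests
):
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections and timeouts.
        return False


if __name__ == "__main__":
    print("supervisor healthy:", supervisor_healthy())
```

A 200 here only says the supervisor is responding. In the logs further down this thread the endpoint returns 200 while the state update keeps failing, so a passing check does not rule out this kind of problem.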

I am trying to update from 5.1.8. Here is the standard error under diagnostics:
— diagnose 4.23.0 —

— NOTE: not all commands are expected to succeed on all device types —

— COMMANDS —

— prefixing commands with ‘date --utc --rfc-3339=ns ; /usr/bin/time -o /dev/stdout timeout --preserve-status --kill-after=20 -v 10 bash -c’ —

— echo === BALENA === —

— curl --unix-socket /var/run/balena.sock http://./debug/pprof/goroutine?debug=2 —

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:–:-- --:–:-- --:–:-- 0
100 99347 0 99347 0 0 7842k 0 --:–:-- --:–:-- --:–:-- 8084k

— balena --version —

— balena images —

— balena ps -a —

— balena stats --all --no-stream —

— balena system df —

— balena volume ls —

— balena network ls —

— systemctl status balena --no-pager —

— journalctl --no-pager --no-hostname -n 200 -a -u balena —

— journalctl --no-pager --no-hostname -n 1000 -at balenad —

— balena inspect $(balena ps --all --quiet | tr “\n” " ") | jq “del(..Config.Env)” —

— balena network inspect $(balena network ls --quiet | tr “\n” " ") —

— test -f /mnt/state/balena-engine-storage-migration.log && cat /mnt/state/balena-engine-storage-migration.log —

— echo === BOOT === —

— systemd-analyze —

— systemd-analyze critical-chain —

— echo === HARDWARE === —

— cat /proc/cpuinfo —

— cat /proc/device-tree/model —

cat: /proc/device-tree/model: No such file or directory

— cat /proc/meminfo —

— ps —

— top -b -n 1 —

— cat /var/log/provisioning-progress.log —

cat: /var/log/provisioning-progress.log: No such file or directory

— df -h —

— df -ih —

— for i in /sys/class/thermal/thermal* ; do if [ -e $i/temp ]; then echo $i && cat $i/temp; fi ; done —

— for i in /sys/class/mmc_host/mmc*/mmc* ; do if [ -e $i/oemid ]; then echo $i; for j in manfid oemid name hwrev fwrev; do printf $j: && cat $i/$j; done; fi; done —

— free -h —

— ls -l /dev —

— lsusb -vvv —

can’t get debug descriptor: Resource temporarily unavailable
can’t get debug descriptor: Resource temporarily unavailable
can’t get debug descriptor: Resource temporarily unavailable
can’t get device qualifier: Resource temporarily unavailable
can’t get debug descriptor: Resource temporarily unavailable

— mmcli -L —

— mount —

— uname -a —

— echo === NETWORK === —

— /sbin/ip addr —

— cat /etc/resolv.conf —

— cat /proc/net/dev —

— cat /proc/net/snmp —

— cat /proc/net/udp —

— CURL_CA_BUNDLE=/tmp/tmp.wTbAL9Qfxt curl https://api.balena-cloud.com/ping

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:–:-- --:–:-- --:–:-- 0
0 0 0 0 0 0 0 0 --:–:-- 0:00:01 --:–:-- 0
0 0 0 0 0 0 0 0 --:–:-- 0:00:01 --:–:-- 0
100 2 0 2 0 0 0 0 --:–:-- 0:00:02 --:–:-- 0
100 2 0 2 0 0 0 0 --:–:-- 0:00:02 --:–:-- 0

— CURL_CA_BUNDLE=/tmp/tmp.wTbAL9Qfxt curl https://www.google.co.uk

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:–:-- --:–:-- --:–:-- 0
0 0 0 0 0 0 0 0 --:–:-- --:–:-- --:–:-- 0
0 0 0 0 0 0 0 0 --:–:-- 0:00:02 --:–:-- 0
100 169 0 169 0 0 56 0 --:–:-- 0:00:03 --:–:-- 56
100 20111 0 20111 0 0 6560 0 --:–:-- 0:00:03 --:–:-- 6561

— ifconfig —

— iptables -n -L —

— iptables -n -t nat -L —

— journalctl --no-pager --no-hostname -a -u ModemManager —

— journalctl --no-pager --no-hostname -n 200 -a -u “openvpn*” —

— ls -l /mnt/boot/system-connections —

— mmcli -m 0 —

error: couldn’t find modem

— netstat -ntl —

— nmcli --version —

— ping -c 1 -W 3 google.co.uk

— systemctl kill -s USR1 dnsmasq —

— systemctl status openvpn-resin --no-pager —

— echo === OS === —

— cat /etc/os-release —

— cat /mnt/boot/config.json | jq “. | with_entries(if .key | (contains("apiKey") or contains("deviceApiKey") or contains("pubnubSubscribeKey") or contains("pubnubPublishKey") or contains("mixpanelToken") or contains("wifiKey") or contains("files")) then .value = "" else . end)” —

— cat /mnt/boot/config.txt —

cat: /mnt/boot/config.txt: No such file or directory

— cat /mnt/boot/device-type.json —

— cat /mnt/boot/extlinux/extlinux.conf —

cat: /mnt/boot/extlinux/extlinux.conf: No such file or directory

— cat /mnt/boot/resinOS_uEnv.txt —

cat: /mnt/boot/resinOS_uEnv.txt: No such file or directory

— cat /mnt/boot/uEnv.txt —

cat: /mnt/boot/uEnv.txt: No such file or directory

— cat /mnt/conf/config.json | jq “. | with_entries(if .key | (contains("apiKey") or contains("deviceApiKey") or contains("pubnubSubscribeKey") or contains("pubnubPublishKey") or contains("mixpanelToken") or contains("wifiKey") or contains("files")) then .value = "" else . end)” —

cat: /mnt/conf/config.json: No such file or directory

— cat /mnt/data-disk/config.json | jq “. | with_entries(if .key | (contains("apiKey") or contains("deviceApiKey") or contains("pubnubSubscribeKey") or contains("pubnubPublishKey") or contains("mixpanelToken") or contains("wifiKey") or contains("files")) then .value = "" else . end)” —

cat: /mnt/data-disk/config.json: No such file or directory

— cat /var/log/messages —

cat: /var/log/messages: No such file or directory

— cat /var/log/provisioning-progress.log —

cat: /var/log/provisioning-progress.log: No such file or directory

— dmesg -T —

— find /mnt/data/*hup/*log -mtime -180 | xargs tail -n 250 -v —

— journalctl --no-pager --no-hostname --list-boots —

— journalctl --no-pager --no-hostname -n500 -a —

— journalctl --no-pager --no-hostname -pwarning -perr -a —

— ls -lR /proc/ 2>/dev/null | grep /data/ | grep (deleted) —

— ps —

— stat /var/lock/*hup.lock —

stat: cannot statx ‘/var/lock/*hup.lock’: No such file or directory

— sysctl -a —

— systemctl list-units --failed --no-pager —

— top -b -n 1 —

— grep -vE “/var/cache/ldconfig/aux-cache|md5sum|/etc/hostname|/etc/machine-id|/etc/balena-supervisor/supervisor.conf|/etc/resin-supervisor/supervisor.conf|/etc/systemd/timesyncd.conf|/home/root/.rnd” /resinos.fingerprint | md5sum --quiet -c —

grep: /resinos.fingerprint: No such file or directory
md5sum: ‘standard input’: no properly formatted checksum lines found

— echo === SUPERVISOR === —

— balena exec 7675c2ce018d cat /etc/resolv.conf —

— balena logs 7675c2ce018d —

— curl --max-time 5 localhost:48484/v1/healthy —

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:–:-- --:–:-- --:–:-- 0
100 2 100 2 0 0 123 0 --:–:-- --:–:-- --:–:-- 133

— journalctl --no-pager --no-hostname -n 200 -a -u balena-supervisor -u resin-supervisor —

— ls -lR /tmp/-supervisor/**/

— systemctl status balena-supervisor resin-supervisor --no-pager —

— tail -500 /var/log/supervisor-log/resin_supervisor_stdout.log —

tail: cannot open ‘/var/log/supervisor-log/resin_supervisor_stdout.log’ for reading: No such file or directory

— echo === TIME === —

— cat /tmp/chrony_added_dhcp_ntp_servers —

cat: /tmp/chrony_added_dhcp_ntp_servers: No such file or directory

— chronyc sources —

timeout: sending signal TERM to command ‘bash’

— chronyc tracking —

— date —

— journalctl --no-pager --no-hostname -u chronyd —

— timedatectl status —

— uptime —

I also get lots of this in the standard output:

— journalctl --no-pager --no-hostname -u chronyd —

2024-04-24 06:18:18.274826148+00:00
Apr 24 04:35:55 healthdog[405481]: try: 1, refid: 293CCF28, correction: 0.026366903, skew: 0.186
Apr 24 04:37:55 healthdog[405685]: try: 1, refid: 293CCF28, correction: 0.026340852, skew: 0.186
Apr 24 04:39:55 healthdog[405876]: try: 1, refid: 293CCF28, correction: 0.026314802, skew: 0.186
Apr 24 04:41:55 healthdog[406066]: try: 1, refid: 293CCF28, correction: 0.026288753, skew: 0.186
Apr 24 04:43:55 healthdog[406259]: try: 1, refid: 293CCF28, correction: 0.026262702, skew: 0.186
Apr 24 04:45:55 healthdog[406451]: try: 1, refid: 293CCF28, correction: 0.026236653, skew: 0.186
Apr 24 04:47:55 healthdog[406705]: try: 1, refid: 293CCF28, correction: 0.026210602, skew: 0.186
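
The healthdog lines above repeat at a fixed interval with nearly identical values, which looks like a routine periodic check rather than an error. A rough Python sketch to confirm that from a pasted log is below. The helper names (`parse_healthdog`, `summarize_healthdog`) are hypothetical, the year is assumed because the journal lines do not carry one, and the units of `correction` and `skew` are not stated in the log, so they are left uninterpreted.

```python
import re
from datetime import datetime

# Three of the lines quoted above, copied as-is.
SAMPLE = """\
Apr 24 04:35:55 healthdog[405481]: try: 1, refid: 293CCF28, correction: 0.026366903, skew: 0.186
Apr 24 04:37:55 healthdog[405685]: try: 1, refid: 293CCF28, correction: 0.026340852, skew: 0.186
Apr 24 04:39:55 healthdog[405876]: try: 1, refid: 293CCF28, correction: 0.026314802, skew: 0.186
"""

ENTRY_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) healthdog\[\d+\]: "
    r"try: \d+, refid: \w+, "
    r"correction: (?P<corr>-?[\d.]+), skew: (?P<skew>-?[\d.]+)"
)


def parse_healthdog(text, year=2024):
    """Return (timestamp, correction, skew) tuples for matching lines."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if not m:
            continue  # ignore anything that is not a healthdog entry
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        entries.append((ts, float(m.group("corr")), float(m.group("skew"))))
    return entries


def summarize_healthdog(entries):
    """Report entry count, largest absolute correction, and distinct gaps between entries."""
    if not entries:
        return {"count": 0, "max_abs_correction": None, "gaps_seconds": []}
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(entries, entries[1:])]
    return {
        "count": len(entries),
        "max_abs_correction": max(abs(c) for _, c, _ in entries),
        "gaps_seconds": sorted(set(gaps)),
    }


if __name__ == "__main__":
    print(summarize_healthdog(parse_healthdog(SAMPLE)))
```

A single gap value and a steady correction suggest these entries are background noise for this issue, not its cause.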

Apologies for the spam. Here’s another status log which looks to contain the error:

— systemctl status balena-supervisor resin-supervisor --no-pager —

2024-04-24 06:18:06.272328378+00:00
● balena-supervisor.service - Balena supervisor
Loaded: loaded (/lib/systemd/system/balena-supervisor.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/balena-supervisor.service.d
└─balena-supervisor-conf.conf
Active: active (running) since Sun 2024-04-21 07:24:16 UTC; 2 days ago
Process: 2569 ExecStartPre=/usr/bin/balena stop resin_supervisor (code=exited, status=1/FAILURE)
Process: 2619 ExecStartPre=/usr/bin/balena stop balena_supervisor (code=exited, status=0/SUCCESS)
Process: 2626 ExecStartPre=/bin/systemctl is-active balena.service (code=exited, status=0/SUCCESS)
Process: 2627 ExecStartPre=/usr/sbin/gen-conf-unit balena-supervisor (code=exited, status=0/SUCCESS)
Main PID: 2645 (start-balena-su)
Tasks: 13 (limit: 4489)
Memory: 13.0M
CGroup: /system.slice/balena-supervisor.service
├─ 2645 /bin/sh /usr/bin/start-balena-supervisor
├─ 2646 /proc/self/exe --healthcheck /usr/lib/balena-supervisor/balena-supervisor-healthcheck --pid 2645
└─ 2715 balena start --attach balena_supervisor

Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [debug] Replacing container for service node-red because of config changes:
Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [debug] Non-array fields: {“added”:{},“deleted”:{“entrypoint”:{},“environment”:{},“labels”:{},“healthcheck”:{“test”:{}}},“updated”:{“image”:“registry2.balena-cloud.com/v2/9c3e88d34046cd54c0009fb76cb24569@sha256:f6dcdb0f45b1e38d268c0c4c5a63f8d442204a9c892089cda14067c869a243c1",“environment”:{“BALENA_HOST_OS_VERSION”:"balenaOS 5.2.0”,“RESIN_HOST_OS_VERSION”:“balenaOS 5.2.0”},“workingDir”:“”,“user”:“”,“healthcheck”:{“test”:{“0”:“NONE”}}}}
Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [debug] Array Fields: devices
Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [error] Scheduling another update attempt in 900000ms due to failure: Error: Failed to apply state transition steps. Steps:[“noop”,“noop”,“fetch”]
Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [error] at fn (/usr/src/app/dist/app.js:10:9867)
Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [error] Device state apply error Error: Failed to apply state transition steps. Steps:[“noop”,“noop”,“fetch”]
Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [error] at fn (/usr/src/app/dist/app.js:10:9867)
Apr 24 06:17:39 a39a7aa balena-supervisor[2715]: [api] GET /v1/device 200 - 43.116 ms
Apr 24 06:17:50 a39a7aa balena-supervisor[2715]: [api] GET /v1/healthy 200 - 2.943 ms
Apr 24 06:18:06 a39a7aa balena-supervisor[2715]: [api] GET /v1/healthy 200 - 3.885 ms

real 0m 0.03s
user 0m 0.00s
sys 0m 0.01s

— tail -500 /var/log/supervisor-log/resin_supervisor_stdout.log —

2024-04-24 06:18:06.325276947+00:00
Command exited with non-zero status 1
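
The supervisor status above is long, but the useful part is the handful of `[error]` lines. A small Python sketch for pulling those out of a pasted journal is below. The function name `summarize_supervisor_errors` is hypothetical, the sample lines are taken from the log above with the forum's curly quotes replaced by straight ones, and the regexes only cover the message shapes seen in this thread.

```python
import re

# Lines taken from the supervisor journal quoted above (quotes normalized).
SAMPLE = """\
Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [debug] Array Fields: devices
Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [error] Scheduling another update attempt in 900000ms due to failure: Error: Failed to apply state transition steps. Steps:["noop","noop","fetch"]
Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [error] at fn (/usr/src/app/dist/app.js:10:9867)
Apr 24 06:17:05 a39a7aa balena-supervisor[2715]: [error] Device state apply error Error: Failed to apply state transition steps. Steps:["noop","noop","fetch"]
Apr 24 06:17:50 a39a7aa balena-supervisor[2715]: [api] GET /v1/healthy 200 - 2.943 ms
"""

LINE_RE = re.compile(r"balena-supervisor\[\d+\]: \[(?P<level>\w+)\]\s+(?P<msg>.*)")
RETRY_RE = re.compile(r"another update attempt in (\d+)ms")
STEPS_RE = re.compile(r"Steps:\[(.*?)\]")


def summarize_supervisor_errors(log_text):
    """Collect error messages, the retry delay, and the listed transition steps."""
    errors, retry_ms, steps = [], None, None
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m or m.group("level") != "error":
            continue
        msg = m.group("msg")
        if msg.startswith("at "):
            continue  # stack-trace frame, not a separate error
        errors.append(msg)
        retry = RETRY_RE.search(msg)
        if retry:
            retry_ms = int(retry.group(1))
        found = STEPS_RE.search(msg)
        if found:
            # Strip straight or curly quotes around each step name.
            steps = [s.strip().strip('"\u201c\u201d') for s in found.group(1).split(",")]
    return {
        "error_count": len(errors),
        "retry_minutes": retry_ms / 60000 if retry_ms is not None else None,
        "steps": steps,
    }


if __name__ == "__main__":
    print(summarize_supervisor_errors(SAMPLE))
```

On the sample this reports two errors, a retry delay of 15 minutes (900000 ms), and the steps `noop`, `noop`, `fetch`. So the supervisor keeps retrying the same state transition every 15 minutes, even though `/v1/healthy` still returns 200.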

Okay, it’s resolved. I believe I updated the supervisor version (not sure what it was) and that fixed it.