Cannot push to Balena device that appears to be functional: Supervisor API ECONNREFUSED

Hey @klutchell - I tried it - and I got the same issue with the current version downloaded from balenaOS - Run Docker containers on embedded IoT devices

root@eea4356:/etc# cat os-release
ID="balena-os"
NAME="balenaOS"
VERSION="2.94.4"
VERSION_ID="2.94.4"
PRETTY_NAME="balenaOS 2.94.4"
MACHINE="raspberrypi4-64"
META_BALENA_VERSION="2.94.4"
BALENA_BOARD_REV="1dd915c"
META_BALENA_REV="de2352eb"
SLUG="raspberrypi4-64"

Its the same issue as usual, management port / ping is not reachable after first boot.
After a reboot of the device, everything works perfectly.
Sorry to disappoint here.

I am btw using an older RPi 4 Revision with 4 GB of RAM

Hardware        : BCM2835
Revision        : c03111
Serial          : x
Model           : Raspberry Pi 4 Model B Rev 1.1

Cheers

Short add:
I tried it again with my “flagship” RPi, latest revision with 8 GB RAM, same issue:

Hardware        : BCM2835
Revision        : d03114
Serial          : x
Model           : Raspberry Pi 4 Model B Rev 1.4

Also, after a reboot, I could successfully reach the unavailable ping and push/build a new fleet to it :slight_smile:

@nmaas87 I just tested it again with my Pi4 and I could not reproduce. Are you modifying the config.json at all before/after flashing? You shouldn’t need to and I wonder if the supervisor needs rebooting to apply a setting you have added.

Hardware        : BCM2835
Revision        : c03112
Serial          : 100000002ff1470d
Model           : Raspberry Pi 4 Model B Rev 1.2
ID="balena-os"
NAME="balenaOS"
VERSION="2.94.4"
VERSION_ID="2.94.4"
PRETTY_NAME="balenaOS 2.94.4"
MACHINE="raspberrypi4-64"
META_BALENA_VERSION="2.94.4"
BALENA_BOARD_REV="1dd915c"
META_BALENA_REV="de2352eb"
SLUG="raspberrypi4-64"
{
  "deviceType": "raspberrypi4-64",
  "developmentMode": "true",
  "uuid": "REDACTED"
}

Could you also collect journal logs from the first boot window (since they are cleared on a reboot)? Something like journalctl -u balena-supervisor -u os-config-json -a --no-pager run on the hostOS of the device should be enough.

I do not do any changes to any files. Just flash with balenaEtcher, set sd card into RPi, apply ethernet cable and power. I just had the case where it seemed to work initially, then erased and reflashed the sdcard again - and it did again not work on first try. Only after powercycle. I am sorry, I will not be able to help here anymore as I am working nightshifts now and I got no time. For me, I will mark it down as “let it boot and powercycle to get it working” and move on.

I used the latest version (seems they changed something on the supervisor which created a new image) for this nights testing.

Here are the promised logs, thanks :slight_smile:

root@092ba38:~# ls
root@092ba38:~# journalctl -u balena-supervisor -u os-config-json -a --no-pager
-- Journal begins at Fri 2021-08-06 14:40:59 UTC, ends at Fri 2021-08-06 14:41:37 UTC. --
Aug 06 14:41:04 092ba38 os-config-json[1511]: [os-config-json][INFO] New balena configuration
Aug 06 14:41:05 092ba38 os-config-json[1511]: [os-config-json][INFO] New balena-info configuration
Aug 06 14:41:08 092ba38 os-config-json[1511]: [os-config-json][INFO] New development-features configuration
Aug 06 14:41:08 092ba38 os-config-json[1785]: Failed to print table: Broken pipe
Aug 06 14:41:23 092ba38 balena-supervisor[1992]: Error response from daemon: No such container: resin_supervisor
Aug 06 14:41:24 092ba38 balena-supervisor[2006]: Error response from daemon: No such container: balena_supervisor
Aug 06 14:41:24 092ba38 balena-supervisor[2021]: active
Aug 06 14:41:25 092ba38 balena-supervisor[2053]: Error: No such object: balena_supervisor
Aug 06 14:41:26 092ba38 balena-supervisor[2102]: WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [info]    Supervisor v12.11.36 starting up...
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [info]    Setting host to discoverable
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [warn]    Invalid firewall mode: . Reverting to state: off
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [info]    Applying firewall mode: off
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   Performing database cleanup for container log timestamps
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   Starting systemd unit: avahi-daemon.service
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   Starting systemd unit: avahi-daemon.socket
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   Starting logging infrastructure
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [info]    Starting up in unmanaged mode, activating local mode
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [info]    Starting firewall
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [info]    Switching logging backend to LocalLogBackend
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [success] Firewall mode applied
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   Starting api binder
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   Storing engine snapshot in the database. Timestamp: 2021-08-06T14:41:31.840Z
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: (node:1) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   Unmanaged mode is set, skipping API client initialization
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [event]   Event: Supervisor start {}
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   Connectivity check enabled: true
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   No API endpoint specified, skipping connectivity check
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   Starting periodic check for IP addresses
Aug 06 14:41:31 092ba38 balena-supervisor[2102]: [debug]   Handling of local mode switch is completed
Aug 06 14:41:32 092ba38 balena-supervisor[2102]: [success] Initialised splash image backend
Aug 06 14:41:32 092ba38 balena-supervisor[2102]: [info]    Reporting initial state, supervisor version and API info
Aug 06 14:41:32 092ba38 balena-supervisor[2102]: [info]    Attempting to load any preloaded applications
Aug 06 14:41:32 092ba38 balena-supervisor[2102]: [debug]   Migrated existing apps.json
Aug 06 14:41:32 092ba38 balena-supervisor[2102]: [info]    Starting API server
Aug 06 14:41:32 092ba38 balena-supervisor[2102]: [info]    Supervisor API successfully started on port 48484
Aug 06 14:41:32 092ba38 balena-supervisor[2102]: [info]    Unmanaged mode is set, skipping API binder initialization
Aug 06 14:41:32 092ba38 balena-supervisor[2102]: [debug]   Spawning journald with: chroot  /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
root@092ba38:~#

{
  "deviceType": "raspberrypi4-64",
  "developmentMode": "true",
  "uuid": "xxxxx"
}

root@092ba38:/etc# cat os-release
ID="balena-os"
NAME="balenaOS"
VERSION="2.94.4"
VERSION_ID="2.94.4"
PRETTY_NAME="balenaOS 2.94.4"
MACHINE="raspberrypi4-64"
META_BALENA_VERSION="2.94.4"
BALENA_BOARD_REV="1dd915c"
META_BALENA_REV="de2352eb"
SLUG="raspberrypi4-64"

Thanks for the additional logs! I see that the supervisor is correctly identifying the device as unmanaged and enabling local mode:

[info]    Starting up in unmanaged mode, activating local mode

I will close this issue from our side as we are unable to reproduce. If you would like to revisit this in the future you can just reply to this thread and we can pick up where we left off!

1 Like

Hi @klutchell - I had to replace an RPi 4 in the field as the balenaOS instance died today (somehow the SanDisk sd card broke and the RPi was stuck in a bootloop). as I had not much time, I just created a new sd card with the latest production image BalenaOS 2.95.8 / RPi4-64 and did the changes I do in our company field:

config.json

{
“deviceType”: “raspberrypi4-64”,
“persistentLogging”: false,
“hostname”: “OUR NAME”,
“developmentMode”: true,
“country”: “DE”,
“dnsServers”: “OUR DNS”,
“ntpServers”: “OUR NTP”,
“os”: {
“sshKeys”: [
“OUR SSH KEY”
]
}
}

config.txt


enable_uart=1
dtparam=i2c_arm=on
dtparam=spi=on
disable_splash=1
dtparam=audio=on

gpu_mem=192
hdmi_group=2
hdmi_mode=82
display_hdmi_rotate=0
hdmi_force_hotplug=1
avoid_warnings=1

and I also add custom system-connections and splash screens.

I threw the sdcard into the RPi, booted it up and went to my pc to push the code, but, again the 48484 was not reachable from the outside. Before ssh’ing into it and rebooting it (which solved the issue, as usual), I grabbed those files:

balena logs balena_supervisor

root@xxxxx:~# balena logs balena_supervisor
[info]    Supervisor v12.11.38 starting up...
[info]    Setting host to discoverable
[warn]    Invalid firewall mode: . Reverting to state: off
[info]    Applying firewall mode: off
[debug]   Starting systemd unit: avahi-daemon.service
[debug]   Starting systemd unit: avahi-daemon.socket
[debug]   Starting logging infrastructure
[debug]   Performing database cleanup for container log timestamps
[info]    Starting up in unmanaged mode, activating local mode
[info]    Starting firewall
[info]    Switching logging backend to LocalLogBackend
[success] Firewall mode applied
[debug]   Starting api binder
[debug]   Storing engine snapshot in the database. Timestamp: 2022-04-12T05:27:41.075Z
(node:1) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
[debug]   Unmanaged mode is set, skipping API client initialization
[event]   Event: Supervisor start {}
[debug]   Connectivity check enabled: true
[debug]   No API endpoint specified, skipping connectivity check
[debug]   Starting periodic check for IP addresses
[debug]   Handling of local mode switch is completed
[info]    Reporting initial state, supervisor version and API info
[info]    Attempting to load any preloaded applications
[debug]   Migrated existing apps.json
[info]    Starting API server
[info]    Supervisor API successfully started on port 48484
[info]    Unmanaged mode is set, skipping API binder initialization
[debug]   Spawning journald with: chroot  /mnt/root journalctl -a --follow -o json _SYSTEMD_UNIT=balena.service
[api]     GET /v1/healthy 200 - 14.849 ms
[api]     GET /v1/healthy 200 - 4.498 ms
[debug]   Attempting container log timestamp flush...
[debug]   Container log timestamp flush complete

As well as the output of journalctl, attached to this message.
journal.txt (98.8 KB)

As said, the device works (as usual) after a reboot, so I have no way of testing any further right now.
But maybe the logs help.

Cheers,

Nico

Thanks for the update @nmaas87 and the additional logs!

I see that we are getting the expected message at supervisor start:

[info]    Starting up in unmanaged mode, activating local mode
[info]    Starting firewall
[info]    Switching logging backend to LocalLogBackend
[success] Firewall mode applied

However the port still wasn’t reachable until a reboot, as you say. I would like to try to reproduce it on my own device again, but I would like to collect more info about your process if that’s alright with you?

  1. Your custom config.json looks fine at first glance. How/when do you inject this config file? Do you use a balena CLI command?
  2. Your custom config.txt also looks fine at first glance. How/when do you inject these settings and files?
  3. Are you able to reproduce this issue with a development image from balenaOS - Run Docker containers on embedded IoT devices, with no modifications to config.json or config.txt?
  4. How are you verifying that port 48484 is unreachable? You mentioned it failed to curl the http://192.168.4.123:48484/ping endpoint, but did you scan for open ports? What are the results of lsof -iTCP -sTCP:LISTEN on the hostOS when in this state?

As you can tell I’m trying to narrow down why this is only happening in your environment, as I haven’t seen any other reports of this so far. Not to say there isn’t an issue or bug, but it’s very hard to find when I can’t reproduce it!

Thanks for your assistance on this!

Hey there :slight_smile:

1 + 2: Both are replaced before the first boot. So I write the Sdcard with etcher, re-insert it into the PC and replace the files before ever booting up the device with this card.

  1. Just tested it with the latest RPi 4 64 Development Image. Yes, its the same as reported below on point 4. Makes no difference if using a production image and changing the config files before boot - or using a development image without changing anything.

  2. Yes, a push does not work:

PS > balena push 192.168.4.5
Timeout submitting analytics event to balenaCloud/openBalena.
If you are using the balena CLI in an air-gapped environment with a filtered
internet connection, set the BALENARC_OFFLINE_MODE=1 environment variable
when using CLI commands that do not strictly require access to balenaCloud.
Retrying "Supervisor API (GET http://192.168.4.5:48484/ping)" after 2.0s (1 of 5) due to: Error: connect ETIMEDOUT 192.168.4.5:48484
Retrying "Supervisor API (GET http://192.168.4.5:48484/ping)" after 4.0s (2 of 5) due to: Error: connect ETIMEDOUT 192.168.4.5:48484
Retrying "Supervisor API (GET http://192.168.4.5:48484/ping)" after 4.0s (3 of 5) due to: Error: connect ETIMEDOUT 192.168.4.5:48484
Retrying "Supervisor API (GET http://192.168.4.5:48484/ping)" after 4.0s (4 of 5) due to: Error: connect ETIMEDOUT 192.168.4.5:48484
Retrying "Supervisor API (GET http://192.168.4.5:48484/ping)" after 4.0s (5 of 5) due to: Error: connect ETIMEDOUT 192.168.4.5:48484
Could not communicate with device supervisor at address 192.168.4.5:48484.
Device may not have local mode enabled. Check with:
  balena device local-mode <device-uuid>

lsof:

root@loragw:~# lsof -iTCP -sTCP:LISTEN
COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
systemd    1   root   70u  IPv6  14664      0t0  TCP *:22222 (LISTEN)
balenad 1745   root   10u  IPv6  14041      0t0  TCP *:2375 (LISTEN)
dnsmasq 2021 nobody    5u  IPv4  16412      0t0  TCP loragw:domain (LISTEN)
dnsmasq 2021 nobody    7u  IPv4  16414      0t0  TCP loragw:domain (LISTEN)
node    2368   root   21u  IPv6  17794      0t0  TCP *:48484 (LISTEN)

As you can see, the process shows it is only listening via IPv6. I tried to curl 48484/ping via IPv6, however I get connection denied or timedout, as if the port is firewalled. Same with IPv4.

C:\Users\Nico Maas>curl -g -6 "http://[ipv6address]:48484/ping"
curl: (28) Failed to connect to ipv6address port 48484 after 21043 ms: Timed out

after a reboot, it still shows IPv6 useage…

COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
systemd    1   root   70u  IPv6  14800      0t0  TCP *:22222 (LISTEN)
balenad 1648   root   10u  IPv6  16035      0t0  TCP *:2375 (LISTEN)
dnsmasq 1878 nobody    5u  IPv4  12254      0t0  TCP loragw:domain (LISTEN)
dnsmasq 1878 nobody    7u  IPv4  12256      0t0  TCP loragw:domain (LISTEN)
node    2310   root   21u  IPv6  19792      0t0  TCP *:48484 (LISTEN)

however, this time it works

C:\Users\Nico Maas>curl -g -6 "http://[ipv6address]:48484/ping"
OK

C:\Users\Nico Maas>curl -g -4 "http://192.168.4.5:48484/ping"
OK

Also a reminder: I got a dual stack network, both going to WAN as well as in my LAN. So balenaOS gets both a IPv4 as well as an IPv6 address, however I tend to use the IPv4 only for deployment. In other locations I only got IPv4 (but the same issues).

Basically: Flash the RPi, boot it and reboot to make it work (or just pull the power after a first boot and repower)

Hey Nico, thank you for the extra digging! I was able to reproduce on my device and I’ve opened a GH issue on the supervisor for tracking:

You can follow this issue for further updates. I still have some more investigation to do, and I’m sure the supervisor team will weigh in as well. I appreciate your help with this so far!

1 Like

Thank you @klutchell - I am glad that I could be of service and you could dig up something.
I am now feeling less like a mad men, thanks for that :smiley:
While it is a minor inconvenience - I hope we can resolve that issue so that new customers are not falling into that.

Cheers!