Device offline, even if active on network, after updating to balenaOS 2.29.2+rev1

support
raspberrypi3
network

#1

I’ve updated one of my RPi3s to balenaOS 2.29.2+rev1.
The OS update completed successfully, but the device now reports the correct OS version along with Supervisor 7.4.3, while it should be 9.0.1. Also, the device is reported as offline, even though it is actually active on my network! In fact, it seems to be trying to download a new version of my app, as shown by the logs.
Is anyone at balena interested in taking a look? If not, I will flash the device again.
Regards,
Danilo


#6

Hi @daghemo, thank you for reporting this issue and giving us the chance to investigate further. I am writing an e-mail to the address you’ve registered in the forum account, so you can reply with the device UUID or dashboard URL for our support team to investigate the issue. Thanks!


#8

Hi @pdcastro, I’ve added my SSH key to config.json in order to connect to port 22222, but all I get is a “software caused a connection abort” error. Also, I see no banner when I telnet to the port. I don’t know if this helps.


#9

Verbose logs do not help here: the device closes the connection as soon as the port is opened and the SSH client sends its banner. The server does not show its own banner or any kind of error; it just closes the connection.


#11

The investigation of the device continued in a private conversation. I will leave our findings here, as they might be useful to others:

The web dashboard reports the device offline, which almost certainly means the device failed to start the balena VPN. We also know that the host OS upgrade failed to upgrade the supervisor, and saw in the web dashboard error messages similar to:

Error cleaning up <hash>: (HTTP code 409) conflict - conflict: unable to delete (cannot be forced) - image has dependent child images - will ignore for 1 hour

This error message appears when there are two supervisor images and the supervisor fails to delete one of them, an issue that we have recently addressed in the cloud backend and which should not recur.


#12

Related thread:


#13

Thank you all, guys!


#14

Hi @pdcastro,

I am facing the same issue. After upgrading a production device to the latest balenaOS, the device is offline, but I am still receiving logs:

13.02.19 16:57:45 (+0100) Service is already running 'main sha256:66dc3934e5db4b6680cfe021e34aac65eb8a6b189026b76703907d99d44a7899'
13.02.19 16:57:47 (+0100) Downloading image 'registry2.balena-cloud.com/v2/2f67f4c33a897aac449de01351ce6016@sha256:0742a565291f2dfd3ba9bf8df337114734210cf536eaa36c27818b124cd1f402'

This keeps repeating and the state of the device does not change.


#16

@pavelbinar
Unfortunately, when a device is not able to establish a VPN connection to the balena backend, there are only very limited ways to still access it.
In any case, you should make sure that the device is able to connect to the VPN. Networking requirements are listed in https://www.balena.io/docs/reference/OS/network/2.x/#network-requirements.
For the VPN, connectivity to vpn.balena-cloud.com:443 is essential.
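
As a rough way to check this requirement, the sketch below tries to open a TCP connection to a given host and port. It only shows whether the network lets the connection through, not whether the VPN itself works. The function name check_tcp and the 5-second default timeout are illustrative choices; the sketch assumes that bash and the timeout utility are available on the machine where it runs (for example, a laptop on the same network as the device).

```shell
#!/bin/sh
# Sketch: report whether a TCP connection to host:port can be opened.
# check_tcp is a hypothetical helper name, not part of any balena tooling.
check_tcp() {
  host="$1"
  port="$2"
  seconds="${3:-5}"   # give up after this many seconds (default: 5)
  # bash's /dev/tcp/<host>/<port> redirection opens a TCP connection.
  # The attempt runs in a child bash so that `timeout` can stop it.
  if timeout "$seconds" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "OK: ${host}:${port} is reachable"
  else
    echo "FAIL: cannot open ${host}:${port}"
    return 1
  fi
}

# Example: check_tcp vpn.balena-cloud.com 443
```

A FAIL result from a machine on the same network as the device suggests that a firewall or router is blocking outbound traffic on that port.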

If you have physical access to the device, you could try modifying the SD card by adding a public SSH key to the config.json file located in the boot partition. This will allow you to SSH into the device from the local network. How to do this is described in https://github.com/balena-os/meta-balena/blob/master/README.md#sshkeys
For development versions of the OS, you can simply SSH into the device using ssh root@<ipAddress> -p 22222, where <ipAddress> is the device’s IP address.
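
For illustration, the config.json change has roughly the shape shown below, as far as I understand the README linked above: the public key is added to an sshKeys array under the os object. The key string is a placeholder, and all other entries already present in config.json must be kept as they are.

```json
{
  "os": {
    "sshKeys": [
      "ssh-rsa AAAA...placeholder-public-key... user@laptop"
    ]
  }
}
```

Only the public key (for example, the contents of an id_rsa.pub file) belongs here, never the private key.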

Regards
Thomas


#17

Hi, I have also encountered the same issue: my dashboard shows my device as offline, while my console in The Things Network shows my gateway as connected. (I’m using balenaCloud to run my gateway, which connects to The Things Network.) The balenaOS version is also 2.29.2+rev1.

Part of the log messages:
15.02.19 14:59:01 (+0800) main
15.02.19 14:59:01 (+0800) main ##### 2019-02-15 06:59:01 GMT #####
15.02.19 14:59:01 (+0800) main ### [UPSTREAM] ###
15.02.19 14:59:01 (+0800) main # RF packets received by concentrator: 1
15.02.19 14:59:01 (+0800) main # CRC_OK: 0.00%, CRC_FAIL: 100.00%, NO_CRC: 0.00%
15.02.19 14:59:01 (+0800) main # RF packets forwarded: 0 (0 bytes)
15.02.19 14:59:01 (+0800) main # PUSH_DATA datagrams sent: 0 (0 bytes)
15.02.19 14:59:01 (+0800) main # PUSH_DATA acknowledged: 0.00%
15.02.19 14:59:01 (+0800) main ### [DOWNSTREAM] ###
15.02.19 14:59:01 (+0800) main # PULL_DATA sent: 0 (0.00% acknowledged)
15.02.19 14:59:01 (+0800) main # PULL_RESP(onse) datagrams received: 0 (0 bytes)
15.02.19 14:59:01 (+0800) main # RF packets sent to concentrator: 0 (0 bytes)
15.02.19 14:59:01 (+0800) main # TX errors: 0
15.02.19 14:59:01 (+0800) main ### BEACON IS DISABLED!
15.02.19 14:59:01 (+0800) main ### [JIT] ###
15.02.19 14:59:01 (+0800) main # INFO: JIT queue contains 0 packets.
15.02.19 14:59:01 (+0800) main # INFO: JIT queue contains 0 beacons.
15.02.19 14:59:01 (+0800) main ### [GPS] ###
15.02.19 14:59:01 (+0800) main # No time keeping possible due to fake gps.
15.02.19 14:59:01 (+0800) main # Manual GPS coordinates: latitude 5.40738, longitude 100.32694, altitude 12 m
15.02.19 14:59:01 (+0800) main ### [PERFORMANCE] ###
15.02.19 14:59:01 (+0800) main # Upstream radio packet quality: 0.00%.
15.02.19 14:59:01 (+0800) main ### [ CONNECTIONS ] ###
15.02.19 14:59:01 (+0800) main # bridge.asia-se.thethings.network: Connected
15.02.19 14:59:01 (+0800) main # Semtech status report send.


#18

@pavelbinar, @KeanTattOng, if your device is running a development image of balenaOS, you may be able to open an SSH terminal to the host OS with the command:
ssh -p 22222 root@<ipAddress>
where <ipAddress> is the device’s IP address, for example, 192.168.2.3. (If the web dashboard is not showing the device’s IP address, you may be able to find it in the DHCP allocation table on your WiFi router’s status page, or by running the command arp -an on your laptop/desktop.)

Once you have opened an SSH terminal on the host OS, you can investigate the issue further by looking at logs with commands such as:
journalctl -a
journalctl -u resin-supervisor
journalctl -u openvpn
balena-engine ps -a
balena-engine images
dmesg
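
If it is more convenient to capture everything in one go, for example to attach the output to a support request, the commands above could be wrapped in a small script such as the sketch below. The function name collect_diagnostics and the default output directory /tmp/diagnostics are illustrative assumptions, not part of balenaOS.

```shell
#!/bin/sh
# Sketch: run each diagnostic command listed above and save its output
# to one file per command. collect_diagnostics is a hypothetical helper.
collect_diagnostics() {
  outdir="${1:-/tmp/diagnostics}"
  mkdir -p "$outdir" || return 1
  while IFS= read -r cmd; do
    # File name derived from the command: "balena-engine ps -a" -> balena-engine_ps_-a.txt
    file="$outdir/$(printf '%s' "$cmd" | tr ' ' '_').txt"
    # Unquoted $cmd is intentional: it splits into the command and its arguments.
    # Output goes to a file, so journalctl does not start a pager.
    if ! $cmd >"$file" 2>&1 </dev/null; then
      echo "warning: '$cmd' failed, see $file" >&2
    fi
  done <<'EOF'
journalctl -a
journalctl -u resin-supervisor
journalctl -u openvpn
balena-engine ps -a
balena-engine images
dmesg
EOF
  # Print the directory that now holds the collected output.
  printf '%s\n' "$outdir"
}

# Example: collect_diagnostics /tmp/diagnostics
```

A command that fails or is not installed does not stop the run; its error message is saved to the corresponding file and a warning is printed.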

Other commands may also be useful. If the issue is indeed the same as originally reported in this conversation (“Error cleaning up <hash>: (HTTP code 409) conflict - conflict: unable to delete (cannot be forced)”), then you may attempt to fix the problem by running the commands kindly shared by imrehg in the following post:

If your device is running a production image:

  • If you have convenient physical access to it, it may be easiest to take the SD card out and either reflash it, or add an SSH key to it as samothx pointed out in his reply above: https://github.com/balena-os/meta-balena/blob/master/README.md#sshkeys
  • Otherwise, if the device is on the same network as other balenaOS devices and at least one of those other devices is online and operational, we may be able to access the device that is reported offline via one of the online devices. In order to do this, use the web dashboard to grant support access to both devices (the one offline and the one online on the same network) and share the device UUIDs or device dashboard URLs. You may prefer to share the UUIDs or URLs through a private message.