OS update fails

Hi, I’m trying to update a field device from 2.32 to 2.46. The progress bar repeatedly reaches 50%, the update fails after a long time, and then it restarts. How can I get more information on what is going on? I have looked in /mnt/sysroot/inactive to see whether files are being updated, but it’s not too clear. I also didn’t glean much from journalctl -u balena.service.

What am I looking for to get a better view on progress and hence what may be failing?

Hi there,

The first thing I would do is run the checks in the device’s diagnostics tab. You may also want to look at our Device Debugging Masterclass. Try to rule out anything simple first, such as the device running out of storage or memory.

Otherwise you can enable support access to the device and send us the device ID and we’ll take a look and see what we can find.

Cheers,
James.

Sorry, I should have mentioned that storage is fine, as is memory as far as I can tell; I didn’t see anything in dmesg or journalctl. Diagnostics resulted in the following.

First item is interesting but not sure what exactly it means.

Hi,
You should check the latest OS upgrade logs - they should explain the problem.
After ssh’ing to the device, please check the .log files in the /mnt/data/resinhup/ directory.
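If it helps, that check could be scripted roughly as follows once the logs have been copied off the device (the host OS may not ship Python). This is only a sketch: `find_failures` and the marker strings are illustrative assumptions, not part of any balena tooling.

```python
from pathlib import Path

# Substrings treated as failure markers. These are assumptions based on
# typical log output; adjust them to whatever your logs actually contain.
MARKERS = ("[ERROR]", "failed")


def find_failures(log_dir):
    """Return (file name, line number, text) for suspicious lines, newest log first."""
    logs = sorted(Path(log_dir).glob("*.log"),
                  key=lambda p: p.stat().st_mtime, reverse=True)
    hits = []
    for log in logs:
        with log.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(marker in line for marker in MARKERS):
                    hits.append((log.name, number, line.rstrip()))
    return hits
```

For example, `find_failures("./resinhup")` would scan a hypothetical local copy of /mnt/data/resinhup/ and return the matching lines, so repeated failures across attempts are easy to compare.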

Keep us posted about your findings, thanks.

Alternatively, you can go to the diagnostics page, open the device diagnostics tab, and press the “Run diagnostics” button. The dashboard will then display a report about the device state that includes the balenaOS upgrade logs.

There are four .log files, one for each update attempt. All of them contain something like the following:

================upgrade-2.x.sh HEADER START====================
Wed Apr 15 16:29:33 UTC 2020
[000000000][LOG]Normalized target version: 2.46.1+rev1
[000000003][LOG]Target version supports hostapps, no device type support check required.
[000000003][LOG]Loading info from config.json
[000000003][LOG]Loading info from device-type.json
[000000010][LOG]Device type check: OK
[000000010][LOG]Target OS version "2.46.1+rev1" OK.
[000000010][LOG]VARIANT_ID: prod
[000000010][LOG]Host OS version "2.44.0+rev1" OK.
[000000010][LOG]Checking for manifest of registry.hub.docker.com/resin/resinos:2.46.1_rev1-raspberry-pi
[000000016][LOG]Manifest found, good to go...
[000000016][LOG]No resin-device-progress fix is required...
[000000016][LOG]No supervisor updater fix is required...
[000000016][LOG]hostapp-update command exists, use that for update
[000000020][LOG]Running pre-update fixes for raspberry-pi
Removing start_db.elf from boot partition
Removing fixup_db.dat from boot partition
[000000030][LOG]Found potential leftover data, cleaning /mnt/sysroot/inactive/balena/aufs/
Warning: Stopping balena-host.service, but it can still be activated by:
  balena-host.socket
[000000031][LOG]Inactive partition usage after cleanup: 2.3M
[000000038][LOG]Starting hostapp-update
2.46.1_rev1-raspberry-pi: Pulling from resin/resinos
8fbcc818e699: Pulling fs layer
8fbcc818e699: Ready to download
failed to register layer: Error processing tar file(exit status 1): unexpected EOF
[000005954][ERROR]hostapp-update has failed...-
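
A note on reading this log: assuming the bracketed prefix counts seconds since the script started (an assumption; the log itself does not say so), the pull started at 38 and the error was logged at 5954, so the download ran for roughly 99 minutes before failing. A minimal sketch of that arithmetic, using a hypothetical helper `seconds_between`:

```python
import re

# Matches prefixes such as "[000005954][ERROR]". The counter is ASSUMED to be
# seconds elapsed since the upgrade script started.
PREFIX = re.compile(r"^\[(\d+)\]\[(LOG|ERROR)\](.*)$")


def seconds_between(lines, start_text, end_level="ERROR"):
    """Seconds from the first line containing start_text to the next end_level line."""
    start = None
    for line in lines:
        match = PREFIX.match(line.strip())
        if not match:
            continue  # e.g. raw pull output without a prefix
        counter, level, message = int(match.group(1)), match.group(2), match.group(3)
        if start is None and start_text in message:
            start = counter
        elif start is not None and level == end_level:
            return counter - start
    return None


log = [
    "[000000038][LOG]Starting hostapp-update",
    "failed to register layer: Error processing tar file(exit status 1): unexpected EOF",
    "[000005954][ERROR]hostapp-update has failed...-",
]
elapsed = seconds_between(log, "Starting hostapp-update")
print(elapsed, "seconds, about", round(elapsed / 60), "minutes")
# prints: 5916 seconds, about 99 minutes
```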

Hi there, this feels like either a bug in the upgrade scripts or a disk space/network issue on the device. To help us investigate this further, could you please grant support access to the device and share the device guid with us…

Support access has been granted: device ID ea6ccbb98c7894c164141ae7c64c9b7f

Hi there, is this device connected to the internet over a cellular connection, by chance? I attempted to update manually via the command line and am getting very slow download speeds. I let it run for about an hour and a half, and it progressed (very) slowly until it died at 41 megabytes:

2.46.1_rev1-raspberry-pi: Pulling from resin/resinos
8fbcc818e699: Extracting 100.8MB/100.8MB
Total: 41.32MB/100.8MB
failed to register layer: Error processing tar file(exit status 1): unexpected EOF

I have to wonder if the cellular provider is timing out the connection and dropping it, resulting in the error. Is there any way to get WiFi or Ethernet onto the device to try again? I would like to see if a faster, stable connection resolves the issue.

Otherwise, you may need to work with the cellular provider to isolate and troubleshoot the connectivity, by having them verify whether a timeout is occurring and they are dropping the connection.
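
To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the "Total" figure reflects bytes downloaded and takes "about an hour and a half" as 90 minutes; both are approximations, and `estimate_full_pull` is a hypothetical helper.

```python
def estimate_full_pull(downloaded_mb, elapsed_s, total_mb):
    """Return (average rate in kB/s, estimated hours to pull the full image)."""
    rate_mb_s = downloaded_mb / elapsed_s
    return rate_mb_s * 1000, total_mb / rate_mb_s / 3600


# 41.32 MB of 100.8 MB after roughly 90 minutes (approximate figures from above)
rate_kb_s, hours = estimate_full_pull(41.32, 90 * 60, 100.8)
print(f"{rate_kb_s:.1f} kB/s, about {hours:.1f} h for the full image")
# prints: 7.7 kB/s, about 3.7 h for the full image
```

At that rate the connection would need to stay up for several hours without interruption, which is why a provider-side timeout is worth ruling out.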

Thanks for looking at it.

Yes, most of our devices are in remote areas with cellular connections, so it’s tough to do an uninterrupted 100MB upgrade on many of them. Would it be possible to support downloads that withstand service interruption (i.e. resume functionality)? Otherwise our ability to perform upgrades is severely compromised.

Note that even the delta upgrades are often too large for us to consider so we have written our own delta upgrade code for our images (since most of the time it’s really only our source code that is changing).
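
To illustrate what resume functionality means in general, here is a generic sketch based on HTTP Range requests. It is not how hostapp-update or registry pulls work, and nothing in this thread says balenaOS supports it; `download_with_resume`, `http_fetcher`, and any URL passed to them are hypothetical.

```python
import os
import urllib.request


def download_with_resume(fetch_range, total_size, dest, chunk=256 * 1024, max_retries=50):
    """Fetch total_size bytes into dest, continuing from the bytes already on disk.

    fetch_range(start, end) is a caller-supplied callable returning bytes
    [start, end); it may raise OSError when the link drops.
    """
    retries = 0
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    with open(dest, "ab") as out:
        while offset < total_size:
            end = min(offset + chunk, total_size)
            try:
                data = fetch_range(offset, end)
                if not data:
                    raise OSError("empty response")
            except OSError:
                retries += 1
                if retries > max_retries:
                    raise
                continue  # retry the same range; bytes already written are kept
            out.write(data)
            out.flush()
            offset += len(data)
    return offset


def http_fetcher(url, timeout=30):
    """Build a fetch_range for a server that honours the Range header (assumed)."""
    def fetch_range(start, end):
        request = urllib.request.Request(
            url, headers={"Range": f"bytes={start}-{end - 1}"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            if response.status != 206:  # 206 Partial Content
                raise OSError("server ignored the Range header")
            return response.read()
    return fetch_range
```

The point of the design is that a dropped connection only costs the chunk in flight: the next attempt starts from the size of the partial file rather than from zero.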

Hello!
Unfortunately, this is not possible at the moment. I have raised an issue internally; we will discuss it and get back to you if we do implement such functionality.

In the meantime, can you please confirm with your cellular provider whether they enforce some kind of timeout or drop connections?