Can't update supervisor

Hi,
I am using [balenaOS 2.43.0+rev1].
It was updated from balenaOS 2.32.
However, the supervisor version didn’t change.
The balena console shows supervisor version 10.2.2 (which may be the right supervisor version for this OS).
But when I run balena ps -a, I get the result below.

CONTAINER ID        IMAGE                               COMMAND             CREATED             STATUS                            PORTS               NAMES
336cf0648397        balena/armv7hf-supervisor:v9.14.0   "./entry.sh"        8 minutes ago       Up 7 minutes (health: starting)                       resin_supervisor

I tried changing the SUPERVISOR_TAG value in /etc/resin-supervisor/supervisor.conf, deleting the balena image, and restarting resin-supervisor, but nothing changed.
I also got the error log below.

Oct 31 18:17:01 C7654321-00275FD resin-supervisor[17106]: Unhandled rejection Error: The migration directory is corrupt, the following files are missing: 20190619152500-engine-snapshot.js

Please tell me why this happens.
If you can, please check my device.

device UUID : 8846ee41360fd969e6c27138eb4c0617

Hi, what steps did you take to update the device?

Hey, so that error is thrown when the supervisor is downgraded to a version which doesn’t include migrations that were present in the pre-downgrade version. I’m happy to upgrade your supervisor and sort this out, but I’m going to check the upgrade logs first to try to work out what exactly has happened.
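If you want to see which migration files the supervisor is complaining about, something like this could be run on the host OS. It is only a sketch: missing_migrations is a made-up helper name, and it assumes the log lines have the same shape as the one quoted above, with several file names separated by commas.

```shell
# Hypothetical helper: print the migration file names listed in
# "migration directory is corrupt" errors read from stdin, one per line.
missing_migrations() {
  sed -n 's/.*the following files are missing: *//p' \
    | tr -s ', ' '\n\n' \
    | sed '/^$/d'
}

# On the device (assumes the service is still named resin-supervisor):
#   journalctl -u resin-supervisor --no-pager | missing_migrations | sort -u
```

Any file listed there belongs to a newer supervisor than the one currently running, which is the sign of a downgrade.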

Hey, from what I can tell from the logs, the device looks to have powered off after installing the OS, but before installing the new supervisor. This isn’t consistent with the error you’re receiving though, as the supervisor v10.2.2 was installed at some point, and then the device fell back to the previous supervisor version.

I’ve updated the supervisor on the device to v10.2.2 and things are looking good. By the way, changing the SUPERVISOR_TAG value doesn’t actually change anything on device, and is used internally only, so we would not recommend doing that.
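To double-check which supervisor is actually running (as opposed to what the dashboard reports), the image tag can be read from the balena ps output. A minimal sketch, assuming the image name contains "-supervisor:" as in the output posted earlier; running_supervisor_tag is a hypothetical helper name.

```shell
# Hypothetical helper: read `balena ps` output on stdin and print the tag of
# the first supervisor image found, e.g. v9.14.0 for
# balena/armv7hf-supervisor:v9.14.0.
running_supervisor_tag() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /-supervisor:/) {
        sub(/.*:/, "", $i)   # keep only the part after the last colon
        print $i
        exit
      }
    }
  }'
}

# On the device:
#   balena ps -a | running_supervisor_tag
```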

Let me know if you run into any other issues!

@CameronDiver @robertgzr
Thanks for your help.
I found that my balena supervisor version was updated correctly.

My steps were:

  1. If balenaOS is below 2.43, we update it to 2.43 with the balena SDK (balena.models.device.startOsUpdate).
  2. I checked that the OS was updated, but the supervisor version was unchanged.
  3. I found the topic "How to update the supervisor" and executed update-resin-supervisor -t v<SUPERVISOR_VERSION>, or modified /etc/resin-supervisor/supervisor.conf and ran systemctl restart resin-supervisor. The supervisor then appeared to be updated in the balena console.
  4. But the journalctl log kept outputting the error, so I tried removing balena images and containers, but nothing changed.

Do I have to wait for the supervisor update to start?
Does the balenaOS update process still have issues?
Or is this a rare case?

And how can I update the supervisor version myself?

Hey @sankyo.toshio, this is what we think happened:

  • the device had already been updated before from earlier versions, and the most recent of those updates came with supervisor 9.14.0
  • this time the update didn’t finish properly. The log is truncated in a way that we have never seen before, but it seems the OS was updated, while the supervisor was not
  • after the reboot you manually edited /etc/resin-supervisor/supervisor.conf and restarted the supervisor service
  • that pulled the new supervisor, which started up and checked in with the dashboard
  • the device has a timer that tries to run a supervisor update 15 minutes after boot, and then every 24h
  • because of the previous update, the backend still had the device set to run supervisor 9.14.0, so when the timer kicked in, the supervisor was downgraded.
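The last two points can be pictured with a small model. This is not the real update-resin-supervisor script, just an illustration of "make the device match what the backend has stored"; supervisor_update_action is a made-up name, and sort -V (version sort) is assumed to be available, as it is in GNU coreutils.

```shell
# Illustrative model only: print what a "match the backend" step would do.
supervisor_update_action() {
  running=$1   # tag currently running on the device, e.g. v10.2.2
  target=$2    # tag the backend has stored for the device, e.g. v9.14.0
  if [ "$running" = "$target" ]; then
    echo "nothing to do"
    return 0
  fi
  # sort -V orders version strings; the first line is the older one.
  oldest=$(printf '%s\n%s\n' "$running" "$target" | sort -V | head -n 1)
  if [ "$oldest" = "$target" ]; then
    echo "downgrade $running -> $target"
  else
    echo "upgrade $running -> $target"
  fi
}

# The situation described above: the device runs the manually pulled v10.2.2,
# but the backend still says v9.14.0.
supervisor_update_action v10.2.2 v9.14.0   # prints: downgrade v10.2.2 -> v9.14.0
```

The point is that the stored target wins, whatever was pulled by hand on the device.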

… and that’s the state your device ended up in, I believe.

So our comments would be:

  • never edit supervisor.conf manually, it pretty much never does what you want it to do
  • updating the supervisor with update-resin-supervisor -t vX.Y.Z only works if the device was never updated before (so the API doesn’t have a supervisor version stored for the device)
  • to get out of this state, you would need to run something like the steps below on your device to fix things up. Please note that this works now, but is not guaranteed to work in the future if the OS/supervisor update process changes :warning:

So, to fix up your device:

  1. Remove the “new” supervisor version from the device, if it’s there, otherwise update-resin-supervisor won’t function properly (it’s something we want to fix in the future):
balena rmi -f balena/armv7hf-supervisor:v10.2.2
  2. Update the API’s record for that device to the right supervisor version. For example, for v10.2.2, set:
TAG=v10.2.2

and then run this in the device’s host OS:

if [ -f "/mnt/boot/config.json" ]; then
  CONFIGJSON=/mnt/boot/config.json
elif [ -f "/mnt/conf/config.json" ]; then
  CONFIGJSON=/mnt/conf/config.json
fi
if [ -z "$TAG" ]; then
  echo "Please set TAG=vX.Y.Z supervisor version (e.g. TAG=v6.3.5)"
elif [ -z "$CONFIGJSON" ]; then
  echo "Couldn't find config.json, cannot continue."
else
  APIKEY="$(jq -r '.apiKey // .deviceApiKey' "${CONFIGJSON}")"
  DEVICEID="$(jq -r '.deviceId' "${CONFIGJSON}")"
  API_ENDPOINT="$(jq -r '.apiEndpoint' "${CONFIGJSON}")"
  SLUG="$(jq -r '.deviceType' "${CONFIGJSON}")"
  SUPERVISOR_ID=$(curl -s "${API_ENDPOINT}/v3/supervisor_release?\$select=id,image_name&\$filter=((device_type%20eq%20'$SLUG')%20and%20(supervisor_version%20eq%20'$TAG'))&apikey=${APIKEY}" | jq -e -r '.d[0].id')
  echo "Extracted supervisor ID: $SUPERVISOR_ID"
  curl -s "${API_ENDPOINT}/v2/device($DEVICEID)?apikey=$APIKEY" -X PATCH -H 'Content-Type: application/json;charset=UTF-8' --data-binary "{\"supervisor_release\": \"$SUPERVISOR_ID\"}"
fi
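One thing to watch for in the script above: if no supervisor release matches the device type and TAG, jq prints null and the PATCH would still be sent. A small guard like the following could be checked first. This is a sketch under the assumption that release IDs are plain integers; is_release_id is a hypothetical helper, not part of any balena tooling.

```shell
# Hypothetical guard: succeed only if the argument is a non-empty string of
# digits (so "null" or an empty lookup result is rejected).
is_release_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Possible use, right after SUPERVISOR_ID is extracted:
#   if ! is_release_id "$SUPERVISOR_ID"; then
#     echo "No supervisor release found for $SLUG / $TAG, skipping the update"
#   else
#     ... run the PATCH request ...
#   fi
```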

If the result says “OK”, you should be able to run update-resin-supervisor (without any flags), and get an updated supervisor. :white_check_mark:

We can also do this for you if you’d like: just enable support access for the device you’d like us to check, and our support agents will take care of this! :bowing_man:

In general we are working on making OS/Supervisor updates more robust as well, and would recommend not editing things in the host OS without a thorough understanding of what is going on…

What do you think?

Ah, rereading the previous agent replies, I see that your device was fixed up, but hope this will give a bit more explanation…