BalenaOS >=v2.89.0 incompatibility with Openbalena?

It appears that this commit to balenaOS (meta-balena-common) is not compatible with openBalena, because it requires the supervisor_release endpoint on open-balena-api. That endpoint is not included with openBalena; I’m assuming it’s part of the closed balena-cloud API. Could this endpoint either be made public (i.e. not require a token) so that open-balena devices can use it, or alternatively (and better) accept either an ID or a repository:tag reference?

As it stands now, it’s not possible to make a Yocto build of balenaOS for an openBalena endpoint because of the bbfatal thrown in the do_install task of balena-supervisor. I also noticed other changes to the device scripts, i.e. update-balena-supervisor, which will prevent supervisor updates on the devices themselves, so I think this causes some pretty significant issues for openBalena users. I’m not able to test that second part yet, though, because I can’t complete a new OS build due to the first issue.

Hello,

Thanks for reporting this issue, we are having a look into it.

Best Regards
Harald

Hi David, a couple of questions to help us understand the problem.

At build time, the supervisor_release endpoint in balena-cloud does not require authentication for public device types, so the defaults should build fine. Could you add a set -x to the api_fetch_supervisor_image() function to see how the API is being called?
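
For example, a minimal sketch (a hypothetical temporary bbappend in your own layer; you could equally just edit the recipe while debugging, and older Yocto releases use the do_install_prepend override syntax instead):

# hypothetical balena-supervisor bbappend, for tracing only; set -x carries into
# api_fetch_supervisor_image() because it is called from within do_install
do_install:prepend() {
    set -x
}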

About the runtime supervisor updates, how are you currently doing supervisor updates? Are you manually running update-balena-supervisor with the repository:tag as arguments?

The motivation for these changes is that we have stopped publishing supervisor releases to Docker Hub.

@alexgg, see below for the logs from api_fetch_supervisor_image. The issue is that we have our own custom hardware / device type, and if I try to override MACHINE to raspberrypi4-64 (our device is CM4-based) in a bbappend for the balena-supervisor recipe, it completes the install step of balena-supervisor but fails at the package step (also see below). If we can get through this I’ll help troubleshoot the on-device supervisor upgrade process as well.

Output of set -x on api_fetch_supervisor_image (without MACHINE override):

DEBUG: Executing python function extend_recipe_sysroot
NOTE: Direct dependencies are ['virtual:native:/yocto/host/build/../layers/poky/meta/recipes-devtools/pseudo/pseudo_git.bb:do_populate_sysroot', 'virtual:native:/yocto/host/build/../layers/meta-openembedded/meta->
NOTE: Installed into sysroot: []
NOTE: Skipping as already exists in sysroot: ['pseudo-native', 'jq-native', 'patch-native', 'glibc', 'gcc-cross-aarch64', 'gcc-runtime', 'curl-native', 'systemd-systemctl-native', 'quilt-native', 'autoconf-native>
DEBUG: Python function extend_recipe_sysroot finished
DEBUG: Executing shell function do_install
+ _version=v12.11.36
+ jq --raw-output .slug /yocto/host/build/../harmoni-h1-pilot.json
+ _slug=harmoni-h1-pilot
+ _api_env=balena-cloud.com
+ _token=
+ [ -z  ]
+ [ -f ~/.balena/token ]
+ true
+ jq -r .d[].image_name
+ curl -X GET --silent -k https://api.balena-cloud.com/v6/supervisor_release?$select=image_name&$filter=(is_for__device_type/slug%20eq%20%27harmoni-h1-pilot%27)%20and%20(supervisor_version%20eq%20%27v12.11.36%27)>
ERROR: Could not retrieve supervisor image for version v12.11.36
WARNING: exit code 1 from a shell command.

Package failure with MACHINE override to raspberrypi4-64:

DEBUG: Executing python function sstate_task_prefunc
DEBUG: Python function sstate_task_prefunc finished
DEBUG: Executing python function extend_recipe_sysroot
NOTE: Direct dependencies are ['/yocto/host/build/../layers/poky/meta/recipes-core/glibc/glibc_2.34.bb:do_populate_sysroot', 'virtual:native:/yocto/host/build/../layers/poky/meta/recipes-devtools/rpm/rpm_4.16.1.3>
NOTE: Installed into sysroot: ['rpm-native', 'dwarfsrcfiles-native', 'popt-native', 'db-native', 'python3-native', 'bzip2-native', 'elfutils-native', 'file-native', 'libgcrypt-native', 'sqlite3-native', 'libffi-n>
NOTE: Skipping as already exists in sysroot: ['glibc', 'patch-native', 'curl-native', 'gcc-cross-aarch64', 'gcc-runtime', 'pseudo-native', 'jq-native', 'quilt-native', 'systemd-systemctl-native', 'autoconf-native>
DEBUG: sed -e 's:^[^/]*/:/yocto/host/build/tmp/work/cortexa72-poky-linux/balena-supervisor/1.0-r0/recipe-sysroot-native/:g' /yocto/host/build/tmp/sysroots-components/x86_64/rpm-native/fixmepath /yocto/host/build/>
DEBUG: Python function extend_recipe_sysroot finished
DEBUG: Executing python function do_package
DEBUG: Executing python function package_convert_pr_autoinc
DEBUG: Python function package_convert_pr_autoinc finished
DEBUG: Executing python function package_prepare_pkgdata
ERROR: Manifest /yocto/host/build/tmp/sstate-control/manifest-x86_64_x86_64-nativesdk-glibc.packagedata not found in raspberrypi4_64 cortexa72 armv8a-crc-crypto armv8a-crypto armv8a-crc armv8a aarch64 allarch x86>
DEBUG: Python function package_prepare_pkgdata finished
DEBUG: Python function do_package finished

Hi David, I see. There is a difference between the Yocto MACHINE, which in your case has to remain raspberrypi4-64, and the cloud environment slug, which is how the machine is known in the cloud. This is set in the raspberrypi4-64.coffee file in the device repository root directory. You have probably modified the slug there to harmoni-h1-pilot, but that needs to be raspberrypi4-64 too, as it is used when interrogating balena-cloud for the supervisor version.

@alexgg, we have a lot of modifications throughout our Yocto layer that require MACHINE to be set to harmoni-h1-pilot and not raspberrypi4-64. When you say that MACHINE has to remain raspberrypi4-64, do you actually mean the slug in the .coffee file needs to be changed? And won’t this be a problem for anyone who has a custom device type that isn’t registered with the balena cloud server? I’m wondering if there should be a separate variable in the .coffee file just for this purpose, effectively an “equivalent machine type” variable used when querying the API for the supervisor and any updates. Otherwise we are tied to the public device types.

Hi David,

When you say that MACHINE has to remain raspberrypi4-64,

You’re right, that wasn’t clear at all. I meant that MACHINE has to match your hardware. If you have added a new machine and made the needed Yocto recipe modifications for this to be harmoni-h1-pilot, then that’s fine. I just don’t know the extent of the customization you have made, so I was assuming you were using the existing raspberrypi4-64.

you actually mean the slug in the .coffee file needs to be changed?

Yes, that’s what I mean. The slug needs to match one of the device types supported in balena-cloud.

won’t this be a problem for anyone who has a custom device type that isn’t registered with the balena cloud server

Given that the supervisor image is served from balena-cloud, there is currently no other way to resolve it than to provide a valid device type and version.
If you are building a custom device type and modifying the recipes anyway, why not just modify the line in https://github.com/balena-os/meta-balena/blob/dbaeeb75e0dc41831964745443e14ceb32990c26/meta-balena-common/recipes-containers/balena-supervisor/balena-supervisor.bb#L63 to something like:

_slug=${BALENACLOUD_SLUG:-$(jq --raw-output '.slug' "${TOPDIR}/../${MACHINE}.json")}

And you can then declare BALENACLOUD_SLUG in your local.conf.
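
For instance, a minimal sketch of the local.conf side (the variable name and value simply follow the suggestion above; whether you need the export keyword depends on how the value reaches the task’s shell environment):

# local.conf (hypothetical): map this custom machine to a slug known to balena-cloud;
# export makes the value visible to the shell task, which the ${BALENACLOUD_SLUG:-...}
# shell default above relies on
export BALENACLOUD_SLUG = "raspberrypi4-64"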

In the meantime we will discuss internally how to best cater for the needs of custom device types.

@alexgg just circling back on this one as I had a thought that might help resolve the issue. Aren’t the balena-supervisor images really arch-specific (i.e. aarch64) rather than MACHINE-specific (i.e. raspberrypi4-64)? If so, could balena-cloud also accept arch (instead of device type) for the supervisor_release endpoint when serving up a supervisor image? This would support users whose custom device types are based on the same arch as common device types (which I suspect covers most custom device types).

Hi David, you’re right, that’s how we ended up addressing this issue in balena-supervisor: Use architecture instead of device type to query API · balena-os/meta-balena@419e1ee · GitHub. Sorry for not updating you earlier, we have had some busy weeks on the support front.

@alexgg - wow - that’s great! Thank you!

@alexgg - so the Yocto build worked great with the new changes! But unfortunately, and somewhat as expected, the update-balena-supervisor script failed on the device after the HUP process. Supervisor updates used to work well back when the update-balena-supervisor script accepted the -t flag. Now that the flag has been removed, and the open-balena-api server read from config.json by update-balena-supervisor does not have the supervisor_release endpoint, the script fails:

57e0106/host:~$ /usr/bin/update-balena-supervisor
Getting image name and version...
parse error: Invalid numeric literal at line 1, column 9
No supervisor configuration found from API.
Using preloaded values.
ERROR: No preloaded version found.

Similar to the changes to the bitbake recipe, could you add flags to the update-balena-supervisor script to accept both version and arch, which get passed to the API to retrieve the supervisor image info to download? I.e. it would look something like update-balena-supervisor -t v12.11.38 -a aarch64, which should give it enough info to get the right supervisor image and install it.

UPDATE: For anyone else who might run into this issue, you can temporarily work around it by adding the following code to your HUP wrapper script (it assumes you have a file called SUPERVISOR_TARGET in /mnt/boot containing the target version, placed there by your new host OS).

## Update supervisor

source /etc/balena-supervisor/supervisor.conf
echo "Current supervisor version: ${SUPERVISOR_VERSION}"

SUPERVISOR_TARGET=${SUPERVISOR_VERSION}
if [ -f /mnt/boot/SUPERVISOR_TARGET ]; then
    SUPERVISOR_TARGET=$(cat /mnt/boot/SUPERVISOR_TARGET)
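    # Resolve the target version to an image name via the balena-cloud API,
    # filtering on CPU architecture (via $(arch)) rather than on device type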
    SUPERVISOR_TARGET_IMAGE=$(curl -X GET --silent -k \
    "https://api.balena-cloud.com/v6/supervisor_release?\$top=1&\$select=image_name&\$filter=(supervisor_version%20eq%20%27${SUPERVISOR_TARGET}%27)%20and%20(is_for__device_type/any(ifdt:ifdt/is_of__cpu_architecture/any(ioca:ioca/slug%20eq%20%27$(arch)%27)))" \
    -H "Content-Type: application/json" | jq -r '.d[].image_name')
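    # The third '/'-separated field of image_name is assumed to correspond to the
    # identifier stored in supervisor.conf, so the two can be compared below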
    SUPERVISOR_TARGET_VERSION=$(cut -d "/" -f 3 <<< "$SUPERVISOR_TARGET_IMAGE")   
fi

echo "Target supervisor version: ${SUPERVISOR_TARGET_VERSION}"

if [[ "${SUPERVISOR_VERSION}" == "${SUPERVISOR_TARGET_VERSION}" ]]; then
    echo "Target supervisor already installed!"
else
    echo "Newer supervisor available, updating..."
    /usr/bin/update-balena-supervisor -i "${SUPERVISOR_TARGET_IMAGE}"
    echo "Supervisor update complete."
fi

Hi, there is a command line argument to update the supervisor in these cases (see update-balena-supervisor: Support passing command line image argument · balena-os/meta-balena@227fea7 · GitHub).
Could you try:

update-balena-supervisor -i bh.cr/balena_os/<arch>-supervisor/<version>

And let me know if it works?

For example, in the case of the RaspberryPi4-64, it would be:

update-balena-supervisor -i bh.cr/balena_os/aarch64-supervisor/12.11.43

changing 12.11.43 to the desired version.

Note that I edited the above message to:

update-balena-supervisor -i bh.cr/balena_os/aarch64-supervisor/12.11.43

As sometimes edits are reverted by the support system.

@alexgg I will give this a shot. One question before I do, however: /etc/balena-supervisor/supervisor.conf no longer stores the version number, only the image hash, so how can I compare the target to the current version to determine whether I need to do the upgrade?

Hi David, as it stands the version is not provided by the supervisor_release endpoint, so for the time being you can use balena images, which contains tags you can use to resolve images to version numbers. In the near future we will deprecate the supervisor_release endpoint in favor of using the v3 target state, so we will be able to store the version in supervisor.conf again.
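
For example, something along these lines on the device (a sketch; the exact repository names will depend on your setup):

# list images with their repository:tag so a supervisor image can be mapped to a version
balena images --format '{{.Repository}}:{{.Tag}}' | grep -i supervisor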

@alexgg thanks again, I’ve got what I need for now - everything is working with these changes and I’ll keep an eye out for the updates you mentioned.

Maybe starting a new thread is better, but I have a question about the update-balena-supervisor script.

As @drcnyc already said, openBalena doesn’t support the supervisor_release endpoint. Because of this, the whole update-balena-supervisor script fails: it’ll always try to contact the API of (in this case) the openBalena instance, and it’ll always fail, even if you specify the target image or target version using the --supervisor-image option.

Is there a way to bypass this? The script from @drcnyc helps, but you still have to manually disable the curl part of the update-balena-supervisor script.

Thanks in advance, and I’d be happy to change this to a separate topic.

Hello @bversluijs

Checking the latest update-balena-supervisor script here: meta-balena/update-balena-supervisor at master · balena-os/meta-balena · GitHub
I think you could invoke the script with an API_ENDPOINT="" prefix, which unsets API_ENDPOINT for that call and skips the API lookup. You then need to specify the image with the --supervisor-image parameter.
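
For example, something like this (an untested sketch; adjust the image reference to your architecture and target version):

API_ENDPOINT="" update-balena-supervisor --supervisor-image bh.cr/balena_os/aarch64-supervisor/12.11.43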

Best Regards
Harald