Adding AMD v1000 BSP Support to balenaOS

Hi all,

I’m looking at adding support for the AMD v1000 processor to balenaOS, but I’m running into a few odd roadblocks.

As it is an x86-64 processor, I’m using balena-intel as a starting point. I’ve renamed meta-balena-genericx86 to meta-balena-amd, removed the surface-pro6 target, and added a v1000.coffee target.

I’ve added meta-amd (git link) as a submodule in layers and added the layer to meta-balena-amd/conf/samples/bblayers.conf.sample:

BBLAYERS ?= " \
    ${TOPDIR}/../layers/poky/meta \
    ${TOPDIR}/../layers/poky/meta-poky \
    ${TOPDIR}/../layers/poky/meta-yocto-bsp \
    ${TOPDIR}/../layers/meta-openembedded/meta-oe \
    ${TOPDIR}/../layers/meta-openembedded/meta-filesystems \
    ${TOPDIR}/../layers/meta-openembedded/meta-networking \
    ${TOPDIR}/../layers/meta-openembedded/meta-python \
    ${TOPDIR}/../layers/meta-amd/meta-amd-bsp \
    ${TOPDIR}/../layers/meta-balena/meta-balena-common \
    ${TOPDIR}/../layers/meta-balena/meta-balena-warrior \
    ${TOPDIR}/../layers/meta-balena-amd \
    ${TOPDIR}/../layers/meta-rust \
    "

barys recognizes v1000 as a valid configuration and begins the build process. The initial failure is due to x11 not being built. The quick way around this was to modify layers/meta-balena/meta-balena-common/conf/distro/include/balena-os.inc to no longer remove x11. I imagine there is a better way but this is sufficient for now:

diff --git a/meta-balena-common/conf/distro/include/balena-os.inc b/meta-balena-common/conf/distro/include/balena-os.inc
index e1a36879..fa1ec811 100644
--- a/meta-balena-common/conf/distro/include/balena-os.inc
+++ b/meta-balena-common/conf/distro/include/balena-os.inc
@@ -30,7 +30,7 @@ MAINTAINER = "Balena <hello@balena.io>"
 # Strip down unneeded features
 POKY_DEFAULT_DISTRO_FEATURES_remove = "ptest"
 POKY_DEFAULT_DISTRO_FEATURES_remove = "wayland"
-DISTRO_FEATURES_DEFAULT_remove = "x11"
+# DISTRO_FEATURES_DEFAULT_remove = "x11"

 # Development image mode
 DEVELOPMENT_IMAGE ?= "0"

However, the next issue has been stumping me for a bit. balena-healthcheck is installed but not shipped in any package, even though I haven’t modified any of the original recipes. If I remove the meta-amd layer and build with the genericx86-64 target, it builds successfully. I’m guessing I’m missing something simple, but I’m not sure what:

ERROR: balena-18.09.6-dev+git95c7371304f9cef494efe93f0a8ffd53a75eac21-r0 do_package: QA Issue: balena: Files/directories were installed but not shipped in any package:
  /usr/lib
  /usr/lib/balena
  /usr/lib/balena/balena-healthcheck
Please set FILES such that these items are packaged. Alternatively if they are unneeded, avoid installing them or delete them within do_install.
balena: 3 installed and not shipped files. [installed-vs-shipped]
ERROR: balena-18.09.6-dev+git95c7371304f9cef494efe93f0a8ffd53a75eac21-r0 do_package: Fatal QA errors found, failing task.
ERROR: balena-18.09.6-dev+git95c7371304f9cef494efe93f0a8ffd53a75eac21-r0 do_package:
ERROR: balena-18.09.6-dev+git95c7371304f9cef494efe93f0a8ffd53a75eac21-r0 do_package: Function failed: do_package
ERROR: Logfile of failure stored in: /code/build/tmp/work/dbfp5-poky-linux/balena/18.09.6-dev+git95c7371304f9cef494efe93f0a8ffd53a75eac21-r0/temp/log.do_package.1455
ERROR: Task (/code/build/../layers/meta-balena/meta-balena-common/recipes-containers/balena/balena_git.bb:do_package) failed with exit code '1'

As far as I can tell, balena-healthcheck is never added to FILES_${PN} so I’m not sure why the genericx86-64 build doesn’t run into the same error.

Any help would be greatly appreciated!

Hello, there are a couple of other repos for 64-bit boards on GitHub that you could look at to see if they help: RPi4 (https://github.com/balena-os/balena-raspberrypi) and QEMU x64 (https://github.com/balena-os/balena-qemu). That said, I’ll ask the OS team to chime in, but I believe they’ll only be able to get back to you from next week on.

I’ve been looking at balena-raspberrypi to see if that helps but no luck so far. I haven’t looked at balena-qemu so I’ll take a look into that and see where it gets me.

Thanks!

Hey @kasemq
One more clue (hope it will help) could be our balena-intel repo:

generic-x86 is a target you can build an image for and then run on VirtualBox, for instance.

Hey @roman-mazur! We are currently using that as our starting point.

From what other people have said (I can re-find the forum posts if you’d like me to), the genericx86-64 image should work on AMD processors. However, there are specific board features we need in a BSP layer that requires the meta-amd BSP layer.

What we are currently trying to do is add meta-amd to the balena-intel repository but have run into the above problems. Once that is working we are going to add the board BSP on top of meta-amd.

The AMD BSP and board BSP build successfully in yocto-2.7.1 (warrior-21.0.1) so we believe the build itself should work in balenaOS if we can get the configuration right.

Sorry @kasemq
I didn’t read the original post carefully enough :slight_smile:

I haven’t built the OS for a while, so cannot suggest anything specific quickly. But we pinged the guys who should be able to help soon.

Can you share the full barys command and its log?

I’m surprised to see that you faced the x11 error. And this health-check one is strange as well.
It seems like you aren’t bitbaking resin-image but something else…

Also, can you push your work to a repo on GitHub so we can take a look? That’ll make our life much easier.

@zubairlk I’ve moved our work to a github fork of balena-intel renamed balena-amd here on a branch named meta-amd-warrior (which is the linked branch).

I rebuilt it over the weekend and ran into both the healthcheck and x11 errors. There is a commit to fix the x11 error in the branch.

The barys command being run is:

balena-yocto-scripts/build/barys --build-name build -m v1000

I have tars of the bitbake logs mentioned in the error but they aren’t a supported extension for the site.

The log from the initial build was missing as I didn’t provide the --log option. I re-ran the above command with --log without cleaning and it produced the following, which seems pretty sparse:

================barys HEADER START====================
Mon Sep  9 17:01:35 UTC 2019
Script called from: /code/balena-amd
Script called as: /code/balena-amd/balena-yocto-scripts/build/barys --build-name build -m v1000 --log
Selected machines:  v1000
Selected bitbake arguments:
Build directory name: build
Remove build directory? no
Compressed image? no
Shared downloads directory:
Shared sstate directory:
Development image? no
Forced supervisor image tag:
Inherit rm_work?
Enable build history? no
================barys HEADER STOP=====================
[000000001][LOG]BalenaOS build initialized in directory: build.
[000000001][LOG]Run build for v1000: MACHINE=v1000 bitbake resin-image-flasher
[000000001][LOG]This might take a while ...
[000000027][LOG]Build for v1000 failed. Check failed log in build/tmp/log/cooker/v1000 .
[000000027][LOG]If build for v1000 succeeded, final image should have been generated here:
[000000027][LOG]   build/tmp/deploy/images/v1000/resin-image-flasher-v1000.resinos-img
[000000027][LOG]Done.
====================barys STOP========================

I am re-running the build after cleaning with --log to see if that provides a more useful barys log.

[EDIT] It came out with pretty much the exact same log.

@zubairlk / all, a bit of an update on the overall progress:

I’ve run into three different issues with this route, all caused by and semi-solved by the same thing.

Issue 1: balena-healthcheck

See above for details.

Issue 2: networkmanager

The cleanup for do_install() of networkmanager_1.18.0 failed because it couldn’t find usr/lib64/NetworkManager/conf.d.

| rmdir: failed to remove '/code/balena-amd/build/tmp/work/dbfp5-poky-linux/networkmanager/1.18.0-r0/image/usr/lib64/NetworkManager/conf.d': No such file or directory
| WARNING: /code/balena-amd/build/tmp/work/dbfp5-poky-linux/networkmanager/1.18.0-r0/temp/run.do_install.1856:1 exit 1 from 'rmdir /code/balena-amd/build/tmp/work/dbfp5-poky-linux/networkmanager/1.18.0-r0/image/usr/lib64/NetworkManager/conf.d'
| ERROR: Function failed: do_install (log file is located at /code/balena-amd/build/tmp/work/dbfp5-poky-linux/networkmanager/1.18.0-r0/temp/log.do_install.1856)
ERROR: Task (/code/balena-amd/build/../layers/meta-balena/meta-balena-common/recipes-connectivity/networkmanager/networkmanager_1.18.0.bb:do_install) failed with exit code '1'

Issue 3: plymouth

The do_install step for plymouth was trying to remove something in usr/lib64/. Sadly I didn’t save the exact logs for this.

All of these issues appear to be caused by v1000 setting BASE_LIB_tune-dbfp5 to lib64 rather than lib. It looks like genericx86-64 sets BASE_LIB to lib rather than lib64 (though I am somewhat confused exactly how as it looks like it should be lib64 based on my understanding of the tune files).
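As a rough illustration of why BASE_LIB matters here, the sketch below models how the library directory variables are derived from it. It is a simplified Python model, not BitBake code: the function name `derive_dirs` is made up for this example, and it assumes the usual OE-core layout in which `libdir` and `base_libdir` follow the base library name while the `nonarch_` variants are always `lib`.

```python
def derive_dirs(baselib: str, exec_prefix: str = "/usr") -> dict:
    """Simplified model of the directory variables derived from the base lib name."""
    return {
        "libdir": f"{exec_prefix}/{baselib}",      # follows baselib (lib or lib64)
        "nonarch_libdir": f"{exec_prefix}/lib",    # always "lib"
        "base_libdir": f"/{baselib}",              # follows baselib
        "nonarch_base_libdir": "/lib",             # always "lib"
    }


for baselib in ("lib", "lib64"):
    print(baselib, derive_dirs(baselib))
```

With `lib` the arch and non-arch variables coincide, so a recipe that mixes them up still works; with `lib64` they diverge, and any recipe that assumes they are the same path can break.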

The following patch resolves all of these issues, though I don’t believe it is the correct final solution, just a stopgap that proves the issue and allows builds to progress in the meantime:

diff --git a/meta-amd-bsp/conf/machine/include/tune-v1000.inc b/meta-amd-bsp/conf/machine/include/tune-v1000.inc
index 1c481cd3..d062fb5b 100644
--- a/meta-amd-bsp/conf/machine/include/tune-v1000.inc
+++ b/meta-amd-bsp/conf/machine/include/tune-v1000.inc
@@ -10,6 +10,7 @@ TUNE_CCARGS .= "${@bb.utils.contains("TUNE_FEATURES", "dbfp5", " -march=znver1",
 # Extra tune selections
 AVAILTUNES += "dbfp5"
 TUNE_FEATURES_tune-dbfp5 = "m64 dbfp5"
-BASE_LIB_tune-dbfp5 = "lib64"
+//BASE_LIB_tune-dbfp5 = "lib64"
+BASE_LIB_tune-dbfp5 = "lib"
 TUNE_PKGARCH_tune-dbfp5 = "dbfp5"
 PACKAGE_EXTRA_ARCHS_tune-dbfp5 = "${TUNE_PKGARCH_tune-dbfp5}"

The BASE_LIB is used by poky to set libdir, which causes a few issues when it is lib64 rather than lib.
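One way this could produce the balena-healthcheck QA error is sketched below. This is only a hypothesis, not something confirmed in this thread: it assumes the package's FILES list contains an entry of the form `${libdir}/${BPN}/*`, and that the recipe installs the script to a hard-coded `/usr/lib` path. The helper `unshipped` is a made-up, much simplified stand-in for the installed-vs-shipped check.

```python
from fnmatch import fnmatch


def unshipped(installed, files_patterns):
    """Return installed paths that no FILES pattern matches (simplified model)."""
    def shipped(path):
        for pattern in files_patterns:
            # A pattern matches as a glob, or as a directory containing the path.
            if fnmatch(path, pattern) or path.startswith(pattern.rstrip("/") + "/"):
                return True
        return False

    return [path for path in installed if not shipped(path)]


# Path taken from the QA error message earlier in the thread.
installed = ["/usr/lib/balena/balena-healthcheck"]

for libdir in ("/usr/lib", "/usr/lib64"):
    # Assumed FILES entry ${libdir}/${BPN}/*, expanded by hand for BPN = balena.
    patterns = [f"{libdir}/balena/*"]
    print(libdir, unshipped(installed, patterns))
```

Under these assumptions the file is picked up when libdir is `/usr/lib` and left unpackaged when libdir is `/usr/lib64`, which would match the difference between the genericx86-64 and v1000 builds.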

I wasn’t able to track down the exact issue for plymouth or balena-healthcheck (though I have suspicions/assumptions). However, I was able to track down the issue with networkmanager.

The file that causes the issue is layers/meta-balena/meta-balena-common/recipes-connectivity/networkmanager/networkmanager_%.bbappend, lines 54 & 55. Specifically it uses libdir rather than nonarch_libdir. Changing lines 54 and 55 from:

rmdir ${D}${libdir}/NetworkManager/conf.d
rmdir ${D}${libdir}/NetworkManager/VPN

to

rmdir ${D}${nonarch_libdir}/NetworkManager/conf.d
rmdir ${D}${nonarch_libdir}/NetworkManager/VPN
rmdir ${D}${nonarch_libdir}/NetworkManager
rmdir ${D}${nonarch_libdir}

allows the networkmanager do_install to complete successfully. I believe this is the correct fix (or close to it), based on commit 0b8be3760 for layers/meta-balena/meta-balena-common/recipes-connectivity/networkmanager/networkmanager_1.18.0.bb. I’ve included the relevant snippet here:

tree 5d476e7825c73344e9eb3851c166eacbca24f8c0
parent 21a7866000f444850b7a73272a7b0d33f7376bec
author Zubair Lutfullah Kakakhel <zubair@balena.io> Wed Dec 12 17:02:00 2018 +0000
committer Zubair Lutfullah Kakakhel <zubair@balena.io> Thu Dec 13 10:43:43 2018 +0000
encoding

networkmanager: Bump recipe to v1.14.4

Update recipe from upstream meta-openembedded
http://cgit.openembedded.org/meta-openembedded/commit/meta-networking/recipes-connectivity/networkmanager?id=331b717b862e3599b99942acb64c1d6b03806042

Difference in size ~ +400K

Change-type: minor
Changelog-entry: Bump network manager from v1.12.2 to v1.14.4
Signed-off-by: Zubair Lutfullah Kakakhel <zubair@balena.io>

diff --git a/meta-resin-common/recipes-connectivity/networkmanager/networkmanager_1.12.2.bb b/meta-resin-common/recipes-connectivity/networkmanager/networkmanager_1.14.4.bb
similarity index 74%
rename from meta-resin-common/recipes-connectivity/networkmanager/networkmanager_1.12.2.bb
rename to meta-resin-common/recipes-connectivity/networkmanager/networkmanager_1.14.4.bb
index af221d62..870a5281 100644
--- a/meta-resin-common/recipes-connectivity/networkmanager/networkmanager_1.12.2.bb
+++ b/meta-resin-common/recipes-connectivity/networkmanager/networkmanager_1.14.4.bb
@@ -97,24 +104,24 @@ FILES_${PN}-adsl = "${libdir}/NetworkManager/libnm-device-plugin-adsl.so"

 FILES_${PN} += " \
     ${libexecdir} \
-    ${libdir}/pppd/*/nm-pppd-plugin.so \
     ${libdir}/NetworkManager/${PV}/*.so \
-    ${libdir}/NetworkManager/VPN \
-    ${libdir}/NetworkManager/conf.d \
+    ${nonarch_libdir}/NetworkManager/VPN \
+    ${nonarch_libdir}/NetworkManager/conf.d \
     ${datadir}/polkit-1 \
     ${datadir}/dbus-1 \
-    ${base_libdir}/udev/* \
+    ${nonarch_base_libdir}/udev/* \
     ${systemd_unitdir}/system \
 "

 RRECOMMENDS_${PN} += "iptables \
-    ${@bb.utils.contains('PACKAGECONFIG', 'dnsmasq', 'dnsmasq', '', d)} \
+    ${@bb.utils.filter('PACKAGECONFIG', 'dnsmasq', d)} \
 "
 RCONFLICTS_${PN} = "connman"

 FILES_${PN}-dev += " \
     ${datadir}/NetworkManager/gdb-cmd \
     ${libdir}/pppd/*/*.la \
+    ${libdir}/NetworkManager/*.la \
     ${libdir}/NetworkManager/${PV}/*.la \
 "

Based on this change (using nonarch_libdir), I am assuming that networkmanager_%.bbappend should also be using that. The modification to nonarch_libdir I specified above does allow the build to continue successfully.
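The failure mode can be reproduced outside of BitBake with a small Python sketch. It assumes, as the commit above suggests, that the recipe creates the directory under the non-arch path (`/usr/lib`), while the cleanup step resolves `${libdir}` to `/usr/lib64`. The function names and the temporary directory standing in for `${D}` are made up for this example.

```python
import os
import tempfile


def try_rmdir(image_root: str, libdir: str) -> str:
    """Attempt the cleanup step: rmdir <image>/<libdir>/NetworkManager/conf.d."""
    target = os.path.join(image_root, libdir.lstrip("/"), "NetworkManager", "conf.d")
    try:
        os.rmdir(target)
        return "removed"
    except FileNotFoundError:
        return "missing"


def simulate(cleanup_libdir: str) -> str:
    """Create the directory under /usr/lib, then clean up using cleanup_libdir."""
    with tempfile.TemporaryDirectory() as image_root:
        # Assumption: the install step places conf.d under the non-arch lib path.
        os.makedirs(os.path.join(image_root, "usr", "lib", "NetworkManager", "conf.d"))
        return try_rmdir(image_root, cleanup_libdir)


print(simulate("/usr/lib64"))  # cleanup via ${libdir} when BASE_LIB is lib64
print(simulate("/usr/lib"))    # cleanup via ${nonarch_libdir}
```

The first call reports the directory as missing, mirroring the rmdir error in the do_install log; the second succeeds because the cleanup path matches where the directory was created.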

Note: As these changes are in the meta-amd and meta-balena submodules, they aren’t pushed in any branches of balena-amd. The branch that was used for this within balena-amd was meta-amd-warrior-from-scratch. meta-amd-warrior runs into an additional issue with a kernel patch not applying correctly and I wanted to remove that from the possible causes for these issues.

For completeness, for anyone following along: changing BASE_LIB_tune-dbfp5 to lib lets the build progress to 99%, at which point the following error is hit:

ERROR: resin-image-1.0-r0 do_rootfs: The following packages could not be configured offline and rootfs is read-only: ['libglib-2.0-0', 'udev-hwdb']
ERROR: resin-image-1.0-r0 do_rootfs:
ERROR: resin-image-1.0-r0 do_rootfs: Function failed: do_rootfs
ERROR: Logfile of failure stored in: /code/balena-amd/build/tmp/work/v1000-poky-linux/resin-image/1.0-r0/temp/log.do_rootfs.12053
ERROR: Task (/code/balena-amd/build/../layers/meta-balena/meta-balena-common/recipes-core/images/resin-image.bb:do_rootfs) failed with exit code '1'
ERROR: resin-image-flasher-1.0-r0 do_rootfs: The following packages could not be configured offline and rootfs is read-only: ['libglib-2.0-0', 'udev-hwdb']
ERROR: resin-image-flasher-1.0-r0 do_rootfs:
ERROR: resin-image-flasher-1.0-r0 do_rootfs: Function failed: do_rootfs
ERROR: Logfile of failure stored in: /code/balena-amd/build/tmp/work/v1000-poky-linux/resin-image-flasher/1.0-r0/temp/log.do_rootfs.12057
ERROR: Task (/code/balena-amd/build/../layers/meta-balena/meta-balena-common/recipes-core/images/resin-image-flasher.bb:do_rootfs) failed with exit code '1'

Looking into the log files gives a bit more info:

NOTE: Installing complementary packages ...
NOTE: Running ['oe-pkgdata-util', '-p', '/code/balena-amd/build/tmp/pkgdata/v1000', 'glob', '/tmp/installed-pkgsh5xcnptp', '']
NOTE: Running intercept scripts:
NOTE: If an image is being built, the postinstalls for the following packages will be postponed for first boot: udev-hwdb
NOTE: > Executing update_gio_module_cache intercept ...
NOTE: Exit code 1. Output:
+ '[' False = False -a qemuwrapper-cross '!=' nativesdk-qemuwrapper-cross ']'
+ echo 'qemuwrapper: qemu usermode is not supported'
qemuwrapper: qemu usermode is not supported
+ exit 1

NOTE: The postinstall intercept hook 'update_gio_module_cache' could not be executed due to missing qemu usermode support, details in /code/balena-amd/build/tmp/work/v1000-poky-linux/resin-image/1.0-r0/temp/log.do_rootfs
NOTE: If an image is being built, the postinstalls for the following packages will be postponed for first boot: libglib-2.0-0

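This happens because qemu usermode support is not enabled. Without it, the various systemd_241.bb files cannot run their pkg_postinst_udev-hwdb lines under emulation at build time (the same applies to the update_gio_module_cache intercept for libglib-2.0-0), and the postinstalls cannot be deferred to first boot either, because resin-image has read-only-rootfs set.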

AMD disables qemu-usermode in layers/meta-amd/meta-amd-bsp/conf/machine/include/amd-common-configurations.inc:

# QEMU does not support some of the enhanced instructions available
MACHINE_FEATURES_remove = "qemu-usermode"

Adding back in qemu-usermode causes a number of other build issues that we’re currently looking at.
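The decision that leads to the do_rootfs error can be summarized with a small Python model. This is a simplified sketch of the behaviour visible in the logs above, not the actual rootfs code; the function `plan_postinsts` and its return shape are made up for this example.

```python
def plan_postinsts(packages, machine_features, image_features):
    """Decide where postinstall steps that need target emulation would run."""
    if "qemu-usermode" in machine_features:
        # Emulation is available, so the steps can run while the image is built.
        return {"build_time": list(packages), "first_boot": []}

    deferred = list(packages)
    if deferred and "read-only-rootfs" in image_features:
        # Nothing can be written on first boot, so deferring is not an option.
        raise RuntimeError(
            "The following packages could not be configured offline "
            f"and rootfs is read-only: {deferred}"
        )
    return {"build_time": [], "first_boot": deferred}


packages = ["libglib-2.0-0", "udev-hwdb"]

# genericx86-64-like case: qemu-usermode is still in MACHINE_FEATURES.
print(plan_postinsts(packages, {"qemu-usermode"}, {"read-only-rootfs"}))

# v1000-like case: qemu-usermode removed, image rootfs is read-only.
try:
    plan_postinsts(packages, set(), {"read-only-rootfs"})
except RuntimeError as err:
    print("ERROR:", err)
```

In this model there are only two ways out: restore qemu-usermode so the steps run at build time, or make the deferred steps unnecessary. Dropping read-only-rootfs would also avoid the error, but that changes how the image behaves.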

That’s great progress.

Thanks for keeping us updated!

I’m afraid I haven’t had time to look in a more hands-on way.

I’m quite surprised there are userspace differences needed for amd. I was thinking that it should only require a bootloader/kernel from the bsp layers.

Any further updates from your end?

No, no further updates from our side yet.

What do you mean by userspace differences? The x11 and networkmanager stuff? Or the qemu stuff?

I mean that board support usually involves only fiddling with the kernel, GRUB and bootloaders, and rarely requires tweaking things like x11 and networkmanager.

I’m still not quite sure what is adding x11 to the OS. balenaOS doesn’t include x11. And whatever in meta-amd is adding x11 needs to be removed.

Ah, yeah I agree it is a bit odd.

For reference, here is the dependency chain that meta-amd adds that requires x11. Specifically libdrm:

ERROR: Nothing PROVIDES 'libxrender' (but /code/balena-amd/build/../layers/meta-amd/meta-amd-bsp/recipes-graphics/drm/libdrm_git.bb DEPENDS on or otherwise requires it)
libxrender was skipped: missing required distro feature 'x11' (not in DISTRO_FEATURES)
NOTE: Runtime target 'plymouth' is unbuildable, removing...
Missing or unbuildable dependency chain was: ['plymouth', 'libdrm', 'libxrender']
ERROR: Required build target 'resin-image-flasher' has no buildable providers.
Missing or unbuildable dependency chain was: ['resin-image-flasher', 'resin-image', 'plymouth', 'libdrm', 'libxrender']
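To make the chain in this error easier to follow, here is a small Python model of it. The dependency graph and the x11 requirement on libxrender are transcribed from the error output above; the default feature list, the helper names and the token-removal model of `_remove` are simplifications made up for this example.

```python
def remove_tokens(value: str, removed: str) -> list:
    """Model of a `_remove` override: drop whitespace-separated tokens."""
    drop = set(removed.split())
    return [token for token in value.split() if token not in drop]


def unbuildable_chain(target, depends, required_feature, distro_features, _seen=None):
    """Return the first dependency chain ending in a recipe skipped for a missing feature."""
    seen = set() if _seen is None else _seen
    if target in seen:
        return None
    seen.add(target)
    needed = required_feature.get(target)
    if needed is not None and needed not in distro_features:
        return [target]
    for dep in depends.get(target, []):
        chain = unbuildable_chain(dep, depends, required_feature, distro_features, seen)
        if chain:
            return [target] + chain
    return None


# Edges and the x11 requirement come from the error message.
DEPENDS = {
    "resin-image-flasher": ["resin-image"],
    "resin-image": ["plymouth"],
    "plymouth": ["libdrm"],
    "libdrm": ["libxrender"],
}
REQUIRED_FEATURE = {"libxrender": "x11"}

# Hypothetical default feature list, with x11 removed as balena-os.inc does.
features = remove_tokens("acl ipv4 x11", "x11")
print(unbuildable_chain("resin-image-flasher", DEPENDS, REQUIRED_FEATURE, features))
```

The model shows the two places the chain can be broken: put x11 back into the distro features (the workaround used earlier), or remove the libdrm to libxrender edge so that nothing in the image requires x11.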

From what you are saying, I take it the recommendation is to find a way to remove libdrm?

The networkmanager and related issues seem to arise because some of the BitBake files assume that BASE_LIB is lib and break when it is lib64.

I gather that libdrm is a “library for accessing the DRM, direct rendering manager […] libdrm is a low-level library, typically used by graphics drivers such as the Mesa DRI drivers, the X drivers, libva and similar projects”. Reference: Libdrm-2.4.118
I assume that libdrm would not be required in the BSP layer, and therefore could/should be removed, yes.