We are trying to get the iGPU of an AMD Strix Halo APU, specifically an AMD Ryzen™ AI Max+ 395, to run on BalenaOS. However, on a fresh BalenaOS installation (version 6.6.13+rev1), the amdgpu driver fails to load its firmware and GPU initialization aborts with the following errors:
dmesg | grep -i amd
[ 0.003875] ACPI: SSDT 0x0000000070376000 00848D (v02 AMD AmdTable 00000002 MSFT 04000000)
[ 0.003885] ACPI: VFCT 0x0000000070361000 004484 (v01 ALASKA A M I 00000001 AMD 33504F47)
[ 0.003888] ACPI: SSDT 0x0000000070355000 00A8CE (v02 AMD AMD CPU 00000001 AMD 00000001)
[ 0.003890] ACPI: SSDT 0x0000000070353000 000CA9 (v02 AMD CPMDFIG4 00000001 INTL 20230331)
[ 0.003892] ACPI: SSDT 0x0000000070350000 002AA6 (v02 AMD CDFAAIG2 00000001 INTL 20230331)
[ 0.003893] ACPI: SSDT 0x000000007034F000 000678 (v02 AMD OEMACP 00000001 INTL 20230331)
[ 0.003894] ACPI: SSDT 0x0000000070345000 009A9F (v02 AMD CPMCMN 00000001 INTL 20230331)
[ 0.003899] ACPI: SSDT 0x0000000070341000 000051 (v02 AMD DRTM 00000001 INTL 20230331)
[ 0.003901] ACPI: IVRS 0x0000000070340000 0001F6 (v02 AMD AmdTable 00000001 AMD 00000001)
[ 0.003902] ACPI: SSDT 0x000000007033F000 0009D0 (v02 AMD CPMMSOSC 00000001 INTL 20230331)
[ 0.003903] ACPI: SSDT 0x000000007033E000 000F5C (v02 AMD CPMACPV8 00000001 INTL 20230331)
[ 0.003904] ACPI: SSDT 0x000000007033D000 000500 (v02 AMD MEMTOOL0 00000002 INTL 20230331)
[ 0.003906] ACPI: SSDT 0x000000007033C000 000968 (v02 AMD THERMAL0 00000001 INTL 20230331)
[ 0.003907] ACPI: SSDT 0x000000007033A000 0010BB (v02 AMD GPP_PME_ 00000001 INTL 20230331)
[ 0.003908] ACPI: SSDT 0x0000000070330000 0097A5 (v02 AMD INTGPPC_ 00000001 INTL 20230331)
[ 0.003909] ACPI: SSDT 0x000000007032B000 0046FB (v02 AMD INTGPPA_ 00000001 INTL 20230331)
[ 0.003910] ACPI: SSDT 0x000000007032A000 000C14 (v02 AMD CPMGPIO0 00000001 INTL 20230331)
[ 0.003912] ACPI: SSDT 0x0000000070329000 00008D (v02 AMD CPMMSLPI 00000001 INTL 20230331)
[ 0.003913] ACPI: SSDT 0x0000000070328000 000D18 (v02 AMD WwanSsdt 00000001 INTL 20230331)
[ 0.003914] ACPI: SSDT 0x0000000070327000 000B07 (v02 AMD SDCR 00000001 INTL 20230331)
[ 0.003915] ACPI: SSDT 0x0000000070326000 000ABC (v02 AMD LOM 00000001 INTL 20230331)
[ 0.003917] ACPI: SSDT 0x0000000070325000 000C69 (v02 AMD WLAN 00000001 INTL 20230331)
[ 0.003918] ACPI: SSDT 0x0000000070324000 000CE4 (v02 AMD NVME 00000001 INTL 20230331)
[ 0.003919] ACPI: SSDT 0x0000000070323000 000CE4 (v02 AMD NVME 00000001 INTL 20230331)
[ 0.003920] ACPI: SSDT 0x0000000070321000 0013BA (v02 AMD GpMsSsdt 00000001 INTL 20230331)
[ 0.003922] ACPI: SSDT 0x0000000070320000 00005E (v02 AMD GP10 00000001 INTL 20230331)
[ 0.003923] ACPI: SSDT 0x000000007031E000 0018B3 (v02 AMD UPEP 00000001 INTL 20230331)
[ 0.083485] AMD-Vi: ivrs, add hid:AMDI0020, uid:ID00, rdevid:160
[ 0.083486] AMD-Vi: ivrs, add hid:AMDI0020, uid:ID01, rdevid:160
[ 0.083488] AMD-Vi: ivrs, add hid:AMDI0020, uid:ID02, rdevid:160
[ 0.083488] AMD-Vi: ivrs, add hid:AMDI0020, uid:ID03, rdevid:160
[ 0.083489] AMD-Vi: ivrs, add hid:MSFT0201, uid:1, rdevid:96
[ 0.083490] AMD-Vi: ivrs, add hid:AMDI0020, uid:ID04, rdevid:160
[ 0.083490] AMD-Vi: Using global IVHD EFR:0x246577efa2254afa, EFR2:0x10
[ 0.216505] smpboot: CPU0: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S (family: 0x1a, model: 0x70, stepping: 0x0)
[ 0.216612] Performance Events: Fam17h+ 16-deep LBR, core perfctr, AMD PMU driver.
[ 0.613779] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.615892] AMD-Vi: Extended features (0x246577efa2254afa, 0x10): PPR NX GT [5] IA GA PC GA_vAPIC
[ 0.615895] AMD-Vi: Interrupt remapping enabled
[ 0.616021] AMD-Vi: Virtual APIC enabled
[ 0.617841] perf: AMD IBS detected (0x00081bff)
[ 0.618029] amd_uncore: 8 amd_df counters detected
[ 0.618034] amd_uncore: 6 amd_l3 counters detected
[ 0.618040] amd_uncore: 0 amd_umc_0 counters detected
[ 0.618043] amd_uncore: 0 amd_umc_1 counters detected
[ 0.618046] amd_uncore: 0 amd_umc_2 counters detected
[ 0.618049] amd_uncore: 0 amd_umc_3 counters detected
[ 0.618052] amd_uncore: 0 amd_umc_4 counters detected
[ 0.618055] amd_uncore: 0 amd_umc_5 counters detected
[ 0.618059] amd_uncore: 0 amd_umc_6 counters detected
[ 0.618061] amd_uncore: 0 amd_umc_7 counters detected
[ 0.618064] amd_uncore: 0 amd_umc_8 counters detected
[ 0.618067] amd_uncore: 0 amd_umc_9 counters detected
[ 0.618070] amd_uncore: 0 amd_umc_10 counters detected
[ 0.618073] amd_uncore: 0 amd_umc_11 counters detected
[ 0.618076] amd_uncore: 0 amd_umc_12 counters detected
[ 0.618079] amd_uncore: 0 amd_umc_13 counters detected
[ 0.618082] amd_uncore: 0 amd_umc_14 counters detected
[ 0.618084] amd_uncore: 0 amd_umc_15 counters detected
[ 0.618290] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 2.293075] event_source amd_umc_0: hash matches
[ 4.479321] kvm_amd: TSC scaling supported
[ 4.479323] kvm_amd: Nested Virtualization enabled
[ 4.479324] kvm_amd: Nested Paging enabled
[ 4.479325] kvm_amd: LBR virtualization supported
[ 4.479335] kvm_amd: Virtual VMLOAD VMSAVE supported
[ 4.479336] kvm_amd: Virtual GIF supported
[ 4.479336] kvm_amd: Virtual NMI enabled
[ 5.139859] amd_atl: AMD Address Translation Library initialized
[ 5.140052] [drm] amdgpu kernel modesetting enabled.
[ 5.141580] amdgpu: Virtual CRAT table created for CPU
[ 5.141592] amdgpu: Topology: Add CPU node
[ 5.141717] amdgpu 0000:c5:00.0: enabling device (0006 -> 0007)
[ 5.144771] amdgpu 0000:c5:00.0: amdgpu: Fetched VBIOS from VFCT
[ 5.144772] amdgpu: ATOM BIOS: 113-STRXLGEN-001
[ 5.154212] amdgpu 0000:c5:00.0: Direct firmware load for amdgpu/psp_14_0_1_toc.bin failed with error -2
[ 5.154214] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <psp> failed -19
[ 5.154425] amdgpu 0000:c5:00.0: Direct firmware load for amdgpu/dcn_3_5_1_dmcub.bin failed with error -2
[ 5.154426] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <dm> failed -19
[ 5.154629] amdgpu 0000:c5:00.0: Direct firmware load for amdgpu/gc_11_5_1_pfp.bin failed with error -2
[ 5.154630] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <gfx_v11_0> failed -19
[ 5.154727] amdgpu 0000:c5:00.0: Direct firmware load for amdgpu/sdma_6_1_1.bin failed with error -2
[ 5.154728] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <sdma_v6_0> failed -19
[ 5.154911] amdgpu 0000:c5:00.0: Direct firmware load for amdgpu/vcn_4_0_6.bin failed with error -2
[ 5.154912] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <vcn_v4_0_5> failed -19
[ 5.155008] amdgpu 0000:c5:00.0: Direct firmware load for amdgpu/gc_11_5_1_mes_2.bin failed with error -2
[ 5.155009] amdgpu 0000:c5:00.0: amdgpu: try to fall back to gc_11_5_1_mes.bin
[ 5.155017] amdgpu 0000:c5:00.0: Direct firmware load for amdgpu/gc_11_5_1_mes.bin failed with error -2
[ 5.155017] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <mes_v11_0> failed -19
[ 5.155103] amdgpu 0000:c5:00.0: amdgpu: VPE: collaborate mode true
[ 5.155103] amdgpu 0000:c5:00.0: amdgpu: Fatal error during GPU init
[ 5.155105] amdgpu 0000:c5:00.0: amdgpu: amdgpu: finishing device.
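For what it's worth, error -2 in the `Direct firmware load ... failed` lines is `ENOENT`, i.e. the kernel could not find the file. Here is a minimal sketch for listing the requested blobs and checking whether they exist on the host. The helper names (`missing_firmware`, `report`) are made up for illustration, and `/lib/firmware` plus the `.xz`/`.zst` suffixes are assumptions about where and how the image might ship firmware, not something we have confirmed for BalenaOS.

```python
import re
from pathlib import Path

# Matches kernel messages such as:
#   "Direct firmware load for amdgpu/psp_14_0_1_toc.bin failed with error -2"
FW_RE = re.compile(r"Direct firmware load for (\S+) failed with error (-?\d+)")

# Two lines copied from the log above, used as demo input.
SAMPLE = """\
[ 5.154212] amdgpu 0000:c5:00.0: Direct firmware load for amdgpu/psp_14_0_1_toc.bin failed with error -2
[ 5.154425] amdgpu 0000:c5:00.0: Direct firmware load for amdgpu/dcn_3_5_1_dmcub.bin failed with error -2
"""


def missing_firmware(dmesg_text):
    """Return firmware names whose load failed with -2 (ENOENT), in log order, without duplicates."""
    names = []
    for name, err in FW_RE.findall(dmesg_text):
        if err == "-2" and name not in names:
            names.append(name)
    return names


def report(dmesg_text, fw_root="/lib/firmware"):
    """Print, for each requested blob, whether a plain or compressed copy exists under fw_root.

    fw_root and the compressed suffixes are assumptions; adjust them for the actual image.
    """
    for name in missing_firmware(dmesg_text):
        base = Path(fw_root) / name
        candidates = [base, Path(str(base) + ".xz"), Path(str(base) + ".zst")]
        found = any(p.exists() for p in candidates)
        print(f"{name}: {'present' if found else 'not found'}")


if __name__ == "__main__":
    for name in missing_firmware(SAMPLE):
        print(name)
```

To run it against the real log, pass the full `dmesg` output as a string to `report()` on the device. If every blob comes back as not found, that would suggest the firmware files are simply absent from the image, independent of the kernel version.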
Could this be related to an outdated kernel? The CPU model is fairly new, and BalenaOS doesn’t typically seem to support the newest kernels. Would building a custom out-of-tree AMD driver module be an option here?
Note: We are using Secure Boot and Full Disk Encryption.