balenaCloud emulated build breaks application

Yesterday, my Telegraf service suddenly crashed after pushing a new build. After further inspection, I found out that the issue was caused by an emulated build. I’ve set up a small repository to reliably reproduce the issue. For convenience, I’ve posted the README of this repo below.

Problem statement

balenaCloud offers two types of builders for ARM devices:

  1. Native ARM builders
  2. x86 builders using QEMU emulation

This repository shows that there are applications for which these two builders do not produce the same output. In fact, emulated building produces a broken container. The sample application provided is extremely simple and consists of a single Dockerfile containing the following line:

FROM telegraf

Prerequisites

  1. A balenaCloud account.
  2. Installation of balena-cli.
  3. A balenaCloud application named balenacloud-emulation with an ARM device type (such as a Raspberry Pi) and at least one device.

Steps to reproduce

Run the following commands to download this repository and reproduce the issue:

git clone https://github.com/pascal-hwky/balenacloud-emulation.git
cd balenacloud-emulation
balena push balenacloud-emulation --emulated

After the deployment is completed, you should see the following device output:

standard_init_linux.go:211: exec user process caused "exec format error"

This error message indicates that the executable inside the Docker image was built for a different CPU architecture than the device's, so the kernel cannot run it. Repeating the build without emulation (by leaving out the --emulated flag) does not cause this issue.
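One way to confirm an architecture mismatch is to compare the architecture recorded in the image with what the device reports. The following is a minimal sketch, not part of the original repository: the function names docker_arch_for_machine and arch_matches are hypothetical, the mapping only covers the device types mentioned in this thread, and it assumes a machine where the docker CLI is available and the image has been pulled.

```shell
#!/bin/sh
# Map `uname -m` output to the architecture name that Docker records
# for an image. Only the architectures discussed in this thread are covered.
docker_arch_for_machine() {
  case "$1" in
    armv7l)        echo "arm" ;;
    aarch64|arm64) echo "arm64" ;;
    x86_64|amd64)  echo "amd64" ;;
    *)             echo "unknown" ;;
  esac
}

# Hypothetical helper: succeeds only if the image architecture ($1)
# matches the machine type ($2, as printed by `uname -m`).
arch_matches() {
  [ "$1" = "$(docker_arch_for_machine "$2")" ]
}

# Usage (needs Docker and a pulled image, so it is not run automatically):
#   image_arch=$(docker image inspect --format '{{.Architecture}}' telegraf)
#   arch_matches "$image_arch" "$(uname -m)" || echo "architecture mismatch"
```

If the check reports a mismatch (for example an amd64 image on an armv7l device), the "exec format error" above is the expected result.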

The balena-cli documentation mentions the following:

The emulated builds will also happen on the rare occasion that the native ARM builder is overloaded or unavailable.

However, because the emulated build produces a broken image in the case above, this fallback could cause applications to crash unexpectedly after a push.

@pdboef, thank you for reporting this issue and preparing the repo for reproduction.

The issue is probably that telegraf is a “multi-architecture” Dockerhub image (where a single image name:tag refers to multiple different images of different architectures, such as ARM and Intel x86), which is not yet fully supported by the balenaCloud builders or the balena CLI. We are working towards adding multiarch image support. Meanwhile, the workaround you may use is to append a sha256 digest to the FROM line of your Dockerfile, thus “manually selecting” the base image architecture. To do so, check the sha256 digest for each available architecture on the Dockerhub page:

| Dockerhub arch | Device | Dockerfile FROM line |
| --- | --- | --- |
| arm64/v8 | RPi 4 | FROM telegraf:1.15.3@sha256:fd61aa216d0bd94fb868075c3ead2a933949d738a74a0e8c6720729a8b722e9d |
| arm/v7 | RPi 3 | FROM telegraf:1.15.3@sha256:ff07f829cae2b0c5305b09d534ab2ddb868900aaf06e32f93722919a155e0e21 |
| amd64 | Intel NUC | FROM telegraf:1.15.3@sha256:c29f6cd7a329f91e631a1c24c5210d05c81541d2d4bbf39b1298a7e2866bfd1e |

Choose one of the Dockerfile FROM lines from the table above, depending on your device type. The sha256 digests above were extracted from the Dockerhub page linked above, by using the dropdown box to select the target architecture.
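If you deploy to more than one device type, the choice can be scripted. The snippet below is only a sketch under stated assumptions: pinned_from_line is a hypothetical helper name, and the digests are copied verbatim from the telegraf:1.15.3 entries listed above, so they need to be updated by hand whenever you change the image tag.

```shell
#!/bin/sh
# Hypothetical helper: print the pinned FROM line for a Dockerhub
# architecture label (arm64/v8, arm/v7 or amd64).
# Digests are the telegraf:1.15.3 values quoted in this thread.
pinned_from_line() {
  image="telegraf:1.15.3"
  case "$1" in
    arm64/v8) digest="fd61aa216d0bd94fb868075c3ead2a933949d738a74a0e8c6720729a8b722e9d" ;;
    arm/v7)   digest="ff07f829cae2b0c5305b09d534ab2ddb868900aaf06e32f93722919a155e0e21" ;;
    amd64)    digest="c29f6cd7a329f91e631a1c24c5210d05c81541d2d4bbf39b1298a7e2866bfd1e" ;;
    *)        echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
  echo "FROM ${image}@sha256:${digest}"
}

# Usage: write a one-line Dockerfile for a Raspberry Pi 3 (arm/v7):
#   pinned_from_line arm/v7 > Dockerfile
```

An unknown architecture label makes the helper fail with a non-zero exit status instead of silently writing an unpinned FROM line.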

Let us know if this workaround works for you. To track our progress on adding multiarch image support, you can subscribe to this GitHub issue: https://github.com/balena-io/balena-cli/issues/1508
I am also linking that GitHub issue and other internal resources to this forum thread, so that we can also notify you on this thread when the work on multiarch image support is done.

Hi @pdcastro, thanks for the quick response. This workaround is perfectly acceptable for us :slight_smile: