Differences between balenaOS and Docker on a Raspberry Pi

Hi, I’ve been trying to run some Xtensa ESP32 compiling tools on balena. First I tested installing and running the Xtensa toolchain on a Pi 3, and it worked fine.

Then I wrote a Dockerfile for balena and started getting segmentation faults when running the toolchain on the device.
I thought it would be a Docker incompatibility, but after testing the same Dockerfile on a Raspberry Pi running Docker, I don’t get the same error.

I understand that balena Docker images are built with QEMU, but since the toolchain binary giving the segfaults is downloaded prebuilt, I don’t think that could cause the problem.

What other differences should I consider when comparing running a Dockerfile on a Raspberry Pi with deploying it using balena?

Thanks in advance for the help,
Diogo

but after testing the same Dockerfile on a Raspberry Pi running Docker, I don’t get the same error

@diogovaraujo, there are a few different ways of deploying applications / app containers to a balenaOS device, mostly described on this page: https://www.balena.io/docs/learn/deploy/deployment/

For example, if you use the git push or balena push <app-name> methods to deploy the application, the balenaCloud builder will build the images (run the Dockerfile), typically using native ARM cloud servers (with hardware support for the specific instruction set of the target devices, like the RPi3). QEMU is not necessarily used by the balenaCloud builder.

By contrast, other deployment methods like balena build and balena deploy --build, either with balenaCloud or openBalena, may rely on QEMU to build on the local computer (running Docker). This varies further depending on the host OS and the Docker version: Docker Desktop for Windows or macOS comes with QEMU emulation already registered through binfmt_misc, so no extra QEMU setup is needed, whereas Docker on Linux typically doesn’t have binfmt_misc emulation for ARM enabled.
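On a Linux host, one way to see whether QEMU emulation for ARM is registered is to look under /proc/sys/fs/binfmt_misc. The sketch below assumes the handlers are registered with names starting with qemu- (as packages such as qemu-user-static typically do); the function name is hypothetical.

```python
from pathlib import Path


def enabled_qemu_arm_handlers(binfmt_dir="/proc/sys/fs/binfmt_misc"):
    """List enabled binfmt_misc handlers whose name suggests QEMU ARM emulation."""
    base = Path(binfmt_dir)
    if not base.is_dir():
        return []  # binfmt_misc not mounted, or not a Linux host
    found = []
    for entry in sorted(base.iterdir()):
        # 'register' and 'status' are control files, not handlers.
        if entry.name in ("register", "status"):
            continue
        # Assumption: QEMU handlers are named like 'qemu-arm' or 'qemu-aarch64'.
        if not (entry.name.startswith("qemu-")
                and ("arm" in entry.name or "aarch64" in entry.name)):
            continue
        try:
            # The first line of each handler file is 'enabled' or 'disabled'.
            first_line = entry.read_text().splitlines()[0]
        except (OSError, IndexError):
            continue
        if first_line.strip() == "enabled":
            found.append(entry.name)
    return found
```

An empty list on a Linux host suggests that ARM images cannot be built or run there without extra emulation setup.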

A third option that also avoids the need for QEMU is to use balena build or balena deploy --build with the --dockerHost and --dockerPort command line options to point to the IP address and port number of a local device (e.g. RPi3) running a development image of balenaOS, such that a native target processor (ARM) is used to build the image.
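As an illustration, the sketch below assembles such a command. The device IP address, the armv7hf architecture for a Raspberry Pi 3, and port 2375 (where, as far as I know, balenaEngine listens on development images) are assumptions to adapt; the helper name is hypothetical. With dry_run=True it only returns the command line instead of running it.

```python
import shlex
import subprocess


def balena_build_on_device(device_ip, device_type="raspberrypi3", arch="armv7hf",
                           docker_port=2375, source_dir=".", dry_run=True):
    """Build app images on a local balenaOS development device instead of under QEMU."""
    cmd = [
        "balena", "build", source_dir,
        "--deviceType", device_type,
        "--arch", arch,
        "--dockerHost", device_ip,          # the device's own engine does the build
        "--dockerPort", str(docker_port),   # assumed engine port on dev images
    ]
    if not dry_run:
        subprocess.run(cmd, check=True)     # raises CalledProcessError on failure
    return shlex.join(cmd)
```

For example, balena_build_on_device("192.168.1.50") returns the command string for a hypothetical device at that address without running anything.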

Given the above, to narrow the issue you’re having, some relevant questions are:

  • Which host OS / method do you use when “testing on a raspberry pi running docker” - e.g. Raspbian?
  • Which deployment method do you use in order to “run the toolchain on the device”? E.g. git push, balena push, balena build, balena deploy, etc.
  • Could you also share the Dockerfile, and copy-and-paste the error messages?

Sharing the command lines you use would be helpful too.

Also: when you say “segmentation faults”, is there any chance that they are caused by mismatched architectures? For example, an image that was built on a laptop using an x86 target, which somehow gets run on an ARM device. The answers to the questions above (deployment method) should help clarify it.
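One quick way to rule out an architecture mismatch is to read the ELF header of the binary that crashes (for example the toolchain’s cc1) and check which CPU it was built for. A minimal sketch that reads the e_machine field at byte offset 18 of the ELF header; the function name and the small lookup table are illustrative. Note that cc1 itself should be an ARM binary on the Pi, since the Xtensa target only applies to the code it generates.

```python
import struct

# A few common ELF e_machine values, as defined by the ELF specification.
ELF_MACHINES = {3: "x86 (i386)", 40: "ARM (32-bit)", 62: "x86-64",
                94: "Xtensa", 183: "AArch64"}


def elf_machine(path):
    """Return a human-readable CPU architecture for an ELF binary."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 2-byte field right after e_ident (16 bytes) and e_type (2 bytes).
    (machine,) = struct.unpack(endian + "H", header[18:20])
    return ELF_MACHINES.get(machine, f"unknown ({machine})")
```

Running elf_machine on the downloaded cc1 inside the app container should report ARM on a Raspberry Pi 3; an x86 result would point to the mismatch described above.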

Ok, that clarifies some misunderstandings on my part.
Answering your questions:

The error happens when trying to compile some example code from esp-idf: it compiles some files and then, in the middle, fails with just “segmentation fault (program cc1)”, error code 4.

Since it sometimes crashes while compiling different files, I was wondering if it’s an out-of-memory problem, but free -m always shows lots of free memory.

What puzzles me is why it doesn’t happen on a Raspbian install using the same Dockerfile. I thought even Raspbian vs. balenaOS differences would be negligible on the container side.

Edit: Maybe I wasn’t clear about what I’m doing: I’m installing MCU compiling tools using the Dockerfile and then, after the image is deployed to the device, I access the container on the device and use the already installed tools to build test code.

On Raspbian, or Docker running on Raspbian, I can access the container and compile the example without any problems. When I deploy the same Dockerfile with balena, access the container and try to compile the example, the segfaults start to appear.

I just modified the Dockerfile a little so that it starts and automatically tries to compile the example.

Doing tests on a Raspberry Pi and with git push balena master right now.

I’ve been pushing with git push balena master; from what you said, that should build on ARM machines instead of QEMU, right?

Probably. You can tell by the build output, look for a line that says something like “Building on arm03”, where “arm03” is the name of a balenaCloud native ARM server (running balenaEngine, which is a nicely tweaked version of Docker). See example below:

$ git push balena master
...
[Info]     Starting build for test-rpi, user gh_paulo_castro
...
[Info]     Building on arm03
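If you save the build output to a file, a small script can pull out the builder name. A sketch assuming the line is worded as in the example above (“Building on &lt;name&gt;”); the function name is hypothetical.

```python
import re

# Assumption: the builder line looks like "[Info]     Building on arm03".
BUILDER_RE = re.compile(r"Building on (\S+)")


def builder_name(build_log):
    """Return the builder host named in balenaCloud build output, or None."""
    match = BUILDER_RE.search(build_log)
    return match.group(1) if match else None
```

For the output shown above, builder_name returns "arm03", one of the native ARM servers.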

I’m installing MCU compiling tools using the Dockerfile and then, after the image is deployed to the device, I access the container on the device and use the already installed tools to build test code.

I see, so the image building actually succeeds. The segfaults happen when running the tools in the app container. Still (in part because I can’t think of anything else!), could it be that the app image was built for a slightly different architecture than the actual device? For example, the actual device is a Raspberry Pi 3, but the image was built for a Raspberry Pi 4 or Raspberry Pi 1. When using git push or balena push, the target of the built image is determined by the main/default device type selected when creating the balena application. (You can see the application’s default device type by attempting to add a device on the web dashboard, where it is the device type suggested by default, or with the balena CLI command balena app <app-name>, e.g. balena app test-rpi.)

If that’s not the issue… What about the amount of available RAM, as reported by the free command? Do you see a significant difference between the Raspbian/Docker setup and the balenaOS/balenaEngine setup?
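To compare the two setups more closely than with a single free -m snapshot, you could sample MemAvailable from /proc/meminfo while the build runs. A minimal sketch; the function names and the 2-second interval are arbitrary choices. Keep in mind that inside a container /proc/meminfo generally reflects the whole host rather than a per-container limit.

```python
import time


def mem_available_mib(meminfo_text):
    """Return MemAvailable in MiB, parsed from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # /proc/meminfo reports values in kB (KiB)
            return kib // 1024
    raise ValueError("MemAvailable not found in /proc/meminfo text")


def sample_memory(samples=30, interval=2.0, path="/proc/meminfo"):
    """Print MemAvailable every `interval` seconds, `samples` times."""
    for _ in range(samples):
        with open(path) as f:
            print(f"MemAvailable: {mem_available_mib(f.read())} MiB", flush=True)
        time.sleep(interval)
```

Running sample_memory in a second shell during the compile shows whether available memory drops sharply right before a segfault.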

If you’d like me to try to reproduce the segfault on an RPi3 I have here: what is the specific command line that causes the segmentation fault?

Hi, the Dockerfile linked before now runs the build as the CMD, so it loops on the segfault; it’s easier to test that way.

And the same Dockerfile built with Docker on Raspbian, only changing the FROM line at the top of the template to raspberrypi3, works perfectly. I will try forcing raspberrypi3 for balena and removing the .template extension.

But I just checked, and I chose raspberrypi3 as the device type, so I think it’s not that.

free -m reports more free memory in the balena container, and on the Raspbian device I’m running a GUI and sometimes even the browser during the build, which makes me doubt it’s a memory problem.

Maybe concurrency? CPU usage gets pretty high during compilation; would balenaOS, or even make, handle parallel jobs differently than Raspbian?

Edit1:

OMG, I think setting raspberrypi3 in the template solved the issue! Or maybe looping enough times finished the build…
raspberrypi3 is already the device type, so would that make any difference? Really weird.

I will do some more tests on a freshly flashed card and get back to you.

PS: Happy to see Brazilians working at such interesting companies around the world.

I think setting raspberrypi3 in the template solved the issue! Or maybe looping enough times finished the build… raspberrypi3 is already the device type, so would that make any difference?

I don’t think it should have made a difference, given that the application type is already Raspberry Pi 3. A consequence of changing the FROM line is that cached image layers get invalidated and rebuilt, but again I don’t see why this would matter. (By the way, to force cache invalidation, you can also use git push balena master:balena-nocache, as documented in the balena docs.)

Also, “looping many times finished the build” sounds unlikely too. :thinking: I’d think that every time the container crash-loops on the Dockerfile CMD instruction, the previous build work gets lost / discarded. Edit: it looks like the filesystem persists across container crash restarts.

Let us know if the issue is really solved, and if you can pinpoint exactly what solved it. If there is a bug somewhere, we would like to fix it!

PS: Happy to see Brazilians working at such interesting companies around the world.

Yeah, our team is distributed over many countries and timezones, which is good for customer support: balena (almost) never sleeps! :balena: :slightly_smiling_face:

Ok, I tested in a new, clean balena app and the segfault is still happening.

I don’t know why it compiled all the way that time, since I’ve seen it segfault on different files. I think it keeps its data between service restarts, and after a lot of tries it compiled all the files. I will do some more tests on this.

Besides that, are there other tests I could do? Now I’m kind of stuck.

Edit1:
I’ve left it compiling, and it’s definitely keeping the last compiled file and moving forward file by file across service restarts; that time it must have looped so many times that it completed the whole program.
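What the crash-looping CMD was effectively doing can be written as an explicit retry loop: make only rebuilds targets whose outputs are missing or out of date, so the objects finished on earlier attempts are kept and each retry gets a little further. A sketch (the helper name and attempt limit are hypothetical); this is a diagnostic workaround, not a fix.

```python
import subprocess
import sys


def retry_until_success(cmd, max_attempts=50):
    """Re-run cmd until it exits with status 0; return the attempt count, or None."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit code {result.returncode}",
              file=sys.stderr)
    return None
```

Calling retry_until_success(["make", "-j1"]) and comparing the number of attempts with a parallel make -j4 run is one cheap way to test the parallelization hypothesis.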

Hi

One thing that you could try is a newer version of the Espressif toolchain - either by compiling it yourself for Arm, or by using something that a community member has provided. I found one here - http://files.deepsoft.com/Other/xtensa-esp32-elf-armv7l/xtensa-esp32-elf-armv7l-6c4433a-01092018.tar.bz2 - but I have to warn you against downloading and running toolchains from “unverified” sources!

So in your Dockerfile, everything would remain the same except the step for installing the xtensa toolchain.

Hi @diogovaraujo, I also reproduced this on an RPi3 running balenaOS 2.51.1, and it indeed fails. From what I have seen in various forums, it might be related to the RPi firmware. On your working RPi3 running Raspbian, can you run vcgencmd version? The version we are running on balenaOS is fairly old, from Feb 20th, 2020:

root@7775fb6:~/moddable/examples/helloworld# vcgencmd version
Feb 20 2020 16:44:25 
Copyright (c) 2012 Broadcom
version 1614a1ded604d11f395044be137350df84e3d8ee (clean) (release) (start_cd)
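To compare the firmware on the two cards, the build date and commit hash can be pulled out of vcgencmd version output like the one above. A sketch, assuming the three-line format shown; the function name is hypothetical.

```python
import re


def parse_vcgencmd_version(output):
    """Return (build_date, commit_hash) from `vcgencmd version` output."""
    lines = [line.strip() for line in output.strip().splitlines()]
    if not lines:
        raise ValueError("empty vcgencmd output")
    build_date = lines[0]  # e.g. "Feb 20 2020 16:44:25"
    # The firmware commit is a 40-character hex hash after the word "version".
    match = re.search(r"version ([0-9a-f]{40})", output)
    return build_date, (match.group(1) if match else None)
```

Running this on the output from both devices makes a firmware difference easy to spot.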

What I’m going to try is switching out the firmware on the boot partition (I think this is possible) to see if we can get it working on balenaOS with the latest firmware.

Just an update on this: I updated the firmware to the one included in the Raspbian 32-bit image, but unfortunately it still segfaults :frowning:

I’ve tried the new toolchain and it also crashed. I can try to build one, but I don’t think it’s that kind of error: since it compiles all the files little by little if I keep retrying, I guess this is a memory management or parallelization issue.

I will check the firmware on my Raspbian install back home, and will take some time over the weekend to test this.

Hi, I did a test on a 4.19.93 kernel and it built OK, so it looks like this problem will be fixed by updating the kernel to at least 4.19.80. We don’t have a fixed date for such an update at this point; we’ll discuss a date internally and get back to you.
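Because a container shares the host’s kernel, the kernel version that matters here is balenaOS’s, not anything set by the base image; this is one container-side difference between Raspbian and balenaOS that is not negligible. A sketch of a check against the 4.19.80 threshold mentioned above, using platform.release(), which inside a container reports the host kernel; the function names are hypothetical.

```python
import platform
import re


def kernel_at_least(release, minimum=(4, 19, 80)):
    """True if a kernel release string such as '4.19.118-v7+' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '5.4') is treated as 0.
    version = tuple(int(part or 0) for part in match.groups())
    return version >= minimum


def running_kernel_ok(minimum=(4, 19, 80)):
    """Check the kernel of the current host (shared by all of its containers)."""
    return kernel_at_least(platform.release(), minimum)
```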


Hi @diogovaraujo

I just wanted to let you know that this matter is being addressed in the following pull request: https://github.com/balena-os/balena-raspberrypi/pull/496

You can keep an eye on its progress, and our engineers will also update the status there as they make progress.

Kind regards
Alida

Hello @diogovaraujo
The PR my colleague mentioned above, Update to kernel 4.19.118 by floion · Pull Request #496 · balena-os/balena-raspberrypi · GitHub, was merged on 27th June 2020, upgrading the kernel to 4.19.118.
We are reaching back to let you know that the issue linked to this thread has been resolved, and we are closing the thread. Thanks!