Automated balenalib base image smoke testing with GitLab runners

Hi everyone! Since finishing the “GitLab runner on balena devices for continuous integration testing” project, I have been looking for a way to put it to good use. Drawing on issues I’ve seen while on support, I thought it would be useful to “smoke test” some of our common base images, to catch issues that would otherwise slip through.

Our team makes a lot of base images, as you can tell from our base image documentation and this earlier blog post:

Behind those numbers are both internal and external complexities, such as this sudden change in Debian Jessie, or a mixup between the system and balena-provided Python breaking one of our projects. We are fixing these and improving for the future, but it would be good to catch them earlier.


To do just that, I’ve set up this project:

which has GitLab CI tests, along with a small test cluster of Raspberry Pi 1 and Raspberry Pi 3 devices to provide the testing infrastructure (maybe more in the future).


The test definitions in .gitlab-ci.yml go through a bunch of images: at the moment 30 different ones across Debian/Alpine/Fedora/Ubuntu and Python/Node.js/Go combinations, multiplied by the device types.
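To illustrate how such a matrix multiplies out, here is a minimal Python sketch. It is not the project's actual generator: the device slugs, the stack names, and the `balenalib/<device>-<distro>-<stack>` repository pattern are assumptions made for illustration, and the exclusion set only encodes the one gap mentioned in this post (Raspberry Pi 1 having no Ubuntu or Fedora images).

```python
from itertools import product

# Assumed slugs for illustration; check the base image docs for the real ones.
DEVICE_TYPES = ["raspberry-pi", "raspberrypi3"]
DISTROS = ["debian", "alpine", "fedora", "ubuntu"]
STACKS = ["python", "node", "golang"]

# (device, distro) pairs with no published images. Only the Raspberry Pi 1
# gap mentioned in the post is encoded here.
UNAVAILABLE = {("raspberry-pi", "ubuntu"), ("raspberry-pi", "fedora")}


def build_matrix(devices=DEVICE_TYPES, distros=DISTROS, stacks=STACKS,
                 unavailable=UNAVAILABLE):
    """Return one image repository name per valid device/distro/stack combination."""
    return [
        f"balenalib/{device}-{distro}-{stack}"
        for device, distro, stack in product(devices, distros, stacks)
        if (device, distro) not in unavailable
    ]


if __name__ == "__main__":
    for image in build_matrix():
        print(image)
```

Each extra device type or language stack multiplies the list, which is why the number of jobs grows so quickly.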

The tests themselves are somewhat limited at the moment but should already be helpful:

  • Check if the OS is what it claims to be
  • Check if package install works
  • Check if the language version is what it claims to be
  • Check if a “hello world” project is successfully installed/compiled/run for each language
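The real tests are shell scripts in the repository. Purely as an illustration of the first and third checks, here is a minimal Python sketch under stated assumptions: the helper names are hypothetical, the OS check compares the `ID` field of `/etc/os-release`, and the version check compares the leading components of whatever the language's version command prints.

```python
import re


def parse_os_release(text):
    """Parse the KEY=value lines of an /etc/os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def os_matches(expected_distro, os_release_text):
    """True if the ID field names the distro the image claims to be."""
    actual = parse_os_release(os_release_text).get("ID", "")
    return actual.lower() == expected_distro.lower()


def version_matches(expected, version_output):
    """True if the first dotted number in the output starts with the expected components."""
    found = re.search(r"\d+(?:\.\d+)*", version_output)
    if not found:
        return False
    actual = found.group(0).split(".")
    wanted = expected.split(".")
    return actual[:len(wanted)] == wanted
```

Comparing version components rather than raw string prefixes avoids a tag like 3.7 accidentally matching a 3.70 interpreter.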

(You can see the tests in the relevant .sh files in the repository, starting from

All of these are tagged in CI so that they are sent to actual physical hardware, managed on balenaCloud:

The test run lineup looks something like this:

The tests are also set up to run on a schedule, currently at 4 AM London time each day:

Here’s an example run, which highlighted that: a) some Node 12 images are not yet generated, b) some images have not yet received the fix for a recent Python executable symlinking issue. I expect both of these tests to succeed in the next few days.

This is all part of the learning and improvements.

Challenges and Future

I would love to add more tests, and should choose meaningful ones: meaningful device types and OS versions, and coverage of all the language stacks we have (so add OpenJDK and .NET). The GitLab interface seems to start struggling at ~100 tests or so, so it may need a different approach, I guess one where a single test handles multiple images. I should also add more thorough test cases; if you have any ideas, please leave a suggestion!
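One way to keep the job count under the point where the interface struggles would be to group several images into each job. The following is only a sketch of that idea, not something the project does today; `batch_images` and the 100-job limit are hypothetical.

```python
import math


def batch_images(images, max_jobs):
    """Split the image list into at most max_jobs batches of near-equal size."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    if not images:
        return []
    size = math.ceil(len(images) / max_jobs)
    return [images[i:i + size] for i in range(0, len(images), size)]


if __name__ == "__main__":
    # Placeholder names; a real run would use actual base image names.
    images = [f"image-{n}" for n in range(250)]
    batches = batch_images(images, max_jobs=100)
    print(len(batches), "jobs, up to", len(batches[0]), "images each")
```

The trade-off is that a failing job then needs its log inspected to see which image in the batch broke.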

For the AMD64 tests I’m using GitLab’s own runners, which makes those tests very quick, but I’m planning to replace them with an Intel NUC runner set up alongside the Pis, to have complete internal coverage.

One challenge I encountered is that parsing our base image names is not very straightforward, even if the explanation is relatively simple. This resulted in some fun scripting. There are also a few nuances around which OS and language versions are available for which hardware/architecture and device type (for example, Raspberry Pi 1 has no Ubuntu or Fedora images). Dogfooding our own tools is very useful to see where the usability issues are, and there are definitely a lot of improvements we can make that warrant further discussion.
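To show why the parsing is fiddly, here is a minimal Python sketch, not the scripting used in the project. It assumes names of the form `balenalib/<device>-<distro>[-<stack>][:<tag>]` and relies on a hard-coded list of distro names to find the boundary, because device names can themselves contain hyphens. The `BaseImage` type and `parse_image_name` function are hypothetical.

```python
from typing import NamedTuple, Optional

# Assumed distro names; the split point is the first token matching one of these.
KNOWN_DISTROS = ("debian", "alpine", "ubuntu", "fedora")


class BaseImage(NamedTuple):
    device: str
    distro: str
    stack: Optional[str]  # language stack, or None for an OS-only image
    tag: Optional[str]    # everything after the colon, left unparsed


def parse_image_name(name):
    """Split an image name into device, distro, stack and tag."""
    repo, _, tag = name.partition(":")
    repo = repo.split("/")[-1]  # drop the "balenalib/" namespace if present
    tokens = repo.split("-")
    for i, token in enumerate(tokens):
        if i > 0 and token in KNOWN_DISTROS:
            device = "-".join(tokens[:i])
            stack = "-".join(tokens[i + 1:]) or None
            return BaseImage(device, token, stack, tag or None)
    raise ValueError(f"no known distro found in {name!r}")
```

A plain split on the first hyphen would cut a device name such as `raspberry-pi` in half, which is the kind of nuance that makes the names harder to parse than to explain.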

What do you think? Any feedback is appreciated!


Nice work Gergely! :nerd_face:

@imrehg This looks great. It’s something I’ve been struggling with myself recently!

I’m excited to learn from this and try it out. :+1:

Over the weekend I added OpenJDK and .NET base images as well, and expanded “a bit” from ~90 base images being tested to ~460. Found some interesting issues that we are looking into. Also added a few more devices to the test rig, as the test rounds started to take upwards of 8 hours to run. Let’s see if we can make that a bit snappier. Also fixed a few issues that this heavy use uncovered in the GitLab runners, so now there’s better garbage collection, for example.

Let’s see what else to tackle next… :slight_smile:


Now we’ve added 32-bit x86 builders (some Intel Edison boards that we had around), so we are now covering all the main architectures, and with this we are at more than 500 different base images being tested each day automatically.

This 32-bit builder setup needed some code changes / forks / rebuilds / Docker Hub pushes for the balena-gitlab-builder project; more details are in the other forum post.

For reference, this is our current setup below. Next, we should really house them much more nicely; this is just not fitting for this project, but it was a proof of concept that grew, so there’s that… :stuck_out_tongue:


@imrehg I finally had a chance to try out this project tonight – so cool! :smile:

I made a quick fork to test out a few of my Swift base images. I’m excited to see where this goes and I’m hoping to explore more this weekend.