Hi everyone! Since putting together the GitLab runner on balena devices for continuous integration testing project, I've been looking for a way to put it to good use. Brainstorming from issues I've seen while on support, I thought it would be useful to run a "smoke test" on some of our common base images, to try to catch issues that would otherwise slip through.
Our team maintains a lot of base images, as you can tell from our base image documentation and this earlier blog post:
Behind those numbers are both internal and external complexities, such as this sudden change in Debian Jessie, or a mixup between the system and the balena-provided Python breaking one of our projects. We are fixing these and improving for the future, but it would be good to catch them earlier.
CI
To do just that, I’ve set up this project:
which defines the GitLab CI tests, and I've set up a small test cluster of Raspberry Pi 1 and Raspberry Pi 3 devices to provide the testing infrastructure (maybe more device types in the future).
Testing
The test definitions in .gitlab-ci.yml go through a bunch of images: at the moment 30 different ones across Debian/Alpine/Fedora/Ubuntu and Python/Node.js/Go combinations, multiplied by the device types.
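As a rough illustration of the pattern (the job names, image tags, and runner tags here are hypothetical; the real definitions are in the repository's .gitlab-ci.yml), each image/device-type combination can become its own job, all sharing a common template:

```yaml
# Hypothetical excerpt following this pattern.
# A hidden template job holds the shared bits...
.base_test:
  stage: test
  script:
    - ./test.sh   # entry point for the checks described below

# ...and each base image / device type combination extends it.
rpi3-debian-python:
  extends: .base_test
  image: balenalib/raspberrypi3-debian-python:latest
  tags:
    - raspberrypi3   # routes the job to the runner on the Pi 3

rpi1-alpine-node:
  extends: .base_test
  image: balenalib/raspberry-pi-alpine-node:latest
  tags:
    - raspberrypi1   # routes the job to the runner on the Pi 1
```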
The tests themselves are somewhat limited at the moment, but should already be helpful:
- Check if the OS is what it claims to be
- Check if package install works
- Check if the language version is the same as it claims to be
- Check if a “hello world” project is successfully installed/compiled/run for each language
(You can see the tests in the relevant .sh files in the repository, starting from test.sh)
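As a rough, hypothetical sketch of what such checks can look like (this is not the actual script from the repository, and it assumes a Debian Python image):

```bash
#!/bin/bash
set -e

# Hypothetical smoke-test sketch for a Debian Python image;
# the real checks live in the repository's .sh files.

# 1. Is the OS what the image claims to be?
grep -qi "debian" /etc/os-release

# 2. Does package installation work?
apt-get update -qq && apt-get install -y -qq curl

# 3. Is the language version the one advertised in the tag?
python3 --version

# 4. Does a "hello world" project run?
echo 'print("hello world")' > hello.py
python3 hello.py
```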
All of these jobs are set up with CI tags so that they are sent to actual physical hardware, managed on balenaCloud:
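On the runner side, this routing only requires the runner on each device to register with a matching tag. Here is a hypothetical registration command with placeholder values; in practice the runner on the balena device is set up by the GitLab runner project mentioned above:

```bash
# Hypothetical registration of the runner living on the Raspberry Pi 3 device;
# URL, token, and tag values are placeholders.
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.com/" \
  --registration-token "REDACTED" \
  --executor docker \
  --docker-image "balenalib/raspberrypi3-debian:latest" \
  --tag-list "raspberrypi3"
```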
The test run lineup looks something like this:
The tests are also set up to run on a schedule, currently at 4AM London time each day:
Here’s an example run, which highlighted that: a) some Node 12 images are not yet generated, and b) some images have not yet picked up the fix for a recent Python executable symlinking issue. I expect both of these tests to succeed in the next few days.
This is all part of learning and improving.
Challenges and Future
I would love to add more tests, and should choose meaningful ones: meaningful device types and OS versions, plus coverage for all the language stacks we have (so adding OpenJDK and .NET). The GitLab interface seems to start to struggle at around 100 tests or so, so it might need a different approach, perhaps having one test handle multiple images. I should also add more thorough test cases, so if you have any ideas, please leave a suggestion!
For the AMD64 tests I’m using GitLab’s own runners, which makes those tests very quick, but I’m planning to replace them with an Intel NUC runner set up alongside the Pis, to have complete internal coverage.
One challenge I encountered is that parsing our base image names is not very straightforward, even if the naming scheme is relatively simple to explain; it resulted in some fun scripting to handle (see the sketch below). There are also a few nuances around which OS and language versions are available for which device type (for example, Raspberry Pi 1 not having Ubuntu and Fedora images). Dogfooding our own tools is very useful for seeing where the usability issues are, and there are definitely a lot of improvements we can make that warrant further discussion.
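To give a flavour of the parsing involved, here is a hypothetical sketch (the device-type list and field names are illustrative, not the actual script from the repository). The tricky part is that device-type slugs can themselves contain dashes, so a naive split on "-" does not work:

```bash
#!/bin/bash
# Hypothetical sketch of splitting a base image name into its parts.

image="balenalib/raspberry-pi-alpine-node:latest"

name="${image#balenalib/}"   # strip the repository prefix
name="${name%%:*}"           # drop the tag

# Match known device-type slugs first, since e.g. "raspberry-pi" contains a dash.
for device in raspberry-pi raspberrypi3 intel-nuc; do
  if [[ "$name" == "$device"-* ]]; then
    rest="${name#"$device"-}"
    distro="${rest%%-*}"
    language="${rest#*-}"
    echo "device=$device distro=$distro language=$language"
    break
  fi
done
```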
What do you think? Any feedback is appreciated!